Preface


It is often said that investment management 
is an art, not a science. However, since the 
early 1990s the market has witnessed a pro¬ 
gressive shift toward a more industrial view of 
the investment management process. There are 
several reasons for this change. First, with 
globalization the universe of investable assets 
has grown many times over. Asset managers 
might have to choose from among several 
thousand possible investments from around 
the globe. Second, institutional investors, of¬ 
ten together with their consultants, have en¬ 
couraged asset management firms to adopt 
an increasingly structured process with docu¬ 
mented steps and measurable results. Pressure 
from regulators and the media is another fac¬ 
tor. Finally, the sheer size of the markets makes 
it imperative to adopt safe and repeatable 
methodologies. 

In its modern sense, financial modeling is 
the design (or engineering) of financial instru¬ 
ments and portfolios of financial instruments 
that result in predetermined cash flows con¬ 
tingent upon different events. Broadly speak¬ 
ing, financial models are employed to manage 
investment portfolios and risk. The objective 
is the transfer of risk from one entity to an¬ 
other via appropriate financial arrangements. 
Though the aggregate risk is a quantity that can¬ 
not be altered, risk can be transferred if there is 
a willing counterparty. 

Financial modeling came to the forefront of
finance in the 1980s, with the broad diffusion
of derivative instruments. However, the concept
and practice of financial modeling are quite
old. The notion of the diversification of risk 
(central to modern risk management) and the
quantification of insurance risk (a requisite for 
pricing insurance policies) were already under¬ 
stood, at least in practical terms, in the 14th cen¬ 
tury. The rich epistolary of Francesco Datini, 
a 14th-century merchant, banker, and insurer 
from Prato (Tuscany, Italy), contains detailed 
instructions to his agents on how to diversify 
risk and insure cargo. 

What is specific to modern financial modeling
is the quantitative management of risk. Both
the pricing of contracts and the optimization of 
investments require some basic capabilities of 
statistical modeling of financial contingencies. 
It is the size, diversity, and efficiency of mod¬ 
ern competitive markets that makes the use of 
financial modeling imperative. 

This three-volume encyclopedia covers not
only the fundamentals of and advances in
financial modeling but also the mathematical
and statistical techniques needed to develop
and test financial models, as well as the
practical issues associated with implementation.
The encyclopedia offers the following
unique features:

• The entries for the encyclopedia were writ¬ 
ten by experts from around the world. This 
diverse collection of expertise has created the 
most definitive coverage of established and 
cutting-edge financial models, applications,
and tools in this ever-evolving field. 

• The series emphasizes both technical and 
managerial issues. This approach provides 
researchers, educators, students, and practi¬ 
tioners with a balanced understanding of the 
topics and the necessary background to deal 
with issues related to financial modeling. 

• Each entry follows a format that includes the 
author, entry abstract, introduction, body, list¬ 
ing of key points, notes, and references. This 
enables readers to pick and choose among 
various sections of an entry, and creates con¬ 
sistency throughout the entire encyclopedia. 

• The numerous illustrations and tables
throughout the work highlight complex topics
and assist further understanding.

• Each volume includes a complete table of
contents and index for easy access to various
parts of the encyclopedia.

TOPIC CATEGORIES 

As is the practice in the creation of an ency¬ 
clopedia, the topic categories are presented al¬ 
phabetically. The topic categories and a brief 
description of each topic follow. 

VOLUME I 
Asset Allocation 

A major activity in the investment management 
process is establishing policy guidelines to sat¬ 
isfy the investment objectives. Setting policy be¬ 
gins with the asset allocation decision. That is, 
a decision must be made as to how the funds 
to be invested should be distributed among the 
major asset classes (e.g., equities, fixed income, 
and alternative asset classes). The term "asset 
allocation" includes (1) policy asset allocation, 
(2) dynamic asset allocation, and (3) tactical as¬ 
set allocation. Policy asset allocation decisions 
can loosely be characterized as long-term as¬ 
set allocation decisions, in which the investor 
seeks to assess an appropriate long-term "nor¬ 
mal" asset mix that represents an ideal blend 
of controlled risk and enhanced return. In
dynamic asset allocation the asset mix (i.e., the
allocation among the asset classes) is
mechanistically shifted in response to changing market
conditions. Once the policy asset allocation has 
been established, the investor can turn his or her 
attention to the possibility of active departures 
from the normal asset mix established by policy. 
If a decision to deviate from this mix is based 
upon rigorous objective measures of value, it 
is often called tactical asset allocation. The fun¬ 
damental model used in establishing the policy 
asset allocation is the mean-variance portfolio 
model formulated by Harry Markowitz in 1952, 
popularly referred to as the theory of portfolio 
selection and modern portfolio theory. 
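
As a rough illustration of the mean-variance idea,
the sketch below computes the expected return and
volatility of a two-asset mix and the weight that
minimizes variance. The function names and the
numerical inputs are hypothetical and chosen only
for illustration; they are not taken from the
encyclopedia.

```python
import math


def portfolio_stats(w1, mu1, mu2, s1, s2, rho):
    """Expected return and volatility of a two-asset portfolio.

    w1 is the weight in asset 1 (asset 2 gets 1 - w1); mu are expected
    returns, s are volatilities, rho is the return correlation.
    """
    w2 = 1.0 - w1
    mean = w1 * mu1 + w2 * mu2
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2.0 * w1 * w2 * rho * s1 * s2
    return mean, math.sqrt(variance)


def min_variance_weight(s1, s2, rho):
    """Weight in asset 1 that minimizes portfolio variance (weights sum to 1)."""
    cov = rho * s1 * s2
    return (s2 ** 2 - cov) / (s1 ** 2 + s2 ** 2 - 2.0 * cov)


# Hypothetical inputs: equities (asset 1) and bonds (asset 2), uncorrelated.
w_star = min_variance_weight(s1=0.20, s2=0.10, rho=0.0)
mean_star, vol_star = portfolio_stats(w_star, 0.08, 0.04, 0.20, 0.10, 0.0)
```

With these assumed inputs the minimum-variance mix
has lower volatility than either asset held alone,
which is the diversification effect that the policy
asset allocation decision tries to exploit.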

Asset Pricing Models 

Asset pricing models seek to formalize the rela¬ 
tionship that should exist between asset returns 
and risk if investors behave in a hypothesized 
manner. At its most basic level, asset pricing 
is mainly about transforming asset payoffs into 
prices. The two most well-known asset pricing 
models are the arbitrage pricing theory and the 
capital asset pricing model. The fundamental 
theorem of asset pricing asserts the equivalence 
of three key issues in finance: (1) absence of 
arbitrage; (2) existence of a positive linear pric¬ 
ing rule; and (3) existence of an investor who 
prefers more to less and who has maximized his 
or her utility. There are two types of arbitrage 
opportunities. The first is paying nothing
today and obtaining something in the future, and
the second is obtaining something today
with no future obligations. Although the
principle of absence of arbitrage is fundamental for
understanding asset valuation in a competitive 
market, there are well-known limits to arbitrage 
resulting from restrictions imposed on rational 
traders, and, as a result, pricing inefficiencies 
may exist for a period of time. 

Bayesian Analysis and Financial 
Modeling Applications 

Financial models describe in mathematical 
terms the relationships between financial 
random variables through time and / or across 
assets. The fundamental assumption is that the 
model relationship is valid independent of the
time period or the asset class under consider¬ 
ation. Financial data contain both meaningful 
information and random noise. An adequate 
financial model not only extracts optimally the 
relevant information from the historical data 
but also performs well when tested with new 
data. The uncertainty brought about by the 
presence of data noise makes imperative the use 
of statistical analysis as part of the process of fi¬ 
nancial model building, model evaluation, and 
model testing. Statistical analysis is employed 
from the vantage point of either of the two main 
statistical philosophical traditions—frequentist 
and Bayesian. An important difference be¬ 
tween the two lies with the interpretation of the 
concept of probability. As the name suggests, 
advocates of the frequentist approach interpret 
the probability of an event as the limit of its 
long-run relative frequency (i.e., the frequency 
with which it occurs as the amount of data in¬ 
creases without bound). Since the time financial 
models became a mainstream tool to aid in un¬ 
derstanding financial markets and formulating 
investment strategies, the framework applied 
in finance has been the frequentist approach. 
However, strict adherence to this interpretation 
is not always possible in practice. When study¬ 
ing rare events, for instance, large samples of 
data may not be available, and in such cases 
proponents of frequentist statistics resort to 
theoretical results. The Bayesian view of the 
world is based on the subjectivist interpretation 
of probability: Probability is subjective, a de¬ 
gree of belief that is updated as information or 
data are acquired. Only in the last two decades 
has Bayesian statistics started to gain greater 
acceptance in financial modeling, despite its 
introduction about 250 years ago. It has been 
the advancements of computing power and the 
development of new computational methods 
that have fostered the growing use of Bayesian 
statistics in financial modeling. 
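
The idea of a degree of belief that is updated as
data arrive can be sketched with the textbook
beta-binomial model. This is only one simple
instance of Bayesian updating, picked here for
illustration; the function names and the data are
hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    A Beta(alpha, beta) prior on a probability p combined with
    `successes` and `failures` gives a Beta posterior with the
    counts simply added to the prior parameters.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: belief about the probability of an "up" day.
prior = (1.0, 1.0)                      # uniform prior, mean 0.5
posterior = update_beta(*prior, successes=3, failures=1)
prior_belief = beta_mean(*prior)
posterior_belief = beta_mean(*posterior)
```

The posterior mean moves from the prior value toward
the observed frequency, and it would move further as
more data are acquired.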

Bond Valuation 

The value of any financial asset is the present 
value of its expected future cash flows. To value
a bond (also referred to as a fixed-income
security), one must be able to estimate the bond's
remaining cash flows and identify the appro¬ 
priate discount rate(s) at which to discount the 
cash flows. The traditional approach to bond 
valuation is to discount every cash flow with 
the same discount rate. Simply put, the rele¬ 
vant term structure of interest rate used in val¬ 
uation is assumed to be flat. This approach, 
however, permits opportunities for arbitrage. 
Alternatively, the arbitrage-free valuation ap¬ 
proach starts with the premise that a bond 
should be viewed as a portfolio or package 
of zero-coupon bonds. Moreover, each of the
bond's cash flows is valued using a unique
discount rate that depends on the term structure
of interest rates and on when the cash flow
occurs. The relevant set of discount rates (that
is, spot rates) is derived from an appropriate
term structure of interest rates and, when used
to value risky bonds, is augmented with a
suitable risk spread or premium. Rather than
using a model to calculate a bond's fair value,
the market price can be taken as given so as to
compute a yield measure or a spread measure.
Popular yield measures are the yield to matu¬ 
rity, yield to call, yield to put, and cash flow 
yield. Nominal spread, static (or zero-volatility) 
spread, and option-adjusted spread are popu¬ 
lar relative value measures quoted in the bond 
market. Complications in bond valuation arise 
when a bond has one or more embedded op¬ 
tions such as call, put, or conversion features. 
For bonds with embedded options, the finan¬ 
cial modeling draws from options theory, more 
specifically, the use of the lattice model to value 
a bond with embedded options. 
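
The difference between the traditional and the
arbitrage-free approaches can be sketched for an
option-free bond as follows. The sketch assumes
annual coupons and annual compounding for
simplicity; the function names and the rates are
hypothetical.

```python
def bond_cash_flows(face, coupon_rate, years):
    """Annual cash flows of an option-free bond (coupon, ..., coupon + face)."""
    coupon = face * coupon_rate
    flows = [coupon] * years
    flows[-1] += face
    return flows


def price_flat(flows, y):
    """Traditional approach: every cash flow discounted at the same rate y."""
    return sum(cf / (1.0 + y) ** t for t, cf in enumerate(flows, start=1))


def price_spot(flows, spot_rates):
    """Arbitrage-free approach: cash flow at time t discounted at spot rate z_t."""
    if len(flows) != len(spot_rates):
        raise ValueError("need one spot rate per cash flow")
    return sum(
        cf / (1.0 + z) ** t
        for t, (cf, z) in enumerate(zip(flows, spot_rates), start=1)
    )


flows = bond_cash_flows(face=100.0, coupon_rate=0.05, years=3)
flat_price = price_flat(flows, 0.05)                  # priced at par
spot_price = price_spot(flows, [0.03, 0.04, 0.05])    # upward-sloping curve
```

When the spot curve is flat the two prices coincide;
with the upward-sloping curve assumed here the early
coupons are discounted at lower rates, so the
spot-rate price is higher than the flat-rate price.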

Credit Risk Modeling 

Credit risk is a broad term used to refer to three 
types of risk: default risk, credit spread risk, and 
downgrade risk. Default risk is the risk that the 
counterparty to a transaction will fail to satisfy 
the terms of the obligation with respect to the 
timely payment of interest and repayment of 
the amount borrowed. The counterparty could 
be the issuer of a debt obligation or an entity on 
the other side of a private transaction such as a
derivative trade or a collateralized loan agree¬ 
ment (i.e., a repurchase agreement or a secu¬ 
rities lending agreement). The default risk of 
a counterparty is often initially gauged by the 
credit rating assigned by one of the three rat¬ 
ing companies—Standard & Poor's, Moody's 
Investors Service, and Fitch Ratings. Although 
default risk is the one that most market partici¬ 
pants think of when reference is made to credit 
risk, even in the absence of default, investors 
are concerned about the decline in the market 
value of their portfolio bond holdings due to 
a change in credit spread or the price
performance of their holdings relative to a bond
index. The risk of an adverse change in
credit spreads is referred to as credit spread
risk; when the change is attributed solely to the
downgrade of the credit rating of an entity, it is
called downgrade risk. Financial modeling of credit risk is
used (1) to measure, monitor, and control a port¬ 
folio's credit risk, and (2) to price credit risky 
debt instruments. There are two general cate¬ 
gories of credit risk models: structural models 
and reduced-form models. There is consider¬ 
able debate as to which type of model is the 
best to employ. 

Derivatives Valuation 

A derivative instrument is a contract whose 
value depends on some underlying asset. The 
term "derivative" is used to describe this prod¬ 
uct because its value is derived from the value 
of the underlying asset. The underlying asset, 
simply referred to as the "underlying," can be 
either a commodity, a financial instrument, or 
some reference entity such as an interest rate or 
stock index, leading to the classification of com¬ 
modity derivatives and financial derivatives. 
Although there are close conceptual relations 
between derivative instruments and cash mar¬ 
ket instruments such as debt and equity, the two 
classes of instruments are used differently: Debt 
and equity are used primarily for raising funds 
from investors, while derivatives are primarily
used for dividing up and trading risks.
Moreover, debt and equity are direct claims against a
firm's assets, while derivative instruments are 
usually claims on a third party. A derivative's 
value depends on the value of the underly¬ 
ing, but the derivative instrument itself repre¬ 
sents a claim on the "counterparty" to the trade. 
Derivative instruments are classified in terms
of their payoff characteristics: linear and
nonlinear payoffs. The former, also referred to as
symmetric payoff derivatives, include forward,
futures, and swap contracts, while the latter
include options. Basically, a linear payoff
derivative is a risk-sharing arrangement between the
counterparties since both are sharing the risk re¬ 
garding the price of the underlying. In contrast, 
nonlinear payoff derivative instruments (also 
referred to as asymmetric payoff derivatives) 
are insurance arrangements because one party 
to the trade is willing to insure the counter¬ 
party of a minimum or maximum (depending 
on the contract) price. The amount received by 
the insuring party is referred to as the contract 
price or premium. Derivative instruments are 
used for controlling risk exposure with respect 
to the underlying. Hedging is a special case of 
risk control where a party seeks to eliminate 
the risk exposure. Derivative valuation or pric¬ 
ing is developed based on no-arbitrage price 
relations, relying on the assumption that two 
perfect substitutes must have the same price. 
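
The distinction between linear and nonlinear payoffs
can be sketched by comparing a long forward position
with a long call option at expiry. The function
names and the numbers are hypothetical; the formulas
are the standard expiry payoffs.

```python
def long_forward_payoff(spot_at_expiry, forward_price):
    """Linear (symmetric) payoff: gains and losses mirror each other."""
    return spot_at_expiry - forward_price


def long_call_payoff(spot_at_expiry, strike, premium=0.0):
    """Nonlinear (asymmetric) payoff: the loss is capped at the premium paid."""
    return max(spot_at_expiry - strike, 0.0) - premium


# Hypothetical contracts on the same underlying, both struck at 100.
for spot in (80.0, 100.0, 120.0):
    forward = long_forward_payoff(spot, 100.0)
    call = long_call_payoff(spot, 100.0, premium=5.0)
    print(f"spot={spot:6.1f}  forward={forward:6.1f}  call={call:6.1f}")
```

The forward holder shares the price risk in both
directions, while the call buyer pays a premium to
be insured against prices below the strike.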

VOLUME II 

Difference Equations and Differential 
Equations 

The tools of linear difference equations and 
differential equations have found many ap¬ 
plications in finance. A difference equation is 
an equation that involves differences between 
successive values of a function of a discrete 
variable. A function of such a variable is 
one that provides a rule for assigning values 
in sequences to it. The theory of linear dif¬ 
ference equations covers three areas: solving 
difference equations, describing the behavior 
of difference equations, and identifying the
equilibrium (or critical value) and stability 
of difference equations. Linear difference 
equations are important in the context of dy¬ 
namic econometric models. Stochastic models 
in finance are expressed as linear difference 
equations with random disturbances added. 
Understanding the behavior of solutions of 
linear difference equations helps develop 
intuition for the behavior of these models. In 
nontechnical terms, differential equations are 
equations that express a relationship between 
a function and one or more derivatives (or 
differentials) of that function. The relationship 
between difference equations and differential 
equations is that the latter are invaluable for 
modeling situations in finance where there is a 
continually changing value. The problem is that 
not all changes in value occur continuously. If 
the change in value occurs incrementally rather 
than continuously, then differential equations 
have their limitations. Instead, a financial 
modeler can use difference equations, which 
are recursively defined sequences. It would 
be difficult to overemphasize the importance 
of differential equations in financial modeling 
where they are used to express laws that govern 
the evolution of price probability distributions, 
the solution of economic variational problems 
(such as intertemporal optimization), and 
conditions for continuous hedging (such as in 
the Black-Scholes option pricing model). The 
two broad types of differential equations are 
ordinary differential equations and partial dif¬ 
ferential equations. The former are equations or 
systems of equations involving only one inde¬ 
pendent variable. Another way of saying this 
is that ordinary differential equations involve 
only total derivatives. Partial differential equa¬ 
tions are differential equations or systems of 
equations involving partial derivatives. When 
one or more of the variables is a stochastic pro¬ 
cess, we have the case of stochastic differential 
equations and the solution is also a stochastic 
process. An assumption must be made about 
what drives the noise in a stochastic differential
equation. In most applications, it is assumed
that the noise term is a Gaussian random
variable, although other types of random
variables can be assumed.
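
The three areas of linear difference equations noted
above (solution, behavior, and equilibrium and
stability) can be sketched for the first-order
equation y(t) = a + b*y(t-1). This particular form
and the function names are assumptions made for
illustration only.

```python
def iterate(a, b, y0, steps):
    """Solve y(t) = a + b * y(t-1) recursively, returning the whole path."""
    path = [y0]
    for _ in range(steps):
        path.append(a + b * path[-1])
    return path


def equilibrium(a, b):
    """Fixed point y* satisfying y* = a + b * y*; undefined when b == 1."""
    if b == 1:
        raise ValueError("no unique equilibrium when b == 1")
    return a / (1 - b)


def is_stable(b):
    """The equilibrium attracts every starting value only if |b| < 1."""
    return abs(b) < 1


path = iterate(a=2.0, b=0.5, y0=0.0, steps=60)
y_star = equilibrium(2.0, 0.5)
```

With |b| < 1 the path converges to the equilibrium
whatever the starting value; adding a random
disturbance to each step would turn the same
recursion into a simple stochastic model.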

Equity Models and Valuation 

Traditional fundamental equity analysis in¬ 
volves the analysis of a company's opera¬ 
tions for the purpose of assessing its economic 
prospects. The analysis begins with the finan¬ 
cial statements of the company in order to in¬ 
vestigate the earnings, cash flow, profitability, 
and debt burden. The fundamental analyst will 
look at the major product lines, the economic 
outlook for the products (including existing 
and potential competitors), and the industries 
in which the company operates. The result of 
this analysis will be the growth prospects of 
earnings. Based on the growth prospects 
of earnings, a fundamental analyst attempts 
to determine the fair value of the stock using 
one or more equity valuation models. The two 
most commonly used approaches for valuing a 
firm's equity are based on discounted cash flow 
and relative valuation models. The principal 
idea underlying discounted cash flow models 
is that what an investor pays for a share of stock 
should reflect what is expected to be received 
from it—return on the investor's investment. 
What an investor receives are cash dividends 
in the future. Therefore, the value of a share of 
stock should be equal to the present value of 
all the future cash flows an investor expects to 
receive from that share. To value stock, there¬ 
fore, an investor must project future cash flows, 
which, in turn, means projecting future divi¬ 
dends. Popular discounted cash flow models in¬ 
clude the basic dividend discount model, which 
assumes a constant dividend growth, and the 
multiple-phase models, which include the two- 
stage dividend growth model and the stochas¬ 
tic dividend discount models. Relative valua¬ 
tion methods use multiples or ratios—such as 
price/earnings, price/book, or price/free cash 
flow—to determine whether a stock is trad¬ 
ing at higher or lower multiples than its peers. 
There are two critical assumptions in using relative
valuation: (1) the universe of firms selected
to be included in the peer group are in fact com¬ 
parable, and (2) the average multiple across the 
universe of firms can be treated as a reason¬ 
able approximation of "fair value" for those 
firms. This second assumption may be prob¬ 
lematic during periods of market panic or eu¬ 
phoria. Managers of quantitative equity firms 
employ techniques that allow them to identify 
attractive stock candidates, focusing not on a 
single stock as is done with traditional funda¬ 
mental analysis but rather on stock character¬ 
istics in order to explain why one stock out¬ 
performs another stock. They do so by statis¬ 
tically identifying a group of characteristics to 
create a quantitative selection model. In con¬ 
trast to the traditional fundamental stock se¬ 
lection, quantitative equity managers create a 
repeatable process that utilizes the stock selec¬ 
tion model to identify attractive stocks. Equity 
portfolio managers have used various statistical 
models for forecasting returns and risk. These 
models, referred to as predictive return models, 
make conditional forecasts of expected returns 
using the current information set. Predictive re¬ 
turn models include regressive models, linear 
autoregressive models, dynamic factor models, 
and hidden-variable models. 
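
The discounted cash flow idea described above can be
sketched with the constant-growth dividend discount
model and a two-stage variant. The function names
and inputs are hypothetical, and the two-stage
version below is one common textbook form rather
than a definitive specification.

```python
def constant_growth_value(next_dividend, required_return, growth):
    """Present value of dividends growing forever at a constant rate."""
    if required_return <= growth:
        raise ValueError("required return must exceed the growth rate")
    return next_dividend / (required_return - growth)


def two_stage_value(current_dividend, required_return, g1, years, g2):
    """Dividends grow at g1 for `years` periods, then at g2 forever."""
    value = 0.0
    dividend = current_dividend
    for t in range(1, years + 1):
        dividend *= 1.0 + g1
        value += dividend / (1.0 + required_return) ** t
    # Value at the end of stage one of all later dividends, discounted back.
    terminal = constant_growth_value(dividend * (1.0 + g2), required_return, g2)
    return value + terminal / (1.0 + required_return) ** years


# Hypothetical stock: next dividend 2.00, 10% required return, 5% growth.
simple = constant_growth_value(2.0, 0.10, 0.05)
staged = two_stage_value(2.0, 0.10, g1=0.08, years=5, g2=0.03)
```

When both stages use the same growth rate, the
two-stage value collapses to the constant-growth
value, which is a useful sanity check on any
implementation.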

Factor Models and Portfolio 
Construction 

Quantitative asset managers typically employ 
multifactor risk models for the purpose of 
constructing and rebalancing portfolios and 
analyzing portfolio performance. A multifactor 
risk model, or simply factor model, attempts to 
estimate and characterize the risk of a portfolio, 
either relative to a benchmark such as a market 
index or in absolute value. The model allows 
the decomposition of risk into a systematic
and an idiosyncratic component. The
portfolio's risk exposure to broad risk factors 
is captured by the systematic risk. For equity 
portfolios these are typically fundamental 
factors (e.g., market capitalization and value
vs. growth), technical factors (e.g., momentum),
and industry/sector/country factors. For fixed-income
portfolios, systematic risk captures a portfolio's 
exposure to broad risk factors such as the 
term structure of interest rates, credit spreads, 
optionality (call and prepayment), credit, and 
sectors. The portfolio's systematic risk depends 
not only on its exposure to these risk factors but 
also the volatility of the risk factors and how 
they correlate with each other. In contrast to 
systematic risk, idiosyncratic risk captures the 
uncertainty associated with news affecting the 
holdings of individual issuers in the portfolio. 
In equity portfolios, idiosyncratic risk can be 
easily diversified by reducing the importance 
of individual issuers in the portfolio. Because 
of the larger number of issuers in bond indexes, 
however, this is a difficult task. There are dif¬ 
ferent types of factor models depending on the 
factors. Factors can be exogenous variables or 
abstract variables formed by portfolios. Exoge¬ 
nous factors (or known factors) can be identified 
from traditional fundamental analysis or from 
economic theory that suggests macroeconomic 
factors. Abstract factors, also called unidenti¬ 
fied or latent factors, can be determined with 
the statistical tool of factor analysis or principal 
component analysis. The simplest type of
factor model is one in which the factors are
assumed to be known or observable, so that
time-series data on those factors can be used to
estimate the model. The four most commonly
used approaches for the evaluation of return 
premiums and risk characteristics to factors are 
portfolio sorts, factor models, factor portfolios, 
and information coefficients. Despite their use by
quantitative asset managers, the basic building 
blocks of factor models used by model builders 
and by traditional fundamental analysts are 
the same: They both seek to identify the drivers 
of returns for the asset class being analyzed. 
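
The decomposition of portfolio risk into systematic
and idiosyncratic components can be sketched as
follows. The sketch assumes that idiosyncratic
returns are uncorrelated across issuers and with the
factors; all names and numbers are hypothetical.

```python
def systematic_variance(exposures, factor_cov):
    """Variance from factor exposures: sum over i, j of b_i * F_ij * b_j."""
    n = len(exposures)
    return sum(
        exposures[i] * factor_cov[i][j] * exposures[j]
        for i in range(n)
        for j in range(n)
    )


def idiosyncratic_variance(weights, residual_vols):
    """Variance from issuer-specific risk, assuming uncorrelated residuals."""
    return sum((w * s) ** 2 for w, s in zip(weights, residual_vols))


# Hypothetical portfolio with exposures to two factors.
exposures = [1.0, 0.5]
factor_cov = [[0.04, 0.01],
              [0.01, 0.09]]
sys_var = systematic_variance(exposures, factor_cov)

# Equal weights in 2 issuers versus 100 issuers, each with 20% residual vol.
idio_2 = idiosyncratic_variance([0.5] * 2, [0.20] * 2)
idio_100 = idiosyncratic_variance([0.01] * 100, [0.20] * 100)
total_var = sys_var + idio_100
```

Systematic variance depends on the exposures and on
the volatilities and correlations of the factors,
while the idiosyncratic part shrinks as the weight
of each individual issuer is reduced.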

Financial Econometrics 

Econometrics is the branch of economics that 
draws heavily on statistics for testing and 
analyzing economic relationships. The
economic equivalent of the laws of physics,
econometrics represents the quantitative, math¬ 
ematical laws of economics. Financial econo¬ 
metrics is the econometrics of financial markets. 
It is a quest for models that describe financial 
time series such as prices, returns, interest rates, 
financial ratios, defaults, and so on. Although 
there are similarities between financial econo¬ 
metric models and models of the physical sci¬ 
ences, there are two important differences. First, 
the physical sciences aim at finding immutable 
laws of nature; econometric models model the 
economy or financial markets—artifacts subject 
to change. Because the economy and financial 
markets are artifacts subject to change, econo¬ 
metric models are not unique representations 
valid throughout time; they must adapt to the 
changing environment. Second, while basic 
physical laws are expressed as differential 
equations, financial econometrics uses both 
continuous-time and discrete-time models. 

Financial Modeling Principles 

The origins of financial modeling can be traced 
back to the development of mathematical equi¬ 
librium at the end of the nineteenth century, fol¬ 
lowed in the beginning of the twentieth century 
with the introduction of sophisticated mathe¬ 
matical tools for dealing with the uncertainty 
of prices and returns. In the 1950s and 1960s, 
financial modelers had tools for dealing with 
probabilistic models for describing markets, the 
principles of contingent claims analysis, an op¬ 
timization framework for portfolio selection 
based on mean and variance of asset returns, 
and an equilibrium model for pricing capital 
assets. The 1970s ushered in models for pricing 
contingent claims and a new model for pricing 
capital assets based on arbitrage pricing. Con¬ 
sequently, by the end of the 1970s, the frame¬ 
works for financial modeling were well known. 
It was the advancement of computing power 
and refinements of the theories to take into 
account real-world market imperfections and
conventions starting in the 1980s that facilitated
implementation and broader acceptance of
mathematical modeling of financial decisions.
The diffusion of low-cost high-performance
computers has allowed the broad use of
numerical methods, changing the landscape of
financial modeling. The importance of finding closed-form
solutions and the consequent search for simple 
models has been dramatically reduced. Com¬ 
putationally intensive methods such as Monte 
Carlo simulations and the numerical solution 
of differential equations are now widely used. 
As a consequence, it has become feasible to 
represent prices and returns with relatively 
complex models. Nonnormal probability dis¬ 
tributions have become commonplace in many 
sectors of financial modeling. It is fair to say 
that the key limitation of financial modeling is 
now the size of available data samples or train¬ 
ing sets, not the computations; it is the data 
that limit the complexity of estimates. Math¬ 
ematical modeling has also undergone major 
changes. Techniques such as equivalent martin¬ 
gale methods are being used in derivative pric¬ 
ing, and cointegration, the theory of fat-tailed 
processes, and state-space modeling (including 
ARCH/GARCH and stochastic volatility
models) are being used in financial modeling.
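
As a small illustration of a computationally
intensive method, the sketch below uses Monte Carlo
simulation to estimate an expected future price and
compares it with the known closed-form answer. It
assumes a lognormal (geometric Brownian motion)
price model purely for convenience; the model
choice, the function name, and the parameters are
hypothetical.

```python
import math
import random


def simulate_terminal_prices(s0, mu, sigma, horizon, n_paths, seed=0):
    """Draw terminal prices under an assumed lognormal price model."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    drift = (mu - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    return [
        s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        for _ in range(n_paths)
    ]


prices = simulate_terminal_prices(100.0, 0.05, 0.20, 1.0, n_paths=20000)
estimate = sum(prices) / len(prices)     # Monte Carlo estimate of E[S_T]
exact = 100.0 * math.exp(0.05)           # closed form under the same model
```

Here a closed-form answer exists, so the simulation
is only a check; the practical value of the method
lies in models for which no such formula is
available.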

Financial Statement Analysis 

Much of the financial data used in
constructing financial models for forecasting
and valuation purposes is drawn from the
financial statements that companies are required to
provide to investors. The four basic financial
statements are the balance sheet, the income 
statement, the statement of cash flows, and 
the statement of shareholders' equity. It is im¬ 
portant to understand these data so that the 
information conveyed by them is interpreted 
properly in financial modeling. The financial 
statements are created using several assump¬ 
tions that affect how to use and interpret the 
financial data. The analysis of financial state¬ 
ments involves the selection, evaluation, and 
interpretation of financial data and other
pertinent information to assist in evaluating the
operating performance and financial condition 
of a company. The operating performance of a 
company is a measure of how well a company 
has used its resources—its assets, both tangible 
and intangible—to produce a return on its in¬ 
vestment. The financial condition of a company 
is a measure of its ability to satisfy its obliga¬ 
tions, such as the payment of interest on its 
debt in a timely manner. There are many tools 
available in the analysis of financial informa¬ 
tion. These tools include financial ratio analysis 
and cash flow analysis. Cash flows are essen¬ 
tial ingredients in valuation. Therefore, under¬ 
standing past and current cash flows may help 
in forecasting future cash flows and, hence, de¬ 
termine the value of the company. Moreover, 
understanding cash flow allows the assessment 
of the ability of a firm to maintain current divi¬ 
dends and its current capital expenditure policy 
without relying on external financing. Financial 
modelers must understand how to use these fi¬ 
nancial ratios and cash flow information in the 
most effective manner in building models. 

Finite Mathematics and Basic Functions 
for Financial Modeling 

The collection of mathematical tools that does 
not include calculus is often referred to as 
"finite mathematics." This includes matrix 
algebra, probability theory, and statistical anal¬ 
ysis. Ordinary algebra deals with operations 
such as addition and multiplication performed 
on individual numbers. In financial modeling, 
it is useful to consider operations performed on 
ordered arrays of numbers. Ordered arrays of 
numbers are called vectors and matrices while 
individual numbers are called scalars. Prob¬ 
ability theory is the mathematical approach 
to formalize the uncertainty of events. Even 
though a decision maker may not know which 
one of the set of possible events may finally 
occur, with probability theory a decision maker 
has the means of providing each event with
a certain probability. Furthermore, it provides
the decision maker with the axioms to compute 
the probability of a composed event in a 
unique way. The rather formal environment 
of probability theory translates in a reasonable 
manner to the problems related to risk and 
uncertainty in finance such as, for example, the 
future price of a financial asset. Today, investors 
may be aware of the price of a certain asset, but 
they cannot say for sure what value it might 
have tomorrow. To make a prudent decision, 
investors need to assess the possible scenarios 
for tomorrow's price and assign to each sce¬ 
nario a probability of occurrence. Only then can 
investors reasonably determine whether the 
financial asset satisfies an investment objective 
included within a portfolio. Probability models 
are theoretical models of the occurrence of 
uncertain events. In contrast, statistics is about 
empirical data and can be broadly defined as 
a set of methods used to make inferences from 
a known sample to a larger population that is 
in general unknown. In finance, a particularly
important example is making inferences from 
the past (the known sample) to the future 
(the unknown population). There are impor¬ 
tant mathematical functions with which the 
financial modeler should be acquainted. These 
include the continuous function, the indicator 
function, the derivative of a function, the 
monotonic function, and the integral, as well 
as special functions such as the characteristic 
function of random variables and the factorial, 
the gamma, beta, and Bessel functions. 

Liquidity and Trading Costs 

In broad terms, liquidity refers to the ability 
to execute a trade or liquidate a position with 
little or no cost or inconvenience. Liquidity de¬ 
pends on the market where a financial instru¬ 
ment is traded, the type of position traded, and 
sometimes the size and trading strategy of an 
individual trade. Liquidity risks are those associated
with the prospect of imperfect market
liquidity and can relate to risk of loss or
risk to cash flows. There are two main aspects
to liquidity risk measurement: the measure¬ 
ment of liquidity-adjusted measures of mar¬ 
ket risk and the measurement of liquidity risks 
per se. Market practitioners often assume that 
markets are liquid—that is, that they can liq¬ 
uidate or unwind positions at going market 
prices—usually taken to be the mean of bid 
and ask prices—without too much difficulty or 
cost. This assumption is very convenient and 
provides a justification for the practice of mark¬ 
ing positions to market prices. However, it is 
often empirically questionable, and the failure 
to allow for liquidity can undermine the mea¬ 
surement of market risk. Because liquidity risk 
is a major risk factor in its own right, port¬ 
folio managers and traders will need to mea¬ 
sure this risk in order to formulate effective 
portfolio and trading strategies. A consider¬ 
able amount of work has been done in the eq¬ 
uity market in estimating liquidity risk. Because 
transaction costs are incurred when buying or 
selling stocks, poorly executed trades can ad¬ 
versely impact portfolio returns and therefore 
relative performance. Transaction costs are clas¬ 
sified as explicit costs such as brokerage and 
taxes, and implicit costs, which include market 
impact cost, price movement risk, and opportu¬ 
nity cost. Broadly speaking, market impact cost 
is the price that a trader has to pay for obtain¬ 
ing liquidity in the market and is a key com¬ 
ponent of trading costs that must be modeled 
so that effective trading programs for execut¬ 
ing trades can be developed. Typical forecast¬ 
ing models for market impact costs are based 
on a statistical factor approach where the in¬ 
dependent variables are trade-based factors or 
asset-based factors. 

VOLUME III 

Model Risk and Selection 

Model risk is the risk of error in pricing or 
risk-forecasting models. In practice, model risk 
arises because (1) any model involves simplification
and calibration, and both of these require
subjective judgments that are prone to error,
and/or (2) a model is used inappropriately.
Although model risk cannot be avoided, there 
are many ways in which financial modelers can 
manage this risk. These include (1) recogniz¬ 
ing model risk, (2) identifying, evaluating, and 
checking the model's key assumptions, (3) selecting
the simplest reasonable model, (4) resisting
the temptation to ignore small discrepancies
in results, (5) testing the model against known 
problems, (6) plotting results and employing 
nonparametric statistics, (7) back-testing and 
stress-testing the model, (8) estimating model 
risk quantitatively, and (9) reevaluating mod¬ 
els periodically. In financial modeling, model 
selection requires a blend of theory, creativity, 
and machine learning. The machine-learning 
approach starts with a set of empirical data that 
the financial modeler wants to explain. Data are 
explained by a family of models that include 
an unbounded number of parameters and are 
able to fit data with arbitrary precision. There 
is a trade-off between model complexity and 
the size of the data sample. To implement this 
trade-off, ensuring that models have forecast¬ 
ing power, the fitting of sample data is con¬ 
strained to avoid fitting noise. Constraints are 
embodied in criteria such as the Akaike infor¬ 
mation criterion or the Bayesian information 
criterion. Economic and financial data are gen¬ 
erally scarce given the complexity of their pat¬ 
terns. This scarcity introduces uncertainty as 
regards statistical estimates obtained by the fi¬ 
nancial modeler. It means that the data might 
be compatible with many different models with 
the same level of statistical confidence. Methods 
of probabilistic decision theory can be used to 
deal with model risk due to uncertainty regard¬ 
ing the model's parameters. Probabilistic deci¬ 
sion making starts from the Bayesian inference 
process and involves computer simulations in 
all realistic situations. Since a risk model is typi¬ 
cally a combination of a probability distribution 
model and a risk measure, a critical assumption
is the probability distribution assumed for
the random variable of interest. Too often, the
Gaussian distribution is the model of choice. 
Empirical evidence supports the use of proba¬ 
bility distributions that exhibit fat tails such as 
the Student's t distribution and its asymmetric 
version and the Pareto stable class of distribu¬ 
tions and their tempered extensions. Extreme 
value theory offers another approach for risk 
modeling. 
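
As a rough illustration of how such criteria penalize complexity, the sketch below computes the Akaike and Bayesian information criteria from a model's maximized log-likelihood, using their standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The two candidate models, their log-likelihoods, and the sample size are hypothetical numbers chosen only for illustration.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits of two models to the same 60 observations.
n_obs = 60
simple_model = {"log_likelihood": -100.0, "n_params": 2}
complex_model = {"log_likelihood": -98.5, "n_params": 6}

for name, fit in (("simple", simple_model), ("complex", complex_model)):
    print(
        name,
        "AIC =", round(aic(fit["log_likelihood"], fit["n_params"]), 2),
        "BIC =", round(bic(fit["log_likelihood"], fit["n_params"], n_obs), 2),
    )
```

In this made-up case the richer model fits slightly better, yet both criteria favor the simpler one because the gain in likelihood does not offset the penalty for four extra parameters.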

Mortgage-Backed Securities Analysis 
and Valuation 

Mortgage-backed securities are fixed-income 
securities backed by a pool of mortgage loans. 
Residential mortgage-backed securities (RMBS) 
are backed by a pool of residential mortgage 
loans (one-to-four family dwellings). The RMBS 
market includes agency RMBS and nonagency 
RMBS. The former are securities issued by 
the Government National Mortgage Associa¬ 
tion (Ginnie Mae), Fannie Mae, and Freddie 
Mac. Agency RMBS include passthrough secu¬ 
rities, collateralized mortgage obligations, and 
stripped mortgage-backed securities (interest- 
only and principal-only securities). The valua¬ 
tion of RMBS is complicated due to prepayment 
risk, a form of call risk. In contrast, nonagency 
RMBS are issued by private entities, have no 
implicit or explicit government guarantee, and 
therefore require one or more forms of credit 
enhancement in order to be assigned a credit 
rating. The analysis of nonagency RMBS must 
take into account both prepayment risk and 
credit risk. The most commonly used method 
for valuing RMBS is the Monte Carlo method, 
although other methods have garnered favor, 
in particular the decomposition method. The 
analysis of RMBS requires an understanding of 
the factors that impact prepayments. 

Operational Risk 

Operational risk has been regarded as a mere 
part of a financial institution's "other" risks. 
However, failures of major financial entities
have made regulators and investors aware of
the importance of this risk. In general terms, 
operational risk is the risk of loss resulting from 
inadequate or failed internal processes, people, 
or systems or from external events. This risk 
encompasses legal risk, which includes, but is
not limited to, exposure to fines, penalties, or 
punitive damages resulting from supervisory 
actions, as well as private settlements. Opera¬ 
tional risk can be classified according to several 
principles: nature of the loss (internally inflicted 
or externally inflicted), direct losses or indirect 
losses, degree of expectancy (expected or unex¬ 
pected), risk type, event type or loss type, and 
by the magnitude (or severity) of loss and the 
frequency of loss. Operational risk can be the 
cause of reputational risk, a risk that can occur 
when the market reaction to an operational loss 
event results in reduction in the market value 
of a financial institution that is greater than the 
amount of the initial loss. The two principal 
approaches in modeling operational loss dis¬ 
tributions are the nonparametric approach and 
the parametric approach. It is important to em¬ 
ploy a model that captures tail events, and for 
this reason in operational risk modeling, dis¬ 
tributions that are characterized as light-tailed 
distributions should be used with caution. The 
models that have been proposed for assessing 
operational risk can be broadly classified into 
top-down models and bottom-up models. Top- 
down models quantify operational risk without 
attempting to identify the events or causes of 
losses. Bottom-up models quantify operational 
risk on a micro level, being based on identified 
internal events. The obstacle hindering the im¬ 
plementation of these models is the scarcity of 
available historical operational loss data. 

Optimization Tools 

Optimization is an area in applied mathematics 
that, most generally, deals with efficient algo¬ 
rithms for finding an optimal solution among 
a set of solutions that satisfy given constraints. 
Mathematical programming, a management
science tool that uses mathematical optimization
models to assist in decision making,
includes linear programming, integer program¬ 
ming, mixed-integer programming, nonlinear 
programming, stochastic programming, and 
goal programming. Unlike other mathematical 
tools that are available to decision makers such 
as statistical models (which tell the decision 
maker what occurred in the past), forecasting 
models (which tell the decision maker what 
might happen in the future), and simulation 
models (which tell the decision maker what 
will happen under different conditions), 
mathematical programming models allow the 
decision maker to identify the "best" solution. 
Markowitz's mean-variance model for port¬ 
folio selection is an example of an application 
of one type of mathematical programming 
(quadratic programming). Traditional opti¬ 
mization modeling assumes that the inputs 
to the algorithms are certain, but there are 
also branches of optimization such as robust 
optimization that study the optimal decision 
under uncertainty about the parameters of the 
problem. Stochastic programming deals with 
both the uncertainty about the parameters and 
a multiperiod decision-making framework. 

Probability Distributions 

In financial models where the outcome of 
interest is a random variable, an assumption 
must be made about the random variable's 
probability distribution. There are two types 
of probability distributions: discrete and 
continuous. Discrete probability distributions 
are needed whenever the random variable is 
to describe a quantity that can assume values 
from a countable set, either finite or infinite. 
A discrete probability distribution (or law) is 
quite intuitive in that it assigns certain values, 
positive probabilities, adding up to one, while 
any other value automatically has zero proba¬ 
bility. Continuous probability distributions are 
needed when the random variable of interest 
can assume any value inside of one or more
intervals of real numbers such as, for example,
any number greater than zero. Asset returns, 
for example, whether measured monthly, 
weekly, daily, or at an even higher frequency 
are commonly modeled as continuous random 
variables. In contrast to discrete probability 
distributions that assign positive probability to 
certain discrete values, continuous probability 
distributions assign zero probability to any sin¬ 
gle real number. Instead, only entire intervals of 
real numbers can have positive probability such 
as, for example, the event that some asset return 
is not negative. For each continuous probabil¬ 
ity distribution, this necessitates the so-called 
probability density, a function that determines 
how the entire probability mass of one is dis¬ 
tributed. The density often serves as the proxy 
for the respective probability distribution. To 
model the behavior of certain financial assets in 
a stochastic environment, a financial modeler 
can usually resort to a variety of theoretical 
distributions. Most commonly, probability dis¬ 
tributions are selected that are analytically well 
known. For example, the normal distribution (a 
continuous distribution)—also called the Gaus¬ 
sian distribution—is often the distribution of 
choice when asset returns are modeled. Or the 
exponential distribution is applied to charac¬ 
terize the randomness of the time between two 
successive defaults of firms in a bond portfolio. 
Many other distributions are related to them or 
built on them in a well-known manner. These 
distributions often display pleasant features 
such as stability under summation—meaning 
that the return of a portfolio of assets whose 
returns follow a certain distribution again 
follows the same distribution. However, one
has to be careful using these distributions since 
their advantage of mathematical tractability 
is often outweighed by the fact that the 
stochastic behavior of the true asset returns 
is not well captured by these distributions. 
For example, although the normal distribution 
generally renders modeling easy because all 
moments of the distribution exist, it fails to 
reflect stylized facts commonly encountered in
asset returns, namely, the possibility of very
extreme movements and skewness. To remedy 
this shortcoming, probability distributions 
accounting for such extreme price changes 
have become increasingly popular. Some of 
these distributions concentrate exclusively on 
the extreme values while others permit any real 
number, but in a way capable of reflecting mar¬ 
ket behavior. Consequently, a financial modeler 
has available a great selection of probability 
distributions to realistically reproduce asset 
price changes. Their common shortcoming is 
generally that they are mathematically difficult 
to handle. 
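
To give a feel for how much the choice of distribution matters in the tails, the following sketch compares the probability of exceeding a threshold under the standard normal distribution with the same probability under a Student's t distribution with two degrees of freedom. The t distribution with two degrees of freedom is an assumption made here only because its tail probability has a simple closed form; it is not a recommendation, and the thresholds are arbitrary.

```python
import math


def normal_upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def student_t2_upper_tail(x: float) -> float:
    """P(T > x) for Student's t with 2 degrees of freedom.

    Uses the closed form F(x) = 1/2 + x / (2 * sqrt(x^2 + 2)) for the CDF.
    """
    return 0.5 - x / (2.0 * math.sqrt(x * x + 2.0))


# Compare tail probabilities at a few arbitrary thresholds (in scale units).
for threshold in (2.0, 3.0, 4.0):
    p_normal = normal_upper_tail(threshold)
    p_t2 = student_t2_upper_tail(threshold)
    print(f"x = {threshold}: normal {p_normal:.2e}, t(2) {p_t2:.2e}")
```

Because the t distribution with two degrees of freedom has no finite variance, the thresholds are expressed in scale units rather than standard deviations; the point of the comparison is only that extreme outcomes that are nearly impossible under the Gaussian assumption carry non-negligible probability under a fat-tailed alternative.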

Risk Measures 

The standard assumption in financial models is 
that the distribution for the return on financial 
assets follows a normal (or Gaussian) distri¬ 
bution and therefore the standard deviation 
(or variance) is an appropriate measure of risk 
in the portfolio selection process. This is the 
risk measure that is used in the well-known 
Markowitz portfolio selection model (that is, 
mean-variance model), which is the foundation 
for modern portfolio theory. Mounting evi¬ 
dence since the early 1960s strongly suggests 
that return distributions do not follow a normal 
distribution, but instead exhibit heavy tails 
and, possibly, skewness. The "tails" of the dis¬ 
tribution are where the extreme values occur, 
and these extreme values are more likely than 
would be predicted by the normal distribution. 
This means that between periods where the 
market exhibits relatively modest changes in 
prices and returns, there will be periods where 
there are changes that are much higher (that 
is, crashes and booms) than predicted by the 
normal distribution. This is of major concern to 
financial modelers in seeking to generate prob¬ 
ability estimates for financial risk assessment. 
To more effectively implement portfolio se¬ 
lection, researchers have proposed alternative 
risk measures. These risk measures fall into 


two disjointed categories: dispersion measures 
and safety-first measures. Dispersion measures 
include mean standard deviation, mean abso¬ 
lute deviation, mean absolute moment, index 
of dissimilarity, mean entropy, and mean colog. 
Safety-first risk measures include classical 
safety first, value-at-risk, average value-at-risk, 
expected tail loss, MiniMax, lower partial 
moment, downside risk, probability-weighted 
function of deviations below a specified target 
return, and power conditional value-at-risk. 
Despite these alternative risk measures, the 
most popular risk measure used in financial 
modeling is volatility as measured by the 
standard deviation. There are different types 
of volatility: historical, implied volatility, 
level-dependent volatility, local volatility, 
and stochastic volatility (e.g., jump-diffusion 
volatility). There are risk measures commonly 
used for bond portfolio management. These 
measures include duration, convexity, key rate 
duration, and spread duration. 
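
Two of the safety-first measures named above, value-at-risk and expected tail loss, can be estimated directly from a sample of returns. The sketch below uses one simple empirical convention (sorting losses and reading off an order statistic); other conventions exist and give slightly different numbers. The function names and the ten sample returns are hypothetical.

```python
def sorted_losses(returns):
    """Convert returns to losses (loss = -return), sorted from smallest to largest."""
    return sorted(-r for r in returns)


def historical_var(returns, alpha=0.95):
    """Empirical value-at-risk: the loss at the alpha order statistic of the sample."""
    losses = sorted_losses(returns)
    index = min(int(alpha * len(losses)), len(losses) - 1)
    return losses[index]


def expected_tail_loss(returns, alpha=0.95):
    """Average of the losses at or beyond the empirical value-at-risk."""
    losses = sorted_losses(returns)
    index = min(int(alpha * len(losses)), len(losses) - 1)
    tail = losses[index:]
    return sum(tail) / len(tail)


# Hypothetical sample of ten periodic returns.
sample_returns = [0.01, -0.02, 0.03, -0.05, 0.02, 0.00, -0.01, 0.04, -0.08, 0.01]
var_80 = historical_var(sample_returns, alpha=0.80)
etl_80 = expected_tail_loss(sample_returns, alpha=0.80)
print("80% VaR:", round(var_80, 4), "80% ETL:", round(etl_80, 4))
```

Value-at-risk reports only the threshold loss, while expected tail loss averages what happens beyond that threshold, which is why the latter is never smaller than the former for the same confidence level.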

Software for Financial Modeling 

The development of financial models requires 
the modeler to be familiar with spreadsheets 
such as Microsoft Excel and/or a platform to 
implement concepts and algorithms such as 
the Palisade Decision Tools Suite and other 
Excel-based software (mostly @RISK, Solver,
VBA), and MATLAB. Financial modelers can
choose one or the other, depending on their 
level of familiarity and comfort with spread¬ 
sheet programs and their add-ins versus pro¬ 
gramming environments such as MATLAB. 
Some tasks and implementations are easier in 
one environment than in the other. MATLAB 
is a modeling environment that allows for in¬ 
put and output processing, statistical analysis, 
simulation, and other types of model build¬ 
ing for the purpose of analysis of a situa¬ 
tion. MATLAB uses a number-array-oriented 
programming language, that is, a programming
language in which vectors and matrices
are the basic data structures. Reliable built-in
functions, a wide range of specialized tool¬ 
boxes, easy interface with widespread software 
like Microsoft Excel, and beautiful graphing ca¬ 
pabilities for data visualization make imple¬ 
mentation with MATLAB efficient and useful 
for the financial modeler. Visual Basic for Appli¬ 
cations (VBA) is a programming language en¬ 
vironment that allows Microsoft Excel users to 
automate tasks, create their own functions, per¬ 
form complex calculations, and interact with 
spreadsheets. VBA shares many of the same 
concepts as object-oriented programming lan¬ 
guages. Despite some important limitations, 
VBA does add useful capabilities to spreadsheet 
modeling, and it is a good tool to know because 
Excel is the platform of choice for many finance 
professionals. 

Stochastic Processes and Tools 

Stochastic integration provides a coherent way 
to represent that instantaneous uncertainty (or 
volatility) cumulates over time. It is thus fun¬ 
damental to the representation of financial pro¬ 
cesses such as interest rates, security prices, or 
cash flows. Stochastic integration operates on 
stochastic processes and produces random vari¬ 
ables or other stochastic processes. Stochastic 
integration is a process defined on each path as 
the limit of a sum. However, these sums are dif¬ 
ferent from the sums of the Riemann-Lebesgue 
integrals because the paths of stochastic pro¬ 
cesses are generally not of bounded variation. 
Stochastic integrals in the sense of Itô are defined
through a process of approximation by
(1) defining Brownian motion, which is the con¬ 
tinuous limit of a random walk, (2) defining 
stochastic integrals for elementary functions as 
the sums of the products of the elementary 
functions multiplied by the increments of the 
Brownian motion, and (3) extending this defi¬ 
nition to any function through approximating 
sequences. The major application of integration
to financial modeling involves stochastic
integrals. An understanding of stochastic integrals
is needed to understand an important
tool in contingent claims valuation: stochastic
differential equations. The dynamics of financial
asset returns and prices can be expressed
using a deterministic process if there is no uncertainty
about their future behavior or, in the
more likely case when the value is uncertain,
with a stochastic process. Stochastic processes in
continuous time are the most widely used tool to
explain the dynamics of financial asset returns
and prices. They are the building blocks to con¬ 
struct financial models for portfolio optimiza¬ 
tion, derivatives pricing, and risk management. 
Continuous-time processes allow for more ele¬ 
gant theoretical modeling compared to discrete 
time models, and many results proven in prob¬ 
ability theory can be applied to obtain a simple 
evaluation method. 
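
The construction of the stochastic integral as a limit of sums can be made concrete with a small simulation. The sketch below approximates Brownian motion by a random walk with Gaussian increments and forms the left-endpoint sum of W(t_i) times the next increment, which is the discrete counterpart of the Itô integral of W with respect to itself. The step count, seed, and function names are illustrative assumptions.

```python
import math
import random


def brownian_increments(n_steps, horizon, seed=0):
    """Independent Gaussian increments with variance dt = horizon / n_steps."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def ito_sum_w_dw(increments):
    """Left-endpoint sum of W(t_i) * (W(t_{i+1}) - W(t_i)); also returns W(T)."""
    w, total = 0.0, 0.0
    for dw in increments:
        total += w * dw  # integrand evaluated before the increment is applied
        w += dw
    return total, w


increments = brownian_increments(n_steps=10_000, horizon=1.0)
ito_sum, w_T = ito_sum_w_dw(increments)
quadratic_variation = sum(dw * dw for dw in increments)

print("Ito sum:              ", ito_sum)
print("(W_T^2 - sum dW^2) / 2:", (w_T ** 2 - quadratic_variation) / 2)
print("sum of squared steps: ", quadratic_variation)
```

The sum equals (W_T^2 minus the sum of squared increments) / 2 as an algebraic identity, and the sum of squared increments settles near the horizon T rather than near zero. That extra term is what separates the Itô integral, (W_T^2 - T) / 2 in the limit, from the W_T^2 / 2 that ordinary calculus would suggest for a path of bounded variation.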


Statistics 

Probability models are theoretical models of 
the occurrence of uncertain events. In contrast, 
statistics is about empirical data and can be 
broadly defined as a set of methods used to 
make inferences from a known sample to a 
larger population that is in general unknown. In 
finance, a particularly important example is making
inferences from the past (the known sample)
to the future (the unknown population). In
statistics, probabilistic models are applied using
data so as to estimate the parameters of
these models. It is not assumed that all parameter
values in the model are known. Instead,
data for the variables in the model are used
to estimate the values of the parameters, and the
estimates are then used to test hypotheses or make
inferences about the parameter values. In financial
modeling, the statistical technique of regression 
models is the workhorse. However, because re¬ 
gression models are part of the field of financial 
econometrics, this topic is covered in that topic 
category. Understanding dependences or functional
links between variables is a key theme in
financial modeling. In general terms, functional
dependencies are represented by dynamic 
models. Many important models are linear 
models whose coefficients are correlation coeffi¬ 
cients. In many instances in financial modeling, 
it is important to arrive at a quantitative mea¬ 
sure of the strength of dependencies. The cor¬ 
relation coefficient provides such a measure. In 
many instances, however, the correlation coef¬ 
ficient might be misleading. In particular, there 
are cases of nonlinear dependencies that result 
in a zero correlation coefficient. From the point 
of view of financial modeling, this situation is 
particularly dangerous as it leads to substan¬ 
tially underestimated risk. Different measures 
of dependence have been proposed, in partic¬ 
ular copula functions. The copula overcomes 
the drawbacks of the correlation as a measure 
of dependency by allowing for a more general 
measure than linear dependence, allowing for 
the modeling of dependence for extreme events, 
and being indifferent to continuously increas¬ 
ing transformations. Another essential tool in 
financial modeling, because it allows the incor¬ 
poration of uncertainty in financial models and 
consideration of additional layers of complex¬ 
ity that are difficult to incorporate in analytical 
models, is Monte Carlo simulation. The main 
idea of Monte Carlo simulation is to represent 
the uncertainty in market variables through sce¬ 
narios, and to evaluate parameters of interest 
that depend on these market variables in com¬ 
plex ways. The advantage of such an approach 
is that it can easily capture the dynamics of un¬ 
derlying processes and the otherwise complex 
effects of interactions among market variables. 
A substantial amount of research in recent years 
has been dedicated to making scenario genera¬ 
tion more accurate and efficient, and a number 
of sophisticated computational techniques are 
now available to the financial modeler. 
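
The warning above about nonlinear dependence with zero correlation is easy to reproduce. In the sketch below, one variable is an exact function of the other (its square), yet the sample correlation coefficient is zero because the relationship is symmetric around the mean. The data are a hypothetical symmetric grid, not market data.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# y is completely determined by x, but the dependence is not linear.
xs = list(range(-10, 11))
ys = [x * x for x in xs]

rho = pearson_correlation(xs, ys)
print("correlation between x and x^2:", round(rho, 12))
```

A modeler who relied only on the correlation coefficient here would conclude that the two variables are unrelated, even though knowing one determines the other exactly. Dependence measures such as copula functions are meant to capture this kind of structure.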

Term Structure Modeling 

The arbitrage-free valuation approach to the 
valuation of option-free bonds, bonds with embedded
options, and option-type derivative instruments
requires that a financial instrument
be viewed as a package of zero-coupon bonds. 
Consequently, in financial modeling, it is essen¬ 
tial to be able to discount each expected cash 
flow by the appropriate interest rate. That rate 
is referred to as the spot rate. The term struc¬ 
ture of interest rates provides the relationship 
between spot rates and maturity. Because of its 
role in valuation of cash bonds and option-type 
derivatives, the estimation of the term struc¬ 
ture of interest rates is of critical importance as 
an input into a financial model. In addition to 
its role in valuation modeling, term structure 
models are fundamental to expressing value, 
risk, and establishing relative value across the 
spectrum of instruments found in the various 
interest-rate or bond markets. The term struc¬ 
ture is most often specified for a specific market 
such as the U.S. Treasury market, the bond mar¬ 
ket for double-A rated financial institutions, 
the interest rate market for LIBOR, and swaps. 
Static models of the term structure are char¬ 
acterizations that are devoted to relationships 
based on a given market and do not serve future 
scenarios where there is uncertainty. Standard 
static models include those known as the spot 
yield curve, discount function, par yield curve, 
and the implied forward curve. Instantiations of 
these models may be found in both a discrete- 
and continuous-time framework. An important 
consideration is establishing how these term 
structure models are constructed and how to 
transform one model into another. In model¬ 
ing the behavior of interest rates, stochastic dif¬ 
ferential equations (SDEs) are commonly used. 
The SDEs used to model interest rates must cap¬ 
ture the market properties of interest rates such 
as mean reversion and/or a volatility that de¬ 
pends on the level of interest rates. For a one- 
factor model, the SDE is used to model the 
behavior of the short-term rate, referred to as 
simply the "short rate." The addition of another 
factor (i.e., a two-factor model) involves extend¬ 
ing the SDE to represent the behavior of the 
short rate and a long-term rate (i.e., long rate). 
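
The idea of viewing an instrument as a package of zero-coupon bonds, each discounted at its own spot rate, can be sketched in a few lines. The example assumes annual cash flows and annually compounded spot rates; the bond and the spot curve are hypothetical.

```python
def present_value(cash_flows, spot_rates):
    """Discount each annual cash flow at the spot rate for its own maturity.

    cash_flows[i] is paid at the end of year i + 1 and spot_rates[i] is the
    annually compounded spot rate for that maturity.
    """
    if len(cash_flows) != len(spot_rates):
        raise ValueError("one spot rate is required per cash flow")
    return sum(
        cf / (1.0 + z) ** t
        for t, (cf, z) in enumerate(zip(cash_flows, spot_rates), start=1)
    )


# Hypothetical 3-year bond: 5% annual coupon, face value 100.
cash_flows = [5.0, 5.0, 105.0]
spot_rates = [0.03, 0.04, 0.05]  # hypothetical upward-sloping spot curve

price = present_value(cash_flows, spot_rates)
print("arbitrage-free value:", round(price, 4))
```

Discounting every cash flow at a single yield would give the same value only if the spot curve were flat; with a sloped curve, each cash flow is effectively priced as its own zero-coupon bond.
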

The entries can serve as material for a wide
spectrum of courses, such as the following: 

• Financial engineering 

• Financial mathematics 

• Financial econometrics 

• Statistics with applications in finance

• Quantitative asset management

• Asset and derivative pricing

• Risk management

Frank J. Fabozzi 
Editor, Encyclopedia of Financial Models 



Guide to the Encyclopedia of 
Financial Models 


The Encyclopedia of Financial Models provides 
comprehensive coverage of the field of finan¬ 
cial modeling. This reference work consists of 
three separate volumes and 127 entries. Each 
entry provides coverage of the selected topic 
intended to inform a broad spectrum of read¬ 
ers ranging from finance professionals to aca¬ 
demicians to students to fiduciaries. To derive 
the greatest possible benefit from the Encyclo¬ 
pedia of Financial Models, we have provided this 
guide. It explains how the information within 
the encyclopedia can be located. 

ORGANIZATION 

The Encyclopedia of Financial Models is organized 
to provide maximum ease of use for its readers. 

Table of Contents 

A complete table of contents for the entire en¬ 
cyclopedia appears in the front of each volume. 
This list of titles represents topics that have been 
carefully selected by the editor, Frank J. Fabozzi. 
The Preface includes a more detailed descrip¬ 
tion of the volumes and the topic categories that 
the entries are grouped under. 

Index 

A Subject Index for the entire encyclopedia is 
located at the end of each volume. The subjects
in the index are listed alphabetically and
indicate the volume and page number where 
information on this topic can be found. 

Entries 

Each entry in the Encyclopedia of Financial Mod¬ 
els begins on a new page, so that the reader may 
quickly locate it. The author's name and affilia¬ 
tion are displayed at the beginning of the entry. 
All entries in the encyclopedia are organized 
according to a standard format, as follows: 

• Title and author 

• Abstract 

• Introduction 

• Body 

• Key points 

• Notes 

• References 

Abstract 

The abstract for each entry gives an overview of 
the topic, but not necessarily the content of the 
entry. This is designed to put the topic in the 
context of the entire Encyclopedia, rather than 
give an overview of the specific entry content. 

Introduction 

The text of each entry begins with an introductory
section that defines the topic under
discussion and summarizes the content. By
reading this section, the reader gets a general 
idea about the content of a specific entry. 

Body 

The body of each entry explains the purpose, 
theory, and math behind each model. 

Key Points 

The key points section provides in bullet point 
format a review of the materials discussed in
each entry. It imparts to the reader the most
important issues and concepts discussed. 

Notes 

The notes provide more detailed information 
and citations of further readings. 

References 

The references section lists the publications 
cited in the entry.


Mean-Variance Model for
Portfolio Selection 


FRANK J. FABOZZI, PhD, CFA, CPA 

Professor of Finance, EDHEC Business School 

HARRY M. MARKOWITZ, PhD 

Consultant 

PETTER N. KOLM, PhD 

Director of the Mathematics in Finance M.S. Program and Clinical Associate Professor, 
Courant Institute of Mathematical Sciences, New York University 

FRANCIS GUPTA, PhD 

Director, Index Research & Design, Dow Jones Indexes 


Abstract: The theory of portfolio selection together with capital asset pricing theory provides the 
foundation and the building blocks for the management of portfolios. The goal of portfolio selec¬ 
tion is the construction of portfolios that maximize expected returns consistent with individually 
acceptable levels of risk. Using both historical data and investor expectations of future returns, 
portfolio selection uses modeling techniques to quantify expected portfolio returns and acceptable 
levels of portfolio risk and provides methods to select an optimal portfolio. 


The theory of portfolio selection presented in 
this entry, often referred to as mean-variance port¬ 
folio analysis or simply mean-variance analysis, 
is a normative theory. A normative theory is one 
that describes a standard or norm of behavior 
that investors should pursue in constructing a 
portfolio rather than a prediction concerning 
actual behavior. 

Asset pricing theory goes on to formalize 
the relationship that should exist between asset
returns and risk if investors behave in a hypothesized
manner. In contrast to a normative
theory, asset pricing theory is a positive
theory—a theory that hypothesizes how in¬ 
vestors behave rather than how investors 
should behave. Based on that hypothesized be¬ 
havior of investors, a model that provides the 
expected return (a key input for constructing 
portfolios based on mean-variance analysis) is 
derived and is called an asset pricing model. 

Together, portfolio selection theory and asset 
pricing theory provide a framework to specify 
and measure investment risk and to develop relationships
between expected asset return and
risk (and hence between risk and required return
on an investment). However, it is critically
important to understand that portfolio selection 
is a theory that is independent of any theories 
about asset pricing. The validity of portfolio se¬ 
lection theory does not rest on the validity of 
asset pricing theory. 

It would not be an overstatement to say that
modern portfolio theory has revolutionized the
world of investment management. Allowing
managers to quantify the investment risk and
expected return of a portfolio has provided the
scientific and objective complement to the
subjective art of investment management. More
importantly, whereas at one time the focus of
portfolio management used to be the risk of
individual assets, the theory of portfolio selection
has shifted the focus to the risk of the entire
portfolio. This theory shows that it is possible
to combine risky assets and produce a portfolio
whose expected return reflects its components,
but with considerably lower risk. In
other words, it is possible to construct a portfolio
whose risk is smaller than the sum of all its
individual parts!

Though practitioners realized that the risks of
individual assets were related, before modern
portfolio theory, they were unable to formalize
how combining these assets into a portfolio
impacted the risk at the entire portfolio level, or
how the addition of a new asset would change
the return-risk characteristics of the portfolio.
This is because practitioners were unable to
quantify the returns and risks of their investments.
Furthermore, in the context of the entire
portfolio, they were also unable to formalize the
interaction of the returns and risks across asset
classes and individual assets. The failure to
quantify these important measures and formalize
these important relationships made the goal
of constructing an optimal portfolio highly subjective
and provided no insight into the return
investors could expect and the risk they were
undertaking. The other drawback before the
advent of the theory of portfolio selection and
asset pricing theory was that there was no measurement
tool available to investors for judging
the performance of their investment managers.


SOME BASIC CONCEPTS 

Portfolio theory draws on concepts from two
fields: financial economic theory and probability
and statistical theory. This section presents
the concepts from financial economic theory
used in portfolio theory. While many of the concepts
presented here have a more technical or
rigorous definition, the purpose is to keep the
explanations simple and intuitive so that the
importance and contribution of these concepts
to the development of modern portfolio theory
can be appreciated.

Utility Function and 
Indifference Curves 

There are many situations where entities (i.e.,
individuals and firms) face two or more choices.
The economic "theory of choice" uses the concept
of a utility function to describe the way
entities make decisions when faced with a set
of choices. A utility function assigns a (numeric)
value to all possible choices faced by the entity.
The higher the value of a particular choice, the
greater the utility derived from that choice. The
choice that is selected is the one that results in
the maximum utility given a set of constraints
faced by the entity.

In portfolio theory too, entities are faced with
a set of choices. Different portfolios have different
levels of expected return and risk. Typically,
the higher the level of expected return, the
larger the risk. Entities are faced with the decision
of choosing a portfolio from the set of all
possible risk-return combinations, where they
like return but dislike risk. Therefore,
entities obtain different levels of utility from
different risk-return combinations. The utility
obtained from any possible risk-return combination
is expressed by the utility function.
Put simply, the utility function expresses the
preferences of entities over perceived risk and
expected return combinations.

Figure 1 Indifference Curves

A utility function can be expressed in graphical
form by a set of indifference curves. Figure 1
shows indifference curves labeled $u_1$, $u_2$,
and $u_3$. By convention, the horizontal axis measures
risk and the vertical axis measures expected
return. Each curve represents a set of
portfolios with different combinations of risk
and return. All the points on a given indifference
curve indicate combinations of risk and
expected return that will give the same level
of utility to a given investor. For example, on
indifference curve $u_1$, there are two points $u$ and
$u'$, with $u$ having a higher expected return than
$u'$ but also having a higher risk. Because the two
points lie on the same indifference curve, the
investor has an equal preference for (or is indifferent
to) the two points, or, for that matter, any
point on the curve. The (positive) slope of an
indifference curve reflects the fact that, to obtain
the same level of utility, the investor requires a
higher expected return in order to accept higher
risk.

For the three indifference curves shown in
Figure 1, the utility the investor receives is
greater the further the indifference curve is from
the horizontal axis because that curve represents
a higher level of return at every level
of risk. Thus, for the three indifference curves
shown in the figure, $u_3$ has the highest utility
and $u_1$ the lowest.

The Set of Efficient Portfolios and 
the Optimal Portfolio 

Portfolios that provide the largest possible
expected return for given levels of risk are
called efficient portfolios. To construct an efficient
portfolio, it is necessary to make some
assumption about how investors behave when
making investment decisions. One reasonable
assumption is that investors are risk averse. A
risk-averse investor is an investor who, when
faced with choosing between two investments
with the same expected return but two different
risks, prefers the one with the lower risk.

In selecting portfolios, an investor seeks to
maximize the expected portfolio return given
his tolerance for risk. (Alternatively stated, an
investor seeks to minimize the risk that he is
exposed to given some target expected return.)
Given a choice from the set of efficient portfolios,
an optimal portfolio is the one that is most
preferred by the investor.

Risky Assets vs. Risk-Free Assets 

A risky asset is one for which the return that 
will be realized in the future is uncertain. For 
example, an investor who purchases the stock 
of Pfizer Corporation today with the intention 
of holding it for some finite time does not know 
what return will be realized at the end of the 
holding period. The return will depend on the 
price of Pfizer's stock at the time of sale and on 
the dividends that the company pays during the 
holding period. Thus, Pfizer stock, and indeed 
the stock of all companies, is a risky asset. 

Securities issued by the U.S. government are
also risky. For example, an investor who purchases
a U.S. government bond that matures in
30 years does not know the return that will be
realized if this bond is held for only one year.
This is because changes in interest rates in that
year will affect the price of the bond one year
from now and that will impact the return on the
bond over that year.

There are assets, however, for which the return
that will be realized in the future is known
with certainty today. Such assets are referred to
as risk-free or riskless assets. The risk-free asset
is commonly defined as a short-term obligation
of the U.S. government. For example, if an investor
buys a U.S. government security that matures
in one year and plans to hold that security
for one year, then there is no uncertainty about
the return that will be realized. The investor
knows that in one year, the maturity date of
the security, the government will pay a specific
amount to retire the debt. Notice how this situation
differs for the U.S. government security
that matures in 30 years. While the 1-year and
the 30-year securities are both obligations of the U.S.
government, the former matures in one year
so that there is no uncertainty about the return
that will be realized. In contrast, while the investor
knows what the government will pay at
the end of 30 years for the 30-year bond, he does
not know what the price of the bond will be one
year from now.

MEASURING A PORTFOLIO'S 
EXPECTED RETURN 

We are now ready to define the actual and expected
return of a risky asset and a portfolio of
risky assets.

Measuring Single-Period 
Portfolio Return 

The actual return on a portfolio of assets over
some specific time period is straightforward to
calculate using the formula:

$$R_p = w_1 R_1 + w_2 R_2 + \cdots + w_G R_G \qquad (1)$$

where

$R_p$ = rate of return on the portfolio over the period
$R_g$ = rate of return on asset $g$ over the period
$w_g$ = weight of asset $g$ in the portfolio (i.e., market
value of asset $g$ as a proportion of the
market value of the total portfolio) at the
beginning of the period
$G$ = number of assets in the portfolio

In shorthand notation, equation (1) can be expressed
as follows:

$$R_p = \sum_{g=1}^{G} w_g R_g \qquad (2)$$

Equation (2) states that the return on a portfolio
($R_p$) of $G$ assets is equal to the sum over all
individual assets' weights in the portfolio times
their respective return. The portfolio return $R_p$
is sometimes called the holding period return
or the ex post return.

For example, consider the following portfolio
consisting of three assets:

Asset   Market Value at the Beginning   Rate of Return over
        of Holding Period               Holding Period
1       $6 million                      12%
2       $8 million                      10%
3       $11 million                     5%


The portfolio's total market value at the beginning
of the holding period is $25 million.
Therefore,

$w_1$ = $6 million/$25 million = 0.24, or 24%, and $R_1$ = 12%
$w_2$ = $8 million/$25 million = 0.32, or 32%, and $R_2$ = 10%
$w_3$ = $11 million/$25 million = 0.44, or 44%, and $R_3$ = 5%

Notice that the sum of the weights is equal to 1.
Substituting into equation (1), we get the holding
period portfolio return,

$$R_p = 0.24(12\%) + 0.32(10\%) + 0.44(5\%) = 8.28\%$$
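
As a minimal sketch of equation (1) applied to the three-asset example above, the Python snippet below derives the weights from the beginning-of-period market values and then forms the weighted sum of returns. The function name `portfolio_return` and the variable names are illustrative assumptions, not part of the original text.

```python
# Market values (in $ millions) and holding-period returns from the example.
market_values = [6.0, 8.0, 11.0]
returns = [0.12, 0.10, 0.05]


def portfolio_return(market_values, returns):
    """Return (weights, R_p), with weights taken as beginning-of-period value shares."""
    total = sum(market_values)
    weights = [mv / total for mv in market_values]
    r_p = sum(w * r for w, r in zip(weights, returns))
    return weights, r_p


weights, r_p = portfolio_return(market_values, returns)
print([round(w, 2) for w in weights])  # [0.24, 0.32, 0.44]
print(round(r_p, 4))                   # 0.0828, i.e., 8.28%
```

Computing the weights inside the function, rather than passing them in, keeps them tied to the same market values and guarantees that they sum to 1.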

The Expected Return of a Portfolio 
of Risky Assets 

Equation (1) shows how to calculate the actual
return of a portfolio over some specific time period.
In portfolio management, the investor also
wants to know the expected (or anticipated) return
from a portfolio of risky assets. The expected
portfolio return is the weighted average
of the expected return of each asset in the portfolio.
The weight assigned to the expected return
of each asset is the percentage of the market
value of the asset to the total market value of
the portfolio. That is,

$$E(R_p) = w_1 E(R_1) + w_2 E(R_2) + \cdots + w_G E(R_G) \qquad (3)$$

The $E(\cdot)$ signifies expectations, and $E(R_p)$ is
sometimes called the ex ante return, or the expected
portfolio return over some specific time
period.

The expected return, $E(R_i)$, on a risky asset $i$
is calculated as follows. First, a probability distribution
for the possible rates of return that
can be realized must be specified. A probability
distribution is a function that assigns a probability
of occurrence to all possible outcomes for
a random variable. Given the probability distribution,
the expected value of a random variable
is simply the weighted average of the possible
outcomes, where the weight is the probability
associated with the possible outcome.

In our case, the random variable is the uncertain
return of asset $i$. Having specified a
probability distribution for the possible rates of
return, the expected value of the rate of return
for asset $i$ is the weighted average of the possible
outcomes. Finally, rather than use the term
"expected value of the return of an asset," we
simply use the term "expected return." Mathematically,
the expected return of asset $i$ is expressed
as

$$E(R_i) = p_1 r_1 + p_2 r_2 + \cdots + p_N r_N \qquad (4)$$

where

$r_n$ = the $n$th possible rate of return for asset $i$
$p_n$ = the probability of attaining the rate of return
$r_n$ for asset $i$
$N$ = the number of possible outcomes for the
rate of return

How do we specify the probability distribution
of returns for an asset? We shall see later
on in this entry that in most cases the probability
distribution of returns is based on long-run
historical returns. If there is no reason to believe
that future long-run returns should differ
significantly from historical long-run returns,
then probabilities assigned to different return
outcomes based on the historical long-run performance
of an uncertain investment could be a
reasonable estimate for the probability distribution.
However, for the purpose of illustration,
assume that an investor is considering an investment,
stock XYZ, which has a probability
distribution for the rate of return for some time
period as given in Table 1. The stock has five
possible rates of return and the probability distribution
specifies the likelihood of occurrence
(in a probabilistic sense) of each of the possible
outcomes.

Table 1 Probability Distribution for the Rate of
Return for Stock XYZ

n       Rate of Return   Probability of Occurrence
1       12%              0.18
2       10%              0.24
3       8%               0.29
4       4%               0.16
5       -4%              0.13
Total                    1.00

Substituting into equation (4) we get

$$E(R_{XYZ}) = 0.18(12\%) + 0.24(10\%) + 0.29(8\%) + 0.16(4\%) + 0.13(-4\%) = 7\%$$

Thus, 7% is the expected return or mean of the
probability distribution for the rate of return on
stock XYZ.
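
The probability-weighted average in equation (4) can be sketched in a few lines of Python using the Table 1 data for stock XYZ. The helper name `expected_return` and the check that the probabilities sum to 1 are illustrative assumptions added here.

```python
# Table 1: possible rates of return for stock XYZ and their probabilities.
outcomes = [0.12, 0.10, 0.08, 0.04, -0.04]
probs = [0.18, 0.24, 0.29, 0.16, 0.13]


def expected_return(outcomes, probs):
    """Probability-weighted average of the possible rates of return."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * r for p, r in zip(probs, outcomes))


e_xyz = expected_return(outcomes, probs)
print(round(e_xyz, 4))  # 0.07, i.e., 7%
```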

MEASURING PORTFOLIO 
RISK 

Investors have used a variety of definitions to
describe risk. Markowitz (1952, 1959) quantified
the concept of risk using two well-known
statistical measures: the standard deviation and
the variance. The former is the more intuitive concept.
For a normal distribution (and approximately for
many other bell-shaped distributions), about
95% of the outcomes fall in the range defined
by two standard deviations above and below
the mean. Variance is defined as the square of
the standard deviation. Computations are simplest
in terms of variance. Therefore, it is convenient
to compute the variance of a portfolio
and then take its square root to obtain the standard
deviation.


Variance and Standard Deviation as 
a Measure of Risk 

The variance of a random variable is a measure 
of the dispersion or variability of the possible 
outcomes around the expected value (mean). 
In the case of an asset's return, the variance is 
a measure of the dispersion of the possible rate 
of return outcomes around the expected return. 

The equation for the variance of the
return for asset $i$, denoted $\text{var}(R_i)$, is

$$\text{var}(R_i) = p_1[r_1 - E(R_i)]^2 + p_2[r_2 - E(R_i)]^2 + \cdots + p_N[r_N - E(R_i)]^2$$

or

$$\text{var}(R_i) = \sum_{n=1}^{N} p_n[r_n - E(R_i)]^2 \qquad (5)$$

Using the probability distribution of the return
for stock XYZ, we can illustrate the calculation
of the variance:

$$\text{var}(R_{XYZ}) = 0.18(12\% - 7\%)^2 + 0.24(10\% - 7\%)^2 + 0.29(8\% - 7\%)^2 + 0.16(4\% - 7\%)^2 + 0.13(-4\% - 7\%)^2 = 24.1\%$$

The variance associated with a distribution
of returns measures the tightness with which
the distribution is clustered around the mean
or expected return. Markowitz argued that this
tightness or variance is equivalent to the uncertainty
or riskiness of the investment. If an asset
is riskless, it has an expected return dispersion
of zero. In other words, the return (which is
also the expected return in this case) is certain,
or guaranteed.


Since the variance is expressed in squared units, it is
common to convert it to the standard deviation
by taking the positive square root, as noted
earlier in this section:

$$SD(R_i) = \sqrt{\text{var}(R_i)}$$

For stock XYZ, then, the standard deviation is

$$SD(R_{XYZ}) = \sqrt{24.1\%} = 4.9\%$$
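
The following Python sketch reproduces the variance and standard deviation of stock XYZ from equation (5). It assumes returns are entered in percentage points (12 means 12%), so the variance comes out in squared percentage points; this is the figure the text reports as 24.1%. The function names are illustrative.

```python
import math

# Table 1 for stock XYZ, in percentage points (12.0 means 12%).
outcomes = [12.0, 10.0, 8.0, 4.0, -4.0]
probs = [0.18, 0.24, 0.29, 0.16, 0.13]


def mean(outcomes, probs):
    """Expected value: probability-weighted average of the outcomes."""
    return sum(p * r for p, r in zip(probs, outcomes))


def variance(outcomes, probs):
    """Probability-weighted squared deviation from the mean, equation (5)."""
    mu = mean(outcomes, probs)
    return sum(p * (r - mu) ** 2 for p, r in zip(probs, outcomes))


var_xyz = variance(outcomes, probs)
sd_xyz = math.sqrt(var_xyz)
print(round(var_xyz, 2))  # 24.12 (squared percentage points)
print(round(sd_xyz, 2))   # 4.91 (percentage points)
```

Had the returns been entered in decimal form (0.12 for 12%), the variance would be 0.002412 and the standard deviation 0.0491; only the units change, not the risk ranking of assets.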

The variance and standard deviation are conceptually
equivalent; that is, the larger the variance
or standard deviation, the greater the
investment risk. (A criticism of the variance or
standard deviation as a measure is discussed
later in this entry.)

Measuring the Portfolio Risk of a Two-Asset 
Portfolio 

Equation (5) gives the variance for an individual
asset's return. The variance of a portfolio
consisting of two assets is a little more difficult
to calculate. It depends not only on the variance
of the two assets, but also upon how closely the
returns of one asset track those of the other asset.
The formula is

$$\text{var}(R_p) = w_i^2\,\text{var}(R_i) + w_j^2\,\text{var}(R_j) + 2 w_i w_j\,\text{cov}(R_i, R_j) \qquad (6)$$

where

$\text{cov}(R_i, R_j)$ = covariance between the returns
for assets $i$ and $j$

In words, equation (6) states that the variance
of the portfolio return is the sum of the variances
of the two assets, each weighted by the square of
its portfolio weight, plus two times the covariance
between the two assets weighted by the product
of the two weights. We will see that this equation can be
generalized to the case where there are more
than two assets in the portfolio.

Covariance 

The covariance has a precise mathematical
translation. Its practical meaning is the degree
to which the returns of two assets covary
or change together. The covariance is not
expressed in a particular unit, such as dollars
or percent. A positive covariance means the
returns on two assets tend to move or change in
the same direction, while a negative covariance
means the returns tend to move in opposite directions.
The covariance between any two assets
$i$ and $j$ is computed using the following formula:

$$\text{cov}(R_i, R_j) = p_1[r_{i1} - E(R_i)][r_{j1} - E(R_j)] + p_2[r_{i2} - E(R_i)][r_{j2} - E(R_j)] + \cdots + p_N[r_{iN} - E(R_i)][r_{jN} - E(R_j)] \qquad (7)$$

where

$r_{in}$ = the $n$th possible rate of return for asset $i$
$r_{jn}$ = the $n$th possible rate of return for asset $j$
$p_n$ = the probability of attaining the rate of return
$r_{in}$ and $r_{jn}$ for assets $i$ and $j$
$N$ = the number of possible outcomes for the
rate of return

Table 2 Probability Distribution for the Rate of
Return for Asset XYZ and Asset ABC

n                    Rate of Return   Rate of Return   Probability of
                     for Asset XYZ    for Asset ABC    Occurrence
1                    12%              21%              0.18
2                    10%              14%              0.24
3                    8%               9%               0.29
4                    4%               4%               0.16
5                    -4%              -3%              0.13
Total                                                  1.00
Expected return      7.0%             10.0%
Variance             24.1%            53.6%
Standard deviation   4.9%             7.3%

To illustrate the calculation of the covariance
between two assets, we use the two stocks in
Table 2. The first is stock XYZ from Table 1
that we used earlier to illustrate the calculation
of the expected return and the standard
deviation. The other hypothetical stock is stock
ABC, whose data are shown in Table 2. Substituting
the data for the two stocks from Table 2
in equation (7), the covariance between stocks
XYZ and ABC is calculated as follows:

$$\text{cov}(R_{XYZ}, R_{ABC}) = 0.18(12\% - 7\%)(21\% - 10\%) + 0.24(10\% - 7\%)(14\% - 10\%) + 0.29(8\% - 7\%)(9\% - 10\%) + 0.16(4\% - 7\%)(4\% - 10\%) + 0.13(-4\% - 7\%)(-3\% - 10\%) = 0.3396\%$$
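
A short Python sketch of equation (7) using the Table 2 data is given below. It assumes returns are entered in decimal form, in which case the result 0.003396 corresponds to the 0.3396% reported above. The function names `mean` and `covariance` are illustrative.

```python
# Table 2: joint outcomes for stocks XYZ and ABC, in decimal form.
xyz = [0.12, 0.10, 0.08, 0.04, -0.04]
abc = [0.21, 0.14, 0.09, 0.04, -0.03]
probs = [0.18, 0.24, 0.29, 0.16, 0.13]


def mean(values, probs):
    """Expected value: probability-weighted average."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(x, y, probs):
    """Probability-weighted product of deviations from the means, equation (7)."""
    mx, my = mean(x, probs), mean(y, probs)
    return sum(p * (a - mx) * (b - my) for p, a, b in zip(probs, x, y))


cov_xyz_abc = covariance(xyz, abc, probs)
print(round(cov_xyz_abc, 6))  # 0.003396, i.e., 0.3396%
```

The covariance of an asset with itself is its variance, so `covariance(xyz, xyz, probs)` gives the decimal-form variance of stock XYZ.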


Relationship between Covariance 
and Correlation 

The correlation is related to the covariance
between the returns for two assets.
Specifically, the correlation between the returns
for assets $i$ and $j$ is defined as the covariance of
the two assets divided by the product of their
standard deviations:

$$\text{cor}(R_i, R_j) = \text{cov}(R_i, R_j)/[SD(R_i)\,SD(R_j)] \qquad (8)$$

Because the
correlation is a standardized number (i.e., it has
been corrected for differences in the standard
deviation of the returns), the correlation is comparable
across different assets. The correlation
between the returns for stock XYZ and stock
ABC is

$$\text{cor}(R_{XYZ}, R_{ABC}) = 0.3396\%/(4.9\% \times 7.3\%) \approx 0.95$$

The correlation coefficient can have values
ranging from +1.0, denoting perfect comovement
in the same direction, to -1.0, denoting
perfect comovement in the opposite direction.
Also note that because the standard deviations
are always positive, the correlation can only be
negative if the covariance is a negative number.
A correlation of zero implies that the returns are
uncorrelated.
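
As a rough check on equation (8), the Python sketch below computes the correlation between stocks XYZ and ABC from the Table 2 data. Using unrounded standard deviations gives roughly 0.944, while plugging in the rounded values 4.9% and 7.3%, as the text does, gives roughly 0.949, which rounds to 0.95. All function names are illustrative.

```python
import math

xyz = [0.12, 0.10, 0.08, 0.04, -0.04]
abc = [0.21, 0.14, 0.09, 0.04, -0.03]
probs = [0.18, 0.24, 0.29, 0.16, 0.13]


def mean(values, probs):
    return sum(p * v for p, v in zip(probs, values))


def covariance(x, y, probs):
    mx, my = mean(x, probs), mean(y, probs)
    return sum(p * (a - mx) * (b - my) for p, a, b in zip(probs, x, y))


def correlation(x, y, probs):
    """Covariance divided by the product of the standard deviations, equation (8)."""
    sd_x = math.sqrt(covariance(x, x, probs))
    sd_y = math.sqrt(covariance(y, y, probs))
    return covariance(x, y, probs) / (sd_x * sd_y)


cor_exact = correlation(xyz, abc, probs)
# Same ratio using the rounded standard deviations quoted in the text.
cor_rounded = covariance(xyz, abc, probs) / (0.049 * 0.073)
print(round(cor_exact, 3), round(cor_rounded, 3))
```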

Measuring the Risk of a Portfolio 
Consisting of More than Two Assets 

So far we have defined the risk of a portfolio
consisting of two assets. The extension to three
assets ($i$, $j$, and $k$) is as follows:

$$\text{var}(R_p) = w_i^2\,\text{var}(R_i) + w_j^2\,\text{var}(R_j) + w_k^2\,\text{var}(R_k) + 2 w_i w_j\,\text{cov}(R_i, R_j) + 2 w_i w_k\,\text{cov}(R_i, R_k) + 2 w_j w_k\,\text{cov}(R_j, R_k) \qquad (9)$$

In words, equation (9) states that the variance
of the portfolio return is the sum of the variances
of the individual assets, each weighted by the
square of its portfolio weight, plus
two times the sum of the pairwise
covariances of the assets, each weighted by the
product of the two corresponding weights. In general, for a portfolio
with $G$ assets, the portfolio variance is
given by

$$\text{var}(R_p) = \sum_{g=1}^{G} w_g^2\,\text{var}(R_g) + \sum_{g=1}^{G} \sum_{\substack{h=1 \\ h \neq g}}^{G} w_g w_h\,\text{cov}(R_g, R_h) \qquad (10)$$
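
Equation (10) can be sketched in plain Python as two sums over a covariance matrix. In the example below, the covariance matrix holds the decimal-form variances and covariance of stocks XYZ and ABC from Table 2, while the equal 50% weights are a hypothetical choice made only for illustration; the function name `portfolio_variance` is likewise an assumption.

```python
def portfolio_variance(weights, cov_matrix):
    """Portfolio variance per equation (10).

    cov_matrix[g][h] is cov(R_g, R_h); the diagonal holds the variances.
    """
    n = len(weights)
    # First sum: squared weights times own variances.
    own = sum(weights[g] ** 2 * cov_matrix[g][g] for g in range(n))
    # Double sum over h != g: each pair appears twice, matching the
    # factor of 2 in equations (6) and (9).
    cross = sum(
        weights[g] * weights[h] * cov_matrix[g][h]
        for g in range(n)
        for h in range(n)
        if h != g
    )
    return own + cross


# Variances and covariance of XYZ and ABC in decimal form (from Table 2).
cov_matrix = [
    [0.002412, 0.003396],
    [0.003396, 0.005364],
]
weights = [0.5, 0.5]  # hypothetical equal weighting

var_p = portfolio_variance(weights, cov_matrix)
print(round(var_p, 6))  # 0.003642
```

Because the two stocks are highly correlated, the portfolio variance here lies between the two individual variances rather than well below them.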


PORTFOLIO
DIVERSIFICATION

Often, one hears investors talking about diversifying
their portfolio. By this an investor means
constructing a portfolio in such a way as to reduce
portfolio risk without sacrificing return.
This is certainly a goal that investors should
seek. However, the question is how to do this
in practice.

Some investors would say that including assets
across all asset classes could diversify a
portfolio. For example, an investor might argue
that a portfolio should be diversified by investing
in stocks, bonds, and real estate. While that
might be reasonable, two questions must be
addressed in order to construct a diversified
portfolio. First, how much should be invested
in each asset class? Should 40% of the portfolio
be in stocks, 50% in bonds, and 10% in
real estate, or is some other allocation more appropriate?
Second, given the allocation, which
specific stocks, bonds, and real estate should the
investor select?

Some investors who focus only on one asset
class such as common stock argue that such
portfolios should also be diversified. By this
they mean that an investor should not place
all funds in the stock of one corporation, but
rather should include stocks of many corporations.
Here, too, several questions must be
answered in order to construct a diversified
portfolio. First, which corporations should be
represented in the portfolio? Second, how much
of the portfolio should be allocated to the stocks
of each corporation?

Prior to the development of portfolio theory,
while investors often talked about diversification
in these general terms, they did not possess
the analytical tools by which to answer the
questions posed above. For example,
Leavens (1945, p. 473) wrote:

An examination of some fifty books and articles 
on investment that have appeared during the last 
quarter of a century shows that most of them refer 
to the desirability of diversification. The majority, 
however, discuss it in general terms and do not 
clearly indicate why it is desirable. 

Leavens illustrated the benefits of diversification
on the assumption that risks are independent.
However, in the last paragraph of his
article, he cautioned:

The assumption, mentioned earlier, that each security
is acted upon by independent causes, is important,
although it cannot always be fully met in
practice. Diversification among companies in one
industry cannot protect against unfavorable factors
that may affect the whole industry; additional
diversification among industries is needed for that
purpose. Nor can diversification among industries
protect against cyclical factors that may depress all
industries at the same time.

A major contribution of the theory of portfolio
selection is that using the concepts discussed
above, a quantitative measure of the diversification
of a portfolio is possible, and it is this
measure that can be used to achieve the maximum
diversification benefits.

The Markowitz diversification strategy is primarily
concerned with the degree of covariance
between asset returns in a portfolio. Indeed, a
key contribution of Markowitz diversification
is the formulation of an asset's risk in terms
of a portfolio of assets, rather than in isolation.
Markowitz diversification seeks to combine assets
in a portfolio with returns that are less than
perfectly positively correlated, in an effort to
lower portfolio risk (variance) without sacrificing
return. It is the concern for maintaining return
while lowering risk through an analysis of
the covariance between asset returns that separates
Markowitz diversification from a naive
approach to diversification and makes it more
effective.

Markowitz diversification and the importance
of asset correlations can be illustrated
with a simple two-asset portfolio example. To
do this, we first show the general relationship
between the risk of a two-asset portfolio and the
correlation of returns of the component assets.
Then we look at the effects on portfolio risk of
combining assets with different correlations.

Portfolio Risk and Correlation 

In our two-asset portfolio, assume that asset C
and asset D are available with expected returns
and standard deviations as shown:

Asset     E(R)   SD(R)
Asset C   12%    30%
Asset D   18%    40%


If an equal 50% weighting is assigned to both
stocks C and D, the expected portfolio return
can be calculated as shown:

$$E(R_p) = 0.50(12\%) + 0.50(18\%) = 15\%$$

The variance of the return on the two-stock
portfolio from equation (6), using decimal form
rather than percentage form for the standard
deviation inputs, is

$$\text{var}(R_p) = w_C^2\,\text{var}(R_C) + w_D^2\,\text{var}(R_D) + 2 w_C w_D\,\text{cov}(R_C, R_D)$$
$$= (0.5)^2(0.30)^2 + (0.5)^2(0.40)^2 + 2(0.5)(0.5)\,\text{cov}(R_C, R_D)$$

From equation (8),

$$\text{cor}(R_C, R_D) = \text{cov}(R_C, R_D)/[SD(R_C)\,SD(R_D)]$$

so

$$\text{cov}(R_C, R_D) = SD(R_C)\,SD(R_D)\,\text{cor}(R_C, R_D)$$

Since $SD(R_C) = 0.30$ and $SD(R_D) = 0.40$, then

$$\text{cov}(R_C, R_D) = (0.30)(0.40)\,\text{cor}(R_C, R_D)$$

Substituting into the expression for $\text{var}(R_p)$,
we get

$$\text{var}(R_p) = (0.5)^2(0.30)^2 + (0.5)^2(0.40)^2 + 2(0.5)(0.5)(0.30)(0.40)\,\text{cor}(R_C, R_D)$$

Taking the square root of the variance gives

$$SD(R_p) = \sqrt{(0.5)^2(0.30)^2 + (0.5)^2(0.40)^2 + 2(0.5)(0.5)(0.30)(0.40)\,\text{cor}(R_C, R_D)}$$
$$= \sqrt{0.0625 + (0.06)\,\text{cor}(R_C, R_D)} \qquad (11)$$

The Effect of the Correlation of 
Asset Returns on Portfolio Risk 

How would the risk change for our two-asset
portfolio with different correlations between
the returns of the component stocks? Let's consider
the following three cases for $\text{cor}(R_C, R_D)$:
+1.0, 0, and -1.0. Substituting into equation (11)
for these three cases of $\text{cor}(R_C, R_D)$, we get:

cor(R_C, R_D)   E(R_p)   SD(R_p)
+1.0            15%      35%
 0.0            15%      25%
-1.0            15%      5%



As the correlation between the returns
on stocks C and D decreases from +1.0
to 0.0 to -1.0, the standard deviation of the
portfolio return also decreases, from 35%
to 25% to 5%. However, the expected portfolio return
remains 15% for each case.
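
The three cases can be reproduced with a small Python function for the standard deviation of a two-asset portfolio, built from equations (6) and (8). The function name `two_asset_sd` and its parameter names are illustrative assumptions; the inputs are the values for assets C and D given above.

```python
import math


def two_asset_sd(w_c, sd_c, sd_d, cor):
    """Standard deviation of a two-asset portfolio with weight w_c in asset C."""
    w_d = 1.0 - w_c
    var = (
        (w_c * sd_c) ** 2
        + (w_d * sd_d) ** 2
        + 2.0 * w_c * w_d * sd_c * sd_d * cor
    )
    # Guard against a tiny negative value from floating-point rounding.
    return math.sqrt(max(var, 0.0))


# Equal weights in C (SD 30%) and D (SD 40%) under the three correlations.
results = {cor: two_asset_sd(0.5, 0.30, 0.40, cor) for cor in (1.0, 0.0, -1.0)}
for cor, sd in results.items():
    print(f"cor = {cor:+.1f}  SD(R_p) = {sd:.0%}")
```

With perfect positive correlation the portfolio standard deviation is simply the weighted average of 30% and 40%; any lower correlation pulls it below that average.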

This example clearly illustrates the effect
of Markowitz diversification. The principle of
Markowitz diversification states that as the correlation
(covariance) between the returns for assets
that are combined in a portfolio decreases,
so does the variance (hence the standard deviation)
of the return for the portfolio.

The good news is that investors can maintain
expected portfolio return and lower portfolio
risk by combining assets with lower (and
preferably negative) correlations. However, the
bad news is that very few assets have small
to negative correlations with other assets! The
problem, then, becomes one of searching among
large numbers of assets in an effort to discover
the portfolio with the minimum risk at a given
level of expected return or, equivalently, the
highest expected return at a given level of risk.

The stage is now set for a discussion of efficient
portfolios and their construction.

Table 3 Portfolio Expected Returns and Standard Deviations for Five Mixes of Assets C and D
Asset C: E(R_C) = 12%, SD(R_C) = 30%
Asset D: E(R_D) = 18%, SD(R_D) = 40%
Correlation between Assets C and D = cor(R_C, R_D) = -0.5

Portfolio   Proportion of Asset C   Proportion of Asset D   E(R_p)   SD(R_p)
1           100%                    0%                      12.0%    30.0%
2           75%                     25%                     13.5%    19.5%
3           50%                     50%                     15.0%    18.0%
4           25%                     75%                     16.5%    27.0%
5           0%                      100%                    18.0%    40.0%


CHOOSING A PORTFOLIO OF 
RISKY ASSETS 

Diversification in the manner suggested by
Markowitz leads to the construction of portfolios
that have the highest expected return for
a given level of risk. Such portfolios are called
efficient portfolios.

Constructing Efficient Portfolios 

The technique of constructing efficient portfolios
from large groups of stocks requires a massive
number of calculations. In a portfolio of
$G$ securities, there are $(G^2 - G)/2$ unique covariances
to estimate. Hence, for a portfolio of
just 50 securities, there are 1,225 covariances
that must be calculated. For 100 securities, there
are 4,950. Furthermore, in order to solve for the
portfolio that minimizes risk for each level of return,
a mathematical technique called quadratic
programming must be used. A discussion of
this technique is beyond the scope of this entry.
However, it is possible to illustrate the general
idea of the construction of efficient portfolios by
referring again to the simple two-asset portfolio
consisting of assets C and D.
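
The count of unique covariances can be checked with a one-line Python function; the name `n_unique_covariances` is an illustrative assumption. The count is cross-checked against a direct enumeration of unordered pairs.

```python
from itertools import combinations


def n_unique_covariances(g):
    """Number of distinct asset pairs among g securities: (g^2 - g) / 2."""
    return (g * g - g) // 2


for g in (50, 100):
    # Each unordered pair of securities needs one covariance estimate.
    pairs = sum(1 for _ in combinations(range(g), 2))
    print(g, n_unique_covariances(g), pairs)
# 50 1225 1225
# 100 4950 4950
```

The count grows roughly with the square of the number of securities, which is why the estimation burden rises so quickly for large universes.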

Recall that for two assets, C and D, $E(R_C)$ =
12%, $SD(R_C)$ = 30%, $E(R_D)$ = 18%, and $SD(R_D)$
= 40%. We now further assume that $\text{cor}(R_C, R_D)$
= -0.5. Table 3 presents the expected portfolio
return and standard deviation for five different
portfolios made up of varying proportions of C
and D.
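
The Table 3 figures can be reproduced with the Python sketch below, which applies the weighted-average formula for expected return and equations (6) and (8) for risk to each of the five mixes. The function name `mix` and the constant names are illustrative assumptions.

```python
import math

# Inputs for assets C and D (decimal form) and their assumed correlation.
E_C, SD_C = 0.12, 0.30
E_D, SD_D = 0.18, 0.40
COR_CD = -0.5


def mix(w_c):
    """Expected return and standard deviation for weight w_c in C, 1 - w_c in D."""
    w_d = 1.0 - w_c
    exp_ret = w_c * E_C + w_d * E_D
    var = (
        (w_c * SD_C) ** 2
        + (w_d * SD_D) ** 2
        + 2.0 * w_c * w_d * SD_C * SD_D * COR_CD
    )
    return exp_ret, math.sqrt(var)


rows = [mix(w_c) for w_c in (1.0, 0.75, 0.50, 0.25, 0.0)]
for n, (exp_ret, sd) in enumerate(rows, start=1):
    print(n, f"{exp_ret:.1%}", f"{sd:.1%}")
```

Passing a finer grid of weights to `mix` traces out the whole curve of attainable risk-return combinations rather than just the five tabulated points.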

Feasible and Efficient Portfolios 

A feasible portfolio is any portfolio that an investor
can construct given the assets available.
The five portfolios presented in Table 3 are all
feasible portfolios. The collection of all feasible
portfolios is called the feasible set of portfolios.
With only two assets, the feasible set of
portfolios is graphed as a curve, which represents
those combinations of risk and expected
return that are attainable by constructing portfolios
from all possible combinations of the two
assets.

Figure 2 presents the feasible set of portfolios
for all combinations of assets C and D. As
mentioned earlier, the portfolio mixes listed in
Table 3 belong to this set and are shown by
the points 1 through 5, respectively. Starting
from 1 and proceeding to 5, asset C goes from
100% to 0%, while asset D goes from 0% to
100%. Therefore, all possible combinations of
C and D lie between portfolios 1 and 5, or on
the curve labeled 1-5. In the case of two assets,
any risk-return combination not lying on this
curve is not attainable since there is no mix of
assets C and D that will result in that risk-return
combination. Consequently, the curve 1-5 can
also be thought of as the feasible set.

Figure 2 Feasible and Efficient Portfolios for
Assets C and D
[Plot of expected return E(R_p) against SD(R_p) on the horizontal axis.
Feasible set represented by curve 1-5; Markowitz efficient set: portion of curve 3-5.]

In contrast to a feasible portfolio, an efficient 
portfolio is one that gives the highest expected 
return of all feasible portfolios with the same 
risk. An efficient portfolio is also said to be a 
mean-variance efficient portfolio. Thus, for each 
level of risk there is an efficient portfolio. The 
collection of all efficient portfolios is called the 
efficient set. 

Figure 3 Feasible and Efficient Portfolios with
More Than Two Assets
Note: The picture is for illustrative purposes only. The
actual shape of the feasible region depends on the
returns and risks of the assets chosen and the
correlation among them.

The efficient set for the feasible set presented
in Figure 2 is differentiated by the bold curve
section 3-5. Efficient portfolios are the combinations
of assets C and D that result in the
risk-return combinations on the bold section
of the curve. These portfolios offer the highest
expected return at a given level of risk. Notice
that two of our five portfolio mixes (portfolio
1 with E(Rp) = 12% and SD(Rp) = 30%, and
portfolio 2 with E(Rp) = 13.5% and SD(Rp) =
19.5%) are not included in the efficient set. This
is because there is at least one portfolio in the
efficient set (for example, portfolio 3) that has
a higher expected return and lower risk than
both of them. We can also see that portfolio 4
has a higher expected return and lower risk than
portfolio 1. In fact, the whole curve section 1-3
is not efficient. For any portfolio that results in a
risk-return combination on this curve section
(excluding portfolio 3), there exists a portfolio on
the curve section 3-5 that dominates it by having
the same return and lower risk, or the same risk
and a higher return, or a lower risk and a higher
return. For example, portfolio 4 dominates
portfolio 1, and portfolio 3 dominates both
portfolios 1 and 2.
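
The dominance comparison among the five mixes can be checked with a short calculation. The sketch below is only illustrative: the inputs for assets C and D (expected returns of 12% and 18%, standard deviations of 30% and 40%, correlation of -0.5) are assumptions that are not stated in this section; they were chosen because they reproduce the figures quoted above for portfolios 2 and 4.

```python
import math

# Assumed inputs (not stated in this section), in percent.
E_C, SD_C = 12.0, 30.0
E_D, SD_D = 18.0, 40.0
RHO = -0.5  # assumed correlation between C and D


def portfolio(w_c):
    """Return (expected return, standard deviation) for weight w_c in asset C."""
    w_d = 1.0 - w_c
    exp_ret = w_c * E_C + w_d * E_D
    var = (w_c * SD_C) ** 2 + (w_d * SD_D) ** 2 + 2 * w_c * w_d * RHO * SD_C * SD_D
    return exp_ret, math.sqrt(var)


# Portfolios 1 to 5: the weight in C falls from 100% to 0% in steps of 25%.
mixes = {k + 1: portfolio(1.0 - 0.25 * k) for k in range(5)}


def dominates(p, q):
    """p dominates q: no less return, no more risk, strictly better in one."""
    return p[0] >= q[0] and p[1] <= q[1] and (p[0] > q[0] or p[1] < q[1])


# Mixes that are not dominated by any of the other four mixes.
efficient = [i for i, q in mixes.items()
             if not any(dominates(p, q) for j, p in mixes.items() if j != i)]

for i, (er, sd) in mixes.items():
    print(f"portfolio {i}: E = {er:.1f}%, SD = {sd:.1f}%")
print("not dominated among the five mixes:", efficient)
```

Only the five discrete mixes are compared here; the full efficient set is the continuous curve section 3-5.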

Figure 3 shows the feasible and efficient sets
when there are more than two assets. In this
case, the feasible set is not a line, but an area.
This is because, unlike the two-asset case, it is
possible to create portfolios whose risk-return
combinations lie not only on the curve I-II-III
but anywhere in the shaded area.
However, the efficient set is given by the curve
II-III. Every portfolio in the shaded area that
does not lie on this curve is dominated by a
portfolio on the efficient set.

The efficient set of portfolios is sometimes
called the efficient frontier because graphically all
the efficient portfolios lie on the boundary of the
set of feasible portfolios that have the maximum
return for a given level of risk. Any risk-return
combination above the efficient frontier cannot
be achieved, while risk-return combinations of
the portfolios that make up the efficient frontier
dominate those that lie below the efficient
frontier.

Choosing the Optimal Portfolio in 
the Efficient Set 

Now that we have constructed the efficient set 
of portfolios, the next step is to determine the 
optimal portfolio. 

Since all portfolios on the efficient frontier 
provide the greatest possible return at their 
level of risk, an investor or entity will want 
to hold one of the portfolios on the efficient 
frontier. Notice that the portfolios on the effi¬ 
cient frontier represent trade-offs in terms of 
risk and return. Moving from left to right on the 
efficient frontier, the risk increases, but so does 
the expected return. The question is which one 
of those portfolios should an investor hold? The 
best portfolio to hold of all those on the efficient 
frontier is the optimal portfolio. 

Intuitively, the optimal portfolio should de¬ 
pend on the investor's preference over different 
risk-return trade-offs. As explained earlier, this 
preference can be expressed in terms of a utility 
function. 

In Figure 4, three indifference curves rep¬ 
resenting a utility function and the efficient 
frontier are drawn on the same diagram. An in¬ 
difference curve indicates the combinations of 
risk and expected return that give the same level 
of utility. Moreover, the farther the indifference 
curve from the horizontal axis, the higher the 
utility. 

From Figure 4, it is possible to determine the
optimal portfolio for the investor with the
indifference curves shown. Remember that the
investor wants to get to the highest indifference
curve achievable given the efficient frontier.
Given that requirement, the optimal portfolio is
represented by the point where an indifference
curve is tangent to the efficient frontier. In
Figure 4, that is the portfolio $P^{*}_{MEF}$. For example,
suppose that $P^{*}_{MEF}$ corresponds to portfolio 4



Figure 4 Selection of the Optimal Portfolio
Note: $u_1$, $u_2$, $u_3$ = indifference curves with $u_1 < u_2 < u_3$;
$P^{*}_{MEF}$ = optimal portfolio on Markowitz efficient frontier.

in Figure 2. We know from Table 3 that this
portfolio is made up of 25% of asset C and 75%
of asset D, with E(Rp) = 16.5% and SD(Rp) =
27.0%.

Consequently, for the investor's preferences 
over risk and return as determined by the shape 
of the indifference curves represented in Fig¬ 
ure 4, and expectations for asset C and D inputs 
(returns and variance-covariance) represented 
in Table 3, portfolio 4 is the optimal portfolio 
because it maximizes the investor's utility. If 
this investor had a different preference for ex¬ 
pected risk and return, there would have been 
a different optimal portfolio. 
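
The selection step can be sketched numerically. The utility function referred to above is defined outside this section, so the sketch assumes a hypothetical quadratic form, E(Rp) minus one half of a risk-aversion parameter times Var(Rp); the inputs for assets C and D are likewise assumptions chosen to be consistent with the portfolio figures quoted in the text. A grid search over the weight in asset C stands in for the tangency point.

```python
# Assumed inputs for assets C and D, in decimals (not given in this section).
E_C, SD_C, E_D, SD_D, RHO = 0.12, 0.30, 0.18, 0.40, -0.5


def mean_var(w_c):
    """Expected return and variance of the mix with weight w_c in asset C."""
    w_d = 1.0 - w_c
    mean = w_c * E_C + w_d * E_D
    var = (w_c * SD_C) ** 2 + (w_d * SD_D) ** 2 + 2 * w_c * w_d * RHO * SD_C * SD_D
    return mean, var


def utility(w_c, risk_aversion):
    """Hypothetical quadratic utility: E(Rp) - (risk_aversion / 2) * Var(Rp)."""
    mean, var = mean_var(w_c)
    return mean - 0.5 * risk_aversion * var


def optimal_weight(risk_aversion, steps=100):
    """Weight in C, on a grid from 0 to 1, that maximizes the assumed utility."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda w: utility(w, risk_aversion))


for lam in (0.5, 2.0, 8.0):
    w = optimal_weight(lam)
    mean, var = mean_var(w)
    print(f"risk aversion {lam}: weight in C = {w:.2f}, "
          f"E = {mean:.3f}, SD = {var ** 0.5:.3f}")
```

Under these assumptions a more risk-averse investor ends up closer to the minimum-risk mix, which illustrates why a different preference leads to a different optimal portfolio.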

At this point in our discussion, a natural question
is how to estimate an investor's utility
function so that the indifference curves can be
determined. Economists in the field of behavioral
and experimental economics have conducted
a vast amount of research in the area
of utility functions. Though it sounds reasonable
to assume that individuals possess
a function that maps the different preference
choices they face, the research shows that it
is not so straightforward to assign a specific
utility function to an individual. This is because
preferences may depend on circumstances,
and those may change with time.

Table 4 Annualized Expected Returns, Standard Deviations, and Correlations between the Four Country Equity
Indexes: Australia, Austria, Belgium, and Canada

Expected   Standard                      Correlations
Returns    Deviation   Country           1      2      3      4
7.9%       19.5%       Australia   1     1
7.9%       18.2%       Austria     2     0.24   1
9.0%       18.3%       Belgium     3     0.25   0.47   1
7.1%       16.5%       Canada      4     0.22   0.14   0.25   1


The inability to assign a specific utility
function to an investor does not imply that the
theory is irrelevant. Once the efficient frontier
is constructed, it is possible for the investor to
subjectively evaluate the trade-offs for the different
return-risk outcomes and choose the efficient
portfolio that is appropriate given his or
her tolerance for risk.

Example Using the MSCI World 
Country Indexes 

Now that we know how to calculate the optimal
portfolios and the efficient frontier, let us take a
look at a practical example. We start the example
using only four assets and later show how these
results change as more assets are included. The
four assets are the four country equity indexes
in the MSCI World Index for Australia, Austria,
Belgium, and Canada.

Let us assume that we are given the annu¬ 
alized expected returns, standard deviations, 
and correlations between these countries as pre¬ 
sented in Table 4. The expected returns vary 
from 7.1% to 9%, whereas the standard devia¬ 
tions range from 16.5% to 19.5%. Furthermore, 
we observe that the four country indexes are not 
highly correlated with each other—the highest 
correlation, 0.47, is between Austria and Bel¬ 
gium. Therefore, we expect to see some benefits 
of portfolio diversification. 

Figure 5 shows the efficient frontier for the
four assets. We observe that the four assets, represented
by the diamond-shaped marks, are all
below the efficient frontier. This means that for
a targeted expected portfolio return, the mean-variance
efficient portfolio has a lower standard deviation
than an individual asset offering that return.
A utility maximizing investor, measuring
utility as the trade-off between expected return
and standard deviation, will prefer a portfolio
on the efficient frontier over any of the individual
assets.

The portfolio at the leftmost end of the ef¬ 
ficient frontier (marked with a solid circle in 
Figure 5) is the portfolio with the smallest 
obtainable standard deviation. It is called the 
global minimum variance (GMV) portfolio. 
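
As a rough illustration, the GMV portfolio for the four indexes can be computed from the Table 4 inputs. The sketch below assumes that the only constraint is that the weights sum to one (short selling allowed) and uses the standard closed form in which the weights are proportional to the inverse covariance matrix applied to a vector of ones; the small linear solver is a helper written for this example, not part of the original text.

```python
# Table 4 inputs: standard deviations and correlations
# (order: Australia, Austria, Belgium, Canada).
sd = [0.195, 0.182, 0.183, 0.165]
corr = [
    [1.00, 0.24, 0.25, 0.22],
    [0.24, 1.00, 0.47, 0.14],
    [0.25, 0.47, 1.00, 0.25],
    [0.22, 0.14, 0.25, 1.00],
]
n = len(sd)
cov = [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x


# GMV weights: solve cov * z = 1, then rescale z so the weights sum to one.
z = solve(cov, [1.0] * n)
weights = [zi / sum(z) for zi in z]
gmv_var = sum(weights[i] * cov[i][j] * weights[j]
              for i in range(n) for j in range(n))
gmv_sd = gmv_var ** 0.5

print("GMV weights:", [round(w, 3) for w in weights])
print("GMV standard deviation:", round(gmv_sd, 4))
```

The resulting standard deviation is below that of every individual index, which is the diversification benefit the text anticipates.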

Increasing the Asset Universe

We know that by introducing more (low-correlation)
assets, for a targeted expected portfolio
return, we should be able to decrease the standard
deviation of the portfolio. In Table 5, the
assumed annualized expected returns, standard
deviations, and correlations of 18 countries
in the MSCI World Index are presented.

Figure 5 The Mean-Variance Efficient Frontier
of Country Equity Indexes of Australia, Austria,
Belgium, and Canada
Note: Constructed using the data in Table 4. The
expected return and standard deviation combination
of each country index is represented by
a diamond-shaped mark. The global minimum
variance portfolio (GMV) is represented by a solid
circle. The portfolios on the curves above the GMV
portfolio constitute the efficient frontier.

Figure 6 The Efficient Frontier Widens as the
Number of Low-Correlated Assets Increases
Note: The efficient frontiers have been constructed
with 4, 12, and 18 countries (from the innermost to
the outermost frontier) from the MSCI World Index.
The portfolios on the curves above the GMV
portfolio constitute the efficient frontiers for the
three cases.

Figure 6 illustrates how the efficient frontier 
moves outwards and upwards as we go from 4 
to 12 assets and then to 18 assets. By increas¬ 
ing the number of investment opportunities, 
we increase the level of possible diversification 
thereby making it possible to generate a higher 
level of return at each level of risk. 

Adding Short Selling Constraints 

So far in this section, our theoretical derivations
imposed no restrictions on the portfolio weights
other than having them add up to one. In particular,
we allowed the portfolio weights to take
on both positive and negative values; that is, we
did not restrict short selling. In practice, many
portfolio managers cannot sell assets short. This
could be for investment policy or legal reasons,
or sometimes just because particular asset
classes, such as real estate, are difficult to sell short.
In Figure 7, we see the effect of not allowing for
short selling. Since we are restricting the opportunity
set by constraining all the weights to be
nonnegative, the resulting efficient frontier lies inside
the unconstrained efficient frontier.

Figure 7 The Effect of Restricting Short Selling:
Constrained versus Unconstrained Efficient
Frontiers Constructed from 18 Countries from the
MSCI World Index
Note: The portfolios on the curves above the GMV
portfolio constitute the efficient frontiers.


ROBUST PORTFOLIO 
OPTIMIZATION 

Despite the great influence and theoretical 
impact of modern portfolio theory, today full 
risk-return optimization at the asset level is 
primarily done only at the more quantitatively 
oriented asset management firms. The avail¬ 
ability of quantitative tools is not the issue— 
today's optimization technology is mature and 
much more user-friendly than it was at the 
time Markowitz first proposed the theory of 
portfolio selection—yet many asset managers 
avoid using the quantitative portfolio allocation 
framework altogether. 

A major reason for the reluctance of portfolio
managers to apply quantitative risk-return
optimization is that they have observed that
it may be unreliable in practice. Specifically,
mean-variance optimization (like optimization
based on any other measure of risk, for that
matter) is very sensitive to changes
in the inputs (in the case of mean-variance optimization,
such inputs include the expected return
and variance of each asset and the
covariance between each pair of assets). It is
difficult to make accurate estimates
of these inputs, and estimation errors in the forecasts
significantly impact the resulting portfolio
weights. As a result, the optimal portfolios generated
by the mean-variance analysis generally
have extreme or counterintuitive weights for
some assets (see Note 1). Such examples, however, are not
necessarily a sign that the theory of portfolio selection
is flawed; rather, they indicate that when used in practice,
the mean-variance analysis as presented
by Markowitz has to be modified in order to
achieve reliability, stability, and robustness with
respect to model and estimation errors.

It goes without saying that advances in the 
mathematical and physical sciences have had a 
major impact upon finance. In particular, math¬ 
ematical areas such as probability theory, statis¬ 
tics, econometrics, operations research, and 
mathematical analysis have provided the nec¬ 
essary tools and discipline for the development 
of modern financial economics. Substantial ad¬ 
vances in the areas of robust estimation and ro¬ 
bust optimization were made during the 1990s, 
and have proven to be of great importance for 
the practical applicability and reliability of port¬ 
folio management and optimization. 

Any statistical estimate is subject to error— 
that is, estimation error. A robust estimator is a 
statistical estimation technique that is less sen¬ 
sitive to outliers in the data and is not driven by 
one particular set of observations of the data. 
For example, in practice, it is undesirable that 
one or a few extreme returns have a large im¬ 
pact on the estimation of the average return of a 
stock. Nowadays, statistical techniques such as 
Bayesian analysis and robust statistics are more 
commonplace in asset management. Taking it
one step further, practitioners are starting to
incorporate the uncertainty introduced by estimation
errors directly into the optimization
process. This is very different from traditional
mean-variance analysis, where one solves the
portfolio optimization problem as a problem
with deterministic inputs (i.e., inputs that are
assumed to be known with certainty), without
taking the estimation errors into account.
In the robust approach, by contrast, the statistical
precision of individual estimates is explicitly
incorporated into the portfolio allocation process.
Providing this benefit is the underlying goal of
robust portfolio optimization (see Note 2).
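
The sensitivity of the sample mean to a single extreme return, and the relative stability of robust alternatives, can be seen in a small example. The return series below is hypothetical, and the median and trimmed mean are used only as simple stand-ins for robust estimators; the text does not prescribe a particular one.

```python
import statistics

# Hypothetical monthly returns in percent.
returns = [1.2, 0.8, -0.5, 1.0, 0.3, -0.2, 0.9, 0.6]
# The same series with one extreme observation appended.
with_outlier = returns + [-35.0]


def trimmed_mean(data, trim=1):
    """Mean after dropping the `trim` smallest and `trim` largest observations."""
    ordered = sorted(data)
    return statistics.mean(ordered[trim:len(ordered) - trim])


estimators = (
    ("mean", statistics.mean),
    ("median", statistics.median),
    ("trimmed mean", trimmed_mean),
)
for name, est in estimators:
    print(f"{name}: {est(returns):.3f} -> {est(with_outlier):.3f}")
```

A single outlier moves the sample mean by several percentage points, while the median and the trimmed mean barely change.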

Modern robust optimization techniques 
allow a portfolio manager to solve the robust 
version of the portfolio optimization problem 
in about the same time as needed for the tra¬ 
ditional mean-variance portfolio optimization 
problem. The robust approach explicitly uses 
the distribution from the estimation process to 
find a robust portfolio in a single optimization, 
thereby directly incorporating uncertainty 
about inputs in the optimization process. As a 
result, robust portfolios are less sensitive to estimation
errors than other portfolios, and often
perform better than optimal portfolios determined
by traditional mean-variance optimization.
Moreover, the robust optimization framework
offers greater flexibility and many new interesting
applications. For instance, robust portfolio
optimization can exploit the notion of statistically
equivalent portfolios. This concept is
important in large-scale portfolio management
involving many complex constraints such as
transaction costs, turnover, or market impact.
Specifically, with robust optimization, a portfolio
manager can find the best portfolio that (1)
minimizes trading costs with respect to the current
holdings and (2) has an expected portfolio
return and variance that are statistically equivalent
to those of the classical mean-variance
portfolio (see Note 3).

KEY POINTS 

• Markowitz quantified the concept of diversification
through the statistical notion of
the covariances between individual securities
that make up a portfolio and the overall standard
deviation of the portfolio.

• A basic assumption behind modern portfolio
theory is that an investor's preferences over
portfolios with different expected returns and
variances can be represented by a function
(utility function).

• The basic principle underlying modern portfolio
theory is that for a given level of expected
return an investor would choose the
portfolio with the minimum variance from
among the set of all possible portfolios.

• Minimum variance portfolios are called
mean-variance efficient portfolios. The set of
all mean-variance efficient portfolios is called
the efficient frontier. The portfolio on the efficient
frontier with the smallest variance is
called the global minimum variance portfolio
(GMVP).

• The efficient frontier moves outwards and
upwards as the number of (not perfectly
correlated) securities increases. The efficient
frontier shrinks as constraints are imposed
upon the portfolio.

• An advancement in the theory of portfolio
selection is the development of estimation
techniques that generate more robust mean-variance
estimates along with optimization
techniques that result in optimized portfolios
being more robust to the mean-variance estimates
used.

NOTES 

1. See Best and Grauer (1991) and Chopra and
Ziemba (1993).

2. There are two approaches that have been
suggested for dealing with this problem.
One is estimation using a statistical
technique known as Bayesian
analysis. (See Rachev, Hsu, Bagasheva, and
Fabozzi, 2008.) The Black-Litterman model
uses this approach. (See Black and Litterman,
1990.) The other approach is using a
resampling methodology as suggested by
Michaud (2001). A study by Markowitz and
Usmen (2003) found that the resampled approach
is superior to that of a Bayesian approach.

3. For a discussion of these models, see
Fabozzi, Kolm, Pachamanova, and Focardi
(2007).

Abstract: The mathematical theory of optimization has a natural application in the field of finance.
From a general perspective, the behavior of economic agents in the face of uncertainty involves 
balancing expected risks and expected rewards. For example, the portfolio choice problem concerns 
the optimal trade-off between risk and reward. A portfolio is said to be optimal in the sense that 
it is the best portfolio among many alternative ones. The criterion that measures the quality of a 
portfolio relative to the others is known as the objective function in optimization theory. The set 
of portfolios among which we are choosing is called the "set of feasible solutions" or the "set of 
feasible points." 


In optimization theory there is a distinction 
between two types of optimization problems 
depending on whether the set of feasible so¬ 
lutions is constrained or unconstrained. If the 
optimization problem is a constrained one, then 
the set of feasible solutions is defined by means 
of certain linear and/or nonlinear equalities 
and inequalities. These functions are often said 
to be forming the constraint set. Furthermore, 
a distinction is made between the types of 
optimization problems depending on the 


assumed properties of the objective function and 
the functions in the constraint set, such as linear 
problems, quadratic problems, and convex problems. 
The solution methods vary with respect to the 
particular optimization problem type as there 
are efficient algorithms prepared for particular 
problem types. 

In this chapter, we describe the basic types
of optimization problems and remark on
the methods for their solution. Boyd and
Vandenberghe (2004) and Ruszczynski (2006)
provide more detailed information on the
topic.


UNCONSTRAINED 

OPTIMIZATION 

When there are no constraints imposed on the
set of feasible solutions, we have an unconstrained
optimization problem. Thus, the goal is
to maximize or to minimize the objective function
with respect to the function arguments
without any limits on their values. We consider
directly the n-dimensional case; that is,
the domain of the objective function f is the
n-dimensional space and the function values
are real numbers, $f : \mathbb{R}^n \to \mathbb{R}$. Maximization is
denoted by

$$\max f(x_1, \ldots, x_n)$$

and minimization by

$$\min f(x_1, \ldots, x_n)$$

A more compact form is commonly used; for
example

$$\min_{x \in \mathbb{R}^n} f(x) \qquad (1)$$

denotes that we are searching for the minimal
value of the function f(x) by varying x in the
entire n-dimensional space $\mathbb{R}^n$. A solution to
problem (1) is a value of $x = x^0$ for which the
minimum of f is attained:

$$f_0 = f(x^0) = \min_{x \in \mathbb{R}^n} f(x)$$

Thus, the vector $x^0$ is such that the function
takes a value at least as large as $f_0$ for any other
vector x,

$$f(x^0) \le f(x), \quad x \in \mathbb{R}^n \qquad (2)$$

Note that there may be more than one vector
$x^0$ satisfying the inequality in (2) and, therefore,
the argument for which $f_0$ is achieved may not
be unique. If (2) holds, then the function is said
to attain its global minimum at $x^0$. If the inequality
in (2) holds for x belonging only to a small
neighborhood of $x^0$ and not to the entire space
$\mathbb{R}^n$, then the objective function is said to have a
local minimum at $x^0$. This is usually denoted by

$$f(x^0) \le f(x)$$

for all x such that $\|x - x^0\|_2 < \epsilon$, where
$\|x - x^0\|_2$ stands for the Euclidean distance
between the vectors x and $x^0$,

$$\|x - x^0\|_2 = \sqrt{\sum_{i=1}^{n} (x_i - x_i^0)^2}$$

and $\epsilon$ is some positive number. A local minimum
may not be global as there may be vectors
outside the small neighborhood of $x^0$ for which
the objective function attains a smaller value
than $f(x^0)$. Figure 2 shows the graph of a
function with two local maxima, one of which
is the global maximum.

There is a connection between minimization
and maximization. Maximizing the objective
function is the same as minimizing the negative
of the objective function and then changing the
sign of the minimal value:

$$\max_{x \in \mathbb{R}^n} f(x) = -\min_{x \in \mathbb{R}^n} \left[ -f(x) \right]$$

This relationship is illustrated in Figure 1. As
a consequence, problems for maximization can
be stated in terms of function minimization and
vice versa.
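
A quick numerical check of this identity, using a hypothetical one-dimensional objective and a finite grid in place of the whole real line:

```python
def f(x):
    """Hypothetical objective with a single peak at x = 1."""
    return 3.0 - (x - 1.0) ** 2


# Grid of candidate points from -5 to 5 in steps of 0.01.
grid = [-5.0 + 0.01 * k for k in range(1001)]

max_direct = max(f(x) for x in grid)       # maximize f directly
max_via_min = -min(-f(x) for x in grid)    # minimize -f, then flip the sign

print(max_direct, max_via_min)
```

Both routes return the same value, which is why optimization software typically needs to implement only one of the two problems.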



Figure 1 The Relationship between Minimization
and Maximization for a One-Dimensional
Function

Minima and Maxima of a
Differentiable Function

If the second derivatives of the objective function
exist, then its local maxima and minima,
often called generically local extrema, can be
characterized. Denote by $\nabla f(x)$ the vector of the
first partial derivatives of the objective function
evaluated at x,

$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$$

This vector is called the function gradient. At
each point x of the domain of the function, it
shows the direction of greatest rate of increase
of the function in a small neighborhood of x.
If for a given x the gradient equals a vector of
zeros,

$$\nabla f(x) = (0, \ldots, 0)$$

then the function does not change, to first order, in a small
neighborhood of $x \in \mathbb{R}^n$. It turns out that all
points of local extrema of the objective function
are characterized by a zero gradient. As a result,
the points yielding the local extrema of the objective
function are among the solutions of the
system of equations

$$\frac{\partial f(x)}{\partial x_1} = 0, \quad \ldots, \quad \frac{\partial f(x)}{\partial x_n} = 0 \qquad (3)$$


Figure 2 A Function $f(x_1, x_2)$ with Two Local
Maxima

The system of equations (3) is often referred to
as representing the first-order condition for the
objective function extrema. However, it is only
a necessary condition; that is, if the gradient is
zero at a given point in the n-dimensional space,
then this point may or may not be a point of a
local extremum for the function. An illustration
is given in Figures 2 and 3. Figure 2 shows the
graph of a two-dimensional function and Figure
3 contains the contour lines of the function
with the gradient calculated at a grid of points.
There are three points marked with a black dot
which have a zero gradient. The middle point
is not a point of a local maximum even though
it has a zero gradient. This point is called a saddle
point, since the graph resembles the shape of
a saddle in a neighborhood of it. The left and
the right points are where the function has two
local maxima corresponding to the two peaks
visible in Figure 2. The right peak is a local
maximum which is not the global one and the
left peak represents the global maximum.
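
The point that a zero gradient does not guarantee an extremum can be illustrated with a finite-difference gradient. The function below is a hypothetical saddle, not the function plotted in Figures 2 and 3.

```python
def f(x1, x2):
    """Hypothetical function with a saddle point at the origin."""
    return x1 ** 2 - x2 ** 2


def gradient(func, x1, x2, h=1e-6):
    """Central finite-difference approximation of the gradient at (x1, x2)."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return d1, d2


g = gradient(f, 0.0, 0.0)
higher = f(0.1, 0.0) > f(0.0, 0.0)  # moving along x1 increases f
lower = f(0.0, 0.1) < f(0.0, 0.0)   # moving along x2 decreases f

print("gradient at the origin:", g)
print("higher value nearby:", higher, "| lower value nearby:", lower)
```

The gradient vanishes at the origin, yet the function takes both larger and smaller values arbitrarily close to it, so the origin is neither a local maximum nor a local minimum.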

Figure 3 The Contour Lines of $f(x_1, x_2)$ Together
with the Gradient Evaluated at a Grid of
Points
Note: The middle black point shows the position
of the saddle point between the two local maxima.

This example demonstrates that the first-order
conditions are generally insufficient to
characterize the points of local extrema. The additional
condition which identifies which of the
zero-gradient points are points of local minimum
or maximum is given through the matrix
of second derivatives,



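$$H = \begin{pmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2}
\end{pmatrix} \qquad (4)$$

which is called the Hessian matrix or just the
Hessian. The Hessian is a symmetric matrix because
the order of differentiation is insignificant:

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x)}{\partial x_j \partial x_i}$$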

The additional condition is known as the 
second-order condition. We will not provide 
the second-order condition for functions of 
n-dimensional arguments because it is rather 
technical and goes beyond the scope of the en¬ 
try. We only state it for two-dimensional func¬ 
tions. 

In the case n = 2, the following conditions
hold:

• If $\nabla f(x_1, x_2) = (0, 0)$ at a given point $(x_1, x_2)$
and the determinant of the Hessian matrix
evaluated at $(x_1, x_2)$ is positive, then the function
has:

A local maximum in $(x_1, x_2)$ if

$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} < 0 \quad \text{or} \quad \frac{\partial^2 f(x_1, x_2)}{\partial x_2^2} < 0$$

A local minimum in $(x_1, x_2)$ if

$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} > 0 \quad \text{or} \quad \frac{\partial^2 f(x_1, x_2)}{\partial x_2^2} > 0$$

• If $\nabla f(x_1, x_2) = (0, 0)$ at a given point $(x_1, x_2)$
and the determinant of the Hessian matrix
evaluated at $(x_1, x_2)$ is negative, then the function
f has a saddle point in $(x_1, x_2)$.

• If $\nabla f(x_1, x_2) = (0, 0)$ at a given point $(x_1, x_2)$
and the determinant of the Hessian matrix
evaluated at $(x_1, x_2)$ is zero, then no conclusion
can be drawn.
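
The three cases translate directly into a small classification routine. This is a minimal sketch: the function name and argument names are made up for the example, and the inputs are the second partial derivatives evaluated at a point where the gradient is already known to be zero.

```python
def classify(f11, f22, f12):
    """Second-order test for a two-dimensional function at a zero-gradient point.

    f11 and f22 are the second derivatives with respect to x1 and x2;
    f12 is the mixed second derivative.
    """
    det = f11 * f22 - f12 ** 2  # determinant of the 2 x 2 Hessian
    if det > 0:
        return "local maximum" if f11 < 0 else "local minimum"
    if det < 0:
        return "saddle point"
    return "inconclusive"


print(classify(2.0, 2.0, 0.0))    # Hessian of x1^2 + x2^2 at the origin
print(classify(-2.0, -2.0, 0.0))  # Hessian of -(x1^2 + x2^2) at the origin
print(classify(2.0, -2.0, 0.0))   # Hessian of x1^2 - x2^2 at the origin
```

When the determinant is positive the two diagonal entries necessarily share the same sign, so testing only one of them is enough.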


Convex Functions 

We just demonstrated that the first-order con¬ 
ditions are insufficient in the general case to 
describe the local extrema. However, when 
certain assumptions are made for the objec¬ 
tive function, the first-order conditions can be¬ 
come sufficient. Furthermore, for certain classes 
of functions, the local extrema are necessarily 
global. Therefore, solving the first-order condi¬ 
tions we obtain the global extremum. 

A general class of functions with nice optimal
properties is the class of convex functions.
Not only are convex functions easy to optimize
but they also have important applications
in risk management. (See Chapter 6 in Rachev,
Stoyanov, and Fabozzi [2008] for a discussion of
general measures of risk.) It turns out that the
property which guarantees that diversification
is possible is exactly the convexity
property. As a consequence, a measure of risk
is necessarily a convex functional.

A function in mathematics can be viewed as
a rule assigning to each element of a set D a
single element of a set C. The set D is called the
domain of f and the set C is called the codomain
of f. A functional is a special kind of a function
which takes other functions as its argument and
returns numbers as output; that is, its domain
is a set of functions. For example, the definite
integral can be viewed as a functional because
it assigns a real number to a function: the corresponding
area below the function graph. A
risk measure can also be viewed as a functional
because it assigns a number to a random variable.
Any random variable is mathematically
described as a certain function the domain of
which is a set of outcomes $\Omega$.

Precisely, a function f(x) is called a convex
function if it satisfies the property: For any
$\alpha \in [0, 1]$ and all $x^1 \in \mathbb{R}^n$ and $x^2 \in \mathbb{R}^n$ in the
function domain,

$$f(\alpha x^1 + (1 - \alpha) x^2) \le \alpha f(x^1) + (1 - \alpha) f(x^2) \qquad (5)$$

Figure 4 Illustration of the Definition of a Convex
Function in the One-Dimensional Case
Note: On the plot, $x_\alpha = \alpha x^1 + (1 - \alpha) x^2$.

The definition is illustrated in Figure 4. Basi¬ 
cally, if a function is convex, then a straight line 
connecting any two points on the graph lies 
"above" the graph of the function. 

A term related to convex functions is the concave function.
A function f is called concave if the negative of
f is convex. In effect, a function is concave if
it satisfies the property: For any $\alpha \in [0, 1]$
and all $x^1 \in \mathbb{R}^n$ and $x^2 \in \mathbb{R}^n$ in the function
domain,

$$f(\alpha x^1 + (1 - \alpha) x^2) \ge \alpha f(x^1) + (1 - \alpha) f(x^2)$$

If the domain D of a convex function is not
the entire space $\mathbb{R}^n$, then the set D satisfies the
property

$$\alpha x^1 + (1 - \alpha) x^2 \in D \qquad (6)$$

where $x^1 \in D$, $x^2 \in D$, and $0 \le \alpha \le 1$. The sets
which satisfy (6) are called convex sets. Thus,
the domains of convex (and concave) functions
should be convex sets. Geometrically, a
set is convex if it contains the straight line segment
connecting any two points belonging to the set.
Rockafellar (1997) provides detailed information
on the implications of convexity in optimization
theory.
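
The defining inequality (5) lends itself to a simple randomized check in one dimension. The sketch below is only a screening device under stated assumptions: the helper name, interval bounds, and tolerance are choices made for this example, and failing to find a violation does not prove convexity.

```python
import math
import random


def violates_convexity(func, lo, hi, trials=10_000, seed=0, tol=1e-12):
    """Search at random for a violation of inequality (5) on [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        a = rng.random()
        lhs = func(a * x1 + (1 - a) * x2)
        rhs = a * func(x1) + (1 - a) * func(x2)
        if lhs > rhs + tol:
            return True  # the chord lies below the graph somewhere
    return False


# x^2 is convex; sin is concave on [0, pi], so -sin is convex there.
print(violates_convexity(lambda x: x * x, -3.0, 3.0))
print(violates_convexity(math.sin, 0.0, math.pi))
print(violates_convexity(lambda x: -math.sin(x), 0.0, math.pi))
```

The third call illustrates the definition of concavity: negating a concave function yields a function for which no violation of (5) is found.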

We summarize several important properties 
of convex functions: 


• Not all convex functions are differentiable. If
a convex function is two times continuously
differentiable, then the corresponding Hessian
defined in (4) is a positive semidefinite
matrix. (A matrix H is a positive semidefinite
matrix if $x'Hx \ge 0$ for all $x \in \mathbb{R}^n$ and
$x \ne (0, \ldots, 0)$.)

• All convex functions are continuous if considered
in an open set.

• The sublevel sets

$$L_c = \{x : f(x) \le c\} \qquad (7)$$

where c is a constant, are convex sets if f is
a convex function. The converse is not true
in general. Later, we provide more information
about nonconvex functions with convex
sublevel sets.

• The local minima of a convex function are
global. If a convex function f is twice continuously
differentiable, then the global minimum
is obtained in the points solving the
first-order condition

$$\nabla f(x) = 0$$

• A sum of convex functions is a convex function:

$$f(x) = f_1(x) + f_2(x) + \ldots + f_k(x)$$

if $f_i$, $i = 1, \ldots, k$, are convex functions.

A simple example of a convex function is the
linear function

$$f(x) = a'x, \quad x \in \mathbb{R}^n$$

where $a \in \mathbb{R}^n$ is a vector of constants. In fact,
linear functions (more generally, affine functions
of the form $a'x + b$) are the only functions which
are both convex and concave. In finance, if we
consider a portfolio of assets, then the expected
portfolio return is a linear function of portfolio
weights, in which the coefficients equal the
expected asset returns.

Figure 5 The Surface of a Two-Dimensional
Convex Quadratic Function

Figure 6 The Contour Lines of a Two-Dimensional
Convex Quadratic Function

As a more involved example, consider the following
function:

$$f(x) = x'Cx, \quad x \in \mathbb{R}^n \qquad (8)$$

where $C = \{c_{ij}\}_{i,j=1}^{n}$ is an n x n symmetric matrix.
In portfolio theory, the variance of portfolio
return is a similar function of portfolio weights.
In this case, C is the covariance matrix. The function
defined in (8) is called a quadratic function
because writing the definition in terms of the
components of the argument x, we obtain

$$f(x) = \sum_{i=1}^{n} c_{ii} x_i^2 + \sum_{i \ne j} c_{ij} x_i x_j$$

which is a quadratic function of the components
$x_i$, $i = 1, \ldots, n$. The function in (8) is convex if
and only if the matrix C is positive semidefinite.
In fact, in this case the Hessian matrix is a constant
multiple of the matrix C, $H = 2C$. Since the matrix C
contains all parameters, we say that the quadratic
function is defined by the matrix C.
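
For a two-dimensional quadratic function, the positive semidefiniteness condition can be checked through the eigenvalues of C. The two matrices below are hypothetical and chosen only for illustration; they are not claimed to be the matrices behind the figures in this entry.

```python
import math


def quadratic(c, x):
    """Evaluate x'Cx for a symmetric 2 x 2 matrix c and a vector x."""
    return sum(x[i] * c[i][j] * x[j] for i in range(2) for j in range(2))


def eigenvalues_2x2(c):
    """Eigenvalues of a symmetric 2 x 2 matrix, smallest first."""
    mean = (c[0][0] + c[1][1]) / 2
    radius = math.hypot((c[0][0] - c[1][1]) / 2, c[0][1])
    return mean - radius, mean + radius


def is_psd(c, tol=1e-12):
    """A symmetric matrix is positive semidefinite if no eigenvalue is negative."""
    return eigenvalues_2x2(c)[0] >= -tol


# Hypothetical matrices for illustration only.
convex_c = [[1.0, 0.4], [0.4, 1.0]]
nonconvex_c = [[-1.0, 0.4], [0.4, 1.0]]

print(eigenvalues_2x2(convex_c), is_psd(convex_c))
print(eigenvalues_2x2(nonconvex_c), is_psd(nonconvex_c))
print(quadratic(nonconvex_c, [1.0, 0.0]))  # negative along the first axis
```

A negative eigenvalue means the quadratic form decreases along some direction through the origin, which is what produces a saddle rather than a bowl.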

Figures 5-8 illustrate the surface and contour
lines of a convex and nonconvex two-dimensional
quadratic function. The contour
lines of the convex function are concentric ellipses
and a sublevel set $L_c$ is represented by
the points inside some ellipse. The point (0, 0) in
Figure 8 is a saddle point. The convex quadratic
function is defined by the matrix

C = ( ... )

and the nonconvex quadratic function is defined
by the matrix

C = ( ...  0.4  1 )

Figure 7 The Surface of a Nonconvex Two-Dimensional
Quadratic Function

Figure 8 The Contour Lines of a Nonconvex
Two-Dimensional Quadratic Function


A property of convex functions is that the sum of convex functions is a convex function. As a result of the preceding analysis, the function

$$f(x) = \lambda x'Cx - a'x \qquad (9)$$

where $\lambda > 0$ and $C$ is a positive semidefinite matrix, is a convex function as a sum of two convex functions. In the mean-variance efficient frontier, as formulated by Markowitz (1952), we find functions similar to (9). Let us use the properties of convex functions in order to solve the unconstrained problem of minimizing the function in (9):

$$\min_{x \in \mathbb{R}^n} \; \lambda x'Cx - a'x$$

This function is differentiable and we can search for the global minimum by solving the first-order conditions:

$$\nabla f(x) = 2\lambda Cx - a = 0$$

Therefore, the value of $x$ minimizing the objective function equals

$$x = \frac{1}{2\lambda} C^{-1} a$$

where $C^{-1}$ denotes the inverse of the matrix $C$ (which exists, for example, when $C$ is positive definite).
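The closed-form minimizer can be verified numerically. The sketch below assumes a two-dimensional problem; the values of `C`, `a`, and `lam` are illustrative assumptions, and the helper functions are written only for this example.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(m, v):
    """Matrix-vector product for small dense lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def minimize_quadratic(C, a, lam):
    """Solve the first-order condition 2*lam*C x - a = 0 for x."""
    return [xi / (2.0 * lam) for xi in matvec(inv2(C), a)]

def gradient(C, a, lam, x):
    """Gradient 2*lam*C x - a of lam*x'Cx - a'x at the point x."""
    return [2.0 * lam * g - ai for g, ai in zip(matvec(C, x), a)]

C = [[1.0, 0.4], [0.4, 1.0]]   # assumed positive definite matrix
a = [1.0, 2.0]                 # assumed linear coefficients
lam = 0.5                      # assumed positive scalar

x_star = minimize_quadratic(C, a, lam)
grad_at_x_star = gradient(C, a, lam, x_star)   # should vanish up to rounding
```

Because the objective is convex, a point at which the gradient vanishes is a global minimum, so no second-order check is needed here.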


Quasi-Convex Functions

Besides convex functions, there are other classes of functions with convenient optimal properties. An example of such a class is the class of quasi-convex functions. Formally, a function is called quasi-convex if all sublevel sets defined in (7) are convex sets. Alternatively, a function $f(x)$ is called quasi-convex if

$$f(x^1) \geq f(x^2) \text{ implies } f(ax^1 + (1 - a)x^2) \leq f(x^1)$$

where $x^1$ and $x^2$ belong to the function domain, which should be a convex set, and $0 \leq a \leq 1$.

A function $f$ is called quasi-concave if $-f$ is quasi-convex.

An illustration of a two-dimensional quasi-convex function is given in Figure 9, which shows the graph of the function; Figure 10 illustrates the contour lines. A sublevel set is represented by all points inside some contour line. From a geometric viewpoint, the sublevel sets corresponding to the plotted contour lines are convex because any of them contains the straight line connecting any two points belonging to the set. Nevertheless, the function is not convex, which becomes evident from the surface in Figure 9. It is not guaranteed that a straight line connecting any two points on the surface will remain "above" the surface.

Figure 9 Example of a Two-Dimensional Quasi-Convex Function

Figure 10 The Contour Lines of a Two-Dimensional Quasi-Convex Function

Properties of the quasi-convex functions include:

• Any convex function is also quasi-convex. The converse is not true, which is demonstrated in Figure 10.

• In contrast to the differentiable convex functions, the first-order condition is not necessary and sufficient for optimality in the case of differentiable quasi-convex functions. (There exists a class of functions larger than the class of convex functions but smaller than the class of quasi-convex functions, for which the first-order condition is necessary and sufficient for optimality. This is the class of pseudo-convex functions. Mangasarian [2006] provides more detail on the optimal properties of pseudo-convex functions.)

• It is possible to find a sequence of convex optimization problems yielding the global minimum of a quasi-convex function. Boyd and Vandenberghe (2004) provide further details. The main idea is to find the smallest value of $c$ for which the corresponding sublevel set $L_c$ is nonempty. The minimal value of $c$ is the global minimum, which is attained at the points belonging to the sublevel set $L_c$.

• Suppose that $g(x) > 0$ is a concave function and $f(x) > 0$ is a convex function. Then the ratio $g(x)/f(x)$ is a quasi-concave function and the ratio $f(x)/g(x)$ is a quasi-convex function.

Quasi-convex functions arise naturally in risk 
management when considering optimization of 
performance ratios. (See Chapter 10 in Rachev, 
Stoyanov, and Fabozzi [2008].) 
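The idea of searching for the smallest level $c$ with a nonempty sublevel set can be sketched for a one-dimensional ratio $f(x)/g(x)$. For $g > 0$, the sublevel set at level $c$ is nonempty exactly when the convex function $f(x) - c\,g(x)$ attains a nonpositive value, so each step is a convex problem. The functions, the interval, and the helper names below are assumptions chosen for illustration; the inner solver is a plain ternary search, not a method taken from the references above.

```python
def ternary_min(phi, lo, hi, iters=200):
    """Minimum value (and minimizer) of a convex one-dimensional function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return phi(x), x

def minimize_ratio(f, g, lo, hi, tol=1e-10):
    """Bisection on the level c: smallest c such that {x : f(x) - c*g(x) <= 0} is nonempty."""
    c_lo = 0.0                  # f > 0 and g > 0, so the ratio is positive
    c_hi = f(lo) / g(lo)        # the ratio at any feasible point is an upper bound
    x_best = lo
    while c_hi - c_lo > tol:
        c = 0.5 * (c_lo + c_hi)
        value, x = ternary_min(lambda t: f(t) - c * g(t), lo, hi)
        if value <= 0.0:        # sublevel set at level c is nonempty
            c_hi, x_best = c, x
        else:                   # sublevel set at level c is empty
            c_lo = c
    return c_hi, x_best

f = lambda x: x * x + 1.0             # convex and positive (assumed example)
g = lambda x: 3.0 - (x - 1.0) ** 2    # concave and positive on [-0.5, 2.5]

c_min, x_min = minimize_ratio(f, g, -0.5, 2.5)
```

Each bisection step halves the interval containing the optimal level, so the number of convex subproblems grows only logarithmically with the required accuracy.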


CONSTRAINED OPTIMIZATION

In constructing optimization problems solving practical issues, it is very often the case that certain constraints need to be imposed in order for the optimal solution to make practical sense. For example, long-only portfolio optimization problems require that the portfolio weights, which represent the variables in optimization, should be nonnegative and should sum up to one. According to the notation in this chapter, this corresponds to a problem of the type

$$\begin{aligned} \min_{x} \quad & f(x) \\ \text{subject to} \quad & x'e = 1 \\ & x \geq 0 \end{aligned} \qquad (10)$$

where

$f(x)$ = the objective function
$e \in \mathbb{R}^n$ = a vector of ones, $e = (1, \ldots, 1)$
$x'e$ = the sum of all components of $x$, $x'e = \sum_{i=1}^{n} x_i$
$x \geq 0$ = all components of the vector $x \in \mathbb{R}^n$ are nonnegative

In problem (10), we are searching for the minimum of the objective function by varying $x$ only in the set

$$X = \{x \in \mathbb{R}^n : x'e = 1, \; x \geq 0\} \qquad (11)$$

which is also called the set of feasible points or the constraint set. A more compact notation, similar to the notation in the unconstrained problems, is sometimes used:

$$\min_{x \in X} f(x)$$

where $X$ is defined in (11).

We distinguish between different types of optimization problems depending on the assumed properties for the objective function and the constraint set. If the constraint set contains only equalities, the problem is easier to handle analytically. In this case, the method of Lagrange multipliers is applied. For more general constraint sets, when they are formed by both equalities and inequalities, the method of Lagrange multipliers is generalized by the Karush-Kuhn-Tucker conditions (KKT conditions). Like the first-order conditions we considered in unconstrained optimization problems, neither of the two approaches leads to necessary and sufficient conditions for constrained optimization problems without further assumptions. One of the most general frameworks in which the KKT conditions are necessary and sufficient is that of convex programming. We have a convex programming problem if the objective function is a convex function and the set of feasible points is a convex set. As important subcases of convex optimization, linear programming and convex quadratic programming problems are considered.

In this section, we describe first the method of Lagrange multipliers, which is often applied to special types of mean-variance optimization problems in order to obtain closed-form solutions. Then we proceed with convex programming, which is the framework for reward-risk analysis.

Lagrange Multipliers

Consider the following optimization problem in which the set of feasible points is defined by a number of equality constraints:

$$\begin{aligned} \min_{x} \quad & f(x) \\ \text{subject to} \quad & h_1(x) = 0 \\ & h_2(x) = 0 \\ & \ldots \\ & h_k(x) = 0 \end{aligned} \qquad (12)$$

The functions $h_i(x)$, $i = 1, \ldots, k$ build up the constraint set. Note that even though the right-hand side of the equality constraints is zero in the classical formulation of the problem given in (12), this is not restrictive. If in a practical problem the right-hand side happens to be different from zero, it can be equivalently transformed; for example:

$$\{x \in \mathbb{R}^n : v(x) = c\} \iff \{x \in \mathbb{R}^n : h_1(x) = v(x) - c = 0\}$$

In order to illustrate the necessary condition for optimality valid for (12), let us consider the following two-dimensional example:

$$\begin{aligned} \min_{x \in \mathbb{R}^2} \quad & \frac{1}{2} x'Cx \\ \text{subject to} \quad & x'e = 1 \end{aligned} \qquad (13)$$

where the matrix is 

C = (o'4 “) 

The objective function is a quadratic function and the constraint set contains one linear equality. A mean-variance optimization problem in which short positions are allowed is very similar to (13). (See Chapter 8 in Rachev, Stoyanov, and Fabozzi [2008].) The surface of the objective function and the constraint are shown in Figures 11 and 12. The black line on the surface shows the function values of the feasible points. Geometrically, solving problem (13) reduces to finding the lowest point of the black curve on the surface. The contour lines shown in Figure 12 imply that the feasible point yielding the minimum of the objective function is where a contour line is tangential to the line defined by the equality constraint. On the plot, the tangential contour line and the feasible points are in bold. The black dot indicates the position of the point in which the objective function attains its minimum subject to the constraints.

Even though the example is not general in the sense that the constraint set contains one linear rather than a nonlinear equality, the same geometric intuition applies in the nonlinear case. The fact that the minimum is attained where a contour line is tangential to the curve defined by the nonlinear equality constraints is expressed in mathematical language in the following way: The gradient of the objective function at the point yielding the minimum equals a linear combination of the gradients of the functions defining the constraint set. Formally, this is stated as:

$$\nabla f(x^0) - \mu_1 \nabla h_1(x^0) - \ldots - \mu_k \nabla h_k(x^0) = 0 \qquad (14)$$

where $\mu_i$, $i = 1, \ldots, k$ are some real numbers called Lagrange multipliers and the point $x^0$ is such that $f(x^0) \leq f(x)$ for all $x$ which are feasible. Note that if there are no constraints in the problem, then (14) reduces to the first-order condition we considered in unconstrained optimization. Therefore, the system of equations behind (14) can be viewed as a generalization of the first-order condition in the unconstrained case.

Figure 11 The Surface of a Two-Dimensional Quadratic Objective Function and the Linear Constraint $x_1 + x_2 = 1$

Figure 12 The Tangential Contour Line to the Linear Constraint $x_1 + x_2 = 1$

The method of Lagrange multipliers basically associates a function to the problem in (12) such that the first-order condition for unconstrained optimization for that function coincides with (14). The method of Lagrange multipliers consists of the following steps.

1. Given the problem in (12), construct the following function:

$$L(x, \mu) = f(x) - \mu_1 h_1(x) - \ldots - \mu_k h_k(x) \qquad (15)$$

where $\mu = (\mu_1, \ldots, \mu_k)$ is the vector of Lagrange multipliers. The function $L(x, \mu)$ is called the Lagrangian corresponding to problem (12).

2. Calculate the partial derivatives with respect to all components of $x$ and $\mu$ and set them equal to zero:

$$\begin{aligned} \frac{\partial L(x, \mu)}{\partial x_i} &= \frac{\partial f(x)}{\partial x_i} - \sum_{j=1}^{k} \mu_j \frac{\partial h_j(x)}{\partial x_i} = 0, \quad i = 1, \ldots, n \\ \frac{\partial L(x, \mu)}{\partial \mu_m} &= -h_m(x) = 0, \quad m = 1, \ldots, k \end{aligned} \qquad (16)$$

Basically, the system of equations (16) corresponds to the first-order conditions for unconstrained optimization written for the Lagrangian as a function of both $x$ and $\mu$, $L : \mathbb{R}^{n+k} \to \mathbb{R}$.

3. Solve the system of equalities in (16) for $x$ and $\mu$. Note that even though we are solving the first-order condition for unconstrained optimization of $L(x, \mu)$, the solution $(x^0, \mu^0)$ of (16) is not a point of local minimum or maximum of the Lagrangian. In fact, the solution $(x^0, \mu^0)$ is a saddle point of the Lagrangian.

The first $n$ equations in (16) make sure that the relationship between the gradients given in (14) is satisfied. The following $k$ equations in (16) make sure that the points are feasible. As a result, all vectors $x$ solving (16) are feasible and the gradient condition is satisfied at them. Therefore, the points that solve the optimization problem (12) are among the solutions of the system of equations given in (16).

This analysis suggests that the method of Lagrange multipliers provides a necessary condition for optimality. Under certain assumptions for the objective function and the functions building up the constraint set, (16) turns out to be a necessary and sufficient condition. For example, if $f(x)$ is a convex and differentiable function and $h_i(x)$, $i = 1, \ldots, k$ are affine functions, then the method of Lagrange multipliers identifies the points solving (12). A function $h(x)$ is called affine if it has the form $h(x) = a + c'x$, where $a$ is a constant and $c = (c_1, \ldots, c_n)$ is a vector of coefficients. All linear functions are affine. Figure 12 illustrates a convex quadratic function subject to a linear constraint. In this case, the solution point is unique.
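For a quadratic objective of the form $\frac{1}{2}x'Cx$ with the single constraint $x'e = 1$, the system (16) can be solved in closed form: stationarity gives $Cx - \mu e = 0$, hence $x = \mu C^{-1}e$, and feasibility gives $\mu = 1/(e'C^{-1}e)$. The sketch below carries out these two steps in two dimensions. The matrix `C` is an assumed positive definite example, and the helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(m, v):
    """Matrix-vector product for small dense lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def solve_equality_qp(C):
    """Solve min 0.5*x'Cx subject to x'e = 1 via the Lagrange system.

    Stationarity: C x - mu * e = 0  =>  x = mu * C^{-1} e
    Feasibility:  e'x = 1           =>  mu = 1 / (e' C^{-1} e)
    """
    e = [1.0, 1.0]
    c_inv_e = matvec(inv2(C), e)
    mu = 1.0 / sum(c_inv_e)
    x = [mu * v for v in c_inv_e]
    return x, mu

C = [[1.0, 0.4], [0.4, 1.0]]   # assumed positive definite matrix
x0, mu0 = solve_equality_qp(C)

# Residuals of the two blocks of equations in the Lagrange system
stationarity = [g - mu0 for g in matvec(C, x0)]
feasibility = sum(x0) - 1.0
```

With equal diagonal entries the two weights coincide, which matches the geometric picture of a contour ellipse touching the line $x_1 + x_2 = 1$ at its midpoint.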


Convex Programming 

The general form of convex programming problems is

$$\begin{aligned} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i = 1, \ldots, m \\ & h_j(x) = 0, \quad j = 1, \ldots, k \end{aligned} \qquad (17)$$

where

$f(x)$ is a convex objective function
$g_1(x), \ldots, g_m(x)$ are convex functions defining the inequality constraints
$h_1(x), \ldots, h_k(x)$ are affine functions defining the equality constraints

Generally, without the assumptions of convexity, problem (17) is more involved than (12) because besides the equality constraints, there are inequality constraints. The KKT condition, generalizing the method of Lagrange multipliers, is only a necessary condition for optimality in this case. However, adding the assumption of convexity makes the KKT condition necessary and sufficient.

Note that, similar to problem (12), the fact that the right-hand side of all constraints is zero is nonrestrictive. The limits can be arbitrary real numbers.

Consider the following two-dimensional optimization problem

$$\begin{aligned} \min_{x \in \mathbb{R}^2} \quad & \frac{1}{2} x'Cx \\ \text{subject to} \quad & (x_1 + 2)^2 + (x_2 + 2)^2 \leq 3 \end{aligned} \qquad (18)$$


in which 


The objective function is a two-dimensional convex quadratic function and the function in the constraint set is also a convex quadratic function. In fact, the boundary of the feasible set is a circle with a radius of $\sqrt{3}$ centered at the point with coordinates (-2, -2). Figures 13 and 14 show the surface of the objective function and the set of feasible points. The shaded part on the surface indicates the function values of all feasible points. In fact, solving problem (18) reduces to finding the lowest point on the shaded part of the surface. Figure 14 shows the contour lines of the objective function together with the feasible set, which is in gray. Geometrically, the point in the feasible set yielding the minimum of the objective function is positioned where a contour line only touches the constraint set. The position of this point is marked with a black dot and the tangential contour line is given in bold.

Note that the solution points of problems of the type (18) can happen to be not on the boundary of the feasible set but in the interior. For example, suppose that the radius of the circle defining the boundary of the feasible set in (18) is a larger number such that the point (0, 0) is inside the feasible set. Then, the point (0, 0) is the solution to problem (18) because at this point the objective function attains its global minimum.

Figure 13 The Surface of a Two-Dimensional Convex Quadratic Function and a Convex Quadratic Constraint

Figure 14 The Tangential Contour Line to the Feasible Set Defined by a Convex Quadratic Constraint

In the two-dimensional case, when we can visualize the optimization problem, geometric reasoning guides us to finding the optimal solution point. In a higher dimensional space, plots cannot be produced and we rely on the analytic method behind the KKT conditions. The KKT conditions corresponding to the convex programming problem (17) are the following:

$$\begin{aligned} & \nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) + \sum_{j=1}^{k} \mu_j \nabla h_j(x) = 0 \\ & g_i(x) \leq 0, \quad i = 1, \ldots, m \\ & h_j(x) = 0, \quad j = 1, \ldots, k \\ & \lambda_i g_i(x) = 0, \quad i = 1, \ldots, m \\ & \lambda_i \geq 0, \quad i = 1, \ldots, m \end{aligned} \qquad (19)$$

A point $x^0$ such that $(x^0, \lambda^0, \mu^0)$ satisfies (19) is the solution to problem (17). Note that if there are no inequality constraints, then the KKT conditions reduce to (16) in the method of Lagrange multipliers. Therefore, the KKT conditions generalize the method of Lagrange multipliers.

The gradient condition in (19) has the same interpretation as the gradient condition in the method of Lagrange multipliers. The set of constraints

$$\begin{aligned} & g_i(x) \leq 0, \quad i = 1, \ldots, m \\ & h_j(x) = 0, \quad j = 1, \ldots, k \end{aligned}$$

guarantees that a point satisfying (19) is feasible. The next conditions

$$\lambda_i g_i(x) = 0, \quad i = 1, \ldots, m$$

are called complementary slackness conditions. If an inequality constraint is satisfied as a strict inequality, then the corresponding multiplier $\lambda_i$ turns into zero according to the complementary slackness conditions. In this case, the corresponding gradient $\nabla g_i(x)$ has no significance in the gradient condition. This reflects the fact that the gradient condition concerns only the constraints satisfied as equalities at the solution point.
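The conditions in (19) can be checked directly at a candidate point. The sketch below does so for a problem of the form (18), with one inequality constraint and no equality constraints. The matrix `C` is an assumption made for illustration, and the candidate point is obtained by a symmetry argument that holds for this assumed matrix: both the objective and the circle are symmetric about the line $x_1 = x_2$, so the boundary point on that line closest to the origin is tried.

```python
import math

def kkt_residuals(C, x, lam):
    """Residuals of the KKT conditions for
    min 0.5*x'Cx  subject to  g(x) = (x1 + 2)^2 + (x2 + 2)^2 - 3 <= 0.
    """
    grad_f = [C[0][0] * x[0] + C[0][1] * x[1],
              C[1][0] * x[0] + C[1][1] * x[1]]
    grad_g = [2.0 * (x[0] + 2.0), 2.0 * (x[1] + 2.0)]
    g = (x[0] + 2.0) ** 2 + (x[1] + 2.0) ** 2 - 3.0
    stationarity = [gf + lam * gg for gf, gg in zip(grad_f, grad_g)]
    return {
        "stationarity": max(abs(s) for s in stationarity),
        "feasibility": max(g, 0.0),    # violation of g(x) <= 0
        "slackness": abs(lam * g),     # lam * g(x) should vanish
        "dual": max(-lam, 0.0),        # violation of lam >= 0
    }

C = [[1.0, 0.4], [0.4, 1.0]]           # assumed positive definite matrix
t = -2.0 + math.sqrt(1.5)              # boundary point with x1 = x2
x_candidate = [t, t]
# Multiplier implied by the gradient condition along the line x1 = x2
lam_candidate = -(C[0][0] + C[0][1]) * t / (2.0 * (t + 2.0))

residuals = kkt_residuals(C, x_candidate, lam_candidate)

# The unconstrained minimizer (0, 0) is infeasible here, so the constraint binds.
origin_residuals = kkt_residuals(C, [0.0, 0.0], 0.0)
```

Since the problem is convex, vanishing residuals together with a nonnegative multiplier are enough to conclude that the candidate solves the assumed problem.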

Important special cases of convex programming problems include linear programming problems and convex quadratic programming problems, which we consider in the remaining part of this section.

Linear Programming 

Optimization problems are said to be linear programming problems if the objective function is a linear function and the feasible set is defined by linear equalities and inequalities. Since all functions are linear, they are also convex, which means that linear programming problems are also convex problems. The definition of linear programming problems in standard form is the following:

$$\begin{aligned} \min_{x} \quad & c'x \\ \text{subject to} \quad & Ax \leq b \\ & x \geq 0 \end{aligned} \qquad (20)$$

where $A$ is an $m \times n$ matrix of coefficients, $c = (c_1, \ldots, c_n)$ is a vector of objective function coefficients, and $b = (b_1, \ldots, b_m)$ is a vector of real numbers. As a result, the constraint set contains $m$ inequalities defined by linear functions. The feasible points defined by means of linear equalities and inequalities are also said to form a polyhedral set. In practice, before solving a linear programming problem, it is usually first reformulated in the standard form given in (20).

Figure 15 The Surface of a Linear Function and a Polyhedral Feasible Set

Figures 15 and 16 show an example of a two-dimensional linear programming problem which is not in standard form as the two variables may become negative. Figure 15 contains the surface of the objective function, which is a plane in this case, and the polyhedral set of feasible points. The shaded area on the surface corresponds to the points in the feasible set. Solving problem (20) reduces to finding the lowest point in the shaded area on the surface. Figure 16 shows the feasible set together with the contour lines of the objective function. The contour lines are parallel straight lines because the objective function is linear. The point in which the objective function attains its minimum is marked with a black dot.

Figure 16 The Bottom Plot Shows the Tangential Contour Line to the Polyhedral Feasible Set

A general result in linear programming is that, provided that the problem is bounded, the solution is always at the boundary of the feasible set and, more precisely, at a vertex of the polyhedron. Problem (20) may become unbounded if the polyhedral set is unbounded and there are feasible points such that the objective function can decrease indefinitely. We can summarize that, generally, due to the simple structure of (20), there are three possibilities:

1. The problem is not feasible, because the polyhedral set is empty.

2. The problem is unbounded.

3. The problem has a solution at a vertex of the polyhedral set.

From a computational viewpoint, the polyhedral set has a finite number of vertices and an algorithm can be devised with the goal of finding a vertex solving the optimization problem in a finite number of steps. This is the basic idea behind the simplex method, which is an efficient numerical approach to solving linear programming problems. Besides the simplex algorithm, there are other, more contemporary methods, such as the interior point method.
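The vertex property can be illustrated by brute force in two dimensions. The sketch below is not the simplex method: it simply enumerates the intersections of pairs of constraint boundaries, keeps the feasible ones, and compares objective values. It assumes a nonempty, bounded feasible set, and the data `c`, `A`, and `b` are illustrative assumptions.

```python
from itertools import combinations

def solve_lp_by_vertices(c, A, b, tol=1e-9):
    """Minimize c'x subject to A x <= b and x >= 0 for two variables.

    Returns (value, vertex) for the best feasible vertex, or None if no
    feasible vertex is found. Assumes the feasible set is bounded.
    """
    # Write x >= 0 as -x <= 0 so that all constraints share one form.
    rows = [list(r) for r in A] + [[-1.0, 0.0], [0.0, -1.0]]
    rhs = list(b) + [0.0, 0.0]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        (a11, a12), (a21, a22) = rows[i], rows[j]
        det = a11 * a22 - a12 * a21
        if abs(det) < tol:
            continue  # parallel boundaries do not meet in a single point
        # Cramer's rule for the intersection of the two boundary lines
        x = [(rhs[i] * a22 - a12 * rhs[j]) / det,
             (a11 * rhs[j] - rhs[i] * a21) / det]
        feasible = all(r[0] * x[0] + r[1] * x[1] <= v + tol
                       for r, v in zip(rows, rhs))
        if feasible:
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best[0]:
                best = (value, x)
    return best

c = [-1.0, -2.0]                   # assumed objective coefficients
A = [[1.0, 1.0], [1.0, 3.0]]       # assumed constraint matrix
b = [4.0, 6.0]                     # assumed right-hand side

best_value, best_vertex = solve_lp_by_vertices(c, A, b)
```

The number of candidate intersections grows combinatorially with the number of constraints and variables, which is why practical solvers move between vertices selectively instead of enumerating them all.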


Quadratic Programming 

Besides linear programming, another class of problems with simple structure is the class of quadratic programming problems. It contains optimization problems with a quadratic objective function and linear equalities and inequalities in the constraint set:

$$\begin{aligned} \min_{x} \quad & c'x + \frac{1}{2} x'Hx \\ \text{subject to} \quad & Ax \leq b \end{aligned} \qquad (21)$$

where

$c = (c_1, \ldots, c_n)$ is a vector of coefficients defining the linear part of the objective function
$H = \{h_{ij}\}_{i,j=1}^{n}$ is an $n \times n$ matrix defining the quadratic part of the objective
$A = \{a_{ij}\}$ is a $k \times n$ matrix defining $k$ linear inequalities in the constraint set
$b = (b_1, \ldots, b_k)$ is a vector of real numbers defining the right-hand side of the linear inequalities

In optimal portfolio theory, mean-variance optimization problems in which portfolio variance is in the objective function are quadratic programming problems.

From the point of view of optimization theory, problem (21) is a convex optimization problem if the matrix defining the quadratic part of the objective function is positive semidefinite. In this case, the KKT conditions can be applied to solve it.


KEY POINTS 

1. The mathematical theory of optimization concerns identifying the best alternative within a set of available, or feasible, alternatives and finds application in different areas of finance such as portfolio selection or, more generally, explaining behavior of economic agents in the face of uncertainty.

2. An optimization problem has two important components: an objective function defining the criterion to be optimized and a feasibility set described by means of equality or inequality constraints.

3. The properties of the objective function and the feasibility set are used to distinguish different classes of optimization problems with specific conditions for optimality and numerical solution methods. The most important classes include linear, quadratic, and convex programming problems.

4. In the theory of portfolio selection, the classical mean-variance analysis belongs to the class of quadratic optimization problems.

5. Employing more general reward and risk measures can result in a convex optimization problem but if scenarios for asset returns are available, the portfolio selection problem can be simplified to a linear programming problem in some cases. Optimization of performance ratios can be related to quasi-convex programs.


Abstract: Meeting the challenges of modern investment practice involves the design of novel forms of investment solutions, as opposed to investment products customized to meet investors' expectations. These new forms of investment solutions rely on the use of improved, more efficient performance-seeking portfolio and liability-hedging portfolio building blocks, as well as on the use of improved dynamic allocation strategies. Understanding the conceptual and technical challenges involved in the design of improved benchmarks for the performance-seeking portfolio is critical.


Management is justified as an industry by the capacity of adding value through the design of investment solutions that match investors' needs. For more than 50 years, the industry has in fact mostly focused on security selection decisions as a single source of added value. This sole focus has somewhat distracted the industry from another key source of added value, namely portfolio construction and asset allocation decisions. In the face of recent crises, and given the intrinsic difficulty in delivering added value through security selection decisions only, the relevance of the old paradigm has been questioned with heightened intensity, and a new paradigm is starting to emerge.

Academic research has provided very useful guidance with respect to how asset allocation and portfolio construction decisions should be analyzed so as to best improve investors' welfare. In a nutshell, the "fund separation theorems" that lie at the core of modern portfolio theory advocate a separate management of performance and risk control objectives. In the context of asset allocation decisions with consumption/liability objectives, it can be shown that the suitable expression of the fund separation theorem provides rational support for liability-driven investment (LDI) techniques that have recently been promoted by a number of investment banks and asset management firms. These solutions involve on the one hand the design of a customized liability-hedging portfolio (LHP), the sole purpose of which is to hedge away as effectively as possible the impact of unexpected changes in risk factors affecting liability values (most notably interest rate and inflation risks), and on the other hand the design of a performance-seeking portfolio (PSP), whose raison d'etre is to provide investors with an optimal risk-return trade-off.

One of the implications of this LDI paradigm is that one should distinguish two different levels of asset allocation decisions: allocation decisions involved in the design of the performance-seeking or the liability-hedging portfolio (design of better building blocks, or BBBs), and asset allocation decisions involved in the optimal split between the PSP and the LHP (design of advanced asset allocation decisions, or AAAs). We address the question of better building blocks in detail in this entry and provide some thoughts on integrating these building blocks in asset allocation. More specifically, we mainly focus here on how to construct efficient performance-seeking portfolios.

In this entry we provide an overview of the key conceptual challenges involved in asset allocation and portfolio construction in designing the performance-seeking portfolio. We begin by presenting the fundamental principle of the maximization of risk/reward efficiency and then deal with estimation of risk parameters and expected return parameters. The empirical results of optimal portfolio construction modeling are presented. We also provide a brief discussion on integrating such properly designed building blocks in the overall PSP at the asset allocation level.


THE TANGENCY PORTFOLIO AS THE RATIONALE BEHIND SHARPE RATIO MAXIMIZATION

Modern portfolio theory provides some useful guidance with respect to the optimal design of a PSP that would best suit investors' needs. More precisely, the prescription is that the PSP should be obtained as the result of a portfolio optimization procedure aiming at generating the highest risk-reward ratio.

Portfolio optimization is a straightforward procedure, at least in principle. In a mean-variance setting, for example, the prescription consists of generating a maximum Sharpe ratio (MSR) portfolio based on expected return, volatility, and pairwise correlation parameters for all assets to be included in the portfolio, a procedure that can even be handled analytically in the absence of portfolio constraints.

More precisely, consider a simple mean-variance problem:

$$\max_{w} \; \mu_p - \frac{1}{2} \gamma \sigma_p^2$$

Here, the control variable is a vector $w$ of optimal weights allocated to various risky assets, $\mu_p$ denotes the portfolio expected return, $\sigma_p$ denotes the portfolio volatility, and $\gamma$ is the investor's risk aversion. We further assume that the investor is facing the following investment opportunity set: a riskless bond paying the risk-free rate $r$, and a set of $N$ risky assets with expected return vector $\mu$ (of size $N$) and covariance matrix $\Sigma$ (of size $N \times N$), all assumed constant so far.

With these notations, the portfolio expected return and squared volatility are respectively given by:

$$\mu_p = w'(\mu - re) + r$$

$$\sigma_p^2 = w'\Sigma w$$

In this context, it is straightforward to show by standard arguments that the only efficient portfolio composed with risky assets is the maximum Sharpe ratio portfolio, also known as the tangency portfolio. 1

Finally, the Sharpe ratio reads (where we further denote by $e$ a vector of ones of size $N$):

$$\frac{\mu_p - r}{\sigma_p} = \frac{w'(\mu - re)}{(w'\Sigma w)^{1/2}}$$

And the optimal portfolio is given by:

$$\max_{w} \left( \mu_p - \frac{1}{2} \gamma \sigma_p^2 \right) \Rightarrow w^* = \frac{1}{\gamma} \Sigma^{-1} (\mu - re) = \frac{e'\Sigma^{-1}(\mu - re)}{\gamma} \underbrace{\frac{\Sigma^{-1}(\mu - re)}{e'\Sigma^{-1}(\mu - re)}}_{PSP} \qquad (1)$$

This is a two-fund separation theorem, which gives the allocation to the MSR performance-seeking portfolio (PSP), with the rest invested in cash, as well as the composition of the MSR performance-seeking portfolio.
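The decomposition into an allocation to the PSP and the composition of the PSP can be illustrated for two risky assets. The sketch below is only an illustration under assumed inputs: the expected returns, covariance matrix, risk-free rate, and risk aversion are made-up numbers, and the function names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(m, v):
    """Matrix-vector product for small dense lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def msr_decomposition(mu, sigma, r, gamma):
    """Return (w_star, allocation, psp) for two risky assets.

    w_star     = (1/gamma) * Sigma^{-1} (mu - r e)
    psp        = Sigma^{-1} (mu - r e) / (e' Sigma^{-1} (mu - r e))
    allocation = e' Sigma^{-1} (mu - r e) / gamma
    """
    excess = [m - r for m in mu]
    raw = matvec(inv2(sigma), excess)
    total = sum(raw)
    w_star = [v / gamma for v in raw]
    psp = [v / total for v in raw]
    return w_star, total / gamma, psp

mu = [0.08, 0.05]                         # assumed expected returns
sigma = [[0.04, 0.006], [0.006, 0.01]]    # assumed covariance matrix
r, gamma = 0.02, 4.0                      # assumed risk-free rate and risk aversion

w_star, allocation, psp = msr_decomposition(mu, sigma, r, gamma)
cash_weight = 1.0 - allocation            # remainder invested in the riskless bond
```

The PSP weights sum to one and do not depend on the risk aversion; only the scalar allocation does.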

In practice, investors end up holding more or less imperfect proxies for the truly optimal performance-seeking portfolio, if only because of the presence of parameter uncertainty, which makes it impossible to obtain a perfect estimate for the maximum Sharpe ratio portfolio. Denoting by $\lambda$ the Sharpe ratio of the (generally inefficient) PSP actually held by the investor, and by $\sigma$ its volatility, we obtain the following optimal allocation strategy:

$$w^* = \frac{\lambda}{\gamma \sigma} PSP \qquad (2)$$

Hence the allocation to the performance-seeking portfolio is a function of two objective parameters, the PSP volatility and the PSP Sharpe ratio, and one subjective parameter, the investor's risk aversion. The optimal allocation to the PSP is inversely proportional to the investor's risk aversion. If risk aversion goes to infinity, the investor holds the risk-free asset only, as should be expected. For finite risk-aversion levels, the allocation to the PSP is inversely proportional to the PSP volatility, and it is proportional to the PSP Sharpe ratio. As a result, if the Sharpe ratio of the PSP is increased, one can invest more in risky assets. Hence, portfolio construction modeling is not only about risk reduction; it is also about performance enhancement through a better spending of investors' risk budgets.
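The comparative statics described above follow directly from the allocation rule, in which the fraction of wealth in the PSP equals the Sharpe ratio divided by the product of risk aversion and volatility. A small sketch with assumed numbers (the function name and all inputs are hypothetical):

```python
def psp_allocation(sharpe_ratio, volatility, risk_aversion):
    """Fraction of wealth allocated to the PSP: lambda / (gamma * sigma)."""
    if volatility <= 0 or risk_aversion <= 0:
        raise ValueError("volatility and risk aversion must be positive")
    return sharpe_ratio / (risk_aversion * volatility)

base = psp_allocation(0.4, 0.16, 4.0)       # assumed baseline inputs
improved = psp_allocation(0.5, 0.16, 4.0)   # higher Sharpe ratio, same volatility
cautious = psp_allocation(0.4, 0.16, 8.0)   # doubled risk aversion
```

Raising the Sharpe ratio at unchanged volatility increases the allocation, while doubling the risk aversion halves it.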

The expression (1) is useful because it provides in principle a straightforward expression for the optimal portfolio starting from a set of $N$ risky assets. In the presence of a realistically large number $N$ of securities, the curse of dimensionality, however, makes it practically impossible for investors to implement such direct one-step portfolio optimization decisions involving all individual components of the asset mix. The standard alternative approach widely adopted in investment practice consists instead in first grouping individual securities in various asset classes according to various dimensions, for example, country, sector, and/or style within the equity universe, or country, maturity, and credit rating within the bond universe, and subsequently generating the optimal portfolio through a two-stage process. On the one hand, investable proxies are generated for maximum Sharpe ratio (MSR) portfolios within each asset class in the investment universe. We call this step, which is typically delegated to professional money managers, the portfolio construction step. On the other hand, when the MSR proxies are obtained for each asset class, an optimal allocation to the various asset classes is eventually generated so as to generate the maximum Sharpe ratio at the global portfolio level. This step is called the asset allocation step, and it is typically handled by a centralized decision maker (e.g., a pension fund CIO) with or without the help of specialized consultants, as opposed to being delegated to decentralized asset managers. In this entry, the discussion focuses on the first step, and we provide some concluding remarks on its relation to the second step at the end of this entry.

For the definition of building blocks for asset 
allocation, in the absence of active views, the 
default option consists of using market cap 
weighted indexes as proxies for the asset class 
MSR portfolio. Academic research, however, 
has found that such market cap indexes were 
likely to be severely inefficient portfolios. 2 In 
a nutshell, market cap weighted indexes are 
not good choices as investment benchmarks be¬ 
cause they are poorly diversified portfolios. In 
fact, cap-weighting tends to lead to exceedingly 
high concentration in relatively few stocks. As a 
consequence of their lack of diversification, cap 
weighted indexes have been empirically found 
to be severely inefficient portfolios, which do 
not provide investors with the fair reward given 
the risk taken. They have also been found to be
dominated by equally weighted benchmarks, 3
which are naively diversified portfolios that are
optimal if and only if all securities have
identical expected returns and volatilities and
all pairwise correlations are identical.

In what follows, we analyze in some detail a
number of alternatives based on practical
implementation of modern portfolio theory that
have been suggested to generate more efficient
proxies for the MSR portfolio in the equity
investment universe. (See Figure 1.)

Modern portfolio theory was born with the
efficient frontier analysis of Markowitz (1952).
Unfortunately, early applications of the
technique, based on naive estimates of the input
parameters, have proved of little use because
they lead to unreasonable portfolio allocations.

Figure 1 Inefficiency of Cap-Weighted Benchmarks, and the Quest for an Efficient Proxy for the True Tangency Portfolio
[Figure content: the true tangency portfolio is a function of the (unknown) true parameter values, \(w_{MSR} = f(\mu_i, \sigma_i, \rho_{ij})\); implementable proxies depend on estimated parameter values, \(\hat{w}_{MSR} = f(\hat{\mu}_i, \hat{\sigma}_i, \hat{\rho}_{ij})\).]

In a first section, we explain how to help
bridge the gap between portfolio theory and
portfolio construction by showing how to
generate enhanced parameter estimates so as to
improve the quality of the portfolio optimization
outputs (optimal portfolio weights). We
first focus on enhanced covariance parameter 
estimates and explain how to meet the main 
challenge of sample risk reduction. 4 Against 
this backdrop, we present the state-of-the-art
methodologies for reducing the problem di¬ 
mensionality and estimating the covariance 
matrix with multifactor models. We then turn 
to expected return estimation. We argue that 
statistical methodologies are not likely to gener¬ 
ate any robust expected return estimates, which 
suggests that economic models such as the 
single-factor CAPM and the multifactor APT 
should instead be used for expected return esti¬ 
mation. Finally, we also present evidence that 
proxies for expected return estimates should 
not only include systematic risk measures, but 
they should also incorporate idiosyncratic risk 
measures as well as downside risk measures. 


ROBUST ESTIMATORS FOR 
COVARIANCE PARAMETERS 

In practice, success in the implementation of 
a theoretical model relies not only upon its 
conceptual grounds but also on the reliabil¬ 
ity of the inputs of the model. In the case of
mean-variance (MV) optimization the results
will highly depend on the quality of the pa¬ 
rameter estimates: the covariance matrix and 
the expected returns of assets. 

Several improved estimates for the covari¬ 
ance matrix have been proposed, including 
most notably the factor-based approach sug¬ 
gested by Sharpe (1963), the constant cor¬ 
relation approach suggested by Elton and 
Gruber (1973), and the statistical shrinkage ap¬ 
proach suggested by Ledoit and Wolf (2004). 
In addition, Jagannathan and Ma (2003) find 
that imposing (no-short-selling) constraints
on the weights in the optimization program im¬ 
proves the risk-adjusted out-of-sample perfor¬ 
mance in a manner that is similar to some of the 
aforementioned improved covariance matrix 
estimators. 

In these papers, the authors have focused on 
testing the out-of-sample performance of global 
minimum variance (GMV) portfolios, as op¬ 
posed to the MSR portfolios (also known as 
tangency portfolios), given that there is a con¬ 
sensus around the fact that purely statistical 
estimates of expected returns are not robust 
enough to be used. (This is discussed later in 
this entry.) 

The key problem in covariance matrix estimation
is the curse of dimensionality; when a large
number of stocks are considered, the number of
parameters to estimate grows quadratically, and
the majority of them are pairwise correlations.
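To make the dimensionality problem concrete, the short sketch below counts the distinct parameters in a sample covariance matrix and compares them with the count under a K-factor structure of the kind described in the following paragraphs. The function names are illustrative, and the factor count assumes uncorrelated residuals (one residual variance per asset).

```python
def sample_cov_param_count(n_assets: int) -> int:
    """Distinct entries of an N x N covariance matrix:
    N variances plus N(N-1)/2 pairwise covariances."""
    return n_assets * (n_assets + 1) // 2


def factor_model_param_count(n_assets: int, n_factors: int) -> int:
    """N*K factor sensitivities, K(K+1)/2 factor (co)variances,
    and N residual variances (residuals assumed uncorrelated)."""
    return (n_assets * n_factors
            + n_factors * (n_factors + 1) // 2
            + n_assets)


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, sample_cov_param_count(n), factor_model_param_count(n, 3))
```

For 500 stocks and three factors, the count drops from 125,250 to 2,006 parameters, which is the sense in which a factor structure reduces sample risk.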

Therefore, at the estimation stage, the
challenge is to reduce the number of factors that
come into play. In general, a multifactor model
decomposes the (excess) return (in excess of the
risk-free asset) of an asset into its expected
rewards for exposure to the "true" risk factors
as follows:

\[ r_{it} = \alpha_{it} + \sum_{j=1}^{K} \beta_{i,jt} F_{jt} + \varepsilon_{it} \]

or in matrix form for all N assets:

\[ r_t = \alpha_t + \beta_t F_t + \varepsilon_t \]

where \(\beta_t\) is an N x K matrix containing the
sensitivities of each asset i with respect to the
corresponding j-th factor movements; \(r_t\) is the
vector of the N assets' (excess) returns, \(F_t\) a
vector containing the K risk factors' (excess)
returns, and \(\varepsilon_t\) the N x 1 vector containing
the zero-mean uncorrelated residuals \(\varepsilon_{it}\).
The covariance matrix for the asset returns
implied by a factor model is given by:

\[ \Omega = \beta_t \, \Sigma_F \, \beta_t' + \Sigma_\varepsilon \]

where \(\Sigma_F\) is the K x K covariance matrix of the
risk factors and \(\Sigma_\varepsilon\) an N x N covariance matrix
of the residuals corresponding to each asset.
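As an illustration of the factor-implied covariance formula, the following sketch assembles the asset covariance matrix from factor sensitivities, a factor covariance matrix, and residual variances. All numbers are hypothetical, and the residual covariance matrix is taken to be diagonal, which is an assumption consistent with uncorrelated residuals.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def factor_covariance(beta, sigma_f, resid_var):
    """Return beta * Sigma_F * beta' + diag(resid_var)."""
    systematic = matmul(matmul(beta, sigma_f), transpose(beta))
    n = len(beta)
    return [[systematic[i][j] + (resid_var[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Hypothetical inputs: N = 3 assets, K = 2 factors.
beta = [[1.0, 0.5], [0.8, -0.2], [1.2, 0.0]]
sigma_f = [[0.04, 0.0], [0.0, 0.01]]
resid_var = [0.02, 0.03, 0.01]
omega = factor_covariance(beta, sigma_f, resid_var)
```

Only the 6 sensitivities, 3 factor (co)variances, and 3 residual variances are inputs here; every pairwise covariance between assets is implied by them rather than estimated separately.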

While the factor-based estimator is expected 
to allow for a reasonable trade-off between sam¬ 
ple risk and model risk, there still remains, 
however, the problem of choosing the "right" 
factor model. One popular approach aims at 
relying as little as possible on strong theoreti¬ 
cal assumptions by using principal components 
analysis (PCA) to determine the underlying 
risk factors from the data. The PCA method is 
based on a spectral decomposition of the sam¬ 
ple covariance matrix, and its goal is to explain 
covariance structures using only a few linear 
combinations of the original stochastic vari¬ 
ables, which will constitute the set of (unob¬ 
servable) factors. 

Bengtsson and Holst (2002) and Fujiwara et al. 
(2006) motivate the use of PCA in a similar way, 
extracting principal components in order to es¬ 
timate expected correlation within MV portfo¬ 
lio optimization. Fujiwara et al. (2006) find that 
the realized risk-return of portfolios based on 
the PCA method outperforms the single-index- 
based one and that the optimization gives a 
practically reasonable asset allocation. Overall, 
the main strength of the PCA approach at this 
stage is that it allows "the data to talk" and 
has them tell the financial modeler what the 
underlying risk factors are that govern most 
of the variability of the assets at each point 
in time. This strongly contrasts with having to 
rely on the assumption that a particular factor 
model is the true pricing model and reduces the 
specification risk embedded in the factor-based
approach while keeping the sample risk
reduction. 
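The spectral decomposition behind PCA can be sketched with a simple power iteration, which extracts the leading principal component of a covariance matrix and the share of total variance it explains. This is only a minimal illustration under stated assumptions: the covariance matrix is hypothetical, only the first component is computed, and a production implementation would rely on a numerical library's eigen-solver.

```python
import math


def leading_component(cov, iters=500):
    """Power iteration: largest eigenvalue and unit eigenvector
    of a symmetric positive semidefinite matrix."""
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the associated eigenvalue.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n))
                 for i in range(n))
    return eigval, v


# Hypothetical 2 x 2 sample covariance matrix.
cov = [[0.04, 0.03], [0.03, 0.09]]
eigval, loadings = leading_component(cov)
explained_share = eigval / sum(cov[i][i] for i in range(len(cov)))
```

The loadings define the linear combination of the original returns that serves as the first implicit factor; how many such components to retain is the question taken up next.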

The question of determining the appropriate 
number of factors to structure the correlation 
matrix is critical for the risk estimation when 
using PCA as a factor model. Several options 
have been proposed to answer this question, 
some of them with more theoretical grounds 
than others. 

As a final note, we need to recognize that the 
discussion is so far cast in a mean-variance set¬ 
ting, which can in principle only be rational¬ 
ized for normally distributed asset returns. In 
the presence of non-normally distributed asset 
returns, optimal portfolio selection techniques 
require estimates for variance-covariance pa¬ 
rameters, along with estimates for higher-order 
moments and comoments of the return dis¬ 
tribution. This is a formidable challenge that 
severely exacerbates the dimensionality prob¬ 
lem already present with mean-variance anal¬ 
ysis. In a recent paper, Martellini and Ziemann 
(2010) extend the existing literature, which has 
mostly focused on the covariance matrix, by in¬ 
troducing improved estimators for the coskew¬ 
ness and cokurtosis parameters. On the one 
hand, they find that the use of these enhanced 
estimates generates a significant improvement 
in investors' welfare. On the other hand, they
also find that when the number of constituents
in the portfolios is large (e.g., exceeding 20),
the increase in sample risk related to
the need to estimate higher-order comoments 
by far outweighs the benefits related to consid¬ 
ering a more general portfolio optimization pro¬ 
cedure. In the end, when portfolio optimization 
is performed on the basis of a large number of 
individual securities, it appears that maximiz¬ 
ing the portfolio Sharpe ratio leads to a better 
out-of-sample return-to-VaR ratio or return-to- 
CVaR ratio compared to a procedure focusing 
on maximizing the return-to-VaR ratio or the 
return-to-CVaR ratio, a result that holds true 
even if improved estimators are used for higher- 
order comoments. 


ROBUST ESTIMATORS FOR 
EXPECTED RETURNS 

While it appears that risk parameters can be 
estimated with a fair degree of accuracy, it 
has been shown (Merton, 1980) that expected 
returns are difficult to obtain with a reason¬ 
able estimation error. What makes the problem 
worse is that optimization techniques are very 
sensitive to differences in expected returns, so 
that portfolio optimizers typically allocate the 
largest fraction of capital to the asset class for 
which estimation error in the expected returns 
is the largest. 5 

In the face of the difficulty of using sample- 
based expected return estimates in a portfolio 
optimization context, a reasonable alternative 
consists in using some risk estimate as a proxy 
for excess expected returns. 6 This approach is 
based on the most basic principle in finance; 
that is, the natural relationship between risk 
and reward. In fact, standard asset pricing the¬ 
ories such as the arbitrage pricing theory as 
proposed by Ross (1976) imply that expected 
returns should be positively related to system¬ 
atic volatility, such as measured through a factor 
model that summarizes individual stock return 
exposure with respect to a number of rewarded 
risk factors. 

More recently, a series of papers have focused 
on the explanatory power of idiosyncratic, as 
opposed to systematic, risk for the cross section 
of expected returns. In particular, Malkiel and 
Xu (2006), extending an insight from Merton 
(1987), show that an inability to hold the 
market portfolio, whatever the cause, will force 
investors to care about total risk to some degree 
in addition to market risk so that firms with 
larger firm-specific variances require higher 
average returns to compensate investors for 
holding imperfectly diversified portfolios. 7 
That stocks with high idiosyncratic risk earn 
higher returns has also been confirmed in a 
number of recent empirical studies, including 
in particular Tinic and West (1986) as well as 
Malkiel and Xu (1997, 2006).


Taken together, these findings suggest that to¬ 
tal risk, a model-free quantity given by the sum 
of systematic and specific risk, should be pos¬ 
itively related to expected return. Most com¬ 
monly, total risk is the volatility of a stock's 
returns. Martellini (2008) has investigated the 
portfolio implications of these findings and has 
found that tangency portfolios constructed on 
the assumption that the cross-section of excess 
expected returns could be approximated by the 
cross-section of volatility posted better out-of- 
sample risk-adjusted performance than their 
market-cap-weighted counterparts. 
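A minimal sketch of this idea follows: expected excess returns are proxied by each stock's volatility, and unconstrained tangency weights are taken proportional to the inverse covariance matrix times that proxy. The covariance matrix is hypothetical, and the sketch omits the weight constraints and decile grouping that an actual index methodology may apply.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (a is assumed nonsingular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def msr_weights_from_total_risk(cov):
    """Tangency weights when the expected excess return of each
    asset is assumed proportional to its volatility."""
    mu_proxy = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    raw = solve(cov, mu_proxy)
    total = sum(raw)
    return [x / total for x in raw]


# Hypothetical case: volatilities of 20% and 30%, zero correlation.
cov = [[0.04, 0.0], [0.0, 0.09]]
weights = msr_weights_from_total_risk(cov)
```

In this uncorrelated special case the weights reduce to inverse-volatility weights (60% and 40% here); with nonzero correlations the full covariance matrix matters.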

More generally, recent research suggests that 
the cross-section of expected returns might be 
best explained by risk indicators taking into ac¬ 
count higher-order moments. Theoretical mod¬ 
els have shown that, in exchange for higher 
skewness and lower kurtosis of returns, in¬ 
vestors are willing to accept expected returns 
lower (and with volatility higher) than those of 
the mean-variance benchmark. 8 More specifi¬ 
cally, skewness and kurtosis in individual stock 
returns (as opposed to the skewness and kurto¬ 
sis of aggregate portfolios) have been shown to 
matter in several papers. High skewness is asso¬ 
ciated with lower expected returns in Barberis 
and Huang (2004), Brunnermeier, Gollier, and 
Parker (2005), and Mitton and Vorkink (2007). 
The intuition behind this result is that investors 
like to hold positively skewed portfolios. The 
highest skewness is achieved by concentrating 
portfolios in a small number of stocks that them¬ 
selves have positively skewed returns. Thus in¬ 
vestors tend to be underdiversified and drive 
up the price of stocks with high positive skew¬ 
ness, which in turn reduces their future ex¬ 
pected returns. Stocks with negative skewness 
are relatively unattractive and thus have low 
prices and high returns. As for kurtosis,
investors prefer low kurtosis, and thus expected
returns should be positively related to kurtosis.
Boyer, Mitton, and Vorkink (2010) and Conrad,
Dittmar, and Ghysels (2008) provide empirical
evidence that individual stocks' skewness and
kurtosis are indeed related to future returns.
An alternative to
direct consideration of the higher moments of 
returns is to use a risk measure that aggregates 
the different dimensions of risk. In this line, Bali 
and Cakici (2004) show that future returns on 
stocks are positively related to their value-at- 
risk and Estrada (2000) and Chen, Chen, and 
Chen (2009) show that there is a relationship 
between downside risk and expected returns. 


IMPLICATIONS FOR 
BENCHMARK PORTFOLIO 
CONSTRUCTION 

Once careful estimates for risk and return 
parameters have been obtained, one may then 
design efficient proxies for an asset class bench¬ 
mark with an attractive risk-return profile. For 
example, Amenc et al. (2011) find that efficient
equity benchmarks designed on the basis of robust
estimates for risk and expected return parameters
substantially outperform, in terms of
risk-adjusted performance, the market cap
weighted indexes that are often used as default
options for investment benchmarks in spite of
their well-documented lack of efficiency. 9

Table 1, borrowed from Amenc et al. (2011), 
shows summary performance statistics for an 
efficient index constructed according to the 
aforementioned principles. For the average re¬ 
turn, volatility, and the Sharpe ratio, we re¬ 
port differences with respect to cap-weighting 
and assess whether this difference is statistically 
significant. 

Table 1 shows that the efficient weighting 
of index constituents leads to higher average 
returns, lower volatility, and a higher Sharpe 
ratio. All these differences are statistically sig¬ 
nificant at the 10% level, whereas the differ¬ 
ence in Sharpe ratios is significant even at the 
0.1% level. Given the data, it is highly unlikely 
that the unobservable true performance of ef¬ 
ficient weighting was not different from that 
of capitalization weighting. Economically, the
performance difference is pronounced, as the
Sharpe ratio increases by about 70%.

Table 1 Risk and Return Characteristics for the Efficient Index

| Index | Ann. Average Return (compounded) | Ann. Standard Deviation | Sharpe Ratio (compounded) | Information Ratio | Tracking Error |
|---|---|---|---|---|---|
| Efficient index | 11.63% | 14.65% | 0.41 | 0.52 | 4.65% |
| Cap-weighted | 9.23% | 15.20% | 0.24 | 0.00 | 0.00% |
| Difference (efficient minus cap-weighted) | 2.40% | -0.55% | 0.17 | - | - |
| p-value for difference | 0.14% | 6.04% | 0.04% | - | - |

The table shows risk and return statistics for portfolios constructed with the same set of constituents as the cap-weighted
index. Rebalancing is quarterly subject to an optimal control of portfolio turnover (by setting the reoptimization
threshold to 50%). Portfolios are constructed by maximizing the Sharpe ratio given an expected return estimate
and a covariance estimate. The expected return estimate is set to the median total risk of stocks in the same decile
when sorting by total risk. The covariance matrix is estimated using an implicit factor model for stock returns.
Weight constraints are set so that each stock's weight is between 1/2N and 2/N, where N is the number of index
constituents. The p-values for differences are computed using the paired t-test for the average, the F-test for volatility,
and a Jobson-Korkie test for the Sharpe ratio. The results are based on weekly return data from 01/1959 to 12/2008.
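The figures reported in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below recomputes the differences and the relative Sharpe ratio gain from the reported values; the information ratio is approximated here as the return difference divided by the tracking error, which is an assumption about how the table's figure was obtained.

```python
# Values as reported in Table 1 (percentages in percent units).
efficient = {"ret": 11.63, "vol": 14.65, "sharpe": 0.41, "te": 4.65}
cap_weighted = {"ret": 9.23, "vol": 15.20, "sharpe": 0.24}

ret_diff = round(efficient["ret"] - cap_weighted["ret"], 2)
vol_diff = round(efficient["vol"] - cap_weighted["vol"], 2)
sharpe_diff = round(efficient["sharpe"] - cap_weighted["sharpe"], 2)

# Relative improvement in the Sharpe ratio, in percent.
sharpe_gain_pct = 100 * (efficient["sharpe"] / cap_weighted["sharpe"] - 1)

# Approximate information ratio: excess return over tracking error.
info_ratio = ret_diff / efficient["te"]

if __name__ == "__main__":
    print(ret_diff, vol_diff, sharpe_diff)
    print(round(sharpe_gain_pct, 1), round(info_ratio, 2))
```

The recomputed differences match the table, the Sharpe ratio gain is roughly 71%, and the approximate information ratio rounds to the reported 0.52.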


ASSET ALLOCATION
MODELING: PUTTING THE
EFFICIENT BUILDING
BLOCKS TOGETHER

After efficient benchmarks have been designed 
for various asset classes, these building blocks 
can be assembled in a second step, the asset allo¬ 
cation step, to build a well-designed multiclass 
performance-seeking portfolio. 

While the methods we have discussed so 
far can in principle be applied in both con¬ 
texts, a number of key differences should be 
emphasized. 

In the asset allocation context, the number 
of constituents is small, and using time- and 
state-dependent covariance matrix estimates 
becomes reasonable, while they do not nec¬ 
essarily improve the situation in portfolio 
construction contexts when the number of 
constituents is large. Similarly, while it is not 
feasible in general, as explained above, to per¬ 
form portfolio optimization with higher-order 
moments in a portfolio construction context 
where the number of constituents is typically 


large, it is reasonable to go beyond mean- 
variance analysis in an asset allocation context 
where the number of constituents is limited. 

Furthermore, in an asset allocation context, 
the universe is not homogeneous, which has
implications for expected returns and covariance
estimation. In terms of a covariance matrix, it 
will not prove easy to obtain a universal factor 
model for the whole investment universe. In 
this context, it is arguably better to use statisti¬ 
cal shrinkage toward, say, the constant correla¬ 
tion model, as opposed to using a factor model 
approach. 10 
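The mechanics of shrinking toward the constant correlation model can be sketched as follows. The target keeps each sample variance but replaces every pairwise correlation with the average sample correlation, and the shrunk estimate is a weighted average of target and sample matrix. The shrinkage intensity is set by hand here as an assumption; the cited approach estimates an optimal intensity from the data, which this sketch does not attempt.

```python
import math


def constant_correlation_target(cov):
    """Same variances as cov, but every pairwise correlation
    replaced by the average sample correlation."""
    n = len(cov)
    vol = [math.sqrt(cov[i][i]) for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    avg_corr = sum(cov[i][j] / (vol[i] * vol[j])
                   for i, j in pairs) / len(pairs)
    return [[cov[i][i] if i == j else avg_corr * vol[i] * vol[j]
             for j in range(n)] for i in range(n)]


def shrink(cov, intensity):
    """Convex combination: intensity * target + (1 - intensity) * sample."""
    target = constant_correlation_target(cov)
    n = len(cov)
    return [[intensity * target[i][j] + (1 - intensity) * cov[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical sample matrix: all volatilities 20%,
# correlations 0.2, 0.5, and 0.8.
sample = [[0.04, 0.008, 0.02],
          [0.008, 0.04, 0.032],
          [0.02, 0.032, 0.04]]
shrunk = shrink(sample, intensity=0.5)
```

With an intensity of 0.5 the correlations are pulled halfway toward their average of 0.5 (to 0.35, 0.5, and 0.65), while the variances are left unchanged.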


KEY POINTS 

• Modern portfolio theory advocates the
separation of the management of performance
and risk control objectives. In the context of
asset allocation decisions, the fund separation
theorem provides rational support for
liability-driven investment techniques whose
solutions involve the design of a customized
liability-hedging portfolio and the design of
a performance-seeking portfolio.

• The sole purpose of the liability-hedging
portfolio is to hedge away as effectively as
possible the impact of unexpected changes in risk
factors affecting liability values (most notably
interest rate and inflation risks); the purpose
of the performance-seeking portfolio is to
provide investors with an optimal risk-return
trade-off.

• An implication of the liability-driven
investment paradigm is that one should distinguish
two different levels of asset allocation
decisions: (1) decisions involved in the design
of the performance-seeking or the
liability-hedging portfolio (design of better
building blocks), and (2) decisions involved in
the optimal split between the
performance-seeking portfolio and the
liability-hedging portfolio (design of advanced
asset allocation decisions).

• Although modern portfolio theory provides
some useful guidance with respect to the
optimal design of performance-seeking portfolios
that would best suit investors' needs,
in practice, investors end up holding more
or less imperfect proxies for the truly
optimal performance-seeking portfolio, if only
because of the presence of parameter
uncertainty, which makes it impossible to obtain
a perfect estimate for the maximum Sharpe
ratio portfolio.

• The allocation to the performance-seeking
portfolio (PSP) is a function of two objective
parameters, the PSP volatility and the PSP
Sharpe ratio, and one subjective parameter, the
investor's risk aversion. The optimal allocation
to the PSP is inversely proportional to the
investor's risk aversion.

• In practice, the success in the implementation
of a theoretical model relies not only upon its
conceptual grounds but also on the reliability
of the inputs of the model. In the case of
mean-variance optimization the results will
highly depend on the quality of the parameter
estimates: the covariance matrix and the
expected returns of assets.

• Several improved estimates for the covariance
matrix have been proposed: the
factor-based approach, constant correlation
approach, and statistical shrinkage approach.

• The key problem in covariance matrix
estimation is the curse of dimensionality.
Consequently, at the estimation stage, the
challenge is to reduce the number of factors that
come into play. In general, a multifactor model
decomposes the (excess) return (in excess of the
risk-free asset) of an asset into its expected
rewards for exposure to the "true" risk factors.

• The problem of choosing the right factor
model still remains. The statistical technique
of principal components analysis is commonly
used to determine the underlying risk
factors from the data.

• While it appears that risk parameters can
be estimated with a fair degree of accuracy,
it has been shown that expected returns
are difficult to obtain with a reasonable
estimation error. What makes the problem
worse is that optimization techniques are
very sensitive to differences in expected
returns, so that portfolio optimizers typically
allocate the largest fraction of capital to the
asset class for which estimation error in the
expected returns is the largest. In the face of
the difficulty of using sample-based expected
return estimates in a portfolio optimization
context, a reasonable alternative consists in
using some risk estimate as a proxy for excess
expected returns.

• Research suggests that the cross-section of
expected returns might be best explained by risk
indicators taking into account higher-order
moments. Theoretical models have shown
that, in exchange for higher skewness and
lower kurtosis of returns, investors are
willing to accept expected returns lower (and
volatility higher) than those of the
mean-variance benchmark.

• Once careful estimates for risk and return
parameters have been obtained, one may then
design efficient proxies for an asset class
benchmark with an attractive risk-return
profile. After efficient benchmarks have been
designed for various asset classes, these
building blocks can be assembled in a second
step, the asset allocation step, to build
a well-designed multiclass performance-seeking
portfolio.

NOTES 

1. For more details, see Appendix A.1 in
Amenc, Goltz, Martellini, and Milhau
(2011).

2. See, for example, Haugen and Baker (1991), 
Grinold (1992), or Amenc, Goltz, and Le 
Sourd (2006). 

3. De Miguel et al. (2009). 

4. Another key challenge is the presence of 
nonstationary risk parameters, which can 
be accounted for with conditional fac¬ 
tor models capturing time-dependencies 
(e.g., GARCH-type models) and state- 
dependencies (e.g., Markov regime switch¬ 
ing models) in risk parameter estimates. 

5. See Britten-Jones (1999) or Michaud (1998). 

6. This discussion focuses on estimating the 
fair neutral reward for holding risky assets. 
If one has access to active views on expected 
returns, one may use a disciplined approach 
(e.g., the Black-Litterman model) to com¬ 
bine the active views with the neutral esti¬ 
mates. 

7. See also Barberis and Huang (2001) for 
a similar conclusion from a behavioral 
perspective. 

8. See Rubinstein (1973) and Kraus and 
Litzenberger (1976). 

9. See, for example, Haugen and Baker (1991) 
and Grinold (1992). 

10. See Ledoit and Wolf (2003, 2004). 
Abstract: Asset pricing is mainly about transforming asset payoffs into prices. The most important
principles of valuation are no-arbitrage, law of one price, and linear positive state pricing. These 
principles imply asset prices are linearly related to their discounted payoffs in which the stochastic 
discount factor is a function of investors' risk tolerance and economy-wide risks. The arbitrage 
pricing theory, the capital asset pricing model, and the consumption asset pricing model, among 
others, are special cases of the discount factor models. 


In this entry, we discuss the general principles 
of asset pricing. Our focus here is to analyze 
asset pricing in a more general setup. Due to its 
generality, this entry is inevitably more abstract 
and challenging, but important for understand¬ 
ing the foundations of modern asset pricing 
theory. First, by extending the state-dependent
contingent claims from two possible states
to an arbitrary number of states, we
introduce the economic notions of complete
market, the law of one price, and arbitrage.
Then, we provide the fundamental theorem of
asset pricing that ties these concepts to asset
pricing relations. Subsequently, we discuss
stochastic discount factor models, which provide
a unified framework for various asset pricing
theories and include the capital asset pricing
model (CAPM) (see Sharpe, 1964; Lintner, 1965;
Mossin, 1966) and arbitrage pricing theory (APT)
(see Ross, 1976) as special cases.


ONE-PERIOD FINITE STATE 
ECONOMY 

If a security has payoffs, denoted by x,

\[ x = \begin{cases} \$1, & \text{up} \\ 0, & \text{down} \end{cases} \]

it means that the economy will have two states 
next period, up or down, and the security will 
have a value of $1 or 0 in the up and down 
states, respectively. Similarly, as a simple exten¬ 
sion, we can think that the economy has three 
states next period: good, normal, and bad. Then, 
any security in this economy must have three 
payoffs corresponding to the three states. For 
example,

\[ x = \begin{cases} \$3, & \text{good} \\ \$2, & \text{normal} \\ \$1, & \text{bad} \end{cases} \]

is a security in the three-state economy with
values of $3, $2, and $1, respectively, in the three
states. For notational brevity, we sometimes 
use the transposed vector dropping the dollar 
sign, (3,2,1)', to denote the payoff of this secu¬ 
rity, where the apostrophe (') is the symbol for 
transpose. 

In general, we can consider an economy with 
an arbitrary number of s states and N securities. 
In this economy, the payoff of any security can 
be expressed as 


\[ x = \begin{cases} v_1, & \text{state } 1 \\ v_2, & \text{state } 2 \\ \;\vdots & \\ v_s, & \text{state } s \end{cases} \]


where the v's are the values of the security in the
s states. For example, suppose s = 4; then
a security with payoff (1.10, 1.10, 1.10, 1.10)' is
a well-defined security in our four-state
economy. Suppose further that the price of this
security is $1; then this security earns $0.10 or 10%
($0.10/$1) regardless of the state. Hence, this
security is risk free with a rate of return of 10%
regardless of the state of the economy.

Suppose now that there is a total of N
securities, \(x_1, \ldots, x_N\), in an economy of s states. We
can summarize the payoffs next period of all
the N securities by using the following matrix:


\[ X = \begin{pmatrix} v_{11} & \cdots & v_{1N} \\ \vdots & \ddots & \vdots \\ v_{s1} & \cdots & v_{sN} \end{pmatrix} \tag{2} \]


where each of the N columns contains the payoffs
of one security in the s states. It is evident that matrix
X summarizes payoffs of all the securities and 
determines their future values completely. 

The asset pricing question is how to determine
the price for each of the securities.
Mathematically, the pricing mechanism can be viewed
as a mapping from the j-th security (or the
s-vector, the payoff obtained from owning the
security), to a price \(p_j\) that an investor is
willing to pay today:

\[ p(x_j) = p_j \tag{3} \]


As it turns out, simple economic principles 
imply many useful properties for the mapping, 
which comprises the general principles of asset 
pricing to be discussed below. 


PORTFOLIOS AND MARKET 
COMPLETENESS 

In evaluating securities, a key principle is to 
evaluate them as a whole, and not in isolation. 
To do so, consider a portfolio of the N securities 

\[ x_p = \varphi_1 x_1 + \varphi_2 x_2 + \cdots + \varphi_N x_N \tag{4} \]

where the \(\varphi\)'s are portfolio weights that now
represent the units of the securities we purchase
in the portfolio, and \(x_p\) is the payoff of the
portfolio, which simply adds up the individual
values. Note that the weights can be either positive
or negative. A negative weight on a security is a
short position. In the case where no short sales
are allowed, the weights are restricted to be
nonnegative.

Note that the portfolio weights are often the
percentages of money we invest in the
securities, where prices are given and we are
interested in the return on a portfolio. In contrast,
we focus here on the weights in terms of units
because we are interested in determining the
prices from payoffs. However, once the prices
are given, the weights in terms of either units
or percentages are equivalent. To see this, if we
express a portfolio in terms of returns, denoted
by R, rather than payoffs as above, then the
portfolio return is


\[ R_p = w_1 R_1 + w_2 R_2 + \cdots + w_N R_N \tag{5} \]

where

\[ R_j = \frac{x_j}{p_j} \]

is the gross return on security j, which is one
plus the usual percentage return. The relation
between the \(\varphi\)'s and the w's is

\[ w_j = \frac{\varphi_j p_j}{\varphi_1 p_1 + \cdots + \varphi_N p_N} \tag{6} \]

where the numerator is the amount of money
allocated to security j, and the denominator is 
the total amount of money invested in the secu¬ 
rities, so that the w's are the percentage weights 
as before. 
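The equivalence between unit weights and percentage weights can be checked numerically. In the sketch below, the units, prices, and state payoffs are hypothetical; the point is only that the gross portfolio return is the same whether it is computed from units and payoffs or from percentage weights and individual gross returns.

```python
def percentage_weights(units, prices):
    """Money invested in each security divided by total money invested."""
    invested = [u * p for u, p in zip(units, prices)]
    total = sum(invested)
    return [amount / total for amount in invested]


def gross_returns(payoffs, prices):
    """Gross return of each security in a given state: payoff / price."""
    return [x / p for x, p in zip(payoffs, prices)]


# Hypothetical two-security example, payoffs shown for one state.
units = [2.0, 1.0]
prices = [1.0, 2.0]
payoffs_in_state = [1.5, 2.2]

w = percentage_weights(units, prices)
r = gross_returns(payoffs_in_state, prices)

# Portfolio gross return computed two ways.
return_from_weights = sum(wi * ri for wi, ri in zip(w, r))
return_from_units = (sum(u * x for u, x in zip(units, payoffs_in_state))
                     / sum(u * p for u, p in zip(units, prices)))
```

Both calculations give a gross return of 1.3 in this state, so nothing is lost by working in units once prices are known.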

Consider the following two securities in a 
two-state economy: 

\[ x_1 = \begin{cases} 1, & \text{up} \\ 0, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 0, & \text{up} \\ 1, & \text{down} \end{cases} \]


Suppose their prices today are $1. Then, with 
an investment of $1 that buys 0.5 unit each of 
the securities, one obtains a portfolio 


\[ x = \varphi_1 x_1 + \varphi_2 x_2 = 0.5x_1 + 0.5x_2 \]

with payoff

\[ x = \begin{cases} 0.5, & \text{up} \\ 0.5, & \text{down} \end{cases} \]

One can also buy 2 units of the first security, 
and short one unit of the second security; then 
the resulting portfolio is 

\[ x = 2x_1 + (-1)x_2 \]

with payoff

\[ x = \begin{cases} 2, & \text{up} \\ -1, & \text{down} \end{cases} \]

Note that the payoff of the portfolio is neg¬ 
ative, —$1, in the down state. This means that 
when the economy is down, one has to buy back 
the second security at a price of $1 (its value in
the down state) to cover the short position. The 
net cost is $1, the payoff of the portfolio in the 
down state. In contrast to the portfolio where 
equal dollar amounts are invested in both secu¬ 
rities, this portfolio with short sales permitted 
has a higher payoff of $2 in the up state, which 
compensates for the loss in the down state. 
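The two portfolios just described can be reproduced by multiplying the payoff matrix by the vector of units held. The helper name below is illustrative; the payoff matrix has one row per state and one column per security, following the layout used for X earlier.

```python
def portfolio_payoff(payoff_matrix, units):
    """State-by-state payoff of a portfolio: each row of the payoff
    matrix (one state) dotted with the units held."""
    return [sum(x * u for x, u in zip(row, units)) for row in payoff_matrix]


def portfolio_cost(prices, units):
    """Money needed today to buy the given units at the given prices."""
    return sum(p * u for p, u in zip(prices, units))


# Rows: up state, down state. Columns: security 1, security 2.
X = [[1, 0],
     [0, 1]]
prices = [1, 1]

equal_units = [0.5, 0.5]
long_short = [2, -1]

equal_payoff = portfolio_payoff(X, equal_units)
long_short_payoff = portfolio_payoff(X, long_short)
```

Both portfolios cost $1 today (`portfolio_cost` returns 1 in each case), yet they distribute that dollar very differently across the up and down states.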


Redundant Assets

A portfolio is uniquely determined by its
portfolio weights, which can be summarized by
the N-vector

\[ \varphi = (\varphi_1, \varphi_2, \ldots, \varphi_N)' \]

The portfolio's payoffs are then uniquely
determined by the s-vector

\[ \text{Payoff} = X\varphi \tag{7} \]

For example, one can easily verify that this is
true in our first illustration in which X is simply
equal to the identity matrix.

A portfolio \(\varphi\) is said to be replicable if we can
find another portfolio with different weights, \(\omega\),
such that their payoffs are equal

\[ X\omega = X\varphi, \qquad \omega \neq \varphi \tag{8} \]

In particular, if one of the x's can be replicated
by a portfolio of others, it is called a redundant
asset or redundant security. In any economy,
redundant securities can be eliminated without
affecting the properties of all the possible
portfolios of the remaining assets. Sometimes, in
order to distinguish the securities that define
the economy (the x's) from all their possible
portfolios, we will refer to the x's as primitive
securities because all other portfolios are
composed of them.

Consider the following two-state economy

\[ x_1 = \begin{cases} 1, & \text{up} \\ 0, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 2, & \text{up} \\ 0, & \text{down} \end{cases} \]

with prices of $1 and $2, respectively, today.
The portfolio with weight vector \(\varphi = (0.5, 0.5)'\) is

\[ x = 0.5x_1 + 0.5x_2 \]

This portfolio is replicable because it is also
equal to

\[ x = 1.5x_1 \]

The primitive asset \(x_2\) is redundant here
because its payoff is simply double the payoff of
the first asset.


Complete Market 

In an economy with N risky securities and s states, a security market is formed if arbitrary buying and shorting are allowed, which creates infinitely many possible portfolios. We say the market is complete, and refer to it as a complete market, if, for any possible payoff, there is a portfolio of the primitive securities that replicates it. That is, for any desired payoff x, we can find portfolio weights such that

\[ \varphi_1 x_1 + \varphi_2 x_2 + \cdots + \varphi_N x_N = x \tag{9} \]

A complete market not only allows investors to obtain any desired payoff in any state (at a price), but also permits unique security pricing, as will be clear later.

For example, the two securities in our first example form a complete market. This is because for any possible payoff

\[ x = \begin{cases} a, & \text{up} \\ b, & \text{down} \end{cases} \]

the portfolio

\[ a x_1 + b x_2 \]

yields the payoff. For instance, if an investor wants a $2 payoff in the up state and $3 in the down state, buying 2 units of the first security and 3 units of the second security provides exactly what is desired. However, the two securities in our second example above form an incomplete market. This is because no portfolio consisting of the two securities can create a payoff of $1 in the down state.

In terms of matrix and vector notation, a complete market requires that, for any payoff vector y, we can find portfolio weights \(\varphi\) to solve the linear equation with \(\varphi\) as the unknown variable

\[ X\varphi = y \tag{10} \]

Note that X is an s by N matrix and y is an s-vector. Recall from linear algebra that the number of independent columns of X is called the rank of the matrix X, denoted as rank(X) below. If rank(X) = s, the linear combinations of these columns will generate all possible s-vectors. That is, a portfolio of those securities whose payoffs are those independent columns is capable of producing any possible payoff, so the market must be complete. Conversely, if the above linear equation has a solution for any y, it must do so for s independent y's, say, the s columns of the s-dimensional identity matrix, which is an s by s matrix with diagonal elements 1 and zero elsewhere. For example, if s = 2, the y's correspond to the payoffs of the two securities in our first example. This means that the linear combinations of the columns of X are capable of yielding s independent columns. So, the number of independent columns must be greater than or equal to s. Since X is an s by N matrix, its number of independent columns, rank(X), cannot be greater than s. Then the only possibility is that it is equal to s.

We can summarize our discussion in the following proposition:

Market Completeness Proposition: The market is complete if and only if the rank of the s by N payoff matrix X is s, that is,

\[ \operatorname{rank}(X) = s \tag{11} \]

Consequently, for s possible states, we need at least as many primitive assets as states, \(N \ge s\), for the market to be complete. One can verify that the rank condition holds for the two securities in our first example, but not in our second example.
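The rank condition can be checked mechanically. The sketch below is one possible implementation, not the authors' code: it computes the rank by Gaussian elimination in exact arithmetic and applies it to the payoff matrices of the two examples. The helper names `rank` and `is_complete` are assumptions made for this illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_complete(X):
    """The market is complete iff rank(X) equals the number of states (rows)."""
    return rank(X) == len(X)

# Rows are states (up, down); columns are securities.
X_first = [[1, 0],
           [0, 1]]   # first example: identity payoff matrix
X_second = [[1, 2],
            [0, 0]]  # second example: nothing pays in the down state

print(is_complete(X_first), is_complete(X_second))
```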


THE LAW OF ONE PRICE AND 
LINEAR PRICING 

In this section, we first discuss the law of one 
price and its relation to the linear pricing rule, 
and then introduce the concept of state price 
and relate it to the law of one price. 

Linear Pricing 

The law of one price (LOP) says that two assets with identical payoffs must have the same price. In international trade, in the absence of tariffs and transportation costs, an apple sold in New York City must have the same price as an apple sold in London after converting the money into the same currency. This provides an economic channel through which to tie the currencies together. In the financial markets, the LOP says that we should not be able to profit from buying a security at a lower price and selling the same security at a higher one.

Mathematically, under LOP, if two portfolios have the same payoffs

\[ X\varphi = X\omega \tag{12} \]

then their prices today must be the same

\[ p(X\varphi) = p(X\omega) \tag{13} \]

where, as we recall from our earlier discussion, p is the mapping that maps the payoff of an asset or of a portfolio into its price.

A simple necessary and sufficient condition for the LOP to hold is that every portfolio with zero payoff must have zero price. To see the necessity, suppose that there is an asset with zero payoff that sells at a nonzero price, say, $0.01. We can combine this asset with any other asset to form a new asset without changing the payoff, but the price of this new asset is $0.01 higher than before packaging the two assets. The LOP says that the old one and the new one must have the same price, which is, of course, a contradiction. Conversely, if two portfolios with an identical payoff were sold at different prices, say $2.01 and $2, buying the one with the price of $2.01 and shorting the one with a price of $2 creates an asset with zero payoff, but a price of $0.01. This is not possible under the zero price condition.

The LOP essentially prevents an asset from having multiple prices, which gives rise to its name. Only when it is true is it possible for there to be rational pricing with a unique price. An important theoretical implication of the LOP is that the price mapping, the p function, must be linear:

\[ p[X(a\varphi + b\omega)] = a\,p(X\varphi) + b\,p(X\omega) \tag{14} \]

That is, the price of a portfolio must be equal to the same portfolio of the component prices. Intuitively, the price of two burgers must be two times the price of one, and the price of a burger and a Coke must be the same as the sum of the two individual prices. The linear pricing rule is fundamental in finance. It implies that, if the share price of a company is the value of its future cash flows, then no matter how one slices the cash flows, the price will remain unchanged and is equal to the values of the slices added together.

The linear pricing rule clearly implies the LOP. The price mapping is uniquely determined by the payoffs only, and so it must be the case that the prices are identical if the payoffs are. Conversely, if the LOP is true, the portfolio priced on the left-hand side of equation (14) has the same payoff as the combination of portfolios on the right-hand side, and hence their prices must be the same. A formal statement of this is as follows:

Linear Pricing Rule: The law of one price is valid if and only if the linear pricing rule is true.

State Price 

In asset pricing, the concept of a state price is fundamental. In our states economy, there are s states. The state price in state i is the price investors are willing to pay today to obtain one unit of payoff in that state, and nothing in other states. The state price is also known as the Arrow-Debreu price, named in honor of the originators. A state price vector will then be an s-vector of all the prices in all the states. If there exists a state price vector \(q = (q_1, q_2, \ldots, q_s)'\), then we can write the asset price for each primitive security as

\[ p_j = q_1 v_{1j} + q_2 v_{2j} + \cdots + q_s v_{sj} \tag{15} \]

In words, this equation says that the price of the j-th security is equal to its payoffs in each of the states times the price per unit value in that state.

The state price is not only useful for linking the payoffs of the primitive securities to their prices, but also useful to price any new assets, including any other contingent claims or derivatives in the economy. All we need to do is to identify the payoffs of these assets and then sum the products of the payoffs with their state prices to obtain asset prices.

The question is whether the state price vector always exists. We rewrite the state pricing relation (15) in matrix form as

\[ p = X'q \tag{16} \]

The existence of the state price vector q is the existence of a solution q to the linear equation given by (16). In our states economy here, we can show that the LOP is necessary and sufficient for the existence of the state price, while in more complex economies, say those with an infinite number of assets and an infinite number of states, some auxiliary condition may be needed.

Existence of State Price Condition: The law of one price is valid if and only if the state price vector exists.


The proof of the above follows from linear algebra. If the state price vector exists, then for any two portfolios with identical payoffs, \(X\varphi = X\omega\),

\[ p'\varphi = q'X\varphi = q'X\omega = p'\omega \]

which says that the price of the portfolio with weights \(\varphi\) is the same as the price of another portfolio as long as their payoffs are identical. Conversely, if the LOP is true, then for any portfolio weights \(\varphi\) with zero payoff, that is, satisfying \(X\varphi = 0\), we must have zero price, or \(p'\varphi = 0\). This means that p is orthogonal to every vector that is orthogonal to the rows of X. Decomposing p in the N-dimensional space into a component in the span of the rows of X and a component orthogonal to that span, the orthogonal component must be zero, so p must be a linear combination of the columns of X'. The combination coefficients are exactly equal to q, which is what we are looking for. The proof is therefore complete.

As an example, consider the following two securities in a two-state economy:

\[ x_1 = \begin{cases} 1, & \text{up} \\ 0, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 2, & \text{up} \\ 0, & \text{down} \end{cases} \]

where the first security has a price of $1 and the second of $2. Clearly the prices are consistent with the LOP. In this case, a state price vector of (1, 0)' can price all portfolios of the two securities:

\[ 1 = 1 \times 1 + 0 \times 0 \]

and

\[ 2 = 1 \times 2 + 0 \times 0 \]

Another state price vector, (1, 2)', can also do the same. A more subtle case is an economy in which

\[ x_1 = \begin{cases} 1, & \text{up} \\ 1, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 2, & \text{up} \\ 2, & \text{down} \end{cases} \]

with the same prices of $1 and $2. Then (0.5, 0.5)' and (0.2, 0.8)' both, among others, price the two primitive securities and all their portfolios correctly.
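The non-uniqueness of state prices in these two incomplete markets can be confirmed numerically. The following sketch applies the pricing relation (price of security j equals the sum over states of state price times payoff) to each candidate vector; the function name `prices_from_state_prices` is hypothetical.

```python
from fractions import Fraction as F

def prices_from_state_prices(X, q):
    """Return the security prices implied by state prices q.

    X has one row per state and one column per security, so the price of
    security j is the sum over states i of q[i] * X[i][j].
    """
    n_securities = len(X[0])
    return [sum(q[i] * row[j] for i, row in enumerate(X)) for j in range(n_securities)]

target_prices = [1, 2]

# First economy: both securities pay only in the up state.
X_a = [[1, 2],
       [0, 0]]
candidates_a = [[1, 0], [1, 2]]

# Second economy: both securities pay the same amount in each state.
X_b = [[1, 2],
       [1, 2]]
candidates_b = [[F(1, 2), F(1, 2)], [F(1, 5), F(4, 5)]]

all_a = all(prices_from_state_prices(X_a, q) == target_prices for q in candidates_a)
all_b = all(prices_from_state_prices(X_b, q) == target_prices for q in candidates_b)
print(all_a, all_b)
```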

Under what conditions will the state price be unique? To find the conditions, recall the matrix form of the state pricing relation

\[ p = X'q \]

The LOP is equivalent to the existence of the state price vector q. If the market is in addition complete, then q in the above equation can be uniquely solved as

\[ q = (XX')^{-1}Xp \tag{17} \]

Note that X is s by N, so its inverse is undefined unless s = N. But the inverse of the s by s matrix, XX', is well defined. Equation (17) leads to our next proposition.


Uniqueness of State Price Proposition: If the law 
of one price holds, and if the market is 
complete, the state price must exist and be 
unique. 


For example, consider the following two securities in a two-state economy

\[ x_1 = \begin{cases} 1, & \text{up} \\ 2, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 3, & \text{up} \\ 4, & \text{down} \end{cases} \]

where the first security has a price of $4 and the second of $10. We can check that both the rank and LOP conditions are true. The unique state price vector is then given by equation (17),

\[ \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} 5 & -3.5 \\ -3.5 & 2.5 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \begin{pmatrix} 4 \\ 10 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]

It can be verified that these prices indeed work for pricing the two primitive securities.
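Equation (17) can be evaluated directly for this example. The sketch below is an illustrative implementation under the assumption that XX' is nonsingular (that is, the market is complete); it solves the system (XX')q = Xp by Gauss-Jordan elimination rather than forming the inverse. The helper names `solve`, `transpose`, and `matmul` are introduced here and are not from the text.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve the square system A q = b by Gauss-Jordan elimination.

    Assumes A is nonsingular; a singular A makes the pivot search fail.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * c for a, c in zip(M[i], M[col])]
    return [row[-1] for row in M]

# Rows are states (up, down); columns are the two securities.
X = [[1, 3],
     [2, 4]]
p = [4, 10]

XXt = matmul(X, transpose(X))                          # s-by-s matrix XX'
Xp = [sum(v * pj for v, pj in zip(row, p)) for row in X]
q = solve(XXt, Xp)                                     # equation (17)

# Check that q reproduces the prices: p = X'q.
implied_prices = [sum(qi * row[j] for qi, row in zip(q, X)) for j in range(2)]
print(q, implied_prices)
```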


ARBITRAGE AND POSITIVE 
STATE PRICING 

The assumption of the absence of arbitrage is the foundation upon which asset pricing theories rely. When there are any free lunches or what economists refer to as arbitrage opportunities, asset prices are not rational. Investors are likely to be able to correct the prices by exploiting the arbitrage opportunities, and eventually these opportunities will disappear, and the prices will reflect their true values. Asset pricing theory is largely concerned with these equilibrium true values.

In our states economy, the concept of arbitrage can be formally defined. There are two types of arbitrage. The first type exists if there is a portfolio strategy that requires no investment today (i.e., referred to earlier as a zero-investment strategy) and yet yields nonnegative payoffs in the future, with a positive payoff in at least one of the states (that is, the payoffs are not identical to zero). Mathematically, this type of arbitrage can be expressed as

\[ X\varphi \ge 0, \text{ and not equal to zero} \]

with

\[ p_1\varphi_1 + p_2\varphi_2 + \cdots + p_N\varphi_N \le 0 \]

The second type of arbitrage is one in which a portfolio strategy earns money today, and yet has no future obligations. We can express this mathematically as follows:

\[ X\varphi \ge 0 \]

with

\[ p_1\varphi_1 + p_2\varphi_2 + \cdots + p_N\varphi_N < 0 \]

Consider as an example the following two securities in a two-state economy:

\[ x_1 = \begin{cases} 1, & \text{up} \\ 2, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 2, & \text{up} \\ 4.1, & \text{down} \end{cases} \]

with prices $1 and $2. If we follow a strategy that involves shorting two units of the first security and buying one unit of the second security, then our net investment will be zero, but the payoffs will be

\[ -2 \times x_1 + 1 \times x_2 = \begin{pmatrix} 0 \\ 0.1 \end{pmatrix} \]

This is an arbitrage of the first type. However, there is no arbitrage of the second type. This is because for any weights \(\varphi_1\) and \(\varphi_2\), if the cost is negative, that is,

\[ \varphi_1 + 2\varphi_2 < 0 \]

then the payoff in the up state of the portfolio,

\[ \varphi_1 + 2\varphi_2 \]

will be negative too.

To illustrate the second type, consider the following two securities in a two-state economy:

\[ x_1 = \begin{cases} 1, & \text{up} \\ -1, & \text{down} \end{cases}, \qquad x_2 = \begin{cases} 2, & \text{up} \\ -2, & \text{down} \end{cases} \]

with prices $1 and $1.9. If we short two units of the first security and buy one unit of the second security, then our net investment will be

\[ (-2) \times 1 + 1 \times 1.9 = -0.1 \]

but the payoffs will be

\[ -2 \times x_1 + 1 \times x_2 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

This is an arbitrage of the second type. However, there is no arbitrage of the first type. This is because for any weights \(\varphi_1\) and \(\varphi_2\), the arbitrage requires the portfolio payoffs be nonnegative

\[ \varphi_1 + 2\varphi_2 \ge 0 \]

and

\[ -\varphi_1 - 2\varphi_2 \ge 0 \]

in the two states, respectively. The only payoff that is nonnegative in both states is the zero payoff in this case. So, there cannot be an arbitrage of the first type.

Note that the pricing operator maps the payoffs of an asset to its price, and in particular the state price of a state is the price it assigns to a payoff of $1 in that state and nothing in other states. If the state price in one state is zero, this will clearly be an arbitrage opportunity as an investor can get future payoffs in this state for paying a zero price today. To rule out arbitrage opportunities in the economy, it is hence necessary to require that the state prices be positive. When the pricing operator is both linear and implies positive state prices, we call it a positive linear pricing rule. As it turns out below, the existence of such a positive linear pricing rule is equivalent to the absence of arbitrage opportunities in the economy.
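The two definitions of arbitrage translate directly into a check on a portfolio's cost and payoffs. The sketch below applies them to the two examples above, taking the down-state payoff of the second security in the second example to be -2, as implied by the nonnegativity condition \(-\varphi_1 - 2\varphi_2 \ge 0\). The function name `arbitrage_types` is an assumption made for illustration.

```python
from fractions import Fraction as F

def arbitrage_types(X, p, phi):
    """Return (first_type, second_type) flags for the portfolio phi.

    First type: cost <= 0, payoffs nonnegative and positive in some state.
    Second type: cost < 0 and payoffs nonnegative in every state.
    """
    payoff = [sum(v * w for v, w in zip(row, phi)) for row in X]
    cost = sum(pj * w for pj, w in zip(p, phi))
    nonnegative = all(x >= 0 for x in payoff)
    first = nonnegative and any(x > 0 for x in payoff) and cost <= 0
    second = nonnegative and cost < 0
    return first, second

# Short two units of the first security, buy one unit of the second.
phi = [-2, 1]

# Example 1: payoffs (1, 2) and (2, 4.1), prices $1 and $2.
example_1 = arbitrage_types([[1, 2], [2, F(41, 10)]], [1, 2], phi)

# Example 2: payoffs (1, -1) and (2, -2), prices $1 and $1.9.
example_2 = arbitrage_types([[1, 2], [-1, -2]], [1, F(19, 10)], phi)

print(example_1, example_2)
```

Note that a single portfolio can satisfy both definitions at once (negative cost and a positive payoff somewhere), which is why the function returns two flags rather than one label.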

Arbitrage is also related to the LOP. If there is no arbitrage, the LOP must be true. This is because if two portfolios with identical payoffs were sold at different prices, a "buy low and sell high" strategy will result in the construction of a portfolio with zero payoffs in the future, but with positive proceeds today. This is an arbitrage of the second type. Thus, the no-arbitrage condition is stronger than the LOP. In finance, the assumption of no arbitrage is crucial, as explained next by the fundamental theorem of asset pricing.


THE FUNDAMENTAL 
THEOREM OF ASSET 
PRICING 

Consider now an investor's utility maximization problem. Assume the investor prefers more to less, so that the utility function is monotonic in the consumption level. Given an initial wealth \(W_0\), and given the trading opportunities, the investor's future consumption, as a vector in the s states, will be

\[ C_1 = W_1 + (W_0 - C_0) \times R_p \]

where

\(C_0\) = consumption (measured in dollars) today,

\(R_p\) = return on a portfolio of assets, which can be optimally chosen by the investor maximizing his or her utility, and

\(W_1\) = the investor's income from other sources next period, such as labor income.

The utility is a monotonic function of both \(C_0\) and \(C_1\).

Then the following theorem ties together the absence of arbitrage, the positive linear pricing rule, and the utility maximization problem.

Fundamental Theorem of Asset Pricing: The following are equivalent:

1. Absence of arbitrage 

2. Existence of a positive linear pricing rule 

3. Existence of an investor with monotonic 
preference whose utility is maximized 

We provide a simplified proof here. (A more 
rigorous proof is provided in Dybvig and Ross 
(1987).) 

To see that the absence of arbitrage implies existence of a positive linear pricing rule, we note first that earlier we provided the argument for the existence of the linear pricing rule. The positivity of the state prices must be true in the absence of arbitrage. This is because, if there is a zero or negative state price in some state, then the payoffs in this state are free lunches, so arbitrage opportunities can arise. Conversely, if the state prices are positive, every single payoff in each state has a positive price, and there cannot be any free lunch.

Mathematically, this can also be easily demonstrated. If \(\varphi\) is an arbitrage portfolio so that its price is zero or negative, then

\[ 0 \ge p'\varphi = (X'q)'\varphi = q'(X\varphi) \]

where the first equality is the linear pricing rule, and the second equality holds by matrix multiplication rules. Because of positive state prices, all components of q are positive. If \(p'\varphi\) is zero, \(X\varphi\) must either be all zeros or have at least one strictly negative component, and if \(p'\varphi\) is negative, \(X\varphi\) must have at least one strictly negative component. Both contradict the assumption that \(\varphi\) is an arbitrage portfolio. Hence, there are no arbitrage opportunities when the state prices are positive.

To see how the existence of a positive linear pricing rule implies the existence of an investor with monotonic preference whose utility is maximized, note that the consumption of the investor in each state must be finite since the investor has finite wealth and faces a binding budget constraint due to positive state prices. Finally, the existence of an investor with monotonic preference whose utility is maximized clearly implies the absence of arbitrage. This is because adding an arbitrage portfolio (i.e., a free lunch) to the investor's portfolio will only strictly increase his or her utility without affecting the budget, contradicting the fact that the utility is maximized to begin with. This concludes our proof.

An important insight from the fundamental theorem is what we need for rational pricing. In deriving pricing formulas, many theoretical equilibrium asset pricing models assume all investors behave rationally and have identical information sets. The theorem says that, to rationally price assets or to ensure market pricing efficiency, we do not need to assume that all investors are smart. What we need is a few smart ones who can capitalize on any arbitrage opportunities. Then, the prices should be in line with their payoffs in the economy.

The Discount Factor 

Related to the fundamental theorem is the concept of the discount factor. As it turns out, this is the common feature of almost all asset pricing models, a point that will become evident in the next section. Let \(\theta_i > 0\) be the probability for state i to occur. The linear pricing rule given by equation (15) can be rewritten as

\[ p_j = \theta_1 (q_1/\theta_1) v_{1j} + \theta_2 (q_2/\theta_2) v_{2j} + \cdots + \theta_s (q_s/\theta_s) v_{sj} = E(m v_j) \tag{18} \]

where m is a random variable whose value in state i is equal to

\[ m_i = \frac{q_i}{\theta_i} \tag{19} \]

Equation (18) says that the price for asset j is given by the expected value of its payoff multiplied by a random variable m, where m is common for all assets.

Suppose now that there is a risk-free asset in the economy that can earn a risk-free interest rate r, and that the price of this risk-free asset today is $1 (we can scale the asset unit if necessary). Then the payoff of this risk-free asset in the next period will be 1 + r in all the states. So, by equation (18), we have the following pricing relation for this risk-free asset

\[ 1 = E[m(1 + r)] \]

and therefore

\[ E[m] = \frac{1}{1 + r} \tag{20} \]

If there were no risks in the economy, and if there were no arbitrage, it is clear that all assets should earn the same risk-free rate of return. Hence, assets should be priced by their present values of the cash flows, or the prices are equal to the discounted cash flows with the discount factor 1/(1 + r). When there is risk as is the case now, the payoffs are multiplied by the random variable m whose mean is 1/(1 + r). This is why m is also known as a stochastic discount factor because (1) it is random, and (2) it extends the risk-free discounting to the risky asset case.

Consider, for example, three securities in a three-state economy with prices $5, $5, and $6, and with the following payoff matrix:

\[ X = \begin{pmatrix} 10 & 20 & 30 \\ 10 & 10 & 10 \\ 10 & 5 & 5 \end{pmatrix} \]

In this economy, the first asset is the risk-free asset since it has a constant payoff of $10 regardless of the future state. Moreover, the risk-free rate is 100% because the asset is sold at a price of $5. The state price vector can be solved using equation (17) and is q = (0.1, 0.2, 0.2)'. Assume the probability for each state is 1/3. Then the linear pricing rule can be expressed as

\[ 5 = p_1 = \tfrac{1}{3} \times (0.3 \times 10) + \tfrac{1}{3} \times (0.6 \times 10) + \tfrac{1}{3} \times (0.6 \times 10), \]

\[ 5 = p_2 = \tfrac{1}{3} \times (0.3 \times 20) + \tfrac{1}{3} \times (0.6 \times 10) + \tfrac{1}{3} \times (0.6 \times 5), \]

\[ 6 = p_3 = \tfrac{1}{3} \times (0.3 \times 30) + \tfrac{1}{3} \times (0.6 \times 10) + \tfrac{1}{3} \times (0.6 \times 5) \]

Let m be a random variable that has values 0.3, 0.6, and 0.6 in the three possible states. Then the above says that, for each asset, the price is the expected value of the discounted payoff. The mean of the discount factor is

\[ E[m] = \tfrac{1}{3} \times 0.3 + \tfrac{1}{3} \times 0.6 + \tfrac{1}{3} \times 0.6 = 0.5 = \frac{1}{1 + 100\%} \]

This verifies equation (20).
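The numbers in this example can be rechecked in a few lines. The sketch below takes the state prices and probabilities stated in the text as given, forms the discount factor as the ratio of state price to probability in each state, and recomputes the three prices and the mean of the discount factor. Variable names are chosen for this illustration only.

```python
from fractions import Fraction as F

# Rows are states, columns are the three securities.
X = [[10, 20, 30],
     [10, 10, 10],
     [10, 5, 5]]
q = [F(1, 10), F(2, 10), F(2, 10)]   # state prices from the text
theta = [F(1, 3)] * 3                # equal state probabilities

# Discount factor: state price divided by state probability.
m = [qi / ti for qi, ti in zip(q, theta)]

def expectation(values, probs):
    return sum(pr * v for pr, v in zip(probs, values))

# Price of asset j is E[m * payoff_j].
prices = [expectation([m[i] * X[i][j] for i in range(3)], theta) for j in range(3)]
mean_m = expectation(m, theta)
risk_free_rate = 1 / mean_m - 1

print(m, prices, mean_m, risk_free_rate)
```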

The state price vector, or equivalently the discount factor, is not only useful for pricing primitive assets, but also useful to price any portfolio consisting of them, as well as derivatives. For example, consider a call option that grants the owner of the option the right to buy one unit of the second asset at a price of $10. This option will have a value of $10 in state 1: the price of the second asset in state 1, $20, reduced by the $10 that must be paid to acquire the asset as provided for by the option. In the other two states, the value of the option is zero because the payoff (i.e., the price of the second asset) is no greater than $10. Hence, it would not be economic for the owner of the option to exercise. Then the price of this call option is

\[ \text{Price of Call} = \tfrac{1}{3} \times (0.3 \times 10) + \tfrac{2}{3} \times 0 = 1 \]

The discount factor prices the assets by taking the expectation under the true probabilities.


Pricing Using Risk-Neutral 
Probabilities 

Alternatively, one can also price the assets under a probability measure known as the risk-neutral probabilities. The approach is especially useful for pricing derivatives. The reason is that the risk-neutralized payoffs are easier to determine, while the solution of the discount factor is more complex.

To see how the risk-neutral approach works here, we apply the linear pricing rule given by equation (18) to the risk-free asset. We have:

\[ 1 = q_1(1 + r) + q_2(1 + r) + \cdots + q_s(1 + r) \]

so that

\[ q_1 + q_2 + \cdots + q_s = \frac{1}{1 + r} \equiv \bar{q} \]

which says the sum of the state prices must be equal to the present value today of $1 received next period. Denote by \(\bar{q}\) the sum of the individual q's. Since now all the state prices are positive, the ratio of each to \(\bar{q}\) can be considered a probability. Since the ratios sum to one, the probability is well defined. However, this is not the original true probability of the states, but rather some artificial probability, which will be useful in the future for pricing derivatives and other assets.

Suppose now, without loss of generality, that the risk-free asset is the first one. Then the pricing relations for the other assets are

\[ p_j = q_1 v_{1j} + q_2 v_{2j} + \cdots + q_s v_{sj} = \frac{1}{1 + r}\left[\frac{q_1}{\bar{q}} v_{1j} + \frac{q_2}{\bar{q}} v_{2j} + \cdots + \frac{q_s}{\bar{q}} v_{sj}\right] = \frac{1}{1 + r} E^{Q}[v_j] \tag{21} \]

where \(\bar{q}\) denotes the sum of the state prices. That is, the price is the present value, discounted at the risk-free rate, of the risk-adjusted expected payoff of the asset, where \(E^{Q}\) denotes the expectation taken under the artificial probability. In other words, for any risky asset, we compute its value in two steps. In the first step, the risk-neutralized payoff is calculated. In the second step, treating this payoff as riskless, the payoff is discounted at the risk-free rate to obtain the price. Consequently, the artificial probability is also often referred to as the risk-neutral probability measure.

For example, for the assets in our previous example, the sum of the state prices is

\[ 0.1 + 0.2 + 0.2 = \frac{1}{1 + 100\%} = 0.5 \]

Moreover, the risk-neutral probabilities are 1/5, 2/5, and 2/5. So the expected payoff of the earlier call option is

\[ E^{Q}(\text{call}) = \tfrac{1}{5} \times 10 + \tfrac{2}{5} \times 0 + \tfrac{2}{5} \times 0 = 2 \]

Discounting the $2 at the risk-free rate (100% in our example), we get the price of $1 (= $2/(1 + 1)). This price is, of course, the same as computed above using the discount factor to price the call option.
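The agreement between the two pricing approaches can be confirmed by pricing the call both ways. The following sketch uses the state prices, probabilities, and second-asset payoffs from the running example; the variable names are illustrative assumptions.

```python
from fractions import Fraction as F

q = [F(1, 10), F(2, 10), F(2, 10)]   # state prices from the example
theta = [F(1, 3)] * 3                # true state probabilities
asset_2 = [20, 10, 5]                # payoff of the second asset by state
strike = 10

# Call payoff: exercise only when the asset is worth more than the strike.
call = [max(v - strike, 0) for v in asset_2]

# Approach 1: expectation of m * payoff under the true probabilities.
m = [qi / ti for qi, ti in zip(q, theta)]
price_sdf = sum(t * mi * c for t, mi, c in zip(theta, m, call))

# Approach 2: risk-neutral expectation discounted at the risk-free rate.
q_bar = sum(q)                       # equals 1 / (1 + r)
r = 1 / q_bar - 1
risk_neutral_probs = [qi / q_bar for qi in q]
expected_payoff_q = sum(pr * c for pr, c in zip(risk_neutral_probs, call))
price_rn = expected_payoff_q / (1 + r)

print(price_sdf, price_rn)
```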


DISCOUNT FACTOR MODELS 

In this section, we provide the discount factor 
models in a more general setup by allowing 
the asset returns to be arbitrarily distributed, 
not necessarily finite states as in the previous 
section. Then we derive a lower bound on the 
variance of all possible discount factors, known 
as the Hansen-Jagannathan bound, and apply it 
to analyze the implications of some important 
theories in financial economics. 


STOCHASTIC DISCOUNT 
FACTORS 

Consider now a more general problem of an investor who is interested in maximizing utility over the current and future values of consumption,

\[ U(C_t, C_{t+1}) = u(C_t) + \delta E[u(C_{t+1})] \]

where the first term is the utility of consumption today, the second term is the utility of future consumption, and \(\delta\) is the subjective time-discount factor of the investor that captures the investor's trade-off between current and future consumption. Note that the second term has an expectation operation since future consumption is unknown today, and the investor can only maximize the expected utility with the expectation taken over all possible random realizations of the future consumption.

Besides the quadratic utility, another popular form of utility function is the power utility

\[ u(C_t) = \frac{C_t^{1-\gamma}}{1 - \gamma} \]

where \(\gamma\) is the risk-aversion coefficient. The higher the \(\gamma\), the more risk averse the investor. Typically, a value of \(\gamma\) of about 3 is believed to be reasonable.

For notational brevity, we assume there is only one risky asset, while the pricing relation developed holds for an arbitrary number of assets by adding them into the model. Unlike earlier sections in this entry where a finite number of states was assumed, we now assume the payoff of the risky asset can have an arbitrary probability distribution, so long as the expectation is well defined. The budget constraints for maximizing the utility can be written as

\[ C_t = W_t - p_t w \]

\[ C_{t+1} = W_{t+1} + X_{t+1} w \]

where \(W_t\) and \(W_{t+1}\) are the investor's wealth from other sources, w is the number of units of the risky asset the investor purchases today at time t, \(p_t\) is the security price, and \(X_{t+1}\) is the payoff.

Plugging the budget constraints into the utility function, and taking the derivative with respect to w, we obtain the first-order condition (FOC):

\[ p_t u'(C_t) = E_t[\delta u'(C_{t+1}) X_{t+1}] \]

or

\[ p_t = E_t[m X_{t+1}], \qquad m = \delta \frac{u'(C_{t+1})}{u'(C_t)} \tag{22} \]

This equation says that the price today is the expected value of the discounted payoff, and m is the discount factor. In the case of the power utility,

\[ m = \delta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} \tag{23} \]

which is a power function of the consumptions.
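To get a feel for the power-utility discount factor, the following sketch evaluates it for two consumption paths and cross-checks it against the ratio of marginal utilities, using the fact that the marginal utility of power utility is consumption raised to the power minus gamma. The parameter values (a time-discount factor of 0.98, risk aversion of 3, and the consumption levels) are hypothetical and chosen only for illustration.

```python
def power_utility_sdf(c_now, c_next, delta, gamma):
    """Discount factor m = delta * (c_next / c_now) ** (-gamma)."""
    if c_now <= 0 or c_next <= 0:
        raise ValueError("consumption must be positive")
    return delta * (c_next / c_now) ** (-gamma)

def marginal_utility(c, gamma):
    """Derivative of c**(1 - gamma) / (1 - gamma) with respect to c."""
    return c ** (-gamma)

# Hypothetical parameters, not taken from the text.
delta, gamma = 0.98, 3.0

m_growth = power_utility_sdf(100.0, 102.0, delta, gamma)  # consumption rises 2%
m_fall = power_utility_sdf(100.0, 98.0, delta, gamma)     # consumption falls 2%

# The same number obtained as delta * u'(C_{t+1}) / u'(C_t).
m_from_ratio = delta * marginal_utility(102.0, gamma) / marginal_utility(100.0, gamma)

print(m_growth, m_fall)
```

The discount factor is larger when future consumption falls, which is the sense in which payoffs delivered in bad states are valued more highly.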

What we have derived in equation (22) is called a consumption-based asset pricing model, so named because the theory is motivated from the perspective of consumption. This motivation is different from the earlier no-arbitrage arguments that yield equation (18). However, the pricing equations have the same form, except that the discount factor now takes a new specification. Indeed, most, if not all, asset pricing models are of the discount factor form, and different theories may specify the m differently. For the particular specification of m given by equation (22), it is also known as the marginal rate of substitution because it is the ratio of the marginal utilities.

Intuitively, when the marginal rate of substitution is high, the value of future consumption will be high, and an investor is willing to pay more for the asset if the asset's payoff is high in this case. This is why the price, as given by equation (22), is high.

The discount factor representation of asset prices is often also expressed in terms of returns. Let \(R_{t+1}\) be the gross return on the asset, where the gross return is equal to one plus the return. That is, \(R_{t+1} = X_{t+1}/p_t\). Then the pricing relation in equation (22) is equivalent to

\[ 1 = E_t[m R_{t+1}] \tag{24} \]

If an asset price is scaled to be equal to $1, the payoff will be its return, and then the expected discounted return must be equal to $1, its price today. When there are N risky assets, we can write the discount factor model as

\[ 1 = E_t[m R_{j,t+1}] \tag{25} \]

where \(R_{j,t+1}\) is the return on asset j.

Note that the expectation in equation (25) is conditional on all available information and therefore the pricing relation is known as the conditional form of the discount factor model. Taking expectation on both sides of equation (25), we obtain

\[ 1 = E[m R_{j,t+1}] \tag{26} \]

which is known as the unconditional form of the discount factor model. Since conditional implies unconditional, and the reverse is not necessarily true, equation (26) is a weaker form of the model.


Application to CAPM and APT 

To see the generality of the discount factor model, consider now its relation to the two dominant equilibrium asset pricing models: the CAPM and APT. As explained shortly, one can write these two asset pricing models as follows:

\[ E[R_j] = r + \lambda_1 \beta_{j1} + \cdots + \lambda_K \beta_{jK} \tag{27} \]

where \(R_j\) is the gross return on asset j, \(\beta_{jk}\) is the beta or risk exposure on the k-th factor \(f_k\), \(\lambda_k\) is the factor risk premium, for k = 1, 2, ..., K, and r is a constant.

Although equation (27) is now written out in terms of the gross returns to conform with discount factor notations, it can be reduced to have exactly the same expression in terms of returns. For example, the CAPM specifies K = 1, sets the constant r equal to the gross risk-free rate \(1 + r_f\) (where \(r_f\) denotes the risk-free interest rate), and sets \(\lambda_1 = E[R_m] - (1 + r_f)\), where \(R_m\) is the gross return on the market portfolio. In this case, \(\lambda_1\) is the same as the usual market return in excess of the risk-free rate since the ones in their difference will be canceled out.

We claim that equation (27) holds if, and only if, the stochastic
discount factor is a linear function of the factors:

$$m = a + b_1 f_1 + \cdots + b_K f_K \qquad (28)$$

That is, if equation (28) is true, we obtain equation (27).
Conversely, if equation (27) is true, the discount factor must be a
linear function of the factors. Therefore, the CAPM and APT are
special cases of the discount factor model.

To see why, it is sufficient to analyze the case of $K = 1$. For
simplicity, we drop the subscripts so that we want to show

$$m = a + bf \qquad (29)$$

and

$$E[R_j] = r + \lambda \beta_j \qquad (30)$$

are equivalent. The latter is often referred to as a beta pricing
model. In the proof below, we can assume $E[f] = 0$ since we can
always move the mean of $f$ into $a$. Recall the simple statistical
formula that the covariance between any two random variables can be
written as the expectation of their product minus the product of
their expectations:

$$\mathrm{Cov}(x, y) = E[xy] - E[x]E[y] \qquad (31)$$

Using this formula and $E[f] = 0$, we have, if equation (29) is true,

$$\begin{aligned}
1 = E[mR_j] &= aE[R_j] + bE[fR_j] \\
&= aE[R_j] + b\,\mathrm{Cov}(R_j, f) + bE[R_j]E[f] \\
&= aE[R_j] + b\,\mathrm{Cov}(R_j, f)
\end{aligned}$$

Solving for $E[R_j]$, we obtain

$$E[R_j] = \frac{1}{a} - \frac{b}{a}\,\mathrm{Cov}(R_j, f) \qquad (32)$$

Comparing this equation with equation (30), and recalling that the
beta is $\beta_j = \mathrm{Cov}(R_j, f)/\sigma^2(f)$, it follows that

$$r = \frac{1}{a}, \qquad \lambda = -\frac{b}{a}\,\sigma^2(f) \qquad (33)$$

where $\sigma^2(f)$ is the variance of the factor. Hence, if the
discount factor model is true, it must imply the beta pricing model.
Conversely, if the beta pricing model is true, we can solve for $a$
and $b$ from equation (33) to get the discount factor model.
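The equivalence can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the coefficients `a` and `b`, the factor draws, and the payoffs are hypothetical, and the factor is demeaned in the sample so that $E[f] = 0$ holds exactly. Returns are built so that $E[mR_j] = 1$, and the sample expected returns are then compared with the beta pricing model using $r$ and $\lambda$ from equation (33).

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 10_000, 4

f = rng.normal(size=T)
f -= f.mean()                        # enforce E[f] = 0 in the sample
a, b = 0.96, -0.20                   # assumed coefficients of m = a + b f
m = a + b * f

# Hypothetical payoffs with different factor loadings plus noise.
loadings = np.array([0.0, 0.5, 1.0, 1.5])
x = 1.0 + 0.1 * np.outer(f, loadings) + 0.05 * rng.normal(size=(T, N))

p = (m[:, None] * x).mean(axis=0)    # prices p_j = E[m x_j]
R = x / p                            # gross returns, so E[m R_j] = 1 by construction

var_f = (f ** 2).mean()                  # variance of f (its mean is zero)
cov_Rf = (R * f[:, None]).mean(axis=0)   # Cov(R_j, f), valid because E[f] = 0
beta = cov_Rf / var_f

r = 1.0 / a                          # equation (33)
lam = -(b / a) * var_f               # equation (33)
lhs = R.mean(axis=0)                 # E[R_j]
rhs = r + lam * beta                 # beta pricing model, equation (30)

print(np.column_stack([lhs, rhs]))
```

The two columns agree up to floating-point error because the derivation above is an algebraic identity in the sample moments.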


Hansen-Jagannathan Bound 

As we discussed, an asset pricing model is a specification of the
discount factor. The question is what properties all the possible
discount factors $m$ must have. Hansen and Jagannathan (1991) show
that the variance of the discount factor has to be bounded below. In
other words, $m$ must be volatile enough relative to the asset returns
to be priced.

The discount factor relation, equation (26), ties the return
$R_{t+1}$ of an asset to its price via the expectation of its product
with $m$. It will be useful to separate $R_{t+1}$ out to understand
further the relation between $m$ and $R_{t+1}$. Again using the
covariance formula, equation (31), we have

$$1 = \mathrm{Cov}[m, R_{t+1}] + E[m]E[R_{t+1}] \qquad (34)$$

Suppose that a risk-free asset with gross return $R_f = 1 + r$ is
available, where $r$ is the usual risk-free rate. Applying equation
(34) to the risk-free asset, the first term will be zero, and hence

$$E[m] = \frac{1}{1 + r} \qquad (35)$$

Note that this equation is true for all possible discount factors and
is an extension of earlier equation (20). In other words, for all
possible stochastic discount factors, their mean must be equal to
$1/(1 + r)$ to price the risk-free asset.

Now we multiply equation (34) by $R_f$ on both sides and, using
$R_f E[m] = 1$ from equation (35), obtain

$$E[R_{t+1}] - R_f = -R_f\,\mathrm{Cov}[m, R_{t+1}]$$

This says that an asset's return in excess of the risk-free rate will
be higher if it has a larger negative covariance with $m$. Recall that
the covariance is related to correlation and standard deviations by

$$\mathrm{Cov}[x, y] = \sigma(x) \times \sigma(y) \times \mathrm{Corr}(x, y)$$

where $\sigma(\cdot)$ denotes the standard deviation function. Since
the correlation is always between $-1$ and $1$, we have from the
earlier equation that

$$|E[R_{t+1}] - R_f| = R_f\,|\mathrm{Cov}[m, R_{t+1}]| \le R_f \times \sigma(m) \times \sigma(R_{t+1})$$

Separating terms on $m$ from those on $R_{t+1}$, and using
$R_f = 1/E[m]$, we have a lower bound on the standard deviation of
$m$, denoted by $\sigma(m)$:

$$\frac{\sigma(m)}{E[m]} \ge \frac{|E[R_{t+1}] - R_f|}{\sigma(R_{t+1})} \qquad (36)$$

The right-hand side, the ratio of the expected excess return on a
risky asset to its standard deviation, is the Sharpe ratio, which
measures the extra return beyond the risk-free rate per unit of asset
risk. The relationship given by equation (36) says that any discount
factor that prices the assets must have enough variability that its
standard deviation divided by its mean is at least as large as the
Sharpe ratio of any risky asset in the economy.

The above lower bound on $\sigma(m)$ is known as the
Hansen-Jagannathan bound. It is an important result since if an asset
pricing model fails to pass this bound, then the proposed asset
pricing model can be rejected. For example, to test the validity of
either the discount factor model given by equation (18) for a finite
state economy, or the consumption-based asset pricing model given by
equation (22), or the CAPM and the APT, one can test first whether it
passes the bound given by (36). No further testing will be necessary
if it fails the Hansen-Jagannathan bound. Theoretically, Kan and Zhou
(2006) show that the Hansen-Jagannathan bound can be tightened
substantially with the use of information on the state variables of
the stochastic discount factor.
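As a rough illustration of the bound, the sketch below simulates a hypothetical positive discount factor and three hypothetical payoffs, prices them with that discount factor, and compares each asset's Sharpe ratio with the ratio of the standard deviation of the discount factor to its mean. The distributions and parameter values are our own assumptions, chosen only so that the example runs.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5_000

# Hypothetical positive discount factor draws (lognormal, an assumption).
m = np.exp(-0.02 + 0.25 * rng.normal(size=T) - 0.5 * 0.25 ** 2)

# Hypothetical payoffs that co-move negatively with m, plus noise.
x = 1.0 + rng.normal(scale=0.2, size=(T, 3)) - 0.3 * (m[:, None] - m.mean())

p = (m[:, None] * x).mean(axis=0)     # prices p_j = E[m x_j]
R = x / p                             # gross returns with E[m R_j] = 1
Rf = 1.0 / m.mean()                   # gross risk-free return implied by E[m]

sharpe = np.abs(R.mean(axis=0) - Rf) / R.std(axis=0)   # right-hand side of the bound
hj_ratio = m.std() / m.mean()                          # left-hand side of the bound

print("Sharpe ratios:", sharpe)
print("sigma(m)/E[m]:", hj_ratio)
```

Since the assets are priced by this very discount factor, every Sharpe ratio stays at or below the ratio for `m`; a candidate discount factor with too little variability would violate the inequality for some asset.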


KEY POINTS 

• A complete market is one in which any desired payoff in the future
can be generated by a suitable portfolio of the existing assets in
the economy.

• In a world where the number of states (future scenarios) is finite,
a market is complete if and only if this number is equal to the rank
of the asset payoff matrix. In particular, it is necessary for the
number of assets to be at least as large as the number of states.

• The law of one price states that any two assets 
with identical payoffs in the future must have 
the same price today. 


• A linear pricing rule means that the price of a basket of assets is
equal to the sum of the prices of those assets in the basket. The law
of one price is true if and only if the linear pricing rule is true.

• The state price is the price one has to pay today to obtain a one
dollar payoff in a particular future state and nothing in other
states. The existence of the state price is equivalent to the
validity of the law of one price. It will be unique if the market is
in addition complete.

• There are two types of arbitrage opportunities. The first is paying
nothing today and obtaining something in the future, and the second
is obtaining something today with no future obligations.

• The fundamental theorem of asset pricing asserts the equivalence of
three key issues in finance: (1) absence of arbitrage; (2) existence
of a positive linear pricing rule; and (3) existence of an investor
who prefers more to less and who has maximized utility (no more free
lunches to pick up from the economy).

• Due to risk, a rational investor will not pay a price equal to the
expected value of an asset and will instead discount it by a suitable
factor as compensation for taking on the risk. A stochastic discount
factor is a random variable such that the expected value of its
product with the asset payoffs is the rational price of the asset.
The stochastic discount factor extends the risk-free discounting
(time value of money) to the risky asset case and is the same for
pricing all the assets in the economy.

• The CAPM and APT are special cases of stochastic discount factor
models in which the discount factor is a linear function of the
market factor or APT factors. Moreover, almost all asset pricing
models can be formulated as stochastic discount factor models.

• The Hansen-Jagannathan bound provides a simple bound on the variance
of a stochastic discount factor, so that one can examine whether the
stochastic discount factor satisfies some basic restrictions on the
data. If not, we can reject it without further analysis.


Abstract: Risk-return analysis in finance is a "normative" theory: It does not purport to describe,
rather it offers advice. Specifically, it offers advice to an investor regarding how to manage a 
portfolio of securities. The investor may be an institution, such as a pension fund or endowment; 
or it may be an asset management firm with multiple portfolios to manage (e.g., managing various 
mutual funds and funds for institutional clients). The focus of risk-return analysis is on advice for 
each individual portfolio. This contrasts with capital asset pricing models, which are hypotheses 
concerning capital markets as a whole. They are "positive" models, that is, they are hypotheses 
about that which is—as opposed to "normative" models, which advise on what should be or, more 
precisely, advise on what an investor should do. 


INTRODUCTION 

Asset pricing theory seeks to explain how the price or value of a
claim from ownership of a financial asset is determined. The pricing
or valuation of an asset must take into account the timing of the
payments expected to be received and the risk associated with
receiving the expected payments. The major challenge in asset pricing
theory is often not the timing issue but the treatment of risk. The
formulation of an asset pricing theory that has empirically proven to
have good predictive value offers investors the opportunity to
capitalize on mispriced assets. Moreover, the theory provides
investors with a tool for pricing new financial instruments and
nonpublicly traded assets.

Cochrane (2001) suggests two popular approaches to asset pricing:
absolute pricing and relative pricing. The absolute pricing approach
seeks to price an asset by reference to its exposure to fundamental
macroeconomic risk. An example of an absolute pricing approach is the
consumption-based capital asset pricing model (CAPM) formulated by
Breeden (1979). In contrast, the relative pricing approach seeks to
value an asset based only on the prices of other assets, without
reference to the asset's exposure to the various macroeconomic risk
factors. The well-known option pricing model formulated by Black and
Scholes (1973) is an example of an asset pricing model that employs
the relative pricing approach.

Most asset pricing models used in practice today are the result of a
blend of both approaches. Capital asset pricing models (CAPM), the
subject of this entry, are an example. The CAPM starts as an absolute
pricing model but then, as will be explained, prices assets relative
to the market. There is no attempt in the CAPM to explain how the
market risk premium or the risk factor is determined in an economy.

In this entry, we focus on the basic CAPM first formulated in the
1960s by several academicians. Numerous extensions of the basic CAPM
have been proposed in the decades that followed, but these will not
be covered in this entry. However, because there is considerable
confusion regarding certain aspects of the theory, in addition to
describing the basic CAPM in this entry we explain the sources of the
confusion and their implications.


SHARPE-LINTNER CAPM 

The first CAPM was that of Sharpe (1964) and Lintner (1965). The
Sharpe-Lintner CAPM (SL-CAPM) assumes the following:

* All investors have the same beliefs concerning security returns.

* All investors have mean-variance efficient portfolios.

* All investors can lend all they have or can borrow all they want at
the same risk-free interest rate that the U.S. federal government
pays to borrow money.

By the mean we mean the expected value of the return of a security or
portfolio. Thus, throughout this entry, we use the terms "mean
return" and "expected return" interchangeably. By variance, we mean
the variance of the returns of a security or portfolio. This is the
square of the standard deviation, the most commonly used measure in
statistics to quantify the dispersion of the possible outcomes of
some random variable. Standard deviation is the more intuitively
meaningful measure: Most of any probability distribution is between
its mean minus two standard deviations and its mean plus two standard
deviations. It is not true that most of a distribution is between the
mean plus or minus two variances, or any other number of variances.
While standard deviation is the more intuitive measure, formulas are
more conveniently expressed in terms of variance. One can most easily
compute the variance of a portfolio and then take its square root to
obtain its standard deviation.
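The last point can be made concrete with a short calculation. This is a minimal sketch, assuming three uncorrelated securities with illustrative standard deviations and hypothetical portfolio weights; the portfolio variance is computed first from the covariance matrix, and the standard deviation is then obtained as its square root.

```python
import numpy as np

weights = np.array([0.5, 0.3, 0.2])      # hypothetical portfolio weights (sum to one)
stdev = np.array([0.18, 0.12, 0.30])     # illustrative security standard deviations
corr = np.eye(3)                         # assumption: returns are uncorrelated
cov = np.outer(stdev, stdev) * corr      # covariance matrix

port_var = weights @ cov @ weights       # variance of the portfolio return
port_std = np.sqrt(port_var)             # standard deviation is the square root

print(port_var, port_std)
```

With correlated returns only the `corr` matrix would change; the variance formula and the final square root stay the same.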

By mean-variance efficient portfolios, we mean those portfolios that,
of all the possible portfolios that can be created from all of the
securities in the market, have the highest mean for a given variance.

The two major conclusions of the SL-CAPM are:

CAPM Conclusion 1. The market portfolio is a mean-variance efficient
portfolio.

CAPM Conclusion 2. The difference between the expected return and the
risk-free interest rate, referred to as the excess return, of each
security is proportional to its beta.

The "market portfolio" includes all securities in the market. The
composition of the portfolio is such that the sum of the weights
allocated to all the securities is equal to one. That is, denoting
$X_i^M$ as the percentage of security $i$ in the market portfolio
(denoted by $M$), then

$$\sum_{i=1}^{n} X_i^M = 1 \qquad (1)$$

Each holding of a security is proportional to its part of the total
market capitalization. That is,

$$X_i^M = \frac{\text{Market value of } i\text{-th security}}{\text{Total market value of all securities}} \qquad (2)$$

CAPM Conclusion 1 is that this "market portfolio" is on the
mean-variance efficient frontier.

Let $r_i$ stand for the return on the $i$-th security
during some period. The return on the market



Capital Asset Pricing Models 




portfolio then is

$$r_M = \sum_{i=1}^{n} X_i^M r_i \qquad (3)$$
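The market weights and the market return can be computed in a few lines. The sketch below assumes three securities with hypothetical market capitalizations and hypothetical one-period returns; none of these numbers come from the text.

```python
import numpy as np

market_values = np.array([500.0, 300.0, 200.0])   # hypothetical market capitalizations
returns = np.array([0.08, -0.02, 0.05])           # hypothetical one-period returns

x_market = market_values / market_values.sum()    # weight = security value / total value
r_market = x_market @ returns                     # market return = weighted sum of returns

print(x_market, r_market)
```

By construction the weights sum to one, as the definition of the market portfolio requires.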

The beta ($\beta$) referred to in CAPM Conclusion 2 can be estimated
using regression analysis from historical data on observed returns
for a security and observed returns for the market. In this
regression analysis, security return is the "dependent variable" and
market return is the "independent variable." However, the beta
produced by this analysis should be interpreted as a measure of
association rather than causation. That is, it is a measure of the
extent that the two quantities move up and down together, not as the
so-called "independent variable" causing the level of the "dependent
variable." Below we examine why there is this association (not
causation) in CAPM between security returns and market return.
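A regression estimate of beta might look as follows. This is a sketch on simulated data: the return series, the value `true_beta`, and the noise levels are assumptions made for the example, and `numpy.polyfit` is simply one convenient way to run a one-variable least-squares regression. The same slope is also computed as covariance divided by variance, which underlines that beta measures co-movement.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2_000
true_beta = 1.3                                   # assumed only for the simulation

r_market = rng.normal(0.01, 0.04, size=T)         # simulated market returns
r_security = 0.002 + true_beta * r_market + rng.normal(0.0, 0.03, size=T)

# Least-squares regression of security return on market return.
slope, intercept = np.polyfit(r_market, r_security, 1)

# Equivalent expression: covariance with the market over market variance.
beta_cov = np.cov(r_security, r_market, ddof=0)[0, 1] / np.var(r_market)

print(slope, beta_cov)
```

Both numbers coincide, and nothing in the calculation depends on which series is thought of as the cause of the other.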

The excess return, denoted by $e_i$, is the difference between the
security's expected return, $E(r_i)$, and the risk-free interest
rate, $r_f$, at which all investors are assumed to lend or borrow:

$$e_i = E(r_i) - r_f \qquad (4)$$

CAPM Conclusion 2 is that the excess return for security $i$ is
proportional to its $\beta$. That is, letting $k$ be a constant, then

$$e_i = k\beta_i, \quad i = 1, \ldots, n \qquad (5)$$

It can also be shown that equation (5) applies to portfolios as well
as individual securities. Thus in an SL-CAPM world, each security and
portfolio has an excess return that is proportional to the regression
coefficient (beta) of the security's or portfolio's return against
the return of the market portfolio.


ROY CAPM 

A second CAPM, which appeared shortly after the writings of Sharpe
and Lintner, differs from the SL-CAPM only in its assumption
concerning the investment constraint imposed on investors. More
specifically, it assumes that each investor (I) can choose any
portfolio that satisfies

$$\sum_{i=1}^{n} X_i = 1 \qquad (6)$$

without regard to the sign of the variables. A positive $X_i$ is
interpreted as a long position in a security, while a negative $X_i$
is interpreted as a short position in a security.

However, a negative X_i is far from a realistic model of real-world
constraints on shorting. For example, equation (6) would consider
feasible a portfolio with

    X_1 = -1,000
    X_2 = 1,001
    X_i = 0,   i = 3, ..., n

since the above sums to one. This would correspond to an investor
depositing $1,000 with a broker; shorting $1,000,000 of stock 1; then
using the proceeds of the sale, plus the $1,000 deposited with the
broker, to buy $1,001,000 worth of stock 2. In fact, in this example,
Federal Reserve Regulation T (Reg T) would require that the sum of
long positions, plus the value of the stocks sold short, not exceed
$2,000.
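The contrast between the two constraints can be sketched in code. The helper names `roy_feasible` and `gross_exposure_ok` are hypothetical, and the second function is only a stylized stand-in for the limit described in the example (long positions plus shorts no greater than twice the deposited equity); it is not a statement of the actual regulation.

```python
def roy_feasible(weights, tol=1e-9):
    """Equation (6): weights only need to sum to one; their signs are unrestricted."""
    return abs(sum(weights) - 1.0) <= tol


def gross_exposure_ok(weights, equity, max_gross_multiple=2.0):
    """Stylized limit: longs plus the value of shorts may not exceed a multiple of equity."""
    gross_dollars = sum(abs(w) for w in weights) * equity
    return gross_dollars <= max_gross_multiple * equity


weights = [-1000.0, 1001.0, 0.0]   # the portfolio from the example, as fractions of equity
equity = 1_000.0                   # dollars deposited with the broker

print(roy_feasible(weights))               # satisfies equation (6)
print(gross_exposure_ok(weights, equity))  # violates the stylized exposure limit
```

The portfolio passes the first check and fails the second, since its gross exposure is 2,001 times the investor's equity.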

Equation (6), as the only constraint on portfolio choice, was first
proposed by Roy (1952), albeit not in a CAPM context. Since it is
difficult to pin down who first used this constraint set in a CAPM
(more than one did so almost simultaneously), we refer to this as the
Roy CAPM as distinguished from the SL-CAPM.


CONFUSIONS REGARDING 
THE CAPM 

Probably no other part of financial theory has been subject to more
confusion, by professionals and amateurs alike, than the CAPM. Major
areas of confusion include the following:

Confusion 1. Failure to distinguish between the following two
statements:

The market is efficient in that each participant has correct beliefs
and uses them to their advantage,

and

The market portfolio is a mean-variance efficient portfolio.

Confusion 2. Belief that equation (5) shows that CAPM investors get
paid for bearing "market risk." That this view, held almost
universally until quite recently, is in error is easily demonstrated
by examples in which securities have the same covariance structure
but different excess returns.

Confusion 3. Failure to distinguish between the beta in Sharpe's
one-factor model of covariance (see Sharpe, 1963) and that in
Sharpe's CAPM.

The following sections present the assumptions and conclusions of the
SL-CAPM and the Roy CAPM, and discuss the nature of these three
historic sources of confusion and their practical implications.


TWO MEANINGS OF 
MARKET EFFICIENCY 

CAPM is an elegant theory. With the aid of some simplifying
assumptions, it reaches dramatic conclusions about practical matters.
For example:

* How can an investor choose an efficient portfolio? The answer: Just
buy the market.

* How can an investor forecast expected returns? The answer: Just
forecast betas.

* How should an investor price a new security? The answer is once
again: Forecast its beta.

CAPM's simplifying assumptions make it easier to deduce properties of
market equilibria, which is like computing falling body trajectories
while assuming there is no air. But, before betting the ranch that
the feather and the brick will hit the ground at the same time, it is
best to consider the implications of some of the omitted
complexities. The present section mostly explores the implications of
generalizing one of the CAPM's simplifying assumptions.

Note the difference between the statement "The market is efficient,"
in the sense that market participants have accurate information and
use it correctly to their benefit, and the statement "The market
portfolio is a mean-variance efficient portfolio." Under some
assumptions the two statements are equivalent. Specifically, if we
assume:

Assumption 1. Transaction costs and other illiquidities can be
ignored.

Assumption 2. All investors hold mean-variance efficient portfolios.

Assumption 3. All investors hold the same (correct) beliefs about
means, variances, and covariances of securities.

Assumption 4. Every investor can lend all she or he has or can borrow
all she or he wants at the risk-free interest rate.

Then based on these four assumptions we get CAPM Conclusion 1: The
market portfolio is a mean-variance efficient portfolio. This CAPM
conclusion also follows if Assumption 4 is replaced by the following
assumption:

Assumption 4'. Equation (6) is the only constraint on the investor's
choice of portfolio.

As noted earlier, a negative X_i is interpreted as a short position;
but this is clearly a quite unrealistic model of real-world short
constraints. Equation (6) would permit any investor to deposit $1,000
with a broker, sell short $1,000,000 worth of one security, and buy
long $1,001,000 worth of another security.

In addition to CAPM Conclusion 1, Assumptions 1 through 4 imply CAPM
Conclusion 2: In equilibrium, excess returns are proportional to
betas, as in equation (5). This CAPM conclusion is the basis for the
CAPM's prescriptions for risk adjustment and asset valuation.

Table 1 Expected Returns and Standard Deviations
for Three Hypothetical Securities$^a$

Security   Expected Return   Standard Deviation
1          0.15              0.18
2          0.10              0.12
3          0.20              0.30

$^a$ Security returns are uncorrelated.

Since a Roy CAPM world may or may not have a risk-free asset,
Assumptions 1-3 plus Assumption 4' cannot imply CAPM Conclusion 2.
These assumptions do, however, imply the following:

CAPM Conclusion 2'. Expected returns are a linear function of betas,
that is, there are constants, $a$ and $b$, such that

$$E(r_i) = a + b\beta_i, \quad i = 1, \ldots, n \qquad (7)$$

Equation (5) of the SL-CAPM is the same as equation (7) of the Roy
CAPM with $a = r_f$.

CAPM Conclusions 1 and 2 (or 2') do not follow from Assumptions 1, 2,
and 3 if Assumption 4 (or Assumption 4') is replaced by a more
realistic description of the investor's investment constraints. This
is illustrated by an example with the expected returns and standard
deviations given in Table 1. In this example, it is assumed that the
returns are uncorrelated (but similar results occur with correlated
returns). The example assumes that investors cannot sell short or
borrow. The same results hold if investors can borrow limited amounts
or can sell short but are subject to Reg T or a similar constraint.

Assumptions 1 through 3 are assumed in this example. Rather than
Assumption 4 or Assumption 4', the example assumes that the investor
can choose any portfolio that meets the following constraints:

$$X_1 + X_2 + X_3 = 1.0 \qquad (8a)$$

$$X_1 \ge 0, \quad X_2 \ge 0, \quad X_3 \ge 0 \qquad (8b)$$



Figure 1 Example Illustrating That When Short 
Sales Are Not Allowed, the Market Portfolios Are 
Typically Not Mean-Variance Efficient 

This is the "standard" portfolio selection constraint set presented
in Markowitz (1952). It differs from the Roy constraint set in the
inclusion of nonnegativity constraints, the inequalities given by
(8b).

In Figure 1, $X_1$ (the fraction invested in Security 1) is plotted
on the horizontal axis; $X_2$ (the fraction invested in Security 2)
is plotted on the vertical axis; and $X_3$ (the fraction invested in
the third security) is given implicitly by the relationship
$X_3 = 1 - X_1 - X_2$. In the figure, the portfolio labeled "c" has
smaller variance than any other portfolio that satisfies the equation
(8a) constraint. In general, such a minimum-overall-variance
portfolio may or may not satisfy the inequalities given by (8b). In
other words, the minimum-overall-variance portfolio may or may not be
feasible for the original Markowitz constraint set (Markowitz, 1952).
In the present example it is. Results similar to those we illustrate
here also typically hold when c is not feasible for the standard
model.$^1$

The line $\ell\ell'$ connects all points (portfolios) that minimize
variance, on the portfolio-as-a-whole, for various levels of
portfolio expected return, subject to equation (8a), ignoring
nonnegativity inequalities (8b). Using differential calculus, one can
minimize a function such as

$$V = \sum_{i=1}^{3} X_i^2 V_i \qquad (9a)$$

(where $V_i$ denotes the variance of security $i$) subject to
constraints

$$\sum_{i=1}^{3} X_i = 1 \qquad (9b)$$

and

$$E_0 = \sum_{i=1}^{3} X_i E(r_i) \qquad (9c)$$

One can do so with the expected returns and standard deviations from
Table 1, letting $E_0$ vary, and thereby obtain the line $\ell\ell'$
in Figure 1. Moving downward and to the right on $\ell\ell'$, the
portfolio expected return increases. This downward direction for
increasing expected return does not always hold: It depends on the
choice of security expected returns.
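The minimization just described can be sketched numerically. The code below is an illustration under stated assumptions: the Table 1 entries are read as decimal fractions (a reading consistent with the covariances reported in Table 2), returns are uncorrelated, and the function name `min_variance_weights` is our own. The first-order conditions of the Lagrangian form a linear system, which is solved for a chosen target expected return.

```python
import numpy as np

E = np.array([0.15, 0.10, 0.20])          # expected returns from Table 1
V = np.array([0.18, 0.12, 0.30]) ** 2     # variances (returns are uncorrelated)


def min_variance_weights(E0):
    """Minimize sum X_i^2 V_i subject to sum X_i = 1 and sum X_i E_i = E0 (signs free)."""
    n = len(E)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * np.diag(V)            # gradient of the variance
    kkt[:n, n] = kkt[n, :n] = 1.0             # budget constraint and its multiplier
    kkt[:n, n + 1] = kkt[n + 1, :n] = E       # expected-return constraint and its multiplier
    rhs = np.concatenate([np.zeros(n), [1.0, E0]])
    return np.linalg.solve(kkt, rhs)[:n]


x_c = (1.0 / V) / (1.0 / V).sum()         # overall minimum-variance portfolio
x_19 = min_variance_weights(0.19)         # a point further down the line

print("minimum-variance portfolio:", x_c)
print("weights for E0 = 0.19:", x_19)
```

Under these assumptions the overall minimum-variance portfolio has all weights positive, and raising the target expected return to 0.19 produces a negative weight in Security 2, close to the portfolio with weights 0.70, -0.25, and 0.55 listed in Table 2.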

In the Roy model, every point in the figure is feasible since they
all satisfy equation (6) or, equivalently, equation (8a). It follows
that, in the Roy CAPM, all points on $\ell\ell'$, from "c" downward
in the direction of increasing $E$, are efficient. But in the
standard model, including nonnegativity inequalities (8b), all points
on $\ell\ell'$ below the point "b" are not feasible (since they have
negative $X_2$) and therefore cannot be efficient. In this example,
when portfolio choice is subject to the standard constraint set, the
set of efficient portfolios is the same as that of the Roy constraint
set from portfolio c to portfolio b. After that, the set of efficient
portfolios moves horizontally along the $X_1$ axis, ending at point
(0, 0). This represents the portfolio with everything invested in
Security 3, which has maximum expected return in the example.

Suppose that some investors select the cautious portfolio d, while
the remainder select the more aggressive portfolio e. The market
portfolio M lies on the straight line that connects d and e (e.g.,
halfway between if both groups have equal amounts invested).

But M is not an efficient portfolio, either for the standard
constraint set or for the Roy constraint set. Thus, even though all
investors hold mean-variance efficient portfolios, the market
portfolio is not mean-variance efficient!
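The inefficiency of the market portfolio can be checked with a small calculation. The sketch assumes that the weights listed for portfolio M in Table 3 (0.30, 0.19, 0.51) describe the market portfolio of this example, that the Table 1 entries are decimal fractions, and that returns are uncorrelated. It uses the closed-form solution of the two-constraint minimization for uncorrelated returns, in which each weight has the form (alpha + beta E_i) / V_i.

```python
import numpy as np

E = np.array([0.15, 0.10, 0.20])          # expected returns from Table 1
V = np.array([0.18, 0.12, 0.30]) ** 2     # variances (returns are uncorrelated)
M = np.array([0.30, 0.19, 0.51])          # weights of portfolio M as listed in Table 3

# Minimum-variance weights with the same expected return as M, signs unrestricted.
A, B, C = (1 / V).sum(), (E / V).sum(), (E ** 2 / V).sum()
E0 = M @ E
det = A * C - B ** 2
alpha = (C - B * E0) / det
beta = (A * E0 - B) / det
best = (alpha + beta * E) / V

var_M = M ** 2 @ V                        # variance of M
var_best = best ** 2 @ V                  # smallest variance at the same expected return

print("weights of the better portfolio:", best)
print("variance of M:", var_M, "minimum variance:", var_best)
```

Under these assumptions the alternative portfolio has the same expected return as M, a lower variance, and nonnegative weights, so M is dominated under both the Roy and the standard constraint sets.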


A Simple Market 

Figure 1 demonstrates that if the expected returns and variances for
our three hypothetical securities in Table 1 reflect equilibrium
beliefs, then the market portfolio would not be a mean-variance
efficient portfolio. But can these be equilibrium beliefs? Consider
the following simple market: Inhabitants of an island live on
coconuts and produce them from their own gardens. The island has
three enterprises, namely, three coconut farms. Once a year, a stock
market convenes to trade the shares of the three farms. Each year the
resulting share prices turn out to be the same as those of preceding
years. Thus the only source of uncertainty of return is the dividend
each stock pays during the year, which is the stock's pro rata share
of the farm's production. Markowitz (2005) shows that means,
variances, and covariances of coconut production exist that imply the
efficient set in Figure 1, or any of the other three-security
efficient sets presented in Markowitz's initial works (1952, and
Chapter 7 in 1959).

With such probability distributions of returns, the market is
rational in the sense that each participant knows the true
probability distribution of returns, and each seeks and achieves
mean-variance efficiency. Nevertheless, in contrast to the usual CAPM
conclusion, the market portfolio is not an efficient portfolio. It
also follows that there is no representative investor since no one
wants to hold the market portfolio.

Arbitrage 

Suppose that most investors are subject to the nonnegativity
requirement of inequalities (8b), but one investor can short in the
CAPM sense. (Perhaps the CAPM investor has surreptitious access to a
vault containing stock certificates that he or she can "borrow"
temporarily without posting collateral.) Would this CAPM investor,
with unlimited power to short and use the proceeds to buy long,
arbitrage away the inefficiency in the market portfolio?

Figure 2 Illustration That an Investor Who Can
Sell Short and Use the Proceeds to Buy Long
Should Not Short an Inefficient Market

Figure 2 shows that an investor would not do so. Suppose that
portfolio P is the one most preferred by the Roy CAPM investor. If
this investor shorts M and uses the proceeds to buy more P, then the
resulting portfolio will be on the straight line connecting M and P,
but this time on the far side of P (e.g., at Q) rather than between M
and P. But Q is not efficient for the Roy CAPM investor since it does
not lie on the $\ell\ell'$ line. The Roy CAPM investor is better off
just holding P rather than shorting M to buy more P.

With market participants holding portfolios d, e, and P and with the
weighted average of the d and e investors being at M, the new market
portfolio will be on the straight line between M and P, such as at
$M^a$, $M^b$, or $M^c$ in Figure 3.



Figure 3 Illustration That the Presence of a 
CAPM Short Seller Does Not Make the Market 
Portfolio Efficient 


$M^c$ cannot be the market equilibrium since this would imply a
negative market value for Security 2. Similarly, $M^b$ implies a zero
market value for Security 2, therefore a zero price.

Thus the only points (portfolios) between M and P that are consistent
with positive prices for all securities lie strictly between M and
$M^b$, such as $M^a$; but $M^a$ is not efficient for the investors
with either a standard or a Roy constraint set.

Expected Returns and Betas 

If Assumptions 1 through 4 (or Assumption 4') are true, then CAPM
Conclusion 2' follows: Expected returns are linearly related to the
betas of each security as in equation (7), that is,

$$E_1 = a + b\beta_1$$

$$E_2 = a + b\beta_2$$

$$E_3 = a + b\beta_3$$

where $\beta_i$ is the coefficient of the regression of the return on
the $i$-th security against the return on the market portfolio. In
other words, all $(E_i, \beta_i)$ combinations lie on the straight
line

$$Y = a + bX$$

with $Y = E_i$ and $X = \beta_i$.

But equation (7) does not typically hold if Assumptions 1 through 3
are true but neither Assumption 4 nor Assumption 4' is also true, as
illustrated using the data in Tables 2 and 3, and Figure 4. Table 2
shows the $\beta_i$ for portfolio P; Table 3 shows them for portfolio
M. These betas are computed using the fact that the regression
coefficient $\beta_{s,r}$ of random variable $s$ against a random
variable $r$ is

$$\beta_{s,r} = \frac{\mathrm{Covariance}(r, s)}{\mathrm{Variance}(r)}$$


Table 2 Betas versus Portfolio P

Security   Percent in P   $\mathrm{cov}_{i,P} = P_i V_i$   $\mathrm{beta}_{i,P}$
1           0.70           0.0227                           0.52
2          -0.25          -0.0036                          -0.08
3           0.55           0.0495                           1.12

Note: var(P) = 0.0440; $\mathrm{beta}_{i,P} = \mathrm{cov}_{i,P}/\mathrm{var}(P)$.


Table 3 Betas versus Portfolio M

Security   Percent in M   $\mathrm{cov}_{i,M} = M_i V_i$   $\mathrm{beta}_{i,M}$
1           0.30           0.0097                           0.36
2           0.19           0.0027                           0.10
3           0.51           0.0459                           1.71

Note: var(M) = 0.0268; $\mathrm{beta}_{i,M} = \mathrm{cov}_{i,M}/\mathrm{var}(M)$.


Figure 4 shows the plot of these betas against the expected returns
given in Table 1. The relationship between beta and expected return
is linear for regressions against P, as implied by equation (7), but
not against M. In general, expected returns are a linear function of
betas if and only if the regressions are against a portfolio on the
$\ell\ell'$ line. (See Chapter 12 in Markowitz and Todd [2000].)
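The betas in Tables 2 and 3 and the linearity claim can be reproduced approximately. The sketch assumes uncorrelated returns, so that the covariance of security i with a portfolio is the portfolio weight times the security variance, and reads the Table 1 entries as decimal fractions. The helper names `betas` and `max_line_residual` are our own; the second fits a straight line of expected return on beta and reports the largest deviation from it.

```python
import numpy as np

E = np.array([0.15, 0.10, 0.20])          # expected returns from Table 1
V = np.array([0.18, 0.12, 0.30]) ** 2     # variances (returns are uncorrelated)
P = np.array([0.70, -0.25, 0.55])         # portfolio P weights from Table 2
M = np.array([0.30, 0.19, 0.51])          # portfolio M weights from Table 3


def betas(weights):
    """Beta of each security against the portfolio: cov(i, portfolio) / var(portfolio)."""
    cov_with_portfolio = weights * V
    return cov_with_portfolio / (weights ** 2 @ V)


def max_line_residual(beta):
    """Largest gap between expected returns and the best-fitting line in beta."""
    slope, intercept = np.polyfit(beta, E, 1)
    return np.max(np.abs(E - (intercept + slope * beta)))


beta_P, beta_M = betas(P), betas(M)
resid_P, resid_M = max_line_residual(beta_P), max_line_residual(beta_M)

print("betas vs P:", beta_P, "max residual:", resid_P)
print("betas vs M:", beta_M, "max residual:", resid_M)
```

The residual for P is small and reflects only the rounding of the tabulated weights, whereas the residual for M is many times larger, in line with the statement that linearity holds for P but not for M.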


Limited Borrowing 

Thus far we have seen that the market portfolio is not necessarily an
efficient portfolio, and there is usually no linear relationship
between expected returns and betas (regressions against the market
portfolio) if the SL-CAPM or Roy CAPM constraint set is replaced by
the standard Markowitz constraint set, the constraints given by (8).
Figure 5 illustrates that the same conclusions hold if borrowing and
lending at a risk-free interest rate are permitted, but borrowing is
limited, for example, to 100% of the equity in the portfolio. In
Figure 5, Security 3 is the risk-free asset.


Figure 4 Linear Relationship between Expected
Returns and Betas If and Only If the Regression Is
Against a Portfolio on the Line $\ell\ell'$ in Figure 1
(vertical axis: Expected Return)



Figure 5 Illustration That If Borrowing Is Permitted
but Limited, the Market Portfolio Is Still
Typically Not an Efficient Portfolio

With 100% borrowing permitted, the set of feasible portfolios is no
longer on and in the triangle with (0, 0), (1, 0), and (0, 1) as its
vertices. Rather, the feasible region is on and in the triangle whose
vertices are (0, 0), (2, 0), and (0, 2). For example, the (2, 0)
point represents the portfolio with 200% invested in Security 1.

In the SL-CAPM, the efficient set starts at the portfolio (0, 0),
which holds only the risk-free asset. From there, the efficient set
moves along a straight line in the first quadrant of Figure 5.$^2$ In
the SL-CAPM, this efficient line would continue in the same direction
without limit. In the model with borrowing limited to at most 100% of
equity, the ray extending from (0, 0) is no longer feasible
(therefore no longer efficient) when it crosses the line connecting
(0, 2) and (2, 0), at b in the figure. The efficient set then moves
towards the leveraged portfolio with highest expected return: (2, 0)
in the present case. Thus in Figure 5 the set of efficient portfolios
is the line segment connecting (0, 0) to b, followed by the segment
connecting b to (2, 0). As in our analysis using the standard
constraint set, if some investors hold portfolio d and the remainder
hold portfolio e, then the "market portfolio" will be between them
(e.g., at M') and will not be an efficient portfolio.

We put "market portfolio" in quotes above because M' is a
leveraged portfolio. In order to meet the definition of
market portfolio in equation (1), so that the holdings in
the market portfolio sum to one, we must rescale M'. This
gives us the market portfolio (no quotation marks) M,
which is also not an efficient portfolio.

Finally, as in the analysis of the standard case, since M
is not on the ℓℓ′ line, there does not exist a linear
relationship between expected returns and betas. Also,
there is no "representative investor," since no investor
wants to hold the market portfolio.

Further Generalizations 

Suppose that there are n securities (for n = 3 or 30 or
3,000). Suppose that one security has the highest expected
return, and that the n securities have a "nonsingular
covariance matrix." This means that there is no riskless
combination of risky securities. If the only constraint on
the choice of portfolio is equation (6), then the
portfolios that minimize portfolio variance $V_p$ for
various values of portfolio expected return $E_p$ lie on a
single straight line in n-dimensional portfolio space.
This is not true for an investor also subject to
nonnegativity constraints such as in the inequalities
given by (8b).

The critical line algorithm (CLA) for tracing out all
efficient portfolios begins with the portfolio that is
100% invested in the security with highest expected return
(see Markowitz and Todd, 2000). It traces out the set of
efficient portfolios in a series of iterations. Each
iteration computes one piece (one linear segment) of the
piecewise linear efficient set. Each successive segment
has either one more or one less security than the
preceding segment. If the universe consists of, say,
10,000 securities, and if all securities are to be
demanded by someone, then this universal efficient
frontier must contain at least 10,000 segments. If
investors have sufficiently diverse risk tolerances, they
will choose portfolios on many different segments. The
market portfolio is a weighted average of individual
portfolios and typically will not be on any efficient
segment.

This characterization of efficient sets remains true if
limited borrowing is allowed, as we saw. It also remains
true when short selling is permitted but is subject to
Reg T or a similar constraint (see Jacobs, Levy, and
Markowitz, 2005).


CAPM INVESTORS DO NOT GET PAID FOR BEARING RISK

Recall that if the SL-CAPM assumptions are made, then a
stock's beta (regression against the market portfolio) is
proportional to its excess return, as shown in equation
(5). Markowitz shows that this does not imply that CAPM
investors are paid to bear risk (see Markowitz, 2008).

This is most easily seen if we assume that risks are
uncorrelated. (CAPM should cover this case, too.) In this
case, we show that two securities can have the same
variance but different expected returns, or the same
expected returns and different variances. Therefore, it
cannot be true that the investor is paid for bearing risk!

According to equation (10), the beta of $r_i$ against
$r_M$ is

$$\beta_i = \frac{\mathrm{Cov}(r_i, r_M)}{\mathrm{Var}(r_M)}$$

Therefore, equation (5) holds if and only if we also have

$$e_i = \tilde{b}\,\mathrm{Cov}(r_i, r_M) \qquad (11)$$

where

$$\tilde{b} = b/\mathrm{Var}(r_M)$$

In other words, excess return is proportional to
$\beta_i$ if and only if it is proportional to the
covariance between $r_i$ and $r_M$.

As a calculus exercise one can show that, in the
uncorrelated case, the SL-CAPM investor minimizes
portfolio variance for given portfolio mean if and only if
the investor chooses a portfolio such that

$$V_i X_i^I = k^I e_i \qquad (12a)$$

where $V_i$ is the variance of $r_i$ and $k^I$ depends on
the investor's risk aversion.

Equation (12a) implies a similar relationship for the
market portfolio (see Note 3):

$$V_i X_i^M = k^M e_i \qquad (12b)$$

Therefore,

$$X_i^M = k^M \left(\frac{e_i}{V_i}\right) \quad \text{for } i = 1, \ldots, n \qquad (12c)$$

Thus if two securities have the same positive excess
return but different variances, the market portfolio will
contain a larger dollar value of the one with the lower
variance. Conversely, if two securities have the same
variance but different positive excess returns, the market
portfolio will contain a larger dollar value of the one
with the higher excess return.

Now let us consider where the linear relationship in
equation (5), or (11), comes from in this case of
uncorrelated returns. It can be shown that in equation
(12b), $V_i X_i^M$ is the covariance of $r_i$ with the
market. Therefore, covariance with the market is
proportional to excess return (and vice versa) because the
security with the higher ratio of excess return to
variance is a larger part of the market portfolio.

Thus, in the uncorrelated case, the relationship between
beta and excess return in equation (5) results from the
security with higher excess return (per unit variance)
being a larger part of the market portfolio. The beta in
equation (5) is the regression of $r_i$ against the market
portfolio and, in the uncorrelated case, the only security
in the market portfolio with which it is correlated is
itself.
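
A small numerical sketch may make the uncorrelated case concrete.
The excess returns and variances below are hypothetical values
chosen for illustration, not figures from the text; the market
weights follow equation (12c), rescaled to sum to one.

```python
# Hypothetical inputs (assumptions for illustration only).
e = [0.04, 0.04, 0.08]   # excess returns: securities 1 and 2 are equal
V = [0.02, 0.04, 0.04]   # variances: securities 2 and 3 are equal

# Equation (12c): market weights proportional to e_i / V_i.
ratios = [ei / vi for ei, vi in zip(e, V)]
x_m = [r / sum(ratios) for r in ratios]

# Uncorrelated returns: cov(r_i, r_M) = V_i * X_i^M and
# var(r_M) = sum_i (X_i^M)^2 * V_i.
cov_im = [vi * xi for vi, xi in zip(V, x_m)]
var_m = sum(xi ** 2 * vi for xi, vi in zip(x_m, V))
beta = [c / var_m for c in cov_im]

# Excess return divided by beta is the same constant b for every security.
b = [ei / bi for ei, bi in zip(e, beta)]

print("market weights:", x_m)
print("betas:", beta)
print("excess return / beta:", b)
```

In this sketch securities 2 and 3 have the same variance but
different excess returns, and securities 1 and 2 have the same
excess return but different variances, yet excess return is
proportional to beta throughout. The proportionality comes from the
market weights, not from a payment for variance.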

When returns are correlated, the formula for the
covariance between security return and market portfolio
return is more complicated, but the basic principle is the
same. For example, if two securities have the same
covariance structure, the one with the higher expected
return will constitute a larger share of the market
portfolio (despite the presence in the market portfolio of
securities with which it is correlated) and hence have its
own returns more correlated with returns on the market
portfolio.


THE "TWO BETA" TRAP 

Two distinct meanings of the word "beta" are used in
modern financial theory. These meanings are sufficiently
alike for people to converse (some with one meaning in
mind, some with the other) without realizing they are
talking about two different things. The meanings are
sufficiently different, however, that one can validly
derive diametrically opposite conclusions depending on
which one is used. The net result of all this can be like
an Abbott and Costello vaudeville comedy routine with
portfolio theory rather than baseball as its setting. This
is what Markowitz (1984) calls the two beta trap. Below we
first review the background of the two betas and then
tabulate propositions that are true for one concept and
false for the other.

Beta₁₉₆₃

Sharpe's single-index (or one-factor) model of covariance
introduced in 1963 assumes that the returns of different
securities are correlated with each other because each is
dependent on some underlying systematic factor (see
Sharpe, 1963). This can be written as

$$r_i = \alpha_i + \beta_i F + u_i \qquad (13)$$

where the expected value of $u_i$ is zero, and $u_i$ is
uncorrelated with F and every other $u_j$.

Originally F was denoted by I and described as an
"underlying factor, the general prosperity of the market
as expressed by some index." We have changed the notation
from I to F to emphasize that $r_i$ depends on the
underlying factor rather than the index used to estimate
the factor. The index never measures the factor exactly,
no matter how many securities are used in the index,
provided that each security has positive variance of
$u_i$, since the index I equals:

$$I = \sum_i w_i r_i = \sum_i \alpha_i w_i + F\left(\sum_i w_i \beta_i\right) + \sum_i u_i w_i = A + BF + U \qquad (14)$$

where $w_i$ is the weight of return $r_i$ in the index, and

$$A = \sum_i \alpha_i w_i, \qquad B = \sum_i w_i \beta_i, \qquad U = \sum_i u_i w_i$$

U is the error in the observation of F. Under the
conditions stated, the variance of U is

$$V_U = \sum_{i=1}^{N} w_i^2 V_{u_i} > 0 \qquad (15)$$
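
Equation (15) is easy to evaluate numerically. The sketch below
uses hypothetical residual variances (0.04 for every security) and
equal weights, both of which are our assumptions; it shows the
error variance shrinking as securities are added while remaining
positive for any finite index.

```python
def index_error_variance(weights, residual_variances):
    """Equation (15): V_U = sum_i w_i^2 * V_{u_i}."""
    return sum(w * w * v for w, v in zip(weights, residual_variances))

# Hypothetical case: equal weights, residual variance 0.04 per security.
for n in (6, 100, 1000):
    weights = [1.0 / n] * n
    residual_variances = [0.04] * n
    v_u = index_error_variance(weights, residual_variances)
    print(f"N = {n:5d}: V_U = {v_u:.6f}")
```

Note that the result depends only on the weights and residual
variances of the securities in the index, not on how many
securities exist outside it.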

Sharpe (1963) tested equation (13) as an explanation of
how security returns tend to go up and down together. He
concluded that equation (13) was as complex a model of
covariance as seemed to be needed. This conclusion was
supported by research of Cohen and Pogue (1967). King
(1966) found strong evidence for industry factors in
addition to the market-wide factor. Rosenberg (1974) found
other sources of systematic risk beyond the market-wide
factor and industry factors.

We refer to the beta coefficient in equation (13) as
"beta₁₉₆₃" since it is the subject of Sharpe's 1963
article. We contrast the properties of this beta with that
of the beta that arises from the Sharpe-Lintner CAPM. The
latter we will refer to as "beta₁₉₆₄" since it is the
subject of Sharpe (1964).

Beta₁₉₆₄

We noted that the SL-CAPM makes various assumptions about
the world, including that all investors are mean-variance
efficient, have the same beliefs, and can lend or borrow
all they want at the same "risk-free" interest rate. Note,
however, one assumption that the SL-CAPM does not make is
that the covariances among securities satisfy equation
(13). On the contrary, the assumptions it makes concerning
covariances are quite general (see Note 4). They are
consistent with equation (13) but do not require it. They
are also consistent with the existence of industry factors
as noted by King, or other sources of systematic risk such
as those identified by Rosenberg.

As previously noted, the beta that appears in the CAPM
relationship of equation (5) (which we now refer to as
beta₁₉₆₄) is the regression of the ith security's return
against the return on the market portfolio. This is
defined whether or not the covariance structure is
generated by the single-factor model of equation (13).
Equation (5) is an assertion about the expected return of
a security and how it relates to the regression of the
security's return against the market-portfolio return.
Unlike equation (13), it is not an assertion about how
security returns covary.

One source of confusion between beta₁₉₆₃ and beta₁₉₆₄ is
that William Sharpe presented each of them. Sharpe,
however, has never been confused on this point. In
particular, when explaining beta₁₉₆₄ he emphasizes that he
derived it without assuming equation (13).

Propositions about Betas 

Table 4 lists various propositions about betas and
indicates whether they are true or false for beta₁₉₆₃ or
beta₁₉₆₄. The first column presents each proposition, the
second indicates whether the proposition is true or false
for beta₁₉₆₃, and the third column indicates the same for
beta₁₉₆₄. Most of the propositions in Table 4 are true for
one of the betas and false for the other.

Proposition 1 

Because of the definition of a regression beta in general,
both beta₁₉₆₃ and beta₁₉₆₄ equal

$$\beta_i = \mathrm{cov}(r_i, R)/V(R)$$

for some random variable R. In the case of beta₁₉₆₃, R is
F for equation (13); in the case of beta₁₉₆₄, R is the M
in equations (1) and (2).
Table 4 Propositions about Beta

Proposition                                              beta₁₉₆₃  beta₁₉₆₄

1. The $\beta_i$ of the ith security equals                  T         T
   $\mathrm{cov}(r_i, R)/V(R)$ for some random variable R.

2. R is "observable"; specifically, it may be computed       F         T
   exactly from security returns ($r_i$) and market
   values ($X_i$).

3. R is a value-weighted average of the ($r_i$).             F         T

4. An index I that estimates R should ideally be             T         F
   weighted by a combination of ($1/V_{u_i}$) and
   ($\beta_i/V_i$). Unfortunately, the $\beta_i$ and
   $V_{u_i}$ needed to determine these weights are
   unobservable.

5. If ideal weights are not used, then equal weights are     T         F
   "not bad" in computing I; specifically, nonoptimum
   weights can be compensated for by increased sample
   size.

6. Essentially, all that is important in computing I is      T         F
   to have a large number of securities; it is not
   necessary to have a large fraction of all securities.

7. The ideally weighted index is an efficient portfolio.     F         T


Proposition 2 

Equation (15) implies that F cannot be observed exactly no
matter how many securities are used to estimate it,
provided that no security has a zero variance of $u_i$. In
contrast, portfolio M in equation (2) is observable, at
least in principle, if only we are diligent enough to
measure each $X_i^M$ in the market. Thus, the assertion
that R is observable is true in principle for beta₁₉₆₄ and
false for beta₁₉₆₃.


Propositions 3 and 4 

One source of confusion about the two betas concerns
whether an index estimating R should be "value weighted";
that is, should the $w_i$ used in computing an estimate of
R from the $r_i$ equal the $X_i^M$? We have seen that in
the case of beta₁₉₆₄:

$$R = \sum_i X_i^M r_i$$

In this case $w_i = X_i^M$, the market-value weights.

The answer is different in the case of beta₁₉₆₃. Ideally,
we would like to eliminate the error term U from equation
(14). Our index would be perfect if $V_U = 0$, provided of
course $B \neq 0$. Nevertheless, as long as no security
has $V_{u_i} = 0$, the perfect index cannot be achieved
with a finite number of securities. Short of this, it
might seem that the best to be wished is that $V_U$ be a
minimum. In this case, $w_i$ would equal $1/V_{u_i}$. The
optimum choice of weights for estimating the underlying
factor F is more complicated, depending also on
$\beta_i/V_i$ (see Markowitz, 1983) and more complicated
still, since $V_{u_i}$ and $\beta_i$ are not known.

Proposition 5 

The fifth proposition in Table 4 asserts that if ideal
weights cannot be obtained, equal weights are good enough.
In particular, an increase in the number of securities can
compensate for nonoptimum weights. We have already seen
that this proposition is false for beta₁₉₆₄. It is easily
seen to be true for beta₁₉₆₃ under mild restrictions on
how fast the $V_{u_i}$ increases as i increases.

Proposition 6 

The next proposition asserts that all that is important in
designing a good index is to have many securities, as
opposed to having a large percentage of the population
represented in the index. This proposition is true for
I₁₉₆₃ and false for I₁₉₆₄, as may be illustrated by two
extreme examples.

First, suppose that there are only a few securities in the
entire population, and all of them are used in computing a
value-weighted index. Then I₁₉₆₄ would, in fact, be M and
would be precisely correct. In the case of I₁₉₆₃, on the
other hand, equation (15) implies that if n = 6, for
example, the error term $V_U$ is the same regardless of
whether the six securities are 100% or 1% of the universe.

At the other extreme, imagine that the sample is large but
is a small percentage of the total population. For
example, suppose N = 1,000 out of 100,000 securities. Then
I₁₉₆₃ will give a good reading for F, and therefore
beta₁₉₆₃, but I₁₉₆₄ may lead to serious misestimates of
beta₁₉₆₄. First, the covariance with I₁₉₆₄ of an asset not
in this index will tend to be too low. Second, if the
index contains more of certain kinds of assets than is
characteristic of the entire population, then assets of
this sort will tend to have a higher correlation with the
index than with the true M, and assets of other sorts will
tend to have lower correlations. More precisely, the
covariance between return $r_i$ and the market is a
weighted average of the covariances $\sigma_{ij}$
(including $V_i = \sigma_{ii}$) weighted by market values.
If the index chosen does not have approximately the same
average $\sigma_{ij}$ for a given i, the estimates of
$\beta_{i,1964}$ will be in error.

Proposition 7 

This proposition asserts that the ideal index is an
efficient portfolio. This is true for I₁₉₆₄ and false for
I₁₉₆₃ since one of the conclusions of the SL-CAPM
assumptions is that the market portfolio is efficient. In
fact, the market portfolio is the only combination of
risky assets that is efficient in this CAPM. All other
efficient portfolios consist of either investment in the
market portfolio plus lending at the risk-free rate, or of
investment in the market portfolio financed in part by
borrowing at the risk-free rate. On the other hand,
beta₁₉₆₃ has nothing to do with expected returns or market
efficiency.


KEY POINTS 

• The two major conclusions of the Sharpe-Lintner CAPM are
  that (1) the market portfolio is a mean-variance
  efficient portfolio; and (2) the excess return of each
  security is proportional to its beta.

• The "market portfolio" includes all securities in the
  market.

• The beta (β) in the CAPM is estimated by regression
  analysis using historical data on observed returns for a
  security (response variable) and observed returns for
  the market (explanatory variable).

• The Roy CAPM differs from the Sharpe-Lintner CAPM only
  in its assumption concerning the investment constraint
  imposed by investors. More specifically, it assumes that
  each investor can short securities.

• Confusion regarding the CAPM involves (1) the failure to
  distinguish between the following two statements: The
  market is efficient in that each participant has correct
  beliefs and uses them to their advantage on the one
  hand, and the market portfolio is a mean-variance
  efficient portfolio on the other hand; (2) belief that
  CAPM investors get paid for bearing nondiversifiable
  risk; and (3) failure to distinguish between the beta in
  Sharpe's one-factor model of covariance (1963 beta) and
  that in Sharpe's CAPM (1964 beta).

NOTES 

1. Markowitz presents examples of three-security standard
   analyses in which "c" is feasible in some cases and not
   feasible in others. It is possible in the latter case
   for the set of mean-variance efficient portfolios to be
   a single line segment or even a single point. But
   typically, when "c" is outside of the feasible
   triangle, as well as when it is within it, the set of
   efficient portfolios consists of two or more line
   segments (the "efficient segments"), which meet at
   "corner portfolios." Thus the construction in Figure 1
   can typically be carried out in cases in which "c" is
   not feasible. (See Markowitz, 1952.)

2. The SL-CAPM requires nonnegative investments. Thus if
   the parameters of an example were such that the
   straight line would move into, say, the fourth
   quadrant, $X_2$ would equal zero on the line and would,
   in effect, drop out of the market, and out of the
   analysis.

3. If we multiply both sides of equation (12a) by $w^I$,
   the Ith investor's equity as a fraction of total market
   equity, and sum we get

   $$V_i \left(\sum_I w^I X_i^I\right) = \left(\sum_I w^I k^I\right) e_i$$

   If we sum the above over all securities, the second
   factor on the left, namely

   $$S = \sum_i \left(\sum_I w^I X_i^I\right)$$

   will not necessarily sum to one since nothing in the
   SL-CAPM assumptions prevents market participants from
   being either net borrowers or net lenders. However, if
   we divide both sides of equation (12c) by S, we get
   equation (12b) for the market portfolio as defined in
   equations (1) and (2).

4. Mossin (1966) provides a precise statement of the
   assumptions behind the SL-CAPM. Specifically, all that
   Mossin assumes about covariances is that the covariance
   matrix is nonsingular (i.e., that no portfolio of risky
   securities is riskless).

Abstract: The dynamics of asset price processes in discrete time increments are typically described by
two kinds of models: trees (lattices) and random walks. Arithmetic, geometric, and mean reverting
random walks are examples of the latter type of models. When the time increment used to model
the asset price dynamics becomes infinitely small, we talk about stochastic processes in continuous
time. Models for asset price dynamics can incorporate different observed characteristics of an asset
price process, such as a drift or a reversion to a mean, and are important building blocks for risk
management and financial derivative pricing models.


Many classical asset pricing models, such as the capital
asset pricing theory and the arbitrage pricing theory,
take a myopic view of investing: They consider events that
happen one time period ahead, where the length of the time
period is determined by the investor. This entry presents
apparatus that can handle asset dynamics and volatility
over time. The dynamics of price processes in discrete
time increments are typically described by two kinds of
models: trees (such as binomial trees) and random walks.
When the time increment used to model the asset price
dynamics becomes infinitely small, we talk about
stochastic processes in continuous time.

In this entry, we introduce the fundamentals of binomial
tree and random walk models, providing examples for how
they can be used in practice. We briefly discuss the
special notation and terminology associated with
stochastic processes at the end of the entry; however, our
focus is on interpretation and simulation of processes in
discrete time. The roots for the techniques we describe
are in physics and the other natural sciences. They were
first applied in finance at the beginning of the 20th
century and have represented the foundations of asset
pricing ever since.

FINANCIAL TIME SERIES 

Let us first introduce some definitions and notation. A
financial time series is a sequence of observations of the
values of a financial variable, such as an asset price
(index level) or asset (index) returns, over time.
Figure 1 shows an example of a time series, consisting of
weekly observations of the S&P 500 price level over a
period of four years (August 19, 2005 to August 19, 2009).

Figure 1 S&P 500 Index Level between August 19, 2005 and
August 19, 2009

When we describe a time series, we talk about its drift
and volatility. The term "drift" is used to indicate the
direction of any observable trend in the time series. In
the example shown in Figure 1, it appears that the S&P 500
time series has a positive drift up from August 2005 until
about the middle of 2007, as the level of prices appears
to have been generally increasing over that time period.
From the middle of 2007 until the beginning of 2009, there
is a negative drift. The volatility is smaller (the time
series is less "squiggly") from August 2005 until about
the middle of 2007, but increases dramatically between the
middle of 2007 and the beginning of 2009.

We are usually interested also in whether the volatility
increases when the price level increases, decreases when
the price level decreases, or remains constant
independently of the current price level. In this example,
the volatility is lower when the price level is
increasing, and is higher when the price level is
decreasing.

Finally, we talk about the continuity of the time series:
is the time series smooth, or are there jumps whose
magnitude appears to be large relative to the price
movements the rest of the time? From August 2005 until
about the middle of 2007, the time series is quite smooth.
However, some dramatic drops in price levels can be
observed between the middle of 2007 and the beginning of
2009, notably in the fall of 2008.


For the remainder of this entry, we will use the following
notation:

• $S_t$: value of underlying variable (price, interest
  rate, index level, etc.) at time t.

• $S_{t+1}$: value of underlying variable (price, interest
  rate, etc.) at time t + 1.

• $\omega_t$: a random error term observed at time t. (For
  the applications in this entry, it will follow a normal
  distribution with mean equal to 0 and standard deviation
  equal to $\sigma$.)

• $\varepsilon_t$: a realization of a normal random
  variable with mean equal to 0 and standard deviation
  equal to 1 at time t.


BINOMIAL TREES 

Binomial trees (also called binomial lattices) provide a
natural way to model the dynamics of a random process over
time. The initial value of the security $S_0$ (at time 0)
is known. The length of a time period, $\Delta t$, is
specified before the tree is built. (The symbol $\Delta$
is often used to denote difference. The notation
$\Delta t$ therefore means time difference, i.e., length
of one time period.)

The binomial tree model assumes that at the next time
period, only two values are possible for the price, that
is, the price may go up with probability p or down with
probability (1 - p). Usually, these values are represented
as multiples of the price at the beginning of the period.
The factor u is used for an up movement, and d is used for
a down movement. For example, the two prices at the end of
the first time period are $u \cdot S_0$ and $d \cdot S_0$.
If the tree is recombining, there will be three possible
prices at the end of the second time period:
$u^2 \cdot S_0$, $u \cdot d \cdot S_0$, and
$d^2 \cdot S_0$. Proceeding in a similar manner, we can
build the tree in Figure 2.
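
The construction just described can be sketched in a few lines. The
function name and the inputs (an initial price of 100, u = 1.1,
d = 0.9) are hypothetical choices for illustration; after t steps
with k up moves, the price in a recombining tree is
$u^k \cdot d^{t-k} \cdot S_0$.

```python
def binomial_tree(s0, u, d, n_steps):
    """Return the price levels of a recombining binomial tree.

    Element t of the result lists the t + 1 possible prices at time t,
    ordered from the most up moves to the fewest.
    """
    tree = []
    for t in range(n_steps + 1):
        # k counts the up moves among the first t steps.
        level = [s0 * (u ** k) * (d ** (t - k)) for k in range(t, -1, -1)]
        tree.append(level)
    return tree

# Hypothetical inputs: S_0 = 100, u = 1.1, d = 0.9, three time periods.
tree = binomial_tree(s0=100.0, u=1.1, d=0.9, n_steps=3)
for t, level in enumerate(tree):
    print(f"time {t}: " + ", ".join(f"{price:.2f}" for price in level))
```

Because the tree recombines, time t has only t + 1 distinct prices
rather than 2^t.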

The binomial tree model may appear simple, because, given
a current price, it only allows for two possibilities for
the price at each time period. However, if the length of
the time period is small, it is possible to represent a
wide range of values for the price after only a few steps.
To see this, notice that each step in the tree can be
thought of as a Bernoulli trial: it is a "success" with
probability p and a "failure" with probability (1 - p).
(One can think of the Bernoulli random variable as the
numerical coding of the outcome of a coin toss, where one
outcome is considered a "success" and one outcome is
considered a "failure." The Bernoulli random variable
takes the value 1 ("success") with probability p and the
value of 0 ("failure") with probability 1 - p. Note that
the definition of success and failure here is arbitrary,
because an increase in price is not always desirable, but
we define them in this way for the example's sake.)

Figure 2 Example of a Binomial Tree (Time 0 through
Time 3)

After n steps, each particular value for the price will be
reached by realizing k successes and (n - k) failures,
where k is a number between 0 and n. The probability of
reaching each value for the price after n steps will be

$$P(k \text{ successes}) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

For large values of n, the shape of the binomial
distribution becomes more and more symmetric and looks
like a continuum. (See Figure 3(A)-(C).) In fact, the
binomial distribution approximates a normal distribution
with specific mean and standard deviation related to the
probability of success and the number of trials. (The
normal distribution is a continuous probability
distribution. It is represented by a bell-shaped curve,
and the shape of the curve is entirely described by the
distribution mean and variance. Figure 4 shows a graph of
the standard normal distribution, which has a mean of zero
and a standard deviation of 1.) One can therefore
represent a large range of values for the price as long as
the number of time periods used in the binomial tree is
large. Practitioners often also use trinomial trees, that
is, trees with three branches emanating from each node, in
order to obtain a better representation of the range of
possible prices in the future.
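
The convergence toward a normal shape can be checked numerically.
The sketch below compares the binomial probability near the center
of the distribution with the density of a normal distribution with
mean n·p and standard deviation sqrt(n·p·(1 - p)), which is the
standard approximating normal. The value p = 0.3 and the values of
n follow Figure 3; the function names are ours.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

p = 0.3  # probability of success, as in Figure 3
for n in (3, 20, 100):
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    k = round(mean)  # a value near the center of the distribution
    print(f"n = {n:3d}, k = {k:2d}: "
          f"binomial = {binomial_pmf(k, n, p):.4f}, "
          f"normal = {normal_pdf(k, mean, sd):.4f}")
```

The gap between the two columns narrows as n grows, in line with
the panels of Figure 3.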



Figure 3 Binomial Distribution

Note: Probability of success (p) assumed to be 0.3. Number of trials (A) n = 3; (B) n = 20; (C) n = 100.

Figure 4 Standard Normal Distribution


ARITHMETIC RANDOM WALKS

Instead of assuming that at each step the asset price can
only move up or down by a certain multiple with a given
probability, we could assume that the price moves by an
amount that follows a normal distribution with mean $\mu$
and standard deviation $\sigma$. In other words, the price
for each period is determined from the price of the
previous period by the equation

$$S_{t+1} = S_t + \mu + \omega_t$$

where $\omega_t$ is a normal random variable with mean 0
and standard deviation $\sigma$. We will also assume that
the random variable $\omega_t$ describing the change in
the price in one time period is independent of the random
variables describing the change in the price in any other
time period. (This is known as the Markov property. It
implies that past prices are irrelevant for forecasting
the future, and only the current value of the price is
relevant for predicting the price in the next time
period.) A sequence of independent and identically
distributed (IID) random variables
$\omega_0, \ldots, \omega_t, \ldots$ with zero mean and
finite variance $\sigma^2$ is sometimes referred to as
white noise.

The movement of the price expressed through the equation
above is called an arithmetic random walk with drift. The
drift term, $\mu$, represents the average change in price
over a single time period. Note that for every time period
t, we can write the equation for the arithmetic random
walk as

$$
\begin{aligned}
S_t &= S_{t-1} + \mu + \omega_{t-1} \\
    &= (S_{t-2} + \mu + \omega_{t-2}) + \mu + \omega_{t-1} \\
    &= (S_{t-3} + \mu + \omega_{t-3}) + 2 \cdot \mu + \omega_{t-1} + \omega_{t-2} \\
    &\;\;\vdots \\
    &= S_0 + \mu \cdot t + \sum_{i=0}^{t-1} \omega_i
\end{aligned}
$$

Therefore, an arithmetic random walk can be thought of as
a sum of two terms: a deterministic straight line
$S_t = S_0 + \mu \cdot t$ and a sum of all past noise
terms. (See Figure 5.)

Figure 5 Five Paths of an Arithmetic Random Walk Assuming
$\mu = -0.1697$ and $\sigma = 3.1166$


Simulation 

The equation for the arithmetic random walk can be
expressed also as

$$S_{t+1} = S_t + \mu + \sigma \cdot \varepsilon_t$$

where $\varepsilon_t$ is a standard normal random
variable. To show this, we need to mention that every
normal distribution can be expressed in terms of the
standard normal distribution, and the latter has mean of 0
and standard deviation of 1. Namely, if $\varepsilon$ is a
standard normal variable with mean 0 and standard
deviation 1, and x is a normal random variable with mean
$\mu$ and standard deviation $\sigma$, we have

$$\varepsilon = \frac{x - \mu}{\sigma} \quad \text{(equivalently, } x = \sigma \cdot \varepsilon + \mu\text{)}$$
This rescaling works because a normal random variable
remains normal after it is shifted by a constant and
multiplied by a constant. In the context of the equation
for the arithmetic random walk, we have a normal random
variable $\omega_t$ with mean 0 and standard deviation
$\sigma$. It can be expressed through a standard normal
variable $\varepsilon_t$ as $\sigma \cdot \varepsilon_t + 0$.

The equation for $S_{t+1}$ above makes it easy to generate paths for the arithmetic random walk by simulation. All we need is a way of generating the standard normal random variables $\varepsilon_t$. We start with an initial price $S_0$, which is known. We also know the values of the drift $\mu$ and the volatility $\sigma$ over one period. To generate the price at the next time period, $S_1$, we add $\mu$ to $S_0$, simulate a random variable from a standard normal distribution, multiply it by $\sigma$, and add it to $S_0 + \mu$. At the next step (time period 2), we take the price at time period 1 that we already generated, $S_1$, add $\mu$ to it, simulate a new random variable from a standard normal distribution, multiply it by $\sigma$, and add it to $S_1 + \mu$. We proceed in the same way until we generate the desired number of steps of the random walk. For example, given a current price S, in Excel the price for the next time period can be generated with the formula

S + μ + σ*NORMINV(RAND(), 0, 1)
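The same recursion can be sketched in Python using only the standard library. This is a minimal illustration rather than a reference implementation: the function name `simulate_arithmetic_walk` and the parameter values are assumptions made for the example, and `random.gauss(0, 1)` plays the role of the standard normal draw in the Excel formula.

```python
import random


def simulate_arithmetic_walk(s0, mu, sigma, n_steps, rng=None):
    """Return the path [S_0, ..., S_n] of S_{t+1} = S_t + mu + sigma * eps_t."""
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # draw from a standard normal distribution
        path.append(path[-1] + mu + sigma * eps)
    return path


# Hypothetical inputs: initial price 100, drift 0.5 and volatility 2 per period.
example_path = simulate_arithmetic_walk(
    100.0, 0.5, 2.0, n_steps=10, rng=random.Random(42)
)
```

Passing a seeded `random.Random` instance makes a simulated path reproducible, which is convenient when comparing parameter choices.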

Parameter Estimation 

In order to simulate paths of the arithmetic random walk, we need estimates of the parameters ($\mu$ and $\sigma$). We need to assume that these parameters remain constant over the time period of estimation. Note that the equation for the arithmetic random walk can be written as

$$S_{t+1} - S_t = \mu + \sigma \cdot \varepsilon_t$$

Given a historical series of T prices for an asset, we can therefore do the following to estimate $\mu$ and $\sigma$:

1. Compute the price changes $S_{t+1} - S_t$ for each time period $t$, $t = 0, \dots, T-1$.

2. Estimate the drift of the arithmetic random walk, $\mu$, as the average of all the price changes.

3. Estimate the volatility of the arithmetic random walk, $\sigma$, as the standard deviation of all the price changes.
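Steps 1-3 can be sketched in a few lines of Python. The function name `estimate_arithmetic_params` and the four sample prices are hypothetical, and the sketch assumes the sample (n - 1) standard deviation, a choice the text does not specify.

```python
import statistics


def estimate_arithmetic_params(prices):
    """Estimate per-period drift and volatility from a list of prices."""
    # Step 1: price changes S_{t+1} - S_t.
    changes = [nxt - cur for cur, nxt in zip(prices, prices[1:])]
    # Step 2: drift is the average price change.
    mu_hat = statistics.mean(changes)
    # Step 3: volatility is the standard deviation of the price changes
    # (sample version, assumed here).
    sigma_hat = statistics.stdev(changes)
    return mu_hat, sigma_hat


# Hypothetical price history, for illustration only.
mu_hat, sigma_hat = estimate_arithmetic_params([100.0, 102.0, 101.0, 105.0])
```

The estimates are expressed per observation interval of the input series, so weekly prices give a weekly drift and a weekly volatility.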

An important point to keep in mind is the units in which the parameters are estimated. If we are given time series in monthly increments, then the estimates of $\mu$ and $\sigma$ we obtain through steps 1-3 will be for monthly drift and monthly volatility. If we then need to simulate future paths for monthly observations, we can use the same $\mu$ and $\sigma$. However, if, for example, we need to simulate weekly observations, we will need to adjust $\mu$ and $\sigma$ to account for the difference in the length of the time period. In general, the parameters should be stated as annual estimates. The annual estimates can then be adjusted for daily, weekly, monthly, and other increments.

For example, suppose that we have estimated the weekly drift and the weekly volatility. To convert the weekly drift to an annual drift, we multiply the number we found for the weekly drift by 52, the number of weeks in a year. To convert the weekly volatility to annual volatility, we multiply the number we found for the weekly volatility by the square root of the number of weeks in a year, that is, by $\sqrt{52}$. Conversely, if we are given annualized values for the drift and the volatility, we can obtain weekly values by dividing the annual drift and the annual volatility by 52 and $\sqrt{52}$, respectively.
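The conversion rule can be written as a small helper. The name `rescale_params` and the weekly figures below are assumptions made for illustration; the rule itself (drift scales with time, volatility with the square root of time) is the one described above.

```python
import math


def rescale_params(mu, sigma, n_periods):
    """Aggregate per-period drift and volatility over n_periods periods.

    Drift scales linearly with time; volatility scales with its square root.
    A fractional n_periods (for example 1/52) converts annual to weekly.
    """
    return mu * n_periods, sigma * math.sqrt(n_periods)


# Hypothetical weekly estimates converted to annual values and back again.
annual_mu, annual_sigma = rescale_params(0.002, 0.03, 52)
weekly_mu, weekly_sigma = rescale_params(annual_mu, annual_sigma, 1 / 52)
```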


Arithmetic Random Walks: Some Additional Facts

If we use the arithmetic random walk model, any price in the future, $S_t$, can be expressed through the initial (known) price $S_0$ as

$$S_t = S_0 + \mu \cdot t + \sigma \cdot \sum_{i=0}^{t-1} \varepsilon_i$$

The random variable corresponding to the sum of $t$ independent normal random variables $\varepsilon_0, \dots, \varepsilon_{t-1}$ is a normal random variable with mean equal to the sum of the means and standard deviation equal to the square root of the sum of the variances. Since $\varepsilon_0, \dots, \varepsilon_{t-1}$ are independent standard normal variables, their sum is a normal variable with mean 0 and standard deviation equal to

$$\sqrt{\underbrace{1 + \dots + 1}_{t \text{ times}}} = \sqrt{t}$$

Therefore, we can have a closed-form expression for computing the asset price at time $t$ given the asset price at time 0:

$$S_t = S_0 + \mu \cdot t + \sigma \cdot \sqrt{t} \cdot \varepsilon$$

where $\varepsilon$ is a standard normal random variable.

Based on the discussion so far in this section, we can state the following observations about the arithmetic random walk:

• The arithmetic random walk has a constant drift $\mu$ and volatility $\sigma$, that is, at every time period, the change in price is normally distributed, on average equal to $\mu$, with a standard deviation of $\sigma$.

• The overall noise in a random walk never decays. The price change over $t$ time periods is distributed as a normal distribution with mean equal to $\mu \cdot t$ and standard deviation equal to $\sigma \sqrt{t}$. That is why in industry one often encounters the phrase "The uncertainty grows with the square root of time."

• Prices that follow an arithmetic random walk meander around a straight line $S_t = S_0 + \mu \cdot t$. They may depart from the line, and then cross it again.

• Because the distribution of future prices is normal, we can theoretically find the probability that the future price at any time will be within a given range.

• Because the distribution of future prices is normal, future prices can theoretically take infinitely large or infinitely small values. Thus, they can be negative, which is an undesirable consequence of using the model.


Asset prices, of course, cannot be negative. In practice, the probability of the price becoming negative can be made quite small as long as the drift and the volatility parameters are selected carefully. However, the possibility of generating negative prices with the arithmetic random walk model is real.
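Because the price at time $t$ is normal with mean $S_0 + \mu \cdot t$ and standard deviation $\sigma \sqrt{t}$, the probability of a negative price can be computed directly from the normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `prob_price_below` and the parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def prob_price_below(level, s0, mu, sigma, t):
    """P(S_t < level) when S_t is Normal(s0 + mu*t, (sigma*sqrt(t))^2)."""
    future_price = NormalDist(mu=s0 + mu * t, sigma=sigma * math.sqrt(t))
    return future_price.cdf(level)


# Hypothetical parameters: S_0 = 10, drift 0.1 and volatility 2 per period.
p_negative_25 = prob_price_below(0.0, 10.0, 0.1, 2.0, 25)
p_negative_100 = prob_price_below(0.0, 10.0, 0.1, 2.0, 100)
```

With these illustrative numbers the volatility is large relative to the price level, so the probability of a negative price is far from negligible and is higher at the longer horizon.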

Another problem with the assumptions underlying the arithmetic random walk is that the change in the asset price is drawn from the same probability distribution, independently of the current level of the prices. A more natural model is to assume that the parameters of the probability distribution for the change in the asset price vary depending on the current price level. For example, a $1 change in a stock price is more likely when the stock price is $100 than when it is $4. Empirical studies confirm that over time, asset prices tend to grow, and so do fluctuations. Only returns appear to remain stationary, that is, to follow the same probability distribution over time. A more realistic model for asset prices may therefore be that returns are an IID sequence. We describe such a model in the next section.


GEOMETRIC RANDOM WALKS

Consider the following model:

$$r_t = \mu + \sigma \cdot \varepsilon_t$$

where $\varepsilon_0, \dots, \varepsilon_t$ is a sequence of independent standard normal variables, and $r_t$, the return, is computed as

$$r_t = \frac{S_{t+1} - S_t}{S_t}$$

Returns are therefore normally distributed, and the return over each interval of length 1 has mean $\mu$ and standard deviation $\sigma$. How can we express future prices if returns are determined by the equations above?
Suppose we know the price at time $t$, $S_t$. The price at time $t+1$ can be written as

$$
\begin{aligned}
S_{t+1} &= S_t \cdot \frac{S_{t+1}}{S_t} \\
&= S_t \cdot \left( \frac{S_t}{S_t} + \frac{S_{t+1} - S_t}{S_t} \right) \\
&= S_t \cdot (1 + r_t) \\
&= S_t + \mu \cdot S_t + \sigma \cdot S_t \cdot \varepsilon_t
\end{aligned}
$$

This last equation is very similar to the equation for the arithmetic random walk, except that the price from the previous time period appears as a factor in all of the terms.

The equation for the geometric random walk makes it clear how paths for the geometric random walk can be generated. As in the case of the arithmetic random walk, all we need is a way of generating the standard normal random variables $\varepsilon_t$. We start with an initial price $S_0$, which is known. We also know the values of the drift $\mu$ and the volatility $\sigma$ over one period. To generate the price at the next time period, $S_1$, we add $\mu \cdot S_0$ to $S_0$, simulate a random variable from a standard normal distribution, multiply it by $\sigma$ and $S_0$, and add it to $S_0 + \mu \cdot S_0$. At the next step (time period 2), we take the price at time period 1 that we already generated, $S_1$, add $\mu \cdot S_1$ to it, simulate a new random variable from a standard normal distribution, multiply it by $\sigma$ and $S_1$, and add it to $S_1 + \mu \cdot S_1$. We proceed in the same way until we generate the desired number of steps of the geometric random walk. For example, given a current price S, in Excel the price for the next time period can be generated with the formula

S + μ*S + σ*S*NORMINV(RAND(), 0, 1)
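A Python sketch of this discrete recursion follows. The function name `simulate_geometric_walk` and the inputs are illustrative assumptions. Note that this discrete version can still produce a negative price when a shock is large enough.

```python
import random


def simulate_geometric_walk(s0, mu, sigma, n_steps, rng=None):
    """Return [S_0, ..., S_n] for S_{t+1} = S_t + mu*S_t + sigma*S_t*eps_t."""
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        eps = rng.gauss(0.0, 1.0)  # standard normal draw
        # Both the drift and the shock are proportional to the current price.
        path.append(s + mu * s + sigma * s * eps)
    return path


# Hypothetical inputs: initial price 100, 1% drift and 5% volatility per period.
example_path = simulate_geometric_walk(
    100.0, 0.01, 0.05, n_steps=12, rng=random.Random(7)
)
```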

Using similar logic to the derivation of the price equation earlier, we can express the price at any time $t$ in terms of the known initial price $S_0$. Note that we can write the price at time $t$ as

$$S_t = S_0 \cdot \frac{S_1}{S_0} \cdot \ldots \cdot \frac{S_{t-1}}{S_{t-2}} \cdot \frac{S_t}{S_{t-1}}$$

Therefore,

$$S_t = S_0 \cdot (1 + r_0) \cdot \ldots \cdot (1 + r_{t-1})$$

In the case of the arithmetic random walk, we determined that the price at any time period follows a normal distribution. This was because if we know the starting price $S_0$, the price at any time period could be obtained by adding a sum of independent normal random variables to a constant term and $S_0$. The sum of independent normal random variables is a normal random variable itself. In the equation for the geometric random walk, each of the terms $(1 + r_0), \dots, (1 + r_{t-1})$ is a normal random variable as well. (Each is the sum of a normal random variable and a constant.) However, they are multiplied together. The product of normal random variables is not a normal random variable, which means that we cannot have a nice closed-form expression for computing the price $S_t$ based on $S_0$.

To avoid this problem, let us consider the natural logarithm of prices. (The natural logarithm is the function ln such that $e^{\ln(x)} = x$, where e is the number 2.7182....) Unless otherwise specified, we will use "logarithm" to refer to the natural logarithm, that is, the logarithm of base e.

If we take logarithms of both sides of the equation for $S_t$, we get

$$
\begin{aligned}
\ln(S_t) &= \ln(S_0 \cdot (1 + r_0) \cdots (1 + r_{t-1})) \\
&= \ln(S_0) + \ln(1 + r_0) + \dots + \ln(1 + r_{t-1})
\end{aligned}
$$

Log returns are in fact differences of log prices. To see this, note that

$$
\begin{aligned}
\ln(1 + r_t) &= \ln\left(1 + \frac{S_{t+1} - S_t}{S_t}\right) \\
&= \ln\left(\frac{S_{t+1}}{S_t}\right) \\
&= \ln(S_{t+1}) - \ln(S_t)
\end{aligned}
$$

Now assume that log returns (not returns) are independent and follow a normal distribution with mean $\mu$ and standard deviation $\sigma$:

$$\ln(1 + r_t) = \ln(S_{t+1}) - \ln(S_t) = \mu + \sigma \cdot \varepsilon_t$$
As a sum of independent normal variables, the expression

$$\ln(S_0) + \ln(1 + r_0) + \dots + \ln(1 + r_{t-1})$$

is also normally distributed. This means that $\ln(S_t)$ (rather than $S_t$) is normally distributed, that is, $S_t$ is a lognormal random variable. Similarly to the case of an arithmetic random walk, we can compute a closed-form expression for the price $S_t$ given $S_0$:

$$\ln(S_t) = \ln(S_0) + \left(\mu - \tfrac{1}{2}\sigma^2\right) \cdot t + \sigma \cdot \sqrt{t} \cdot \varepsilon$$

or, equivalently,

$$S_t = S_0 \cdot e^{\left(\mu - \frac{1}{2}\sigma^2\right) \cdot t + \sigma \cdot \sqrt{t} \cdot \varepsilon}$$

where $\varepsilon$ is a standard normal variable.

Notice that the only difference from the formula for the arithmetic random walk is the presence of the extra term

$$-\tfrac{1}{2}\sigma^2 \cdot t$$

in the drift term

$$\left(\mu - \tfrac{1}{2}\sigma^2\right) \cdot t$$



Why is there an adjustment of one half of the variance in the expected drift? In general, if $Y$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then the random variable that is the exponential of $Y$, $X = e^{Y}$, has mean

$$E[X] = e^{\mu + \frac{1}{2}\sigma^2}$$

At first, this seems unintuitive: why is the expected value of $X$ not

$$E[X] = e^{\mu}?$$

The expected value of a linear function of a random variable is a linear function of the expected value of the random variable. For example, if $a$ is a constant, then

$$E[a \cdot Y] = a \cdot E[Y]$$

However, determining the expected value of a nonlinear function of a random variable (in particular, the exponential function, which is the function we are using here) is not as trivial. For example, there is a well-known relationship, the Jensen inequality, which states that the expected value of a convex function of a random variable is greater than or equal to the value of the function at the expected value of the random variable.

Figure 6 Example of a Lognormal Distribution with Mean of 1 and Standard Deviation of 0.8

In our example, $X$ is a lognormal random variable, so its probability distribution has the shape shown in Figure 6. The random variable $X$ cannot take values less than 0. Since its variance is related to the variance of the normal random variable $Y$, as the variance $\sigma^2$ of $Y$ increases, the distribution of $X$ will spread out in the upward direction. This means that the mean of the lognormal variable $X$ will increase not only as the mean of the normal variable $Y$, $\mu$, increases, but also as $Y$'s variance, $\sigma^2$, increases. In the context of the geometric random walk, $Y$ represents the normally distributed log returns, and $X$ is in fact the factor by which the asset price from the previous period is multiplied in order to generate the asset price in the next time period. In order to make sure that the geometric random process grows exponentially at average rate $\mu$, we need to subtract a term (that term turns out to be $\sigma^2 / 2$), which will correct the bias.
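For readers who want to see where the $\sigma^2 / 2$ term comes from, the mean of $X = e^{Y}$ can be obtained by completing the square in the exponent of the normal density. The derivation below is a standard calculation added for illustration; it uses only the definitions of $Y$, $\mu$, and $\sigma$ given above.

```latex
\begin{aligned}
E\left[e^{Y}\right]
  &= \int_{-\infty}^{\infty} e^{y} \,
     \frac{1}{\sigma\sqrt{2\pi}} \,
     e^{-\frac{(y-\mu)^2}{2\sigma^2}} \, dy \\
  &= e^{\mu + \frac{1}{2}\sigma^2}
     \int_{-\infty}^{\infty}
     \frac{1}{\sigma\sqrt{2\pi}} \,
     e^{-\frac{(y-\mu-\sigma^2)^2}{2\sigma^2}} \, dy \\
  &= e^{\mu + \frac{1}{2}\sigma^2}
\end{aligned}
```

The second line uses the identity $y - \frac{(y-\mu)^2}{2\sigma^2} = \mu + \frac{1}{2}\sigma^2 - \frac{(y-\mu-\sigma^2)^2}{2\sigma^2}$, and the remaining integral equals 1 because it is the total probability of a normal density with mean $\mu + \sigma^2$ and standard deviation $\sigma$.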

Specifically, suppose that we know the price at time $t$, $S_t$. We have

$$\ln(S_{t+1}) = \ln(S_t) + \ln(1 + r_t)$$

that is,

$$S_{t+1} = S_t \cdot e^{\ln(1 + r_t)}$$

Note that we are explicitly assuming a multiplicative model for asset prices here: the price in the next time period is obtained by multiplying the price from the previous time period by a random factor. In the case of an arithmetic random walk, we had an additive model: a random shock was added to the asset price from the previous time period.

If the log return $\ln(1 + r_t)$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the expected value of

$$e^{\ln(1 + r_t)}$$

is

$$e^{\mu + \frac{1}{2}\sigma^2}$$

and hence

$$E[S_{t+1}] = S_t \cdot e^{\mu + \frac{1}{2}\sigma^2}$$

In order to make sure that the geometric random walk process grows exponentially at an average rate $\mu$ (rather than $\mu + 0.5 \cdot \sigma^2$), we need to subtract the term $0.5 \cdot \sigma^2$ when we generate the future price from this process. This argument can be extended to determining prices for more than one time period ahead.

We will understand better why this formula holds when we review stochastic processes at the end of this entry.

Simulation 

It is easy to see how future prices can be generated based on the initial price $S_0$. First, we compute the exponent of e: we simulate a value for a standard normal random variable, multiply it by the standard deviation and the square root of the number of time periods between the initial point and the point we are trying to compute, and add the product to the drift term adjusted for the volatility and the number of time periods. We then raise e to the exponent we just computed and multiply the resulting value by the value of the initial price. For example, given a current price S, in Excel we use the formula

S*EXP((μ - 0.5*σ^2)*t + σ*SQRT(t)*NORMINV(RAND(), 0, 1))
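The closed-form draw can be sketched in Python as follows. The function name `simulate_terminal_price` and the parameter values are assumptions made for the example. The sketch uses a plus sign in front of the random term, matching the closed-form expression for $S_t$ given earlier; because a standard normal variable is symmetric around zero, the choice of sign does not change the distribution of the simulated price.

```python
import math
import random


def simulate_terminal_price(s0, mu, sigma, t, rng=None):
    """Draw S_t = S_0 * exp((mu - sigma^2/2) * t + sigma * sqrt(t) * eps)."""
    rng = rng or random.Random()
    eps = rng.gauss(0.0, 1.0)  # standard normal draw
    exponent = (mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * eps
    return s0 * math.exp(exponent)


# Hypothetical inputs: price 100, 1% drift and 5% volatility per period,
# simulated 12 periods ahead in a single step.
rng = random.Random(2024)
draws = [simulate_terminal_price(100.0, 0.01, 0.05, 12, rng) for _ in range(1000)]
```

Since the exponential function is always positive, every simulated price is positive, and no intermediate steps are needed to reach time $t$.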

One might wonder whether this approach for simulating realizations of an asset price following a geometric random walk is equivalent to the simulation approach mentioned earlier when we introduced geometric random walks, which is based on the discrete version of the equation for a random walk. The two approaches are different (for example, the approach based on the discrete version of the equation for the geometric random walk does not produce the expected lognormal price distribution), but it can be shown that the differences in the two simulation approaches tend to cancel over many steps.

Parameter Estimation 

In order to simulate paths of the geometric random walk, we need to have estimates of the parameters ($\mu$ and $\sigma$). The implicit assumption here, of course, is that these parameters remain constant over the time period of estimation. (We will discuss how to incorporate considerations for changes in volatility later in this entry.) Note that the equation for the geometric random walk can be written as

$$\ln(S_{t+1}) - \ln(S_t) = \ln(1 + r_t)$$

Equivalently,

$$\ln\left(\frac{S_{t+1}}{S_t}\right) = \left(\mu - \tfrac{1}{2}\sigma^2\right) + \sigma \cdot \varepsilon_t$$

Given a historical series of T prices of an asset, we can therefore do the following to estimate $\mu$ and $\sigma$:

1. Compute $\ln(S_{t+1}/S_t)$ for each time period $t$, $t = 0, \dots, T-1$.

2. Estimate the volatility of the geometric random walk, $\sigma$, as the standard deviation of all $\ln(S_{t+1}/S_t)$.

3. Estimate the drift of the geometric random walk, $\mu$, as the average of all $\ln(S_{t+1}/S_t)$, plus one half of the standard deviation squared.

If we are given data on the returns $r_t$ of an asset rather than the prices of the asset, we can compute $\ln(1 + r_t)$, and use it to replace $\ln(S_{t+1}/S_t)$ in steps 1-3 above. This is because

$$\ln\left(\frac{S_{t+1}}{S_t}\right) = \ln\left(\frac{S_t + S_{t+1} - S_t}{S_t}\right) = \ln(1 + r_t)$$
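The three estimation steps can be sketched in Python. The function name `estimate_geometric_params` and the sample prices are hypothetical, and the sample (n - 1) standard deviation is an assumption, since the text does not say which version to use.

```python
import math
import statistics


def estimate_geometric_params(prices):
    """Estimate per-period drift and volatility of a geometric random walk."""
    # Step 1: log returns ln(S_{t+1} / S_t).
    log_returns = [math.log(nxt / cur) for cur, nxt in zip(prices, prices[1:])]
    # Step 2: volatility is the standard deviation of the log returns.
    sigma_hat = statistics.stdev(log_returns)
    # Step 3: drift is the average log return plus half the variance.
    mu_hat = statistics.mean(log_returns) + 0.5 * sigma_hat ** 2
    return mu_hat, sigma_hat


# Hypothetical price history, for illustration only.
mu_hat, sigma_hat = estimate_geometric_params([100.0, 105.0, 103.0, 108.0])
```

If returns $r_t$ are available instead of prices, the list `log_returns` can be built from `math.log(1 + r)` for each return and the remaining two steps are unchanged.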

Geometric Random Walk: Some Additional Facts

To summarize, the geometric random walk has several important characteristics:

• It is a multiplicative model, that is, the price at the next time period is a multiple of a random term and the price from the previous time period.

• It has a constant drift $\mu$ and volatility $\sigma$. At every time period, the percentage change in price is normally distributed, on average equal to $\mu$, with a standard deviation of $\sigma$.

• The overall noise in a geometric random walk never decays. The percentage price change over $t$ time periods is distributed as a normal distribution with mean equal to $\mu \cdot t$ and standard deviation equal to $\sigma \sqrt{t}$.

• The exact distribution of the future price knowing the initial price can be found. The price at time $t$ is lognormally distributed with specific probability distribution parameters.

• Prices that follow a geometric random walk in continuous time never become negative.

The geometric random walk model is not perfect. However, its computational simplicity makes the geometric random walk and its variations the most widely used processes for modeling asset prices. The geometric random walk defined with log returns never becomes negative, because future prices are always the product of the initial stock price and a positive term. (See Figure 7.) In addition, observed historical stock prices can actually be quite close to lognormal.

Figure 7 Five Paths of a Geometric Random Walk with $\mu = -0.0014$ and $\sigma = 0.0411$
Note: Although the drift is slightly negative, it is still possible to generate paths that generally increase over time.

It is important to note that the assumption that log returns are normal is not actually required to justify the lognormal model for prices. If the distribution of log returns is non-normal, but the log returns are IID with finite variance, the sum of the log returns is asymptotically normal. (This is based on a version of the central limit theorem.) Stated differently, the log return process is approximately normal if we consider changes over sufficiently long intervals of time.

Price processes, however, are not always geometric random walks, even asymptotically. A very important assumption for the geometric random walk is that price increments are independently distributed; if the time series exhibits autocorrelation, the geometric random walk is not a good representation. We will see some models that incorporate considerations for autocorrelation and other factors later in this entry.


MEAN REVERSION 

The geometric random walk provides the foundation for modeling the dynamics of asset prices for many different securities, including stock prices. However, in some cases it is not justified to assume that asset prices evolve with a particular drift, or can deviate arbitrarily far from some kind of representative value. Interest rates, exchange rates, and the prices of some commodities are examples for which the geometric random walk does not provide a good representation over the long term. For instance, if the price of copper becomes high, copper mines would increase production in order to maximize profits. This would increase the supply of copper in the market, therefore decreasing the price of copper back to some equilibrium level. Consumer demand plays a role as well: if the price of copper becomes too high, consumers may look for substitutes, which would reduce the price of copper back to its equilibrium level.

Figure 8 Weekly Data for One-Year Treasury Bill Rates: January 5, 1962-July 31, 2009

Figure 8 illustrates the behavior of the one-year Treasury bill yield from the beginning of January 1962 through the end of July 2009. It can be observed that, even though the variability of Treasury bill rates has changed over time, there is some kind of long-term average level of interest rates to which they return after deviating up or down. This behavior is known as mean reversion.

The simplest mean reversion (MR) model is similar to an arithmetic random walk, but the means of the increments change depending on the current price level. The price dynamics are represented by the equation

$$S_{t+1} = S_t + \kappa \cdot (\mu - S_t) + \sigma \cdot \varepsilon_t$$

where $\varepsilon_t$ is a standard normal random variable. The parameter $\kappa$ is a nonnegative number that represents the speed of adjustment of the mean-reverting process: the larger its magnitude, the faster the process returns to its long-term mean. The parameter $\mu$ is the long-term mean of the process. When the current price $S_t$ is lower than the long-term mean $\mu$, the term $(\mu - S_t)$ is positive. Hence, on average there will be an upward adjustment to obtain the value of the price in the next time period, $S_{t+1}$. (We add a positive number, $\kappa \cdot (\mu - S_t)$, to the current price $S_t$.) By contrast, if the current price $S_t$ is higher than the long-term mean $\mu$, the term $(\mu - S_t)$ is negative. Hence, on average there will be a downward adjustment to obtain the value of the price in the next time period, $S_{t+1}$. (We add a negative number, $\kappa \cdot (\mu - S_t)$, to the current price $S_t$.) Thus, the mean-reverting process will behave in the way we desire: if the price becomes lower or higher than the long-term mean, it will be drawn back to the long-term mean.

In the case of the arithmetic and the geometric random walks, the cumulative volatility of the process increases over time. By contrast, in the case of mean reversion, as the number of steps increases, the variance does not grow without bound; it peaks at

$$\frac{\sigma^2}{\kappa \cdot (2 - \kappa)}$$
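The limiting variance $\sigma^2 / (\kappa \cdot (2 - \kappa))$ can be checked with a short calculation. The derivation below is added for illustration; it writes $v_t$ for the variance of $S_t$ given a known $S_0$ (so $v_0 = 0$) and assumes $0 < \kappa < 2$, a condition that the text does not state explicitly.

```latex
\begin{aligned}
S_{t+1} - \mu &= (1 - \kappa)\,(S_t - \mu) + \sigma\,\varepsilon_t \\
v_{t+1} &= (1 - \kappa)^2\, v_t + \sigma^2 \\
v_t &= \sigma^2 \sum_{j=0}^{t-1} (1 - \kappa)^{2j}
     = \sigma^2 \,\frac{1 - (1 - \kappa)^{2t}}{1 - (1 - \kappa)^2} \\
\lim_{t \to \infty} v_t &= \frac{\sigma^2}{1 - (1 - \kappa)^2}
     = \frac{\sigma^2}{\kappa\,(2 - \kappa)}
\end{aligned}
```

The second line uses the independence of $\varepsilon_t$ and $S_t$. Because $(1 - \kappa)^2 < 1$ under the stated assumption, the variance increases toward this bound but never exceeds it.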

In continuous time, this basic mean-reversion process is called the Ornstein-Uhlenbeck process. (See the last section of this entry.) It is widely used when modeling interest rates and exchange rates in the context of computing bond prices and prices of more complex fixed-income securities. When used in the context of modeling interest rates, this simple mean-reversion process is also referred to as the Vasicek model (see Vasicek, 1977).

The mean-reversion process suffers from some of the disadvantages of the arithmetic random walk; for example, it can technically become negative. However, if the long-run mean is positive, and the speed of mean reversion is large relative to the volatility, the price will be pulled back to the mean quickly when it becomes negative. Figure 9 contains an example of five paths generated from a mean-reverting process.

Figure 9 Five Paths with 50 Steps Each of a Mean-Reverting Process with $\mu = 1.4404$, $\kappa = 0.0347$, and $\sigma = 0.0248$

Simulation 

The formula for the mean-reverting process makes it clear how paths for the mean-reverting random walk can be generated. As in the case of the arithmetic and the geometric random walks, all we need is a way of simulating the standard normal random variables $\varepsilon_t$. We start with an initial price $S_0$, which is known. We know the values of the long-term mean $\mu$, the speed of adjustment $\kappa$, and the volatility $\sigma$ over one period. To generate the price at the next time period, $S_1$, we add $\kappa \cdot (\mu - S_0)$ to $S_0$, simulate a random variable from a standard normal distribution, multiply it by $\sigma$, and add it to $S_0 + \kappa \cdot (\mu - S_0)$. At the next step (time period 2), we take the price at time period 1 that we already generated, $S_1$, add $\kappa \cdot (\mu - S_1)$ to it, simulate a new random variable from a standard normal distribution, multiply it by $\sigma$, and add it to $S_1 + \kappa \cdot (\mu - S_1)$. We proceed in the same way until we generate the desired number of steps of the random walk. For example, given a current price S, in Excel the price for the next time period can be generated with the formula

S + κ*(μ - S) + σ*NORMINV(RAND(), 0, 1)
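A Python sketch of the mean-reverting recursion is given below. The function name `simulate_mean_reversion` is an assumption; the example parameter values are the ones quoted in the caption of Figure 9, while the starting price of 1.40 and the seed are arbitrary illustrative choices.

```python
import random


def simulate_mean_reversion(s0, mu, kappa, sigma, n_steps, rng=None):
    """Return [S_0, ..., S_n] for S_{t+1} = S_t + kappa*(mu - S_t) + sigma*eps_t."""
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        eps = rng.gauss(0.0, 1.0)  # standard normal draw
        # The pull toward mu is proportional to the distance from mu.
        path.append(s + kappa * (mu - s) + sigma * eps)
    return path


# Parameters taken from the Figure 9 caption; the start value is hypothetical.
example_path = simulate_mean_reversion(
    1.40, mu=1.4404, kappa=0.0347, sigma=0.0248, n_steps=50, rng=random.Random(9)
)
```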


Parameter Estimation 

In order to simulate paths of the mean-reverting random walk, we need estimates of the parameters ($\kappa$, $\mu$, and $\sigma$). Again, we assume that these parameters remain constant over the time period of estimation. The equation for the mean-reverting process can be written as

$$S_{t+1} - S_t = \kappa \cdot (\mu - S_t) + \sigma \cdot \varepsilon_t$$

or, equivalently,

$$S_{t+1} - S_t = \kappa \cdot \mu - \kappa \cdot S_t + \sigma \cdot \varepsilon_t$$

This equation has the characteristics of a linear regression model, with the absolute price change $(S_{t+1} - S_t)$ as the response variable and $S_t$ as the explanatory variable. Given a historical series of T prices for an asset, we can therefore do the following to estimate $\kappa$, $\mu$, and $\sigma$:

1. Compute the price changes $(S_{t+1} - S_t)$ for each time period $t$, $t = 0, \dots, T-1$.

2. Run a linear regression with $(S_{t+1} - S_t)$ as the response variable and $S_t$ as the explanatory variable.

3. Verify that the estimates from the linear regression model are valid:

a. Plot the values of $S_t$ versus $(S_{t+1} - S_t)$. The points in the scatter plot should approximately vary around a straight line with no visible cyclical or other patterns.

b. The p-value for the coefficient in front of the explanatory variable $S_t$ should be small, preferably less than 0.05. (The p-values of the regression coefficients are part of standard regression output for most software packages. Most generally, they measure the degree of significance of the regression coefficient for explaining the response variable in the regression.)

4. An estimate for the speed of adjustment of the mean-reversion process, $\kappa$, can be obtained as the negative of the coefficient in front of $S_t$. Since the speed of adjustment cannot be a negative number, if the coefficient in front of $S_t$ is positive, the regression model cannot be used for estimating the parameters of the mean-reverting process.

5. An estimate for the long-term mean of the mean-reverting process, $\mu$, can be obtained as the ratio of the intercept term estimated from the regression to the estimated speed of adjustment $\kappa$, that is, to the negative of the slope coefficient in front of $S_t$ (if that slope coefficient is valid, i.e., negative and with low p-value). This follows because the intercept of the regression equals $\kappa \cdot \mu$.

6. An estimate for the volatility of the mean-reverting process, $\sigma$, can be obtained as the standard error of the regression. (The standard error of the regression is also part of standard regression output for statistical software packages and spreadsheet programs like Excel. It measures the standard deviation of the points around the regression line.)
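The regression-based procedure can be sketched in Python with a small hand-written least squares helper. Both function names (`fit_line`, `estimate_mean_reversion_params`) are hypothetical, and the sketch leaves out the diagnostic checks of step 3 (the scatter plot and the p-value), which would normally come from a statistics package. Because the regression intercept estimates $\kappa \cdot \mu$, the long-term mean is recovered by dividing the intercept by the estimated $\kappa$.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Standard error of the regression, with n - 2 degrees of freedom.
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return intercept, slope, std_error


def estimate_mean_reversion_params(prices):
    """Return (kappa, mu, sigma) estimated from a list of prices."""
    levels = prices[:-1]                                           # S_t
    changes = [nxt - cur for cur, nxt in zip(prices, prices[1:])]  # steps 1-2
    intercept, slope, std_error = fit_line(levels, changes)
    if slope >= 0:
        # Step 4: a non-negative slope means the model cannot be used.
        raise ValueError("slope is not negative; no mean reversion detected")
    kappa_hat = -slope                # step 4
    mu_hat = intercept / kappa_hat    # step 5: intercept estimates kappa * mu
    return kappa_hat, mu_hat, std_error  # step 6: sigma is the standard error


# Noise-free illustrative series generated with kappa = 0.5 and mu = 2.
kappa_hat, mu_hat, sigma_hat = estimate_mean_reversion_params(
    [1.0, 1.5, 1.75, 1.875, 1.9375]
)
```

On this noise-free series the procedure recovers the generating parameters, which is a useful sanity check before applying it to real data.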


Geometric Mean Reversion 

A more advanced mean-reversion model that bears some similarity to the geometric random walk is the geometric mean reversion (GMR) model

$$S_{t+1} = S_t + \kappa \cdot (\mu - S_t) \cdot S_t + \sigma \cdot S_t \cdot \varepsilon_t$$

(Note that this is a special case of the mean reversion model $S_{t+1} = S_t + \kappa \cdot (\mu - S_t) \cdot S_t + \sigma \cdot S_t^{\gamma} \cdot \varepsilon_t$, where $\gamma$ is a parameter selected in advance. The most commonly used models have $\gamma = 1$ or $\gamma = 1/2$.) The intuition behind this model is similar to the intuition behind the discrete version of the geometric random walk: the variability of the process changes with the current level of the price. However, the GMR model allows for incorporating mean reversion. Even though it is difficult to estimate the future price analytically from this model, it is easy to simulate. For example, given a current price S, in Excel the price for the next time period can be generated with the formula

S + κ*(μ - S)*S + σ*S*NORMINV(RAND(), 0, 1)



Figure 10 Five Paths with 50 Steps Each of a Geometric Mean Reversion Process with $\mu = 1.4464$, $\kappa = 0.0253$, and $\sigma = 0.0177$

Figure 10 contains an example of five paths generated from a geometric mean reversion model.
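Paths like those in Figure 10 can be produced with a sketch such as the following. The function name `simulate_gmr` is an assumption; the parameter values come from the Figure 10 caption, while the starting price of 1.40 and the seeds are arbitrary.

```python
import random


def simulate_gmr(s0, mu, kappa, sigma, n_steps, rng=None):
    """Return [S_0, ..., S_n] for the geometric mean reversion recursion.

    S_{t+1} = S_t + kappa*(mu - S_t)*S_t + sigma*S_t*eps_t
    """
    rng = rng or random.Random()
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        eps = rng.gauss(0.0, 1.0)  # standard normal draw
        path.append(s + kappa * (mu - s) * s + sigma * s * eps)
    return path


# Five 50-step paths with the Figure 10 parameters; start value is hypothetical.
paths = [
    simulate_gmr(1.40, mu=1.4464, kappa=0.0253, sigma=0.0177,
                 n_steps=50, rng=random.Random(seed))
    for seed in range(5)
]
```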

To estimate the parameters $\kappa$, $\mu$, and $\sigma$ to use in the simulation, we can use a series of T historical observations for the price of an asset. Assume that the parameters of the geometric mean reversion remain constant during the time period of estimation.

Note that the equation for the geometric mean-reverting random walk can be written as

$$\frac{S_{t+1} - S_t}{S_t} = \kappa \cdot (\mu - S_t) + \sigma \cdot \varepsilon_t$$

or, equivalently, as

$$\frac{S_{t+1} - S_t}{S_t} = \kappa \cdot \mu - \kappa \cdot S_t + \sigma \cdot \varepsilon_t$$

Again, this equation bears characteristics of a linear regression model, with the percentage price change $(S_{t+1} - S_t)/S_t$ as the response variable and $S_t$ as the explanatory variable. Given a historical series of T prices of an asset, we can therefore do the following to estimate $\kappa$, $\mu$, and $\sigma$:

1. Compute the percentage price changes $(S_{t+1} - S_t)/S_t$ for each time period $t$, $t = 0, \dots, T-1$.

2. Run a linear regression with $(S_{t+1} - S_t)/S_t$ as the response variable and $S_t$ as the explanatory variable.

3. Verify that the estimates from the linear regression model are valid:

a. Plot the values of $S_t$ versus $(S_{t+1} - S_t)/S_t$. The points in the scatter plot should approximately vary around a straight line with no visible cyclical or other patterns.

b. The p-value for the coefficient in front of the explanatory variable $S_t$ should be small, preferably less than 0.05.

4. An estimate for the speed of adjustment of the mean-reverting process, $\kappa$, can be obtained as the negative of the coefficient in front of $S_t$. Since the speed of adjustment cannot be a negative number, if the coefficient in front of $S_t$ is positive, the regression model cannot be used for estimating the parameters of the geometric mean-reverting process.

5. An estimate for the long-term mean of the mean-reverting process, $\mu$, can be obtained as the ratio of the intercept term estimated from the regression to the estimated speed of adjustment $\kappa$, that is, to the negative of the slope coefficient in front of $S_t$ (if that slope coefficient is valid, i.e., negative and with low p-value).

6. An estimate for the volatility of the mean-reverting process, $\sigma$, can be obtained as the standard error of the regression.
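A compact Python sketch of this procedure follows. The function name `estimate_gmr_params` is hypothetical, the least squares fit is written out by hand so that the example needs only the standard library, and the diagnostic checks of step 3 are omitted. As in the arithmetic mean-reversion case, the intercept estimates $\kappa \cdot \mu$, so the long-term mean is the intercept divided by the estimated $\kappa$.

```python
import math


def estimate_gmr_params(prices):
    """Return (kappa, mu, sigma) for the geometric mean reversion model."""
    xs = prices[:-1]                                                    # S_t
    ys = [(nxt - cur) / cur for cur, nxt in zip(prices, prices[1:])]    # step 1
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Step 2: least squares fit of percentage change on price level.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("slope is not negative; no mean reversion detected")
    kappa_hat = -slope                       # step 4
    mu_hat = intercept / kappa_hat           # step 5
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))     # step 6: standard error
    return kappa_hat, mu_hat, sigma_hat


# Noise-free illustrative series generated with kappa = 0.5 and mu = 2.
kappa_hat, mu_hat, sigma_hat = estimate_gmr_params([1.0, 1.5, 1.875, 1.9921875])
```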

ADVANCED RANDOM WALK MODELS

The models we described so far provide building blocks for representing asset price dynamics. However, observed real-world asset price dynamics have features that cannot be incorporated in these basic models. For example, asset prices exhibit correlation, both with each other and with themselves over time. Their volatility typically cannot be assumed to be constant. This section reviews several techniques for making asset price models more realistic depending on observed price behavior.

Correlated Random Walks 

So far, we have discussed models for asset prices that assume that the dynamic processes for the prices of different assets evolve independently of each other. This is an unrealistic assumption: it is expected that market conditions and other factors will have an impact on the prices of groups of assets simultaneously. For example, it is likely that stock prices for companies in the oil industry will generally move together, as will stock prices for companies in the telecommunications industry.

The argument that asset prices are codependent has theoretical and empirical foundations as well. If asset prices were independent random walks, then large portfolios would be fully diversified, have no variability, and therefore be completely deterministic. Empirically, this is not the case. Even large aggregates of stock prices, such as the S&P 500, exhibit random behavior.

If we make the assumption that log returns are jointly normally distributed, then their dependencies can be represented through the covariance matrix (equivalently, through the correlation matrix). It is worth noting that in general, covariance and correlation are not equivalent to dependence of random variables. Covariance and correlation measure only the strength of linear dependence between two random variables. However, in the case of a multivariate normal distribution, covariance and correlation are sufficient to represent dependence.

Let us give an example of how one can model 
two correlated stock prices assumed to follow 
geometric random walks. Suppose we are given 
two historical series of T observations each of 
observed asset prices for Stock 1 and Stock 2. 
We follow the steps described in the previous 
sections of this entry to estimate the drifts and 
the volatilities of the two processes. To estimate 
the correlation structure, we find the correlation
between

$$\ln\left(S_{t+1}^{(1)} / S_t^{(1)}\right)$$

and

$$\ln\left(S_{t+1}^{(2)} / S_t^{(2)}\right)$$

where the indices (1) and (2) correspond to
Stock 1 and Stock 2, respectively. For example,
in Excel the correlation between two data series 
stored in Array1 and Array2 can be computed
with the function CORREL(Array1, Array2).
This correlation can then be incorporated in
the simulation. (Excel cannot simulate correlated
normal random variables. A number of
Excel add-ins for simulation are available, however,
and they have the capability to do so. Such
add-ins include @RISK (sold by Palisade Corporation,
http://www.palisade.com), Crystal
Ball (sold by Oracle, http://www.oracle.com),
and Risk Solver (from Frontline Systems,
the developers of the original Excel Solver,
http://www.solver.com).) Basically, at every
step, we generate correlated normal random
variables, $\varepsilon_t^{(1)}$ and $\varepsilon_t^{(2)}$, with means of zero and
with a given covariance structure. Those realizations
of the correlated normal random variables
are then used to compute the next period's
Stock 1 price and the next period's Stock 2 price.
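
The procedure just described can be sketched in Python. This is a minimal illustration and not the authors' implementation: the function name, the parameter values, and the use of a Cholesky factor to correlate the shocks are assumptions made for the example, and each step is taken to be one time unit long.

```python
import numpy as np

def simulate_correlated_grw(s0, mu, sigma, rho, n_steps, seed=0):
    """Simulate two correlated geometric random walks (one time unit per step).

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    s0, mu, sigma = (np.asarray(v, dtype=float) for v in (s0, mu, sigma))
    # Correlation matrix of the two standard normal shocks.
    corr = np.array([[1.0, rho], [rho, 1.0]])
    # Independent shocks z become correlated shocks eps = z @ L.T,
    # where L is the Cholesky factor of the correlation matrix.
    chol = np.linalg.cholesky(corr)
    eps = rng.standard_normal((n_steps, 2)) @ chol.T
    # Log-price update of a geometric random walk, applied to both stocks.
    log_increments = (mu - 0.5 * sigma**2) + sigma * eps
    log_paths = np.log(s0) + np.cumsum(log_increments, axis=0)
    return np.vstack([s0, np.exp(log_paths)])

paths = simulate_correlated_grw(s0=[100.0, 50.0], mu=[0.01, 0.005],
                                sigma=[0.05, 0.08], rho=0.6, n_steps=5000)
log_returns = np.diff(np.log(paths), axis=0)
sample_rho = np.corrcoef(log_returns[:, 0], log_returns[:, 1])[0, 1]
```

The sample correlation of the simulated log returns should be close to the input `rho`, which mirrors the estimation step (CORREL on the two log-return series) described in the text.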

When we consider many different assets, the 
covariance matrix becomes very large and can¬ 
not be estimated accurately. Factor models can 
be used to reduce the dimension of the covari¬ 
ance structure. Multivariate random walks are 
in fact dynamic factor models for asset prices. 
A multifactor model for the return of asset i can 
be written in the following general form: 

$$r_t^{(i)} = \mu^{(i)} + \sum_{k=1}^{K} \beta^{(i,k)} \cdot f_t^{(k)} + \varepsilon_t^{(i)}$$

where the K factors $f^{(k)}$ follow random walks,
$\beta^{(i,k)}$ are the factor loadings, and $\varepsilon_t^{(i)}$ are normal
random variables with zero means.

It is important to note that the covariance matrix
cannot capture correlations at lagged times
(i.e., correlations of dynamic nature). Furthermore,
the assumption that log returns behave
as multivariate normal variables is not always
applicable: some assets exhibit dependency of
a nonlinear kind, which cannot be captured by
the covariance or correlation matrix. Alternative
tools for modeling covariability include
copula functions and transfer entropies. (See,
for example, Chapter 17 and Appendix B in
Fabozzi, Focardi, and Kolm, 2006.)


Incorporating Jumps 

Many of the dynamic asset price processes used 
in industry assume continuous sample paths, as 
was the case with the arithmetic, geometric, and 
the different mean-reverting random walks we 
considered earlier in this entry. However, there 
is empirical evidence that the prices of many se¬ 
curities incorporate jumps. The prices of some 
commodities, such as electricity and oil, are no¬ 
torious for exhibiting "spikes." The logarithm 
of a price process with jumps is not normally 
distributed, but is instead characterized by a 
high peak and heavy tails, which are more typi¬ 
cal of market data than the normal distribution. 
Thus, more advanced models are needed to in¬ 
corporate realistic price behavior. 

A classical way to include jumps in models 
for asset price dynamics is to add a Poisson pro¬ 
cess to the process (geometric random walk or 
mean reversion) used to model the asset price. 
A Poisson process is a discrete process in which
arrivals occur at random discrete points in time,
and the times between arrivals follow an exponential
distribution with average time between
arrivals equal to $1/\lambda$. This means that the number
of arrivals in a specific time interval follows
a Poisson distribution with mean rate of arrival
$\lambda$. The "jump" Poisson process is assumed to be
independent of the underlying "smooth" random
walk.

The Poisson process is typically used to fig¬ 
ure out the times at which the jumps occur. The 
magnitude of the jumps itself could come from 
any distribution, although the lognormal distri¬ 
bution is often used for tractability. 

Let us explain in more detail how one would
model and simulate a geometric random walk
with jumps. At every point in time, the process
moves as a geometric random walk and updates
the price $S_t$ to $S_{t+1}$. If a jump happens, the
size of the jump is added to $S_t$ as well to obtain
$S_{t+1}$. In order to avoid confusion about whether
we have included the jump in the calculation,
let us denote the price right before we find out
whether a jump has occurred by $S_{t+1}^{(-)}$, and keep the
total price for the next time period as $S_{t+1}$. We
therefore have

$$S_{t+1}^{(-)} = S_t + \mu \cdot S_t + \sigma \cdot S_t \cdot \varepsilon_t$$

that is, $S_{t+1}^{(-)}$ is computed according to the normal
geometric random walk rule. Now suppose that
a jump of magnitude $J_t$ occurs between time t
and time t+1. Let us express the jump magnitude
as a percentage of the asset price, that is,
let

$$S_{t+1} = S_{t+1}^{(-)} \cdot J_t$$

If we restrict the magnitude of the jumps $J_t$
to be nonnegative, we will make sure that the
asset price itself does not become negative.

Let us now express the changes in price in
terms of the jump size. Based on the relationship
between $S_{t+1}$, $S_{t+1}^{(-)}$, and $J_t$, we can write

$$S_{t+1} - S_{t+1}^{(-)} = S_{t+1}^{(-)} \cdot (J_t - 1)$$

and hence

$$S_{t+1}^{(-)} = S_{t+1} - S_{t+1}^{(-)} \cdot (J_t - 1)$$

Thus, we can substitute this expression for
$S_{t+1}^{(-)}$ and write the geometric random walk with
jumps model as

$$S_{t+1} = S_t + \mu \cdot S_t + \sigma \cdot S_t \cdot \varepsilon_t + S_{t+1}^{(-)} \cdot (J_t - 1)$$

How would we simulate a path for the jump-geometric
random walk process? Note that
given the relationship between $S_{t+1}$, $S_{t+1}^{(-)}$, and
$J_t$, we can write

$$\ln(S_{t+1}) = \ln(S_{t+1}^{(-)}) + \ln(J_t)$$

Since $S_{t+1}^{(-)}$ is the price resulting only from the
geometric random walk at time t, we already
know what $\ln(S_{t+1}^{(-)})$ is. Recall based on our discussion
of the geometric random walk that

$$\ln(S_{t+1}^{(-)}) = \ln(S_t) + (\mu - 0.5 \cdot \sigma^2) + \sigma \cdot \varepsilon_t$$

Therefore, the overall equation will be

$$\ln(S_{t+1}) = \ln(S_t) + (\mu - 0.5 \cdot \sigma^2) + \sigma \cdot \varepsilon_t + \sum_i \ln\left(J_t^{(i)}\right)$$

where $J_t^{(i)}$ are all the jumps that occur during
the time period between t and t+1. This means
that

$$S_{t+1} = S_t \cdot e^{\mu - 0.5 \cdot \sigma^2 + \sigma \cdot \varepsilon_t} \cdot \prod_i J_t^{(i)}$$

where the symbol $\prod$ denotes product. (If no
jumps occurred between t and t+1, we set the
product to 1.)

Hence, to simulate the price at time t+1, we 
need to simulate 

* A standard normal random variable $\varepsilon_t$, as in
the case of a geometric random walk.

* How many jumps occur between t and t+1. 

* The magnitude of each jump. 

For more details, see Pachamanova and 
Fabozzi (2010) and Glasserman (2004). 

As Merton (1976) pointed out, if we assume
that the jumps follow a lognormal distribution,
then $\ln(J_t)$ is normal, and the simulation
is even easier. See Glasserman (2004) for more
advanced examples.
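
The three simulation ingredients listed above (a standard normal shock, a jump count, and jump magnitudes) can be combined as in the following Python sketch. It assumes lognormal jump sizes, a Poisson jump count with mean `lam` per period, and one time unit per step; the function and parameter names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def simulate_jump_grw(s0, mu, sigma, lam, jump_mu, jump_sigma, n_steps, seed=0):
    """Simulate one path of a geometric random walk with lognormal jumps.

    jump_mu and jump_sigma are the mean and standard deviation of ln(J);
    these names are assumptions made for the example.
    """
    rng = np.random.default_rng(seed)
    log_s = np.empty(n_steps + 1)
    log_s[0] = np.log(s0)
    for t in range(n_steps):
        eps = rng.standard_normal()      # shock of the smooth random walk
        n_jumps = rng.poisson(lam)       # number of jumps between t and t+1
        # Sum of ln(J) over the jumps; an empty sum is 0,
        # which corresponds to setting the product of jumps to 1.
        log_jumps = rng.normal(jump_mu, jump_sigma, size=n_jumps).sum()
        log_s[t + 1] = log_s[t] + (mu - 0.5 * sigma**2) + sigma * eps + log_jumps
    return np.exp(log_s)

path = simulate_jump_grw(s0=100.0, mu=0.01, sigma=0.05, lam=0.2,
                         jump_mu=-0.05, jump_sigma=0.1, n_steps=250)
```

Working in logarithms keeps every simulated price positive, because each jump multiplies the price by a lognormal (hence positive) factor.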

Stochastic Volatility 

The models we considered so far all assumed 
that the volatility of the stochastic process re¬ 
mains constant over time. Empirical evidence 
suggests that the volatility changes over time, 
and more advanced models recognize that fact. 
Such models assume that the volatility parameter
$\sigma$ itself follows a random walk of some
kind. Since there is some evidence that volatility
tends to be mean-reverting, often different
versions of mean-reversion models are used.
For more details on stochastic volatility models
and their simulation see, for example, Glasserman
(2004) and Hull (2008).
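
As a rough illustration of the idea, the sketch below lets the logarithm of the volatility follow a discrete mean-reverting walk and feeds the current volatility into a geometric random walk for the price. This particular specification (mean reversion in log-volatility, independent price and volatility shocks) is an assumption made for the example and is only one of many possible choices; it is not a model taken from the references above. All names are hypothetical.

```python
import numpy as np

def simulate_stochastic_vol_grw(s0, mu, sigma0, kappa, sigma_bar,
                                vol_of_vol, n_steps, seed=0):
    """Geometric random walk whose volatility mean-reverts (illustrative only)."""
    rng = np.random.default_rng(seed)
    prices = np.empty(n_steps + 1)
    sigmas = np.empty(n_steps + 1)
    prices[0], sigmas[0] = s0, sigma0
    for t in range(n_steps):
        eps_price, eps_vol = rng.standard_normal(2)
        # Price step uses the volatility prevailing at time t.
        prices[t + 1] = prices[t] * np.exp(
            mu - 0.5 * sigmas[t] ** 2 + sigmas[t] * eps_price)
        # Log-volatility is pulled toward ln(sigma_bar) at speed kappa;
        # modeling the logarithm keeps the volatility positive.
        log_sigma = (np.log(sigmas[t])
                     + kappa * (np.log(sigma_bar) - np.log(sigmas[t]))
                     + vol_of_vol * eps_vol)
        sigmas[t + 1] = np.exp(log_sigma)
    return prices, sigmas

prices, sigmas = simulate_stochastic_vol_grw(
    s0=100.0, mu=0.01, sigma0=0.3, kappa=0.1, sigma_bar=0.2,
    vol_of_vol=0.05, n_steps=250)
```

Setting `vol_of_vol` to zero and `sigma0` equal to `sigma_bar` recovers the constant-volatility geometric random walk.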


STOCHASTIC PROCESSES 

In this section, we provide an introduction to 
what is known as stochastic calculus. Our goal 
is not to achieve a working knowledge in the 
subject, but rather to provide context for some
of the terminology and the formulas encoun¬ 
tered in the literature on modeling asset prices 
with random walks. 

So far, we discussed random walks for which 
every step is taken at a specific discrete point 
in time. When the time increments are very 
small, almost zero in length, the equation of 
a random walk describes a stochastic process in 
continuous time. In this context, the arithmetic 
random walk model is known as a generalized 
Wiener process or Brownian motion (BM). The geo¬ 
metric random walk is referred to as geomet¬ 
ric Brownian motion (GBM), and the arithmetic 
mean-reverting walk is the Ornstein-Uhlenbeck 
process described earlier. 

Special notation is used to denote stochastic
processes in continuous time. Increments are
denoted by $d$ or $\Delta$. (For example, $(S_{t+1} - S_t)$ is
denoted $dS_t$, meaning a change in $S_t$ over an
infinitely small interval.) The equations describing
the process, however, have a very similar
form to the equations we introduced earlier in
this section:

$$dS_t = \mu\,dt + \sigma\,dW$$

Equations involving small changes ("differences")
in variables are referred to as differential
equations. In words, the equation above
reads "The change in the price $S_t$ over a small
time period $dt$ equals the average drift $\mu$ multiplied
by the small time change plus a random
term equal to the volatility $\sigma$ multiplied by $dW$,
where $dW$ is the increment of a Wiener process."
The Wiener process, or Brownian motion,
is the fundamental building block for many of
the classical asset price processes.

A standard Wiener process $W(t)$ has the following
properties:

1. For any time $s < t$, the difference $W(t) - W(s)$
is a normal random variable with mean zero
and variance $(t - s)$. It can be expressed as
$\sqrt{t - s} \cdot \varepsilon$, where $\varepsilon$ is a standard normal
random variable.

2. For any times $0 < t_1 < t_2 < t_3 < t_4$, the differences
$(W(t_2) - W(t_1))$ and $(W(t_4) - W(t_3))$
(which are random variables) are independent.
(These differences are the actual increments
of the process at different points in
time.) Note that independent implies uncorrelated.

3. The value of the Wiener process at the beginning
is zero, $W(t_0) = 0$.

Using the new notation, the first two properties
can be restated as

Property 1. The change $dW$ during a small period
of time $dt$ is normally distributed with
mean 0 and variance $dt$ and can be expressed
as $\sqrt{dt} \cdot \varepsilon$.

Property 2. The values of $dW$ for any two
nonoverlapping time intervals are independent.
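
These properties can be checked numerically. The sketch below builds discretized Wiener paths from independent normal increments with variance `dt` and then inspects two nonoverlapping increments; the step size, path count, and function name are assumptions made for illustration.

```python
import numpy as np

def simulate_wiener(n_paths, n_steps, dt, seed=0):
    """Return an array of shape (n_paths, n_steps + 1) of Wiener paths, W(0) = 0."""
    rng = np.random.default_rng(seed)
    # Property 1: each increment is sqrt(dt) times a standard normal.
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    # Property 3: every path starts at zero.
    start = np.zeros((n_paths, 1))
    return np.concatenate([start, np.cumsum(dW, axis=1)], axis=1)

W = simulate_wiener(n_paths=20000, n_steps=100, dt=0.01)
inc_a = W[:, 50] - W[:, 20]    # W(0.5) - W(0.2): variance should be near 0.3
inc_b = W[:, 100] - W[:, 70]   # W(1.0) - W(0.7): a nonoverlapping interval
var_a = inc_a.var()
corr_ab = np.corrcoef(inc_a, inc_b)[0, 1]
```

With enough paths, `var_a` is close to the interval length 0.3 and `corr_ab` is close to zero, consistent with Properties 1 and 2.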

The arithmetic random walk can be obtained
as a generalized Wiener process, which has the
form

$$dS_t = a\,dt + b\,dW$$

The appeal of the generalized Wiener process
is that we can find a closed-form expression for
the price at any time period. Namely,

$$S_t = S_0 + a \cdot t + b \cdot W(t)$$

The generalized Wiener process is a special
case of the more general class of Ito processes,
in which both the drift term and the coefficient
in front of the random term are allowed to be
nonconstant. The equation for an Ito process is

$$dS_t = a(S, t)\,dt + b(S, t)\,dW$$

GBM and the Ornstein-Uhlenbeck process are
both special cases of Ito processes.

In contrast to the generalized Wiener process, 
the equation for the Ito process does not allow 
us to write a general expression for the price at 
time t in closed form. However, an expression 
can be found for some special cases, such as 
GBM. We now show how this can be derived. 

The main relevant result from stochastic calculus
is the so-called Ito's lemma, which states
the following. Suppose that a variable x follows
an Ito process

$$dx_t = a(x, t)\,dt + b(x, t)\,dW$$

and let y be a function of x, that is,

$$y_t = f(x, t)$$

Then, y evolves according to the following
differential equation:

$$dy = \left(\frac{\partial f}{\partial x} \cdot a + \frac{\partial f}{\partial t} + \frac{1}{2} \cdot \frac{\partial^2 f}{\partial x^2} \cdot b^2\right) dt + \frac{\partial f}{\partial x} \cdot b \cdot dW$$

where the symbol $\partial$ is standard notation for the
partial derivative of the function f with respect
to the variable in the denominator. For example,
$\partial f/\partial t$ is the derivative of the function f with
respect to t assuming that all terms in the expression
for f that do not involve t are constant.
Respectively, $\partial^2$ denotes the second derivative
of the function f with respect to the variable in
the denominator, that is, the derivative of the
derivative.

This expression shows that a function of a 
variable that follows an Ito process also follows 
an Ito process. 

Although a rigorous proof of Ito's lemma is 
beyond the scope of this entry, we will provide 
some intuition. Let us see how we would go 
about computing the expression for y in Ito's 
lemma. 

In ordinary calculus, we could obtain an expression
for a function of a variable in terms of
that variable by writing the Taylor series expansion:

$$dy = \frac{\partial f}{\partial x} \cdot dx + \frac{\partial f}{\partial t} \cdot dt + \frac{1}{2} \cdot \frac{\partial^2 f}{\partial x^2} \cdot dx^2 + \frac{1}{2} \cdot \frac{\partial^2 f}{\partial t^2} \cdot dt^2 + \frac{\partial^2 f}{\partial x\,\partial t} \cdot dx\,dt + \ldots$$

We will get rid of all terms of order $dt^2$ or
higher, deeming them too small. We need to
expand the terms that contain $dx$, however,
because they will contain terms of order $dt$. We
have

$$dy = \frac{\partial f}{\partial x} \cdot (a(x, t)\,dt + b(x, t)\,dW) + \frac{\partial f}{\partial t} \cdot dt + \frac{1}{2} \cdot \frac{\partial^2 f}{\partial x^2} \cdot (a(x, t)\,dt + b(x, t)\,dW)^2$$

The last expression in parentheses, when expanded,
becomes (dropping the arguments of a
and b for notational convenience)

$$(a\,dt + b\,dW)^2 = a^2 (dt)^2 + b^2 (dW)^2 + 2ab \cdot dt \cdot dW \approx b^2\,dt$$

To obtain this expression, we dropped the first
and the last term in the expanded expression,
because they are of order higher than $dt$. The
middle term, $b^2 (dW)^2$, in fact equals $b^2 \cdot dt$ as $dt$
goes to 0. The latter is not an obvious fact, but
it follows from the properties of the standard
Wiener process. The intuition behind it is that
the variance of $(dW)^2$ is of order $dt^2$, so we can
ignore it and treat the expression as deterministic
and equal to its expected value. The expected
value of $(dW)^2$ is in fact $dt$.

Substituting this expression back into the ex¬ 
pression for dy, we obtain the expression in Ito's 
lemma. 

Using Ito's lemma, let us derive the equation
for the price at time t, $S_t$, that was the basis for
the exact simulation method for the geometric
random walk. Suppose that $S_t$ follows the GBM

$$dS_t = (\mu \cdot S_t)\,dt + (\sigma \cdot S_t)\,dW$$

We will use Ito's lemma to compute the equation
for the process followed by the logarithm
of the stock price. In other words, in the notation
we used in the definition of Ito's lemma,
we have

$$y_t = f(x, t) = \ln S_t$$

We also have

$$a = \mu \cdot S \quad \text{and} \quad b = \sigma \cdot S$$

Finally, we have

$$\frac{\partial f}{\partial x} = \frac{\partial (\ln S)}{\partial S} = \frac{1}{S}, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial (1/S)}{\partial S} = -\frac{1}{S^2}$$

Plugging into the equation for y in Ito's
lemma, we obtain

$$d \ln S = \left(\frac{1}{S} \cdot a + 0 + \frac{1}{2} \cdot \left(-\frac{1}{S^2}\right) \cdot b^2\right) dt + \frac{1}{S} \cdot b \cdot dW = \left(\mu - \frac{1}{2} \cdot \sigma^2\right) dt + \sigma \cdot dW$$

which is the equation we presented earlier. This
also explains the presence of the

$$-\frac{1}{2} \cdot \sigma^2$$

term in the expression for the drift of the GBM.
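
A quick numerical check of this result: the sketch below simulates the GBM equation for the price itself with small time steps (a simple Euler discretization, chosen here only for illustration) and compares the average log price at the horizon with the value predicted by Ito's lemma and with the value one would get by ignoring the correction term. The parameter values and function name are assumptions.

```python
import numpy as np

def euler_gbm_log_terminal(s0, mu, sigma, horizon, n_steps, n_paths, seed=0):
    """Simulate dS = mu*S*dt + sigma*S*dW by Euler steps; return ln(S_T) per path."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    s = np.full(n_paths, float(s0))
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        s = s + mu * s * dt + sigma * s * dW
    return np.log(s)

s0, mu, sigma, horizon = 100.0, 0.08, 0.3, 1.0
log_sT = euler_gbm_log_terminal(s0, mu, sigma, horizon, n_steps=500, n_paths=20000)
ito_mean = np.log(s0) + (mu - 0.5 * sigma**2) * horizon  # drift from Ito's lemma
naive_mean = np.log(s0) + mu * horizon                   # drift without the correction
sample_mean = log_sT.mean()
```

The sample mean of the simulated log prices lines up with `ito_mean` rather than `naive_mean`, which is the practical content of the extra term in the drift.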


KEY POINTS 

• Models of asset dynamics include trees (such 
as binomial trees) and random walks (such 
as arithmetic, geometric, and mean-reverting 
random walks). Such models are called dis¬ 
crete when the changes in the asset price are 
assumed to happen at discrete time incre¬ 
ments. When the length of the time increment 
is assumed to be infinitely small, we refer to 
them as stochastic processes in continuous 
time. 

• The arithmetic random walk is an additive
model for asset prices—at every time period, 
the new price is determined by the price at 
the previous time period plus a deterministic 
drift term and a random shock that is dis¬ 
tributed as a normal random variable with 
mean equal to zero and a standard deviation 
proportional to the square root of the length of 
the time period. The probability distribution 
of future asset prices conditional on a known 
current price is normal. 

• The arithmetic random walk model is analyti¬ 
cally tractable and convenient; however, it has 
some undesirable features such as a nonzero 
probability that the asset price will become 
negative. 

• The geometric random walk is a multiplicative
model for asset prices: at every time period,
the new price is determined by the price
at the previous time period multiplied by a 
deterministic drift term and a random shock 
that is distributed as a lognormal random 
variable. The volatility of the process grows 
with the square root of the elapsed amount 
of time. The probability distribution of future 
asset prices conditional on a known current 
price is lognormal. 

• The geometric random walk is not only an¬ 
alytically tractable, but is more realistic than 
the arithmetic random walk, because the as¬ 
set price cannot become negative. It is widely 
used in practice, particularly for modeling 
stock prices. 

• Mean reversion models assume that the asset 
price will meander, but will tend to return to a 
long-term mean at a speed called the speed of 
adjustment. They are particularly useful for 
modeling prices of some commodities, inter¬ 
est rates, and exchange rates. 

• The codependence structure between the 
price processes for different assets can be 
incorporated directly (by computing the 
correlation between the random terms in 
their random walks), by using dynamic 
multifactor models, or by more advanced 
means such as copula functions and transfer 
entropies. 

• A variety of more advanced random walk 
models are used to incorporate different as¬ 
sumptions, such as time-varying volatility 
and "spikes," or jumps, in the asset price. 
They are not as tractable analytically as the 
classical random walk models, but can be 
simulated. 

• The Wiener process, a stochastic process in 
continuous time, is a basic building block 
for many of the stochastic processes used 
to model asset prices. The increments of a 
Wiener process are independent, normally 
distributed random variables with variance 
proportional to the length of the time period. 

• An Ito process is a generalized Wiener pro¬ 
cess with drift and volatility terms that can be 
functions of the asset price and time. 
• An important result in stochastic calculus is
Ito's lemma, which states that a variable that 
is a function of a variable that follows an Ito 
process follows an Ito process itself with spe¬ 
cific drift and volatility terms. 
Abstract: Arbitrage in its most basic form involves the simultaneous buying and selling of an
asset at two different prices in two different markets. In real-world financial markets, arbitrage 
opportunities rarely, if ever, exist. Less obvious arbitrage opportunities exist in situations where a 
package of assets can be assembled that have a payoff (return) that is identical to an asset that is 
priced differently. A market is said to be a complete market if an arbitrary payoff can be replicated 
by a portfolio. The most fundamental principle in asset pricing theory is the absence of arbitrage 
opportunities. 


The principle of absence of arbitrage or the 
no-arbitrage principle is perhaps the most 
fundamental principle of finance theory. In the 
presence of arbitrage opportunities, there is 
no trade-off between risk and returns because 
it is possible to make unbounded risk-free 
gains. The principle of absence of arbitrage is 
fundamental for understanding asset valuation 
in a competitive market. This entry discusses 
arbitrage pricing in a finite-state, discrete-time 
setting. However, it is important to note that 
there are well-known limits to arbitrage, 
first identified by Shleifer and Vishny (1997), 
resulting from restrictions imposed on rational 
traders and, as a result, pricing inefficiencies 
may exist for a period of time. 

THE ARBITRAGE PRINCIPLE 

Let's begin by defining what is meant by 
arbitrage. In its simple form, arbitrage is the 


simultaneous buying and selling of an asset at 
two different prices in two different markets. 
The arbitrageur profits without risk by buying 
cheap in one market and simultaneously selling 
at the higher price in the other market. Such 
opportunities for arbitrage are rare. In fact, 
a single arbitrageur with unlimited ability to 
sell short could correct a mispricing condition 
by financing purchases in the underpriced 
market with proceeds from short sales in the 
overpriced market. This means that riskless 
arbitrage opportunities are short-lived. 

Less obvious arbitrage opportunities exist in 
situations where a package of assets can pro¬ 
duce a payoff (return) identical to an asset that 
is priced differently. This arbitrage relies on a 
fundamental principle of finance called the law 
of one price, which states that a given asset must 
have the same price regardless of the location 
where the asset is traded and the means by 
which one goes about creating that asset. The 
law of one price implies that if the payoff of an
asset can be synthetically created by a package 
of assets, the price of the package and the price 
of the asset whose payoff it replicates must be 
equal. 

When a situation is discovered whereby the 
price of the package of assets differs from that 
of an asset with the same payoff, rational in¬ 
vestors will trade these assets in such a way 
so as to restore price equilibrium. This market 
mechanism is founded on the fact that an arbi¬ 
trage transaction does not expose the investor 
to any adverse movement in the market price 
of the assets in the transaction. 

For example, consider how we can produce an 
arbitrage opportunity involving three assets A, 
B, and C. These assets can be purchased today at 
the prices shown below, and can each produce 
only one of two payoffs (referred to as State 1 
and State 2) a year from now: 


Asset    Price    Payoff in State 1    Payoff in State 2
A        $70      $50                  $100
B         60       30                   120
C         80       38                   112


While it is not obvious from the data presented
above, an investor can construct a portfolio
of assets A and B that will have the
identical payoff as asset C in both State 1 and
State 2. Let w_A and w_B be the proportion of assets
A and B, respectively, in the portfolio. Then the
payoff (i.e., the terminal value of the portfolio)
under the two states can be expressed mathematically
as follows:

• If State 1 occurs: $50 w_A + $30 w_B

• If State 2 occurs: $100 w_A + $120 w_B

We create a portfolio consisting of A and B
that will reproduce the payoff of C regardless
of the state that occurs one year from now. Here
is how: For each state (State 1 and State 2),
we set the payoff of the portfolio equal to the
payoff for C as follows:

• State 1: $50 w_A + $30 w_B = $38

• State 2: $100 w_A + $120 w_B = $112

We also know that w_A + w_B = 1. Solving
for the weights w_A and w_B that
simultaneously satisfy the above equations, we
find that the portfolio should have 40%
in asset A (i.e., w_A = 0.4) and 60% in asset B
(i.e., w_B = 0.6). The cost of that portfolio will be
equal to

(0.4)($70) + (0.6)($60) = $64

Our portfolio (i.e., package of assets) com¬ 
prised of assets A and B has the same payoff 
in State 1 and State 2 as the payoff of asset C. 
The cost of asset C is $80 while the cost of the 
portfolio is only $64. This is an arbitrage oppor¬ 
tunity that can be exploited by buying assets 
A and B in the proportions given above and 
shorting (selling) asset C. 
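
The weights and the cost of the replicating portfolio can be reproduced with a small linear solve. The sketch below uses only the prices and payoffs from the table above; the variable names are illustrative.

```python
import numpy as np

# Rows are states, columns are assets A and B (payoffs from the table above).
payoffs_ab = np.array([[50.0, 30.0],
                       [100.0, 120.0]])
payoff_c = np.array([38.0, 112.0])     # payoff of asset C in State 1 and State 2
prices_ab = np.array([70.0, 60.0])     # prices of A and B today
price_c = 80.0

# Solve the two state equations for the holdings of A and B.
weights = np.linalg.solve(payoffs_ab, payoff_c)
replication_cost = prices_ab @ weights
mispricing = price_c - replication_cost
```

Here `weights` comes out as (0.4, 0.6), which also satisfies w_A + w_B = 1, and the replicating package costs $64 against a price of $80 for asset C.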

For example, suppose that $1 million is in¬ 
vested to create the portfolio with assets A and 
B. The $1 million is obtained by selling short as¬ 
set C. The proceeds from the short sale of asset 
C provide the funds to purchase assets A and B. 
Thus, there would be no cash outlay by the in¬ 
vestor. The payoffs for States 1 and 2 are shown 
below: 


Asset    Investment     State 1      State 2
A        $   400,000    $ 285,715    $   571,429
B            600,000      300,000      1,200,000
C         -1,000,000     -475,000     -1,400,000
Total              0    $ 110,715    $   371,429
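
The entries of this table follow from dividing each dollar investment by the asset's price to get the number of units held, and then multiplying by the payoff per unit in each state. A short sketch of that arithmetic, with illustrative variable names, is:

```python
prices = {"A": 70.0, "B": 60.0, "C": 80.0}
payoffs = {"A": (50.0, 100.0), "B": (30.0, 120.0), "C": (38.0, 112.0)}
# Dollar amounts from the table; the negative entry is the short sale of C.
investment = {"A": 400_000.0, "B": 600_000.0, "C": -1_000_000.0}

# Units held of each asset (negative for the short position).
units = {name: investment[name] / prices[name] for name in prices}
# Dollar payoff of each position in State 1 and State 2.
state_payoffs = {name: tuple(units[name] * p for p in payoffs[name])
                 for name in prices}
net_outlay = sum(investment.values())
totals = tuple(sum(state_payoffs[name][s] for name in prices) for s in range(2))
```

The net outlay is zero and both state totals are positive, matching the table up to rounding of the asset A entries.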


ARBITRAGE PRICING IN A 
ONE-PERIOD SETTING 

We can describe the concepts of arbitrage pric¬ 
ing in a more formal mathematical context. It 
is useful to start in a simple one-period, finite- 
state setting as in the example of the previous 
section. This means that we consider only one 
period and that there is only a finite number M 
of states of the world. In this setting, asset prices 
can assume only a finite number of values. 

The assumption of finite states is not as 
restrictive as it might appear. In practice,
security prices can only assume a finite number
of values. Stock prices, for example, are not
real numbers but integer fractions of a dollar. 
In addition, stock prices are nonnegative num¬ 
bers and it is conceivable that there is some very 
high upper level that they cannot exceed. In ad¬ 
dition, whatever simulation we might perform 
is a finite-state simulation given that the preci¬ 
sion of computers is finite. 

The finite number of states represents uncer¬ 
tainty. There is uncertainty because the world 
can be in any of the M states. At time 0 it is not 
known in what state the world will be at time 1. 
Uncertainty is quantified by probabilities but a 
lot of arbitrage pricing theory can be developed 
without any reference to probabilities. Suppose
there are N securities. Each security i pays
$d_{ij}$ dollars (or units of any other unit
of account) in each state of the world j. The
payoff of each security need not be a positive
number. For instance, a derivative instrument
might have negative payoffs in some states of
the world. Therefore, in a one-period setting,
the securities are formally represented by an
$N \times M$ matrix $D = \{d_{ij}\}$ where the $d_{ij}$ entry is
the payoff of security i in state j. The matrix D
can also be written as a set of N row vectors:

$$D = \begin{bmatrix} d_1 \\ \vdots \\ d_N \end{bmatrix}, \qquad d_i = [d_{i1} \; \cdots \; d_{iM}]$$

where the M-vector $d_i$ represents the payoffs of
security i in each of the M states.

Each security is characterized by a price S.
Therefore, the set of N securities is characterized
by an N-vector S and an $N \times M$ matrix D.
Suppose, for instance, there are two states and
three securities. Then the three securities are
represented by

$$S = \begin{bmatrix} S_1 \\ S_2 \\ S_3 \end{bmatrix}, \qquad D = \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \\ d_{31} & d_{32} \end{bmatrix}$$

Every row of the D matrix represents one se¬ 
curity, every column one state. Note that in a 
one-period setting, prices are defined at time 0 


while payoffs are defined at time 1. There is no 
payoff at time 0 and there is no price at time 1. 
A portfolio is represented by an N-vector of
weights $\theta$. In our example of a market with
two states and three securities, a portfolio is a
3-vector:

$$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}$$

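The market value $S_\theta$ of a portfolio $\theta$ at time 0
is a scalar given by the scalar product:

$$S_\theta = S\theta = \sum_{i=1}^{N} S_i \theta_i$$

Its payoff $d_\theta$ at time 1 is the M-vector:

$$d_\theta = D'\theta$$

The price of a security and the market value
of a portfolio can be a negative number. In the
previous example of a two-state, three-security
market we obtain

$$S_\theta = S\theta = S_1\theta_1 + S_2\theta_2 + S_3\theta_3$$

$$d_\theta = D'\theta = \begin{bmatrix} d_{11} & d_{21} & d_{31} \\ d_{12} & d_{22} & d_{32} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} = \begin{bmatrix} d_{11}\theta_1 + d_{21}\theta_2 + d_{31}\theta_3 \\ d_{12}\theta_1 + d_{22}\theta_2 + d_{32}\theta_3 \end{bmatrix}$$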

Let's introduce the concept of arbitrage in 
this simple setting. As we have seen, arbitrage 
is essentially the possibility of making money 
by trading without any risk. Therefore, we define
an arbitrage as any portfolio $\theta$ that has a
negative market value $S_\theta = S\theta < 0$ and a nonnegative
payoff $d_\theta = D'\theta \ge 0$ or, alternatively,
a nonpositive market value $S_\theta = S\theta \le 0$ and a
positive payoff $d_\theta = D'\theta > 0$.
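
This definition translates directly into a numerical check. The sketch below is illustrative: the function name and tolerance are assumptions, and the sample portfolio (long 0.4 of A, long 0.6 of B, short one unit of C) uses the three-asset example from the previous section.

```python
import numpy as np

def is_arbitrage(S, D, theta, tol=1e-9):
    """Check whether portfolio theta is an arbitrage for prices S and payoffs D.

    D has one row per security and one column per state.
    """
    S, D, theta = (np.asarray(v, dtype=float) for v in (S, D, theta))
    value = S @ theta        # market value at time 0
    payoff = D.T @ theta     # payoff vector at time 1, one entry per state
    # Case 1: negative market value and nonnegative payoff in every state.
    case_1 = value < -tol and np.all(payoff >= -tol)
    # Case 2: nonpositive market value and positive payoff in every state.
    case_2 = value <= tol and np.all(payoff > tol)
    return bool(case_1 or case_2)

S = [70.0, 60.0, 80.0]
D = [[50.0, 100.0], [30.0, 120.0], [38.0, 112.0]]
found = is_arbitrage(S, D, [0.4, 0.6, -1.0])
not_found = is_arbitrage(S, D, [1.0, 0.0, 0.0])
```

The first portfolio has market value -$16 and a zero payoff in both states, so it falls under the first case; simply holding asset A costs money and is not an arbitrage.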


State Prices 

Next we define state prices. A state-price vector
is a strictly positive M-vector $\psi$ such that security
prices can be written as $S = D\psi$. In other
words, given a state-price vector, if it exists,
security prices can be recovered as a weighted
average of the securities' payoffs, where the
state-price vector gives the weights. In the previous
two-state, three-security example we can
write:

$$S = D\psi$$

$$\begin{bmatrix} S_1 \\ S_2 \\ S_3 \end{bmatrix} = \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \\ d_{31} & d_{32} \end{bmatrix} \begin{bmatrix} \psi_1 \\ \psi_2 \end{bmatrix} = \begin{bmatrix} d_{11}\psi_1 + d_{12}\psi_2 \\ d_{21}\psi_1 + d_{22}\psi_2 \\ d_{31}\psi_1 + d_{32}\psi_2 \end{bmatrix}$$

Given security prices and payoffs, state prices
can be determined solving the system:

$$\begin{aligned} d_{11}\psi_1 + d_{12}\psi_2 &= S_1 \\ d_{21}\psi_1 + d_{22}\psi_2 &= S_2 \\ d_{31}\psi_1 + d_{32}\psi_2 &= S_3 \end{aligned}$$

This system admits solutions if and only if 
there are two linearly independent equations 
and the third equation is a linear combination 
of the other two. Note that this condition is nec¬ 
essary but not sufficient to ensure that there are 
state prices as state prices must be strictly posi¬ 
tive numbers. 

A portfolio $\theta$ is characterized by payoffs $d_\theta = D'\theta$.
Its price is given, in terms of state prices,
by: $S_\theta = S\theta = D\psi\theta = d_\theta\psi$.

It can be demonstrated that there is no arbi¬ 
trage if and only if there is a state-price vector. 
The formal demonstration is quite complicated 
given the inequalities that define an arbitrage 
portfolio. It hinges on the separating hyperplane
theorem, which says that, given any two
convex disjoint sets in $R^M$, it is possible to find
a hyperplane separating them. A hyperplane is
the locus of points $x_i$ that satisfy a linear equation
of the type:

$$a_0 + \sum_{i=1}^{M} a_i x_i = 0$$

Intuitively, however, it is clear that the ex¬ 
istence of state prices ensures that the law of 


one price introduced in the previous section 
is automatically satisfied. In fact, if there are 
state prices, two identical payoffs have the same 
price, regardless of how they are constructed. 
This is because the price of a security or of 
any portfolio is univocally determined as a 
weighted average of the payoffs, with the state 
prices as weights. 


Risk-Neutral Probabilities 

Let's now introduce the concept of risk-neutral
probabilities. Given a state-price vector, consider
the sum of its components $\psi_0 = \psi_1 + \psi_2 +
\ldots + \psi_M$. Normalize the state-price vector by
dividing each component by the sum $\psi_0$. The
normalized state-price vector

$$\hat{\psi} = \{\hat{\psi}_j\} = \left\{\frac{\psi_j}{\psi_0}\right\}$$

is a set of positive numbers whose sum is one.
These numbers can be interpreted as probabilities.
They are not, in general, the real probabilities
associated with states. They are called
risk-neutral probabilities. We can then write

$$\frac{1}{\psi_0} S = D\hat{\psi}$$

We can interpret the above relationship as follows:
The normalized security prices are their
expected payoffs under these special probabilities.
In fact, we can rewrite the above equation
as

$$\frac{S_i}{\psi_0} = E[d_i]$$

where expectation is taken with respect to
risk-neutral probabilities. In this case, security
prices are the discounted expected payoffs under
these special risk-neutral probabilities.

Suppose that there is a portfolio $\bar{\theta}$ such that
$d_{\bar{\theta}} = D'\bar{\theta} = \{1, 1, \ldots, 1\}$. This portfolio can be
one individual risk-free security. As we have
seen above, $S\bar{\theta} = d_{\bar{\theta}}\psi$, which implies that
$\psi_0 = \bar{\theta}S$ is the discount on riskless borrowing.
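
The normalization can be illustrated numerically. In the sketch below the payoff matrix is the one from the three-asset example, while the state-price vector (0.8, 0.3) is simply assumed for illustration; with these state prices the implied security prices are $70, $60, and $64. Variable names are hypothetical.

```python
import numpy as np

D = np.array([[50.0, 100.0],
              [30.0, 120.0],
              [38.0, 112.0]])        # one row per security, one column per state
psi = np.array([0.8, 0.3])           # assumed state-price vector (illustrative)

S = D @ psi                          # security prices implied by the state prices
psi_0 = psi.sum()                    # sum of the state prices
risk_neutral_probs = psi / psi_0     # positive numbers that sum to one
expected_payoffs = D @ risk_neutral_probs   # E[d_i] under these probabilities
```

Multiplying `expected_payoffs` by `psi_0` recovers `S`, which is the statement that prices are discounted expected payoffs under the risk-neutral probabilities.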
Complete Markets

Let's now define the concept of complete markets,
a concept that plays a fundamental role
in finance theory. In the simple setting of the
one-period finite-state market, a complete market
is one in which the set of possible portfolios
is able to replicate an arbitrary payoff. Call
span(D) the set of possible portfolio payoffs,
which is given by the following expression:

$$\text{span}(D) = \{D'\theta : \theta \in R^N\}$$

A market is complete if span(D) = $R^M$.

A one-period finite-state complete market is
one where the equation

$$D'\theta = \xi, \quad \xi \in R^M$$

always admits a solution. Recall from matrix
algebra that this is the case if and only if the
rank of D is M. This means that there are at
least M linearly independent payoffs; that is,
there are as many linearly independent payoffs
as there are states. Let's write down explicitly
the system in the two-state, three-security
market.

$$D'\theta = \xi$$

$$\begin{bmatrix} d_{11} & d_{21} & d_{31} \\ d_{12} & d_{22} & d_{32} \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} = \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}$$

$$\begin{aligned} d_{11}\theta_1 + d_{21}\theta_2 + d_{31}\theta_3 &= \xi_1 \\ d_{12}\theta_1 + d_{22}\theta_2 + d_{32}\theta_3 &= \xi_2 \end{aligned}$$

This system of linear equations admits solutions
for every $\xi$ if and only if the rank of the coefficient
matrix is 2. This condition is not satisfied, for
example, if the securities have the same payoff
in each state. In this case, the relationship
$\xi_1 = \xi_2$ must always hold. In other words,
the three securities can only replicate portfolios
that have the same payoff in each state.

In this simple setting it is easy to associate risk-neutral probabilities with real probabilities. In fact, suppose that the vector of real probabilities p is associated to states so that $p_i$ is the probability of the i-th state. For any given M-dimensional vector x, we write its expected value under the real probabilities as

$$E[x] = p \cdot x = \sum_{i=1}^{M} p_i x_i$$

It can be demonstrated that there is no arbitrage if and only if there is a strictly positive M-vector $\pi$ such that $S = E[D\pi]$. Any such vector $\pi$ is called a state-price deflator. To see this point, define

$$\pi_i = \frac{\psi_i}{p_i}$$

Prices can then be expressed as

$$S_j = \sum_{i=1}^{M} d_{ji}\psi_i = \sum_{i=1}^{M} p_i d_{ji}\frac{\psi_i}{p_i} = \sum_{i=1}^{M} p_i d_{ji}\pi_i$$

which demonstrates that $S = E[D\pi]$.

We can now specialize the above calculations 
in the numerical case of the previous section. 
Recall that in the previous section we gave the 
example of three securities with the following 
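prices and payoffs expressed in dollars:

$$S = \begin{bmatrix} 70 \\ 60 \\ 80 \end{bmatrix}, \quad D = \begin{bmatrix} 50 & 100 \\ 30 & 120 \\ 38 & 112 \end{bmatrix}$$

We first compute the relative state prices:

$$50\psi_1 + 100\psi_2 = 70$$
$$30\psi_1 + 120\psi_2 = 60$$
$$38\psi_1 + 112\psi_2 = 80$$

Solving the first two equations, we obtain

$$\psi = \begin{bmatrix} \psi_1 \\ \psi_2 \end{bmatrix} = \begin{bmatrix} 4/5 \\ 3/10 \end{bmatrix}$$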

However, the third equation is not satisfied by these values for the state prices: $38 \times 4/5 + 112 \times 3/10 = 64 \neq 80$. As a consequence, there does not exist a state-price vector, which confirms that there are arbitrage opportunities as observed in the first section.

Now suppose that the price of security C is $64 and not $80. In this case, the third equation is satisfied and the state-price vector is the one shown above. Risk-neutral probabilities can now be easily computed. Here is how. First sum the two state prices to obtain

$$\psi_0 = \psi_1 + \psi_2 = 4/5 + 3/10 = 11/10$$

and consequently the risk-neutral probabilities:

$$\hat{\psi} = \begin{bmatrix} \psi_1/\psi_0 \\ \psi_2/\psi_0 \end{bmatrix} = \begin{bmatrix} 8/11 \\ 3/11 \end{bmatrix}$$

Risk-neutral probabilities sum to one while state prices do not. We can now check if our market is complete. Write the following equations:

$$50\theta_1 + 30\theta_2 + 38\theta_3 = \xi_1$$
$$100\theta_1 + 120\theta_2 + 112\theta_3 = \xi_2$$

The rank of the coefficient matrix is clearly 2 as the determinant of the first minor is different from zero:

$$\begin{vmatrix} 50 & 30 \\ 100 & 120 \end{vmatrix} = 50 \times 120 - 100 \times 30 = 3000 \neq 0$$

Our sample market is therefore complete and 
arbitrage-free. A portfolio composed of the first 
two securities can replicate any payoff and the 
third security can be replicated as a portfolio of 
the first two. 
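
The numerical example above can be checked with a few lines of linear algebra. The following is a minimal sketch, assuming NumPy is available; the function name `state_price_vector` and the tolerance are illustrative choices, not part of the original text. It solves $D\psi = S$ in the least-squares sense and accepts the result as a state-price vector only if it reproduces all three prices exactly and is strictly positive.

```python
import numpy as np

# Payoff matrix from the example: one row per security (A, B, C),
# one column per state.
D = np.array([[50.0, 100.0],
              [30.0, 120.0],
              [38.0, 112.0]])

def state_price_vector(D, S, tol=1e-9):
    """Return (psi, exists): the least-squares solution of D @ psi = S and
    whether it is an exact, strictly positive solution."""
    psi, *_ = np.linalg.lstsq(D, S, rcond=None)
    exists = bool(np.allclose(D @ psi, S, atol=tol) and (psi > 0).all())
    return psi, exists

# Original prices (70, 60, 80): the third equation fails, so no state-price vector.
_, exists_80 = state_price_vector(D, np.array([70.0, 60.0, 80.0]))

# Price of security C changed to 64: a state-price vector exists.
psi, exists_64 = state_price_vector(D, np.array([70.0, 60.0, 64.0]))
psi0 = psi.sum()                  # discount on riskless borrowing
risk_neutral = psi / psi0         # normalized state prices
rank = np.linalg.matrix_rank(D)   # market is complete if rank equals the number of states

print(exists_80, exists_64, psi, psi0, risk_neutral, rank)
```

With security C priced at 64, the sketch should recover state prices of 4/5 and 3/10, risk-neutral probabilities of 8/11 and 3/11, and a payoff matrix of rank 2.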


ARBITRAGE PRICING IN A 
MULTIPERIOD FINITE-STATE 
SETTING 

The above basic results can be extended to a multiperiod finite-state setting using probabilistic concepts. The economy is represented by a probability space $(\Omega, \Im, P)$ where $\Omega$ is the set of possible states, $\Im$ is the algebra of events (recall that we are in a finite-state setting and therefore there are only a finite number of events), and P is a probability function. As the number of states is finite, finite probabilities $P(\{\omega\}) \equiv P(\omega) \equiv p_\omega$ are defined for each state. There is only a finite number of dates from 0 to T.

Propagation of Information 

The propagation of information is represented by a filtration $\Im_t$ that, in the finite case, is equivalent to an information structure $I_t$. The latter is a discrete, hierarchical organization of partitions $I_t$ with the following properties:

$$I_k \equiv \{A_{ik}\}; \quad k = 0, \ldots, T; \quad i = 1, \ldots, M_k;$$
$$1 = M_0 \le \cdots \le M_k \le \cdots \le M_T = M$$
$$A_{ik} \cap A_{jk} = \emptyset \ \text{if} \ i \neq j \quad \text{and} \quad \bigcup_{i=1}^{M_k} A_{ik} = \Omega$$

and, in addition, given any two sets $A_{ik}$, $A_{jh}$, with h > k, either their intersection is empty, $A_{ik} \cap A_{jh} = \emptyset$, or $A_{ik} \supseteq A_{jh}$. In other words, the partitions become more refined with time.

Each security i is characterized by a payoff process $d^i_t$ and by a price process $S^i_t$. In this finite-state setting, $d^i_t$ and $S^i_t$ are discrete variables that, given that there are M states, can be represented by M-vectors $d^i_t = [d^i_t(\omega)]$ and $S^i_t = [S^i_t(\omega)]$ where $d^i_t(\omega)$ and $S^i_t(\omega)$ are, respectively, the payoff and the price of the i-th asset at time t, $0 \le t \le T$, and in state $\omega \in \Omega$. All payoffs and prices are stochastic processes adapted to the filtration $\Im_t$. Given that $d^i_t$ and $S^i_t$ are adapted processes in a finite probability space, they have to assume a constant value on each set of each partition of the information structure $I_t$. It is convenient to introduce the following notation:

$$d^i_{A_{jt}} = d^i_t(\omega), \quad \omega \in A_{jt}$$
$$S^i_{A_{jt}} = S^i_t(\omega), \quad \omega \in A_{jt}$$

where $d^i_{A_{jt}}$ and $S^i_{A_{jt}}$ represent the constant values that the processes $d^i_t$ and $S^i_t$ assume on the states that belong to the sets $A_{jt}$ of each partition $I_t$. There is $M_0 = 1$ value for $d^i_{A_{j0}}$ and $S^i_{A_{j0}}$, $M_t$ values for $d^i_{A_{jt}}$ and $S^i_{A_{jt}}$, and $M_T = M$ values for $d^i_{A_{jT}}$ and $S^i_{A_{jT}}$. The same notation and the same consideration can be applied to any process adapted to the filtration $\Im_t$.

Trading Strategies

We have to define the meaning of trading strategies in this multiperiod setting. A trading strategy is a sequence of portfolios $\theta$ such that $\theta_t$ is the portfolio held at time t after trading. To ensure that there is no anticipation of information, each trading strategy $\theta$ must be an adapted process. The payoff $d^\theta$ generated by a trading strategy is an adapted process $d^\theta_t$ with the following time dynamics:

$$d^\theta_t = \theta_{t-1}(S_t + d_t) - \theta_t S_t$$

An arbitrage is a trading strategy whose payoff process is nonnegative and not always zero. In other words, an arbitrage is a trading strategy whose payoff is never negative and is strictly positive at some dates and in some states. Note that imposing the condition that payoffs are always nonnegative rules out any positive initial investment, because an initial investment is a negative payoff.

A consumption process is any nonnegative 
adapted process. Markets are said to be com¬ 
plete if any consumption process can be ob¬ 
tained as the payoff process of a trading strategy 
with some initial investment. Market complete¬ 
ness means that any nonnegative payoff process 
can be replicated with a trading strategy. 
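
The payoff dynamics of a trading strategy and the arbitrage condition can be sketched in code. This is a minimal illustration, assuming NumPy and arrays indexed as (date, state, security); the function names and the one-security, two-state numbers are hypothetical and chosen only to exercise the formula. The portfolio held before date 0 is taken to be zero, so the date-0 payoff is minus the initial investment.

```python
import numpy as np

def strategy_payoff(theta, S, d):
    """Payoff process d^theta_t = theta_{t-1} (S_t + d_t) - theta_t S_t.

    theta, S, d have shape (T + 1, M, N): dates, states, securities.
    The portfolio held before date 0 is assumed to be zero.
    """
    theta_prev = np.zeros_like(theta)
    theta_prev[1:] = theta[:-1]
    return (theta_prev * (S + d) - theta * S).sum(axis=-1)

def is_arbitrage(payoff, tol=1e-12):
    """True if the payoff is never negative and strictly positive somewhere."""
    return bool((payoff >= -tol).all() and (payoff > tol).any())

# Hypothetical data: one security, two states, dates 0 and 1.
S = np.array([[[10.0], [10.0]], [[0.0], [0.0]]])      # prices
d = np.array([[[0.0], [0.0]], [[12.0], [9.0]]])       # payoffs
theta = np.array([[[1.0], [1.0]], [[0.0], [0.0]]])    # buy one unit, then liquidate

payoff = strategy_payoff(theta, S, d)
print(payoff)                # date-0 row is the initial outlay, date-1 row the payoffs
print(is_arbitrage(payoff))  # a costly buy-and-hold is not an arbitrage
```

Because the initial investment shows up as a negative payoff at date 0, a strategy that costs money to set up can never satisfy the arbitrage test, which is the point made in the text.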


State-Price Deflator 

We will now extend the concept of state-price deflator to a multiperiod setting. A state-price deflator is a strictly positive adapted process $\pi_t$ such that the following set of M equations holds:

$$S^i_t = \frac{1}{\pi_t} E_t\left[\sum_{j=t+1}^{T} \pi_j d^i_j\right]$$

In other words, a state-price deflator is a strictly positive process such that prices $S^i_t$ are random variables equal to the conditional expectation of discounted payoffs with respect to the filtration $\Im_t$. As noted above, in this finite-state setting a filtration is equivalent to an information structure $I_t$. Note that in the above stochastic equation (which is a set of M equations, one for each state) the term on the left, the prices $S^i_t$, is an adapted process that, as mentioned, assumes constant values on each set of the partition $I_t$. The term on the right is a conditional expectation multiplied by a factor $1/\pi_t$. The process $\pi_t$ is adapted by definition and, therefore, assumes constant values $\pi_{A_{kt}}$ on each set of the partition $I_t$.

In this finite setting, conditional expectations are expectations computed with conditional probabilities. Conditional expectations are adapted processes. Therefore they assume one value at t = 0, $M_j$ values for t = j, and M values at the last date.

To illustrate the above, let's write down explicitly the above equation in terms of the notation $d^i_{A_{kt}}$ and $S^i_{A_{kt}}$. Note first that

$$P(\{\omega\}|A_{kt}) = \frac{P(\{\omega\} \cap A_{kt})}{P(A_{kt})} = \frac{P(\{\omega\})}{P(A_{kt})} \ \text{if} \ \omega \in A_{kt}, \quad 0 \ \text{if} \ \omega \notin A_{kt}$$

Given that the probability space is finite,

$$P(A_{kt}) = \sum_{\omega \in A_{kt}} p_\omega$$

As we defined $P(\{\omega\}) \equiv p_\omega$, the previous equation becomes

$$P(\{\omega\}|A_{kt}) = \frac{P(\{\omega\} \cap A_{kt})}{P(A_{kt})} = \frac{P(\{\omega\})}{P(A_{kt})} = \frac{p_\omega}{\sum_{\omega \in A_{kt}} p_\omega} \ \text{if} \ \omega \in A_{kt}, \quad 0 \ \text{if} \ \omega \notin A_{kt}$$


Pricing Relationships 

We can now write the pricing relationship as follows:

$$S^i_{A_{kt}} = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} P(\{\omega\}|A_{kt}) \left(\sum_{j=t+1}^{T} \pi_j(\omega) d^i_j(\omega)\right) = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} \frac{p_\omega}{\sum_{\omega \in A_{kt}} p_\omega} \left(\sum_{j=t+1}^{T} \pi_j(\omega) d^i_j(\omega)\right)$$

$$A_{kt} \in I_t, \quad 1 \le k \le M_t$$

The above formulas generalize to any trading strategy. In particular, if there is a state-price deflator, the market value of any trading strategy is given by

$$\theta_t \times S_t = \frac{1}{\pi_t} E_t\left[\sum_{j=t+1}^{T} \pi_j d^\theta_j\right]$$

$$(\theta_t S_t)_{A_{kt}} = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} P(\{\omega\}|A_{kt}) \left(\sum_{j=t+1}^{T} \pi_j(\omega) d^\theta_j(\omega)\right) = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} \frac{p_\omega}{\sum_{\omega \in A_{kt}} p_\omega} \left(\sum_{j=t+1}^{T} \pi_j(\omega) d^\theta_j(\omega)\right)$$

It is possible to demonstrate that the payoff-price pair $(d^i_t, S^i_t)$ admits no arbitrage if and only if there is a state-price deflator. These concepts and formulas generalize those of a one-period setting to a multiperiod setting.

Given a payoff-price pair $(d^i_t, S^i_t)$ it is possible to compute the state-price deflator, if it exists, from the previous equations. In fact, it is possible to write a set of linear equations in the $\pi_t$, $\pi_{t-1}$ for each period. One can proceed backward from the period T to period 1 writing a homogeneous system of linear equations. As the system is homogeneous, one of the variables can be arbitrarily fixed; for example, the initial value $\pi_0$ can be assumed equal to 1. If the system admits nontrivial solutions and if all solutions are strictly positive, then there are state-price deflators.

To illustrate the above, let's write down explicitly the previous formulas for prices, extending the example of the previous section to a two-period setting. We assume there are three securities and two periods, that is, three dates (0, 1, 2) and four states, indicated with the integers 1, 2, 3, 4, so that $\Omega = \{1, 2, 3, 4\}$. Assume that the information structure is given by the following partitions of events:

$$I_0 \equiv \{A_{1,0}\}, \quad I_1 \equiv \{A_{1,1}, A_{2,1}\}, \quad I_2 \equiv \{A_{1,2}, A_{2,2}, A_{3,2}, A_{4,2}\}$$

$$A_{1,0} = \{1 + 2 + 3 + 4\}, \quad A_{1,1} = \{1 + 2\}, \quad A_{2,1} = \{3 + 4\}$$

$$A_{1,2} = \{1\}, \quad A_{2,2} = \{2\}, \quad A_{3,2} = \{3\}, \quad A_{4,2} = \{4\}$$


where we use + to indicate logical union, so 
that, for example, {1 + 2} is the event formed by 
states 1 and 2. The interpretation of the above 
notation is the following. At time zero the world 
can be in any possible state, that is, the securi¬ 
ties can take any possible path. Therefore the 
partition at time zero is formed by the event 
{1 + 2 + 3 + 4}. At time 1, the set of states is 
partitioned into two mutually exclusive events, 
{1+2} or {3 + 4}. At time 2 the partition is 
formed by all individual states. Note that this is 
a particular example; different partitions would 
be logically admissible. 
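
The defining properties of an information structure (each date carries a partition of the state space, and each partition refines the previous one) are easy to test mechanically. The sketch below is illustrative only: the function names are hypothetical, and states are represented as Python sets of integers. It checks the structure used in this example and a counterexample that is a valid partition at each date but is not nested.

```python
from itertools import combinations

def is_partition(blocks, omega):
    """True if the blocks are nonempty, pairwise disjoint, and cover omega."""
    if any(len(b) == 0 for b in blocks):
        return False
    disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    return disjoint and set().union(*blocks) == omega

def refines(finer, coarser):
    """True if every block of `finer` is contained in some block of `coarser`."""
    return all(any(f <= c for c in coarser) for f in finer)

def is_information_structure(partitions, omega):
    """Partitions indexed by date must each partition omega and grow finer."""
    if not all(is_partition(blocks, omega) for blocks in partitions):
        return False
    return all(refines(partitions[t + 1], partitions[t])
               for t in range(len(partitions) - 1))

omega = {1, 2, 3, 4}
example = [
    [{1, 2, 3, 4}],             # date 0
    [{1, 2}, {3, 4}],           # date 1
    [{1}, {2}, {3}, {4}],       # date 2
]
not_nested = [
    [{1, 2, 3, 4}],
    [{1, 2}, {3, 4}],
    [{1, 3}, {2, 4}],           # a partition, but it does not refine date 1
]

valid = is_information_structure(example, omega)
invalid = is_information_structure(not_nested, omega)
print(valid, invalid)
```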

Figure 1 An Information Structure with Four States and Three Dates

Figure 1 represents the above structure. Each security is characterized by a price process and a payoff process adapted to the information structure. Each process is a collection of three discrete random variables indexed with the time indexes 0, 1, 2. Each discrete random variable is a 4-vector as it assumes as many values as states. However, as processes are adapted, they must assume the same value on each set of each partition of the information structure. Note also that payoffs are zero at date zero and prices are zero at date 2. Therefore, in this example, we can put together these vectors in two 4 × 3 matrices (one row per state, one column per date) for each security as follows:


$$\{S^i_t(\omega)\} \equiv \begin{bmatrix} S^i_0(1) & S^i_1(1) & 0 \\ S^i_0(2) & S^i_1(2) & 0 \\ S^i_0(3) & S^i_1(3) & 0 \\ S^i_0(4) & S^i_1(4) & 0 \end{bmatrix}; \quad \{d^i_t(\omega)\} \equiv \begin{bmatrix} 0 & d^i_1(1) & d^i_2(1) \\ 0 & d^i_1(2) & d^i_2(2) \\ 0 & d^i_1(3) & d^i_2(3) \\ 0 & d^i_1(4) & d^i_2(4) \end{bmatrix}$$

The following relationships hold:

$$S^i_0(1) = S^i_0(2) = S^i_0(3) = S^i_0(4) = S^i_{A_{1,0}}$$
$$S^i_1(1) = S^i_1(2) = S^i_{A_{1,1}}; \quad S^i_1(3) = S^i_1(4) = S^i_{A_{2,1}}$$
$$d^i_1(1) = d^i_1(2) = d^i_{A_{1,1}}; \quad d^i_1(3) = d^i_1(4) = d^i_{A_{2,1}}$$

where, as above, $S^i_t(\omega)$ is the price of security i in state $\omega$ at moment t and $d^i_t(\omega)$ is the payoff of security i in state $\omega$ at time t with the restriction that processes must assume the same value on partitions. This is because processes are adapted to the information structure so that there is no anticipation of information. One must not be able to discriminate at time 0 events that will be revealed at time 1 and so on.

Observe that there is no payoff at time 0 and no price at time 2 and that the payoffs at time 2 have to be intended as the final liquidation of the security as in the one-period case. Payoffs at time 1, on the other hand, are intermediate payments. Note that the number of states is chosen arbitrarily for illustration purposes. Each state of the world represents a path of prices and payoffs for the set of three securities. To keep the example simple, we assume that of all the possible paths of prices and payoffs only four are possible.

The state-price deflator can be represented as follows:

$$\{\pi_t(\omega)\} \equiv \begin{bmatrix} \pi_0(1) & \pi_1(1) & \pi_2(1) \\ \pi_0(2) & \pi_1(2) & \pi_2(2) \\ \pi_0(3) & \pi_1(3) & \pi_2(3) \\ \pi_0(4) & \pi_1(4) & \pi_2(4) \end{bmatrix}$$

The following relationships hold:

$$\pi_0(1) = \pi_0(2) = \pi_0(3) = \pi_0(4) = \pi_{A_{1,0}}$$
$$\pi_1(1) = \pi_1(2) = \pi_{A_{1,1}}; \quad \pi_1(3) = \pi_1(4) = \pi_{A_{2,1}}$$

A probability $p_\omega$ is assigned to each of the four states of the world. The probability of each event is simply the sum of the probabilities of its states. We can write down the formula for security prices in this way:

$$S^i_{A_{1,2}} = S^i_2(1) = S^i_{A_{2,2}} = S^i_2(2) = S^i_{A_{3,2}} = S^i_2(3) = S^i_{A_{4,2}} = S^i_2(4) = 0$$

$$S^i_{A_{1,1}} = S^i_1(1) = S^i_1(2) = \frac{1}{\pi_{A_{1,1}}}\left[P(A_{1,2}|A_{1,1})\pi_2(1)d^i_2(1) + P(A_{2,2}|A_{1,1})\pi_2(2)d^i_2(2)\right] = \frac{1}{\pi_{A_{1,1}}}\left[\frac{p_1}{p_1 + p_2}\pi_2(1)d^i_2(1) + \frac{p_2}{p_1 + p_2}\pi_2(2)d^i_2(2)\right]$$

$$S^i_{A_{2,1}} = S^i_1(3) = S^i_1(4) = \frac{1}{\pi_{A_{2,1}}}\left[P(A_{3,2}|A_{2,1})\pi_2(3)d^i_2(3) + P(A_{4,2}|A_{2,1})\pi_2(4)d^i_2(4)\right] = \frac{1}{\pi_{A_{2,1}}}\left[\frac{p_3}{p_3 + p_4}\pi_2(3)d^i_2(3) + \frac{p_4}{p_3 + p_4}\pi_2(4)d^i_2(4)\right]$$

$$S^i_{A_{1,0}} = \frac{1}{\pi_{A_{1,0}}}\left\{p_1\left[\pi_{A_{1,1}}d^i_{A_{1,1}} + \pi_2(1)d^i_2(1)\right] + p_2\left[\pi_{A_{1,1}}d^i_{A_{1,1}} + \pi_2(2)d^i_2(2)\right] + p_3\left[\pi_{A_{2,1}}d^i_{A_{2,1}} + \pi_2(3)d^i_2(3)\right] + p_4\left[\pi_{A_{2,1}}d^i_{A_{2,1}} + \pi_2(4)d^i_2(4)\right]\right\}$$

These equations illustrate how to compute the state-price deflator knowing prices, payoffs, and probabilities. They form a homogeneous system of linear equations in $\pi_2(1)$, $\pi_2(2)$, $\pi_2(3)$, $\pi_2(4)$, $\pi_{A_{1,1}}$, $\pi_{A_{2,1}}$, $\pi_{A_{1,0}}$:

$$p_1 d^i_2(1)\pi_2(1) + p_2 d^i_2(2)\pi_2(2) - S^i_{A_{1,1}}(p_1 + p_2)\pi_{A_{1,1}} = 0$$

$$p_3 d^i_2(3)\pi_2(3) + p_4 d^i_2(4)\pi_2(4) - S^i_{A_{2,1}}(p_3 + p_4)\pi_{A_{2,1}} = 0$$

$$p_1 d^i_2(1)\pi_2(1) + p_2 d^i_2(2)\pi_2(2) + p_3 d^i_2(3)\pi_2(3) + p_4 d^i_2(4)\pi_2(4) + (p_1 + p_2)d^i_{A_{1,1}}\pi_{A_{1,1}} + (p_3 + p_4)d^i_{A_{2,1}}\pi_{A_{2,1}} - S^i_{A_{1,0}}\pi_{A_{1,0}} = 0$$

Substituting, we obtain

$$p_1 d^i_2(1)\pi_2(1) + p_2 d^i_2(2)\pi_2(2) - S^i_{A_{1,1}}(p_1 + p_2)\pi_{A_{1,1}} = 0$$

$$p_3 d^i_2(3)\pi_2(3) + p_4 d^i_2(4)\pi_2(4) - S^i_{A_{2,1}}(p_3 + p_4)\pi_{A_{2,1}} = 0$$

$$\left[(p_1 + p_2)S^i_{A_{1,1}} + (p_1 + p_2)d^i_{A_{1,1}}\right]\pi_{A_{1,1}} + \left[(p_3 + p_4)S^i_{A_{2,1}} + (p_3 + p_4)d^i_{A_{2,1}}\right]\pi_{A_{2,1}} - S^i_{A_{1,0}}\pi_{A_{1,0}} = 0$$

This homogeneous system must admit a strictly positive solution to yield a state-price deflator. There are seven unknowns. However, as the system is homogeneous, if nontrivial solutions exist, one of the unknowns can be arbitrarily fixed, for example $\pi_{A_{1,0}}$. Therefore, six independent equations are needed. Each asset provides two conditions, so a minimum of three assets are needed.

To illustrate the point, we assume that all states (which are also events in this discrete example) have the same probability 0.25. Thus the events of the information structure have the following probabilities: the single event at time zero has probability 1, the two events at time 1 have probability 0.5, and the four events at time 2 coincide with individual states and have probability 0.25. Conditional probabilities are shown in Table 1.

For illustration purposes, let's write the following matrices for payoffs for each security at each date in each state:

$$\{d^1_t(\omega)\} \equiv \begin{bmatrix} 0 & 15 & 50 \\ 0 & 15 & 100 \\ 0 & 20 & 70 \\ 0 & 20 & 110 \end{bmatrix}; \quad \{d^2_t(\omega)\} \equiv \begin{bmatrix} 0 & 8 & 30 \\ 0 & 8 & 120 \\ 0 & 15 & 40 \\ 0 & 15 & 140 \end{bmatrix}; \quad \{d^3_t(\omega)\} \equiv \begin{bmatrix} 0 & 5 & 38 \\ 0 & 5 & 112 \\ 0 & 8 & 42 \\ 0 & 8 & 130 \end{bmatrix}$$

Table 1 Conditional Probabilities
$P(A_{1,1}|A_{1,0}) = P(A_{1,1} \cap A_{1,0})/P(A_{1,0}) = P\{1+2\}/P\{1+2+3+4\} = 0.5$
$P(A_{2,1}|A_{1,0}) = P(A_{2,1} \cap A_{1,0})/P(A_{1,0}) = P\{3+4\}/P\{1+2+3+4\} = 0.5$
$P(A_{1,2}|A_{1,0}) = P(A_{1,2} \cap A_{1,0})/P(A_{1,0}) = P\{1\}/P\{1+2+3+4\} = 0.25$
$P(A_{2,2}|A_{1,0}) = P(A_{2,2} \cap A_{1,0})/P(A_{1,0}) = P\{2\}/P\{1+2+3+4\} = 0.25$
$P(A_{3,2}|A_{1,0}) = P(A_{3,2} \cap A_{1,0})/P(A_{1,0}) = P\{3\}/P\{1+2+3+4\} = 0.25$
$P(A_{4,2}|A_{1,0}) = P(A_{4,2} \cap A_{1,0})/P(A_{1,0}) = P\{4\}/P\{1+2+3+4\} = 0.25$
$P(A_{1,2}|A_{1,1}) = P(A_{1,2} \cap A_{1,1})/P(A_{1,1}) = P\{1\}/P\{1+2\} = 0.25/0.5 = 0.5$
$P(A_{2,2}|A_{1,1}) = P(A_{2,2} \cap A_{1,1})/P(A_{1,1}) = P\{2\}/P\{1+2\} = 0.25/0.5 = 0.5$
$P(A_{3,2}|A_{1,1}) = P(A_{3,2} \cap A_{1,1})/P(A_{1,1}) = P\{\emptyset\}/P\{1+2\} = 0$
$P(A_{4,2}|A_{1,1}) = P(A_{4,2} \cap A_{1,1})/P(A_{1,1}) = P\{\emptyset\}/P\{1+2\} = 0$
$P(A_{1,2}|A_{2,1}) = P(A_{1,2} \cap A_{2,1})/P(A_{2,1}) = P\{\emptyset\}/P\{3+4\} = 0$
$P(A_{2,2}|A_{2,1}) = P(A_{2,2} \cap A_{2,1})/P(A_{2,1}) = P\{\emptyset\}/P\{3+4\} = 0$
$P(A_{3,2}|A_{2,1}) = P(A_{3,2} \cap A_{2,1})/P(A_{2,1}) = P\{3\}/P\{3+4\} = 0.25/0.5 = 0.5$
$P(A_{4,2}|A_{2,1}) = P(A_{4,2} \cap A_{2,1})/P(A_{2,1}) = P\{4\}/P\{3+4\} = 0.25/0.5 = 0.5$

We will assume that the state-price deflator is the following given process:

$$\{\pi_t(\omega)\} \equiv \begin{bmatrix} 1 & 0.8 & 0.7 \\ 1 & 0.8 & 0.75 \\ 1 & 0.9 & 0.75 \\ 1 & 0.9 & 0.8 \end{bmatrix}$$

Each price is computed according to the previous equations. The calculations for the three assets are as follows.

Asset 1:

$$S^1_2(1) = S^1_2(2) = S^1_2(3) = S^1_2(4) = 0$$
$$S^1_{A_{1,1}} = \frac{1}{0.8}(0.5 \times 0.7 \times 50 + 0.5 \times 0.75 \times 100) = 68.75$$
$$S^1_{A_{2,1}} = \frac{1}{0.9}(0.5 \times 0.75 \times 70 + 0.5 \times 0.8 \times 110) = 78.05$$
$$S^1_{A_{1,0}} = \frac{1}{1}[0.25(0.8 \times 15 + 0.7 \times 50) + 0.25(0.8 \times 15 + 0.75 \times 100) + 0.25(0.9 \times 20 + 0.75 \times 70) + 0.25(0.9 \times 20 + 0.8 \times 110)] = 77.625$$

Asset 2:

$$S^2_2(1) = S^2_2(2) = S^2_2(3) = S^2_2(4) = 0$$
$$S^2_{A_{1,1}} = \frac{1}{0.8}(0.5 \times 0.7 \times 30 + 0.5 \times 0.75 \times 120) = 69.37$$
$$S^2_{A_{2,1}} = \frac{1}{0.9}(0.5 \times 0.75 \times 40 + 0.5 \times 0.8 \times 140) = 78.88$$
$$S^2_{A_{1,0}} = \frac{1}{1}[0.25(0.8 \times 8 + 0.7 \times 30) + 0.25(0.8 \times 8 + 0.75 \times 120) + 0.25(0.9 \times 15 + 0.75 \times 40) + 0.25(0.9 \times 15 + 0.8 \times 140)] = 73.2$$

Asset 3:

$$S^3_2(1) = S^3_2(2) = S^3_2(3) = S^3_2(4) = 0$$
$$S^3_{A_{1,1}} = \frac{1}{0.8}(0.5 \times 0.7 \times 38 + 0.5 \times 0.75 \times 112) = 69.12$$
$$S^3_{A_{2,1}} = \frac{1}{0.9}(0.5 \times 0.75 \times 42 + 0.5 \times 0.8 \times 130) = 75.27$$
$$S^3_{A_{1,0}} = \frac{1}{1}[0.25(0.8 \times 5 + 0.7 \times 38) + 0.25(0.8 \times 5 + 0.75 \times 112) + 0.25(0.9 \times 8 + 0.75 \times 42) + 0.25(0.9 \times 8 + 0.8 \times 130)] = 67.125$$

With the above equations we computed prices from payoffs and state-price deflators. If prices and payoffs were given, we could compute state-price deflators from the homogeneous system for state prices established above. Suppose that the following price processes were given:

$$\{S^1_t(\omega)\} \equiv \begin{bmatrix} 77.625 & 68.75 & 0 \\ 77.625 & 68.75 & 0 \\ 77.625 & 78.05 & 0 \\ 77.625 & 78.05 & 0 \end{bmatrix}; \quad \{S^2_t(\omega)\} \equiv \begin{bmatrix} 73.2 & 69.37 & 0 \\ 73.2 & 69.37 & 0 \\ 73.2 & 78.88 & 0 \\ 73.2 & 78.88 & 0 \end{bmatrix}; \quad \{S^3_t(\omega)\} \equiv \begin{bmatrix} 67.125 & 69.12 & 0 \\ 67.125 & 69.12 & 0 \\ 67.125 & 75.27 & 0 \\ 67.125 & 75.27 & 0 \end{bmatrix}$$

We could then write the following system of equations to compute state-price deflators:

$$0.25 \times 50 \times \pi_2(1) + 0.25 \times 100 \times \pi_2(2) - 68.75 \times 0.5 \times \pi_{A_{1,1}} = 0$$
$$0.25 \times 70 \times \pi_2(3) + 0.25 \times 110 \times \pi_2(4) - 78.05 \times 0.5 \times \pi_{A_{2,1}} = 0$$
$$(68.75 \times 0.5 + 0.5 \times 15) \times \pi_{A_{1,1}} + (78.05 \times 0.5 + 0.5 \times 20) \times \pi_{A_{2,1}} - 77.625 \times \pi_{A_{1,0}} = 0$$

$$0.25 \times 30 \times \pi_2(1) + 0.25 \times 120 \times \pi_2(2) - 69.37 \times 0.5 \times \pi_{A_{1,1}} = 0$$
$$0.25 \times 40 \times \pi_2(3) + 0.25 \times 140 \times \pi_2(4) - 78.88 \times 0.5 \times \pi_{A_{2,1}} = 0$$
$$(69.37 \times 0.5 + 0.5 \times 8) \times \pi_{A_{1,1}} + (78.88 \times 0.5 + 0.5 \times 15) \times \pi_{A_{2,1}} - 73.2 \times \pi_{A_{1,0}} = 0$$

$$0.25 \times 38 \times \pi_2(1) + 0.25 \times 112 \times \pi_2(2) - 69.12 \times 0.5 \times \pi_{A_{1,1}} = 0$$
$$0.25 \times 42 \times \pi_2(3) + 0.25 \times 130 \times \pi_2(4) - 75.27 \times 0.5 \times \pi_{A_{2,1}} = 0$$
$$(69.12 \times 0.5 + 0.5 \times 5) \times \pi_{A_{1,1}} + (75.27 \times 0.5 + 0.5 \times 8) \times \pi_{A_{2,1}} - 67.125 \times \pi_{A_{1,0}} = 0$$

It can be verified that this system, obviously, is 
solvable and returns the same state-price defla¬ 
tors as in the previous example. 
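
The two directions of this example (prices from a given deflator, and the deflator from given prices) can be reproduced numerically. The following is a minimal sketch, assuming NumPy; the function name `prices_from_deflator`, the ordering of the unknowns, and the use of a singular value decomposition to find the null vector of the homogeneous system are illustrative choices rather than the method of the text. Prices are kept at full precision instead of being rounded to two decimals, so the homogeneous system is exactly consistent.

```python
import numpy as np

p = np.full(4, 0.25)                       # real probabilities of states 1..4
# State-price deflator: rows are states, columns are dates 0, 1, 2.
pi = np.array([[1.0, 0.8, 0.70],
               [1.0, 0.8, 0.75],
               [1.0, 0.9, 0.75],
               [1.0, 0.9, 0.80]])
# Payoffs d[asset, state, date] for the three securities.
d = np.array([
    [[0, 15, 50], [0, 15, 100], [0, 20, 70], [0, 20, 110]],
    [[0, 8, 30], [0, 8, 120], [0, 15, 40], [0, 15, 140]],
    [[0, 5, 38], [0, 5, 112], [0, 8, 42], [0, 8, 130]],
], dtype=float)
events_t1 = [[0, 1], [2, 3]]               # A_{1,1} = {1+2}, A_{2,1} = {3+4}

def prices_from_deflator(d_i):
    """Return (S_A10, S_A11, S_A21) for one asset from the pricing formulas."""
    s1 = []
    for ev in events_t1:
        cond = p[ev] / p[ev].sum()         # conditional probabilities given the event
        s1.append(cond @ (pi[ev, 2] * d_i[ev, 2]) / pi[ev[0], 1])
    s0 = p @ (pi[:, 1] * d_i[:, 1] + pi[:, 2] * d_i[:, 2]) / pi[0, 0]
    return s0, s1[0], s1[1]

S = np.array([prices_from_deflator(d_i) for d_i in d])

# Homogeneous system in x = [pi2(1), pi2(2), pi2(3), pi2(4), pi_A11, pi_A21, pi_A10].
rows = []
for d_i, (s0, s11, s21) in zip(d, S):
    rows.append([p[0] * d_i[0, 2], p[1] * d_i[1, 2], 0, 0, -s11 * (p[0] + p[1]), 0, 0])
    rows.append([0, 0, p[2] * d_i[2, 2], p[3] * d_i[3, 2], 0, -s21 * (p[2] + p[3]), 0])
    rows.append([0, 0, 0, 0,
                 (p[0] + p[1]) * (s11 + d_i[0, 1]),
                 (p[2] + p[3]) * (s21 + d_i[2, 1]),
                 -s0])
A = np.array(rows)

# The null vector is the right singular vector of the smallest singular value;
# it is normalized so that pi_A10 = 1.
_, sing, vh = np.linalg.svd(A)
recovered = vh[-1] / vh[-1][-1]

print(np.round(S, 3))
print(np.round(recovered, 6))
```

The recovered vector should match the assumed deflator (0.7, 0.75, 0.75, 0.8 at date 2; 0.8 and 0.9 at date 1; 1 at date 0), and the date-0 prices should come out as 77.625, 73.2, and 67.125.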


Equivalent Martingale Measures

We now introduce the concept and properties of equivalent martingale measures. This concept has become fundamental for the technology of derivative pricing. The idea of equivalent martingale measures is the following. A martingale is a process $X_t$ such that at any time t its conditional expectation at time s, s > t, coincides with its value at time t: $X_t = E_t[X_s]$. In discrete time, a martingale is a process such that its value at any time is equal to its conditional expectation one step ahead. In our case, this principle can be expressed in a different but equivalent way by stating that prices are the discounted expected values of future payoffs. The law of iterated expectation then implies that price plus payoff processes are martingales.

In fact, assume that we can write

$$S_t = E_t\left[\sum_{j=t+1}^{T} d_j\right]$$

then the following relationship holds:

$$S_t = E_t\left[\sum_{j=t+1}^{T} d_j\right] = E_t\left[d_{t+1} + E_{t+1}\left[\sum_{j=t+2}^{T} d_j\right]\right] = E_t[d_{t+1} + S_{t+1}]$$

Given a probability space, price processes are not, in general, martingales. However it can be demonstrated that, in the absence of arbitrage, there is an artificial probability measure in which all price processes, appropriately discounted, become martingales. More precisely, we will see that in the absence of arbitrage there is an artificial probability measure Q in which the following discounted present value relationship holds:

$$S^i_t = E^Q_t\left[\sum_{j=t+1}^{T} \frac{d^i_j}{R_{t,j}}\right]$$

We can rewrite this equation explicitly as follows:

$$S^i_t = E^Q_t\left[\frac{d^i_{t+1}}{R_{t,t+1}} + \sum_{j=t+2}^{T} \frac{d^i_j}{R_{t,j}}\right]$$
$$= E^Q_t\left[\frac{d^i_{t+1}}{R_{t,t+1}} + \frac{1}{R_{t,t+1}} \sum_{j=t+2}^{T} \frac{d^i_j}{R_{t+1,j}}\right]$$
$$= E^Q_t\left[\frac{d^i_{t+1}}{R_{t,t+1}} + \frac{1}{R_{t,t+1}} E^Q_{t+1}\left[\sum_{j=t+2}^{T} \frac{d^i_j}{R_{t+1,j}}\right]\right]$$
$$= E^Q_t\left[\frac{d^i_{t+1} + S^i_{t+1}}{R_{t,t+1}}\right]$$

which shows that the discounted price plus 
payoff process is a martingale. The terms on 
the left are the price processes, the terms on 
the right are the conditional expectations un¬ 
der the probability measure Q of the payoffs 
discounted with the risk-free payoff. 

The measure Q is a mathematical construct. 
The important point is that this new probabil¬ 
ity measure can be computed either from the 
real probabilities if the state-price deflators are 
known or directly from the price and payoff 
processes. This last observation illustrates that 
the concept of arbitrage depends only on the 
structure of the price and payoff processes and 
not on the actual probabilities. As we will see 
later in this entry, equivalent martingale mea¬ 
sures greatly simplify the computation of the 
pricing of derivatives. 

Let's assume that there is short-term risk-free borrowing in the sense that there is a trading strategy able to pay for any given interval (t, s) one sure dollar at time s given that $(d_t d_{t+1} \cdots d_{s-1})^{-1}$ has been invested at time t. Equivalently, we can define for any time interval (t, s) the payoff of a dollar invested risk-free at time t as $R_{t,s} = (d_t d_{t+1} \cdots d_{s-1})$.

We now define the concept of equivalent probability measures. Given a probability measure P the probability measure Q is said to be equivalent to P if both assign probability zero to the same events. An equivalent probability measure Q is an equivalent martingale measure if all price processes discounted with $R_{t,j}$ become martingales. More precisely, Q is an equivalent martingale measure if and only if the market value of any trading strategy is a martingale:

$$\theta_t \times S_t = E^Q_t\left[\sum_{j=t+1}^{T} \frac{d^\theta_j}{R_{t,j}}\right]$$


Risk-Neutral Probabilities 

Probabilities computed according to the equivalent martingale measure Q are the risk-neutral probabilities. Risk-neutral probabilities can be explicitly computed. Here is how. Call $q_\omega$ the risk-neutral probability of state $\omega$. Let's write explicitly the relationship

$$S^i_t = E^Q_t\left[\sum_{j=t+1}^{T} \frac{d^i_j}{R_{t,j}}\right]$$

as follows:

$$S^i_{A_{kt}} = \sum_{\omega \in A_{kt}} \frac{q_\omega}{Q(A_{kt})} \left(\sum_{j=t+1}^{T} \frac{d^i_j(\omega)}{R_{t,j}}\right) = \sum_{\omega \in A_{kt}} \frac{q_\omega}{\sum_{\omega \in A_{kt}} q_\omega} \left(\sum_{j=t+1}^{T} \frac{d^i_j(\omega)}{R_{t,j}}\right)$$

The above system of equations determines the risk-neutral probabilities. In fact, we can write, for each risky asset, $M_t$ linear equations, where $M_t$ is the number of sets in the partition $I_t$, plus the normalization equation for probabilities. From the above equation, one can see that the system can be written as

$$\sum_{\omega \in A_{kt}} q_\omega \left(\sum_{j=t+1}^{T} \frac{d^i_j(\omega)}{R_{t,j}} - S^i_{A_{kt}}\right) = 0$$

$$\sum_{\omega=1}^{M} q_\omega = 1$$

This system might be determined, indeterminate, or impossible. The system will be impossible if there are arbitrage opportunities. This system will be indeterminate if there is an insufficient number of securities. In this case, there will be an infinite number of equivalent martingale measures and the market will not be complete.

Now consider the relationship between risk-neutral probabilities and state-price deflators. Consider a probability measure P and a nonnegative random variable Y with expected value on the entire space equal to 1. Define a new probability measure as $Q(B) = E[1_B Y]$ for any event B, where $1_B$ is the indicator function of the event B. The random variable Y is called the Radon-Nikodym derivative of Q and it is written

$$Y = \frac{dQ}{dP}$$

If Y is strictly positive, it is clear from the definition that P and Q are equivalent probability measures as they assign probability zero to the same events. Note that in the case of a finite-state probability space the new probability measure is defined on each state and is equal to

$$q_\omega = Y(\omega)p_\omega$$

Suppose $\pi_t$ is a state-price deflator. Let Q be the probability measure defined by the Radon-Nikodym derivative:

$$\xi_T = \frac{\pi_T R_{0,T}}{\pi_0}$$

The new state probabilities under Q are the following:

$$q_\omega = \frac{\pi_T(\omega)R_{0,T}}{\pi_0(\omega)} p_\omega$$

Define the density process $\xi_t$ for Q as $\xi_t = E_t[\xi_T]$. As $\xi_t = E_t[\xi_T]$ is an adapted process, we can write:

$$(E_t[\xi_T])_{A_{kt}} = \xi_{A_{kt}} = \sum_{\omega \in A_{kt}} \frac{p_\omega}{P(A_{kt})} \xi_T(\omega) = \sum_{\omega \in A_{kt}} \frac{p_\omega}{P(A_{kt})} \frac{\pi_T(\omega)R_{0,T}}{\pi_0(\omega)} = \frac{\pi_{A_{kt}}R_{0,t}}{\pi_0} \times \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} \frac{p_\omega}{P(A_{kt})} \pi_T(\omega)R_{t,T}$$

As $R_{t,s} = (d_t d_{t+1} \cdots d_{s-1})$ is the payoff at time s of one dollar invested in a risk-free asset at time t, s > t, we can then write the following equation:

$$1 = \frac{1}{\pi_t} E_t[\pi_s R_{t,s}]$$

Therefore,

$$1 = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} P(\{\omega\}|A_{kt}) \pi_s(\omega)R_{t,s} = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} \frac{p_\omega}{P(A_{kt})} \pi_s(\omega)R_{t,s}, \quad 1 \le k \le M_t$$

Substituting in the previous equation, we obtain, for each interval (t, T),

$$\xi_{A_{kt}} = (E_t[\xi_T])_{A_{kt}} = \frac{\pi_{A_{kt}}R_{0,t}}{\pi_{A_{1,0}}}$$

which we can rewrite in the usual notation as

$$\xi_t = E_t[\xi_T] = \frac{\pi_t R_{0,t}}{\pi_0}$$

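We can now state the following result. Consider any $\Im_j$-measurable variable $X_j$. This condition can be expressed equivalently stating that $X_j$ assumes constant values on each set of the partition $I_j$. Then the following relationship holds:

$$E^Q_t[X_j] = \frac{1}{\xi_t} E^P_t[\xi_j X_j]$$

To see this, consider the following demonstration, which hinges on the fact that $X_j$ assumes a constant value on each $A_{hj}$ and, therefore, can be taken out of sums. In addition, as demonstrated above, from

$$1 = \frac{1}{\pi_t} E_t[\pi_s R_{t,s}]$$

the following relationship holds:

$$P(A_{kt})\pi_{A_{kt}} = \sum_{\omega \in A_{kt}} p_\omega \pi_s(\omega)R_{t,s}, \quad 1 \le k \le M_t$$

We can then write:

$$(E^Q_t[X_j])_{A_{kt}} = \sum_{\omega \in A_{kt}} \frac{q_\omega}{Q(A_{kt})} X_j(\omega) = \frac{1}{Q(A_{kt})} \sum_{\omega \in A_{kt}} \frac{p_\omega \pi_T(\omega)R_{0,T}}{\pi_0(\omega)} X_j(\omega)$$

$$= \frac{1}{Q(A_{kt})} \sum_{A_{hj} \subset A_{kt}} X_{A_{hj}} \sum_{\omega \in A_{hj}} \frac{p_\omega \pi_T(\omega)R_{0,T}}{\pi_0(\omega)} = \frac{1}{Q(A_{kt})} \sum_{A_{hj} \subset A_{kt}} X_{A_{hj}} \frac{R_{0,j}\pi_{A_{hj}}P(A_{hj})}{\pi_0}$$

$$= \frac{1}{Q(A_{kt})} \sum_{A_{hj} \subset A_{kt}} \xi_{A_{hj}} X_{A_{hj}} P(A_{hj}) = \frac{1}{\xi_{A_{kt}}} \sum_{A_{hj} \subset A_{kt}} \xi_{A_{hj}} X_{A_{hj}} \frac{P(A_{hj})}{P(A_{kt})} = \frac{1}{\xi_{A_{kt}}} (E^P_t[\xi_j X_j])_{A_{kt}}$$

where the last steps use $Q(A_{kt}) = \xi_{A_{kt}}P(A_{kt})$.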

Let's now apply the above result to the relationship:

$$S^i_t = \frac{1}{\pi_t} E_t\left[\sum_{j=t+1}^{T} \pi_j d^i_j\right] = \frac{\pi_0}{\pi_t R_{0,t}} E_t\left[\sum_{j=t+1}^{T} \frac{\pi_j R_{0,j}}{\pi_0} \frac{d^i_j}{R_{t,j}}\right] = \frac{1}{\xi_t} E_t\left[\sum_{j=t+1}^{T} \xi_j \frac{d^i_j}{R_{t,j}}\right] = E^Q_t\left[\sum_{j=t+1}^{T} \frac{d^i_j}{R_{t,j}}\right]$$

We have thus demonstrated the following results: There is no arbitrage if and only if there is an equivalent martingale measure. In addition, $\pi_t$ is a state-price deflator if and only if an equivalent martingale measure Q has the density process defined by

$$\xi_t = \frac{\pi_t R_{0,t}}{\pi_0}$$

In addition, it can be demonstrated that, if 
there is no arbitrage, markets are complete if 
and only if there is a unique equivalent martin¬ 
gale measure. 
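
The link between the state-price deflator and the risk-neutral probabilities can be checked on the numerical example used earlier (four equally likely states, deflator values 0.8/0.9 at date 1 and 0.7/0.75/0.75/0.8 at date 2). The sketch below assumes NumPy; the variable names are illustrative, and the one-period risk-free returns are not given in the text but are derived here from the condition $1 = E_t[\pi_{t+1}R_{t,t+1}]/\pi_t$. It builds $q_\omega = \pi_T(\omega)R_{0,T}p_\omega/\pi_0$ and verifies that discounting the payoffs of asset 1 under Q reproduces its date-0 price.

```python
import numpy as np

p = np.full(4, 0.25)                       # real probabilities
pi = np.array([[1.0, 0.8, 0.70],           # deflator: rows are states, columns are dates
               [1.0, 0.8, 0.75],
               [1.0, 0.9, 0.75],
               [1.0, 0.9, 0.80]])
events_t1 = [[0, 1], [2, 3]]               # A_{1,1} and A_{2,1}

# One-period risk-free returns implied by 1 = E_t[pi_{t+1} R_{t,t+1}] / pi_t.
R01 = pi[0, 0] / (p @ pi[:, 1])            # known at date 0
R12 = np.empty(4)                          # known at date 1, constant on each event
for ev in events_t1:
    cond = p[ev] / p[ev].sum()
    R12[ev] = pi[ev[0], 1] / (cond @ pi[ev, 2])
R02 = R01 * R12

# Radon-Nikodym derivative and risk-neutral state probabilities.
xi_T = pi[:, 2] * R02 / pi[:, 0]
q = xi_T * p

# Asset 1 payoffs at dates 1 and 2, state by state.
d1 = np.array([15.0, 15.0, 20.0, 20.0])
d2 = np.array([50.0, 100.0, 70.0, 110.0])

price_Q = q @ (d1 / R01 + d2 / R02)                 # discounted expectation under Q
price_P = p @ (pi[:, 1] * d1 + pi[:, 2] * d2)       # deflator formula under P (pi_0 = 1)

print(np.round(q, 5), q.sum())
print(price_Q, price_P)
```

Both pricing routes should give 77.625 for asset 1, and the probabilities q should be strictly positive and sum to one, so Q is equivalent to P in this example.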

To illustrate the above we now proceed to detail the calculations for the previous example of three assets, three dates, and four states. Let's first write the equations for the risk-free asset:

$$1 = \frac{1}{\pi_{A_{kt}}} \sum_{\omega \in A_{kt}} \frac{p_\omega}{P(A_{kt})} \pi_s(\omega)R_{t,s}$$

$$1 = \frac{1}{\pi_{A_{1,1}}}\left[\frac{p_1}{p_1 + p_2}\pi_2(1)R_{1,2} + \frac{p_2}{p_1 + p_2}\pi_2(2)R_{1,2}\right]$$

$$1 = \frac{1}{\pi_{A_{2,1}}}\left[\frac{p_3}{p_3 + p_4}\pi_2(3)R_{1,2} + \frac{p_4}{p_3 + p_4}\pi_2(4)R_{1,2}\right]$$

$$1 = \frac{1}{\pi_{A_{1,0}}}\left[p_1\pi_2(1)R_{0,2} + p_2\pi_2(2)R_{0,2} + p_3\pi_2(3)R_{0,2} + p_4\pi_2(4)R_{0,2}\right]$$

$$\pi_{A_{1,1}} = \pi_1(1) = \pi_1(2)$$
$$\pi_{A_{2,1}} = \pi_1(3) = \pi_1(4)$$
$$\pi_{A_{1,0}} = \pi_0(1) = \pi_0(2) = \pi_0(3) = \pi_0(4)$$

We can now rewrite the pricing relationships for the other risky assets as follows:

At date 2, prices are zero: $S^i_2 = 0$.

At date 1, the relationship

$$S^i_1 = E^Q_1\left[\frac{d^i_2}{R_{1,2}}\right]$$

holds. In fact, we can write the following:

$$S^i_{A_{1,1}} = S^i_1(1) = S^i_1(2) = \frac{1}{\pi_{A_{1,1}}}\left[P(A_{1,2}|A_{1,1})\pi_2(1)d^i_2(1) + P(A_{2,2}|A_{1,1})\pi_2(2)d^i_2(2)\right]$$

$$= \left[\frac{p_1}{p_1 + p_2}\frac{\pi_2(1)R_{1,2}}{\pi_{A_{1,1}}}\frac{d^i_2(1)}{R_{1,2}} + \frac{p_2}{p_1 + p_2}\frac{\pi_2(2)R_{1,2}}{\pi_{A_{1,1}}}\frac{d^i_2(2)}{R_{1,2}}\right]$$

$$= Q(A_{1,2}|A_{1,1})\frac{d^i_2(1)}{R_{1,2}} + Q(A_{2,2}|A_{1,1})\frac{d^i_2(2)}{R_{1,2}} = \frac{q_1}{q_1 + q_2}\frac{d^i_2(1)}{R_{1,2}} + \frac{q_2}{q_1 + q_2}\frac{d^i_2(2)}{R_{1,2}}$$

$$S^i_{A_{2,1}} = S^i_1(3) = S^i_1(4) = Q(A_{3,2}|A_{2,1})\frac{d^i_2(3)}{R_{1,2}} + Q(A_{4,2}|A_{2,1})\frac{d^i_2(4)}{R_{1,2}} = \frac{q_3}{q_3 + q_4}\frac{d^i_2(3)}{R_{1,2}} + \frac{q_4}{q_3 + q_4}\frac{d^i_2(4)}{R_{1,2}}$$

At date 0, the relationship

$$S^i_0 = E^Q_0\left[\frac{d^i_1}{R_{0,1}} + \frac{d^i_2}{R_{0,2}}\right]$$

holds. In fact we can write the following:

$$S^i_{A_{1,0}} = S^i_0(1) = S^i_0(2) = S^i_0(3) = S^i_0(4)$$

$$= \frac{1}{\pi_{A_{1,0}}}\left\{p_1[\pi_1(1)d^i_1(1) + \pi_2(1)d^i_2(1)] + p_2[\pi_1(2)d^i_1(2) + \pi_2(2)d^i_2(2)] + p_3[\pi_1(3)d^i_1(3) + \pi_2(3)d^i_2(3)] + p_4[\pi_1(4)d^i_1(4) + \pi_2(4)d^i_2(4)]\right\}$$

$$= \sum_{\omega=1}^{4} p_\omega\left[\frac{\pi_1(\omega)R_{0,1}}{\pi_{A_{1,0}}}\frac{d^i_1(\omega)}{R_{0,1}} + \frac{\pi_2(\omega)R_{0,2}}{\pi_{A_{1,0}}}\frac{d^i_2(\omega)}{R_{0,2}}\right]$$

The value of a derivative instrument might depend on the path of its past values. Consider a lookback option on a stock, that is, a derivative instrument on a stock whose payoff at time t is the maximum difference between the price of the stock and a given value K at any moment prior to t. Call $V_t$ the payoff of the lookback option at time t. We can then write:

$$V_t = \max_{0 \le k < t}(S_k - K)^+$$

where $(S_k - K)^+ = \max(S_k - K, 0)$.
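
The path dependence of a lookback payoff can be made concrete with a short function. This is a minimal sketch under stated assumptions: the function name and the sample price path are hypothetical, and the maximum is taken over dates strictly before t, following the wording "at any moment prior to t".

```python
def lookback_payoff(prices, K, t):
    """Payoff V_t: the largest value of (S_k - K)^+ over dates k = 0, ..., t - 1.

    `prices` lists S_0, S_1, ... along one path; t must be at least 1.
    """
    if t < 1 or t > len(prices):
        raise ValueError("t must satisfy 1 <= t <= len(prices)")
    return max(max(prices[k] - K, 0.0) for k in range(t))

# Hypothetical price path S_0..S_3 and strike.
path = [100.0, 105.0, 98.0, 110.0]
payoffs = [lookback_payoff(path, 100.0, t) for t in range(1, 5)]
print(payoffs)
```

Two paths that end at the same price can produce different payoffs, because the payoff depends on the highest level reached along the way and not only on the last observed price.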


THE BINOMIAL MODEL 

Let's now introduce the simple but important 
multiperiod finite-state model known as the 
binomial model. The binomial model is impor¬ 
tant because it gives a simple and mathemat¬ 
ically tractable model of stock price behavior 
that tends, in the limit of a zero time step, to a 
Brownian motion. 1 We introduce a market pop¬ 
ulated by one risk-free asset and by one or more 
risky assets whose price(s) follow(s) a binomial 
or trinomial model. In the next section we will 
see how to compute the price of derivative in¬ 
struments in this market. 


In the binomial model of stock prices, we as¬ 
sume that at each time step the stock price will 
assume one of two possible values. This is a re¬ 
striction of the general multiperiod finite-state 
model described in the previous sections on 
probability theory. The latter is, as we have seen 
in the previous section, a hierarchical structure 
of partitions of the set of states. The number of 
sets in any partition is arbitrary, provided that 
partitions grow more refined with time. 

The binomial model assumes that there are two positive numbers, d and u, such that 0 < d < u and such that at each time step the price $S_t$ of the risky asset changes to $dS_t$ or to $uS_t$. In general one assumes that 0 < d < 1 < u so that d represents a price decrease (a movement down) while u represents a price increase (a movement up). It is often required that

$$d = \frac{1}{u}$$

In this case an equal number of movements up and down leave prices unchanged. The binomial model is a Markov model as the distribution of $S_t$ clearly depends only on the value of $S_{t-1}$.

A binomial model can be graphically represented by a tree. For example,
Figure 2 shows a binomial model for three periods. A binomial model over
$T$ time steps, from 0 to $T$, produces a total of $2^T$ paths.
Therefore, the corresponding space of states has $2^T$ states. However,
the number of different final prices $S_T = u^k d^{T-k} S_0$,
$k = 0, 1, \ldots, T$ is determined solely by the number of $u$ and $d$
in each path and increases by 1 at each time step; there are as many
final prices as dates. For example, the model in Figure 2 shows three
final prices and four states.

Figure 2 Binomial Model: Illustration of a Binomial Tree with Three
Dates, Three Final Prices, and Four States: uu, ud, du, dd
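The distinction between states (paths) and final prices can be checked by brute-force enumeration. The sketch below is only illustrative: the function name `final_prices` and the numerical values of $u$, $d$, and $S_0$ are assumptions, with two time steps chosen to mirror the three dates of Figure 2.

```python
from itertools import product


def final_prices(s0, u, d, steps):
    """Map every up/down path of the tree to its final price."""
    prices = {}
    for path in product("ud", repeat=steps):
        ups = path.count("u")
        prices["".join(path)] = s0 * u**ups * d ** (steps - ups)
    return prices


# Hypothetical parameters; d = 1/u so that an up and a down move cancel.
tree = final_prices(s0=100.0, u=1.25, d=0.8, steps=2)
n_states = len(tree)
# Round before de-duplicating so floating-point noise does not split prices.
distinct_prices = sorted({round(price, 10) for price in tree.values()})

print(n_states)              # number of paths (states)
print(len(distinct_prices))  # number of distinct final prices
```

With two steps the enumeration yields four paths but only three distinct final prices, because the paths ud and du end at the same price.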

Note that there is a simple relationship between the numbers $d$ and $u$
and returns. In fact, we can write:

$$ R_t(\text{up}) = \frac{S_{t+1} - S_t}{S_t} = \frac{uS_t - S_t}{S_t} = u - 1 $$

$$ R_t(\text{down}) = d - 1 $$

Real probabilities of states are typically constructed from the
probabilities of a movement up or down. Call $p$ the probability of a
movement up; $1 - p$ is thus the probability of a movement down. Suppose
that the state $s$, which is identified by a price path, has $k$
movements up and $T - k$ movements down. The probability of the state
$s$ is

$$ p_s = p^k (1 - p)^{T-k} $$

Consider the final date $T$. Each of the possible final prices
$S_T = u^k d^{T-k} S_0$, $k = 0, 1, \ldots, T$ can be obtained through

$$ \binom{T}{k} = \frac{T!}{k!\,(T-k)!} $$

paths with $k$ movements up and $T - k$ movements down. The probability
distribution of final prices is therefore a binomial distribution:

$$ P(S_T = u^k d^{T-k} S_0) = \binom{T}{k} p^k (1 - p)^{T-k} $$

Following the same reasoning, one can demonstrate that at any
intermediate date the probability distribution of prices is a binomial
distribution as follows:

$$ P(S_t = u^k d^{t-k} S_0) = \binom{t}{k} p^k (1 - p)^{t-k} $$

Next introduce a risk-free security. In the setting of a binomial model,
a risk-free security is simply a security such that $d = u = 1 + r$
where $r > 0$ is the positive risk-free rate. To avoid arbitrage it is
clearly necessary that $d < 1 + r < u$. In fact, if the interest rate is
lower than both the up and down returns, one can make a sure profit by
buying the risky asset and shorting the risk-free asset. If the interest
rate is higher than both the up and down returns, one can make a sure
profit by shorting the risky asset and buying the risk-free asset.
Denote by $b_t$ the price of the risk-free asset at time $t$. From the
definition of price movement in the binomial model we can write:
$b_t = (1 + r)^t b_0$.


Risk-Neutral Probabilities for the Binomial Model

Let's now compute the risk-neutral probabilities. In the setting of
binomial models, the computation of risk-neutral probabilities is
simple. In fact we have to impose the condition:

$$ S_t = E_t^Q\left[\frac{S_{t+1}}{1 + r}\right] $$

which we can explicitly write as follows:

$$ S_t = \frac{q u S_t + (1 - q) d S_t}{1 + r} $$

$$ 1 + r = qu + d - qd $$

$$ q = \frac{1 + r - d}{u - d}, \qquad 1 - q = \frac{u - 1 - r}{u - d} $$

As we have assumed $0 < d < 1 + r < u$, the condition $0 < q < 1$ holds.
Therefore we can state that the unique risk-neutral probabilities are

$$ q = \frac{1 + r - d}{u - d}, \qquad 1 - q = \frac{u - 1 - r}{u - d} $$

The binomial model is complete and arbitrage free.
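The risk-neutral probability and the binomial distribution of final prices can be combined in a short numerical check. The sketch below is a minimal illustration under assumed parameter values; the function names `risk_neutral_prob` and `final_price_distribution` are hypothetical. It verifies that the probabilities sum to one and that, under $q$, the discounted expected final price equals the initial price, which is the martingale condition.

```python
from math import comb


def risk_neutral_prob(u, d, r):
    """q = (1 + r - d) / (u - d), valid only when 0 < d < 1 + r < u."""
    if not 0 < d < 1 + r < u:
        raise ValueError("no-arbitrage requires 0 < d < 1 + r < u")
    return (1 + r - d) / (u - d)


def final_price_distribution(s0, u, d, prob_up, steps):
    """List of (price, probability) pairs for k = 0..steps up moves."""
    return [
        (
            s0 * u**k * d ** (steps - k),
            comb(steps, k) * prob_up**k * (1 - prob_up) ** (steps - k),
        )
        for k in range(steps + 1)
    ]


# Hypothetical market parameters.
u, d, r, s0, steps = 1.25, 0.8, 0.05, 100.0, 3
q = risk_neutral_prob(u, d, r)
dist = final_price_distribution(s0, u, d, q, steps)

total_prob = sum(prob for _, prob in dist)
discounted_mean = sum(price * prob for price, prob in dist) / (1 + r) ** steps
print(q, total_prob, discounted_mean)
```

Replacing `q` with a real probability $p \neq q$ generally breaks the last equality, which is why pricing uses the risk-neutral measure rather than the real one.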

Suppose that there is more than one risky asset, for example two risky
assets, in addition to the risk-free asset. At each time step each risky
asset can go either up or down. Therefore there are four possible joint
movements at each time step: uu, ud, du, dd that we identify with the
states 1, 2, 3, 4. Four probabilities must be determined at each time
step; four equations are therefore needed. Two equations are provided by
the martingale conditions:

$$ S_t^1 = \frac{q_1 u S_t^1 + q_2 u S_t^1 + q_3 d S_t^1 + q_4 d S_t^1}{1 + r} $$

$$ S_t^2 = \frac{q_1 u S_t^2 + q_3 u S_t^2 + q_2 d S_t^2 + q_4 d S_t^2}{1 + r} $$

A third equation is provided by the fact that probabilities must sum to
1. The fourth condition, however, is missing. The model is incomplete.
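The missing fourth equation can be made concrete with a numerical sketch. Assuming, for illustration only, that both risky assets share the same factors $u$ and $d$, the family of joint probabilities below is indexed by a free parameter `c`; every member satisfies the two martingale conditions and the normalization. The function names and parameter values are hypothetical.

```python
def joint_measure(q, c):
    """One-parameter family of probabilities for the states uu, ud, du, dd."""
    return [q * q + c, q * (1 - q) - c, q * (1 - q) - c, (1 - q) * (1 - q) + c]


def satisfies_conditions(probs, u, d, r, tol=1e-12):
    """Check both martingale conditions, normalization, and positivity."""
    q1, q2, q3, q4 = probs
    growth = 1 + r
    asset1 = (q1 + q2) * u + (q3 + q4) * d  # asset 1 moves up in uu and ud
    asset2 = (q1 + q3) * u + (q2 + q4) * d  # asset 2 moves up in uu and du
    return (
        abs(asset1 - growth) < tol
        and abs(asset2 - growth) < tol
        and abs(sum(probs) - 1) < tol
        and all(p > 0 for p in probs)
    )


u, d, r = 1.25, 0.8, 0.05  # hypothetical parameters
q = (1 + r - d) / (u - d)
candidates = [joint_measure(q, c) for c in (-0.1, 0.0, 0.1)]
all_valid = all(satisfies_conditions(probs, u, d, r) for probs in candidates)
print(all_valid)
```

Since several distinct risk-neutral measures pass the check, the measure is not unique, which is the defining symptom of an incomplete market.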

The problem of approximating price processes when there are two stocks
and one bond and where the stock prices follow two correlated lognormal
processes has long been of interest to financial economists. As seen
above, with two stocks and one bond available for trading, markets
cannot be completed by dynamic trading. This is not the case in the
continuous-time model, in which markets can be completed by continuous
trading in the two stocks and the bond. Different solutions to this
problem have been proposed in the literature (see note 2).


ARBITRAGE PRICING IN A DISCRETE-TIME, CONTINUOUS-STATE SETTING

Let's now discuss the discrete-time, continuous-state setting. This is
an important setting as it is, for example, the setting of the arbitrage
pricing theory (APT) model (see note 3).

As in the previous discrete-time, discrete-state setting, we apply
probabilistic concepts. The economy is represented by a probability
space $(\Omega, \Im, P)$ where $\Omega$ is the set of possible states,
$\Im$ is the $\sigma$-algebra of events (formed, in this
continuous-state setting, by a nondenumerable number of events), and $P$
is a probability function. As the number of states is infinite, the
probability of each state is zero and only events formed, in general, by
nondenumerable states have a finite probability. There are only a finite
number of dates from 0 to $T$. The propagation of information is
represented by a finite filtration $\Im_t$, $t = 0, 1, \ldots, T$. In
this case, the filtration $\Im_t$ is not equivalent to an information
structure $I_t$.

Each security $i$ is characterized by a payoff process $d_t^i$ and by a
price process $S_t^i$. In this continuous-state setting, $d_t^i$ and
$S_t^i$ are formed by a finite number of continuous variables. As
before, $d_t^i(\omega)$ and $S_t^i(\omega)$ are, respectively, the
payoff and the price of the $i$-th asset at time $t$, $0 \le t \le T$
and in state $\omega \in \Omega$. All payoffs and prices are stochastic
processes adapted to the filtration $\Im$.

To develop an intuition for continuous-state arbitrage pricing, consider
the previous multiperiod, finite-state case with a very large number $M$
of states, $M \gg N$ where $N$ is the number of securities. Recall from
our earlier discussion that risk-neutral probabilities can be computed
solving the following system of linear equations:

$$ S_t^i = E_t^Q\left[\sum_{j=t+1}^{T} \frac{d_j^i(\omega)}{R_{t,j}}\right], \qquad \sum_{\omega=1}^{M} q_\omega = 1 $$

Recall also that at each date $t$ the information structure $I_t$
partitions the set of states into $M_t$ subsets. Each partition
therefore yields $N \times M_t$ equations and the system is formed by a
total of

$$ N \times \sum_{t=0}^{T-1} M_t $$

equations plus the probability normalizing equation. Consider that the
previous system can be broken down, at each date $t$, into separate
blocks formed by $N$ equations (one for each asset) of the following
type:

$$ \sum_{\omega \in A_{kt}} q_\omega \sum_{j=t+1}^{T} \frac{d_j^i(\omega)}{R_{t,j}} = S_t^i \sum_{\omega \in A_{kt}} q_\omega $$

Each of these systems can be solved individually for the conditional
probabilities of the states in $A_{kt}$. Recall that a system of this
type admits a solution if and only if the coefficient matrix and the
augmented coefficient matrix have the same rank. If the system is
solvable, its solution will be unique if and only if the number of
unknowns is equal to the rank of the coefficient matrix.

If the above system is not solvable, then there are arbitrage
opportunities. This occurs if the payoffs of an asset are a linear
combination of those of other assets, but its price is not the same
linear combination of the prices of the other assets. This happens, in
particular, if two assets have the same payoff in each state but
different prices. In these cases, in fact, the rank of the coefficient
matrix is lower than the rank of the augmented matrix.

Under the assumption

$$ M \gg N \times \sum_{t=0}^{T-1} M_t $$

this system, if it is solvable, will be underdetermined. Therefore,
there will be infinitely many equivalent risk-neutral probabilities and
the market will not be complete. Going to the limit of an infinite
number of states, the above reasoning proves, heuristically, that a
discrete-time continuous-state market with a finite number of securities
is inherently incomplete. In addition, there will be arbitrage
opportunities only if the random variable that represents the payoff of
an asset is a linear combination of the random variables that represent
the payoffs of other assets, but the random variables that represent
prices are not in the same relationship.

The above discussion can be illustrated in the case of multiple assets,
each following a binomial model. If there are $N$ linearly independent
assets, the price paths in the interval $(0, T)$ will form a total of
$2^{NT}$ states. In a binomial model, we can limit our considerations to
one time step as the other steps are identical. In one step, each price
$S_t^i$ at time $t$ can go up to $S_t^i u^i$ or down to $S_t^i d^i$ at
time $t + 1$. Given the prices $\{S_t^i\} = \{S_t^1, S_t^2, \ldots,
S_t^N\}$ at time $t$, there will be, at the next time step, $2^N$
possible combinations $\{S_t^1 w^1, S_t^2 w^2, \ldots, S_t^N w^N\}$,
$w^i = u^i$ or $d^i$.

Suppose that there are $2^N$ states and that each combination of prices
identifies a state. This means that at each date $t$ the information
structure $I_t$ partitions the set of states into $2^{Nt}$ subsets. Each
set of the partition is partitioned into $2^N$ subsets at the next time
step. This yields $2^{N(t+1)}$ subsets at time $t + 1$.

Note that this partitioning is compatible with any correlation structure
between the random variables that represent prices. In fact,
correlations depend on the value of the probability assigned to each
state while the partitioning we assume depends on how different prices
are assigned to different states.

Risk-neutral probabilities $q_j$, $j = 1, 2, \ldots, 2^N$ can be
determined solving the following system of martingale conditions:

$$ \sum_{j=1}^{2^N} q_j S_t^i w^i(j) = S_t^i, \qquad \sum_{j=1}^{2^N} q_j = 1 $$

$$ j = 1, 2, \ldots, 2^N, \quad i = 1, 2, \ldots, N $$

which becomes, after dividing each equation by $S_t^i$, the following:

$$ \sum_{j=1}^{2^N} q_j w^i(j) = 1, \qquad \sum_{j=1}^{2^N} q_j = 1 $$

where $w^i(j) = u^i$ or $d^i$ for asset $i$ in state $j$.

It can be verified that, under the previous assumptions and provided
prices are positive, the above system admits infinitely many solutions.
In fact, as $N + 1 < 2^N$ (for $N \ge 2$), the number of equations is
smaller than the number of unknowns. Therefore, if the system is
solvable it admits infinitely many solutions. To verify that the system
is indeed solvable, let's choose the first asset and partition the set
of states into two events corresponding to the movement up or down of
the same asset. Assign to these events probabilities as in the binomial
model (the system above is written without the discount factor, so that
$1 + r$ is replaced by 1):

$$ q^1 = \frac{1 - d^1}{u^1 - d^1} $$


Choose a second asset and partition each of the previous events into two
events corresponding to the movements up or down of the second asset. We
can now assign the following probabilities to each of the following four
events:

$$ q^1 q^2, \quad q^1 (1 - q^2), \quad (1 - q^1) q^2, \quad (1 - q^1)(1 - q^2) $$

It can be verified that these numbers sum to one. The same process can
be repeated for each additional asset. We obtain a set of positive
numbers that sum to one and that satisfy the system by construction.
There are infinitely many other possible constructions. In fact, at each
step, we could multiply probabilities by "correlation factors" (i.e.,
numbers that form a 2 × 2 correlation matrix) and still obtain solutions
to the system.
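The asset-by-asset construction can be sketched in a few lines of Python. The sketch assumes, as in the system above, that the martingale condition for each asset reads $\sum_j q_j w^i(j) = 1$ (no discount factor), so that the marginal up-probability of asset $i$ is $(1 - d^i)/(u^i - d^i)$. The function name `product_measure` and the numerical factors are hypothetical.

```python
from itertools import product
from math import prod


def product_measure(ups, downs):
    """Joint probabilities obtained by splitting events one asset at a time."""
    marginals = [(1 - d) / (u - d) for u, d in zip(ups, downs)]
    states = list(product((True, False), repeat=len(ups)))  # True = up move
    probs = [
        prod(q if up else 1 - q for q, up in zip(marginals, state))
        for state in states
    ]
    return states, probs


# Hypothetical up and down factors for three assets, each with d < 1 < u.
ups, downs = [1.2, 1.1, 1.5], [0.9, 0.95, 0.7]
states, probs = product_measure(ups, downs)

total = sum(probs)
expected_factors = [
    sum(p * (ups[i] if state[i] else downs[i]) for state, p in zip(states, probs))
    for i in range(len(ups))
]
print(total, expected_factors)
```

The product form corresponds to independent moves; it is only one of the infinitely many solutions mentioned in the text.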

We can therefore conclude that a system of positive binomial prices such
as the one above plus a risk-free asset is arbitrage-free and forms an
incomplete market. If we let the number of states tend to infinity, the
binomial distribution converges to a normal distribution. We have
therefore demonstrated heuristically that a multivariate normal
distribution plus a risk-free asset forms an incomplete and
arbitrage-free market. Note that the presence of correlations does not
change this conclusion.

Let's now see under what conditions this conclusion can be changed. Go
back to the multiple binomial model, assuming, as before, that there are
$N$ assets and $T$ time steps. There is no logical reason to impose that
the number of states be $2^{NT}$. As we can consider each time step
separately, suppose that there is only one time step and that there are
a number of states less than or equal to the number of assets plus 1:
$M \le N + 1$. In this case, the martingale condition that determines
risk-neutral probabilities becomes:

$$ \sum_{j=1}^{M} q_j w^i(j) = 1, \qquad \sum_{j=1}^{M} q_j = 1 $$

There are $N + 1$ equations and $M$ unknowns with $M \le N + 1$. This
system will either determine unique risk-neutral probabilities or will
be unsolvable. Therefore, the market will be either complete and
arbitrage-free or will exhibit arbitrage opportunities. Note that in
this case we cannot use the constructive procedure used in the previous
case.

What is the economic meaning of the condition that the number of states
be less than or equal to the number of assets? To illustrate this point,
assume that the number of states is $M = 2^K \le N + 1$. This means that
we can choose $K$ assets whose independent price processes identify all
the states as in the previous case. Now add one more asset. This asset
will go up or down not in specific states but in events formed by a
number of states. Suppose it goes up in the event $A$ and goes down in
the event $B$. These events are determined by the value of the first $K$
assets. In other words, the new asset will be a function of the first
$K$ assets. An interesting case is when the new asset can be expressed
as a linear function of the first $K$ assets. We can then say that the
first $K$ assets are factors and that any other asset is expressed as a
linear combination of the factors.

Consider that, given the first $K$ assets, it is possible to determine
state-price deflators. These state-price deflators will not be uniquely
determined. Any other price process must be expressed as a linear
combination of state-price deflators to avoid arbitrage. If all price
processes are arbitrage-free, the market will be complete if it is
possible to determine uniquely the risk-neutral probabilities.

If we let the number of states become very large, the number of assets
must become large as well. Therefore it is not easy to develop simple
heuristic arguments in the limit of a large economy. What we can say is
that in a large discrete economy where the number of states is less than
or equal to the number of assets, if there are no arbitrage
opportunities the market might be complete. If the market is complete
and arbitrage-free, there will be a number of factors while all other
processes will be linear combinations of these factors.

KEY POINTS 

• The law of one price states that a given asset 
must have the same price regardless of the 
means by which one goes about creating that 
asset. 

• Arbitrage is the simultaneous buying and 
selling of an asset at two different prices in 
two different markets. 

• A finite-state one-period market is represented by a vector of prices
and a matrix of payoffs.

• A state-price vector is a strictly positive vector such that prices
are the product of the state-price vector and the payoff matrix.

• There is no arbitrage if and only if there is a state-price vector.

• A market is complete if an arbitrary payoff can be replicated by a
portfolio.

• A finite-state one-period market is complete if there are as many
linearly independent assets as states.

• A multiperiod finite-state economy is represented by a probability
space plus an information structure.

• In a multiperiod finite-state market each security is represented by a
payoff process and a price process.

• An arbitrage is a trading strategy whose payoff process is nonnegative
and not always zero.

• A market is complete if any nonnegative payoff process can be
replicated with a trading strategy.

• A state-price deflator is a strictly positive process such that prices
are random variables equal to the conditional expectation of discounted
payoffs.

• A martingale is a process such that at any time $t$ its conditional
expectation at time $s$, $s > t$ coincides with its present value.

• In the absence of arbitrage there is an artificial probability measure
in which all price processes, appropriately discounted, become
martingales.

• Given a probability measure $P$, the probability measure $Q$ is said
to be equivalent to $P$ if both assign probability zero to the same
events.

• The binomial model assumes that there are two positive numbers, $d$
and $u$, such that $0 < d < u$ and such that at each time step the price
$S$ of the risky asset changes to $dS$ or to $uS$.

• The distribution of prices of a binomial model 
is a binomial distribution. 

• The binomial model is complete. 


NOTES 

1. The binomial model was first suggested for 
the pricing of options by Cox, Ross, and 
Rubinstein (1979), Rendleman and Bartter 
(1979), and Sharpe (1978). 

2. See He (1990). 

3. For an application of the principles discussed 
here to the APT, see Focardi and Fabozzi 
(2004). 
Abstract: The principle of absence of arbitrage is perhaps the most fundamental principle of finance
theory. In the presence of arbitrage opportunities, there is no trade-off between risk and returns 
because it is possible to make unbounded risk-free gains. The principle of absence of arbitrage is 
fundamental for understanding asset valuation in a competitive market. Arbitrage pricing can be 
developed in a finite-state, discrete-time setting and a continuous-time, continuous-state setting. 


In this entry, we describe arbitrage pricing in the continuous-state,
continuous-time setting. There are a number of important conceptual
changes in going from a discrete-state, discrete-time setting (as
described in the entry "Arbitrage Pricing: Finite-State Models") to a
continuous-state, continuous-time setting. First, each state of the
world has probability zero. This precludes the use of standard
conditional probabilities for the definition of conditional expectation
and requires the use of filtrations (rather than of information
structures) to describe the propagation of information. Second, the
tools of matrix algebra are inadequate; the more complex tools of
calculus and stochastic calculus are required. Third, simple
generalizations are rarely possible as many pathological cases appear in
connection with infinite sets.


THE ARBITRAGE PRINCIPLE IN CONTINUOUS TIME

Let's start with the definition of basic concepts. The economy is
represented by a probability space $(\Omega, \Im, P)$ where $\Omega$ is
the set of possible states, $\Im$ is the $\sigma$-algebra of events, and
$P$ is a probability measure. Time is a continuous variable in the
interval $[0, T]$. The propagation of information is represented by a
filtration $\Im_t$. The latter is a family of $\sigma$-algebras such
that $\Im_t \subseteq \Im_s$, $t < s$.

Each security $i$ is characterized by a payoff-rate process $\delta_t^i$
and by a price process $S_t^i$. In this continuous-state setting,
$\delta_t^i$ and $S_t^i$ are real variables with a continuous range such
that $\delta_t^i(\omega)$ and $S_t^i(\omega)$ are, respectively, the
payoff-rate and the price of the $i$-th asset at time $t$,
$0 \le t \le T$ and in state $\omega \in \Omega$. Note that $\delta_t^i$
represents a rate of payoff and not a payoff as was the case in the
discrete-time setting. The payoff-rate process must be interpreted in
the sense that the cumulative payoff of each individual asset is

$$ D_t^i = \int_0^t \delta_s^i \, ds $$


We assume that the number of assets is finite. We can therefore use the
vector notation to indicate a set of processes. For example, we write
$\delta_t$ and $S_t$ to indicate the vector process of payoff rates and
prices respectively. All payoff-rates and prices are stochastic
processes adapted to the filtration $\Im$. One can make assumptions
about the price and the payoff-rate processes. For example, it can be
assumed that price and payoff-rate processes satisfy a set of stochastic
differential equations or that they exhibit finite jumps. Later in this
entry we will explore a number of these processes.

Conditional expectations are defined as partial averaging. In fact,
given a variable $X_s$, $s > t$, its conditional expectation $E_t[X_s]$
is defined as a variable that is $\Im_t$-measurable and whose average on
each set $A \in \Im_t$ is the same as that of $X$:

$$ Y_t = E_t[X_s] \iff E[Y_t(\omega)] = E[X_s(\omega)] $$

for $\omega \in A$, $\forall A \in \Im_t$ and $Y$ is $\Im_t$-measurable.

The law of iterated expectations applies as in the finite-state case:

$$ E_t[E_u(X_s)] = E_t[X_s], \quad t \le u \le s $$

In a continuous-state setting, conditional expectations are variables
that assume constant values on the sets of infinite partitions. Imagine
the evolution of a variable $X$. At the initial date, $X_0$ identifies
the entire space $\Omega$. At each subsequent date $t$, the space
$\Omega$ is partitioned into an infinite number of sets, each determined
by one of the infinite values of $X_t$ (see note 1). However, these sets
have measure zero. In fact, they are sets of the type
$\{A: \omega \in A \Leftrightarrow X_t(\omega) = x\}$ determined by
specific values of the variable $X_t$. These sets have probability zero
as there is an infinite number of values $X_t$. As a consequence, we
cannot define conditional expectation as expectation under the usual
definition of conditional probabilities the same way we did in the case
of the finite-state setting.


Trading Strategies and Trading Gains

We have to define the meaning of trading strategies in the
continuous-state, continuous-time setting; this requires the notion of
continuous trading. Mathematically, continuous trading means that the
composition of portfolios changes continuously at every instant and that
these changes are associated with trading gains or losses. A trading
strategy is a (vector-valued) process $\theta = \{\theta^i\}$ such that
$\theta_t = \{\theta_t^i\}$ is the portfolio held at time $t$. To ensure
that there is no anticipation of information, each trading strategy
$\theta$ must be an adapted process.

Given a trading strategy, we have to define 
the gains or losses associated with it. In discrete 
time, the trading gains equal the sum of payoffs 
plus the change of a portfolio's value 



E S o*o 


over a finite interval [0, T]. 

We must define trading gains when time is a continuous variable. It is
not possible to replace finite sums of stochastic increments with
pathwise Riemann-Stieltjes integrals after letting the time interval go
to zero. The reason is that, though we can assume that paths are
continuous, we cannot assume that they have bounded variation. As a
consequence, pathwise Riemann-Stieltjes integrals generally do not
exist. However, we can assume that paths are of bounded quadratic
variation. Under this latter assumption, using Ito isometry, we can
define pathwise Ito integrals and stochastic integrals.

Let's first assume that the payoff-rate process is zero, so that there
are only price processes. Under this assumption, the trading gain $T_t$
of a trading strategy can be represented by a stochastic integral:

$$ T_t = \int_0^t \theta_s \, dS_s = \sum_i \int_0^t \theta_s^i \, dS_s^i $$


In the rest of this section, we will not strictly adhere to the vector
notation when there is no risk of confusion. For example, we will write
$\theta S$ to represent the scalar product $\theta \cdot S$. If a
payoff-rate process is associated with each asset, we have to add the
gains consequent to the payoff-rate process. We therefore define the
gain process

$$ G_t^i = S_t^i + D_t^i $$


as the sum of the price processes plus the cumulative payoff-rate
processes, and we define the trading gains as the stochastic integral

$$ T_t = \int_0^t \theta_s \, dG_s $$
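To connect the stochastic integral with trading at discrete dates, the following sketch approximates the trading gains by a left-endpoint sum $\sum_k \theta_k (S_{k+1} - S_k)$ on a time grid, with no payoff-rate process. Everything here is an illustrative assumption: the simulated path with Gaussian increments, the function names, and the two example strategies. The holding over each interval uses only the price observed at the start of the interval, mirroring the requirement that $\theta$ be adapted.

```python
import math
import random


def simulate_path(s0, sigma, horizon, steps, rng):
    """Hypothetical price path with Gaussian increments on an even grid."""
    dt = horizon / steps
    path = [s0]
    for _ in range(steps):
        path.append(path[-1] + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path


def trading_gains(holdings, path):
    """Left-endpoint sum of theta_k * (S_{k+1} - S_k) over the grid."""
    return sum(
        theta * (nxt - cur) for theta, cur, nxt in zip(holdings, path, path[1:])
    )


rng = random.Random(7)
steps = 250
path = simulate_path(s0=100.0, sigma=15.0, horizon=1.0, steps=steps, rng=rng)

buy_and_hold = [1.0] * steps
# Adapted rule: hold one unit over [k, k+1) only if the price at k is below 100.
below_start = [1.0 if price < 100.0 else 0.0 for price in path[:-1]]

gain_hold = trading_gains(buy_and_hold, path)
gain_rule = trading_gains(below_start, path)
print(gain_hold, path[-1] - path[0], gain_rule)
```

For the buy-and-hold strategy the sum telescopes to the change in price over the whole interval, which is the discrete-time notion of a gain from buying, holding, and selling.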



How can we match the abstract notion of a stochastic integral with the
buying and selling of assets? In discrete time, trading gains have a
meaning that is in agreement with the practical notion of buying a
portfolio of assets, holding it for a period, and then selling it at
market prices, thus realizing either a gain or a loss. One might object
that in continuous time this meaning is lost. How can a process where
prices change so that their total variation is unbounded be a reasonable
representation of financial reality? This is a question of methodology
that is relevant to every field of science. In classical physics, the
use of continuous models was assumed to reflect reality; time and space,
for example, were considered continuous. Quantum physics upset the
conceptual cart of classical physics, and the reality of continuous
processes has since been questioned at every level. In quantum physics,
a theory is considered to be nothing but a model useful as a
mathematical device to predict measurements. This is, in essence, the
theory set forth in the 1930s by Niels Bohr and the school of
Copenhagen; it has now become mainstream methodology in physics. It is
also, ultimately, the point of view of positive economics. In a famous
and widely quoted essay, Milton Friedman (1953) wrote:

The relevant question to ask about the "assumptions" of a theory is not
whether they are descriptively "realistic," for they never are, but
whether they are sufficiently good approximations for the purpose in
hand. And this question can be answered only by seeing whether the
theory works, which means if it yields sufficiently accurate
predictions.

In the spirit of positive economics, continuous-time financial models
are mathematical devices used to predict, albeit in a probabilistic
sense, financial observations made at discrete intervals of time.
Stochastic gains predict trading gains only at discrete intervals of
time, the only intervals that can be observed. Continuous-time finance
should be seen as a logical construction that meets observations only at
a finite number of dates, not as a realistic description of financial
trading.

Let's consider processes without any intermediate payoff. A
self-financing trading strategy is a trading strategy such that the
following relationships hold:

$$ \theta_t S_t = \sum_i \theta_t^i S_t^i = \sum_i \left( \theta_0^i S_0^i + \int_0^t \theta_s^i \, dS_s^i \right), \quad t \in [0, T] $$

We first define arbitrage in the absence of a payoff-rate process. An
arbitrage is a self-financing trading strategy such that:
$\theta_0 S_0 < 0$ and $\theta_T S_T \ge 0$, or $\theta_0 S_0 \le 0$ and
$\theta_T S_T > 0$. If there is a payoff-rate process, a self-financing
trading strategy is a trading strategy such that the following
relationships hold:

$$ \theta_t S_t = \sum_i \theta_t^i S_t^i = \sum_i \left( \theta_0^i S_0^i + \int_0^t \theta_s^i \, dG_s^i \right), \quad t \in [0, T] $$

where $G_t^i = S_t^i + D_t^i$ is the gain process as previously defined.
An arbitrage is a self-financing trading strategy such that:
$\theta_0 S_0 < 0$ and $\theta_T S_T \ge 0$, or $\theta_0 S_0 \le 0$ and
$\theta_T S_T > 0$.
ARBITRAGE PRICING IN CONTINUOUS-STATE, CONTINUOUS-TIME

The abstract principles of arbitrage pricing are the same in a
discrete-state, discrete-time setting as in a continuous-state,
continuous-time setting. Arbitrage pricing is relative pricing. In the
absence of arbitrage, the price and payoff-rate processes of a set of
basic assets fix the prices of other assets given the payoff-rate
process of the latter. If markets are complete, every price process can
be computed in this way. In a discrete-state, discrete-time setting, the
computation of arbitrage pricing is done with matrix algebra. In fact,
in the absence of arbitrage, every price process can be expressed in two
alternative ways:

1. Prices $S_t^i$ are equal to the normalized conditional expectation of
payoffs deflated with state prices under the real probabilities:

$$ S_t^i = \frac{1}{\pi_t} E_t\left[\sum_{j=t+1}^{T} \pi_j d_j^i\right] $$

2. Prices $S_t^i$ are equal to the conditional expectation of discounted
payoffs under the risk-neutral probabilities:

$$ S_t^i = E_t^Q\left[\sum_{j=t+1}^{T} \frac{d_j^i}{R_{t,j}}\right] $$



State-price deflators and risk-neutral probabilities can be computed
solving systems of linear equations for a kernel of basic assets. The
above relationships are algebraic linear equations that fix all price
processes.
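The equivalence of the two pricing relationships can be checked numerically in the simplest case, a one-period binomial market. The sketch below is illustrative only: the parameter values, the real probability `p`, and the function names are assumptions. The state-price deflator is built from the change of measure, $\pi(\omega) = q(\omega) / (p(\omega)(1 + r))$ with $\pi_0 = 1$.

```python
def deflator_price(payoffs, real_probs, deflators, deflator_now=1.0):
    """(1 / pi_t) * E_t[pi_{t+1} * d_{t+1}] under the real probabilities."""
    expectation = sum(
        p * pi * x for p, pi, x in zip(real_probs, deflators, payoffs)
    )
    return expectation / deflator_now


def risk_neutral_price(payoffs, rn_probs, r):
    """E_t^Q[d_{t+1} / (1 + r)] under the risk-neutral probabilities."""
    return sum(q * x for q, x in zip(rn_probs, payoffs)) / (1 + r)


u, d, r, s0 = 1.25, 0.8, 0.05, 100.0  # hypothetical one-period market
p = 0.6                               # hypothetical real up-probability
q = (1 + r - d) / (u - d)

real_probs = [p, 1 - p]
rn_probs = [q, 1 - q]
deflators = [q / (p * (1 + r)), (1 - q) / ((1 - p) * (1 + r))]

stock_payoffs = [u * s0, d * s0]
call_payoffs = [max(x - 100.0, 0.0) for x in stock_payoffs]

stock_a = deflator_price(stock_payoffs, real_probs, deflators)
stock_b = risk_neutral_price(stock_payoffs, rn_probs, r)
call_a = deflator_price(call_payoffs, real_probs, deflators)
call_b = risk_neutral_price(call_payoffs, rn_probs, r)
print(stock_a, stock_b, call_a, call_b)
```

Both routes recover the stock price $S_0$ and assign the same price to the call, whatever real probability `p` is chosen.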

In a continuous-state, continuous-time setting, the principle of
arbitrage pricing is the same. In the absence of arbitrage, given a
number of basic price and payoff stochastic processes, other processes
are fixed. The latter are called redundant securities as they are not
necessary to fix prices. If markets are complete, every price process
can be fixed in this way. In order to make computations feasible, some
additional assumptions are made; in particular, all payoff-rate and
price processes are assumed to be Ito processes.

The theory of arbitrage pricing in a continuous-state, continuous-time
setting uses the same tools as in a discrete-state, discrete-time
setting. Under an equivalent martingale measure, all price processes
become martingales. Therefore prices can be determined as discounted
present value relationships. Equivalent martingale measures are the same
concept as state-price deflators: After appropriate deflation, all
processes become martingales. The key point of arbitrage pricing theory
is that both equivalent martingale measures and state-price deflators
can be determined from a subset of the market. All other processes are
redundant.

In the following sections we will develop the theory of arbitrage
pricing in steps. First, we will illustrate the principles of arbitrage
pricing in the case of options, arriving at the Black-Scholes option
pricing formula. We will then extend this theory to more general
derivative securities. Subsequently, we will state arbitrage pricing
theory in the context of equivalent martingale measures and of
state-price deflators.

OPTION PRICING 

We will now apply the concepts of arbitrage pricing to option pricing in
a continuous-state, continuous-time setting. Suppose that a market
consists of three assets: a risk-free asset (which allows risk-free
borrowing and lending at the risk-free rate of interest), a stock, and a
European option. We will show that the price processes of a stock and of
a risk-free asset fix the price process of an option on that stock.

Suppose the risk-free rate is a constant $r$. The value $V_t$ of a
risk-free asset with constant rate $r$ evolves according to the
deterministic differential equation of continually compounding interest
rates:

$$ dV_t = rV_t \, dt $$








The above is a differential equation with separable variables. After
separating the variables, the equation can be written as

$$ \frac{dV_t}{V_t} = r \, dt $$

which admits the solution $V_t = V_0 e^{rt}$ where $V_0$ is the initial
value of the bank account. This formula can also be interpreted as the
price process of a risk-free bond with deterministic rate $r$.
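As a quick numerical check of the solution, the sketch below compares the closed form $V_0 e^{rt}$ with a simple Euler integration of $dV_t = rV_t\,dt$. The function names, the parameter values, and the choice of the Euler scheme are assumptions made for illustration.

```python
import math


def bank_account(v0, r, t):
    """Closed-form value V_t = V_0 * exp(r * t)."""
    return v0 * math.exp(r * t)


def euler_bank_account(v0, r, t, steps):
    """Integrate dV = r * V * dt with small time steps of size t / steps."""
    value = v0
    dt = t / steps
    for _ in range(steps):
        value += r * value * dt
    return value


# Hypothetical inputs: 100 invested for 2 years at a 5% rate.
exact = bank_account(100.0, 0.05, 2.0)
approx = euler_bank_account(100.0, 0.05, 2.0, steps=10_000)
print(exact, approx)
```

The Euler value is the result of compounding 10,000 times over the period; it approaches the continuously compounded value as the number of steps grows.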


Stock Price Processes 

Let's now examine the price process of the
stock. Consider the process $y_t = at + \sigma B_t$, where
$B_t$ is a standard Brownian motion. From the definition
of Ito integrals, it can be seen that this
process, which is called an arithmetic Brownian
motion, is the solution of the following diffusion
equation:

$$dy_t = a \, dt + \sigma \, dB_t$$

where $a$ is a constant called the drift of the diffusion
and $\sigma$ is a constant called the volatility of
the diffusion.

Consider now the process $S_t = S_0 e^{at + \sigma B_t}$, $t \geq 0$.
Applying Ito's lemma, it is easy to see that
this process, which is called a geometric Brownian
motion, is an Ito process that satisfies the
following stochastic differential equation:

$$dS_t = \mu S_t \, dt + \sigma S_t \, dB_t, \quad S_0 = x$$

where $x$ is an initial value, $\mu = a + \tfrac{1}{2}\sigma^2$, and
$B_t$ is a standard Brownian motion. We assume
that the stock price process follows a geometric
Brownian motion and that there is no payoff-rate
process.
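The relationship between the exponent $a$ and the drift $\mu$ can be checked numerically. The following is a minimal sketch, not part of the original entry: it samples terminal values of a geometric Brownian motion from the explicit solution $S_T = S_0 e^{aT + \sigma B_T}$ with $a = \mu - \tfrac{1}{2}\sigma^2$, and compares the sample mean with $E[S_T] = S_0 e^{\mu T}$. The function name and all parameter values are illustrative assumptions.

```python
import math
import random

def simulate_gbm_terminal(s0, mu, sigma, horizon, n_paths, seed=0):
    """Draw terminal values S_T of a geometric Brownian motion.

    Uses the explicit solution S_T = S_0 * exp(a*T + sigma*B_T) with
    a = mu - sigma^2 / 2 and B_T ~ Normal(0, T), so no time stepping is needed.
    """
    rng = random.Random(seed)
    a = mu - 0.5 * sigma ** 2              # exponent drift, so that mu = a + sigma^2/2
    root_t = math.sqrt(horizon)
    return [s0 * math.exp(a * horizon + sigma * root_t * rng.gauss(0.0, 1.0))
            for _ in range(n_paths)]

# Illustrative (assumed) parameters: S_0 = 100, mu = 8%, sigma = 20%, T = 1 year.
terminal = simulate_gbm_terminal(s0=100.0, mu=0.08, sigma=0.2, horizon=1.0, n_paths=50_000)
sample_mean = sum(terminal) / len(terminal)
theoretical_mean = 100.0 * math.exp(0.08 * 1.0)   # E[S_T] = S_0 * exp(mu * T)
print(sample_mean, theoretical_mean)
```

The sample mean should be close to the theoretical mean, which illustrates that $\mu$, not $a$, is the expected growth rate of the stock price.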

Now consider a European call option, which
gives the owner the right but not the obligation
to buy the underlying stock at the exercise price
$K$ at the expiry date $T$. Call $Y_t$ the price of the
option at time $t$. The price of the option as a
function of the stock price is known at the
expiry date. If the option is rationally exercised,
the final value of the option is

$$Y_T = \max(S_T - K, 0)$$

In fact, the option can be rationally exercised
only if the price of the stock exceeds $K$. In that
case, the owner of the option can buy the underlying
stock at the price $K$, sell it immediately
at the current price $S_T$, and make a profit equal
to $(S_T - K)$. If the stock price is below $K$, the
option is clearly worthless. After $T$, the option
ceases to exist.

How can we compute the option price at every
other date? We can arrive at the solution in
two different but equivalent ways: (1) through
hedging arguments and (2) through equivalent martingale
measures. In the following sections we
will introduce hedging arguments and equivalent
martingale measures.

Hedging 

To hedge means to protect against an adverse
movement. The seller of an option is subject to
a liability as, from the seller's point of view, the option
has a negative payoff in some states. In our
context, hedging this option means forming a
self-financing trading strategy in the
stock and the risk-free asset, in appropriate proportions,
such that the option plus this hedging
portfolio is risk free. Hedging the option implies
that the hedging portfolio perfectly replicates
the option payoff in every possible state.

A European call option has only one payoff,
at the expiry date. It therefore suffices that the
hedging portfolio replicates the option payoff at
that date. Suppose that there is a self-financing
trading strategy $(\theta_t^1, \theta_t^2)$ in the bond and the
stock such that

$$\theta_T^1 V_T + \theta_T^2 S_T = Y_T$$

To avoid arbitrage, the price of the option at any
moment must be equal to the value of the hedging
self-financing trading strategy. In fact, suppose
that at some time $t < T$ the self-financing
strategy $(\theta_t^1, \theta_t^2)$ has a value lower than the
option:

$$\theta_t^1 V_t + \theta_t^2 S_t < Y_t$$

An investor could then sell the option for $Y_t$,
make an investment $\theta_t^1 V_t + \theta_t^2 S_t$ in the trading
strategy, and at time $T$ liquidate both the option
and the trading strategy. As $\theta_T^1 V_T + \theta_T^2 S_T = Y_T$,
the final liquidation has value zero in every
state of the world, so that the initial profit
$Y_t - (\theta_t^1 V_t + \theta_t^2 S_t)$ is a risk-free profit. A similar
reasoning could be applied if, at some time
$t < T$, the strategy $(\theta_t^1, \theta_t^2)$ had a value higher
than the option. Therefore, we can conclude
that if there is a self-financing trading strategy
that replicates the option's payoff, the value
of the strategy must coincide with the option's
price at every instant prior to the expiry
date.

Observe that the above reasoning is an instance
of the law of one price. If two portfolios
have the same payoffs at every moment and in
every state of the world, their price must be the
same. In particular, if a trading strategy has the
same payoffs as an asset, its value must coincide
with the price of that asset.


The Black-Scholes Option 
Pricing Formula 

Let's now see how the price of the option can be
computed. Assume that the price of the option
is a function of time and of the price of the underlying
stock: $Y_t = C(S_t, t)$. This assumption
is reasonable but needs to be justified; for the
moment it is only a hint as to how to proceed
with the calculations. It will be justified later by
verifying that the pricing formula produces the
correct final payoff.

As $S_t$ is assumed to be an Ito process, in
particular a geometric Brownian motion,
$Y_t = C(S_t, t)$, which is a function of $S_t$, is an Ito process
as well. Therefore, using Ito's formula, we
can write down the stochastic equation that $Y_t$
must satisfy. Ito's formula prescribes that:

$$dY_t = \left[ \frac{\partial C(S_t, t)}{\partial t} + \frac{\partial C(S_t, t)}{\partial S_t} \mu S_t + \frac{1}{2} \frac{\partial^2 C(S_t, t)}{\partial S_t^2} \sigma^2 S_t^2 \right] dt + \frac{\partial C(S_t, t)}{\partial S_t} \sigma S_t \, dB_t$$


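Suppose now that there is a self-financing
trading strategy $Y_t = \theta_t^1 V_t + \theta_t^2 S_t$. We can write
this equation as

$$Y_t = Y_0 + \int_0^t \theta_u^1 \, dV_u + \int_0^t \theta_u^2 \, dS_u$$

or, in differential form, as

$$dY_t = \theta_t^1 \, dV_t + \theta_t^2 \, dS_t = (\theta_t^1 r V_t + \theta_t^2 \mu S_t) \, dt + \theta_t^2 \sigma S_t \, dB_t$$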


If the trading strategy replicates the option
price process, the two expressions for $dY_t$
(the one obtained through Ito's lemma and
the other obtained through the assumption
that there is a replicating self-financing trading
strategy) must be equal:

$$(\theta_t^1 r V_t + \theta_t^2 \mu S_t) \, dt + \theta_t^2 \sigma S_t \, dB_t = \left[ \frac{\partial C(S_t, t)}{\partial t} + \frac{\partial C(S_t, t)}{\partial S_t} \mu S_t + \frac{1}{2} \frac{\partial^2 C(S_t, t)}{\partial S_t^2} \sigma^2 S_t^2 \right] dt + \frac{\partial C(S_t, t)}{\partial S_t} \sigma S_t \, dB_t$$


The equality of these two expressions implies
the equality of the coefficients in $dt$ and $dB$ respectively.
Equating the coefficients in $dB$ yields

$$\theta_t^2 = \frac{\partial C(S_t, t)}{\partial S_t}$$

As $Y_t = C(S_t, t) = \theta_t^1 V_t + \theta_t^2 S_t$, substituting, we
obtain

$$\theta_t^1 = \frac{1}{V_t} \left[ C(S_t, t) - \frac{\partial C(S_t, t)}{\partial S_t} S_t \right]$$


We have now obtained the self-financing
trading strategy as a function of the stock and
option prices. Substituting and equating the
coefficients of $dt$ yields

$$\frac{1}{V_t} \left[ C(S_t, t) - \frac{\partial C(S_t, t)}{\partial S_t} S_t \right] r V_t + \frac{\partial C(S_t, t)}{\partial S_t} \mu S_t = \frac{\partial C(S_t, t)}{\partial t} + \frac{\partial C(S_t, t)}{\partial S_t} \mu S_t + \frac{1}{2} \frac{\partial^2 C(S_t, t)}{\partial S_t^2} \sigma^2 S_t^2$$

Simplifying and eliminating common terms,
we obtain

$$-r C(S_t, t) + r \frac{\partial C(S_t, t)}{\partial S_t} S_t + \frac{\partial C(S_t, t)}{\partial t} + \frac{1}{2} \frac{\partial^2 C(S_t, t)}{\partial S_t^2} \sigma^2 S_t^2 = 0$$


If the function $C(S_t, t)$ satisfies this relationship,
then the coefficients in $dt$ match. The
above relationship is a partial differential equation
(PDE). This equation can be solved with
suitable boundary conditions. Boundary conditions
are provided by the payoff of the option
at the expiry date:

$$Y_T = C(S_T, T) = \max(S_T - K, 0)$$

The closed-form solution of the above PDE with
the above boundary conditions was derived by
Black and Scholes (1973) and is referred to as the
Black-Scholes option pricing formula:

$$C(S_t, t) = S_t \Phi(z) - e^{-r(T-t)} K \Phi\left(z - \sigma\sqrt{T-t}\right)$$

with

$$z = \frac{\log(S_t/K) + \left(r + \tfrac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}$$

and where $\Phi$ is the cumulative standard normal
distribution function.
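The formula is straightforward to evaluate. The sketch below is an illustration rather than part of the original entry: it implements the call price using only the Python standard library, with $\Phi$ computed from the error function. The function names and the sample inputs ($S_t = K = 100$, $r = 5\%$, $\sigma = 20\%$, one year to expiry) are assumptions chosen for the example.

```python
import math

def norm_cdf(x):
    """Cumulative standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s, k, r, sigma, tau):
    """Black-Scholes price of a European call; tau = T - t > 0 is the time to expiry."""
    z = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return s * norm_cdf(z) - math.exp(-r * tau) * k * norm_cdf(z - sigma * math.sqrt(tau))

# Illustrative (assumed) inputs: at-the-money call, one year to expiry.
price = black_scholes_call(s=100.0, k=100.0, r=0.05, sigma=0.2, tau=1.0)
print(round(price, 4))   # approximately 10.45
```

Note that the drift $\mu$ of the stock does not appear among the inputs: only the risk-free rate and the volatility enter the price.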

Let's stop for a moment and review the logical
steps we have followed thus far. First, we
defined a market consisting of a stock whose price
process follows a geometric Brownian motion
and a bond whose price process is a deterministic
exponential. We introduced into this market
a European call option. We then made two assumptions:
(1) the option's price process is a
deterministic function of the stock price process;
and (2) the option's price process can be
replicated by a self-financing trading strategy.

If the above assumptions are true, we can
write a stochastic differential equation for the
option's price process in two different ways:
(1) using Ito's lemma, we can write the option
price stochastic process as a function of
the stock stochastic process; and (2) using the
assumption that there is a replicating trading
strategy, we can write the option price stochastic
process as the stochastic process of the trading
strategy. As the two equations describe the
same process, they must coincide. Equating the
coefficients in the deterministic and stochastic
terms, we can determine the trading strategy
and write a deterministic PDE that the pricing
function of the option must satisfy. The latter
PDE together with the boundary conditions
provided by the known value of the option at
the expiry date uniquely determine the option
pricing function.
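The replication argument can be illustrated by simulation. The following sketch is not from the original entry and rests on stated assumptions: the portfolio starts with the Black-Scholes price, holds $\theta_t^2 = \partial C / \partial S_t = \Phi(z)$ units of stock, keeps the remainder in the bank account, and is rebalanced at a finite number of dates (so replication is only approximate). The stock is simulated with a real-world drift $\mu$ that differs from $r$. All names and parameter values are illustrative.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price_and_delta(s, k, r, sigma, tau):
    """Black-Scholes call price and the stock holding theta^2 = dC/dS = Phi(z)."""
    z = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    price = s * norm_cdf(z) - math.exp(-r * tau) * k * norm_cdf(z - sigma * math.sqrt(tau))
    return price, norm_cdf(z)

def replication_error(s0, k, r, mu, sigma, horizon, n_steps, rng):
    """Terminal value of the discretely rebalanced strategy minus the call payoff."""
    dt = horizon / n_steps
    s = s0
    value, stock_units = call_price_and_delta(s0, k, r, sigma, horizon)
    cash = value - stock_units * s            # theta^1 * V: amount held in the bank account
    for step in range(1, n_steps + 1):
        # The stock moves with real-world drift mu; the bank account earns r.
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt
                      + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        cash *= math.exp(r * dt)
        if step < n_steps:
            # Self-financing rebalance: the change in stock holding is paid from cash.
            _, new_units = call_price_and_delta(s, k, r, sigma, horizon - step * dt)
            cash -= (new_units - stock_units) * s
            stock_units = new_units
    return cash + stock_units * s - max(s - k, 0.0)

rng = random.Random(1)
errors = [replication_error(100.0, 100.0, 0.05, 0.10, 0.2, 1.0, 500, rng)
          for _ in range(20)]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(mean_abs_error)
```

The mean absolute error should be small relative to the option price of roughly 10.45, and it shrinks as the number of rebalancing dates grows. The real-world drift $\mu$ affects the simulated paths but not the cost of replication.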

Note that the above is neither a demonstration
that there is an option pricing function, nor
a demonstration that there is a replicating trading
strategy. However, if both a pricing function
and a replicating trading strategy exist, the
above process allows one to determine both by
solving a partial differential equation. After determining
a solution to the PDE, one can verify
whether it provides a pricing function and whether it
allows the creation of a self-financing trading
strategy. Ultimately, the justification of the existence
of an option's pricing function and of
a replicating self-financing trading strategy resides
in the possibility of actually determining
both. Absence of arbitrage ensures that this solution
is unique.
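One way to carry out such a verification numerically is to substitute the candidate pricing function into the PDE and check that the left-hand side vanishes. The sketch below is an illustration, not the entry's own procedure: it approximates the partial derivatives of the Black-Scholes call price by central finite differences and evaluates the PDE residual at a few points. Function names, step sizes, and parameter values are assumptions.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(s, t, k=100.0, r=0.05, sigma=0.2, expiry=1.0):
    """Black-Scholes pricing function C(S, t) for fixed (assumed) K, r, sigma, T."""
    tau = expiry - t
    z = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return s * norm_cdf(z) - math.exp(-r * tau) * k * norm_cdf(z - sigma * math.sqrt(tau))

def pde_residual(s, t, r=0.05, sigma=0.2, ds=0.01, dt=1e-4):
    """Left-hand side of the PDE with derivatives replaced by central differences."""
    c = call_price(s, t)
    c_t = (call_price(s, t + dt) - call_price(s, t - dt)) / (2.0 * dt)
    c_s = (call_price(s + ds, t) - call_price(s - ds, t)) / (2.0 * ds)
    c_ss = (call_price(s + ds, t) - 2.0 * c + call_price(s - ds, t)) / ds ** 2
    return -r * c + r * c_s * s + c_t + 0.5 * c_ss * sigma ** 2 * s ** 2

residuals = [pde_residual(s, t) for s in (80.0, 100.0, 120.0) for t in (0.0, 0.5)]
largest_residual = max(abs(x) for x in residuals)
print(largest_residual)
```

The residual is zero up to finite-difference and rounding error, which is consistent with the formula solving the PDE away from the expiry date.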

Generalizing the Pricing of 
European Options 

We can now generalize the above pricing
methodology to a generic European option and
to more general price processes for the bond
and for the underlying stock. In the most general
case, the process underlying a derivative
need not be a stock price process. However, we
suppose that the underlying is a stock price process
so that replicating portfolios can be formed.
We generalize in three ways:

• The option's payoff is an arbitrary finite-variance random variable.

• The stock price process is an Ito process.

• The short-rate process is stochastic.

Following the definition given in the finite-state
setting, we define a European option on
some underlying process $S_t$ as an asset whose
payoff at time $T$ is given by the random variable
$Y_T = g(S_T)$, where $g(x)$, $x \in \mathbb{R}$, is a continuous
real-valued function. In other words, a
European option is defined as a security whose
payoff is determined at a given expiry date $T$
as a function of some underlying random variable.
The option has a zero payoff at every other
date $t \in [0, T)$. This definition clearly distinguishes
European options from American options,
which yield payoffs at random stopping
times.

Let's now generalize the price process of the
underlying stock. We represent the underlying
stock price process as a generic Ito process.
A generic univariate Ito process can be
represented through the stochastic differential
equation:

$$dS_t = \mu(S_t, t) \, dt + \sigma(S_t, t) \, dB_t, \quad S_0 = x$$

where $x$ is the initial condition, $B$ is a standard
Brownian motion, and $\mu(S_t, t)$ and $\sigma(S_t, t)$ are
given functions $\mathbb{R} \times [0, \infty) \to \mathbb{R}$. The geometric
Brownian motion is a particular example of
an Ito process.

Let's now define the bond price process. We
retain the risk-free nature of the bond but let
the interest rate be stochastic. Recall that in a
discrete-state, discrete-time setting, a bond was
defined as a process that, at each time step, exhibits
the same return for each state though the
return can be different in different time steps.
Consequently, in continuous time we define a
bond price process as the following integral:

$$V_t = V_0 \exp\left( \int_0^t r(S_u, u) \, du \right)$$

where $r$ is a given function that represents the
stochastic rate. In fact, the rate $r$ depends on
the time $t$ and on the stock price process $S_t$.
Application of Ito's lemma shows that the bond
price process satisfies the following equation:

$$dV_t = V_t \, r(S_t, t) \, dt$$


We can now use the same reasoning that led to
the Black-Scholes formula. Suppose that there
are both an option pricing function $Y_t = C(S_t, t)$
and a replicating self-financing trading strategy

$$Y_t = \theta_t^1 V_t + \theta_t^2 S_t$$

We can now write a stochastic differential
equation for the process $Y_t$ in two ways:

• Applying Ito's lemma to $Y_t = C(S_t, t)$

• Directly to $Y_t = \theta_t^1 V_t + \theta_t^2 S_t$


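The first approach yields

$$dY_t = \left[ \frac{\partial C(S_t, t)}{\partial t} + \frac{\partial C(S_t, t)}{\partial S_t} \mu(S_t, t) + \frac{1}{2} \frac{\partial^2 C(S_t, t)}{\partial S_t^2} \sigma^2(S_t, t) \right] dt + \frac{\partial C(S_t, t)}{\partial S_t} \sigma(S_t, t) \, dB_t$$

The second approach yields

$$dY_t = \left[ \theta_t^1 r(S_t, t) V_t + \theta_t^2 \mu(S_t, t) \right] dt + \theta_t^2 \sigma(S_t, t) \, dB_t$$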


Equating coefficients in $dt$ and $dB$, we obtain the
trading strategy

$$\theta_t^1 = \frac{1}{V_t} \left[ C(S_t, t) - \frac{\partial C(S_t, t)}{\partial S_t} S_t \right], \qquad \theta_t^2 = \frac{\partial C(S_t, t)}{\partial S_t}$$

and the PDE

$$-r(x, t) C(x, t) + r(x, t) \frac{\partial C(x, t)}{\partial x} x + \frac{\partial C(x, t)}{\partial t} + \frac{1}{2} \frac{\partial^2 C(x, t)}{\partial x^2} \sigma^2(x, t) = 0$$

with the boundary conditions $C(S_T, T) = g(S_T)$.
Solving this equation we obtain a candidate option
pricing function. In each specific case, one
can then verify that the option pricing function
effectively solves the option pricing problem.
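As a consistency check, the general PDE can be specialized to the setting of the previous section. The worked step below is a sketch that uses only the formulas already stated: it assumes a constant short rate, a geometric Brownian motion for the stock, and a call payoff, and substitutes these into the general PDE and boundary condition.

```latex
% Black-Scholes case as a special instance of the general PDE.
% Assumed substitutions:
%   r(x,t) = r,  \mu(x,t) = \mu x,  \sigma(x,t) = \sigma x,  g(x) = \max(x - K, 0)
\begin{align*}
-r\,C(x,t) + r\,x\,\frac{\partial C(x,t)}{\partial x}
  + \frac{\partial C(x,t)}{\partial t}
  + \frac{1}{2}\,\sigma^2 x^2\,\frac{\partial^2 C(x,t)}{\partial x^2} &= 0, \\
C(x, T) &= \max(x - K, 0).
\end{align*}
```

This is the PDE and boundary condition obtained earlier for the European call. The drift function $\mu(x, t)$ does not appear in the general PDE, so it plays no role in the specialized one either.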

STATE-PRICE DEFLATORS 

We now extend the concepts of state prices
and equivalent martingale measures to a
continuous-state, continuous-time setting. As
in the previous sections, the economy is represented
by a probability space $(\Omega, \mathcal{F}, P)$, where
$\Omega$ is the set of possible states, $\mathcal{F}$ is the $\sigma$-algebra
of events, and $P$ is a probability measure. Time
is a continuous variable in the interval $[0, T]$.
The propagation of information is represented
by a filtration $\mathcal{F}_t$. A multivariate standard Brownian
motion $B = (B^1, \ldots, B^D)$ in $\mathbb{R}^D$ adapted to
the filtration $\mathcal{F}_t$ is defined over this probability
space. We know that there are mathematical
subtleties that we will not take into consideration,
as regards whether (1) the filtration is
given and the Brownian motion is adapted to
the filtration or (2) the filtration is generated by
the Brownian motion.

Suppose that there are $N$ price processes
$X = (X^1, \ldots, X^N)$ that form a multivariate Ito
process in $\mathbb{R}^N$. Trading strategies are adapted
processes $\theta = (\theta^1, \ldots, \theta^N)$ that represent the
quantity of each asset held at each instant. In
order to ensure the existence of stochastic integrals,
we require the processes $(X^1, \ldots, X^N)$ and
any trading strategy to be of bounded variation.
Let's first suppose that there is no payoff-rate
process. This assumption will be relaxed in a
later section. Suppose also that one of these processes,
say $X^1$, is defined by a short-rate process
$r$, so that

$$X_t^1 = \exp\left( \int_0^t r_u \, du \right)$$

or

$$dX_t^1 = r_t X_t^1 \, dt$$

where $r_t$ is a deterministic function of $t$ called
the short-rate process. Note that $X^1$ could
be replaced by a trading strategy. We can
think of $r_t$ as the risk-free short-term continuously
compounding interest rate and of $X^1$
as a risk-free continuously compounding bank
account.

The concepts of arbitrage and of trading strategy
were defined in the previous section. We
now introduce the concept of deflators in a
continuous-time, continuous-state setting. Any
strictly positive Ito process is called a deflator.
Given a deflator $Y$ we can deflate any process
$X$, obtaining a new deflated process

$$X_t^Y = X_t Y_t$$

For example, any stock price process of a nondefaulting
firm or the risk-free bank account
is a deflator. For technical reasons it is necessary
to introduce the concept of regular deflators.
A regular deflator is a deflator that, after
deflation, leaves unchanged the set of admissible
bounded-variation trading strategies.

We can now take the first step towards defining
a theory of pricing based on equivalent
martingale measures. It can be demonstrated
that if $Y$ is a regular deflator, a trading strategy
$\theta$ is self-financing with respect to the price
process $X = (X^1, \ldots, X^N)$ if and only if it is
self-financing with respect to the deflated price
process

$$X^Y = (Y_t X_t^1, \ldots, Y_t X_t^N)$$

In addition, it can be demonstrated that the
price process $X = (X^1, \ldots, X^N)$ admits no arbitrage
if and only if the deflated price process

$$X^Y = (Y_t X_t^1, \ldots, Y_t X_t^N)$$

admits no arbitrage.

A state-price deflator is a deflator $\pi$ with the
property that the deflated price process $X^\pi$ is a
martingale. A martingale is a stochastic process
$M_t$ such that its current value equals the conditional
expectation of the process at any future
time: $M_t = E_t[M_s]$, $s > t$. For each price process
$X^i$, the following relationship therefore holds:

$$\pi_t X_t^i = E_t[\pi_s X_s^i], \quad s > t$$

This definition is the equivalent in continuous
time of the definition of a state-price deflator
in discrete time. In fact, in discrete time a state-price deflator is
defined as a process $\pi$ such that

$$S_t^i = \frac{1}{\pi_t} E_t\left[ \sum_{j=t+1}^{T} \pi_j d_j^i \right]$$

with $d_j^i$ the payoff of asset $i$ at time $j$.
If there is no intermediate payoff, as in our
present case, the previous relationship can be
written as

$$\pi_t S_t^i = E_t[\pi_T S_T^i] = E_t\left[ E_{t+1}[\pi_T S_T^i] \right] = E_t[\pi_{t+1} S_{t+1}^i]$$

The next proposition states that if there is a
regular state-price deflator, then there is no arbitrage.
The demonstration of this proposition
hinges on the fact that, as the deflated price process
$S^\pi$ is a martingale, the following relationship
holds:

$$E\left[ \int_0^T \theta_u \, dS_u^\pi \right] = 0$$

and therefore the deflated value of any self-financing
trading strategy is a martingale. We can thus write

$$\theta_0 S_0^\pi = E[\theta_T S_T^\pi]$$

If $\theta_T S_T^\pi > 0$ then $\theta_0 S_0^\pi > 0$,
and if $\theta_T S_T^\pi \geq 0$ then $\theta_0 S_0^\pi \geq 0$,
which shows that there cannot be any arbitrage.

We have now stated that the existence of state-price
deflators ensures the absence of arbitrage.
The converse of this statement in a continuous-state,
continuous-time setting is more delicate
and will be dealt with later. We will now move
on to equivalent martingale measures.


EQUIVALENT MARTINGALE 
MEASURES 

In the previous section we saw that if there is a
regular state-price deflator then there is no arbitrage.
A state-price deflator transforms every
price process and every self-financing trading
strategy into a martingale. We will now see that,
after discounting by an appropriate process,
price processes become martingales through
a transformation of the real probability measure
into an equivalent martingale measure.²
This theory parallels the theory of equivalent
martingale measures developed in the discrete-state,
discrete-time setting in the entry "Arbitrage
Pricing: Finite-State Models." First some
definitions must be discussed.

Given a probability measure $P$, the probability
measure $Q$ is said to be equivalent to $P$ if both
assign probability zero to the same events, that
is, if $P(A) = 0$ if and only if $Q(A) = 0$ for every
event $A$. The equivalent probability measure $Q$
is said to be an equivalent martingale measure for
the process $X$ if $X$ is a martingale with respect
to $Q$ and if the Radon-Nikodym derivative

$$\xi = \frac{dQ}{dP}$$

has finite variance. The definition of the Radon-Nikodym
derivative is the same here as it is
in the finite-state context. The Radon-Nikodym
derivative is a random variable $\xi$ such that
$Q(A) = E^P[\xi I_A]$ for every event $A$, where $I_A$ is
the indicator function of the event $A$.

To develop an intuition for this definition,
consider that any stochastic process $X$ is a time-dependent
random variable $X_t$. The latter is
a family of functions $\Omega \to \mathbb{R}$ from the set of
states to the real numbers, indexed by time,
such that the sets $\{X_t(\omega) \leq x\}$ are events for
any real $x$. Given the probability measure $P$,
the finite-dimensional distributions of the process
$X$ are determined. The equivalent measure
$Q$ determines another set of finite-dimensional
distributions. However, the correspondence between
the process paths and the states remains
unchanged.

The requirement that $P$ and $Q$ are equivalent
is necessary to ensure that the process is effectively
the same under the two measures. There
is no assurance that, given an arbitrary process,
an equivalent martingale measure exists. Let's
assume that an equivalent martingale measure
does exist for the $N$-dimensional price process
$X = (X^1, \ldots, X^N)$. It can be demonstrated that
if the price process $X = (X^1, \ldots, X^N)$ admits an
equivalent martingale measure, then there is no
arbitrage.

The proof is similar to that for state-price
deflators as discussed above. Under the
equivalent martingale measure $Q$, which we
assume exists, every price process and every
self-financing trading strategy becomes a martingale.
Using the same reasoning as above it is
easy to see that there is no arbitrage.

This result can be generalized as follows.
If there is a regular deflator $Y$ such that the
deflated price process $X^Y = (Y_t X_t^1, \ldots, Y_t X_t^N)$
admits an equivalent martingale measure, then
there is no arbitrage. The proof hinges on the
result established in the previous section that,
if there is a regular deflator $Y$, the price process
$X$ admits no arbitrage if and only if the deflated
price process $X^Y$ admits no arbitrage.

Note that none of these results is constructive.
They only state that the existence of an equivalent
martingale measure with respect to a price
process ensures the absence of arbitrage. Conditions
to ensure the existence of an equivalent
martingale measure with respect to a price process
are given in the next section.

EQUIVALENT MARTINGALE 
MEASURES AND 
GIRSANOV'S THEOREM 

We first need to establish an important mathematical
result known as Girsanov's theorem. This
theorem applies to Ito processes. Let's first state
Girsanov's theorem in simple cases. Let $X$ be a
single-valued Ito process where $B$ is a single-valued
standard Brownian motion:

$$X_t = x + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dB_s$$

Suppose that a process $\nu$ and a process $\theta$ such
that $\sigma_t \theta_t = \mu_t - \nu_t$ are given. Suppose, in addition,
that the process $\theta$ satisfies the Novikov
condition, which requires

$$E\left[ \exp\left( \frac{1}{2} \int_0^T \theta_s^2 \, ds \right) \right] < \infty$$

Then, there is a probability measure $Q$ equivalent
to $P$ such that the following integral

$$\hat{B}_t = B_t + \int_0^t \theta_s \, ds$$

defines a standard Brownian motion $\hat{B}_t$ in $\mathbb{R}$ on
$(\Omega, \mathcal{F}, Q)$ with the same standard filtration as
the original Brownian motion $B_t$. In addition,
under $Q$ the process $X$ becomes

$$X_t = x + \int_0^t \nu_s \, ds + \int_0^t \sigma_s \, d\hat{B}_s$$

Girsanov's theorem states that we can add
drift to a standard Brownian motion and still
obtain a standard Brownian motion under
another probability measure. In addition, by
changing the probability measure we can arbitrarily
change the drift of an Ito process.

The same theorem can be stated in multiple
dimensions. Let $X$ be an $N$-valued Ito process:

$$X_t = x + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dB_s$$

In this process, $\mu_s$ is an $N$-vector process and
$\sigma_s$ is an $N \times D$ matrix. Suppose that there are
both a vector process $\nu = (\nu^1, \ldots, \nu^N)$ and a
vector process $\theta = (\theta^1, \ldots, \theta^N)$ such that
$\sigma_t \theta_t = \mu_t - \nu_t$, where the product $\sigma_t \theta_t$ is not a scalar
product but is performed component by component.
Suppose, in addition, that the process $\theta$
satisfies the Novikov condition:

$$E\left[ \exp\left( \frac{1}{2} \int_0^T \theta_s \cdot \theta_s \, ds \right) \right] < \infty$$

Then there is a probability measure $Q$ equivalent
to $P$ such that the following integral

$$\hat{B}_t = B_t + \int_0^t \theta_s \, ds$$

defines a standard Brownian motion $\hat{B}_t$ in $\mathbb{R}^D$
on $(\Omega, \mathcal{F}, Q)$ with the same standard filtration
as the original Brownian motion $B_t$. In addition,
under $Q$ the process $X$ becomes

$$X_t = x + \int_0^t \nu_s \, ds + \int_0^t \sigma_s \, d\hat{B}_s$$

Girsanov's theorem essentially states that, under
technical conditions (the Novikov condition),
by changing the probability measure it is
possible to transform an Ito process into another
Ito process with arbitrary drift. Prima facie, this
result might seem unreasonable. After all, the
drift of a process seems to be a fundamental
feature of the process as it defines, for example,
the average of the process. Consider, however,
that a stochastic process can be thought of as the
set of all its possible paths. In the case of an
Ito process, we can identify the process with
the set of all continuous and square integrable
functions. As observed above, the drift is an
average, and it is determined by the probability
measure on which the process is defined.
Therefore, it should not be surprising that by
changing the probability measure it is possible
to change the drift.
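A small simulation makes this concrete. The sketch below is not part of the original entry. It assumes a constant $\theta$ and uses the standard explicit form of the Radon-Nikodym derivative in that case, $\xi_T = \exp(-\theta B_T - \tfrac{1}{2}\theta^2 T)$, which the text above does not write out. Expectations under $Q$ are computed as $P$-expectations weighted by $\xi_T$; the check is that $B_T$ has mean $-\theta T$ under $Q$, while $\hat{B}_T = B_T + \theta T$ has mean zero and variance $T$. Names and parameter values are illustrative.

```python
import math
import random

def girsanov_check(theta, horizon, n_paths, seed=0):
    """Estimate E^P[xi], E^Q[B_T] and E^Q[B_hat_T^2] by reweighting P-samples."""
    rng = random.Random(seed)
    sum_xi = sum_xi_b = sum_xi_bhat_sq = 0.0
    for _ in range(n_paths):
        b = math.sqrt(horizon) * rng.gauss(0.0, 1.0)             # B_T sampled under P
        xi = math.exp(-theta * b - 0.5 * theta ** 2 * horizon)   # assumed density dQ/dP
        b_hat = b + theta * horizon                              # B_hat_T = B_T + theta*T
        sum_xi += xi
        sum_xi_b += xi * b
        sum_xi_bhat_sq += xi * b_hat ** 2
    return sum_xi / n_paths, sum_xi_b / n_paths, sum_xi_bhat_sq / n_paths

total_mass, mean_b_under_q, var_bhat_under_q = girsanov_check(
    theta=0.5, horizon=1.0, n_paths=100_000)
print(total_mass, mean_b_under_q, var_bhat_under_q)
```

The three estimates should be close to 1, $-0.5$, and 1 respectively: the paths are the same, but the reweighting shifts the average of $B_T$ while leaving $\hat{B}_T$ with the distribution of a standard Brownian motion at time $T$.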

The Diffusion Invariance Principle 

Note that Girsanov's theorem requires neither
that the process $X$ be a martingale nor that $Q$
be an equivalent martingale measure. If $X$ is
indeed a martingale under $Q$, an implication of
Girsanov's theorem is the diffusion invariance
principle, which can be stated as follows. Let $X$
be an Ito process:

$$dX_t = \mu_t \, dt + \sigma_t \, dB_t$$

If $X$ is a martingale with respect to an equivalent
probability measure $Q$, then there is a standard
Brownian motion $\hat{B}_t$ in $\mathbb{R}^D$ under $Q$ such
that

$$dX_t = \sigma_t \, d\hat{B}_t$$

Let's now apply the previous results to a price
process $X = (V, S^1, \ldots, S^{N-1})$ where

$$dS_t^i = \mu_t^i \, dt + \sigma_t^i \, dB_t$$

and

$$dV_t = r_t V_t \, dt$$

If the short-term rate $r$ is bounded, $V_t^{-1}$
is a regular deflator. Consider the deflated
processes:

$$Z_t^i = S_t^i V_t^{-1}$$

By Ito's lemma, this process satisfies the following
stochastic equation:

$$dZ_t^i = \left( -r_t Z_t^i + \frac{\mu_t^i}{V_t} \right) dt + \frac{\sigma_t^i}{V_t} \, dB_t$$

Suppose there is an equivalent martingale
measure $Q$. Under the equivalent martingale
measure $Q$, the discounted price process

$$Z_t^i = S_t^i V_t^{-1}$$

is a martingale. In addition, by the diffusion invariance
principle there is a standard Brownian
motion $\hat{B}_t$ in $\mathbb{R}^D$ under $Q$ such that:

$$dZ_t^i = \frac{\sigma_t^i}{V_t} \, d\hat{B}_t$$

Applying Ito's lemma, given that $Z_t^i V_t = S_t^i$,
we obtain the fundamental result:

$$dS_t^i = r_t S_t^i \, dt + \sigma_t^i \, d\hat{B}_t$$

This result states that, under the equivalent
martingale measure, all price processes become
Ito processes with the same drift: each grows at
the short rate $r_t$.

Application of Girsanov's
Theorem to Black-Scholes Option
Pricing Formula

To illustrate Girsanov's theorem, let's see how
the Black-Scholes option pricing formula can be
obtained from an equivalent martingale measure.
In the previous setting, let's assume that
$N = 3$, $D = 1$, $r_t$ is a constant $r$, and

$$\sigma_t = \sigma S_t$$

with $\sigma$ constant. Let $S$ be the stock price process
and $C$ be the option price process. The option's
price at time $T$ is

$$C_T = \max(S_T - K, 0)$$

In this setting, therefore, the following three
equations hold:

$$dS_t = \mu_t \, dt + \sigma S_t \, dB_t$$

$$dC_t = \mu_t^C \, dt + \sigma_t^C \, dB_t$$

$$dV_t = r V_t \, dt$$

Given that $C_t V_t^{-1}$ is a martingale under $Q$, we can write

$$C_t = V_t E_t^Q\left[ \frac{C_T}{V_T} \right] = E_t^Q\left[ e^{-r(T-t)} \max(S_T - K, 0) \right]$$

It can be demonstrated by direct computation
that the above formula is equal to the Black-Scholes
option pricing formula presented earlier
in this entry.
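The equality can also be checked by simulation rather than by direct computation. The sketch below is an illustration under stated assumptions, not the entry's own derivation: under $Q$ the stock has drift $r$, so $S_T = S_0 \exp((r - \tfrac{1}{2}\sigma^2)T + \sigma \hat{B}_T)$, and the discounted expected payoff is estimated by Monte Carlo at $t = 0$ and compared with the closed-form price. Function names and parameter values are assumptions.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s, k, r, sigma, tau):
    """Closed-form Black-Scholes call price with time to expiry tau."""
    z = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return s * norm_cdf(z) - math.exp(-r * tau) * k * norm_cdf(z - sigma * math.sqrt(tau))

def monte_carlo_call(s0, k, r, sigma, horizon, n_paths, seed=0):
    """Estimate E^Q[e^{-rT} max(S_T - K, 0)] with S_T simulated under Q (drift r)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * horizon) * total / n_paths

# Illustrative (assumed) inputs.
mc_price = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
closed_form = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(mc_price, closed_form)
```

The two numbers agree up to Monte Carlo sampling error. The real-world drift of the stock is never needed, because the expectation is taken under the equivalent martingale measure.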


EQUIVALENT MARTINGALE 
MEASURES AND COMPLETE 
MARKETS 

In the continuous-state, continuous-time setting,
a market is said to be complete if any finite-variance
random variable $Y$ can be obtained as
the terminal value at time $T$ of a self-financing
trading strategy $\theta$: $Y = \theta_T X_T$. A fundamental
theorem of arbitrage pricing states that, in the
absence of arbitrage, a market is complete if and
only if there is a unique equivalent martingale
measure. This condition can be made more specific
given that the market is populated with assets
that follow Ito processes. Suppose that the
price process is $X = (V, S^1, \ldots, S^{N-1})$ where, as
in the previous section:

$$dS_t^i = \mu_t^i \, dt + \sigma_t^i \, dB_t$$

$$dV_t = r V_t \, dt$$

and $B$ is a standard Brownian motion
$B = (B^1, \ldots, B^D)$ in $\mathbb{R}^D$.

It can be demonstrated that markets are complete
if and only if $\mathrm{rank}(\sigma) = D$ almost everywhere.
This condition should be compared with
the conditions for completeness we established
in the discrete-state setting. In that setting, we
demonstrated that markets are complete if and
only if the number of linearly independent price
processes is equal to the maximum number of
branches leaving a node. In fact, market completeness
is equivalent to the possibility of solving
a linear system with as many equations as
branches leaving each node.

In the present continuous-state setting, there
are infinitely many states and so we need different types
of considerations. Roughly speaking, each price
process (which is an Ito process) depends on $D$
independent sources of uncertainty as we assume
that the standard Brownian motion is $D$-dimensional.
In a finite-state setting this means
that, if processes are Markovian, at each time
step any process can jump to $D$ different values.
The market is complete if there are $D$ independent
price processes. Note that the number $D$ is
arbitrary.
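The rank condition can be illustrated with a small computation. The sketch below is an assumption-laden example rather than anything from the original entry: it treats the diffusion matrix $\sigma$ as constant (in the text it may depend on time and state, and the condition must hold almost everywhere), and computes its rank with a simple Gaussian elimination written for this purpose. The matrices are made up for illustration.

```python
def matrix_rank(rows, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the row (at or below the current one) with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) <= tol:
            continue                      # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
    return rank

D = 2  # two independent Brownian motions (assumed for the example)

# Two stocks loading differently on the two sources of uncertainty.
sigma_complete = [[0.2, 0.0],
                  [0.1, 0.3]]
# Two stocks whose loadings are proportional: only one source is spanned.
sigma_incomplete = [[0.2, 0.1],
                    [0.4, 0.2]]

print(matrix_rank(sigma_complete) == D)     # rank 2: the rank condition holds
print(matrix_rank(sigma_incomplete) == D)   # rank 1: the rank condition fails
```

In the second case the two stocks are driven by the same combination of the Brownian motions, so there are fewer independent price processes than sources of uncertainty.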


EQUIVALENT MARTINGALE 
MEASURES AND STATE 
PRICES 

We will now show that equivalent martingale
measures and state prices are the same concept.
We use the same setting as in the previous sections.
Suppose that $Q$ is an equivalent martingale
measure after deflation by the process

$$\frac{1}{V_t} = \exp\left( -\int_0^t r_u \, du \right)$$

where $r$ is a bounded short-rate process. The
density process for $Q$ is defined as

$$\xi_t = E_t\left[ \frac{dQ}{dP} \right]$$

where

$$\frac{dQ}{dP}$$

is the Radon-Nikodym derivative of $Q$ with respect
to $P$. As in the discrete-state setting, the
Radon-Nikodym derivative of $Q$ with respect
to $P$ is a random variable

$$\xi = \frac{dQ}{dP}$$

with average value on the entire space equal to 1
and such that, for every event $A$, the probability
of $A$ under $Q$ is the average of $\xi$ over $A$:

$$Q(A) = E^P[\xi I_A]$$

It can be demonstrated that, given any $\mathcal{F}_s$-measurable
random variable $W$, with $s \geq t$, the density
process for $Q$ has the following property:

$$E_t^Q[W] = \frac{E_t^P[\xi_s W]}{\xi_t}$$

To gain an intuition for the Radon-Nikodym
derivative in a continuous-state setting, let's
assume that the probability space is the real
line equipped with the Borel $\sigma$-algebra and
with a probability measure $P$. In this case,
$\xi = \xi(x)$, $\mathbb{R} \to \mathbb{R}$, and we can write

$$Q(A) = \int_A \xi \, dP$$

or, $dQ = \xi \, dP$. Given any random variable $X$
with density $f$ under $P$ and density $q$ under $Q$,
we can then write

$$q(x) = \xi(x) f(x)$$

In other words, the random variable $\xi$ is a function
that multiplies the density $f$ to yield the
density $q$.

We can now show the following key result.
Given an equivalent martingale measure with
density process $\xi_t$, a state-price deflator is given
by the process

$$\pi_t = \xi_t \exp\left( -\int_0^t r_u \, du \right)$$

Conversely, given a state-price deflator $\pi_t$, the
density process

$$\xi_t = \exp\left( \int_0^t r_u \, du \right) \frac{\pi_t}{\pi_0}$$

defines an equivalent martingale measure. In
fact, suppose that $Q$ is an equivalent martingale
measure for $X^Y$ with $\pi_t = \xi_t Y_t$ where

$$Y_t = \exp\left( -\int_0^t r_u \, du \right)$$

Then, using the above relationship, we can
write, for $s > t$:

$$E_t[\pi_s X_s] = E_t[\xi_s X_s^Y] = \xi_t E_t^Q[X_s^Y] = \xi_t X_t^Y = \pi_t X_t$$

which shows that $\pi_t$ is a state-price deflator. The
same reasoning in reverse order demonstrates
that if $\pi_t$ is a state-price deflator then:

$$\xi_t = \exp\left( \int_0^t r_u \, du \right) \frac{\pi_t}{\pi_0}$$

is a density process for $Q$.
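The relationship $\pi_t = \xi_t e^{-\int_0^t r_u du}$ can be illustrated in the Black-Scholes market. The sketch below is not part of the original entry and rests on assumptions: a constant rate $r$, a stock following a geometric Brownian motion with drift $\mu$ and volatility $\sigma$, and the standard explicit density $\xi_T = \exp(-\theta B_T - \tfrac{1}{2}\theta^2 T)$ with $\theta = (\mu - r)/\sigma$, which the text does not write out. Simulating under the real measure $P$, the deflated stock and bank account should have expectations equal to their initial values, since $\pi_0 = 1$.

```python
import math
import random

def deflated_expectations(s0, mu, r, sigma, horizon, n_paths, seed=0):
    """Monte Carlo estimates of E^P[pi_T S_T] and E^P[pi_T V_T], with V_0 = 1."""
    rng = random.Random(seed)
    theta = (mu - r) / sigma                   # solves sigma * theta = mu - r
    sum_stock = sum_bank = 0.0
    for _ in range(n_paths):
        b = math.sqrt(horizon) * rng.gauss(0.0, 1.0)              # B_T under P
        xi = math.exp(-theta * b - 0.5 * theta ** 2 * horizon)    # assumed density xi_T
        pi = xi * math.exp(-r * horizon)                          # pi_T = xi_T * e^{-rT}
        s_t = s0 * math.exp((mu - 0.5 * sigma ** 2) * horizon + sigma * b)
        v_t = math.exp(r * horizon)                               # bank account V_T
        sum_stock += pi * s_t
        sum_bank += pi * v_t
    return sum_stock / n_paths, sum_bank / n_paths

# Illustrative (assumed) parameters.
stock_estimate, bank_estimate = deflated_expectations(
    s0=100.0, mu=0.08, r=0.03, sigma=0.2, horizon=1.0, n_paths=50_000)
print(stock_estimate, bank_estimate)   # close to 100 and 1
```

Both deflated prices behave as martingales under $P$ at the horizon checked, which is the defining property of a state-price deflator.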


ARBITRAGE PRICING WITH 
A PAYOFF RATE 

In the analysis thus far, we assumed that there
is no intermediate payoff. The owner of an
asset makes a profit or a loss due only to the
changes in value of the asset. Let's now introduce
a payoff-rate process $\delta_t^i$ for each asset $i$.
The payoff-rate process must be interpreted in
the sense that the cumulative payoff of each individual
asset is

$$D_t^i = \int_0^t \delta_s^i \, ds$$

We define a gain process

$$G_t^i = S_t^i + D_t^i$$

By the linearity of the Ito integrals, we can write
the gain of any trading strategy as

$$\int_0^t \theta_u \, dG_u = \int_0^t \theta_u \, dS_u + \int_0^t \theta_u \, dD_u$$

If there is a payoff-rate process, a self-financing
trading strategy is a trading strategy
such that the following relationship holds:

$$\theta_t S_t = \sum_i \theta_t^i S_t^i = \sum_i \theta_0^i S_0^i + \int_0^t \theta_u \, dG_u, \quad t \in [0, T]$$

An arbitrage is, as before, a self-financing trading
strategy such that

$$\theta_0 S_0 < 0 \text{ and } \theta_T S_T \geq 0, \quad \text{or} \quad \theta_0 S_0 \leq 0 \text{ and } \theta_T S_T > 0$$

The previous arguments extend to this case.
An equivalent martingale measure for the pair
$(D, S)$ is defined as an equivalent probability
measure $Q$ such that the Radon-Nikodym
derivative

$$\xi = \frac{dQ}{dP}$$

has finite variance and the process $G = S + D$
is a martingale. Under these conditions, the following
relationship holds:

$$S_t = E_t^Q\left[ S_T + \int_t^T \delta_s \, ds \right]$$

IMPLICATIONS OF THE 
ABSENCE OF ARBITRAGE 

We saw that the existence of an equivalent martingale
measure or of state-price deflators implies
absence of arbitrage. We have also seen
that, in the absence of arbitrage, markets are
complete if and only if there is a unique equivalent
martingale measure.

In a discrete-state, discrete-time context we
could establish the complete equivalence between
the existence of state-price deflators,
equivalent martingale measures, and absence of
arbitrage, in the sense that any of these conditions
implies the other two. In addition, the
existence of a unique equivalent martingale
measure implies absence of arbitrage and market
completeness.

In the present continuous-state context, however,
absence of arbitrage implies the existence
of an equivalent martingale measure and of
state-price deflators only under rather restrictive
and complex technical conditions. If we
want to relax these conditions, the condition
of absence of arbitrage has to be slightly modified.
These discussions are quite technical and
will not be presented in this entry.³

WORKING WITH 
EQUIVALENT MARTINGALE 
MEASURES 

The concepts established in the preceding sections of this entry might seem
very complex, abstract, and scarcely useful. On the contrary, they entail
important simplifications in the computation of derivative prices. Applications
of these computations can be found in the pricing of bonds and credit
derivatives. Here we want to make a few general comments on how these tools are
used.

The key result of the arbitrage pricing theory is that, under the equivalent
martingale measure, all discounted price processes become martingales and all
price processes have the same drift. Therefore, all calculations can be
performed under the assumption that the change to an equivalent martingale
measure has been made. This environment allows important simplifications. For
example, as we have seen, the option pricing problem becomes a problem of
computing the present value of simpler processes.

Obviously one has to go back to a real environment at the end of the pricing
exercise. This is essentially a calibration problem, as risk-neutral
probabilities have to be estimated from real probabilities. Despite this
complication, the equivalent martingale methodology has proved to be an
important tool in derivative pricing.

KEY POINTS 

• A trading strategy is a vector-valued process that represents portfolio
weights at each moment.

• Trading gains are defined as stochastic integrals.

• A self-financing trading strategy is one whose value at every moment is the
initial value plus the trading gains at that moment.

• An arbitrage is a self-financing trading strategy whose initial value is
either negative and the final value nonnegative, or the initial value
nonpositive and the final value positive.

• The Black-Scholes option pricing formula can be established by replicating
self-financing trading strategies.

• The Black-Scholes pricing argument is based on constructing a self-financing
trading strategy that replicates the option price in each state and for each
time.

• Absence of arbitrage implies that a replicating self-financing trading
strategy must have the same price as the option.

• The Black-Scholes option pricing formula is obtained by solving the partial
differential equation implied by the equality of the replicating self-financing
trading strategy and the option price process.

• A deflator is any strictly positive Ito process; a state-price deflator is a
deflator with the property that the deflated price process is a martingale.

• If there is a (regular) state-price deflator, then there is no arbitrage; the
converse is true only under a number of technical conditions.

• Two probability measures are said to be equivalent if they assign probability
zero to the same events.

• Given a process X on a probability space with probability measure P, the
probability measure Q is said to be an equivalent martingale measure if it is
equivalent to P and X is a martingale with respect to Q (plus other
conditions).

• If there is a regular deflator such that the deflated price process admits an
equivalent martingale measure, then there is no arbitrage.

• Under the equivalent martingale measure, all Ito price processes have the same
drift.

• In the absence of arbitrage, a market is complete if and only if there is a
unique equivalent martingale measure.

NOTES 

1. One can visualize this process as a tree structure with an infinite number
of branches and an infinite number of branching points. However, as the number
of branches and of branching points is a continuum, intuition might be
misleading.

2. The theory of equivalent martingale measures was developed in Harrison and
Pliska (1981, 1985) and Harrison and Kreps (1979).

3. See Delbaen and Schachermayer (1994, 1999).


Abstract: One of the basic mechanisms of learning is assimilating the information arriving from
the external environment and then updating the existing knowledge base with that information. 
This mechanism lies at the heart of the Bayesian framework. A Bayesian decision maker learns 
by revising beliefs in light of the new data that become available. From the Bayesian point of 
view, probabilities are interpreted as degrees of belief. Therefore, the Bayesian learning process 
consists of revising probabilities. Contrast this with the way probability is interpreted in the classical 
(frequentist) statistical theory—as the relative frequency of occurrence of an event in the limit, as the 
number of observations goes to infinity. Bayes' theorem provides the formal means of putting that 
mechanism into action; it is a simple expression combining the knowledge about the distribution 
of the model parameters and the information about the parameters contained in the data. 


Quantitative financial models describe in mathematical terms the relationships
between financial random variables through time and/or across assets. The
fundamental assumption is that the model relationship is valid independent of
the time period or the asset class under consideration. Financial data contain
both meaningful information and random noise. An adequate financial model not
only extracts optimally the relevant information from the historical data but
also performs well when tested with new data. The uncertainty brought about by
the presence of data noise makes imperative the use of statistical analysis as
part of the process of financial model building, model evaluation, and model
testing.

Statistical analysis is employed from the vantage point of either of the two
main statistical philosophical traditions: frequentist and Bayesian. An
important difference between the two lies with the interpretation of the
concept of probability. As the name suggests, advocates of frequentist
statistics adopt a frequentist interpretation: The probability of an event is
the limit of its long-run relative frequency (i.e., the frequency with which it
occurs as the amount of data increases without bound). Strict adherence to this
interpretation is not always possible in practice. When studying rare events,
for instance, large samples of data may not be available and in such cases
proponents of frequentist statistics resort to theoretical results. The
Bayesian view of the world is based on the subjectivist interpretation of
probability: Probability is subjective, a degree of belief that is updated as
information or data are acquired.

The concept of subjective probability is derived from arguments for rationality
of the preferences of agents. It originated in the 1930s with the (independent)
works of Bruno de Finetti (1931) and Frank Ramsey (1931), and was further
developed by Leonard Savage (1954) and Dennis Lindley (1971). The subjective
probability interpretation can be traced back to the Scottish philosopher and
economist David Hume, who also had philosophical influence over Harry Markowitz
(in Markowitz's own words in his autobiography published in Les Prix Nobel,
1991).

Closely related to the concept of probability is that of uncertainty.
Proponents of the frequentist approach consider the source of uncertainty to be
the randomness inherent in realizations of a random variable. The probability
distributions of variables are not subject to uncertainty. In contrast,
Bayesian statistics treats probability distributions as uncertain and subject to
modification as new information becomes available. Uncertainty is implicitly
incorporated by probability updating. The probability beliefs based on the
existing knowledge base take the form of the prior probability. The posterior
probability represents the updated beliefs.

Since the beginning of the last century, when quantitative methods and models
became a mainstream tool to aid in understanding financial markets and
formulating investment strategies, the framework applied in finance has been
the frequentist approach. The term frequentist usually refers to the Fisherian
philosophical approach named after Sir Ronald Fisher.

Strictly speaking, "Fisherian" has a broader meaning as it includes not only
frequentist statistical concepts such as unbiased estimators, hypothesis tests,
and confidence intervals, but also the maximum likelihood estimation framework
pioneered by Fisher. Only in the last two decades has Bayesian statistics
started to gain greater acceptance in financial modeling, despite its
introduction about 250 years ago by Thomas Bayes, a British minister and
mathematician. It has been the advancements of computing power and the
development of new computational methods that have fostered the growing use of
Bayesian statistics in finance.

On the applicability of the Bayesian conceptual framework, consider an excerpt
from the speech of the former chairman of the Board of Governors of the Federal
Reserve System, Alan Greenspan, at the Meeting of the American Statistical
Association in San Diego, California, January 3, 2004:

The Federal Reserve's experiences over the past two decades make it clear that
uncertainty is not just a pervasive feature of the monetary policy landscape; it
is the defining characteristic of that landscape. The term "uncertainty" is
meant here to encompass both "Knightian uncertainty," in which the probability
distribution of outcomes is unknown, and "risk," in which uncertainty of
outcomes is delimited by a known probability distribution. ... This conceptual
framework emphasizes understanding as much as possible the many sources of risk
and uncertainty that policymakers face, quantifying those risks when possible,
and assessing the costs associated with each of the risks. In essence, the risk
management approach to monetary policymaking is an application of Bayesian
[decision making].

The three steps of Bayesian decision making that Alan Greenspan outlines are:

1. Formulating the prior probabilities to reflect existing information.

2. Constructing the quantitative model, taking care to incorporate the
uncertainty intrinsic in model assumptions.

3. Selecting and evaluating a utility function describing how uncertainty
affects alternative model decisions.

While these steps constitute the rigorous approach to Bayesian decision making,
applications of Bayesian methods to financial modeling often only involve the
first two steps or even only the second step. This tendency is a reflection of
the pragmatic Bayesian approach that financial modelers often favor.

Applications of the Bayesian framework to financial modeling include:

• Bayesian approach to mean-variance portfolio selection.

• Reflecting degrees of belief in an asset pricing model when selecting an
optimal portfolio.

• Bayesian methods of portfolio selection within the context of the
Black-Litterman model.

• Computing measures of market efficiency.

• Estimating complex volatility models.

All of these applications are presented in Rachev et al. (2008).

In this entry, we discuss some of the basic principles of Bayesian analysis.


THE LIKELIHOOD FUNCTION 

Suppose we are interested in analyzing the returns on a given stock and have
available a historical record of returns. Any analysis of these returns, beyond
a very basic one, would require that we make an educated guess about (propose)
a process that might have generated these return data. Assume that we have
decided on some statistical distribution and denote it by

$$p(y \mid \theta) \tag{1}$$

where y is a realization of the random variable Y (stock return) and $\theta$
is a parameter specific to the distribution, p. Assuming that the distribution
we proposed is the one that generated the observed data, we draw a conclusion
about the value of $\theta$. Obviously, central to that goal is our ability to
summarize the information contained in the data. The likelihood function is a
statistical construct with this precise role. Denote the n observed stock
returns by $y_1, y_2, \ldots, y_n$. The joint density function of Y, for a
given value of $\theta$, is

$$f(y_1, y_2, \ldots, y_n \mid \theta)$$

By using the term "density function," we implicitly assume that the
distribution chosen for the stock return is continuous, which is invariably the
case in financial modeling.

We can observe that the function above can also be treated as a function of the
unknown parameter, $\theta$, given the observed stock returns. That function of
$\theta$ is called the likelihood function. We write it as

$$L(\theta \mid y_1, y_2, \ldots, y_n) = f(y_1, y_2, \ldots, y_n \mid \theta) \tag{2}$$

Suppose we have determined from the data two competing values of $\theta$,
$\theta_1$ and $\theta_2$, and want to determine which one is more likely to be
the true value (at least, which one is closer to the true value). The
likelihood function helps us make that decision. Assuming that our data were
indeed generated by the distribution in (1), $\theta_1$ is more likely than
$\theta_2$ to be the true parameter value whenever
$L(\theta_1 \mid y_1, y_2, \ldots, y_n) > L(\theta_2 \mid y_1, y_2, \ldots, y_n)$.
This observation provides the intuition behind the method most often employed
in "classical" statistical inference to estimate $\theta$ from the data alone:
the method of maximum likelihood. The value of $\theta$ most likely to have
yielded the observed sample of stock return data, $y_1, y_2, \ldots, y_n$, is
the maximum likelihood estimate, $\hat{\theta}$, obtained from maximizing the
likelihood function in (2).

To illustrate the concept of a likelihood function, we briefly discuss two
examples: one based on the Poisson distribution (a discrete distribution) and
another based on the normal distribution (one of the most commonly employed
continuous distributions).


The Poisson Distribution 
Likelihood Function 

The Poisson distribution is often used to describe the random number of events
occurring within a certain period of time. It has a single parameter,
$\theta$, indicating the rate of occurrence of the random event, that is, how
many events happen on average per unit of time. The probability distribution of
a Poisson random variable, X, is described by the following expression:

$$P(X = k) = \frac{\theta^k}{k!} e^{-\theta}, \quad k = 0, 1, 2, \ldots \tag{3}$$

The Poisson distribution is employed in the context of finance (most often, but
not exclusively, in the areas of credit risk and operational risk) as the
distribution of a stochastic process, called the Poisson process, which governs
the occurrences of random events.

Suppose we are interested in examining the annual number of defaults of North
American corporate bond issuers and we have gathered a sample of data for the
period from 1986 through 2005. Assume that these corporate defaults occur
according to a Poisson distribution. Denoting the 20 observations by
$x_1, x_2, \ldots, x_{20}$, we write the likelihood function for the Poisson
parameter $\theta$ (the average rate of defaults) as¹

$$L(\theta \mid x_1, x_2, \ldots, x_{20}) = \prod_{i=1}^{20} P(X = x_i \mid \theta) = \prod_{i=1}^{20} \frac{\theta^{x_i}}{x_i!} e^{-\theta} = \frac{\theta^{\sum_{i=1}^{20} x_i}}{\prod_{i=1}^{20} x_i!} e^{-20\theta} \tag{4}$$

It is often customary to retain in the expressions for the likelihood function
and the probability distributions only the terms that contain the unknown
parameter(s); that is, we drop the terms that are constant with respect to the
parameters. Thus, (4) could be written as

$$L(\theta \mid x_1, x_2, \ldots, x_{20}) \propto \theta^{\sum_{i=1}^{20} x_i} e^{-20\theta} \tag{5}$$

where $\propto$ denotes "proportional to." Clearly, for a given sample of data,
the expressions in (4) and (5) are proportional to each other and therefore
contain the same information about $\theta$. Maximizing either of them with
respect to $\theta$, we obtain that the maximum likelihood estimator of the
Poisson parameter, $\hat{\theta}$, is the sample mean, $\bar{x}$:

$$\hat{\theta} = \bar{x} = \frac{\sum_{i=1}^{20} x_i}{20}$$

For the 20 observations of annual corporate defaults, we get a sample mean of
51.6. The Poisson probability distribution function (evaluated at $\theta$
equal to its maximum-likelihood estimate, $\hat{\theta} = 51.6$) and the
likelihood function for $\theta$ can be visualized, respectively, in the
left-hand-side and right-hand-side plots in Figure 1.
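
The claim that the sample mean maximizes the Poisson likelihood can be checked
numerically. The sketch below is only an illustration: the default counts are
made up (the 1986-2005 sample is not reproduced in this entry), and the helper
name `poisson_log_likelihood` is an assumption of this example. It works with
the logarithm of (4), which has the same maximizer as the likelihood itself.

```python
import math

def poisson_log_likelihood(theta, counts):
    """Log of the Poisson likelihood in (4), including the factorial term."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in counts)

# Hypothetical annual default counts (illustrative only, not the book's sample).
counts = [30, 45, 52, 61, 48, 39, 70, 66, 55, 50]

# Closed-form maximum likelihood estimate: the sample mean.
theta_hat = sum(counts) / len(counts)

# Crude grid search around the sample mean to confirm it is the maximizer.
grid = [theta_hat + 0.01 * k for k in range(-500, 501)]
theta_grid = max(grid, key=lambda t: poisson_log_likelihood(t, counts))

print(theta_hat, theta_grid)
```

Dropping the `math.lgamma` term, as in (5), would shift every log-likelihood
value by the same constant and leave the maximizer unchanged.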


The Normal Distribution 
Likelihood Function 

The normal distribution (also called the Gaussian distribution) has been the
predominant distribution of choice in finance because of the relative ease of
dealing with it and the availability of attractive theoretical results resting
on it.² It is certainly one of the most important distributions in statistics.
Two parameters describe the normal distribution: the location parameter,
$\mu$, which is also its mean, and the scale (dispersion) parameter,
$\sigma$, also called standard deviation. The probability density function of a
normally distributed random variable Y is expressed as

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}} \tag{6}$$

where y and $\mu$ could take any real value and $\sigma$ can only take positive
values. We denote the distribution of Y by $Y \sim N(\mu, \sigma)$. The normal
density is symmetric around the mean, $\mu$, and its plot resembles a bell.

Figure 1 The Poisson Distribution Function and Likelihood Function
Note: The graph on the left represents the mass function of the Poisson random
variable evaluated at the maximum-likelihood estimate, $\hat{\theta} = 51.6$.
The graph on the right represents the likelihood function for the parameter of
the Poisson distribution.


Suppose we have gathered daily dollar return data on the MSCI-Germany Index for
the period January 2, 1998, through December 31, 2003 (a total of 1,548
returns), and we assume that the daily return is normally distributed. Then,
given the realized index returns (denoted by $y_1, y_2, \ldots, y_{1548}$), the
likelihood function for the parameters $\mu$ and $\sigma$ is written in the
following way:

$$L(\mu, \sigma \mid y_1, y_2, \ldots, y_{1548}) = \prod_{i=1}^{1548} f(y_i) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{1548} e^{-\sum_{i=1}^{1548} \frac{(y_i-\mu)^2}{2\sigma^2}} \propto \sigma^{-1548}\, e^{-\sum_{i=1}^{1548} \frac{(y_i-\mu)^2}{2\sigma^2}} \tag{7}$$


We again implicitly assume that the MSCI-Germany index returns are
independently and identically distributed (IID), that is, each daily return is
a realization from a normal distribution with the same mean and standard
deviation.

In the case of the normal distribution, since the likelihood is a function of
two arguments, we can visualize it with a three-dimensional surface as in
Figure 2. It is also useful to plot the so-called contours of the likelihood,
which we obtain by "slicing" the shape in Figure 2 horizontally at various
levels of the likelihood. Each point on a contour corresponds to a pair of
parameter values, and all points on the same contour share the same likelihood
value. In Figure 3, for example, we could observe that the pair
$(\mu, \sigma) = (-0.23 \times 10^{-3}, 0.31 \times 10^{-3})$, with a
likelihood value of 0.6, is more likely than the pair
$(\mu, \sigma) = (0.096 \times 10^{-3}, 0.33 \times 10^{-3})$, with a
likelihood value of 0.1, since the corresponding likelihood is larger.
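
The comparison of parameter pairs by their likelihood can be sketched in a few
lines. The returns below are made up (the MSCI-Germany sample is not
reproduced here), and `normal_log_likelihood` is a name chosen for this
example. The code evaluates the logarithm of (7) and uses the standard
closed-form maximum likelihood estimates: the sample mean for $\mu$ and the
root of the average squared deviation (divisor n, not n - 1) for $\sigma$.

```python
import math

def normal_log_likelihood(mu, sigma, ys):
    """Log of the normal likelihood in (7), constants included."""
    n = len(ys)
    sum_sq = sum((y - mu) ** 2 for y in ys)
    return -n * math.log(math.sqrt(2 * math.pi) * sigma) - sum_sq / (2 * sigma ** 2)

# Hypothetical daily returns (illustrative only).
ys = [0.012, -0.008, 0.003, -0.015, 0.007, 0.001, -0.004, 0.010]

# Closed-form maximum likelihood estimates.
mu_hat = sum(ys) / len(ys)
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in ys) / len(ys))

# Any other pair of parameter values has a lower (log-)likelihood.
ll_hat = normal_log_likelihood(mu_hat, sigma_hat, ys)
ll_other = normal_log_likelihood(mu_hat + 0.005, 1.5 * sigma_hat, ys)

print(mu_hat, sigma_hat, ll_hat, ll_other)
```

Working on the log scale is a design choice: with 1,548 observations the
likelihood itself would be a product of many small or large factors and could
overflow or underflow in floating-point arithmetic.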


Figure 2 The Likelihood Function for the Parameters of the Normal Distribution

Figure 3 The Likelihood Function for the Parameters of the Normal Distribution:
Contour Plot (contours labeled at likelihood levels 0.1 and 0.6)


BAYES' THEOREM

Bayes' theorem is the cornerstone of the Bayesian framework. Formally, it is a
result from introductory probability theory, linking the unconditional
distribution of a random variable with its conditional distribution. For
Bayesian proponents, it is the representation of the philosophical principle
underlying the Bayesian framework that probability is a measure of the degree
of belief one has about an uncertain event. Bayes' theorem is a rule that can
be used to update the beliefs that one holds in light of new information (for
example, observed data).

We first consider the discrete version of Bayes' theorem. Denote the evidence
prior to observing the data by E and suppose that a researcher's belief in it
can be expressed as the probability P(E). Bayes' theorem tells us that, after
observing the data, D, the belief in E is adjusted according to the following
expression:

$$P(E \mid D) = \frac{P(D \mid E) \times P(E)}{P(D)} \tag{8}$$

where:

1. P(D | E) is the conditional probability of the data given that the prior
evidence, E, is true.

2. P(D) is the unconditional (marginal) probability of the data, P(D) > 0; that
is, the probability of D irrespective of E, also expressed as

$$P(D) = P(D \mid E) \times P(E) + P(D \mid E^c) \times P(E^c)$$

where $E^c$ denotes the event complementary to E.³


The probability of E before seeing the data, P(E), is called the prior
probability, whereas the updated probability, P(E | D), is called the posterior
probability.⁴ Notice that the magnitude of the adjustment of the prior
probability, P(E), after observing the data is given by the ratio
P(D | E) / P(D). The conditional probability, P(D | E), when considered as a
function of E, is in fact the likelihood function, as will become clear further
below.

As an illustration, consider a manager in an event-driven hedge fund. The
manager is testing a strategy that involves identifying potential acquisition
targets and examines the effectiveness of various company screens, in
particular the ratio of stock price to free cash flow per share (PFCF). Let us
define the following events:

D = Company X's PFCF has been more than three times lower than the sector
average for the past three years.

E = Company X becomes an acquisition target in the course of a given year.

Independently of the screen, the manager assesses the probability of company X
being targeted at 40%. That is, denoting by $E^c$ the event that X does not
become a target in the course of the year, we have

$$P(E) = 0.4$$

and

$$P(E^c) = 0.6$$

Suppose further that the manager's analysis suggests that the probability a
target company's PFCF has been more than three times lower than the sector
average for the past three years is 75%, while the probability that a nontarget
company has had that low a PFCF for the past three years is 35%:

$$P(D \mid E) = 0.75$$

and

$$P(D \mid E^c) = 0.35$$

Given that company X has been flagged by the manager's screen (event D), what
is the chance that it becomes an acquisition target? To answer this question,
the manager needs to update the prior probability P(E) and compute the
posterior probability P(E | D). Applying (8), we obtain

$$P(E \mid D) = \frac{0.75 \times 0.4}{0.75 \times 0.4 + 0.35 \times 0.6} \approx 0.59 \tag{9}$$

After taking into account the company's persistently low PFCF, the probability
of a takeover increases from 40% to 59%.
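
The arithmetic in (9) is easy to reproduce. The following is a minimal sketch
of the discrete update in (8) for an event and its complement; the function
and argument names are chosen for this example only.

```python
def posterior_probability(prior, p_data_given_e, p_data_given_not_e):
    """Discrete Bayes' theorem (8) for an event E and its complement."""
    # Marginal probability of the data, P(D), by the law of total probability.
    marginal = p_data_given_e * prior + p_data_given_not_e * (1.0 - prior)
    return p_data_given_e * prior / marginal

# Inputs taken from the hedge fund illustration in the text.
posterior = posterior_probability(prior=0.4,
                                  p_data_given_e=0.75,
                                  p_data_given_not_e=0.35)
print(round(posterior, 4))  # 0.5882
```

The adjustment factor P(D | E) / P(D) here is 0.75 / 0.51, roughly 1.47, which
is what lifts the prior of 40% to a posterior of about 59%.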

In financial applications, the continuous version of Bayes' theorem (presented
later) is predominantly used. Nevertheless, the discrete form has some
important uses, two of which we briefly outline now.


Bayes' Theorem and Model 
Selection 

The usual approach to modeling a financial phenomenon is to specify the
analytical and distributional properties of a process that one thinks generated
the observed data and treat this process as if it were the true one. Clearly,
in doing so, one introduces a certain amount of error into the estimation
process. Accounting for model risk might be no less important than accounting
for (within-model) parameter uncertainty, although it seems to preoccupy
researchers less often.

One usually entertains a small number of models as plausible ones. The idea of
applying Bayes' theorem to model selection is to combine the information
derived from the data with the prior beliefs one has about the degree of model
validity. One can then select the single "best" model with the highest
posterior probability and rely on the inference provided by it, or one can
weigh the inference of each model by its posterior probability and obtain an
"averaged-out" conclusion.
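
Both options, picking the single best model and averaging across models, can
be sketched with the discrete form of Bayes' theorem. Everything numerical
below is hypothetical: the prior model probabilities, the values standing in
for the probability of the data under each model, and the per-model forecasts
are invented for illustration, and the function name is an assumption of this
example.

```python
def posterior_model_probs(priors, data_probs):
    """Posterior model probabilities: prior times P(data | model), normalized."""
    weights = [p * m for p, m in zip(priors, data_probs)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs for three candidate models.
priors = [0.5, 0.3, 0.2]          # prior belief in each model
data_probs = [0.02, 0.05, 0.01]   # assumed P(data | model)
forecasts = [0.04, 0.07, 0.01]    # each model's forecast of some quantity

post = posterior_model_probs(priors, data_probs)

# Option 1: select the model with the highest posterior probability.
best = max(range(len(post)), key=post.__getitem__)

# Option 2: weigh each model's forecast by its posterior probability.
averaged = sum(p * f for p, f in zip(post, forecasts))

print(post, best, averaged)
```

In this made-up case the second model ends up with the highest posterior
probability even though the first had the highest prior, because the data are
assumed to be more probable under the second model.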


Bayes' Theorem and Classification 

Classification refers to assigning an object, based on its characteristics,
into one out of several categories. It is most often applied in the area of
credit and insurance risk, when a creditor (an insurer) attempts to determine
the creditworthiness (riskiness) of a potential borrower (policyholder).
Classification is a statistical problem because of the existence of information
asymmetry: the creditor's (insurer's) aim is to determine with very high
probability the unknown status of the borrower (policyholder). For example,
suppose that a bank would like to rate a borrower into one of three categories:
low risk (L), medium risk (M), and high risk (H). It collects data on the
borrower's characteristics such as the current ratio, the debt-to-equity ratio,
the interest coverage ratio, and the return on capital. Denote these observed
data by the four-dimensional vector y. The dynamics of y depends on the
borrower's category and is described by one of three (multivariate)
distributions,

$$f(y \mid C = L), \quad f(y \mid C = M), \quad \text{or} \quad f(y \mid C = H)$$

where C is a random variable describing the category. Let the bank's belief
about the borrower's category be $\pi_i$, where

$$\pi_1 = \pi(C = L), \quad \pi_2 = \pi(C = M), \quad \text{and} \quad \pi_3 = \pi(C = H)$$

The discrete version of Bayes' theorem can be employed to evaluate the
posterior (updated) probability, $\pi(C = i \mid y)$, i = L, M, H, that the
borrower belongs to each of the three categories.
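
A stripped-down version of this classification can be sketched as follows. To
keep the example short it uses a single characteristic (a debt-to-equity
ratio) instead of the four-dimensional vector y, and it assumes normal
class-conditional densities; the priors and the density parameters are
hypothetical and are not taken from the text.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and std. deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical prior beliefs about the borrower's category.
priors = {"L": 0.5, "M": 0.3, "H": 0.2}

# Hypothetical (mean, sd) of the debt-to-equity ratio within each category.
params = {"L": (0.5, 0.3), "M": (1.2, 0.4), "H": (2.5, 0.8)}

def classify(y):
    """Posterior probability of each category given the observed ratio y."""
    joint = {c: priors[c] * normal_pdf(y, *params[c]) for c in priors}
    total = sum(joint.values())  # marginal density of y
    return {c: v / total for c, v in joint.items()}

posterior = classify(1.0)
print(posterior, max(posterior, key=posterior.get))
```

With these assumed inputs, an observed ratio of 1.0 makes the medium-risk
category the most probable one, even though the low-risk category has the
largest prior.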

Let us now take our first steps in illustrating how Bayes' theorem helps in
making inferences about an unknown distribution parameter.

Figure 4 The Number of Consecutive Trade-by-Trade Price Increases
Note: X = number of occurrences of A = 2 within the sample period


Bayesian Inference for the Binomial 
Probability 

Suppose we are interested in analyzing the dynamic properties of the intraday
price changes for a stock. In particular, we want to evaluate the probability
of consecutive trade-by-trade price increases. In an oversimplified scenario,
this problem could be formalized as a binomial experiment.

The binomial experiment is a setting in which the source of randomness is a
binary one (only takes on two alternative modes/states) and the probability of
both states is constant throughout. The binomial random variable is the number
of occurrences of the state of interest. In our illustration, the two states
are "the consecutive trade-by-trade price change is an increase" and "the
consecutive trade-by-trade price change is a decrease or null." The random
variable is the number of consecutive price increases. Denote it by X. Denote
the probability of a consecutive increase by $\theta$. Our goal is to draw a
conclusion about the unknown probability, $\theta$.

As an illustration, we consider the transaction data for the AT&T stock during
the two-month period from January 4, 1993, through February 26, 1993 (a total
of 55,668 price records). The diagram in Figure 4 shows how we define the
binomial random variable given six price observations, $P_1, \ldots, P_6$.
(Notice that the number of price changes is one less than the number of price
records.) A consecutive price increase is "encoded" as A = 2 and its
probability is $\theta = P(A = 2)$; all other realizations of A (A = −2, −1, 0,
or 1) together have a probability of $1 - \theta$. We say that the number of
consecutive price increases, X, is distributed as a binomial random variable
with parameter $\theta$. The probability mass function of X is represented by
the expression

$$P(X = x \mid \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}, \quad x = 0, 1, 2, \ldots, n \tag{10}$$

where n is the sample size (the number of trade-by-trade price changes; a price
change could be zero) and $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$. During the
sample period, there are X = 176 trade-by-trade consecutive price increases.
This information is embodied in the likelihood function for $\theta$:

$$L(\theta \mid X = 176) = \theta^{176} (1 - \theta)^{55667 - 176} \tag{11}$$

We would like to combine that information with our prior belief about what the
probability of a consecutive price increase is. We denote the prior
distribution of an unknown parameter $\theta$ by $\pi(\theta)$, the posterior
distribution of $\theta$ by $\pi(\theta \mid \text{data})$, and the likelihood
function by $L(\theta \mid \text{data})$.

We consider two prior scenarios for the probability of consecutive price
increases, $\theta$:

1. We do not have any particular belief about the probability $\theta$. Then,
the prior distribution could be represented by a uniform distribution on the
interval [0, 1]. Note that this prior assumption implies an expected value for
$\theta$ of 0.5. The density function of $\theta$ is given by

$$\pi(\theta) = 1, \quad 0 \le \theta \le 1$$

2. Our intuition suggests that the probability of a consecutive price increase
is around 2%. A possible choice of a prior distribution for $\theta$ is the
beta distribution.⁵ The density function of $\theta$ is then written as

$$\pi(\theta \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad 0 \le \theta \le 1 \tag{12}$$

where $\alpha > 0$ and $\beta > 0$ are the parameters of the beta distribution
and $B(\alpha, \beta)$ is the so-called beta function. We set the parameters
$\alpha$ and $\beta$ to 1.6 and 78.4, respectively.

Figure 5 presents the plots of the two prior densities. Notice that under the
uniform prior, all values of $\theta$ are equally likely, while under the beta
prior, we assert higher prior probability for some values and lower prior
probability for others.

Combining the sample information with the prior beliefs, we obtain
$\theta$'s posterior distribution. We rewrite Bayes' theorem with the notation
in the current discussion:

$$\pi(\theta \mid x) = \frac{L(\theta \mid x)\,\pi(\theta)}{f(x)} \tag{13}$$

where f(x) is the unconditional (marginal) distribution of the random variable
X, given by

$$f(x) = \int L(\theta \mid x)\,\pi(\theta)\,d\theta \tag{14}$$

Figure 5 Density Curves of the Two Prior Distributions for the Binomial
Parameter, $\theta$
Note: The density curve on top is the uniform density, while the one at the
bottom is the beta density.

Since f(x) is obtained by averaging over all possible values of $\theta$, it
does not depend on $\theta$. Therefore, we can rewrite (13) as

$$\pi(\theta \mid x) \propto L(\theta \mid x)\,\pi(\theta) \tag{15}$$

The expression in (15) provides us with the posterior density of $\theta$ up to
some unknown constant. However, in certain cases we would still be able to
recognize the posterior distribution as a known distribution, as we see
shortly.⁶ Since both assumed prior distributions of $\theta$ are continuous,
the posterior density is also continuous and (13) and (15), in fact, represent
the continuous version of Bayes' theorem.
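
When the posterior is known only up to a constant, as in (15), the constant
can be recovered numerically. The sketch below is one way to do this under
stated assumptions: it evaluates likelihood (11) times the beta prior (12) on
an evenly spaced grid of $\theta$ values (the grid size and the midpoint rule
are choices of this example, not of the text) and normalizes so that the
density integrates to one. Logarithms are used because the raw products are
far too small for floating-point arithmetic.

```python
import math

x, n = 176, 55667   # data behind the likelihood in (11)
a, b = 1.6, 78.4    # beta prior parameters from the text

# Midpoints of 100,000 equal-width cells covering the interval (0, 1).
h = 1e-5
grid = [(k + 0.5) * h for k in range(100_000)]

def log_unnormalized_posterior(theta):
    """Log of likelihood (11) times prior (12), dropping constant factors."""
    log_lik = x * math.log(theta) + (n - x) * math.log(1 - theta)
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return log_lik + log_prior

log_post = [log_unnormalized_posterior(t) for t in grid]
peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
weights = [math.exp(lp - peak) for lp in log_post]

norm = sum(weights) * h                 # numerical stand-in for f(x) in (14)
density = [w / norm for w in weights]   # normalized posterior density on the grid
grid_mean = sum(t * d for t, d in zip(grid, density)) * h

print(grid_mean)
```

The grid approach works for any prior that can be evaluated pointwise, which
is useful when the posterior does not belong to a recognizable family.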

Let us see what the posterior distribution for
θ is under each of the two prior scenarios.

1. The posterior of θ under the uniform prior
scenario is written as

$$\begin{aligned}
p(\theta \mid x) &\propto L(\theta \mid x) \times 1 \\
&\propto \theta^{176}(1-\theta)^{55667-176} \\
&= \theta^{177-1}(1-\theta)^{55492-1}
\end{aligned} \tag{16}$$

where the first ∝ refers to omitting the
marginal data distribution term in (14), while
the second ∝ refers to omitting the constant
term from the likelihood function.

The expression $\theta^{177-1}(1-\theta)^{55492-1}$ above
resembles the density function of the beta
distribution in (12). The missing part is the
term B(177, 55492), which is a constant with
respect to θ. We call $\theta^{\alpha-1}(1-\theta)^{\beta-1}$ the kernel
of a beta distribution with parameters α
and β. Obtaining it is sufficient to identify
uniquely the posterior of θ as a beta distribution
with parameters α = 177 and β = 55492.
2. The beta distribution is the conjugate prior
distribution for the binomial parameter θ.
This means that the posterior distribution of
θ is also a beta distribution (of course, with
updated parameters):

$$\begin{aligned}
p(\theta \mid x) &\propto L(\theta \mid x)\,\pi(\theta) \\
&\propto \theta^{176}(1-\theta)^{55667-176}\,\theta^{1.6-1}(1-\theta)^{78.4-1} \\
&= \theta^{177.6-1}(1-\theta)^{55569.4-1}
\end{aligned} \tag{17}$$

where again we omit any constants with respect
to θ. As expected, we can recognize
the expression in the last line above as the
kernel of a beta distribution with parameters
α = 177.6 and β = 55569.4.
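The conjugate updates in (16) and (17) amount to adding the number of successes to α and the number of failures to β. A minimal Python sketch of that bookkeeping follows; the function name `beta_binomial_update` is an illustrative choice rather than anything from the text, and the uniform prior is treated as a Beta(1, 1) distribution.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the Beta posterior parameters after observing binomial data.

    Assumes a Beta(alpha, beta) prior and `successes` out of `trials`.
    """
    failures = trials - successes
    return alpha + successes, beta + failures


# Counts implied by the likelihood in (16): 176 successes in 55,667 observations.
uniform_post = beta_binomial_update(1.0, 1.0, 176, 55667)   # uniform prior = Beta(1, 1)
beta_post = beta_binomial_update(1.6, 78.4, 176, 55667)     # beta prior from (12)

print(uniform_post)
print(beta_post)
```

The two results reproduce, up to floating-point rounding, the posterior parameters quoted in the two scenarios above.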

Finally, we might want to obtain a single number
as an estimate of θ. In the classical (frequentist)
setting, the usual estimator of θ is the
maximum likelihood estimator (the value maximizing
the likelihood function in (11)), which
happens to be the sample proportion $\hat{\theta}$:

$$\hat{\theta} = \frac{176}{55667} = 0.00316 \tag{18}$$

or 0.316%.

In the Bayesian setting, one possible estimate
of θ is the posterior mean, that is, the mean of
θ's posterior distribution. Since the mean of the
beta distribution is given by α/(α + β), the posterior
mean of θ (the expected probability of
consecutive trade-by-trade increase in the price
of the AT&T stock) under the uniform prior
scenario is

$$\hat{\theta}_U = \frac{177}{177 + 55492} = 0.00318$$

or 0.318%, while the posterior mean of θ under
the beta prior scenario is

$$\hat{\theta}_B = \frac{177.6}{177.6 + 55569.4} = 0.00319$$

or 0.319%.

The two posterior estimates and the 
maximum-likelihood estimate are the same for 
all practical purposes. The reason is that the 
sample size is so large that the information con¬ 
tained in the data sample "swamps out" the 
prior information. 
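The three estimates can be checked with a few lines of Python. The sketch below also adds a small, purely hypothetical sample (2 successes in 10 trials, not from the text) to show the opposite regime, where the same beta prior pulls the posterior mean far away from the sample proportion.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Large sample from the example.
mle = 176 / 55667                              # sample proportion
post_mean_uniform = beta_mean(177, 55492)      # uniform prior scenario
post_mean_beta = beta_mean(177.6, 55569.4)     # beta prior scenario

# Hypothetical small sample: 2 successes in 10 trials with the Beta(1.6, 78.4) prior.
small_mle = 2 / 10
small_post_mean = beta_mean(1.6 + 2, 78.4 + 8)

print(f"{mle:.5f} {post_mean_uniform:.5f} {post_mean_beta:.5f}")
print(f"{small_mle:.3f} {small_post_mean:.3f}")
```

With only ten observations the posterior mean stays close to the prior mean of 2%, whereas with 55,667 observations all three estimates agree to within a few thousandths of a percentage point.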


KEY POINTS 

• Statistical analysis is employed from the vantage
point of either of the two main statistical
philosophical traditions: frequentist and
Bayesian.

• The frequentist interpretation of the probability
of an event is that it is the limit of
its long-run relative frequency (i.e., the frequency
with which it occurs as the amount of
data increases without bound).

• The Bayesian view of the world is based on
the subjectivist interpretation of probability:
Probability is subjective, a degree of belief
that is updated as information or data are acquired.

• In the Bayesian framework, probability beliefs
based on the existing knowledge base
take the form of the prior probability; the
posterior probability represents the updated
beliefs.

• The likelihood function is a statistical construct
summarizing the information contained
in the sample of data.

• Bayes' theorem links the unconditional
and conditional probabilities. Under the
Bayesian approach, prior beliefs are combined
with sample information to create updated
posterior beliefs.

• Two important applications of the discrete
form of Bayes' theorem are model selection
and classification.

• In financial applications, the continuous version
of Bayes' theorem is predominantly
used.


NOTES 

1. In this example, we assume, perhaps unrealistically,
that θ stays constant through time
and that the annual number of defaults in
a given year is independent from the number
of defaults in any other year within
the 20-year period. The independence assumption
means that each observation of the
number of annual defaults is regarded as a
realization from a Poisson distribution with
the same average rate of defaults, θ; this allows
us to represent the likelihood function
as the product of the mass function at each
observation.

2. One such result is the Central Limit Theorem,
which asserts that, under certain mild
regularity conditions, sums of independent
random variables are distributed with the
normal distribution asymptotically (as the
terms of the sum become indefinitely many).

3. The complement (complementary event) of
E, $E^c$, includes all possible outcomes that
could occur if E is not realized. The probabilities
of an event and its complement always
sum up to 1: $P(E) + P(E^c) = 1$.

4. The expression in (8) is easily generalized
to the case when a researcher updates beliefs
about one of many mutually exclusive
events (such that no two of them can occur
at the same time). Denote these events
by $E_1, E_2, \ldots, E_K$. The events are such that
their probabilities sum up to 1: $P(E_1) + \cdots + P(E_K) = 1$.
Bayes' theorem then takes the form

$$P(E_k \mid D) = \frac{P(D \mid E_k) \times P(E_k)}{P(D \mid E_1) \times P(E_1) + P(D \mid E_2) \times P(E_2) + \cdots + P(D \mid E_K) \times P(E_K)}$$

for k = 1, ..., K and P(D) > 0.

5. The beta distribution is the conjugate distribution
for the parameter of the binomial
distribution.

6. When the posterior distribution is not recognizable
as a known distribution, inference
about θ is accomplished with the help of numerical
methods.


Abstract: Bayesian inference is the process of arriving at estimates of the model parameters reflecting
the blending of information from different sources. Most commonly, two sources of information 
are considered: prior knowledge or beliefs and observed data. The discrepancy (or lack thereof) 
between them and their relative strength determines how far away the resulting Bayesian estimate 
is from the corresponding classical estimate. Along with the point estimate, which most often is the 
posterior mean, in the Bayesian setting one has available the whole posterior distribution, allowing 
for a richer analysis. 


In this entry, we focus on the essentials
of Bayesian inference. Formalizing the practitioner's
knowledge and intuition into prior
distributions is a key part of the inferential process.
Especially when the data records are not
abundant, the choice of prior distributions can
greatly influence posterior conclusions. After
presenting an overview of some approaches to
prior specification, we focus on the elements
of posterior analysis. Posterior and predictive
results can be summarized in a few numbers,
as in the classical statistical approach, but one
could also easily examine and draw conclusions
about all other aspects of the posterior and predictive
distributions of the (functions of the)
parameters.


PRIOR INFORMATION 

The prior distribution for the model parameters
is an integral component of the Bayesian inference
process. The updated (posterior) beliefs
are the result of the trade-off between the prior
and data distributions. The continuous form of
Bayes' theorem is:

$$p(\theta \mid y) \propto L(\theta \mid y)\,\pi(\theta) \tag{1}$$

where

θ = unknown parameter whose inference we are interested in.
y = a vector (or a matrix) of recorded observations.
π(θ) = prior distribution of θ depending on one or more parameters, called hyperparameters.
L(θ | y) = likelihood function for θ.
p(θ | y) = posterior (updated) distribution of θ.

Two factors determine the degree of posterior 
trade-off—the strength of the prior information 
and the amount of data available. Generally, 
unless the prior is very informative (in a sense 
that will become clear), the more observations, 
the greater the influence of the data on the pos¬ 
terior distribution. On the contrary, when very 
few data records are available, the prior distri¬ 
bution plays a predominant role in the updated 
beliefs. 

How to translate the prior information about
a parameter into the analytical (distributional)
form, π(θ), and how sensitive the posterior inference
is to the choice of prior have been questions
of considerable interest in the Bayesian
literature.¹ There is, unfortunately, no "best"
way to specify the prior distribution, and translating
subjective views into prior values for the
distribution parameters could be a difficult undertaking.

Before we review some commonly used approaches
to prior elicitation, we make the
following notational and conceptual note. It is
often convenient to represent the posterior distribution,
p(θ | y), in a logarithmic form. Then,
it is easy to see that the expression in (1) is transformed
according to

$$\log p(\theta \mid y) = \text{const} + \log L(\theta \mid y) + \log \pi(\theta),$$

where const is the logarithm of the constant of
proportionality.
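The logarithmic form is also how posteriors are usually evaluated numerically, since sums of logs are more stable than products of small numbers. The Python sketch below shows one way to do this on a grid; the binomial likelihood, the Beta(2, 2) prior, and the data (3 successes in 20 trials) are hypothetical choices made only so that the result can be compared with a known closed form.

```python
import math


def log_posterior_unnormalized(theta, successes, trials, alpha, beta):
    """log L(theta | data) + log prior(theta), with additive constants dropped."""
    log_lik = successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)
    log_prior = (alpha - 1.0) * math.log(theta) + (beta - 1.0) * math.log(1.0 - theta)
    return log_lik + log_prior


# Hypothetical data and prior: 3 successes in 20 trials, Beta(2, 2) prior.
grid = [(i + 0.5) / 1000 for i in range(1000)]    # midpoints on (0, 1)
log_post = [log_posterior_unnormalized(t, 3, 20, 2.0, 2.0) for t in grid]

# Subtract the maximum before exponentiating to avoid underflow,
# then normalize so that the grid weights sum to one.
max_log = max(log_post)
weights = [math.exp(lp - max_log) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(posterior_mean, 4))
```

Subtracting the maximum plays the role of the constant in the log form: it changes nothing after normalization. In this conjugate case the exact posterior is Beta(5, 19), so the grid mean should be close to 5/24.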

Informative Prior Elicitation 

Prior beliefs are informative when they mod¬ 
ify substantially the information contained in 
the data sample so that the conclusions we 
draw about the model parameters based on the 
posterior distribution and on the data distri¬ 
bution alone differ. The most commonly used 
approach to representing informative prior be¬ 
liefs is to select a distribution for the unknown 
parameter and specify the hyperparameters so 
as to reflect these beliefs. 

Informative Prior Elicitation for Location and 
Scale Parameters 

Usually, when we think about the average value
that a random variable takes, we have the typical
value in mind. Therefore, we hold beliefs
about the median of the distribution rather than
its mean.² This distinction does not matter in the
case of symmetric distributions, since then the
mean and the median coincide. However, when
the distribution we selected is not symmetric,
care must be taken to ensure that the prior parameter
values reflect our beliefs. Formulating
beliefs about the spread of the distribution is
less intuitive. The easiest way to do so is to
ask ourselves questions such as, for instance:
Which value of the random variable do a quarter
of the observations fall below/above? Denoting
the random variable by X, the answers
to these questions give us the following probability
statements:

$$P(X < x_{0.25}) = 0.25$$

and

$$P(X > x_{0.75}) = 0.25$$

where $x_{0.25}$ and $x_{0.75}$ are the values we have subjectively
determined and are referred to as the
first and third quartiles of the distribution, respectively.
Other similar probability statements
can be formulated, depending on the prior
beliefs.

As an example, suppose that we model the
behavior of the monthly returns on some financial
asset and the normal distribution, N(μ, σ²)
(along with the assumption that the returns are
independently and identically distributed), describes
their dynamics well. Assume for now
that the variance is known, σ² = σ²*, and thus
we only need to specify a prior distribution for
the unknown mean parameter, μ. We believe
that a symmetric distribution is an appropriate
choice and go for the simplicity of a normal
prior:

$$\mu \sim N(\eta, \tau^2) \tag{2}$$

where η is the prior mean and τ² is the prior
variance of μ; to fully specify μ's prior, we need
to (subjectively) determine their values. We believe
that the typical monthly return is around
1%, suggesting that the median of μ's distribution
is 1%. Therefore, we set η to 1%. Further,
suppose we (subjectively) estimate that there is
about a 25% chance that the average monthly
return is less than 0.5% (i.e., $\mu_{0.25}$ = 0.5%). Then,
using the tabulated cumulative probability values
of the standard normal distribution, we find
that the implied variance, τ², is approximately
equal to 0.74².³ Our choice for the prior distribution
of μ is thus π(μ) = N(1, 0.74²).
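The step from the quartile statement to the prior standard deviation can be reproduced without tables. The sketch below uses Python's standard-library `NormalDist`; the variable names are illustrative, and returns are expressed in percent as in the text.

```python
from statistics import NormalDist

prior_median = 1.0      # eta: believed typical monthly return, in percent
first_quartile = 0.5    # mu_0.25: value with a 25% chance of lying below it

# For a normal prior, mu_0.25 = eta + tau * z_0.25, where z_0.25 is the
# 25% quantile of the standard normal distribution (about -0.674).
z = NormalDist().inv_cdf(0.25)
tau = (first_quartile - prior_median) / z

print(round(tau, 2))

# Sanity check: the implied prior puts 25% of its mass below 0.5%.
prior = NormalDist(mu=prior_median, sigma=tau)
print(round(prior.cdf(first_quartile), 2))
```

Solving the quantile equation for τ gives a value of about 0.74, in line with the prior N(1, 0.74²) chosen above.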

Noninformative Prior Distributions 

In many cases, our prior beliefs are vague 
and thus difficult to translate into an infor¬ 
mative prior. We therefore want to reflect 
our uncertainty about the model parameter(s) 
without substantially influencing the posterior 
parameter inference. The so-called noninforma¬ 
tive priors, also called vague or diffuse priors, 
are employed to that end. 

Most often, the noninformative prior is chosen
to be either a uniform (flat) density defined
on the support of the parameter or the Jeffreys'
prior.⁴ The noninformative distribution for a
location parameter, μ, is given by a uniform
distribution on its support, (−∞, ∞), that is,⁵

$$\pi(\mu) \propto 1 \tag{3}$$

The noninformative distribution for a scale parameter,
σ (defined on the interval (0, ∞)), is⁶

$$\pi(\sigma) \propto \frac{1}{\sigma} \tag{4}$$

Notice that the prior densities in both (3) and
(4) are not proper densities, in the sense that
they do not integrate to one:

$$\int_{-\infty}^{\infty} 1\,d\mu = \infty \quad \text{and} \quad \int_{0}^{\infty} \frac{1}{\sigma}\,d\sigma = \infty$$

Even though the resulting posterior densities
are usually proper, care must be taken to ensure
that this is indeed the case. To avoid impropriety
of the posterior distributions, one could employ
proper prior distributions but make them
noninformative, as we discuss further on.

When one is interested in the joint posterior
inferences for μ and σ, these two parameters
are often assumed independent, giving the joint
prior distribution

$$\pi(\mu, \sigma) \propto \frac{1}{\sigma} \tag{5}$$

The prior in (5) is often referred to as the Jeffreys'
prior.⁷

Prior ignorance could also be represented by a
(proper) standard distribution with a very large
dispersion, the so-called flat or diffuse proper
prior distribution. Let us turn again to the example
for the monthly returns for some financial
asset we considered earlier and suppose
that we do not have particular prior information
about the range of typical values the mean
monthly return could take. To reflect this ignorance,
we might center the normal distribution
of μ around 0 (a neutral value, so to speak) and
fix the standard deviation, τ, at a large value
such as 10⁶, that is, π(μ) = N(0, (10⁶)²).

The prior of μ could take alternative distributional
forms. For instance, a symmetric
Student's t-distribution could be asserted. A
standard Student's t-distribution has a single
parameter, the degrees of freedom, ν, which
one can use to regulate the heaviness of the
prior's tails: the lower ν is, the flatter the prior
distribution. Asserting a scaled Student's t-distribution
with a scale parameter, σ, provides
additional flexibility in specifying the prior of
μ.⁸ It can be argued that eliciting heavy-tailed
prior distributions (with tails heavier than the
tails of the data distribution) increases the posterior's
robustness, that is, lowers the sensitivity
of the posterior to the prior specification.

Conjugate Prior Distributions 

In many situations, the choice of a prior distribution
is governed by the desire to obtain
an analytically tractable and convenient posterior
distribution. Thus, if one assumes that the data
have been generated by a certain class of distributions,
employing the class of the so-called
"conjugate prior distributions" guarantees that
the posterior distribution is of the same class
as the prior distribution.⁹ Although the prior
and posterior distributions have the same form,
their parameters differ: the parameters of the
posterior distribution reflect the trade-off between
prior and sample information. We now
consider the case of the normal data distribution,
since it is central to our discussions of financial
applications.

If the data, x, are assumed to come from a
normal distribution, the conjugate priors for the
normal mean, μ, and variance, σ², are, respectively,
a normal distribution and an inverted χ²
distribution:¹⁰

$$\pi(\mu \mid \sigma^2) = N\left(\eta, \frac{\sigma^2}{T}\right)$$

and

$$\pi(\sigma^2) = \text{Inv-}\chi^2(\nu_0, c_0^2) \tag{6}$$

where Inv-χ²(ν₀, c₀²) denotes the inverted χ²
distribution with ν₀ degrees of freedom and a
scale parameter c₀². The prior parameters (hyperparameters)
that need to be (subjectively)
specified in advance are η, T, ν₀, and c₀². The
parameter T plays the role of a discount factor,
reflecting the degree of uncertainty about the
distribution of μ. Usually, T is greater than one
since one naturally holds less uncertainty about
the distribution of the mean, μ (with variance
σ²/T), than the data, x (with variance σ²).

In various financial applications, the normal 
distribution is often not the most appropriate 
assumption for a data-generation process in 
view of various empirical features that financial 
data exhibit. Alternative distributional choices 
most often do not have corresponding conju¬ 
gate priors and the resulting posterior distribu¬ 
tions might not be recognizable as any known 
distributions. Then, numerical methods are ap¬ 
plied to compute the posteriors. 

In general, eliciting conjugate priors should 
be preceded by an analysis of whether prior be¬ 
liefs would be adequately represented by them. 

Empirical Bayesian Analysis 

So far, we took care to emphasize the subjective
manner in which prior information is
translated into a prior distribution. This involves
specifying the prior hyperparameters
(if an informative prior is asserted) before
observing/analyzing the set of data used
for model evaluation. One approach for eliciting
the hyperparameters departs from this
tradition: the so-called "empirical Bayesian
approach." In it, sample information is used
to compute the values of the hyperparameters.
Here we provide an example with the natural
conjugate prior for a normal data distribution.
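In computational terms, the empirical Bayesian recipe for the normal model comes down to three sample statistics: the sample mean, the unbiased sample variance, and the sample size, whose roles are derived in the remainder of this subsection. The Python sketch below illustrates the mapping onto the hyperparameters of (6); the return series is made up, and the dictionary keys are illustrative names only.

```python
from statistics import mean, variance

# Hypothetical monthly returns, in percent (illustrative only).
x = [1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 1.7, 0.9]

n = len(x)
mu_hat = mean(x)      # sample mean
s2 = variance(x)      # unbiased sample variance (divides by n - 1)
nu = n - 1            # degrees of freedom

# Empirical Bayes hyperparameters for the conjugate priors in (6):
#   mu | sigma^2 ~ N(mu_hat, sigma^2 / n)   and   sigma^2 ~ Inv-chi^2(nu, s2)
hyper = {"eta": mu_hat, "T": n, "nu0": nu, "c0_sq": s2}
print(hyper)
```

The only design decision here is which statistic feeds which hyperparameter; no subjective input enters, which is exactly what distinguishes this approach from the elicitation described earlier.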

Denote the sample of n observations by x =
$(x_1, x_2, \ldots, x_n)$. It can be shown that the normal
likelihood function can be expressed in the
following way:

$$\begin{aligned}
L(\mu, \sigma^2 \mid x) &= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right) \\
&= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\left(\nu s^2 + n(\mu-\hat{\mu})^2\right)\right)
\end{aligned} \tag{7}$$

where

$$\hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \nu = n - 1, \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{n}(x_i-\hat{\mu})^2}{n-1} \tag{8}$$

The quantities $\hat{\mu}$ and s² are, respectively, the
unbiased estimators of the mean, μ, and the
variance, σ², of the normal distribution.¹¹ It is
now easy to see that the likelihood in (7) can be
viewed as the product of two distributions: a
normal distribution for μ conditional on σ²,

$$\mu \mid \sigma^2 \sim N\left(\hat{\mu}, \frac{\sigma^2}{n}\right)$$

and an inverted χ² distribution for σ²,

$$\sigma^2 \sim \text{Inv-}\chi^2(\nu, s^2)$$

which become the prior distributions under
the empirical Bayesian approach. We can observe
that these two distributions are, of course,
the same as the ones in (6). Their parameters
are functions of the two sufficient statistics
for the normal distribution, instead of subjectively
elicited quantities. The sample size, n,
above plays the role of the discount factor, T,
in (6): the more data available, the less uncertain
one is about the prior distribution of μ (its
prior variance decreases).

We now turn to a discussion of the fundamentals
of posterior inference. Later in this entry, we
provide an illustration of the effect various prior
assumptions have on the posterior distribution.


POSTERIOR INFERENCE

The posterior distribution of a parameter (vector)
θ given the observed data x is denoted as
p(θ | x) and obtained by applying Bayes'
theorem given by (1). Being a combination of
the data and the prior, the posterior contains all
relevant information about the unknown
parameter θ.

Posterior Point Estimates

Although the benefit of being able to visualize
the whole posterior distribution is unquestionable,
it is often more practical to report
several numerical characteristics describing the
posterior, especially if reporting the results to
an audience used to the classical (frequentist)
statistical tradition. Commonly used for this
purpose are the point estimates, such as the
posterior mean, the posterior median, and the
posterior standard deviation.¹² When the posterior
is available in closed form, these numerical
summaries can also be expressed in closed
form. The posterior parameters in the natural
conjugate prior scenario with a normal sampling
density (see (6)) are also available analytically.
The mean parameter, μ, of the normal
distribution has a normal posterior, conditional
on σ²,

$$p(\mu \mid x, \sigma^2) = N\left(\mu^*, \frac{\sigma^2}{T+n}\right) \tag{9}$$

The posterior mean and variance of μ are
given, respectively, by

$$\begin{aligned}
E(\mu \mid x, \sigma^2) = \mu^* &= \hat{\mu}\,\frac{n/\sigma^2}{n/\sigma^2 + T/\sigma^2} + \eta\,\frac{T/\sigma^2}{n/\sigma^2 + T/\sigma^2} \\
&= \hat{\mu}\,\frac{n}{n+T} + \eta\,\frac{T}{n+T}
\end{aligned} \tag{10}$$

where $\hat{\mu}$ is the sample mean as given in (8) and

$$\mathrm{var}(\mu \mid x, \sigma^2) = \frac{\sigma^2}{T+n} \tag{11}$$
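Equations (10) and (11) translate directly into code. The following Python sketch evaluates them for hypothetical inputs; the function name and the numbers are assumptions made for illustration, not values from the text.

```python
def normal_mean_posterior(mu_hat, n, eta, T, sigma2):
    """Posterior mean and variance of mu given sigma^2, following (10) and (11)."""
    post_mean = mu_hat * n / (n + T) + eta * T / (n + T)
    post_var = sigma2 / (T + n)
    return post_mean, post_var


# Hypothetical inputs: sample mean 0.8 from n = 24 observations,
# prior mean 0.2 with discount factor T = 6, and sigma^2 = 4.
post_mean, post_var = normal_mean_posterior(mu_hat=0.8, n=24, eta=0.2, T=6, sigma2=4.0)
print(round(post_mean, 4), round(post_var, 4))
```

The weights n/(n + T) and T/(n + T) make the trade-off explicit: with n = 24 and T = 6, four-fifths of the weight falls on the sample mean.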


In practical applications, usually the emphasis
is placed on obtaining the posterior distribution
of μ, not least because it is more
difficult to formulate prior beliefs about the
variance, σ² (let alone the whole covariance matrix
in the multivariate setting). Often, then, the
variance (covariance matrix) is estimated outside
of the regression model and then fed into
it, as if it were the "known" variance (covariance
matrix).¹³ Nevertheless, for completeness,
we provide σ²'s posterior distribution, an inverted
χ²:

$$p(\sigma^2 \mid x) = \text{Inv-}\chi^2(\nu^*, c^{2*}) \tag{12}$$

where

$$\nu^* = \nu_0 + n, \tag{13}$$

$$c^{2*} = \frac{1}{\nu^*}\left(\nu_0 c_0^2 + (n-1)s^2 + \frac{Tn}{T+n}(\hat{\mu} - \eta)^2\right) \tag{14}$$

and s² is the unbiased sample estimator of the
normal variance as given in (8). Using (13) and
(14), one can now compute the posterior mean
and variance of σ² as, respectively,¹⁴

$$E(\sigma^2 \mid x) = \frac{\nu^*}{\nu^* - 2}\,c^{2*} \tag{15}$$

and

$$\mathrm{var}(\sigma^2 \mid x) = \frac{2\nu^{*2}}{(\nu^* - 2)^2(\nu^* - 4)}\,\left(c^{2*}\right)^2 \tag{16}$$
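The updating rules (13) through (16) can be sketched in a few lines of Python. The function name and all input values below are hypothetical; the sketch assumes ν* > 4 so that both posterior moments exist.

```python
def variance_posterior(n, s2, mu_hat, eta, T, nu0, c0_sq):
    """Posterior Inv-chi^2 parameters and moments of sigma^2, following (13)-(16)."""
    nu_star = nu0 + n                                              # (13)
    c_star_sq = (nu0 * c0_sq + (n - 1) * s2
                 + (T * n / (T + n)) * (mu_hat - eta) ** 2) / nu_star   # (14)
    post_mean = nu_star / (nu_star - 2) * c_star_sq                # (15), needs nu_star > 2
    post_var = (2 * nu_star ** 2
                / ((nu_star - 2) ** 2 * (nu_star - 4))
                * c_star_sq ** 2)                                  # (16), needs nu_star > 4
    return nu_star, c_star_sq, post_mean, post_var


# Hypothetical inputs (illustrative only).
nu_star, c_star_sq, var_mean, var_var = variance_posterior(
    n=24, s2=4.2, mu_hat=0.8, eta=0.2, T=6, nu0=5, c0_sq=3.0)
print(nu_star, round(c_star_sq, 4), round(var_mean, 4), round(var_var, 4))
```

The posterior scale in (14) pools three sources of variation: the prior scale, the sample variance, and the disagreement between the sample mean and the prior mean.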


When the posterior is not of known form 
and is computed numerically (through simu¬ 
lations), so are the posterior point estimates, 
as well as the distributions of any functions of 
these estimates (see Chapter 4 in Rachev et al., 
2008). 


Bayesian Intervals 

The point estimate for the center of the posterior
distribution is not too informative if the
posterior uncertainty is significant. To assess the
degree of uncertainty, a posterior (1 − α)100%
interval [a, b], called a credible interval, can
be constructed. The probability that the unknown
parameter, θ, falls between a and b is
(1 − α)100%,

$$P(a < \theta < b \mid x) = \int_a^b p(\theta \mid x)\,d\theta = 1 - \alpha$$

For reasons of convenience, the interval bounds
may be determined so that an equal probability,
α/2, is left in the tails of the posterior
distribution. For example, with α = 0.5, a would
be the 0.25 quantile of the posterior and b the 0.75
quantile. The interpretation of the credible interval
is often mistakenly ascribed to the classical
confidence interval. In the classical setting,
(1 − α)100% is a coverage probability: if arbitrarily
many repeated samples of data are
recorded, (1 − α)100% of the corresponding
confidence intervals will contain θ, a much less
intuitive interpretation.

The credible interval is computed either analytically,
by finding the theoretical quantiles of
the posterior distribution (when it is of known
form), or numerically, by finding the empirical
quantiles using the simulations of the posterior
density (see Chapter 4 in Rachev et al., 2008).¹⁵
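Both routes to an equal-tailed credible interval can be illustrated with the Python standard library. The sketch below assumes a normal posterior with hypothetical parameters and α = 0.05; the simulated draws merely stand in for the output of whatever posterior simulation is actually used.

```python
import random
from statistics import NormalDist, quantiles

alpha = 0.05
posterior = NormalDist(mu=0.68, sigma=0.365)   # hypothetical normal posterior

# Analytical route: theoretical quantiles of the posterior distribution.
a = posterior.inv_cdf(alpha / 2)
b = posterior.inv_cdf(1 - alpha / 2)

# Numerical route: empirical quantiles of simulated posterior draws.
rng = random.Random(0)
draws = [rng.gauss(posterior.mean, posterior.stdev) for _ in range(100_000)]
cuts = quantiles(draws, n=40)        # 39 cut points at 2.5%, 5%, ..., 97.5%
a_sim, b_sim = cuts[0], cuts[-1]

print(round(a, 3), round(b, 3))
print(round(a_sim, 3), round(b_sim, 3))
```

The choice `n=40` is tied to α = 0.05, because the first and last of the 39 cut points are then the 2.5% and 97.5% quantiles; a different α needs a different split.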

Bayesian Hypothesis Comparison 

The title of this section¹⁶ abuses the usual terminology
by intentionally using "comparison"
instead of "testing" in order to stress that the
Bayesian framework affords one more than
the mere binary reject/do-not-reject decision of
the classical hypothesis testing framework. In
the classical setting, the probability of a hypothesis
(null or alternative) is either 0 or 1 (since frequentist
statistics considers parameters as fixed,
although unknown, quantities).

In contrast, in the Bayesian setting (where parameters
are treated as random variables), the
probability of a hypothesis can be computed
(and is different from 0 or 1, in general), allowing
for a true hypothesis comparison.¹⁷

Suppose one wants to compare the null hypothesis

$$H_0: \theta \text{ is in } \Theta_0$$

with the alternative hypothesis

$$H_1: \theta \text{ is in } \Theta_1$$

where Θ₀ and Θ₁ are disjoint sets of possible
values for the unknown parameter θ. As with
point estimates and credible intervals, hypothesis
comparison is entirely based on θ's posterior
distribution. We compute the posterior probabilities
of the null and alternative hypotheses,

$$P(\theta \text{ is in } \Theta_0 \mid x) = \int_{\Theta_0} p(\theta \mid x)\,d\theta \tag{17}$$

and

$$P(\theta \text{ is in } \Theta_1 \mid x) = \int_{\Theta_1} p(\theta \mid x)\,d\theta \tag{18}$$

respectively. These posterior hypotheses probabilities
naturally reflect both the prior
beliefs and the data evidence about θ. An informed
decision can now be made incorporating
that knowledge. For example, the posterior
probabilities could be employed in scenario
generation, a tool of great importance in risk
analysis.
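When the posterior has a known form, the integrals in (17) and (18) reduce to evaluations of its cumulative distribution function. The Python sketch below assumes a hypothetical normal posterior and the one-sided hypotheses H₀: θ ≤ 0 and H₁: θ > 0; neither assumption comes from the text.

```python
from statistics import NormalDist

posterior = NormalDist(mu=0.68, sigma=0.365)   # hypothetical posterior of theta

# H0: theta <= 0 versus H1: theta > 0 (disjoint sets that cover the real line).
p_h0 = posterior.cdf(0.0)     # integral of the posterior density over (-inf, 0]
p_h1 = 1.0 - p_h0             # integral over (0, inf)

print(round(p_h0, 4), round(p_h1, 4))
```

Unlike a reject/do-not-reject outcome, the two numbers can be carried forward directly, for instance as scenario weights.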


The Posterior Odds Ratio 

Although the framework outlined in the previous
section is generally sufficient to make an
informed decision about the relevance of hypotheses,
we briefly discuss a somewhat more
formal approach for Bayesian hypothesis testing.
That approach consists of summarizing
the posterior relevance of the two hypotheses
into a single number, the posterior odds ratio.
The posterior odds ratio is the ratio of the
weighted likelihoods for the model parameters
under the null hypothesis and under the alternative
hypothesis, multiplied by the prior odds.
The weights are the prior parameter distributions
(thus, parameter uncertainty is taken into
account).¹⁸

Denote the a priori probability of the null hypothesis
by α. Then, the prior odds are the ratio
α/(1 − α). The posterior odds, denoted by
PO, are simply the prior odds updated with
the information contained in the data and are
given by

$$PO = \frac{\alpha}{1-\alpha} \times \frac{\int L(\theta \mid x, H_0)\,\pi(\theta)\,d\theta}{\int L(\theta \mid x, H_1)\,\pi(\theta)\,d\theta} \tag{19}$$

where L(θ | x, H₀) is the likelihood function
reflecting the restrictions imposed by the null
hypothesis and L(θ | x, H₁) is the likelihood
function under the alternative hypothesis.

When no prior evidence for or against the
null hypothesis exists, the prior odds are usually
set equal to one. A low value of the posterior
odds generally indicates evidence against the
null hypothesis.
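As a rough numerical illustration of the posterior odds, the Python sketch below evaluates the two weighted likelihoods by midpoint integration. Everything specific in it is an assumption made for the example: binomial data (7 successes in 10 trials), the hypotheses H₀: θ ≤ 0.5 and H₁: θ > 0.5, a uniform prior over each hypothesis' parameter set, and prior odds of one. It is one possible reading of the formula, in which each hypothesis restricts the range of integration.

```python
from math import comb


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def likelihood(theta, k=7, n=10):
    """Binomial likelihood of k successes in n trials (hypothetical data)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


prior_h0 = 0.5                              # a priori probability of the null
prior_odds = prior_h0 / (1 - prior_h0)      # equals one: no prior preference

# Uniform prior density over each half-interval (density 2 on a set of length 0.5).
weighted_h0 = integrate(lambda t: likelihood(t) * 2.0, 0.0, 0.5)   # H0: theta <= 0.5
weighted_h1 = integrate(lambda t: likelihood(t) * 2.0, 0.5, 1.0)   # H1: theta > 0.5

posterior_odds = prior_odds * weighted_h0 / weighted_h1
print(round(posterior_odds, 4))
```

A value well below one, as here, points against the null hypothesis, which agrees with the observed proportion of 0.7.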


BAYESIAN PREDICTIVE 
INFERENCE 

After performing Bayesian posterior inference 
about the parameters of the data-generating 
process, one may use the process to predict the 
realizations of the random variable ahead in 
time. The purpose of such a prediction could be 
to test the predictive power of the model (for 
example, by analyzing a metric for the distance 
between the model's predictions and the ac¬ 
tual realizations) as part of a backtesting proce¬ 
dure or to directly use it in the decision-making 
process. 

As in the case of posterior inference, predictive
inference provides more than simply
a point prediction: one has available the
whole predictive distribution (either analytically
or numerically) and thus increased modeling
flexibility.¹⁹ The density of the predictive
distribution is the sampling (data) distribution
weighted by the posterior parameter density. By
averaging out the parameter uncertainty (contained
in the posterior), the predictive distribution
provides a superior description of the
model's predictive ability. In contrast, the classical
approach to prediction involves computing
point predictions or prediction intervals by
plugging in the parameter estimates into the
sampling density, treating those estimates as if
they were the true parameter values.

Denoting the sampling and the posterior density
by f(x | θ) and p(θ | x), respectively, the
predictive density one step ahead is given by²⁰

$$f(x_{+1} \mid x) = \int f(x_{+1} \mid \theta)\,p(\theta \mid x)\,d\theta \tag{20}$$

where $x_{+1}$ denotes the one-step-ahead realization.
Notice that since we integrate (average)
over the values of θ, the predictive distribution
is independent of θ and depends only on the
past realizations of the random variable X; it
describes the process we assume has generated
the data. The predictive density could be
used to obtain a point prediction (for example,
the predictive mean) or an interval prediction
(similar in spirit to the Bayesian interval
discussed above) or to perform a hypotheses
comparison.
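The integral in (20) can be approximated by simulation: draw a parameter value from the posterior, then draw an observation from the sampling density given that value. The Python sketch below does this for a normal sampling density with known variance and a normal posterior for the mean; all numerical values are hypothetical. In this particular case the predictive distribution is known to be normal with the posterior mean and with variance equal to the sum of the sampling and posterior variances, which gives a check on the simulation.

```python
import random
from statistics import mean, variance

mu_star, tau2_star = 0.68, 0.13     # hypothetical posterior mean and variance of mu
sigma2 = 4.0                        # sampling variance, assumed known

rng = random.Random(1)
draws = []
for _ in range(100_000):
    mu = rng.gauss(mu_star, tau2_star ** 0.5)      # parameter draw from the posterior
    draws.append(rng.gauss(mu, sigma2 ** 0.5))     # next observation given that draw

pred_mean = mean(draws)
pred_var = variance(draws)
print(round(pred_mean, 2), round(pred_var, 2))
```

Plugging in a point estimate of μ instead would give a predictive variance of σ² only, understating the uncertainty by the posterior variance of μ.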


ILLUSTRATION: POSTERIOR 
TRADE-OFF AND THE 
NORMAL MEAN 
PARAMETER 

Using an illustration, we show the effects prior
distributions have on posterior inference. For
simplicity, we look at the case of a normal data
distribution with a known variance, σ² = 1.
That is, we need to elicit a prior distribution of
the mean parameter, μ, only. We investigate the
following prior assumptions:

1. A noninformative, improper prior (Jeffreys'
prior): π(μ) ∝ 1.

2. A noninformative, proper prior: π(μ) = N(η,
τ²), where η = 0 and τ = 10⁶.

3. An informative conjugate prior with
subjectively determined hyperparameters:
π(μ) = N(η, τ²), where η = 0.02 and τ = 0.1.

As mentioned earlier in the entry, the relative
strengths of the prior and the sampling distribution
determine the degree of trade-off of prior
and data information in the posterior. When the
amount of available data is large, the sampling
distribution dominates the prior in the posterior
inference. (In the limit, as the number of observations
grows indefinitely, only the sampling
distribution plays a role in determining posterior
results.²¹) To illustrate this sample-size effect,
we consider the following two samples of
data:

1. The monthly return on the S&P 500 stock
index for the period January 1990 through
December 2005 (a total of 192 returns).

2. The monthly return on the S&P 500 stock
index for the period January 2005 through
December 2005 (a total of 12 returns).

Let us denote the return data by the n × 1
vector $r = (r_1, r_2, \ldots, r_n)$, where n = 192 or
n = 12. We assume that the sampling (data) distribution
is normal, R ~ N(μ, σ²). Combining
the normal likelihood and the noninformative
improper prior, we obtain for the posterior distribution
of μ

$$\begin{aligned}
p(\mu \mid r, \sigma^2 = 1) &\propto (2\pi)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(r_i - \mu)^2}{2}\right) \\
&\propto \exp\left(-\frac{n(\mu - \hat{\mu})^2}{2}\right)
\end{aligned} \tag{21}$$

where $\hat{\mu}$ is the sample mean as given in (8).
Therefore, the posterior of μ is a normal distribution
with mean $\hat{\mu}$ and variance 1/n. As
expected, the data completely determine the
posterior distributions for both data samples,
since we assumed prior ignorance about μ.

When a normal prior for μ, N(η, τ²), is asserted,
the posterior can be shown to be normal
as well. In the generic case, for an arbitrary data
variance σ², we have

$$\begin{aligned}
p(\mu \mid r, \sigma^2) &\propto (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(r_i - \mu)^2}{2\sigma^2}\right) \times (2\pi\tau^2)^{-1/2} \exp\left(-\frac{(\mu - \eta)^2}{2\tau^2}\right) \\
&\propto \exp\left(-\frac{(\mu - \mu^*)^2}{2\tau^{2*}}\right)
\end{aligned} \tag{22}$$

where the posterior mean, μ*, is

$$\mu^* = \hat{\mu}\,\frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2} + \eta\,\frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2} \tag{23}$$

and the posterior variance, τ²*, is

$$\tau^{2*} = \frac{1}{n/\sigma^2 + 1/\tau^2} \tag{24}$$

Notice that the posterior mean is a weighted
average of the sample mean, $\hat{\mu}$, and the prior
mean, η. The quantities 1/σ² and 1/τ² have
self-explanatory names: data precision and prior
precision, respectively. The higher the precision,
the more concentrated the distribution
around its mean value.²² Let us see how the
information trade-off between the data and the
prior is reflected in the values of the posterior
parameters.

Figure 1 Sample Size and Posterior Trade-Off
for the Normal Mean Parameter: The Case of Informative
Prior

In the case of the noninformative, proper prior, $\tau = 10^6$. The
rightmost term in (23) is then negligibly small and the posterior mean
is very close to the sample mean, $\mu^* \approx \hat{\mu}$, while the
posterior variance in (24) is approximately equal to $1/n$
(substituting in $\sigma^2 = 1$). That is, for both data samples, the
noninformative proper prior produced posteriors almost the same as in
the case of the noninformative improper prior, as expected.
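
The precision-weighted update in (23) and (24) can be sketched in a few lines of Python. This is only an illustration: the function name and all input numbers are made up, the data variance is taken as known, and the two "samples" are constant series chosen so that only the sample size differs.

```python
def normal_mean_posterior(returns, sigma2, eta, tau2):
    """Posterior mean and variance of mu, given N(mu, sigma2) data with
    known sigma2 and a N(eta, tau2) prior on mu."""
    n = len(returns)
    sample_mean = sum(returns) / n
    data_weight = n / sigma2      # n times the data precision 1/sigma^2
    prior_weight = 1.0 / tau2     # prior precision 1/tau^2
    post_var = 1.0 / (data_weight + prior_weight)                            # (24)
    post_mean = post_var * (data_weight * sample_mean + prior_weight * eta)  # (23)
    return post_mean, post_var


# Made-up inputs: every return equals 0.8, sigma^2 = 1, informative prior N(0, 0.01).
long_mean, long_var = normal_mean_posterior([0.8] * 192, 1.0, 0.0, 0.01)
short_mean, short_var = normal_mean_posterior([0.8] * 12, 1.0, 0.0, 0.01)
# Nearly flat proper prior (tau = 10^6, so tau^2 = 10^12).
flat_mean, flat_var = normal_mean_posterior([0.8] * 12, 1.0, 0.0, 1e12)

print(long_mean, short_mean, flat_mean, flat_var)
```

With the informative prior, the short sample's posterior mean sits much closer to the prior mean than the long sample's, while the nearly flat prior returns essentially the sample mean and a variance of about $1/n$.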

Consider how the posterior is affected when informativeness of the
prior is increased, as in the third prior scenario. Figure 1 helps
visualize the posterior trade-off for the long and short data samples,
respectively. The smaller the amount of observed data, the larger the
influence of the prior on the posterior (the "closer" the posterior to
the prior).


KEY POINTS 

• The degree of posterior information trade-off has two determinants:
strength of the prior information and amount of historical data
available.

• Informative prior beliefs can modify substantially the information
content of the observed data.

• Informative prior elicitation most commonly involves two steps:
selecting the form of the prior distribution (usually, an analytically
convenient one) and specifying its parameters (the hyperparameters) to
reflect the prior beliefs.

• Noninformative priors help account for estimation uncertainty without
substantially influencing the posterior parameter inference.

• A conjugate prior distribution guarantees that the resulting
posterior distribution is of the same form as the prior.

• The posterior distribution can be summarized with point estimates,
such as posterior mean, posterior median, posterior standard deviation,
and posterior quantiles, as well as interval estimates.

• As in the case of posterior inference, when forecasting, one has
available the whole predictive distribution of the random variable(s).

NOTES 

1. See Chapter 3 in Berger (1985), Chapter 3 in Leonard and Hsu
(1999), Berger (1990, 2006), and Garthwaite, Kadane, and O'Hagan
(2005), among others.

2. The median is a measure of the center of a distribution alternative
to the mean, defined as the value of the random variable that divides
the probability mass in halves. The median is the typical value the
random variable takes. It is a more robust measure than the mean as it
is not affected by the presence of extreme observations and, unless the
distribution is symmetric, is not equal to the mean.

3. A random variable, $X \sim N(\mu, \sigma^2)$, is transformed into a
standard normal random variable, $Z \sim N(0, 1)$, by subtracting the
mean and dividing by its standard deviation: $Z = (X - \mu)/\sigma$.

4. Reference priors are another class of noninformative priors
developed by Berger and Bernardo (1992); see also Bernardo and Smith
(1994). Their derivation is somewhat involved and applications in the
field of finance are rare. One exception is Aguilar and West (2000).

5. Suppose a density has the form $f(x - \mu)$. The parameter $\mu$ is
called the location parameter if it only appears within the expression
$(x - \mu)$. The density, $f$, is then called a location density. For
example, the normal density, $N(\mu, \sigma^{2*})$, is a location
density when $\sigma^{2*}$ is fixed.

6. Suppose a density has the form $\frac{1}{\sigma} f\left(\frac{x}{\sigma}\right)$.
The parameter $\sigma$ is the scale parameter. For example, the normal
density, $N(\mu^*, \sigma^2)$, is a scale density when the mean is fixed
at some $\mu^*$.

7. See Jeffreys (1961). In general, Jeffreys' prior of a parameter
(vector), $\theta$, is given by

$$\pi(\theta) = |I(\theta)|^{1/2}$$

where $I(\theta)$ is the so-called Fisher's information matrix for
$\theta$, given by

$$I(\theta) = -E\left(\frac{\partial^2 \log f(x \mid \theta)}{\partial\theta\,\partial\theta'}\right)$$

and the expectation is with respect to the random variable $X$, whose
density function is $f(x \mid \theta)$. Notice that applying the
expression for $\pi(\theta)$ to, for example, the normal distribution,
one obtains the joint prior $\pi(\mu, \sigma) \propto 1/\sigma^2$,
instead of the one in (5). Nevertheless, Jeffreys advocated the use of
(5) since he assumed independence of the location and scale parameters.

8. The Student's t-distribution has heavier tails than the normal
distribution. For values of $\nu$ less than 2, its variance is not
defined.

9. Technically speaking, for the parameters of all distributions
belonging to the exponential family there are conjugate prior
distributions.

10. Notice that $\mu$ and $\sigma^2$ are not independent in (6). This
prior scenario is the so-called natural conjugate prior scenario.
Natural conjugate priors are priors whose functional form is the same
as the likelihood's. The joint prior density of $\mu$ and $\sigma^2$,
$\pi(\mu, \sigma^2)$, can be represented as the product of a
conditional and a marginal density:
$\pi(\mu, \sigma^2) = \pi(\mu \mid \sigma^2)\,\pi(\sigma^2)$. If the
dependence of the normal mean and variance is deemed inappropriate for
the particular application, it is possible to make them independent and
still benefit from the convenience of their functional forms by
eliciting a prior for $\mu$ as in (2).

11. An unbiased estimator of a parameter $\theta$ is a function of the
data (a statistic) whose expected value is $\theta$. The statistics
$\hat{\mu}$ and $s^2$ are the so-called sufficient statistics for the
normal distribution: knowing them is sufficient to uniquely determine
the normal distribution that generated the data. In empirical Bayesian
analysis, the hyperparameters are usually functions of the sufficient
statistics of the sampling distribution.

12. In decision theory, loss functions are used to assess the impact of
an action. In the context of parameter inference, if $\theta^*$ is the
true parameter value, the loss associated with employing the estimate
$\hat{\theta}$ instead of $\theta^*$ is represented by the loss
function $L(\theta^*, \hat{\theta})$. One approach to estimating
$\theta$ is to determine the value that minimizes the expected
resulting loss. In Bayesian analysis, we minimize the expected
posterior loss: its expectation is computed with respect to $\theta$'s
posterior distribution. It can be shown that the estimate of central
tendency that minimizes the expected posterior squared-error loss
function, $L(\theta^*, \hat{\theta}) = (\theta^* - \hat{\theta})^2$, is
the posterior mean, while the estimate that minimizes the expected
posterior absolute-error loss function,
$L(\theta^*, \hat{\theta}) = |\theta^* - \hat{\theta}|$, is the
posterior median.

13. One example of such an approach is the Black-Litterman model. See
Black and Litterman (1991).

14. These are the expressions for expected value and variance of a
random variable with the inverted $\chi^2$ distribution.

15. A special type of Bayesian interval is the highest posterior
density (HPD) interval. It is built so as to include the values of
$\theta$ that have the highest posterior probability (the most likely
values). When the posterior is symmetric and has a single peak (is
unimodal), credible and HPD intervals coincide. With very skewed
posterior distributions, however, the two intervals look very
different. A disadvantage of HPD intervals is that they could be
disjoint when the posterior has more than one peak (is multimodal). In
unimodal settings, the Bayesian HPD interval obtained under the
assumptions of a noninformative prior corresponds to the classical
confidence interval.

16. In this section, we emphasize a practical approach to Bayesian
hypothesis testing. For a rigorous description of Bayesian hypothesis
testing, see, for example, Zellner (1971).

17. In the classical setting, the decision to reject or not the null
hypothesis is made on the basis of the realization of a test statistic
(a function of the data) whose distribution is known. The p-value of
the hypothesis test is the probability of obtaining a value of the
statistic as extreme or more extreme than the one observed. The p-value
is compared to the test's significance level, which represents the
predetermined probability of rejecting the null hypothesis falsely. If
the p-value is sufficiently small (smaller than the significance
level), the null hypothesis is rejected. The p-value is often
mistakenly given the interpretation of a posterior probability of the
null hypothesis. It has been suggested that a low p-value, interpreted
by many as strong evidence against the null hypothesis, could be in
fact quite a misleading signal about evidence strength. See, for
example, Berger (1985) and Stambaugh (1999).

18. The posterior odds ratio bears similarity to the likelihood ratio
which is at the center of most frequentist hypothesis tests. As its
name suggests, the likelihood ratio is the ratio of the (maximized)
likelihoods under the null and the alternative hypotheses.

19. The predictive density is usually of known (closed) form under
conjugate prior assumptions.

20. Here, we assume that $\theta$ is continuous, which is the case in
most financial applications.

21. This statement is valid only if one assumes that the
data-generating process remains unchanged through time.

22. The posterior mean is an example of the shrinkage effect that
combining prior and data information has.

Abstract: Linear regression is the "workhorse" of financial modeling. Cornerstone applications, 
such as asset pricing models, as well as time series models, are built around linear regression's 
methods and tools. Casting the linear regression methodology in a Bayesian setting helps account 
for estimation uncertainty, allows for integration of prior information, and makes accessible the 
Bayesian numerical simulation framework. 


In this entry, we lay the foundations of Bayesian linear regression
estimation. We start with a univariate model with Gaussian innovations
and consider two cases for prior distributional assumptions: diffuse
and informative. Then, we show how one could incorporate knowledge that
the sample is not homogeneous with respect to the variance, for
example, due to a structural break. Finally, multivariate regression
estimation is discussed.


THE UNIVARIATE LINEAR 
REGRESSION MODEL 

The univariate linear regression model attempts to explain the
variability in one variable (called the dependent variable) with the
help of one or more other variables (called explanatory or independent
variables) by asserting a linear relationship between them. We write
the model as

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{K-1} X_{K-1} + \epsilon \qquad (1)$$

where

$Y$ = dependent variable;
$X_k$ = independent (explanatory) variables, $k = 1, \ldots, K-1$;
$\alpha$ = regression intercept;
$\beta_k$ = regression (slope) coefficients, $k = 1, \ldots, K-1$,
representing the effect a unit change in $X_k$ has on $Y$, keeping the
remaining independent variables, $X_j$, $j \neq k$, fixed;
$\epsilon$ = regression disturbance.

The regression disturbance is the source of randomness about the linear
(deterministic) relationship between the dependent and independent
variables. Whereas $\alpha + \beta_1 X_1 + \cdots + \beta_{K-1} X_{K-1}$
represents the part of $Y$'s variability explained by $X_k$,
$k = 1, \ldots, K-1$, $\epsilon$ represents the portion of $Y$'s
variability left unexplained. It is usually assumed that the
independent variables are fixed (nonstochastic).

Suppose that we have $n$ observations of the dependent and the
independent variables available. These data are then described by

$$y_i = \alpha + \beta_1 x_{1,i} + \cdots + \beta_{K-1} x_{K-1,i} + \epsilon_i, \qquad i = 1, \ldots, n \qquad (2)$$

The subscript $i$, $i = 1, \ldots, n$, refers to the $i$th observation
of the respective random variable. To describe the source of
randomness, $\epsilon$, one needs to make a distributional assumption
about it. For simplicity, assume that $\epsilon_i$, $i = 1, \ldots, n$,
are independently and identically distributed (IID) with the normal
distribution and have zero means and (equal) variances, $\sigma^2$.
Then, the dependent variable, $Y$, has a normal distribution as well,

$$y_i \sim N(\mu_i, \sigma^2) \qquad (3)$$

where $\mu_i = \alpha + \beta_1 x_{1,i} + \cdots + \beta_{K-1} x_{K-1,i}$.
Notice that the constant-variance assumption in (3) is quite
restrictive. We come back to this issue later in the entry.


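The expression in (2) is often written in the following compact form

$$y = X\beta + \epsilon \qquad (4)$$

where $y$ is an $n \times 1$ vector,

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

$\beta$ is a $K \times 1$ vector,

$$\beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_{K-1} \end{pmatrix}$$

$X$ is an $n \times K$ matrix whose first column consists of ones,

$$X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{K-1,1} \\ 1 & x_{1,2} & \cdots & x_{K-1,2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,n} & \cdots & x_{K-1,n} \end{pmatrix}$$

and $\epsilon$ is an $n \times 1$ vector,

$$\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

We write the normal distributional assumption for the regression
disturbances in compact form as

$$\epsilon \sim N(0, \sigma^2 I_n)$$

where $I_n$ is an $(n \times n)$ identity matrix. The parameters in (4)
we need to estimate are $\beta$ and $\sigma^2$. Assuming normally
distributed disturbances, we write the likelihood function for the
model parameters as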

$$L(\alpha, \beta_1, \ldots, \beta_{K-1}, \sigma \mid y, X) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \alpha - \beta_1 x_{1,i} - \cdots - \beta_{K-1} x_{K-1,i}\right)^2\right\}$$

Or, in matrix notation, we have the likelihood function for the
parameters of a multivariate normal distribution,

$$L(\beta, \sigma \mid y, X) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\} \qquad (5)$$


Bayesian Estimation of the 
Univariate Regression Model 

In the classical setting, the regression parameters are usually
estimated by maximizing the model's likelihood with respect to $\beta$
and $\sigma^2$, for instance, the likelihood in (5) if the normal
distribution is assumed. When disturbances are assumed to be normally
distributed, the maximum likelihood and the ordinary least squares
(OLS) methods produce identical parameter estimates. It can be shown
that the OLS estimator of the regression coefficients vector, $\beta$,
is given by

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (6)$$

where the prime symbol (') denotes a matrix transpose.¹ The estimator
of $\sigma^2$ is²

$$\hat{\sigma}^2 = \frac{1}{n-K}(y - X\hat{\beta})'(y - X\hat{\beta}) \qquad (7)$$
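
As a rough illustration of (6) and (7), the following NumPy sketch computes the OLS coefficient vector and the variance estimator on simulated data. The function name `ols`, the seed, and the "true" coefficients are assumptions made for the example only.

```python
import numpy as np


def ols(X, y):
    """Return (beta_hat, sigma2_hat) for y = X beta + eps, as in (6) and (7)."""
    n, K = X.shape
    # Solve the normal equations X'X beta = X'y rather than inverting X'X.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - K)
    return beta_hat, sigma2_hat


# Simulated data: intercept plus two regressors (all values hypothetical).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column of ones
true_beta = np.array([0.5, 1.0, -2.0])
y = X @ true_beta + rng.normal(scale=0.3, size=n)

beta_hat, sigma2_hat = ols(X, y)
print(beta_hat, sigma2_hat)
```

Solving the normal equations directly is used here for brevity; a QR- or SVD-based routine such as `np.linalg.lstsq` is numerically safer when the columns of $X$ are nearly collinear.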

To account for the parameters' estimation risk and to incorporate prior
information, regression estimation can be cast in a Bayesian setting.
We consider two prior scenarios: a diffuse improper prior and an
informative conjugate prior for the regression parameter vector,
$(\beta, \sigma^2)$.


Diffuse Improper Prior 

The joint diffuse improper prior for $\beta$ and $\sigma^2$ is given by

$$\pi(\beta, \sigma^2) \propto \frac{1}{\sigma^2} \qquad (8)$$

where the regression coefficients can take any real value,
$-\infty < \beta_k < \infty$, for $k = 1, \ldots, K$, and the
disturbance variance is positive, $\sigma^2 > 0$.

Combining the likelihood in (5) and the prior above, we obtain the
posteriors of the model parameters as follows:

• The posterior distribution of $\beta$ conditional on $\sigma^2$ is
(multivariate) normal:

$$p(\beta \mid y, X, \sigma^2) = N\left(\hat{\beta}, (X'X)^{-1}\sigma^2\right) \qquad (9)$$

where $\hat{\beta}$ is the OLS estimate in (6) and $(X'X)^{-1}\sigma^2$
is the covariance matrix of $\beta$.

• The posterior distribution of $\sigma^2$ is inverted-$\chi^2$:

$$p(\sigma^2 \mid y, X) = \text{Inv-}\chi^2(n - K, \hat{\sigma}^2) \qquad (10)$$

where $\hat{\sigma}^2$ is the estimator of $\sigma^2$ in (7).

It could be useful to obtain the marginal (unconditional) distribution
of $\beta$ in order to characterize it independently of $\sigma^2$ (as
in practical applications, the variance is an unknown parameter).³ It
can be shown, by integrating the joint posterior distribution

$$p(\beta, \sigma^2 \mid y, X) = p(\beta \mid y, X, \sigma^2)\, p(\sigma^2 \mid y, X)$$

with respect to $\sigma^2$, that $\beta$'s unconditional posterior
distribution is a multivariate Student's t distribution with a kernel
given by⁴

$$p(\beta \mid y, X) \propto \left[(n - K) + (\beta - \hat{\beta})'\frac{X'X}{\hat{\sigma}^2}(\beta - \hat{\beta})\right]^{-n/2} \qquad (11)$$

Notice that integrating $\sigma^2$ out makes $\beta$'s distribution
more heavy-tailed, duly reflecting the uncertainty about $\sigma^2$'s
true value. Although $\beta$'s mean vector is unchanged, its variance
has increased (on average) by the factor $\nu/(\nu - 2)$:

$$\Sigma_\beta = \hat{\sigma}^2 (X'X)^{-1}\frac{\nu}{\nu - 2}$$

where $\nu = n - K$ is the degrees of freedom parameter of the
multivariate Student's t distribution above.
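
The two-step structure of (9) and (10) suggests a simple simulation check of the marginal result. The sketch below draws $\sigma^2$ and then $\beta$ given $\sigma^2$, and compares the Monte Carlo covariance of $\beta$ with $\hat{\sigma}^2 (X'X)^{-1}\nu/(\nu - 2)$. It assumes that Inv-$\chi^2(\nu, s^2)$ denotes the scaled inverted $\chi^2$ distribution, so that a draw can be generated as $\nu s^2 / \chi^2_\nu$; the function name and the simulated data are hypothetical.

```python
import numpy as np


def sample_diffuse_posterior(X, y, n_draws, rng):
    """Draw (beta, sigma^2) from the posterior under the diffuse improper prior."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    nu = n - K
    sigma2_hat = resid @ resid / nu
    # Assumption: sigma^2 | y, X is scaled Inv-chi^2(nu, sigma2_hat),
    # simulated as nu * sigma2_hat / chi2_nu.
    sigma2 = nu * sigma2_hat / rng.chisquare(nu, size=n_draws)
    # beta | sigma^2, y, X ~ N(beta_hat, sigma^2 (X'X)^{-1}), as in (9).
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal((n_draws, K))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    # Analytical covariance of the marginal Student's t distribution.
    cov_t = sigma2_hat * XtX_inv * nu / (nu - 2)
    return beta, sigma2, beta_hat, cov_t


rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_draws, sigma2_draws, beta_hat, cov_t = sample_diffuse_posterior(X, y, 50_000, rng)
mc_cov = np.cov(beta_draws, rowvar=False)
print(mc_cov)
print(cov_t)
```

The two printed matrices should agree up to Monte Carlo error, which is one way to see the variance inflation that results from integrating $\sigma^2$ out.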

In conclusion of our discussion of the posteriors in the diffuse
improper prior scenario, suppose we are interested particularly in one
of the regression coefficients, say $\beta_k$. For example, $\beta_k$
could be the return on a factor (size, value, momentum, etc.) in a
multifactor model of stock returns. It can be shown that the
standardized $\beta_k$ has a Student's t distribution with $n - K$
degrees of freedom as its marginal posterior distribution,

$$\frac{\beta_k - \hat{\beta}_k}{(h_{k,k})^{1/2}} \,\Big|\, y, X \sim t_{n-K} \qquad (12)$$

where $h_{k,k}$ is the $k$th diagonal element of
$\hat{\sigma}^2(X'X)^{-1}$ and $\hat{\beta}_k$ is the OLS estimate of
$\beta_k$ (the corresponding component of $\hat{\beta}$). Bayesian
intervals for $\beta_k$ can then be constructed analytically.


Informative Prior 

Under the normality assumption for the regression errors in (4), one
can make use of the natural conjugate framework to reflect the existing
prior knowledge and to obtain convenient analytical posterior results.
Thus, let us assume that the regression coefficients vector, $\beta$,
has a normal prior distribution (conditional on $\sigma^2$) and
$\sigma^2$ has an inverted-$\chi^2$ prior distribution:

$$\beta \mid \sigma \sim N(\beta_0, \sigma^2 A) \qquad (13)$$

and

$$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, c_0^2) \qquad (14)$$

Four parameters have to be determined a priori: $\beta_0$, $A$,
$\nu_0$, and $c_0^2$. The scale matrix $A$ is often chosen to be
$\tau^{-1}(X'X)^{-1}$ in order to obtain a prior covariance the same as
the covariance matrix of the OLS estimator of $\beta$ up to a scaling
constant. Varying the (scale) parameter, $\tau$, allows one to adjust
the degree of confidence one has that $\beta$'s mean is $\beta_0$: the
smaller the value of $\tau$, the greater the degree of uncertainty
about $\beta$.

The easiest way to assert the prior mean, $\beta_0$, is to fix it at
some default value (such as 0, depending on the estimation context),
unless more specific prior information is available, or to set it equal
to the OLS estimate, $\hat{\beta}$, obtained from running the
regression (4) on a prior sample of data.⁵

The parameters of the inverted-$\chi^2$ distribution could be asserted
using a prior sample of data as follows:

$$\nu_0 = n_0 - K$$

$$c_0^2 = \frac{1}{\nu_0}(y_0 - X_0\hat{\beta}_0)'(y_0 - X_0\hat{\beta}_0)$$

where the subscript, 0, refers to the prior data sample. If no prior
data sample is available, the inverted-$\chi^2$ hyperparameters could
be specified indirectly, by expressing beliefs about the prior mean and
variance of $\sigma^2$.⁶

The posterior distributions for the model parameters, $\beta$ and
$\sigma^2$, have the same form as the prior distributions; however,
their parameters are updated to reflect the data information, along
with the prior beliefs.

• The posterior for $\beta$ is

$$p(\beta \mid y, X, \sigma^2) = N(\beta^*, \Sigma_\beta) \qquad (15)$$

where the posterior mean and covariance matrix of $\beta$ are given by

$$\beta^* = (A^{-1} + X'X)^{-1}(A^{-1}\beta_0 + X'X\hat{\beta}) \qquad (16)$$

and

$$\Sigma_\beta = \sigma^2(A^{-1} + X'X)^{-1} \qquad (17)$$

We can observe that the posterior mean is a weighted average of the
prior mean and the OLS estimator of $\beta$, as noted earlier in the
entry as well.⁷

• The inverted-$\chi^2$ posterior distribution of $\sigma^2$ is

$$p(\sigma^2 \mid y, X) = \text{Inv-}\chi^2(\nu^*, c^{2*}) \qquad (18)$$

The parameters of $\sigma^2$'s posterior distribution are given by

$$\nu^* = \nu_0 + n \qquad (19)$$

and

$$\nu^* c^{2*} = (n - K)\hat{\sigma}^2 + (\beta_0 - \hat{\beta})'H(\beta_0 - \hat{\beta}) + \nu_0 c_0^2 \qquad (20)$$

where $H = \left((X'X)^{-1} + A\right)^{-1}$.

As earlier, we can derive the marginal posterior distribution of
$\beta$ by integrating $\sigma^2$ out of the joint posterior
distribution. We obtain again a multivariate Student's t distribution,
$t(\nu^*, \beta^*, Q)$,

$$p(\beta \mid y, X) \propto \left(\nu^* + (\beta - \beta^*)'Q(\beta - \beta^*)\right)^{-(\nu^* + K)/2} \qquad (21)$$

where $Q = (A^{-1} + X'X)/c^{2*}$.

The mean of $\beta$ remains the same, $\beta^*$ (as it is independent
of $\sigma^2$), while its unconditional (with respect to $\sigma^2$)
covariance matrix is equal to $Q^{-1}\nu^*/(\nu^* - 2)$. The marginal
posterior distribution for a single regression coefficient, $\beta_k$,
can be shown to be

$$\frac{\beta_k - \beta_k^*}{(q_{k,k})^{1/2}} \,\Big|\, y, X \sim t_{\nu_0 + n - K} \qquad (22)$$

where $q_{k,k}$ is the $k$th diagonal element of $Q^{-1}$ and
$\beta_k^*$ is the $k$th component of $\beta^*$.
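
The conjugate update in (16), (19), and (20) can be sketched as follows. The function name, the simulated data, and the hyperparameter values (`beta0`, `A`, `nu0`, `c0_sq`) are hypothetical choices for illustration, not values used in this entry.

```python
import numpy as np


def conjugate_posterior(X, y, beta0, A, nu0, c0_sq):
    """Posterior parameters under the normal / inverted-chi^2 conjugate prior."""
    n, K = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)           # OLS estimate, (6)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - K)               # (7)
    A_inv = np.linalg.inv(A)
    M = A_inv + XtX                                    # A^{-1} + X'X
    beta_star = np.linalg.solve(M, A_inv @ beta0 + XtX @ beta_hat)   # (16)
    H = np.linalg.inv(np.linalg.inv(XtX) + A)
    d = beta0 - beta_hat
    nu_star = nu0 + n                                  # (19)
    c_star_sq = ((n - K) * sigma2_hat + d @ H @ d + nu0 * c0_sq) / nu_star  # (20)
    return beta_star, M, nu_star, c_star_sq


rng = np.random.default_rng(2)
n, K = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.2, 1.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta0 = np.zeros(K)        # hypothetical prior mean
A = np.eye(K)              # hypothetical prior scale matrix
nu0, c0_sq = 5.0, 0.3      # hypothetical inverted-chi^2 hyperparameters

beta_star, M, nu_star, c_star_sq = conjugate_posterior(X, y, beta0, A, nu0, c0_sq)
print(beta_star, nu_star, c_star_sq)
```

Dividing the returned matrix `M` by `c_star_sq` gives the matrix $Q$ in (21). With the prior mean at zero, the posterior mean is pulled from the OLS estimate toward the origin, which is the shrinkage effect described above.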


Prediction 

Suppose that we would like to predict the dependent variable, $Y$, $p$
steps ahead in time and denote by the $p \times 1$ vector
$\tilde{y} = (y_{T+1}, y_{T+2}, \ldots, y_{T+p})$ these future
observations. We assume that the future observations of the independent
variables are known and given by $\tilde{X}$. The predictive density in
the linear regression context can be expressed as⁸

$$p(\tilde{y} \mid y, X, \tilde{X}) = \iint p(\tilde{y} \mid \beta, \sigma^2, \tilde{X})\, p(\beta, \sigma^2 \mid y, X)\, d\beta\, d\sigma^2 \qquad (23)$$

where $p(\beta, \sigma^2 \mid y, X)$ is the joint posterior
distribution of $\beta$ and $\sigma^2$.

It can be shown that the predictive distribution is multivariate
Student's t. Under the diffuse improper prior scenario, the predictive
distribution is

$$p(\tilde{y} \mid y, X, \tilde{X}) = t(n - K, \tilde{X}\hat{\beta}, S) \qquad (24)$$

where $S = \hat{\sigma}^2\left(I_p + \tilde{X}(X'X)^{-1}\tilde{X}'\right)$
and $\hat{\beta}$ is the posterior mean of $\beta$ under the diffuse
improper scenario. In the case of the informative prior, the predictive
distribution of $\tilde{y}$ is

$$p(\tilde{y} \mid y, X, \tilde{X}) = t(\nu_0 + n, \tilde{X}\beta^*, V) \qquad (25)$$

where $V = c^{2*}\left(I_p + \tilde{X}(A^{-1} + X'X)^{-1}\tilde{X}'\right)$
and $\beta^*$ is the posterior mean of $\beta$ in (16).

Certainly, it is again possible to derive the predictive distribution
for a single component of $\tilde{y}$ (a univariate Student's t
distribution) in the two scenarios, respectively,

$$\frac{\tilde{y}_k - \tilde{X}_k\hat{\beta}}{(s_{k,k})^{1/2}} \sim t_{n-K} \qquad (26)$$

where $\tilde{X}_k$ is the $k$th row of $\tilde{X}$ (the observations
of the independent variables pertaining to the $k$th future period),
and $s_{k,k}$ is the $k$th diagonal element of the scale matrix, $S$,
in (24), and

$$\frac{\tilde{y}_k - \tilde{X}_k\beta^*}{(v_{k,k})^{1/2}} \sim t_{\nu_0 + n - K} \qquad (27)$$

where $v_{k,k}$ is the $k$th diagonal element of the scale matrix, $V$,
in (25).
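
For the diffuse improper prior, the location and scale matrix in (24) and the single-period result in (26) can be computed along the following lines. This is a sketch on simulated data; the function name, the future regressor values in `X_new`, and the 90% level are assumptions made for the example.

```python
import numpy as np


def predictive_diffuse(X, y, X_new):
    """Degrees of freedom, location, and scale matrix of the predictive
    Student's t distribution in (24)."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    nu = n - K
    sigma2_hat = resid @ resid / nu
    p = X_new.shape[0]
    loc = X_new @ beta_hat
    S = sigma2_hat * (np.eye(p) + X_new @ XtX_inv @ X_new.T)
    return nu, loc, S


rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.4, size=n)
X_new = np.array([[1.0, 0.0], [1.0, 2.0]])   # two hypothetical future periods

nu, loc, S = predictive_diffuse(X, y, X_new)

# Simulate the first future period using (26): a location-scale Student's t.
draws = loc[0] + np.sqrt(S[0, 0]) * rng.standard_t(nu, size=20_000)
interval_90 = np.quantile(draws, [0.05, 0.95])
print(loc, np.diag(S), interval_90)
```

The diagonal of `S` exceeds $\hat{\sigma}^2$ for every future period because the predictive scale adds parameter uncertainty to the disturbance variance, and it grows as the future regressor values move away from the center of the observed data.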


The Case of Unequal Variances 
We mentioned earlier in the entry that the equal-variance assumption in
(3) might be somewhat restrictive. Two examples would help clarify what
that means. First, suppose that the $n$ observations of $Y$ are
collected through time. It is a common practice in statistical
estimation to use the longest available data record, likely spanning
many years. Changes in the underlying economic or financial paradigms,
the way data are recorded, and so on, that might have occurred during
the sample period might have caused the variance of the random variable
(as well as its mean, for that matter) to shift.⁹ The equal-variance
assumption would then lead to variance overestimation in the
low-variance period(s) and variance underestimation in the
high-variance period(s). When the variance (and/or mean) shifts
permanently, the so-called "structural-break" models can be employed to
reflect it.¹⁰

Second, if our estimation problem is based on observations recorded at
a particular point in time (producing a cross-sectional sample), the
equal-variance assumption might be violated again. All units in our
sample could potentially have different variances, so that
$\mathrm{var}(y_i) = \sigma_i^2$, instead of
$\mathrm{var}(y_i) = \sigma^2$ as in (3), for $i = 1, \ldots, n$.
Estimation would then be severely hampered because this would imply a
greater number of unknown parameters (variances and regression
coefficients) than available data points.

In practice one would perhaps be able to identify groups of homogeneous
sample units that can be assumed to have equal variances. Suppose, for
instance, that the cross-sectional sample consists of small-cap and
large-cap stock returns. One could then expect that the return
variances (volatilities) across the two groups differ but assume that
companies within each group have equal return volatilities. More
generally, one could assume some form of functional relation among the
unknown variances; this would serve to reduce the number of unknown
parameters to estimate. We now provide one possible way to address the
variance inequality in the case when the sample observations can be
divided into two homogeneous (with respect to their variances) groups
or when a structural break (whose timing we know) is present in the
sample.¹¹

Denote the observations from the two groups by
$y_1 = (y_{1,1}, y_{1,2}, \ldots, y_{1,n_1})$ and
$y_2 = (y_{2,1}, y_{2,2}, \ldots, y_{2,n_2})$, so that $y = (y_1, y_2)$
and $n_1 + n_2 = n$. The univariate regression setup in (1) is modified
as

$$y_1 = X_1\beta + \epsilon_1$$
$$y_2 = X_2\beta + \epsilon_2 \qquad (28)$$

where $X_1$ and $X_2$ are, respectively, $(n_1 \times K)$ and
$(n_2 \times K)$ matrices of observations of the independent variables.
The disturbances are assumed to be independent and distributed as

$$\epsilon_1 \sim N(0, \sigma_1^2 I_{n_1})$$
$$\epsilon_2 \sim N(0, \sigma_2^2 I_{n_2}) \qquad (29)$$

where $\sigma_1^2 \neq \sigma_2^2$. The likelihood function for the
model parameters, $\beta$, $\sigma_1^2$, and $\sigma_2^2$, is given by

$$L(\beta, \sigma_1^2, \sigma_2^2 \mid y, X_1, X_2) \propto (\sigma_1^2)^{-n_1/2}(\sigma_2^2)^{-n_2/2} \exp\left(-\frac{1}{2\sigma_1^2}(y_1 - X_1\beta)'(y_1 - X_1\beta) - \frac{1}{2\sigma_2^2}(y_2 - X_2\beta)'(y_2 - X_2\beta)\right) \qquad (30)$$

A noninformative diffuse prior can be asserted, as in (3.5), by
assuming that the parameters are independent. The prior is written,
then, as

$$\pi(\beta, \sigma_1, \sigma_2) \propto \frac{1}{\sigma_1\sigma_2}$$

It is straightforward to write out the joint posterior density of
$\beta$, $\sigma_1^2$, and $\sigma_2^2$, which can be integrated with
respect to the two variances to obtain the marginal posterior
distribution of the regression coefficients vector. Zellner (1971)
shows that the marginal posterior of $\beta$ is the product of two
multivariate Student's t densities,

$$p(\beta \mid y, X_1, X_2) \propto t(\nu_1, \hat{\beta}_1, S_1) \times t(\nu_2, \hat{\beta}_2, S_2)$$

where, for $i = 1, 2$, $\hat{\beta}_i$ is the OLS estimator of $\beta$
in the two expressions in (28) viewed as separate regressions,

$$\nu_i = n_i - K, \qquad S_i = \frac{1}{s_i^2}(X_i'X_i)$$

and

$$s_i^2 = \frac{1}{\nu_i}(y_i - X_i\hat{\beta}_i)'(y_i - X_i\hat{\beta}_i).$$

Zellner shows that the marginal posterior of $\beta$ above can be
approximated with a normal distribution (through a series of asymptotic
expansions).
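
Because the product of the two Student's t densities is known only up to a constant, a practical first step is to evaluate its logarithm at candidate values of $\beta$. The sketch below does this on simulated data. It assumes each factor has the same kernel form as (11), applied group by group with exponent $-n_i/2$; the function name and all data are hypothetical.

```python
import numpy as np


def log_marginal_kernel(beta, groups):
    """Unnormalized log marginal posterior of beta for the two-variance model:
    the sum of two multivariate Student's t log-kernels."""
    total = 0.0
    for X_i, y_i in groups:
        n_i, K = X_i.shape
        XtX = X_i.T @ X_i
        b_i = np.linalg.solve(XtX, X_i.T @ y_i)   # group-wise OLS estimate
        r = y_i - X_i @ b_i
        nu_i = n_i - K
        s2_i = r @ r / nu_i
        d = beta - b_i
        # Kernel [nu_i + d' (X_i'X_i / s_i^2) d]^(-n_i / 2), on the log scale.
        total += -0.5 * n_i * np.log(nu_i + d @ XtX @ d / s2_i)
    return total


rng = np.random.default_rng(4)
beta_true = np.array([1.0, 0.5])
X1 = np.column_stack([np.ones(40), rng.normal(size=40)])
X2 = np.column_stack([np.ones(70), rng.normal(size=70)])
y1 = X1 @ beta_true + rng.normal(scale=0.2, size=40)   # low-variance group
y2 = X2 @ beta_true + rng.normal(scale=1.0, size=70)   # high-variance group
groups = [(X1, y1), (X2, y2)]

at_truth = log_marginal_kernel(beta_true, groups)
far_away = log_marginal_kernel(beta_true + 1.0, groups)
print(at_truth, far_away)
```

A function of this kind could be handed to a numerical optimizer or a simulation routine to locate and explore the posterior mode; the low-variance group dominates the shape of the kernel because its residual variance is smaller.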

Illustration: The Univariate Linear 
Regression Model 

We now provide an example to illustrate the posterior and predictive
inference in a univariate linear regression model. We restrict our
attention to the diffuse noninformative prior and the informative prior
discussed above, in order to take advantage of their analytical
convenience.¹²

Our data consist of the monthly returns on 25 portfolios; the companies
in each portfolio are ranked according to market capitalization and
book-to-market (BM) ratios. The returns we use for model estimation
span the period from January 1995 to December 2005 (a total of 132 time
periods). We extract the factors that best explain the variability of
returns of the 25 portfolios using principal components analysis. The
first five factors explain around 95% of the variability and we use
their returns as the independent variables in our linear regression
model, making up the matrix $X$ (the first column is a column of ones).
The return on the portfolio consisting of the companies with the
smallest size and BM ratios is the dependent variable, $y$. In
addition, returns recorded for the months from January 1990 to December
1994 (a total of 60 time periods) are employed to compute the
hyperparameters of the informative prior distributions, in the manner
explained in the previous section. Our interest centers primarily on
the posterior inference for the regression coefficients, $\beta_k$,
$k = 1, \ldots, 6$: the intercept and the five factor exposures (in the
terminology of multifactor models).
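
The factor-extraction step described above can be sketched with a singular value decomposition. The portfolio returns used in this entry are not reproduced here, so the example runs on synthetic data with the same dimensions (132 months, 25 portfolios, 5 factors); the function name and the data-generating choices are assumptions.

```python
import numpy as np


def pca_factors(returns, n_factors):
    """Principal-component factor returns and their explained-variance shares."""
    centered = returns - returns.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    factors = centered @ Vt[:n_factors].T      # T x n_factors factor returns
    return factors, explained[:n_factors]


# Synthetic stand-in for the 25 portfolio return series.
rng = np.random.default_rng(5)
T, N, k = 132, 25, 5
latent = rng.normal(size=(T, k))
loadings = rng.normal(size=(k, N))
returns = latent @ loadings + 0.1 * rng.normal(size=(T, N))

factors, explained = pca_factors(returns, k)
X = np.column_stack([np.ones(T), factors])     # 132 x 6 design matrix
print(explained, explained.sum())
```

The resulting factor series are mutually orthogonal by construction, which keeps $X'X$ well conditioned in the regression that follows.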

Posterior Distributions 

The prior and posterior parameter values for $\beta$ are given in
Table 1. Part A of the table presents the results under the diffuse
improper prior assumption and Part B under the informative prior
assumption. In parentheses are the posterior standard deviations of the
regression coefficients.¹³ The OLS estimates of the regression
coefficients are given by the posterior means in the diffuse prior
scenario. Notice how the posterior mean of $\beta$ under the
informative prior is shrunk

Table 1 Posterior Inference for $\beta$

                                $\beta_1$   $\beta_2$   $\beta_3$   $\beta_4$   $\beta_5$   $\beta_6$
                                Intercept   Factor 1    Factor 2    Factor 3    Factor 4    Factor 5

A. Diffuse improper prior
Prior Mean                      -           -           -           -           -           -
Posterior Mean                  0.0048      -0.3108     -0.3997     0.0648      -0.4132     -0.0042
Posterior Standard Deviation    (0.0011)    (0.0048)    (0.0103)    (0.0202)    (0.0297)    (0.0410)
$b_{0.01}$                      0.0021      -0.3219     -0.4238     0.0174      -0.4826     -0.1000
$b_{0.05}$                      0.0029      -0.3187     -0.4168     0.0312      -0.4624     -0.0721
$b_{0.25}$                      0.0040      -0.314      -0.4067     0.0511      -0.4333     -0.0319
$b_{0.75}$                      0.0055      -0.3075     -0.3928     0.0784      -0.3931     0.0235
$b_{0.95}$                      0.0067      -0.3029     -0.3827     0.0983      -0.364      0.0636
$b_{0.99}$                      0.0075      -0.2996     -0.3757     0.1121      -0.3438     0.0915

B. Informative prior
Prior Mean                      0.0037      -0.2952     -0.4217     0.038       -0.2784     0.1063
Posterior Mean                  0.0042      -0.303      -0.4107     0.0514      -0.3458     0.0510
Posterior Standard Deviation    (0.0008)    (0.0033)    (0.0072)    (0.0142)    (0.0208)    (0.0287)
$b_{0.01}$                      0.0024      -0.3108     -0.4276     0.0182      -0.3945     -0.0162
$b_{0.05}$                      0.0029      -0.3085     -0.4226     0.0280      -0.3801     0.0038
$b_{0.25}$                      0.0037      -0.3052     -0.4156     0.0418      -0.3598     0.0318
$b_{0.75}$                      0.0048      -0.3007     -0.4059     0.0609      -0.3318     0.0703
$b_{0.95}$                      0.0056      -0.2975     -0.3986     0.0747      -0.3115     0.0983
$b_{0.99}$                      0.0061      -0.2952     -0.3939     0.0844      -0.2972     0.1180

Notes: Part A contains posterior results under the diffuse improper
prior; Part B contains posterior results under the informative prior.

Figure 1 Posterior Densities of $\beta_6$ under the Two Prior Scenarios

Notes: The plot on the left refers to the diffuse improper prior; the
plot on the right, to the informative prior.


away from the OLS estimate and towards the prior value, for the chosen
value of $\tau = 1$. We could introduce more uncertainty into the prior
distribution of $\beta$ (make it less informative) by choosing a
smaller value of $\tau$; the posterior mean of $\beta$ would then be
closer to the OLS estimate. Conversely, the stronger our prior belief
about the mean of $\beta$, the closer the posterior mean would be to
the prior mean.
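
The role of $\tau$ can be made explicit with a short derivation. It assumes only the scale matrix choice $A = \tau^{-1}(X'X)^{-1}$ mentioned earlier in the entry, so that $A^{-1} = \tau X'X$. Substituting into the posterior mean in (16) gives

$$
\beta^* = (\tau X'X + X'X)^{-1}(\tau X'X\beta_0 + X'X\hat{\beta}) = \frac{\tau\beta_0 + \hat{\beta}}{1 + \tau}
$$

For $\tau = 1$ the posterior mean is the simple average of the prior mean and the OLS estimate. This agrees with Table 1: for Factor 1, $(-0.2952 - 0.3108)/2 = -0.3030$, and for Factor 2, $(-0.4217 - 0.3997)/2 = -0.4107$. As $\tau \to 0$, $\beta^* \to \hat{\beta}$; as $\tau \to \infty$, $\beta^* \to \beta_0$.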

Credible Intervals 

Since the marginal posterior distribution of $\beta_k$,
$k = 1, \ldots, 6$, is of known form (Student's t), we can compute
analytically the Bayesian confidence intervals for the regression
coefficients. We provide the values of several quantiles of the
posterior distribution of each $\beta_k$. For example, under the
diffuse improper prior, the 95% (symmetric) Bayesian interval for
$\beta_2$ is (−0.3187, −0.3029), while, under the informative prior,
the 99% (symmetric) Bayesian interval for $\beta_6$ is
(−0.0162, 0.1180).¹⁴

Hypothesis Comparison 

In the frequentist regression tradition, testing the significance of
the regression coefficients is of great interest: the validity of the
null hypothesis $\beta_k = 0$ is examined. In the Bayesian setting, we
could evaluate and compare the posterior probabilities,
$P(\beta_k > 0 \mid y, X)$ and $P(\beta_k < 0 \mid y, X)$ (given in
Table 1 for each factor exposure). We could safely conclude that the
exposures on Factor 1 through Factor 4 are different from zero: the
mass of their posterior distributions is concentrated on either
positive or negative values. For the exposure on Factor 5, the picture
is less than clear-cut. Under the diffuse, improper prior, a bit over
50% of the posterior mass is below zero and the rest is above zero.
Therefore, one would perhaps take the pertinence of this factor for
explaining the variability of the return on the small-cap/small-BM
portfolio with a grain of salt. Notice, however, how the situation
changes in the informative-prior case. More than 95% of the posterior
mass is above zero. The strong prior beliefs about a positive mean of
$\beta_6$ lead to the conclusion that the exposure of the portfolio
returns to Factor 5 is not zero. Figure 1 further illustrates these
observations.
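
Posterior probabilities and quantiles of this kind can be approximated by simulating from the Student's t result in (12). The sketch below uses the Factor 5 row of Table 1, Part A. It assumes $\nu = 132 - 6 = 126$ degrees of freedom and backs out the scale $h_{k,k}$ from the reported posterior standard deviation; the function name and the number of draws are arbitrary choices.

```python
import numpy as np


def coefficient_summary(beta_hat_k, h_kk, nu, rng, n_draws=100_000):
    """Monte Carlo summary of one coefficient, using the result in (12) that
    (beta_k - beta_hat_k) / sqrt(h_kk) has a Student's t distribution."""
    draws = beta_hat_k + np.sqrt(h_kk) * rng.standard_t(nu, size=n_draws)
    return {
        "prob_positive": np.mean(draws > 0),
        "q05": np.quantile(draws, 0.05),
        "q95": np.quantile(draws, 0.95),
    }


rng = np.random.default_rng(6)
nu = 132 - 6                            # assumed: n = 132 months, K = 6 coefficients
post_mean, post_sd = -0.0042, 0.0410    # Table 1, Part A, Factor 5
h_kk = post_sd**2 * (nu - 2) / nu       # undo the nu / (nu - 2) variance inflation

summary = coefficient_summary(post_mean, h_kk, nu, rng)
print(summary)
```

The simulated probability of a positive exposure comes out somewhat below one half, in line with the statement that a bit over 50% of the posterior mass lies below zero under the diffuse prior, and the simulated 5% and 95% quantiles land close to the tabulated $b_{0.05}$ and $b_{0.95}$.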

THE MULTIVARIATE LINEAR 
REGRESSION MODEL 

Quite often in finance, and especially in investment management, one is
faced with modeling data consisting of many assets whose returns or
other attributes are not independent. Casting the problem in a
multivariate framework is one way to tackle dependencies between
assets.¹⁵ In this section, we outline the basics of multivariate
regression estimation within the Bayesian setting.¹⁶

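Suppose that $T$ observations are available on $N$ dependent variables. We arrange these in the $T \times N$ matrix, $Y$,

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_t \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,N} \\ \vdots & \vdots & & \vdots \\ y_{t,1} & y_{t,2} & \cdots & y_{t,N} \\ \vdots & \vdots & & \vdots \\ y_{T,1} & y_{T,2} & \cdots & y_{T,N} \end{pmatrix}$$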

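The multivariate linear regression is written as

$$Y = XB + U \tag{31}$$

where

$X$ = $T \times K$ matrix of observations of the $K$ independent variables,

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_t \\ \vdots \\ x_T \end{pmatrix} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,K} \\ \vdots & \vdots & & \vdots \\ x_{t,1} & x_{t,2} & \cdots & x_{t,K} \\ \vdots & \vdots & & \vdots \\ x_{T,1} & x_{T,2} & \cdots & x_{T,K} \end{pmatrix}$$

$B$ = $K \times N$ matrix of regression coefficients,

$$B = \begin{pmatrix} \alpha \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_N \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,N} \\ \vdots & \vdots & & \vdots \\ \beta_{K,1} & \beta_{K,2} & \cdots & \beta_{K,N} \end{pmatrix}$$

$U$ = $T \times N$ matrix of regression disturbances,

$$U = \begin{pmatrix} u_1 \\ \vdots \\ u_t \\ \vdots \\ u_T \end{pmatrix} = \begin{pmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,N} \\ \vdots & \vdots & & \vdots \\ u_{t,1} & u_{t,2} & \cdots & u_{t,N} \\ \vdots & \vdots & & \vdots \\ u_{T,1} & u_{T,2} & \cdots & u_{T,N} \end{pmatrix}$$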

The first column of $X$ usually consists of ones to reflect the presence of an intercept. In the multivariate setting, the usual linear regression assumption that the disturbances are IID means that each row of $U$ is an independent realization from the same $N$-dimensional multivariate distribution. We assume that this distribution is multivariate normal with zero mean and covariance matrix, $\Sigma$,

$$u_t \sim N(0, \Sigma) \tag{32}$$

for $t = 1, \ldots, T$. The off-diagonal elements of $\Sigma$ are nonzero, as we assume the dependent variables are correlated, and the covariance matrix contains $N$ variances and $N(N - 1)/2$ distinct covariances.

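Using the expression for the density of the multivariate normal distribution, we write the likelihood function for the unknown model parameters, $B$ and $\Sigma$, as 17

$$L(B, \Sigma \mid Y, X) \propto |\Sigma|^{-T/2} \exp\left(-\frac{1}{2}\sum_{t=1}^{T}(y_t - x_t B)\,\Sigma^{-1}(y_t - x_t B)'\right) \tag{33}$$

where $|\Sigma|$ is the determinant of the covariance matrix, and $y_t$ and $x_t$ denote the $t$-th rows of $Y$ and $X$. We now turn to specifying the prior distributional assumptions for $B$ and $\Sigma$.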


Diffuse Improper Prior 

The lack of specific prior knowledge about the elements of $B$ and $\Sigma$ can be reflected by employing the Jeffreys' prior, which, in the multivariate setting, takes the form 18

$$\pi(B, \Sigma) \propto |\Sigma|^{-\frac{N+1}{2}} \tag{34}$$

The posterior distributions parallel those in the univariate case. With the risk of stating the obvious, note that $B$ is a random matrix; therefore, its posterior distribution, conditional on $\Sigma$, will be a generalization of the multivariate normal posterior distribution in (9). To describe it, we first vectorize (expand column-wise) the matrix of regression coefficients, $B$, and denote the resulting $KN \times 1$ vector by $\beta$,

$$\beta = \mathrm{vec}(B) = \begin{pmatrix} B_{\cdot 1} \\ \vdots \\ B_{\cdot N} \end{pmatrix}$$

obtained by stacking vertically the columns $B_{\cdot 1}, \ldots, B_{\cdot N}$ of $B$. It can be shown that $\beta$'s posterior distribution, conditional on $\Sigma$, is a multivariate normal given by

$$p(\beta \mid Y, X, \Sigma) = N\left(\hat{\beta},\ \Sigma \otimes (X'X)^{-1}\right) \tag{35}$$

where $\hat{\beta} = \mathrm{vec}(\hat{B}) = \mathrm{vec}\left((X'X)^{-1}X'Y\right)$ is the vectorized OLS estimator of $B$ and "$\otimes$" denotes the Kronecker product. 19

The posterior distribution of $\Sigma$ can be shown to be the inverted-Wishart distribution (the multivariate analog of the inverted-gamma distribution),

$$p(\Sigma \mid Y, X) = IW(\nu^*, S) \tag{36}$$

where the degrees of freedom parameter is $\nu^* = T - K + N + 1$ and the scale matrix is $S = (Y - X\hat{B})'(Y - X\hat{B})$.
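The posterior quantities under the diffuse prior can be computed directly from the data. The sketch below, written with plain Python lists to stay dependency-free, computes the OLS estimator $(X'X)^{-1}X'Y$, the scale matrix $S$, the degrees of freedom $\nu^*$, and the conditional covariance $\Sigma \otimes (X'X)^{-1}$ for a given $\Sigma$. The helper names and the toy data are assumptions made for illustration.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is singular; X must have full column rank")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def kron(a, b):
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def diffuse_prior_posterior(X, Y):
    """Return (B_hat, S, nu_star) for the diffuse-prior posterior."""
    T, K, N = len(X), len(X[0]), len(Y[0])
    Xt = transpose(X)
    B_hat = solve(matmul(Xt, X), matmul(Xt, Y))  # (X'X)^{-1} X'Y, K x N
    fitted = matmul(X, B_hat)
    resid = [[y - f for y, f in zip(ry, rf)] for ry, rf in zip(Y, fitted)]
    S = matmul(transpose(resid), resid)  # (Y - X B_hat)'(Y - X B_hat), N x N
    return B_hat, S, T - K + N + 1


def conditional_cov_of_vec_B(Sigma, X):
    """Covariance of vec(B) given Sigma: Sigma kron (X'X)^{-1}."""
    K = len(X[0])
    identity = [[float(i == j) for j in range(K)] for i in range(K)]
    XtX_inv = solve(matmul(transpose(X), X), identity)
    return kron(Sigma, XtX_inv)


# Toy data (assumed): intercept plus one regressor, two dependent variables.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1, 3], [3, 2], [5, 1], [7, 0]]
B_hat, S, nu_star = diffuse_prior_posterior(X, Y)
```

In applied work a linear algebra library would replace the hand-written helpers; they are spelled out here only so that each matrix operation in (35) and (36) is visible.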

A full Bayesian informative prior approach to estimation of the multivariate linear regression model would involve specifying a prior distribution for the regression coefficients, $\beta$, and the covariance matrix, $\Sigma$. The conjugate prior scenario is invariably the scenario of choice, so as to keep the estimation within analytically manageable boundaries. That scenario consists of a multivariate normal prior for $\beta$ and inverted-Wishart for $\Sigma$. 20


KEY POINTS 

• To account for estimation risk and to incorporate prior information, regression estimation can be cast in a Bayesian setting.

• Depending on the amount of prior information, diffuse or informative priors can be selected for the regression parameters.

• Under the assumption that the regression innovations are distributed with the normal distribution, the natural conjugate priors for the regression coefficients and variance are Gaussian and inverted-$\chi^2$ distributions, respectively.

• The case of unequal variances is easily incorporated into the linear regression. Unequal variances may be due to reasons such as structural breaks in time series data or nonhomogeneity in cross-sectional data.

NOTES 

1. In order for the inverse matrix in (6) to exist, 
it is necessary that X'X be nonsingular, that 
is, that the n x K matrix X have a rank K 
(all its columns be linearly independent). 

2. The MLE of $\sigma^2$ is in fact

$$\hat{\sigma}^2_{MLE} = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$$

However, as it is not unbiased, the estimator in (7) is more often employed.

3. In fact, it is possible to describe fully the distribution of $\beta$ even without knowing its unconditional distribution, by employing a numerical simulation method such as the Gibbs sampler, for example, and making inferences on the basis of samples drawn from $\beta$'s and $\sigma^2$'s posterior distributions.

4. We denote the multivariate scaled, noncentral Student's $t$ distribution with degrees of freedom $\nu$, location parameter vector $\mu$, and scale matrix $S$ by $t(\nu, \mu, S)$. Its mean and covariance matrix are given, respectively, by $\mu$ and $S^{-1}\nu/(\nu - 2)$.

5. There are two contrasting approaches to prior parameter assertion. The full Bayesian approach calls for specifying the hyperprior parameters independently of the data used for model estimation. The empirical Bayesian approach would use the OLS estimate, $\hat{\beta}$, obtained from the data sample used for estimation, as the value for the hyperprior parameter.

6. The mean and variance of a random variable $X$ distributed with the inverted-$\chi^2$ distribution with parameters $\nu$ and $c$ are given, respectively, by

$$E(X) = \frac{\nu}{\nu - 2}\,c \qquad \mathrm{var}(X) = \frac{2\nu^2}{(\nu - 2)^2(\nu - 4)}\,c^2$$

7. See Chapter 6 in Rachev et al. (2008) for 
more details on this shrinkage effect. 

8. Denoting the sampling and posterior densities by $f(x \mid \theta)$ and $p(\theta \mid x)$, respectively, the predictive density one step ahead is defined as

$$f(x_{+1} \mid x) = \int f(x_{+1} \mid \theta)\,p(\theta \mid x)\,d\theta$$

where $x$ is the observed data, $\theta$ is the sampling distribution's parameter, and $x_{+1}$ denotes the one-step-ahead realization.

9. Returns on interest-rate instruments and 
foreign exchange are particularly likely to 
exhibit structural breaks. 

10. See, for example, Wang and Zivot (2000). 
Chapter 11 in Rachev et al. (2008) discusses 
the so-called "regime switching" models, in 
which parameters are allowed to change 
values according to the state of the world 
prevailing in a particular period in time. 

11. See Chapter 4 in Zellner (1971). 

12. See Chapter 5 in Rachev et al. (2008) for 
details on how to employ numerical simu¬ 
lation methods to tackle inference when no 
analytical results are available. 

13. The standard deviation of the univariate Student's $t$ distribution with degrees-of-freedom parameter $\nu$ and scale parameter $\sigma$ is given by $\sigma\sqrt{\nu/(\nu - 2)}$.

14. Notice that, since the Student's $t$ distribution is unimodal, these (symmetric) intervals are also the highest posterior density intervals.

15. Although the multivariate normal distribution is usually assumed because of its analytical tractability, dependencies among asset returns could be somewhat more complex than what the class of elliptical distributions (to which the normal distribution belongs) is able to describe. Alternative distributional assumptions could be made at the expense of analytical convenience and occasional substantial estimation problems (especially in high-dimensional settings). A more flexible way of dependence modeling is provided through the use of copulas. Some types of copulas could also suffer from estimation problems, especially in large-scale applications.

16. For applications to portfolio construction, 
see Chapters 6 through 9 in Rachev et al. 
(2008). 

17. The expression in the exponent in (33) could also be written as

$$-\frac{1}{2}\,\mathrm{tr}\left[(Y - XB)'(Y - XB)\,\Sigma^{-1}\right]$$

where "tr" denotes the trace operator, which sums the diagonal elements of a square matrix.

18. As in the univariate case, we assume independence between (the elements of) $B$ and $\Sigma$.

19. The Kronecker product is an operator for direct multiplication of matrices (which are not necessarily compatible). For two matrices, $A$ of size $m \times n$ and $B$ of size $p \times q$, the Kronecker product is defined as

$$A \otimes B = \begin{pmatrix} a_{1,1}B & a_{1,2}B & \cdots & a_{1,n}B \\ \vdots & \vdots & & \vdots \\ a_{m,1}B & a_{m,2}B & \cdots & a_{m,n}B \end{pmatrix}$$

resulting in an $mp \times nq$ block matrix.

20. See Chapters 6 and 7 of Rachev et al. (2008) 
for further details in the context of portfolio 
selection. 
Abstract: Empirical evidence abounds that asset returns exhibit characteristics such as volatility clustering, asymmetry, and heavy-tailedness. Volatility clustering describes the tendency of returns to alternate between periods of high volatility and low volatility. In addition, volatility responds asymmetrically to positive and negative return shocks: it tends to be higher when the market falls than when it rises. The nonconstancy of volatility has been suggested as an underlying reason for returns' fat tails. Volatility models attempt to systematically explain these stylized facts about asset returns. The Bayesian methodology offers distinct advantages over the classical framework in estimating volatility models. Parameter restrictions, such as the stationarity restriction, are notoriously difficult to handle within the frequentist setting and straightforward to implement in the Bayesian one. The MCMC numerical simulation methods greatly facilitate the estimation of complex volatility models, such as Markov-switching volatility models.


Generalized autoregressive conditional heteroskedastic (GARCH) models are used in financial modeling to provide a measure of volatility that could be employed in portfolio selection, risk management, and derivatives pricing. In this entry, we focus on the Bayesian treatment of GARCH model estimation. Our discussion of the choice of prior distributions and of posterior analysis is developed around an example where the data are assumed to follow the Student's $t$ distribution. We then introduce a Bayesian approach to Markov-switching GARCH models and explain in detail the steps one could use to estimate this important extension of the simple GARCH model.


BAYESIAN ESTIMATION OF 
THE GARCH(1,1) MODEL 

Volatility is a forward-looking concept. It is the variance of the yet unrealized asset return, conditional on all relevant, available information. Denote by $r_t$ the asset return at time $t$ and by $F_{t-1}$ the set of information available up to time $t - 1$. The information set includes, for example, past asset returns and past trading volume. The return's dynamics can be described as follows:

$$r_t = \mu_{t|t-1} + \sigma_{t|t-1}\epsilon_t \tag{1}$$

where

• $\mu_{t|t-1}$ is the return's conditional expectation at time $t$,

• $\sigma_{t|t-1}$ is the return's conditional volatility at time $t$,

• $\epsilon_t$ is a white noise process (a sequence of independent and identically distributed random variables with zero mean and variance of one).

The aim of volatility models is to specify the dynamics of $\sigma_{t|t-1}$. Autoregressive conditional heteroskedastic (ARCH)-type models describe the conditional volatility at time $t$ as a deterministic function of (attribute of) past squared returns. That is, volatility at time $t$ can be uniquely determined at time $t - 1$. The volatility updating expression of a GARCH(1,1) process is given by

$$\sigma^2_{t|t-1} = \omega + \alpha u^2_{t-1} + \beta\sigma^2_{t-1|t-2} \tag{2}$$

where $u_t = \sigma_{t|t-1}\epsilon_t$. The model parameters are restricted to be nonnegative ($\omega > 0$, $\alpha > 0$, and $\beta > 0$) in order to ensure that the conditional variance is positive for all values of the white noise process, $\epsilon_t$. Additionally, the requirement for stationarity imposes the constraint that the sum $\alpha + \beta$ is smaller than one.
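A minimal sketch of the GARCH(1,1) updating rule and the parameter restrictions follows. It assumes a zero conditional mean, so that $u_t = r_t$, and treats the first conditional variance as a known starting value; the function names and the parameter values are illustrative assumptions.

```python
import random


def is_admissible(omega, alpha, beta):
    """Positivity restrictions plus the stationarity constraint alpha + beta < 1."""
    return omega > 0 and alpha > 0 and beta > 0 and alpha + beta < 1


def garch_variances(returns, omega, alpha, beta, sigma2_0):
    """Conditional variances; element 0 is the assumed starting value sigma2_0."""
    sigma2 = [sigma2_0]
    for r_prev in returns[:-1]:
        # With a zero conditional mean, the shock u_{t-1} equals the return r_{t-1}.
        sigma2.append(omega + alpha * r_prev ** 2 + beta * sigma2[-1])
    return sigma2


def simulate_garch(T, omega, alpha, beta, seed=0):
    """Simulate T zero-mean returns with standard normal white noise."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(T):
        r = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        returns.append(r)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns


# Illustrative parameter values (assumed).
omega, alpha, beta = 0.1, 0.2, 0.7
assert is_admissible(omega, alpha, beta)
simulated = simulate_garch(500, omega, alpha, beta)
variances = garch_variances(simulated, omega, alpha, beta, omega / (1 - alpha - beta))
```

Because each variance is computed from quantities dated $t - 1$, the whole variance path is known once the parameters, the returns, and the starting value are fixed.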


Estimation of the model parameters is usually performed by likelihood maximization. Since the return at time $t$, $r_t$, depends on $\sigma_{t|t-1}$ and through it on the conditional volatilities in all previous periods, the unconditional density of the return is not available in closed form (it is a mixture of densities depending on the dynamics of $\sigma^2_{t|t-1}$). Therefore, the likelihood function of the GARCH(1,1) model is expressed as the product of the conditional densities of $r_t$ for each period $t$, $t = 1, 2, \ldots, T$.

Given $F_0$, the likelihood function $L(\theta \mid r_1, r_2, \ldots, r_T, F_0)$ is written as 1

$$L(\theta \mid r, F_0) = f(r_1 \mid \theta, F_0)\,f(r_2 \mid \theta, F_1) \cdots f(r_T \mid \theta, F_{T-1}) \tag{3}$$

where $r = (r_1, r_2, \ldots, r_T)$. Due to the form of the likelihood function, posterior estimation is performed, without exception, numerically. This, on the other hand, implies that few, if any, restrictions exist on the choice of prior distributions when estimation is cast in a Bayesian setting.

In this entry, our focus is on the Student's $t$ distributional assumption for the return disturbances, in an attempt to reflect the empirically observed heavy-tailedness of returns. This comes at the expense of only a marginal increase in complexity (compared to estimation of a model with normally distributed disturbances). The two numerical simulation methods we employ to simulate the posterior distribution of the vector of model parameters, $\theta$, are the Metropolis-Hastings algorithm and the Gibbs sampler. 2

Our focus is the model of returns in (1) with a modification. We assume that the return mean is unconditional and equal to zero. That is, we define our parameter vector as $\theta = (\omega, \alpha, \beta, \nu)$.

Distributional Setup 

Next, we outline the general setup we use in our 
Bayesian estimation of the GARCH(1,1) model. 
We modify this setup in the second half of the 
entry, where we discuss regime switching. 
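Likelihood Function

Assuming that $\epsilon_t$ is distributed with a Student's $t$ distribution with $\nu$ degrees of freedom, we write the likelihood function for the model's parameters as

$$L(\theta \mid r, F_0) \propto \prod_{t=1}^{T}\left[\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu}}\,(\sigma^2_{t|t-1})^{-1/2}\left(1 + \frac{r_t^2}{\nu\,\sigma^2_{t|t-1}}\right)^{-\frac{\nu+1}{2}}\right] \tag{4}$$

where $\sigma^2_0$ is considered as a known constant, for simplicity. Under the Student's $t$ assumption for $\epsilon_t$, the conditional volatility at time $t$ is given by

$$\frac{\nu}{\nu - 2}\,\sigma^2_{t|t-1}$$

for $\nu$ greater than 2.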
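The Student's $t$ likelihood of the zero-mean GARCH(1,1) model can be evaluated recursively, one conditional density at a time. The sketch below returns the log-likelihood up to an additive constant that does not depend on the parameters; it keeps the gamma-function terms of the Student's $t$ density because they depend on $\nu$, which is treated as unknown. The function name and the handling of inadmissible parameters are assumptions made for illustration.

```python
import math


def student_t_garch_loglik(returns, omega, alpha, beta, nu, sigma2_0):
    """Log-likelihood (up to a constant) of a zero-mean Student's t GARCH(1,1)."""
    if not (omega > 0 and alpha > 0 and beta > 0 and nu > 0):
        return float("-inf")  # inadmissible parameter values get zero likelihood
    # Part of the Student's t normalizing constant that depends on nu.
    log_norm = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu))
    loglik = 0.0
    sigma2 = sigma2_0  # assumed known starting value of the conditional scale
    for r in returns:
        loglik += (log_norm - 0.5 * math.log(sigma2)
                   - 0.5 * (nu + 1) * math.log1p(r * r / (nu * sigma2)))
        sigma2 = omega + alpha * r * r + beta * sigma2  # update for next period
    return loglik


# Illustrative call with made-up returns and parameter values.
value = student_t_garch_loglik([0.5, -1.2, 0.3], 0.1, 0.2, 0.7, 5.0, 1.0)
```

Working on the log scale avoids numerical underflow when the product over $t$ runs over a long return series.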


Prior Distributions 

For simplicity, assume that the conditional variance parameters have uninformative diffuse prior distributions over their respective ranges, 3

$$\pi(\omega, \alpha, \beta) \propto 1 \cdot I_{\{\theta_G\}} \tag{5}$$

where $I_{\{\theta_G\}}$ is an indicator function reflecting the constraints on the conditional variance parameters,

$$I_{\{\theta_G\}} = \begin{cases} 1 & \text{if } \omega > 0,\ \alpha > 0,\ \text{and } \beta > 0, \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

The choice of prior distribution for the degrees-of-freedom parameter, $\nu$, requires more care. Bauwens and Lubrano (1998) show that if a diffuse prior for $\nu$ is asserted on the interval $[0, \infty)$, the posterior distribution of $\nu$ is not proper (its right tail does not decay quickly enough, so that the posterior does not integrate to 1). Therefore, the prior for $\nu$ needs to be proper. Geweke (1993a) advocates the use of an exponential prior distribution with density given by

$$\pi(\nu) = \lambda\exp(-\nu\lambda) \tag{7}$$

The mean of the exponential distribution is given by $1/\lambda$. The parameter $\lambda$ can thus be uniquely determined from the prior intuition about $\nu$'s mean. Another prior option for $\nu$ is a uniform prior over an interval $[0, M]$, where $M$ is some finite number. Empirical research indicates that the degrees-of-freedom parameter calibrated from financial returns data (especially of daily and higher frequency) is usually less than 20, so the upper bound, $M$, of $\nu$'s range could be fixed at 20, for instance. Bauwens and Lubrano propose a third prior for $\nu$: the upper half of a Cauchy distribution centered around zero. In our discussion, we adopt the exponential prior distribution for $\nu$ in (7).


Posterior Distributions 

Given the distributional assumptions above, the posterior distribution of $\theta$ is written as

$$p(\theta \mid r, F_0) \propto \prod_{t=1}^{T}\left[\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu}}\,(\sigma^2_{t|t-1})^{-1/2}\left(1 + \frac{r_t^2}{\nu\,\sigma^2_{t|t-1}}\right)^{-\frac{\nu+1}{2}}\right] \times \exp(-\nu\lambda) \times I_{\{\theta_G\}} \tag{8}$$

The restrictions on $\omega$, $\alpha$, and $\beta$ are enforced during the sampling procedure by rejecting the draws that violate them. Stationarity can also be imposed and dealt with in the same way.

As evident from the expression in (8), the joint posterior density does not have a closed form. Posterior numerical simulations are facilitated if one employs a specific representation of the Student's $t$ distribution: a scale mixture of normal distributions. We explain this representation before we move on to the discussion of sampling algorithms.


Mixture of Normals Representation of the 
Student's t Distribution 

Suppose that return $r_t$ is distributed with the Student's $t$ distribution with $\nu$ degrees of freedom, scale parameter $\sigma$, and location parameter $\mu$. This distributional assumption can be represented as a scale mixture of normal distributions, given by 4

$$r_t \mid \mu, \sigma, \eta_t \sim N\left(\mu, \frac{\sigma^2}{\eta_t}\right) \tag{9}$$

where $\eta_t$, the so-called "mixing variable," has the gamma distribution,

$$\eta_t \mid \nu \sim \mathrm{Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right), \quad \text{for } t = 1, \ldots, T \tag{10}$$

The benefit of employing this representation is increased tractability of the posterior distribution because the nonlinear expression for the model's likelihood in (4) is linearized. Sampling from the conditional distributions of the remaining parameters is thus greatly facilitated. This comes at the expense of $T$ additional model parameters, $\eta = (\eta_1, \ldots, \eta_T)$, whose conditional posterior distribution needs to be simulated as well. 5
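The scale-mixture representation can be checked by simulation: drawing the mixing variable from the gamma distribution and then the return from the conditional normal should reproduce Student's $t$ behavior, for example a variance of $\sigma^2\nu/(\nu - 2)$. In the sketch below the gamma distribution is parameterized by shape $\nu/2$ and rate $\nu/2$, which corresponds to scale $2/\nu$ in Python's `random.gammavariate`; the function name and the numerical values are assumptions.

```python
import random


def draw_t_via_mixture(nu, mu, sigma, rng):
    """Draw a Student's t variate as a scale mixture of normals."""
    # Mixing variable: Gamma with shape nu/2 and rate nu/2 (scale 2/nu).
    eta = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Conditional on eta, the return is normal with variance sigma^2 / eta.
    return rng.gauss(mu, sigma / eta ** 0.5)


rng = random.Random(1)
nu, mu, sigma = 8.0, 0.0, 1.0  # illustrative values
draws = [draw_t_via_mixture(nu, mu, sigma, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
# The theoretical variance is sigma^2 * nu / (nu - 2) = 4/3 for these values.
```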

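Under this Student's $t$ representation, the parameter vector, $\theta$, is transformed to 6

$$\theta = (\omega, \alpha, \beta, \nu, \eta') \tag{11}$$

The log-likelihood function for $\theta$ is simply the normal log-likelihood,

$$\log\left(L(\theta \mid r, F_0)\right) = \text{const} - \frac{1}{2}\sum_{t=1}^{T}\left[\log(\sigma^2_{t|t-1}) - \log(\eta_t) + \frac{\eta_t r_t^2}{\sigma^2_{t|t-1}}\right] \tag{12}$$

The posterior distribution of $\theta$ has an additional term reflecting the mixing variables' distribution. The log-posterior distribution is written as

$$\begin{aligned}
\log\left(p(\theta \mid r, F_0)\right) = \text{const} &- \frac{1}{2}\sum_{t=1}^{T}\left[\log(\sigma^2_{t|t-1}) - \log(\eta_t) + \frac{\eta_t r_t^2}{\sigma^2_{t|t-1}}\right] \\
&+ \frac{T\nu}{2}\log\left(\frac{\nu}{2}\right) - T\log\Gamma\left(\frac{\nu}{2}\right) \\
&+ \left(\frac{\nu}{2} - 1\right)\sum_{t=1}^{T}\log(\eta_t) - \frac{\nu}{2}\sum_{t=1}^{T}\eta_t - \nu\lambda
\end{aligned} \tag{13}$$

for $\omega > 0$, $\alpha > 0$, and $\beta > 0$.

Next, we discuss some strategies for simulating the posterior in (13).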


Posterior Simulations with the 
Metropolis-Hastings Algorithm 

The Metropolis-Hastings (M-H) algorithm could be implemented in two ways. The first way is by sampling the whole parameter vector, $\theta$, from a proposal distribution (usually a multivariate Student's $t$ distribution) centered on the posterior mode and scaled by the negative inverse Hessian (evaluated at the posterior mode). 7 The second way is by employing a sampling scheme in which the parameter vector is updated component by component. Here, we focus on the latter M-H implementation.

Consider the decomposition of the parameter vector $\theta$ into three components, $\theta = (\theta_G, \nu, \eta')$, where $\theta_G = (\omega, \alpha, \beta)$. We would like to employ a scheme of sampling consecutively from the conditional posterior distributions of the components, given, respectively, by $p(\theta_G \mid \nu, \eta, r, F_0)$, $p(\nu \mid \theta_G, \eta, r, F_0)$, and $p(\eta \mid \theta_G, \nu, r, F_0)$. The scale mixture of normals representation of a Student's $t$ distribution allows us to recognize the conditional posterior distribution of the last component, $\eta$, as a standard distribution. For the first two components, $\theta_G$ and $\nu$, whose posterior distributions are not of standard form, we offer two posterior simulation approaches and mention alternatives that have been suggested in the literature.


Conditional Posterior Distribution for $\eta$

The full conditional posterior distribution for the (independently distributed) mixing parameters, $\eta_t$, $t = 1, \ldots, T$, can be shown to be a gamma distribution,

$$p(\eta_t \mid \theta_G, \nu, r, F_0) = \mathrm{Gamma}\left(\frac{\nu + 1}{2},\ \frac{r_t^2}{2\sigma^2_{t|t-1}} + \frac{\nu}{2}\right) \tag{14}$$
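Drawing the mixing variables from their gamma conditional posterior is a one-line operation per period. The sketch below assumes the gamma distribution is parameterized by shape and rate, so the rate is inverted to obtain the scale argument expected by `random.gammavariate`; the function name is illustrative.

```python
import random


def draw_mixing_variables(returns, sigma2, nu, rng):
    """Draw eta_t for every period from its gamma full conditional."""
    etas = []
    for r, s2 in zip(returns, sigma2):
        shape = (nu + 1.0) / 2.0
        rate = r * r / (2.0 * s2) + nu / 2.0
        etas.append(rng.gammavariate(shape, 1.0 / rate))  # scale = 1 / rate
    return etas


rng = random.Random(0)
# Illustrative inputs: three returns and their conditional variances.
etas = draw_mixing_variables([0.5, -1.2, 0.3], [1.0, 0.9, 1.1], 5.0, rng)
```

A large squared return relative to its conditional variance raises the rate and therefore pulls the corresponding mixing variable toward zero, which inflates the variance of that observation in the conditional normal model.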


Conditional Posterior Distribution for $\nu$

It can be seen from (13) that the conditional posterior distribution of the degrees-of-freedom parameter, $\nu$, does not have a standard form. The kernel of the posterior distribution is given by the expression,

$$p(\nu \mid \theta_G, \eta, r, F_0) \propto \Gamma\left(\frac{\nu}{2}\right)^{-T}\left(\frac{\nu}{2}\right)^{\frac{T\nu}{2}}\exp(\nu\lambda^*) \tag{15}$$

where

$$\lambda^* = \frac{1}{2}\sum_{t=1}^{T}\left(\log(\eta_t) - \eta_t\right) - \lambda \tag{16}$$

Geweke (1993b) describes a rejection sampling approach that could be employed to simulate draws from the conditional posterior distribution of $\nu$ in (15). In this entry, we employ a sampling algorithm called the griddy Gibbs sampler. The appendix provides details on it.
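Since the appendix is not reproduced here, the following is only a generic sketch of the griddy Gibbs idea, not necessarily the exact procedure the authors use: evaluate the kernel of the conditional posterior of $\nu$ on a grid, build an approximate cumulative distribution function by the trapezoid rule, and invert it at a uniform random number. The grid, the value of $\lambda$, and the function names are assumptions.

```python
import math
import random


def log_kernel_nu(nu, etas, lam):
    """Log of the conditional posterior kernel of nu given the mixing variables."""
    T = len(etas)
    lam_star = 0.5 * sum(math.log(e) - e for e in etas) - lam
    return (-T * math.lgamma(nu / 2.0)
            + (T * nu / 2.0) * math.log(nu / 2.0)
            + nu * lam_star)


def griddy_gibbs_nu(etas, lam, grid, rng):
    """Draw nu by inverting a piecewise-linear approximation of its CDF."""
    logk = [log_kernel_nu(v, etas, lam) for v in grid]
    top = max(logk)
    weights = [math.exp(l - top) for l in logk]  # rescaled to avoid overflow
    cdf = [0.0]
    for i in range(1, len(grid)):
        area = 0.5 * (weights[i] + weights[i - 1]) * (grid[i] - grid[i - 1])
        cdf.append(cdf[-1] + area)
    u = rng.random() * cdf[-1]
    for i in range(1, len(grid)):
        if u <= cdf[i]:
            width = cdf[i] - cdf[i - 1]
            frac = (u - cdf[i - 1]) / width if width > 0 else 0.0
            return grid[i - 1] + frac * (grid[i] - grid[i - 1])
    return grid[-1]


rng = random.Random(0)
etas = [rng.gammavariate(3.0, 1.0 / 3.0) for _ in range(200)]  # made-up values
grid = [2.1 + 0.1 * i for i in range(300)]  # assumed support for nu
nu_draw = griddy_gibbs_nu(etas, lam=0.1, grid=grid, rng=rng)
```

The quality of the approximation depends on how finely the grid covers the region where the kernel has most of its mass, so the grid bounds and spacing should be checked against the data at hand.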


Proposal Distribution for $\theta_G$

The kernel of $\theta_G$'s log-posterior distribution is given by the expression,

$$\log\left(p(\theta_G \mid \theta_{-\theta_G}, r, F_0)\right) = \text{const} - \frac{1}{2}\sum_{t=1}^{T}\left[\log(\sigma^2_{t|t-1}) + \frac{\eta_t r_t^2}{\sigma^2_{t|t-1}}\right]$$

for $\omega > 0$, $\alpha > 0$, and $\beta > 0$

where $\sigma^2_{t|t-1}$, $t = 1, \ldots, T$, is a function of $\theta_G$.

We specify a Student's $t$ proposal distribution for $\theta_G$, centered on the posterior mode of $\theta_G$ and scaled by the negative inverse Hessian of the posterior kernel, evaluated at the posterior mode. Other approaches for posterior simulation, for example, the griddy Gibbs sampler, could be employed as well. (In this case, the components of $\theta_G$ would be sampled separately.)

Having determined the full conditional posterior distribution of $\eta$, as well as a proposal distribution for $\theta_G$ and a sampling scheme for $\nu$, implementing a hybrid M-H algorithm is straightforward. Its steps are as follows. At iteration $m$ of the algorithm:


• Draw an observation, $\theta_G^*$, of the vector of conditional variance parameters, $\theta_G$, from its proposal distribution.

• Check whether the positivity (and stationarity) parameter restrictions on the components of $\theta_G$ are satisfied. If not, draw $\theta_G^*$ repeatedly until they are satisfied.

• Compute the acceptance probability

$$\alpha\left(\theta_G^*, \theta_G^{(m-1)}\right) = \min\left\{1,\ \frac{p(\theta_G^* \mid y)\,/\,q(\theta_G^* \mid \theta_G^{(m-1)})}{p(\theta_G^{(m-1)} \mid y)\,/\,q(\theta_G^{(m-1)} \mid \theta_G^*)}\right\} \tag{17}$$

where $p(\theta_G \mid y)$ is $\theta_G$'s posterior distribution and $q(\theta_G \mid \cdot)$ is $\theta_G$'s proposal distribution. The previous draw of the parameter vector is given by $\theta_G^{(m-1)}$. Accept the candidate draw $\theta_G^*$ with probability $\alpha(\theta_G^*, \theta_G^{(m-1)})$; otherwise, retain $\theta_G^{(m-1)}$.

• Draw an observation, $\eta^{(m)}$, from the full conditional posterior distribution, $p(\eta \mid \theta_G^{(m)}, \nu^{(m-1)}, r, F_0)$, in (14).

• Draw an observation, $\nu^{(m)}$, from its conditional posterior distribution with kernel in (15) using the griddy Gibbs sampler as explained in the appendix.
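The accept/reject step of the algorithm above can be sketched on the log scale, which is how it is typically coded to avoid numerical underflow. The function below takes log-posterior and log-proposal values as inputs; how those are computed (for example, from the log-posterior kernel of $\theta_G$ and a Student's $t$ proposal) is left to the caller. All names are illustrative assumptions.

```python
import math
import random


def mh_accept(log_post_cand, log_prop_cand, log_post_prev, log_prop_prev, rng):
    """Metropolis-Hastings accept/reject decision.

    log_prop_cand is log q(candidate | previous draw) and
    log_prop_prev is log q(previous draw | candidate).
    Returns (accepted, acceptance_probability).
    """
    log_ratio = (log_post_cand - log_prop_cand) - (log_post_prev - log_prop_prev)
    accept_prob = math.exp(min(0.0, log_ratio))  # min{1, ratio} on the log scale
    return rng.random() < accept_prob, accept_prob


def draw_admissible(propose, is_admissible, rng, max_tries=10000):
    """Redraw from the proposal until the parameter restrictions hold."""
    for _ in range(max_tries):
        candidate = propose(rng)
        if is_admissible(candidate):
            return candidate
    raise RuntimeError("no admissible candidate found")


rng = random.Random(0)
# Toy proposal over (omega, alpha, beta), assumed purely for illustration.
candidate = draw_admissible(
    lambda g: (g.gauss(0.1, 0.05), g.gauss(0.2, 0.1), g.gauss(0.7, 0.1)),
    lambda th: all(x > 0 for x in th) and th[1] + th[2] < 1,
    rng,
)
accepted, prob = mh_accept(-10.0, -2.0, -11.0, -2.0, rng)
```

When the proposal does not depend on the previous draw, as with a proposal centered on the posterior mode, the two proposal terms reduce to the proposal density evaluated at the candidate and at the previous draw, respectively.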

At each iteration of the sampling algorithm, the sampling strategy described above produces a large output consisting of the draws from the model parameters and the $T$ mixing variables, $\eta$. However, since the role of the mixing parameters is only auxiliary and their conditional distribution is of no interest, at any iteration of the algorithm above one needs to store only the latest draw of $\eta$ (as well as the draws of $\nu$ and $\theta_G$, of course).

In the simple GARCH model discussed now, it is implicitly assumed that expression (2) describes the volatility process during the whole sample period and (at least) in the short run after the end of the sample. That is, the parameters of the model are unchanged throughout. It is not inconceivable, however, that the volatility dynamics differ in different periods. Then, volatility forecasts produced by a simple (single-regime) model are likely to overestimate volatility during periods of low volatility and underestimate it during periods of high volatility. In the next section, we discuss a class of models extending the simple GARCH(1,1) model, which could potentially provide more accurate volatility forecasting power. Regime-switching models incorporate the possibility that the dynamics of the volatility process evolve through different states of nature, which we call regimes.

MARKOV-SWITCHING 
GARCH MODELS 

The Markov-switching (MS) models, introduced by Hamilton (1989), provide maximal flexibility in modeling transitions of the volatility dynamics across regimes. They form the class of the so-called endogenous regime-switching models in which transitions between states of nature are governed by parameters estimated within the model; the number of transitions is not specified a priori, unlike the number of states. Each volatility state could be revisited multiple times. 8 In our discussion below, we use the terms "state" and "regime" interchangeably.

Different approaches to introducing regime changes in the GARCH process have been proposed in the empirical finance literature. Hamilton and Susmel (1994) incorporate a regime-dependent parameter, $g_{S_t}$, into the standard deviation (scale) of the returns process,

$$r_t = \mu_{t|t-1} + \sqrt{g_{S_t}}\,\sigma_{t|t-1}\epsilon_t$$

where $S_t$ denotes period $t$'s regime. Another option, pursued by Cai (1994), is to include a regime-dependent parameter as part of the constant in the conditional variance equation,

$$\sigma^2_{t|t-1} = (\omega + g_{S_t}) + \sum_{p=1}^{P}\alpha_p u^2_{t-p}$$

Both Hamilton and Susmel (1994) and Cai (1994) model the dynamics of the conditional variance with an ARCH process. The reason, as explained further below, is that when GARCH term(s) are present in the process, the regime-dependence makes the likelihood function analytically intractable.

The most flexible approach to introducing regime-dependence is to allow all parameters of the conditional variance equation to vary across regimes. That approach is suggested by Henneke, Rachev, Fabozzi, and Nikolov (2011), who model jointly the conditional mean as an ARMA(1,1) process in a Bayesian estimation setting. 9 The implication for the dynamics of the conditional variance is that the manner in which the variance responds to past return shocks and volatility levels changes across regimes. For example, high-volatility regimes could be characterized by "hyper-sensitivity" of asset returns to return shocks, and high volatility in one period could have a more lasting effect on future volatilities compared to low-volatility regimes. This would call for a different relationship between the parameters $\alpha$ and $\beta$ in different regimes.

In this section, we discuss the estimation method of Henneke, Rachev, Fabozzi, and Nikolov (2011), with some modifications.

Preliminaries 

Suppose that there are three states the conditional volatility can occupy, denoted by $i$, $i = 1, 2, 3$. We could assign economic interpretation to them by labeling them "a low-volatility state," "a normal-volatility state," and "a high-volatility state." Denote by $\pi_{ij}$ the probability of a transition from state $i$ to state $j$. The transition probabilities, $\pi_{ij}$, could be arranged in the transition probability matrix, $\Pi$,

$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} & \pi_{13} \\ \pi_{21} & \pi_{22} & \pi_{23} \\ \pi_{31} & \pi_{32} & \pi_{33} \end{pmatrix} \tag{18}$$

such that the probabilities in each row sum up to 1. The Markov property (central to model estimation, as we will see below) that lends its name to the MS models concerns the memory of the process: which volatility regime the system visits in a given period depends only on the regime in the previous period. Analytically, the Markov property is expressed as

$$P(S_t \mid S_{t-1}, S_{t-2}, \ldots, S_1) = P(S_t \mid S_{t-1}) \tag{19}$$

Each row of $\Pi$ in (18) represents the three-dimensional conditional probability distribution of $S_t$, conditional on the regime realization in the previous period, $S_{t-1}$. We say that $\{S_t\}_{t=1}^{T}$ is a three-dimensional (discrete-time) Markov chain with transition matrix, $\Pi$.
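To make the Markov property concrete, the sketch below simulates a regime path from a transition matrix: each new regime is drawn using only the row of the matrix that corresponds to the previous regime. States are indexed 0, 1, 2 rather than 1, 2, 3, and the transition probabilities are made-up values chosen for illustration.

```python
import random


def simulate_regimes(transition, T, initial_state, rng):
    """Simulate a regime path of length T from a row-stochastic matrix."""
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    path = [initial_state]
    for _ in range(T - 1):
        row = transition[path[-1]]  # only the previous regime matters
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


# Illustrative persistent regimes: low, normal, and high volatility.
transition = [
    [0.95, 0.04, 0.01],
    [0.03, 0.94, 0.03],
    [0.01, 0.09, 0.90],
]
path = simulate_regimes(transition, 250, 1, random.Random(0))
```

Large diagonal entries make each regime persistent, so a simulated path consists of long runs in one state punctuated by occasional switches.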

In the regime-switching GARCH(1,1) setting, the expression for the conditional variance dynamics becomes

$$\sigma^2_{t|t-1} = \omega(S_t) + \alpha(S_t)\,u^2_{t-1} + \beta(S_t)\,\sigma^2_{t-1|t-2} \tag{20}$$

For each period $t$,

$$\left(\omega(S_t), \alpha(S_t), \beta(S_t)\right) = \begin{cases} (\omega_1, \alpha_1, \beta_1) & \text{if } S_t = 1, \\ (\omega_2, \alpha_2, \beta_2) & \text{if } S_t = 2, \\ (\omega_3, \alpha_3, \beta_3) & \text{if } S_t = 3 \end{cases}$$

The presence of the GARCH component in (20) complicates the model estimation substantially. To see this, notice that, via $\sigma^2_{t-1|t-2}$, the current conditional variance depends on the conditional variances from all preceding periods and, therefore, on the whole unobservable sequence of regimes up to time $t$. A great number of regime paths could lead to the particular conditional variance at time $t$ (the number of possible regime combinations grows exponentially with the number of time periods), rendering classical estimation very complicated. For that reason, the early treatments of MS models include only an ARCH component in the conditional variance equation. The MCMC methodology, however, copes easily with the specification in (20), as we will see below.

We adopt the same return decomposition as in (1), with the conditional mean set to zero, and note that, given the regime path, (20) represents the same conditional variance dynamics as a simple GARCH(1,1) process. We return to this point again further below when we discuss estimation of that MS GARCH(1,1) model.
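The observation that the variance path is fully determined once the regime path is given can be illustrated with a short recursion. The sketch assumes a zero conditional mean (so the shock equals the return), 0-based regime labels, and a known starting variance; the function name and the parameter values are hypothetical.

```python
def ms_garch_variances(returns, regimes, params, sigma2_0):
    """Conditional variances of an MS GARCH(1,1) model for a given regime path.

    params[s] holds (omega, alpha, beta) for regime s, and regimes[t] is the
    regime prevailing in period t. Element 0 is the assumed starting value.
    """
    if len(returns) != len(regimes):
        raise ValueError("returns and regimes must have the same length")
    sigma2 = [sigma2_0]
    for t in range(1, len(returns)):
        omega, alpha, beta = params[regimes[t]]  # parameters of period t's regime
        sigma2.append(omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2


# Hypothetical regime-specific parameters (low, normal, high volatility).
params = {
    0: (0.05, 0.05, 0.90),
    1: (0.10, 0.10, 0.80),
    2: (0.50, 0.30, 0.60),
}
variances = ms_garch_variances([1.0, 2.0, -0.5, 0.3], [0, 2, 2, 1], params, 1.0)
```

If every period is assigned the same regime, the recursion collapses to the single-regime GARCH(1,1) updating rule.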

Next, we outline the prior assumptions for the 
MS GARCH(1,1) model. 


Prior Distributional Assumptions 

The parameter vector of the MS GARCH(1,1) model, specified by (1), (20), and the Markov chain $\{S_t\}_{t=1}^{T}$, is given by

$$\theta = (\eta', \nu, \theta_{G,1}, \theta_{G,2}, \theta_{G,3}, \pi_1, \pi_2, \pi_3, S) \tag{21}$$

where, for $i = 1, 2, 3$,

$$\theta_{G,i} = (\omega_i, \alpha_i, \beta_i) \quad \text{and} \quad \pi_i = (\pi_{i1}, \pi_{i2}, \pi_{i3})$$

and $S$ is the regime path for all periods,

$$S = (S_1, \ldots, S_T)$$

Our prior specifications for $\eta$ and $\nu$ remain unchanged from our earlier discussion: The scale-mixture-of-normals mixing parameters, $\eta$, and the degrees-of-freedom parameter, $\nu$, are not affected by the regime specification in the MS GARCH(1,1) model. We assert prior distributions for the vector of conditional variance parameters, $\theta_{G,i}$, under each regime, $i$, and a prior distribution for each triple of transition probabilities $\pi_i$, $i = 1, 2, 3$.

Prior Distributions for $\theta_{G,i}$

To reflect our prior intuition about the effect the three regimes have on the conditional variance parameters, we assert proper normal priors for $\theta_{G,i}$, $i = 1, 2, 3$,

$$\theta_{G,i} \sim N(\mu_i, \Sigma_i)\,I_{\{\theta_{G,i}\}} \tag{22}$$

where the indicator function, $I_{\{\theta_{G,i}\}}$, is given in (6). As explained earlier in the entry, the parameter constraints are imposed during the implementation of the sampling algorithm.

Prior Distribution for $\pi_i$

A convenient prior for the probability parameter in a binomial experiment is the beta distribution. 10 The analogue of the beta distribution in the multivariate case is the so-called Dirichlet distribution. 11 Therefore, we specify a Dirichlet prior distribution for each triple of transition probabilities, $i = 1, 2, 3$,

$$\pi_i \sim \mathrm{Dirichlet}(a_{i1}, a_{i2}, a_{i3}) \tag{23}$$

To elicit the prior parameters, $a_{ij}$, $i, j = 1, 2, 3$, it is sufficient that one express prior intuition about the expected value of each of the transition probabilities in a triple, then solve the system of equations for $a_{ij}$.
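One possible way to carry out this elicitation is sketched below. It uses the fact that the mean of a Dirichlet distribution is $a_{ij} / \sum_k a_{ik}$. Because the prior means pin down only the ratios of the parameters, the sketch additionally fixes the sum of the parameters in each row (a "concentration" value); that extra input is an assumption of this example, as are the function names and the numerical values.

```python
import random


def dirichlet_params_from_means(means, concentration):
    """Dirichlet parameters whose implied means equal the given prior means."""
    if abs(sum(means) - 1.0) > 1e-9 or any(m <= 0 for m in means):
        raise ValueError("prior means must be positive and sum to 1")
    return [m * concentration for m in means]


def dirichlet_mean(a):
    total = sum(a)
    return [x / total for x in a]


def draw_dirichlet(a, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    g = [rng.gammavariate(x, 1.0) for x in a]
    total = sum(g)
    return [x / total for x in g]


# Prior belief (assumed): the low-volatility regime is highly persistent.
a_row = dirichlet_params_from_means([0.90, 0.08, 0.02], concentration=50.0)
pi_row = draw_dirichlet(a_row, random.Random(0))
```

A larger concentration value expresses stronger confidence in the stated prior means, since it shrinks the prior variance of each transition probability.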


Estimation of the MS GARCH 
Model 

The evolution of volatility in the MS GARCH model is governed by the realizations of the unobservable (latent) regime variable, $S_t$, $t = 1, \ldots, T$. Hence, the discrete-time Markov chain, $\{S_t\}_{t=1}^{T}$, is also called a hidden Markov process. Earlier, we briefly discussed that the presence of the hidden Markov process creates a major estimation difficulty in the classical setting. The Bayesian methodology, in contrast, deals with the latent-variable characteristic in an easy and natural way: The latent variable is simulated together with the model parameters. In other words, the parameter space is augmented with $S_t$, $t = 1, \ldots, T$, in much the same way as the vector of mixing variables, $\eta$, was added to the parameter space in estimating the Student's t GARCH(1,1) model. The distribution of $S$ is a multinomial distribution.


$$
\begin{aligned}
p(S \mid \pi) &= \prod_{t=1}^{T-1} p(S_{t+1} \mid S_t, \pi) \\
&= \pi_{11}^{n_{11}} \pi_{12}^{n_{12}} \cdots \pi_{32}^{n_{32}} \pi_{33}^{n_{33}} \\
&= \pi_{11}^{n_{11}} \pi_{12}^{n_{12}} (1 - \pi_{11} - \pi_{12})^{n_{13}} \cdots \pi_{32}^{n_{32}} (1 - \pi_{31} - \pi_{32})^{n_{33}}
\end{aligned}
\tag{24}
$$

where $n_{ij}$ denotes the number of times the chain transitions from state $i$ to state $j$ during the span of period 1 through period $T$. The first equality in (24) follows from the Markov property of $\{S_t\}_{t=1}^{T}$.
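
As an illustration of (24), the transition counts $n_{ij}$ and the log-probability of a regime path can be computed as in the following sketch. The function names and the example path are hypothetical, and regimes are coded 0, 1, 2 rather than 1, 2, 3.

```python
import numpy as np

def transition_counts(path, n_regimes=3):
    """Count n_ij, the number of moves from regime i to regime j along `path`.

    Regimes are coded 0, 1, ..., n_regimes - 1 (the text uses 1, 2, 3).
    """
    path = np.asarray(path, dtype=int)
    counts = np.zeros((n_regimes, n_regimes), dtype=int)
    # Pair each state with its successor; np.add.at handles repeated pairs.
    np.add.at(counts, (path[:-1], path[1:]), 1)
    return counts

def log_path_prob(path, P):
    """log p(S | pi) as in (24), for a transition matrix P whose rows are pi_i."""
    n = transition_counts(path, P.shape[0])
    mask = n > 0                      # skip unused transitions (avoids 0 * log 0)
    return float(np.sum(n[mask] * np.log(P[mask])))

# Hypothetical regime path of length T = 6, which has T - 1 = 5 transitions.
path = [0, 0, 1, 2, 2, 0]
n = transition_counts(path)
```

The same counts reappear as the data-dependent part of the Dirichlet parameters in the conditional posterior of the transition probabilities.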

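Based on our discussion of the Student's t GARCH(1,1) model and the hidden Markov process, as well as the prior distributional assumptions for $\pi_i$ and $\theta_{G,i}$, $i = 1, 2, 3$, the joint log-posterior distribution of the MS GARCH(1,1) model's parameter vector $\theta$ is given by

$$
\begin{aligned}
\log p(\theta \mid r, F_0) = {} & \text{const} - \frac{1}{2} \sum_{t=1}^{T} \left[ \log(\sigma^2_{t|t-1}) + \log(\eta_t) + \frac{r_t^2}{\eta_t \sigma^2_{t|t-1}} \right] \\
& - \frac{1}{2} \sum_{i=1}^{3} (\theta_{G,i} - \mu_i)' \Sigma_i^{-1} (\theta_{G,i} - \mu_i)\, I_{\{S_t = i\}} \\
& + \frac{T\nu}{2} \log\left(\frac{\nu}{2}\right) - T \log \Gamma\left(\frac{\nu}{2}\right) - \left(\frac{\nu}{2} + 1\right) \sum_{t=1}^{T} \log(\eta_t) - \frac{\nu}{2} \sum_{t=1}^{T} \frac{1}{\eta_t} - \nu\lambda \\
& + \sum_{i=1}^{3} \sum_{j=1}^{3} (a_{ij} + n_{ij} - 1) \log(\pi_{ij})
\end{aligned}
\tag{25}
$$

for $\omega_i > 0$, $\alpha_i > 0$, and $\beta_i > 0$.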

Although (25) looks very similar to the joint log-posterior in (13), there is a crucial difference. The model's log-likelihood (given by the right-hand-side term in the first line of (25)) depends on the whole sequence of regimes, $S$. Conditional on $S$, however, it is the same log-likelihood as in (12). We will exploit this fact in constructing the posterior simulation algorithm as an extension of the algorithm for the Student's t GARCH(1,1) model estimation.

We now outline the posterior results for $\pi_i$, $S$, and $\theta_{G,i}$. The posterior results for the degrees-of-freedom parameter, $\nu$, and the mixing variables, $\eta$, remain unchanged from our earlier discussion.


Conditional Posterior Distribution of $\pi_i$
The conditional log-posterior distribution of the vector of transition probabilities, $\pi_i$, $i = 1, 2, 3$, is given by

$$\log p(\pi_i \mid r, \theta_{-\pi_i}) = \text{const} + \sum_{j=1}^{3} (a_{ij} + n_{ij} - 1) \log(\pi_{ij}) \quad \text{for } i = 1, 2, 3 \tag{26}$$

where $\theta_{-\pi_i}$ denotes the vector of all parameters except $\pi_i$. The expression in (26) is readily recognized as the logarithm of the kernel of a Dirichlet distribution with parameters $(a_{i1} + n_{i1}, a_{i2} + n_{i2}, a_{i3} + n_{i3})$. The parameters $a_{ij}$ are specified a priori, while the parameters $n_{ij}$ can be determined by simply counting the number of times the Markov chain, $\{S_t\}_{t=1}^{T}$, transitions from $i$ to $j$.

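Sampling from the Dirichlet distribution in (26) is accomplished easily in the following way. 12 For each $i$, $i = 1, 2, 3$,

(1) sample three independent observations,

$$y_{i1} \sim \chi^2_{2(a_{i1} + n_{i1})}, \qquad y_{i2} \sim \chi^2_{2(a_{i2} + n_{i2})}, \qquad y_{i3} \sim \chi^2_{2(a_{i3} + n_{i3})}$$

(2) set

$$\pi_{i1} = \frac{y_{i1}}{\sum_{k=1}^{3} y_{ik}}, \qquad \pi_{i2} = \frac{y_{i2}}{\sum_{k=1}^{3} y_{ik}}, \qquad \pi_{i3} = \frac{y_{i3}}{\sum_{k=1}^{3} y_{ik}}$$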
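
The two steps above can be sketched with NumPy's chi-square generator. The function name and the numerical values of the prior parameters and transition counts are hypothetical, chosen only for illustration.

```python
import numpy as np

def draw_transition_row(a_i, n_i, rng):
    """Draw pi_i ~ Dirichlet(a_i + n_i) by normalizing chi-square variates."""
    shape = np.asarray(a_i, dtype=float) + np.asarray(n_i, dtype=float)
    # Step (1): y_ij ~ chi-square with 2 (a_ij + n_ij) degrees of freedom.
    y = rng.chisquare(df=2.0 * shape)
    # Step (2): normalize so that the three probabilities sum to one.
    return y / y.sum()

rng = np.random.default_rng(0)
a_i = [2.0, 1.0, 1.0]      # hypothetical prior parameters for regime i
n_i = [8, 1, 1]            # hypothetical transition counts out of regime i
pi_i = draw_transition_row(a_i, n_i, rng)
```

In practice one could equally call a library Dirichlet generator directly; the chi-square route is shown only because it mirrors the steps in the text.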


Conditional Posterior Distribution of S 

In the three-regime switching setup of this entry, the number of regime paths that could have potentially generated $S_T$, the regime in the final period, is $3^T$. The level of complexity makes it impossible to obtain a draw of the whole $1 \times T$ vector, $S$, at once. Instead, its components can be drawn one at a time, in a $T$-step procedure. In other words, at each step, we sample from the full conditional posterior density of $S_t$ given by

$$p(S_t = i \mid r, \theta_{-S}, S_{-t}) \tag{27}$$

where $\theta_{-S}$ is the parameter vector in (21) excluding $S$ and $S_{-t}$ is the regime path excluding the regime at time $t$. Applying the rules of conditional probability, $p(S_t = i \mid r, \theta_{-S}, S_{-t})$ is written as

$$
\begin{aligned}
p(S_t = i \mid r, \theta_{-S}, S_{-t}) &= \frac{p(S_t = i, S_{-t}, r \mid \theta_{-S})}{p(S_{-t}, r \mid \theta_{-S})} \\
&= \frac{p(r \mid \theta_{-S}, S_{-t}, S_t = i)\, p(S_t = i, S_{-t} \mid \theta_{-S})}{p(S_{-t}, r \mid \theta_{-S})}
\end{aligned}
\tag{28}
$$


The first term in the numerator, $p(r \mid \theta_{-S}, S_{-t}, S_t = i)$, is simply the model's likelihood evaluated at a given regime path, in which $S_t = i$. The second term in the numerator, $p(S_t = i, S_{-t} \mid \theta_{-S})$, is given, by the Markov property, by

$$
\begin{aligned}
p(S_t = i, S_{-t} \mid \theta_{-S}) &= p(S_t = i, S_{t-1} = j, S_{t+1} = k \mid \theta_{-S}) \\
&= \pi_{j,i}\, \pi_{i,k}
\end{aligned}
\tag{29}
$$

while the denominator in (28) is expressed as

$$p(S_{-t}, r \mid \theta_{-S}) = \sum_{s=1}^{3} p(S_t = s, S_{-t}, r \mid \theta_{-S}) \tag{30}$$

Using (28), (29), and (30), we obtain the conditional posterior distribution of $S_t$ as

$$p(S_t = i \mid r, \theta_{-S}, S_{-t}) = \frac{p(r \mid \theta_{-S}, S_{-t}, S_t = i)\, \pi_{j,i}\, \pi_{i,k}}{\sum_{s=1}^{3} p(r \mid \theta_{-S}, S_{-t}, S_t = s)\, \pi_{j,s}\, \pi_{s,k}} \tag{31}$$

for $i = 1, 2, 3$. An observation, $S_t^*$, from the conditional density in (31) is obtained in the following way:

• Compute the probability in (31) for $i = 1, 2, 3$.

• Split the interval (0,1) into three intervals of lengths proportional to the probabilities computed in the previous step.

• Draw an observation, $u$, from the uniform distribution $U[0,1]$.

• Depending on which interval $u$ falls into, set $S_t^* = i$.

To draw the regime path, $S^{(m)}$, at the $m$th iteration of the posterior simulation algorithm,

• Draw $S_1^{(m)}$ from $p(S_1 \mid r, \theta_{-S_1})$ in (31). Update $S^{(m)}$ with $S_1^{(m)}$.

• For $t = 2, \ldots, T$, draw $S_t^{(m)}$ from $p(S_t \mid r, \theta_{-S_t})$ in (31). Update $S^{(m)}$ with $S_t^{(m)}$.
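
A minimal sketch of one sweep through the regime path is given below. Several things in it are assumptions made for illustration rather than part of the original algorithm: regimes are coded 0, 1, 2; the likelihood is passed in as a generic function of the whole path; at the first and last periods the transition factor for the missing neighbor is simply dropped; and the toy likelihood uses regime-specific constant variances as a stand-in for the MS GARCH(1,1) likelihood.

```python
import numpy as np

def draw_regime_path(S, log_lik, P, rng):
    """One sweep of single-site draws of S_1, ..., S_T based on (31).

    S       : current path, integer array with values 0, 1, 2
    log_lik : callable returning log p(r | theta_{-S}, path) for a full path
    P       : 3 x 3 transition matrix, P[j, i] = pi_{j,i}
    """
    S = np.array(S, dtype=int)
    T, K = len(S), P.shape[0]
    for t in range(T):
        log_w = np.empty(K)
        for i in range(K):
            S[t] = i
            log_w[i] = log_lik(S)
            if t > 0:                       # transition into regime i
                log_w[i] += np.log(P[S[t - 1], i])
            if t < T - 1:                   # transition out of regime i
                log_w[i] += np.log(P[i, S[t + 1]])
        w = np.exp(log_w - log_w.max())     # rescale before normalizing
        probs = w / w.sum()                 # the probabilities in (31)
        # Split (0, 1) into intervals of these lengths and locate u.
        u = rng.uniform()
        S[t] = min(int(np.searchsorted(np.cumsum(probs), u, side="right")), K - 1)
    return S

def toy_log_lik(r, variances):
    """Stand-in likelihood: zero-mean normal returns, one variance per regime."""
    def log_lik(path):
        v = variances[path]
        return float(-0.5 * np.sum(np.log(v) + r ** 2 / v))
    return log_lik

rng = np.random.default_rng(1)
r = np.array([0.1, -0.2, 50.0, 0.3])                    # hypothetical returns
P = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
log_lik = toy_log_lik(r, np.array([0.01, 1.0, 100.0]))
S_new = draw_regime_path(np.zeros(4, dtype=int), log_lik, P, rng)
```

Evaluating the full likelihood for every candidate regime makes each sweep costly for long samples, which is one practical reason the single-site scheme is slow.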


Proposal Distribution for $\theta_{G,i}$
The posterior distribution of the vector of conditional variance parameters is not available in closed form because of the regime dependence of the conditional variance. Since in the regime-switching setting we adopted informative prior distributions for $\theta_{G,i}$, $i = 1, 2, 3$, the kernel of the conditional log-posterior distribution is a bit different from the one in (17) and is given by


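$$
\begin{aligned}
\log p(\theta_{G,i} \mid \theta_{-\theta_{G,i}}, r, F_0) = {} & \text{const} - \frac{1}{2} \sum_{t=1}^{T} \left[ \log(\sigma^2_{t|t-1}) + \log(\eta_t) + \frac{r_t^2}{\eta_t \sigma^2_{t|t-1}} \right] \\
& - \frac{1}{2} \sum_{i=1}^{3} (\theta_{G,i} - \mu_i)' \Sigma_i^{-1} (\theta_{G,i} - \mu_i)\, I_{\{S_t = i\}}
\end{aligned}
\tag{32}
$$

for $\omega > 0$, $\alpha > 0$, $\beta > 0$, and $i = 1, 2, 3$.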


For a given regime path, $S$, the only difference between the earlier posterior kernel and (32) is the term reflecting the informative prior of $\theta_{G,i}$. Therefore, specifying a proposal distribution for $\theta_{G,i}$ is in no way different from the approach in the single-regime Student's t GARCH(1,1) setting.


Sampling Algorithm for the Parameters of the MS GARCH(1,1) Model

The sampling algorithm for the MS GARCH(1,1) model parameters consists of the following steps. At iteration $m$,

• Draw $\pi_i^{(m)}$ from its posterior density in (26), for $i = 1, 2, 3$.

• Draw $S^{(m)}$ from (31).

• Draw $\eta^{(m)}$ from (14).

• Draw $\nu^{(m)}$ from (15).

• Draw $\theta^*_{G,i}$, $i = 1, 2, 3$, from the proposal distribution, as explained earlier.

• Check whether the parameter restrictions on the components of $\theta^*_{G,i}$ are satisfied; if not, draw $\theta^*_{G,i}$ repeatedly, until they are satisfied.

• Compute the acceptance probability in (17) and accept or reject $\theta^*_{G,i}$, for $i = 1, 2, 3$.

The parameter vector, $\theta$, is updated as new components are drawn. The steps above are repeated a large number of times until convergence of the algorithm.


APPENDIX: THE GRIDDY GIBBS SAMPLER

Implementation of the Gibbs sampler requires that parameters' conditional posterior distributions be known. Sometimes, however, the conditional posterior distributions have no closed forms. In these cases, a special form of the Gibbs sampler, called the griddy Gibbs sampler, can be employed whereby the (univariate) conditional posterior densities are evaluated on grids of parameter values. The griddy Gibbs sampler, developed by Ritter and Tanner (1992), is a combination of the ordinary Gibbs sampler and a numerical routine. In this appendix, we illustrate the griddy Gibbs sampler with the posterior distribution of the degrees-of-freedom parameter, $\nu$.

Recall the expression for the kernel of $\nu$'s conditional log-posterior distribution.


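$$
\begin{aligned}
\log p(\nu \mid \theta_{-\nu}, r, F_0) = {} & \text{const} + \frac{T\nu}{2} \log\left(\frac{\nu}{2}\right) - T \log \Gamma\left(\frac{\nu}{2}\right) \\
& - \left(\frac{\nu}{2} + 1\right) \sum_{t=1}^{T} \log(\eta_t) - \frac{\nu}{2} \sum_{t=1}^{T} \frac{1}{\eta_t} - \nu\lambda
\end{aligned}
\tag{33}
$$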

The griddy Gibbs sampler approach to drawing from the conditional posterior distribution of $\nu$ is to recognize that at iteration $m$ we can treat the latest draws of the remaining parameters as the known parameter values. Therefore, we can evaluate numerically the conditional posterior density of $\nu$ on a grid of its admissible values. The support of $\nu$ is the positive part of the real line. However, a reasonable range for the values of $\nu$ in an application to asset returns could be (2, 30). 13

Drawing from the Conditional Posterior Distribution of $\nu$

Denote the equally spaced grid of values for $\nu$ by $(\nu_1, \nu_2, \ldots, \nu_J)$. We outline the steps for drawing from $\nu$'s conditional posterior distribution at iteration $m$ of the sampling algorithm. Denote the most recent draws of the remaining model parameters by $\theta_{-\nu}^{(m-1)}$. (Note that this notation is not entirely precise since some of the parameters might have been updated last during the $m$th iteration of the sampler but before $\nu$.)

• Compute the value of $\nu$'s posterior kernel (the exponential of the expression in (33)) at each of the grid nodes and denote the resultant vector by

$$p(\nu) = (p(\nu_1), p(\nu_2), \ldots, p(\nu_J)) \tag{34}$$

• Normalize $p(\nu)$ by dividing each vector component in (34) by the quantity $\sum_{j=1}^{J} p(\nu_j)(\nu_2 - \nu_1)$. For convenience of notation, let us redefine $p(\nu)$ to denote the vector of (normalized) posterior density values at each node of $\nu$'s grid.

• Compute the empirical cumulative distribution function (CDF),

$$F(\nu) = \left( \sum_{j=1}^{1} p(\nu_j), \ \sum_{j=1}^{2} p(\nu_j), \ \ldots, \ \sum_{j=1}^{J} p(\nu_j) \right) \tag{35}$$

If the grid is adequate, the first element of $F(\nu)$ should be nearly 0, while the last element of $F(\nu)$ nearly 1.

• Draw an observation from the uniform distribution ($U[0,1]$) and denote it by $u$.

• Find the element of $F(\nu)$ closest to $u$ without exceeding it.

• The grid node corresponding to the value of $F(\nu)$ in the previous step is the draw of $\nu$ from its posterior distribution.

The method above of obtaining a draw from $\nu$'s distribution using its CDF is called the CDF inversion method.
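
The steps above can be sketched as follows. The sketch makes a few assumptions that should be read as illustrative only: the log-kernel is passed in as a generic function (a gamma-shaped toy kernel is used here in place of (33)); the cumulative sums are multiplied by the grid spacing so that the last CDF element equals one, which coincides with (35) when the spacing is one; and if $u$ falls below the first CDF element, the first grid node is returned.

```python
import numpy as np

def griddy_draw(log_kernel, grid, rng):
    """Draw one value from a univariate density known up to a constant."""
    grid = np.asarray(grid, dtype=float)
    step = grid[1] - grid[0]                       # equally spaced grid
    log_k = np.array([log_kernel(v) for v in grid])
    dens = np.exp(log_k - log_k.max())             # kernel values, rescaled for stability
    dens /= np.sum(dens * step)                    # normalized density at the nodes
    cdf = np.cumsum(dens) * step                   # last element equals 1
    u = rng.uniform()
    # Largest CDF element not exceeding u; fall back to the first node if none.
    idx = max(int(np.searchsorted(cdf, u, side="right")) - 1, 0)
    return grid[idx]

def toy_log_kernel(v):
    """Hypothetical log-kernel (gamma-shaped, mean 10), not the kernel in (33)."""
    return 4.0 * np.log(v) - 0.5 * v

rng = np.random.default_rng(2)
grid = np.linspace(2.0, 30.0, 57)                  # nodes every 0.5 on (2, 30)
nu_draw = griddy_draw(toy_log_kernel, grid, rng)
```

Working with the log-kernel and subtracting its maximum before exponentiating avoids numerical overflow or underflow when the sample size is large.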


Constructing an adequate grid is the key to efficient sampling from $\nu$'s posterior. Since the griddy Gibbs sampling procedure relies on multiple evaluations of the posterior kernel, two desired characteristics of an adequate grid are short length and coverage of the parameter support where the posterior distribution has positive probability mass. A simple example illustrates this point. Suppose that for a given sample of observed data, the likely values of $\nu$ are in the interval (2,15). Suppose further that we construct an equally spaced grid with nodes on each integer from 2 to 30 (29 nodes in all). The value of the posterior kernel at the nodes corresponding to $\nu$ equal to 16 and above would be only marginally different from zero. The posterior kernel evaluations at those nodes should be avoided, if possible.

If no prior intuition exists about what the likely parameter values are, one could employ a variable grid instead of a fixed grid. At each iteration of the sampling algorithm one must analyze the distribution of posterior mass and adjust the grid, so that the majority of the grid nodes are placed in the interval of greatest probability mass. Automating this process could involve some computational effort.


KEY POINTS 

• The unconditional density of the return in GARCH models is not available in closed form. Therefore, the likelihood function of the GARCH parameters is expressed as a product of the return's conditional density in each period.

• In the Bayesian setting, estimation of GARCH models is performed numerically.

• Posterior numerical simulations are facilitated if the scale mixture of normal distributions representation is adopted for the Student's t distribution.

• Markov-switching GARCH models provide maximal flexibility in modeling transitions of the volatility dynamics across regimes.

• Transitions among regimes are governed by an unobserved state variable.

• In posterior simulations, the whole path of regimes, governed by the state variable, is simulated together with the model parameters.

NOTES 

1. To see that, notice that when $F_t$ is defined as an information set consisting of lagged asset returns, $F_1 = F_0 \cup r_1$, $F_2 = F_1 \cup r_2$, etc.

2. For a discussion of numerical estimation methods, see, for example, Rachev, et al. (2008). See also Geweke (1989) for an application of importance sampling to the estimation of ARCH models.

3. It is possible to assert a prior distribution for $\omega$, $\alpha$, and $\beta$ defined on the whole real line, for example, a normal distribution. To respect the positivity constraints on the parameters, such a prior would have to be truncated at the lower bound of the parameters' range. In practice, however, the constraints could also be enforced during the posterior simulation as explained further below. Alternatively, one could assert such a prior without enforcing constraints, after transforming $\omega$, $\alpha$, and $\beta$ by taking their logarithms (their ranges then become the whole real line).

4. Many heavy-tailed distributions can be represented as (mean-) scale mixtures of normal distributions. Such representations make estimation based on numerical, iterative procedures easier. See, for example, Fernandez and Steel (2000) for a discussion of the Bayesian treatment of regression analysis with mixtures of normals. In continuous time, the mean and scale mixture of normals models lead to the so-called subordinated processes, widely used in mathematical and empirical finance. Rachev and Mittnik (2000) offer an extensive treatment of subordinated processes.


5. This is an example of the technique known as "data augmentation." It consists of introducing latent (unobserved) variables to help construct efficient simulation algorithms. For a (technical) review of data augmentation, see, for example, van Dyk and Meng (2001).

6. Recall that we assume that $\mu_t = 0$.

7. The Hessian matrix is the matrix of second derivatives. According to a fundamental result in maximum likelihood theory, the maximum likelihood estimator's distribution is asymptotically normal, with covariance matrix equal to the negative inverse Hessian matrix, evaluated at the maximum likelihood estimate. Usually, the Hessian is provided as a "by-product" of numerical optimization routines for finding the maximum-likelihood estimate. See, for example, Rachev, et al. (2008) for additional details.

8. It is certainly possible to introduce (test for) a deterministic permanent shift in a model parameter into the regime-switching model. For example, Kim and Nelson (1999) apply such a model to a Bayesian investigation of business cycle fluctuations. See also Carlin, Gelfand, and Smith (1992). Wang and Zivot (2000) consider Bayesian estimation of a heteroskedastic model with structural breaks only. The variance in that investigation, however, does not evolve according to an ARCH-type process.

9. See also Haas, Mittnik, and Paolella (2004), Klaassen (1998), Francq and Zakoian (2001), and Ghysels, McCulloch, and Tsay (1998), among others.

10. The beta distribution is the conjugate distribution for the probability parameter in a binomial experiment.

11. A $K$-dimensional random variable $p = (p_1, p_2, \ldots, p_K)$, where $p_k > 0$ and $\sum_{k=1}^{K} p_k = 1$, distributed with a Dirichlet distribution with parameters $a = (a_1, a_2, \ldots, a_K)$, $a_i > 0$, $i = 1, \ldots, K$, has a density function

$$f(p \mid a) = \frac{\Gamma\left(\sum_{k=1}^{K} a_k\right)}{\prod_{k=1}^{K} \Gamma(a_k)} \prod_{k=1}^{K} p_k^{a_k - 1}$$

where $\Gamma$ is the gamma function. The mean and the variance of the Dirichlet distribution are given, respectively, by $E(p_k) = a_k / a_0$ and $\mathrm{var}(p_k) = \dfrac{a_k (a_0 - a_k)}{a_0^2 (a_0 + 1)}$, where $a_0 = \sum_{k=1}^{K} a_k$. The Dirichlet distribution is the conjugate prior distribution for the parameters of the multinomial distribution. As can be seen in our discussion on the MS GARCH(1,1) estimation, the distribution of the Markov chain, $\{S_t\}_{t=1}^{T}$, is, in fact, a multinomial distribution.

12. See, for example, Anderson (2003). 

13. This is the typical range of the degrees-of-freedom parameter of a Student's t distribution fitted to return data. The higher the data frequency is, the more heavy-tailed returns are and the lower the value of the degrees-of-freedom parameter.

Abstract: Investment policies constructed using inferior estimates, such as sample means and sample covariance matrices, typically perform very poorly in practice. Besides introducing spurious changes in portfolio weights each time the portfolio is rebalanced, this undesirable property also results in unnecessary turnover and increased transaction costs. These phenomena are not necessarily a sign that portfolio optimization does not work, but rather that the modern portfolio theory framework is very sensitive to the accuracy of inputs. There are different ways to address this issue. On the estimation side, one can try to produce more robust estimates of the input parameters for the optimization problems. This is most often achieved by using estimators that are less sensitive to outliers, and possibly, other sampling errors, such as Bayesian and shrinkage estimators. On the modeling side, one can constrain portfolio weights, use portfolio resampling, or apply robust or stochastic optimization techniques to specify scenarios or ranges of values for parameters estimated from data, thus incorporating uncertainty into the optimization process itself.


In this entry, we provide a general overview of some of the common problems encountered in mean-variance optimization before we turn our attention to shrinkage estimators for expected returns and the covariance matrix. Within the context of Bayesian estimation, we focus on the Black-Litterman model (see Black and Litterman, 1992). We derive the model using so-called mixed estimation from classical econometrics. Introducing a simple cross-sectional momentum strategy, we then show how one can combine this strategy with market equilibrium using the Black-Litterman model in the mean-variance framework to rebalance the portfolio on a monthly basis.

PRACTICAL PROBLEMS ENCOUNTERED IN MEAN-VARIANCE OPTIMIZATION

The simplicity and the intuitive appeal of portfolio construction using modern portfolio theory have attracted significant attention both in academia and in practice. Yet, despite considerable effort, it took many years until portfolio managers started using modern portfolio theory for managing real money. Unfortunately, in real world applications there are many problems with it, and portfolio optimization is still considered by many practitioners to be difficult to apply. In this section we consider some of the typical problems encountered in mean-variance optimization. In particular, we elaborate on: (1) the sensitivity to estimation error; (2) the effects of uncertainty in the inputs in the optimization process; and (3) the large data requirement necessary for accurately estimating the inputs for the portfolio optimization framework. We start by considering an example illustrating the effect of estimation error.

Example: The True, Estimated, and Actual Efficient Frontiers

Broadie introduced the terms true frontier, estimated frontier, and actual frontier to refer to the efficient frontiers computed using the true expected returns (unobservable), estimated expected returns, and true expected returns of the portfolios on the estimated frontier, respectively. 1 In this example, we refer to the frontier computed using the true, but unknown, expected returns as the true frontier. Similarly, we refer to the frontier computed using estimates of the expected returns and the true covariance matrix as the estimated frontier. Finally, we define the actual frontier as follows: We take the portfolios on the estimated frontier and then calculate their expected returns using the true expected returns. Since we are using the true covariance matrix, the variance of a portfolio on the estimated frontier is the same as the variance on the actual frontier.

From these definitions, we observe that the actual frontier will always lie below the true frontier. The estimated frontier can lie anywhere with respect to the other frontiers. However, if the errors in the expected return estimates have a mean of zero, then the estimated frontier will lie above the true frontier with extremely high probability, particularly when the investment universe is large. We look at two cases considered by Ceria and Stubbs: 2

1. Using the covariance matrix and expected return vector from Idzorek (2005), they randomly generate a time series of normally distributed returns and compute the average to use as estimates of expected returns. Using the expected-return estimate calculated in this fashion and the true covariance matrix, they generate an estimated efficient frontier of risk versus expected return where the portfolios were subject to no-shorting constraints and the standard budget constraint that the sum of portfolio weights is one. Similarly, Ceria and Stubbs compute the true efficient frontier using the original covariance matrix and expected return vector. Finally, they construct the actual frontier by computing the expected return and risk of the portfolios on the estimated frontier with the true covariance and expected return values. These three frontiers are illustrated in Figure 1.

2. Using the same estimate of expected returns, Ceria and Stubbs also generate risk versus expected return where active holdings of the assets are constrained to be ±3% of the benchmark holding of each asset. These frontiers are illustrated in Figure 2.
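
The flavor of this experiment can be imitated in miniature with the sketch below. It is not the Ceria and Stubbs setup: short sales are allowed so that the optimal weights have a closed form, a single risk-aversion level replaces the whole frontier, and the expected returns, volatilities, correlation, and sample length are hypothetical numbers chosen for illustration.

```python
import numpy as np

def mv_weights(mu, cov, risk_aversion):
    """Maximize w'mu - (risk_aversion / 2) w'cov w subject to sum(w) = 1."""
    ones = np.ones(len(mu))
    inv_mu = np.linalg.solve(cov, mu)
    inv_one = np.linalg.solve(cov, ones)
    # Lagrange multiplier of the budget constraint.
    gamma = (ones @ inv_mu - risk_aversion) / (ones @ inv_one)
    return (inv_mu - gamma * inv_one) / risk_aversion

def utility(w, mu, cov, risk_aversion):
    """Mean-variance objective evaluated with the supplied expected returns."""
    return w @ mu - 0.5 * risk_aversion * (w @ cov @ w)

rng = np.random.default_rng(0)
mu_true = np.array([0.04, 0.06, 0.08, 0.10])           # hypothetical expected returns
vol = np.array([0.10, 0.15, 0.20, 0.25])               # hypothetical volatilities
corr = np.full((4, 4), 0.3) + 0.7 * np.eye(4)          # constant correlation of 0.3
cov_true = np.outer(vol, vol) * corr

# Estimate expected returns from 60 simulated observations.
sample = rng.multivariate_normal(mu_true, cov_true, size=60)
mu_hat = sample.mean(axis=0)

w_true = mv_weights(mu_true, cov_true, risk_aversion=4.0)
w_est = mv_weights(mu_hat, cov_true, risk_aversion=4.0)

estimated_return = w_est @ mu_hat     # what the optimizer believes it will earn
actual_return = w_est @ mu_true       # what the same portfolio is expected to earn
true_return = w_true @ mu_true        # expected return of the truly optimal portfolio
```

Because `w_true` maximizes the objective under the true inputs, the portfolio built from estimates can never score higher under those inputs, which is the single-point analogue of the actual frontier lying below the true frontier.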

We observe that the estimated frontiers significantly overestimate the expected return for any risk level in both types of frontiers. More importantly, we note that the actual frontier lies far below the true frontier in both cases. This shows that the optimal mean-variance portfolio is not necessarily a good portfolio; that is, it is not mean-variance efficient. Since the true expected return is not observable, we do not know how far the actual expected return may be from the expected return of the mean-variance optimal portfolio, and we end up holding an inferior portfolio.

Figure 1 Markowitz Efficient Frontiers
Source: Figure 2 in Ceria and Stubbs (2005, p. 6). Reprinted with the permission of Axioma, Inc.

Figure 2 Markowitz Benchmark-Relative Efficient Frontiers
Source: Figure 3 in Ceria and Stubbs (2005, p. 7). Reprinted with the permission of Axioma, Inc.

Sensitivity to Estimation Error

In a portfolio optimization context, securities with large expected returns and low standard deviations will be overweighted and conversely, securities with low expected returns and high standard deviations will be underweighted. Therefore, large estimation errors in expected returns and/or variances/covariances introduce errors in the optimized portfolio weights. For this reason, people often cynically refer to optimizers as error maximizers.

Uncertainty from estimation error in expected returns tends to have more influence than uncertainty in the covariance matrix in a mean-variance optimization. 3 The relative importance depends on the investor's risk aversion, but as a general rule of thumb, errors in the expected returns are about 10 times more important than errors in the covariance matrix, and errors in the variances are about twice as important as errors in the covariances. 4 As the risk tolerance increases, the relative impact of estimation errors in the expected returns becomes even more important. Conversely, as the risk tolerance decreases, the impact of errors in expected returns relative to errors in the covariance matrix becomes smaller. From this simple rule, it follows that the major focus should be on providing good estimates for the expected returns, followed by the variances. In this entry we discuss shrinkage techniques and the Black-Litterman model in order to mitigate estimation errors.

Constraining Portfolio Weights

Several studies have shown that the inclusion of constraints in the mean-variance optimization problem leads to better out-of-sample performance. 5 Practitioners often use no short-selling constraints or upper and lower bounds for each security to avoid overconcentration in a few assets. Gupta and Eichhorn (1998) suggest that constraining portfolio weights may also assist in containing volatility, increase realized efficiency, and decrease downside risk or shortfall probability.

Jagannathan and Ma (2003) provide a theoretical justification for these observations. Specifically, they show that the no short-selling constraints are equivalent to reducing the estimated asset covariances, whereas upper bounds are equivalent to increasing the corresponding covariances. For example, stocks that have high covariance with other stocks tend to receive negative portfolio weights. Therefore, when their covariance is decreased (which is equivalent to the effect of imposing no short-selling constraints), these negative weights disappear. Similarly, stocks that have low covariances with other stocks tend to get overweighted. Hence, by increasing the corresponding covariances the impact of these overweighted stocks decreases.

Furthermore, Monte Carlo experiments performed by Jagannathan and Ma indicate that when no-short-sell constraints are imposed, the sample covariance matrix has about the same performance (as measured by the global minimum variance (GMV) portfolio) as a covariance matrix estimator constructed from a factor structure.

Care needs to be taken when imposing constraints for robustness and stability purposes. For example, if the constraints used are too tight, the constraints, rather than the forecasts, will completely determine the portfolio allocation.

Instead of providing ad hoc upper and lower bounds on each security, as proposed by Bouchaud, Potters, and Aguilar (1997), one can use so-called diversification indicators that measure the concentration of the portfolio. These diversification indicators can be used as constraints in the portfolio construction phase to limit the concentration to individual securities. The authors demonstrate that these indicators are related to the information content of the portfolio in the sense of information theory. 6 For example, a very concentrated portfolio corresponds to a large information content (as we would only choose a very concentrated allocation if our information about future price fluctuations is perfect), whereas an equally weighted portfolio would indicate low information content (as we would not put "all the eggs in one basket" if our information about future price fluctuations is poor).


Importance of Sensitivity Analysis

In practice, in order to minimize dramatic changes due to estimation error, it is advisable to perform sensitivity analysis. For example, one can study the results of small changes or perturbations to the inputs from an efficient portfolio selected from a mean-variance optimization. If the portfolio calculated from the perturbed inputs drastically differs from the first one, this might indicate a problem. The perturbation can also be performed on a security by security basis in order to identify those securities that are the most sensitive. The objective of this sensitivity analysis is to identify a set of security weights that will be close to efficient under several different sets of plausible inputs.
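
A security-by-security perturbation of the expected returns might look like the following sketch. It assumes an unconstrained mean-variance problem with only the budget constraint, so that the weights have a closed form; the inputs, the size of the perturbation (`bump`), and the function names are hypothetical.

```python
import numpy as np

def mv_weights(mu, cov, risk_aversion):
    """Maximize w'mu - (risk_aversion / 2) w'cov w subject to sum(w) = 1."""
    ones = np.ones(len(mu))
    inv_mu = np.linalg.solve(cov, mu)
    inv_one = np.linalg.solve(cov, ones)
    gamma = (ones @ inv_mu - risk_aversion) / (ones @ inv_one)
    return (inv_mu - gamma * inv_one) / risk_aversion

def weight_sensitivity(mu, cov, risk_aversion, bump=0.005):
    """Change in weights when each expected return is bumped in turn.

    Row k holds the weight changes caused by adding `bump` to mu[k].
    """
    base = mv_weights(mu, cov, risk_aversion)
    changes = np.empty((len(mu), len(mu)))
    for k in range(len(mu)):
        bumped = mu.copy()
        bumped[k] += bump
        changes[k] = mv_weights(bumped, cov, risk_aversion) - base
    return base, changes

mu = np.array([0.04, 0.06, 0.08, 0.10])                # hypothetical expected returns
vol = np.array([0.10, 0.15, 0.20, 0.25])               # hypothetical volatilities
cov = np.outer(vol, vol) * (np.full((4, 4), 0.3) + 0.7 * np.eye(4))

base, changes = weight_sensitivity(mu, cov, risk_aversion=4.0)
# Security whose expected-return error moves the portfolio the most.
most_sensitive = int(np.argmax(np.abs(changes).sum(axis=1)))
```

With inequality constraints the weights are no longer linear in the inputs, so in that case the perturbed problem would have to be re-solved numerically for each bump.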

Issues with Highly Correlated Assets

The inclusion of highly correlated securities is another major cause for instability in the mean-variance optimization framework. For example, high correlation coefficients among common asset classes are one reason why real estate is popular in optimized portfolios. Real estate is one of the few asset classes that has a low correlation with other common asset classes. But real estate in general does not have the liquidity necessary in order to implement these portfolios and may therefore fail to deliver the return promised by the real estate indexes.

The problem of high correlations typically becomes worse when the correlation matrix is estimated from historical data. Specifically, when the correlation matrix is estimated over a slightly different period, correlations may change, but the impact on the new portfolio weights may be drastic. In these situations, it may be a good idea to resort to a shrinkage estimator or a factor model to model covariances and correlations.

Incorporating Uncertainty in the Inputs into the Portfolio Allocation Process

In the classical mean-variance optimization problem, the expected returns and the covariance matrix of returns are uncertain and have to be estimated. After the estimation of these quantities, the portfolio optimization problem is solved as a deterministic problem, completely ignoring the uncertainty in the inputs. However, it makes sense for the uncertainty of expected returns and risk to enter into the optimization process, thus creating a more realistic model. Using point estimates of the expected returns and the covariance matrix of returns, and treating them as error-free in portfolio allocation does not necessarily correspond to prudent investor behavior.

The investor would probably be more comfortable choosing a portfolio that would perform well under a number of different scenarios, thereby also attaining some protection from estimation risk and model risk. Obviously, to have some insurance in the event of less likely but more extreme cases (e.g., scenarios that are highly unlikely under the assumption that returns are normally distributed), the investor must be willing to give up some of the upside that would result under the more likely scenarios. Such an investor seeks a robust portfolio, that is, a portfolio that is assured against some worst-case model misspecification. The estimation process can be improved through robust statistical techniques such as shrinkage and Bayesian estimators discussed later in this entry. However, jointly considering estimation risk and model risk in the financial decision-making process is becoming more important.

The estimation process frequently does not deliver a point forecast (that is, one single number), but a full distribution of expected returns. Recent approaches attempt to integrate estimation risk into the mean-variance framework by using the expected return distribution in the optimization. A simple approach is to sample from the return distribution and average the resulting portfolios (Monte Carlo approach). 7 However, as a mean-variance problem has to be solved for each draw, this is computationally intensive for larger portfolios. In addition, the averaging does not guarantee that the resulting portfolio weights will satisfy all constraints.
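
A bare-bones version of the Monte Carlo approach is sketched below. It rests on assumptions that are not in the text: the distribution of expected returns is taken to be normal with covariance equal to the return covariance divided by a hypothetical number of observations, `n_obs`, and each draw is optimized with a closed-form, budget-constrained mean-variance rule rather than a constrained solver.

```python
import numpy as np

def mv_weights(mu, cov, risk_aversion):
    """Maximize w'mu - (risk_aversion / 2) w'cov w subject to sum(w) = 1."""
    ones = np.ones(len(mu))
    inv_mu = np.linalg.solve(cov, mu)
    inv_one = np.linalg.solve(cov, ones)
    gamma = (ones @ inv_mu - risk_aversion) / (ones @ inv_one)
    return (inv_mu - gamma * inv_one) / risk_aversion

def resampled_weights(mu_hat, cov, n_obs, risk_aversion, n_draws, rng):
    """Average the optimal weights over draws of the expected-return vector.

    Each draw comes from N(mu_hat, cov / n_obs), the sampling distribution of a
    sample mean of n_obs observations (an assumed choice of distribution).
    """
    draws = rng.multivariate_normal(mu_hat, cov / n_obs, size=n_draws)
    # One optimization per draw: this loop is the computational bottleneck.
    weights = np.array([mv_weights(m, cov, risk_aversion) for m in draws])
    return weights.mean(axis=0)

rng = np.random.default_rng(3)
mu_hat = np.array([0.05, 0.05, 0.09, 0.08])            # hypothetical estimates
vol = np.array([0.10, 0.15, 0.20, 0.25])               # hypothetical volatilities
cov = np.outer(vol, vol) * (np.full((4, 4), 0.3) + 0.7 * np.eye(4))

w_avg = resampled_weights(mu_hat, cov, n_obs=60, risk_aversion=4.0,
                          n_draws=500, rng=rng)
```

The budget constraint, being linear, survives the averaging; constraints that are not convex (for example, a limit on the number of holdings) need not, which is the caveat raised in the text.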

Introduced in the late 1990s by Ben-Tal and Nemirovski (1998, 1999) and El Ghaoui and Lebret (1997), the robust optimization framework is computationally more efficient than the Monte Carlo approach. This development in optimization technology allows for efficiently solving the robust version of the mean-variance optimization problem in about the same time as the classical mean-variance optimization problem. The technique explicitly uses the distribution from the estimation process to find a robust portfolio in one single optimization. It thereby incorporates uncertainties of inputs into a deterministic framework. The classical portfolio optimization formulations such as the mean-variance portfolio selection problem, the maximum Sharpe ratio portfolio problem, and the value-at-risk (VaR) portfolio problem all have robust counterparts that can be solved in roughly the same amount of time as the original problem. 8


Large Data Requirements 

In classical mean-variance optimization, we 
need to provide estimates of the expected re¬ 
turns and covariances of all the securities in 
the investment universe considered. Typically, 
however, portfolio managers have reliable re¬ 
turn forecasts for only a small subset of these 
assets. This is probably one of the major rea¬ 
sons why the mean-variance framework has not 
been adopted by practitioners in general. It is 
simply unreasonable for the portfolio manager 
to produce good estimates of all the inputs re¬ 
quired in classical portfolio theory. 

We will see later in this entry that the Black- 
Litterman model provides a remedy in that it 
blends any views (this could be a forecast on 
just one or a few securities, or all of them) 
the investor might have with the market equi¬ 
librium. When no views are present, the re¬ 
sulting Black-Litterman expected returns are 
just the expected returns consistent with the
market equilibrium. Conversely, when the in¬ 
vestor has views on some of the assets, the re¬ 
sulting expected returns deviate from market 
equilibrium. 

SHRINKAGE ESTIMATION 

It has been well known since the seminal work by Stein
(1956) that biased estimators often yield better
parameter estimates than their generally preferred
unbiased counterparts. In particular, it can be shown
that if we consider the problem of estimating the mean
of an N-dimensional multivariate normal variable
(N > 2), $X \sim N(\mu, \Sigma)$ with known covariance
matrix $\Sigma$, the sample mean $\hat{\mu}$ is not the
best estimator of the population mean $\mu$ in terms
of the quadratic loss function

$$L(\mu, \hat{\mu}) = (\mu - \hat{\mu})' \Sigma^{-1} (\mu - \hat{\mu})$$

For example, the so-called James-Stein shrinkage
estimator

$$\hat{\mu}_{JS} = (1 - w)\hat{\mu} + w \mu_0 \iota$$

has a lower quadratic loss than the sample
mean, where

$$w = \min\left(1, \frac{N - 2}{T (\hat{\mu} - \mu_0 \iota)' \Sigma^{-1} (\hat{\mu} - \mu_0 \iota)}\right)$$

and $\iota = [1, 1, \ldots, 1]'$. Moreover, T is the number
of observations, and $\mu_0$ is an arbitrary number.
The vector $\mu_0 \iota$ and the weight w are referred
to as the shrinkage target and the shrinkage
intensity (or shrinkage factor), respectively.
Although there are some choices of $\mu_0$ that are
better than others, what is surprising with this
result is that it could be any number! This fact
is referred to as the Stein paradox.
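
As a minimal sketch, the James-Stein estimator above could be computed as follows. The function name, the T × N layout of the return matrix, and the handling of a zero distance are assumptions made for illustration; the covariance matrix is treated as known, as in the text.

```python
import numpy as np

def james_stein_mean(returns, sigma, mu0):
    """Shrink the sample mean toward the constant vector mu0 * iota.

    returns: T x N array of observations (assumed layout)
    sigma:   known N x N covariance matrix
    mu0:     scalar level of the shrinkage target
    """
    T, N = returns.shape
    mu_hat = returns.mean(axis=0)
    diff = mu_hat - mu0
    # Scaled squared distance T (mu_hat - mu0 iota)' Sigma^{-1} (mu_hat - mu0 iota).
    dist = T * diff @ np.linalg.solve(sigma, diff)
    # Shrinkage intensity, capped at one; full shrinkage if the distance is zero.
    w = 1.0 if dist == 0 else min(1.0, (N - 2) / dist)
    return (1 - w) * mu_hat + w * mu0 * np.ones(N), w
```

The further the sample mean lies from the target relative to its sampling noise, the smaller the intensity w and the less the estimate is pulled toward the target.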

In effect, shrinkage is a form of averaging 
different estimators. The shrinkage estimator 
typically consists of three components: (1) an 
estimator with little or no structure (like the 
sample mean above); (2) an estimator with a 
lot of structure (the shrinkage target); and (3) 
the shrinkage intensity. The shrinkage target is
chosen with the following two requirements in
mind. First, it should have only a small num¬ 
ber of free parameters (robust and with a lot of 
structure). Second, it should have some of the 
basic properties in common with the unknown 
quantity being estimated. The shrinkage inten¬ 
sity can be chosen based on theoretical proper¬ 
ties or simply by numerical simulation. 

Probably the most well-known shrinkage
estimator 9 used to estimate expected returns
in the financial literature is the one proposed
by Jorion (1986) where the shrinkage target is
given by $\mu_g \iota$ with

$$\mu_g = \frac{\iota' \Sigma^{-1} \hat{\mu}}{\iota' \Sigma^{-1} \iota}$$

and

$$w = \frac{N + 2}{N + 2 + T (\hat{\mu} - \mu_g \iota)' \Sigma^{-1} (\hat{\mu} - \mu_g \iota)}$$

We note that $\mu_g$ is the return on the GMV
portfolio. Several studies document that for the
mean-variance framework: (1) the variability in
the portfolio weights from one period to the
next decreases; and (2) the out-of-sample
risk-adjusted performance improves significantly
when using a shrinkage estimator as compared
to the sample mean. 10
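
A short sketch of Jorion's estimator, following the two formulas above, might look like this. The function name and argument layout are assumptions; in practice the covariance matrix passed in would itself be an estimate.

```python
import numpy as np

def jorion_mean(returns, sigma):
    """Shrink the sample mean toward the GMV portfolio return.

    returns: T x N array of observations (assumed layout)
    sigma:   N x N covariance matrix
    """
    T, N = returns.shape
    mu_hat = returns.mean(axis=0)
    ones = np.ones(N)
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    # Return on the global minimum variance portfolio (sigma is symmetric).
    mu_g = (sigma_inv_ones @ mu_hat) / (sigma_inv_ones @ ones)
    diff = mu_hat - mu_g * ones
    # Shrinkage intensity from the formula above.
    w = (N + 2) / (N + 2 + T * diff @ np.linalg.solve(sigma, diff))
    return (1 - w) * mu_hat + w * mu_g * ones, w, mu_g
```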

We can also apply the shrinkage technique 
for covariance matrix estimation. This involves 
shrinking an unstructured covariance estimator 
toward a more structured covariance estimator. 
Typically the structured covariance estimator 
only has a few degrees of freedom (only a few 
nonzero eigenvalues) as motivated by random 
matrix theory. 

For example, as shrinkage targets, Ledoit and 
Wolf (2003, 2004) suggest using the covariance 
matrix that follows from the single-factor model 
developed by Sharpe (1963) or the constant 
correlation covariance matrix. 11 In practice the 
single-factor model and the constant correlation 
model yield similar results, but the constant cor¬ 
relation model is much easier to implement. In 
the case of the constant correlation model, the 
shrinkage estimator for the covariance matrix 
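takes the form

$$\hat{\Sigma}_{LW} = w \hat{\Sigma}_{CC} + (1 - w) \hat{\Sigma}$$

where $\hat{\Sigma}$ is the sample covariance matrix, and
$\hat{\Sigma}_{CC}$ is the sample covariance matrix with
constant correlation. The sample covariance matrix
with constant correlation is computed as follows.

First, we decompose the sample covariance
matrix according to

$$\hat{\Sigma} = \Lambda C \Lambda'$$

where $\Lambda$ is a diagonal matrix of the volatilities of
returns and C is the sample correlation matrix,
that is,

$$C = \begin{bmatrix} 1 & \hat{\rho}_{12} & \cdots & \hat{\rho}_{1N} \\ \hat{\rho}_{21} & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \hat{\rho}_{N-1,N} \\ \hat{\rho}_{N1} & \cdots & \hat{\rho}_{N,N-1} & 1 \end{bmatrix}$$

Second, we replace the sample correlation matrix
with the constant correlation matrix

$$C_{CC} = \begin{bmatrix} 1 & \hat{\rho} & \cdots & \hat{\rho} \\ \hat{\rho} & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \hat{\rho} \\ \hat{\rho} & \cdots & \hat{\rho} & 1 \end{bmatrix}$$

where $\hat{\rho}$ is the average of all the sample
correlations, in other words

$$\hat{\rho} = \frac{2}{(N - 1)N} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \hat{\rho}_{ij}$$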

The optimal shrinkage intensity can be shown 
to be proportional to a constant divided by the 
length of the history, T. 12 
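
The construction of the constant correlation target and the shrunk covariance matrix can be sketched as below. The function names are hypothetical, and the shrinkage intensity w is passed in as a given number rather than computed from the optimal-intensity formula.

```python
import numpy as np

def constant_correlation_target(sample_cov):
    """Keep the sample volatilities, replace every correlation by the average."""
    vols = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(vols, vols)
    n = corr.shape[0]
    # Average of the off-diagonal sample correlations.
    rho_bar = (corr.sum() - np.trace(corr)) / (n * (n - 1))
    c_cc = np.full((n, n), rho_bar)
    np.fill_diagonal(c_cc, 1.0)
    return np.outer(vols, vols) * c_cc

def shrink_covariance(sample_cov, w):
    """Convex combination of the structured target and the sample covariance."""
    return w * constant_correlation_target(sample_cov) + (1 - w) * sample_cov
```

Setting w = 0 returns the sample covariance matrix unchanged, and w = 1 returns the constant correlation matrix; the variances on the diagonal are the same in both.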

Ledoit and Wolf (2003, 2004) compare the 
empirical out-of-sample performance of their 
shrinkage covariance matrix estimators with 
other covariance matrix estimators, such as the 
sample covariance matrix, a statistical factor 
model based on the first five principal compo¬ 
nents, and a factor model based on the 48 in¬ 
dustry factors as defined by Fama and French 
(1997). The results indicate that when it comes
to computing a GMV portfolio, their shrinkage
estimators are superior compared to the others
tested, with the constant correlation shrinkage
estimator coming out slightly ahead. Interestingly
enough, it turns out that the shrinkage
intensity for the single-factor model (the shrinkage
intensity for the constant correlation model
is not reported) is fairly constant throughout
time with a value around 0.8. This suggests that 
there is about four times as much estimation 
error present in the sample covariance matrix 
as there is bias in the single-factor covariance 
matrix. 


THE BLACK-LITTERMAN MODEL

In the Black-Litterman model an estimate of 
future expected returns is based on combin¬ 
ing market equilibrium (e.g., the CAPM equi¬ 
librium) with an investor's views. As we will 
see, the Black-Litterman expected return is a 
shrinkage estimator where market equilibrium 
is the shrinkage target and the shrinkage intensity
is determined by the portfolio manager's
confidence in the model inputs. We will make 
this statement precise later in this section. Such 
views are expressed as absolute or relative de¬ 
viations from equilibrium together with confi¬ 
dence levels of the views (as measured by the 
standard deviation of the views). 

The Black-Litterman expected return is calcu¬ 
lated as a weighted average of the market equi¬ 
librium and the investor's views. The weights 
depend on (1) the volatility of each asset and 
its correlations with the other assets and (2) the 
degree of confidence in each forecast. The re¬ 
sulting expected return, which is the mean of 
the posterior distribution, is then used as in¬ 
put in the portfolio optimization process. Port¬ 
folio weights computed in this fashion tend to 
be more intuitive and less sensitive to small 
changes in the original inputs (i.e., forecasts of 
market equilibrium, investor's views, and the 
covariance matrix). 

The Black-Litterman model can be interpreted
as a Bayesian model. Named after the English 
mathematician Thomas Bayes, the Bayesian ap¬ 
proach is based on the subjective interpreta¬ 
tion of probability. A probability distribution 
is used to represent an investor's belief on the 
probability that a specific event will actually 
occur. This probability distribution, called the 
prior distribution, reflects an investor's knowl¬ 
edge about the probability before any data are 
observed. After more information is provided 
(e.g., data observed), the investor's opinions 
about the probability might change. Bayes' rule 
is the formula for computing the new proba¬ 
bility distribution, called the posterior distri¬ 
bution. The posterior distribution is based on 
knowledge of the prior probability distribution 
plus the new data. A posterior distribution of 
expected return is derived by combining the 
forecast from the empirical data with a prior 
distribution. 

The ability to incorporate exogenous in¬ 
sight, such as a portfolio manager's judgment, 
into formal models is important: Such insight 
might be the most valuable input used by the 
model. The Bayesian framework allows fore¬ 
casting systems to use such external informa¬ 
tion sources and subjective interventions (i.e., 
modification of the model due to judgment) in 
addition to traditional information sources such 
as market data and proprietary data. 

Because portfolio managers might not be 
willing to give up control to a black box, 
incorporating exogenous insights into formal 
models through Bayesian techniques is one 
way of giving the portfolio manager better 
control in a quantitative framework. Forecasts 
are represented through probability distribu¬ 
tions that can be modified or adjusted to in¬ 
corporate other sources of information deemed 
relevant. The only restriction is that such ad¬ 
ditional information (i.e., the investor's views) 
be combined with the existing model through 
the laws of probability. In effect, incorporating
Bayesian views into a model allows one to
rationalize subjectivity within a formal,
quantitative framework. "[T]he rational investor is a
Bayesian," as Markowitz noted (1987, p. 57).


Derivation of the 
Black-Litterman Model 

The basic feature of the Black-Litterman model 
that we discuss in this and the following sec¬ 
tions is that it combines an investor's views 
with the market equilibrium. Let us under¬ 
stand what this statement implies. In the clas¬ 
sical mean-variance optimization framework 
an investor is required to provide estimates 
of the expected returns and covariances of all 
the securities in the investment universe con¬ 
sidered. This is of course a humongous task, 
given the number of securities available today. 
Portfolio and investment managers are very un¬ 
likely to have a detailed understanding of all 
the securities, companies, industries, and sec¬ 
tors that they have at their disposal. Typically, 
most of them have a specific area of expertise 
that they focus on in order to achieve superior 
returns. 

This is probably one of the major reasons why 
the mean-variance framework has not been 
adopted among practitioners in general. It is 
simply unrealistic for the portfolio manager to 
produce reasonable estimates (besides the ad¬ 
ditional problems of estimation error) of the in¬ 
puts required in classical portfolio theory. 

Furthermore, many trading strategies used 
today cannot easily be turned into forecasts of 
expected returns and covariances. In particu¬ 
lar, not all trading strategies produce views on 
absolute return, but rather just provide rela¬ 
tive rankings of securities that are predicted 
to outperform/underperform other securities. 
For example, considering two stocks, A and B,
instead of the absolute view, "the one-month
expected returns on A and B are 1.2% and 1.7%
with standard deviations of 5% and 5.5%,
respectively," a relative view may be of the form
"B will outperform A by half a percent over
the next month" or simply "B will outperform
A over the next month." Clearly, it is not an easy
task to translate any of these relative views into
the inputs required for the modern portfolio
theoretical framework. We now walk through
and illustrate the usage of the Black-Litterman
model in three simple steps.

Step 1: Basic Assumptions and 
Starting Point 

One of the basic assumptions underlying the 
Black-Litterman model is that the expected re¬ 
turn of a security should be consistent with 
market equilibrium unless the investor has a 
specific view on the security. In other words, an 
investor who does not have any views on the 
market should hold the market. 13

Our starting point is the CAPM model:

$$E(R_i) - R_f = \beta_i (E(R_M) - R_f)$$

where $E(R_i)$, $E(R_M)$, and $R_f$ are the expected
return on security i, the expected return on the
market portfolio, and the risk-free rate,
respectively. Furthermore,

$$\beta_i = \frac{\mathrm{cov}(R_i, R_M)}{\sigma_M^2}$$

where $\sigma_M^2$ is the variance of the market
portfolio. Let us denote by $w_b = (w_{b1}, \ldots, w_{bN})'$ the
market capitalization or benchmark weights, so
that with an asset universe of N securities 14 the
return on the market can be written as

$$R_M = \sum_{j=1}^{N} w_{bj} R_j$$

Then by the CAPM, the expected excess return
on asset i, $\Pi_i = E(R_i) - R_f$, becomes

$$\begin{aligned}
\Pi_i &= \beta_i (E(R_M) - R_f) \\
&= \frac{\mathrm{cov}(R_i, R_M)}{\sigma_M^2} (E(R_M) - R_f) \\
&= \frac{E(R_M) - R_f}{\sigma_M^2} \sum_{j=1}^{N} \mathrm{cov}(R_i, R_j) w_{bj}
\end{aligned}$$

We can also express this in matrix-vector
form as

$$\Pi = \delta \Sigma w_b$$

where we define the market price of risk as

$$\delta = \frac{E(R_M) - R_f}{\sigma_M^2}$$

the expected excess return vector

$$\Pi = \begin{bmatrix} \Pi_1 \\ \vdots \\ \Pi_N \end{bmatrix}$$

and the covariance matrix of returns

$$\Sigma = \begin{bmatrix} \mathrm{cov}(R_1, R_1) & \cdots & \mathrm{cov}(R_1, R_N) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(R_N, R_1) & \cdots & \mathrm{cov}(R_N, R_N) \end{bmatrix}$$
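
The matrix-vector form of the equilibrium excess returns can be illustrated with a small sketch. The function name and arguments are assumptions for this example; the market variance is computed from the benchmark weights and the covariance matrix.

```python
import numpy as np

def implied_equilibrium_returns(sigma, w_b, market_excess_return):
    """Equilibrium excess returns Pi = delta * Sigma * w_b.

    sigma:                N x N covariance matrix of returns
    w_b:                  benchmark (market capitalization) weights
    market_excess_return: E(R_M) - R_f
    """
    market_variance = w_b @ sigma @ w_b
    # Market price of risk: excess return per unit of market variance.
    delta = market_excess_return / market_variance
    return delta * sigma @ w_b, delta
```

By construction, the benchmark-weighted average of the resulting vector equals the market excess return that was supplied.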


The true expected returns $\mu$ of the securities
are unknown. However, we assume that our
previous equilibrium model serves as a reasonable
estimate of the true expected returns in the
sense that

$$\Pi = \mu + \varepsilon_\Pi, \quad \varepsilon_\Pi \sim N(0, \tau\Sigma)$$

for some small parameter $\tau \ll 1$. We can think
about $\tau\Sigma$ as our confidence in how well we can
estimate the equilibrium expected returns. In
other words, a small $\tau$ implies a high confidence
in our equilibrium estimates and vice versa.

According to portfolio theory, because the 
market portfolio is on the efficient frontier, as 
a consequence of the CAPM an investor will 
be holding a portfolio consisting of the market 
portfolio and a risk-free instrument earning the 
risk-free rate. But let us now see what happens 
if an investor has a particular view on some of 
the securities. 


Step 2: Expressing an Investor's Views

Formally, K views in the Black-Litterman model
are expressed as a K-dimensional vector q with

$$q = P\mu + \varepsilon_q, \quad \varepsilon_q \sim N(0, \Omega)$$

where P is a K × N matrix (explained in the
following example) and $\Omega$ is a K × K matrix
expressing the confidence in the views. In order
to understand this mathematical specification
better, let us take a look at an example.

Let us assume that the asset universe that we
consider has five stocks (N = 5) and that an 
investor has the following two views: 


1. Stock 1 will have a return of 1.5%. 

2. Stock 3 will outperform Stock 2 by 4%. 


We recognize that the first view is an absolute 
view whereas the second one is a relative view. 
Mathematically, we express the two views to¬ 
gether as 


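$$\begin{bmatrix} 1.5\% \\ 4\% \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}$$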


The first row of the P matrix represents the first 
view, and similarly, the second row describes 
the second view. In this example, we chose the 
weights of the second view such that they add 
up to zero, but other weighting schemes are 
also possible. For instance, the weights could 
also be chosen as some scaling factor times one 
over the market capitalizations of the stock, 
some scaling factor times one over the stock 
price, or other variations thereof. We come 
back to these issues later in this section when 
we discuss how to incorporate time-series- 
based strategies and cross-sectional ranking 
strategies. 
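
For concreteness, the two views of the example can be written down as arrays. This is only a sketch; the helper `view_residuals` is a hypothetical name introduced here to show how a candidate vector of expected returns can be checked against the views.

```python
import numpy as np

# Two views on a five-stock universe (N = 5, K = 2).
P = np.array([
    [1.0,  0.0, 0.0, 0.0, 0.0],   # view 1: absolute view on Stock 1
    [0.0, -1.0, 1.0, 0.0, 0.0],   # view 2: Stock 3 minus Stock 2
])
q = np.array([0.015, 0.04])       # 1.5% and 4%

def view_residuals(P, q, mu):
    """Difference between the stated views q and what mu implies, P mu."""
    return q - P @ mu
```

The row of the relative view sums to zero, as noted above, whereas the row of the absolute view sums to one.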

We also remark at this point that the error
terms $\varepsilon_1$, $\varepsilon_2$ do not explicitly enter into the
Black-Litterman model, but their variances do.
Quite simply, these are just the variances of
the different views. Although in some instances
they are directly available as a by-product of the
view or the strategy, in other cases they need to
be estimated separately. For example,

$$\Omega = \begin{bmatrix} (1\%)^2 & 0 \\ 0 & (1\%)^2 \end{bmatrix}$$

corresponds to a higher confidence in the views,
and conversely,

$$\Omega = \begin{bmatrix} (5\%)^2 & 0 \\ 0 & (7\%)^2 \end{bmatrix}$$

represents a much lower confidence in the
views. We discuss a few different approaches
in choosing the confidence levels below. The
off-diagonal elements of $\Omega$ are typically set to
zero. The reason for this is that the error terms
of the individual views are most often assumed
to be independent of one another.


Step 3: Combining an Investor's Views with Market Equilibrium

Having specified the market equilibrium and
an investor's views separately, we are now
ready to combine the two. There are two different,
but equivalent, approaches that can be used
to arrive at the Black-Litterman model. We will
describe a derivation that relies upon standard
econometrical techniques, in particular, the
so-called mixed estimation technique described by
Theil (1971). The approach based on Bayesian
statistics has been explained in some detail by
Satchell and Scowcroft (2000).

Let us first recall the specification of market
equilibrium

$$\Pi = \mu + \varepsilon_\Pi, \quad \varepsilon_\Pi \sim N(0, \tau\Sigma)$$

and the one for the investor's views

$$q = P\mu + \varepsilon_q, \quad \varepsilon_q \sim N(0, \Omega)$$

We can stack these two equations together in
the form

$$y = X\mu + \varepsilon, \quad \varepsilon \sim N(0, V)$$

where

$$y = \begin{bmatrix} \Pi \\ q \end{bmatrix}, \quad X = \begin{bmatrix} I \\ P \end{bmatrix}, \quad V = \begin{bmatrix} \tau\Sigma & 0 \\ 0 & \Omega \end{bmatrix}$$

with I denoting the N × N identity matrix. We
observe that this is just a standard linear model
for the expected returns $\mu$. Calculating the
generalized least squares (GLS) estimator for $\mu$, we
obtain


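$$\begin{aligned}
\hat{\mu}_{BL} &= (X'V^{-1}X)^{-1} X'V^{-1} y \\
&= \left( \begin{bmatrix} I & P' \end{bmatrix} \begin{bmatrix} (\tau\Sigma)^{-1} & 0 \\ 0 & \Omega^{-1} \end{bmatrix} \begin{bmatrix} I \\ P \end{bmatrix} \right)^{-1} \begin{bmatrix} I & P' \end{bmatrix} \begin{bmatrix} (\tau\Sigma)^{-1} & 0 \\ 0 & \Omega^{-1} \end{bmatrix} \begin{bmatrix} \Pi \\ q \end{bmatrix} \\
&= \left( \begin{bmatrix} I & P' \end{bmatrix} \begin{bmatrix} (\tau\Sigma)^{-1} \\ \Omega^{-1}P \end{bmatrix} \right)^{-1} \begin{bmatrix} I & P' \end{bmatrix} \begin{bmatrix} (\tau\Sigma)^{-1}\Pi \\ \Omega^{-1}q \end{bmatrix} \\
&= \left[ (\tau\Sigma)^{-1} + P'\Omega^{-1}P \right]^{-1} \left[ (\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}q \right]
\end{aligned}$$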


The last line in the above formula is the Black- 
Litterman expected returns that blend the mar¬ 
ket equilibrium with the investor's views. 
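
The mixed estimation step can be checked numerically. The sketch below, with hypothetical function names, computes the Black-Litterman expected returns once from the closed-form last line and once by stacking the equilibrium and the views into a single generalized least squares problem; explicit matrix inverses are used for readability rather than numerical efficiency.

```python
import numpy as np

def black_litterman_mean(pi, sigma, P, q, omega, tau):
    """Closed form: [(tau Sigma)^-1 + P' Omega^-1 P]^-1 [(tau Sigma)^-1 Pi + P' Omega^-1 q]."""
    ts_inv = np.linalg.inv(tau * sigma)
    om_inv = np.linalg.inv(omega)
    A = ts_inv + P.T @ om_inv @ P
    b = ts_inv @ pi + P.T @ om_inv @ q
    return np.linalg.solve(A, b)

def gls_stacked(pi, sigma, P, q, omega, tau):
    """GLS estimator for the stacked model y = X mu + eps, eps ~ N(0, V)."""
    n, k = len(pi), len(q)
    y = np.concatenate([pi, q])
    X = np.vstack([np.eye(n), P])
    V = np.block([[tau * sigma, np.zeros((n, k))],
                  [np.zeros((k, n)), omega]])
    V_inv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
```

Both functions should return the same vector up to rounding error, which is a convenient sanity check when implementing the model.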


Some Remarks and Observations 

Following are some comments in order to pro¬ 
vide a better intuitive understanding of the 
formula. We see that if the investor has no
views (that is, $q = \Omega = 0$) or the confidence in
the views is zero, then the Black-Litterman expected
return becomes $\hat{\mu}_{BL} = \Pi$. Consequently,
the investor will end up holding the market 
portfolio as predicted by the CAPM. In other 
words, the optimal portfolio in the absence of 
views is the defined market. 

If we were to plug return targets of zero or 
use the available cash rates, for example, into 
an optimizer to represent the absence of views, 
the result would be an optimal portfolio that 
looks very much different from the market. The 
equilibrium returns are those forecasts that in 
the absence of any other views will produce 
an optimal portfolio equal to the market port¬ 
folio. Intuitively speaking, the equilibrium re¬ 
turns in the Black-Litterman model are used to 
center the optimal portfolio around the market 
portfolio. 

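By using $q = P\mu + \varepsilon_q$, we have that the
investor's views alone imply the estimate
of expected returns $\hat{\mu} = (P'P)^{-1}P'q$. Since
$P(P'P)^{-1}P' = I$ where I is the identity matrix
(when K < N the matrix $P'P$ is singular, so the
inverse should be read as a generalized inverse),
we can rewrite the Black-Litterman expected
returns in the form

$$\hat{\mu}_{BL} = \left[ (\tau\Sigma)^{-1} + P'\Omega^{-1}P \right]^{-1} \left[ (\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}P\hat{\mu} \right]$$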


Now we see that the Black-Litterman expected
return is a confidence weighted linear
combination of market equilibrium $\Pi$ and the
expected return $\hat{\mu}$ implied by the investor's
views. The two weighting matrices are given
by

$$W_\Pi = \left[ (\tau\Sigma)^{-1} + P'\Omega^{-1}P \right]^{-1} (\tau\Sigma)^{-1}$$

$$W_q = \left[ (\tau\Sigma)^{-1} + P'\Omega^{-1}P \right]^{-1} P'\Omega^{-1}P$$

where

$$W_\Pi + W_q = I$$

In particular, $(\tau\Sigma)^{-1}$ and $P'\Omega^{-1}P$ represent the
confidence we have in our estimates of the
market equilibrium and the views, respectively.
Therefore, if we have low confidence in the 
views, the resulting expected returns will be 
close to the ones implied by market equilib¬ 
rium. Conversely, with higher confidence in the 
views, the resulting expected returns will de¬ 
viate from the market equilibrium implied ex¬ 
pected returns. We say that we tilt away from 
market equilibrium. 

It is straightforward to show that the
Black-Litterman expected returns can also be written
in the form

$$\hat{\mu}_{BL} = \Pi + \tau\Sigma P' (\Omega + \tau P \Sigma P')^{-1} (q - P\Pi)$$

where we now immediately see that we tilt
away from the equilibrium with a vector
proportional to $\Sigma P' (\Omega + \tau P \Sigma P')^{-1} (q - P\Pi)$.
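
The equivalence between the tilt form and the precision-weighted form follows from the Woodbury matrix identity, and it can be verified numerically with a sketch such as the following. The function names are hypothetical. The tilt form only requires inverting a K × K matrix, which is convenient when there are few views.

```python
import numpy as np

def bl_precision_form(pi, sigma, P, q, omega, tau):
    """[(tau Sigma)^-1 + P' Omega^-1 P]^-1 [(tau Sigma)^-1 Pi + P' Omega^-1 q]."""
    ts_inv = np.linalg.inv(tau * sigma)
    om_inv = np.linalg.inv(omega)
    return np.linalg.solve(ts_inv + P.T @ om_inv @ P,
                           ts_inv @ pi + P.T @ om_inv @ q)

def bl_tilt_form(pi, sigma, P, q, omega, tau):
    """Pi + tau Sigma P' (Omega + tau P Sigma P')^-1 (q - P Pi)."""
    middle = omega + tau * P @ sigma @ P.T          # K x K matrix
    return pi + tau * sigma @ P.T @ np.linalg.solve(middle, q - P @ pi)
```

When the views agree with equilibrium, that is, when q equals PΠ, the tilt vanishes and the result is Π itself.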

We also mention that the Black-Litterman
model can be derived as a solution to the
following optimization problem:

$$\hat{\mu}_{BL} = \arg\min_{\mu} \left\{ (\Pi - \mu)' \Sigma^{-1} (\Pi - \mu) + \tau (q - P\mu)' \Omega^{-1} (q - P\mu) \right\}$$

From this formulation we see that $\hat{\mu}_{BL}$ is chosen
such that it is simultaneously as close to $\Pi$, and
$P\mu$ is as close to q, as possible. The distances
are determined by $\Sigma^{-1}$ and $\Omega^{-1}$. Furthermore,
the relative importance of the equilibrium versus
the views is determined by $\tau$. For example,
for $\tau$ large the weight of the views is increased,
whereas for $\tau$ small the weight of the equilibrium
is higher. Moreover, we also see that $\tau$ is
a redundant parameter as it can be absorbed
into $\Omega$.

It is straightforward to calculate the variance
of the Black-Litterman combined estimator of
the expected returns by the standard sandwich
formula, that is,

$$\mathrm{var}(\hat{\mu}_{BL}) = (X'V^{-1}X)^{-1} = \left[ (\tau\Sigma)^{-1} + P'\Omega^{-1}P \right]^{-1}$$

The most important feature of the Black- 
Litterman model is that it uses the mixed 
estimation procedure to adjust the entire mar¬ 
ket equilibrium implied expected return vector 
with an investor's views. Because security re¬ 
turns are correlated, views on just a few assets 
will, due to these correlations, imply changes 
to the expected returns on all assets. Mathematically
speaking, this follows from the fact that
although the vector q can have dimension $K \ll N$,
$P'\Omega^{-1}$ is an N × K matrix that propagates the
K views into N components, $P'\Omega^{-1}q$. This effect
is stronger the more correlated the different
securities are. In the absence of this adjustment
of the expected return vector, the differences 
between the equilibrium expected return and 
an investor's forecasts will be interpreted as 
an arbitrage opportunity by a mean-variance 
optimizer and result in portfolios concentrated 
in just a few assets ("corner solutions"). Intu¬ 
itively, any estimation errors are spread out over 
all assets, making the Black-Litterman expected 
return vector less sensitive to errors in individ¬ 
ual views. This effect contributes to the mitiga¬ 
tion of estimation risk and error maximization 
in the optimization process. 

Practical Considerations and Extensions

In this subsection we discuss a few practical
issues in using the Black-Litterman model.
Specifically, we discuss how to incorporate factor
models and cross-sectional rankings in this
framework. Furthermore, we also provide some
ideas on how the confidences in the views can
be estimated in cases where these are not
directly available.

It is straightforward to incorporate factor 
models in the Black-Litterman framework. Let 
us assume we have a factor representation of 
the returns of some of the assets, that is 

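$$R_i = \alpha_i + F\beta_i + \varepsilon_i, \quad i \in I$$

where $I \subset \{1, 2, \ldots, N\}$. Typically, from a factor
model it is easy to obtain an estimate of the
residual variance, $\mathrm{var}(\varepsilon_i)$. In this case, we set

$$q_i = \begin{cases} \alpha_i + F\beta_i, & i \in I \\ 0, & \text{otherwise} \end{cases}$$

and the corresponding confidence

$$\omega_{ii}^2 = \begin{cases} \mathrm{var}(\varepsilon_i), & i \in I \\ 0, & \text{otherwise} \end{cases}$$

The P matrix is defined by

$$P_{ii} = \begin{cases} 1, & i \in I \\ 0, & \text{otherwise} \end{cases} \qquad P_{ij} = 0, \quad i \neq j$$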

Of course in a practical implementation we 
would omit rows with zeros. 

Many quantitative investment strategies do 
not a priori produce expected returns, but 
rather just a simple ranking of the securities. Let 
us consider a ranking of securities from best to 
worst (from an outperforming to an underper¬ 
forming perspective, etc.). For example, a value
manager might consider ranking securities in
terms of decreasing book-to-price ratio (B/P),
where a high B/P would indicate an undervalued
stock (potential to increase in value) and a
low B/P an overvalued stock (potential to
decrease in value). From this ranking we form a
long-short portfolio where we purchase the top 
half of the stocks (the group that is expected to 
outperform) and we sell short the second half 
of stocks (the group that is expected to under¬ 
perform). The view q in this case becomes a 
scalar, equal to the expected return on the long- 
short portfolio. The confidence of the view can 
be decided from backtests, as we describe next. 
Further, here the P matrix is a 1 x N matrix of 
ones and minus ones. The corresponding col¬
umn component is set to one if the security be¬ 
longs to the outperforming group, or minus one 
if it belongs to the underperforming group. 
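
A ranking can be turned into the 1 × N view matrix along the following lines. This is a sketch only: the function name is hypothetical, higher scores are assumed to mean "expected to outperform," and with an odd number of securities the middle one is assigned to the short group.

```python
import numpy as np

def long_short_view_row(scores):
    """Build a 1 x N matrix of +1 (top half) and -1 (bottom half) from scores."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(scores)[::-1]      # indices from best to worst
    row = -np.ones(n)                     # start with everything short
    row[order[: n // 2]] = 1.0            # top half is held long
    return row.reshape(1, n)
```

For example, scores of 0.9, 0.1, 0.5, and 0.3 put the first and third securities in the long group and the other two in the short group.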

In many cases we may not have a direct es¬ 
timate of the expected return and confidence 
(variance) of the view. There are several differ¬ 
ent ways to determine the confidence level. 

One of the advantages of a quantitative strat¬ 
egy is that it can be backtested. In the case of 
the long-short portfolio strategy discussed pre¬ 
viously, we could estimate its historical vari¬ 
ance through simulation with historical data. 
Of course, we cannot completely judge the per¬ 
formance of a strategy going forward from our 
backtests. Nevertheless, the backtest methodol¬ 
ogy allows us to obtain an estimate of the Black- 
Litterman view and confidence for a particular 
view/strategy. 

Another approach of deriving estimates of 
the confidence of the view is through sim¬ 
ple statistical assumptions. To illustrate, let us 
consider the second view in the preceding ex¬ 
ample: "Stock 3 will outperform Stock 2 by 4%." 
If we don't know its confidence, we can come 
up with an estimate for it from the answers to a 
few simple questions. We start asking ourselves 
with what certainty we believe the strategy will 
deliver a return between 3% and 5% (4% ± a 
where a is some constant, in this case a = 1%). 
Let us say that we believe there is a chance of
two out of three that this will happen, 2/3 ≈ 67%.
If we assume normality, we can interpret this as
a 67% confidence interval for the future return
to be in the interval [3%, 5%]. From this
confidence interval we calculate that the implied
standard deviation is equal to about 1.03%, because
a central 67% interval of a normal distribution
extends roughly 0.97 standard deviations on either
side of the mean. Therefore, we would set the
Black-Litterman confidence equal to
$(1.03\%)^2 \approx 1.1 \times 10^{-4}$.
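
Under the normality assumption, the step from a stated probability and interval half-width to an implied standard deviation can be sketched with the Python standard library. The function name and its arguments are assumptions made for this illustration.

```python
from statistics import NormalDist

def implied_view_std(half_width, probability):
    """Standard deviation such that a normal variable falls within
    +/- half_width of its mean with the given probability."""
    # Quantile of the standard normal at the upper end of the central interval.
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return half_width / z

# View "4% plus or minus 1%" held with a two-out-of-three chance.
sigma_view = implied_view_std(0.01, 2.0 / 3.0)
omega_entry = sigma_view ** 2   # variance used on the diagonal of Omega
```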

Some extensions to the Black-Litterman
model have been derived. For example, Satchell
and Scowcroft (2000) propose a model where
an investor's view on global volatility is
incorporated in the prior views by assuming that
$\tau$ is unknown and stochastic. Idzorek (2005)
introduces a new idea for determining the
confidence level of a view. He proposes that
the investor derives his confidence level
indirectly by first specifying his confidence in the
tilt away from equilibrium (the difference
between the market capitalization weights and the
weights implied by the view alone). Qian and
Gorman (2001) describe a technique based on
conditional distribution theory that allows an
investor to incorporate his views on any or all
variances.

Of course other asset classes beyond equities
and bonds can be incorporated into the
Black-Litterman framework. 15 Some practical
experiences and implementation details have been
described by Bevan and Winkelman (1998) and
He and Litterman (1999). A Bayesian approach,
with some similarity to the Black-Litterman
model, to portfolio selection using higher
moments has been proposed by Harvey et al.
(2010).


KEY POINTS 

• Classical mean-variance optimization is sensitive
to estimation error and small changes
in the inputs.

• There are four different approaches to make
the classical mean-variance framework more
robust: (1) improve the accuracy of the inputs;
(2) use constraints for the portfolio weights;
(3) use portfolio resampling to calculate the
portfolio weights; and (4) apply the robust
optimization framework to the portfolio
allocation process.

• Typically, errors in the expected returns are
about 10 times more important than errors in
the covariance matrix, and errors in the
variances are about twice as important as errors
in the covariances.

• Estimates of expected return and covariances
can be improved by using shrinkage
estimation. Shrinkage is a form of averaging
different estimators. The shrinkage estimator
typically consists of three components: (1)
an estimator with little or no structure; (2) an
estimator with a lot of structure (the shrinkage
target); and (3) the shrinkage intensity.

• Jorion's shrinkage estimator for the expected
return shrinks toward the return of the global
minimum variance portfolio.

• The sample covariance matrix should not be
used as an input to the mean-variance problem.
By shrinking it toward the covariance
matrix with constant correlations, its quality
will be improved.

• The Black-Litterman model combines an
investor's views with the market equilibrium.

• The Black-Litterman expected return is a
confidence weighted linear combination of market
equilibrium and the investor's views. The
confidence in the views and in market equilibrium
determines the relative weighting.

• Factor models as well as simple ranking models
can be simultaneously incorporated into
the Black-Litterman model.


NOTES 

1. See Broadie (1993). 

2. We are grateful to Axioma Inc. for provid¬ 
ing us with this example. Previously, it has 
appeared in Ceria and Stubbs (2005). 

3. See Best and Grauer (1991, 1992).

4. See Chopra and Ziemba (1993) and Kallberg 
and Ziemba (1984). 

5. See, for example, Frost and Savarino (1988),
Chopra (1991), and Grauer and Shen (2000). 

6. The relationship to information theory is 
based upon the premise that the diversifi¬ 
cation indicators are generalized entropies. 
See Curado and Tsallis (1991). 

7. See, for example, Michaud (1998), Jorion 
(1992), and Scherer (2002). 

8. See Goldfarb and Iyengar (2003). 

9. Many similar approaches have been pro¬ 
posed. For example, see Jobson and Korkie 
(1981) and Frost and Savarino (1986). 

10. See, for example, Michaud (1998), Jorion 
(1986), and Larsen and Resnick (2001). 

11. Elton, Gruber, and Urich (1978) proposed
the single factor model for purposes of
covariance estimation. They show that this
approach leads to (1) better forecasts of the
covariance matrix; (2) more stable portfolio 
allocations over time; and (3) more diversi¬ 
fied portfolios. They also find that the aver¬ 
age correlation coefficient is a good forecast 
of the future correlation matrix. 

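12. Although straightforward to implement, the optimal shrinkage
intensity, w, is a bit tedious to write down mathematically. Let us
denote by r_{i,t} the return on security i during period t,
1 ≤ i ≤ N, 1 ≤ t ≤ T,

$$\bar{r}_i = \frac{1}{T}\sum_{t=1}^{T} r_{i,t} \quad\text{and}\quad \hat{\sigma}_{ij} = \frac{1}{T-1}\sum_{t=1}^{T}(r_{i,t}-\bar{r}_i)(r_{j,t}-\bar{r}_j)$$

Then the optimal shrinkage intensity is given by the formula

$$w = \max\left\{0, \min\left\{\frac{\kappa}{T}, 1\right\}\right\}$$

where

$$\kappa = \frac{\pi - c}{\gamma}$$

and the parameters π, c, γ are computed as follows. First, π is given by

$$\pi = \sum_{i,j=1}^{N} \pi_{ij}$$

where

$$\pi_{ij} = \frac{1}{T}\sum_{t=1}^{T}\left((r_{i,t}-\bar{r}_i)(r_{j,t}-\bar{r}_j) - \hat{\sigma}_{ij}\right)^2$$

Second, c is given by

$$c = \sum_{i=1}^{N}\pi_{ii} + \sum_{i=1}^{N}\sum_{j=1,\, j\neq i}^{N}\frac{\hat{\rho}}{2}\left(\sqrt{\hat{\sigma}_{jj}/\hat{\sigma}_{ii}}\,\vartheta_{ii,ij} + \sqrt{\hat{\sigma}_{ii}/\hat{\sigma}_{jj}}\,\vartheta_{jj,ij}\right)$$

where

$$\vartheta_{ii,ij} = \frac{1}{T}\sum_{t=1}^{T}\left[\left((r_{i,t}-\bar{r}_i)^2 - \hat{\sigma}_{ii}\right)\times\left((r_{i,t}-\bar{r}_i)(r_{j,t}-\bar{r}_j) - \hat{\sigma}_{ij}\right)\right]$$

Finally, γ is given by

$$\gamma = \left\| C - C_{CC} \right\|_F^2$$

where ||·||_F denotes the Frobenius norm defined by

$$\|A\|_F = \sqrt{\sum_{i,j=1}^{N} a_{ij}^2}$$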
13. A predecessor to the Black-Litterman model 
is the so-called Treynor-Black model. In this 
model, an investor's portfolio is shown to 
consist of two parts: (1) a passive portfolio/ 
positions held purely for the purpose of 
mimicking the market portfolio, and (2) 
an active portfolio/positions based on the 
investor's return/risk expectations. This 
somewhat simpler model relies on the 
assumption that returns of all securities 
are related only through the variation of 
the market portfolio (Sharpe's diagonal 
model). See Treynor and Black (1973). 

14. For simplicity, we consider only equity securities. Extending this
model to other asset classes such as bonds and currencies is fairly
straightforward.

Two comments about the above two relationships are of importance:

1. As it may be difficult to accurately estimate expected returns,
practitioners use other techniques. One is that of reverse
optimization, also referred to as the technique of implied expected
returns. The technique simply uses the expression Π = δΣw to calculate
the expected return vector given the market price of risk δ, the
covariance matrix Σ, and the market capitalization weights w. The
technique was first introduced by Sharpe (1974) and Fisher (1975) and
is an important component of the Black-Litterman model.

2. We note that E(R_M) - R_f is the market risk premium (or the equity
premium) of the universe of assets considered. As pointed out by
Herold (2005) and Idzorek (2005), using a market proxy with different
risk-return characteristics than the market capitalization weighted
portfolio for determining the market risk premium may lead to
nonintuitive expected returns. For example, using a market risk
premium based on the S&P 500 for calculating the implied equilibrium
return vector for the NASDAQ 100 should be avoided.

15. See, for example, Black and Litterman (1992) and Litterman (2003).

Abstract: The value of any financial asset is the present value of its expected future cash flows.
To value a bond, one must be able to estimate the bond's remaining cash flows and identify the
appropriate discount rate(s). The traditional approach to bond valuation is to discount every
cash flow with the same discount rate. Simply put, the relevant yield curve used in valuation is
assumed to be flat. This approach permits opportunities for arbitrage. Alternatively, the
arbitrage-free valuation approach starts with the premise that a bond should be viewed as a
portfolio or package of zero-coupon bonds. Moreover, each of the bond's cash flows is valued using
a unique discount rate that depends on the shape of the yield curve and when the cash flow is
delivered in time. The relevant set of discount rates (that is, spot rates) is derived from the
Treasury yield curve and, when used to value risky bonds, is augmented with a spread.


Valuation is the process of determining the fair value of a financial
asset. In this entry, we will explain the general principles of bond
valuation. Our focus will be on how to value option-free bonds (that
is, bonds that are not callable, putable, or convertible). A special
analytical framework is required to value more complex bond structures
such as bonds that are callable or putable and mortgage-backed and
certain asset-backed securities.

GENERAL PRINCIPLES OF 
BOND VALUATION 

The fundamental principle of valuation is that the value of any
financial asset is equal to the present value of its expected future
cash flows. This principle holds for any financial asset from
zero-coupon bonds to interest rate swaps. Thus, the valuation of a
financial asset involves the following three steps:

Step 1: Estimate the expected future cash flows.

Step 2: Determine the appropriate interest rate or interest rates that
should be used to discount the cash flows.

Step 3: Calculate the present value of the expected future cash flows
found in Step 1 by using the appropriate interest rate or interest
rates determined in Step 2.

Estimating Cash Flows 

Cash flow is simply the cash that is expected to be received in the
future from owning a financial asset. For a fixed income security, it
does not matter whether the cash flow is interest income or repayment
of principal. A security's cash flows represent the sum of each
period's expected cash flow. Even if we disregard default, the cash
flows for only a few fixed income securities are simple to forecast
accurately. U.S. Treasury securities possess this feature since they
have known cash flows. While the probability of default of the U.S.
government is not zero, it is close enough to that threshold to be
safely ignored. Besides, if the U.S. government ever does default, we
will have other things to worry about than valuing bonds. For Treasury
coupon securities, the cash flows consist of the coupon interest
payments every six months up to and including the maturity date and
the principal repayment at the maturity date.

Many fixed income securities have features that make estimating their
cash flows problematic. These features may include one or more of the
following:

1. The issuer or the investor has the option to change the contractual
due date of the repayment of the principal.

2. The coupon and/or principal payment is reset periodically based on
a formula that depends on one or more market variables (e.g., interest
rates, inflation rates, exchange rates, etc.).

3. The investor has the choice to convert or exchange the security
into common stock or some other financial asset.

Callable bonds, putable bonds, mortgage-backed securities, and
asset-backed securities are examples of (1). Floating-rate securities
and Treasury Inflation Protected Securities (TIPS) are examples of (2).
Convertible bonds and exchangeable bonds are examples of (3). 1

For securities that fall into the first category, a key factor
determining whether the owner of the option (either the issuer of the
security or the investor) will exercise the option to alter the
security's cash flows is the level of interest rates in the future
relative to the security's coupon rate. In order to estimate the cash
flows for these types of securities, we must determine how the size
and timing of their expected cash flows will change in the future. For
example, when estimating the future cash flows of a callable bond, we
must account for the fact that when interest rates change, the
expected cash flows change. This introduces an additional layer of
complexity to the valuation process. For bonds with embedded options,
estimating cash flows is accomplished by introducing a parameter that
reflects the expected volatility of interest rates.

Determining the Appropriate 
Interest Rate or Rates 

Once we estimate the cash flows for a fixed income security, the next
step is to determine the appropriate interest rate for discounting
each cash flow. Before proceeding, we pause here to note that we will
use the terms "interest rate," "discount rate," and "required yield"
interchangeably throughout this entry. The interest rate used to
discount a particular security's cash flows will depend on three basic
factors: (1) the level of benchmark interest rates (that is, U.S.
Treasury rates); (2) the risks that the market perceives the
securityholder is exposed to; and (3) the compensation the market
expects to receive for these risks.

The minimum interest rate that an investor should require is the yield
available in the marketplace on a default-free cash flow. For bonds
with dollar-denominated cash flows, yields on U.S. Treasury securities
serve as benchmarks for default-free interest rates. For now, we can
think of the minimum interest rate that investors require as the yield
on a comparable maturity Treasury security.

The additional compensation or spread over the yield on the Treasury
issue that investors will require reflects the additional risks the
investor faces by acquiring a security that is not issued by the U.S.
government. These risks include default risk, liquidity risk, and the
risks associated with any embedded options. These yield spreads will
depend not only on the risks an individual issue is exposed to but
also on the level of Treasury yields, the market's risk aversion, the
business cycle, and so forth.

For each cash flow estimated, the same interest rate can be used to
calculate the present value. This is the traditional approach to
valuation and it serves as a useful starting point for our discussion.
We discuss the traditional approach in the next section and use a
single interest rate to determine present values. By doing this,
however, we are implicitly assuming that the yield curve is flat. Since
the yield curve is almost never flat and a coupon bond can be thought
of as a package of zero-coupon bonds, it is more appropriate to value
each cash flow using an interest rate specific to that cash flow. After
the traditional approach to valuation is discussed, we will explain the
proper approach to valuation using multiple interest rates and
demonstrate why this must be the case.


Discounting the Expected 
Cash Flows 

Once the expected (estimated) cash flows and the appropriate interest
rate or interest rates that should be used to discount the cash flows
are determined, the final step in the valuation process is to value
the cash flows. The present value of an expected cash flow to be
received t years from now using a discount rate i is:

$$\text{Present value}_t = \frac{\text{Expected cash flow in period } t}{(1+i)^t}$$

The value of a financial asset is then the sum of the present value of
all the expected cash flows. Specifically, assuming that there are N
expected cash flows:

$$\text{Value} = \text{Present value}_1 + \text{Present value}_2 + \cdots + \text{Present value}_N$$
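The two expressions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: the function names `present_value` and `asset_value` and the three-period cash flows are assumptions chosen for the example.

```python
def present_value(cash_flow: float, rate: float, t: int) -> float:
    """Present value of one cash flow received t periods from now."""
    return cash_flow / (1.0 + rate) ** t


def asset_value(cash_flows: list[float], rate: float) -> float:
    """Sum of present values; cash_flows[0] arrives at the end of period 1."""
    return sum(present_value(cf, rate, t)
               for t, cf in enumerate(cash_flows, start=1))


# Hypothetical asset: 5, 5, and 105 at the end of periods 1, 2, and 3,
# discounted at 5% per period.
value = asset_value([5.0, 5.0, 105.0], 0.05)
print(round(value, 4))
```

Because the hypothetical cash flows pay 5% of 100 each period and are discounted at 5%, the sum of the present values comes back to 100.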


Determining a Bond's Value 

Determining a bond's value involves computing the present value of the
expected future cash flows using a discount rate that reflects market
interest rates and the bond's risks. A bond's cash flows come in two
forms: coupon interest payments and the repayment of principal at
maturity. In practice, many bonds deliver semiannual cash flows.
Fortunately, this does not introduce any complexities into the
calculation. Two simple adjustments are needed. First, we adjust the
coupon payments by dividing the annual coupon payment by 2. Second, we
adjust the discount rate by dividing the annual discount rate by 2. The
time period t in the present value expression is treated in terms of
6-month periods as opposed to years.

To illustrate the process, let's value a 4-year, 6% coupon bond with a
maturity value of $100. The coupon payments are $3 (0.06 x $100/2)
every six months for the next eight periods. In addition, on the
maturity date, the investor receives the repayment of principal
($100). The value of a nonamortizing bond can be divided in two
components: (1) the present value of the coupon payments (that is, an
annuity) and (2) the present value of the maturity value (that is, a
lump sum). Therefore, when a single discount rate is employed, a bond's
value can be thought of as the sum of two present values: an annuity
and a lump sum.

The adjustment for the discount rate is easy to accomplish but tricky
to interpret. For example, if an annual discount rate of 6% is used,
how do we obtain the semiannual discount rate? We will simply use
one-half the annual rate, 3.0% (6%/2). How can this be? A 3.0%
semiannual rate is not a 6% effective annual rate. As we will see
later in this entry, the convention in the bond market is to quote
annual interest rates that are just double the semiannual rates. This
convention will be explained more fully later when we discuss yield to
maturity. For now, accept on faith that one-half the discount rate is
used as a semiannual discount rate in the balance of the entry.

We now have everything in place to value a semiannual coupon-paying
bond. The present value of an annuity is equal to:

$$\text{Annuity payment} \times \left[\frac{1 - \dfrac{1}{(1+r)^{\text{no. of years}}}}{r}\right]$$

where r is the annual discount rate.

Applying this formula to a semiannual-pay bond, the annuity payment is
one half the annual coupon payment and the number of periods is double
the number of years to maturity. Accordingly, the present value of the
coupon payments can be expressed as:

$$\text{Semiannual coupon payment} \times \left[\frac{1 - \dfrac{1}{(1+i)^{\text{no. of years} \times 2}}}{i}\right]$$

where i is the semiannual discount rate (r/2). Notice that in the
formula, for the number of periods we use the number of years
multiplied by 2 since a period in our illustration is six months.

The present value of the maturity value is just the present value of a
lump sum and is equal to:

$$\text{Present value of the maturity value} = \frac{\$100}{(1+i)^{\text{no. of years} \times 2}}$$

We will illustrate the calculation by valuing our 4-year, 6% coupon
bond assuming that the relevant discount rate is 7%. The data are
summarized below:

Semiannual coupon payment = $3 (per $100 of par value)
Semiannual discount rate (i) = 3.5% (7%/2)
Number of years to maturity = 4

The present value of the coupon payments is:

$$\$3 \times \left[\frac{1 - \dfrac{1}{(1.035)^{4 \times 2}}}{0.035}\right] = \$20.6219$$

This number tells us that the coupon payments contribute $20.6219 to
the bond's value.

The present value of the maturity value is:

$$\text{Present value of the maturity value} = \frac{\$100}{(1.035)^{4 \times 2}} = \$75.9412$$

This number ($75.9412) tells us how much the maturity value contributes
to the bond's value. The bond's value is then $96.5631 ($20.6219 +
$75.9412). The price is less than par value and the bond is said to be
trading at a discount. This will occur when the fixed coupon rate a
bond offers (6%) is less than the required yield demanded by the market
(the 7% discount rate). A discount bond has an inferior coupon rate
relative to new comparable bonds being issued at par, so its price must
drop so as to offer the required yield of 7%. If the discount bond is
held to maturity, the investor will experience a capital gain that just
offsets the lower current coupon rate, so that the bond appears as
attractive as new comparable bonds issued at par.
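The annuity-plus-lump-sum calculation above can be reproduced with a short Python sketch. The function name `bond_value` and its parameters are illustrative assumptions rather than anything defined in the entry; the inputs are the 4-year, 6% coupon bond discounted at 7%.

```python
def bond_value(annual_coupon_rate, annual_discount_rate, years, par=100.0):
    """Value a semiannual-pay bond on a coupon date with one discount rate.

    Returns (PV of coupons, PV of maturity value, total value).
    """
    coupon = annual_coupon_rate * par / 2.0      # semiannual coupon payment
    i = annual_discount_rate / 2.0               # semiannual discount rate
    n = years * 2                                # number of 6-month periods
    discount_factor = 1.0 / (1.0 + i) ** n
    pv_coupons = coupon * (1.0 - discount_factor) / i   # annuity
    pv_maturity = par * discount_factor                 # lump sum
    return pv_coupons, pv_maturity, pv_coupons + pv_maturity


pv_coupons, pv_maturity, value = bond_value(0.06, 0.07, 4)
print(f"{pv_coupons:.4f} {pv_maturity:.4f} {value:.4f}")
```

The two components agree with the figures in the text to four decimals. The total computed in one step can differ in the fourth decimal from $96.5631, since that figure is the sum of the two rounded components.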

Suppose instead of a 7% discount rate, a 5% discount rate is used. This
discount rate is less than the coupon rate on the bond (6%). It can be
shown that the present value of the coupon payments is $21.5104 and the
present value of the maturity value is $82.0747. Thus, the bond's value
in this case is $103.5851. That is, the price is greater than par value
and the bond is said to be trading at a premium. This will occur when
the fixed coupon rate a bond offers (6%) is greater than the required
yield demanded by the market (the 5% discount rate). Accordingly, a
premium bond carries a higher coupon rate than new bonds (otherwise the
same) being issued today at par, so the price will be bid up and the
required yield will fall until it equals 5%. If the premium bond is
held to maturity, the investor will experience a capital loss that just
offsets the benefits of the higher coupon rate, so that the bond will
appear as attractive as new comparable bonds issued at par.

Finally, let's suppose that the discount rate is equal to the coupon
rate. That is, suppose that the discount rate is 6%. It can be shown
that the present value of the coupon payments is $21.0591 and the
present value of the maturity value is $78.9409. Thus, the bond's value
in this case is $100 or par value. Thus, when a bond's coupon rate is
equal to the discount rate, the bond will trade at par value. Note that
the preceding statement is strictly true only when a bond is valued on
its coupon payment dates.


Valuing a Zero-Coupon Bond 

For a zero-coupon bond, there is only one cash flow: the repayment of
principal at maturity. The value of a zero-coupon bond that matures N
years from now is:

$$\frac{\text{Maturity value}}{(1+i)^{N \times 2}}$$

where i is the semiannual discount rate.

The expression presented above states that the price of a zero-coupon
bond is simply the present value of the maturity value. In the present
value computation, why is the number of periods used for discounting
rather than the number of years to the bond's maturity when there are
no semiannual coupon payments? We do this in order to make the
valuation of a zero-coupon bond consistent with the valuation of a
coupon bond. In other words, both coupon and zero-coupon bonds are
valued using semiannual discounting rates.

To illustrate, the value of a 10-year zero-coupon bond with a maturity
value of $100 discounted at a 6.4% interest rate is $53.2606, as
presented below:

i = 0.032 (= 0.064/2)
N = 10

$$\frac{\$100}{(1.032)^{10 \times 2}} = \$53.2606$$
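As a quick check of the zero-coupon expression, the following Python sketch applies semiannual discounting to the 10-year example. The function name `zero_coupon_value` is an assumption made for illustration.

```python
def zero_coupon_value(maturity_value, annual_discount_rate, years):
    """Value a zero-coupon bond using semiannual discounting."""
    i = annual_discount_rate / 2.0          # semiannual discount rate
    periods = years * 2                     # number of 6-month periods
    return maturity_value / (1.0 + i) ** periods


value = zero_coupon_value(100.0, 0.064, 10)
print(f"{value:.4f}")
```

Discounting over 20 semiannual periods at 3.2%, rather than over 10 years at 6.4%, is what keeps the result consistent with the coupon bond formula.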


Valuing a Bond between Coupon Payments

In our discussion of bond valuation to this point, we have assumed
that the bonds are valued on their coupon payment dates (that is, the
next coupon payment is one full period away). For bonds with
semiannual coupon payments, this occurs only twice a year. Our task
now is to describe how bonds are valued on the other 363 days (or 364
days) of the year.

In order to value a bond with a settlement date between coupon
payments, we must answer three questions. First, how many days are
there until the next coupon payment date? The answer depends on the
day count convention for the bond being valued. Second, how should we
compute the present value of the cash flows received over the
fractional period? Third, how much must the buyer compensate the
seller for the coupon earned over the fractional period? This amount
is accrued interest. We will answer these three questions in order to
determine the full price and the clean price of a coupon bond. For a
more detailed discussion of these issues for not only U.S. bonds but
bonds traded in other countries, see Krgin (2002).

Computing the Full Price When valuing a bond purchased with a
settlement date between coupon payment dates, the first step is to
determine the fractional periods between the settlement date and the
next coupon date. Using the appropriate day count convention, this is
determined as follows:

$$w \text{ periods} = \frac{\text{Days between settlement date and next coupon payment date}}{\text{Days in the coupon period}}$$

Then the present value of each expected future cash flow to be
received t periods from now using a discount rate i, assuming the next
coupon payment is w periods from now (settlement date), is:

$$\text{Present value}_t = \frac{\text{Expected cash flow}}{(1+i)^{t-1+w}}$$


Note for the first coupon payment subsequent to the settlement date,
t = 1 so the exponent is just w. This procedure for calculating the
present value when a bond is purchased between coupon payments is
called the "Street method." In the Street method, as can be seen in
the previous expression, coupon interest is compounded over the
fractional period w. 2

To illustrate the calculation, suppose that a U.S. Treasury note
maturing on December 31, 2007, was purchased with a settlement date of
November 22, 2006. This note's coupon rate was 4.375% and it had
coupon payment dates of June 30 and December 31. As a result, the next
coupon payment was December 31, 2006, while the previous coupon
payment was paid on June 30, 2006. There were three cash flows
remaining and they were to be delivered on December 31, 2006, June 30,
2007, and December 31, 2007. The final cash flow represented the last
coupon payment and the maturity value of $100. Also assume the
following:

1. Actual / actual day count convention 

2. 39 days between the settlement date and the 
next coupon payment date 

3. 184 days in the coupon period 

Then w is 0.2120 periods (39/184). The present value of each cash flow
assuming that each is discounted at a 4.9% annual discount rate is

$$\text{Period 1: Present value}_1 = \frac{\$2.1875}{(1.0245)^{0.2120}} = \$2.1761$$

$$\text{Period 2: Present value}_2 = \frac{\$2.1875}{(1.0245)^{1.2120}} = \$2.1243$$

$$\text{Period 3: Present value}_3 = \frac{\$102.1875}{(1.0245)^{2.2120}} = \$96.8498$$

The sum of the present values of the cash flows is $101.1502. This
price is referred to as the full price (or the dirty price).
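The Street method can be sketched in Python as follows. The function name `full_price` and its argument layout are assumptions made for this illustration, and the day counts are taken as plain calendar-day differences, which is how the actual/actual convention is applied in this example. Recomputing from the stated inputs can give values that differ from the printed figures in the second decimal place, so the sketch is best read as an illustration of the mechanics rather than a check of those figures.

```python
from datetime import date


def full_price(settlement, previous_coupon, coupon_dates, coupon, par, i):
    """Full (dirty) price by the Street method.

    coupon_dates lists the remaining coupon dates in order; the last one is
    also the maturity date. i is the semiannual discount rate.
    Returns (w, full price).
    """
    next_coupon = coupon_dates[0]
    days_to_next = (next_coupon - settlement).days
    days_in_period = (next_coupon - previous_coupon).days
    w = days_to_next / days_in_period            # fractional period
    total = 0.0
    for t in range(1, len(coupon_dates) + 1):
        cash_flow = coupon + (par if t == len(coupon_dates) else 0.0)
        total += cash_flow / (1.0 + i) ** (t - 1 + w)
    return w, total


w, price = full_price(
    settlement=date(2006, 11, 22),
    previous_coupon=date(2006, 6, 30),
    coupon_dates=[date(2006, 12, 31), date(2007, 6, 30), date(2007, 12, 31)],
    coupon=4.375 / 2,          # $2.1875 per $100 of par
    par=100.0,
    i=0.049 / 2,               # 2.45% semiannual discount rate
)
print(f"w = {w:.4f}, full price = {price:.4f}")
```

The date arithmetic reproduces the 39 days to the next coupon and the 184 days in the coupon period assumed in the text.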

It is the full price the bond's buyer pays the seller at delivery.
However, the very next cash flow received and included in the present
value calculation was not earned by the bond's buyer. A portion of the
next coupon payment is the accrued interest. Accrued interest is the
portion of a bond's next coupon payment that the bond's seller is
entitled to depending on the amount of time the bond was held by the
seller. Recall that the buyer recovers the accrued interest when the
next coupon payment is delivered.

Computing the Accrued Interest and the Clean Price The last step in
this process is to find the bond's value without accrued interest
(called the clean price or simply price). To do this, the accrued
interest must be computed. The first step is to determine the number
of days in the accrued interest period (that is, the number of days
between the last coupon payment date and the settlement date) using
the appropriate day count convention. For ease of exposition, we will
assume in the example that follows that the actual/actual calendar is
used. We will also assume there are only two bondholders in a given
coupon period: the buyer and the seller.

As an illustration, we return to the previous example with the 4.375%
coupon Treasury note. Since there were 184 days in the coupon period
and 39 days from the settlement date to the next coupon payment date,
there were 145 days (184 - 39) in the accrued interest period.
Therefore, the percentage of the next coupon payment that is accrued
interest is:

$$\frac{145}{184} = 0.7880 = 78.80\%$$

Of course, this is the same percentage found by simply subtracting w
from 1. In our example, w was 0.2120. Then, 1 - 0.2120 = 0.7880.

Given the value of w, the amount of accrued interest (AI) is equal to:

$$AI = \text{Semiannual coupon payment} \times (1 - w)$$

Accordingly, using a 4.375% Treasury note with a settlement date of
November 22, 2006, the portion of the next coupon payment that was
accrued interest was:

$$\$2.1875 \times (1 - 0.2120) = \$1.7238 \text{ (per \$100 of par value)}$$

Once we know the full price and the accrued interest, we can determine
the clean price. The clean price is the price quoted in the market and
represents the bond's value to the new bondholder. The clean price is
computed as follows:

$$\text{Clean price} = \text{Full price} - \text{Accrued interest}$$

In our illustration, the clean price is:

$$\$99.43 = \$101.1502 - \$1.7238$$
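The accrued interest and clean price steps can be sketched in Python as below. The function name `accrued_interest` is an illustrative assumption; the full price of $101.1502 is taken as given from the text rather than recomputed.

```python
def accrued_interest(coupon, days_to_next_coupon, days_in_period):
    """Accrued interest = semiannual coupon x (1 - w)."""
    w = days_to_next_coupon / days_in_period     # fraction of period remaining
    return coupon * (1.0 - w)


coupon = 4.375 / 2                  # $2.1875 per $100 of par
ai = accrued_interest(coupon, 39, 184)
full_price = 101.1502               # full price as stated in the text
clean_price = full_price - ai
print(f"AI = {ai:.4f}, clean price = {clean_price:.4f}")
```

The accrued interest is 145/184 of the semiannual coupon, which is the share of the coupon period during which the seller held the note.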

Note that in computing the full price, the present value of the next
coupon payment is computed. However, the buyer pays the seller the
accrued interest now despite the fact that it will not be recovered
until the next coupon payment date. To make this concrete, suppose one
sells a bond such that the settlement date is halfway between the
coupon payment dates. In this case w = 0.50. Accordingly, the seller
will be entitled to one-half of the next coupon payment which would
not otherwise be received for another three months. Thus, when
calculating the clean price, we subtract "too much" accrued interest:
one-half the coupon payment rather than the present value of one-half
the coupon payment. Of course, this is the market convention for
calculating accrued interest but it does introduce a curious twist in
bond valuation.
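A small Python sketch makes the size of this twist visible. The coupon and discount rate below are hypothetical values chosen only for illustration; they do not come from the entry.

```python
coupon = 3.0      # hypothetical semiannual coupon per $100 of par
i = 0.03          # hypothetical semiannual discount rate
w = 0.5           # settlement halfway between coupon dates

# Market convention: the seller receives the undiscounted share of the coupon.
accrued = coupon * (1.0 - w)

# Present value today of that same share, which is paid w periods from now.
pv_of_accrued = accrued / (1.0 + i) ** w

print(f"accrued = {accrued:.4f}, present value = {pv_of_accrued:.4f}")
```

Under these assumed inputs the convention subtracts roughly two cents more per $100 of par than the present value of the seller's share.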


The Price/Discount Rate 
Relationship 

An important general property of present value is that the higher
(lower) the discount rate, the lower (higher) the present value. Since
the value of a security is the present value of the expected future
cash flows, this property carries over to the value of a security: The
higher (lower) the discount rate, the lower (higher) a security's
value. We can summarize the relationship between the coupon rate, the
required market yield, and the bond's price relative to its par value
as follows:

Coupon rate = Yield required by market => Price = Par value
Coupon rate < Yield required by market => Price < Par value (discount)
Coupon rate > Yield required by market => Price > Par value (premium)

This agrees with what we found for the 4-year, 6% coupon bond:

Coupon Rate   Yield Required by Market   Price       Bond Trading at
6%            7%                         $96.5631    Discount
6%            5%                         $103.5851   Premium
6%            6%                         $100.0000   Par
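The summary table can be regenerated with a short Python sketch. The helper names `bond_value` and `trading_status` are assumptions introduced here, and the small tolerance used to detect par is an implementation choice, not something stated in the entry.

```python
def bond_value(annual_coupon_rate, annual_yield, years, par=100.0):
    """Value of a semiannual-pay bond on a coupon date at a single yield."""
    coupon = annual_coupon_rate * par / 2.0
    i = annual_yield / 2.0
    d = 1.0 / (1.0 + i) ** (years * 2)
    return coupon * (1.0 - d) / i + par * d


def trading_status(price, par=100.0, tol=1e-9):
    """Classify a price relative to par value."""
    if abs(price - par) <= tol:
        return "Par"
    return "Discount" if price < par else "Premium"


rows = []
for required_yield in (0.07, 0.05, 0.06):
    price = bond_value(0.06, required_yield, 4)
    rows.append((required_yield, price, trading_status(price)))
    print(f"6%  {required_yield:.0%}  ${price:.4f}  {rows[-1][2]}")
```

Each row follows directly from comparing the 6% coupon rate with the yield required by the market.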


Figure 1 depicts this inverse relationship between an option-free
bond's price and its discount rate (that is, required yield). There
are two things to infer from the price/discount rate relationship
depicted in the figure. First, the relationship is downward sloping.
This is simply the inverse relationship between present values and
discount rates at work. Second, the relationship is represented as a
curve rather than a straight line. In fact, the shape of the curve in
Figure 1 is referred to as convex. Convex simply means that the curve
is "bowed in" relative to the origin. This second observation raises
two questions about the convex or curved shape of the price/discount
rate relationship. First, why is it curved? Second, what is the import
of the curvature?



Figure 1 Price/Discount Rate Relationship for an Option-Free Bond

The answer to the first question is mathematical. The answer lies in
the denominator of the bond pricing formula. Since we are raising one
plus the discount rate to powers greater than one, it should not be
surprising that the relationship between the level of the price and
the level of the discount rate is not linear.

As for the importance of the curvature to bond investors, let's
consider what happens to bond prices in both falling and rising
interest rate environments. First, what happens to bond prices as
interest rates fall? The answer is obvious: bond prices rise. How about
the rate at which they rise? If the price/discount rate relationship
were linear, as interest rates fell, bond prices would rise at a
constant rate. However, the relationship is not linear; it is curved,
and curved inward. Accordingly, when interest rates fall, bond prices
increase at an increasing rate. Now, let's consider what happens when
interest rates rise. Of course, bond prices fall. How about the rate
at which bond prices fall? Once again, if the price/discount rate
relationship were linear, as interest rates rose, bond prices would
fall at a constant rate. Since it is curved inward, when interest rates
rise, bond prices decrease at a decreasing rate.
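The asymmetry described above can be seen numerically with the 4-year, 6% coupon bond. The sketch below, in Python, uses an illustrative helper named `bond_value` (an assumption, not the entry's notation) and compares equal 100-basis-point moves in the discount rate in each direction from 6%.

```python
def bond_value(annual_coupon_rate, annual_yield, years, par=100.0):
    """Value of a semiannual-pay bond on a coupon date at a single yield."""
    coupon = annual_coupon_rate * par / 2.0
    i = annual_yield / 2.0
    d = 1.0 / (1.0 + i) ** (years * 2)
    return coupon * (1.0 - d) / i + par * d


base = bond_value(0.06, 0.06, 4)                       # par
gain_when_yield_falls = bond_value(0.06, 0.05, 4) - base
loss_when_yield_rises = base - bond_value(0.06, 0.07, 4)
print(f"gain = {gain_when_yield_falls:.4f}, loss = {loss_when_yield_rises:.4f}")
```

For the same size move in the discount rate, the price gain exceeds the price loss, which is the practical consequence of the convex shape.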

Time Path of a Bond

As a bond moves towards its maturity date, its value changes. More
specifically, assuming that the discount rate does not change, a
bond's value:

1. Decreases over time if the bond is selling at a premium.

2. Increases over time if the bond is selling at a discount.

3. Is unchanged if the bond is selling at par value.

With respect to the last property, we are assuming the bond is valued
on its coupon anniversary dates.

At the maturity date, the bond's value is equal to its par or maturity
value. So, as a bond's maturity approaches, the price of a discount
bond will rise to its par value and a premium bond will fall to its
par value, a characteristic sometimes referred to as pull to par
value.
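The pull to par value can be traced with a short Python sketch that revalues the 6% coupon bond as its remaining life shortens while the discount rate is held fixed. The helper name `bond_value` and the choice of whole-year steps are assumptions made for the illustration.

```python
def bond_value(annual_coupon_rate, annual_yield, years, par=100.0):
    """Value of a semiannual-pay bond on a coupon date at a single yield."""
    coupon = annual_coupon_rate * par / 2.0
    i = annual_yield / 2.0
    d = 1.0 / (1.0 + i) ** (years * 2)
    return coupon * (1.0 - d) / i + par * d


years_remaining = [4, 3, 2, 1, 0]
discount_path = [bond_value(0.06, 0.07, y) for y in years_remaining]
premium_path = [bond_value(0.06, 0.05, y) for y in years_remaining]

for y, d_price, p_price in zip(years_remaining, discount_path, premium_path):
    print(f"{y} years: discount bond {d_price:.4f}, premium bond {p_price:.4f}")
```

With the discount rate unchanged, the discount bond's value climbs toward 100 and the premium bond's value declines toward 100 as the maturity date approaches.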

ARBITRAGE-FREE BOND 
VALUATION 

The traditional approach to valuation is to discount every cash flow
of a fixed income security using the same interest or discount rate.
The fundamental flaw of this approach is that it views each security
as the same package of cash flows. For example, consider a 5-year U.S.
Treasury note with a 6% coupon rate. The cash flows per $100 of par
value would be 9 payments of $3 every six months and $103 ten 6-month
periods from now. The traditional practice would discount every cash
flow using the same discount rate regardless of when the cash flows
are delivered in time and the shape of the yield curve. Finance theory
tells us that any security should be thought of as a package or
portfolio of zero-coupon bonds.

The proper way to view the 5-year 6% coupon Treasury note is as a
package of zero-coupon instruments whose maturity value is the amount
of the cash flow and whose maturity date coincides with the date the
cash flow is to be received. Thus, the 5-year 6% coupon Treasury issue
should be viewed as a package of 10 zero-coupon instruments that
mature every six months for the next five years. This approach to
valuation does not allow a market participant to realize an arbitrage
profit by breaking apart or "stripping" a bond and selling the
individual cash flows (that is, stripped securities) at a higher
aggregate value than it would cost to purchase the security in the
market. Simply put, arbitrage profits are possible when the sum of the
parts is worth more than the whole or vice versa. Because this
approach to valuation precludes arbitrage profits, we refer to it as
the arbitrage-free valuation approach.

By viewing any security as a package of zero-coupon bonds, a
consistent valuation framework can be developed. Viewing a security as
a package of zero-coupon bonds means that two bonds with the same
maturity and different coupon rates are viewed as different packages
of zero-coupon bonds and valued accordingly. Moreover, two cash flows
that have identical risk delivered at the same time will be valued
using the same discount rate even though they are attached to two
different bonds.
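The package-of-zeros view can be sketched in Python by discounting each cash flow of the 5-year, 6% coupon note at its own rate. The function name `value_with_spot_rates` is an assumption, and the upward-sloping spot rates below are purely hypothetical numbers invented for the illustration; they are not market data or figures from the entry. Rates are quoted on the same annual (twice the semiannual rate) basis used earlier.

```python
def value_with_spot_rates(annual_coupon_rate, spot_rates, par=100.0):
    """Value a semiannual-pay bond as a package of zero-coupon cash flows.

    spot_rates[t-1] is the annualized spot rate for the cash flow received
    t six-month periods from now.
    """
    coupon = annual_coupon_rate * par / 2.0
    n = len(spot_rates)
    total = 0.0
    for t, z in enumerate(spot_rates, start=1):
        cash_flow = coupon + (par if t == n else 0.0)
        total += cash_flow / (1.0 + z / 2.0) ** t    # rate specific to period t
    return total


# Hypothetical spot rates for periods 1 through 10 (illustration only).
hypothetical_spot_rates = [0.040, 0.042, 0.044, 0.046, 0.048,
                           0.050, 0.052, 0.054, 0.056, 0.058]

arbitrage_free_value = value_with_spot_rates(0.06, hypothetical_spot_rates)
flat_value = value_with_spot_rates(0.06, [0.06] * 10)
print(f"spot-rate value = {arbitrage_free_value:.4f}, flat 6% value = {flat_value:.4f}")
```

When every spot rate is set to the same 6%, the calculation collapses to the traditional single-rate approach and returns par, which shows that the traditional approach is the special case of a flat curve.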

To implement the arbitrage-free approach it is necessary to determine
the theoretical rate that the U.S. Treasury would have to pay on a
zero-coupon Treasury security for each maturity. We say "theoretical"
because other than U.S. Treasury bills, the Treasury does not issue
zero-coupon bonds. Zero-coupon Treasuries are, however, created by
dealer firms. The name given to the zero-coupon Treasury rate is the
(Treasury) spot rate. Our next task is to explain how the Treasury spot
rate can be calculated.

Theoretical Spot Rates 

The theoretical spot rates for Treasury securities represent the
appropriate set of interest or discount rates that should be used to
value default-free cash flows. A default-free theoretical spot rate
can be constructed from the observed Treasury yield curve or par
curve. We will begin our quest of how to estimate spot rates with the
par curve.

Par Rates 

The raw material for all yield curve analysis is 
the set of yields on the most recently issued 
(that is, on-the-run) Treasury securities. The 
U.S. Treasury routinely issues 10 securities—the 
1-month, 3-month, 6-month, and 1-year bills 
and the 2-, 3-, 5-, 7-, and 10-year notes, and 
a 30-year bond. These on-the-run Treasury issues
are default risk-free and trade in one of
the most liquid and efficient secondary markets
in the world. Because of these characteristics,
Treasury yields have historically served as a reference
benchmark for the risk-free rates used in
pricing other securities. Other benchmarks
such as the swap curve are now also used, but
the principles of valuation remain unchanged.

In practice, however, the observed yields for 
the on-the-run Treasury coupon issues are not 
usually used directly. Instead, the coupon rate 
is adjusted so that the price of the issue would 
be the par value. Accordingly, the par yield 
curve is the adjusted on-the-run Treasury yield 
curve where coupon issues are at par value and 
the coupon rate is therefore equal to the yield to 
maturity. The exception is for the 6-month Trea¬ 
sury bills; the bond-equivalent yield for this is¬ 
sue is already the spot rate. 

Deriving a par curve from a set of points starting
with the yield on the 6-month bill and ending
with the yield on the 30-year bond is not a trivial
matter. The end result is a curve that tells us "if
the Treasury were to issue a security today with 
a maturity equal to say 12 years, what coupon 
rate would the security have to pay in order 
to sell at par?" Some analysts contend that es¬ 
timating the par curve with only the yields of 
the on-the-run Treasuries uses too little infor¬ 
mation that is available from the market. In 
particular, one must estimate the back-end of 
the yield curve with only one security, that is, 
the 30-year bond. Some analysts prefer to use 
the on-the-run Treasuries and selected off-the- 
run Treasuries. 

In summary, a par rate is the average discount
rate of many cash flows (those of a par
bond) over many periods. This raises the question,
"the average of what?" As we will see, par
rates are complicated averages of the implied 
spot rates. Thus, in order to uncover the spot 
rates, we must find a method to "break apart" 
the par rates. There are several approaches that 
are used in practice. 3 The approach that we de¬ 
scribe below for creating a theoretical spot rate 
curve is called bootstrapping. 

Bootstrapping the Spot Rate Curve 

Bootstrapping begins with the par curve. To illustrate
bootstrapping, we will use the Treasury
par curve shown in Table 1. The par yield curve
shown extends only out to 10 years. Our objective
is to show how the values in the last
column of the table (labeled "Spot Rate") are
obtained. Throughout the analysis and illustrations
to come, it is important to remember the
basic principle is that the value of the Treasury
coupon security should be equal to the value of
the package of zero-coupon Treasury securities
that duplicates the coupon bond's cash flows.

Table 1 Hypothetical Treasury Par Yield Curve

Period   Years   Annual Yield to Maturity   Price    Spot Rate
                 (BEY) (%)*                          (BEY) (%)*
   1      0.5           3.00                   -       3.0000
   2      1.0           3.30                   -       3.3000
   3      1.5           3.50                100.00     3.5053
   4      2.0           3.90                100.00     3.9164
   5      2.5           4.40                100.00     4.4376
   6      3.0           4.70                100.00     4.7520
   7      3.5           4.90                100.00     4.9622
   8      4.0           5.00                100.00     5.0650
   9      4.5           5.10                100.00     5.1701
  10      5.0           5.20                100.00     5.2772
  11      5.5           5.30                100.00     5.3864
  12      6.0           5.40                100.00     5.4976
  13      6.5           5.50                100.00     5.6108
  14      7.0           5.55                100.00     5.6643
  15      7.5           5.60                100.00     5.7193
  16      8.0           5.65                100.00     5.7755
  17      8.5           5.70                100.00     5.8331
  18      9.0           5.80                100.00     5.9584
  19      9.5           5.90                100.00     6.0863
  20     10.0           6.00                100.00     6.2169

*The yield to maturity and the spot rate are annual rates.
They are reported as bond-equivalent yields. To obtain
the semiannual yield or rate, one half the annual yield
or annual rate is used.

The key to this process is the existence of the 
Treasury strips market. A government securi¬ 
ties dealer has the ability to take apart the cash 
flows of a Treasury coupon security (that is, 
strip the security) and create zero-coupon secu¬ 
rities. These zero-coupon securities, which are 
called Treasury strips, can be sold to investors. 
At what interest rate or yield can these Treasury
strips be sold to investors? The answer is
they can be sold at the Treasury spot rates. If the
market price of a Treasury security is less than 
its value after discounting with spot rates (that 
is, the sum of the parts is worth more than the 
whole), then a dealer can buy the Treasury se¬ 
curity, strip it, and sell off the Treasury strips so 
as to generate greater proceeds than the cost of 
purchasing the Treasury security. The resulting 
profit is an arbitrage profit. 

Before we proceed to our illustration of boot¬ 
strapping, a very sensible question must be 
addressed. Specifically, if Treasury strips are 
in effect zero-coupon Treasury securities, why 
not use strip rates (that is, the rates on Trea¬ 
sury strips) as our spot rates? In other words, 
why must we estimate theoretical spot rates via 
bootstrapping using yields from Treasury bills, 
notes, and bonds when we already have strip 
rates conveniently available? There are three 
major reasons. First, although Treasury strips 
are actively traded, they are not as liquid as on- 
the-run Treasury bills, notes, and bonds. As a re¬ 
sult, Treasury strips have some liquidity risk for 
which investors will demand some compensa¬ 
tion in the form of higher yields. Second, the tax 
treatment of strips is different from that of Trea¬ 
sury coupon securities. Specifically, the accrued 
interest on strips is taxed even though no cash is 
received by the investor. Thus, they are negative 
cash flow securities to taxable entities, and, as a 
result, their yield reflects this tax disadvantage. 
Finally, there are maturity sectors where non- 
U.S. investors find it advantageous to trade off 
yield for tax advantages associated with a strip. 
Specifically, certain non-U.S. tax authorities al¬ 
low their citizens to treat the difference between 
the maturity value and the purchase price as a 
capital gain and tax this gain at a favorable tax 
rate. Some will grant this favorable treatment 
only when the strip is created from the prin¬ 
cipal rather than the coupon. For this reason, 
those who use Treasury strips to represent the¬ 
oretical spot rates restrict the issues included to 
coupon strips. 

Now let's see how to generate the spot rates.
Consider the 6-month Treasury security in
Table 1. This security is a Treasury bill and is
issued as a zero-coupon instrument. Therefore, 
the annualized bond-equivalent yield (not the 
bank discount yield) of 3.00% for the 6-month 
Treasury security is equal to the 6-month spot 
rate. Using the yield on the 1-year bill, we use 
3.3% as the 1-year spot rate. Given these two 
spot rates, we can compute the spot rate for a 
theoretical 1.5-year zero-coupon Treasury. The 
value of a theoretical 1.5-year Treasury should 
equal the present value of the three cash flows 
from the 1.5-year coupon Treasury, where the 
yield used for discounting is the spot rate cor¬ 
responding to the time of receipt of the cash 
flow. Since all the coupon bonds are selling at 
par, as explained in the previous section, the 
yield to maturity for each bond is the coupon 
rate. Using $100 as par, the cash flows for the 
1.5-year coupon Treasury are: 

0.5 year 0.035 x $100 x 0.5 = $1.75 

1.0 year 0.035 x $100 x 0.5 = $1.75 

1.5 years 0.035 x $100 x 0.5 + 100 = $101.75 

The present value of the cash flows is then: 

\[
\frac{1.75}{(1+z_1)^1} + \frac{1.75}{(1+z_2)^2} + \frac{101.75}{(1+z_3)^3}
\]

where 

z_1 = one-half the annualized 6-month theoretical spot rate
z_2 = one-half the 1-year theoretical spot rate
z_3 = one-half the 1.5-year theoretical spot rate

Since the 6-month spot rate is 3% and the 
1-year spot rate is 3.30%, we know that: 

z_1 = 0.0150 and z_2 = 0.0165

We can compute the present value of the 1.5- 
year coupon Treasury security as: 

\[
\frac{1.75}{(1+z_1)^1} + \frac{1.75}{(1+z_2)^2} + \frac{101.75}{(1+z_3)^3}
= \frac{1.75}{(1.015)^1} + \frac{1.75}{(1.0165)^2} + \frac{101.75}{(1+z_3)^3}
\]

Since the price of the 1.5-year coupon Treasury
security is equal to its par value (see
Table 1), the following relationship must hold:

\[
\frac{1.75}{(1.015)^1} + \frac{1.75}{(1.0165)^2} + \frac{101.75}{(1+z_3)^3} = 100
\]


If we had not been working with a par yield 
curve, the equation would have been set to the 
market price for the 1.5-year issue rather than 
par value. 

Note we are treating the 1.5 year par bond 
as if it were a portfolio of three zero-coupon 
bonds. Moreover, each cash flow has its own 
discount rate that depends on when the cash 
flow is delivered in the future and the shape of 
the yield curve. This is in sharp contrast to the 
traditional valuation approach that forces each 
cash flow to have the same discount rate. 

We can solve for the theoretical 1.5-year spot 
rate as follows: 


\[
\begin{aligned}
1.7241 + 1.6936 + \frac{101.75}{(1+z_3)^3} &= 100 \\
\frac{101.75}{(1+z_3)^3} &= 96.5822 \\
(1+z_3)^3 &= \frac{101.75}{96.5822} \\
(1+z_3)^3 &= 1.05351 \\
z_3 &= 0.017527 = 1.7527\%
\end{aligned}
\]


Doubling this yield we obtain the bond- 
equivalent yield of 3.5053%, which is the theo¬ 
retical 1.5-year spot rate. This is the rate that the 
market would apply to a 1.5-year zero-coupon 
Treasury security if, in fact, such a security ex¬ 
isted. In other words, all Treasury cash flows 
to be received 1.5 years from now should be 
valued (that is, discounted) at 3.5053%. 
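
The single bootstrapping step above can be checked numerically. The following is a minimal sketch, not the authors' own procedure: the function name `solve_final_spot` and its argument layout are illustrative assumptions. It discounts the earlier cash flows at the known semiannual spot rates and solves for the one unknown rate on the final payment.

```python
def solve_final_spot(price, cash_flows, known_half_spots):
    """Solve for the semiannual spot rate that applies to the last cash flow.

    cash_flows       -- one payment per six-month period; the last includes par
    known_half_spots -- semiannual spot rates (decimals) for all earlier periods
    (Both names are hypothetical, chosen for this illustration.)
    """
    # Present value of every cash flow whose spot rate is already known.
    pv_known = sum(cf / (1 + z) ** t
                   for t, (cf, z) in enumerate(zip(cash_flows, known_half_spots),
                                               start=1))
    n = len(cash_flows)
    residual = price - pv_known  # value that the final payment must supply
    return (cash_flows[-1] / residual) ** (1 / n) - 1


# 1.5-year par bond from Table 1: 3.50% coupon, price 100,
# with z_1 = 0.0150 and z_2 = 0.0165 taken from the text.
z3 = solve_final_spot(100.0, [1.75, 1.75, 101.75], [0.0150, 0.0165])
print(f"z_3 = {z3:.6f}, bond-equivalent yield = {2 * z3:.4%}")
```

Setting `price` to a market price other than 100 covers the case, mentioned earlier, in which the curve is not a par curve.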

Given the theoretical 1.5-year spot rate, we 
can obtain the theoretical 2-year spot rate. The 
cash flows for the 2-year coupon Treasury in 
Table 1 are: 


0.5 year 0.039 x $100 x 0.5 = $1.95 

1.0 year 0.039 x $100 x 0.5 = $1.95 

1.5 years 0.039 x $100 x 0.5 = $1.95 

2.0 years 0.039 x $100 x 0.5 + 100 = $101.95

The present value of the cash flows is then:

\[
\frac{1.95}{(1+z_1)^1} + \frac{1.95}{(1+z_2)^2} + \frac{1.95}{(1+z_3)^3} + \frac{101.95}{(1+z_4)^4}
\]

where z_4 = one-half of the 2-year theoretical
spot rate.

Since the 6-month spot rate, 1-year spot rate, 
and 1.5-year spot rate are 3.00%, 3.30%, and 
3.5053%, respectively, then: 

z_1 = 0.0150, z_2 = 0.0165, z_3 = 0.017527


Therefore, the present value of the 2-year 
coupon Treasury security is: 

\[
\frac{1.95}{(1.0150)^1} + \frac{1.95}{(1.0165)^2} + \frac{1.95}{(1.017527)^3} + \frac{101.95}{(1+z_4)^4}
\]

Since the price of the 2-year coupon Treasury se¬ 
curity is equal to par, the following relationship 
must hold: 

\[
\frac{1.95}{(1.0150)^1} + \frac{1.95}{(1.0165)^2} + \frac{1.95}{(1.017527)^3} + \frac{101.95}{(1+z_4)^4} = 100
\]

We can solve for the theoretical 2-year spot rate 
as follows: 

\[
\begin{aligned}
\frac{101.95}{(1+z_4)^4} &= 94.3407 \\
(1+z_4)^4 &= \frac{101.95}{94.3407} \\
z_4 &= 0.019582 = 1.9582\%
\end{aligned}
\]


Doubling this yield, we obtain the theoretical 2- 
year spot rate bond-equivalent yield of 3.9164%. 

One can follow this approach sequentially to
derive the theoretical 2.5-year spot rate from
the calculated values of z_1, z_2, z_3, and z_4 (the
6-month, 1-year, 1.5-year, and 2-year rates), and
the price and coupon of the 2.5-year bond in
Table 1. Further, one could derive theoretical
spot rates for the remaining 15 half-yearly maturities.
The spot rates thus obtained are shown in the
last column of Table 1. They represent the term
structure of default-free spot rates for maturities
up to 10 years at the particular time to which
the bond price quotations refer.
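
The sequential procedure lends itself to a short loop. The sketch below is one way to write it under stated assumptions: every coupon bond is priced at par (100), the first two yields are zero-coupon bill yields, and the function name `bootstrap_spot_curve` is hypothetical. The par yields are those of Table 1.

```python
def bootstrap_spot_curve(par_yields_pct):
    """Bootstrap par yields (BEY, in %) into spot rates (BEY, in %).

    par_yields_pct[i] is the par yield for maturity (i + 1) * 0.5 years.
    The first two entries are treated as zero-coupon bill yields, so they
    are taken directly as spot rates, as in the text.
    """
    half_spots = [par_yields_pct[0] / 200, par_yields_pct[1] / 200]
    for n in range(3, len(par_yields_pct) + 1):
        coupon = par_yields_pct[n - 1] / 2  # semiannual coupon per 100 of par
        # Discount the first n - 1 coupons at the spot rates found so far.
        pv_coupons = sum(coupon / (1 + z) ** t
                         for t, z in enumerate(half_spots, start=1))
        # The final payment (par plus coupon) must make up the rest of the price.
        z_n = ((100 + coupon) / (100 - pv_coupons)) ** (1 / n) - 1
        half_spots.append(z_n)
    return [200 * z for z in half_spots]  # back to bond-equivalent yields in %


TABLE1_PAR_YIELDS = [3.00, 3.30, 3.50, 3.90, 4.40, 4.70, 4.90, 5.00, 5.10, 5.20,
                     5.30, 5.40, 5.50, 5.55, 5.60, 5.65, 5.70, 5.80, 5.90, 6.00]

spots = bootstrap_spot_curve(TABLE1_PAR_YIELDS)
for period, spot in enumerate(spots, start=1):
    print(f"{period:2d}  {period / 2:4.1f}  {spot:.4f}")
```

Each pass reuses all earlier spot rates, which is why an error in a short-maturity rate propagates to every longer maturity.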


Let us summarize to this point. We started 
with the par curve which is constructed using 
the adjusted yields from the on-the-run Trea¬ 
suries. A par rate is the average discount rate 
of many cash flows over many periods. Specifi¬ 
cally, par rates are complicated averages of spot 
rates. The spot rates are uncovered from par 
rates via bootstrapping. A spot rate is the av¬ 
erage discount rate of a single cash flow over 
many periods. It appears that spot rates are also 
averages. Spot rates are averages of one or more 
forward rates. 
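
The sense in which a spot rate averages forward rates can be written compactly. As a sketch, let f_t denote the six-month forward rate for period t (a symbol introduced here for illustration, with f_1 = z_1 and all rates expressed per six-month period). Then

\[
(1+z_n)^n = \prod_{t=1}^{n} (1+f_t)
\qquad\Longrightarrow\qquad
z_n = \left[ \prod_{t=1}^{n} (1+f_t) \right]^{1/n} - 1
\]

so that 1 + z_n is the geometric average of the n one-period growth factors 1 + f_t.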

Valuation Using Treasury Spot Rates 

To illustrate how Treasury spot rates are used to 
compute the arbitrage-free value of a Treasury 
security, we will use the hypothetical Treasury 
spot rates shown in the fourth column of Ta¬ 
ble 2 to value an 8%, 10-year Treasury security. 
The present value of each period's cash flow 
is shown in the fifth column. The sum of the 
present values is the arbitrage-free value for the 
Treasury security. For the 8%, 10-year Treasury 
it is $107.0018. 
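
The valuation just described can be reproduced with a few lines of code. This is a minimal sketch: the spot rates are the hypothetical rates of Table 2, while the function name `arbitrage_free_value` and the per-100-of-par convention are assumptions made for illustration.

```python
# Hypothetical Treasury spot rates from Table 2 (BEY, in %), periods 1 to 20.
TABLE2_SPOT_PCT = [6.05, 6.15, 6.21, 6.26, 6.29, 6.37, 6.38, 6.40, 6.41, 6.48,
                   6.49, 6.53, 6.63, 6.78, 6.79, 6.81, 6.84, 6.93, 7.05, 7.20]


def arbitrage_free_value(coupon_pct, spot_bey_pct, par=100.0):
    """Discount each semiannual cash flow at its own spot rate."""
    semi_coupon = par * coupon_pct / 200
    n = len(spot_bey_pct)
    total = 0.0
    for t, spot in enumerate(spot_bey_pct, start=1):
        cash_flow = semi_coupon + (par if t == n else 0.0)  # par repaid at maturity
        total += cash_flow / (1 + spot / 200) ** t          # half the BEY per period
    return total


value = arbitrage_free_value(8.0, TABLE2_SPOT_PCT)
print(f"Arbitrage-free value of the 8% 10-year Treasury: {value:.4f}")
```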

Reason for Using Treasury 
Spot Rates 

Thus far, we have simply asserted that the 
value of a Treasury security should be based 
on discounting each cash flow using the 
corresponding Treasury spot rate. But what if 
market participants value a security using just 
the yield for the on-the-run Treasury with a 
maturity equal to the maturity of the Treasury 
security being valued? Let's see why the value 
of a Treasury security should trade close to its 
arbitrage-free value. 

Stripping and Arbitrage-Free 
Valuation 

The key to the arbitrage-free valuation ap¬ 
proach is the existence of the Treasury strips 
market. A dealer has the ability to take apart the 
cash flows of a Treasury coupon security (that 
is, strip the security) and create zero-coupon
securities. These zero-coupon securities, called
Treasury strips, can be sold to investors. At 
what interest rate or yield can these Treasury 
strips be sold to investors? They can be sold 
at the Treasury spot rates. If the market price 
of a Treasury security is less than its value us¬ 
ing the arbitrage-free valuation approach, then 
a dealer can buy the Treasury security, strip it, 
and sell off the individual Treasury strips so as 
to generate greater proceeds than the cost of 
purchasing the Treasury security. The resulting 
profit is an arbitrage profit. Since as we will 
see, the value determined by using the Trea¬ 
sury spot rates does not allow for the genera¬ 
tion of an arbitrage profit, this is referred to as 
an "arbitrage-free" approach. 

To illustrate this, suppose that the yield for 
the on-the-run 10-year Treasury issue is 7.08%. 
Suppose that the 8% coupon 10-year Treasury 
issue is valued using the traditional approach 
based on 7.08%. The value based on discounting 
all the cash flows at 7.08% is $106.5141 as shown 
in the next-to-the-last column in Table 2. 


Consider what would happen if the market 
priced the security at $106.5141 and that the 
spot rates are those shown in the fourth column 
of Table 2. The value based on the Treasury spot 
rates is $107.0018 as shown in the fifth column 
of Table 2. What can the dealer do? The dealer 
can buy the 8% 10-year issue for $106.5141, strip 
it, and sell the Treasury strips at the spot rates 
shown in Table 2. By doing so, the proceeds that 
will be received by the dealer are $107.0018. This 
results in an arbitrage profit (ignoring transac¬ 
tion costs) of $0.4877 (= $107.0018 - $106.5141). 
Dealers recognizing this arbitrage opportunity 
will bid up the price of the 8% 10-year Trea¬ 
sury issue in order to acquire it and strip it. The 
arbitrage profit will be eliminated when the se¬ 
curity is priced at $107.0018, the value that we 
said is the arbitrage-free value. 
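
The stripping arbitrage can be traced cash flow by cash flow. The sketch below assumes, as the text does, that transaction costs are ignored; the function name `strip_arbitrage` is hypothetical, and the spot rates are the hypothetical rates of Table 2.

```python
# Hypothetical Treasury spot rates from Table 2 (BEY, in %), periods 1 to 20.
TABLE2_SPOT_PCT = [6.05, 6.15, 6.21, 6.26, 6.29, 6.37, 6.38, 6.40, 6.41, 6.48,
                   6.49, 6.53, 6.63, 6.78, 6.79, 6.81, 6.84, 6.93, 7.05, 7.20]


def strip_arbitrage(coupon_pct, spot_bey_pct, flat_yield_pct, par=100.0):
    """Return (sell_total, buy_total, profit) per 100 of par.

    sell_total -- proceeds from selling each stripped cash flow at its spot rate
    buy_total  -- cost of the bond when every cash flow is discounted at one yield
    """
    semi_coupon = par * coupon_pct / 200
    n = len(spot_bey_pct)
    sell_total = 0.0
    buy_total = 0.0
    for t, spot in enumerate(spot_bey_pct, start=1):
        cash_flow = semi_coupon + (par if t == n else 0.0)
        sell_total += cash_flow / (1 + spot / 200) ** t
        buy_total += cash_flow / (1 + flat_yield_pct / 200) ** t
    return sell_total, buy_total, sell_total - buy_total


sell, buy, profit = strip_arbitrage(8.0, TABLE2_SPOT_PCT, 7.08)
print(f"sell strips for {sell:.4f}, buy bond for {buy:.4f}, profit {profit:.4f}")
```

A negative profit would indicate the opposite trade, in which the strips are bought and the coupon security is reassembled.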

To understand in more detail where this arbitrage
profit is coming from, look at the last three
columns in Table 2. The sixth column shows
how much each cash flow can be sold for by
the dealer if it is stripped. The values in this
column are just those in the fifth column. The
next-to-last column shows how much the dealer
is effectively purchasing the cash flow for if each
cash flow is discounted at 7.08%. The sum of the
arbitrage profit from each stripped cash flow is
the total arbitrage profit and is contained in the
last column.

Table 2 Determination of the Arbitrage-Free Value of an 8% 10-Year Treasury and Arbitrage Opportunity

                            Arbitrage-Free Value        Arbitrage Opportunity
Period  Years  Cash Flow   Spot Rate    Present      Sell for    Buy for    Arbitrage
                  ($)         (%)       Value ($)                             Profit
   1     0.5        4         6.05        3.8826       3.8826      3.8632     0.0193
   2     1.0        4         6.15        3.7649       3.7649      3.7312     0.0337
   3     1.5        4         6.21        3.6494       3.6494      3.6036     0.0458
   4     2.0        4         6.26        3.5361       3.5361      3.4804     0.0557
   5     2.5        4         6.29        3.4263       3.4263      3.3614     0.0649
   6     3.0        4         6.37        3.3141       3.3141      3.2465     0.0676
   7     3.5        4         6.38        3.2107       3.2107      3.1355     0.0752
   8     4.0        4         6.40        3.1090       3.1090      3.0283     0.0807
   9     4.5        4         6.41        3.0113       3.0113      2.9247     0.0866
  10     5.0        4         6.48        2.9079       2.9079      2.8247     0.0832
  11     5.5        4         6.49        2.8151       2.8151      2.7282     0.0867
  12     6.0        4         6.53        2.7203       2.7203      2.6349     0.0854
  13     6.5        4         6.63        2.6178       2.6178      2.5448     0.0730
  14     7.0        4         6.78        2.5082       2.5082      2.4578     0.0504
  15     7.5        4         6.79        2.4242       2.4242      2.3738     0.0504
  16     8.0        4         6.81        2.3410       2.3410      2.2926     0.0484
  17     8.5        4         6.84        2.2583       2.2583      2.2142     0.0441
  18     9.0        4         6.93        2.1666       2.1666      2.1385     0.0281
  19     9.5        4         7.05        2.0711       2.0711      2.0654     0.0057
  20    10.0      104         7.20       51.2670      51.2670     51.8645    -0.5975
Total                                   107.0018     107.0018    106.5141     0.4877

We have just demonstrated how coupon strip¬ 
ping of a Treasury issue will force the market 
value to be close to the value as determined by 
the arbitrage-free valuation approach when the 
market price is less than the arbitrage-free value 
(that is, the whole is worth less than the sum of 
the parts). What happens when a Treasury is¬ 
sue's market price is greater than the arbitrage- 
free value? Obviously, a dealer will not want to 
strip the Treasury issue since the proceeds gen¬ 
erated from stripping will be less than the cost 
of purchasing the issue. 

When such situations occur, the dealer can 
purchase a package of Treasury strips so as to 
create a synthetic Treasury coupon security that 
is worth more than the same maturity and same 
coupon Treasury issue. This process is called 
reconstitution. 

The process of stripping and reconstituting 
ensures that the price of a Treasury issue will 
not depart materially (depending on transac¬ 
tion costs) from its arbitrage-free value. 


Credit Spreads and the Valuation of 
Non-Treasury Securities 

The Treasury spot rates can be used to value any 
default-free security. For a non-Treasury secu¬ 
rity, the theoretical value is not as easy to de¬ 
termine. The value of a non-Treasury security 
is found by discounting the cash flows by the 
Treasury spot rates plus a yield spread which 
reflects the additional risks (e.g., default risk, 
liquidity risks, the risk associated with any em¬ 
bedded options, and so on). 

The spot rate used to discount the cash flow of
a non-Treasury security can be the Treasury spot
rate plus a constant credit spread. For example,
suppose the 6-month Treasury spot rate is 6.05%
and the 10-year Treasury spot rate is 7.20%. Also
suppose that a suitable credit spread is 100 basis
points. Then a 7.05% spot rate is used to discount
a 6-month cash flow of a non-Treasury
bond and an 8.20% discount rate is used to
discount a 10-year cash flow. (Remember that
when each semiannual cash flow is discounted,
the discount rate used is one-half the spot rate:
3.525% for the 6-month spot rate and 4.10% for
the 10-year spot rate.)

The drawback of this approach is that there is 
no reason to expect the credit spread to be the 
same regardless of when the cash flow is ex¬ 
pected to be received. Consequently, the credit 
spread may vary with a bond's term to matu¬ 
rity. In other words, there is a term structure 
of credit spreads. Generally, credit spreads in¬ 
crease with maturity. This is a typical shape for 
the term structure of credit spreads. Moreover, 
the shape of the term structure is not the same 
for all credit ratings. Typically, the lower the 
credit rating, the steeper the term structure of 
credit spreads. 

Dealer firms typically estimate the term structure
of credit spreads for each credit rating and
market sector.

When the relevant credit spreads for a given 
credit rating and market sector are added to the 
Treasury spot rates, the resulting term struc¬ 
ture is used to value the bonds of issuers with 
that credit rating in that market sector. This 
term structure is referred to as the benchmark 
spot rate curve or benchmark zero-coupon rate 
curve. 

For example, Table 3 reproduces the Treasury
spot rate curve in Table 2. Also shown is a hypothetical
term structure of credit spreads for
a non-Treasury security. The resulting benchmark
spot rate curve is in the next-to-the-last
column. As before, it is this spot rate curve
that is used to value the securities of issuers that
have the same credit rating and are in the same
market sector. This is done in Table 3 for a hypothetical
8% 10-year issue. The arbitrage-free
value is $101.763. Notice that the theoretical
value is less than that for an otherwise comparable
Treasury security. The arbitrage-free value
for an 8% 10-year Treasury is $107.0018 (see
Table 2).

Table 3 Calculation of Arbitrage-Free Value of a Hypothetical 8% 10-Year Non-Treasury Security Using
Benchmark Spot Rate Curve

Period  Years  Cash Flow   Treasury Spot   Credit       Benchmark   Present
                  ($)        Rate (%)      Spread (%)   Spot (%)    Value ($)
   1     0.5        4          6.05          0.30         6.35        3.8769
   2     1.0        4          6.15          0.33         6.48        3.7529
   3     1.5        4          6.21          0.34         6.55        3.6314
   4     2.0        4          6.26          0.37         6.63        3.5108
   5     2.5        4          6.29          0.42         6.71        3.3916
   6     3.0        4          6.37          0.43         6.80        3.2729
   7     3.5        4          6.38          0.44         6.82        3.1632
   8     4.0        4          6.40          0.45         6.85        3.0553
   9     4.5        4          6.41          0.46         6.87        2.9516
  10     5.0        4          6.48          0.52         7.00        2.8357
  11     5.5        4          6.49          0.53         7.02        2.7369
  12     6.0        4          6.53          0.55         7.08        2.6349
  13     6.5        4          6.63          0.58         7.21        2.5241
  14     7.0        4          6.78          0.59         7.37        2.4101
  15     7.5        4          6.79          0.63         7.42        2.3161
  16     8.0        4          6.81          0.64         7.45        2.2281
  17     8.5        4          6.84          0.69         7.53        2.1340
  18     9.0        4          6.93          0.73         7.66        2.0335
  19     9.5        4          7.05          0.77         7.82        1.9301
  20    10.0      104          7.20          0.82         8.02       47.3731
Total                                                               101.763
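
The benchmark-curve valuation can be sketched in code as follows. The Treasury spot rates and credit spreads are the hypothetical values of Tables 2 and 3; the function name `value_on_benchmark_curve` is an assumption made for illustration.

```python
# Hypothetical inputs from Tables 2 and 3 (all in %, periods 1 to 20).
TREASURY_SPOT_PCT = [6.05, 6.15, 6.21, 6.26, 6.29, 6.37, 6.38, 6.40, 6.41, 6.48,
                     6.49, 6.53, 6.63, 6.78, 6.79, 6.81, 6.84, 6.93, 7.05, 7.20]
CREDIT_SPREAD_PCT = [0.30, 0.33, 0.34, 0.37, 0.42, 0.43, 0.44, 0.45, 0.46, 0.52,
                     0.53, 0.55, 0.58, 0.59, 0.63, 0.64, 0.69, 0.73, 0.77, 0.82]


def value_on_benchmark_curve(coupon_pct, treasury_spot_pct, spread_pct, par=100.0):
    """Value a bond by discounting at Treasury spot rate plus credit spread."""
    # Benchmark spot curve: one spread per maturity, added to the Treasury spot.
    benchmark = [spot + spread
                 for spot, spread in zip(treasury_spot_pct, spread_pct)]
    semi_coupon = par * coupon_pct / 200
    n = len(benchmark)
    total = 0.0
    for t, rate in enumerate(benchmark, start=1):
        cash_flow = semi_coupon + (par if t == n else 0.0)
        total += cash_flow / (1 + rate / 200) ** t
    return total


value = value_on_benchmark_curve(8.0, TREASURY_SPOT_PCT, CREDIT_SPREAD_PCT)
print(f"Arbitrage-free value of the 8% 10-year non-Treasury issue: {value:.3f}")
```

Passing a list of identical spreads reduces this to the constant-spread approach described earlier.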

KEY POINTS 

• A bond can be thought of as a portfolio or 
package of cash flows. Accordingly, the value 
of a bond is simply the present value of its 
remaining expected future cash flows. 

• There is an inverse relationship between bond
prices and required yields.

• The traditional approach to valuation is to
discount each cash flow with the same discount
rate. The weakness of the traditional
approach is its reliance on using the same discount
rate for all of the bond's cash flows.

• The arbitrage-free approach allows each cash 
flow to be valued as a zero-coupon bond with 
a discount rate that depends on the shape of 
the yield curve and when the cash flow is 
delivered in time. 

• The bootstrapping technique is used to derive 
the discount rates for discounting a bond's 
cash flows. These discount rates are called 
spot rates. 

• Default-free bonds should trade at prices 
close to their arbitrage-free values. The pro¬ 
cess of stripping and reconstituting of Trea¬ 
sury securities ensures that this will occur. 

NOTES 

1. For a description of these securities, see
Fabozzi (2012).

2. There is another method called the "Treasury
method," which treats the coupon interest
over the fractional period as simple
interest.

3. There is an extensive literature on estimating
spot rates or what is known
as term structure modeling. See Fabozzi
(2002).


Abstract: Valuation of fixed-income products employs one of two basic methods: discounted cash
flows and relative value. Methods using discounted cash flows require several assumptions to 
be used as inputs but produce a precise valuation result. The tools of relative value analysis are 
less ambitious. They help us discern differences in value between two similar bonds on a relative 
basis. Relative value analysis investors make statements such as "Bond X is cheaper than Bond Y." 
Relative value tools range in complexity from yield spreads to asset swap spreads and the credit 
default swap basis. 


There are two basic approaches to the valua¬ 
tion of fixed-income products. The discounted 
cash flow method seeks to value a bond given 
assumptions about cash flows, reference yield 
curves, risk premiums, and so on. Given these 
inputs, the bond's value is determined. Once
computed, this value is compared to the prevailing
market price and a rich/cheap determination
can be made. The alternative method,
relative valuation, is less ambitious and, not
surprisingly, more popular.

Tools of relative value analysis, when prop¬ 
erly interpreted, give the user some clues about 
how similar bonds are currently valued in the 
market on a relative basis. This battery of tools 
allows us to make conjectures such as "Bond X is 
cheaper than Bond Y." Yield measures are basic 
relative value tools. For example, one method
of measuring a risky bond's relative value is
to compute its yield spread relative to a desig¬ 
nated benchmark. Discerning relative value is 
then a matter of comparing the yield spreads 
of two or more bonds that are otherwise the 
same. The bond with the largest yield spread 
is viewed as the cheapest and is considered 
the best relative value. In this entry, we will 
introduce yield spread measures utilizing in¬ 
struments from both the cash and derivatives 
markets. 1 

One common way fixed-income portfolio
managers attempt to outperform benchmarks
is through security selection. When pursuing a
security selection strategy, managers attempt to
overweight cheap issues and underweight rich
issues to enhance the total rate of return relative
to their benchmark. For this to occur, one or
more of the bond's risks must be mispriced. Active
security selection to enhance performance
leads to the search for effective relative value
tools in bond markets.


YIELD SPREADS OVER SWAP 
AND TREASURY CURVES 

As noted, yield spreads are a frequently used 
tool of relative value analysis. The computation 
is a simple one. A yield spread is the difference 
between a risky bond's yield and a benchmark 
yield holding maturity constant. It is critical to 
note the yield spread does not have any predic¬ 
tive power on the bondholder's realized return; 
the yield spread is merely a convenient way to 
express the price relative to the benchmark. 

There are two commonly used benchmarks: 
the interest rate swap curve and the U.S. Trea¬ 
sury yield curve. A swap is a contract used 
to transform cash flows from one form to an¬ 
other. In its most basic form, in an interest 
rate swap two counterparties agree to exchange 
cash flows at designated future dates for a specified
length of time. The fixed-rate payer makes
payments that are determined by a fixed rate
called a swap rate. Correspondingly, the floating-
rate payer makes payments based on a reference
rate, usually the London Interbank Offered Rate 
(LIBOR). LIBOR is the interest rate that prime 
banks in London are willing to pay other prime 
banks on certificates of deposit denominated in 
U.S. dollars. 

Market participants quote swap rates for
swaps across the maturity spectrum. The relationship
between the swap rate and the swap's
maturity is called the swap curve. Since the reference
rate for a swap's floating rate payments is
usually LIBOR, the swap curve is also referred
to as the LIBOR curve.

Over time the swap curve has supplanted
the Treasury yield curve as the benchmark of
choice for computing yield spreads. Indeed, in
some countries and currencies, the interest rate
swap market is more liquid than the market
for sovereign debt. It is important to keep in
mind that the swap curve does not represent
a set of default-free interest rates. A swap rate
is a rate that embodies two risks: (1) the default
risk of the counterparty, and (2) liquidity
risk.

As noted, in many countries, the swap curve is
the benchmark of choice over a country's government
securities yield curve. There are several
reasons that favor use of the swap curve.
First, in order to construct a government bond 
yield curve that is reflective of the term struc¬ 
ture of interest rates, yields on government 
securities must be available across the entire 
maturity spectrum. In most government bond 
markets, however, a limited number of securi¬ 
ties are available. For example, the U.S. Trea¬ 
sury issues only six securities with a maturity 
of two years or more (two, three, five, seven, 10, 
and 30 years). Conversely, in the swap market, 
swap rates are quoted on a wide swath of the 
maturity spectrum. 

Second, technical factors introduce some 
noise into Treasury yields and preclude them 
from being clear signals of benchmark risk-free 
interest rates. Treasury securities differ on di¬ 
mensions other than level of the coupon and 
maturity. Yields are affected when a note or 
bond is cheapest to deliver into the Treasury note 
or bond futures contracts. In addition, yields 
are also affected when the security is "on spe¬ 
cial" in the repo market. The tax treatment of 
bonds, especially those trading at a premium 
or a discount, can affect yields. Swap rates for 
the most part do not carry this excess baggage 
and are therefore more reflective of true, albeit 
risky, interest rates. 

Lastly, because of the differences in sovereign 
credit risk, comparing government yields 
across countries is tenuous at best. The swap 
curve, by contrast, reflects roughly the same 
level of credit risk across countries. Cross¬ 
country comparisons are more meaningful. 

A spread over the benchmark swap curve is
simply the difference between the yield measure
in question and the linearly interpolated
swap rate at the same maturity. The yield measure
should be a suitable one, such as yield to maturity,
yield to call, or cash flow yield for structured
products. Because the swap rate is interpolated,
the spread over the benchmark swap curve is
often referred to as the interpolated spread or the
I-spread. Interpolated spreads circumvent the
problem of maturity mismatch that affects
the level of the spread. This is especially true
if the yield curve is steeply sloped.
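
The I-spread calculation amounts to a linear interpolation followed by a subtraction. The sketch below uses made-up swap quotes and a made-up bond yield purely for illustration; none of these numbers comes from the text, and the function names `interpolate_rate` and `i_spread_bp` are hypothetical.

```python
from bisect import bisect_right


def interpolate_rate(maturities, rates, target):
    """Linearly interpolate a rate at `target` years from quoted curve points.

    maturities must be sorted in ascending order and match rates one-to-one.
    """
    if not maturities[0] <= target <= maturities[-1]:
        raise ValueError("target maturity lies outside the quoted curve")
    hi = bisect_right(maturities, target)
    if hi == len(maturities):  # target equals the longest quoted maturity
        return rates[-1]
    lo = hi - 1
    weight = (target - maturities[lo]) / (maturities[hi] - maturities[lo])
    return rates[lo] + weight * (rates[hi] - rates[lo])


def i_spread_bp(bond_yield_pct, bond_maturity, swap_maturities, swap_rates_pct):
    """Spread of a bond's yield over the interpolated swap rate, in basis points."""
    swap_rate = interpolate_rate(swap_maturities, swap_rates_pct, bond_maturity)
    return (bond_yield_pct - swap_rate) * 100


# Hypothetical swap curve (years, %) and a hypothetical 8.5-year bond yielding 6.00%.
swap_maturities = [2, 5, 7, 10]
swap_rates_pct = [3.20, 3.70, 4.00, 4.30]
spread = i_spread_bp(6.00, 8.5, swap_maturities, swap_rates_pct)
print(f"I-spread: {spread:.1f} bp")
```

Interpolating at the bond's exact maturity is what removes the maturity mismatch noted above; comparing against the nearest quoted swap rate instead would mix the credit spread with the slope of the curve.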

To illustrate the I-spread, consider a 5.25% coupon
bond issued by General Electric (GE) that matures
on December 6, 2017. For a settlement
date of January 27, 2009, the I-spread was 261.6
basis points. This spread can be interpreted as
the compensation the market demanded for the 
risk differential between the risky bond and the 
benchmark swap curve. 

The yield spreads can also be computed using
active or on-the-run Treasuries. On-the-run
Treasuries are the most recently issued Treasury
securities of a particular maturity. Since
the yield curve is not flat, the yield spreads 
differ depending on the maturity of the on- 
the-run Treasury. Thus, even if the yield curve 
remains fixed, the yield spread will change as 
the bond rolls down the curve. Using the in¬ 
terpolated 8.9-year Treasury yield, suppose the 
yield spread for the GE bond on January 27, 
2009 was 284 basis points. This yield spread 
can then be compared to similar bonds at the 
time in order to determine which bond reflects 
the best relative value. 


ASSET SWAPS 

An asset swap is a synthetic structure that trans¬ 
forms the nature of the bond's cash flow from 
one form into another. The structure is created 
through the combination of a bond position 
(fixed-rate or floating-rate) with one or more 
interest rate swaps. Asset swaps are used exten¬ 
sively by financial institutions for asset-liability 
management. Namely, asset swaps transform 
the cash flows of long-term fixed-rate assets to 
floating-rate cash flows, which are in a form 
more amenable to financial institutions' funding
opportunities.


Asset Swap Mechanics 

The mechanics of an asset swap are straight¬ 
forward. An investor, whom we shall refer to 
as the asset swap buyer, does the following: 
(1) takes a long position in a fixed-rate coupon 
bond with a bullet maturity, and (2) simulta¬ 
neously enters into an off-market interest rate 
swap with a tenor equal to the bond's remain¬ 
ing term to maturity. An off-market swap is 
one whose floating rates are determined with 
a nonzero spread added to the reference rate. 
Assume that the bond is trading at par. The 
asset swap buyer enters into an agreement to 
pay the semiannual coupon payments as the 
fixed-rate leg in exchange for floating-rate pay¬ 
ments at LIBOR plus (or minus) a spread (called 
the asset swap spread). For simplicity, assume 
the frequency of the fixed-rate and floating-rate 
payments are the same. The spread over LIBOR 
that makes the net present value of the coupon 
payments (i.e., the fixed-rate leg) and the pro¬ 
jected floating-rate payments equal to zero is 
the asset swap spread. 2 This asset swap spread is 
used as a measure of relative value regardless 
of whether the cash flows are actually swapped. 

Determining the Asset Swap Spread for a Par 
Bond 

To better understand how all the pieces fit to¬ 
gether, let's illustrate how an asset swap spread 
is calculated. Consider a corporate bond issued 
by General Electric that matures on December 6, 
2017, and pays coupon interest semiannually 
at an annual rate of 5.25%. Assume a position 
with a par value of $1 million. Further assume 
that this bond sold for par for settlement on 
December 6, 2008. For ease of exposition, we 
will evaluate the asset swap on a coupon pay¬ 
ment date to abstract some of the details of 
swaps. 

The asset swap spread is determined using 
the following procedure. First, assume that a $1 
million par value position of the General Elec¬ 
tric coupon bond was valued at a price of $100 
for settlement on December 6, 2008. (It actu¬ 
ally traded at a large premium at the time.) The 
price paid for the bond at settlement is the flat
price of $1,000,000 plus zero accrued interest 
such that the full price is $1,000,000 since it is 
a coupon payment date. Second, assume that a 
long position in an interest rate swap is estab¬ 
lished with a notional principal of $1,000,000. 
Third, determine the net cash difference at set¬ 
tlement. This amount is simply the difference 
between the bond's full price and the swap's 
principal amount plus accrued interest. By 
construction, this difference is zero in our illus¬ 
tration. Fourth, determine the spread over the 
reference rate (i.e., LIBOR) required to equate 
the present value of the swap's floating-rate 
payments and the present value of the fixed- 
rate payments (i.e., the bond's cash flows). In 
our illustration, a swap spread of 221.1 basis 
points satisfied this condition. 

Our illustration is a special case for a bond 
selling at par, and the accrued interest on both 
the bond and the swap are equal to zero. The 
asset swap spread makes the present value of a 
par swap's floating payments equal the bond's 
payments to maturity. This is true because the 
net cash at settlement is equal to zero. 
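
The fourth step can be sketched numerically. The following Python fragment is an illustration under stated assumptions, not the procedure used to obtain the 221.1 basis points quoted above: it assumes a par structure, fixed and floating payments on the same semiannual dates, and a flat, continuously compounded 3% swap discount curve that is purely hypothetical. The function name and its arguments are our own. Setting the buyer's net present value to zero and solving for the spread gives (PV of the bond's cash flows on the swap curve minus the bond's full price) divided by the annuity factor.

```python
import math

def par_asset_swap_spread(coupon_rate, full_price, discount_factors, accrual=0.5):
    """Spread over the reference rate for a par-structure asset swap.

    All amounts are per 1 of par. `discount_factors` are swap-curve
    discount factors on the payment dates; `accrual` is the year
    fraction of each period (0.5 for semiannual payments).
    """
    annuity = accrual * sum(discount_factors)
    # Bond cash flows valued on the swap curve: coupons plus principal.
    pv_bond = coupon_rate * annuity + discount_factors[-1]
    return (pv_bond - full_price) / annuity

# Hypothetical flat 3% continuously compounded swap curve, 18 semiannual periods.
dfs = [math.exp(-0.03 * 0.5 * k) for k in range(1, 19)]

spread_at_par = par_asset_swap_spread(0.0525, 1.00, dfs)      # bond priced at par
spread_at_premium = par_asset_swap_spread(0.0525, 1.05, dfs)  # same bond at a premium
```

With the curve held fixed, a higher bond price produces a lower spread, and a bond whose coupon equals the par swap rate and which trades at par has a spread of zero.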

Par versus Market Structures 

Market participants use two types of fixed- 
floating asset swap structures—par and market. 
The par structure is the most prevalent. When 
utilizing a par structure, the notional amount 
of the interest rate swap is equal to the bond's 
maturity value. The price of the bond acquired 
by the asset swap buyer is par regardless of its 
market price. 3 If the bond is trading at a dis¬ 
count, the asset swap seller receives more for 
the bond than it is worth and garners an upfront 
"profit." Alternatively, if the bond is trading at 
a premium, the asset swap seller receives less 
for the bond than it is worth and suffers an up¬ 
front "loss." At the initiation of the asset swap, 
the present value of the net cash flows of both 
parties is zero, so any upfront profit or loss is 
illusory because the spread adjusts. The asset 
swap seller "gives up" the premium over par 


at inception and in return pays a lower spread 
on the floating-rate cash flows. For bonds trad¬ 
ing at a discount, the asset swap seller pays a 
higher spread on the floating-rate cash flows 
as recompense for capturing the discount at 
settlement. 

An asset swap with a par structure is two 
separate transactions: (1) The asset swap buyer 
pays par to the asset swap seller for a bond 
and (2) an off-market swap. Accordingly, after 
the asset swap's cash flows are established, the 
bond's credit performance has no impact on the 
interest rate swap. If the bond were to default, 
the asset swap buyer no longer receives coupon 
payments or the maturity payment. The asset 
swap buyer's obligations imposed by the swap 
continue on as before until it matures or can be 
closed out at market value. 

An alternative structure for an asset swap is 
called a market structure. This method differs 
from a par structure in four respects. First, the 
bond is purchased at its prevailing market price 
rather than at par. Second, the notional principal 
of the off-market swap floating-rate payments is 
scaled by the bond's full price. Third, at the end 
of the transaction's life, the asset swap buyer 
pays par to the asset swap seller and receives the 
original full price of the bond. Lastly, note also 
that the counterparty risk exposure is allocated 
differently in the two asset swap structures. If 
the bond in question trades at a premium, the 
asset swap seller bears more of the counterparty 
risk. Conversely, in a market structure for the 
same premium bond, the counterparty risk is 
tilted toward the asset swap buyer due to the 
net payment of the bond's premium at the end 
of the transaction. Correspondingly, if the bond 
in question trades at a discount, the tilt of the 
counterparty risk exposure is reversed for both 
structures. 

Determining the Asset Swap Spread 
in the General Case 

Let's introduce some real-world complications. 
First, we consider an asset swap with a 
settlement date that falls between two coupon
payment dates. Once this circumstance is con¬ 
sidered, both the coupon-paying bond and 
swap will have nonzero accrued interest. Sup¬ 
pose an asset swap with a par structure has a 
settlement day that falls between the two semi¬ 
annual coupon payment dates. By market con¬ 
vention, the asset swap buyer pays par for the 
bond and does not directly pay accrued interest. 
The asset swap buyer receives the full coupon 
payment at the next payment date and pays the 
full coupon payment as required on the fixed- 
rate side of the swap. The floating-rate swap 
payment from the asset swap seller is treated 
somewhat differently. Floating-rate payments 
are usually more frequent than fixed-rate 
payments (quarterly versus semiannually) and 
almost always use a different day count con¬ 
vention. The floating-rate payment is adjusted 
accordingly. 

As an illustration, consider a 4.125% coupon 
bond issued by Wal-Mart that matured on 
February 15, 2011. This bond delivered coupon 
payments semiannually. Suppose an asset swap 
buyer took a long position in this bond that 
was trading at a flat price of 103.764. We will 
sketch the procedure for calculating the asset 
swap spread if it had a trade settlement date 
of June 23, 2008. The notional principal is set 
to the default of $1 million. The asset swap 
spread that equates the present value of the 
cash flows was 75.7 basis points. As a result, the 
floating-rate swap payments would have been 
calculated with a rate of 3-month LIBOR plus 
75.7 basis points. The asset swap buyer's swap 
payments would have been simply the five 
semiannual coupon payments of $20,625 and 
$1,000,000 on the maturity date of February 15, 
2011. The asset swap seller's floating-rate swap 
payments would have depended on the value of 
3-month LIBOR on each payment date. As 
noted, the first floating-rate payment of 
$2,835.04 reflects the accrual from the settle¬ 
ment date on January 28, 2009, to the first 
payment date of February 15, 2009, using an 
actual/360 day count convention. 


Uses of Asset Swaps 

The primary reason for using an asset swap is
to acquire exposure to the risks of a fixed-rate
bond, such as its credit risk, while neutralizing
the interest rate risk. For example,
financial institutions typically fund on a
floating-rate basis and unless they have a view 
on interest rates, management wants to invest 
in floating-rate assets. Financial institutions are 
active participants in the asset swap market by 
buying fixed-rate bonds and transforming the 
cash flow from those bonds into floating pay¬ 
ments, which provide a better match against 
their liability structure. An active asset swap 
market tends to eliminate pricing discrepancies 
between fixed-rate and floating-rate products. 

Asset swap spreads are often used as an indi¬ 
cator of relative value. If a fixed-income investor 
is considering five fixed-rate bonds of similar 
maturity and risk for inclusion in a portfolio 
and wants to assess their relative value, the in¬ 
vestor would simply find the highest asset swap 
spreads, which represent the best relative value. 

In practice, however, asset swaps are typically 
employed as a relative value detector in the fol¬ 
lowing manner. After choosing portfolio dura¬ 
tion (and perhaps key rate durations to control 
shaping risk) and after choosing a credit mix 
(or perhaps an average credit rating), find the 
constrained portfolio that produces the highest 
asset swap spread. This portfolio presumably 
represents the best relative value for a given 
duration target and credit target—with or with¬ 
out distributional constraints on durations and 
credit ratings. 

A Miscellany of Asset Swaps 

There are a handful of variations on the stan¬ 
dard asset swap structure discussed to this 
point. A forward start asset swap involves tak¬ 
ing a long position in a risky bond on a forward 
settlement date in combination with an inter¬ 
est rate swap whose asset swap spread is estab¬ 
lished today. This transaction allows an investor 
to gain an exposure to a risky product in the 
future at a known price today. Investors bear 
no exposure to credit risk until the forward set¬ 
tlement date because the asset swap terminates 
if the bond defaults prior to this date. 

A cross-currency asset swap is a combination 
of a long position in a risky bond whose cash 
flows are denominated in a different currency 
and an off-market interest rate swap. The swap 
transforms the fixed-rate coupon payments into 
floating-rate cash flows in the investor's home 
currency. An exchange of principal occurs at the 
end of the swap's life as is common with cur¬ 
rency swaps. Moreover, the swap's cash flows 
are converted using a predetermined exchange 
rate. This asset swap variation would allow, say, 
a U.S. investor to take an exposure to a yen- 
denominated corporate bond while simultane¬ 
ously mitigating the interest rate and currency 
risks. 

Investors often use asset swaps in convertible 
bond arbitrage. Convertibles are ideal securities 
for "arbitrage" because the convertible itself, 
namely the underlying stock and the embedded 
derivatives, are traded along predictable ratios 
and any discrepancy or mispricing would give 
rise to arbitrage opportunities for hedge fund 
managers to exploit. The valuation of convert¬ 
ible bonds is driven by four primary factors: 
(1) interest rates, (2) credit spreads, (3) stock 
prices, and (4) volatility of stock prices. 
Convertible bond arbitrage involves taking a 
leveraged position (usually long) in the con¬ 
vertible bond to gain exposure to a mispriced 
factor while simultaneously hedging interest 
rates and small changes in stock prices. 

Callable asset swaps are used to strip out eq¬ 
uity and credit components with a structure 
that allows the investor to cancel the off-market 
swap on any call date. This ability to terminate 
the swap is accomplished through the purchase 
of Bermudan receiver swaptions. 

CREDIT DEFAULT SWAPS 

Credit default swaps (CDS) are contracts that en¬ 
able the transfer of credit risk between the two 


counterparties to the trade. CDS resemble insur¬ 
ance policies. 4 Taking long/short CDS positions 
is referred to as buying/selling "protection." 
The protection buyer pays the protection seller 
a periodic payment (premium) for protection 
against a credit event experienced by a reference 
asset or entity. Simply put, sellers of protection 
are taking on credit risk for a fee while pro¬ 
tection buyers are paying to reduce their credit 
risk exposure. A reference asset could refer to 
a single asset, and this is termed a single-name 
credit default swap. Alternatively, if the refer¬ 
ence asset is a group of assets, it is referred to as 
a basket credit default swap. A reference entity 
could be a corporation or government entity 
(sovereign or municipal). 

The payout of credit default swaps is contin¬ 
gent on the occurrence of a credit event. Defini¬ 
tions of credit events are published by the ISDA, 
the so-called "1999 Definitions." The 1999 Def¬ 
initions list eight different credit events, which 
include: (1) bankruptcy, (2) credit event upon 
merger, (3) cross acceleration, (4) cross default, 
(5) downgrade, (6) failure to pay, (7) repudi¬ 
ation/moratorium, and (8) restructuring. The 
most controversial credit event is a restructur¬ 
ing. A restructuring refers to an alteration of 
the debt obligation's original terms in an effort 
to make the obligation less onerous to the bor¬ 
rower. Among the terms that may be offered: (1) 
reduction in the stated rate of interest, (2) princi¬ 
pal reduction, (3) principal payment reschedul¬ 
ing or interest payment postponement, or (4) a 
change in the seniority level of the obligation. 
The inclusion of restructuring as a trigger for 
a credit event is desired by protection buyers 
because they insist it is part of their essential 
credit protection. Protection sellers counter that 
the restructuring provision is triggered by rou¬ 
tine modifications to the debt. In April 2001, the 
ISDA issued the so-called "Supplement Defini¬ 
tion" that indicates the conditions needed to 
qualify for a restructuring: (1) The reference 
obligation must have at least four bondholders, 
and (2) at least two-thirds of the bondholders 
must consent to the restructuring. 
The market for single-name credit default 
swaps is an over-the-counter interdealer mar¬ 
ket. For credit default swaps on corporate or 
sovereign debt, the contract specifications are 
largely standardized. For example, the tenor is 
usually five years. Certain dealers are also will¬ 
ing to create customized contracts better suited 
to the counterparty's risk exposure. A protec¬ 
tion buyer makes payments (typically quar¬ 
terly) that are fixed by contract until a credit 
event is triggered or maturity, whichever is earlier.
The formula for calculating the protection
buyer's quarterly payment is given by the expression

quarterly payment = CDS spread × notional principal × (days in period)/360

Figure 1 presents these payments. If a credit 
event does not occur during the tenor of the 
CDS, the protection buyer's fixed payments are 
the only payments. At inception, there is no ex¬ 
change of principal between the buyer and the 
seller. If a credit event is triggered, there is an 
exchange between the protection buyer and 
protection seller. The protection buyer makes 
accrued payments up until the credit event date 
and then stops making quarterly payments. 

What both parties must do when there is a 
credit event depends on the settlement terms 
of the CDS. The settlement terms can specify 
either physical settlement or cash settlement. 
If the CDS specifies physical delivery, the pro¬ 
tection buyer delivers the reference obligation 
to the protection seller in return for the cash 
payment. Figure 2 illustrates this scenario. If
the credit event is triggered, the seller's payment
may be a prespecified amount or it may
reflect the reference obligation's value decline.
When the payment is fixed, it is based on a
notional principal amount. Conversely, when
the payment is based on the reference obligation's
value decline, it is usually computed
using pricing information obtained by polling
several CDS dealers.

Figure 1 Premium Payments for a CDS Assuming No Credit Event (the protection buyer pays quarterly fixed premium payments; the protection seller pays zero)

Figure 2 The Exchange if a Credit Event Occurs (the protection buyer delivers the deliverable obligation; the protection seller pays the notional amount)

Usually there is more than one obligation of 
the reference entity from which the protection 
buyer can choose. The set of all obligations that 
are permitted for physical delivery is called the 
deliverable obligations. Any obligation meeting 
the stated criteria (coupon, maturity, etc.) is part 
of this basket. Naturally, the protection buyer 
will choose among the deliverable obligations 
the one that is cheapest to deliver. 

CDS are structured to replicate the experience 
of a default in the cash market. If a credit event 
occurs, the deliverable obligation should trade 
at a deep discount to par. 

The seller's net loss will be the difference be¬ 
tween par and the deliverable obligation's re¬ 
covery value. Note that the CDS is a pure play 
in the deliverable obligation's credit risk. A long 
position in the reference instrument exposes the 
investor to other risks. 

As an illustration, consider a CDS with a refer¬ 
ence asset being a Citigroup 6.5% coupon bond 
that matured on January 18, 2011. The notional 
principal for this contract was $10 million. Sup¬ 
pose the following information was available: 


Reference entity/asset    Citigroup 6.5% 1/18/2011
Tenor                     5 years
Effective date            7/3/08
Maturity date             9/20/13
Payment frequency         Quarterly
The first coupon payment date was September 22, 2008.
Suppose the deal spread was 143.5
basis points. Presented in the following table 
are the first four quarterly payments that the 
protection buyer made to the seller. 


Date        Cash flow ($)
9/22/08     32,287.50
12/22/08    36,273.61
3/20/09     35,077.78
6/22/09     37,469.44


There were 81 actual days of accrual from 
the effective date of 7/3/08 to the first coupon 
date of 9/22/08, so inserting this number along
with the notional principal of $10 million and a 
spread of 143.5 basis points (in decimal) gives 
the first quarterly payment of 

$32,287.50 = 0.01435 × $10 million × (81/360)

The remainder of the quarterly payments are 
computed in the same fashion. Note that while 
the CDS spread remains fixed, the payments 
will vary somewhat due to the varying number 
of days between coupon payment dates. 
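
The four payments in the table can be reproduced with a short Python sketch. It applies the quarterly payment formula with an actual/360 day count to the dates, spread, and notional given in the illustration; the function and variable names are our own.

```python
from datetime import date

def cds_premium(spread, notional, start, end):
    """Premium for one period: spread x notional x (actual days)/360."""
    days = (end - start).days
    return spread * notional * days / 360

spread = 0.01435            # 143.5 basis points, in decimal
notional = 10_000_000       # $10 million

# Effective date followed by the first four payment dates from the table.
schedule = [
    date(2008, 7, 3),
    date(2008, 9, 22),
    date(2008, 12, 22),
    date(2009, 3, 20),
    date(2009, 6, 22),
]

day_counts = [(end - start).days for start, end in zip(schedule, schedule[1:])]
payments = [
    round(cds_premium(spread, notional, start, end), 2)
    for start, end in zip(schedule, schedule[1:])
]
```

The day counts come out as 81, 91, 88, and 94, which is why the payments differ even though the spread is fixed.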

Credit Default Swap Basis 

A CDS is, under certain simplifying assump¬ 
tions, equivalent to a long position in an 
asset-swapped fixed-rate bond financed with a 
repurchase (repo) agreement. Accordingly, it is 
critical to address the linkage between asset 
swap spreads, CDS spreads, and credit spreads. 

Practitioners assess relative value by comparing
CDS spreads and asset-swap spread levels.
In fact, the difference between the CDS premium
and the asset swap spread is referred
to as the credit default swap basis (CDS basis). 5
Practitioners also look at differences between 
CDS spreads and either the I-spread or the zero- 
volatility spread (Z-spread). A nonzero basis 
signals opportunities for investors. If the basis 
is negative (i.e., the CDS spread is less than the 
asset swap spread), this suggests that the in¬ 
vestor buy the bond in the cash market and buy 
protection via a CDS. Conversely, if the basis 


is positive (i.e., the CDS spread is greater than 
the asset swap spread), this suggests that the 
investor sell the bond in the cash market and 
sell protection via a CDS. 
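
The rule of thumb above can be written as a small Python sketch. The spread levels in the example are hypothetical, the function names are our own, and the sketch ignores transaction costs, funding, and the other frictions that determine whether a nonzero basis is actually exploitable.

```python
def cds_basis_bp(cds_spread_bp, asset_swap_spread_bp):
    """CDS basis in basis points: CDS premium minus asset swap spread."""
    return cds_spread_bp - asset_swap_spread_bp

def basis_signal(basis_bp):
    """Map the sign of the basis to the trade suggested in the text."""
    if basis_bp < 0:
        return "buy the bond in the cash market and buy protection"
    if basis_bp > 0:
        return "sell the bond in the cash market and sell protection"
    return "no signal"

# Hypothetical levels: CDS spread of 180 bp, asset swap spread of 200 bp.
basis = cds_basis_bp(180.0, 200.0)
signal = basis_signal(basis)
```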

KEY POINTS 

• There are two approaches to the valuation of 
fixed-income products: discounted cash flow 
and relative value. 

• The relative value method can provide infor¬ 
mation about how similar bonds are priced 
on a relative basis. 

• A yield spread is the difference between a 
risky bond's yield and a benchmark yield 
holding maturity constant. 

• Two commonly used benchmark yield curves 
are the swap curve and the U.S. Treasury 
curve. 

• An asset swap is a synthetic structure that 
transforms the nature of cash flows from one 
form into another. 

• An asset swap spread is used as an indicator 
of relative value and is the spread over the 
reference rate that equates the value of the 
floating rate cash flows and the bond's cash 
flows. 

• The credit default swap (CDS) basis is the dif¬ 
ference between the CDS premium and the 
asset swap spread. 

• A nonzero CDS basis signals opportunities 
for investors. 

NOTES 

1. For a further discussion of relative value 
tools, see Fabozzi and Mann (2010) and 
Grieves and Mann (2010). 

2. For simplicity, we are ignoring any nonzero 
net payments at the beginning and end of 
the swap's life. These elements will be intro¬ 
duced shortly. 

3. When nonpar bonds are purchased as part 
of an asset swap structure, tax and account¬ 
ing rules create incentives to buy and sell 
premium / discount bonds at par through an 
asset swap structure. 
4. More on the mechanics of CDS can be found 
in Anson, Fabozzi, Choudhry, and Chen 
(2004). 

5. For a further discussion of the CDS spread, 
see Choudhry (2006). 
Abstract: The complication in valuing bonds with embedded options and option-type derivatives 
is that cash flows depend on interest rates in the future. Academicians and practitioners have 
attempted to capture this interest rate uncertainty through various models, often designed as one- 
or two-factor processes. These models attempt to capture the stochastic behavior of rates. In practice, 
these elegant mathematical models must be implemented numerically in order to be useful. One 
such model is a single factor model that assumes a stationary variance, or volatility. 


An often-used framework for the valuation of 
interest rate instruments with embedded op¬ 
tions and interest rate option-type derivatives 
is the lattice framework. Effectively, the lattice 
specifies the distribution of short-term interest 
rates over time. The lattice holds all the informa¬ 
tion required to perform the valuation of certain 
option-like interest rate products. First, the lat¬ 
tice is used to generate the cash flows across the 
life of the security. Next, the interest rates on the 
lattice are used to compute the present value of 
those cash flows. 

There are several interest rate models that 
have been used in practice to construct an inter¬ 
est rate lattice. In each case, interest rates can real¬ 
ize one of several possible levels when we move 
from one period to the next. A lattice model that 


allows only two rates in the next period is called 
a binomial model. A lattice model that allows 
three possible rates in the next period is called 
a trinomial model. There are even more com¬ 
plex models that allow more than three possible 
rates in the next period. 

Regardless of the underlying assumptions, 
each model shares a common restriction. In 
order to be "arbitrage-free," the interest rate 
tree generated must produce a value for an on- 
the-run optionless bond that is consistent with 
the current par yield curve. In effect, the value 
generated by the model must be equal to the 
observed market price for the optionless instru¬ 
ment. Under these conditions the model is said 
to be "arbitrage free." A lattice that produces an 
arbitrage-free valuation is said to be "fair." The 
lattice is used for valuation only when it has 
been calibrated to be fair. More on calibration 
below. 

In this entry we will demonstrate how a lattice 
is used to value an option-free bond. The model 
is also used to value bonds with embedded op¬ 
tions, floating-rate securities with option-type 
derivatives, bond options, and swaptions. 1 

THE INTEREST RATE 
LATTICE 

In our illustration, we represent the lattice as 
a binomial tree, the simplest lattice form. Fig¬ 
ure 1 provides an example of a binomial interest 
rate tree, which consists of a number of "nodes" 
and "legs." Each leg represents a one-year in¬ 
terval over time. A simplifying assumption of 
one-year intervals is made to illustrate the key 
principles. The methodology is the same for 
smaller time periods. In fact, in practice the 
selection of the length of the time period is 
critical, but we need not be concerned with this 
nuance here. 

The distribution of future interest rates is represented
on the tree by the nodes at each point
in time. Each node is labeled as "N" and has
a subscript, a combination of L's and H's. The
subscript indicates whether the node is lower or
higher on the tree, respectively, relative to the
other nodes. Thus, node $N_{HH}$ is reached when
the 1-year rate realized in the first year is the
higher of the two rates for that period, then the
highest of the rates in the second year.

Figure 1 Four-Year Binomial Interest Rate Tree

The root of the tree is N, the only point in 
time at which we know the interest rate with 
certainty. The 1-year rate today (that is, at N) is 
the current 1-year spot rate, which we denote
by $r_0$.

We must make an assumption concerning the 
probability of reaching one rate at a point in 
time. For ease of illustration, we have assumed 
that rates at any point in time have the same 
probability of occurring. In other words, the 
probability is 50% on each leg. 

The interest rate model we will use to con¬ 
struct the binomial tree assumes that the 1-year 
rate evolves over time based on a lognormal 
random walk with a known (stationary) volatil¬ 
ity. Technically, the tree represents a one-factor 
model. Under the distributional assumption, the 
relationship between any two adjacent rates at 
a point in time is calculated via the following 
equation: 

$$r_{1,H} = r_{1,L}\, e^{2\sigma\sqrt{t}}$$

where $\sigma$ is the assumed volatility of the 1-year
rate, $t$ is the length of the time period in years,
and $e$ is the base of the natural logarithm. Since
we assume a 1-year interval, that is, $t = 1$, we
can disregard the calculation of the square root
of $t$ in the exponent.

For example, suppose that $r_{1,L}$ is 4.4448% and
$\sigma$ is 10% per year, then:

$$r_{1,H} = 4.4448\%\,(e^{2 \times 0.10}) = 4.4448\%\,(1.2214) = 5.4289\%$$

In the second year, there are three possible
values for the 1-year rate. The relationship between
$r_{2,LL}$ and the other two 1-year rates is as
follows:

$$r_{2,HH} = r_{2,LL}\,(e^{4\sigma}) \quad \text{and} \quad r_{2,HL} = r_{2,LL}\,(e^{2\sigma})$$
Figure 2 Four-Year Binomial Interest Rate Tree
with 1-Year Rates*

* $r_t$ is the lowest 1-year rate at each point in time.

So, for example, if $r_{2,LL}$ is 4.6958%, and assuming
once again that $\sigma$ is 10%, then

$$r_{2,HH} = 4.6958\%\,(e^{4 \times 0.10}) = 7.0053\%$$

and

$$r_{2,HL} = 4.6958\%\,(e^{2 \times 0.10}) = 5.7354\%$$

This relationship between rates holds for each 
point in time. Figure 2 shows the interest rate 
tree using this notation. 
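
The spacing rule can be sketched in Python as follows. Given the lowest 1-year rate at a point in time, each rate above it is the one below multiplied by $e^{2\sigma\sqrt{t}}$. The function name and argument names are our own; the inputs are the rates and volatility used in the examples above.

```python
import math

def rates_at_step(lowest_rate, sigma, n_nodes, dt=1.0):
    """1-year rates at one point in time, ordered from lowest to highest.

    Adjacent rates differ by the factor exp(2 * sigma * sqrt(dt)).
    """
    step = math.exp(2 * sigma * math.sqrt(dt))
    return [lowest_rate * step**k for k in range(n_nodes)]

# Year 1: two nodes, starting from the lower rate of 4.4448%.
year1 = rates_at_step(0.044448, 0.10, 2)

# Year 2: three nodes, starting from the lowest rate of 4.6958%.
year2 = rates_at_step(0.046958, 0.10, 3)
```

Up to rounding in the fourth decimal place, `year1` and `year2` reproduce the rates computed by hand in the text.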

Determining the Value at a Node 

In general, to get a security's value at a node we 
follow the fundamental rule for valuation: The 
value is the present value of the expected cash 
flows. The appropriate discount rate to use for 
cash flows one year forward is the 1-year rate 
at the node where we are computing the value. 
Now there are two present values in this case: 
the present value of the cash flows in the state 
where the 1-year rate is the higher rate, and 
one where it is the lower rate state. We have 
assumed that the probability of both outcomes 
is equal. Figure 3 provides an illustration for a 
node assuming that the 1-year rate is r* at the 
node where the valuation is sought and letting: 



$V_H$ = the bond's value for the higher 1-year rate
state

$V_L$ = the bond's value for the lower 1-year rate
state

$C$ = coupon payment

Figure 3 Calculating a Value at a Node (from the node where the bond's value is sought, with 1-year rate $r_*$, one leg leads to the cash flow $V_H + C$ in the higher-rate state and the other to the cash flow $V_L + C$ in the lower-rate state, each one year forward)

From where do the future values come? Effectively,
the value at any node depends on the
future cash flows. The future cash flows include
(1) the coupon payment one year from now and
(2) the bond's value one year from now, both
of which may be uncertain. Starting the process
from the last year in the tree and working
backwards to get the final valuation resolves
the uncertainty. At maturity, the instrument's
value is known with certainty: par. The final
coupon payment can be determined from the
coupon rate, or from prevailing rates to which
it is indexed. Working back through the tree,
we realize that the value at each node is quickly
calculated. This process of working backward
is often referred to as recursive valuation.

Using our notation, the cash flow at a node is
either:

$V_H + C$ for the higher 1-year rate
$V_L + C$ for the lower 1-year rate

The present value of these two cash flows using
the 1-year rate at the node, $r_*$, is:

$$\frac{V_H + C}{1 + r_*} = \text{present value for higher 1-year rate}$$

$$\frac{V_L + C}{1 + r_*} = \text{present value for lower 1-year rate}$$

Then, the value of the bond at the node is
found as follows:

$$\text{Value at a node} = \frac{1}{2}\left[\frac{V_H + C}{1 + r_*} + \frac{V_L + C}{1 + r_*}\right]$$

CALIBRATING THE LATTICE 

We noted above the importance of the no-arbitrage condition that governs the construction of the lattice. To assure this condition holds, the lattice must be calibrated to the current par yield curve, a process we demonstrate here. Ultimately, the lattice must price optionless par bonds at par.

Assume the on-the-run par yield curve for a hypothetical issuer as it appears in Table 1. The current 1-year rate is known, 3.50%. Hence, the next step is to find the appropriate 1-year rates one year forward. As before, we assume that volatility, σ, is 10% and construct a 2-year tree using the 2-year bond with a coupon rate of 4.2%, the par rate for a 2-year security.

Figure 4 shows a more detailed binomial tree with the cash flow shown at each node. The root rate for the tree, r_0, is simply the current 1-year rate, 3.5%. At the beginning of Year 2 there are two possible 1-year rates, the higher rate and the lower rate. We already know the relationship between the two. A rate of 4.75% at N_L has been arbitrarily chosen as a starting point. An iterative process (that is, trial and error) determines the proper rate. The steps are described and illustrated below. Again, the goal is a rate that, when applied in the tree, provides a value of par for the 2-year, 4.2% bond.

Step 1: Select a value for r_1. Recall that r_1 is the lower 1-year rate. In this first trial, we arbitrarily selected a value of 4.75%.


Table 1 Issuer Par Yield Curve

Maturity    Par Rate    Market Price
1 year      3.50%       100
2 years     4.20%       100
3 years     4.70%       100
4 years     5.20%       100

Figure 4 The 1-Year Rates for Year 1 Using the 2-Year 4.2% On-the-Run Issue: First Trial


Step 2: Determine the corresponding value for the higher 1-year rate. As explained earlier, this rate is related to the lower 1-year rate as follows: r_1 e^{2σ}. Since r_1 is 4.75%, the higher 1-year rate is 5.8017% (= 4.75% × e^{2×0.10}). This value is reported in Figure 4 at node N_H.

Step 3: Compute the bond's value one year from now as follows:

a. Determine the bond's value two years from now. In our example, this is simple. Since we are using a 2-year bond, the bond's value is its maturity value ($100) plus its final coupon payment ($4.2). Thus, it is $104.2.

b. Calculate V_H. Cash flows are known. The appropriate discount rate is the higher 1-year rate, 5.8017% in our example. The present value is $98.486 (= $104.2/1.058017).

c. Calculate V_L. Again, cash flows are known: they are the same as those in Step 3b. The discount rate assumed for the lower 1-year rate is 4.75%. The present value is $99.475 (= $104.2/1.0475).

Step 4: Calculate V.

a. Add the coupon to both V_H and V_L to obtain the values at N_H and N_L, respectively. In our example we have $102.686 for the higher rate and $103.675 for the lower rate.

b. Calculate V. The 1-year rate is 3.50%. (Note: At this point in the valuation, r* is the root rate, 3.50%.) Therefore, $99.691 = 1/2 ($99.214 + $100.169).

Step 5: Compare the value in Step 4 to the bond's market value. If the two values are the same, then the r_1 used in this trial is the one we seek. If, instead, the value found in Step 4 is not equal to the market value of the bond, then r_1 in this trial is not the 1-year rate that is consistent with the current yield curve. In this case, the five steps are repeated with a different value for r_1.

When r_1 is 4.75%, a value of $99.691 results in Step 4, which is less than the observed market price of $100. Therefore, 4.75% is too large and the five steps must be repeated trying a lower rate for r_1.

Let's jump right to the correct rate for r_1 in this example and rework Steps 1 through 5. This occurs when r_1 is 4.4448%. The corresponding binomial tree is shown in Figure 5. The value at the root is equal to the market value of the 2-year issue (par).
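The five steps can be sketched in a few lines. The text searches by trial and error; here we substitute a simple bisection search, which is our choice and not part of the original procedure. The function and constant names are illustrative.

```python
import math

SIGMA = 0.10       # assumed volatility, as in the text
ROOT_RATE = 0.035  # current 1-year rate
COUPON = 4.2       # coupon of the 2-year par bond per 100 of par


def two_year_value(r_low):
    """Root value of the 2-year 4.2% bond for a trial lower rate (Steps 2-4)."""
    r_high = r_low * math.exp(2.0 * SIGMA)      # Step 2
    v_high = (100.0 + COUPON) / (1.0 + r_high)  # Step 3b
    v_low = (100.0 + COUPON) / (1.0 + r_low)    # Step 3c
    return 0.5 * ((v_high + COUPON) + (v_low + COUPON)) / (1.0 + ROOT_RATE)


def solve_lower_rate(target=100.0, lo=0.0, hi=0.20):
    """Bisection on the lower rate; the bond value falls as the rate rises."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if two_year_value(mid) > target:
            lo = mid  # value too high, so the trial rate is too low
        else:
            hi = mid
    return 0.5 * (lo + hi)


first_trial = two_year_value(0.0475)
r_low = solve_lower_rate()
print(f"first trial value: {first_trial:.3f}")
print(f"lower rate: {r_low:.4%}  higher rate: {r_low * math.exp(2 * SIGMA):.4%}")
```

The first trial reproduces the value of about 99.691 found in Step 4, and the search converges to a lower rate of about 4.4448%.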

We can "grow" this tree for one more year by determining r_2. Now we will use the 3-year on-the-run issue, the 4.7% coupon bond, to get r_2. The same five steps are used in an iterative process to find the 1-year rates in the tree two years from now. Our objective is now to find the value of r_2 that will produce a bond value of $100. Note that the two rates one year from now of 4.4448% (the lower rate) and 5.4289% (the higher rate) do not change. These are the fair rates for the tree one year forward.

Figure 5 The 1-Year Rates for Year 1 Using the 2-Year 4.2% On-the-Run Issue
(Surviving tree entries: root, value 100.000 at 3.5000%; node N_L, value 99.766, coupon 4.2, rate 4.4448%; Year 2 nodes N_HH, N_HL, and N_LL, value 100.000, coupon 4.2. Columns: Today, Year 1, Year 2.)

Figure 6 Information for Deriving the 1-Year Rates for Year 2 Using the 3-Year 4.7% On-the-Run Issue

The problem is illustrated in Figure 6. The cash flows from the 3-year, 4.7% bond are in place. All we need to perform a valuation are the rates at the start of Year 3. In effect, we need to find r_2 such that the bond prices at par. Again, an arbitrary starting point is selected, and an iterative process produces the correct rate.

The completed version of Figure 6 is found in Figure 7. The value of r_2, or equivalently r_{2,LL}, which will produce the desired result is 4.6958%. The corresponding rates r_{2,HL} and r_{2,HH} would be 5.7354% and 7.0053%, respectively. To verify that these are the correct 1-year rates two years from now, work backwards from the four nodes at the right of the tree in Figure 7. For example, the value in the box at N_HH is found by taking the value of $104.7 at the two nodes to its right and discounting at 7.0053%. The value is $97.846. Similarly, the value in the box at N_HL is found by discounting $104.70 by 5.7354% and at N_LL by discounting at 4.6958%.
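Growing the tree year by year can be sketched as a loop over maturities. The sketch below assumes annual-pay par bonds from Table 1, stores only the lowest rate for each year (the other rates follow from multiplying by e^{2σ} per up-move), and again uses bisection in place of trial and error. All names are ours.

```python
import math

SIGMA = 0.10
PAR_CURVE = [0.035, 0.042, 0.047, 0.052]  # Table 1 par rates, 1 to 4 years


def tree_price(lows, coupon, sigma=SIGMA, par=100.0):
    """Recursive valuation of an option-free bond maturing in len(lows) years."""
    n = len(lows)
    values = [par] * (n + 1)  # values at maturity, before the final coupon
    for t in range(n - 1, -1, -1):
        values = [
            (0.5 * (values[j] + values[j + 1]) + coupon)
            / (1.0 + lows[t] * math.exp(2.0 * sigma * j))  # j = number of up-moves
            for j in range(t + 1)
        ]
    return values[0]


def calibrate(par_curve, sigma=SIGMA):
    """Find the lowest 1-year rate for each year so every par bond prices at 100."""
    lows = [par_curve[0]]
    for n in range(2, len(par_curve) + 1):
        coupon = 100.0 * par_curve[n - 1]
        lo, hi = 0.0, 0.5
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if tree_price(lows + [mid], coupon, sigma) > 100.0:
                lo = mid
            else:
                hi = mid
        lows.append(0.5 * (lo + hi))
    return lows


lows = calibrate(PAR_CURVE)
for t, low in enumerate(lows):
    print(t, [f"{low * math.exp(2 * SIGMA * j):.4%}" for j in range(t + 1)])
```

For the rates two years forward, the output should agree with the 4.6958%, 5.7354%, and 7.0053% quoted above, up to rounding.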


USING THE LATTICE FOR 
VALUATION 

To illustrate how to use the lattice for valuation purposes, consider a 6.5% option-free bond with four years remaining to maturity. Since this bond is option-free, it is not necessary to use the lattice model to value it. All that is necessary to obtain an arbitrage-free value for this bond is to discount the cash flows using the spot rates obtained from bootstrapping the yield curve shown in Table 1. (All calculations are highly sensitive to the number of decimal places chosen.) The spot rates are as follows:

Maturity    Spot Rate
1-year      3.5000%
2-year      4.2147%
3-year      4.7345%
4-year      5.2707%

Discounting the 6.5% 4-year option-free bond with a par value of $100 at the above spot rates would give a bond value of $104.643.
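The bootstrapping step is not spelled out in the text. A minimal sketch, assuming annual coupons and that each par bond in Table 1 is priced at exactly 100, is shown below; the function name is ours. Small differences in the last decimal place of the spot rates relative to the table above are to be expected from rounding.

```python
PAR_CURVE = [0.035, 0.042, 0.047, 0.052]  # Table 1 par rates, 1 to 4 years


def bootstrap_discount_factors(par_rates):
    """Solve c * (d_1 + ... + d_n) + d_n = 1 for each maturity n in turn."""
    dfs = []
    for c in par_rates:
        dfs.append((1.0 - c * sum(dfs)) / (1.0 + c))
    return dfs


dfs = bootstrap_discount_factors(PAR_CURVE)
spots = [df ** (-1.0 / (n + 1)) - 1.0 for n, df in enumerate(dfs)]

# Value the 6.5% 4-year option-free bond off the bootstrapped curve.
price = sum(6.5 * df for df in dfs) + 100.0 * dfs[-1]

print([f"{s:.4%}" for s in spots])
print(f"{price:.3f}")  # 104.643
```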

Figure 8 contains the fair tree for a four-year valuation. Figure 9 shows the various values in the discounting process using the lattice in Figure 8. The root of the tree shows the bond value of $104.643, the same value found by discounting at the spot rate. This demonstrates that the lattice model is consistent with the valuation of an option-free bond when using spot rates.

Figure 8 Binomial Interest Rate Tree for Valuing up to a 4-Year Bond for Issuer (10% Volatility Assumed)


The lesson here can be applied to more complex instruments, those with option features that require the lattice-based process for proper valuation, and to derivatives such as swaptions. Regardless of the security or derivative to be valued, the generation of the lattice follows the same no-arbitrage principles outlined here. Subsequently, cash flows are determined at each node, and the recursive valuation process is undertaken to arrive at fair values. Hence, a single lattice and a valuation process prove to be robust means for obtaining fair values for a wide variety of fixed income instruments.


KEY POINTS 

• The complication in valuing bonds with embedded options and option-type derivatives is that cash flows depend on interest rates in the future.

• In practice, several interest rate models have been employed to construct an interest rate lattice. In each case, interest rates can realize one of several possible levels when we move from one period to the next. There are binomial lattices (two possible rates in the next period), trinomial lattices (three possible rates in the next period), and even more complex models that allow more than three possible rates in the next period.

• Several models have been developed to value bonds with embedded options and option-type interest rate derivatives, the most common model being a one-factor model.

• The lattice framework uses an arbitrage-free interest rate lattice or tree to generate the cash flows over the life of the financial instrument and then to determine the present value of the cash flow. The present value of the cash flow is then the fair value of the financial instrument.

• The lattice must be constructed so as to be consistent with (that is, calibrated to) the observed market value of an on-the-run option-free issue.

Figure 9 Valuing an Option-Free Bond with Four Years to Maturity and a Coupon Rate of 6.5% (10% Volatility Assumed)


NOTE 

1. For an extensive discussion of the application to the valuation of embedded options in bonds see Kalotay, Williams, and Fabozzi (1993), and for the application to interest rate swaptions see Fabozzi and Buetow (2000).
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Abstract: In principle, the valuation of a financial instrument is straightforward: It is the present 
value of the expected cash flow. For fixed income securities, the expected cash flow, ignoring the 
possibility of default, is the periodic interest payments and the maturity value. The interest rates 
used to discount the expected cash flows are obtained from an appropriate benchmark spot rate 
curve. When a fixed-rate or floating-rate bond has an interest-sensitive embedded option such as a 
call option, put option, or a cap in the case of a floater, the expected cash flow will be dependent on 
future interest rates. To value fixed income securities with embedded options, the lattice framework 
is the standard tool in practice. The same lattice-based framework is also used to value interest- 
sensitive derivatives such as options, caps, and floors. 


We will demonstrate in this entry how the lattice framework provides a robust means for valuing fixed-rate and floating-rate bonds and interest rate derivatives. In addition, we extend the application of the interest rate tree to the calculation of the option-adjusted spread, as well as the effective duration and convexity of a fixed income instrument. The model described below was first introduced by Kalotay, Williams, and Fabozzi (1993).


FIXED-COUPON BONDS 
WITH EMBEDDED OPTIONS 

The valuation of bonds with embedded options proceeds in the same fashion as in the case of an option-free bond. However, the added complexity of an embedded option requires an adjustment to the cash flows on the tree depending on the structure of the option. A decision on whether to call or put must be made at nodes on the tree where the option is eligible for exercise. Examples for both callable and putable bonds follow. The analysis can be extended to cases where there are several embedded options such as a callable bond with an accelerated sinking fund provision.

Valuing a Callable Bond 

In the case of a call option, the call will be made 
when the present value (PV) of the future cash 
flows is greater than the call price at the node 
where the decision to exercise is being made. 
Effectively, the following calculation is made: 

V_t = Min[Call Price, PV(Future Cash Flows)]

where V_t represents the value at the node after the call decision has been applied. This operation is performed at each node where the bond is eligible for call.

For example, consider a 6.5% bond with four 
years remaining to maturity that is callable in 
one year at $100. We will value this bond, as 
well as the other instruments in this entry, using 
a binomial tree. The on-the-run yield curve for 
the issuer used to construct the tree is given in 
Table 1. The methodology for constructing the 
binomial interest rate tree from the yield curve 
is not discussed here but is explained in Entry 
16. Application of the methodology results in 
the binomial interest rate tree in Figure 1. In 
constructing the binomial tree in Figure 1, it is 
assumed that interest rate volatility is 10% and 
that cash flows occur at the end of the year. This 
binomial tree will be used throughout this entry. 

Figure 2 shows two values are now present at each node of the binomial tree. The discounting process is used to calculate the first of the two values at each node. The second value is the value based on whether the issue will be called. To simplify the analysis, it is assumed that the issuer calls the issue if the PV of future cash flows exceeds the call price. This second value is incorporated into the subsequent calculations.

Table 1 Issuer Par Yield Curve

Maturity    Par Rate    Market Price
1 year      3.50%       100
2 years     4.20%       100
3 years     4.70%       100
4 years     5.20%       100

Figure 1 Binomial Interest Rate Tree for Valuing up to a Four-Year Bond for Issuer (10% Volatility Assumed)
(Columns: Today, Year 1, Year 2, Year 3.)

In Figure 3 certain nodes from Figure 2 are highlighted. Panel (a) of the figure shows nodes where the issue is not called (based on the simple call rule used in the illustration) in year 2 and year 3. The values reported in this case are the same as in the valuation of an option-free bond. Panel (b) of the figure shows some nodes where the issue is called in year 2 and year 3. Notice how the methodology changes the cash flows. In year 3, for example, at node N_HLL the recursive valuation process produces a PV of 100.315.¹ However, given the call rule, this issue would be called. Therefore, 100 is shown as the second value at the node and it is this value that is then used as the valuation process continues. Taking the process to its end, the value for this callable bond is 102.899.

The value of the call option is computed as the difference between the value of an optionless bond and the value of a callable bond. In our illustration, the value of the option-free bond can be shown to be 104.643. The value of the callable bond is 102.899. Hence, the value of the call option is 1.744 (= 104.643 - 102.899).
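The call rule can be sketched on the tree as follows. The tree is hard-coded through the lowest 1-year rate in each year, with the other rates obtained by multiplying by e^{2σ} per up-move. The year-3 rate of 5.0483% is our assumption, backed out from the rates quoted elsewhere in this entry (for example, 6.166% at N_HLL); function names are ours.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3


def rate(t, j):
    """1-year rate at time t after j up-moves."""
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def bond_value(coupon, call_price=None, first_call=1, par=100.0):
    """Recursive valuation; V_t = min(call price, PV) where the bond is callable."""
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        values = [
            (0.5 * (values[j] + values[j + 1]) + coupon) / (1.0 + rate(t, j))
            for j in range(t + 1)
        ]
        if call_price is not None and t >= first_call:
            values = [min(call_price, v) for v in values]
    return values[0]


option_free = bond_value(6.5)
callable_bond = bond_value(6.5, call_price=100.0)
print(f"{option_free:.3f} {callable_bond:.3f} {option_free - callable_bond:.3f}")
```

With these inputs the sketch should reproduce the values in the text: about 104.643 for the option-free bond, 102.899 for the callable bond, and 1.744 for the call option.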


Valuing a Putable Bond 

A putable bond is one in which the bondholder has the right to force the issuer to pay off the bond prior to the maturity date. The analysis of the putable bond follows closely that of the callable bond. In the case of the putable, we must establish the rule by which the decision to put is made. The reasoning is similar to that for the callable bond. If the PV of the future cash flows is less than the put price (that is, par), then the bond will be put. In equation form,

V_t = Max[Put Price, PV(Future Cash Flows)]

Figure 2 Valuing a Callable Bond with Four Years to Maturity, a Coupon Rate of 6.5%, and Callable after the First Year at 100 (10% Volatility Assumed)

Figure 3 Highlighting Nodes in Years 2 and 3 for a Callable Bond: (a) Nodes Where the Call Option Is Not Exercised and (b) Selected Nodes Where the Call Option Is Exercised

Figure 4 is analogous to Figure 3. It shows the binomial tree with the values based on whether or not the investor exercises the put option at each node. The bond is putable any time after the first year at par. The value of the bond is 105.327. Note that the value is greater than the value of the corresponding option-free bond.

With the two values in hand, we can calculate the value of the put option. Since the value of the putable bond is 105.327 and the value of the corresponding option-free bond is 104.643, the value of the embedded put option purchased by the investor is effectively 0.684.

Figure 4 Valuing a Putable Bond with Four Years to Maturity, a Coupon Rate of 6.5%, and Putable after the First Year at 100 (10% Volatility Assumed)
(Node legend: computed value; put price if exercised, computed value if not exercised.)

Suppose that a bond is both putable and callable. The procedure for valuing such a structure is to adjust the value at each node to reflect whether the issue would be put or called. Specifically, at each node there are two decisions about the exercising of an option that must be made. If it is called, the value at the node is replaced by the call price. The valuation procedure then continues using the call price at that node. If the call option is not exercised at a node, it must be determined whether or not the put option will be exercised. If it is exercised, then the put price is substituted at that node and is used in subsequent calculations.
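The two-decision rule can be sketched by checking the call first and the put second at every eligible node. As before, the hard-coded tree (in particular the 5.0483% year-3 rate) and all names are our assumptions. The combined case below, callable and putable at par, is purely hypothetical and is included only to exercise both branches.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def exercise(pv, call_price=None, put_price=None):
    """Apply the call decision first, then the put decision."""
    if call_price is not None and pv > call_price:
        return call_price
    if put_price is not None and pv < put_price:
        return put_price
    return pv


def bond_value(coupon, call_price=None, put_price=None, first_exercise=1, par=100.0):
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        values = [
            (0.5 * (values[j] + values[j + 1]) + coupon) / (1.0 + rate(t, j))
            for j in range(t + 1)
        ]
        if t >= first_exercise:
            values = [exercise(v, call_price, put_price) for v in values]
    return values[0]


putable = bond_value(6.5, put_price=100.0)
put_option = putable - bond_value(6.5)
both = bond_value(6.5, call_price=100.0, put_price=100.0)  # hypothetical structure
print(f"{putable:.3f} {put_option:.3f} {both:.3f}")
```

The putable bond should come out at about 105.327 and the put option at about 0.684, matching the text. In the hypothetical combined case every eligible node is reset to 100, so the value reduces to the first-year cash flow discounted at 3.5%.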


FLOATING-COUPON BONDS 
WITH EMBEDDED OPTIONS 

Simple discounted cash flow methods of analysis fail to handle floaters with embedded or option-like features. In this section we demonstrate how to use the lattice model to value (1) a capped floater, and (2) a callable capped floater. We will streamline the notation used in the binomial tree in the figures shown in this section.

Valuing Capped Floating-Rate 
Bonds 

Consider a floating-rate bond with a coupon indexed to the 1-year rate (the reference rate) plus a spread. For our purposes, assume a 25 basis point (bp) spread to the reference rate. The coupon adjusts at each node to reflect the level of the reference rate plus the spread.

Using the same valuation method as before, we can find the value at each node. Recall the value of the bond is 100 (par) at the end of year 4. Consider N_HLL:

$$N_{HLL} = \frac{1}{2}\left[\frac{100 + 6.416}{1.06166} + \frac{100 + 6.416}{1.06166}\right] = 100.235$$

Stepping back one period, to N_LL:

$$N_{LL} = \frac{1}{2}\left[\frac{100.235 + 4.9458}{1.046958} + \frac{100.238 + 4.9458}{1.046958}\right] = 100.465$$

Following this same procedure, we arrive at the price of 100.893. How would this change if the interest rate on the bond were capped?

Assume that the cap is 7.25%. In Figure 5 we've taken the tree from Figure 1 and, as was the case with the optionless fixed-coupon bond, at each node we've entered the cash flow expected at the end of each period based on the reset formula. As rates move higher there is a possibility that the current reference rate exceeds the cap. Such is the case at N_HHH and N_HHL. The coupon is subject to the following constraint:

C_t = Min[r_t, 7.25%]

As a result of the cap, the value of the bond in the upper nodes at t = 3 falls below par. For example, at N_HHH, where the 1-year rate is 9.1987%, the capped coupon of 7.25 gives a value of (100 + 7.25)/1.091987 = 98.215.

Figure 5 Valuation of a Capped Floating-Rate Bond

Valuing recursively through the tree, we arrive at the current value of the capped floater, 100.516, a value lower than the plain vanilla floater. This last calculation gives us a means for pricing the embedded option. Without a cap, the bond is priced at 100.893. The difference between these two prices is the value of the cap, 0.377. It is important to note that the price of the cap is volatility dependent. Any change in the volatility would result in a different valuation for the cap. The greater the volatility, the higher the price of the option, and vice versa.
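The floater and capped floater valuations can be sketched as follows. We assume the coupon set at a node is the node's 1-year rate plus the spread, paid one year later, and that the cap applies to this total coupon rate; that reading reproduces the values in the text but is our interpretation. The hard-coded tree (including the 5.0483% year-3 rate) and the names are also ours.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def floater_value(spread, cap=None, par=100.0):
    """Recursive valuation of a floater paying (rate + spread), optionally capped."""
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        new_values = []
        for j in range(t + 1):
            r = rate(t, j)
            coupon_rate = r + spread
            if cap is not None:
                coupon_rate = min(coupon_rate, cap)
            cash = 0.5 * (values[j] + values[j + 1]) + par * coupon_rate
            new_values.append(cash / (1.0 + r))
        values = new_values
    return values[0]


plain = floater_value(0.0025)
capped = floater_value(0.0025, cap=0.0725)
print(f"{plain:.3f} {capped:.3f} {plain - capped:.3f}")
```

The results should be close to 100.893 for the plain floater, 100.516 for the capped floater, and 0.377 for the cap.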

We can extend the application of the lattice 
to the initial pricing of securities. What if an 
issuer wanted to offer this bond at par? In such 
a case, an adjustment has to be made to the 
coupon. To lower the price from 100.516 to par, 
a lower spread over the reference rate is offered 
to investors. Figure 6 shows the relationship 
between the spread over the 1-year reference 
rate and the bond price. At a spread of 8.70 
bps over the 1-year reference rate, the capped 
floater in Figure 5 will be priced at par. Again, 
the spread of 8.7 bps is volatility dependent. 

Callable Capped Floating-Rate 
Bonds 

Now consider a call option on the capped floater. As was the case for a fixed-coupon bond, we must be careful to specify the appropriate rules for calling the bond on the valuation tree. It turns out that the rule is the same for floaters and fixed-coupon bonds. Any time the bond has a PV above par at a node where the bond is callable, the bond will be called. (Here we assume a par call to simplify the illustration.)

Figure 6 Spread to Index to Price Cap at Par

Before we get into the details, it is important 
to motivate the need for a call on a floating-rate 
bond. The value of a cap to the issuer increases 
as market rates near the cap and there is the 
potential for rates to exceed the cap prior to 
maturity. As rates decline, so does the value of 
the cap. The problem for the issuer in the event 
of low rates is the additional basis-point spread 
it is paying for a cap that now has little or no 
value. Thus, when rates decline, a call has value 
to the issuer because it can call and reissue at a 
different spread. 

Suppose that the capped floater is callable at par anytime after the first year. Figure 7 provides details on the effect of the call option on valuation of the capped floater. Again, for a callable bond, when the present value exceeds par in a recursive valuation model, the bond is called. In the case of our 4-year bond, in Figure 7 the value of the bond at several lower nodes is now 100, the call price. The full effect of the call option on price is evident with today's price for the bond moving to 100.189.

The by-product of this analysis is the value of 
the call option on a capped floater. We now have 
the fair value of the capped floater versus the 
callable capped floater. So the call option has a 
value of 100.516 - 100.189 = 0.327. 

How would one structure the issue so that it
is priced at par? We have to offer a lower spread 
over the floating rate than the holder is already 
receiving for accepting the cap. In this case, we 
need to move the total spread over the 1-year 
floating rate to 13.37 bps. Figure 8 shows the 
relationship between spread and value. 
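The spread that prices each structure at par can be found by searching over the spread. The sketch below uses bisection, which is our choice of search method, and assumes the same hard-coded tree as before (the 5.0483% year-3 rate is backed out from rates quoted in this entry), a 7.25% cap on the total coupon rate, and a par call at every node from year 1 onward.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def capped_floater(spread, cap=0.0725, callable_at_par=False, par=100.0):
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        values = [
            (0.5 * (values[j] + values[j + 1])
             + par * min(rate(t, j) + spread, cap)) / (1.0 + rate(t, j))
            for j in range(t + 1)
        ]
        if callable_at_par and t >= 1:
            values = [min(par, v) for v in values]  # called when PV exceeds par
    return values[0]


def par_spread(callable_at_par, lo=0.0, hi=0.0025):
    """Bisection on the spread; the price does not fall as the spread rises."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if capped_floater(mid, callable_at_par=callable_at_par) > 100.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


callable_price = capped_floater(0.0025, callable_at_par=True)
print(f"callable capped floater at 25 bps: {callable_price:.3f}")
print(f"par spread, capped: {par_spread(False) * 1e4:.2f} bps")
print(f"par spread, callable capped: {par_spread(True) * 1e4:.2f} bps")
```

Under these assumptions the callable capped floater is worth about 100.189 at a 25 bp spread, and the par spreads come out near the 8.70 bps and 13.37 bps quoted in the text.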


VALUING CAPS AND 
FLOORS 

An interest rate cap is nothing more than a package or strip of options. More specifically, a cap is a strip of European options on interest rates. Thus, to value a cap, the value of each period's cap, called a caplet, is found and all the caplets are then summed.

Figure 7 Valuation of a Capped Floating-Rate Bond

Figure 8 Spread to Index to Price Callable Cap at Par

In order to value caps and floors, a modification of the lattice framework is required. The modification is necessary because of the timing of the payments for a cap and floor: Settlement for the typical cap and floor is paid in arrears. Payment in arrears means that the interest rate paid is determined at the beginning of the period, but the actual payment is made at the end of the period (that is, beginning of the next period). This modification complicates the notation and will not be made here but is explained in Fabozzi (2006).

To illustrate, we once again use the binomial tree given in Figure 1 to value a cap. Consider a 5.2% 3-year cap with a notional amount of $10 million. The reference rate is the 1-year rate. The payoff for the cap is annual.

The three panels in Figure 9 show how this cap is valued by valuing the three caplets. The value for the caplet for any year, say year X, is found as follows. First, calculate the payoff in year X at each node as either:

1. Zero if the 1-year rate at the node is less than or equal to 5.2%, or

2. The notional amount of $10 million times the difference between the 1-year rate at the node and 5.2% if the 1-year rate at the node is greater than 5.2%.

Then, the recursive valuation process is used to determine the value of the year X caplet.

Figure 9, Panel A: The Value of the Year 1 Caplet; Panel B: The Value of the Year 2 Caplet
(Assumptions: cap rate 5.2%; notional amount $10,000,000; payment frequency annual.)

For example, consider the year 3 caplet. At the top node in year 3 of Panel C of Figure 9, the 1-year rate is 9.1987%. Since the 1-year rate at this node exceeds 5.2%, the payoff in year 3 is:

$10,000,000 × (0.091987 - 0.052) = $399,870

For node N_HH we look at the value for the cap at the two nodes to its right, N_HHH and N_HHL. Discounting the values at these nodes, $399,870 and $233,120, by the interest rate from the binomial tree at node N_HH, 7.0053%, we arrive at a value of $295,775. That is,

Value at N_HH = [$399,870/(1.070053) + $233,120/(1.070053)]/2 = $295,775

The values at nodes N_HH and N_HL are discounted at the interest rate from the binomial tree at node N_H, 5.4289%, and then the value is computed. That is,

Value at N_H = [$295,775/(1.054289) + $155,918/(1.054289)]/2 = $214,217


Figure 9 Valuation of a Three-Year 5.2% Cap (10% Volatility Assumed)
Panel C: The Value of the Year 3 Caplet (columns: Today, Year 1, Year 2, Year 3). Value of Year 3 caplet = $150,214.
Summary: Value of 3-Year Cap = $11,058 + $66,009 + $150,214 = $227,281
Note on calculations: Payoff in last box of each figure is $10,000,000 × Maximum[(Rate at node - 5.2%), 0]


Finally, we get the value at the root, node N, which is the value of the year 3 caplet found by discounting the value at N_H and N_L by 3.5% (the interest rate at node N). Doing so gives:

Value at N = [$214,217/(1.035) + $96,726/(1.035)]/2 = $150,214

Following the same procedure, the value of the year 2 caplet is $66,009 and the value of the year 1 caplet is $11,058. The value of the cap is then the sum of the three caplets: $11,058 + $66,009 + $150,214 = $227,281. The valuation of an interest rate floor is done in the same way.
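The caplet procedure can be sketched as follows. Consistent with the simplified treatment in the text, the payoff is placed at the node where the rate is observed and no in-arrears adjustment is made. The hard-coded tree stores the lowest 1-year rate per year; the 5.0483% year-3 rate is our assumption, backed out from the 9.1987% top-node rate. Because the tree rates are rounded, the results may differ from the figures in the text by a few dollars.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def caplet_value(year, strike=0.052, notional=10_000_000):
    """Payoff at each year-`year` node, then recursive discounting to today."""
    values = [notional * max(rate(year, j) - strike, 0.0) for j in range(year + 1)]
    for t in range(year - 1, -1, -1):
        values = [
            0.5 * (values[j] + values[j + 1]) / (1.0 + rate(t, j))
            for j in range(t + 1)
        ]
    return values[0]


caplets = [caplet_value(year) for year in (1, 2, 3)]
cap_value = sum(caplets)
print([round(c) for c in caplets], round(cap_value))
```

A floor could be sketched the same way by replacing the payoff with `max(strike - rate, 0)`.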


VALUATION OF TWO MORE 
EXOTIC STRUCTURES 

The lattice-based recursive valuation methodology is robust. To further support this claim, we address the valuation of two more exotic structures: the step-up callable note and the range floater.

Valuing a Step-Up Callable Note 

Step-up callable notes are callable instruments whose coupon rate is increased (that is, "stepped up") at designated times. When the coupon rate is increased only once over the security's life, it is said to be a single step-up callable note. A multiple step-up callable note is a step-up callable note whose coupon is increased more than one time over the life of the security. Valuation using the lattice model is similar to that for valuing a callable bond described above except that the cash flows are altered at each node to reflect the coupon characteristics of a step-up note.

Suppose that a four-year step-up callable note 
pays 4.25% for two years and then 7.5% for two 
more years. Assume that this note is callable at 
par at the end of year 2 and year 3. We will use 
the binomial tree given in Figure 1 to value this 
note. 

Figure 10 shows the value of the note if it were not callable. The valuation procedure is the recursive valuation from Figure 2. The coupon in the box at each node reflects the step-up terms. The value is 102.082. Figure 11 shows that the value of the single step-up callable note is 100.031. The value of the embedded call option is equal to the difference in the optionless step-up note value and the step-up callable note value, 2.051.
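The step-up note only changes the coupon used at each node. In the sketch below, `COUPONS[t]` is the coupon paid at the end of year t + 1, and the note is callable at par at the end of years 2 and 3. The hard-coded tree (with the 5.0483% year-3 rate as our assumption) and the names are illustrative.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958, 0.050483]  # lowest 1-year rate, years 0 to 3
COUPONS = [4.25, 4.25, 7.5, 7.5]              # coupon paid at end of years 1 to 4


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def step_up_value(call_years=(), call_price=100.0, par=100.0):
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        values = [
            (0.5 * (values[j] + values[j + 1]) + COUPONS[t]) / (1.0 + rate(t, j))
            for j in range(t + 1)
        ]
        if t in call_years:
            values = [min(call_price, v) for v in values]
    return values[0]


noncallable = step_up_value()
callable_note = step_up_value(call_years=(2, 3))
print(f"{noncallable:.3f} {callable_note:.3f} {noncallable - callable_note:.3f}")
```

The results should be close to 102.082, 100.031, and 2.051, respectively.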

Now we move to another structure where the coupon floats with a reference rate, but is restricted. In this next case, a range is set in which the bond pays the reference rate when the rate falls within a specified range, but outside the range no coupon is paid.

Figure 10 Valuing a Single Step-Up Noncallable Note with Four Years to Maturity (10% Volatility Assumed)
(Step-up coupon structure: 4.25% for Years 1 and 2; 7.50% for Years 3 and 4.)

Figure 11 Valuing a Single Step-Up Callable Note with Four Years to Maturity, Callable in Two Years at 100 (10% Volatility Assumed)
(Step-up coupon structure: 4.25% for Years 1 and 2; 7.50% for Years 3 and 4. Node legend: computed value; call price if exercised, computed value if not exercised; coupon based on step-up schedule.)


Valuing a Range Note 

A range note is a security that pays the reference 
rate only if the rate falls within a band. If the 
reference rate falls outside the band, whether 
the lower or upper boundary, no coupon is paid. 
Typically, the band increases over time. 

To illustrate, suppose that the reference rate is, again, the 1-year rate and the note has three years to maturity. Suppose further that the band (or coupon schedule) is defined as in Table 2. Figure 12 shows the interest rate tree and the cash flows expected at the end of each year. Either the 1-year reference rate is paid, or nothing. In the case of this 3-year note, there is only one state in which no coupon is paid. Using our recursive valuation method, we can work back through the tree to the current value, 98.963.

Table 2 Coupon Schedule (Bands) for a Range Note

               Year 1    Year 2    Year 3
Lower Limit    3.00%     4.00%     5.00%
Upper Limit    5.00%     6.25%     8.00%
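The range note can be sketched with the same recursion. We assume that the band for a given year is tested against the 1-year rate at the start of that year and that the coupon, if any, is paid at the end of the year; this reading reproduces the value in the text. The tree rates and names are hard-coded for illustration.

```python
import math

SIGMA = 0.10
LOWS = [0.035, 0.044448, 0.046958]  # lowest 1-year rate, years 0 to 2
BANDS = [(0.03, 0.05), (0.04, 0.0625), (0.05, 0.08)]  # Table 2, years 1 to 3


def rate(t, j):
    return LOWS[t] * math.exp(2.0 * SIGMA * j)


def range_note_value(par=100.0):
    n = len(LOWS)
    values = [par] * (n + 1)
    for t in range(n - 1, -1, -1):
        lower, upper = BANDS[t]
        new_values = []
        for j in range(t + 1):
            r = rate(t, j)
            coupon = par * r if lower <= r <= upper else 0.0  # pay rate or nothing
            new_values.append((0.5 * (values[j] + values[j + 1]) + coupon) / (1.0 + r))
        values = new_values
    return values[0]


missed = [(t, j) for t in range(3) for j in range(t + 1)
          if not BANDS[t][0] <= rate(t, j) <= BANDS[t][1]]
print(f"{range_note_value():.3f}", missed)
```

Only the lowest-rate node two years forward (4.6958%, below the 5.00% lower limit) pays no coupon, which is the single state mentioned above.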

VALUING AN OPTION 
ON A BOND 

Thus far we have seen how the lattice can be used to value bonds with embedded options. The same tree can be used to value a stand-alone option on a bond.

To illustrate how this is done, consider a 2-year American call option on a 6.5% 2-year Treasury bond with a strike price of 100.25 which will be issued two years from now. We will assume that the on-the-run Treasury yields are those represented in Figure 13. Within the binomial tree we find the value of the Treasury bond at each node. Figure 14 shows the value of our hypothetical Treasury bond (excluding coupon interest) at each node at the end of year 2.

Figure 12 Valuation of a Three-Year Range Floater
(Columns: Year 0, Year 1, Year 2.)

The decision rule at a node for determining the value of an option on a bond depends on whether or not the call or put option being valued is in the money. Moreover, the exercise decision is only applied at the expiration date. That is, a call option will be exercised at the option's expiration date if the bond's value at a node is greater than the strike price. In the case of a put option, the option will be exercised if the strike price at a node is greater than the bond's value (that is, if the put option is in the money).

Three values for the underlying 2-year bond 
are shown in Figure 14: 97.925, 100.418, and 


102.534. Given these three values, the value of 
a call option with a strike price of 100.25 can 
be determined at each node. For example, if in 
year 2 the price of this Treasury bond is 97.925, 
then the value of the call option would be zero. 
In the other two cases, since the value at the 
end of year 2 is greater than the strike price, the 
value of the call option is the difference between 
the price of the bond at the node and 100.25. 

Given these values, the binomial tree is used 
to find the present value of the call option us¬ 
ing recursive valuation. The discount rates are 
the now familiar 1-year forward rates from the 
binomial tree. The expected value at each node 
for year 1 is found by discounting the call op¬ 
tion value from year 2 using the rate at the node. 
Move back one more year to "Today." The value 
of the option is 0.6056. 
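
The recursive valuation just described can be sketched in a few lines of Python. The year-2 bond values and the strike come from the text; the 1-year forward rates (3.5% today, 4.4448% and 5.4289% in year 1) and the equal 50/50 branch probabilities are assumptions made for illustration, since the node values of the figure did not survive in this text. All function and variable names are hypothetical.

```python
# Minimal sketch of recursive valuation of a call option on a bond.
# ASSUMED inputs: the forward rates below and the 50/50 branch weights.
R_TODAY = 0.035
R_YEAR1 = {"high": 0.054289, "low": 0.044448}

STRIKE = 100.25
# Year-2 Treasury values quoted in the text (HH is the highest-rate node).
BOND_YEAR2 = {"HH": 97.925, "HL": 100.418, "LL": 102.534}


def call_payoff(bond_value, strike=STRIKE):
    """Value of the call at expiration: exercised only if in the money."""
    return max(bond_value - strike, 0.0)


def call_value_today():
    """Discount the expected call value back through the tree to today."""
    payoff = {node: call_payoff(v) for node, v in BOND_YEAR2.items()}
    # Expected call value at each year-1 node, discounted at that node's rate.
    high = 0.5 * (payoff["HH"] + payoff["HL"]) / (1.0 + R_YEAR1["high"])
    low = 0.5 * (payoff["HL"] + payoff["LL"]) / (1.0 + R_YEAR1["low"])
    # Move back one more year to "Today."
    return 0.5 * (high + low) / (1.0 + R_TODAY)


print(round(call_value_today(), 4))  # 0.6056 under the assumed rates
```

For a put option, only the payoff function would change, to the excess of the strike price over the bond value.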

The same procedure is used to value a put 
option on a bond. 

EXTENSIONS 

We next demonstrate how to compute the 
option-adjusted spread, effective duration, and 
the convexity for a fixed income instrument 
with an embedded option. 

Option-Adjusted Spread 

We have concerned ourselves with valuation
to this point. However, financial market
transactions determine the actual price for a
fixed income instrument, not a series of
calculations on an interest rate lattice. If
markets are able to provide a meaningful price
(usually a function of the liquidity of the
market in which the instrument trades), this
price can be translated into an alternative
measure of relative value, the option-adjusted
spread (OAS).

Figure 13 Using the Arbitrage-Free Binomial Method
Expiration: 2 years; Strike Price: 100.25; Current Price: 104.643; Volatility Assumption: 10%
(Tree columns: Today, Year 1, Year 2. The Today and Year 1 nodes show the call value or expected call value and the rate from the binomial tree; the Year 2 nodes show the value of the call and the Treasury value at the end of year 2.)

Figure 14 Demonstration That the Option-Adjusted Spread is 35 Basis Points for a 6.5% Callable Bond
Selling at 102.218 (Assuming 10% Volatility)

The OAS for a security is the fixed spread
(usually measured in basis points) over the
benchmark rates that equates the output from
the valuation process with the actual market
price of the security. 2 For an optionless security,
the calculation of OAS is a relatively simple
iterative process. The process is much more
analytically challenging with the added complexity
of optionality. And, just as the value of the
option is volatility dependent, the OAS for a fixed
income security with embedded options or an
option-like interest rate product is volatility
dependent.

Recall our illustration in Figure 2 where the 
value of a callable bond was calculated as 
102.899. Suppose that we had information from 
the market that the price is actually 102.218. We 
need the OAS that equates the value from the 
lattice with the market price. Since the market 
price is lower than the valuation, the OAS is a 
positive spread to the rates in the figure, rates 
which we assume to be benchmark rates. 

The solution in this case is 35 basis points,
which is incorporated into Figure 14 that shows
the value of the callable bond after adding
35 basis points to each rate. The simple binomial
tree provides evidence of the complex
calculation required to determine the OAS for a
callable bond. In Figure 2, the bond is called at
N_HLL. However, once the tree is shifted 35 bps
in Figure 14, the PV of future cash flows at N_HLL
falls below the call price to 99.985, so the bond
is not called at this node. Hence, as the lattice
structure grows in size and complexity, the need
for computer analytics becomes obvious.
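
The iterative search for the OAS can be sketched as a bisection on the spread added to every rate in the tree. The sketch below is illustrative only: the four-year rate tree is an assumed input (the rates in the figures did not survive in this text), as are the equal branch probabilities, the rule that the bond is called at par whenever its value exceeds 100 in years 1 through 3, and every function name.

```python
# Minimal sketch: solve for the OAS of a 4-year 6.5% bond callable at par.
# ASSUMED 1-year rates by year; index j counts the number of "high" moves.
RATES = [
    [0.035],
    [0.044448, 0.054289],
    [0.046958, 0.057354, 0.070053],
    [0.050483, 0.061660, 0.075312, 0.091987],
]


def callable_bond_value(spread, coupon=6.5, call_price=100.0, rates=RATES):
    """Value the callable bond after adding `spread` to every rate."""
    values = [100.0] * (len(rates) + 1)  # principal at maturity, ex coupon
    for t in range(len(rates) - 1, -1, -1):
        step = []
        for j, r in enumerate(rates[t]):
            pv = (0.5 * (values[j] + values[j + 1]) + coupon) / (1.0 + r + spread)
            if t > 0:
                pv = min(pv, call_price)  # assumed call rule at each node
            step.append(pv)
        values = step
    return values[0]


def solve_oas(market_price, lo=0.0, hi=0.05, tol=1e-10):
    """Bisection: the model value falls as the spread rises."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if callable_bond_value(mid) > market_price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


oas = solve_oas(102.218)
print(round(callable_bond_value(0.0), 3), round(oas * 1e4, 1))
```

Under these assumed rates the zero-spread value agrees with the 102.899 quoted above, and the solved spread is close to the 35 basis points in the text. Because the call decision at each node can change as the spread changes, the value has to be recomputed through the whole tree at every trial spread.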


Effective Duration and Effective 
Convexity 

Duration and convexity provide a measure of 
the interest rate risk inherent in a fixed income 
security. 3 We rely on the lattice model to calcu¬ 
late the effective duration and effective convexity 
of a bond with an embedded option and other 
option-like securities. The formulas for these 
two risk measures are given below: 


\[
\text{Effective duration} = \frac{V_- - V_+}{2V_0(\Delta r)}
\]

\[
\text{Effective convexity} = \frac{V_+ + V_- - 2V_0}{2V_0(\Delta r)^2}
\]

where V_0 is the current market price of the
security, Δr is the size of the shift in rates,
and V_- and V_+ are the values derived following
a parallel shift in the yield curve down and
up, respectively, by a fixed spread. The model
adjusts for the changes in the value of the
embedded call option that result from the shift in
the curve in the calculation of V_- and V_+.

Note that the calculations must account for
the OAS of the security. Below we provide the
steps for the proper calculation of V_+. The
calculation for V_- is analogous.

Step 1: Given the market price of the issue,
calculate its OAS.

Step 2: Shift the on-the-run yield curve up by a
small number of basis points (Δr).

Step 3: Construct a binomial interest rate tree
based on the new yield curve from Step 2.

Step 4: Shift the binomial interest rate tree by
the OAS to obtain an "adjusted tree." That
is, the calculation of the effective duration
and convexity assumes a constant OAS.

Step 5: Use the adjusted tree in Step 4 to
determine the value of the bond, V_+.

We can perform this calculation for our 4-year
callable bond with a coupon rate of 6.5%,
callable at par and selling at 102.218. We computed
the OAS for this issue as 35 basis points. Figure
15 shows the adjusted tree following a shift in
the yield curve up by 25 basis points, and then
adding 35 basis points (the OAS) across the tree.
The adjusted tree is then used to value the bond.
The resulting value, V_+, is 101.621.

To determine the value of V_-, the same five
steps are followed except that in Step 2, the
on-the-run yield curve is shifted down by the same
number of basis points (Δr). It can be
demonstrated that for our callable bond, the value
for V_- is 102.765.

The results are summarized below: 


Δr = 0.0025
V_+ = 101.621
V_- = 102.765
V_0 = 102.218


Therefore,

\[
\text{Effective duration} = \frac{102.765 - 101.621}{2(102.218)(0.0025)} = 2.24
\]

\[
\text{Effective convexity} = \frac{101.621 + 102.765 - 2(102.218)}{2(102.218)(0.0025)^2} = -39.1321
\]


Notice that this callable bond exhibits negative 
convexity. 
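
As a quick check of the arithmetic, the two formulas can be evaluated directly from the values quoted above. This is only a sketch of the final step: the function names are hypothetical, and V_+ and V_- are taken as given rather than recomputed from a shifted and recalibrated tree.

```python
# Minimal sketch: effective duration and convexity from V_-, V_+, and V_0.
def effective_duration(v_minus, v_plus, v0, dr):
    """(V_- - V_+) / (2 * V_0 * dr)."""
    return (v_minus - v_plus) / (2.0 * v0 * dr)


def effective_convexity(v_minus, v_plus, v0, dr):
    """(V_+ + V_- - 2 * V_0) / (2 * V_0 * dr**2)."""
    return (v_plus + v_minus - 2.0 * v0) / (2.0 * v0 * dr ** 2)


# Values from the text: a 25 basis point shift around a price of 102.218.
duration = effective_duration(102.765, 101.621, 102.218, 0.0025)
convexity = effective_convexity(102.765, 101.621, 102.218, 0.0025)
print(round(duration, 2), round(convexity, 2))  # 2.24 -39.13
```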


Figure 15 Determination of V_+ for Calculating Effective Duration and Convexity*
* +25 basis point shift in on-the-run yield curve.


KEY POINTS

• The valuation of an option-free bond is
straightforward. However, once there is a
provision in the bond structure that grants the
issuer and/or the investor an option, valuation
becomes more difficult.

• The standard technology employed to value
bonds with embedded options that depend
on future interest rates, such as callable and
putable bonds, is the lattice framework.

• The initial step in the lattice approach is to
generate an arbitrage-free lattice or interest
rate tree from an appropriate on-the-run yield
curve.

• Based on rules specified by the modeler for
when an option will be exercised, a lattice of
future cash flows is obtained and then valued
using the interest rates in the lattice.

• The same model is used to value interest
rate-sensitive derivatives such as options on
bonds and interest rate caps and floors.

• Other useful analytical measures can be
obtained using the lattice model. These
measures include the option-adjusted spread (a
measure of relative value) and effective
duration and effective convexity (measures of
price sensitivity to changes in interest rates).


NOTES 

1. See Kalotay, Williams, and Fabozzi (1993). 

2. For a discussion of OAS, see Fabozzi (1990,
2012).

3. See Fabozzi (1999, 2012) for a discussion of 
effective duration and convexity. 
Abstract: Ubiquity of option-adjusted spread (OAS) in finance practice is remarkable, in light of the
fact that there is no general consensus on its implementation. Investors in mortgage-backed (MBS) 
and asset-backed (ABS) securities hold a long position in noncallable bonds and short positions in 
prepayment (call) options. The noncallable bond is a bundle of zero coupon bonds, and the call 
option gives the borrower the right to prepay the loan at any time prior to the scheduled principal 
repayment dates. The call option component of the valuation consists of intrinsic and time values. 
To the extent that the option embedded in ABS/MBS is a delayed American exercise style, the time 
value component associated with prepayment volatility needs to be evaluated. To evaluate this 
option, OAS analysis uses an option-based technique to price ABS/MBS under different interest 
rate scenarios. Hence, OAS is the spread differential between the zero volatility spread and option 
value components of an ABS/MBS. 


Investors and analysts continue to wrestle 
with the differences in option-adjusted-spread 
(OAS) values for securities they see from com¬ 
peting dealers and vendors. And portfolio man¬ 
agers continue to pose fundamental questions 
about OAS with which we all struggle in the fi¬ 
nancial industry. Some of the frequently asked 
questions are 

• How can we interpret the difference in deal¬ 
ers' OAS values for a specific security? 

• What is responsible for the differences? 

• Is there really a correct OAS value for a given 
security? 

In this entry, we examine some of the
questions about OAS analysis, particularly the basic
building block issues about OAS implementation.
Because some of these issues determine
"good or bad" OAS results, we believe there is a
need to discuss them. To get at these
fundamental issues, we hope to avoid sounding pedantic
by relegating most of the notations and
expressions to the endnotes.

Clearly, it could be argued that portfolio man¬ 
agers do not need to understand the OAS en¬ 
gine to use it but that they need to know how 
to apply it in relative value decisions. This ar¬ 
gument would be correct if there were market 
standards for representing and generating in¬ 
terest rates and prepayments. In the absence of 
a market standard, investors need to be familiar 
with the economic intuitions and basic assump¬ 
tions made by the underlying models. More 
important, investors need to understand what 
works for their situation and possibly identify
those situations in which one model incorrectly
values a bond. Although pass-throughs
are commoditized securities, OAS results still 
vary considerably from dealer to dealer and 
vendor to vendor. This variance is attributable 
to differences in the implementation of the re¬ 
spective OAS models. 

Unlike other market measures, for example, 
yield to maturity and the weighted average life 
of a bond, which have market standards for 
calculating their values, OAS calculations suf¬ 
fer from the lack of a standard and a black¬ 
box mentality. The lack of a standard stems 
from the required inputs in the form of inter¬ 
est rate and prepayment models that go into 
an OAS calculation. Although there are many 
different interest rate models available, there 
is little agreement on which one to use. More¬ 
over, there is no agreement on how to model 
prepayments. The black-box mentality comes 
from the fact that heavy mathematical machin¬ 
ery and computational algorithms are involved 
in the development and implementation of an 
OAS model. This machinery is often so cryptic 
that only a few initiated members of the
intellectual tribe can decipher it. In addition, dealers
invest large sums in the development of their
term structure and prepayment models and,
consequently, they are reluctant to share them.

In this entry, we review some of the proposed
term structure and prepayment models. Many of
the term structure models describe "what is" and
only suggest that the models could be used.
Which model to use perhaps depends on the
problem at hand and the resources available.
We review some of the popular term structure
models and provide some general suggestions
on which ones should not be used.

Investors in asset-backed securities (ABS) and
mortgage-backed securities (MBS) hold long
positions in noncallable bonds and short
positions in call (prepayment) options. The
noncallable bond is a bundle of zero-coupon bonds
(e.g., Treasury strips), and the call option gives
the borrower the right to prepay the mortgage
at any time prior to the maturity of the loan.
In this framework, the value of MBS is the
difference between the value of the noncallable
bond and the value of the call (prepayment)
option. Suppose a theoretical model is devel¬ 
oped to value the components of ABS/MBS. 
The model would value the noncallable com¬ 
ponent, which we loosely label the zero volatil¬ 
ity component, and the call option component. 
If interest rate and prepayment risks are well 
accounted for, and if those are the only risks 
for which investors demand compensation, one 
would expect the theoretical value of the bond 
to be equal to its market value. If these values 
are not equal, then market participants demand 
compensation for the unmodeled risks. One of 
these unmodeled risks is the forecast error asso¬ 
ciated with the prepayments. By this, we mean 
the actual prepayment may be faster or slower 
than projected by the model. Other unmodeled 
risks are attributable to the structure and liquid¬ 
ity of the bond. In this case, OAS is the market 
price for the unmodeled risks. 

To many market participants, however, OAS 
indicates whether a bond is mispriced. All else 
being equal, given that interest rate and prepay¬ 
ment risks have been accounted for, one would 
expect the theoretical price of a bond to be equal 
to its market price. If these two values are not 
equal, a profitable opportunity may exist in a 
given security or a sector. Moreover, OAS is 
viewed as a tool that helps identify which secu¬ 
rities are cheap or rich when the securities are 
relatively priced. 

The zero volatility component of ABS/MBS 
valuation is attributable to the pure interest 
rate risk of a known cash flow—a noncallable 
bond. The forward interest rate is the main 
value driver of a noncallable bond. Indeed, the 
value driver of a noncallable bond is the sum 
of the rolling yield and the value of the con¬ 
vexity. The rolling yield is the return earned if 
the yield curve and the expected volatility are 
unchanged. Convexity refers to the curvature 
of the price-yield curve. A noncallable bond 
exhibits varying degrees of positive convexity.
Positive convexity means a bond's price rises
more for a given yield decline than it falls for
the same yield increase. By unbundling the noncallable
bond components in ABS/MBS to their zero- 
coupon bond components, the rolling yield be¬ 
comes dominant. Hence, it is called the zero 
volatility component—that is, the component of 
the yield spread that is attributable to no change 
in the expected volatility. 

The call option component in ABS/MBS val¬ 
uation consists of intrinsic and time values. To 
the extent the option embedded in ABS/MBS is 
the delayed American exercise style—in other 
words, the option is not exercised immediately 
but becomes exercisable any time afterward— 
the time value component dominates. Thus, in 
valuing ABS/MBS, the time value of the op¬ 
tion associated with the prepayment volatility 
needs to be evaluated. To evaluate this option, 
OAS analysis uses an option-based technique 
to evaluate ABS/MBS prices under different 
interest rate scenarios. OAS is the spread dif¬ 
ferential between the zero volatility and op¬ 
tion value components of MBS. These values 
are expressed as spreads measured in basis 
points. 

The option component is the premium paid 
(earned) from going long (shorting) a prepay¬ 
ment option embedded in the bond. The bond¬ 
holders are short the option, and they earn the 
premium in the form of an enhanced coupon. 
Mortgage holders are long the prepayment op¬ 
tion, and they pay the premium in spread above 
the comparable Treasury. The option compo¬ 
nent is the cost associated with the variability 
in cash flow that results from prepayments over 
time. 

The two main inputs into the determination
of an OAS of a bond are obtained as follows:

• Generate the cash flow as a function of the
principal (scheduled and unscheduled) and
coupon payments.

• Generate interest rate paths under an
assumed term structure model.


At each cash flow date, a spot rate deter¬ 
mines the discount factor for each cash flow. 
The present value of the cash flow is equal to 
the sum of the product of the cash flow and 
the discount factors. 1 When dealing with a case 
in which uncertainty about future prospects is 
important, the cash flow and the spot rate need 
to be specified to account for the uncertainty. 2 
The cash flow and spot rate become a function 
of time and the state of the economy. The time 
consideration is that a dollar received now is 
worth more than one received tomorrow. The 
state of the economy consideration accounts for 
the fact that a dollar received in a good econ¬ 
omy may be perceived as worth less than a dol¬ 
lar earned in a bad economy. For OAS analysis, 
the cash flow is run through different economic 
environments represented by interest rates and 
prepayment scenarios. The spot rate, which is 
used to discount the cash flow, is run through 
time steps and interest rate scenarios. The spot 
rate represents the instantaneous rate of risk¬ 
free return at any time, so that $1 invested now 
will have grown by a later time to $1 multiplied 
by a continuously compounded rollover rate 
during the time period. 3 Arbitrage pricing the¬ 
ory stipulates the price one should pay now to 
receive $1 at later time is the expected discount 
of the payoff. 4 So by appealing to the arbitrage 
pricing theory, we are prompted to introduce an 
integral representation for the value equation; 
in other words, the arbitrage pricing theory al¬ 
lows us to use the value additivity principle 
across all interest rate scenarios. 
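
The discounting logic described in this paragraph can be sketched as follows. The example is a simplified illustration, not any dealer's OAS engine: the function names, the one-period time step, and the equal weighting of scenarios are assumptions, and both the cash flows and the short rates are supplied by the caller rather than produced by prepayment and term structure models.

```python
# Minimal sketch: scenario-by-scenario present value with a constant spread.
import math


def path_present_value(cash_flows, short_rates, spread=0.0, dt=1.0):
    """Discount one path of cash flows along its own path of short rates.

    The discount factor to each date is exp(-sum of (rate + spread) * dt),
    i.e., a continuously compounded rollover of the short rate.
    """
    pv = 0.0
    cumulative = 0.0
    for cash_flow, rate in zip(cash_flows, short_rates):
        cumulative += (rate + spread) * dt
        pv += cash_flow * math.exp(-cumulative)
    return pv


def scenario_average_value(cash_flow_paths, rate_paths, spread=0.0, dt=1.0):
    """Average the path values across equally weighted scenarios."""
    values = [
        path_present_value(cfs, rates, spread, dt)
        for cfs, rates in zip(cash_flow_paths, rate_paths)
    ]
    return sum(values) / len(values)


# Two hypothetical scenarios: rates fall (faster prepayment) or rise (slower).
cash_flow_paths = [[60.0, 50.0], [10.0, 105.0]]
rate_paths = [[0.04, 0.03], [0.04, 0.06]]
print(round(scenario_average_value(cash_flow_paths, rate_paths), 3))
```

In an OAS calculation, the `spread` argument would be adjusted until the scenario average equals the observed market price.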


IS IT EQUILIBRIUM OR AN 
ARBITRAGE MODEL? 

Market participants are guided in their invest¬ 
ment decision making by received economic 
philosophy or intuition. Investors, in general, 
look at value from either an absolute or rel¬ 
ative value basis. Absolute value basis pro¬ 
ceeds from the economic notion that the market 
clears at an exogenously determined price 
that equates supply-and-demand forces. Absolute
valuation models are usually supported
by general or partial equilibrium arguments. 
In implementing market measure models that 
depend on equilibrium analysis, the role of 
an investor's preference for risky prospects is 
directly introduced. The formidable task en¬ 
countered with respect to preference modeling 
and the related aggregation problem has ren¬ 
dered these types of models useless for most 
practical considerations. One main exception is 
the present value rule that explicitly assumes 
investors have a time preference for today's
dollar. Where the present value function is a
monotonically decreasing function of time, today's
dollar is worth more than a dollar earned
tomorrow. Earlier term structure models were
supported by equilibrium arguments, for example,
the Cox-Ingersoll-Ross (CIR) model. 5 In
particular, CIR provides an equilibrium foundation
for a class of yield curves by specifying the
endowments and preferences of traders, which,
through the clearing of competitive markets,
generates the proposed term structure model.

Relative valuation models rely on arbitrage 
and dominance principles and characterize as¬ 
set prices in terms of other asset prices. A 
well-known example of this class is the Black- 
Scholes 6 and Merton 7 option pricing model. 
Modern term structure models, for example, 
Hull-White, 8 Black-Derman-Toy (BDT), 9 and 
Heath-Jarrow-Morton (HJM), 10 are based on 
arbitrage arguments. Although relative val¬ 
uation models based on arbitrage principles 
do not directly make assumptions about in¬ 
vestors' preferences, there remains a vestige 
of the continuity of preference, for example, 
the notion that investors prefer more wealth to 
less. Thus, whereas modelers are quick in
attributing "arbitrage-freeness" to their models,
assuming there are no arbitrage opportunities
implies a continuity of preference that can be
supported in equilibrium. So, if there are no
arbitrage opportunities, the model is in
equilibrium for some specification of endowments
and preferences. The upshot is that the
distinction between equilibrium models and arbitrage
models is a stylized fetish among analysts to
demarcate models that explicitly specify
endowment and preference sets (equilibrium) and
those models that are outwardly silent about
the preference set (arbitrage). Moreover,
analysts usually distinguish equilibrium models as
those that use today's term structure as an
output and no-arbitrage models as those that use
today's term structure as an input.

Arbitrage opportunity exists in a market 
model if there is a strategy that guarantees a 
positive payoff in some state of the world with 
no possibility of negative payoff and no initial 
net investment. The presence of arbitrage op¬ 
portunity is inconsistent with economic equi¬ 
librium populated by market participants that 
have increasing and continuous preferences. 
Moreover, the presence of arbitrage opportu¬ 
nity is inconsistent with the existence of an op¬ 
timal portfolio strategy for market participants 
with nonsatiated preferences (prefer more to 
less) because there would be no limit to the 
scale at which they want to hold an arbitrage 
position. The economic hypothesis that main¬ 
tains two perfect substitutes (two bonds with 
the same credit quality and structural charac¬ 
teristics issued by the same firm) must trade 
at the same price is an implication of no arbi¬ 
trage. This idea is commonly referred to as the 
law of one price. Technically speaking, the fun¬ 
damental theorem of asset pricing is a collection 
of canonical equivalent statements that implies 
the absence of arbitrage in a market model. 
The theorem provides for weak equivalence be¬ 
tween the absence of arbitrage, the existence of 
a linear pricing rule, and the existence of op¬ 
timal demand from some market participants 
who prefer more to less. The direct consequence 
of these canonical statements is the pricing 
rule: the existence of a positive linear pricing 
rule, the existence of positive risk-neutral prob¬ 
abilities, and associated riskless rate or the ex¬ 
istence of a positive state price density. 

In essence, the pricing rule representa¬ 
tion provides a way of correctly valuing a 
security when the arbitrage opportunity is
eliminated. A fair price for a security is the
arbitrage-free price. The arbitrage-free price is used as a
benchmark in relative value analysis to the ex¬ 
tent that it is compared with the price observed 
in actual trading. A significant difference be¬ 
tween the observed and arbitrage-free values 
may indicate the following profit opportunities: 

• If the arbitrage price is above the observed 
price, all else being equal, the security is cheap 
and a long position may be called for. 

• If the arbitrage price is below the observed 
price, all else being equal, the security is rich 
and a short position may be called for. 

In practice, the basic steps in determining the 
arbitrage-free value of the security are as fol¬ 
lows: 

• Specify a model for the evolution of the un¬ 
derlying security price. 

• Obtain a risk-neutral probability. 

• Calculate the expected value at expiration us¬ 
ing the risk-neutral probability. 

• Discount this expectation using the risk-free 
rates. 
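
The four steps above can be illustrated with a one-period binomial sketch. All numbers and names here are hypothetical and chosen only to make the arithmetic transparent; the underlying is a generic security price rather than an interest rate.

```python
# Minimal sketch of the four steps in a one-period binomial setting.
def risk_neutral_probability(up, down, risk_free):
    """Probability q of the up move that makes the discounted price fair."""
    return (1.0 + risk_free - down) / (up - down)


def arbitrage_free_call_value(spot, strike, up, down, risk_free):
    # Step 1: specify the evolution of the underlying price (up or down).
    price_up, price_down = spot * up, spot * down
    # Step 2: obtain the risk-neutral probability.
    q = risk_neutral_probability(up, down, risk_free)
    # Step 3: expected value at expiration under that probability.
    expected_payoff = (q * max(price_up - strike, 0.0)
                       + (1.0 - q) * max(price_down - strike, 0.0))
    # Step 4: discount the expectation at the risk-free rate.
    return expected_payoff / (1.0 + risk_free)


value = arbitrage_free_call_value(spot=100.0, strike=100.0, up=1.1, down=0.9, risk_free=0.05)
print(round(value, 4))  # 7.1429
```

The no-arbitrage condition requires `down < 1 + risk_free < up`; otherwise `q` falls outside the interval from 0 to 1 and no risk-neutral probability exists.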

In studying the solution to the security val¬ 
uation problem in the arbitrage pricing frame¬ 
work, analysts usually use one of the following: 

• Partial differential equation (PDE) frame¬ 
work 

• Equivalent martingale measure framework 

The PDE framework is a direct approach and 
involves constructing a risk-free portfolio, then 
deriving a PDE implied by the lack of arbitrage 
opportunity. The PDE is solved analytically or 
evaluated numerically. 11 

Because there are few analytical solutions
for pricing PDEs, most of them are evaluated
using numerical methods such as lattice,
finite difference, and Monte Carlo. The
equivalent martingale measure framework uses the
notion of arbitrage to determine a probability
measure under which security prices are
martingales once discounted. The new probability
measure is used to calculate the expected value
of the security at expiration, which is then
discounted at the risk-free rate.


WHICH IS THE RIGHT 
MODEL OF THE INTEREST 
RATE PROCESS? 

The bare essential of the bond market is a
collection of zero-coupon bonds, one for each
date (for example, now) and each later maturity.
A zero-coupon bond with a given maturity date is
a contract that guarantees the investor $1 to be
paid at maturity. The price of a zero-coupon bond
at time t with a maturity date of T is denoted by
P(t, T). In general, analysts make the following
simplifying assumptions about the bond
market:

• There exists a frictionless and competitive 
market for a zero-coupon bond for every ma¬ 
turity date. By a frictionless market, we mean 
there is no transaction cost in buying and sell¬ 
ing securities and there is no restriction on 
trades such as a short sale. 

• For every fixed date, the price of a
zero-coupon bond, {P(t, T); 0 ≤ t ≤ T}, is a
stochastic process with P(t, t) = 1 for all t.
By stochastic process, we mean the price of a
zero-coupon bond moves in an unpredictable
fashion from the date it was bought until it
matures. The present value of a zero-coupon
bond when it was bought is known for certain
and it is normalized to equal one.

• For every fixed date, the price for a zero- 
coupon bond is continuous in that at every 
trading date the market is well bid for the 
zero-coupon bond. 

In addition to zero-coupon bonds, the bond 
market has a money market (bank account) ini¬ 
tialized with a unit of money. 12 The bank ac¬ 
count serves as an accumulator factor for rolling 
over the bond. 
A term structure model establishes a
mathematical relationship that determines the price
of a zero-coupon bond, {P(t, T); 0 ≤ t ≤ T},
for all dates t between the time the bond is
bought (time 0) and when it matures (time T).
Alternatively, the term structure shows the re¬ 
lationship between the yield to maturity and 
the time to maturity of the bond. To compute 
the value of a security dependent on the term 
structure, one needs to specify the dynamic of 
the interest rate process and apply an arbitrage 
restriction. A term structure model satisfies the 
arbitrage restriction if there is no opportunity 
to invest risk-free and be guaranteed a positive 
return. 13 

To specify the dynamic of the interest rate 
process, analysts have always considered a 
dynamic that is mathematically tractable and 
anchored in sound economic reasoning. The 
basic tenet is that the dynamic of interest rates 
is governed by time and the uncertain state 
of the world. Modeling time and uncertainty 
are the hallmarks of modern financial theory. 
The uncertainty problem has been modeled 
with the aid of the probabilistic theory of the 
stochastic process. The stochastic process
models the occurrence of random phenomena; in
other words, the process is used to describe
unpredictable movements. The stochastic process
is a collection of random variables that take
values in the state space. The basic elements
distinguishing a stochastic process are state space 14
and index parameter, 15 and the dependent
relationship among the random variables (e.g.,
X_t). 16 The Poisson process and Brownian
motion are two fundamental examples of
continuous time stochastic processes.

In everyday financial market experiences, one
may observe, at a given instant, three possible
states of the world: Prices may go up a tick,
go down a tick, or not change. The ordinary
market condition characterizes most trading
days; however, security prices may from
time to time exhibit extreme behavior. In
financial modeling, there is the need to distinguish
between rare and normal events. Rare events
usually bring about discontinuity in prices. The
Poisson process is used to model jumps caused
by rare events and is a discontinuous process.
Brownian motion is used to model ordinary
market events for which extremes occur only
infrequently according to the probabilities in
the tail areas of the normal distribution. 17

Brownian motion is a continuous martingale. 
Martingale theory describes the trend of an ob¬ 
served time series. A stochastic process behaves 
like a martingale if its trajectories display no 
discernible trends. 

• A stochastic process that, on average,
increases is called a submartingale.

• A stochastic process that, on average, declines
is called a supermartingale.

Suppose one has an interest in generating a
forecast of a process (e.g., the interest rate R_t)
by expressing the forecast based on what has
been observed about R given the information
available (e.g., F_t) at time t. 18 This type of
forecast, which is based on conditioning on
information observed up to a time, has a role in
financial modeling. This role is encapsulated in
a martingale property. 19 A martingale is a
process whose expected future values,
conditional on current information, are equal
to the value of the process at present. A
martingale embodies the notion of a fair gamble:
The expected gain from participating in a
family of fair gambles is always zero and, thus, the
accumulated wealth does not change in
expectation over time. Note the actual price of a
zero-coupon bond does not move like a martingale.
Asset prices move more like submartingales
or supermartingales. The usefulness of
martingales in financial modeling stems from the fact
that one can find a probability measure that is
absolutely continuous with respect to the objective
probability such that bond prices discounted by a
risk-free rate become martingales. The probability
measures that convert discounted asset prices
into martingales are called equivalent
martingale measures. The basic idea is that, in the
absence of an arbitrage opportunity, one can
find a synthetic probability measure Q
absolutely continuous with respect to the original
measure P so that all properly discounted
asset prices behave as martingales. A
fundamental theorem that allows one to transform R_t
into a martingale by switching the
probability measure from P to Q is called the Girsanov
theorem.
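
A discrete, one-period analog may help fix the idea of switching from P to Q. It is not the Girsanov theorem itself, which concerns continuous-time processes driven by Brownian motion; the two-state setup, the probabilities, and all names below are hypothetical.

```python
# Minimal sketch: change of measure in a two-state, one-period market.
SPOT, UP, DOWN, RISK_FREE = 100.0, 1.1, 0.9, 0.05
P_UP = 0.6                                   # assumed objective probability
Q_UP = (1.0 + RISK_FREE - DOWN) / (UP - DOWN)  # risk-neutral probability

outcomes = [SPOT * UP, SPOT * DOWN]
p = [P_UP, 1.0 - P_UP]
q = [Q_UP, 1.0 - Q_UP]


def discounted_expectation(probabilities, payoffs):
    """Expected payoff under the given measure, discounted one period."""
    expected = sum(w * x for w, x in zip(probabilities, payoffs))
    return expected / (1.0 + RISK_FREE)


# Likelihood ratio dQ/dP state by state: reweights P-expectations into Q.
likelihood_ratio = [qi / pi for qi, pi in zip(q, p)]
reweighted = [z * x for z, x in zip(likelihood_ratio, outcomes)]

under_p = discounted_expectation(p, outcomes)    # not equal to SPOT
under_q = discounted_expectation(q, outcomes)    # equal to SPOT
via_ratio = discounted_expectation(p, reweighted)  # same as under_q
print(round(under_p, 4), round(under_q, 4), round(via_ratio, 4))
```

Under P the discounted price differs from today's price, so the discounted price is not a martingale; under Q it equals today's price. Both measures assign positive probability to the same states, which is what equivalence means in this finite setting.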

The powerful assertion of the Girsanov the¬ 
orem provides the ammunition for solving a 
stochastic differential equation driven by Brow¬ 
nian motion in the following sense: By chang¬ 
ing the underlying probability measure, the 
process that was driving the Brownian motion 
becomes, under the equivalent measure, the 
solution to the differential equation. In finan¬ 
cial modeling, the analog to this technical re¬ 
sult says that in a risk-neutral economy assets 
should earn a risk-free rate. In particular, in the 
option valuation, assuming the existence of a 
risk-neutral probability measure allows one to 
dispense with the drift term, which makes the 
diffusion term (volatility) the dominant value 
driver. 

To model the dynamics of interest rates, it is
generally assumed that the change in rates over
an instant of time is the sum of a drift term and
a diffusion term (see Figure 1; note 20). The drift
term can be seen as the average movement of
the process over the next instant of time, and
the diffusion term is the amplitude (width) of the
movement. If the first two moments are sufficient
to describe the distribution of the asset
return, the drift term accounts for the mean
rate of return and the diffusion term accounts for
the standard deviation (volatility). Empirical
evidence suggests that interest rates tend to
move back to some long-term average, a phenomenon
known as mean reversion that corresponds
to the Ornstein-Uhlenbeck process (see
Figure 2; note 21). When rates are high, mean
reversion tends to give interest rates a negative
drift; when rates are low, it tends to give them
a positive drift.
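The sign behavior of the mean-reverting drift can be illustrated with a short simulation. The sketch below is only an illustration, assuming the Ornstein-Uhlenbeck form dr = a(b - r)dt + sigma dW with a simple Euler time-stepping scheme; the function names, parameter values, and the discretization are assumptions, not part of the original text.

```python
import math
import random


def drift_term(r, a, b):
    """Mean-reverting drift a*(b - r): negative above the level b, positive below it."""
    return a * (b - r)


def simulate_mean_reverting_rate(r0, a, b, sigma, dt, n_steps, seed=0):
    """Euler discretization of dr = a*(b - r)*dt + sigma*dW (illustrative only)."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        drift = drift_term(r, a, b) * dt                      # average movement
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0, 1)   # random amplitude
        path.append(r + drift + diffusion)
    return path


if __name__ == "__main__":
    # Hypothetical parameters: reversion speed 0.5, long-term level 5%.
    print(drift_term(0.10, a=0.5, b=0.05))   # high rate -> negative drift
    print(drift_term(0.01, a=0.5, b=0.05))   # low rate  -> positive drift
    path = simulate_mean_reverting_rate(0.10, 0.5, 0.05, 0.01, 1 / 12, 120)
    print(path[-1])
```

With the diffusion switched off (sigma = 0), the simulated path decays steadily toward the level b, which isolates the effect of the drift term.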

The highlights of the preceding discussion are 
as follows: 

• The modeler begins by decomposing bonds to
their bare essentials, which are zero-coupon
bonds.

• To model a bond market that consists of zero-coupon
bonds, the modeler makes some simplifying
assumptions about the structure of
the market and the price behaviors.

• A term structure model establishes a mathemat¬ 
ical relationship that determines the price of a 
zero-coupon bond and, to compute the value 
of a security dependent on the term structure, 
the modeler needs to specify the dynamic of 
the interest rate process and apply arbitrage 
restriction. 

• The stochastic process is used to describe the 
time and uncertainty components of the price 
of zero-coupon bonds. 



Figure 1 Drift and Diffusion 









Figure 2 Process with Mean Reversion (Ornstein-Uhlenbeck Process) 


• There are two basic types of stochastic pro¬ 
cesses used in financial modeling: The Pois¬ 
son process is used to model jumps caused 
by rare events, and Brownian motion is used 
to model ordinary market events for which 
extremes occur only infrequently. 

• We assume the market for zero-coupon bonds
is well bid, that is, the zero-coupon price
is continuous. Brownian motion is the suitable
stochastic process to describe the evolution
of interest rates over time. In particular,
Brownian motion is a continuous martingale.
Martingale theory describes the trend of the
observed time series.

• Once we specify the evolution of interest rate
movements, we need an arbitrage pricing theory
that tells us the price one should pay
now to receive $1 later is an expected discounted
payoff. The issue to be resolved is:
What are the correct expected discount factors
to use? The discount must be determined by
the market and based on risk-adjusted probabilities.
In particular, when all bonds are properly
risk-adjusted, they should earn risk-free
rates; if not, an arbitrage opportunity exists to
earn a riskless profit.

• The risk-adjusted probability consistent with
the no-arbitrage condition is the equivalent
martingale measure; it is the probability measure
that converts the discounted bond price
to a martingale (fair price). The elegance of
martingale theory is that the "rough and tumble"
one finds in the world of partial differentiation
is to some extent avoided, and the
integral representation it allows fits nicely
with Monte Carlo simulations.

Several term structure models have been pro¬ 
posed with subtle differences. However, the ba¬ 
sic differences amount to how the dynamic of 
the interest rate is specified, the number of fac¬ 
tors that generate the rate process, and whether 
the model is closed by equilibrium or arbitrage 
arguments. 

Which of these models to use in OAS analy¬ 
sis depends on the available resources. Where 
resource availability is not an issue, we favor 
models that account for the path-dependent 
nature of mortgage cash flows. Good rules-of- 
thumb in deciding which model to use are as 
follows: 

• Flexibility: How flexible is the model? 

• Simplicity: Is the model easy to understand? 

• Specification: Is the specification of the interest 
rate process reasonable? 

• Realism: How real is the model? 

• Good fit: How well does the result fit the mar¬ 
ket data? 
• Internal consistency rule: A necessary condition
for the existence of market equilibrium is the
absence of arbitrage, and the external consistency
rule requires models to be calibrated to
market data.


TERM STRUCTURE MODELS: 
WHICH IS THE RIGHT 
APPROACH FOR OAS? 

Numerical schemes are constructive or algo¬ 
rithmic methods for obtaining practical solu¬ 
tions to mathematical problems. They provide 
methods for effectively finding practical solu¬ 
tions to asset pricing PDEs. 

The first issue in a numerical approach is
discretization. The main objective of discretizing
a problem is to reduce it from a continuous-parameter
formulation to an equivalent discrete
parameterization in a way that makes it
amenable to practical solution. In financial valuation,
one generally speaks of a continuous-time
process in an attempt to find an analytical
solution to a problem; however, nearly all
practical solutions are obtained by discretizing
space and time. Discretization involves finding
numerical approximations to the solution at
some given points rather than on a continuous
domain.

Numerical approximation may involve the 
use of a pattern, lattice, network, or mesh of 
discrete points in place of the (continuous) 
whole domain, so that only approximate solu¬ 
tions are obtained for the domain in the iso¬ 
lated points, and other values such as integrals 
and derivatives can be obtained from the dis¬ 
crete solution by the means of interpolation and 
extrapolation. 

With the discretization of the continuous domain
come the issues of adequacy, accuracy,
convergence, and stability. How faithfully these
issues are addressed in the implementation
of OAS models bears directly on the quality
of the results achieved. Although all three numerical
techniques (lattice methods, finite difference
methods, and Monte Carlo methods) have
been used to solve asset pricing PDEs, the lattice
and Monte Carlo methods are more in vogue in
OAS implementations.


Lattice Method 

The most popular numerical scheme used by fi¬ 
nancial modelers is the lattice (or tree) method. A 
lattice is a nonempty collection of vertices and 
edges that represent some prescribed mathe¬ 
matical structures or properties. The node (ver¬ 
tex) of the lattice carries particular information 
about the evolution of a process that generates 
the lattice up to that point. An edge connects 
the vertices of a lattice. A lattice is initialized 
at its root, and the root is the primal node that 
records the beginning history of the process. 

The lattice model works in a discrete framework
and calculates expected values on a discrete
space of paths. A bushy (nonrecombining)
tree represents every path in the state space and
can numerically value path-dependent claims.
A node in a given path of a bushy tree distinguishes
not only the value of the underlying
claim there but also the history of the path up to
the node. There is a great cost in constructing a
bushy tree model. For example, modeling a 10-year
Treasury rate in a binary bushy tree with
each time period equal to one coupon payment
would require a tree with $2^{20}$ (1,048,576) paths.
Figure 3 shows a schematic of a bushy tree.
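The cost of a bushy tree can be made concrete by counting. The sketch below assumes semiannual coupons, so that 10 years correspond to 20 periods (the assumption under which the figure of 1,048,576 paths arises), and compares the number of paths in a binary bushy tree with the number of nodes in a recombining binomial lattice of the same depth. The function names are illustrative.

```python
def bushy_tree_paths(n_periods, branches=2):
    """Number of distinct paths in a nonrecombining tree."""
    return branches ** n_periods


def recombining_nodes(n_periods):
    """Total nodes in a recombining binomial lattice: period t has t + 1 nodes."""
    return (n_periods + 1) * (n_periods + 2) // 2


if __name__ == "__main__":
    periods = 10 * 2  # 10 years, two coupon periods per year (assumed)
    print(bushy_tree_paths(periods))   # 1048576
    print(recombining_nodes(periods))  # 231
```

The recombining lattice is far cheaper, but its nodes no longer record the path taken to reach them, which is why path-dependent claims are harder to value on it.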

In a lattice construction, it is usually assumed
that the time to maturity of the security, T, can be
divided into M discrete (finite and equal) time-steps
of length $\Delta t = T/M$. The price of the underlying
security is assumed to have a finite number
of "jumps" (or up-and-down movements) N
between the time-steps $\Delta t$. In a recombining
lattice, the price or yield of the underlying
security is assumed to be affected by N and not
the sequences of the jumps. For computational
ease, N is usually set to be two or three; the case
where N = 2 is called the binomial lattice (or tree),
and N = 3 is the trinomial lattice. Figures 4 and
5 show the binomial and trinomial lattices, respectively,
for the price of a zero-coupon bond.
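A recombining binomial lattice can be sketched in a few lines. The example below is a deliberately simplified illustration and not any of the named term structure models: it assumes an additive short-rate lattice r(n, i) = r0 + (2i - n) * sigma * sqrt(dt) with equal up and down probabilities of 1/2, and it is not calibrated to a market curve. The zero-coupon price is obtained by backward induction from the $1 payoff at maturity.

```python
import math


def zero_coupon_price_binomial(r0, sigma, T, M):
    """Price of $1 paid at T on a recombining binomial short-rate lattice.

    Assumptions (illustrative): additive rate moves of sigma*sqrt(dt),
    risk-neutral probability 1/2 for each branch, no drift calibration.
    """
    dt = T / M
    step = sigma * math.sqrt(dt)
    values = [1.0] * (M + 1)  # payoff of $1 at every terminal node
    for n in range(M - 1, -1, -1):
        values = [
            # discount at the node's own short rate, average the two successors
            math.exp(-(r0 + (2 * i - n) * step) * dt)
            * 0.5 * (values[i + 1] + values[i])
            for i in range(n + 1)
        ]
    return values[0]


if __name__ == "__main__":
    print(zero_coupon_price_binomial(0.05, 0.00, T=5, M=50))  # flat-rate case
    print(zero_coupon_price_binomial(0.05, 0.01, T=5, M=50))  # with rate volatility
```

Because up and down moves recombine, period n has only n + 1 nodes, so the work grows quadratically in M rather than exponentially.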


Monte Carlo Method 

The Monte Carlo method is a numerical scheme
for solving mathematical models that involve
random sampling. This scheme has been used
to solve problems that are either deterministic
or probabilistic in nature. In the most common
application, the Monte Carlo method uses random
or pseudo-random numbers to simulate
random variables. Although the Monte Carlo
method provides flexibility in dealing with a
probabilistic problem, it is not precise, especially
when one desires the highest level of accuracy
at a reasonable cost and time.

Figure 4 Binomial Lattice for the Price of a Zero-Coupon Bond

Figure 5 Trinomial Lattice for the Price of a Zero-Coupon Bond

Aside from this drawback, the Monte Carlo 
method has been shown to offer the following 
advantages: 

• It is useful in dealing with multidimensional 
problems and boundary value problems with 
complicated boundaries. 

• Problems with random coefficients, random 
boundary values, and stochastic parameters 
can be solved. 

• Solving problems with discontinuous bound¬ 
ary functions, nonsmooth boundaries, and 
complicated right-hand sides of equations 
can be achieved. 

The application of the Monte Carlo method 
in computational finance is predicated on the 
integral representation of security prices. The 
approach taken consists of the following: 

• Simulating in a manner consistent with a risk- 
neutral probability (equivalent martingale) 
measure the sample path of the underlying 
state variables 

• Evaluating the discounted payoff of the secu¬ 
rity on each sample path 

• Taking the expected value of the discounted 
payoff over the entire sample paths 
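The three steps above can be sketched for the simplest possible security, a zero-coupon bond paying $1 at maturity. The example assumes, purely for illustration, that the short rate follows a mean-reverting process dr = a(b - r)dt + sigma dW under the risk-neutral measure and uses an Euler scheme; the function name and parameter values are hypothetical.

```python
import math
import random


def mc_zero_coupon_price(r0, a, b, sigma, T, n_steps, n_paths, seed=0):
    """Monte Carlo price of $1 paid at T under an assumed risk-neutral rate process."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        r = r0
        integral = 0.0
        # Step 1: simulate one sample path of the state variable (short rate).
        for _ in range(n_steps):
            integral += r * dt
            r += a * (b - r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Step 2: discounted payoff of the security on this path.
        total += math.exp(-integral)
    # Step 3: average the discounted payoffs over all sample paths.
    return total / n_paths


if __name__ == "__main__":
    print(mc_zero_coupon_price(0.05, 0.5, 0.05, 0.01, T=5, n_steps=60, n_paths=2000))
```

For a path-dependent security such as a mortgage-backed security, step 2 would replace the single $1 payoff with the cash flows generated along the path.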
The Monte Carlo method computes a multidimensional
integral: the expected value of
discounted cash flows over the space of sample
paths. For example, let f(x) be an integrable function
over the d-dimensional unit hypercube; then
a simple (or crude) estimate of the integral is
equal to the average value of the function f
over n points selected at random (more appropriately,
pseudorandom) from the unit hypercube.
By the law of large numbers (note 22), the Monte
Carlo estimate converges to the value of the
integral as n tends to infinity. Moreover, we know
from the central limit theorem that the standard
error of the estimate tends toward zero as
$1/\sqrt{n}$. To improve on the computational
efficiency of the crude Monte Carlo method, there
are several variance-reduction techniques available.
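The crude estimator and its $1/\sqrt{n}$ standard error can be sketched as follows. The integrand used in the usage lines (the sum of the coordinates, whose exact integral over the d-dimensional unit hypercube is d/2) is a hypothetical test function chosen only because its true value is known.

```python
import math
import random


def crude_monte_carlo(f, d, n, seed=0):
    """Crude Monte Carlo estimate of the integral of f over the unit hypercube.

    Returns (estimate, standard_error); the standard error shrinks like 1/sqrt(n).
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # pseudorandom point in [0, 1)^d
        y = f(x)
        total += y
        total_sq += y * y
    mean = total / n
    sample_var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
    return mean, math.sqrt(sample_var / n)


if __name__ == "__main__":
    for n in (1_000, 4_000, 16_000):
        estimate, std_err = crude_monte_carlo(sum, d=3, n=n)
        print(n, estimate, std_err)  # quadrupling n roughly halves the error
```

Variance-reduction techniques aim to shrink the constant in front of $1/\sqrt{n}$; they do not change the rate itself.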


IS THERE A RIGHT WAY TO 
MODEL PREPAYMENTS? 

Because cash flows are one of the most impor¬ 
tant inputs in determining the value of a se¬ 
curity, there has to be a model for cash flow. 
The cash flow model consists of a model for 
distributing the coupon and scheduled princi¬ 
pal payments to the bondholders, as contained 
in the deal prospectus, and a prepayment model 
that projects unscheduled principal payments. 
The basic types of prepayment models are as 
follows: 

• Rational prepayment models. These models ap¬ 
ply an option-theoretic approach and link 
prepayment and valuation in a single unified 
framework. 

• Econometric prepayment models. This class of 
models is based on econometric and statistical 
analysis. 

• Reduced-form prepayment models. This type of 
model uses past prepayment rates and other 
endogenous variables to explain current pre¬ 
payment. It fits the observed prepayment 
data, unrestricted by theoretical considera¬ 
tion. 


The reduced-form prepayment model is the 
most widely used approach among dealers and 
prepayment vendors because of its flexibility 
and unrestricted calibration techniques. The ba¬ 
sic determinants of the voluntary and invol¬ 
untary components of total prepayments are 
collateral and market factors. Collateral fac¬ 
tors are the origination date, weighted average 
coupon (WAC), and weighted average matu¬ 
rity, and the market-related factors are bench¬ 
mark rates and spreads. 

KEY POINTS 

• There are foundational issues that explain (1) 
why there is a difference in dealers' OAS val¬ 
ues for a specific bond, (2) what may be re¬ 
sponsible for the differences, and (3) why one 
OAS value may be more correct than another. 

• As a general guideline, portfolio managers 
should become familiar with the economic in¬ 
tuitions and basic assumptions made by the 
models. 

• The reasonableness of the OAS values pro¬ 
duced by different models should be consid¬ 
ered. Moreover, because prepayment options 
are not traded in the market, calibrating OAS 
values using the prices of these options is not 
possible. 

• Interest rate models, which are closed by 
precluding arbitrage opportunities, are more 
tractable and realistic. 

• Interest rate models that account for the path- 
dependent natures of ABS and MBS cash 
flows are more robust. 

• With the path-dependent natures of ABS and 
MBS cash flows come the difficulties of imple¬ 
mentation, in particular, the speed of calcula¬ 
tion; the toss-up here is between the lattice 
and Monte Carlo schemes. 

• There is a tendency for market participants to 
believe that because we are talking about in¬ 
terest rate scenarios, the ideal candidate for 
the job would be Monte Carlo techniques, 
but this should not necessarily be the case. 
Although lattice implementation could do a
good job, the success of this scheme depends
highly on ad hoc techniques that have not
been time-tested. Hence, whereas the OAS
implementation scheme is at the crux of what
distinguishes good or bad results, the preferred
scheme is an open question that critically
depends on available resources.

• Reduced-form prepayment models should be
favored because of their flexibility and unrestricted
calibration techniques. In particular,
a model that explicitly identifies its control
parameters and is amenable to the perturbation
of these parameters is more robust and
transparent.

• With respect to how to interpret the differences
in dealers' OAS value for a specific
security, decisions by dealers, vendors, and
portfolio managers to choose one interest rate
and prepayment model over others and the
different approaches they take in implementing
these models largely account for the wide
variance in OAS results. Moreover, to complicate
the issue, the lack of a market for tradable
prepayment options makes calibrating the resulting
OAS values dicey at best.

• As for whether there is a correct OAS value
for a given security, examining the change in
OAS value over time, the sensitivity of OAS
parameters, and their implications for relative
value analysis are some of the important indicators
of the reasonableness of the OAS value.


NOTES 


1. In the world of certainty, the present value
is

$$PV = \sum_{i=1}^{n} \frac{cf_i}{(1 + r_i)^{i}}$$

where $r_i$ is the spot rate applicable to cash
flow $cf_i$. In terms of forward rates, the equation
becomes

$$PV = \sum_{i=1}^{n} \frac{cf_i}{(1 + f_1)(1 + f_2)\cdots(1 + f_i)}$$

where $f_i$ is the forward rate applicable to
cash flow $cf_i$.


2 . 


3. 

4. 

5. 

6 . 

7. 

8 . 
9. 

10 . 

11 . 


The present value formula becomes more
complicated and could be represented as

$$PV_{\Omega} = \sum_{\omega_i}^{\Omega} \sum_{t_i}^{T} \frac{cf(t_i, \omega_i)}{1 + r(t_i, \omega_i)}, \qquad \forall i = 1, 2, \ldots, N$$

where

$PV_{\Omega}$ = the present value of uncertain
cash flow

$cf(t_i, \omega_i)$ = the cash flow received at time
$t_i$ and state $\omega_i$

$r(t_i, \omega_i)$ = the spot rate applicable at time
$t_i$ and state $\omega_i$

For OAS analysis, a stylized version of the
previous equation is given by

$$PV_{\Omega} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \frac{cf_i(t_i, \omega_i)}{1 + r(t_i, \omega_i)}$$



Cox, Ingersoll, and Ross (1985). 

Black and Scholes (1973). 

Merton (1974). 

Hull and White (1990). 

Black, Derman, and Toy (1990). 

Heath, Jarrow, and Morton (1992). 

For example, the PDE for a zero-coupon
bond price is

$$\frac{\partial p}{\partial t} + \frac{1}{2}\sigma^{2}\frac{\partial^{2} p}{\partial r^{2}} + (\mu - \lambda\sigma)\frac{\partial p}{\partial r} - rp = 0$$

where

p = zero-coupon price
r = instantaneous risk-free rate
$\mu$ = the drift rate
$\sigma$ = volatility
$\lambda$ = market price of risk

To solve the zero-coupon price PDE,
we must state the final and boundary
conditions. The final condition that corresponds
to payoff at maturity is p(r, T) = k.

12. The bank account is denoted by

$$B(t) = \exp\left(\int_{0}^{t} r(s)\,ds\right)$$

and B(0) = 1.

13. Technically, the term structure model is said
to be arbitrage-free if and only if there is a
probability measure Q on $\Omega$ (Q ~ P) with
the same null set as P, such that for each T, the process

$$Z(t, T) = \frac{P(t, T)}{B(t)}, \qquad 0 \le t \le T$$

is a martingale under Q.

14. State space is the space in which the possible
values of $X_t$ lie. Let S be the state space.
If S = {0, 1, 2, ...}, the process is called a
discrete-state process. If S = $\mathbb{R}$ = $(-\infty, \infty)$, that is,
the real line, the process is called a
real-valued stochastic process. If S is Euclidean
d-space, then the process is called
a d-dimensional process.

15. Index parameter: If T = {0, 1, ...}, then $X_t$ is
called a discrete-time stochastic process.
If T = $\mathbb{R}_{+}$ = $[0, \infty)$, then $X_t$ is called a
continuous-time stochastic process.

16. Formally, a stochastic process is a family of
random variables $X = \{x_t;\ t \in T\}$, where T
is an ordered subset of the positive real line
$\mathbb{R}_{+}$. A stochastic process X with a time set [0,
T] can be viewed as a mapping from $\Omega \times [0, T]$
to $\mathbb{R}$, with $x(\omega, t)$ denoting the value of
the process at time t and state $\omega$. For each
$\omega \in \Omega$, $\{x(\omega, t);\ t \in [0, T]\}$ is a sample path of
X, sometimes denoted as $x(\omega, \cdot)$. A stochastic
process $X = \{x_t;\ t \in [0, T]\}$ is said to
be adapted to filtration F if $x_t$ is measurable
with respect to $F_t$ for all $t \in [0, T]$. The
adaptedness of a process is an informational
constraint: The value of the process at any time
t cannot depend on the information yet to
be revealed strictly after t.

17. A process X is said to have independent
increments if the random variables $x(t_1) - x(t_0)$,
$x(t_2) - x(t_1)$, ..., and $x(t_n) - x(t_{n-1})$ are
independent for any $n \ge 1$ and $0 \le t_0 < t_1 < \cdots < t_n \le T$.
A process X is said to
have stationary independent increments if,
moreover, the distribution of x(t) - x(s) depends
only on t - s. We write $z \sim N(\mu, \sigma^{2})$
to mean the random variable z has a normal
distribution with mean $\mu$ and variance $\sigma^{2}$.
A standard Brownian motion W is a process
having continuous sample paths, stationary
independent increments, and $W(t) \sim N(0, t)$
(under probability measure P). Note that
if X is a continuous process with stationary
and independent increments, then X is
a Brownian motion. The strong Markov property
is a memoryless property of Brownian
motion. Given X as a Markov process, the
past and future are statistically independent
when the present is known.

18. We write

$$E_t[R_T] = E[R_T \mid F_t], \qquad t < T$$

19. More concretely, given a probability space,
a process $\{R_t,\ t \in (0, \infty)\}$ is a martingale with
respect to information sets $F_t$ if, for all t > 0,

1. $R_t$ is known, given $F_t$; that is, $R_t$ is $F_t$-adapted

2. The unconditional forecast is finite:
$E|R_t| < \infty$

3. And if

$$E_t[R_T] = R_t, \qquad \forall\, t < T$$

with a probability of 1. The best forecast of
an unobserved future value is the last observation
on $R_t$.

20. In particular, assume

$$dX(t) = \alpha(t, X(t))\,dt + \beta(t, X(t))\,dW(t)$$

for which the solution X(t) is the factor.
Depending on the application, one can
have n factors, in which case we let X
be an n-dimensional process and W an
n-dimensional Brownian motion. Assume
the stochastic differential equation for X(t)
describes the interest rate process r(t) (i.e., r(t)
is a function of X(t)). A one-factor model of
the interest rate is

$$dr(t) = \alpha(t)\,dt + \beta(t)\,dW(t)$$

21. This process is represented as

$$dr = a(b - r)\,dt + \sigma r^{\beta}\,dW$$

where a and b are called the reversion speed
and level, respectively.

22. Strong Law of Large Numbers. Let $X = X_1,
X_2, \ldots$ be a sequence of independent, identically
distributed random variables with $E(X^{2}) < \infty$;
then the mean of the sequence up to the nth
term, though itself a random variable, tends,
as n gets larger and larger, to the expectation
of X with probability 1. That is,

$$P\left(\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} X_i = E(X)\right) = 1$$
Convertible Bonds


FILIPPO STEFANINI 

Head of Hedge Funds and Manager Selection, Eurizon Capital SGR 


Abstract: Convertible bonds are bonds that give their holders the right to periodic coupon payments 
and, as of a fixed date, the right to convert the bonds into a fixed number of shares. If the bondholder 
decides to exercise his conversion right, instead of being paid back the par value of the bonds, he will 
receive a fixed number of shares in exchange. There are several options embedded in a convertible 
bond. There is obviously a call option on the underlying stock. All convertible bonds are callable. 
A convertible bond may be putable. The presence of all of these options complicates the valuation 
of convertible bonds. There are models that practitioners use for valuation purposes. These models 
are classified as analytical models and numerical models. 


Convertibles are ideal securities for arbitrage, 
because the convertible itself, namely the un¬ 
derlying stock and the associated derivatives, 
are traded along predictable ratios, and any dis¬ 
crepancy or misprice would give rise to arbi¬ 
trage opportunities for fund managers. Traders 
use quantitative models to identify convert¬ 
ible bonds whose market value differs from 
their theoretical price. However, unlike callable 
bonds or putable bonds that have interest 
rate-embedded options, a convertible bond 
also has an embedded equity option. This com¬ 
plicates the quantitative modeling of these se¬ 
curities. 

Quantitative models, or valuation models, for 
convertible bonds are divided into two cate¬ 
gories: analytical models and numerical models. In 
this entry, we describe the more commonly used 
model in both of these categories. 


ANALYTICAL MODELS 

Ingersoll (1977) proposed a valuation model for 
convertible bonds based on the option theory 
and on the Black-Scholes option pricing model. 
The model's main assumptions are: 

• Markets operate continuously.

• There are no transaction costs.

• Share prices follow an Ito diffusion process.

• Securities prices have a lognormal distribution.

• The underlying stock volatility is constant.

Ingersoll's model assumes that prices vary continuously,
that is, there is always liquidity in
the market and there are no limits to securities
lending and short selling. It also assumes that
the company's market value follows an Ito diffusion
process, that is, a continuous Brownian
motion. Under this assumption, it is possible to
set up a closed analytical formula to calculate
the value of a convertible bond.

The model can be applied only to European 
convertibles, namely, convertibles that can be 
exercised only upon expiration. Moreover, the 
model makes it clear how complex the valua¬ 
tion of convertible bonds is, and it provides a 
highly interesting theoretical reference, in that 
it can reach an analytical solution to the val¬ 
uation of convertibles. Yet, we know all too 
well that interest rates, credit spreads, curren¬ 
cies, and dividends are not constant, and the 
clauses and provisions written in the prospec¬ 
tus of a convertible are often highly varied and 
complicated, making it fairly difficult to apply 
analytical valuation models. This is why it is 
necessary to turn to numerical approximation 
models. 


The Ingersoll Model 

As just noted, the Ingersoll model provides an 
analytic solution for the pricing of a convertible 
bond, given some general market assumptions. 
The strongest assumptions are: 

• Capital markets are perfect with no transac¬ 
tion costs, no taxes, and equal access to infor¬ 
mation for all investors. 

• Trading takes place continuously in time and 
there are no restrictions against borrowing or 
short sales. 

• The market value of the company follows an 
Ito diffusion process. 

The Black-Scholes option pricing model is used 
to value the convertible bond as a contingent 
claim on the firm as a whole. 

Consider a convertible bond that is convertible
only at maturity, and therefore has an embedded
European call option. Let

$$\gamma = \frac{n}{n + N}$$

equal the dilution factor, indicating the fraction
of the common equity that would be held by
the convertible bond issue's owners if the entire
issue were converted, and let

V = market value of the company
$\tau$ = maturity of the convertible bond
B = balloon payment (nominal value of the
convertible bond)
r = interest rate

In light of the continuous-time analysis, the
functional form to assume for the call price of a
convertible bond is the exponential:

$$K(\tau) = B \cdot e^{-\rho\tau}$$

where

$\rho$ = rate of change in the call price
$\sigma^{2}$ = the instantaneous variance of returns
of the stock underlying the convertible
bond

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^{2}/2}\,dz$$

is the cumulative normal distribution

$$F(V, \tau; B, 0) = B e^{-r\tau}\,\Phi\!\left(-\frac{\log(B e^{-r\tau}/V) + \frac{1}{2}\sigma^{2}\tau}{\sigma\sqrt{\tau}}\right) + V\,\Phi\!\left(\frac{\log(B e^{-r\tau}/V) - \frac{1}{2}\sigma^{2}\tau}{\sigma\sqrt{\tau}}\right)$$

$$W(\gamma V, \tau; B) = \gamma V\,\Phi\!\left(\frac{\log(\gamma V/B) + (r + \frac{1}{2}\sigma^{2})\tau}{\sigma\sqrt{\tau}}\right) - B e^{-r\tau}\,\Phi\!\left(\frac{\log(\gamma V/B) + (r - \frac{1}{2}\sigma^{2})\tau}{\sigma\sqrt{\tau}}\right)$$

The value of the convertible bond is

$$H(V, \tau) = F(V, \tau; B, 0) + W(\gamma V, \tau; B) + \left(\frac{K(\tau)}{\gamma V}\right)^{2(r-\rho)/\sigma^{2}} \left[ F\!\left(\gamma V e^{(\rho - r)\tau}, \tau; B e^{(r-\rho)\tau}, 0\right) - F\!\left(\gamma V e^{(\rho - r)\tau}, \tau; \frac{B}{\gamma} e^{(r-\rho)\tau}, 0\right) \right]$$
Figure 1 Plot of a Convertible Bond Function for
Different Firm Values

To illustrate the model, let's plot the function 
H with the following parameters: 

B = 100
$\rho$ = 0.02
$\gamma$ = 0.2
$\sigma^{2}$ = 5%
r = 7%

V ranges from 0 to 625.
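The function H can be evaluated numerically with the parameters just listed. The sketch below rests on stated assumptions: F is taken to be the Merton risky discount bond value $B e^{-r\tau}\Phi(x_2) + V\Phi(x_1)$, W the Black-Scholes call on $\gamma V$ with exercise price B, and the callable value is F + W plus a correction term weighted by $(K(\tau)/\gamma V)^{2(r-\rho)/\sigma^{2}}$. The maturity of 5 years and all function names are hypothetical choices, and the expression is meant for firm values below the call boundary $\gamma V = K(\tau)$.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def risky_bond(V, tau, B, r, sigma2):
    """F(V, tau; B, 0): discount bond with promised payment B on firm value V."""
    s = math.sqrt(sigma2 * tau)
    log_d = math.log(B * math.exp(-r * tau) / V)
    x1 = (log_d - 0.5 * sigma2 * tau) / s
    x2 = -(log_d + 0.5 * sigma2 * tau) / s
    return B * math.exp(-r * tau) * norm_cdf(x2) + V * norm_cdf(x1)


def warrant(S, tau, B, r, sigma2):
    """W(S, tau; B): Black-Scholes call on S with exercise price B."""
    s = math.sqrt(sigma2 * tau)
    d1 = (math.log(S / B) + (r + 0.5 * sigma2) * tau) / s
    return S * norm_cdf(d1) - B * math.exp(-r * tau) * norm_cdf(d1 - s)


def callable_convertible(V, tau, B, gamma, rho, r, sigma2):
    """H(V, tau) for gamma*V at or below the call price K(tau) = B*exp(-rho*tau)."""
    K = B * math.exp(-rho * tau)
    scaled_V = gamma * V * math.exp((rho - r) * tau)
    scaled_B = B * math.exp((r - rho) * tau)
    correction = (K / (gamma * V)) ** (2.0 * (r - rho) / sigma2) * (
        risky_bond(scaled_V, tau, scaled_B, r, sigma2)
        - risky_bond(scaled_V, tau, scaled_B / gamma, r, sigma2)
    )
    return (risky_bond(V, tau, B, r, sigma2)
            + warrant(gamma * V, tau, B, r, sigma2)
            + correction)


if __name__ == "__main__":
    params = dict(tau=5.0, B=100.0, gamma=0.2, rho=0.02, r=0.07, sigma2=0.05)
    for V in (100.0, 200.0, 300.0, 400.0):
        print(V, round(callable_convertible(V, **params), 4))
```

A useful sanity check on this form is the call boundary: when $\gamma V = K(\tau)$, the expression collapses to $K(\tau)$, the amount bondholders receive when the issue is called.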

The plots are shown in Figure 1. The straight 
lines cross at 


Y 

NUMERICAL MODELS 

The most widely used mathematical models 
among hedge fund managers for the valuation 
of convertible bonds are numerical methods, 
among which are the binomial and trinomial 
trees, the three-dimensional binomial model, 
implied trees, and the Monte Carlo simulation 
model. 

The binomial tree model was introduced by Cox,
Ross, and Rubinstein (1979) and by Sharpe in
his textbook (Sharpe, 1978). This model allows
one to build a tree of possible share prices between
now and the convertible's maturity date.
This tree is then used to find the convertible's
current value by calculating its value along all
the tree's nodes. In the binomial tree model,
the tree has two branches that develop from
every node, while in the trinomial tree model
there are three branches diverging from each
node. The higher the number of nodes, the
more accurate the model is. The binomial model
also makes it possible to value an American option,
which otherwise has no closed-form solution.
As the number of time steps grows,
the binomial tree tends toward the
Black-Scholes continuous formula for European
options.

All these models are helpful when making a 
decision, but many of the options embedded in 
a convertible do not fit the models and there¬ 
fore the fund manager's skill and a rigorous 
risk management discipline become more pre¬ 
cious. The manager's art lies in finding innova¬ 
tive ways to evaluate convertible bonds without 
being swamped with too many details. 

The trinomial tree model was introduced by 
Boyle (1986). The share price can move in three 
directions from every single node and there¬ 
fore the number of time steps can be reduced 
to reach the same precision obtained with the 
binomial tree. 

The Monte Carlo method, named after the 
casino of the Principality of Monaco, is a sta¬ 
tistical simulation method, according to which 
data obtained through the generation of ran¬ 
dom numbers coming from a given statistical 
distribution is considered empirical and is used 
to estimate the parameters under consideration. 
Thousands of random samples are generated, 
derived from the assumed statistical distribu¬ 
tion, which takes as parameters the maximum 
likelihood estimators using real data, and then 
these data are used to estimate the parameters 
under examination. 


The Binomial Tree Model 

Here, we will describe a version of the Cox-Ross-Rubinstein
model as modified by Goldman
Sachs. The binomial tree model can be used
to evaluate convertible bonds with either an
embedded European call option or an embedded
American call option.

To determine the value of the convertible 
bond, it's necessary to build four different trees 
in the following order: 

1. Stock price tree. 

2. Conversion probability tree. 

3. Credit-adjusted spread tree. 

4. Convertible bond value tree that is calculated 
backward from the previous trees. 

In the first step we build the stock price tree.
The binomial tree model allows us to build up
a picture of how a stock is likely to perform between
now and the maturity of the convertible
bond (T). The number of nodes (N) is calculated
from the maturity of the convertible bond
according to the formula T · (T + 1)/2. The more
nodes, the more accurate the model will be.

Between a node and the following node, the
stock price can move upward or downward.
The jump of the stock price depends on the
length of the time interval $\Delta t = T/N$ and on
the stock price volatility $\sigma$. Therefore

$u = e^{\sigma\sqrt{\Delta t}}$ (upward move)
$d = e^{-\sigma\sqrt{\Delta t}}$ (downward move)

The stock price at each node is set equal to

$$S \cdot u^{i} \cdot d^{\,n-i}$$

where i = 0, 1, ..., n

n is the time step and i is the number of upward
moves.

The probability of an upward move in the stock
price at the next time step $\Delta t$ is

$$p = \frac{e^{r\Delta t} - d}{u - d}$$

while the probability of a downward move
must be (1 - p), since the probability of going
either up or down equals unity.

In the second step we build the conversion
probability tree. We calculate the conversion
probabilities backward, starting from the leaves
of the stock price tree. If it's optimal to convert
the bond, the conversion probability is 1; otherwise
it is 0. For the steps before the end of the
tree, the conversion probability is 1 if it's optimal
to convert the bond; otherwise, it is equal to

$$q_{n,i} = p \cdot q_{n+1,i+1} + (1 - p) \cdot q_{n+1,i}$$

In the third step we build the credit-adjusted
spread tree. If the convertible bond is out-of-the-money,
future cash flows should be discounted
at a rate equal to the risk-free rate, r,
plus a credit spread, k, of that particular bond.
In fact, if the stock price is much lower than the
conversion price, the convertible bond behaves
like a plain vanilla bond. If the convertible bond
is in-the-money, future cash flows must be discounted
at the risk-free rate. In this case, the
convertible bond behaves like a stock. Therefore,
instead of using a fixed discount rate r, a
discount rate $r_{n,i}$ is calculated at each node
using the conversion probability $q_{n,i}$. The discount
rate is equal to

$$r_{n,i} = q_{n,i} \cdot r + (1 - q_{n,i}) \cdot (r + k)$$

In the fourth step, we build the convertible
bond value tree. At each terminal node of the tree,
the price of the convertible bond is equal to the
maximum of the conversion value of the
bond and the face value plus the final coupon.
The tree is built backward, from the leaves back
to the root of the tree. The value at the root of the
tree is the price of the convertible bond.

If it's optimal to convert the bond at a node,
then that node is assigned the conversion value;
otherwise, the price of the convertible bond is

$$P_{n,i} = \max\left[ mS,\; p \cdot P_{n+1,i+1} \cdot e^{-r_{n+1,i+1}\,\Delta t} + (1 - p) \cdot P_{n+1,i} \cdot e^{-r_{n+1,i}\,\Delta t} \right]$$

where m is the conversion ratio.

For example, let's determine the price of a
convertible bond with the binomial tree method,
starting with the following data:

T = 5 years (maturity)
Δt = 1 year (step)
N = 5 (number of nodes)
r = 4% (risk-free rate)
k = 2% (credit spread)

The convertible bond has nominal value 100
and coupon 10%.

m = 100% (conversion ratio)
S = 85 (stock price)
σ = 10% (stock volatility)

Figure 2 Binomial Trees Necessary to Determine the Value of a Convertible Bond
(panels include the Stock Price Tree and the Convertible Bond Value tree)

With the formulas discussed above we calculate 

u = 1.1052 (upward move)
d = 0.9048 (downward move)
p = 0.6787 (probability of an upward move
of the stock price in the next time
interval Δt)

As shown in Figure 2, we first build the stock
price tree, then the conversion probability tree,
then the credit-adjusted spread tree, and finally
the convertible bond value tree. The value at the
root of the tree is 90.4, which is the price of the
convertible bond.
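
The four-step procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it assumes u = exp(σ√Δt) and d = 1/u (which reproduce the values of u and d quoted in the example), pays only the final coupon at maturity, and ignores intermediate coupons and any call or put features. The function name and its arguments are hypothetical. Because of these simplifications, the root value it returns should not be expected to match the 90.4 quoted in the example exactly.

```python
import math

def convertible_bond_tree(S0, sigma, r, k, T, steps, face, coupon_rate, m):
    """Backward induction on a binomial tree with a credit-adjusted discount rate."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))       # assumed form of the upward move
    d = 1.0 / u                               # assumed form of the downward move
    p = (math.exp(r * dt) - d) / (u - d)      # probability of an upward move

    def stock(n, i):
        # Stock price after n steps, i of which were upward moves.
        return S0 * u ** i * d ** (n - i)

    # Leaves: convert if the conversion value beats face value plus final coupon.
    redemption = face * (1.0 + coupon_rate)
    value, q = [], []
    for i in range(steps + 1):
        conversion = m * stock(steps, i)
        convert = conversion > redemption
        value.append(conversion if convert else redemption)
        q.append(1.0 if convert else 0.0)

    # Interior nodes, from the last step before maturity back to the root.
    for n in range(steps - 1, -1, -1):
        new_value, new_q = [], []
        for i in range(n + 1):
            # Credit-adjusted discount rates at the two successor nodes.
            rate_up = q[i + 1] * r + (1.0 - q[i + 1]) * (r + k)
            rate_down = q[i] * r + (1.0 - q[i]) * (r + k)
            hold = (p * value[i + 1] * math.exp(-rate_up * dt)
                    + (1.0 - p) * value[i] * math.exp(-rate_down * dt))
            conversion = m * stock(n, i)
            if conversion > hold:
                new_value.append(conversion)
                new_q.append(1.0)
            else:
                new_value.append(hold)
                new_q.append(p * q[i + 1] + (1.0 - p) * q[i])
        value, q = new_value, new_q

    return {"u": u, "d": d, "p": p, "price": value[0]}

# Inputs from the worked example; m = 1.0 represents the 100% conversion ratio.
result = convertible_bond_tree(S0=85.0, sigma=0.10, r=0.04, k=0.02, T=5.0,
                               steps=5, face=100.0, coupon_rate=0.10, m=1.0)
print(round(result["u"], 4), round(result["d"], 4), round(result["p"], 4))
print(round(result["price"], 2))
```

Only the current layer of bond values and conversion probabilities is kept in memory, since each step of the backward pass needs just the layer that follows it.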

KEY POINTS 

• To implement strategies involving
convertible bonds, traders and fund managers
require a valuation model.

• Analytical models provide a closed-form
solution for the value of a convertible bond, and
the most commonly used model in practice is
the Ingersoll model.

• While there are several models that fall into
the realm of numerical models, the one
commonly used is the binomial tree model, which
requires the construction of a stock price tree,
conversion probability tree, credit-adjusted
spread tree, and convertible bond value tree
that is calculated backward from the previous
trees.
Abstract: Inflation-indexed bonds, as a way of financing government debt, were proposed in the
1920s by economists such as Alfred Marshall and John Maynard Keynes. In Israel, they have been 
issued since the 1950s and have often dominated that country's bond market. Inflation-indexed 
sovereign bonds now exist in a broad range of developed countries, as well as in a number of 
emerging markets. A wide variety of bond structures and tax regimes exist. Issuance volumes 
and the breadth of the investor base vary widely from country to country; liquidity varies from 
reasonably good to very poor. When inflation-indexed bonds were introduced in the United States 
in 1997, there was some disagreement about the degree to which inflation-indexed bonds—called 
Treasury inflation-protected securities or TIPS—are "risk-free" and the role they should play in a 
portfolio. In particular, it had not been universally appreciated that these bonds can have volatile 
mark-to-market returns. 


Since their introduction in 1997, Treasury 
inflation-protected securities (TIPS) have become 
an established part of the U.S. bond market. 
This entry reviews the structure of TIPS and the 
factors that drive TIPS returns; examines the 
role that TIPS play in a broader bond portfolio, 
and the nature of TIPS interest rate risk; and 
discusses some methods employed by TIPS
investors to assess value and risk.


BOND STRUCTURES AND 
THE CONCEPT OF REAL 
YIELD 

The key features of the TIPS bond structure are 
summarized here: 


• TIPS pay interest semiannually. Interest
payments are based on a fixed coupon rate.
However, the underlying principal amount of
the bonds is indexed to inflation; this
inflation-adjusted principal amount is used to
calculate the coupon payments, which therefore
also rise with inflation. At maturity, the
redemption value of the bonds is equal to their
inflation-adjusted principal amount, rather
than their original par amount.

• The inflation-adjusted principal amount is 
equal to the original par amount multiplied 
by an index ratio, which is based on changes 
in the Consumer Price Index (CPI) and which 
is recalculated every day. The index ratio 
is simply the reference CPI on the relevant 
date divided by the reference CPI on the 
issue date. Negative inflation adjustments are
not made. 

• The reference CPI for the first day of any 
month is defined to be the non-seasonally 
adjusted CPI-U for the third preceding
calendar month, while the reference CPI for any
subsequent day in that month is determined 
by linearly interpolating the reference CPI for 
the first of the month and the reference CPI 
for the first day of the next month. 

• Price-yield calculations are as follows.
Compute the "real price" of the bond from the
quoted real yield via the standard bond pricing
formula, using an actual/actual day count
basis, round to 3 decimal places (in $100); then
multiply the real price by the index ratio to
obtain the inflation-adjusted price. Accrued
interest is computed in exactly the same way,
except that no rounding is carried out.
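
The indexation and pricing mechanics listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not an official calculation: the CPI-U levels are hypothetical, the function names are invented for this example, and the real price is computed on a coupon date with semiannual compounding so that no day-count fraction or accrued interest is needed.

```python
import calendar
from datetime import date

# Hypothetical CPI-U levels keyed by (year, month); not actual published data.
CPI_U = {(2023, 10): 300.0, (2023, 11): 303.0, (2023, 12): 306.0}

def first_of_month_reference(year, month, cpi_u):
    """Reference CPI on the 1st: CPI-U for the third preceding calendar month."""
    index = year * 12 + (month - 1) - 3
    return cpi_u[(index // 12, index % 12 + 1)]

def reference_cpi(day, cpi_u):
    """Linear interpolation between the first-of-month reference values."""
    start = first_of_month_reference(day.year, day.month, cpi_u)
    if day.month == 12:
        next_year, next_month = day.year + 1, 1
    else:
        next_year, next_month = day.year, day.month + 1
    end = first_of_month_reference(next_year, next_month, cpi_u)
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return start + (day.day - 1) / days_in_month * (end - start)

def index_ratio(settlement, issue, cpi_u):
    """Reference CPI on the relevant date divided by reference CPI at issue."""
    return reference_cpi(settlement, cpi_u) / reference_cpi(issue, cpi_u)

def real_price(coupon, real_yield, periods, face=100.0):
    """Real price per 100 on a coupon date (assumed), rounded to 3 decimals."""
    y = real_yield / 2.0
    c = face * coupon / 2.0
    pv = sum(c / (1.0 + y) ** j for j in range(1, periods + 1))
    pv += face / (1.0 + y) ** periods
    return round(pv, 3)

def inflation_adjusted_price(coupon, real_yield, periods, settlement, issue, cpi_u):
    """Rounded real price multiplied by the index ratio."""
    ratio = index_ratio(settlement, issue, cpi_u)
    return real_price(coupon, real_yield, periods) * ratio

mid_january = reference_cpi(date(2024, 1, 16), CPI_U)
ratio = index_ratio(date(2024, 2, 1), date(2024, 1, 1), CPI_U)
price = inflation_adjusted_price(0.02, 0.02, 20,
                                 date(2024, 2, 1), date(2024, 1, 1), CPI_U)
print(mid_january, ratio, price)
```

With these hypothetical inputs, a bond whose real coupon equals its real yield has a real price of par, so the inflation-adjusted price is simply 100 times the index ratio.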

An attractive feature of the TIPS structure is
that inflation indexation occurs with no
substantial lag. In the U.K., there is an
eight-month lag in the inflation adjustment of
index-linked gilts; in Australia and New Zealand,
there is a three- to six-month lag. The lag means
that real returns from these inflation-indexed
bonds are subject to short-term inflation risk
and considerably complicates the analysis of the
bonds.

The obvious question, of course, is: Where
does the real yield come from, and how much
can it change? To investors used to thinking
of bond yields as being driven by inflation
expectations, it is not obvious that real yields
should be volatile at all, except perhaps
because of temporary imbalances in supply and
demand, or changes in liquidity. After all, there
are respectable economic theories that suggest
that real interest rates should be constant. But
in practice, there are various economic reasons
why real yields do in fact fluctuate. 1

Causes of Real Yield Volatility 

The real yield may be defined as the long-term
cost of risk-free capital (net of inflation). That
is, since TIPS are competing with other
investments, real yields on TIPS will move with
the cost of capital in the economy as a whole. Of
course, other factors affect real yields: For
example, index-linked gilts in the U.K. have had
artificially low real yields because of their
favorable tax treatment and because of a
regulatory requirement (since loosened) making
it virtually obligatory for pension funds to own
them. However, in this entry we will focus on
economic and market factors.

Long-term real yields are influenced by
expectations about future long-term real interest
rates. The two main macroeconomic factors that 
affect these expectations are: 

1. The domestic factor: long-term expected
growth in real gross domestic product
(GDP). Strong growth generally drives up
real interest rates, since the demand for
capital tends to rise, and borrowers, expecting
higher real returns, are prepared to shoulder
higher real borrowing costs.

2. The international factor: long-term expected 
changes in the current account deficit.
Demand for capital is by definition higher in
countries with a large current account deficit, 
driving up domestic interest rates in order to 
attract required international investment. 

Note that short-term trends in real GDP 
and the current account deficit can have a 
strong influence on real yields, because they 
tend to influence the long-term expectations of 
investors. 2 (Roll [1996] has also argued, based 
on an analysis of tax effects, that real yields 
should also rise when expected inflation rises; 
this argument is outlined later in this entry. For 
the moment we ignore tax effects.) 

Real yields on inflation-linked bonds are
also influenced by relative demand for these
bonds when compared with competing
investments that may offer investors some
protection, albeit imperfect, against inflation.
The balance between competing investments
constantly shifts, depending on subjective
factors such as investor aversion to different
kinds of risk. Relevant investments include:

1. Money market investments: If investors are
confident that short-term interest rates will
move broadly in line with inflation (which
was the case for U.S. monetary policy during
the "Great Moderation" period from the
early 1980s up to the recent financial crisis,
but not before or since), then real returns on
money market instruments will be relatively
stable over the long term.

2. Equities: When profit margins are stable,
corporate profits, and hence dividends and
dividend growth rates, tend to rise with the
price level; thus, it is reasonable to regard
equities as an inflation hedge in the long term
(remembering that equity investors are exposed
to additional risks in comparison to holders
of inflation-indexed bonds).

3. Corporate bonds: As with equities, corporate
bond performance is partly linked to
inflation: Rising price levels drive up corporate
revenues and reduce the real value of
existing fixed-rate debt, and both these factors
can cause yield spreads to tighten. However, this
relationship is often weak and dominated by
other factors.

4. Commodities: A basket of commodities also 
provides a partial hedge against inflation; 
in practice, this investment alternative was 
not historically as important as the previous 
three, though its importance has increased 
considerably since 2005 as financial
innovation has expanded the investor base.

To summarize: Real yields are far from stable,
and the behavior of real yields is just as complex
as the behavior of nominal yields. Real yields
are influenced by both economic fundamentals
and market supply/demand factors across
asset classes. It is not at all obvious that
inflation-linked bonds should be "among the
least risky of all assets." Indeed, in the
Australian market these securities were long
regarded as highly risky in comparison to nominal
bonds, though this is partly because of their
poorer liquidity.


In all countries where inflation-linked bonds 
are actively traded, real yields have, historically, 
been quite volatile. Like nominal yields, market 
real yields trade in ranges of hundreds of basis 
points (see Figure 1). Historical examples from 
other countries include: 

• In the U.K., real yields on long index-linked
gilts fluctuated between 2% and 4.5% in the
period 1981-1993. 3 In the period 1984-1994,
real yields on short index-linked gilts
fluctuated between 1.5% and 5.75%, partly
reflecting instability in monetary policy. 4

• In Israel from 1984-1993, long-dated real 
yields fluctuated between -1.5% and 3.3%;
however, they more typically traded in the 
range ±1%. 5 

• In Australia, real yields have varied from a 
high of 5.75% in 1986 and 1994 to a low of 
3.25% in 1993. 6 

Real yields are often estimated by subtracting
current (i.e., recent historical) inflation from
current nominal bond yields; but this
procedure is obviously illogical, as it assumes
that expected inflation is equal to current
inflation. One can get a better idea of what
market real yields would have been by taking
nominal yields and subtracting a consensus
inflation forecast. Figure 1 shows the 10-year
nominal Treasury yield minus the 10-year
consensus CPI forecast, as reported in the
Philadelphia Fed's Survey of Professional
Forecasters; this measures investors'
expectations of real returns on 10-year Treasury
bonds and is therefore a reasonable estimate of
the 10-year real yield going back several
decades. Figure 1 also shows the market real
yield of the 10-year TIPS (dating back only to
1997); it is correlated with the survey-based
real yield estimate, but not perfectly. We
discuss this divergence at the end of the entry.

Even though using consensus data has a
number of drawbacks, this rough analysis yields
some useful results. The figure shows clearly
how long-dated real yields soared in the early
1980s, due to the extreme instability in
monetary policy. They stabilized after 1985, once
the Fed stopped targeting monetary aggregates and
adopted interest rate targeting instead. Since
then they have fluctuated between 5% (in the
overheated economy of the late 1980s) and less
than 0.5% (in the crisis and postcrisis periods).
Note the apparent link between long-term real
yields and current GDP growth in recent years.

Figure 1 U.S. 10-Year Real Yield Estimated from Consensus Long-Term CPI Forecasts and TIPS
Real Yield

Figure 2 shows a more detailed history of TIPS 
real yields since issuance. It also shows the yield 
spread between the 10-year TIPS and the
10-year CMT nominal yield. This may be regarded
as a rough measure of the market's inflation 
expectations over the next 10 years. 

It's interesting that 10-year TIPS real yields
have never been stable, whereas 10-year TIPS
break-even inflation was remarkably stable
from about 2004-2007, a period of relative
macroeconomic stability and strong Fed
credibility. Also note the extraordinary period of
volatility during the crisis period of late 2008
and early 2009, during which TIPS were highly
correlated with risky asset classes such as
equities and credit (as predicted above, but
perhaps not for the fundamental economic reasons
cited).

A derivative market for inflation swaps 
has developed alongside the cash market for 
inflation-linked bonds. While inflation swaps 
will not be discussed explicitly in this entry, 
much of the material is also applicable to them. 

Existence of an Inflation 
Risk Premium 

It is often asserted that real yields on
inflation-linked bonds should reflect an inflation
risk premium, since investors are not exposed to
inflation risk as they are with nominal bonds.
Note that if future inflation were known (not
necessarily zero) there would be no inflation
risk premium; it is uncertainty about inflation
that creates a risk premium. The more volatile
inflation is expected to be, the higher the
inflation risk premium on nominal bonds should be,
and the lower real yields should be in relation
to nominal yields.

It is important to note that it is uncertainty
about future inflation that should determine the
risk premium, not the historical volatility of
inflation. For example, the inflationary episode of
the 1970s is not relevant unless investors think it
may be repeated. Investors' expectations about
the future volatility of inflation are not directly
observable, but it may be helpful to look at
economists' estimates. It is also useful to
compare expected inflation volatility with expected
volatility in real interest rates, since both factors
are relevant to the risk/return opportunities
offered by inflation-indexed bonds.

Figure 2 TIPS Real Yield History and Spread to Nominal Yield Curve ("Break-Even Inflation")

Note that if the inflation risk premium
exists, one would not expect it to be
unvarying. Since it is related to market
expectations about potential uncertainty in
inflation, it is comparable to option-implied
volatility. One would thus expect the inflation
risk premium to depend on bond maturity, and also
to vary over time; for example, if the market lost
confidence in the Fed's ability or willingness to
control inflation, the inflation risk premium
would rise, causing nominal yields to rise
relative to real yields. However, since the
inflation risk premium is determined by inflation
uncertainty over a long period (10 years for the
10-year TIPS), sudden changes should be
unusual. Absent unusual shocks to Fed credibility,
the inflation risk premium should experience
moderate fluctuations, like long-dated
swaption implied volatilities, and not sharp ones,
like short-dated exchange-traded option implied
volatilities.

In the absence of a complete inflation-linked
derivatives market, the inflation risk premium
is not directly observable. Furthermore, naive
attempts to measure it can lead to grossly
overstated estimates, and a number of proposed
methods for measuring it turn out to be
spurious. For example, it has been asserted that
the differential between money market and bond
yields arises because of an inflation risk
premium, which can thus be estimated by looking
at the long-term average spread between the
Fed Funds rate and the two-year bond yield
(about 70 bp in the period since deregulation).
This argument has a grain of truth, but the
conclusion is incorrect as it stands. The slope of
the yield curve reflects a term premium that
is not solely attributable to inflation risk. In
addition, there are other reasons why money
market yields are usually lower than bond
yields: Liquidity preference and the impact of
capital charges both have important effects.
Furthermore, if the spread between money
market and bond yields reflects a risk premium,
this is not just an inflation risk premium but a
real rate risk premium as well.

Also, the argument that the difference
between the Fed Funds rate and the two-year
bond yield equals the inflation risk premium
implies that the yields of money market
securities reflect no inflation risk premium, while
this risk premium is fully priced into two-year
bond yields. This would only be plausible if
money market securities were not (perceived to
be) subject to inflation risk, and this is far from
obvious, particularly since real money market
returns were frequently negative during the
1970s.

Thus we must look for more valid ways of
estimating what the inflation risk premium
should be. There is no strong consensus, and a
surprisingly wide range of estimates appears in
the literature, from around 100 bp to modestly
negative. 7 However, the most credible estimates
tend to fall in the zero to 50 bp range. 8

One approach is to try to observe inflation
uncertainty directly and then derive a "fair"
inflation risk premium by applying a market price
of risk. Figure 3 shows the probabilities attached
by economists to various GDP growth and
inflation scenarios; it is taken from the Survey of
Professional Forecasters.

Economists' forecasts recognize that both
inflation and real yields are volatile, and that they
have comparable volatilities. It is tempting to
conclude that nominal bond yields should
indeed reflect an inflation risk premium, since
returns on nominal bonds are affected by both
inflation volatility and real yield volatility,
while returns on inflation-linked bonds are only
affected by real yield volatility. And the
reported uncertainty in inflation naively leads to
an (again, very rough) estimate of the inflation
risk premium at the upper end of the range
mentioned above. But this conclusion is not
necessarily correct.

Based on an analysis of 30 years' worth
of cross-country panel data, Judson and
Orphanides (1999) have shown that, as one
might expect, there is a strong negative
correlation between inflation and growth. Thus,
as inflation rises, real yields should fall, and
vice versa; in other words, the risks arising from
fluctuations in inflation and fluctuations in real
yields at least partly offset each other, at least
over the medium to long term. It is therefore
conceivable that, over the medium to long term,
a portfolio of nominal bonds may be less risky,
not more risky, than a portfolio of
inflation-linked bonds, in which case an inflation
risk premium need not exist at all. Certainly the
situation is more complex than it seems at first.

We can actually use the earlier "economists'
estimates" of volatility in real GDP growth and
CPI inflation, together with the implied
volatility of short-term rates, to compute a rough
estimate of the correlation between inflation
and growth. Assuming that nominal rates are
solely determined by growth and inflation, we
have:

$$\sigma_{nom}^2 = \sigma_{GDP}^2 + \sigma_{CPI}^2 + 2 \rho \, \sigma_{GDP} \, \sigma_{CPI}$$

where $\sigma_{nom}$ denotes the volatility of
nominal rates, $\sigma_{GDP}$ and $\sigma_{CPI}$
denote the volatility of growth and inflation
respectively, and $\rho$ is the correlation
between growth and inflation.

If all three volatilities are around 1% per
annum, then solving this formula for $\rho$ gives
an estimate of around -0.5. However, it would not
be meaningful to try to compute a more precise
estimate this way.
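
As a quick check of the arithmetic, the variance relation can be solved for the correlation directly. The sketch below assumes that the relation holds in variances (squared volatilities); the function name is hypothetical. With all three volatilities equal, the result is a correlation of one half in magnitude with a negative sign, consistent with the negative relationship between inflation and growth described above.

```python
def implied_correlation(vol_nominal, vol_gdp, vol_cpi):
    """Solve vol_nominal^2 = vol_gdp^2 + vol_cpi^2 + 2*rho*vol_gdp*vol_cpi for rho."""
    numerator = vol_nominal ** 2 - vol_gdp ** 2 - vol_cpi ** 2
    return numerator / (2.0 * vol_gdp * vol_cpi)

# All three volatilities at roughly 1% per annum, as in the text.
rho = implied_correlation(0.01, 0.01, 0.01)
print(rho)  # approximately -0.5
```

The estimate is very sensitive to the inputs: small changes in any one volatility move the implied correlation substantially, which is why only a rough figure is meaningful.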

Incidentally, Judson and Orphanides (1999)
also found a strong negative correlation
between inflation volatility and growth. In other
words, if inflation is expected to become more
volatile, real yields should fall, that is,
inflation-indexed bond prices should rally.
However, in this scenario inflation-linked bonds
should outperform nominal bonds, since the
inflation risk premium should rise.

Figure 3 Economists' Uncertainty about Future GDP Growth and Future Inflation
(panels: probability distribution of GDP growth rates for the next 12 months; probability distribution of CPI inflation rates for the 12 months following the next 12 months)
Source: Federal Reserve Board


INFLATION-INDEXED 
BONDS IN A NOMINAL 
PORTFOLIO 

TIPS behave in unique ways and resemble
neither nominal Treasuries nor spread products.
It's therefore worth going back to basics in
order to understand the nature of the interest rate
risk inherent in TIPS and the role they can play
in broader portfolios.

What Is the Duration of an 
Inflation-Indexed Bond? 

Inflation-indexed bonds are often used for
specialized purposes (e.g., asset/liability
management for insurance companies offering
inflation-linked life annuities, or for defined
benefit plans where benefits are subject to
cost of living adjustments), and may thus be
segregated from other fixed-income holdings.
However, if they are held in the same portfolio
as nominal bonds, an interesting problem
arises when attempting to define their price
sensitivity to rate changes as measured by
"duration." We first examine the simplest possible
definition of duration and its consequences;
then we look at some alternative definitions.

It is easy to compute the duration of an 
inflation-indexed bond using exactly the same 
method as one would use for a nominal bond. 
Because of its low real coupon and low real 
yield, an inflation-indexed bond tends to have 
a much longer duration than a nominal bond of 
comparable maturity. 

But what does this duration mean? The
duration of a nominal Treasury bond measures its
sensitivity to changes in nominal yields, that
is, to changes in inflation and real interest rate
expectations. By contrast, the duration of an
inflation-linked bond measures its sensitivity
to changes in real yields, that is, to changes in
real interest rate expectations alone. In other
words, the two durations are not comparable:
They are measuring different things. So, for
example, it does not make sense to look for a
"reference" nominal yield for the TIPS real yield:
While the TIPS yield may appear to trade off
the 10-year Treasury during some periods, or
off the 5-year Treasury during other periods,
there is no fundamental reason why any such
relationship should persist.

This creates a problem at the portfolio level. 
If we try to compute a portfolio duration 
by adding up the durations of nominal and 
inflation-indexed bond holdings, what does the 
resulting figure mean? Two portfolios could 
have the same duration but, depending on 
the relative weighting of index-linked bonds, 
might have a very different response to a 
change in investors' economic expectations. 
A simple duration target is no longer an
effective way of controlling portfolio interest
rate risk.

Thus, when a portfolio contains both nominal
and inflation-linked bonds, it is critical to
monitor and report the relative weights and
durations of the "nominal" and "real" components
of the portfolio separately. One approach is to
report two durations for the portfolio, which
distinguish two sources of risk:

1. A "portfolio real yield duration" equal to the 
sum of the durations of both nominal and 
inflation-indexed bond holdings. This shows 
how the portfolio value will respond to a 
change in market real yields (which also
affect nominal yields).

2. A "portfolio inflation duration" equal to 
the duration of the nominal bond holdings 
alone. This shows how the portfolio value 
will respond to a change in market inflation 
expectations (which affect nominal yields 
but not real yields). 
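
The two reported durations can be illustrated with a short Python sketch. It assumes that the "sum of the durations" is a market-value-weighted sum of each holding's duration; the holdings, their durations, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    name: str
    market_value: float
    duration: float          # real yield duration for TIPS, standard duration for nominals
    inflation_linked: bool

def portfolio_durations(holdings):
    """Return (real yield duration, inflation duration), market-value weighted."""
    total = sum(h.market_value for h in holdings)
    real_yield_duration = sum(h.market_value * h.duration for h in holdings) / total
    inflation_duration = sum(h.market_value * h.duration
                             for h in holdings if not h.inflation_linked) / total
    return real_yield_duration, inflation_duration

# Hypothetical portfolios with the same total duration but a different mix.
mixed = [Holding("nominal Treasuries", 60.0, 5.0, False),
         Holding("TIPS", 40.0, 8.0, True)]
all_nominal = [Holding("nominal Treasuries", 100.0, 6.2, False)]

mixed_real, mixed_inflation = portfolio_durations(mixed)
nominal_real, nominal_inflation = portfolio_durations(all_nominal)
print(mixed_real, mixed_inflation)
print(nominal_real, nominal_inflation)
```

Both hypothetical portfolios respond in the same way to a shift in real yields, but the mixed portfolio has roughly half the sensitivity of the all-nominal portfolio to a shift in inflation expectations, which is the distinction a single duration figure would hide.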

Similarly, care must be taken when carrying
out portfolio simulations. For example, when
carrying out parallel interest rate simulations, it
is standard practice to apply an identical yield
shift to all securities in the portfolio. For a
portfolio containing both nominal and
inflation-indexed bonds, this actually corresponds
to a "real yield simulation." One should also carry
out "expected inflation simulations," where
the yield shift is applied to nominal but not
inflation-indexed bond yields.

There is one practical situation in which it
makes sense to compare the durations of a
nominal and inflation-indexed bond directly: when
designing trading strategies based on expected
inflation. Suppose the central banking
authority is targeting a long-term core CPI
inflation rate of no more than 2%; and suppose that
the 10-year nominal yield is 3.5% while the 10-year
real yield is 0.5%. This means that the market
is predicting an average headline CPI inflation
rate, over the next 10 years, of 3%. If one had
faith in the central bank's ability to meet its
inflation target and one did not believe headline
CPI would consistently outpace core CPI over
the next decade, nominal bonds would look
undervalued relative to inflation-indexed bonds.

How should one exploit this perceived
opportunity without changing exposure to other
sources of risk? The correct way is to
execute a duration-matched swap, selling 10-year
inflation-indexed bonds and buying 10-year
nominal bonds. If inflation expectations fall,
the strategy would realize a profit. If real
interest rate expectations change (i.e., if real
yields change), there would be no effect, which is
the intention. (This kind of strategy can be
implemented more precisely using a full term
structure of market inflation forecasts, and
incorporating short horizon economist forecasts.)
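
A first-order sketch of the duration-matched swap is given below in Python. The yields are those of the example; the two durations, the trade size, and the function names are hypothetical, and the profit-and-loss calculation uses only the linear duration approximation (no convexity, carry, or transaction costs).

```python
def breakeven_inflation(nominal_yield, real_yield):
    """Market-implied inflation: nominal yield minus real yield."""
    return nominal_yield - real_yield

def matched_nominal_amount(tips_amount, tips_duration, nominal_duration):
    """Nominal bond amount whose dollar duration equals that of the TIPS sold."""
    return tips_amount * tips_duration / nominal_duration

def swap_pnl(tips_amount, tips_duration, nominal_amount, nominal_duration,
             real_yield_change, breakeven_change):
    """First-order P&L of being long nominal bonds and short TIPS."""
    nominal_yield_change = real_yield_change + breakeven_change
    long_nominal = -nominal_amount * nominal_duration * nominal_yield_change
    short_tips = tips_amount * tips_duration * real_yield_change
    return long_nominal + short_tips

breakeven = breakeven_inflation(0.035, 0.005)       # about 3%, as in the example

# Hypothetical durations: 9.5 for the 10-year TIPS, 8.5 for the 10-year nominal.
tips_sold = 100.0
nominal_bought = matched_nominal_amount(tips_sold, 9.5, 8.5)

# Real yields rise 50 bp with break-even inflation unchanged.
pnl_real_shift = swap_pnl(tips_sold, 9.5, nominal_bought, 8.5, 0.005, 0.0)
# Break-even inflation falls 100 bp (from 3% toward 2%) with real yields unchanged.
pnl_breakeven_fall = swap_pnl(tips_sold, 9.5, nominal_bought, 8.5, 0.0, -0.01)
print(breakeven, nominal_bought, pnl_real_shift, pnl_breakeven_fall)
```

Because the TIPS duration is typically longer, the matched trade buys more nominal bonds than the amount of TIPS sold. Under the linear approximation the position is unaffected by a real yield move and gains when break-even inflation falls.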

The above duration calculation is based on
the (known) real cash flows and discounts at
the real yield. There are other potential ways to
compute the "duration" of an inflation-linked
bond, which involve forecasting the (unknown)
nominal cash flows and discounting using
nominal yields on a zero coupon curve basis. The
three most obvious alternatives are:

1. Using a fixed inflation forecast, generate
projected bond cash flows (one should use a
forecast that ensures that the net present value
of the forecast cash flows, discounted using the
current nominal zero coupon curve, is equal
to the current bond price). Compute the
duration of this fixed cash flow stream using
±100 bp shifts in the nominal zero coupon
curve.

2. The same, except that when shifting the zero 
coupon curve by ±100 bp, one recalculates 
the bond cash flows based on a new inflation 
forecast, adjusted by ±1%. That is, the cash 
flow stream is assumed to depend on the 
level of nominal yields. 

3. The same, except that one adjusts the
inflation forecast by an amount different from
±1%. For example, Figure 4 shows that
historically, a 10 bp rise in U.S. nominal yields
corresponded, on average, to a 9 bp rise
in market long-term inflation expectations
(though with much variation around that
average). Thus one might adjust the inflation
forecast by (say) ±0.9%. The precise number
depends on the reference Treasury yield,
and the historical period used to estimate
the relationship.

Figure 4 Shift in Nominal Yield versus Shift in Break-Even Inflation

In each case, some minor variations are
possible; for example, either constant or
time-varying inflation forecasts could be used.
These calculations can be related to the above
concepts of "real yield duration" and "inflation
duration" in the following way:

1. Assuming a fixed cash flow stream (i.e., a
fixed inflation scenario) amounts to
assuming that the ±100 bp shift in nominal yields
is due to a change in real yields, not a change
in inflation expectations. Thus, this
calculation determines the sensitivity to a change
in real yields, that is, it is essentially
computing a real yield duration and produces an
answer very close to the duration calculation
described above.

2. Assuming an inflation scenario that varies by
±1% amounts to assuming that the ±100 bp
shift in nominal yields is due to a change in
inflation expectations. Thus, this calculation
measures an inflation duration, that is, a
sensitivity to a shift in market inflation
expectations, which is conceptually different from
the real yield duration. The inflation
duration of a TIPS will be approximately zero,
but it may depend on the precise way the
calculation is carried out.

3. Assuming an inflation scenario that varies
by some amount based on the empirical
relationship between nominal yields and market
inflation expectations amounts to calculating
a nominal yield duration, which attempts to
measure the sensitivity of an inflation-linked
bond to a shift in nominal yields.
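
The three calculations can be compared numerically with a stylized Python sketch. It is an illustration only, under simplifying assumptions that are not in the text: a flat nominal zero coupon curve, annual coupons, a constant inflation forecast, and a hypothetical 10-year bond with a 0.5% real coupon priced at par. The parameter `beta` (a name introduced here) is the fraction of the nominal yield shift attributed to a change in the inflation forecast: 0 for the first calculation, 1 for the second, and 0.9 for the third.

```python
def projected_pv(real_coupon, years, inflation, nominal_yield, face=100.0):
    """PV of inflation-projected cash flows on a flat nominal curve (assumed)."""
    pv = 0.0
    for t in range(1, years + 1):
        real_cash_flow = face * real_coupon + (face if t == years else 0.0)
        nominal_cash_flow = real_cash_flow * (1.0 + inflation) ** t
        pv += nominal_cash_flow / (1.0 + nominal_yield) ** t
    return pv

def effective_duration(real_coupon, years, inflation, nominal_yield, beta, shift=0.01):
    """Duration from +/- shift in nominal yields, moving inflation by beta * shift."""
    base = projected_pv(real_coupon, years, inflation, nominal_yield)
    up = projected_pv(real_coupon, years, inflation + beta * shift,
                      nominal_yield + shift)
    down = projected_pv(real_coupon, years, inflation - beta * shift,
                        nominal_yield - shift)
    return (down - up) / (2.0 * shift * base)

# Hypothetical inputs: 0.5% real coupon and real yield, 3% inflation forecast.
real_yield, inflation = 0.005, 0.03
nominal_yield = (1.0 + real_yield) * (1.0 + inflation) - 1.0   # keeps PV at par
base_value = projected_pv(0.005, 10, inflation, nominal_yield)

fixed_flows = effective_duration(0.005, 10, inflation, nominal_yield, beta=0.0)
inflation_shift = effective_duration(0.005, 10, inflation, nominal_yield, beta=1.0)
empirical_shift = effective_duration(0.005, 10, inflation, nominal_yield, beta=0.9)
print(base_value, fixed_flows, inflation_shift, empirical_shift)
```

In this stylized setting the fixed cash flow calculation gives a long duration close to the real yield duration, the calculation with a one-for-one inflation adjustment gives a figure close to zero, and the empirically adjusted calculation gives a small positive duration that depends directly on the assumed value of `beta`.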

Real yield duration is the most important of
these risk measures and, as we have seen,
it can be calculated without using an
inflation forecast. The inflation duration is not a
useful risk measure for TIPS; however, in the
U.K. and Australian markets, where
inflation-indexed Treasuries have some residual
inflation sensitivity due to the lag in inflation
indexation, inflation duration is perhaps worth
monitoring. The definition of nominal yield
duration makes essential use of an estimate about
an empirical relationship that is probably
unstable, severely limiting the usefulness of this
risk measure.


Note that if inflation-indexed Treasury bonds
did have stable nominal durations (that is, if
they did respond in an absolutely predictable
way to a change in nominal yields), then they
would not be a very useful risk management
tool, since their mark-to-market behavior
could be perfectly replicated by nominal bonds,
which, moreover, are more liquid. In fact,
experience shows that inflation-indexed bonds
cannot be hedged perfectly with nominal bonds.

One can also attempt to compute a
"tax-adjusted duration" for an inflation-linked
bond, which takes its tax treatment into account;
this may be of importance in the U.K., where
inflation accruals are not taxed. In the U.S. market
inflation-linked and nominal bonds are taxed
on a broadly consistent basis; in particular, by
analogy with Treasury STRIPS, the inflation
adjustment to the bond principal is taxable as it
occurs, and not simply at bond maturity. Thus,
just as one continues to use pretax durations for
Treasury STRIPS despite their tax treatment, it
seems reasonable to use pretax durations for
TIPS as well. The trading behavior of
inflation-linked bonds in a range of markets
suggests that pretax duration measures suffice for
most day-to-day interest rate risk management.
However, it is worth discussing tax briefly.

The Impact of Taxation: An Outline 

Inflation-indexed bonds attempt to eliminate 
inflation risk, but it reappears on an after-tax 
basis. We begin with the fact that tax affects 
returns on both nominal bonds and inflation- 
indexed bonds in an unfortunate way: High in¬ 
flation results in lower after-tax real returns. For 
inflation-indexed bonds, an investor would rea¬ 
son as follows: 9 

$$
\begin{aligned}
\text{forecast after-tax real yield}
&= \text{forecast after-tax nominal yield} - \text{forecast inflation} \\
&= \text{tax rate} \times \text{forecast pretax nominal yield} - \text{forecast inflation} \\
&= \text{tax rate} \times (\text{pretax real yield} + \text{forecast inflation}) - \text{forecast inflation} \\
&= \text{tax rate} \times \text{pretax real yield} - (1 - \text{tax rate}) \times \text{forecast inflation}
\end{aligned}
$$

For nominal bonds, the reasoning is similar: 

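$$
\begin{aligned}
\text{forecast after-tax real yield}
&= \text{forecast after-tax nominal yield} - \text{forecast inflation} \\
&= \text{tax rate} \times \text{forecast pretax nominal yield} - \text{forecast inflation} \\
&= \text{tax rate} \times (\text{pretax real yield} + \text{market inflation}) - \text{forecast inflation} \\
&= \text{tax rate} \times \text{pretax real yield} - (\text{forecast inflation} - \text{tax rate} \times \text{market inflation})
\end{aligned}
$$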

where "forecast inflation" refers to the in¬ 
vestor's inflation forecast and "market infla¬ 
tion" refers to the market's inflation forecast as 
reflected in the spread between market nominal 
yields and market real yields. 

Thus an investor who agrees with the mar¬ 
ket's inflation forecast and who is thus in¬ 
different between inflation-linked bonds and 
nominal bonds on a pretax basis will also be 
indifferent on an after-tax basis. The arguments 
show that projected after-tax real returns on 
both inflation-indexed and nominal bonds de¬ 
pend on forecast inflation. 
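
The two chains of equalities can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the function names and the sample inputs are hypothetical, and the parameter `factor` stands for the multiplier that the text labels "tax rate" and applies to the pretax nominal yield.

```python
def after_tax_real_yield_linker(factor, pretax_real_yield, forecast_inflation):
    """After-tax real yield on an inflation-indexed bond, following the text.

    The pretax nominal yield is the real yield plus the investor's own
    inflation forecast, because the bond accrues realized inflation.
    """
    pretax_nominal = pretax_real_yield + forecast_inflation
    return factor * pretax_nominal - forecast_inflation


def after_tax_real_yield_nominal(factor, pretax_real_yield,
                                 market_inflation, forecast_inflation):
    """After-tax real yield on a nominal bond, following the text.

    The pretax nominal yield embeds the market's inflation forecast,
    while the investor deflates by his or her own forecast.
    """
    pretax_nominal = pretax_real_yield + market_inflation
    return factor * pretax_nominal - forecast_inflation


# Illustrative inputs only (assumptions, not market data).
factor = 0.65
real_yield = 0.02
market_inflation = 0.025

for forecast in (0.015, 0.025, 0.035):
    linker = after_tax_real_yield_linker(factor, real_yield, forecast)
    nominal = after_tax_real_yield_nominal(
        factor, real_yield, market_inflation, forecast)
    print(f"forecast {forecast:.3f}: linker {linker:.5f}, nominal {nominal:.5f}")
```

Under these assumptions the two results coincide only when the investor's forecast equals the market's, and the after-tax real yield on the indexed bond falls as forecast inflation rises.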

An important consequence is that since U.S. 
inflation-indexed bonds and nominal bonds are 
affected equally, inflation-linked bonds do not 
protect investors against the negative after-tax 
impact of high inflation. Thus, TIPS real yields 
reflect only a premium for "pretax inflation 
risk." By contrast, since U.K. index-linked gilts 
receive preferential tax treatment, their yields 
also reflect a premium for "after-tax inflation 
risk." The price paid by U.K. investors, as ob¬ 
served by Roll (1996) and by Brown and Schae¬ 
fer (1996), is lower liquidity: The market for 
index-linked gilts is confined to investors with 
high marginal tax rates and to investors who 
have other incentives, such as regulatory in¬ 
centives, to own inflation-linked securities. 10 
Roll (1996) points out a further consequence:
If the demand for inflation-indexed or nominal
bonds is a function of expected after-tax returns,
pretax real yields should rise as expected
inflation rises, to maintain a constant after-tax
real yield. It is not clear whether real yields on
inflation-indexed bonds actually behave in this
way, although the Australian experience in 1994
suggests that they do. In any case, this introduces
a further source of uncertainty about the
future behavior of real yields.

Inflation-Indexed Bonds and 
Portfolio Efficiency 

Inflation-indexed bonds have a risk profile 
quite different from that of nominal bonds. In 
fact, it could be argued that for asset alloca¬ 
tion purposes, they should not be grouped with 
nominal bonds but should be treated as an en¬ 
tirely separate asset class. We will use portfolio 
theory to explore the consequences of adopt¬ 
ing this point of view. More specifically, we will 
try to determine what weight TIPS should have 
in efficient portfolios with varying degrees of 
risk, and what impact their inclusion has on ex¬ 
pected returns. 

For simplicity, we focus on maximizing nom¬ 
inal returns in the U.S. market, and we work in 
a total return framework. Other kinds of analy¬ 
sis are possible; for example, Eichholtz, Naber, 
and Petri (1993) discuss the problem of match¬ 
ing inflation-indexed liabilities in the U.K. and 
Israeli markets. 

The results of any Markowitz-style analysis 
are always highly dependent on the expected 
returns, volatilities, and correlations used. The 
assumptions we use are set out in Table 1 and 
are broadly based on market data and pre¬ 
sumed market expectations. They were derived 
as follows: 

1. Expected nominal returns for cash and nominal
bonds (aggregate bond index) are based
on current market yields; this is more meaningful
than using historical returns. For simplicity,
we assume that nominal bonds and
TIPS have the same expected return (in practice,
one would derive expected returns more
carefully). The expected return for equities is
obtained by adding a risk premium of 3% to
that for bonds.

2. Return volatilities for cash, nominal bonds,
TIPS, and equities are historical, calculated
using monthly Barclays index return data
and S&P 500 return data over the period
1997-2011. In addition to the historical
volatility for TIPS (which is quite high,
largely due to the experience during the crisis
period), we also carry out an alternative
analysis that assumes that they have the
same volatility as nominal bonds.

3. We use historical correlations estimated using
the same time period.

Table 1 Assumptions Used in Efficient Portfolio Analysis

                     Cash     Bonds    TIPS         Equities
Expected return      2.0%     2.8%     2.8%         5.8%
Return volatility    0.6%     3.6%     5.9%/3.6%    16.5%
Correlations
  Cash               1.00     0.06     -0.03        0.01
  Bonds                       1.00     0.73         -0.01
  TIPS                                 1.00         0.03
  Equities                                          1.00
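
To make the role of these assumptions concrete, the sketch below computes the expected return and volatility of a given set of portfolio weights from the Table 1 inputs. It is an illustration only, not the optimization behind the figures in this entry: the function names and the two 60/40 example mixes are hypothetical, and a full efficient frontier would need a constrained optimizer on top of this.

```python
import math

ASSETS = ["Cash", "Bonds", "TIPS", "Equities"]
EXPECTED_RETURN = [0.020, 0.028, 0.028, 0.058]

# Two TIPS volatility assumptions from Table 1: historical (5.9%)
# and the alternative that matches nominal bonds (3.6%).
VOLATILITY = {
    "realistic": [0.006, 0.036, 0.059, 0.165],
    "low": [0.006, 0.036, 0.036, 0.165],
}

# Symmetric correlation matrix filled in from the upper triangle of Table 1.
CORRELATION = [
    [1.00, 0.06, -0.03, 0.01],
    [0.06, 1.00, 0.73, -0.01],
    [-0.03, 0.73, 1.00, 0.03],
    [0.01, -0.01, 0.03, 1.00],
]


def portfolio_return(weights):
    """Weighted average of the expected asset returns."""
    return sum(w * r for w, r in zip(weights, EXPECTED_RETURN))


def portfolio_volatility(weights, vols):
    """Square root of w' * Cov * w, with Cov built from vols and correlations."""
    variance = 0.0
    for i in range(len(weights)):
        for j in range(len(weights)):
            variance += (weights[i] * weights[j]
                         * CORRELATION[i][j] * vols[i] * vols[j])
    return math.sqrt(variance)


# Hypothetical mixes: 60% fixed income, 40% equities.
bond_mix = [0.0, 0.6, 0.0, 0.4]
tips_mix = [0.0, 0.0, 0.6, 0.4]

for label, vols in VOLATILITY.items():
    for name, weights in (("bonds/equities", bond_mix), ("TIPS/equities", tips_mix)):
        print(label, name,
              f"return {portfolio_return(weights):.4f}",
              f"volatility {portfolio_volatility(weights, vols):.4f}")
```

Because nominal bonds and TIPS share the same assumed expected return, the two mixes differ only in risk, which is why the volatility and correlation assumptions carry so much weight in the results.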


At first glance, the results seem highly depen¬ 
dent on the volatility assumption used for TIPS. 
Figure 5 shows the composition of theoreti¬ 
cally efficient portfolios with varying degrees 
of risk, using the realistic volatility assump¬ 
tion; Figure 6 shows the same, using the low 
volatility assumption. Using the higher volatil¬ 
ity, TIPS play almost no role in any efficient 
portfolio; for example, at moderate risk levels, 
nominal bonds are preferred because of their 
lower correlation with equities. However, using 
the lower volatility, TIPS have a much more im¬ 
portant role to play. They partly displace cash 
at low risk levels, and more importantly they 
partly displace nominal bonds at moderate risk 
levels. Only the equity weightings remain more 
or less unchanged. 

Figure 5 Composition of Efficient Portfolios, 5.9% Volatility Assumption

Figure 6 Composition of Efficient Portfolios, 3.6% Volatility Assumption

But how much value do TIPS actually add?
Figure 7 shows the efficient frontier; that is,
expected returns from efficient portfolios,
calculated using both the realistic and low TIPS
volatility assumptions. Above the 2% risk level,
they are very close: Expected returns differ
only marginally. That is, even assuming that
TIPS will have a very low return volatility does
not significantly increase their expected value
added to portfolio returns unless different
expected return assumptions are used as well
(or unless we move beyond the pure
mean-variance framework).

Figure 7 Efficient Frontier for the Two Different Volatility Assumptions

Figure 8 is even more telling. It shows expected
returns from efficient portfolios under
the low TIPS volatility assumption for both
unconstrained portfolios and portfolios from
which TIPS have been excluded. Even at moderate
risk levels, where TIPS are most important,
the difference in expected returns is extremely
modest. Moreover, an investor who currently
held a TIPS-free portfolio, and who wanted to
capture these additional few basis points by
purchasing TIPS, would have to trade over a
quarter of the portfolio to achieve the optimal
asset class weightings.

Figure 8 Efficient Portfolios Including and Excluding TIPS

The overall conclusions are that (1) a realis¬ 
tic TIPS return volatility assumption, consistent 
with historical experience, implies that TIPS do 
not add much value to asset allocation; and 
(2) even under a very optimistic TIPS return 
volatility assumption, the value added by TIPS 
is modest. The main reasons are that TIPS do 
not have a higher expected return than nomi¬ 
nal bonds, but have a slightly higher assumed 
correlation with equities. 

These results should be compared with the 
findings of Eichholtz, Naber, and Petri (1993), 
who used data from 1983-1991 and discovered 
a significant difference between relatively low- 
inflation countries such as the U.K. in that pe¬ 
riod and countries such as Israel where inflation 
had been extremely high and volatile. 

• Results for the U.K.: If the goal is to maximize 
total return, inflation-linked bonds do not ap¬ 
pear in any efficient portfolio. If inflation- 
linked liabilities are included in the problem 
(but setting regulatory considerations aside), 
they appear in very low-risk efficient portfo¬ 
lios, but with negligible weight: less than 1%. 


• Results for Israel: If the goal is to maximize 
total return, inflation-linked bonds play a mi¬ 
nor role in low-risk portfolios but a major 
role in risky portfolios, sometimes having a 
weight of over 50%. If inflation-linked liabil¬ 
ities are included in the problem, inflation- 
linked bonds play a major role at all levels of 
risk, with weights between 44% and 88%. 

TIPS provide insurance against inflation, and 
each investor's subjective assessment of future 
inflation risk and the need for inflation protec¬ 
tion must strongly influence any conclusions 
about the role of TIPS. U.S. investors will have 
to decide which set of results provides more 
useful guidance. 

ADVANCED ANALYTICAL 
APPROACHES TO 
INFLATION-INDEXED
BONDS 

As the U.S. TIPS market has matured, with a 
full term structure of maturities and a trading 
history spanning several business cycles and in¬ 
flation environments, investors have developed 
many analytical approaches in the search for
investment opportunities. Rather than attempting
a comprehensive survey, the remainder of
this entry gives two brief examples, both of
them focusing on economic factors rather than
supply/demand relationships or "market
technicals." Standard econometric techniques turn
out to be useful.

Link between TIPS Performance 
and Short-Term Inflation 

The relative performance of TIPS versus nom¬ 
inal Treasuries is determined both by daily 
mark-to-market movements and by infla¬ 
tion accrual, which influences both inflation- 
adjusted principal and interest payments. 
Inflation accrual is clearly determined by real¬ 
ized headline CPI inflation (relative to nominal 
yields). Since headline CPI is quite volatile, this 
"carry" component of TIPS returns can often be 
a dominant factor in the performance of short 
and even intermediate maturity TIPS. 

One useful way of looking at this is by isolating
the impact of the more volatile components
of CPI. Bryan and Meyer (2010) divide the CPI
basket into "flexible price" and "sticky price"
categories, leading to two separate measures
of inflation. Examples of flexible price items
are gasoline, fruit and vegetables, and women's
apparel; examples of sticky price items are
furniture, alcoholic beverages, and public
transportation.

Figure 9 shows that during the period since 
the introduction of TIPS in 1997, there was a 
positive correlation between changes in 10-year 
TIPS break-even inflation and changes in three- 
month flexible price CPI inflation. It may seem 
surprising that 10-year inflation expectations 
are visibly influenced by realized three-month 
inflation; but this simply reflects the strong in¬ 
fluence of carry on the trading behavior of TIPS, 
even 10-year maturity TIPS. 

We can get a more refined view of the re¬ 
lationship if we run a vector autoregression 
analysis and examine the impulse-response 
functions. Some of the results from an analy¬ 
sis using 2003-2011 data on five-year TIPS (i.e., 
a shorter maturity, more strongly influenced 
by carry considerations) are shown in Fig¬ 
ure 10. The impulse-response functions suggest 
that: 

1. 5-year TIPS break-even inflation responds
more strongly to flexible price CPI than
to sticky price CPI, that is, the sources of
shorter-term inflation volatility are more
important; despite the fact that
2. Shocks to flexible price CPI inflation tend to
be quite short-lived, dissipating after a couple
of months and even tending to (partially)
correct.

Figure 9 TIPS Performance and Flexible Price CPI
(axis label: change in flexible price CPI inflation, 3-month, seasonally adjusted)

Figure 10 Five-Year TIPS Break-Even Inflation and Shocks to Flexible Price CPI Inflation
(panels: response of 5-year TIPS breakeven inflation to inflation shocks; decay of a shock to flexible price CPI; axis label: months after shock)
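
For readers unfamiliar with impulse-response functions, the sketch below shows the mechanics for the simplest case, a first-order vector autoregression x(t) = A x(t-1) + e(t), where the response h months after a one-off shock is A applied h times to the shock. The coefficient matrix is invented purely for illustration and is not estimated from the 2003-2011 data discussed above; an estimated model would typically have more lags and orthogonalized shocks.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def impulse_response(coefficients, shock, horizons):
    """Responses of a VAR(1) system to a one-off shock at month 0.

    Returns a list of state vectors for months 0..horizons.
    """
    state = list(shock)
    responses = [state]
    for _ in range(horizons):
        state = mat_vec(coefficients, state)
        responses.append(state)
    return responses


# Hypothetical coefficients. Variable order:
# [flexible price CPI inflation, 5-year break-even inflation].
A = [
    [0.3, 0.0],   # the CPI shock decays quickly on its own
    [0.2, 0.8],   # break-even inflation reacts to CPI and is persistent
]

unit_cpi_shock = [1.0, 0.0]
for month, (cpi, breakeven) in enumerate(impulse_response(A, unit_cpi_shock, 6)):
    print(f"month {month}: CPI {cpi:.3f}, break-even {breakeven:.3f}")
```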


Since TIPS inflation accrual is based on non- 
seasonally-adjusted headline CPI inflation, a 
further aspect of TIPS carry is the strong sea¬ 
sonal pattern exhibited by CPI inflation. This 
needs to be analyzed separately. Seasonal fac¬ 
tors have often been a source of market ineffi¬ 
ciency in the past. 

The TIPS Premium versus
Survey-Based Real Yield Measures 

As can be seen from Figure 1, TIPS real yields 
have usually (but not always) been higher than 
the real yields implied by subtracting consen¬ 
sus inflation forecasts from observed nominal 
Treasury yields. In other words, TIPS real yields 
usually incorporate an apparent "concession." 
The historical behavior of this apparent real 
yield premium is shown in Figure 11, together 
with an estimate of its trend behavior (derived
by applying a standard Hodrick-Prescott
filter). 11

This premium has averaged around 40-50 bp, 
but has fluctuated quite a bit over time. It seems 
to mean revert to trend (heavy line in Figure 11) 
over about a 12-month period on average. 

Why would this premium exist? 

1. Survey bias: Economists' forecasts of future
inflation may be systematically biased
(higher) relative to the market's forecasts.
This is more likely to have been true during
the period of declining trend inflation
from the mid-1980s to the mid-2000s; and indeed
the real yield premium seems to have
decreased since then, though it has still been
positive on average.

2. Recalculation risk: There may be a downward 
bias to the risk of future changes to the defi¬ 
nition of CPI. 12 

3. Liquidity: TIPS are less liquid than nominals
(i.e., they have wider and more uncertain
bid/ask spreads, and greater market impact
of large trades), so investors require a higher
real yield to compensate for that.

4. Tracking error: TIPS aren't in the standard
bond indexes, so index-sensitive investors
need to be compensated for the fact that owning
TIPS leads to additional tracking error.

5. Undesirable correlations: TIPS tend to under¬ 
perform in deflations/recessions, which is 
when investors most want bonds to do well; 
another kind of undesirable correlation is 
that TIPS liquidity tends to deteriorate in pe¬ 
riods of general market stress. 

The first two factors are difficult to quantify, 
but one could argue that their influence is prob¬ 
ably fairly constant over time. It thus seems 
feasible to develop relative value measures con¬ 
ditioned on the remaining three factors, which 
are potentially more tractable. 



Figure 11 Apparent TIPS Real Yield Premium 

The (il)liquidity premium turns out to be
particularly important, since it exhibits the
most time variation. It is difficult to estimate
based on yield data alone; 13 furthermore, as
with all liquidity premiums, it is not fully
determined by prevailing liquidity conditions
(bid/ask spreads, quoted volumes, and market
impact of trades) but is also influenced by the
perceived risk that future liquidity conditions
may differ from today's.

Modeling liquidity premiums is extremely 
difficult, but useful information can be ex¬ 
tracted via model-free approaches. For exam¬ 
ple, Christensen and Gillan (2011) argue that 
the difference between TIPS break-even infla¬ 
tion and inflation swap rates provides a time- 
varying upper bound on the TIPS liquidity 
premium. This upper bound has typically fluc¬ 
tuated between 10 bp and 20 bp, but rose to over 
100 bp in late 2008 during the financial crisis; it 
has been highly correlated with other measures 
of bond liquidity, such as the yield premium 
of off-the-run versus on-the-run nominal Trea¬ 
suries. 
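
The model-free bound described above amounts to a simple spread calculation. The sketch below assumes the usual sign convention, in which the inflation swap rate sits above TIPS break-even inflation and the bound is quoted in basis points; the function names and the sample yields are hypothetical, not data from Christensen and Gillan (2011).

```python
def breakeven_inflation(nominal_yield, tips_real_yield):
    """TIPS break-even inflation: nominal Treasury yield minus TIPS real yield."""
    return nominal_yield - tips_real_yield


def liquidity_premium_upper_bound_bp(inflation_swap_rate, nominal_yield,
                                     tips_real_yield):
    """Upper bound on the TIPS liquidity premium, in basis points.

    Assumes matched maturities for the swap, the nominal bond, and the TIPS.
    """
    breakeven = breakeven_inflation(nominal_yield, tips_real_yield)
    return (inflation_swap_rate - breakeven) * 1e4


# Illustrative 10-year inputs, expressed as decimals.
bound = liquidity_premium_upper_bound_bp(
    inflation_swap_rate=0.0250, nominal_yield=0.0350, tips_real_yield=0.0115)
print(f"upper bound on liquidity premium: {bound:.1f} bp")
```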


KEY POINTS 

• TIPS real yields are volatile. They are influ¬ 
enced by domestic growth, external balances, 
and the behavior of competing asset classes. 

• TIPS real yields also reflect a modest and 
somewhat volatile inflation risk premium. 

• There are different notions of TIPS duration 
corresponding to different aspects of TIPS in¬ 
terest rate risk. 

• TIPS often do not play a significant role in 
efficient portfolios, and some investors may 
be better off regarding them as opportunistic 
rather than core investments. 

• Market returns on TIPS are often driven 
by short-term inflation accrual, and this is 
best analyzed by breaking observed inflation 
down into suitable components. 

• Survey-based measures of inflation and real
yields often differ from those implied by the
TIPS market, and it is important for investors
to understand the reasons for the divergence.

NOTES 

1. The following discussion of risk factors 
expands on the account in Carmody and 
Mason (1996). 

2. See Chapter 12 in Keynes (1936). 

3. See Eichholtz, Naber, and Petri (1993). 

4. See Brown and Schaefer (1996). 

5. See Eichholtz, Naber, and Petri (1993). 

6. See Carmody and Mason (1996). 

7. See the citations in Grishchenko and Huang
(2009).

8. See, for example, D'Amico, Kim, and Wei 
(2008) and Durham (2006). 

9. See Roll (1996) for more details. 

10. By the way, this provides an example of the 
tax clientele effects analyzed by Dybvig and 
Ross (1986). 

11. The Hodrick-Prescott filter developed in
Hodrick and Prescott (1997) is an econometric
technique employed in macroeconomics
in the analysis of time series data.

12. See the change in the calculation of CPI fol¬ 
lowing the recommendations of the Boskin 
Commission (1996). 

13. D'Amico, Kim, and Wei (2008). 

Abstract: Credit risk technology has evolved with advances in computer science and information
technology. Traditional credit ratings date back to 1860, an era when the cost of collecting and 
analyzing corporate credit information was high. The commercial advantages of a central provider 
of credit risk analysis were high. With the advent of better computer technology and databases of 
corporate financial information and stock prices, quantitative approaches to credit risk assessment 
have become more popular and increasingly accurate. Credit scoring is a quantitative approach to 
retail credit assessment, but, in the corporate world, more and more credit analysts prefer a default 
probability with an explicit maturity to a "credit rating" or "credit score." 


This entry introduces the topic of credit risk mod¬ 
eling by first summarizing the key objectives 
of credit risk modeling. We then discuss rat¬ 
ings and credit scores, contrasting them with 
modern default probability technology. Next, we 
discuss why valuation, pricing, and hedging of 
credit risky instruments are even more impor¬ 
tant than knowing the default probability of the 
issuer of the security. We review some empiri¬ 
cal data on the consistency of movements be¬ 
tween common stock prices and credit spreads 
with some surprising results. Finally, we com¬ 
pare the accuracy of ratings, the Merton model 
of risky debt, and reduced form credit models. 


KEY OBJECTIVES IN CREDIT 
RISK MODELING 

In short, the objective of the credit risk modeling 
process is to provide an investor with practical 
tools to "buy low/sell high." 1 Robert Merton, in
a 2002 story retold by van Deventer, Imai, and
Mesler (2004), explained how Wall Street has 
worked for years to get investors to focus on 
expected returns, ignoring risk, in order to get 
investors to move into higher risk investments. 
In a similar vein, investment banks have tried 
to get potential investors in collateralized debt 
obligations (CDOs) to focus on "expected loss" 
instead of market value and the volatility of that 
market value on a CDO. The result, according to 
the Global Stability Report of the International 
Monetary Fund, was an estimated $945 billion 
in global credit losses during the credit crisis 
that began in earnest in 2007. 2 

This means that we need more than a default
probability. The default probability provides
some help in the initial yes/no decision on
a new transaction, but it is not enough information
to make a well-informed yes/no, buy/sell
decision, as we discuss below. Once the transaction
is done, we have a number of very critical
objectives from the credit risk modeling
process. We need to know the value of the
portfolio, the risk of the portfolio (as measured 
most importantly by the random variation in its 
value), and the proper hedge of the risk if we 
deem the risk to be beyond our risk appetite. 
Indeed, the best single sentence test of a credit 
model is "What is the hedge?" If one cannot 
answer this question, the credit modeling effort 
falls far short of normal risk management stan¬ 
dards. It is inconceivable that an interest rate 
risk manager could not answer this question. 
Why should we expect any less from a credit 
risk manager, who probably has more risk in his 
area of responsibility than almost anyone else? 
Indeed, stress testing with respect to macroeco¬ 
nomic factors is now standard under proposals 
from the European Central Bank and under the 
U.S. programs titled "Supervisory Capital As¬ 
sessment Program" and "Comprehensive Cap¬ 
ital Analysis and Review." The latter programs, 
applied to the 19 largest financial institutions 
in the United States, focused on macro factors 
like home prices, real gross domestic product 
growth, and unemployment. 

RATINGS AND "CREDIT 
SCORES" VERSUS DEFAULT 
PROBABILITIES 

Rating agencies have played a major role in 
fixed income markets around the world since 
the origins of Standard & Poor's in 1860. Even 
the "rating agencies" of consumer debt, the 
credit bureaus, play prominently in the bank¬ 
ing markets of most industrialized countries. 
Why do financial institutions use ratings and 
credit scores instead of default probabilities? As 
a former banker myself, I confess that the em¬ 
barrassing answer is "There is no good reason" 
to use a rating or a credit score as long as the de¬ 
fault probability modeling effort is a sophisticated 
one and the inputs to that model are complete. 

Ratings have a lot in common with interest
accrual based on 360 days in a year. Both ratings
and this interest accrual convention date
from an era that predates calculators and modern
default probability technology. Why use a
debt rating updated every 1-2 years when one
can literally have the full term structure of default
probabilities on every public company updated
daily or in real time? In the past, there
were good reasons for the reliance on ratings:

• Default probability formulas were not dis¬ 
closed, so proper corporate governance 
would not allow reliance on those default 
probabilities. 

• Default probability model accuracy was ei¬ 
ther not disclosed or disclosed in such a way 
that weak performance was disguised by se¬ 
lecting small sectors of the covered universe 
for testing. 

• Default probability models relied on old tech¬ 
nology, like the Merton model of risky debt 
and its variants, that has long been recognized 
as out of date. 3 

• Default probability models implausibly re¬ 
lied on a single input (the unobservable value 
of company assets), ignoring other obvious 
determinants of credit risk like cash balances, 
cash flow coverage, the charge card balance 
of the CEO of a small business, or the number 
of days past due on a retail credit. 

None of these reasons is valid today, because a
rich, modern credit technology is available, with
full disclosure and an unconstrained ability to
use any helpful explanatory variable. In this vein,
ratings suffer in a number of comparisons with
the modern credit model:

• Ratings are discrete with a limited number of 
grades. There are 21 Standard & Poor's rat¬ 
ings grades, for example, running from AAA 
to D. Default probabilities are continuous and 
run (or should run) from 0 to 100%. 

• Ratings are updated very infrequently and 
there are obvious barriers that provoke even 
later than usual response from the rating 
agencies, like the 2004 downgrade from AAA 
to AA- for Merck, a full three weeks after the
withdrawal of its major drug Vioxx crushed
the company's stock price. Another example 
is General Electric, first rated AAA in 1956, 
which was not downgraded until March 2009, 
a full four months after General Electric was 
forced to borrow under the Federal Reserve's 
Commercial Paper Funding Facility. 4 Default 
probabilities can adjust in real time if done 
right. 

• Ratings have an ambiguous maturity, which 
we discuss in the next section. The full term 
structure of default probabilities is available 
and the obvious impact of the business cy¬ 
cle is observable: The full default probabil¬ 
ity term structure rises and falls through the 
business cycle, with short-term default prob¬ 
abilities rising and falling more dramatically 
than long-term default probabilities. Figure 1 
illustrates this cyclical rise and fall during the 
credit crisis of 2007-2011 for Bank of Amer¬ 
ica Corporation and Citigroup, two of the 
largest U.S. bank holding companies, using 
the reduced form model default probabilities 
discussed below and provided by Kamakura 
Corporation. 


The cyclical rise and fall of default probabil¬ 
ities for both banks are very clear and show 
the impact of the credit crisis, which was at its 
height in 2007-2009. We take a longer-term view 
from 1990 to 2006 below. 

Figure 2 shows clearly the joint rise in default
probabilities in 1990-1991, a mini recession in
1994-1995, and the impact of the Russian debt
crisis and high-tech crash in 1998-2002. By way
of contrast, Standard & Poor's only changed
its ratings on Bank of America twice in the
1995-2006 period, once in 1996 and once in 2005.

Figure 1 Five-Year Default Probabilities for Bank of America and Citigroup: January 1, 2007 to
May 1, 2011

Figure 2 Five-Year Default Probabilities for Bank of America and Citigroup: January 1, 1990 to
December 31, 2006

What about consumer and small business
"credit scores"? Like ratings and the interest
accrual method mentioned above, these date from
an era when there was limited understanding of
credit risk in the financial community. Vendors
of credit scores had two objectives in marketing
a credit risk product: to make it simple enough
for any banker to understand and to avoid angering
consumers who might later learn how
they are ranked under the credit measure. The
latter concern is still, ironically, the best reason
for the use of credit scores instead of default
probabilities today on the retail side. From a
banker's perspective, though, the score hides
information that is known to the credit score
vendor. The credit scoring vendor is actually using
the statistical techniques we describe below
to derive a default probability for the consumer.
They then hide it by scaling the default probability
to run over some arbitrary range like 600
to 1,000 with 1,000 being best. 5 One scaling that
does this, for example, is the formula:

Credit score = 1,000 — 4 (Consumer 1-year 
default probability) 

This scaling formula hides the default proba¬ 
bility that Basel II requires and modern bankers 
are forced to "undo" by analyzing the mapping 
of credit scores to defaults. This just wastes ev¬ 
eryone's time for no good reason other than the 
desire to avoid angering retail borrowers with 
a cold-hearted default probability assessment. 
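
The "undoing" of such a scaling is simple algebra once the mapping is known. The sketch below uses the example formula from the text and assumes the default probability is expressed in percentage points, so that probabilities from 0 to 100 map to scores from 1,000 down to 600. The function names are hypothetical, and real vendor scalings are not disclosed in this form.

```python
def credit_score(default_probability_pct):
    """Example scaling from the text: score = 1,000 - 4 x (1-year PD in percent)."""
    if not 0.0 <= default_probability_pct <= 100.0:
        raise ValueError("default probability must be between 0 and 100 percent")
    return 1000.0 - 4.0 * default_probability_pct


def implied_default_probability_pct(score):
    """Invert the example scaling to recover the 1-year PD in percent."""
    if not 600.0 <= score <= 1000.0:
        raise ValueError("score must be between 600 and 1,000")
    return (1000.0 - score) / 4.0


# A hypothetical borrower with a 2.5% one-year default probability.
score = credit_score(2.5)
print(score, implied_default_probability_pct(score))
```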

The only time a rating or credit score can
outperform a modern credit model is if there are
variables missing in the credit model. Heading
into the credit crisis as of December 31, 2006,
for example, Citigroup had a roughly $50 billion
direct and indirect exposure to super senior
tranches of collateralized debt obligations, but
these exposures were not reported in a quantitative
form and therefore could not be used in
a quantitative credit model. A judgmental rating
in this case would be able to adjust for this
risk if proper disclosure were made to the rating
agencies. This, however, is a rare case and
in general a first-class modeling effort will be
consistently superior. 6


WHAT "THROUGH THE 
CYCLE" REALLY MEANS 

Financial market participants often comment 
that default probabilities span a specific pe¬ 
riod of time (30 days, 1 year, 5 years) while 
ratings are "through the cycle" ratings. What 
does "through the cycle" really mean? 

Figure 3 provides the answer. It shows the
term structure of default probabilities out for
10 years for Morgan Stanley on October 15,
2008, one month after the collapse of Lehman
Brothers, and July 7, 2011. The July 7, 2011
term structure was quite low because business
conditions at the time were excellent. 7 Looking
at the right-hand side of the curve, we can
see that both default probability curves are
converging and, if the graph is continued to a long
enough maturity, both will hit about 42-50 basis
points for a very long-term default probability.

Figure 3 Term Structure of Default Probabilities for Morgan Stanley on October 15, 2008
and July 7, 2011

This is consistent with the "long-run" default
experience for Morgan Stanley's 2011 rating
of A. 8 Over the 15 years after being rated
A, 2.77% of those formerly rated A defaulted. 
This is the same as a constant default rate over 
those 15 years of 18.7 basis points, a rate dou¬ 
ble the 8 basis point default rate in just the first 
one year after being rated A. Morgan Stanley 
is a higher than average risk for an A-rated 
company as it was forced to borrow as much 
as $61.3 billion from the Federal Reserve on 
September 29, 2008. 9 "Through the cycle" has 
a very simple meaning—it is a very long-term 
default probability that is totally consistent with 
the term structure of default probabilities of a well- 
specified model. What is the term? The major 
rating agencies are currently reporting about 
30 years of historical experience, so the answer 
is 30 years. 
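
The conversion from a cumulative default rate to an equivalent constant annual rate can be reproduced in a few lines. The sketch below assumes that survival compounds at a constant annual rate, which recovers the 18.7 basis point figure quoted above; the function name is hypothetical.

```python
def constant_annual_default_rate(cumulative_default, years):
    """Constant annual default rate that compounds to the cumulative rate.

    Solves 1 - (1 - p) ** years = cumulative_default for p.
    """
    return 1.0 - (1.0 - cumulative_default) ** (1.0 / years)


# 2.77% of issuers defaulted within 15 years of being rated A.
annual_rate = constant_annual_default_rate(0.0277, 15)
print(f"{annual_rate * 1e4:.1f} basis points per year")
```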


VALUATION, PRICING, AND 
HEDGING 

Earlier in this entry, we said the best one- 
sentence test of a credit model is "what is the 
hedge?" That statement is no exaggeration, be¬ 
cause in order to be able to specify the hedge, we 
need to be able to value the risky credit (or port¬ 
folio of risky credits). If we can value the credits, 
we can price them as well. If we can value them, 
we can stress test that valuation as macroeco¬ 
nomic factors driving default probabilities shift. 
The pervasive impact of macroeconomic factors on default probabilities,
which Figure 1 shows for Bank of America and Citigroup, makes obvious
what is documented by van Deventer and Imai (2003): The business cycle
drives default risk (and valuations) up and down. With this valuation
capability, we can meet one of the key objectives specified in this
entry: We know the true value of everything we own and everything Wall
Street wants us to buy or sell. We can see that the structured product
offered at 103 is in reality only worth 98. This capability is
essential to meet modern risk management standards. Just as important,
it is critical insurance against becoming yet another victim of Wall
Street.


EMPIRICAL DATA ON 
CREDIT SPREADS AND 
COMMON STOCK PRICES 

Before exploring the nature and performance 
of modern credit models, it is useful to look at 
the relationship between stock prices and credit 
spreads. Van Deventer and Imai (2003) print in 
its entirety a useful data series of new issue 
credit spreads compiled over a nine-year pe¬ 
riod beginning in the mid-1980s by First Inter¬ 
state Bancorp. First Interstate at the time was 
the seventh largest bank holding company in 
the United States, one of the largest debt issuers 
in the United States, and a company whose rat¬ 
ing ranged from AA to BB during the course 
of the data series. The credit spreads were the 
average credit spread quoted for a new issue of 
noncall debt of $100 million by six investment 
banking firms, with the high and low quota¬ 
tions thrown out. Data were collected weekly 
for 427 weeks. No yield curve smoothing or 
secondary market bond prices were necessary 
to get the spreads, as the spreads themselves 
were the pricing quotation. These data, in the 
author's judgment, are much more reliable than 
the average credit default swap spread avail¬ 
able since 2003 because of the extremely low 
volumes of credit default swap transactions re¬ 
ported by the Depository Trust and Clearing 
Corporation on www.dtcc.com. 

Jarrow and van Deventer (1998, 1999) first 
used these data to test the implications of credit 
models. They reported the following findings 
on the relationship between credit spreads and 
equity prices: 

• Stock prices and credit spreads moved in opposite directions during
  the week 172-184 times (depending on the maturity of the credit
  spread) of the 427 observations.

• Stock prices and credit spreads were both unchanged in only 1-3
  observations.

• In total, only 40.7% to 43.6% of the observations were consistent
  with the Merton model (and literally any of its single factor
  variants) of risky debt.

This means that multiple variables are im¬ 
pacting credit spreads and stock prices, not the 
single variable (the value of company assets) 
that is the explanatory variable in any of the 
commercially available implementations of de¬ 
fault probabilities that are Merton related. We 
address this issue in detail in our discussion of 
the Merton model 10 and its variants in the fol¬ 
lowing section. The summary data on the First 
Interstate stock price and credit spreads are re¬ 
produced in Table 1. 


STRUCTURAL MODELS OF 
RISKY DEBT 

Modern derivatives technology was the first place analysts turned in
the mid-1970s as they sought to augment Altman's early work on
corporate default prediction with an analytical model of default. 11
The original work in this regard was done by Black and Scholes (1973)
and Merton (1974). This early work and almost all of the more recent
extensions of it share a common framework:

• The assets of the firm are assumed to be per¬ 
fectly liquid and are traded in efficient mar¬ 
kets with no transactions costs. 

• The amount of debt is set at time zero and 
does not vary. 

• The value of the assets of the firm equals the sum of the equity
  value and the debt value, the original Modigliani and Miller
  assumptions.

All of the analysts using this framework conclude that the equity of
the firm is some kind of option on the assets of the firm. An immediate
implication of this is that one variable (except in the cases of random
interest rates assumed below), the random value of company assets,
completely determines stock prices, debt prices, and credit spreads.
Except for the random interest rate versions of the model, this means
that when the value of company assets rises, then stock prices should
rise and credit spreads should fall. Table 1 rejects the hypothesis
that this result is true by 23.5 to 24.9 standard deviations using the
First Interstate data described earlier. In fact, as the First
Interstate data show, stock prices and credit spreads move in the
direction implied by various versions of the Merton model only 40.7% to
43.6% of the time. Van Deventer and Imai (2003) report on a similar
analysis for a large number of companies with more than 20,000
observations and find similar results.

Table 1 Analysis of Changes in First Interstate Bancorp Credit Spreads
and Stock Prices

                                                Spread    Spread    Spread    Spread    Spread
                                               2 Years   3 Years   5 Years   7 Years  10 Years     Total
Total Number of Data Points                        427       427       427       427       427      2135
Data Points Consistent with Merton
  Opposite Move in Stock Price and Spreads         179       178       183       172       184       896
  Stock Price and Credit Spreads Unchanged           3         3         1         2         2        11
  Total Consistent                                 182       181       184       174       186       907
Percent Consistent with Merton Model             42.6%     42.4%     43.1%     40.7%     43.6%     42.5%
Standard Deviation                                2.4%      2.4%      2.4%      2.4%      2.4%      1.1%
Standard Deviations from 100% Consistency        -23.9     -24.1     -23.7     -24.9     -23.5     -53.8
Standard Deviations from 50% Consistency          -3.1      -3.2      -2.9      -3.9      -2.7      -7.0

Source: van Deventer and Imai (2003).
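
The last rows of Table 1 can be approximately reproduced with a short
calculation. The entry does not say how the standard deviation was
computed; the sketch below assumes the usual binomial standard error,
sqrt(p(1 - p)/n), of the observed proportion p, which appears to match
the reported figures to within rounding. The function name is
illustrative.

```python
import math


def consistency_z_scores(consistent: int, total: int) -> dict:
    """Distance of an observed consistency rate from 100% and from 50%.

    Assumption (not stated in the source): the standard deviation is the
    binomial standard error of the observed proportion.
    """
    p = consistent / total
    std_dev = math.sqrt(p * (1.0 - p) / total)
    return {
        "percent": p,
        "std_dev": std_dev,
        "z_vs_100": (p - 1.0) / std_dev,
        "z_vs_50": (p - 0.5) / std_dev,
    }


# 7-year spread column of Table 1: 174 consistent weeks out of 427.
stats = consistency_z_scores(174, 427)
print({name: round(value, 3) for name, value in stats.items()})
```

A 50% benchmark corresponds to a coin flip, so a negative distance from
50% means the one-factor prediction did worse than chance on these data.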

Given this inconsistency of actual market movements with the strongly
restrictive assumption that only one variable drives debt and equity
prices, why did analysts choose the structural models of risky debt in
the first place? Originally, the models were implemented on the hope
(and sometimes belief) that performance must be good. Later, once the
performance of the model was found to be poor, this was known only to
very large financial institutions that had an extensive credit model
testing regime. One very large institution, for example, told the
author in 2003 that it had known for years that the most popular
commercial implementation of the Merton model of risky debt was less
accurate than the market leverage ratio in the ordinal ranking of
companies by riskiness. The firm was actively using this knowledge to
arbitrage market participants who believed, but had not confirmed, that
the Merton model of risky debt was accurate. We report on the large
body of test results that began to enter the public domain in 1998 in a
later section.

As analysts began to realize there were prob¬ 
lems with the structural models of risky debt, 
active attempts were made to improve the 
model. We present in the following paragraphs 
a brief listing of the types of assumptions that 
can be used in the structural models of risky 
debt. 12 


Pure Black-Scholes/Merton 
Approach 

The original Merton model assumes interest rates are constant and that
equity is a European option on the assets of the firm. This means that
bankruptcy can occur only at the maturity of the single debt instrument
issued by the firm. Lando (2004, p. 14) notes a very important
liability of the basic Merton model as the maturity of debt gets
progressively shorter: "When the value of assets is larger than the
face value of debt, the yield spreads go to zero as time to maturity
goes to 0 in the Merton model." This is a critical handicap in trying
to use this one-period model as a complete valuation framework. If
credit spreads are unrealistic, we cannot achieve accuracy in our
one-sentence credit model test: What's the hedge?
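
The vanishing short-maturity spread can be illustrated numerically. The
sketch below uses the standard textbook form of the basic Merton model,
in which zero-coupon debt is worth the assets minus a European call
struck at the face value of debt. The inputs (assets of 120, face value
of 100, 20% asset volatility, 5% risk-free rate) are hypothetical and
chosen only for illustration.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def merton_credit_spread(assets: float, face_value: float, sigma: float,
                         r: float, maturity: float) -> float:
    """Continuously compounded yield spread on zero-coupon risky debt."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(assets / face_value)
          + (r + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    # Debt value = assets minus the call option held by equity holders.
    debt_value = (assets * norm_cdf(-d1)
                  + face_value * math.exp(-r * maturity) * norm_cdf(d2))
    debt_yield = -math.log(debt_value / face_value) / maturity
    return debt_yield - r


# Hypothetical firm whose assets exceed the face value of its debt.
for t in (5.0, 1.0, 0.1, 0.01):
    spread = merton_credit_spread(120.0, 100.0, 0.20, 0.05, t)
    print(f"maturity {t:>5} years: spread {spread * 1e4:8.3f} bp")
```

With these inputs the printed spread shrinks toward zero as the maturity
shortens, which is the behavior Lando describes.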

We note here that allowing for various classes 
of debt is a very modest extension of the 
model. Allowing for subordinated debt does 
not change the probability of default. The im¬ 
plicit loss given default will simply be higher 
for the subordinated debt issue than it will for 
the senior debt issue. 


Merton Model with Stochastic 
Interest Rates 

The Merton model with stochastic interest rates 
was published by Shimko, Tejima, and van De¬ 
venter (1993). This modest extension of the 
original Merton framework simply combined 
Merton's own model for options when interest 
rates are random with the structural credit risk 
framework. The model has the virtue of allow¬ 
ing two random factors (the risk-free short-term 
rate of interest and the value of company assets, 
which can have any arbitrary degree of corre¬ 
lation). It provides at least a partial explanation 
of the First Interstate results discussed above, 
but it shares most of the other liabilities of the 
basic Merton approach. 

The Merton Model with Jumps in 
Asset Values 

One of the most straightforward ways in which to make credit spreads
more realistic is to assume that there are random jumps in the value of
company assets, overlaid on top of the basic Merton assumption of
geometric Brownian motion (i.e., normally distributed asset returns and
lognormally distributed asset values). This model produces more
realistic credit spread values, but Lando (2004, p. 27) concludes,
"while the jump-diffusion model is excellent for illustration and
simulating the effects of jumps, the problems in estimating the model
make it less attractive in practical risk management."

Introducing Early Default in the 
Merton Structural Approach 

In 1976, Black and Cox allowed default to oc¬ 
cur prior to the maturity of debt if the value of 
company assets hits a deterministic barrier that 
can be a function of time. The value of equity is 
the equivalent of a "down and out" call option. 
When there are dividend payments, model¬ 
ing gets much more complicated. Lando (2004, 
p. 33) summarizes key attributes of this mod¬ 
eling assumption: "While the existence of a de¬ 
fault barrier increases the probability of default 
in a Black-Cox setting compared with that in a 
Merton setting, note that the bond holders ac¬ 
tually take over the remaining assets when the 
boundary is hit and this in fact leads to higher 
bond prices and lower spreads." 

Other Variations on the 
Merton Model 

Other extensions of the model summarized by 
Lando include 

• A Merton model with continuous coupons 
and perpetual debt. 

• Stochastic interest rates and jumps with bar¬ 
riers in the Merton model. 

• Models of capital structure with stationary 
leverage ratios. 

Ironically, all current commercial implementations of the Merton model
for default probability estimation are minor variations on the original
Merton model or extremely modest extensions of Black and Cox (1976). In
short, at best 34-year-old technology is being used. Moreover, all
current commercial implementations assume interest rates are constant,
making failure of the "What's the hedge?" test a certainty for fixed
income portfolio managers, the primary users of default technology. All
of the problems raised in the previous section on the First Interstate
dataset remain for all current commercial implementations. That has
much to do with the empirical results summarized below.


REDUCED-FORM MODELS 
OF RISKY DEBT 

The many problems with the major variations on the Merton approach led
Jarrow and Turnbull (1995) to elaborate on a reduced form of the
original Merton model. In his options model for companies where the
stock price is lognormally distributed, Merton allowed for a constant
instantaneous default intensity. If the default event occurred, the
stock price was assumed to go to zero. Merton derived the value of
options on a defaultable common stock in a constant interest rates
framework. Van Deventer (2006) shows how to use this Merton "reduced
form" model to imply default probabilities from observable put and call
options.

Jarrow and Turnbull adopted this default in¬ 
tensity approach as an alternative to the Mer¬ 
ton structural approach. They did so under 
the increasingly popular belief that compa¬ 
nies' choices of capital structure vary dynam¬ 
ically with the credit quality of the firm, and 
that the assets they hold are often highly illiq¬ 
uid, contrary to the assumptions in the struc¬ 
tural approach. Duffie and Singleton (1999), 
Jarrow (2001), and many others have dramat¬ 
ically increased the richness of the original 
Jarrow-Turnbull model to include the following 
features: 

• Interest rates are random.

• An instantaneous default intensity is also random and driven by
  interest rates and one or more random macroeconomic factors.

• Bonds are traded in a less liquid market, and credit spreads have a
  "liquidity premium" above and beyond the loss component of the credit
  spread.

• Loss given default can be random and driven by macroeconomic factors
  as well.

Default intensities and the full term structure 
of default probabilities can be derived in two 
ways: 

• By implicit estimation, from observable bond prices, credit default
  swap prices, or options prices or any combination of them

• By explicit estimation, using a historical default database

The first sustained commercial implementation of the latter approach
was the 2002 launch of the Kamakura Risk Information Services multiple
models default probability service, which includes both Merton and
reduced form models benchmarked against historical default databases.
The first commercial implementation of this approach for sovereign
default risk assessment was also by Kamakura Risk Information Services
in 2008.

In deriving default probabilities from historical data, financial
economists have converged on a hazard rate modeling estimation
procedure using logistic regression, where estimated default
probabilities $P[t]$ are fitted to a historical database with both
defaulting and nondefaulting observations and a list of explanatory
variables $X_i$. Chava and Jarrow (2004) prove that the logistic
regression is the maximum likelihood estimator when trying to predict a
dependent variable that is either one (i.e., in the default case) or
zero (in the "no default" case):

$$P[t] = \frac{1}{1 + \exp\left(-\alpha - \sum_{i=1}^{n} \beta_i X_i\right)}$$

This simple equation makes obvious the most important virtue of the
reduced form approach. The reduced form approach can employ any
variable, without restriction, that improves the quality of default
prediction, because any variable can contribute in the equation above,
including Merton default probabilities if they have explanatory power.
This means that the reduced form approach can never be worse than the
Merton model because the Merton model can always be an input. The
reverse is not true: The charge card balance of the chief executive
officer is a well-known predictor of small business default, but the
Merton default formulas do not have the flexibility to use this
insight. Note also that the linear function in the denominator can be
thought of as Altman's 1968 z-score concept. In that sense, the reduced
form/logistic regression approach has both Altman's work and Merton's
work as ancestors.
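
The flexibility of the logistic formula is easy to see in code. The
sketch below evaluates the formula for an arbitrary list of inputs. The
coefficient values and the three example inputs (a leverage ratio, a
macroeconomic factor, and a Merton default probability) are hypothetical
placeholders, not estimates from any model discussed in this entry.

```python
import math
from typing import Sequence


def logistic_default_probability(alpha: float, betas: Sequence[float],
                                 xs: Sequence[float]) -> float:
    """Evaluate 1 / (1 + exp(-alpha - sum_i beta_i * x_i))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    linear_score = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical coefficients and inputs, for illustration only.
alpha = -6.0
betas = [2.5, 1.2, 0.8]     # leverage, macro factor, Merton probability
inputs = [0.60, 0.30, 0.02]
hybrid_pd = logistic_default_probability(alpha, betas, inputs)
print(f"hybrid default probability: {hybrid_pd:.4%}")
```

Adding another explanatory variable only lengthens the two lists, which
is why a Merton default probability can always enter as one more input.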

In short, reduced form models can be the re¬ 
sult of unconstrained variable selection among 
the full set of variables that add true eco¬ 
nomic explanatory power to default prediction. 
The Merton model, in any variation, is a con¬ 
strained approach to default estimation because 
the mathematical formula for the model does 
not allow many potential explanatory variables 
to be used. 

Most importantly, the logistic regression ap¬ 
proach provides a solid opportunity to test 
whether in fact the Merton model does have 
the problems one would predict from the First 
Interstate data discussed above. We turn to that 
task now. 


EMPIRICAL EVIDENCE ON 
MODEL PERFORMANCE 

Bharath and Shumway (2008) conduct an extensive test of the Merton
approach. They test two hypotheses. Hypothesis 1 is that the Merton
model is a "sufficient statistic" for the probability of default, that
is, a variable so powerful that in a logistic regression like the
formula in the previous section no other explanatory variables add
explanatory power. Hypothesis 2 is that the Merton model adds
explanatory power even if common reduced form model explanatory
variables are present. They specifically test modifications of the
Merton structure partially disclosed by commercial vendors of the
Merton model. The Bharath and Shumway (2008) conclusions, based on all
publicly traded firms in the United States (except financial firms)
using quarterly data from 1980 to 2003, are as follows: 13

• "We conclude that the ... Merton model does 
not produce a sufficient statistic for the prob¬ 
ability of default." 

• "Models 6 and 7 include a number of other co¬ 
variates: the firm's returns over the past year, 
the log of the firm's debt, the inverse of the 
firm's equity volatility, and the firm's ratio 
of net income to total assets. Each of these 
predictors is statistically significant, making 
our rejection of hypothesis one quite robust. 
Interestingly, with all of these predictors in¬ 
cluded in the hazard model, the ... Merton 
probability is no longer statistically signifi¬ 
cant, implying that we can reject hypothesis 
two." 

• "Looking at CDS implied default probabil¬ 
ity regressions and bond yield spread regres¬ 
sions, the ... Merton probability does not 
appear to be a significant predictor of either 
quantity when our naive probability, agency 
ratings and other explanatory variables are 
accounted for." 

These conclusions have been confirmed by Kamakura Corporation in five
studies done in 2002, 2003, 2004, 2006, and 2011. The current Kamakura
default database includes more than 1.76 million monthly observations
on all public companies in North America from 1990 to December 2008,
including 2,046 defaulting observations. Both hypotheses 1 and 2 were
tested in the context of a "hybrid" model, which adds the Kamakura
Merton implementation as an additional explanatory variable alongside
the Kamakura reduced form model inputs. In every case, Kamakura agrees
with Bharath and Shumway that hypothesis 1 can be strongly rejected.
Kamakura has found 49 other variables that are statistically
significant predictors of default even when Merton default
probabilities are added as an explanatory variable.

Somewhat differently from Bharath and Shumway, Kamakura finds that the
Merton default probability has weak statistical significance when added
as an explanatory variable to these other 49 variables, but the
coefficient on the Merton default probability has the wrong sign: When
Merton default probabilities rise, the predicted hybrid default
probabilities fall. This is because Merton default probabilities are
highly correlated with other variables like the market leverage ratio
(which was mentioned above as out-predicting the commercial Merton
implementation) and the ratio of total liabilities to total assets. It
is an interesting econometric question whether the Merton input
variable should be retained in such an event.

These findings were indirectly confirmed in Bohn, Arora, and Korablev
(2005), in which Moody's for the first time released quantitative test
results on its Merton implementation. In that paper, the authors report
on the relative accuracy of their proprietary Merton implementation
compared to the more standard Merton theoretical implementation; they
state that on a relatively easy data set (1996-2004 with small firms
and financial institutions excluded) the proprietary Merton
implementation has a receiver operating characteristics (ROC) accuracy
ratio 7.5% higher than the standard Merton implementation. 14 This puts
the accuracy of the Moody's model more than 5% below that reported on a
harder data set (all public firms of all sizes, including banks,
1990-2004) in the Kamakura Risk Information Services Technical Guide,
Version 4.1 (2005) and again in the Kamakura Risk Information Services
Guide, Version 5.0 (2010) on data spanning 1990-2008. The accuracy is
also well below reduced form model accuracy published in Bharath and
Shumway (2008), Campbell, Hilscher, and Szilagyi (2008), Hilscher and
Wilson (2011), van Deventer and Imai (2003), and van Deventer, Imai,
and Mesler (2004). The standard Merton accuracy ratio reported by Bohn,
Arora, and Korablev (2005) is identical to that reported by Kamakura on
a harder data set. It is not surprising that there were no comparisons
to reduced-form models using logistic regression in Bohn, Arora, and
Korablev.
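
For readers unfamiliar with the two accuracy scales mentioned above and
in note 14, the sketch below computes a ROC accuracy ratio as the
fraction of defaulter/survivor pairs that a model ranks correctly, and
converts it to the cumulative accuracy profile (CAP) scale using the
standard relation CAP = 2 x ROC - 1. The four-observation sample is
made up purely to exercise the functions.

```python
from typing import Sequence


def roc_accuracy_ratio(scores: Sequence[float],
                       defaulted: Sequence[bool]) -> float:
    """Share of (defaulter, survivor) pairs ranked correctly; ties count half."""
    defaulters = [s for s, d in zip(scores, defaulted) if d]
    survivors = [s for s, d in zip(scores, defaulted) if not d]
    if not defaulters or not survivors:
        raise ValueError("need at least one defaulter and one survivor")
    wins = 0.0
    for d_score in defaulters:
        for s_score in survivors:
            if d_score > s_score:
                wins += 1.0
            elif d_score == s_score:
                wins += 0.5
    return wins / (len(defaulters) * len(survivors))


def roc_to_cap(roc: float) -> float:
    """Map the 0.5-1.0 ROC scale onto the 0-1 CAP accuracy ratio scale."""
    return 2.0 * roc - 1.0


# Made-up default probabilities and outcomes (True = defaulted).
sample_roc = roc_accuracy_ratio([0.9, 0.8, 0.3, 0.2],
                                [True, False, True, False])
print(sample_roc, roc_to_cap(sample_roc))
```

Because the CAP scale stretches the ROC scale by a factor of two, a
7.5-point ROC difference corresponds to a 15-point CAP difference, as
note 14 states.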


KEY POINTS 

• Ratings date from the founding of a prede¬ 
cessor of Standard & Poor's in 1860. The very 
existence of ratings as a credit assessment tool 
dates from an era when computers did not ex¬ 
ist and the electronic transmission of financial 
information was impossible. 

• Because of this history, ratings are extremely 
simple ordinal rankings of firms or other 
counterparties by a small number of ratings 
grades, 21 grades in the case of the U.S. rating 
agencies. 

• Ratings have no explicit maturity and no 
explicit default probability associated with 
them. 

• For consumer credit risk assessment, "credit 
scores" are similar to ratings in that they are 
an ordinal risk measure, they have no matu¬ 
rity, and they have no explicit default proba¬ 
bility associated with the score. While some 
credit bureaus state that credit scores rank 
the risk of a 90-day past due experience over 
24 months, they are used on the full spectrum 
of credits from charge cards to 30-year mort¬ 
gages. 

• Unlike ratings, which have both qualitative 
and quantitative inputs to the process, the cre¬ 
ation of credit scores is fully automated and 
based on a sophisticated statistical process. 

• In the modern era, there is no need for either 
ratings or credit scores if the credit analyst has 
access to best in class default probabilities for 
a full term structure of time horizons for each 
counterparty. 

• The ratings debate about "point in time" and "through the cycle" is a
  distinction without a difference. All ratings reflect information as
  of the ratings announcement date, as do default probabilities, so
  they are in that sense a "point in time." The longest term default
  probability is the best measure of long-term risk and the shortest
  term default probability is the best measure of short-term risk. The
  longest default probability is "through the cycle" if the maturity is
  long enough. The maturity of the rating has never been clearly
  articulated by the rating agencies themselves.

• The first attempts at measuring default prob¬ 
abilities were based on the early work by 
Robert Merton nearly 40 years ago. Merton's 
theory is simple and has intuitive appeal. 

• The Merton model has not been accurate in 
practical use because it is based on assump¬ 
tions that are simply not true: that common 
stock prices and bond prices are driven by 
only one factor, the value of company assets, 
and that company assets are perfectly liquid. 

• A modern reduced form approach will always be more accurate than the
  Merton approach because the reduced form approach can employ any
  input that makes economic sense and improves accuracy.

• Logistic regression is the maximum likelihood estimator for
  prediction of a variable that has a zero (no default) or one
  (default) value.

• Reduced form default models were introduced by Jarrow and Turnbull
  based on an early continuous time default model by Merton. Empirical
  evidence suggests reduced form models are more accurate than ratings
  and the Merton approach in predicting default.

• Reduced form default models were first 
launched commercially in 2002 for public 
firms and in 2008 for sovereigns. They are also 
in wide use for predicting default of retail and 
small business clients. 

NOTES 

1. For a detailed discussion of the objectives 
of the credit risk modeling process, see van 
Deventer and Imai (2003). 

2. Financial Times, April 8, 2008. 

3. For evidence in this regard, see Bharath and Shumway (2008),
Campbell, Hilscher, and Szilagyi (2008), and Hilscher and Wilson
(2011).

4. The exact amounts, dates, and terms of bor¬ 
rowing are available at www.frb.gov. 

5. Typically, the range of credit scores runs from 
300 to 850 in the United States. There are 
differences by region and by vendor in the 
range used. 

6. For examples, see Hilscher and Wilson (2011) and Kamakura
Corporation press releases on March 15, 2006 and March 8, 2011.

7. The Kamakura Corporation troubled com¬ 
pany index measures the percent of public 
firms that are "troubled," defined as firms 
with annualized 1 month default risk over 
1%. This index was only 6% in July 2011, and 
it was near 25% in October 2008. 

8. See Table 24 in Standard & Poor's (2011). 

9. See "Case Studies in Liquidity Risk: Morgan Stanley," Kamakura blog,
www.kamakuraco.com, May 31, 2011.

10. See Merton (1974). 

11. See Altman (1968). 

12. For a summary of the extensions of the 
model, see Chapter 2 in Lando (2004). 

13. Quotations are from an unpublished 2004 version of the paper,
rather than the final 2008 published version, as some of the authors'
insights were removed during the editorial process.

14. The difference is 15% on the equivalent cumulative accuracy profile
basis, which is scaled from 0 to 100, compared to a 50-100 scale for
the ROC accuracy ratio.

Abstract: The two primary types of credit risk models that seek to statistically describe default
processes are the reduced-form model and the structural model. The most widely used reduced-form
models are the intensity models. There are three main approaches to incorporate
credit risk correlation among firms within the framework of reduced models. The first approach, 
the conditionally independent defaults models, introduces credit risk dependence among firms 
through the dependence of the firms' default intensity processes on a common set of state variables. 
Contagion models extend the conditionally independent defaults approach to account for default 
clustering (periods in which the firms' credit risk is increased and in which the majority of the 
defaults take place). Finally, default dependencies can also be accounted for using copula functions. 
The copula approach takes as given the marginal default probabilities of the different firms and 
plugs them into a copula function, which provides the model with the default dependence structure. 


There are two primary types of models in the 
literature that attempt to describe default pro¬ 
cesses for debt obligations and other default- 
able financial instruments, usually referred to as 
structural and reduced-form (or intensity) models. 

Structural models use the evolution of firms' structural variables,
such as asset and debt values, to determine the time of default.
Merton's model (1974) was the first modern model of default and is
considered the first structural model. In Merton's model, a firm
defaults if, at the time of servicing the debt, its assets are below
its outstanding debt. A second approach within the structural framework
was introduced by Black and Cox (1976). In this approach defaults occur
as soon as a firm's asset value falls below a certain threshold. In
contrast to the Merton approach, default can occur at any time.

Reduced-form models do not consider the relation between default and
firm value in an explicit manner. Intensity models are the most widely
used type of reduced-form model. 1 In contrast to structural models,
the time of default in intensity models is not determined via the value
of the firm, but it is the first jump of an exogenously given jump
process. The parameters governing the default hazard rate are inferred
from market data.

Structural default models provide a link between the credit quality of
a firm and the firm's economic and financial conditions. Thus, defaults
are endogenously generated within the model instead of exogenously
given as in the reduced-form approach. Another difference between the
two approaches refers to the treatment of recovery rates: Whereas
reduced-form models exogenously specify recovery rates, in structural
models the value of the firm's assets and liabilities at default will
determine recovery rates.

This entry focuses on the intensity approach, 
analyzing various models and reviewing the 
three main approaches to incorporate credit risk 
correlation among firms within the framework 
of reduced-form models. 


PRELIMINARIES 

In this section, we fix the information and probabilistic framework we
need to develop the theory of reduced-form models. After presenting the
basic features of reduced-form models and the motivation of the default
intensity through Poisson processes, we apply these concepts to the
specification of single firm default probabilities and to the valuation
formulas for defaultable and default-free bonds. Finally, we analyze
the different treatments the recovery rate has received in the
literature.


Information Framework 

For the purposes of this investigation, we shall always assume that
economic uncertainty is modeled with the specification of a filtered
probability space $\Pi = (\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbf{P})$,
where $\Omega$ is the set of possible states of the economic world, and
$\mathbf{P}$ is a probability measure. The filtration $(\mathcal{F}_t)$
represents the flow of information over time.
$\mathcal{F} = \sigma\left(\bigcup_{t \ge 0} \mathcal{F}_t\right)$ is a
$\sigma$-algebra, a family of events to which we can assign
probabilities in a consistent way. 2 Before continuing with the
exposition, let us make some remarks about the choice of the
probability space.

First, we assume, as a starting point, that we can fix a unique
physical or real probability measure $\mathbf{P}'$, and we consider the
filtered probability space
$\Pi' = (\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbf{P}')$. The choice
of the probability space will vary in some respects, according to the
particular problems under consideration. In the rest of the entry, as
we indicated above, we shall regularly make use of a probability
measure $\mathbf{P}$ that will be assumed to be equivalent to
$\mathbf{P}'$. The choice of $\mathbf{P}$ then varies according to the
context.

The model for the default-free term structure of interest rates is
given by a non-negative, bounded and $(\mathcal{F}_t)$-adapted
default-free short-rate process $r_t$. The money market account value
process is given by:

$$\beta_t = \exp\left(\int_0^t r_s \, ds\right) \qquad (1)$$
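
As a numerical illustration of equation (1), the sketch below
approximates the integral of the short rate along a single sampled path
with the trapezoid rule and exponentiates the result. The sampled path
and the function name are hypothetical; the trapezoid rule is simply one
convenient discretization and is not prescribed by the entry.

```python
import math
from typing import Sequence


def money_market_account(times: Sequence[float],
                         short_rates: Sequence[float]) -> float:
    """Approximate exp(integral of r_s ds) over [times[0], times[-1]]."""
    if len(times) != len(short_rates) or len(times) < 2:
        raise ValueError("need matching time and rate grids of length >= 2")
    integral = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Trapezoid rule on each sub-interval of the sampled path.
        integral += 0.5 * (short_rates[k] + short_rates[k - 1]) * dt
    return math.exp(integral)


# Hypothetical path: the short rate stays at 5% for two years.
value = money_market_account([0.0, 1.0, 2.0], [0.05, 0.05, 0.05])
print(f"value of one unit invested at time 0: {value:.6f}")
```

Dividing an asset price by this value is what "discounting by the money
market account" means in the discussion of the risk neutral measure.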

For our purposes we shall use the class of equivalent probability
measures $\mathbf{P}$, where non-dividend-paying (NDP) asset processes
discounted by the money market account are
$((\mathcal{F}_t), \mathbf{P})$-martingales, that is, where
$\mathbf{P}$ is an equivalent probability measure that uses the money
market account as numeraire. 3 Such an equivalent measure is called a
risk neutral measure, because under this probability measure the
investors are indifferent between investing in the money market account
or in any other asset. There are different scenarios under which the
transition from the physical to the equivalent (or risk neutral)
probability measure can usually be accomplished.

We present a mathematical framework that
will embody essentially all models used
throughout this entry. Nevertheless, more general
frameworks can be considered. On our
probability space $\Pi$ we assume that there
exists an $\mathbb{R}^J$-valued Markov process
$X_t = (X_{1,t}, \ldots, X_{J,t})'$, or background process, that
represents $J$ economy-wide variables, either
state (observable) or latent (not observable). 4
There also exist $I$ counting processes, $N_{i,t}$,
$i = 1, \ldots, I$, initialized at 0, that represent the
default processes of the $I$ firms in the economy,
such that the default of the $i$th firm
occurs when $N_{i,t}$ jumps from 0 to 1. $(\mathcal{G}_{X,t})$
and $(\mathcal{G}_{i,t})$, where $\mathcal{G}_{X,t} = \sigma(X_s, 0 \le s \le t)$ and
$\mathcal{G}_{i,t} = \sigma(N_{i,s}, 0 \le s \le t)$, represent the filtrations
generated by $X_t$ and $N_{i,t}$ respectively. The filtration
$(\mathcal{G}_{X,t})$ represents information about the
development of general market variables and all
the background information, whereas $(\mathcal{G}_{i,t})$ only
contains information about the default status of
firm $i$.

The filtration $(\mathcal{F}_t)$ contains the information
generated by both the state variables and the
default processes:

$$(\mathcal{F}_t) = (\mathcal{G}_{X,t}) \vee (\mathcal{G}_{1,t}) \vee \ldots \vee (\mathcal{G}_{I,t}) \qquad (2)$$

We also define the filtrations $(\mathcal{F}_{i,t})$, $i = 1, \ldots, I$,
as

$$(\mathcal{F}_{i,t}) = (\mathcal{G}_{X,t}) \vee (\mathcal{G}_{i,t}) \qquad (3)$$

which only accumulate the information generated
by the state variables and the default status
of each firm.


Poisson and Cox Processes 

Poisson processes provide a convenient way of 
modeling default arrival risk in intensity-based 
default risk models. 5 In contrast to structural 
models, the time of default in intensity mod¬ 
els is not determined via the value of the firm, 
but instead is taken to be the first jump of a 
point process (for example, a Poisson process). 
The parameters governing the default intensity 
(associated with the probability measure P) are 
inferred from market data. 

First, we recall the formal definition of Poisson
and Cox processes. Consider an increasing
sequence of stopping times $(\tau_h)$, with $\tau_h < \tau_{h+1}$. We define
a counting process associated with that sequence
as a stochastic process $N_t$ given by

$$N_t = \sum_h \mathbf{1}_{\{\tau_h \le t\}} \qquad (4)$$

A (homogeneous) Poisson process with intensity
$\lambda > 0$ is a counting process whose increments
are independent and satisfy

$$\mathbf{P}[N_t - N_s = n] = \frac{1}{n!}(t-s)^n \lambda^n \exp(-(t-s)\lambda) \qquad (5)$$

for $0 \le s < t$, that is, the increments $N_t - N_s$ are
independent and have a Poisson distribution
with parameter $\lambda(t-s)$ for $s < t$.

So far, we have considered only the case of
homogeneous Poisson processes, where the default
intensity is a constant parameter $\lambda$, but
we can easily generalize it by allowing the default
intensity to be time dependent, $\lambda_t = \lambda(t)$,
in which case we would talk about inhomogeneous
Poisson processes.

If we consider stochastic default intensities,
the Poisson process would be called a Cox process.
For example, we can assume $\lambda_t$ follows a
diffusion process of the form

$$d\lambda_t = \mu(t, \lambda_t)\,dt + \sigma(t, \lambda_t)\,dW_t \qquad (6)$$

where $W_t$ is a Brownian motion. We can also
assume that the intensity is a function of a set
of state variables (economic variables, interest
rates, currencies, etc.) $X_t$, that is, $\lambda_t = \lambda(t, X_t)$.

The fundamental idea of the intensity-based
framework is to model the default time as the
first jump of a Poisson process. Thus, we define
the default time to be

$$\tau = \inf\{t \in \mathbb{R}^+ \mid N_t > 0\} \qquad (7)$$


The survival probabilities in this setup are
given by

$$\mathbf{P}[N_t = 0] = \mathbf{P}[\tau > t] = E\left[\exp\left(-\int_0^t \lambda_s\,ds\right)\right] \qquad (8)$$


The intensity, or hazard rate, is the conditional
default arrival rate, given no default:

$$\lambda_t = \lim_{h \to 0} \frac{\mathbf{P}[\tau \in (t, t+h] \mid \tau > t]}{h} = \frac{f(t)}{1 - F(t)} \qquad (9)$$

where

$$F(t) = \mathbf{P}[\tau \le t] \qquad (10)$$

and $f(t)$ is the density of $F$.

The functions $p(t, T) = \mathbf{P}[\tau \le T \mid \tau > t]$ and
$s(t, T) = \mathbf{P}[\tau > T \mid \tau > t]$ are the risk neutral
default and survival probabilities from time $t$
to time $T$ respectively, where $0 \le t \le T$. Note
that $s(t, T) = 1 - p(t, T)$, and if we fix $t = 0$,
then $p(0, T) = F(T)$.

The hazard or intensity rate $\lambda_t$ is the central
element of reduced-form models, and represents
the instantaneous default probability, that
is, the (very) short-term default risk.
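The relations above can be checked numerically for a deterministic intensity. The following is a minimal sketch, assuming a hypothetical linear intensity `lam_linear` (the coefficients are illustrative, not taken from the text): it computes the survival probability as the exponential of minus the integrated intensity and then recovers the hazard rate from the ratio f(t) / (1 - F(t)) by finite differences.

```python
import math

def integrated_intensity(lam, t0, t1, steps=10_000):
    """Trapezoidal approximation of the integral of lam(s) over [t0, t1]."""
    h = (t1 - t0) / steps
    total = 0.5 * (lam(t0) + lam(t1))
    for k in range(1, steps):
        total += lam(t0 + k * h)
    return total * h

def survival(lam, t, T):
    """s(t, T) = exp(-int_t^T lam(s) ds) for a deterministic intensity."""
    return math.exp(-integrated_intensity(lam, t, T))

def lam_linear(s):
    # Hypothetical intensity: 1% per year, rising by 0.4% per year.
    return 0.01 + 0.004 * s

s_0_5 = survival(lam_linear, 0.0, 5.0)   # survival probability s(0, 5)
p_0_5 = 1.0 - s_0_5                      # default probability p(0, 5) = F(5)

def F(t):
    """Distribution function of the default time, F(t) = P[tau <= t]."""
    return 1.0 - survival(lam_linear, 0.0, t)

# Hazard rate at t = 2 from f(t) / (1 - F(t)), with f approximated
# by a central finite difference of F.
eps = 1e-4
hazard_at_2 = (F(2.0 + eps) - F(2.0 - eps)) / (2.0 * eps) / (1.0 - F(2.0))
print(s_0_5, p_0_5, hazard_at_2, lam_linear(2.0))
```

The last two printed values should agree closely, which illustrates that the intensity and the ratio in equation (9) describe the same quantity when the intensity is deterministic.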


Pricing Building Blocks 

This section reviews the pricing of risk-free 
and defaultable zero-coupon bonds, which to¬ 
gether with the default/survival probabilities 
constitute the building blocks for pricing credit 
derivatives and defaultable instruments. 

We assume a perfect and arbitrage-free capital
market, where the money market account
value process $\beta_t$ is given by (1). Since our probability
measure $\mathbf{P}$ takes as numeraire the money
market account process, the value of any NDP
asset discounted by the money market account
follows an $((\mathcal{F}_t), \mathbf{P})$-martingale. Using the previous
property, the price at time $t$ of a default-free
zero coupon bond with maturity $T$ and face
value of 1 unit is given by

$$P(t, T) = \beta_t E\left[\frac{P(T, T)}{\beta_T} \,\Big|\, \mathcal{F}_t\right] = E\left[\exp\left(-\int_t^T r_s\,ds\right) \Big|\, \mathcal{F}_t\right] \qquad (11)$$


From the previous section we know that the
survival probability $s(t, T)$ in the risk-neutral
measure can be expressed as

$$s(t, T) = \mathbf{P}[\tau > T \mid \tau > t] = E\left[\exp\left(-\int_t^T \lambda_s\,ds\right) \Big|\, \mathcal{F}_t\right] \qquad (12)$$


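Consider a defaultable zero coupon bond issued
by firm $i$ with maturity $T$ and face value
of $M$ units that, in case of default at time $\tau < T$,
generates a recovery payment of $R_\tau$ units.
$R_t$ is an $(\mathcal{F}_t)$-adapted stochastic process, with
$R_t = 0$ for all $t > T$. 6 The price of the defaultable
zero coupon bond at time $t$ ($0 \le t \le T$) is given
by

$$Q(t, T) = \beta_t E\left[\frac{Q(T, T)}{\beta_T} \,\Big|\, \mathcal{F}_t\right] = \beta_t E\left[\frac{M\,\mathbf{1}_{\{\tau > T\}}}{\beta_T} \,\Big|\, \mathcal{F}_t\right] + \beta_t E\left[\frac{R_\tau}{\beta_\tau} \,\Big|\, \mathcal{F}_t\right] \qquad (13)$$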


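which can be expressed as 7

$$Q(t, T) = E\left[\exp\left(-\int_t^T (r_s + \lambda_s)\,ds\right) M \,\Big|\, \mathcal{F}_t\right] + E\left[\int_t^T R_s \lambda_s \exp\left(-\int_t^s (r_u + \lambda_u)\,du\right) ds \,\Big|\, \mathcal{F}_t\right] \qquad (14)$$

assuming $\tau > t$ and all the technical conditions
that ensure that the expectations are finite. 8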
This expression has to be evaluated consider¬ 
ing the treatment of the recovery payment and 
any other assumptions about the correlations 
between interest rates, intensities, and recover¬ 
ies. The first term represents the expected dis¬ 
counted value of the payment of M units at time 
T, taking into account the possibility that the 
firm may default and the M units not received, 
through the inclusion of the hazard or inten¬ 
sity rate (instantaneous probability of default) 
in the discount rate. The second term represents 
the expected discounted value of the recovery 
payment using the risk-free rate plus the in¬ 
tensity rate as discount factor. The first integral 
in the second term of the previous expression, 
from t to T, makes reference to the fact that de¬ 
fault can happen at any time between t and T. 
Thus, for each $s \in (t, T]$, we discount the value
of the recovery rate $R_s$ times the instantaneous
probability of default at time $s$ given that no default
has occurred before, which is given by the
intensity $\lambda_s$.
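As an illustration of equation (14), consider the simplest possible case, which is an assumption made here and not a model proposed in the text: constant short rate, constant intensity, and a constant recovery payment. Both expectations then reduce to elementary integrals. The sketch below (the function name `defaultable_zcb_price` and all parameter values are hypothetical) evaluates the two terms in closed form and cross-checks the recovery term with a midpoint-rule integration.

```python
import math

def defaultable_zcb_price(M, R, r, lam, t, T):
    """Equation (14) with constant r, lam and recovery payment R.

    First term:  M * exp(-(r + lam) * (T - t))
    Second term: int_t^T R * lam * exp(-(r + lam) * (s - t)) ds
    """
    a = r + lam
    tau = T - t
    survival_leg = M * math.exp(-a * tau)
    if a == 0.0:
        recovery_leg = R * lam * tau
    else:
        recovery_leg = R * lam * (1.0 - math.exp(-a * tau)) / a
    return survival_leg + recovery_leg

# Hypothetical inputs: face value 100, recovery payment 40,
# r = 3%, lambda = 2%, five years to maturity.
M, R, r, lam, t, T = 100.0, 40.0, 0.03, 0.02, 0.0, 5.0
price = defaultable_zcb_price(M, R, r, lam, t, T)

# Independent check of the recovery term by midpoint integration.
steps = 2000
h = (T - t) / steps
recovery_numeric = sum(
    R * lam * math.exp(-(r + lam) * ((k + 0.5) * h)) * h for k in range(steps)
)
numeric = M * math.exp(-(r + lam) * (T - t)) + recovery_numeric
print(price, numeric)
```

Setting `R = 0` leaves only the first term, the face value discounted at the rate r + lambda, which is the sense in which the intensity enters the discount rate.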


Recovery Rates 

Recovery rates refer to how we model, after 
a firm defaults, the value that a debt instru¬ 
ment has left. 9 In terms of the recovery rate 
parametrization, three main specifications have 
been adopted in the literature. The first one
considers that the recovery rate is an exoge¬ 
nous fraction of the face value of the defaultable 
bond (recovery of face value, RFV). 10 Jarrow 
and Turnbull (1997) consider the recovery rate 
to be an exogenous fraction of the value of an 
equivalent default-free bond (recovery of trea¬ 
sury, RT). Finally, Duffie and Singleton (1999a) 
fix a recovery rate equal to an exogenous frac¬ 
tion of the market value of the bond just before 
default (recovery of market value, RMV). 

The RMV specification has gained a great deal
of attention in the literature thanks to, among
others, Duffie and Singleton (1999a). Consider
a zero-coupon defaultable bond, which pays $M$
at maturity $T$ if there is no default prior to maturity
and whose payoff in case of default is
modeled according to the RMV assumption.
They show that this bond can be priced as if
it were a default-free zero-coupon bond, by replacing
the usual short-term interest rate process
$r_t$ with a default-adjusted short rate process
$\pi_t = r_t + \lambda_t L_t$. $L_t$ is the expected loss rate in the
market value if default were to occur at time $t$,
conditional on the information available up to
time $t$:

$$R_\tau = (1 - L_\tau)\,Q(\tau_-, T) \qquad (15)$$

$$Q(\tau_-, T) = \lim_{s \uparrow \tau} Q(s, T) \qquad (16)$$

where $\tau$ is the default time, $Q(\tau_-, T)$ the market
price of the bond just before default, and
$R_\tau$ the value of the defaulted bond. Duffie and
Singleton (1999a) show that (14) can be expressed
as


$$Q(t, T) = E\left[\exp\left(-\int_t^T \pi_s\,ds\right) M \,\Big|\, \mathcal{F}_t\right] \qquad (17)$$


This expression shows that discounting at the
adjusted rate $\pi_t$ accounts for both the probability
and the timing of default, and for the effect of
losses at default. But the main advantage of the
previous pricing formula is that, if the mean loss
rate $\lambda_t L_t$ does not depend on the value of the defaultable
bond, we can apply well-known term
structure processes to model $\pi_t$ instead of $r_t$ to
price defaultable debt. One of the main drawbacks
of this approach is that, since $\lambda_t$ and $L_t$ enter
$\pi_t$ only through their product, knowing defaultable
bond prices alone is not enough to estimate
$\lambda_t$ and $L_t$ separately from data on defaultable
instruments. We would need to have
available a collection of bonds that share some,
but not all, default characteristics, or derivative
securities whose payoffs depend, in different
ways, on $\lambda_t$ and $L_t$. In case $\lambda_t$ and $L_t$ are
not separable, we shall have to model the product
$\lambda_t L_t$ (which represents the short-term credit
spread). 11 This identification problem is the reason
why most of the empirical work that tries to
estimate the default intensity process from defaultable
bond data uses an exogenously given
constant recovery rate, that is, $L_t = L$ for all $t$. 12
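The identification problem can be made concrete with a small sketch. Assuming, purely for illustration, constant values of the short rate, the intensity, and the loss rate, equation (17) collapses to discounting the face value at the constant adjusted rate r + lambda L. The function name `rmv_zcb_price` and the numbers below are hypothetical.

```python
import math

def rmv_zcb_price(M, r, lam, L, t, T):
    """Equation (17) with constant r, lam and L: discount M at pi = r + lam * L."""
    pi = r + lam * L
    return M * math.exp(-pi * (T - t))

# Two hypothetical issuers with different intensities and loss rates
# but the same product lam * L = 1%.
price_a = rmv_zcb_price(100.0, 0.03, 0.02, 0.5, 0.0, 5.0)  # lam = 2%, L = 50%
price_b = rmv_zcb_price(100.0, 0.03, 0.01, 1.0, 0.0, 5.0)  # lam = 1%, L = 100%
print(price_a, price_b)
```

Both calls return the same price, so zero-coupon bond prices alone cannot distinguish a high intensity with a low loss rate from a low intensity with a high loss rate.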

The previous valuation formula allows one 
to introduce dependencies between short-term 
interest rates, default intensities, and recovery 
rates (via state variables, for example). 

From a pricing point of view, the above pricing
formula allows us to include the case in
which, as is often seen in practice after a default
takes place, a firm reorganizes itself and continues
with its activity. If we assume that after each
possible default the firm is reorganized and the
bondholders lose a fraction $L_t$ of the predefault
bond's market value, Giesecke (2002a) shows
that, letting $L_t$ be a constant, that is, $L_t = L$ for
all $t$, the price of a default risky zero-coupon
bond is, as in the case with no reorganization,
given by (17).

Another advantage of this framework is that
it allows one to consider liquidity risk by introducing
a stochastic process $l_t$ as a liquidity
spread in the adjusted discount process $\pi_t$; that
is, $\pi_t = r_t + \lambda_t L_t + l_t$.


SINGLE ENTITY 

The aim of this section is to develop some tools
for the modeling of intensity processes, in order
to build the models for default correlation. If
we consider a deterministic specification
for default intensities, it is natural to think of
time-dependent intensities, $\lambda_t = \lambda(t)$,
where $\lambda(t)$ is usually modeled as a constant,
linear, or quadratic polynomial of the time
to maturity. 13

The treatment of default-free interest rates, 
the recovery rate, and the intensity process dif¬ 
ferentiates each intensity model. 

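It is interesting to note that the difference between
the pricing formulas of default-free zero-coupon
bonds and survival probabilities in the
intensity approach lies in the discount rate:

$$P(0, t) = E\left[\exp\left(-\int_0^t r_s\,ds\right)\right] \qquad (18)$$

$$s(0, t) = E\left[\exp\left(-\int_0^t \lambda_s\,ds\right)\right] \qquad (19)$$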


This analogy between intensity-based default 
risk models and interest rate models allows us 
to apply well-known short-rate term models to 
the modeling of default intensities. 

Schonbucher (2003) enumerates several characteristics
that an ideal specification of the interest
rate $r_t$ and the default intensity $\lambda_t$ should
have. First, both $r_t$ and $\lambda_t$ should be stochastic.
Second, the dynamics of $r_t$ and $\lambda_t$ should
be rich enough to include correlation between
them. Third, it is desirable to have processes
for $r_t$ and $\lambda_t$ that remain positive at all times.
And finally, the easier the pricing of the pricing
building blocks, the better.

We start with a general framework, making
use of the Markov process $X_t = (X_{1,t}, \ldots, X_{J,t})'$,
which represents $J$ state variables. The most
general process for $X_t$ that we shall consider
is called a basic affine process, which is an example
of an affine jump diffusion given by

$$dX_{j,t} = \kappa_j(\theta_j - X_{j,t})\,dt + \sigma_j\sqrt{X_{j,t}}\,dW_{j,t} + dq_{j,t} \qquad (20)$$

for $j = 1, \ldots, J$, where $W_{j,t}$ is an $((\mathcal{F}_t), \mathbf{P})$-Brownian
motion. $\kappa_j$ and $\theta_j$ represent the mean
reversion rate and level of the process, and $\sigma_j$ is
a constant affecting the volatility of the process.
$dq_{j,t}$ denotes any jump that occurs at time $t$ of
a pure-jump process $q_{j,t}$, independent of $W_{j,t}$,
whose jump sizes are exponentially distributed
with mean $\mu_j$ and whose jump times are those of
an independent Poisson process with intensity
of arrival $\gamma_j$ (jump times and jump sizes
are also independent). By modeling the jump
size as an exponential random variable, we restrict
the jumps to be positive. This process is
called a basic affine process with parameters
$(\kappa_j, \theta_j, \sigma_j, \mu_j, \gamma_j)$. 14

Making $r_t$ and $\lambda_t$ dependent on a set of common
stochastic factors $X_t$, one can introduce
randomness and correlation in the processes of
$r_t$ and $\lambda_t$. Moreover, if we use basic affine processes
for the common factors $X_t$, we can make
use of the following results, which will yield
closed-form solutions for the building blocks
we examined in the previous section: 15


1. For any discount-rate function $\varphi : \mathbb{R}^J \to \mathbb{R}$
   and any function $g : \mathbb{R}^J \to \mathbb{R}$, if $X_t$ is a
   Markov process (which holds in the case of
   basic affine processes), then

   $$E\left[\exp\left(-\int_s^t \varphi(X_u)\,du\right) g(X_t) \,\Big|\, \mathcal{F}_s\right] = H(X_s) \qquad (21)$$

   for $0 \le s \le t$ and for some function $H : \mathbb{R}^J \to \mathbb{R}$.

2. Defining an affine function as constant plus
   linear, if $\varphi(x)$ and $g(x)$ are affine functions
   ($\varphi(x) = a_0 + a_1 x_1 + \ldots + a_J x_J$ and
   $g(x) = b_0 + b_1 x_1 + \ldots + b_J x_J$) then, as shown by
   Duffie, Pan, and Singleton (2000), if $X_t$ is
   an affine jump-diffusion process, $H(X_s)$
   can be expressed in closed form by

   $$H(X_s) = \exp\left(\alpha(s, t) + \theta(s, t) \cdot X_s\right) \qquad (22)$$

   for some coefficients $\alpha(s, t)$, $\theta_1(s, t), \ldots, \theta_J(s, t)$,
   which are functions, also in closed
   form, of the parameters of the model. 16

Observing that our pricing building blocks
$P(t, T)$, $s(t, T)$, and $Q(t, T)$ are special cases of
the previous expressions, one realizes the gains
in terms of tractability achieved by the use of
affine processes in the modeling of the default
term structure. 17 In order to make use of this
tractability, the state variables $X_t$ should follow
affine processes, and the specification for the
risk-adjusted rate $\pi_t$ should be an affine function
of the state variables.

Consider the case in which the $X_{1,t}, \ldots, X_{J,t}$
follow (20). If we eliminate the jump component
from the process of $X_{j,t}$

$$dX_{j,t} = \kappa_j(\theta_j - X_{j,t})\,dt + \sigma_j\sqrt{X_{j,t}}\,dW_{j,t} \qquad (23)$$

we obtain the CIR process, and eliminating the
square root of $X_{j,t}$

$$dX_{j,t} = \kappa_j(\theta_j - X_{j,t})\,dt + \sigma_j\,dW_{j,t} \qquad (24)$$

we end up with a Vasicek model.
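For a single CIR factor as in (23), the exponential-affine form (22) is available explicitly through the standard CIR zero-coupon bond formula. The sketch below uses that well-known formula with the factor itself as discount rate and g = 1; the function name `cir_expected_discount` and the parameter values are hypothetical, and the code is an illustration rather than the derivation referred to in the text. Read with X as the short rate it gives a default-free bond price, and read with X as the intensity it gives a survival probability.

```python
import math

def cir_expected_discount(x0, kappa, theta, sigma, tau):
    """E[exp(-int_0^tau X_u du) | X_0 = x0] for a CIR process X.

    Returns (value, alpha, beta) with value = exp(alpha + beta * x0),
    the exponential-affine form of equation (22).
    """
    gamma = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    e = math.exp(gamma * tau)
    denom = (gamma + kappa) * (e - 1.0) + 2.0 * gamma
    beta = -2.0 * (e - 1.0) / denom
    base = 2.0 * gamma * math.exp((kappa + gamma) * tau / 2.0) / denom
    alpha = (2.0 * kappa * theta / sigma ** 2) * math.log(base)
    return math.exp(alpha + beta * x0), alpha, beta

# Hypothetical intensity dynamics: lambda_0 = 3%, kappa = 0.5,
# theta = 2%, sigma = 10%. The value is the 5-year survival probability.
survival_5y, alpha_5y, beta_5y = cir_expected_discount(0.03, 0.5, 0.02, 0.10, 5.0)
print(survival_5y, alpha_5y, beta_5y)
```

As the volatility parameter goes to zero the result approaches the deterministic value obtained by integrating the mean-reverting path, which gives a simple sanity check on the implementation.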

Various reduced-form models differ from
each other in their choices of the state variables
and the processes they follow. In the models we
consider below, the intensity and interest rate
are linear, and therefore affine, functions of $X_t$,
where $X_t$ are basic affine processes. 18

One can consider expressions for $r_t$ and $\lambda_t$ of
the general form

$$r_t = a_{0,r}(t) + a_{1,r}(t)X_{1,t} + \ldots + a_{J,r}(t)X_{J,t} \qquad (25)$$

$$\lambda_t = a_{0,\lambda}(t) + a_{1,\lambda}(t)X_{1,t} + \ldots + a_{J,\lambda}(t)X_{J,t} \qquad (26)$$

for some deterministic (possibly time-dependent)
coefficients $a_{0,r}$, $a_{0,\lambda}$, $a_{j,r}$, and $a_{j,\lambda}$,
$j = 1, \ldots, J$. This type of model allows us
to treat $r_t$ and $\lambda_t$ as stochastic processes, to
introduce correlations between them, and to
have analytically tractable expressions for
pricing the building blocks. A simple example
could be

$$dr_t = \kappa_r(\theta_r - r_t)\,dt + \sigma_r\sqrt{r_t}\,dW_{r,t} \qquad (27)$$

$$d\lambda_t = \kappa_\lambda(\theta_\lambda - \lambda_t)\,dt + \sigma_\lambda\sqrt{\lambda_t}\,dW_{\lambda,t} + dq_{\lambda,t} \qquad (28)$$

$$dW_{r,t}\,dW_{\lambda,t} = \rho\,dt \qquad (29)$$

in which the state variables are $r_t$ and $\lambda_t$ themselves,
whose Brownian motions are correlated.


Duffie (2005) presents an extensive review 
of the use of affine processes for credit risk 
modeling using intensity models, and applies 
such models to price different credit derivatives 
(credit default swaps, credit guarantees, spread 
options, lines of credit, and ratings-based step-up
bonds).


Default Times Simulation 

Letting $U$ be a uniform (0,1) random variable
independent of $(\mathcal{G}_{X,t})$, the time of default is defined
by

$$\tau = \inf\left\{t > 0 \;\Big|\; \exp\left(-\int_0^t \lambda_s\,ds\right) \le U\right\} \qquad (30)$$

Equivalently, we can let $\eta$ be an exponentially
distributed random variable with parameter 1
and independent of $(\mathcal{G}_{X,t})$ and define the default
time as

$$\tau = \inf\left\{t > 0 \;\Big|\; \int_0^t \lambda_s\,ds \ge \eta\right\} \qquad (31)$$

Once we have specified and calibrated the dynamics
of $\lambda_t$, we can easily simulate default
times using a simple procedure based on the
two previous definitions. First, we simulate a
realization $u$ of a uniform [0,1] random variable
$U$ and choose $\tau$ such that $\exp\left(-\int_0^\tau \lambda_s\,ds\right) = u$.
Equally, we can simulate a random variable $\eta$
exponentially distributed with parameter 1 and
fix $\tau$ such that $\int_0^\tau \lambda_s\,ds = \eta$.
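The second procedure can be sketched as follows for a deterministic intensity. This is a minimal illustration under stated assumptions: the intensity function, the grid size `dt`, the horizon, and the helper name `default_time` are all hypothetical, and the integrated intensity is accumulated with a midpoint rule, interpolating linearly inside the step where the unit exponential threshold is crossed.

```python
import math
import random

def default_time(lam, eta, horizon, dt=0.05):
    """First t in (0, horizon] with int_0^t lam(s) ds >= eta, else math.inf."""
    cumulative = 0.0
    for k in range(round(horizon / dt)):
        t = k * dt
        step = lam(t + 0.5 * dt) * dt          # midpoint rule on [t, t + dt]
        if step > 0.0 and cumulative + step >= eta:
            # Threshold crossed inside this step: interpolate linearly.
            return t + dt * (eta - cumulative) / step
        cumulative += step
    return math.inf                            # no default before the horizon

def lam(s):
    # Hypothetical deterministic intensity, increasing in time.
    return 0.02 + 0.01 * s

rng = random.Random(0)
horizon, n_sims = 5.0, 5000
times = [default_time(lam, rng.expovariate(1.0), horizon) for _ in range(n_sims)]

# Monte Carlo survival frequency versus exp(-int_0^5 lam(s) ds) = exp(-0.225).
mc_survival = sum(1 for t in times if t == math.inf) / n_sims
print(mc_survival, math.exp(-0.225))
```

With a stochastic intensity the same function can be applied path by path, passing in the simulated intensity path in place of `lam`.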


DEFAULT CORRELATION 

This section reviews the different approaches to 
model the default dependence between firms in 
the reduced-form approach. With the tools pro¬ 
vided in the previous section we can calculate 
the survival or default probability of a given 
firm in a given time interval. The next natural 
question to ask ourselves concerns the default 
or survival probability of more than one firm. If 
we are currently at time $t$ ($0 \le t < T$) and no default
has occurred so far, what is the probability
that $n > 1$ different firms default before time $T$?
Or, what is the probability that they all survive
until time $T$?

Schonbucher (2003), again, points out some 
properties that any good approach to model de¬ 
pendent defaults should verify. First, the model 
must be able to produce default correlations 
of a realistic magnitude. Second, it has to do 
it by keeping the number of parameters in¬ 
troduced to describe the dependence structure 
under control, without growing dramatically 
with the number of firms. Third, it should be 
a dynamic model, able to model the number 
of defaults as well as the timing of defaults. 
Fourth, since it is clear from the default history 
that there are periods in which defaults may 
cluster, the model should be capable of repro¬ 
ducing these periods. And fifth, the easier the 
calibration and implementation of the model, 
the better. 

We can distinguish three different approaches
to model default correlation in the literature
of intensity credit risk modeling. The first
approach introduces correlation in the firms'
default intensities by making them dependent on
a set of common variables $X_t$ and on a
firm-specific factor. These models have received the
name of conditionally independent defaults
(CID) models because, conditional on the realization
of the state variables $X_t$, the firms'
default intensities are independent, as are the
default times that they generate. Apparently,
the main drawback of these models is that they
do not generate sufficiently high default correlations.
However, Yu (2002a) indicates that this
is not a problem of the model per se, but rather
an indication of the lack of sophistication in the
choice of the state variables.

Two direct extensions of the CID approach 
try to introduce more default correlation in the 
models. One is the possibility of joint jumps 
in the default intensities (Duffie and Singleton 
1999b) and the other is the possibility of default- 
event triggers that cause joint defaults (Duffie 
and Singleton 1999b, Kijima 2000, and Kijima 
and Muromachi 2000). 


The second approach to model default corre¬ 
lation, contagion models, relies on the works by 
Davis and Lo (1999) and Jarrow and Yu (2001). 
It is based on the idea of default contagion in 
which, when a firm defaults, the default inten¬ 
sities of related firms jump upwards. In these 
models default dependencies arise from direct 
links between firms. The default of one firm in¬ 
creases the default probabilities of related firms, 
which might even trigger the default of some of 
them. 

The last approach to model default corre¬ 
lation makes use of copula functions. A cop¬ 
ula is a function that links univariate marginal 
distributions to the joint multivariate distri¬ 
bution with auxiliary correlating variables. 
To estimate a joint probability distribution of 
default times, we can start by estimating the 
marginal probability distributions of individual 
defaults, and then transform these marginal es¬ 
timates into the joint distribution using a cop¬ 
ula function. Copula functions take as inputs 
the individual probabilities and transform them 
into joint probabilities, such that the depen¬ 
dence structure is completely introduced by the 
copula. 


Measures of Default Correlation 

The complete specification of the default cor¬ 
relation will be given by the joint distribution 
of default times. Nevertheless, we can spec¬ 
ify some other measures of default correlation. 
Consider two firms A and B that have not defaulted
before time $t$ ($0 \le t < T$). We denote the
probabilities that firms A and B will default in
the time interval $[t, T]$ by $p_A$ and $p_B$, respectively.
Denote by $p_{AB}$ the probability of both firms defaulting
before $T$, and by $\tau_A$ and $\tau_B$ the default times of
each firm. The linear correlation coefficient between
the default indicator random variables
$\mathbf{1}_A = \mathbf{1}_{\{\tau_A \le T\}}$ and $\mathbf{1}_B = \mathbf{1}_{\{\tau_B \le T\}}$ is given by

$$\rho\left(\mathbf{1}_{\{\tau_A \le T\}}, \mathbf{1}_{\{\tau_B \le T\}}\right) = \frac{p_{AB} - p_A p_B}{\sqrt{p_A(1 - p_A)\,p_B(1 - p_B)}} \qquad (32)$$

In the same way we can define the linear correlation
of the random variables $\mathbf{1}_{\{\tau_A > T\}}$ and
$\mathbf{1}_{\{\tau_B > T\}}$. Another measure of default dependence
between firms is the linear correlation between
the random variables $\tau_A$ and $\tau_B$, $\rho(\tau_A, \tau_B)$.
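Equation (32) is straightforward to evaluate once the marginal and joint default probabilities are known. The short sketch below uses hypothetical probabilities chosen only for illustration.

```python
import math

def default_indicator_correlation(p_a, p_b, p_ab):
    """Linear correlation of the default indicators, equation (32)."""
    return (p_ab - p_a * p_b) / math.sqrt(p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# Hypothetical inputs: 5% and 10% marginal default probabilities.
independent = default_indicator_correlation(0.05, 0.10, 0.05 * 0.10)
clustered = default_indicator_correlation(0.05, 0.10, 0.02)
print(independent, clustered)
```

The first call returns a value that is zero up to rounding, since the joint probability equals the product of the marginals; the second shows how a joint probability above that product translates into positive default correlation.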

The conclusions extracted from the com¬ 
parison of linear default correlations should 
be viewed with caution because they are 
covariance-based and hence they are only the 
natural dependence measures for joint ellipti¬ 
cal random variables. 19 Default times, default 
events, and survival events are not elliptical 
random variables, and hence these measures 
can lead to severe misinterpretations of the true 
default correlation structure. 20 

The previous correlation coefficients, when 
estimated via a risk neutral intensity model, 
are based on the risk neutral measure. How¬ 
ever, when we calculate the correlation co¬ 
efficients using empirical default events, the 
correlation coefficients are obtained under the 
physical measure. Jarrow, Lando, and Yu (2001) 
and Yu (2002a, b) provide a procedure for com¬ 
puting physical default correlation through the 
use of risk neutral intensities. 


Conditionally Independent 
Default Models 

From now on, we consider $i = 1, \ldots, I$ different
firms and denote by $\lambda_{i,t}$ and $\tau_i$ their default
intensities and default times respectively.

In CID models, firms' default intensities are
independent once we fix the realization of the
state variables $X_t$. The default correlation is introduced
through the dependence of each firm's
intensity on the random vector $X_t$. A firm-specific
factor of stochasticity $\lambda^*_{i,t}$, independent
across firms, completes the specification of each
firm's default intensity:

$$\lambda_{i,t} = a_{0,\lambda_i} + a_{1,\lambda_i}X_{1,t} + \ldots + a_{J,\lambda_i}X_{J,t} + \lambda^*_{i,t} \qquad (33)$$

where $a_{0,\lambda_i}$ and $a_{j,\lambda_i}$ are some deterministic coefficients,
for $j = 1, \ldots, J$ and $i = 1, \ldots, I$. 21


Since default times are continuously dis¬ 
tributed, this specification implies that the prob¬ 
ability of having two or more simultaneous 
defaults is zero. 

Let us consider an example of a CID model
based on Duffee (1999). The default-free interest
rate is given by

$$r_t = a_{r,0} + X_{1,t} + X_{2,t} \qquad (34)$$

where $a_{r,0}$ is a constant coefficient, and $X_{1,t}$ and
$X_{2,t}$ are two latent factors (unobservable, interpreted
as the slope and level of the default-free
yield curve). After having estimated the
latent factors $X_{1,t}$ and $X_{2,t}$ from default-free
bond data, Duffee (1999) uses them to model
the intensity process of each firm $i$ as

$$\lambda_{i,t} = a_{0,\lambda_i} + a_{1,\lambda_i}(X_{1,t} - \bar{X}_1) + a_{2,\lambda_i}(X_{2,t} - \bar{X}_2) + \lambda^*_{i,t} \qquad (35)$$

$$d\lambda^*_{i,t} = \kappa_i(\theta_i - \lambda^*_{i,t})\,dt + \sigma_i\sqrt{\lambda^*_{i,t}}\,dW_{i,t} \qquad (36)$$

where $W_{1,t}, \ldots, W_{I,t}$ are independent Brownian
motions, $a_{0,\lambda_i}$, $a_{1,\lambda_i}$, and $a_{2,\lambda_i}$ are constant coefficients,
and $\bar{X}_1$ and $\bar{X}_2$ are the sample means of
$X_{1,t}$ and $X_{2,t}$.

The intensity of each firm $i$ depends on the
two common latent factors $X_{1,t}$ and $X_{2,t}$ and on
an idiosyncratic factor $\lambda^*_{i,t}$, independent across
firms. The coefficients $a_{0,\lambda_i}$, $a_{1,\lambda_i}$, $a_{2,\lambda_i}$, $\kappa_i$, $\theta_i$, and
$\sigma_i$ are different for each firm. In Duffee's model,
$\lambda^*_{i,t}$ captures the stochasticity of intensities and
the coefficients $a_{1,\lambda_i}$ and $a_{2,\lambda_i}$, $i = 1, \ldots, I$, capture
the correlations between intensities themselves,
and between intensities and interest
rates.

Duffee (1999), Zhang (2003), Driessen (2005), 
and Elizalde (2005b) propose, and estimate, dif¬ 
ferent CID models. 22 

The literature on credit risk correlation has 
criticized the CID approach, arguing that it 
generates low levels of default correlation 
when compared with empirical default cor¬ 
relations. However, Yu (2002a) suggests that 
this apparent low correlation is not a prob¬ 
lem of the approach itself but a problem of the 
choice of state or latent variables, owing to the 
inability of a limited set of state variables to
fully capture the dynamics of changes in default 
intensities. In order to achieve the level of corre¬ 
lation seen in empirical data, a CID model must 
include among the state variables the evolu¬ 
tion of the stock market, corporate and default- 
free bond markets, as well as various industry 
factors. 

According to Yu, the problem of low corre¬ 
lation in Duffee's model may arise because of 
the insufficient specification of the common fac¬ 
tor structure, which may not capture all the 
sources of common variation in the model, leav¬ 
ing them to the idiosyncratic component, which 
in turn would not be independent across firms. 
In fact, Duffee finds that idiosyncratic factors 
are statistically significant and correlated across 
firms. As long as the firms' credit risk depends
on common factors different from the interest 
rate factors, Duffee's specification is not able 
to capture all the correlation between firms' 
default probabilities. Xie, Wu, and Shi (2004) 
estimate Duffee's model for a sample of U.S. 
corporate bonds and perform a careful analysis 
of the model pricing errors. A principal com¬ 
ponent analysis reveals that the first factor ex¬ 
plains more than 90% of the variation of pricing 
errors. Regressing bond pricing errors with re¬ 
spect to several macroeconomic variables, they 
find that returns on the S&P 500 index ex¬ 
plain around 30% of their variations. Therefore, 
Duffee's model leaves out some important ag¬ 
gregate factors that affect all bonds. 

Driessen (2005) proposes a model in which 
the firms' hazard rates are a linear function of 
two common factors, two factors derived from 
the term structure of interest rates, a firm id¬ 
iosyncratic factor, and a liquidity factor. Yu also 
examines the model of Driessen (2005), finding 
that the inclusion of two new common factors 
elevates the default correlation. 

Finally, Elizalde (2005b) shows that any firm's 
credit risk is, to a very large extent, driven by 
common risk factors affecting all firms. The 
study decomposes the credit risk of a sample
of corporate bonds (14 U.S. firms, 2001-2003)
into different unobservable risk factors. A single
common factor accounts for more than 50% of
the credit risk levels of all but two of the firms, with
an average of 68% across firms. This factor represents
the credit risk levels underlying the U.S.
economy and is strongly correlated with the main
U.S. stock indexes. When three common factors
are considered (two of them coming from the 
term structure of interest rates), the model ex¬ 
plains an average of 72% of the firms' credit 
risk. 23 


Default Times Simulation 

In the CID approach, to simulate default times
we proceed as we did in the single entity case.
Once we know the realization of the state variables
$X_t$, we simulate a set of $I$ independent unit
exponential random variables $\eta_1, \ldots, \eta_I$, which
are also independent of $(\mathcal{G}_{X,t})$. The default time
of each firm $i = 1, \ldots, I$ is defined by

$$\tau_i = \inf\left\{t > 0 \;\Big|\; \int_0^t \lambda_{i,s}\,ds \ge \eta_i\right\} \qquad (37)$$

Thus, once we have simulated $\eta_i$, $\tau_i$ will be such
that

$$\int_0^{\tau_i} \lambda_{i,s}\,ds = \eta_i \qquad (38)$$
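The procedure in (37) and (38) can be sketched for one scenario as follows. Everything specific here is an assumption made for illustration: a single common CIR factor discretised with an Euler scheme with truncation at zero, affine firm intensities without an idiosyncratic component, and hypothetical loadings and parameters. The point is only that all firms share one factor path while each receives its own independent unit exponential threshold.

```python
import math
import random

def simulate_common_factor(x0, kappa, theta, sigma, horizon, dt, rng):
    """Euler path of a CIR factor, truncated at zero (an assumed scheme)."""
    path, x = [x0], x0
    for _ in range(round(horizon / dt)):
        xp = max(x, 0.0)
        x = x + kappa * (theta - xp) * dt + sigma * math.sqrt(xp * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return [max(v, 0.0) for v in path]

def first_hitting_time(intensity_path, dt, eta):
    """Smallest t with int_0^t lambda_s ds >= eta on the grid, else math.inf."""
    cumulative = 0.0
    for k in range(len(intensity_path) - 1):
        step = 0.5 * (intensity_path[k] + intensity_path[k + 1]) * dt  # trapezoid
        if step > 0.0 and cumulative + step >= eta:
            return (k + (eta - cumulative) / step) * dt
        cumulative += step
    return math.inf

def simulate_cid_default_times(loadings, x_path, dt, rng):
    """One default time per firm, given a common factor path.

    loadings: list of (a0, a1) with lambda_i = a0 + a1 * X (hypothetical).
    """
    times = []
    for a0, a1 in loadings:
        lam_path = [a0 + a1 * x for x in x_path]
        eta = rng.expovariate(1.0)   # independent unit exponential per firm
        times.append(first_hitting_time(lam_path, dt, eta))
    return times

rng = random.Random(42)
dt, horizon = 0.05, 5.0
x_path = simulate_common_factor(0.02, 0.5, 0.02, 0.05, horizon, dt, rng)
loadings = [(0.005, 0.5), (0.010, 1.0), (0.020, 1.5)]
default_times = simulate_cid_default_times(loadings, x_path, dt, rng)
print(default_times)
```

Repeating the last four lines over many scenarios and counting joint defaults would give a Monte Carlo estimate of the default correlation implied by the chosen loadings.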


Joint Jumps/Joint Defaults

Duffie and Singleton
(1999b) proposed two ways out of the low correlation
problem. One is the possibility of joint
jumps in the default intensities, and the other
is the possibility of default-event triggers that
cause joint defaults. 24

Duffie and Singleton develop an approach
in which firms experience correlated jumps in
their default intensities. Assume that the default
intensity of each firm follows the
process

$$d\lambda_{i,t} = \kappa_i(\theta_i - \lambda_{i,t})\,dt + dq_{i,t} \qquad (39)$$

which consists of a deterministic mean reversion
process plus a pure jump process $q_{i,t}$,
whose jumps arrive according to a
Poisson process with intensity $\gamma_i$, and
whose jump size follows an exponential random
variable with mean $\mu$ (equal for all firms
$i = 1, \ldots, I$). 25 Duffie and Singleton introduce
correlation to the firms' jump processes, keeping
unchanged the characteristics of the individual
intensities. They postulate that each
firm's jump component consists of two kinds
of jumps, joint jumps and idiosyncratic jumps.
The joint jump process has a Poisson intensity
$\gamma_c$ and an exponentially distributed size with
mean $\mu$. Individual default intensities experience
a joint jump with probability $p_i$. That is, a
firm suffers a joint jump with Poisson intensity
of arrival of $p_i\gamma_c$. In order to keep the total jump
in each firm's default intensity with intensity
of arrival $\gamma_i$ and size $\mu$, the idiosyncratic jump
(independent across firms) is set to have an exponentially
distributed size with mean $\mu$ and intensity of
arrival $h_i$, such that $\gamma_i = p_i\gamma_c + h_i$.

Note that if $p_i = 0$ the jumps are only
idiosyncratic jumps, implying that default
intensities and hence default times are independent
across firms. If $p_i = 1$ and $h_i = 0$ all firms
have the same jump intensity, which does not
mean that default times are perfectly correlated,
since the size of the jump is independent across
firms. Only if we additionally assume that $\mu$
goes to infinity do we obtain identical default
times.
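The decomposition of each firm's jump arrivals into joint and idiosyncratic parts can be sketched as a thinning-and-superposition exercise. The function names and parameter values below are hypothetical; the sketch only simulates jump times, not jump sizes or the mean-reverting drift, and checks that the resulting arrival rate for a firm is close to its target total intensity.

```python
import random

def idiosyncratic_intensity(gamma_i, p_i, gamma_c):
    """h_i such that gamma_i = p_i * gamma_c + h_i."""
    h_i = gamma_i - p_i * gamma_c
    if h_i < 0.0:
        raise ValueError("p_i * gamma_c exceeds the target intensity gamma_i")
    return h_i

def poisson_times(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon]."""
    times, t = [], 0.0
    if rate <= 0.0:
        return times
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

def firm_jump_times(gamma_i, p_i, gamma_c, joint_times, horizon, rng):
    """Joint jumps kept with probability p_i, plus idiosyncratic jumps."""
    h_i = idiosyncratic_intensity(gamma_i, p_i, gamma_c)
    kept = [t for t in joint_times if rng.random() < p_i]
    return sorted(kept + poisson_times(h_i, horizon, rng))

rng = random.Random(1)
horizon, gamma_c = 2000.0, 0.5
joint_times = poisson_times(gamma_c, horizon, rng)   # shared by all firms
jumps = firm_jump_times(0.5, 0.4, gamma_c, joint_times, horizon, rng)
print(len(jumps) / horizon)   # empirical arrival rate, close to gamma_i = 0.5
```

Because every firm draws from the same `joint_times`, jumps in different firms' intensities coincide with positive probability, which is the source of the additional correlation.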

The second alternative considers the possibility
of simultaneous defaults triggered by common
credit events, at which several obligors can
default with positive probability. Imagine there
exist $m = 1, \ldots, M$ common credit events, each
one modeled as a Poisson process with intensity
$\lambda^c_{m,t}$. Given the occurrence of a credit event
$m$ at time $t$, each firm $i$ defaults with probability
$p_{i,m,t}$. If, given the occurrence of a common
shock, the firm's default probability is less
than one, this common shock is called a nonfatal
shock, whereas if this probability is one, the
common shock is called a fatal shock. In addition
to the common credit events, each entity
can experience default through an idiosyncratic
Poisson process with intensity $\lambda^*_{i,t}$, which is independent
across firms. Therefore, the total intensity
of firm $i$ is given by

$$\lambda_{i,t} = \lambda^*_{i,t} + \sum_{m=1}^{M} p_{i,m,t}\,\lambda^c_{m,t} \qquad (40)$$

Consider a simplified version of this setting with two firms, constant idiosyncratic intensities $\lambda^*_1$ and $\lambda^*_2$, and one common and fatal event with constant intensity $\lambda^c$. In this case firm $i$'s survival probability is given by

$$s_i(t, T) = \exp\left(-\left(\lambda^*_i + \lambda^c\right)(T - t)\right) \tag{41}$$

Denoting by $s(t; T_1, T_2)$ the joint survival probability, given no default until time $t$, that firm 1 does not default before time $T_1$ and firm 2 does not default before time $T_2$, then

$$\begin{aligned}
s(t; T_1, T_2) &= \exp\left(-\lambda^*_1 (T_1 - t) - \lambda^*_2 (T_2 - t) - \lambda^c \max\{T_1 - t,\, T_2 - t\}\right) \\
&= \exp\left(-(\lambda^*_1 + \lambda^c)(T_1 - t) - (\lambda^*_2 + \lambda^c)(T_2 - t) + \lambda^c \min\{T_1 - t,\, T_2 - t\}\right)
\end{aligned} \tag{42}$$

which can be expressed as

$$s(t; T_1, T_2) = s_1(t, T_1)\, s_2(t, T_2)\, \min\left\{\exp\left(\lambda^c (T_1 - t)\right),\, \exp\left(\lambda^c (T_2 - t)\right)\right\} \tag{43}$$

This expression for the joint survival probability explicitly includes the individual survival probabilities and a term that introduces the dependence structure. This is the approach followed by copula functions, which couple marginal probabilities into joint probabilities. In fact, the above example is a special case of a copula function, called the Marshall-Olkin copula.
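The two-firm example with one fatal common shock can be checked numerically. The following is a minimal sketch, assuming constant intensities; the function names and the parameter values are illustrative and do not come from the text. It evaluates the joint survival probability both in its direct form (42) and in the "marginals times dependence term" form (43).

```python
import math

def marginal_survival(lam_star, lam_c, t, T):
    """Eq. (41): survival probability of one firm between t and T."""
    return math.exp(-(lam_star + lam_c) * (T - t))

def joint_survival(lam1, lam2, lam_c, t, T1, T2):
    """Eq. (42), first form: two idiosyncratic clocks plus one fatal common shock."""
    return math.exp(-lam1 * (T1 - t) - lam2 * (T2 - t)
                    - lam_c * max(T1 - t, T2 - t))

def joint_survival_coupled(lam1, lam2, lam_c, t, T1, T2):
    """Eq. (43): product of the marginals times the dependence term."""
    s1 = marginal_survival(lam1, lam_c, t, T1)
    s2 = marginal_survival(lam2, lam_c, t, T2)
    dependence = min(math.exp(lam_c * (T1 - t)), math.exp(lam_c * (T2 - t)))
    return s1 * s2 * dependence

# Hypothetical parameter values, chosen only for illustration.
lam1, lam2, lam_c = 0.02, 0.03, 0.01
direct = joint_survival(lam1, lam2, lam_c, 0.0, 5.0, 3.0)
coupled = joint_survival_coupled(lam1, lam2, lam_c, 0.0, 5.0, 3.0)
independent = (marginal_survival(lam1, lam_c, 0.0, 5.0)
               * marginal_survival(lam2, lam_c, 0.0, 3.0))
```

Under these assumptions `direct` and `coupled` agree, and both exceed `independent`, which is how the common shock shows up as positive dependence.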

The relationship between joint survival and default probabilities is given by

$$s(t; T_1, T_2) = 1 - p_1(t, T_1) - p_2(t, T_2) + p(t; T_1, T_2) \tag{44}$$

where $p(t; T_1, T_2)$ represents the joint default probability, given no default until time $t$, that firm 1 defaults before time $T_1$ and firm 2 defaults before time $T_2$. Obviously the case with multiple common shocks is more troublesome in terms of notation and calibration because, for every possible common credit event, an intensity must be specified and calibrated. 26

Duffie and Singleton (1999b) propose algo¬ 
rithms to simulate default times within these 
two frameworks. The criticisms that the joint 
credit event approach has received stem from 
the fact that it is unrealistic that several firms 
default at exactly the same time, and also from 
the fact that after a common credit event that 
makes some obligors default, the intensity of 
other related obligors that do not default does 
not change at all. 

Although theoretically appealing, the main 
drawback of these two last models has to do 
with their calibration and implementation. To 
the best of my knowledge there is not a sin¬ 
gle paper that carries out an empirical calibra¬ 
tion and implementation of a model like the 
ones presented in this section. The same applies 
to the contagion models presented in the next 
section. 

Contagion Mechanisms 

Contagion models take CID models one step 
further, introducing into the model two empir¬ 
ical facts: that the default of one firm can trig¬ 
ger the default of other related firms and that 
default times tend to concentrate in certain pe¬ 
riods of time, in which the default probability 
of all firms is increased. The last model exam¬ 
ined in the previous section (joint credit events) 
differs from contagion mechanisms in that if an 
obligor does not experience a default, its inten¬ 
sity does not change due to the default of any 
related obligor. The literature of default con¬ 
tagion includes two approaches: the infectious 
defaults model of Davis and Lo (1999), and the 
model proposed by Jarrow and Yu (2001), which 
we shall refer to as the propensity model. The 
main issues to be resolved concerning these two 
models are associated with difficulties in their 
calibration to market prices. 


The Davis-Lo Infectious 
Defaults Model 

The model developed by Davis and Lo (1999) has two versions: a static version that only considers the number of defaults in a given time period, 27 and a dynamic version in which the timing of default is also incorporated. 28 In the dynamic version of the model, each firm has an initial hazard rate of $\lambda_{i,t}$, for $i = 1, \ldots, I$, which can be constant, time dependent, or follow a CID model. When a default occurs, the default intensity of all remaining firms is increased by a factor $a > 1$, called the enhancement factor, to $a\lambda_{i,t}$. This augmented intensity remains for an exponentially distributed period of time, after which the enhancement factor disappears ($a = 1$). During the period of augmented intensity, the default probabilities of all firms increase, reflecting the risk of default contagion.


The Jarrow-Yu Propensity Model

In order to account for the clustering of defaults in specific periods, Jarrow and Yu (2001) extend CID models to account for counterparty risk: the risk that the default of a firm may increase the default probability of other firms with which it has commercial or financial relationships. This allows them to introduce extra default dependence in CID models to account for default clustering. In a first attempt, Jarrow and Yu assume that the default intensity of a firm depends on the status (default/no default) of the rest of the firms (symmetric dependence). However, symmetric dependence introduces a circularity in the model, which they refer to as looping defaults, and which makes it extremely difficult to construct and derive the joint distribution of default times.

Jarrow and Yu restrict the structure of the model to avoid the problem of looping defaults. They distinguish between primary firms $(1, \ldots, K)$ and secondary firms $(K + 1, \ldots, I)$. First, they derive the default intensity of primary firms, using a CID model. The primary firm intensities $\lambda_{1,t}, \ldots, \lambda_{K,t}$ are $(\mathcal{G}_{X,t})$-adapted and do not depend on the default status of any other firm. If a primary firm defaults, this increases the default intensities of secondary firms, but not the other way around (asymmetric dependence). Thus, secondary firms' default intensities are given by

$$\lambda_{i,t} = \hat{\lambda}_{i,t} + \sum_{j=1}^{K} a^{j}_{i,t}\, 1_{\{\tau_j \le t\}} \tag{45}$$

for $i = K + 1, \ldots, I$ and $j = 1, \ldots, K$, where $\hat{\lambda}_{i,t}$ and $a^{j}_{i,t}$ are $(\mathcal{G}_{X,t})$-adapted. $\hat{\lambda}_{i,t}$ represents the part of secondary firm $i$'s hazard rate that is independent of the default status of other firms.

Default intensities of primary firms $\lambda_{1,t}, \ldots, \lambda_{K,t}$ are $(\mathcal{G}_{X,t})$-adapted, whereas default intensities of secondary firms $\lambda_{K+1,t}, \ldots, \lambda_{I,t}$ are adapted with respect to the filtration $(\mathcal{G}_{X,t}) \vee (\mathcal{G}_{1,t}) \vee \ldots \vee (\mathcal{G}_{K,t})$.

This model introduces a new source of default correlation between secondary firms, and also between primary and secondary firms, but it does not solve the drawback of low correlation between primary firms, which CID models apparently imply, because the setting for primary firms is, after all, only a CID model. 29


Default Times Simulation. First, we simulate the default times for the primary firms exactly as in the case of CID. Then, we simulate a set of $I - K$ independent unit exponential random variables $\eta_{K+1}, \ldots, \eta_I$ (independent of $(\mathcal{G}_{X,t}) \vee (\mathcal{G}_{1,t}) \vee \ldots \vee (\mathcal{G}_{K,t})$), and define the default time of each secondary firm $i = K + 1, \ldots, I$ as

$$\tau_i = \inf\left\{ t > 0 : \int_0^t \lambda_{i,s}\, ds \ge \eta_i \right\} \tag{46}$$
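The simulation step can be sketched for the simplest possible case: one primary firm, one secondary firm, and constant parameters. This is only an illustration under those assumptions; the names `lam_hat` (the secondary firm's baseline intensity), `a` (the jump in that intensity once the primary firm has defaulted) and the numerical values are hypothetical. With constant parameters the integrated intensity is piecewise linear, so the first-passage time in (46) has a closed form.

```python
import random

def secondary_default_time(lam_hat, a, tau_primary, eta):
    """Smallest t at which the integrated intensity reaches the threshold eta.

    Integrated intensity: lam_hat * t + a * max(t - tau_primary, 0).
    """
    if eta <= lam_hat * tau_primary:
        # The threshold is reached before the primary firm defaults.
        return eta / lam_hat
    # Otherwise the intensity steps up to lam_hat + a at tau_primary.
    return tau_primary + (eta - lam_hat * tau_primary) / (lam_hat + a)

def simulate_pair(lam_primary, lam_hat, a, rng):
    """Draw one (primary, secondary) pair of default times."""
    tau_primary = rng.expovariate(lam_primary)  # primary firm: constant intensity
    eta = rng.expovariate(1.0)                  # unit exponential threshold
    return tau_primary, secondary_default_time(lam_hat, a, tau_primary, eta)

rng = random.Random(0)
pairs = [simulate_pair(0.05, 0.02, 0.08, rng) for _ in range(5)]
```

A richer specification (stochastic state variables, several primary firms) would replace the closed-form inversion with a numerical integration of the intensity path.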


Copulas 

In CID and contagion models the specification of the individual intensities includes all the default dependence structure between firms. In contrast, the copula approach separates individual default probabilities from the credit risk dependence structure. The copula function takes as inputs the marginal probabilities and introduces the dependence structure to generate joint probabilities.

Copulas were introduced in 1959 and have 
been extensively applied to model, among oth¬ 
ers, survival data in areas such as actuarial 
science. 30 

In the rest of this section we review copula theory and its use in the credit risk literature. To keep the notation simple, assume we are at time $t = 0$ and take $s_i(t)$ and $p_i(t)$ (or $F_i(t)$) to be the survival and default probabilities, respectively, of firm $i = 1, \ldots, I$ from time 0 to time $t > 0$. Then

$$F_i(t) = P[\tau_i \le t] = 1 - s_i(t) = 1 - P[\tau_i > t] \tag{47}$$

where $\tau_i$ denotes the default time of firm $i$.

A copula function transforms marginal probabilities into joint probabilities. If we model default times, the joint default probability is given by

$$F(t_1, \ldots, t_I) = P[\tau_1 \le t_1, \ldots, \tau_I \le t_I] = C^d\left(F_1(t_1), \ldots, F_I(t_I)\right) \tag{48}$$

and if we model survival times, the joint survival probability takes the form

$$s(t_1, \ldots, t_I) = P[\tau_1 > t_1, \ldots, \tau_I > t_I] = C^s\left(s_1(t_1), \ldots, s_I(t_I)\right) \tag{49}$$

where $C^d$ and $C^s$ are two different copulas. 31

The copula function takes as inputs the 
marginal probabilities without considering 
how we have derived them. Thus, the intensity 
approach is not the only framework with which 
we can use copula functions to model the de¬ 
fault dependence structure between firms. Any 
other approach to model marginal default prob¬ 
abilities, such as the structural approach, can 
use copula theory to model joint probabilities. 


Description 

An intuitive definition of a copula function is as follows: 32

Copula Function. A function $C : [0,1]^I \to [0,1]$ is a copula if there are uniform random variables $U_1, \ldots, U_I$ taking values in $[0,1]$ such that $C$ is their joint distribution function.

A copula function $C$ has uniform marginal distributions, that is,

$$C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i \tag{50}$$

for all $i = 1, \ldots, I$ and $u_i \in [0,1]$.

This definition is used, for example, by Schönbucher (2003). 33 The copula function $C$ is the joint distribution of a set of $I$ uniform random variables $U_1, \ldots, U_I$. Copula functions allow one to separate the modeling of the marginal distribution functions from the modeling of the dependence structure. The choice of the copula does not constrain the choice of the marginal distributions. Sklar (1959) showed that any multivariate distribution function $F$ can be written in the form of a copula function. The following theorem is known as Sklar's theorem:

Sklar's Theorem. Let $Y_1, \ldots, Y_I$ be random variables with marginal distribution functions $F_1, \ldots, F_I$ and joint distribution function $F$. Then there exists an $I$-dimensional copula $C$ such that $F(y_1, \ldots, y_I) = C(F_1(y_1), \ldots, F_I(y_I))$ for all $(y_1, \ldots, y_I)$ in $\mathbb{R}^I$. Moreover, if each $F_i$ is continuous, then the copula $C$ is unique.

We shall consider the default times of each firm $\tau_1, \ldots, \tau_I$ as the marginal random variables whose joint distribution function will be determined by a copula function. If $Y$ is a random variable with continuous distribution function $F$, then the random variable $U$, defined as $U = F(Y)$, is a uniform $[0,1]$ random variable. Denoting by $t_i$ the realization of each $\tau_i$, 34

$$F(t_1, \ldots, t_I) = P[\tau_1 \le t_1, \ldots, \tau_I \le t_I] = C\left(F_1(t_1), \ldots, F_I(t_I)\right) \tag{51}$$

The marginal distribution of the default time $\tau_i$ will be given by

$$\begin{aligned}
F_i(t_i) &= F(\infty, \ldots, \infty, t_i, \infty, \ldots, \infty) \\
&= P[\tau_1 \le \infty, \ldots, \tau_i \le t_i, \ldots, \tau_I \le \infty] \\
&= C\left(F_1(\infty), \ldots, F_i(t_i), \ldots, F_I(\infty)\right) \\
&= C\left(1, \ldots, F_i(t_i), \ldots, 1\right)
\end{aligned} \tag{52}$$

In the bivariate case, the relationship between the copula $C^d$ and the survival copula $C^s$, which satisfies $s(t_1, t_2) = C^s(s_1(t_1), s_2(t_2))$, is given by 35

$$C^s(u_1, u_2) = u_1 + u_2 - 1 + C^d(1 - u_1, 1 - u_2) \tag{53}$$

Nelsen (1999) points out that $C^s$ is a copula and that it couples the joint survival function $s(\cdot, \ldots, \cdot)$ to its univariate margins $s_1(\cdot), \ldots, s_I(\cdot)$ in a manner completely analogous to the way in which a copula connects the joint distribution function $F(\cdot, \ldots, \cdot)$ to its margins $F_1(\cdot), \ldots, F_I(\cdot)$. When modeling credit risk using the copula framework we can specify a copula for either the default times or the survival times.

Measures of the Dependence Structure. The dependence between the marginal distributions linked by a copula is characterized entirely by the choice of the copula. If $C_1$ and $C_2$ are two $I$-dimensional copula functions we say that $C_1$ is smaller than $C_2$, denoted by $C_1 \prec C_2$, if $C_1(u) \le C_2(u)$ for all $u \in [0,1]^I$.

The Fréchet-Hoeffding copulas, $C^-$ and $C^+$, are two reference copulas given by 36

$$C^- = \max\{u_1 + \ldots + u_I + 1 - I,\; 0\} \tag{54}$$

$$C^+ = \min\{u_1, \ldots, u_I\} \tag{55}$$

satisfying $C^- \prec C \prec C^+$ for any copula $C$. However, this is a partial ordering in the sense that not every pair of copulas can be compared in this way.
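As a quick numerical illustration of the bounds (54) and (55), the sketch below evaluates them on a grid in the bivariate case and compares them with the copula of two independent uniforms, $C(u_1, u_2) = u_1 u_2$. The function names and the grid are illustrative choices, and a small tolerance absorbs floating-point rounding.

```python
def frechet_lower(u):
    """Eq. (54): lower Frechet-Hoeffding bound at the point u."""
    return max(sum(u) + 1 - len(u), 0.0)

def frechet_upper(u):
    """Eq. (55): upper Frechet-Hoeffding bound at the point u."""
    return min(u)

def independence(u):
    """Copula of independent uniforms: the product of the arguments."""
    prod = 1.0
    for x in u:
        prod *= x
    return prod

grid = [k / 10 for k in range(11)]
tol = 1e-12  # guards against rounding in the floating-point sums
bounds_hold = all(
    frechet_lower((u1, u2)) <= independence((u1, u2)) + tol
    and independence((u1, u2)) <= frechet_upper((u1, u2)) + tol
    for u1 in grid
    for u2 in grid
)
```

The same check can be run against any other candidate copula by swapping out `independence`.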

In order to compare any two copulas, we would need an index to measure the dependence structure between two random variables introduced by the choice of the copula function. The linear (Pearson) correlation coefficient $\rho$ is the most widely used measure of dependence; however, it has several drawbacks that make it poorly suited to comparing copula functions. For example, linear correlation depends not only on the copula but also on the marginal distributions.

We focus on four dependence measures that depend only on the copula function, not on the marginal distributions: Kendall's tau, Spearman's rho, and the upper and lower tail dependence coefficients.

First, we introduce the concept of 
concordance: 

Concordance. Let $(y_1, y_2)$ and $(\tilde{y}_1, \tilde{y}_2)$ be two observations from a vector $(Y_1, Y_2)$ of continuous random variables. Then, $(y_1, y_2)$ and $(\tilde{y}_1, \tilde{y}_2)$ are said to be concordant if $(y_1 - \tilde{y}_1)(y_2 - \tilde{y}_2) > 0$ and discordant if $(y_1 - \tilde{y}_1)(y_2 - \tilde{y}_2) < 0$.

Kendall's tau and Spearman's rho are two 
measures of concordance: 

Kendall's Tau. Let $(Y_1, Y_2)$ and $(\tilde{Y}_1, \tilde{Y}_2)$ be IID random vectors of continuous random variables with the same joint distribution function given by the copula $C$ (and with marginals $F_1$ and $F_2$). Then, Kendall's tau of the vector $(Y_1, Y_2)$ (and thus of the copula $C$) is defined as the probability of concordance minus the probability of discordance; that is,

$$\tau = P\left[(Y_1 - \tilde{Y}_1)(Y_2 - \tilde{Y}_2) > 0\right] - P\left[(Y_1 - \tilde{Y}_1)(Y_2 - \tilde{Y}_2) < 0\right] \tag{56}$$

Spearman's Rho. Let $(Y_1, Y_2)$, $(\tilde{Y}_1, \tilde{Y}_2)$ and $(\hat{Y}_1, \hat{Y}_2)$ be IID random vectors of continuous random variables with the same joint distribution function given by the copula $C$ (and with marginals $F_1$ and $F_2$). Then, Spearman's rho of the vector $(Y_1, Y_2)$ (and thus of the copula $C$) is defined as

$$\rho_S = 3\left(P\left[(Y_1 - \tilde{Y}_1)(Y_2 - \hat{Y}_2) > 0\right] - P\left[(Y_1 - \tilde{Y}_1)(Y_2 - \hat{Y}_2) < 0\right]\right) \tag{57}$$


Both Kendall's tau and Spearman's rho 37 take values in the interval $[-1, 1]$ and can be defined in terms of the copula function by

$$\tau = 4 \iint_{[0,1]^2} C(u, v)\, dC(u, v) - 1 \tag{58}$$

$$\rho_S = 12 \iint_{[0,1]^2} uv\, dC(u, v) - 3 = 12 \iint_{[0,1]^2} C(u, v)\, du\, dv - 3 \tag{59}$$

The Fréchet-Hoeffding copulas take the two extreme values of Kendall's tau and Spearman's rho: if the copula of the vector $(Y_1, Y_2)$ is $C^-$ then $\tau = \rho_S = -1$, and if it has copula $C^+$ then $\tau = \rho_S = 1$. The product copula $C^P$ represents independent random variables; that is, if $Y_1, \ldots, Y_I$ are independent random variables, their copula is given by $C^P$, such that $C^P(u_1, \ldots, u_I) = u_1 \cdots u_I$. For a vector $(Y_1, Y_2)$ of independent random variables, $\tau = \rho_S = 0$. Kendall's tau and Spearman's rho are equal for a given copula $C$ and its associated survival copula $C^s$.
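Both measures have simple sample counterparts. The sketch below is one possible way to compute them for data without ties, using only the standard library; the function names and the toy data are illustrative. Because both statistics depend only on ranks, applying an increasing transformation to either variable leaves them unchanged, which is the sample analogue of their being copula properties.

```python
import math

def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def ranks(values):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def spearman_rho(xs, ys):
    """Sample Spearman's rho from rank differences (assumes no ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
tau_hat = kendall_tau(xs, ys)
rho_hat = spearman_rho(xs, ys)
# An increasing transformation of ys leaves both statistics unchanged.
tau_transformed = kendall_tau(xs, [math.exp(y) for y in ys])
```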

Kendall's tau and Spearman's rho are measures of global dependence. In contrast, tail dependence coefficients between two random variables $(Y_1, Y_2)$ are local measures of dependence, as they refer to the level of dependence between extreme values, that is, values at the tails of the distributions $F_1(Y_1)$ and $F_2(Y_2)$.

Tail Dependence. Let $(Y_1, Y_2)$ be a random vector of continuous random variables with copula $C$ (and with marginals $F_1$ and $F_2$). Then, the coefficient of upper tail dependence of the vector $(Y_1, Y_2)$ (and thus of the copula $C$) is defined as

$$\lambda_U = \lim_{u \nearrow 1} P\left[Y_1 > F_1^{-1}(u) \mid Y_2 > F_2^{-1}(u)\right] \tag{60}$$

where $F_i^{-1}$ represents the inverse function of $F_i$, provided the limit exists. We say that the random vector (and thus the copula $C$) has upper tail dependence if $\lambda_U > 0$. Similarly, the coefficient of lower tail dependence of the vector $(Y_1, Y_2)$ (and thus of the copula $C$) is defined as

$$\lambda_L = \lim_{u \searrow 0} P\left[Y_1 < F_1^{-1}(u) \mid Y_2 < F_2^{-1}(u)\right] \tag{61}$$

We say that the random vector (and thus the copula $C$) has lower tail dependence if $\lambda_L > 0$.


Upper (lower) tail dependence measures the probability that one component of the vector $(Y_1, Y_2)$ is extremely large (small) given that the other is extremely large (small). As in the case of Kendall's tau and Spearman's rho, tail dependence is a copula property and can be expressed as

$$\lambda_U = \lim_{u \nearrow 1} \frac{1 + C(u, u) - 2u}{1 - u} \tag{62}$$

$$\lambda_L = \lim_{u \searrow 0} \frac{C(u, u)}{u} \tag{63}$$

The upper (lower) coefficient of tail dependence of the copula $C$ is the lower (upper) coefficient of tail dependence of its associated survival copula $C^s$.

Consider the random vector $(\tau_1, \tau_2)$ of default times for two firms. The coefficient of upper (lower) tail dependence represents the probability of long-term survival (immediate joint death). The existence of default clustering periods implies that a copula to model joint default (survival) probabilities should have lower (upper) tail dependence to capture those periods.


Examples of Copulas. Here, we review some of the most widely used copulas in default risk modeling. The first two copulas, the normal and Student t copulas, belong to the elliptical family of copulas. We also present the class of Archimedean copulas and the Marshall-Olkin copula. 39

1. Elliptical Copulas. The $I$-dimensional normal copula with correlation matrix $\Sigma$ is given by

$$C(u_1, \ldots, u_I) = \Phi^I_{\Sigma}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_I)\right) \tag{64}$$

where $\Phi^I_{\Sigma}$ represents an $I$-dimensional normal distribution function with covariance matrix $\Sigma$, and $\Phi^{-1}$ denotes the inverse of the univariate standard normal distribution function.

Normal copulas are radially symmetric ($\lambda_U = \lambda_L$), tail independent ($\lambda_U = \lambda_L = 0$), and their concordance order depends on the linear correlation parameter $\rho$:

$$C^- \prec C_{\rho = -1} \prec C_{\rho < 0} \prec C_{\rho = 0} = C^P \prec C_{\rho > 0} \prec C_{\rho = 1} = C^+ \tag{65}$$


As with any other copula, the normal copula allows the use of any marginal distribution. We can express the linear correlation coefficient of a normal copula ($\rho$) in terms of both Kendall's tau ($\tau$) and Spearman's rho ($\rho_S$) in the following way:

$$\rho = 2 \sin\left(\frac{\pi}{6} \rho_S\right) = \sin\left(\frac{\pi}{2} \tau\right) \tag{66}$$
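Relation (66) is easy to use in both directions. The helper functions below are an illustrative sketch (the names are not from the text); the inverse maps simply solve (66) for Kendall's tau and Spearman's rho.

```python
import math

def rho_from_kendall(tau):
    """Eq. (66): linear correlation of a normal copula from Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)

def rho_from_spearman(rho_s):
    """Eq. (66): linear correlation of a normal copula from Spearman's rho."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)

def kendall_from_rho(rho):
    """Inverse of rho_from_kendall."""
    return 2.0 / math.pi * math.asin(rho)

def spearman_from_rho(rho):
    """Inverse of rho_from_spearman."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

# Example: a Kendall's tau of 0.5 corresponds to rho = sin(pi / 4).
rho_example = rho_from_kendall(0.5)
```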


Another elliptical copula is the t-copula. Letting $X$ be a random vector distributed as an $I$-dimensional multivariate Student t with $\nu$ degrees of freedom, mean vector $\mu$ (for $\nu > 1$) and covariance matrix $\frac{\nu}{\nu - 2}\Sigma$ (for $\nu > 2$), we can express $X$ as

$$X = \mu + \frac{\sqrt{\nu}}{\sqrt{S}}\, Z \tag{67}$$

where $S$ is a random variable distributed as a $\chi^2$ with $\nu$ degrees of freedom and $Z$ is an $I$-dimensional normal random vector, independent of $S$, with zero mean and covariance matrix $\Sigma$. The $I$-dimensional t-copula of $X$ can be expressed as

$$C(u_1, \ldots, u_I) = t^I_{\nu, R}\left(t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_I)\right) \tag{68}$$

where $t^I_{\nu, R}$ represents the distribution function of $\frac{\sqrt{\nu}}{\sqrt{S}}\, Y$, where $Y$ is an $I$-dimensional normal random vector, independent of $S$, with zero mean and covariance matrix $R$. $t_{\nu}^{-1}$ denotes the inverse of the univariate Student t distribution function with $\nu$ degrees of freedom and $R_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \Sigma_{jj}}}$.

The t-copula is radially symmetric and exhibits tail dependence given by

$$\lambda_U = \lambda_L = 2 - 2\, t_{\nu + 1}\left(\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}}\right) \tag{69}$$

where $\rho$ is the linear correlation of the bivariate t-distribution.

2. Archimedean Copulas. An $I$-dimensional Archimedean copula function $C$ is represented by

$$C(u_1, \ldots, u_I) = \phi^{-1}\left(\phi(u_1) + \ldots + \phi(u_I)\right) \tag{70}$$

where the function $\phi : [0,1] \to \mathbb{R}^+$, called the generator of the copula, is invertible and satisfies $\phi'(u) < 0$, $\phi''(u) > 0$, $\phi(1) = 0$, $\phi(0) = \infty$. An Archimedean copula is entirely characterized by its generator function. Relevant Archimedean copulas are the Clayton, Frank, Gumbel, and Product copulas, whose generator functions are given by:

Copula     Generator $\phi(u)$                                    Parameter
Clayton    $\frac{u^{-\theta} - 1}{\theta}$                        for $\theta > 0$
Frank      $-\ln \frac{e^{-\theta u} - 1}{e^{-\theta} - 1}$        for $\theta \in \mathbb{R} \setminus \{0\}$
Gumbel     $(-\ln u)^{\theta}$                                     for $\theta \ge 1$
Product    $-\ln u$


The Clayton copula has lower tail dependence 
but not upper tail dependence. The Gumbel 
copula has upper tail dependence but not lower 
tail dependence. The Frank copula does not ex¬ 
hibit either upper or lower tail dependence. 
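These tail properties can be illustrated numerically. The sketch below builds bivariate Clayton and Gumbel copulas from their generators through (70) and evaluates the ratios in (62) and (63) close to the limit points. The generator formulas used are the standard Clayton and Gumbel ones, the closed-form limits quoted in the comments ($2^{-1/\theta}$ and $2 - 2^{1/\theta}$) are standard results rather than statements from the text, and the function names and the choice $\theta = 2$ are illustrative.

```python
import math

def clayton_generator(theta):
    """Clayton generator and its inverse (theta > 0)."""
    phi = lambda u: (u ** -theta - 1.0) / theta
    phi_inv = lambda s: (1.0 + theta * s) ** (-1.0 / theta)
    return phi, phi_inv

def gumbel_generator(theta):
    """Gumbel generator and its inverse (theta >= 1)."""
    phi = lambda u: (-math.log(u)) ** theta
    phi_inv = lambda s: math.exp(-(s ** (1.0 / theta)))
    return phi, phi_inv

def archimedean_copula(generator, u):
    """Eq. (70): C(u_1, ..., u_I) = phi^{-1}(phi(u_1) + ... + phi(u_I))."""
    phi, phi_inv = generator
    return phi_inv(sum(phi(x) for x in u))

def lower_tail_ratio(generator, u):
    """Ratio in eq. (63), evaluated at a small u instead of taking the limit."""
    return archimedean_copula(generator, (u, u)) / u

def upper_tail_ratio(generator, u):
    """Ratio in eq. (62), evaluated at u close to 1 instead of taking the limit."""
    return (1.0 + archimedean_copula(generator, (u, u)) - 2.0 * u) / (1.0 - u)

theta = 2.0
clayton = clayton_generator(theta)
gumbel = gumbel_generator(theta)

clayton_lower = lower_tail_ratio(clayton, 1e-6)       # close to 2 ** (-1 / theta)
clayton_upper = upper_tail_ratio(clayton, 1 - 1e-6)   # close to 0
gumbel_upper = upper_tail_ratio(gumbel, 1 - 1e-6)     # close to 2 - 2 ** (1 / theta)
```

The Gumbel lower-tail ratio also tends to zero, but only slowly in $u$, so a finite-$u$ evaluation is a poor guide there.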

Archimedean copulas allow for a great variety of different dependence structures, and the ones presented above are especially interesting because they are one-parameter copulas. In particular, the larger the parameter $\theta$ (in absolute value), the stronger the dependence structure. The Clayton, Frank, and Gumbel copulas are ordered in $\theta$ (i.e., $C_{\theta_1} \prec C_{\theta_2}$ for all $\theta_1 \le \theta_2$). Unlike the Gumbel copula, which does not allow for negative dependence, Clayton and Frank copulas are able to model continuously the whole range of dependence between the lower Fréchet-Hoeffding copula, the product copula, and the upper Fréchet-Hoeffding copula. Copulas with this property are called inclusive or comprehensive copulas. Frank copulas are the only radially symmetric Archimedean copulas ($C = C^s$).

For Archimedean copulas, tail dependence and Kendall's tau coefficients can be expressed in terms of the generator function

$$\tau = 1 + 4 \int_0^1 \frac{\phi(u)}{\phi'(u)}\, du \tag{71}$$

$$\lambda_U = 2 - 2 \lim_{u \to 0} \frac{(\phi^{-1})'(2u)}{(\phi^{-1})'(u)} \tag{72}$$

$$\lambda_L = 2 \lim_{u \to \infty} \frac{(\phi^{-1})'(2u)}{(\phi^{-1})'(u)} \tag{73}$$

provided the derivatives and limits exist.
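As a worked illustration, these expressions can be evaluated by hand for the Clayton copula. The derivation below is a sketch that assumes the standard Clayton generator $\phi(u) = (u^{-\theta} - 1)/\theta$ with $\theta > 0$; it recovers the lower tail dependence and the absence of upper tail dependence noted earlier for this copula.

```latex
\begin{align*}
\phi(u) &= \frac{u^{-\theta} - 1}{\theta}, \qquad
\phi'(u) = -u^{-\theta - 1}, \qquad
\frac{\phi(u)}{\phi'(u)} = -\frac{u - u^{\theta + 1}}{\theta} \\[4pt]
\tau &= 1 + 4 \int_0^1 \frac{\phi(u)}{\phi'(u)}\, du
      = 1 - \frac{4}{\theta} \left( \frac{1}{2} - \frac{1}{\theta + 2} \right)
      = 1 - \frac{2}{\theta + 2}
      = \frac{\theta}{\theta + 2} \\[4pt]
\phi^{-1}(s) &= (1 + \theta s)^{-1/\theta}, \qquad
(\phi^{-1})'(s) = -(1 + \theta s)^{-1/\theta - 1} \\[4pt]
\lambda_L &= 2 \lim_{s \to \infty}
      \left( \frac{1 + 2\theta s}{1 + \theta s} \right)^{-1/\theta - 1}
      = 2 \cdot 2^{-1/\theta - 1} = 2^{-1/\theta} \\[4pt]
\lambda_U &= 2 - 2 \lim_{s \to 0}
      \left( \frac{1 + 2\theta s}{1 + \theta s} \right)^{-1/\theta - 1}
      = 2 - 2 = 0
\end{align*}
```

For example, $\theta = 2$ gives $\tau = 1/2$ and $\lambda_L = 2^{-1/2} \approx 0.707$.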

Archimedean copulas are interchangeable, 
which means that the dependence between any 
two (or more) random variables does not de¬ 
pend on which random variables we choose. In 
terms of credit risk analysis, this imposes an im¬ 
portant restriction on the dependence structure 
since the default dependence introduced by an 
Archimedean copula is the same between any 
group of firms. 

3. Marshall-Olkin Copula. This copula was already mentioned when we dealt with joint defaults in intensity models. In its bivariate specification the Marshall-Olkin copula is given by

$$C(u_1, u_2) = \min\left\{u_1^{1 - \alpha_1} u_2,\; u_1 u_2^{1 - \alpha_2}\right\} = u_1 u_2 \min\left\{u_1^{-\alpha_1},\; u_2^{-\alpha_2}\right\} \tag{74}$$

for $\alpha_1, \alpha_2 \in (0, 1)$. 40


Copulas for Default Times 

Within the reduced-form approach, we can distinguish two approaches to introduce default dependence using copulas. The first one, which we will refer to as Li's approach, was introduced by Li (1999) and represents one of the first attempts to use copula theory systematically in credit risk modeling. Li's approach takes as inputs the marginal default (survival) probabilities of each firm and derives the joint probabilities using a copula function. 41 Although Li (1999) studies the case of a normal copula, any other copula can be used within this framework.

If we are using a copula function as a joint distribution for default (survival) times, the simulated vector $(u_1, \ldots, u_I)$ of uniform $[0,1]$ random variables from the copula will correspond to the default $F_1, \ldots, F_I$ (survival $s_1, \ldots, s_I$) marginal distributions. Once we have simulated the vector $(u_1, \ldots, u_I)$, we use it to derive the implied default times $\tau_1, \ldots, \tau_I$ such that $\tau_i = F_i^{-1}(u_i)$, or $\tau_i = s_i^{-1}(u_i)$ in the survival case, for $i = 1, \ldots, I$.
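The procedure can be sketched for two firms with a normal copula. The example below is a simplified illustration, not Li's implementation: it assumes constant default intensities, so that the marginal default distribution is $F_i(t) = 1 - e^{-\lambda_i t}$ and its inverse is available in closed form, and all names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def simulate_default_times(lam1, lam2, rho, rng):
    """One draw of (tau_1, tau_2) from a bivariate normal copula."""
    # Correlated standard normals via a 2x2 Cholesky factorization.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    times = []
    for z, lam in ((z1, lam1), (z2, lam2)):
        # Uniform draw from the copula; capped just below 1 to keep log finite.
        u = min(STD_NORMAL.cdf(z), 1.0 - 1e-16)
        # Invert the marginal default distribution F(t) = 1 - exp(-lam * t).
        times.append(-math.log(1.0 - u) / lam)
    return tuple(times)

def joint_default_probability(lam1, lam2, rho, horizon, n_paths, rng):
    """Monte Carlo estimate of P[tau_1 <= horizon, tau_2 <= horizon]."""
    hits = 0
    for _ in range(n_paths):
        t1, t2 = simulate_default_times(lam1, lam2, rho, rng)
        if t1 <= horizon and t2 <= horizon:
            hits += 1
    return hits / n_paths

rng = random.Random(42)
p_ind = joint_default_probability(0.05, 0.05, 0.0, 5.0, 20000, rng)
p_dep = joint_default_probability(0.05, 0.05, 0.8, 5.0, 20000, rng)
```

With these inputs the marginal default probabilities are the same in both runs; only the copula parameter changes, and the estimated joint default probability is markedly higher under positive correlation.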

The second approach was introduced by Schönbucher and Schubert (2001), and here we shall call it the Schönbucher-Schubert (SS) approach. In the algorithm to draw a default time in the case of a single firm, we simulated a realization $u_i$ of a uniform $[0,1]$ random variable $U_i$ independent of $(\mathcal{G}_{X,t})$, and defined the time of default of firm $i$ as $\tau_i$ such that

$$\exp\left(-\int_0^{\tau_i} \lambda_i(s)\, ds\right) = u_i \tag{75}$$

where $\lambda_i$ is the default intensity process of firm $i$. The idea of the SS approach is to link the default thresholds $U_1, \ldots, U_I$ with a copula.

Schönbucher and Schubert consider that the processes $\lambda_1, \ldots, \lambda_I$ are $(\mathcal{F}_{i,t})$-adapted 42 and call them pseudo default intensities. Thus, $\lambda_i$ is the default intensity if investors only consider the information generated by the background filtration $(\mathcal{G}_{X,t})$ and by the default status of firm $i$, $(\mathcal{G}_{i,t})$. However, investors are not restricted to the information represented by $(\mathcal{F}_{i,t})$, as they also observe the default status of the rest of the firms. Therefore, $\lambda_i$ is not the intensity of default with respect to all the information investors have available, as represented by $(\mathcal{F}_t)$, but rather with respect to a smaller information set.

To calculate the default (or survival) probabilities conditional on all the information that investors have available, $(\mathcal{F}_t)$, we cannot define those probabilities in terms of the pseudo default intensities $\lambda_1, \ldots, \lambda_I$. We have to find the "real" intensities implied by the investors' information set. The difference between pseudo and real intensities lies in the fact that real intensities, in addition to all the information considered by pseudo intensities, include information about the default status of all firms. The default thresholds' copula function includes this information in the SS approach.

In order to find the "real" default intensities $h_1, \ldots, h_I$, which are $(\mathcal{F}_t)$-adapted, we need to combine both the pseudo default intensities and the copula function, which links the default thresholds. The pseudo default intensity $\lambda_i$ includes information about the state variables and the default situation of firm $i$, and only coincides with the "real" default intensity $h_i$ in cases of independent default or when the information of the market is restricted to $(\mathcal{F}_{i,t})$. 43

The simulation of the default times in this ap¬ 
proach is exactly the same as in Li's approach. 
The only difference with the SS approach is that 
it allows us to recover the dynamics of the "real" 
default intensities $h_1, \ldots, h_I$, which include the
default contagion effects implicit in the default 
threshold copula. In contrast to the models of 
Jarrow and Yu (2001) and Davis and Lo (1999), 
the SS approach allows the contagion effects 
to arise endogenously through the use of the 
copula. 

Schönbucher (2003) calls the SS approach a dynamic approach in the sense that it considers the dynamics of the "real" default intensities $h_1, \ldots, h_I$, as opposed to Li's approach, which only considers the dynamics of the pseudo default intensities.

As Schönbucher and Schubert (2001) point
out, this setup is very general, and the reader 
has freedom to choose the specification of the 
default intensities. We can introduce default 
correlation by both correlating the default in¬ 
tensities, for example with a CID model, and 
by using any of the copula approaches we have 
just presented. 

In an extension of the SS approach, Rogge and Schönbucher (2003) propose not to use the normal or t-copulas but Archimedean copulas, arguing that normal and t-copulas do not imply a realistic dynamic process for default intensities.

Galiani (2003) provides a detailed analysis of the use of copula functions to price multiname credit derivatives using both a normal and a Student t copula.

Choosing and Calibrating the Copula 

Once we have reviewed how to use copula the¬ 
ory in the context of joint default probabilities, 
we have to choose a copula and estimate its pa¬ 
rameters. In order to choose a copula we should 
consider aspects such as the dependence struc¬ 
ture each copula involves as well as the number 
of parameters we need to estimate. 

Since the normal copula presents neither lower nor upper tail dependence, the use of multivariate normal distributions to model default (or price) behavior has been strongly criticized for not assigning enough probability to the occurrence of extreme events and, among them, the periods of default clustering. The use of the t-copula is the natural answer to the lack of tail dependence, since, subject to the degrees of freedom and covariance matrix, this copula exhibits tail dependence. The main problem in using a normal or t-copula is the number of parameters we have to estimate, which grows with the dimensionality of the copula. 44

Archimedean copulas are especially attrac¬ 
tive because there exists a large number of 
one-parameter Archimedean copulas 45 which 
allows for a great variety of dependence 
structures. The disadvantage of Archimedean 
copulas is that they may impose too much 
dependence structure in the sense that, as they 
are interchangeable copulas, the dependence 
between any group of firms is the same inde¬ 
pendently of the firms we consider. 

If we decide to use an Archimedean copula, Genest and Rivest (1993) propose a procedure for identifying the Archimedean copula that best fits the data. 46 The problem is that they consider only the bivariate case and that, as we shall see later, we need a sample of the marginal random variables (the random variables $X_1, \ldots, X_I$ whose marginal distributions we link to the copula function) that is available if we are modeling equity returns, but not if we are modeling default times. More generally, Fermanian and Scaillet (2004) discuss the issue of choosing the copula that best fits a given data set, using goodness-of-fit tests.

According to Durrleman, Nikeghbali, and Roncalli (2000):

There does not exist a systematic rigorous method 
for the choice of the copula: nothing can tell us that 
the selected family of copula will converge to the 
real structure dependence underlying the data. This 
can provide biased results since according to the 
dependence structure selected the obtained results 
might be different. 

Jouanin et al. (2001) use the term model risk 
to denote this uncertainty in the choice of the 
copula. 

Assuming we manage to select a copula 
function, we now face the estimation of its pa¬ 
rameters. The main problem of the use of cop¬ 
ula theory to model credit risk is the scarcity 
of default data from which to calibrate the 
copula. 

We cannot rely on multiname credit derivatives, such as $i^{th}$-to-default products, to calibrate the copula because, in most cases, they are not publicly traded and also because of their lack of liquidity.

Imagine that, instead of fitting a copula to default times, we are fitting a copula to daily stock returns for $I$ different firms. Let $Y_1, \ldots, Y_I$ be random variables denoting the daily returns of firms $i = 1, \ldots, I$ with marginal distribution functions $F_1, \ldots, F_I$ and joint distribution function $F$. Sklar's theorem proves that there exists an $I$-dimensional copula $C$ such that $F(y_1, \ldots, y_I) = C(F_1(y_1), \ldots, F_I(y_I))$ for all $(y_1, \ldots, y_I)$ in $\mathbb{R}^I$. In this case, we have available, for each day, a sample of the random vector $Y_1, \ldots, Y_I$ that we can use to estimate the parameters of the copula. We would have to estimate the parameters of the marginal distribution functions $F_1, \ldots, F_I$ and then estimate the parameters of the copula. Since, in our application to default times, we already have the marginal distributions, determined by the specification of the marginal default intensities, we are left with the estimation of the copula parameters. Provided we have a large sample of the random variables $Y_1, \ldots, Y_I$, we can estimate the copula parameters in several ways. 47

If the copula is differentiable we can al¬ 
ways use maximum likelihood to estimate the 
parameters. 48 De Matteis (2001) mentions that 
this parametric method may be convenient 
when we work with a large data set, but in 
case there are outliers or if the marginal distri¬ 
butions are heavy tailed, a nonparametric ap¬ 
proach may be more suitable. 

A nonparametric approach would involve the 
use of the sample version of a dependence mea¬ 
sure, such as Kendall's tau or Spearman's rho 
(or both), 49 to calibrate the copula parameters. 
However, this nonparametric approach is re¬ 
stricted to the bivariate case, and we would 
need to have at least the same sample depen¬ 
dence measures as copula parameters. 50 
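
The nonparametric calibration just described can be illustrated with a short Python sketch. It is only an illustration under our own assumptions: the bivariate case, two one-parameter families chosen for convenience (the Gaussian copula, for which Kendall's tau equals $(2/\pi)\arcsin\rho$, and the Clayton copula, for which it equals $\theta/(\theta+2)$), and simulated returns standing in for real equity data. The function name `calibrate_from_tau` and all numerical inputs are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau


def calibrate_from_tau(y1, y2):
    """Invert the sample Kendall's tau into one-parameter copula parameters."""
    tau, _ = kendalltau(y1, y2)
    # Gaussian copula: tau = (2 / pi) * arcsin(rho)  =>  rho = sin(pi * tau / 2)
    rho_gaussian = np.sin(np.pi * tau / 2.0)
    # Clayton copula: tau = theta / (theta + 2)  =>  theta = 2 * tau / (1 - tau),
    # which is only meaningful for positive dependence (0 < tau < 1)
    theta_clayton = 2.0 * tau / (1.0 - tau) if 0.0 < tau < 1.0 else float("nan")
    return tau, rho_gaussian, theta_clayton


# Hypothetical data: 5,000 days of returns for two firms drawn from a
# bivariate normal distribution with correlation rho_true.
rng = np.random.default_rng(0)
rho_true = 0.6
returns = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, rho_true], [rho_true, 1.0]], size=5000
)
tau_hat, rho_hat, theta_hat = calibrate_from_tau(returns[:, 0], returns[:, 1])
print(f"tau = {tau_hat:.3f}, Gaussian rho = {rho_hat:.3f}, Clayton theta = {theta_hat:.3f}")
```

Because each family here has a single parameter, one dependence measure is enough; a family with more parameters would need as many sample dependence measures as parameters, as noted above.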

The estimation methods described above rely
on the availability of a large sample of the random
variables $Y_1, \dots, Y_I$. However, this is not
the case when we work with default times. We
do not have available a large sample of default
times for the $I$ firms. In fact, we do not have a
single realization of the default times random
vector.

One solution is to assume that the 
marginal default (survival) probabilities and 
the marginal distributions of the equity returns 
share the same copula, that is, share the same 
dependence structure, and use equity returns to 
estimate the copula parameters. But this short¬ 
cut has its own drawbacks. We need to fit a cop¬ 
ula to a set of given marginal distributions for 
the default (survival) times, which are charac¬ 
terized by a default intensity for each firm. Ide¬ 
ally we should estimate the parameters of the 
copula function using default times data. However,
we rarely have enough default times data
available to estimate the parameters of the copula
function properly. In those cases,
we must rely on other data sources to calibrate 
the copula function. For example, a usual prac¬ 
tice is to calibrate the copula using equity data 
of the different firms. However, the dependence 
of the firms' default probabilities will probably 
differ from the dependence in the evolution of 
their equity prices. 

Another way of dealing with the estimation of 
the copula parameters is, as Jouanin et al. (2001) 
propose, through the use of "original meth¬ 
ods that are based on the practice of the credit 
market rather than mimicking statistical meth¬ 
ods that are never used by practitioners." They 
suggest a method based on Moody's diversity 
score. 51 The diversity score or binomial expansion
technique consists of transforming a portfolio
of (credit-dependent) defaultable bonds into
an equivalent portfolio of uncorrelated and homogeneous
credits assumed to mimic the default
behavior of the original portfolio, using
the so-called diversity score parameter, which 
depends on the degree of diversification of the 
original portfolio. We then match the first two 
moments of the number of defaults within a 
fixed time horizon for both the original and the 
transformed portfolio. Since the original port¬ 
folio assumes default dependence, the distribu¬ 
tion of the number of defaults will depend on 
the copula parameters. In the transformed port¬ 
folio, that is, independent defaults, the number 
of defaults follows a binomial distribution with 
some probability p. Matching the first two mo¬ 
ments of the number of defaults in both port¬ 
folios, we would extract an estimation for the 
probability p and for the copula parameters. 52 
However, Moody's diversity score approach 
has its own drawbacks. Among others, it is a 
static model with a fixed time horizon, that is, it 
does not consider when defaults take place but 
only the number of defaults within the fixed 
time horizon. In fact, the Committee on the 
Global Financial System (Bank for International 
Settlements) suggests, in its last report, 53 that
diversity scores "are a fairly crude measure of
the degree of diversification in a portfolio of
credits." 
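
The moment-matching idea can be sketched in Python under strong simplifying assumptions of our own, which are not necessarily those of Jouanin et al. (2001): the original portfolio is homogeneous (n credits of equal size and equal default probability p over the horizon), the comparison portfolio holds D independent credits of size n/D each, and dependence comes from a Gaussian copula with a single common correlation. With p taken as given, the means match automatically, and matching variances gives a pairwise correlation between default indicators of (n/D - 1)/(n - 1). The copula correlation is then found by root finding. All function names and numerical inputs are hypothetical.

```python
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm


def implied_indicator_correlation(n_credits, diversity_score):
    """Default-indicator correlation implied by matching variances:
    n p (1 - p) [1 + (n - 1) rho] = n^2 p (1 - p) / D."""
    return (n_credits / diversity_score - 1.0) / (n_credits - 1.0)


def indicator_correlation(p, rho_copula):
    """Correlation of two default indicators under a Gaussian copula."""
    k = norm.ppf(p)  # common default threshold on the latent normal scale
    joint = multivariate_normal.cdf(
        [k, k], mean=[0.0, 0.0], cov=[[1.0, rho_copula], [rho_copula, 1.0]]
    )
    return (joint - p * p) / (p * (1.0 - p))


def calibrate_gaussian_copula(p, n_credits, diversity_score):
    """Copula correlation whose indicator correlation matches the target."""
    target = implied_indicator_correlation(n_credits, diversity_score)
    return brentq(lambda r: indicator_correlation(p, r) - target, 0.0, 0.999)


# Hypothetical inputs: 50 credits, diversity score of 20, 5% default probability
p, n_credits, diversity_score = 0.05, 50, 20
target = implied_indicator_correlation(n_credits, diversity_score)
rho = calibrate_gaussian_copula(p, n_credits, diversity_score)
print(f"indicator correlation = {target:.4f}, copula correlation = {rho:.4f}")
```

The sketch inherits the limitation discussed above: it only uses the number of defaults within a fixed horizon and says nothing about their timing.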

Similarly to the choice of the copula function, 
there does not exist a rigorous method to esti¬ 
mate the parameters of the copula. We can talk 
about parameter risk which, together with the 
model risk mentioned earlier, are the principal 
problems we face if we use the copula approach 
in the modeling of dependent defaults. 


KEY POINTS 

• There are two primary types of models in the 
literature that attempt to describe default pro¬ 
cesses: structural and reduced-form models. 
Intensity models represent the most extended 
type of reduced-form models. In contrast to 
structural models, the time of default in inten¬ 
sity models is not determined via the value of 
the firm, but it is the first jump of an exoge¬ 
nously given jump process. The fundamental 
idea of the intensity-based framework is to 
model the default time as the first jump of a 
Poisson process. The default intensity of the 
Poisson process, also referred to as the hazard 
rate, can be deterministic (constant or time 
dependent) or stochastic. 

• We review three different ways of introducing 
default correlations among firms in the frame¬ 
work of intensity models: the conditionally 
independent defaults (CID) approach, conta¬ 
gion models, and copula functions. 

• CID models generate credit risk dependence 
among firms through the dependence of the 
firms' intensity processes on a common set 
of state variables. Firms' default rates are in¬ 
dependent once we fix the realization of the 
state variables. Different CID models differ in 
their choices of the state variables and the pro¬ 
cesses they follow. Extensions of CID mod¬ 
els introduce joint jumps in the firms' default 
processes or common default events. 

• Contagion models extend the CID approach
to account for the empirical observation of default
clustering (periods in which firms' credit
risk increases simultaneously and in which 
the majority of defaults take place). They are 
based on the idea that, when a firm defaults, 
the default intensities of related firms jump 
(upwards), that is, the default of one firm in¬ 
creases the default probabilities of other firms 
(to the point of potentially causing the de¬ 
fault of some of them). These models include, 
on the specification of default intensities, the 
existence of contagion sources among firms, 
which can be explained by either their com¬ 
mercial / financial relationships or simply by 
their common exposure to the economy. 

• In CID and contagion models the specifi¬
cation of the individual intensities includes 
all the default dependence structure between 
firms. In contrast, the copula approach sepa¬ 
rates individual default probabilities from the 
credit risk dependence structure. 

• A copula is a function that links univari¬
ate marginal distributions to the joint mul¬ 
tivariate distribution function. The copula 
approach takes as given the marginal default 
probabilities of the different firms and plugs 
them into a copula function, which provides 
the model with the dependence structure to 
generate joint default probabilities. This ap¬ 
proach separates the modeling and estima¬ 
tion of the individual default probabilities, 
determined by the default intensity processes, 
from the modeling and calibration or estima¬ 
tion of the device that introduces the credit 
risk dependence, the copula. 

NOTES 

1. Brody, Hughston, and Macrina (2007) 
present an alternative reduced-form model 
based on the amount and precision of the in¬ 
formation received by market participants 
about the firm's credit risk. Such a model 
does not require the use of default in¬ 
tensities; it belongs to the reduced-form 
approach because (like intensity models) 
it relies on market prices of defaultable
instruments as the only source of informa¬
tion about the firms' credit risk. 

2. An event is a subset of $\Omega$, namely a collection
of possible outcomes. $\sigma$-algebras
are models for information, and filtrations
are models for flows of information.

3. For some applications we may wish to 
enlarge this category of probability mea¬ 
sures by relaxing the martingale condi¬ 
tion to a local martingale condition, though 
this point does not concern us in what 
follows. 

4. Given the filtered probability space, an
$(\mathcal{F}_t)$-adapted process $X_t$ is a Markov process
with respect to $(\mathcal{F}_t)$ if

$E[f(X_t) \mid \mathcal{F}_s] = E[f(X_t) \mid X_s]$

with probability one, for all $0 \le s \le t$, and
for every bounded function $f$. This means
that the conditional distribution at time $s$ of
$X_t$, given all available information, depends
only on the current state $X_s$.

5. For a more detailed exposition see Lando 
(1994) and Chapter 5 in Schonbucher (2003). 

6. This specification of the recovery rate incorporates
all possible ways of dealing with
recovery payments considered in the literature.
Here we consider a continuous
version of the recovery rate, that is, $R_t$ is
measured and received precisely at the default
time. In the discrete version of the recovery
rate, $R_t$ is measured and received on
the first date after default among a prespecified
list $T_1 < \dots < T_n$ of times, where $T_n$ is
the maturity date $T$.

7. See Hughston and Turnbull (2001).

8. See proof on Lando (1994, Proposition 3.1) 
and Bielecki and Rutkowski (2002, Proposi¬ 
tion 8.2). 

9. For an extensive review of the treatment of 
recovery rates see Chapter 6 in Schonbucher 
(2003). 

10. Houweling and Vorst (2001) consider the
RFV specification for pricing credit default 
swaps. 


11. See Duffie and Singleton (1999a) and Jarrow 
(1999). 

12. There exist some empirical works that, under
some specifications of $\lambda_t$ and $r_t$, find
that the value of the recovery rate does not
substantially affect the results, as long as
the recovery rate lies within a logical interval.
See, for instance, Houweling and Vorst
(2001) and Elizalde (2005a).

13. See Houweling and Vorst (2001) and
Elizalde (2005a) for a comparison of dif¬ 
ferent specifications of (deterministic) time- 
dependent intensity rates. 

14. For a detailed description of affine pro¬ 
cesses see Duffie and Kan (1996), Duffie 
(1998), Duffie, Pan, and Singleton (2000), 
Duffie, Filipovic, and Schachermayer 
(2002), and Appendix A in Duffie and 
Singleton (2003). An affine jump-diffusion 
process is a jump-diffusion process for 
which the drift vector, instantaneous 
covariance matrix, and jump intensities all 
have affine dependence on the state vector. 
If $X_t$ is a Markov process in some state
space $D \subset \mathbb{R}^d$, $X_t$ is an affine jump-diffusion
if it can be expressed as

$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t + dq_t$

where $W_t$ is an $(\mathcal{F}_t)$-Brownian motion in $\mathbb{R}^d$,
$\mu : D \to \mathbb{R}^d$, $\sigma : D \to \mathbb{R}^{d \times d}$, and $q$ is a pure
jump process whose jumps have a fixed
probability distribution $\nu$ on $\mathbb{R}^d$ and arrive
with intensity $\{\kappa(X_t) : t \ge 0\}$, for some
$\kappa : D \to [0, \infty)$. That is, the drift
vector $\mu$, instantaneous covariance matrix
$\sigma\sigma'$, and jump intensities $\kappa$ all have affine
dependence on the state vector $X_t$. Intuitively
this means that, conditional on the
path of $X_t$, the jump times of $q$ are the
jump times of a Poisson process with
time-varying intensity $\{\kappa(X_t) : t \ge 0\}$, and that
the size of the jump at time $T$ is independent
of $\{X_s : 0 \le s < T\}$ and has the probability
distribution $\nu$.

15. See Duffie and Kan (1996) and Duffie, Pan, 
and Singleton (2000).

16. For the basic affine model, the coefficients
can be calculated explicitly. See Duffie and 
Garleanu (2001), Appendix A in Duffie and 
Singleton (2003), and Duffie, Pan, and Sin¬ 
gleton (2000) for details and extensions. 

17. Duffie, Pan, and Singleton (2000) developed
a similar closed-form expression to the second
term of the price of a defaultable zero-coupon
bond $Q(t, T)$,

$E\left[\,\dots\,ds \mid \mathcal{F}_t\right]$

Using their expression, the pricing of defaultable
zero-coupon bonds with constant
recovery of face value reduces to the computation
of a one-dimensional integral of a
known function.

18. Several versions of the modeling of $r_t$ and
$\lambda_t$ in this framework can be found in
Duffie, Schroder, and Skiadas (1996), Duffee
(1999), Duffie and Singleton (1999, 2003),
Kijima (2000), Kijima and Muromachi
(2000), Duffie and Garleanu (2001), Bielecki
and Rutkowski (2002), and Schonbucher
(2003). For the estimation of an affine process
intensity model without jumps see
Duffee (1998) and Duffie, Pedersen, and Singleton
(2003).

19. If $Y$ is an $n$-dimensional random vector and,
for some $\mu \in \mathbb{R}^n$ and some $n \times n$ nonnegative
definite, symmetric matrix $\Sigma$, the characteristic
function $\psi_{Y-\mu}(t)$ of $Y - \mu$ is a function
of the quadratic form $t^\top \Sigma t$, $\psi_{Y-\mu}(t) =
\phi(t^\top \Sigma t)$, we say that $Y$ has an elliptical
distribution with parameters $\mu$, $\Sigma$, and $\phi$.
For example, normal and Student t distribu¬ 
tions are elliptical distributions. For a more 
detailed treatment of elliptical distributions 
see Bingham and Kiesel (2002) and refer¬ 
ences cited therein. 

20. See Embrechts, McNeil, and Straumann
(1999) and Embrechts, Lindskog, and 
McNeil (2001). 

21. We can always consider a model such as

$d\lambda_{i,t} = \kappa_i(\theta_i - \lambda_{i,t})\,dt + \sigma_i \sqrt{\lambda_{i,t}}\,dW_{i,t} + dq_{i,t}$

for each firm $i$, and introduce correlation
via the Brownian motions $W_{1,t}, \dots, W_{I,t}$.

22. Duffee (1999), Driessen (2005), and Elizalde 
(2005b) use latent variables instead of state 
variables. Collin-Dufresne, Goldstein, and 
Martin (2001) show that financial and eco¬ 
nomic variables cannot explain the correla¬ 
tion structure of intensity processes. Latent 
factors are modeled as affine diffusions and 
estimated through a maximum likelihood 
procedure based on the Kalman filter. 

23. While Driessen (2005) considers that all 
firms with the same rating are affected in 
the same way by common factors, Elizalde 
(2005b) allows for the effect of each common 
factor to differ across firms, which increases 
the flexibility of the credit risk correlation 
structure. 

24. See also Kijima (2000), Kijima and Muro¬ 
machi (2000), and Giesecke (2002b). 

25. This is a basic affine process with parameters
$(\kappa_i, \theta_i, \sigma_i = 0, \mu, \gamma_i)$.

26. See Embrechts, Lindskog, and McNeil 
(2001) and Giesecke (2002b). 

27. Extending the diversity score (or binomial 
expansion technique) of Moody's. 

28. This dynamic version is introduced in Davis 
and Lo (2001). 

29. See Yu (2002a) and Frey and Backhaus 
(2003) for an extension of the Jarrow and 
Yu (2001) model. 

30. See Sklar (1959) and Frees and Valdez 
(1998). 

31. Note that $F_i(t_i) = F(\infty, \dots, t_i, \dots, \infty)$ and
$S_i(t_i) = S(0, \dots, t_i, \dots, 0)$.

32. For a more detailed description of cop¬ 
ula theory see Joe (1997), Frees and Valdez 
(1998), Nelsen (1999), Costinot, Roncalli, 
and Teiletche (2000), Embrechts, Lindskog, 
and McNeil (2001), De Matteis (2001), and 
Georges et al. (2001). 

33. A more formal definition would be the following
(Frey, McNeil, and Nyfeler 2001):
An $I$-dimensional copula $C$ is a function
$C : [0, 1]^I \to [0, 1]$ with the following properties:

• Grounded: For all $u \in [0, 1]^I$, $C(u) = 0$
if at least one coordinate $u_j = 0$, $j = 1, \dots, I$.

• Reflective: If all coordinates of $u$ are 1 except
$u_j$ then $C(u) = u_j$, $j = 1, \dots, I$.

• $I$-increasing: The $C$-volume of all hypercubes
with vertices in $[0, 1]^I$ is positive,
i.e.

$\sum_{i_1=1}^{2} \cdots \sum_{i_I=1}^{2} (-1)^{i_1 + \cdots + i_I}\, C(u_{1,i_1}, \dots, u_{I,i_I}) \ge 0$

for all $(u_{1,1}, \dots, u_{I,1})$ and $(u_{1,2}, \dots, u_{I,2})$ in
$[0, 1]^I$ with $u_{j,1} \le u_{j,2}$ for all $j = 1, \dots, I$.

34. We will use C d , or simply C, to denote the 
copula function of default times and C s for 
the copula function of survival times. 

35. See Georges et al. (2001) for a complete char¬ 
acterization of the relation between default 
and survival copulas. 

36. $C^+$ is always a copula, but $C^-$ is a copula
only for $I = 2$.

37. A simple interpretation of Spearman's rho
is the following. Let $(Y_1, Y_2)$ be a random
vector of continuous random variables
with joint distribution
function $H$ (whose margins are $F_1$ and $F_2$)
and copula $C$, and consider the random
variables $U = F_1(Y_1)$ and $V = F_2(Y_2)$. Then,
we can write the Spearman's rho coefficient
of $(Y_1, Y_2)$ as

$\rho_S(Y_1, Y_2) = 12 \iint C(u, v)\,du\,dv - 3 = 12\,E[UV] - 3$

$= \frac{E[UV] - \frac{1}{4}}{\frac{1}{12}} = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}}$

$= \rho(U, V) = \rho(F_1(Y_1), F_2(Y_2))$

where $\rho$ denotes the Pearson or linear correlation
coefficient. So the Spearman's rho of
the vector $(Y_1, Y_2)$ is the Pearson correlation
of the random variables $F_1(Y_1)$ and $F_2(Y_2)$.

38. Since $P[Y_1 > F_1^{-1}(u) \mid Y_2 > F_2^{-1}(u)]$ can be
written as

$\frac{1 - P[Y_1 \le F_1^{-1}(u)] - P[Y_2 \le F_2^{-1}(u)] + P[Y_1 \le F_1^{-1}(u),\, Y_2 \le F_2^{-1}(u)]}{1 - P[Y_2 \le F_2^{-1}(u)]}$

we can express $\lambda_U$ as

$\lambda_U = \lim_{u \nearrow 1} \frac{1 + C(u, u) - 2u}{1 - u}$

39. See Embrechts, Lindskog, and McNeil
(2001) and Nelsen (1999) for a more detailed
description.

40. For a multivariate version of the Marshall-Olkin
copula, see Embrechts, Lindskog, and
McNeil (2001).

41. Li (1999) considers a copula that links individual
survival probabilities to model the
joint survival probability. However, as we
have explained previously, this can be done
exactly in the same way if we consider default
probabilities instead of survival probabilities.

42. Remember that $(\mathcal{F}_{i,t}) = (\mathcal{G}_{X,t}) \vee (\mathcal{G}_{i,t})$ is the
information generated by the state variables
plus the information generated by the default
status of firm $i$.

43. This distinction between pseudo and real
default intensities can also be found in
Gregory and Laurent (2002).

44. If we are considering $I$ firms, the number
of parameters of the normal copula will
be $I(I-1)/2$, and we have to add the degrees
of freedom parameter in the case of the
$t$-copula.

45. See Nelsen (1999).

46. See also De Matteis (2001) and Frees and
Valdez (1998).

47. For a more detailed description of copula
parameters estimation see Frees and
Valdez (1998), Bouye et al. (2000), Durrleman,
Nikeghbali, and Roncalli (2000), De
Matteis (2001), and Patton (2002).

48. We have to distinguish the case in which
we estimate the parameters of the marginal
distributions and the copula function all
together from the case in which we first
estimate the parameters of the marginal
distributions and then, using those parameters,
we estimate the parameters of the
copula function. The latter approach is
called inference functions for margins or the
IFM method.

49. Imagine we have $N$ random samples of a
bivariate vector $(Y_1, Y_2)$, and let us denote them
by $(y_1^n, y_2^n)$, $n = 1, \dots, N$. The sample estimators
of Kendall's tau ($\hat{\tau}$) and Spearman's
rho ($\hat{\rho}_S$) are given by:

$\hat{\rho}_S = \frac{12}{N(N^2 - 1)} \sum_{n=1}^{N} \left( \mathrm{rank}(y_1^n) - \frac{N + 1}{2} \right) \left( \mathrm{rank}(y_2^n) - \frac{N + 1}{2} \right)$

$\hat{\tau} = \frac{c - d}{c + d} = \frac{2}{N(N - 1)} \sum_{n < m} \mathrm{sign}\left[ (y_1^n - y_1^m)(y_2^n - y_2^m) \right]$

where $c$ and $d$ are the number of concordant
and discordant pairs, respectively.

50. In some cases analytical expressions for 
the dependence measures are available. 
Otherwise we have to use a root-finding 
procedure. 

51. For a detailed description of the diversity 
score method see Cifuentes, Murphy, and 
O'Connor (1996), Cifuentes and O'Connor 
(1996), and Cifuentes and Wilcox (1998). 

52. See Jouanin et al. (2001). 

53. See Committee on the Global Financial 
System (2003).

Abstract: Structural models and reduced-form models are the two primary types of credit risk
models that seek to statistically describe default processes. Structural models use the evolution of
firms' structural variables, such as asset and debt values, to model the time of default. In contrast,
reduced-form models do not consider structural variables in an explicit manner when modeling
default processes; instead, they model default as an exogenously driven process. Structural models
include first passage models, liquidation process models, and state-dependent models.


In this entry we review the structural approach 
for credit risk modeling, both considering the 
case of a single firm and the case with de¬ 
fault dependencies between firms. In the sin¬ 
gle firm case, we review the Merton (1974) 
model and first passage models, examining their 
main characteristics and extensions. Liquida¬ 
tion process models extend first passage mod¬ 
els to account for the possibility of a lengthy 
liquidation process, which might or might not 
end up in default. We then review structural
models with state-dependent cash flows (recession
vs. expansion) or debt coupons (rating-based).
The estimation of structural models is
also addressed in this entry, covering the different
approaches proposed in the literature. Finally,
we present some approaches to model default 
dependencies between firms within the struc¬ 
tural approach. These approaches account for 
two types of default correlations: cyclical default 
correlation and contagion effects. 


REVIEW OF STRUCTURAL 
MODELS 

Structural models use the evolution of firms' 
structural variables, such as asset and debt val¬ 
ues, to determine the time of default. Merton's 
model (1974) was the first modern model of 
default and is considered the first structural 
model. In Merton's model, a firm defaults if, 
at the time of servicing the debt, its assets 
are below its outstanding debt. A second ap¬ 
proach, within the structural framework, was 
introduced by Black and Cox (1976). In this ap¬ 
proach defaults occur as soon as the firm's asset 
value falls below a certain threshold. In contrast 
to the Merton approach, default can occur at any 
time. 

Reduced-form models do not consider the relation
between default and firm value in an explicit
manner. In contrast to structural models,
the time of default in intensity models is not
determined via the value of the firm, but it is
the first jump of an exogenously given jump
process. The parameters governing the default
hazard rate are inferred from market data. 1

Structural default models provide a link be¬ 
tween the credit quality of a firm and the 
firm's economic and financial conditions. Thus, 
defaults are endogenously generated within
the model instead of exogenously given as
in the reduced-form approach. Another difference
between the two approaches refers to the
treatment of recovery rates: Whereas reduced-form
models exogenously specify recovery rates, in
structural models the value of the firm's assets 
and liabilities at default will determine recov¬ 
ery rates. 

The structural literature on credit risk starts 
with the paper by Merton (1974), who applies 
the option pricing theory developed by Black 
and Scholes (1973) to the modeling of a firm's 
debt. In Merton's model, the firm's capital structure
is assumed to be composed of equity and
a zero-coupon bond with maturity T and face 
value of D. The firm's equity is simply a Eu¬ 
ropean call option with maturity T and strike 
price D on the asset value and, therefore, the 
firm's debt value is just the asset value minus 
the equity value. This approach assumes a very 
simple and unrealistic capital structure and im¬ 
plies that default can only happen at the matu¬ 
rity of the zero-coupon bond. 

Black and Cox (1976) introduced the first of
the so-called first passage models (FPM). First passage
models specify default as the first time the
firm's asset value hits a lower barrier, allowing 
default to take place at any time. When the de¬ 
fault barrier is exogenously fixed, as in Black 
and Cox (1976) and Longstaff and Schwartz 
(1995), it acts as a safety covenant to protect 
bondholders. Alternatively it can be endoge¬ 
nously fixed as a result of the stockholders' 
attempt to choose the default threshold that 
maximizes the value of the firm. 2 

Structural models have considered interest 
rates both as nonstochastic processes 3 and as 
stochastic processes. 4,5 


In first passage models, by definition, default 
occurs the first time the asset value goes below 
a certain lower threshold, that is, the firm is liq¬ 
uidated immediately after the default event. In 
contrast with first passage models, a new set of 
models has been put forward, supported by re¬ 
cent theoretical and empirical research, where 
a default event does not immediately cause liq¬ 
uidation but it represents the beginning of a 
process, the liquidation process, which might 
or might not cause liquidation after it is com¬ 
pleted. This practice is consistent, for example, 
with Chapter 11 of the U.S. Bankruptcy Law, 
where firms filing for bankruptcy are granted 
a court-supervised grace period (up to several 
years) aimed at sorting out their financial prob¬ 
lems in order to, if possible, avoid liquidation. 
We label those models liquidation process models 
(LPM). 

State dependent models (SDM) represent, to¬ 
gether with LPM, two recent efforts to incor¬ 
porate into structural models different real-life 
phenomena. Although theoretically they make 
good sense, they lack empirical research test¬ 
ing their performance. SDM assume that some 
of the parameters governing the firm's ability 
to generate cash flows or its funding costs are 
state dependent, where states can represent the 
business cycle (recession vs. expansion) or the 
firm's external rating. 

After the single firm case, we review some 
structural models for default correlations, in 
order to account for both cyclical default 
correlation 6 as well as credit risk contagion 
effects. 7 We will finish the default correlation 
section mentioning the so-called factor models. 8 

We concentrate on the review of the dynam¬ 
ics of the processes that generate the default 
times, without paying attention to the valua¬ 
tion formulas for defaultable bonds that each 
model generates. The aim of this entry is to 
serve as an introduction and guide to the litera¬ 
ture of structural credit risk models. We provide 
an extensive list of references for each model 
specification and possible extensions or related 
papers.

SINGLE FIRM

We denote the physical and risk-neutral prob¬ 
ability measures as P and P respectively, 
and assume an arbitrage-free market. 9 Unless 
otherwise stated, all probabilities and expec¬ 
tations are taken under the risk-neutral mea¬ 
sure. The model for the default-free term 
structure of interest rates is given by a short- 
rate process r t . 

Merton's Model 

Merton (1974) makes use of the Black and 
Scholes (1973) option pricing model to value 
corporate liabilities. This is a straightforward 
application only if we adapt the firm's capital 
structure and the default assumptions to the 
requirements of the Black-Scholes model. Let 
us assume that the capital structure of the firm
is composed of equity and a zero-coupon
bond with maturity $T$ and face value of $D$,
whose values at time $t$ are denoted by $E_t$ and
$z(t, T)$ respectively, for $0 \le t \le T$. The firm's asset
value $V_t$ is simply the sum of equity and
debt values. Under these assumptions, equity
represents a call option on the firm's assets with
maturity $T$ and strike price of $D$. If at maturity $T$
the firm's asset value $V_T$ is enough to pay back
the face value of the debt $D$, the firm does not
default and shareholders receive $V_T - D$. Otherwise
($V_T < D$) the firm defaults, bondholders
take control of the firm, and shareholders receive
nothing. Implicit in this argument is the
fact that the firm can only default at time T. This 
assumption is important to be able to treat the 
firm's equity as a vanilla European call option, 
and therefore apply the Black-Scholes pricing 
formula. 

The rest of the assumptions Merton (1974)
adopts are the absence of transaction costs,
bankruptcy costs, taxes, or problems with indivisibilities
of assets; continuous time trading;
unrestricted borrowing and lending at a constant
interest rate $r$; no restrictions on the short
selling of the assets; the value of the firm is
invariant under changes in its capital structure
(Modigliani-Miller theorem); and that the firm's
asset value follows a diffusion process.

The firm's asset value is assumed to follow a
diffusion process given by

$dV_t = r V_t\,dt + \sigma_V V_t\,dW_t$ (1)

where $\sigma_V$ is the (relative) asset volatility and $W_t$
is a Brownian motion. 10

The payoffs to equityholders and bondholders
at time $T$ under the assumptions of this
model are, respectively, $\max\{V_T - D, 0\}$ and
$V_T - E_T$, that is,

$E_T = \max\{V_T - D, 0\}$ (2)

$z(T, T) = V_T - E_T$ (3)


Applying the Black-Scholes pricing formula,
the value of equity at time $t$ ($0 \le t \le T$) is given
by

$E_t(V_t, \sigma_V, T - t) = e^{-r(T - t)} \left[ e^{r(T - t)} V_t \Phi(d_1) - D \Phi(d_2) \right]$ (4)

where $\Phi(\cdot)$ is the distribution function of a standard
normal random variable and $d_1$ and $d_2$ are
given by

$d_1 = \frac{\ln\left( \frac{e^{r(T - t)} V_t}{D} \right) + \frac{1}{2} \sigma_V^2 (T - t)}{\sigma_V \sqrt{T - t}}$ (5)

$d_2 = d_1 - \sigma_V \sqrt{T - t}$ (6)

The probability of default at time $T$ is given by

$P[V_T < D] = \Phi(-d_2)$ (7)

Therefore, the value of the debt at time $t$ is
$z(t, T) = V_t - E_t$.
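
As a minimal numerical sketch of the Merton (1974) valuation, the Python function below computes the equity value, the debt value, and the risk-neutral default probability for given inputs. The function name `merton_values` and the numerical inputs are our own illustrative assumptions, and $d_1$ is written in the usual Black-Scholes form, which is algebraically the same as placing the growth factor $e^{r(T-t)}$ inside the logarithm.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def merton_values(V, D, r, sigma_V, tau):
    """Equity value, debt value, and risk-neutral default probability.

    V: current asset value, D: face value of the zero-coupon debt,
    r: constant interest rate, sigma_V: asset volatility,
    tau: time to maturity T - t (must be positive).
    """
    Phi = NormalDist().cdf  # standard normal distribution function
    d1 = (log(V / D) + (r + 0.5 * sigma_V ** 2) * tau) / (sigma_V * sqrt(tau))
    d2 = d1 - sigma_V * sqrt(tau)
    equity = V * Phi(d1) - D * exp(-r * tau) * Phi(d2)  # call on the assets
    debt = V - equity                                   # z(t, T) = V_t - E_t
    default_probability = Phi(-d2)                      # P[V_T < D]
    return equity, debt, default_probability


# Hypothetical inputs: assets of 100, debt face value of 80, one year to maturity
equity, debt, pd_T = merton_values(V=100.0, D=80.0, r=0.05, sigma_V=0.25, tau=1.0)
print(f"equity = {equity:.2f}, debt = {debt:.2f}, default probability = {pd_T:.4f}")
```

In practice the asset value and its volatility are not observed, so these inputs would themselves have to be estimated before the formulas can be applied.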

In order to implement Merton's model we 
have to estimate the firm's asset value $V_t$, its
volatility $\sigma_V$ (both unobservable processes), and
we have to transform the debt structure of the 
firm into a zero-coupon bond with maturity T 
and face value D. 

The maturity T of the zero-coupon bond can 
be chosen either to represent the maturity struc¬ 
ture of the debt, for example as the Macaulay 
duration of all the liabilities, or simply as a re¬ 
quired time horizon (for example, in case we
are pricing a credit derivative with some specific
maturity).

Criticisms and Extensions 
The main advantage of Merton's model is 
that it allows us to directly apply the the¬ 
ory of European options pricing developed by 
Black and Scholes (1973). But to do so the 
model needs to make the necessary assump¬ 
tions to adapt the dynamics of the firm's asset 
value process, interest rates, and capital struc¬ 
ture to the requirements of the Black-Scholes 
model. There is a trade-off between realistic 
assumptions and ease of implementation, and 
Merton's model opts for the latter. All extensions
to this model introduce more realistic
assumptions trying to end up with a model not 
too difficult to implement and with closed, or 
at least numerically feasible, solutions for the 
expressions of the debt value and the default 
probabilities. Merton (1974) presents some ex¬ 
tensions to the model, in order to account for 
coupon bonds, callable bonds, stochastic inter¬ 
est rates, and relaxing the assumption that the 
Modigliani-Miller theorem holds. 

One problem of Merton's model is the restric¬ 
tion of default time to the maturity of the debt, 
ruling out the possibility of an early default, no 
matter what happens with the firm's value be¬ 
fore the maturity of the debt. If the firm's value 
falls down to minimal levels before the maturity 
of the debt but it is able to recover and meet the 
debt's payment at maturity, the default would 
be avoided in Merton's approach. 
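
The practical weight of this restriction can be gauged with a small Monte Carlo sketch in Python. It simulates the asset value under the risk-neutral geometric Brownian motion assumed by the model and compares the frequency of paths ending below the debt face value at maturity with the frequency of paths that fall below that level at any monitoring date. The flat barrier equal to $D$, the discrete monitoring grid, the function name, and all numerical inputs are our own illustrative assumptions.

```python
import numpy as np


def simulate_default_frequencies(V0, D, r, sigma_V, T,
                                 n_steps=250, n_paths=20000, seed=0):
    """Simulated frequencies of (i) default at maturity only, (ii) the asset
    value dipping below D at any monitoring date, and (iii) paths that dip
    below D but end above it at maturity."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Exact log-increments of geometric Brownian motion with drift r
    shocks = rng.standard_normal((n_paths, n_steps))
    increments = (r - 0.5 * sigma_V ** 2) * dt + sigma_V * np.sqrt(dt) * shocks
    log_paths = np.log(V0) + np.cumsum(increments, axis=1)
    default_at_maturity = log_paths[:, -1] < np.log(D)
    dipped_below = log_paths.min(axis=1) < np.log(D)
    recovered = dipped_below & ~default_at_maturity
    return default_at_maturity.mean(), dipped_below.mean(), recovered.mean()


# Hypothetical inputs: assets of 100, debt face value of 80, one-year maturity
p_maturity, p_dipped, p_recovered = simulate_default_frequencies(
    V0=100.0, D=80.0, r=0.05, sigma_V=0.25, T=1.0
)
print(f"default at maturity: {p_maturity:.4f}")
print(f"below D at some date: {p_dipped:.4f}")
print(f"below D but recovered by maturity: {p_recovered:.4f}")
```

The third frequency corresponds to the paths on which Merton's approach records no default even though the asset value fell below the face value of the debt before maturity.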

Another handicap of the model is that the 
usual capital structure of a firm is much more 
complicated than a simple zero-coupon bond. 
Geske (1977, 1979) considers the debt struc¬ 
ture of the firm as a coupon bond, in which 
each coupon payment is viewed as a com¬ 
pound option and a possible cause of default. 
At each coupon payment, the shareholders 
have the option either to make the payment to 
bondholders, 11 obtaining the right to control the 
firm until the next coupon, or to not make the 
payment, in which case the firm defaults. Geske
also extends the model to consider characteristics
such as sinking funds, safety covenants,
debt subordination, and payout restrictions.

The assumption of a constant and flat term 
structure of interest rates is another major criti¬ 
cism the model has received. Jones et al. (1984, 
p. 624) suggest that "there exists evidence that 
introducing stochastic interest rates, as well 
as taxes, would improve the model's perfor¬ 
mance." Stochastic interest rates allow us to 
introduce correlation between the firm's asset 
value and the short rate, and have been consid¬ 
ered, among others, by Ronn and Verma (1986), 
Kim, Ramaswamy, and Sundaresan (1993), 
Nielsen et al. (1993), Longstaff and Schwartz
(1995), Briys and de Varenne (1997), and Hsu, 
Saa-Requejo, and Santa-Clara (2004). 

Another characteristic of Merton's model, 
which will also be present in some of the FPM, 
is the predictability of default. Since the firm's 
asset value is modeled as a geometric Brow¬ 
nian motion and default can only happen at 
the maturity of the debt, it can be predicted 
with increasing precision as the maturity of the 
debt comes near. As a result, in this approach 
default does not come as a surprise, which 
makes the models generate very low short-term 
credit spreads. 12 As we shall review, introduc¬ 
ing jumps in the process followed by the asset 
value has been one of the solutions considered 
to this problem. 
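
The vanishing short-term spread can be illustrated numerically. The sketch below is a minimal Python illustration, assuming the standard Merton setup (zero-coupon debt, constant interest rate, lognormal asset value). The function names and the parameter values used with it are hypothetical and serve only as an example.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_credit_spread(v, debt, r, sigma_v, tau):
    """Credit spread of zero-coupon debt with face value `debt` and
    time to maturity `tau`, for asset value v and asset volatility sigma_v."""
    s = sigma_v * math.sqrt(tau)
    d1 = (math.log(v / debt) + (r + 0.5 * sigma_v ** 2) * tau) / s
    d2 = d1 - s
    risk_free_value = debt * math.exp(-r * tau)
    # Risky debt = risk-free debt paid in survival + assets received in default.
    risky_value = risk_free_value * norm_cdf(d2) + v * norm_cdf(-d1)
    return -math.log(risky_value / risk_free_value) / tau


if __name__ == "__main__":
    for tau in (0.05, 0.5, 1.0, 5.0):
        spread = merton_credit_spread(100.0, 70.0, 0.03, 0.2, tau)
        print(f"maturity {tau:4.2f} years: spread {1e4 * spread:8.3f} bp")
```

For a firm whose assets comfortably exceed the face value of its debt, the spread at a maturity of a few weeks is numerically indistinguishable from zero, while at five years it is clearly positive.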

Delianedis and Geske (2001) study the pro¬ 
portion of the credit spread that, in a corporate 
bond data set, is explained by default risk, us¬ 
ing the Merton (1974) and Geske (1977) frame¬ 
works. They conclude that it only explains a 
small fraction of the credit spreads; the rest is 
attributable to taxes, jumps, liquidity, and mar¬ 
ket risk factors. They also include a jump component
in the Merton model, finding that (p. 24)
"while jumps may explain a portion of the resid¬ 
ual spread it is unlikely that jumps can explain 
it entirely." 

First Passage Models 

First passage models were introduced by Black 
and Cox (1976) extending the Merton model to
the case when the firm may default at any time,
not only at the maturity date of the debt. 

Consider, as in the previous section, that the 
dynamics of the firm's asset value under the 
risk-neutral probability measure P are given by 
the diffusion process 

$$ dV_t = r V_t \, dt + \sigma_V V_t \, dW_t \tag{8} $$

and that there exists a lower level of the asset 
value such that the firm defaults once it reaches 
this level. Although Black and Cox (1976) con¬ 
sidered a time-dependent default threshold, let 
us assume first a constant default threshold
K > 0. If we are at time t > 0, default has not
been triggered yet, and $V_t > K$, then the time of
default $\tau$ is given by

$$ \tau = \inf \{ s \ge t \mid V_s \le K \} \tag{9} $$


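Using the properties of the Brownian motion
$W_t$, in particular the reflection principle, we
can infer the default probability from time t to
time T: 13

$$ P[\tau \le T \mid \tau > t] = \Phi(h_1) + \exp\left\{ \frac{2 \left( r - \frac{\sigma_V^2}{2} \right) \ln\left( \frac{K}{V_t} \right)}{\sigma_V^2} \right\} \Phi(h_2) \tag{10} $$

where

$$ h_1 = \frac{\ln\left( \frac{K}{e^{r(T-t)} V_t} \right) + \frac{\sigma_V^2}{2} (T-t)}{\sigma_V \sqrt{T-t}} \tag{11} $$

$$ h_2 = \frac{\ln\left( \frac{K}{V_t} \right) + \left( r - \frac{\sigma_V^2}{2} \right) (T-t)}{\sigma_V \sqrt{T-t}} \tag{12} $$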

FPM have been extended to account for 
stochastic interest rates, bankruptcy costs, 
taxes, debt subordination, strategic default, 
time-dependent and stochastic default barriers, 
jumps in the asset value process, and so on. 
Although these extensions introduce more realism
into the model, they increase its analytical
complexity. 14

The default threshold, always positive, can 
be interpreted in various ways. We can think 
of it as a safety covenant of the firm's debt, 
which allows the bondholders to take con¬ 
trol of the company once its asset value has 
reached this level. The safety covenant would
act as a protection mechanism for the bondholders
against an unsatisfactory corporate performance.
In this case, the default threshold
would be deterministic, although possibly time 
dependent, and exogenously fixed when the 
firm's debt is issued. Kim, Ramaswamy, and 
Sundaresan (1993) and Longstaff and Schwartz 
(1995) assume an exogenously given constant 
default threshold K. Black and Cox (1976) consider
a time-dependent default barrier given by
$K e^{-\gamma (T-t)}$. A particular case of the Black and
Cox default threshold specification is to consider
$\gamma = r$, that is, to consider a default barrier
equal to the face value of the debt discounted
at the risk-free interest rate. In that case, the
default threshold can be made stochastic if the 
model considers a stochastic process for the in¬ 
terest rate, as in Briys and de Varenne (1997). 

Longstaff and Schwartz (1995) choose a con¬ 
stant default threshold and point out that "since 
it is the ratio of $V_t$ to K, rather than the actual
value of K, that plays the major role in our anal¬ 
ysis, allowing a more general specification for K 
simply makes the model more complex without 
providing additional insight into the valuation 
of risky debt." 

Hsu, Saa-Requejo, and Santa-Clara (2004)
suggest that $V_t$ and K do not matter directly
to the valuation of default risky bonds but only
through their ratio, which is a measure of the
solvency of the firm. They model the default
threshold as a stochastic process, which,
together with the stochastic process assumed for
the firm's asset value, allows them to obtain the
stochastic process of the ratio $V_t / K$. The dynamics
of this ratio are used to price corporate bonds.

The default threshold can also be chosen en¬ 
dogenously by the stockholders to maximize 
the value of the equity. 15 The literature has also 
considered the possibility of negotiation pro¬ 
cesses between stockholders and bondholders 
when the firm goes near the point of financial 
distress, from which the default threshold is 
determined. 16 

Similar to the description of the choice of
the face-value of the zero-coupon in the Merton
model, in FPM the default threshold can be
calculated as a weighted average of short and
long-term debts.

Interest rates can be considered either as 
a constant or as a stochastic process. 17 The 
stochasticity of interest rates allows the model 
to introduce correlation between asset value 
and interest rates, and to make the default 
threshold stochastic, in the cases when it is spec¬ 
ified as the discounted value of the face value 
of the debt. Nielsen et al. (1993) and Longstaff 
and Schwartz (1995) consider a Vasicek process 
for the interest rate, correlated with the firm's 
asset value: 

$$ dV_t = (c - d) V_t \, dt + \sigma_V V_t \, dW_t \tag{13} $$

$$ dr_t = (a - b r_t) \, dt + \sigma_r \, d\widetilde{W}_t \tag{14} $$

$$ dW_t \, d\widetilde{W}_t = \rho \, dt \tag{15} $$

where $W_t$ and $\widetilde{W}_t$ are correlated Brownian motions.
Other specifications for the stochastic process
of the short rate have been considered. For
example Kim, Ramaswamy, and Sundaresan
(1993) suggest a CIR process

$$ dr_t = (a - b r_t) \, dt + \sigma_r \sqrt{r_t} \, d\widetilde{W}_t \tag{16} $$

and Briys and de Varenne (1997) a generalized
Vasicek process

$$ dr_t = (a(t) - b(t) r_t) \, dt + \sigma_r(t) \, d\widetilde{W}_t \tag{17} $$

Hsu, Saa-Requejo, and Santa-Clara (2004) consider
both the case of independence between
risk-free interest rates and the default generating
mechanism (given by the dynamics of the
ratio $V_t / K$) and the case of correlation between
both processes, specifying the risk-free rate as
a CIR process. They present an interesting empirical
illustration of the model, covering the
calibration of the risk-free rate process and the
estimation of the model's parameters through
the generalized method of moments.
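
The correlated dynamics of the asset value and a Vasicek short rate can be simulated on a time grid. The sketch below is only illustrative. It uses an exact lognormal step for the asset value and an Euler step for the short rate, and it builds the second Brownian increment from the first so that their correlation is rho. The asset drift is treated as a single constant (the parameter `drift`, standing in for c - d). All function and parameter names are hypothetical.

```python
import math
import random


def simulate_assets_and_rate(v0, r0, drift, sigma_v, a, b, sigma_r,
                             rho, horizon, steps, rng):
    """Return a list of (asset value, short rate) pairs on an equally
    spaced grid of `steps` intervals over `horizon` years."""
    dt = horizon / steps
    sqrt_dt = math.sqrt(dt)
    v, r = v0, r0
    path = [(v, r)]
    for _ in range(steps):
        z1 = rng.gauss(0.0, 1.0)
        # Second shock with correlation rho to the first one.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Exact step for a geometric Brownian motion with constant drift.
        v *= math.exp((drift - 0.5 * sigma_v ** 2) * dt + sigma_v * sqrt_dt * z1)
        # Euler step for the mean-reverting (Vasicek) short rate.
        r += (a - b * r) * dt + sigma_r * sqrt_dt * z2
        path.append((v, r))
    return path


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that the run is reproducible
    path = simulate_assets_and_rate(100.0, 0.05, 0.03, 0.2,
                                    0.015, 0.3, 0.01, -0.5, 5.0, 1250, rng)
    print("final asset value and short rate:", path[-1])
```

Default under a first passage rule could then be detected by checking each simulated asset value against the chosen threshold.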

Drawbacks and Extensions 
The principal drawback of FPM is the analyti¬ 
cal complexity that they introduce, which is in¬ 
creased if we consider stochastic interest rates 
or endogenous default thresholds. This mathematical
complexity makes it difficult to obtain
closed form expressions for the value of the 
firm's equity and debt, or even for the default 
probability, forcing us to make use of numerical 
procedures. 

The empirical testing of FPM and struc¬ 
tural models in general has not been very 
successful. 18 Eom, Helwege, and Huang (2003), 
who carry out an empirical analysis of five mod¬ 
els (Merton, Geske, Leland and Toft, Longstaff 
and Schwartz, and Collin-Dufresne and Gold¬ 
stein), conclude that (p. 502) 

Using estimates from the implementations we con¬ 
sider most realistic, we agree that the five structural 
bond pricing models do not accurately price corpo¬ 
rate bonds. However, the difficulties are not limited 
to the underprediction of spreads.... they all share 
the same problem of inaccuracy, as each has a dra¬ 
matic dispersion of predicted spreads. 

Zhou (1997, Abstract) indicates that "the em¬ 
pirical application of a diffusion approach has 
yielded very disappointing results." 

Another drawback of the structural models 
presented before is the so-called predictability 
of defaults. Generally, structural models con¬ 
sider continuous diffusion processes for the 
firm's asset value and complete information 
about the asset value and default threshold. In 
this setting, the actual distance from the asset 
value to the default threshold tells us the near¬ 
ness of default, in such a way that if we are 
far away from default the probability of default 
in the short-term is close to zero, because the 
asset value process needs time to reach the de¬ 
fault point. The knowledge of the distance of 
default and the fact that the asset value follows 
a continuous diffusion process makes default a 
predictable event, that is, default does not come 
as a surprise. 

This predictability of defaults makes the mod¬ 
els generate short-term credit spreads close to 
zero. In contrast, it is observed in the mar¬ 
ket that even short-term credit spreads are 
bounded from below, incorporating the possi¬ 
bility of an unexpected default or deterioration 
in the firm's credit quality. 19

The same characteristics of the structural
models that imply the predictability of default 
also imply predictability of recovery. In mod¬ 
els that do not consider strategic defaults, the 
bondholders get the remaining value of the firm 
in case of default, which is precisely the value 
of the default threshold at default. Thus, if we 
assume complete information about the asset 
value and default threshold, the recovery rate 
is also a predictable quantity. 

Essentially, two ways out of the predictability 
effects of structural models have been proposed 
in the literature. The predictability of default 
comes from the assumption of investors' per¬ 
fect knowledge of the firm's asset value and 
default threshold. In practice, it is not possible
to deduce from the capital structure of the firm
either the value of the firm $V_t$, its volatility
$\sigma_V$, or the level of the default threshold. If we
consider incomplete information about either 
the firm value process, the default threshold 
(or both), investors can only infer a distribu¬ 
tion function for these processes, which makes 
defaults impossible to predict. These considera¬ 
tions can be found, among others, in Duffie and 
Lando (2001), Giesecke (2005), and Jarrow and 
Protter (2004). 20

The second way consists in incorporating 
jumps in the dynamics of the firm value, which 
implies that the asset value of the firm can sud¬ 
denly drop, reducing drastically the distance 
of default (between the asset value and de¬ 
fault threshold), or even causing a default if 
the drop is sufficiently large. Thus, default is
no longer a predictable event: the default
probabilities for short maturities do not tend
to zero, and neither do the credit spreads generated.
Zhou (1997, 2001a) and Hilberink and Rogers
(2002) deal with structural models in which the
firm's asset value incorporates a jump component.
While Zhou extends the Longstaff and
Schwartz (1995) model considering a lognormally
distributed jump component, Hilberink
and Rogers (2002) opt for an extension of Leland
(1994) and Leland and Toft (1996) using
Lévy processes that allow only downward
jumps in the firm's value. Both models
avoid the problem of default predictability,
implying positive credit spreads for short maturities.
Another characteristic of jump models is
that they convert the recovery payment at default
into a random variable, since the value of
the firm can drop suddenly below the default
threshold, whereas if the firm's value follows a
diffusion process without jumps, the value of
the firm at default, that is, what bondholders
get, is always equal to the default threshold because
of the continuity of the firm's value path.

Fouque, Sircar and Solna (2006) consider 
the effect of introducing stochastic volatility 
in FPM, finding that it increases short-term 
spreads. 

Davydenko (2005) criticizes existing structural
models because they overlook liquidity
shortages as the main determinant of default
for some firms, particularly the ones with high
external financing costs (p. 2):

Several default triggers have been proposed in 
structural models of debt pricing. Most models 
assume that a firm defaults when the market 
value of its assets falls below a certain boundary 
(Black and Cox, 1976; Leland, 1994). This default 
boundary may correspond to an exogenous net- 
worth covenant, or to the endogenously determined 
threshold at which equityholders are no longer will¬ 
ing to service debt obligations. Should the firm find 
itself in a liquidity crisis while its asset value is still 
above the boundary, equityholders in these models 
will always be willing and able to avoid default by 
raising outside financing. This approach contrasts 
with the assumption that firms default when current
assets fall short of current obligations, due to
either a minimum cash-flow covenant, or market 
frictions precluding the firm from raising sufficient 
new external financing (Kim et al., 1993; Ander¬ 
son and Sundaresan, 1996). Models incorporating 
both value- and liquidity-based defaults are rare, 
and little empirical evidence is available to moti¬ 
vate the choice of the default trigger. If, in reality, 
default is triggered by different factors for different 
firms, existing models are likely to lack accuracy in 
predictions. 

Davydenko (2005), using a sample of U.S. 
(speculative rating-grade) bond issuers from 
1996 to 2003, shows that the importance of 
liquidity shortages in triggering default for a
particular firm depends on the firm's cost of external
financing (p. 2): "firms with low costs of
external financing default when the continua¬ 
tion value of assets is low. By contrast, if exter¬ 
nal funds are costly, a liquidity crisis may force 
reorganization even if the going-concern sur¬ 
plus is still substantial." 21 Moreover, the author 
presents empirical evidence against the view 
that default is triggered when the asset value 
crosses a particular threshold. 

Therefore, empirical evidence suggests that 
structural models need to be theoretically ex¬ 
tended in order to incorporate the possibility of 
the firms defaulting because of liquidity short¬ 
ages and high funding costs. 


Estimation 

The literature provides several ways of calibrating
$V_t$ and $\sigma_V$. The first method makes use of
Ito's lemma to obtain a system of two equations
in which the only two unknown variables are
$V_t$ and $\sigma_V$. 22 Assume the firm's equity value
follows a geometric Brownian motion under P,
with volatility $\sigma_E$:

$$ dE_t = r E_t \, dt + \sigma_E E_t \, dW_t \tag{18} $$


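Since the value of the equity is a function of time
and of the value of the assets, $E_t = f(V_t, t)$, we
can apply Ito's lemma to get

$$ dE_t = \left[ \frac{\partial f(V_t, t)}{\partial t} + \frac{\partial f(V_t, t)}{\partial V_t} V_t r + \frac{1}{2} \frac{\partial^2 f(V_t, t)}{(\partial V_t)^2} (V_t \sigma_V)^2 \right] dt + \frac{\partial f(V_t, t)}{\partial V_t} V_t \sigma_V \, dW_t \tag{19} $$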


Comparing the coefficients multiplying the
Brownian motion in the two previous equations
we obtain the following identity

$$ \sigma_E E_t = \frac{\partial f(V_t, t)}{\partial V_t} V_t \sigma_V \tag{20} $$


Noting that $\partial f(V_t, t) / \partial V_t = \Phi(d_1)$ and rearranging
we obtain the first equation of the system: 23

$$ \sigma_V = \frac{E_t}{V_t} \frac{\sigma_E}{\Phi(d_1)} \tag{21} $$

The second equation results simply from
matching the theoretical value of equity with
the observed market price ($\hat{E}_t$):

$$ E_t(V_t, \sigma_V, T - t) = \hat{E}_t \tag{22} $$

As we mentioned before, the only two unknowns
in the system formed by the last two
equations are $V_t$ and $\sigma_V$. 24
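
A minimal sketch of how this two-equation system can be solved numerically is given below. It assumes that the theoretical equity value is the Black-Scholes call on the firm's assets with strike equal to the face value of the zero-coupon debt, as in Merton's model. The nested bisection and all function names are illustrative choices, not a procedure prescribed by the works cited here, and the bracket used for the asset volatility (0.01% to 500%) is an arbitrary assumption.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_equity(v, sigma_v, debt, r, tau):
    """Return (equity value, Phi(d1)) for assets v and zero-coupon debt."""
    s = sigma_v * math.sqrt(tau)
    d1 = (math.log(v / debt) + (r + 0.5 * sigma_v ** 2) * tau) / s
    d2 = d1 - s
    equity = v * norm_cdf(d1) - debt * math.exp(-r * tau) * norm_cdf(d2)
    return equity, norm_cdf(d1)


def implied_assets(equity, sigma_v, debt, r, tau):
    """Asset value that matches the observed equity value for a given sigma_v."""
    lo, hi = equity, equity + debt * math.exp(-r * tau)  # no-arbitrage bounds
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if merton_equity(mid, sigma_v, debt, r, tau)[0] < equity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate(equity, sigma_e, debt, r, tau):
    """Solve the two-equation system for (V_t, sigma_V) by nested bisection."""
    lo, hi = 1e-4, 5.0  # assumed bracket for the asset volatility
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        v = implied_assets(equity, mid, debt, r, tau)
        delta = merton_equity(v, mid, debt, r, tau)[1]
        # Volatility equation: sigma_E * E = Phi(d1) * sigma_V * V.
        if delta * mid * v < sigma_e * equity:
            lo = mid
        else:
            hi = mid
    sigma_v = 0.5 * (lo + hi)
    return implied_assets(equity, sigma_v, debt, r, tau), sigma_v
```

A simple check is a round trip: compute the equity value and the equity volatility implied by known asset parameters, and verify that `calibrate` recovers those parameters.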

Duan (1994) points out some drawbacks of 
the previous method. First, the method con¬ 
siders the equity volatility as constant and in¬ 
dependent of the firm's asset value and time. 
Second, he claims that the first equation is re¬ 
dundant since it is used to derive the second 
equation. And third, the traditional method 
does not provide us with distribution functions, 
or even confidence intervals, for the estimates 
of $V_t$ and $\sigma_V$.

Duan (1994) proposes another method of estimating
$V_t$ and $\sigma_V$, based on maximum likelihood
estimation using equity prices and the
one-to-one relationship between equity and asset
levels given by (4). 25 Duan et al. (2004) follow
the maximum likelihood approach introduced 
by Duan (1994) but, unlike previous works, 
they take into account the survivorship issue, 
by incorporating into the likelihood function 
the fact that a firm survived. They argue that 
(p. 3), "In the credit risk setting, it is impera¬ 
tive for analysts to recognize the fact that a firm 
in operation has by definition survived so far. 
Estimating a credit risk model using the sample 
of equity prices needs to reflect this reality, or 
runs the risk of biasing the estimator." 

Duan and Fulop (2005) extend Duan's (1994) 
maximum likelihood estimation method to ac¬ 
count for the fact that observed equity prices 
might be contaminated by trading noises. They
find that taking trading noises into account
generates lower estimates for the asset volatility
$\sigma_V$; ignoring them therefore overestimates the
asset volatility and, with it, the firms' default
probabilities.

Bruche (2005) describes how structural mod¬ 
els can be estimated using a simulated max¬ 
imum likelihood procedure, which allows us 
to use data on any of the firm's traded claims 
(bonds, equity, CDS,...) as well as balance sheet 
information to improve the efficiency of the es¬ 
timation. The paper explores the possibility of 
considering that not only equity, but the rest of 
the claims used in the estimation procedure can 
be priced with noise, showing that (p. 3) "even 
small amounts of noise can have serious con¬ 
sequences for estimation results when they are 
ignored." 

A different way of estimating $V_t$ and $\sigma_V$,
which can be found in Jones et al. (1984), consists
simply of estimating the asset value as
the sum of the equity market value, the market
value of traded debt, and the estimated value
of nontraded debt. Provided with a time series
for $V_t$ we can estimate its volatility $\sigma_V$.

Hull, Nelken, and White (2004) propose a way
to estimate the model's parameters from implied
volatilities of options on the company's
equity, avoiding the need to estimate $\sigma_E$ and to
transform the firm's debt structure into a zero-coupon
bond. Using as inputs two equity implied
volatilities and an estimate of the firm's
debt maturity T, their model provides us with
an estimate of $\sigma_V$ and the leverage ratio $D e^{-r(T-t)} / V_t$,
which allows us to calculate $E_t$ and the probability
of default. We should note that to calculate
the value of the debt $z(t, T) = V_t - E_t$ we still
need an estimate for $V_t$. 26

We still have to estimate the default threshold 
K. Sundaram (2001) indicates that (p. 7) "de¬ 
fault tends to occur in practice when the market 
value of the firm's assets drops below a critical 
point that typically lies below the book value 
of all liabilities, but above the book value of 
short-term liabilities." Thus, one approach is to 
choose a value for K between those two limits.
Davydenko (2005) estimates the default thresh¬ 
old to be around 72% of the firm's face value 
of debt. 


Liquidation Process Models 

In FPM default occurs the first time the asset
value goes below a certain lower threshold; that
is, the firm is liquidated immediately after the
default event, which corresponds to the asset
value crossing the lower barrier. In contrast
with FPM, a new set of models considers the case
where the default event does not immediately
cause liquidation but represents the beginning
of a process, the liquidation process, which
might or might not end in liquidation once it
is completed. As explained earlier, we refer to
these models as liquidation process models (LPM).

The distinction between the terms default event 
and liquidation must be clear to understand 
LPM and their differences with FPM. A default 
event takes place when the firm's asset value 
V t goes below the lower threshold K (which 
can be exogenous, constant, time dependent, 
stochastic, or endogenously derived). A default 
event signals the beginning of a financially dis¬ 
tressed period, which will not necessarily lead 
to liquidation. Liquidation takes place when the 
firm is actually liquidated, its activity stops, 
and its remaining assets are distributed among its
claimholders. 

In FPM described above the default event 
does coincide with liquidation. 27 However, as 
pointed out by Couderc and Renault (2005, 
p. 2), most liquidations "do not arise suddenly 
but are rather the conclusion of a long lasting 
process." As pointed out by Moraux (2004, p. 3): 
"Empirical studies in USA have found that ad¬ 
ditional 'survival' periods beyond the main de¬ 
fault event last up to 3 years (Altman-Eberhart 
(1994), Betker (1995), Hotchkiss (1995)). Hel- 
wege (1999) reports that the longest default of 
modern US junk bond market is seven years 
long." 28 

The fact that the liquidation process can take
quite a while implies that, when empirically
studying the causes of liquidation, past information
shows up as a significant explanatory variable,
together, of course, with contemporaneous
information, because it comprises information
about the liquidation process. Information here
refers to the firms' financial variables as well 
as financial markets, business cycle, credit mar¬ 
kets, and default cycle indicators. Couderc and 
Renault (2005) use a database containing the 
rating history of over ten thousand firms for 
the period 1981-2003 and analyze, using dura¬ 
tion models, whether past values of several fi¬ 
nancial markets (business cycle, credit markets, 
and default cycle) are relevant in explaining 
default probabilities in addition to their con¬ 
temporaneous values. Their results show the 
critical importance of past information in de¬ 
fault probabilities. 

LPM extend FPM to account for the fact that
liquidation takes place after (sometimes
long after) the occurrence of a
default event. Francois and Morellec (2004),
Moraux (2004), and Galai, Raviv, and Wiener
(2005) put forward theoretical LPM.

Francois and Morellec (2004) argue that while
in most FPM the default event leads to an immediate
liquidation of the firm's assets, firms
in financial distress have several options to
deal with their distress. First, under Chapter 7
of the U.S. Bankruptcy Code, they can liquidate
their assets straight away. This possibility
would fit FPM. However, firms can also file
for bankruptcy under Chapter 11 of the U.S. 
Bankruptcy Code and start a court-supervised 
liquidation process. The authors refer to exist¬ 
ing literature (p. 390) to provide some evidence 
about the relevance of Chapter 11: 

Upon default, the court grants the firm a period of 
observation during which the firm can renegotiate 
its claims. At the end of this period, the court decides 
whether the firm continues as a going concern or 
not. 

Empirical studies show that most firms emerge 
from Chapter 11. Only a few firms (5%, accord¬ 
ing to Gilson, John, and Lang [1990] and Weiss 
[1990], and between 15% and 25%, according to 
Morse and Shaw [1988]) are eventually liquidated 
under Chapter 7 after filing Chapter 11. Why do 
some firms recover while others do not? It is gener¬ 
ally acknowledged (see Wruck 1990 or White 1996) 
that there exist two types of defaulting firms. First,
firms that are economically sound promptly recover
under Chapter 11. Default was only due to a tem¬ 
porary financial distress. Second, firms that are 
economically unsound keep on losing value under 
Chapter 11. 29 

Francois and Morellec consider that, after a
default event, that is, after the asset value $V_t$ goes
below the lower threshold K, a firm is liquidated
if and only if $V_t$ remains below K
consecutively during a period of time of a given
length d (which in their numerical simulations
they take to be two years). If a default event
happens and the asset value remains under the
lower threshold for a period shorter than d, the
liquidation process finishes and the firm continues
in business as usual. The term consecutively
in the definition of liquidation above means that
the number of successfully managed past default
events and liquidation periods 30 the firm
has experienced does not affect the maximum
length d of future liquidation periods.

The authors provide closed-form solutions for 
corporate debt and equity values and analyze 
the implications of the model for optimal lever¬ 
age and credit spreads. Numerical simulations 
show that credit spreads are an increasing func¬ 
tion of the length d. 

Moraux (2004) extends the Francois and
Morellec (2004) model by adding a second
cause of liquidation to Francois and Morellec's
(which he calls liquidation procedure A).
Under his proposed liquidation procedure, procedure
B, liquidation happens when the total,
that is, cumulative, time the firm's asset value
stands under the default threshold exceeds d.
The difference between procedures A and B lies
in the words consecutively and cumulative, and
Moraux (2004, p. 17) explains it clearly:

Under the procedure A, each time the firm value 
process passes through and above K, the liquida¬ 
tion procedure is closed and the hypothetical distress 
counter is set to zero. The next time a default event 
occurs, an identical procedure is run and an equal 
period of time d is granted.... Under the procedure 
B, the distress counter is never set to zero. Sub¬ 
sequent granted periods (and therefore tolerance) 
will be lower and lower as more default events and
long financial distress will be observed. In fact, the
granted time is lowered (each time) by the duration 
just used. 

Financial distress refers to the situation in
which $V_t < K$. A firm can be liquidated under
either of the two liquidation procedures.
Moraux (2004) shows that any liquidation procedure
based on the time spent by the firm in
financial distress is bounded by procedures
A and B, in the sense that its implied liquidation
date will be later (earlier) than the liquidation
date implied by procedure B (A).
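
The difference between the two procedures can be made concrete on a discretely observed path. The following Python sketch is only illustrative. It assumes that the asset value is observed on an equally spaced grid with step dt and that each observation below K adds dt to a distress clock; this discretization is an assumption of the example, not part of either model. The function names are hypothetical.

```python
def liquidation_index_consecutive(path, k, d, dt):
    """Procedure A: first index at which the current unbroken spell
    below k has lasted at least d. Returns None if that never happens."""
    clock = 0.0
    for i, v in enumerate(path):
        clock = clock + dt if v < k else 0.0  # reset when v is back at or above k
        if clock >= d:
            return i
    return None


def liquidation_index_cumulative(path, k, d, dt):
    """Procedure B: first index at which the total time spent below k
    reaches d. The clock is never reset. Returns None if never reached."""
    clock = 0.0
    for i, v in enumerate(path):
        if v < k:
            clock += dt
        if clock >= d:
            return i
    return None


if __name__ == "__main__":
    # Yearly observations, threshold 50, granted period of three years.
    path = [60.0, 45.0, 55.0, 48.0, 47.0, 52.0, 46.0, 44.0, 43.0]
    print(liquidation_index_consecutive(path, 50.0, 3.0, 1.0))  # prints 8
    print(liquidation_index_cumulative(path, 50.0, 3.0, 1.0))   # prints 4
```

On this path the cumulative clock reaches three years at the fifth observation, whereas the consecutive clock is reset twice and reaches three years only at the last observation, so liquidation under procedure B comes first.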

The author derives closed form solutions for
different claims such as equity, debt of different
seniorities, and convertible debt. In particular,
the value of equity is derived as a down-and-out
Parisian option written on the firm's assets
under liquidation procedure A and as a
down-and-out cumulative call option under liquidation
procedure B. Numerical simulations show
that the value of equity is an increasing function
of d, and that, unlike in Francois and Morellec
(2004), credit spreads increase or decrease with
d depending on the seniority of the debt.

Galai, Raviv, and Wiener (2005) represent a
step forward in the refinement of LPM, proposing
a model extending and including the two
previous ones. They argue that in the two previous
models, the only thing that matters for
a firm to be liquidated is the amount of time
it spends in financial distress (either successively
or cumulatively), and that these models fail to (p. 5)
"capture the following two common features
of bankruptcy procedures: (i) Recent distress
events may have a greater effect on the decision
to liquidate a firm's assets than old distress
events.... (ii) Severe distress events may
have greater effect on the decision to liquidate
a firm than mild distress events." To account for
these two stylized facts, the authors propose a
structural model in which a firm is liquidated
when a state variable representing the cumulative
weighted time period spent by the firm
in distress exceeds d. At each time, the cumulative
weighted time period is computed as a
weighted average of the total time spent by
the firm in distress, weighted by (1) how far
away in the past such distress occurred and (2)
how severe the distress was, where distress
severity is measured as an increasing function
of max {0, K - V}.
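
A discretized version of such a state variable might be sketched as follows. The exponential decay of past distress and the power function of the relative shortfall used here are hypothetical choices made for illustration; they are not the functional forms of Galai, Raviv, and Wiener, and `decay` and `beta` are assumed parameters. With decay = 0 and beta = 0 the clock reduces to the cumulative time spent below K.

```python
import math


def weighted_distress_clock(path, k, dt, decay=0.0, beta=0.0):
    """Return the value of the distress clock after each observation.

    decay: rate at which past distress is forgotten (assumed exponential).
    beta:  exponent applied to the relative shortfall (k - v) / k
           (assumed severity function; beta = 0 ignores severity).
    """
    clock = 0.0
    history = []
    for v in path:
        clock *= math.exp(-decay * dt)  # older distress counts for less
        if v < k:
            severity = ((k - v) / k) ** beta  # deeper shortfall counts for more
            clock += severity * dt
        history.append(clock)
    return history


def liquidation_index(path, k, d, dt, decay=0.0, beta=0.0):
    """First index at which the weighted clock reaches d, or None."""
    history = weighted_distress_clock(path, k, dt, decay, beta)
    for i, clock in enumerate(history):
        if clock >= d:
            return i
    return None
```

Raising `decay` makes old distress periods fade, and raising `beta` makes mild distress count for less, so both changes delay liquidation relative to the purely cumulative clock.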

Galai, Raviv, and Wiener's model has as special
cases models such as Merton (1974), Black
and Cox (1976), Leland (1994), Fan and Sundaresan
(2001), Francois and Morellec (2004),
and Moraux (2004). As a consequence it represents
the most general LPM so far. They solve
the model numerically using Monte Carlo simulation
based on Parisian options and Parisian
contracts techniques to value debt and equity.
They provide a very intuitive comparison of the
liquidation mechanics in their general model
with those of Francois and Morellec and of Moraux,
showing that Moraux's cumulative liquidation
procedure (B) has too strong a memory
because far-away distress periods have the
same impact on liquidation triggering as current
ones.

Although theoretically very appealing, LPM
have not, unlike FPM, been empirically tested,
and remain a field for future research.

State Dependent Models 

Another avenue for (so far) theoretical re¬ 
search within the structural approach consists 
of extending standard models with regime 
switching: Some of the model parameters are 
state-contingent. As we review below, states 
can represent the state of the business cycle or 
simply the firm's external rating. Cash flows, 
bankruptcy costs, and funding costs might be 
state-dependent. 

This branch of structural models is able to re¬ 
duce the problems of predictability of defaults 
(and recovery) suffered by standard models be¬ 
cause the firm is subject to exogenous changes 
of parameters, which affect its ability to gener¬ 
ate cash flows or its funding costs, which are 
the main drivers of default probabilities. 

Hackbarth, Miao, and Morellec (2004) and
Elizalde (2005b) put forward two different
models illustrating the previous ideas. In both
cases the authors provide closed form expressions
for the value of equity and debt, whose
derivation requires solving systems of ordinary
differential equations.

In Hackbarth, Miao, and Morellec (2004) cash
flows and recovery rates depend on the state
of the business cycle. Cash flows $X_t$ follow
a geometric Brownian motion and are scaled
by a business cycle scalar factor: They are
higher in expansions, $y_H X_t$, than in recessions,
$y_L X_t$, with $y_H > y_L$. In the same way, bankruptcy
costs are expressed as a state-dependent fraction
$1 - \alpha$ of the firm's assets; again, the recovery
rate in expansions, $\alpha_H$, is higher than in
recessions, $\alpha_L$: $\alpha_H > \alpha_L$. At each point in time,
there is an exogenous probability of switching
between recession and expansion. The default
threshold is endogenously chosen by equityholders
to maximize the value of equity, and
it turns out to be higher in recessions: The
firm defaults earlier in recessions than in expansions.
Numerical examples illustrate the implications
of the model for default thresholds,
default clustering, optimal leverage (countercyclical),
and credit spreads. As argued above,
the model is able to generate nontrivial short-term
spreads.

Elizalde (2005b) develops a structural model
which, although originally applied to banks,
can be extended to any firm. In contrast with
previous models, the firm's asset value is
assumed to be unobserved by debtholders.
Debtholders rely on the ratings published by
rating agencies to set the debt's coupon as a
function of those ratings. As a consequence,
the firm's funding costs are contingent on its
rating. Rating agencies audit firms at a given
frequency to find out their risk and asset
levels, which determine the
rating. Switching from one rating to another
implies changes in the cost of debt and, as a
consequence, in the ability of the firm to repay
it. As in Hackbarth, Miao, and Morellec (2004),
the default threshold is chosen endogenously
by equityholders and is rating-dependent.


As described by Duffie (2005, p. 2772), "It
has become increasingly common for bond issuers
to link the size of the coupon rate on their
debt with their credit rating, offering a higher
coupon rate at lower ratings, perhaps in an attempt
to appeal to investors based on some
degree of hedging against a decline in credit
quality." This embedded derivative is called a
ratings-based step-up. The author illustrates this
with a ratings-based step-up bond issued
by Deutsche Telekom in 2002 with coupon payments
linked to the firm's rating. While Elizalde
(2005b) derives the price of such a bond using
a structural model, Duffie provides its pricing
formula using an intensity model. 31

Like LPM, state-dependent models have only 
been developed theoretically and their future 
success in credit risk modeling (if any) lies in 
their empirical applicability and their ability to 
replicate and predict credit spreads and default 
probabilities. 


DEFAULT CORRELATION 

To incorporate default dependencies between
firms using structural models, the literature
has essentially relied on natural extensions
of single-firm models, either Merton (1974)-type
models or FPM. We start this section
by reviewing these extensions, under which
the default dependencies between firms are
introduced through correlated asset processes. 32
Giesecke (2004) and Giesecke and Goldberg
(2004) suggest that the default correlation
implied by the use of correlated firms' asset
processes accounts for the dependence of the firms'
credit quality on common macroeconomic factors,
what they term cyclical default correlation,
but it does not account for credit risk contagion
across firms and periods of default clustering.
In order to introduce the contagion correlation
in the model, Giesecke (2004) and Giesecke and
Goldberg (2004) propose a model in which the
firms' default thresholds depend on one
another and are unknown to investors.

After reviewing Giesecke (2004) and Giesecke
and Goldberg (2004) we present factor models,
which express the value of the firms' assets as
a function of several common factors, which
generate the correlation, and an idiosyncratic
factor. 33 Duan et al. (2002) and Hull and White
(2001) present two alternative approaches to
deal with default correlation in structural
models.


Cyclical Default Correlation 

The most natural way to introduce default
dependencies between firms in structural models
is by correlating the firms' asset processes. 34
Suppose we have $i = 1, \ldots, I$ different firms
with asset value processes given by

$$dV_{i,t} = r V_{i,t}\,dt + \sigma_{V_i} V_{i,t}\,dW_{i,t} \qquad (23)$$

for $i = 1, \ldots, I$, where $W_{1,t}, \ldots, W_{I,t}$ are
correlated Brownian motions. As in the single firm
case, these models imply predictable defaults.
One way of getting rid of the default pre¬ 
dictability would be to introduce jump compo¬ 
nents in the firms' asset processes. Those jump 
components could be either correlated or uncor¬ 
related across firms. Correlated jump compo¬ 
nents, besides making defaults unpredictable, 
would also account for credit risk contagion ef¬ 
fects. The main problem lies in the calibration 
of the jump components. 
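
As a rough illustration of equation (23), the following Python sketch simulates the terminal asset values of two firms driven by correlated Brownian motions and counts how often both end below a barrier. The function names, the parameter values, and the choice to check default only at the horizon (as in a Merton-type model, rather than at first passage) are assumptions made for this example and are not taken from the models reviewed above.

```python
import math
import random


def simulate_terminal_assets(v0, r, sigmas, rho, horizon, n_paths, seed=0):
    """Simulate terminal asset values of two firms under equation (23).

    Each asset value is a geometric Brownian motion with risk-neutral
    drift r; the two driving Brownian motions have correlation rho.
    All inputs are hypothetical illustration values.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        # Build two correlated standard normal shocks.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        terminal = []
        for v, s, z in zip(v0, sigmas, (z1, z2)):
            drift = (r - 0.5 * s * s) * horizon
            shock = s * math.sqrt(horizon) * z
            terminal.append(v * math.exp(drift + shock))
        paths.append(tuple(terminal))
    return paths


def joint_default_frequency(paths, barriers):
    """Share of paths in which both firms end below their barrier."""
    both = sum(1 for p in paths if p[0] < barriers[0] and p[1] < barriers[1])
    return both / len(paths)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        sims = simulate_terminal_assets(
            (100.0, 100.0), 0.03, (0.25, 0.25), rho, 1.0, 20000, seed=1
        )
        print(rho, joint_default_frequency(sims, (80.0, 80.0)))
```

Under these assumptions, a higher asset correlation should raise the simulated frequency of joint defaults while leaving each firm's individual default frequency roughly unchanged.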

Contagion Default Correlation 

Cyclical default correlation does not account for
all the credit risk dependence between firms.
Giesecke (2004) and Giesecke and Goldberg
(2004) extend structural models for default
correlation to incorporate credit risk contagion
effects. The default of one firm can trigger the
default of related firms. Furthermore, default
times tend to concentrate in some periods of
time in which the probability of default of all
firms is increased and which cannot be totally,
or even partially, explained by the firms'
common dependence on some macroeconomic
factors.

Contagion effects can arise in this setting by 
direct links between firms in terms of, for exam¬ 
ple, commercial or financial relationships. The 
news about the default of one firm has a big im¬ 
pact on the credit quality of other related firms, 
which is immediately reflected in their default 
probabilities. 

In structural FPM we assume that investors 
have complete information about both asset 
processes and default thresholds, so they al¬ 
ways know the nearness of default for each 
firm, that is, the distance between the ac¬ 
tual level of the firm's assets and its default 
threshold. 35 

Giesecke (2004) and Giesecke and Goldberg
(2004) introduce contagion effects in the model
by relaxing the assumption that investors have
complete information about the default thresholds
of the firms. In Giesecke (2004), bondholders
have perfect information neither
about the thresholds nor about their joint distribution.
However, they form a prior distribution,
which is updated whenever one of the thresholds
is revealed, which only happens when the
corresponding firm defaults. In Giesecke (2004)
investors have incomplete information about
the firms' default thresholds but complete
information about their asset processes. Giesecke
and Goldberg (2004) extend that framework to
one in which investors do not have information
about the firms' asset values or about their
default thresholds. In this case, default correlation
is introduced through correlated asset processes
and, again, investors receive information
about a firm's asset value and default barrier only
when it defaults. Such information is used
to update their priors about the distribution of
the remaining firms' asset values and default
thresholds.

The incomplete information about the level
of the default thresholds and the fact that those
levels are dependent among firms (through a
copula function) are the source of credit
risk contagion. Investors form a belief about
the level of the firms' default thresholds. Each
time one of the firms defaults, the true level of
its default threshold is revealed, and investors
use this new information to update their beliefs
about the default thresholds of the rest of the
firms. This sudden updating of the investors'
perceptions about the default thresholds of the
firms, and thus about the nearness of default for
each firm, introduces the default contagion
effects in the models.

This model allows for the introduction of
default correlation both through dependencies
between firms' asset values (cyclical default
correlation) and through dependencies between
firms' default barriers (contagion effects).

The major problem of this approach is to cali¬ 
brate and estimate the default threshold copula. 
See Giesecke (2003) for some remarks on how 
to choose and calibrate that copula. 


Factor Models 

Factor models consider the firms' asset values
as a function of a group of common factors,
which introduce the default correlation in the
model, plus a firm-specific factor:

$$V_{i,t} = \sum_{j=1}^{J} w_{i,j} Z_{j,t} + \epsilon_{i,t} \qquad (24)$$

where $Z_1, \ldots, Z_J$ represent the common factors,
$\epsilon_1, \ldots, \epsilon_I$ the firm-specific factors (independent
of $Z_1, \ldots, Z_J$), and the correlation structure
is given by the coefficients $w_{i,j}$. Once we know the
realization of the common factors, the firms'
asset values, and thus the firms' defaults,
are independent.

The calibration of factor models is usually car¬ 
ried out by a logit or probit regression, depend¬ 
ing on the assumptions about the distribution of 
the factors. Schonbucher (2000), Finger (1999), 
and Frey, McNeil, and Nyfeler (2001) present 
illustrations of these models. 
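
The conditional independence property of equation (24) can be illustrated with a short Python sketch. It assumes a single standard normal common factor, standard normal firm-specific factors, and weights normalized so that each asset value has unit variance; these simplifications, the function names, and the numerical integration scheme are assumptions made for this example and are not the specification used in the cited papers.

```python
import math


def norm_cdf(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_pd(threshold, w, z):
    """P(V_i < threshold | Z = z) for V_i = w*Z + sqrt(1 - w^2)*eps_i."""
    return norm_cdf((threshold - w * z) / math.sqrt(1.0 - w * w))


def joint_pd(threshold, w, n_grid=2001, z_max=8.0):
    """Probability that two identical firms both default.

    Given Z the defaults are independent, so the conditional joint
    probability is the product of the conditional PDs; integrating over
    the standard normal density of Z (trapezoid rule) gives the
    unconditional joint default probability.
    """
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        z = -z_max + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * density * conditional_pd(threshold, w, z) ** 2
    return total * step


if __name__ == "__main__":
    threshold = -1.645  # hypothetical default threshold
    print("individual PD:", norm_cdf(threshold))
    for w in (0.0, 0.3, 0.6):
        print("w =", w, "joint PD:", joint_pd(threshold, w))
```

With a zero factor loading the joint default probability collapses to the product of the individual default probabilities; a positive loading raises it, which is how the common factor introduces default correlation in this sketch.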


KEY POINTS 

* The structural approach for credit risk mod¬ 
eling considers the link between the credit 
quality of a firm and the firm's economic 
and financial conditions. As a consequence, 
defaults are endogenously generated within 
the models (instead of exogenously given as 
in reduced-form models). By relying on the 
firm's assets and liabilities to model default 
risk, structural models also provide a frame¬ 
work to analyze recovery rates. 

* The structural literature on credit risk started
with the Merton model, which used option
pricing theory for valuing the debt of a
firm. In the Merton model, the firm's capital
structure is composed of equity and a zero-coupon
bond. The firm is assumed to default at the
bond maturity if the value of its assets is below
the face value of the bond.

* The structural modeling approach has mainly 
developed by relaxing the strict assumptions 
of the Merton model, generating more real¬ 
istic models, which take into account differ¬ 
ent characteristics of firms' capital structure, 
bankruptcy laws, macro variables, and so on. 

* Structural models include first passage mod¬ 
els, liquidation process models, and state- 
dependent models. In first passage models 
a default occurs the first time the firm's as¬ 
set value goes below a certain lower thresh¬ 
old (related to the firm's level of debt). These 
models assume that the firm is liquidated 
immediately after the default event. Liqui¬ 
dation process models extend first passage 
models by taking into account the fact that 
firms that file for bankruptcy may avoid liq¬ 
uidation. Finally, state-dependent models as¬ 
sume that some of the parameters governing 
the firm's ability to generate cash flows or its 
funding costs depend on variables such as the 
business cycle (recession vs. expansion) or the 
firm's external rating. 

* There are several ways to account for default
correlation within the structural approach.
Cyclical default correlation and factor
models consider the dependence of firms'
credit quality on common macroeconomic
factors. Contagion models include the dependence
of firms' credit quality on other firms'
credit quality.


NOTES 

1. For a review of reduced form models, see 
Entry 22. 

2. See, for example, Leland (1994) and Leland 
and Toft (1996). 

3. See Black and Cox (1976), Geske (1977), Le¬ 
land (1994), and Leland and Toft (1996). 

4. See Ronn and Verma (1986), Kim, Ramaswamy,
and Sundaresan (1993), Nielsen
et al. (1993), Longstaff and Schwartz (1995),
Briys and de Varenne (1997), and Hsu, Saa-Requejo,
and Santa-Clara (2004).

5. We reproduce here an updated list of extensions
and improvements within the literature
of structural models provided by Eom,
Helwege, and Huang (2003, p. 500): "See,
for example, Black and Cox (1976), Briys
and De Varenne (1997), Goldstein, Ju, and
Leland (2001), Ho and Singer (1982), Kim,
Ramaswamy, and Sundaresan (1993), Leland
(1994, 1998), Nielsen, Saa-Requejo, and
Santa-Clara (1993), and Titman and Torous
(1989). Anderson and Sundaresan (1996)
and Mella-Barral and Perraudin (1997) incorporate
strategic defaults into traditional
structural models. See also Acharya and
Carpenter (2002), Acharya et al. (2000), Anderson,
Sundaresan, and Tychon (1996), Fan
and Sundaresan (2000), and Huang (1997).
Duffie and Lando (2001) take into account
incomplete accounting information. Garbade
(1999) examines managerial discretion.
Huang and Huang (2002) and Zhou
(2001) incorporate jumps."

6. See Zhou (2001b) and Giesecke (2004). 

7. See Giesecke (2004) and Giesecke and Gold¬ 
berg (2004). 


8. See Schonbucher (2000), Finger (1999), and 
Frey, McNeil, and Nyfeler (2001). 

9. For our purposes we shall use the class of 
equivalent probability measures P where 
nondividend-paying asset processes dis¬ 
counted with the risk-free interest rate are 
P-martingales. 

10. Since we are working under the risk-neutral
probability measure, the drift term of the
asset value process is given by the risk-free
instantaneous interest rate. Under the
physical probability measure, $r$ would be replaced
by a parameter $\mu_V$ representing the
mean rate of return on assets, and the firm's
asset process would be given by

$$dV_t = \mu_V V_t\,dt + \sigma_V V_t\,dW_t$$

where $W_t$ is a Brownian motion under the
physical probability measure P.

11. Since shareholders finance each coupon by
issuing new equity, the dilution effect reduces
the relative value of each share.

12. See Jones et al. (1984) and Franks and Torous 
(1989). 

13. See Iori (2003) and Chapter 3.1 in Jeanblanc 
and Rutkowski (2000). 

14. For an extensive review of FPM, see Chap¬ 
ter 3 in Bielecki and Rutkowski (2002) and 
references therein. 

15. See, for example, Mello and Parsons (1992), 
Nielsen et al. (1993), Leland (1994), Ander¬ 
son and Sundaresan (1996), Leland and Toft 
(1996), Mella-Barral and Perraudin (1997), 
and Francois and Morellec (2004). 

16. For a discussion of strategic debt service, see 
Mella-Barral and Perraudin (1997), Fan and 
Sundaresan (2000), and references therein. 

17. See Black and Cox (1976), Leland (1994),
and Leland and Toft (1996) for models with
constant interest rates, and see Kim, Ramaswamy,
and Sundaresan (1993), Nielsen
et al. (1993), Longstaff and Schwartz
(1995), Briys and de Varenne (1997), Collin-Dufresne
and Goldstein (2001), and Hsu,
Saa-Requejo, and Santa-Clara (2004) for
models with stochastic interest processes.

18. See Anderson and Sundaresan (2000), Eom, 
Helwege, and Huang (2003), and Ericsson 
and Reneby (2004). 

19. See Jones et al. (1984), Franks and Torous 
(1989), Sarig and Warga (1989), Fons (1994), 
Huang and Huang (2003), and Leland 
(2004). 

20. Elizalde (2005a) presents a review of struc¬ 
tural models that appeared in the literature, 
which consider incomplete information as¬ 
sumptions and bridge the gap between the 
structural and the reduced approach. 

21. At any given point in time the firms' own¬ 
ers face a decision of whether to liquidate 
the firm, or to maintain the status quo by 
continuing operations under the current 
regime, also referred to as a going concern. 

22. See, for example, Jones et al. (1984), Ronn 
and Verma (1986), Eom et al. (2000), 
Delianedis and Geske (2003), and Ericsson 
and Reneby (2005). 

23. Crosbie and Bohn (2003) point out that this
equation holds only instantaneously, and
that in practice, market leverage, which
would be represented here by the ratio of
asset value to equity value, $V/E$,
moves around far too much for that equation
to provide reasonable results.

24. Ronn and Verma (1986) extend the estimation
to the cases of nonstationary $\sigma_E$ and
stochastic interest rates.

25. For a complete description of this method 
see Duan (1994), Duan et al. (2002), and 
Ericsson and Reneby (2005), who also 
present a comparison with the traditional 
method. 

26. Eom et al. (2000) suggest another procedure
to estimate $\sigma_V$, which they term the
bond-implied volatility method.

27. In fact, we simply called such an occurrence
default. In what follows we shall use the
terms default and liquidation with the same
meaning (different from default event!). A
default event starts the process of liquidation.
The process of liquidation has two possible
endings: liquidation (or default) and
reorganization (which happens when the
firm manages to improve its financial health
and avoid closure).

28. See also Franks and Torous (1989) and Gilson
(1997).

29. See also Kahl (2001) and Morrison (2003). 

30. Successfully managed means that such liq¬ 
uidation periods did not last longer than d 
and, as a consequence, did not trigger liq¬ 
uidation. 

31. Manso, Strulovici, and Tchistyi (2004) 
present an alternative derivation of ratings- 
based step-up bonds using structural mod¬ 
els, and review the existing literature. 

32. See Zhou (2001b). 

33. See Schonbucher (2000), Finger (1999), and 
Frey, McNeil, and Nyfeler (2001). 

34. See Zhou (2001b). 

35. We are not considering here jump components
in the dynamics of the asset
processes.


Abstract: Modeling credit risk is more challenging than modeling market risk. Some of these challenges
relate to the differences in the conceptual approaches used for modeling credit risk and the
data limitations associated with the estimation of key model parameters. Hence, there is invariably
a subjective element to the modeling of credit risk. A better understanding of these subjective
elements can help practitioners to exercise sound judgment and to raise the right questions when
trying to interpret the statistical outputs provided by credit risk models.


This entry describes the building blocks for
modeling credit risk. Key elements of the building
blocks include the probability of default of the
issuer; the recovery rate in the event of issuer
default; and the probabilities of migrating to
different credit rating states. Various techniques
that can be employed to estimate the probability
of issuer default, including their relative merits
and limitations, are then discussed. Subsequently,
the common approaches to quantifying
credit risk are introduced. These include the
default mode paradigm, which considers default
and no default as two states of the world; and
the migration mode paradigm, which includes
migrations to other credit rating categories,
including the default state. The entry concludes
with a numerical example to illustrate the various
concepts presented.


ELEMENTS OF CREDIT RISK 

Credit risk is the risk that a borrower will be
unable to make payment of interest or principal
in a timely manner. Under this definition, a
delay in repayments, restructuring of borrower
repayments, and bankruptcy, which constitute
default events, will fall under credit risk. In
addition to this, the mark-to-market loss of a bond
resulting from a change in the market perception
of the issuer's ability to service the debt in future
is also attributed to credit risk. This manifests
itself in the form of a widening of the credit
spread of the security in question against a risk-free
asset, such as a Treasury bond, of similar
maturity. The fluctuations in the credit spread
between the two securities reflect views on the
intrinsic creditworthiness of the issuer of the
defaultable security.


The views expressed here are those of the author and not necessarily those of the Bank for International
Settlements.

The key determinants of credit risk at the security
level include probability of default (PD) of
the issuer, that is, the probability that the issuer
will default on its contractual obligations
to repay its debt; recovery rate given that the
issuer has defaulted; and rating migration probabilities,
that is, the extent to which the credit
quality of the issuer improves or deteriorates as
expressed by a change in the probability of
default of the issuer. The following sections
discuss in greater detail these determinants of
credit risk for corporate issuers, and wherever
relevant, methods commonly employed to estimate
them will be indicated.

Probability of Default 

Assessments about an issuer's ability to service 
debt obligations play a fundamental role in es¬ 
tablishing the level of credit risk embedded in 
a security. This is usually expressed through 
the default probability that quantifies the likeli¬ 
hood of the issuer not being able to service the 
debt obligations. Since probability of default is 
a function of the time horizon over which one 
measures the debt servicing ability, it is stan¬ 
dard practice to assume a one-year time horizon 
to quantify this. 

In general, the approaches used to determine 
default probabilities at the issuer level fall into 
two broad categories. The first is empirical in 
nature and requires the existence of a pub¬ 
lic credit-quality rating scheme. The second is 
based on Merton's options theory framework 
(Merton, 1974), and hence, is a structural ap¬ 
proach. The empirical approach to estimating 
PD makes use of a historical database of cor¬ 
porate defaults to form a static pool of com¬ 
panies having a particular credit rating for a 
given year. Annual default rates are then cal¬ 
culated for each static pool, which are then ag¬ 
gregated to provide an estimate of the average 
historical default probability for a given credit 
rating. If one uses this approach, then the default
probabilities for any two issuers having
the same credit rating will be identical. The option
pricing approach to estimating default probability
makes use of the current estimates of
the firm's assets, liabilities, and asset volatility,
and hence is related to the dynamics of the
underlying structure of the firm. Each of these
approaches is discussed below in greater detail.

Empirical Approach 

The empirical approach to determining probability
of default is taken by major rating agencies
that include Moody's Investors Service,
Standard & Poor's Corporation, and Fitch Ratings.
The rating agencies assign credit ratings
to different issuers on the basis of extensive
analysis of both the quantitative and qualitative
performance of a firm, which is intended
to capture the level of credit risk. (How credit
ratings are assigned is not discussed in this entry.)
To illustrate the empirical
approach to determining default probabilities
for different credit ratings, we will discuss
Moody's methodology.

Moody's rating symbols (Aa, A, Baa, etc.) for 
issuer ratings are opinions of the ability of the 
issuer to honor senior unsecured financial obli¬ 
gations and contracts denominated in foreign 
and/or domestic currency. The rating grada¬ 
tions provide bondholders with a simple sys¬ 
tem to measure an issuer's ability to meet its 
senior financial obligations. 

In addition to the generic rating categories,
Moody's applies numerical modifiers 1, 2, and
3 for the rating categories from Aa to Caa. The
modifier 1 indicates that the issuer is in the
higher end of its letter-rating category; the modifier
2 indicates a mid-range ranking; the modifier
3 indicates that the issuer is in the lower
end of the letter-rating category. It is customary
to refer to a rating change from grade Aa1
to Aa2 as a one-notch rating downgrade. Bonds
issued by firms rated from Aaa to Baa are
referred to as investment-grade bonds and the
rest as non-investment-grade bonds.

It is important to emphasize here that
Moody's ratings incorporate assessments of
both the likelihood and the severity of default.
Considering that a particular issuer could have
debt issues with different collateral and seniority,
Moody's approach will lead to different debt
issues of a particular issuer having different ratings.
However, when an issuer is deemed to
have defaulted on a particular debt issue, cross
default clauses will require all outstanding debt
of the issuer to be considered as having defaulted.
This in turn brings us to the following
question: What events signal the default of an
issuer? Moody's definition of default considers
three types of default events:

1. There is a missed or delayed disbursement of
interest and/or principal, including delayed
payments made within a grace period.

2. An issuer files for bankruptcy or legal
receivership occurs.

3. A distressed exchange occurs where (1) the
issuer offers bondholders a new security or
package of securities that amount to a diminished
financial obligation, or (2) the exchange
had the apparent purpose of helping the borrower
avoid default.

One may note here that the above defini¬ 
tions of default are meant to capture events 
that change the relationship between the bond¬ 
holder and bond issuer, which subjects the 
bondholder to an economic loss. 

The empirical approach to determining prob¬ 
ability of default relies on historical defaults of 
various rated issuers. This requires forming a 
static pool of issuers with a given rating ev¬ 
ery year and computing the ratio of defaulted 
issuers after a one-year period to the number 
of issuers that could have potentially defaulted 
for the given rating. If, during the year, ratings 
for certain issuers are withdrawn, then these is¬ 
suers are subtracted from the potential number 
of issuers who could have defaulted in the static 
pool. Specifically, the one-year default rates for 
A-rated issuers during a given year represent 
the number of A-rated issuers that defaulted 
over the year divided by the number of A-rated 
issuers that could have defaulted over that year. 
Annual default rates calculated in this manner
for each rating grade are then aggregated to
provide an estimate of the average historical
default probability for a given rating grade.
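
The static pool calculation described above can be sketched in a few lines of Python. The pool figures below are invented purely for illustration, and averaging the annual rates with equal weights is an assumption of this sketch; the text does not specify how the annual rates are aggregated, and a rating agency may weight cohorts differently.

```python
def annual_default_rate(n_start, n_defaulted, n_withdrawn):
    """One-year default rate for a static pool of issuers with one rating.

    Issuers whose ratings were withdrawn during the year are subtracted
    from the number of issuers that could have defaulted.
    """
    at_risk = n_start - n_withdrawn
    if at_risk <= 0:
        raise ValueError("no issuers left that could have defaulted")
    return n_defaulted / at_risk


def average_default_rate(pools):
    """Equal-weighted average of annual default rates across static pools.

    Each pool is a tuple (n_start, n_defaulted, n_withdrawn); the equal
    weighting is an assumption made for this illustration.
    """
    rates = [annual_default_rate(*pool) for pool in pools]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical static pools of A-rated issuers for three years.
    a_rated_pools = [(1000, 2, 50), (1200, 0, 60), (1100, 1, 40)]
    for pool in a_rated_pools:
        print(pool, annual_default_rate(*pool))
    print("average historical PD:", average_default_rate(a_rated_pools))
```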

We mentioned earlier in this entry that 
although different debt issues of a particular 
issuer could have different ratings assigned de¬ 
pending on the seniority of the issue, cross de¬ 
fault clauses will require all outstanding debt of 
a particular issuer to default at the same time. 
This raises an important question when managing
corporate bond portfolios, namely, whether
the issuer rating or the rating of the bond issue is
to be considered when inferring the probability
of default. The short answer to this question is
that it depends on how credit risk will be quantified
for the given bond. The approach taken
here to quantify bond-level credit risk requires 
that the credit rating of the bond issuer is the 
one to be used. This will be evident when we 
discuss the quantification of credit risk at the 
bond level. 

Merton's Approach 

Merton's approach to estimating the probability
of default of a firm builds on the limited liability
rule that allows shareholders to default on
their obligations while surrendering the firm's
assets to the creditors. In this framework, the
firm's liabilities are viewed as contingent claims
on the assets of the firm, and default occurs at
debt maturity when the firm's asset value falls
below the debt value. Assuming that the firm
is financed by means of equity $S_t$ and a single
zero-coupon debt maturing at time $T$ with face
value $F$ and current market value $B_t$, the firm's
assets at time $t$ can be represented as

$$A_t = S_t + B_t \qquad (1)$$

The probability of default in Merton's framework
for the firm will be the probability that
the value of the firm's assets at time $T$ is less
than the face value of the debt, which is given by

$$PD = \text{prob}[A_T < F] \qquad (2)$$

In order to determine PD in Merton's framework,
we need to select a suitable model for the
process followed by $A_t$. The standard assumption is
to postulate that $A_t$ follows a log-normal process
with growth rate $\mu$ and asset volatility $\sigma_A$,
which is given below:

$$A_t = A_0 \exp\left[(\mu - 0.5\sigma_A^2)t + \sigma_A\sqrt{t}\,z_t\right] \qquad (3)$$

In equation (3), $z_t$ is a normally distributed random
variable with zero mean and unit variance.
Using equation (3) in conjunction with equation
(2), we can denote the PD as

$$PD = \text{prob}\left[\ln A_0 + (\mu - 0.5\sigma_A^2)T + \sigma_A\sqrt{T}\,z_T < \ln F\right] \qquad (4)$$


In equation (4) we have taken logarithms on
both sides of the inequality, since doing so does
not change the probabilities. Rearranging the
terms in equation (4), the probability of default
for the firm can be represented as

$$PD = \text{prob}\left[z_T < -\frac{\ln(A_0/F) + (\mu - 0.5\sigma_A^2)T}{\sigma_A\sqrt{T}}\right] \qquad (5)$$

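Since $z_T$ is a normally distributed random
variable, PD can be represented as

$$PD = N(-D) \qquad (6)$$

where

$$D = \frac{\ln(A_0/F) + (\mu - 0.5\sigma_A^2)T}{\sigma_A\sqrt{T}} \qquad (7)$$

$$N(-D) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-D} \exp\left(-\tfrac{1}{2}x^2\right)dx \qquad (8)$$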
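
Equations (6) to (8) translate directly into a short calculation. The Python sketch below evaluates the distance to default and the resulting PD for given inputs; the function names and the numerical inputs are hypothetical and chosen only to show the mechanics, and the cumulative normal distribution is computed from the error function in the standard library.

```python
import math


def norm_cdf(x):
    """Cumulative standard normal distribution function, N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def distance_to_default(a0, face, mu, sigma_a, horizon):
    """Distance to default D as in equation (7)."""
    numerator = math.log(a0 / face) + (mu - 0.5 * sigma_a ** 2) * horizon
    return numerator / (sigma_a * math.sqrt(horizon))


def merton_pd(a0, face, mu, sigma_a, horizon):
    """Probability of default PD = N(-D) as in equation (6)."""
    return norm_cdf(-distance_to_default(a0, face, mu, sigma_a, horizon))


if __name__ == "__main__":
    # Hypothetical firm: assets 120, debt face value 100, one-year horizon.
    d = distance_to_default(120.0, 100.0, 0.06, 0.25, 1.0)
    print("distance to default:", d)
    print("probability of default:", merton_pd(120.0, 100.0, 0.06, 0.25, 1.0))
```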


In equation (7), $D$ represents the distance to
default, which is the distance between the logarithm
of the expected asset value at maturity
and the logarithm of the default point normalized
by the asset volatility.

Although Merton's framework for determining
PD for issuers is rather simple, applying this
directly in practice runs into difficulties. This is
because firms seldom issue zero-coupon bonds
and usually have multiple liabilities. Furthermore,
firms in distress may be able to draw on
lines of credit to honor coupon and principal
payments, resulting in a maturity transformation
of their liabilities.

To resolve these difficulties Moody's KMV
suggests some modifications to Merton's framework
to make the default probability estimate
meaningful in a practical setting (see Crosbie
and Bohn, 2002). (Moody's KMV refers to
probability of default as expected default frequency
or EDF™.) For instance, rather than
using face value of the debt to denote the default
point, Moody's KMV suggests using the
sum of the short-term liabilities (coupon and
principal payments due in less than one year)
and one-half of the long-term liabilities. This
choice is based on the empirical evidence that
firms default when their asset value reaches a
level that is somewhere between the value of total
liabilities and the value of short-term liabilities.
Further, since the asset returns of the firms
may in practice deviate from a normal distribution,
Moody's KMV maps the distance-to-default
variable to a historical default statistics
database to estimate the probability of default.
In the KMV framework, default probabilities
for issuers can take values in the range between
0.02% and 20%.

To illustrate the KMV approach, let DPT denote
the default point, which is equal to the sum
of the short-term liabilities due in less than one
year and one-half of the long-term liabilities,
and $E(A_T)$ the expected value of the firm's assets
one year from now. Then the distance to
default is given by

$$D = \frac{\ln\left(E(A_T)/DPT\right)}{\sigma_A\sqrt{T}} = \frac{\ln(A_0/DPT) + (\mu - 0.5\sigma_A^2)T}{\sigma_A\sqrt{T}} \qquad (9)$$

In equation (9), the market value of the firm's
assets is not observed since liabilities of the firm
are not traded. What can be observed in the
market is the equity value of the firm because it
is traded. Since the value of the firm's equity at
time $T$ can be seen as the value of a call option
on the assets of the firm with a strike price equal
to the book value of the liabilities, we have the
following equation:

$$S_T = A_T \times N(d_1) - e^{-rT} \times DPT \times N(d_2) \qquad (10)$$

In equation (10), $N(\cdot)$ is the cumulative standard
normal distribution function, $r$ is the risk-free
interest rate, and the variables $d_1$ and $d_2$ are given
by

$$d_1 = \frac{\ln(A_T/DPT) + \left(r + \tfrac{1}{2}\sigma_A^2\right)T}{\sigma_A\sqrt{T}} \qquad (11)$$

$$d_2 = d_1 - \sigma_A\sqrt{T} \qquad (12)$$

It is possible to show that equity and as¬ 
set volatility are related through the following 
relation: 

a s = ^ x N(di) x a A (13) 

b T 

From this relation it is possible to solve for the 
asset value and asset volatility, given the equity 
value and equity volatility using an iterative 
procedure. Knowing the asset volatility and as¬ 
set value, it is possible to compute the distance 
to default using equation (9) from which prob¬ 
ability of default can be inferred. 
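
The iterative procedure mentioned above can be sketched in a few lines. The scheme below is only one possible choice (a fixed-point update of the asset volatility combined with bisection on the option-pricing equation) and is not claimed to be the procedure used by KMV. All function names are hypothetical, the risk-free rate is assumed to be non-negative so that the bisection bracket is valid, and the asset drift `mu` must be supplied by the user.

```python
import math

def norm_cdf(x):
    """Cumulative standard normal distribution N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_d2(asset, sigma_a, dpt, r, horizon):
    """d1 and d2 of the option-pricing relation for equity."""
    vol = sigma_a * math.sqrt(horizon)
    d1 = (math.log(asset / dpt) + (r + 0.5 * sigma_a ** 2) * horizon) / vol
    return d1, d1 - vol

def equity_value(asset, sigma_a, dpt, r, horizon):
    """Equity as a call option on the assets with strike DPT."""
    d1, d2 = d1_d2(asset, sigma_a, dpt, r, horizon)
    return asset * norm_cdf(d1) - math.exp(-r * horizon) * dpt * norm_cdf(d2)

def solve_asset_value(equity, sigma_a, dpt, r, horizon):
    """Bisection for the asset value, holding the asset volatility fixed."""
    lo, hi = equity, equity + dpt  # brackets the root when r >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if equity_value(mid, sigma_a, dpt, r, horizon) < equity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implied_assets(equity, sigma_s, dpt, r, horizon, tol=1e-10, max_iter=500):
    """Back out asset value and asset volatility from equity value and volatility."""
    sigma_a = sigma_s * equity / (equity + dpt)  # starting guess (assumption)
    asset = equity + dpt
    for _ in range(max_iter):
        asset = solve_asset_value(equity, sigma_a, dpt, r, horizon)
        d1, _ = d1_d2(asset, sigma_a, dpt, r, horizon)
        # Volatility relation solved for the asset volatility
        new_sigma = sigma_s * equity / (asset * norm_cdf(d1))
        if abs(new_sigma - sigma_a) < tol:
            return asset, new_sigma
        sigma_a = new_sigma
    return asset, sigma_a

def distance_to_default(asset, sigma_a, dpt, mu, horizon):
    """Distance to default for a given asset drift mu."""
    vol = sigma_a * math.sqrt(horizon)
    return (math.log(asset / dpt) + (mu - 0.5 * sigma_a ** 2) * horizon) / vol

# Hypothetical inputs: equity value 32, equity volatility 60%, default point 70
asset, sigma_a = implied_assets(32.0, 0.60, 70.0, 0.03, 1.0)
print(asset, sigma_a, distance_to_default(asset, sigma_a, 70.0, 0.05, 1.0))
```

A simple sanity check is a round trip: price the equity from known asset inputs, then confirm that the routine recovers those inputs.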

Relative Merits 

The empirical and structural approaches to de¬ 
termine the probability of default for issuers 
can result in significant differences in the es¬ 
timates of PD. Both approaches have their 
relative advantages and disadvantages. For 
instance, the empirical approach has the im¬ 
plicit assumption that all issuers having the 
same credit rating will have identical PD. Fur¬ 
thermore, this default probability will be equal 
to the historical average rate of default. Use of 
the structural approach, on the other hand, will 
result in PD being more responsive to changes 
in economic conditions and business cycles as 
it incorporates current estimates of the asset 
value and asset volatility of the firm in deriv¬ 
ing this information. One drawback, however, 
is that the historical database of defaulted firms 
is comprised mostly of industrial corporates. 
As a consequence, use of the industrial corpo¬ 


rate default database to infer the PD of regu¬ 
lated financial firms could potentially result in 
biased PD estimates. Seen from a trading per¬ 
spective, credit spreads for corporates tend to 
be influenced much more by agency ratings and 
credit rating downgrades rather than EDF val¬ 
ues. This has the implication that bond market 
participants tend to attach greater significance 
to rating agency decisions for pricing. For the 
purpose of modeling portfolio credit risk and 
selecting an optimal corporate bond portfolio 
to replicate the benchmark risk characteristics, 
we will demonstrate the usefulness of both ap¬ 
proaches in the entries to follow. 

On Rating Outlooks 

Rating agencies provide forward-looking as¬ 
sessment of the creditworthiness of issuers over 
the medium term. Such forward-looking credit 
assessments of issuers are referred to as rating 
outlooks. Outlooks assess the potential direc¬ 
tion of an issuer's rating or creditworthiness 
over the next six months to three years. A posi¬ 
tive outlook suggests an improvement in credit 
rating, a negative outlook indicates deteriora¬ 
tion in credit rating, and a stable outlook sug¬ 
gests a rating change is less likely to happen. 
Bond prices tend to react to changes in rating 
outlook although no actual change in credit rat¬ 
ing has occurred. In particular, the impact on 
prices is much more significant if the issuer is 
Baa since a rating downgrade can result in the 
issuer being rated non-investment grade. Furthermore,
if a particular sector (such as Telecom)
has a negative rating outlook, a
change in rating outlook from stable to nega¬ 
tive for an issuer in this sector can also have a 
significant effect on bond prices. 

The above observations raise the following 
important question: Should a negative or a pos¬ 
itive rating outlook for a given issuer be in¬ 
corporated in our assessment of PD through a 
downgrade or upgrade before it has actually 
happened? The short answer to this question 
is no, primarily because our estimate of credit 
risk will incorporate the probability that credit
rating of issuers can change over time. Forcing
a rating change for the issuer before it has actu¬ 
ally happened may tend to bias our estimate of 
credit risk. 

Captive Finance Companies 
Large companies in most industrial sectors 
have captive finance subsidiaries. The princi¬ 
pal function of any financial subsidiary is to 
support the sales of the parent's products. This 
function can make the finance company a crit¬ 
ical component of the parent's long-term busi¬ 
ness strategy. In light of this close relationship 
between the captive finance company and its 
parent, credit ratings for both are usually iden¬ 
tical. However, if the legal clauses guarantee 
that the parent company's bankruptcy does 
not automatically trigger the bankruptcy of 
the financial subsidiary, rating differences may 
exist between the parent company and its finan¬ 
cial subsidiary. For the purpose of quantifying 
credit risk, we will use the actual credit rating 
of the financial subsidiary in the calculations. 

Estimating the probability of default of fi¬ 
nancial subsidiaries on the basis of Merton's 
structural model can lead to difficulties. This is 
because the equity of the financial subsidiary 
may not be traded. For example, Ford Motor
is traded whereas the financial subsidiary Ford 
Credit is not traded. Considering that the fi¬ 
nancing arm of major industrial corporates is 
vital to the survival of both the parent and the 
subsidiary, one can argue that the equity market 
takes this relationship into account when valu¬ 
ing the parent company. Under this argument, 
one can assign the same probability of default 
to both companies where only one of them is 
traded in the market. 

Recovery Rate In the event of default, bond¬ 
holders will not receive all of the promised 
coupon and principal payments on the bond. 
Recovery rate for a bond, which is defined as 
the percentage of the face value that can be 
recovered in the event of default, will be of 
natural interest to investors. Considering that 


credit market convention is to ask how much 
of promised debt is lost rather than how much 
of it is recovered, the term "loss given default" 
(LGD), which is defined as one minus recovery 
rate, is also commonly used in the credit risk 
literature. 

In general, estimating the recovery value of 
the bond in the event of default is rather com¬ 
plex. This is because the payments made to 
bondholders could take the form of a combi¬ 
nation of equity and derivative securities, new 
debt, or modifications to the terms of the sur¬ 
viving debt. Considering that there may be no 
market for some forms of payments, it may not 
be feasible to measure the recovery value. More¬ 
over, the amount recovered could take several 
months or even years to materialize and could 
potentially also depend on the relative strength 
of the negotiating positions. As a result, esti¬ 
mating historical averages of amounts recov¬ 
ered from defaulted debt will require making 
some simplifying assumptions. 

Moody's, for instance, proxies the recovery rate
with the secondary market price of the de¬ 
faulted instrument approximately one month 
after the time of default. The motivation for 
such a definition is that many investors may 
wish to trade out of defaulted bonds, and a 
separate investor clientele may acquire these 
and pursue the legal issues related to recover¬ 
ing money from defaulted debt instruments. In 
this context, Moody's recovery rate proxy can 
be interpreted as a transfer price between these 
two investor groups. 

Empirical research on recovery rates suggests 
that industrial sector, seniority of the debt, state 
of the economy, and credit rating of the is¬ 
suer one year prior to default are variables that 
have significant influence on potential recovery 
rates. For example, during periods of economic 
downturns, the recovery rate is usually lower 
relative to historical averages. This has the im¬ 
plication that there is also a time dimension to 
the potential recovery rates. Differences in re¬ 
covery rates for defaulted debt across industry 
sectors arise because the recovery amount will
depend on the net worth of the firm's tangible
assets. For instance, firms belonging to indus¬ 
trial sectors with physical assets such as public 
utilities have higher recovery rates compared 
to the industry-wide average. Empirical results 
also tend to suggest that issuers that were rated 
investment grade one year prior to default tend 
to have higher recovery values compared to is¬ 
suers that were rated non-investment grade. 

In order to incorporate the variations in the 
observed recovery rates over time and between 
issuers when quantifying credit risk, the standard
deviation of recovery rates, denoted $\sigma_{RR}$, is
taken into account. Including the uncertainty in
recovery rates will have the effect of increasing 
credit risk at the issuer level. Common practice 
is to use beta distribution to model the observed 
variations in recovery rates. The advantage of 
choosing beta distribution is that it has a simple 
functional form dependent on two parameters 
that allows for high recovery rate outliers ob¬ 
served in the empirical data to be modeled. The 
beta distribution has support on the interval 0 
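to 1, and its density function is given by,

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (14)$$

where $\alpha > 0$, $\beta > 0$, and $\Gamma(\cdot)$ is the gamma function.
The mean and variance of the beta distribution
are given by,

$$\mu = \frac{\alpha}{\alpha + \beta} \qquad (15)$$

$$\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \qquad (16)$$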


Table 1 shows the empirical estimates of re¬ 
covery rates on defaulted securities covering 


the period 1978 to 2001 based on prices at time of 
default. One can notice that senior secured debt 
recovers on average 53% of the face value of the 
debt whereas senior unsecured debt recovers 
only around 42% of face value. The standard 
deviation of the recovery rates for all seniority 
classes is roughly around 25%. 

The empirical estimates for average recov¬ 
ery rates tend to vary somewhat depending on 
the data set used and the recovery rate definition.
For instance, the study by Moody's using
defaulted bond data covering the period
1982-2008 suggests that the mean recovery rate
is 53% for senior secured bonds, 32.4% for senior
unsecured bonds, and 23.5% for subordinated
bonds.

In the numerical examples to be presented 
in this entry, we have assumed that the bonds 
under consideration are senior unsecured debt. 
Furthermore, we have assumed that the stan¬ 
dard deviation of the recovery rate is 25% and 
the average recovery rate is 35%, which is closer 
to Moody's estimate incorporating more recent 
default data. 
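
To use these assumptions with the beta distribution, the mean and standard deviation have to be converted into the two shape parameters. A minimal sketch, obtained by inverting equations (15) and (16) (the method of moments), is given below; the function name is hypothetical and the sampling step is included only as an illustration.

```python
import random

def beta_parameters(mean, std):
    """Shape parameters (alpha, beta) matching a given mean and standard deviation."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    variance = std ** 2
    if variance >= mean * (1.0 - mean):
        raise ValueError("variance is too large for a beta distribution")
    # From (15) and (16): alpha + beta = mean * (1 - mean) / variance - 1
    total = mean * (1.0 - mean) / variance - 1.0
    return mean * total, (1.0 - mean) * total

# Average recovery rate 35% and standard deviation 25%, as assumed in the text
alpha, beta = beta_parameters(0.35, 0.25)
print(alpha, beta)

# Illustrative recovery-rate draws from the fitted distribution
rng = random.Random(0)
sample = [rng.betavariate(alpha, beta) for _ in range(5)]
print(sample)
```

For a mean of 35% and a standard deviation of 25% the shape parameters work out to roughly 0.92 and 1.72. Because the first parameter is below one, the fitted density is unbounded near zero recovery, which is a property of this particular parameter choice rather than of the beta family in general.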

Rating Migrations The framework for assessing
the issuer's PD consisted of estimating
the probability associated with the issuer
defaulting on its promised debt payments. In this
framework, the issuer is considered to be in one 
of two states: its current rating or the default 
state. In practice, default is just one of many 
states to which the issuer's rating can transi¬ 
tion. Actions of rating agencies can result in the 
issuer's rating being downgraded or upgraded 
by one or several notches. One can associate 
the concept of a state with each rating grade 


Table 1 Recovery Rate Statistics on Defaulted Securities (1978-2001)

Bond Seniority         Number of Issuers    Median     Mean      Standard Deviation
Senior secured         134                  57.42%     52.97%    23.05%
Senior unsecured       475                  42.27%     41.71%    26.62%
Senior subordinated    340                  31.90%     29.68%    24.97%
Subordinated           247                  31.96%     31.03%    22.53%

Source: Altman, Resti, and Sironi (2001).

so that rating actions result in the transition to
one of several states. Each rating action can be 
viewed as a credit event that changes the per¬ 
ceived probability of default of the issuer. In the 
credit risk terminology such a multistate credit 
event process is described as credit or rating mi¬ 
gration. Associated with rating migrations are 
transition probabilities, which model the rela¬ 
tive frequency with which such credit events 
occur. 

Modeling the rating migrations process will 
require estimating a matrix of transition prob¬ 
abilities, which is referred to as the rating tran¬ 
sition matrix. Each cell in the one-year rating 
transition matrix corresponds to the probabil¬ 
ity of an issuer migrating from one rating state 
to another over the course of a 12-month hori¬ 
zon. Mathematically speaking, a rating transi¬ 
tion matrix is a Markov matrix, which has the 
property that the sum of all cells in any given 
row of the matrix is equal to one. Incorporating 
rating migrations into the credit risk-modeling 
framework provides a much richer picture of 
changes in aggregate credit quality of the issuer. 
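
The row-sum property is easy to verify numerically. The sketch below uses a hypothetical three-state matrix (investment grade, speculative grade, default) rather than actual agency data. It also shows how a two-year matrix would follow from squaring the one-year matrix, under the additional assumption, not made explicitly in the text, that transition probabilities are constant over time and depend only on the current rating.

```python
def is_markov_matrix(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to one."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

def multiply(a, b):
    """Plain matrix product of two lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]

# Hypothetical one-year matrix; the last state (default) is absorbing
one_year = [
    [0.95, 0.04, 0.01],
    [0.05, 0.85, 0.10],
    [0.00, 0.00, 1.00],
]

two_year = multiply(one_year, one_year)
print(is_markov_matrix(one_year), is_markov_matrix(two_year))
print(two_year[0][2])  # two-year default probability of the first state
```

In this illustration the two-year default probability of the first state combines direct default in either year with a downgrade followed by default.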

The techniques used to estimate transition 
probabilities are similar in principle to the es¬ 
timation of probability of default. For instance, 
computing the one-year transition probability 


from the rating Aa1 to Baa1 will require first
determining the number of issuers that are rated
Baa1 and that had an Aa1 rating one year earlier.
Dividing this number by the total number
of issuers that were rated Aa1 during the
previous year will give us the one-year transition
probability between these two ratings.
Again, if the ratings of some Aa1 issuers are
withdrawn during the one-year period of interest
to us, then the total number of Aa1 issuers
is reduced by this number. Annual transition
probabilities calculated in this manner are then 
aggregated over a number of years to estimate 
the average historical transition probability. 
Table 2 shows an example rating transition ma¬ 
trix, with the transition probabilities expressed 
in percentages. 
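
The counting procedure just described can be sketched as follows. The observations are hypothetical, the function names are illustrative, and the aggregation across years is taken to be a simple unweighted average, which is an assumption since the text does not specify the weighting.

```python
from collections import Counter

def cohort_transition_probabilities(observations, start_rating):
    """One-year transition probabilities out of start_rating.

    observations: (rating at start of year, rating at end of year) pairs;
    an end rating of None marks a withdrawn rating and is excluded.
    """
    end_ratings = [end for start, end in observations
                   if start == start_rating and end is not None]
    total = len(end_ratings)
    if total == 0:
        return {}
    return {rating: count / total
            for rating, count in Counter(end_ratings).items()}

def average_over_years(yearly_probabilities):
    """Unweighted average of annual transition probabilities (assumption)."""
    ratings = set().union(*yearly_probabilities)
    years = len(yearly_probabilities)
    return {r: sum(p.get(r, 0.0) for p in yearly_probabilities) / years
            for r in ratings}

# Hypothetical cohorts of Aa1 issuers in two different years
year_1 = [("Aa1", "Aa1")] * 8 + [("Aa1", "Aa2")] + [("Aa1", None)]
year_2 = [("Aa1", "Aa1")] * 9 + [("Aa1", "Baa1")]

p1 = cohort_transition_probabilities(year_1, "Aa1")
p2 = cohort_transition_probabilities(year_2, "Aa1")
average = average_over_years([p1, p2])
print(average)
```

In the first cohort one rating is withdrawn, so the probabilities are computed over nine issuers rather than ten.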

The interpretation of the numbers in the rat¬ 
ing transition matrix is the following. The first 
cell in the matrix refers to the probability (ex¬ 
pressed in percentage terms) of remaining in 
the rating grade Aaa one year from now. The 
estimate of this probability is 89.06% in the rat¬ 
ing transition matrix. The cell under column A3 
in the first row of the matrix refers to the proba¬ 
bility of an issuer migrating from Aaa-rating to 
A3-rating in one year, and the estimate of this 
probability is 0.17%. Similarly, the cells in the 


Table 2 An Example One-Year Rating Transition Matrix

           Aa3     A1     A2     A3   Baa1   Baa2   Baa3    Ba1    Ba2    Ba3     B1     B2     B3  Caa-C  Default
Aaa       0.49   0.74   0.29   0.17   0.00   0.00   0.00   0.04   0.00   0.00   0.00   0.00   0.00   0.00     0.01
Aa1       6.86   2.41   0.33   0.05   0.19   0.00   0.00   0.09   0.00   0.00   0.00   0.00   0.00   0.00     0.02
Aa2       8.82   4.13   1.42   0.61   0.17   0.00   0.00   0.00   0.00   0.05   0.08   0.00   0.00   0.00     0.03
Aa3      81.48   9.30   3.28   0.89   0.25   0.22   0.17   0.00   0.04   0.09   0.00   0.00   0.00   0.00     0.04
A1        5.76  80.88   7.50   3.00   0.81   0.28   0.14   0.37   0.26   0.05   0.12   0.01   0.00   0.00     0.06
A2        0.80   5.57  80.75   7.48   2.99   0.83   0.41   0.29   0.11   0.12   0.03   0.07   0.03   0.03     0.08
A3        0.24   1.55   8.68  75.40   7.03   3.83   1.50   0.57   0.20   0.23   0.35   0.05   0.05   0.01     0.10
Baa1      0.19   0.21   2.84   8.04  74.68   7.73   3.29   1.09   0.48   0.37   0.58   0.09   0.02   0.02     0.13
Baa2      0.18   0.18   0.92   3.87   7.27  75.35   7.40   1.77   0.55   0.69   0.51   0.47   0.27   0.03     0.23
Baa3      0.08   0.19   0.61   0.69   3.42   9.92  71.29   6.79   2.76   2.02   0.85   0.33   0.36   0.17     0.46
Ba1       0.03   0.24   0.13   0.73   0.82   3.20   8.36  72.31   5.00   4.22   1.22   1.38   1.24   0.36     0.67
Ba2       0.03   0.04   0.16   0.14   0.39   0.77   2.53   9.18  70.35   6.82   1.84   4.07   2.07   0.58     1.03
Ba3       0.00   0.04   0.17   0.19   0.19   0.28   0.75   2.94   5.47  72.38   5.25   5.60   3.34   0.92     2.46
B1        0.00   0.06   0.10   0.16   0.08   0.26   0.32   0.45   2.69   6.09  71.52   5.58   6.80   1.90     3.97
B2        0.01   0.11   0.00   0.07   0.18   0.12   0.19   0.30   1.69   3.05   5.95  63.38  11.70   3.82     9.37
B3        0.00   0.02   0.04   0.07   0.12   0.13   0.22   0.20   0.38   1.28   4.41   3.69  68.14   7.51    13.72
Caa-C     0.00   0.00   0.00   0.00   0.00   0.54   0.54   0.71   0.00   1.52   2.06   1.37   3.20  60.46    29.60
Default   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   100.0

second row correspond to the one-year migration
probabilities of an issuer that is currently
rated Aa1.

Considering that Table 2 is representative of 
a typical rating transition matrix that credit 
agencies publish, one can draw interesting con¬ 
clusions from the relative frequency of rating 
downgrades and upgrades from this table. For 
example, the rating transition matrix suggests 
that higher ratings have generally been less 
likely to be revised over one year than lower 
ratings. Another observation is that large and 
sudden rating changes occur infrequently. As 
one moves down the rating scale, the likelihood 
of a multinotch rating change increases. 

Quantifying Credit Risk 

In the previous section we identified the im¬ 
portant variables that influence credit risk at 
the security level. In this section we will fo¬ 
cus our attention on quantifying credit risk at 
the security level. Without loss of generality, it 
will be assumed that the security is a corporate 
bond. Most of us are familiar with the concept 
of risk in connection with financial assets. In 
broad terms, risk is associated with potential 
financial loss that can arise from holding the as¬ 
set, the exact magnitude of which is difficult to 
forecast. As a result, it is common to describe 
the potential loss in value using an appropriate 
probability distribution whose mean and stan¬ 
dard deviation serve as useful measures for risk 
quantification. 

The above practice is well known in the equities
market, where investors focus on market
risk arising from variations in stock returns. This
leads to quantifying market risk through
the expected return and the standard deviation
of return. Under the assumption that
equity returns are normally distributed, the re¬ 
alized return will lie within one standard de¬ 
viation of the expected return with two-thirds 
probability. 

Figure 1 Typical Shape of the Credit Loss Distribution

Quantifying credit risk for a corporate bond
is similar in principle. Unlike the case for equities,
corporate bond investors focus on the
distribution of potential losses that can result 
from the issuer-specific credit events. Borrow¬ 
ing the principle from equities market, it has 
become common practice to quantify credit risk 
at the security level through the mean and stan¬ 
dard deviation of the loss distribution. How¬ 
ever, there is an important difference between 
the two risk measures. This pertains to the dis¬ 
tribution of credit loss, which unlike for mar¬ 
ket risk, is far from being a normal distribution. 
Hence, deviations from the expected loss by more than one
standard deviation can occur more frequently
than one time in three. Credit market
convention is to refer to the standard deviation 
of loss resulting from credit events as unexpected 
loss (UL) and the average loss as expected loss 
(EL). Figure 1 shows the typical shape of the 
distribution of credit losses. 

In this section we will discuss how expected 
and unexpected loss used to quantify credit 
risk at the security or bond level can be deter¬ 
mined. Depending on whether the loss distri¬ 
bution takes into account the changes in bond 
prices resulting from rating migrations, we can 
compute two sets of loss variables, one in the de¬ 
fault mode and another in the migration mode. 
Quantification of credit risk in both these modes 
is discussed below. 

Expected Loss Under Default Mode Expected 
loss under default mode of a bond is defined 
as the average loss the bondholder can expect 
to incur if the issuer goes bankrupt. Consider¬ 
ing that default probability estimates are based 
on a one-year holding period, expected loss 
is also expressed over a one-year period. In 




practice, the issuer could actually default at any 
time during the one-year horizon. Since a bond 
portfolio manager is usually interested in the 
worst-case loss scenario, which corresponds to 
the issuer defaulting in the immediate future, 
we will use the one-year PD to quantify the 
worst-case loss. This has the implication that 
we can quantify credit risk using the current 
trading price for the bond rather than its one- 
year forward price. Often, a portfolio manager's 
goal is to manage relative risk versus a bench¬ 
mark. In this case, the use of one-year PD in 
conjunction with current trading prices will not 
bias the relative risk estimates. Moreover, this 
assumption leads to considerable simplification 
in quantifying credit risk since deriving for¬ 
ward yield curves for various credit ratings is 
quite tedious. 

The estimate of expected loss for a security 
depends on three variables: probability of de¬ 
fault of the issuer, the average recovery rate, 
and the nominal exposure (NE) to the security. 
One can think of the default process $\delta$ as being
a Bernoulli random variable that takes the
value 0 or 1. The value $\delta = 1$ signals a default
and the value $\delta = 0$ signals no default. Conditional
upon default, the recovery rate $\psi$ is a random
variable whose mean recovery rate is RR.
Figure 2 pictorially depicts the default process
and the recovery values. In this exhibit, Pdirty
denotes the dirty price (clean price plus accrued
interest) for a $1 face value of the bond.

Figure 2 Bond Price Distribution Under Default Mode

Figure 2 indicates that if the issuer defaults,
the price of the bond will be equal to its recovery
rate $\psi$, which is a random variable. If the issuer
does not default, the bond can be sold for a
value equal to its current dirty price Pdirty. In
this default mode framework, the price of the
risky debt can be written as,

$$\tilde{P} = P_{\text{dirty}} \times I_{[\delta=0]} + \psi \times I_{[\delta=1]} \qquad (17)$$

In equation (17), I is the indicator function of 
the default process. For the purpose of quanti¬ 
fying credit risk, the variable of interest to us is 
the credit loss resulting from holding the cor¬ 
porate bond. This is a random variable, which 
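we denote $\tilde{\ell}$, and is given by,

$$\tilde{\ell} = P_{\text{dirty}} - \tilde{P} = P_{\text{dirty}} - P_{\text{dirty}} \times I_{[\delta=0]} - \psi \times I_{[\delta=1]} \qquad (18)$$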

Taking expectations on both sides of equation 
(18) will allow us to compute the expected loss 
arising from credit risk. This is given by, 

$$\mathrm{EL} = E(\tilde{\ell}) = P_{\text{dirty}} - P_{\text{dirty}} \times (1 - \mathrm{PD}) - E(\psi \times I_{[\delta=1]}) \qquad (19)$$

We note that computing expected loss re¬ 
quires taking the expectation of the product of 
two random variables, the recovery rate process 
and the default process. Knowledge of the joint 
distribution of these two random variables will 
be required to compute this expectation. Most 
credit risk models will make the simplifying 
assumption that these two random variables 
are independent. If we make this assumption, 
we get the equation for expected loss as given 
below: 

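$$\mathrm{EL} = P_{\text{dirty}} \times \mathrm{PD} - \mathrm{RR} \times \mathrm{PD} = \mathrm{PD} \times (P_{\text{dirty}} - \mathrm{RR}) \qquad (20)$$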

We remind the reader that Pdirty is the dirty 
price of the bond for $1 nominal and RR in 
equation (20) is the mean recovery rate, which 
is expressed as a fraction of the face value of the 
debt. It is important to draw attention to the fact 
that the quantity (Pdirty — RR) is different from 
LGD, which is defined as one minus the recov¬ 
ery rate. We therefore introduce the term "loss 
on default" (LD) to capture this new quantity 
as given below: 

LD = Pdirty - RR (21)
We note that loss on default will be identical
to the quantity loss given default if the dirty 
price of the bond is equal to one. In all other 
circumstances these two quantities will not be 
the same. 

Equation (20) has been derived under the 
assumption that the nominal exposure is one 
dollar. The expected loss from credit risk for a 
nominal exposure equal to NE is given by, 

$$\mathrm{EL} = \mathrm{NE} \times \mathrm{PD} \times \mathrm{LD} \qquad (22)$$

The use of the quantity LD rather than LGD in 
defining expected loss might raise some doubts 
in the minds of the reader. To clear these doubts, 
let us consider the following example that illus¬ 
trates why LD is more appropriate than LGD in 
the context of bond portfolio management. 

Let us consider the case of a bond portfolio 
manager who has the option to invest $1 million
either in a bond with dirty price $100 (issuer
A) or in a bond with dirty price $80 (issuer B). In 
the latter case, the portfolio manager will buy 
$1.25 million nominal value of issuer B's bond 
to fully invest the $1 million. Let us assume that 
both issuers default within the next year and the 
recovery value is $50 for $100 face value of expo¬ 
sure. If the portfolio manager had invested in is¬ 
suer A's bond, he would recover $500,000 since 
the nominal exposure is $1 million. On the other 
hand, if the portfolio manager invested in issuer 
B's bond, then the amount recovered would be 
$625,000. This is because the portfolio manager 
has a nominal exposure of $1.25 million of is¬ 
suer B's bond. Clearly, from the portfolio man¬ 
ager's perspective the credit loss resulting from 
an investment in issuer A's bond is $500,000, 
whereas the credit loss from an investment in 
issuer B's bond is only $375,000, although both 
investments recovered 50% of the face value of 
debt. Use of the quantity LD correctly identi¬ 
fies the losses in both circumstances whereas 
the LGD definition will indicate that the losses 
are $500,000 for issuer A's bond and $625,000 
for issuer B's bond. In practice, LGD is used in 
conjunction with the exposure amount of the 
transaction to identify the expected loss. How¬ 


ever, this definition will also incorrectly identify 
the losses as being identical for both bonds in 
this example. 
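
The arithmetic of this example can be reproduced with a short script. The function names are hypothetical; prices and recovery rates are expressed per $1 of face value, so a dirty price of $80 per $100 becomes 0.80.

```python
def loss_on_default(dirty_price, recovery_rate):
    """LD per $1 of nominal: dirty price minus recovery rate."""
    return dirty_price - recovery_rate

def compare_loss_measures(investment, dirty_price, recovery_rate):
    """Credit loss on default under LD and under two uses of LGD."""
    nominal = investment / dirty_price
    return {
        "nominal": nominal,
        # Loss actually borne by the investor: amount invested minus recovery
        "loss_ld": nominal * loss_on_default(dirty_price, recovery_rate),
        # LGD applied to the nominal exposure
        "loss_lgd_on_nominal": nominal * (1.0 - recovery_rate),
        # LGD applied to the amount invested
        "loss_lgd_on_exposure": investment * (1.0 - recovery_rate),
    }

issuer_a = compare_loss_measures(1_000_000.0, 1.00, 0.50)
issuer_b = compare_loss_measures(1_000_000.0, 0.80, 0.50)

for name, result in (("A", issuer_a), ("B", issuer_b)):
    print(name, {key: round(value) for key, value in result.items()})
```

Only the LD-based figure matches the difference between the $1 million invested and the amount recovered for both issuers.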

Unexpected Loss Under Default Mode We 
learned that the expected loss on the bond is the 
average loss that the investor can expect to incur 
over the course of a one-year period. However, 
the actual loss may well exceed this average loss 
over certain time periods. The potential devia¬ 
tion from the expected loss that the investor 
can expect to incur is quantified in terms of the 
standard deviation of the loss variable defined 
in equation (18). Credit market convention is 
to refer to the standard deviation of loss as un¬ 
expected loss. Hence, to derive the unexpected 
loss formula, we need to compute the standard 
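deviation of the random variable $\tilde{\ell}$. To facilitate
this computation, we will rewrite equation (18)
as follows:

$$\tilde{\ell} = P_{\text{dirty}} - P_{\text{dirty}} \times (1 - I_{[\delta=1]}) - \psi \times I_{[\delta=1]} = I_{[\delta=1]} \times (P_{\text{dirty}} - \psi) \qquad (23)$$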

Recalling a standard result from probability 
theory, the variance of any random variable z 
can be written as the difference between the ex¬ 
pected value of the random variable squared 
minus the square of its expected value. In equa¬ 
tion form this is given by, 

$$\sigma_z^2 = E(z^2) - [E(z)]^2 \qquad (24)$$

We will again make the simplifying as¬ 
sumption that the default and recovery rate 
processes are independent in deriving the 
unexpected loss formula. Under this assumption
the variance of the random variable $\tilde{\ell}$ can
be written as,

$$\mathrm{Var}(\tilde{\ell}) = E(I_{[\delta=1]}^2) \times E[(P_{\text{dirty}} - \psi)^2] - [E(I_{[\delta=1]})]^2 \times [E(P_{\text{dirty}} - \psi)]^2 \qquad (25)$$

Taking expected values and using the relation 
(24), equation (25) simplifies to, 

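$$\mathrm{Var}(\tilde{\ell}) = [\sigma_{PD}^2 + \mathrm{PD}^2] \times [\sigma_{RR}^2 + \mathrm{LD}^2] - \mathrm{PD}^2 \times \mathrm{LD}^2 \qquad (26)$$

In the above equation, $\sigma_{PD}^2$ is the variance
of the Bernoulli random variable $\delta$, which is
given by

$$\sigma_{PD}^2 = \mathrm{PD} \times (1 - \mathrm{PD}) \qquad (27)$$

Simplifying the terms in equation (26), and
noting that $\sigma_{PD}^2 + \mathrm{PD}^2 = \mathrm{PD}$, it
can be shown that unexpected loss, which is
the standard deviation of the loss variable, is
given by

$$\mathrm{UL} = \sqrt{\mathrm{PD} \times \sigma_{RR}^2 + \mathrm{LD}^2 \times \sigma_{PD}^2} \qquad (28)$$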

The above formula for unexpected loss as¬ 
sumes that the nominal exposure is equal to 
one dollar. For a nominal exposure equal to NE, 
the unexpected loss at the security level will be 
given by 

$$\mathrm{UL} = \mathrm{NE} \times \sqrt{\mathrm{PD} \times \sigma_{RR}^2 + \mathrm{LD}^2 \times \sigma_{PD}^2} \qquad (29)$$
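
A short numerical sketch of expected and unexpected loss under default mode is given below. The unexpected loss is computed as NE times the square root of PD × σ_RR² + LD² × σ_PD², which follows from equations (26) and (27) because σ_PD² + PD² = PD. The function names, the PD of 0.2%, and the dirty price of 1.02 are hypothetical; the recovery rate of 35% and its standard deviation of 25% are the values assumed earlier in this entry.

```python
import math

def expected_loss(ne, pd, dirty_price, rr):
    """EL = NE x PD x LD, with LD = dirty price minus mean recovery rate."""
    return ne * pd * (dirty_price - rr)

def unexpected_loss(ne, pd, dirty_price, rr, sigma_rr):
    """Standard deviation of the credit loss under default mode."""
    ld = dirty_price - rr
    var_pd = pd * (1.0 - pd)  # variance of the Bernoulli default indicator
    return ne * math.sqrt(pd * sigma_rr ** 2 + ld ** 2 * var_pd)

def loss_variance(pd, dirty_price, rr, sigma_rr):
    """Variance of the loss per $1 nominal, written as in equation (26)."""
    ld = dirty_price - rr
    var_pd = pd * (1.0 - pd)
    return (var_pd + pd ** 2) * (sigma_rr ** 2 + ld ** 2) - pd ** 2 * ld ** 2

# Hypothetical bond position of $1 million nominal
ne, pd, dirty_price, rr, sigma_rr = 1_000_000.0, 0.002, 1.02, 0.35, 0.25

el = expected_loss(ne, pd, dirty_price, rr)
ul = unexpected_loss(ne, pd, dirty_price, rr, sigma_rr)
print(el, ul)
```

With these inputs the unexpected loss is many times larger than the expected loss, which is typical when the default probability is small.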

On the Independence Assumption 

In deriving the expressions for expected and un¬ 
expected losses on a bond resulting from credit 
risk, we made the simplifying assumption that 
the default process and recovery rate process 
are independent. The question we should ask 
ourselves is whether this assumption is a rea¬ 
sonable one to make. Examining existing theo¬ 
retical models on credit risk does not give us a 
definitive answer to this question. For instance, 
in Merton's framework the default process of a 
firm is driven by the value of the firm's assets. 
The risk of a firm's default is therefore explic¬ 
itly linked to the variability in the firm's asset 
value. In this setup both the default process and 
the recovery rate are a function of the structural 
characteristics of the firm, and one can show 
that PD and RR are inversely related. 

The reduced-form models, unlike structural 
models, do not condition default on the value of 
the firm. The default and recovery processes are 
modeled independently of the structural fea¬ 
tures of the firm and are further assumed to be 
independent of each other. This independence 
assumption between default and recovery pro¬ 
cesses, which is fundamental to reduced-form 


models, is pervasive in all existing credit value 
at risk models. 

Empirical results on the relationship between 
default and recovery values tend to suggest that 
these two variables are negatively correlated. 
The intuition behind this result is that both de¬ 
fault rate and recovery rate may depend on 
certain structural factors. For instance, if a bor¬ 
rower defaults on the debt payments, the re¬ 
covery rate will depend on the net worth of the 
firm's assets. This net worth, which is usually a 
function of prevailing economic conditions, will 
be lower during periods of recession. On the 
contrary, during recession the probability of de¬ 
fault of issuers tends to increase. The combina¬ 
tion of these two effects will result in a negative 
correlation between default and recovery rates. 

Empirical research on the relationship be¬ 
tween default and recovery rate processes 
suggests that a simple microeconomic interpre¬ 
tation based on supply and demand tends to 
drive aggregate recovery rate values. In partic¬ 
ular, during high default years the supply of 
defaulted securities tends to exceed demand, 
which in turn drives secondary market prices 
down. Considering that RR values are based on 
bond prices shortly after default, the observed 
recovery rates are lower when there is an excess 
supply of defaulted securities. 

In order to incorporate the empirical evi¬ 
dence that recovery values decrease when de¬ 
fault rates are high, we will have to identify 
periods when PD is high relative to normal lev¬ 
els. If PD values are determined on the basis 
of historical average default rates as is done 
by rating agencies, it is difficult to distinguish 
between low and high default periods. On the 
other hand, if a structural approach is used to 
estimate PD values as is done by KMV Corpo¬ 
ration, it is possible to signal periods when PD 
values are higher than historical average levels. 
This information can then be incorporated to 
determine the appropriate recovery rates to be 
used. Such an approach will amount to the use 
of a regime-switching model to determine the 
average recovery rates.

Expected Loss Under Migration Mode
To derive the formula for expected loss under 
default mode we took into consideration the 
credit event that results in the issuer default¬ 
ing on debt payments. In general, this is not 
the only credit event the bondholder will ex¬ 
perience that influences the market price of the 
bond. More frequent are credit events that re¬ 
sult in rating upgrades or downgrades of the 
bond issuer. These credit events correspond to 
a change in the opinion of the rating agencies 
concerning the creditworthiness of the issuer. 
Since rating changes are issuer-specific credit 
events, the associated bond price changes will 
fall under credit risk. Including price risk result¬ 
ing from rating migrations in the calculation of 
potential credit losses is referred to as credit risk 
under migration mode. 

In practice, the change in bond price can 
be both positive and negative depending on 
whether the rating change results in an upgrade 
or downgrade, respectively. However, we will 
use the term "credit loss" generically to refer to 
a change in bond price as a result of a credit 
event. Before proceeding to derive the formula 
that quantifies expected loss under migration 
mode, we will indicate how the price change 
resulting from a credit event can be estimated. 

Estimating Price Changes

Practitioners familiar
with pricing of corporate bonds know that the
issuer's rating does not fully explain yield differentials
between bonds of similar maturities.
In an empirical study, Elton, Gruber, Agrawal
and Mann (2002) find that pricing errors can
vary from 34 cents per $100 for Aa financials to
over $1.17 for Baa industrials. Their study suggests
that the following factors have an important
influence on observed price differentials
between corporate bonds:

• The finer rating categories introduced by the 
major rating agencies when combined with 
the bond's maturity 

• Differences between Standard and Poor's and 
Moody's ratings for the issuer 


• Differences in expected recovery rate for the 
bond 

• The coupon on the bond due to different tax 
treatment 

• Whether the bond is new or has traded for
more than one year

These observations indicate that we cannot
use generic yield curves for various rating
grades to reprice bonds when the issuer's rating
changes. We will have to adopt a different technique
to estimate the price risk resulting from
rating changes. It is important to bear in mind
that in the context of credit risk quantification,
our objective is to estimate approximate price
changes from rating migrations rather than to
capture the correct trading price for the bond.
To this end, rating migrations should result in
a price change that is consistent with the perceived
change in the creditworthiness of the issuer.

The technique we will adopt here to estimate
the change in bond price due to a rating change
makes use of the current modified duration and
convexity of the bond. To determine the change
in yield associated with a rating change, we will
assume that there exists a fixed yield spread
between each rating grade that is a function of
the debt issue's seniority. These yield spreads
will be taken relative to the government yield
curve. If we denote the modified duration of the
bond by $D$ and its convexity by $C$, then the change
in price of the bond due to a change $\Delta y$ in the
bond yield as a result of the rating change is
given by

$$\text{Price change} = -P_{\text{dirty}} \times D \times \Delta y + 0.5 \times P_{\text{dirty}} \times C \times \Delta y^2 \tag{30}$$

Considering that our interest is to estimate the
loss resulting from the rating change to quantify
credit risk, the following equation is the one that
is relevant to us:

$$\Delta P = P_{\text{dirty}} \times D \times \Delta y - 0.5 \times P_{\text{dirty}} \times C \times \Delta y^2 \tag{31}$$
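As a minimal sketch of equations (30) and (31), the two functions below compute the second-order price change and the corresponding credit loss for a given yield change. The function names are hypothetical, and the sample inputs (dirty price 1.0533, modified duration 4.021, convexity 19.75, a 325 bp spread widening) are borrowed from the numerical example later in this entry purely for illustration.

```python
def price_change(dirty_price, mod_duration, convexity, dy):
    """Equation (30): second-order price change for a yield change dy (decimal)."""
    return (-dirty_price * mod_duration * dy
            + 0.5 * dirty_price * convexity * dy ** 2)


def credit_loss(dirty_price, mod_duration, convexity, dy):
    """Equation (31): the same quantity with the sign flipped, so a loss is positive."""
    return -price_change(dirty_price, mod_duration, convexity, dy)


# Illustrative inputs: a 325 bp widening of the yield spread.
loss = credit_loss(1.0533, 4.021, 19.75, 0.0325)
duration_only = 1.0533 * 4.021 * 0.0325  # first-order term alone
print(f"loss per $1 nominal: {loss:.4f} (duration term alone: {duration_only:.4f})")
```

For large spread moves the convexity term is not negligible: in this illustration it reduces the estimated loss by roughly one cent per dollar of nominal.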

The advantage of such a technique is that it
will retain price differentials observed in the
market between bonds with similar maturity
and credit rating when the issuer migrates to
a different rating grade. Table 3 shows the indicative
yield spreads relative to government
bonds for different rating grades as a function
of the seniority of the debt issue. These yield
spreads will be used to illustrate how the price
change resulting from a rating migration can be
estimated by using them in conjunction with the
current duration and convexity of the bond.

Table 3 Example Yield Spreads for Different Rating Grades and Debt Seniority

Rating Grade   Rating Description   Senior Unsecured   Subordinated
1              Aaa / AAA            15 bp              20 bp
2              Aa1 / AA+            30 bp              40 bp
3              Aa2 / AA             45 bp              60 bp
4              Aa3 / AA-            60 bp              80 bp
5              A1 / A+              75 bp              100 bp
6              A2 / A               90 bp              120 bp
7              A3 / A-              105 bp             140 bp
8              Baa1 / BBB+          130 bp             180 bp
9              Baa2 / BBB           155 bp             220 bp
10             Baa3 / BBB-          180 bp             260 bp
11             Ba1 / BB+            230 bp             330 bp
12             Ba2 / BB             280 bp             410 bp
13             Ba3 / BB-            330 bp             480 bp
14             B1 / B+              430 bp             610 bp
15             B2 / B               530 bp             740 bp
16             B3 / B-              630 bp             870 bp
17             Caa-C / CCC          780 bp             1040 bp
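The spreads in Table 3 translate into a yield change for any rating migration by taking the difference between the spread of the destination grade and that of the current grade. The sketch below assumes grades are numbered 1 to 17 as in the table; the names `SENIOR_UNSECURED_BP`, `SUBORDINATED_BP`, and `yield_change` are hypothetical helpers introduced only for this illustration.

```python
# Yield spreads over government bonds, in basis points, for grades 1..17 (Table 3).
SENIOR_UNSECURED_BP = [15, 30, 45, 60, 75, 90, 105, 130, 155, 180,
                       230, 280, 330, 430, 530, 630, 780]
SUBORDINATED_BP = [20, 40, 60, 80, 100, 120, 140, 180, 220, 260,
                   330, 410, 480, 610, 740, 870, 1040]


def yield_change(from_grade, to_grade, spreads_bp=SENIOR_UNSECURED_BP):
    """Yield change (as a decimal) when the issuer moves between two grades."""
    return (spreads_bp[to_grade - 1] - spreads_bp[from_grade - 1]) / 10_000


# An A3 issuer (grade 7) downgraded to B1 (grade 14): +325 bp.
print(yield_change(7, 14))
# The same issuer upgraded to A1 (grade 5): -30 bp.
print(yield_change(7, 5))
```

Feeding these yield changes into equation (31) gives the approximate credit loss for each possible migration.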

Deriving Expected Loss 

Unlike in the case of the default mode, the issuer
can migrate to one of several rating grades under
the migration mode during the course of the
year. Associated with these rating migrations
are discrete transition probabilities that comprise
the rows of the rating transition matrix
given in Table 2. In the rating migration framework,
the transition probabilities represent
historical averages and can be treated as deterministic
variables. The random variables here
are the credit losses that the bondholder incurs
when the issuer rating changes. The expected
value of the credit loss for a rating change from
the $i$th grade to the $k$th grade is given by

$$\Delta P_{ik} = P_{\text{dirty}} \times D \times \Delta y_{ik} - 0.5 \times P_{\text{dirty}} \times C \times \Delta y_{ik}^2 \tag{32}$$

In equation (32), $\Delta y_{ik}$ denotes the yield change
when the issuer rating changes from grade $i$ to
grade $k$. When the issuer migrates to the default
state, the credit loss $\Delta P_{ik}$ will be equal to
the loss on default LD. Considering that there
are 18 rating grades including the default state,
the expected loss under the rating migration
mode for an issuer whose current credit rating
is $i$ is given by

$$EL = \sum_{k=1}^{18} p_{ik} \times \Delta P_{ik} \tag{33}$$

In equation (33), $p_{ik}$ denotes the one-year transition
probability to migrate from rating grade $i$
to rating grade $k$. The above equation quantifies
the expected loss over a one-year horizon for a
nominal exposure of one dollar. For a nominal
exposure NE, the expected loss under migration
mode is given by

$$EL = NE \times \sum_{k=1}^{18} p_{ik} \times \Delta P_{ik} \tag{34}$$

Unexpected Loss Under Migration Mode

By definition, unexpected loss under migration
mode is the standard deviation of the credit loss
variable. The loss variable under the migration
mode is given by

$$\tilde{\ell} = \sum_{k=1}^{18} p_{ik} \times \Delta\tilde{P}_{ik} \tag{35}$$

In equation (35), $\Delta\tilde{P}_{ik}$ denotes the credit loss
when the credit rating changes from grade $i$ to
grade $k$, which is regarded as a random variable.
The expected value of this random variable is
$\Delta P_{ik}$, and we shall denote its variance by $\sigma_{ik}^2$.
When $k$ is equal to the default state, $\sigma_{ik}$ will be
equal to $\sigma_{RR}$, which is the standard deviation of
the recovery rate. Recalling equation (24), we
can write the variance of the loss variable as

$$\operatorname{Var}(\tilde{\ell}) = E\left(\sum_{k=1}^{18} p_{ik} \times \Delta\tilde{P}_{ik}^2\right) - \left[E\left(\sum_{k=1}^{18} p_{ik} \times \Delta\tilde{P}_{ik}\right)\right]^2 \tag{36}$$

Taking expectations and making use of the
relation (24) once more, we get the following
expression for the variance of the loss variable:

$$\operatorname{Var}(\tilde{\ell}) = \sum_{k=1}^{18} p_{ik} \times \left(\Delta P_{ik}^2 + \sigma_{ik}^2\right) - \left[\sum_{k=1}^{18} p_{ik} \times \Delta P_{ik}\right]^2 \tag{37}$$

If we assume that there is no uncertainty associated
with the credit losses except in the default
state, all $\sigma_{ik}$ terms in equation (37) will drop out
other than $\sigma_{RR}$. Making this assumption and
noting that $p_{ik}$ is equal to PD when $k$ is the default
state, the unexpected loss under migration
mode for a nominal exposure NE is given by

$$UL = NE \times \sqrt{PD \times \sigma_{RR}^2 + \sum_{k=1}^{18} p_{ik} \times \Delta P_{ik}^2 - \left[\sum_{k=1}^{18} p_{ik} \times \Delta P_{ik}\right]^2} \tag{38}$$
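Equations (34) and (38) reduce to two weighted sums over the destination states. The sketch below is one possible way to code them; the function name `migration_el_ul` and the three-state inputs (upgrade, unchanged, default) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def migration_el_ul(nominal, probs, losses, pd, sigma_rr):
    """Expected loss (34) and unexpected loss (38) under migration mode.

    probs  -- transition probabilities p_ik, one per destination state
    losses -- mean credit loss per $1 nominal for each destination state
    pd     -- probability of migrating to the default state
    """
    first_moment = sum(p * l for p, l in zip(probs, losses))
    second_moment = sum(p * l * l for p, l in zip(probs, losses))
    variance = pd * sigma_rr ** 2 + second_moment - first_moment ** 2
    return nominal * first_moment, nominal * math.sqrt(variance)


# Hypothetical three-state illustration: upgrade, unchanged, default.
el, ul = migration_el_ul(
    nominal=100.0,
    probs=[0.10, 0.89, 0.01],
    losses=[-0.02, 0.0, 0.60],
    pd=0.01,
    sigma_rr=0.25,
)
print(f"EL = {el:.4f}, UL = {ul:.4f}")
```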


Numerical Example 

In this section we will consider a numerical
example to illustrate the computations of expected
and unexpected losses under the default
mode and migration mode. The security
level details of the example we will consider
are given in Table 4.

Table 4 Security Level Details of the Example Bond

Description                   Value
Issuer rating grade           A3
Dirty price for $1 nominal    1.0533
Nominal exposure              $1,000,000
Modified duration             4.021
Convexity                     19.75
Mean recovery rate            35%
Volatility of RR              25%

Since the mean recovery rate is assumed to be
35%, the loss on default for this security is equal
to 0.7033 for one-dollar nominal exposure. The
probability of default for this security is equal to
0.10%, which corresponds to the last column in
row A3 of the transition matrix given in Table
2. The expected and unexpected losses in the
default mode when PD = 0.001 are given below.

$$EL = NE \times PD \times LD = 1{,}000{,}000 \times 0.001 \times 0.7033 = \$703.3$$

$$UL = NE \times \sqrt{PD \times \sigma_{RR}^2 + LD^2 \times \sigma_{PD}^2} = 1{,}000{,}000 \times \sqrt{0.001 \times 0.25^2 + 0.7033^2 \times 0.001 \times (1 - 0.001)} = \$22{,}369.3$$
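The default-mode figures can be checked with a few lines of code. The sketch below evaluates the two expressions as printed, using the Table 4 inputs; the variable names are hypothetical. One caveat: evaluating the printed unexpected-loss expression directly gives roughly $23,593 rather than the $22,369.3 quoted, so the quoted figure (and any ratio built on it) should be read with some caution.

```python
import math

nominal = 1_000_000
pd = 0.001            # one-year default probability for the A3 issuer
dirty_price = 1.0533  # per $1 nominal
mean_rr = 0.35        # mean recovery rate
sigma_rr = 0.25       # volatility of the recovery rate

loss_on_default = dirty_price - mean_rr  # LD per $1 nominal
var_pd = pd * (1.0 - pd)                 # variance of the default indicator

expected_loss = nominal * pd * loss_on_default
unexpected_loss = nominal * math.sqrt(pd * sigma_rr ** 2
                                      + loss_on_default ** 2 * var_pd)
print(f"EL = {expected_loss:,.1f}, UL = {unexpected_loss:,.1f}")
```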

Under the migration mode, the breakdown
of the calculations involved in estimating expected
and unexpected losses is given in
Table 5.

The expected loss under migration mode is
given by

$$EL = NE \times \sum_{k=1}^{18} p_{ik} \times \Delta P_{ik} = 1{,}000{,}000 \times 0.003132 = \$3{,}132$$

The unexpected loss under migration mode is
given by

$$UL = NE \times \sqrt{PD \times \sigma_{RR}^2 + \sum_{k=1}^{18} p_{ik} \times \Delta P_{ik}^2 - \left[\sum_{k=1}^{18} p_{ik} \times \Delta P_{ik}\right]^2}$$

$$= 1{,}000{,}000 \times \sqrt{0.001 \times 0.25^2 + 6.803 \times 10^{-4} - 0.003132^2} = \$27{,}073.8$$
Table 5 Calculation of EL and UL Under Migration Mode

Grade   p_ik     Δy_ik    ΔP_ik     p_ik × ΔP_ik   p_ik × ΔP_ik²
1       0.05%    -0.90%   -0.0390   -0.000019      7.590E-07
2       0.11%    -0.75%   -0.0323   -0.000036      1.151E-06
3       0.05%    -0.60%   -0.0258   -0.000013      3.325E-07
4       0.24%    -0.45%   -0.0193   -0.000046      8.912E-07
5       1.55%    -0.30%   -0.0128   -0.000198      2.539E-06
6       8.68%    -0.15%   -0.0064   -0.000553      3.529E-06
7       75.40%   0.00%    0.0000    0.000000       0.000E+00
8       7.03%    0.25%    0.0105    0.000740       7.785E-06
9       3.83%    0.50%    0.0209    0.000801       1.676E-05
10      1.50%    0.75%    0.0312    0.000468       1.458E-05
11      0.57%    1.25%    0.0513    0.000293       1.501E-05
12      0.20%    1.75%    0.0709    0.000142       1.006E-05
13      0.23%    2.25%    0.0900    0.000207       1.864E-05
14      0.35%    3.25%    0.1267    0.000443       5.615E-05
15      0.05%    4.25%    0.1612    0.000081       1.299E-05
16      0.05%    5.25%    0.1937    0.000097       1.876E-05
17      0.01%    6.75%    0.2385    0.000024       5.688E-06
18      0.10%             0.7033    0.000703       4.946E-04
Sum                                 0.003132       6.803E-04
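The rows of Table 5 can be regenerated from the Table 3 spreads and the Table 4 bond details. The following sketch does so for the senior unsecured A3-rated bond; it assumes grades are indexed 1 to 18 with 18 as the default state, and all identifiers are hypothetical names introduced for the illustration.

```python
import math

# Table 4 inputs for the A3-rated example bond.
NOMINAL = 1_000_000
DIRTY_PRICE, DURATION, CONVEXITY = 1.0533, 4.021, 19.75
MEAN_RR, SIGMA_RR = 0.35, 0.25
CURRENT_GRADE = 7  # A3

# One-year transition probabilities (%) from A3 to grades 1..18 (Table 5).
PROBS_PCT = [0.05, 0.11, 0.05, 0.24, 1.55, 8.68, 75.40, 7.03, 3.83,
             1.50, 0.57, 0.20, 0.23, 0.35, 0.05, 0.05, 0.01, 0.10]
# Senior unsecured spreads (bp) for grades 1..17 (Table 3).
SPREADS_BP = [15, 30, 45, 60, 75, 90, 105, 130, 155, 180,
              230, 280, 330, 430, 530, 630, 780]


def loss_for_grade(k):
    """Credit loss per $1 nominal for a move to grade k (18 = default)."""
    if k == 18:
        return DIRTY_PRICE - MEAN_RR  # loss on default
    dy = (SPREADS_BP[k - 1] - SPREADS_BP[CURRENT_GRADE - 1]) / 10_000
    return DIRTY_PRICE * DURATION * dy - 0.5 * DIRTY_PRICE * CONVEXITY * dy ** 2


probs = [p / 100 for p in PROBS_PCT]
losses = [loss_for_grade(k) for k in range(1, 19)]
first_moment = sum(p * l for p, l in zip(probs, losses))
second_moment = sum(p * l * l for p, l in zip(probs, losses))
pd = probs[-1]

expected_loss = NOMINAL * first_moment
unexpected_loss = NOMINAL * math.sqrt(
    pd * SIGMA_RR ** 2 + second_moment - first_moment ** 2)
print(f"EL = {expected_loss:,.0f}, UL = {unexpected_loss:,.0f}")
```

Because the script carries unrounded intermediate values, its totals can differ from the tabulated ones in the last digit or two.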


It is useful to note here that under migration
mode the expected loss is more than four times
higher than under default mode. The unexpected
loss under migration mode is, however, only
around 21% higher than the unexpected loss
under default mode.


KEY POINTS 

• Approaches used to determine default probabilities
at the issuer level fall under two broad
categories: the empirical approach that uses
historical default data and public credit rating
schemes; and the structural approach that
uses an options theory framework.

• Recovery rates on defaulted bonds vary over
the business cycle and across industry sectors;
and there is a negative relationship between
recovery rates and probability of default.

• Credit risk for a corporate bond can be quantified
in terms of the first two moments of its
loss distribution: expected loss and unexpected
loss.

• Approaches to quantifying credit risk fall under
two categories: those that are based on
two states of the world, namely default or
no default; and those that include migrations
to other credit rating categories including the
state of default.
Abstract: Monte Carlo methods have become a valuable computational tool in modern finance as
the increased availability of powerful computers has enhanced their efficiency. A particularly useful
feature of Monte Carlo methods is that their computational complexity increases linearly with the
number of variables. Moreover, they are flexible and easy to implement for a range of distributional
assumptions for the underlying variables that influence the outcomes of interest. Monte Carlo
methods are particularly effective for simulating credit loss distribution and for evaluating tail risk
measures, and they are computationally less intensive than analytical methods.


The distribution of portfolio credit risk is highly
skewed and has a long fat tail. Unlike the
case for a normally distributed loss distribution,
knowledge of the first two moments of
the credit loss distribution provides little information
about tail risk. To compute tail risk
(large losses that occur with a low probability)
one has to simulate the credit loss distribution
using Monte Carlo techniques. In this entry
we will provide a brief introduction to Monte
Carlo methods and subsequently describe the
computational process involved in performing
a Monte Carlo simulation to generate the distribution
of credit losses. Simulating the credit loss
distribution is discussed under the assumption
that the asset returns that drive credit events are
either multivariate normal or multivariate t distributed.
The discussion and the examples cited
in this entry assume that the credit risk arises
from holding a portfolio of corporate bonds.


MONTE CARLO METHODS 

Numerical methods known as Monte Carlo
methods can be loosely described as statistical
simulation methods that make use of sequences
of random numbers to perform the simulation.
The first documented account of Monte Carlo
simulation dates back to the 18th century, when
a simulation technique was used to estimate
the value of π. However, it is only since the digital
computer era that this technique has gained
scientific acceptance for solving complex numerical
problems in various disciplines. The
name "Monte Carlo" was coined by Metropolis
during the Manhattan Project of World War II
because of the similarity of statistical simulation
to games of chance symbolized by the capital
of Monaco. Von Neumann laid much of
the early foundations of Monte Carlo simulation,
which requires pseudo-random
number generators and inverse cumulative
distribution functions. The application of
Monte Carlo simulation techniques to finance
was pioneered by Phelim Boyle (1977) in connection
with pricing of options.

It is tempting to think of Monte Carlo
methods as a technique to simulate random
processes that are described by a stochastic
differential equation. This belief stems from
the option pricing applications of Monte Carlo
methods in finance where the underlying variable
of interest is the evolution of stock prices
that are described by a stochastic differential
equation. However, this description is too restrictive
because many Monte Carlo applications
have no apparent stochastic content, such
as the evaluation of a definite integral or inversion
of a system of linear equations. In many
applications of Monte Carlo methods, the only
requirement is that the physical or mathematical
quantity of interest to us can be described
by a probability distribution function.

Monte Carlo methods have become a valuable
computational tool in modern finance to
price complex derivative securities and to perform
value at risk calculations. An important
advantage of Monte Carlo methods is that they
are flexible and easy to implement. Further, the
increased availability of powerful computers
has enhanced the efficiency of these methods.
Notwithstanding this, the method can still be
slow and standard errors of estimates can be
large when applied to high-dimensional problems
or if the region of interest to us is not
around the mean of the distribution. In such
cases, we require a large number of simulation
runs to estimate the variable of interest with
reasonable accuracy. The standard errors on
the estimated parameters can be reduced using
conventional variance reduction procedures
such as control variate techniques or antithetic
sampling approaches.
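To make the idea of variance reduction concrete, the sketch below estimates the definite integral of exp(u) over [0, 1], whose exact value is e - 1, first by plain Monte Carlo and then with antithetic sampling, where each uniform draw u is paired with 1 - u. The function names, the seed, and the sample sizes are arbitrary choices made for this illustration; control variates are not shown.

```python
import math
import random


def plain_estimate(f, n, rng):
    """Average of f over n independent uniform draws on [0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def antithetic_estimate(f, n_pairs, rng):
    """Average of f over n_pairs antithetic pairs (u, 1 - u)."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += 0.5 * (f(u) + f(1.0 - u))
    return total / n_pairs


rng = random.Random(12345)  # fixed seed so the run is repeatable
exact = math.e - 1.0
plain = plain_estimate(math.exp, 10_000, rng)
anti = antithetic_estimate(math.exp, 5_000, rng)  # same number of f evaluations
print(f"exact {exact:.5f}  plain {plain:.5f}  antithetic {anti:.5f}")
```

Both estimators use 10,000 function evaluations. Because exp is monotone, f(u) and f(1 - u) are negatively correlated, which is what makes the antithetic average less variable.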

More recent techniques to speed up the
convergence of Monte Carlo methods for high-dimensional
problems make use of deterministic
sequences rather than random sequences.
These sequences are known by the name quasi-random
sequences in contrast to the pseudo-random
sequences commonly used in standard
Monte Carlo methods. The advantage of using
quasi-random sequences is that they generate
sequences of n-tuples that fill n-dimensional
space more uniformly than uncorrelated points
generated by pseudo-random sequences. However,
the computational advantage of quasi-random
sequences diminishes as the number
of variables increases beyond 30.
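One simple family of quasi-random sequences is the Halton sequence, built from van der Corput radical-inverse sequences in different prime bases. The text does not say which quasi-random sequence is intended, so the choice of Halton here is an assumption made for illustration, and the function names are hypothetical.

```python
def van_der_corput(index, base=2):
    """Radical inverse of a positive integer index in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(index, bases=(2, 3)):
    """One point of a Halton sequence; one prime base per dimension."""
    return tuple(van_der_corput(index, b) for b in bases)


# The first few base-2 points bisect the unit interval ever more finely.
print([van_der_corput(i) for i in range(1, 5)])
# Two-dimensional points that progressively fill the unit square.
print([halton(i) for i in range(1, 4)])
```

Unlike pseudo-random draws, successive points deliberately avoid one another, which is the source of the more uniform coverage described above.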

An important advantage of Monte Carlo
methods is that the computational complexity
increases linearly with the number of variables.
In contrast, the computational complexity increases
exponentially in the number of variables
for discrete probability tree approaches for solving
similar kinds of problems. This point is best
illustrated by considering the problem of credit
loss simulation. One approach to computing
the loss distribution of a two-bond portfolio is
to enumerate all possible combinations of credit
states this portfolio can be in at one year's time.
Assuming there are 18 possible credit states that
each bond can be in, the two-bond portfolio
could take one of 324 (18 times 18) credit states.
Valuing the credit loss associated with each one
of the 324 states will allow us to derive the credit
loss distribution of the two-bond portfolio. If
the number of bonds in the portfolio increases
to 10, the total number of possible credit states
will be equal to 18 to the power 10, which is
equal to $3.57 \times 10^{12}$ credit states. Clearly, even
with such a small portfolio, it is practically impossible
to enumerate all the states and compute
the credit loss distribution.
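The contrast between exponential and linear growth is easy to tabulate. In the sketch below, `joint_states` counts the states a full enumeration must value, while `simulation_evaluations` counts bond revaluations for a fixed number of Monte Carlo runs; the figure of 100,000 runs is an arbitrary assumption, and both function names are hypothetical.

```python
def joint_states(n_bonds, states_per_bond=18):
    """Number of joint credit states a full enumeration has to value."""
    return states_per_bond ** n_bonds


def simulation_evaluations(n_bonds, n_runs=100_000):
    """Bond revaluations needed for n_runs Monte Carlo scenarios."""
    return n_bonds * n_runs


for n in (2, 10, 50):
    print(n, joint_states(n), simulation_evaluations(n))
```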

If we use Monte Carlo simulation, on the other
hand, the problem complexity remains the same
irrespective of whether the portfolio is comprised
of 2, 10, or more bonds. In each of these
cases we may wish to run several scenarios,
each of which corresponds to a simulation run,
and under each scenario compute the credit loss
associated with the portfolio. Performing many
simulation runs will allow us to compute the
credit loss distribution of the bond portfolio. As
the number of bonds in the portfolio increases,
the computational effort involved increases linearly
in the number of bonds in the portfolio.

The basic building blocks for performing
Monte Carlo simulation are a scheme to
generate uniformly distributed random numbers
and a suitable transformation algorithm if
the probability distribution of the variable simulated
is different from a uniform distribution.
Most applications in finance require the generation
of a normally distributed random variable.
To simulate such a random variable, the standard
transformation techniques used are either
the Box-Muller method or the inverse cumulative
normal method. If more than one random
variable is simulated, we need methods
to generate correlated random numbers that
model the relationship between the variables.
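Both transformations named above can be sketched with the Python standard library. `box_muller` maps two independent uniforms to two independent standard normals, and `inverse_normal` applies the inverse cumulative normal via `statistics.NormalDist.inv_cdf`. The helper names and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1]) to two standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def inverse_normal(u):
    """Inverse cumulative normal transform of a uniform u in (0, 1)."""
    return NormalDist().inv_cdf(u)


rng = random.Random(2024)
# 1 - random() lies in (0, 1], which avoids taking log(0).
z1, z2 = box_muller(1.0 - rng.random(), rng.random())
z3 = inverse_normal(0.975)
print(z1, z2, z3)
```

The inverse method uses one uniform per normal draw and preserves the ordering of the uniforms, which is convenient when quasi-random or antithetic inputs are used.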


Credit Loss Simulation 

At the security level, credit loss arises from
credit events that include rating migrations and
outright default. As these credit events are associated
with changes in perceptions about an
obligor's ability to make the contractual debt
payments, one needs to identify variables that
influence the obligor's ability to pay. The variable
that is often used in practice is the asset
returns of the obligor. The motivation for
using asset returns is that changes in asset
values of a firm influence its solvency position.
When asset values fall below outstanding
liabilities, the firm is no longer considered
solvent. But other thresholds based on rating
transition probabilities can be derived and used
to infer how changes in asset values will influence
credit ratings. Simulating asset returns and
checking their values against these thresholds
will allow us to signal credit events, which can
then be used to estimate the credit loss for a
particular simulation run.

Computing portfolio credit risk requires extending
the above approach to model joint
rating migrations, which in turn requires modeling
the comovement of asset returns of different
obligors. Considering that the marginal
distribution of asset returns is assumed to be
normal in Merton's option pricing framework
(Merton, 1974), one can make a simplifying assumption
that the joint distribution of asset
returns is multivariate normal. The joint evolution
of the asset returns of the obligors under
the multivariate normal distribution will signal
how the value of the portfolio evolves, or equivalently,
what the credit loss on the portfolio will
be. The distribution of obligor asset returns under
the multivariate normal distribution can be
generated using Monte Carlo simulation. This
will allow us subsequently to compute the loss
distribution of the bond portfolio resulting from
credit events.

The description given above provides the
basic intuition behind the use of Monte Carlo
simulation for computing the credit loss distribution.
In the context of its intended use here,
the Monte Carlo simulation technique can be
described as a computational scheme that utilizes
sequences of random numbers generated
from a given probability distribution function
to derive the distribution of portfolio credit
loss. The distribution of portfolio credit loss
can be computed both under the default mode,
which only considers whether the obligor is
solvent or not, and under the migration mode
that includes credit events arising from rating
changes. Consequently, to compute the credit
loss under the default mode, we only need to
consider the loss resulting from obligor default;
whereas under the migration mode, we have to
compute the credit loss associated with rating
migrations in addition to the credit loss resulting
from obligor default.

To generate the credit loss for one run of
the Monte Carlo simulation, we need to go
through the three computational steps described
below.

1. Simulate correlated random numbers that 
model the joint distribution of asset returns 
of the obligors in the portfolio. 

2. Infer the implied credit rating of each obligor 
based on simulated asset returns. 

3. Compute the potential loss in value based
on the implied credit rating, and in those
cases where the asset return value signals an
obligor default, compute a random loss on
default value by sampling from a beta distribution
function.

Repeating the above simulation run many
times and computing the credit loss under each
simulation run will allow us to generate the
distribution of portfolio credit loss under the
migration mode. If we are only interested in the
credit loss distribution under the default mode,
we can compute this by setting the credit loss associated
with rating migrations to zero in the simulation
run. In the following sections we will
briefly describe the computational steps that are
required to generate the credit loss distribution.

Generating Correlated Asset Returns

We briefly described earlier the steps involved
in simulating the credit loss distribution for
a bond portfolio. As the first step, we mentioned
that correlated random numbers that
model the joint distribution of asset returns
have to be simulated. An immediate question
that will arise in our minds is whether the
obligor-specific means and standard deviations
of asset returns have to be taken into account
in the simulations. The simple answer to this
question is no. This is because the simulated
asset returns will be compared against the rating
migration thresholds, which are computed
under the assumption that asset returns are
standardized normal random variables. As a
result, the obligor-specific mean and standard
deviation of asset returns are not required for
simulating the loss distribution. Hence, we will
assume that obligor asset returns are standard
normal random variables (having mean zero
and standard deviation equal to one). Under
this assumption, the Monte Carlo simulation
method will require generating a sequence
of random vectors that are sampled from a
standardized multivariate normal distribution.

Many standard numerical packages provide
routines to generate sequences of random
vectors sampled from a multivariate normal
distribution. Although the details of the implementation
are not discussed here, we will
briefly outline the numerical procedure commonly
used to generate sequences of multivariate
normal random vectors. Let us assume that
the multivariate normal random vector has a
mean vector $a$ and covariance matrix $C$. Covariance
matrices have the property that they
are symmetric and positive definite (meaning
all their eigenvalues are greater than zero). Given
such a matrix, it is possible to find a unique
lower triangular matrix $L$ such that

$$L L^{T} = C \tag{1}$$

The matrix $L$ is referred to as the Cholesky factor
corresponding to the positive definite matrix
$C$. Once the Cholesky factor is determined, generating
a sequence of random vectors with the
desired multivariate distribution only requires
generating a sequence of independent standard
normal random variables. If $x$ denotes the vector
of independent standard normal random
variables, the vector $r$ with the desired multivariate
normal distribution can be constructed
as follows:

$$r = a + L x \tag{2}$$

The sequence of random vectors $r$ that are
generated will have the property that their joint
distribution is multinormal with mean vector $a$
and covariance matrix $C$.
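Equations (1) and (2) can be sketched in pure Python as follows. The routine below is a textbook Cholesky factorization written out by hand rather than a call to a numerical package; the function names, the 0.3 correlation, and the seed are assumptions made for the illustration.

```python
import math
import random


def cholesky(c):
    """Lower triangular L with L L^T = c, for a symmetric positive definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - partial)
            else:
                low[i][j] = (c[i][j] - partial) / low[j][j]
    return low


def correlated_normals(mean, low, rng):
    """One draw r = a + L x, with x a vector of independent standard normals."""
    x = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * x[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


corr = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical asset return correlation
chol = cholesky(corr)
rng = random.Random(7)
draw = correlated_normals([0.0, 0.0], chol, rng)
print(chol, draw)
```

With a zero mean vector and a correlation matrix in place of the covariance matrix, each call returns one vector of standardized, correlated asset returns.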

It is useful to note here that by setting the
mean vector $a$ to zero and the covariance matrix
equal to the correlation matrix, we can
generate a sequence of random vectors whose
joint distribution is standardized multivariate
normal. Since the joint distribution of obligor
asset returns was assumed to be standardized
multivariate normal, this sequence of random
vectors will be the one of interest to us.

Table 1 Rating Transition Probabilities and z-Thresholds

               Transition           z-Threshold          z-Threshold
               Probabilities (%)    (Gaussian)           (Student's t)
Transition     A2-      A3-         A2-      A3-         A2-      A3-
to Rating      rated    rated       rated    rated       rated    rated
Aaa            0.05     0.05        3.28     3.28        5.04     5.04
Aa1            0.06     0.11        3.05     2.95        4.43     4.15
Aa2            0.30     0.05        2.64     2.86        3.49     3.96
Aa3            0.80     0.24        2.25     2.61        2.77     3.43
A1             5.57     1.55        1.49     2.05        1.66     2.45
A2             80.75    8.68        -1.15    1.24        -1.24    1.35
A3             7.48     75.40       -1.65    -1.08       -1.86    -1.16
Baa1           2.99     7.03        -2.05    -1.48       -2.45    -1.65
Baa2           0.83     3.83        -2.27    -1.87       -2.79    -2.18
Baa3           0.41     1.50        -2.43    -2.15       -3.08    -2.61
Ba1            0.29     0.57        -2.60    -2.33       -3.40    -2.90
Ba2            0.11     0.20        -2.69    -2.41       -3.58    -3.05
Ba3            0.12     0.23        -2.81    -2.54       -3.86    -3.28
B1             0.03     0.35        -2.86    -2.85       -3.96    -3.96
B2             0.07     0.05        -2.98    -2.94       -4.25    -4.15
B3             0.03     0.05        -3.06    -3.06       -4.43    -4.43
Caa-C          0.03     0.01        -3.16    -3.09       -4.67    -4.50
Default        0.08     0.10        -1000    -1000       -1000    -1000


Inferring Implied Credit Rating 

The next step in the credit loss simulation process
is to infer the credit rating of the various
obligors in the portfolio as implied by the simulated
asset return vector. In order to do this, we
need to determine the thresholds against which
the asset returns will be compared to identify
rating changes or obligor default. To illustrate
how these thresholds can be determined, let us
consider an obligor that has a current credit rating
of A1. (Moody's rating categories are used
here to denote the credit rating of an obligor.)
Let $p_{A1,Aaa}$ denote the probability of transitioning
to the credit rating Aaa. Under the assumption
that the asset returns of the obligor are
normally distributed, the credit event that signals
the obligor rating migration from A1 to Aaa
will occur when the standardized asset returns
of the obligor exceed the threshold $z_{A1,Aaa}$. This
threshold can be determined by solving the following
integral equation:

$$p_{A1,Aaa} = \frac{1}{\sqrt{2\pi}} \int_{z_{A1,Aaa}}^{\infty} \exp\left(-\tfrac{1}{2}x^2\right) dx \tag{3}$$

A rating transition of this obligor from A1 to
Aa1 will occur if the asset return falls between
the thresholds $z_{A1,Aaa}$ and $z_{A1,Aa1}$. The threshold
$z_{A1,Aa1}$ can be determined by solving the
following integral equation:

$$p_{A1,Aa1} = \frac{1}{\sqrt{2\pi}} \int_{z_{A1,Aa1}}^{z_{A1,Aaa}} \exp\left(-\tfrac{1}{2}x^2\right) dx \tag{4}$$

One can extend this sequential rule to determine
the thresholds for migrating to other
rating grades. We note here that these z-thresholds
are a function of the current credit
rating of the obligor. Table 1 shows the rating
transition probabilities and the corresponding
z-thresholds for two different obligor credit ratings
when the asset returns are assumed to be
Gaussian (normal distribution).
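Solving equations (3) and (4) sequentially amounts to accumulating the transition probabilities from the best rating downward and inverting the standard normal distribution at each step. The sketch below does this for the A3 row of Table 1 using `statistics.NormalDist.inv_cdf`; the function and variable names are hypothetical.

```python
from statistics import NormalDist

# One-year transition probabilities (%) for an A3-rated obligor, ordered
# Aaa, Aa1, ..., Caa-C, Default (Table 1).
A3_PROBS_PCT = [0.05, 0.11, 0.05, 0.24, 1.55, 8.68, 75.40, 7.03, 3.83,
                1.50, 0.57, 0.20, 0.23, 0.35, 0.05, 0.05, 0.01, 0.10]


def z_thresholds(probs_pct):
    """Lower z-threshold for each non-default rating, best rating first."""
    std_normal = NormalDist()
    thresholds = []
    upper_tail = 0.0  # probability of ending in this rating or a better one
    for p in probs_pct[:-1]:  # the default state has no lower threshold
        upper_tail += p / 100.0
        thresholds.append(std_normal.inv_cdf(1.0 - upper_tail))
    return thresholds


a3_thresholds = z_thresholds(A3_PROBS_PCT)
print([round(z, 2) for z in a3_thresholds])
```

The computed values agree with the Gaussian A3 column of Table 1 to within about 0.01, the small differences presumably reflecting rounding in the published table.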

Let us consider the two-bond portfolio given
in Table 2 to illustrate the specific steps involved
in computing the credit loss from one simulation
run for this portfolio. Suppose during one
draw from a bivariate normal distribution the
random asset returns are, respectively, 2.5 for
bond 1 and -3.5 for bond 2. Given the initial issuer
rating of A3 for bond 1, one can infer from
the z-threshold values for A3-rated issuers in
Table 1 that an asset return value of 2.5 implies
a credit rating change of the issuer to an A1 rating.
Similarly, one can infer from Table 1 that
an asset return value of -3.5 for an A2-rated issuer
will imply that the issuer defaults on the
outstanding debt. Proceeding in this manner,
the implied credit rating of the debt issuers in
the two-bond portfolio for every simulation run
can be derived on the basis of the z-threshold
values in Table 1.

Table 2 Security Level Details for the Two-Bond Portfolio

Description                   Bond 1        Bond 2
Issuer rating grade           A3            A2
Dirty price for $1 nominal    1.0533        1.0029
Nominal exposure              $1,000,000    $1,000,000
Modified duration             4.021         3.747
Convexity                     19.75         16.45
Mean recovery rate            35%           35%
Volatility of recovery rate   25%           25%
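The lookup described above can be written as a scan down the ordered thresholds: the implied rating is the first one whose threshold the asset return exceeds, and default is implied if it exceeds none. The sketch below hard-codes the Gaussian columns of Table 1; the names `RATINGS`, `A2_Z`, `A3_Z`, and `implied_rating` are hypothetical.

```python
RATINGS = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3", "Baa1", "Baa2",
           "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3", "Caa-C"]

# Gaussian z-thresholds from Table 1 (the -1000 default entry is omitted).
A2_Z = [3.28, 3.05, 2.64, 2.25, 1.49, -1.15, -1.65, -2.05, -2.27,
        -2.43, -2.60, -2.69, -2.81, -2.86, -2.98, -3.06, -3.16]
A3_Z = [3.28, 2.95, 2.86, 2.61, 2.05, 1.24, -1.08, -1.48, -1.87,
        -2.15, -2.33, -2.41, -2.54, -2.85, -2.94, -3.06, -3.09]


def implied_rating(asset_return, thresholds):
    """Rating implied by a standardized asset return for one obligor."""
    for rating, z in zip(RATINGS, thresholds):
        if asset_return > z:
            return rating
    return "Default"


print(implied_rating(2.5, A3_Z))   # bond 1, currently A3
print(implied_rating(-3.5, A2_Z))  # bond 2, currently A2
```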

For a general n-bond portfolio, the implied
credit rating of the debt issuers for each simulation
run can be similarly determined. It is
important to note here that the number of obligors
in an n-bond portfolio will be less than or
equal to n. In the case where there are fewer
than n obligors, credit rating changes should
be identical for all bonds issued by the same
obligor in any simulation run. This has the implication
that the dimension of the simulated
asset return vector should be equal to the number
of obligors or debt issuers in the bond
portfolio.

Computing Credit Loss 

Once the implied rating changes for the oblig¬ 
ors are determined for the simulated asset re¬ 


turn vector, the corresponding credit loss asso¬ 
ciated with such implied rating changes could 
be determined. It is important to note here that 
we generically refer to the price change result¬ 
ing from the rating change as a loss although 
a credit improvement of the obligor will result 
in a price appreciation for the bond. The price 
change of a bond as a result of a rating change 
for the bond issuer will be a function of the 
change in the yield spreads and the maturity 
of the bond. Assuming that our interest is to 
estimate the credit loss due to a change in the 
bond's mark to market value as a result of the 
rating change, we would want to know at what 
time horizon the bond's price has to be marked 
to market. If we were to compute the worst-case 
loss scenario, it would correspond to a rating 
change of the obligor during the next trading 
day. In this case, the current trading price of 
the bond and its risk parameters, duration, and 
convexity serve to characterize the credit loss. 
The credit loss resulting from a rating change
from the ith grade to the kth grade will be a
function of the change in the bond yield and is
given by

$$\Delta P_{ik} = P_{\text{dirty}} \times D \times \Delta y_{ik} - 0.5 \times P_{\text{dirty}} \times C \times \Delta y_{ik}^{2} \qquad (5)$$

In equation (5), P_dirty is the dirty price of the
bond (accrued interest plus traded price), Δy_ik
is the yield change when the issuer rating changes
from grade i to grade k, D is the modified duration
of the bond, and C is the convexity. When the
issuer migrates to the default state, the credit
loss will be equal to the dirty price P_dirty minus
the recovery rate.

To illustrate the credit loss computation, let us 
again focus on the two-bond portfolio example. 
In this example, the asset return value signaled 
an upgrade to an A1 rating from the current
rating of A3 for bond 1. Suppose the change 
in the yield spread associated with this rating 
change is -30 basis points. Then, substituting 
the various parameter values into equation (5), 
the credit loss for $1 million notional amount 
held of bond 1 is given by

$$\text{Credit loss} = 1{,}000{,}000 \times \left[1.0533 \times 4.021 \times (-0.003) - 0.5 \times 1.0533 \times 19.75 \times (-0.003)^{2}\right] = -\$12{,}799.6$$


Note that the negative sign of the credit loss
indicates that this rating
change results in a profit rather than a loss.
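
The arithmetic of equation (5) for bond 1 can be checked with a short sketch. The function name and argument names are ours, chosen for illustration; the inputs are the values quoted in the example.

```python
def credit_loss_on_migration(notional, dirty_price, duration, convexity, yield_change):
    """Equation (5) scaled by the notional; negative values are gains."""
    first_order = dirty_price * duration * yield_change
    second_order = 0.5 * dirty_price * convexity * yield_change ** 2
    return notional * (first_order - second_order)


# Bond 1 of the example: an upgrade associated with a 30 bp fall in yield.
loss_bond_1 = credit_loss_on_migration(
    notional=1_000_000,
    dirty_price=1.0533,
    duration=4.021,
    convexity=19.75,
    yield_change=-0.003,
)
print(round(loss_bond_1, 1))  # about -12,799.6, as in the text
```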

For bond 2, the simulated asset return value of 
-3.5 implies default of the obligor. In this case, 
we must find a random loss on default, which 
will be a function of the assumed recovery rate 
distribution. Many credit risk models assume 
the recovery rate process to have a beta distribution
with mean μ and standard deviation σ.
Given the values for μ and σ, the parameters
α and β that define the beta distribution with
the desired mean and standard deviation can 
be computed as given below: 


$$\alpha = \frac{\mu^{2}(1-\mu)}{\sigma^{2}} - \mu \qquad (6)$$

$$\beta = \frac{\alpha}{\mu} - \alpha \qquad (7)$$


For the bond in question, let us assume 
the mean recovery rate to be μ = 35% and
the standard deviation of the recovery rate to
be σ = 25%. Corresponding to these recovery
rate values, the parameters of the beta distribution
function are α = 0.924 and β = 1.716.

The random recovery rate for bond 2 for the 
simulation run is determined by drawing a random
number from a beta distribution with the α
and β parameter values given above. Let us assume
that the simulated recovery value is 40%
for bond 2. The implied loss on default for the 
bond that trades at a dirty price of $1.0533 is 
then equal to 0.6533 (bond dirty price minus 
the recovery value). The credit loss arising from 
bond 2 for this simulation run will be equal to 
the nominal exposure times the loss on default, 
which is equal to $653,300. 
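
A minimal sketch of the default leg follows, assuming the standard moment-matching relations for the beta distribution (they reproduce the parameter values 0.924 and 1.716 quoted above). The function names and the random seed are illustrative; the draw uses `random.betavariate` from the Python standard library.

```python
import random


def beta_parameters(mean, std):
    """Beta shape parameters matching a given mean and standard deviation."""
    alpha = mean ** 2 * (1.0 - mean) / std ** 2 - mean
    beta = alpha / mean - alpha
    return alpha, beta


def loss_on_default(notional, dirty_price, recovery):
    """Nominal exposure times (dirty price minus recovery value)."""
    return notional * (dirty_price - recovery)


alpha, beta = beta_parameters(mean=0.35, std=0.25)

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
simulated_recovery = rng.betavariate(alpha, beta)
simulated_loss = loss_on_default(1_000_000, 1.0533, simulated_recovery)

# The illustrative recovery value of 40% used in the text.
loss_bond_2 = loss_on_default(1_000_000, 1.0533, 0.40)
print(round(alpha, 3), round(beta, 3), round(loss_bond_2, 2))
```

The moment-matching step only yields valid (positive) parameters when the variance is smaller than mean times one minus mean, which holds for the values used here.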

For the two-bond portfolio, the total credit 
loss for this simulation run is the sum of the 
two losses. If this simulation run corresponds 
to the ith run, the portfolio credit loss under the
ith simulation run, denoted ℓ_i, is given by

$$\ell_i = -\$12{,}799.6 + \$653{,}300 = \$640{,}500.4$$

It is important to emphasize here that for a 
general n-bond portfolio, all bonds of a particular
issuer should have the same recovery value
for any one simulation run if they have the same 
seniority. This information must be taken into 
account when simulating the credit loss distri¬ 
bution of a general n-bond portfolio. 


Computing Expected and Unexpected Loss

The above procedure outlined how the portfolio 
credit loss can be computed for one simulation 
run. By repeating the simulation run N times 
where N is sufficiently large, the distribution 
of the credit losses can be generated. Given the 
simulated loss distribution, one can compute 
various risk measures of interest. For instance, 
the expected and unexpected credit loss (the 
first two moments of the loss distribution) us¬ 
ing the simulated loss data can be computed as 
follows: 


$$EL_P = \frac{1}{N}\sum_{i=1}^{N} \ell_i \qquad (8)$$

$$UL_P = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\ell_i - EL_P\right)^{2}} \qquad (9)$$
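
As a sketch of how the two moments might be computed from simulated losses, the snippet below uses the sample mean and the sample standard deviation with N − 1 in the denominator (our reading of the formula for unexpected loss). The five loss values are made up for illustration and are far too few for a real estimate.

```python
from statistics import fmean, stdev


def expected_loss(losses):
    """Sample mean of the simulated portfolio losses."""
    return fmean(losses)


def unexpected_loss(losses):
    """Sample standard deviation of the losses (N - 1 in the denominator)."""
    return stdev(losses)


# Made-up portfolio losses from five simulation runs, in dollars.
simulated_losses = [-12_799.6, 0.0, 0.0, 25_000.0, 640_500.4]
el = expected_loss(simulated_losses)
ul = unexpected_loss(simulated_losses)
print(round(el, 2), round(ul, 2))
```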


To reduce the standard error of the estimated 
portfolio expected loss, it is common practice 
to perform antithetic sampling when performing
the Monte Carlo simulation. The idea behind
the antithetic sampling technique is that when
random samples are drawn from a symmetric
distribution, the sampling error in the mean can be avoided if
the antithetic (mirror-image) part of each random
sample is also drawn. This will ensure that the
empirical mean of the random samples is equal 
to the mean of the distribution function from 
which the samples are drawn. Including the an¬ 
tithetic part of the samples will double the total 
number of simulation runs. 
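
The effect of antithetic sampling on the empirical mean can be illustrated with a small sketch for a single standard normal variate. The helper name and the seed are illustrative assumptions; in the multivariate case the whole vector of normal draws would be negated.

```python
import random
from statistics import fmean


def antithetic_normals(n_pairs, rng):
    """Return 2 * n_pairs standard normal draws: each z with its mirror -z."""
    draws = []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        draws.extend((z, -z))
    return draws


rng = random.Random(7)  # arbitrary seed
plain = [rng.gauss(0.0, 1.0) for _ in range(2_000)]
paired = antithetic_normals(1_000, rng)

plain_mean = fmean(plain)    # close to, but generally not equal to, zero
paired_mean = fmean(paired)  # zero up to floating-point rounding
print(plain_mean, paired_mean)
```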

Importance Sampling

The Monte Carlo simulation technique de¬ 
scribed so far is based on random sampling. 
In such a sampling process, the probability of 
any value being generated is proportional to 
the probability density at that point. This prop¬ 
erty will have the effect of generating asset 
return values in the simulations that tend to 
cluster around the mean of the normal distri¬ 
bution function. Rating migrations and obligor 
defaults, however, are events that are driven 
by asset return values that deviate significantly 
from the mean of the normal distribution. The 
implication is that a significant proportion of 
the simulation runs will not trigger any credit 
events. If our intention is to compute the ex¬ 
pected and unexpected loss of the portfolio 
from the simulations, random sampling will be 
the appropriate method to use. If, on the other 
hand, we expect to compute risk measures asso¬ 
ciated with tail events from the simulated data, 
random sampling will be inefficient. 

If our primary intention of performing Monte 
Carlo simulations is to compute tail risk mea¬ 
sures (to be discussed in the next section), we 
can improve the simulation efficiency through 
importance sampling (see Glasserman and Li, 
2005). Simulation efficiency in our context refers 
to the number of simulation runs required to 
compute the risk measure of interest for a spec¬ 
ified standard error of the estimate. Importance 
sampling artificially inflates the probability of 
choosing random samples from those regions 
of the distribution that are of most interest to 
us. This would mean that our sampling pro¬ 
cess is biased in such a manner that a large 
number of credit events are simulated relative 
to what would occur in practice. In the Monte 
Carlo simulation terminology, the adjustment 
made to the probability of a particular point 
being sampled is referred to as its importance 
weight. To estimate the true probability distri¬ 
bution of the simulated losses when performing 
importance sampling, we have to restore the ac¬ 
tual probability of each sample by multiplying 
it by the inverse of its importance weight. In
practice, when the number of obligors in the
portfolio is large (this is usually true for the 
benchmark portfolio), performing importance 
sampling will lead to improved computational 
efficiency. 
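
To make the reweighting idea concrete, the sketch below estimates the probability that a single standard normal asset return falls below a default threshold of -3.5. It is a deliberately simple one-dimensional mean shift, not the portfolio scheme of Glasserman and Li; the function name, the choice of shift, and the seed are our assumptions. Samples are drawn from a normal distribution centered on the threshold, and each one is multiplied by the ratio of the true density to the sampling density.

```python
import math
import random
from statistics import NormalDist


def tail_probability_is(threshold, shift, n_runs, rng):
    """Estimate P(Z < threshold) for Z ~ N(0, 1) by sampling from N(shift, 1).

    Each draw is reweighted by the likelihood ratio phi(x) / phi(x - shift),
    which undoes the deliberate oversampling of the tail.
    """
    total = 0.0
    for _ in range(n_runs):
        x = rng.gauss(shift, 1.0)
        if x < threshold:
            total += math.exp(-shift * x + 0.5 * shift ** 2)
    return total / n_runs


rng = random.Random(11)  # arbitrary seed
estimate = tail_probability_is(threshold=-3.5, shift=-3.5, n_runs=20_000, rng=rng)
exact = NormalDist().cdf(-3.5)  # roughly 2.3e-4
print(estimate, exact)
```

With plain random sampling only a handful of the 20,000 runs would fall below the threshold, so the estimate would be far noisier for the same number of runs.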

Tail Risk Measures 

The discussions so far focused on how the mean 
(expected loss) and standard deviation (unex¬ 
pected loss) of the credit loss distribution for 
a corporate bond portfolio can be computed 
from the simulations. If credit
losses are normally distributed, the standard deviation
can be interpreted as the maximum deviation
around the mean that will not be exceeded
with roughly a 68% level of confidence. Since the credit
loss distribution is not normal, a similar interpretation
of the standard deviation of credit loss
does not hold. In most cases, computing the
probability of incurring a large credit loss on a
corporate bond portfolio using unexpected loss
information is not possible.

In general, a major preoccupation of most cor¬ 
porate bond portfolio managers is to structure 
the portfolio so as to minimize the probability 
of large losses. To do this an estimate of the po¬ 
tential downside risk of the portfolio becomes 
a key requirement. Computing any downside 
risk measure requires an estimate of the prob¬ 
ability mass associated with the tail of the loss 
distribution. If the simulated credit loss distri¬ 
bution is available, it is quite easy to derive ap¬ 
propriate tail risk measures of interest. For a 
corporate bond portfolio, the tail risk measures 
of interest are credit value at risk and expected 
shortfall risk. Both these risk measures are dis¬ 
cussed below, and the method to compute these 
measures using the simulated loss distribution 
is also indicated. 

Credit Value at Risk 

Credit value at risk (CVaR) is a tail risk measure 
that quantifies the extreme losses arising from 
credit events that can occur at a prespecified 
level of confidence over a given time horizon.
In practical terms, CVaR provides an estimate of
the maximum credit loss on a portfolio, which
could be exceeded with probability p. Without
loss of generality, it will be assumed that this 
probability is expressed in percentage. If the 
probability p is chosen to be sufficiently small, 
one can expect that the credit loss will not ex¬ 
ceed the CVaR amount at a high confidence 
level given by (100 — p)%. Stated differently, 
CVaR at a confidence level of (100 — p)% refers 
to the maximum dollar value of loss that will 
only be exceeded p% of the time over the given 
time horizon. Since losses from credit risk are 
measured over a one-year horizon, the CVaR 
measure we will compute also relates to a one- 
year time horizon. 

In order to compute CVaR to quantify the tail 
risk of the credit loss distribution in a corporate 
bond portfolio, we need to specify the confi¬ 
dence level at which it should be determined. 
Within the framework of economic capital al¬ 
location, CVaR is usually measured at a confi¬ 
dence level that reflects the solvency standard 
of the institution in question. For instance, the 
solvency standard of an AA-rated institution is 
typically 99.97%, and hence, CVaR will be com¬ 
puted at this confidence level. From a portfolio 
management perspective, however, the confi¬ 
dence level of interest for CVaR estimate would 
typically be much lower. The motivation for 
this is that portfolio managers have to pro¬ 
vide monthly performance reports to clients, 
and return deviations over this period need to 
be explained. In this case, estimating CVaR at a 
confidence level of 91.6% would imply that the 
underperformance relative to the benchmark 
will exceed the monthly CVaR estimate once 
during the year on average if monthly performance
reporting is used. In this case, the
CVaR estimate provides useful information to
the portfolio manager and the client, both about
the return surprises one could expect and about
how often such surprises should actually be observed.

Motivated by the above observation, we will
choose the confidence level for the CVaR estimate
to be 90%. At this level of confidence, the
portfolio manager can expect the credit losses 
to exceed the monthly CVaR estimate for one 
reporting period during the year. Once the con¬ 
fidence level for CVaR is specified, estimating 
CVaR from the simulated loss distribution is 
quite simple. If, for instance, the number of 
simulation runs is equal to 10,000, then the 
90% CVaR will be equal to the 1,000th worst- 
case credit loss. Assuming that the simulated 
credit losses are sorted in an ascending order of 
magnitude, the credit loss corresponding to the 
9,000th row in the sorted data will be the CVaR 
at 90% confidence level for 10,000 simulation 
runs. 

Considering that standard practice in portfo¬ 
lio management is to report risk measures rela¬ 
tive to the current market value of the portfolio, 
we will introduce the term "percentage credit 
value at risk." If Mp denotes the current mark 
to market value of the portfolio, the percentage 
CVaR at 90% confidence level is defined as, 

$$\%CVaR_{90\%} = \frac{CVaR_{90\%}}{M_P} \qquad (10)$$
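
A minimal sketch of the CVaR lookup follows, using the row convention stated in the text (row 9,000 of 10,000 losses sorted in ascending order for the 90% level). The function names, the toy loss sample, and the portfolio market value of 1,000 are illustrative assumptions.

```python
def credit_var(losses, confidence=0.90):
    """Loss in row round(confidence * N) of the ascending-sorted losses."""
    ordered = sorted(losses)
    row = max(1, round(confidence * len(ordered)))  # 1-based row number
    return ordered[row - 1]


def percentage_credit_var(losses, market_value, confidence=0.90):
    """CVaR expressed relative to the portfolio's current market value."""
    return credit_var(losses, confidence) / market_value


# Toy loss sample: the values 1, 2, ..., 100 (dollars) in scrambled order.
toy_losses = [float((37 * k) % 100 + 1) for k in range(100)]
cvar_90 = credit_var(toy_losses)
pct_cvar_90 = percentage_credit_var(toy_losses, market_value=1_000.0)
print(cvar_90, pct_cvar_90)
```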

Expected Shortfall 

Although CVaR is a useful tail risk measure, it 
fails to reflect the severity of loss in the worst- 
case scenarios in which the loss exceeds CVaR. 
In other words, CVaR fails to provide insight 
as to how far the tail of the loss distribution ex¬ 
tends. This information is critical if the portfolio 
manager is interested in restricting the severity 
of the losses in the worst-case scenarios under 
which losses exceed CVaR. In order to better 
motivate this point, Figure 1 shows the credit
loss distribution for two portfolios that have 
identical CVaR at the 90% level of confidence. 

Examining Figure 1, it is clear that although
both portfolios have identical CVaR at the 90%
confidence level, the severity of the worst-case
losses that exceed the 90% confidence level
is lower for portfolio 1 than for portfolio 2.
This example suggests that in order to
investigate whether portfolio credit risk is well
diversified, it is not sufficient to examine only
the tail probability beyond some confidence
level. Examining the loss exceedance beyond
the desired confidence level at which CVaR is
estimated is important to gauge the loss severity
in the tail part of the loss distribution.

Figure 1 Credit Loss Distribution for Two Portfolios

Figure 2 Various Risk Measures for Portfolio Credit Risk


One such risk measure that provides an es¬ 
timate of the loss severity in the tail part of 
the loss distribution is the expected shortfall (ES), 
which is sometimes also referred to as condi¬ 
tional VaR. Similar to CVaR, expected short¬ 
fall requires specifying a confidence level and 
a time horizon. Considering that ES is usually 
used in conjunction with CVaR, the confidence 
level should be chosen as 90% and the time hori¬ 
zon one year. A simple interpretation of ES is 
that it measures the average loss in the worst p% 
scenarios where (100 — p)% denotes the con¬ 
fidence level at which CVaR is estimated. In 
mathematical terms, expected shortfall can be 
defined as the conditional expectation of that 
part of the credit loss that exceeds the CVaR 
limit. The interpretation of ES as conditional
VaR follows from this definition. If ℓ denotes
the loss variable, ES can be defined as given
below:

$$ES = E[\ell \mid \ell > CVaR] \qquad (11)$$

Given the simulated loss distribution of the 
portfolio, computing expected shortfall risk is 
quite simple. Let ℓ_i denote the simulated credit
loss for the ith simulation run, and
let us assume that the losses are sorted in ascending
order. If the number of simulation runs
is equal to N, the relevant equation to compute
ES at the 90% confidence level from the simulations
is given below:

$$ES_{90\%} = \frac{1}{(1-0.9)N} \times \sum_{i=0.9N+1}^{N} \ell_i \qquad (12)$$

The percentage ES at 90% confidence level is
defined as

$$\%ES_{90\%} = \frac{ES_{90\%}}{M_P} \qquad (13)$$
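
The expected shortfall computation can be sketched as the average of the sorted losses in rows 0.9N + 1 through N. The function name, the toy sample, and the market value of 1,000 are assumptions made for illustration.

```python
from statistics import fmean


def expected_shortfall(losses, confidence=0.90):
    """Mean of the ascending-sorted losses above row round(confidence * N)."""
    ordered = sorted(losses)
    row = round(confidence * len(ordered))  # last row that is not in the tail
    return fmean(ordered[row:])


toy_losses = [float(k) for k in range(1, 101)]  # 1, 2, ..., 100 dollars
es_90 = expected_shortfall(toy_losses)          # mean of 91, ..., 100
pct_es_90 = es_90 / 1_000.0                     # assumed market value of 1,000
print(es_90, pct_es_90)
```

For this toy sample the 90% CVaR is 90 while the 90% ES is 95.5, which shows how ES picks up the severity of the losses beyond the CVaR level.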

Figure 2 shows the various credit risk mea¬ 
sures presented here that can be computed from 
the simulated loss data. 


Relaxing the Normal Distribution Assumption

A growing body of empirical studies conducted 
on financial time series data suggests that re¬ 
turns on traded financial instruments exhibit 
volatility clustering and extreme movements 
that are not representative of a normally dis¬ 
tributed random variable. Another commonly 
observed property of financial time series is that 
during times of large market moves, there is
a greater degree of comovement of returns across
many firms compared to those observed during
normal market conditions. This property, usually
referred to as tail dependence, captures the
extent to which the dependence (or correlation)
between random variables arises from extreme
observations. Stated differently, for a given level
of correlation between the random variables,
a multivariate distribution with tail dependence
has a much greater tendency to generate
simultaneous extreme values for the random
variables than distributions that
do not have this property.

A multivariate normal distribution does not 
exhibit tail dependence. The dependence or 
correlation structure exhibited between the 
random variables in a multivariate normal dis¬ 
tribution arises primarily from comovements 
of the variables around the center of the distri¬ 
bution. As a consequence, contagion or herd¬ 
ing behavior commonly observed in financial 
markets is difficult to model within the frame¬ 
work of multivariate normal distributions. In 
order to capture contagion and herding behav¬ 
ior in financial markets, distributions that ex¬ 
hibit tail dependence should be used to model 
financial variables of interest. In the context of 
credit risk modeling, contagion effects would 
result in greater comovement of asset returns 
across firms during periods of recession lead¬ 
ing to higher probability of joint defaults. If 
we model the joint distribution of asset returns 
to be multivariate normal, we will fail to cap¬ 
ture the effects of contagion in the aggregate 
portfolio credit risk measures we compute. In 
the next section we relax the assumption that 
the distribution of asset returns is multivariate 
normal. 


Student's t Distribution 

Among the class of distribution functions that 
exhibit tail dependence, the family of multivari¬ 
ate normal mixture distributions, which include 
Student's t distribution and generalized hyper¬ 
bolic distribution, is an interesting alternative. 
This is because normal mixture distributions in¬ 
herit the correlation matrix of the multivariate 
normal distribution. Hence, correlation matri¬ 
ces for normal mixture distributions are easy to 
calibrate. 

Formally, a member of the m-dimensional
family of variance mixtures of normal distributions
is equal in distribution to the product of a
scalar random variable s and a normal random
vector u having zero mean and covariance matrix
Σ. The scalar random variable s is assumed
to be positive with finite second moment and
independent of u. If x denotes a random vector
having a multivariate normal mixture distribution,
our definition leads to the following
equation:

$$x = s \cdot u \qquad (14)$$

Since normal mixture distributions inherit the 
correlation matrix of the multivariate normal 
distribution, we have the following relation¬ 
ship: 

$$\mathrm{Corr}(x_i, x_k) = \mathrm{Corr}(u_i, u_k) \qquad (15)$$

The random vector x will have multivariate 
t distribution with v degrees of freedom if the 
scalar random variable s is defined as below:

$$s = \sqrt{\frac{v}{\omega}} \qquad (16)$$

In equation (16), ω is a chi-square distributed
random variable with v degrees of freedom.
For v > 2, the resulting Student's t distribution
will have zero mean vector and covariance
matrix (v/(v − 2))Σ. The Student's t distribution has
the property that as v increases, the distribu¬ 
tion approaches a normal distribution. In fact, 
for values of v greater than 25, it is difficult to 
distinguish between a normal distribution and 
t distribution. In a multivariate setting, as v de¬ 
creases, the degree of tail dependence between 
the random variables will increase. For finan¬ 
cial time series, v is typically around 4 (Platen 
and Sidorowicz, 2007). 

An important distinction between the t
distribution and the normal distribution is
that uncorrelated components of a multivariate
normal vector are mutually independent, whereas the
components of a multivariate t vector are in general dependent even if
they are uncorrelated. In modeling credit risk,
this property makes it possible to capture co¬ 
movements of asset returns between firms in 
extreme market situations even if the asset re¬ 
turns exhibit little or no correlation under nor¬ 
mal market conditions. 
In the univariate case, the probability density
function of the Student's t distribution with v
degrees of freedom has the following functional
form:

$$f_v(x) = \frac{\Gamma((v+1)/2)}{\sqrt{v\pi} \times \Gamma(v/2)} \times \left(1 + \frac{x^{2}}{v}\right)^{-(v+1)/2} \qquad (17)$$

In equation (17), Γ(·) is the gamma function,
which is given by

$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\,dx \qquad (18)$$


Loss Simulation Under Multivariate t Distribution

The steps involved in simulating the credit loss
distribution when asset returns are multivariate
t follow the same procedure as for the multivariate
normal case. Instead of generating the
sequence of correlated asset returns from a multivariate
normal distribution, we now have to
generate this sequence from a multivariate t distribution.
The next step will involve inferring
the credit rating change of the various obligors
in the portfolio as implied by the simulated
asset return vector. To do this, we need to determine
the thresholds against which the asset
returns will be compared to identify rating
changes or obligor default. These z-thresholds
have to be calibrated to correspond to the Student's
t distribution. Specifically, the integrand
for computing the z-thresholds will be the
Student's t density function.

For purposes of illustration, let us consider an
obligor that has a current credit rating of A1.
Let p_{A1,Aaa} denote the probability of transitioning
to the credit rating Aaa. Under the assumption
that the asset returns of the obligor are
t-distributed, the credit event that signals the
obligor rating migration from A1 to Aaa will
occur when the asset returns of the obligor exceed
the threshold z_{A1,Aaa}. This threshold can
be determined by solving the following integral
equation:

$$p_{A1,Aaa} = \frac{\Gamma((v+1)/2)}{\sqrt{v\pi} \times \Gamma(v/2)} \int_{z_{A1,Aaa}}^{\infty} \left(1 + \frac{x^{2}}{v}\right)^{-(v+1)/2} dx \qquad (19)$$
A rating transition of this obligor from A1 to
Aa1 will occur if the asset return falls between
the thresholds z_{A1,Aaa} and z_{A1,Aa1}. The threshold
z_{A1,Aa1} can be determined by solving the following
integral equation:

$$p_{A1,Aa1} = \frac{\Gamma((v+1)/2)}{\sqrt{v\pi} \times \Gamma(v/2)} \int_{z_{A1,Aa1}}^{z_{A1,Aaa}} \left(1 + \frac{x^{2}}{v}\right)^{-(v+1)/2} dx \qquad (20)$$
One can extend this sequential rule to deter¬ 
mine the thresholds for migrating to other rat¬ 
ing grades. Table 1 shows the z-threshold values 
computed using the rating transition probabil¬ 
ities for A2- and A3-rated obligors when the 
asset returns are t-distributed.
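
The sequential rule for the thresholds can be sketched without any external library by integrating the Student's t density numerically and inverting the resulting distribution function by bisection. Everything below is an illustrative assumption rather than the procedure used to build Table 1: the transition probabilities are hypothetical, the function names are ours, and the integral is evaluated after the substitution x = sqrt(v) tan(θ), which maps it to a bounded interval.

```python
import math
from statistics import NormalDist


def _signed_mass(theta, v, n=2000):
    """Integral of the t density from 0 to sqrt(v) * tan(theta).

    The substitution x = sqrt(v) * tan(t) turns the integrand into
    const * cos(t) ** (v - 1) on a bounded interval, which composite
    Simpson's rule (n even) handles well for v >= 1.
    """
    const = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(math.pi)
    h = theta / n
    total = 1.0 + math.cos(theta) ** (v - 1)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * math.cos(j * h) ** (v - 1)
    return const * total * h / 3


def t_cdf(x, v):
    """Distribution function of Student's t with v degrees of freedom."""
    return 0.5 + _signed_mass(math.atan(x / math.sqrt(v)), v)


def t_quantile(p, v, iterations=60):
    """Invert the CDF by bisection on the angle theta in (-pi/2, pi/2)."""
    lo, hi = -math.pi / 2, math.pi / 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if 0.5 + _signed_mass(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(v) * math.tan(0.5 * (lo + hi))


def upgrade_thresholds(p_best, p_next, v):
    """Thresholds for the best grade and the grade just below it.

    p_best is the upper-tail mass above the first threshold; p_next is the
    mass between the second threshold and the first.
    """
    z_best = t_quantile(1.0 - p_best, v)
    z_next = t_quantile(1.0 - p_best - p_next, v)
    return z_best, z_next


# Hypothetical transition probabilities for an A1-rated obligor, v = 4.
z_a1_aaa, z_a1_aa1 = upgrade_thresholds(p_best=0.0006, p_next=0.0030, v=4)
print(z_a1_aaa, z_a1_aa1)
```

For large v the thresholds approach those of the normal distribution, while for small v the same transition probabilities imply thresholds much further out in the tail.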

The rest of this section discusses the proce¬ 
dure to generate a sequence of random vectors 
from a multivariate t distribution. Following the
discussion earlier in this entry, a random vector
with multivariate t distribution having v degrees
of freedom can be derived by combining
a chi-square random variable with v degrees
of freedom and a random vector that is normally
distributed and independent of the chi-square
random variable. This procedure will
allow us to generate a sequence of multivariate
t-distributed random vectors with v degrees of
freedom. 

To generate a sequence of chi-square dis¬ 
tributed random variables, the standard pro¬ 
cedure is to make use of the relationship 
between chi-square distribution and gamma 
distribution. A random variable x is said to have
a gamma distribution if its density function is defined
as below:

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (21)$$

In equation (21), α > 0 and β > 0 are the parameters
of the gamma distribution, and Γ(α)
is the gamma function given by equation (18).
The chi-square distribution with v degrees of
freedom is a special case of the gamma distribution
with parameter values α = v/2 and β = 2.

Given the above relationship between gamma 
and chi-square distribution, a sequence of ran¬ 
dom variables having chi-square distribution 
with v degrees of freedom can be generated 
by sampling from a gamma distribution with 
parameter values α = v/2 and β = 2. Most
standard software packages provide routines to 
generate random sequences from a gamma dis¬ 
tribution. Hence, we will not discuss the details 
concerned with generating such a sequence of 
random variables. 

To summarize, the following are the steps 
involved in generating an n-dimensional se¬ 
quence of multivariate t distributed random 
variables with v degrees of freedom. 

Step 1: Compute the Cholesky factor L of the 
matrix C where C is the n x n asset return 
correlation matrix. 

Step 2: Simulate n independent standard normal
random variates z_1, z_2, ..., z_n and set
u = Lz.

Step 3: Simulate a random variate ω from a
chi-square distribution with v degrees of
freedom that is independent of the normal
random variates and set s = √(v/ω).

Step 4: Set x = s · u, which represents the desired
n-dimensional t variate with v degrees
of freedom and correlation matrix C.

Repeating steps 2 to 4 will allow us to generate
the sequence of multivariate t-distributed
random variables.
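
The four steps can be sketched with the Python standard library alone. The correlation matrix, the degrees of freedom, the seed, and all function names are illustrative assumptions. The chi-square variate is drawn as a gamma variate with shape v/2 and scale 2 through `random.gammavariate`, and a small hand-written Cholesky routine stands in for a library call.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L times L-transpose equal to the input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def multivariate_t_draw(lower, v, rng):
    """Steps 2 to 4: one n-dimensional t variate with v degrees of freedom."""
    n = len(lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]          # step 2: normals
    u = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    omega = rng.gammavariate(v / 2.0, 2.0)               # step 3: chi-square(v)
    s = math.sqrt(v / omega)
    return [s * u_i for u_i in u]                        # step 4: x = s * u


def sample_correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


C = [[1.0, 0.3], [0.3, 1.0]]   # assumed asset return correlation matrix
L = cholesky(C)                # step 1
rng = random.Random(42)        # arbitrary seed
draws = [multivariate_t_draw(L, v=8, rng=rng) for _ in range(20_000)]
rho_hat = sample_correlation([d[0] for d in draws], [d[1] for d in draws])
print(rho_hat)
```

The example uses v = 8 rather than a smaller value only because the sample correlation settles down faster when the fourth moment of the distribution is finite; the generator itself works for any positive v.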

Computing the credit loss for the two-bond 
portfolio in Table 2 will require comparing the 
asset return values under each simulation run 
against the z-thresholds given in Table 1 to trig¬ 
ger rating migrations and defaults for the oblig¬ 
ors in the two-bond portfolio. On the basis of the 
implied rating changes assigned to the obligors
using simulated asset returns, the credit loss for
each simulation run can be calculated. The rest 
of the steps involved in computing the credit 
risk measures of interest from the simulated 
loss distribution are identical to the ones for 
the normal distribution case. 

KEY POINTS 

• Monte Carlo methods provide a flexible tool 
to simulate credit loss distribution and are 
relatively simple to implement. 

• To simulate the credit loss distribution under 
the rating migration mode, rating transition 
probabilities have to be transformed into cor¬ 
responding z-thresholds for the assumed dis¬ 
tribution function for the asset returns. 

• Simulating multivariate t random vectors re¬ 
quires appropriately scaling the sequence of 
multivariate normal vectors by another sequence
of chi-square random variables that
are independent of the normal random
vectors.

• From the simulated loss distribution, various 
tail risk measures of interest can be computed. 

• Using techniques such as importance sam¬ 
pling can significantly reduce the standard 
errors of tail risk measures for a given num¬ 
ber of simulation runs. 

Abstract: Extensive empirical research has shown that the spread volatility of credit securities is
linearly proportional to their level of spread. This finding holds true across corporate and sovereign 
issuers, for both cash and credit default swaps. A superior measure of spread risk for credit securities 
is the product of spread duration and spread, a measure referred to as duration times spread (DTS). 
DTS measures the sensitivity of the price of a bond to relative changes in spread, which are much 
more stable through time and cross-sectionally than absolute spread volatilities. DTS allows for 
better risk projection, hedging, replication, and portfolio construction. 


The traditional presentation of the asset al¬ 
location in a portfolio or a benchmark is in 
terms of percentage of market value. For fixed- 
income portfolios, it is widely recognized that 
this is not sufficient, as differences in durations 
can cause two portfolios with the same mar¬ 
ket weight allocations to have very different ex¬ 
posures to macro-level risks. Market practices 
have evolved to address this issue. A common 
approach to structuring a fixed-income portfo¬ 
lio or comparing it to a benchmark is to partition 
it into homogeneous market cells comprised 
of securities with similar characteristics. Many 
fixed-income portfolio managers have become 


accustomed to expressing their cell allocations 
in terms of contributions to duration—the prod¬ 
uct of the percentage of portfolio market value 
in a given market cell and the average duration 
of securities comprising that cell. This repre¬ 
sents the sensitivity of the portfolio to a par¬ 
allel shift in yields across all securities within 
this market cell. For credit portfolios, the cor¬ 
responding measure would be contributions to 
spread duration, measuring the sensitivity to a 
parallel shift in spreads. Determining the set of 
active spread duration bets to different market 
cells and issuers is one of the primary decisions 
made by credit portfolio managers. 

Yet not all spread durations are created
equal. Just as one could create a portfolio
that matches the benchmark exactly by mar¬ 
ket weights, but clearly takes more credit risk 
(e.g., by investing in the longest duration cred¬ 
its within each cell), one could match the bench¬ 
mark exactly by spread duration contributions 
and still take more credit risk—by choosing 
the securities with the widest spreads within 
each cell. These bonds presumably trade wider 
than their peer groups for a reason—that is, the 
market consensus has determined that they are 
more risky—and they are often referred to as 
high beta, because their spreads tend to react 
more strongly than the rest of the market to 
a systematic shock. We found strong empiri¬ 
cal evidence that this relation takes on a nearly 
perfect linear form: Spread changes are linearly 
proportional to spread levels at the start of the 
period. 

Based on the linear relation between spread 
level and the volatility of spread changes, we 
have advocated since 2005 a new measure of 
risk sensitivity that utilizes spreads as a funda¬ 
mental part of the credit portfolio management 
process. To reflect the view that higher spread 
credits represent greater exposures to sector- 
specific risks, we represent sector exposures 
by contributions to duration times spread (DTS), 
computed as the product of market weight, 
spread duration, and spread. For example, an 
overweight of 5% to a market cell implemented 
by purchasing bonds with a spread of 80 basis 
points (bps) and spread duration of three years 
would be equivalent to an overweight of 3% us¬ 
ing bonds with an average spread of 50 bps and 
spread duration of eight years. 
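
The equivalence claimed in this example is a matter of simple arithmetic, which the sketch below checks. The function name is ours; weights are entered as decimals and spreads in basis points.

```python
import math


def dts_contribution(weight, spread_duration, spread_bps):
    """Contribution to DTS: market weight x spread duration x spread."""
    return weight * spread_duration * spread_bps


# The two overweights compared in the text.
short_wide = dts_contribution(weight=0.05, spread_duration=3.0, spread_bps=80.0)
long_tight = dts_contribution(weight=0.03, spread_duration=8.0, spread_bps=50.0)

# Contributions to spread duration alone differ: 0.15 versus 0.24 years.
sd_short_wide = 0.05 * 3.0
sd_long_tight = 0.03 * 8.0
print(short_wide, long_tight, sd_short_wide, sd_long_tight)
```

The two positions have the same contribution to DTS even though their contributions to spread duration are quite different.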

The shift from spread duration exposures to 
DTS exposures as the measure of market risk 
sensitivity embraces a different paradigm for 
credit spread movement—in the form of rela¬ 
tive spread changes rather than parallel shifts in 
spread. The introduction of the DTS paradigm 
was motivated by an extensive empirical study 
using over 560,000 monthly observations of individual
corporate bond spreads, spanning the
period of September 1989 to January 2005. 1 The
analysis showed that changes in spreads are 
not parallel, but rather depend on the level 
of spread. Specifically, spread change volatility 
(both systematic and idiosyncratic) was shown 
to be linearly proportional to spread level for 
both investment-grade and high-yield credit se¬ 
curities, irrespective of the sector, duration, or 
time period. Subsequent studies indicated that 
the results were not confined to the realm of 
U.S. corporate bonds, but also extend to other 
spread asset classes with a significant default 
risk such as credit default swaps, European 
corporate and sovereign bonds, and emerg¬ 
ing market sovereign debt denominated in U.S. 
dollars. 2 Furthermore, even from a theoretical 
standpoint structural credit risk models such as 
Merton (1974) imply a near-linear relation be¬ 
tween spread level and volatility. 3 

The DTS concept has many implications for 
portfolio managers, both in terms of the way 
they manage exposures to industry and credit 
quality factors (systematic risk) and in terms of 
their approach to issuer exposures (nonsystematic
risk). After a short review of the DTS concept
and the empirical evidence supporting it,
we discuss how it can help investors improve 
projected risk estimates, hedging, replication, 
and portfolio construction. 

THE DTS CONCEPT 

To understand the intuition behind DTS, consider
the return, $R_{\text{spread}}$, due strictly to a change
in spread. Let $D$ denote the spread duration of a
bond and $s$ its spread; the spread change return
is then:

$$R_{\text{spread}} = -D \cdot \Delta s \qquad (1)$$

Or, equivalently,

$$R_{\text{spread}} = -D \cdot s \cdot \frac{\Delta s}{s} \qquad (2)$$

That is, just as spread duration is the sensitivity
to an absolute change in spread (e.g., spreads
widen by 5 bps), DTS ($D \cdot s$) is the sensitivity to
a relative change in spread (e.g., spreads widen
by 5%). Note that this notion of relative spread
change provides for a formal expression of the 
idea mentioned earlier—that credits with wider 
spreads are riskier since they tend to experience 
greater spread changes. 
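
As a quick numerical check that the absolute and relative forms agree, consider a hypothetical bond (values assumed purely for illustration) with $D = 5$ years and $s = 100$ bps whose spread widens by 5 bps, that is, by 5% of its level. Using the usual convention that a widening spread produces a negative return:

```latex
\begin{aligned}
R_{\text{spread}} &= -D \cdot \Delta s = -5 \times 0.05\% = -0.25\% \\
R_{\text{spread}} &= -(D \cdot s) \cdot \frac{\Delta s}{s} = -(5 \times 1\%) \times 5\% = -0.25\%
\end{aligned}
```

The two expressions differ only in how the spread move is quoted; the DTS of 5% plays the role that the spread duration of 5 plays in the absolute form.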

In the absolute spread change approach
shown in equation (1), we can see that the volatility
of excess returns can be approximated by

$$\sigma_{\text{return}} \approx D \cdot \sigma_{\text{spread}}^{\text{absolute}} \qquad (3)$$

while in the relative spread change approach of
equation (2), excess return volatility follows

$$\sigma_{\text{return}} \approx D \cdot s \cdot \sigma_{\text{spread}}^{\text{relative}} \qquad (4)$$

Given that the two representations above are 
equivalent, why should one of them be prefer¬ 
able to another? The key advantage of model¬ 
ing changes in spreads in relative terms is the 
resulting stability. The above equations, for sim¬ 
plicity, present returns and volatilities as ideal¬ 
ized concepts. We have not added subscripts 
to specify whether we are referring to specific 
securities or sectors, or over what time period. 
Yet the way spread changes of different securi¬ 
ties relate to each other, or the way volatilities 
in one time period relate to those in another, 
can be of critical importance in measuring and 
controlling portfolio risk. 

For example, to determine a portfolio's expo¬ 
sure to a systematic widening of spreads, one 
needs to know how spread changes are likely to 
be realized across a sector. If one is concerned 
that spreads might move in parallel, then expo¬ 
sures should be measured as the overall contri¬ 
bution to spread duration as per equation (1). 
However, if spreads tend to change proportion¬ 
ally, then the contribution to DTS provides the 
correct exposure to such an event. 
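
The two exposure measures can be compared on a small hypothetical portfolio. This is a sketch under stated assumptions: the positions, function names, and the sign convention (widening spreads give negative returns) are illustrative and not taken from the study.

```python
# Each position: (market weight as a fraction, spread duration in years, spread in bps)
positions = [
    (0.6, 4.0, 50.0),
    (0.4, 4.0, 200.0),
]

def contribution_to_spread_duration(positions):
    """Sum of weight x spread duration: exposure to a parallel spread shift."""
    return sum(w * d for w, d, _ in positions)

def contribution_to_dts(positions):
    """Sum of weight x spread duration x spread: exposure to a relative spread shift."""
    return sum(w * d * s for w, d, s in positions)

def return_parallel(positions, shift_bps):
    """Return in bps if every spread widens by the same number of bps."""
    return -contribution_to_spread_duration(positions) * shift_bps

def return_proportional(positions, relative_change):
    """Return in bps if every spread widens by the same fraction of its level."""
    return -contribution_to_dts(positions) * relative_change

print(return_parallel(positions, 10.0))      # all spreads +10 bps
print(return_proportional(positions, 0.10))  # all spreads +10% of their level
```

Under the proportional scenario the 200 bps position dominates the loss, which the contribution to spread duration alone would not reveal.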

Similarly, volatility can be measured or pro¬ 
jected in many different ways. Historically 
realized volatilities can be measured using 
observed spread changes at a specified fre¬ 
quency over a given sample period. Projec¬ 
tions of forward-looking volatilities are the 
key building blocks of risk management systems.
The accuracy with which historically realized
volatilities can project future volatilities
is therefore of fundamental importance. If rel¬ 
ative spread volatilities can be predicted with 
greater accuracy than absolute spread volatili¬ 
ties, then equation (4) should be preferred over 
(3). We found this to be the case, based on ex¬ 
tensive empirical evidence from credit markets. 


DTS AS BETA-ADJUSTED 
SPREAD DURATION 

What are the dynamics of credit spread 
changes? Do spreads tend to widen in paral¬ 
lel, or do wider spreads widen by more? Fig¬ 
ure 1 shows a specific example in which spread 
changes show a clear dependence on spread. 
The figure shows the changes in spreads expe¬ 
rienced by key issuers in the Communications 
sector of the Barclays Capital Corporate Index 
in January 2001, during a temporary rally in 
the midst of the dot-com crisis. It is clear that 
this sector-wide rally was not characterized by 
a purely parallel shift; rather, issuers with wider 
spreads tightened by more. 

Certainly, not all spread changes follow such 
a clear pattern. In many months, there are 
no large industry-wide spread changes, and 
spread changes are mostly idiosyncratic in na¬ 
ture. Occasionally, an industry will experience 
a systematic spread change that does seem to 
take the form of a parallel shift. However, an ex¬ 
tensive set of regressions using individual bond 
spread changes across eight distinct market sec¬ 
tors and 185 months indicated that systematic 
factors expressed in terms of relative spread 
changes across an industry were able to capture 
nearly twice as much of the overall spread vari¬ 
ance as factors based on parallel shifts in indus¬ 
try spreads. Furthermore, Ben Dor et al. (2007) 
found clear evidence that whenever a system¬ 
atic widening or tightening of spreads across 
an industry occurred, credits with the highest 
spreads at the beginning of the month were 
most likely to experience the largest change in 
spreads. 

Figure 1 Average Spreads and Spread Changes for Key Issuers in the Communications Sector of the
Barclays Capital Corporate Index (January 2001) 

Source: Barclays Capital. 


This idea may strike investors as reminiscent 
of the idea of "market beta" that is familiar 
from the capital asset pricing model (CAPM), 
in which the beta of a given security represents 
the extent to which it would be expected to par¬ 
ticipate in a market-wide rally or decline. Some 
credit market investors, in fact, have used mod¬ 
els of beta-adjusted spread duration to measure 
systematic risk exposures. The difficulty with this 
approach lies in estimating the betas. Empiri¬ 
cal betas can be backed out of historical data, 
for example, by regressing the spread changes 
realized by a given bond against the average 
spread changes of the sector. However, it is not 
clear how much historical data to use for this 
purpose—a short sample may not give a good 
statistical estimate, but a long sample may in¬ 
clude observations from a time when the se¬ 
curity had very different characteristics. From 
this viewpoint, we can offer another interpre¬ 
tation of DTS. Essentially, DTS can be viewed 
as an implementation of beta-adjusted spread 
duration, in which the betas are provided by 
the market in the form of spreads. The ra¬ 
tio of a given issuer's spread to the average 
spread for the industry gives its beta, or sen¬ 
sitivity, to a relative spread change across the 
industry. 
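
The spread-implied beta described above is simple to compute. The sketch below assumes a simple (unweighted) average of issuer spreads as the industry average; the text does not specify the weighting, and the issuer names and spreads are hypothetical.

```python
def dts_implied_betas(issuer_spreads_bps):
    """Beta of each issuer = issuer spread / average industry spread.

    Uses an unweighted average, which is an assumption of this sketch.
    """
    average = sum(issuer_spreads_bps.values()) / len(issuer_spreads_bps)
    return {name: spread / average for name, spread in issuer_spreads_bps.items()}

# Hypothetical industry with three issuers
betas = dts_implied_betas({"A": 100.0, "B": 200.0, "C": 300.0})
print(betas)
```

No historical regression window is needed: the betas update as soon as market spreads change.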


To demonstrate this, we carried out head-to- 
head tests of DTS versus empirical betas using 
weekly spread change data from the credit de¬ 
fault swap (CDS) market. 4 In the first test, we 
measured the empirical betas of each issuer's 
CDS with respect to its industry peer group. 
We then tested two different predictors for this 
beta—either the empirical beta from the prior 
period, or the ratio of issuer DTS to the indus¬ 
try average DTS as of the beginning of the pe¬ 
riod. In the second test, we set up long-short 
CDS trades between two issuers from within 
the same industry and investigated different 
approaches to setting up the hedge ratios so 
as to minimize the systematic risk exposures of 
the trades. The DTS approach was found to be 
superior to empirical betas for both tasks. 


THE RELATION BETWEEN 
SPREAD VOLATILITY AND 
SPREAD LEVEL 

We now turn our attention to the dependence
of spread change behavior on spread level.
Figure 2 plots the relation between systematic
spread volatility and spread level using over
15 years of monthly spread change data from
U.S. credit markets, spanning investment-grade
and high-yield rated bonds. The bonds in the
Barclays Capital indexes for these markets were
partitioned each month by sector, quality, and
spread level. The average spread level for
each market cell is plotted against the time-series
volatility of the average absolute spread
changes in each month. The results suggest that
spread volatility can be closely approximated
by a simple linear model of the form

$$\sigma_{\text{spread}}^{\text{absolute}}(s) \approx \theta \cdot s \qquad (5)$$

Figure 2 Systematic Spread Volatility versus Spread Level

Note: Based on monthly observations of bonds rated Aaa to B in the Barclays Capital Corporate and
High-Yield Indexes, September 1989-January 2005.

Source: Barclays Capital.

This simple model provides an excellent fit
to the data shown in Figure 2, with $\theta$ equal to
9.4% irrespective of sector or maturity. Hence,
the results suggest that the historical volatil¬ 
ity of systematic spread movements can be 
expressed quite compactly, in terms of a rela¬ 
tive spread change volatility of about 9% per 
month. That is, spread volatility for a market
segment trading at 50 bps should be about
4.5 bps/month, while that of a market segment
at 200 bps should be about 18 bps/month. Ben
Dor et al. (2007) documented a similar pattern
for idiosyncratic volatility: The cross-sectional
volatility of credit spread changes across a sector
also exhibits a linear dependence on spread
with about the same slope.
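
The linear model can be written as a one-line function. This is an illustrative sketch: the function name is hypothetical, and the default slope uses the rounded figure of about 9% per month quoted above rather than the fitted 9.4%.

```python
def absolute_spread_vol_bps(spread_bps, theta=0.09):
    """Linear model: monthly absolute spread volatility = theta x spread level."""
    return theta * spread_bps

for spread in (50.0, 200.0):
    print(spread, absolute_spread_vol_bps(spread))
```

With the rounded slope this reproduces the figures in the text: roughly 4.5 bps/month at 50 bps and 18 bps/month at 200 bps.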

The results in Figure 2 suggest that spread
volatility measured in relative terms should be
much more stable than absolute spread volatility,
and should therefore form the basis for more
accurate projections of forward-looking risk. The
advantage of using relative spread volatility 
should be particularly strong in the event of 
a market crisis. If we plot the absolute spread 
volatilities of various assets in the postcrisis 
period against their precrisis volatilities, we 
will find a marked increase across the board. 
Essentially, market data from the earlier pe¬ 
riod becomes useless for estimating risk in the 
postcrisis world. However, if we work with 
relative spread volatilities, we may find that 
they have not changed that much. The abso¬ 
lute spread volatility increases proportionally 
with the spread level, and the relative spread 
volatility remains stable. This relationship is 
illustrated in Figure 3 using data from U.S. 
credit markets in the period before and after the 
Russian crisis of 1998. 

Figure 3 Absolute and Relative Spread Change Volatility before and after 1998

Source: Barclays Capital.

Two clear phenomena can be observed in Figure 3.
First, as discussed above, most of the
observations representing absolute spread
volatilities are located far above the diagonal, 
pointing to an increase in volatility in the sec¬ 
ond period of the sample. In contrast, rela¬ 
tive spread volatilities are quite stable, with 
almost all observations located on the 45-degree 
line or very close to it. This is because the 
pickup in volatility in the second period was 
accompanied by a similar increase in spreads. 
Second, the relative spread volatilities of var¬ 
ious sectors are quite tightly clustered, rang¬ 
ing from 5%/month to a bit over 10%/month, 
whereas the range of absolute volatilities is 
much wider, ranging from 5 bps/month to
more than 20 bps/month.

The results in Figure 3 exhibit the sharp dis¬ 
continuity in credit market volatility that was 
experienced in 1998 due to the Russian crisis 
and the LTCM hedge fund failure. Since the 
introduction of DTS, global markets have pro¬ 
vided us with ample opportunity to test the 
model with data from new out-of-sample crises. 
In both the credit crisis of 2007-2009 and the 
ensuing sovereign crisis that began in 2009, we 
have found that the DTS model has performed
admirably. In each case, a plot of precrisis vs.
postcrisis volatility reveals results similar to 
Figure 3, showing the stability advantage of rel¬ 
ative spread volatilities. The incorporation of 
spread into the projection of risk was shown to 
keep risk projections much more accurate than 
traditional absolute volatility risk measures. 5 

These results clearly indicate that absolute 
spread volatility is highly unstable and tends to 
rise with increasing spread. Computing volatil¬ 
ities based on relative spread change generates 
a more stable time series. These findings have 
important implications for the appropriate way 
of measuring credit exposures and projecting 
excess return volatility, which we discuss next. 


DTS AND EXCESS RETURN 
VOLATILITY 

If the volatility of both systematic and idiosyn¬ 
cratic spread changes is proportional to the level 
of spread, then equation (4) suggests two as¬ 
sertions regarding excess returns. First, excess 
return volatility should increase linearly with 
DTS, where the slope represents the volatility
of relative spread changes. Second, the mag¬ 
nitude of excess return volatility should be 
approximately equal across portfolios with sim¬ 
ilar DTS values, irrespective of their spread and 
spread duration characteristics. The results of 
Ben Dor et al. (2007) strongly supported both 
empirical predictions. 

Another implication of the linear relation be¬ 
tween spread level and spread volatility is that 
projecting volatility based on the current level 
of spread and the DTS slope from Figure 2 
should be superior to using historical realiza¬ 
tions of absolute spread changes. Specifically, 
using the product of DTS and the historical 
volatility of relative spread changes should gen¬ 
erate better risk estimates than the product of 
spread duration and volatility of past absolute 
spread changes. 

Our results confirmed that the DTS-based es¬ 
timator was superior. A further indication that 
the DTS-based risk projection was more accu¬ 
rate is that it resulted in a smaller number of 
extreme realizations (above or below two stan¬ 
dard deviations) than either of two estimators 
based on absolute spread volatility, using trail¬ 
ing windows of two different lengths. 

Our understanding of these results is that 
the approach based on relative spread change 
volatility is able to give a more timely risk pro¬ 
jection since it can react almost instantaneously 
to a change in market conditions reflected in 
the spread of the security. This should help the 
model react more quickly both to increase risk 
estimates at the onset of a crisis and to relax 
them once the turbulence subsides. Any sig¬ 
nificant widening or tightening of spreads will 
immediately flow through the DTS into the pro¬ 
jection of excess return volatility. 
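
The timeliness argument can be illustrated with a hypothetical bond whose spread doubles. This is a sketch only: the bond, the 9% relative volatility (the rounded figure quoted earlier), and the trailing absolute volatility estimate are all assumed values.

```python
def vol_from_dts(spread_duration, spread_bps, relative_vol):
    """Relative approach: D x s x relative spread volatility (bps/month)."""
    return spread_duration * spread_bps * relative_vol

def vol_from_absolute(spread_duration, absolute_vol_bps):
    """Absolute approach: D x absolute spread volatility (bps/month)."""
    return spread_duration * absolute_vol_bps

RELATIVE_VOL = 0.09         # assumed: about 9% per month
TRAILING_ABS_VOL_BPS = 9.0  # assumed: estimated while the spread was 100 bps

# Hypothetical bond with 5 years of spread duration; spread jumps from 100 to 200 bps.
before = vol_from_dts(5.0, 100.0, RELATIVE_VOL)
after = vol_from_dts(5.0, 200.0, RELATIVE_VOL)
stale = vol_from_absolute(5.0, TRAILING_ABS_VOL_BPS)

print(before, after, stale)
```

The DTS-based projection doubles as soon as the spread doubles, whereas the absolute estimator stays at its precrisis level until enough new observations enter the trailing window.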


IMPLICATIONS OF DTS FOR 
PORTFOLIO MANAGERS 

We have highlighted above the key points that 
emerge from the empirical evidence supporting
the DTS paradigm. Spread changes are proportional
to the level of spread. Systematic changes
in spread across a sector tend to follow a pattern 
of relative spread change, in which bonds trad¬ 
ing at wider spreads experience larger spread 
changes. The systematic spread volatility of a 
given sector (if viewed in terms of absolute 
spread changes) is proportional to the average 
spread in the sector; the nonsystematic spread 
volatility of a particular bond or issuer is pro¬ 
portional to its spread as well. Those findings 
hold irrespective of sector, duration, or time 
period. 

There are several implications for a portfo¬ 
lio manager who wishes to act on these re¬ 
sults. First, the best measure of exposure to 
a systematic change in spread within a given 
sector or industry is not the contribution to 
spread duration, but the contribution to DTS. 
At many asset management firms, the targeted 
active exposures for a portfolio relative to its 
benchmark are expressed as contribution-to- 
duration overweights and underweights within 
a sector by quality grid. Reports on the ac¬ 
tual portfolio follow the same format. In the 
relative spread change paradigm, managers 
would express their targeted overweights and 
underweights in terms of contributions to DTS 
instead. 

Second, our finding that the volatility of non¬ 
systematic return is proportional to DTS offers 
a simple mechanism for defining an issuer limit 
policy that enforces smaller positions in more 
risky credits. Many investors specify ad hoc 
weight caps by credit quality to control issuer 
risk. Alternatively, we can set a limit on the 
overall contribution to DTS for any single issuer.
For example, say the product of market
value percentage × spread (in percent) × duration
(in years) must be 5 or less. Then, a position in
issuer A, with a spread of 100 bps and a duration
of five years, could be up to 1% of portfolio market
value, while a position in issuer B, with a spread of
150 bps and an average duration of 10 years, would
be limited to 0.33%. Issuer limits based on DTS
and those based on market weight each have
their advantages and disadvantages; investors
might want to consider some combination of
the two. 6
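
The issuer limit in this example can be turned into a maximum position size. A minimal sketch, assuming the limit is applied to market weight (in percent) × spread (in percent) × duration (in years); the function name is hypothetical.

```python
def max_weight_pct(dts_limit, spread_bps, duration_years):
    """Largest market-value weight (in %) such that
    weight(%) x spread(%) x duration(years) <= dts_limit."""
    spread_pct = spread_bps / 100.0
    return dts_limit / (spread_pct * duration_years)

issuer_a = max_weight_pct(5.0, spread_bps=100.0, duration_years=5.0)
issuer_b = max_weight_pct(5.0, spread_bps=150.0, duration_years=10.0)

print(round(issuer_a, 2), round(issuer_b, 2))
```

The cap tightens automatically as an issuer's spread widens, which is the behavior that ad hoc caps by rating try to approximate.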

Third, DTS can help improve the hedging 
of security-vs.-security or security-vs.-market 
credit trades. Say a hedge fund manager has 
a view on the relative performance of two is¬ 
suers within the same industry and would like 
to capitalize on this view by going long issuer 
A and short issuer B in a market-neutral man¬ 
ner. How do we define market neutrality? A 
typical approach might be to match the dol¬ 
lar durations of the two bonds, or to go long 
and short CDS of the same maturities with the 
same notional amounts. However, if issuer A 
trades at a wider spread than issuer B, our re¬ 
sults would indicate that a better hedge against 
market-wide spread changes would be ob¬ 
tained by using more of issuer B, so as to match 
the contributions to DTS on the two sides of 
the trade. 
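
A DTS-matched hedge ratio can be sketched as follows. The function name and the example figures are hypothetical, and notional is used as a stand-in for market value, which is an assumption of this sketch.

```python
def dts_neutral_short_notional(long_notional, long_duration, long_spread_bps,
                               short_duration, short_spread_bps):
    """Short notional that matches the DTS contribution of the long leg."""
    long_dts = long_duration * long_spread_bps
    short_dts = short_duration * short_spread_bps
    return long_notional * long_dts / short_dts

# Long 10 (e.g., $10 million) of issuer A at 200 bps; hedge with issuer B at 100 bps.
# Both legs are assumed to have a spread duration of 5 years.
short_b = dts_neutral_short_notional(10.0, 5.0, 200.0, 5.0, 100.0)
print(short_b)
```

Matching notionals or dollar durations would short only 10 of issuer B here; matching DTS contributions calls for twice that amount because issuer A trades at twice the spread.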

Fourth, portfolio management tools such 
as risk and performance attribution models 
should represent sector exposures in terms of 
DTS contributions and sector spread changes 
in relative terms. A risk model for any asset 
class is essentially a set of factors that char¬ 
acterize the main risks that securities in that 
asset class are exposed to. The risk of an indi¬ 
vidual security or portfolio is computed based 
on its sensitivities to the various risk factors and 
the factor volatilities and correlations estimated 
from their past realizations. For credit-risky se¬ 
curities, traditional risk factors typically mea¬ 
sure absolute spread changes based on a sector 
by quality partition that spans the universe of 
bonds. A risk factor specification based instead 
on relative spread changes has two important 
benefits. First, such factors would exhibit more 
stability over time and allow better forward- 
looking risk forecasts. Second, the partition by 
quality would no longer be necessary to control 
risk, and each sector can be represented by a 
single risk factor. This would allow managers to 
express more focused views, essentially trading 
off the elimination of the quality-based factors
with a more finely grained partition by industry.
Similarly, a key goal for performance attribution
models is to match the allocation process
as closely as possible. If and when a manager 
starts to state allocation decisions in terms of 
DTS exposures, performance attribution should 
follow suit. 

One practical difficulty that may arise in the 
implementation of DTS-based models is an 
increased vulnerability to pricing noise. Any 
small discrepancies in asset pricing should 
cause only small discrepancies in market val¬ 
ues, but may potentially result in much larger 
variations in spreads. Consequently, managers 
who rely heavily on contribution-to-DTS expo¬ 
sures will need to implement strict quality con¬ 
trols on pricing. 

Perhaps one of the most useful applications of 
DTS is in the management of core-plus portfo¬ 
lios that combine both investment-grade and 
high-yield assets. Traditionally, investment- 
grade credit portfolios are managed based 
on contributions to duration, while high-yield 
portfolios are managed based on market value 
weights. Using contributions to DTS across both 
markets could help bring consistency to this 
portfolio construction process. Skeptics may 
point out that in high-yield markets, especially 
when moving toward the distressed segment, 
neither durations nor spreads are particularly 
meaningful, and the market tends to trade on 
price, based on an estimated recovery value. A 
useful property of DTS in that context is that 
in the case of distressed issuers, where shorter 
duration securities tend to have artificially high 
spreads, DTS is fairly constant across the matu¬ 
rity spectrum, so that managing issuer contri¬ 
butions to DTS becomes roughly equivalent to 
managing issuer market weights. 

The introduction of the DTS paradigm has 
had wide-ranging effects. It changed portfolio 
management practices across the industry and 
has been incorporated into some of the lead¬ 
ing portfolio management analytics systems. 
We view it as a fundamental insight into the 
behavior of credit markets. 

KEY POINTS

• Changes in credit spreads tend to be propor¬ 
tional to spread levels. 

• Volatility of relative spread changes is more 
stable than volatility of absolute spread 
changes. This applies to all credit securities 
with a default component including corpo¬ 
rate and sovereign issuers in developed and 
emerging market countries for both cash and 
derivatives. 

• Whereas spread duration measures sensitiv¬ 
ity to a parallel shift in spreads, DTS measures 
sensitivity to a relative change in spreads. 

• The risk associated with credit spread ex¬ 
posures can therefore be managed more ef¬ 
fectively using contributions to DTS than 
contributions to spread duration. This is true 
at the level of asset classes, industries, and 
individual issuers. 

• Including spread in the estimation of risk can 
reduce the need to rely on credit ratings, al¬ 
lowing risk models to provide greater indus¬ 
try detail. 

NOTES 

1. See Ben Dor, Dynkin et al. (2007). 

2. For example, see Ben Dor, Polbennikov, and 
Rosten (2007) and Ben Dor, Desclee, Hyman, 
and Polbennikov (2010). 


3. See Chapter 4 in Ben Dor et al. (2012). 

4. For details, see Chapter 8 in Ben Dor et al.
(2012).

5. See Ben Dor et al. (2012) for details. The application
of DTS to the modeling of European
sovereign risk is discussed in Chapter 3 of 
Ben Dor et al. (2012), and Chapter 10 reviews 
the performance of the model through the 
2007-2009 credit crisis. 

6. For a detailed discussion of different ap¬ 
proaches to issuer limits, see Chapter 11 in 
Ben Dor et al. (2012). 

Abstract: Credit spread decomposition refers to breaking down a bond's option-adjusted spread
to Treasuries into market-wide risk premium, expected default loss, and expected liquidity cost 
components. Credit spread decomposition is implemented empirically by regressing a bond's 
option-adjusted spread on a measure of its expected default losses (credit default swap spread) and 
expected liquidity cost. Credit spread decomposition can help investors determine the extent to 
which credit spreads reflect expected default losses, liquidity costs, or a market-wide risk premium. 
Investors can also apply spread decomposition analysis to construct targeted hedging strategies 
and to identify relative value opportunities. Regulators can use spread decomposition to monitor 
separately the liquidity and credit risk of the institutions they supervise, and to help determine 
capital adequacy. 


At issuance, a credit bond has a positive yield 
spread (i.e., a credit spread) over comparable- 
maturity Treasury bonds to compensate in¬ 
vestors for the chance that the bond may default 
with a recovery value less than par. However, 
studies have documented that credit spreads 
are generally much larger than justified by their 
subsequent default and recovery experience. 1 

Beyond expected default losses, a portion of 
the credit spread may reflect the expected liquid¬ 
ity cost to execute a roundtrip trade. This cost 
is typically greater for a credit bond than for a 
comparable-maturity Treasury bond. Investors 
who anticipate selling a credit bond at some 
point demand compensation for this cost at the 
time of purchase in the form of a wider spread. 
Another portion of the credit spread may re¬ 


flect a market-wide risk premium demanded by 
risk-averse investors due to the general uncer¬ 
tainty associated with the timing, magnitude, 
and recovery of defaults and the magnitude of 
liquidity costs. The greater the degree of this un¬ 
certainty, or the more risk-averse the marginal 
investor, the more the credit spread will exceed 
the expected default cost. Credit spread decomposi¬ 
tion refers to the econometric exercise of break¬ 
ing down a bond's option-adjusted spread 
(OAS) to Treasuries into its risk premium, ex¬ 
pected default loss, and expected liquidity cost 
components. 

Credit spread decomposition can serve many 
purposes. For example, suppose an insurance 
company, typically a buy-and-hold investor, is 
considering investing in credit bonds trading 
at wide spreads. The company's decision will
likely depend on whether the wide spreads are 
due to large expected default losses, high liq¬ 
uidity costs, or a high market risk premium. 
Presumably, the company can ride out periods 
of high liquidity cost and risk aversion. How¬ 
ever, if the wide spreads reflect high expected 
default losses, the company may decide not to 
invest. 

This entry begins with an example highlight¬ 
ing the ability of credit spread decomposition 
to reveal additional information hidden in a 
bond's OAS. Next, the entry outlines the speci¬ 
fication of the spread decomposition model and 
shows how it can be implemented. Following a 
discussion on how to interpret the model re¬ 
sults, the entry illustrates how they can be used 
in portfolio management applications. The en¬ 
try concludes with a discussion of some alterna¬ 
tive specifications of the spread decomposition 
model. 


REVEALING THE DRIVERS 
OF CREDIT SPREADS 

To illustrate the informational value of spread 
decomposition, consider the historical spread
behavior of a typical investment-grade bond.
As shown in Figure 1, the bond's OAS varied
over time. The figure also shows the
level of the issuer's credit default swap (CDS) 
spread—a measure of expected default losses. 
While movements in the bond's OAS loosely 
tracked changes in the issuer's CDS, there was a 
wide and variable gap between the two spreads, 
reflecting movements in risk premium and ex¬ 
pected liquidity costs. 

Figure 1 also plots the bond's expected liquidity
cost over the same period. To measure a
bond's liquidity cost, investors can use a bond's
bid-ask spread (in price terms) expressed as a
percentage of the bond's bid price. This cost is 
labeled as the bond's liquidity cost score (LCS) 
by Dastidar and Phelps (2009). Much of the vari¬ 
ability in the OAS-CDS spread gap (the CDS- 
cash basis) mirrored movements in the bond's 
LCS. The initial rise in the issuer's OAS was 
driven by both default and liquidity concerns 
(all three lines moved up), whereas the larger 
subsequent spike was mainly a liquidity event 
(the line plotting the LCS moved up sharply 
while the CDS line was little changed). This ex¬ 
ample illustrates that investors need to measure 
the components of OAS separately to more fully 
understand OAS movements. 
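
The liquidity cost measure described above can be sketched as a small function. It follows only the definition given in the text (bid-ask spread as a percentage of the bid price); the function name and the quotes are hypothetical, and the published LCS methodology may involve further adjustments not shown here.

```python
def liquidity_cost_score(bid_price, ask_price):
    """Bid-ask spread (in price terms) as a percentage of the bid price."""
    if bid_price <= 0 or ask_price < bid_price:
        raise ValueError("expected 0 < bid_price <= ask_price")
    return (ask_price - bid_price) / bid_price * 100.0

# Hypothetical quote: bid 99.00, ask 99.50
lcs = liquidity_cost_score(99.00, 99.50)
print(round(lcs, 3))  # roundtrip cost in percent of the bid price
```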


Figure 1 OAS, CDS, and LCS of a Typical Bond over Time (axes: OAS & CDS in bps; LCS in bps)

CREDIT SPREAD
DECOMPOSITION: MODEL 
SPECIFICATION AND 
IMPLEMENTATION 

To decompose credit spreads, a bond's OAS
is regressed against three variables: a variable
reflecting expected default cost, a variable
reflecting expected liquidity cost, and a market-wide
variable unrelated to the bond's attributes
representing the market-wide risk premium
demanded by investors. Conceptually, for every
time $t$, the cross-sectional OLS regression model
is:

$$\text{OAS}_{it} = \alpha_t + \beta_t\,\text{ExpectedDefaultCost}_{it} + \gamma_t\,\text{ExpectedLiquidityCost}_{it} + \mu_{it}$$

The risk premium variable (the intercept term,
$\alpha$) represents a market-level risk premium, not
a risk premium specific to each bond. The value 
of the intercept is likely, but not necessarily, to 
be positive, reflecting that equilibrium credit 
spreads are typically determined at the margin 
by risk-averse investors. 

Any bond-level risk premium is likely to be 
highly correlated with the bond's default cost or 
liquidity cost. In other words, an investor will 
demand a higher spread premium for a bond 
with a high liquidity cost as compensation for 
liquidity cost uncertainty. This makes it diffi¬ 
cult to decompose a bond's spread into separate 
expected liquidity cost and liquidity risk pre¬ 
mium components. The same applies to default 
cost and default risk premium. If default risk or 
liquidity risk premiums are highly correlated 
with default or liquidity costs, then the regression
coefficients ($\beta$ and $\gamma$) will be larger and/or
more significant. Any part of the risk premiums
that is unrelated to bond-level default
and liquidity cost—in other words, a market- 
level risk premium—will show up in the 
intercept. 

Credit spread decomposition is implemented
empirically by running the following regression
across a set of bonds (denoted by $i$) at a given
time $t$:

$$\text{OAS}_{it} = \alpha_t + \beta_t\,\text{CDS}_{it} + \gamma_t\,\text{LCS}_{it} + \mu_{it} \qquad (1)$$

The LCS is used to measure bond-level ex¬ 
pected liquidity cost. An issuer's CDS (with a 
similar spread duration as the bond) is used 
to measure its expected default cost (i.e., de¬ 
fault probability and loss given default). If the 
CDS itself is illiquid, it will contain some illiq¬ 
uidity premium, thereby distorting results. So, 
only liquid CDS should be chosen. While an 
issuer's CDS can be used to measure the ex¬ 
pected default cost of its bonds, other measures 
of expected default cost could be used in lieu 
of CDS. For example, some investors may use 
firm-specific fundamental information, equity 
prices, and macroeconomic data to estimate an 
issuer's default probability and recovery rate. 

To get a sense of the value of incorporating 
a bond-level liquidity variable to explain the 
cross-sectional distribution of spreads, an in¬ 
vestor can first estimate the model without LCS 
as an explanatory variable. The model can then 
be re-estimated adding LCS to see if the regres¬ 
sion's fit improves and does not detract from 
the explanatory power of CDS. If LCS is a use¬ 
ful explanatory variable, adding LCS as a re¬ 
gressor should produce an improvement in the 
adjusted $R^2$ and a significant (and positive) LCS
coefficient, with little disturbance to the signif¬ 
icance and magnitude of the CDS coefficient. 
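
The two-step comparison described above (CDS only, then CDS plus LCS) can be sketched with an ordinary least squares fit. Everything here is illustrative: the data are synthetic, the coefficients used to generate them are assumed values chosen for the demonstration, and `fit_ols` is a hypothetical helper rather than part of any published implementation.

```python
import numpy as np

def fit_ols(y, regressors):
    """Cross-sectional OLS with an intercept.

    Returns (coefficients, adjusted R^2); coefficients[0] is the intercept.
    """
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape  # k counts the intercept as a parameter
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef, adj_r2

# Synthetic cross-section of bonds at one date (all values in percent, assumed).
rng = np.random.default_rng(0)
n_bonds = 200
cds = rng.uniform(0.2, 3.0, n_bonds)
lcs = rng.uniform(0.1, 1.5, n_bonds)
oas = 0.30 + 0.67 * cds + 1.41 * lcs + rng.normal(0.0, 0.05, n_bonds)

coef_cds_only, adj_r2_cds_only = fit_ols(oas, [cds])
coef_full, adj_r2_full = fit_ols(oas, [cds, lcs])

print("CDS only:  adj R^2 =", round(adj_r2_cds_only, 3))
print("CDS + LCS: adj R^2 =", round(adj_r2_full, 3), "coefficients =", coef_full.round(2))
```

In this synthetic setting, adding LCS raises the adjusted $R^2$ and leaves the CDS coefficient close to its generating value, which is the pattern the text describes for a useful liquidity variable.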


INTERPRETING THE 
RESULTS OF THE CREDIT 
SPREAD DECOMPOSITION 
MODEL 

The estimated regression coefficients can be 
used to break down the average OAS into 
the three spread components in terms of basis 
points. For example, suppose the average OAS, 
CDS, and LCS are 2.09%, 1.14%, and 0.73%, re¬ 
spectively. In addition, suppose the estimated 
coefficients of CDS and LCS are 0.67 and 1.41,
respectively. A variable's contribution to the average
OAS can be determined by multiplying
the average value of the variable by its estimated
regression coefficient (e.g., 1.41 × 0.73%
is the contribution of LCS to the average OAS).
Repeating spread decomposition at different
time periods can show fluctuations in the relative
contributions to OAS of the three components
over time as shown in Figure 2.

Figure 2 Relative Contribution of Default Cost, Liquidity Cost, and Risk Premium over Time
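
The arithmetic of the decomposition example can be laid out explicitly. This sketch uses the averages and coefficients quoted in the text; treating the risk premium as the remainder assumes the averages are taken over the same bonds used in the regression, in which case an OLS fit with an intercept reproduces the average OAS exactly.

```python
# Averages (in percent) and estimated coefficients from the example in the text
avg_oas, avg_cds, avg_lcs = 2.09, 1.14, 0.73
beta_cds, gamma_lcs = 0.67, 1.41

default_component = beta_cds * avg_cds      # contribution of expected default cost
liquidity_component = gamma_lcs * avg_lcs   # contribution of expected liquidity cost
# Assumption: the intercept (market-wide risk premium) is the remainder.
risk_premium = avg_oas - default_component - liquidity_component

for name, value in [("default", default_component),
                    ("liquidity", liquidity_component),
                    ("risk premium", risk_premium)]:
    print(f"{name}: {value:.2f}% ({value / avg_oas:.0%} of average OAS)")
```

Under these inputs the default, liquidity, and risk premium components come to roughly 0.76%, 1.03%, and 0.30% of spread, respectively.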

When liquidity is abundant, LCS might not
play an important role in explaining spread differences
across bonds. In fact, adding LCS to
the regression may not meaningfully improve
the $R^2$. In contrast, when liquidity conditions
deteriorate, adding LCS to the regression will
likely improve the $R^2$.

As discussed, the regression intercept cap¬ 
tures the portion of (average) spread that is in¬ 
dependent of CDS and LCS. The market risk 
premium is likely to be, at times, an important 
contributor to the level of OAS. The time series
of the intercept can be used as an indicator of the variation of
the market risk premium—or risk aversion—in 
the credit market. When the intercept explains 
a relatively high proportion of OAS, this sug¬ 
gests that systematic market factors, rather than 
bond-specific factors, are driving spreads. This 
may occur because of very high levels of aggregate
risk aversion or because the market is pricing
bonds with little concern for issuer-specific
information. When the intercept explains a rela¬ 
tively low proportion of OAS, this suggests that 
bond-specific factors are driving spreads. 

The regression coefficients for both CDS and 
LCS are expected to be positive. While the rela¬ 
tionship of CDS with OAS is naturally tight, 
it may not be as close as one might think. 
Since default risk for high-grade bonds has been 
very low over long periods of time, a relatively 
large proportion of the OAS is likely liquidity- 
related. 


APPLICATIONS OF CREDIT 
SPREAD DECOMPOSITION 

The parameter estimates from the spread de¬ 
composition model can be used in a variety 
of portfolio management applications. Active 
portfolio managers can use spread decompo¬ 
sition to take positions in specific bonds with 
large liquidity or default components, depend¬ 
ing on their views about how these components 
are likely to evolve. Regulators can use spread 
decomposition to monitor separately the liq¬ 
uidity and credit risk embedded in the credit 
portfolios of the institutions they supervise,
which can help determine capital adequacy. 

Presented below is a discussion of two im¬ 
portant applications of credit spread decompo¬ 
sition: identifying bonds that may be trading 
"rich" or "cheap," and allowing the manager to 
construct hedges that target specific drivers of 
OAS fluctuations. 

Identifying Relative Value 

So far, credit spread decomposition analysis has 
been described using contemporaneous data to 
attribute OAS levels to default and liquidity 
cost components at a given time. However, in¬ 
vestors can apply spread decomposition analy¬ 
sis to ex ante investment decisions as well. 

In principle, spread decomposition should 
help identify relative value opportunities. A 
bond's OAS can be compared with its estimated 
OAS using the parameters from the spread de¬ 
composition model. If the actual OAS is wider 
than the estimated OAS, it suggests that the 
bond is trading too wide, and vice versa. This 
may be a signal that the bond's OAS may 
change to correct this "mispricing." 
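A minimal sketch of this comparison follows. The intercept and the bond's inputs are hypothetical, the coefficients reuse the illustrative values given earlier, and the mapping of a positive residual to "cheap" and a negative residual to "rich" follows the usual market convention that a wider spread means a lower price.

```python
def fitted_oas(alpha, beta, gamma, cds, lcs):
    """OAS implied by the decomposition model (all inputs in percent)."""
    return alpha + beta * cds + gamma * lcs


def relative_value_signal(actual_oas, alpha, beta, gamma, cds, lcs):
    """Return the model residual and a rich/cheap label for one bond."""
    residual = actual_oas - fitted_oas(alpha, beta, gamma, cds, lcs)
    if residual > 0:
        label = "cheap (trading wide of the model)"
    elif residual < 0:
        label = "rich (trading tight to the model)"
    else:
        label = "fair"
    return residual, label


# Hypothetical bond and intercept; coefficients from the earlier example.
residual, label = relative_value_signal(
    actual_oas=1.90, alpha=0.30, beta=0.67, gamma=1.41, cds=1.00, lcs=0.50)
print(f"residual = {residual:.3f}%  ->  {label}")
```

Whether such a residual actually predicts a later spread change is an empirical question.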

To examine whether the realized residuals, $\eta_{i,t}$,
from (1) can help predict future OAS changes,
one can examine whether the bond's future
OAS changes are of the opposite sign to the
sign of the residual by running the following
regression and testing to see if the $\theta_t$'s are
negative.

$$\Delta OAS_{i,t \to t+j} = \omega_t + \theta_t \eta_{i,t} + \delta_t \mathrm{MonthDummy}_t + \varepsilon_{i,t} \qquad (2)$$

Hedging a Credit Bond Portfolio 

One method to determine a hedge for a credit 
is to use regression to examine the historical 
relationship between the bond's OAS and po¬ 
tential hedge variables. The issuer's CDS may 
be an effective hedge targeted against changes 
in expected default losses. Since movements in 
the volatility index (VIX) are closely related to
changes in LCS,² VIX futures can potentially
be used as a credit hedging instrument to target
spread changes related to changes in liquidity.

If an investor seeks to hedge the default 
or liquidity components separately, then the 
contribution to OAS in basis points from 
the credit spread decomposition model (in 
differences—discussed below) determines the 
appropriate hedge ratio for each component. Of 
course, the success of such a hedge depends on 
the goodness of fit and whether the historical 
relationship will hold in the future. 

ALTERNATIVE CREDIT 
SPREAD DECOMPOSITION 
MODELS 

There are alternative formulations of the credit 
spread decomposition model. As discussed ear¬ 
lier, the analysis has ignored explicit bond-level 
risk premium variables. Instead, it assumes that 
any bond-level risk premium is highly related 
to either the expected liquidity cost or the ex¬ 
pected default cost. An alternative model can 
include a term representing a bond-level liq¬ 
uidity risk premium. This additional term re¬ 
flects compensation demanded by investors for 
the risk that the actual cost at liquidation may 
be different from the expected liquidity cost as 
measured by the current LCS. A bond's LCS 
volatility over the prior 12 months can be con¬ 
sidered a measure of liquidity risk. For example, 
two bonds may have the same LCS today, but 
bond A may have a much more volatile LCS his¬ 
tory than bond B. An investor may view bond 
A as having a riskier liquidity cost and demand 
an OAS premium versus bond B, all else equal. 

The equation below shows the spread
decomposition model incorporating a bond-level
liquidity risk factor, $LCSVol_{i,t}$. Generally, the
results may show that $LCSVol_{i,t}$ is highly
significant, but absorbs part of the effect of LCS,
thereby not improving the regression's adjusted
R² substantially.

$$OAS_{i,t} = \alpha_t + \beta_t CDS_{i,t} + \gamma_t LCS_{i,t} + \phi_t LCSVol_{i,t} + \delta_t \mathrm{MonthDummy}_t + \eta_{i,t} \qquad (3)$$

The credit spread decomposition model can
also be estimated in differences to check if 
changes in the liquidity and default proxies 
affect changes in OAS (i.e., contemporaneous 
returns). The regression model below details
the specification, where $\Delta OAS_{i,t}$, $\Delta CDS_{i,t}$, and
$\Delta LCS_{i,t}$ refer to changes in a bond's
characteristics in consecutive periods. As described above,
this model of spread decomposition can be used 
for designing targeted hedges. 

$$\Delta OAS_{i,t} = \alpha_t + \beta_t \Delta CDS_{i,t} + \gamma_t \Delta LCS_{i,t} + \delta_t \mathrm{MonthDummy}_t + \eta_{i,t} \qquad (4)$$

Finally, the spread decomposition model may 
be susceptible to outliers, especially since de¬ 
fault and liquidity are arguably more impor¬ 
tant considerations for higher spread bonds. To 
check this, one can run log regressions (e.g., 
the dependent variable is log(OAS) instead of 
OAS, similarly for the independent variables), 
as shown below. If the conclusions from the log 
model are unchanged, this would indicate that 
outliers are not driving the results. 

$$\ln(OAS_{i,t}) = \alpha_t + \beta_t \ln(CDS_{i,t}) + \gamma_t \ln(LCS_{i,t}) + \eta_{i,t} \qquad (5)$$


KEY POINTS 

• Credit spread decomposition refers to break¬ 
ing down a bond's option-adjusted spread 
(OAS) to Treasuries into market risk pre¬ 
mium, expected default loss, and expected 
liquidity cost components. 


• To decompose credit spreads, a bond's OAS is 
regressed on a measure of its expected default 
cost (CDS) and expected liquidity cost (LCS). 

• Credit spread decomposition can help credit 
investors determine the extent to which 
spreads reflect expected default losses, high 
liquidity costs, or a high market-wide risk 
premium, and make portfolio decisions ac¬ 
cordingly. 

• Investors can also apply spread decomposi¬ 
tion analysis for determining targeted hedg¬ 
ing strategies and to help identify relative 
value opportunities. 

NOTES 

1. See, for example, Ng and Phelps (2011) and 
Elton et al. (2001). 

2. Dastidar and Phelps (2009). 
Credit Derivatives and
Hedging Credit Risk 
Abstract: The credit crisis of 2007-2009 in the United States and Europe and the collapse of the
Japanese bubble in the 1990-2002 period show that, without hedging credit risk, the largest financial 
institutions in the world are very likely to fail. Many trillions of dollars of taxpayer bailouts have put 
the credit quality of the United States and Japan at risk. The solution to this financial institutions' 
risk management problem and the related sovereign risk problem is hedging with respect to macro 
factor movements. Hedging interest rate movements has a 40-year history, but now the focus has 
turned to a longer list of macro factors like home prices, commercial real estate prices, oil prices, 
commodity prices, foreign exchange rates, and stock indices. This hedging capability is now widely 
available in best practice enterprise risk management software. Stress testing with respect to macro 
factors is now a mandatory requirement of the European Central Bank and U.S. bank regulators. 


In this entry, we examine practical tools for 
hedging credit risk at both the transaction level 
and the portfolio level, focusing on the interac¬ 
tion between the credit modeling technologies 
and traded instruments that would allow one 
to mitigate credit risk. We start with a discus¬ 
sion linking credit modeling and credit portfo¬ 
lio management in a practical way. We then turn 
to the credit default swap market as a potential 
hedging tool. Finally, the state of the art is dis¬ 
cussed: hedging transaction level and portfolio 
credit risk using hedges that involve macroe¬ 
conomic factors that are traded in the market¬ 
place. 


CREDIT PORTFOLIO 
MODELING: WHAT'S THE 
HEDGE? 

One of the reasons that the popular value-at- 
risk (VaR) concept has been regarded as an 
incomplete risk management tool is that it 
provides little or no guidance on how to hedge 
if the VaR indicator of risk levels is regarded 
as too high. In a more subtle way, the same 
criticisms apply to many of the key modeling 
technologies that are popular in financial mar¬ 
kets, like the copula approach to the simulation of 
credit portfolios. In this entry we summarize the
virtues and the vices, from a hedging perspective,
of both various credit modeling techniques
and credit derivative instruments traded in the
marketplace. One of the key issues that requires
a lot of attention in credit portfolio modeling
is the impact of the business cycle on default
probabilities. Default probabilities rise when
the economy weakens and fall when it strengthens.
This is both obvious and so subtle that almost
all commercially available modeling technologies
ignore it. It's easy to talk about it and hard
to do.

Figure 1 Cyclical Rise and Fall in 5-Year Reduced Form Default Probabilities: Citigroup and Ford
Motor Company, 2006-2011

Figures 1 and 2 show the cyclical rise and 
fall in 5-year reduced-form default probabili¬ 
ties for Citigroup and Ford Motor Company 
for the periods 1990-2005 and 2006-2011. 1 The 
figures show the obvious correlation in de¬ 
fault probabilities for both companies as they 
rise or fall in the 1990-1991 recession and in 
the recession spanning 1999-2003, depending 
on the sector, but the greatest correlation is in 
the credit crisis period of 2007-2009. Over the 
full 1990-2011 period, their respective 5-year 
default probabilities have a simple correlation 
of 45.2%. 


With this common knowledge as background, 
we begin with the hedging implications of the 
Merton model at the individual transaction and 
portfolio level (see Merton, 1974). 


THE MERTON MODEL AND 
ITS VARIANTS: 
TRANSACTION-LEVEL 
HEDGING 

As of this writing, every publicized commer¬ 
cial implementation of the Merton model or its 
variants has one principal assumption in com¬ 
mon: The only random factor in the model is 
the "value of company assets." Regardless of 
the variety of Merton model used, all models of 
this type have the following attributes in com¬ 
mon when the value of company assets rises: 

• Stock prices rise.

• Debt prices rise.

• Credit spread falls.

• Default probability falls.

Figure 2 Cyclical Rise and Fall in 5-Year Reduced Form Default Probabilities: Citigroup and Ford
Motor Company, 1990-2005


From a theoretical point of view, there are 
three obvious ways to think about hedging in 
the Merton context: 

• Hedge a long position in the debt of the firm 
with a short position in the assets of the com¬ 
pany. 

• Hedge a long position in the debt of the firm 
with a short position in the common stock of 
the company. 

• Hedge a long position in the debt of the firm 
with a short position in another debt instru¬ 
ment of the company. 

The first hedging strategy is consistent with 
the assumptions of the Merton model and all 
of its commercial variants, because assets of the 
firm are assumed to be traded in perfectly liquid 
efficient markets with no transactions costs. Un¬ 
fortunately, for most industrial companies, this 
is a very unrealistic assumption. Investors in 
Ford Motor Company cannot go long or short 
auto plants in any proportion. The third hedg¬ 
ing strategy is also not a strategy that one can 
use in practice, although the credit derivative 
instruments we discuss in the next section pro¬ 
vide a variation on this theme. 


From a practical point of view, shorting the
common stock is the most direct hedging route
and the one that is both practical and
consistent with the model theory. Unfortunately,
however, even this hedging strategy
has severe constraints that restrict its practical 
use. Specifically, even if the Merton model or its 
variant is true, mathematically, the first deriva¬ 
tive of the common stock price with respect to 
the value of company assets approaches zero 
as the company becomes more and more dis¬ 
tressed. When the value of company assets is 
well below the amount of debt due, the com¬ 
mon stock will be trading just barely above zero. 
One would have to short more and more eq¬ 
uity to offset further falls in debt prices, and 
at some point a hedging strategy that shorts 
even 100% of the company's equity becomes too 
small to fully offset the risk still embedded in 
debt prices. In short, even if the Merton model 
is literally true, the model fails the hedging 
test ("What's the hedge?") for deeply distressed 
situations. 
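The vanishing equity delta can be illustrated with the original single-period Merton model, in which equity is a call option on company assets and debt is the remainder. This is a sketch under that textbook assumption rather than any commercial variant, and the asset values, debt face value, interest rate, and volatility are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_deltas(assets, debt_face, rate, sigma, maturity):
    """Sensitivity of equity and of debt to the value of company assets."""
    d1 = (math.log(assets / debt_face)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * math.sqrt(maturity))
    equity_delta = norm_cdf(d1)       # equity is a call on company assets
    debt_delta = 1.0 - equity_delta   # debt = assets - equity
    return equity_delta, debt_delta


# Hypothetical firm: debt with face value 100 due in one year.
for assets in (150.0, 100.0, 60.0, 30.0):
    e_delta, d_delta = merton_deltas(assets, 100.0, 0.03, 0.25, 1.0)
    # Fraction of all equity to short to hedge a holding of all the debt.
    hedge_fraction = d_delta / e_delta
    print(f"assets={assets:6.1f}  equity delta={e_delta:.6f}  "
          f"equity to short={hedge_fraction:,.2f}x")
```

Once the equity delta falls below one half, the required short exceeds 100% of the company's equity, which is the point at which the hedge described above can no longer be completed.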

What about companies that are not yet 
severely distressed? Jarrow and van Deventer
(1998, 1999) analyzed a 9-year weekly data
series of new issue fixed rate bond spreads
collected by First Interstate Bancorp, which at 
the time was the seventh largest bank holding 
company in the United States. Over the sam¬ 
ple period used by Jarrow and van Deventer, 
First Interstate's debt ratings varied from AA to 
BBB. They analyzed the debt and equity hedge 
ratios produced by the Merton model (and its 
variants) and tested for biases that would re¬ 
duce hedging errors. The results of that analysis 
showed that a common stock hedge in the op¬ 
posite direction of that indicated by the Merton 
model (and its variants) would have improved 
results. That is, one should have gone long the 
equity even if one is long the debt, not short 
the equity. Jarrow and van Deventer are careful 
to point out that this strategy is certainly not 
recommended. The reason for this finding was 
simple: During the 9-year weekly data series
beginning in 1984, credit spread and stock price
changes moved in the direction predicted by the
Merton model less than 45% of the time. Van
Deventer and Imai (2003) obtain similar results 
over a much larger sample. 

Jarrow and van Deventer make the point that 
the Merton model is clearly missing key vari¬ 
ables that would allow credit spreads and eq¬ 
uity prices to move in either the same direction 
or the opposite direction as these input vari¬ 
ables change. None of the Merton models in 
commercial use have this flexibility and there¬ 
fore any hedge ratios they imply are quite sus¬ 
pect. 

What about companies that are not invest¬ 
ment grade but do not yet fall in the "severely 
distressed" category? It is in this sector that 
individual transaction hedging using Merton- 
type intuition is potentially the most useful. 
Most of the research that has been done in this 
regard has been done on a proprietary basis on 
Wall Street. Even if Merton model hedging
is useful for companies in the BB and B
ratings grades, how effective can it be in protecting
the owner of a bond that once was rated AA but
sinks to a distressed CCC? Whether or not hedging
errors in the AA to BBB and CCC ratings
ranges more than offset hedging benefits in the
BB and B range is an important question. Modern
corporate governance requires that users of
the Merton model have evidence that it works 
in this situation, rather than relying on a be¬ 
lief that it works. On September 12, 2005, the 
Wall Street Journal reported on the hundreds of 
millions of dollars that were lost by arbitrageurs 
using Merton-type hedges on Ford and General 
Motors when both firms were downgraded by 
the major rating agencies. 2 

There are a few more points that one needs 
to make about the Merton model and all of its 
commercial variants when it comes to transac¬ 
tion level hedging: 

• The Merton model default probability is not 
an input in this hedging calculation for the 
same reason that the return on the common 
stock is not an input in the Black-Scholes op¬ 
tions model. The Merton model and all of its 
commercial variants incorporate all possible 
probabilities of default that stem from every 
possible variation in the value of company 
assets. 

• Loss given default is also not an input in this 
hedging calculation because all possible loss 
given defaults (one for each possible ending 
level of company asset value) are analyzed by 
the Merton model and in turn have an impact 
on the calculated hedge ratio. 

These insights are not widely recognized by 
analysts who consider hedging using the Mer¬ 
ton technology. Given the value of company as¬ 
sets, we can calculate the Merton hedge ratio 
directly with no need for a default probabil¬ 
ity estimate or a loss given default estimate. If 
instead we are given the Merton (or its vari¬ 
ants) default probability, we do not know the 
hedge ratio without full disclosure of how the 
default probability was derived. Any failure to 
make this disclosure is a probable violation of 
the Basel II capital accords from the Basel Com¬ 
mittee on Banking Supervision. 

THE MERTON MODEL AND
ITS VARIANTS: PORTFOLIO- 
LEVEL HEDGING 

One of the attractive things about the Merton 
model, in spite of the limitations mentioned 
above, is its simple intuition. We know that the 
basic businesses of Ford and General Motors 
are highly correlated, so it is a small logical step 
to think about how the assets of the two compa¬ 
nies must be closely correlated. One has to make 
a very substantial set of additional assumptions 
if one wants to link the macroeconomic factors 
that drive correlated defaults to the value of 
company assets in the Merton framework or 
any of its one-factor commercial variants. Let's 
assume away those complexities and assume 
that we know the returns on the assets of Ford 
have a 0.25 correlation with the returns on the 
assets of General Motors. Note that the 0.25 cor¬ 
relation does not refer to 

• The correlation in the default probabilities 
themselves. 

• The correlation in the events of default,
defined as the vector of 0s and 1s at each time
step where 0 denotes no default and 1 denotes
default.

These are different and mathematically distinct
definitions of correlation. Jarrow and van
Deventer (1998, 1999) show some of the mathematical
links between these different definitions
of correlation. Jarrow and van Deventer (2005) 
formalize these results. 

Once we have the correlation in the returns on 
the value of company assets, we can simulate 
correlated default as follows: 

• We generate N random paths for the values 
of company assets of GM and Ford that show 
the assumed degree of correlation. 

• We next calculate the default probability that 
would prevail, given that level of company 
assets, at that point in time in the given sce¬ 
nario. 

• We then simulate default/no default. 


For any commercial variant of the Merton 
model, an increase in this "asset correlation" 
results in a greater degree of bunching of de¬ 
faults from a time perspective. This approach 
was a common first step for analysts evalu¬ 
ating first-to-default swaps and collateralized 
debt obligations because they can be done in 
common spreadsheet software packages with a 
minimum of difficulty. 
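The simulation steps above can be sketched in a simplified form. The sketch uses the original Merton setting, where default can occur only at maturity, so the default-probability step and the default/no-default draw collapse into a single test of whether terminal asset value falls below the debt level. All parameter values and the function name are hypothetical.

```python
import math
import random


def simulate_defaults(rho, n_paths=20000, seed=0, v0=100.0, debt=80.0,
                      mu=0.05, sigma=0.30, horizon=1.0):
    """Return (P[A defaults], P[B defaults], P[both default]) by simulation."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    n_a = n_b = n_both = 0
    for _ in range(n_paths):
        # Correlated standard normal shocks to the two firms' asset returns.
        z_a = rng.gauss(0.0, 1.0)
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Default if terminal asset value is below the debt due at maturity.
        a_defaults = v0 * math.exp(drift + vol * z_a) < debt
        b_defaults = v0 * math.exp(drift + vol * z_b) < debt
        n_a += a_defaults
        n_b += b_defaults
        n_both += a_defaults and b_defaults
    return n_a / n_paths, n_b / n_paths, n_both / n_paths


for rho in (0.0, 0.25, 0.9):
    p_a, p_b, p_both = simulate_defaults(rho)
    print(f"rho={rho:.2f}  P[A]={p_a:.3f}  P[B]={p_b:.3f}  P[both]={p_both:.3f}")
```

Each firm's own default frequency is largely unaffected by the correlation setting, while the frequency of joint default rises with it. This is the bunching of defaults described above.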

There are some common pitfalls to beware of 
in using this kind of analysis that are directly 
related to the issues raised about the Merton 
framework and its commercial variants: 

• If one is using the original Merton model of 
risky debt, default can happen at only one 
point in time: the maturity date of the debt. 
This assumption has to be relaxed to allow 
more realistic modeling. 

• If one is using the "down and out option" 
variation of the Merton model, which dates 
from 1976, one has to specify the level of the 
barrier that triggers default at each point in 
time during the modeling period. 

Unless one specifically links the value of com¬ 
pany assets to macroeconomic factors, the port¬ 
folio simulation has the same limitations from 
a hedging point of view as a single transaction. 
As explained earlier, the hedge using a short 
position in the common stock would not work 
for deeply troubled companies from a theoreti¬ 
cal point of view and it does not work for higher 
rated credits (BBB and above) from an empirical 
point of view. 

If one does link the value of company as¬ 
sets to macroeconomic factors, there is still 
another critical and difficult task one has to 
undertake to answer the key question: "What's 
the hedge?" One needs to convert the single-period,
constant-interest-rate Merton model
or Merton variant to a full valuation framework
for multiperiod fixed-income instruments,
many of which contain a multitude of
embedded options (like a callable bond or a line 
of credit). One of the many lessons of the Wall 
Street Journal article cited above and subsequent 
experience in 2007-2009 is easy to summarize:
This approach to hedging and simulating credit 
risk (called the "copula approach" as well as the 
Merton approach) simply did not work. Salmon 
(2009) called the Merton/copula approach the 
"formula that killed Wall Street" via the $945 
billion in credit losses that resulted from the 
credit crisis. 3 

What if we want to use the Merton/copula
approach in spite of its role in recent losses? As
Lando (2004) discusses, there is a large set of
nontrivial analytical issues to deal with. Most
importantly, moving to a multiperiod framework
with random interest rates leads one immedi¬ 
ately to the reduced form model approach, where 
it is much easier for the default probability mod¬ 
els to be completely consistent within the valu¬ 
ation framework. We turn to that task now. 

Reduced-Form Models: 
Transaction-Level Hedging 

One of the many virtues of the reduced form 
modeling approach is that it explicitly links 
factors driving default probabilities, like in¬ 
terest rates and other macroeconomic factors, 
to the default probabilities themselves. Just as 
important, the reduced form framework is a 
multiperiod, no-arbitrage valuation framework 
in a random interest rate context. Once we 
know the default probabilities and the factors 
driving them, credit spreads follow immedi¬ 
ately, as does valuation. Valuation, even when 
there are embedded options, often comes in 
the form of analytical closed-form solutions. 
More complex options require numerical meth¬ 
ods that are commonly used on Wall Street. 
The ability to stress test portfolio values and 
portfolio losses with respect to macro factor 
movements is now required by the European
Central Bank and by U.S. bank regulators via
two programs: the Comprehensive Capital
Analysis and Review and the Supervisory Capital
Assessment Program. The latter program,
required of the top 19 U.S. financial institutions
in 2009, mandated stress tests with respect to
changes in home prices, real gross domestic
product, and the unemployment rate.

Suffice it to say that for any simulated value 
of the risk factors driving default, there are 
two valuations that can be produced in the re¬ 
duced form framework. The first valuation is 
the value of the security in the event that the is¬ 
suer has not defaulted. This value can be stress 
tested with respect to the risk factors driving 
default to get hedge ratios with respect to the 
nondiversifiable risk factors. The second value 
that is produced is the value of the security 
given that default has occurred. In the reduced 
form framework of Duffie and Singleton (1999) 
and Jarrow (2001), this loss given default can
be random and is expressed as a fraction of
the value of the defaultable instrument one instant
prior to default.

These default-related jumps in value have 
two components. The first part is the sys¬ 
tematic (if any) dependence of the loss given 
default or recovery rate on macroeconomic 
factors. The second part is the issuer-specific 
default event, since (conditional on the current 
values of the risk factors driving default for all 
companies) the events of default are indepen¬ 
dent. At the individual transaction level, this 
idiosyncratic company-specific component can 
only be hedged by shorting a defaultable in¬ 
strument of the same issuer or a credit default 
swap of that issuer. 

At the portfolio level, this is not necessary. We 
explain why next. 

Reduced-Form Models: 
Portfolio-Level Hedging 

One of the key conclusions of a properly spec¬ 
ified reduced form model is that the default 
probabilities of each of N companies at a given 
point in time are independent, conditional on 
the values of the macroeconomic factors driv¬ 
ing correlated defaults. That is, as long as none 
of the factors causing correlated default have 
been left out of the model, then by defini¬ 
tion, given the value of these factors, default is 
independent. This is an insight of Jarrow,
Lando, and Yu (2005). 

This powerful result means that individual 
corporate credit risk can be diversified away, 
leaving only the systematic risk driven by 
the identified macroeconomic variables. This 
means that we can hedge the portfolio with re¬ 
spect to changes in these macroeconomic vari¬ 
ables just as we do in every hedging exercise: 
We mark to market the portfolio on a credit- 
adjusted basis and then stress test with respect 
to one macroeconomic risk factor. We calcu¬ 
late the change in value that results from the 
macroeconomic risk factor shift and this gives 
us the "delta." We then can calculate the equiva¬ 
lent hedging position to offset this risk. This is a 
capability that is present in modern enterprise¬ 
wide risk management software. 4 
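The stress-test-and-delta procedure can be sketched as a finite-difference calculation. The portfolio valuation function here is a toy stand-in for a full credit-adjusted mark-to-market, and the home price index level, bump size, and hedge instrument delta are all hypothetical.

```python
import math


def portfolio_value(home_price_index):
    """Toy credit-adjusted portfolio value ($ millions); rises with home prices."""
    return 1000.0 - 400.0 * math.exp(-home_price_index / 50.0)


def macro_delta(value_fn, factor_level, bump):
    """Change in value per unit shift in the macro factor (central difference)."""
    up = value_fn(factor_level + bump)
    down = value_fn(factor_level - bump)
    return (up - down) / (2.0 * bump)


def hedge_position(portfolio_delta, hedge_delta_per_contract):
    """Number of hedge contracts that offsets the portfolio delta."""
    return -portfolio_delta / hedge_delta_per_contract


# Hypothetical inputs: index at 100, and a traded contract that gains
# $0.25 million per one-point rise in the index.
delta = macro_delta(portfolio_value, 100.0, 0.01)
contracts = hedge_position(delta, 0.25)
print(f"portfolio delta = {delta:.4f} $MM per index point")
print(f"hedge position  = {contracts:.2f} contracts (negative means short)")
```

A single delta of this kind is valid only for small moves in one factor, which is why the calculation has to be repeated across a range of factor shifts and across correlated factors.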

This exercise needs to be done for a wide 
range of potential risk factor shifts, recogniz¬ 
ing that some of the macroeconomic risk factors 
are in fact correlated themselves. Van Deventer, 
Imai, and Mesler (2004) outline procedures for 
doing this in great detail. 

We turn now to commonly used credit-related 
derivative instruments and discuss what role 
they can play in a hedging program. 


CREDIT DEFAULT SWAPS 
AND HEDGING 

Credit default swaps in their purest form pro¬ 
vide specific credit protection on a single issuer. 
They are particularly attractive when the small 
size of a portfolio (in terms of issuer names) or 
extreme concentrations in a portfolio rule out 
diversification as a vehicle for controlling the 
idiosyncratic risk associated with one portfolio 
name. 

Generally speaking, credit default swaps 
should only be used when diversification does 
not work. As we discuss in a later section, deal¬ 
ing directly in the macroeconomic factors that 
are driving correlated default is much more
efficient both in terms of execution costs and in
terms of minimizing counterparty credit risk.
An event that causes a large number of cor¬ 
porate defaults over a short time period would 
also obviously increase the default risk of the fi¬ 
nancial institutions that both lend to them and 
act as intermediaries in the credit default swap 
market. This insight was not widely appreci¬ 
ated as recently as 2006, but it is now. The 
bankruptcy of Lehman Brothers on Septem¬ 
ber 15, 2008, the March 2008 rescue of Bear 
Stearns, and the September 2008 rescues of Mer¬ 
rill Lynch (by Bank of America with U.S. gov¬ 
ernment support) and Morgan Stanley (by the 
Federal Reserve) have convinced any doubters 
of the importance of counterparty credit risk in 
the credit default swap market. As of this writ¬ 
ing, only 14 dealers are registered to clear credit 
default swaps with the Depository Trust and 
Clearing Corporation. 5 

Many researchers have begun to find that 
credit spreads and credit default swap quota¬ 
tions are consistently higher than actual credit 
losses would lead one to expect. 6 How can 
such a "liquidity premium" persist in an effi¬ 
cient market? From the perspective of the in¬ 
surance provider on the credit default swap, 
in the words of one market participant, "Why 
would we even think about providing credit in¬ 
surance unless the return on that insurance was 
a lot greater than the average losses we expect to 
come about?" That preference is simple enough 
to understand, but why doesn't the buyer of the 
credit insurance refuse to buy insurance that is 
"overpriced"? 

One potential explanation is related to the 
lack of diversification that individual market 
participants face even if their employers are 
fully diversified. An individual fund manager 
may have only 10-20 fixed income exposures 
and a bonus pool that strictly depends on his 
or her ability to outperform a specific bench¬ 
mark index over a specific period of time. One 
default may devastate the bonus, even if the
fund manager would, over 1 billion repeated trials,
in fact outperform the benchmark. The individual
has more reason to buy single-name credit
insurance than the employer does because (1)
his or her work-related portfolio is much less 
diversified than the entire portfolio of the em¬ 
ployer, (2) the potential loss of the bonus makes 
him or her much more risk averse than the 
employer, and (3) the employer is much less 
likely to be aware that the credit insurance 
is (on average) overpriced than the individ¬ 
ual market participant. Jarrow, Li, Mesler, and 
van Deventer (2007) have quantified the mag¬ 
nitude of this premium and shown that fac¬ 
tors as diverse as company size (bigger firms 
get smaller spreads) and location (Japanese 
firms get smaller spreads) affect the premium of 
CDS spreads over default risk. These premiums 
are available daily via the Thomson Reuters 
"Credit Views" page. 7 

A more important concern with credit default 
swap hedging is the very thin trading volume 
in the CDS market in the aftermath of the credit 
crisis. A study 8 found the following: 

• Only 241 corporate reference names averaged 
more than 5 trades per day. 

• Only 63 reference names averaged more than 
10 trades per day. 

• Only 14 reference names averaged more than 
15 trades per day. 

• No reference names averaged more than 23 
trades per day. 

Given these low volumes, there is a serious 
risk of market manipulation that should give 
any potential hedger great concern. 


PORTFOLIO- AND 
TRANSACTION-LEVEL 
HEDGING USING TRADED 
MACROECONOMIC INDICES 

The instantaneous probability of default can be 
specified as a linear function of one or more 
macroeconomic factors. An example is the case
where the default intensity is a linear function
of the random short-term rate of interest
r and a macroeconomic factor with normally
distributed return Z:

$$\lambda(t) = \lambda_0 + \lambda_1 r(t) + \lambda_2 Z(t)$$

The constant term in this expression is an
idiosyncratic term that is unique to the company.
Random movements in the short rate r and the
macroeconomic factor Z will cause correlated
movements in the default intensities for all
companies whose risk is driven by common factors.
The default intensity has a term structure like
the term structure of interest rates, and this
entire term structure moves up and down with
the business cycle as captured by the
macroeconomic factors. The parameters of this
reduced form model can be derived from observable
histories of bond prices of each counterparty or
from observable histories of credit derivatives
prices using enterprise-wide risk management
software.

Alternatively, a historical default database can
be used to parameterize the term structure of
default probabilities using discrete instead of
continuous default probabilities, just as discrete
interest rates are used in practice based on yield
curve movements in continuous time. The most
common approach to historical default probability
estimation uses logistic regression. For
each company, monthly observations are denoted
0 if the company is not bankrupt in the
following month and 1 if the company does go
bankrupt in the next month. Explanatory variables
\(X_i\) are selected, and the parameters \(\alpha\) and
\(\beta_i\) are chosen to produce the best-fitting
predictions of the default probability using the
following logistic regression formula:

\[ P[t] = \frac{1}{1 + \exp\left(-\alpha - \sum_{i=1}^{n} \beta_i X_i\right)} \]
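
As a rough illustration of the logistic formula above, the following Python sketch evaluates the default probability for a single monthly observation. The function name, the coefficient values, and the two explanatory variables are hypothetical placeholders chosen for illustration; they are not estimates from any default database.

```python
import math

def logistic_default_probability(alpha, betas, xs):
    """Return 1 / (1 + exp(-alpha - sum_i beta_i * x_i))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept, coefficients, and inputs (illustration only).
alpha = -6.0
betas = [0.8, -1.5]   # assumed sensitivities to two explanatory variables
xs = [0.05, 0.10]     # assumed values of those variables this month
p = logistic_default_probability(alpha, betas, xs)
print(f"Monthly default probability: {p:.4%}")
```

Whatever inputs are supplied, the result lies strictly between 0 and 1, which is the property the text later cites as the reason the logistic form is convenient for forward simulation.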

By fitting this logistic regression for each
default probability on the default probability term
structure, one can build the entire cumulative
and annualized default probability term structures
for a large universe of corporations. Figure 3
shows the cumulative term structure of
default probabilities for Washington Mutual,
just prior to its failure in September 2008.

Figure 3 Cumulative Term Structure of Default Probabilities for Washington Mutual: September 2008,
One Day Prior to Default

Alternatively, one can annualize the entire
term structure of default probabilities for easy
comparison with credit spreads and credit default
swap quotations. The resulting curve is
downward sloping for high-risk credits like
Washington Mutual (see Figure 4).

The key advantage of the reduced-form approach
is that critical macroeconomic factors
can be linked explicitly to default probabilities
as explanatory variables. The result is a specific
mathematical link like the linear function
of the pure Jarrow reduced form model or the
logistic regression formula used for historical
database fitting. The logistic regression formula
is very powerful for simulating forward since it
always produces default probability values between
zero and 100%. These values can then be
converted to the linear Jarrow form for closed-form
mark-to-market values for every transaction
in a portfolio.

Van Deventer, Imai, and Mesler (2004) then
summarize how to calculate the macroeconomic
risk factor exposure as follows. The
Jarrow model is much better suited to hedging
credit risk on a portfolio level than the Merton
model because the link between the (N) macro
factor(s) M and the default intensity is explicitly
incorporated in the model. Take the example of
Washington Mutual, whose probability of default
is driven by interest rates and home prices,
among other things. If M(t) is the macro factor
defined as the one-year change in home prices,
it can be shown that the size of the hedge that
needs to be bought or sold to hedge one dollar
of risky zero-coupon debt with market
value v under the Jarrow model is given by

\[ \frac{\partial v_l(t, T : i)}{\partial M(t)} = -\left[ \frac{\partial \gamma_i(t, T)}{\partial M(t)} + \frac{\lambda_2 (1 - \delta_i)(T - t)}{\sigma_m M(t)} \right] v_l(t, T : i) \]

The variable \(v\) is the value of risky zero-coupon
debt and \(\gamma\) is the liquidity discount
function representing the illiquidities often
observed in the debt market. There are similar
formulas in the Jarrow model for hedging
coupon-bearing bonds, defaultable caps, floors,
credit derivatives, and so on.

Figure 4 Annualized Term Structure of Default Probabilities for Washington Mutual: September 2008,
One Day Prior to Default

In practice, these hedge ratios are derived
from a sophisticated simulation on "best practice"
enterprise-wide risk management software.
Van Deventer and Imai (2003) show that
the steps in hedging the macro factor risk for
any portfolio are identical to the steps that a
trader of options has been taking for 30 years
(hedging the net position with a long or short
position in the common stock underlying the
options):

• Calculate the change in the value (including 
the impact of interest rates on default) of all 
retail credits with respect to interest rates. 

• Calculate the change in the value (including 
the impact of interest rates on default) of all 
small business credits with respect to interest 
rates. 

• Calculate the change in the value (including 
the impact of interest rates on default) of all 
major corporate credits with respect to interest
rates.

• Calculate the change in the value (including 
the impact of interest rates on default) of all 
bonds, derivatives, and other instruments. 

• Add these "delta" amounts together. 

• The result is the global portfolio "delta," on 
a default-adjusted basis, of interest rates for 
the entire portfolio. 

• Choose the position in interest rate derivatives
with the opposite delta.

• This eliminates interest rate risk from the 
portfolio on a default-adjusted basis. 
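
The steps above can be sketched in a few lines of Python. This is only a schematic of the aggregation logic, assuming the default-adjusted deltas for each segment have already been produced by a risk system; the segment names, the delta amounts, and the delta per hedging contract are all hypothetical numbers invented for illustration.

```python
def portfolio_delta(deltas_by_segment):
    """Add the default-adjusted deltas of every segment together."""
    return sum(deltas_by_segment.values())

def hedge_units(deltas_by_segment, delta_per_hedge_unit):
    """Number of hedging contracts whose total delta offsets the portfolio delta."""
    if delta_per_hedge_unit == 0:
        raise ValueError("hedge instrument must have nonzero delta")
    return -portfolio_delta(deltas_by_segment) / delta_per_hedge_unit

# Hypothetical change in value per unit move in interest rates, by segment.
deltas = {
    "retail_credits": -1_200_000.0,
    "small_business_credits": -800_000.0,
    "major_corporate_credits": -2_500_000.0,
    "bonds_derivatives_other": 500_000.0,
}
delta_per_contract = 25_000.0  # assumed delta of one interest rate derivative contract

total = portfolio_delta(deltas)
units = hedge_units(deltas, delta_per_contract)
residual = total + units * delta_per_contract
print(f"Global delta: {total:,.0f}; contracts needed: {units:,.0f}; residual: {residual:,.0f}")
```

The same aggregation applies to any other macroeconomic factor once the per-segment sensitivities to that factor are available.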

We can replicate this process for any macroeconomic
factor that impacts default, such as
home prices, exchange rates, stock price indices,
oil prices, the value of class A office buildings
in the central business district of key cities, and
so on.

Most importantly,

• We can measure the default-adjusted transaction
level and portfolio risk exposure with
respect to each macroeconomic factor.

• We can set exposure limits on the default-adjusted
transaction level and portfolio risk
exposure with respect to each macroeconomic
factor.

• We know how much of a hedge would eliminate
some or all of this risk.

The reason this analysis is so critical to success
in credit risk portfolio management is the
all-pervasiveness of correlated risk. Let us put
aside the 2007-2009 credit crisis and look at
other recent history. Take the Japan scenario.
At the end of December 1989, the Nikkei stock
price index had reached almost 39,000. Over
the course of the next 14 years, it traded as
low as 7,000. Commercial real estate prices fell
by more than 60%. Single-family home prices
fell in many regions for more than 15 consecutive
years. More than 135,000 small businesses
failed. Six of the 21 largest banks in Japan were
nationalized in a span of two years. How would
this approach have worked in Japan?

First of all, fitting a logistic regression for
small businesses in Japan over this period
shows that the properly specified inputs for
the Nikkei and the yen/U.S. dollar exchange
rates have t-score equivalents of more than 45
standard deviations from zero in a logistic
regression. By stress testing a small business loan
portfolio with this knowledge, we would have
known how many put options on the Nikkei
and put options on the yen were necessary to
fully or partially offset credit-adjusted mark-to-market
loan losses, just as the Federal Deposit
Insurance Corporation announced it was doing
in its 2003 Loss Distribution Model. 9

This same approach works with

• Retail loan portfolios 

• Small business loan portfolios 

• Large corporate loan, bond, derivative, and 
other portfolios 

• Sovereign and other government exposures 

If common factors are found to drive each
class of loans, then we have enterprise-wide
correlations in defaults. An identical approach
in the U.S. market would have spared many
financial institutions tens of billions in losses that
resulted from an inability to do the stress tests
described above and that are now mandated by
the U.S. government and the European Central
Bank.

The key to success in this analysis is a risk
management software package that can handle
it. 10 What is also important in doing the modeling
is to recognize that macroeconomic factors
that are exchange traded (such as the S&P 500,
home price futures, etc.) are much preferred to
similar indicators that are not traded (such as
the Conference Board index of leading indicators
or the unemployment rate).

If one takes this approach, total balance sheet
credit hedging is very practical

• Without using credit derivatives

• Without using first-to-default swaps

• Without using Wall Street as a counterparty
from a credit risk point of view

All of these benefits are critical to answer the 
key question of "What's the hedge?" 


KEY POINTS 

• It is not enough to know only the default risk
of a counterparty. Over the full portfolio, a
financial institution needs to know the answer
to the question "What is the hedge?" if the
measured credit risk is uncomfortably large.

• The major U.S. (2007-2009) and Japanese
(1990-2002) financial institutions required
government bailouts in the trillions of dollars
because of their inability to measure and
hedge macro factor risks like those of home
price movements and commercial real estate
price movements.

• The Merton model is a logical place to start
thinking about how to hedge because of its
simple structure and focus on the value of
company assets.

• Unfortunately, for theoretical reasons alone,
hedging in the Merton framework does not
work for a company that is highly distressed.
A perfect hedge could easily require a short
position of more than 100% of the shares of
outstanding common stock.

• The only practical and accurate approach to
hedging credit risk is the reduced form modeling
approach.

• Hedging with credit default swaps is not
practical because of the high degree of counterparty
credit risk that is now obvious in the
wake of the 2007-2009 credit crisis and the
effective failures of investment banking firms
like Bear Stearns, Lehman Brothers, Morgan
Stanley, and Merrill Lynch. Moreover, trading
volume in the credit default swap market is
now so thin that large trades cannot be efficiently
executed, and the risk of market manipulation
is very high.

• The reduced form approach explicitly links
macro factors to both observable bond and
CDS prices and to a historical default
database. A similar approach links macro factors
to credit spreads and recovery rates. The
recovery on a mortgage that is in default is an
example. Obviously, it depends on the value
of the house that is collateral.

• Delta hedging of aggregate portfolio exposure
to these macro factors that drive credit risk
is done in best practice enterprise-wide risk
management software.

• This modern application of stress testing, applied
to a longer list of macro factors than interest
rates alone, is not just theory. It is now
mandated by the European Central Bank and
U.S. regulatory authorities.


NOTES 

1. Default probabilities presented in this chapter
are supplied by Kamakura Corporation.

2. See Whitehouse (2005).

3. International Monetary Fund, Global Stability
Report, as reported by the Financial Times,
April 8, 2008.

4. See "Kamakura Risk Manager In Depth,"
July 2011, available on www.kamakuraco.com
for an example.

5. See www.dtcc.com for a list of dealers and
related CDS trading volume.

6. See Chapter 18 in van Deventer, Imai, and
Mesler (2004) for a summary of the research
in this area.

7. The credit views page compares credit default
swap spreads reported by Markit
Partners with default probabilities from
Kamakura Risk Information Services.

8. van Deventer (2010).

9. See press release dated December 10, 2003,
on www.fdic.gov.

10. See, for example, the Kamakura Risk Manager
risk management software system.
Abstract: The three key factors that drive the valuation of a financial asset are risk, return, and
timing of cash flows. A fundamental assumption in valuation is that in the absence of costless 
arbitrage opportunities, if two investments whose risk, return, and timing of cash flow properties 
are exactly the same are identified, they must have the same price in the marketplace. Otherwise, 
market participants can make free money by simultaneously selling the more expensive one and 
buying the cheaper one. This principle allows for the development of no-arbitrage price relations for 
forwards, futures, and swaps. The price of a futures contract is identical to the price of a forward 
contract in an environment in which short-term interest rates are known. In addition, a swap 
contract is nothing more than a portfolio of forward contracts. Hence, if a forward contract can be 
valued, a swap can be valued. The forward price and the underlying spot price are inextricably 
linked by the net cost of carry relation. 


Exchange-traded and over-the-counter (OTC)
derivatives contracts are traded worldwide. Of
these, the lion's share is plain-vanilla forwards,
futures, and swaps. The purpose of this entry is
to develop no-arbitrage price relations for
forwards, futures, and swap contracts. In doing so,
we rely only on the assumption that two perfect
substitutes must have the same price. The two
substitutes, in this case, are a forward/futures
contract and a levered position in the underlying
asset. The key to understanding the
forward/futures valuation lies in identifying the
net cost of carrying (i.e., "buying and holding")
an asset. We begin therefore with a discussion of
carry costs/benefits. We then proceed by developing
a number of important no-arbitrage relations
governing forward and futures prices. Finally,
we show that, since a swap contract is an
exchange of future payments at a price agreed
upon today, it can be valued as a portfolio of
forward contracts.


UNDERSTANDING CARRY 
COSTS/BENEFITS 

Derivative contracts are written on four types of
assets: stocks, bonds, foreign currencies, and
commodities. The derivatives literature contains
seemingly independent developments of
derivative valuation principles for each type
of asset. Generally speaking, however, the valuation
principles are not asset-specific. The
only distinction among assets is how carry
costs/benefits are modeled.

The net cost of carry refers to the difference
between the costs and the benefits of holding
an asset. Suppose a breakfast cereal producer
needs 5,000 bushels of wheat for processing in
two months. To lock in the price of the wheat
today, he can buy it and carry it for two months.
One carry cost common to all assets is the
opportunity cost of funds. To come up with the
purchase price, he must either borrow money
or liquidate existing interest-bearing assets. In
either case, an interest cost is incurred. We assume
this cost is incurred at the risk-free rate
of interest. Beyond interest cost, however, carry
costs vary depending upon the nature of the
asset. For a physical asset or commodity such
as wheat, we incur storage costs (e.g., rent and
insurance). At the same time, certain benefits
may accrue. By storing wheat we may avoid
some costs of possibly running out of our regular
inventory before two months are up and
having to pay extra for emergency deliveries.
This is called convenience yield. Thus, the net
cost of carry for a commodity equals interest
cost plus storage costs less convenience yield,
that is,

Net carry cost = Cost of funds + Storage cost
- Convenience yield

For a financial asset or security such as a stock
or a bond, the carry costs/benefits are different.
While borrowing costs remain, securities
do not require storage costs and do not have
convenience yields. What they do have, however,
is income (yield) that accrues in the form of
quarterly cash dividends or semiannual coupon
payments. Thus, the net cost of carry for a
security is

Net carry cost = Cost of funds - Income


Carry costs and benefits are modeled either
as continuous rates or as discrete flows. Some
costs/benefits such as the cost of funds (i.e.,
the risk-free interest rate) are best modeled
as continuous rates. The dividend yield on a
broadly based stock portfolio, the interest income
on a foreign currency deposit, and the
lease rate on gold also fall into this category.
Other costs/benefits such as warehouse rent
payments for holding an inventory of grain,
quarterly cash dividends on individual common
stocks, and semiannual coupon receipts on
a bond are best modeled as discrete cash flows.
Below we provide the continuous rate and discrete
flow cost of carry assumptions. For ease
of exposition, we first introduce some notation.
The current price of the asset is denoted \(S\). Its
price at future time T is \(\tilde{S}_T\), where the tilde
denotes that the future asset price is uncertain. The
opportunity cost of funds (i.e., the risk-free rate
of interest) is assumed to be a constant, continuous
rate and is denoted \(r\). If we borrow to buy
the asset today, we will owe \(Se^{rT}\) at time T.


Continuous Rates 

The types of assets whose carry costs are typically
modeled as constant, continuous rates
include broadly based stock index portfolios,
foreign currencies, and gold. Assume that we
borrow at the risk-free rate of interest to buy a
stock index portfolio that pays cash dividends
at a constant continuous rate \(i\). If we buy one
unit of the index today and reinvest all dividends
immediately as they are received in more
shares of the index portfolio, the number of
units of the index portfolio will grow to exactly
\(e^{iT}\) units at time T. Alternatively, if we want
exactly one unit of the index on hand at time T, we
buy only \(e^{-iT}\) units today at a cost of \(Se^{-iT}\). The
terminal value of our investment in the index
portfolio at time T will be \(\tilde{S}_T\). The loan value has
accrued from \(Se^{-iT}\) to \(Se^{-iT}e^{rT} = Se^{(r-i)T}\). After
repaying the loan, the terminal portfolio value
will be \(\tilde{S}_T - Se^{(r-i)T}\). Within this continuous
rate framework, the net cost of carry rate of an
index portfolio equals the difference between
the risk-free rate of interest r and the dividend
yield rate i. The situation for a foreign currency
is identical. If we borrow at the domestic
risk-free rate, buy a foreign currency, and then invest
the currency at the prevailing foreign risk-free
rate, the net cost of carry rate equals the difference
between the domestic interest rate r and
the foreign interest rate i. Similarly, if we borrow
at the risk-free rate, buy gold, and then
lend it in the marketplace, the net cost of carry
rate equals the difference between the interest
rate r and the lease rate on gold i. Within this
framework, the total cost of carry paid at time
T is

\[ \text{Net carry cost}_T = S\left[e^{(r-i)T} - 1\right] \qquad (1) \]

To illustrate, assume that the S&P 500 index
is currently at a level of 1,100 and pays dividends
at the continuous rate of 3% annually.
Assume also that "shares" of the S&P 500 index
can be purchased and sold at the index level
(i.e., one share currently costs $1,100). Suppose
that an investor wants exactly 3,000 shares of
the S&P 500 index on hand in five days. How
many shares of the S&P 500 index must the investor
buy today if all dividends paid are reinvested
in more shares of the index portfolio?

If the investor wants 3,000 shares of the index
on hand in five days, the investor needs to buy
\(3{,}000e^{-0.03(5/365)} = 2{,}998.77\) shares today. Over
the first day, the number of shares will grow
by a factor \(e^{0.03(1/365)}\) due to the reinvestment
of dividends, bringing the number of shares
to \(2{,}998.77e^{0.03(1/365)} = 2{,}999.01\). Over the second
day, the number of shares will again grow
by a factor \(e^{0.03(1/365)}\) due to the reinvestment
of dividends, bringing the number of shares to
2,999.26. Since the dividends are being paid at
a constant, continuous rate, we know the original
number of shares purchased will grow to
exactly 3,000 shares by the end of day 5 (i.e.,
\(3{,}000e^{-0.03(5/365)}e^{0.03(5/365)} = 3{,}000\)), as is shown
in the following table.


Day   Index Level   Units of Index   Value of Index Position
0     1,100.00      2,998.77         3,298,644
1     1,160.00      2,999.01         3,478,856
2     1,154.00      2,999.26         3,461,146
3     1,145.00      2,999.51         3,434,435
4     1,170.00      2,999.75         3,509,712
5     1,175.00      3,000.00         3,525,000
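
The share counts in this example can be checked with a short Python sketch. It assumes, as the text does, a 365-day year and a constant continuous dividend yield; the function names are illustrative, and the index levels are simply the ones listed in the table.

```python
import math

def units_to_buy_today(target_units, dividend_yield, days, days_per_year=365):
    """Units to buy now so that reinvested dividends grow the holding to target_units."""
    return target_units * math.exp(-dividend_yield * days / days_per_year)

def units_by_day(target_units, dividend_yield, days, days_per_year=365):
    """Units held at the end of each day 0..days under continuous reinvestment."""
    start = units_to_buy_today(target_units, dividend_yield, days, days_per_year)
    return [start * math.exp(dividend_yield * d / days_per_year)
            for d in range(days + 1)]

# Index levels for days 0 through 5, taken from the table in the text.
index_levels = [1100.00, 1160.00, 1154.00, 1145.00, 1170.00, 1175.00]
units = units_by_day(3000, 0.03, 5)

for day, (level, n) in enumerate(zip(index_levels, units)):
    print(f"{day}  {level:9,.2f}  {n:9,.2f}  {level * n:12,.0f}")
```

The value column is just the index level times the units held, so it moves with the market even though the unit count follows a deterministic path.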


Discrete Flows 

For most other types of assets, including stocks
with quarterly cash dividends and bonds
with semiannual coupon payments, noninterest
carry costs/benefits are best modeled as discrete
flows. Suppose a stock promises to pay n
known cash dividends in the amount \(I_i\) at time
\(t_i\), \(i = 1, \ldots, n\), between now and future time T.
If we borrow \(S\) to cover the purchase price of
the stock and reinvest all cash dividends as they
are received at the risk-free rate of interest, the
terminal value of our position will be

\[ \tilde{S}_T + \sum_{i=1}^{n} I_i e^{r(T - t_i)} - Se^{rT} \]

In this instance, the net cost of carry at time T is

\[ \text{Net carry cost}_T = S(e^{rT} - 1) - \sum_{i=1}^{n} I_i e^{r(T - t_i)} \]

For coupon-bearing bonds, the expressions are
the same; however, \(S\) denotes the bond price
and \(I_i\) at time \(t_i\), \(i = 1, \ldots, n\), denote coupon
payments.

To illustrate, an investor buys 10,000 shares
of ABC Corporation and carries that position
for 90 days. ABC's current share price is $50,
and the stock promises to pay a $4 dividend
in exactly 30 days. What will be the value of
the portfolio when the investor unwinds in
90 days, assuming that the risk-free rate of interest
is 5%? As Table 1 shows, the initial investment
in 10,000 shares of ABC costs $500,000.
The investor financed the entire purchase price
with risk-free borrowings, hence the initial investment
is $0. In 90 days, the investor has
three components to the portfolio. First, the investor
owns 10,000 shares valued at \(\tilde{S}_T\) a share.
Next, the investor must repay the $500,000 in
risk-free borrowings plus interest at a cost of
$506,202.54. Finally, the investor received cash
dividends of $4 a share or $40,000 on day 30,
which the investor invested immediately in
risk-free discount bonds. Dividends plus accrued
interest amount to $40,330.12 on day T.
Thus, the total value of the portfolio in 90 days
is \(10{,}000\tilde{S}_T - 506{,}202.54 + 40{,}330.12\).

Table 1 Future Value of Asset That Pays Discrete Cash Flows

Trade                               Initial Investment   Value on Day T
Buy stock                           -50(10,000)          10,000 S_T
Borrow funds                        500,000              -500,000e^{0.05(90/365)} = -506,202.54
Receive cash dividends on day t_i                        40,000e^{0.05(60/365)} = 40,330.12
  and reinvest at risk-free rate
  until day T
Value of position                   0                    10,000 S_T - 506,202.54 + 40,330.12
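
The ABC Corporation figures can be reproduced with the sketch below. The function names are illustrative, and the day-90 share price of $52 is a hypothetical value chosen only so that the position has a concrete number; the loan repayment and the reinvested dividend do not depend on it.

```python
import math

def future_value(amount, r, days, days_per_year=365):
    """Grow amount at continuous rate r over the given number of days."""
    return amount * math.exp(r * days / days_per_year)

def leveraged_position_value(shares, price_today, price_at_T, r,
                             horizon_days, dividends):
    """Value on day T of a fully financed stock position.

    dividends: list of (per_share_amount, day_paid) pairs; each dividend is
    reinvested at the risk-free rate from its payment day until day T.
    """
    loan_repayment = future_value(shares * price_today, r, horizon_days)
    dividend_value = sum(
        future_value(shares * amount, r, horizon_days - day_paid)
        for amount, day_paid in dividends
    )
    total = shares * price_at_T - loan_repayment + dividend_value
    return total, loan_repayment, dividend_value

# price_at_T = 52.0 is an assumed terminal share price, for illustration only.
value, loan, divs = leveraged_position_value(
    10_000, 50.0, 52.0, 0.05, 90, [(4.0, 30)]
)
print(f"Loan repayment: {loan:,.2f}")
print(f"Dividends plus interest: {divs:,.2f}")
print(f"Position value at the assumed price: {value:,.2f}")
```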

Summary and Some Guidelines 

Carry costs/benefits are the known costs/benefits
associated with holding an asset over a
fixed period of time. In general, they consist of
two components: (1) interest and (2) income
(in the case of a financial asset) or storage (in
the case of a physical asset). The interest component
is always expressed as a rate. If we buy
an asset today with borrowed funds, we will
owe \(Se^{rT}\) per unit of the asset on day T. Income
and noninterest costs are expressed either as a
continuous proportion of the asset price or as
discrete cash flows, depending upon the nature
of the underlying asset. Firms potentially have
four different sources of price risk: equity risk,
interest rate risk, foreign exchange risk, and
commodity price risk. Table 2 presents terminal
values of leveraged asset positions using the
net cost of carry assumption appropriate to
each asset category.

VALUING FORWARDS 

With the concept of net cost of carry in hand,
we now turn to valuing forward contracts. A
forward is a contract that requires its seller to
deliver the underlying asset on future day T at
a price agreed upon today. We denote today's
forward price as \(f\). Its price on day T is denoted
\(\tilde{f}_T\). A forward with no time remaining to expiration
must have the same price as the underlying
asset, that is, \(\tilde{f}_T = \tilde{S}_T\), as shown in Figure 1.
Otherwise, a costless arbitrage profit is possible
by buying the asset and selling the forward,
or vice versa. The purpose of this section is to
derive the value of a forward contract relative
to its underlying asset price prior to time T under
the continuous and discrete net carry cost
assumptions.

Continuous Rates 

To establish the price of a forward today, consider
a U.S. corporation that needs to make a
EUR 1,000,000 payment in T days and wants
to lock in the U.S. dollar value of this payment
today. The firm can accomplish this goal in two
ways.

First, it can borrow U.S. dollars and buy euros
today at the spot exchange rate \(S\), and then carry
the position for T days. To have one euro on
hand in T days, it needs to buy \(e^{-iT}\) units today,
where \(i\) is the risk-free interest rate in Europe.
To finance the entire purchase today, it needs
to borrow \(Se^{-iT}\). The repayment of the loan will
occur in T days, and the principal plus interest
will amount to \(Se^{-iT}e^{rT}\) per euro, where \(r\) is the
U.S. risk-free interest rate.

[Table 2: terminal values of leveraged asset positions by asset category. The table body is not recoverable from the source; the only surviving fragment reads "incur interest cost at a constant continuous rate r."]

Figure 1 Price paths of forward contract and its underlying asset through time. Price convergence
occurs at expiration.

Second, it can negotiate the price of a
T-day forward contract with its bank. Under the
terms of the forward contract, the firm will buy
1,000,000 euros in T days at a cost of \(f\) per euro.
No money changes hands today. In making its
decision about which strategy to take, the firm
will compare the forward price with the future
value had the euros been purchased today and
carried until day T. If \(f\) exceeds \(Se^{(r-i)T}\), the firm
will buy the euros in the spot market and carry
them. If \(f\) is less than \(Se^{(r-i)T}\), the firm will buy
the forward contract. Both alternatives provide
the firm with EUR 1,000,000 in T days at a price
locked in today. Since they are perfect substitutes,
they must have the same price. The value
of a forward in a constant continuous net cost
of carry framework is

\[ f = Se^{(r-i)T} \qquad (2) \]

The relation (2) is sometimes called the net cost
of carry relation. When the prices of the forward
and the asset are such that (2) holds exactly, the
forward market is said to be at full carry. Unless
costless arbitrage is somehow impeded, we
can be assured that the forward market will always
be at full carry. Suppose, for an instant
in time, \(f > Se^{(r-i)T}\). Such a condition implies
that there is a costless arbitrage opportunity.
We should immediately sell the forward and
buy the asset, financing the purchase of the asset
with risk-free borrowing. Table 3 shows the
outcome. With no investment today, we earn
a certain outcome of \(f - Se^{(r-i)T} > 0\) on day
T. Naturally, the market cannot be in equilibrium.
The costless arbitrage activity would continue
until the selling pressure on the forward
price and the buying pressure on the asset price
makes the arbitrage profit equal to 0. Where no
arbitrage opportunity exists, the cost of carry
relation (2) holds.

The net cost of carry relation (2) is written
in future value form, since both sides of the
equation are values on day T, as shown in Table
3. The relation can also be expressed in present
value form. Multiplying both sides of (2) by the
discount factor \(e^{-rT}\), we get

\[ fe^{-rT} = Se^{-iT} \qquad (3) \]

What (3) says is that the prepaid forward contract,
\(fe^{-rT}\), equals the initial cost of the asset
position, \(Se^{-iT}\).
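
A small numerical check may make relations (2) and (3) more concrete. The sketch below computes the full-carry forward price and then evaluates the sell-forward, buy-asset, borrow portfolio described in the text for several possible terminal asset prices. The spot rate, the interest rates, the horizon (expressed in years), and the one-cent mispricing are all hypothetical values chosen for illustration.

```python
import math

def forward_price(spot, r, i, T):
    """Full-carry forward price f = S * exp((r - i) * T), with T in years."""
    return spot * math.exp((r - i) * T)

def cash_and_carry_payoff(forward, spot, r, i, T, spot_at_T):
    """Day-T value of: buy exp(-iT) units, borrow their cost, sell one forward."""
    asset_leg = spot_at_T                                   # holding has grown to one unit
    loan_leg = -spot * math.exp(-i * T) * math.exp(r * T)   # repay loan plus interest
    forward_leg = -(spot_at_T - forward)                    # short forward position
    return asset_leg + loan_leg + forward_leg

# Hypothetical inputs: USD per EUR spot, U.S. rate, European rate, half a year.
S, r, i, T = 1.25, 0.05, 0.03, 0.5
f_fair = forward_price(S, r, i, T)
f_quoted = f_fair + 0.01   # assumed quote one cent above full carry

payoffs = [cash_and_carry_payoff(f_quoted, S, r, i, T, s_T)
           for s_T in (1.00, 1.25, 1.50)]
print(f"Full-carry forward price: {f_fair:.4f}")
print("Arbitrage payoff at each terminal price:", [round(p, 4) for p in payoffs])
```

The payoff is the same for every terminal price, which is what makes it an arbitrage profit rather than a bet on the exchange rate.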


Table 3 Costless Arbitrage Trades Where f > Se^{(r-i)T}

Trades                               Initial Investment   Value on Day T
Buy e^{-iT} units of asset           -Se^{-iT}            S_T
Borrow (sell risk-free bonds)        Se^{-iT}             -Se^{(r-i)T}
Sell forward contract                0                    -(S_T - f)
Net portfolio value                  0                    f - Se^{(r-i)T}

Discrete Flows

In the event that income or noninterest carry
costs are more appropriately modeled as discrete
cash flows, the net cost of carry relation is

\[ f = Se^{rT} - FVI \]

where FVI is the future value of the promised income
receipts. If the underlying asset is a physical
asset, the future value of the income, FVI,
may be negative as a result of storage cost payments.
The relation can also be written in its
present value form,

\[ fe^{-rT} = S - PVI \]

where PVI is the present value of the promised
income receipts, that is, \(PVI = FVIe^{-rT}\). The
prepaid forward price equals \(S - PVI\), where
the underlying asset distributes discrete known
cash flows through time.

To illustrate, let's compute the value of a
forward contract on a hypothetical dividend-paying
stock, HAL Company. Specifically, we
want to value a six-month forward contract on
3,000 shares of this company, assuming that the
current share price is $120 and that a $3 cash
dividend will be paid in two months and then
again in five months. Assume the risk-free rate
of interest is 5%. Since the cash dividend payments
are discrete cash inflows, the discrete-flow cost of carry
relation \(f = Se^{rT} - FVI\) is the most appropriate.
The future value of the first dividend payment
is \(3e^{0.05(4/12)}\), and the future value
of the second dividend is \(3e^{0.05(1/12)}\). The future
value of all income received during the forward
contract's life is therefore

\[ FVI = 3e^{0.05(4/12)} + 3e^{0.05(1/12)} = 6.06 \]

The value of the forward contract is therefore

\[ f = 120e^{0.05(6/12)} - 6.06 = 116.97 \text{ per share} \]

or $350,910 in total.
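
The HAL Company calculation can be verified with the following sketch, which applies the discrete-flow relation with time measured in months. The function names are illustrative. The total is computed from the per-share price rounded to cents, which is how the $350,910 figure in the text arises.

```python
import math

def future_value_of_income(cash_flows, r, T_months):
    """Future value on the delivery date of (amount, month_paid) cash flows."""
    return sum(amount * math.exp(r * (T_months - month) / 12)
               for amount, month in cash_flows)

def forward_price_discrete(spot, r, T_months, cash_flows):
    """Forward price f = S * exp(rT) - FVI for an asset paying discrete income."""
    fvi = future_value_of_income(cash_flows, r, T_months)
    return spot * math.exp(r * T_months / 12) - fvi

dividends = [(3.0, 2), (3.0, 5)]   # $3 paid in month 2 and again in month 5
fvi = future_value_of_income(dividends, 0.05, 6)
f = forward_price_discrete(120.0, 0.05, 6, dividends)
total = round(f, 2) * 3000         # 3,000 shares at the rounded per-share price

print(f"FVI = {fvi:.2f}, forward price = {f:.2f}, total = {total:,.0f}")
```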

Hedging with Forwards 

Before turning to futures contract valuation, it
is worth considering the no-arbitrage portfolio
in Table 3 more closely. It contains important
intuition regarding hedging risk. Suppose that
we hold a stock portfolio and fear that the market
will decline over the next few months. To
avoid the risk of a stock market decline, we can
sell our stocks and buy risk-free bonds. Alternatively,
we can sell a forward contract on our
stock portfolio. These alternatives are perfect
substitutes.

Table 4 Hedging a Stock Portfolio Using a Forward Contract

Trades                               Initial Investment   Value on Day T
Own stock portfolio. Reinvest all    -S                   S_T e^{iT}
  dividend income into more
  shares of stocks.
Sell e^{iT} forward contracts.       0                    -(S_T - f)e^{iT}
Net portfolio value                  -S                   fe^{iT}

To see this, assume that our portfolio is sufficiently broad-based that it
is reasonable to assume that the dividend yield is a constant continuous
rate, i. If all dividend income is invested in more units of the stock
portfolio, one unit in the stock portfolio today will grow to e^{iT} units
on day T, as we discussed earlier and illustrated in Table 4. To hedge the
price risk exposure of e^{iT} units of the stock portfolio on day T, we need
to sell e^{iT} forward contracts today. The value of this forward position
will be -(S_T - f)e^{iT} on day T. Once the positions are netted, the
terminal value of the portfolio is f e^{iT}. Note that the value is certain.
The forward price, the dividend yield rate, and the hedge period horizon
(i.e., the life of the forward contract) are all known on day 0. To see that
the return on the hedged portfolio equals the risk-free return, substitute
the net cost of carry relation, f = Se^{(r-i)T}, in the expression for the
terminal value of the portfolio in Table 4. The net terminal value is
f e^{iT} = Se^{(r-i)T} e^{iT} = Se^{rT}, exactly the amount we would have
had if the stock portfolio had been liquidated and invested in risk-free
bonds at the outset.
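A minimal numerical sketch of this result follows. All inputs (a portfolio worth 100, r = 5%, i = 2%, a half-year horizon) are hypothetical, and `hedged_terminal_value` is an illustrative name; the point is only that the hedged value is the same for every terminal price and equals Se^{rT}.

```python
import math

def hedged_terminal_value(spot, rate, div_yield, horizon, terminal_price):
    """Terminal value of the Table 4 portfolio: stock plus short forwards."""
    f = spot * math.exp((rate - div_yield) * horizon)  # f = Se^{(r-i)T}
    units = math.exp(div_yield * horizon)              # e^{iT} units held on day T
    stock_value = units * terminal_price               # S_T e^{iT}
    forward_value = -units * (terminal_price - f)      # -(S_T - f)e^{iT}
    return stock_value + forward_value

risk_free_value = 100.0 * math.exp(0.05 * 0.5)         # Se^{rT}
for s_t in (60.0, 100.0, 140.0):
    value = hedged_terminal_value(100.0, 0.05, 0.02, 0.5, s_t)
    print(s_t, round(value, 4), round(risk_free_value, 4))
```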
Table 5 Perfect Substitutes Implied by the Net Cost of Carry Relation

| Position 1 | | Position 2 |
| Buy asset/sell forward | = | Buy risk-free bonds (lend) |
| Buy risk-free bonds (lend)/buy forward | = | Buy asset |
| Buy asset/sell risk-free bonds (borrow) | = | Buy forward |
| Sell asset/buy forward | = | Sell risk-free bonds (borrow) |
| Sell risk-free bonds (borrow)/sell forward | = | Sell asset |
| Sell asset/buy risk-free bonds (lend) | = | Sell forward |


Summary 

A long forward position is a perfect substitute 
for buying the asset using risk-free borrowings. 
Consequently, the price of a forward equals the 
price of the asset plus net carry costs. But this 
is only one possible combination of positions 
in the asset, the forward, and risk-free bonds. 
Table 5 shows all possible pairings. Using the 
net cost of carry relation, we can demonstrate 
why Position 1 is a perfect substitute for Posi¬ 
tion 2 in all six rows of the table. A full under¬ 
standing of each relation will prove invaluable 
in understanding valuation and risk manage¬ 
ment problems. 


VALUING FUTURES 

Futures contracts are like forward contracts, ex¬ 
cept that price movements are marked-to-market 
each day rather than waiting until contract ex¬ 
piration and having a single, once-and-for-all 
settlement. If the marking-to-market produces 
a gain during the futures contract's life, the gain 
can be reinvested in interest-bearing securities. 
Conversely, if the marking-to-market produces 
a loss, the loss must be covered with either exist¬ 
ing interest-bearing assets or borrowing at the 
risk-free interest rate. 

To distinguish between buying a forward and buying a futures, consider the
futures position cash flows shown in Table 6. As we discussed earlier, a
forward contract purchased today has a value S_T - f on day T. In contrast,
a futures contract is marked to market each day, and the daily gains/losses
gather interest. If the risk-free rate of interest is 0%, the terminal value
of the futures position (i.e., the sum of the mark-to-market gain/loss
column) is the same as the terminal value of the forward position. If the
risk-free rate of interest is greater than 0%, however, the value of the
futures position on day T may be greater or less than the terminal value of
the forward position, depending on the path that futures prices follow over
the life of the contract.

Table 6 Cash Flows of Long Futures Positions through Time

| Day t | Futures Price | Mark-to-Market Gain/Loss on Day t | Value of Gain/Loss on Day T |
| 0 | F | | |
| 1 | F_1 | F_1 - F | (F_1 - F)e^{r(T-1)} |
| 2 | F_2 | F_2 - F_1 | (F_2 - F_1)e^{r(T-2)} |
| ... | | | |
| t | F_t | F_t - F_{t-1} | (F_t - F_{t-1})e^{r(T-t)} |
| ... | | | |
| T-1 | F_{T-1} | F_{T-1} - F_{T-2} | (F_{T-1} - F_{T-2})e^{r} |
| T | F_T | F_T - F_{T-1} | F_T - F_{T-1} |
| Total | | F_T - F | \sum_{t=1}^{T} (F_t - F_{t-1})e^{r(T-t)}, with F_0 = F |

To illustrate, suppose that an investor needs £1,000,000 in three days and
wants to lock in the price today. Suppose also that a three-day forward
contract on British pounds is priced at $1.60 per pound and that a British
pound futures contract with three days remaining to expiration also has a
price of $1.60. Let's compare the terminal values of a long forward position
with a long futures position at the end of three days assuming the domestic
risk-free rate is 5%. Assume that the futures prices over the next three
days are $1.71, $1.67, and $1.70, respectively.

The terminal value of a long forward position is simply the exchange rate on
day 3, $1.70, less the forward price, $1.60, times one million, or $100,000,
exactly equal to the sum of the mark-to-market gains/losses on the long
futures position. The terminal value of the long futures position when the
mark-to-market gains/losses are invested/financed at the risk-free rate of
interest, however, is $100,024.66, as is shown in the following table.

| Day t | Futures Price | Mark-to-Market Gain/Loss on Day t | Value of Gain/Loss on Day T |
| 0 | 1.60 | | |
| 1 | 1.71 | 110,000.00 | 110,030.14 |
| 2 | 1.67 | -40,000.00 | -40,005.48 |
| 3 | 1.70 | 30,000.00 | 30,000.00 |
| Total | | 100,000.00 | 100,024.66 |


In general, the terminal value of a long forward and a long futures will be
different. The reason that the terminal values are different is that the
terminal value of the futures position depends on how the futures price
evolves through time. Other futures price paths will produce different
terminal values. If, for example, the futures price had been $1.51 on day 1
rather than $1.71, the terminal value of the futures position would have
been $99,997.26, below (not above) the $100,000 terminal value of the long
forward.
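The path dependence can be reproduced with a few lines of Python. This is a sketch under the same assumptions the tables appear to use: the 5% rate is continuously compounded and a day is 1/365 of a year. The name `futures_terminal_value` and the `days_per_year` parameter are illustrative.

```python
import math

def futures_terminal_value(prices, rate, units, days_per_year=365):
    """Day-T value of a long futures position marked to market daily.

    prices[0] is today's futures price; prices[t] is the settlement on day t.
    """
    T = len(prices) - 1
    total = 0.0
    for t in range(1, T + 1):
        gain = (prices[t] - prices[t - 1]) * units        # day-t mark-to-market
        total += gain * math.exp(rate * (T - t) / days_per_year)
    return total

path_up_first = futures_terminal_value([1.60, 1.71, 1.67, 1.70], 0.05, 1_000_000)
path_down_first = futures_terminal_value([1.60, 1.51, 1.67, 1.70], 0.05, 1_000_000)
print(round(path_up_first, 2))    # 100024.66
print(round(path_down_first, 2))  # 99997.26
```

Both paths end at $1.70, so the forward pays $100,000 in either case; only the futures value changes with the path.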


Telescoping Futures Position 

Interestingly, the fact that a long forward position does not have the same
terminal value as a long futures position does not imply that the forward
and futures prices are different. Indeed, as we will show shortly, they are
equal. We can control the effect of the reinvestment of the mark-to-market
proceeds by creating a "telescoping futures position."

A telescoping futures position is created as follows. We begin, on day 0,
with e^{-rT} futures contracts. Since we enter the position at the close of
day 0, the marked-to-market gain for the day is 0. In preparation for day 1,
we increase the size of the futures position by a factor e^{r}. At the end
of day 1, the futures position is marked-to-market, generating proceeds of
e^{-r(T-1)}(F_1 - F). If this gain/loss is carried forward at the risk-free
interest rate until day T, the terminal gain/loss will be
e^{-r(T-1)}(F_1 - F)e^{r(T-1)} = F_1 - F, as shown in Table 7. On day 2, the
position is again increased by a factor e^{r} and is marked-to-market at
e^{-r(T-2)}(F_2 - F_1). Carrying this amount forward to day T, we have
e^{-r(T-2)}(F_2 - F_1)e^{r(T-2)} = F_2 - F_1, and so on. Because the number
of futures is chosen to exactly offset the accumulated interest factor on
the daily mark-to-market gain/loss, there will be exactly one futures
contract on hand on day T, and the value of the futures position will be
S_T - F. Assuming that the futures and forward contracts expire at the same
time, the telescoping futures position will have exactly the same terminal
value as the long forward position.

Table 7 Cash Flows of Telescoping Futures Position Providing Same Terminal Value as Forward Position on Day T

| Day t | Futures Price | No. of Futures Contracts | Mark-to-Market Gain/Loss on Day t | Value of Gain/Loss on Day T |
| 0 | F | | | |
| 1 | F_1 | e^{-r(T-1)} | e^{-r(T-1)}(F_1 - F) | e^{-r(T-1)}(F_1 - F)e^{r(T-1)} = F_1 - F |
| 2 | F_2 | e^{-r(T-2)} | e^{-r(T-2)}(F_2 - F_1) | F_2 - F_1 |
| ... | | | | |
| t | F_t | e^{-r(T-t)} | e^{-r(T-t)}(F_t - F_{t-1}) | F_t - F_{t-1} |
| ... | | | | |
| T-1 | F_{T-1} | e^{-r} | e^{-r}(F_{T-1} - F_{T-2}) | F_{T-1} - F_{T-2} |
| T | F_T | 1 | F_T - F_{T-1} | F_T - F_{T-1} |
| Total | | | | F_T - F = S_T - F |

Using an illustration, let's compare terminal 
values of long forward and long telescoping fu¬ 
tures positions. Suppose that an investor needs 
£1,000,000 in three days and wants to lock in 
the price today. Suppose also that a three-day 
forward contract on British pounds is priced at 
$1.60 per pound and that a British pound fu¬ 
tures contract with three days remaining to ex¬ 
piration also has a price of $1.60. Assume that 
the domestic risk-free interest rate is 5% and 
that the futures prices over the next three days 
are $1.71, $1.67, and $1.70, respectively. 

As in the previous illustration, the terminal value of a long forward
position is the exchange rate on day 3, $1.70, less the forward price,
$1.60, times one million, or $100,000. Because the initial futures position
has less than 1 million units, the total of the mark-to-market gains/losses
column is less than $100,000. The terminal value of the telescoping futures
position when the mark-to-market gains/losses are invested/financed at the
risk-free rate of interest is exactly $100,000, as is shown in the following
table:

| Day | Futures Price | Number of Units | Mark-to-Market Gain/Loss on Day t | Value of Gain/Loss on Day T |
| 0 | 1.60 | | | |
| 1 | 1.71 | 999,726.06 | 109,969.87 | 110,000.00 |
| 2 | 1.67 | 999,863.02 | -39,994.52 | -40,000.00 |
| 3 | 1.70 | 1,000,000.00 | 30,000.00 | 30,000.00 |
| Total | | | 99,975.35 | 100,000.00 |


The dynamic rebalancing of the futures position 
within the telescoping strategy ensures that the 
outcome is exactly the same as a long forward 
position. 
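The telescoping table can be regenerated with the sketch below. It assumes, as the numbers in the table suggest, continuous compounding at 5% with a day equal to 1/365 of a year; `telescoping_position` is an illustrative name, not one used in the source.

```python
import math

def telescoping_position(prices, rate, terminal_units, days_per_year=365):
    """Return (sum of daily mark-to-market flows, value of those flows on day T).

    On day t the position holds terminal_units * e^{-r(T-t)} units, so the
    interest earned on each day's gain/loss exactly undoes the scaling.
    """
    T = len(prices) - 1
    total_mtm = 0.0
    total_terminal = 0.0
    for t in range(1, T + 1):
        growth = math.exp(rate * (T - t) / days_per_year)
        units = terminal_units / growth                   # e^{-r(T-t)} scaling
        gain = units * (prices[t] - prices[t - 1])        # day-t mark-to-market
        total_mtm += gain
        total_terminal += gain * growth                   # carried forward to day T
    return total_mtm, total_terminal

mtm, terminal = telescoping_position([1.60, 1.71, 1.67, 1.70], 0.05, 1_000_000)
print(round(mtm, 2))       # 99975.35
print(round(terminal, 2))  # 100000.0
```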


Equivalence of Forward and 
Futures Prices 

The fact that a long telescoping futures position has a terminal value of
S_T - F and that a long forward position has a terminal value of S_T - f
implies that the futures price and forward price must be equal to each
other.¹ If they are not, a costless arbitrage profit would be possible by
selling the forward and entering a long telescoping position in the futures
(if f > F) or by buying the forward and entering a short telescoping
position in the futures (if f < F). Given the equivalence of forward and
futures prices, the valuation equations for a futures contract are the same
as those of the forward, that is,

    F = f = Se^{(r-i)T}    (4)

if all carry costs are constant continuous rates, and

    F = f = Se^{rT} - FVI    (5)

if noninterest carry costs are discrete.

Let's illustrate how to short sell stocks synthetically using stock futures.
Retail investors in the U.S. often find it costly to short sell shares of
common stock. Consequently, stock futures were recently launched. Assume
that an investor wants to short sell a particular stock over the next T
days. Its current share price is S, and a cash dividend of D has been
declared and will be paid in t days. Let's demonstrate that selling a
telescoping position in share futures is equivalent to short selling the
stock.

First, the value in T days of a short position 
in the stock must be identified. Short selling 
a share of the stock generates proceeds of S. 
Assume that an investor can take the proceeds 
from the short sale and invest them at the risk¬ 
free rate of interest. In addition, the stock pays 
a cash dividend of D on day t. The investor 
is responsible for paying the cash dividend. 
On day T, the value of each security position in the portfolio is as
reported in the following table:

| Trades | Initial Investment | Value on Day T |
| Short sell stock. Must pay cash dividends, if any. | S | -S_T - De^{r(T-t)} |
| Buy risk-free bonds | -S | Se^{rT} |
| Net portfolio value | 0 | Se^{rT} - De^{r(T-t)} - S_T |

The net portfolio value on day T is Se^{rT} - De^{r(T-t)} - S_T.

From the discussion above, we know that selling a telescoping position in
the share futures has a terminal value of F - S_T. But, from valuation
equation (5), we know that, in the absence of costless arbitrage
opportunities, F = Se^{rT} - De^{r(T-t)}. Substituting, we find that the
value of the short futures position on day T is
Se^{rT} - De^{r(T-t)} - S_T, an amount identical to that of the short stock
position.
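As a sanity check, the two terminal values can be compared numerically. The inputs below (a $50 share, a $1 dividend after 30 days, a 90-day horizon, r = 5%) are hypothetical, times are expressed in years, and both function names are illustrative.

```python
import math

def short_stock_terminal_value(spot, dividend, rate, T, t, terminal_price):
    """Short one share, invest the proceeds in bonds, pay the dividend on day t."""
    bonds = spot * math.exp(rate * T)                      # Se^{rT}
    dividend_owed = dividend * math.exp(rate * (T - t))    # De^{r(T-t)}
    return bonds - dividend_owed - terminal_price

def short_futures_terminal_value(spot, dividend, rate, T, t, terminal_price):
    """Short telescoping futures position, with F taken from equation (5)."""
    futures_price = spot * math.exp(rate * T) - dividend * math.exp(rate * (T - t))
    return futures_price - terminal_price                  # F - S_T

T, t = 90 / 365, 30 / 365
for s_t in (40.0, 50.0, 60.0):
    stock = short_stock_terminal_value(50.0, 1.0, 0.05, T, t, s_t)
    futures = short_futures_terminal_value(50.0, 1.0, 0.05, T, t, s_t)
    print(s_t, round(stock, 4), round(futures, 4))
```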


HEDGING WITH FUTURES 

The telescoping futures position has implications in terms of hedging with
futures contracts. For the hedge to be completely effective, the number of
futures must equal the number of units of the underlying asset on day T.
Under the continuous carry cost assumption, we know that one unit of the
asset grows to e^{iT} units on day T. We also know that a telescoping
futures position that starts with e^{-rT} futures contracts today has a
single contract at time T. Consequently, to hedge the long asset position in
Table 4, our futures hedge would start off short e^{-(r-i)T} futures
contracts on day 0, and would scale up by a factor of e^{r} per day over the
life of the hedge. Assuming the futures expires on day T, the terminal value
of the short telescoping position would be -(S_T - F)e^{iT} and the net
terminal value of the hedged portfolio would be Fe^{iT}. Substituting the
net cost of carry relation (4), the net terminal value of the hedged
portfolio may be written Se^{rT}, which shows that hedging using a short
telescoping futures position is equivalent to liquidating the asset position
and buying risk-free bonds. The day-to-day increase in the size of the
futures position by the interest factor e^{r} undoes the effects of interest
on the daily marking to market of the futures gains/losses. In practice,
this dynamic, day-to-day adjustment is called tailing the hedge.
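A tailed hedge schedule can be sketched as follows. The inputs (r = 5%, i = 2%, a 90-day hedge) are hypothetical, rates are annual and converted to days with an assumed 365-day year, and `tailed_hedge_schedule` is an illustrative name.

```python
import math

def tailed_hedge_schedule(rate, income_rate, horizon_days, days_per_year=365):
    """Contracts to be short on each day t = 0..T, per unit of asset held today."""
    T = horizon_days
    terminal_units = math.exp(income_rate * T / days_per_year)   # e^{iT}
    return [terminal_units * math.exp(-rate * (T - t) / days_per_year)
            for t in range(T + 1)]

schedule = tailed_hedge_schedule(rate=0.05, income_rate=0.02, horizon_days=90)
print(schedule[0])               # e^{-(r-i)T}: the starting hedge, below one contract
print(schedule[-1])              # e^{iT}: matches the asset units held on day T
print(schedule[1] / schedule[0]) # daily scale-up factor e^{r/365}
```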

SUMMARY 

Futures contracts are like forward contracts except that price movements are
marked to market daily. Because these daily gains/losses are allowed to
accrue interest until the end of the contract's life, a long futures
position will not in general have the same terminal value as a long forward
position. The effects of the interest accrual on the mark-to-market
gains/losses can be undone, however, using a telescoping futures position.
Each day t, the number of futures is set equal to e^{-r(T-t)} for each unit
of the underlying asset at the end of the hedging interval. Set in this way,
the terminal value of a long telescoping position in the futures equals the
terminal value of a long forward. From a costless arbitrage perspective,
therefore, the following are perfect substitutes:

Long telescoping futures position = Long 
forward position 

Short telescoping futures position = Short 
forward position 

The telescoping futures strategy also has implications for hedging. To undo
the effects of interest on the daily marking to market of the futures
gains/losses when the life of the futures matches the hedging horizon T, the
size of a futures hedge starts at a level equal to the present value of the
number of terminal units of that asset, that is, e^{-rT} for each unit of
the asset, and increases in size by a factor of e^{r} each day.
IMPLYING FORWARD NET CARRY RATES

Thus far, we have examined forward/futures 
contracts with a single maturity. A casual exam¬ 
ination of the financial pages, however, shows 
multiple maturities for the same underlying as¬ 
set. In these situations, we can use the net cost 
of carry relation (2) to deduce implied forward 
cost of carry rates. 
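The source does not spell out the calculation, so the following is only one plausible reading, stated as an assumption: if f_1 = Se^{b_1 T_1} and f_2 = Se^{b_2 T_2}, where b denotes the net carry rate r - i, then the implied forward net carry rate between T_1 and T_2 is ln(f_2/f_1)/(T_2 - T_1). The function name and the sample prices are hypothetical.

```python
import math

def implied_forward_carry_rate(f_near, t_near, f_far, t_far):
    """Implied net carry rate between two maturities (assumed definition).

    Derived from f = Se^{bT}: f_far / f_near = e^{b_fwd (t_far - t_near)}.
    """
    return math.log(f_far / f_near) / (t_far - t_near)

# Hypothetical 3-month and 9-month forward prices on the same asset.
rate = implied_forward_carry_rate(f_near=100.75, t_near=0.25, f_far=102.50, t_far=0.75)
print(round(rate, 4))
```

A positive result corresponds to an upward-sloping stretch of the forward curve, a negative one to a downward-sloping stretch.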

VALUING SWAPS 

A swap contract is an agreement to exchange a 
set of future cash flows. A plain-vanilla swap 
is usually regarded to be an exchange of a 
fixed payment for a floating payment, where 
the floating payment is tied to some reference 
rate, index level, or price. Like a forward con¬ 
tract, the underlying asset can be anything from 
a financial asset such as a stock or a bond to a 
physical asset such as crude oil or gold. Also, 
like a forward contract, a swap involves no up¬ 
front payment. 

The key information needed to value a swap contract is the forward curve of
the underlying asset and the zero-coupon yield curve for risk-free bonds.
The forward curve refers to the relation between the price of a forward
contract on the underlying asset and its time to expiration or settlement.
Where the time to expiration is 0, the forward price equals the prevailing
spot price. Figure 2 shows two possible forward curve relations. A normal
forward curve is upward sloping, and an inverted forward curve is downward
sloping. For financial assets, the slope will depend on the net difference
between the risk-free rate and the income received on the underlying asset.
Thus, a normal forward curve will arise in markets where the interest rate
is greater than the income rate, and an inverted forward curve will arise in
markets where the interest rate is less than the income rate. For physical
assets or commodities, the nature of the forward curve depends also on the
cost of storage and convenience yield. The zero-coupon yield curve refers to
the relation between interest rates and term to maturity.

Figure 2 Forward curve: Relation between forward price and its time to
expiration. Where time to expiration is 0, forward price equals spot price.

In terms of swap valuation, the nature of the forward curve is irrelevant as
long as the forward prices represent tradable prices. To see this, consider
a jeweler (i.e., long hedger) who needs 1,000 troy ounces of gold each
quarter over the next two years and wants to lock in his input cost today.
One hedging alternative is to buy a strip of forward (or futures) contracts,
one corresponding to each desired delivery date. The cost of the gold each
quarter will be locked in; however, the cost of the gold will be different
each quarter unless the forward curve is a horizontal line. The gold market,
however, is typically in contango, so the cost, although certain, will
escalate through time. A second alternative is to buy a swap contract that
provides for the delivery of 1,000 ounces of gold each quarter, where there
is a single fixed price for all deliveries.² In the absence of costless
arbitrage opportunities, it must be the case that the present value of the
deliveries using the forward curve must be the same as the present value of
the deliveries using the fixed price of the swap contract, that is,

    \sum_{i=1}^{n} f_i e^{-r_i T_i} = \sum_{i=1}^{n} \bar{f} e^{-r_i T_i}    (6)

where n is the number of delivery dates, f_i is the price of a forward
contract with time to expiration T_i, r_i is the risk-free rate of interest
corresponding to time to expiration T_i,³ and \bar{f} is the fixed price in
the swap agreement.⁴ In an instance where the right-hand side of (6), the
present value of the swap's fixed payments, is greater (less) than the
left-hand side, the swap is overpriced (underpriced) relative to the strip,
and an arbitrageur would sell (buy) the swap and buy (sell) the strip of
forward contracts, pocketing the difference. Because such free money
opportunities do not exist, (6) must hold as an equality.

Equation (6) can be rearranged to isolate the fixed price of the swap
agreement, that is,

    \bar{f} = \frac{\sum_{i=1}^{n} f_i e^{-r_i T_i}}{\sum_{i=1}^{n} e^{-r_i T_i}}
            = \sum_{i=1}^{n} f_i \left( \frac{e^{-r_i T_i}}{\sum_{j=1}^{n} e^{-r_j T_j}} \right)    (7)

Expressed in this fashion, it becomes obvious that the fixed price of a swap
is a weighted average of forward prices, one corresponding to each delivery
date.⁵
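Equation (7) translates directly into code. The sketch below uses a hypothetical eight-quarter gold forward curve in contango and a flat 4% zero-coupon curve; none of these figures come from the source, and `swap_fixed_price` is an illustrative name.

```python
import math

def swap_fixed_price(forward_prices, zero_rates, times):
    """Fixed swap price as a discount-factor-weighted average of forward prices."""
    discount_factors = [math.exp(-r * t) for r, t in zip(zero_rates, times)]
    weighted_sum = sum(f * d for f, d in zip(forward_prices, discount_factors))
    return weighted_sum / sum(discount_factors)

times = [0.25 * k for k in range(1, 9)]                 # quarterly deliveries, 2 years
forwards = [1805.0 + 5.0 * k for k in range(1, 9)]      # hypothetical upward-sloping curve
rates = [0.04] * 8                                      # hypothetical flat zero curve
fixed = swap_fixed_price(forwards, rates, times)
print(round(fixed, 2))
```

Because nearer deliveries carry larger discount factors, the fixed price lies slightly below the simple average of the forward prices when the curve slopes upward.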


KEY POINTS

• The net cost of carry is the cost of holding an asset over a period of
  time. One component of the cost of carry for all assets is the opportunity
  cost of funds. In order to buy the asset, an investor must pay for it.

• Beyond interest cost, however, carry costs may be positive or negative,
  depending upon the nature of the underlying asset. If the asset is a
  physical asset or commodity such as grain, the asset holder must pay
  storage costs such as warehouse rent and insurance. If the underlying
  asset is a financial asset or security such as a stock, a bond, or a
  currency, on the other hand, there are no storage costs. Instead, such
  assets produce a known income stream in the form of dividend payments or
  interest receipts, and this income can be used to subsidize the cost of
  borrowing.

• The interest cost is modeled as a constant continuous rate and the
  noninterest costs/benefits as either continuous rates or discrete cash
  flows, depending on the nature of the underlying asset.

• Given the assumption and definition of the cost of carry, pricing
  equations for forward and futures contracts can be developed. The price of
  a forward equals the price of a futures and both are equal to the asset
  price plus net carry costs. This is because if an investor needs an asset
  on hand at some future date at a price "locked-in" today, the investor can
  buy a forward contract, buy a futures, or buy the underlying asset and
  carry it.

• Perfect substitutes must have the same price.

• The relation between the forward curve and the fixed price of a swap is as
  follows. In the absence of costless arbitrage opportunities, the fixed
  price is a weighted average of the prices of the corresponding forward
  contracts, with the weights equal to the discount factor of each flow in
  relation to the sum of all discount factors.


NOTES 

1. Cox, Ingersoll, and Ross (1981) use no¬ 
arbitrage arguments to demonstrate the 
equivalence of forward and futures prices 
when future interest rates are known. They 
go on to show, however, that if interest rates 
are uncertain, the futures price will be greater 
than or less than the forward price, de¬ 
pending upon whether the correlation be¬ 
tween futures price changes and interest 
rate changes is negative or positive. See also 
Jarrow and Oldfield (1981). 

2. As a practical matter, many swap agreements are cash-settled, so, instead
of paying the fixed price per ounce and receiving 1,000 ounces in gold, we
will receive in cash 1,000 times the difference between the prevailing
(random) spot price of gold each quarter and the fixed price. If the spot
price is greater than the fixed price, we receive a cash payment from our
counterparty, and vice versa.

3. Note that we are allowing for the fact that the risk-free rate may be
term-specific.

4. The delivery quantity is irrelevant since it is the same on both sides of
the equation. That is, equation (6) assumes that one unit is delivered on
each delivery date.

5. For illustrations of how to compute the fixed rate of a swap based on the
forward curve and the unwind price of a swap based on the forward curve, see
Chapter 4 in Whaley (2006).
Abstract: For derivative instruments, in the absence of costless arbitrage opportunities, price relations can be
developed. In the case of options (calls and puts), there are three types of price relations that can 
be obtained. The first is the lower bound on the option's price. The second, and perhaps most 
important, no-arbitrage price relation is the one between the price of a put and the price of a call. 
This relation is called the put-call parity relation and arises from simultaneous trades in the call, the 
put, and the asset. The third price relation is the intermarket relation, which is the link between the 
prices of asset options and the prices of futures options. The price relations exist for European-style 
and American-style options and under both the continuous rate and discrete flow net cost of carry 
assumptions. Price relations are important for risk management strategies using options. Option 
pricing models go beyond these price relations to provide a fair value for an option. 


The purpose of this entry is to develop no¬ 
arbitrage price relations for option contracts as¬ 
suming that two perfect substitutes have the 
same price. In the absence of costless arbitrage 
opportunities, options have three types of no¬ 
arbitrage price relations—lower bounds, put- 
call parity relations, and intermarket relations. 
Each type of relation is developed in turn, for 
both European- and American-style options 1 
and under both the continuous rate and discrete 
flow net cost of carry assumptions. Before deriv¬ 
ing the no-arbitrage price relations for options, 
however, we focus on clearly distinguishing be¬ 
tween the characteristics of option and forward 
contracts. 


OPTIONS AND FORWARDS 

Options differ from forwards in two key re¬ 
spects. First, the net cost of carry of a forward 
contract is zero since it involves no investment 
outlay. An option, on the other hand, involves 
investment. An option buyer pays the option 
premium for the right to buy or sell the un¬ 
derlying asset, and, like the buyer of any other 
asset, faces carry costs. For an option, however, 
the only carry cost is interest. Holding an op¬ 
tion neither produces income like a dividend¬ 
paying stock nor requires storage costs like a 
commodity (i.e., a physical asset). 

The effects of carry costs on the terminal profit functions of forward and
option contracts are shown in Figures 1 through 3. The profit from a long
forward position at expiration is

    \pi_{long forward, T} = S_T - f    (1)

where S_T denotes the future price of the asset and f denotes the forward
price.

Figure 1 Terminal Profit of Long and Short Forward Positions. Panels:
(a) Long forward, (b) Short forward. Vertical axis: profit, \pi_T.

On the other hand, the profit from a long call position is

    \pi_{long call, T} = S_T - X - ce^{rT},  if S_T > X
                       = -ce^{rT},            if S_T <= X    (2)

and from a long put position is

    \pi_{long put, T} = -pe^{rT},            if S_T > X
                      = X - S_T - pe^{rT},   if S_T <= X    (3)

where c and p are the prices of a European-style call and put, respectively;
X is the exercise price or strike price of the option. The opportunity cost
of funds (i.e., the risk-free rate of interest) is denoted by r. Note that
the profit functions for the long call and the long put (2) and (3) reflect
the fact that the initial option premiums, c and p, are carried forward
until the option's expiration at the risk-free interest rate. We have lost
the opportunity cost of the funds we tied up in buying the option.
Conversely, short call and short put positions (i.e.,
\pi_{short call, T} = -\pi_{long call, T} and
\pi_{short put, T} = -\pi_{long put, T}) reflect the fact that the option
seller receives the premium payment and invests the cash at the risk-free
interest rate. The profit function of a long forward position (1) has no
interest component since the forward price is a promised payment on day T
rather than a cash outlay today.
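Profit functions (1) through (3) can be written compactly in Python. This is a minimal sketch; the function names and the sample premiums are illustrative, and the short positions are obtained by negating the long ones, as stated above.

```python
import math

def long_forward_profit(s_t, f):
    """Equation (1): no interest term, since f is paid on day T."""
    return s_t - f

def long_call_profit(s_t, x, c, r, T):
    """Equation (2): exercise value less the premium carried forward at r."""
    return max(s_t - x, 0.0) - c * math.exp(r * T)

def long_put_profit(s_t, x, p, r, T):
    """Equation (3): exercise value less the premium carried forward at r."""
    return max(x - s_t, 0.0) - p * math.exp(r * T)

# Hypothetical contract terms: X = f = 100, c = 5, p = 4, r = 5%, T = 0.5 years.
for s_t in (80.0, 100.0, 120.0):
    print(s_t,
          long_forward_profit(s_t, 100.0),
          round(long_call_profit(s_t, 100.0, 5.0, 0.05, 0.5), 4),
          round(long_put_profit(s_t, 100.0, 4.0, 0.05, 0.5), 4))
```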


Figure 2 Terminal Profit of Long and Short Call Positions. Panels:
(a) Long call, (b) Short call. Vertical axis: profit, \pi_T.

Figure 3 Terminal Profit of Long and Short Put Positions. Panels:
(a) Long put, (b) Short put. Vertical axis: profit, \pi_T.


The second key difference between forwards and options is that the buyer of
a forward is obliged to buy the underlying asset at expiration, independent
of whether the terminal asset price is greater than or less than the initial
forward price. The buyer of an option, on the other hand, is not obliged to
buy or sell the underlying asset, but will do so only when it is profitable.
The profit function for the long call position (2), for example, shows that
the option is exercised only when S_T > X. If S_T <= X, the call option
buyer chooses not to exercise, forfeiting only his original investment plus
carry costs, ce^{rT}. The limited liability feature of the long call and
long put positions is illustrated in Figures 2a and 3a, respectively. In the
interest of completeness, the short positions in the respective instruments
are illustrated in Figures 1b through 3b.

The profit functions of the call and the put show a certain complementarity
to the profit function of a forward. Suppose we buy a call and sell a put at
the same exercise price. The profit function for the overall position is

    S_T - X - ce^{rT} + pe^{rT},  if S_T > X
    S_T - X - ce^{rT} + pe^{rT},  if S_T <= X
      = S_T - X - ce^{rT} + pe^{rT}

Now, suppose that we chose the exercise price of the options such that
X = f - ce^{rT} + pe^{rT}. The profit functions of the option portfolio and
the long forward position will be exactly the same. If we buy the option
portfolio and sell the forward contract, the terminal value of the overall
position must be 0. In the absence of costless arbitrage opportunities, the
current value of the position must also be equal to 0, and, therefore, the
call and put prices must be equal. Buying the call and selling the put (with
the exercise price defined as above) is a perfect substitute for buying a
forward. Viewed in this way, we can construct virtually any derivatives
contract from any of the following pairs of basic instruments: (1) a forward
and a call, (2) a forward and a put, and (3) a call and a put.
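The equivalence between the long call/short put portfolio and the long forward can be verified numerically. The figures below (f = 100, c = p = 6, r = 5%, T = 0.5) are hypothetical, and `call_minus_put_profit` is an illustrative name.

```python
import math

def call_minus_put_profit(s_t, x, c, p, r, T):
    """Terminal profit of buying a call and selling a put at the same strike."""
    long_call = max(s_t - x, 0.0) - c * math.exp(r * T)
    short_put = -(max(x - s_t, 0.0) - p * math.exp(r * T))
    return long_call + short_put

f, c, p, r, T = 100.0, 6.0, 6.0, 0.05, 0.5
x = f - c * math.exp(r * T) + p * math.exp(r * T)   # strike chosen as in the text

for s_t in (70.0, 100.0, 130.0):
    option_portfolio = call_minus_put_profit(s_t, x, c, p, r, T)
    forward = s_t - f                                # long forward profit, equation (1)
    print(s_t, round(option_portfolio, 6), forward)
```

With c equal to p, the strike defined this way coincides with the forward price, which is the case the no-arbitrage argument leads to.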


CONTINUOUS RATES 

The net cost of carry refers to the difference be¬ 
tween the costs and the benefits of holding an 
asset. One carry cost common to all assets is 
the opportunity cost of funds. We assume this 
cost is incurred at the risk-free rate of inter¬ 
est. Beyond interest cost, however, carry costs 
vary depending upon the nature of the asset. 
For a physical asset or commodity, we incur storage costs (e.g., rent and
insurance). At the same time, certain benefits may accrue. By storing wheat
we may avoid some of the costs of possibly running out of our regular
inventory before two months are up and having to pay extra for emergency
deliveries. This is called convenience yield. Thus, the net cost of carry
for a commodity equals interest cost plus storage costs less convenience
yield.
Table 1 Arbitrage Portfolio Trades Supporting Lower Price Bound of European-Style Call
Option Where the Underlying Asset Has a Continuous Net Carry Rate, c >= Se^{-iT} - Xe^{-rT}

| Trades | Initial Investment | Value on Day T, S_T < X | Value on Day T, S_T >= X |
| Sell asset | Se^{-iT} | -S_T | -S_T |
| Buy call option | -c | 0 | S_T - X |
| Buy risk-free bonds | -Xe^{-rT} | X | X |
| Net portfolio value | Se^{-iT} - Xe^{-rT} - c | X - S_T | 0 |


For a financial asset or security such as a stock 
or a bond, the carry costs/benefits are differ¬ 
ent. While borrowing costs remain, securities 
do not require storage costs and do not have 
convenience yields. What they do have, how¬ 
ever, is income (yield) that accrues in the form of 
quarterly cash dividends or semiannual coupon 
payments. Thus, the net cost of carry for a se¬ 
curity is equal to the cost of funds reduced by 
income. Carry costs and benefits are modeled 
either as continuous rates or as discrete flows. 
Some costs/benefits such as the cost of funds 
(i.e., the risk-free interest rate) are best modeled 
as continuous rates. 

Under the continuous rate assumption, both 
interest cost and noninterest costs/benefits are 
modeled as continuous rates. Under the dis¬ 
crete flow assumption, interest cost is mod¬ 
eled as a continuous rate but noninterest costs/ 
benefits are modeled as discrete cash flows. This 
section relies on the continuous rate assumption. The interest carry cost
rate is represented by the notation r, and the noninterest carry
benefit/cost rate is i. If the asset holder receives
income from holding the asset such as the div¬ 
idend yield on a stock portfolio or interest on 
a foreign currency investment, the income rate 
is positive (i.e., i > 0). If the asset holder pays 
costs in addition to interest in order to hold the 
asset (e.g., storage costs of holding a physical 
commodity), the income rate is negative (i.e., 
i < 0). Where i = 0, the only cost of carry is 
interest. As noted earlier in this section, the net 
cost of carry of an option is simply the interest 
rate. 


Lower Price Bound of 
European-Style Call 

Under the continuous rate assumption, the 
lower price bound of a European-style call op¬ 
tion is 

$c \ge \max(0, Se^{-iT} - Xe^{-rT})$ (4)

The reason that the call price must be greater
than or equal to 0 is obvious: we do not have to be
paid to take on a privilege. The reason the call
price must be at least $Se^{-iT} - Xe^{-rT}$ is less obvious
and is derived by means of an arbitrage portfolio.
Suppose we form a portfolio by selling
$e^{-iT}$ units of the underlying asset² and buying
a European-style call. In addition, to make sure
that we have enough cash on hand to exercise
the call at expiration, we buy $Xe^{-rT}$ in risk-free
bonds. The initial investment and terminal values
of these positions are shown in Table 1. On
day T, the net terminal value of the portfolio
depends on whether the asset price is above or
below the exercise price. If the asset price is less
than the exercise price (i.e., $S_T < X$), we let the
call expire worthless. We then use the risk-free
bonds to buy one unit of the asset to cover the
short sale obligation. What remains is $X - S_T$,
which we know is greater than 0. If the asset
price is greater than or equal to the exercise
price (i.e., $S_T \ge X$), we exercise the call. This
requires a cash payment of X. Fortunately we
have exactly that amount on hand in the form
of risk-free bonds. The unit of the asset that we
receive upon exercising the call is used to retire
the short sale obligation. In this case, the net
terminal value is certain to be 0.

What are the implications of this strategy?
Well, we have formed a portfolio that is certain
to have a terminal value of at least 0.
In the absence of costless arbitrage opportunities,
this implies that the greatest initial value
is 0. More simply, we cannot reasonably expect
to collect money at the outset without
risk of loss. In the absence of costless arbitrage
opportunities, $Se^{-iT} - Xe^{-rT} - c \le 0$. Hence, a
lower price bound for the European-style call is
$c \ge Se^{-iT} - Xe^{-rT}$.³

In general, the lower price bound of an option
is called its intrinsic value, and the difference
between the option's market value (price)⁴
and its intrinsic value is called its time value.
Thus a European-style call has an intrinsic value
of $\max(0, Se^{-iT} - Xe^{-rT})$ and a time value of
$c - \max(0, Se^{-iT} - Xe^{-rT})$. This entry deals
with identifying intrinsic values by virtue of
no-arbitrage arguments. Option pricing models
uncover the determinants of time value.

To illustrate, suppose a three-month European- 
style call option written on a stock index portfo¬ 
lio has an exercise price of 70 and a market price 
of 4.25. Suppose also the current index level is 
75, the portfolio's dividend yield rate is 4%, and 
the risk-free rate of interest is 5%. Is a costless 
arbitrage profit possible? 

To test for the possibility of a costless arbitrage 
profit, substitute the problem parameters into 
the lower price bound (4), that is, 

$4.25 < \max[0, 75e^{-0.04(3/12)} - 70e^{-0.05(3/12)}] = 5.12$

Since the lower bound relation is violated, a
costless arbitrage profit of at least 5.12 - 4.25
= 0.87 is possible. Since the violation may result
from either the call being underpriced or
the asset being overpriced, the arbitrage requires
buying the call and selling the asset.⁵ The
appropriate arbitrage trades are provided in
Table 1. Substituting the prices and rates gives:


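| Trades | Initial Investment | Value at Time T, $S_T < 70$ | Value at Time T, $S_T \ge 70$ |
|---|---|---|---|
| Sell index portfolio | 74.25 | $-S_T$ | $-S_T$ |
| Buy call option | -4.25 | 0 | $S_T - 70$ |
| Buy risk-free bonds | -69.13 | 70 | 70 |
| Net portfolio value | 0.87 | $70 - S_T$ | 0 |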

In examining the net portfolio value, note that 
you (a) earn an immediate profit of 0.87, and (b) 
have the potential of earning even more if the in¬ 
dex level is below 70 at the option's expiration. 
If prices in the market were actually configured 
at such levels, you should expect that buying 
pressure on the call and selling pressure on the 
index portfolio would very quickly return the 
market to equilibrium. In the absence of costless
arbitrage opportunities, $c \ge Se^{-iT} - Xe^{-rT}$.
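The test against the lower price bound (4) and the initial cash flows of the Table 1 trades can be reproduced numerically. The following is a minimal sketch, not a pricing routine; the function names are illustrative, and the inputs are the index-option example above (S = 75, X = 70, r = 5%, i = 4%, T = 3/12, c = 4.25).

```python
import math

def european_call_lower_bound(S, X, r, i, T):
    """Lower price bound (4): max(0, S*exp(-i*T) - X*exp(-r*T))."""
    return max(0.0, S * math.exp(-i * T) - X * math.exp(-r * T))

def call_bound_initial_flows(c, S, X, r, i, T):
    """Initial cash flows of the Table 1 trades and their sum."""
    sell_asset = S * math.exp(-i * T)    # short exp(-i*T) units of the asset
    buy_call = -c                        # pay the call premium
    buy_bonds = -X * math.exp(-r * T)    # lend the present value of X
    net = sell_asset + buy_call + buy_bonds
    return sell_asset, buy_call, buy_bonds, net

def call_bound_terminal_value(S_T, X):
    """Value of the Table 1 portfolio on day T."""
    return -S_T + max(S_T - X, 0.0) + X

bound = european_call_lower_bound(75, 70, 0.05, 0.04, 3 / 12)
flows = call_bound_initial_flows(4.25, 75, 70, 0.05, 0.04, 3 / 12)
print(round(bound, 2))                   # 5.12
print([round(f, 2) for f in flows])      # [74.25, -4.25, -69.13, 0.87]
print(call_bound_terminal_value(60, 70), call_bound_terminal_value(80, 70))
```

A positive net initial flow together with a terminal value that is never negative is what marks the quoted call price as an arbitrage opportunity.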

Lower Price Bound of 
American-Style Call 

American-style options are like European-style 
options except that they can be exercised at any 
time up to and including the expiration day. 
Since this additional right cannot have a neg¬ 
ative value, the relation between the prices of 
American-style and European-style call options 
is 

$C \ge c$ (5)

where the uppercase C represents the price of 
an American-style call option with the same ex¬ 
ercise price and time to expiration and on the 
same underlying asset as the European-style 
call. The lower price bound of an American- 
style call option is 

$C \ge \max(0, Se^{-iT} - Xe^{-rT}, S - X)$ (6)

This is the same as the lower price bound of the 
European-style call (4), except that the term S — 
X is added within the maximum value operator 
on the right-hand side. The reason is, of course, 
that the American-style call cannot sell for 
less than its immediate early exercise proceeds, 
S — X. If C < S — X, a costless arbitrage profit 
of S - X - C can be earned by simultaneously
buying the call (and exercising it) and selling 
the asset. 

As an illustration, suppose a three-month 
American-style call option written on a stock 
index portfolio has an exercise price of 70 and 
a market price of 4.25. Suppose also the current 
index level is 75, the portfolio's dividend yield 
rate is 4%, and the risk-free rate of interest is 
5%. Is a costless arbitrage profit possible? 

To test for the possibility of a costless arbitrage 
profit, substitute the problem information into 
(6), that is, 

$4.25 < \max[0, 75e^{-0.04(3/12)} - 70e^{-0.05(3/12)}, 75 - 70] = \max(0, 5.12, 5) = 5.12$

At the current call price of 4.25, two types of ar¬ 
bitrage are possible. A costless arbitrage profit 
of 5.00 — 4.25 = 0.75 is possible simply by buy¬ 
ing the call, exercising it, and selling the asset. 
The amount of this arbitrage profit, however, is 
less than the arbitrage profit of at least 5.12 — 
4.25 = 0.87 that can be earned by buying the 
call, selling the asset, buying risk-free bonds, 
and holding the portfolio until the call's expi¬ 
ration, as was shown in the previous arbitrage 
table. Under this second alternative, you earn 
an immediate profit of 0.87, and have the po¬ 
tential of earning even more if the asset price is 
below 70 at the option's expiration. 
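The comparison of the two arbitrage alternatives can be checked with a short calculation. This is a sketch under the example's inputs; the function name and the returned tuple are illustrative choices rather than anything defined in the text.

```python
import math

def american_call_lower_bound(S, X, r, i, T):
    """Lower price bound (6), returned with its two nontrivial terms."""
    hold_term = S * math.exp(-i * T) - X * math.exp(-r * T)  # European-style term
    exercise_term = S - X                                    # immediate exercise proceeds
    return max(0.0, hold_term, exercise_term), hold_term, exercise_term

C = 4.25
bound, hold_term, exercise_term = american_call_lower_bound(75, 70, 0.05, 0.04, 3 / 12)

profit_exercise_now = exercise_term - C   # buy call, exercise, sell asset
profit_hold_to_expiry = hold_term - C     # Table 1 trades held until day T

print(round(bound, 2))                    # 5.12
print(round(profit_exercise_now, 2))      # 0.75
print(round(profit_hold_to_expiry, 2))    # 0.87
```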

Early Exercise of American-Style 
Call Options 

The structure of the lower price bound of the
American-style call (6) can be used to provide
important insight regarding the possibility of
early exercise. The second term within the
maximum value operator, $Se^{-iT} - Xe^{-rT}$, is the
minimum price at which the call can be sold in
the marketplace.⁶ The third term is the value of
the American-style call if it is exercised
immediately. If the value of the second term is
greater than the third term
(for a certain set of call options), the call's price
in the marketplace will always exceed its
exercise proceeds, so it will never be optimal to
exercise the call early.

To identify this set of calls, we must examine
the conditions under which the relation

$Se^{-iT} - Xe^{-rT} > S - X$

holds. The job is easier if we rearrange the relation
to read

$S(e^{-iT} - 1) > -X(1 - e^{-rT})$ (7)

Since the risk-free interest rate is positive, the 
expression of the right-hand side is negative. If 
the left-hand side is positive or zero, the call op¬ 
tion holder can always get more by selling his 
option in the marketplace than by exercising 
it; so early exercise will never be optimal and 
the value of the American-style call is equal to 
the value of the European-style call, C = c. This 
condition is met for calls whose underlying asset
has a negative or zero noninterest carry rate,
$i \le 0$.

The intuition for this result can be broken
down into two components: interest cost, r,
and noninterest benefit (i.e., i > 0) or cost (i.e.,
i < 0). With respect to interest cost, recognize
that exercising the call today requires that we
pay X today. If we defer exercise until the call's
expiration, on the other hand, we have the
opportunity to earn interest (i.e., our liability
is only the present value of the exercise cost,
$Xe^{-rT}$). So, holding other factors constant, we
always have an incentive to defer exercise.⁷ With
respect to the noninterest costs, recall that assets
with i < 0 are typically physical assets that
require storage. If we exercise a call written on
such an asset, we must take delivery, whereupon
we immediately begin to incur storage
costs. If we defer exercise, on the other hand,
and continue to hold the claim on the asset
rather than the asset itself, we avoid paying
storage costs. Thus, where i < 0, there are two
reasons not to exercise early. But even if storage
costs are zero (i.e., with i = 0), condition (7)
holds since the interest cost incentive remains.

For American-style call options on assets with 
i > 0 (e.g., stock index portfolio with a nonzero 
dividend yield and foreign currencies with a 
nonzero foreign interest rate), on the other 
hand, early exercise may be optimal. The intu¬ 
ition is that, while there remains the incentive 
to defer exercise and earn interest on the exer¬ 
cise price, deferring exercise means forfeiting 
the income on the underlying asset (e.g., the 
dividend yield on a stock index portfolio). The 
only way to capture this income is by exercis¬ 
ing the call and taking delivery of the asset. For 
American-style call options on assets with i > 
0, early exercise may be optimal and, therefore, 
C > c. 
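Condition (7) can be evaluated directly to see how the sign of i drives the result. The sketch below only compares the two terms of the lower bound (6) at a given asset price; it is not an optimal exercise rule. The function name and the trial values of S and i are assumptions chosen for illustration.

```python
import math

def condition_7_holds(S, X, r, i, T):
    """Condition (7): S*(exp(-i*T) - 1) > -X*(1 - exp(-r*T))."""
    lhs = S * (math.exp(-i * T) - 1.0)
    rhs = -X * (1.0 - math.exp(-r * T))
    return lhs > rhs

X, r, T = 70.0, 0.05, 3 / 12
for i in (-0.02, 0.0, 0.04):
    for S in (75.0, 100.0, 1000.0):
        print(f"i={i:+.2f} S={S:7.1f} holds={condition_7_holds(S, X, r, i, T)}")
```

With r > 0 and i at or below zero the condition holds at every asset price tried, while with i > 0 it fails once S is high enough, which is why early exercise cannot be ruled out for income-paying assets.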


Lower Price Bound of 
European-Style Put 

The lower price bound of a European-style put 
option is 

$p \ge \max(0, Xe^{-rT} - Se^{-iT})$ (8)

Again, the reason that the option price must
be greater than or equal to 0 is obvious: we do not
have to be paid to take on a privilege. The
reason the put price must be at least the bound,
$Xe^{-rT} - Se^{-iT}$, is given by the arbitrage trade
portfolio in Table 2. If we buy $e^{-iT}$ units of the
asset and a put, and sell $Xe^{-rT}$ in risk-free bonds,
the net terminal value of the portfolio is certain
to be greater than or equal to 0. If the asset price
is less than or equal to the exercise price at the
option's expiration (i.e., $S_T \le X$), we will exercise
the put, delivering the asset and receiving X
in cash. We will then use the exercise proceeds
X to cover our risk-free borrowing obligation.
In the event the asset price is greater than the
exercise price (i.e., $S_T > X$), we will let the
put expire worthless. We still need to cover our
risk-free borrowing, which we do by selling the
asset. After repaying our debt, we have $S_T - X$
remaining.

For example, a three-month European-style put 
option written on a stock index portfolio has 
an exercise price of 70 and a market price of 
8.80. Suppose also the current index level is 61, 
the portfolio's dividend yield rate is 4%, and 
the risk-free rate of interest is 5%. Is a costless 
arbitrage profit possible? 

To test for the possibility of a costless arbitrage 
profit, substitute the problem parameters into 
the lower price bound (8), 

$8.80 > \max[0, 70e^{-0.05(3/12)} - 61e^{-0.04(3/12)}] = 8.74$

At the current price of 8.80, the no-arbitrage 
condition (8) holds, so no costless arbitrage op¬ 
portunity exists. 
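The same check can be scripted for the put. This is a minimal sketch using the example's inputs; the helper names are illustrative, and the terminal-value function follows the trades described for Table 2 (buy the asset, buy the put, sell risk-free bonds).

```python
import math

def european_put_lower_bound(S, X, r, i, T):
    """Lower price bound (8): max(0, X*exp(-r*T) - S*exp(-i*T))."""
    return max(0.0, X * math.exp(-r * T) - S * math.exp(-i * T))

def put_bound_terminal_value(S_T, X):
    """Day-T value of long asset, long put, short bonds."""
    return S_T + max(X - S_T, 0.0) - X

p = 8.80
bound = european_put_lower_bound(61, 70, 0.05, 0.04, 3 / 12)
print(round(bound, 2))          # 8.74
print(p >= bound)               # True: no violation of (8)
print(put_bound_terminal_value(50, 70), put_bound_terminal_value(90, 70))
```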

Lower Price Bound for 
American-Style Put 

An American-style put has an early exer¬ 
cise privilege, which means that the rela¬ 
tion between the prices of American-style and 
European-style put options is 

$P \ge p$ (9)


Table 2 Arbitrage Portfolio Trades Supporting Lower Price Bound of European-Style Put
Option Where the Underlying Asset Has a Continuous Net Carry Rate, $p \ge Xe^{-rT} - Se^{-iT}$

| Trades | Initial Investment | Value on Day T, $S_T \le X$ | Value on Day T, $S_T > X$ |
|---|---|---|---|
| Buy asset | $-Se^{-iT}$ | $S_T$ | $S_T$ |
| Buy put option | $-p$ | $X - S_T$ | 0 |
| Sell risk-free bonds | $Xe^{-rT}$ | $-X$ | $-X$ |
| Net portfolio value | $Xe^{-rT} - Se^{-iT} - p$ | 0 | $S_T - X$ |
where uppercase P represents the price of an
American-style put option with the same ex¬ 
ercise price, time to expiration, and underlying 
asset as the European-style put. The lower price 
bound of an American-style put option is 

$P \ge \max(0, Xe^{-rT} - Se^{-iT}, X - S)$ (10)

This is the same as the lower price bound of 
the European-style put (8), except that, because 
the American-style put may be exercised at any 
time including now, the exercise proceeds, X — 
S, is added within the maximum value operator 
on the right-hand side. If P < X — S, a costless 
arbitrage profit of X — S — P can be earned by 
simultaneously buying the put (and exercising 
it) and buying the asset. 

To illustrate, assume that a three-month 
American-style put option written on a stock 
index portfolio has an exercise price of 70 and 
a market price of 8.80. Suppose also the current 
index level is 61, the portfolio's dividend yield 
rate is 4%, and the risk-free rate of interest is 
5%. Is a costless arbitrage profit possible? 

To test for the possibility of a costless arbitrage 
profit, substitute the problem information into 
(10), that is, 

$8.80 < \max[0, 70e^{-0.05(3/12)} - 61e^{-0.04(3/12)}, 70 - 61] = \max(0, 8.74, 9.00) = 9.00$

At the current price of 8.80, the no-arbitrage re¬ 
lation (10) is violated, indicating the presence 
of a costless arbitrage opportunity. Since it is 
the early exercise condition (third term) on the 
right-hand side that is violated, you should buy 
the put (and exercise it) and buy the index port¬ 
folio. You would pay 8.80 for the put and 61 for 
the index portfolio, and receive 70 when you 
deliver the index portfolio upon exercising the 
put. The amount of the arbitrage profit is 0.20 
and is earned immediately. 
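A short calculation confirms which term of (10) is violated and the size of the immediate profit. The sketch assumes the example's inputs; the function name is illustrative.

```python
import math

def american_put_lower_bound(S, X, r, i, T):
    """Lower price bound (10), returned with its two nontrivial terms."""
    hold_term = X * math.exp(-r * T) - S * math.exp(-i * T)  # European-style term
    exercise_term = X - S                                    # immediate exercise proceeds
    return max(0.0, hold_term, exercise_term), hold_term, exercise_term

P, S, X = 8.80, 61.0, 70.0
bound, hold_term, exercise_term = american_put_lower_bound(S, X, 0.05, 0.04, 3 / 12)

# Buy the put and the index portfolio, then exercise: receive X for the asset.
immediate_profit = X - S - P

print(round(hold_term, 2), round(exercise_term, 2), round(bound, 2))  # 8.74 9.0 9.0
print(P < hold_term, P < exercise_term)   # False True: only the third term is violated
print(round(immediate_profit, 2))         # 0.2
```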


Early Exercise of American-Style 
Put Options 

In the case of an American-style call, we found
that if the underlying asset had carry costs over
and above interest (e.g., storage), the call option
holder would never (rationally) exercise early.
In the case of an American-style put, no comparable
condition exists.⁸ There is always some
prospect of early exercise, so the American-style
put is always worth more than the European-style
put, that is, P > p. The intuition is
straightforward. Suppose, for whatever reason, 
the asset price falls to 0. The put option holder 
should exercise immediately. There is no chance 
that the asset price will fall further, so delaying 
exercise means forfeiting the interest income 
that can be earned on the exercise proceeds of 
the put, X. The interest-induced, early-exercise 
incentive works in exactly the opposite way for 
the put than it did for the call. For the put, we 
want to exercise early to get the cash and let it 
begin to earn interest. For the call, we want to 
defer exercise and let the cash continue to earn 
interest. 


Put-Call Parity for European-Style 
Options 

Perhaps the most important no-arbitrage price
relation for options is put-call parity.⁹ The
put-call parity price relation arises from the
simultaneous trades in the call, the put, and
the asset. Put-call parity for European-style
options is given by

$c - p = Se^{-iT} - Xe^{-rT}$ (11)

The composition of the put-call parity arbitrage
portfolio is given in Table 3. A portfolio that
consists of a long position of $e^{-iT}$ units of
the asset, a long put, a short call, and a short
position of $Xe^{-rT}$ in risk-free bonds is certain to
have a net terminal value of 0. If the terminal
asset price is less than or equal to the exercise
price of the options (i.e., $S_T \le X$), we exercise
the put and deliver the asset. The cash proceeds
Table 3 Arbitrage Portfolio Trades for European-Style Put-Call Parity Where the
Underlying Asset Has a Continuous Net Carry Rate, $c - p = Se^{-iT} - Xe^{-rT}$

| Trades | Initial Investment | Value at Time T, $S_T \le X$ | Value at Time T, $S_T > X$ |
|---|---|---|---|
| Buy asset | $-Se^{-iT}$ | $S_T$ | $S_T$ |
| Buy put option | $-p$ | $X - S_T$ | 0 |
| Sell call option | $c$ | 0 | $-(S_T - X)$ |
| Sell risk-free bonds | $Xe^{-rT}$ | $-X$ | $-X$ |
| Net portfolio value | $Xe^{-rT} - Se^{-iT} - p + c$ | 0 | 0 |


from exercise are used to repay our debt. The
call option is out-of-the-money, so the call
option holder will let it expire worthless. On
the other hand, if the terminal asset price
exceeds the exercise price (i.e., $S_T > X$), we will
let our put expire worthless. The call option
holder will exercise, requiring that we deliver
a unit of the asset, which we just happen to
have.¹⁰ The call option holder pays us X, which
we use to retire our risk-free borrowings. Since
the net terminal portfolio value is zero, the cost
of entering into such a portfolio today must
also be 0, otherwise costless arbitrage would
be possible. If the initial investment is 0, the
put-call parity relation (11) holds.

The set of arbitrage trades spelled out in 
Table 3 (i.e., buy the asset, buy the put, sell the 
call, and sell risk-free bonds) is called a conver¬ 
sion. If all of the trades are reversed (i.e., sell the 
asset, sell the put, buy the call, and buy risk-free 
bonds), it is called a reverse conversion. These 
names arise from the fact that we can create any 
position in the asset, options, or risk-free bonds 
by trading (or converting) the remaining secu¬ 


rities, in the same manner we used a call and a 
put to create a forward contract at the beginning 
of the entry. Table 4 provides a complete list of 
the conversions that are possible using the put- 
call parity relation for European-style options. 
The first row says that buying the asset, buying 
a put, and selling a call is equivalent to buying 
risk-free bonds. We can check this by creating 
an arbitrage trade table, or by simply working 
through it mentally. If the asset price is less than 
the exercise price at expiration, we will exercise 
our put and sell the asset. If the asset price is 
greater than the exercise price, the call option 
holder will exercise, requiring that we deliver 
the asset. In both cases, we are certain to have 
X in cash when all is said and done. This is the 
same as the amount we would have had if we 
bought risk-free bonds. 

Let's see how put-call parity is applied for 
European-style options. Suppose that a three- 
month call and put with an exercise price of 
70 have prices of 5.00 and 4.50, respectively. 
Suppose also that the current level of the index 


Table 4 Perfect Substitutes Implied by European-Style Put-Call Parity

| Position 1 | | Position 2 |
|---|---|---|
| Buy asset/buy put/sell call | = | Buy risk-free bonds (lend) |
| Buy asset/buy put/sell risk-free bonds | = | Buy call |
| Sell asset/buy call/buy risk-free bonds | = | Buy put |
| Sell put/buy call/buy risk-free bonds | = | Buy asset |
| Sell asset/sell put/buy call | = | Sell risk-free bonds (borrow) |
| Sell asset/sell put/buy risk-free bonds | = | Sell call |
| Buy asset/sell call/sell risk-free bonds | = | Sell put |
| Buy put/sell call/sell risk-free bonds | = | Sell asset |
portfolio underlying the options is 70, the index
portfolio has a dividend yield rate of 3%, and 
the risk-free rate of interest is 5%. Is a costless 
arbitrage profit possible? 

To test for the possibility of a costless arbitrage 
profit, substitute the problem parameters into 
the put-call parity relation (11), 

$5.00 - 4.50 = 0.50 > 70e^{-0.03(3/12)} - 70e^{-0.05(3/12)} = 0.35$

Since the equation does not hold, a costless
arbitrage profit is possible. Since the violation may
result from the call being overpriced, the
put being underpriced, or the asset being
underpriced, the arbitrage will require all three
trades: selling the call, buying the put, and
buying the asset. Using the trades as set out in
Table 3, we get:

| Trades | Initial Investment | Value at Time T, $S_T \le 70$ | Value at Time T, $S_T > 70$ |
|---|---|---|---|
| Buy asset | -69.48 | $S_T$ | $S_T$ |
| Buy put option | -4.50 | $70 - S_T$ | 0 |
| Sell call option | 5.00 | 0 | $-(S_T - 70)$ |
| Sell risk-free bonds | 69.13 | -70 | -70 |
| Net portfolio value | 0.15 | 0 | 0 |

By forming this portfolio, we generate a costless
arbitrage profit of 0.15. The buying pressure on
the index portfolio and the put will cause their
prices to rise, and the selling pressure on the
call will cause its price to fall. The arbitrage
trading will stop when the initial investment
column sums to zero (i.e., the costless arbitrage
opportunity ceases to exist), or where

$c - p = Se^{-iT} - Xe^{-rT}$.
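The conversion trades can be checked by computing the initial cash collected and confirming that the terminal value is zero in both states. This is a sketch using the example's inputs (c = 5.00, p = 4.50, S = X = 70, i = 3%, r = 5%, T = 3/12); the function names are illustrative.

```python
import math

def parity_value(S, X, r, i, T):
    """Right-hand side of (11): S*exp(-i*T) - X*exp(-r*T)."""
    return S * math.exp(-i * T) - X * math.exp(-r * T)

def conversion_initial_cash(c, p, S, X, r, i, T):
    """Net initial cash flow of the Table 3 trades (a conversion)."""
    buy_asset = -S * math.exp(-i * T)    # buy exp(-i*T) units of the asset
    buy_put = -p
    sell_call = c
    sell_bonds = X * math.exp(-r * T)    # borrow the present value of X
    return buy_asset + buy_put + sell_call + sell_bonds

def conversion_terminal_value(S_T, X):
    """Day-T value: long asset, long put, short call, short bonds."""
    return S_T + max(X - S_T, 0.0) - max(S_T - X, 0.0) - X

rhs = parity_value(70, 70, 0.05, 0.03, 3 / 12)
cash = conversion_initial_cash(5.00, 4.50, 70, 70, 0.05, 0.03, 3 / 12)
print(round(rhs, 4), round(cash, 4))
print([conversion_terminal_value(s, 70) for s in (50, 70, 90)])   # [0.0, 0.0, 0.0]
```

The initial cash equals (c - p) minus the parity value, so it is zero exactly when (11) holds.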

Put-Call Parity for American-Style 
Options 

The early exercise feature of American-style op¬ 
tions complicates the put-call parity relation. 
The nature of the relation depends on the level 
of noninterest costs/benefits, i. Specifically, the 
put-call parity relations are 

$S - X < C - P < Se^{-iT} - Xe^{-rT}$ if $i \le 0$ (12a)

and

$Se^{-iT} - X < C - P < S - Xe^{-rT}$ if $i > 0$ (12b)

Each inequality in (12a) and in (12b) has a sep¬ 
arate set of arbitrage trades. To illustrate, con¬ 
sider (12b), the case in which the asset pays 
some form of income, say, a stock index port¬ 
folio with a constant dividend yield rate, or a 
foreign currency with a constant foreign risk¬ 
free rate of interest. To establish the left-hand 
side inequality of (12b), consider the arbitrage 
portfolio trades in Table 5. To generate the table 
entries, assume the left-hand side inequality of 
(12b) is reversed. This means the asset is
overpriced, the put is overpriced, and/or the
call is underpriced. Thus, the arbitrage port¬ 
folio must account for all three possibilities. 
We should sell the asset, sell the put, buy the 
call, and buy some risk-free bonds. At the op¬ 
tions' expiration, the portfolio is certain to have 


Table 5 Arbitrage Portfolio Trades Supporting American-Style Put-Call Parity Where the Underlying Asset Has a
Continuous Net Carry Rate, $Se^{-iT} - X < C - P$

| Trades | Initial Investment | Early Exercise at t | Value on Day T, $S_T \le X$ | Value on Day T, $S_T > X$ |
|---|---|---|---|---|
| Sell asset | $Se^{-iT}$ | $-S_t e^{-i(T-t)}$ | $-S_T$ | $-S_T$ |
| Sell put option | $P$ | $-(X - S_t)$ | $-(X - S_T)$ | 0 |
| Buy call option | $-C$ | $C_t$ | 0 | $S_T - X$ |
| Buy risk-free bonds | $-X$ | $Xe^{rt}$ | $Xe^{rT}$ | $Xe^{rT}$ |
| Net portfolio value | $Se^{-iT} + P - C - X$ | $S_t[1 - e^{-i(T-t)}] + C_t + X(e^{rt} - 1)$ | $X(e^{rT} - 1)$ | $X(e^{rT} - 1)$ |
positive value $X(e^{rT} - 1)$. If $S_T \le X$, the put
option holder exercises, requiring that we pay X in
return for a unit of the underlying asset. We pay
the exercise price using a portion of our risk-free
bonds, and use the delivered asset to cover our
short position. On the other hand, if $S_T > X$, we
exercise the call and receive the asset. The asset
delivered on the call is used to cover the short
position. We use some of the risk-free bonds to
pay for the exercise price of the call.

The early exercise feature of the American-style
options requires that we consider one
other contingency within the arbitrage table,
that is, what happens if the put option holder
decides to exercise early at some arbitrary time
t between now and expiration. Looking at
Table 5, we see that our obligation should the
put be exercised early is $-(X - S_t)$. But since we
have $Xe^{rt}$ in risk-free bonds, we have more than
enough to cover the payment of X to the put
option holder. In return, we receive a unit of the
asset worth $S_t$, which is more than enough to
cover our short position in the asset, which has
value $-S_t e^{-i(T-t)}$.
In addition, we have a long position in the
call with value $C_t$. Because the net portfolio
value is positive at expiration and also in the
event the put is exercised early, the initial
investment must be negative (since if it were
zero or positive, there would be a certain
arbitrage profit). And, if $Se^{-iT} - X - C + P < 0$, then
$Se^{-iT} - X < C - P$.

To establish the right-hand side inequality of
(12b), consider the arbitrage portfolio trades
in Table 6. To generate the table entries, again
assume the right-hand side inequality of (12b)
is reversed. This means the asset is
underpriced, the put is underpriced, and/or the
call is overpriced. The arbitrage portfolio trades
must account for all possibilities. We should
buy the asset, buy the put, sell the call, and
sell some risk-free bonds. At the options'
expiration, the portfolio is certain to have positive
value $S_T(e^{iT} - 1)$. If $S_T \le X$, we exercise the put
and sell the asset. The long asset position has a
value $S_T e^{iT}$, which is more than enough to
deliver the unit of the asset owed on the put. The
cash received from exercising the put is used
to cover our risk-free bond obligation. On the
other hand, if $S_T > X$, the call option holder
exercises, implying that we receive X in return for
delivering one unit of the asset. We use the cash
received from the call option holder to retire the
risk-free bond position. The value of our asset
position, $S_T e^{iT}$, is more than we need to deliver
on the call.

The early exercise feature of the American-style
call must also be considered, that is, what
happens if the call option holder decides to
exercise early on day t? Looking at Table 6, we
see that the call exercise obligation is $-(S_t - X)$.
But, if we receive X, that is more than enough
to cover the balance of $-Xe^{-r(T-t)}$ in risk-free
bonds. We must deliver a unit of the asset worth
$S_t$, but we have more than one unit of the asset,
with value $S_t e^{it}$. In addition,
we have a long position in the put with
value $P_t$. Since the net portfolio value is positive
at expiration and in the event the call is
exercised early, the initial investment must be
negative. And, if $-S + Xe^{-rT} + C - P < 0$, then
$C - P < S - Xe^{-rT}$.
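The two-sided bounds on C - P can be wrapped in a small helper that also names the trades to use when a bound is breached. This is a sketch only: the function names, the quoted prices in the usage lines, and the choice to treat i = 0 together with negative i in the first case are assumptions made for illustration.

```python
import math

def american_parity_bounds(S, X, r, i, T):
    """Bounds on C - P from (12a) when i <= 0 and from (12b) when i > 0."""
    if i <= 0:
        lower = S - X
        upper = S * math.exp(-i * T) - X * math.exp(-r * T)
    else:
        lower = S * math.exp(-i * T) - X
        upper = S - X * math.exp(-r * T)
    return lower, upper

def parity_check(C, P, S, X, r, i, T):
    """Report whether C - P lies outside the bounds and which trades apply."""
    lower, upper = american_parity_bounds(S, X, r, i, T)
    spread = C - P
    if spread < lower:
        return "sell asset, sell put, buy call, buy bonds"
    if spread > upper:
        return "buy asset, buy put, sell call, sell bonds"
    return "no violation"

lower, upper = american_parity_bounds(70, 70, 0.05, 0.03, 3 / 12)
print(round(lower, 4), round(upper, 4))
print(parity_check(5.00, 4.50, 70, 70, 0.05, 0.03, 3 / 12))   # hypothetical quotes
print(parity_check(6.00, 4.50, 70, 70, 0.05, 0.03, 3 / 12))   # hypothetical quotes
```

The trade lists mirror the portfolios described for Tables 5 and 6; unlike the European case, a spread inside the band gives no arbitrage signal at all.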


Table 6 Arbitrage Portfolio Trades Supporting American-Style Put-Call Parity Where the Underlying Asset Has a
Continuous Net Carry Rate, $C - P < S - Xe^{-rT}$

| Trades | Initial Investment | Early Exercise at t | Value on Day T, $S_T \le X$ | Value on Day T, $S_T > X$ |
|---|---|---|---|---|
| Buy asset | $-S$ | $S_t e^{it}$ | $S_T e^{iT}$ | $S_T e^{iT}$ |
| Buy put option | $-P$ | $P_t$ | $X - S_T$ | 0 |
| Sell call option | $C$ | $-(S_t - X)$ | 0 | $-(S_T - X)$ |
| Sell risk-free bonds | $Xe^{-rT}$ | $-Xe^{-r(T-t)}$ | $-X$ | $-X$ |
| Net portfolio value | $-S - P + Xe^{-rT} + C$ | $S_t(e^{it} - 1) + P_t + X[1 - e^{-r(T-t)}]$ | $S_T(e^{iT} - 1)$ | $S_T(e^{iT} - 1)$ |
Table 7 No-Arbitrage Price Relations For European- and American-Style Options Where the Underlying Asset
Has a Continuous Net Carry Rate

| Description | European-Style Options | American-Style Options |
|---|---|---|
| Lower price bound for call | $c \ge \max(0, Se^{-iT} - Xe^{-rT})$ | $C \ge \max(0, Se^{-iT} - Xe^{-rT}, S - X)$ |
| Lower price bound for put | $p \ge \max(0, Xe^{-rT} - Se^{-iT})$ | $P \ge \max(0, Xe^{-rT} - Se^{-iT}, X - S)$ |
| Put-call parity relation | $c - p = Se^{-iT} - Xe^{-rT}$ | $S - X < C - P < Se^{-iT} - Xe^{-rT}$, if $i \le 0$; $Se^{-iT} - X < C - P < S - Xe^{-rT}$, if $i > 0$ |


Summary 

This completes the derivations of no¬ 
arbitrage price relations for European-style and 
American-style options on assets with a con¬ 
tinuous net carry rate. For convenience, a sum¬ 
mary of the no-arbitrage relations is provided 
in Table 7. 


DISCRETE FLOWS 

With the no-arbitrage price relations for an
underlying asset with a continuous carry cost rate
in hand, the focus now turns to developing the
same set of relations for an asset that has interest
cost modeled as a continuous rate but noninterest
costs/benefits modeled as a discrete flow.
If the noninterest flow is income such as in the
case of a cash dividend payment on a share of
stock or a coupon payment on a bond, the income
is represented as a positive value, that is,
$I_t > 0$. If the flow is a cost such as, say, warehouse
rent from storing an inventory of wheat, the income
is represented as a negative value, that is,
$I_t < 0$. Again, since this book deals primarily
with financial assets, most of the illustrations
will have $I_t$ discussed as being a positive value.
Although $I_t$ represents a cash payment on any
type of asset, we will call $I_t$ a dividend payment
throughout this section for expositional
convenience.

Lower Price Bound of 
European-Style Call 

The lower price bound of a European-style call 
option on an asset that makes a single, discrete 
cash dividend payment during the option's life 
is 

$c \ge \max(0, S - I_t e^{-rt} - Xe^{-rT})$ (13)

In this relation, $I_t e^{-rt}$ is the present value of
the promised dividend to be received at time
t, where t < T. The arbitrage trading strategy
that supports (13) is: sell the asset, buy a call,
and buy risk-free bonds. The initial investment
and terminal values are shown in Table 8. The
first row in the table represents the short asset
position. Today, we collect S, and, at the option's
expiration, the short position must be covered
at a cost of $S_T$. Shorting an asset, however,
requires that we pay any dividends on the underlying
asset. If we are short a stock and the stock
pays a dividend, for example, we are obliged to
pay the dividend out of our own pocket. Since
the dividend is paid during the option's life


Table 8 Arbitrage Portfolio Trades Supporting Lower Price Bound of European-Style Call
Option Where the Underlying Asset Pays a Discrete Cash Dividend, $c \ge S - I_t e^{-rt} - Xe^{-rT}$

| Trades | Initial Investment | Cash Flow at t | Value on Day T, $S_T < X$ | Value on Day T, $S_T \ge X$ |
|---|---|---|---|---|
| Sell asset | $S$ | $-I_t$ | $-S_T$ | $-S_T$ |
| Buy call option | $-c$ | | 0 | $S_T - X$ |
| Buy risk-free bonds | $-Xe^{-rT} - I_t e^{-rt}$ | $I_t$ | $X$ | $X$ |
| Net portfolio value | $S - I_t e^{-rt} - Xe^{-rT} - c$ | 0 | $X - S_T$ | 0 |
(i.e., t < T), the first row has a cash outflow of
$-I_t$ paid on day t. The second row shows the
long call position. On day T, the call is worth
nothing if $S_T < X$ and $S_T - X$ if $S_T \ge X$. Finally,
we buy some risk-free bonds. The amount
must be sufficient to cover the payment
of the exercise price, X, on day T and the payment
of the cash dividend, $I_t$, on day t, so the
initial investment in bonds is $-Xe^{-rT} - I_t e^{-rt}$.
Since the portfolio is certain to
have a nonnegative net value on day T, the net
portfolio value today must be less than or equal
to 0, which implies $c \ge S - I_t e^{-rt} - Xe^{-rT}$.
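The bound with a single discrete dividend is easy to evaluate once the dividend's present value is netted against the asset price. The sketch below uses hypothetical inputs (S = 50, X = 45, a dividend of 1.00 paid at t = 0.25, T = 0.5, r = 5%) that do not come from the text; the function names are also illustrative.

```python
import math

def european_call_lower_bound_discrete(S, X, r, I_t, t, T):
    """Lower price bound (13): max(0, S - I_t*exp(-r*t) - X*exp(-r*T))."""
    if not 0 <= t < T:
        raise ValueError("the dividend must be paid during the option's life")
    return max(0.0, S - I_t * math.exp(-r * t) - X * math.exp(-r * T))

def table8_terminal_value(S_T, X):
    """Day-T value of short asset, long call, and bonds maturing at X."""
    return -S_T + max(S_T - X, 0.0) + X

# Hypothetical inputs, chosen only to illustrate the calculation.
bound = european_call_lower_bound_discrete(S=50, X=45, r=0.05, I_t=1.0, t=0.25, T=0.5)
print(round(bound, 4))
print(table8_terminal_value(40, 45), table8_terminal_value(60, 45))
```

The dividend owed on the short position is funded by the bonds maturing on day t, which is why it lowers the bound by its present value.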

Lower Price Bound of 
American-Style Call 

A discrete cash dividend payment on the underlying
asset affects the early exercise behavior of
American-style call options differently than in
the continuous carry rate case. In the case of an
American-style call written on a stock, it may
be optimal to exercise either just prior to the
ex-dividend date (when the stock price falls by $I_t$)
or at expiration. Early exercise between today
and the ex-dividend instant and between the
ex-dividend instant and expiration are not optimal
because the call is worth more alive than
dead.¹¹ The lower price bound of an American-style
call is therefore the larger of the lower bound
of a call expiring at the ex-dividend instant,
$\max(0, S - Xe^{-rt})$, and the lower bound of the call
expiring at expiration, $\max(0, S - I_t e^{-rt} - Xe^{-rT})$.
Combining these two results,

$C \ge \max(0, S - Xe^{-rt}, S - I_t e^{-rt} - Xe^{-rT})$ (14)

Early Exercise of American-Style 
Call Options 

The last two terms on the right-hand side of (14) provide important guidance in deciding whether to exercise the American call option early, just prior to the ex-date. The second term in the parentheses is the present value of the early exercise proceeds of the call. If this amount is less than the lower price bound of the call that expires normally, that is, if

$$S - Xe^{-rt} < S - I_t e^{-rt} - Xe^{-rT}$$

an American-style call will not be exercised early. To understand why, rewrite the expression as

$$I_t < X[1 - e^{-r(T-t)}] \tag{15}$$

The American-style call will not be exercised early if the cash flow (e.g., dividend or coupon payment) captured by exercising prior to the ex-date is less than the interest implicitly earned by deferring exercise from the ex-date until expiration.
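The algebra behind the rewriting of the no-early-exercise inequality as (15) can be spelled out step by step. The lines below only restate that rearrangement, using the same symbols as the text.

```latex
\begin{align*}
S - Xe^{-rt} &< S - I_t e^{-rt} - Xe^{-rT} \\
I_t e^{-rt} &< Xe^{-rt} - Xe^{-rT} && \text{(cancel $S$ and rearrange)} \\
I_t &< X\left[1 - e^{-r(T-t)}\right] && \text{(multiply both sides by $e^{rt} > 0$)}
\end{align*}
```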

The logic underlying relation (15) also applies to the case where there are multiple known dividends paid during the call option's life. Take a stock option, for example. If the $i$th dividend is less than the present value of the interest income that can be implicitly earned by deferring the payment of the exercise price until the next dividend payment, that is, if

$$I_i < X[1 - e^{-r(t_{i+1} - t_i)}] \tag{16}$$

exercising just prior to the $i$th dividend payment will not be optimal. This relation proves useful for simplifying the valuation of long-term stock options. The following example shows that dividend-induced early exercise on a long-term American-style call is most likely to occur just prior to the last dividend payment during the option's life.

Let's identify whether an American-style call option with an exercise price of 50 and one year remaining to expiration may be exercised early just prior to any of the dividend payments. Assume that the stock pays a quarterly dividend of 0.50 in 70 days, 161 days, 252 days, and 343 days. Assume the risk-free rate of interest is 5%.

Whether or not the call may be exercised early depends on the amount of the dividend payment in relation to the present value of the interest income implicitly received by deferring the payment of the exercise price. For the first dividend, compute the values in expression (16) and find

$$0.50 < 50[1 - e^{-0.05(161/365 - 70/365)}] = 0.6194$$

Hence, the call will not optimally be exercised just prior to the first dividend payment. The same is true for the second and third dividend payments, as shown in the table below.


Quarter   Cash Dividend   Days to Dividend Payment   Years to Dividend Payment   PV of Interest Income
1         0.50            70                         0.1918                      0.6194
2         0.50            161                        0.4411                      0.6194
3         0.50            252                        0.6904                      0.6194
4         0.50            343                        0.9397                      0.1505


For the last dividend payment in 343 days, the next relevant date is expiration (day 365), and condition (16) is violated, that is,

$$0.50 > 50[1 - e^{-0.05(365 - 343)/365}] = 0.1505$$

This implies that exercise just prior to the last dividend payment during this option's life may be optimal.
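The four comparisons in this example can be reproduced with a short script. This is a sketch under the example's own inputs (exercise price 50, rate 5%, dividends of 0.50 on days 70, 161, 252, and 343, a 365-day year); the function and variable names are hypothetical, and treating expiration as the end of the last interval is the assumption used in the text's final computation.

```python
import math

def pv_interest_income(X, r, days_now, days_next, year=365):
    """Right-hand side of condition (16): X * (1 - exp(-r * (t_next - t_now)))."""
    dt = (days_next - days_now) / year
    return X * (1.0 - math.exp(-r * dt))

X, r, dividend = 50.0, 0.05, 0.50
dividend_days = [70, 161, 252, 343]
expiration_day = 365  # assumption: the last interval runs to expiration

rows = []
for i, day in enumerate(dividend_days):
    # The deferral period ends at the next dividend, or at expiration for the last one.
    next_day = dividend_days[i + 1] if i + 1 < len(dividend_days) else expiration_day
    pv = pv_interest_income(X, r, day, next_day)
    may_exercise_early = dividend > pv  # True when condition (16) is violated
    rows.append((i + 1, day, round(pv, 4), may_exercise_early))

for row in rows:
    print(row)
```

Only the fourth row flags possible early exercise, which matches the conclusion that the last dividend is the one most likely to trigger it.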

Lower Price Bound of 
European-Style Put 

The lower price bound for the European-style put option is

$$p \ge \max(0,\; Xe^{-rT} - S + I_t e^{-rt}) \tag{17}$$

Again, the asset price is reduced by the present value of the promised cash dividend on the asset. Unlike the call, however, the dividend payment increases the lower price bound of the European-style put. Because the put option is the right to sell the underlying asset at a fixed price, a discrete drop in the asset price, such as one induced by the payment of a dividend on a stock, serves to increase the value of the option. The arbitrage trades driving this relation are: buy a put, buy a share of stock, and sell $I_t e^{-rt} + Xe^{-rT}$ risk-free bonds.


Lower Price Bound of 
American-Style Put 

The lower price bound of the American-style put is

$$P \ge \max(0,\; Xe^{-rt} - S + I_t e^{-rt},\; X - S) \tag{18}$$

The second term on the right-hand side is the present value of the exercise proceeds if the put is exercised just after the dividend payment. This lower price bound is supported by the arbitrage trades listed above for the European-style put. The third term on the right is the exercise proceeds if the put is exercised immediately. If $P < X - S$, a costless arbitrage profit can be earned by buying the put and the asset, and then exercising the put. The arbitrage profit is $X - S - P > 0$.


Early Exercise of American-Style 
Put Options 

The early exercise behavior induced by the discrete cash dividend on the asset is different for the American-style put than it was for the call. If the second term exceeds the third in (18), the put will not be exercised early prior to the payment date. In that period the interest earned on the exercise proceeds of the option is less than the drop in the stock price from the payment of the dividend. For the second term to be larger than the third, that is,

$$Xe^{-rt} - S + I_t e^{-rt} > X - S$$

it must be the case that

$$I_t > X(e^{rt} - 1) \tag{19}$$

In other words, if the dividend exceeds the interest income that will accrue on the cash received if the put is exercised immediately, the put will not optimally be exercised early.
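For readers who want the intermediate steps, condition (19) follows from comparing the deferred-exercise term with the immediate-exercise term of the American put bound. The lines below only rearrange that comparison, with the same symbols as the text.

```latex
\begin{align*}
Xe^{-rt} - S + I_t e^{-rt} &> X - S \\
(X + I_t)\,e^{-rt} &> X && \text{(cancel $-S$ and collect terms)} \\
X + I_t &> Xe^{rt} && \text{(multiply both sides by $e^{rt} > 0$)} \\
I_t &> X\left(e^{rt} - 1\right)
\end{align*}
```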

As in the case of the call, this argument can be generalized to handle multiple dividends during the life of an American-style put. Again, consider a stock option. If the $i$th dividend is greater than the interest that will accrue over the period,

$$I_i > X[e^{r(t_i - t_{i-1})} - 1] \tag{20}$$

the put will not be exercised before the dividend payment, as the illustration below shows.

We'll use an example to identify whether an American-style put option with an exercise price of 50 and one year remaining to expiration may be exercised early just after any of the dividend payments. Assume that the stock pays a quarterly dividend of 0.50 in 70 days, 161 days, 252 days, and 343 days. Assume the risk-free rate of interest is 5%.

Whether or not the put may be exercised early depends on the amount of the dividend payment in relation to the interest income that could be earned if the put were exercised immediately. For the first dividend, compute the values in expression (20), that is,

$$0.50 > 50[e^{0.05(70/365)} - 1] = 0.4818$$

This implies that the put will not be exercised before the first dividend payment in 70 days. The computation for the second dividend is

$$0.50 < 50[e^{0.05(161/365 - 70/365)} - 1] = 0.6272$$

Here condition (20) fails, which implies that the put may be exercised in the period between the first and second dividends. The same is true between the second and third dividends, and the third and fourth dividends, as indicated below. Early exercise after the fourth dividend is paid may also be optimal since no more dividends are paid during the option's life.


Quarter   Cash Dividend   Days to Dividend Payment   Years to Dividend Payment   Accrued Interest
1         0.50            70                         0.1918                      0.4818
2         0.50            161                        0.4411                      0.6272
3         0.50            252                        0.6904                      0.6272
4         0.50            343                        0.9397                      0.6272
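The accrued-interest column can be recomputed with a few lines of code. This sketch uses the example's inputs (exercise price 50, rate 5%, dividends of 0.50 on days 70, 161, 252, and 343, a 365-day year); the names are hypothetical, and starting the first interval at day 0 is the assumption implied by the first computation in the text.

```python
import math

def accrued_interest(X, r, days_prev, days_now, year=365):
    """Right-hand side of condition (20): X * (exp(r * (t_i - t_prev)) - 1)."""
    dt = (days_now - days_prev) / year
    return X * (math.exp(r * dt) - 1.0)

X, r, dividend = 50.0, 0.05, 0.50
dividend_days = [70, 161, 252, 343]

results = []
previous_day = 0  # assumption: the first interval starts today
for quarter, day in enumerate(dividend_days, start=1):
    interest = accrued_interest(X, r, previous_day, day)
    no_early_exercise = dividend > interest  # True when condition (20) holds
    results.append((quarter, round(interest, 4), no_early_exercise))
    previous_day = day

for result in results:
    print(result)
```

Only the first interval satisfies condition (20), so early exercise is ruled out before the first dividend but not in the later intervals.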


Put-Call Parity for European-Style 
Options 

Put-call parity for European-style options on assets with discrete noninterest cash flows is

$$c - p = S - I_t e^{-rt} - Xe^{-rT} \tag{21}$$

To see this, assume the left-hand side of (21) is less than the right-hand side. If such is the case, an arbitrage profit can be made by selling the asset, selling the put, buying the call, and buying some risk-free bonds. The arbitrage is shown in Table 9. On day $T$, the net portfolio value is certain to be 0. The same is true on day $t$, when the cash dividend is paid. Thus the value at time 0, $S - I_t e^{-rt} - Xe^{-rT} + p - c$, represents the arbitrage profit and is positive if the left-hand side of (21) is less than the right-hand side. Since the market cannot be in equilibrium while such a profit is available, arbitrage will continue until the net portfolio value goes to 0. When it does, the market is in equilibrium and (21) holds.
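The parity relation (21) can be checked numerically by computing the time-0 value of the arbitrage portfolio. The following is an illustrative sketch only: the function names and all input values (asset price 52, exercise price 50, call price 4.20, and so on) are hypothetical and not taken from the entry.

```python
import math

def parity_gap(c, p, S, X, r, T, dividend, t_div):
    """Time-0 portfolio value S - I_t*exp(-r*t) - X*exp(-r*T) + p - c."""
    pv_dividend = dividend * math.exp(-r * t_div)
    pv_strike = X * math.exp(-r * T)
    return S - pv_dividend - pv_strike + p - c

def put_from_call(c, S, X, r, T, dividend, t_div):
    """Put price implied by parity (21) for a given call price."""
    return c - S + dividend * math.exp(-r * t_div) + X * math.exp(-r * T)

# Hypothetical inputs, chosen only for illustration.
inputs = dict(S=52.0, X=50.0, r=0.05, T=0.5, dividend=0.5, t_div=0.25)
call_price = 4.20
fair_put = put_from_call(call_price, **inputs)
gap_at_fair_put = parity_gap(call_price, fair_put, **inputs)
gap_if_put_rich = parity_gap(call_price, fair_put + 0.10, **inputs)
print(round(fair_put, 4), round(gap_at_fair_put, 4), round(gap_if_put_rich, 4))
```

When the put is quoted 0.10 above its parity value, the gap equals that 0.10, which is the arbitrage profit collected at time 0 by the trades described above.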


Table 9  Arbitrage Portfolio Trades Supporting European-Style Put-Call Parity Where the Underlying Asset Pays a Discrete Cash Dividend, $c - p = S - I_t e^{-rt} - Xe^{-rT}$

Trades                Initial Investment                      Cash Flow at t   Value on Day T
                                                                               $S_T < X$       $S_T > X$
Sell asset            $S$                                     $-I_t$           $-S_T$          $-S_T$
Sell put option       $p$                                                      $-(X - S_T)$    $0$
Buy call option       $-c$                                                     $0$             $S_T - X$
Buy risk-free bonds   $-Xe^{-rT} - I_t e^{-rt}$               $I_t$            $X$             $X$
Net portfolio value   $S - I_t e^{-rt} - Xe^{-rT} + p - c$    $0$              $0$             $0$












Table 10  Arbitrage Trades Supporting American-Style Put-Call Parity Where the Underlying Asset Pays a Discrete Cash Dividend, $S - I_t e^{-rt} - X \le C - P$

                                                                             Put Exercised Early,          Put Exercised Normally,
                                                                             Intermediate                  Terminal Value ($T$)
Trades                Initial Value                    Ex-Day Value ($t$)    Value ($\tau$)                $S_T < X$          $S_T > X$
Buy call              $-C$                                                   $C_\tau$                      $0$                $S_T - X$
Sell put              $P$                                                    $-(X - S_\tau)$               $-(X - S_T)$       $0$
Sell asset            $S$                              $-I_t$                $-S_\tau$                     $-S_T$             $-S_T$
Buy risk-free bonds   $-I_t e^{-rt} - X$               $I_t$                 $Xe^{r\tau}$                  $Xe^{rT}$          $Xe^{rT}$
Net portfolio value   $-C + P + S - I_t e^{-rt} - X$   $0$                   $C_\tau + X(e^{r\tau} - 1)$   $X(e^{rT} - 1)$    $X(e^{rT} - 1)$


Put-Call Parity for American-Style 
Options 

The put-call parity for American-style options on assets with discrete cash dividends is

$$S - I_t e^{-rt} - X \le C - P \le S - I_t e^{-rt} - Xe^{-rT} \tag{22}$$

To understand why, we consider each inequality in (22) in turn. The inequality on the left can be derived by considering the values of a portfolio that consists of buying a call, selling a put, selling the stock, and buying $X + I_t e^{-rt}$ in risk-free bonds. Table 10 contains these trades as well as the net portfolio value.

In Table 10, we see that, if all positions stay open until expiration, the net portfolio value is positive independent of whether the terminal asset price is above or below the exercise price of the options. If the terminal asset price is above the exercise price, the call option is exercised, and the asset acquired at exercise price $X$ is used to deliver, in part, against the short asset position. If the terminal asset price is below the exercise price, the put is exercised. The asset received in the exercise of the put is used to cover the short stock position. In the event the put is exercised early at time $\tau$, the investment in the risk-free bonds is more than sufficient to cover the payment of the exercise price, and the asset received upon delivery can be used to cover the short asset position. In addition, the call position remains open and has a nonnegative value. In other words, the combination of securities described in Table 10 will never have a negative future value. And, if the future value is certain to be nonnegative, the sum of the initial investment column must be nonpositive. In the absence of costless arbitrage opportunities, the left-hand inequality of (22) must hold.

Table 11  Arbitrage Trades Supporting American-Style Put-Call Parity Where the Underlying Asset Pays a Discrete Cash Dividend, $C - P \le S - I_t e^{-rt} - Xe^{-rT}$

                                                                                  Call Exercised Early,              Call Exercised Normally,
                                                                                  Intermediate                       Terminal Value ($T$)
Trades                Initial Value                           Ex-Day Value ($t$)  Value ($\tau$)                     $S_T < X$     $S_T > X$
Sell call             $C$                                                         $-(S_\tau - X)$                    $0$           $-(S_T - X)$
Buy put               $-P$                                                        $P_\tau$                           $X - S_T$     $0$
Buy stock             $-S$                                    $I_t$               $S_\tau$                           $S_T$         $S_T$
Sell risk-free bonds  $I_t e^{-rt} + Xe^{-rT}$                $-I_t$              $-Xe^{-r(T-\tau)}$                 $-X$          $-X$
Net portfolio value   $C - P - S + I_t e^{-rt} + Xe^{-rT}$    $0$                 $P_\tau + X(1 - e^{-r(T-\tau)})$   $0$           $0$

Table 12  No-Arbitrage Price Relations for European- and American-Style Options Where the Underlying Asset Pays a Discrete Cash Dividend

Description                  European-Style Options                         American-Style Options
Lower price bound for call   $c \ge \max(0, S - I_t e^{-rt} - Xe^{-rT})$    $C \ge \max(0, S - Xe^{-rt}, S - I_t e^{-rt} - Xe^{-rT})$
Lower price bound for put    $p \ge \max(0, Xe^{-rT} - S + I_t e^{-rt})$    $P \ge \max(0, Xe^{-rt} - S + I_t e^{-rt}, X - S)$
Put-call parity relation     $c - p = S - I_t e^{-rt} - Xe^{-rT}$           $S - I_t e^{-rt} - X \le C - P \le S - I_t e^{-rt} - Xe^{-rT}$

The right inequality of (22) may be derived using the same portfolio used to prove European-style put-call parity. Table 11 contains the arbitrage portfolio trades. In this case, the net portfolio value at expiration is certain to be 0 should the option positions stay open until that time. In the event the American call option holder decides to exercise early, the portfolio holder delivers his share of stock, receives cash in the amount of the exercise price, and then uses the cash to retire his outstanding debt. After these actions are taken, the portfolio holder still has an open long put position and cash in the amount of $X[1 - e^{-r(T-\tau)}]$. Since the portfolio is certain to have nonnegative outcomes, the initial value must be nonpositive; that is, the right-hand inequality of (22) must hold.
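The two-sided relation (22) lends itself to a simple screening routine. The sketch below is illustrative only: the function names and all numerical inputs are hypothetical, and the routine merely reports which side of (22), if any, a quoted pair of American option prices violates.

```python
import math

def american_parity_bounds(S, X, r, T, dividend, t_div):
    """Lower and upper bounds on C - P from relation (22)."""
    pv_dividend = dividend * math.exp(-r * t_div)
    lower = S - pv_dividend - X
    upper = S - pv_dividend - X * math.exp(-r * T)
    return lower, upper

def violates_parity(C, P, S, X, r, T, dividend, t_div):
    """Return 'left', 'right', or None depending on which inequality in (22) fails."""
    lower, upper = american_parity_bounds(S, X, r, T, dividend, t_div)
    spread = C - P
    if spread < lower:
        return "left"   # Table 10 trades apply: buy call, sell put, sell asset, buy bonds
    if spread > upper:
        return "right"  # Table 11 trades apply: sell call, buy put, buy stock, sell bonds
    return None

# Hypothetical market data, for illustration only.
market = dict(S=52.0, X=50.0, r=0.05, T=0.5, dividend=0.5, t_div=0.25)
lower, upper = american_parity_bounds(**market)
print(round(lower, 4), round(upper, 4))
print(violates_parity(C=4.0, P=2.0, **market))
```

Because the interest on the exercise price separates the two bounds, American option prices are only confined to a band rather than pinned to a single value as in the European case.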

Summary 

This completes our derivations of arbitrage relations for European-style and American-style options on assets with discrete cash dividends. Options on dividend-paying stocks and on coupon-bearing bonds fall into this category. Before proceeding with a discussion of arbitrage relations for futures options, we summarize our results in Table 12.

NO-ARBITRAGE FUTURES 
OPTIONS RELATIONS 

A futures option is like an asset option, except that if the option is exercised, a futures position is entered. Exercising a call option on a futures contract, for example, means that the holder will receive a long position in the futures at a price equal to the exercise price of the call.

Developing the lower bounds and put-call parity for European- and American-style futures options follows directly from the previous discussions. All we need to do is recall the prepaid version of the net cost of carry relations for futures: $Fe^{-rT} = Se^{-iT}$ where noninterest costs are modeled as a continuous rate, and $Fe^{-rT} = S - Ie^{-rt}$ where noninterest costs are modeled as a discrete flow. Substituting $Fe^{-rT} = Se^{-iT}$ into the no-arbitrage price relations summarized in Table 7 or $Fe^{-rT} = S - Ie^{-rt}$ in the relations summarized in Table 12 produces the no-arbitrage price relations for futures options summarized in Table 13. The arbitrage portfolios supporting each of these relations are the same as those used to derive the relations for the asset throughout the entry.
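As a worked illustration of the substitution for the European-style relations in the discrete-flow case, replace $S - I_t e^{-rt}$ by $Fe^{-rT}$ wherever it appears (writing the dividend as $I_t$, as in the earlier bounds):

```latex
\begin{align*}
c &\ge \max\left(0,\; S - I_t e^{-rt} - Xe^{-rT}\right)
   = \max\left(0,\; Fe^{-rT} - Xe^{-rT}\right)
   = \max\left[0,\; e^{-rT}(F - X)\right] \\
p &\ge \max\left(0,\; Xe^{-rT} - S + I_t e^{-rt}\right)
   = \max\left[0,\; e^{-rT}(X - F)\right] \\
c - p &= S - I_t e^{-rt} - Xe^{-rT} = e^{-rT}(F - X)
\end{align*}
```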

Table 13  No-Arbitrage Price Relations for European- and American-Style Options on Futures Contracts

Description                  European-Style Options               American-Style Options
Lower price bound for call   $c \ge \max[0, e^{-rT}(F - X)]$      $C \ge \max(0, F - X)$
Lower price bound for put    $p \ge \max[0, e^{-rT}(X - F)]$      $P \ge \max(0, X - F)$
Put-call parity relation     $c - p = e^{-rT}(F - X)$             $Fe^{-rT} - X \le C - P \le F - Xe^{-rT}$

NO-ARBITRAGE INTERMARKET RELATIONS

In many cases, both asset options and futures options trade concurrently. The Chicago Board Options Exchange, for example, lists options on the S&P 500 index, while the Chicago Mercantile Exchange lists options on the S&P 500 futures (which, in turn, is written on the S&P 500 index). The prices of asset options are inextricably linked to the prices of futures options. Under the assumption that the futures and options expire simultaneously and that the exercise prices of the asset and futures options are the same, a number of no-arbitrage price relations may be derived. Next we present such relations for European-style and American-style options.

European-Style Options 

The price of a European-style asset option is equal to the price of the corresponding futures option, that is,

$$c(S) = c(F) \tag{23a}$$

$$p(S) = p(F) \tag{23b}$$

The reason is that at expiration the payoffs of the asset option and the futures option are identical. Suppose, for the sake of illustration, that the price of a call on a futures exceeds the price of a call on an asset. In such a situation, costless arbitrage profits may be earned by buying the asset call and selling the futures call, as is shown in Table 14. The long asset option position pays nothing at expiration if the terminal asset price is less than the exercise price and pays $S_T - X$ if the terminal asset price exceeds the exercise price. At the same time, the short futures option position expires worthless at expiration if the terminal futures (asset) price is less than the exercise price and is worth $-(F_T - X)$ if the terminal futures (asset) price exceeds the exercise price. But, since $F_T = S_T$, the net portfolio value is certain to be zero. A portfolio that is certain to pay nothing on day $T$ must cost nothing. Hence, in the absence of costless arbitrage opportunities, European-style asset options and European-style futures options have the same price.
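The payoff argument can be illustrated with a few terminal prices. This is a small sketch with hypothetical names and an arbitrary exercise price of 50; it relies on the convergence assumption stated in the text, namely that the futures price equals the asset price at expiration.

```python
def asset_call_payoff(S_T, X):
    """Expiration payoff of a long call on the asset."""
    return max(0.0, S_T - X)

def futures_call_payoff(F_T, X):
    """Expiration payoff of a long call on the futures."""
    return max(0.0, F_T - X)

X = 50.0  # hypothetical exercise price
terminal_prices = [30.0, 45.0, 50.0, 55.0, 80.0]

# Long asset call, short futures call, with F_T = S_T at expiration.
net_values = [asset_call_payoff(s, X) - futures_call_payoff(s, X) for s in terminal_prices]
print(net_values)
```

Since the net terminal value is zero in every scenario, any nonzero initial price difference between the two calls would be a riskless profit.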

American-Style Options 

The relation between the price of an American-style asset option and the price of the corresponding futures option depends on whether the futures price is greater than the asset price or not. If $F > S$,

$$C(S) < C(F) \tag{24a}$$

$$P(S) > P(F) \tag{24b}$$

To see this, consider the American-style call options. Since both the call on the futures and the call on the asset may be exercised early, we can compare the early exercise proceeds to establish which has greater value. The call on the asset has immediate early exercise proceeds of $S - X$ and the call on the futures has early exercise proceeds of $F - X > S - X$. Thus as long as there is some chance of early exercise, the call on the futures is worth more than the call on the asset and the put on the asset is worth more than the put on the futures.

For cases where the futures price is less than the asset price, the opposite results hold, that is,

$$C(S) > C(F) \tag{25a}$$

$$P(S) < P(F) \tag{25b}$$

The previous arbitrage argument is merely reversed. Table 15 summarizes the results.

Table 14  Arbitrage Portfolio Trades Demonstrating the Equivalence of Prices of European-Style Call Options on an Asset and a Futures, $c(F) = c(S)$

                                                   Value on Day T
Trades                        Initial Investment   $S_T < X$   $S_T > X$
Buy call option on asset      $-c(S)$              $0$         $S_T - X$
Sell call option on futures   $c(F)$               $0$         $-(F_T - X) = -(S_T - X)$
Net portfolio value           $c(F) - c(S)$        $0$         $0$

Table 15  No-Arbitrage Relations Between the Prices of Asset Options and Futures Options

Description   European-Style Options   American-Style Options
Call          $c(S) = c(F)$            $C(S) < C(F)$ if $F > S$; $C(S) > C(F)$ if $F < S$
Put           $p(S) = p(F)$            $P(S) > P(F)$ if $F > S$; $P(S) < P(F)$ if $F < S$


KEY POINTS 

• Under the assumption that no costless arbitrage (i.e., free money) opportunities are available in the marketplace, no-arbitrage price relations for European- and American-style options can be developed.

• The net cost of carry of the underlying asset plays an important role. Consequently, it is necessary to model interest cost as a constant continuous rate and the noninterest cost as a continuous rate or as a discrete flow, depending on the nature of the underlying asset. For options on stock indexes, currencies, and some commodities, the continuous rate assumption is most appropriate. For options on stocks, bonds, and other commodities, the discrete flow assumption is preferred.

• With the assumptions regarding net cost of carry, lower price bounds, put-call parity price relations, and intermarket price relations can be derived for both European-style and American-style options on an asset and on a forward/futures.

• For American-style options, there is always the prospect of early exercise. Under certain circumstances regarding the cost of carry, the holder of an American-style call option would never (rationally) exercise early. In the case of an American-style put, there is always some prospect of early exercise, so the American-style put is always worth more than the European-style put.

• Perhaps most important is the no-arbitrage price relation between the price of a put and the price of a call. This relation, called the put-call parity relation, arises from simultaneous trades in the call, the put, and the asset.

• With respect to intermarket price relations, the prices of asset options are inextricably linked to the prices of futures options. Under the assumption that the futures and options expire simultaneously and that the exercise prices of the asset and futures options are the same, a number of no-arbitrage price relations may be derived.

NOTES 

1. European-style options can be exercised only on expiration day, while American-style options can be exercised at any time up to and including the expiration day. Both types of options are traded on exchanges and in OTC markets.

2. Under the continuous cost of carry rate assumption, the continuously paid income received from holding the asset is immediately reinvested in more units of the asset, so that $e^{-iT}$ units on day 0 grow to one unit on day $T$. For a short asset position, the reverse applies in the sense that our liability (in terms of number of units owed) grows at rate $i$.

3. It is also worthwhile to note that the lower price bound of the call can be re-expressed relative to the forward/futures prices. The net cost of carry relation for forward/futures prices is $fe^{-rT} = Se^{-iT}$. Substituting the cost of carry relation into (4), $c \ge \max(0, fe^{-rT} - Xe^{-rT})$.

4. The distinction between value and price is subtle, but important. A price is what we observe for the security in the marketplace; a value is what we believe a security is worth. If the value exceeds the price, the security is underpriced, and, if the value is less than the price, the security is overpriced.

5. Note that we are not making any judgment on whether the call price is too high or too low per se. We are saying only that the call is incorrectly priced (in this case it is priced too low) relative to the price of the underlying asset. To execute the arbitrage, we must trade both the call and the underlying asset, so that we make money when their prices come back into line relative to each other. In this example, the prices come back into line with each other for certain at the option's expiration.

6. To exit a long position in an American-style call option, we have three alternatives. First, we can hold it to expiration, at which time we will (a) let it expire worthless if it is out of the money or (b) exercise it if it is in the money. Second, we can exercise it immediately, receiving the difference between the current asset price and the exercise price. Third, we can sell it in the marketplace. There is, after all, an active secondary market for standard calls and puts.

7. This point was first demonstrated by Merton (1973) for call options on nondividend-paying stocks. He refers to such options as being worth more "alive" than "dead."

8. In the expression on the right-hand side of (10), the third term is greater than the second term over some range for $S$, independent of the level of $i$.

9. The term "put-call parity" was first coined by Stoll (1969) in the first academic study to develop and test the relation.

10. If we buy a put option, we pay the premium today for the right to sell the underlying asset at the exercise price. If we sell the put, we collect the premium today but have the obligation to deliver the asset and receive the exercise price if the put option buyer chooses to exercise.

11. By not exercising in the period prior to ex-dividend, the call option holder enjoys the benefits of implicitly earning interest on the dividend and the exercise price of the call. By not exercising after the ex-dividend date but before expiration, the call option holder enjoys the benefit of implicitly earning interest on the exercise price of the call.

Abstract: Contingent claims are a tool for valuing securities and for analyzing the effects of risky financial decisions. Contingent claims analysis can be used to value any kind of financial instrument, including such apparently exotic instruments as put and call options and convertible securities. Contingent claim analysis defines risky outcomes relative to states of the world, and uses claims to represent and value state outcomes. Thus given a definition of risky states, all financial instruments and arrangements can be represented as combinations of contingent claims on those states. Theoretically complete markets assume claims can be traded on every state of the world, but in practice markets are not likely to be complete at any point in time. Since in practice market incompleteness will inhibit certain risk management strategies, in so doing it also provides incentives to create new instruments that can be used to manage and to value claims on additional states of the world.


Contingent claims analysis is used in financial modeling to value any financial instrument, including such apparently exotic instruments as put and call options and convertible securities. In this entry, we discuss this important tool. We begin by explaining the notion of states of the world, a way of classifying risky outcomes whose value can then be represented using contingent claims. After providing examples of valuation using contingent claims, we introduce the concept of incomplete markets and consider its importance for modeling real-world financial arrangements. We then examine some financial instruments and arrangements that can be used to trade or to manage risks.


STATES OF THE WORLD 

The idea of states of the world is useful for thinking about convenient ways to model risky payoffs. In a two-time-point model, states of the world are defined as those future events that matter to the decision problem being considered. These states of the world are defined by the decision maker to be mutually exclusive and collectively exhaustive. Using an example given by Savage (1951), if one is about to break a ninth egg into a bowl already containing eight other eggs, the relevant states of the world could be whether the ninth egg is rotten and would hence spoil the others. (Here we presume the rottenness of an egg is not discernible until the egg has been broken and fallen into the bowl.)

In a second example more closely related to finance, an investor might be concerned with the future price of a share of stock, and this price might in turn depend on economic conditions. Suppose the investor defines (1) "states" to represent economic conditions, and (2) "future prices" to be the following list of possible share prices that may obtain at the time a given state is actually realized:

State Future Price 

1 $10 

2 $8 

3 $6 

For example, state 1 might mean that the industry in which the firm operates faces buoyant market conditions; state 2, conditions that are neither good nor bad; and state 3, conditions that are depressed. In each state, the effect is registered on the stock price.

We shall usually associate probabilities with the states; for example, $p_i$ might represent the probability that state $i$ will actually occur, for $i = 1, 2, 3$. Because the states are mutually exclusive, only one can actually occur; because they are collectively exhaustive, one of the three must occur. Hence $\sum_i p_i = 1$.

Note that although in this chapter we make less use of multiperiod models using contingent claims, we can also define states of the world at different points in time.


CONTINGENT CLAIMS AND 
THEIR VALUE 

A unit contingent claim is a security that will pay an amount of $1 if a certain state of the world is actually realized, but nothing otherwise. A claim that pays $1 if state i is realized is frequently called a unit claim on state i. A unit contingent claim is also referred to as a primary security or Arrow-Debreu security (so named after the economists who introduced them, Arrow [1964] and Debreu [1959]).

Accordingly, the future stock price described earlier may be regarded as equivalent to a package containing all of the following:

Ten unit claims on state 1
Eight unit claims on state 2
Six unit claims on state 3

The idea of a contingent claim is thus useful for expressing, in terms of fundamental units, exactly what a given security's payoff may be in different possible states of the world.

It may take a little imagination to come up with real-world examples of claims, and those real-world examples are not numerous. (A ticket to win on a horse race is an example of a claim; a fire insurance policy is another. One example of a unit claim is an option that pays off $1 if the value of some underlying asset exceeds a fixed dollar value.) But packages of unit claims represent perfect substitutes for the more ordinary types of securities such as stocks or bonds, and we shall frequently find it useful to employ claims to help understand price relations between securities. For example, if we assume a perfectly competitive financial market along with a description of future events in terms of states of the world, certain price relationships between securities and contingent claims must obtain. This means in turn that certain predictable relationships between securities prices must also obtain.

To see these relationships, suppose that we can describe the world using two states and that two stocks are available, stock A and stock B. We assume the stocks' future prices have the following distributions:

State   Stock A   Stock B
1       $10       $7
2       $8        $9

Let A(0) = $6 denote the time 0 price of stock A and B(0) = $5 the time 0 price of stock B, and suppose these prices admit no arbitrage opportunities. Now if we let $C_1$ and $C_2$ represent the time 0 prices of unit claims on states 1 and 2, we can use the foregoing information about stock prices and payoffs to find the time 0 prices $C_1$ and $C_2$. Purchasing stock A for $6 is equivalent to buying a package of 10 unit claims on state 1 and 8 unit claims on state 2, while buying stock B for $5 is equivalent to buying a package of 7 unit claims on state 1 and 9 unit claims on state 2. Since the unit claims comprising the two stocks are perfect substitutes, they must sell for the same prices in a perfect market. Hence we can write


$$10C_1 + 8C_2 = \$6$$

$$7C_1 + 9C_2 = \$5$$

which can be solved to obtain

$$C_1 = \$\frac{7}{17}, \qquad C_2 = \$\frac{4}{17}$$
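The two pricing equations form a small linear system, which can be solved exactly with rational arithmetic. The sketch below uses Cramer's rule; the function name is hypothetical, and the inputs are the payoffs and prices of stocks A and B given above.

```python
from fractions import Fraction

def solve_state_prices(payoffs_a, price_a, payoffs_b, price_b):
    """Solve a 2x2 system for unit-claim prices (C1, C2) by Cramer's rule."""
    a1, a2 = (Fraction(x) for x in payoffs_a)  # stock A payoffs in states 1 and 2
    b1, b2 = (Fraction(x) for x in payoffs_b)  # stock B payoffs in states 1 and 2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("payoffs are linearly dependent; state prices are not unique")
    c1 = (Fraction(price_a) * b2 - a2 * Fraction(price_b)) / det
    c2 = (a1 * Fraction(price_b) - b1 * Fraction(price_a)) / det
    return c1, c2

C1, C2 = solve_state_prices((10, 8), 6, (7, 9), 5)
print(C1, C2)  # prints "7/17 4/17"
```

The zero-determinant check reflects the requirement that the two stocks have linearly independent payoffs; otherwise the two prices would not pin down both unit-claim prices.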


We can use the same reasoning to find the risk-free rate of return that must obtain in this market. Since a risk-free instrument is one that offers the same payoff irrespective of which state of the world obtains, we wish to find a combination of the two stocks that gives the same time 1 payoff, here denoted $k$, in either state of the world. That is, the following equation must be solved for $\alpha$:

$$\alpha \begin{bmatrix} 10 \\ 8 \end{bmatrix} + (1 - \alpha) \begin{bmatrix} 7 \\ 9 \end{bmatrix} = \begin{bmatrix} k \\ k \end{bmatrix}$$

We can write the payoff $k$ as equal to either of the following payoffs:

$$10\alpha + 7(1 - \alpha) = 8\alpha + 9(1 - \alpha)$$

which implies that

$$2\alpha = 2(1 - \alpha)$$

so that $\alpha = \frac{1}{2}$. The riskless payoff is then $\frac{1}{2}(10) + \frac{1}{2}(7) = \$8.50$, and this can be obtained for a price equal to $\frac{1}{2}(6) + \frac{1}{2}(5) = \$5.50$, since a portfolio composed of equal proportions of the two stocks creates the riskless investment. Accordingly, the risk-free rate of return is

$$\frac{\$8.50 - \$5.50}{\$5.50} = \frac{6}{11} = 54.55\%$$


Of course, this is not necessarily a realistic 
number for a risk-free rate of interest. (Whether 
it is realistic or not depends on the length of 
the time period under consideration, a matter 
we have left unspecified.) However, our purpose
here is to develop illustrative calculations
to display relations between contingent claims, 
and for this purpose particular sizes of numbers 
are not really important. 

Another way of making a riskless investment 
is to buy one of each available unit claim, that 
is, one claim on state 1 and one claim on state 2. 
Such a portfolio gives a certain payoff of $1 for 
an investment cost of 


\[
\$\frac{7}{17} + \$\frac{4}{17} = \$\frac{11}{17}
\]


The rate of return on this investment is then 


\[
\frac{\$1 - \$\frac{11}{17}}{\$\frac{11}{17}} = \frac{17 - 11}{11} = \frac{6}{11} \approx 54.55\%
\]


just as before. 
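
Both routes to the risk-free rate can be reproduced in a few lines. This is a minimal sketch using the prices and payoffs of the example; the variable names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Prices and state payoffs (state 1, state 2) taken from the example.
price_a, payoff_a = Fraction(6), (Fraction(10), Fraction(8))
price_b, payoff_b = Fraction(5), (Fraction(7), Fraction(9))

# Route 1: weight alpha on stock A that equalizes the payoff in both states.
# alpha*a1 + (1-alpha)*b1 = alpha*a2 + (1-alpha)*b2
alpha = (payoff_b[1] - payoff_b[0]) / (
    (payoff_a[0] - payoff_b[0]) - (payoff_a[1] - payoff_b[1])
)
payoff = alpha * payoff_a[0] + (1 - alpha) * payoff_b[0]
cost = alpha * price_a + (1 - alpha) * price_b
rate_from_stocks = (payoff - cost) / cost

# Route 2: buy one unit claim on each state and receive $1 for sure.
c1, c2 = Fraction(7, 17), Fraction(4, 17)
rate_from_claims = (1 - (c1 + c2)) / (c1 + c2)

print(alpha, payoff, cost)                  # 1/2 17/2 11/2
print(rate_from_stocks, rate_from_claims)   # 6/11 6/11
```

That the two rates agree exactly is the no-arbitrage point of the example: any riskless position must earn the same return however it is assembled.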


INVESTOR'S UTILITY 
MAXIMIZATION IN 
CONTINGENT CLAIMS 
MARKETS 

In this section, we describe how an investor may 
solve the utility maximization problem when 
facing risk in a market for contingent claims. For 
our illustration, we shall continue with stocks 
A and B from the previous section. Further, we 
shall assume the investor's initial wealth to be 
$600. This scenario is summarized in Table 1. 
We let w_1 represent wealth if state 1 occurs and
correspondingly for w_2, and we may plot these
data in (w_1, w_2) space, as shown in Figure 1.
Note that the previously determined riskless
position of dividing the purchases to obtain an
equal number of each security (54.5 of each) is
also shown and generates a riskless terminal
wealth position of w_1 = w_2 = $926.50.
Table 1 Summary of Terminal Wealth in Two States

                    No. of Shares    Terminal Wealth
                    Purchased        State 1    State 2
Purchases A only    100              $1,000     $  800
Purchases B only    120              $  840     $1,080


We can also use another way to calculate the
value of the claims' combinations at time 1. We
can write the equation of the straight line in
Figure 1 as

\[
w_2 = a - b w_1
\]

so that for the investor who purchases only stock A we have

\[
\$800 = a - \$1{,}000\,b
\]

while for the investor who purchases only stock B we have

\[
\$1{,}080 = a - \$840\,b
\]

Solving these two simultaneous equations,
we find b = 1.75 and a = $2,550. Thus, when
w_1 = 0, w_2 = $2,550, while when w_2 = 0, w_1 =
$1,457, which are the two intercepts of the line
on their respective axes in Figure 1.




Now if w_2 = 0, we have the case of a claim
(primary security) on state 1. (The security pays
$1,457 in state 1 and nothing otherwise.) The
price of this claim can be calculated by dividing
initial wealth by the maximum wealth obtained
if state 1 occurs, or $600/$1,457 = 0.41
(= 7/17). Similarly, the price of primary security
2 is $600/$2,550 = 0.24 (= 4/17), and our earlier
results are confirmed.
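
The line through the two points of Table 1 and the unit claim prices it implies can be computed as follows. This is a sketch under the assumptions of the example (initial wealth of $600, all of it invested in a single stock at each end point); the variable names are illustrative.

```python
from fractions import Fraction

wealth = Fraction(600)
# Terminal wealth (state 1, state 2) from Table 1.
point_a = (Fraction(1000), Fraction(800))    # all wealth in stock A
point_b = (Fraction(840), Fraction(1080))    # all wealth in stock B

# Line w_2 = a - b * w_1 through both points.
b = (point_b[1] - point_a[1]) / (point_a[0] - point_b[0])
a = point_a[1] + b * point_a[0]

w1_intercept = a / b    # wealth in state 1 when w_2 = 0
w2_intercept = a        # wealth in state 2 when w_1 = 0

# Implied unit claim prices: initial wealth per dollar of state wealth.
c1 = wealth / w1_intercept
c2 = wealth / w2_intercept
print(b, a, round(float(w1_intercept), 2))   # 7/4 2550 1457.14
print(c1, c2)                                # 7/17 4/17
```

The slope 7/4 equals the ratio of the two claim prices, (7/17)/(4/17), which is why the intercepts recover the prices found earlier.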

Note that in Figure 1 the investor's time 1
position is some point on the line from A to
B. How could the investor obtain a terminal
wealth position lying beyond these points? The
investor could engage in short sales, that is, selling
shares not currently owned, for delivery
when the unknown future state of the world
is revealed. In this transaction the investor obtains
cash from the time 0 sale of one security
and uses it to buy the other. In so doing, the
investor promises later to buy the security sold
short at whatever price will be prevailing and
deliver it. Note that there is a potential for large
gains or losses in such transactions. Here the
initial wealth will be used as a constraint and
we shall require that at worst the investor will
have zero terminal wealth if he or she guesses
incorrectly. That is, no net borrowing is permitted
at the end of the period so that the investor
cannot go beyond the intercepts in Figure 1.

Figure 1 Market Opportunity Line, Showing Implied Prices of Unit Claims.
Note: w_1 represents wealth if state 1 occurs and correspondingly for w_2.

To illustrate, consider point w_1 = $1,457, w_2 =
0. Let n_A be the number of shares of stock A and
n_B the number of shares of stock B purchased.
If state 1 occurs, the terminal wealth will be

\[
10 n_A + 7 n_B = \$1{,}457
\]

while if state 2 occurs, we must have

\[
8 n_A + 9 n_B = 0
\]

Solving these equations simultaneously, we
find n_B = −343, that is, a short position of 343
shares. If the investor sells short 343
shares of stock B at the current price of $5, he
or she will receive $1,715. Combining this with
the initial wealth of $600 gives $2,315, so this
investor may buy n_A = $2,315/$6 = 386 shares
of stock A at $6 per
share. If state 1 eventuates, the investor will receive
$3,860 ($10 for each of the 386 shares) but now
must pay $2,401 ($7 for each of the 343 shares) for stock B
shares to cover the short position. The net terminal
wealth is $3,860 − $2,401 = $1,459 (difference
due to rounding), as required. In state 2,
the terminal wealth will be equal to $3,088 (386
shares times $8 per share) reduced by the cost
to repurchase stock B to cover the short position,
343 shares at $9 per share or $3,087. Therefore,
the net terminal wealth is equal to zero
(the calculations show it is $1 but that is due to
rounding).
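
The short-sale position can be recovered without rounding. The sketch below is illustrative: the function name `holdings_for_target` is hypothetical, and the target uses the exact intercept 10200/7 (about $1,457) rather than the rounded figure, so the budget comes out at exactly $600.

```python
from fractions import Fraction

def holdings_for_target(w1, w2):
    """Shares (n_A, n_B) paying w1 in state 1 and w2 in state 2.

    Solves 10 n_A + 7 n_B = w1 and 8 n_A + 9 n_B = w2 by Cramer's rule.
    """
    det = Fraction(10 * 9 - 7 * 8)    # 34
    n_a = (9 * Fraction(w1) - 7 * Fraction(w2)) / det
    n_b = (10 * Fraction(w2) - 8 * Fraction(w1)) / det
    return n_a, n_b

target_w1 = Fraction(10200, 7)        # exact w_1 intercept, about $1,457
n_a, n_b = holdings_for_target(target_w1, 0)
cost = 6 * n_a + 5 * n_b              # time 0 outlay at prices $6 and $5
print(round(float(n_a), 1), round(float(n_b), 1), cost)  # 385.7 -342.9 600
```

The negative holding of stock B is the short position; rounding to 386 and 343 whole shares produces the small $1 to $2 discrepancies noted in the text.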

Note that none of the points we have considered
will necessarily be a utility-maximizing
point. To determine this point, it is necessary
to know the investor's utility function in (w_1,
w_2) space. The optimal portfolio for the investor
satisfies the tangency condition that the slope of
the wealth constraint (the ratio of the prices of
the unit claims) equals the slope of the indifference
curve (marginal rate of substitution of
state 1 consumption for state 2 consumption).
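
As a sketch of the tangency condition, assume (the text does not specify this) that the investor maximizes expected utility with state probabilities π_1 and π_2, a differentiable utility function u, and initial wealth W_0. Under those assumptions the problem and its first-order condition can be written as:

```latex
\begin{aligned}
\max_{w_1,\, w_2} \; & \pi_1 u(w_1) + \pi_2 u(w_2)
  \quad \text{subject to} \quad C_1 w_1 + C_2 w_2 = W_0, \\
\text{first-order condition:} \; & \frac{\pi_1 u'(w_1)}{\pi_2 u'(w_2)}
  = \frac{C_1}{C_2} = \frac{7/17}{4/17} = \frac{7}{4}.
\end{aligned}
```

The right-hand side uses the unit claim prices of the example, so the price ratio 7/4 is the absolute slope of the market opportunity line; the left-hand side depends on the assumed probabilities and utility function.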

The point of the foregoing demonstration is
to show first that every security can be viewed
as a bundle of unit claims and thus represents
a combination of positions regarding future
states of the world. Moreover, in these circumstances
an investor can attain any point along
the market opportunity line. If, on the other
hand, there are fewer securities than the number
of distinct states, the individual's optimal
consumption choices may not be attainable.
The significance of this will be explored in the
next section.

Although we do not discuss it here, the real
power of the contingent claim analysis is in providing
the basis for valuing complex financial
instruments and financial arrangements.

INCOMPLETE MARKETS FOR 
CONTINGENT CLAIMS 

A market is said to be a complete market when
economic agents can structure any set of future
state payoffs by investing in a portfolio
of unit contingent claims (i.e., primary securities).
A financial market is said to be incomplete
if the number of (linearly) independent securities
traded in it is smaller than the number
of distinct states of the world. Clearly, market
incompleteness depends on how states of the
world are defined. However, since the number
of states of the world needed to describe a typical
financial market is likely to be large, the possibility
that real-world financial markets will be
incomplete is a very real one.

The importance of market incompleteness is
best introduced by an example. Let us consider
an economy with three possible states of the
world and suppose only two securities (taking
the form of unit claims for ease of exposition)
are traded in it. We describe the securities in
terms of their time 1 market value, for each
state of the world, as in Table 2. It is apparent
from the table that weighted averages of
the two unit claims can be used to create packages
with time 1 distributions of values ranging
between zero and unity, the actual outcome depending
on whether state 1 or state 2 obtains.
However, an investor cannot create an income
claim of other than zero in state 3 by using
just the existing two unit claims. Moreover, no
investor can arrange a risk-free investment in
this example, because it is not possible to guarantee
the same return in every state of the world
by using just the available securities.

Table 2 Market Values of Two Securities at Time 1

            States of the World
Security    1    2    3
1           1    0    0
2           0    1    0

The situation is quite different if a third unit 
claim worth $1 in state 3 and zero in the other 
states becomes available. Now the number of 
claims equals the number of distinct states, and 
a risk-free investment can now be arranged. 
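
The definition of completeness can be tested mechanically by comparing the rank of the payoff matrix with the number of states. The sketch below is one possible check, not a method given in the text; the helper names `rank` and `is_complete` are hypothetical, and plain Gaussian elimination over exact fractions is used to avoid any external library.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def is_complete(payoffs):
    """payoffs[i][s] is the time 1 value of security i in state s."""
    n_states = len(payoffs[0])
    return rank(payoffs) == n_states

two_claims = [[1, 0, 0], [0, 1, 0]]        # the two securities of Table 2
three_claims = two_claims + [[0, 0, 1]]    # third unit claim added
stocks_a_b = [[10, 8], [7, 9]]             # earlier two-state example
print(is_complete(two_claims), is_complete(three_claims), is_complete(stocks_a_b))
# False True True
```

Counting linearly independent securities rather than securities as such matters: adding a third security that merely combines the first two would leave the rank, and hence the verdict, unchanged.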

We are now ready to discuss some practical
implications of market incompleteness. It
is obvious from the foregoing example that investor
choice is restricted in incomplete markets.
Moreover, if investor choices are restricted,
the investors will never be better off, and are
likely to be worse off, than would be the case
if markets were complete (i.e., if the restrictions
were removed). In such situations, it is to be
expected that if ways of completing the market
can be found, those possibilities are likely to
be utilized. That is, in the context of incomplete
financial markets the appearance of new instruments
might be regarded as attempts to provide
investors with financial opportunities not otherwise
available. The appearance of derivatives
(options, futures, and swaps) might be examples
of such attempts. Mossin (1977) argues that
the preference existing firms show for organizing
new activities as separate corporations may
be another indication of attempts to deal with
market incompleteness.

FINANCIAL INSTRUMENTS 
AS CONTINGENT CLAIMS 

Most financial instruments can be bought or
sold, but not all of them are actively traded in
financial markets. For example, a common form
of contingent claim (and one that is close in
concept to a unit claim) is a lottery ticket. In
its simplest form this claim results in its holder
winning either a positive prize or zero. Accordingly,
this lottery ticket represents a claim that
can be valued using two states of the world. Obviously,
if a lottery has several different prizes,
several states of the world may need to be defined
in order to describe it completely. But lottery
tickets, once issued, are rarely traded again.
The same is true of such other contingent claims
as the tickets obtained when betting on horse
races or similar contests.

An insurance policy is a contingent claim that 
comes closer to our usual notions of a financial 
instrument, but again it is rarely traded in the 
financial markets. On the other hand, put or 
call options, representing contingent claims for 
selling or buying securities or financial indexes 
at prespecified prices, trade actively on
organized exchanges. Rights and warrants are
other examples of contingent claims in that they 
permit, but do not require, the holder to buy 
securities on prespecified terms. 

There are also securities that have embedded
derivatives in them, derivatives that are not
traded separately from the instrument itself. For
example, a callable bond is a bond that grants the
issuer the right to redeem the bond at some time
in the future and at a specified price. That is, a
callable bond can be viewed as a straight bond
with an embedded call option granted to the
issuer. A putable bond is a bond that grants the
investor the right to sell (i.e., put) the bond to the
issuer in the future at a specified price. Hence,
the bond structure can be viewed as a straight
bond with an embedded put option. Convertible
securities, which include convertible bonds
or convertible preferred stocks, represent contingent
claims in that they typically allow the
owner to exchange the original issue for other
securities, usually common stock, and they are
callable. Some convertible securities even include
an embedded put option.






Introduction to Contingent Claims Analysis 




KEY POINTS 

• Contingent claims analysis and contingent
strategies are tools for dealing with risk in
financial decision making.

• Contingent claims analysis uses the notion of
states of the world in assessing future risky
payoffs.

• A unit contingent claim (also known as a primary
security or Arrow-Debreu security) is a
security that has a payoff of $1 if a certain state
of the world is actually realized, but nothing
in all other states.

• A contingent claim that pays off $1 if state i is
realized is also referred to as a unit claim on
state i.

• Although few unit contingent claims exist in
reality, claims represent a useful tool to employ
in valuing securities and in understanding
relations among them.

• An investor may solve the utility maximization
problem when facing risk in a market for
contingent claims.

• Using contingent claims analysis, an investor
can obtain a terminal wealth position beyond
what can be obtained by simply buying securities
with initial wealth by engaging in short
sales (i.e., selling shares not currently owned,
for delivery when the unknown future state
of the world is revealed). The outcomes in this
case are more risky than they would be in the
absence of short selling.

• Every security can be viewed as representing
a bundle of unit claims and thereby further
represents a combination of positions
(long and short) regarding future states of the
world.

• If the number of (linearly) independent securities
traded is smaller than the number of
distinct states of the world, the financial market
is said to be incomplete.

• Because the number of states of the world
necessary to describe a well-functioning financial
market is likely to be large, the possibility
that real-world financial markets will
be incomplete is a very real one.

• Although most financial instruments representing
contingent claims can be bought or
sold, there are financial instruments or financial
arrangements that are not actively traded
in financial markets.
Abstract: The most popular continuous-time model for option valuation is based on the Black-Scholes
theory. Although certain drawbacks and pitfalls of the Black-Scholes option pricing model
were understood shortly after its publication in the early 1970s, it is still by far the most
popular model for option valuation. The Black-Scholes model is based on the assumption that
the underlying follows a stochastic process called geometric Brownian motion. Besides pricing,
every option pricing model can be used to calculate sensitivity measures describing the influence
of a change in the underlying risk factors on the option price. These risk measures are called the
"Greeks" and their use will be explained and described.


In this entry, we look at the most popular model
for pricing options, the Black-Scholes model,
and look at the assumptions and their importance.
We also explain the various Greeks that
provide information about the sensitivity of the
option price to changes in the factors that the
model tells us affect the value of an option.


MOTIVATION 

Let us assume that the price of a certain stock
in June of Year 0 (t = 0) is given to be S_0 = $100.
We want to value an option with strike price
X = $110 maturing in June of Year 1 (t = T). As
additional information we are given the continuously
compounded one-year risk-free interest
rate r = 5%. Figure 1 visualizes potential paths
of the stock between t = 0 and t = T. How can
we define a reasonable model for the stock price
evolution?

It is clear that the daily changes or the change
between the current and the next quotes cannot
be predicted and can consequently be seen as
realizations of random variables. However, we
know that if we are investing in stocks then we
can expect a rate of return in the long run which
is higher than the risk-free rate. Let us denote
that unknown expected rate of return as μ. Here
and in the rest of this entry, we assume that the
stock pays no dividend.

Figure 1 Possible Paths of the Stock Price Evolution
over One Year with S_0 = $100 and X = $110
(horizontal axis: t in trading days)

Furthermore, we know that stock returns exhibit
random fluctuations called volatility. Let
σ denote the unknown yearly rate of volatility.
Here and below we have implicitly assumed
that the expected return and the volatility of
the stock are time independent. This assumption
might be violated in practice. Formalizing
our ideas about the stock price we come up
with the following equation for the return of
the stock in a small time interval of length Δt:

\[
\underbrace{\frac{S_{t+\Delta t} - S_t}{S_t}}_{\text{Return in period } [t,\, t+\Delta t]}
= \mu \cdot \Delta t + \underbrace{\sigma \cdot \epsilon_{t,\Delta t}}_{\text{``Stochastic noise''}}
\]

The stochastic noise σ · ε_{t,Δt} should have the
following properties:

• No systematic influence: E(ε_{t,Δt}) = 0.

• No dependence between the noise of different
dates: The random variables ε_t and ε_s are
independent for s ≠ t.

• The variance of the noise is proportional to
the length of the time interval Δt.

One possible model for the noise process is
provided by a stochastic process called Brownian
motion. A detailed discussion of Brownian
motion is beyond the scope of this entry, but
we provide a brief discussion of its defining
properties.

Brownian motion is a stochastic process
(W_t)_{t≥0} in continuous time that has the following
four properties:

1. W_0 = 0, that is, Brownian motion starts at
zero.

2. (W_t)_{t≥0} is a process with homogeneous and
independent increments.

3. Any increment W_{t+h} − W_t is normally distributed
with mean zero and variance h.

4. The paths of (W_t)_{t≥0} are continuous.

The increments of Brownian motion are an
appropriate candidate for the stochastic noise
in our stock price model and we define:

\[
\epsilon_{t,\Delta t} = W_{t+\Delta t} - W_t
\]

From its defining properties, we know that the
increments of the Brownian motion are independent
and that the variance of the increments
is proportional to the length of the considered
time interval. Additionally, the expectation of
the increments is zero.

With this definition, it is possible to write the
equation for the return process in the following
form:

\[
\frac{S_{t+\Delta t} - S_t}{S_t} = \mu \Delta t + \sigma (W_{t+\Delta t} - W_t)
\]

If we decrease the length Δt of the time interval
over which the increment is considered constant,
then we can switch to a "differential type"
notation:

\[
\frac{dS_t}{S_t} = \mu \cdot dt + \sigma \cdot dW_t, \quad t \ge 0
\]

The process defined in the above equation
is called geometric Brownian motion. In explicit
notation the geometric Brownian motion possesses
the following form

\[
S_t = S_0 \exp\left( \left( \mu - \tfrac{1}{2}\sigma^2 \right) t + \sigma W_t \right)
\]

and S_t is lognormally distributed. This process
is used in the Black-Scholes model to describe
the stock price dynamic. Additionally,
the model assumes the existence of a risk-free
asset, called the money market account or
bond, with the following dynamic:

\[
\frac{dB_t}{B_t} = r \cdot dt, \; t \ge 0
\quad \Longleftrightarrow \quad
B_t = B_0 e^{rt}, \; t \ge 0 \tag{1}
\]
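
A stock price path under geometric Brownian motion can be sampled by accumulating independent normal increments of the Brownian motion and applying the explicit solution S_t = S_0 exp((μ − σ²/2)t + σW_t). The sketch below is illustrative only: the values μ = 8% and σ = 20% are assumptions (the text treats both as unknown), and the function names are hypothetical.

```python
import math
import random

def gbm_value(s0, mu, sigma, t, w_t):
    """Explicit GBM solution S_t = S_0 * exp((mu - sigma^2 / 2) t + sigma W_t)."""
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w_t)

def simulate_path(s0, mu, sigma, horizon, n_steps, rng):
    """Sample S on an equally spaced grid by accumulating Brownian increments."""
    dt = horizon / n_steps
    w, path = 0.0, [s0]
    for step in range(1, n_steps + 1):
        w += rng.gauss(0.0, math.sqrt(dt))    # W_{t+dt} - W_t ~ N(0, dt)
        path.append(gbm_value(s0, mu, sigma, step * dt, w))
    return path

rng = random.Random(42)                        # fixed seed for repeatability
path = simulate_path(100.0, 0.08, 0.20, 1.0, 250, rng)
print(len(path), path[0])                      # 251 100.0
```

Because the price is an exponential of a normal variable, every simulated value is strictly positive, which is the practical content of the lognormal property.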


BLACK-SCHOLES FORMULA 


Black and Scholes (1973) have shown that it is
possible, under some assumptions discussed
in this section, to duplicate the payoff of a
European call option with a continuously rebalanced
portfolio consisting of the two assets
S and B. This means that the price of the call
option equals the initial costs for starting the
hedging strategy.

The Black-Scholes option pricing model computes
the fair (or theoretical) price of a European
call option on a nondividend-paying stock with
the following formula:

\[
C = S\,\Phi(d_1) - X e^{-rT}\,\Phi(d_2) \tag{2}
\]

where

\[
d_1 = \frac{\ln(S/X) + (r + 0.5\sigma^2)T}{\sigma\sqrt{T}} \tag{3}
\]

\[
d_2 = d_1 - \sigma\sqrt{T} \tag{4}
\]

and where

ln(·) = natural logarithm
C = call option price
S = current stock price
X = strike price
r = short-term risk-free interest rate in
    percent per annum (entered in the formula as a decimal)
e = 2.718 (natural antilog of 1)
T = time remaining to the expiration date
    (measured as a fraction of a year)
σ = expected return volatility for the stock
    (standard deviation of the stock's return
    in percent per annum, entered as a decimal)
Φ(·) = the cumulative distribution function
    of a standard normal distribution


The option price derived from the Black-Scholes
option pricing model is "fair" in the
sense that if any other price existed in a market
where all the assumptions of the Black-Scholes
model are fulfilled, it would be possible to earn
riskless arbitrage profits by taking an offsetting
position in the underlying stock. That is, if the
price of the call option in the market is higher
than that derived from the Black-Scholes option
pricing model, an investor could sell the
call option and buy a certain number of shares
in the underlying stock. If the reverse is true,
that is, the market price of the call option is less
than the "fair" price derived from the model,
the investor could buy the call option and sell
short a certain number of shares in the underlying
stock. This process of hedging by taking a
position in the underlying stock allows the investor
to lock in the riskless arbitrage profit. The
number of shares necessary to hedge the position
changes as the factors that affect the option
price change, so the hedged position must be
changed constantly.


COMPUTING A CALL 
OPTION PRICE 

To illustrate the Black-Scholes option pricing 
formula, assume the following values: 

Strike price = $45 

Time remaining to expiration = 183 days 
Current stock price = $47 

Expected return volatility = Standard deviation 
= 25% per annum 
Risk-free rate = 10% per annum 

In terms of the values in the formula: 

S = 47
X = 45
T = 0.5 (183 days/365, rounded)
σ = 0.25
r = 0.10

Substituting these values into equations (3)
and (4):

\[
d_1 = \frac{\ln(47/45) + (0.10 + 0.5[0.25]^2)\,0.5}{0.25\sqrt{0.5}} = 0.6172
\]

\[
d_2 = 0.6172 - 0.25\sqrt{0.5} = 0.440443
\]
From a normal distribution table,

Φ(0.6172) = 0.7315 and Φ(0.4404) = 0.6702

Then

\[
C = \$47(0.7315) - \$45\left(e^{-(0.10)(0.5)}\right)(0.6702) = \$5.69
\]
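
The same calculation can be carried out without a normal distribution table. The following is a minimal sketch, not a reference implementation: the function names `norm_cdf` and `bs_call` are illustrative, and the normal distribution function is obtained from the error function in the Python standard library.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, x, r, sigma, t):
    """Black-Scholes price of a European call on a nondividend-paying stock.

    Returns the price together with d1 and d2 so they can be inspected.
    """
    d1 = (math.log(s / x) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    price = s * norm_cdf(d1) - x * math.exp(-r * t) * norm_cdf(d2)
    return price, d1, d2

price, d1, d2 = bs_call(s=47.0, x=45.0, r=0.10, sigma=0.25, t=0.5)
print(round(d1, 4), round(d2, 4), round(price, 2))  # 0.6172 0.4404 5.69
```

Rates and volatilities are passed as decimals (0.10 and 0.25) and time in years, matching the substitution above.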

Table 1 shows the option value as calculated
from the Black-Scholes option pricing model for
different assumptions concerning (1) the standard
deviation for the stock's return (that is,
expected return volatility); (2) the risk-free rate;
and (3) the time remaining to expiration. Notice
that the option price varies directly with three
variables: expected return volatility, the risk-free
rate, and time remaining to expiration. That
is: (1) the lower (higher) the expected volatility,
the lower (higher) the option price; (2) the lower
(higher) the risk-free rate, the lower (higher)
the option price; and (3) the shorter (longer) the
time remaining to expiration, the lower (higher)
the option price.

Table 1 Comparison of Black-Scholes Call Option
Price Varying One Factor at a Time

Base Case

Call option:
Strike price = $45
Time remaining to expiration = 183 days
Current stock price = $47
Expected return volatility = Standard deviation of a
stock's return = 25%
Risk-free rate = 10%

Holding All Factors Constant except Expected Return
Volatility

Expected Price Volatility    Call Option Price [$]
15% per annum                4.69
20                           5.17
25 (base case)               5.69
30                           6.26
35                           6.84
40                           7.42

Holding All Factors Constant except the Risk-Free Rate

Risk-Free Interest Rate,
% per annum                  Call Option Price [$]
7%                           5.27
8                            5.41
9                            5.55
10 (base case)               5.69
11                           5.84
12                           5.99
13                           6.13

Holding All Factors Constant except Time Remaining
to Expiration

Time Remaining to
Expiration                   Call Option Price [$]
30 days                      2.85
60                           3.52
91                           4.15
183 (base case)              5.69
273                          6.99

SENSITIVITY OF OPTION 
PRICE TO A CHANGE IN 
FACTORS: THE GREEKS 

In employing options in investment strategies, 
an asset manager or trader would like to know 
how sensitive the price of an option is to a 
change in any one of the factors that affect its 
price. Sensitivity measures for assessing the 
impact of a change in factors on the price of 
an option are referred to as the Greeks. In this 
section, we will explain these measures for the 
factors in the Black-Scholes model. Specifically, 
we discuss measures of the sensitivity of a 
call option's price to changes in the price of 
the underlying stock, the time to expiration, 
expected volatility, and interest rate. These 
factors can be divided into "market factors" 
and "model factors." The underlying price 
and the time to expiration are market factors, 
whereas the volatility and the interest rate are 
model factors. The special aspect about the 
latter variables is that they are assumed to be 
constant within the model. By admitting that 
these parameters can change as well, we are 
admitting that the model is inconsistent with 
reality. Table 2 gives an overview and lists the 
sensitivities of the option price with respect to 
all parameters of the Black-Scholes model. 

Call Option Price and
Price of the Underlying: Delta
and Gamma

In developing an option-pricing model, we
have seen the importance of understanding the
relationship between the option price and the
price of the underlying stock. Moreover, an asset
manager employing options for risk management
wants to know how the value of an
option position will change as the price of the
underlying changes.

Table 2 (excerpt)

Interest rate r    Rho    ρ = ∂C/∂r = X · T · e^{−rT} · Φ(d_2)

One way to do so is to determine the derivative
of the call option price with respect to the
spot price of the underlying stock:

\[
\Delta = \frac{\partial C}{\partial S} = \Phi(d_1) \tag{5}
\]

This quantity is called the "delta" of the option,
and can be used in the following way to determine
the expected price change in the option if
the stock increases by about $x:

\[
\Delta C = C(S + \$x) - C(S) \approx \frac{\partial C}{\partial S} \cdot \Delta S = \$x \cdot \Phi(d_1) \tag{6}
\]

The relation given by (6) holds true for small
changes in the price of the underlying. For large
changes the assumed linear relationship between
stock and option price is not valid and we
must apply a so-called convexity adjustment:

\[
\Delta C = C(S + \$x) - C(S) \approx \$x \cdot \Phi(d_1)
+ \frac{1}{2} \cdot \underbrace{\frac{\partial^2 C}{\partial S^2}}_{=\,\Gamma} \cdot (\$x)^2
\]

Here, Γ denotes the option's "gamma," which
measures the curvature of the option price as a
function of the price of the underlying stock.
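
The gain from the convexity adjustment can be seen numerically. The sketch below uses the parameters quoted in the note to Figure 2 (one-month option, X = $100, S = $100, 10% interest, 20% volatility) and a $5 move in the stock, which is an arbitrary illustrative choice. The closed form Γ = φ(d_1)/(Sσ√T), with φ the standard normal density, is the standard Black-Scholes expression and is not stated in the text.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_of(s, x, r, sigma, t):
    return (math.log(s / x) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))

def bs_call(s, x, r, sigma, t):
    d1 = d1_of(s, x, r, sigma, t)
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - x * math.exp(-r * t) * norm_cdf(d2)

def delta(s, x, r, sigma, t):
    return norm_cdf(d1_of(s, x, r, sigma, t))

def gamma(s, x, r, sigma, t):
    return norm_pdf(d1_of(s, x, r, sigma, t)) / (s * sigma * math.sqrt(t))

s, x, r, sigma, t = 100.0, 100.0, 0.10, 0.20, 1.0 / 12.0
move = 5.0                                   # illustrative $5 rise in the stock
exact = bs_call(s + move, x, r, sigma, t) - bs_call(s, x, r, sigma, t)
delta_only = delta(s, x, r, sigma, t) * move
delta_gamma = delta_only + 0.5 * gamma(s, x, r, sigma, t) * move ** 2
print(abs(exact - delta_gamma) < abs(exact - delta_only))  # True
```

For this move the delta-only estimate understates the price change, and adding the gamma term closes most of the gap.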


Figure 2 Accuracy of Simple Delta Approximation
and Convexity-Adjusted Approximation
Note: The example is calculated for a one-month 
option with strike X = $100 and current stock price 
S = $100 with an interest of 10% per annum and 
volatility of 20% per annum. 

Figure 2 visualizes this effect. We see that for
small variations in the stock price the "true
price" and both approximations nearly coincide.
But for medium-sized variations, only the
convexity-adjusted approximation is still accurate.
For large variations in the underlying stock
price both approximations fail.

The impact of the parameters stock price, interest
rate, time to maturity, and volatility on
the option's delta is visualized in Figure 3. We
can recognize that the influence of a change in
the underlying on the option value measured
by the option's delta increases with increasing
stock price. Intuitively, this is clear as for large
values of the underlying stock the option behaves
like the stock itself, whereas for values
of the underlying stock near zero, the option is
virtually worthless. Also, we can see that if the
option is at the money, the impact of a change in
the value of the underlying stock increases with
increasing time to maturity and with increasing
interest rate, which is not as obvious. The
delta of the option that is at the money decreases
with increasing volatility. The reason is as follows.
Imagine that you possess an option on
an underlying which is virtually nonrandom.
In this case, the value of the option equals its
intrinsic value and therefore a change in the underlying
stock price has a large impact on the
value of the option provided that the current
stock price is above the strike. In a stochastic
environment (that is, nonzero volatility), every
movement of the stock can be immediately followed
by a movement in the opposite direction.
This is why the option price is not as sensitive
to stock price movements when volatility
is high (that is, delta decreases with increasing
volatility).

Figure 3 Delta as a Function of the Parameters

Note: The example is calculated for a one-month option with strike X = $100 and current stock price
S = $100 with an interest of 10% per annum and a volatility of 20% per annum.

For gamma, it is clear that the impact of a
change in the price of the underlying is the highest
if the option is at the money. If the option is
far out or far in the money, we have C ≈ 0 or C
≈ S and, therefore, the second derivative with
respect to S will vanish.

Below we will give a brief overview of the remaining
sensitivity measures called theta, vega,
and rho. Figure 4 visualizes the effect of the current
stock price on the Greeks gamma, theta,
rho, and vega.

The Call Option Price and 
Time to Expiration: Theta 

All other factors constant, the longer the time
to expiration, the greater the option price. Since
each day the option moves closer to the expiration
date, the time to expiration decreases. The
theta of an option measures the change in the
option price as the time to expiration decreases,
or equivalently, it is a measure of time decay.

Assuming that the price of the underlying 
stock does not change (which means that the 
intrinsic value of the option does not change), 
theta measures how quickly the time value of 
the option changes as the option moves toward 
expiration. 

Buyers of options prefer a low theta so that
the option price does not decline quickly as it
moves toward the expiration date. An option
writer benefits from an option that has a high
theta.

Figure 4 Variation of the Greeks with Respect to the Current Price of the Underlying Stock

Note: The example is calculated for a one-month option with strike X = $100 and spot price S = $100
with an interest of 10% per annum and a volatility of 20% per annum.

Option Price and Expected 
Volatility: Vega 

All other factors constant, a change in the expected
volatility will change the option price.
The vega (also called "kappa") of an option
measures the dollar price change in the price
of the option for a 1% change in the expected
price volatility. (Vega is not a Greek letter. Vega
is used to denote volatility, just as theta is used
for time and rho is used for interest rate.) The
option price is most sensitive with respect to
a change in volatility when the option is at or
near the money. This can be easily understood
as follows. Imagine the option is very deep out
of the money (that is, the option is virtually
worthless). In this case, any small change in the
volatility of the underlying will have no impact
on the option price. It will still be nearly zero.
The same holds true if the option is far in the
money (that is, it is nearly sure that the option
will end in the money and the price of the option
nearly equals the price of the stock). In this case,
the impact of a small change in the volatility of
the stock is negligible as well and, therefore,
vega will be small. The situation is different if
the option is near the money. In this case,
the option is very sensitive to volatility changes
as they change the probability of ending in or
out of the money dramatically. That is why we
have a high vega for an option near the money.

Call Option Price and Interest 
Rate: Rho 

The sensitivity of the option price to a change 
in the interest rate is called "rho." The option's 
rho is the least popular among the Greeks. Nev¬ 
ertheless, it is of practical value as it can be used 
to immunize a trader's position against interest 
rate risk. An equivalent concept which might 
be familiar to some readers is the duration of 
a bond. For our purposes, rho plays a minor
role, and we have introduced it for the sake of 
completeness. 

The Greeks and Portfolio 
Applications 

In practical applications, the Greeks are used 
to hedge portfolios with respect to certain risk 
exposures. Because a portfolio is a linear com¬ 
bination of assets and as the derivative of a lin¬ 
ear combination of functions equals the linear 
combination of the derivatives, we can simply 
calculate the Greek of a portfolio of options or 
other assets as the linear combination of the 
individual Greeks. When we seek to build a 
portfolio in a way that one or several of the 
Greeks equal zero, then the portfolio is said to 
be hedged with respect to the respective risk 
factor. A zero-delta portfolio, for example, is in¬ 
sensitive with respect to small changes in the 
value of S, and similarly for the other factors. 
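
The linearity argument can be illustrated with delta. The sketch below assumes Black-Scholes call deltas for a non-dividend-paying stock; the positions, strikes, and helper names (norm_cdf, call_delta) are hypothetical and chosen only for illustration.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_delta(S, K, r, sigma, T):
    """Black-Scholes delta of a European call on a non-dividend-paying stock."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return norm_cdf(d1)

# Hypothetical market inputs and option positions: (quantity, delta per option).
S, r, sigma, T = 100.0, 0.10, 0.20, 0.5
positions = [
    (10.0, call_delta(S, 95.0, r, sigma, T)),   # long 10 calls struck at 95
    (-5.0, call_delta(S, 105.0, r, sigma, T)),  # short 5 calls struck at 105
]

# The portfolio delta is the quantity-weighted sum of the individual deltas.
option_delta = sum(qty * delta for qty, delta in positions)

# The stock itself has a delta of 1, so shorting option_delta shares
# brings the total delta to zero.
shares = -option_delta
portfolio_delta = option_delta + shares * 1.0
print(f"option delta = {option_delta:.4f}, hedge = {shares:.4f} shares")
```

The hedge holds only for small moves in S; as S changes, the deltas change and the stock position has to be adjusted.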


COMPUTING A PUT 
OPTION PRICE 

We have focused our attention on call options. 
How do we value put options? This is done by 
using the following put-call parity relationship, 
which gives the relationship among the price of 
the common stock, the call option price, and the 
put option price. By simple no-arbitrage con¬ 
siderations, it can be shown that the following 
price identity must hold true for a European 
call and put option with the same strike and 
maturity: 

Call price - Put price = Stock price
                         - Present value of dividends
                         - Present value of the strike

If we can calculate the fair value of a call op¬ 
tion, the fair value of a put with the same strike 
price and expiration on the same stock can be 
calculated from the put-call parity relationship. 
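
Rearranging the parity relationship gives the put price directly. The following is a small sketch; the function name put_from_call, the use of continuous discounting for the strike, and the numerical inputs are assumptions made for illustration.

```python
from math import exp

def put_from_call(call_price, stock_price, strike, r, T, pv_dividends=0.0):
    """European put value implied by put-call parity:
    call - put = stock - PV(dividends) - PV(strike)."""
    pv_strike = strike * exp(-r * T)  # continuous discounting is an assumption
    return call_price - stock_price + pv_dividends + pv_strike

# Hypothetical inputs: a call quoted at 8.26 on a 100 stock, strike 100,
# 10% rate, six months to expiration, no dividends.
put_price = put_from_call(8.26, 100.0, 100.0, 0.10, 0.5)
print(f"implied put price = {put_price:.4f}")
```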


ASSUMPTIONS 
UNDERLYING THE 
BLACK-SCHOLES MODEL 
AND BASIC EXTENSIONS 

The Black-Scholes model is based on several re¬ 
strictive assumptions. These assumptions were 
necessary to develop the hedge to realize risk¬ 
less arbitrage profits if the market price of the 
call option deviates from the price obtained 
from the model. Here, we will look at these as¬ 
sumptions and mention some basic extensions 
of the model that make pricing more realistic. 

Taxes and Transactions Costs 

The Black-Scholes model ignores taxes and 
transactions costs. The model can be modified 
to account for taxes, but the problem is that 
there is not one unique tax rate. Transactions 
costs include both commissions and the bid- 
ask spreads for the stock and the option, as well 
as other costs associated with trading options. 
This assumption, together with the next two, is 
the most important for the validity of the Black- 
Scholes model. The derivation of the price de¬ 
pends mainly on the existence of a replicating 
portfolio. When transaction costs exist, even if 
they are negligibly small, then the hedge port¬ 
folio can no longer be built and the argument 
leading to the uniqueness of the price fails. 

Trading in Continuous Time,
Short Selling, and Trading
Arbitrary Fractions of Assets

One crucial assumption underlying the Black- 
Scholes model is the opportunity to (1) perform 
trades in continuous time; (2) buy a negative 
number of all traded assets (short selling); and 
(3) buy and sell arbitrary fractions of all traded 
assets. Only these more or less unrealistic as¬ 
sumptions together with the previously dis¬ 
cussed absence of transaction costs and taxes 
allow the derivation of the unique call option 
price by the hedging argument. The portfolio,
consisting of certain fractions of the bond and 
the underlying stock, needs an ongoing rebal¬ 
ancing that is only possible in a market that 
allows continuous-time trading. Additionally, 
the number of stocks and bonds needed in the 
portfolio to replicate the option can be an arbi¬ 
trary real number, possibly negative. 

Variance of the Stock's Return 

The Black-Scholes model assumes that the vari¬ 
ance of the stock's return is (1) constant over 
the life of the option and (2) known with cer¬ 
tainty. If (1) does not hold, an option pricing 
model can be developed that allows the vari¬ 
ance to change. The violation of (2), however, is 
more serious. As the Black-Scholes model de¬ 
pends on the riskless hedge argument and, in 
turn, the variance must be known to construct 
the proper hedge, if the variance is not known, 
the hedge will not be riskless. 

Stochastic Process Generating 
Stock Prices 

To derive an option pricing model, an assump¬ 
tion is needed about the way stock prices move. 
The Black-Scholes model is based on the as¬ 
sumption that stock prices are generated by a 
geometric Brownian motion. Geometric Brownian
motion is a stochastic process with continuous
paths. In reality, one can sometimes observe
that the market exhibits large fluctuations that
cannot be explained by a continuous-time process
with constant volatility such as geometric
Brownian motion. In theory, there are two possibilities to
overcome this problem. Either one introduces 
the previously mentioned stochastic volatility 
or one allows for jumps in the stock price. 

Risk-Free Interest Rate 

In deriving the Black-Scholes model, two assumptions
were made about the risk-free interest
rate. First, it was assumed that the interest
rates for borrowing and lending were the same.
Second, it was assumed that the interest rate
was constant and known over the life of the
option. The first assumption is unlikely to hold
because borrowing rates are higher than lending
rates. The effect on the Black-Scholes model
is that the option price will be bounded between
the call prices derived from the model using the
two interest rates. The model can handle a violation
of the second assumption by replacing the risk-free rate
over the life of the option by the geometric aver¬ 
age of the period returns expected over the life 
of the option. Returns on short-term Treasury 
bills cannot be known with certainty over the 
long term. Only the expected return is known, 
and there is a variance around it. The effects of 
variable interest rates are considered in Merton 
(1973). 


BLACK-SCHOLES MODEL 
APPLIED TO THE PRICING 
OF OPTIONS ON BONDS: 
IMPORTANCE OF 
ASSUMPTIONS 

While the Black-Scholes option pricing model 
was developed for nondividend paying stocks, 
it has been applied to options on bonds. We 
conclude this entry by demonstrating the limi¬ 
tations of applying the model to valuing options 
on bonds. This allows us to appreciate the im¬ 
portance of the assumptions on option pricing. 
To do so, let us look at the values that would be 
derived in a couple of examples. 

We know that there are coupon-paying bonds 
and zero-coupon bonds. In our illustration we 
will use a zero-coupon bond. The reason is that 
the original Black-Scholes model was for com¬ 
mon stock that did not pay a dividend and so 
a zero-coupon bond would be the equivalent 
type of instrument. Specifically, we look at how
the Black-Scholes option pricing model would
value a call option on a zero-coupon bond with
three years to maturity assuming the following:

Strike price = $88.00
Time remaining to expiration = 2 years
Current bond price = $83.96
Expected return volatility = Standard deviation = 10% per annum
Risk-free rate = 6% per annum

The Black-Scholes model would give an option
value of $8.116. There is no reason to suspect
that this value generated by the model
is incorrect. However, let us change the prob¬ 
lem slightly. Instead of a strike price of $88, 
let us make the strike price $100.25. The Black- 
Scholes option pricing model would give a fair 
value of $2.79. Is there any reason to believe 
this is incorrect? Well, consider that this is a call 
option on a zero-coupon bond that will never 
have a value greater than its maturity value of 
$100. Consequently, a call option with a strike 
price of $100.25 must have a value of zero. Yet, 
the Black-Scholes option pricing model tells us 
that the value is $2.79! In fact, if we assume 
a higher expected volatility, the Black-Scholes 
model would give an even greater value for the 
call option. 
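
The two model values in this illustration can be recomputed with a few lines of code. This is a minimal sketch of the Black-Scholes call formula, assuming a continuously compounded risk-free rate; the helper names norm_cdf and bs_call are illustrative and not part of the original text.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes value of a European call on an asset paying no cash flows."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Inputs from the illustration: bond price 83.96, 6% rate, 10% volatility,
# two years to expiration, and the two strike prices discussed in the text.
bond_price, r, sigma, T = 83.96, 0.06, 0.10, 2.0
for strike in (88.00, 100.25):
    value = bs_call(bond_price, strike, r, sigma, T)
    print(f"strike {strike:6.2f}: model call value = {value:.3f}")
```

Under the continuous-compounding assumption, the output should agree with the two model values quoted in the text to within rounding. Raising sigma in the call with the 100.25 strike increases the model value further, even though the option can never finish in the money.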

Why is the Black-Scholes model off by so 
much in our illustration? The answer is that 
there are three assumptions underlying the 
Black-Scholes model that limit its use in pric¬ 
ing options on fixed income instruments. 

The first assumption is that the probability 
distribution for the underlying asset's prices 
assumed by the Black-Scholes model permits 
some probability—no matter how small—that 
the price can take on any positive value. But in 
the case of a zero-coupon bond, the price can¬ 
not take on a value above $100. In the case of 
a coupon bond, we know that the price can¬ 
not exceed the sum of the coupon payments 
plus the maturity value. For example, for a five- 
year 10% coupon bond with a maturity value 
of $100, the price cannot be greater than $150 
(five coupon payments of $10 plus the maturity 
value of $100). Thus, unlike stock prices, bond
prices have a maximum value. The only way
that a bond's price can exceed the maximum 
value is if negative interest rates are permitted. 
While there have been instances where nega¬ 
tive interest rates have occurred outside the 
United States, users of option pricing models 
assume that this outcome cannot occur. Conse¬ 
quently, any probability distribution for prices 
assumed by an option pricing model that per¬ 
mits bond prices to be higher than the max¬ 
imum bond value could generate nonsensical 
option prices. The Black-Scholes model does al¬ 
low bond prices to exceed the maximum bond 
value (or, equivalently, assumes that interest 
rates can be negative). 

The second assumption of the Black-Scholes 
model is that the short-term interest rate is con¬ 
stant over the life of the option. Yet the price 
of an interest rate option will change as interest 
rates change. A change in the short-term inter¬ 
est rate changes the rates along the yield curve. 
Therefore, for interest rate options it is clearly 
inappropriate to assume that the short-term rate 
will be constant. 

The third assumption is that the variance 
of returns is constant over the life of the op¬ 
tion. As a bond moves closer to maturity, its 
price volatility declines and therefore its return 
volatility declines. Therefore, the assumption 
that variance of returns is constant over the life 
of the option is inappropriate. 


KEY POINTS 

• The most popular option pricing model is the 
Black-Scholes model. 

• The factors that affect the value of an op¬ 
tion include the current price of the asset, 
the strike price, the short-term risk-free inter¬ 
est rate, the time remaining to the expiration 
date of the option, and the expected return 
volatility. 

• Option pricing models depend on the as¬ 
sumption regarding the distribution of re¬ 
turns. 
• The option price derived from the Black-
Scholes option pricing model is "fair" in the
sense that if any other price existed in a market
where all the assumptions of the Black-
Scholes model are satisfied, riskless arbitrage
profits can be realized by taking an offsetting
position in the underlying asset.

• The sensitivity of the price of an option to a
change in the value of a factor that affects the
option's price can be computed for any option
pricing model. These sensitivity measures are
referred to as the Greeks (delta, gamma, vega,
theta, and rho).

• As with any economic model, there are
assumptions that are made. When these
assumptions are violated, the model value
can depart significantly from the true value
of the option.

• Using the Black-Scholes option pricing model
to value an option on a bond is a good
example where the model assumptions are
not consistent with the realities of the bond
market.
Abstract: Various models have been proposed to value financial assets in the cash market.
Derivatives such as futures, forwards, options, swaps, caps, and floors are
valued using arbitrage principles. Basically, the price of a derivative is one that does not allow
market participants to generate riskless profits without committing any funds. In developing a
pricing model for derivatives, the model builder begins with a strategy (or trade) to exploit the
difference between the cash price of the underlying asset for a derivative and the price of the
derivative. The market price for the derivative is the cost of the package to replicate the payoff
of the derivative.


Derivative instruments play an important role 
in financial markets as well as commodity mar¬ 
kets by allowing market participants to control 
their exposure to different types of risk. When 
using derivatives, a market participant should 
understand the basic principles of how they 
are valued. While there are many models that
have been proposed for valuing financial instruments
that trade in the cash (spot) market,
the valuation of all derivatives is based
on arbitrage arguments. Basically, this involves
developing a strategy or a trade wherein a package
consisting of a position in the underlying
(that is, the underlying asset or instrument for
the derivative contract) and borrowing or lending
is assembled so as to generate the same cash
flow profile as the derivative. The value of the
package is then equal to the theoretical price of
the derivative. If the market price of the derivative
deviates from the theoretical price, then the actions
of arbitrageurs will drive the market price of 
the derivative toward its theoretical price until 
the arbitrage opportunity is eliminated. 

In developing a strategy to capture any mis¬ 
pricing, certain assumptions are made. When 
these assumptions are not satisfied in the real 
world, the theoretical price can only be ap¬ 
proximated. Moreover, a close examination of 
the underlying assumptions necessary to de¬ 
rive the theoretical price indicates how a pric¬ 
ing formula must be modified to value specific 
contracts. 

This entry explains how futures, forwards, and
options are valued. The valuation
of other derivatives such as swaps, caps, and 
floors is described in other entries. 
PRICING OF
FUTURES/FORWARD
CONTRACTS

The pricing of futures and forward contracts 
is similar. If the underlying asset for both con¬ 
tracts is the same, the difference in pricing is 
due to differences in features of the contract 
that must be dealt with by the pricing model. 
To understand the differences, we begin with a 
definition of the two contracts. 

A futures contract and a forward contract are 
agreements between a buyer and a seller, in 
which the buyer agrees to take delivery of the 
underlying at a specified price at some future 
date and the seller agrees to make delivery of 
the underlying at the specified price at the same 
future date. The terms buyer and seller of the
contract refer to the obligation that each party has
in the future since neither party is obligated to 
transact in the underlying at the time of the 
trade. The futures price in the case of a futures 
contract or forward price in the case of a for¬ 
ward contract is the price at which the parties 
have agreed to transact in the future. The settle¬ 
ment date or delivery date is the future date when 
the two parties have agreed to transact (that is, 
buy or sell the underlying). 

Differences between Futures and 
Forward Contracts 

Futures contracts are standardized agreements 
as to the delivery date (or month) and qual¬ 
ity of the deliverable, and are traded on orga¬ 
nized exchanges. Associated with every futures 
exchange is a clearinghouse. The clearinghouse 
plays an important function: It guarantees that 
both parties to the trade will perform in the 
future. In the absence of a clearinghouse, the 
risk that the two parties face is that in the future,
when both parties are obligated to perform,
one of the parties will default. This risk
faced in any derivative contract is referred to 
as counterparty risk. The clearinghouse allows 
the two parties to enter into a trade without
worrying about counterparty risk with respect
to the counterparty to the trade. The reason is 
that after the trade is executed by the parties, 
the relationship between the two parties is ter¬ 
minated. The clearinghouse interposes itself as 
the buyer for every sale and the seller for every 
purchase. Consequently, the two parties to the 
trade are free to liquidate their positions with¬ 
out involving the original counterparty. 

To protect itself against the counterparty risk 
of both the buyer and seller to the trade, the ex¬ 
change where the contract is traded requires 
that when a position is first taken in a fu¬ 
tures contract, both parties must deposit a min¬ 
imum dollar amount per contract. This amount 
is specified by the exchange and referred to as 
initial margin. The parties have a choice of pro¬ 
viding the initial margin in the form of cash or 
an interest-bearing security such as a Treasury 
bill. As the price of the futures contract fluctu¬ 
ates each trading day, the value of the equity 
of each party in the position changes. The eq¬ 
uity in a futures margin account is measured 
by the sum of all margins posted and all daily 
gains less all daily losses to the account. To fur¬ 
ther protect itself against counterparty risk, the 
exchange specifies that the parties satisfy minimum
equity positions. Maintenance margin is
the minimum level, specified by the exchange,
to which a party's equity position may fall as a
result of an unfavorable price movement before the
party is required to deposit additional margin.
Variation margin is the additional margin that a 
party is required to provide in order to bring the 
equity in the margin account back to its initial 
margin level. If a party fails to furnish variation 
margin within 24 hours, the exchange closes 
the futures position out. Unlike initial margin, 
variation margin must be in cash rather than an 
interest-bearing security. Any excess margin in 
a party's margin account may be withdrawn. 
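
The margin mechanics described above can be sketched as a simple daily loop. The function name run_margin_account and all dollar amounts below are hypothetical; the rule implemented is the one stated in the text: when equity falls below the maintenance margin, variation margin restores it to the initial margin level.

```python
def run_margin_account(initial_margin, maintenance_margin, daily_pnl):
    """Track equity in a futures margin account and the variation margin calls."""
    equity = initial_margin
    calls = []
    for pnl in daily_pnl:
        equity += pnl  # daily mark to market: add the gain or subtract the loss
        call = 0.0
        if equity < maintenance_margin:
            # Variation margin brings equity back to the initial margin level.
            call = initial_margin - equity
            equity = initial_margin
        calls.append(call)
    return equity, calls

# Hypothetical contract: $7,000 initial margin, $5,000 maintenance margin,
# and four days of gains and losses on the position.
final_equity, margin_calls = run_margin_account(
    7000.0, 5000.0, [-1000.0, -1500.0, 500.0, -2600.0]
)
print(final_equity, margin_calls)
```

The sketch ignores withdrawals of excess margin and the 24-hour deadline for meeting a call.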

In pricing futures contracts, the potential interim
cash flows of futures contracts must be taken
into account. These arise from variation margin,
in the case of adverse price movements, or from
the withdrawal of cash by a party that experiences
a favorable price movement that leaves the
margin account with excess margin.

We'll now compare these characteristics of a 
futures contract to a forward contract. A forward
contract is an over-the-counter (OTC) instrument.
That is, it is not an exchange-traded
product. A forward contract is usually nonstandardized
because the terms of each contract are
negotiated individually between the parties to 
the trade. Also, there is no clearinghouse for 
trading forward contracts, and secondary mar¬ 
kets are often nonexistent or extremely thin. 

As just explained, futures contracts are 
marked to market at the end of each trading 
day. A forward contract may or may not be 
marked to market, depending on the wishes of 
the two parties. For example, both parties to a 
forward contract may be high-credit-quality en¬ 
tities. The parties may feel comfortable with the 
counterparty risk up to some specified amount 
and not require margin. Or one party may be 
satisfied with the high quality of the counter¬ 
party but the other party may not. In such cases, 
the forward contract may call for the marking to 
market of the position of only one of the coun¬ 
terparties. For a forward contract that is not 
marked to market, there are no interim cash¬ 
flow effects because no additional margin is 
required. 

Other than these differences, which reflect the 
institutional arrangements in the two markets, 
most of what we say about the pricing of fu¬ 
tures contracts applies equally to the pricing of 
forward contracts. 

Basic Futures Pricing Model 

We will illustrate the basic model for pricing 
futures contracts here. By "basic" we mean that 
we are abstracting from the nuances of the
underlying for a specific contract. The issues as¬ 
sociated with applying the basic pricing model 
to some of the more popular futures contracts 
are described in other entries. Moreover, while 
the model described here is said to be a model
for pricing futures, technically, it is a model
for pricing forward contracts with no mark-to- 
market requirements. 

Rather than deriving the formula alge¬ 
braically, the basic pricing model will be 
demonstrated using an illustration. We make 
the following six assumptions for a futures 
contract that has no initial and variation margin
and for which the underlying is asset U:

1. The price of asset U in the cash market is $100.

2. There is a known cash flow for asset U over 
the life of the futures contract. 

3. The cash flow for asset U is $8 per year paid 
quarterly ($2 per quarter). 

4. The next quarterly payment is exactly three 
months from now. 

5. The futures contract requires delivery three 
months from now. 

6. The current three-month interest rate at 
which funds can be lent or borrowed is 4% 
per year. 

The objective is to determine what the fu¬ 
tures price of this contract should be. To do so, 
suppose that the futures price in the market is 
$105. Let's see if that is the correct price. We 
can check this by implementing the following 
simple strategy: 

• Sell the futures contract at $105.

• Purchase asset U in the cash market for $100.

• Borrow $100 for three months at 4% per year
($1 per quarter).

The purchase of asset U is accomplished with
the borrowed funds. Hence, this strategy does
not involve any initial cash outlay. At the end
of three months, the following occurs:

• $2 is received from holding asset U.

• Asset U is delivered to settle the futures
contract.

• The loan is repaid.
This strategy results in the following outcome:

From settlement of the futures contract:
  Proceeds from sale of asset U to settle the futures contract   = $105
  Payment received from investing in asset U for three months    =    2
  Total proceeds                                                 = $107

From the loan:
  Repayment of principal of loan                                 = $100
  Interest on loan (1% for three months)                         =    1
  Total outlay                                                   = $101

Profit from the strategy                                         =   $6
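
The arithmetic of this trade can be written as a short function. This is a sketch using the numbers of the illustration; the function name cash_and_carry_profit and its parameter names are illustrative.

```python
def cash_and_carry_profit(futures_price, cash_price, cash_flow, period_rate):
    """Profit from selling the futures, buying the asset, and borrowing its cost."""
    # Proceeds at settlement: futures price received plus the asset's cash flow.
    proceeds = futures_price + cash_flow
    # Outlay at settlement: loan principal plus interest for the period.
    outlay = cash_price * (1.0 + period_rate)
    return proceeds - outlay

# Futures at $105, asset at $100, $2 quarterly payment, 4% per year for one quarter.
profit = cash_and_carry_profit(105.0, 100.0, 2.0, 0.04 / 4)
print(f"cash-and-carry profit = {profit:.2f}")
```

The price of asset U at settlement does not appear among the arguments, which is why the profit is locked in when the trade is put on.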


The profit of $6 from this strategy is guaran¬ 
teed regardless of what the cash price of asset 
U is three months from now. This is because 
in the preceding analysis of the outcome of the 
strategy, the cash price of asset U three months 
from now never enters the analysis. Moreover, 
this profit is generated with no investment out¬ 
lay; the funds needed to acquire asset U are 
borrowed when the strategy is executed. In fi¬ 
nancial terms, the profit in the strategy we have 
just illustrated arises from a riskless arbitrage 
between the price of asset U in the cash market 
and the price of asset U in the futures market. 

In a well-functioning market, arbitrageurs 
who could realize this riskless profit for a zero 
investment would implement the strategy described
above. Their selling of the futures and buying
of asset U to implement the strategy
would force the futures price down until, at
some price for the futures contract, the arbitrage
profit is eliminated.

This strategy that resulted in the capturing 
of the arbitrage profit is referred to as a cash- 
and-carry trade. The reason for this name is that 
implementation of the strategy involves bor¬ 
rowing cash to purchase the underlying and 
"carrying" that underlying to the settlement 
date of the futures contract. 

From the cash-and-carry trade we see that the 
futures price cannot be $105. Suppose instead 
that the futures price is $95 rather than $105. 


Let's try the following strategy to see if that 
price can be sustained in the market: 

• Buy the futures contract at $95. 

• Sell (short) asset U for $100. 

• Invest (lend) $100 for three months at 4% per
year ($1 per quarter).

We assume once again that in this strategy
there is no initial margin and variation margin
for the futures contract. In addition, we assume
that there is no cost to selling the asset
short and lending the money. Given these as¬ 
sumptions, there is no initial cash outlay for the 
strategy just as with the cash-and-carry trade. 
Three months from now, 

• Asset U is purchased to settle the long posi¬ 
tion in the futures contract. 

• Asset U is accepted for delivery. 

• Asset U is used to cover the short position in 
the cash market. 

• Payment is made of $2 to the lender of asset U 
as compensation for the quarterly payment. 

• Payment is received from the borrower of the 
loan of $101 for principal and interest. 

More specifically, the strategy produces the 
following at the end of three months: 

From settlement of the futures contract:
  Price paid for purchase of asset U to settle futures contract  =  $95
  Proceeds to lender of asset U to borrow the asset              =    2
  Total outlay                                                   =  $97

From the loan:
  Principal from loan                                            = $100
  Interest earned on loan ($1 for three months)                  =    1
  Total proceeds                                                 = $101

Profit from the strategy                                         =   $4
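
The same arithmetic, run in the opposite direction, gives the profit from this trade. The sketch below uses the numbers of the illustration and the 4% annual rate from assumption 6; the function name reverse_cash_and_carry_profit is illustrative.

```python
def reverse_cash_and_carry_profit(futures_price, cash_price, cash_flow, period_rate):
    """Profit from buying the futures, shorting the asset, and lending the proceeds."""
    # Proceeds at settlement: short-sale proceeds returned with interest.
    proceeds = cash_price * (1.0 + period_rate)
    # Outlay at settlement: futures price paid on delivery plus the payment
    # owed to the lender of the asset.
    outlay = futures_price + cash_flow
    return proceeds - outlay

# Futures at $95, asset shorted at $100, $2 quarterly payment, 1% for the quarter.
profit = reverse_cash_and_carry_profit(95.0, 100.0, 2.0, 0.04 / 4)
print(f"reverse cash-and-carry profit = {profit:.2f}")
```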


As with the cash-and-carry trade, the $4 profit from
this strategy is a riskless arbitrage profit. This 
strategy requires no initial cash outlay, but will 
generate a profit whatever the price of asset U is 
in the cash market at the settlement date. In
real-world markets, this opportunity would lead
arbitrageurs to buy the futures contract and short
asset U. The implementation of this strategy
would raise the futures price until the
arbitrage profit disappeared.

This strategy that is implemented to capture 
the arbitrage profit is known as a reverse cash- 
and-carry trade. That is, with this strategy, the 
underlying is sold short and the proceeds re¬ 
ceived from the short sale are invested. 

We can see that the futures price cannot be 
$95 or $105. What is the theoretical futures price 
given the assumptions in our illustration? It can 
be shown that if the futures price is $99 there is 
no opportunity for an arbitrage profit. That is, 
neither the cash-and-carry trade nor the reverse 
cash-and-carry trade will generate an arbitrage 
profit. 

In general, the formula for determining the 
theoretical futures price given the assumptions 
of the model is: 

Theoretical futures price
= Cash market price + (Cash market price)
  × (Financing cost - Cash yield)                (1)

In the formula given by (1), "Financing cost" 
is the interest rate to borrow funds and "Cash 
yield" is the payment received from investing 
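in the asset as a percentage of the cash price.
Both rates are measured over the life of the
contract. In our illustration, for the three-month
period the financing cost is 1% and the
cash yield is 2% ($2 on a cash price of $100).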

In our illustration, since the cash price of asset 
U is $100, the theoretical futures price is: 

$100 + $100 × (1% - 2%) = $99

The futures price can be above or below the
cash price depending on the difference between 
the financing cost and cash yield. The difference 
between these rates is called the net financing 
cost. A more commonly used term for the net fi¬ 
nancing cost is the cost of carry, or simply, carry. 
Positive carry means that the cash yield exceeds 
the financing cost. (Note that while the differ¬ 
ence between the financing cost and the cash 
yield is a negative value, carry is said to be pos¬ 
itive.) Negative carry means that the financing 
cost exceeds the cash yield. Below is a summary
of the effect of carry on the difference between
the futures price and the cash market price:

Positive carry   Futures price will sell at a discount to cash price.
Negative carry   Futures price will sell at a premium to cash price.
Zero carry       Futures price will be equal to the cash price.
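
Formula (1) and the carry summary can be combined in a short sketch. The function names are illustrative, and apart from the first case (the 1% financing cost and 2% cash yield of the illustration) the rates are hypothetical.

```python
def theoretical_futures_price(cash_price, financing_cost, cash_yield):
    """Formula (1): both rates are expressed over the life of the contract."""
    return cash_price + cash_price * (financing_cost - cash_yield)

def carry_sign(financing_cost, cash_yield):
    """Classify carry following the convention in the text."""
    if cash_yield > financing_cost:
        return "positive carry"
    if cash_yield < financing_cost:
        return "negative carry"
    return "zero carry"

# First case is the illustration; the other two are hypothetical.
cases = [(0.01, 0.02), (0.02, 0.01), (0.01, 0.01)]
for financing_cost, cash_yield in cases:
    price = theoretical_futures_price(100.0, financing_cost, cash_yield)
    print(f"{carry_sign(financing_cost, cash_yield):15s} futures price = {price:.2f}")
```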


Note that at the settlement date of the futures 
contract, the futures price must equal the cash 
market price. The reason is that a futures con¬ 
tract with no time left until delivery is equiva¬ 
lent to a cash market transaction. Thus, as the 
delivery date approaches, the futures price will 
converge to the cash market price. This fact is 
evident from the formula for the theoretical fu¬ 
tures price given by (1). The financing cost ap¬ 
proaches zero as the delivery date approaches. 
Similarly, the yield that can be earned by holding
the underlying approaches zero. Hence, the
cost of carry approaches zero, and the futures 
price approaches the cash market price. 

A Closer Look at the Theoretical 
Futures Price 

In deriving the theoretical futures price using the
arbitrage argument, several assumptions had 
to be made. These assumptions as well as the 
differences in contract specifications will result 
in the futures price in the market deviating from 
the theoretical futures price as given by (1). It 
may be possible to incorporate these institu¬ 
tional and contract specification differences into 
the formula for the theoretical futures price. In 
general, however, because it is oftentimes too 
difficult to allow for these differences in build¬ 
ing a model for the theoretical futures price, 
the end result is that one can develop bands 
or boundaries for the theoretical futures price. 
So long as the futures price in the market re¬ 
mains within the band, no arbitrage opportu¬ 
nity is possible. 

Next, we will look at some of the institutional 
and contract specification differences that cause 
prices to deviate from the theoretical futures
price as given by the basic pricing model. 

Interim Cash Flows 

In the derivation of a basic pricing model, it is 
assumed that no interim cash flows arise be¬ 
cause of changes in futures prices (that is, there 
is no variation margin). As noted earlier, in the 
absence of initial and variation margins, the 
theoretical price for the contract is technically 
the theoretical price for a forward contract that 
is not marked to market rather than a futures 
contract. 

In addition, the model assumes implicitly that 
any dividends or coupon interest payments are 
paid at the settlement date of the futures con¬ 
tract rather than at any time between initiation 
of the cash position and settlement of the futures
contract. However, we know that the
underlying for financial futures contracts
does have interim cash flows. Consider
stock index futures contracts and bond
futures contracts. 

For a stock index, there are interim cash flows. 
In fact, there are many cash flows that are dependent
upon the dividend dates of the component
companies. To correctly price a stock index
futures contract, it is necessary to incorporate the
interim dividend payments. Yet, the dividend 
rate and the pattern of dividend payments are 
not known with certainty. Consequently, they 
must be projected from the historical dividend 
payments of the companies in the index. Once 
the dividend payments are projected, they can 
be incorporated into the pricing model. The 
only problem is that the value of the dividend 
payments at the settlement date will depend 
on the interest rate at which the dividend 
payments can be reinvested from the time they 
are projected to be received until the settlement 
date. The lower the dividend, and the closer 
the dividend payments to the settlement date 
of the futures contract, the less important the 
reinvestment income is in determining the 
futures price. 


In the case of a Treasury futures contract, the 
underlying is a Treasury note or a Treasury 
bond. Unlike a stock index futures contract, 
the timing of the interest payments that will be 
made by the U.S. Department of the Treasury 
for a given issue that is acceptable as deliver¬ 
able for a contract is known with certainty and 
can be incorporated into the pricing model. 
However, the reinvestment interest that can 
be earned from the payment dates to the 
settlement of the contract is unknown and 
depends on prevailing interest rates at each 
payment date. 

Differences in Borrowing and 
Lending Rates 

In the formula for the theoretical futures price, it 
is assumed in the cash-and-carry trade and the 
reverse cash-and-carry trade that the borrow¬ 
ing rate and lending rate are equal. Typically, 
however, the borrowing rate is higher than the 
lending rate. The impact of this inequality is 
important and easy to quantify. 

In the cash-and-carry trade, the theoretical 
futures price as given by (1) becomes: 

Theoretical futures price based on borrowing rate
= Cash market price + (Cash market price)
× (Borrowing rate − Cash yield) (2)

For the reverse cash-and-carry trade, it 
becomes 

Theoretical futures price based on lending rate
= Cash market price + (Cash market price)
× (Lending rate − Cash yield) (3)

Formulas (2) and (3) together provide a band 
between which the actual futures price can exist 
without allowing for an arbitrage profit. Equa¬ 
tion (2) establishes the upper value for the band 
while equation (3) provides the lower value for 
the band. For example, assume that the bor¬ 
rowing rate is 6% per year, or 1.5% for three 
months, while the lending rate is 4% per year, 
or 1% for three months. Using equation (2), the 
upper value for the theoretical futures price is
$99.5 and using equation (3) the lower value for 
the theoretical futures price is $99. 
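The band defined by equations (2) and (3) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the cash market price of $100 and three-month cash yield of 2% are assumptions chosen because they are the values implied by the $99.5 and $99 figures quoted above.

```python
def futures_price_band(cash_price, borrowing_rate, lending_rate, cash_yield):
    """Return (lower, upper) bounds for the theoretical futures price.

    All rates are for the period to settlement (not annualized).
    """
    # Equation (2): cash-and-carry trade financed at the borrowing rate.
    upper = cash_price + cash_price * (borrowing_rate - cash_yield)
    # Equation (3): reverse cash-and-carry trade invested at the lending rate.
    lower = cash_price + cash_price * (lending_rate - cash_yield)
    return lower, upper


# Assumed inputs: $100 cash price, 2% three-month cash yield,
# 1.5% three-month borrowing rate, 1% three-month lending rate.
lower, upper = futures_price_band(100.0, 0.015, 0.01, 0.02)
print(f"band: {lower:.2f} to {upper:.2f}")
```

Any market futures price inside the band offers no arbitrage profit under these assumptions; transaction costs would widen the band further.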

Transaction Costs 

The two strategies to exploit any price discrep¬ 
ancies between the cash market and theoretical 
price for the futures contract will require the 
arbitrageur to incur transaction costs. In real- 
world financial markets, the costs of entering 
into and closing the cash position as well as 
round-trip transaction costs for the futures con¬ 
tract affect the futures price. As in the case of 
differential borrowing and lending rates, trans¬ 
action costs widen the bands for the theoretical 
futures price. 

Short Selling 

The reverse cash-and-carry trade requires
the short selling of the underlying. It is assumed 
in this strategy that the proceeds from the short 
sale are received and reinvested. In practice, for 
individual investors, the proceeds are not re¬ 
ceived, and, in fact, the individual investor is 
required to deposit margin (securities margin 
and not futures margin) to short sell. For insti¬ 
tutional investors, the underlying may be bor¬ 
rowed, but there is a cost to borrowing. This 
cost of borrowing can be incorporated into the 
model by reducing the cash yield on the un¬ 
derlying. For strategies applied to stock index 
futures, a short sale of the component stocks
in the index means that all stocks in the index 
must be sold simultaneously. This may be diffi¬ 
cult to do and therefore would widen the band 
for the theoretical futures price.

Known Deliverable Asset and 
Settlement Date 

In the two strategies to arbitrage discrepancies 
between the theoretical futures price and the 
cash market price, it is assumed that (1) only 
one asset is deliverable and (2) the settlement 
date occurs at a known, fixed point in the future.
Neither assumption is consistent with the
delivery rules for some futures contracts. For
U.S. Treasury note and bond futures contracts, 
for example, the contract specifies that any one 
of several Treasury issues that is acceptable for 
delivery can be delivered to satisfy the contract. 
Such issues are referred to as deliverable issues. 
The selection of which deliverable issue to de¬ 
liver is an option granted to the party who is 
short the contract (that is, the seller). Hence, the
party that is long the contract (that is, the buyer
of the contract) does not know the specific
Treasury issue that will be delivered. However,
market participants can determine the
cheapest-to-deliver issue from the issues that 
are acceptable for delivery. It is this issue that is 
used in obtaining the theoretical futures price. 
The net effect of the short's option to select the 
issue to deliver to satisfy the contract is that it reduces
the theoretical futures price by an amount
equal to the value of the delivery option granted 
to the short. 

Moreover, unlike other futures contracts, the 
Treasury bond and note contracts do not have a 
delivery date. Instead, there is a delivery month. 
The short has the right to select when in the 
delivery month to make delivery. The effect of 
this option granted to the short is once again to 
reduce the theoretical futures price below that 
given by equation (1). More specifically,

Theoretical futures price adjusted for delivery options
= Cash market price + (Cash market price)
× (Financing cost − Cash yield) − Value of the
delivery options granted to the short (4)

Deliverable as a Basket of Securities

Some futures contracts have as the underlying a
basket of assets or an index. Stock index futures 
are the most obvious example. At one time, mu¬ 
nicipal futures contracts were actively traded 
and the underlying was a basket of municipal 
securities. The problem in arbitraging futures 
contracts in which there is a basket of assets 
or an index is that it may be too expensive to 
buy or sell every asset included in the basket or 
index. Instead, a portfolio containing a smaller 
number of assets may be constructed to track
the basket or index (which means having price 
movements that are very similar to changes in 
the basket or index). Nonetheless, the two ar¬ 
bitrage strategies involve a tracking portfolio 
rather than a single asset for the underlying, and 
the strategies are no longer risk-free because of 
the risk that the tracking portfolio will not pre¬ 
cisely replicate the performance of the basket 
or index. For this reason, the market price of 
futures contracts based on baskets of assets or 
an index is likely to diverge from the theoretical 
price and have wider bands. 

Different Tax Treatment of Cash and Futures 
Transactions

Participants in the financial market cannot ig¬ 
nore the impact of taxes on a trade. The strate¬ 
gies that are implemented to exploit arbitrage 
opportunities between prices in the cash and fu¬ 
tures markets and the resulting pricing model 
must recognize that there are differences in the 
tax treatment under the tax code for cash and 
futures transactions. The impact of taxes must 
be incorporated into the pricing model. 


PRICING OF OPTIONS 

Now we will look at the basic principles for 
valuing options. There are two parties to an op¬ 
tion contract: the buyer and the writer or seller. 
The writer of the option grants the buyer of the 
option the right, but not the obligation, to either 
purchase from or sell to the writer something at 
a specified price within a specified period of 
time (or at a specified date). In exchange for 
the right that the writer grants the buyer, the 
buyer pays the writer a certain sum of money. 
This sum is called the option price or option 
premium. The price at which the underlying 
may be purchased or sold is called the exercise 
price or strike price. The option's expiration date 
(or maturity date) is the last date at which the 
option buyer can exercise the option. After the
expiration date, the contract is void and has no
value. 

There are two types of options: call options 
and put options. A call option, or simply call, is 
one in which the option writer grants the buyer 
the right to purchase the underlying. When the 
option writer grants the buyer the right to sell 
the underlying, the option is called a put option, 
or simply, a put. 

The timing of the possible exercise of an op¬ 
tion is an important characteristic of an option 
contract. An American option allows the option 
buyer to exercise the option at any time up to 
and including the expiration date. A European 
option allows the option buyer to exercise the 
option only on the expiration date. 

As with futures and forward contracts, the 
theoretical price of an option is also derived 
based on arbitrage arguments. However, as will 
be explained, the pricing of options is not as 
simple as the pricing of futures and forward 
contracts. 

Basic Components of the 
Option Price 

The theoretical price of an option is made up 
of two components: the intrinsic value and a 
premium over intrinsic value. 

Intrinsic Value 

The intrinsic value is the option's economic 
value if it is exercised immediately. If no pos¬ 
itive economic value would result from exer¬ 
cising immediately, the intrinsic value is zero. 
An option's intrinsic value is easy to compute 
given the price of the underlying and the strike 
price. 

For a call option, the intrinsic value is the dif¬ 
ference between the current market price of the 
underlying and the strike price. If that differ¬ 
ence is positive, then the intrinsic value equals 
that difference; if the difference is zero or neg¬ 
ative, then the intrinsic value is equal to zero. 
For example, if the strike price for a call option 
is $100 and the current price of the underlying 
is $109, the intrinsic value is $9. That is, an option
buyer exercising the option and simultaneously
selling the underlying would realize $109
from the sale of the underlying, which would 
be covered by acquiring the underlying from 
the option writer for $100, thereby netting a $9 
gain. 

An option that has a positive intrinsic value is 
said to be in-the-money. When the strike price of 
a call option exceeds the underlying's market 
price, it has no intrinsic value and is said to be 
out-of-the-money. An option for which the strike 
price is equal to the underlying's market price 
is said to be at-the-money. Both at-the-money 
and out-of-the-money options have intrinsic 
values of zero because it is not profitable to 
exercise them. Our call option with a strike 
price of $100 would be (1) in the money when 
the market price of the underlying is more than 
$100; (2) out of the money when the market 
price of the underlying is less than $100, and 
(3) at the money when the market price of the 
underlying is equal to $100. 

For a put option, the intrinsic value is equal to 
the amount by which the underlying's market 
price is below the strike price. For example, if 
the strike price of a put option is $100 and the 
market price of the underlying is $95, the in¬ 
trinsic value is $5. That is, the buyer of the put 
option who simultaneously buys the underly¬ 
ing and exercises the put option will net $5 by 
exercising. The underlying will be sold to the 
writer for $100 and purchased in the market for 
$95. With a strike price of $100, the put option 
would be (1) in the money when the underly¬ 
ing's market price is less than $100, (2) out of 
the money when the underlying's market price 
exceeds $100, and (3) at the money when the 
underlying's market price is equal to $100. 
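The intrinsic value rules for calls and puts can be sketched as follows. The function names are hypothetical; the numbers reproduce the $100-strike examples given in the text.

```python
def intrinsic_value(option_type, underlying_price, strike_price):
    """Economic value of exercising immediately, floored at zero."""
    if option_type == "call":
        return max(underlying_price - strike_price, 0.0)
    if option_type == "put":
        return max(strike_price - underlying_price, 0.0)
    raise ValueError("option_type must be 'call' or 'put'")


def moneyness(option_type, underlying_price, strike_price):
    """Classify an option as in-, at-, or out-of-the-money."""
    if underlying_price == strike_price:
        return "at-the-money"
    if intrinsic_value(option_type, underlying_price, strike_price) > 0:
        return "in-the-money"
    return "out-of-the-money"


print(intrinsic_value("call", 109.0, 100.0), moneyness("call", 109.0, 100.0))
print(intrinsic_value("put", 95.0, 100.0), moneyness("put", 95.0, 100.0))
```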

Time Premium 

The time premium of an option, also referred to 
as the time value of the option, is the amount 
by which the option's market price exceeds its 
intrinsic value. It is the expectation of the option
buyer that at some time prior to the expiration
date changes in the market price of
the underlying will increase the value of the 
rights conveyed by the option. Because of this 
expectation, the option buyer is willing to pay 
a premium above the intrinsic value. For exam¬ 
ple, if the price of a call option with a strike 
price of $100 is $12 when the underlying's mar¬ 
ket price is $104, the time premium of this op¬ 
tion is $8 ($12 minus its intrinsic value of $4). 
Had the underlying's market price been $95 instead
of $104, then the time premium of this
option would be the entire $12 because the op¬ 
tion has no intrinsic value. All other things be¬ 
ing equal, the time premium of an option will 
increase with the amount of time remaining to 
expiration. 
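The decomposition of a call price into intrinsic value and time premium can be checked with a short sketch. The function name is hypothetical; the inputs are the $12 call with a $100 strike used in the example above.

```python
def call_time_premium(option_price, underlying_price, strike_price):
    """Time premium = option price minus intrinsic value (call option)."""
    intrinsic = max(underlying_price - strike_price, 0.0)
    return option_price - intrinsic


# Underlying at $104: intrinsic value is $4, so the time premium is $8.
print(call_time_premium(12.0, 104.0, 100.0))
# Underlying at $95: no intrinsic value, so the whole $12 is time premium.
print(call_time_premium(12.0, 95.0, 100.0))
```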

An option buyer has two ways to realize the 
value of an option position. The first way is by 
exercising the option. The second way is to sell 
the option in the market. In the first example 
above, selling the call for $12 is preferable to 
exercising, because the exercise will realize only 
$4 (the intrinsic value), but the sale will realize 
$12. As this example shows, exercise causes 
the immediate loss of any time premium. It is 
important to note that there are circumstances 
under which an option may be exercised prior 
to the expiration date. These circumstances 
depend on whether the total proceeds at the 
expiration date would be greater by holding 
the option or exercising and reinvesting any re¬ 
ceived cash proceeds until the expiration date. 

Put-Call Parity Relationship 

For a European put and a European call option 
with the same underlying, strike price, and ex¬ 
piration date, there is a relationship between the 
price of a call option, the price of a put option, 
the price of the underlying, and the strike price. 
This relationship is known as the put-call parity 
relationship. The relationship is: 

Put option price − Call option price = Present value
of strike price + Present value of cash distribution
− Price of underlying
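As a rough illustration, the parity relationship can be rearranged to back out a European put price from a European call price. The sketch below assumes continuous compounding at a single risk-free rate to compute the present value of the strike price; that discounting convention, the function name, and the sample inputs are assumptions, not part of the relationship itself.

```python
import math


def put_from_call(call_price, underlying_price, strike_price,
                  rate, time_to_expiry, pv_cash_distribution=0.0):
    """European put price implied by put-call parity.

    Put = Call + PV(strike) + PV(cash distribution) - underlying price.
    PV(strike) uses continuous compounding (an assumed convention).
    """
    pv_strike = strike_price * math.exp(-rate * time_to_expiry)
    return call_price + pv_strike + pv_cash_distribution - underlying_price


# Hypothetical inputs: $10 call, $100 underlying, $100 strike,
# 5% rate, one year to expiration, no cash distribution.
print(round(put_from_call(10.0, 100.0, 100.0, 0.05, 1.0), 4))
```

If the quoted put price differs from this value by more than transaction costs, the parity argument says an arbitrage position can in principle be constructed.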




Factors That Influence the 
Option Price 

The factors that affect the price of an option 
include the: 

• Market price of the underlying. 

• Strike price of the option. 

• Time to expiration of the option. 

• Expected volatility of the underlying over the 
life of the option. 

• Short-term, risk-free interest rate over the life 
of the option. 

• Anticipated cash payments on the underlying 
over the life of the option. 

The impact of each of these factors may de¬ 
pend on whether (1) the option is a call or a 
put, and (2) the option is an American option 
or a European option. Table 1 summarizes how 
each of the six factors listed above affects the 
price of a put and call option. Here, we briefly 
explain why the factors have the particular 
effects. 

Market Price of the Underlying Asset

The option price will change as the price of the
underlying changes. For a call option, as the 
underlying's price increases (all other factors 
being constant), the option price increases. The 
opposite holds for a put option: As the price 
of the underlying increases, the price of a put 
option decreases. 


Table 1 Summary of Factors That Affect the Price of an Option

                                      Effect of an Increase of Factor On
Factor                                Call Price      Put Price
Market price of underlying            Increase        Decrease
Strike price                          Decrease        Increase
Time to expiration of option          Increase        Increase
Expected volatility                   Increase        Increase
Short-term, risk-free interest rate   Increase        Decrease
Anticipated cash payments             Decrease        Increase


Strike Price 

The strike price is fixed for the life of the op¬ 
tion. All other factors being equal, the lower 
the strike price, the higher the price for a call 
option. For put options, the higher the strike 
price, the higher the option price. 

Time to Expiration of the Option 

After the expiration date, an option has no 
value. All other factors being equal, the longer 
the time to expiration of the option, the higher 
the option price. This is because, as the time to 
expiration decreases, less time remains for the 
underlying's price to rise (for a call buyer) or 
fall (for a put buyer), and therefore the proba¬ 
bility of a favorable price movement decreases. 
Consequently, as the time remaining until ex¬ 
piration decreases, the option price approaches 
its intrinsic value. 

Expected Volatility of the Underlying over the 
Life of the Option 

All other factors being equal, the greater the ex¬ 
pected volatility (as measured by the standard 
deviation or variance) of the underlying, the 
more the option buyer would be willing to pay 
for the option, and the more an option writer 
would demand for it. This occurs because the 
greater the expected volatility, the greater the 
probability that the movement of the underly¬ 
ing will change so as to benefit the option buyer 
at some time before expiration. 

Short-Term, Risk-Free Interest Rate over the 
Life of the Option 

Buying the underlying requires an investment 
of funds. Buying an option on the same quan¬ 
tity of the underlying makes the difference be¬ 
tween the underlying's price and the option 
price available for investment at an interest 
rate at least as high as the risk-free rate. Con¬ 
sequently, all other factors being constant, the 
higher the short-term, risk-free interest rate, the 
greater the cost of buying the underlying and
carrying it to the expiration date of the call 
option. Hence, the higher the short-term, risk¬ 
free interest rate, the more attractive the call 
option will be relative to the direct purchase of 
the underlying. As a result, the higher the short¬ 
term, risk-free interest rate, the greater the price 
of a call option. 

Anticipated Cash Payments on the 
Underlying over the Life of the Option 

Cash payments on the underlying tend to de¬ 
crease the price of a call option because the cash 
payments make it more attractive to hold the 
underlying than to hold the option. For put op¬ 
tions, cash payments on the underlying tend to 
increase the price. 

Option Pricing Models 

Earlier in this entry, it was explained how the 
theoretical price of a futures contract and for¬ 
ward contract can be determined on the ba¬ 
sis of arbitrage arguments. An option pricing 
model uses a set of assumptions and arbi¬ 
trage arguments to derive a theoretical price 
for an option. Deriving a theoretical option 
price is much more complicated than deriv¬ 
ing a theoretical futures or forward price be¬ 
cause the option price depends on the expected 
volatility of the underlying over the life of the 
option. 

Several models have been developed to de¬ 
termine the theoretical price of an option. The 
most popular one was developed by Fischer 
Black and Myron Scholes (1973) for valuing 
European call options on common stock. The 
Black-Scholes model requires as input the six 
factors discussed above that affect the value of 
an option. Several modifications to the Black- 
Scholes model followed. One such model is 
the lattice model suggested by Cox, Ross, and 
Rubinstein (1979), Rendleman and Bartter 
(1979), and Sharpe (1981). 
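To make the six inputs concrete, here is a minimal sketch of the Black-Scholes formula for European options. It is not taken from the text: it uses the common variant in which anticipated cash payments are modeled as a continuous yield q (an assumption), and the function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist


def black_scholes(option_type, spot, strike, time_to_expiry,
                  volatility, rate, cash_yield=0.0):
    """European option price with a continuous cash yield on the underlying."""
    cdf = NormalDist().cdf
    sqrt_t = math.sqrt(time_to_expiry)
    d1 = (math.log(spot / strike)
          + (rate - cash_yield + 0.5 * volatility ** 2) * time_to_expiry) \
        / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    disc_spot = spot * math.exp(-cash_yield * time_to_expiry)
    disc_strike = strike * math.exp(-rate * time_to_expiry)
    if option_type == "call":
        return disc_spot * cdf(d1) - disc_strike * cdf(d2)
    if option_type == "put":
        return disc_strike * cdf(-d2) - disc_spot * cdf(-d1)
    raise ValueError("option_type must be 'call' or 'put'")


# Hypothetical inputs: spot 100, strike 100, 1 year, 20% volatility, 5% rate.
base = black_scholes("call", 100.0, 100.0, 1.0, 0.20, 0.05)
bumped = black_scholes("call", 100.0, 100.0, 1.0, 0.25, 0.05)
print(round(base, 4), bumped > base)  # higher volatility raises the call price
```

Bumping one input at a time in this way gives a quick numerical check of the directions summarized in Table 1.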


Basically, the idea behind the arbitrage ar¬ 
gument is that if the payoff from owning a 
call option can be replicated by (1) purchas¬ 
ing the underlying for the call option and (2) 
borrowing funds to purchase the underlying, 
then the cost of creating the replicating strat¬ 
egy (position) is the theoretical price of the 
option. 
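The replication argument can be illustrated with a one-period, two-state sketch in the spirit of the lattice model. Everything in it is an assumption made for illustration: the underlying moves from 100 to either 110 or 90, the strike is 100, and the one-period borrowing rate is 5%.

```python
def replicate_call_one_period(spot, up_price, down_price, strike, rate):
    """Return (shares, loan, cost) of a portfolio replicating a call.

    The portfolio buys `shares` units of the underlying and borrows `loan`
    at the one-period simple rate `rate`.
    """
    call_up = max(up_price - strike, 0.0)
    call_down = max(down_price - strike, 0.0)
    # Shares chosen so the portfolio payoff differs across states
    # by exactly as much as the call payoff does.
    shares = (call_up - call_down) / (up_price - down_price)
    # Loan chosen so the payoffs match in the down state (hence in both).
    loan = (shares * down_price - call_down) / (1.0 + rate)
    cost = shares * spot - loan
    return shares, loan, cost


shares, loan, cost = replicate_call_one_period(100.0, 110.0, 90.0, 100.0, 0.05)
print(shares, round(loan, 4), round(cost, 4))
```

Because the portfolio pays the same as the call in both states, its cost is the theoretical call price; any other market price would permit an arbitrage.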


KEY POINTS 

• For futures and forward contracts, the theoretical
price can be derived using arbitrage arguments.
Specifically, a cash-and-carry trade
can be implemented to capture the arbitrage 
profit for an overpriced futures or forward 
contract while a reverse cash-and-carry trade 
can be implemented to capture the arbitrage 
profit for an underpriced futures or forward 
contract. 

• The basic model states that the theoretical futures
price is equal to the cash market price
plus the net financing cost. The net financing 
cost, also called the cost of carry, is the differ¬ 
ence between the financing cost and the cash 
yield on the underlying. 

• Because of institutional and contract specification
differences, the market price for the futures
or forward contract can deviate from the
theoretical price without any arbitrage oppor¬ 
tunities being possible. Basically, a band can 
be established for the theoretical futures price 
and as long as the market price for the futures 
contract is not outside of the band, there is no 
arbitrage opportunity. 

• The two components of the price of an option
are the intrinsic value and the time premium. 
The former is the economic value of the op¬ 
tion if it is exercised immediately, while the 
latter is the amount by which the option price 
exceeds the intrinsic value. 

• The option price is affected by six factors:
(1) the market price of the underlying; (2) 
the strike price of the option; (3) the time 
remaining to the expiration of the option;
(4) the expected volatility of the underlying as 
measured by the standard deviation; (5) the 
short-term, risk-free interest rate over the life 
of the option; and (6) the anticipated cash pay¬ 
ments on the underlying. 

• It is the uncertainty about the expected 
volatility of the underlying that makes valu¬ 
ing options more complicated than valuing 
futures and forward contracts. 

• There are various models for determining 
the theoretical price of an option. These in¬ 
clude the Black-Scholes model and the lattice 
model. 
Abstract: Interest rate modeling has quickly become one of the main areas in financial markets. The
models have grown in sophistication in response to development of new products and structures. 
Almost all pricing of securities and the risk management function, including marking-to-market, 
relies on interest rate modeling of some description. The information on interest rates, usually 
conveyed from the options markets, is important for other markets as well, such as the more 
established credit risk, commodities, equities, and the more recent ones such as inflation derivatives 
and insurance derivatives. Many models have been developed over the years, and their advantages 
and disadvantages should be appreciated and understood when they are applied. 


Throughout the world, interest rates serve as 
instruments of control. When inflation rises to 
an undesirable or politically unacceptable level, 
the appropriate authorities raise interest rates to 
curb expenditure. In times when economic ac¬ 
tivity and corporate and consumer confidence 
is less buoyant, the policy is to lower rates. In¬ 
terest rate derivatives were among the first con¬ 
tracts to be offered on derivative exchanges and 
have their origins in the period following the 
breakdown of the Bretton Woods Agreement. In 
today's sometimes volatile markets, they con¬ 
tinue to be extremely useful tools for corporates, 
banks, and individuals from hedging, financial 
engineering, and speculative perspectives. 

Two of the early prime movers in the interest 
rate derivatives market were the Chicago Board
of Trade (CBOT) and the Chicago Mercantile
Exchange (CME). (In July 2007 the CBOT and 
CME merged to form the CME Group.) Some of
the contracts that were introduced in the 1970s 
are still popular today, as evidenced by the high 
volume they enjoy. 

At the short end of the yield curve, the CME 
has the world's most actively traded exchange- 
based interest rate option contracts: Eurodollar 
options. Each Eurodollar option has as its 
underlying a Eurodollar time deposit futures 
contract with a principal value of $1 million, 
which will be cash settled at maturity. Another 
high-volume contract available on the CME 
is the option on the one-month Eurodollar 
futures contract. This, too, is cash settled at 
maturity. 
The CME also has option contracts on U.S. Treasury
bonds and notes in its portfolio of interest
rate derivative products. There are American-style
options available on bonds with a maturity
of at least 25 years, on bonds with a maturity of
at least 15 but less than 25 years, and on 10-year,
5-year, 3-year, and 2-year notes, the most heavily
traded of which is the 10-year
Treasury note. The option contract has as its
deliverable one U.S. Treasury 10-year note fu¬ 
tures contract with a face value of $100,000 at 
maturity. Whereas the CME futures contracts 
on short-term interest rates are cash settled, the 
bond/note futures require physical settlement.
The corresponding option contracts have simi¬ 
lar settlement requirements identified in their 
contract specifications. The CME Group lists 
seven international data vendors who provided 
quotes for call/put options across a range of 
strike prices and maturities. 

Although option prices are easy to read and 
interpret from vendor screens, there is a mass 
of academic and practitioner research literature, 
which provides a platform from which bond op¬ 
tion prices in general can be calculated with in¬ 
tegrity. The literature on modeling interest rate 
derivatives in this arena is frequently divided 
into one- or two-factor (or multifactor) models. 

• A one-factor model for calculating option
prices usually assumes that the process is
driven by the short rate, often with a
mean-reversion feature.
There are several popular models that fall into 
this category, for example, the Vasicek model 
and the Cox, Ingersoll, and Ross (CIR) model, 
both of which will be discussed in more detail 
later. 

• Calculating option prices in a two-factor 
model involves both the short- and long-term 
rates linked by a mean-reversion process. 

The problem with some of the preceding mod¬ 
els is that they generate their own term struc¬ 
tures, which, in the absence of adjustment, do 
not match the term structure observed in the 
market. A category of arbitrage-free models 
proposed by Ho and Lee (1986); Hull and White
(1990); and Black, Derman, and Toy (1990) seeks
to eliminate this problem. For example, the 
Black, Derman, and Toy model enjoys a degree 
of popularity among market practitioners, since 
it takes account of and matches the term struc¬ 
ture observed in the market, it eliminates the 
possibility of generating negative interest rates, 
and it models the observed interest rate volatil¬ 
ity. These models together with other proposi¬ 
tions will be discussed in more detail in this 
entry. 

In order to examine some of the major de¬ 
velopments in option/derivative pricing in the 
interest rate field, it is appropriate at this point 
to establish a working framework. 


MODELING THE TERM 
STRUCTURE AND 
BOND PRICES 

Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, Q)$ be a filtered probability
space modeling a financial market, where the
filtration $\mathbb{F} = \{\mathcal{F}_t\}_{t \ge 0}$ describes the flux of
information and the probability measure $Q$ denotes
the risk-neutral measure; the real-world
or physical measure will be denoted by $P$.
The starting point in modeling bond prices is
the assumption that there is a bank account
$B = \{B(t)\}_{t \ge 0}$ that is linked to the bank instantaneous
interest rate (also called short rate, spot
rate) process $r = \{r(t)\}_{t \ge 0}$ through

$$dB(t) = r(t)B(t)\,dt \quad \text{or} \quad B(t) = B(0)\exp\left(\int_0^t r(s)\,ds\right) \tag{1}$$

From a practical point of view, we can safely
assume that the majority of stochastic processes
representing prices of traded financial assets are
adapted to the filtration $\mathbb{F}$ and that the short-rate
process $r = \{r(t)\}_{t \ge 0}$ is a predictable process,
meaning that $r(t)$ is $\mathcal{F}_{t-1}$ measurable. This
implies that $B(t)$ is also $\mathcal{F}_{t-1}$ measurable and
this condition is automatically satisfied for continuous
or left-continuous processes.
In this entry we consider only default-free securities.
We shall denote by $p(t, T)$ the price at
time $t$ of a pure discount bond with maturity $T$
and obviously $p(t,t) = p(T,T) = 1$.

The following relationships are well known
in the fixed-income area:

$$0 < p(t,T) < 1, \qquad r(t) = -\left.\frac{\partial \ln p(t,T)}{\partial T}\right|_{T=t} = \left.\frac{\partial \ln p(t,T)}{\partial t}\right|_{T=t}, \qquad \text{for any } t < T \tag{2}$$


Let $f(t, s)$ be the forward rate at time $s > 0$
calculated at time $t < s$. The instantaneous forward
rate at time $t$ to borrow at time $T$ can be
calculated from the bond prices using

$$f(t,T) = -\frac{\partial \ln p(t,T)}{\partial T} \tag{3}$$

The reverse works as well; if forward rates
are known, then bond prices can be calculated
via $p(t,T) = e^{-\int_t^T f(t,s)\,ds}$. The short rate is
intrinsically related to the forward rates because

$$r(t) = f(t,t).$$
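The two-way link between forward rates and discount bond prices can be checked numerically. The sketch below fixes the valuation time at t = 0 and uses a made-up linear forward curve f(0, s) = 0.03 + 0.01s; the curve, the function names, and the step sizes are all assumptions for illustration.

```python
import math


def forward_rate(s):
    """Hypothetical instantaneous forward curve f(0, s)."""
    return 0.03 + 0.01 * s


def bond_price(maturity, steps=1000):
    """p(0, T) = exp(-integral of f(0, s) ds from 0 to T), trapezoid rule."""
    h = maturity / steps
    integral = 0.5 * h * (forward_rate(0.0) + forward_rate(maturity))
    integral += h * sum(forward_rate(i * h) for i in range(1, steps))
    return math.exp(-integral)


def implied_forward_rate(maturity, bump=1e-4):
    """Recover f(0, T) = -d ln p(0, T) / dT by a central difference."""
    up = math.log(bond_price(maturity + bump))
    down = math.log(bond_price(maturity - bump))
    return -(up - down) / (2.0 * bump)


print(round(bond_price(2.0), 6))            # price of a 2-year discount bond
print(round(implied_forward_rate(2.0), 6))  # recovers f(0, 2)
print(round(implied_forward_rate(0.0), 6))  # short rate r(0) = f(0, 0)
```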


Short-Rate Models of Term Interest 
Rate Structure 

Many models proposed for the short-rate process
$r = \{r(t)\}_{t \ge 0}$ are particular cases of the general
diffusion equation:

$$dr(t) = a(t, r(t))\,dt + b(t, r(t))\,dW(t) \tag{4}$$

where $W = \{W(t)\}_{t \ge 0}$ is a standard Wiener
process defined on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, Q)$. The following
list of models describes a chronological
evolution without claiming that it is an exhaustive
list:

The Merton model (Merton, 1973) is

$$dr(t) = \alpha\,dt + \sigma\,dW(t) \tag{5}$$

The Vasicek model (Vasicek, 1977) is

$$dr(t) = (\alpha - \beta r(t))\,dt + \sigma\,dW(t) \tag{6}$$

One advantage of the Vasicek model is that
the conditional distribution of $r$ at any future
time, given the current interest rates at time $t$, is
normally distributed. The main moments are

$$\begin{aligned}
E_t[r(s)] &= \frac{\alpha}{\beta} + \left(r(t) - \frac{\alpha}{\beta}\right)e^{-\beta(s-t)}, && t < s \\
\operatorname{var}_t[r(s)] &= \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta(s-t)}\right), && t < s \\
\operatorname{cov}_t[r(u), r(s)] &= \frac{\sigma^2}{2\beta}\,e^{-\beta(s+u-2t)}\left(e^{2\beta(u-t)} - 1\right), && t < u < s
\end{aligned} \tag{7}$$

Another advantage is that this model can
be also derived within a general equilibrium
framework as illustrated by Campbell (1986).

One disadvantage that is often discussed
in the interest rate modeling literature is that
there is a long-run possibility of negative interest
rates. However, Rabinovitch (1989) proved
that when the initial interest rate $r(0)$ is positive
and the parameter estimates have reasonable
values, the expected first-passage time of
the process through the origin is longer than
nine months. This result supports the use of the
Vasicek model in practice since the majority of
options traded on the organized exchanges expire
in less than nine months.

The Dothan model (Dothan, 1978) is

$$dr(t) = \alpha r(t)\,dt + \sigma r(t)\,dW(t) \tag{8}$$

This is the same model as Rendleman and
Bartter's model (Rendleman and Bartter, 1980).
This model is the only lognormal single-factor
model that leads to closed formulae for pure
discount bonds. Nonetheless, there is no closed
formula for a European option on a pure discount
bond.

The Cox-Ingersoll-Ross (CIR) models (Cox,
Ingersoll, and Ross, 1980, 1985) are

$$\begin{aligned}
dr(t) &= \beta\,(r(t))^{3/2}\,dW(t) \\
dr(t) &= (\alpha - \beta r(t))\,dt + \sigma\,(r(t))^{1/2}\,dW(t)
\end{aligned} \tag{9}$$


CIR wrote arguably the first of several papers
developing one-factor models of the term
structure of interest rates. Models in the same
spirit from around the same time include the Vasicek,
Dothan, Courtadon (1982), and Brennan
and Schwartz (1979) models. In all of these models,
the movements of longer-maturity instruments are
perfectly correlated with the instantaneous short-term rates.

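For the Vasicek model in (6), written as dr = (α − βr)dt + σ dW, the conditional moments in (7) are simple to evaluate. The sketch below is illustrative only: the function names and the parameter values (α = 0.004, β = 0.1, σ = 0.01, current rate 3%) are assumptions, and `tau` stands for the horizon s − t.

```python
import math
from statistics import NormalDist


def vasicek_mean(r_now, alpha, beta, tau):
    """Conditional mean of r(s) given r(t) = r_now, with tau = s - t."""
    long_run = alpha / beta
    return long_run + (r_now - long_run) * math.exp(-beta * tau)


def vasicek_variance(sigma, beta, tau):
    """Conditional variance of r(s), with tau = s - t."""
    return sigma ** 2 / (2.0 * beta) * (1.0 - math.exp(-2.0 * beta * tau))


def vasicek_covariance(sigma, beta, t, u, s):
    """Conditional covariance of r(u) and r(s) for t < u <= s."""
    return (sigma ** 2 / (2.0 * beta) * math.exp(-beta * (s + u - 2.0 * t))
            * (math.exp(2.0 * beta * (u - t)) - 1.0))


def vasicek_prob_negative(r_now, alpha, beta, sigma, tau):
    """Probability that r(s) < 0, using the normal conditional distribution."""
    mean = vasicek_mean(r_now, alpha, beta, tau)
    std = math.sqrt(vasicek_variance(sigma, beta, tau))
    return NormalDist(mean, std).cdf(0.0)


# Hypothetical parameters: long-run mean alpha/beta = 4%.
print(vasicek_mean(0.03, 0.004, 0.1, 0.75))
print(vasicek_prob_negative(0.03, 0.004, 0.1, 0.01, 0.75))
```

With parameters of this size the probability of a negative rate over a nine-month horizon is very small, which is in line with the practical argument made for the model above.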
The Ho-Lee model (Ho and Lee, 1986) is

$$dr(t) = \alpha(t)\,dt + \sigma\,dW(t) \tag{10}$$


This is the continuous version of the origi¬ 
nal model that was probably the first model 
designed to match exactly the observable term 
structure of interest rates. 

The Black-Derman-Toy (BDT) model (Black, 
Derman, and Toy, 1990) is 

$$dr(t) = \alpha(t)\,r(t)\,dt + \sigma(t)\,dW(t) \tag{11}$$


The Hull-White (HW) models (Hull and White,
1990, 1994, 1996) are

$$\begin{aligned}
dr(t) &= [\alpha(t) - \beta(t)r(t)]\,dt + \sigma(t)\,dW(t) \\
dr(t) &= [\alpha(t) - \beta(t)r(t)]\,dt + \sigma(t)\,(r(t))^{1/2}\,dW(t)
\end{aligned} \tag{12}$$

These models are two more general families 
of models incorporating the Vasicek model and 
CIR model, respectively. The first one is more 
often used and it can be calibrated to the ob¬ 
servable term structure of interest rates and 
the volatility term structure of spot or forward 
rates. However, its implied volatility structures 
may be unrealistic. Hence, it may be wise to use 
a constant coefficient $\beta(t) = \beta$ and a constant
volatility parameter $\sigma(t) = \sigma$ and then calibrate
the model using only the term structure of market
interest rates. It is still theoretically possible
that the short rate $r$ may go negative. The
risk-neutral probability for the occurrence of such
an event is

$$Q(r(t) < 0) = N\left(-\frac{f(0,t) + \frac{\sigma^2}{2\beta^2}\left(1 - e^{-\beta t}\right)^2}{\sqrt{\frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta t}\right)}}\right) \tag{13}$$


where ffO, t) is the market instantaneous for¬ 
ward rate. In practice, this probability seems 
to be rather small, as empirical evidence il¬ 
lustrated by Brigo and Mercurio (2007) shows. 
However, the probability is not zero, and this 
may bother some analysts. 
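To get a feel for the size of this probability, the following minimal sketch evaluates the Gaussian expression for Q(r(t) < 0) under constant β and σ. The inputs are assumptions chosen for illustration (a flat 4% forward curve, σ = 2%, β = 0.1, a one-year horizon), and the helper names `norm_cdf` and `prob_negative_rate` are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_negative_rate(f0t: float, sigma: float, beta: float, t: float) -> float:
    """Risk-neutral probability that the short rate is negative at time t.

    f0t is the market instantaneous forward rate f(0, t); sigma and beta are
    the constant volatility and reversion rate.
    """
    mean = f0t + sigma**2 / (2.0 * beta**2) * (1.0 - math.exp(-beta * t)) ** 2
    std = math.sqrt(sigma**2 / (2.0 * beta) * (1.0 - math.exp(-2.0 * beta * t)))
    return norm_cdf(-mean / std)

# Illustrative inputs only (assumed, not market data).
p_neg = prob_negative_rate(f0t=0.04, sigma=0.02, beta=0.1, t=1.0)
print(f"Q(r(1) < 0) = {p_neg:.4f}")
```

With these inputs the probability comes out at a little under 2%, small but clearly not zero, which is the point made above.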

An example will provide an idea of how a variation of the first of the Hull and White models in (12) can be used to price an option on a zero-coupon bond. If the assumptions are made that both β, the reversion rate, and σ, the volatility, are constant, then the model can be restated as:

$$dr(t) = [\alpha(t) - \beta r(t)]\,dt + \sigma\,dW(t) \qquad (14)$$

and the function α(t) can be calculated from a given term structure using:

$$\alpha(t) = f_T(0,t) + \beta f(0,t) + \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta t}\right) \qquad (15)$$

The future market price of a zero-coupon bond in this framework can be found by defining the reversion rate, β, the volatility, and the time period involved.

$$p(T_0,T) = A(T_0,T)\,e^{-B(T_0,T)\,r(T_0)} \qquad (16)$$

where T_0 represents the forward date at which the bond is to be priced, T represents the bond's maturity date, t is a time period index typically taken to be equal to zero (that is, representing the current point in time),

$$B(T_0,T) = \frac{1}{\beta}\left(1 - e^{-\beta(T-T_0)}\right) \qquad (17)$$

$$\ln A(T_0,T) = \ln\frac{p(t,T)}{p(t,T_0)} - B(T_0,T)\frac{\partial \ln p(t,T_0)}{\partial T_0} - \frac{\sigma^2}{4\beta^3}\left(e^{-\beta(T-t)} - e^{-\beta(T_0-t)}\right)^2\left(e^{2\beta(T_0-t)} - 1\right) \qquad (18)$$

and r(T_0) is the prevailing short rate at the forward date.

To illustrate how this works consider the case where we wish to find the 1-year forward price of a bond with 4 years remaining to maturity. Assume that the yield curve offers 4.00% continuously compounded for all maturities, volatility is 2.00%, and the reversion rate is 0.1. In this example T is 4 and T_0 is 1. The price of the bond can be found using p(1,4) = A(1,4)e^{-B(1,4)r(1)}. Clearly, A(1,4) and B(1,4) must be evaluated. Starting with B(1,4) we have

$$B(1,4) = \frac{1}{0.1}\left(1 - e^{-0.1(4-1)}\right) = 2.5918$$

The next step requires the evaluation of A(1,4) and the expression for ln A(1,4) can be broken down into a series of relatively straightforward calculations:

$$\ln\frac{p(t,T)}{p(t,T_0)} = \ln\frac{p(0,4)}{p(0,1)} = \ln\frac{0.8521}{0.9607} = \ln\frac{e^{-(4)(0.04)}}{e^{-(1)(0.04)}} = -0.12$$

B(1,4) has already been calculated and is equal to 2.5918. Moreover, the derivative of ln p(t,T_0) with respect to T_0 can be approximated by a central difference, which, if the time interval Δt is assumed to be 0.1 years, yields

$$\frac{\partial \ln p(t,T_0)}{\partial T_0} \approx \frac{\ln p(t,T_0+\Delta t) - \ln p(t,T_0-\Delta t)}{2\Delta t} = \frac{\ln p(0,1+0.1) - \ln p(0,1-0.1)}{2(0.1)} = -0.04$$

This leaves the expression:

$$\frac{\sigma^2}{4\beta^3}\left(e^{-\beta(T-t)} - e^{-\beta(T_0-t)}\right)^2\left(e^{2\beta(T_0-t)} - 1\right) = \frac{0.02^2}{4(0.1)^3}\left(e^{-(0.1)(4)} - e^{-(0.1)(1)}\right)^2\left(e^{2(0.1)(1)} - 1\right) = 0.001217$$

Combining all the above calculations we find ln A(1,4) = -0.01754 and then the one-year forward bond price is

$$p(1,4) = e^{-0.01754\ldots}\,e^{-2.5918\ldots(0.04)} = 0.8858$$
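The arithmetic of this worked example can be checked with a short script. The sketch below assumes the flat 4% continuously compounded curve, σ = 2%, β = 0.1, a short rate of 4% at the forward date, and the same central-difference step of 0.1 years; the function names `discount` and `hw_forward_bond_price` are illustrative, not taken from any library.

```python
import math

def discount(maturity: float, flat_yield: float = 0.04) -> float:
    """Today's zero-coupon bond price p(0, maturity) on a flat curve."""
    return math.exp(-flat_yield * maturity)

def hw_forward_bond_price(T0, T, sigma, beta, r_T0, t=0.0, dt=0.1):
    """Price at the forward date T0 of a zero-coupon bond maturing at T."""
    B = (1.0 - math.exp(-beta * (T - T0))) / beta
    log_ratio = math.log(discount(T) / discount(T0))
    # Central-difference approximation of d ln p(t, T0) / dT0.
    dlnp = (math.log(discount(T0 + dt)) - math.log(discount(T0 - dt))) / (2.0 * dt)
    last_term = (sigma**2 / (4.0 * beta**3)
                 * (math.exp(-beta * (T - t)) - math.exp(-beta * (T0 - t))) ** 2
                 * (math.exp(2.0 * beta * (T0 - t)) - 1.0))
    ln_A = log_ratio - B * dlnp - last_term
    return math.exp(ln_A - B * r_T0), B, ln_A

price, B, ln_A = hw_forward_bond_price(T0=1.0, T=4.0, sigma=0.02, beta=0.1, r_T0=0.04)
print(f"B(1,4) = {B:.4f}, ln A(1,4) = {ln_A:.5f}, p(1,4) = {price:.4f}")
```

Changing `r_T0` shows how the forward bond price responds to the short rate prevailing at the forward date, which is the only random input in the one-factor setting.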

The Black-Karasinski (BK) model (Black and Karasinski, 1991) is

$$dr(t) = r(t)[\alpha(t) - \beta(t)\ln r(t)]\,dt + \sigma(t)\,r(t)\,dW(t) \qquad (19)$$

The BDT, HW, and BK models extended the Ho-Lee model to match a term structure volatility curve (e.g., the cap prices) in addition to the term structure. The BK model is a generalization of the BDT model, and it overcomes the problem of negative interest rates, assuming that the short rate r is the exponential of an OU process having time-dependent coefficients. It is popular with practitioners because it fits the swaption volatility surface well. Nevertheless, it does not have closed formulae for bonds or options on bonds.


The Sandmann-Sondermann model (Sandmann and Sondermann, 1993) is

$$r(t) = \ln(1 + \eta(t))$$
$$d\eta(t) = \eta(t)\,(\alpha(t)\,dt + \sigma(t)\,dW(t)) \qquad (20)$$

The Dothan model, BK model, and the exponential Vasicek model given below imply that r is lognormally distributed. While this finding may seem reasonable, it is the cause for the explosion of the bank account; that is, from a single unit of money, one may be able to make in an infinitesimal interval of time an infinite amount of money. The Sandmann-Sondermann model overcomes this problem by modeling the short rates as above.

The Chen model (Chen, 1995) is

$$dr(t) = (\alpha(t) - r(t))\,dt + (\sigma(t)\,r(t))^{1/2}\,dW_1(t)$$
$$d\alpha(t) = (\alpha - \alpha(t))\,dt + (\alpha(t))^{1/2}\,dW_2(t)$$
$$d\sigma(t) = (\gamma - \sigma(t))\,dt + (\sigma(t))^{1/2}\,dW_3(t) \qquad (21)$$

where α, γ are constants and W_1, W_2, and W_3 are independent Wiener processes. This is an example of a three-factor model.

The Schmidt model (Schmidt, 1997) is

$$r(t) = H[f(t) + g(t)\,W(T(t))] \qquad (22)$$

where T = T(t) and H = H(x) are continuous nonnegative strictly increasing functions of t ≥ 0 and real x, while f = f(t) and g = g(t) > 0 are continuous functions.

The exponential Vasicek model is

$$dr(t) = r(t)\,[\eta - a\ln r(t)]\,dt + \sigma\,r(t)\,dW(t) \qquad (23)$$

This model is similar to the Dothan model, being a lognormal short-rate model. This model does not lead to explicit formulae for pure discount bonds or for options contingent on them. In addition, this is an example of a nonaffine term-structure model.

The Mercurio-Moraleda model (Mercurio and Moraleda, 2000) is

$$dr(t) = r(t)\left[\eta(t) - \left(\lambda - \frac{\gamma}{1 + \gamma t}\right)\ln r(t)\right]dt + \sigma\,r(t)\,dW(t) \qquad (24)$$

The CIR++ model (Brigo and Mercurio, 2007) is

$$r(t) = x(t) + \varphi(t)$$
$$dx(t) = k[\theta - x(t)]\,dt + \sigma\sqrt{x(t)}\,dW(t) \qquad (25)$$

The extended exponential Vasicek model (Brigo and Mercurio, 2007) is

$$r(t) = x(t) + \varphi(t)$$
$$dx(t) = x(t)\,[\eta - \lambda\ln x(t)]\,dt + \sigma\,x(t)\,dW(t) \qquad (26)$$

Two-factor models were based on a second source of random shocks. Two-factor models were developed by Brennan and Schwartz (1982), Fong and Vasicek (1992), and Longstaff and Schwartz (1992a). However, Hogan (1993) proved that the solution to the Brennan and Schwartz model explodes, that is, reaches infinity in a finite amount of time with positive probability. This shows that adding more factors may cause unseen problems. More complex multifactor models are described by Rebonato (1998) and by Brigo and Mercurio (1997).

Therefore, the short-rate models lead to two main problems. Mean-reverting models such as Vasicek or Hull and White may produce negative interest rates. From a computational perspective, if the risk-neutral probability of producing such negative rates is negligible, then those scenarios can simply be ignored in a Monte Carlo setup. The so-called lognormal models ensure nonnegativity of interest rates but may become explosive due to the change of scale in the short-rate modeling. Multifactor short-rate models rapidly become computationally infeasible, and they may produce volatility surfaces that do not match those observed in the markets.

The problems signaled above for the short-rate models led to the development of new classes of models, most notably the LIBOR market models or BGM developed by Brace, Gatarek, and Musiela (1997); Jamshidian (1997); and Musiela and Rutkowski (1997). This model starts with a geometric Brownian motion for the forward LIBOR rate L_i(t) := L(t; T_i, T_{i+1}), where 0 = T_0 < T_1 < ... < T_n, to acquire positivity of rates

$$dL_i(t) = \mu_i^Q(t)\,L_i(t)\,dt + \sigma_i(t)\,L_i(t)\,dW_i^Q(t) \qquad (27)$$

where Q is the martingale measure corresponding to the numeraire N(t) = p(t, T_n), also called the terminal measure because the numeraire is the price of the bond with the last tenor. Now

$$\prod_{k=i}^{n-1}\left(1 + (T_{k+1} - T_k)L_k(t)\right) = \frac{p(t,T_i)}{p(t,T_n)}$$

is the numeraire rebased price of a traded asset, the zero-coupon bond with maturity T_i. Hence, it should be a martingale and its drift must be zero. Calculating the drift with Ito calculus for all consecutive indexes i, i + 1 allows the drift determination

$$\mu_i^Q(t) = -\sum_{k=i+1}^{n-1}\frac{(T_{k+1} - T_k)L_k(t)}{1 + (T_{k+1} - T_k)L_k(t)}\,\sigma_i(t)\,\sigma_k(t)\,\rho_{i,k}(t) \qquad (28)$$

for all i ∈ {0, ..., n - 1}.

Other numeraires are also feasible but lead 
to a different style of calibration. The pricing of 
interest rate derivatives is realized with Monte 
Carlo simulation. 

The quest for ensuring positiveness of the short rates motivated the development of a new class sometimes called Markov functional models. Important contributions in this area are Flesaker and Hughston (1996), Rogers (1997), and Rutkowski (1997), although some seminal ideas are also contained in Constantinides (1992). In a nutshell, given a strictly positive diffusion process {D(t)}, t ≥ 0, adapted to the filtration of the probability space, the term-structure model described by p(t, T) = E_t[D(T)]/D(t) is arbitrage free, and if the diffusion process is also a supermartingale, then the short-rate process {r(t)}, t ≥ 0, is positive with probability one under P.


MODELING IN PRACTICE 

One popular way of turning theory into practice is to use a tree approach to modeling. The tree can be either binomial or trinomial in its construction. To illustrate the idea, consider first the binomial approach. The tree could be set up to reflect observed or estimated market short rates, and the data provided in Table 1 will help to demonstrate this idea.

Table 1 Market Spot and Forward Rates

Time (Months)    Implied Spot Zero Rates    Implied Forward Rates
 6               5.0000%                    5.0000%
12               5.1266%                    5.2533%
18               5.2544%                    5.5103%
24               5.3835%                    5.7714%
30               5.5141%                    6.0371%
36               5.6462%                    6.3080%

The process starts from the first six-month period where the rate is known to be 5.000%. At the end of the six-month period, the following six-month forward rates are treated as being the short rates and are split, allowing interest rates to rise with a probability of 0.5 or fall with a probability of 0.5, but also taking into account the short-rate volatility. For a description of how this is achieved, see Eales (2000). Figure 1 shows how the rates would appear in a binomial tree once the procedure has been performed.

When the rates have been established, they must then be calibrated. The calibration procedure is achieved using the observed market price of a bullet government bond and pricing the bond using the tree-calculated rates to obtain the appropriate discount factors. Consider a three-year-to-maturity government bond trading at par and offering a coupon of 5.625% paid semiannually as an example. On maturity, the bond will be redeemed for 102.8125, which is made up of the bond's face value, say 100, and one half of the annual coupon, 2.8125.

Figure 2 illustrates how, moving back through the tree, the discounting process of the terminal payment taken together with the discounted interim coupons generates a bond price of 100.013. Given that the observed bond price is 100, the rates in the tree will need to be adjusted to ensure that the backward calculated price agrees with the market price of the bond. In this example the adjustment factor is 0.6 basis points, and this will be added to every node in the tree with the exception of the starting value. The resulting rates will then be as displayed in Figure 3.

The calibrated tree can now be used to calculate corporate bond spreads as well as bond options. The outlined procedure is close to that advanced by Black, Derman, and Toy in that the process fits observed market rates and short-rate volatility. There is, however, a danger that interest rates could go negative in this procedure.
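The calibration step can be sketched in code. The tree below is a hypothetical stand-in, not the tree of Figure 1: it spreads the Table 1 forward rates multiplicatively with an assumed 10% short-rate volatility, so its prices and its adjustment will differ from the 100.013 and 0.6 basis points quoted above. What it does illustrate is the mechanism: backward induction with probabilities of 0.5, followed by a search for the parallel shift (applied to every node except the starting one) that reprices the 5.625% bond at 100.

```python
import math

# Six-month forward rates from Table 1 (annualized, semiannual compounding).
FORWARDS = [0.050000, 0.052533, 0.055103, 0.057714, 0.060371, 0.063080]
VOL = 0.10        # hypothetical short-rate volatility (an assumption)
DT = 0.5          # step length in years
COUPON = 2.8125   # semiannual coupon of the 5.625% bond
FACE = 100.0

def build_tree(shift=0.0):
    """Rates per node; level i has i + 1 nodes and j counts up-moves."""
    tree = []
    for i, fwd in enumerate(FORWARDS):
        level = [fwd * math.exp(VOL * math.sqrt(DT) * (2 * j - i)) for j in range(i + 1)]
        if i > 0:  # the starting rate is known and is not adjusted
            level = [rate + shift for rate in level]
        tree.append(level)
    return tree

def bond_price(tree):
    """Backward induction with up and down probabilities of 0.5."""
    steps = len(tree)
    values = [FACE + COUPON] * (steps + 1)  # redemption plus last coupon
    for i in range(steps - 1, -1, -1):
        values = [0.5 * (values[j] + values[j + 1]) / (1.0 + tree[i][j] * DT)
                  for j in range(i + 1)]
        if i > 0:
            values = [v + COUPON for v in values]  # coupon paid at this date
    return values[0]

def calibrate_shift(target=100.0, lo=-0.01, hi=0.01):
    """Bisection on the parallel shift; the price falls as rates rise."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bond_price(build_tree(mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

shift = calibrate_shift()
print(f"unadjusted price: {bond_price(build_tree()):.4f}")
print(f"shift: {shift * 1e4:.2f} bp, calibrated price: {bond_price(build_tree(shift)):.4f}")
```

A single additive shift is the simplest adjustment that matches one bond price; matching a whole curve of prices would need one adjustment per time step.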

Figure 1 Term Structure Evolution: Binomial Tree

Figure 2 Calibration

Figure 3 Adjusted Tree to Coincide with Current Market Price (axis label: Rates rising)

As an alternative to this binomial approach, Hull and White (1994) have suggested a two-stage methodology that uses a mean-reverting process with the short rate as the source of uncertainty and calculated in a trinomial tree framework. The first stage in the approach ignores the observed market rates and centers the evolution of rates around zero and identifies the point at which the mean-reversion process takes effect. The second stage introduces the observed market rates into the framework established in stage one. The trinomial approach gives the tree a great deal more flexibility over its binomial counterpart, not least in relaxing the assumption that rates can either rise or fall with probability 0.5.


HJM METHODOLOGY 

Heath, Jarrow, and Morton (1990a, 1990b, 1992) derived both one-factor and multifactor models for movements of the forward rates of interest. The models were complex enough to match the current observable term structure of forward rates and by equivalence the spot rates. Ritchken and Sankarasubramanian (1995) provide necessary and sufficient conditions for the HJM models with one source of error and two state variables such that the ex post forward premium and the integrated variance factor are sufficient statistics for the construction of the entire term structure at any future point in time.

Under this methodology, the bond dynamics are described by an Ito process:

$$dp(t,T) = r(t)\,p(t,T)\,dt + \sigma(t,T)\,p(t,T)\,dW(t) \qquad (29)$$

Then

$$d\ln p(t,T) = \left[r(t) - \tfrac{1}{2}\sigma^2(t,T)\right]dt + \sigma(t,T)\,dW(t) \qquad (30)$$


The equation for the forward rate can be derived now:

$$df(t,T) = -d\left(\frac{\partial}{\partial T}\ln p(t,T)\right) = -\frac{\partial}{\partial T}\big(d\ln p(t,T)\big) = -\frac{\partial}{\partial T}\left\{\left[r(t) - \tfrac{1}{2}\sigma^2(t,T)\right]dt + \sigma(t,T)\,dW(t)\right\}$$
$$= \sigma(t,T)\frac{\partial\sigma(t,T)}{\partial T}\,dt - \frac{\partial\sigma(t,T)}{\partial T}\,dW(t) \qquad (31)$$

The Wiener process W = {W(t)} is symmetric, and therefore we can safely replace W with -W, so

$$df(t,T) = \sigma(t,T)\frac{\partial\sigma(t,T)}{\partial T}\,dt + \frac{\partial\sigma(t,T)}{\partial T}\,dW(t) \qquad (32)$$

Applying the fundamental theorem of calculus for ∂σ(t,T)/∂T leads to

$$\sigma(t,T) - \sigma(t,t) = \int_t^T\frac{\partial\sigma(t,s)}{\partial s}\,ds \qquad (33)$$

It is obvious that σ(t,t) = 0 and therefore the volatility of the forward rate determines the drift as well. In other words, all that is needed for the HJM methodology is the volatility of the bond prices. The short rates are easily calculated from the forward rates. Once a model for short rates is determined under the risk-neutral measure Q, the bond prices are calculated from

$$p(t,T) = E^Q\left[e^{-\int_t^T r(s)\,ds}\,\Big|\,F_t\right] \qquad (34)$$

Using (3) it follows that

$$\int_t^T f(t,s)\,ds = -\ln p(t,T) = g(r(t),t,T)$$

where

$$g(x,t,T) = -\ln E^Q\left[e^{-\int_t^T r(s)\,ds}\,\Big|\,r(t) = x\right] \qquad (35)$$

The continuous variant of the Ho-Lee model can be obtained for

$$g(x,t,T) = x(T-t) - \frac{\sigma^2}{6}(T-t)^3 + \int_t^T (T-s)\,\alpha(s)\,ds \qquad (36)$$

where σ(t,T) = σ(T - t), which implies that df(t,T) = σ²(T - t)dt + σdW(t), so the initial forward curve is

$$f(0,T) = \frac{\partial g(r(0),0,T)}{\partial T} = r(0) - \frac{1}{2}\sigma^2T^2 + \int_0^T\alpha(s)\,ds \qquad (37)$$

The short rate is given by

$$r(t) = f(0,t) + \sigma^2\frac{t^2}{2} + \sigma W(t) \qquad (38)$$

and the price of the pure discount bond with maturity T is

$$p(t,T) = \exp\left[-\int_t^T f(0,s)\,ds - \sigma^2 t\int_t^T\left(s - \frac{t}{2}\right)ds - \sigma(T-t)W(t)\right] \qquad (39)$$

Similarly, the Vasicek model is recovered for σ(t,T) = σe^{-β(T-t)} and

$$f(0,T) = \frac{\alpha}{\beta} + e^{-\beta T}\left(r(0) - \frac{\alpha}{\beta}\right) - \frac{\sigma^2}{2\beta^2}\left(1 - e^{-\beta T}\right)^2 \qquad (40)$$

and this leads to

$$r(t) = \frac{\alpha}{\beta} + e^{-\beta t}\left(r(0) - \frac{\alpha}{\beta}\right) + \sigma e^{-\beta t}\int_0^t e^{\beta s}\,dW(s) \qquad (41)$$

BOND OPTION PRICING

Formulae for bond options were found by Cox, Ingersoll, and Ross using the CIR model (square root process) for short rates and by Jamshidian (1989), Rabinovitch (1989), and Chaplin (1987) using the Vasicek model for the short-rate process. Rabinovitch advocated the idea that the bond follows a lognormal process (similar to equity prices). Chen (1991) pointed out that this assumption is grossly misleading since the bond price is a contingent claim on the same interest rate, so the bond option pricing model cannot be a two-factor model as proposed by Rabinovitch; it rather collapses onto a one-factor model, in which case the formulas are the same as those proved respectively by Chaplin (1987) and by Jamshidian (1989).

Bonds are traded generally over the counter. Futures contracts on bonds may be more liquid and may remove some of the modeling difficulties generated by the known value at maturity of the bonds. Hedging may be more efficient in this context using the futures contracts on pure discount bonds (provided they are liquid) rather than the bonds themselves. Chen (1992) provides closed-form solutions for futures and European futures options on pure discount bonds, under the Vasicek model.

Hull and White used a two-factor version of the Vasicek model to price discount bond options. Turnbull and Milne (1991) proposed a general equilibrium model outside the HJM framework. They provide analytical solutions for European options on Treasury bills, interest rate forward and futures contracts, and Treasury bonds. In addition, a closed formula is identified for a call option written on an interest rate cap. A two-factor model is also investigated, and closed-form solutions are provided for a European call on a Treasury bill. Chen and Scott (1992) use a two-factor CIR model that is essentially the same as the model analyzed by Longstaff and Schwartz (1992), and derive solutions for bond and interest rate options. The two-factor model is used, with the first factor having a strong mean reversion, explaining the variation in short-term rates, while the second factor has a very slow mean reversion, modeling long-term rates. The model is also used for calculating premiums for caps on floating interest rates and for European options on discount bonds, coupon bonds, coupon bond futures, and Eurodollar futures. These are not closed-form solutions, but they are expressed as multivariate integrals. However, the calculus can be reduced to univariate numerical integrations.

European Options on the Money Fund

In this section we consider the pricing of a European option on the money fund (this is the same as a bank account when the initial value B(0) = 1). Thus, the payoff of a European call option with exercise price K is max[B(T) - K, 0]. The continuous version of the Ho-Lee model is assumed for the short interest rate process. The risk-neutral valuation methodology provides the solution as

$$c_{B(0);T;K} = E^Q\left[e^{-\int_0^T r(u)\,du}\max[B(T) - K, 0]\right] = B(0)N(d_+) - p(0,T)\,K\,N(d_-) \qquad (42)$$

where 



A proof of this formula is described in Epps 
(2000) in Section 10.2.2. 

Options on Discount Bonds 

Discount bond options are not very liquid, but they form an elementary component for pricing other options. For example, a floating rate cap can be decomposed into a portfolio of European puts on discount bonds. As with the European option contingent on the bank account, we can price European options contingent on discount bonds.

When the short rate process r = {r(t)} follows the continuous time version of the Ho-Lee model given above by (10), the price at time 0 of a European call option with maturity T_0 with exercise price K on a discount bond maturing at T (T_0 < T) is

$$c_{p(0,T);T_0;K} = E^Q\left[e^{-\int_0^{T_0} r(s)\,ds}\max[p(T_0,T) - K, 0]\right] = p(0,T_0)\frac{p(0,T)}{p(0,T_0)}N(d_+) - p(0,T_0)\,K\,N(d_-) \qquad (43)$$

where

$$d_+ = \frac{\ln\left(\frac{p(0,T)}{p(0,T_0)K}\right) + \frac{\sigma^2}{2}(T-T_0)^2T_0}{\sigma(T-T_0)\sqrt{T_0}}$$

and

$$d_- = d_+ - \sigma(T-T_0)\sqrt{T_0}$$


A proof of this result is provided in Epps (2000). There is a similar put-call parity for European options contingent on a discount bond. If p_{p(0,T);T_0;K} is the price at t = 0 of a European put option on the discount bond with maturity T, then for B(0) = 1,

$$c_{p(0,T);T_0;K} - p_{p(0,T);T_0;K} = E^Q\left[e^{-\int_0^{T_0} r(s)\,ds}\left(\max[p(T_0,T) - K, 0] - \max[K - p(T_0,T), 0]\right)\right]$$
$$= E^Q\left[e^{-\int_0^{T_0} r(s)\,ds}\,[p(T_0,T) - K]\right]$$
$$= E^Q\left[e^{-\int_0^{T} r(s)\,ds}\right] - p(0,T_0)K$$
$$= p(0,T) - p(0,T_0)K$$

Put-call parity can be used to derive the price of a European put option:

$$p_{p(0,T);T_0;K} = p(0,T_0)\,K\,N(-d_-) - p(0,T_0)\frac{p(0,T)}{p(0,T_0)}N(-d_+) \qquad (44)$$

Initially, the first formulas on pricing options on pure discount bonds used the Vasicek model for the term structure of interest rates. Thus, given that r follows (6), the price of a European call option with maturity T_0 with exercise price K on a discount bond maturing at T (T_0 < T) is

$$c_{p(0,T);T_0;K} = p(0,T)N(d_+) - K\,p(0,T_0)N(d_-) \qquad (45)$$

where

$$d_+ = \frac{\ln\left(\frac{p(0,T)}{K\,p(0,T_0)}\right) + \eta^2/2}{\eta} \quad\text{and}\quad d_- = d_+ - \eta \quad\text{with}\quad \eta = \frac{\sigma}{\beta}\left(1 - e^{-\beta(T-T_0)}\right)\sqrt{\frac{1 - e^{-2\beta T_0}}{2\beta}}$$

The put price can be obtained from put-call parity as

$$p_{p(0,T);T_0;K} = K\,p(0,T_0)N(-d_-) - p(0,T)N(-d_+) \qquad (46)$$

Example: Valuing a Zero-Coupon Bond Call Option with the Vasicek Model

Let's consider this model for pricing a 3-year European call option on a 10-year zero-coupon bond with face value $1 and exercise price K equal to $0.5. As in Jackson and Staunton (2001), we use for the parameters of this model the values estimated by Chan et al. (1992) for U.S. 1-month Treasury bill yield from 1964 to 1989. Thus, α = 0.0154, β = 0.1779, and σ = 2%. In addition, the value of the short rate r at time t = 0 is needed, so we take r_0 = 3.75%. Feeding this information into the above formulas, we get the output in Table 2. Thus, the value of the European call option is

c_{p(0,T);T_0;K} = 0.5406 × 0.9822 - 0.5 × 0.8655 × 0.9767 = 0.108

Table 2 Calculations of Elements for Pricing a European Call Option on a Zero-Coupon Bond When Short Rates Are Following the Vasicek Model

p(0,T_0)    p(0,T)    d_+       d_-       N(d_+)    N(d_-)
0.8655      0.5406    2.1013    1.9926    0.9822    0.9767

A more general case is discussed by Shiryaev (1999) for single-factor Gaussian models modeling the short interest rate. These are single-factor affine models where the short rate r is also a Gauss-Markov process. The equation for this short rate process is

$$dr(t) = [\alpha(t) - \beta(t)\,r(t)]\,dt + \sigma(t)\,dW(t) \qquad (47)$$

and we can easily recognize the first Hull-White model. The price of a European call option is also

$$c_{p(0,T);T_0;K} = p(0,T)N(d_+) - K\,p(0,T_0)N(d_-) \qquad (48)$$

but where

$$d_+ = \frac{\ln\left(\frac{p(0,T)}{K\,p(0,T_0)}\right) + \frac{1}{2}\sigma^2(T_0,T)B^2(T_0,T)}{\sigma(T_0,T)B(T_0,T)} \quad\text{and}\quad d_- = d_+ - \sigma(T_0,T)B(T_0,T)$$

with

$$B(T_0,T) = \int_{T_0}^{T}\frac{\varphi(s)}{\varphi(T_0)}\,ds \quad\text{and}\quad \varphi(s) = e^{-\int_0^s\beta(u)\,du}$$

The price of the European put option is obviously again p_{p(0,T);T_0;K} = K p(0,T_0)N(-d_-) - p(0,T)N(-d_+).
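The Vasicek call example reported in Table 2 can be reproduced with a few lines of code. The sketch below uses the call formula with η as given above; the zero-coupon bond price function is the standard Vasicek closed form for dr = (α - βr)dt + σdW, which is an assumption here because it is not restated in this part of the entry. All function names are illustrative.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def vasicek_zcb(r, tau, alpha, beta, sigma):
    """Zero-coupon bond price for time to maturity tau (standard closed form)."""
    b = (1.0 - math.exp(-beta * tau)) / beta
    r_inf = alpha / beta - sigma**2 / (2.0 * beta**2)
    ln_a = (b - tau) * r_inf - sigma**2 * b**2 / (4.0 * beta)
    return math.exp(ln_a - b * r)

def vasicek_zcb_option(r0, T0, T, K, alpha, beta, sigma):
    """European call and put on a zero-coupon bond, plus the intermediate terms."""
    p_T0 = vasicek_zcb(r0, T0, alpha, beta, sigma)
    p_T = vasicek_zcb(r0, T, alpha, beta, sigma)
    eta = (sigma / beta) * (1.0 - math.exp(-beta * (T - T0))) \
        * math.sqrt((1.0 - math.exp(-2.0 * beta * T0)) / (2.0 * beta))
    d_plus = math.log(p_T / (K * p_T0)) / eta + 0.5 * eta
    d_minus = d_plus - eta
    call = p_T * norm_cdf(d_plus) - K * p_T0 * norm_cdf(d_minus)
    put = K * p_T0 * norm_cdf(-d_minus) - p_T * norm_cdf(-d_plus)
    return {"p_T0": p_T0, "p_T": p_T, "d_plus": d_plus, "d_minus": d_minus,
            "call": call, "put": put}

out = vasicek_zcb_option(r0=0.0375, T0=3.0, T=10.0, K=0.5,
                         alpha=0.0154, beta=0.1779, sigma=0.02)
for name, value in out.items():
    print(f"{name:8s} {value:.4f}")
```

The put is computed directly from its own formula, so the difference call - put can be compared with p(0,T) - K p(0,T_0) as a check on put-call parity.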


Example: Valuing a Zero-Coupon Bond Call 
Option with the Hull-White Model 

When considering the pricing of a forward pure discount bond earlier in this entry, we used a numerical example. That example can now be expanded to demonstrate how, in practice, European calls and puts can be estimated in a Hull-White framework. Explicitly, the illustration will demonstrate the pricing of a one-year European call option on a four-year-to-maturity discount bond with a strike price set equal to the forward price of the bond (0.8858...).

Breaking down (d_+) into its component parts and evaluating each individually yields:

$$\ln\left(\frac{p(0,T)}{K\,p(0,T_0)}\right) = \ln\left(\frac{0.8521}{(0.8858)(0.9607)}\right) \approx 0, \qquad B(T_0,T) = 2.5918$$

$$\eta = \frac{\sigma}{\beta}\left(1 - e^{-\beta(T-T_0)}\right)\sqrt{\frac{1 - e^{-2\beta T_0}}{2\beta}} = \frac{0.02}{0.1}\left(1 - e^{-0.1(3)}\right)\sqrt{\frac{1 - e^{-2(0.1)(1)}}{2(0.1)}} = 0.0493$$

The expression for (d_+) reduces to

$$\frac{\eta(T_0,T)\,B(T_0,T)}{2} = \frac{(0.0493)(2.5918)}{2} = 0.06395$$

The expression for d_- is (d_-) = (d_+) - η = 0.06395 - 0.0493 = 0.0146. N(d_+) is found to be 0.5255 and N(d_-) = 0.5058. Substituting these results into the call option formula gives a premium of

c_{p(0,T);T_0;K} = (0.8521)(0.5255) - (0.8858)(0.9608)(0.5058) = 0.01730

or 1.73%.

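One notable exception from this general class is the CIR model. There is a closed formula for this case, too. Following Clewlow and Strickland (1998), the price at time 0 of a European pure discount bond option is

$$c_{p(0,T);T_0;K} = p(0,T)\,\chi^2\!\left(2\delta[\phi + \psi + B(T_0,T)];\ 2\omega,\ \frac{2\phi^2 r(0)e^{\theta T_0}}{\phi + \psi + B(T_0,T)}\right) - K\,p(0,T_0)\,\chi^2\!\left(2\delta[\phi + \psi];\ 2\omega,\ \frac{2\phi^2 r(0)e^{\theta T_0}}{\phi + \psi}\right) \qquad (49)$$

where

$$\theta = \sqrt{\beta^2 + 2\sigma^2}, \quad \phi = \frac{2\theta}{\sigma^2\left(e^{\theta T_0} - 1\right)}, \quad \psi = \frac{\beta + \theta}{\sigma^2}, \quad \lambda = \frac{\beta + \theta}{2}, \quad \omega = \frac{2\alpha}{\sigma^2},$$

$$B(t,s) = \frac{e^{\theta(s-t)} - 1}{\lambda\left(e^{\theta(s-t)} - 1\right) + \theta}, \quad \delta = \frac{\omega\left(\lambda(T-T_0) + \ln\theta - \ln\left[\lambda\left(e^{\theta(T-T_0)} - 1\right) + \theta\right]\right) - \ln(K)}{B(T_0,T)}$$

and χ²(.; a, b) is the noncentral chi-squared distribution function with a degrees of freedom and noncentrality parameter b.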

Example: Valuing a Zero-Coupon Bond Call 
Option with the CIR Model 

Let's consider the same problem as described in the example using the Vasicek model above and price the 3-year European call option on a 10-year pure discount bond using the CIR model for the short interest rates. Recall that face value is $1 and exercise price K is equal to $0.5. As in the example with the Vasicek model, we consider that σ = 2% and r_0 = 3.75%. The CIR model overcomes the problem of negative interest rates known for the Vasicek model as long as 2α > σ². This is true, for example, if we take α = 0.0189 and β = 0.24. Feeding this information into the above formulas is relatively tedious. A spreadsheet application is provided by Jackson and Staunton. After some work, we get that the price of the call is

c_{p(0,T);T_0;K} = 0.5324 × 1 - 0.5 × 0.8624 × 1 = 0.1012
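The two bond prices entering this result can be checked without evaluating the noncentral chi-squared terms. The sketch below uses the standard CIR zero-coupon bond formula for dr = (α - βr)dt + σ√r dW, which is assumed here rather than quoted from the entry, and then forms p(0,T) - K p(0,T_0), the value the call takes when both chi-squared terms equal 1 as in the calculation above. The function name `cir_zcb` is illustrative.

```python
import math

def cir_zcb(r, tau, alpha, beta, sigma):
    """CIR zero-coupon bond price for time to maturity tau (standard closed form)."""
    theta = math.sqrt(beta**2 + 2.0 * sigma**2)
    lam = 0.5 * (beta + theta)
    growth = math.exp(theta * tau) - 1.0
    denom = lam * growth + theta
    b = growth / denom
    ln_a = (2.0 * alpha / sigma**2) * (lam * tau + math.log(theta) - math.log(denom))
    return math.exp(ln_a - b * r)

alpha, beta, sigma, r0, strike = 0.0189, 0.24, 0.02, 0.0375, 0.5

# Condition quoted in the text for ruling out negative rates.
assert 2.0 * alpha > sigma**2

p_T0 = cir_zcb(r0, 3.0, alpha, beta, sigma)    # bond maturing at option expiry
p_T = cir_zcb(r0, 10.0, alpha, beta, sigma)    # underlying 10-year bond
deep_itm_call = p_T - strike * p_T0            # call value when both chi-squared terms are 1

print(f"p(0,3) = {p_T0:.4f}, p(0,10) = {p_T:.4f}, call = {deep_itm_call:.4f}")
```

A full implementation would replace the two factors of 1 with noncentral chi-squared distribution values, for which a numerical library is normally used.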


Options on Coupon-Paying Bonds

When short rates are modeled with single-factor models, Jamshidian (1989) proved that an option on a coupon bond can be priced by valuing a portfolio of options on discount bonds. This approach does not work in multifactor models as proved by El Karoui and Rochet (1995).

Consider a bond paying a periodic cash payment ρ at times T_1, T_2, ..., T_m, and the principal at maturity T = T_m. A coupon bond can be mapped into a portfolio of discount bonds with corresponding maturities (under one source of uncertainty, that is, a one-factor model). The value of a coupon-bearing bond at time t < T_m is

$$p(t, T_1, \ldots, T_m; \rho) = \rho\sum_{i=i[t]}^{m} p(t,T_i) + p(t,T_m) \qquad (50)$$

where i[t] = min{i : t < T_i}.

Under the one-factor HJM model corresponding to the Ho-Lee model, a European option on a coupon bond can be valued as a portfolio of options contingent on zero discount bonds with maturities T_1, T_2, ..., T_m. Let T_0 be the maturity of such a European option.

Epps (2000) shows that

$$p(T_0,T_i) = \frac{p(0,T_i)}{p(0,T_0)}\exp\left[-\frac{\sigma^2}{2}(T_i - T_0)^2T_0 - (T_i - T_0)\big(r(T_0) - f(0,T_0)\big)\right] \qquad (51)$$

For any strike price K, there is a value r_K of r(T_0) such that, when replaced in (50) with t = T_0, it implies p(T_0, T_1, ..., T_m; ρ) = K. Let's denote by K_i the value of p(T_0, T_i) as calculated from (51) with r_K instead of r(T_0). Then

$$\rho\sum_{i=i[T_0]}^{m} K_i + K_m = K \qquad (52)$$

Hence, the value at time 0 of a European call option with maturity T_0 and strike price K on the coupon-bearing bond, under the one-factor HJM model described above, is given by

$$c_{p(0,T_1,\ldots,T_m;\rho)} = E^Q\left\{e^{-\int_0^{T_0} r(s)\,ds}\max[p(T_0,T_1,\ldots,T_m;\rho) - K, 0]\right\}$$
$$= \rho\sum_{i=i[T_0]}^{m} E^Q\left\{e^{-\int_0^{T_0} r(s)\,ds}\max[p(T_0,T_i) - K_i, 0]\right\} + E^Q\left\{e^{-\int_0^{T_0} r(s)\,ds}\max[p(T_0,T_m) - K_m, 0]\right\}$$
$$= \rho\sum_{i=i[T_0]}^{m} c_{p(0,T_i);T_0;K_i} + c_{p(0,T_m);T_0;K_m} \qquad (53)$$


Example: Valuing a Coupon-Bond Call Option 
with the Vasicek Model 

The above example is reconsidered using the Vasicek model for the short-term interest rates. The bond is no longer a zero-coupon bond but now pays an annual coupon at a 5% rate (ρ = 0.05), all the other characteristics being the same as before. In order to calculate the European call option price on the coupon bond, we need to calculate the interest rate r_K such that the present value at the maturity of the option of all later cash flows on the bond equals the strike price. This is done by trial and error using (50), and the value we get here is r_K = 22.30%. Next, we map the strike price into a series of strike prices via (52) that are then associated with coupon payments considered as zero-coupon bonds and calculate the value of the European call options contingent on those zero-coupon bonds as in the preceding example. The calculations are described in Table 3.

Because we started with a one-factor model for the short interest rates, we can use the decomposition property emphasized by Jamshidian (1997) and calculate the required coupon-bond European call price as the sum of all the elements in the last column in Table 3, which includes the coupon rate factor ρ. Thus, the value of this option is 0.344.

Table 3 Calculations Using the Vasicek Model for Separate Zero-Coupon European Call Options (bond prices shown are calculated with the estimated r_K)

Year    p(T_0,T_i) | r_K    Face Value    ρK_i      Call Option
 4.0    0.8094              0.05          0.0405    0.006
 5.0    0.6688              0.05          0.0334    0.009
 6.0    0.5624              0.05          0.0281    0.012
 7.0    0.4800              0.05          0.0240    0.013
 8.0    0.4148              0.05          0.0207    0.013
 9.0    0.3622              0.05          0.0181    0.013
10.0    0.3192              1.05          0.3351    0.278
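The decomposition behind Table 3 can be sketched as follows. The code finds r_K by bisection (one way of doing the trial and error mentioned in the text), converts it into one strike per cash flow, and sums the zero-coupon call values weighted by the cash flows. The Vasicek bond price function is the standard closed form, assumed here because it is not restated in this section, and all names are illustrative.

```python
import math

ALPHA, BETA, SIGMA, R0 = 0.0154, 0.1779, 0.02, 0.0375  # Vasicek inputs from the text
T0, STRIKE, RHO = 3.0, 0.5, 0.05
# Cash flows after option expiry: coupons in years 4 to 9, coupon plus principal in year 10.
CASH_FLOWS = [(float(year), RHO) for year in range(4, 10)] + [(10.0, 1.0 + RHO)]

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def zcb(r, tau):
    """Vasicek zero-coupon bond price for short rate r and time to maturity tau."""
    b = (1.0 - math.exp(-BETA * tau)) / BETA
    r_inf = ALPHA / BETA - SIGMA**2 / (2.0 * BETA**2)
    return math.exp((b - tau) * r_inf - SIGMA**2 * b**2 / (4.0 * BETA) - b * r)

def zcb_call(T, strike):
    """Time-0 call with expiry T0 on the zero-coupon bond maturing at T."""
    p_T0, p_T = zcb(R0, T0), zcb(R0, T)
    eta = (SIGMA / BETA) * (1.0 - math.exp(-BETA * (T - T0))) \
        * math.sqrt((1.0 - math.exp(-2.0 * BETA * T0)) / (2.0 * BETA))
    d_plus = math.log(p_T / (strike * p_T0)) / eta + 0.5 * eta
    return p_T * norm_cdf(d_plus) - strike * p_T0 * norm_cdf(d_plus - eta)

def coupon_bond_at_expiry(r):
    """Value at T0 of the remaining cash flows if the short rate then is r."""
    return sum(cf * zcb(r, T - T0) for T, cf in CASH_FLOWS)

def solve_r_K(lo=-0.5, hi=1.0):
    """Bisection: the coupon bond value is decreasing in the short rate."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if coupon_bond_at_expiry(mid) > STRIKE:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_K = solve_r_K()
option_value = sum(cf * zcb_call(T, zcb(r_K, T - T0)) for T, cf in CASH_FLOWS)
print(f"r_K = {r_K:.4%}, coupon-bond call = {option_value:.4f}")
```

The same structure carries over to any one-factor model with a monotone bond price in the short rate; only `zcb` and `zcb_call` would change.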


Example: Valuing a Coupon-Bond Call Option 
with the CIR Model 

We repeat the calculation of the coupon-bond call option when the CIR model is employed for the short rates. The procedure is the same as in the case discussed previously for the Vasicek model. First, we calculate the interest rate r_K such that the present value at the maturity of the option of all later cash flows on the bond equals the strike price. This value is here r_K = 25.05%. Next, we map the strike price into a series of strike prices via (52) that are then associated with coupon payments considered as zero-coupon bonds and calculate the value of the European call options contingent on those zero-coupon bonds. The calculations are described in Table 4.

Table 4 Calculations Using the CIR Model for Separate Zero-Coupon European Call Options (bond prices shown are calculated with the estimated r_K)

Year    p(T_0,T_i) | r_K    Face Value    ρK_i      Call Option
 4.0    0.7934              0.05          0.0397    0.006
 5.0    0.6503              0.05          0.0325    0.010
 6.0    0.5470              0.05          0.0273    0.012
 7.0    0.4694              0.05          0.0235    0.013
 8.0    0.4094              0.05          0.0205    0.013
 9.0    0.3615              0.05          0.0181    0.013
10.0    0.3223              1.05          0.3385    0.267

The value of the call option is 0.334, that is, the sum of all zero-coupon bond call option prices in the last column.


Pricing Swaptions 

Swaptions allow the buyer to obtain, at a future
time, a position in a swap contract. It is
elementary that an interest rate swap, fixed
for floating, can be understood as a portfolio
of bonds. Let us take the notional principal to
be 1. Then the claim on the fixed payments is
the same as a bond paying coupons at the rate ρ
and no principal. Let τ be the time when the
swap is initiated. The claim on the fixed income
stream is worth, at time τ,

\[ \rho \sum_{i=1}^{m} p(\tau, T_i). \]

The floating income stream is made up of the cash
returns on holding, over each period [T_{i-1}, T_i],
a discount bond with maturity T_i; each such return
is worth 1/p(T_{i-1}, T_i) − 1. Thus, the value
of the whole floating stream at time t = τ is

\[
E_\tau\left[ \sum_{i=1}^{m} e^{-\int_\tau^{T_i} r(s)\,ds}\,
  \frac{1 - p(T_{i-1}, T_i)}{p(T_{i-1}, T_i)} \right]
= E_\tau\left[ \sum_{i=1}^{m} e^{-\int_\tau^{T_{i-1}} r(s)\,ds}\,
  e^{-\int_{T_{i-1}}^{T_i} r(s)\,ds}\,
  \frac{1 - p(T_{i-1}, T_i)}{p(T_{i-1}, T_i)} \right]
\qquad (54)
\]



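Applying the properties of conditional expectations
(and taking T_0 = τ, so that p(τ, T_0) = 1), it
follows that the above is equal to

\[
E_\tau\left[ \sum_{i=1}^{m} e^{-\int_\tau^{T_{i-1}} r(s)\,ds}\,
  E_{T_{i-1}}\!\left[ e^{-\int_{T_{i-1}}^{T_i} r(s)\,ds} \right]
  \frac{1 - p(T_{i-1}, T_i)}{p(T_{i-1}, T_i)} \right]
= E_\tau\left[ \sum_{i=1}^{m} e^{-\int_\tau^{T_{i-1}} r(s)\,ds}
  \bigl(1 - p(T_{i-1}, T_i)\bigr) \right]
= \sum_{i=1}^{m} \bigl[ p(\tau, T_{i-1}) - p(\tau, T_i) \bigr]
= 1 - p(\tau, T_m)
\qquad (55)
\]
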


Imposing the condition that the two streams
have equal initial value leads to

\[ \rho \sum_{i=1}^{m} p(\tau, T_i) = 1 - p(\tau, T_m) \]

which is equivalent to

\[ \rho \sum_{i=1}^{m} p(\tau, T_i) + p(\tau, T_m) - 1 = 0 \]
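
The condition above can be illustrated numerically.
The following is a minimal sketch, assuming unit
notional, unit-length accrual periods (as in the
formulas above) and a hypothetical flat 5% continuously
compounded curve; the function name par_swap_rate is
introduced here only for illustration.

```python
import math

def par_swap_rate(discount_factors):
    """Fixed rate rho that equates the fixed and floating legs.

    discount_factors[i] holds p(tau, T_{i+1}) for payment dates T_1..T_m,
    with unit notional and unit accrual periods (an assumption).
    """
    floating_leg = 1.0 - discount_factors[-1]   # telescoped sum, as in (55)
    annuity = sum(discount_factors)             # sum of p(tau, T_i)
    return floating_leg / annuity

# Hypothetical flat 5% continuously compounded curve, dates T_1..T_5.
dfs = [math.exp(-0.05 * t) for t in range(1, 6)]
rho = par_swap_rate(dfs)

# Telescoping check: sum of [p(tau, T_{i-1}) - p(tau, T_i)] with p(tau, T_0) = 1.
previous = [1.0] + dfs[:-1]
telescoped = sum(a - b for a, b in zip(previous, dfs))

# Coupon bond paying rho at each date plus principal 1 at T_m.
coupon_bond = rho * sum(dfs) + dfs[-1]

print(rho, telescoped, coupon_bond)
```

With the fixed rate set this way the coupon bond is
worth par, so the swap has zero value at initiation.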

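It follows then that the value of the swap at
initialization is p(τ, T_1, ..., T_m) − 1, where
p(τ, T_1, ..., T_m) denotes the time-τ price of the
bond paying the coupon ρ at T_1, ..., T_m and the
principal 1 at T_m. Thus, the option to get a long
position in the fixed leg of the swap, with a fixed
payment rate ρ, is worth at time 0

\[
E^{Q}\left\{ e^{-\int_0^{\tau} r(s)\,ds}
  \max\bigl[ p(\tau, T_1, \ldots, T_m) - 1,\, 0 \bigr] \right\}
\qquad (56)
\]

It is clear now that this is the same as a European
call option on a coupon-bearing bond
when the exercise price is equal to 1.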

PRACTICAL CONSIDERATIONS

As mentioned in the introduction, the 10-year 
U.S. Treasury note option traded on the CME 
is an extremely popular contract offering tight 
bid/ask spreads and transparent price quotes. 

The Eurodollar futures option traded on the 
CME is the most actively traded short-term in¬ 
terest rate option in the world. If the option 
contracts are exercised, the buyer and the seller 
of the option take positions in the Eurodollar 
futures contract, which is cash-settled, and the 
final price at delivery is equal to 100 minus the 
three-month US dollar LIBOR. 

Another liquid interest rate derivative market
is the OTC market in floating-rate caps. The
majority of caps are contingent on LIBOR (but can
also be on a Treasury rate), and discounted payments
are made at the beginning of each tenor. The
payments can be made either at the beginning
or the end of each reset period, and the life of
a cap may be only a few years or as long as 10
years. The starting point in pricing these European
options is a model for future changes in
US dollar LIBOR.

Hull and White (1990) showed that the cap 
can be priced as a portfolio of European puts 
on discount bonds. 


KEY POINTS 

• One-factor short-rate models for interest rate 
derivatives are easy to work with since the 
majority of them lead to closed-form solu¬ 
tions for options pricing. However, some of 
them allow for negative interest rates, which 
may not be acceptable in a real trading envi¬ 
ronment. 

• Two-factor models for interest rates provide 
improved calibration at the expense of com¬ 
putational simplicity. 

• The two-factor Hull and White model,
falling under the general Heath-Jarrow-
Morton framework, is rich enough to be
calibrated to market data easily while retaining
computational simplicity through closed-
form solutions for a wide range of interest
rate derivatives.

• The need for improved calibration of forward 
curves led to the development of a different 
class of models called LIBOR models. 

• The calibration of caps and floors, and also 
swaptions, is indicative of the success of an 
interest rate model. 
Abstract: Historically, theorists have devoted a substantial amount of work to developing mathematical
models for pricing options and, as a result, a number of different models exist. All make
certain assumptions about market behavior that are not totally accurate but that give the best
available solution to the price of an option. Professionals use these models to price their own options and to
give a theoretical fair value; however, actual market rates will always be the overriding determinant.
In other words, an option is worth as much as someone is prepared to pay for it. Although the
formulas for pricing options are complex, they are all based on the same principles.


Historically, option-pricing models have fallen 
into two categories: 

• Ad hoc models, which generally rely only 
upon empirical observation or curve fit¬ 
ting and, therefore, need not reflect any of 
the price restrictions imposed by economic 
equilibrium. 

• Equilibrium models, which deduce option 
prices as the result of maximizing behavior 
on the part of market participants. 

The acknowledged basis of modern option 
pricing formulas is the often-quoted Black- 
Scholes formula, devised by Black and Scholes 
(1973) to produce a "fair value" for options on 
equities. Of course, currency options differ be¬ 
cause there is no dividend and both elements of 
the exchange carry interest rates that can be fixed 
until maturity. Therefore, various adaptations 
to the original Black-Scholes formula have been 
made for use in currency option pricing. The 


best known of these is the Garman-Kohlhagen 
adaptation, which adequately allows for the 
two interest rates and the fact that a currency 
can trade at a premium or at a discount forward 
depending on the interest rate differential. 

American-style options cause further problems
in the pricing due to the possibility of
early exercise. Cox, Ross, and Rubinstein (1979)
introduced a pricing model to take account of 
American-style options. By using the same ba¬ 
sics as Black-Scholes, they adopted what is now 
known as the "binomial" method for pricing 
such options. This same binomial model is now 
used alongside the Garman-Kohlhagen version 
to price currency options. 

BASIC PROPERTIES 

First, though, there are a few basic properties
of options to consider, especially when looking
at option prices:

• Options cannot have a negative value to their
holders. Since options are rights and these 
rights will be exercised to benefit only the 
holder, the option cannot be a liability to its 
holder. 

• Option prices should not allow simple arbitrage;
that is, it should not be possible to buy
an American call or put and immediately exercise
the option for a payoff greater than the
price paid for the option. This need not be true
for European options, since the option holder
does not have the right to exercise until the
maturity date.

• American-type options should be worth at 
least as much as European-type options. 
Since American options have all the rights a 
European option has plus the right of early ex¬ 
ercise, an American option will be as valuable 
as a European option if the right to early ex¬ 
ercise is worthless and more valuable than a 
European option if the right of early exercise 
is valuable. 

In addition to the currency price, the exercise 
price, and the time to maturity, option values 
depend on the price volatility of the underlying 
currency, the risk-free rate of interest, and any 
cash distributions made by the currency during 
the life of the option. For a call option, a higher 
current currency price should imply a greater 
value to the option holder. This is because a 
higher present currency price makes it more 
likely that on the expiration date, the market 
price of the currency will be above the exercise 
price. As this is precisely the condition under 
which the option will be exercised, the value of 
a call option increases as the present currency 
price increases. For put options, however, the 
effects of changes in the current asset price go 
in the opposite direction, as it pays the holder 
of the put to exercise when the currency price is 
low; that is, the value of a put option decreases 
as the present currency price increases. 

The effect of the exercise price, X, on the value 
of the call option is straightforward. Holding all 


other factors constant, a higher exercise price 
diminishes the profit from the exercise of the 
option. An increase in the exercise price would, 
therefore, lead to a decrease in the price of the 
call option. In the case of put options, a higher 
exercise price increases the profit from exercise 
of the option. Thus, put option prices increase 
with an increase in their exercise price. 

The effect of an increase in time to maturity 
on the value of an option depends on the nature 
and type of option. There is an asymmetric na¬ 
ture to option contracts that causes the holder to 
benefit from increased uncertainty. The option 
holder stands to gain by a rise in uncertainty, 
and therefore the value of the call option in¬ 
creases as its time to maturity increases. Also, 
the present value of the exercise price decreases 
as the time to maturity increases. Therefore, the 
time left to maturity has a way of influencing 
option values. An American put option cannot 
logically decrease in value with an increased 
time to maturity, but with a European put op¬ 
tion, the net effect of these two influences is am¬ 
biguous; that is, increased uncertainty increases 
value, while the decreased present value of the 
exercise price decreases value. 

An increase in the volatility of the currency 
price makes future currency prices more vari¬ 
able and increases the probability of large gains. 
Again, the asymmetry of the option contract al¬ 
lows the option holder to benefit from increased 
uncertainty since the option is effectively in¬ 
sured against downside risk. 


THEORETICAL VALUATION 

The price and subsequent value of an option are 
determined by a theoretical valuation based on 
several known and estimated factors. The time 
until maturity, the current foreign exchange 
spot and forward exchange prices, the strike, 
and the cost of funding the option premium are 
all readily available. Meanwhile, a market has 
developed that estimates the future volatility 
or, in other terms, the activity of the underlying
cash product. The greater the anticipated move¬ 
ment, the greater the value of the option for a 
given fixed set of parameters. Options also in¬ 
crease in value the smaller the distance between 
the strike price and the forward foreign exchange 
rate, and the greater the time to maturity. For 
European and American options, most market 
participants accept the valuation put forward 
by Black and Scholes, and, as such, option prices 
can be agreed once the factors are entered into 
the equation. 

This theoretical model also calculates the risk 
associated with changes in any of the variables 
required for pricing the currency option. The 
delta, or hedge ratio, of the option is the de¬ 
gree to which the option value will change 
with a movement in the underlying currency. 
A dollar/Swiss franc option with a 20% delta 
would change in value by approximately 20 
franc points for every 100-point spot move. 
While the delta is the first derivative of the 
price, gamma is the second one, or change in 
delta for every move in the spot foreign ex¬ 
change rate. A 50-delta dollar call option with a 
15% gamma would have a 65 delta if the dollar 
appreciated 1%. 
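
The two numerical examples above amount to first-order
arithmetic, which can be sketched as follows. The
function names are illustrative, and gamma is taken, as
in the text, to be quoted in delta points per 1% move
in spot.

```python
def option_value_change(delta_pct, spot_move_points):
    """First-order change in option value (in points) for a given spot move."""
    return delta_pct / 100.0 * spot_move_points

def delta_after_move(delta_pct, gamma_per_1pct, spot_move_pct):
    """Delta after a spot move, with gamma quoted in delta points per 1% move."""
    return delta_pct + gamma_per_1pct * spot_move_pct

# A 20% delta option and a 100-point spot move.
value_change = option_value_change(20, 100)

# A 50-delta call with a 15% gamma after a 1% appreciation.
new_delta = delta_after_move(50, 15, 1)

print(value_change, new_delta)
```

Both results are linear approximations; for larger
moves the gamma itself changes, so the figures are
indicative only.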

It is this dynamic nature of the delta that 
allows an option to be a leveraged product 
with limited risk and unlimited profit poten¬ 
tial. Profitable positions effectively grow in size, 
while unprofitable trades are impacted less by 
adverse changes in the market. 

The vega, or volatility risk, of an option is the
extent to which the valuation will change with 
varying estimates of volatility. The theta, or time 
decay, is the decrease in value of the option as 
it approaches maturity, as an option is a con¬ 
stantly diminishing asset. Finally, every option 
has forward foreign exchange risk equivalent to 
the delta and an interest rate exposure based on 
changes in funding costs. The delta and interest 
rate risks can be hedged easily in the relevant 
markets. The dynamic nature of the other risks 
is the essence of the options market. 


BLACK-SCHOLES MODEL 

In 1973, Black and Scholes published a paper de¬ 
scribing an equilibrium model of stock option 
pricing that is based on arbitrage. This is made 
possible by their crucial insight that it is possible 
to replicate the payoff to options by following 
a prescribed investment strategy involving the 
underlying asset and lending/borrowing. 

The mathematics employed in the Black- 
Scholes model is complex, but the principle is 
straightforward. The model states that the stock 
and the call option on the stock are two com¬ 
parable investments. Therefore, it should be 
possible to create a riskless portfolio by buying 
the stock and hedging it by selling call options. 
The hedge is a dynamic one because the stock 
and the option will not necessarily move by the 
same amount, but by continuously adjusting 
the option hedge to compensate for movement 
in the underlying market, the overall position 
should be riskless. Therefore, the income 
received from investing in the call option 
premium will be offset exactly by the cost of 
replicating (hedging) the position. If the option 
premium is too high, the arbitrageur will make 
a riskless profit by writing call options and 
hedging the underlying stock. If too low, it 
should be possible to profit by buying the call 
option and selling sufficient stock. 

Black and Scholes demonstrated that the op¬ 
tion premium could be arrived at through an 
arbitrage process in a similar manner to that in 
which a currency forward rate can be derived 
through a formula linking the spot rate and the 
interest rate differential. Also, in the same way 
that a currency forward rate is not "what the mar¬ 
ket thinks the currency will be worth at a future 
date" but simply based on an arbitrage relation¬ 
ship, the Black-Scholes model is not influenced 
by such factors as market sentiment, direction, 
or apparent bias. In fact, an assumption of the 
model is that the market moves in a random 
fashion in that, while prices will change, the 
chances of an up move against a down move are 
about even, and that future price movements
cannot be predicted from the behavior of the 
past. 


The Model:

\[ C = S\,N(d_1) - K e^{-rt} N(d_2) \]

C = theoretical call premium
S = current stock price
t = time until option expiration
K = option striking price
r = risk-free interest rate
N = cumulative standard normal distribution
e = exponential term (2.7183)

\[ d_1 = \frac{\ln(S/K) + \left(r + \frac{s^2}{2}\right) t}{s\sqrt{t}} \]

\[ d_2 = d_1 - s\sqrt{t} \]

s = standard deviation of stock returns
ln = natural logarithm
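
As an illustration, the formula can be evaluated
directly. The sketch below uses the symbols defined
above; the input values (a stock at 100, strike 100,
5% interest rate, 20% volatility, one year to expiry)
are hypothetical and chosen only to exercise the
formula.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Cumulative standard normal distribution N(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, r, s, t):
    """Theoretical call premium C = S N(d1) - K exp(-rt) N(d2)."""
    d1 = (log(S / K) + (r + 0.5 * s * s) * t) / (s * sqrt(t))
    d2 = d1 - s * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

# Hypothetical inputs, for illustration only.
call = black_scholes_call(S=100.0, K=100.0, r=0.05, s=0.20, t=1.0)
print(round(call, 4))
```

The rate r and the volatility s are expressed as
annualized decimals and t in years.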


Plotted over a period of time, the distribution of 
prices takes on the characteristics of the "bell- 
shaped" curve. Such a distribution is a key as¬ 
sumption of the Black-Scholes model, yet with 
the foreign exchange markets in particular, it 
is a questionable one. Even with its economic 
liquidity and its global 24-hour structure, for¬ 
eign exchange is by no means a perfect market. 
Frequently, there are times when prices do not 
behave in a normally distributed fashion. Such 
occurrences as wars, central bank intervention,
and unexpected political or economic news are
all factors that can and do disrupt the day-
to-day business of the market.

Furthermore, in order to simplify the calculation
process, Black and Scholes made other assumptions
about market behavior, which may
vary from the real world. They assumed that 
volatility was known and constant, that in¬ 
terest rates were constant, that there were no 
transaction costs or taxation effects, that trad¬ 
ing was continuous, that there were no divi¬ 
dends payable, and that options could only be 
exercised on the expiry date. 


Interest rates will vary, of course, as will 
volatility, and even the foreign exchange mar¬ 
kets have transaction cost in the bid-offer 
spread. Frequently, the market will become 
very thin or almost untradable during highly 
volatile periods. However, most of these as¬ 
sumptions can be relaxed without inordinately 
affecting the formulations of the pricing model, 
and where the assumptions are more critical, 
other models have been developed. 


EXAMPLES OF OTHER 
MODELS 

Theorists have devoted a substantial amount of 
time and effort developing mathematical mod¬ 
els for pricing options, and a number of differ¬ 
ent models exist as a result. All make certain 
assumptions about market behavior, which are 
not totally accurate, but which give the best so¬ 
lutions to the price of an option. For example, 
the model formulated by Merton (1973) general¬ 
ized the Black-Scholes formula, so it could price 
European options on stocks or stock indexes 
paying a known dividend yield. 

Another example is the Cox, Ross, and Ru¬ 
binstein model (1979), which could account for 
the early exercise provisions in American-style 
options. Using the same parameters as in the 
Black-Scholes model, they adopted what is 
known as a "binomial" method to evaluate the 
premium. Making the assumption that the op¬ 
tion market behaves efficiently and therefore 
the holder of a call or put option will exercise if 
the benefit of holding the option is outweighed 
by the cost of carrying the hedge, the binomial 
process involves taking a series of trial esti¬ 
mates over the life of the option; each estimate 
(or iteration) is a probability analysis of the like¬ 
lihood of early exercise on any given day. 

Garman and Kohlhagen (1983) extended the 
Black-Scholes model to cover the foreign ex¬ 
change market, where they allowed for the 
fact that currency pricing involves two interest 
rates, not one, and that a currency can trade 
at a premium or discount forward, depending
on the interest rate differential. Like the
Merton formula, the Garman and Kohlhagen for¬ 
mula applies only to European options. 

PRICING WITHOUT A 
COMPUTER MODEL 

Notwithstanding all the above theory, there is a
way to price an option without a computer model.
The following equation gives a good approximation
for a European option premium:

Price = BB × forward outright rate

where

AA = square root (days to expiry/365)
× volatility × 0.19947

and

BB = ((AA + 0.5) × 2) − 1

Here, price is the premium for an at-the-money
European option quoted in
units per base currency.
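
The approximation is simple enough to sketch in a few
lines. The inputs below (90 days, 10% volatility, an
outright forward of 1.6985) are hypothetical, and the
volatility is assumed to be entered as a decimal. Note
that BB reduces to 2 × AA, and that 0.19947 is half of
1/√(2π) ≈ 0.39894, so the formula amounts to roughly
0.4 × volatility × √time × forward.

```python
from math import sqrt

def atm_premium_approx(days_to_expiry, volatility, forward_rate):
    """Approximate at-the-money European premium, following the AA/BB formula."""
    aa = sqrt(days_to_expiry / 365.0) * volatility * 0.19947
    bb = ((aa + 0.5) * 2.0) - 1.0
    return bb * forward_rate

# Hypothetical inputs: 90 days, 10% volatility (as a decimal), forward 1.6985.
premium = atm_premium_approx(90, 0.10, 1.6985)

# The same number written as 2 * 0.19947 * volatility * sqrt(time) * forward.
shortcut = 2.0 * 0.19947 * 0.10 * sqrt(90 / 365.0) * 1.6985

print(premium, shortcut)
```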

Educated Guess 

Another calculation relies heavily on probabil¬ 
ity theory. The principal concepts are expected 
value and the lognormal distribution. Since the 
future is unknown, it is an "educated guess" 
about where the spot market might be in order 
to determine the value of that right today. Thus, 
rather than trying to predict the future spot rate, 
option pricing takes a systematic, mathematical 
approach to the educated guess. 

In this case, expected value (EV) is the payoff
of an event multiplied by the probability of its
occurring. For example, the probability of rolling
a six on one die is 1/6 or 16.67%. The EV of a
game that pays $100 for rolling a six and
nothing for any other roll is:

(1/6 x $100) + (5/6 x $0) = $16.67 


where the expected value is the fair price for 
playing such a game. 

An option's premium can be thought of in the
same way, although instead of six possible
outcomes, there are hundreds: all the spot rates
that might prevail at the option's expiration.
Each outcome will have a specific value. This
will either be zero if the option is out-of-the-
money or the difference between the closing
spot and the strike price if the option is in-
the-money. Each closing spot rate can also be
thought of as having its own discrete probability.
If, for each outcome, the value of that outcome
is multiplied by its probability and then
the results are added up, the sum would be the
premium of the option. The expected value of
an option (the probability-weighted
sum of all its possible payoffs) is the fair price
for buying the option.
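
The probability-weighted sum can be sketched
numerically. The example below is a minimal
illustration, assuming the closing spot is lognormally
distributed with mean equal to the forward rate and
ignoring discounting; the grid size, the inputs, and
the function names are assumptions made for the sketch.
A closed-form value under the same assumptions is
included for comparison.

```python
from math import erf, exp, log, pi, sqrt

def expected_call_payoff(forward, strike, vol, t, n=2001, width=6.0):
    """Sum of payoff times probability over a grid of closing spot rates."""
    sd = vol * sqrt(t)
    mean_log = log(forward) - 0.5 * sd * sd      # so that E[spot] = forward
    dz = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -width + i * dz
        prob = exp(-0.5 * z * z) / sqrt(2.0 * pi) * dz   # discrete probability
        spot = exp(mean_log + sd * z)                    # closing spot outcome
        total += prob * max(spot - strike, 0.0)          # payoff x probability
    return total

def closed_form_call_payoff(forward, strike, vol, t):
    """Undiscounted lognormal expectation of max(spot - strike, 0)."""
    sd = vol * sqrt(t)
    d1 = (log(forward / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return forward * cdf(d1) - strike * cdf(d2)

# The die game from the text: $100 for a six, nothing otherwise.
die_ev = (1 / 6) * 100 + (5 / 6) * 0

# Hypothetical at-the-money option: forward 1.6985, 10% volatility, six months.
ev = expected_call_payoff(1.6985, 1.6985, 0.10, 0.5)
cf = closed_form_call_payoff(1.6985, 1.6985, 0.10, 0.5)

print(round(die_ev, 2), ev, cf)
```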

THE PRICE OF AN OPTION 

The price of an option is made up of two sepa¬ 
rate components: 

Option premium = Intrinsic value + Time value 

where intrinsic value is the value of an option 
relative to the outright forward market price, 
that is, it represents the difference between the 
strike price of the option and the forward rate at 
which one could transact today. Intrinsic value 
can be zero but never negative. 

There are six factors that contribute to this 
pricing of an option: 

• Prevailing spot price 

• Interest rate differentials (forward rate) 

• Strike price 

• Time to expiry 

• Volatility 

• Intrinsic value 

As described above, the best-known original
closed-form solution to option pricing is
the Black-Scholes model. Also, as was mentioned,
in its simplest form, it offers a solution to
pricing European-style options on assets without
interim cash payouts over the life of the option.
The model calculates the theoretical, or
fair, value for the option by constructing an
instantaneously riskless hedge, that is, one whose
performance is the mirror image of the option
payout. The portfolio of option and hedge can
then be assumed to earn the risk-free rate of
return.

Central to the model is the assumption that 
markets' returns are normally distributed (that 
is, have lognormal prices), that there are no 
transaction costs, that volatility and interest 
rates remain constant throughout the life of the 
option, and that the market follows a diffusion 
process. The model has these five major inputs: 

• The risk-free interest rate

• The option's strike price

• The price of the underlying

• The option's maturity

• The volatility assumed

Since the first four are usually determined,
markets tend to trade the implied volatility of
the option. For example, a six-month European-
style sterling put/dollar call with the spot rate
at $/£1.7500 and forward points of 515, giving
an outright forward of 1.6985 (1.7500 − 0.0515),
will have an intrinsic value of 4.15 cents per
pound, which corresponds to a strike of 1.7400
(1.7400 − 1.6985 = 0.0415).
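
The intrinsic value in this example can be reproduced
with a short sketch. The strike of 1.7400 used below is
not quoted in the example; it is an assumption, namely
the strike implied by the stated forward of 1.6985 and
intrinsic value of 4.15 cents. The function name is
illustrative.

```python
def intrinsic_value(option_type, strike, reference_rate):
    """Intrinsic value against a reference rate (forward or spot); never negative."""
    if option_type == "call":
        return max(reference_rate - strike, 0.0)
    if option_type == "put":
        return max(strike - reference_rate, 0.0)
    raise ValueError("option_type must be 'call' or 'put'")

spot = 1.7500
forward = spot - 0.0515      # 515 forward points at a discount
strike = 1.7400              # assumed: implied by the 4.15 cent intrinsic value

# Sterling put measured against the forward (European definition).
put_vs_forward = intrinsic_value("put", strike, forward)

# The same put measured against spot.
put_vs_spot = intrinsic_value("put", strike, spot)

print(round(forward, 4), round(put_vs_forward, 4), put_vs_spot)
```

Measured against the forward the put has intrinsic
value, while measured against spot it has none.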

While the Black-Scholes pricing formula looks 
formidable, it is important to understand that 
the formula is nothing more than the simple 
two-state option-pricing model applied with an 
instantaneous trading interval. 

If the strike price of the option is more favor¬ 
able than the current forward price, the option 
is said to be in-the-money. If the strike price is 
equal to the forward rate, it is an at-the-money 
option, and if the strike price is less favorable 
than the outright, the option is termed out-of- 
the-money. 

For American-style options, a similar defini¬ 
tion applies except that the option's "money¬ 
ness" relative to the spot price also needs to be 


considered. Clearly, in the example above, an 
American-style option would be in-the-money 
relative to the forward but not to the spot. Con¬ 
versely, if the option had the same details except 
that it was a call on sterling, it would clearly be 
out-of-the-money under the European defini¬ 
tion, but as an American style option it would 
be in-the-money relative to the spot price. Nat¬ 
urally, the cost of the option would need to be 
considered in order to achieve a profitable early 
exercise of an American option and this leads to 
a phenomenon peculiar to American-style op¬ 
tions known as "optimal exercise." This is the 
point at which it becomes profitable to exercise 
an American-style option early, having taken 
account of the premium paid. 


Option Premium Profile 

Figure 1 shows premium against spot at a given
point in time. It can be seen that the time value
of the call position is greatest when the option
is at-the-money. This is because it represents the
highest level of asymmetric risk, that is, the
optimum risk-reward profile.

The time value tends to zero as spot goes deep 
out-of-the-money and thus converges with the 
maximum loss expiry line and also as it goes 
deep in-the-money, converging with the un¬ 
limited profit expiry line. The change in the 
premium is not parallel to the change in the 
underlying value. The premium will change 
more rapidly when the option is near at-the- 
money and less rapidly when the option is in- 
the-money or out-of-the-money. 
Figure 2 Time Value Premium Decay (horizontal axis:
time remaining until expiration, in months)

Time Value and Intrinsic Value

The option premium can be split into two parts:
intrinsic value and time value. The effect of an
increase in time on an option's premium is not
linear. This is because the probability of a rise or 
fall in a currency's value does not increase on a 
straight-line basis. For example, all things being 
equal, the premium for an at-the-money three- 
month option is worth only about two-thirds 
more than for a one-month option (not three 
times its value). A one-year option is worth only 
about one-third more than a six-month option 
(instead of twice its value). As a consequence, 
the premium for at-the-money options declines 
at an accelerating rate towards expiry. Figure 2
demonstrates the time value premium decay.
Time value is affected by a number of factors: 

• The time remaining to expiration. 

• The volatility of the underlying spot market. 

• The strike price of the option. 

• The forward rate of the currency pair. 

• The current interest rates. 

Time to Expiry 

The time value of an option is related to the time
remaining in the option; in fact, for an at-the-
money option it is approximately proportional to
the square root of the time remaining.
The reason for this phenomenon is twofold:

1. The longer the time to maturity, the greater is 
the chance that the exchange rate moves such 
that the option will be exercised. The rate at 
which the premium diminishes as the option 
approaches expiry is called the "time decay" 


and the rate of decay is exponential, that is, 
the option loses time value more quickly ap¬ 
proaching expiry than it does earlier in its 
life. At expiry, the option will have only in¬ 
trinsic value and no time value. 

2. The time value can be thought of as "risk 
premium" or the cost to the writer of hedging 
the uncertainty of exercise. 
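
The square-root relationship can be illustrated with a
minimal sketch. It assumes a pure square-root-of-time
rule for at-the-money time value and ignores interest
rate effects; the names are illustrative. Under this
rule the ratios come out at roughly 1.73 and 1.41,
close to but not identical with the rounded figures
quoted earlier.

```python
from math import sqrt

def atm_time_value_ratio(t_long, t_short):
    """Ratio of at-the-money time values under a pure square-root-of-time rule."""
    return sqrt(t_long / t_short)

three_vs_one_month = atm_time_value_ratio(3, 1)
twelve_vs_six_month = atm_time_value_ratio(12, 6)

# Time value remaining (relative to a 12-month option) as expiry approaches,
# and the amount lost in each successive month.
remaining = [sqrt(m / 12.0) for m in range(12, -1, -1)]
monthly_loss = [a - b for a, b in zip(remaining, remaining[1:])]

print(three_vs_one_month, twelve_vs_six_month)
print([round(x, 3) for x in monthly_loss])
```

The monthly losses grow as expiry approaches, which is
the accelerating decay described above.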

Volatility 

In essence, volatility is a measure of the vari¬ 
ability (but not the direction) of the price of the 
underlying instrument, essentially the chances 
of an option's being exercised. It is defined as 
the annualized standard deviation of the natu¬ 
ral log of the ratio of two successive prices. 

Historical volatility is a measure of the stan¬ 
dard deviation of the underlying instrument 
over a past period and is calculated from actual 
price movements by looking at intraday price 
changes and comparing this with the average 
(the standard deviation). The calculation is not 
affected by the absolute exchange rates, merely 
the change in price involved. Thus, for example, 
the starting and finishing points for two sepa¬ 
rate calculations could be exactly the same but 
could give two very different levels of volatility 
depending on how the exchange rate traded in 
between. Thus, if the market has traded up and 
down erratically, the reading will be high, and if 
instead it has gradually moved from one point 
to the other in even steps, then the reading will 
be lower. 
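
A minimal sketch of the calculation follows. It assumes
daily closing rates and 252 trading days per year for
the annualization, neither of which is specified in the
text, and the two price paths are invented so that they
share the same starting and finishing points.

```python
from math import log, sqrt
from statistics import stdev

def historical_volatility(prices, periods_per_year=252):
    """Annualized standard deviation of log ratios of successive prices."""
    log_returns = [log(b / a) for a, b in zip(prices, prices[1:])]
    return stdev(log_returns) * sqrt(periods_per_year)

# Two hypothetical paths with identical start (1.50) and finish (1.55).
steady = [1.50, 1.51, 1.52, 1.53, 1.54, 1.55]
erratic = [1.50, 1.56, 1.49, 1.57, 1.50, 1.55]

vol_steady = historical_volatility(steady)
vol_erratic = historical_volatility(erratic)

print(vol_steady, vol_erratic)
```

The path that trades up and down erratically gives the
higher reading, although both paths cover the same
distance overall.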

Implied volatility is the volatility implied in the 
price of an option, that is, the volatility that 
is used to calculate an option price. Implied 
volatilities rise and fall with market forces and 
tend to reflect the level of activity anticipated 
in the future although supply and demand 
can at times be dominant factors. In the pro¬ 
fessional interbank market, two-way volatility 
prices are traded according to market percep¬ 
tion and these volatilities are converted into 
premium using option models. Implied volatil¬ 
ity is the only variable affecting the price of an 
option that cannot be directly observed in the
markets, thus leading to the typical variations 
in price inherent in any marketplace. 
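
Converting between a premium and an implied volatility
can be sketched as follows. This is an illustration
only: it uses the Black-Scholes call formula given
earlier rather than a currency-specific model, a simple
bisection search chosen for clarity, and hypothetical
inputs.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, s, t):
    """Black-Scholes call premium."""
    d1 = (log(S / K) + (r + 0.5 * s * s) * t) / (s * sqrt(t))
    d2 = d1 - s * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

def implied_volatility(price, S, K, r, t, lo=1e-6, hi=5.0, tol=1e-8):
    """Volatility at which the model premium matches the quoted price (bisection)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, t) > price:
            hi = mid      # model premium too high: volatility must be lower
        else:
            lo = mid      # model premium too low: volatility must be higher
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip on hypothetical inputs: price at 20% volatility, then recover it.
quoted = bs_call(100.0, 100.0, 0.05, 0.20, 1.0)
recovered = implied_volatility(quoted, 100.0, 100.0, 0.05, 1.0)

print(quoted, recovered)
```

Bisection works here because the call premium increases
with volatility, so there is a single volatility that
matches any attainable premium.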

Actual volatility is the actual volatility that oc¬ 
curs during the life of an option. It is the differ¬ 
ence between the actual volatility experienced 
during delta hedging and the implied volatil¬ 
ity used to price an option at the outset, which 
determines if a trader makes or loses money on 
that option. 

In summary, implied volatility is a timely 
measure, in that it reflects the market's per¬ 
ceptions today. On the other hand, historical 
volatility is a retrospective measure of volatil¬ 
ity. This implies that it reflects how volatile the 
variable has been in the recent past. But it has 
to be remembered that it is a highly objective 
measure. Implied volatilities can be biased, es¬ 
pecially if they are based upon options that are 
traded in a market with very little liquidity. 
Also, historical volatility can be calculated for 
any variable for which historical data is tracked. 

Volatility affects the time value or risk pre¬ 
mium of an option, as an increase in volatility 
increases the time value and thus the price of the 
option. Likewise, a decrease in volatility lowers 
the price of the option. For example, consider 
the position of the writer of an option, whereby, 
say, a bank sells an option to a client, giving 
the client the right to purchase dollars and sell 
Swiss francs in three months' time. In order to 
correctly hedge the position, consider what will 
happen in three months' time. 

If the spot is above the strike price of the 
option, the client will exercise the option and 
the bank will be obliged to sell dollars and buy 
francs. However, if the spot is below the strike 
price, the client will allow the option to lapse. 
Hence, the bank's initial hedge for the option 
will be to purchase a proportion of dollars in the 
spot market against this potential short dollar 
position. If the spot subsequently rises, the like¬ 
lihood of the option's being exercised will in¬ 
crease and so the initial hedge will be too small. 
Therefore, the bank will need to buy some more 
dollars, which it does at a rate worse than the 


original rate at which the option was priced, 
thereby losing money. Conversely, if the spot 
rate falls, this makes the option less likely to be 
exercised and the bank will then find itself hold¬ 
ing too many dollars and will have to sell them 
out at a lower price than where they were pur¬ 
chased, thus losing more money. These losses 
are called "hedging costs," and each time the 
spot market moves, the rehedging required will 
lose the bank money. In essence, the premium 
received by the writer is effectively the best es¬ 
timate of these hedging costs over the life of the 
option. 

Strike Price and Forward Rates 

An option's time value is greatest when the 
strike price is at-the-money and the further 
in or out-of-the-money the strike price is, the 
lower the time value is. This can be explained 
by again considering the hedging costs. If the 
option is originally at-the-money, it is said to 
have a 50 delta and therefore the initial hedge 
will be to buy or sell half the principal amount
of the option. The delta of the option can be
thought of as the probability of exercise and so
a 50 delta gives a 1-in-2 chance of exercise, that
is, maximum uncertainty. As the spot moves, 
the delta will change and require readjusting 
of the hedge in the spot market. The change in 
delta (or gamma) is greater for a 50-delta option 
than for an option with a much higher or lower 
delta, for example 80- or 20-delta. This is be¬ 
cause the likelihood of exercise, and therefore 
the amount of hedge required, changes more 
rapidly. Thus, less readjustment is required for 
these high and low delta options, and conse¬ 
quently, fewer hedging costs are incurred for 
the low and high delta options. This leads to 
lower levels of risk premium or time value for 
in-the-money and out-of-the-money options. 
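
The pattern described above can be checked numerically. The following is a minimal sketch, assuming the Garman-Kohlhagen formula for a European call and measuring intrinsic value against the forward, discounted to today. The function names and all input values are illustrative assumptions, not figures from the text.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def time_value(spot, strike, t, vol, r_dom, r_for):
    """Premium less intrinsic value, with intrinsic value measured against the forward."""
    forward = spot * exp((r_dom - r_for) * t)
    intrinsic = max(forward - strike, 0.0) * exp(-r_dom * t)
    return gk_call(spot, strike, t, vol, r_dom, r_for) - intrinsic


# Illustrative inputs (assumptions, not market data).
spot, t, vol, r_dom, r_for = 1.75, 0.25, 0.10, 0.05, 0.04
forward = spot * exp((r_dom - r_for) * t)
for ratio in (0.92, 0.96, 1.00, 1.04, 1.08):
    strike = ratio * forward
    tv = time_value(spot, strike, t, vol, r_dom, r_for)
    print(f"strike/forward {ratio:.2f}: time value {tv:.5f}")
```

Under these assumptions the printed time value peaks where the strike equals the forward and tapers off on either side.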

Interest Rates 

The currency interest rate is another factor that 
affects option premiums. As premium is usually
paid up front, it must be discounted to take
account of the interest that would be earned
by putting the premium on deposit. Thus, the 
higher the domestic interest rate, the greater the 
discounting effect on the premium. 

The effect of interest rate differential on the 
option premium is not intuitively obvious, yet 
it is one of the most important components of 
the premium for a currency option. If the dol¬ 
lar interest rate rises in relation to the interest 
rate of the foreign currency, the premium of 
a currency call option will increase in value. 
This is because holding a foreign currency and 
buying a currency call option are alternative in¬ 
vestments. On the one hand, the investor will 
sell (borrow) dollars and buy (invest in) a for¬ 
eign currency in order to take advantage of a 
rise in that foreign currency. On the other hand, 
the trader could just simply buy a currency call 
option. If the dollar interest rate rises, the cost 
of borrowing dollars will increase, which will 
make the alternative of buying a currency call 
option more attractive. Consequently, the pre¬ 
mium will rise. 

This can equally be explained in terms of the 
forward value of a currency. If the dollar in¬ 
terest rate rises in relation to the foreign cur¬ 
rency interest rates, and the spot rate remains 
the same (unchanged), covered interest rate ar¬ 
bitrage will ensure that the forward rate of the 
foreign currency will rise relative to the spot. 
Therefore, the call option on that currency will 
also rise in value. Of course, the dollar interest
rate might remain the same, but the interest rate
of the foreign currency might fall. The effect on
the interest rate differential, and therefore on the
value of the currency call option, is the
same: the premium will rise.

The converse is true for currency put options, 
because an increase in the dollar interest rate 
in relation to the foreign currency interest rate 
will, given no change in the spot price, result in 
a rise in the forward value of the currency. Thus, 
the holder of a put option on the currency will 
see the premium fall. Buying a currency put option
is an alternative strategy to borrowing in
that currency and investing in dollars. Hence, a
rise in the dollar interest rate or a fall in the foreign
currency interest rate makes the put option
strategy less attractive, and the put premium
will fall.

The effect of interest rate differential changes 
on currency option premiums can be summa¬ 
rized as follows: 

• Assuming the spot rate remains unchanged,
a rise in dollar interest rates relative to the 
foreign currency interest rate, or a fall in the 
foreign currency interest rate relative to the 
dollar interest rate, will increase the premium 
for a currency call option and decrease the 
premium for a currency put option. 

• Assuming the spot rate remains the same, a 
fall in the dollar interest rate relative to the 
foreign currency interest rate, or a rise in 
the foreign currency interest rate relative to 
the dollar interest rate, will decrease the pre¬ 
mium for a currency call option and increase 
the premium for a currency put option. 
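
The two rules above can be illustrated with a short calculation. This is a sketch only, assuming Garman-Kohlhagen pricing written in terms of the forward rate implied by covered interest parity. The dollar is treated as the domestic currency, and every rate and price used is a made-up input.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def forward_rate(spot, t, r_dom, r_for):
    """Covered interest parity: forward price of one unit of foreign currency."""
    return spot * exp((r_dom - r_for) * t)


def gk_price(spot, strike, t, vol, r_dom, r_for, is_call):
    """Garman-Kohlhagen premium for a European currency call or put."""
    forward = forward_rate(spot, t, r_dom, r_for)
    d1 = (log(forward / strike) + 0.5 * vol ** 2 * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    sign = 1.0 if is_call else -1.0
    return sign * exp(-r_dom * t) * (
        forward * norm_cdf(sign * d1) - strike * norm_cdf(sign * d2)
    )


# Illustrative scenarios: spot is held fixed, only the rates change.
spot, strike, t, vol = 1.75, 1.75, 0.5, 0.10
scenarios = {
    "base (3% / 3%)": (0.03, 0.03),
    "dollar rate up (5% / 3%)": (0.05, 0.03),
    "foreign rate down (3% / 1%)": (0.03, 0.01),
}
for label, (r_dom, r_for) in scenarios.items():
    fwd = forward_rate(spot, t, r_dom, r_for)
    call = gk_price(spot, strike, t, vol, r_dom, r_for, True)
    put = gk_price(spot, strike, t, vol, r_dom, r_for, False)
    print(f"{label}: forward {fwd:.4f}, call {call:.5f}, put {put:.5f}")
```

In both non-base scenarios the forward rises relative to spot, the call premium increases, and the put premium decreases, which matches the first rule. Reversing the rate changes produces the second.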

American versus European

For European options, intrinsic value is the 
value of an option relative to the outright for¬ 
ward market price; that is, it represents the 
difference between the strike price of the op¬ 
tion and the forward rate at which one could 
transact today. Intrinsic value can be zero but 
is never negative. If the strike price of the op¬ 
tion is more favorable than the current forward 
price, the option is in-the-money. If the strike 
price is equal to the forward rate, the option is 
at-the-money and if the strike price is less fa¬ 
vorable than the outright forward, the option is 
out-of-the-money. 

A similar definition applies for American- 
style options, except that the option's 
"moneyness" relative to the spot price also 
needs to be considered. Naturally, the cost of 
the option needs to be considered in order to 
achieve a profitable early exercise and this leads 
to a phenomenon peculiar to American options 
known as optimal exercise. This is the point 
at which it becomes profitable to exercise an
American option early, having taken account of
the premium paid. 

In fact, there are several occasions when it 
would be better to pay extra premium and buy 
a more expensive American-style option. For 
example: 

1. When a trader is buying an option where the 
call currency has the higher interest rate and 
there is an expectation that the interest rate 
differential will widen significantly. 

2. When a trader is buying an option where 
the interest rates are close to each other and 
there is an expectation that the call inter¬ 
est rate will move above the put interest 
rate. 

3. When a trader is buying an out-of-the-money 
option with interest rates as in both of the 
above and there is an expectation for it 
to move significantly into the money, then 
the American-style option is more highly 
leveraged and will hence produce higher 
profits. 


THE GREEKS 

Traders extensively use the "Greeks," a set of 
factor sensitivities, to quantify the exposure of 
portfolios that contain options. Each measures 
how the portfolio's market value should re¬ 
spond to a change in some variable. For specu¬ 
lative purposes, the value of an option needs to 
be known on a continual basis, and more impor¬ 
tantly, the factors that change an option's value 
need to be understood. In analyzing an option's
risk (or value), the market norm is to use letters
of the Greek alphabet. Not surprisingly, they
are often referred to as the "Greeks," and they
include delta, vega/kappa, theta, gamma, and
rho. Vega, however, is not a letter of the Greek
alphabet; it is named after a star in the constellation
Lyra, and it is sometimes referred to
as kappa instead. Four of the five are risk metrics.
The exception is theta, because the passage
of time is certain and thus entails no risk.


These major Greeks, which measure these
risks and need to be taken into account before
taking any option positions, are:

Greek        Measures
Vega/Kappa   the impact of a change in volatility
Theta        the impact of a change in time remaining
Delta        the impact of a change in the price of the underlying
Gamma        the rate of change in delta
Rho          the sensitivity to an applicable interest rate

Delta 

When option traders sell or buy a currency op¬ 
tion, they will use the foreign exchange market 
to hedge the exposure. The most common type 
of hedging is delta hedging. 

Delta is the change in premium per change 
in the underlying. Technically, the underlying 
is the forward outright rate but as the option¬ 
pricing model assumes constant interest rates, 
this is often calculated using spot. For example, 
if an option has a delta of 25 and spot moved 
100 basis points, then the option price gain/loss 
would be 25 basis points. For this reason, delta is 
sometimes thought of as representing the "spot- 
sensitive" amount of the option. 

Also, delta can be thought of as the estimated 
probability of exercise of the option. As the 
option-pricing model assumes an outcome pro¬ 
file based around the forward outright rate, an 
at-the-money option has a delta of 50%. It falls 
for out-of-the-money options and increases for 
in-the-money options, but the change is non¬ 
linear, in that it changes much faster when the 
option is close-to-the-money. 

Turning to calculus for the formal definition
of delta, let 0 be the current time. Let \({}^0p\) and
\({}^0s\) be current values for the portfolio and underlier.
Delta is the first partial derivative of a
portfolio's value with respect to the value of the
underlier:

\[ \text{delta} = \frac{\partial\,{}^0p}{\partial\,{}^0s} \]

This technical definition leads to an approximation
for the behavior of a portfolio.

\[ \Delta\,{}^0p \approx \text{delta}\,\Delta\,{}^0s \]

where \(\Delta\,{}^0s\) is a small change in the underlier's
current value, and \(\Delta\,{}^0p\) is the corresponding
change in the portfolio's current value. This is
called the delta approximation.
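
As a rough illustration of the delta approximation, the sketch below estimates delta by bumping spot up and down and repricing, then compares the approximated change in value with the repriced change. It assumes Garman-Kohlhagen pricing and takes spot as the underlier. The helper names and inputs are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def delta_by_bump(spot, strike, t, vol, r_dom, r_for, bump=1e-4):
    """Central-difference estimate of the first partial derivative with respect to spot."""
    up = gk_call(spot + bump, strike, t, vol, r_dom, r_for)
    down = gk_call(spot - bump, strike, t, vol, r_dom, r_for)
    return (up - down) / (2.0 * bump)


# Illustrative inputs; the strike is set at the forward (at-the-money).
spot, t, vol, r_dom, r_for = 1.75, 0.25, 0.10, 0.05, 0.04
strike = spot * exp((r_dom - r_for) * t)
delta = delta_by_bump(spot, strike, t, vol, r_dom, r_for)

move = 0.0050  # a 50-point move in spot
actual = gk_call(spot + move, strike, t, vol, r_dom, r_for) - gk_call(
    spot, strike, t, vol, r_dom, r_for
)
approx = delta * move
print(f"delta {delta:.4f}, repriced change {actual:.6f}, delta approximation {approx:.6f}")
```

With these inputs the delta comes out close to 50%, and the approximation tracks the repriced change closely because the move is small.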

An option is said to be delta hedged if a po¬ 
sition has been taken in the underlying in pro¬ 
portion to its delta. For example, if one is short 
a call option on an underlying with a face value 
of $1 million and a delta of 0.25, a long position
of $250,000 in the underlying will leave
one delta neutral with no exposure to changes 
in the price of the underlying, but only if these 
are infinitesimally small. 

As the underlying market moves throughout 
the life of the option, the delta will change, thus 
requiring the underlying hedge to be adjusted. 
Once the initial hedge has been transacted, calls 
and puts behave in precisely the same way, in 
terms of the hedging required. 

For example, an at-the-money sterling call/ 
dollar put option in £10 million, with a strike 
price of 1.75, has an initial delta of 50. The op¬ 
tion writer, therefore, buys £5 million in the 
spot market to hedge the option position. If the 
spot rises to 1.77, the delta will increase to, say, 
60. Now, the writer needs to purchase an extra 
£1 million to attain delta neutrality. If the ex¬ 
change rate then falls back again to the origi¬ 
nal rate, the option writer is overhedged and 
requires selling back £1 million in order to re¬ 
main delta neutral. Clearly, as the option writer 
rehedges, losses will be incurred, which will in¬ 
crease as volatility increases. 
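
The loss on this round trip is simple arithmetic, sketched below. The function name is hypothetical, and the calculation covers only the extra sterling bought at the higher rate and sold back at the lower one, ignoring dealing spreads and funding.

```python
def round_trip_loss(notional, delta_before, delta_after, rate_low, rate_high):
    """Dollar loss from buying the extra hedge at rate_high and selling it back at rate_low."""
    extra_hedge = notional * (delta_after - delta_before)
    return extra_hedge * (rate_high - rate_low)


# Figures from the sterling call/dollar put example: delta moves from 50 to 60
# as spot rises from 1.75 to 1.77, then spot returns to 1.75.
loss = round_trip_loss(10_000_000, 0.50, 0.60, 1.75, 1.77)
print(f"extra hedge: {10_000_000 * 0.10:,.0f} sterling, loss on round trip: {loss:,.0f} dollars")
```

Each such swing in spot costs the writer a similar amount, which is why the accumulated loss grows with volatility.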

Another example could be where a trader 
sells a dollar call/Swiss franc put at 1.5500 for 
six months for $10 million. The trader's risk is 
that in six months, the option will be exercised 
and there will be a payout of dollars and a re¬ 
ceipt of francs. The trader's hedge against this 
risk would therefore be to buy dollars and sell 
francs, thus hedging the delta amount because 
this represents the likelihood of exercise. If spot
is 1.5300 and the forward outright is 1.5345, then
the trader's hedging, ignoring time movement, 
would look like that shown in the following 
table, as the forward rate changes: 


Forward   Delta   Hedge               Total
1.5345    35      Buy $3.5 million    +$3.5 million
1.5500    50      Buy $1.5 million    +$5.0 million
1.5600    57      Buy $0.7 million    +$5.7 million
1.5200    30      Sell $2.7 million   +$3.0 million


Whether or not the trader loses money will de¬ 
pend on volatility. From the preceding table, it 
can be seen that hedging a short option position 
loses money, as the trader would be continually 
buying high and selling low. However, when 
the option was first sold, the trader received 
a premium for it, representing the estimated 
cost of hedging to the trader. If the market is
more volatile than the trader expected, so that
the hedge has to be adjusted more often, then
the trader may lose more money hedging than
originally gained on the premium. If, however,
the market is less volatile than the assumption 
of the option price, the trader should lose less 
money hedging than received in premium and 
therefore make a profit overall. 

If the trader had bought the option rather than 
sold it, the trader would then hope for increased 
volatility because the hedging exercise would 
be making money. 

For example, the trader buys exactly the same
option, a dollar call/Swiss franc put at 1.5500
in $10 million. The trader's risk is now that there
will be a long dollar position in six months, so 
the hedge will be to sell dollars and buy francs. 
As the forward outright rate moves, however, 
the delta of the option will move in exactly the 
same way as before. This follows as the option 
is the same and the delta does not depend on 
who owns the option. In this case, therefore, the 
trader will be buying low and selling high and 
making money on the hedging. Just as before, 
this makes sense, as the trader originally paid 
out a premium to buy the option, so the hedging
is making back that premium. This time,
the trader has bought volatility and hopes that
volatility will in fact be higher than the level
at which the option was bought. If it is, the
trader will make more money hedging than was 
paid out in premium. 
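
The mirror-image behavior of the two hedges can be sketched with the forward rates and deltas from the dollar call/Swiss franc put example. The code below is an illustration under simplifying assumptions: time is ignored as in the text, the helper names are hypothetical, and the remaining dollar position is marked at the last forward rate. It measures the hedge leg only, not the option itself.

```python
def hedge_trades(notional, deltas, short_option=True):
    """Dollar amounts to buy (+) or sell (-) at each step to stay delta neutral."""
    sign = 1.0 if short_option else -1.0
    trades, position = [], 0.0
    for delta in deltas:
        target = sign * notional * delta / 100.0
        trades.append(target - position)
        position = target
    return trades


def hedging_pnl(forwards, trades):
    """Franc result of the hedge trades, marking the final dollar position
    at the last forward rate (a simplifying assumption)."""
    cash = -sum(trade * rate for trade, rate in zip(trades, forwards))
    position = sum(trades)
    return cash + position * forwards[-1]


forwards = [1.5345, 1.5500, 1.5600, 1.5200]
deltas = [35, 50, 57, 30]
notional = 10_000_000

for label, is_short in (("short option", True), ("long option", False)):
    trades = hedge_trades(notional, deltas, short_option=is_short)
    pnl = hedging_pnl(forwards, trades)
    print(label, [round(x / 1e6, 1) for x in trades], f"hedge P&L {pnl:,.0f} francs")
```

The option writer's hedge trades match the amounts bought and sold in the example and lose money on this path, while the option buyer's hedge trades are the exact opposite and gain the same amount.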

Hence, buying and selling volatility is like any 
other product in that there is a wish to buy at 
a low rate and sell at a higher rate to make a 
profit. 

As another example, consider a short position in a sterling
call struck at 1.8100 and sold for 342 points. The loss
profile corresponds to the loss profile on a short 
sterling cash position. Thus, a hedge on a short 
sterling call position would be to buy sterling 
cash. The value of the option will go up with 
sterling going up, but it is not a one-to-one re¬ 
lationship. 

The delta ratio indicates the increase in value 
of the option for every increase in value of one 
point on the cash market. Thus, the following 
rules on delta can be established. On a call op¬ 
tion, delta will range from 0% when out-of-the- 
money to 50% at-the-money to 100% when deep 
in-the-money. Conversely, the delta of a put op¬ 
tion goes from 0% when out-of-the-money to 
-50% at-the-money to -100% when deep in-the- 
money. 

In the preceding example, the delta of the op¬ 
tion is, say, 45%, which means that to hedge the 
position, an amount of sterling of 45% of the 
face value of the option will have to be bought. 
Therefore, if the option is for £1 million, a move 
up of 50 points on the rate will result in a loss 
of: 


£1 million × 0.0050 × 45% = $2,250

This will be offset by the gain on the long cash
position of:

£450,000 × 0.0050 = $2,250

The delta of an option does not remain constant,
and the new delta of this position is, say,
47%. In order to maintain a delta-neutral position,
the trader will have to buy another £20,000.
Such a hedging strategy will enable the trader
to keep the premium received initially when
selling the option.

Figure 3 Delta Profile (horizontal axis: underlying market price)

Figure 3 shows that delta is the gradient of the
tangent of the curve of the premium in relation
to the cash prices. This will also reveal that delta
will move more rapidly for an option with a
short remaining life than for an option with a
long remaining life.

In conclusion, the delta of an option
will change if any factor that influences the
probability of exercise changes. These factors
include spot price, volatility, time, and interest
rates. Option traders use the delta as a guide to
hedging. Taken simply, if a bank is short one
option with a delta of 50%, the bank will hedge
only half of the nominal amount of the option as
it only has a 50% chance of being exercised. This
is known as "delta hedging." This is a simplistic
example, and, in reality, banks have large option
books, which they hedge on a daily basis, but
the principle applies no matter what the size of
the portfolio.

Also, there are three points to keep in mind
with delta:

1. Delta tends to increase as it gets closer to
expiration for near or at-the-money options.

2. Delta is not a constant.

3. Delta is subject to change given changes in
implied volatility.

Gamma 

The rate of change of delta is called gamma, and 
it will give a measure of the amount of change 
in the delta for a given change in the cash price. 







Therefore, it will provide an estimate of how 
much it will cost to delta hedge. 

The cost of rebalancing the hedge is a conse¬ 
quence of the curvature of the premium curve 
against cash prices. The curvature is greatest at- 
the-money and reduces when in-the-money or 
out-of-the-money. This is shown in Figure 4. 

A short option position is called gamma neg¬ 
ative. The higher the gamma, the less stable is 
the delta hedge. A first conclusion is that it is 
more costly to hedge a short long-dated option 
position than a short position of short-dated 
options. 

Thus, gamma is the change in delta per 
change in the underlying and is important 
because the option model assumes that delta 
hedging is performed on a continuous basis. 
In practice, however, this is not possible, as the 
market gaps and the net amounts requiring fur¬ 
ther hedging would be too small to make it 
worthwhile. The gapping effect that has to be 
dealt with in hedging an option gives the risk 
proportional to the gamma of the option. 

For a formal definition of gamma, again turn
to calculus. Gamma is the second partial derivative
of a portfolio's value \({}^0p\) with respect to the
value \({}^0s\) of the underlier:

\[ \text{gamma} = \frac{\partial^2\,{}^0p}{\partial\,{}^0s^2} \]

By incorporating gamma, there can be an improvement
to the approximation for how the
portfolio's value should change in response to
small changes in the underlier's value:

\[ \Delta\,{}^0p \approx \frac{\text{gamma}}{2}\left(\Delta\,{}^0s\right)^2 + \text{delta}\,\Delta\,{}^0s \]


This is called the delta-gamma approxima¬ 
tion. 
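
The improvement from the gamma term can be seen in a small numerical comparison. The sketch below assumes Garman-Kohlhagen pricing with its standard closed-form spot delta and gamma. The function name and the inputs are illustrative assumptions.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def gk_call_greeks(spot, strike, t, vol, r_dom, r_for):
    """Return (value, delta, gamma) of a European currency call, with respect to spot."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    value = spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)
    delta = exp(-r_for * t) * norm_cdf(d1)
    gamma = exp(-r_for * t) * norm_pdf(d1) / (spot * vol * sqrt(t))
    return value, delta, gamma


# Illustrative inputs and a fairly large spot move of 0.03.
spot, strike, t, vol, r_dom, r_for = 1.55, 1.55, 0.5, 0.10, 0.05, 0.04
value, delta, gamma = gk_call_greeks(spot, strike, t, vol, r_dom, r_for)
move = 0.03

actual = gk_call_greeks(spot + move, strike, t, vol, r_dom, r_for)[0] - value
delta_only = delta * move
delta_gamma = delta * move + 0.5 * gamma * move ** 2
print(f"repriced change     {actual:.6f}")
print(f"delta approximation {delta_only:.6f}")
print(f"delta-gamma         {delta_gamma:.6f}")
```

For a move of this size the delta-only estimate falls noticeably short of the repriced change, and adding the gamma term closes most of the gap.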

An option's gamma is at its greatest when 
an option is at-the-money and decreases as the 
price of the underlying moves further away 
from the strike price. Therefore, the gamma profile
is bell-shaped, peaking at-the-money, and gamma
is also greater for short-term options
than for long-term options.

By convention, gamma can be expressed in 
two ways: 

1. A gamma of, say, 5.23 will mean that for a
1% change in the underlying price the delta
will change by 5.23 units, that is, from 50%
to 55.23%.

2. A gamma of 3% will mean that for a one-unit
change in the underlying price, the delta
will change by 3% of its own value, for example
from 50% to 51.5%.

As an example of gamma hedging, as the for¬ 
ward outright rate moves from 1.5600 to 1.5200, 
the delta of the option moves from 57 to 30. The 
size of movement of the delta given this move¬ 
ment of the underlying is the gamma of the op¬ 
tion by the definition "gamma is the change in 
delta per change in the underlying." The hedging
the trader was required to do was to sell
$2.7 million. In practice, the trader sold the full
amount at a rate of 1.5200. If the trader were
able to hedge continuously as the model assumes,
the trader would have sold the same
amount, that is, $2.7 million, but at an average
rate of 1.5450. This would obviously have been
more profitable. From this example, it can be
seen that the gapping effect works against the
trader when there is a short options position
(and therefore short gamma), and a repetition
of the exercise would show that the gapping is
in the trader's favor if a long options position
(and therefore long gamma) were being held.

The value of gamma is, therefore, very impor¬ 
tant in determining sensitivity to spot move¬ 
ment and this gapping effect. 

However, gamma is not the same for all options.
Gamma is greater for short-term options
than for long-term options. For example, assume
a dollar call/Swiss franc put option with
a strike of 1.5500 and that there is one second 
to get to expiry. If the spot at the time is 1.5501, 
the option is extremely likely to be exercised 
and the delta will be 100. If, in that second, the 
spot moved to 1.5499, the option would not, 
in fact, be exercised and the delta would move 
to 0. Here, it can be seen that a 0.0002 move in 
spot produced a change in delta from 100 to 0. 
If it were the same option but there was one 
year to maturity, a movement of 0.0002 in spot 
would not significantly alter the likelihood that 
the option would be exercised; that is, the delta 
would not change noticeably. 

Gamma is greater for at-the-money options 
than for options with deltas above or below 50. 
Assume an extreme example to see this effect, 
using the same option of a dollar call/Swiss 
franc put with a strike of 1.5500, and there is a 
second to go before expiry. If the spot is at 1.5500 
and thus the option has a delta of 50, there 
would be the same situation as before when 
a 0.0001 movement in spot created a movement 
of 50 in the delta. If, however, the spot were at 
1.5200, the delta of the option would be 0, and 
a movement even as large as 0.0200 would not 
increase that delta. 
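
These two effects can be reproduced with the closed-form Garman-Kohlhagen gamma. The sketch below is illustrative only: it uses one day to expiry rather than one second, and the volatility and interest rates are assumed values.

```python
from math import exp, log, pi, sqrt


def norm_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def gk_gamma(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen gamma with respect to spot (identical for calls and puts)."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    return exp(-r_for * t) * norm_pdf(d1) / (spot * vol * sqrt(t))


# Strike of 1.5500 as in the example; volatility and rates are assumptions.
strike, vol, r_dom, r_for = 1.55, 0.10, 0.05, 0.04
one_day, one_year = 1.0 / 365.0, 1.0

for spot in (1.55, 1.52):
    for label, t in (("one day", one_day), ("one year", one_year)):
        gamma = gk_gamma(spot, strike, t, vol, r_dom, r_for)
        print(f"spot {spot:.4f}, {label:8s} to expiry: gamma {gamma:8.3f}")
```

At-the-money, the one-day gamma is many times the one-year gamma. With one day to go and spot at 1.5200, gamma is close to zero, consistent with a delta that barely responds to moves in spot.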

In conclusion, gamma is seen as a second- 
generation derivative, where the others 
considered are regarded as first-generation 
derivatives in the pricing of an option, in that 
the others all consider the change that an ex¬ 
ternal effect has on an option's value, such as 
change in spot. However, gamma measures the 
rate of change of the delta itself. Therefore, it 
is literally the delta of the delta. Since the delta 
is the key pricing tool used by market partici¬ 
pants in controlling the portfolio risk, to be able 
to work out the rate of change of this risk is 
very useful. Hence, gamma is a very important 
part of any option portfolio and is affected by 
three different factors: spot movement, time to 
maturity, and volatility. 

Also, the three points to keep in mind with 
gamma are: 


1. Gamma is smallest for deep out-of-the- 
money and deep in-the-money options. 

2. Gamma is highest when the option gets near- 
the-money. 

3. Gamma is positive for long options and neg¬ 
ative for short options. 

Theta 

Theta is the depreciation of the time value el¬ 
ement of the premium, that is, it measures the 
effect on an option's price of a one-day decrease 
in the time to expiration. The more the market 
and strike prices diverge, the less effect theta 
has on an option's price. Obviously, if you are 
the holder of an option, this effect will dimin¬ 
ish the value of the option over time, but if you 
are the seller (the writer) of the option, the effect 
will be in your favor, as the option will cost less 
to purchase. Theta is nonlinear, meaning that its 
value accelerates as the option approaches ma¬ 
turity. Positive gamma is generally associated 
with negative theta and vice versa. 

The rate at which the time value decays with 
respect to time is expressed as hundredths of a 
percent per unit of time (day/week). Obviously, 
the theta factor plays in favor of a short op¬ 
tion position. Shorter-dated options have larger 
thetas as do those at-the-money. This effect will 
give rise to trading strategies referred to as a 
calendar spread. 

To determine theta, assume t denotes time,
and let \({}^tp\) denote the portfolio's value at time
t. Formally, theta is the partial derivative of the
portfolio's value with respect to time:

\[ \text{theta} = \frac{\partial\,{}^tp}{\partial t} \]

where the derivative is evaluated at time t = 0.
This technical definition leads to an approximation
for the behavior of a portfolio.

\[ \Delta\,{}^tp \approx \text{theta}\,\Delta t \]

where \(\Delta t\) is a small interval of time, and \(\Delta\,{}^tp\)
is the change in the portfolio's value that will
occur during that interval, assuming all other
market variables remain the same.
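
A one-day theta can be estimated by repricing the option with one day less to expiry and all other inputs unchanged. The sketch below does this under Garman-Kohlhagen assumptions. The function names, the 365-day year, and the inputs are illustrative choices.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def one_day_theta(spot, strike, days, vol, r_dom, r_for):
    """Change in value when one day passes and nothing else moves (needs days > 1)."""
    today = gk_call(spot, strike, days / 365.0, vol, r_dom, r_for)
    tomorrow = gk_call(spot, strike, (days - 1) / 365.0, vol, r_dom, r_for)
    return tomorrow - today


# Illustrative inputs: an at-the-money strike and an out-of-the-money strike.
spot, vol, r_dom, r_for = 1.55, 0.10, 0.05, 0.04
for strike in (1.55, 1.70):
    for days in (180, 30, 7):
        theta = one_day_theta(spot, strike, days, vol, r_dom, r_for)
        print(f"strike {strike:.2f}, {days:3d} days left: one-day theta {theta:.6f}")
```

For the at-the-money strike the daily decay grows as expiry approaches, while for the far out-of-the-money strike there is very little time value left to decay.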

The delta of an option does have an influence 
on the time decay of an option because the time 
value element of an option total value is max¬ 
imum for at-the-money options. As the delta 
increases or decreases, the time value of the 
option decreases. Obviously, for options where 
there is very little time value, there will be very 
little time decay. If there is any doubt about 
which date to choose for an option maturity, as 
can be seen in Figure 5, there is little increase 
in time value for days at the far end of the op¬ 
tion. To buy a slightly longer option, therefore, 
will not cost much more. However, if a trader 
waits until the option expires and then has to 
buy another option to cover the final period, 
the additional cost could be substantially more. 
For this reason, buying an option for the longest 
period needed is recommended. 

In actual practice, traders do not use theta, 
but it is an important conceptual dimension. 
However, some additional points of note are: 

1. Theta can be very high for out-of-the-money 
options if they contain a lot of implied 
volatility. 

2. Theta is typically highest for at-the-money 
options. 

3. Theta will increase sharply in the last few 
weeks of trading and can severely under¬ 
mine a long option holder's position, espe¬ 
cially if implied volatility is on the decline at 
the same time. 


Vega 

Vega, sometimes also called kappa, quantifies 
risk exposure to implied volatility changes. 
Vega tells us approximately how much an 
option price will increase or decrease given 
an increase or decrease in the level of implied 
volatility. Option sellers benefit from a fall in 
implied volatility, while option buyers bene¬ 
fit from an increase in implied volatility. Vega 
is greatest for at-the-money options and in¬ 
creases with the time to maturity. This is the 
case because the longer the time to maturity, the 
greater the possibility of exchange rate move¬ 
ments and, therefore, the greater the sensitivity 
of the option price to a change in volatility. 

Vega is the first partial derivative of a portfolio's
value \({}^0p\) with respect to the value \({}^0\sigma\)
of implied volatility. This technical definition
leads to an approximation for the behavior of a
portfolio.

\[ \Delta\,{}^0p \approx \text{vega}\,\Delta\,{}^0\sigma \]

where, here, \(\Delta\,{}^0\sigma\) is a small change in the implied
volatility from its current value, and \(\Delta\,{}^0p\)
is the corresponding change in the portfolio's
value.

Thus, the more volatile the underlying price 
the more expensive the option will become 
because of the uncertainty element. The ra¬ 
tio of how much the value of the premium 
changes for a 1% change in volatility is vega. 
Longer-dated options have higher vegas and 
at-the-money options have higher vegas. Vega is
expressed as the percentage change in the premium,
in dollar terms, for a 1% change in volatility. For example, a
vega of 1.0 means the option premium will
appreciate by 1% in dollar or sterling terms
for a 1% rise in volatility.
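
The dependence of vega on maturity and moneyness can be sketched by repricing with volatility one point higher. The example assumes Garman-Kohlhagen pricing and reports the change in premium in currency terms rather than as a percentage of the premium. The names and inputs are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def vega_per_point(spot, strike, t, vol, r_dom, r_for):
    """Change in premium when implied volatility rises by one point (0.01)."""
    bumped = gk_call(spot, strike, t, vol + 0.01, r_dom, r_for)
    return bumped - gk_call(spot, strike, t, vol, r_dom, r_for)


# Illustrative inputs only.
spot, vol, r_dom, r_for = 1.55, 0.10, 0.05, 0.04
cases = (
    ("at-the-money, 1 year  ", 1.55, 1.00),
    ("at-the-money, 3 months", 1.55, 0.25),
    ("10% out, 3 months     ", 1.705, 0.25),
)
for label, strike, t in cases:
    print(f"{label}: vega per vol point {vega_per_point(spot, strike, t, vol, r_dom, r_for):.6f}")
```

Under these assumptions the one-year at-the-money option is the most sensitive to volatility, and the short-dated out-of-the-money option is the least.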

Rho 

Rho is generally considered to be the least important
of the Greeks, but nevertheless any
option, be it a single position or a large portfolio,
will be exposed to interest rate risk. This is
because with over-the-counter European-style
options, the price (in part) is derived from the
forward rate. Therefore, if either of the two interest
rates of the currency pair in the option
should change, the forward, and hence the
price, will change. This can happen without a
move in the spot price.

In formulating rho, let \({}^0p\) and \({}^0r\) be current
values for the portfolio and the risk-free rate. Formally,
rho is the partial derivative of the portfolio's
value with respect to the risk-free rate:

\[ \text{rho} = \frac{\partial\,{}^0p}{\partial\,{}^0r} \]

This technical definition leads to an approximation
for the behavior of a portfolio.

\[ \Delta\,{}^0p \approx \text{rho}\,\Delta\,{}^0r \]

where \(\Delta\,{}^0r\) is a small change in the risk-free
rate, and \(\Delta\,{}^0p\) is the corresponding change in
the portfolio's value.

In summary, rho is the general term used for 
interest rate risk, but it is broken down further. 
Rho usually refers to the base currency inter¬ 
est rate (usually dollars), and phi relates to the 
traded currency interest rates (e.g., Swiss francs 
or Japanese yen). 
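
The split between rho and phi can be illustrated by bumping each interest rate in turn while holding spot fixed. The following sketch assumes Garman-Kohlhagen pricing, with the dollar rate as `r_dom` and the traded currency's rate as `r_for`. These names, the one-basis-point bump, and the inputs are assumptions made for illustration.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def rho_and_phi(spot, strike, t, vol, r_dom, r_for, bump=0.0001):
    """Change in call value for a one-basis-point rise in each rate, spot unchanged."""
    base = gk_call(spot, strike, t, vol, r_dom, r_for)
    rho = gk_call(spot, strike, t, vol, r_dom + bump, r_for) - base
    phi = gk_call(spot, strike, t, vol, r_dom, r_for + bump) - base
    return rho, phi


# Illustrative inputs only.
spot, strike, t, vol, r_dom, r_for = 1.55, 1.55, 0.5, 0.10, 0.05, 0.04
rho, phi = rho_and_phi(spot, strike, t, vol, r_dom, r_for)
forward_before = spot * exp((r_dom - r_for) * t)
forward_after = spot * exp((r_dom + 0.0001 - r_for) * t)
print(f"forward {forward_before:.5f} -> {forward_after:.5f} with spot unchanged")
print(f"rho (dollar rate +1bp): {rho:.7f}")
print(f"phi (foreign rate +1bp): {phi:.7f}")
```

For the call, the dollar-rate bump raises the forward and the premium, while the foreign-rate bump lowers both, so the two sensitivities carry opposite signs.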

Beta and Omega 

Some other Greek letters that are used do not 
actually measure an option's value but are more 
geared to looking at the use of options or risks 
associated with valuation methods. Briefly, they 
include beta and omega. 

Beta represents the risk involved in hedging 
one currency pair against another, especially 
when sometimes currency pairs have a high
correlation, for example, within the old European
Monetary System (EMS) with the deutsche
mark and the French franc. Some traders who
had a dollar against the franc position would
have been happier hedging this exposure in the
more liquid dollar against the mark market because
it was fairly closely correlated to the franc. The
risk here would have been if the mark against 
the franc correlation had started to weaken. 


Omega measures the translation profit/loss 
risk assumed by trading in currency pairs 
(which result in profits /losses in those two cur¬ 
rencies) that are not the same as the reporting 
base currency for accounting purposes. An ex¬ 
ample would be an American bank that gets 
profits for its sterling against Swiss franc trades 
in either sterling or francs, yet has to convert 
these to dollars for the balance sheet. 


KEY POINTS 

• The generally accepted pricing basis for op¬ 
tions today is the Black-Scholes formula, 
which was devised in the early 1970s to pro¬ 
vide a "fair value" for equity options. How¬ 
ever, the foreign exchange markets needed 
something to take account of interest rates 
and the fact that there are no dividends due 
on currencies. 

• Various adaptations of the Black-Scholes 
model emerged, of which the most popular 
one used today is the Garman-Kohlhagen sys¬ 
tem. This method makes allowances for the 
interest rates of the respective currencies and 
the fact that a currency can trade at a discount 
or premium forward relative to the other 
currency. 

• American-style options differ due to the 
possibility of early exercise. The Cox-Ross-
Rubinstein model is the generally accepted
method for these, but they do not feature 
heavily in the over-the-counter market. 

• Overall, the industry norm is to use the 
Black-Scholes formula adapted by Garman- 
Kohlhagen for valuing over-the-counter 
European-style currency options. 

• The factors required to price an option in¬ 
clude: (1) currency pair; (2) call or put; 
(3) strike rate; (4) amount; (5) style (European 
or American); (6) expiration date and time 
(New York expiry or Tokyo expiry); (7) pre¬ 
vailing spot rate; (8) interest rates for both 
currencies; (9) foreign exchange swap rate 
(calculated from the information in the
previous factor); and (10) volatility of the currency
pair.

• The six factors chosen by the potential 
buyer /seller of the option are the currency 
pair, call or put, strike rate, amount, style, 
and expiration date and time. The prevailing 
spot rate, interest rates for both currencies, 
and foreign exchange swap rate are given by 
the market. The volatility of the currency pair 
is the only unknown factor, representing the 
anticipated market volatility expected for the 
life of the option, and is determined using the 
option pricing models discussed in this entry. 
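
One common way to back out the volatility implied by a quoted premium is to invert the pricing formula numerically. The sketch below uses simple bisection on the Garman-Kohlhagen call value. It is an illustration under assumed inputs, the function names are hypothetical, and it is not presented as the method used by any particular desk.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gk_call(spot, strike, t, vol, r_dom, r_for):
    """Garman-Kohlhagen value of a European call on one unit of foreign currency."""
    d1 = (log(spot / strike) + (r_dom - r_for + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-r_for * t) * norm_cdf(d1) - strike * exp(-r_dom * t) * norm_cdf(d2)


def implied_vol(premium, spot, strike, t, r_dom, r_for, lo=1e-4, hi=2.0, tol=1e-8):
    """Bisection search for the volatility that reproduces the quoted premium.
    Relies on the call value increasing with volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gk_call(spot, strike, t, mid, r_dom, r_for) > premium:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip on illustrative inputs: price at 12% volatility, then recover it.
spot, strike, t, r_dom, r_for = 1.55, 1.56, 0.5, 0.05, 0.04
quoted = gk_call(spot, strike, t, 0.12, r_dom, r_for)
print(f"premium {quoted:.6f} implies volatility {implied_vol(quoted, spot, strike, t, r_dom, r_for):.4%}")
```

The other nine inputs are either chosen by the counterparties or observed in the market, so the search has a single unknown.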

Abstract: Credit default swaps are the most popular of all the credit derivative contracts traded.
Their purpose is to provide financial protection against losses incurred following a credit event of 
a corporate or sovereign reference entity. Replication arguments attempting to link credit default 
swaps to the price of the underlying credits are generally used by the market as a first estimate 
for determining the price at which a credit default swap should trade. The replication argument, 
however, is dependent on the existence of same maturity and same seniority floating rate bonds. 
Even if such securities do exist, contractual differences between CDS and bonds can weaken the 
replication relationship. Over the past decade, the increased liquidity of the CDS market has meant 
that in some cases, it, and not the bond market, is the place where credit price discovery occurs. 
Despite this, it is still necessary to have a CDS valuation model for the valuation and risk management
of existing positions.


Credit default swaps (CDSs), or simply de¬ 
fault swaps, provide an efficient credit-risk 
transferring financial instrument. Their over- 
the-counter nature also makes them infinitely 
customizable, thereby overcoming many of the 
limitations of the traditional credit market in¬ 
struments such as lack of availability of instru¬ 
ments with the required maturity or seniority. 
Increasing standardization and familiarization 
with the legal framework has made capital market
participants more willing to enter into default
swap transactions, as have developments
in credit modeling and pricing that have made
it possible to mark to market and hedge default
swap positions.

Bonds are the main source of liquidity in the 
credit markets, especially in the United States. 
In the early years of the CDS market, replication 
arguments that attempted to link CDSs to bonds 
were therefore generally used by the market as a 
first estimate for determining the price at which 
CDSs should trade. Nowadays, the greater liquidity
of the CDS market means that it is often
the place where price discovery occurs and
can at times lead the cash credit bond market.
So while the replication relationship is still
important, it is now a two-way process with
bond traders looking at CDS prices and CDS 
traders looking at bond prices, all watching to 
see if the replication relationship breaks down 
to the extent that any dislocation becomes arbi- 
trageable, at which point they will step in and 
enter into positions to profit from the disloca¬ 
tion. If done in a material size, the effect of such 
an action should be to realign the two markets. 
However, the replication argument is not ex¬ 
act, as it is based on a number of assumptions 
that often break down in practice. Market par¬ 
ticipants who wish to price CDSs and examine 
relative value opportunities need to understand 
replication and its assumptions. We discuss the 
replication approach in this entry. 

However, replication only provides a start¬ 
ing point for quoting CDS spreads. It does not 
allow traders to actually mark to market their 
existing CDS positions. By definition, marking 
a CDS position to market must involve pricing 
it off the current market CDS spread curve—a 
set of CDS spreads quoted for different maturities.
The main objective of this entry will be
to explain how to determine the CDS spread, 
what factors affect its pricing, and how to mark- 
to-market CDSs. We show that this requires a 
model and set out the standard model that is 
used by the market. 

DEFAULT SWAPS 

In a standard CDS contract one party pays a 
regular fee to another to purchase credit pro¬ 
tection to cover the loss of the face value of an 
asset following a credit event. The company (or 
sovereign) to which the triggering of the credit 
event is linked is known as the reference entity. 

This protection lasts until some specified ma¬ 
turity date which falls on the 20th of either 
March, June, September or December, typically 
five years from the trade date. To pay for this 
protection, the protection buyer makes a reg¬ 
ular stream of payments. These are quoted in 
terms of an annualized percentage known as 
the CDS spread. These payments are typically 
paid quarterly according to an Actual/360 basis
convention and are collectively known as
the premium leg. Payments occur until maturity
of the contract or a credit event occurs,
whichever happens first. If a credit event occurs,
the protection buyer will also pay the protection
seller the fraction of the coupon which has
accrued since the previous premium payment date.

If a credit event does occur before the matu¬ 
rity date of the contract, there is a payment by 
the protection seller, known as the protection 
leg. There are two ways to settle the payment of 
the protection leg: physical settlement and cash 
settlement. The form of settlement is specified 
at the time of the ISDA organised auction used 
to determine the final recovery price of the de¬ 
liverable obligations. This can take the form of 
physical or cash settlement and one of the pur¬ 
poses of the auction is to ensure that both have 
the same economic value. 

• Physical settlement: Following the ISDA auction,
a protection buyer who elects for physical
settlement will submit a face value amount
of bonds into the auction and receive a payment
of 100 on the same face value. A protection
seller who elects for physical settlement
will end up receiving a deliverable obligation 
and paying par. In general there is a choice of 
deliverable obligations from which the pro¬ 
tection buyer can choose. These deliverable 
obligations will satisfy a certain number of 
characteristics that typically include restric¬ 
tions on the maturity of the deliverable obli¬ 
gations and the requirement that they be pari 
passu—most default swaps are linked to se¬ 
nior unsecured debt. Typically, they include 
both bonds and loans. If deliverable obliga¬ 
tions trade with different prices following a 
credit event, which they are most likely to do 
if the credit event is a restructuring, the pro¬ 
tection buyer can take advantage of this situ¬ 
ation by buying and delivering the cheapest 
deliverable. The protection buyer is therefore 
long a cheapest to deliver (CTD) option. 

• Cash settlement: A protection buyer who 
opts for cash settlement receives par minus 
the recovery price on his face value. The
recovery price is the one determined by the
ISDA auction. The protection seller pays par 
minus the same recovery price. 

CDS spreads are typically quoted for a variety 
of maturities with most liquidity at the five-year 
maturity followed by the three-year and seven- 
year maturities. The bid is the spread at which 
the dealer is willing to buy protection, while the 
offer is the spread at which the dealer is willing 
to sell protection. Clearly, the bid spread will 
be less than the offer spread. Note that this is 
opposite to the convention for bonds where the 
bid spread is the spread at which the dealer is 
willing to buy the bond and this will be higher 
than the offer spread. This is because the buyer 
of a bond is selling protection, while the buyer 
of a CDS is buying protection. 

Illustration 

Suppose a protection buyer purchases 5-year 
protection on a company at a default swap 
spread of 200bp. The face value of the protection
is $10 million. The protection buyer therefore
makes quarterly payments approximately
equal to $10 million × 0.02 × 0.25 = $50,000.
(The exact payment amount is a function of the 
calendar and basis convention used.) After a 
short period the reference entity suffers a credit 
event. Assuming that the subsequent ISDA auc¬ 
tion which takes place within 2 months of the 
credit event determines a recovery price of $35 
per $100 of face value, the payments are as 
follows: 

• The protection seller compensates the protection
buyer for the loss on the face value of the
asset received by the protection buyer. This
is equal to $10 million × (100% − 35%) =
$6.5 million.

• The protection buyer pays the accrued pre¬ 
mium from the previous premium payment 
date to time of the credit event. For example,
if the credit event occurs after a month then
the protection buyer pays approximately $10
million × 0.02 × 1/12 = $16,667 of premium
accrued. Note that this is the standard for corporate
reference entity linked default swaps.
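The cash flows in this illustration can be reproduced with a few lines of arithmetic. The sketch below is only a rough check: the function names are hypothetical, and the simple year fractions (0.25 and 1/12) stand in for the exact Actual/360 day counts that an actual contract would use.

```python
def quarterly_premium(notional, spread, year_fraction=0.25):
    """Approximate premium paid for one accrual period."""
    return notional * spread * year_fraction


def protection_payment(notional, recovery_price):
    """Payment by the protection seller; recovery price is per 100 of face."""
    return notional * (100.0 - recovery_price) / 100.0


def accrued_premium(notional, spread, accrual_fraction):
    """Premium accrued between the last payment date and the credit event."""
    return notional * spread * accrual_fraction


notional = 10_000_000   # $10 million face value
spread = 0.02           # 200bp default swap spread

premium = quarterly_premium(notional, spread)        # about $50,000
payout = protection_payment(notional, 35.0)          # about $6.5 million
accrued = accrued_premium(notional, spread, 1 / 12)  # about $16,667

print(f"quarterly premium:  {premium:,.0f}")
print(f"protection payment: {payout:,.0f}")
print(f"accrued premium:    {accrued:,.0f}")
```

The net amount received by the protection buyer at settlement is the protection payment less the accrued premium.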


The Mechanics of Settlement 

The timeline around the physical settlement of 
a CDS following a credit event consists of three 
steps: 

1. A CDS market participant who has previ¬ 
ously signed up to the ISDA protocols sub¬ 
mits a request to the ISDA determinations 
committee asking whether or not a credit 
event has occurred on a specified reference 
entity. The event must be evidenced by at 
least two sources of publicly available in¬ 
formation (e.g., a news article on Reuters, 
the Wall Street Journal, the Financial Times or 
some other recognized publication or electronic
information service). The determinations
committee, which consists of both buy-side
and sell-side representatives, then has to decide
whether or not the credit event has occurred.
An 80% supermajority is needed to
approve any decision. If it is determined that 
a credit event has occurred, the process lead¬ 
ing to the ISDA auction is then begun. 

2. The ISDA then begins compiling a list of 
the deliverable obligations and publishes the 
details of the auction which will take place 
in order to determine the recovery price. If 
the credit event is a bankruptcy or a fail¬ 
ure to pay then CDS contracts are automat¬ 
ically triggered. However if the event is a 
restructuring, CDS protection buyers can de¬ 
cide whether to trigger their contract or not - 
if they decide not to trigger then the contract 
can be used later if a bankruptcy or failure 
to pay occurs. In Europe, the settlement of 
a restructuring event is also complicated by 
the fact that standard CDS contracts with dif¬ 
ferent maturities can have different baskets 
of deliverable obligations and separate auc¬ 
tions will be needed to determine their final 
recovery price for each basket. 

3. The auction takes place. CDS market partic¬ 
ipants who have positions in the triggered 
contracts need to decide whether or not to 
settle physically or in cash. Buyers and sell¬ 
ers of CDS protection can choose physical 
settlement even if their trade counterparty
chose cash settlement, and vice-versa. The
various dealers through whom market par¬ 
ticipants trade then bring all of these posi¬ 
tions plus their own positions into an auction 
at the end of which only the net position - 
the net open interest - will be transferred, 
thereby averting any short squeeze which 
may be caused if the gross notional of CDS 
positions exceeds the outstanding notional 
of deliverable obligations. Dealers can then 
submit bids or offers on the net open interest 
of physical obligations, which may be long or 
short. At the end of this auction procedure, 
a recovery price is determined. All CDS con¬ 
tracts are then automatically settled at this 
recovery price. 

As a result, the maximum delay between notice 
of a credit event and the actual payment of the 
protection is approximately 72 calendar days. 


CREDIT EVENTS 

The most important section of the documenta¬ 
tion for a default swap is what the parties to the 
contract agree constitutes a credit event that will 
trigger a payment by the protection seller to the 
protection buyer. Definitions for credit events 
are provided by the International Swaps and
Derivatives Association (ISDA). First published
in 1999, there have been periodic updates and
revisions of these definitions. The most recent,
and one of the most important, updates of the
ISDA documentation for credit default swaps
was the introduction of the Big Bang protocol
in 2009. This was a response to the financial
crisis of 2008 and was intended to streamline
the process of determining and settling a credit
event. It was also intended to enable the
migration of CDS trades to centralised
counterparties by increasing fungibility.

ISDA Credit Event Definitions 

Of the eight possible credit events referred to in 
the 1999 ISDA Credit Derivative Definitions, the 
ones typically used within most contracts are 
listed in Table 1. In terms of which are used,
the market distinguishes between corporate-
and sovereign-linked CDSs. For corporate-
linked CDSs the market standard is to use just
three credit events: bankruptcy, failure to pay,
and restructuring. For sovereign-linked CDSs,
obligation acceleration/default and
repudiation/moratorium are also included.

Table 1 Credit Events Typically Used within Most CDS Contracts

Credit Event              Description

Bankruptcy                Corporate becomes insolvent or is unable to pay
                          its debts. The bankruptcy event is of course not
                          relevant for sovereign issuers.

Failure to pay            Failure of the reference entity to make due
                          payments, taking into account some grace period
                          to prevent accidental triggering due to
                          administrative error.

Restructuring             Changes in the debt obligations of the reference
                          creditor but excluding those that are not
                          associated with credit deterioration such as a
                          renegotiation of more favorable terms.

Obligation acceleration/  Obligations have become due and payable earlier
obligation default        than they would have been due to default or
                          similar condition.

                          Obligations have become capable of being defined
                          due and payable earlier than they would have been
                          due to default or similar condition. This is the
                          more encompassing definition and so is preferred
                          by the protection buyer.

Repudiation/moratorium    A reference entity or government authority
                          rejects or challenges the validity of the
                          obligations.

Source: ISDA.


Restructuring Controversy 

Restructuring means a waiver, deferral, restruc¬ 
turing, rescheduling, standstill, moratorium, 
exchange of obligations, or other adjustment 
with respect to any obligation of the reference 
entity such that the holders of those obligations 
are materially worse off from either an eco¬ 
nomic, credit, or risk perspective. It has been 
the most controversial credit event that may be 
included in a default swap.

In bankruptcy or failure to pay, pari passu
assets trade at or close to the same recovery 
value. But restructuring is different. Following 
a restructuring, debt continues to trade. Short-
dated bonds trade at higher prices than longer-
dated bonds, and bonds with large coupons trade
at a higher price than bonds with low coupons.
Loans, which are typically also deliverable, 
tend to trade at higher prices than bonds due to 
their additional covenants. 

This makes the delivery option that is em¬ 
bedded in a default swap potentially valuable. 
A protection buyer hedging a short-dated high 
coupon asset may find that following a restruc¬ 
turing credit event it is trading at, say, $80 while 
another longer-dated deliverable may be trad¬ 
ing at $65. By selling the $80 asset, purchasing 
the $65 asset, and delivering it into the CDS, the 
protection buyer may make a $15 windfall gain 
out of the delivery option. However, this gain 
is made at the expense of the protection seller 
who has to take ownership of the $65 asset in 
return for a payment of par. 
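The windfall in this example can be expressed as a small calculation. The following is a minimal sketch using the prices quoted above; the function name and the idea of passing in a list of deliverable prices are illustrative assumptions, not part of any ISDA procedure.

```python
def delivery_option_gain(hedged_asset_price, deliverable_prices):
    """Gain to a protection buyer who sells the hedged asset and
    delivers the cheapest deliverable obligation instead."""
    cheapest = min(deliverable_prices)
    gain = max(hedged_asset_price - cheapest, 0.0)
    return gain, cheapest


# Prices per $100 of face value, taken from the example in the text.
gain, cheapest = delivery_option_gain(80.0, [80.0, 65.0])

# The protection seller pays par (100) and receives the cheapest asset.
seller_loss = 100.0 - cheapest

print(f"cheapest deliverable: {cheapest:.0f}")
print(f"buyer windfall:       {gain:.0f}")
print(f"seller loss:          {seller_loss:.0f}")
```

If all deliverables trade at the same price, as is roughly the case after a bankruptcy or failure to pay, the gain is zero and the delivery option has no value.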

Such a situation arose in the summer of 2000 
when the U.S. insurer Conseco restructured its 
debt. At that time, the range of deliverable obli¬ 
gations following a restructuring event was the 
same as those used for bankruptcy or failure 
to pay. This meant that bonds or loans with a 
maximum maturity of 30 years could be de¬ 
livered. Protection sellers were displeased at 
being delivered long-dated low-priced bonds 
in the price range 65 to 80 by banks who held 
much higher-priced short-term loans. In addi¬ 
tion, it was believed that there was a conflict 
of interest—banks who exercised their default 
swaps had also been party to the restructuring 
of Conseco's debt. 

The results of this experience led to the mar¬ 
ket discussing a restructuring supplement to 
the standard ISDA documentation. This was 
completed on May 11, 2001, and introduced a 
new restructuring definition called modified re¬ 
structuring (mod-re). The essence of this was to 
reduce the range of deliverable obligations fol¬ 
lowing a restructuring event and so limit the 
value of the delivery option. 


Although adopted by the North American 
market between 2002 and 2009, this standard 
has now become redundant for the standard 
North American contract (SNAC) since restruc¬ 
turing is no longer one of the standard trigger¬ 
ing credit events. Europe has retained the re¬ 
structuring credit event. However the basket of 
allowed deliverable obligations is determined 
by the Modified Modified Restructuring clause 
which effectively limits the maturity of these 
obligations to the greater of the maturity of the 
CDS contract and 60 months. Credit default 
swaps linked to Asian corporate credits con¬ 
tinue to include restructuring as a credit event. 
They also retain the old style rules about what 
can be delivered, allowing all bonds and loans 
of the appropriate seniority and with a maximum
maturity of 30 years. A summary description
of the different standard market contracts
by geographical region is shown in Table 2.

Table 2 Different Restructuring Standards by Geographic Region

Region           Description

North America    The standard North American contract (SNAC) now
                 trades without restructuring as a credit event.

Europe           Both CDS and CDS indices trade with bankruptcy,
                 failure to pay and restructuring. In the case of
                 restructuring, the deliverable obligations are
                 determined according to the Modified Modified
                 Restructuring clause which limits the maturity of
                 deliverables to the maximum of the maturity of the
                 CDS and 60 months.

Asia             Both CDS and CDS indices trade with bankruptcy,
                 failure to pay and restructuring. Following a
                 restructuring event the only limit on deliverables
                 is the old-style limit of a maximum maturity of
                 30 years.

Source: ISDA.

Where the same credit trades with different
restructuring conventions, these different
contract standards should be reflected
in the quoted market spreads. For example,
modified-modified restructuring allows the
protection buyer to have a broader range of 
deliverables than modified restructuring. This 
means that the value of the delivery option is 
greater for mod-mod-re than for mod-re and so 
the protection should trade at a wider spread for 
the more valuable delivery option. More gener¬ 
ally, there should be a strict theoretical relation¬ 
ship between these spread levels of 1 

$$\text{Spread}_{\text{Old-Re}} > \text{Spread}_{\text{Mod-Mod-Re}} > \text{Spread}_{\text{Mod-Re}} > \text{Spread}_{\text{No-Re}}$$

In this entry, the aim is not to determine what 
the spread differences should be, but to price 
contracts of a given type given the correspond¬ 
ing curve of market spreads. 

Credit Events and Implementation 
of Default Swap Pricing Models 

In the pricing model presented in this entry, we 
refer to "default." By this we mean any of the 
credit events included in the CDS contract. This 
means that the value of a contract will depend 
on which credit events are included in a partic¬ 
ular trade. 

While the model presented handles any of the 
credit events that may be selected by the parties 
to a trade, the data required are typically drawn 
from databases that collect defaults defined in a 
different way than those set forth by ISDA credit 
event definitions. For example, major studies 
regarding default rates and recovery rates, as 
well as default times, define default in terms of 
the legal definition of default. In contrast, con¬ 
sider restructuring. Suppose that full restruc¬ 
turing is included in a trade as a credit event. 
Then a reduction in a reference obligation's in¬ 
terest rate that is material is a credit event. In 
fact, actions by lenders to modify the terms of a 
reference obligation without a bankruptcy proceeding
are not uncommon. Yet they are not
included by (or even known to) researchers who
compile data on defaults.

The key point is that in the implementation 
stage, the inputs must be modified based on 
the credit events included in a trade. 


PRICING CREDIT DEFAULT 
SWAPS BY STATIC 
REPLICATION 

There is a fundamental relationship between 
the default swap market and the cash market in 
the sense that a default swap can be shown as 
being economically equivalent to a combina¬ 
tion of cash bonds. This cash-CDS relationship 
means that determination of the appropriate de¬ 
fault swap spread for a particular credit usually 
begins by observing the London Interbank Of¬ 
fered Rate (LIBOR) spread at which bonds of 
that issuer trade. The usual comparison is to 
look at what is called the par asset swap spread 
of a bond of a similar maturity to the default 
swap contract. This is the spread over LIBOR 
paid by a package containing a fixed-rate bond 
and interest rate swap purchased at par. This 
spread can easily be calculated. 2 

Since 2009, CDS contracts have traded with 
fixed premiums. Prior to this, any new CDS 
contract would have its premium set at initiation
so that the contract would have zero initial
value. In order to facilitate moves towards
a centralised counterparty for CDS, in 2009 the
market decided that all contracts on a specific
reference entity, regardless of their maturity and
when they were traded, will trade with the same
fixed premium. The value of this fixed premium
is different for different reference entities. In the
US it is 100bp for investment grade credits and
500bp for high-yield credits. A similar convention
exists in Europe with additional spread
levels. The effect of this is that CDS contracts no
longer have zero value at initiation. This is actu¬ 
ally not a radical change - it simply means that 
new contracts have to be valued in the same 
way that seasoned CDS contracts were valued 
in the past. However, it does mean that the old 
CDS-bond static replication argument becomes 
less realisable. 

Since it is a fixed number through time, the 
premium spread on the CDS linked to some 
reference entity no longer reflects the market 
implied credit risk of the reference entity at 
the time of the trade. That information is now
embedded in the upfront cost of the CDS. But
this cost is not a spread measure and is difficult 
to use to compare the market implied credit 
risk across different credits and different ma¬ 
turities. Instead, the market has created a new 
spread measure known as the par CDS spread. 
This is defined as the coupon on a fictional CDS 
which would give it a zero initial value today. It 
is the old CDS premium now reborn as a spread 
measure. The following static replication argu¬ 
ment is therefore based on such a fictional CDS 
contract where the spread S is set so that the 
contract has zero initial value. The reason for 
doing this is that we wish to understand the re¬ 
lationship between this par spread and the par 
asset swap spread. Note also that the standard 
model which we will describe later is the mech¬ 
anism used to convert the upfront cost of a CDS 
contract to a par spread and vice-versa. 

The premium payments in a default swap 
contract are defined in terms of a default swap 
spread, S, which is paid periodically on the pro¬ 
tected notional until maturity or a credit event. 
It is possible to show that the default swap 
spread can, to a first approximation, be prox- 
ied by a par floater bond spread (the spread to 
LIBOR at which the reference entity can issue a 
floating rate note of the same maturity at a price 
of par) or the asset swap spread of an asset of the 
same maturity provided it trades close to par. 

To see this, consider a strategy in which an 
investor buys a par floater issued by the refer¬ 
ence entity with maturity T. The investor can 
hedge the credit risk of the par floater by pur¬ 
chasing protection to the same maturity date. 
Suppose this par floater (or asset swap on a par 
asset) pays a coupon of LIBOR plus F. Default 
of the par floater triggers the default swap, as 
both contracts are written on the same reference 
entity. With this portfolio the investor is effec¬ 
tively holding a default-free investment, ignor¬ 
ing counterparty risk. 

The purchase of the asset for par may be 
funded on balance sheet or on repo—in which 
case we make the assumption that the repo rate 
can be locked in to the bond's maturity. The resulting
funding cost of the asset is LIBOR plus
B, assumed to be paid on the same dates as the
default swap spread S. Consider what happens 
in the following scenarios: 

No credit event—The hedge is unwound at the 
bond maturity at no cost since the protection 
buyer receives the par redemption from the 
asset and uses it to repay the borrowed par 
amount. 

Credit event—The protection buyer delivers 
the reference asset to the protection seller 
in return for par. If we assume that the 
credit event occurs immediately following a 
coupon payment date, then the cost of clos¬ 
ing out the funding is par, which is repaid 
with this principal. The position is closed out 
with no net cost. 

Both scenarios are shown in Figure 1. As the 
hedged investor has no credit risk within this 
strategy they should not earn (or lose) any excess
spread. This implies that S = F − B; that is,
the default swap spread should be equal to the
par floater spread minus the funding cost of the
cash bond. For example, suppose the par floater
pays LIBOR plus 25 basis points and the protection
buyer funds the asset on balance sheet
at LIBOR plus 10 basis points. For the protection
buyer the breakeven default swap spread
equals F − B = 25 − 10 = 15 basis points.
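The no-excess-spread condition can be checked numerically. The sketch below, with hypothetical function names, computes the breakeven spread and the running carry of the hedged package (receive LIBOR plus F on the floater, pay LIBOR plus B on the funding, pay S for protection), all in basis points. It ignores every friction listed in the assumptions of the replication argument.

```python
def breakeven_spread_bp(floater_spread_bp, funding_spread_bp):
    """Breakeven default swap spread S = F - B, in basis points."""
    return floater_spread_bp - funding_spread_bp


def hedged_carry_bp(floater_spread_bp, funding_spread_bp, cds_spread_bp):
    """Net running carry of the hedged package; the LIBOR legs cancel."""
    return floater_spread_bp - funding_spread_bp - cds_spread_bp


F, B = 25.0, 10.0                     # figures from the example in the text
S = breakeven_spread_bp(F, B)         # 15 basis points
carry = hedged_carry_bp(F, B, S)      # zero at the breakeven spread

print(f"breakeven spread: {S:.0f}bp, net carry: {carry:.0f}bp")
```

A protection buyer with a higher funding cost B would compute a lower breakeven spread, which is one reason different participants see different fair levels.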

This analysis certainly shows that there 
should be a close relationship between cash and 
default swap spreads. However, the argument
is not exact as it relies on several assumptions 
that could result in small but observable differ¬ 
ences. Some are listed below: 

1. We have assumed the existence of a par 
floater with the same maturity date as the 
default swap and that the coupon on the de¬ 
fault swap contract has been set so that it has 
zero initial value. 

2. We have assumed a common market-wide 
funding level of LIBOR + B. In practice, 
different market participants have different 
funding costs which therefore imply differ¬ 
ent spread levels. 

3. We have assumed repo funding to term.
Repo funding cannot usually be locked in
to term but only for short periods of a couple
of months. One attraction of CDS is that,
unlike cash, they effectively lock in funding
at LIBOR flat to maturity.

Figure 1 Theoretical Default Risk-Free Hedge for an Investor Who Buys Protection

4. We have ignored accrued coupons. If the 
credit event occurs just before a coupon pay¬ 
ment on the funding leg, the protection does 
not cover the loss of par plus coupon on the 
funding leg. We have also ignored the effect 
of the accrued CDS premium payment from 
the previous payment date. 

5. We have assumed that the par floater is the 
cheapest-to-deliver asset. 

6. We have ignored counterparty risk on the 
CDS. This is usually mitigated through the 
use of collateral. 

7. Due to the difficulty of shorting cash bonds, 
any widespread market demand to go short a 
particular credit will first impact CDS, caus¬ 
ing spreads to widen before cash. 

8. For asset swaps the initial price of the asset 
must be close to par. This is because the loss 
on an asset swap of a bond trading with a full 
price P is about P — R. The credit risk is then 
only comparable to a default swap when the 
asset trades close to par. 

9. We have ignored transaction costs. 

Despite these assumptions, cash market
spreads usually provide the starting point for
where the default swap spreads should trade.
Empirically, there is a high correlation between
the two spread levels. The difference between
where default swap spreads and cash LIBOR
spreads trade is known
as the default swap basis, defined as

Default swap basis = S − F

There are now a significant number of mar¬ 
ket participants who actively trade the default 
swap basis, viewing it as a new relative value 
opportunity. 3 
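As a small illustration of the definition, the basis can be computed directly from a pair of quotes. The quotes used here are hypothetical and chosen only to show the sign convention; the function name is likewise an assumption.

```python
def default_swap_basis_bp(cds_spread_bp, cash_spread_bp):
    """Default swap basis = S - F, in basis points."""
    return cds_spread_bp - cash_spread_bp


# Hypothetical quotes: CDS at 120bp, cash LIBOR spread at 105bp.
basis = default_swap_basis_bp(120.0, 105.0)
print(f"default swap basis: {basis:+.0f}bp")
```

A positive value means protection trades wider than the cash spread; a negative value means it trades tighter.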

PRICING OF A SINGLE-NAME
CREDIT DEFAULT SWAP

Reduced versus Structural Models

To value credit derivatives it is necessary to 
be able to model the default risk, the recov¬ 
ery rate risk and the effect of interest rates. 
The two most commonly used approaches to 
model credit risk are structural models and re¬ 
duced form models. The first structural model 
for credit-risky bonds was proposed by Black 
and Scholes (1973) who explained how equity 
owners hold a call option on the firm. After that 
Merton (1973 and 1974) extended the frame¬ 
work and analyzed the behavior of risky debt 
using the model. 4 

The second type of credit models, known as 
reduced-form models, are more recent. 5 These 
models, most notably the Jarrow-Turnbull
model and Duffie-Singleton model, do not look
inside the firm. Instead, they model directly the
likelihood of a default occurring. Not only is 
the current probability of default modeled, they 
also attempt to model a "forward curve" of de¬ 
fault probabilities that can be used to price in¬ 
struments of varying maturities. Characterizing 
default as an event that occurs with a modeled 
probability has the effect of making default a 
surprise—the default event is a random event, 
which can suddenly occur at any time. All we 
know is its probability. 

Reduced-form models are easy to calibrate 
to the term structure of CDS prices observed 
in the marketplace. This is known as work¬ 
ing in an "arbitrage-free" framework. It is 
only by ensuring that a pricing model fits 
the market that a trader can be sure that he 
does not quote prices that expose him to any 
price arbitrages. The ability to quickly and eas¬ 
ily calibrate to the entire CDS market is the 
major reason why reduced-form models are 
strongly favored by real-world practitioners 
in the credit derivatives markets for pricing. 
Structural-based models are used more for de¬ 
fault prediction and credit risk management. 

Increasingly, investors are seeking consistency
between the markets that use different
modeling approaches, as interest in seeking
arbitrage opportunities across various markets
grows. Chen (2003) has demonstrated that
all the reduced-form models described above 
can be regarded in a nonparametric framework. 
This nonparametric format makes the compari¬ 
son of various models possible. Furthermore, as 
Chen contends, the nonparametric framework 
focuses the difference of various models on 
recovery. 

The basic framework that underlies the 
reduced-form model is a binomial default pro¬ 
cess. There are two branches at each time point 
on the tree: default and survival. The branches 
that lead to default will terminate the contract 
and incur a recovery payment. The branches 
that lead to survival will continue the contract 
that will then face future defaults. This is a very 
general framework to describe how default occurs
and the contract terminates. Various models
differ in how the default probabilities are de¬ 
fined and the recovery is modeled. 

Reduced-form models use risk-neutral pricing
to be able to calibrate to the market. In practice, 
we need to determine the risk-neutral proba¬ 
bilities in order to reprice the market and price 
other instruments not currently priced. In do¬ 
ing so, we do not need to know or even care 
about the real-world default probabilities. 

Since, in reality, a default can occur at any time, to
accurately value a default swap, we need a con¬ 
sistent methodology that describes the follow¬ 
ing: (1) when defaults occur, (2) how recovery 
is paid, and (3) how discounting is handled. 

Survival Probability 

Assume the risk-neutral probabilities exist. 
Then we can identify a series of risk-neutral 
default probabilities so that the weighted aver¬ 
age of default and no-default payoffs can be 
discounted at the risk-free rate. The risk-free 
rate used in the pricing of CDS is LIBOR. This 
is because within a derivatives framework, the 
risk-free rate is close to the rate at which market 
dealers fund their hedges. 

Assume Q(t) to be the survival probability from
now till some future time t. Then Q(t) - Q(t + τ)
is the default probability between t and
t + τ (that is, survive till t but default before
t + τ). Assume defaults can only be observed
at multiples of τ. Then the total probability of
default over the life of the CDS is the sum of all
the per-period default probabilities:

    \sum_{j=1}^{n} \{ Q[(j-1)\tau] - Q(j\tau) \} = 1 - Q(n\tau) = 1 - Q(T)

where Q(0) = 1.0 and nτ = T, the maturity
time of the CDS. It is no coincidence that the
sum of all the per-period default probabilities
equals one minus the total survival
probability: the sum telescopes.
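The telescoping sum can be checked numerically. The sketch below is only an illustration: the flat hazard rate, the quarterly grid, and the five-year maturity are assumed values, not figures from the text.

```python
import math

def per_period_default_probs(survival):
    """Return Q[(j-1)tau] - Q(j*tau) for j = 1..n.

    `survival` is the list [Q(0), Q(tau), ..., Q(n*tau)].
    """
    return [survival[j - 1] - survival[j] for j in range(1, len(survival))]

# Assumed inputs for illustration only: flat hazard rate, quarterly steps, 5 years.
hazard, tau, n = 0.02, 0.25, 20
survival = [math.exp(-hazard * j * tau) for j in range(n + 1)]  # Q(0) = 1.0

probs = per_period_default_probs(survival)
total_default = sum(probs)

# The two printed numbers agree up to floating-point rounding.
print(total_default, 1.0 - survival[-1])
```

Any decreasing survival curve with Q(0) = 1 would give the same agreement; the exponential form is just a convenient choice.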

The survival probabilities have a useful appli¬ 
cation. A $1 "risky" cash flow received at time t 
has a risk-neutral expected value of Q(t) and a 
present value of P(t)Q(t) where P is the risk-free
discount factor. 

The value of the protection leg of a CDS is
the present value of the payment of (1 - R)
at default. To take into account the timing of
the default payment (1 - R), we break the
time to maturity into n intervals which correspond
to the premium payment dates on the
premium leg. This is a simple numerical approximation
which works well given the quarterly
payment convention of CDS. However, a more
exact model would break the time to maturity
into monthly or even weekly time steps. For
each time period we consider the probability of
defaulting in that period. The probability of defaulting
in a forward interval [(j - 1)τ, jτ] is given by

    Q[(j-1)\tau] - Q(j\tau)        (1)

We then discount the payment of (1 - R)
back to today by multiplying it by the risk-free
discount factor P(jτ). We then consider the
likelihood of default occurring in all of the
intervals by summing over all intervals. We
therefore have

    V = (1 - R) \sum_{j=1}^{n} P(j\tau) \{ Q[(j-1)\tau] - Q(j\tau) \}        (2)

where R is the expected recovery rate, determined
by a CDS auction which takes place soon
after a credit event.

In the above equation, it is implicitly assumed 
that the discount factor is independent of the 
survival probability. In reality, these two may be 
correlated—usually higher interest rates lead to 
more defaults because businesses suffer more 
from higher interest rates. To account for this 
we would need to introduce a stochastic prob¬ 
ability and interest rate model. However, the 
effect of this correlation is almost negligible on 
the valuation of CDS and is further reduced by 
calibration. Equation (2) has no easy solution. 6 

Premium payments on the premium leg of a
CDS terminate as soon as a credit event occurs.
As a result, the expected present value of the
premium leg of the default swap is given by discounting
each of the expected spread payments
by the risk-free discount factor, weighted by
the probability of surviving to each payment
date. This is given by

    S \sum_{j=1}^{n} \Delta_j P(j\tau) Q(j\tau)

where Δ_j is the corresponding year fraction in
the appropriate basis convention (typically
Actual/360). By definition the value of the default
swap spread is the value at which the
premium and protection legs have the same
present value. Hence, we have

    V = S \sum_{j=1}^{n} \Delta_j P(j\tau) Q(j\tau)

so that

    S = \frac{V}{\sum_{j=1}^{n} \Delta_j P(j\tau) Q(j\tau)}
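As a rough illustration of how the protection leg, the premium leg, and the break-even spread fit together, the following sketch evaluates both sums on a quarterly grid. The flat hazard rate, the flat continuously compounded discount rate, and the function name `par_spread` are assumptions made for the example; they are not part of the model description above.

```python
import math

def par_spread(q, p, delta, recovery):
    """Break-even spread from discrete survival and discount curves.

    q, p  : lists of Q(j*tau) and P(j*tau) for j = 0..n, with q[0] = p[0] = 1
    delta : year fractions for payments j = 1..n
    """
    n = len(delta)
    # Protection leg: (1 - R) * sum_j P(j*tau) * {Q[(j-1)tau] - Q(j*tau)}
    protection = (1.0 - recovery) * sum(
        p[j] * (q[j - 1] - q[j]) for j in range(1, n + 1)
    )
    # Premium leg per unit of spread: sum_j Delta_j * P(j*tau) * Q(j*tau)
    annuity = sum(delta[j - 1] * p[j] * q[j] for j in range(1, n + 1))
    return protection / annuity

# Assumed inputs: hazard rate chosen as 1% / (1 - 40%), flat 2.5% rate, 4 years quarterly.
recovery, rate, tau, n = 0.40, 0.025, 0.25, 16
hazard = 0.01 / (1.0 - recovery)
q = [math.exp(-hazard * j * tau) for j in range(n + 1)]
p = [math.exp(-rate * j * tau) for j in range(n + 1)]
delta = [tau] * n

spread = par_spread(q, p, delta, recovery)
print(f"break-even spread: {spread * 1e4:.1f} bp")
```

With these inputs the result sits close to 100 basis points, which is what one would expect when the hazard rate is set to the spread divided by (1 - R).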

Figure 2 depicts the general default and recovery
structure. The payoff upon default of a
default swap is par minus the recovery value as
determined by any future ISDA auction which
takes place after a credit event. As of today, the
value of this recovery is unknown; we do not
even know if a credit event will occur. As our
model is based on the expected value of the
protection leg, the recovery rate used has to be
the expected value of the recovery rate conditional
on a default, and for this market practitioners
refer to historical recovery rates. Market
convention is to use a 40% recovery rate, as
this is close to the average historical recovery
rate for senior unsecured US corporate bonds;
most CDS are linked to bonds which are senior
unsecured.

Figure 2 Payoff and Payment Structure of a CDS,
where as a simple approximation we assume that
a credit event can only occur on a CDS premium
leg payment date. In practice the credit event can
occur at any time and the market standard model
would take this into account.

In practice the portion of the premium pay¬ 
ment that has accrued from the previous 
coupon payment date is paid by the protec¬ 
tion buyer following the credit event. We have 
ignored it in our analysis since its effect on the 
calculated spread is small. 7 

Valuation of a Credit Default Swap 

The valuation of CDS can be broken down into 
two separate tasks. The first is the determina¬ 
tion of the default swap spread, which should 
be paid by a protection buyer at the initiation 
of a trade. This has already been discussed. The 
second is to determine the value of an existing 
CDS position, which we call the mark-to-market 
(MTM) or the upfront value. They are the same. 

Since the recouponing of CDS contracts in 
2009, we can no longer state that the MTM or 
upfront of a new trade is always zero. The ef¬ 
fect of fixing the premium leg coupon means 
that the risk of the reference entity must now be 
embedded in the initial cost of protection. 

Once a CDS position has been established, 
changes in the current market CDS spread will 
mean that the MTM begins to deviate from its 
initial value and must be determined by observ¬ 
ing the current level of default swap spreads in 
the market. To see how this is done, consider 
the following example. 

An investor sells protection on a high yield 
reference entity for five years at an agreed con¬ 
tractual spread of 500 basis points. By selling 
protection the investor is assuming the credit 
risk of the reference entity as though he was 
buying one of the reference entity's issued 
bonds. A year later the reference entity's credit 
rating has improved and the market quoted 4- 
year par CDS spread is at 100 basis points. What 
is the MTM or upfront value of the position? 

To begin with, the MTM value of the contract
to the investor is given by the difference between
what the investor expects to receive
and what the investor expects to pay. As a result
we can write

MTM = + Present value of four years of risky 
premium payments of 500 basis 
points — Present value of protection 
for the remaining four years 

We can also write that the current four-year par 
CDS spread of 100 basis points is the current 
break-even spread. By definition, the current 
value of a new four-year "par" CDS contract 
with a coupon equal to the par CDS spread is 
zero so we can write 

Present value of four years of risky premium 
payments of 100 basis points = Present 
value of protection for the remaining 
four years 

Substituting, we write 

MTM = + Present value of four years of risky
premium payments of 500 basis
points - Present value of four years
of risky premium payments
of 100 basis points

which can be rewritten as

MTM = + Present value of four years of risky
premium payments of 400 basis points

To go any further we have to compute the
expected present value of these 400 basis point
payments. However, these payments are only
made until the maturity of the CDS or to the
time of a credit event, whichever occurs first. To
compute the MTM we therefore need to weight
each premium payment by the probability that
there is no credit event up until that payment
date. We therefore write

MTM = 400 basis points x RPV01 

where the RPV01 is the "risky" price value of a
basis point (PV01). This is defined as the present
value of a 1 basis point payment made until the
contractual maturity date of the position or to
the date of a credit event, whichever is sooner.
Mathematically, we can write the RPV01 as

    RPV01 = \sum_{j=1}^{n} \Delta_j P(j\tau) Q(j\tau)

where Δ_j is the year fraction for payment j
in the appropriate basis (typically Actual/360).
For quarterly paying CDS, Δ_j is usually close to
or equal to 0.25. Bringing this all together, we
can write the MTM value of a long protection
position as

    MTM = +[S(t,T) - S(0,T)] \times RPV01[S(t,T), R]

and that of a short protection position as

    MTM = -[S(t,T) - S(0,T)] \times RPV01[S(t,T), R]

where S(0,T) is the contractual spread of the
contract, T is the contractual maturity date, and
S(t,T) is the current par CDS spread to the contractual
maturity date. It is essential to note that
the RPV01 is a function of the market spread
S(t,T) and the assumed recovery rate R since
both are used to imply out the risk-neutral survival
probabilities.

To crystallize all of this theory, we present in 
Table 3 the valuation of the trade introduced 
at the beginning of this section in which an in¬ 
vestor sells $10 million of five-year protection 
at 500 basis points and then wishes to mark it to 
market one year later when the market has a flat 
term structure at 100 basis points. For simplicity 
we have assumed a flat LIBOR term structure at 
2.5%. We assume a recovery rate of 40%. In par¬ 
ticular we show the quarterly coupon payment 
dates (we have ignored holidays and weekends 
for simplicity) and the corresponding values of 
P and Q, calibrated to reprice the term structure 
of default swap spreads. 

We see that the current par CDS spread is 100
basis points, and that the risky PV01 of the position
is 3.7247: the present value of four years
of risky 1 basis point payments is 3.7247 basis
points. The resulting MTM value is $1,489,892.
This makes sense. The market has valued the
risk of four-year protection on the reference
entity at 100bp in spread terms, but the fixed
coupon is 500bp. A new investor wanting to
sell four-year protection is therefore being
overcompensated and, to correct for this, has to pay
a large upfront cost.


Table 3 An Illustration of Calculation of the MTM Value

Long or short protection    Short
Notional ($)                10,000,000
Contractual Spread (bp)     500
Settlement Date             20-Mar-13
Maturity Date               20-Mar-17
Flat LIBOR                  2.50%
Par CDS Spread (bp)         100
Recovery Rate               40%

Payment Dates   YearFrac   Premium Leg Flows   Q(t)      P(t)
20-Mar-13                                      1.00000   1.00000
20-Jun-13       0.25556    127,778             0.99575   0.99372
20-Sep-13       0.25556    127,778             0.99152   0.98748
20-Dec-13       0.25278    126,389             0.98735   0.98135
20-Mar-14       0.25000    125,000             0.98324   0.97533
20-Jun-14       0.25556    127,778             0.97906   0.96920
20-Sep-14       0.25556    127,778             0.97490   0.96312
20-Dec-14       0.25278    126,389             0.97080   0.95714
20-Mar-15       0.25000    125,000             0.96677   0.95126
20-Jun-15       0.25556    127,778             0.96266   0.94529
20-Sep-15       0.25556    127,778             0.95857   0.93936
20-Dec-15       0.25278    126,389             0.95454   0.93352
20-Mar-16       0.25278    126,389             0.95053   0.92773
20-Jun-16       0.25556    127,778             0.94649   0.92190
20-Sep-16       0.25556    127,778             0.94246   0.91612
20-Dec-16       0.25278    126,389             0.93850   0.91043
20-Mar-17       0.25000    125,000             0.93460   0.90484
20-Jun-17       0.25556    127,778             0.93063   0.89916

Prot Leg PV                 372,473
Risky PV01                  3.7247
Replication Spread (bp)     100.00
Contract MTM                1,489,892
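The calculation behind Table 3 can be sketched by summing the year fraction times P times Q over the payment dates and then applying the short-protection MTM formula. The sketch below uses the table's own YearFrac, Q(t), and P(t) values for the payment dates up to the stated maturity of 20-Mar-17 (the later 20-Jun-17 row is left out, which is an assumption about how the table should be read). This plain sum lands near, but not exactly on, the published 3.7247; the published figure may include refinements that are not shown in the table.

```python
# (year fraction, Q(t), P(t)) for the 16 payment dates 20-Jun-13 .. 20-Mar-17,
# copied from Table 3.
rows = [
    (0.25556, 0.99575, 0.99372), (0.25556, 0.99152, 0.98748),
    (0.25278, 0.98735, 0.98135), (0.25000, 0.98324, 0.97533),
    (0.25556, 0.97906, 0.96920), (0.25556, 0.97490, 0.96312),
    (0.25278, 0.97080, 0.95714), (0.25000, 0.96677, 0.95126),
    (0.25556, 0.96266, 0.94529), (0.25556, 0.95857, 0.93936),
    (0.25278, 0.95454, 0.93352), (0.25278, 0.95053, 0.92773),
    (0.25556, 0.94649, 0.92190), (0.25556, 0.94246, 0.91612),
    (0.25278, 0.93850, 0.91043), (0.25000, 0.93460, 0.90484),
]

notional = 10_000_000
contract_spread = 0.0500   # 500 bp, fixed at trade inception
market_spread = 0.0100     # 100 bp, current par spread

# RPV01 = sum_j Delta_j * P(j*tau) * Q(j*tau)
rpv01 = sum(delta * q * p for delta, q, p in rows)

# Short protection: MTM = -[S(t,T) - S(0,T)] * RPV01 * notional
mtm = -(market_spread - contract_spread) * rpv01 * notional

print(f"RPV01 ~ {rpv01:.4f}")
print(f"MTM   ~ {mtm:,.0f}")
```

The sign convention follows the short-protection formula given earlier: because the contractual spread exceeds the market spread, the protection seller's position has positive value.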


CDS Risk and Sensitivities 

Market practitioners using CDS usually consider
two risk measures. First is the Credit01 or
Spread01. This is the change in the MTM value
of a CDS position for a 1 basis point parallel
shift in the CDS curve. Then there is the Interest
Rate 01, which is the change in the MTM value
of a CDS position for a 1 basis point change
in LIBOR. In practice the LIBOR sensitivity of a
CDS is small, usually at least an order of magnitude
less than the Credit01. This reflects
the fact that a CDS is almost a pure credit play.
It is actually possible to make some simple approximations
that make clear the dependence
of the MTM on these inputs. First, we can approximate
the CDS spread in terms of the risk-neutral
annualized default probability p and the
assumed recovery rate R, using the equation
S = p(1 - R). The interpretation is that the annualized
spread received for assuming a credit
risk should equal the annualized default probability
times the loss on default, which in a CDS
equals (100% - R). This approximation works
very well in practice. If we assume a flat term
structure of CDS spreads and approximate Δ with
1/4, then we can approximate the MTM of a long
protection position as

    MTM \approx \frac{S(t,T) - S(0,T)}{4} \sum_{j=1}^{n} P(j\tau) \left[ 1 - \frac{S(t,T)}{1 - R} \right]^{j/4}


We can immediately draw a number of con¬ 
clusions from this mathematical expression for 
the MTM value. First, the MTM value is not 
a linear function of the market spread S(t,T). 
In fact the MTM value of a short protection 
position is convex in the market spread, just 
as the price of a corporate bond is convex in 
the yield. Furthermore, it is also clear that the 
recovery rate sensitivity of the MTM value is 
large when the market spread is large. This 
means that where the market spread is below, 
say, 300 basis points, one does not have to be 
so precise about the recovery rate assumption. 
However, if spreads become large (say, 300 basis 
points and above) the recovery rate sensitivity 
becomes increasingly significant and care must 
be taken in making a recovery rate assumption. 
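The recovery rate sensitivity described above can be seen by evaluating the approximate MTM expression at a low and a high market spread. This is a minimal sketch under stated assumptions: the discount factor is taken as exp(-r j/4) for a flat rate r, and the function names and bump sizes are illustrative choices rather than anything prescribed by the text.

```python
import math

def approx_rpv01(spread, recovery, rate, years):
    """(1/4) * sum_j P(j*tau) * [1 - S/(1 - R)]**(j/4) on a quarterly grid."""
    p_default = spread / (1.0 - recovery)   # from S = p(1 - R)
    n = int(round(4 * years))
    return 0.25 * sum(
        math.exp(-rate * j / 4.0) * (1.0 - p_default) ** (j / 4.0)
        for j in range(1, n + 1)
    )

def approx_mtm_long(market, contract, recovery, rate, years):
    """Approximate MTM of a long protection position per unit notional."""
    return (market - contract) * approx_rpv01(market, recovery, rate, years)

rate, years, contract = 0.025, 4.0, 0.05

# Effect of moving the recovery assumption from 40% to 30% at two spread levels.
low = abs(approx_mtm_long(0.01, contract, 0.30, rate, years)
          - approx_mtm_long(0.01, contract, 0.40, rate, years))
high = abs(approx_mtm_long(0.10, contract, 0.30, rate, years)
           - approx_mtm_long(0.10, contract, 0.40, rate, years))

print(f"MTM change at 100 bp:   {low:.5f}")
print(f"MTM change at 1,000 bp: {high:.5f}")
```

Under these assumptions the same change in the recovery assumption moves the MTM far more at the high spread than at the low one.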


Calibrating the Recovery 
Rate Assumption 

To be precise, the recovery rate assumption, R, 
is the assumed price of the cheapest-to-deliver 
asset into the CDS contract within 72 calendar 
days of the notification of the credit event. This
is not known today. Nor can it be extracted from
any market prices. In theory, this would be possible
given the existence of an active and liquid
digital default swap market. A digital default
swap is a contract that pays the face value in 
the event of default—it is like a standard default 
swap but instead assumes a fixed recovery rate 
of zero. The ratio of the normal CDS spread and 
the digital default swap spread would equal 
(1 — R). However, the lack of liquidity of the 
digital market makes this calibration approach 
impractical. 
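The ratio relationship is simple enough to state as a one-line calculation. The sketch below is purely illustrative: the function name and the two quoted spreads are hypothetical, and as noted above such digital quotes are rarely available in practice.

```python
def implied_recovery(cds_spread, digital_spread):
    """Recovery rate implied by CDS spread / digital spread = 1 - R."""
    if digital_spread <= 0 or cds_spread > digital_spread:
        raise ValueError("expected 0 < cds_spread <= digital_spread")
    return 1.0 - cds_spread / digital_spread

# Hypothetical quotes: 120 bp for the standard CDS, 200 bp for the digital.
recovery = implied_recovery(0.0120, 0.0200)
print(f"implied recovery rate: {recovery:.0%}")
```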

The usual starting point for calibrating recov¬ 
ery rates is to observe rating agency statistics. 
Both Moody's and S&P maintain significant 
databases of U.S. corporate bond defaults. Care 
must be taken to adjust any average recovery 
rates for country and sector effects. Recovery 
rates also have a link to the economic cycle. In 
recent years, average recovery rates have fallen 
well below the long-term averages computed 
by rating agencies. One reason why this is so is 
that Moody's, for example, defines the recovery 
rate of a bond as the price of that bond within 
some short period following the default. It is 
not the final value received by holders of the 
bond after going through the workout process. 
This means that the recovery rate is driven by 
the size of the bid for the bond in the distressed 
debt market. In periods of credit weakness, the 
distressed debt market is unable to absorb the 
oversupply of defaulted assets and the bid con¬ 
sequently falls. 

Another consideration when marking recov¬ 
ery rate assumptions is to take into account that 
following a restructuring event, which is not 
a full default, the deliverable obligations may 
trade at higher prices than in a full default. Since 
rating agencies do not consider restructuring as 
a full default, this effect is not accounted for 
in their statistics. Typical recovery rates being 
quoted in the market for good quality credits 
vary between 30% and 45%. 

When spreads are trading at very high lev¬ 
els of 1,000 basis points and above, it is 
important to look to the bond market to see 
if bond prices are revealing any information 
about the expected recovery rate in the event 
of a default. For example, a recovery rate assumption
of 40% would make no sense if one
of the deliverable bonds into the CDS is trading 
at 30 cents on the dollar. In this case, the recov¬ 
ery rate assumption should clearly be moved 
below 30%. 

The Practicalities of Unwinding a 
Credit Default Swap 

A CDS is an over-the-counter (OTC) deriva¬ 
tive contract. This means that unlike some 
other derivatives contracts it is not exchange 
traded. Instead it involves an agreement be¬ 
tween two counterparties. As almost all CDS 
are traded within the framework of the ISDA 
Master Agreement, there is widespread stan¬ 
dardization of the documentation of CDS and 
many counterparties are happy to trade these 
bilateral contracts in what is effectively a sec¬ 
ondary market. To unwind a CDS before its 
maturity date, an investor may consider one 
of three courses of action: 

1. Negotiate a cash upfront price with the orig¬ 
inal counterparty. The price should be the 
same as the MTM value calculated according 
to the model. In practice a bid-offer spread 
will have to be crossed. Part of this negotia¬ 
tion may involve some exchange of informa¬ 
tion as to the recovery rate assumptions used 
by both counterparties. 

2. If the investor is shown a better upfront price 
by a counterparty different to the one with 
whom the initial trade was executed, they 
can ask to have the contract reassigned to 
this other counterparty and then close it out 
for a cash unwind value. 

3. They may choose to enter into an offsetting
position. For example, an investor who has
sold protection for five years may decide a
year later to close out the contract by buying
protection for four years. The value of this
combined position should exactly equal the
model mark-to-market.

Which one of these choices is made is usually 
determined by which is showing the best price. 


Prior to 2009 we would have said that option 3 is
different from the others because it leaves the
CDS holder with an ongoing position consisting
of a future stream of risky cashflows equal to
the difference between the spread of the initial
contract and that of the new unwind contract.
However, now that CDS contracts on the same
reference entity all trade with the same coupon,
option 3 leaves the parties with no
net cashflows, as both coupon streams will cancel
each other. Instead the CDS unwind value is
realised through the upfront cost of the offsetting
position and will be the same as in options 1
and 2.

The matching of coupons means there is no 
economic value in retaining both positions and 
both positions can be cancelled. Indeed this ef¬ 
fect was the purpose of fixing CDS coupons 
since it means that in future, major dealers in 
CDS will no longer be left with many tens 
of thousands of legacy partially offsetting po¬ 
sitions and their associated counterparty risk. 
This reduces the gross notional of the CDS mar¬ 
ket and should reduce fears, unfounded or not, 
about systemic risk. It may also help to facilitate 
any future plans to migrate CDS contracts from 
the OTC market to an exchange traded market. 


KEY POINTS 

• There is a fundamental no-arbitrage relationship
that links the pricing of credit default
swaps and the bonds which they reference.
Various market and contractual differences
mean that this relationship is not strictly
obeyed at all times. However, material deviations
from this relationship should not persist.

• Since the recouponing of CDS contracts in 
2009, CDS contracts no longer trade with zero 
initial value. The valuation of a CDS contract 
has become the process of determining the 
upfront value of a contract. 

• A pricing model for CDS contracts needs 
to take into account the different factors 
that drive the pricing of CDS. These include 
the market implied term structure for the 
probability of survival/default, the expected
recovery price if there is a credit event, and 
the level of interest rates used to discount fu¬ 
ture cashflows. 

• The role of the standard valuation model set 
out in this chapter is to determine this upfront 
value. As market prices are actually quoted 
in the form of a term structure of CDS par 
spreads, the model must be able to exactly 
refit these par spreads and to then use the im¬ 
plied survival curve plus assumptions about 
the expected recovery price to determine the 
upfront value of any given CDS contract. 

• An implementation of the standard pricing
model has been produced by the ISDA and is 
available from www.cdsmodel.com. 

NOTES 

1. See O'Kane, Pedersen, and Turnbull (2003). 

2. See O'Kane (2001). 

3. For a discussion of the driving factors behind 
the basis, see O'Kane and McAdie (2001). 

4. Geske (1977) extended the Black-Scholes- 
Merton model to include multiple debts. See 
also Geske and Johnson (1984). Many barrier 
models appear as an easy solution for ana¬ 
lyzing the risky debt problem. 

5. The name "reduced-form" was first given 
by Darrell Duffie to differentiate from the 
structural form models of the Black-Scholes- 
Merton type. 

6. A continuous-time version of the equation 
can be found in the appendix of Chen, 
Fabozzi, and O'Kane (2003). 

7. See O'Kane and Turnbull (2003). 
Abstract: A total return swap is a swap in which one party makes periodic floating rate payments
to a counterparty in exchange for the total return realized on a reference asset (or underlying asset). 
The reference asset could be a credit-risky bond, a loan, a reference portfolio consisting of bonds or 
loans, an index representing a sector of the bond market, or an equity index. A total return swap 
can be used by asset managers for leveraging purposes and /or a transactionally efficient means 
for implementing a portfolio strategy. Bank managers use a total return swap as an efficient vehicle 
for transferring credit risk and as a means for reducing credit risk exposures. The Duffie-Singleton 
model can be used to value total return swaps. 


In this entry we explain the valuation of total 
return swaps. 1 We begin with an intuitive ap¬ 
proach. 


AN INTUITIVE APPROACH 

A typical total return swap is to swap the re¬ 
turn on a reference asset for a risk-free return, 
usually the London Interbank Offered Rate 
(LIBOR). The cash flows for the swap buyer 
(that is, the total return receiver) are shown in 
Figure 1. In the figure, L_t is LIBOR at time t, s is
the spread to LIBOR, and R_t is the total return
at time t. The cash outlay at time t per $1 of notional
amount that must be made by the swap
buyer is L_t + s; the cash inflow at time t per $1
of notional amount is R_t.

Figure 1 Cash Flows for the Total Return Receiver

As a result, the pricing of a total return swap
is to decide the right spread, s, to pay on the
funding (that is, LIBOR) leg. Formally,

    E_0 \left[ \sum_{j=1}^{n} \exp\left( -\int_0^{T_j} r(t)\,dt \right) (R_j - L_j - s) \right] = 0

where r is the risk-free discount rate.

In words, the spread should be set so that the
expected payoff of the total return swap is equal
to zero. (We employ the standard risk-neutral
pricing and discounting at the risk-free rate.)
To make the matter simple (we shall discuss
more rigorous cases later), we view r, R, and
L as three separate random variables. We then
rearrange the above equation as

    E_0 \left[ \sum_{j=1}^{n} \exp\left( -\int_0^{T_j} r(t)\,dt \right) (R_j - L_j) \right]
        = E_0 \left[ \sum_{j=1}^{n} \exp\left( -\int_0^{T_j} r(t)\,dt \right) s \right]

Exchanging expectation and summation of
the right-hand side gives

    s \sum_{j=1}^{n} E_0 \left[ \exp\left( -\int_0^{T_j} r(t)\,dt \right) \right] = s \sum_{j=1}^{n} P(0, T_j)

as the sum of risk-free pure discount bond
prices. This implies

    \sum_{j=1}^{n} E_0 \left[ \exp\left( -\int_0^{T_j} r(t)\,dt \right) (R_j - L_j) \right] = \sum_{j=1}^{n} P(0, T_j)\, s

The next step is to use the forward measure
to simplify the left-hand side of the above
equation:

    \sum_{j=1}^{n} P(0, T_j)\, E_0^{F(T_j)} [ R_j - L_j ] = \sum_{j=1}^{n} P(0, T_j)\, s

Later, we show that the forward measure expectation
of an asset gives the forward price of
the asset. Hence, the left-hand side of the above
equation gives two forward curves, one on the
asset return, R, and the other on LIBOR, L:

    \sum_{j=1}^{n} P(0, T_j) [ f_j^R - f_j^L ] = \sum_{j=1}^{n} P(0, T_j)\, s

where f_j^i is the forward rate of i (i = R or L) for
j periods ahead. Therefore, the spread can be
solved easily as

    s = \frac{\sum_{j=1}^{n} P(0, T_j) [ f_j^R - f_j^L ]}{\sum_{j=1}^{n} P(0, T_j)}

The result is intuitive: the spread is a weighted
average of the expected difference between two
floating-rate indexes. The weight is

    \frac{P(0, T_j)}{\sum_{j=1}^{n} P(0, T_j)}

Note that all the weights should sum to one.
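The weighted-average form of the spread can be sketched in a few lines. The discount factors and the two forward curves below are made-up inputs chosen only to show the mechanics, and `trs_spread` is a hypothetical helper name.

```python
def trs_spread(discount, fwd_asset, fwd_libor):
    """Spread s as the P(0, T_j)-weighted average of forward differences.

    discount  : P(0, T_j) for j = 1..n
    fwd_asset : forward rates on the reference asset return, f_j^R
    fwd_libor : forward LIBOR rates, f_j^L
    """
    total = sum(discount)
    weights = [d / total for d in discount]          # weights sum to one
    spread = sum(w * (fr - fl)
                 for w, fr, fl in zip(weights, fwd_asset, fwd_libor))
    return spread, weights

# Illustrative inputs: four annual periods.
discount = [0.975, 0.950, 0.925, 0.900]
fwd_libor = [0.025, 0.027, 0.029, 0.031]
fwd_asset = [0.040, 0.043, 0.045, 0.048]

spread, weights = trs_spread(discount, fwd_asset, fwd_libor)
print(f"spread: {spread * 1e4:.1f} bp, weights sum to {sum(weights):.6f}")
```

If the two forward curves differ by a constant, the weighted average collapses to that constant regardless of the discount factors.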


USING THE DUFFIE- 
SINGLETON MODEL 

The difference in two floating rates is mainly
due to their credit risk; otherwise they should
both offer identical rates and give identical forward
curves. As a consequence, to be rigorous
about getting the correct result, we need to incorporate
the credit risk in one of the indexes.

Among various choices, the model by Duffie
and Singleton (1999) is best suited to this
situation. The Duffie-Singleton model is a popular
reduced-form model that is used in credit
risk modeling. In the model, the present value
of any risky cash flow is defined as

    C(t) = E_t \left[ \exp\left( -\int_t^T [ r(u) + q(u) ]\,du \right) C(T) \right]

where q is the "spread" in the Duffie-Singleton
model that incorporates the recovery rate and
default probability.

In the swap, N is the notional, L is LIBOR, and S
is the index level. As noted earlier, since both
cash flows are random, it is a floating-floating
swap. Also, since the index is always higher than
LIBOR because of credit risk, this swap requires
a premium. As a result, the premium is
computed as the sum of all future values, discounted
and expected.


THE FORWARD MEASURE 


In this section, we show how the forward measure
works and why a forward-adjusted expectation
gives the forward value. We first state
the separation principle that leads to the forward
measure. Based on the no-arbitrage principle,
the current value of any asset is the risk-neutral
expected value of the discounted future
payoff:

    C(t) = E_t \left[ \exp\left( -\int_t^T r(u)\,du \right) C(T) \right]

The separation principle states that if we
adopt the forward measure, then the above
equation can be written as

    C(t) = E_t \left[ \exp\left( -\int_t^T r(u)\,du \right) \right] E_t^{F(T)} [ C(T) ]

where E_t^{F(T)}[·] is the expectation under the
forward measure. 2 Note
that the first term is nothing but the zero-coupon
bond price:

    P(t, T) = E_t \left[ \exp\left( -\int_t^T r(u)\,du \right) \right]

and hence

    C(t) = P(t, T)\, E_t^{F(T)} [ C(T) ]

While we do not prove this result, we should
note the intuition behind it. Let C be a zero-coupon
bond expiring at time s. Then the above
result can be applied directly and gives

    P(t, s) = P(t, T)\, E_t^{F(T)} [ P(T, s) ]

or equivalently

    E_t^{F(T)} [ P(T, s) ] = \frac{P(t, s)}{P(t, T)}

This is an indirect proof that the forward-adjusted
expectation gives a forward value. The
instantaneous forward rate can be shown to be
the forward-adjusted expectation of the future
instantaneous spot rate:

    f(t, T) = E_t^{F(T)} [ r(T) ]
The discrete forward rates, f_D(t, w, T) for
all w and T, can also be shown to be the
forward-adjusted expectations of future discrete
spot rates:


f D (t, w, T) = 


^(f, w, T) 
where t < w < T.


KEY POINTS

• A total return swap is a swap in which one
party makes periodic floating rate payments
to a counterparty in exchange for the total
return realized on a reference asset such as a
credit-risky bond.

• The pricing of a total return swap is to decide
the right spread to pay on the funding leg.

• Using the standard risk-neutral pricing and
discounting at the risk-free rate, the spread
should be set so that the expected payoff of
the total return swap is equal to zero.

• A reduced-form model used in valuing credit
derivatives, the Duffie-Singleton model, is
employed to value total return swaps.

• The forward measure expectation of an asset
gives the forward price of the asset that is the
underlying for a total return swap.

NOTES

1. For a discussion of total return swaps and
their applications, see Anson et al. (2004).

2. The derivation of this result can be found in a
number of places. See, for example, Jamshidian
(1987) and Chen (1996).

Abstract: Swaps are useful for volatility hedging and speculation. Volatility swaps are forward
contracts on future realized stock volatility, and variance swaps are similar contracts on variance,
the square of future volatility. Covariance and correlation swaps are covariance and correlation
forward contracts, respectively, on the two underlying assets. Using the change of time method, one
can model and price variance, volatility, covariance, and correlation swaps.


Variance, volatility, covariance, and correlation 
swaps are relatively recent financial products 
that market participants can use for volatility 
hedging and speculation. The market for these 
types of swaps has been growing, with many 
investment banks and other financial institu¬ 
tions now actively quoting volatility swaps on 
various assets: stock indexes, currencies, and 
commodities. 

A stock's volatility is the simplest measure 
of its riskiness or uncertainty. In this entry we 
describe, model, and price variance, volatility, 
covariance, and correlation swaps. 

DESCRIPTION OF SWAPS 

We begin with a description of the different 
kinds of swaps that we will be discussing in 
this entry: variance swaps, volatility swaps, co- 
variance swaps, and correlation swaps. Table 
1 provides a summary of studies dealing with 
these swaps. 


Variance and Volatility Swaps 

A stock's volatility is the simplest measure of its
riskiness or uncertainty. Formally, the volatility
σ_R is the annualized standard deviation of
the stock's returns during the period of interest,
where the subscript R denotes the observed or
"realized" volatility.

Why trade volatility or variance swaps? As 
mentioned in Demeterfi et al. (1999, p. 9), "just 
as stock investors think they know something 
about the direction of the stock market so we 
may think we have insight into the level of fu¬ 
ture volatility. If we think current volatility is 
low, for the right price we might want to take a 
position that profits if volatility increases." 

The easiest way to trade volatility is to 
use volatility swaps, sometimes called realized 
volatility forward contracts, because they pro¬ 
vide only exposure to volatility and not other 
risk. Variance swaps are similar contracts on vari¬ 
ance, the square of the future volatility. As noted 
by Carr and Madan (1998), both types of swaps 
Table 1 Summary of Studies Dealing with Variance, Volatility, Covariance, and Correlation Swaps


Demeterfi et al. (1999)


Javaheri et al. (2002) 

Brockhaus et al. (2000) 
Swishchuk (2004) 

Cheng et al. (2002) 

Elliott and Swishchuk (2007) 
Carr and Lee (2009) 
Swishchuk (2009a) 


Swishchuk et al. (2010) 

Kallsen et al. (2009) 

Swishchuk et al. (2010) 
Swishchuk (2005, 2006, 2007), 
Swishchuk et al. (2007), 
Swishchuk (2009a, 2010b), 
Swishchuk et al. (2010) 
Swishchuk (2011) 

Howison et al. (2004) 


Explained properties and theory of both variance and volatility swaps. 

Derived an analytical formula for theoretical fair value in the presence of realistic 
volatility skew. 

Pointed out that volatility swaps can be replicated by dynamically trading the 
more straightforward variance swap. 

Discussed the valuation and hedging of a GARCH(1,1) stochastic volatility model. 
Used a general and flexible PDE approach to determine first two moments of the 
realized variance in a continuous or discrete context. 

Approximated the expected realized volatility via a convexity adjustment. 
Provided an analytical approximation for the valuation of volatility swaps. 
Analyzed other options with volatility exposure. 

Priced covariance and correlation swaps in continuous time (Heston models for 
two stock prices) 

Priced covariance and correlation swaps in discrete time (Heston models for two 
stock prices) 

Studied option pricing formulae and pricing swaps for Markov-modulated 
Brownian with jumps. 

Provide an overview of the market of volatility derivatives and survey the early 
literature. 

Considered a semi-Markov modulated market consisting of a riskless asset or 
bond, B; and a risky asset or stock, S; whose dynamics depend on a semi-Markov 
process x : 

Using the martingale characterization of semi-Markov processes, noted the 
incompleteness of semi-Markov modulated markets and found the minimal 
martingale measure. 

Priced variance and volatility swaps for stochastic volatilities driven by the 
semi-Markov processes. 

Generalized results in Swishchuk (2009a) for the cases of the local current 
semi-Markov and local semi-Markov volatilities. 

Priced variance and volatility swaps and options on variance in affine stochastic 
volatility models. 

Volatility and variance swaps for COGARCH(l,l). 

Priced and modeled variance swaps for many stochastic volatility models with 
delay and jumps. 


Priced variance and volatility swaps in energy markets 

Considered the pricing of a range of volatility derivatives, including volatility and 
variance swaps and swaptions. 


provide an easy way for investors to gain expo¬ 
sure to the future level of volatility. 

A stock volatility swap's payoff at expiration is equal to

$$N(\sigma_R(S) - K_{vol})$$

where $\sigma_R(S)$ is the realized stock volatility (quoted in annual terms) over the life of the contract,

$$\sigma_R(S) := \sqrt{\frac{1}{T}\int_0^T \sigma_s^2\,ds},$$

$\sigma_t$ is a stochastic stock volatility, $K_{vol}$ is the annualized volatility delivery price, and N is the notional amount of the swap in dollars per annualized volatility point.


Although options market participants talk of volatility, it is variance, or volatility squared, that has more fundamental significance (see note 1). A variance swap is a forward contract on annualized variance, the square of the realized volatility. Its payoff at expiration is equal to

$$N(\sigma_R^2(S) - K_{var})$$

where $\sigma_R^2(S)$ is the realized stock variance (quoted in annual terms) over the life of the contract; that is,

$$\sigma_R^2(S) := \frac{1}{T}\int_0^T \sigma_s^2\,ds,$$

$K_{var}$ is the delivery price for variance, and N is the notional amount of the swap in dollars per annualized volatility point squared. The holder of a variance swap at expiration receives N dollars for every point by which the stock's realized variance $\sigma_R^2(S)$ has exceeded the variance delivery price $K_{var}$. Therefore, pricing the variance swap reduces to calculating the square of the realized volatility.

Valuing a variance forward contract or swap is no different from valuing any other derivative security. The value of a forward contract P on future realized variance with strike price $K_{var}$ is the expected present value of the future payoff in the risk-neutral world:

$$P_{var} = E\{e^{-rT}(\sigma_R^2(S) - K_{var})\}$$

where r is the risk-free interest rate corresponding to the expiration date T, and E denotes the expectation. Thus, for calculating variance swaps we need to know only $E\{\sigma_R^2(S)\}$, namely the mean value of the underlying variance.

To calculate volatility swaps we need more. Using the Brockhaus and Long (2000) approximation (which is the second-order Taylor expansion for the function $\sqrt{x}$) we have (see note 2)

$$E\{\sqrt{\sigma_R^2(S)}\} \approx \sqrt{E\{V\}} - \frac{Var\{V\}}{8E\{V\}^{3/2}}$$

where $V = \sigma_R^2(S)$ and $\frac{Var\{V\}}{8E\{V\}^{3/2}}$ is the convexity adjustment.

Thus, to calculate the value of volatility swaps

$$P_{vol} = e^{-rT}(E\{\sigma_R(S)\} - K_{vol})$$

we need both $E\{V\}$ and $Var\{V\}$.
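The origin of the convexity adjustment can be sketched in two lines. The expansion point $x_0$ is notation introduced here for illustration; it is set to $E\{V\}$, so the first-order term vanishes when expectations are taken.

```latex
\begin{align*}
\sqrt{x} &\approx \sqrt{x_0} + \frac{x - x_0}{2\sqrt{x_0}} - \frac{(x - x_0)^2}{8\,x_0^{3/2}}, \qquad x_0 = E\{V\}, \\
E\{\sqrt{V}\} &\approx \sqrt{E\{V\}} + \frac{E\{V - E\{V\}\}}{2\sqrt{E\{V\}}} - \frac{E\{(V - E\{V\})^2\}}{8\,E\{V\}^{3/2}}
 = \sqrt{E\{V\}} - \frac{Var\{V\}}{8\,E\{V\}^{3/2}}.
\end{align*}
```

Because the square root is concave, the correction is always negative: the expected realized volatility lies below the square root of the expected realized variance.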

Later we explicitly solve the Cox-Ingersoll-Ross equation (see note 3) for the Heston model for stochastic volatility (see note 4) using the change of time method and present the formulas for pricing variance and volatility swaps for this model.


Covariance and Correlation Swaps 

Options dependent on exchange rate movements, such as those paying in a currency different from the underlying currency, have an exposure to movements of the correlation between the asset and the exchange rate. This risk can be eliminated by using a covariance swap.

A covariance swap is a covariance forward contract of the underlying rates $S^1$ and $S^2$, which has a payoff at expiration that is equal to

$$N(Cov_R(S^1, S^2) - K_{cov})$$

where $K_{cov}$ is a strike price, N is the notional amount, and $Cov_R(S^1, S^2)$ is a covariance between two assets $S^1$ and $S^2$.

Logically, a correlation swap is a correlation forward contract of two underlying rates $S^1$ and $S^2$ whose payoff at expiration is the following

$$N(Corr_R(S^1, S^2) - K_{corr})$$

where $Corr_R(S^1, S^2)$ is a realized correlation of two underlying assets $S^1$ and $S^2$, $K_{corr}$ is a strike price, and N is the notional amount.

Pricing covariance swaps, from a theoretical point of view, is similar to pricing variance swaps, since

$$Cov_R(S^1, S^2) = \frac{1}{4}\{\sigma_R^2(S^1S^2) - \sigma_R^2(S^1/S^2)\}$$

where $S^1$ and $S^2$ are two underlying assets, $\sigma_R^2(S)$ is the realized variance on which a variance swap for the underlying asset is written, and $Cov_R(S^1, S^2)$ is a realized covariance of the two underlying assets $S^1$ and $S^2$.

Thus, we need to know the variances for $S^1S^2$ and for $S^1/S^2$. Correlation $Corr_R(S^1, S^2)$ is defined as follows:

$$Corr_R(S^1, S^2) = \frac{Cov_R(S^1, S^2)}{\sqrt{\sigma_R^2(S^1)}\sqrt{\sigma_R^2(S^2)}}$$

where $Cov_R(S^1, S^2)$ is defined as above and $\sigma_R^2(S^i)$ is the realized variance for $S^i$, i = 1, 2.

Given two assets $S_t^1$ and $S_t^2$ with $t \in [0, T]$, sampled on days $t_0 = 0 < t_1 < t_2 < \ldots < t_n = T$ between today and maturity T, the log-return of each asset is

$$R_i^j = \log\left(\frac{S^j_{t_i}}{S^j_{t_{i-1}}}\right), \quad i = 1, 2, \ldots, n, \quad j = 1, 2$$

Covariance and correlation can be approximated by

$$Cov_n(S^1, S^2) = \frac{n}{(n-1)T}\sum_{i=1}^{n} R_i^1 R_i^2$$

and

$$Corr_n(S^1, S^2) = \frac{Cov_n(S^1, S^2)}{\sqrt{Var_n(S^1)}\sqrt{Var_n(S^2)}}$$

respectively.
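The discrete approximations can be sketched in a few lines of Python. This is an illustration, not the author's implementation: the estimator is assumed to be $\frac{n}{(n-1)T}\sum_i R_i^1 R_i^2$, $Var_n(S)$ is assumed to equal $Cov_n(S, S)$, and the function names and sample prices are hypothetical.

```python
import math


def log_returns(prices):
    """Log-returns R_i = log(S_{t_i} / S_{t_{i-1}}) of a sampled price path."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_cov(prices1, prices2, maturity):
    """Discrete realized covariance, assumed form n / ((n - 1) T) * sum(R1_i * R2_i)."""
    r1, r2 = log_returns(prices1), log_returns(prices2)
    n = len(r1)
    if n != len(r2) or n < 2:
        raise ValueError("need two equally sampled paths with at least 3 prices")
    return n / ((n - 1) * maturity) * sum(x * y for x, y in zip(r1, r2))


def realized_corr(prices1, prices2, maturity):
    """Discrete realized correlation, with Var_n(S) taken as Cov_n(S, S) (assumption)."""
    var1 = realized_cov(prices1, prices1, maturity)
    var2 = realized_cov(prices2, prices2, maturity)
    return realized_cov(prices1, prices2, maturity) / math.sqrt(var1 * var2)


# Hypothetical sampled prices over T = 1 year.
asset1 = [100.0, 101.0, 99.5, 102.0, 103.0]
asset2 = [50.0, 50.2, 49.9, 50.8, 51.5]
print(realized_cov(asset1, asset2, 1.0), realized_corr(asset1, asset2, 1.0))
```

Note that this estimator uses uncentered returns, as in the formula above; a centered sample covariance would subtract the mean return first.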


MODELING AND PRICING 
OF VARIANCE, VOLATILITY, 
COVARIANCE, AND 
CORRELATION SWAPS WITH 
STOCHASTIC VOLATILITY 

In this section, we explicitly solve the Cox-Ingersoll-Ross equation for the stochastic volatility Heston model, using the change of time method, and present the formulas for pricing variance, volatility, covariance, and correlation swaps for this model.


Stochastic Volatility: Heston Model 

Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ be a probability space with filtration $\mathcal{F}_t$, $t \in [0, T]$. Assume that the underlying asset $S_t$ in the risk-neutral world and its variance follow the model (see Heston, 1993):

$$\begin{cases} \dfrac{dS_t}{S_t} = r_t\,dt + \sigma_t\,dw_t^1 \\ d\sigma_t^2 = k(\theta^2 - \sigma_t^2)\,dt + \gamma\sigma_t\,dw_t^2 \end{cases} \qquad (1)$$

where $r_t$ is the deterministic interest rate, $\sigma_0$ and $\theta$ are short and long volatility, $k > 0$ is the reversion speed, $\gamma > 0$ is the volatility (of volatility) parameter, and $w_t^1$ and $w_t^2$ are independent standard Wiener processes.

The Heston asset process has a variance $\sigma_t^2$ that follows a Cox-Ingersoll-Ross process, described by the second equation in (1). If the volatility $\sigma_t$ follows the Ornstein-Uhlenbeck process (see, for example, Øksendal, 1998), then Itô's lemma shows that the variance $\sigma_t^2$ follows the process described exactly by the second equation in (1). Note that if $2k\theta^2 > \gamma^2$, then $\sigma_t^2 > 0$ with P = 1 (see Heston, 1993).

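Solving the equation for the variance $\sigma_t^2$ in (1),

$$d\sigma_t^2 = k(\theta^2 - \sigma_t^2)\,dt + \gamma\sigma_t\,dw_t^2 \qquad (2)$$

explicitly by the change of time method gives a solution of the following form:

$$\sigma_t^2 = e^{-kt}\left(\sigma_0^2 - \theta^2 + \tilde{w}^2(\phi_t^{-1})\right) + \theta^2 \qquad (3)$$

where $\tilde{w}^2(t)$ is an $\mathcal{F}_t$-measurable one-dimensional Wiener process, and $\phi_t^{-1}$ is the inverse function of $\phi_t$:

$$\phi_t = \gamma^{-2}\int_0^t \left\{e^{k\phi_s}\left(\sigma_0^2 - \theta^2 + \tilde{w}^2(s)\right) + \theta^2 e^{2k\phi_s}\right\}^{-1} ds \qquad (4)$$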

This result follows from the substitution

$$v_t = e^{kt}(\sigma_t^2 - \theta^2) \qquad (5)$$

into equation (2) in place of $\sigma_t^2$.

Note that if $2k\theta^2 > \gamma^2$, then $\sigma_t^2 > 0$ with P = 1 (see, for example, Heston, 1993). From (5) it follows that

$$v_t e^{-kt} + \theta^2$$

is strictly positive too. Rewriting the integrand of the integral in (4), we obtain

$$\left[e^{k\phi_s}\left(\sigma_0^2 - \theta^2 + \tilde{w}^2(s)\right) + \theta^2 e^{2k\phi_s}\right]^{-1}$$

$$= \left[e^{2k\phi_s}\left(e^{-k\phi_s}\left(\sigma_0^2 - \theta^2 + \tilde{w}^2(s)\right) + \theta^2\right)\right]^{-1}$$

$$= \left[e^{k\phi_s}\sqrt{e^{-k\phi_s}\left(\sigma_0^2 - \theta^2 + \tilde{w}^2(s)\right) + \theta^2}\right]^{-2}$$

$$= \left[e^{k\phi_s}\sqrt{e^{-k\phi_s}v_{\phi_s} + \theta^2}\right]^{-2}$$

since $v_{\phi_s} = \sigma_0^2 - \theta^2 + \tilde{w}^2(s)$ by (3) and (5). In the above expressions, the quantity under the square root sign is positive, so the square root is well defined. Hence the last expression, and therefore the integrand of the integral in (4), is strictly positive. It follows that $\phi_t$ is a strictly increasing function and that the inverse function $\phi_t^{-1}$ in (3) exists.
Valuing of Variance and Volatility
Swaps

From previous results we get the following expressions for the price of a variance swap and a volatility swap.

The value (or price) $P_{var}$ of a variance swap is

$$P_{var} = e^{-rT}\left[\frac{1 - e^{-kT}}{kT}(\sigma_0^2 - \theta^2) + \theta^2 - K_{var}\right] \qquad (6)$$

and the value (or price) $P_{vol}$ of a volatility swap is approximately

$$P_{vol} \approx e^{-rT}\left\{\left(\frac{1 - e^{-kT}}{kT}(\sigma_0^2 - \theta^2) + \theta^2\right)^{1/2} - \frac{\dfrac{\gamma^2 e^{-2kT}}{2k^3T^2}\left[(2e^{2kT} - 4e^{kT}kT - 2)(\sigma_0^2 - \theta^2) + (2e^{2kT}kT - 3e^{2kT} + 4e^{kT} - 1)\theta^2\right]}{8\left(\dfrac{1 - e^{-kT}}{kT}(\sigma_0^2 - \theta^2) + \theta^2\right)^{3/2}} - K_{vol}\right\} \qquad (7)$$

The same expressions for E[V] and for Var[V] also may be found in Brockhaus and Long (2000).
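As a minimal sketch, formulas (6) and (7) can be coded directly. The function names and the parameter values below are hypothetical; `sigma0_sq` and `theta_sq` stand for $\sigma_0^2$ and $\theta^2$, and the prefactor of the variance term is assumed to be $\gamma^2 e^{-2kT}/(2k^3T^2)$.

```python
import math


def mean_realized_variance(sigma0_sq, theta_sq, k, T):
    """E{V}: expected realized variance over [0, T], as in formula (6)."""
    return (1.0 - math.exp(-k * T)) / (k * T) * (sigma0_sq - theta_sq) + theta_sq


def var_realized_variance(sigma0_sq, theta_sq, k, gamma, T):
    """Var{V}: the numerator of the convexity term in formula (7)."""
    e1, e2 = math.exp(k * T), math.exp(2.0 * k * T)
    prefactor = gamma ** 2 / (2.0 * k ** 3 * T ** 2 * e2)
    short_term = (2.0 * e2 - 4.0 * e1 * k * T - 2.0) * (sigma0_sq - theta_sq)
    long_term = (2.0 * e2 * k * T - 3.0 * e2 + 4.0 * e1 - 1.0) * theta_sq
    return prefactor * (short_term + long_term)


def variance_swap_value(sigma0_sq, theta_sq, k, T, r, k_var):
    """P_var = exp(-rT) * (E{V} - K_var)."""
    return math.exp(-r * T) * (mean_realized_variance(sigma0_sq, theta_sq, k, T) - k_var)


def volatility_swap_value(sigma0_sq, theta_sq, k, gamma, T, r, k_vol):
    """P_vol ~ exp(-rT) * (sqrt(E{V}) - Var{V} / (8 E{V}^(3/2)) - K_vol)."""
    mean_v = mean_realized_variance(sigma0_sq, theta_sq, k, T)
    var_v = var_realized_variance(sigma0_sq, theta_sq, k, gamma, T)
    expected_vol = math.sqrt(mean_v) - var_v / (8.0 * mean_v ** 1.5)
    return math.exp(-r * T) * (expected_vol - k_vol)


# Hypothetical inputs, not the S&P60 calibration.
fair_var_strike = mean_realized_variance(0.03, 0.04, 2.0, 1.0)
print(fair_var_strike)
print(volatility_swap_value(0.03, 0.04, 2.0, 0.3, 1.0, 0.02, 0.18))
```

Setting `k_var` equal to `mean_realized_variance(...)` gives a swap with zero initial value, which is how a fair delivery price would be read off formula (6).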


Valuing of Covariance and 
Correlation Swaps 

To value a covariance swap the following must be calculated

$$P = e^{-rT}(E\,Cov(S^1, S^2) - K_{cov}) \qquad (8)$$

To calculate $E\,Cov(S^1, S^2)$ we need to calculate $E\{\sigma_R^2(S^1S^2) - \sigma_R^2(S^1/S^2)\}$ for the two underlying assets $S^1$ and $S^2$.

Let $S_t^i$, i = 1, 2, be two strictly positive Itô processes given by the following model

$$\frac{dS_t^i}{S_t^i} = \mu_t^i\,dt + \sigma_t^i\,dw_t^i$$

$$d(\sigma_t^i)^2 = k^i\left((\theta^i)^2 - (\sigma_t^i)^2\right)dt + \gamma^i\sigma_t^i\,dw_t^j, \quad i = 1, 2, \quad j = 3, 4$$

where $\mu_t^i$, i = 1, 2, are deterministic functions, $k^i$, $\theta^i$, $\gamma^i$, i = 1, 2, are defined in a similar way as in (1), the standard Wiener processes $w_t^j$, j = 3, 4, are independent, $[w_t^1, w_t^2] = \rho_t\,dt$, $\rho_t$ is a deterministic function of time, $[\cdot, \cdot]$ means the quadratic covariance, and the standard Wiener processes $w_t^i$, i = 1, 2, and $w_t^j$, j = 3, 4, are independent.

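We note that

$$d\ln S_t^i = m_t^i\,dt + \sigma_t^i\,dw_t^i$$

where

$$m_t^i := \mu_t^i - \frac{(\sigma_t^i)^2}{2}$$

and

$$Cov_R(S_T^1, S_T^2) = \frac{1}{T}[\ln S_T^1, \ln S_T^2] = \frac{1}{T}\int_0^T \rho_t\sigma_t^1\sigma_t^2\,dt$$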

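Let us show that

$$[\ln S_T^1, \ln S_T^2] = \frac{1}{4}\left([\ln(S_T^1S_T^2)] - [\ln(S_T^1/S_T^2)]\right) \qquad (9)$$

First, note that

$$d\ln(S_t^1S_t^2) = (m_t^1 + m_t^2)\,dt + \sigma_t^+\,dw_t^+$$

and

$$d\ln(S_t^1/S_t^2) = (m_t^1 - m_t^2)\,dt + \sigma_t^-\,dw_t^-$$

where

$$(\sigma_t^\pm)^2 := (\sigma_t^1)^2 \pm 2\rho_t\sigma_t^1\sigma_t^2 + (\sigma_t^2)^2$$

and

$$dw_t^\pm := \frac{1}{\sigma_t^\pm}\left(\sigma_t^1\,dw_t^1 \pm \sigma_t^2\,dw_t^2\right)$$

The processes $w_t^\pm$ above are standard Wiener processes by the Lévy-Kunita-Watanabe theorem, and $\sigma_t^\pm$ are defined above.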

In this way, we obtain that

$$[\ln(S_t^1S_t^2)] = \int_0^t(\sigma_s^+)^2\,ds = \int_0^t\left((\sigma_s^1)^2 + 2\rho_s\sigma_s^1\sigma_s^2 + (\sigma_s^2)^2\right)ds \qquad (10)$$

and

$$[\ln(S_t^1/S_t^2)] = \int_0^t(\sigma_s^-)^2\,ds = \int_0^t\left((\sigma_s^1)^2 - 2\rho_s\sigma_s^1\sigma_s^2 + (\sigma_s^2)^2\right)ds \qquad (11)$$

Subtracting (11) from (10) gives formula (9) directly:

$$[\ln S_T^1, \ln S_T^2] = \frac{1}{4}\left([\ln(S_T^1S_T^2)] - [\ln(S_T^1/S_T^2)]\right) \qquad (12)$$

Thus, from (12) we obtain that

$$Cov_R(S^1, S^2) = \frac{1}{4}\left(\sigma_R^2(S^1S^2) - \sigma_R^2(S^1/S^2)\right)$$

Returning to the valuation of the covariance swap in (8) we have

$$P = E\{e^{-rT}(Cov(S^1, S^2) - K_{cov})\} = \frac{1}{4}e^{-rT}\left(E\sigma_R^2(S^1S^2) - E\sigma_R^2(S^1/S^2) - 4K_{cov}\right)$$

The problem has now reduced to the same problem as above, but instead of $\sigma_t^2$ we need to take $(\sigma_t^+)^2$ for $S^1S^2$ and $(\sigma_t^-)^2$ for $S^1/S^2$ (with $(\sigma_t^\pm)^2 = (\sigma_t^1)^2 \pm 2\rho_t\sigma_t^1\sigma_t^2 + (\sigma_t^2)^2$), and proceed with similar calculations as for the variance and volatility swaps.
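The reduction of a covariance to two variances can be checked numerically on sampled data. The sketch below is an illustration under stated assumptions: it uses a discrete estimator of the form $\frac{n}{(n-1)T}\sum_i R_i^2$ for realized variance, and the helper names and price samples are hypothetical.

```python
import math


def log_returns(prices):
    """Log-returns of a sampled price path."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_variance(prices, maturity):
    """Discrete realized variance, assumed form n / ((n - 1) T) * sum(R_i^2)."""
    r = log_returns(prices)
    n = len(r)
    return n / ((n - 1) * maturity) * sum(x * x for x in r)


def covariance_direct(prices1, prices2, maturity):
    """Realized covariance computed from the cross-products of log-returns."""
    r1, r2 = log_returns(prices1), log_returns(prices2)
    n = len(r1)
    return n / ((n - 1) * maturity) * sum(x * y for x, y in zip(r1, r2))


def covariance_polarized(prices1, prices2, maturity):
    """Realized covariance as 1/4 (variance of S1*S2 minus variance of S1/S2)."""
    product = [a * b for a, b in zip(prices1, prices2)]
    ratio = [a / b for a, b in zip(prices1, prices2)]
    return 0.25 * (realized_variance(product, maturity) - realized_variance(ratio, maturity))


# Hypothetical price samples over T = 0.5 years.
s1 = [100.0, 102.0, 101.0, 104.5, 103.0, 105.0]
s2 = [20.0, 20.3, 19.8, 20.1, 20.6, 20.4]
print(covariance_direct(s1, s2, 0.5), covariance_polarized(s1, s2, 0.5))
```

The two numbers agree up to floating-point error because the log-return of the product is the sum of the two log-returns and the log-return of the ratio is their difference.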


NUMERICAL EXAMPLE:
VOLATILITY SWAP FOR
S&P60 CANADA INDEX

In this section, we apply the analytical solutions provided above to price a swap on the volatility of the S&P60 Canada Index for five years (January 1997-February 2002) (see note 5).

Suppose that at the end of February 2002 we wanted to price the fixed leg of a volatility swap based on the volatility of the S&P60 Canada Index. The statistics on log returns for the S&P60 Canada Index for the five years covering January 1997-February 2002 are presented in Table 2.

Table 2 Statistics on Log Returns S&P60 Canada Index

Series:          LOG RETURNS S&P60 CANADA INDEX
Sample:          1 1300
Observations:    1300
Mean             0.000235
Median           0.000593
Maximum          0.051983
Minimum          -0.101108
Std. Dev.        0.013567
Skewness         -0.665741
Kurtosis         7.787327

From the statistical data for the S&P60 Canada Index log returns for the 5-year historical period (1,300 observations from January 1997 to February 2002) it may be seen that the data exhibit leptokurtosis. If we take a look at the S&P60 Canada Index log returns for the 5-year historical period, we observe volatility clustering in the return series. These facts indicate the presence of conditional heteroscedasticity. A GARCH(1,1) regression is applied to the series and the results are obtained as in Table 3. This table allows one to generate different input variables for the volatility swap model.

We use the following relationships: $\theta = \frac{V}{dt}$, $k = \frac{1 - \alpha - \beta}{dt}$, $\gamma = \alpha\sqrt{\frac{\xi - 1}{dt}}$ to calculate the following parameters from the discrete GARCH(1,1) estimates:

• ARCH(1,1) coefficient $\alpha$ = 0.060445

• GARCH(1,1) coefficient $\beta$ = 0.927264

• The Pearson kurtosis (fourth moment of the drift-adjusted stock return) $\xi$ = 7.787327

• Long volatility $\theta$ = 0.05289724; k = 3.09733

• $\gamma$ = 2.499827486

• Short volatility $\sigma_0$ = 0.01

Parameter V may be found from the expression $V = \frac{C}{1 - \alpha - \beta}$, where $C = 2.58 \times 10^{-6}$ is defined in Table 3. Thus, V = 0.00020991; dt = 1/252 = 0.003968254.
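The parameter values listed above can be reproduced from the GARCH(1,1) estimates. The sketch below assumes the relationships $V = C/(1-\alpha-\beta)$, $\theta = V/dt$, $k = (1-\alpha-\beta)/dt$, and $\gamma = \alpha\sqrt{(\xi-1)/dt}$; these forms are inferred from the listed numbers, and the function name is hypothetical.

```python
import math


def garch_to_diffusion(c, alpha, beta, kurtosis, dt):
    """Map discrete GARCH(1,1) estimates to the continuous-time model inputs.

    Assumed relationships: V = C / (1 - alpha - beta), theta = V / dt,
    k = (1 - alpha - beta) / dt, gamma = alpha * sqrt((kurtosis - 1) / dt).
    """
    gap = 1.0 - alpha - beta  # distance of the persistence alpha + beta from 1
    long_run_variance = c / gap
    return {
        "V": long_run_variance,
        "theta": long_run_variance / dt,
        "k": gap / dt,
        "gamma": alpha * math.sqrt((kurtosis - 1.0) / dt),
    }


# Estimates reported in Tables 2 and 3; dt is one trading day out of 252.
params = garch_to_diffusion(c=2.58e-6, alpha=0.060445, beta=0.927264,
                            kurtosis=7.787327, dt=1.0 / 252.0)
for name, value in params.items():
    print(name, value)
```

The mapping is only meaningful when $\alpha + \beta < 1$, so that the long-run variance V is finite and the reversion speed k is positive.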

Applying the analytical solutions (6) and (7) for a swap maturity T of 0.91 years, we find the following values:

$$E\{V\} = \frac{1 - e^{-kT}}{kT}(\sigma_0^2 - \theta^2) + \theta^2 = 0.3364100835$$
Table 3 Estimation of the GARCH(1,1) Process

Dependent Variable: Log returns of S&P60 Canada Index Prices
Method: ML-ARCH
Included Observations: 1,300
Convergence achieved after 28 observations

                      Coefficient   Std. error   z-statistic   Prob.
C                     0.000617      0.000338     1.824378      0.0681

Variance Equation
C                     2.58E-06      3.91E-07     6.597337      0
ARCH(1)               0.060445      0.007336     8.238968      0
GARCH(1)              0.927264      0.006554     141.4812      0

R-squared             -0.000791     Mean dependent var      0.000235
Adjusted R-squared    -0.003108     S.D. dependent var      0.013567
S.E. of regression    0.013588      Akaike info criterion   -5.928474
Sum squared resid     0.239283      Schwartz criterion      -5.912566
Log likelihood        3857.508      Durbin-Watson stat      1.886028


and

$$Var(V) = \frac{\gamma^2 e^{-2kT}}{2k^3T^2}\left[(2e^{2kT} - 4e^{kT}kT - 2)(\sigma_0^2 - \theta^2) + (2e^{2kT}kT - 3e^{2kT} + 4e^{kT} - 1)\theta^2\right] = 0.0005516049969$$

The convexity adjustment $\frac{Var\{V\}}{8E\{V\}^{3/2}}$ is equal to 0.0003533740855.

If the nonadjusted strike is equal to 18.7751%, then the adjusted strike is equal to

18.7751% - 0.03533740855% = 18.73976259%

This is the fixed leg of the volatility swap for a maturity T = 0.91.
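The arithmetic of the last step can be verified from the two reported moments alone. The short check below takes E{V}, Var(V), and the nonadjusted strike as given in the text and recomputes only the convexity adjustment and the adjusted strike; the variable names are illustrative.

```python
# Reported values from the example above (taken as given, not recomputed).
mean_v = 0.3364100835
var_v = 0.0005516049969
nonadjusted_strike_pct = 18.7751

# Convexity adjustment Var{V} / (8 E{V}^(3/2)).
convexity_adjustment = var_v / (8.0 * mean_v ** 1.5)

# The adjustment is subtracted from the strike after conversion to percent.
adjusted_strike_pct = nonadjusted_strike_pct - 100.0 * convexity_adjustment

print(convexity_adjustment)
print(adjusted_strike_pct)
```

The adjustment amounts to about 3.5 basis points of volatility at this maturity, which is small relative to the strike itself.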

Repeating this approach for a series of maturities up to 10 years, we obtain the result shown in Figure 2 for the S&P60 Canada Index Volatility Swap. Figure 1 illustrates the nonadjusted and adjusted volatility for the same series of maturities (see formula (7)).

Figure 1 Convexity Adjustment (S&P60 Canada Index)

Figure 2 S&P60 Canada Index Volatility Swap

KEY POINTS 

• Variance, volatility, covariance, and correlation swaps are useful for volatility hedging and speculation.

• Volatility swaps are forward contracts on future realized stock volatility.

• Variance swaps are similar contracts on variance, the square of the future volatility.

• Covariance and correlation swaps are covariance and correlation forward contracts, respectively, of the underlying two assets.

• Using change of time one can model and price variance, volatility, covariance, and correlation swaps for the stochastic volatility Heston model.


NOTES 

1. See Demeterfi, Derman, Kamal, and Zou 
(1999). 

2. See also Javaheri et al. (2002, p. 16). 

3. See Cox, Ingersoll, and Ross (1985). 

4. See Heston (1993). 

5. These data were supplied by Raymond Théoret (Université du Québec à Montréal, Montréal, Québec, Canada) and Pierre Rostan (Analyst at the R&D Department of Bourse de Montréal and Université du Québec à Montréal, Montréal, Québec, Canada). They calibrated the GARCH parameters from five years of daily historic S&P60 Canada Index data from January 1997 to February 2002. See Théoret, Zabré, and Rostan (2002).
Abstract: Derivatives are financial contingent claims designed for the pricing, transfer, and management of risk embedded in underlying securities in the fixed income, equity, and foreign exchange markets. Their rapid growth spurred their introduction to the energy commodity and shipping markets where the underlying assets are real commodities: crude oil, refined products, natural gas, electricity, and shipping tonnage. Risk-neutral pricing and stochastic models developed for financial derivatives have been extended to energy derivatives for the modeling of correlated commodity and shipping forward curves and for the pricing of their contingent claims. This has enabled the valuation and risk management of a wide range of assets and derivatives in the energy and shipping markets. They include storage for natural gas, floating storage of crude oil, products and liquefied natural gas in tankers, refineries, power plants and utility scale wind farms, shipping structured securities, cargo vessels, and shipping derivatives portfolios.


Investments in energy and shipping assets are exposed to interest rate, commodity price, and freight rate risks. The management of these risks has led to the introduction and widespread use of derivatives, which have experienced explosive growth over the past several decades. In the fixed income market, interest rate futures and futures options emerged in the 1980s in response to the need to hedge interest rate swap risk. This led to the development of financial models for the arbitrage-free evolution of the term structure of interest rates and the pricing of a wide range of fixed income derivatives, laying the foundation for the development of analogous models for the arbitrage-free evolution of the forward curves of physical commodities including crude oil, its refined products, natural gas, and recently shipping freight rates.

Commodity futures settle physically against the price of a spot commodity that must be delivered at the contract expiration or in cash against a spot commodity index. The latter is the case in shipping where forward freight agreements (FFAs) and freight rate futures settle against shipping indexes composed of a basket of freight rates. The first generation of commodity futures models was based on the development of stochastic models for the spot index and the use of the principles of risk-neutral pricing for the valuation of derivatives written on the spot. Recent models are based on the insight that, in the absence of arbitrage, futures prices with daily credits and debits into a margin account are martingales. This has enabled the modeling of the evolution of futures prices of any tenor as lognormal diffusions with zero drift in a Gaussian setting. The primary unknown in this model of futures prices is the volatility term structure, which may be estimated from market prices of liquid futures and futures options. The arbitrage-free price process for the spot commodity or underlying index follows from this martingale representation of the entire commodity forward curve in the limit of small tenors.
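The martingale property of a driftless lognormal futures price can be illustrated with a small simulation. This is a sketch under a simplifying assumption: a single constant volatility is used in place of a full volatility term structure, and the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_terminal_futures(f0, sigma, horizon, n_paths, seed=0):
    """Sample F_T = F_0 * exp(-0.5 sigma^2 T + sigma sqrt(T) Z), Z standard normal.

    The -0.5 sigma^2 T term offsets the convexity of the exponential, so the
    price itself (not its logarithm) has zero drift under the pricing measure.
    """
    rng = random.Random(seed)
    drift = -0.5 * sigma ** 2 * horizon
    scale = sigma * math.sqrt(horizon)
    return [f0 * math.exp(drift + scale * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


# Hypothetical futures price of 80 with 35% volatility over half a year.
prices = simulate_terminal_futures(80.0, 0.35, 0.5, 100_000)
mean_price = sum(prices) / len(prices)
print(mean_price)
```

The sample mean of the terminal price stays close to today's futures price, which is the martingale condition stated above; only the volatility input shapes the distribution around it.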

The martingale representation of the commodity forward curve lends itself to a parsimonious modeling using the powerful statistical techniques of principal components analysis in the case of a single futures curve and of canonical correlation analysis in the case of multiple correlated forward curves. In both cases a small number of statistical factors may be derived, which are shown to follow mean reverting log-Ornstein-Uhlenbeck diffusions. This financial modeling and statistical inference framework of cross-correlated energy commodity and shipping forward curves allows the pricing of a wide range of vanilla, spread, and exotic derivatives written on futures contracts. Moreover, the model of the forward curve in terms of a small number of statistical factors enables the explicit valuation and hedging of a wide range of energy and shipping assets with cash flow exposures that may be replicated by the prices of traded futures contracts.
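How a single statistical factor can dominate a forward curve may be sketched with synthetic data. The example below is an illustration only: the four-tenor curve, the factor loadings, and the noise levels are hypothetical, and the leading principal component is found by plain power iteration rather than by a library eigenvalue routine.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(m)] for i in range(m)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and unit eigenvector of `cov` by power iteration."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    return eigenvalue, v


# Synthetic daily log-changes of four futures tenors: one common shock whose
# impact decays with tenor, plus small idiosyncratic noise (all hypothetical).
rng = random.Random(1)
loadings = [1.0, 0.8, 0.6, 0.5]
rows = []
for _ in range(2000):
    shock = rng.gauss(0.0, 0.02)
    rows.append([load * shock + rng.gauss(0.0, 0.003) for load in loadings])

cov = covariance_matrix(rows)
eigenvalue, vector = leading_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(explained, vector)
```

With real futures data the rows would be daily log-changes of quoted prices at fixed tenors; further components could be extracted by deflating the covariance matrix and repeating the iteration.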

This entry reviews the fundamental developments that led to the introduction of financial and commodity derivatives, their stochastic modeling, and risk-neutral pricing. Pricing in Gaussian and non-Gaussian settings is addressed and shown in most cases to lead to closed-form results even when the underlying is represented by an advanced stochastic model. The martingale modeling of forward curves and their estimation by a principal components analysis is discussed for the crude oil market, its refined products, natural gas, and the shipping market.

The valuation of real assets and financial claims of energy and shipping entities is discussed. The parsimonious form of the arbitrage-free factor models of the pertinent forward curves enables the pricing of these assets and securities by risk-neutral pricing. Their risk management is also addressed, drawing upon the explicit form of the underlying factor model and the techniques of stochastic optimal control, which have found widespread use for the management of portfolios of financial securities.

ENERGY COMMODITY 
PRICE MODELS 

The past several decades have witnessed the emergence and rapid development of the fields of financial engineering and derivatives. Growing out of Paul Samuelson's foundational insights on the relationship between informationally efficient markets and the random walk and his introduction of the lognormal diffusion model of security prices, a wide range of stochastic models of security prices and arbitrage-free valuation methods were developed for the pricing of derivatives written on financial securities, real assets, and other variables (see Samuelson, 1965). The use of these models and pricing methods in the fixed income, equity, foreign exchange, and credit markets is growing, as is the complexity of the mathematical, econometric, and filtering methods necessary for their implementation. More recently, these methods have been adapted to the energy and shipping sectors in order to control the high volatility of energy prices and freight rates and spur new investment.

Spot Price Models 

Energy commodity prices are characterized by idiosyncrasies not encountered in the financial markets. The volatility of the price of oil, natural gas, and especially electricity is much larger than that of currencies, interest rates, and equities. Energy prices often exhibit mean reversion, seasonality, and sharp and asymmetric spikes, which require the development of advanced price models and derivative valuation methods that extend the Black-Scholes-Merton stock option pricing formula. Moreover, a complex interaction often exists between the attributes of the spot physical commodity and its forward contracts and other derivatives that is not present for financial securities and their derivatives, which settle electronically and do not require the delivery of a physical asset. This requires the use of the extended risk-neutral valuation method of derivatives written on real assets and other variables that are not tradable (Ross, 1976).

Standard reduced form stochastic models for the spot price of crude oil and natural gas are diffusions that account for mean reversion and seasonality and depend on hidden economic factors. They include a stochastic convenience yield (the implied dividend received by the owner of the commodity held in storage) and a long-term stochastic equilibrium price to which the spot price mean reverts. The two-factor spot price models of Gibson and Schwartz (1990), Schwartz (1997), and Schwartz and Smith (2000) model the spot price and its factors as diffusions and permit the explicit valuation of futures and forward contracts and their options written on the spot commodity using the extended risk-neutral valuation of derivatives written on real assets (Hull, 2003). More general spot price models that may include stochastic volatility and jumps are discussed in Clewlow and Strickland (2000) and London (2007). In the study of Cortazar and Naranjo (2006) the entire oil futures curve and its volatility term structure are shown to be very well modeled by a four-factor spot price Gaussian model, which was estimated by Kalman filtering.

Stochastic models for the evolution of electricity prices must account for sharp and asymmetric spikes, strong mean reversion, jumps, and a dependence on structural factors affecting the electricity market. Reduced form stochastic models of electricity prices are usually jump-diffusions and Lévy processes. An example is the jump-diffusion model of Kou (2002), which permits the independent parametric adjustment of the tail thicknesses of its probability distribution and allows the explicit pricing of electricity derivatives. Other models are discussed in Eydeland and Wolyniec (2003), London (2007), and Benth, Benth, and Koekebakker (2008). Analogous models apply to the modeling of the spot price process of shipping freight rates.

Forward Curve Models 

Crude oil and natural gas have liquid futures contracts trading on the New York Mercantile Exchange (NYMEX) and the Intercontinental Exchange (ICE) with tenors of several years. For these commodities arbitrage-free forward curve models have been developed by Miltersen and Schwartz (1998), which accept as input the market prices of liquid futures and lead to the pricing of a number of other derivatives. The arbitrage-free evolution of the spot price follows from futures contracts of small tenors.

The modeling of the oil and natural gas futures curve is based on the Heath-Jarrow-Morton (HJM) framework developed for the arbitrage-free modeling of the term structure of interest rates. A principal task of the HJM framework is the parameterization of the volatility and correlation structure of the futures curve by a small number of independent factors using a principal components analysis. This was carried out by Sclavounos and Ellefsen (2009), where it was shown that three principal components capture most of the fluctuations of the forward curve. In the same paper the arbitrage-free evolution of the spot price was derived as implied in equilibrium by the forward curve and was shown to be driven by three independent factors that follow mean reverting logarithmic Ornstein-Uhlenbeck (log-OU) processes with stochastic drifts. Calls, puts, swaps, caps, and their options written on futures contracts may then be valued explicitly as in the interest rate markets for use in energy risk management applications (Hull, 2003; Musiela and Rutkowski, 2005).

Energy Derivatives 

In addition to the standard derivatives discussed above, more complex derivatives have been introduced in the energy markets reflecting the economics of energy assets. In particular, power plants are exposed to the spot/futures price difference of two energy commodities, for example, natural gas/electricity or coal/electricity; refineries are exposed to the price differentials of two fuels, such as crude oil/gasoline or crude oil/jet fuel; and oil and natural gas pipelines and electricity transmission lines are exposed to the price differentials of the same spot commodity at two different geographical locations.

A partial list of exotic derivatives used for the valuation, hedging, and risk management of energy assets includes options on the spread between two futures contracts with different expirations written on the same commodity, options on the price difference of two futures contracts with the same expiration written on two separate commodities, options to exchange two spot commodities or their futures, average-price and average-strike Asian options, barrier options (which are exercised when the commodity price crosses a threshold), and American swing options for the delivery of an uncertain amount of the commodity. A discussion of these and other exotic energy derivatives is presented in Clewlow and Strickland (2000), Eydeland and Wolyniec (2003), and Geman (2005).

Exotic energy derivatives are complex to price and hedge under advanced commodity price models. Furthermore, spread derivatives depend not only on the volatility but also on the correlation between various spot/futures contracts, which may be challenging to model and calibrate to market prices. Consequently, the development of accurate stochastic price models and pricing methods for exotic derivatives and spread options may be particularly helpful for the valuation and hedging of energy assets. Accurate analytical approximations of spread option prices and their hedge ratios are derived by Li, Deng, and Zhou (2008) for two assets that follow correlated log-OU diffusions. Extensions to multiasset spread option pricing and hedging are presented in Li, Zhou, and Deng (2010).

Shipping Derivatives 

The success and rapid growth of derivatives in the energy commodity markets has spurred their introduction in the shipping markets. Shipping derivatives, namely forward freight agreements (FFAs) and freight futures, were introduced in 1985 and are widely used by the dry bulk and tanker shipping markets as discussed by Alizadeh and Nomikos (2009). Freight rate swaps were also recently introduced in the container ship markets. The growth of shipping derivatives is also motivated by the correlation of the supply and demand for shipping ton-miles with that of the bulk commodities transported by cargo vessels: crude oil, refined products, iron ore, and coal. An example is the recent introduction of over-the-counter iron ore swaps following the initiation of quarterly pricing of that bulk commodity. Therefore the need arises for the robust statistical modeling of the correlated forward curves of shipping and commodity markets and the pricing of shipping derivatives for use in risk management.

VALUATION AND HEDGING 
OF DERIVATIVES 

The pricing of derivatives written on a financial security, a spot commodity, or another variable (the underlying) may be carried out by using the fundamental principles of risk-neutral valuation. When the underlying is a nontradable (for example, temperature), an associated market price of risk process enters in the derivative price, which must be estimated from the prices of traded instruments. Otherwise, the fundamental economic insight of risk-neutral pricing and the associated mathematical techniques apply over a wide range of assets and stochastic models used for the modeling of the underlying process.

Vanilla Derivatives for 
Jump-Diffusions 

A standard derivative pricing method for the wide class of jump-diffusion processes is based on the derivation of a risk-neutral probability measure under which European derivative prices may be expressed as conditional expectations of a payoff at a specified horizon (Duffie, 2001; Hull, 2003; Shreve, 2004; Musiela and Rutkowski, 2005). Derivative prices expressed as conditional expectations may be evaluated explicitly in the form of Fourier integrals of the complex characteristic function of the jump-diffusion by using the methods developed by Heston (1993), Carr and Madan (1998), Duffie, Pan, and Singleton (2000), and Lewis (2005). The use of this derivative pricing method in practice for the modeling of the equity-implied volatility surface and the calibration of a wide range of jump-diffusion models is discussed in Gatheral (2006).

Derivative prices expressed in the form of 
Fourier integrals allow the explicit evaluation 
of the derivative sensitivities known as the 
Greeks. They permit the analytical derivation of 
the stochastic process followed by the deriva¬ 
tive price itself by using the Ito-Doeblin for¬ 
mula and often allow the explicit pricing of 
European derivatives with more general pay¬ 
offs. The evaluation of Fourier integrals may be 
carried out efficiently by complex contour in¬ 
tegration, numerical integration, or fast Fourier 
transform techniques. 
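
As an illustration of the Fourier approach, the following minimal sketch prices a European call from the characteristic function of the log-price by direct numerical integration. It assumes a Merton-type jump-diffusion with lognormal jumps; the function names, the parameter names, and the midpoint integration rule are choices made for this sketch and are not taken from the references cited above.

```python
import cmath
import math


def merton_cf(u, s0, r, sigma, lam, mu_j, delta, t):
    """Risk-neutral characteristic function E[exp(i*u*ln S_T)] for a
    jump-diffusion with Poisson intensity lam and N(mu_j, delta^2) log-jumps."""
    kappa = math.exp(mu_j + 0.5 * delta ** 2) - 1.0   # mean relative jump size
    drift = math.log(s0) + (r - 0.5 * sigma ** 2 - lam * kappa) * t
    jumps = lam * t * (cmath.exp(1j * u * mu_j - 0.5 * delta ** 2 * u ** 2) - 1.0)
    return cmath.exp(1j * u * drift - 0.5 * sigma ** 2 * u ** 2 * t + jumps)


def fourier_call(strike, r, t, cf, u_max=200.0, n=20000):
    """European call as exp(-r*t) * (F*P1 - K*P2), where P1 and P2 are
    exercise probabilities recovered by Fourier inversion of cf."""
    du = u_max / n
    log_k = math.log(strike)
    forward = cf(-1j).real                  # E[S_T] equals cf(-i)
    p1 = p2 = 0.0
    for j in range(n):
        u = (j + 0.5) * du                  # midpoint rule avoids u = 0
        phase = cmath.exp(-1j * u * log_k) / (1j * u)
        p1 += (phase * cf(u - 1j) / forward).real
        p2 += (phase * cf(u)).real
    p1 = 0.5 + p1 * du / math.pi
    p2 = 0.5 + p2 * du / math.pi
    return math.exp(-r * t) * (forward * p1 - strike * p2)


def black_scholes_call(s0, strike, r, sigma, t):
    """Closed-form benchmark for the no-jump case."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-r * t) * cdf(d2)


# Usage: with the jump intensity set to zero the result should agree with
# the Black-Scholes value; adding jumps raises the at-the-money price.
no_jumps = lambda u: merton_cf(u, 100.0, 0.05, 0.2, 0.0, 0.0, 0.0, 1.0)
with_jumps = lambda u: merton_cf(u, 100.0, 0.05, 0.2, 0.5, -0.1, 0.15, 1.0)
price_diffusion = fourier_call(100.0, 0.05, 1.0, no_jumps)
price_jumps = fourier_call(100.0, 0.05, 1.0, with_jumps)
```

The truncation point `u_max` and the number of nodes `n` are tuned here for maturities around one year; short maturities or low volatilities make the integrand decay more slowly and would need a larger range.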

The valuation of American options for
jump-diffusions and the optimal stopping problems
that arise when early exercise is permitted are
discussed in Oksendal and Sulem (2005). When the
use of analytical techniques is not possible for 
the evaluation of American options and the de¬ 
termination of the early exercise boundary, the 
approximate method of Longstaff and Schwartz 
(2001), the quasi-analytical method described 
in Albanese and Campolieti (2006), and Monte 
Carlo simulation methods described in
Glasserman (2004) may be used.

Exotic Derivatives for 
Jump-Diffusions 

The valuation of a number of exotic derivatives 
is considerably more complex than their vanilla 
counterparts because their price depends on the 
path of the underlying process. Typical exam¬ 
ples are barrier and Asian options. Therefore, 
the price of exotic derivatives is more sensitive
to the structure of the underlying stochastic
process than is the price of vanilla calls and 
puts. Consequently, the choice of the under¬ 
lying process and the subsequent pricing and 
hedging of exotic derivatives may be a task of 
considerable complexity, a topic discussed for 
equities by Gatheral (2006). 

For the geometric Brownian motion with con¬ 
stant drift and volatility, explicit prices of a 
number of exotic derivatives are derived in 
Shreve (2004). When the underlying process 
follows a jump-diffusion, the pricing of ex¬ 
otic derivatives by Fourier methods leads to 
Wiener-Hopf problems in the complex plane, 
the factorization of which is often possible ana¬ 
lytically. This is the case for the jump-diffusion 
model of Kou (2002), which leads to the explicit 
valuation of barrier options. These analytical re¬ 
sults are developed in Cont and Tankov (2004), 
where the class of Lévy stochastic processes is
also studied. 

The extension of these Fourier methods to 
the valuation of options on spread contracts 
and other complex energy derivatives is dis¬ 
cussed in London (2007). In the same reference 
the derivation of the characteristic functions of 
a number of jump-diffusion models of energy
prices is presented along with the valuation of 
weather derivatives. 

Statistical Inference of Asset 
Price Models 

Asset price models usually contain a number 
of parameters that need to be estimated upon 
calibration of the model against market prices. 
This may be carried out by using the economet¬ 
ric techniques presented in Campbell, Lo, and 
MacKinlay (1997), Greene (2000), and Singleton 
(2006). 

Stochastic models of commodity prices of¬ 
ten contain hidden factors—stochastic trends, 
volatilities, and the convenience yield—which 
are usually modeled as diffusions. The estima¬ 
tion of the models may be carried out by cast¬ 
ing the time series obtained upon discretization 
in state space form by using the methods pre¬ 
sented by Durbin and Koopman (2001). The si¬ 
multaneous inference of the model parameters 
and hidden factors may then be carried out by 
using dual Kalman filters and the
expectation-maximization algorithm presented in Haykin
(2001). These statistical inference techniques 
may also be used for the estimation of non¬ 
linear structural form models of power prices 
and shipping freight rates, which are known 
to depend on nonlinearities in the supply and 
demand schedules of the underlying markets. 
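
The state space idea can be sketched with a scalar example. An Ornstein-Uhlenbeck factor sampled at a fixed time step is an AR(1) process, so a hidden mean-reverting factor observed with noise can be tracked by a one-dimensional Kalman filter. The sketch below assumes the parameters are known; the function names and the simulated data are hypothetical, and the parameter estimation step (for example by expectation-maximization) is not shown.

```python
import math
import random


def kalman_filter_ar1(observations, phi, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the state space model
         x_t = phi * x_{t-1} + w_t,   Var(w_t) = q   (hidden factor)
         y_t = x_t + v_t,             Var(v_t) = r   (noisy observation)
    Returns the filtered estimates of the hidden factor."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction step: propagate the mean and variance one period ahead.
        x_pred = phi * x
        p_pred = phi * phi * p + q
        # Update step: blend the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


def simulate_ar1(n, phi, q, r, seed=0):
    """Simulate hidden states and noisy observations for testing."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        states.append(x)
        obs.append(x + rng.gauss(0.0, math.sqrt(r)))
    return states, obs


# Usage: the filtered estimate should track the hidden factor more closely
# than the raw observations do.
states, obs = simulate_ar1(2000, phi=0.95, q=0.01, r=0.25, seed=1)
filtered = kalman_filter_ar1(obs, phi=0.95, q=0.01, r=0.25)
mse_filtered = sum((e - s) ** 2 for e, s in zip(filtered, states)) / len(states)
mse_raw = sum((o - s) ** 2 for o, s in zip(obs, states)) / len(states)
```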

Stochastic Optimal Control Methods 

The availability of analytical models governing 
the evolution of spot commodity prices and 
their derivatives allows the formulation and 
solution of a wide range of valuation and 
hedging problems involving energy assets 
and their derivatives. The resulting stochastic 
dynamic programming problems are often 
possible to treat analytically by using the 
stochastic optimal control methods presented 
in Yong and Zhou (1999) for diffusions with
time-dependent deterministic coefficients.
These results follow from the solution of the
Hamilton-Jacobi-Bellman (HJB) partial differential
equation or the Pontryagin stochastic
maximum principle and its connection to
backward stochastic differential equations.
Extensions of these stochastic optimal control 
methods for underlying processes that follow 
diffusions with stochastic coefficients are 
discussed in Lewis (2005). Stochastic control 
methods for jump-diffusions and the treatment 
of the associated integro-differential equations 
are discussed in Oksendal and Sulem (2005). 
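
For orientation, the HJB equation can be written out for the simplest case of a scalar controlled diffusion without jumps or discounting. The symbols below (state X, control u, drift mu, volatility sigma, running reward f, terminal reward g, value function V) are notation assumed for this sketch and do not appear in the text above.

```latex
% Controlled state dynamics
dX_s = \mu(s, X_s, u_s)\,ds + \sigma(s, X_s, u_s)\,dW_s, \qquad X_t = x

% Value function
V(t,x) = \sup_{u}\; \mathbb{E}\!\left[ \int_t^T f(s, X_s, u_s)\,ds + g(X_T) \,\middle|\, X_t = x \right]

% Hamilton-Jacobi-Bellman equation with terminal condition
\frac{\partial V}{\partial t}
  + \sup_{u} \left\{ \mu(t,x,u)\,\frac{\partial V}{\partial x}
  + \tfrac{1}{2}\,\sigma^2(t,x,u)\,\frac{\partial^2 V}{\partial x^2}
  + f(t,x,u) \right\} = 0,
\qquad V(T,x) = g(x)
```

When the underlying follows a jump-diffusion, an integral term accounting for the jumps is added inside the braces, which gives the integro-differential equations mentioned above.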


APPLICATIONS 

The stochastic price models, derivative val¬ 
uation methods, and stochastic optimal con¬ 
trol algorithms presented above have found 
widespread use in the securities markets. A 
number of applications drawn from the energy 
and shipping sectors are discussed below. 

Valuation of Natural Gas and 
Oil Storage 

Storage facilities for natural gas and oil are 
assets that enable the transfer of power gen¬ 
eration capacity between two time periods 
in response to supply and demand fluctua¬ 
tions. Such fluctuations are affected by the dif¬ 
ferent seasonal variations of the natural gas 
and electricity prices, the former usually being 
higher and more volatile during the winter and 
the latter often being a lot higher during the 
summer. 

The availability of inexpensive gas storage fa¬ 
cilities and the need to invest in new capacity 
allows the low-cost shifting of cheap summer 
production and storage of gas into the win¬ 
ter season. Moreover, the availability of gas 
storage facilities allows the quick delivery of 
natural gas when demand peaks, circumvent¬ 
ing the need for expensive new production. 
These economic drivers call for the valuation
and optimal operation of storage facilities for 
natural gas and other fuels, in the face of 
stochastic gas prices, which are assumed to be 
unaffected by the availability of storage. 

The storage valuation problem may be cast 
in a stochastic dynamic programming frame¬ 
work that relies on the analytical modeling of 
the commodity spot prices, futures curve, and 
their derivatives as outlined above. In its gen¬ 
erality, this valuation problem reduces to the 
determination of optimal storage in/out-flows 
given the commodity seasonal price dynam¬ 
ics. The analytical framework for this valuation 
problem is presented in Eydeland and Wolyniec 
(2003) and discussed below in the context of the 
valuation of crude oil floating storage using a prin¬ 
cipal components factor model for the forward 
curve. 
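
A stylized version of the storage problem can be sketched by backward induction on a binomial price lattice. This is a toy illustration, not the framework of the references cited above: it assumes a lattice centered on a given seasonal forward curve, one unit injected or withdrawn per step, no injection or withdrawal costs, worthless leftover inventory, and prices unaffected by the storage operator. All names are hypothetical.

```python
import math


def storage_value(forward, sigma, capacity, discount=1.0):
    """Value of an initially empty storage facility.

    forward  : forward prices, one per decision date (seasonal shape)
    sigma    : log-price volatility per step of the binomial lattice
    capacity : maximum inventory in units
    discount : one-step discount factor
    """
    n = len(forward)
    # value[j][inv] after the last decision date: leftover inventory is worthless.
    value = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for t in range(n - 1, -1, -1):
        new_value = []
        for j in range(t + 1):                    # j counts up-moves so far
            # Lattice price; the cosh factor makes its mean equal forward[t].
            price = forward[t] * math.exp(sigma * (2 * j - t)) / math.cosh(sigma) ** t
            row = []
            for inv in range(capacity + 1):
                best = -math.inf
                for action in (-1, 0, 1):         # withdraw, hold, inject
                    nxt = inv + action
                    if 0 <= nxt <= capacity:
                        cont = 0.5 * (value[j][nxt] + value[j + 1][nxt])
                        best = max(best, -action * price + discount * cont)
                row.append(best)
            new_value.append(row)
        value = new_value
    return value[0][0]


# Usage: with zero volatility the value is the intrinsic calendar spread
# (inject at 8, withdraw at 12); volatility adds optionality on top of it.
intrinsic = storage_value([10.0, 8.0, 12.0], sigma=0.0, capacity=1)
with_vol = storage_value([10.0, 8.0, 12.0], sigma=0.3, capacity=1)
```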


Valuation of Flexible Hydrocarbon 
Reservoirs 

The optimal dynamic management of proven 
but undeveloped hydrocarbon reservoirs and 
flexible oil fields leads to a sequence of deci¬ 
sions analogous to those described above for 
above-ground storage facilities. When signifi¬ 
cant irreversible investments with option-like 
value are necessary for the development of flex¬ 
ible hydrocarbon fields, the extended valuation 
framework of real options is needed. Its de¬ 
velopment is presented in Dixit and Pindyck 
(1994), and a number of applications are dis¬ 
cussed in Brennan and Trigeorgis (2000) and 
Copeland and Antikarov (2001). Given an HJM 
model for the oil and natural gas futures curve 
and its derivatives, the operation of flexible hy¬ 
drocarbon fields may be reduced to a stochastic 
dynamic programming problem leading to the 
determination of optimal investment and hy¬ 
drocarbon extraction flows. A number of real 
projects where these valuation methods are ap¬ 
plicable are presented in Ronn (2002). 


Hedging of Fuel Costs 

The risk management of fuel costs in the
transportation and energy sectors entails the
hedging of commitments to purchase or deliver
energy commodities (crude oil, natural gas,
aviation jet fuel, gasoline, heating oil, and
shipping bunker fuels) by various entities such
as refineries, utilities, airlines, and shipping
companies. An objective of such hedging programs
is the minimization of the variance of the
commodity price exposures over a given horizon.
Variance-minimizing quadratic hedges of complex
derivative exposures using simpler securities
are common in the financial markets and may be
reduced to the solution of a stochastic dynamic
programming problem (Yong and Zhou, 1999;
Jouini, Cvitanic, and Musiela, 2001).
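
In the simplest static, one-period case the variance-minimizing hedge reduces to a regression coefficient: the ratio of the covariance between the exposure and the futures price changes to the variance of the futures price changes. The sketch below illustrates this special case only; the function names and the sample data are hypothetical.

```python
def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def min_variance_hedge_ratio(exposure_changes, futures_changes):
    """Hedge ratio h minimizing Var(exposure - h * futures),
    namely h = Cov(exposure, futures) / Var(futures)."""
    n = len(exposure_changes)
    mean_e = sum(exposure_changes) / n
    mean_f = sum(futures_changes) / n
    cov = sum((e - mean_e) * (f - mean_f)
              for e, f in zip(exposure_changes, futures_changes)) / n
    return cov / variance(futures_changes)


# Usage: hypothetical jet fuel price changes hedged with crude oil futures,
# with a basis component the futures cannot remove.
futures = [1.0, -2.0, 0.5, 3.0, -1.5]
basis = [0.1, -0.1, 0.2, -0.2, 0.0]
jet_fuel = [0.8 * f + b for f, b in zip(futures, basis)]
h = min_variance_hedge_ratio(jet_fuel, futures)
hedged = [e - h * f for e, f in zip(jet_fuel, futures)]
```

The residual variance of `hedged` is the basis risk that remains when no liquid futures contract exists for the fuel of interest.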

A fuel cost hedging program may be implemented
by using a combination of physical
storage and the futures market. Such a hedging
task faces a number of challenges, including
commodity price and volume uncertainty,
a decreasing liquidity of futures contracts of
increasing tenor, an increasing volatility of futures
contracts of decreasing tenor that need to be
rolled over, and exposure to basis risk when
liquid futures contracts for the fuel of interest
do not exist. The solution of the resulting
dynamic optimization problem may be carried
out by taking advantage of the analytical
modeling, pricing, and optimal control techniques
outlined above. The complexity of such
hedging programs is considerable, as highlighted
by the collapse of the stacked hedges of
Metallgesellschaft studied in Culp and Miller
(1999).

Valuation of Seaborne 
Energy Cargoes 

Crude oil and other liquid energy cargoes trans¬ 
ported in tanker fleets may be traded while the 
cargo is in transit. This is akin to the optimal fi¬ 
nancial management of energy commodities in 
movable storage. Here, the location and speed 
of the tankers enter as controls in a stochastic
dynamic programming framework, which may 
be treated with the analytical techniques de¬ 
scribed above. The timing, sales price, and port 
of delivery of the energy cargo are variables that 
may be selected in a value-maximizing manner 
while the commodity is in transit. These deci¬ 
sions must take into consideration the shape 
of the oil futures curve, which may be trading 
in contango, backwardation, or in a composite 
formation, as well as the tanker freight rate for¬ 
ward curve. Moreover, since a large portion of 
the above-ground crude oil is in transit, the ag¬ 
gregate tonnage and average speed of crude oil 
tanker fleets may have a material impact upon 
the crude oil convenience yield, the shape of 
its futures term structure, and its impact on the 
valuation of seaborne oil. 

The principal components model of the crude 
oil forward curve developed by Sclavounos and 
Ellefsen (2009) was applied by Ellefsen (2010) to 
the valuation of crude oil floating storage. The 
value of a crude oil cargo carried by a very large 
crude carrier (VLCC) is shown to be that of an 
American option with an embedded early ex¬ 
ercise premium. The valuation of this option is 
carried out in a semianalytical form by virtue of 
the explicit form of the Ornstein-Uhlenbeck
diffusions and their transition densities that govern
the independent factors that drive the crude
oil forward curve, using the method presented 
in Albanese and Campolieti (2006). It is shown 
that the value of the early exercise premium 
can be significant, particularly in volatile mar¬ 
kets and even if the forward curve is not trad¬ 
ing in extreme contango. The returns of crude 
oil floating storage investments are also stud¬ 
ied and shown to be significant. Their hedging 
using crude oil futures is also addressed. 

The valuation methodology developed for 
crude oil floating storage extends with minor 
modifications to land-based storage of crude 
oil, products, bunker fuels, natural gas, and 
other commodities. The necessary analytical 
machinery lies in the development of the principal
component analysis of the forward curve
of the commodity under consideration and the
analytical derivation of the diffusions govern¬ 
ing a small number of independent factors that 
drive the evolution of the respective forward 
curves. 

Analogous considerations apply to the trans¬ 
portation of liquefied natural gas in LNG carri¬ 
ers. The LNG market is not as liquid or global 
as the oil market, yet it is likely to mature in 
the future in light of the growing demand for 
natural gas for the generation of electricity. 

Fuel-Efficient Navigation and 
Optimal Chartering of 
Shipping Fleets 

The shipping industry consumes approxi¬ 
mately 5% of the world oil production in the 
form of bunker fuels. Assuming a daily world 
oil production of 87 million barrels and a long¬ 
term price of oil of $100 per barrel, the daily 
bunker fuel costs for the shipping industry are 
estimated at $400 million. The long-term daily 
average freight rate revenue is harder to esti¬ 
mate and is assumed to be over twice the daily 
bunker fuel costs. 

The selection of the optimal speed and route 
of cargo vessels exposed to stochastic freight 
rates and subject to the constraints imposed 
by the charter contract, cargo loading sched¬ 
ules, and port and other fees leads to a stochas¬ 
tic dynamic programming problem. The ship 
resistance and propulsion characteristics may 
be supplied by the shipowner, estimated from 
models or inferred from real-time measure¬ 
ments of the ship speed, propeller revolutions, 
engine performance, and the weather using the 
inference methods described in Haykin (2001). 
Using a reduced form or structural stochas¬ 
tic price model for the shipping freight rate 
forward curve, optimal routing and chartering 
strategies may be derived analytically aiming to 
minimize the fuel consumption and maximize 
freight rate revenue over single or consecutive 
voyages. A cumulative 5% reduction in bunker 
fuel costs and increase in freight rate revenue 
would translate into a $50 million increase in
the daily net income of the shipping industry. 
The promise of these advanced dynamic op¬ 
timization algorithms is underscored by their 
adoption by the aviation industry for the op¬ 
timal routing of commercial jets ("Calculating 
Costs in the Clouds," The Wall Street Journal, 
March 6, 2007). 


Valuation and Hedging of Power 
Plants and Refineries 

The optimal economic dispatch of power plants 
presents a challenging problem that depends 
in part on the price differential of two en¬ 
ergy commodities. The input commodity is usu¬ 
ally a fuel—natural gas or oil—which may be 
traded in the spot and forward markets. The 
output commodity is electricity, which cannot 
be stored. It trades in a spot cash market and
may not have liquid forward contracts, as dis¬ 
cussed by Joskow (2006). 

In simple cases, the valuation of power gen¬ 
erating units may be reduced to the pricing of a 
strip of options written on the price differentials 
of electricity and the input fuel, for example nat¬ 
ural gas. Given analytical price models for the 
price of the input fuel and electricity, the power 
plant valuation and hedging problem may be 
based on the pricing of these spark-spread op¬ 
tions, which may be available explicitly. In more 
general settings where operational constraints 
apply, the valuation problem may be cast in a 
stochastic dynamic programming framework, 
which may benefit from the use of the analyt¬ 
ical modeling and hedging methods outlined 
above. A similar set of issues arises in the
valuation and hedging of refineries that process
crude oil, which has a well-developed spot and 
futures market, into products—gasoline, heat¬ 
ing oil, jet fuel, bunker fuel—which often do not 
have actively traded forward contracts. The use 
of this general valuation and hedging method¬ 
ology in practice is presented in Eydeland and 
Wolyniec (2003). 
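
One case in which a spark-spread option is available explicitly is the zero-strike exchange option on two lognormal forward prices, priced by a Margrabe-type formula. The sketch below assumes that setting, with constant volatilities and correlation and no operational constraints; the function names, the heat rate convention, and the sample numbers are assumptions of the sketch.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def spark_spread_option(f_power, f_gas, heat_rate, vol_power, vol_gas,
                        rho, t, discount):
    """Value of the payoff max(F_power - heat_rate * F_gas, 0) at expiry t,
    for correlated lognormal forwards (requires a positive spread volatility)."""
    fuel_cost = heat_rate * f_gas
    vol = math.sqrt(vol_power ** 2 + vol_gas ** 2
                    - 2.0 * rho * vol_power * vol_gas)
    std = vol * math.sqrt(t)
    d1 = (math.log(f_power / fuel_cost) + 0.5 * std ** 2) / std
    d2 = d1 - std
    return discount * (f_power * norm_cdf(d1) - fuel_cost * norm_cdf(d2))


def plant_value(strip):
    """Sum a strip of spark-spread options, one tuple of arguments per period."""
    return sum(spark_spread_option(*leg) for leg in strip)


# Usage: hypothetical power forward in $/MWh, gas forward in $/MMBtu,
# and a heat rate in MMBtu/MWh.
option = spark_spread_option(50.0, 5.0, 8.0, 0.5, 0.4, 0.3, 0.5, 0.98)
reverse = spark_spread_option(8.0 * 5.0, 50.0, 1.0, 0.4, 0.5, 0.3, 0.5, 0.98)
```

The value of an idealized unit over several delivery periods is then the sum of such options, one per period, which is what `plant_value` computes.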


Valuation of Wind Farms and 
Electricity Storage Facilities 

Wind is an ample, clean, renewable energy 
source, yet its availability is variable. Conse¬ 
quently the electricity generated from a wind 
farm varies stochastically and is a function of 
the statistical properties of the wind speed aver¬ 
aged over a certain time interval. The volatility 
of the annual mean wind speed is typically 
about 10%. The development of onshore wind 
farms is growing at a 25-35% rate worldwide. 
Offshore wind energy is the next frontier with 
high expected growth rates over the next 
several decades from the development of vast 
expanses of sea areas with high winds and 
capacity factors of 40-45% using innovative 
low-cost floating wind turbine technologies 
that may be deployed in water depths ranging 
from 30 to several hundred meters. An offshore 
wind farm with a rated capacity of 1 GW and a 
lifespan of 25 years is on an energy-equivalent 
basis comparable to a 100 million barrel oil 
reservoir. Moreover, this energy resource is 
available just 100 meters above sea level as 
opposed to thousands of meters below it. 
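
The energy-equivalence statement can be checked with a short back-of-the-envelope calculation. The sketch assumes a thermal energy content of about 1.7 MWh per barrel of oil (5.8 million Btu) and a 45% efficiency for converting that heat into electricity; both figures are assumptions of this sketch and are not stated in the text.

```python
HOURS_PER_YEAR = 8760
MWH_THERMAL_PER_BARREL = 1.7   # assumed energy content of one barrel of oil


def wind_farm_barrel_equivalent(rated_mw, capacity_factor, years,
                                thermal_efficiency):
    """Barrels of oil needed to generate the same electricity as the wind farm."""
    electricity_mwh = rated_mw * capacity_factor * HOURS_PER_YEAR * years
    electricity_per_barrel = MWH_THERMAL_PER_BARREL * thermal_efficiency
    return electricity_mwh / electricity_per_barrel


# Usage: a 1 GW farm, 40% capacity factor, 25-year life.
barrels = wind_farm_barrel_equivalent(1000.0, 0.40, 25, 0.45)
```

Under these assumptions the result is on the order of 100 million barrels, consistent with the comparison above.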

The valuation of a utility scale onshore or 
offshore wind farm as an energy asset may be 
carried out using the standard weighted av¬ 
erage cost of capital (WACC) discounted cash 
flow method for a constant capital structure. 
Alternatively the adjusted present value (APV) 
method may be used for a varying leverage 
ratio and when tax shields and other incentives 
available to wind farm investments must be val¬ 
ued separately. Wind turbines are high-value 
capital assets that generate steady cash flows 
with an annualized volatility of about 10%. 
Utility-scale wind farm investments may there¬ 
fore be structured using nonrecourse project 
finance with a leverage that may reach 70-80%. 
The risk embedded in debt and equity securities 
used to finance utility-scale wind farm invest¬ 
ments depends on technical, environmental, 
and market factors. Their rational modeling 
permits the pricing of debt and equity financial 
claims at various levels of leverage. Moreover,
the availability of robust long-term statistical 
models of the mean wind speed and shorter 
term jump-diffusion models of wind speed fluc¬ 
tuations and of the market prices of electricity 
discussed above allows the pricing of structured 
securities like convertible debt and other deriva¬ 
tives that may be used to design an optimal 
capital structure, hedge financial exposures of 
wind farms as energy assets, and determine the 
optimal mix between fixed power purchase
agreement (PPA) and fluctuating market price
contracts for the delivery of
electricity. 

Investments in large-scale storage facilities 
for electricity generated by wind farms may 
be economically attractive since they would 
permit the storage of kilowatt-hours when 
electricity prices are low and wind speeds are 
high and the sale of electricity when prices are 
attractive. Such large-scale storage facilities in¬ 
clude pumped water storage in large reservoirs 
above ground or below sea level, compressed 
air storage in large underground caverns, high- 
capacity batteries, and compressed hydrogen in 
tanks following the electrolysis of seawater by 
onshore or offshore wind farms. The valuation 
and optimal operation of such storage depends 
on the short-term volatility and longer term 
fluctuations of wind speeds and electricity 
prices. Therefore its value is analogous to that 
derived from natural gas storage. The availabil¬ 
ity of a stochastic price model for the spot and 
forward electricity prices allows the explicit 
valuation of such storage facilities using the 
methods presented above. This analysis would 
suggest the merits, size, and optimal manage¬ 
ment of utility-scale electricity storage facilities 
and would guide investments in these assets. 

Canonical Correlation of 
Commodity and Shipping 
Forward Curves 

The principal components analysis of the term 
structure of interest rates and of the forward 
curves in the commodities markets is a powerful
method for the parsimonious modeling
of the evolution of a large number of highly 
correlated spot and forward securities in the re¬ 
spective market. Examples of the application of 
this statistical modeling method were discussed 
above. 

The forward curves of distinct energy com¬ 
modities, for example, crude oil, gasoline, and 
gasoil, are often correlated. The same applies to 
the FFA forward curves of distinct routes in the 
dry bulk and tanker shipping markets. There¬ 
fore the development of parsimonious and 
robust statistical models of the correlation struc¬ 
ture of two or more forward curves is often 
necessary for the valuation of assets exposed 
to multiple commodities. This may be accom¬ 
plished by carrying out a canonical correlation 
analysis of the block covariance matrix of the 
commodity forward curves of interest. The di¬ 
agonal blocks are the intracommodity covari¬ 
ance matrices, which may be treated by the 
principal component analysis discussed above. 
The off-diagonal blocks are the intercommodity 
covariance matrices, which may be reduced by 
the canonical correlation analysis described in 
Basilevsky (1994) and Anderson (2003). 

In a principal components analysis a small
number of dominant factors is derived for each
commodity forward curve, each factor being a
linear combination of the traded futures
contracts of varying tenors.
In a canonical correlation analysis, for exam¬ 
ple of two commodity forward curves, portfo¬ 
lios of futures trading on each forward curve 
may be derived that are maximally correlated. 
The maximum correlation coefficient between 
the two curves is a summary metric that is 
independent of the tenor of the futures con¬ 
tracts used to derive each portfolio. The ex¬ 
tension of this method to multiple commodity 
forward curves is straightforward. A canonical 
correlation analysis allows an in-depth study of 
the cross-correlation structure of multiple com¬ 
modity and shipping markets and may be used 
in the development of cross hedging strate¬ 
gies, in the valuation of assets, and for risk 
management. 
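
The maximum canonical correlation can be sketched for the smallest nontrivial case of two forward curves with two contracts each. The squared canonical correlations are the eigenvalues of the product of the inverse intracommodity blocks with the intercommodity block and its transpose. The sketch below hard-codes the 2-by-2 case in plain Python; the function names and the sample matrices are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def max_canonical_correlation(sxx, sxy, syy):
    """Largest canonical correlation between two two-contract curves.

    sxx, syy : intracommodity covariance blocks (diagonal blocks)
    sxy      : intercommodity covariance block (off-diagonal block)
    """
    # Eigenvalues of inv(Sxx) Sxy inv(Syy) Syx are the squared correlations.
    m = mul2(mul2(inv2(sxx), sxy), mul2(inv2(syy), transpose2(sxy)))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = max(trace * trace - 4.0 * det, 0.0)     # guard against rounding
    largest = 0.5 * (trace + math.sqrt(disc))
    return math.sqrt(min(max(largest, 0.0), 1.0))


# Usage: uncorrelated unit-variance contracts within each curve, with
# cross-curve correlations of 0.8 and 0.3 between matching tenors.
identity = [[1.0, 0.0], [0.0, 1.0]]
rho_max = max_canonical_correlation(identity, [[0.8, 0.0], [0.0, 0.3]], identity)
```

For curves with more contracts the same eigenvalue problem is solved with a general linear algebra routine, and the associated eigenvectors give the weights of the maximally correlated portfolios.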

The canonical correlation of the forward
curves of distinct routes in the dry bulk and 
tanker shipping markets was carried out by 
Hatziyiannis (2010). This study revealed vari¬ 
ous degrees of maximal correlations between 
shipping routes and a surprisingly large max¬ 
imal correlation between the dry bulk and 
tanker markets. This suggests that there exist 
portfolios of FFAs trading on major routes in 
the dry bulk market that are highly correlated 
with FFA portfolios in the tanker market. The 
composition of these portfolios follows from the 
canonical correlation analysis. The implication 
is that a few liquid forward curves in shipping 
may be used for the hedging of exposures in 
routes with less liquid derivatives. Moreover, 
maximally correlated portfolios of spot and for¬ 
ward contracts may be used for the design of 
broad shipping indexes across shipping sectors 
and routes that may spur the liquidity of ship¬ 
ping derivatives. 

The shipping forward curve principal com¬ 
ponents and canonical correlation analysis 
described above may be extended to include 
cross-correlated energy and other bulk com¬ 
modity forward curves. Combined with the 
powerful methods of conditional multivariate 
statistics coupled with robust Bayesian Stein 
estimators of drifts, risk management strate¬ 
gies of energy and shipping assets may be 
developed and trading strategies involving 
paper assets may be derived. 

Pricing of Shipping Options 

The arbitrage-free pricing of shipping options 
is carried out along lines similar to those in the 
energy markets. A technical complexity of ship¬ 
ping derivatives is that shipping options settle 
against the arithmetic average of the underly¬ 
ing spot index. Shipping options may be priced 
either by modeling the evolution of the under¬ 
lying index, or by modeling the evolution of the 
underlying futures contract. The first method is 
prevalent to date and is discussed in Alizadeh 
and Nomikos (2009). Yet, the second method
has a number of advantages. By modeling the
underlying futures or FFA contract as a lognor¬ 
mal diffusion, the pricing of calls and puts may 
be carried out readily by using the Black for¬ 
mula. Moreover, the underlying futures or FFA 
contracts may be used for delta, gamma, and 
vega hedging of options exposures. 
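
The second method can be sketched with the Black formula applied directly to the FFA price. The sketch assumes a lognormal FFA price with a constant volatility over the life of the option and a known discount factor; the function names and the sample freight rate are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_price(forward, strike, vol, t, discount, is_call=True):
    """Black formula for a European option on a futures or FFA contract."""
    std = vol * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * std ** 2) / std
    d2 = d1 - std
    if is_call:
        return discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))
    return discount * (strike * norm_cdf(-d2) - forward * norm_cdf(-d1))


def black_call_delta(forward, strike, vol, t, discount):
    """Sensitivity of the call price to the FFA price, used for delta hedging."""
    std = vol * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * std ** 2) / std
    return discount * norm_cdf(d1)


# Usage: hypothetical FFA at 20,000 dollars per day, strike 22,000,
# 60% volatility, six months to expiry.
call = black_price(20000.0, 22000.0, 0.6, 0.5, 0.98, is_call=True)
put = black_price(20000.0, 22000.0, 0.6, 0.5, 0.98, is_call=False)
delta = black_call_delta(20000.0, 22000.0, 0.6, 0.5, 0.98)
```

Put-call parity on the FFA, call minus put equal to the discounted difference between the FFA price and the strike, provides a quick consistency check of the two prices.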

This approach of pricing shipping options 
has been adopted in the multifactor principal 
components model of the forward curve 
developed by Sclavounos and Ellefsen (2009). 
It leads to explicit expressions of the option 
prices and their Greeks and also allows for a 
volatility term structure, which is the result of 
the mean reversion of the factors driving the 
shipping forward curves. The explicit form of 
the Ornstein-Uhlenbeck diffusions governing 
the evolution of the factors leads to explicit 
algebraic expressions for the options and their 
sensitivities discussed in Ellefsen (2010). 

Pricing of Credit Risk and 
Structured Securities in Shipping 

Shipping fleets are primarily financed by debt 
issued by banks and other lending institutions, 
followed by equity raised by shipping firms 
in private placements or on public exchanges. 
The underlying assets financed by this capital 
are cargo ships, which have observable prices 
quoted by shipping brokers. Credit derivatives 
and other structured securities analogous to 
those in widespread use in other asset markets 
are not yet as widely traded in the shipping 
sector. 

The pricing of credit risk is based on the fun¬ 
damental structural form firm value method 
of Merton (1974) and the reduced form haz¬ 
ard rate method of Duffie and Singleton (2003). 
These valuation methods have enabled the 
pricing of derivatives written on individual 
credits—for example, credit default swaps—as 
well as derivatives written on baskets of cred¬ 
its. The values of the underlying entities in a 
basket and their default probabilities are cor¬ 
related, and this dependency structure may be 
modeled in its generality by using multivariate
Gaussian statistics and, in non-Gaussian settings,
copula functions. This financial technology
has enabled the design and pricing of an array
of structured financial securities discussed
in Duffie and Singleton (2003), Lando (2004), 
London (2007), and Cherubini and Della Lunga 
(2007). 
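
The structural approach can be sketched in its simplest form, in which the equity of a firm is a European call on the value of its assets with a strike equal to the face value of a single zero-coupon debt issue. The sketch assumes lognormal asset values and a constant interest rate; the function name and the sample numbers are hypothetical, and the reduced form and basket extensions are not shown.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton_firm_value(asset_value, debt_face, r, asset_vol, t):
    """Equity value, debt value, risk-neutral default probability, and
    credit spread in a structural model with one zero-coupon debt issue."""
    std = asset_vol * math.sqrt(t)
    d1 = (math.log(asset_value / debt_face) + (r + 0.5 * asset_vol ** 2) * t) / std
    d2 = d1 - std
    equity = asset_value * norm_cdf(d1) - debt_face * math.exp(-r * t) * norm_cdf(d2)
    debt = asset_value - equity                 # balance sheet identity
    default_prob = norm_cdf(-d2)                # assets below face value at t
    spread = -math.log(debt / debt_face) / t - r
    return equity, debt, default_prob, spread


# Usage: assets worth 100 financed with debt of face value 70 due in 5 years.
equity, debt, pd, spread = merton_firm_value(100.0, 70.0, 0.03, 0.3, 5.0)
```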

Shipping credit risk may be modeled and 
priced using a hybrid model, which blends the 
structural and reduced form valuation methods 
discussed in Ammann (2001). The price of the 
assets of a shipping firm—the cargo vessels—is 
stochastic but observable, therefore recovery at 
default is known. The price of equity of pub¬ 
lic shipping firms is also observable and may 
be used to model the hazard rate, the probabil¬ 
ity of default, and hence the pricing of ship¬ 
ping debt by calibrating a hybrid credit risk 
model as described by Overhaus et al. (2007). 
Cargo ship prices within and across shipping 
sectors are correlated, and this dependency may 
be modeled by identifying common underlying 
factors via a principal components and canon¬ 
ical correlation analysis. The above attributes 
of the shipping sector may be introduced to 
price loans, convertible bonds, equity and credit 
linked notes, and other structured securities, 
which may be used to better manage shipping 
risk, reduce bank regulatory risk capital, and 
make available new and innovative sources of 
financing to shipowners. 


KEY POINTS 

• Producers of energy commodities and owners 
of shipping tonnage may take short positions 
in futures and forward freight agreements
(FFAs) in order to hedge their forward deliv¬ 
ery commitments against a decrease of prices. 

• Consumers of energy commodities and shippers
who charter cargo vessels may take long
positions in futures and FFAs in order to 
hedge their forward commodity and freight 
rate exposures against rising prices. 


• Power plants and refineries that transform an 
input commodity into an output commodity, 
for example, natural gas to electricity, crude 
oil to gasoline, may go long the futures of the 
input commodity and short the futures of the 
output commodity in order to protect their 
profit margins against adverse moves of the 
input/output commodity price spread. 

• Liquid energy commodity forward curves 
convey information about the stochastic evo¬ 
lution of the spot price of the commodity. 

• The stochastic dynamics of individual energy 
commodity and shipping forward curves 
may be modeled by a small number of in¬ 
dependent statistical factors using a princi¬ 
pal components analysis (PCA). The factors 
are portfolios of traded futures contracts, and 
their stochastic dynamics are governed by dif¬ 
fusions that may be derived in explicit form. 

• The joint stochastic dynamics of cross- 
correlated commodity and shipping forward 
curves may be modeled by a small number 
of statistical factors using an intracommodity 
PCA and an intercommodity canonical
correlation analysis (CCA). 

• The parsimonious statistical factor model¬ 
ing of the commodity and shipping forward 
curves may be used for the valuation and 
risk management of energy assets, structured 
securities, and portfolios of commodity and 
shipping derivatives. 

REFERENCES 

Albanese, C., and Campolieti, G. (2006). Ad¬ 
vanced Derivatives Pricing and Risk Management. 
Burlington, MA: Elsevier Academic Press. 

Alizadeh, A. H., and Nomikos, N. K. (2009). Ship¬ 
ping Derivatives and Risk Management. London: 
Palgrave Macmillan. 

Ammann, M. (2001). Credit Risk Valuation. Berlin: 
Springer Verlag. 

Anderson, T. W. (2003). An Introduction to Multi¬ 
variate Statistical Analysis. New York: John Wiley 
& Sons. 

Basilevsky, A. (1994). Statistical Factor Analysis and 
Related Methods: Theory and Applications. New 
York: John Wiley & Sons. 

Benth, F. E., Benth, J. S., and Koekebakker, S.
(2008). Stochastic Modeling of Electricity and
Related Markets. Advanced Series of Statistical
Science and Applied Probability. Singapore: World
Scientific.

Brennan, M. J., and Trigeorgis, L. (2000). Project 
Flexibility, Agency and Competition: New Devel¬ 
opments in the Theory and Application of Real Op¬ 
tions. Oxford: Oxford University Press. 

Campbell, J. Y., Lo, A. W., and MacKinlay, A. C.
(1997). The Econometrics of Financial Markets. 
Princeton: Princeton University Press. 

Carr, P., and Madan, D. (1998). Option valuation 
using the fast Fourier transform. Journal of Com¬ 
putational Finance 2: 61-73. 

Cherubini, U., and Della Lunga, G. (2007). Struc¬ 
tured Finance. New York: John Wiley & Sons. 

Clewlow, L., and Strickland, C. (2000). En¬ 
ergy Derivatives: Pricing and Risk Management. 
London: Lacima Publications. 

Cont, R., and Tankov, P. (2004). Financial Modeling 
with Jump Processes. Boca Raton, FL: Chapman 
& Hall/CRC. 

Copeland, T., and Antikarov, V. (2001). Real Op¬ 
tions: A Practitioner's Guide. London: Monitor 
Group, TEXERE. 

Cortazar, G. and Naranjo, L. (2006), An N-factor 
Gaussian model of oil futures prices. Journal of 
Futures Markets, 26: 243-268. 

Culp, C. L., and Miller, M. H. (1999). Corporate 
Hedging in Theory and Practice: Lessons from Met- 
allgesellschaft. London: RISK Books. 

Dixit, A. K., and Pindyck, R. S. (1994). Investment 
under Uncertainty. Princeton: Princeton Univer¬ 
sity Press. 

Duffie, D. (2001). Dynamic Asset Pricing Theory, 3rd 
Edition. Princeton: Princeton University Press. 

Duffie, D., Pan, J., and Singleton, K. J. (2000). Trans¬ 
form analysis and asset pricing for affine jump 
diffusions. Econometrica 68:1343-1376. 

Duffie, D., and Singleton, K. J. (2003). Credit Risk: 
Pricing, Measurement and Management. Prince¬ 
ton: Princeton University Press. 

Durbin, J., and Koopman, S. J. (2001). Time Series 
Analysis by State Space Methods. Oxford: Oxford 
University Press. 

Ellefsen, P. E. (2010). Commodity market mod¬ 
eling and physical trading strategies. Master's 
thesis. Department of Mechanical Engineering, 
Massachusetts Institute of Technology. 

Eydeland, A., and Wolyniec, K. (2003). Energy and 
Power Risk Management: New Developments in 
Modeling Pricing and Hedging. New York: John 
Wiley & Sons. 


Gatheral, J. (2006). The Volatility Surface: A Practitioner's
Guide. New York: John Wiley & Sons.

Geman, H. (2005). Commodities and Commodity
Derivatives: Modeling and Pricing for Agriculturals,
Metals and Energy. New York: John Wiley &
Sons.

Gibson, R., and Schwartz, E. S. (1990). Stochastic
convenience yield and the pricing of oil contingent
claims. The Journal of Finance 45: 959-976.

Glasserman, P. (2004). Monte Carlo Methods in Financial
Engineering. Berlin: Springer Verlag.

Greene, W. H. (2000). Econometric Analysis, 4th Edition.
Upper Saddle River, NJ: Prentice Hall.

Hatziyiannis, N. (2010). Canonical correlation
of shipping forward curves. Master's thesis.
Department of Mechanical Engineering,
Massachusetts Institute of Technology.

Haykin, S. (2001). Kalman Filtering and Neural Networks.
New York: John Wiley & Sons.

Heston, S. (1993). A closed form solution for options
with stochastic volatility with applications
to bond and currency options. Review of
Financial Studies 6: 327-343.

Hull, J. C. (2003). Options, Futures and Other Derivatives,
5th Edition. Upper Saddle River, NJ: Prentice
Hall.

Joskow, P. L. (2006). Competitive electricity markets
and investment in new generating capacity.
MIT Working Paper. Center for Energy and
Environmental Policy Research.

Jouini, E., Cvitanic, J., and Musiela, M. (2001). Option
Pricing, Interest Rates and Risk Management.
Cambridge: Cambridge University Press.

Kou, S. (2002). A jump-diffusion model for option
pricing. Management Science 48: 1086-1101.

Lando, D. (2004). Credit Risk Modeling: Theory and
Applications. Princeton, NJ: Princeton
University Press.

London, J. (2007). Modeling Derivatives Applications.
London: Financial Times Press.

Longstaff, F. A., and Schwartz, E. S. (2001). Valuing
American options by simulation: A simple
least-squares approach. Review of Financial Studies
14: 113-147.

Lewis, A. L. (2005). Option Valuation under
Stochastic Volatility. Newport Beach, CA:
Finance Press.

Li, M., Deng, S-J., and Zhou, J. (2008). Closed-form
approximations for spread option prices and
Greeks. Journal of Derivatives 15, 3: 58-80.

Li, M., Zhou, J., and Deng, S-J. (2010). Multi-asset
spread option pricing and hedging. Quantitative
Finance 10, 3: 305-324.


Merton, R. (1974). On the pricing of corporate
debt: The risk structure of interest rates. Journal
of Finance 29: 449-470.

Miltersen, K. R., and Schwartz, E. S. (1998). Pricing
of options on commodity futures with stochastic
term structures of convenience yields and
interest rates. Journal of Financial and Quantitative
Analysis 33: 61-86.

Musiela, M., and Rutkowski, M. (2005). Martingale
Methods in Financial Modelling. Berlin: Springer
Verlag.

Oksendal, B., and Sulem, A. (2005). Applied
Stochastic Control of Jump Diffusions. Berlin:
Springer Verlag.

Overhaus, M., Bermudez, A., Buehler, H., Ferraris,
A., Jordinson, C., and Lamnouar, A. (2007).
Equity Hybrid Derivatives. New York: John Wiley
& Sons.

Ronn, E. I. (2002). Real Options and Energy Management.
London: RISK Publications.

Ross, S. (1976). The arbitrage theory of capital asset
pricing. Journal of Economic Theory 13: 343-362.

Samuelson, P. (1965). Proof that properly anticipated
prices fluctuate randomly. Industrial Management
Review 6: 41-49.

Schwartz, E. S. (1997). The stochastic behavior
of commodity prices: Implications for valuation
and hedging. Journal of Finance 52, 3: 923-973.

Schwartz, E. S., and Smith, J. E. (2000). Short-term
variations and long-term dynamics in commodity
prices. Management Science 46: 893-911.

Sclavounos, P. D., and Ellefsen, P. E. (2009).
Multi-factor model of correlated commodity
forward curves for crude oil and shipping
markets. Working Paper. Center for Environmental
and Energy Policy Research (CEEPR).
Massachusetts Institute of Technology, Sloan
School of Management.
http://web.mit.edu/ceepr/www/publications/workingpapers.html

Shreve, S. E. (2004). Stochastic Calculus for Finance
II. Continuous-Time Models. Berlin: Springer
Verlag.

Singleton, K. J. (2006). Empirical Dynamic Asset
Pricing: Model Specification and Econometric
Assessment. Princeton: Princeton University
Press.

Yong, J., and Zhou, X. Y. (1999). Stochastic Controls.
Berlin: Springer Verlag.


Index 


Absence of arbitrage principle, I:99, I:127. See also arbitrage, absence of
ABS/MBS (asset-backed securities/mortgage-backed securities), I:258-259, I:267
    cash flow of, III:4
    comparisons to Treasury securities, III:5
    modeling for, III:536
Accounting, II:532, II:542-543
Accounting firms, watchdog function of, II:542
Accounts receivable turnover ratio, II:557-558
Active-passive decomposition model, III:17, III:19-22, III:26
Activity ratios, II:557-558, II:563
Adapted mesh, one year to maturity, II:680f
Adjustable rate mortgages (ARMs). See ARMs (adjustable rate mortgages)
Adjustments for changes in net working capital (ANWC), II:25
Adverse selection, III:76
Affine models, III:554-557
Affine process, basic, I:318-319, I:334n
Agency ratings, and model risk, II:728-729
Airline stocks, II:249-250, II:250f, II:250t, II:252t
Akaike Information Criterion (AIC), II:703, II:717
Algorithmic trading, II:117
Algorithms, II:676-677, II:701-702, III:124
Allied Products Corp., cash flow of, II:576
α-stable densities, III:243f, III:244f
α-stable distributions
    defined, II:738
    discussion of, III:233-238
    fitting techniques for, II:743-744
    properties of, II:739
    simulations for, II:750
    subordinated representation of, II:742-743
    usefulness of, III:242
    and VaR, II:748
    variables with, II:740
α-stable process, III:499
Alternative risk measures
    proposed, III:356-357
Amazon.com
    cash flows of, II:568, II:568t
American International Group (AIG), stock prices of, III:238
Amortization, II:611, III:72-73
Analysis
    and Barra model, II:244-248
    bias in, II:109
    common-size, II:561-563
    crisis-scenario, III:379-380
    to determine integration, II:514
    formulas for quality, II:239
    fundamental, II:243, II:248, II:253-254
    interpretation of results, III:42-44
    mathematical, I:18
    model-generated, III:41-42
    multivariate, II:48
    statistical, I:140, II:353-354
    sum-of-the-parts, II:43-44
    vertical vs. horizontal common-size, II:562
Analytics, aggregate, II:269t
Anderson, Philip W., III:275
Annual percentage rate (APR), II:598, II:615-616
Annual standard deviation, vs. volatility, III:534
Annuities
    balances in deferred, II:610f
    from bonds, I:211-212
    cash flows in, II:604-607
    future value factor, II:605-606
    ordinary, II:605
    present value factor, II:605, II:606-607
    valuation of due, II:608-609
    valuing deferred, II:609-611
Anticipation, in stochastic integrals, III:475
Approximation, quality of, II:330-331
APT (arbitrage pricing theory), I:116
Arbitrage
    absence of, I:56, I:135, II:473
    in continuous time, I:121-123
    convertible bond, I:230
    costless profits, I:442
    costless trades, I:428t
    defined, I:99, I:119, I:123
    in discrete-time, continuous state, I:116-119
    and equivalent martingale measures, I:111-112
    in multiperiod finite-state setting, I:104-114
    in one-period setting, I:100-104
    pricing of, I:124, I:134-135, II:476
    profit from, I:221-222
    and relative valuation models, I:260
    and state pricing, I:55-56, I:102, I:130
    test for costless profit, I:441
    trading strategy with, I:105
    types of, I:55-56
    using, I:70-71
Arbitrage-free, III:577, III:593-594
Arbitrage opportunities, I:55, I:56, I:100, I:117, I:260-261, I:437
Arbitrage pricing theory (APT), I:116
    application of, I:60-61
    development of, II:468, II:475-476
    factors in, II:138
    key points on, II:149-150
    and portfolio optimization, I:40


ARCH (autoregressive conditional heteroskedastic) models
    and behavior of errors, II:362
    defined, I:176
    in forecasting, II:363
    reasons for, III:351
    type of, II:131
    use of, II:733-734
ARCH/GARCH models
    application to VaR, II:365-366
    behavior of, II:361-362
    discussion of, II:362-366
    generalizations of, II:367-373
    usefulness of, II:366-367
ARCH/GARCH processes, III:277
Area, approximation of, II:589-590, II:589f
ARIMA (autoregressive integrated moving average) process, II:509-510
ARMA (autoregressive moving average) models
    defined, II:519
    and Hankel matrices, II:512
    linearity of, II:402
    and Markov coefficients, II:512
    multivariate, II:510-511, II:513-514
    nonuniqueness of, II:511
    representations of, II:508-512
    and time properties, II:733
    univariate, II:508-510
ARMA (autoregressive moving average) processes, III:276-277
ARMs (adjustable rate mortgages), III:25, III:71-72, III:72f, III:74
Arrays, in MATLAB and VBA, III:420-421, III:457-458, III:466
Arrow, Kenneth, II:467, II:699
Arrow-Debreu price, I:53-55. See also state price
Arrow-Debreu securities, I:458, I:463
Arthur, Bryan, II:699
Artificial intelligence, II:715
Asian fixed calls, with finite difference methods, II:670f
Asian options, pricing, III:642-643
Asset allocation
    advanced, I:36
    building blocks for, I:38
    modeling of, I:42
    standard approach to, I:37-38
Asset-backed securities (ABS), I:258
Asset-liability management (ALM), II:303-304, III:125-126
Asset management, focus of, I:35
Asset prices
    codependence of, I:92
    multiplicative model for, I:86-87, I:88
    negative, I:84, I:88
    statistical inference of models, I:560
Asset pricing, I:3, I:56-59, I:59-60, I:65-66, II:197
Asset return distributions, skewness of, III:242
Asset returns
    characteristics of, III:392
    errors in estimation of, III:140-141
    generation of correlated, I:380-381
    log-normal distribution applied to, III:223-225
    models of, III:381
    normal distribution of, I:40
    real-world, III:257
    simulated vector, I:380-381
Assets
    allocation of, I:10
    on the balance sheet, II:533-534
    carry costs, I:424-425
    correlation of company, I:411
    current vs. noncurrent, II:533
    deliverable, I:483
    discrete flows of, I:425-426
    expressing volatilities of, III:396-397
    financing of, II:548
    funding cost of, I:531
    future value of, I:426f, I:427f
    highly correlated, I:192
    intangible, II:534
    liquid, II:551
    management of, II:558
    market prices of, I:486
    new fixed, II:25
    prices of, I:60
    redundant, I:51
    representation of, II:515
    risk-free, I:112-113
    risky vs. risk-free, I:5-6
    shipping, I:555
    storage of physical, I:439, I:442-443, I:560-561
    values of after default events, I:350
Asset swaps, I:227-230
Assumptions
    about noise, II:126
    under CAPM, I:68-69
    errors in, III:399
    evaluation of, II:696
    homoskedasticity vs. heteroskedasticity, II:360
    importance of, III:62
    for linear models, II:310-311
    for linear regression models, II:313
    in scenario analysis, II:289
    simplification of, III:397
    using inefficient portfolio analysis, I:288f
    violations of, I:475
    zero mean return, III:397
Attribution analysis, II:188-189
AT&T stock, binomial experiment, I:146-148
Audits, of financial statements, II:532
Augmented Dickey-Fuller test (ADF), II:387, II:389, II:390t, II:514
Autocorrelation, II:328-329, II:503, II:733
Autoregressive conditional duration (ACD) model, II:370
Autoregressive conditional heteroskedastic (ARCH) models. See ARCH (autoregressive conditional heteroskedastic) models
Autoregressive integrated moving average (ARIMA) process, II:509-510
Autoregressive models, II:360-362
Autoregressive moving average (ARMA) models. See ARMA (autoregressive moving average) models
AVaR. See average value at risk (AVaR)
Average credit sales per day, calculation of, II:553
Average daily volume (ADV), II:63
Averages, equally weighted, III:397-409
Average value at risk (AVaR) measure
    advantages of, III:347
    back-testing of, III:338-340
    boxplot of fluctuation of, III:338f
    and coherent risk measures, III:333-334
    computation of in practice, III:336-338
    computing for return distributions, III:334-335
    defined, III:331-335
    estimation from sample, III:335-336
    and ETL, III:345-347
    geometrically, III:333f
    graph of, III:347f
    higher-order, III:342-343
    historical method for, III:336-337
    hybrid method for, III:337
    minimization formula for, III:343-344
    Monte Carlo method for, III:337-338
    with the multivariate normal assumption, III:336
    of order one, III:342-343
    for stable distributions, III:344-345
    tail probability of, III:332-333
Axiomatic systems, III:152-153


Bachelier, Louis, II:121-122, II:467, II:469-470, III:241-242, III:495
Back propagation (BP), II:420
Back-testing
    binomial (Kupiec) approach, III:363
    conditional testing (Christoffersen), III:364-365
    diagnostic, III:367-368
    example of, II:748-751
    exceedance-based statistical approaches, III:362-365
    in-sample vs. out-sample, II:235-236
    need for, III:361-362
    statistical, III:362
    strengths/weaknesses of exceedance-based, III:365
    tests of independence, III:363-364
    trading strategies, II:236-237
    use of, III:370
    using normal approximations, III:363
    of VaRs, III:365-367
Backward induction pricing technique, III:26
Bailouts, I:417
Balance sheets
    common-size, II:562, II:562f
    information in, II:533-536
    sample, II:534t, II:546t
    structure of, II:536
    XYZ, Inc. (example), II:297
Balls, drawing from urn, III:174-177, III:175f, III:179-180
Bandwidth, II:413-414, II:746
Bank accounts, and volatility, III:472
Bank for International Settlements (BIS), definition of operational risk, III:82
Bankruptcy, I:350, I:366-369, II:577
Banks, use of VaR measures, III:295
Barclays Global Risk Model, II:173, II:193n, II:268
Barra models
    E3, II:256, II:257f, II:261
    equity, II:245-246
    fundamental data in, II:246t
    fundamental factor, II:244-248, II:248-250
    risk, II:256
    use of, II:254n
Barrier options, II:683
Basel II Capital Accord, on operational risk, III:86-87
Basic earning power ratio, II:547, II:549
Bayes, Thomas, I:140, I:196
Bayesian analysis
    empirical, I:154-155
    estimation, I:189
    hypothesis comparison, I:156-157
    in parameter estimation, II:78
    and probability, I:140, I:148
    steps of decision making in, I:141
    testing, I:156-157
    use of, I:18
Bayesian inference, I:151, I:157-158, II:719
Bayesian Information Criterion (BIC), II:703, II:717
Bayesian intervals, I:156, I:170
Bayesian methods, and economic theory, III:142
Bayes' theorem, I:143-148, I:152
Behaviors, patterns of, II:707-710, III:34-35
BEKK(1,1,K) model, II:372
Beliefs
    about long-term volatility, III:408-409
    posterior, I:151-152
    prior, I:152, I:159
Bellman's principle, II:664-665
Benchmarks
    choice of, II:114-115
    effect of taxes on, II:74
    fair market, III:626
    modeling of, II:696
    portfolio, II:272f
    for risk, II:265, III:350, III:354-355
    risk in, II:259
    tracking of, II:67
    for trades, II:117, III:624
    use of, I:41-42, II:66-69
Benchmark spot rate curves, I:222-223
Berkowitz transformation, application of, III:366-367, III:368
Bernoulli model, parameter inference in, II:726-727
Bernoulli trials, I:81, III:170, III:174
Bessel function of the third kind, III:232
Best bids/best asks, II:449-450
Best practices, I:416
Beta function, III:222
Betas
    beta1963, I:74-75
    beta1964, I:75
    beta1963 vs. beta1964, I:76-77
    distribution of, III:222
    meanings of, I:74
    in portfolios, II:273
    pricing model, I:60-61, I:71-72
    propositions about, I:75-77
    robust estimates of, II:442-443
    in SL-CAPM models, I:66-67
    two beta trap, I:74-77
Bets, unintended, II:261, II:263-264, II:264, II:265
Better building blocks, I:36
Bias
    from data, II:204
    discretization error, III:641
    estimator, III:641
    survivorship (look-ahead), II:202, II:204, II:712-713, II:718
Bid-ask bounce, II:455-457
Bid-ask spread
    aspects of, III:597
    average hourly, II:454f
    defined, II:454
    under market conditions, II:455f
    risk in, III:372
Binomial experiment, I:146-148
Black, Fischer, II:468, II:476
Black and Scholes
    assumptions of, I:510
Black-Derman-Toy (BDT) model
    defined, I:492
    discussion of, III:608-609
    features of, III:549
    interest rate model, III:616f
    as no arbitrage model, III:604
    use of, III:300
Black-Karasinski (BK) model, III:548, III:607-608
    binomial lattice, III:611
    defined, I:493
    features of, III:604
    forms of, III:600t
    interest rate trinomial lattice, III:615f
    trinomial lattice, III:616f
Black-Litterman model
    assumptions with, I:196-197
    derivation of, I:196-197
    discussion of, I:195-201
    with investor's views and market equilibrium, I:198-199
    mixed estimation procedure, I:200
    use of for forecasting returns, I:193-194, II:112
    use of in parameter estimation, II:78
    variance of, I:200
Black-Scholes formula
    for American options, II:674
    with change of time, III:522, III:524-525
    and diffusion equations, II:654
    and Gaussian distribution, II:732
    and Girsanov's theorem, I:132-133
    statistical concepts for, III:225
    use of, I:126-127, I:136
    use of in MATLAB, III:423-427, III:447
    use of with VBA, III:462-463
    and valuation models, I:271
Black-Scholes-Merton stock option pricing formula, I:557


Black-Scholes model
    assumptions of, I:512, III:655
    and calibration, II:681-682
    for European options, II:660-662, III:639-640
    and hedging, I:410
    and Merton's model, I:343
    for pricing options, I:487, I:509-510, I:522
    usefulness of, I:475
    use of, I:272
    volatility in, III:653
Black volatility, III:548, III:550
Bohr, Niels, I:123
capped floating rate, valuation of, I:249f
changes in prices, I:373-374
computing accrued interest and clean price of, I:214-215
convertible, I:230, I:271
corporate, I:279, III:598-599
input information for example,
I:246f
III:598-600
Borrowing, I:72-73, I:479-480
Boundary conditions, II:660
    need for, II:661
fractal properties of, III:479-480,
of derivatives, I:494
American-style, I:441-442, I:449
error on value of, II:668f
European-style, I:440-441, I:448-449, I:448t
Canonical correlation analysis, I:556
Capital asset pricing model (CAPM). See CAPM (capital asset pricing model)
Capital expenditures coverage ratio, II:575-576
Capital gains, taxes on, II:73
Caplets, I:249, III:589-590
CAPM
    multifactor, II:475
CAPM (capital asset pricing model). See also Roy CAPM; SL-CAPM
    application of, I:60-61
    areas of confusion, I:67-68
    for assessing operational risk, III:92-93
    in asset pricing, II:474
    defined, I:394
and investor risk, I:73-74
strengths and weaknesses of,
I:487

Cash concept, II:567
Cash flows
analysis of, II:574-577, III:4-5
cash flow at risk (CFaR), III:376-378
discounted, I:225
discrete, I:429
distribution analysis vs. benchmark,
estimation of, I:209-210, II:21-23
expected, I:211
factors in, III:31-32, III:377
III:62
futures vs. forwards, I:431f
future value of, II:603f
perpetual stream of, II:607-608
statement of, II:539-541, II:566-567
time patterns of, II:607-611
valuation of, II:618-619
vs. free cash flow, II:22-23
Cash flow statements
    example of, II:541
reformatting of, II:569f
restructuring of, II:568
Cash flow-to-debt ratio, II:576
Cash-out refinancing, III:66, III:69
Cash payments, I:486-487, III:377
usefulness of, II:335
Cauchy, Augustin, II:655
Cauchy initial value problem, II:655, II:656, II:656f, II:657
CAViaR (conditional autoregressive value at risk), II:366
CDOs (collateralized debt obligations), I:299, I:525, III:553, III:645
CDRs (conditional default rates)
    in cash flow calculators, III:34
    defaults measured by, III:58-59
    defined, III:30-31
    monthly, III:627
    projections for, III:35f
CDSs (credit default swaps)
    basis, I:232
    bids on, I:527
    cash basis, I:402
    discussion of, I:230-232
    fixed premiums of, I:530-531
    hedging with, I:418
    illustration of, I:527
    initial value of, I:538
    maturity dates, I:526
    payoff and payment structure of, I:534f
    premium payments, I:231f, I:533-535
    pricing models for, I:538-539
    pricing of by static replication, I:530-532
    pricing of single-name, I:532-538
    quotations for, I:413
    risk and sensitivities of, I:536-537
    spread of, I:526
    unwinding of, I:538
    use of, I:403, I:413, II:284
    valuation of, I:535-536
    volume of market, I:414
Central limit theorem
    defined, I:149n, III:209-210, III:640
    and the law of large numbers, III:263-264
    and random number generation, III:646
    and random variables, II:732-733
Central tendencies, II:353, II:354, II:355
Certainty equivalents, II:723-724, II:724-725
CEV (constant elasticity of variance), III:550, III:551f, III:654-655
Chambers-Mallows-Stuck generator, II:743-744
Change of measures, III:509-517, III:516t
Change of time methods (CTM)
    applications of, III:522-527
    discussion of, III:519-522
    general theory of, III:520-521
    main idea of, III:519-520, III:527
    in martingale settings, III:522-523
    in stochastic differential equation setting, III:523
Chaos, defined, II:653
Chaos: Making a New Science (Gleick), II:714
vs. probability density function,
Characteristic lines, II:316, II:318t, II:344-348, II:345-347t
Chebychev inequalities, III:210, III:225
Chen model, I:493
Chi-square distributions, I:388-389, III:212-213
Cholesky factor, I:380
Chow test, II:336, II:343, II:344, II:350
CID (conditionally independent defaults) models, I:320, I:321-322, I:333
CIR model, I:498, I:500-501, I:502
Citigroup, I:302, I:408f, I:409f
CLA (critical line algorithm), I:73
Classes
    criteria for, II:494
Classical tempered stable (CTS) distribution, II:741-742, II:741f, II:742f, II:743-744, III:512
Classification, and Bayes' Theorem, I:145
Classification and regression trees (CART). See CART (classification and regression trees)
Classing, procedure for, II:494-498
Clearinghouses, I:478
CME Group, I:489-490
CMOs (collateralized mortgage obligations), III:598, III:645
Coconut markets, I:70
Coefficients
    binomial, III:171, III:187-191
    of determination, II:315
    estimated, II:336-337
Coherent risk measures, III:327-329
    and VaR, III:329
Coins, fair/unfair, III:169, III:326-327
Cointegrated models, II:503
Cointegration
    analysis of, II:381t
    defined, II:383
    empirical illustration of, II:388-393
    technique of, II:384-385
    testing for, II:386-387
    test of, II:394t, II:396t
    use of, II:397
Collateralized debt obligations (CDOs), I:299, I:525, III:553, III:645
Collateralized mortgage obligations (CMOs), III:598, III:645
Collinearity, II:329-330
Commodities, I:279, I:556, I:566
Companies. See firms
Comparison principals, II:676
Comparisons vs. testing, I:156
Complete markets, I:103-104, I:119, I:133, I:461
Complexity, profiting from, II:57-58
Complexity (Waldrop), II:699
Complex numbers, II:591-592, II:592f
continuous, II:599, II:617
determining number of periods, II:602
formula for growth rate, II:8
more than once per year, II:598-599
and present value, II:618
Review, I:300
Comprehensive Capital Assessment Review, I:412
Computational burden, III:643-644
applications
increased use of, III:137-138
introduction of into finance, II:480
modeling with, I:511, II:695
random walk generation of, II:708
in stochastic programing, III:124, III:125-126
Conditional autoregressive value at risk (CAViaR), II:366
Conditional default rate (CDR). See CDRs (conditional default rates)
Conditionally independent defaults (CID) models, I:320, I:321-322, I:323
Conditioning/conditions, I:24, II:307-308, II:361, II:645
Confidence, I:200, I:201, II:723, III:319
Confidence intervals, II:440, III:338t, III:399-400, III:400f
Conglomerate discounts, II:43
Conseco, debt restructure of, I:529
Consistency, notion of, II:666-667
III:550, III:551f, III:654-655
Constant growth dividend discount model, II:7-9
cardinality, II:64-65
common, III:146
commonly used, II:62-66, II:84
holding, II:62-63
II:65
nonnegativity, I:73
real world, II:224-225
round lot, II:65-66
setting, I:192
on weights of, I:191-192


Constraint sets, I:21, I:28, I:29
I:277-278, I:291f, I:292, I:292f
Consumption, I:59-60, II:360, III:570
Contagion, I:320, I:324, I:333
Contingent claims
    financial instruments as, I:462
    incomplete markets for, I:461-462
    unit, I:458
    use of analysis, I:463
    utility maximization in markets, I:459-461
    value of, I:458-459
II:583-584
Continuous distribution function (c.d.f.), III:167, III:196, III:205, III:345-346, III:345f
Continuous distribution function F(a), III:196
III:578
Continuous-time processes, change of measure for, III:511-512
III:458-460
Control methods, stochastic, I:560
Convenience yields, I:424, I:439
Convergence analysis, II:667-668
Conversion, I:274, I:445
in callable bonds, III:302-303
effective, III:13, III:300-304, III:617f
measurement of, III:13-14, III:304-305
negative, III:14, III:49, III:303
positive, III:13
use of, III:299-300
Convex programming, I:29, I:31-32
Cootner, Paul, III:242
advantages of, III:284
defined, III:283
mathematics of, III:284-286
usefulness of, III:287
visualization of bivariate independence, III:285f
visualization of Gaussian, III:287f
Corner solutions, I:200
use of, III:286-287
concept of, III:283
drawbacks of, III:283-284
III:540f
and portfolio risk, I:11
robust estimates of, II:443-446
serial, II:220
undesirable, I:293
use of, II:271
Costs, net financing, I:481
Cotton prices, model of, III:383
Countable additivity, III:158
Counterparts, robust, II:81
Countries, low- vs. high inflation, I:290
Coupon payments, I:212, III:4
Coupon rates, computing of, III:548-549
Courant-Friedrichs-Lewy (CFL) conditions, II:657
Covariance
    calculation of between assets, I:8-9
    estimators for, I:38-40, I:194-195
    matrix, I:38-39, I:155, I:190
    relationship with correlation, I:9
    reliability of sample estimates, II:77
III:404
statistical methodology for, III:398-399
use of, II:158-159, II:169
using EWMA in, III:411
Cox-Ingersoll-Ross (CIR) model, I:260, I:491-492, I:547, I:548, III:546-547, III:656
Cox processes, I:315-316, II:470-471
I:277-278, I:291f, I:292, I:292f
CPRs (conditional prepayment rates). See prepayment, conditional
prepayment, conditional
Cramer, Harald, II:470-471
Crank-Nicolson schemes, II:666, II:669, II:674, II:680
Crank Nicolson-splitting (CN-S) schemes, II:675
Crashmetrics, use of, III:379, III:380
Credible intervals, I:156
Credit-adjusted spread trees, I:274
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of 2008,777:381 
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Credit default swaps (CDSs). Sec CDSs 
(credit default swaps) 
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definitions of, 7:528 
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exchanges/payments in, 7:231/ 
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prepayments from, 777:49-50 
protection against, 7:230 
and simultaneous defaults, 7:323 
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computation of, 7:382-383 
distribution of, 7:369/ 
example of distribution of, 7:386/ 
simulated, 7:389 

steps for simulation of, 7:379-380 
Credit models, 7:300, 7:302,7:303 
Credit performance, evolution of, 
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categories of, 7:362 
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disadvantages of, 7:300-301 
implied, 7:381-382 
maturity of, 7:301 
reasons for, 7:300 
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defined, 7:361 
distribution of, 7:377 
importance of, 777:81 
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modeling, 7:299-300, 7:322, 777:183 
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applications of, 7:404-405 
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drivers of, 7:402 
interpretation of, 7:403-404 
model specification, 7:403 
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risk in, 77:279f 
use of, 7:222-223 

Credit support, evaluation of, III:39-40
Credit value at risk (CVaR). See CVaR 
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in, 777:378-380 

Critical line algorithm (CLA), 7:73 
Cross-trading, 77:85n 
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77:413-414 

Crude oil, 7:561, 7:562 
Cumulation, defined, 777:471 
Cumulative default rate (CDX), 777:58 
Cumulative frequency distributions,
II:493f, II:493t, II:498-499
formal presentation of, II:492-493
Currency put options, 7:515 
Current ratio, 77:554 
Curve imbalances, 77:270-271 
Curve options, 777:553 
Curve risk, 77:275-278 
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77:202-203 
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See also value at risk (VaR) 
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Dark pools, 77:450, 77:454 
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acquisition and processing of, 

77:198 

alignment of, 77:202-203 
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bias of, 77:204, 77:713 
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cross-sectional, 77:201, 77:488, 77:488/ 
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Data ( Continued ) 
high-frequency (HFD) (See
high-frequency data (HFD))
historical, II:77-78, II:122, II:172
housing bubble, II:397-399
importing into MATLAB,
III:433-434
industry-specific, II:105
integrity of, II:201-203
levels and scale of, II:486-487
long-term, III:389-390
in mean-variance, I:193-194
misuse of, II:108
on operational loss, III:99
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pooling of, 777:96 
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preliminary analysis of, 777:362 
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77:493-494 
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restatements of, 77:202 
sampling of, II:459f, II:711
scarcity of, II:699-700, II:703-704,
II:718
sorting and counting of, II:488-491
structure/sample size of, II:703
types of, II:486-488
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working with, 77:201-206 
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Compustat Point-In-Time, 77:238 
Factiva, 77:482 
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third-party, 77:198,77:211n 
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77:295-296, 77:298/, 77:502, 77:702, 
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Days payables outstanding (DPO), 
calculation of, 77:553-554 
Days sales outstanding (DSO), 
calculation of, 77:553 
DCF (discounted cash flow) models, 
77:16, 77:44-45 

DDM (dividend discount models). See 
dividend discount models 
(DDM) 

Debt 

long-term, in financial statements, 
77:542 

models of risky, 7:304-307 
restructuring of, 7:230 
risky, 7:307-308 
Debt-to-assets ratio, 77:559 
Debt-to-equity ratio, 77:559 
Decomposition models 
active/passive, 777:19 
Default correlation, 7:317-318 
contagion, 7:353-354 
cyclical, 7:352,7:353 
linear, 7:320-321 
measures of, 7:320-321 
tools for modeling, 7:319-333 
Default intensity, 777:225 
Default models, 7:321-322, 7:370/ 
Default probabilities 
adjustments in real time, 7:300-301 
between companies, 7:412-413 
cyclical rise and fall, 7:408/ 7:409/ 
defined, 7:299-300 
effect of business cycle on, 7:408 
effect of rating outlooks on, 
7:365-366 

empirical approach to, 7:362-363 
five-year (Bank of America and 
Citigroup), 7:301/ 7:302/ 
merits of approaches to, 7:365 
Merton's approach to, 7:363-365 
probability of, 77:727, 77:727/ 77:728/ 
and survival, 7:533-535 
and survival probability, 7:323-324 
term structure of, 7:303 
time span of, 7:302-303 
vs. ratings and credit scores, 
7:300-302 

for Washington Mutual, I:415f,
I:416f
of Washington Mutual, I:415f,
I:416f

Defaults 

annual rates of, I:363
and Bernoulli distributions,
III:169-170
calculation of monthly, III:61f
clustering of, I:324-325
contagion, I:320
copulas for times, I:329-331
correlation of between companies,
I:411
cost of, I:401, I:404f
dollar amounts of, III:59f
effect of, I:228, III:645
event vs. liquidation, I:349
factors influencing, III:74-75
first passage model of, 7:349 
historical database of, 7:414 
intensity of, 7:330, 7:414 
looping, 7:324-325 
measures of, 777:58-59 
in Merton approach, 7:306 
Moody's definition of, 7:363 
predictability of, 7:346-347 
and prepayments, III:49-50,
III:76-77

process, relationship to recovery 
rate, 7:372 

pseudo intensities, 7:330 
rates of cumulative/conditional, 
777:63 

recovery after, 7:316-317 
risk of, 7:210 

simulation of times, 7:322-324,7:325 
threshold of, 7:345-346 
times simulation of, 7:319 
triggers for, 7:347-348 
variables in, 7:307-308 
Default swaps 

assumptions about, 7:531-532 
and credit events, 7:530 
digital, 7:537 
discussion of, 7:526-528 
market relationship with cash 
market, 7:530 

and restructuring, 7:528-529 
value of spread, 7:534 
Default times, 7:332 
Definite covariance matrix, 77:445 
Deflators, 7:129, 7:136 
Degrees, in ordinary differential 
equations, 77:644-645 
Degrees of freedom (DOF) 
across assets and time, 77:735-736 
in chi-square distribution, 777:212 
defined, 77:734 

for Dow Jones Industrial Average 
(DJIA), 77:735-737, 77:737/ 
prior distribution for, 7:177 
range of, 7:187n 

for S&P 500 index stock returns, 
77:735-736,77:736/ 

Delinquency measures, III:57-58
Delivery date, I:478
Delta, I:509, I:516-518, I:521
Delta-gamma approximation, I:519,
III:644-645
Delta hedging, I:413, I:416, I:418, I:517
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Delta profile, 1:518/ 

Densities
beta, III:108f
Burr, III:110f
closed-form solutions for, III:243
exponential, III:105-106, III:105f
gamma, III:108f
Pareto, III:109f
posterior, I:170f
two-point lognormal, III:111f
Density curves, I:147f
Density functions
asymmetric, III:205f
of beta distribution, III:222f
chi-square distributions, III:213f
common means, different variances,
III:203f
computing probabilities from,
III:201
discussion of, III:197-200
of F-distribution, III:217f
histogram of, III:198f
of log-normal distribution, III:223f
and normal distribution, II:733
and probability, III:206
rectangular distributions, III:220
requirements of, III:198-200
symmetric, III:204f
of t-distribution, III:214f
Dependence, I:326-327, II:305-308
Depreciation, II:22
accumulated, II:533-534
expense vs. book value, II:539f
expense vs. carrying value, II:540f
in financial statements, II:537-539
on income statements, II:536
methods of allocation, II:537-538
Derivatives 

construction of, 11:586-587 
described, 11:585-586 
embedded, 1:462 
energy, 1:558 
exotic, 1:558,1:559-560 
of functions, defined, 11:593 
and incomplete markets, 1:462 
interest rate. 111:589-590 
nonlinearity of, 111:644-645 
OTC, 1:538 

pricing of, 1:58, 111:594-596 
pricing of financial, 111:642-643 
relationship with integrals, 11:590 
for shipping assets, 1:555,1:558, 
1:565-566 

use of instruments, 1:477 
valuation and hedging of, 1:558-560 
vanilla, 1:559 
Derman, Emanuel, 11:694 
Descriptors, 11:140,11:246-247,11:256 
Determinants, 11:623 


Deterministic methods 
usefulness of, 11:685 
Diagonal VEC model (DVEC), 11:372 
Dice, and probability, III:152, III:153,
III:155-156, III:156f
Dickey-Fuller statistic, 11:386-387 
Dickey-Fuller tests, 11:514 
Difference, notation of, 1:80 
Differential equations 
classification of, 11:657-658 
defined, 1:95,11:644,11:657 
first-order system of, 11:646 
general solutions to, 11:645 
linear, 11:647-648 
linear ordinary, 11:644—645 
partial (PDE), 11:643,11:654-657 
stochastic, 11:643-644 
systems of ordinary, 11:645-646 
usefulness of, 11:658 
Diffusion, 111:539, 111:554-555 
Diffusion invariance principle, 1:132 
Dimensionality, curse of, 11:673,111:127 
Dirac measures, 111:271 
Directional measures, 11:428,11:429 
Dirichlet boundary conditions, 11:666 
Dirichlet distribution, I:181-183,
I:186-187n

Discounted cash flow (DCF) models, 
11:16,11:44-45 

Discount factors, 1:57-58,1:59-62,1:60, 
11:600-601 
Discount function 
calculation of, 111:571 
defined, 111:563 
discussion of. 111:563-565 
forward rates from. 111:566-567 
graph of, 111:563/ 
for on-the-run Treasuries, 

111:564-565 

Discounting, defined, 11:596 
Discount rates, 1:211,1:212,1:215-216, 
11:6 

Discovery heuristics, 11:711 
Discrepancies, importance of small, 
11:696 

Discrete law, III:165-169
Discrete maximum principle, 11:668 
Discretization, 1:265,11:669/ 11:672 
Disentangling, 11:51-56 
complexities of, 11:55-56 
predictive power of, 11:54-55 
return revelation of, 11:52-54 
usefulness of, 11:52,11:58 
Dispersion measures, 111:352, 

111:353-354,111:357 
Dispersion parameters. 111:202-205 
Distress events, 1:351 
Distributional measures, 11:428 
Distribution analysis, cash flow, 111:310 


Distribution function, 111:218/ 111:224/ 
Distributions 

application of hypergeometric, 
111:177-178 

beliefs about, 1:152-153 
Bernoulli, III:169-170, III:185t
beta, I:148, III:108
binomial, I:81f, III:170-174, III:185t,
III:363

Burr, 111:109-110 

categories for extreme values, 11:752 
common loss, 111:1121 
commonly used, 111:225 
conditional, 111:219 
conditional posterior, 1:178-179, 
1:182-183,1:184-185 
conjugate prior, 1:154 
continuous probability, 111:195-196 
discrete, 111:1851 
discrete cumulative. 111: 166 
discrete uniform, 111:183-184, 
111:1851,111:638/ 

empirical, 11:498,111:104-105,111:105/ 
exponential, 111:105-106 
finite-dimensional, 11:502 
of Frechet, Gumbel and Weibull, 
111:267/ 

gamma, 111:107-108, 111:221-222 
Gaussian, 111:210-212 
Gumbel, 111:228,111:230 
heavy-tailed, I:186n, II:733, III:109,
III:260
hypergeometric, III:174-178, III:185t
indicating location of, III:235
infinitely divisible, III:253-256,
III:253t
informative prior, I:152-153
inverted Wishart, I:172
light- vs. heavy-tailed, III:111-112
lognormal, III:106, III:106f,
III:538-539

mixture loss, 111:110-111 
for modeling applications, 111:257 
multinomial. 111:179-182,111:1851 
non-Gaussian, 111:254 
noninformative prior, 1:153-154 
normal (See normal distributions) 
parametric, 111:201 
Poisson, 1:142,111:182-183,111:1851, 
111: 217-218 

Poisson probability, 111:1871 
posterior, 1:147-148,1:165,1:166-167, 
1:169-170,1:177,1:183-184 
power-law. 111:262-263 
predictive, 1:167 
prior, 1:177,1:181-182,1:196 
proposal, 1:183-184 
representation of stable and CTS, 
11:742-743 
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Distributions ( Continued ) 
spherical, 71:310 

stable, 777:238, 777:242, 777:264-265, 
777:384 (See also a-stable 
distributions) 

subexponential, 777:261-262 
tails of, 777:112/, 777:648 
tempered stable, 777:257, 777:382 
testing applied to truncated, 777:367 
Diversification, 77:57-58 
achieving, 7:10 
and cap weighting, 7:38 
and credit default swaps, 7:413-414 
example of, 7:15 
international, 77:393-396 
Markowitz's work on, 77:471 
Diversification effect, 777:321 
Diversification indicators, 7:192 
Dividend discount models (DDM) 
applied to electric utilities, 77:127 
applied to stocks, 77:16-17 
basic, 77:5 

constant growth, 77:7-9, 77:17-18 
defined, 77:14 
finite life general, 77:5-7 
free cash flow model, 77:21-23 
intuition behind, 77:18-19 
multiphase, 77:9-10 
non-constant growth, 77:18 
predictive power of, 77:54 
in the real world, 77:19-20 
stochastic, 77:10-12, 77:127 
Dividend payout ratio, 77:4, 77:20 
Dividends 

expected growth in, 77:19 
forecasting of, 77:6 
measurement of, 77:3-4, 77:14 
per share, 77:3—4 
reasons for not paying, 77:27 
required rate of return, 77:19 
and stock prices, 77:4-5 
Dividend yield, 77:4, 77:19 
Documentation 

of model risk, 77:696, 77:697 
Dothan model, 7:491, 7:493 
Dow Jones Global Titans 500 (DJGTI), 
77:4907, 77:4917 

Dow Jones Industrial Average (DJIA) 
in comparison of risk models, 
77:747-751 

components of, 77:4897 
fitted stable tail index for, 77:740/ 
frequency distribution in, 77:4897 
performance (January 2004 to June 
2011), 77:749/ 

relative frequencies, 77:4917 
stocks by share price, 77:4927 
Drawing without replacement, 
777:174-177 


Drawing with replacement, 777:170, 
777:174, 777:179-180 

Drift 

effects of, 777:537 
of interest rates, 7:263 
in randomness calculations, 777:535 
in random walks, 7:84,7:86 
time increments of, 7:83 
of time series, 7:80 
as variable, 777:536 
DTS (duration times spread), 7:392, 
7:393-394, 7:396-398 
Duffie-Singleton model, 7:542-543 
Dupire's formula, 77:682-683,77:685 
DuPont system, 77:548-551, 77:551/ 
Duration 

calculations of real yield and 
inflation, 7:286 
computing of, 7:285 
defined, 7:284, 777:309 
effective, 777:300-304, 777:6177 
effective/option adjusted, 777:13 
empirical, of common stock, 
77:318-322,77:319-3227 
estimation of, 77:3237 
measurement of, 777:12-13, 
777:304-305 
models of, 77:461 
modified vs. effective, III:299
Duration/convexity, effective, I:255,
I:256f
Duration times spread (DTS). See DTS
(duration times spread)
Durbin-Watson test, 777:647 
Dynamical systems 

equilibrium solution of, 77:653 
study of, 77:651 

Dynamic conditional correlation 
(DCC) model, 77:373 
Dynamic term structures, 777:576-577, 
777:578-579,777:591 

Early exercise, 7:447, 7:455. See calls, 
American-style; options 
Earnings before interest, taxes, 

depreciation and amortization 
(EBITDA), 77:566 

Earnings before interest and taxes 
(EBIT), 77:23, 77:547, 77:556 
Earnings growth factor, 77:223 
Earnings per share (EPS), 77:20-21, 
77:38-39, 77:537 

Earnings revisions factor, 77:207,77:209/ 
EBITDA/EV factor 
correlations with, 77:226 
examples of, 77:203, 77:203/ 77:207, 
77:208/ 

in models, 77:232, 77:238-239 
use of, 77:222-223 


Econometrics 
financial, 77:295,77:298-300, 
77:301-303 

modeling of, 77:373, 77:654 
Economic cycles, I:537, II:42-43
Economic intuition, 77:715-716 
Economic laws, changes in, 77:700 
Economy 

states of, 7:49-50,77:518-519, 777:476 
term structures in certain, 
777:567-568 

time periods of, 77:515-516 
Economy as an Evolving Complex 

System, The (Anderson, Arrow, 
& Pines), 77:699 

Educated guesses, use of, 7:511 
EE (explicit Euler) scheme, 77:674, 
77:677-678 

Effective annual rate (EAR), interest, 
77:616-617 
Efficiency 

in estimation, 777:641-642 
Efficient frontier, 7:13-14, 7:17/ 7:289/ 
Efficient market theory, 77:396,777:92 
Eggs, rotten, 7:457-458 
Eigenvalues, 77:627-628,77:705, 
77:706-707/ 77:707f 
Einstein, Albert, 77:470 
Elements, defined, 777:153-154 
Embedding problem, and change of 
time method, 777:520 
Emerging markets, transaction costs 
in, 777:628 

EM (expectation maximization) 
algorithm, 77:146, 77:165 
Empirical rule, 777:210, 777:225 
Endogenous parameterization, 
777:580-581 
Energy 

cargoes of, 7:561-562 
commodity price models, 7:556-558 
forward curves of, 7:564-565 
power plants and refineries, 7:563 
storage of, 7:560-561, 7:563-564 
Engle-Granger cointegration test, 
77:386-388, 77:391-392, 77:395 
Entropy, 777:354 

EPS (earnings per share), 77:20-21, 
77:38-39, 77:537 

Equally weighted moving average,
III:400-402, III:406-407,
III:408-409
Equal to earnings before interest and
taxes (EBIT), II:23, II:547, II:556
Equal-variance assumption, I:164,
I:167

Equations 

difference, homogenous vs. 
nonhomogeneous, 77:638 
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difference vs. differential, II:629 
diffusion, II: 654-656, 11: 658n 
error-correction, II: 391, II:395f 
homogeneous linear difference, 

II:639-642,11:641/ 
homogenous difference, II:630-634, 
11:631-632/ 11:633-634/ 11:642 
linear, II:623-624 
linear difference, systems of,
II:637-639

matrix characteristics of, II: 628 
no arbitrage, 111:612,111:617-619 
nonhomogeneous difference, 

II: 634-637,11:635/ II:637-638/ 
stochastic. III: 478 
Equilibrium 

and absolute valuation models, 
1:260 

defined, II: 385-386 
dimensions of. III: 601 
in dynamic term structure models, 
III: 576 

expectations for, 11:112 
expected returns from, 11:112 
modeling of, 111:577,111:594 
in supply and demand. III:568 
Equilibrium models 
use of. III:603-604 
Equilibrium term structure models, 
III: 601 
Equities, 1:279 
investing in, II:89-90 
Equity 

on the balance sheet, 11:535 
changes in homeowner, 111:73 
in homes. III: 69 
as option on assets, 1:304-305 
shareholders', 11:535 
Equity markets, 11:48 
Equity multipliers, 11:550 
Equity risk factor models, 11:173-178 
Equivalent probability measures, 
1:111,111:510-511 
Ergodicity, defined, 11:405 
Erlang distribution. III:221-222 
Errors. See also estimation error; 
standard errors 

absolute percentages of, 11:525/ 
11:526/ 

estimates of, 11:676 
in financial models, 11:719 
a posteriori estimates, II:672-673 
sources of, 11:720 
terms for, 11:126 
in variables problem, 11:220 
Esscher transform, 111:511, III: 514 
Estimates/estimation 
confidence in, 1:199 
consensus, 11:34-35 


equations for, 1:348-349 
in EVT, 111:272-274 
factor models in, 11:154 
with GARCH models, 11:364-365 
in-house from firms, II: 35 
maximum likelihood, 11:311-313 
methodology for, 11:174—176 
and PCA, 11:167/ 
posterior, 1:176 
posterior point, 1:155-156 
processes for, 1:193,11:176 
properties of for EWMA, III:410-411
robust, 1:189 
techniques of, 11:330 
use of, 11:304 
Estimation errors 

accumulation of, II:78

in the Black-Litterman model, 1:201 

covariance matrix of, 111:139-140 

effect of, 1:18 

pessimism in, 111:143 

in portfolio optimization, II: 82, 

III: 138-139 
sensitivity to, 1:191 
and uncertainty sets, 111:141 
Estimation risk, 1:193 
minimizing. III: 145 
Estimators 
bias in. III: 641 
efficiency in, 111:641-642 
equally weighted average, 
111:400-402 
factor-based, 1:39 
terms used to describe, 11:314 
unbiased. III: 399 
variance, 11:313 

ETL (expected tail loss). III: 355-356 
Euler approximation, II: 649-650, 
11:649/ 11:650/ 

Euler constant. III: 182 

Euler schemes, explicit/implicit, II: 666 

Europe 

common currency for, 11:393 
risk factors of, 11:174 
European call options 
Black-Scholes formula for, 

III: 639-640 

computed by different methods, 

III: 650-651,111:651/ 
explicit option pricing formula, 

III: 526-527 

pricing by simulation in VBA, 
111:465-466 

pricing in Black-Scholes setting, 

III: 649 

simulation of pricing, III:444-445,
III:462-463

and term structure models, 
111:544-545 


European Central Bank, 1:300 
Events 

defined. III: 85, III: 162, III: 508 
effects of macroeconomic, 11:243-244 
extreme, 111:245-246, III: 260-261, 

III: 407 

identification of, 11:516 
mutually exclusive. III: 158 
in probability. III: 156 
rare. III: 645 
rare vs. normal, I:262
tail, III:88n, 111:111,111:118 
three-,5, III: 381-382 
EVT (extreme value theory). See
extreme value theory (EVT)
EWMA (exponentially weighted
moving averages), III:409-413
Exceedance observations. III: 362-363 
Exceedances, of VaR, III: 325-326, 

III: 339 

Excel 

accessing VBA in. III: 477 
add-ins for, 1:93, III: 651 
data series correlation in, 1:92-93 
determining corresponding 
probabilities in. III: 646 
Excel Link, 111:434 
Excel Solver, 11:70 
interactions with MATLAB, III: 448 
macros in. III: 449,111:454—455 
notations in, III:477n 
random number generation in, 
111:645-646 

random walks with, 1:83,1:85,1:87, 
1:90 

@RISK in, II:12f 
syntax for functions in. III: 456 
Exchange-rate intervention, study on, 
111:177-178 

Exercise prices, 1:452,1:484,1:508 
Expectation maximization (EM) 
algorithm, 11:146,11:165 
Expectations, conditional, 1:122, 

II: 517-518, III: 508-509 
Expectations hypothesis. III: 568-569, 
III: 601n 

Expected shortfall (ES), 1:385-386, 

III:332. See also average value at 
risk (AVaR) 

Expected tail loss (ETL), III:291, 

111:293/ 111:345-347,111:347/ 

III:355-356 

Expected value (EV), 1:511 
Expenses, noncash, 11:25 
Experiments, possibility of, 11:307 
Explicit costs, defined. III: 623 
Explicit Euler (EE) scheme, 11:674, 

II:677-678 

Exponential density function, 111:218/ 
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Exponential distribution. III: 217-219 
applications in finance. III :219 
Exponentially weighted moving 
averages (EWMA) 
discussion of, III:409-413
forecasting model of, III:411
properties of the estimates,
III:410-411
standard errors for, III:411-412
statistical methodology in, III:409
usefulness of, III:413-414
volatility estimates for, III:410f
Exposures 

calculation of, II:247t 
correlation between, 77:186 
distribution of, 77:250/, 17:251/, 77:254 
management of, 77:182-183 
monitoring of portfolio, 77:249-250 
name-specific, 17:188 
Extrema, characterization of local, 

7:23 

Extremal random variables, 777:267 
Extreme value distributions, 
generalized, 777:269 
Extreme value theory (EVT), 
77:744-746, 777:95, 777:228 
defined, 777:238 
for IID processes, 777:265-274 
in IID sequences, 777:275 
role of in modeling, 77:753n 

Factor analysis 
application of, 77:165 
based on information coefficients, 
77:222 

defined, 77:141, 77:169 
discussion of, 77:164-166 
importance of, 77:238 
vs. principal component analysis, 
77:166-168 

Factor-based strategies 
vs. risk models, 77:236 
Factor-based trading, 77:196-197 
model construction for, 77:228-235 
performance evaluation of, 
77:225-228 

Factor exposures, 77:247-248, 
77:275-283 

Factorials, computing of, 777:456 
Factorization, defined, 77:307 
Factor mimicking portfolio (FMP), 
77:214 

Factor model estimation, 77:142-147, 
77:150 

alternative approaches and 
extensions, 77:145-147 
applied to bond returns, 77:144-145 
computational procedure for, 
77:142-144 


fixed N, 77:143 
large N, 77:143-144 
Factor models 

in the Black-Litterman framework, 
7:200 

commonly used, 77:150 
considerations in, 77:178 
cross-sectional, 77:220-221 
defined, 77:153 
fixed income, 77:271-272 
in forecasting, 77:230-231 
linear, 77:154-156,77:168 
normal, 77:156 
predictive, 77:142 
static/dynamic, 77:146-147, 

77:155 

in statistical methodology, 77:141 
strict, 77:155-156 
types of, 77:138-142 
usefulness of, 77:154, 77:503 
use of, 7:354, 77:137,77:150, 77:168, 
77:219-225 

Factor portfolios, 77:224-225 
Factor premiums, cross-sectional 
methods for evaluation of, 
77:214-219 

Factor returns, 77:1917, 77:1927 
calculation of, 77:248 
Factor risk models, 77:113, 77:119 
Factors 

adjustment of, 77:205-206 
analysis of data of, 77:206-211 
categories of, 77:197 
choice of, 77:232-235 
defined, 77:196, 77:211 
desirable properties of, 77:200 
development of, 77:198 
estimation of types of, 77:156 
graph of, 77:166/ 
known, 77:138-139 
K systematic, 77:138-139 
latent, 77:140-141,77:150 
loadings of, 77:144, 77:1457, 77:155, 
77:1667, 77:167/ 77:1687 
market, 77:176 

orthogonalization of, 77:205-206 
relationship to time series, 77:168/ 
sorting of, 77:215 
sources for, 77:200-201 
statistical, 77:197 

summary of well-known, 77:1967 
transformations applied to, 77:206 
use of multiple, 77:141-142 
Failures, probability of, 77:726-727 
Fair equilibrium, between multiple 
accounts, 77:76 
Fair value 

determination of, 777:584-585 
Fair value, assessment of, 77:6-7 


Fama, Eugene, II:468, II:473-474
Fama-French three-factor model, 
77:139-140, 77:177 

Fama-MacBeth regression, 77:220-221, 
77:224, 77:227-228,77:228/ 77:237, 
77:240n 

Fannie Mae/Freddie Mac, 

writedowns of, 777:77n 
Fast Fourier transform algorithm, 
77:743 
Fat tails 

of asset return distributions, 

777:242 

in chaotic systems, 77:653 
class 2 , 777:261-263 
comparison between risk models, 
77:749-750 
effects of, 77:354 
importance of, 77:524 
properties of, 777:260-261 
in Student's t distribution, 77:734 
Favorable selection, 777:76-77 
F-distribution, 777:216-217 
Federal Reserve 

effects of on inflation risk premium, 
7:281 

study by Cleveland Bank, 
777:177-178 

timing of interventions of, 777:178 
Feynman-Kac formulas, 77:661 
FFAs (freight forward agreements), 
7:566 

Filtered probability spaces, 7:314-315, 
7:334n 

Filtration, 77:516-517, 111:476-477, 
777:489-490, 777:508 
Finance, three major revolutions in, 
777:350 

Finance companies, captive, 7:366-369 
Finance theory 
development of, 11:467-468 
effect of computers on, 77:476 
in the nineteenth century, 
77:468-469,77:476 
in the 1960s, 77:476 
in the 1970s, 77:476 
stochastic laws in, 777:472 
in the twentieth century, 77:476 
Financial assets, price distribution of, 
777:349-350 

Financial crisis (2008), 777:71 
Financial data, pro forma, II:542-543
Financial distress, defined, 7:351 
Financial institutions, model risk of, 
77:693 

Financial leverage ratios, 77:559-561, 
77:563 

Financial modelers, mistakes of, 
77:707-710 




Financial planning, 111:126-127, III: 128, 
III: 129 

Financial ratios, II: 546,11:563-564 
Financial statements 

assumptions used in creating, 

II: 532 

data in, 11:563 

information in, 11:533-542,11:543 
pro forma, 11:22-23 
time statements for, 11:532 
usefulness of, 11:531 
use of, 11:204-205,11:246 
Financial time series, 1:79-80, 

1:386-387,11:415-416,11:503-504 
Financial variables, modeling of, 
111:280 

Find, in MATLAB, 111:422 
Finite difference methods, 11:648-652, 
11:656-657,11:665-666, 

11:674-675,11:676-677,111:19 
Finite element methods, 11:669-670, 
11:672,11:679-681 
Finite element space, 11:670-672 
Finite life general DDM, 11:5-7 
Finite states, assumption of, 1:100-101 
Firms 

assessment of, 11:546-547 
and capital structure, 11:473 
characteristics of, 11:94,11:176-177, 
11:201 

clientele of, 11:36 
comparable, 11:34,11:35-36 
geographic location of, 11:36 
history vs. future prospects, II:92
phases of, II:9-10
retained earnings of, II:20
valuation of, II:26-27, II:473
value of, II:27-31, II:39
vs. characteristics of group, II:90-91
First boundary problem, 11:655-656, 
11:657/ 

First Interstate Bancorp, 1:304 
analysis of credit spreads, 1:305f 
debt ratings of, 1:410 
First passage models (FPMs), 1:342, 
1:344-348 

Fisher-Tippett theorem, III:266-267
Fisher, Ronald, 1:140 
Fisherian, defined, 1:140 
Fisher's information matrix, 1:160n 
Fisher's law, 11:322-323 
Fixed-asset turnover ratio, 11:558 
Fixed-charge coverage ratio, 

11:560-561 

Flesaker-Hughston (FH) model, 

111:548-549 

Flows, discrete, 1:448-453 
FMP (factor mimicking portfolio), 
11:214 


Footnotes, in financial statements, 

11:541-542 

Ford Motor Company, 1:408/ 1:409/ 
Forecastability, 11:132 
Forecastability, concept of, 11:123 
Forecast encompassing 
defined, 11:230-231 
Forecasts 

of bid-ask spreads, II:456-457
comparisons of, II:420-421
contingency tables, II:429f
development of, II:110-114
directional, II:428
effect on future of, II:122-123
errors in, II:422f
evaluation of, II:428-430, III:368-370
machine-learning approach to,
II:128
measures of, II:429-430, II:430
need for, II:110-111
in neural networks, II:419-420
one-step ahead, II:421f
parametric bootstraps for,
II:428-430

response to macroeconomic shocks, 
11:55/ 

usefulness of, II: 131-132 
use of models for, 11:302 
of volatility, 111:412 
Foreclosures, III: 31,111:75 
Forward contracts 
advantages of, I:430
buying assets of, I:439
defined, I:426, I:478
equivalence to futures prices,
I:432-433
hedging with, I:429, I:429t
as OTC instruments, I:479
prepaid, I:428
price paths of, I:428f
short vs. long, I:437-438, I:438f
valuing of, I:426-430
vs. futures, I:430-431, I:433
vs. options, I:437-439
Forward curves 
graph of, 1:434/ 
modeling of, 1:533,1:557-558, 
1:564-565 

normal vs. inverted, 1:434 
of physical commodities, 1:555 
Forward freight agreements (FFAs), 
1:555,1:558,1:566 

Forward measure, use of, 1:543-544 
Forward rates 

calculation of, 1:491,111:572 

defined, 1:509-510 

from discount function, 111:566-567 

implied, 111:565-567 

models of. 111:543-544 


from spot yields, 111:566 
of term structure, 111:586 
Fourier integrals, 11:656 
Fourier methods, 1:559-560 
Fourier transform, 111:265 
FPMs (first passage models), 1:342, 
1:344-348 

Fractals, II:653-654, III:278-280,
III:479-480
Franklin Templeton Investment
Funds, II:496f, II:497f, II:498f
Frechet distribution, II:754n, III:228,
III:230, III:265, III:267, III:268
Frechet-Hoeffding copulas, I:327,
I:329
Freddie Mac, II:77n, II:754n, III:49
Free cash flow (FCF), 11:21-23 
analysis of, 11:570-571 
calculation of, 11:23-24,11:571-572 
defined, 11:569-571,11:578 
expected for XYZ, Inc., II:30t
financial adjustments to, II:25-26
statement of, direct method,
II:24-25, II:24f
statement of, indirect method,
II:24-25, II:24f
vs. cash flow, II:22-23
Freedman-Diaconis rule, 11:494,11:495, 
11:497 
Frequencies 

accumulating, 11:491-492 
distributions of, 11:488-491,11:499/ 
empirical cumulative, 11:492 
formal presentation of, 11:491 
Frequentist, 1:140,1:148 
Frictions, costs of, 11:472-473 
Friedman, Milton, 1:123 
Frontiers, true, estimated and actual 
efficient, 1:190-191 
F_SCORE, use of, 11:230-231 
F-test, 11:336,11:337,11:344,11:425, 

11:426 

FTSE 100, volatility in. 111:412-413 
Fuel costs, 1:561,1:562-563. See also 
energy 

Full disclosure, defined, 11:532 
Functional, defined, 1:24 
Functional-coefficient autoregressive 
(FAR) model, 11:417 
Functions 
affine, 1:31 

Archimedean, 1:329,1:330-331,1:331 
Bessel, of the third kind, 11:591 
beta, 11:591 

characteristic, 11:591-592,11:593 
choosing and calibrating of, 
1:331-333 

Clayton, Frank, Gumbel, and 
Product, 1:329 
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Functions ( Continued) 
continuous, 77:581-584,77:582/, 
77:583,77:592-593 

continuous/discontinuous, 77:582/ 
convex, 7:24-27, 7:25, 7:25/ 7:26/ 
convex quadratic, 7:26, 7:31/ 
copula, 7:320, 7:325-333, 7:407-408 
for default times, 7:329-331 
defined, 7:24, 7:333 
density, 7:141 
with derivatives, 77:585/ 
elementary, 777:474 
elliptical, 7:328-329 
empirical distribution, 777:270 
factorial, 77:590-591 
gamma, 77:591, 77:591/ 777:212 
gradients of, 7:23 
Heaviside, 77:418-419 
hypergeometric, 777:256, 777:257 
indicator, 77:584-585, 77:584/ 77:593 
likelihood function, 7:141-143, 
7:143/ 7:144/ 7:148,7:176, 7:177 
measurable, 777:159-160, 777:160/ 
777:201 

minimization and maximization of 
values, 7:22, 7:22/ 
monotonically increasing, 
77:587-588, 77:588/ 
nonconvex quadratic, 7:26-27 
nondecreasing, 777:154-155,777:155/ 
normal density, 777:226/ 
optimization of, 7:24 
parameters of copulas, 7:331-332 
properties of quasi-convex, 7:28 
quasi-concave, 7:27-28, 7:27/ 
right-continuous, 777:154—155, 
777:155/ 

surface of linear, 7:33/ 
with two local maxima, 7:23/ 
usefulness of, I:411-412
utility, 7:4-5, 7:14-15, 7:461 
Fund management, art of, 7:273 
Fund separation theorems, 7:36 
Futures 

Eurodollar, 7:503 
hedging with, 7:433 
market for housing, 77:396-397 
prices of, and interest rates, 7:435n 
telescoping positions of, I:431-432
theoretical, I:487
valuing of, I:430-433
vs. forward contracts, I:430-431
Futures contracts 
defined, 7:478 
determining price of, 7:481 
pricing model for, I:479-481
theoretical price of, I:481-484
vs. forward contracts, I:433,
I:478-479


Futures options, defined, 7:453 
Future value, 77:618 
determining of money, 

77:596-600 

Galerkin methods, principle of,
II:671
Gamma, I:509, I:518-520
Gamma process, III:498
Gamma profile, I:519f
Gapping effect, I:509
GARCH (generalized autoregressive
conditional heteroskedastic)
models
asymmetric, II:367-368
exponential (EGARCH), II:367-368
extensions of, III:657
factor models, II:372
GARCH-M (GARCH in mean),
II:368
Markov-switching, I:180-184
time aggregation in, II:369-370
type of, II:131
usefulness of, III:414
use of, I:175-176, I:185-186, II:371,
II:733-734, III:388
and volatility, I:179
weights in, II:363-364
GARCH(1,1) model
Bayesian estimation of, I:176-180
defined, II:364
results from, II:366, II:366t
skewness of, III:390-391
strengths of, III:388-389
Student's t, I:182
use of, I:550-551, III:656-657
GARCH(1,1) process, I:551t
Garman-Kohlhagen system, I:510-511,
I:522

Gaussian density, III:98f
Gaussian model, III:547-548
Gaussian processes, III:280, III:504
Gaussian variables, and Brownian
motion, III:480-481
Gauss-Markov theorem, II:314
GBM (geometric Brownian motion),
I:95, I:97
GDP (gross domestic product), I:278,
I:282, II:138, II:140
General inverse Gaussian (GIG)
distribution, II:523-524
Generalized autoregressive
conditional heteroskedastic
(GARCH) models. See GARCH
(generalized autoregressive
conditional heteroskedastic)
models
Generalized central limit theorem,
III:237, III:239
Generalized extreme value (GEV)
distribution, II:745, III:228-230,
III:272-273
Generalized inverse Gaussian
distribution, use of, II:521-522
Generalized least squares (GLS),
I:198-199, II:328
Generalized tempered stable (GTS)
processes, III:512
Generally accepted accounting
principles (GAAP), II:21-22,
II:531-532, II:542-543
Geometric mean reversion (GMR)
model, I:91-92
computation of, I:91
Gibbs sampler, I:172n, I:179, I:184-185
GIG models, calibration of, II:526-527
Gini index of dissimilarity (Gini
measure), III:353-354
Ginnie Mae/Fannie Mae/Freddie
Mac, actions of, III:49
Girsanov's theorem
and Black-Scholes option pricing
formula, I:132-133
with Brownian motion, III:511
and equivalent martingale
measures, I:130-133
use of, I:263, III:517
Glivenko-Cantelli theorem, III:270,
III:272, III:348n, III:646
Global Economy Workshop, Santa Fe
Institute, II:699
Global Industry Classification
Standard (GICS®), II:36-37,
II:248
Global minimum variance (GMV)
portfolios, I:39
GMR (geometric mean reversion)
model, I:91-92
GMV (global minimum variance)
portfolios, I:15, I:194-195
GNP, growth rate of (1947-1991),
II:410-411, II:410f
Gradient methods, use of, II:684
Granger causality, II:395-396
Graphs, in MATLAB, III:428-433
Greeks, the, I:516-522
beta and omega, I:522
delta, I:516-518
gamma, I:518-520
rho, I:521-522
theta, I:509, I:520-521
use of, I:559, II:660, III:643-644
vega, I:521
Greenspan, Alan, I:140-141
Growth, I:283f, II:239, II:597-598,
II:601-602
Gumbel distribution, III:265, III:267,
III:268-269
Hamilton-Jacobi equations, II:675
Hankel matrices, II:512
Hansen-Jagannathan bound, I:59,
I:61-62
Harrison, Michael, II:476
Hazard, defined, III:85
Hazard (failure) rate, calculation of,
III:94-95
Heat diffusion equation, II:470
Heath-Jarrow-Morton framework,
I:503, I:557
Heavy tails, III:227, III:382
Hedge funds, and probit regression
model, II:349-350
Hedge ratios, I:416-417, I:509
Hedges
importance of, I:300
improvement using DTS, I:398
in the Merton context, I:409
rebalancing of, I:519
risk-free, I:532f
Hedge test, I:409, I:411
Hedging
costs of, I:514, II:725
and credit default swaps, I:413-414
determining, I:303-304
with forward contracts, I:429, I:429f
of fuel costs, I:561
with futures, I:433
gamma, I:519
portfolio-level, I:412-413
of positions, II:724-726
ratio for, II:725
with swaps, I:434-435
transaction-level, I:412
usefulness of, I:418
use of, I:125-126
using macroeconomic indices,
I:414-417
Hessian matrix, I:23-24, I:25, I:186n,
III:645
Heston model, I:547, I:548, I:552,
II:682
with change of time, III:522
Heteroskedasticity, II:220, II:359,
II:360, II:403

HFD (high-frequency data). See
high-frequency data (HFD)
Higham's projection algorithm,
II:446
High-dimensional problems, II:673
High-frequency data (HFD)
and bid-ask bounce, II:454-457
defined, II:449-450
generalizations to, II:368-370
Level I, II:451-452, II:452f, II:453t
Level II, II:451
properties of, II:451, II:453t
recording of, II:450-451
time intervals of, II:457-462
use of, II:300, II:481
volume of, II:451-454
Hilbert spaces, II:683
Hill estimator, II:747, III:273-274
Historical method
drawbacks of, III:413
weighting of data in, III:397-398
Hit rate, calculation of, II:240n
HJM framework, I:498
HJM methodology, I:496-497
Holding period return, I:6
Ho-Lee model
continuous variant for, I:497
defined, I:492
in history, I:493
interest rate lattice, III:614f
as short rate model, III:23
for short rates, III:605
as single factor model, III:549
Home equity prepayment (HEP)
curve, III:55-56, III:56f
Homeowners, refinancing behavior of,
III:25
Home prices, I:412, II:397f, II:399f,
III:74-75
Homoskedasticity, II:360, II:373
Horizon prices, III:598
Housing, II:396-399, III:48
Howard algorithm (policy iteration
algorithm), II:676-677, II:680
Hull-White (HW) models
binomial lattice, III:610-611
for calibration, II:681
defined, I:492
interest rate lattice, III:614f
and short rates, III:545-546
for short rates, III:605
trinomial lattice, III:613, III:616f
usefulness of, I:503
use of, III:557, III:604
valuing zero-coupon bond calls
with, I:500
Hume, David, I:140
Hurst, Harold, II:714
Hypercubes, use of, III:648

IBM stock, log returns of, II:407f
Ignorance, prior, I:153-154
Implementation risk, II:694
Implementation shortfall approach,
III:627
Implicit costs, III:631
Implicit Euler (IE) scheme, II:674,
II:677-678
Implied forward rates, III:565-567
Impurity, measures of, II:377
Income, defined for public
corporation, II:21-22
Income statements
common-size, II:562-563, II:562f
defined, II:536
in financial statements, II:536-537
sample, II:537t, II:547t
structure of, II:536
XYZ Inc. (example), II:28f
Income taxes. See taxes
Independence, I:372-373, II:624-625,
III:363-364, III:368
Independence function, in VaR
models, III:365-366
Independently and identically
distributed (IDD) concept,
I:164, I:171, II:127, III:274-280,
III:367, III:414
Indexes
characteristics of efficient, I:427
defined, II:67
of dissimilarity, III:353-354
equity, I:157, II:190t, II:262-263
tail, II:740-741, II:740f, III:234
tracking of, II:64, II:180
use of weighted market cap, I:38
value weighted, I:76-77
volatility, III:550-552, III:552f
Index returns, scenarios of, II:190t,
II:191t
Indifference curves, I:4-5, I:5f, I:14
Industries, characteristics of, II:36-37,
II:39-40
Inference, I:155-158, I:169t
Inflation
effect on after-tax real returns,
I:286-287
and GDP growth, I:282
indexing for, I:278-279
in regression analysis, II:323
risk of, II:282
risk premiums for, I:280-283
seasonal factors in, I:292
shifts in, I:285f
volatility of, I:281
Information
anticipation of, III:476
from arrays in MATLAB, III:421
completeness of, I:353-354
contained in high volatility stocks,
III:629
and filtration, III:517
found in data, II:486
and information propagation,
II:515
insufficient, III:44
integration of, II:481-482
overload of, II:481
prior in Bayesian analysis,
I:151-155, I:152
propagation of, I:104
structures of, I:106f, II:515-517
unstructured vs. semistructured,
II:481-482

Information coefficients (ICs), II:98-99,
II:221-223, II:223f, II:227f, II:234
Information ratios
defined, II:86n, II:115, II:119, II:237
determining, II:100f
for portfolio sorts, II:219
use of, II:99-100
Information sets, II:123
Information structures
defined, II:518
Information technology, role of,
II:480-481
Ingersoll models, I:271-273, I:275f
Initial conditions, fixing of, II:502
Initial margins, I:478
Initial value problems, II:639
Inner quartile range (IQR), II:494
Innovations, II:126
Insurance, credit, I:413-414
Integrals, II:588-590, II:593. See also
stochastic integrals
Integrated series, and trends,
II:512-514
Integration, stochastic, III:472, III:473,
III:483
Intelligence, general, II:154
Intensity-based frameworks, and the
Poisson process, I:315
Interarrival time, III:219, III:225
Intercepts, treatment of, II:334-335
Interest
accumulated, II:604-605, II:604f
annual vs. quarterly compounding,
II:599f
compound, II:597, II:597f
computing accrued, and clean price,
I:214-215
coverage ratio, II:560
defined, II:596
determining unknown rates,
II:601-602
effective annual rate (EAR),
II:616-617
mortgage, II:398
simple vs. compound, II:596
terms of, II:619
from TIPS, I:277
Interest rate models
binomial, III:173-174, III:174f
classes of, III:600
confusions about, III:600
importance of, III:600
properties of lattices, III:610
realistic, arbitrage-free, III:599
risk-neutral/arbitrage-free, III:597
Interest rate paths, III:6-9, III:7, III:87
Interest rate risk, III:12-14
Interest rates
absolute vs. relative changes in,
III:533-534
approaches in determining future,
III:591
binomial model of, III:173-174
binomial trees, I:236, I:236f, I:237f,
I:240f, I:244, I:244f, III:174f
borrowing vs. lending, I:482-483
calculation of, II:613-618
calibration of, I:495
caps/caplets of, III:589-590
caps on, I:248-249
categories of term structure, III:561
computing sensitivities, III:22-23
continuous, I:428, I:439^88
derivatives of, III:589-590
determination of appropriate,
I:210-211

distribution of, III:538-539
dynamic of process, I:262
effect of, I:514-515
effect of shocks, III:23
effect on putable bonds, III:303-304
future course of, III:567, III:573
and futures prices, I:435n
importance of models, III:600
jumps of, III:539-541
jumpy and continuous, III:539f
long vs. short, III:538
market spot/forward, I:495f
mean reversion of, III:7
modeling of, I:261-265, I:267, I:318,
I:491, I:503, III:212-213
multiple, II:599-600
negative, III:538
nominal, II:615-616
and option prices, I:486-487
and prepayment risk, III:48
risk-free, I:442
shocks/shifts to, III:585-596
short-rate, I:491-494, III:595
simulation of, III:541
stochastic, I:344, I:346
structures of, III:573, III:576
use of for control, I:489
volatility of, III:405, III:533
Intermarket relations, no-arbitrage,
I:453-455
Internal consistency rule, in OAS
analysis, I:265
Internal rate of return (IRR), II:617-618
in MBSs, III:36
International Monetary Fund
Global Stability Report, I:299
International Swap and Derivatives
Association (ISDA). See ISDA
Interpolated spread (I-spread), I:227
Interrate relationship, arbitrage-free,
III:544
Intertemporal dependence, and risk,
III:351
Intertrade duration, II:460-461,
II:462t
Intertrade intervals, II:460-461
Intervals, credible, I:170
Interval scales, data on, II:487
Intrinsic value, I:441, I:511, I:513,
II:16-17
Invariance property, III:328-329
Inventory, II:542, II:557
Inverse Gaussian process, III:499
Investment, goals of, II:114-115
Investment management, III:146
Investment processes
activities of integrated, II:61
evaluation of results of, II:117-118
model creation, II:96
monitoring of performance, II:104
quantitative, II:95, II:95f
quantitative equity, II:95f, II:96f,
II:105
research, II:95-102
sell-structured, II:108
steps for equity investment, II:119
testing of, II:109
Investment risk measures, III:350-351
Investments, I:77-78n, II:50-51,
II:617-618
Investment strategies, II:66-67,
II:198
Investment styles, quantamental,
II:93-94, II:93f

Investors
behavior of, II:207, II:504
comfort with risk, I:193
completeness of information of,
I:353-354
focus of, I:299, II:90-91
fundamental vs. quantitative,
II:90-94, II:91f, II:92f, II:105
goals/objectives of, II:114-115,
II:179, III:631
individual accounts of, II:74
monotonic preferences of, I:57
number of stocks considered, II:91
preferences of, I:5, I:260, II:48, II:56,
II:92-93
prior beliefs of, II:727
real-world, II:132
risk aversion of, II:82-83, II:729
SL-CAPM assumptions about, I:66
sophistication of, II:108
in uncertain markets, II:54
views of, I:197-199
Invisible hand, notion of, II:468-469
ISDA (International Swap and
Derivatives Association)
Credit Derivative Definitions (1999),
I:230, I:528
Master Agreement, I:538
organized auctions, I:526-527
supplement definition, I:230
I-spread (interpolated spread), I:227
Ito, Kiyosi, II:470
Ito definition, III:486-487
Ito integrals, I:122, III:475, III:481,
III:490-491
Ito isometry, III:475
Ito processes
defined, I:95
generic univariate, I:125
and Girsanov's theorem, I:131
under HJM methodology, I:497
properties of, III:487-488
and smooth maps, III:493
Ito's formula, I:126, III:488-489
Ito's lemma
defined, I:98
discussion of, I:95-97
in estimation, I:348
and the Heston model, I:548

James-Stein shrinkage estimator, I:194
Japan, credit crisis in, I:417
Jarrow-Turnbull model, I:307
Jarrow-Yu propensity model, I:324-325
Jeffreys' prior, I:153, I:160n, I:171-172
Jensen's inequality, I:86, III:569
Jevons, Stanley, II:468
Johansen-Juselius cointegration tests,
II:391-393, II:395
Joint jumps/defaults, I:322-324
Joint survival probability, I:323-324
Jordan diagonal blocks, II:641-642
Jorion shrinkage estimator, I:194, I:202
Jump-diffusion, III:554-557, III:657
Jumps
default, I:322-324
diffusions, I:559-560
downward, I:347
idiosyncratic, I:323
incorporation of, I:93-94
in interest rates, III:539-541
joint, I:322-324
processes of, III:496
pure processes, III:497-501, III:506
size of, III:540

Kalotay-Williams-Fabozzi (KWF)
model, III:604, III:606-607,
III:615f
Kamakura Corporation, I:301, I:307,
I:308-309, I:310n
Kappa, I:521
Karush-Kuhn-Tucker conditions (KKT
conditions), I:28-29
Kendall's tau, I:327, I:332
Kernel regression, II:403, II:412-413,
II:415
Kernels, II:412, II:413f, II:746
Kernel smoothers, II:413
Keynes, John Maynard, II:471
Key rate durations (KRD), II:276,
III:311-315, III:317
Key rates, II:276, III:311
Kim-Rachev (KR) process, III:512-513
KKT conditions (Karush-Kuhn-Tucker
conditions), I:28-29, I:31, I:32
KoBoL distribution, III:257n
Kolmogorov extension theorem,
III:477-478
Kolmogorov-Smirnov (KS) test, II:430,
III:366, III:647
Kolmogorov equation, use of, III:581
Kreps, David, II:476
Krispy Kreme Doughnuts, II:574-575,
II:574f
Kronecker product, I:172, I:173n
Kuiper test, III:366
Kurtosis, I:41, III:234

Lag operator L, II:504-506, II:507,
II:629-630
Lagrange multipliers, I:28, I:29-31,
I:30, I:32
Lag times, II:387, III:31
Laplace transforms, II:647-648
Last trades, price and size of, II:450
Lattice frameworks
bushy trees in, I:265, I:266f
calibration of, I:238-240
fair, I:235
interest rate, I:235-236, I:236-238
one-factor model, I:236f
for pricing options, I:487
usefulness of, I:235
use of, I:240, I:265-266, III:14
value at nodes, I:237-238
1-year rates, I:238f, I:239f
Law of iterated expectations, I:110,
I:122, II:308
Law of large numbers, I:267, I:270n,
III:263-264, III:275
Law of one alpha, II:50
Law of one price (LOP), I:52-55,
I:99-100, I:102, I:119, I:260
LCS (liquidity cost score), I:402
use of, I:403
LDIs (liability-driven investments),
I:36
LD (loss on default), I:370-371
Leases, in financial statements, II:542
Least-square methods, II:683-685


Leavens, D. H., I:10
Legal loss data
Cruz study, III:113, III:115t
Lewis study, III:117, III:117t
Lehman Brothers, bankruptcy of, I:413
Level (parallel) effect, II:145
Levy-Khinchine formula, III:253-254,
III:257
Levy measures, III:254, III:254t
Levy processes
and Brownian motion, III:504
in calibration, II:682
change of measure for, III:511-512
conditions for, III:505
construction of, III:506
from Girsanov's theorem, III:511
and Poisson process, III:496
as stochastic process, III:505-506
as subordinators, III:521
for tempered stable processes,
III:512-514, III:514t
and time change, III:527
Levy stable distribution, III:242,
III:339, III:382-386, III:392
LGD (loss given default), I:366, I:370,
I:371
Liabilities, II:533, II:534-535, III:132
Liability-driven investments (LDIs),
I:36
Liability-hedging portfolios (LHPs),
I:36

LIBOR (London Interbank Offered
Rate)
and asset swaps, I:227
changes in, by type, III:539-540
curve of, I:226
interest rate models, I:494
market model of, III:589
spread of, I:530
in total return swaps, I:541
use of in calibration, III:7
Likelihood maximization, I:176
Likelihood ratio statistic, II:425
Limited liability rule, I:363
Limit order books, use of, III:625,
III:632n
Lintner, John, II:474
Lipschitz condition, II:658n, III:489,
III:490
Liquidation
effect of, II:186
procedures for, I:350-351
process models for, I:349-351
time of, I:350
vs. default event, I:349
Liquidity
assumption of, III:371
in backtesting, II:235
changes in, I:405
cost of, I:401
creation of, III:624-625, III:631
defined, III:372, III:380
effect of, II:284
estimating in crises, III:378-380
in financial analysis, II:551-555
and LCS, I:404
and market costs, III:624
measures of, II:554-555
premiums on, I:294, I:307
ratios for, II:555
in risk modeling, II:693
shortages in, I:347-348
and TIPS, I:293, I:294
and transaction costs, III:624-625
Liquidity-at-risk (LAR), III:376-378
Liquidity cost, III:373-374, III:375-376
Liquidity cost score (LCS), I:402, I:403
Liquidity preference hypothesis,
III:570
Liquidity ratios, II:563
Liquidity risk, II:282, III:380
Ljung-Box statistics, II:407, II:421,
II:422, II:427-428
LnMix models, calibration of,
II:526-527
Loading, standardization of, II:177
Loan pools, III:8-9

Loans
amortization of, II:606-607,
II:611-613
amortization table for, II:612t
delinquent, III:63
fixed rate, fully amortized schedule,
II:614t
floating rate, II:613
fully amortizing, II:611
modified, III:32
nonperforming, III:75
notation for delinquent, III:45n
recoverability of, III:31-32
refinancing of, III:68-69
repayment of, II:612f, II:613f
term schedule, II:615t
Loan-to-value ratios (LTVs), III:31-32,
III:69, III:73, III:74-75
Location parameters, I:160n,
III:201-202
Location-scale invariance property
(Gaussian distribution), II:732
Logarithmic Ornstein-Uhlenbeck
(log-OU) processes, I:557-558
Logarithmic returns, III:211-212,
III:225
Logistic distribution, II:350
Logistic regression, I:307, I:308, I:310
Logit regression models, II:349-350,
II:350
Log-Laplace transform, III:255-256
Lognormal distribution, III:222-225,
III:392
Lognormal mixture (LnMix)
distribution, II:524-525
Lognormal variables, I:86
Log returns, I:85-86, I:88
London Interbank Offered Rate
(LIBOR). See LIBOR
Lookback options, I:114, III:24
Lookback periods, III:402, III:407
LOP (law of one price). See law of one
price (LOP)
Lorenz, Edward, II:653
Loss distributions, conditional,
III:340-341

Losses. See also operational losses
allocation of, III:32
analysis of in backtesting, III:338
collateral vs. tranche, III:36
computation of, I:383
defined, III:85
estimation of cumulative, III:39-40
expected, I:369-370, I:373-374
expected vs. unexpected, I:369,
I:375-376
internal vs. external, III:83-84
median of conditional, III:348n
projected, III:37f
restricting severity of, I:385-386
severity of, III:44
unexpected, I:371-372, I:374-375
Loss functions, I:160n, III:369
Loss given default (LGD), I:366, I:370,
I:371
Loss matrix analysis, III:40-41
Loss on default (LD), I:370-371
Loss severity, III:30-31, III:60-62,
III:97-99
Lottery tickets, I:462
Lower partial moment risk measure,
III:356
Lundberg, Filip, II:467, II:470-471

Macroeconomic influences, defined,
II:197
Magnitude measures, II:429-430
Maintenance margins, I:478
Major indexes, modeling return
distributions for, III:388-392
Malliavin calculus, III:644
Management, active, II:115
Mandelbrot, Benoit, II:653, II:738,
III:234, III:241-242
Manufactured housing prepayment
(MHP) curve, III:56
Marginalization, II:335
Marginal rate of growth, III:197-198
Marginal rate of substitution, I:60
Margin calls, exposure to, III:377
Market cap vs. firm value, II:39
Market completeness, I:52, I:105
Market efficiency, I:68-73, II:121,
II:473-474
Market equilibrium
and investor's views, I:198-199
Market impact
costs of, III:623-624, III:627
defined, II:69
forecasting/modeling of,
III:628-631
forecasting models for, III:632
forecasting of, III:628-629,
III:629-631
measurement of, III:626-628
between multiple accounts, II:75-76
in portfolio construction, II:116
and transaction costs, II:70
Market model regression, II:139
Market opportunity, two state, I:460f
Market portfolios, I:66-67, I:72-73
Market prices, I:57, III:372
Market risk
approaches to estimation of, III:380
in bonds, III:595
in CAPM, I:68-69, II:474
importance of, III:81
models for, III:361-362
premium for, I:203n, I:404
Markets
approach to segmented, II:48-51
arbitrage-free, I:118
complete, I:51-52, III:578
complex, II:49
effect of uncertainty in on bid-ask
spreads, II:455-456
efficiency of, II:15-16
frictionless, I:261
incomplete, I:461-462
liquidity of, III:372
models of, III:589
for options and futures, I:453-454
perfect, II:472
properties of modern, III:575-576
sensitivities to value-related
variables, II:547
simple, I:70
systematic fluctuations in,
II:172-173
unified approach to, II:49
up/down, defined, II:347
Market sectors, defined, III:560
Market standards, I:257
Market structure, and exposure,
II:269-270
Market timing, II:260
Market transactions, upstairs,
III:630-631, III:632n
Market weights, II:269f
Markov chain approximations, II:678
Markov chain Monte Carlo (MCMC)
methods, II:410f, II:417-418
Markov coefficients, II:506-507, II:512
Markov matrix, I:368
Markov models, I:114
Markov processes
in dynamic term structures, III:579
hidden, I:182
use of, III:509, III:517
Markov property, I:82, I:180-181, I:183,
II:661, III:193n
Markov switching (MS) models
discussion of, I:180-184
and fat tails, III:277-278
stationarity with, III:275
usefulness of, II:433
use of, II:409-411, II:411t
Markowitz, Harry M., I:38, I:140,
II:467, II:471-472, III:137,
III:351-352
Markowitz constraint sets, I:69, I:72
Markowitz diversification, I:10-11,
I:11
Markowitz efficient frontiers, I:191f
Markowitz model
in financial planning, III:126
Mark-to-market (MTM)
calculation of value, I:535-536, I:536f
defined, I:535
and telescoping futures, I:431-432
Marshall and Siegel, II:694
Marshall-Olkin copula, I:323-324,
I:329
Martingale measures, equivalent
and arbitrage, I:111-112, I:124
and complete markets, I:133
defined, I:110-111
and Girsanov's theorem, I:130-133
and state prices, I:133-134
use of, I:130-131
working with, I:135
Martingales
with change of time methods
(CTM), III:522-523
defined, II:124, II:126, II:519
development of concept, II:469-470
equivalent, II:476
measures of, I:110-111
use of conditions, I:116
use of in forward rates, III:586
Mathematical theory, importance of
advances in, III:145
Mathworks, website of, III:418
MATLAB
array operations in, III:420-421
basic mathematical operations in,
III:419-420
construction of vectors/matrices,
III:420
control flow statements in,
III:427-428
desktop, III:419f
European call option pricing with,
III:444-445
functions built into, III:421-422
graphs in, III:428-433, III:429-430f,
III:431f
interactions with other software,
III:433-434
M-files in, III:418-419, III:423,
III:447
operations in, III:447
optimization in, III:434-444,
III:435t
Optimization Tool, III:435-436,
III:436f, III:440f, III:441f
overview of desktop and editor,
III:418-419
quadprog function, II:70
quadratic optimization with,
III:441-444
random number generation,
III:444
for simulations, III:651
Sobol sequences in, III:445-446
for stable distributions, III:344
surf function in, III:432-433
syntax of, III:426-427
toolboxes in, III:417-418
user-defined functions in,
III:423-427
Matrices
augmented, II:624
characteristic polynomial of, II:628
coefficient, II:624
companion, II:639-640
defined, II:622
diagonal, II:622-623, II:640
eigenvalues of random, II:704-705
eigenvectors of, II:640-641
in MATLAB, III:422, III:432
operations on, II:626-627
ranks of, II:623, II:628
square, II:622-623, II:626-627
symmetric, II:623
traces of, II:623
transition, III:32-33, III:32t, III:33t,
III:35f
types of, II:622, II:628
Matrix differential equations, III:492
Maturity value (lump sum), from
bonds, I:211
Maxima, III:265-269, III:266f
Maximum Description Length
principle, II:703
Maximum eigenvalue test, II:392-393
Maximum likelihood (ML)
approach, I:141, I:348
methods, II:348-349, II:737-738,
III:273
principle, II:312
Maximum principle, II:662, II:667
Max-stable distributions, III:269,
III:339-340

MBA (Mortgage Bankers Association)
refi index, III:70, III:70f
MBS (mortgage-backed securities),
I:258
agency vs. nonagency, III:48
cash flow characteristics of, III:48
default assumptions about, III:8
negative convexity of, III:49
performance of, III:74
prices of, III:26
projected long-term performance of,
III:34f
time-related factors in, III:73-74
valuation of, III:62
valuing of, III:645
MBS (mortgage-backed securities),
nonagency
analysis of, III:44-45
defined, III:48
estimation of returns, III:36-44
evaluation of, III:29
factors impacting returns of,
III:30-32
yield tables for, III:41t
Mean absolute deviation (MAD),
III:353
Mean absolute moment (MAM(q)),
III:353
Mean colog (M-colog), III:354
Mean entropy (M-entropy), III:354
Mean excess function, II:746-747
Mean/first moment, III:201-202
Mean residual life function, II:754n
Mean reversion
discussion of, I:88-92
geometric, I:91-92
in HW models, III:605
and market stability, III:537-538
models of, I:97
parameter estimation, I:90-91
risk-neutral asset model, III:526
simulation of, I:90
in spot rate models, III:580
stabilization by, III:538
within a trinomial setting, III:604
Mean-reverting asset model (MRAM),
III:525-526
Means, I:148, I:155, I:380, III:166-167
Mean-variance
efficiency, I:190-191
efficient portfolios, I:13, I:68, I:69-70
nonrobust formulation, III:139-140
optimization, I:192
constraints on, I:191
estimation errors and, I:17-18
practical problems in, I:190-194
risk aversion formulation, II:70
Mean variance analysis, I:3, I:15f,
I:201, II:471-472, III:352
Measurement levels, in descriptive
statistics, II:486-487
Media effects, III:70
Median, I:155, I:159n, II:40
Median tail loss (MTL), III:341
Mencken, H. L., II:57
Menger, Carl, II:468
Mercurio-Moraleda model, I:493-494
Merton, Robert, I:299, I:310, II:468,
II:475, II:476
Merton model
advantages and criticisms of,
I:344
applied to probability of default,
I:363-365
with Black-Scholes approach,
I:305-306
default probabilities with, I:307-308
discussion of, I:343-344
drawbacks of, I:410
with early default, I:306
evidence on performance, I:308-309
as first modern structural model,
I:313, I:341
in history, I:491
with jumps in asset values, I:306
portfolio-level hedging with,
I:411-413
with stochastic interest rates, I:306
and transaction-level hedging,
I:408-410
usefulness of, I:410, I:411-412,
I:417-418
use of, I:304, I:305, I:510
variations on, I:306-307
Methodology, equally weighted,
III:399
Methods
quantile, II:354-356
Methods, pathwise, III:643
Metropolis-Hastings (M-H) algorithm,
I:178
M-H algorithm, I:179
MIB 30, III:402-403, III:402f, III:403f
Microsoft, II:722f. See also Excel
Midsquare technique, III:647
Migration mode
calculation of expected/unexpected
losses under, I:376f
expected loss under, I:373-374


Miller, Merton, II:467, II:473
MiniMax (MM) risk measure, III:356
Minimization problems, solutions to,
II:683-684
Minimum-overall-variance portfolio,
I:69
Minority interest, on the balance
sheet, II:536
Mispricing, risk of, II:691-692
Model creep, II:694
Model diagnosis, III:367-368
Model estimation, in non-IDD
framework, III:278
Modeling
calibration of structure, III:549-550
changes in mathematical, II:480-481
discrete vs. continuous time, III:562
dynamic, II:105
issues in, II:299
nonlinear time series, II:427-428,
II:430-433
quantitative, II:481
Modeling techniques
non-parametric/nonlinear, II:375
Model risk
of agency ratings, II:728-729
awareness of, I:145, II:695-696
with computer models, II:695
consequences of, II:729-730
contribution to bond pricing,
II:727-728
defined, I:331, II:691, II:697
discussion of, II:714-715
diversification of, II:378
endogenous, II:694-695, II:697
in financial institutions, II:693
guidelines for institutions,
II:696-697
management of, II:695-697, II:697
misspecification of, II:199
and robustness, II:301
of simple portfolio, II:721-726
sources of, II:692-695
Models. See also operational risk
models
accuracy in, III:321
adjustment, II:502
advantages of reduced-form, I:533
analytical tractability of, III:549-550
APD, III:18, III:20-22, III:21f, III:26
application of, II:694
appropriate use of classes of,
III:597-598
arbitrage-free, III:600
autopredictive, II:502
averages across, II:715
bilinear, II:403-404
binomial, I:114-116, I:119
binomial stochastic, II:10-11


block maxima, II:745
choosing, III:550-552
comparison of, III:617
compatibility of, III:373
complexity of, II:704, II:717
computer, I:511, II:695
conditional normal, II:733-734
conditional parametric fat-tailed,
II:744
conditioning, II:105
construction of, II:232-235
for continuous processes, I:123
creation of, II:100-102
cross-sectional, II:174-175, II:175t
cumulative return of, II:234
defined, II:691, II:697
to describe default processes, I:313
description and estimation of,
II:256-257
designing the next, III:590-591
determining, II:299-300
disclosure of, I:410
documentation of, II:696
dynamic factor, II:128, II:131,
III:126-127
dynamic term structure, III:591
econometric, II:295, II:304
equilibrium forms of, III:599-600
equity risk, II:174, II:178-191, II:192
error correction, II:381t, II:387-388,
II:394-395

evidence of performance, I:308-309,
II:233
examples of multifactor, II:139-140
financial, I:139, II:479-480
forecasting, II:112, II:303-304
for forecasting, III:411
formulation of, III:128-131
fundamental factor, II:244, II:248
generally, II:360-362
Gordon-Shapiro, II:17-18
Heath-Jarrow-Morton, III:586-587,
III:589
hidden-variable, II:128, II:131
linear, II:264, II:310-311, II:348,
II:507-508
linear autoregressive, II:128,
II:130-131
linear regression, I:91, I:163-170,
II:360, II:414-415
liquidation process, I:342
martingale, II:127-128, III:520-521
MGARCH, II:371-372
model-vetting procedure, II:696-697
moving average, III:414
multifactor, II:231-232, III:92
multivariate extensions of,
II:370-373
no arbitrage, III:604
nonlinear, II:402-421, II:417-418
penalty functions in, II:703
performance measurement of, II:301
predictive regressive, II:130
predictive return, II:128-131
for pricing, II:127-128
pricing errors in, I:322
principles for engineering,
II:482-483
probabilistic, II:299
properties of good, I:320
ranking alternative, III:368-370
recalibration of, II:713-714
reduced form default, I:310, I:313
regressive, II:128, II:129-130
relative valuation, I:260
return forecasting, II:119
returns of, II:233t
robustness of, II:301
selection of, I:145, II:298, II:692-693,
II:699-701
short-rate, I:494
single-index market, II:317-318
static, II:297, III:573
static regressive, II:129-130
static vs. dynamic, II:295-296, II:304
statistical, II:175, II:175t
stochastic, I:557, III:124-125
structural, I:305, I:313-314, I:341-342
structural vs. reduced, I:532-533
subordinated, II:742-743
temporal aggregation of, II:369
testing of, II:126-127, II:696-697
time horizon of, II:300-301
time-series, II:175, II:175t
tree, II:381, III:22-23
tuning of, III:580-581
two-factor, I:494
univariate regression, I:165
usefulness of, II:122
use of in practice, I:494-496, III:600t
Models, lattice
binomial, III:610, III:610f
Black-Karasinski (BK) lattice, III:611
Hull White binomial, III:610-611
Hull White trinomial, III:613
trinomial, III:610, III:610f,
III:611-612
Models, selection of
components of, II:717
generally, II:715-717
importance of, II:700
machine learning approach to,
II:701-703, II:717
uncertainty/noise in, II:716-717
use of statistical tools in, II:230
Modified Accelerated Cost Recovery
System (MACRS), II:538
Modified Restructuring clause, I:529
Modified tempered stable (MTS)
processes, III:513
Modigliani, Franco, II:467, II:473
Modigliani-Miller theorem, I:343,
I:344, II:473, II:476
Moment ratio estimators, III:274
Moments
exponential, III:255-256
first, III:201-202
of higher order, III:202-205
integration of, II:367-368
raw, II:739
second, III:202
types of, II:125
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portfolios based on, 11:181 
Momentum factor, II:226-227
Money, future value of, II:596-600
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Money markets, 1:279,1:282,1:314, 
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sequences in, 1:378-379 
speed of, III:644
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11:415 
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1:437,1:439-440,1:455 
Net free cash flow (NFCF), II:572-574,
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11:418/ 11:701-702 
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Noise 

continuous-time, III:486
in financial models, II:721-722
in model selection, II:716-717
models for, II:726
reduction of, II:51-52
Noise, white
defined, I:82, II:297
qualities of, II:127
sequences, II:312, II:313
in stochastic differential equations,
III:486
strict, II:125
vs. colored noise, III:275
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model, 77:417 

Nonlinear dynamics and chaos, II:645,
II:652-654
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tests of, 77:421-427 

Non-normal probability distributions,
II:480
Nonparametric methods, II:411-416
Normal distributions, I:81, I:82f,
I:177-178, III:638f
and AVaR, III:334
comparison with α-stable, III:234f
fundamentals of, II:731-734
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Gaussian distribution) 
likelihood function, I:142-143
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777:98-99 
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7:387 

properties of, II:732-733, III:209-210
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standard, 777:208 

standardized residuals from, II:751
use of, II:752n
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distribution, 777:211 
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777:209/ 
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processes, 777:513 
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NTS distribution, 777:257n 
Null hypothesis, I:157, I:170, III:362
Numeraire, change of, III:588-589
Numerical approximation, I:265
Numerical models for bonds,
I:273-275

OAS (option-adjusted spread). See
option-adjusted spread
Obligations, deliverable, I:231, I:526
Observations, frequency of, III:404
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77:696 

Odds ratio, posterior, I:157
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method, 777:57-58 

Oil industry, free cash flows of, II:570
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ordinary least squares (OLS) 
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Operating cash flow (OCF), 77:23 
Operating cycles, II:551-554
Operating profit margin, II:556
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777:115/ 
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near-miss, 777:84-85 
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process of occurrence, 777:86/ 
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types of, 777:81, 777:88 
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approaches to, 777:103-104 
assumptions in, III:104
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777:103-104,777:104-105, 777:118 
parametric approach, III:104,
III:105-110, III:118
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classifications of, 777:83-88,777:87-88, 
III:87f, III:88
defined, III:81-83, III:88
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777:86/ 

indicators of, III:83
models of, III:91-96
nature of, III:99
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sources of, 777:82 
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distinctions between, 777:85-87 
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actuarial (statistical) models, 777:95 
bottom-up, III:92f, III:94-96, III:99
causal, III:94
expense-based, III:93
income-based, III:93
multifactor causal models, III:95
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process-based, 777:94-95 
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reliability, 777:94-95 
top down, III:92-94, III:99
types of, III:91-92
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addition, 77:625, 77:626 
defined, II:628
inverse and adjoint, II:626-627
multiplication, II:625-626, II:626
transpose, II:625, II:626
vector, II:625-626
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Ophelimity, concept of, 77:469 
Opportunity cost, I:435, I:438, I:439,
II:596, III:623
Optimal exercise, I:515-516
Optimization
algorithms for, III:124
complexity of, II:82
constrained, I:28-34
defined, III:434-435
local vs. global, II:378
in MATLAB, III:434-444
unconstrained, I:22-28
Optimization theory, I:21
Optimization Toolbox, in MATLAB,
III:435-436, III:436f
Optimizers, using, II:115-116, II:483
Option-adjusted spread (OAS)
calculation of, I:253-255
defined, I:254, III:11
demonstrated, I:254f
determination of, I:259
implementation of, I:257
and market value, I:258
results from example, III:617f
and risk factors, III:599
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1: 264-265 
usefulness of, III:3
values of, I:267, I:268
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Option premium, 7:508-509 
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Option premium profiles, 7:512,7:512/ 
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components of, 7:484^85, 7:511-512 
factors influencing, I:486-487, I:486f,
I:487-488, I:522-523
models for, I:490
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American, 77:664-665, 77:669-670, 
II:674-679, II:679-681
American-style, I:444, I:454-455,
I:490
Asian, II:663-664, II:668-669,
III:642-643
on the average, II:663-664
barrier, II:662-663
basic properties of, I:507-508
basket, II:662, II:672
Bermudean, II:663-664, III:597
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costs of, 7:441-442, 777:11-12 
difference from forwards, I:437-439
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models of, 7:510-511 
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7:484-488, 7:507, 777:408 
theoretical valuation of, I:508-509
time premiums of, I:485
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valuing of, 7:252-253,777:639 
vanilla, II:661, III:655
volatility of, I:488
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in differential equations, 77:643, 
II:644-645
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limit, 777:625, 777:631 
market, III:625, III:631
Order statistics, III:269-270
bivariate, III:293-295
joint probability distributions for,
III:291-292
use of, III:289
for VaR and ETL, III:292f
in VaR calculations, III:291
Ordinary differential equations
(ODE), II:644-645, II:646-648,
II:648-652, II:649f
Ordinary least squares (OLS)
alternate weighting of, II:438-439
estimation of factor loadings matrix
with, II:165
in maximum likelihood estimates,
II:313-314
pictorial representations of,
II:437-438, II:438f
squared errors in, II:439-440
use of, I:165, I:172n, II:353
vs. Theil-Sen estimates of beta,
II:442f
vs. Theil-Sen regression, II:441t
Ornstein-Uhlenbeck process
with change of time, III:523
and mean reversion, I:263, I:264f
solutions to, III:492
use of, I:89, I:95
and volatility, III:656
Outcomes, identification and
evaluation of worst-case,
III:379-380
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in data sets, 77:200 

detection and management of, II:206
effect of, II:355f, II:442-443
and market crashes, II:503
in OLS methods, II:354
in quantile methods, II:355-356
and the Theil-Sen regression
algorithm, II:440
Out-of-sample methodology, II:238

Pair trading, II:710
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events, 777:158 

Parallel yield curve shift assumption,
III:12-13
Parameters
calibration of, II:693
density functions for values, III:229f,
III:230f, III:231f
distributions of, II:721
estimation of for random walk, I:83
robust estimation of, II:77-78
stable, III:246f
Parametric methods, use of, II:522
Parametric models, II:522-523,
II:526-527
Par asset swap spreads, I:530, I:531
Par CDS spread, I:531
Par-coupon curve, III:561
Pareto, Vilfredo, II:467, II:468-469,
II:474
Pareto(2) distribution, II:441
Pareto distributions
density function of, II:738
generalized (GPD), II:745-746,
II:747, III:230-231
in loss distributions, III:108-109
parameters for determining, II:738
stable, II:738-741
stable/varying density, II:739f
tails of, II:751
Pareto law, II:469
Pareto-Levy stable distribution, 
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Partial differential equations (PDEs) 
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77:660-665 

framework for, I:261, I:265, II:675,
III:555
pricing European options with,
II:665-674
usefulness of, II:659-660
use of, III:18-19
Partitioning, binary recursive,
II:376-377, II:376f
Paths
in Brownian motion, III:501, III:502f
dependence, III:18-19
stochastic, II:297
Payments, I:229, II:611-612
Payment shock, III:72
Payoff-rate process, I:121-122
Payoffs, III:466, III:638-639
PCA (principal components analysis).
See principal component
analysis (PCA)
Pearson skewness, III:204-205
Pension funds, constraints of, II:62
Pension plans, II:541, III:132
P/E (price/earnings) ratio, II:20-21,
II:38
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77:615-617 

Percolation models, III:276
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(PSPs), 7:36, 7:37 
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Phillips-Perron statistic, 77:386, II: 398 
Pickand-Balkema-de Haan theorem,
II:746
Pickand estimator, III:273
Pliska, Stanley, II:476
Plot function, in MATLAB, III:428-432
P-null sets, III:197
Pochhammer symbol, III:256
Poincare, Henri, II:469
POINT®
features of, II:193n, II:291n
modeling with, II:182
screen shot of, II:287f, II:288f
use of, II:179, II:189, II:286-287
Point processes, III:270-272
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distribution tails for, 

III:540-541
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777:540 
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compounded, 777:497 
homogeneous, III:270-271
and jumps, I:93, III:498, III:540
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as stochastic process, 777:496,777:497, 
III:506
use of, I:262, I:315-316
Poisson variables, distribution of, 
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Policy iteration algorithm (Howard 
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real world, 7:190 
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defined, 7:36 

formulation of theory, II:476
max-min problem, III:139
models of, II:84-85n
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techniques of, 77:115-116 
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allocation of, 7:192-193,77:72 
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777:637-638 
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quadratic approximation for value, 
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replication of, 77:476 
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returns of, 7:6-7 
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results from, 77:225/ 
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function, 7.T42-143 
Positive homogeneity property, 
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Posterior distribution, 7:159, 7:165 
Posterior odds ratio, I:157
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7:158-159 
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measure, 777:356 
Power law, III:234-235
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Power sets, 777:156,777:1567 
Precision, I:158, II:702
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Predictive return modes, adoption of, 
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Preferred habitat hypothesis, 
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Prepayments 
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calculating speeds of. III: 50-56 
in cash-flow yields, III:4
conditional rate of (CPR), III:30,
III:50-51, III:58-59
defaults and involuntary, III:59,
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defined. III :50 
disincentives for, III:7-8
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effect of time on rates of, 111:73-74 

evaluation of, III:62
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fundamentals of. III: 66-69 
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interactions with defaults, 111:76-77 
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modeling of, 7:258, 7:267, 7:268, 
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Price patterns, scaling in, 777:279 
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changes in, 77:722/ 77:723/ 77:742, 
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dirty, 7:382 
distribution of, I:510
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natural logarithm of, 7:85 
path-dependent, III:193n
strike, I:484-485, I:486
truncation of, III:304
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Price time series, autocorrelation in, 
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grids for, 777:18-19 
linear, I:52-55
models for, II:127-128
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Probability-integral transformation 
(PIT), III:365
Probability law, III:161
Probability measures, III:157-159,
III:594-597
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worst returns for, 777:3827 
State dependent models (SDMs), I:342,
I:351-352
Statement of stockholders' equity,
II:541
State price deflators
defined, I:103, I:129-130
determining, I:118-119, I:124
formulas for, I:107-108, I:109-110
in multiperiod settings, I:105
and trading strategy, I:106
State prices
and arbitrage, I:55-56
condition, I:54
defined, I:101-102
and equivalent martingale
measures, I:133-134
vectors, I:53-55, I:58, I:119
States, probabilities of, I:115
States of the world, I:457-458, I:459,
II:306, II:308, II:720
State space, I:269n
Static factor models, II:150
Stationary series, trend vs. difference,
II:512-513
Stationary univariate moving average,
II:506
Statistical concepts, importance of,
II:126-127
Statistical factors, II:177
Statistical learning, II:298
Statistical methodology, EWMA,
III:409
Statistical tests, inconsistencies in,
II:335-336
Statistics, II:387, II:499
Stein paradox, I:194
Stein-Stein model, II:682
Step-up callable notes, valuing of,
I:251-252

Stochastic, defined, III:162
Stochastic control (SC), III:124
Stochastic differential equations
(SDEs)
binomial/trinomial solutions to,
III:610-613
with change of time methods,
III:523
defined, II:658
examples of, III:523-524
generalization to several
dimensions with, III:490-491
intuition behind, III:486-487
modeling states of the world with,
III:127
for MRAM equation, III:525-526
setting of change of time, III:521
solution of, III:491-493
steps to definition, III:487
usefulness of, III:493
use of, II:295, III:485-486,
III:489-490, III:536, III:603,
III:619
Stochastic discount factor, I:57-58
Stochastic integrals
defined, III:481-482
intuition behind, III:473-475
in Ito processes, III:487
properties of, III:482-483
steps in defining, III:474-475
Stochastic processes
behavior of, I:262
characteristic function of, III:496
characteristics of, II:360
continuous-time, III:496, III:506
defined, I:263-264, I:269n, II:518,
III:476, III:496
discrete time, II:501
properties of, II:515
representation of, II:514-515
and scaling, III:279
specification of, II:692-693
Stochastic programs
features of, III:124, III:132
Stochastic time series, linear,
II:401-402
Stochastic volatility models (SVMs)
with change of time, III:520
continuous-time, III:656
discrete, III:656-657
importance of, III:658
for modeling derivatives,
III:655-656
multifactor models for, III:657-658
and subordinators, III:521-522
use of, III:653, III:656
Stock indexes
interim cash flows in, I:482
risk control against, II:262-263
Stock markets
bubbles in, II:386
as complex system, II:47-48
1987 crash, II:521, III:585-586
dynamic relationships among,
II:393-396
effects of crises, III:233-234
variables effects on different sectors
of, II:55
Stock options, valuation of long-term,
I:449
Stock price models
binomial, III:161, III:171-173, III:173f
multinomial, III:180-182, III:181f,
III:184
probability distribution of
two-period, III:181t
Stock prices
anomalies in, II:111t
behavior of, II:58
correlation of, I:92-93
and dividends, II:4-5
lognormal, III:655-656
processes of, I:125
Stock research, main areas of, II:244f
Stock returns, II:56, II:159f

Stocks
batting average of, II:99, II:99f
characteristics of, II:204
common, II:4, II:316-322
cross-sectional, II:197
defined, II:106
defining parameters of, II:49
determinants of, II:245f
execution price of, III:626
fair value vs. expected return, II:13f
finding value for XYZ, Inc., II:31f
information coefficient of, II:98f
information sources for, II:90f
measures of consistency, II:99-100
mispriced, II:6-7
quantitative research metrics tests,
II:97-99
quintile spread of, 77:9// 
relative ranking of, I:196-197
review of correlations, II:101f
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sale/terminal price of, 77: 5 
short selling of, I:432-433
similarities between, II:245f
sorting of, II:215
testing of, II:95, II:96f
that pay no dividend, II:17
use of, II:90
valuation of, II:6, II:8-9, II:14,
II:18-19
weightings of, II:101f
Stock selection
models for, II:197
in quantitative equity investment
process, II:105
quantitative model, II:94-95
for retail sector, II:94f
strategies for, II:195
tree for, II:379-381, II:380f
Stopping times, II:685
Stratonovich, Ruslan, II:470
Strategies, backtesting of, II:235-236
Stress tests, I:412, I:417, I:418, III:93,
III:596-597
Strike price, I:509, I:514
Strong Law of Large Numbers
(SLLN), I:270n, III:263-264
Structural breaks, I:167, III:274-275
Student's t distribution
applications to stock returns,
III:215-216
and AVaR, III:334-335
classical, II:734-738
density function of, II:735
discussion of, III:213-216
distribution function of, III:215f
for downside risk estimation,
III:386-387
fitting and simulation of, II:737-738
heavy tails of, I:160n, I:176,
II:747-748, II:751, III:227-228
limitations of, II:736
in modeling credit risk, I:387-388
normals representation in,
I:177-178
skewed, II:736-737, II:753n
skewness of, III:390
standard deviation of, I:173n
symmetry of, III:387
tails of, III:392
use of, I:153-154, I:172n, III:234
Student's t-test, II:219
Sturge's rule, II:495
Style analysis, II:189
Style factors, II:247
Style indexes, II:48
Stylized facts, II:503-504
Subadditivity property, III:328
Subordinated processes, I:186n,
III:277, III:521-522


Successive over relaxation (SOR)
method, II:677
Summation stability property
(Gaussian distribution),
II:732-733
Supervisory Capital Assessment
Program, I:300, I:412
Support, defined, III:200
Survey bias, I:293
Survival probability, I:533-535
Swap agreements, I:434, I:435-436n
Swap curves, I:226, II:275-276
Swap rates, I:226, III:536f
Swaps
with change of time method, III:522
covariance/correlation, I:547-548,
I:549-550, I:552
duration-matched, I:285
freight rate, I:558
modeling and pricing of, I:548-550
summary of studies on, I:546f
valuing of, I:434-435
Swap spread (SS) risk, II:278, II:278t
Swaptions, I:502-503, III:550
Synergies, in conglomerates, II:43-44
Systematic risk, II:290
Systems
homogenous, II:624
linear, II:624
types of, II:47, II:58

Tailing the hedge, defined, I:433
Tail losses
in loss functions, III:369-370
Tail probability, III:320
Tail risk, I:377, I:385, II:752
Tails
across assets through time,
II:735-736
behavior of in operational losses,
III:111-112
in density functions, III:203
dependence, I:327-328, I:387
Gaussian, III:98-99, III:260
heavy, II:734-744, III:238
modeling heaviness of, II:742-743
for normal and STS distributions,
III:246t
power tail decay property, II:739,
III:244
properties of, III:261-262
tempering of, II:741
Takeovers, probability of, I:144-145
Tangential contour lines, I:29-30, I:30f,
I:32f

Tanker market, I:565
TAR-F test, II:426
TAR(1) series, simulated time plot of,
II:404f
Tatonnement, concept of, II:468
Taxes
and bonds, I:226
capital gains, II:73
cash, II:573
for cash/futures transactions, I:484
complexity of, II:73-74
deferred income, II:535, II:538
effect on returns, II:83-84, II:84,
II:85n
in financial statements, II:541
impact of, I:286-287
incorporating expense of, II:73-75
managing implications of, III:146
and Treasury strips, I:218
Tax policy risk, II:282-283
Technology, effect of on relative
values, II:37
Telescoping futures strategy, I:433
Tempered stable distributions
discussions of, III:246-252,
III:384-386
generalized (GTS), III:249
Kim-Rachev (KRTS), III:251-252
modified (MTS), III:249-250
normal (NTS), III:250-251
probability densities of, III:247f,
III:248f, III:250f, III:252f
rapidly decreasing (RDTS), III:252
tempering function in, III:254,
III:258n
Tempered stable processes,
III:499-501, III:500t, III:512-517
Tempering functions, III:254, III:255t
Templates, for data storage, II:204
Terminal profit, options and forwards,
I:438f, I:439f
Terminal values, II:45
Terminology
of delinquency, default and loss,
III:56
of prepayment, III:49-50
standard, of tree models, II:376
Term structure
in continuous time, III:572-573
continuous time models of,
III:570-571
defined, III:560
eclectic theory of, III:570
of forward rates, III:586
mathematical relationships of,
III:562
modeling of, I:490-494, III:560
of partial differential equations,
III:583-584
in real world, III:568-570
Term structure modeling
applications of, III:584-586
arbitrage-free, III:594
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Term structure modeling ( Continued) 
calibration of, III:580-581
discount function in, III:565
discussion of, III:560-561
Term structure models
approaches to, III:603-604
defined, I:262, I:263
discrete time, III:562-563
discussion of, III:561-562
of interest rates, I:314
internal consistency checks for,
III:581
with no mean reversion, III:613-616
for OAS, I:265-267
quantitative, III:563
static vs. dynamic, III:561-562
Term structures, III:567-568, III:570,
III:579, III:587
Tests
Anderson-Darling (AD), III:112-113
BDS statistic, II:423-424, II:427
bispectral, II:422-423
cointegration, II:708-710
Kolmogorov-Smirnov (KS),
III:112-113
monotonic relation (MR), II:219
nonlinearity, II:426-427, II:427f
nonparametric, II:422-424
out-of-sample vs. in-sample,
II:236
parametric, II:424-426
RESET, II:424-425
run tests, III:364
threshold, II:425-426
for uniformity, III:366
TEV (tracking error volatility), II:180,
II:186, II:272-274, II:286-287
Theil-Sen regression algorithm,
II:440-442, II:443-446,
II:444f
The Internal Measurement Approach
(BIS), III:100n
Theoretical value, determination of,
III:10-11
Theorie de la Speculation (The Theory of
Speculation) (Bachelier),
II:121-122, II:469
Theory of point processes, II:470-471
Three Mile Island power plant crisis,
II:51-52
Three-stage growth model, II:9-10
Threshold autoregressive (TAR)
models, II:404-408
Thresholds, II:746-747
Through the cycle, defined, I:302-303,
I:309-310
Thurstone, Louis Leon, II:154
Tick data. See high-frequency data
(HFD)


Time
in differential equations, II:643-644
physical vs. intrinsic scales of, II:742
use of for financial data, II:546-547
Time aggregation, II:369
Time decay, I:509, I:513, I:521f
Time dependency, capture of,
II:362-363
Time discretization, II:666, II:679
Time increments
models of, I:79
in parameter estimation, I:83
Time intervals, size of, II:300-301
Time lags, II:299-300
Time points, spacing of, II:501
Time premiums, I:485
Time series
autocorrelation of, II:331
causal, II:504
concepts of, II:501-503
continuity of, I:80
defined, II:501-502, II:519
fractal nature of, III:480
importance of, II:360
multivariate, II:502
stationary, II:502
stationary/nonstationary, II:299
for stock prices, II:296
Time to expiry, I:513
Time value, I:513, I:513f, II:595-596
TIPS (Treasury inflation-protected
securities)
and after-tax inflation risk, I:287
apparent real yield premium, I:293f
effect of inflation and flexible price
CPI, I:292f
features of, I:277
and flexible price CPI, I:291f
and inflation, I:290, I:294
performance link with short-term
inflation, I:291-292
real yields on, I:278
spread to nominal yield curve,
I:281f
volatility of, I:288-290, I:294
vs. real yield, I:293-294
10-year data, I:279-280
yield of, I:284
yields from, I:278

TLF model, strengths of, III:388-389
Total asset turnover ratio, II:558
Total return reports, II:237t
Total return swaps, I:540-542,
I:541-542
Trace test statistic, II:392
Tracking error
actual vs. predicted, II:69
alternate definitions of, II:67-68
defined, II:115, II:119
estimates of future, II:69
as measure of consistency, II:99-100
reduction of, II:262-263
standard definition, II:67
with TIPS, I:293
Tracking error volatility (TEV). See
TEV (tracking error volatility)
Trade optimizers, role of, II:116-117
Trades
amount needed for market impact,
III:624
cash-and-carry, I:487
crossing of, II:75
importance of execution of, III:623,
III:631
measurement of size, III:628
in portfolio construction, II:104,
II:116-117
round-trip time of, II:451
size effects of, III:372, III:630
speed of, II:105
timing of, III:628-629
Trading costs, II:118, III:627-628,
III:631-632
Trading gains, defined, I:122, I:123
Trading horizons, extending, III:624
Trading lists, II:289f
Trading strategies
backtesting of, II:236-237
categories of, II:195
in continuous-state,
continuous-time, I:122
development of factor-based,
II:197-198, II:211
factor-based, II:195, II:232-235
factor weights in, II:233f
in multiperiod settings, I:105
risk to, II:198-200
self-financing, I:126-127, I:136
Trading venues, electronic, II:57
Training windows, moving, II:713-714
Tranches, III:38, III:39t, III:45
Transaction costs
in backtesting, II:235
in benchmarking, II:67
components of, II:119
consideration of, II:64, II:85-86n
dimensions of, III:631
effect of, I:483
figuring, II:85n
fixed, II:72-73
forecasting of, II:113-114
incorporation of, II:69-73, II:84
international, III:629
linear, II:70
and liquidity, III:624-625
managing, III:146
measurement of, III:626
piecewise-linear, II:70-72, II:71f
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quadratic, 11:7 2 
in risk modeling, II:693
types of, III:623
Transformations, nonlinear,
III:630-631
Transition probabilities, I:368, I:381f
Treasuries
correlations of, III:405f
covariance matrix of, III:406f
curve risk, II:277t
discount function for, III:564-565
futures, I:482
inflation-indexed, I:286
movements of, III:403f
on-the-run, I:227, III:7, III:560
par yield curve, I:218f
spot rates, I:220
3-month, II:415-416, II:416f
volatility of, III:404-406, III:406f
Treasury bill rates, weekly data, I:89f
Treasury inflation-protected securities
(TIPS). See TIPS (Treasury
inflation-protected securities)
Treasury Regulation T (Reg T), I:67
Treasury securities, I:210-211
comparable, defined, III:5
in futures contracts, I:483
hypothetical, illustration of
duration/convexity,
III:308-310, III:308t
maturities of, I:226
options on, I:490
par rates for, I:217
prediction of 10-year yield,
II:322-328
valuation of, I:216
yield of, II:324-327t
Treasury strips, I:218t, I:220-221, I:286,
III:560
Treasury yield curves, I:226, III:561
Trees/lattices
adjusted to current market price,
I:496f
bushy trees, I:265, I:266f
calibrated, I:495
convertible bond value, I:274-275
extended pricing tree, III:23f
from historical data, III:131f
pruning of, II:377
stock price, I:274
three-period scenario, III:131f
trinomial, I:81, I:273, I:495-496
use of in modeling, I:494-496
Trees/lattices, binomial
building of, I:273
for convertible bonds, I:275f
discussion of, I:80-81
interest rate, I:244
model of, I:273-275
stock price model, III:173
term structure evolution, I:495f
use of, I:114-115, I:114f
Trends
deterministic, II:383
in financial time series, II:504
and integrated series, II:512-514
stochastic, II:383, II:384
Treynor-Black model, I:203n
Trinomial stochastic models, II:11-12
Truncated Levy flight (TLF), III:382,
III:384-386
IDD in, III:386
time scaling of, III:385f
Truncation, III:385-386
Truth in Savings Act, II:615
T-statistic, II:240n, II:336, II:350, II:390
Tuple, defined, III:157
Turnover
assessment of, III:68
defined, III:66
in MBSs, III:48
in portfolios, II:234, II:235
Two beta trap, I:74-77
Two-factor models, III:553-554
Two-stage growth model, II:9

U.K. index-linked gilts, tax treatment
of, I:287
Uncertainties
and Bayesian statistics, I:140
in measurement processes, II:367
modeling of, II:306, III:124,
III:131-132
and model risk, II:729
quantification of, I:101
representation of, III:128
time behavior of, II:359
Uncertainty sets
effect of size of, III:143
in portfolio allocation, II:80
selection of, III:140-141
structured, III:143-144
in three dimensions, II:81f
use of, III:138, III:140
Uncertain volatility model, II:673-674
Underperformance, finding reasons
for, II:118
Underwater, on homeowner's equity,
III:73
Unemployment rate
as an economic measure, II:398
application of TAR models to,
II:405-406
characteristics of series, II:430
forecasts from, II:433
performance of forecasting,
II:432-433, II:432f
and risk, II:292n
test of nonlinearity, II:431, II:431f
time plot of, II:406f, II:430f
Uniqueness, theorem of, III:490
Unit root series, II:385
Univariate linear regression model,
I:163-170
Univariate stationary series, II:504
U.S. Bankruptcy Code. See also
bankruptcy
Chapter 7, I:350
Chapter 11, I:342, I:350
Utility, I:56, II:469, II:471, II:719-720

Validation, out of sample, II:711
Valuation

arbitrage-free, I:216-217, I:220-222, I:221t
and cash flows, I:223
defined, I:209
effect of business cycle on, I:303-304
fundamental principle of, I:209
with Monte Carlo simulation, III:6-12
of natural gas/oil storage, I:560-561
of non-Treasury securities, I:222-223
relative, I:225, II:34-40, II:44-45
risk-neutral, I:557, III:595-596, III:601
total firm, II:21-23
uncertainty in, II:15
use of lattices for, I:240
Value

absolute vs. relative basis of, I:259-260
analysis of relative, I:225
arbitrage-free, I:221
book vs. market of firms, II:559-560
determining present, II:600-601
formulas for analysis of, II:238-239
identification of relative, I:405
intrinsic, I:484-485
present, discounted, II:601f
relative, I:405, II:37-38
vs. price, I:455n

Value at risk (VaR). See also CVaR (credit value at risk)
in backtesting, II:748
backtesting of, II:749f, III:325-327, III:365-367
boxplot of, III:325f
and coherent risk measures, III:329
conditional, III:332, III:355-356, III:382
deficiencies in, I:407, III:321, III:331-332, III:347
defined, II:754n, III:319-322
density and distribution functions, III:320f
determining from simulation, III:639f
distribution-free confidence intervals for, III:292-293
estimation of, II:366, III:289-290, III:373-376, III:644, III:644f
exceedances of, III:325-326
IDD in, III:290
interest rate covariance matrix in, III:403
levels of confidence with, III:290-291
liquidity-adjusted, III:374, III:376
in low market volatility, II:748
measurements by, II:354
methods of computation, III:323
modeling of, II:130-131, III:375-376
and model risk, II:695
normal against confidence level, III:294f
portfolio problem, I:193
in practice, III:321-325
relative spreads between predictions, II:750f, II:751f, II:752f
as safety-first risk measure, III:355
standard normal distribution of, III:324t
use of, II:365
vs. deviation measures, III:320-321
Value of operations, process for finding, II:307
Values, lagged, II:130
Van der Corput sequences, III:650
Variables

antithetic, III:647-648
application of macro, II:193n
behavior of, III:152-153
categorical, II:333-334, II:350
classification, II:176
declaration of in VBL, III:457-458
dependence between, II:306-307
dependent categorical, II:348-350
dependent/independent in CAPM, I:67
dichotomous, II:350
dummy, II:334
exogenous vs. endogenous, II:692
fat-tailed, III:280
independent and identically distributed, II:125
independent categorical, II:333-348
interactions between, II:378
large numbers of, II:147
macroeconomic, II:54-55, II:177
in maximum likelihood calculations, II:312-313
mixing of categorical and quantitative, II:334-335
nonstationary, II:388-393
as observation or measurement, II:306
random, I:159n
in regression analysis, II:330
separable, II:647
slope, III:553
split formation of, III:130f
spread, II:336
standardization of, II:205
stationary, II:385, II:386
stationary/nonstationary, II:384-386
stochastic, III:159-164
use of dummy, II:335, II:343-344
Variables, random, II:297
α-stable, III:242-244, III:244-245
Bernoulli, III:169
continuous, III:200-201, III:205-206
on countable spaces, III:160-161, III:166
defined, III:162
discrete, III:165
infinitely divisible, III:253
in probability, III:159-164
sequences of, I:389
on uncountable spaces, III:161-162
use of, I:82

Variance gamma process, III:499, III:504
Variance matrix, II:370-371
Variances

addressing inequality of, I:168
based on covariance matrix, II:161t, II:163t, II:164f
conditional, I:180
conditional/unconditional, II:361
in dispersion parameters, III:202-203
equal, I:164
as measure of risk, I:8
in probability, III:167-169
reduction in, III:647-651
unequal, I:167-168, I:172
Variances/covariances, II:112-113, II:302-303, III:395-396
Variance swaps, I:545-547, I:549, I:552
Variational formulation, and finite element space, II:670-672
Variation margins, I:478
Vasicek model

with change of time, III:523-524
for coupon-bond call options, I:501-502
distribution of, I:493
in history, I:491
for short rates, III:545-546
use of, I:89, I:497
valuing zero-coupon bond calls

It is often said that investment management is an art, not a science.
However, since the early 1990s the market has witnessed a progressive
shift toward a more industrial view of the investment management
process. There are several reasons for this change. First, with
globalization the universe of investable assets has grown many times
over. Asset managers might have to choose from among several thousand
possible investments from around the globe. Second, institutional
investors, often together with their consultants, have encouraged asset
management firms to adopt an increasingly structured process with
documented steps and measurable results. Pressure from regulators and
the media is another factor. Finally, the sheer size of the markets
makes it imperative to adopt safe and repeatable methodologies.

In its modern sense, financial modeling is the design (or engineering)
of financial instruments and portfolios of financial instruments that
result in predetermined cash flows contingent upon different events.
Broadly speaking, financial models are employed to manage investment
portfolios and risk. The objective is the transfer of risk from one
entity to another via appropriate financial arrangements. Though the
aggregate risk is a quantity that cannot be altered, risk can be
transferred if there is a willing counterparty.

Financial modeling came to the forefront of finance in the 1980s, with
the broad diffusion of derivative instruments. However, the concept and
practice of financial modeling are quite old. The notion of the
diversification of risk (central to modern risk management) and the
quantification of insurance risk (a requisite for pricing insurance
policies) were already understood, at least in practical terms, in the
14th century. The rich epistolary of Francesco Datini, a 14th-century
merchant, banker, and insurer from Prato (Tuscany, Italy), contains
detailed instructions to his agents on how to diversify risk and insure
cargo.

What is specific to modern financial modeling is the quantitative
management of risk. Both the pricing of contracts and the optimization
of investments require some basic capabilities of statistical modeling
of financial contingencies. It is the size, diversity, and efficiency of
modern competitive markets that make the use of financial modeling
imperative.

This three-volume encyclopedia not only covers the fundamentals of and
advances in financial modeling but also provides the mathematical and
statistical techniques needed to develop and test financial models, and
addresses the practical issues associated with implementation. The
encyclopedia offers the following unique features:

• The entries for the encyclopedia were written by experts from around
the world. This diverse collection of expertise has created the most
definitive coverage of established and cutting-edge financial models,
applications, and tools in this ever-evolving field.

• The series emphasizes both technical and managerial issues. This
approach provides researchers, educators, students, and practitioners
with a balanced understanding of the topics and the necessary background
to deal with issues related to financial modeling.

• Each entry follows a format that includes the author, entry abstract,
introduction, body, listing of key points, notes, and references. This
enables readers to pick and choose among various sections of an entry,
and creates consistency throughout the entire encyclopedia.

• The numerous illustrations and tables throughout the work highlight
complex topics and assist further understanding.

• Each volume includes a complete table of contents and index for easy
access to various parts of the encyclopedia.

TOPIC CATEGORIES 

As is the practice in the creation of an encyclopedia, the topic
categories are presented alphabetically. The topic categories and a
brief description of each topic follow.

VOLUME I 
Asset Allocation 

A major activity in the investment management process is establishing
policy guidelines to satisfy the investment objectives. Setting policy
begins with the asset allocation decision. That is, a decision must be
made as to how the funds to be invested should be distributed among the
major asset classes (e.g., equities, fixed income, and alternative asset
classes). The term "asset allocation" includes (1) policy asset
allocation, (2) dynamic asset allocation, and (3) tactical asset
allocation. Policy asset allocation decisions can loosely be
characterized as long-term asset allocation decisions, in which the
investor seeks to assess an appropriate long-term "normal" asset mix
that represents an ideal blend of controlled risk and enhanced return.
In dynamic asset allocation the asset mix (i.e., the allocation among
the asset classes) is mechanistically shifted in response to changing
market conditions. Once the policy asset allocation has been
established, the investor can turn his or her attention to the
possibility of active departures from the normal asset mix established
by policy. If a decision to deviate from this mix is based upon rigorous
objective measures of value, it is often called tactical asset
allocation. The fundamental model used in establishing the policy asset
allocation is the mean-variance portfolio model formulated by Harry
Markowitz in 1952, popularly referred to as the theory of portfolio
selection or modern portfolio theory.

Asset Pricing Models 

Asset pricing models seek to formalize the relationship that should
exist between asset returns and risk if investors behave in a
hypothesized manner. At its most basic level, asset pricing is mainly
about transforming asset payoffs into prices. The two best-known asset
pricing models are the arbitrage pricing theory and the capital asset
pricing model. The fundamental theorem of asset pricing asserts the
equivalence of three key issues in finance: (1) absence of arbitrage;
(2) existence of a positive linear pricing rule; and (3) existence of an
investor who prefers more to less and who has maximized his or her
utility. There are two types of arbitrage opportunities. The first is
paying nothing today and obtaining something in the future, and the
second is obtaining something today with no future obligations.
Although the principle of absence of arbitrage is fundamental for
understanding asset valuation in a competitive market, there are
well-known limits to arbitrage resulting from restrictions imposed on
rational traders, and, as a result, pricing inefficiencies may exist for
a period of time.

Bayesian Analysis and Financial 
Modeling Applications 

Financial models describe in mathematical terms the relationships
between financial random variables through time and/or across assets.
The fundamental assumption is that the model relationship is valid
independent of the time period or the asset class under consideration.
Financial data contain both meaningful information and random noise. An
adequate financial model not only extracts optimally the relevant
information from the historical data but also performs well when tested
with new data. The uncertainty brought about by the presence of data
noise makes imperative the use of statistical analysis as part of the
process of financial model building, model evaluation, and model
testing. Statistical analysis is employed from the vantage point of
either of the two main statistical philosophical traditions: frequentist
and Bayesian. An important difference between the two lies with the
interpretation of the concept of probability. As the name suggests,
advocates of the frequentist approach interpret the probability of an
event as the limit of its long-run relative frequency (i.e., the
frequency with which it occurs as the amount of data increases without
bound). Since the time financial models became a mainstream tool to aid
in understanding financial markets and formulating investment
strategies, the framework applied in finance has been the frequentist
approach. However, strict adherence to this interpretation is not always
possible in practice. When studying rare events, for instance, large
samples of data may not be available, and in such cases proponents of
frequentist statistics resort to theoretical results. The Bayesian view
of the world is based on the subjectivist interpretation of probability:
Probability is subjective, a degree of belief that is updated as
information or data are acquired. Only in the last two decades has
Bayesian statistics started to gain greater acceptance in financial
modeling, despite its introduction about 250 years ago. It has been the
advancements of computing power and the development of new computational
methods that have fostered the growing use of Bayesian statistics in
financial modeling.
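
The contrast between the two interpretations can be made concrete with a
small sketch. The Python fragment below is an illustration only, not a
method taken from the encyclopedia: it assumes a Beta prior on the
probability that a stock closes up on a given day and a hypothetical
sample of eight daily outcomes. The function names and the data are
assumptions introduced for the example.

```python
# Illustrative sketch: conjugate Beta-Bernoulli updating of a degree of belief.
# The prior parameters and the observations are hypothetical.

def update_beta(alpha, beta, observations):
    """Return the posterior Beta parameters after observing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior: no initial view on the probability of an "up" day.
prior = (1.0, 1.0)

# Hypothetical sample: 1 = up day, 0 = down day.
up_days = [1, 0, 1, 1, 0, 1, 1, 1]

posterior = update_beta(prior[0], prior[1], up_days)

# Frequentist point estimate: the observed relative frequency.
relative_frequency = sum(up_days) / len(up_days)

# Bayesian point estimate: the posterior mean, which blends prior and data.
posterior_mean = beta_mean(*posterior)

print("relative frequency:", relative_frequency)
print("posterior parameters:", posterior)
print("posterior mean:", posterior_mean)
```

With a small sample the posterior mean is pulled toward the prior, while
the relative frequency uses the data alone; as more observations arrive
the two estimates move closer together.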

Bond Valuation 

The value of any financial asset is the present value of its expected
future cash flows. To value a bond (also referred to as a fixed-income
security), one must be able to estimate the bond's remaining cash flows
and identify the appropriate discount rate(s) at which to discount the
cash flows. The traditional approach to bond valuation is to discount
every cash flow with the same discount rate. Simply put, the relevant
term structure of interest rates used in valuation is assumed to be
flat. This approach, however, permits opportunities for arbitrage.
Alternatively, the arbitrage-free valuation approach starts with the
premise that a bond should be viewed as a portfolio or package of
zero-coupon bonds. Moreover, each of the bond's cash flows is valued
using a unique discount rate that depends on the term structure of
interest rates and on when the cash flow occurs. The relevant set of
discount rates (that is, spot rates) is derived from an appropriate term
structure of interest rates and, when used to value risky bonds, is
augmented with a suitable risk spread or premium. Rather than using a
model to calculate a bond's fair value, the market price can be taken as
given so as to compute a yield measure or a spread measure. Popular
yield measures are the yield to maturity, yield to call, yield to put,
and cash flow yield. Nominal spread, static (or zero-volatility) spread,
and option-adjusted spread are popular relative value measures quoted in
the bond market. Complications in bond valuation arise when a bond has
one or more embedded options such as call, put, or conversion features.
For bonds with embedded options, the financial modeling draws from
options theory, more specifically, the use of the lattice model to value
a bond with embedded options.
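
The difference between the traditional and the arbitrage-free approach
for an option-free bond can be sketched in a few lines of Python. The
example below is a simplified illustration under stated assumptions:
annual coupon payments, annual compounding, and a hypothetical spot-rate
curve. The function names and all numbers are invented for the example.

```python
# Illustrative sketch: one discount rate versus one spot rate per cash flow.
# Annual payments and annual compounding are simplifying assumptions.

def cash_flows(face, coupon_rate, years):
    """Annual coupons, with the face value repaid at maturity."""
    coupon = face * coupon_rate
    return [coupon + (face if t == years else 0.0) for t in range(1, years + 1)]


def pv_flat(flows, rate):
    """Traditional approach: discount every cash flow at the same rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows, start=1))


def pv_spot(flows, spot_rates):
    """Arbitrage-free approach: treat each cash flow as a zero-coupon bond
    and discount it at the spot rate for its own maturity."""
    if len(flows) != len(spot_rates):
        raise ValueError("one spot rate is needed per cash flow")
    return sum(
        cf / (1 + z) ** t
        for t, (cf, z) in enumerate(zip(flows, spot_rates), start=1)
    )


flows = cash_flows(face=100.0, coupon_rate=0.05, years=3)

# Hypothetical inputs: a single 5% yield versus an upward-sloping spot curve.
price_flat = pv_flat(flows, 0.05)
price_spot = pv_spot(flows, [0.03, 0.04, 0.05])

print("flat-rate value:", round(price_flat, 4))
print("spot-rate value:", round(price_spot, 4))
```

When the spot curve is not flat the two values differ, and that gap is
the kind of discrepancy that stripping a bond into zero-coupon pieces
could exploit. When every spot rate equals the single yield, the two
functions return the same value.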

Credit Risk Modeling 

Credit risk is a broad term used to refer to three types of risk:
default risk, credit spread risk, and downgrade risk. Default risk is
the risk that the counterparty to a transaction will fail to satisfy the
terms of the obligation with respect to the timely payment of interest
and repayment of the amount borrowed. The counterparty could be the
issuer of a debt obligation or an entity on the other side of a private
transaction such as a derivative trade or a collateralized loan
agreement (i.e., a repurchase agreement or a securities lending
agreement). The default risk of a counterparty is often initially gauged
by the credit rating assigned by one of the three rating companies:
Standard & Poor's, Moody's Investors Service, and Fitch Ratings.
Although default risk is the one that most market participants think of
when reference is made to credit risk, even in the absence of default,
investors are concerned about the decline in the market value of their
portfolio bond holdings due to a change in credit spread or the price
performance of their holdings relative to a bond index. When this risk
is due to an adverse change in credit spreads, it is referred to as
credit spread risk; when it is attributed solely to the downgrade of the
credit rating of an entity, it is called downgrade risk. Financial
modeling of credit risk is used (1) to measure, monitor, and control a
portfolio's credit risk, and (2) to price credit risky debt instruments.
There are two general categories of credit risk models: structural
models and reduced-form models. There is considerable debate as to which
type of model is the best to employ.

Derivatives Valuation 

A derivative instrument is a contract whose value depends on some
underlying asset. The term "derivative" is used to describe this product
because its value is derived from the value of the underlying asset. The
underlying asset, simply referred to as the "underlying," can be either
a commodity, a financial instrument, or some reference entity such as an
interest rate or stock index, leading to the classification of commodity
derivatives and financial derivatives. Although there are close
conceptual relations between derivative instruments and cash market
instruments such as debt and equity, the two classes of instruments are
used differently: Debt and equity are used primarily for raising funds
from investors, while derivatives are primarily used for dividing up and
trading risks. Moreover, debt and equity are direct claims against a
firm's assets, while derivative instruments are usually claims on a
third party. A derivative's value depends on the value of the
underlying, but the derivative instrument itself represents a claim on
the "counterparty" to the trade. Derivative instruments are classified
in terms of their payoff characteristics: linear and nonlinear payoffs.
The former, also referred to as symmetric payoff derivatives, include
forward, futures, and swap contracts, while the latter include options.
Basically, a linear payoff derivative is a risk-sharing arrangement
between the counterparties since both are sharing the risk regarding the
price of the underlying. In contrast, nonlinear payoff derivative
instruments (also referred to as asymmetric payoff derivatives) are
insurance arrangements because one party to the trade is willing to
insure the counterparty of a minimum or maximum (depending on the
contract) price. The amount received by the insuring party is referred
to as the contract price or premium. Derivative instruments are used for
controlling risk exposure with respect to the underlying. Hedging is a
special case of risk control where a party seeks to eliminate the risk
exposure. Derivative valuation or pricing is developed based on
no-arbitrage price relations, relying on the assumption that two perfect
substitutes must have the same price.
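
The distinction between linear (symmetric) and nonlinear (asymmetric)
payoffs can be illustrated with a long forward position and a long call
option at expiry. The Python sketch below uses hypothetical prices and
function names chosen for the example; it ignores discounting and
transaction costs.

```python
# Illustrative sketch: payoff at expiry of a long forward versus a long call.
# All prices are hypothetical; discounting and costs are ignored.

def long_forward_payoff(spot_at_expiry, forward_price):
    """Linear payoff: gains and losses mirror each other around the
    agreed forward price."""
    return spot_at_expiry - forward_price


def long_call_profit(spot_at_expiry, strike, premium):
    """Nonlinear payoff: the loss is capped at the premium paid to the
    insuring party, while the gain is open-ended."""
    return max(spot_at_expiry - strike, 0.0) - premium


for spot in (80.0, 90.0, 100.0, 110.0, 120.0):
    forward = long_forward_payoff(spot, forward_price=100.0)
    call = long_call_profit(spot, strike=100.0, premium=3.0)
    print(f"spot={spot:6.1f}  forward={forward:7.1f}  call={call:7.1f}")
```

The forward column moves one-for-one with the underlying in both
directions, which is the risk-sharing arrangement described above. The
call column never falls below the negative of the premium, which is the
insurance-like feature of the asymmetric payoff.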

VOLUME II 

Difference Equations and Differential 
Equations 

The tools of linear difference equations and differential equations have
found many applications in finance. A difference equation is an equation
that involves differences between successive values of a function of a
discrete variable. A function of such a variable is one that provides a
rule for assigning values in sequences to it. The theory of linear
difference equations covers three areas: solving difference equations,
describing the behavior of difference equations, and identifying the
equilibrium (or critical value) and stability of difference equations.
Linear difference equations are important in the context of dynamic
econometric models. Stochastic models in finance are expressed as linear
difference equations with random disturbances added. Understanding the
behavior of solutions of linear difference equations helps develop
intuition for the behavior of these models. In nontechnical terms,
differential equations are equations that express a relationship between
a function and one or more derivatives (or differentials) of that
function. Differential equations are invaluable for modeling situations
in finance where there is a continually changing value. The problem is
that not all changes in value occur continuously. If the change in value
occurs incrementally rather than continuously, then differential
equations have their limitations. Instead, a financial modeler can use
difference equations, which are recursively defined sequences. It would
be difficult to overemphasize the importance of differential equations
in financial modeling, where they are used to express laws that govern
the evolution of price probability distributions, the solution of
economic variational problems (such as intertemporal optimization), and
conditions for continuous hedging (such as in the Black-Scholes option
pricing model). The two broad types of differential equations are
ordinary differential equations and partial differential equations. The
former are equations or systems of equations involving only one
independent variable. Another way of saying this is that ordinary
differential equations involve only total derivatives. Partial
differential equations are differential equations or systems of
equations involving partial derivatives. When one or more of the
variables is a stochastic process, we have the case of stochastic
differential equations and the solution is also a stochastic process. An
assumption must be made about what is driving the noise in a stochastic
differential equation. In most applications, it is assumed that the
noise term is a Gaussian random variable, although other types of random
variables can be assumed.
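
As a small illustration of equilibrium and stability, consider the
first-order linear difference equation x(t+1) = a*x(t) + b. Its
equilibrium is b/(1 - a) when a is not equal to 1, and the solution
converges to that value when |a| < 1. The Python sketch below iterates
the recursion and optionally adds Gaussian disturbances, which turns it
into a simple stochastic model of the kind mentioned above. The
coefficients, function names, and seed are assumptions made for the
example.

```python
# Illustrative sketch: a first-order linear difference equation,
# x[t+1] = a * x[t] + b (+ optional Gaussian disturbance).
import random


def equilibrium(a, b):
    """Fixed point x* that satisfies x* = a * x* + b."""
    if a == 1:
        raise ValueError("no unique equilibrium when a == 1")
    return b / (1 - a)


def iterate(a, b, x0, steps, noise_sd=0.0, seed=0):
    """Generate the sequence defined recursively by the difference equation."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        shock = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        path.append(a * path[-1] + b + shock)
    return path


# Stable case: |a| < 1, so the path converges to the equilibrium.
stable = iterate(a=0.5, b=1.0, x0=10.0, steps=60)

# Unstable case: |a| > 1, so the path moves away from the equilibrium.
unstable = iterate(a=1.5, b=1.0, x0=10.0, steps=20)

# Same stable equation with random disturbances added.
noisy = iterate(a=0.5, b=1.0, x0=10.0, steps=60, noise_sd=0.1)

print("equilibrium:", equilibrium(0.5, 1.0))
print("last value of stable path:", stable[-1])
print("last value of unstable path:", unstable[-1])
print("last value of noisy path:", noisy[-1])
```

The deterministic stable path settles at the equilibrium, while the
noisy path keeps fluctuating around it, which is the intuition the
deterministic solution provides for the stochastic model.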

Equity Models and Valuation 

Traditional fundamental equity analysis involves the analysis of a
company's operations for the purpose of assessing its economic
prospects. The analysis begins with the financial statements of the
company in order to investigate the earnings, cash flow, profitability,
and debt burden. The fundamental analyst will look at the major product
lines, the economic outlook for the products (including existing and
potential competitors), and the industries in which the company
operates. The result of this analysis will be the growth prospects of
earnings. Based on the growth prospects of earnings, a fundamental
analyst attempts to determine the fair value of the stock using one or
more equity valuation models. The two most commonly used approaches for
valuing a firm's equity are based on discounted cash flow and relative
valuation models. The principal idea underlying discounted cash flow
models is that what an investor pays for a share of stock should reflect
what is expected to be received from it, that is, the return on the
investor's investment. What an investor receives are cash dividends in
the future. Therefore, the value of a share of stock should be equal to
the present value of all the future cash flows an investor expects to
receive from that share. To value stock, therefore, an investor must
project future cash flows, which, in turn, means projecting future
dividends. Popular discounted cash flow models include the basic
dividend discount model, which assumes a constant dividend growth, and
the multiple-phase models, which include the two-stage dividend growth
model and the stochastic dividend discount models. Relative valuation
methods use multiples or ratios (such as price/earnings, price/book, or
price/free cash flow) to determine whether a stock is trading at higher
or lower multiples than its peers. There are two critical assumptions in
using relative valuation: (1) the firms selected to be included in the
peer group are in fact comparable, and (2) the average multiple across
the universe of firms can be treated as a reasonable approximation of
"fair value" for those firms. This second assumption may be problematic
during periods of market panic or euphoria. Managers of quantitative
equity firms employ techniques that allow them to identify attractive
stock candidates, focusing not on a single stock as is done with
traditional fundamental analysis but rather on stock characteristics in
order to explain why one stock outperforms another stock. They do so by
statistically identifying a group of characteristics to create a
quantitative selection model. In contrast to the traditional fundamental
stock selection, quantitative equity managers create a repeatable
process that utilizes the stock selection model to identify attractive
stocks. Equity portfolio managers have used various statistical models
for forecasting returns and risk. These models, referred to as
predictive return models, make conditional forecasts of expected returns
using the current information set. Predictive return models include
regressive models, linear autoregressive models, dynamic factor models,
and hidden-variable models.
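
The two deterministic dividend discount models mentioned above can be
sketched as follows. This is a minimal illustration under standard
textbook assumptions: dividends are paid annually, the required return r
exceeds the long-run growth rate g, and the constant-growth value is
D1 / (r - g). The function names and numerical inputs are hypothetical.

```python
# Illustrative sketch: constant-growth and two-stage dividend discount models.
# Annual dividends and a constant required return are simplifying assumptions.

def constant_growth_value(next_dividend, required_return, growth):
    """Present value of a dividend stream growing forever at a constant rate."""
    if required_return <= growth:
        raise ValueError("required return must exceed the growth rate")
    return next_dividend / (required_return - growth)


def two_stage_value(current_dividend, required_return,
                    high_growth, high_growth_years, stable_growth):
    """Discount dividends year by year during the first phase, then add the
    discounted constant-growth value at the end of that phase."""
    value = 0.0
    dividend = current_dividend
    for year in range(1, high_growth_years + 1):
        dividend *= 1 + high_growth
        value += dividend / (1 + required_return) ** year
    terminal = constant_growth_value(
        dividend * (1 + stable_growth), required_return, stable_growth
    )
    return value + terminal / (1 + required_return) ** high_growth_years


# Hypothetical inputs: next-year dividend of 2.00, 10% required return, 5% growth.
print("constant growth:", round(constant_growth_value(2.0, 0.10, 0.05), 2))

# Hypothetical inputs: 12% growth for five years, then 4% thereafter.
print("two stage:", round(two_stage_value(1.5, 0.10, 0.12, 5, 0.04), 2))
```

A useful consistency check is that the two-stage function collapses to
the constant-growth value when the two growth rates are equal.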

Factor Models and Portfolio 
Construction 

Quantitative asset managers typically employ multifactor risk models for
the purpose of constructing and rebalancing portfolios and analyzing
portfolio performance. A multifactor risk model, or simply factor model,
attempts to estimate and characterize the risk of a portfolio, either
relative to a benchmark such as a market index or in absolute value. The
model allows the decomposition of risk into a systematic and an
idiosyncratic component. The portfolio's risk exposure to broad risk
factors is captured by the systematic risk. For equity portfolios these
are typically fundamental factors (e.g., market capitalization and value
vs. growth), technical factors (e.g., momentum), and
industry/sector/country factors. For fixed-income portfolios, systematic
risk captures a portfolio's exposure to broad risk factors such as the
term structure of interest rates, credit spreads, optionality (call and
prepayment), credit, and sectors. The portfolio's systematic risk
depends not only on its exposure to these risk factors but also on the
volatility of the risk factors and how they correlate with each other.
In contrast to systematic risk, idiosyncratic risk captures the
uncertainty associated with news affecting the holdings of individual
issuers in the portfolio. In equity portfolios, idiosyncratic risk can
be easily diversified by reducing the importance of individual issuers
in the portfolio. Because of the larger number of issuers in bond
indexes, however, this is a difficult task. There are different types of
factor models depending on the factors. Factors can be exogenous
variables or abstract variables formed by portfolios. Exogenous factors
(or known factors) can be identified from traditional fundamental
analysis or from economic theory that suggests macroeconomic factors.
Abstract factors, also called unidentified or latent factors, can be
determined with the statistical tool of factor analysis or principal
component analysis. The simplest type of factor model is one where the
factors are assumed to be known or observable, so that time-series data
on those factors can be used to estimate the model. The four most
commonly used approaches for the evaluation of return premiums and risk
characteristics to factors are portfolio sorts, factor models, factor
portfolios, and information coefficients. Despite their use by
quantitative asset managers, the basic building blocks of factor models
used by model builders and by traditional fundamental analysts are the
same: They both seek to identify the drivers of returns for the asset
class being analyzed.
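
The split of portfolio risk into systematic and idiosyncratic components
can be sketched with a linear factor model. The example below assumes
the common textbook form in which portfolio variance equals b'Fb plus
the weighted sum of specific variances, where b is the vector of
portfolio factor exposures and F is the factor covariance matrix, and
where specific returns are uncorrelated across issuers. The function
names, loadings, and covariances are hypothetical numbers chosen for the
illustration.

```python
# Illustrative sketch: variance decomposition under a linear factor model.
# Assumes uncorrelated specific (idiosyncratic) returns; all inputs are hypothetical.

def portfolio_exposures(weights, loadings):
    """Weighted average of the asset factor loadings, one value per factor."""
    n_factors = len(loadings[0])
    return [
        sum(w * row[k] for w, row in zip(weights, loadings))
        for k in range(n_factors)
    ]


def risk_decomposition(weights, loadings, factor_cov, specific_var):
    """Return (systematic variance, idiosyncratic variance)."""
    b = portfolio_exposures(weights, loadings)
    n = len(b)
    systematic = sum(
        b[j] * factor_cov[j][k] * b[k] for j in range(n) for k in range(n)
    )
    idiosyncratic = sum(w * w * s for w, s in zip(weights, specific_var))
    return systematic, idiosyncratic


# Two assets, two factors (for example, a market factor and a style factor).
loadings = [[1.0, 0.2], [1.0, -0.2]]
factor_cov = [[0.04, 0.0], [0.0, 0.01]]
specific_var = [0.02, 0.02]

systematic, idiosyncratic = risk_decomposition(
    [0.5, 0.5], loadings, factor_cov, specific_var
)
print("systematic variance:", systematic)
print("idiosyncratic variance:", idiosyncratic)

# Equal weights across more issuers shrink the idiosyncratic part.
for n_assets in (2, 10, 50):
    w = [1.0 / n_assets] * n_assets
    _, idio = risk_decomposition(
        w, [[1.0, 0.0]] * n_assets, factor_cov, [0.02] * n_assets
    )
    print(n_assets, "issuers -> idiosyncratic variance", idio)
```

The loop at the end shows the diversification effect described above:
with equal weights and identical specific variances, the idiosyncratic
variance falls in proportion to the number of issuers, while the
systematic variance is unaffected.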

Financial Econometrics 

Econometrics is the branch of economics that draws heavily on statistics
for testing and analyzing economic relationships. The economic
equivalent of the laws of physics, econometrics represents the
quantitative, mathematical laws of economics. Financial econometrics is
the econometrics of financial markets. It is a quest for models that
describe financial time series such as prices, returns, interest rates,
financial ratios, defaults, and so on. Although there are similarities
between financial econometric models and models of the physical
sciences, there are two important differences. First, the physical
sciences aim at finding immutable laws of nature; econometric models
model the economy or financial markets, which are artifacts subject to
change. Because of this, econometric models are not unique
representations valid throughout time; they must adapt to the changing
environment. Second, while basic physical laws are expressed as
differential equations, financial econometrics uses both continuous-time
and discrete-time models.

Financial Modeling Principles 

The origins of financial modeling can be traced back to the development
of mathematical equilibrium at the end of the nineteenth century,
followed in the beginning of the twentieth century with the introduction
of sophisticated mathematical tools for dealing with the uncertainty of
prices and returns. In the 1950s and 1960s, financial modelers had tools
for dealing with probabilistic models for describing markets, the
principles of contingent claims analysis, an optimization framework for
portfolio selection based on mean and variance of asset returns, and an
equilibrium model for pricing capital assets. The 1970s ushered in
models for pricing contingent claims and a new model for pricing capital
assets based on arbitrage pricing. Consequently, by the end of the
1970s, the frameworks for financial modeling were well known. It was the
advancement of computing power and refinements of the theories to take
into account real-world market imperfections and conventions starting in
the 1980s that facilitated implementation and broader acceptance of
mathematical modeling of financial decisions. The diffusion of low-cost
high-performance computers has allowed the broad use of numerical
methods, changing the landscape of financial modeling. The importance of
finding closed-form solutions and the consequent search for simple
models has been dramatically reduced. Computationally intensive methods
such as Monte Carlo simulations and the numerical solution of
differential equations are now widely used. As a consequence, it has
become feasible to represent prices and returns with relatively complex
models. Nonnormal probability distributions have become commonplace in
many sectors of financial modeling. It is fair to say that the key
limitation of financial modeling is now the size of available data
samples or training sets, not the computations; it is the data that
limit the complexity of estimates. Mathematical modeling has also
undergone major changes. Techniques such as equivalent martingale
methods are being used in derivative pricing, and cointegration, the
theory of fat-tailed processes, and state-space modeling (including
ARCH/GARCH and stochastic volatility models) are being used in financial
modeling.

Financial Statement Analysis 

Much of the financial data that are used in 
constructing financial models for forecasting 
and valuation purposes are drawn from the financial
statements that companies are required to
provide to investors. The four basic financial 
statements are the balance sheet, the income 
statement, the statement of cash flows, and 
the statement of shareholders' equity. It is im¬ 
portant to understand these data so that the 
information conveyed by them is interpreted 
properly in financial modeling. The financial 
statements are created using several assump¬ 
tions that affect how to use and interpret the 
financial data. The analysis of financial state¬ 
ments involves the selection, evaluation, and 
interpretation of financial data and other pertinent
information to assist in evaluating the
operating performance and financial condition 
of a company. The operating performance of a 
company is a measure of how well a company 
has used its resources—its assets, both tangible 
and intangible—to produce a return on its in¬ 
vestment. The financial condition of a company 
is a measure of its ability to satisfy its obliga¬ 
tions, such as the payment of interest on its 
debt in a timely manner. There are many tools 
available in the analysis of financial informa¬ 
tion. These tools include financial ratio analysis 
and cash flow analysis. Cash flows are essen¬ 
tial ingredients in valuation. Therefore, under¬ 
standing past and current cash flows may help 
in forecasting future cash flows and, hence, in
determining the value of the company. Moreover,
understanding cash flow allows the assessment 
of the ability of a firm to maintain current divi¬ 
dends and its current capital expenditure policy 
without relying on external financing. Financial 
modelers must understand how to use these fi¬ 
nancial ratios and cash flow information in the 
most effective manner in building models. 

Finite Mathematics and Basic Functions 
for Financial Modeling 

The collection of mathematical tools that does 
not include calculus is often referred to as 
"finite mathematics." This includes matrix 
algebra, probability theory, and statistical anal¬ 
ysis. Ordinary algebra deals with operations 
such as addition and multiplication performed 
on individual numbers. In financial modeling, 
it is useful to consider operations performed on 
ordered arrays of numbers. Ordered arrays of 
numbers are called vectors and matrices while 
individual numbers are called scalars. Prob¬ 
ability theory is the mathematical approach 
to formalize the uncertainty of events. Even 
though a decision maker may not know which 
one of the set of possible events may finally 
occur, with probability theory a decision maker 
has the means of providing each event with
a certain probability. Furthermore, it provides
the decision maker with the axioms to compute 
the probability of a composed event in a 
unique way. The rather formal environment 
of probability theory translates in a reasonable 
manner to the problems related to risk and 
uncertainty in finance such as, for example, the 
future price of a financial asset. Today, investors 
may be aware of the price of a certain asset, but 
they cannot say for sure what value it might 
have tomorrow. To make a prudent decision, 
investors need to assess the possible scenarios 
for tomorrow's price and assign to each sce¬ 
nario a probability of occurrence. Only then can 
investors reasonably determine whether the 
financial asset satisfies an investment objective 
included within a portfolio. Probability models 
are theoretical models of the occurrence of 
uncertain events. In contrast, statistics is about 
empirical data and can be broadly defined as 
a set of methods used to make inferences from 
a known sample to a larger population that is 
in general unknown. In finance, a particularly
important example is making inferences from 
the past (the known sample) to the future 
(the unknown population). There are impor¬ 
tant mathematical functions with which the 
financial modeler should be acquainted. These 
include the continuous function, the indicator 
function, the derivative of a function, the 
monotonic function, and the integral, as well 
as special functions such as the characteristic 
function of random variables and the factorial, 
the gamma, beta, and Bessel functions. 
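
The scenario reasoning described above can be sketched in a few lines of Python. The prices and probabilities below are hypothetical numbers chosen only for illustration; the sketch assigns a probability to each scenario for tomorrow's price and then computes the probability of a composed event together with the mean and variance of the price.

```python
import math

# Hypothetical scenarios for tomorrow's price and their probabilities
# (illustrative numbers only, not taken from the text).
scenarios = {95.0: 0.2, 100.0: 0.5, 108.0: 0.3}

# Probabilities of mutually exclusive, exhaustive scenarios sum to one.
total_probability = sum(scenarios.values())

# Expected price and variance of the price across the scenarios.
expected_price = sum(price * prob for price, prob in scenarios.items())
variance = sum(prob * (price - expected_price) ** 2
               for price, prob in scenarios.items())

# Probability of a composed event: the price ends at or above 100.
prob_at_least_100 = sum(prob for price, prob in scenarios.items()
                        if price >= 100.0)

print(expected_price, variance, prob_at_least_100)
```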

Liquidity and Trading Costs 

In broad terms, liquidity refers to the ability 
to execute a trade or liquidate a position with 
little or no cost or inconvenience. Liquidity de¬ 
pends on the market where a financial instru¬ 
ment is traded, the type of position traded, and 
sometimes the size and trading strategy of an 
individual trade. Liquidity risks are those as¬ 
sociated with the prospect of imperfect mar¬ 
ket liquidity and can relate to risk of loss or 
risk to cash flows. There are two main aspects
to liquidity risk measurement: the measure¬ 
ment of liquidity-adjusted measures of mar¬ 
ket risk and the measurement of liquidity risks 
per se. Market practitioners often assume that 
markets are liquid—that is, that they can liq¬ 
uidate or unwind positions at going market 
prices—usually taken to be the mean of bid 
and ask prices—without too much difficulty or 
cost. This assumption is very convenient and 
provides a justification for the practice of mark¬ 
ing positions to market prices. However, it is 
often empirically questionable, and the failure 
to allow for liquidity can undermine the mea¬ 
surement of market risk. Because liquidity risk 
is a major risk factor in its own right, port¬ 
folio managers and traders will need to mea¬ 
sure this risk in order to formulate effective 
portfolio and trading strategies. A consider¬ 
able amount of work has been done in the eq¬ 
uity market in estimating liquidity risk. Because 
transaction costs are incurred when buying or 
selling stocks, poorly executed trades can ad¬ 
versely impact portfolio returns and therefore 
relative performance. Transaction costs are clas¬ 
sified as explicit costs such as brokerage and 
taxes, and implicit costs, which include market 
impact cost, price movement risk, and opportu¬ 
nity cost. Broadly speaking, market impact cost 
is the price that a trader has to pay for obtain¬ 
ing liquidity in the market and is a key com¬ 
ponent of trading costs that must be modeled 
so that effective trading programs for execut¬ 
ing trades can be developed. Typical forecast¬ 
ing models for market impact costs are based 
on a statistical factor approach where the in¬ 
dependent variables are trade-based factors or 
asset-based factors. 
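
As a minimal sketch of why marking positions to the mean of bid and ask prices can overstate what a liquidation actually realizes, the following Python fragment computes the cost of selling at the bid rather than at the mid price. The function name and the quotes are hypothetical, and the sketch deliberately ignores market impact, price movement risk, and opportunity cost, which the text identifies as further implicit costs.

```python
def liquidation_cost(bid, ask, quantity):
    """Cost of selling `quantity` units at the bid instead of the mid price."""
    mid = (bid + ask) / 2.0             # price used when marking to market
    half_spread = (ask - bid) / 2.0     # shortfall per unit when selling at the bid
    cost = half_spread * quantity
    cost_fraction = cost / (mid * quantity)  # cost relative to the marked value
    return mid, cost, cost_fraction


# Hypothetical quotes: bid 99.5, ask 100.5, position of 1,000 units.
mid, cost, cost_fraction = liquidation_cost(bid=99.5, ask=100.5, quantity=1_000)
print(mid, cost, cost_fraction)
```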

VOLUME III 

Model Risk and Selection 

Model risk is the risk of error in pricing or 
risk-forecasting models. In practice, model risk 
arises because (1) any model involves simplification
and calibration, and both of these re¬
quire subjective judgments that are prone to er¬ 
ror, and/or (2) a model is used inappropriately. 
Although model risk cannot be avoided, there 
are many ways in which financial modelers can 
manage this risk. These include (1) recogniz¬ 
ing model risk, (2) identifying, evaluating, and 
checking the model's key assumptions, (3) selecting
the simplest reasonable model, (4) resist¬
ing the temptation to ignore small discrepancies 
in results, (5) testing the model against known 
problems, (6) plotting results and employing 
nonparametric statistics, (7) back-testing and 
stress-testing the model, (8) estimating model 
risk quantitatively, and (9) reevaluating mod¬ 
els periodically. In financial modeling, model 
selection requires a blend of theory, creativity, 
and machine learning. The machine-learning 
approach starts with a set of empirical data that 
the financial modeler wants to explain. Data are 
explained by a family of models that include 
an unbounded number of parameters and are 
able to fit data with arbitrary precision. There 
is a trade-off between model complexity and 
the size of the data sample. To implement this 
trade-off, ensuring that models have forecast¬ 
ing power, the fitting of sample data is con¬ 
strained to avoid fitting noise. Constraints are 
embodied in criteria such as the Akaike infor¬ 
mation criterion or the Bayesian information 
criterion. Economic and financial data are gen¬ 
erally scarce given the complexity of their pat¬ 
terns. This scarcity introduces uncertainty as 
regards statistical estimates obtained by the fi¬ 
nancial modeler. It means that the data might 
be compatible with many different models with 
the same level of statistical confidence. Methods 
of probabilistic decision theory can be used to 
deal with model risk due to uncertainty regard¬ 
ing the model's parameters. Probabilistic deci¬ 
sion making starts from the Bayesian inference 
process and involves computer simulations in 
all realistic situations. Since a risk model is typi¬ 
cally a combination of a probability distribution 
model and a risk measure, a critical assump¬ 
tion is the probability distribution assumed for 
the random variable of interest. Too often, the
Gaussian distribution is the model of choice. 
Empirical evidence supports the use of proba¬ 
bility distributions that exhibit fat tails such as 
the Student's t distribution and its asymmetric 
version and the Pareto stable class of distribu¬ 
tions and their tempered extensions. Extreme 
value theory offers another approach for risk 
modeling. 
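
The trade-off between fit and complexity embodied in the Akaike and Bayesian information criteria can be illustrated with a short Python sketch. It assumes Gaussian errors, so that the maximized log-likelihood follows from the residual sum of squares, and it compares two hypothetical fits whose residual sums of squares and parameter counts are invented for illustration. Conventions differ on whether the error variance is counted among the k parameters; here it is not.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized Gaussian log-likelihood given the residual sum of squares."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * k - 2.0 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2.0 * log_likelihood


# Hypothetical fits on the same n = 50 observations:
# model A has 2 parameters, model B has 6 and a slightly smaller RSS.
n = 50
ll_a = gaussian_log_likelihood(rss=12.0, n=n)
ll_b = gaussian_log_likelihood(rss=11.5, n=n)

aic_a, aic_b = aic(ll_a, 2), aic(ll_b, 6)
bic_a, bic_b = bic(ll_a, 2, n), bic(ll_b, 6, n)

# The richer model fits better, but not by enough to pay for four more parameters.
print(aic_a < aic_b, bic_a < bic_b)
```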

Mortgage-Backed Securities Analysis 
and Valuation 

Mortgage-backed securities are fixed-income 
securities backed by a pool of mortgage loans. 
Residential mortgage-backed securities (RMBS) 
are backed by a pool of residential mortgage 
loans (one-to-four family dwellings). The RMBS 
market includes agency RMBS and nonagency 
RMBS. The former are securities issued by 
the Government National Mortgage Associa¬ 
tion (Ginnie Mae), Fannie Mae, and Freddie 
Mac. Agency RMBS include passthrough secu¬ 
rities, collateralized mortgage obligations, and 
stripped mortgage-backed securities (interest- 
only and principal-only securities). The valua¬ 
tion of RMBS is complicated due to prepayment 
risk, a form of call risk. In contrast, nonagency 
RMBS are issued by private entities, have no 
implicit or explicit government guarantee, and 
therefore require one or more forms of credit 
enhancement in order to be assigned a credit 
rating. The analysis of nonagency RMBS must 
take into account both prepayment risk and 
credit risk. The most commonly used method 
for valuing RMBS is the Monte Carlo method, 
although other methods have garnered favor, 
in particular the decomposition method. The 
analysis of RMBS requires an understanding of 
the factors that impact prepayments. 

Operational Risk 

Operational risk has been regarded as a mere 
part of a financial institution's "other" risks. 
However, failures of major financial entities
have made regulators and investors aware of
the importance of this risk. In general terms, 
operational risk is the risk of loss resulting from 
inadequate or failed internal processes, people, 
or systems or from external events. This risk 
encompasses legal risk, which includes, but is
not limited to, exposure to fines, penalties, or 
punitive damages resulting from supervisory 
actions, as well as private settlements. Opera¬ 
tional risk can be classified according to several 
principles: nature of the loss (internally inflicted 
or externally inflicted), direct losses or indirect 
losses, degree of expectancy (expected or unex¬ 
pected), risk type, event type or loss type, and 
by the magnitude (or severity) of loss and the 
frequency of loss. Operational risk can be the 
cause of reputational risk, a risk that can occur 
when the market reaction to an operational loss 
event results in reduction in the market value 
of a financial institution that is greater than the 
amount of the initial loss. The two principal 
approaches in modeling operational loss dis¬ 
tributions are the nonparametric approach and 
the parametric approach. It is important to em¬ 
ploy a model that captures tail events, and for 
this reason in operational risk modeling, dis¬ 
tributions that are characterized as light-tailed 
distributions should be used with caution. The 
models that have been proposed for assessing 
operational risk can be broadly classified into 
top-down models and bottom-up models. Top- 
down models quantify operational risk without 
attempting to identify the events or causes of 
losses. Bottom-up models quantify operational 
risk on a micro level, being based on identified 
internal events. The obstacle hindering the im¬ 
plementation of these models is the scarcity of 
available historical operational loss data. 
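
One way to make the frequency and severity classification concrete is a small simulation of aggregate annual losses. The Python sketch below is a parametric illustration under assumptions that are not taken from the text: the number of loss events per year is Poisson and individual loss sizes are lognormal, with made-up parameter values. It shows how far a high quantile of the simulated loss distribution can lie from the mean, which is why tail behavior matters.

```python
import math
import random


def simulate_annual_losses(lam, mu, sigma, n_years, seed=0):
    """Simulate total operational losses per year.

    Assumptions (illustrative only): Poisson event frequency with rate `lam`
    per year and lognormal severities with log-mean `mu`, log-sd `sigma`.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        # A Poisson count is generated by accumulating exponential
        # inter-arrival times until one year has elapsed.
        elapsed = rng.expovariate(lam)
        total = 0.0
        while elapsed <= 1.0:
            total += rng.lognormvariate(mu, sigma)
            elapsed += rng.expovariate(lam)
        totals.append(total)
    return totals


losses = simulate_annual_losses(lam=5.0, mu=0.0, sigma=1.0, n_years=20_000)
mean_loss = sum(losses) / len(losses)
quantile_999 = sorted(losses)[int(0.999 * len(losses))]

# Theoretical mean of the compound sum: lam * E[severity].
theoretical_mean = 5.0 * math.exp(0.5)
print(mean_loss, theoretical_mean, quantile_999)
```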

Optimization Tools 

Optimization is an area in applied mathematics 
that, most generally, deals with efficient algo¬ 
rithms for finding an optimal solution among 
a set of solutions that satisfy given constraints. 
Mathematical programming, a management 
science tool that uses mathematical optimization
models to assist in decision making,
includes linear programming, integer program¬ 
ming, mixed-integer programming, nonlinear 
programming, stochastic programming, and 
goal programming. Unlike other mathematical 
tools that are available to decision makers such 
as statistical models (which tell the decision 
maker what occurred in the past), forecasting 
models (which tell the decision maker what 
might happen in the future), and simulation 
models (which tell the decision maker what 
will happen under different conditions), 
mathematical programming models allow the 
decision maker to identify the "best" solution. 
Markowitz's mean-variance model for port¬ 
folio selection is an example of an application 
of one type of mathematical programming 
(quadratic programming). Traditional opti¬ 
mization modeling assumes that the inputs 
to the algorithms are certain, but there are 
also branches of optimization such as robust 
optimization that study the optimal decision 
under uncertainty about the parameters of the 
problem. Stochastic programming deals with 
both the uncertainty about the parameters and 
a multiperiod decision-making framework. 
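
As a hedged illustration of the mean-variance idea, the sketch below solves the simplest special case: the minimum-variance mix of two risky assets subject only to the weights summing to one, for which the quadratic program has a closed-form solution. The volatilities and correlation are hypothetical, and a general problem with many assets, return targets, or inequality constraints would require a quadratic programming solver rather than this formula.

```python
def min_variance_weights(var1, var2, cov12):
    """Weights of the two-asset minimum-variance portfolio (weights sum to one)."""
    denom = var1 + var2 - 2.0 * cov12
    w1 = (var2 - cov12) / denom
    return w1, 1.0 - w1


def portfolio_variance(w1, w2, var1, var2, cov12):
    """Variance of a two-asset portfolio with weights w1 and w2."""
    return w1 * w1 * var1 + w2 * w2 * var2 + 2.0 * w1 * w2 * cov12


# Hypothetical inputs: volatilities of 20% and 30%, correlation 0.25.
sigma1, sigma2, rho = 0.20, 0.30, 0.25
var1, var2 = sigma1 ** 2, sigma2 ** 2
cov12 = rho * sigma1 * sigma2

w1, w2 = min_variance_weights(var1, var2, cov12)
min_var = portfolio_variance(w1, w2, var1, var2, cov12)
print(w1, w2, min_var ** 0.5)
```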

Probability Distributions 

In financial models where the outcome of 
interest is a random variable, an assumption 
must be made about the random variable's 
probability distribution. There are two types 
of probability distributions: discrete and 
continuous. Discrete probability distributions 
are needed whenever the random variable is 
to describe a quantity that can assume values 
from a countable set, either finite or infinite. 
A discrete probability distribution (or law) is 
quite intuitive in that it assigns positive
probabilities, adding up to one, to certain values, while
any other value automatically has zero proba¬ 
bility. Continuous probability distributions are 
needed when the random variable of interest 
can assume any value inside of one or more
intervals of real numbers such as, for example,
any number greater than zero. Asset returns, 
for example, whether measured monthly, 
weekly, daily, or at an even higher frequency 
are commonly modeled as continuous random 
variables. In contrast to discrete probability 
distributions that assign positive probability to 
certain discrete values, continuous probability 
distributions assign zero probability to any sin¬ 
gle real number. Instead, only entire intervals of 
real numbers can have positive probability such 
as, for example, the event that some asset return 
is not negative. For each continuous probabil¬ 
ity distribution, this necessitates the so-called 
probability density, a function that determines 
how the entire probability mass of one is dis¬ 
tributed. The density often serves as the proxy 
for the respective probability distribution. To 
model the behavior of certain financial assets in 
a stochastic environment, a financial modeler 
can usually resort to a variety of theoretical 
distributions. Most commonly, probability dis¬ 
tributions are selected that are analytically well 
known. For example, the normal distribution (a 
continuous distribution)—also called the Gaus¬ 
sian distribution—is often the distribution of 
choice when asset returns are modeled. Or the 
exponential distribution is applied to charac¬ 
terize the randomness of the time between two 
successive defaults of firms in a bond portfolio. 
Many other distributions are related to them or 
built on them in a well-known manner. These 
distributions often display pleasant features 
such as stability under summation—meaning 
that the return of a portfolio of assets whose 
returns follow a certain distribution again 
follows the same distribution. However, one
has to be careful using these distributions since 
their advantage of mathematical tractability 
is often outweighed by the fact that the 
stochastic behavior of the true asset returns 
is not well captured by these distributions. 
For example, although the normal distribution 
generally renders modeling easy because all 
moments of the distribution exist, it fails to 
reflect stylized facts commonly encountered in 
asset returns—namely, the possibility of very
extreme movements and skewness. To remedy 
this shortcoming, probability distributions 
accounting for such extreme price changes 
have become increasingly popular. Some of 
these distributions concentrate exclusively on 
the extreme values while others permit any real 
number, but in a way capable of reflecting mar¬ 
ket behavior. Consequently, a financial modeler 
has available a great selection of probability 
distributions to realistically reproduce asset 
price changes. Their common shortcoming is 
generally that they are mathematically difficult 
to handle. 
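
To give a sense of how much the normal distribution can understate extreme movements, the following Python sketch compares the probability of a move of four standard deviations or more to the downside under a standard normal distribution and under a Student's t distribution with three degrees of freedom rescaled to unit variance. The choice of three degrees of freedom and of the threshold is an illustrative assumption; three degrees of freedom is used because the cumulative distribution function then has a simple closed form.

```python
import math
from statistics import NormalDist


def student_t3_cdf(t):
    """Closed-form CDF of the Student's t distribution with 3 degrees of freedom."""
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi


threshold = -4.0  # a four-standard-deviation loss

normal_tail = NormalDist().cdf(threshold)

# A t variable with 3 degrees of freedom has variance 3, so the threshold is
# multiplied by sqrt(3) to compare two unit-variance random variables.
t_tail = student_t3_cdf(threshold * math.sqrt(3.0))

ratio = t_tail / normal_tail
print(normal_tail, t_tail, ratio)
```

Under these assumptions the fat-tailed distribution assigns a far larger probability to the extreme event than the normal distribution does.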

Risk Measures 

The standard assumption in financial models is 
that the distribution for the return on financial 
assets follows a normal (or Gaussian) distri¬ 
bution and therefore the standard deviation 
(or variance) is an appropriate measure of risk 
in the portfolio selection process. This is the 
risk measure that is used in the well-known 
Markowitz portfolio selection model (that is, 
mean-variance model), which is the foundation 
for modern portfolio theory. Mounting evi¬ 
dence since the early 1960s strongly suggests 
that return distributions do not follow a normal 
distribution, but instead exhibit heavy tails 
and, possibly, skewness. The "tails" of the dis¬ 
tribution are where the extreme values occur, 
and these extreme values are more likely than 
would be predicted by the normal distribution. 
This means that between periods where the 
market exhibits relatively modest changes in 
prices and returns, there will be periods where 
there are changes that are much higher (that 
is, crashes and booms) than predicted by the 
normal distribution. This is of major concern to 
financial modelers in seeking to generate prob¬ 
ability estimates for financial risk assessment. 
To more effectively implement portfolio se¬ 
lection, researchers have proposed alternative 
risk measures. These risk measures fall into
two disjoint categories: dispersion measures
and safety-first measures. Dispersion measures 
include mean standard deviation, mean abso¬ 
lute deviation, mean absolute moment, index 
of dissimilarity, mean entropy, and mean colog. 
Safety-first risk measures include classical 
safety first, value-at-risk, average value-at-risk, 
expected tail loss, MiniMax, lower partial 
moment, downside risk, probability-weighted 
function of deviations below a specified target 
return, and power conditional value-at-risk. 
Despite these alternative risk measures, the 
most popular risk measure used in financial 
modeling is volatility as measured by the 
standard deviation. There are different types 
of volatility: historical volatility, implied volatility,
level-dependent volatility, local volatility, 
and stochastic volatility (e.g., jump-diffusion 
volatility). There are risk measures commonly 
used for bond portfolio management. These 
measures include duration, convexity, key rate 
duration, and spread duration. 
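
The difference between a dispersion measure and safety-first measures can be sketched on a small sample. The Python fragment below computes the standard deviation, a historical value-at-risk, and the expected tail loss from ten hypothetical returns. Quantile conventions for value-at-risk vary across texts and systems; the one used here (the smallest loss that is not exceeded with probability alpha in the sample) is one common choice, not the only one.

```python
import math
import statistics


def var_and_etl(returns, alpha):
    """Historical value-at-risk and expected tail loss at confidence level alpha.

    Losses are the negatives of returns. The tail is taken to be the losses
    at or beyond the VaR level (one convention among several).
    """
    losses = sorted(-r for r in returns)
    n = len(losses)
    # Small tolerance guards against floating-point error in alpha * n.
    index = math.ceil(alpha * n - 1e-12) - 1
    var = losses[index]
    tail = losses[index:]
    etl = sum(tail) / len(tail)
    return var, etl


# Hypothetical periodic returns (illustrative numbers only).
returns = [0.02, -0.01, 0.03, -0.05, 0.01, -0.02, 0.04, -0.08, 0.00, 0.01]

volatility = statistics.pstdev(returns)       # dispersion measure
var_90, etl_90 = var_and_etl(returns, 0.90)   # safety-first measures
print(volatility, var_90, etl_90)
```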

Software for Financial Modeling 

The development of financial models requires 
the modeler to be familiar with spreadsheets 
such as Microsoft Excel and/or a platform to 
implement concepts and algorithms such as 
the Palisade Decision Tools Suite and other 
Excel-based software (mostly @RISK, Solver,
VBA), and MATLAB. Financial modelers can
choose one or the other, depending on their 
level of familiarity and comfort with spread¬ 
sheet programs and their add-ins versus pro¬ 
gramming environments such as MATLAB. 
Some tasks and implementations are easier in 
one environment than in the other. MATLAB 
is a modeling environment that allows for in¬ 
put and output processing, statistical analysis, 
simulation, and other types of model build¬ 
ing for the purpose of analysis of a situa¬ 
tion. MATLAB uses a number-array-oriented 
programming language, that is, a program¬ 
ming language in which vectors and matrices 
are the basic data structures. Reliable built-in
functions, a wide range of specialized tool¬ 
boxes, easy interface with widespread software 
like Microsoft Excel, and beautiful graphing ca¬ 
pabilities for data visualization make imple¬ 
mentation with MATLAB efficient and useful 
for the financial modeler. Visual Basic for Appli¬ 
cations (VBA) is a programming language en¬ 
vironment that allows Microsoft Excel users to 
automate tasks, create their own functions, per¬ 
form complex calculations, and interact with 
spreadsheets. VBA shares many of the same 
concepts as object-oriented programming lan¬ 
guages. Despite some important limitations, 
VBA does add useful capabilities to spreadsheet 
modeling, and it is a good tool to know because 
Excel is the platform of choice for many finance 
professionals. 

Stochastic Processes and Tools 

Stochastic integration provides a coherent way 
to represent that instantaneous uncertainty (or 
volatility) cumulates over time. It is thus fun¬ 
damental to the representation of financial pro¬ 
cesses such as interest rates, security prices, or 
cash flows. Stochastic integration operates on 
stochastic processes and produces random vari¬ 
ables or other stochastic processes. Stochastic 
integration is a process defined on each path as 
the limit of a sum. However, these sums are dif¬ 
ferent from the sums of the Riemann-Lebesgue 
integrals because the paths of stochastic pro¬ 
cesses are generally not of bounded variation. 
Stochastic integrals in the sense of Ito are de¬ 
fined through a process of approximation by 
(1) defining Brownian motion, which is the con¬ 
tinuous limit of a random walk, (2) defining 
stochastic integrals for elementary functions as 
the sums of the products of the elementary 
functions multiplied by the increments of the 
Brownian motion, and (3) extending this defi¬ 
nition to any function through approximating 
sequences. The major application of integra¬ 
tion to financial modeling involves stochastic
integrals. An understanding of stochastic in¬
tegrals is needed to understand an important 
tool in contingent claims valuation: stochastic 
differential equations. The dynamics of financial
asset returns and prices can be expressed
using a deterministic process if there is no uncertainty
about their future behavior or, in the more likely
case when the value is uncertain, with a
stochastic process. Stochastic processes in
continuous time are the most used tool to explain
the dynamics of financial asset returns
and prices. They are the building blocks to con¬ 
struct financial models for portfolio optimiza¬ 
tion, derivatives pricing, and risk management. 
Continuous-time processes allow for more ele¬ 
gant theoretical modeling compared to discrete 
time models, and many results proven in prob¬ 
ability theory can be applied to obtain a simple 
evaluation method. 
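
The construction of a stochastic integral as the limit of sums can be sketched numerically. The Python fragment below approximates the Ito integral of Brownian motion against itself over [0, T] by summing the value of the path at the left endpoint of each small interval times the Brownian increment. The step count and seed are arbitrary choices. The sum equals (W_T^2 minus the sum of squared increments) / 2, and because the sum of squared increments approaches T rather than zero, the limit is (W_T^2 - T) / 2 and not the W_T^2 / 2 that ordinary calculus would suggest.

```python
import math
import random


def ito_sum_w_dw(T=1.0, n_steps=10_000, seed=0):
    """Left-endpoint sums approximating the Ito integral of W dW on [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = 0.0          # current value of the Brownian path
    integral = 0.0   # running sum of W(t_i) * (W(t_{i+1}) - W(t_i))
    quad_var = 0.0   # running sum of squared increments
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        integral += w * dw   # integrand evaluated before the increment
        quad_var += dw * dw
        w += dw
    return w, integral, quad_var


w_T, integral, quad_var = ito_sum_w_dw()
print(integral, (w_T ** 2 - 1.0) / 2.0, quad_var)
```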


Statistics 

Probability models are theoretical models of 
the occurrence of uncertain events. In contrast, 
statistics is about empirical data and can be 
broadly defined as a set of methods used to 
make inferences from a known sample to a 
larger population that is in general unknown. In 
finance, a particularly important example is making
inferences from the past (the known sample)
to the future (the unknown population). In
statistics, probabilistic models are applied us¬ 
ing data so as to estimate the parameters of 
these models. It is not assumed that all parameter
values in the model are known. Instead,
the data for the variables in the model are used
to estimate the values of the parameters and
then to test hypotheses or make inferences
about the estimated values. In financial
modeling, the statistical technique of regression 
models is the workhorse. However, because re¬ 
gression models are part of the field of financial 
econometrics, this topic is covered in that topic 
category. Understanding dependences or func¬ 
tional links between variables is a key theme in 
financial modeling. In general terms, functional
dependencies are represented by dynamic 
models. Many important models are linear 
models whose coefficients are correlation coeffi¬ 
cients. In many instances in financial modeling, 
it is important to arrive at a quantitative mea¬ 
sure of the strength of dependencies. The cor¬ 
relation coefficient provides such a measure. In 
many instances, however, the correlation coef¬ 
ficient might be misleading. In particular, there 
are cases of nonlinear dependencies that result 
in a zero correlation coefficient. From the point 
of view of financial modeling, this situation is 
particularly dangerous as it leads to substan¬ 
tially underestimated risk. Different measures 
of dependence have been proposed, in partic¬ 
ular copula functions. The copula overcomes 
the drawbacks of the correlation as a measure 
of dependency by allowing for a more general 
measure than linear dependence, allowing for 
the modeling of dependence for extreme events, 
and being indifferent to continuously increas¬ 
ing transformations. Another essential tool in 
financial modeling, because it allows the incor¬ 
poration of uncertainty in financial models and 
consideration of additional layers of complex¬ 
ity that are difficult to incorporate in analytical 
models, is Monte Carlo simulation. The main 
idea of Monte Carlo simulation is to represent 
the uncertainty in market variables through sce¬ 
narios, and to evaluate parameters of interest 
that depend on these market variables in com¬ 
plex ways. The advantage of such an approach 
is that it can easily capture the dynamics of un¬ 
derlying processes and the otherwise complex 
effects of interactions among market variables. 
A substantial amount of research in recent years 
has been dedicated to making scenario genera¬ 
tion more accurate and efficient, and a number 
of sophisticated computational techniques are 
now available to the financial modeler. 
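
A small Python sketch shows how a nonlinear dependence can produce a zero correlation coefficient. In the hypothetical data below, y is completely determined by x (y equals x squared), yet the sample correlation between the two is zero because the relationship is symmetric rather than linear.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical data: y depends on x exactly, but not linearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [value ** 2 for value in x]

corr_xy = pearson_correlation(x, y)   # zero despite perfect dependence
corr_xx = pearson_correlation(x, x)   # one, for comparison
print(corr_xy, corr_xx)
```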

Term Structure Modeling 

The arbitrage-free valuation approach to the 
valuation of option-free bonds, bonds with embedded
options, and option-type derivative in¬
struments requires that a financial instrument 
be viewed as a package of zero-coupon bonds. 
Consequently, in financial modeling, it is essen¬ 
tial to be able to discount each expected cash 
flow by the appropriate interest rate. That rate 
is referred to as the spot rate. The term struc¬ 
ture of interest rates provides the relationship 
between spot rates and maturity. Because of its 
role in valuation of cash bonds and option-type 
derivatives, the estimation of the term struc¬ 
ture of interest rates is of critical importance as 
an input into a financial model. In addition to 
their role in valuation modeling, term structure
models are fundamental to expressing value
and risk and to establishing relative value across the
spectrum of instruments found in the various 
interest-rate or bond markets. The term struc¬ 
ture is most often specified for a specific market 
such as the U.S. Treasury market, the bond mar¬ 
ket for double-A rated financial institutions, 
the interest rate market for LIBOR, and swaps. 
Static models of the term structure are char¬ 
acterizations that are devoted to relationships 
based on a given market and do not serve future 
scenarios where there is uncertainty. Standard 
static models include those known as the spot 
yield curve, discount function, par yield curve, 
and the implied forward curve. Instantiations of 
these models may be found in both a discrete- 
and continuous-time framework. An important 
consideration is establishing how these term 
structure models are constructed and how to 
transform one model into another. In model¬ 
ing the behavior of interest rates, stochastic dif¬ 
ferential equations (SDEs) are commonly used. 
The SDEs used to model interest rates must cap¬ 
ture the market properties of interest rates such 
as mean reversion and/or a volatility that de¬ 
pends on the level of interest rates. For a one- 
factor model, the SDE is used to model the 
behavior of the short-term rate, referred to as 
simply the "short rate." The addition of another 
factor (i.e., a two-factor model) involves extend¬ 
ing the SDE to represent the behavior of the 
short rate and a long-term rate (i.e., long rate). 
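
Viewing a bond as a package of zero-coupon bonds can be sketched as follows. The Python fragment discounts each cash flow at the spot rate for its own maturity and derives the one-period implied forward rates from the same spot curve. The spot rates and the bond are hypothetical, and annual compounding with one cash flow per year is an assumption made for simplicity.

```python
def present_value(cash_flows, spot_rates):
    """Discount the cash flow due in year t at the t-year spot rate."""
    return sum(cf / (1.0 + z) ** t
               for t, (cf, z) in enumerate(zip(cash_flows, spot_rates), start=1))


def implied_forward_rates(spot_rates):
    """One-year forward rates implied by consecutive annual spot rates."""
    forwards = []
    for t in range(1, len(spot_rates)):
        growth_t = (1.0 + spot_rates[t - 1]) ** t          # growth over t years
        growth_next = (1.0 + spot_rates[t]) ** (t + 1)     # growth over t + 1 years
        forwards.append(growth_next / growth_t - 1.0)
    return forwards


# Hypothetical spot curve and a 3-year bond paying a 5% annual coupon on 100.
spot_rates = [0.03, 0.04, 0.05]
cash_flows = [5.0, 5.0, 105.0]

price = present_value(cash_flows, spot_rates)
forwards = implied_forward_rates(spot_rates)
print(price, forwards)
```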

The entries can serve as material for a wide
spectrum of courses, such as the following: 

• Financial engineering 

• Financial mathematics 

• Financial econometrics 

• Statistics with applications in finance

• Quantitative asset management

• Asset and derivative pricing

• Risk management

Frank J. Fabozzi 
Editor, Encyclopedia of Financial Models 



Guide to the Encyclopedia of 
Financial Models 


The Encyclopedia of Financial Models provides 
comprehensive coverage of the field of finan¬ 
cial modeling. This reference work consists of 
three separate volumes and 127 entries. Each 
entry provides coverage of the selected topic 
intended to inform a broad spectrum of read¬ 
ers ranging from finance professionals to aca¬ 
demicians to students to fiduciaries. To derive 
the greatest possible benefit from the Encyclo¬ 
pedia of Financial Models, we have provided this 
guide. It explains how the information within 
the encyclopedia can be located. 

ORGANIZATION 

The Encyclopedia of Financial Models is organized 
to provide maximum ease of use for its readers. 

Table of Contents 

A complete table of contents for the entire en¬ 
cyclopedia appears in the front of each volume. 
This list of titles represents topics that have been 
carefully selected by the editor, Frank J. Fabozzi. 
The Preface includes a more detailed descrip¬ 
tion of the volumes and the topic categories that 
the entries are grouped under. 

Index 

A Subject Index for the entire encyclopedia is 
located at the end of each volume. The subjects
in the index are listed alphabetically and
indicate the volume and page number where 
information on this topic can be found. 

Entries 

Each entry in the Encyclopedia of Financial Mod¬ 
els begins on a new page, so that the reader may 
quickly locate it. The author's name and affilia¬ 
tion are displayed at the beginning of the entry. 
All entries in the encyclopedia are organized 
according to a standard format, as follows: 

• Title and author 

• Abstract 

• Introduction 

• Body 

• Key points 

• Notes 

• References 

Abstract 

The abstract for each entry gives an overview of 
the topic, but not necessarily the content of the 
entry. This is designed to put the topic in the 
context of the entire Encyclopedia, rather than 
give an overview of the specific entry content. 

Introduction 

The text of each entry begins with an introductory
section that defines the topic under discussion and
summarizes the content. By reading this section, the
reader gets a general idea about the content of a
specific entry.

Body 

The body of each entry explains the purpose, 
theory, and math behind each model. 

Key Points 

The key points section provides in bullet point
format a review of the materials discussed in
each entry. It imparts to the reader the most
important issues and concepts discussed.

Notes 

The notes provide more detailed information 
and citations of further readings. 

References 

The references section lists the publications 
cited in the entry. 
Dividend Discount Models


PAMELA P. DRAKE, PhD, CFA 

J. Gray Ferguson Professor of Finance, College of Business, James Madison University 

FRANK J. FABOZZI, PhD, CFA, CPA 

Professor of Finance, EDHEC Business School 


Abstract: Dividends are cash payments made by a corporation to its owners. Though cash dividends 
are paid to both preferred and common shareholders, most of the focus of the attention is on the 
dividends paid to the residual owners of the corporation, the common shareholders. Dividends 
paid to common and preferred shareholders are not legal obligations of a corporation, and some 
corporations do not pay cash dividends. But for those companies that pay dividends, changes in 
dividends are noticed by investors—increases in dividends are viewed favorably and are associated 
with increases in the company's stock price, whereas decreases in dividends are viewed quite 
unfavorably and are associated with decreases in the company's stock price. Most models that use 
dividends in the estimation of stock value use current dividends, some measure of historical or 
projected dividend growth, and an estimate of the required rate of return. Popular models include 
the basic dividend discount model that assumes a constant dividend growth, and the multiple- 
phase models, which include the two-stage dividend growth model and the stochastic dividend 
discount models. 


In this entry, we discuss dividend discount 
models and their limitations. We begin with a 
review of the various ways to measure divi¬ 
dends and then take a look at how dividends 
and stock prices are related. 

DIVIDEND MEASURES 

Dividends are measured using three different 
measures: 

• Dividends per share 

• Dividend yield 

• Dividend payout 


The value of a share of stock today is the investors'
assessment of today's worth of future cash flows for
each share. Because future cash flows to shareholders
are dividends, we need a measure of dividends for each
share of stock to estimate future cash flows per share.
The dividends per share is the dollar amount of dividends
paid out during the period per share of common stock:

\[
\text{Dividends per share} = \frac{\text{Dividends}}{\text{Number of shares outstanding}}
\]

If a company has paid $600,000 in dividends during the
period and there are 1.5 million shares of common stock
outstanding, then

\[
\text{Dividends per share} = \frac{\$600{,}000}{1{,}500{,}000 \text{ shares}} = \$0.40 \text{ per share}
\]

The company paid out 40 cents in dividends
per common share during this period.

The dividend yield, the ratio of dividends to
price, is

\[
\text{Dividend yield} = \frac{\text{Annual cash dividends per share}}{\text{Market price per share}}
\]

The dividend yield is also referred to as the 
dividend-price ratio. Historically, the dividend 
yield for U.S. stocks has been a little less than 
5%, according to a study by Campbell and 
Shiller (1998). In an exhaustive study of the re¬ 
lation between dividend yield and stock prices, 
Campbell and Shiller find that: 


• There is a weak relation between the divi¬ 
dend yield and subsequent 10-year dividend 
growth. 

• The dividend yield does not forecast future 
dividend growth. 

• The dividend yield predicts future price 
changes. 


The weak relation between the dividend yield 
and future dividends may be attributed to 
the effects of the business cycle on dividend 
growth. The tendency for the dividend yield to 
revert to its historical mean has been observed 
by researchers. 

Another way of describing dividends paid 
out during a period is to state the dividends 
as a portion of earnings for the period. This is 
referred to as the dividend payout ratio: 

\[
\text{Dividend payout ratio} = \frac{\text{Dividends}}{\text{Earnings available to common shareholders}}
\]

If a company pays $360,000 in dividends and has earnings
available to common shareholders of $1.2 million, the
payout ratio is 30%:

\[
\text{Dividend payout ratio} = \frac{\$360{,}000}{\$1{,}200{,}000} = 0.30 \text{ or } 30\%
\]

This means that the company paid out 30% of 
its earnings to shareholders. 
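
The three measures can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the market price used for the yield is a hypothetical figure, since the text gives no price for this company.

```python
def dividends_per_share(total_dividends: float, shares_outstanding: float) -> float:
    """Dollar dividends paid during the period per common share."""
    return total_dividends / shares_outstanding


def dividend_yield(annual_dividends_per_share: float, price_per_share: float) -> float:
    """Annual cash dividends per share divided by market price per share."""
    return annual_dividends_per_share / price_per_share


def dividend_payout_ratio(total_dividends: float, earnings_to_common: float) -> float:
    """Dividends as a fraction of earnings available to common shareholders."""
    return total_dividends / earnings_to_common


# Figures taken from the two examples in the text.
dps = dividends_per_share(600_000, 1_500_000)
payout = dividend_payout_ratio(360_000, 1_200_000)

# Hypothetical market price, used only to exercise the yield formula.
assumed_price = 10.00
yield_on_price = dividend_yield(dps, assumed_price)

print(f"Dividends per share: ${dps:.2f}")
print(f"Payout ratio: {payout:.0%}")
print(f"Dividend yield at an assumed ${assumed_price:.2f} price: {yield_on_price:.1%}")
```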

The proportion of earnings paid out in div¬ 
idends varies by company and industry. For 
example, the companies in the steel industry 
typically pay out 25% of their earnings in div¬ 
idends, whereas the electric utility companies 
pay out approximately 75% of their earnings in 
dividends. 

If companies focus on dividends per share 
in establishing their dividends (e.g., a constant 
dividends per share), the dividend payout will 
fluctuate along with earnings. We generally ob¬ 
serve that companies set the dividend policy 
such that dividends per share grow at a rela¬ 
tively constant rate, resulting in dividend pay¬ 
outs that fluctuate. 


DIVIDENDS AND STOCK 
PRICES 

If an investor buys a common stock, he or she 
has bought shares that represent an ownership 
interest in the corporation. Shares of common
stock are a perpetual security; there is no maturity. The
investor who owns shares of common stock 
has the right to receive a certain portion of any 
dividends—but dividends are not a sure thing. 
Whether or not a corporation pays dividends 
is up to its board of directors—the represen¬ 
tatives of the common shareholders. Typically, 
we see some pattern in the dividends compa¬ 
nies pay: Dividends are either constant or grow 
at a constant rate. But there is no guarantee that 
dividends will be paid in the future. 

Preferred shareholders are in a similar situa¬ 
tion as the common shareholders. They expect 
to receive cash dividends in the future, but the 
payment of these dividends is up to the board of 
directors. But there are three major differences 
between the dividends of preferred and common
shares. First, the dividends on preferred
stock usually are specified at a fixed rate or dol¬ 
lar amount, whereas the amount of dividends is 
not specified for common shares. Second, pre¬ 
ferred shareholders are given preference: their 
dividends must be paid before any dividends 
are paid on common stock. Third, if the pre¬ 
ferred stock has a cumulative feature, dividends 
not paid in one period accumulate and are car¬ 
ried over to the next period. Therefore, the div¬ 
idends on preferred stock are more certain than 
those on common shares. 

It is reasonable to figure that what an investor 
pays for a share of stock should reflect what 
he or she expects to receive from it—return on 
the investor's investment. What an investor re¬ 
ceives are cash dividends in the future. How 
can we relate that return to what a share of 
common stock is worth? Well, the value of a 
share of stock should be equal to the present 
value of all the future cash flows an investor ex¬ 
pects to receive from that share. To value stock, 
therefore, an investor must project future cash 
flows, which, in turn, means projecting future 
dividends. This approach to the valuation of 
common stock is referred to as the discounted 
cash flow approach and the models used are 
referred to as dividend discount models. 

Dividend discount models are not the only 
approach to valuing common stock. There are 
fundamental factor models, also referred to as 
multifactor equity models. 


BASIC DIVIDEND DISCOUNT 
MODELS 

As discussed above, the basis for the dividend
discount model (DDM) is simply the application
of present value analysis, which asserts that
the fair price of an asset is the present value of
the expected cash flows. This model was first
suggested by Williams (1938). In the case of
common stock, the cash flows are the expected
dividend payouts. The basic DDM can
be expressed mathematically as:

\[
P = \frac{D_1}{(1+r_1)^1} + \frac{D_2}{(1+r_2)^2} + \cdots \tag{1}
\]

where

\(P\) = the fair value or theoretical value of the
common stock
\(D_t\) = the expected dividend for period t
\(r_t\) = the appropriate discount or
capitalization rate for period t

The dividends are expected to be received
forever.

Practitioners rarely use the dividend discount 
model given by equation (1). Instead, one of the 
DDMs discussed below is typically used. 


THE FINITE LIFE GENERAL 
DIVIDEND DISCOUNT 
MODEL 

The DDM given by equation (1) can be modified
by assuming a finite life for the expected
cash flows. In this case, the expected cash flows
are the expected dividend payouts and the expected
sale price of the stock at some future
date. The expected sale price is also called the
terminal price and is intended to capture the future
value of all subsequent dividend payouts.
This model is called the finite life general DDM
and is expressed mathematically as:

\[
P = \frac{D_1}{(1+r_1)^1} + \frac{D_2}{(1+r_2)^2} + \cdots + \frac{D_N}{(1+r_N)^N} + \frac{P_N}{(1+r_N)^N} \tag{2}
\]

where

\(P_N\) = the expected sale price (or terminal
price) at the horizon period N
\(N\) = the number of periods in the horizon

and \(P\), \(D_t\), and \(r_t\) are the same as defined above.
Assuming a Constant Discount Rate

A special case of the finite life general DDM
that is more commonly used in practice is one
in which it is assumed that the discount rate
is constant. That is, it is assumed each \(r_t\) is the
same for all t. Denoting this constant discount
rate by r, equation (2) becomes:

\[
P = \frac{D_1}{(1+r)^1} + \frac{D_2}{(1+r)^2} + \cdots + \frac{D_N}{(1+r)^N} + \frac{P_N}{(1+r)^N} \tag{3}
\]


Equation (3) is called the constant discount rate
version of the finite life general DDM. When
practitioners use any of the DDMs presented
in this entry, typically the constant discount
rate version is used.

Let's illustrate the finite life general DDM with
a constant discount rate, taking each period to be
a year. Suppose that the following data are
determined for stock XYZ by a financial analyst:

\[
D_1 = \$2.00 \quad D_2 = \$2.20 \quad D_3 = \$2.30 \quad D_4 = \$2.55 \quad D_5 = \$2.65
\]
\[
P_5 = \$26 \quad N = 5 \quad r = 0.10
\]

Based on these data, the fair price of stock
XYZ is

\[
P = \frac{\$2.00}{(1.10)^1} + \frac{\$2.20}{(1.10)^2} + \frac{\$2.30}{(1.10)^3} + \frac{\$2.55}{(1.10)^4} + \frac{\$2.65}{(1.10)^5} + \frac{\$26.00}{(1.10)^5} = \$24.895
\]
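
The XYZ calculation can be reproduced with a short Python sketch of equation (3). The function name and argument layout are illustrative choices, not part of the original entry.

```python
def finite_life_ddm(dividends, terminal_price, r):
    """Constant discount rate version of the finite life general DDM.

    dividends      -- expected dividends D_1 .. D_N, one per period
    terminal_price -- expected sale price P_N at the horizon
    r              -- constant discount rate per period
    """
    n = len(dividends)
    pv_dividends = sum(d / (1 + r) ** t for t, d in enumerate(dividends, start=1))
    pv_terminal = terminal_price / (1 + r) ** n
    return pv_dividends + pv_terminal


# Stock XYZ data from the text.
xyz_price = finite_life_ddm([2.00, 2.20, 2.30, 2.55, 2.65], 26.00, 0.10)
print(f"Fair price of XYZ: ${xyz_price:.3f}")
```

Most of the value in this example comes from the discounted terminal price rather than from the five dividends, which is why the terminal price forecast matters so much.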


Required Inputs 

The finite life general DDM requires three forecasts
as inputs to calculate the fair value of a
stock:

1. The expected terminal price (\(P_N\))

2. The dividends up to the assumed horizon
(\(D_1\) to \(D_N\))

3. The discount rates (\(r_1\) to \(r_N\)) or \(r\) (in the case
of the constant discount rate version)

Thus the relevant question is: How accurately
can these inputs be forecasted?

The terminal price is the most difficult of the
three forecasts. According to theory, \(P_N\) is the
present value of all future dividends after N;
that is, \(D_{N+1}, D_{N+2}, \ldots, D_{\infty}\). Also, the future
discount rate (\(r_t\)) must be forecasted. In
practice, forecasts are made of either dividends
(\(D_N\)) or earnings (\(E_N\)) first, and then the price
\(P_N\) is estimated by assigning an "appropriate"
requirement for yield, price-earnings ratio, or
capitalization rate. Note that the present value
of the expected terminal price \(P_N/(1+r)^N\) becomes
very small if N is very large.

The forecasting of dividends is "somewhat" 
easier. Usually, past history is available, man¬ 
agement can be queried, and cash flows can be 
projected for a given scenario. The discount rate 
r is the required rate of return. Forecasting r is 
more complex than forecasting dividends, al¬ 
though not nearly as difficult as forecasting the 
terminal price (which requires a forecast of fu¬ 
ture discount rates as well). As noted above, in 
practice for a given company r is assumed to be 
constant for all periods and typically generated 
from the capital asset pricing model (CAPM). 
The CAPM provides the expected return for a 
company based on its systematic risk (beta). 

Assessing Fair Value 

Given the fair price derived from a dividend 
discount model, the assessment of the stock 
proceeds along the following lines. If the mar¬ 
ket price is below the fair price derived from 
the model, the stock is undervalued or cheap. 
The opposite holds for a stock whose market 
price is greater than the model-derived price. 
In this case, the stock is said to be overvalued 
or expensive. A stock trading equal to or close 
to its fair price is said to be fairly valued. 

The DDM tells us the fair price but does not
tell us when the price of the stock should be
expected to move to this fair price. That is, the
model says that based on the inputs generated
by the analyst, the stock may be cheap, expensive,
or priced appropriately. However, it does
not tell us, if the stock is mispriced, how long it will
take before the market recognizes the mispricing
and corrects it. As a result, an investor may
hold on to a stock perceived to be cheap for an
extended period of time and may underperform
a benchmark during that period.

While a stock may be mispriced, an investor 
must also consider how mispriced it is in order 
to take the appropriate action (buy a cheap stock 
and sell or sell short an expensive stock). This 
will depend on the degree of mispricing and 
transaction costs. 


CONSTANT GROWTH 
DIVIDEND DISCOUNT 
MODEL 

If future dividends are assumed to grow at a
constant rate (g) and a single discount rate (r) is
used, then the finite life general DDM given by
equation (3) becomes

\[
P = \frac{D_0(1+g)^1}{(1+r)^1} + \frac{D_0(1+g)^2}{(1+r)^2} + \frac{D_0(1+g)^3}{(1+r)^3} + \cdots + \frac{D_0(1+g)^N}{(1+r)^N} + \frac{P_N}{(1+r)^N} \tag{4}
\]

and it can be shown that if N is assumed to
approach infinity (and r exceeds g), equation (4) is equal to:

\[
P = \frac{D_0(1+g)}{r-g} \tag{5}
\]

Equation (5) is the constant growth dividend discount
model (Gordon and Shapiro, 1956). An
equivalent formulation for the constant growth
DDM is

\[
P = \frac{D_1}{r-g} \tag{6}
\]

where \(D_1\) is equal to \(D_0(1+g)\).

Consider a company that currently pays dividends
of $3.00 per share. If the dividend is expected
to grow at a rate of 3% per year and the
discount rate is 12%, what is the value of a share
of stock of this company? Using equation (5),

\[
P = \frac{\$3.00(1+0.03)}{0.12-0.03} = \frac{\$3.09}{0.09} = \$34.33
\]


If the growth rate for this company's dividends
is 5%, instead of 3%, the current value is $45.00:

\[
P = \frac{\$3.00(1+0.05)}{0.12-0.05} = \frac{\$3.15}{0.07} = \$45.00
\]


Therefore, the greater the expected growth rate 
of dividends, the greater the value of a share of 
stock. 

If the discount rate is 14% instead of 12% and
the growth rate of dividends is 3%, the value of
a share of stock is:

\[
P = \frac{\$3.00(1+0.03)}{0.14-0.03} = \frac{\$3.09}{0.11} = \$28.09
\]


Therefore, the greater the discount rate, the lower 
the current value of a share of stock. 
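
The three cases above differ only in g and r, so they can be checked with one small function. This is a sketch of equation (5) under the assumption that r is greater than g; the function name is illustrative.

```python
def constant_growth_ddm(d0: float, g: float, r: float) -> float:
    """Equation (5): P = D0(1 + g) / (r - g), assuming r > g."""
    return d0 * (1 + g) / (r - g)


base_case = constant_growth_ddm(3.00, g=0.03, r=0.12)      # about $34.33
higher_growth = constant_growth_ddm(3.00, g=0.05, r=0.12)  # about $45.00
higher_rate = constant_growth_ddm(3.00, g=0.03, r=0.14)    # about $28.09

for label, value in [("g=3%, r=12%", base_case),
                     ("g=5%, r=12%", higher_growth),
                     ("g=3%, r=14%", higher_rate)]:
    print(f"{label}: ${value:.2f}")
```

Because the denominator is the small difference r - g, modest changes in either input move the estimated value a great deal.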

Let's apply the model as given by equation 
(5) to estimate the price of three companies: 
Eli Lilly, Schering-Plough, and Wyeth Labora¬ 
tories. The discount rate for each company was 
estimated using the capital asset pricing model 
assuming (1) a market risk premium of 5% and 
(2) a risk-free rate of 4.63%. The market risk pre¬ 
mium is based on the historical spread between 
the return on the market (often proxied with the 
return on the S&P 500 Index) and the risk-free 
rate. Historically, this spread has been approxi¬ 
mately 5%. The risk-free rate is often estimated 
by the yield on U.S. Treasury securities. At the 
end of 2006, 10-year Treasury securities were 
yielding approximately 4.625%. We use 4.63% 
as an estimate for the purposes of this illustra¬ 
tion. The beta estimate for each company was 
obtained from the Value Line Investment Sur¬ 
vey: 0.9 for Eli Lilly, 1.0 for Schering-Plough and 
Wyeth. The discount rate, r, for each company 
based on the CAPM is: 


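Eli Lilly          r = 0.04625 + 0.9 (0.05) = 9.125%
Schering-Plough    r = 0.04625 + 1.0 (0.05) = 9.625%
Wyeth              r = 0.04625 + 1.0 (0.05) = 9.625%

(These figures are based on the unrounded 4.625% Treasury yield; with the
rounded 4.63% they would be 9.13% and 9.63%.)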
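
The discount rates follow directly from the CAPM relation r = risk-free rate + beta × market risk premium. The sketch below assumes the unrounded 4.625% Treasury yield, which is the value that reproduces the 9.125% and 9.625% figures; the function name is illustrative.

```python
def capm_required_return(risk_free: float, beta: float, market_risk_premium: float) -> float:
    """Required return implied by the CAPM."""
    return risk_free + beta * market_risk_premium


RISK_FREE = 0.04625          # unrounded 10-year Treasury yield, end of 2006
MARKET_RISK_PREMIUM = 0.05   # historical spread cited in the text

betas = {"Eli Lilly": 0.9, "Schering-Plough": 1.0, "Wyeth": 1.0}
discount_rates = {
    name: capm_required_return(RISK_FREE, beta, MARKET_RISK_PREMIUM)
    for name, beta in betas.items()
}

for name, rate in discount_rates.items():
    print(f"{name}: {rate:.3%}")
```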


The dividend growth rate can be estimated by
using the compounded rate of growth of historical
dividends. The compound growth rate, g, is found
using the following formula:

\[
g = \left(\frac{\text{Last dividend}}{\text{Starting dividend}}\right)^{1/\text{no. of years}} - 1
\]

This formula is equivalent to calculating the geometric
mean of 1 plus the percentage change
over the number of years, less 1. Using time value of
money math, the 2006 dividend is the future
value, the starting dividend is the present value,
the number of years is the number of periods;
solving for the interest rate produces the growth
rate.

Substituting the values for the starting and
ending dividend amounts and the number of
periods (15, from 1991 to 2006) into the formula, we get:


Company            1991 dividend   2006 dividend   Estimated annual growth rate
Eli Lilly          $0.50           $1.60           8.063%
Schering-Plough    $0.16           $0.22           2.146%
Wyeth              $0.60           $1.01           3.533%
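
The growth rates in this table can be recomputed from the 1991 and 2006 dividends. The sketch below assumes 15 compounding periods between the two dividends, which is the count that reproduces the tabulated rates; the function name is illustrative.

```python
def compound_growth_rate(starting_dividend: float, last_dividend: float, years: int) -> float:
    """g = (last / starting) ** (1 / years) - 1."""
    return (last_dividend / starting_dividend) ** (1 / years) - 1


YEARS = 15  # 1991 to 2006

dividends_1991_2006 = {
    "Eli Lilly": (0.50, 1.60),
    "Schering-Plough": (0.16, 0.22),
    "Wyeth": (0.60, 1.01),
}

growth_rates = {
    name: compound_growth_rate(start, last, YEARS)
    for name, (start, last) in dividends_1991_2006.items()
}

for name, g in growth_rates.items():
    print(f"{name}: {g:.3%}")
```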


The value of \(D_0\), the estimate for g, and the discount
rate r for each company are summarized
below:

Company            Current dividend D_0   Estimated annual growth rate g   Required rate of return r
Eli Lilly          $1.60                  8.063%                           9.125%
Schering-Plough    $0.22                  2.146%                           9.625%
Wyeth              $1.01                  3.533%                           9.625%

Substituting these values into equation (5), we
obtain:

\[
\text{Eli Lilly estimated price} = \frac{\$1.60(1+0.08063)}{0.09125-0.08063} = \frac{\$1.729}{0.01062} = \$162.80
\]

\[
\text{Schering-Plough estimated price} = \frac{\$0.22(1+0.02146)}{0.09625-0.02146} = \frac{\$0.225}{0.07479} = \$3.00
\]

\[
\text{Wyeth estimated price} = \frac{\$1.01(1+0.03533)}{0.09625-0.03533} = \frac{\$1.046}{0.06092} = \$17.16
\]

Comparing the estimated price with the actual
price, we see that this model does not do a
good job of pricing these stocks:

Company            Estimated price at the end of 2006   Actual price at the end of 2006
Eli Lilly          $162.80                              $49.87
Schering-Plough    $3.00                                $23.44
Wyeth              $17.16                               $50.52

Notice that the constant growth DDM is considerably
off the mark for all three companies. The
reasons include: (1) none of the three companies'
dividend growth patterns suggests a constant
growth rate, and (2) the growth rate of dividends
in recent years has been much slower than in
earlier years (and, in fact, negative for
Schering-Plough after 2003), causing growth rates
estimated from the long time periods to overstate
future growth. And this pattern is not unique to
these companies.

Another problem that arises in using the constant
growth rate model is that the growth rate
of dividends may exceed the discount rate, r.
Consider the following three companies and
their dividend growth over the 16-year period
from 1991 through 2006, with the estimated
required rates of return:

Company         1991 dividend   2006 dividend   Estimated growth rate g   Estimated required rate of return
Coca-Cola       $0.24           $1.24           11.70%                    7.625%
Hershey         $0.24           $1.03           10.198%                   7.875%
Tootsie Roll    $0.04           $0.31           14.627%                   8.625%


For these three companies, the growth rate of
dividends over the prior 16 years is greater than
the discount rate. If we substitute the \(D_0\) (the
2006 dividends), the g, and the r into equation
(5), the denominator r - g is negative, so the
estimated price at the end of 2006 is
negative, which doesn't make sense. Therefore,
there are some cases in which it is inappropriate
to use the constant growth DDM.
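
The breakdown is easy to see numerically. The sketch below applies equation (5) naively to the three companies and then shows one possible guard, returning `None` whenever g is at least r. Both function names and the choice of `None` as the signal are illustrative assumptions.

```python
from typing import Optional


def naive_constant_growth(d0: float, g: float, r: float) -> float:
    """Equation (5) applied without checking that r > g."""
    return d0 * (1 + g) / (r - g)


def guarded_constant_growth(d0: float, g: float, r: float) -> Optional[float]:
    """Equation (5), or None when g >= r and the formula has no meaning."""
    if g >= r:
        return None
    return naive_constant_growth(d0, g, r)


# (2006 dividend, estimated growth rate, required return) from the text.
companies = {
    "Coca-Cola": (1.24, 0.1170, 0.07625),
    "Hershey": (1.03, 0.10198, 0.07875),
    "Tootsie Roll": (0.31, 0.14627, 0.08625),
}

naive_prices = {name: naive_constant_growth(*inputs) for name, inputs in companies.items()}
guarded_prices = {name: guarded_constant_growth(*inputs) for name, inputs in companies.items()}

for name in companies:
    print(f"{name}: naive = {naive_prices[name]:.2f}, guarded = {guarded_prices[name]}")
```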

The potential for misvaluation using the constant
growth DDM is highlighted by Fogler (1988)
in his illustration using ABC prior to its being
taken over by Capital Cities in 1985. He
estimated the value of ABC stock to be $53.88,
which was less than its market price of $64 at
the time and less than the $121 paid per share by
Capital Cities.


MULTIPHASE DIVIDEND
DISCOUNT MODELS

The assumption of constant growth is unrealistic
and can even be misleading. Instead, most
practitioners modify the constant growth DDM
by assuming that companies will go through
different growth phases. Within a given phase,
dividends are assumed to grow at a constant
rate. Molodovsky, May, and Chattiner (1965)
were some of the pioneers in modifying the
DDM to accommodate different growth rates.


Two-Stage Growth Model

The simplest form of multiphase DDM is the
two-stage growth model. A simple extension
of equation (4) uses two different values of g.
Referring to the first growth rate as \(g_1\) and the
second growth rate as \(g_2\) and assuming that the
first growth rate pertains to the next four years
and the second growth rate refers to all years
following, equation (4) can be modified as:

\[
P = \frac{D_0(1+g_1)^1}{(1+r)^1} + \frac{D_0(1+g_1)^2}{(1+r)^2} + \frac{D_0(1+g_1)^3}{(1+r)^3} + \frac{D_0(1+g_1)^4}{(1+r)^4} + \frac{D_0(1+g_1)^4(1+g_2)^1}{(1+r)^5} + \frac{D_0(1+g_1)^4(1+g_2)^2}{(1+r)^6} + \cdots
\]

which simplifies to:

\[
P = \frac{D_0(1+g_1)^1}{(1+r)^1} + \frac{D_0(1+g_1)^2}{(1+r)^2} + \frac{D_0(1+g_1)^3}{(1+r)^3} + \frac{D_0(1+g_1)^4}{(1+r)^4} + \frac{P_4}{(1+r)^4}
\]

Because dividends following the fourth year are
presumed to grow at a constant rate \(g_2\) forever,
the value of a share at the end of the fourth year
(that is, \(P_4\)) is determined by using equation (5),
substituting \(D_0(1+g_1)^4\) for \(D_0\) (because period
4 is the base period for the value at the end of the
fourth year) and \(g_2\) for the constant rate g:

\[
P = \frac{D_0(1+g_1)^1}{(1+r)^1} + \frac{D_0(1+g_1)^2}{(1+r)^2} + \frac{D_0(1+g_1)^3}{(1+r)^3} + \frac{D_0(1+g_1)^4}{(1+r)^4} + \left[\frac{1}{(1+r)^4}\left(\frac{D_0(1+g_1)^4(1+g_2)}{r-g_2}\right)\right] \tag{7}
\]


Suppose a company's dividends are expected
to grow at a 4% rate for the next four years and
then 8% thereafter. If the current dividend is
$2.00 and the discount rate is 12%,

\[
P = \frac{\$2.08}{(1+0.12)^1} + \frac{\$2.16}{(1+0.12)^2} + \frac{\$2.25}{(1+0.12)^3} + \frac{\$2.34}{(1+0.12)^4} + \left[\frac{1}{(1+0.12)^4}\left(\frac{\$2.53}{0.12-0.08}\right)\right] = \$46.87
\]


If this company's dividends are expected to
grow at the rate of 4% forever, the value of
a share is $26.00; if this company's dividends
are expected to grow at the rate of 8% forever,
the value of a share is $54.00. But because the
growth rate of dividends is expected to increase
from 4% to 8% in four years, the value of a share
is between those two values, or $46.87.
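
A short Python sketch of equation (7) makes the two-stage calculation concrete. The function name and the `first_stage_years` parameter are illustrative. Note that the code carries unrounded dividends, so it returns about $46.82, slightly below the $46.87 obtained in the text from dividends rounded to the cent.

```python
def two_stage_ddm(d0: float, g1: float, g2: float, r: float, first_stage_years: int) -> float:
    """Two-stage DDM: growth g1 for the first stage, then g2 forever (r > g2)."""
    present_value = 0.0
    dividend = d0
    for t in range(1, first_stage_years + 1):
        dividend *= 1 + g1                      # D_t = D_0 (1 + g1)^t
        present_value += dividend / (1 + r) ** t
    # Value at the end of the first stage, from the constant growth formula.
    terminal_value = dividend * (1 + g2) / (r - g2)
    return present_value + terminal_value / (1 + r) ** first_stage_years


two_stage_value = two_stage_ddm(d0=2.00, g1=0.04, g2=0.08, r=0.12, first_stage_years=4)
print(f"Two-stage value: ${two_stage_value:.2f}")
```

Setting `g1` equal to `g2` collapses the function to the constant growth value D0(1 + g)/(r - g), which is a convenient sanity check.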

As you can see from this example, the ba¬ 
sic valuation model can be modified to accom¬ 
modate different patterns of expected dividend 
growth. 


Three-Stage Growth Model 

The most popular multiphase model employed
by practitioners appears to be the three-stage
DDM. (The formula for this model is derived in
Sorensen and Williamson [1985].) This model
assumes that all companies go through three
phases, analogous to the concept of the product
life cycle. In the growth phase, a company experiences
rapid earnings growth as it produces
new products and expands market share. In the
transition phase the company's earnings begin
to mature and decelerate to the rate of growth of
the economy as a whole. At this point, the company
is in the maturity phase in which earnings
continue to grow at the rate of the general
economy.

Different companies are assumed to be at 
different phases in the three-phase model. 
An emerging growth company would have a 
longer growth phase than a more mature com¬ 
pany. Some companies are considered to have 
higher initial growth rates and hence longer 
growth and transition phases. Other compa¬ 
nies may be considered to have lower current 
growth rates and hence shorter growth and 
transition phases. 

In the typical investment management organization,
analysts supply the projected earnings,
dividends, growth rates for earnings and
dividends, and payout ratios using fundamental
security analysis. The growth rate at maturity
for the entire economy is applied to all compa¬ 
nies. As a generalization, approximately 25% of 
the expected return from a company (projected 
by the DDM) comes from the growth phase, 
25% from the transition phase, and 50% from 
the maturity phase. However, a company with 
high growth and low dividend payouts shifts 
the relative contribution toward the maturity 
phase, while a company with low growth and 
a high payout shifts the relative contribution 
toward the growth and transition phases. 

STOCHASTIC DIVIDEND 
DISCOUNT MODELS 

As we noted in our discussion and illustration 
of the constant growth DDM, an erratic divi¬ 
dend pattern such as that of Wyeth can lead 
to quite a difference between the estimated 
price and the actual price. In the case of the 
pharmaceutical companies, the estimated price 
overstated the actual price for Eli Lilly, but 
understated the price of Schering-Plough and 
Wyeth. 

Hurley and Johnson (1998a, 1998b) have suggested
a new family of valuation models. Their
model allows for a more realistic pattern of dividend
payments. The basic model generates dividend
payments based on a model that assumes
that either the firm will increase dividends for
the period by a constant amount or keep dividends
the same. The model is referred to as
a stochastic DDM because the dividend can increase
or be constant based on some estimated
probability of each possibility occurring. The
dividend stream used in the stochastic DDM is
called the stochastic dividend stream.

There are two versions of the stochastic DDM.
One assumes that dividends either increase or
remain unchanged in each period. This version
is referred to as a binomial stochastic DDM
because there are two possibilities for dividends.
The second version is called a trinomial
stochastic DDM because it allows for an
increase in dividends, no change in dividends,
and a cut in dividends. We discuss each version
below.

Binomial Stochastic Model 

For both the binomial and trinomial stochastic 
DDM, there are two versions of the model—the 
additive growth model and the geometric 
growth model. The former model assumes that 
dividend growth is additive rather than geo¬ 
metric. This means that dividends are assumed 
to grow by a constant dollar amount. So, for 
example, if dividends are $2.00 today and the 
additive growth rate is assumed to be $0.25 
per year, then next year dividends will grow 
to $2.25, in two years dividends will grow to 
$2.50, and so on. The second model assumes a 
geometric rate of dividend growth. This is the 
same growth rate assumption used in the ear¬ 
lier DDMs presented in this entry. 

Binomial Additive Stochastic Model 

This formulation of the model is expressed as
follows:

\[
D_{t+1} =
\begin{cases}
D_t + C & \text{with probability } p \\
D_t & \text{with probability } 1 - p
\end{cases}
\quad \text{for } t = 1, 2, \ldots
\]

where

\(D_t\) = dividend in period t
\(D_{t+1}\) = dividend in period t+1
\(C\) = dollar amount of the dividend increase
\(p\) = probability that the dividend will
increase

Hurley and Johnson (1998a) have shown that
the theoretical value of the stock based on the
additive stochastic DDM assuming a constant
discount rate is equal to:

\[
P = \frac{D_0}{r} + \left[\frac{1}{r} + \frac{1}{r^2}\right] C p \tag{8}
\]


For example, consider once again Wyeth. In
the illustration of the constant growth model,
we used \(D_0\) of $1.01 and a g of 3.533%. We estimate
C by calculating the dollar increase in
dividends for each year that had a dividend increase
and then taking the average dollar dividend
increase. The average of the increases is
$0.0373.

In the 15-year span 1991 through 2006,
dividends increased in 11 of the 15 year-to-year
differences. Therefore, p = 11/15 = 73.3333%.
Substituting these values and Wyeth's required
rate of return of 9.625% into equation (8), we
find the estimated price to be:

\[
\begin{aligned}
P &= \frac{\$1.01}{0.09625} + \left[\left(\frac{1}{0.09625} + \frac{1}{0.09625^2}\right)(\$0.03727)(0.73333)\right] \\
P &= \$10.49351 + [(118.336)(\$0.03727)(0.73333)] \\
P &= \$10.49351 + \$3.23682 = \$13.73033
\end{aligned}
\]



Applying this model to the other two pharmaceutical
companies, we see that the model
produces an estimated price that is closer to the
actual price than the fair value based on the
constant growth model:

                   Actual price,   Constant growth model   Binomial additive stochastic
Company            end of 2006     estimate, end of 2006   model estimate, end of 2006
Eli Lilly          $49.87          $162.79                 $29.94
Schering-Plough    $23.44          $3.00                   $11.04
Wyeth              $50.52          $17.16                  $13.73
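
The Wyeth figure can be reproduced with a sketch of the binomial additive formula, P = D0/r + (1/r + 1/r^2)·C·p. The function name is illustrative, and the inputs are the Wyeth values quoted in the text (C = $0.0373, p = 11/15, r = 9.625%).

```python
def binomial_additive_ddm(d0: float, r: float, c: float, p: float) -> float:
    """Binomial additive stochastic DDM with a constant discount rate.

    d0 -- current dividend
    r  -- discount rate
    c  -- dollar amount of each dividend increase
    p  -- probability that the dividend increases in a period
    """
    return d0 / r + (1 / r + 1 / r ** 2) * c * p


wyeth_price = binomial_additive_ddm(d0=1.01, r=0.09625, c=0.0373, p=11 / 15)
print(f"Wyeth, binomial additive estimate: ${wyeth_price:.2f}")
```

The first term is the value of a level dividend received forever; the second term adds the present value of the expected dollar increases.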


Binomial Geometric Stochastic Model 

Letting g be the growth rate of dividends, then
the geometric dividend stream is

\[
D_{t+1} =
\begin{cases}
D_t(1+g) & \text{with probability } p \\
D_t & \text{with probability } 1 - p
\end{cases}
\quad \text{for } t = 1, 2, \ldots
\]

Hurley and Johnson (1998b) show that the price
of the stock in this case is:

\[
P = \frac{D_0(1+pg)}{r-pg} \tag{9}
\]

Equation (9) is the binomial stochastic DDM assuming
a geometric growth rate and a constant
discount rate.


Trinomial Stochastic Models 

The trinomial stochastic DDM allows for dividend
cuts. Within the Hurley-Johnson stochastic
DDM framework, Yao (1997) derived this
model that allows for a cut in dividends. He
notes that it is not uncommon for a firm to cut
dividends temporarily. In fact, an examination
of the dividend record of the electric utilities
industry as published in Value Line Industry Review
found that in the aggregate firms cut dividends
three times over a 15-year period.


Trinomial Additive Stochastic Model 

The additive stochastic DDM can be extended
to allow for dividend cuts as follows:

\[
D_{t+1} =
\begin{cases}
D_t + C & \text{with probability } p_U \\
D_t - C & \text{with probability } p_D \\
D_t & \text{with probability } p_C = 1 - p_U - p_D
\end{cases}
\quad \text{for } t = 1, 2, \ldots
\]

where

\(p_U\) = probability that the dividend will
increase
\(p_D\) = probability that the dividend will
decrease
\(p_C\) = probability that the dividend will be
unchanged

The theoretical value of the stock based on
the trinomial additive stochastic DDM then
becomes:

\[
P = \frac{D_0}{r} + \left[\frac{1}{r} + \frac{1}{r^2}\right] C (p_U - p_D) \tag{10}
\]

Notice that when \(p_D\) is zero (that is, there is no
possibility for a cut in dividends), equation (10)
reduces to equation (8).


Trinomial Geometric Stochastic Model 
For the trinomial geometric stochastic DDM allowing
for a possibility of cuts, we have:

\[
D_{t+1} =
\begin{cases}
D_t(1+g) & \text{with probability } p_U \\
D_t(1-g) & \text{with probability } p_D \\
D_t & \text{with probability } p_C = 1 - p_U - p_D
\end{cases}
\quad \text{for } t = 1, 2, \ldots
\]

and the theoretical price is:

\[
P = \frac{D_0[1 + (p_U - p_D)g]}{r - (p_U - p_D)g} \tag{11}
\]

Once again, substituting zero for \(p_D\), equation
(11) reduces to equation (9), the binomial
geometric stochastic DDM.
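
The way the trinomial formulas nest the binomial ones can be checked numerically. The sketch below codes the closed forms with the net probability (p_U - p_D) scaling the growth term; the function names and the input values are hypothetical and serve only to exercise the formulas.

```python
def binomial_additive(d0, r, c, p):
    """Binomial additive stochastic DDM."""
    return d0 / r + (1 / r + 1 / r ** 2) * c * p


def binomial_geometric(d0, r, g, p):
    """Binomial geometric stochastic DDM."""
    return d0 * (1 + p * g) / (r - p * g)


def trinomial_additive(d0, r, c, p_up, p_down):
    """Trinomial additive stochastic DDM: increases and cuts of size c."""
    return d0 / r + (1 / r + 1 / r ** 2) * c * (p_up - p_down)


def trinomial_geometric(d0, r, g, p_up, p_down):
    """Trinomial geometric stochastic DDM: increases and cuts at rate g."""
    net_growth = (p_up - p_down) * g
    return d0 * (1 + net_growth) / (r - net_growth)


# Hypothetical inputs, chosen only for illustration.
d0, r, c, g = 2.00, 0.10, 0.10, 0.05
with_cuts = trinomial_geometric(d0, r, g, p_up=0.6, p_down=0.1)
without_cuts = trinomial_geometric(d0, r, g, p_up=0.6, p_down=0.0)

print(f"Geometric, cuts allowed: ${with_cuts:.2f}")
print(f"Geometric, no cuts:      ${without_cuts:.2f}")
```

Allowing for cuts lowers the expected growth of dividends, so for the same probability of an increase the trinomial value is below the binomial value.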


Applications of the Stochastic DDM 

Yao (1997) applied the stochastic DDMs to five 
electric utility stocks that had regular dividends 
from 1979 to 1994 and found that the models fit 
the various utility stocks differently. 

We see similar results in an updated example
using five electric utilities, as shown in Table 1.
For three of the five utilities, the binomial model
provides an estimate closest to the actual stock
price, whereas for the other two utilities, the
trinomial model offers the closest estimate. In
none of the cases, however, did the constant dividend
growth model offer the closest approximation
to the actual stock price.


Advantages of the Stochastic DDM 
The stochastic DDM developed by Hurley and
Johnson is a powerful tool for the analyst
because it allows the analyst to generate a probability
distribution for a stock's value. The probability
distribution can be used by an analyst
to assess whether a stock is sufficiently mispriced
to justify a buy or sell recommendation.
For example, suppose that a three-phase
DDM indicates that the value of a stock trading
at $35 is $42. According to the model, the
stock is underpriced and the analyst would
recommend the purchase of this stock. However,
the analyst cannot express his or her confidence
as to the degree to which the stock is
undervalued.

Hurley and Johnson show how the stochas¬ 
tic DDM can be used to overcome this limita¬ 
tion of traditional DDMs. An analyst can use 
the derived probability distribution from the 
stochastic DDM to assess the probability that 
the stock is undervalued. For example, an analyst
may find from the probability distribution
that the probability that the stock's value is
greater than $35 (the market price) is 90%.

To employ a stochastic DDM an analyst must 
be prepared to make subjective assumptions 
about the uncertain nature of future dividends. 
Monte Carlo simulation, available in spreadsheet
add-ins (@RISK in Excel, for example), can then be
used to generate the probability distribution.
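
The simulation step can also be sketched outside a spreadsheet. The
Python fragment below is a minimal illustration, not the procedure of
Hurley and Johnson: it assumes trinomial geometric dividend dynamics,
truncates each dividend path at a finite horizon, and treats the
discounted value of each simulated path as one draw from the value
distribution. All names and parameter values are hypothetical.

```python
import random


def simulate_path_value(d0, r, g, p_up, p_down, horizon=200, rng=random):
    """Discounted value of one simulated dividend path (truncated at horizon)."""
    dividend, discount, value = d0, 1.0, 0.0
    for _ in range(horizon):
        u = rng.random()
        if u < p_up:                 # dividend increase
            dividend *= 1 + g
        elif u < p_up + p_down:      # dividend cut
            dividend *= 1 - g
        # otherwise the dividend is unchanged
        discount /= 1 + r
        value += dividend * discount
    return value


def value_distribution(n_paths, seed=0, **params):
    """Simulate n_paths values with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_path_value(rng=rng, **params) for _ in range(n_paths)]


def prob_value_exceeds(values, price):
    """Share of simulated values above a given market price."""
    return sum(v > price for v in values) / len(values)


# Assumed inputs: D0 = $2.00, r = 10%, g = 5%, p_up = 0.6, p_down = 0.1
values = value_distribution(2000, seed=1, d0=2.00, r=0.10, g=0.05,
                            p_up=0.6, p_down=0.1)
print(sum(values) / len(values), prob_value_exceeds(values, 25.0))
```

The mean of the simulated values should lie close to the closed-form
trinomial geometric price for the same inputs, while the spread of the
values is what lets an analyst attach a probability to a statement
such as "the value exceeds the market price."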


Table 1 Fit of the Different Dividend Models Applied to Five Electric Utilities

                                       Consolidated  Dominion
Company                                Edison        Resources  FPL Group  PPL      TECO Energy
Actual stock price, end of 2006        $45.82        $40.73     $52.98     $34.89   $16.46
Estimated stock price given the ...
  Constant dividend growth model       $33.57        $19.36     $22.14     $16.54   $7.46
  Binomial stochastic dividend model   $43.59        $30.51     $36.12     $28.30   $23.02
  Trinomial stochastic dividend model  $63.12        $25.84     $41.23     $23.71   $14.45
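
The comparison summarized from Table 1 can be checked mechanically. The
short Python sketch below copies the prices from the table and reports,
for each utility, the model whose estimate is closest to the actual
price; the dictionary layout and function name are illustrative choices.

```python
actual = {
    "Consolidated Edison": 45.82, "Dominion Resources": 40.73,
    "FPL Group": 52.98, "PPL": 34.89, "TECO Energy": 16.46,
}
estimates = {
    "constant growth": {"Consolidated Edison": 33.57, "Dominion Resources": 19.36,
                        "FPL Group": 22.14, "PPL": 16.54, "TECO Energy": 7.46},
    "binomial": {"Consolidated Edison": 43.59, "Dominion Resources": 30.51,
                 "FPL Group": 36.12, "PPL": 28.30, "TECO Energy": 23.02},
    "trinomial": {"Consolidated Edison": 63.12, "Dominion Resources": 25.84,
                  "FPL Group": 41.23, "PPL": 23.71, "TECO Energy": 14.45},
}


def closest_model(company):
    """Model with the smallest absolute pricing error for a company."""
    return min(estimates, key=lambda m: abs(estimates[m][company] - actual[company]))


for company in actual:
    print(f"{company}: {closest_model(company)}")
```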

EXPECTED RETURNS AND
DIVIDEND DISCOUNT 
MODELS 


Thus far, we have seen how to calculate the 
fair price of a stock given the estimates of 
dividends, discount rates, terminal prices, and 
growth rates. The model-derived price is then 
compared to the actual price and the appropri¬ 
ate action is taken. 

The analysis can be recast in terms of expected 
return. This is found by calculating the return 
that will make the present value of the expected 
cash flows equal to the actual price. Mathemat¬ 
ically, this is expressed as follows: 


$$
P_A = \frac{D_1}{(1 + \mathit{ER})^1} + \frac{D_2}{(1 + \mathit{ER})^2} + \cdots + \frac{D_N}{(1 + \mathit{ER})^N} + \frac{P_N}{(1 + \mathit{ER})^N} \tag{12}
$$

where

$P_A$ = actual price of the stock
ER = expected return

The expected return (ER) in equation (12) has no
general closed-form solution and is typically
found by trial and error. For example, consider
the following inputs, used at the outset of this
entry to illustrate the finite life general DDM as
given by equation (3). For stock XYZ, the inputs
assumed are:

D_1 = $2.00   D_2 = $2.20   D_3 = $2.30
D_4 = $2.55   D_5 = $2.65   P_5 = $26   N = 5


We calculated a fair price based on equation 
(3) to be $24.90. Suppose that the actual price 
is $25.89. Then the expected return is found by 
solving the following equation for ER: 


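$$
\$25.89 = \frac{\$2.00}{(1 + \mathit{ER})} + \frac{\$2.20}{(1 + \mathit{ER})^2} + \frac{\$2.30}{(1 + \mathit{ER})^3} + \frac{\$2.55}{(1 + \mathit{ER})^4} + \frac{\$2.65}{(1 + \mathit{ER})^5} + \frac{\$26.00}{(1 + \mathit{ER})^5}
$$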


The expected return is 9%. 
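
Solving for the expected return can be automated. The sketch below uses
simple bisection, which is only one of several possible root-finding
choices and is not prescribed by the text; it relies on the fact that
present value falls as the discount rate rises. The function names and
the search bounds (0% to 100%) are assumptions.

```python
def present_value(rate, dividends, terminal_price):
    """PV of dividends D_1..D_N plus the terminal price P_N, all at `rate`."""
    n = len(dividends)
    pv = sum(d / (1 + rate) ** t for t, d in enumerate(dividends, start=1))
    return pv + terminal_price / (1 + rate) ** n


def expected_return(price, dividends, terminal_price, lo=0.0, hi=1.0, tol=1e-10):
    """Rate that equates present value to `price`, found by bisection."""
    if not (present_value(hi, dividends, terminal_price) <= price
            <= present_value(lo, dividends, terminal_price)):
        raise ValueError("price is outside the range implied by the rate bounds")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if present_value(mid, dividends, terminal_price) > price:
            lo = mid   # PV too high, so the rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2


xyz_dividends = [2.00, 2.20, 2.30, 2.55, 2.65]
for market_price in (25.89, 24.90, 23.95):
    er = expected_return(market_price, xyz_dividends, 26.00)
    print(f"price ${market_price:.2f} -> expected return {er:.2%}")
```

With the XYZ inputs, the three prices discussed in this section map to
expected returns of roughly 9%, 10%, and 11%.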

The expected return is the discount rate
that equates the present value of the expected
future cash flows with the market price of the
stock. The higher the expected return, for a
given set of future cash flows, the lower the
current value. The relation between the fair
value of a stock and the expected return of a
stock is shown in Figure 1.

Figure 1 The Relation between the Fair Value of
a Stock and the Stock's Expected Return

Given the expected return and the required re¬ 
turn (that is, the value for r), any mispricing can 
be identified. If the expected return exceeds the 
required return, then the stock is undervalued; 
if it is less than the required return then the stock 
is overvalued. A stock is fairly valued if the ex¬ 
pected return is equal to the required return. In 
our illustration, the expected return (9%) is less 
than the required return (10%); therefore, stock 
XYZ is overvalued. 

With the same set of inputs, the identifica¬ 
tion of a stock being mispriced or fairly valued 
will be the same regardless of whether the fair 
value is determined and compared to the mar¬ 
ket price or the expected return is calculated and 
compared to the required return. In the case of 
XYZ stock, the fair value is $24.90. If the stock is 
trading at $25.89, it is overvalued. The expected 
return if the stock is trading at $25.89 is 9%, 
which is less than the required return of 10%. 
If, instead, the stock price is $24.90, it is fairly 
valued. The expected return can be shown to be 
10%, which is the same as the required return. 
At a price of $23.95, it can be shown that the ex¬ 
pected return is 11%. Since the required return 
is 10%, stock XYZ would be undervalued. 

While the illustration above uses the basic
DDM, the expected return can be computed for
any of the models. In some cases, the calculation
of the expected return is simple since a formula
can be derived that specifies the expected return
in terms of the other variables. For example, for
the constant growth DDM given by equation
(5), the expected return (r) can be easily solved
to give:

$$
r = \frac{D_1}{P} + g
$$

Rearranging the constant growth model to
solve for the expected return, we see that the
required rate of return can be specified as the sum
of the dividend yield and the expected growth
rate of dividends.


KEY POINTS 

• Dividends are measured in a number of 
ways, including dividends per share, divi¬ 
dend yield, and dividend payout. 

• The discounted cash flow approach to valu¬ 
ing common stock requires projecting future 
dividends. Hence, the model used to value 
common stock is called a dividend discount 
model. 

• The simplest dividend discount model is the 
constant growth model. More complex mod¬ 
els include the multiphase model and stochas¬ 
tic models. 

• Stock valuation using a dividend discount 
model is highly dependent on the inputs 
used. 

• A dividend discount model does not indicate 
when the current market price will reach its 
fair value. 

• The output of a dividend discount model is 
the fair price. However, the model can be used 
to generate the expected return. 

• The expected return is the interest rate that
will make the present value of the expected
dividends plus terminal price equal to the
stock's market price. The expected return
is then compared to the required return to
assess whether a stock is fairly priced in the
market.


REFERENCES 

Campbell, J. Y., and Shiller, R. J. (1998). Valua¬ 
tion ratios and the long-run stock market out¬ 
look. Journal of Portfolio Management 24 (Winter): 
11-26. 

Fogler, R. H. (1988). Security analysis, DDMs, 
and probability. In Equity Markets and Valuation 
Methods (pp. 51-52). Charlottesville, VA: The 
Institute of Chartered Financial Analysts. 

Gordon, M., and Shapiro, E. (1956). Capital equip¬ 
ment analysis: The required rate of profit. Man¬ 
agement Science 3:102-110. 

Hurley, W. J., and Johnson, L. (1994). A realistic 
dividend valuation model. Financial Analysts 
Journal 50 (July-August): 50-54. 

Hurley, W. J., and Johnson, L. (1998a). Generalized 
Markov dividend discount models. Journal of 
Portfolio Management 25 (Fall): 27-31. 

Hurley, W. J., and Johnson, L. (1998b). The 
Theory and Application of Stochastic Divi¬ 
dend Models. Monograph 7, Clarica Finan¬ 
cial Services Research Centre, School of 
Business and Economics, Wilfrid Laurier 
University. 

Molodovsky, N., May, C., and Chattiner, S. (1965). 
Common stock valuation: Principles, tables, 
and applications. Financial Analysts Journal 21 
(November-December): 111-117. 

Sorensen, E., and Williamson, E. (1985). Some 
evidence on the value of dividend dis¬ 
count models. Financial Analysts Journal 41 
(November-December): 60-69. 

Williams, J. B. (1938). The Theory of Investment 
Value. Cambridge, MA: Harvard University 
Press. 

Yao, Y. (1997). A trinomial dividend valua¬ 
tion model. Journal of Portfolio Management 21 
(Summer): 99-103. 


Discounted Cash Flow Methods for 
Equity Valuation 

GLEN A. LARSEN Jr., PhD, CFA 

Professor of Finance, Indiana University Kelley School of Business-Indianapolis 


Abstract: Most applied methods of valuing a firm's equity are based on discounted cash flow and 
relative valuation models. Although stock and firm valuation is very strongly tilted toward the 
use of discounted cash flow methods, it is impossible to ignore the fact that many analysts use other
methods to value equity and entire firms. The primary alternative valuation method is relative 
valuation. Both discounted cash flow and relative valuation methods require strong assumptions 
and expectations about the future. No one single valuation model or method is perfect. All valuation 
estimates are subject to model error and estimation error. 


Sound investing requires that an investor does 
not pay more for an asset than it is worth. There
are those who argue that value is in the eyes 
of the beholder, which is simply not true when 
it comes to financial assets. Perceptions may be 
all that matter when the asset is an art object or 
antique automobile, but investors should not 
buy financial assets for aesthetic or emotional 
reasons; financial assets are acquired for the 
cash flows expected from them in future peri¬ 
ods. Consequently, perceptions of value have to 
be backed up by reality, which implies that the 
price paid for any financial asset should reflect 
the cash flows that it is expected to generate. 

Realize that at the end of the most careful and 
detailed valuation, there will be uncertainty 
about the final numbers, biased as they are by 
the assumptions that we make about the future 
of the company and the economy. It is unreal¬ 
istic to expect or demand absolute certainty in 
valuation, since cash flows and discount rates 


are estimated with error. This also means that 
you have to give yourself a reasonable margin 
for error in making recommendations on the ba¬ 
sis of valuations. Most importantly, realize that 
the degree of precision in valuations is likely 
to vary widely across investments. For exam¬ 
ple, the valuation of a large and mature com¬ 
pany, with a long financial history, will usually 
be much more precise than the valuation of a 
young company or of a company that is in a 
sector that is in turmoil. 

Implicit often in the act of valuation is the 
assumption that markets make mistakes and 
that we can find these mistakes, often using in¬ 
formation that tens of thousands of other in¬ 
vestors can access. Thus, the argument that 
those who believe that markets are inefficient 
should spend their time and resources on val¬ 
uation whereas those who believe that markets 
are efficient should take the market price as the 
best estimate of value, seems to be reasonable. 
This statement, though, does not reflect the internal
contradictions in both positions. Those
who believe that markets are efficient may still 
feel that valuation has something to contribute, 
especially when they are called upon to value 
the effect of a change in the way a firm is run or 
to understand why market prices change over 
time. 

Furthermore, it is not clear how markets 
would become efficient in the first place, if in¬ 
vestors did not attempt to find under- and over¬ 
valued stocks and trade on these valuations. 
In other words, a precondition for market effi¬ 
ciency seems to be the existence of millions of 
investors who believe that markets are not. 

Stock-pricing models are not physical or 
chemical laws of nature. There is, however, a 
strong principle of investing that must eventu¬ 
ally hold true for all firms over time if they are to 
have a positive value. This principle is that you 
should always be able, in your mind, to con¬ 
struct some sort of logical connection between 
a positive stock price today and a stream of fu¬ 
ture cash flows to the investor. The logical chain 
might be long. You might assume that years of 
start-up losses (earnings are zero or negative) 
will be followed by more years of all profits be¬ 
ing reinvested. But you should be able to envi¬ 
sion some connection between today's positive 
stock price and a stream of cash flows that will 
commence someday in the future. 

In this entry, we discuss practical methods of 
valuing a firm's equity based on discounted 
cash flow (DCF) models. Although stock and 
firm valuation is very strongly tilted toward 
the use of DCF methods, it is impossible to ig¬ 
nore the fact that many analysts use other meth¬ 
ods to value equity and entire firms. The DCF 
model is the subject of this entry. The primary 
alternative valuation method is relative valua¬ 
tion (RV). Both DCF and RV valuation methods 
require strong assumptions and expectations 
about the future. No one single valuation model 
or method is perfect. All valuation estimates 
are subject to model error and estimation er¬ 
ror. Nevertheless, investors use these models to 
help form their expectations about a fair market 


price. Markets then generate an observable mar¬ 
ket clearing price based on investor expecta¬ 
tions, and this market clearing price constantly 
changes along with investor expectations. 

DIVIDEND DISCOUNT 
MODEL 

The dividend discount model (DDM) is the most
basic DCF approach to equity valuation,
originally formulated by Williams (1938).
It states that the stock price should equal the 
present value of all expected future dividends 
into perpetuity under the assumption that a 
firm has an infinite life. But you may also have 
ignored the DDM once you recognized how dif¬ 
ficult it is to apply in the real world. The next 
several paragraphs simply review the basic con¬ 
cepts in order to highlight the complexities that 
surround implementing the DDM in practice. 

Consider an investor who buys a share of 
stock, planning to hold it for one year. As you 
know from previous studies, the intrinsic value 
of the share is the present value, P(0), of the ex¬ 
pected dividend to be received at the end of the 
first year, ED(1), and the expected sales price, 
EP(1). 

P(0) = [ED(1) + EP(1)]/(1 + R)    (1)

Keep in mind that since we live in a world of
uncertainty and no human can perfectly forecast
the future, future prices and dividends are
unknown. Specifically, we are dealing with expected
values, not certain values. Under the
assumption that dividends are predictable,
given a company's dividend history, the expected
future dividend in the next period,
ED(1), can be estimated based on historical
trends. You might ask how we can estimate
EP(1), the expected year-end price.

According to equation (1), the year-end intrinsic
value, P(1), will be

P(1) = [ED(2) + EP(2)]/(1 + R)    (2)

If we assume the stock will be selling for its
intrinsic value next year, then P(1) = EP(1), and
we can substitute equation (2) into equation (1),
which gives

P(0) = ED(1)/(1 + R) + [ED(2) + EP(2)]/(1 + R)^2    (3)

Equation (3) may be interpreted as the present 
value of dividends plus the expected sales price 
at the end of a two-year holding period. Of 
course, now we need to come up with a forecast 
of EP(2). Continuing in the same way, we can replace
the expected price at the end of two years
by the intrinsic value at the end of two years.
That is, replace EP(2) by [ED(3) + EP(3)]/(1 +
R), which relates P(0) to the value of dividends
over three years plus the expected sales price at
the end of a three-year holding period.

More generally, for a holding period of T
years, we can write the stock value as the
present value of dividends over the T years
discounted at an appropriate discount rate, R, that
is assumed to remain constant, plus the present
value of the ultimate sales price, EP(T):

P(0) = ED(1)/(1 + R) + ED(2)/(1 + R)^2 + ...
       + [ED(T) + EP(T)]/(1 + R)^T    (4)

In short, the intrinsic price of a share of stock is 
the present value of a stream of payments (divi¬ 
dends in the case of stocks) and a final payment 
(the sales price of the stock at time T). 
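
The finite-horizon valuation can be written two ways in code: as the
direct sum of discounted dividends plus the discounted sale price, or by
repeatedly applying the one-period relation P = (dividend + next
price)/(1 + R) backward from the horizon. The Python sketch below shows
that both give the same number; the function names and the sample
inputs are hypothetical.

```python
def ddm_finite(expected_dividends, expected_sale_price, r):
    """Direct sum: discounted dividends for years 1..T plus discounted EP(T)."""
    horizon = len(expected_dividends)
    pv_dividends = sum(d / (1 + r) ** t
                       for t, d in enumerate(expected_dividends, start=1))
    return pv_dividends + expected_sale_price / (1 + r) ** horizon


def ddm_recursive(expected_dividends, expected_sale_price, r):
    """Backward substitution: apply the one-period relation from year T to year 1."""
    value = expected_sale_price
    for dividend in reversed(expected_dividends):
        value = (dividend + value) / (1 + r)
    return value


# Hypothetical three-year forecast discounted at 10%
dividends = [4.00, 4.20, 4.41]
print(round(ddm_finite(dividends, 90.00, 0.10), 2))
print(round(ddm_recursive(dividends, 90.00, 0.10), 2))
```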

The key problems with implementing this
model are the uncertainty of future dividends,
the lack of a fixed maturity date, the unknown
sales price at the horizon date, and the choice of
an appropriate discount rate. Indeed, one can
continue to substitute for a terminal price on out to
infinity (INF):

P(0) = ED(1)/(1 + R) + ED(2)/(1 + R)^2 + ...
       + ED(INF)/(1 + R)^INF    (5)

Equation (5) states that the stock price should 
equal the present value of all expected future 
dividends in perpetuity. This formula is the 
DDM in mathematical form. It is tempting, but 
incorrect, to conclude from the equation that the 
DDM focuses exclusively on dividends and ig¬ 
nores capital gains as a motive for investing in 
stock. Indeed, we assume explicitly in equation
(4), the finite version of the DDM, that capital
gains (as reflected in the expected sales price,
EP(T)) are part of the stock's value. EP(T) is
the present value at time T of all dividends expected
to be paid after the horizon date. That
value is then discounted back to today (time 0).
The DDM asserts that stock prices are determined
ultimately by the cash flows accruing
to stockholders, and those are dividends.

Stocks That Currently Pay 
No Dividend 

If investors never expected a dividend to be 
paid, then this model implies that the stock 
would have no value. To reconcile the fact that 
stocks not paying a current dividend do have 
a positive market value with this model, one 
must assume that investors expect that some¬ 
day, at some time T, the firm must pay out some 
cash, even if only a liquidating dividend. 

CONSTANT-GROWTH DDM 

The general form of the DDM, as it stands, is 
still not very useful in valuing a stock because 
it requires dividend forecasts for every year into 
the indefinite future. To make the DDM practi¬ 
cal, we need to introduce some simplifying as¬ 
sumptions. One useful and common first pass 
at the problem is to assume that dividends are 
trending upward at a stable or constant growth 
rate, g. 

For example, if g = 0.05 and the most recently 
paid dividend was D(0) = 3.81, expected future 
dividends are 

ED(1) = D(0)(1 + g)   = (3.81)(1.05)   = 4.00
ED(2) = D(0)(1 + g)^2 = (3.81)(1.05)^2 = 4.20
ED(3) = D(0)(1 + g)^3 = (3.81)(1.05)^3 = 4.41

and so on. Using these dividend forecasts, we 
can solve for intrinsic value as 

P(0) = ED(1)/(1 + R) + ED(2)/(1 + R)^2
       + ED(3)/(1 + R)^3 + ...

Since this sum stretches to infinity and each term
is the previous one multiplied by the constant
factor (1 + g)/(1 + R), it is a geometric series,
and basic algebra allows it to be written as

P(0) = ED(1)/(R - g)    (6)

Equation (6) is called the constant-growth 
DDM, or the Gordon-Shapiro model, after My¬ 
ron Gordon and Eli Shapiro, who popularized 
the model [see Gordon (1962) and Gordon and 
Shapiro (1956)]. 

Equation (6) should remind you of the formula
for the present value of a perpetuity. If dividends
were expected not to grow, g = 0, then the
dividend stream would be a simple perpetuity,
and the valuation formula would be

P(0) = ED(1)/R

P(0) = ED(1)/(R - g) is a generalization of the
perpetuity formula to cover the case of a perpetuity
growing at a constant rate, g. As g increases,
for a given value of ED(1), the stock
price rises. The constant-growth DDM is valid
only when g is less than R. If dividends were
expected to grow forever (to infinity) at a rate
of R or faster, the value of the stock would be
infinite. Further, in all of the DDM equations
presented, R is also assumed to be constant
forever.
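
As a numerical check on the constant-growth formula, the Python sketch
below compares P(0) = ED(1)/(R - g) with a long but finite sum of
discounted dividends. D(0) = 3.81 and g = 0.05 come from the example
above; the required return R = 0.10 is an assumption made only for
this illustration, as are the function names.

```python
def constant_growth_value(d0, g, r):
    """Constant-growth DDM: P(0) = D(0)(1 + g)/(R - g), valid only for g < R."""
    if g >= r:
        raise ValueError("the constant-growth DDM requires g < R")
    return d0 * (1 + g) / (r - g)


def truncated_ddm(d0, g, r, years):
    """Sum of discounted dividends over a finite number of years."""
    return sum(d0 * (1 + g) ** t / (1 + r) ** t for t in range(1, years + 1))


closed_form = constant_growth_value(3.81, 0.05, 0.10)
for years in (10, 50, 200, 1000):
    print(years, round(truncated_ddm(3.81, 0.05, 0.10, years), 4))
print("closed form:", round(closed_form, 4))
```

The finite sums approach the closed-form value as the horizon
lengthens, which is the sense in which the infinite dividend stream
collapses to a single expression.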


NONCONSTANT-GROWTH 

DDM 

If you feel that you know the future growth 
rates in each period for a firm, then you can 
certainly use unique growth rates, g(T) and re¬ 
quired rates of return, R(T), in the present value 
equation and discount all unique dividends and 
future selling price back to the present. The 
problem becomes one of time, effort, and es¬ 
timation risk. At some future point in time, 
what you believe to be a better unique esti¬ 
mate of a future dividend or a future discount 
rate will in reality be no better than an assump¬ 
tion of constant growth and constant discount 
rate. 


INTUITION BEHIND 
THE DDM 

In a market economy, common sense dictates 
that you should go into business only if you 
expect to make money. In a sole proprietor¬ 
ship, everything left over from the revenue you 
earned, minus expenses, is yours. In other forms 
of a business organization, you need to be a bit 
more formal because there are other owners. 
In a partnership, partners draw money out of 
the business. And shareholders get money out 
of a corporation by receiving dividends. Using 
the corporate form as an example, the value per 
share is determined by the value of the divi¬ 
dends distributed to each shareholder. That is, 
the value per share is determined by the present 
value of each shareholder's expected share of 
the profits. 

Here is a simple example that illustrates sev¬ 
eral of the uncertainties involved with the basic 
DCF valuation process for a share of common 
stock. Let's say you consider buying shares of 
a corporation. How much will you pay if the 
expected annual dividend forever is $10 per 
share? That depends on how much of an annual 
"return" you want. If you want a 10% return, 
you'll offer $100 (that is, a $10 dividend divided 
by a $100 investment equals a return of 10%). 
But just because you offer to pay $100 doesn't 
mean someone will sell to you at that price. 

Financial capital is subject to principles of 
market supply and demand, just like commodi¬ 
ties. Suppose market conditions are such that 
prevailing rates of return for corporate shares 
in this particular risk class are in the 5% range. 
If I'm selling stock that commands a $10 per 
share dividend I can demand a price of $200, 
and someone will give it to me. Suppose this 
corporation is a bit riskier than most others. 
A buyer may say, "If I'm willing to accept the 
prevailing 5% return, there are hundreds upon 
hundreds of better-quality corporations I can 
invest in. So if you want me to buy your shares, 
you need to give me an incentive to bypass all the
others." The buyer and seller may settle on a 7%
return, which is equivalent to a price of about
$143. The appropriate required rate of return, R, 
is therefore critical, and R can vary with market 
conditions. 

In all cases, assuming that the life of the cor¬ 
poration is infinite, the current price, P(0), is 
computed as the constant dividend in perpetu¬ 
ity, D, divided by the required rate of return, R, 
that is, the present value of all future constant 
dividends. Often, though, investors use return, 
R, as the basis for comparing and pricing in¬ 
vestments. R is often estimated from observable 
information as D (dividend) divided by current 
price P(0). Mathematically, it looks like this: 

R = D/P( 0) 

You've seen this before. It is the dividend yield. 
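
The arithmetic of the $10 perpetual dividend example is easy to
reproduce. The following Python sketch, with illustrative function
names, prices the constant dividend at the three required returns
discussed above and then recovers the dividend yield from a price.

```python
def perpetuity_price(dividend, required_return):
    """Price of a constant perpetual dividend: P(0) = D / R."""
    return dividend / required_return


def dividend_yield(dividend, price):
    """Return implied by a price: R = D / P(0)."""
    return dividend / price


for r in (0.10, 0.05, 0.07):
    print(f"R = {r:.0%}: price = ${perpetuity_price(10.0, r):.2f}")

print(f"yield at a $200 price: {dividend_yield(10.0, 200.0):.1%}")
```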

COMPLICATIONS IN 
IMPLEMENTING THE DDM 
IN THE REAL WORLD 

As you can see by now, there are essentially four 
major issues that complicate finding the present 
value of all future dividends and, therefore, in 
implementing the DDM. 

Expected Growth of Dividends 

As profits grow over time (as we hope they 
will), dividends can be expected to grow and 
not remain constant forever. If profits and divi¬ 
dends are growing by 10% every year, the div¬ 
idend this year may be $10, but by next year, 
it will be $11. If we divide $11 by today's $200 
purchase price, next year's yield will be 5.5% 
(11/200). The year after, assuming further 10% 
growth, the dividend will be $12.10. Dividing 
that by the $200 purchase price produces a yield 
of 6.05%. The buyer might smile, but the seller 
won't accept it. The seller wants a price that 
truly is consistent with the prevailing 5% yield. 
At $200, the buyer gets too much of a good deal. 
If the buyer holds the stock over time, he'll wind
up with an annual return well in excess of 5%. 


Appropriate Expected Required 
Rate of Return 

Simply stated, present value is a tool for com¬ 
puting today's equivalent of a cash payment to 
be made tomorrow. As stated earlier, this is of¬ 
ten referred to as DCF valuation. If I offer you 
$10 today or $10 a year from now, you'll prob¬ 
ably choose $10 today. But if the choice is $10 
today or $11.50 a year from now, you have to 
pause. If you can invest today's $10 payment 
for one year at 5%, at the end of the year you'll 
have $10.50. But if you bypass the $10 for now 
and wait, you can get $11.50 a year hence. That's 
a better deal. The way to decide if you should 
wait is to do some mathematics that helps you 
decide how much you must receive today to 
allow you to invest and wind up with $11.50 a 
year hence. In this example, the "present value" 
of $11.50 one year from now, assuming a 5% re¬ 
turn, is $10.95. If I take $10.95 and invest it for 
one year at 5%, I'll wind up with $11.50 at the 
end of the year. If interest rates rise, to say 8%, 
it'll take less money today to generate $11.50 
a year hence ($10.65 will be sufficient). So as 
interest rates rise, present values fall, and vice 
versa. 
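
The present value figures in this example follow from a one-line
formula. The Python sketch below, using hypothetical function names,
reproduces them and shows the inverse relation between interest rates
and present values.

```python
def future_value(amount, rate, years=1):
    """Value of `amount` invested today at `rate` for `years` years."""
    return amount * (1 + rate) ** years


def present_value(amount, rate, years=1):
    """Today's equivalent of `amount` received `years` years from now."""
    return amount / (1 + rate) ** years


print(f"$10.00 invested at 5% grows to ${future_value(10.00, 0.05):.2f}")
for rate in (0.05, 0.08):
    print(f"PV of $11.50 at {rate:.0%}: ${present_value(11.50, rate):.2f}")
```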

Expected Future Selling Price 

Thus far, we have thought about a stream of 
dividends stretching into the infinite future. 
Even long-term investors prefer a holding pe¬ 
riod that's something short of infinity. So we 
need to account for the fact that someday you'll 
want to sell your shares. As such, the proceeds 
you expect to get when you sell are included, 
along with dividends, in the stream of cash you 
expect to get, and that goes into the present 
value calculation. 

Let's think about a projection of the future sale
price. If you think you may sell in two years,
imagine how a prospective buyer, two years
into the future, will value the dividend stream
that he'll get. Continuing with the preceding
example, he'll be looking at an initial payout of
$12.10 and a 5% return. So a price of $242
($12.10/0.05) seems a reasonable starting point.
Of course, you'll need to make adjustments for
probable growth beyond year two. And perhaps
5% won't be appropriate as a rate of return.
Market rates may rise or fall, and/or the quality
of the corporation may improve or deteriorate
relative to alternative investments. And two
years hence, the growth forecast may change.
But in any case, we do have a $242 starting
point. The changes may bring it up, perhaps to
$275, or down, possibly to $175. But if an
exuberant analyst publishes a target price of
$1,000, you ought to raise an eyebrow and insist
that the analyst get serious about justifying his
presumably bold assumptions about market
rates, growth, or company quality.


Reinvestment of Profits/Internal 
Financing that Support Growth 

It is standard for corporations to refrain from 
paying out all annual profit as dividend. Some 
money is held in the business for a rainy day. 
And some money is simply reinvested for fu¬ 
ture growth. Either way, profits not paid out 
as dividends are known as retained earnings. 
Reinvestment is more desirable than dividend 
payments if the corporation can earn a higher 
return on the money than the shareholder could 
get (by reinvesting the dividends). If all goes 
well, the reinvestment will enable the corpo¬ 
ration to pay a higher dividend in the future 
than would otherwise have been the case. Go¬ 
ing back to the preceding example, if reinvest¬ 
ment gives the corporation the ability to set a 
year-five payout at $18 rather than $12.10, that 
raises the starting-point target price to $360. 
A shareholder who accepts a forecast like that 
would likely forgo all or some immediate div¬ 
idend payments in order to get that bigger 
future reward. As you can see, even if a corpo¬ 
ration currently pays little or no dividend, we 
still have to acknowledge dividends as a major 
factor in our thoughts about share pricing. 


For better or worse, many corporations now 
see themselves as "growth" companies. And 
many shareholders have accepted a situation 
where these publicly traded growth companies 
pay out very little of their profits, if anything, 
as dividends, and reinvest most or all profits 
back into the business. Many companies do not 
deliver nearly as well on the growth dream as 
everybody hopes. But the growth culture re¬ 
mains alive and well, and the dividend payout 
ratio has declined. 


ADAPTING TO THE 
COMPLICATIONS: THE 
EARNINGS PER SHARE 
APPROACH 

As a result of the four complications listed, 
modern stock prices have become uncoupled 
from dividends. So, in the real world, it is dif¬ 
ficult to compute a fair price through the basic 
dividend formulas presented. 

Here is one solution. It involves substituting 
earnings per share (EPS) for dividends. This 
doesn't really work in a theoretical DDM sense, 
but it does work within the context of a growth 
culture. Shareholders have so thoroughly ac¬ 
cepted and adopted growth that they act as if 
all corporate EPS (whether paid as dividends 
or reinvested back into the business) is in their 
hands. So, instead of working with a dividend 
yield as presented earlier, we can substitute 
an earnings (E) yield, which is computed as
follows: 

Earnings Yield = E/P 

Does the E/P ratio look familiar? It should. Turn 
it upside down and we get something you see 
all the time: the P/E (price/earnings) ratio. 

It is important to emphasize that P/E ratios 
are not just one of those things we use for the 
heck of it. They have a serious and solid in¬ 
tellectual underpinning. They are equivalent 
to earnings yields, which are the modern-day 
substitute for dividend yields, the true basis
for valuing ownership of corporate stock. So
when somebody states that P/E ratios are no
longer relevant, you'd best turn away. Buying
any stock without addressing the P/E ratio is
not sensible.

When we flip P/E back over and think of earnings
yield, we can understand, from the prior
discussion of dividend yield, that a bad com¬ 
pany's stock will have to offer a higher yield 
to attract buyers. Similarly, the yield for a great 
company will be low (otherwise, there would be 
too many would-be buyers). Let's see how this 
works when we flip the earnings yields back to 
P/Es. 

If EPS equals $3.00 and the earnings yield is 
5%, the price will be $60. If it's a bad company 
and the yield is higher, at 8%, the stock price 
will be $37.50. If it's a good company and the 
yield is lower, say 3%, the stock price will be 
$100. The starting number translates to a P/E as 
follows: a $60 price divided by $3.00 EPS gives 
us a P/E of 20. A bad-company stock price of 
$37.50 divided by EPS of $3.00 produces a P/E 
of 12.5. A good-company stock price of $100 
divided by EPS of $3.00 produces a P/E of 33.3. 
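
The link between earnings yield and P/E can be made explicit in code.
The Python sketch below, with illustrative function names and labels,
reproduces the three cases above and shows that the P/E ratio is simply
the reciprocal of the earnings yield.

```python
def price_from_earnings_yield(eps, earnings_yield):
    """Price implied by an earnings yield: P = E / (E/P)."""
    return eps / earnings_yield


def pe_ratio(price, eps):
    """Price/earnings ratio."""
    return price / eps


eps = 3.00
for label, ey in (("starting case", 0.05), ("bad company", 0.08),
                  ("good company", 0.03)):
    price = price_from_earnings_yield(eps, ey)
    print(f"{label}: yield {ey:.0%}, price ${price:.2f}, "
          f"P/E {pe_ratio(price, eps):.1f}")
```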

That's the basis for the generally recognized 
phenomenon of good stocks having higher 
P/Es and bad stocks generally having lower 
P/Es. So, once again, this isn't just one of those 
things. It's an inevitable result of the basic prin¬ 
ciples of finance and math. When evaluating 
companies, good or bad is usually determined 
based on growth prospects and risk. 

We handled the complicating factors by treating
EPS as if it were the same as a dividend.
Notwithstanding that simplification, we still have
a reasonably rational basis for stock prices. We can
argue over what the growth prospects are and
what the market return ought to be (based on 
differing assessments of market conditions and 
company-quality issues). So there will always 
be disagreement on what, exactly, a fair stock 
price ought to be. But all rational investors 
should be somewhere in the same ballpark. We 
may have a big ballpark and debate if a stock 
that commands $25 today is worth $15 or $35. 
But we are unlikely to seriously consider a price 
of, say, $350. 

FREE CASH FLOW DCF 
MODEL—TOTAL FIRM 
VALUATION 

While estimating future cash flows to an indi¬ 
vidual share of stock can seem daunting, some 
investors prefer to estimate the free cash flow 
to the entire firm. Doing this allows investors 
to estimate the value of the entire firm and 
then "back out" an estimated value of a share 
of stock. This is called the free cash flow (FCF) 
model. While legitimate accounting rules do 
enable managers and auditors some range of 
choices, at the end of the day, good companies 
wind up looking good and bad companies wind 
up looking bad. In short, there's no one number 
in an income report that truly gives you the nec¬ 
essary information to value a firm from a dis¬ 
counted expected future cash flow viewpoint. 
You still have to select which type of cash flow 
you're going to look at. But the choice becomes 
very easy once you ask yourself the following 
question: What's my specific purpose for want¬ 
ing to know how a company is doing? 

There are many different types of users of fi¬ 
nancial information, and each is best served by 
concentrating on the information most relevant 
to him/her. Let's look at various kinds of num¬ 
bers and consider what they say, and what types 
of investors will find them most useful. 

Generally accepted accounting principles 
(GAAP) is a set of formal rules that produces 
what most of us have come to accept as the 
most official, or standard, version of income 
that a public corporation can report. Novices 
often believe this is the only valid number and 
are perplexed to learn otherwise. Essentially, 
GAAP is simple: Revenues minus costs equal 
profits. But the world is a complex place. For 
our convenience, we divide our activities into 
time periods. In a simple world, all costs would 
be incurred in the same period as the revenues 
with which they are associated. But that is often 
not the case, so accountants have to find ways 
to identify which expenses should be matched 
against which revenues. One example is depre¬ 
ciation, a concept used to allocate multiperiod 
costs of a given expense to all the periods in 
which the expense generates revenue (e.g., if 
a factory can produce revenue for 10 years, 
charge one-tenth of the cost to build it against 
revenue in each year). 

Observers correctly note that depreciation 
rules are artificial, and advocate use of other 
performance measures that are supposedly 
more "real." We'll touch on this later. But for 
now, it's important to understand that depre¬ 
ciation rules are motivated by good purpose. 
They, and other GAAP rules, are designed to 
paint a picture of the "economic" performance 
of the business, something that is not necessar¬ 
ily the same as a running tally of physical dol¬ 
lars coming in and going out within a specific 
period of time. 

If you are looking to see how a company is 
doing because you want to form an opinion 
as to whether or not it has a track record of 
"success" (defined however you wish), GAAP 
income is very important to you. 

As noted, many investors do not like GAAP 
because of the artificial nature of depreciation. 
Their objection is valid. GAAP is, indeed, imperfect. 
Companies have latitude to determine 
how to calculate depreciation. They don't always use an 
equal allocation for each year. It's difficult, if not 
impossible, to reliably estimate useful life, espe¬ 
cially since assets are usually enhanced (that is, 
factories modernized) as time passes, thereby 
giving rise to extended life and additional de¬ 
preciable expenses tacked on. An assumption 
that at the end of the depreciation period the as¬ 
set will be worth zero, or some predetermined 
salvage value, is often untrue in the real world. 
And besides, there are other kinds of "artifi¬ 
cial" revenue-expense matching formulations 
to cover other situations. But depreciation is 
usually the biggest objection. 


Difference between Cash Flow and 
Free Cash Flow 

The response is often to add depreciation back 
to net income to calculate cash flow. This can be 
a trap for the unwary. The phrase "cash flow" 
sounds comforting. After all, how much more 
reliable a gauge of performance can you seek 
than cash in minus cash out? Read the warn¬ 
ing label closely. Is the cash flow you're seeing 
truly computed by adding depreciation back 
to net income? If that's what's happening, be 
very careful. Companies spend money to en¬ 
hance their assets every year. Because it is un¬ 
derstood that the benefits of these expenditures 
will span many years, they are not put on the income 
statement in any single year. So, in truth, 
simple cash flow overstates a company's true 
cash-in minus cash-out situation. The solution 
lies in the firm's free cash flow. To arrive at a 
firm's FCF, we start with net income, add back 
the noncash depreciation charge, and then sub¬ 
tract the year's capital-spending outlays. (There 
are other adjustments, such as those relating 
to dividends and changes in net working capi¬ 
tal; but for now, these simple adjustments will 
suffice.) 
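A minimal sketch of the difference between the two measures follows. The dollar figures are hypothetical and chosen only for illustration; the function names are not from the text.

```python
def simple_cash_flow(net_income: float, depreciation: float) -> float:
    """Net income with the noncash depreciation charge added back."""
    return net_income + depreciation


def simplified_free_cash_flow(net_income: float, depreciation: float,
                              capital_spending: float) -> float:
    """Simple cash flow less the year's capital-spending outlays.

    Ignores working-capital and other adjustments, as the text does at this stage.
    """
    return simple_cash_flow(net_income, depreciation) - capital_spending


# Hypothetical figures (in $ millions), not taken from the text.
net_income, depreciation, capital_spending = 50.0, 20.0, 45.0

print("Simple cash flow:", simple_cash_flow(net_income, depreciation))
print("Free cash flow:  ", simplified_free_cash_flow(net_income, depreciation, capital_spending))
```

The gap between the two numbers is exactly the capital spending that never passes through a single year's income statement.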

Once you home in on FCF, you aren't likely 
to be misled regarding liquidity. But that 
does not mean you are learning about general 
corporate success or failure. Capital-spending 
programs aren't "smooth." In some years, ex¬ 
penditures are very large as major programs 
ramp up. In other years, capital spending 
shrinks as these programs wind down toward 
completion. If we're in a heavy-spending year, 
FCF could be negative, even though the com¬ 
pany may be having a great year. 

DCF valuation depends on the construction 
of pro forma financial statements in order to 
estimate a firm's future cash flows. Pro forma 
is Latin for "as if." Pro forma statements show how a 
company might perform in the future "as if" it 
continues to perform as it has in the past, subject to other 
assumptions made by the analyst. In any event, 
it is necessary to construct pro forma financial 
statements in order to estimate future free cash 
flows that are the basis for total firm valuation. 

CALCULATING FCF 

Operating cash flow (OCF) is defined as be¬ 
ing equal to earnings before interest and taxes 
(EBIT) minus taxes plus depreciation. Note, 
though, that cash flows cannot be maintained 
over time unless depreciating fixed assets are 
replaced. That is, the firm must reinvest in those 
assets that are depreciating (wearing out) so 
that it can stay alive. Interest paid or any other 
financing costs such as dividends or principal 
repaid are not subtracted because we are inter¬ 
ested in the cash flow generated by the assets 
of the firm. The particular mixture of debt and 
equity a firm actually chooses to use is a man¬ 
agerial decision and determines how the OCF 
is distributed between owners (equity holders) 
and creditors (debt holders). The mixture also 
determines the firm's weighted average cost of 
capital (WACC), which impacts the firm's value 
through the discount rate. 

OCF = EBIT - Taxes + Depreciation 

Net operating profit after tax (NOPAT) is de¬ 
fined as EBIT minus taxes. 

NOPAT = EBIT - Taxes = EBIT x (1 - Tax rate) 

As a result, OCF can also be written as NOPAT 
plus any noncash adjustments. Where depreci¬ 
ation is the only noncash adjustment: 

OCF = NOPAT + Depreciation 
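The two definitions can be written as small functions. This is a sketch: the function names are illustrative, taxes are assumed to be a flat rate applied to EBIT, and the inputs match XYZ's projected 20X9 EBIT, tax rate, and depreciation from Table 4.

```python
def nopat(ebit: float, tax_rate: float) -> float:
    """Net operating profit after tax: EBIT x (1 - Tax rate)."""
    return ebit * (1.0 - tax_rate)


def operating_cash_flow(ebit: float, tax_rate: float, depreciation: float) -> float:
    """OCF = NOPAT + Depreciation, assuming depreciation is the only noncash charge."""
    return nopat(ebit, tax_rate) + depreciation


# Illustrative inputs in $ millions (XYZ's projected 20X9 values in Table 4).
ebit, tax_rate, depreciation = 85.0, 0.40, 31.0

print("NOPAT:", nopat(ebit, tax_rate))
print("OCF:  ", operating_cash_flow(ebit, tax_rate, depreciation))
```

Interest does not appear anywhere in the calculation, which reflects the point that OCF measures the cash generated by the assets regardless of how they are financed.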

Free cash flow is defined as being the cash flow 
actually available for distribution to investors 
after the company has made all the investments 
in fixed assets and working capital necessary to 
sustain ongoing operations. To be more specific, 
the value of a company's operations depends 
on all the future expected FCFs, defined as OCF 
minus the amount of investment in working 
capital and fixed assets necessary to sustain the 
business. Thus, FCF represents the cash that is 
actually available for distribution to investors. 


Therefore, the way for managers to make their 
companies more valuable is to create a sustain¬ 
able increase in the firm's FCF. 

FCF = OCF - Change in NWC 
      - Gross investment in operating capital 

Let's illustrate this. Assume a firm has NOPAT 
of $170.3 million. Its OCF is NOPAT plus any 
noncash adjustments as shown on the statement 
of cash flows. Where depreciation is the only 
noncash charge, the operating cash flow is: 

OCF = NOPAT + Depreciation 

= $170.3 + $100 = $270.3 million 

Further, assume the firm had $1,455 million of 
operating assets, or operating capital, at the end 
of the year, but $1,800 million at the end of the 
next year. Therefore, during the year: 

Net investment in operating capital 
= $1,800 - $1,455 = $345 million 

However, the firm took $100 million of depreci¬ 
ation. We find the gross investment in operating 
capital as follows: 

Gross investment in operating capital 
= Net investment + Depreciation 
= $345 + $100 = $445 million 

FCF in the year is: 

FCF = OCF - Gross investment in operating capital 
= $270.3 - $445 = -$174.7 million (negative FCF) 

Even though the firm had a positive OCF, its 
very high investment in operating capital re¬ 
sulted in a negative FCF. Since FCF is what is 
available for distribution to investors, not only 
was there nothing for investors, but investors 
actually had to provide more money to the firm 
to keep the business going. 
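The worked example above can be reproduced step by step. The variable names are illustrative; all figures are the ones given in the text, in millions of dollars.

```python
# Inputs from the worked example ($ millions).
nopat = 170.3
depreciation = 100.0
operating_capital_start = 1455.0
operating_capital_end = 1800.0

# OCF = NOPAT + Depreciation (depreciation is the only noncash charge).
ocf = nopat + depreciation

# Net investment is the change in operating capital over the year.
net_investment = operating_capital_end - operating_capital_start

# Gross investment adds back depreciation, since worn-out assets were also replaced.
gross_investment = net_investment + depreciation

# FCF = OCF - Gross investment in operating capital.
fcf = ocf - gross_investment

print(f"OCF              = {ocf:8.1f}")
print(f"Net investment   = {net_investment:8.1f}")
print(f"Gross investment = {gross_investment:8.1f}")
print(f"FCF              = {fcf:8.1f}")
```

Note that depreciation cancels out: FCF also equals NOPAT minus net investment in operating capital ($170.3 - $345).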

Is a negative FCF always bad? It depends on 
why the FCF was negative. If FCF was negative 
because NOPAT was negative, this is a 
bad sign, because the company probably is 
experiencing operating problems. Exceptions to 
this might be start-up companies, or companies 
that are incurring significant current expenses 
to launch a new product line. Also, many 
high-growth companies have positive NOPAT but 
negative FCF due to new investment in operating 
assets needed to support growth. There is 
nothing wrong with profitable growth, but at 
some point in time FCF must turn positive in 
order for a firm to have value. We will see this 
later in a firm valuation example. 


Table 1 Free Cash Flow Statement: Indirect Method 

Net income (net earnings) 

+ Depreciation 
    Depreciation is a noncash expense, and therefore is added back to 
    calculate cash flows. 

- Increase in accounts receivable (A/R) 
    The increase in A/R represents sales that have not yet been collected, 
    and therefore did not produce a cash inflow. 

- Increase in inventories 
    The increase in inventory has not been recognized as part of cost of 
    goods sold (COGS) but was fully paid for, and therefore is deducted 
    from the cash flow. 

+ Increase in accounts payable (A/P) 
    The increase in A/P represents costs that have not yet been paid, and 
    therefore is added back to the cash flow. 

+ Increase in taxes payable 
    Like the increase in A/P, these taxes have not yet been paid. 

+ After-tax interest expense 
    We want to evaluate the operating side of the business and its financial 
    side separately. The interest payment is a financial expense, and 
    therefore we add back the "net interest cost." 

= Operating cash flow (OCF) 

- Gross investment in property, plant, and equipment (PP&E), at cost 
    Some of the cash from operations must be used to buy the assets, such 
    as equipment and plants that will allow the firm to generate future 
    income. This is cash that cannot be freely used to pay dividends, to 
    buy back shares, to repay loans, and the like, and therefore is 
    deducted from the OCF to arrive at the FCF. 

= Free cash flow (FCF) 
    This is the cash that the firm can use to distribute to any and all of its 
    suppliers of capital, such as stockholders, debt holders, and warrant 
    holders. 


USING THE CASH-FLOW 
STATEMENT TO ARRIVE AT 
OCF AND FCF 

As stated earlier, FCF is a concept that defines 
the amount of cash that the firm can distribute 
to security holders. There are two principal 
techniques to calculate the FCF: the indirect 
method and the direct method. Tables 1 and 2 
illustrate the indirect and the direct methods, 
respectively, of converting accounting earnings 
into FCFs. The indirect approach first converts 
the net income (NI) to OCF and then to FCF. The 
direct approach converts each item in the income 
statement to a cash basis. 


Table 2 Free Cash Flow Statement: Direct Method 

Sales 
    As recorded on the income statement. 

- COGS + SG&A 
    Cost of goods sold (COGS) + Selling, general and administrative 
    expenses (SG&A). 

- Increase in accounts receivable (A/R) 
    Credit sales are recorded as income but do not generate a cash inflow. 
    Thus, to adjust "sales" to cash basis, we deduct the increase in A/R. 

- Increase in inventory 
    Inventory was paid for and thus represents a cash drain. 

+ Increase in accounts payable (A/P) 
    A/P are expenses not yet paid. 

+ Depreciation 
    Depreciation is not a cash expense and is netted out. 

- Tax on operating income 
+ Increase in taxes payable 
    The difference between taxes on operating income and the increase in 
    taxes payable is the tax shield on interest, which we don't want to 
    include in the OCF. 

= Operating cash flow (OCF) 

- Gross investment in property, plant, and equipment (PP&E), at cost 

= Free cash flow (FCF) 

The indirect method of calculating cash flows 
starts with the firm's NI and makes appropriate 
adjustments to arrive at a number that shows 
how much cash the firm has taken in over the 
period. The adjustments that have to be made to 
NI are of two types—operational adjustments 
and financial adjustments. When a firm pays 
interest, net income is defined as 

NI = EBIT - Interest - Taxes 

NI = EBT - Taxes 

The following adjustments must be made in or¬ 
der to present the results of the business activity 
of the firm on a cash basis as explained later in 
this entry. 


Adjustments for Changes in Net 
Working Capital 

Adjustments for changes in net working capital 
(ΔNWC) are made because not all the sales 
are made in cash and because not all the firm's 
expenses are paid out in cash. The term and 
notation are somewhat misleading: Not all the 
firm's working capital items are operationally 
related; since we are interested in cash derived 
from the ongoing business activity of the firm, 
we ignore all other current items in our ΔNWC. 
Cash and marketable securities are the best example 
of working capital items that we exclude 
from our definition of ΔNWC, as they 
are the firm's stock of excess liquidity. Another 
working capital item that we exclude from the 
adjustment is notes payable or short-term bor¬ 
rowing. Since our aim in the FCF statement 
is to calculate the cash available to the firm 
from its business activities, we exclude from the 
FCF statement any cash flows relating to the 
firm's financing activities, whether short term or long 
term. 

Adjustments for Investment in New 
Fixed Assets 

When investment in these assets is necessary 
for the ongoing business activity of the firm, the 
cash so invested cannot be used to pay security 
holders and thus must be deducted to calculate the FCF. 

Adjustments for Depreciation and 
Other Noncash Expenses 

Although depreciation is an expense for tax 
and financial reporting purposes (thus lowering 
earnings before taxes [EBT] and hence profits 
after taxes [NI]), it is by itself not a cash 
expense. In the FCF statement, we thus add the 
depreciation back to NI. The remaining effect 
of depreciation and other noncash expenses on 
the FCF is the tax savings they entail. 

Financial Adjustments 

Financial adjustments are adjustments for fi¬ 
nancial items included in NI. Since FCF is a 
concept that relates to the ongoing business (as 
opposed to financial) activities of the firm, we 
want to neutralize financial items when con¬ 
verting NI into FCF. Thus, for example, al¬ 
though NI includes interest as an expense, we 
will add back the after-tax interest expenses to 
obtain the FCF. 
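The adjustments described above, which follow the layout of Table 1, can be sketched as a small calculation. The class and field names are illustrative, and the input figures are hypothetical rather than taken from any statement in this entry.

```python
from dataclasses import dataclass


@dataclass
class IndirectInputs:
    """Inputs for the indirect method (all in $ millions; names are illustrative)."""
    net_income: float
    depreciation: float
    increase_receivables: float
    increase_inventories: float
    increase_payables: float
    increase_taxes_payable: float
    interest_expense: float
    tax_rate: float
    gross_investment_ppe: float


def operating_cash_flow(x: IndirectInputs) -> float:
    # Financial adjustment: neutralize interest by adding back its after-tax cost.
    after_tax_interest = x.interest_expense * (1.0 - x.tax_rate)
    return (
        x.net_income
        + x.depreciation              # noncash expense
        - x.increase_receivables      # sales not yet collected
        - x.increase_inventories      # paid for but not yet expensed
        + x.increase_payables         # costs not yet paid
        + x.increase_taxes_payable    # taxes not yet paid
        + after_tax_interest
    )


def free_cash_flow(x: IndirectInputs) -> float:
    # Investment in fixed assets is not available to security holders.
    return operating_cash_flow(x) - x.gross_investment_ppe


# Hypothetical figures chosen for illustration only.
example = IndirectInputs(
    net_income=42.0, depreciation=31.0, increase_receivables=15.0,
    increase_inventories=30.0, increase_payables=3.0, increase_taxes_payable=0.0,
    interest_expense=15.0, tax_rate=0.40, gross_investment_ppe=62.0,
)
print("OCF:", operating_cash_flow(example))
print("FCF:", free_cash_flow(example))
```

Cash, marketable securities, and notes payable are deliberately absent from the inputs, in line with the exclusion of excess liquidity and financing items from the working-capital adjustment.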

The concept of FCF is of cash flows that are 
generated by the business activities of the firm 
and are available (that is, "free") for distribu¬ 
tion to all suppliers of capital, such as equity 
holders, bondholders, convertible holders, and 
preferred stockholders. The calculation of ac¬ 
counting earnings (net income), however, is 
done from the point of view of shareholders, 
which is only one group of capital suppliers. 

After calculating the FCFs, we consider their 
uses. The FCFs can be paid to any security 
holder of the firm, such as debt holders, 
stockholders, warrant holders, and convertible 
bondholders. 


Table 3 Cash Flow Statement 

Periodic payments 
    Interest 
    Preferred dividend 
    Regular dividend 
    And so on 

    These periodic payments to the capital suppliers of the firm are 
    after tax! (The free cash flows [FCFs] from which we pay these 
    financial flows are also after-tax cash flows!) 

Capital market transactions 
    Retirement of securities 
        Debt retirement 
        Preferred stock retirement 
        Share repurchase 
        And so on 
    New financing 
        New bank loans 
        New bond flotation 
        Stock sale 
        Exercise of warrants 
        And so on 

    These sums represent cash paid when old securities are retired or 
    represent cash received when new securities are floated 
    (privately or publicly). 

Change in cash = FCF - financial cash flows 

The cash flows paid to the security hold¬ 
ers are the financial cash flows, which include 
interest, dividends, principal repayment, share 
repurchases, and funds received upon the is¬ 
suance of new securities. Obviously, when the 
FCF is negative (e.g., because growth opportunities 
necessitate large investments in fixed 
assets), the financial cash flows must be a net 
inflow of funds, that is, net new financing (of, 
say, the needed investments). 

The difference between the funds generated 
by the firm's business, the FCF, and the funds 
distributed to the security holders of the firm, 
the financial cash flows (see Table 3), is the 
change in cash over the period. 
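The relationship between FCF, financial cash flows, and the change in cash can be sketched as follows. The function names and the figures are hypothetical; the sign convention (payments to security holders positive, new financing received negative) is an assumption made for this illustration.

```python
def net_financial_cash_flows(periodic_payments: float,
                             retirements: float,
                             new_financing: float) -> float:
    """Net cash distributed to security holders.

    Payments and retirements are cash out; new financing is cash in.
    A negative result means the firm raised more than it paid out.
    """
    return periodic_payments + retirements - new_financing


def change_in_cash(fcf: float, financial_cash_flows: float) -> float:
    """Change in cash = FCF - financial cash flows."""
    return fcf - financial_cash_flows


# Hypothetical year ($ millions): negative FCF funded by new financing.
fcf = -18.0
financial = net_financial_cash_flows(periodic_payments=16.0,
                                     retirements=0.0,
                                     new_financing=37.0)
print("Net financial cash flows:", financial)
print("Change in cash:          ", change_in_cash(fcf, financial))
```

With a negative FCF, the net financial cash flow must itself be negative (a net inflow from investors) for the cash balance not to fall.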

Thus, the bottom line of the cash flow state¬ 
ment is the closing link of the three accounting 
statements of financial performance: 

• The income statement's bottom line, retained 
earnings, feeds into the closing balance sheet 
as the increase in accumulated retained 
earnings. 

• The income statement and the beginning and 
closing balance sheets are the basis for the 
computation of the cash flow statements. 


• The last line of the cash flow statement— 
change in cash (and cash equivalents)—feeds 
back into the end-of-period balance sheet's 
cash account. 

The cross-reference of the three accounting 
statements means that we can use accounting 
methods to ensure that models of projected fi¬ 
nancial performance are internally consistent. 
The firm's income statement and its cash flow 
statement are often the basis for predictions of 
its future FCFs. Note, however, that these state¬ 
ments reflect the past performance of the firm 
and are not, in themselves, necessarily predic¬ 
tive of future firm performance. 


VALUING THE TOTAL FIRM 

Earlier we introduced several equations for 
valuing a firm's common stock. For example, 
review the constant growth dividend discount 
model and the nonconstant growth dividend 
discount model. These models (equations) have 
one common element: They all assume that the 
firm is currently paying a dividend. However, 
consider the situation of a start-up company 
formed to develop and market a new prod¬ 
uct. Such a company generally expects to have 
low sales during its first few years as it develops 
and begins to market its product. Then, if 
the product catches on, sales will grow rapidly 
for several years. Growing sales require addi¬ 
tional assets. A company cannot grow without 
increasing its assets. Moreover, asset growth 
must be financed by increasing a liability 
and/or equity account. 

Small firms can often obtain some bank credit, 
but they must maintain a reasonable balance be¬ 
tween debt and equity. Thus, additional bank 
borrowings require increases in equity, but 
small firms have limited access to the stock 
market. Moreover, even if they can sell stock, 
their owners are often reluctant to do so for 
fear of losing voting control. Therefore, the best 
source of equity for most small businesses is 
from retaining earnings, so most small firms 
pay no dividends during their rapid-growth 
years. Eventually, most successful firms do pay 
dividends, with dividends growing rapidly at 
first but then slowing down as the firm ap¬ 
proaches maturity. 

Although most larger firms do pay a divi¬ 
dend, some firms, even highly profitable ones, 
have never paid a dividend. How can the value 
of such a company be determined? Similarly, 
suppose you start a business and someone of¬ 
fers to buy it from you. How could you de¬ 
termine its value, or that of any privately held 
business? Alternatively, suppose you work for 
a company with a number of divisions. How 
could you determine the value of one partic¬ 
ular division that the company wants to sell? 
In none of these cases could you use the div¬ 
idend growth model. However, you could use 
the FCF model to estimate total firm value, then 
back out the value of equity. 

ESTIMATING TOTAL FIRM 
VALUE USING THE FCF 
MODEL 

Tables 4 and 5 contain the actual 20X8 and 
projected 20X9 to 20Y2 financial statements for 


XYZ Inc. The negative FCF in the early years 
is typical for young, high-growth companies. 
Even though NOPAT is positive in all years, 
FCF is negative because of the need to invest in 
operating assets. The negative FCF means the 
company will have to obtain new funds from in¬ 
vestors, and the balance sheets in Table 5 show 
that notes payable, long-term bonds, and pre¬ 
ferred stock all increase from 20X8 to 20X9. 

Assume that XYZ's cost of capital is 10.84%. 
To find its going-concern value, we use an 
approach similar to the nonconstant dividend 
growth model, proceeding as follows: 

1. Assume that the firm will experience non¬ 
constant growth for N years, after which it 
will grow at some constant rate. 

2. Calculate the expected FCF for each of the 
N nonconstant growth years, and find the 
present value (PV) of these cash flows. 

3. Recognize that after Year N growth will be 
constant, so we can use the constant growth 
formula to find the firm's value at Year N. 
This "terminal value" is the value of the PVs 
for N + 1 and all subsequent years (to in¬ 
finity), discounted back to Year N. Then, the 
Year N value must be discounted back to the 
present to find its PV at Year 0. 

4. Now sum all the PVs, those of the annual 
free cash flows during the nonconstant pe¬ 
riod plus the PV of the terminal value, to find 
the firm's value of operations. This going- 
concern value, when added to the value of 
the nonoperating assets, is the total value of 
the firm. 
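The four steps above can be sketched as a single function. The function name, the returned dictionary keys, and the input figures are hypothetical and serve only to show the mechanics; year-end cash flows and a constant growth rate after Year N are assumed.

```python
from typing import Dict, Sequence


def value_of_operations(fcfs: Sequence[float], k: float, g: float) -> Dict[str, float]:
    """Going-concern value from N years of nonconstant FCFs plus a terminal value.

    fcfs: expected FCFs for Years 1..N (Year N is the last nonconstant year)
    k:    cost of capital used as the discount rate
    g:    constant growth rate assumed after Year N
    """
    if not fcfs:
        raise ValueError("at least one forecast FCF is required")
    if k <= g:
        raise ValueError("cost of capital k must exceed long-run growth g")

    n = len(fcfs)
    # Step 2: present value of each nonconstant-period FCF.
    pv_fcfs = sum(fcf / (1.0 + k) ** t for t, fcf in enumerate(fcfs, start=1))
    # Step 3: constant growth formula gives the value at Year N, then discount to Year 0.
    terminal_value = fcfs[-1] * (1.0 + g) / (k - g)
    pv_terminal = terminal_value / (1.0 + k) ** n
    # Step 4: sum the pieces.
    return {
        "pv_fcfs": pv_fcfs,
        "terminal_value": terminal_value,
        "pv_terminal": pv_terminal,
        "value_of_operations": pv_fcfs + pv_terminal,
    }


# Hypothetical inputs ($ millions), for illustration only.
result = value_of_operations(fcfs=[10.0, 12.0, 15.0], k=0.10, g=0.04)
nonoperating_assets = 5.0
total_firm_value = result["value_of_operations"] + nonoperating_assets

for name, value in result.items():
    print(f"{name:20s} {value:10.2f}")
print(f"{'total_firm_value':20s} {total_firm_value:10.2f}")
```

The guard on k and g matters in practice: the constant growth formula is undefined when growth equals the discount rate and meaningless when it exceeds it.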

Stockholders will also help fund XYZ's 
growth. They will receive no dividends until 
20Y1, so all of the net income from 20X8 to 20Y0 
will be reinvested. However, as growth slows, 
FCF will become positive, and XYZ plans to use 
some of its FCF to pay dividends beginning in 
20Y1. A variant of the constant growth dividend 
model can be used to find the value of XYZ's 
operations once its FCFs stabilize and begin to 
grow at a constant rate: 
Table 4 XYZ Inc.: Income Statements (in millions except for per-share data) 

                                                Actual  Projected 
                                                  20X8      20X9      20Y0      20Y1      20Y2 
Net sales                                      $700.00   $850.00    $1,000    $1,100    $1,155 
Costs (except depreciation)                      (599)     (734)     (911)     (935)     (982) 
Depreciation                                      (28)      (31)      (34)      (36)      (38) 
Total operating costs                            (627)     (765)     (945)     (971)   (1,020) 
Earnings before interest and taxes (EBIT)           73        85        55       129       135 
Less "net interest"                               (13)      (15)      (16)      (17)      (19) 
Earnings before taxes                               60        70        39       112       116 
Taxes (40%)                                       (24)      (28)    (15.6)    (44.8)    (46.4) 
Net income before preferred dividends               36        42      23.4      67.2      69.6 
Preferred dividends                                (6)       (7)     (7.4)       (8)     (8.3) 
Net income available for common dividends           30        35        16      59.2      61.3 
Common dividends                                     -         -         -      44.2      45.3 
Addition to retained earnings                       30        35        16        15        16 
Number of shares                                   100       100       100       100       100 
Dividends per share                                  -         -         -     0.442     0.453 

Notes: 


1. "Net interest" is interest paid on debt less interest earned on marketable securities. Both items could be shown 
separately on the income statements, but for this example we combine them and show net interest. 

2. Net income is projected to decline in 20Y0. This is due to a projected cost for a one-time marketing program in 
that year. 

3. Growth has been rapid in the past, but the market is becoming saturated, so the sales growth rate is expected to 
decline from 21% in 20X9 to a sustainable rate of 5% in 20Y2 and beyond (forever). Further, the entire economy 
has seldom grown more than a 4% to 6% rate on an average annual basis. If one firm were to grow faster than 6% 
forever, it would most likely become the only firm in the economy! Therefore, a 5% growth rate beyond year 
20Y2 is a reasonable assumption. Firms cannot grow faster than the overall economy forever. Growth must slow 
down at some point in the future to a more sustainable average rate. 

4. Profit margins are expected to improve as the production process becomes more efficient and because XYZ will 
no longer be incurring marketing costs associated with the introduction of a major product. 

5. All items on the financial statements are projected to grow at a 5% rate after the year 20Y2. Notice that the 
company does not pay a dividend, but it is expected to start paying out about 75% of its earnings beginning in 
20Y1. 

6. A firm's value is determined by its ability to generate cash flow, both now and in the future. Therefore, XYZ's 
value can be calculated as the present value of its expected future FCFs from operations, discounted at its cost of 
capital, k, plus the value of nonoperating assets. The first term is the value of operations, or the firm's 
value as a going concern: 

Total value of the firm = Present value of expected future FCF + Value of nonoperating assets 


Based on a 10.84% cost of capital, a $49 mil¬ 
lion FCF in 20Y2, and a 5% growth rate, the 
value of XYZ's operations as of December 31, 
20Y2 (terminal value) is forecasted to be $880.99 
million: 


Terminal value = $49(1 + 0.05) / (0.1084 - 0.05) 
               = $51.45 / 0.0584 
               = $880.99 million 


This $880.99 million figure is called the com¬ 
pany's terminal or horizon value, because it 
is the value at the end of the forecast period. 
Moreover, this is the amount that XYZ could 
expect to receive if it sold its operating assets 
on December 31, 20Y2. 
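The terminal value calculation can be reproduced directly. The function name is illustrative; the inputs are the 20Y2 FCF, growth rate, and cost of capital stated in the text.

```python
def terminal_value(last_fcf: float, g: float, k: float) -> float:
    """Constant growth value at the horizon: FCF_N x (1 + g) / (k - g)."""
    if k <= g:
        raise ValueError("cost of capital k must exceed growth rate g")
    return last_fcf * (1.0 + g) / (k - g)


fcf_20y2 = 49.0   # $ millions, FCF in 20Y2
g = 0.05          # constant growth after 20Y2
k = 0.1084        # cost of capital

next_year_fcf = fcf_20y2 * (1.0 + g)
tv = terminal_value(fcf_20y2, g, k)
print(f"FCF expected in the year after 20Y2: ${next_year_fcf:.2f} million")
print(f"Terminal value at 12/31/20Y2:        ${tv:.2f} million")
```

The result is sensitive to the spread k - g: with only 5.84 percentage points in the denominator, a small change in either assumption moves the terminal value substantially.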

Table 5 XYZ Inc.: Balance Sheets (millions of dollars) 

                                Actual Projected 
                                  20X8    20X9    20Y0    20Y1    20Y2 
Cash                               $17     $20     $22     $23     $24 
Marketable securities (1)           63      70      80      84      88 
Accounts receivable                 85     100     110     116     121 
Inventories                        170     200     220     231     243 
Total current assets               335     390     432     454     476 
Net plant and equipment            279     310     341     358     376 
Total assets                       614     700     773     812     852 

Liabilities and Equity 
Accounts payable                    17      20      22      23      24 
Notes payable                      123     140     160     168     176 
Accruals                            43      50      55      58      61 
Total current liabilities          183     210     237     249     261 
Long-term bonds                    124     140     160     168     176 
Preferred stock                     62      70      80      84      88 
Common stock (2)                   200     200     200     200     200 
Retained earnings                   45      80      96     111     127 
Common equity                      245     280     296     311     327 
Total liabilities and equity       614     700     773     812     852 

Notes: 

1. All assets except marketable securities are operating assets required to support sales. The marketable securities 
are financial assets not required in operations. 

2. Common equity is shown at par plus paid-in capital. 


Table 6 shows the free cash flow for each year 
during the nonconstant growth period, along 
with the value of operations in 20Y2, at the end 
of the nonconstant growth period. To find the 
value of operations as of "today," December 31, 
20X8, we find the PV of each annual cash flow 
in Table 7, discounting at the 10.84% cost of 
capital. 

The sum of the PVs (all FCFs and the terminal 
value discounted at 10.84%) is approximately 
$615.27 million. This $615.27 million represents an estimate 
of the price XYZ could expect to receive if it sold 
its operating assets today, December 31, 20X8. 
The total value of any company is the value of 
its operations plus the value of its nonoperat¬ 
ing assets. As the December 31, 20X8, balance 
sheet in Table 5 shows, XYZ had $63 million of 
marketable securities on that date. Unlike op¬ 
erating assets, we do not have to calculate a 
present value for marketable securities because 
short-term financial assets as reported on the 
balance sheet are at, or close to, their market 
value. 
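The discounting can be checked with a few lines. The variable names are illustrative; the cash flows are the "Total" row of Table 7 (the 20Y2 entry combines that year's FCF with the terminal value), and the marketable securities figure comes from Table 5.

```python
k = 0.1084  # cost of capital

# Total row of Table 7, $ millions, for 12/31/X9 through 12/31/Y2.
# The last entry is the 20Y2 FCF of 49.00 plus the terminal value of 880.99.
total_cash_flows = [-18.00, -23.00, 46.40, 49.00 + 880.99]

# Discount each year-end cash flow back to 12/31/X8 (t = 1, 2, 3, 4).
present_values = [cf / (1.0 + k) ** t for t, cf in enumerate(total_cash_flows, start=1)]
value_of_operations = sum(present_values)

marketable_securities = 63.0  # nonoperating assets at 12/31/X8, taken at book value
total_firm_value = value_of_operations + marketable_securities

for year, pv in zip(["20X9", "20Y0", "20Y1", "20Y2"], present_values):
    print(f"PV of {year} cash flow: {pv:9.2f}")
print(f"Value of operations:   {value_of_operations:9.2f}")
print(f"Total firm value:      {total_firm_value:9.2f}")
```

Nearly all of the value of operations comes from the discounted terminal value, which is typical when the early FCFs are negative.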

Therefore, XYZ's total value on December 31, 
20X8, is $615.27 + $63.00 = $678.27 million. 
If the company's total value on December 31, 
20X8, is $678.27 million, what is the value of its 
common equity? 

First, Table 5 shows that notes payable and 
long-term debt total $123 + $124 = $247 mil¬ 
lion, and these securities have the first claim 
on assets and income. (Accounts payable and 
accruals were netted out earlier.) Next, the pre¬ 
ferred stock has a claim of $62 million, and it 
ranks above the common. 

Therefore, the value left for common 
stockholders is $678.27 - $247 - $62 = 
$369.27 million. 

Table 8 summarizes the calculations used to 
find XYZ's stock value per share. There are 
100 million shares outstanding, and their total 
value is $369.27 million. Therefore, the value of 
a single share is $3.69 ($369.27/100 = $3.69). 
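The step from total firm value to a per-share value is a simple waterfall of claims. The variable names below are illustrative; the figures are those given in the text and in Table 5.

```python
# Figures in $ millions except share count (millions) and per-share value.
total_firm_value = 678.27          # value of operations plus marketable securities
notes_payable = 123.0
long_term_bonds = 124.0
preferred_stock = 62.0
shares_outstanding = 100.0

# Debt has the first claim, preferred stock ranks next, common gets the residual.
debt = notes_payable + long_term_bonds
common_equity_value = total_firm_value - debt - preferred_stock
value_per_share = common_equity_value / shares_outstanding

print(f"Value of debt:          {debt:8.2f}")
print(f"Value of common equity: {common_equity_value:8.2f}")
print(f"Value per share:        {value_per_share:8.2f}")
```

Accounts payable and accruals do not appear here because they were already netted out when computing net operating working capital.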

Much can be learned from the total firm valu¬ 
ation model, so many analysts today use it for 
all types of valuations. The process of project¬ 
ing the future financial statements can reveal 
quite a bit about the company's operations and 
Table 6 Calculating XYZ's Pro Forma Expected Free Cash Flow (in millions) 

                                                   Actual Projected 
                                                     20X8     20X9     20Y0     20Y1     20Y2 
Calculation of free cash flow 
Required net operating working capital               $212     $250     $275     $289     $303 
Required net plant and equipment                      279      310      341      358      376 
Required net operating assets                        $491     $560     $616     $647     $679 
Required net new investment in operating assets 
  = change in net operating assets from 
  previous year                                                 69       56       31       32 
NOPAT (net operating profit after taxes) 
  = EBIT x (1 - Tax rate)                                      $51      $33   $77.40      $81 
Less: Required investment in operating assets                   69       56       31       32 
Free cash flow (FCF)                                         ($18)    ($23)   $46.40      $49 

Notes: 


1. NOPAT declines in 20Y0 because of a marketing expenditure projected for that year. 

2. Table 6 calculates free cash flow for each year. Line 1, with data for 20X8 from the balance sheets in Table 5, shows 
the required net operating working capital, or operating current assets minus operating current liabilities, for 
20X8: 

Required net operating working capital = (Cash + Accounts receivable + Inventories) 
- (Accounts payable + Accruals) 
= ($17.00 + $85.00 + $170.00) - ($17.00 + $43.00) = $212.00. 

3. Line 2 shows required net plant and equipment, and Line 3, which is the sum of Lines 1 and 2, shows the 
required net operating assets, sometimes called net operating capital. For 20X8, net operating capital is $212 + 
$279 = $491 million. 

4. Line 4 shows the required net annual addition to operating assets, found as the change in net operating assets 
from the previous year. For 20X9, the required net investment in operating assets is $560 — $491 = $69 million. 

5. Line 5 shows NOPAT, or net operating profit after taxes. Note that EBIT is operating earnings before taxes, while 
NOPAT is operating earnings after taxes. Therefore, NOPAT = EBIT (I — T). With 20X9 EBIT of $85 as shown in 
Table 5 and a tax rate of 40%, NOPAT as projected for 20X9 is $51 million: 

NOPAT = EBIT(1 - T) = $85(1.0 - 0.4) = $51 million. 

6. Although XYZ's operating assets are projected to produce $51 million of after-tax profits in 20X9, the company 
must invest $69 million in new assets in 20X9. Therefore, the FCF for 20X9, shown on Line 7, is a negative $18 
million: 

FCF in 20X9 = $51 — $69 = — $18.00 million (negative) 

Present value of nonoperating assets 


Table 7 Process for Finding the Value of Operations Assumes g = 5% (constant) for Years 12/31/Y2 in Perpetuity 


Year 

12/31/X8 

12/31/X9 

12/31/YO 

12/31/Y1 

12/31/Y2 

FCF 

Terminal value (TV) 


(18.00) 

(23.00) 

46.40 

49.00 

880.99 

Total 

Present value of FCF and TV 
@10.84% = $615.27 
$615.27 = Value of operating 
assets as of 12/31/X8 


(18.00) 

(23.00) 

46.40 

929.99 
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Table 8 Finding the Value of XYZ's Stock (in millions 
except for per-share data) 


1. Value of operations (net of payables 

$615.27 

and accruals) 


2. Plus value of nonoperating assets 

$63.00 

3. Total market value of the firm 

$678.27 

4. Less: Value of debt 

$247.00 

Value of preferred stock 

$62.00 

5. Value of common equity 

$369.27 

6. Divide by number of shares 

100 

7. Estimated value per share 

$3.69 


financing needs. Also, such an analysis can pro¬ 
vide insights into actions that might be taken to 
increase the company's value. 
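
The chain of calculations in Tables 6 through 8 can be traced in a few lines of Python. The following is only a sketch: the inputs are the figures printed in the tables, the variable names are illustrative assumptions, and the terminal value uses the constant-growth formula implied by Table 7 (g = 5%, discount rate 10.84%).

```python
# Inputs copied from Tables 6-8 (in $ millions); names are illustrative.
net_operating_assets = [491.0, 560.0, 616.0, 647.0, 679.0]  # 20X8 .. 20Y2
nopat = [51.0, 33.0, 77.4, 81.0]                            # 20X9 .. 20Y2
discount_rate = 0.1084   # rate used in Table 7
g = 0.05                 # constant growth assumed after 20Y2

# Net new investment = change in net operating assets from the prior year.
investment = [curr - prev for prev, curr in
              zip(net_operating_assets, net_operating_assets[1:])]

# Free cash flow = NOPAT - net new investment in operating assets.
fcf = [n - i for n, i in zip(nopat, investment)]

# Terminal value at 12/31/Y2 under constant growth.
terminal_value = fcf[-1] * (1 + g) / (discount_rate - g)

# The terminal value is added to the final year's FCF before discounting.
cash_flows = fcf[:-1] + [fcf[-1] + terminal_value]
value_of_operations = sum(cf / (1 + discount_rate) ** t
                          for t, cf in enumerate(cash_flows, start=1))

# Table 8: from value of operations to value per share.
nonoperating_assets = 63.0
debt = 247.0
preferred_stock = 62.0
shares_outstanding = 100.0  # millions of shares

total_firm_value = value_of_operations + nonoperating_assets
common_equity = total_firm_value - debt - preferred_stock
value_per_share = common_equity / shares_outstanding

print("FCF by year:", [round(x, 2) for x in fcf])
print("Terminal value:", round(terminal_value, 2))
print("Value of operations:", round(value_of_operations, 2))
print("Value per share:", round(value_per_share, 2))
```

The printed figures should agree with the tables to within rounding. Changing `g` or `discount_rate` shows how sensitive the per-share value is to the terminal value assumptions.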


KEY POINTS 

• The two most commonly used approaches for 
equity valuation are the discounted cash flow 
and relative valuation models. 

• Despite the fact that equity valuation is very 
strongly tilted toward the use of discounted 
cash flow models, it is impossible to ignore 
the fact that many financial modelers employ 
relative valuation techniques. 

• Expected future cash flow is the true basis
for financial value. Take the firms that look
attractive based on "fundamentals" and attempt
to estimate their current fair value based on
the present value of all expected future cash
flows (dividends and future selling price).

• The basic source of estimation risk when using
discounted cash flow models in calculating the
value of any financial asset is that the present
value depends on expected future cash flows and
the appropriate discount rates that reflect the
risk of the future cash flows. Cash flow
valuation models, therefore, rely on assumptions
(often extreme).

• With cash flow valuation, the main problem
is estimation risk. No financial modeler can
correctly and consistently forecast the future.
Estimation risk comes from not being able to
perfectly forecast future cash flows and
discount rates.

Abstract: Relative valuation methods use multiples or ratios, such as price/earnings, price/book, or
price/free cash flow, to determine whether a particular firm is trading at higher or lower multiples 
than its peers. Such methods require the user to choose a suitable universe of firms that are more 
or less comparable, though this can become difficult for firms with unusual characteristics in terms 
of product mix or geographical exposure. Relative valuation methods can be useful for portfolio 
managers who expect to be fully invested at all times, as they provide a practical tool for attempting 
to capture the "value premium" by which firms trading at lower multiples tend to outperform those 
trading at higher multiples. Implicitly, relative valuation methods assume that the average multiple 
across the universe of firms can be treated as a reasonable approximation of "fair value" for those 
firms; this may be problematic during periods of market panic or euphoria. 


Much research in corporate finance and similar
academic disciplines is tilted toward the use of
discounted cash flow (DCF) methods. However, many
analysts also make use of relative valuation
methods, which compare several firms by using
multiples or ratios. Multiples that are commonly
used for such purposes include price/earnings,
price/book, and price/free cash flow.


Relative valuation methods implicitly assume
that "similar" firms are likely to be valued
similarly by investors. Therefore, on average,
we would expect that firms that are generally
comparable are likely to trade at similar
multiples, in terms of price/earnings, price/book,
or various other metrics. If this assumption
is approximately correct, then relative valuation
methods can be used to identify firms that look
"cheap" or "expensive" relative to their peers.
When a particular firm's multiples are extremely
different from the rest of the universe, this may
indicate a potential investment opportunity, though
further analysis will likely be required to
determine whether there are reasons why such a firm
is valued differently from other companies that
otherwise appear comparable.

*The material discussed here does not necessarily represent the opinions, methods, or views of Delaware
Investments.

The basis of relative valuation methods is to 
use one or several ratios to determine whether a 
firm looks "cheap" or "expensive" by compar¬ 
ison with generally similar firms. Relative val¬ 
uation methods do not attempt to explain why 
a particular firm is trading at a particular price; 
instead, they seek to measure how the market 
is currently valuing multiple companies, with 
the underlying assumption that the average 
multiple for a group of companies is probably 
a reasonable approximation to overall market 
sentiment toward that particular industry. In 
other words, relative valuation work assumes 
that on average, the share prices of companies 
in a particular universe are likely to trade at 
similar multiples relative to their own financial 
or operating performance. Baker and Ruback 
(1999) provide a more formal presentation of 
these concepts. However, it is important to re¬ 
alize that at any particular time, some firms are 
likely to be trading at higher or lower multiples 
than would be justified under "fair value." 

Making effective use of relative valuation 
methods does require careful selection of "sim¬ 
ilar" companies. Sometimes this is relatively 
simple, for instance when an analyst is deal¬ 
ing with industries where there are a large 
number of roughly homogeneous firms provid¬ 
ing goods or services that are approximately 
equivalent. However, sometimes there can be 
considerable difficulties in identifying "simi¬ 
lar" companies, particularly if the firms un¬ 
der consideration are unusually idiosyncratic in 
terms of their product mix, geographical focus, 
or market position. In this entry, we will provide
some tentative guidance about how to build a
universe of comparable companies. However,
ultimately this part of the process will depend on
the skill and knowledge of the individual analyst;
two different experts may pick different sets of
"similar" firms, and thus generate different values
from their relative valuation analysis.

BASIC PRINCIPLES OF 
RELATIVE VALUATION 

Analysis based on relative valuation requires 
the analyst to choose a suitable universe of com¬ 
panies that are more or less comparable with 
one another. There is no standardized approach 
concerning how to choose such a universe of 
similar firms, and the process relies to some 
extent on an analyst's personal judgment con¬ 
cerning the particular industry and geography 
involved. However, it is possible to lay out some 
general principles that combine practitioners' 
insights with the results of academic inquiry. 

Sources of Data 

Relative valuation approaches can only be em¬ 
ployed if there is sufficient information, pro¬ 
duced on an approximately consistent foot¬ 
ing, about the various companies that are the 
subjects of analysis. In most countries, compa¬ 
nies that are publicly listed on stock exchanges 
are required by law and regulation to report 
their historical results publicly in a timely man¬ 
ner, or risk being delisted from the exchange. 
(There may be occasional exceptions to this gen¬ 
eral pattern, particularly for entities that are 
majority owned or controlled by their home 
country government. But such anomalies are 
not frequently observed except during crisis 
periods.) Consequently, it is almost always pos¬ 
sible to obtain information about listed com¬ 
panies' historical results. However, multiples 
based solely on historical data may not pro¬ 
vide a complete picture, as most analysts would 
probably agree that forward-looking estimates 
are likely to provide more useful insights into 
the market's opinion of a particular company 
(Valentine, 2011, p. 261). 

Investment banks, rating agencies, and other
firms can provide estimates of a firm's future
earnings, revenues, and other metrics, typically
over the next two or three years. Various
data providers such as Bloomberg or Thomson
Reuters collect such information and use it as
the basis for "consensus" estimates, which can
be viewed as representing the market's general
opinion of a company's future prospects. It is
also possible to use a firm's own in-house
estimates for the companies under coverage, as
these may incorporate insights that are not yet
reflected in current pricing. However, for
precisely this reason, in-house estimates should be
used as a supplement rather than as a replacement
for consensus figures.

It is conventional to consider more than one 
year of data, as there may be disparities in how 
the market is valuing results in the immediate 
future and in the slightly longer term. However, 
it is often difficult or impossible to obtain con¬ 
sensus estimates more than two or three years 
into the future. Consequently, relative valua¬ 
tion approaches generally focus on relatively 
short periods into the future, rather than seek¬ 
ing to gauge how the market is valuing ex¬ 
pected performance five or ten years hence. 
(In this respect, relative valuation analysis can 
be viewed as somewhat limited by compari¬ 
son with DCF approaches, which typically give 
considerably more attention to the relatively 
distant future.) 

Number of Comparable Firms 

In general, an analyst would like to use data 
from other firms that are as similar as possi¬ 
ble. However, if the criteria for "similarity" are 
specified too stringently, then there may be too 
few firms included in the universe. And if the 
sample is too small, then the idiosyncrasies of 
individual firms may exert an excessive influ¬ 
ence on the average multiple, even if the ana¬ 
lyst focuses on the median rather than the mean 
when calculating the "average" multiple. 
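
The effect of a single idiosyncratic firm on a small peer group can be illustrated with a short sketch. The multiples below are hypothetical and chosen only for illustration; the point is that in a small universe one outlier moves the mean substantially and still nudges the median.

```python
from statistics import mean, median

# Hypothetical P/E multiples for five comparable firms (illustrative only).
peer_pe = [11.5, 12.0, 12.5, 13.0, 13.5]

# The same universe with one idiosyncratic outlier added.
peer_pe_with_outlier = peer_pe + [40.0]


def summarize(multiples):
    """Return (mean, median) for a list of valuation multiples."""
    return mean(multiples), median(multiples)


base_mean, base_median = summarize(peer_pe)
out_mean, out_median = summarize(peer_pe_with_outlier)

print(f"without outlier: mean={base_mean:.2f}, median={base_median:.2f}")
print(f"with outlier:    mean={out_mean:.2f}, median={out_median:.2f}")
print(f"shift in mean:   {out_mean - base_mean:.2f}")
print(f"shift in median: {out_median - base_median:.2f}")
```

With only five or six firms, the median is more stable than the mean but is not immune to the composition of the sample.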

Generally speaking, we believe that it is
desirable to have at least five or six comparable
companies, in order to begin drawing conclusions
about relative valuation for a particular
industry. Conversely, there may be few benefits from
considering more than 12 companies, particularly
if the larger universe contains firms that
resemble less closely the particular company
that is the focus of the analyst's attention.¹ For
most practical purposes, a group of between six
and 12 comparable firms should be sufficiently
large to produce usable results.

Basis for Selecting Comparable 
Firms 

In an ideal situation, a universe of comparable 
companies would be similar in terms of size, 
industry focus, and geography. This tends to 
be easier when considering small or mid-sized 
firms—say, with market caps between $100 mil¬ 
lion and $10 billion (based on 2010 U.S. dollars). 
Firms that are below this size limit, in other 
words microcap stocks, may be more difficult 
to use for relative valuation purposes. Even if 
these firms are public, they may receive less 
coverage from research analysts, who typically 
are more interested in companies that are large, 
liquid, and already owned by institutional in¬ 
vestors (see Bhushan, 1989). 

Conversely, it can also be difficult to perform 
relative value analysis on companies with rel¬ 
atively high market capitalization. Many large 
firms are dominant players in their particular 
market niches, in which case they may be more 
likely to trade at a premium reflecting their 
higher degree of market power. Alternatively, 
large firms may be effectively a conglomerate 
of numerous smaller entities, each engaged in a 
specific activity, and there may be no other large 
or small firm that produces an approximately 
equivalent blend of goods and/or services. 

When attempting to assess the relative value
of firms that are large and/or complex, it can
often be useful to assess "relative value" using
two separate approaches. The first approach is
to consider the firm as a complete entity and
try to find other firms that are at least somewhat
comparable in terms of size and complexity,
even if their business mix is not precisely
identical. In such cases, it can often be useful
to consider similar firms that may be located
in other countries, even though their different
geographical positioning may affect their level
of risk and thus the multiples at which they
trade. The second approach is to use a
sum-of-the-parts valuation method, which will be
discussed in more detail later in this entry.

Geography and Clientele 

Differences related to geographic location can 
affect the extent to which companies can be 
viewed as broadly similar. For instance, in the 
United States public utilities are predominantly 
regulated at the state level, and the public util¬ 
ity commissions in one state may operate quite 
differently from their counterparts elsewhere. 
Consequently, a public utility operating in one 
state may not be directly comparable with a 
public utility located in another state. In re¬ 
cent decades, there has been a wave of acquisi¬ 
tion activity in the U.S. utility industry, so that 
now some utilities have operations in multiple 
states. In such instances, the valuation placed on 
a utility will presumably incorporate investors' 
perceptions of the regulatory environment af¬ 
fecting each of its state-level operations. For rel¬ 
ative value purposes, a group of multistate pub¬ 
lic utilities may not be very similar to a public 
utility that is operating in only one state. 

Regional differences in regulatory regimes
may only affect a subset of companies. However,
firms in the same industry may well have
quite different client bases and geographic
exposures. For instance, one retailer may aim to
sell a wide range of goods to a mass-market
client base at the regional or national level,
while another retailer might instead focus on
selling a limited number of luxury products to
the most affluent members of the global
population. These two firms are likely to have
substantially different product quality, cost bases,
profit margins, and sensitivity to macroeconomic
conditions. In particular, retailers of luxury goods
to a global client base may have developed
brands that transcend national borders, and
a high proportion of their current and future
revenues and profits may come from outside
their home country. Under such conditions, it is
possible that a suitable universe of comparable
companies might include at least a few foreign
firms, particularly if they have similarly broad
geographic reach.

In past decades, analysts focusing on U.S.
firms would probably have only rarely used
foreign firms in their analysis of "comparable
companies." However, as both U.S. and foreign
firms have become increasingly globalized, and
as accounting standards around the world have
gradually started to become more similar, we
believe that for some types of relative value
analysis, there may be benefits to including
firms that are generally comparable in terms of
size and product mix, even if their legal
headquarters are not located in the United States. For
more insights into these issues, see Copeland,
Koller, and Murrin (2000, Chapter 18).

Many companies have "depositary receipts" 
in other markets, such as ADRs. Consensus esti¬ 
mates may be available for a firm's local results 
and/or its depositary receipts. The estimates 
for the depositary receipts may be affected by 
actual or expected movements between the cur¬ 
rencies of the two countries, which may bias the 
analysis. We therefore recommend that when 
calculating figures for companies that are listed 
in different countries, all multiples should be 
consistently calculated in terms of local cur¬ 
rency throughout, in order to ensure that an¬ 
ticipated or historical currency fluctuations will 
not affect the results. A substantial number of 
non-US companies have a share price quoted in 
one currency, but report their financial results in 
another currency; to avoid potential mismatch- 
related errors in such cases, it may be prudent 
to convert all numbers into a single numeraire 
such as the US dollar. 

Sector and Industry Characteristics 

Some academic research has examined different
ways of selecting a universe of comparable
firms. Bhojraj, Lee, and Oler (2003) compared
the effect of using four different industry
classification methods, and concluded that at
least for a universe of U.S. securities, the Global
Industry Classification Standard (jointly
developed and maintained by Standard & Poor's
and Morgan Stanley Capital International)
appeared to do the best job of identifying firms
with similar valuation multiples and stock price
movements. Chan, Lakonishok, and Swaminathan
(2007) compared the effect of using industry
classification schemes with statistically
based clustering approaches, and found that
examining stocks in terms of industry membership
seemed to give better explanatory power
than working in terms of either sectors or
subindustries. To our knowledge, there have
not been any parallel investigations into the
effectiveness of different industry classification
schemes for cross-national analysis. The results
of Phylaktis and Xia (2006) suggest that the
importance of sector-level effects has been
increasing in recent years, while the influence of
country-level effects has waned slightly.

Technology and Intraindustry 
Diversity 

As discussed above, some academic research 
has suggested that firms from similar indus¬ 
tries tend to trade at similar multiples and to 
experience similar stock price movements. In¬ 
dustry membership therefore would seem to 
be a useful starting point for analysis. Thus, for 
instance, trucking companies and railroad com¬ 
panies both provide transportation services, but 
railroads will generally trade at different multi¬ 
ples from trucking companies because their cost 
structure and balance sheets tend to be quite 
different. 

In some cases, there can be substantial vari¬ 
ation even within a particular subindustry. For 
instance, "publishing" covers a wide variety 
of different business models, including daily 
newspapers, weekly magazines, publishers of 
textbooks and professional journals, printers 
of fiction or nonfiction books, and suppliers 
of financial data. Each of these individual
industries is likely to have different sources of
revenue, different technological requirements,
different cost structures, and different rates of
expected growth. Admittedly, the larger
publishing houses may have operations spanning
several different fields, but the relative contri¬ 
butions of each division to the firm's overall 
revenues and profits may differ substantially. 
In such instances, relative value analysis may 
result in a wide range of valuation multiples, 
possibly with several different clusters reflect¬ 
ing each firm's competitive position. We con¬ 
sider such difficulties in the next section. 

There are also some industries in which tech¬ 
nological differences are the principal basis on 
which relative values are assigned. For instance, 
small companies in the field of biotechnology 
may have only a handful of products, each of 
which could potentially be a great success or 
a dismal failure. Some companies of this type 
may be still at the prerevenue stage when they 
go public, so that their valuation is entirely 
based on the market's expectations about the ul¬ 
timate value of technology that has not yet gen¬ 
erated actual sales. In such instances, relative 
value analysis might require particularly care¬ 
ful selection of companies that are truly compa¬ 
rable in terms of the market's perception of their 
stage of development and the likelihood that 
their key products will ultimately be success¬ 
ful. Arguably, relative value analysis in such 
cases may not generate particularly useful re¬ 
sults, because the spread of potential outcomes 
is so broad. 

Bimodal and Multimodal Patterns 

Sometimes the outcome of a relative value
analysis will show that the valuation multiples are
not evenly spread between low and high, but
instead are bimodal or multimodal; in other
words, there seem to be two or more clusters
of results. We show an example of this in our
hypothetical example below, which suggests
that in a universe of seven firms, two are
expected to achieve a return on equity (ROE) of
11% to 12% in FY0 and FY1, whereas the other
companies are generally projected to deliver an
ROE of 8% to 9%. Such differences may appear
relatively minor, but if the market really does
expect these outcomes, then the two companies
with higher profitability may legitimately be
expected to trade at a premium to their peers.

When a relative valuation table appears to 
have bimodal or multimodal characteristics, an 
analyst will generally be well advised to inves¬ 
tigate further. In any given sector or industry, 
there may well be some firms that are truly ca¬ 
pable of producing higher returns than their 
peers, perhaps as a result of better manage¬ 
ment, a stronger market position, or a more 
supportive regulatory environment. Relative 
valuation methods can identify potential out¬ 
liers of this type, but cannot test whether the 
estimates themselves are reasonable. 

One potentially useful approach is to extend
the analysis further back into the past, using
historical prices for valuation purposes, and
if possible also using as-was projections for
the relevant period. Such projections are now
widely available from various data vendors,
including Bloomberg, FactSet, and Thomson
Reuters. Consider the companies that are
currently trading at a premium or a discount
to their peers: did they also trade at a premium
or a discount to their peers in the past? A
logical extension of relative value analysis
based on a single period is to gauge whether a
particular firm persistently tends to trade at a
lower or higher multiple than its peers, and then
assess whether its current multiple is above or
below what would be expected on the basis of
prior periods. Damodaran (2006, Chapter 7, p. 244)
notes that relative valuations frequently have low
persistence over time. For industries in which
this is the case, relative valuation methods may
indeed provide useful investment signals.
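
One way to check whether a firm's premium or discount has been persistent is to compute its multiple relative to the peer median in each prior period and compare the current reading with that history. The sketch below assumes hypothetical year-end P/E figures; the period labels, the numbers, and the `premium` helper are illustrative and not taken from any data vendor.

```python
from statistics import mean, median

# Hypothetical year-end P/E multiples; every figure here is illustrative.
firm_pe = {"Y-3": 13.2, "Y-2": 13.8, "Y-1": 13.5, "Y0": 12.0}
peer_pe = {
    "Y-3": [11.0, 12.0, 12.5, 13.0],
    "Y-2": [11.5, 12.5, 13.0, 13.5],
    "Y-1": [11.5, 12.0, 13.0, 13.5],
    "Y0": [11.5, 12.0, 12.5, 13.5],
}


def premium(firm_multiple, peer_multiples):
    """Firm multiple relative to the peer median, as a fraction."""
    return firm_multiple / median(peer_multiples) - 1.0


# Premium (positive) or discount (negative) in each period.
history = {period: premium(firm_pe[period], peer_pe[period])
           for period in firm_pe}

# Average premium over the prior periods versus the current reading.
past = [history[p] for p in ("Y-3", "Y-2", "Y-1")]
typical_premium = mean(past)
current_gap = history["Y0"] - typical_premium

for period, value in history.items():
    print(f"{period}: {value:+.1%}")
print(f"typical past premium: {typical_premium:+.1%}")
print(f"current versus typical: {current_gap:+.1%}")
```

In this made-up case the firm has usually traded at a premium but now trades at a discount, which is the kind of divergence that would prompt further investigation rather than an automatic conclusion.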

Choice of Valuation Multiples 

Many relative valuation methods compare a 
company's share price with some measure of its 
performance, such as earnings per share (EPS) 
or free cash flow per share. Other relative valuation
methods compare a company's share
price with some measure of its size, such as 
book value per share. Block (1999) has reported 
that the majority of practitioners consider that 
when analyzing securities, measures of earn¬ 
ings and cash flow are somewhat more impor¬ 
tant than measures of book value or dividends. 
However, many practitioners will make use of 
various metrics in their work, in the expecta¬ 
tion that the different multiples will provide 
varying perspectives. Liu, Nissim, and Thomas 
(2002) compared the efficacy of six different 
metrics for relative valuations of U.S. firms on a 
universe-wide basis. Liu, Nissim, and Thomas 
(2007) extended the analysis to seven different 
metrics applied to 10 different countries and 
multiple industries. Hooke (2010, Chapter 15) 
presents an example using eight different met¬ 
rics applied to the single industry of temporary 
staffing companies. In a hypothetical example 
below, we use three different metrics for rel¬ 
ative valuation analysis, and we believe that 
most practitioners would consider that between 
three and six different metrics is probably justi¬ 
fiable. It is certainly possible to have a much 
larger number of metrics (see Damodaran, 
2006, p. 650), but the results may be harder to 
interpret. 

A ratio such as price/earnings can be
calculated in terms of share price/EPS, or
alternatively can be interpreted as market cap/net
income. For most purposes, these two ratios
will be the same. However, share issuance or
buyback activity may impair the comparability
of figures expressed in terms of EPS. If there
is any possibility of ambiguity, then we would
generally recommend using market cap/net
income.

For instance, a company may currently have
100 million shares outstanding, a current share
price of $40, and expected EPS of $2 in FY0
and $3 in FY1. If the P/E ratio is calculated in
terms of price/EPS, then the FY0 ratio is 20 and
the FY1 ratio is 13.3. However, analysts may
be expecting that the company will buy back
and cancel 20% of its shares during FY1. If so,
then the projected net income in FY1 would
presumably be $240 million rather than $300
million. If the P/E ratio is calculated using
market cap and net income, then the FY1 ratio
would be 16.7 rather than 13.3. This
hypothetical example indicates the importance of
ensuring that the denominator is being calculated
on a basis that reflects the historical or
projected situation for the relevant period. (An
investor might consider that if a firm's
management is indeed strongly committed to buying
back its own shares, then this might indicate
that the firm's management views the shares as
being undervalued. However, such considerations
would presumably be included as a qualitative
overlay to the relative valuation analysis.)
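
The two ways of computing the ratio in this example can be set side by side in a brief sketch. It assumes, as the example implies, that the FY1 EPS forecast is based on the reduced share count after the buyback; the variable names are illustrative.

```python
# Figures from the buyback example; names are illustrative.
shares_fy0 = 100e6            # shares currently outstanding
share_price = 40.0            # current share price ($)
eps = {"FY0": 2.0, "FY1": 3.0}
buyback_fraction = 0.20       # shares expected to be cancelled during FY1

# Method 1: share price / EPS.
pe_per_share = {fy: share_price / e for fy, e in eps.items()}

# Method 2: current market cap / projected net income.
# Assumption: FY1 EPS is measured on the post-buyback share count.
shares_fy1 = shares_fy0 * (1 - buyback_fraction)
net_income = {
    "FY0": eps["FY0"] * shares_fy0,
    "FY1": eps["FY1"] * shares_fy1,
}
market_cap = share_price * shares_fy0
pe_aggregate = {fy: market_cap / ni for fy, ni in net_income.items()}

for fy in ("FY0", "FY1"):
    print(f"{fy}: price/EPS = {pe_per_share[fy]:.1f}, "
          f"market cap/net income = {pe_aggregate[fy]:.1f}")
```

The two methods agree for FY0 and diverge for FY1, because the numerator uses today's share count while the FY1 EPS figure reflects the smaller count expected after the buyback.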

Choice of Numerator: Market Cap 
versus Firm Value 

In some instances, the choice of numerator may
have a significant impact on the multiple. For
instance, many analysts will use price/sales
ratios for valuation purposes. However, a firm's
revenues are generated from the total of its
capital base, comprising both equity and debt.

Consider two companies, A and B, which both
have a current market cap of $300 million and
projected annual revenues of $600 million in
FY0, so that they both have a current price/sales
ratio of 0.5. But suppose that Company A has
no outstanding borrowings, whereas Company
B has net debt of $300 million. One could
argue that Company B is actually rather less
attractive than Company A, as apparently it
requires twice as much capital to generate the
same volume of sales. In effect, analyzing the
company in terms of "firm value/sales" rather
than price/sales would reveal that Company B
is actually making less efficient use of its capital
than Company A: its firm value/sales ratio is 1.0,
against 0.5 for Company A.
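
The comparison between the two companies can be written out as a small sketch. The `Company` class and its property names are assumptions made for illustration; firm value is taken here simply as market cap plus net debt.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """Minimal container for the inputs in the A-versus-B example ($m)."""
    name: str
    market_cap: float
    net_debt: float
    sales: float

    @property
    def price_to_sales(self) -> float:
        # Equity value only in the numerator.
        return self.market_cap / self.sales

    @property
    def firm_value_to_sales(self) -> float:
        # Equity plus net debt in the numerator.
        return (self.market_cap + self.net_debt) / self.sales


company_a = Company("A", market_cap=300.0, net_debt=0.0, sales=600.0)
company_b = Company("B", market_cap=300.0, net_debt=300.0, sales=600.0)

for c in (company_a, company_b):
    print(f"Company {c.name}: price/sales = {c.price_to_sales:.2f}, "
          f"firm value/sales = {c.firm_value_to_sales:.2f}")
```

On a price/sales basis the two companies look identical, while firm value/sales shows Company B using twice as much capital per dollar of sales.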

There is no single definition of "firm value"
that is generally accepted by all practitioners.
In an ideal world, one would want to have the
market value of the firm's equity capital and
of the firm's debt capital. However, because
corporate bonds and bank loans typically are
not traded in liquid markets, there may not be
any reliable indicator of the market value of
debt capital. Consequently, it is conventional
to use market capitalization to estimate how
investors are valuing the firm's equity capital, but
then to use figures from the firm's most recent
balance sheet together with the notes to the
financial statements as a proxy for net debt. The
broadest definition of which we are aware is the
following:

Net Debt = Total Short-Term Debt
         + Total Long-Term Debt
         + Minority Interest
         + Unfunded Pension Liabilities
         - Cash and Equivalents

In practice, for most firms, the biggest
components of net debt are likely to be total
short-term debt, total long-term debt, and cash and
equivalents. In most cases, using an alternative
definition of firm value will have only a
small impact on the calculated multiple.
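
To see how much the choice of definition matters, the broad definition above can be compared with a narrower one that ignores minority interest and unfunded pension liabilities. The balance sheet figures in this sketch are hypothetical, and the function name and parameters are illustrative.

```python
def net_debt(short_term_debt, long_term_debt, cash_and_equivalents,
             minority_interest=0.0, unfunded_pension_liabilities=0.0):
    """Net debt under the broad definition; the last two items default to zero."""
    return (short_term_debt + long_term_debt + minority_interest
            + unfunded_pension_liabilities - cash_and_equivalents)


# Hypothetical figures in $ millions (illustrative only).
market_cap = 1000.0
sales = 900.0
balance_sheet = {
    "short_term_debt": 50.0,
    "long_term_debt": 300.0,
    "cash_and_equivalents": 80.0,
}

# Narrow definition: debt less cash only.
narrow = net_debt(**balance_sheet)

# Broad definition: also include minority interest and pension shortfall.
broad = net_debt(**balance_sheet,
                 minority_interest=10.0,
                 unfunded_pension_liabilities=15.0)

fv_sales_narrow = (market_cap + narrow) / sales
fv_sales_broad = (market_cap + broad) / sales

print(f"net debt: narrow = {narrow:.0f}, broad = {broad:.0f}")
print(f"firm value/sales: narrow = {fv_sales_narrow:.3f}, "
      f"broad = {fv_sales_broad:.3f}")
```

For a firm like this hypothetical one, where the additional items are small relative to market cap, the two definitions give similar multiples; the gap would widen for firms with large pension deficits or minority interests.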

Conceptually, it is possible to divide the in¬ 
come statement between the line items that 
are generated on the basis of total capital, and 
those that pertain solely to equity capital. For 
most firms, the separator between these two 
categories is Net Interest Expense or Net In¬ 
terest Income. Analyzing relative valuation for 
banks and insurance companies can be some¬ 
what more complex, as discussed in Copeland, 
Koller, and Murrin (2000, Chapters 21 and 22). 
Generally speaking, it is usually desirable that 
the numerator and denominator of a valuation 
metric should be consistent with each other 
(Damodaran, 2006, pp. 239-240). 

Industry-Specific Multiples 

Analysts covering some industries may make
use of information specific to that industry,
such as paid miles flown for airlines,
same-store sales for retailers, or revenue per available
room for hotel chains. Such data can provide
insights into how the market is valuing individual
firms' historical or expected operating
performance. However, we consider that they should
be viewed as a supplement to other multiples,
rather than as a replacement for them, for two
reasons: because it can be difficult to reconcile
a company's operating performance with its
financial results, and also because there may be
little or no intuition about what would be a
"reasonable" estimate for long-run valuation
levels (Damodaran, 2006, Chapter 7, pp. 237-238).
Natural resource producers tend to be valued in
terms of both their operating efficiency and the
resources that they control, so it may be useful
to include some measure of their reserves in the
analysis (Hooke, 2010, Chapter 21). Many
practitioners make use of efficiency metrics when
using relative valuation approaches to assess
some types of banks and other lending
institutions (Hooke, 2010, Chapter 22).


HYPOTHETICAL EXAMPLE 

Suppose that an analyst is seeking to gauge
whether Company A is attractive or unattractive
on the basis of relative valuation methods.
Suppose that the analyst has determined that
there are six other listed companies in the same
industry which are approximately the same
size, and which are also comparable in terms
of product mix, client base, and geographical
focus.² Based on this information, the analyst
can calculate some potentially useful multiples
for all seven companies. A hypothetical table of
such results is shown in Table 1. (For the
purposes of this simple hypothetical example, we
are assuming that all the firms have the same
fiscal year. We will consider calendarization later
in this entry.)

In this hypothetical scenario, Company A is
being compared to Companies B through G,
and therefore Company A should be excluded
from the calculation of median and standard
deviation, which would otherwise lead to
double-counting. The median is used because it tends to
be less influenced by outliers than the statistical
mean, so it is likely to be a better estimate for the
central tendency. (Similarly, the standard
deviation can be strongly influenced by outliers, and
it would be possible to use "median absolute
deviation" as a more robust way of gauging the
spread around the central tendency. Such
approaches may be particularly appropriate when
the data contain one or a handful of extreme
outliers for certain metrics, which might be
associated with company-specific idiosyncrasies.)
The table has been arranged in terms of market

Table 1 Hypothetical Relative Valuation Results

                     Share     Market        P/E            P/FCF            P/B
Company          Price ($)   Cap ($m)    FY0     FY1     FY0     FY1     FY0     FY1
A                    20.00        400   12.0    10.0     8.5     7.0    1.30    1.20
B                    16.00        550   11.5    11.5     5.0     6.0    1.00    0.95
C                    40.00        500   13.0    12.0     8.0     7.5    1.50    1.40
D                    15.00        450   12.5    12.0     8.0     7.0    1.10    1.05
E                    13.00        350   14.5    13.0     9.0     8.0    1.25    1.15
F                    30.00        350   12.5    12.5     7.0     4.5    1.15    1.15
G                    15.00        300   15.0    14.0     7.0     6.0    1.20    1.15
Median                            400  12.75   12.25    7.50    6.50    1.18    1.15
Std Dev                          98.3   1.33    0.89    1.37    1.26    0.17    0.15
A versus Median                    0%    -6%    -18%     13%      8%     11%      4%


Notes: P/E refers to price/eamings before extraordinary items; P/B refers to price / book value; P/FCF refers to 
price/free cash flow (defined as earnings before extraordinary items plus noncash items taken from the cash flow 
statement); FY0 refers to the current fiscal year; FY1 refers to the next fiscal year; figures for FY0 and FY1 could have 
been derived from consensus sell-side estimates or other sources. 
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cap, from largest to smallest, which can some¬ 
times reveal patterns associated with larger or 
smaller firms, though there don't appear to be 
any particularly obvious trends in this particu¬ 
lar set of hypothetical numbers. 
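
The summary rows of Table 1 can be reproduced with a few lines of Python. The sketch below is illustrative only: the names (`PEERS`, `peer_summary`, `median_abs_deviation`) are our own, and it assumes that the "Std Dev" row is a sample standard deviation, which is the convention that matches the printed figures. Company A is kept out of the peer statistics, as described above.

```python
from statistics import median, stdev

METRICS = ["Market Cap", "P/E FY0", "P/E FY1",
           "P/FCF FY0", "P/FCF FY1", "P/B FY0", "P/B FY1"]

# Figures copied from Table 1; Company A is the target, B through G the peers.
COMPANY_A = [400, 12.0, 10.0, 8.5, 7.0, 1.30, 1.20]
PEERS = {
    "B": [550, 11.5, 11.5, 5.0, 6.0, 1.00, 0.95],
    "C": [500, 13.0, 12.0, 8.0, 7.5, 1.50, 1.40],
    "D": [450, 12.5, 12.0, 8.0, 7.0, 1.10, 1.05],
    "E": [350, 14.5, 13.0, 9.0, 8.0, 1.25, 1.15],
    "F": [350, 12.5, 12.5, 7.0, 4.5, 1.15, 1.15],
    "G": [300, 15.0, 14.0, 7.0, 6.0, 1.20, 1.15],
}


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (robust spread)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def peer_summary(target, peers):
    """Summarise each metric across the peers only, excluding the target."""
    summary = {}
    for i, name in enumerate(METRICS):
        column = [row[i] for row in peers.values()]
        mid = median(column)
        summary[name] = {
            "median": mid,
            "std_dev": stdev(column),            # sample standard deviation
            "mad": median_abs_deviation(column),
            "target_vs_median": target[i] / mid - 1,
        }
    return summary


for name, stats in peer_summary(COMPANY_A, PEERS).items():
    print(f"{name:10s} median={stats['median']:.2f} "
          f"std={stats['std_dev']:.2f} mad={stats['mad']:.2f} "
          f"A vs median={stats['target_vs_median']:+.0%}")
```

The median absolute deviation is reported alongside the standard deviation so that the two measures of spread can be compared when a metric contains an outlier.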

The table suggests that the chosen universe of comparable companies may
be reasonably similar to Company A in several important respects. In
terms of size, Companies B, C, and D are slightly larger, while
Companies E, F, and G are slightly smaller, but the median market cap
across the six firms is the same as Company A's current valuation. In
terms of P/E ratios, Company A looks slightly cheap in terms of FY0
earnings and somewhat cheaper in terms of FY1 earnings. In terms of
P/FCF ratios, Company A looks somewhat expensive in terms of FY0 free
cash flow, but only slightly expensive in terms of FY1 free cash flow.
And finally, in terms of P/B ratios, Company A looks somewhat expensive
in terms of FY0 book value, but roughly in line with its peers in terms
of FY1 book value.

Analysis of the Hypothetical Example

So what are the implications of these results? First, Company A looks
relatively cheap compared to its peer group in terms of P/E ratios,
particularly in terms of its FY1 multiples. Second, Company A looks
rather expensive compared to its peer group in terms of P/FCF and P/B
ratios, particularly in terms of FY0 figures. If an analyst were
focusing solely on P/E, then Company A would look cheap compared with
the peer group, and this might suggest that Company A could be an
attractive investment opportunity.

However, the analyst might be concerned that Company A looks
comparatively cheap in terms of P/E, but somewhat expensive in terms of
price/book. One way to investigate this apparent anomaly is to focus on
ROE, which is defined as earnings/book value. Using the data in the
table, it is possible to calculate the ROE for Company A and for the
other six companies by dividing the P/B ratio by the P/E ratio, because
this effectively cancels out the "price" components, and thus will
generate an estimated value for EPS divided by book value per share,
which is one way to calculate ROE.

The results suggest that Company A is expected to deliver an ROE of
10.8% in FY0 and 12% in FY1, whereas the median ROE of the other six
firms is 8.7% in FY0 and 8.8% in FY1. Most of the comparable companies
are expected to achieve an ROE of between 8% and 9% in both FY0 and FY1,
though apparently Company C is expected to achieve an ROE of 11.5% in
FY0 and 11.7% in FY1. (A similar analysis can be conducted using "free
cash flow to equity," which involves dividing the P/B ratio by the P/FCF
ratio. This indicates that Company A is slightly below the median of
Companies B through G in FY0, but in line with its six peers during
FY1.)
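
The ROE figures quoted above follow directly from the multiples in Table 1, because dividing P/B by P/E cancels the price. The following sketch, whose dictionary and function names are our own, reproduces them and applies the same idea to free cash flow relative to book value.

```python
from statistics import median

# (P/E, P/FCF, P/B) for each fiscal year, copied from Table 1.
TABLE_1 = {
    "A": {"FY0": (12.0, 8.5, 1.30), "FY1": (10.0, 7.0, 1.20)},
    "B": {"FY0": (11.5, 5.0, 1.00), "FY1": (11.5, 6.0, 0.95)},
    "C": {"FY0": (13.0, 8.0, 1.50), "FY1": (12.0, 7.5, 1.40)},
    "D": {"FY0": (12.5, 8.0, 1.10), "FY1": (12.0, 7.0, 1.05)},
    "E": {"FY0": (14.5, 9.0, 1.25), "FY1": (13.0, 8.0, 1.15)},
    "F": {"FY0": (12.5, 7.0, 1.15), "FY1": (12.5, 4.5, 1.15)},
    "G": {"FY0": (15.0, 7.0, 1.20), "FY1": (14.0, 6.0, 1.15)},
}


def implied_roe(company, year):
    """Earnings / book value, obtained as (P/B) / (P/E) since price cancels."""
    pe, _, pb = TABLE_1[company][year]
    return pb / pe


def implied_fcf_to_book(company, year):
    """Free cash flow / book value, obtained as (P/B) / (P/FCF)."""
    _, pfcf, pb = TABLE_1[company][year]
    return pb / pfcf


def peer_median(ratio, year, target="A"):
    """Median of a ratio across every company except the target."""
    return median(ratio(name, year) for name in TABLE_1 if name != target)


for year in ("FY0", "FY1"):
    print(year,
          f"ROE A={implied_roe('A', year):.1%}",
          f"peer median={peer_median(implied_roe, year):.1%}",
          f"FCF/book A={implied_fcf_to_book('A', year):.1%}",
          f"peer median={peer_median(implied_fcf_to_book, year):.1%}")
```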

These results suggest that Company A is expected to deliver an ROE that
is substantially higher than most of its peers. Suppose that an analyst
is skeptical that Company A really can deliver such a strong
performance, and instead hypothesizes that Company A's ROE during FY0
and FY1 may only be in line with the median ROE for the peer group in
each year. Based on the figures in Table 1, Company A's book value in
FY0 is expected to be $15.38, and the company is projected to deliver
$1.67 of earnings. Now suppose that Company A's book value remains the
same, but that its ROE during FY0 is only 8.7%, which is equal to the
median for its peers. Then the implied earnings during FY0 would only be
$1.35, and the "true" P/E for Company A in FY0 would be 14.9, well above
the peer median of 12.75.
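
The restated FY0 figures can be checked with a short calculation. This is a sketch under our own naming (`restate_pe` and the constants are illustrative), and it uses the unrounded peer median ROE of roughly 8.75% rather than the rounded 8.7%.

```python
from statistics import median

PRICE_A = 20.00
PE_A_FY0, PB_A_FY0 = 12.0, 1.30
PEER_PE_FY0 = [11.5, 13.0, 12.5, 14.5, 12.5, 15.0]   # Companies B through G
PEER_PB_FY0 = [1.00, 1.50, 1.10, 1.25, 1.15, 1.20]


def restate_pe(price, pe, pb, assumed_roe):
    """Recompute EPS and P/E if ROE equals `assumed_roe` on unchanged book value."""
    book_per_share = price / pb          # book value implied by P/B
    consensus_eps = price / pe           # earnings implied by P/E
    restated_eps = book_per_share * assumed_roe
    restated_pe = price / restated_eps
    return book_per_share, consensus_eps, restated_eps, restated_pe


peer_median_roe = median(pb / pe for pe, pb in zip(PEER_PE_FY0, PEER_PB_FY0))
book, eps, skeptical_eps, skeptical_pe = restate_pe(
    PRICE_A, PE_A_FY0, PB_A_FY0, peer_median_roe)

print(f"book={book:.2f} consensus EPS={eps:.2f} "
      f"restated EPS={skeptical_eps:.2f} restated P/E={skeptical_pe:.1f}")
```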

The analysis can be extended a little further, from FY0 to FY1. The
figures in the table above suggest that Company A's book value in FY1
will be $16.67, and that the company will generate $2.00 of earnings
during FY1. But if Company A only produced $1.35 of earnings during FY0,
rather than the table's expectation of $1.67, then the projected FY1
book value may be too high. A quick way to estimate Company A's book
value in FY1 is to use a clean surplus analysis, using the following
equation:

\[ \text{Book}_{FY1} = \text{Book}_{FY0} + \text{Net Income}_{FY1} - \text{Dividends}_{FY1} \]

Based on the figures in the table above, Company A is expected to have
earnings of $1.67 during FY0, and $2.00 during FY1. The implied book
value per share is $15.38 in FY0, and $16.67 during FY1. According to
the clean surplus formula, Company A is expected to pay a dividend of
$0.38 per share in FY1.

Assuming that the true earnings in FY0 are
indeed $1.35 rather than $1.67, and that the div¬ 
idend payable in FY1 is still $0.38, then the ex¬ 
pected book value for Company A in FY1 would 
be $16.35 rather than $16.67. Taking this figure 
and applying the median FY1 peer ROE, the ex¬ 
pected FY1 earnings for Company A would be 
$1.44 rather than $2.00, and consequently the
"true" P/E for FY1 would be 13.9 instead of the 
figure of 10.0 shown in the table. At those levels, 
the stock would presumably no longer appear 
cheap by comparison with its peer group. In¬ 
deed, Company A's FY1 P/E multiple would 
be roughly in line with Company G, which has 
the highest FY1 P/E multiple among the com¬ 
parable companies. 
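
The chain of clean surplus calculations can be laid out step by step. The sketch below uses our own variable names and rests on one assumption that should be stated plainly: the dividend of $0.38 and the restated book value of $16.35 are reproduced when the earnings term in the clean surplus relation is the FY0 figure ($1.67 in the consensus case, about $1.35 in the skeptical case), so that is the reading adopted here. Under that reading, restated FY1 earnings come out at roughly $1.44 and the restated FY1 P/E at about 13.9.

```python
from statistics import median

PRICE = 20.00
PE = {"FY0": 12.0, "FY1": 10.0}          # Company A, from Table 1
PB = {"FY0": 1.30, "FY1": 1.20}
PEER_PE = {"FY0": [11.5, 13.0, 12.5, 14.5, 12.5, 15.0],
           "FY1": [11.5, 12.0, 12.0, 13.0, 12.5, 14.0]}
PEER_PB = {"FY0": [1.00, 1.50, 1.10, 1.25, 1.15, 1.20],
           "FY1": [0.95, 1.40, 1.05, 1.15, 1.15, 1.15]}


def clean_surplus_book(opening_book, earnings, dividends):
    """Closing book value = opening book value + earnings - dividends."""
    return opening_book + earnings - dividends


def peer_median_roe(year):
    """Median of (P/B) / (P/E) across Companies B through G."""
    return median(pb / pe for pe, pb in zip(PEER_PE[year], PEER_PB[year]))


book_fy0 = PRICE / PB["FY0"]             # about 15.38
book_fy1 = PRICE / PB["FY1"]             # about 16.67
eps_fy0 = PRICE / PE["FY0"]              # about 1.67

# Assumption: the earnings term is the FY0 figure (see the lead-in).
implied_dividend = book_fy0 + eps_fy0 - book_fy1

# Skeptical case: Company A only earns the peer median ROE in each year.
skeptical_eps_fy0 = book_fy0 * peer_median_roe("FY0")
skeptical_book_fy1 = clean_surplus_book(book_fy0, skeptical_eps_fy0,
                                        implied_dividend)
skeptical_eps_fy1 = skeptical_book_fy1 * peer_median_roe("FY1")
skeptical_pe_fy1 = PRICE / skeptical_eps_fy1

print(f"dividend={implied_dividend:.2f} book FY1={skeptical_book_fy1:.2f} "
      f"EPS FY1={skeptical_eps_fy1:.2f} P/E FY1={skeptical_pe_fy1:.1f}")
```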

This quick analysis therefore suggests that the analyst may want to
focus on why Company A is expected to deliver FY0 and FY1 ROE that is at
or close to the top of its peer group. As noted previously, Company A
and Company C are apparently expected to have an ROE that is
substantially stronger than those of the other comparable companies. Is
there something special about Companies A and C that would justify such
an expectation? Conversely, is it possible that the estimates for
Companies A and C are reasonable, but that the projected ROE for the
other companies is too pessimistic? If the latter scenario is valid,
then it's possible that the P/E ratios for some of the other companies
in the comparable universe are too high, and thus that those firms could
be attractively valued at current levels.

Other Potential Issues 

Multiples Involving Low or Negative Numbers

It is conventional to calculate valuation multi¬ 
ples with the market valuation as the numerator 
and the firms' financial or operating data as the 
denominator. If the denominator is close to zero, 
or negative, then the valuation multiple may be 
very large or negative. The simplest example of 
such problems might involve a company's earn¬ 
ings. Consider a company with a share price 
of $10 and projected earnings of $0.10 for next 
year. Such a company is effectively trading at 
a P/E of 100. If consensus estimates turn more 
bearish, and the company's earnings next year 
are expected to be minus $0.05, the company 
will now be trading at a P/E of -200. 
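
The arithmetic of this example is easy to check, and it also shows why such multiples are usually screened before they enter a peer comparison. In the sketch below, the function name and the optional `cap` threshold are our own illustrative choices, not a convention prescribed by the sources cited in this entry.

```python
def price_earnings(price, eps, cap=None):
    """Return (multiple, usable).

    `usable` is False when earnings are zero or negative, or when the
    multiple exceeds `cap`, a hypothetical screening threshold.
    """
    if eps == 0:
        return None, False               # the ratio is undefined
    multiple = price / eps
    usable = eps > 0 and (cap is None or multiple <= cap)
    return multiple, usable


# The example from the text: a $10 share with EPS of $0.10, then -$0.05.
print(price_earnings(10.0, 0.10))           # very large but positive multiple
print(price_earnings(10.0, -0.05))          # negative multiple, flagged
print(price_earnings(10.0, 0.10, cap=50))   # screened out by the cap
```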

It is also possible for a firm to have nega¬ 
tive shareholders' equity, which would indicate 
that the total value of its liabilities exceeds the 
value of its assets. According to a normal understanding
of accounting data, this would indicate
that the company is insolvent. However,
some companies have been able to continue 
operating under such circumstances and even 
to retain a stock exchange listing. Firms with 
negative shareholders' equity will also have 
a negative price/book multiple. (In principle, 
a firm can even report negative net revenues 
during a particular period, though this would 
require some rather unusual circumstances. 
One would normally expect few firms to re¬ 
port negative revenues for more than a single 
quarter.) 

As noted previously, averages and standard 
deviations tend to be rather sensitive to out¬ 
liers, which is one reason to favor using the 
median and the median absolute deviation in¬ 
stead. But during economic recessions at the 
national or global level, many companies may 
have low or negative earnings. Similarly, firms 
in cyclical industries will often go through 
periods when sales or profits are unusually
low, by comparison with their average lev¬ 
els through a complete business cycle. Un¬ 
der such circumstances, an analyst may prefer 
not to focus on conventional metrics such as 
Price/Earnings, but instead to use line items 
from higher up the income statement that typ¬ 
ically will be less likely to generate negative 
numbers. 


Calendarization 

Some of the firms involved in the relative val¬ 
uation analysis may have fiscal years that end 
in different months. Most analyst estimates are 
based on a firm's own reporting cycle. It is usu¬ 
ally desirable to ensure that all valuation mul¬ 
tiples are being calculated on a consistent basis, 
so that calendar-based effects are not driving 
the analysis. 

One way to ensure that all valuation multiples 
are directly comparable is to calendarize the fig¬ 
ures. Consider a situation where at the start of 
January, an analyst is creating a valuation anal¬ 
ysis for one firm whose fiscal year ends in June, 
while the other firms in the universe have fiscal 
years that end in December. Calendarizing the 
results for the June-end firm will require taking
half of the projected number for FY0 and
adding half of the projected number for FY1. 
(If quarter-by-quarter estimates are available, 
then more precise adjustments can be imple¬ 
mented by combining 3QFY0, 4QFY0, 1QFY1, 
and 2QFY1.) 
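
The June year-end case generalizes to any fiscal year-end month. The sketch below is a minimal illustration with our own function names and hypothetical EPS figures; it assumes that the metric accrues evenly across the months of each fiscal year, which is what the "half and half" rule implies.

```python
def calendarize(fy0, fy1, fiscal_year_end_month):
    """Blend two fiscal-year estimates into one calendar-year estimate.

    fy0 is the fiscal year ending inside the target calendar year and fy1
    is the following fiscal year. Assumes even accrual month by month.
    """
    if not 1 <= fiscal_year_end_month <= 12:
        raise ValueError("fiscal_year_end_month must be between 1 and 12")
    weight_fy0 = fiscal_year_end_month / 12   # calendar months covered by FY0
    return weight_fy0 * fy0 + (1 - weight_fy0) * fy1


def calendarize_quarters(quarters):
    """Sum four quarterly estimates that together span the calendar year."""
    if len(quarters) != 4:
        raise ValueError("expected exactly four quarterly figures")
    return sum(quarters)


# Hypothetical June year-end firm with EPS estimates of 2.00 (FY0) and 3.00 (FY1).
june_firm = calendarize(2.00, 3.00, fiscal_year_end_month=6)

# Hypothetical quarterly figures for 3QFY0, 4QFY0, 1QFY1, and 2QFY1.
june_firm_quarterly = calendarize_quarters([0.55, 0.60, 0.70, 0.75])

print(june_firm, round(june_firm_quarterly, 2))
```

A December year-end firm needs no adjustment under this scheme, since the FY0 weight is then equal to one.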

Calendarization is conceptually simple, but 
may require some care in implementation dur¬ 
ing the course of a year. One would expect that 
after a company has reported results for a full 
fiscal year, the year defined as "FY0" would immediately
shift forward 12 months. However,
analysts and data aggregators may not change
the definitions of "FY0" and "FY1" for a few
days or weeks. In case of doubt, it may be 
worth looking at individual estimates in order 
to double-check that the correct set of numbers 
is being used. 


Sum-of-the-Parts Analysis

When attempting to use relative valuation
methods on firms with multiple lines of busi¬ 
ness, the analyst may not be able to identify 
any company that is directly similar on all di¬ 
mensions. In such instances, relative valuation 
methods can be extended to encompass "sum- 
of-the-parts" analysis, which considers each 
part of a business separately and attempts to 
value them individually by reference to compa¬ 
nies that are mainly or solely in one particular 
line of business (see Hooke, 2010, Chapter 18). 

Relative valuation analysis based on sum- 
of-the-parts approaches will involve the 
same challenges as were described above— 
identifying a suitable universe of companies en¬ 
gaged in each particular industry, collecting and 
collating the necessary data, and then using the 
results to gauge what might be a "fair value" 
for each of the individual lines of business. But 
in addition to these considerations, there is an 
additional difficulty, which is specific to sum- 
of-the-parts analysis. This problem is whether 
to apply a conglomerate discount, and if so, how 
much. 

Much financial theory assumes that all else 
equal, investors are likely to prefer to invest 
in companies that are engaged in a single line 
of business, rather than to invest in conglom¬ 
erates that have operations across multiple in¬ 
dustries. Investing in a conglomerate effectively 
means being exposed to all of that conglom¬ 
erate's operations, and the overall mix of in¬ 
dustry exposures might not mimic the portfolio 
that the investor would have chosen if it were 
possible instead to put money into individual 
companies. 

A possible counterargument might be that 
a conglomerate with strong and decisive cen¬ 
tral control may achieve synergies with regard 
to revenues, costs, or taxation that would not 
be available to individual free-standing firms 
dealing at arm's length with one another. A
skeptical investor might wonder, on the other 
hand, about whether the potential positive im¬ 
pact of such synergies may be partly or wholly 
undermined by the negative impacts of centralized
decision making, transfer pricing, and regulatory
or reputational risk.

For these reasons, an analyst might consider 
that it is reasonable to apply a discount to 
the overall value that emerges from the "sum 
of the parts." Some practitioners favor a dis¬ 
count of somewhere between 5% and 15%, for 
the reasons given above. Academic research 
on spinoffs has suggested that the combined 
value of the surviving entity and the spun- 
off firm tends to rise by an average of around 
6%, though with a wide range of variation (see 
Burch and Nanda, 2003). (Some analysts have 
suggested that in some particular contexts, for 
instance in markets where competent managers 
are very scarce, then investors should be willing 
to pay a premium for being able to invest in a 
conglomerate that is fortunate enough to have 
such executives. However, this appears not to 
be a mainstream view.) 
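
A sum-of-the-parts valuation with a conglomerate discount reduces to a weighted sum. The sketch below uses entirely hypothetical segments, earnings figures, and peer multiples. The 10% default discount is simply the midpoint of the 5% to 15% range mentioned above, not a recommended value.

```python
def sum_of_the_parts(segments, discount=0.10):
    """Value each segment at its peer multiple, then apply a discount.

    segments: iterable of (name, segment_metric, peer_multiple) tuples.
    Returns (gross_value, discounted_value).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be between 0 and 1")
    gross = sum(metric * multiple for _, metric, multiple in segments)
    return gross, gross * (1 - discount)


# Hypothetical conglomerate: segment earnings in $m and peer median P/E.
SEGMENTS = [
    ("Beverages", 50.0, 12.0),
    ("Packaging", 30.0, 8.0),
]

gross_value, discounted_value = sum_of_the_parts(SEGMENTS, discount=0.10)
print(f"gross={gross_value:.0f} after discount={discounted_value:.0f}")
```

Running the calculation at both ends of the discount range gives a quick sense of how sensitive the conclusion is to this judgment-based input.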


Relative Valuation versus DCF: A Comparison

Relative valuation methods can generally be 
implemented fairly fast, and the underlying 
information necessary to calculate the multiples can also
be updated quickly. Even with the various 
complexities discussed above, an experienced 
analyst can usually create a relative valuation 
table within an hour or two. And the calcu¬ 
lated valuation multiples can adjust as market 
conditions and relative prices change. In both 
respects, relative valuation methods have an 
advantage over DCF models, which may re¬ 
quire hours or days of work to build or update, 
and which require the analyst to provide multi¬ 
ple judgment-based inputs about unknowable 
future events. Moreover, as noted by Baker and 
Ruback (1999), if a DCF model is extended to 
encompass multiple possible scenarios, it may 
end up generating a range of "fair value" prices 
that is too wide to provide much insight into 
whether the potential investment is attractive 
at its current valuation. 


Relative valuation methods focus on how 
much a company is worth to a minority share¬ 
holder, in other words an investor who will 
have limited or zero ability to influence the 
company's management or its strategy. Such 
an approach is suitable for investors who in¬ 
tend to purchase only a small percentage of 
the company's shares and to hold those shares 
until the valuation multiple moves from being 
"cheap" to being "in line" or "expensive" com¬ 
pared with the peer group. As noted above, 
relative valuation methods make no attempt 
to determine what is the "correct" price for a 
company's shares, but instead focus on trying 
to determine whether a company looks attrac¬ 
tive or unattractive by comparison with other 
firms that appear to be approximately similar 
in terms of size, geography, industry, and other 
parameters. 

DCF methods attempt to determine how 
much a company is worth in terms of "fair 
value" over a long time horizon. DCF methods 
can readily incorporate a range of assumptions 
about decisions in the near future or the dis¬ 
tant future, and therefore can provide a range 
of different scenarios. For this reason, most 
academics and practitioners consider that DCF 
methods are likely to produce greater insight 
than relative valuation methods into the vari¬ 
ous forces that may affect the fair value for a 
business. More specifically, DCF methods can 
be more applicable to situations where an in¬ 
vestor will seek to influence a company's future 
direction—perhaps as an activist investor push¬ 
ing management in new directions, or possibly 
as a bidder for a controlling stake in the firm. 
In such situations, relative valuation analysis is 
unlikely to provide much insight because the 
investor will actually be seeking to affect the 
company's valuation multiples directly, by af¬ 
fecting the value of the denominator. 

Nevertheless, even where an analyst favors 
the use of DCF approaches, we consider that 
relative valuation methods can still be valu¬ 
able as a "sanity check" on the output from a 
DCF-based valuation. An analyst can take the 
expected valuation from the DCF model and
compare it with the projected values for net 
income, shareholders' equity, operating cash 
flow, and similar metrics. These ratios drawn 
from the DCF modeling process can then be 
compared with the multiples for a universe of 
similar firms. If the multiples generated by the 
analyst's DCF model are approximately compa¬ 
rable with the multiples that can be derived for 
similar companies that are already being pub¬ 
licly traded, then the analyst may conclude that 
the DCF model's assumptions appear to be reasonable.
However, if the multiples from the analyst's
model appear to diverge considerably
from the available information concerning val¬ 
uation multiples for apparently similar firms, 
then it may be a good idea to reexamine the 
model, rechecking whether the underlying as¬ 
sumptions are truly justifiable. 
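
This sanity check can be expressed as a comparison between a DCF-implied multiple and the peer median. In the sketch below, the DCF values, the projected net income, and the 25% tolerance are all hypothetical inputs chosen for illustration. Only the peer P/E figures come from Table 1.

```python
from statistics import median

PEER_PE_FY1 = [11.5, 12.0, 12.0, 13.0, 12.5, 14.0]   # Companies B through G


def implied_multiple(dcf_equity_value, projected_metric):
    """Multiple implied by a DCF valuation, e.g. equity value / net income."""
    if projected_metric <= 0:
        raise ValueError("multiple is not meaningful for a non-positive metric")
    return dcf_equity_value / projected_metric


def diverges_from_peers(implied, peer_multiples, tolerance=0.25):
    """True if the implied multiple is further than `tolerance` from the peer median."""
    peer_mid = median(peer_multiples)
    return abs(implied / peer_mid - 1) > tolerance


# Hypothetical DCF outputs ($m) against projected FY1 net income of $40m.
base_case = implied_multiple(480.0, 40.0)
bull_case = implied_multiple(800.0, 40.0)

print(base_case, diverges_from_peers(base_case, PEER_PE_FY1))
print(bull_case, diverges_from_peers(bull_case, PEER_PE_FY1))
```

A flagged divergence does not show that the model is wrong, only that its assumptions deserve a second look.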

Relative valuation methods can also be useful 
in another way when constructing DCF models. 
Most DCF models include a "terminal value," 
which represents the expected future value of 
the business, discounted back to the present, 
from all periods subsequent to the ones for 
which the analyst has developed explicit esti¬ 
mates. One way to calculate this terminal value 
is in terms of a perpetual growth rate, but the 
choice of a particular growth rate can be dif¬ 
ficult to justify on the basis of the firm's cur¬ 
rent characteristics. An alternative approach is 
to take current valuation multiples for similar 
firms and use those values as multiples for ter¬ 
minal value (see Damodaran, 2006, Chapter 4, 
pp. 143-144). 
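
The two ways of setting a terminal value can be compared side by side. The sketch below uses the standard perpetual growth formula and a simple exit multiple. The cash flow, discount rate, growth rate, and forecast horizon are hypothetical, while the multiple of 12.25 is the peer median FY1 P/E from Table 1, used here purely as an example.

```python
def terminal_value_growth(final_cash_flow, discount_rate, growth_rate):
    """Perpetual growth terminal value at the end of the explicit forecast."""
    if growth_rate >= discount_rate:
        raise ValueError("growth rate must be below the discount rate")
    return final_cash_flow * (1 + growth_rate) / (discount_rate - growth_rate)


def terminal_value_multiple(final_metric, peer_multiple):
    """Terminal value as a peer multiple applied to the final-year metric."""
    return final_metric * peer_multiple


def present_value(value, discount_rate, years):
    """Discount a value received `years` from now back to the present."""
    return value / (1 + discount_rate) ** years


# Hypothetical inputs: final-year figure of $100m, 10% discount rate,
# 2% perpetual growth, five-year explicit forecast.
tv_growth = terminal_value_growth(100.0, 0.10, 0.02)
tv_multiple = terminal_value_multiple(100.0, 12.25)

print(f"growth method:   TV={tv_growth:.1f} "
      f"PV={present_value(tv_growth, 0.10, 5):.1f}")
print(f"multiple method: TV={tv_multiple:.1f} "
      f"PV={present_value(tv_multiple, 0.10, 5):.1f}")
```

When the two methods give very different answers, the gap itself is informative: it shows what growth rate the peer multiple implicitly assumes.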


KEY POINTS 

• Relative valuation methods tend to receive 
less attention from academics than DCF ap¬ 
proaches, but such methods are widely used 
by practitioners. If relative valuation ap¬ 
proaches suggest that a company is cheap on 
some metrics but expensive on others, this 
may indicate that the market views that company
as being an outlier for some reason, and
an analyst will probably want to investigate 
further. 

• Choosing an appropriate group of compa¬ 
rable companies is perhaps the most chal¬ 
lenging aspect of relative valuation analysis. 
Where possible, an analyst should seek to 
identify six to 12 companies that are similar 
in terms of size, geography, and industry. If 
this is not possible, then an analyst should feel 
free to relax one or more of these parameters 
in order to obtain a usable universe. 

• Determining an appropriate set of valuation
multiples is also important. Calculating a sin¬ 
gle set of multiples is likely to provide fewer 
insights than using several different metrics 
that span multiple time periods. It is conven¬ 
tional to use consensus estimates of future fi¬ 
nancial and operating performance, as these 
presumably represent the market's collective 
opinion of each firm's prospects. 

• Most relative valuation analysis is performed
using standard multiples such as
price/earnings or firm value/sales. Under 
some conditions, using industry-specific mul¬ 
tiples can be valuable, though there may be 
fewer consensus estimates for such data, and 
there may also be less intuition about what is 
the "fair" price for such ratios. 

• Relative valuation methods are particularly 
useful for investors who aim to take minority 
stakes in individual companies when they are 
"cheap" relative to their peers, and then sell 
those stakes when the companies become "ex¬ 
pensive." Such methods are likely to be less 
directly useful for investors who will seek to 
influence a company's management, or who 
aim to take a controlling stake in a company. 
For such investors, DCF methods are likely to 
be more applicable. 

NOTES 

1. By contrast, in an example of how to assess a 
small wine producer, the proposed universe 
of comparables consisted of 15 "beverage 
firms," including both small and large caps,
and covering specialists in beer, wine, and 
soft drink production. Arguably, some of 
these are unlikely to be very similar to the 
proposed target of analysis. See Chapter 7 in 
Damodaran (2006, pp. 249-252). 

2. For further examples using real firms and 
actual figures, see Damodaran (2006, Chap¬ 
ters 7 and 8) or Hooke (2010, Chapter 15). 

Abstract: Investment approaches are determined by investors' views of the market. For investors
who believe the market is basically efficient, so that price changes are essentially random and 
unpredictable, the reasonable approach is passive investing, or indexing, which makes no attempt 
to outperform the underlying market. Investors who believe there are clear-cut patterns discernible 
in stock price movements may aim for above-market returns by using fairly simple approaches, 
such as buying stocks with low price/earnings ratios or buying small-capitalization stocks. But
what if the market is not totally efficient, but there are no simple patterns that can be exploited 
for consistent excess returns? Such a complex market requires an investment approach capable of 
dealing with that complexity. 


Scientists classify systems into three types— 
ordered, random, and complex. Ordered sys¬ 
tems, such as the structure of diamond crystals 
or the dynamics of pendulums, are definable 
and predictable by relatively simple rules and 
can be modeled using a relatively small number 
of variables. Random systems like the Brownian 
motion of gas molecules or white noise (static) 
are unordered; they are the product of a large 
number of variables. Their behavior cannot be 
modeled and is inherently unpredictable. 

Complex systems like the weather and the 
workings of DNA fall somewhere between the 
domains of order and randomness. Their be¬ 
havior can be at least partly comprehended 
and modeled, but only with great difficulty. The 
number of variables that must be modeled and
their interactions are beyond the capacity of the
human mind alone. Only with the aid of advanced
computational science can the mysteries
of complex systems be unraveled.¹

The stock market is a complex system.² Stock
prices are not completely random, as the effi¬ 
cient market hypothesis and random walk the¬ 
ory would have it. Some price movements can 
be predicted, and with some consistency. But 
stock price behavior is not ordered. It cannot be 
successfully modeled by simple rules or screens 
such as low price-to-earnings ratios (P/Es) or 
even by elegant theories such as the capital as¬ 
set pricing model or arbitrage pricing theory. 
Rather, stock price behavior is permeated by 
a complex web of interrelated return effects. A 
model of the market that is complex enough to 
disentangle these effects provides opportunities
for modeling price behavior and predicting
returns. 

This entry describes our approach to investing 
and its application to the stock selection, portfolio 
construction, and performance evaluation prob¬ 
lems. We begin with the very basic question 
of how one should approach the equity market. 
Should one attempt to cover the broadest pos¬ 
sible range of stocks, or can greater analytical 
insights be garnered by focusing on a particu¬ 
lar subset of the market or a limited number of 
stocks? Each approach has its advantages and 
disadvantages. However, combining the two 
may offer the best promise of finding the key 
to unlocking investment opportunity in a com¬ 
plex market. 

While covering the broadest possible range 
of stocks, a complex approach recognizes that 
there are significant differences in the ways 
different types of stocks respond to changes in 
both fundamentals and investor behavior. This 
requires taking into account the interrelation¬ 
ships between numerous potential sources of 
price behavior. Multivariate analysis disentan¬ 
gles the web of return-predictor relationships 
that constitutes a complex market and provides 
independent, additive return predictions that 
are more robust than the predictions from 
univariate analyses. 


AN INTEGRATED APPROACH TO A SEGMENTED MARKET

While one might think that U.S. equity markets 
are fluid and fully integrated, in reality there are 
barriers to the free flow of capital. Some of these 
barriers are self-imposed by investors. Others 
are imposed by regulatory and tax authorities 
or by client guidelines. 

Some funds, for example, are prohibited by 
regulation or internal policy guidelines from 
buying certain types of stock, such as non-dividend-paying
stock or stock below a given capitalization
level. Tax laws, too, may effectively lock
investors into positions they would otherwise 
trade. Such barriers to the free flow of capital 
foster market segmentation. 

Other barriers are self-imposed. Traditionally, 
for example, managers have focused (whether 
by design or default) on distinct approaches 
to stock selection. Value managers have con¬ 
centrated on buying stocks selling at prices 
perceived to be low relative to the company's 
assets or earnings. Growth managers have 
sought stocks with above-average earnings 
growth not fully reflected in price. Small- 
capitalization managers have searched for op¬ 
portunity in stocks that have been overlooked 
by most investors. The stocks that constitute the 
natural selection pools for these managers tend 
to group into distinct market segments. 

Client preferences encourage this balkaniza¬ 
tion of the market. Some investors, for exam¬ 
ple, prefer to buy value stocks, while others 
seek growth stocks; some invest in both, but hire 
separate managers for each segment. Both in¬ 
stitutional and individual investors generally 
demonstrate a reluctance to upset the apple cart 
by changing allocations to previously selected 
style managers. Several periods of underperfor¬ 
mance, however, may undermine this loyalty 
and motivate a flow of capital from one seg¬ 
ment of the market to another (often just as the 
out-of-favor segment begins to benefit from a 
reversion of returns back up to their historical 
mean). 

The actions of investment consultants have 
formalized a market segmented into style 
groupings. Consultants design style indexes 
that define the constituent stocks of these seg¬ 
ments and define managers in terms of their 
proclivity for one segment or another. As a 
manager's performance is measured against the 
given style index, managers who stray too far 
from index territory are taking on extra risk. 
Consequently, managers tend to stick close to 
their style homes, reinforcing market segmen¬ 
tation. 

An investment approach that focuses on individual
market segments can have its advantages.
Such an approach recognizes, for
example, that the U.S. equity market is nei¬ 
ther entirely homogeneous nor entirely het¬ 
erogeneous. All stocks do not react alike to a 
given impetus, but nor does each stock exhibit 
its own, totally idiosyncratic price behavior. 
Rather, stocks within a given style, or sector, or 
industry tend to behave similarly to each other 
and somewhat differently from stocks outside 
their group. 

An approach to stock selection that 
specializes in one market segment can op¬ 
timize the application of talent and maximize 
the potential for outperformance. This is 
most likely true for traditional, fundamen¬ 
tal analysis. The in-depth, labor-intensive 
research undertaken by traditional analysts 
can become positively ungainly without some 
focusing lens. 

An investment approach that focuses on the 
individual segments of the market, however, 
presents some theoretical and practical prob¬ 
lems. Such an approach may be especially dis¬ 
advantaged when it ignores the many forces 
that work to integrate, rather than segment, the 
market. 

Many managers, for example, do not special¬ 
ize in a particular market segment but are free 
to choose the most attractive securities from 
a broad universe of stocks. Others, such as 
style rotators, may focus on a particular type of 
stock, given current economic conditions, but 
be poised to change their focus should condi¬ 
tions change. Such managers make for capital 
flows and price arbitrage across the boundaries 
of particular segments. 

Furthermore, all stocks can be defined by the 
same fundamental parameters—by market cap¬ 
italization, P/E, dividend discount model rank¬ 
ing, and so on. All stocks can be found at some 
level on the continuum of values for each parameter.
Thus, growth and value stocks inhabit
the opposite ends of the continuums of P/E and
dividend yield, and small and large stocks the
opposite ends of the continuums of firm capitalization
and analyst coverage.

As the values of the parameters for any in¬ 
dividual stock change, so too does the stock's 
position on the continuum. An out-of-favor 
growth stock may slip into value territory. A 
small-cap company may grow into the large- 
cap range. 

Finally, while the values of these parameters
vary across stocks belonging to different market
segments (different styles, sectors, and
industries), and while investors may favor certain
values (low P/E, say, in preference to high P/E),
arbitrage tends to counterbalance too pronounced a
predilection on the part of investors for any one
set of values. In equilibrium, all stocks must be
owned. If too many investors want low P/E, low-P/E
stocks will be bid up to higher P/E levels, and
some investors will step in to sell them and buy
other stocks deserving of higher P/Es. Arbitrage
works toward market integration and a single
pricing mechanism.

A market that is neither completely seg¬ 
mented nor completely integrated is a com¬ 
plex market. A complex market calls for an 
investment approach that is 180 degrees re¬ 
moved from the narrow, segment-oriented fo¬ 
cus of traditional management. It requires 
a complex, unified approach that takes into 
account the behavior of stocks across the broad¬ 
est possible selection universe, without los¬ 
ing sight of the significant differences in price 
behavior that distinguish particular market 
segments. 

Such an approach offers three major advan¬ 
tages. First, it provides a coherent evaluation 
framework. Second, it can benefit from all the 
insights to be garnered from a wide and di¬ 
verse range of securities. Third, because it has 
both breadth of coverage and depth of analy¬ 
sis, it is poised to take advantage of more profit 
opportunities than a more narrowly defined, 
segmented approach proffers. 

A Coherent Framework

To the extent that the market is integrated, an in¬ 
vestment approach that models each industry 
or style segment as if it were a universe unto 
itself is not the best approach. Consider, for ex¬ 
ample, a firm that offers both core and value 
strategies. Suppose the firm runs a model on its 
total universe of, say, 3,000 stocks. It then runs 
the same model or a different, segment-specific 
model on a 500-stock subset of large-cap value 
stocks. 

If different models are used for each strat¬ 
egy, the results will differ. Even if the same 
model is estimated separately for each strategy, 
its results will differ because the model coeffi¬ 
cients are bound to differ between the broader 
universe and the narrower segment. What if 
the core model predicts GM will outperform 
Ford, while the value model shows the reverse? 
Should the investor start the day with multi¬ 
ple estimates of one stock's alpha? This would 
violate what we call the law of one alpha. 3 

Of course, the firm could ensure coherence by
using separate models for each market segment
(growth, value, small-cap) and linking the results
via a single, overarching model that relates all
the subsets. But the firm then runs into a second
problem with segmented investment approaches: To
the extent that the market is integrated, the
pricing of securities in one segment may contain
information relevant to pricing in other segments.

For example, within a generally well- 
integrated national economy, labor market con¬ 
ditions in the United States differ region by 
region. An economist attempting to model 
employment in the Northeast would proba¬ 
bly consider economic expansion in the South¬ 
east. Similarly, the investor who wants to model 
growth stocks should not ignore value stocks. 
The effects of inflation, say, on value stocks may 
have repercussions for growth stocks; after all, 
the two segments represent opposite ends of 
the same P/E continuum. 

An investment approach that concentrates on a
single market segment does not make use of all
available information. A complex, unified approach
considers all the stocks in the universe, value and
growth, large and small. It thus benefits from all
the information to be gleaned from a broad range of
stock price behavior.

Of course, an increase in breadth of inquiry 
will not benefit the investor if it comes at the 
sacrifice of depth of inquiry. A complex ap¬ 
proach does not ignore the significant differ¬ 
ences across different types of stock, differences 
exploitable by specialized investing. What's 
more, in examining similarities and differences 
across market segments, it considers numerous 
variables that may be considered to be defining. 

For value, say, a complex approach does not 
confine itself to a dividend discount model 
measure of value, but examines also earnings, 
cash flow, sales, and yield value, among other 
attributes. Growth measurements to be consid¬ 
ered include historical, expected, and sustain¬ 
able growth, as well as the momentum and 
stability of earnings. Share price, volatility, and 
analyst coverage are among the elements to be 
considered along with market capitalization as 
measures of size. 

At a deeper level of analysis, one must also 
consider alternative ways of specifying such 
fundamental variables as earnings or cash flow. 
Over what period does one measure earnings? 
If using analyst earnings expectations, which 
measure provides the best estimate of future 
real earnings? The consensus of all available es¬ 
timates made over the past six months, or only 
the very latest earnings estimates? Are some an¬ 
alysts more accurate or more influential? What 
if a recent estimate is not available for a given 
company? 4 

Predictor variables are often closely correlated
with each other. Small-cap stocks, for example,
tend to have low P/Es; low P/E is correlated with
high yield; both low P/E and high yield are
correlated with dividend discount model (DDM)
estimates of value. Furthermore, they may be
correlated with a stock's industry affiliation. A
simple low-P/E screen, for example, will tend to
select a large number of bank and utility stocks.
Such correlations can distort naive attempts to
relate returns to potentially relevant predictors.
A true picture of the return-predictor relationship
emerges only after disentangling the predictors.


DISENTANGLING 

The effects of different sources of stock return
can overlap. In Figure 1, the lines represent
connections documented by academic studies; they
may appear like a ball of yarn after the cat got to
it. To unravel the connections between predictor
variables and return, it is necessary to examine
all the variables simultaneously.

For instance, the low-P/E effect is widely
recognized, as is the small-size effect. But stocks
with low P/Es also tend to be of small size. Are
P/E and size merely two ways of looking at the same
effect? Or does each variable matter? Perhaps the
excess returns to small-cap stocks are merely a
January effect, reflecting the tendency of taxable
investors to sell depressed stocks at year-end.
Answering these questions requires disentangling
return effects via multivariate regression. 5

Common methods of measuring return effects (such
as quintiling or univariate, that is,
single-variable, regression) are naive because they
assume that prices are responding only to the
single variable under consideration, low P/E, say.
But a number of related variables may be affecting
returns. As we have noted, small-cap stocks and
banking and utility industry stocks tend to have
low P/Es. A univariate regression of return on low
P/E will capture, along with the effect of P/E, a
great deal of noise related to firm size, industry
affiliation, and other variables.

Figure 1 Return Effects Form a Tangled Web

Simultaneous analysis of all relevant variables
via multivariate regression takes into account and
adjusts for such interrelationships. The result is
the return to each variable separately, controlling
for all related variables. A multivariate analysis
for low P/E, for example, will provide a measure of
the excess return to a portfolio that is
market-like in all respects except for having a
lower-than-average P/E ratio. Disentangled returns
are "pure" returns.
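
The contrast between naive and pure returns can be sketched on synthetic data. The example below is an illustration only, not the authors' model: the universe size, the 0.6 link between low P/E and small size, and the "true" payoffs of 0.30 and 0.50 are all assumptions chosen for the demonstration, and a two-predictor regression stands in for the many-variable regression described in the text.

```python
import random

random.seed(0)
N_STOCKS = 3000  # hypothetical universe size

# Synthetic standardized exposures; low P/E is built to correlate with small size.
small_size = [random.gauss(0, 1) for _ in range(N_STOCKS)]
low_pe = [0.6 * s + 0.8 * random.gauss(0, 1) for s in small_size]

# Assumed "true" payoffs (% per month per unit of exposure) plus stock-specific noise.
TRUE_PE, TRUE_SIZE = 0.30, 0.50
returns = [TRUE_PE * p + TRUE_SIZE * s + random.gauss(0, 1)
           for p, s in zip(low_pe, small_size)]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def univariate_slope(y, x):
    """Naive estimate: regress y on a single predictor."""
    return cov(x, y) / cov(x, x)

def bivariate_slopes(y, x1, x2):
    """Pure estimates: solve the two-predictor normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

naive_pe = univariate_slope(returns, low_pe)
pure_pe, pure_size = bivariate_slopes(returns, low_pe, small_size)

print(f"naive return to low P/E:   {naive_pe:.2f}")
print(f"pure return to low P/E:    {pure_pe:.2f}")
print(f"pure return to small size: {pure_size:.2f}")
```

Under these assumptions the naive estimate should come out well above the pure one, because it absorbs part of the payoff to small size (roughly 0.6 x 0.50) in addition to the payoff to low P/E itself. The pure estimates should sit close to the assumed 0.30 and 0.50.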

Noise Reduction 

Figure 2 plots naive and pure cumulative 
monthly excess (relative to a 3,000-stock uni¬ 
verse) returns to high book-to-price ratio (B/P). 
(Conceptually, naive and pure returns come 
from a portfolio having a B/P that is one stan¬ 
dard deviation above the universe mean B/P; 
for the pure returns, the portfolio is also con¬ 
strained to have universe-average exposures to 
all the other variables in the model, including 
fundamental characteristics and industry affil¬ 
iations.) The naive returns show a great deal of 
volatility; the pure returns, by contrast, follow 
a much smoother path. There is a lot of noise in 
the naive returns. What causes it? 

Notice the divergence between the naive and 
pure return series for the 12 months starting in 
March 1979. This date coincides with the crisis 
at Three Mile Island nuclear power plant. Util¬ 
ities such as GPU, operator of the Three Mile 
Island power plant, tend to have high B/Ps, 
and naive B/P measures will reflect the perfor¬ 
mance of these utilities along with the perfor¬ 
mance of other high-B/P stocks. Electric utility 
prices plummeted 24% after the Three Mile Is¬ 
land crisis. The naive B/P measure reflects this 
decline. 

Figure 2 Naive and Pure Returns to High Book-to-Price Ratio


But industry-related events such as Three 
Mile Island have no necessary bearing on the 
B/P variable. An investor could, for example, 
hold a high-B/P portfolio that does not over¬ 
weight utilities, and such a portfolio would not 
have experienced the decline reflected in the 
naive B/P measure in Figure 2. The naive re¬ 
turns to B/P reflect noise from the inclusion of 
a utility industry effect. A pure B/P measure is 
not contaminated by such irrelevant variables. 

Disentangling distinguishes real effects from 
mere proxies and thereby distinguishes be¬ 
tween real and spurious investment opportu¬ 
nities. As it separates high B/P and industry 
affiliation, for example, it can also separate the 
effects of firm size from the effects of related 
variables. Disentangling shows that returns to 
small firms in January are not abnormal; the 
apparent January seasonal merely proxies for 
year-end tax-loss selling. 6 Not all small firms 
will benefit from a January rebound; indiscrim¬ 
inately buying small firms at the turn of the year 
is not an optimal investment strategy. Ascer¬ 
taining true causation leads to more profitable 
strategies. 

Return Revelation 

Disentangling can reveal hidden opportunities.
Figure 3 plots the naively measured cumulative
monthly excess returns (relative to the
3,000-stock universe) to portfolios that rank lower
than average in market capitalization and price per
share and higher than average in terms of analyst
neglect. These results derive from monthly
univariate regressions. The small-cap line thus
represents the cumulative excess returns to a
portfolio of stocks naively chosen on the basis of
their size, with no attempt made to control for
other variables.

All three return series move together. The sim¬ 
ilarity between the small-cap and neglect series 
is particularly striking. This is confirmed by the 
correlation coefficients in the first column of 
Table 1. Furthermore, all series show a great 
deal of volatility within a broader up, down, up 
pattern. 

Figure 4 shows the pure cumulative monthly excess
returns to each size-related attribute over the
period. These disentangled returns adjust for
correlations not only between the three size
variables, but also between each size variable and
industry affiliations and each variable and growth
and value characteristics. Two findings are
immediately apparent from Figure 4.

Table 1 Correlations Between Monthly Returns to
Size-Related Variables*

Variable               Naive    Pure
Small cap/low price     0.82   -0.12
Small cap/neglect       0.87   -0.22
Neglect/low price       0.66   -0.11

* A coefficient of 0.14 is significant at the 5% level.

Figure 3 Naive Returns Can Hide Opportunities (Three Size-Related Variables)

First, pure returns to the size variables do not
appear to be nearly as closely correlated as the
naive returns displayed in Figure 3. In fact, over
the second half of the period, the three return
series diverge substantially. This is confirmed by
the correlation coefficients in the second column
of Table 1.

In particular, pure returns to small
capitalization accumulate quite a gain over the
period; they are up 30%, versus a gain of only 20%
for the naive returns to small cap. Purifying
returns reveals a profit opportunity not apparent
in the naive returns. Furthermore, pure returns to
analyst neglect amount to a substantial loss over
the period. Because disentangling controls for
proxy effects, and thereby avoids redundancies,
these pure return effects are additive. A portfolio
could have aimed for superior returns by selecting
small-cap stocks with a higher-than-average analyst
following (that is, a negative exposure to analyst
neglect).

Figure 4 Pure Returns Can Reveal Opportunities (Three Size-Related Variables)

Table 2 Pure Returns Are Less Volatile, More
Predictable: Standard Deviations of Monthly Returns
to Size-Related Variables*

Variable     Naive    Pure
Small cap    0.87     0.60
Neglect      0.87     0.67
Low price    1.03     0.58

* All differences between naive and pure return standard
deviations are significant at the 1% level.


Second, the pure returns appear to be much 
less volatile than the naive returns. The naive re¬ 
turns in Figure 3 display much month-to-month 
volatility within their more general trends. By 
contrast, the pure series in Figure 4 are much 
smoother and more consistent. This is con¬ 
firmed by the standard deviations given in 
Table 2. 

The pure returns in Figure 4 are smoother and 
more consistent than the naive return responses 
in Figure 3 because the pure returns capture 
more signal and less noise. And because they 
are smoother and more consistent than naive 
returns, pure returns are also more predictive. 

Predictive Power 

Disentangling improves the predictive power 
of estimated returns by providing a clearer 
picture of the relationships between investor 
behavior, fundamental variables, and macro- 
economic conditions. For example, investors 
often prefer value stocks in bearish market en¬ 
vironments, because growth stocks are priced 
more on the basis of high expectations, which 
get dashed in more pessimistic eras. But the 
success of such a strategy will depend on the 
variables one has chosen to define value. 

Table 3 displays the results of regressing both
naive and pure monthly returns to various
value-related variables on market (S&P 500) returns
over the 1978-1996 period. 7 The results indicate
that DDM value is a poor indicator of a stock's
ability to withstand a tide of receding market
prices. The regression coefficient in the first
column indicates that a portfolio with a
one-standard-deviation exposure to DDM value will
tend to outperform by 0.06% when the market rises
by 1.00% and to underperform by a similar margin
when the market falls by 1.00%. The coefficient for
pure returns to DDM is similar. Whether their
returns are measured in pure or naive form, stocks
with high DDM values tend to behave procyclically.

Table 3 Market Sensitivities of Monthly Returns to
Value-Related Variables

Variable    Naive    (t-stat.)    Pure     (t-stat.)
DDM          0.06     (5.4)        0.04     (5.6)
B/P         -0.10    (-6.2)       -0.01    (-0.8)
Yield       -0.08    (-7.4)       -0.03    (-3.5)

High B/P appears to be a better indicator of a
defensive stock. It has a regression coefficient of
-0.10 in naive form. In pure form, however, B/P is
virtually uncorrelated with market movements; pure
B/P signals neither an aggressive nor a defensive
stock. B/P as naively measured apparently picks up
the effects of truly defensive variables, such as
high yield.

The value investor in search of a defensive
posture in uncertain market climates should
consider moving toward high yield. The regression
coefficients for both naive and pure returns to
high yield indicate significant negative market
sensitivities. Stocks with high yields may be
expected to lag in up markets but to hold up
relatively well during general market declines.
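
The coefficients in Table 3 can be read as simple multipliers on the market's return. The sketch below uses the table's values as given; the 5% market move and the function name are illustrative assumptions, and the regression intercepts (not reported in the table) are ignored, so only the market-related component of return is shown.

```python
# Regression coefficients from Table 3: sensitivity of monthly returns to
# each value-related variable with respect to S&P 500 returns.
MARKET_SENSITIVITY = {
    "DDM":   {"naive": 0.06,  "pure": 0.04},
    "B/P":   {"naive": -0.10, "pure": -0.01},
    "Yield": {"naive": -0.08, "pure": -0.03},
}

def market_related_return(variable, form, market_return_pct):
    """Market-related relative return (%) for a portfolio with a
    one-standard-deviation exposure to `variable` (intercept ignored)."""
    return MARKET_SENSITIVITY[variable][form] * market_return_pct

# Hypothetical scenario: the market moves 5% up or down in a month.
for variable in MARKET_SENSITIVITY:
    up = market_related_return(variable, "pure", 5.0)
    down = market_related_return(variable, "pure", -5.0)
    print(f"{variable:5s} pure: {up:+.2f}% in a +5% market, {down:+.2f}% in a -5% market")
```

A positive coefficient (DDM) marks procyclical behavior, a negative one (yield) marks defensive behavior, and a coefficient near zero (pure B/P) marks neither.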

These results make broad intuitive sense. 
DDM is forward-looking, relying on estimates 
of future earnings. In bull markets, investors 
take a long-term outlook, so DDM explains 
security pricing behavior. In bear markets, how¬ 
ever, investors become myopic; they prefer to¬ 
day's tangible income to tomorrow's promise. 
Current yield is rewarded. 

Pure returns respond in intuitively satisfying ways
to macroeconomic events. Figure 5 illustrates, as
an example, the estimated effects of changes in
various macroeconomic variables on the pure returns
to small size (as measured by market
capitalization). Consistent with the capital
constraints on small firms and their relatively
greater sensitivity to the economy, pure returns to
small size may be expected to be negative in the
first four months following an unexpected increase
in the Baa corporate rate and positive in the first
month following an unexpected increase in
industrial production. 8 Investors can exploit such
predictable behavior by moving into and out of the
small-cap market segment as economic conditions
evolve. 9

Figure 5 Forecast Response of Small Size to Macroeconomic Shocks

These examples serve to illustrate that the use of
numerous, finely defined fundamental variables can
provide a rich representation of the complexity of
security pricing. The model can be even more finely
tuned, however, by including variables that capture
such subtleties as the effects of investor
psychology, possible nonlinearities in
variable-return relationships, and security
transaction costs.

Additional Complexities 

In considering possible variables for inclusion 
in a model of stock price behavior, the in¬ 
vestor should recognize that pure stock returns 
are driven by a combination of economic fun¬ 
damentals and investor psychology. That is, 
economic fundamentals such as interest rates, 
industrial production, and inflation can explain 
much, but by no means all, of the system¬ 
atic variation in returns. Psychology, including 
investors' tendency to overreact, their desire 
to seek safety in numbers, and their selective 
memories, also plays a role in security pricing. 

What's more, the modeler should realize that the
effects of different variables, fundamental and
otherwise, can differ across different types of
stocks. The value sector, for example, includes
more financial stocks than the growth sector.
Investors may thus expect value stocks in general
to be more sensitive than growth stocks to changes
in interest rate spreads.

Psychologically based variables such as short¬ 
term overreaction and price correction also 
seem to have a stronger effect on value than 
on growth stocks. Earnings surprises and earn¬ 
ings estimate revisions, by contrast, appear to 
be more important for growth than for value 
stocks. Thus, Google shares can take a nosedive 
when earnings come in a penny under expecta¬ 
tions, whereas Duke Energy shares remain un¬ 
moved even by fairly substantial departures of 
actual earnings from expectations. 

The relationship between stock returns and 
relevant variables may not be linear. The ef¬ 
fects of positive earnings surprises, for in¬ 
stance, tend to be arbitraged away quickly; 
thus positive earnings surprises offer less 
opportunity for the investor. The effects of neg¬ 
ative earnings surprises, however, appear to be 
more long-lasting. This nonlinearity may reflect 
the fact that sales of stock are limited to those 
investors who already own the stock (and to a 
relatively small number of short-sellers). 10 

Risk-variable relationships may also differ 
across different types of stock. In particular, 
small-cap stocks generally have more idiosyn¬ 
cratic risk than large-cap stocks. Diversification 
is thus more important for small-stock than for 
large-stock portfolios. 

Return-variable relationships can also change 
over time. Recall the difference between DDM 
and yield value measures: high-DDM stocks 
tend to have high returns in bull markets and 
low returns in bear markets; high-yield stocks 
experience the reverse. For consistency of per¬ 
formance, return modeling must consider the 
effects of market dynamics, the changing na¬ 
ture of the overall market. 

The investor may also want to decipher the
informational signals generated by informed agents.
Corporate decisions to issue or buy back shares,
split stock, or initiate or suspend dividends, for
example, may contain valuable information about
company prospects. So, too, may insiders' (legal)
trading in their own firms' shares.

Finally, a complex model containing multiple
variables is likely to turn up a number of
promising return-variable relationships. But are
these perceived profit opportunities translatable
into real economic opportunities? Are some too
ephemeral? Too small to survive frictions such as
trading costs? Estimates of expected returns must
be combined with estimates of the costs of trading
to arrive at realistic returns net of trading
costs.
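
A minimal sketch of this netting step follows. The opportunity names, forecast returns, and cost figures are hypothetical, and a single round-trip cost number stands in for what would in practice be a fuller trading cost model.

```python
from dataclasses import dataclass

@dataclass
class Opportunity:
    name: str
    expected_excess_return_pct: float  # model forecast over the holding period
    round_trip_cost_pct: float         # assumed cost to buy and later sell

    @property
    def net_return_pct(self) -> float:
        """Forecast return after subtracting the estimated cost of trading."""
        return self.expected_excess_return_pct - self.round_trip_cost_pct

# Hypothetical candidates: the second looks better gross but worse net.
candidates = [
    Opportunity("liquid large-cap signal", 0.80, 0.30),
    Opportunity("illiquid small-cap signal", 1.20, 1.50),
]

worth_trading = [c.name for c in candidates if c.net_return_pct > 0]

for c in candidates:
    print(f"{c.name}: gross {c.expected_excess_return_pct:.2f}%, "
          f"net {c.net_return_pct:+.2f}%")
print("worth trading:", worth_trading)
```

In this made-up case the opportunity with the higher gross forecast is the one that fails to survive trading costs, which is the situation the questions above are meant to catch.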

CONSTRUCTING, TRADING, 
AND EVALUATING 
PORTFOLIOS 

To maximize implementation of the model's 
insights, the portfolio construction process 
should consider exactly the same dimensions 
found relevant by the stock selection model. 
Failure to do so can lead to mismatches between 
model insights and portfolio exposures. 11 

Consider a commercially available portfolio 
optimizer that recognizes only a subset of the 
variables in the valuation model. Risk reduc¬ 
tion using such an optimizer will reduce the 
portfolio's exposures only along the dimen¬ 
sions the optimizer recognizes. As a result, the 
portfolio is likely to wind up more exposed to 
those variables recognized by the model, but 
not the optimizer, and less exposed to those 
variables common to both the model and the 
optimizer. 

Imagine an investor who seeks low-P/E stocks that
analysts are recommending for purchase, but who
uses a commercial optimizer that incorporates a P/E
factor but not analyst recommendations. The
investor is likely to wind up with a portfolio that
has a less-than-optimal level of exposure to low
P/E and a greater-than-optimal level of exposure to
analyst purchase recommendations. Optimization
using all relevant variables ensures a portfolio
whose risk and return opportunities are balanced in
accordance with the model's insights. Furthermore,
the use of more numerous variables allows portfolio
risk to be more finely tuned.

Insofar as the investment process, both stock 
selection and portfolio construction, is model- 
driven, it is more adaptable to electronic trading 
venues. This should benefit the investor in sev¬ 
eral ways. First, electronic trading is generally 
less costly, with lower commissions, market im¬ 
pact, and opportunity costs. Second, it allows 
real-time monitoring, which can further reduce 
trading costs. Third, an automated trading sys¬ 
tem can take account of more factors, including 
the urgency of a particular trade and market 
conditions, than individual traders can be ex¬ 
pected to bear in mind. 

Finally, the performance attribution process 
should be congruent with the dimensions of the 
selection model (and portfolio optimizer). Inso¬ 
far as performance attribution identifies sources 
of return, a process that considers all the sources 
identified by the selection model will be more 
insightful than a commercial performance at¬ 
tribution system applied in a one-size-fits-all 
manner. Our investor who has sought exposure 
to low P/E and positive analyst recommenda¬ 
tions, for example, will want to know how each 
of these factors has paid off and will be less in¬ 
terested in the returns to factors that are not a 
part of the stock selection process. 

A performance evaluation process tailored to the
model also functions as a monitor of the model's
reliability. Has portfolio performance supported
the model's insights? Should some be reexamined?
Equally important, does the model's reliability
hold up over time? A model that performs well in
today's economic and market environments may not
necessarily perform well in the future. A feedback
loop between the evaluation and the
research/modeling processes can help ensure that
the model retains robustness over time.


PROFITING FROM 
COMPLEXITY 

H. L. Mencken is supposed to have noted, "For 
every complex problem, there is a simple solu¬ 
tion, and it is almost always wrong." Complex 
problems more often than not require complex 
solutions. 

A complex approach to stock selection, port¬ 
folio construction, and performance evaluation 
is needed to capture the complexities of the 
stock market. Such an approach combines the 
breadth of coverage and the depth of analy¬ 
sis needed to maximize investment opportunity 
and potential reward. 

Grinold presents a formula that identi¬ 
fies the relationships between the depth and 
breadth of investment insights and investment 
performance: 12 

$$\mathrm{IR} = \mathrm{IC}\,\sqrt{\mathrm{BR}}$$

IR is the manager's information ratio, a measure
of the success of the investment process. IR equals
annualized excess return over annualized residual
risk (e.g., 2% excess return with 4% tracking error
provides 0.5 IR). IC, the information coefficient,
or correlation between predicted and actual
security returns, measures the goodness of the
manager's insights, or the manager's skill. BR is
the breadth of the strategy, measurable as the
number of independent insights upon which
investment decisions are made.
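
Reading Grinold's relation as IR = IC x sqrt(BR), the trade-off between skill and breadth can be worked through numerically. The function names and the IC and BR values below are illustrative assumptions; only the 2% excess return and 4% tracking error come from the text.

```python
import math

def realized_ir(excess_return_pct, tracking_error_pct):
    """Information ratio from annualized excess return and residual risk."""
    return excess_return_pct / tracking_error_pct

def information_ratio(ic, breadth):
    """Grinold's relation: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)

def breadth_needed(target_ir, ic):
    """Number of independent insights required to reach target_ir at a given IC."""
    return (target_ir / ic) ** 2

# Example from the text: 2% excess return with 4% tracking error.
print(f"realized IR: {realized_ir(2.0, 4.0):.2f}")

# Two hypothetical routes to the same IR: more skill or more breadth.
print(f"IC 0.10, BR  25 -> IR {information_ratio(0.10, 25):.2f}")
print(f"IC 0.05, BR 100 -> IR {information_ratio(0.05, 100):.2f}")
print(f"breadth needed for IR 0.5 at IC 0.05: {breadth_needed(0.5, 0.05):.0f}")
```

Because breadth enters under a square root, halving the information coefficient requires four times as many independent insights to hold IR constant.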

One can increase IR by increasing IC or BR.
Increasing IC means coming up with some means of
improving predictive accuracy. Increasing BR means
coming up with more "investable" insights. A casino
analogy may be apt (if anathema to prudent
investors).

A gambler can seek to increase IC by card counting
in blackjack or by building a computer model to
predict probable roulette outcomes. Similarly, some
investors seek to outperform by concentrating their
research efforts on a few stocks: by learning all
there is to know about Microsoft, for example, one
may be able to outperform all the other investors
who follow this stock. But a strategy that makes a
few concentrated stock bets is likely to produce
consistent performance only if it is based on a
very high level of skill, or if it benefits from
extraordinary luck.

Alternatively, an investor can place a larger 
number of smaller stock bets and settle for more 
modest returns from a greater number of invest¬ 
ment decisions. That is, rather than behaving 
like a gambler in a casino, the investor can be¬ 
have like the casino. A casino has only a slight 
edge on any spin of the roulette wheel or roll 
of the dice, but many spins of many roulette 
wheels can result in a very consistent profit for 
the house. Over time, the odds will strongly 
favor the casino over the gambler. 
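
The casino's advantage from breadth can be illustrated with a short calculation. The sketch assumes an even-money bet on a single-zero (European) roulette wheel, where the house wins 19 of 37 pockets, and uses a normal approximation to the distribution of the house's total profit; both are simplifying assumptions made for the illustration.

```python
import math

# Assumed game: even-money bet on a single-zero wheel (house wins 19 of 37).
P_HOUSE_WINS = 19 / 37
EDGE = P_HOUSE_WINS - (1 - P_HOUSE_WINS)  # expected house profit per unit bet
STDEV = math.sqrt(1 - EDGE ** 2)          # per-bet standard deviation of +/-1 payoff

def edge_to_risk(n_bets):
    """Expected total profit divided by its standard deviation after n bets."""
    return EDGE * math.sqrt(n_bets) / STDEV

def prob_house_ahead(n_bets):
    """Normal approximation to P(house profit > 0) after n independent unit bets."""
    z = edge_to_risk(n_bets)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for n in (1, 100, 10_000, 1_000_000):
    print(f"{n:>9,} bets: edge/risk = {edge_to_risk(n):6.2f}, "
          f"P(house ahead) ~ {prob_house_ahead(n):.3f}")
```

The per-bet edge never changes, but the ratio of expected profit to risk grows with the square root of the number of bets, the same square-root dependence on breadth that appears in the information ratio relation.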

A complex approach to the equity market, one 
that has both breadth of inquiry and depth of fo¬ 
cus, can enhance the number and the goodness 
of investment insights. A complex approach to 
the equity market requires more time, effort, 
and ability, but it will be better positioned to 
capture the complexities of security pricing. The 
rewards are worth the effort. 

KEY POINTS 

• Ordered systems are definable and predictable by
relatively simple rules; random systems cannot be
modeled and are inherently unpredictable; complex
systems can be at least partly comprehended and
modeled, but only with difficulty.

• Stock price behavior is permeated by a complex
web of interrelated return effects, and it requires
a complex approach to stock selection, portfolio
construction, and performance evaluation to capture
this complexity.

• A complex approach combines the breadth of 
coverage and the depth of analysis needed 
to maximize investment opportunity and po¬ 
tential reward. 

• Simple methods of measuring return effects (such
as quintiling or univariate, single-variable
regression) are naive because they assume that
prices are responding only to the single variable
under consideration.

• Simultaneous analysis of all relevant vari¬ 
ables via multivariate regression takes into 
account and adjusts for interrelationships be¬ 
tween effects, giving the return to each vari¬ 
able separately. 

• Disentangling distinguishes real effects from 
mere proxies and thereby distinguishes be¬ 
tween real and spurious investment opportu¬ 
nities. 

• Because disentangling controls for proxy ef¬ 
fects, pure return effects are additive, each 
having the potential to improve portfolio 
performance. 

• In general, disentangling enhances the pre¬ 
dictive power of estimated returns by pro¬ 
viding a clearer picture of the relationships 
between investor behavior, fundamental vari¬ 
ables, and macroeconomic conditions. 

• To maximize implementation of insights 
gained from disentangling the market's com¬ 
plexity, the portfolio construction process 
should consider exactly the same dimensions 
found relevant by the stock selection process. 

• Performance attribution should be congruent 
with the stock selection and portfolio con¬ 
struction processes so that it can be used to 
monitor the reliability of the stock selection 
process and provide input for research. 


NOTES 

1. See Pagels (1988) and Wolfram (2002). 

2. Jacobs and Levy (1989a). 

3. See Jacobs and Levy (1995b). 

4. See Jacobs, Levy, and Krask (1997). 

5. See Jacobs and Levy (1988b). 

6. Jacobs and Levy (1988a). 

7. Jacobs and Levy (1988c). 

8. See Jacobs and Levy (1989b). 

9. Jacobs and Levy (1996). 

10. See Jacobs and Levy (1993). 

11. See Jacobs and Levy (1995a). 

12. Grinold (1989). 

Abstract: Quantitative equity portfolio selection often involves extending the classical mean-
variance framework or more advanced tail-risk portfolio allocation frameworks to include dif¬ 
ferent constraints that take specific investment guidelines and institutional features into account. 
Examples of such constraints are holding constraints that set limits on the total concentration of 
assets in an industry, sector, or country; turnover constraints that restrict the amount of trad¬ 
ing; tracking error constraints that limit the difference between the performance of the port¬ 
folio and a benchmark; and risk factor constraints that limit the exposure of the portfolio to 
a risk factor such as the market. Portfolio allocation models can also account for transaction 
costs, taxes, and optimization of trades across multiple client accounts. An important practi¬ 
cal issue in quantitative equity portfolio selection is how to mitigate the effect of model and 
estimation errors on the optimal allocation. Techniques that are used to address this issue in¬ 
clude robust statistical techniques for parameter estimation, portfolio resampling, and robust 
optimization. 


An integrated investment process generally
involves the following activities: 1

1. An investor's objectives, preferences, and 
constraints are identified and specified to de¬ 
velop explicit investment policies. 

2. Strategies are developed and implemented 
through the choice of optimal combinations 
of financial and real assets in the market¬ 
place. 

3. Market conditions, relative asset values, and 
the investor's circumstances are monitored. 


4. Portfolio adjustments are made as appropri¬ 
ate to reflect significant changes in any or all 
of the relevant variables. 

In this entry we focus on the second activity of the investment process, developing and implementing a portfolio strategy. The development of the portfolio strategy itself is typically done in two stages: First, funds are allocated among asset classes. Then, they are managed within the asset classes. The mean-variance framework is used at both stages, but in this entry, we discuss the second stage. Specifically, we introduce quantitative formulations of portfolio allocation problems used in equity portfolio management. Quantitative equity portfolio selection often involves extending the classical mean-variance framework or more advanced tail-risk portfolio allocation frameworks to include different constraints that take specific investment guidelines and institutional features into account.

We begin by providing a classification of the most common portfolio constraints used in practice. We then discuss extensions such as index tracking formulations, the inclusion of transaction costs, optimization of trades across multiple client accounts, and tax-aware strategies. We conclude with a review of methods for incorporating robustness in quantitative portfolio allocation procedures by using robust statistics, simulation, and robust optimization techniques.


PORTFOLIO CONSTRAINTS 
COMMONLY USED IN 
PRACTICE 

Institutional features and investment policy specifications often lead to more complicated requirements than simple minimization of risk (whatever the definition of risk may be) or maximization of expected portfolio return. For instance, there can be constraints that limit the number of trades, the exposure to a specific industry, or the number of stocks to be kept in the portfolio. Some of these constraints are imposed by the clients, while others are imposed by regulators. For example, in the case of regulated investment companies, restrictions on asset allocation are set forth in the prospectus and may be changed only with the approval of the fund's board of directors. Pension funds must comply with Employee Retirement Income Security Act (ERISA) requirements. The objective of the portfolio optimization problem can also be modified to consider specifically the trade-off between risk and return, transaction costs, or taxes.

In this section, we will take a single-period view of investing, in the sense that the goal of the portfolio allocation procedure will be to invest optimally over a single predetermined period of interest, such as one month. 2 We will use $\mathbf{w}_0$ to denote the vector array of stock weights in the portfolio at the beginning of the period, and $\mathbf{w}$ to denote the weights at the end of the period (to be determined).

Many investment companies, especially institutional investors, have a long investment horizon. However, in reality, they treat that horizon as a sequence of shorter period horizons. Risk budgets are often stated over a time period of a year, and return performance is monitored quarterly or monthly.

Long-Only (No-Short-Selling) 
Constraints 

Many funds and institutional investors face restrictions or outright prohibitions on the amount of short selling they can do. When short selling is not allowed, the portfolio allocation optimization model contains the constraints $\mathbf{w} \ge 0$.


Holding Constraints 

Diversification principles argue against investing a large proportion of the portfolio in a single asset, or having a large concentration of assets in a specific industry, sector, or country. Limits on the holdings of a specific stock can be imposed with the constraints

$$\mathbf{l} \le \mathbf{w} \le \mathbf{u}$$

where $\mathbf{l}$ and $\mathbf{u}$ are vectors of lower and upper bounds of the holdings of each stock in the portfolio.

Consider now a portfolio of 10 stocks. Suppose that the issuers of assets 1, 3, and 5 are in the same industry, and that we would like to limit the portfolio exposure to that industry to be at least 20% but at most 40%. To limit exposure to that industry, we add the constraint

$$0.20 \le w_1 + w_3 + w_5 \le 0.40$$

to the portfolio allocation optimization problem.

More generally, if we have a specific set of stocks $I_j$ out of the investment universe $I$ consisting of stocks in the same category (such as industry or country), we can write the constraint

$$L_j \le \sum_{i \in I_j} w_i \le U_j$$

In words, this constraint requires that the sum of all stock weights in the particular category of investments with indexes $I_j$ is greater than or equal to a lower bound $L_j$ and less than or equal to a maximum exposure of $U_j$.
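The long-only, per-stock, and category holding constraints above can be checked for a candidate weight vector with a few lines of code. The following is a minimal sketch in plain Python, not a solver formulation; the function name `check_holdings`, the 15% per-stock cap, and the equal-weight example are illustrative assumptions.

```python
def check_holdings(w, lower, upper, groups, tol=1e-12):
    """Return a list of violated holding constraints (empty if all hold).

    w, lower, upper : lists of weights and per-stock bounds (l <= w <= u).
    groups          : dict mapping a label to (indices, L_j, U_j), where
                      indices are 0-based positions of the stocks in I_j.
    """
    violations = []
    for i, (wi, lo, hi) in enumerate(zip(w, lower, upper)):
        if wi < lo - tol or wi > hi + tol:
            violations.append(f"stock {i}: weight {wi:.4f} outside [{lo}, {hi}]")
    for label, (indices, L, U) in groups.items():
        exposure = sum(w[i] for i in indices)
        if exposure < L - tol or exposure > U + tol:
            violations.append(f"{label}: exposure {exposure:.4f} outside [{L}, {U}]")
    return violations


# Ten stocks; assets 1, 3, and 5 (0-based positions 0, 2, 4) share an industry
# whose total weight must lie between 20% and 40%.
groups = {"industry": ([0, 2, 4], 0.20, 0.40)}
lower, upper = [0.0] * 10, [0.15] * 10   # long-only with a hypothetical 15% cap

equal_weight = [0.10] * 10
concentrated = [0.20, 0.0, 0.20, 0.0, 0.20, 0.08, 0.08, 0.08, 0.08, 0.08]

print(check_holdings(equal_weight, lower, upper, groups))   # []
for message in check_holdings(concentrated, lower, upper, groups):
    print(message)
```

In an actual optimization model these conditions would be passed to the solver as linear constraints rather than checked after the fact.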

Turnover Constraints 

High portfolio turnover can result in large transaction costs that make portfolio rebalancing inefficient and costly. Thus, some portfolio managers limit the amount of turnover allowed when trading their portfolio. (Another way to control for transaction costs is to minimize them explicitly; we will discuss the appropriate formulations later in this entry.)

Most commonly, turnover constraints are imposed for each stock:

$$|w_i - w_{0,i}| \le u_i,$$

that is, the absolute magnitude of the difference between the final and the initial weight of stock i in the portfolio is restricted to be less than or equal to some upper bound $u_i$. Sometimes, a constraint is imposed to limit the portfolio turnover as a whole:

$$\sum_{i \in I_j} |w_i - w_{0,i}| \le U_j$$

that is, the total absolute difference between the initial and the final weights of the stocks in the portfolio is restricted to be less than or equal to an upper bound $U_j$. Under this constraint, some stock weights may deviate a lot more than others from their initial weights, but the total deviation is limited.

Turnover constraints are often imposed relative to the average daily volume (ADV) of a stock. 3 For example, we may want to restrict turnover to be no more than 5% of the ADV. (In the latter case, the upper bound $u_i$ is set to a value equal to 5% of the ADV.) Modifications of these constraints, such as limiting turnover in a specific industry or sector, are also frequently applied.
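As an illustration of the two turnover constraints, the sketch below computes the per-stock trades $|w_i - w_{0,i}|$ and checks them against per-stock bounds and against a bound on the total. It is a plain-Python sketch under assumed names (`turnover`, `satisfies_turnover`) and made-up weights, not a production rebalancing routine.

```python
def turnover(w, w0):
    """Per-stock absolute weight changes |w_i - w_{0,i}|."""
    return [abs(a - b) for a, b in zip(w, w0)]


def satisfies_turnover(w, w0, u=None, U_total=None, tol=1e-12):
    """Check per-stock bounds u_i and/or a bound U_total on the summed turnover."""
    trades = turnover(w, w0)
    per_stock_ok = u is None or all(t <= ui + tol for t, ui in zip(trades, u))
    total_ok = U_total is None or sum(trades) <= U_total + tol
    return per_stock_ok and total_ok


w0 = [0.25, 0.25, 0.25, 0.25]          # weights at the beginning of the period
w = [0.30, 0.20, 0.25, 0.25]           # proposed new weights

print(turnover(w, w0))
print(satisfies_turnover(w, w0, u=[0.05] * 4))    # each trade is within 5%
print(satisfies_turnover(w, w0, U_total=0.08))    # total of about 10% exceeds 8%
```

The example shows that a rebalancing can satisfy every per-stock limit and still violate a limit on total turnover, which is why the two constraints are not interchangeable.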


Risk Factor Constraints 

In practice, it is very common for quantitatively oriented portfolio managers to use factor models to control for risk exposures to different risk factors. Such risk factors could include the market return, size, and style. Let us assume that the return on stock i has a factor structure with K risk factors, that is, it can be expressed through the equality

$$r_i = \alpha_i + \sum_{k=1}^{K} \beta_{ik} \cdot f_k + \epsilon_i$$

The factors $f_k$ are common to all securities. The coefficient $\beta_{ik}$ in front of each factor $f_k$ shows the sensitivity of the return on stock i to factor k. The value of $\alpha_i$ shows the expected excess return on stock i, and $\epsilon_i$ is the idiosyncratic (also called "nonsystematic") part of the return of stock i. The coefficients $\alpha_i$ and $\beta_{ik}$ are typically estimated by multiple regression analysis.

To limit the exposure of a portfolio of N stocks to the kth risk factor, we impose the constraint

$$\sum_{i=1}^{N} \beta_{ik} \cdot w_i \le U_k$$

To understand this constraint, note that the total return on the portfolio can be written as

$$\sum_{i=1}^{N} w_i r_i = \sum_{i=1}^{N} w_i \cdot \left( \alpha_i + \sum_{k=1}^{K} \beta_{ik} \cdot f_k + \epsilon_i \right) = \sum_{i=1}^{N} w_i \alpha_i + \sum_{k=1}^{K} \left( f_k \cdot \sum_{i=1}^{N} w_i \beta_{ik} \right) + \sum_{i=1}^{N} w_i \epsilon_i$$

The sensitivity of the portfolio to the different factors is represented by the second term, which can be also written as

$$\sum_{k=1}^{K} \left( \left( \sum_{i=1}^{N} w_i \beta_{ik} \right) \cdot f_k \right)$$

Therefore, the exposure to a particular factor k is the coefficient in front of $f_k$, that is,

$$\sum_{i=1}^{N} \beta_{ik} \cdot w_i$$


On an intuitive level, the sensitivity of the portfolio to a factor k will be larger the larger the presence of factor k in the portfolio through the exposure of the individual stocks. Thus, when we compute the total exposure of the portfolio to factor k, we need to take into consideration both how important this factor is for determining the return on each of the securities in the portfolio, and how much of each security we have in the portfolio.
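The portfolio's exposure to factor k is simply the weighted sum of the individual stock sensitivities. A minimal sketch of that calculation follows; the function name `factor_exposures` and the beta values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def factor_exposures(w, beta):
    """Portfolio exposure to each factor: sum_i beta[i][k] * w[i].

    w    : list of N portfolio weights.
    beta : N x K nested list; beta[i][k] is the sensitivity of stock i to factor k.
    """
    n_factors = len(beta[0])
    return [sum(w[i] * beta[i][k] for i in range(len(w))) for k in range(n_factors)]


# Three stocks, two factors (say, market and size); the betas are made up.
w = [0.5, 0.3, 0.2]
beta = [[1.2, 0.5],
        [0.8, -0.2],
        [1.0, 0.1]]
exposures = factor_exposures(w, beta)
print(exposures)   # approximately [1.04, 0.21]

# A long-short portfolio with equal betas has zero exposure to the factor.
neutral = factor_exposures([0.5, -0.5], [[1.0], [1.0]])
print(neutral)
```

A bound $U_k$ on the kth entry of the returned list is a linear constraint in the weights, so it adds little computational burden to the allocation problem.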

A commonly used version of the maximum factor exposure constraint is

$$\sum_{i=1}^{N} \beta_{ik} \cdot w_i = 0$$

This constraint forces the portfolio optimization algorithm to find portfolio weights so that the overall risk exposure to factor k is 0, that is, so that the portfolio is neutral with respect to changes in factor k. Portfolio allocation strategies that claim to be "market-neutral" typically employ this constraint, and the factor is in fact the return on the market.


Cardinality Constraints

Depending on the portfolio allocation model used, sometimes the optimization subroutine recommends holding small amounts of a large number of stocks, which can be costly when one takes into consideration the transaction costs incurred when acquiring these positions. Alternatively, a portfolio manager may be interested in limiting the number of stocks used to track a particular index. (We will discuss index tracking later in this entry.) To formulate the constraint on the number of stocks to be held in the portfolio (called the cardinality constraint), we introduce binary variables, one for each of the N stocks in the portfolio. Let us call these binary variables $\delta_1, \ldots, \delta_N$. Variable $\delta_i$ will take value 1 if stock i is included in the portfolio, and 0 otherwise.

Suppose that out of the N stocks in the investment universe, we would like to include a maximum of K stocks in the final portfolio. K here is a positive integer and is less than N. This constraint can be formulated as

$$\sum_{i=1}^{N} \delta_i \le K$$

$$\delta_i \text{ binary}, \quad i = 1, \ldots, N$$

We need to make sure, however, that if a stock is not selected in the portfolio, then the binary variable that corresponds to that stock is set to 0, so that the stock is not counted as one of the K stocks left in the portfolio. When the portfolio weights are restricted to be nonnegative, this can be achieved by imposing the additional constraints

$$0 \le w_i \le \delta_i, \quad i = 1, \ldots, N$$


If the optimal weight for stock i turns out to be different from 0, then the binary variable $\delta_i$ associated with stock i is forced to take value 1, and stock i will be counted as one of the K stocks to be kept in the portfolio. If the optimal weight for stock i is 0, then the binary variable $\delta_i$ associated with stock i can be either 0 or 1, but that will not matter for all practical purposes, because the solver will set it to 0 if there are too many other attractive stocks that will be counted as the K stocks to be kept in the portfolio. At the same time, since the portfolio weights $w_i$ are between 0 and 1, and $\delta_i$ is 0 or 1, the constraint $w_i \le \delta_i$ does not restrict the values that the stock weight $w_i$ can take.

The constraints are a little different if short sales are allowed, in which case the weights may be negative. We have

$$-M \cdot \delta_i \le w_i \le M \cdot \delta_i, \quad i = 1, \ldots, N$$

where M is a "large" constant (large relative to the size of the inputs in the problem; so in this portfolio optimization application M = 10 can be considered "large"). You can observe that if the weight $w_i$ is anything but 0, the value of the binary variable $\delta_i$ will be forced to be different from 0, that is, $\delta_i$ will need to be 1, since it can only take values 0 or 1.
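To make the role of the binary variables concrete, the sketch below checks whether a pair of weights and indicators satisfies the cardinality constraint together with the "big-M" linking constraints. It only verifies feasibility of a given candidate; finding the optimal portfolio still requires a mixed-integer solver. The function names and the example weights are assumptions made for illustration.

```python
def cardinality_feasible(w, delta, K, M=10.0, tol=1e-12):
    """Check sum(delta) <= K, delta binary, and -M*delta_i <= w_i <= M*delta_i."""
    if any(d not in (0, 1) for d in delta):
        return False
    if sum(delta) > K:
        return False
    return all(-M * d - tol <= wi <= M * d + tol for wi, d in zip(w, delta))


def implied_delta(w, tol=1e-12):
    """Smallest indicator vector consistent with the weights: 1 where w_i != 0."""
    return [1 if abs(wi) > tol else 0 for wi in w]


w = [0.6, 0.0, -0.2, 0.6]        # a long-short portfolio with one short position
delta = implied_delta(w)
print(delta)                                       # [1, 0, 1, 1]
print(cardinality_feasible(w, delta, K=3))         # True
print(cardinality_feasible(w, delta, K=2))         # False: three stocks are held
print(cardinality_feasible(w, [1, 0, 0, 1], K=2))  # False: w[2] != 0 but delta[2] = 0
```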

Minimum Holding and Transaction 
Size Constraints 

Cardinality constraints are often used in conjunction with minimum holding/trading constraints. The latter set a minimum limit on the amount of a stock that can be held in the portfolio, or the amount of a stock that can be traded, effectively eliminating small trades. Both cardinality and minimum holding/trading constraints aim to reduce the amount of transaction costs.

Threshold constraints on the amount of stock i to be held in the portfolio can be imposed with the constraint

$$|w_i| \ge L_i \cdot \delta_i$$

where $L_i$ is the smallest holding size allowed for stock i, and $\delta_i$ is a binary variable, analogous to the binary variables $\delta_i$ defined in the previous section; it equals 1 if stock i is included in the portfolio, and 0 otherwise. (All additional constraints relating $\delta_i$ and $w_i$ described in the previous section still apply.)

Similarly, constraints can be imposed on the minimum trading amount for stock i. As we explained earlier in this section, the size of the trade for stock i is determined by the absolute value of the difference between the current weight of the stock, $w_{0,i}$, and the new weight $w_i$ that will be found by the solver: $|w_i - w_{0,i}|$. The minimum trading size constraint formulation is

$$|w_i - w_{0,i}| \ge L_i^{trade} \cdot \delta_i$$

where $L_i^{trade}$ is the smallest trading size allowed for stock i.

Adding binary variables to an optimization problem makes the problem more difficult for the solver and can increase the computation time substantially. That is why in practice, portfolio managers often omit minimum holding and transaction size constraints from the optimization problem formulation, selecting instead to eliminate weights and/or trades that appear too small manually, after the optimal portfolio is determined by the optimization solver. It is important to realize, however, that modifying the optimal solution for the simpler portfolio allocation problem (the optimal solution in this case is the weights/trades for the different stocks) by eliminating small positions manually does not necessarily produce the optimal solution to an optimization problem that contained the minimum holding and transaction size constraints from the beginning. In fact, there can be pathological cases in which the solution is very different from the true optimal solution. However, for most cases in practice, the small manual adjustments to the optimal portfolio allocation do not cause tremendous discrepancies or inconsistencies.

Round Lot Constraints 

So far, we have assumed that stocks are infinitely divisible, that is, that we can trade and invest in fractions of stocks, bonds, and so on. This is, of course, not true; in reality, securities are traded in multiples of minimum transaction lots, or rounds (e.g., 100 or 500 shares).

In order to represent the condition that securities should be traded in rounds, we need to introduce additional decision variables (let us call them $z_i$, i = 1, ..., N) that are integers and will correspond to the number of lots of a particular security that will be purchased. Each $z_i$ will then be linked to the corresponding portfolio weight $w_i$ through the equality

$$w_i = z_i \cdot f_i, \quad i = 1, \ldots, N$$

where $f_i$ is the dollar value of one round lot of stock i, expressed as a fraction of the total amount to be invested. For example, suppose there is a total of $100 million to be invested, and stock i trades at $50 in round lots of 100. Then

$$f_i = \frac{50 \cdot 100}{100{,}000{,}000} = 5 \cdot 10^{-5}$$

All remaining constraints in the portfolio allocation can be expressed through the weights $w_i$, as usual. However, we also need to specify for the solver that the decision variables $z_i$ are integers.
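The link between lot counts and weights is a one-line calculation. The sketch below reproduces the numbers of the example in the text ($100 million budget, a $50 stock, lots of 100 shares); the function names and the choice of 2,000 lots are illustrative assumptions.

```python
def lot_fraction(price, lot_size, budget):
    """Value of one round lot as a fraction of the total amount invested (f_i)."""
    return price * lot_size / budget


def weights_from_lots(z, f):
    """Portfolio weights implied by integer lot counts: w_i = z_i * f_i."""
    return [zi * fi for zi, fi in zip(z, f)]


f_i = lot_fraction(price=50.0, lot_size=100, budget=100_000_000)
print(f_i)                                  # 5e-05

# Buying 2,000 lots (200,000 shares, or $10 million) gives a weight of about 10%.
weights = weights_from_lots([2000], [f_i])
print(weights)
```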

An issue with imposing round lot constraints is that the budget constraint

$$\mathbf{w}'\boldsymbol{\iota} = 1$$

(where $\boldsymbol{\iota}$ is a vector of ones), which is in fact

$$\sum_{i=1}^{N} z_i \cdot f_i = 1$$

may not be satisfied exactly. One possibility to handle this problem is to relax the budget constraint. For example, we can state the constraint as

$$\mathbf{w}'\boldsymbol{\iota} \le 1$$

or, equivalently,

$$\sum_{i=1}^{N} z_i \cdot f_i \le 1$$

This will ensure that we do not go over budget.

If our objective is stated as expected return maximization, the optimization solver will attempt to make this constraint as tight as possible, that is, we will end up using up as much of the budget as we can. Depending on the objective function and the other constraints in the formulation, however, this may not always happen. We can try to force the solver to minimize the slack in the budget constraint by introducing a pair of nonnegative decision variables (let us call them $\epsilon^+$ and $\epsilon^-$) that account for the amount that is "overinvested" or "underinvested." These variables will pick up the slack left over because of the inability to round the amounts for the different investments. Namely, we impose the constraints

$$\sum_{i=1}^{N} z_i \cdot f_i + \epsilon^- - \epsilon^+ = 1$$

$$\epsilon^- \ge 0, \quad \epsilon^+ \ge 0$$

and subtract the following term from the objective function:

$$\lambda_{rl} \cdot (\epsilon^- + \epsilon^+)$$

where $\lambda_{rl}$ is a penalty term associated with the amount of over- or underinvestment the portfolio manager is willing to tolerate (selected by the portfolio manager). In the final solution, the violation of the budget constraint will be minimized. Note, however, that this formulation technically allows for the budget to be overinvested.
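For a given set of lot counts, the slack variables and the resulting penalty can be computed directly. The sketch below assumes the slack variables take their smallest feasible values, which is what a solver would choose when the penalty coefficient is positive; the names `budget_slack` and `round_lot_penalty` and all numerical inputs are hypothetical.

```python
def budget_slack(z, f):
    """Return (eps_minus, eps_plus) with sum(z_i * f_i) + eps_minus - eps_plus = 1."""
    invested = sum(zi * fi for zi, fi in zip(z, f))
    eps_minus = max(1.0 - invested, 0.0)   # fraction of the budget left uninvested
    eps_plus = max(invested - 1.0, 0.0)    # fraction by which the budget is exceeded
    return eps_minus, eps_plus


def round_lot_penalty(z, f, lam_rl):
    """Penalty term lam_rl * (eps_minus + eps_plus) subtracted from the objective."""
    eps_minus, eps_plus = budget_slack(z, f)
    return lam_rl * (eps_minus + eps_plus)


f = [5e-5, 2e-4]            # lot values as fractions of the budget (made up)
z = [10000, 2349]           # integer numbers of lots
eps_minus, eps_plus = budget_slack(z, f)
print(eps_minus, eps_plus)                    # about 3.02% underinvested, 0 over
print(round_lot_penalty(z, f, lam_rl=10.0))   # about 0.302
```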

The optimal portfolio allocation we obtain after solving this optimization problem will not be the same as the allocation we would obtain if we solve an optimization problem without round lot constraints, and then round the amounts to fit the lots that can be traded in the market.

Cardinality constraints, minimum holding/trading constraints, and especially round lot constraints require more sophisticated binary and integer programming solvers, and are difficult problems to solve in the case of large portfolios.


BENCHMARK EXPOSURE 
AND TRACKING ERROR 
MINIMIZATION 

Expected portfolio return maximization under the mean-variance framework or other risk measure minimization are examples of active investment strategies, that is, strategies that identify a universe of attractive investments and ignore inferior investment opportunities. A different approach, referred to as a passive investment strategy, argues that in the absence of any superior forecasting ability, investors might as well resign themselves to the fact that they cannot beat the market. From a theoretical perspective, the analytics of portfolio theory tell them to hold a broadly diversified portfolio anyway. Many mutual funds are managed relative to a particular benchmark or stock universe, such as the S&P 500 or the Russell 1000. The portfolio allocation models are then formulated in such a way that the tracking error relative to the benchmark is kept small.

Standard Definition of 
Tracking Error 

To incorporate a passive investment strategy, we can change the objective function of the portfolio allocation problem so that instead of minimizing a portfolio risk measure, we minimize the tracking error with respect to a benchmark that represents the market, such as the Russell 3000, or the S&P 500. Such strategies are often referred to as indexing. The tracking error can be defined in different ways. However, practitioners typically mean a specific definition: the variance (or standard deviation) of the difference between the portfolio return, $\mathbf{w}'\mathbf{r}$, and the return on the benchmark, $\mathbf{w}_b'\mathbf{r}$. Mathematically, the tracking error (TE) can be expressed as

$$TE = Var(\mathbf{w}'\mathbf{r} - \mathbf{w}_b'\mathbf{r}) = Var((\mathbf{w} - \mathbf{w}_b)'\mathbf{r}) = (\mathbf{w} - \mathbf{w}_b)' Var(\mathbf{r}) (\mathbf{w} - \mathbf{w}_b) = (\mathbf{w} - \mathbf{w}_b)' \boldsymbol{\Sigma} (\mathbf{w} - \mathbf{w}_b)$$

where $\boldsymbol{\Sigma}$ is the covariance matrix of the stock returns. One can observe that the formula is very similar to the formula for the portfolio variance; however, the portfolio weights in the formula for the variance are replaced by differences between the weights of the stocks in the portfolio and the weights of the stocks in the index.
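The quadratic form in the tracking error definition is straightforward to evaluate. The following plain-Python sketch computes the tracking error variance for a two-stock example; the function name and the covariance numbers are assumptions chosen for illustration.

```python
import math


def tracking_error_variance(w, w_b, cov):
    """Evaluate (w - w_b)' Sigma (w - w_b) for weight lists and a nested-list Sigma."""
    d = [a - b for a, b in zip(w, w_b)]    # active weights
    n = len(d)
    return sum(d[i] * cov[i][j] * d[j] for i in range(n) for j in range(n))


w = [0.6, 0.4]                 # portfolio weights
w_b = [0.5, 0.5]               # benchmark weights
cov = [[0.04, 0.01],
       [0.01, 0.09]]           # hypothetical covariance matrix of returns

te_var = tracking_error_variance(w, w_b, cov)
print(te_var)                  # approximately 0.0011
print(math.sqrt(te_var))       # tracking error as a standard deviation
```

Holding the benchmark weights exactly gives active weights of zero and therefore a tracking error of zero, which is the case the next paragraph takes up.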


Why do we need to optimize portfolio weights in order to track a benchmark, when technically the most effective way to track a benchmark is by investing the portfolio in the stocks in the benchmark portfolio in the same proportions as the proportions of these securities in the benchmark? The problem with this approach is that, especially with large benchmarks like the Russell 3000, the transaction costs of a proportional investment and the subsequent rebalancing of the portfolio can be prohibitive (that is, dramatically adversely impact the performance of the portfolio relative to the benchmark). Furthermore, in practice securities are not infinitely divisible, so investing a portfolio of a limited size in the same proportions as the composition of the benchmark will still not achieve zero tracking error. Thus, the optimal formulation is to require that the portfolio follows the benchmark as closely as possible.

While indexing has become an essential part of many portfolio strategies, most portfolio managers cannot resist the temptation to identify at least some securities that will outperform others. Hence, restrictions on the tracking error are often imposed as a constraint, while the objective function is something different from minimizing the tracking error. The tracking error constraint takes the form

$$(\mathbf{w} - \mathbf{w}_b)' \boldsymbol{\Sigma} (\mathbf{w} - \mathbf{w}_b) \le \sigma_{TE}^2$$

where $\sigma_{TE}^2$ is a limit (imposed by the investor) on the amount of tracking error the investor is willing to tolerate. This is a quadratic constraint, which is convex and computationally tractable, but requires specialized optimization software.

Alternative Ways of Defining 
Tracking Error 

There are alternative ways in which tracking-error type constraints can be imposed.

For example, we may require that the absolute deviations of the portfolio weights ($\mathbf{w}$) from the index weights ($\mathbf{w}_b$) are less than or equal to a given vector array of upper bounds $\mathbf{u}$:

$$|\mathbf{w} - \mathbf{w}_b| \le \mathbf{u}$$

where the absolute values $|\cdot|$ for the vector differences are taken componentwise, that is, for pairs of corresponding elements from the two vector arrays. These constraints can be stated as linear constraints by rewriting them as

$$\mathbf{w} - \mathbf{w}_b \le \mathbf{u}$$
$$-(\mathbf{w} - \mathbf{w}_b) \le \mathbf{u}$$

Similarly, we can require that for stocks within a specific industry (whose indexes in the portfolio belong to a subset $I_j$ of the investment universe $I$), the total tracking error is less than a given upper bound $U_j$:

$$\sum_{i \in I_j} (w_i - w_{b,i}) \le U_j$$

Finally, tracking error can be expressed through risk measures other than the absolute deviations or the variance of the deviations from the benchmark. Rockafellar and Uryasev (2000) suggest using conditional value-at-risk (CVaR) to manage the tracking error. Conditional value-at-risk measures the average loss that can happen with probability less than some small probability, that is, the average loss in the tail of the distribution of portfolio losses. (Using CVaR as a risk measure results in computationally tractable optimization formulations for portfolio allocation, as long as the data are presented in the form of scenarios. 4) We provide below a formulation that is somewhat different from Rockafellar and Uryasev, but preserves the main idea.

Suppose that we are given S scenarios for the return of a benchmark portfolio (or an instrument we are trying to replicate), $b_s$, s = 1, ..., S. These scenarios can be generated by simulation or taken from historical data. We also have N stocks with returns $r_i^{(s)}$ (i = 1, ..., N, s = 1, ..., S) in each scenario. The value of the portfolio in scenario s is

$$\sum_{i=1}^{N} r_i^{(s)} \cdot w_i$$

or, equivalently, $(\mathbf{r}^{(s)})'\mathbf{w}$, where $\mathbf{r}^{(s)}$ is the vector of returns for the N stocks in scenario s. Consider the differences between the return on the benchmark and the return on the portfolio,

$$b_s - (\mathbf{r}^{(s)})'\mathbf{w} = -((\mathbf{r}^{(s)})'\mathbf{w} - b_s)$$

If this difference is positive, we have a loss; if the difference is negative, we have a gain; both gains and losses are computed relative to the benchmark. Rationally, the portfolio manager should not worry about differences that are negative; the only cause for concern would be if the portfolio underperforms the benchmark, which would result in a positive difference. Thus, it is not necessary to limit the variance of the deviations of the portfolio returns from the benchmark, which penalizes for positive and negative deviations equally. Instead, we can impose a limit on the amount of loss we are willing to tolerate in terms of the CVaR of the distribution of losses relative to the benchmark.

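The tracking error constraint in terms of the CVaR can be stated as the following set of constraints: 5

$$\xi + \frac{1}{\epsilon \cdot S} \sum_{s=1}^{S} y_s \le U_{TE}$$

$$y_s \ge -((\mathbf{r}^{(s)})'\mathbf{w} - b_s) - \xi, \quad s = 1, \ldots, S$$

$$y_s \ge 0, \quad s = 1, \ldots, S$$

where $U_{TE}$ is the upper bound on the negative deviations. Here $\xi$ and $y_s$ are auxiliary decision variables, and $\epsilon$ is the tail probability used in the CVaR definition.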
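For a fixed set of weights, the CVaR of the losses relative to the benchmark can be computed from the scenarios without a solver. The sketch below assumes the standard Rockafellar and Uryasev representation, in which CVaR is the minimum over an auxiliary variable $\xi$ of $\xi + \frac{1}{\epsilon S}\sum_s \max(\text{loss}_s - \xi, 0)$; because this expression is convex and piecewise linear in $\xi$, it is enough to evaluate it at the scenario losses. The tail probability symbol `eps`, the function names, and the scenario data are assumptions for illustration.

```python
def relative_losses(scenario_returns, w, benchmark):
    """Loss in scenario s relative to the benchmark: b_s - (r^(s))'w."""
    return [b - sum(r_i * w_i for r_i, w_i in zip(r, w))
            for r, b in zip(scenario_returns, benchmark)]


def cvar(losses, eps):
    """Scenario CVaR at tail probability eps via the Rockafellar-Uryasev formula."""
    S = len(losses)

    def ru(xi):
        return xi + sum(max(loss - xi, 0.0) for loss in losses) / (eps * S)

    # The minimum of a convex piecewise-linear function is attained at a breakpoint.
    return min(ru(xi) for xi in losses)


# Four scenarios, two stocks, equally weighted portfolio (all numbers made up).
scenario_returns = [[0.02, 0.02], [-0.01, -0.01], [0.00, 0.00], [0.03, 0.03]]
benchmark = [0.03, -0.03, 0.03, 0.03]
w = [0.5, 0.5]

losses = relative_losses(scenario_returns, w, benchmark)
print(losses)                 # approximately [0.01, -0.02, 0.03, 0.0]
tail_loss = cvar(losses, eps=0.5)
print(tail_loss)              # approximately 0.02, the mean of the two worst losses
```

A CVaR-type tracking error constraint would then require this quantity to stay below the chosen upper bound; inside an optimization model the same expression is encoded with linear constraints and auxiliary variables rather than evaluated directly.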

This formulation of tracking error is appealing in two ways. First, it treats positive and negative deviations relative to the benchmark differently, which agrees with the strategy of an investor seeking to maximize returns overall. Second, it results in a linear set of constraints, which are easy to handle computationally, in contrast to the first formulation of the tracking error constraint in this section, which results in a quadratic constraint.

Actual Versus Predicted
Tracking Error

The tracking error calculation in practice is often backward-looking. For example, in computing the covariance matrix $\boldsymbol{\Sigma}$ in the standard tracking error definition as the variance of the deviations of the portfolio returns from the index, or in selecting the scenarios used in the CVaR-type tracking error constraint in the previous section, we may use historical data. The tracking error calculated in this manner is called the ex post tracking error, backward-looking error, or actual tracking error.

The problem with using the actual tracking error for assessing future performance relative to a benchmark is that the actual tracking error does not reflect the effect of the portfolio manager's current decisions on the future active returns and hence the tracking error that may be realized in the future. The actual tracking error has little predictive value and can be misleading regarding portfolio risk.

Portfolio managers need forward-looking estimates of tracking error to reflect future portfolio performance more accurately. In practice, this is accomplished by using the services of a commercial vendor that has a multifactor risk model that has identified and defined the risks associated with the benchmark, or by building such a model in-house. Statistical analysis of historical return data for the stocks in the benchmark is used to obtain the risk factors and to quantify the risks. Using the manager's current portfolio holdings, the portfolio's current exposure to the various risk factors can be calculated and compared to the benchmark's exposures to the risk factors. From the differential factor exposures and the risks of the factors, a forward-looking tracking error for the portfolio can be computed. This tracking error is also referred to as an ex ante tracking error or predicted tracking error.

There is no guarantee that the predicted tracking error will match exactly the tracking error realized over the future time period of interest. However, this calculation of the tracking error has its use in risk control and portfolio construction. By performing a simulation analysis on the factors that enter the calculation, the manager can evaluate the potential performance of portfolio strategies relative to the benchmark, and eliminate those that result in tracking errors beyond the client-imposed tolerance for risk. The actual tracking error, on the other hand, is useful for assessing actual performance relative to a benchmark.

INCORPORATING 
TRANSACTION COSTS 

Transaction costs can be generally divided into two categories: (1) explicit costs, such as bid-ask spreads, commissions, and fees, and (2) implicit costs, such as price movement risk costs and market impact costs. Price movement risk costs are the costs resulting from the potential for a change in market price between the time the decision to trade is made and the time the trade is actually executed. Market impact is the effect a trader has on the market price of an asset when the trader sells or buys the asset. It is the extent to which the price moves up or down in response to the trader's actions. For example, a trader who tries to sell a large number of shares of a particular stock may drive down the stock's market price.

The typical portfolio allocation models are built on top of one or several forecasting models for expected returns and risk. Small changes in these forecasts can result in reallocations that would not occur if transaction costs were taken into account. In practice, the effect of transaction costs on portfolio performance is far from insignificant. If transaction costs are not taken into consideration in allocation and rebalancing decisions, they can lead to poor portfolio performance.

This section describes some common transaction cost models for portfolio rebalancing. We use the mean-variance framework as the basis for describing the different approaches. However, it is straightforward to extend the transaction cost models into other portfolio allocation frameworks.

The earliest, and most widely used, model for transaction costs is the mean-variance risk-aversion formulation with transaction costs. 6 The optimization problem has the following objective function:

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \lambda \cdot \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} - \lambda_{TC} \cdot TC$$

where TC is a transaction cost penalty function and $\lambda_{TC}$ is the transaction cost aversion parameter. In other words, the objective is to maximize the expected portfolio return less the cost of risk and transaction costs. We can imagine that as the transaction costs increase, at some point it becomes optimal to keep the current portfolio rather than to rebalance. Variations of this formulation exist. For example, it is common to maximize expected portfolio return minus transaction costs, and to impose limits on the risk as a constraint (i.e., to move the second term in the objective function to the constraints).

Transaction cost models can involve complicated nonlinear functions. Although software exists for general nonlinear optimization problems, the computational time required for solving such problems is often too long for realistic investment applications, and the quality of the solution is not guaranteed. In practice, an observed complicated nonlinear transaction cost function is often approximated with a computationally tractable function that is assumed to be separable in the portfolio weights, that is, it is often assumed that the transaction costs for each individual stock are independent of the transaction costs for another stock. For the rest of this section, we will denote the individual cost function for stock i by $TC_i$.

Next, we explain several widely used models for the transaction cost function.

Linear Transaction Costs 

Let us start simple. Suppose that the transaction costs are proportional, that is, they are a percentage $c_i$ of the transaction size $|t_i| = |w_i - w_{0,i}|$. 7 Then, the portfolio allocation problem with transaction costs can be written simply as

$$\max_{\mathbf{w}} \; \mathbf{w}'\boldsymbol{\mu} - \lambda \cdot \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} - \lambda_{TC} \cdot \sum_{i=1}^{N} c_i \cdot |w_i - w_{0,i}|$$

The problem can be made solver-friendly by replacing the absolute value terms with new decision variables $y_i$, and adding two sets of constraints. Hence, we rewrite the objective function as

$$\max_{\mathbf{w}, \mathbf{y}} \; \mathbf{w}'\boldsymbol{\mu} - \lambda \cdot \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} - \lambda_{TC} \cdot \sum_{i=1}^{N} c_i \cdot y_i$$

and add the constraints

$$y_i \ge w_i - w_{0,i}$$
$$y_i \ge -(w_i - w_{0,i})$$

This preserves the quadratic optimization problem formulation, a formulation that can be passed to quadratic optimization solvers such as Excel Solver and MATLAB's quadprog function, because the constraints are linear expressions, and the objective function contains only linear and quadratic terms.

In the optimal solution, the optimization
solver will in fact set the value for $y_i$ to
$|w_i - w_{0,i}|$. This is because this is a maximization
problem and $y_i$ occurs with a negative
sign in the objective function, so the solver
will try to set $y_i$ to the minimum value possible.
That minimum value will be the maximum of
$(w_i - w_{0,i})$ or $-(w_i - w_{0,i})$, which is in fact the
absolute value $|w_i - w_{0,i}|$.
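
The argument above can be checked numerically. The following is a minimal sketch, not a solver: the helper name `smallest_feasible_y` and the sample weights are illustrative assumptions. It computes the smallest value that satisfies both linear constraints and compares it with the absolute value of the trade.

```python
def smallest_feasible_y(w_i, w0_i):
    """Smallest y with y >= (w_i - w0_i) and y >= -(w_i - w0_i).

    Hypothetical helper: it mimics what a solver does to y_i when
    y_i carries a negative sign in a maximization objective.
    """
    change = w_i - w0_i
    return max(change, -change)


# Illustrative (made-up) final and initial weights: a buy, a sell, no trade.
for w_i, w0_i in [(0.10, 0.04), (0.02, 0.07), (0.05, 0.05)]:
    y = smallest_feasible_y(w_i, w0_i)
    print(w_i, w0_i, y, y == abs(w_i - w0_i))
```

Because the lower of the two bounds is never binding, the pair of linear constraints reproduces the absolute value without introducing any nonlinearity into the constraint set.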

Piecewise-Linear Transaction Costs 

Taking the model in the previous section a step 
further, we can introduce piecewise-linear ap¬ 
proximations to transaction cost function mod¬ 
els. This kind of function is more realistic than 
the linear cost function, especially for large 
trades. As the trading size increases, it becomes 
increasingly more costly to trade because of the 
market impact of the trade. 

An example of a piecewise-linear function
of transaction costs for a trade of size t of a
particular security is illustrated in Figure 1.
The transaction cost function in the graph assumes
that the rate of increase of transaction
costs (reflected in the slope of the function)
changes at certain threshold points. For example,
it is smaller in the range 0 to 15% of daily
volume than in the range 15% to 40% of daily
volume (or some other trading volume index).
Mathematically, the transaction cost function in
Figure 1 can be expressed as

$$TC(t) = \begin{cases}
s_1 t, & 0 \le t \le 0.15 \cdot Vol \\
s_1 (0.15 \cdot Vol) + s_2 (t - 0.15 \cdot Vol), & 0.15 \cdot Vol \le t \le 0.40 \cdot Vol \\
s_1 (0.15 \cdot Vol) + s_2 (0.25 \cdot Vol) + s_3 (t - 0.40 \cdot Vol), & 0.40 \cdot Vol \le t \le 0.50 \cdot Vol
\end{cases}$$

where $s_1$, $s_2$, $s_3$ are the slopes of the three linear
segments on the graph. (They are given data.)
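
As an illustration, the three-segment function can be evaluated directly. This is a sketch under stated assumptions: the function name `piecewise_linear_cost`, the volume figure, and the slope values are invented for the example; only the break points at 15%, 40%, and 50% of volume come from the text.

```python
def piecewise_linear_cost(t, vol, s1, s2, s3):
    """Evaluate the three-segment cost TC(t) for a trade of size t >= 0."""
    b1, b2, b3 = 0.15 * vol, 0.40 * vol, 0.50 * vol
    if t < 0 or t > b3:
        raise ValueError("trade size outside the range covered by the model")
    if t <= b1:
        return s1 * t
    if t <= b2:
        return s1 * b1 + s2 * (t - b1)
    return s1 * b1 + s2 * (b2 - b1) + s3 * (t - b2)


# Made-up inputs: daily volume of 1,000 shares and increasing slopes.
for size in (100, 300, 450):
    print(size, piecewise_linear_cost(size, vol=1000, s1=0.001, s2=0.002, s3=0.004))
```

The function is continuous at the break points, and with increasing slopes it is convex, which is the property the optimization formulation relies on.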

To include piecewise-linear functions for 
transaction costs in the objective function of a 
mean-variance (or any general mean-risk) port¬ 
folio optimization problem, we need to intro¬ 
duce new decision variables that correspond 
to the number of pieces in the piecewise-linear 
approximation of the transaction cost function 
(in this case, there are three linear segments, so
we introduce variables $z_1$, $z_2$, $z_3$). We write the
penalty term in the objective function for an
individual stock as 8

$$\lambda_{TC} \cdot (s_1 \cdot z_1 + s_2 \cdot z_2 + s_3 \cdot z_3)$$

If there are N stocks in the portfolio, the total 
transaction cost will be the sum of the transac¬ 
tion costs for each individual stock, that is, the 
penalty term that involves transaction costs in 
the objective function becomes 

$$-\lambda_{TC} \sum_{i=1}^{N} \left( s_{1,i} \cdot z_{1,i} + s_{2,i} \cdot z_{2,i} + s_{3,i} \cdot z_{3,i} \right)$$

In addition, we specify the following constraints
on the new decision variables:

$$0 \le z_{1,i} \le 0.15 \cdot Vol_i$$
$$0 \le z_{2,i} \le 0.25 \cdot Vol_i$$
$$0 \le z_{3,i} \le 0.10 \cdot Vol_i$$

Note that because of the increasing slopes of
the linear segments and the goal of making
that term as small as possible in the objective
function, the optimizer will never set the decision
variable corresponding to the second segment,
$z_{2,i}$, to a number greater than 0 unless
the decision variable corresponding to the first
segment, $z_{1,i}$, is at its upper bound. Similarly,
the optimizer would never set $z_{3,i}$ to a number
greater than 0 unless both $z_{1,i}$ and $z_{2,i}$ are at
their upper bounds. So, this set of constraints
allows us to compute the amount traded of stock i,
on which the transaction costs are incurred, as
$z_{1,i} + z_{2,i} + z_{3,i}$.

Of course, we also need to link the amount 
of transaction costs incurred in the trading of 
stock i to the optimal portfolio allocation. This 
can be done by adding a few more variables 
and constraints. We introduce variables $y_i$, one
for each stock in the portfolio, that would represent
the amount traded (but not the direction
of the trade) and would be nonnegative. Then,
we require that

$$y_i = z_{1,i} + z_{2,i} + z_{3,i} \quad \text{for each stock } i,$$

and also that $y_i$ equals the change in the portfolio
position of stock i. The latter condition can
be imposed by writing the constraint

$$y_i = |w_i - w_{0,i}|$$

where $w_{0,i}$ and $w_i$ are the initial and the final
amount of stock i in the portfolio, respectively. 9
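
To make the role of the segment variables concrete, the sketch below splits a given trade size into the three pieces in the order a cost-minimizing optimizer would use when the slopes are increasing. The function name `fill_segments` and the numbers are illustrative assumptions; in the actual formulation the optimizer chooses these values itself.

```python
def fill_segments(trade_size, vol):
    """Split a trade into (z1, z2, z3), filling the cheapest segment first.

    Segment capacities follow the bounds in the text: 15%, 25%, and 10%
    of volume. This ordering is optimal only if s1 < s2 < s3.
    """
    capacities = [0.15 * vol, 0.25 * vol, 0.10 * vol]
    remaining = trade_size
    z = []
    for cap in capacities:
        used = min(remaining, cap)
        z.append(used)
        remaining -= used
    if remaining > 1e-12:
        raise ValueError("trade exceeds 50% of volume, outside the model range")
    return z


z1, z2, z3 = fill_segments(trade_size=300, vol=1000)  # made-up numbers
print(z1, z2, z3, z1 + z2 + z3)
```

The three pieces always sum to the trade size, which is exactly the linking constraint between the amount traded and the segment variables.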

Despite their apparent complexity, piecewise- 
linear approximations for transaction costs are 
very solver-friendly, and save time (relative to 
nonlinear models) in the actual portfolio opti¬ 
mization. Although modeling transaction costs 
this way requires introducing new decision 
variables and constraints, the increase in the di¬ 
mension of the portfolio optimization problem 
does not affect significantly the running time 
or the performance of the optimization solver, 
because the problem formulation is easy from a 
computational perspective. 

Quadratic Transaction Costs 

The transaction cost function is often parame¬ 
terized as a quadratic function of the form 

$$TC_i(t) = c_i \cdot |t| + d_i \cdot |t|^2$$

The coefficients $c_i$ and $d_i$ are calibrated from
data, for example, by fitting a quadratic function
to an observed pattern of transaction costs
realized for trading a particular stock under
normal conditions.

Including this function in the objective func¬ 
tion of the portfolio optimization problem re¬ 
sults in a quadratic program that can be solved 
with widely available quadratic optimization 
software. 

Fixed Transaction Costs 

In some cases, we need to model fixed trans¬ 
action costs. Those are costs that are incurred 
independently of the amount traded. To in¬ 
clude such costs in the portfolio optimization 
problem, we need to introduce binary variables 
$\delta_1, \ldots, \delta_N$ corresponding to each stock, where
$\delta_i$ equals 0 if the amount traded of stock i is 0,
and 1 otherwise. The idea is similar to the idea 
we used to model the requirement that only a 
given number of stocks can be included in the 
portfolio. 

Suppose the fixed transaction cost is $a_i$ for
stock i. Then, the transaction cost function is

$$TC_i = a_i \cdot \delta_i$$

The objective function formulation is then

$$\max_{\mathbf{w},\boldsymbol{\delta}} \; \mathbf{w}'\boldsymbol{\mu} - \lambda \cdot \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} - \lambda_{TC} \cdot \sum_{i=1}^{N} a_i \cdot \delta_i$$

and we need to add the following constraints to
make sure that the binary variables are linked
to the trades $|w_i - w_{0,i}|$:

$$|w_i - w_{0,i}| \le M \cdot \delta_i, \quad i = 1, \ldots, N,$$
$$\delta_i \ \text{binary}$$

where M is a "large" constant. When the trading
size $|w_i - w_{0,i}|$ is nonzero, $\delta_i$ will be forced to
be 1. When the trading size is 0, then $\delta_i$ can be
either 0 or 1, but the optimizer will set it to 0,
since it will try to make its value the minimum
possible in the objective function.

Of course, combinations of different trading 
cost models can be used in practice. For exam¬ 
ple, if the trade involves both a fixed and a vari¬ 
able quadratic transaction cost, then we could 
use a transaction cost function of the kind

$$TC_i(t) = a_i \cdot \delta_i + c_i \cdot |t| + d_i \cdot |t|^2$$
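
A combined cost of this kind is straightforward to evaluate for a given trade. The sketch below is illustrative only: the function name `transaction_cost` and the coefficient values are assumptions, and the indicator is computed directly from the trade size rather than chosen by an optimizer.

```python
def transaction_cost(t, a_i, c_i, d_i):
    """Fixed plus linear plus quadratic cost of trading an amount t."""
    size = abs(t)
    delta_i = 1 if size > 0 else 0  # 1 only when a trade takes place
    return a_i * delta_i + c_i * size + d_i * size ** 2


# Made-up coefficients: the fixed charge applies to any nonzero trade,
# regardless of direction, and disappears when nothing is traded.
print(transaction_cost(0, a_i=5.0, c_i=0.01, d_i=0.0001))
print(transaction_cost(-100, a_i=5.0, c_i=0.01, d_i=0.0001))
```

The jump at zero caused by the fixed charge is what makes binary variables necessary in the optimization model; the linear and quadratic parts alone would keep the problem continuous.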

The important takeaway from this section is 
that when transaction costs are included in the 
portfolio rebalancing problem, the result is a re¬ 
duced amount of trading and rebalancing, and 
a different portfolio allocation than the one that 
would be obtained if transaction costs are not 
taken into consideration. 


INCORPORATING TAXES 

When stocks in a portfolio appreciate or de¬ 
preciate in value, capital gains (respectively, 
losses) accumulate. When stocks are sold, in¬ 
vestors pay taxes on the realized net capital 
gains. The taxes are computed as a percentage 
of the difference between the current market 
value of the stocks and their tax basis, where 
the tax basis is the price at which the stocks 
were bought originally. 10 The percentage is less 
for long-term capital gains (when stocks have 
been held for more than a year) than it is for 
short-term capital gains (when stocks have been 
held for less than a year). 11 Since shares of the 
same stock could have been bought at different 
points in time (in different lots), selling one lot 
of the stock as opposed to another could incur 
a different amount of tax. In addition to capital 
gains taxes, investors who are not exempt from 
taxes owe taxes on the dividends paid on stocks 
in their portfolios. Those dividends are histor¬ 
ically taxed at a higher rate than capital gains, 
and may eventually be taxed as income, that is, 
at the investor's personal tax rate. The tax lia¬ 
bility of a particular portfolio therefore depends 
on the timing of the execution of trades, on the 
tax basis of the portfolio, on the accumulated 
short-term and long-term capital gains, and on 
the tax bracket of the investor. 

Over two-thirds of marketable portfolio assets
in the United States are held by individuals,
insurance companies, and holding companies who pay
taxes on their returns. (Exceptions are, for example,
pension funds, which do not pay taxes
year-to-year.) Studies have indicated that taxes
are the greatest expense investors face—greater 
than commissions and investment manage¬ 
ment fees. To gain some intuition about the ef¬ 
fect of taxes on the income of an investor over 
the investor's lifetime, consider a portfolio that 
has a capital appreciation of 6.00% per year. After
30 years, $1,000 invested in that portfolio will
turn into $1,000 · (1 + 0.06)^30 = $5,743.49. Now
suppose that the capital gains are realized each
year, and a tax of 35% is paid on the gains (the
remainder is reinvested). After 30 years, $1,000
invested in the portfolio will turn into $1,000 ·
(1 + (1 - 0.35) · 0.06)^30 = $3,151.13, about half
of the amount without taxes even when the tax
is about one third of the capital gains. In fact,
in order to provide the same return as the port¬ 
folio with no taxes, the portfolio with annual 
realized capital gains would need to generate a 
capital appreciation of 9.23% per year! One can 
imagine that the same logic would make bench¬ 
mark tracking and performance measurement 
very difficult on an after-tax basis. 
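
The arithmetic in this example can be reproduced in a few lines. The variable names below are illustrative; the inputs (6% appreciation, 35% tax, 30 years, $1,000) are the ones used in the text.

```python
initial = 1000.0
growth = 0.06      # annual capital appreciation
tax_rate = 0.35    # tax on gains realized each year
years = 30

# Gains compound untaxed for the whole horizon.
value_no_tax = initial * (1 + growth) ** years

# Gains are realized and taxed every year; only the remainder compounds.
value_annual_tax = initial * (1 + (1 - tax_rate) * growth) ** years

# Pre-tax appreciation needed so that the after-tax growth rate is 6%:
# (1 - tax_rate) * r = growth.
required_growth = growth / (1 - tax_rate)

print(round(value_no_tax, 2), round(value_annual_tax, 2),
      round(100 * required_growth, 2))
```

The required rate follows from setting the after-tax growth rate equal to 6%, which gives 0.06 / 0.65, or roughly 9.23% per year.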

As investors have become more aware of the 
dramatic impact of taxes on their returns, there 
is increasing pressure on portfolio managers 
to include tax considerations in their portfolio 
rebalancing decisions and to report after-tax 
performance. Consequently, the demand for 
computationally efficient and quantitatively 
rigorous methods for taking taxes into con¬ 
sideration in portfolio allocation decisions has 
grown in recent years. The complexity of the 
problem of incorporating taxes, however, is 
considerable, both from a theoretical and prac¬ 
tical perspective: 

1. The presence of tax liabilities changes the 
interpretation of even fundamental portfo¬ 
lio performance summary measures such 
as market value and risk. Thus, well- 
established methods for evaluating portfo¬ 
lio performance on a pretax basis do not 
work well in the case of tax-aware portfolio
optimization. For example, in traditional
portfolio management a loss is associated
with risk and is therefore minimized when¬ 
ever possible. However, in the presence of 
taxes, losses may be less damaging, because 
they can be used to offset capital gains and re¬ 
duce the tax burden of portfolio rebalancing 
strategies. Benchmarking is also not obvious 
in the presence of taxes: Two portfolios that 
have exactly the same current holdings are 
not equivalent if the holdings have a differ¬ 
ent tax basis. 12 

2. Tax considerations are too complex to imple¬ 
ment in a nonautomated fashion; at the same 
time, their automatic inclusion in portfolio 
rebalancing algorithms requires the ability to 
solve very difficult, large-scale optimization 
problems. 

3. The best approach for portfolio manage¬ 
ment with tax considerations is optimiza¬ 
tion problem formulations that look at return 
forecasts over several time periods (e.g., 
until the end of the year) before recommend¬ 
ing new portfolio weights. However, the 
latter multiperiod view of the portfolio op¬ 
timization problem is very difficult to han¬ 
dle computationally—the dimension of the 
optimization problem, that is, the number of 
variables and constraints, increases exponen¬ 
tially with the number of time periods under 
consideration. 

We need to emphasize that while many of 
the techniques described in the previous sec¬ 
tions of this entry are widely known, there are 
no standard practices for tax-aware portfolio 
management that appear to be established. Dif¬ 
ferent asset management firms interpret tax- 
aware portfolio allocation and approach the 
problem differently. To some firms, minimiz¬ 
ing turnover, 13 for example, by investing in 
index funds, or selecting strategies that mini¬ 
mize the portfolio dividend yield, 14 qualify as 
tax-aware portfolio strategies. Other asset man¬ 
agement firms employ complex optimization 
algorithms that incorporate tax considerations 
directly in portfolio rebalancing decisions, so
that they can keep up with the considerable burden
of keeping track of thousands of managed
accounts and their tax preferences. The fact is, 
even using simple rules of thumb, such as al¬ 
ways selling stocks from the oldest lots after re¬ 
balancing the portfolio with classical portfolio 
optimization routines, can have a positive effect 
on after-tax portfolio returns. The latter strategy 
minimizes the likelihood that short-term gains 
will be incurred, which in turn reduces taxes, 
because short-term capital gains are taxed at a 
higher rate than long-term capital gains. 
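
The oldest-lots-first rule of thumb can be sketched as follows. Everything specific here is an assumption made for illustration: the function name `sell_oldest_first`, the tuple layout of a lot, the 365-day holding threshold, and the two tax rates. A negative result would correspond to a realized net loss.

```python
from datetime import date


def sell_oldest_first(lots, shares_to_sell, price, today, st_rate, lt_rate):
    """Tax due when shares are sold from the oldest lots first.

    lots: list of (purchase_date, shares, basis_per_share) tuples.
    """
    tax = 0.0
    remaining = shares_to_sell
    for bought, shares, basis in sorted(lots):  # earliest purchase first
        if remaining <= 0:
            break
        sold = min(shares, remaining)
        held_long_term = (today - bought).days > 365
        rate = lt_rate if held_long_term else st_rate
        tax += rate * (price - basis) * sold
        remaining -= sold
    if remaining > 0:
        raise ValueError("not enough shares held")
    return tax


# Hypothetical lots and tax rates.
lots = [(date(2023, 11, 1), 100, 40.0), (date(2020, 1, 15), 100, 20.0)]
print(sell_oldest_first(lots, 150, price=50.0, today=date(2024, 3, 1),
                        st_rate=0.35, lt_rate=0.15))
```

In this made-up case the first 100 shares come from the long-held lot and are taxed at the lower rate; only the last 50 shares trigger the short-term rate.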

Apelfeld, Fowler, and Gordon (1996) suggest 
a tax-aware portfolio rebalancing framework 
that incorporates taxes directly into the portfo¬ 
lio optimization process. The main idea of the 
approach is to treat different lots of the same 
stock as different securities, and then penal¬ 
ize for taxes as if they were different transac¬ 
tion costs associated with the sale of each lot. 
(This means, for example, that Microsoft stock 
bought on Date 1 is treated as a different se¬ 
curity from Microsoft stock bought on Date 2.) 
Many tax-aware quantitative investment strate¬ 
gies employ versions of this approach, but there 
are a few issues to beware of when using it in 
practice: 

• The first one is a general problem for all 
tax-aware approaches when they are used in 
the context of active portfolio management. 
For a portfolio manager who handles thou¬ 
sands of different accounts with different tax 
exposures, it is virtually impossible to pay at¬ 
tention to the tax cost incurred by each indi¬ 
vidual investor. While the tax-aware method 
described above minimizes the overall tax 
burden by reducing the amount of realized 
short-term sales, it has no provisions for dif¬ 
ferentiating between investors in different tax 
brackets because it is difficult to think of 
each trade as divided between all investors, 
and adjusted for each individual investor's 
tax circumstances. This issue is so intractable 
that in practice it is not really brought under 
consideration. 

• The dimension of the problem can become
unmanageable very quickly. For example, a 
portfolio of 1,000 securities, each of which has 
10 different lots, is equivalent to a portfolio of 
10,000 securities when each lot is treated as 
a different security. Every time a new pur¬ 
chase is realized, a new security is added to 
the portfolio, since a new lot is created. One 
needs to exercise care and "clean up" lots that 
have been sold and therefore have holdings 
of zero each time the portfolio is rebalanced. 

• Practitioners typically use factor models for 
forecasting returns and estimating risk. One 
of the assumptions when measuring port¬ 
folio risk through factor models is that the 
specific risk of a particular security is uncorre¬ 
lated with the specific risk of other securities. 
(The only risk they share is the risk expressed 
through the factors in the factor model.) This 
assumption clearly does not hold when dif¬ 
ferent "securities" are in fact different lots of 
the same stock. 

DiBartolomeo (2000) describes a modifica¬ 
tion to the model used by Northfield Informa¬ 
tion Service's portfolio management software 
that eliminates the last two problems. Instead 
of treating each lot as a separate security, the 
software imposes piecewise-linear transaction 
costs (see Figure 1) where the break points on 
the horizontal axis correspond to the current 
size of different lots of the same security. The 
portfolio rebalancing algorithm goes through 
several iterations for the portfolio weights, and 
at each iteration, only the shares in the highest 
cost basis tax lot can be traded. Other shares of 
the same stock can be traded in subsequent iter¬ 
ations of the algorithm, with their appropriate 
tax costs attached. 
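
A rough sketch of the idea of lot-based break points follows. It is not the vendor's implementation, and it simplifies heavily: a single tax rate is assumed for all lots, lots are ordered from highest to lowest cost basis, and the names `tax_cost_segments` and `tax_cost` are hypothetical.

```python
def tax_cost_segments(lots, price, tax_rate):
    """Build (cumulative_breakpoint, slope) pairs from tax lots.

    lots: list of (shares, basis_per_share). The highest-basis lot comes
    first, so the per-share tax (the slope) increases along the axis.
    """
    ordered = sorted(lots, key=lambda lot: lot[1], reverse=True)
    segments = []
    cumulative = 0
    for shares, basis in ordered:
        cumulative += shares
        segments.append((cumulative, tax_rate * (price - basis)))
    return segments


def tax_cost(shares_sold, segments):
    """Piecewise-linear tax cost of selling a given number of shares."""
    cost = 0.0
    previous = 0
    for breakpoint, slope in segments:
        in_segment = min(shares_sold, breakpoint) - previous
        if in_segment <= 0:
            break
        cost += slope * in_segment
        previous = breakpoint
    return cost


# Hypothetical lots of one stock, current price 50, tax rate 20%.
segments = tax_cost_segments([(100, 20.0), (50, 45.0), (200, 30.0)],
                             price=50.0, tax_rate=0.20)
print(segments, tax_cost(100, segments))
```

With this ordering the slopes are nondecreasing, so the tax cost is a convex piecewise-linear function of the amount sold and can be handled like the piecewise-linear transaction costs described earlier, without adding one security per lot.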

The approaches we described so far take into 
consideration the short-term or long-term na¬ 
ture of capital gains, but do not incorporate 
the ability to offset capital gains and losses ac¬ 
cumulated over the year. This is an inherent 
limitation of single-period portfolio rebalancing
approaches and is a strong argument in favor of
adopting more realistic multiperiod portfolio
optimization approaches. The rebalancing
of the portfolio at each point in time should 
be made not only by considering the immedi¬ 
ate consequences for the market value of the 
portfolio, but also the opportunity to correct for 
tax liabilities by realizing other capital gains or 
losses by the end of the taxable year. The scarce 
theoretical literature on multiperiod tax-aware 
portfolio optimization contains some character¬ 
izations of optimal portfolio strategies under 
numerous simplifying assumptions. 15 However,
even under such simplifying assumptions,
the dimension of the problem grows exponen¬ 
tially with the number of stocks in a portfolio, 
and it is difficult to come up with computation¬ 
ally viable algorithms for portfolios of realistic 
size. 

MULTIACCOUNT OPTIMIZATION

Portfolio managers who handle multiple ac¬ 
counts face an important practical issue. When 
individual clients' portfolios are managed, 
portfolio managers incorporate their clients' 
preferences and constraints. However, on any
given trading day, the necessary trades for multiple
diverse accounts are pooled and executed
simultaneously. Moreover, trades typically may
not be crossed; that is, it is simply not permissible
to transfer an asset that should be sold on
behalf of one client into the account of another 
client for whom the asset should be bought. 16 
The trades should be executed in the market. 
Thus, each client's trades implicitly impact the 
results for the other clients: The market impact 
of the combined trades may be such that the 
benefits sought for individual accounts through 
trading are lost due to increased overall transac¬ 
tion costs. A robust multiaccount management 
process should ensure accurate accounting and 
fair distribution of transaction costs among the 
individual accounts. 

One possibility to handle the effect of trad¬ 
ing in multiple accounts is to use an iterative 
process, in which at each iteration the market
impact of the trades in previous iterations
is taken into account. 17 More precisely, single 
clients' accounts are optimized as usual, and 
once the optimal allocations are obtained, the 
portfolio manager aggregates the trades and 
computes the actual marginal transaction costs 
based on the aggregate level of trading. The 
portfolio manager then reoptimizes individual 
accounts using these marginal transaction costs, 
and aggregates the resulting trades again to 
compute new marginal transaction costs, and 
so on. The advantage of this approach is that 
little needs to be changed in the way individual 
accounts are typically handled, so the existing 
single-account optimization and management 
infrastructure can be reused. The disadvantage 
is that most generally, this iterative approach 
does not guarantee a convergence (or its con¬ 
vergence may be slow) to a "fair equilibrium," 
in which clients' portfolios receive an unbiased 
treatment with respect to the size and the con¬ 
straint structure of their accounts. 18 The latter 
equilibrium is the one that would be attained 
if all clients traded independently and compet¬ 
itively in the market for liquidity, and it is thus 
the correct and fair solution to the aggregate 
trading problem. 
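
The iterative process can be illustrated with a deliberately small toy model. All functional forms here are assumptions made for the sketch and do not come from the text: there is one asset, account k chooses a buy size t_k >= 0 to maximize alpha_k·t_k − 0.5·lam_k·t_k² − m·t_k for a quoted marginal cost m, and the marginal cost is taken to be theta times the aggregate trade.

```python
def iterate_marginal_costs(alphas, risk_aversions, theta, n_iter=200):
    """Alternate single-account optimization with marginal cost updates."""
    marginal_cost = 0.0
    trades = []
    for _ in range(n_iter):
        # Optimize each account separately, taking the marginal cost as given
        # (closed-form solution of the toy quadratic problem).
        trades = [max(0.0, (alpha - marginal_cost) / lam)
                  for alpha, lam in zip(alphas, risk_aversions)]
        # Aggregate the trades and recompute the marginal cost.
        marginal_cost = theta * sum(trades)
    return trades, marginal_cost


# Made-up inputs for two accounts.
trades, m = iterate_marginal_costs([0.04, 0.06], [2.0, 4.0], theta=0.5)
print(trades, m)
```

In this toy setting the aggregate trade follows a linear recursion, and the plain iteration settles down when theta times the sum of 1/lam_k is below 1; for larger values it can oscillate instead of converging, which mirrors the convergence caveat mentioned above.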

An alternative, more comprehensive ap¬ 
proach is to optimize trades across all accounts 
simultaneously. O'Cinneide, Scherer, and Xu 
(2006) describe such a model and show that 
it attains the fair equilibrium we mentioned 
above. 19 Assume that client k's utility function
is given by $u_k$ and is in the form of a dollar
return penalized for risk. Assume also that a
transaction cost model $\tau$ gives the cost of trading
in dollars, and that $\tau$ is a convex increasing
function. 20 Its exact form will depend on the details
of how trading is implemented. Let $\mathbf{t}$ be the
vector of trades. It will typically have the form
$(t_1^+, \ldots, t_N^+, t_1^-, \ldots, t_N^-)$, that is, it will specify the
aggregate buys $t_i^+$ and the aggregate sells $t_i^-$ for
each asset $i = 1, \ldots, N$, but it may also incorporate
information about how the trade could be
carried out. 21


The multiaccount optimization problem can 
be formulated as 

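$$\max_{\mathbf{w}_1, \ldots, \mathbf{w}_K, \mathbf{t}} \; E[u_1(\mathbf{w}_1)] + \ldots + E[u_K(\mathbf{w}_K)] - \tau(\mathbf{t})$$
$$\text{s.t.} \quad \mathbf{w}_k \in C_k, \quad k = 1, \ldots, K$$

where $\mathbf{w}_k$ is the N-dimensional vector of asset
holdings (or weights) of client k, and $C_k$
is the collection of constraints on the portfolio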
structure of client k. The objective can be inter¬ 
preted as maximization of net expected utility, 
that is, as maximization of the expected dollar 
return penalized for risk and net of transaction 
costs. 

The problem can be simplified by making
some reasonable assumptions. For example, it
can be assumed that the transaction cost function
$\tau$ is additive across different assets, that is,
that trades in one asset do not influence trading
costs in another. In such a case, the trading
cost function can be split into more manageable
terms, that is,

$$\tau(\mathbf{t}) = \sum_{i=1}^{N} \tau_i(t_i^+, t_i^-)$$

where $\tau_i(t_i^+, t_i^-)$ is the cost of trading asset i
as a function of the aggregate buys and sells
of that asset. Splitting the terms $\tau_i(t_i^+, t_i^-)$ further
into separate costs of buying and selling,
however, is not a reasonable assumption, because
simultaneous buying and selling of an
asset tends to have an offsetting effect on its
price.

To formulate the problem completely, let $\mathbf{w}_k^0$
be the vector of original holdings (or weights) of
client k's portfolio, $\mathbf{w}_k$ be the vector of decision
variables for the optimal holdings (or weights)
of client k's portfolio, and $\eta_{k,i}$ be constants that
convert the holdings (or weight) of each asset
i in client k's portfolio $\mathbf{w}_k$ to dollars, that is,
$\eta_{k,i} w_{k,i}$ is client k's dollar holdings of asset i. 22
We also introduce new variables $\mathbf{w}_k^+$ to represent
an upper bound on the weight of each
asset client k will buy:

$$w_{k,i} - w_{k,i}^0 \le w_{k,i}^+, \quad i = 1, \ldots, N, \; k = 1, \ldots, K$$

The aggregate amount of asset i bought for all
clients can then be computed as

$$t_i^+ = \sum_{k=1}^{K} \eta_{k,i} \cdot w_{k,i}^+$$

The aggregate amount of asset i sold for all
clients can be easily expressed by noticing that
the difference between the amounts bought and
sold of each asset is exactly equal to the total
amount of trades needed to get from the original
position $w_{k,i}^0$ to the final position $w_{k,i}$ of that
asset: 23

$$t_i^+ - t_i^- = \sum_{k=1}^{K} \eta_{k,i} \cdot (w_{k,i} - w_{k,i}^0)$$

Here $t_i^+$ and $t_i^-$ are nonnegative variables.
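
For given initial and final positions, the aggregate buys and sells can be computed directly, as in the sketch below. It assumes that each buy variable sits at its tightest value, max(w − w0, 0); the function name `aggregate_trades`, the nested-list layout, and the numbers are illustrative.

```python
def aggregate_trades(w0, w, eta):
    """Aggregate dollar buys and sells per asset across all clients.

    w0, w, eta: K x N nested lists (client k, asset i) holding the
    original weights, final weights, and weight-to-dollar constants.
    """
    n_assets = len(w0[0])
    buys = [0.0] * n_assets
    sells = [0.0] * n_assets
    for w0_k, w_k, eta_k in zip(w0, w, eta):
        for i in range(n_assets):
            change = eta_k[i] * (w_k[i] - w0_k[i])
            if change > 0:
                buys[i] += change
            else:
                sells[i] -= change
    return buys, sells


# Two hypothetical clients and two assets.
w0 = [[0.5, 0.5], [0.2, 0.8]]
w = [[0.6, 0.4], [0.1, 0.9]]
eta = [[1000.0, 1000.0], [2000.0, 2000.0]]
buys, sells = aggregate_trades(w0, w, eta)
print(buys, sells)
```

In this example one client buys an asset that the other sells, so both the aggregate buy and the aggregate sell are nonzero for each asset, while their difference equals the net dollar change across clients.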

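The multiaccount optimization problem then
takes the form

$$\max_{\mathbf{w}_1, \ldots, \mathbf{w}_K, \mathbf{t}^+, \mathbf{t}^-} \; E[u_1(\mathbf{w}_1)] + \ldots + E[u_K(\mathbf{w}_K)] - \sum_{i=1}^{N} \tau_i(t_i^+, t_i^-)$$
$$\text{s.t.} \quad \mathbf{w}_k \in C_k, \quad k = 1, \ldots, K$$
$$w_{k,i} - w_{k,i}^0 \le w_{k,i}^+, \quad i = 1, \ldots, N, \; k = 1, \ldots, K$$
$$t_i^+ = \sum_{k=1}^{K} \eta_{k,i} \cdot w_{k,i}^+, \quad i = 1, \ldots, N$$
$$t_i^+ - t_i^- = \sum_{k=1}^{K} \eta_{k,i} \cdot (w_{k,i} - w_{k,i}^0), \quad i = 1, \ldots, N$$
$$t_i^+ \ge 0, \; t_i^- \ge 0, \; w_{k,i}^+ \ge 0, \quad i = 1, \ldots, N, \; k = 1, \ldots, K$$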

O'Cinneide, Scherer, and Xu (2006) studied 
the behavior of the model in simulated experi¬ 
ments with a simple model for the transaction 
cost function, namely, one in which 

$$\tau(t) = \theta \cdot t^{\gamma}$$

where t is the trade size, and $\theta$ and $\gamma$ are constants
satisfying $\theta \ge 0$ and $\gamma \ge 1$. 24 $\theta$ and $\gamma$ are
specified in advance and calibrated to fit observed
trading costs in the market. The transaction
costs for each client k can therefore be
expressed as

$$\tau_k = \theta \sum_{i=1}^{N} |w_{k,i} - w_{k,i}^0|^{\gamma}$$
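
A direct evaluation of this per-client cost is sketched below. The function name `client_transaction_cost` and the parameter values are assumptions chosen for illustration, not calibrated figures.

```python
def client_transaction_cost(w, w0, theta, gamma):
    """Power-law cost: theta times the sum over assets of |w_i - w0_i|**gamma."""
    return theta * sum(abs(w_i - w0_i) ** gamma for w_i, w0_i in zip(w, w0))


# Made-up positions and parameters for one client with two assets.
w0 = [0.5, 0.5]
small = client_transaction_cost([0.4, 0.6], w0, theta=2.0, gamma=2.0)
large = client_transaction_cost([0.3, 0.7], w0, theta=2.0, gamma=2.0)
print(small, large, large / small)
```

With an exponent above 1, doubling every trade more than doubles the cost, which is how the model captures the growing market impact of larger trades.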

O'Cinneide, Scherer, and Xu (2006) observed 
that key portfolio performance measures, such
as the information ratio (IR), 25 turnover, and total
transaction costs, change under this model
relative to the traditional approach. Not surpris¬ 
ingly, the turnover and the net information ra¬ 
tios of the portfolios obtained with multiaccount 
optimization are lower than those obtained with 
single-account optimization under the assump¬ 
tion that accounts are traded separately, while 
transaction costs are higher. These results are in 
fact more realistic, and they are a better repre¬ 
sentation of the postoptimization performance 
of multiple client accounts in practice. 


ROBUST PARAMETER ESTIMATION

The most commonly used approach for estimat¬ 
ing security expected returns, covariances, and 
other parameters that are inputs to portfolio 
optimization models is to calculate the sample 
analogues from historical data. These are sam¬ 
ple estimates for the parameters we need. It is 
important to remember that when we rely on 
historical data for estimation purposes, we in 
fact assume that the past provides a good rep¬ 
resentation of the future. 

It is well known, however, that expected re¬ 
turns exhibit significant time variation (referred 
to as nonstationarity). They are impacted by 
changes in markets and economic conditions, 
such as interest rates, the political environment, 
consumer confidence, and the business cycles of 
different industry sectors and geographical re¬ 
gions. Consequently, extrapolated historical re¬ 
turns are often poor forecasts of future returns. 

Similarly, the covariance matrix is unsta¬ 
ble over time. Moreover, sample estimates of 
covariances for portfolios with thousands of 
stocks are notoriously unreliable, because we 
need large data sets to estimate them, and such 
large data sets of relevant data are difficult 
to procure. Estimates of the covariance matrix 
based on factor models are often used to reduce 
the number of statistical estimates needed from 
a limited set of data. 

In practice, portfolio managers often alter historical
estimates of different parameters subjectively
or objectively, based on their expectations
and forecasting models for future trends. They 
also use statistical methods for finding estima¬ 
tors that are less sensitive to outliers and other 
sampling errors, such as Bayesian and shrink¬ 
age estimators. A complete review of advanced 
statistical estimation topics is beyond the scope 
of this entry. We provide a brief overview of the 
most widely used concepts. 26 

Shrinkage is a form of averaging different esti¬ 
mators. The shrinkage estimator typically con¬ 
sists of three components: (1) an estimator with 
little or no structure (like the sample mean); 
(2) an estimator with a lot of structure (the 
shrinkage target); and (3) a coefficient that re¬ 
flects the shrinkage intensity. Probably the most 
well-known estimator for expected returns in 
the financial literature was proposed by Jorion 
(1986). The shrinkage target in Jorion's model is 
a vector array with the return on the minimum 
variance portfolio, and the shrinkage intensity 
is determined from a specific formula. 27 Shrinkage
estimators are used for estimates of the covariance
matrix of returns as well, 28 although
equally weighted portfolios of covariance matrix
estimators have been shown to be as effective
as shrinkage estimators. 29
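
The three components of a shrinkage estimator combine as a weighted average. The sketch below shows only this generic form; it does not implement the specific intensity formula of any published estimator, and the function name `shrink`, the target, and the intensity value are illustrative assumptions.

```python
def shrink(sample_estimate, target, intensity):
    """Generic shrinkage: (1 - intensity) * sample + intensity * target."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("shrinkage intensity must lie between 0 and 1")
    return [(1.0 - intensity) * s + intensity * t
            for s, t in zip(sample_estimate, target)]


# Made-up sample means for two stocks, shrunk toward a common target of 5%.
print(shrink([0.10, 0.02], [0.05, 0.05], intensity=0.4))
```

An intensity of 0 returns the unstructured sample estimate unchanged, and an intensity of 1 returns the structured target; values in between pull extreme sample estimates toward the target.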

Bayesian estimation approaches, named af¬ 
ter the English mathematician Thomas Bayes, 
are based on subjective interpretations of the 
probability that a particular event will occur. A 
probability distribution, called the prior dis¬ 
tribution, is used to represent the investor's 
knowledge about the probability before any 
data are observed. After more information is 
gathered (e.g., data are observed), a formula 
(known as Bayes' rule) is used to compute the 
new probability distribution, called the poste¬ 
rior distribution. 
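
One standard textbook case of this updating is a normal prior on an unknown mean combined with normally distributed data of known variance. The sketch below uses that conjugate case purely as an illustration; the function name `normal_posterior` and all numbers are assumptions, and the text itself does not commit to this particular model.

```python
def normal_posterior(prior_mean, prior_var, sample_mean, sample_var, n):
    """Posterior of a normal mean with a normal prior and known data variance.

    The posterior mean is a precision-weighted average of the prior mean
    and the sample mean of n observations.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / sample_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var


# Made-up inputs: prior belief of 5% expected return, sample mean of 9%.
print(normal_posterior(prior_mean=0.05, prior_var=0.0004,
                       sample_mean=0.09, sample_var=0.04, n=100))
```

The more observations there are, or the less certain the prior, the closer the posterior mean moves to the sample mean.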

In the portfolio parameter estimation context, 
a posterior distribution of expected returns is 
derived by combining the forecast from the empirical
data with a prior distribution. One of
the most well-known examples of the application
of the Bayesian framework in this context
is the Black-Litterman model, 30 which produces 
an estimate of future expected returns by com¬ 
bining the market equilibrium returns (i.e., re¬ 
turns that are derived from pricing models and 
observable data) with the investor's subjective 
views. The investor's views are expressed as 
absolute or relative deviations from the equi¬ 
librium together with confidence levels of the 
views (as measured by the standard deviation 
of the views). 

The ability to incorporate exogenous insight, 
such as a portfolio manager's opinion, into 
quantitative forecasting models is important; 
this insight may be the most valuable input to 
the model. The Bayesian framework provides 
a mechanism for forecasting systems to use 
both important traditional information sources 
such as proprietary market data and subjective 
external information sources such as analysts'
forecasts. 

It is important to realize that regardless of 
how sophisticated the estimation and forecast¬ 
ing methods are, they are always subject to 
estimation error. What makes matters worse, 
however, is that different estimation errors 
can accumulate over the different activities of 
the portfolio management process, resulting 
in large aggregate errors at the final stage. It 
is therefore critical that the inputs evaluated 
at each stage are reliable and robust, so that 
the aggregate impact of estimation errors is 
minimized. 

PORTFOLIO RESAMPLING 

Robust parameter estimation is only one part of 
ensuring that the quantitative portfolio man¬ 
agement process as a whole is reliable. It has 
been observed that portfolio allocation schemes 
are very sensitive to small changes in the in¬ 
puts that go into the optimizer. In particular, 
a well-known study by Black and Litterman 31 
demonstrated that in the case of mean-variance 
optimization, small changes in the inputs for 
expected returns had a substantial impact on 
the portfolio composition. "Optimal" portfolios
constructed under conditions of uncertainty 
can have extreme or nonintuitive weights for 
some stocks. 

With advances in computational capabilities 
and new research in the area of optimization 
under uncertainty, practitioners in recent years 
have been able to incorporate considerations 
for uncertainty not only at the estimation, but 
also at the portfolio optimization stage. Meth¬ 
ods for taking into consideration inaccuracies 
in the inputs to the portfolio optimization prob¬ 
lem include simulation (resampling) and robust 
optimization. We explain portfolio resampling in 
this section, and robust portfolio optimization 
in the following section. 

A logical approach to making portfolio alloca¬ 
tion more robust with respect to changes in the 
input parameters is to generate different sce¬ 
narios for the values these parameters can take, 
and to find weights that remain stable for small 
changes in the input parameters. This method is 
referred to as portfolio resampling. 32 To illustrate 
the resampling technique, we explain how it 
is applied to portfolio mean-variance optimiza¬ 
tion. 

Suppose that we have initial estimates for the
expected stock returns, $\hat{\mu}$, and covariance
matrix, $\hat{\Sigma}$, for the N stocks in the portfolio. (We use
"hat" to denote a statistical estimate.)

1. We simulate S samples of N returns from a
multivariate normal distribution with mean
$\hat{\mu}$ and covariance matrix $\hat{\Sigma}$.

2. We use the S samples generated in (1) to compute
S new estimates of vectors of expected
returns $\hat{\mu}_1, \ldots, \hat{\mu}_S$ and covariance matrices
$\hat{\Sigma}_1, \ldots, \hat{\Sigma}_S$.

3. We solve S portfolio optimization problems,
one for each estimated pair of expected
returns and covariances $(\hat{\mu}_s, \hat{\Sigma}_s)$, and save
the weights for the N stocks in a vector
array $w^{(s)}$, where $s = 1, \ldots, S$. (The optimization
problem itself could be any of the standard
mean-variance formulations: maximize
expected return subject to constraints on
risk, minimize risk subject to constraints on
the expected return, or maximize the utility
function.)

4. To find the final portfolio weights, we average
out the weight for each stock over the S
weights found for that stock in each of the S
optimization problems. In other words,

$$ w = \frac{1}{S} \sum_{s=1}^{S} w^{(s)} $$

For example, stock i in the portfolio has final
weight

$$ w_i = \frac{1}{S} \sum_{s=1}^{S} w_i^{(s)} $$


5. Perhaps even more valuable than the aver¬ 
age estimate of the weights obtained from 
the simulation and optimization iterations 
is the probability distribution we obtain for 
the portfolio weights. If we plot the weights 
for each stock obtained over the S iterations, 
$w_i^{(1)}, \ldots, w_i^{(S)}$, we can get a sense for how
variable this stock weight is in the portfolio. 
A large standard deviation computed from 
the distribution of portfolio weight i will 
be an indication that the original portfolio 
weight was not very precise due to estima¬ 
tion error. 
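
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure of any particular vendor or paper: the function names, the number of simulated observations per scenario (`n_obs`), the risk-aversion value, and the input numbers are all hypothetical, and the optimization in step 3 is the utility formulation with only a budget constraint, for which a closed-form solution exists.

```python
import numpy as np


def utility_weights(mu, sigma, risk_aversion):
    """Maximize w'mu - risk_aversion * w'Sigma w subject to sum(w) = 1.

    Closed form from the first-order conditions; no other constraints
    are imposed, so short positions are allowed (an assumption here).
    """
    ones = np.ones(len(mu))
    sigma_inv_mu = np.linalg.solve(sigma, mu)
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    # Lagrange multiplier that enforces the budget constraint.
    gamma = (ones @ sigma_inv_mu - 2.0 * risk_aversion) / (ones @ sigma_inv_ones)
    return (sigma_inv_mu - gamma * sigma_inv_ones) / (2.0 * risk_aversion)


def resample_portfolio(mu_hat, sigma_hat, n_scenarios, n_obs, risk_aversion, seed=0):
    """Return (average weights, per-stock std of weights, all S weight vectors)."""
    rng = np.random.default_rng(seed)
    weights = np.empty((n_scenarios, len(mu_hat)))
    for s in range(n_scenarios):
        # Step 1: simulate n_obs return vectors from N(mu_hat, sigma_hat).
        sample = rng.multivariate_normal(mu_hat, sigma_hat, size=n_obs)
        # Step 2: re-estimate the inputs from the simulated sample.
        mu_s = sample.mean(axis=0)
        sigma_s = np.cov(sample, rowvar=False)
        # Step 3: solve the optimization problem for this scenario.
        weights[s] = utility_weights(mu_s, sigma_s, risk_aversion)
    # Step 4: average the weights; step 5: dispersion of each stock's weight.
    return weights.mean(axis=0), weights.std(axis=0, ddof=1), weights


# Hypothetical inputs for three stocks.
mu_hat = np.array([0.08, 0.10, 0.12])
sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.16]])
w_avg, w_std, w_all = resample_portfolio(mu_hat, sigma_hat, n_scenarios=200,
                                         n_obs=60, risk_aversion=2.0)
print("average weights:", np.round(w_avg, 3))
print("weight std devs:", np.round(w_std, 3))
```

Because every scenario's weights sum to one, the averaged weights do as well. The rows of `w_all` are the raw material for the histogram described in step 5.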


An important question, of course, is how large 
is "large enough." Do we have evidence that 
the portfolios we obtained through resampling 
are statistically different from one another? We 
can evaluate that by using a test statistic. For 
example, it can be shown that the test statistic 

$$ d(w^*, w) = (w^* - w)' \Sigma (w^* - w) $$

follows a chi-square ($\chi^2$) distribution with degrees
of freedom equal to the number of securities
in the portfolio. If the value of this statistic is
statistically "large," then there will be evidence 
that the portfolio weights w* and w are statisti¬ 
cally different. This is an important insight for 
the portfolio manager, and its applications ex¬ 
tend beyond just resampling. Let us provide 
some intuition as to why. 

Suppose that we are considering rebalancing
our current portfolio. Given our forecasts of ex¬ 
pected returns and risk, we could calculate a set 
of new portfolios through the resampling pro¬ 
cedure. Using the test statistic above, we deter¬ 
mine whether the new set of portfolio weights is 
statistically different from our current weights 
and, therefore, whether it would be worthwhile 
to rebalance or not. If we decide that it is worth¬ 
while to rebalance, we could choose any of the 
resampled portfolios that are statistically dif¬ 
ferent from our current portfolio. Which one 
should we choose? A natural choice would be 
to select the portfolio that would lead to the 
lowest transaction costs. The idea of determin¬ 
ing statistically equivalent portfolios, therefore, 
has much wider implications than the ones il¬ 
lustrated in the context of resampling. 

Resampling has its drawbacks:

* Since the resampled portfolio is calculated 
through a simulation procedure in which a 
portfolio optimization problem needs to be 
solved at each step, the approach is compu¬ 
tationally cumbersome, especially for large 
portfolios. There is a trade-off between the 
number of resampling steps and the accuracy 
of estimation of the effect of errors on the port¬ 
folio composition. 

* Due to the averaging in the calculation of 
the final portfolio weights, it is highly likely 
that all stocks will end up with nonzero 
weights. This has implications for the amount 
of transaction costs that will be incurred if the 
final portfolio is to be attained. One possi¬ 
bility is to include constraints that limit both 
the turnover and the number of stocks with 
nonzero weights. As we saw earlier, however, 
the formulation of such constraints adds an¬ 
other level of complexity to the optimization 
problem and will slow down the resampling 
procedure. 

* Since the averaging process happens after the 
optimization problems are solved, the final 
weights may not actually satisfy some of the 
constraints in the optimization formulation.
In general, only convex (such as linear) constraints
are guaranteed to be satisfied by the
averaged final weights. Turnover constraints, 
for example, may not be satisfied. This is a se¬ 
rious limitation of the resampling approach 
for practical applications. 
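
The last drawback is easy to see with a toy example. The sketch below uses made-up weights and a hypothetical cardinality limit (`max_names`), which is a nonconvex constraint: each resampled portfolio holds at most two stocks, yet their average holds four. The linear budget constraint, by contrast, survives the averaging.

```python
def count_nonzero(weights, tol=1e-12):
    """Number of positions with a nonzero weight."""
    return sum(1 for w in weights if abs(w) > tol)


def average_weights(portfolios):
    """Element-wise average of a list of weight vectors."""
    n_assets = len(portfolios[0])
    return [sum(p[i] for p in portfolios) / len(portfolios) for i in range(n_assets)]


# Three hypothetical resampled portfolios; each satisfies the
# cardinality limit of two names and the budget constraint.
portfolios = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
max_names = 2

averaged = average_weights(portfolios)
print("names per scenario:", [count_nonzero(p) for p in portfolios])
print("names after averaging:", count_nonzero(averaged))
print("budget after averaging:", sum(averaged))
```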

Despite these limitations, resampling has 
advantages and presents a good alternative 
to using only point estimates of inputs to the 
optimization problem. 


ROBUST PORTFOLIO OPTIMIZATION

Another way in which uncertainty about the 
inputs can be modeled is by incorporating it di¬ 
rectly into the optimization process. Robust opti¬ 
mization is an intuitive and efficient way to deal 
with uncertainty. Robust portfolio optimization 
does not use the traditional forecasts, such as ex¬ 
pected returns and covariances, but rather un¬ 
certainty sets containing these point estimates. 
An example of such an uncertainty set is a con¬ 
fidence interval around the forecast for each ex¬ 
pected return ("alpha"). This uncertainty shape 
looks like a "box" in the space of the input 
parameters. (See Figure 2(A).) We can also for¬ 
mulate advanced uncertainty sets that incor¬ 
porate more knowledge about the estimation 
error. For instance, a widely used uncertainty 
set is the ellipsoidal uncertainty set, which takes 
into consideration the covariance structure of 
the estimation errors. (See Figure 2(B).) We will 
see examples of both uncertainty sets in this 
section. 

The robust optimization procedure for port¬ 
folio allocation is as follows. First, we specify 
the uncertainty sets around the input param¬ 
eters in the problem. Then, we ask what the 
optimal portfolio allocation is when the input 
parameters take the worst possible value inside 
these uncertainty sets. In effect, we solve an in¬ 
ner problem that determines the worst possi¬ 
ble realization of the uncertain parameters over 
the uncertainty set before we solve the original
problem of optimal portfolio allocation.

Figure 2 (A) Box Uncertainty Set in Three Dimensions; (B) Ellipsoidal Uncertainty Set in Three Dimensions

Let us give a specific example of how the ro¬ 
bust optimization framework can be applied 
in the portfolio optimization context. Consider 
the utility function formulation of the mean- 
variance portfolio allocation problem: 

$$
\begin{aligned}
\max_{w} \quad & w'\mu - \lambda \, w'\Sigma w \\
\text{s.t.} \quad & w'\iota = 1
\end{aligned}
$$


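Suppose that we have estimates $\hat{\mu}$ and $\hat{\Sigma}$ of the
vector of expected returns and the covariance
matrix. Instead of the estimate $\hat{\mu}$, however, we
will consider a set of vectors $\mu$ that are "close"
to $\hat{\mu}$. We define the box uncertainty set

$$ U_\delta(\hat{\mu}) = \{ \mu : |\mu_i - \hat{\mu}_i| \le \delta_i, \; i = 1, \ldots, N \} $$

In words, the set $U_\delta(\hat{\mu})$ contains all vectors
$\mu = (\mu_1, \ldots, \mu_N)$ such that each component $\mu_i$
is in the interval $[\hat{\mu}_i - \delta_i, \hat{\mu}_i + \delta_i]$. We then solve
the following problem:

$$
\begin{aligned}
\max_{w} \quad & \left\{ \min_{\mu \in U_\delta(\hat{\mu})} \{ \mu'w \} - \lambda \, w'\Sigma w \right\} \\
\text{s.t.} \quad & w'\iota = 1
\end{aligned}
$$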


This is called the robust counterpart of the 
original problem. It is a max-min problem that 
searches for the optimal portfolio weights when 
the estimates of the uncertain returns take their 
worst-case values within the prespecified un¬ 
certainty set in the sense that the value of the 
objective function is the worst it can be over all 
possible values for the expected returns in the 
uncertainty set. 

It can be shown 33 that the max-min problem 
above is equivalent to the following problem 

$$
\begin{aligned}
\max_{w} \quad & w'\hat{\mu} - \delta'|w| - \lambda \, w'\Sigma w \\
\text{s.t.} \quad & w'\iota = 1
\end{aligned}
$$

where $|w|$ denotes the absolute value of the entries
of the vector of weights $w$. To gain some
intuition, notice that if the weight of stock i
in the portfolio is negative, the worst-case expected
return for stock i is $\hat{\mu}_i + \delta_i$ (we lose the
largest amount possible). If the weight of stock
i in the portfolio is positive, then the worst-case
expected return for stock i is $\hat{\mu}_i - \delta_i$ (we gain
the smallest amount possible). Observe that
$\hat{\mu}_i w_i - \delta_i |w_i|$ equals $(\hat{\mu}_i - \delta_i) w_i$ if the weight $w_i$
is positive and $(\hat{\mu}_i + \delta_i) w_i$ if the weight $w_i$ is
negative. Hence, the mathematical expression
in the objective agrees with our intuition: it is
the worst-case (that is, the lowest) expected
portfolio return over the uncertainty set.
In this robust version of the mean-variance formulation,
stocks whose mean return estimates
are less accurate (i.e., have a larger estimation
error $\delta_i$) are therefore penalized in the objective
function and will tend to have a smaller weight
in the optimal portfolio allocation.
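
The equivalence between the inner minimization and the penalty term can be checked numerically. The sketch below, with hypothetical weights, estimates, and interval half-widths, compares the closed-form worst-case expected return (the point estimate of the portfolio return minus the sum of `delta[i] * abs(w[i])`) with a brute-force minimum over all corners of the box. Enumerating corners is enough because a linear function attains its minimum over a box at a corner.

```python
import itertools

import numpy as np


def worst_case_return_box(w, mu_hat, delta):
    """Closed form: mu_hat'w - delta'|w|."""
    return float(mu_hat @ w - delta @ np.abs(w))


def worst_case_return_by_enumeration(w, mu_hat, delta):
    """Minimum of mu'w over the 2^N corners of the box around mu_hat."""
    best = np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=len(w)):
        mu = mu_hat + np.array(signs) * delta
        best = min(best, float(mu @ w))
    return best


# Hypothetical portfolio with one short position.
w = np.array([0.6, -0.2, 0.6])
mu_hat = np.array([0.08, 0.10, 0.12])
delta = np.array([0.01, 0.02, 0.03])

closed_form = worst_case_return_box(w, mu_hat, delta)
brute_force = worst_case_return_by_enumeration(w, mu_hat, delta)
print(closed_form, brute_force)
```

The enumeration grows as 2^N and is only a check for small N; the closed form is what makes the robust problem tractable.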

This optimization problem has the same 
computational complexity as the nonrobust 
mean-variance formulation—namely, it can be 
stated as a quadratic optimization problem. The 
latter can be achieved by using a standard trick 
that allows us to get rid of the absolute values 
for the weights. The idea is to introduce an
N-dimensional vector of additional variables
$\psi$ to replace the absolute values $|w|$, and to
write an equivalent version of the optimization
problem,

$$
\begin{aligned}
\max_{w, \psi} \quad & w'\hat{\mu} - \delta'\psi - \lambda \, w'\Sigma w \\
\text{s.t.} \quad & w'\iota = 1 \\
& \psi_i \ge w_i, \quad \psi_i \ge -w_i, \quad i = 1, \ldots, N
\end{aligned}
$$


Therefore, incorporating considerations 
about the uncertainty in the estimates of the 
expected returns in this example has virtually 
no computational cost. 

We can view the effect of this particular "robustification"
of the mean-variance portfolio
optimization formulation in two different ways. 
On the one hand, we can see that the values 
of the expected returns for the different stocks 
have been adjusted downward in the objec¬ 
tive function of the optimization problem. The 
robust optimization model "shrinks" the ex¬ 
pected return of stocks with large estimation 
error, that is, in this case the robust formula¬ 
tion is related to statistical shrinkage methods, 
which we introduced earlier in this entry. On 
the other hand, we can interpret the additional 
term in the objective function as a "risk-like" 
term that represents penalty for estimation er¬ 
ror. The size of the penalty is determined by 
the investor's aversion to estimation risk and is 
reflected in the magnitude of the deltas. 


More complicated specifications for uncer¬ 
tainty sets have more involved mathematical 
representations, but can still be selected so that 
they preserve an easy computational structure 
for the robust optimization problem. For exam¬ 
ple, we can use the ellipsoidal uncertainty set 
from Figure 2(B), which can be expressed math¬ 
ematically as 

$$ U_\delta(\hat{\mu}) = \{ \mu : (\mu - \hat{\mu})' \Sigma_\mu^{-1} (\mu - \hat{\mu}) \le \delta^2 \} $$

Here $\Sigma_\mu$ is the covariance matrix of estimation
errors for the vector of expected returns $\mu$.
This uncertainty set represents the requirement
that the sum of squares (scaled by the inverse
of the covariance matrix of estimation errors)
between all elements in the set and the point
estimates $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_N$ can be no larger than
$\delta^2$. We note that this uncertainty set cannot be
interpreted as individual confidence intervals 
around each point estimate. Instead, it captures 
the idea of a joint confidence region. In practical 
applications, the covariance matrix of estima¬ 
tion errors is often assumed to be diagonal. In 
the latter case, the set contains all vectors of ex¬ 
pected returns that are within a certain number 
of standard deviations from the point estimate 
of the vector of expected returns, and the re¬ 
sulting robust portfolio optimization problem 
would protect the investor if the vector of ex¬ 
pected returns is within that range. 

It can be shown that the robust counterpart of 
the mean-variance portfolio optimization prob¬ 
lem with an ellipsoidal uncertainty set for the 
expected return estimates is the following opti¬ 
mization problem formulation: 

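$$
\begin{aligned}
\max_{w} \quad & w'\hat{\mu} - \lambda \, w'\Sigma w - \delta \sqrt{w'\Sigma_\mu w} \\
\text{s.t.} \quad & w'\iota = 1
\end{aligned}
$$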

This is a second-order cone optimization 
problem and requires specialized software to 
solve, but the methods for solving it are very 
efficient. 
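
As a sanity check on the square-root penalty term, the sketch below evaluates the worst-case expected portfolio return over an ellipsoidal set in closed form, constructs the expected-return vector on the boundary of the ellipsoid that attains it, and confirms that randomly sampled vectors inside the ellipsoid never do worse. All inputs, including the error covariance matrix `sigma_mu` and the radius `delta`, are hypothetical.

```python
import numpy as np


def worst_case_return_ellipsoid(w, mu_hat, sigma_mu, delta):
    """Closed form: mu_hat'w - delta * sqrt(w' Sigma_mu w)."""
    return float(mu_hat @ w - delta * np.sqrt(w @ sigma_mu @ w))


def worst_case_mu(w, mu_hat, sigma_mu, delta):
    """Expected-return vector on the ellipsoid boundary that attains the minimum."""
    return mu_hat - delta * (sigma_mu @ w) / np.sqrt(w @ sigma_mu @ w)


w = np.array([0.5, 0.3, 0.2])
mu_hat = np.array([0.08, 0.10, 0.12])
sigma_mu = np.array([[4.0, 1.0, 0.0],
                     [1.0, 9.0, 2.0],
                     [0.0, 2.0, 16.0]]) * 1e-4
delta = 1.5

closed_form = worst_case_return_ellipsoid(w, mu_hat, sigma_mu, delta)
mu_star = worst_case_mu(w, mu_hat, sigma_mu, delta)

# Sample points inside the ellipsoid: mu_hat + delta * r * L u, where L is the
# Cholesky factor of sigma_mu, u is a unit vector, and r lies in [0, 1].
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(sigma_mu)
directions = rng.standard_normal((1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = rng.uniform(0.0, 1.0, size=(1000, 1))
mu_samples = mu_hat + delta * (radii * directions) @ chol.T
sampled_min = float((mu_samples @ w).min())

print("closed form:", closed_form)
print("lowest sampled value:", sampled_min)
```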

Similarly to the case of the robust counterpart
with a box uncertainty set, we can interpret
the extra term in the objective function
($\delta \sqrt{w'\Sigma_\mu w}$) as the penalty for estimation risk,
where $\delta$ incorporates the degree of the investor's
aversion to estimation risk. Note, by
the way, that the covariance matrix in the estimation
error penalty term, $\Sigma_\mu$, is not necessarily
the same as the covariance matrix of returns
$\Sigma$. In fact, it is not immediately obvious how $\Sigma_\mu$
can be estimated from data. $\Sigma_\mu$ is the covariance
matrix of the errors in the estimation of
the expected (average) returns. Thus, if a port¬ 
folio manager forecasts 5% active return over 
the next time period, but gets 1%, the manager 
cannot argue that there was a 4% error in the 
expected return—the actual error would con¬ 
sist of both an estimation error in the expected 
return and the inherent volatility in actual re¬ 
alized returns. In fact, critics of the approach 
such as Lee, Stefek, and Zhelenyak (2006) have 
argued that the realized returns typically have 
large stochastic components that dwarf the expected
returns, and hence estimating $\Sigma_\mu$ from
data is very hard, if not impossible.

Several approximate methods for estimating
$\Sigma_\mu$ have been found to work well in practice.
For example, Stubbs and Vance (2005) observe 
that simpler estimation approaches, such as 
using just the diagonal matrix containing the 
variances of the estimates (as opposed to the 
complete error covariance matrix), often pro¬ 
vide most of the benefit in robust portfolio opti¬ 
mization. In addition, standard approaches for 
estimating expected returns, such as Bayesian 
statistics and regression-based methods, can 
produce estimates for the estimation error covariance
matrix in the process of generating the
estimates themselves. 34 
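
To make the diagonal approximation concrete, here is a minimal sketch under one specific assumption that the text does not impose: expected returns are estimated by the sample mean of T independent observations, in which case the variance of each estimate is the return variance divided by T. The function name and the simulated return history are hypothetical.

```python
import numpy as np


def diagonal_error_covariance(returns):
    """Diagonal error covariance for sample-mean estimates of expected returns.

    returns: array of shape (T, N), one row per period.
    Each diagonal entry is var_i / T; off-diagonal entries are set to zero.
    """
    returns = np.asarray(returns, dtype=float)
    n_obs = returns.shape[0]
    return np.diag(returns.var(axis=0, ddof=1) / n_obs)


# Hypothetical history: 120 periods for three stocks.
rng = np.random.default_rng(1)
returns = rng.normal(loc=0.01, scale=[0.04, 0.06, 0.08], size=(120, 3))
sigma_mu_diag = diagonal_error_covariance(returns)
print(np.sqrt(np.diag(sigma_mu_diag)))  # standard errors of the mean estimates
```

With other estimators (for example, Bayesian or regression-based forecasts), the error variances would come from the estimation procedure itself rather than from this formula.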

Among practitioners, the notion of robust 
portfolio optimization is often equated with the 
robust mean-variance model we discussed in 
this section, with the box or the ellipsoidal un¬ 
certainty sets for the expected stock returns. 
While robust optimization applications often 
involve one form or another of this model, the 
actual scope of robust optimization can be much 
broader. We note that the term robust optimization
refers to the technique of incorporating information
about uncertainty sets for the parameters
in the optimization model, and not
to the specific definitions of uncertainty sets 
or the choice of parameters to model as un¬ 
certain. For example, we can use the robust 
optimization methodology to incorporate con¬ 
siderations for uncertainty in the estimate of 
the covariance matrix in addition to the un¬ 
certainty in expected returns, and obtain a dif¬ 
ferent robust portfolio allocation formulation. 
Robust optimization can be applied also to 
portfolio allocation models that are different 
from the mean-variance framework, such as 
Sharpe ratio optimization and value-at-risk 
optimization. 35 Finally, robust optimization has 
the potential to provide a computationally effi¬ 
cient way to handle portfolio optimization over 
multiple stages—a problem for which so far 
there have been few satisfactory solutions. 36 
There are numerous useful robust formulations, 
but a complete review is beyond the scope of 
this entry. 37 

Is implementing robust optimization formu¬ 
lations worthwhile? Some tests with simulated 
and real market data indicate that robust op¬ 
timization, when inaccuracy is assumed in the 
expected return estimates, outperforms classi¬ 
cal mean-variance optimization in terms of total 
excess return a large percentage (70-80%) of the 
time (see, for example, Ceria and Stubbs, 2006). 
Other tests have not been as conclusive (see, for 
example, Lee, Stefek, and Zhelenyak, 2006). The 
factor that accounts for much of the difference is 
how the uncertainty in parameters is modeled. 
Therefore, finding a suitable degree of robust¬ 
ness and appropriate definitions of uncertainty 
sets can have a significant impact on portfolio 
performance. 

Independent tests by practitioners and aca¬ 
demics using both simulated and market data 
appear to confirm that robust optimization gen¬ 
erally results in more stable portfolio weights, 
that is, that it eliminates the extreme corner
solutions resulting from traditional mean- 
variance optimization. This fact has implica¬ 
tions for portfolio rebalancing in the presence of 
transaction costs and taxes, as transaction costs 
and taxes can add substantial expenses when
the portfolio is rebalanced. Depending on the 
particular robust formulations employed, ro¬ 
bust mean-variance optimization also appears 
to improve worst-case portfolio performance 
and results in smoother and more consistent 
portfolio returns. Finally, by preventing large 
swings in positions, robust optimization typi¬ 
cally makes better use of the turnover budget 
and risk constraints. 

Robust optimization, however, is not a 
panacea. By using robust portfolio optimization 
formulations, investors are likely to trade off the 
optimality of their portfolio allocation in cases 
in which nature behaves as they predicted for 
protection against the risk of inaccurate estima¬ 
tion. Therefore, investors using the technique 
should not expect to do better than classical 
portfolio optimization when estimation errors 
have little impact, or when typical scenarios oc¬ 
cur. They should, however, expect insurance in 
scenarios in which their estimates deviate from 
the actual realized values by up to the amount 
they have prespecified in the modeling process. 

KEY POINTS 

• Commonly used constraints in practice in¬ 
clude long-only (no short-selling) constraints, 
turnover constraints, holding constraints, risk 
factor constraints, and tracking error con¬ 
straints. These constraints can be handled in 
a straightforward way by the same type of 
optimization algorithms used for solving the 
classical mean-variance portfolio allocation 
problem. 

• Minimum holding constraints, transaction 
size constraints, cardinality constraints, and 
round-lot constraints are also widely used in 
practice, but their nature is such that they re¬ 
quire binary and integer modeling, which ne¬ 
cessitates the use of mixed-integer and other 
specialized optimization solvers. 

• Transaction costs can easily be incorporated 
in standard portfolio allocation models. Typical
functions for representing transaction
costs include linear, piecewise linear, and
quadratic.

• Taxes can have a dramatic effect on portfolio 
returns; however, it is difficult to incorporate 
them into the classical portfolio optimization 
framework. Their importance to the individ¬ 
ual investor is a strong argument for taking 
a multiperiod view of investments, but the 
computational burden of multiperiod port¬ 
folio optimization formulations with taxes is 
extremely high. 

• For investment managers who handle multi¬ 
ple accounts, increased transaction costs be¬ 
cause of the market impact of simultaneous 
trades can be an important practical issue and 
should be taken into consideration when in¬ 
dividual clients' portfolio allocation decisions 
are made to ensure fairness across accounts. 

• As the use of quantitative techniques has 
become widespread in the investment in¬ 
dustry, the consideration of estimation risk 
and model risk has grown in importance. 
Methods for robust statistical estimation of 
parameters include shrinkage and Bayesian 
techniques. 

• Portfolio resampling is a technique that uses 
simulation to generate multiple scenarios for 
possible values of the input parameters in the 
portfolio optimization problem and aims to 
determine portfolio weights that remain sta¬ 
ble with respect to small changes in model 
parameters. 

• Robust portfolio optimization incorporates 
uncertainty directly into the optimization 
process. The uncertain parameters in the op¬ 
timization problem are assumed to vary in 
prespecified uncertainty sets that are selected 
subjectively or based on data. 


NOTES 

1. See Chapter 1 in Maginn and Tuttle (1990). 

2. Multiperiod portfolio optimization models 
are still rarely used in practice, not be¬ 
cause the value of multiperiod modeling is 
questioned, but because such models are
often too intractable from a computational 
perspective. 

3. As the term intuitively implies, the ADV 
measures the total amount of a given asset 
traded in a day on average, where the aver¬ 
age is taken over a prespecified time period. 

4. Another computationally tractable situa¬ 
tion for minimizing CVaR is when the data 
are normally distributed. In that case, min¬ 
imizing CVaR is equivalent to minimizing 
the standard deviation of the portfolio. 

5. See Chapters 8 and 9 in Pachamanova and 
Fabozzi (2010) for a more detailed explana¬ 
tion of CVaR and a derivation of the opti¬ 
mization formulation. 

6. Versions of this model have been suggested 
in Pogue (1970), Schreiner (1980), Adcock 
and Meade (1994), Lobo, Fazel, and Boyd 
(2000), and Mitchell and Braun (2004). 

7. Here we are thinking of $w_i$ as the portfolio
weights, but in fact it may be more intuitive
to think of the transaction costs as a percentage
of amount traded. It is easy to go
back and forth between portfolio weights
and portfolio amounts by simply multiplying
$w_i$ by the total amount in the portfolio.
In fact, we can switch the whole portfolio 
optimization formulation around and write 
it in terms of allocation of dollars, instead of 
weights. We just need to replace the vector 
of weights w by a vector x of dollar hold¬ 
ings. 

8. See, for example, Bertsimas, Darnell, and 
Soucy (1999). 

9. As we explained earlier, this constraint can 
be written in an equivalent, more optimiza¬ 
tion solver-friendly form, namely, 

$$ y_i \ge w_i - w_{0,i} $$

$$ y_i \ge -(w_i - w_{0,i}) $$

10. The computation of the tax basis is different 
for stocks and bonds. For bonds, there are 
special tax rules, and the original price is 
not the tax basis. 


11. The exact rates vary depending on the cur¬ 
rent version of the tax code, but the main 
idea behind the preferential treatment of 
long-term gains to short-term gains is to en¬ 
courage long-term capital investments and 
fund entrepreneurial activity. 

12. See Stein (1998). 

13. See Apelfeld, Fowler, and Gordon (1996) 
who show that a manager can outperform 
on an after-tax basis with high turnover as 
well, as long as the turnover does not result 
in net capital gains taxes. (There are other 
issues with high turnover, however, such as 
higher transaction costs that may result in a 
lower overall portfolio return.) 

14. Dividends are taxed as regular income, i.e., 
at a higher rate than capital gains, so mini¬ 
mizing the portfolio dividend yield should 
theoretically result in a lower tax burden for 
the investor. 

15. See Constantinides (1983), Dammon and 
Spatt (1996), and Dammon, Spatt, and 
Zhang (2001 and 2004). 

16. The Securities and Exchange Commission 
(SEC) in general prohibits cross-trading but 
does provide exemptions if prior to the ex¬ 
ecution of the cross trade the asset man¬ 
ager can demonstrate to the SEC that a 
particular cross trade benefits both parties. 
Similarly, Section 406(b)(3) of the Employee 
Retirement Income Security Act of 1974 
(ERISA) forbids cross-trading, but there 
is a new cross-trading exemption in Section
408(b)(19) adopted in the Pension Protec¬ 
tion Act of 2006. 

17. Khodadadi, Tutuncu, and Zangari (2006). 

18. The iterative procedure is known to con¬ 
verge to the equilibrium, however, under 
special conditions. See O'Cinneide, Scherer, 
and Xu (2006). 

19. The issue of considering transaction costs 
in multiaccount optimization has been dis¬ 
cussed by others as well. See, for example, 
Bertsimas, Darnell, and Soucy (1999). 

20. As we mentioned earlier in this entry, realis¬ 
tic transaction costs are in fact described by 
nonlinear functions, because costs per share
traded typically increase with the size of the 
trade due to market impact. 

21. For example, if asset i is a euro-pound for¬ 
ward, then a trade in that asset can also 
be implemented as a euro-dollar forward 
plus a pound-dollar forward, so there will be two
additional assets in the aggregate trade 
vector t. 

22. Note that $\eta_{k,i}$ equals 1 if $w_{k,i}$ is the actual
dollar holdings.

23. Note that, similarly to $w_k^{+}$, we could introduce
additional sell variables $w_k^{-}$, but this is
not necessary. By expressing aggregate sales 
through aggregate buys and total trades, we 
reduce the dimension of the optimization 
problem, because there are fewer decision 
variables. This would make a difference for 
the speed of obtaining a solution, especially 
in the case of large portfolios and compli¬ 
cated representation of transaction costs. 

24. Note that $\gamma = 1$ defines linear transaction
costs. For linear transaction costs, multi¬ 
account optimization produces the same 
allocation as single-account optimization, 
because linear transaction costs assume that 
an increased aggregate amount of trading 
does not have an impact on prices. 

25. The information ratio is the ratio of (annu¬ 
alized) portfolio residual return (alpha) to 
(annualized) portfolio residual risk, where 
risk is defined as standard deviation. 

26. For further details, see Chapters 6, 7, and 
8 in Fabozzi, Kolm, Pachamanova, and 
Focardi (2007). 

27. See Chapter 8 in Fabozzi, Kolm, 
Pachamanova, and Focardi (2007). 

28. See, for example, Ledoit and Wolf (2003). 

29. For an overview of such models, see Disatnik
and Benninga (2007).

30. For a step-by-step description of the Black- 
Litterman model, see Chapter 8 in Fabozzi, 
Kolm, Pachamanova, and Focardi (2007). 

31. See Black and Litterman (1992). 

32. See Michaud (1998), Jorion (1992), and 
Scherer (2002). 


33. For derivation, see, for example, Chapter 12
in Fabozzi, Kolm, Pachamanova, and Fo¬ 
cardi (2007) or Chapter 9 in Pachamanova 
and Fabozzi (2010). 

34. For a more in-depth coverage of the topic of 
estimating input parameters for robust op¬ 
timization formulations, see Chapter 12 in 
Fabozzi, Kolm, Pachamanova, and Focardi 
(2007). 

35. See, for example, Goldfarb and Iyengar 
(2003) and Natarajan, Pachamanova, and 
Sim (2008). 

36. See Ben-Tal, Margalit, and Nemirovski 
(2000) and Bertsimas and Pachamanova 
(2008). 

37. For further details, see Fabozzi, Kolm, 
Pachamanova, and Focardi (2007). 

Abstract: Quantitative equity investing is one method used by investors to identify attractive stocks
and gain a competitive advantage. In contrast to fundamental investors who focus on a single 
company at a time, quantitative investors focus on stock characteristics. Quantitative investors 
look for sources of information or company characteristics that help to explain why one stock 
outperforms another stock. They assemble a group of characteristics into a unique stock selection 
model, which is the core of the quantitative investment process. The quantitative investment 
process can be divided into three main phases: research, portfolio construction, and monitoring. 
During the research phase, the stock selection model is created. During the portfolio construction 
phase, the quantitative investor uses the stock selection model to create a live portfolio. Finally, 
during the monitoring phase, the quantitative investor makes sure the portfolio is performing 
as expected and modifies it as needed. While quantitative investing can be very different from 
fundamental investing, they are complementary and combined can lead to a more well-rounded 
overall investment approach. 


The goal of this entry is to provide the basics 
of quantitative equity investing and an explana¬ 
tion of the quantitative investing process. More 
specifically, I focus on the following three ques¬ 
tions. First, how do quantitative and fundamen¬ 
tal equity investors differ? Second, what are the 
core steps in a quantitative equity investment 
process? Finally, what are the basic building 
blocks used by quantitative equity investors? 

In answering these questions, I will pull back 
the curtain on the quantitative equity invest¬ 
ment process, showing how it is similar to 
many other approaches, all searching for the 
best stocks. Where it differs is in the creation of 
a repeatable process that uses several key criteria
to find the most attractive companies: its
stock selection model. Finally, some of the most 
common techniques used by quantitative eq¬ 
uity investors are covered. 

It is important to understand that this entry 
is dedicated to a traditional quantitative eq¬ 
uity investing approach. There are many other 
types of investing that are quantitative in na¬ 
ture (e.g., high-frequency trading, statistical 
arbitrage, etc.), which will not be covered. 

EQUITY INVESTING 

Investing can take many forms, but it starts 
with an investor assigning a value to a security. 

Figure 1 The Value of a Stock Comes from Multiple Information Sources


Whether this value exceeds or is less than 
the current market price usually determines 
whether the investor will buy or sell the 
security. In the case of equities, the investor
often seeks to understand the specific company 
under consideration, the broader economic en¬ 
vironment, and the interplay between the two. 
This encompasses a wide range of information 
for the investor to consider as displayed in 
Figure 1. How this information is used differ¬ 
entiates the quantitative from the fundamental 
investor. 

FUNDAMENTAL VS. QUANTITATIVE INVESTOR

Let's start with a basic question. How do port¬ 
folio managers select stocks from a broad uni¬ 
verse of 1,000 or more companies? 

Fundamental managers start with a basic 
company screen. For instance, they may first 
look for companies that satisfy conditions such 
as a price-earnings (P/E) ratio that is less than 
15, earnings growth greater than 10%, and profit 
margins in excess of 20%. Filtering by those 
characteristics may result in, say, 200 potential 
candidates. Next, portfolio managers in consultation
with their group of stock analysts will
spend the majority of their time thoroughly
reviewing each of the potential candidates to 
arrive at the best 50 to 100 stocks for their 
portfolio. Quantitative managers, in contrast, 
spend the bulk of their time determining the 
characteristics for the initial stock screen, their 
stock selection model. They will look for five 
or more unique characteristics that are good at 
identifying the most attractive 200 stocks of the 
universe. Quantitative managers will then pur¬ 
chase all 200 stocks for their portfolio. 
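
The initial screen described above amounts to a simple filter over company data. The following sketch applies the three example thresholds from the text (P/E below 15, earnings growth above 10%, profit margin above 20%) to a made-up universe; the tickers, field names, and numbers are purely illustrative.

```python
def passes_screen(stock, max_pe=15.0, min_growth=0.10, min_margin=0.20):
    """True if the stock satisfies all three screening conditions."""
    return (stock["pe"] < max_pe
            and stock["earnings_growth"] > min_growth
            and stock["profit_margin"] > min_margin)


# Hypothetical universe of four companies.
universe = [
    {"ticker": "AAA", "pe": 12.0, "earnings_growth": 0.14, "profit_margin": 0.25},
    {"ticker": "BBB", "pe": 22.0, "earnings_growth": 0.30, "profit_margin": 0.35},
    {"ticker": "CCC", "pe": 9.0, "earnings_growth": 0.05, "profit_margin": 0.22},
    {"ticker": "DDD", "pe": 14.5, "earnings_growth": 0.11, "profit_margin": 0.21},
]

candidates = [s["ticker"] for s in universe if passes_screen(s)]
print(candidates)  # ['AAA', 'DDD']
```

For a fundamental manager, the output is the starting list for in-depth company review; for a quantitative manager, the choice of characteristics and thresholds is itself the object of research.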

So let's expand on how these two investors— 
fundamental and quantitative—differ. Figure 2 
details the main attributes of the two ap¬ 
proaches discussed further below. 

Focus: Company versus Characteristic: The 
fundamental investor's primary analysis is 
on a single company at a time, while the 
quantitative investor's primary analysis is 
on a single characteristic at a time. For ex¬ 
ample, a fundamental investor may analyze 
a health care company to assess whether a 
company's sales prospects look strong and 
whether this stronger sales growth is re¬ 
flected in the company's current stock price. 
A quantitative investor may also invest in 
a company based on its sales growth, but 
Figure 2 Fundamental vs. Quantitative Investor: Viewing Information


will start by assessing the sales growth 
characteristic. The quantitative investor will 
determine whether stocks within the group, 
health care companies, with higher sales 
growth also have higher stock returns. If they 
do, then the quantitative investor will buy 
health care stocks with higher sales growth. 
In the end, both types of investors may buy
a stock due to its good sales prospects, but
each comes at the decision from a different
point of view.

Narrow vs. Broad: Fundamental investors fo¬ 
cus their attention narrowly on a small group 
of stocks. They cover fewer companies since 
they make more in-depth reviews of each 
company. Fundamental investors immerse 
themselves in the company, studying ev¬ 
erything from financial information, to new 
products, to meeting management. Ideally, 
they are searching for exploitable differences 
between their detailed assessment of the 
company's value and the market's percep¬ 
tion of that value. In contrast, quantitative 
investors focus more broadly. Rather than re¬ 
viewing one company at a time, they look 
across a large group of companies. Quan¬ 


titative investors focus on what separates 
companies from one another; they search for 
pieces of information (characteristics) that 
they can use to exploit differences between 
securities. Since they are dealing with a great 
deal of data from a large number of compa¬ 
nies, they employ quantitative techniques to 
quickly sift through the information. 

Position Concentration/Size of Bets: Another 
difference in the two approaches is the size 
of the positions within a portfolio; they tend 
to be larger for a fundamental investor and 
smaller for a quantitative investor. Funda¬ 
mental investors perform in-depth company 
analysis so they will have greater convic¬ 
tion in taking larger positions in their se¬ 
lected stocks. Quantitative investors perform 
in-depth analysis across a group of compa¬ 
nies, so they will tend to spread their bets 
across this larger group of companies. 

Risk Perspective: The fundamental investor 
sees risk at the company level while the 
quantitative investor is more focused at the 
portfolio level. Fundamental investors will 
review the risk to both their forecasts and 
catalysts for the company. They understand 
Figure 3 Fundamental vs. Quantitative Investor: Process Differences


how a changing macro picture can impact 
their valuation of the company. In contrast, 
the quantitative investor's broader view re¬ 
lates to understanding the risks across the 
portfolio. They understand if there are risk 
characteristics in their portfolio that are 
different from their chosen stock selection 
model. For example, a quantitative investor 
who does not believe growth prospects mat¬ 
ter to a company's stock performance would 
want to investigate if the model had the 
investor buying many very high- or low- 
growth companies. 

Past vs. Future: Finally, the fundamental in¬ 
vestor often places greater emphasis on the 
future prospects of the company while the 
quantitative investor studies the company's 
past. Fundamental investors tend to paint 
a picture of the company's future; they will 
craft a story around the company and its 
prospects; and they will look for catalysts 
generating future growth for a company. 
They rely on their ability to predict changes 
in a company. In contrast, the quantitative 
investor places more emphasis on the past, 
using what is known or has been reported 
by a company. Quantitative investors rely 
on historical accounting data as well as 
historical strategy simulations, or backtests, 
to search for the best company character¬ 
istics to select stocks. For instance, they 
will look at whether technology companies 
with stronger profitability have performed 
better than those without, or whether retail 
companies with stronger inventory controls 
have performed better than those without. 


Quantitative investors are looking for stock 
picking criteria that can be tested and 
incorporated into a stock selection model. 

In the end, we have two types of investors 
viewing information, often the same infor¬ 
mation, quite differently. The fundamental 
investor is a journalist focused on crafting a 
unique story of a company's future prospects 
and predicting the potential for gain in the 
company's stock. The quantitative investor 
is a scientist, broadly focused, relying on 
historical information to differentiate across 
all companies, using statistical techniques to 
create a stock selection model. 

These two investors can and often do create 
different portfolios based on their different ap¬ 
proaches as shown in Figure 3. Fundamental 
investors are more focused, with higher con¬ 
viction in their stocks resulting in fewer, larger 
positions in their portfolios. Quantitative in¬ 
vestors, reviewing a large group of companies, 
generally take a large number of smaller posi¬ 
tions in their portfolio. Fundamental investors 
are investing in a stock (or sector) and there¬ 
fore are most concerned with how much each 
of their stocks (or sectors) is contributing to 
performance. Quantitative investors are invest¬ 
ing in a characteristic and how well it differ¬ 
entiates stocks. They want to know how each 
of their characteristics is contributing to per¬ 
formance. Finally, fundamental investors' de¬ 
tailed view into the company allows them to 
understand the intrinsic risk of each investment 
they make—the potential stumbling blocks for 
each company. Quantitative investors' goal is to 
"Quantamental"

• Broad-based analysis followed by in-depth company analysis

• Repeatable process (science) followed by qualitative judgment (journalist)

• What worked in the past combined with future prospects

• Individual stock assessment with overall portfolio assessment

• Understanding of performance across company/sector and style


Figure 4 Benefits of a Combined Fundamental and Quantitative Approach


understand specific characteristics across a 
broad universe of stocks. They look at risks 
across their entire portfolio, attempting to di¬ 
versify away any firm-specific risks ancillary to 
their strategy. 

Now that you understand the basic differ¬ 
ences between the two approaches, it might also 
be clear how using both investment styles can 
be very appealing. As Figure 4 shows, the two 
styles are quite complementary in nature and 
can provide a robust, well-rounded view of a 
company or portfolio. Combining the two ap¬ 
proaches provides the following benefits: 

• Breadth and depth. In-depth analysis across a 
large group of stocks selecting the best subset 
of companies, which is followed by in-depth 
review of the small subset of attractive com¬ 
panies. 

• Facts balanced with human insight. The sci¬ 
entific approach reviewing large amounts of 
data across many companies complemented 
by personal judgment at the company level. 

• Past and future perspective. A detailed his¬ 
torical review of companies combined with a 


review of future potential prospects of a com¬ 
pany. 

• Full risk analysis. A broad look at risk both
within each company owned and across the 
entire portfolio. 

• Clear portfolio performance. A thorough un¬ 
derstanding of which companies, sectors, and 
characteristics are driving a portfolio's perfor¬ 
mance. 

In fact, over the years, the defining line 
between the two approaches has been blurring. 
Some have coined a term for this joint process: 
"quantamental." Many investment managers 
are combining both approaches in one invest¬ 
ment process, which is why whether you are 
a fundamental or quantitative investor, it is 
important to understand both perspectives. 

Given our preceding discussion, the distinc¬ 
tion between the quantitative and fundamental 
approaches should now be better appreciated. 
In the remainder of this entry we restrict our 
focus to the quantitative equity investment 
process, addressing the last two topics listed at 
the beginning of this entry: the core steps in a 
quantitative equity investment process and
some of the basic building blocks used by 
quantitative investors. 


THE QUANTITATIVE STOCK 
SELECTION MODEL 

Before diving into the details of the quantita¬ 
tive investment process, let's look at what is 
at its core—the stock selection model. As ex¬ 
plained in the previous section, the quantitative 
investment approach is rooted in understand¬ 
ing what separates strong-performing stocks 
from weak-performing stocks. 1 The quantita¬ 
tive investor looks for sources of information 
or company characteristics (often referred to as 
factors or signals) 2 that help to explain why one 
stock outperforms another stock. They assem¬ 
ble these characteristics into a stock selection 
model, which can be run daily to provide an 
updated view on every stock in their invest¬ 
ment universe. 

The stock selection model is at the heart of 
the quantitative process. To build the model, 
the quantitative investor will look throughout 
history and see what characteristics drive per¬ 
formance differences between stocks in a group 
such as a universe (e.g., small cap, small-cap
value, or large-cap growth) or a sector (e.g.,
technology, financials, materials).

The quantitative investor's typical stock se¬ 
lection methodology is buying stocks with the 
most attractive attributes and not investing in 
(or shorting, if permitted by investment guide¬ 
lines) stocks with the least attractive attributes. 
For instance, let's suppose retail stocks that 
have the highest profitability tend to have 
higher stock returns than those with the low¬ 
est profitability. In this case, if a retail stock had 
strong profitability, there is a greater chance 
a portfolio manager would purchase it. Prof¬ 
itability is just one characteristic of a company. 
The quantitative investor will look at a large 
number of characteristics, from 25 to over 100, 
to include in the stock selection model. In the 


Figure 5 Sample Stock Selection Model for the
Retail Sector (characteristics shown: Gross
Margin, Earnings Growth, Analyst Recommendation,
Inventory Management, Earnings/Price Ratio)

end, they will narrow their final model to a few 
characteristics that are best at locating perfor¬ 
mance differences among stocks in a particular 
universe or sector. 

Figure 5 is an example of a stock selection 
model for the retail sector. If a stock has good
margins, positive earnings growth, favorable
sell-side analyst recommendations, and solid
inventory management, and is attractively
valued, especially relative to earnings, then the
quantitative investor would buy it. If it did not
have these characteristics, a quantitative
investor would not own it, would sell it if
already held, or would short it. This example is
for the retail sector; a quantitative investor
could also
have different models to select stocks in the 
bank sector or utilities sector or among small- 
cap value stocks. 

So how does a quantitative investor create 
and use the stock selection model? A good anal¬ 
ogy is a professional golfer. Like a quantitative 
investor, golfers create a model of their game. 
First, golfers analyze all elements of their ba¬ 
sic swing from backswing to follow through. 
They then alter their swing to different con¬ 
ditions (high winds, rain, cold), and different 
course types (links, woodlands, fast greens).
Next, golfers put their model into action. While 
they are golfing, they make mental notes about 
what is and isn't working to help enhance their 
game. Could they tweak their swing? What has 
been effective under the current weather condi¬ 
tions? How are they playing this type of course? 

Overall, the golfers' process is much like 
quantitative investors' process. They create a 
model, implement it, and then monitor it, as¬ 
sessing their ability to shoot below par. Like 
professional golfers who go to the driving range 
for countless hours to perfect their swing, quan¬ 
titative investors will spend countless hours 
perfecting their model, understanding how it 
works under many different market (weather / 
course) conditions. 

With that analogy in mind, we now turn to 
the entire quantitative investment process. 

THE OVERALL 
QUANTITATIVE 
INVESTMENT PROCESS 

The quantitative process can be divided into 
the following three main phases (shown in 
Figure 6): 

• Research 

• Portfolio construction 

• Monitoring 

During the research phase, the stock selec¬ 
tion model is created. During the portfolio 
construction phase, the quantitative investor 


"productionalizes" the stock selection model or 
gets it ready to invest in a live portfolio. Finally, 
during the monitoring phase, the quantitative 
investor makes sure the portfolio is performing 
as expected. 

RESEARCH 

Let's start with the research phase since it is the 
basic building block of the quantitative process. 
It is where the fact-finding mission begins. This 
is similar to when the golfer spends countless 
hours at the driving range perfecting his (or 
her) swing. In this phase, the quantitative in¬ 
vestor determines what aspects of a company 
make its stock attractive or unattractive. The 
research phase begins with quantitative investors
testing all the characteristics they have
at their disposal, and it finishes with assembling 
the chosen characteristics into a stock selection 
model (see Figure 7). 

1. Characteristic Testing. First, quantitative in¬ 
vestors determine which characteristics are 
good at differentiating strong-performing 
from weak-performing stocks. Initially, the 
quantitative investor segments the stocks. 
This could be by sector, such as consumer 
discretionary; industry, such as consumer 
electronics; or a universe, such as small- 
cap value stocks. Once the stocks have 
been grouped, each of the characteristics 
is tested to see if it can delineate the 
strong-performing stocks from the weak- 
performing stocks. 
Figure 6 Three Core Phases of the Quantitative Equity Investment Process

Figure 7 Two Core Steps in the Research Phase of the Quantitative Equity Investment Process


2. Model Creation. Second, quantitative in¬ 
vestors select the final characteristics that are 
best at picking the most attractive stocks. 
Then they weight each characteristic in the 
stock selection model—determining which 
characteristics should be more relied upon 
when picking stocks, or if they all should be 
treated equally. 

During the research phase, the quantitative 
investor tries to get a broad picture of a char¬ 
acteristic, making sure it performs well under 
a diverse set of conditions and performance 
measures. For testing, the quantitative investor 


looks at historical information over 20 years or 
more in order to cover multiple market cycles. 
While testing, many performance metrics 
are reviewed to get an expansive view of a 
characteristic's ability to differentiate stocks. 
These metrics span the return category, risk 
category, and other metrics as outlined in 
Figure 8. Using an array of metrics, quanti¬ 
tative investors are better able to confirm a 
characteristic's consistency. They make sure 
that the selected characteristics score well on 
more than a single metric. Before continuing 
with the research process, let's review a few of 
the more commonly used metrics. 
Figure 8 Characteristic Testing in the Research Phase




Figure 9 Determining the Characteristic's
Quintile Spread (panels: Profit Margin and
Monthly Return for Companies A to T; Top
Quintile = 2.6%, Bottom Quintile = 0.6%,
Quintile Spread = 2.6% - 0.6% = 2.0%)

Characteristic Testing: Key 
Quantitative Research Metrics 

In this section we will review quintile returns 
and information coefficients, which measure 
whether a characteristic can differentiate be¬ 
tween winning and losing stocks. Although 
profitability was chosen for the examples, other 
characteristics such as sales growth, P/E ratio, 
or asset turnover also could have been chosen. 

Quintile Returns 

The quintile return is already prevalent across
most research publications and is gaining
popularity in mainstream publications such as
the Wall Street Journal, Barron's, and
the like. Quintile returns measure how well a
characteristic differentiates stocks. In essence, 
the stocks that are being reviewed are seg¬ 
mented into five groups (quintiles) and then 
are tested to determine if the companies in the 
group with the best attributes (top quintile) out¬ 


perform the group with the least desirable at¬ 
tributes (bottom quintile). 

Figure 9 provides an example. In this exam¬ 
ple, we start with 20 companies that we refer 
to as A through T. The first step—the left-hand 
side of the exhibit—is to order the 20 compa¬ 
nies by profitability from highest to lowest. In 
the second step, this ordered list is divided into 
five groups, creating a most profitable group 
(top quintile) down to the least profitable group 
(bottom quintile). The top and bottom quintile 
groups are boxed on the right-hand chart of the 
figure. Finally, the performance of the top quin¬ 
tile is compared to the bottom quintile. 

As Figure 9 shows, the stocks with highest 
profitability (top quintile) returned 2.6% while 
the stocks with the lowest profitability (bottom 
quintile) returned only 0.6%. So the top quintile 
stocks outperformed the bottom quintile stocks 
by 2.0%, meaning for this month, the most prof¬ 
itable companies outperformed the least prof¬ 
itable companies by 2.0%. This is commonly 
Figure 10 Determining the Characteristic's Information Coefficient


referred to as the characteristic's quintile re¬ 
turn or quintile spread. The higher the quintile 
spread, the more attractive the characteristic is. 
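
The quintile spread calculation can be sketched as follows. The 20 profit margins and monthly returns below are hypothetical values (not the data behind Figure 9), chosen so that the top and bottom quintile averages match the 2.6% and 0.6% quoted in the text; the function name `quintile_spread` is likewise an assumption.

```python
from statistics import mean


def quintile_spread(characteristic, returns):
    """Mean return of the top quintile minus mean return of the bottom quintile.

    Stocks are ordered by the characteristic from highest to lowest. When the
    number of stocks is not a multiple of five, the leftover stocks fall into
    the middle groups, which do not affect the spread.
    """
    if len(characteristic) != len(returns):
        raise ValueError("inputs must have the same length")
    if len(returns) < 5:
        raise ValueError("need at least five stocks to form quintiles")
    order = sorted(range(len(characteristic)),
                   key=lambda i: characteristic[i], reverse=True)
    size = len(order) // 5
    top = [returns[i] for i in order[:size]]
    bottom = [returns[i] for i in order[-size:]]
    return mean(top) - mean(bottom)


# Hypothetical data for 20 companies (values in percent).
profit_margins = list(range(20, 0, -1))  # 20% down to 1%
monthly_returns = [3.0, 2.0, 3.4, 2.0,   # most profitable four
                   1.5, 1.8, 1.2, 1.6,
                   1.4, 1.1, 1.3, 0.9,
                   1.0, 1.2, 0.7, 1.1,
                   1.0, 0.2, 0.8, 0.4]   # least profitable four

spread = quintile_spread(profit_margins, monthly_returns)
print(f"quintile spread: {spread:.1f}%")
```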

Information Coefficient 

Another common metric used for determin¬ 
ing if a characteristic is good at separating the 
strong- from the weak-performing stocks is the 
information coefficient (IC). It does so by
measuring the correlation between a stock's
characteristic (e.g., profitability) and its
return. The major
difference between the IC and quintile return is 
that the IC looks across all of the stocks, while 
the quintile return only focuses on the best and 
worst stocks, ignoring those stocks in the mid¬ 
dle. The IC is more concerned with differenti¬ 
ating performance across all stocks rather than 
the extremes. 

The calculation of the IC is detailed in 
Figure 10. Similar to assessing the quintile re¬ 
turn, the sort ordering of the companies based 
on profitability is done first. However, the next 


step is different. In the second step, each stock 
is ranked on both profitability and return. The 
most profitable company is assigned a rank 
of 1 all the way down to the least profitable 
company, which is assigned a rank of 20. Like¬ 
wise for stock returns: The highest returning 
stock is assigned a rank of 1 down to the low¬ 
est returning stock receiving a rank of 20. In 
the third step, the rank of the company's prof¬ 
itability is correlated with the rank of the com¬ 
pany's return. The correlation of the two ranks 
is the IC, which is 11% as shown in Figure 10. 
The higher the correlation (i.e., IC), the more 
likely companies with higher profitability also 
have higher returns and the more effective the 
characteristic. 
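
The three steps just described (rank on the characteristic, rank on return, correlate the two ranks) amount to a rank correlation, which can be sketched as below. The helper names and the five-stock sample are hypothetical, and the sketch assumes there are no ties in either list.

```python
from math import sqrt


def ranks(values):
    """Rank 1 goes to the highest value. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def information_coefficient(characteristic, returns):
    """Correlation between characteristic ranks and return ranks."""
    return pearson(ranks(characteristic), ranks(returns))


# Hypothetical five-stock example (fractions, not percent).
profit_margins = [0.30, 0.10, 0.20, 0.05, 0.25]
next_month_returns = [0.02, 0.01, 0.03, -0.01, 0.015]

ic = information_coefficient(profit_margins, next_month_returns)
print(f"IC = {ic:.2f}")
```

Because every stock contributes to the correlation, a change in the ordering of the middle stocks moves the IC even when the quintile spread is unchanged.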

When is it better to employ an IC over a quin¬ 
tile spread? IC is a better metric when a quanti¬ 
tative investor is considering owning a greater 
number of stocks in the portfolio. The reason is 
that the IC looks at the relationships across all 
of the stocks in the group. The quintile return 
is better suited for more concentrated bets in 
Figure 11 Determining the Characteristic's
Batting Average (Profit Margin example:
# Positive Months = 47, # Months = 72,
Batting Average = 65%)

fewer stocks as it places a greater emphasis on 
measuring the few stocks at the extremes. 

The last two examples reviewed how a char¬ 
acteristic (profitability) was able to explain the 
next month's return for a group of stocks. In 
both cases it looked effective—a quintile return 
of 2.0% and an IC of 11%. However, in practice, 
it is also necessary to assess whether the charac¬ 
teristic was effective for not only one month, but 
over decades of investing encompassing multi¬ 
ple market cycles. To that end, during the re¬ 
search process a quantitative investor will look 
at the average quintile returns or ICs over an ex¬ 
tended period of up to 20 years or more. When 
looking at these longer time series, quantitative 
investors use additional metrics to understand 
the characteristic's effectiveness. 

Characteristic Testing: Key 
Measures of Consistency 

Two commonly used measures of consistency 
are batting average and information ratio. 

Batting Average 

Batting average is a straightforward metric. In 
baseball a player's batting average is the num¬ 
ber of hits divided by the number of times 
at bat. A similar metric is used in investing. 


Batting average is the number of positive per¬ 
formance months (hits) divided by the number 
of total months (at bats). The higher the batting 
average, the more consistently the characteristic 
generates positive performance. 

As Figure 11 displays, to arrive at the bat¬ 
ting average we take the number of months 
the quintile return was positive divided by the 
number of months tested. In our example, in 
47 of the 72 months profitability was effective, 
resulting in a positive return. This translates to 
a batting average of 65%, which is quite high. 
Imagine walking into a casino in Las Vegas 
where you have a 65% chance of winning every 
bet. That casino would not be in business very 
long with you at the table. 
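
As a small worked sketch, the batting average is just a count of positive months over total months. The monthly series below is a hypothetical stand-in with 47 positive and 25 negative months, matching the counts in the example; the function name is an assumption.

```python
def batting_average(monthly_spreads):
    """Fraction of months in which the quintile spread was positive."""
    if not monthly_spreads:
        raise ValueError("need at least one month of results")
    hits = sum(1 for r in monthly_spreads if r > 0)
    return hits / len(monthly_spreads)


# Hypothetical series: 47 positive months and 25 negative months.
history = [0.01] * 47 + [-0.01] * 25

average = batting_average(history)
print(f"batting average: {average:.0%}")
```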

Information Ratio 

Information ratio is also used to measure con¬ 
sistency. This measure is defined as the aver¬ 
age return of a characteristic divided by its 
volatility—basically a measure of return per 
unit of risk, or risk-reward ratio. For volatility,
quantitative investors use tracking error, which 
is the standard deviation of excess returns. 

Figure 12 demonstrates the calculation of the 
information ratio. In this example, there are two 
characteristics. Which one should be selected? 
Figure 12 Determining the Characteristic's Information Ratio


Based only on returns, we would choose char¬ 
acteristic 2 since it has a higher excess return 
(3.0%) than characteristic 1 (2.0%). However, 
as we can see in the figure, characteristic 2 also 
has much larger swings in performance than 
characteristic 1 and therefore more risk. The 
higher risk of characteristic 2 is confirmed by 
its high tracking error of 12.0%, three times 
greater than characteristic 1's tracking error
of 4.0%. Characteristic 1 looks much better 
on a risk-adjusted basis with an information 
ratio of 0.50 (2.0%/4.0%) or twice characteristic 
2's information ratio of 0.25 (3.0%/12.0%). So 
even though characteristic 1 has a lower return 
than characteristic 2, it also has much less 
risk, making it preferred since investors are 
rewarded more for the risk they are taking. 
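
The comparison of the two characteristics can be reproduced with the short sketch below. The summary figures (2.0% and 4.0%, 3.0% and 12.0%) are those quoted in the text; the function names, the use of the sample standard deviation for tracking error, and the three-period series are assumptions made for illustration.

```python
from statistics import mean, stdev


def information_ratio(excess_returns):
    """Average excess return divided by its standard deviation.

    Uses the sample standard deviation as the tracking error; this is an
    assumption, and annualization conventions are not handled here.
    """
    tracking_error = stdev(excess_returns)
    if tracking_error == 0:
        raise ValueError("tracking error is zero; ratio is undefined")
    return mean(excess_returns) / tracking_error


def ratio_from_summary(average_excess_return, tracking_error):
    """Information ratio from already-summarized figures."""
    return average_excess_return / tracking_error


ir_1 = ratio_from_summary(0.02, 0.04)  # characteristic 1
ir_2 = ratio_from_summary(0.03, 0.12)  # characteristic 2
print(f"characteristic 1: {ir_1:.2f}, characteristic 2: {ir_2:.2f}")

# Hypothetical return series to show the series-based version.
sample_ir = information_ratio([0.01, 0.03, 0.02])
```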

Model Creation 

After reviewing and selecting the best charac¬ 
teristics, the quantitative investor then needs 
to assemble them into a stock selection model. 
This step of the research process is called 
model creation. It usually involves two main 
components: 

1. Ascertaining that the characteristics selected
are not measuring the same effect (i.e.,
are not highly correlated).

2. Assigning weights to the selected character¬ 
istics, potentially placing greater emphasis 
on those in which the quantitative investor 
has stronger convictions. 


Let us begin by discussing the first compo¬ 
nent in model creation: measuring correlation. 
When including characteristics in a stock selec¬ 
tion model, the quantitative investor does not 
want to include two characteristics that have 
very similar performance since they may be 
measuring similar aspects of the company. In 
these cases, quantitative managers could be po¬ 
tentially doubling their position in a stock for 
the same reason. For instance, stocks with a his¬ 
torically high sales growth may perform simi¬ 
larly to stocks with high expected growth in the 
future, or stocks with strong gross margins may 
perform similarly to stocks with strong profit 
margins. In either case, we would not include 
both similar characteristics. 

An example is provided in Figure 13, which 
shows the cumulative quintile spread return 
over 10 years for three characteristics (which 
we have labeled A, B, and C). Characteristic 
A did the best at differentiating the winners 
from losers—the stocks it liked outperformed 
the stocks it did not like by almost 10% over the 
10-year period. Characteristic B was next with 
a return slightly greater than 8%, and charac¬ 
teristic C was the lowest with an almost 4% 
cumulative 10-year return. Given that all three 
characteristics have good performance, which 
two should the quantitative investor retain in 
the model? 

Although characteristics A and B are bet¬ 
ter at differentiating winners from losers than 
characteristic C, A's return pattern looks very 
Figure 13 Model Creation: Correlation Review

Table 1 Characteristic Correlations

         A       B       C
A      1.00    0.80    0.12
B      0.80    1.00    0.04
C      0.12    0.04    1.00



similar to B's. This is confirmed by Table 1 
where characteristics A and B have a correla¬ 
tion of 0.80. Since a correlation of 1.00 means 
their returns move in lockstep, a correlation 
of 0.80 indicates they are very similar. Rather 
than keeping both A and B and potentially 
doubling our positions from similar character¬ 
istics, it would be best to keep either A or B 
and combine the characteristic retained with 
C. Even though characteristic C is the worst 
performing of the three, for the stock selection 
model C provides a good uncorrelated source of 
performance. 
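
One simple way to act on a correlation table like Table 1 is sketched below: walk through the characteristics from best to worst performer and keep one only if it is not highly correlated with anything already kept. The correlations are those of Table 1; the 0.5 cutoff and the greedy rule are assumptions for illustration, not a method prescribed by the text.

```python
# Pairwise correlations from Table 1.
TABLE_1 = {("A", "B"): 0.80, ("A", "C"): 0.12, ("B", "C"): 0.04}


def correlation(x, y):
    """Look up a pairwise correlation regardless of argument order."""
    if x == y:
        return 1.0
    return TABLE_1[(x, y)] if (x, y) in TABLE_1 else TABLE_1[(y, x)]


def select_uncorrelated(ranked_names, max_correlation=0.5):
    """Greedy selection: keep a characteristic only if its correlation with
    every characteristic already kept is below the (assumed) cutoff."""
    kept = []
    for name in ranked_names:
        if all(abs(correlation(name, k)) < max_correlation for k in kept):
            kept.append(name)
    return kept


# Ranked by cumulative quintile spread return: A, then B, then C.
selected = select_uncorrelated(["A", "B", "C"])
print(selected)
```

Starting the ranking with B instead of A would retain B and C, consistent with the text's point that either A or B can be paired with C.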

Once the characteristics to select stocks are 
identified, quantitative investors are ready to 
determine the importance or weight of each 


characteristic. They must decide whether all 
characteristics should have the same weight 
or whether better characteristics should have 
greater weight in the stock selection model. 

There are many ways to determine the 
weights of the characteristics. We can simply 
equal weight them or use some other process 
such as creating an algorithm, performing re¬ 
gressions, or optimizing. Figure 14 shows how 
a typical stock selection model is created. In this 
step, the selected characteristics are combined 
to determine a target for each stock whether it 
be a return forecast, rank, or a position size. 
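
The simplest of these weighting schemes, a weighted average of characteristic scores, can be sketched as follows. The characteristic names echo Figure 5, but the percentile scores and both sets of weights are hypothetical; regression-based or optimized weights would replace the hand-picked numbers.

```python
def composite_score(scores, weights):
    """Weighted average of one stock's characteristic scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * scores[name] for name, w in weights.items())
    return weighted / total_weight


# Hypothetical percentile scores for one stock (1.0 = best in its group).
stock = {"gross_margin": 0.80, "earnings_growth": 0.60, "earnings_to_price": 0.40}

# Equal weighting versus extra emphasis on one characteristic.
equal_weights = {"gross_margin": 1.0, "earnings_growth": 1.0, "earnings_to_price": 1.0}
tilted_weights = {"gross_margin": 2.0, "earnings_growth": 1.0, "earnings_to_price": 1.0}

equal_score = composite_score(stock, equal_weights)
tilted_score = composite_score(stock, tilted_weights)
print(f"equal: {equal_score:.2f}, tilted: {tilted_score:.2f}")
```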

Once the combination of characteristics for 
the model is selected, the quantitative investor 
determines their weights and then reviews the 
Figure 14 Stock Selection Model: Characteristic Weightings

Figure 15 Three Main Steps of the Portfolio Construction Phase


model. Model review is similar to reviewing 
a single characteristic. The model is looked at 
from many perspectives, calculating all of the 
metrics described in Figure 8. The quantitative 
investor would look at how the top-quintile 
stocks of the model perform versus the bot¬ 
tom and look at information coefficients of the 
stock selection model over time. In addition, 
how much trading or turnover the stock selec¬ 
tion model creates is reviewed or if there are 
any biases in the stock selections (e.g., too many 
small-cap stocks, or a reliance on high- or low- 
beta stocks). In practice, the review is much 
more extensive, covering many more metrics. 
If the stock selection model does not hold up 
under this final review, then the quantitative 
investor will need to change the stock selection 
model to eliminate the undesirable effects. 


PORTFOLIO CONSTRUCTION

In the second phase of the investment process,
the quantitative investor uses the stock
selection model to buy stocks. It is in this phase
that the quantitative investor puts the model
into production. Returning to our golfer
analogy, this is when they travel to the course to
play a round of golf.

During the portfolio construction phase, the
model is ready to create a daily portfolio. This
phase consists of three main steps as shown in
Figure 15 and described below.

Step 1: Collect data. Data are collected on a
nightly basis, making sure the data are
correct and do not contain any errors.

Step 2: Create security weights. New data are
used both to select the stocks that should be
purchased for the portfolio and to specify
how large each position should be.

Step 3: Trade. The stock selection model that has
incorporated the most current information is
used for trading.

Data Collection

As Figure 16 shows, data come from many
different sources, such as company fundamental
data, pricing data, economic data, and other
specialized data sources. All of these data are
updated nightly, so it is important to have
robust systems and processes established to
handle large amounts of data, clean the data
(check for errors), and process it in a timely
fashion. The quantitative investor seeks to have
everything ready to trade at the market opening.


Creating Security Weights 

After the data are collected and verified, the 
next step is running all of the updated company 
information through the stock selection model. 
This will create final positions for every stock 
in the screened universe. In this step, each stock 
is ranked using the stock selection model, with 
the better scoring companies making it into the 
portfolio. 
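
A minimal sketch of this ranking step is shown below. It assumes each stock already has a single model score, holds the top-scoring fraction of the universe, and gives every held stock the same weight; the tickers, scores, 40% cutoff, and equal weighting are all hypothetical choices, since the text does not specify a position-sizing rule.

```python
def build_positions(model_scores, top_fraction=0.2):
    """Rank stocks by model score and equal-weight the top fraction."""
    if not model_scores:
        raise ValueError("need at least one scored stock")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(model_scores, key=model_scores.get, reverse=True)
    n_hold = max(1, round(len(ranked) * top_fraction))
    held = ranked[:n_hold]
    return {ticker: 1.0 / n_hold for ticker in held}


# Hypothetical model scores for a five-stock universe.
scores = {"ABC": 0.70, "DEF": 0.55, "GHI": 0.62, "JKL": 0.48, "XYZ": 0.43}

positions = build_positions(scores, top_fraction=0.4)
print(positions)
```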

Figure 17 provides a simplified example of 
this, showing a stock selection model with three 
characteristics: gross margins, sales growth, 
and earnings yield (i.e., earnings-to-price ra¬ 
tio; the higher the ratio, the more attractively 
priced the stock is). In the example, Company
ABC is in the top 10% of companies based
on gross margin, in the top 30% in sales growth, 
and average on earnings yield. Company ABC 
Figure 16 Data Collection Step of the Portfolio Construction Phase


may represent a company finding a profitable 
market and growing into it, and the rest of the 
market has not caught on to its prospects, so 
it is still valued like an average stock. In this 
case, the stock is rated favorably by the stock
selection model and would be purchased. The other
stock, the stock of Company XYZ, is not as fa¬ 


vorable and either would not be held in the 
portfolio or, if permitted, could be shorted. Al¬ 
though Company XYZ also has good margins, 
its growth is slowing and it is relatively expen¬ 
sive compared to its earnings. The company 
could be one that had a profitable niche, but its 
niche may be shrinking as sales are dwindling. 
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Figure 17 Creating Security Weights Step of the Portfolio Construction Phase 
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Furthermore, the investment community has 
not discounted the slowing growth and hence 
the stock is still expensive. 
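
The ranking step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the model used in the entry: the equal weighting of the three characteristics, the helper names (percentile_ranks, composite_scores, select_top), the extra tickers DEF, GHI, and JKL, and every input value are hypothetical assumptions chosen to mirror the ABC/XYZ example.

```python
# Hypothetical characteristic values; none of these numbers come from the entry.
UNIVERSE = {
    "ABC": {"gross_margin": 0.60, "sales_growth": 0.15, "earnings_yield": 0.06},
    "XYZ": {"gross_margin": 0.58, "sales_growth": -0.05, "earnings_yield": 0.02},
    "DEF": {"gross_margin": 0.30, "sales_growth": 0.05, "earnings_yield": 0.07},
    "GHI": {"gross_margin": 0.25, "sales_growth": 0.20, "earnings_yield": 0.05},
    "JKL": {"gross_margin": 0.40, "sales_growth": 0.02, "earnings_yield": 0.08},
}
CHARACTERISTICS = ["gross_margin", "sales_growth", "earnings_yield"]


def percentile_ranks(values):
    """Map each value to a percentile in (0, 1]; 1.0 marks the highest value.

    Ties are broken by input order, which is good enough for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(values)
    return ranks


def composite_scores(universe, characteristics):
    """Average the percentile ranks across characteristics (equal weights assumed)."""
    tickers = list(universe)
    totals = {ticker: 0.0 for ticker in tickers}
    for name in characteristics:
        ranks = percentile_ranks([universe[t][name] for t in tickers])
        for ticker, rank in zip(tickers, ranks):
            totals[ticker] += rank / len(characteristics)
    return totals


def select_top(scores, fraction):
    """Return the best-scoring tickers, keeping at least one name."""
    count = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:count]


if __name__ == "__main__":
    scores = composite_scores(UNIVERSE, CHARACTERISTICS)
    for ticker in sorted(scores, key=scores.get, reverse=True):
        print(ticker, round(scores[ticker], 3))
    print("buy candidates:", select_top(scores, 0.2))
```

With these made-up inputs, ABC scores highest and XYZ lowest, matching the narrative: XYZ's good margins do not offset its weak growth and low earnings yield.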

Trade 

The final step in the portfolio construction pro¬ 
cess is to trade into the new positions chosen by 
the stock selection model. While many invest¬ 
ment approaches trade regularly, even daily, 
quantitative investors tend not to. Quantitative 
investors tend to trade monthly or longer. They 
may wait for the views from their stock selec¬ 
tion model to change significantly from their 
current portfolio before trading into the new 
views. 
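
One way to express the idea of waiting until the model's views differ significantly from the current portfolio is a simple turnover trigger. The sketch below is an assumption for illustration only: the entry does not specify a rule, and the function names and the 10% threshold are hypothetical.

```python
def one_way_turnover(current, target):
    """Half the sum of absolute weight changes needed to move to the target."""
    names = set(current) | set(target)
    return 0.5 * sum(abs(target.get(n, 0.0) - current.get(n, 0.0)) for n in names)


def should_rebalance(current, target, threshold=0.10):
    """Trade only when the required turnover reaches the (hypothetical) threshold."""
    return one_way_turnover(current, target) >= threshold


if __name__ == "__main__":
    held = {"A": 0.5, "B": 0.5}
    small_change = {"A": 0.48, "B": 0.52}
    large_change = {"A": 0.3, "B": 0.5, "C": 0.2}
    print(should_rebalance(held, small_change))  # model views barely moved
    print(should_rebalance(held, large_change))  # model views moved materially
```

A manager could equally compare forecast returns rather than weights; the weight-based distance is used here only because it is easy to read.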

MONITORING 

The third and final phase in the quantitative equity
investment process is monitoring performance
and risk. This step is important for checking
whether any hidden biases are embedded in the portfolio
and whether the portfolio is performing in line
with expectations. Returning one last time to
our golfer analogy, this is when the golfer is 
making mental notes as to what is and isn't 
working during the round to improve his or 
her game in the future. This step can be bro¬ 
ken into two activities: risk management and 
performance attribution. 

Risk Management 

In risk management, the main emphasis is on 
making sure that the quantitative investor is 
buying companies consistent with the stock se¬ 
lection model. Returning to the retail model 
discussed earlier in this entry, the model liked 
companies with good profit margins but had no 
view on the company's beta. So the quantita¬ 
tive investor would want to make sure that the 
companies included in the portfolio have high 
profit margins but average beta. If the portfolio 
started to include high-beta stocks, the quan¬ 
titative investor would want to make adjust¬ 
ments to the process to eliminate this high-beta 


bias. There are many types of risk management 
software and techniques that can be used to de¬ 
tect any hidden risks embedded in the portfolio 
and provide ways to remedy those identified. 

Another aspect of risk management is to make 
sure that the portfolio's risk level is consistent 
with the modeling phase. The quantitative in¬ 
vestor wants to ensure that the tracking error 
is not too high or low relative to expectations. 
Again, risk management techniques and soft¬ 
ware can be used to monitor tracking error and 
sources of tracking error, and to remedy any 
deviations from expectations. 
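
The two checks described in this section, an unintended beta bias and a tracking error out of line with expectations, can be sketched as follows. This is a minimal illustration under stated assumptions: portfolio beta is taken as the weighted average of stock betas, tracking error as the annualized sample standard deviation of active returns, and the function names and monthly frequency are hypothetical.

```python
import math
import statistics


def portfolio_beta(weights, betas):
    """Weighted average of the holdings' betas."""
    return sum(weight * betas[name] for name, weight in weights.items())


def tracking_error(portfolio_returns, benchmark_returns, periods_per_year=12):
    """Annualized standard deviation of active (portfolio minus benchmark) returns."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return statistics.stdev(active) * math.sqrt(periods_per_year)


def outside_band(value, expected, tolerance):
    """Flag a monitored quantity that drifts too far above or below expectations."""
    return abs(value - expected) > tolerance


if __name__ == "__main__":
    weights = {"A": 0.5, "B": 0.5}
    betas = {"A": 1.6, "B": 1.4}  # hypothetical high-beta holdings
    beta = portfolio_beta(weights, betas)
    print("beta bias:", outside_band(beta, expected=1.0, tolerance=0.1))
```

The symmetric band reflects the point made above that tracking error should be neither too high nor too low relative to what the modeling phase assumed.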

Performance Attribution 

Performance attribution is critical in ensuring 
that the actual live portfolio's performance is 
coming from the characteristics in the stock 
selection model and is in line with performance 
expected during the modeling stage. Perfor¬ 
mance attribution is like monitoring a car's gas 
mileage: If the gas mileage begins to dip below 
what the driver expects, or what it is known to 
be, then the driver would want to look under 
the car's hood. Similarly, if the stock selection 
model is not producing the desired results, or 
the results have changed, then the quantitative 
investor would need to look under the hood 
of the stock selection model. If performance is 
not being generated from the selected charac¬ 
teristics, then the quantitative manager would 
want to check out the model in more detail. 
One possibility is that another characteristic 
is canceling the desired characteristics, or the 
model should be providing more weight to the 
desired characteristic. 

The monitoring phase is critical in making 
sure that the stock selection model is being im¬ 
plemented as expected. 

CURRENT TRENDS 

Let's look at some recent trends in the quantita¬ 
tive investment industry. 

Many quantitative equity investors are looking
for additional sources of alpha by using
alternative data sources to help select stocks.
One notable source is industry-specific data 
(e.g., banking, airlines, and retail). Addition¬ 
ally, quantitative investors are turning to the 
Internet to better understand news flows for 
companies through Web-based search engines. 
Furthermore, quantitative investors are using 
more conditioning models. Conditioning oc¬ 
curs when two characteristics are combined 
rather than choosing them side by side in a stock 
selection model. Traditional models would look 
for companies that have either attractive mar¬ 
gins or growth. With conditioning models, com¬ 
panies that have both attractive margins and 
growth are sought. 
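
The difference between a side-by-side model and a conditioning model can be illustrated with two scoring rules. The sketch below is an assumption, not the authors' formulation: it takes percentile ranks between 0 and 1 as inputs, uses an average for the side-by-side score, and uses a product as one simple way to reward companies that are attractive on both characteristics at once.

```python
def side_by_side_score(margin_rank, growth_rank):
    """Additive score: strength in either characteristic can carry the stock."""
    return 0.5 * margin_rank + 0.5 * growth_rank


def conditioned_score(margin_rank, growth_rank):
    """Interaction score: high only when both characteristics are attractive."""
    return margin_rank * growth_rank


if __name__ == "__main__":
    # Hypothetical percentile ranks (margin, growth) for two companies.
    one_sided = (1.0, 0.3)   # excellent margins, weak growth
    balanced = (0.6, 0.6)    # good on both
    print(side_by_side_score(*one_sided), side_by_side_score(*balanced))
    print(conditioned_score(*one_sided), conditioned_score(*balanced))
```

With these inputs the additive score prefers the one-sided company, while the conditioned score prefers the balanced one.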

Dynamic modeling is gaining renewed pop¬ 
ularity. It consists of timing characteristics, de¬ 
termining when they should enter or leave a 
stock selection model based on business cycle 
analysis, technical market indicators, or other 
information. For instance, during recessionary 
periods, a quantitative investor may want com¬ 
panies with strong profitability, while in expan¬ 
sionary periods companies with good growth 
prospects are sought. A stock selection model
would contain the profitability characteristic when
the economy is entering a recession, and then add the
growth characteristic once the investor judged that
the economy was moving into an expansionary period.
This is an example of how quantitative in¬ 
vestors are bringing more personal judgment to 
the process, similar to fundamental investors. 

Finally, with the advent of high-frequency 
trading and more advanced trading analytics, 
many quantitative investors are reviewing how 
best to implement their stock selection models. 
Some characteristics such as earnings surprise 
may have short-lived alpha prospects, so quan¬ 
titative investors would want to trade into these 
stocks more quickly. Other characteristics are 
longer term in nature, such as valuation met¬ 
rics, so investors would not have to trade into 
companies with attractive valuations as quickly. 
Furthermore, trading costs are being measured
with greater granularity, allowing quantitative
investors to measure transaction costs and incorporate
these better estimates into their research
modeling phase.


KEY POINTS 

• Investing begins with processing many dif¬ 
ferent types of information to find the most 
attractively priced assets. Fundamental and 
quantitative investors differ in their approach 
to the available information. The fundamen¬ 
tal investor's primary focus is on a single 
company at a time, while the quantitative in¬ 
vestor's primary focus is on a single charac¬ 
teristic at a time. 

• Quantitative and fundamental approaches 
are complementary. By combining the two ap¬ 
proaches you can obtain a more well-rounded 
investment process including breadth and 
depth in analysis, facts based on human judg¬ 
ment, a past and future perspective of a com¬ 
pany, and a more well-rounded view of risk 
and performance of the portfolio. 

• The quantitative equity investment process is 
made up of three phases: research, portfolio 
construction, and monitoring. During the re¬ 
search phase, the stock selection model is cre¬ 
ated. During the portfolio construction phase, 
the quantitative investor "productionalizes" 
the stock selection model or gets it ready to 
invest in a live portfolio. Finally, during the 
monitoring phase, the quantitative investor 
makes sure the portfolio is performing as 
expected. 

• At the heart of the quantitative equity invest¬ 
ment process is the stock selection model. The 
model includes those characteristics that are 
best at delineating the highest from lowest 
returning stocks. Models can be created for 
industries, sectors, or styles. 

• Two common metrics used to judge a char¬ 
acteristic's effectiveness are quintile returns 
and information coefficients. Two more met¬ 
rics used to understand the consistency of 
a characteristic's performance over time are 
batting average and information ratio.

• During the portfolio construction phase, data
are collected from multiple sources and run 
through the investor's stock selection model 
to arrive at a list of buy and sell candidates. 
The buy candidates will have the strongest 
characteristic values in the investor's stock 
selection model, and the sell candidates the 
weakest characteristic values. 

• The monitoring phase is when the investor
ensures that the performance in the portfolio
is consistent with expectations. During this 
phase, the investor will make sure there are 
no hidden bets in the portfolio and that the 
characteristics in the stock selection model are 
performing as expected. 


NOTES 

1. Throughout the entry we discuss whether 
characteristics can separate a stock with 
strong future returns from one with weak 
future returns. Many times reference will be 
made to a "strong" characteristic that can dif¬ 
ferentiate the strong- from weak-performing 
stocks. 

2. In this entry, the term "characteristic" means 
the attributes that differentiate companies. 
Quantitative investors often refer to these 
same characteristics as factors or signals 
which they typically use in their stock se¬ 
lection model. 


Quantitative Equity Portfolio 
Management 

ANDREW ALFORD, PhD 

Managing Director, Quantitative Investment Strategies, Goldman Sachs Asset Management 

ROBERT JONES, CFA 

Chairman, Arwen Advisors and Chairman and CIO, System Two Advisors 

TERENCE LIM, PhD, CFA 

CEO, Arwen Advisors 


Abstract: Equity portfolio management has evolved considerably since the 1950s. Portfolio theories 
and asset pricing models, in conjunction with new data sources and powerful computers, have 
revolutionized the way investors select stocks and create portfolios. Consequently, what was once 
mostly an art is increasingly becoming a science: Loose rules of thumb are being replaced by 
rigorous research and complex implementation. While greatly expanding the frontiers of finance, 
these advances have not necessarily made it any easier for portfolio managers to outperform the 
market. The two approaches to equity portfolio management are the traditional approach and the 
quantitative approach. Despite the contrasting of these two approaches by their advocates, they
share many traits: applying economic reasoning to identify a small set of key drivers
of equity values, using observable data to quantify these key drivers, using expert judgment to
develop ways to map these key drivers into the final stock-selection decision, and evaluating their 
performance over time. The difference in the two approaches is how they perform these tasks. 


Equity portfolio management has evolved con¬ 
siderably since Benjamin Graham and David 
Dodd published their classic text on security 
analysis in 1934 (Graham and Dodd, 1934). For 
one, the types of stocks available for investment 
have shifted dramatically, from companies with 
mostly physical assets (such as railroads and 
utilities) to companies with mostly intangible 
assets (such as technology stocks and pharmaceuticals).
Moreover, theories such as modern
portfolio theory and the capital asset pricing
model, in conjunction with new data sources 
and powerful computers, have revolutionized 
the way investors select stocks and create port¬ 
folios. Consequently, what was once mostly an 
art is increasingly becoming a science: Loose 
rules of thumb are being replaced by rigorous 
research and complex implementation.

Of course, these new advances, while greatly
expanding the frontiers of finance, have not 
necessarily made it any easier for portfolio 
managers to beat the market. In fact, the 
increasing sophistication of the average in¬ 
vestor has probably made it more difficult 
to find—and exploit—pricing errors. Several 
studies show that a majority of professional 
money managers have been unable to beat the 
market (see, for example, Malkiel, 1995). There 
are no sure bets, and mispricings, when they 
occur, are rarely both large and long lasting. 
Successful managers must therefore constantly 
work to improve their existing strategies and 
to develop new ones. Understanding fully 
the equity management process is essential to 
accomplishing this challenging task. 

These new advances, unfortunately, have also 
allowed some market participants to stray from 
a sound investment approach. It is now easier 
than ever for portfolio managers to use biased, 
unfamiliar, or incorrect data in a flawed strat¬ 
egy, one developed from untested conjecture or 
haphazard trial and error. Investors, too, must 
be careful not to let the abundance of data and 
high-tech techniques distract them when allocating
assets and selecting managers. In particular,
investors should not allow popular but
narrow rankings of short-term performance to obscure
important differences in portfolio managers'
style exposure or investment process. To
avoid these pitfalls, it helps to have a solid grasp 
of the constantly advancing science of equity 
investing. 

This entry provides an overview of equity 
portfolio management aimed at current and po¬ 
tential investors, analysts, investment consul¬ 
tants, and portfolio managers. We begin with 
a discussion of the two major approaches to 
equity portfolio management: the traditional ap¬ 
proach and the quantitative approach. The remain¬ 
ing sections of the entry are organized around 
four major steps in the investment process: 
(1) forecasting the unknown quantities needed 
to manage equity portfolios—returns, risks, 
and transaction costs; (2) constructing portfolios
that maximize expected risk-adjusted return
net of transaction costs; (3) trading stocks
efficiently; and (4) evaluating results and updating
the process.

These four steps should be closely integrated: 
The return, risk, and transaction cost forecasts, 
the approach used to construct portfolios, the 
way stocks are traded, and performance evalu¬ 
ation should all be consistent with one another. 
A process that produces highly variable, fast- 
moving return forecasts, for example, should 
be matched with short-term risk forecasts, 
relatively high transaction costs, frequent 
rebalancing, aggressive trading, and short- 
horizon performance evaluation. In contrast, 
stable, slower-moving return forecasts can be 
combined with longer term risk forecasts, lower 
expected transaction costs, less frequent rebal¬ 
ancing, more patient trading, and longer-term 
evaluation. Mixing and matching incompatible 
approaches to each part of the investment pro¬ 
cess can greatly reduce a manager's ability to 
reap the full rewards of an investment strategy. 

A well-structured investment process should 
also be supported by sound economic logic, 
diverse information sources, and careful em¬ 
pirical analysis that together produce reliable 
forecasts and effective implementation. And, of 
course, a successful investment process should 
be easy to explain; marketing professionals, 
consultants, and investors all need to under¬ 
stand a manager's process before they will 
invest in it. 

TRADITIONAL AND 
QUANTITATIVE 
APPROACHES TO EQUITY 
PORTFOLIO MANAGEMENT 

At one level, there are as many ways to man¬ 
age portfolios as there are portfolio managers. 
After all, developing a unique and innovative 
investment process is one of the ways man¬ 
agers distinguish themselves from their peers. 
Nonetheless, at a more general level, there are
two basic approaches used by most managers:
the traditional approach and the quantitative
approach. Although these two approaches are 
often sharply contrasted by their proponents, 
they actually share many traits. Both apply eco¬ 
nomic reasoning to identify a small set of key 
drivers of equity values; both use observable 
data to help measure these key drivers; both 
use expert judgment to develop ways to map 
these key drivers into the final stock-selection 
decision; and both evaluate their performance 
over time. What differs most between tradi¬ 
tional and quantitative managers is how they 
perform these steps. 

Traditional managers conduct stock-specific 
analysis to develop a subjective assessment 
of each stock's unique attractiveness. Tradi¬ 
tional managers talk with senior management, 
closely study financial statements and other 
corporate disclosures, conduct detailed, stock- 
specific competitive analysis, and usually build 
spreadsheet models of a company's financial 
statements that provide an explicit link between 
various forecasts of financial metrics and stock 
prices. The traditional approach involves de¬ 
tailed analysis of a company and is often well 
equipped to cope with data errors or structural 
changes at a company (e.g., restructurings or 
acquisitions). However, because the traditional 
approach relies heavily on the judgment of ana¬ 
lysts, it is subject to potentially severe subjective 
biases such as selective perception, hindsight 
bias, stereotyping, and overconfidence that can 
reduce forecast quality. (For a discussion of the 
systematic errors in judgment and probability 
assessment that people frequently make, see 
Kahneman, Slovic, and Tversky, 1982.) More¬ 
over, the traditional approach is costly to apply, 
which makes it impracticable for a large invest¬ 
ment universe comprising many small stocks. 
The high cost and subjective nature also make 
it difficult to evaluate, because it is hard to cre¬ 
ate the history necessary for testing. Testing 
an investment process is important because it 
helps to distinguish factors that are reflected 
in stock prices from those that are not. Only 


factors that are not yet impounded in stock 
prices can be used to identify profitable trading
opportunities. Failure to distinguish between
these two types of factors can lead to
the familiar "good company, bad stock" problem
in which even a great company can be a
bad investment if the price paid for the stock is 
too high. 

Quantitative managers use statistical models 
to map a parsimonious set of measurable 
factors into objective forecasts of each stock's 
return, risk, and cost of trading. The quantita¬ 
tive approach formalizes the relation between 
the key factors and forecasts, which makes 
the approach transparent and largely free of 
subjective biases. Quantitative analysis can 
also be highly cost effective. Although the fixed 
costs of building a robust quantitative model 
are high, the marginal costs of applying the 
model, or extending it to a broader investment 
universe, are low. Consequently, quantitative 
portfolio managers can choose from a large 
universe of stocks, including many small and 
otherwise neglected stocks that have attractive 
fundamentals. Finally, because the quantitative 
approach is model-based, it can be tested his¬ 
torically on a wide cross-section of stocks over 
diverse economic environments. While quan¬ 
titative analysis can suffer from specification 
errors and overfitting, analysts can mitigate 
these errors by following a well-structured and 
disciplined research process. 

On the negative side, quantitative models can 
be misleading when there are bad data or sig¬ 
nificant structural changes at a company (that 
is, "garbage in, garbage out"). For this rea¬ 
son, most quantitative managers like to spread 
their bets across many names so that the suc¬ 
cess of any one position will not make or break 
the strategy. Traditional managers, conversely, 
prefer to take fewer, larger bets given their de¬ 
tailed hands-on knowledge of the company and 
the high cost of analysis. 

A summary of the major advantages of each
approach to equity portfolio management is
presented in Table 1. (Dawes, Faust, and Meehl
[1989] provide an excellent comparison of clinical
(traditional) and actuarial (quantitative)
decision analysis.) Our focus in the rest of
this entry is the process of quantitative equity
portfolio management.

Table 1 Major Advantages of the Traditional and Quantitative Approaches to Equity Portfolio Management

Traditional approach

Depth: Although they have views on fewer companies, traditional managers tend to
have more in-depth knowledge of the companies they cover. Unlike a
computerized model, they should know when data are misleading or
unrepresentative.

Regime shifts: Traditional managers may be better equipped to handle regime shifts and
recognize situations where past relationships might not be expected to
continue (e.g., where back-tests may be unreliable).

Signal identification: Based on their greater in-depth knowledge, traditional managers can better
understand the unique data sources and factors that are important for stocks
in different countries or industries.

Qualitative factors: Many important factors that may affect an investment decision are not
available in any database and are hard to evaluate quantitatively. Examples
might include management and their vision for the company; the value of
patents, brands, and other intangible assets; product quality; or the impact of
new technology.

Quantitative approach

Universe: Because a computerized model can quickly evaluate thousands of securities
and can update those evaluations daily, it can uncover more opportunities.
Further, by spreading their risk across many small bets, quantitative
managers can add value with only slightly favorable odds.

Discipline: While individuals often base decisions on only the most salient or distinctive
factors, a computerized model will simultaneously evaluate all specified
factors before reaching a conclusion.

Verification: Before using any signal to evaluate stocks, quantitative managers will
normally backtest its historical efficacy and robustness. This provides a
framework for weighting the various signals.

Risk management: By its nature, the quantitative approach builds in the notion of statistical risk
and can do a better job of controlling unintended risks in the portfolio.

Lower fees: The economies of scale inherent in a quantitative process usually allow
quantitative managers to charge lower fees.

FORECASTING STOCK 
RETURNS, RISKS, AND 
TRANSACTION COSTS 

Developing good forecasts is the first and per¬ 
haps most critical step in the investment pro¬ 
cess. Without good forecasts, the difficult task 
of forming superior portfolios becomes nearly 


impossible. In this section we discuss how to 
use a quantitative approach to generate forecasts 
of stock returns, risks, and transaction costs. These 
forecasts are then used in the portfolio construc¬ 
tion step described in the next section. 

It should be noted that some portfolio 
managers do not develop explicit forecasts 
of returns, risks, and transaction costs. In¬ 
stead, they map a variety of individual stock 
characteristics directly into portfolio holdings. 
However, there are limitations with this abbre¬ 
viated approach. Because the returns and risks 
corresponding to the various characteristics 
are not clearly identified, it is difficult to ensure 
the weights placed on the characteristics
are appropriate. Further, measuring risk at
the portfolio level is awkward without reliable 
estimates of the risks of each stock, especially 
the correlations between stocks. Similarly, 
controlling turnover is hard when returns and 
transaction costs are not expressed in consistent 
units. And, of course, it is difficult to explain a 
process that occurs in one magical step. 

Forecasting Returns 

The process of building a quantitative return-
forecasting model can be divided into four
closely linked steps: (1) identifying a set of po¬ 
tential return forecasting variables, or signals; 
(2) testing the effectiveness of each signal, by 
itself and together with other signals; (3) de¬ 
termining the appropriate weight for each sig¬ 
nal in the model; and (4) blending the model's 
views with market equilibrium to arrive at rea¬ 
sonable forecasts for expected returns. 

Identifying a list of potential signals might 
seem like an overwhelming task; the candidate 
pool can seem almost endless. To narrow the 
list, it is important to start with fundamental re¬ 
lationships and sound economics. Reports pub¬ 
lished by Wall Street analysts and books about 
financial statement analysis are both good 
sources for ideas. Another valuable resource is 
academic research in finance and accounting. 
Academics have the incentive and expertise to 
identify and carefully analyze new and innova¬ 
tive information sources. Academics have stud¬ 
ied a large number of stock price anomalies, and 
Table 2 lists several that have been adopted by 
investment managers. (For evidence on the per¬ 
formance of several well-known anomalies, see 
Fama and French [2008].) 

For portfolio managers intent on building a
successful investment strategy, it is not enough
to simply take the best ideas identified by others
and add them to the return-forecasting model.
Instead, each potential signal must be thoroughly
tested to ensure it works in the context
of the manager's strategy across many
stocks and during a variety of economic environments.
The real challenge is winnowing the
list of potential signals to a parsimonious set
of reliable forecasting variables. When selecting
a set of signals, it is a good idea to include
a variety of variables to capture distinct investment
themes, including valuation, momentum,
and earnings quality. By diversifying over information
sources and variables, there is a good
chance that if one signal fails to add value another
will be there to carry the load.

Table 2 Selected Stock Price Anomalies Used in
Quantitative Models

Growth/Value: Value stocks (high B/P, E/P, CF/P)
outperform growth stocks (low B/P, E/P, CF/P).

Post-earnings-announcement drift: Stocks that announce
earnings that beat expectations outperform stocks that
miss expectations.

Short-term price reversal: One-month losers outperform
one-month winners.

Intermediate-term price momentum: Six-month to
one-year winners outperform losers.

Earnings quality: Stocks with cash earnings outperform
stocks with non-cash earnings.

Stock repurchases: Companies that repurchase shares
outperform companies that issue shares.

Analyst earnings estimates and stock recommendations:
Changes in analyst stock recommendations and
earnings estimates predict subsequent stock returns.

When evaluating a signal, it is important 
to make sure the underlying data used to 
compute the signal are available and largely 
error free. Checking selected observations 
by hand and screening for outliers or other 
influential observations is a useful way to 
identify data problems. It is also sometimes 
necessary to transform a signal—for instance, 
by subtracting the industry mean or taking the 
natural logarithm—to improve the "shape" of 
the distribution. To evaluate a signal properly, 
both univariate and multivariate analysis is im¬ 
portant. Univariate analysis provides evidence 
on the signal's predictive ability when the sig¬ 
nal is used alone, whereas multivariate analysis 
provides evidence on the signal's incremental 
predictive ability above and beyond other 
variables considered. For both univariate and
multivariate analysis, it is wise to examine the
returns to a variety of portfolios formed on the 
basis of the signal. Sorting stocks into quintiles 
or deciles is popular, as is regression analysis, 
where the coefficients represent the return to 
a portfolio with unit exposure to the signal. 
These portfolios can be equal weighted, cap 
weighted, or even risk weighted depending 
on the model's ultimate purpose. Finally, the 
return forecasting model should be tested 
using a realistic simulation that controls the 
target level of risk, takes account of transaction 
costs, and imposes appropriate constraints 
(e.g., the nonnegativity constraint for long-only 
portfolios). In our experience, many promising 
return-forecasting signals fail to add value in re¬ 
alistic back-tests—either because they involve 
excessive trading; work only for small, illiquid 
stocks; or contain information that is already 
captured by other components of the model. 
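
The two univariate tests mentioned above, quintile sorts and a regression whose coefficient is the return to a unit-exposure portfolio, can be sketched as follows. This is a minimal illustration with hypothetical function names and equal weighting; it ignores transaction costs, risk targets, and constraints, which the realistic simulation described above would need to include.

```python
def quintile_returns(signal, forward_returns, buckets=5):
    """Equal-weighted mean forward return per signal bucket, lowest signal first.

    Assumes at least as many stocks as buckets.
    """
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    n = len(order)
    results = []
    for b in range(buckets):
        members = order[b * n // buckets:(b + 1) * n // buckets]
        results.append(sum(forward_returns[i] for i in members) / len(members))
    return results


def unit_exposure_weights(signal):
    """Zero-investment weights whose exposure to the signal equals one."""
    mean = sum(signal) / len(signal)
    deviations = [s - mean for s in signal]
    scale = sum(d * d for d in deviations)
    return [d / scale for d in deviations]


def unit_exposure_return(signal, forward_returns):
    """Cross-sectional OLS slope of forward returns on the signal."""
    weights = unit_exposure_weights(signal)
    return sum(w * r for w, r in zip(weights, forward_returns))


if __name__ == "__main__":
    # Hypothetical data: ten stocks whose returns rise with the signal.
    signal = [float(i) for i in range(10)]
    returns = [0.01 * s for s in signal]
    print(quintile_returns(signal, returns))
    print(unit_exposure_return(signal, returns))
```

The weights returned by unit_exposure_weights sum to zero and have a signal exposure of exactly one, so the return on that portfolio is the same number as the regression slope. That equivalence is what allows a regression coefficient to be read as a portfolio return.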

The third step in building a return fore¬ 
casting model is determining each signal's 
weight. When computing expected returns, 
more weight should be put on signals that, over 
time, have been more stable; generated higher 
and more consistent returns; and provided su¬ 
perior diversification benefits. Maintaining ex¬ 
posures to signals that change slowly requires 
less trading, and hence lower costs, than is 
the case for signals that change rapidly. Other 
things being equal, a stable signal (such as the 
ratio of book-to-market equity) should get more 
weight than a less stable signal (such as one- 
month price reversal). High, consistent returns 
are essential to a profitable, low-risk investment 
strategy; hence, signals that generate high re¬ 
turns with little risk should get more weight 
than signals that produce lower returns with 
higher risk. Finally, signals with more diver¬ 
sified payoffs should get more weight because 
they can hedge overall performance when other 
signals in the model perform poorly. 

The last step in forecasting returns is to make 
sure the forecasts are reasonable and internally 
consistent by comparing them with equilibrium 
views. Return forecasts that ignore equilibrium 


expectations can create problems in the port¬ 
folio construction step. Seemingly reasonable 
return forecasts can cause an optimizer to 
maximize errors rather than expected returns, 
producing extreme, unbalanced portfolios. The 
problem is caused by return forecasts that are 
inconsistent with the assumed correlations 
across stocks. If two stocks (or subportfolios) 
are highly correlated, then the equilibrium 
expectation is that their returns should be 
similar; otherwise, the optimizer will treat the 
pair of stocks as a (near) arbitrage opportunity 
by going extremely long the high-return stock 
and extremely short the low-return stock. 
However, with hundreds of stocks, it is not 
always obvious whether certain stocks, or com¬ 
binations of stocks, are highly correlated and 
therefore ought to have similar return forecasts. 
The Black-Litterman model was specifically 
designed to alleviate this problem. It blends a
model's raw return forecasts with equilibrium
expected returns, which are the returns that
would make the benchmark optimal for a given
risk model, to produce internally consistent
return forecasts that reflect the manager's
(or model's) views yet are consistent with 
the risk model. (For a discussion of how to 
use the Black-Litterman model to incorporate 
equilibrium views into a return-forecasting 
model, see Litterman [2003].) 
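
The two ideas in this passage, equilibrium returns that make the benchmark optimal and the extreme positions that arise when forecasts for highly correlated stocks differ, can be illustrated with a two-stock sketch. It is not the full Black-Litterman blend. It assumes an unconstrained mean-variance investor with utility w'mu - (lambda/2) w'Sigma w, for which the optimal weights satisfy mu = lambda * Sigma * w. The function names, the risk-aversion value of 2.5, and all inputs are hypothetical.

```python
def implied_equilibrium_returns(cov, benchmark_weights, risk_aversion):
    """Reverse optimization: returns under which the benchmark is optimal."""
    return [
        risk_aversion * sum(c * w for c, w in zip(row, benchmark_weights))
        for row in cov
    ]


def unconstrained_weights_2x2(expected_returns, cov, risk_aversion):
    """Mean-variance optimal weights for two stocks via the closed-form inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    m1, m2 = expected_returns
    return [
        (d * m1 - b * m2) / det / risk_aversion,
        (-c * m1 + a * m2) / det / risk_aversion,
    ]


if __name__ == "__main__":
    # Two stocks with 20% volatility and a correlation of 0.95.
    cov = [[0.04, 0.038], [0.038, 0.04]]
    # Forecasts that differ by two percentage points despite the correlation.
    raw = unconstrained_weights_2x2([0.08, 0.06], cov, risk_aversion=2.5)
    print("weights from raw forecasts:", [round(w, 2) for w in raw])
    # Equilibrium returns for a 50/50 benchmark are equal for the two stocks.
    pi = implied_equilibrium_returns(cov, [0.5, 0.5], risk_aversion=2.5)
    print("equilibrium returns:", [round(r, 4) for r in pi])
```

With these inputs the raw forecasts produce a position of more than 200% long in the first stock against more than 150% short in the second. The equilibrium returns, which are identical for the two stocks, return the optimizer to the benchmark weights.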

Forecasting Risks 

In a portfolio context, the risk of a single 
stock is a function of the variance of its re¬ 
turns, as well as the covariances between its 
returns and the returns of other stocks in the 
portfolio. The variance-covariance matrix of 
stock returns, or risk model, is used to mea¬ 
sure the risk of a portfolio. For equity port¬ 
folio management, investors rarely estimate 
the full variance-covariance matrix directly be¬ 
cause the number of individual elements is too 
large, and for a well-behaved (that is, nonsingular)
matrix, the number of observations
used to estimate the matrix must significantly
exceed the number of stocks in the matrix.
To see this, suppose that there are N stocks.
Then the variance-covariance matrix has N(N + 1)/2
elements, consisting of N variances and
N(N − 1)/2 covariances. For an S&P 500 portfolio,
for instance, there are 500 × (500 + 1)/2 =
125,250 unknown parameters to estimate, 500 
variances and 124,750 covariances. For this rea¬ 
son, most equity portfolio managers use a factor 
risk model in which individual variances and covariances
are expressed as a function of a small
set of stock characteristics—such as industry 
membership, size, and leverage. This greatly re¬ 
duces the number of unknown risk parameters 
that the manager needs to estimate. 
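
The reduction in parameters can be made concrete with a short calculation. The sketch below assumes a common factor structure in which the covariance matrix equals B F B' + D, with the exposures B taken as observed stock characteristics; that structure and the choice of 50 factors are our illustrative assumptions, not figures from the text.

```python
def full_covariance_params(n_stocks):
    """N variances plus N(N-1)/2 covariances, i.e. N(N+1)/2 in total."""
    return n_stocks * (n_stocks + 1) // 2

def factor_model_params(n_stocks, n_factors):
    """Assumed structure: Cov = B F B' + D with observed exposures B.

    F contributes K(K+1)/2 factor variances and covariances, and D
    contributes N stock-specific variances.
    """
    return n_factors * (n_factors + 1) // 2 + n_stocks

print(full_covariance_params(500))   # 125250
print(factor_model_params(500, 50))  # 1775
```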

When developing an equity factor risk model, 
it is a good idea to include all of the variables 
used to forecast returns among the (potentially 
larger) set of variables used to forecast risks. 
This way, the risk model "sees" all of the poten¬ 
tial risks in an investment strategy, both those 
managers are willing to accept and those they 
would like to avoid. Further, a mismatch be¬ 
tween the variables in the return and risk mod¬ 
els can produce less efficient portfolios in the 
optimizer. For instance, suppose a return model 
comprises two factors, each with 50% weight: 
the book-to-price ratio (B/P) and return on eq¬ 
uity (ROE). Suppose the risk model, on the other 
hand, has only one factor: B/P. When form¬ 
ing a portfolio, the optimizer will manage risk 
only for the factors in the risk model—that is, 
B/P but not ROE. This inconsistency between 
the return and risk models can lead to port¬ 
folios with extreme positions and higher-than- 
expected risk. The portfolio will not reflect the 
original 50-50 weights on the two return factors 
because the optimizer will dampen the expo¬ 
sure to B/P, but not to ROE. In addition, the 
risk model's estimate of tracking error will be
too low because it will not capture any risk from 
the portfolio's exposure to ROE. The most ef¬ 
fective way to avoid these two problems is to 
make sure all of the factors in the return model 
are also included in the risk model (although 
the converse does not need to be true; that
is, there can be risk factors without expected
returns).

A final issue to consider when developing or 
selecting a risk model is the frequency of data 
used in the estimation process. Many popular 
risk models use monthly returns, whereas some 
portfolio managers have developed propri¬ 
etary risk models that use daily returns. Clearly, 
when estimating variances and covariances, the 
more observations, the better. High-frequency
data produce more observations and hence 
more precise and reliable estimates. Further, 
by giving more weight to recent observations, 
estimates can be more responsive to changing 
economic conditions. As a result, risk models 
that use high-frequency returns should provide 
more accurate risk estimates. (For a detailed 
discussion of factor risk models, see Chapter 20 
of Litterman [2003].)
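
One simple way to give more weight to recent observations is an exponentially weighted covariance estimate. The sketch below is a generic illustration under our own assumptions (the half-life parameterization and the function name are not from the text) and is not the estimator used by any particular commercial risk model.

```python
import numpy as np

def ewma_covariance(returns, half_life):
    """Exponentially weighted covariance of a T x N return matrix.

    Rows are ordered oldest to newest; the newest row gets the largest weight,
    and an observation half_life periods older gets half as much.
    """
    r = np.asarray(returns, dtype=float)
    t = r.shape[0]
    decay = 0.5 ** (1.0 / half_life)
    weights = decay ** np.arange(t - 1, -1, -1)  # oldest row -> smallest weight
    weights /= weights.sum()
    mean = weights @ r                            # weighted mean return per stock
    demeaned = r - mean
    return (demeaned * weights[:, None]).T @ demeaned

# Hypothetical daily returns for three stocks.
rng = np.random.default_rng(0)
daily = rng.normal(0.0, 0.01, size=(250, 3))
cov_recent = ewma_covariance(daily, half_life=30)
```

A short half-life makes the estimate more responsive to changing conditions at the cost of using fewer effective observations; a very long half-life approaches the equally weighted sample covariance.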

Forecasting Transaction Costs 

Although often overlooked, accurate trade-cost 
estimates are critical to the equity portfolio 
management process. After all, what really mat¬ 
ters is not the gross return a portfolio might 
receive, but rather the actual return a portfolio 
does receive after deducting all relevant costs, 
including transaction costs. Ignoring transac¬ 
tion costs when forming portfolios can lead 
to poor performance because implementation 
costs can reduce, or even eliminate, the advan¬ 
tages achieved through superior stock selection. 
Conversely, taking account of transaction costs 
can help produce portfolios with gross returns 
that exceed the costs of trading. 

Accurate trading-cost forecasts are also im¬ 
portant after portfolio formation, when mon¬ 
itoring the realized costs of trading. A good 
transaction-cost model can provide a bench¬ 
mark for what realized costs "should be," 
and hence whether actual execution costs 
are reasonable. Detailed trade-cost monitor¬ 
ing can help traders and brokers achieve best 
execution by driving improvements in trading
methods, such as more patient trading
or the selective use of alternative trading
mechanisms.

Transaction costs have two components: (1) 
explicit costs, such as commissions and fees; 
and (2) implicit costs, or market impact. Com¬ 
missions and fees tend to be relatively small, 
and the cost per share does not depend on the 
number of shares traded. In contrast, market 
impact costs can be substantial. They reflect the 
costs of consuming liquidity from the market, 
costs that increase on a per-share basis with the 
total number of shares traded. 
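
The contrast between the two components can be sketched with a stylized cost function. The square-root impact term and every parameter value below are hypothetical choices made for illustration; the text does not specify a functional form.

```python
import math

def cost_per_share(shares, commission=0.01, impact_coef=0.10,
                   daily_vol_dollars=0.50, avg_daily_volume=1_000_000):
    """Stylized per-share cost: flat commission plus square-root market impact.

    All parameter values are hypothetical, and the square-root shape is an
    assumed functional form, not an estimate from the text.
    """
    participation = shares / avg_daily_volume
    impact = impact_coef * daily_vol_dollars * math.sqrt(participation)
    return commission + impact

small = cost_per_share(10_000)    # 1% of average daily volume
large = cost_per_share(250_000)   # 25% of average daily volume
print(small, large)
```

The commission term is the same for both orders, while the impact term grows with order size, so the larger order pays more per share.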

Market impact costs arise because suppliers 
of liquidity incur risk. One component of these 
costs is inventory risk. The liquidity supplier 
has a risk/return trade-off, and will demand a 
price concession to compensate for this inven¬ 
tory risk. The larger the trade size and the more 
illiquid or volatile the stock, the larger are in¬ 
ventory risk and market impact costs. Another 
consideration is adverse selection risk. Liquid¬ 
ity suppliers are willing to provide a better price 
to uninformed than informed traders, but since 
there is no reliable way to distinguish between 
these two types of traders, the market maker 
sets an average price, with expected gains from 
trading with uninformed traders compensating 
for losses incurred from trading with informed 
traders. Market impact costs tend to be higher
for low-price and small-cap stocks, for which
adverse selection risk and informational
asymmetry tend to be more severe.

Forecasting price impact is difficult. Because 
researchers only observe prices for completed 
trades, they cannot determine what a stock's 
price would have been without these trades. 
It is therefore impossible to know for sure how 
much prices moved as a result of the trade. Price 
impact costs, then, are statistical estimates that 
are more accurate for larger data samples. 

One approach to estimating trade costs is to 
directly examine the complete record of mar¬ 
ket prices, tick by tick (see, for example, Breen, 
Hodrick, and Korajczyk [2002]). These data are 
noisy due to discrete prices, non-synchronous 
reporting of trades and quotes, and input errors.
Also, the record does not show orders
placed, just those that eventually got executed
(which may have been split up from the original,
larger order). Research by Lee and Radhakrishna
(2000) suggests empirical analysis
should be done using aggregated samples of
trades rather than individual trades at the
tick-by-tick level.

Another approach is for portfolio managers 
to estimate a proprietary transaction cost model 
using their own trades and, if available, those 
of comparable managers. If generating a suf¬ 
ficient sample is feasible, this approach is 
ideal because the resulting model matches the 
stock characteristics, investment philosophy, 
and trading strategy of the individual port¬ 
folio manager. There is a large academic lit¬ 
erature on measuring transaction costs. Fur¬ 
ther, models built from actual trading records 
provide a complementary source of informa¬ 
tion on market impact costs. (For empirical evi¬ 
dence on how transaction costs can vary across 
trade characteristics and how to predict transac¬ 
tion costs, see Chapter 23 of Litterman [2003].) 


CONSTRUCTING PORTFOLIOS

In this section we discuss how to construct port¬ 
folios based on the forecasts described in the 
last section. In particular, we compare ad hoc, 
rule-based approaches to portfolio optimiza¬ 
tion. The first step in portfolio construction, 
however, is to specify the investment goals. 
While having good forecasts (as described in 
the previous section) is obviously important, 
the investor's goals define the portfolio man¬ 
agement problem. These goals are usually spec¬ 
ified by three major parameters: the benchmark, 
the risk/return target, and specific restrictions 
such as the maximum holdings in any single 
name, industry, or sector. 

The benchmark represents the starting point 
for any active portfolio; it is the client's neu¬ 
tral position—a low-cost alternative to active 
management in that asset class. For example,
investors interested in holding large-cap U.S. 
stocks might select the S&P 500 or Russell 1000 
as their benchmark, while investors interested 
in holding small-cap stocks might choose the 
Russell 2000 or the S&P 600. Investors inter¬ 
ested in a portfolio of non-U.S. stocks could 
pick the FTSE 350 (United Kingdom), TOPIX 
(Japan), or MSCI EAFE (World minus North 
America) indexes. There are a large number 
of published benchmarks available, or an in¬ 
vestor might develop a customized benchmark 
to represent the neutral position. In all cases, 
however, the benchmark should be a reason¬ 
ably low-cost, investable alternative to active 
management. 

Although some investors are content to 
merely match the returns on their benchmarks, 
most investors allocate at least some of their as¬ 
sets to active managers. The allocation of risk 
is done via risk budgeting. In equity portfolio 
management, active management means over¬ 
weighting attractive stocks and underweight¬ 
ing unattractive stocks relative to their weights 
in the benchmark. (The difference between a 
stock's weight in the portfolio and its weight 
in the benchmark is called its active weight, 
where a positive active weight corresponds to 
an overweight position and a negative active 
weight corresponds to an underweight posi¬ 
tion.) Of course, there is always a chance that 
these active weighting decisions will cause the 
portfolio to underperform the benchmark, but 
one of the basic dictums of modern finance is 
that to earn higher returns, investors must ac¬ 
cept higher risk—which is true of active returns 
as well as total returns. 

A portfolio's tracking error measures its risk 
relative to a benchmark. Tracking error equals 
the time-series standard deviation of a port¬ 
folio's active return (which is the difference 
between the portfolio's return and that of 
the benchmark). A portfolio's information ratio 
equals its average active return divided by its 
tracking error. As a measure of return per unit 
of risk, the information ratio provides a convenient
way to compare strategies with different
active risk levels. 
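
The two definitions above translate directly into a few lines of code. The sketch below computes both statistics per period, without annualizing; the function name and the four-period return series are made up for illustration.

```python
import numpy as np

def tracking_error_and_ir(portfolio_returns, benchmark_returns):
    """Per-period tracking error and information ratio from two return series."""
    active = np.asarray(portfolio_returns) - np.asarray(benchmark_returns)
    te = active.std(ddof=1)      # time-series standard deviation of active return
    ir = active.mean() / te      # average active return per unit of active risk
    return te, ir

# Hypothetical returns over four periods.
port = [0.02, 0.01, -0.01, 0.03]
bench = [0.01, 0.01, -0.02, 0.02]
te, ir = tracking_error_and_ir(port, bench)
print(te, ir)  # approximately 0.005 and 1.5
```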

An efficient portfolio is one with the highest ex¬ 
pected return for a target level of risk—that is, it 
has the highest information ratio possible given 
the risk budget. In the absence of constraints, an 
efficient portfolio is one in which each stock's 
marginal contribution to expected return is pro¬ 
portional to its marginal contribution to risk. 
That is, there are no unintended risks, and all 
risks are compensated with additional expected 
returns. How can a portfolio manager construct 
such an efficient portfolio? Below we compare 
two approaches: (1) a rule-based system; and 
(2) portfolio optimization. 
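
The proportionality condition can be seen from a standard unconstrained mean-variance problem. The notation here is introduced for illustration and does not appear in the original text: h is the vector of active weights, \alpha the vector of expected active returns, \Sigma the covariance matrix of returns, \lambda a risk-aversion parameter, and \sigma the tracking error.

```latex
\max_{h}\; h^{\top}\alpha - \frac{\lambda}{2}\, h^{\top}\Sigma h
\quad\Longrightarrow\quad
\alpha = \lambda\, \Sigma h^{*},
\qquad
\sigma = \sqrt{h^{\top}\Sigma h},
\qquad
\frac{\partial \sigma}{\partial h} = \frac{\Sigma h}{\sigma}
\quad\Longrightarrow\quad
\alpha_i = (\lambda \sigma)\,\frac{\partial \sigma}{\partial h_i}
\;\;\text{for every stock } i .
```

At the optimum, each stock's expected active return is the same multiple, \lambda\sigma, of its marginal contribution to tracking error, so no position adds risk without a matching contribution to expected return.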

Building an efficient portfolio is a com¬ 
plex problem. To help simplify this compli¬ 
cated task, many portfolio managers use ad 
hoc, rule-based methods that partially control 
exposures to a small number of risk factors. 
For example, one common approach—called 
stratified sampling—ranks stocks within buck¬ 
ets formed on the basis of a few key risk fac¬ 
tors, such as sector and size. The manager 
then invests more heavily in the highest-ranked 
stocks within each bucket, while keeping the 
portfolio's total weight in each bucket close to 
that of the benchmark. The resulting portfolio 
is close to neutral with respect to the identified 
risk factors (that is, sector and size) while over¬ 
weighting attractive stocks and underweight¬ 
ing unattractive stocks. 
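
A rule of this kind might be sketched as follows. The tilt rule, the assumption that scores are standardized to roughly the range -1 to +1, and the four-stock universe are all our own illustrative choices; practitioners use many variants.

```python
from collections import defaultdict

def stratified_weights(stocks, tilt=0.5):
    """stocks: list of dicts with 'name', 'bucket', 'bench_weight', 'score'.

    Within each bucket, benchmark weights are tilted toward above-average
    scores and rescaled so the bucket total equals its benchmark weight.
    Assumes scores are standardized so the tilt multiplier stays positive
    for at least one stock in each bucket.
    """
    buckets = defaultdict(list)
    for s in stocks:
        buckets[s["bucket"]].append(s)

    weights = {}
    for members in buckets.values():
        bucket_total = sum(s["bench_weight"] for s in members)
        avg_score = sum(s["score"] for s in members) / len(members)
        raw = {
            s["name"]: s["bench_weight"] * max(0.0, 1.0 + tilt * (s["score"] - avg_score))
            for s in members
        }
        scale = bucket_total / sum(raw.values())
        for name, w in raw.items():
            weights[name] = w * scale
    return weights

# Hypothetical universe with buckets formed on sector and size.
universe = [
    {"name": "A", "bucket": "Tech/Large", "bench_weight": 0.30, "score": 1.0},
    {"name": "B", "bucket": "Tech/Large", "bench_weight": 0.20, "score": -1.0},
    {"name": "C", "bucket": "Energy/Small", "bench_weight": 0.25, "score": 0.5},
    {"name": "D", "bucket": "Energy/Small", "bench_weight": 0.25, "score": -0.5},
]
portfolio = stratified_weights(universe)
```

Each bucket keeps its benchmark weight (0.50 in both cases here), while the higher-ranked stock in each bucket is overweighted relative to the benchmark.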

Although stratified sampling may seem 
sensible, it is not very efficient. Numerous 
unintended risks can creep into the portfolio, 
such as an overweight in high-beta stocks, 
growth stocks, or stocks in certain subsectors. 
Nor does it allow the manager to explicitly 
consider trading costs or investment objectives 
in the portfolio construction problem. Portfolio 
optimization provides a much better method 
for balancing expected returns against different 
sources of risk, trade costs, and investor 
constraints. An optimizer uses computer algo¬ 
rithms to find the set of weights (or holdings) 
that maximize the portfolio's expected return 
(net of trade costs) for a given level of risk.
It minimizes uncompensated sources of risk, 
including sector and style biases. Fortunately, 
despite the complex math, optimizers require 
only the various forecasts we've already 
described and developed in the prior section. 
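
To show how the forecasts feed an optimizer, the sketch below solves the simplest possible case: unconstrained mean-variance active weights scaled to a target tracking error. It ignores transaction costs, the requirement that active weights sum to zero, and all investor constraints, so it is a simplified illustration rather than a description of a production optimizer. The inputs are hypothetical.

```python
import numpy as np

def optimal_active_weights(alpha, cov, target_te):
    """Unconstrained mean-variance active weights scaled to a target tracking error."""
    alpha = np.asarray(alpha, dtype=float)
    cov = np.asarray(cov, dtype=float)
    direction = np.linalg.solve(cov, alpha)        # proportional to Cov^-1 alpha
    te = np.sqrt(direction @ cov @ direction)      # tracking error of unscaled weights
    return direction * (target_te / te)

# Hypothetical forecasts for three stocks.
alpha = np.array([0.02, -0.01, 0.01])
vols = np.array([0.25, 0.20, 0.30])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)      # 1.0 on the diagonal, 0.3 elsewhere
cov = np.outer(vols, vols) * corr

h = optimal_active_weights(alpha, cov, target_te=0.03)
expected_active_return = alpha @ h
information_ratio = expected_active_return / 0.03
```

Real optimizers add the trade-cost term and constraints discussed in the text, which generally requires a numerical solver instead of this closed-form solution.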

Chapter 23 of Litterman (2003) demonstrates 
the benefits of optimization, comparing two 
portfolios: one constructed using stratified sam¬ 
pling and the other constructed using an op¬ 
timizer. The optimized portfolio is designed to 
have the same predicted tracking error as the 
rule-based portfolio. The results show that (1) the 
optimized portfolio is more efficient in terms 
of its expected alpha and information ratio for 
the same level of risk, (2) risk is spread more 
broadly for the optimized portfolio compared 
to the rule-based portfolio, (3) more of the risk 
budget in the optimized portfolio is due to the 
factors that are expected to generate positive ex¬ 
cess returns, and (4) the forecast beta for the op¬ 
timized portfolio is closer to 1.0, as unintended 
sources of risk (such as market timing) are
minimized. 

Another benefit of optimizers is that they 
can efficiently account for transaction costs, 
constraints, selected restrictions, and other 
account guidelines, making it much easier to 
create customized client portfolios. Of course, 
when using an optimizer to construct efficient 
portfolios, reliable inputs are essential. Data 
errors that add noise to the return, risk, and 
transaction cost forecasts can lead to portfolios 
in which these forecast errors are maximized. 
Instead of picking stocks with the highest 
actual expected returns, or the lowest actual 
risks or transaction costs, the optimizer takes 
the biggest positions in the stocks with the 
largest errors, namely, the stocks with the great¬ 
est overestimates of expected returns or the 
greatest underestimates of risks or transaction 
costs. A robust investment process will screen 
major data sources for outliers that can severely 
corrupt one's forecasts. Further, as described 
in the previous section, return forecasts should 
be adjusted for equilibrium views using the
Black-Litterman model to produce final return
forecasts that are more consistent with risk es¬ 
timates, and with each other. Finally, portfolio 
managers should impose sensible, but simple, 
constraints on the optimizer to help guard 
against the effects of noisy inputs. These con¬ 
straints could include maximum active weights 
on individual stocks, industries, or sectors, 
as well as limitations on the portfolio's active 
exposure to factors such as size or market beta. 

TRADING 

Trading is the process of executing the orders 
derived in the portfolio construction step. To 
trade a list of stocks efficiently, investors must 
balance opportunity costs and execution price 
risk against market impact costs. Trading each 
stock quickly minimizes lost alpha and price 
uncertainty due to delay, but impatient trading
incurs maximum market impact. Conversely,
trading more patiently over a longer period reduces
market impact but incurs larger opportunity
costs and short-term execution price risk.
Striking the right balance is one of the keys to 
successful trade execution. 

The concept of "striking a balance" suggests 
optimization. Investors can use a trade opti¬ 
mizer to balance the gains from patient trad¬ 
ing (e.g., lower market-impact cost) against the 
risks (e.g., greater deviation between the exe¬ 
cution price and the decision price; potentially 
higher short-term tracking error). Such an op¬ 
timizer will tend to suggest aggressive trading 
for names that are liquid and/or have a large
effect on portfolio risk, while suggesting patient 
trading for illiquid names that have less impact 
on risk. A trade optimizer can also easily handle 
most real-world trading constraints, such as the 
need to balance cash in each of many accounts 
across the trading period (which may last 
several days). 
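
The balance can be illustrated with a deliberately simple single-stock model. This is our own toy formulation, not the one used by any particular trade optimizer: expected market-impact cost is assumed to fall as impact_coef / T when the trade is spread over T days, while a penalty for opportunity cost and execution price risk is assumed to grow as risk_coef × T. All coefficients are hypothetical and can be read as basis points.

```python
import math

def total_cost(horizon, impact_coef, risk_coef):
    """Toy cost of trading over `horizon` days: impact falls, risk penalty rises."""
    return impact_coef / horizon + risk_coef * horizon

def best_trading_horizon(impact_coef, risk_coef):
    """Horizon that minimizes impact_coef / T + risk_coef * T."""
    return math.sqrt(impact_coef / risk_coef)

# Liquid name with a large effect on portfolio risk: trade aggressively.
liquid_high_risk = best_trading_horizon(impact_coef=4.0, risk_coef=16.0)
# Illiquid name with little effect on portfolio risk: trade patiently.
illiquid_low_risk = best_trading_horizon(impact_coef=36.0, risk_coef=4.0)
print(liquid_high_risk, illiquid_low_risk)  # 0.5 3.0
```

A real trade optimizer solves this kind of trade-off jointly across the whole trade list and subject to constraints such as cash balancing, which this sketch does not attempt.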

A trade optimizer can also easily accommo¬ 
date the time horizon of a manager's views. 
That is, if a manager is buying a stock primarily 
for long-term valuation reasons, and the excess 
return is expected to accrue gradually over
time, then the optimizer will likely suggest a 
patient trading strategy (all else being equal). 
Conversely, if the manager is buying a stock 
in expectation of a positive earnings surprise 
tomorrow, the optimizer is likely to suggest 
an aggressive trading strategy (again, all else 
being equal). The trade optimizer can also be 
programmed to consider short-term return 
regularities, such as the tendency of stocks with 
dramatic price moves on one day to continue 
those moves on the next day before reversing 
the following day (see Heston, Korajczyk, 
and Sadka, 2010). Although these types of 
regularities may be too small to cover trading 
costs, and should not be used to initiate trades, 
they can be used to help minimize trading costs 
after an investor has independently decided to 
trade (see Engle and Ferstenberg, 2007). 

To induce traders to follow the desired strat¬ 
egy (that is, that suggested by the trade opti¬ 
mizer), the portfolio manager needs to give the 
trader an appropriate benchmark, which pro¬ 
vides guidance about how aggressively or pa¬ 
tiently to trade. Two widely used benchmarks 
for aggressive trades are the closing price on the 
previous day and the opening price on the trade 
date. Because the values of these two bench¬ 
marks are measured prior to any trading, a 
patient strategy that delays trading heightens 
execution price risk by increasing the possibil¬ 
ity of deviating significantly from the bench¬ 
mark. Another popular execution benchmark 
is the volume-weighted average price (VWAP) 
for the stock over the desired trading period, 
which could be a few minutes or hours for 
an aggressive trade, or one or more days for 
a patient trade. However, the VWAP bench¬ 
mark should only be used for trades that are not 
too large relative to total volume over the pe¬ 
riod; otherwise, the trader may be able to influ¬ 
ence the benchmark against which he or she is 
evaluated. 
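
A VWAP benchmark and the trader's performance against it can be computed as follows. The function names, the sign convention for slippage, and the price and volume figures are illustrative assumptions.

```python
def vwap(prices, volumes):
    """Volume-weighted average price over the trading period."""
    total_volume = sum(volumes)
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume

def slippage_bps(avg_exec_price, benchmark_price, side):
    """Execution cost versus benchmark in basis points.

    side is +1 for a buy and -1 for a sell; a positive result means the
    execution was worse than the benchmark.
    """
    return side * (avg_exec_price - benchmark_price) / benchmark_price * 1e4

# Hypothetical market trades during the period, then a buy filled at 10.15 on average.
market_vwap = vwap([10.00, 10.10, 10.20], [1000, 3000, 1000])
cost = slippage_bps(10.15, market_vwap, side=+1)
print(market_vwap, cost)
```

If the manager's own fills made up most of the 5,000 shares in this example, the benchmark would largely reflect those fills, which is why the text recommends VWAP only for trades that are small relative to total volume.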

Buy-side traders can increasingly make use 
of algorithmic trading, or computer algorithms 
that directly access market exchanges, to automatically
make certain trading decisions such
as the timing, price, quantity, type, and routing 
of orders. These algorithms may dynamically 
monitor market conditions across time and 
trading venues, and reduce market impact by 
breaking large orders into smaller pieces, em¬ 
ploying either limit orders or marketable limit 
orders, or selecting trading venues to submit 
orders, while closely tracking trading bench¬ 
marks. Algorithmic trading provides buy-side 
traders more anonymity and greater control 
over their order flow, but tends to work better 
for more liquid or patient trades. 

Principal package trading is another way to 
lower transaction costs relative to traditional 
agency methods (see Kavajecz and Keim, 2005). 
Principal trades may be crossed with the princi¬ 
pal's existing inventory positions, or allow the 
portfolio manager to benefit from the longer 
trading horizon and superior trading ability of 
certain intermediaries. 


EVALUATING RESULTS AND 
UPDATING THE PROCESS 

Once an investment process is up and run¬ 
ning, it needs to be constantly reassessed and, if 
necessary, refined. The first step is to compare 
actual results to expectations; if realizations dif¬ 
fer enough from expectations, process refine¬ 
ments may be necessary. Thus, managers need 
systems to monitor realized performance, risk, 
and trading costs and compare them to prior 
expectations. 

A good performance monitoring system 
should be able to determine not only the de¬ 
gree of over- or under-performance, but also the 
sources of these excess returns. For example, 
a good performance attribution system might 
break excess returns down into those due to 
market timing (having a different beta than 
the benchmark), industry tilts, style differences, 
and stock selection. Such systems are avail¬ 
able from a variety of third-party vendors. An 
even better system would allow the manager to 
further disaggregate returns to see the effects of
each of the proprietary signals used to forecast 
returns, as well as the effects of constraints and 
other portfolio requirements. And, of course, 
any system will be more accurate if it can ac¬ 
count for daily trading and changes in portfolio 
exposures. 

Investors should also compare realized risks 
to expectations. For example, Goldman Sachs 
has developed the concept of the green, yellow, 
and red zones to compare realized and targeted 
levels of risk (see Chapter 17 in Litterman, 2003). 
Essentially, if realized risk is within a reasonable 
band around the target (that is, the green zone), 
then one can assume the risk management tech¬ 
niques are working as intended and no action 
is required. If realized risk is further from the 
target (the yellow zone), the situation may re¬ 
quire closer examination, and if realized risk is 
far from the target (the red zone), some action 
is usually called for. 
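
The zone logic can be expressed as a simple classification of realized against targeted risk. The band widths below and the use of a symmetric relative deviation are hypothetical placeholders chosen for illustration; they are not the thresholds given in Litterman (2003).

```python
def risk_zone(realized_te, target_te, green_band=0.25, yellow_band=0.50):
    """Classify realized versus target risk by relative deviation.

    Band widths are hypothetical placeholders, not published thresholds.
    """
    deviation = abs(realized_te - target_te) / target_te
    if deviation <= green_band:
        return "green"    # within a reasonable band: no action required
    if deviation <= yellow_band:
        return "yellow"   # further from target: examine more closely
    return "red"          # far from target: action usually called for

print(risk_zone(0.021, 0.020))  # green
print(risk_zone(0.028, 0.020))  # yellow
print(risk_zone(0.035, 0.020))  # red
```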

Finally, it is important to monitor trading 
costs. Are they above or below the costs as¬ 
sumed when making trading decisions? Are 
they above or below competitors' costs? Are 
they too high in an absolute sense? If so, man¬ 
agers may need to improve their trade cost esti¬ 
mates, trading process, or both. There are many 
services that can report realized trade costs, but 
most are available with a significant lag, and are 
inflexible with respect to how they measure and 
report these costs. With in-house systems, how¬ 
ever, managers can compare a variety of trade 
cost estimation techniques and get the feedback 
in a timely enough fashion to act on the results. 

The critical question, of course, is what to do 
with the results of these monitoring systems: 
When do variations from expectations warrant 
refinements to the process? This will depend 
on the size of the variations and their persis¬ 
tence. For example, a manager probably would 
not throw out a stock-selection signal after one 
bad month—no matter how bad—but might 
want to reconsider after many years of poor 
performance, taking into consideration the economic
environment and any external factors
that might explain the results. It is also im¬ 
portant to compare the underperformance to 
historical simulations. Have similar periods oc¬ 
curred in the past, and if so, were they followed 
by improvements? In this case, the underperfor¬ 
mance is part of the normal risk in that signal 
and no changes may be called for. If not, there 
may have been a structural change that might 
invalidate the signal going forward—for exam¬ 
ple, if the signal has become overly popular, it 
may no longer be a source of mispricing. 

Similarly, the portfolio manager needs to con¬ 
sider the source of any differences between ex¬ 
pectations and realizations. For example, was 
underperformance due to faulty signals, port¬ 
folio constraints, unintended risk, or random 
noise? The answer will determine the proper 
response. If constraints are to blame, they may 
be lifted—but only if doing so would not vi¬ 
olate any investment guidelines or incur ex¬ 
cessive risk. Alternatively, if the signals are to 
blame, the manager must decide if the devia¬ 
tions from expectations are temporary or more 
enduring. If it is just random noise, no action 
is necessary. Similarly, any differences between 
realized and expected risk could be due to poor 
risk estimates or poor portfolio construction, 
with the answer determining the response. Fi¬ 
nally, excessive trading costs (versus expecta¬ 
tions) could reflect poor trading or poor trade 
cost estimates, again with different implications 
for action. 

In summary, ongoing performance, risk, and 
trade cost monitoring is an integral part of 
the equity portfolio management process and 
should get equal billing with forecasting, port¬ 
folio construction, and trading. Monitoring 
serves as both quality control and a source 
of new ideas and process improvements. The 
more sophisticated the monitoring systems, the 
more useful they are to the process. And al¬ 
though the implications of monitoring involve 
subtle judgments and careful analysis, better 
data can lead to better solutions. 
KEY POINTS

• Two popular ways to manage equity portfolios
are the traditional, or qualitative, approach
and the quantitative approach.

• The equity investment process comprises four
primary steps: (1) forecasting returns, risks,
and transaction costs; (2) constructing portfolios
that maximize expected risk-adjusted return
net of transaction costs; (3) trading stocks
efficiently; and (4) evaluating results and
updating the process.

• There are four closely linked steps to building
a quantitative equity return-forecasting
model: (1) identifying a set of potential
return forecasting variables, or signals; (2) testing
the effectiveness of each signal, by itself
and together with other signals; (3) determining
the appropriate weight for each signal in
the model; and (4) blending the model's views
with market equilibrium to arrive at reasonable
forecasts for expected returns.

• Most quantitative equity portfolio managers
use a factor risk model in which individual
variances and covariances are expressed as a
function of a small set of stock characteristics
such as industry membership, size, and
leverage.

• Transaction costs consist of explicit costs, such
as commissions and fees, and implicit costs,
or market impact. The per-share cost of commissions
and fees does not depend on the
number of shares traded, whereas market impact
costs increase on a per-share basis with
the total number of shares traded.

• Tracking error measures a portfolio's risk relative
to a benchmark. Tracking error equals
the time-series standard deviation of a portfolio's
active return, the difference between
the portfolio's return and that of the benchmark.

• Information ratio is a measure of return per
unit of risk, a portfolio's average active return
divided by its tracking error.

• Two widely used ways to construct an efficient
portfolio are stratified sampling, which
is a rule-based system, and portfolio optimization.

• To trade a list of stocks efficiently, investors 
must balance opportunity costs and execu¬ 
tion price risk against market impact costs. 
Trading each stock quickly minimizes lost 
alpha and price uncertainty due to delay, but 
impatient trading incurs maximum market 
impact. Trading more patiently over a longer 
period reduces market impact but incurs 
larger opportunity costs and short-term 
execution price risk. 

• Once an investment process is operational, it 
should be constantly reassessed and, if neces¬ 
sary, refined. Thus, managers need systems 
to monitor realized performance, risk, and 
trading costs and compare them to prior ex¬ 
pectations. 

• A good performance monitoring system 
should be able to determine the degree of 
over- or underperformance as well as the 
sources of these excess returns, such as market 
timing, industry tilts, style differences, and 
stock selection. 
Abstract: One of the key tasks in seeking to generate attractive returns is producing realistic and
reasonable return expectations and forecasts. In the Markowitz mean-variance framework, an 
investor's objective is to choose a portfolio of securities that has the largest expected return for 
a given level of risk (as measured by the portfolio's variance). In the case of common stock, by 
return (or expected return) of a stock, we mean the change (or expected change) in the stock price 
over the period, plus any dividends paid, divided by the starting price. Of course, since we do not 
know the true values of the securities' expected returns and covariances, these must be estimated or 
forecasted. Equity portfolio managers have used various statistical models for forecasting returns 
and risk. These models, referred to as predictive return models, make conditional forecasts of 
expected returns using the current information set. Predictive return models include regressive 
models, linear autoregressive models, dynamic factor models, and hidden-variable models. 


In contrast to forecasting events such as the 
weather, forecasting stock prices and returns 
is difficult because the predictions themselves 
will produce market movements that in turn 
provoke immediate changes in prices, thereby 
invalidating the predictions themselves. This 
leads to the concept of market efficiency: 
An efficient market is a market where all 
new information about the future behavior 
of prices is immediately impounded in the 
prices themselves and therefore exploits all 
information. 


Actually the debate about the predictability 
of stock prices and returns has a long history. 1 
More than 75 years ago, Cowles (1933) asked the 
question: "Can stock market forecasters fore¬ 
cast?" Armed with the state-of-the-art econo¬ 
metric tools at the time, Cowles analyzed the 
recommendations of stock market forecasters 
and concluded, "It is doubtful." Subsequent 
academic studies support Cowles's conclusion. 
However, the history goes further back. In 1900, 
a French mathematician, Louis Bachelier, in his 
doctoral dissertation in mathematical statistics 
titled Théorie de la Spéculation (The Theory of
Speculation), showed using mathematical techniques
why the stock market behaves as it
does. 2 He also provided empirical evidence
based on the French capital markets at the turn 
of the century. He wrote: 

Past, present, and even discounted future events are 
reflected in market price, but often show no apparent
relation to price changes... . [A]rtificial causes
also intervene: the Exchange reacts on itself, and 
the current fluctuation is a function, not only of the 
previous fluctuations, but also of the current state. 

The determination of these fluctuations depends on 
an infinite number of factors; it is, therefore, im¬ 
possible to aspire to mathematical predictions of 
it.... [T]he dynamics of the Exchange will never 
be an exact science. (Bachelier, 1900) 

In other words, according to Bachelier, stock 
price movements are difficult to forecast and 
even explain after the fact. 

Despite this conclusion, the adoption of 
modeling techniques by asset management 
firms has greatly increased since the turn 
of the century. Models to predict expected 
returns are routinely used at asset manage¬ 
ment firms. In most cases, it is a question of 
relatively simple models based on factors or 
predictor variables. However, more statistical 
or econometric-oriented models are also being 
experimented with and adopted by some asset 
management firms, as well as what are referred 
to as nonlinear models based on specialized 
areas of statistics such as neural networks and 
genetic algorithms. 

Historical data are often used for forecast¬ 
ing future returns as well as estimating risk. 
For example, a portfolio manager might pro¬ 
ceed in the following way: Observing weekly or 
monthly returns, the portfolio manager might 
use the past five years of historical data to es¬ 
timate the expected return and the covariances 
by the sample mean and sample covariances. 
The portfolio manager would then use these as 
inputs for mean-variance optimization, along 
with any ad hoc adjustments to reflect any 
views about expected returns on future performance.
Unfortunately this historical approach
most often leads to counterintuitive, unstable,
or merely "wrong" portfolios generated by the 
mean-variance optimization model. Better fore¬ 
casts are necessary. Statistical estimates can be 
very noisy and typically depend on the quality 
of the data and the particular statistical tech¬ 
niques used to estimate the inputs. In general, 
it is desirable that an estimator of expected re¬ 
turn and risk have the following properties: 

• It provides a forward-looking forecast with 
some predictive power, not just a backward¬ 
looking historical summary of past perfor¬ 
mance. 

• The estimate can be produced at a reasonable 
computational cost. 

• The technique used does not amplify errors 
already present in the inputs used in the pro¬ 
cess of estimation. 

• The forecast should be intuitive, that is, the 
portfolio manager should be able to explain 
and justify it in a comprehensible manner.

In this entry, we look at the issue of whether 
forecasting stock returns can be done so as to 
generate trading profits and excess returns. Be¬ 
cause the issue about predictability of stock re¬ 
turns or prices requires an understanding of 
statistical concepts, we will provide a brief de¬ 
scription of the relevant concepts in probability 
theory and statistics. We then discuss the dif¬ 
ferent types of predictive return models that 
are used by portfolio managers. 


THE CONCEPT OF 
PREDICTABILITY 

To predict (or forecast) involves forming an 
expectation of a future event or future events. 
Since ancient times it has been understood that 
the notion of predicting the future is subject to 
potential inconsistencies. Consider what might 
happen if one receives a highly reliable predic¬ 
tion that tomorrow one will have a car accident 
driving to work. This might alter one's behav¬ 
ior such that a decision is made not to go to 
work. Hence, one's behavior will be influenced
by the prediction, thus potentially invalidating 
the prediction. It is because of inconsistencies 
of this type that two economists in the mid-1960s,
Paul Samuelson and Eugene Fama,
arrived at the apparently paradoxical conclu¬ 
sion that "properly anticipated prices fluctuate 
randomly." 3 

The concept of forecastability rests on how 
one can forecast the future given the current 
state of knowledge. In probability theory, the 
state of knowledge on a given date is referred 
to as the information set known at that date. Fore¬ 
casting is the relationship between the informa¬ 
tion set today and future events. By altering 
the information set, the forecast changes. How¬ 
ever, the relationship between the information 
set and the future is fixed and immutable. Academics
and market practitioners adopt this concept of
forecastability in finance theory.
Prices or returns are said to be forecastable if the 
knowledge of the past influences our forecast of 
the future. For example, if the future returns of 
a firm's stock depend on the value of key finan¬ 
cial ratios, then those returns are predictable. If 
the future returns of that stock do not depend 
on any variable known today, then returns are 
unpredictable. 

As explained in the introduction to this entry,
the merits of stock return forecasting are the subject of an
ongoing debate. There are two beliefs that seem to
be held in the investment community. First, pre¬ 
dictable processes allow investors to earn excess 
returns. Second, unpredictable processes do not 
allow investors to earn excess returns. Neither 
belief is necessarily true. Understanding why 
will shed some light on the crucial issues in 
the debate regarding return modeling. The rea¬ 
sons can be summed up as follows. First, pre¬ 
dictable processes do not necessarily produce 
excess returns if they are associated with un¬ 
favorable risk. Second, unpredictable expecta¬ 
tions can be profitable if the expected value is 
favorable. 

Because most of our knowledge is uncertain, 
our forecasts are also uncertain. Probability theory
provides the conceptual tools to represent
and measure the level of uncertainty. 4 Proba¬ 
bility theory assigns a number—referred to as 
the "probability"—to every possible event. This 
number, the probability, might be interpreted in 
one of two ways. The first is that a probability 
is the "intensity of belief" that an event will oc¬ 
cur, where a probability of 1 means certainty. 5 
The second interpretation is the one normally 
used in statistics: Probability is the percentage 
of times (i.e., frequency) that a particular event 
is observed in a large number of observations 
(or trials). 6 This interpretation of probability is 
the frequentist interpretation, also referred to 
as the relative frequency concept of probability. 
Although it is this interpretation that is used 
in finance and the one adopted in this book, 
there are attempts to apply the subjective inter¬ 
pretation to financial decision making using an 
approach called the Bayesian approach. 7 

With this background, let's consider again the 
returns of some stock. Suppose that returns are 
unpredictable in the sense that future returns 
do not depend on the current information set. 
This does not mean that future returns are com¬ 
pletely uncertain in the same sense in which the 
outcome of throwing a die is uncertain. Clearly, 
we cannot believe that every possible return on 
the stock is equally likely: There are upper and 
lower bounds for real returns in an economy. 
More important, if we collect a series of histor¬ 
ical returns for a stock, a distribution of returns 
would be observed. 

It is therefore reasonable to assume that our 
uncertainty is embodied in a probability distri¬ 
bution of returns. The absence of predictability 
means that the distribution of future returns 
does not change as a function of the current in¬ 
formation set. More specifically, the distribution 
of future returns does not change as a function 
of the present and past values of prices and 
returns. This entails that the distribution of re¬ 
turns does not change with time. We can there¬ 
fore state that (1) a price or return process is pre¬ 
dictable if its probability distributions depend 
on the current information set, and (2) a price or 
return process is unpredictable if its probability
distributions are time-invariant. 
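
Stated compactly, and in notation that is our own shorthand rather than anything fixed by the discussion above ($r_{t+1}$ for the next-period return, $I_t$ for the information set at time $t$, and $F$ for a distribution function), the two cases can be sketched as follows:

```latex
\begin{aligned}
\text{unpredictable:}\quad & F(r_{t+1} \mid I_t) = F(r_{t+1}) \quad \text{for every } t \text{ and every } I_t, \\
\text{predictable:}\quad & F(r_{t+1} \mid I_t) \neq F(r_{t+1}) \quad \text{for at least some } I_t .
\end{aligned}
```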

Given the concept of predictability as we have 
just defined it, we can now discuss why prices 
and returns are difficult (or perhaps impossi¬ 
ble) to predict. The key is that any prediction 
that might lead to an opportunity to generate a 
trading profit or an excess return tends to make 
that opportunity disappear. For example, sup¬ 
pose that the price of a stock is predicted to 
increase significantly in the next five trading 
days. A large price increase is a source of trad¬ 
ing profit or excess return. As a consequence, if 
that prediction is widely shared by the invest¬ 
ment community, investors will rush to pur¬ 
chase that stock. But the demand thus induced 
will make the stock's price rise immediately, 
thus eliminating the source of trading profit or 
excess return and invalidating the forecast. 

Suppose that the predictions of stock returns 
were certain rather than uncertain. By a certain 
prediction it is meant a prediction that leaves 
no doubt about what will happen. For example, 
U.S. Treasury zero-coupon securities if held to 
maturity offer a known or certain prediction 
of returns because the maturity value is guar¬ 
anteed by the full faith and credit of the U.S. 
government. Any forecast that leaves open 
the possibility that market forces will alter 
the forecast cannot be considered a certain 
forecast. If stock return predictions are certain, 
then simple arbitrage arguments would dictate 
that all stocks should have the same return. In 
fact, if stock returns could be predicted with 
certainty and if there were different returns, 
then investors would choose only those stocks 
with the highest returns. 

Stock return forecasts are not certain; as we 
have seen, uncertain predictions are embod¬ 
ied in probability distributions. Suppose that 
we have a joint probability distribution of the 
returns of the universe of investable stocks. 
Investors will decide the rebalancing of their 
portfolios depending on their probabilistic pre¬ 
dictions and their risk-return preferences. The 
problem we are discussing here is whether general
considerations of market efficiency are able
to determine the mathematical form of price or 
return processes. In particular, we are interested 
in understanding if stock prices or returns are 
necessarily unpredictable. 

The problem discussed in the literature is ex¬ 
pressed roughly as follows. Suppose that re¬ 
turns are a series of random variables. These 
series will be fully characterized by the joint dis¬ 
tributions of returns at any given time t and at 
any given set of different times. Suppose that in¬ 
vestors know these distributions and that they 
select their portfolios according to specific rules 
that depend on these distributions. Can we de¬ 
termine the form of admissible processes, that 
is, of admissible distributions? 

Ultimately, the objective in solving this prob¬ 
lem is to avoid models that allow unreasonable 
inferences. Historically, three solutions have
been proposed: 

1. Returns fluctuate randomly around a given 
mean. 

2. Returns are a fair game. 

3. Returns are a fair game after adjusting for 
risk. 

In statistical terminology, returns fluctuating 
randomly around a given mean refers to re¬ 
turns following multivariate random walks. A fair 
game means that returns are martingales. These 
concepts and their differences will be explained 
below. The first two proposed solutions are in¬ 
correct; the third is too general to be useful for 
asset management. Before we discuss the above 
models of prices, we digress to briefly explain 
some statistical concepts. 

Statistical Concepts of Predictability 
and Unpredictability 

Because we have stressed how we must rely 
on probability to understand the concepts of 
predictability and unpredictability, we will first 
explain the concepts of conditional probabil¬ 
ity, conditional expectation, independent and 
identically distributed random variables, strict 
white noise, martingale difference sequence,
and white noise. In addition, we have to un¬ 
derstand the concept of an error term and an 
innovation. 

Conditional probability and conditional 
expectation are fundamental in the prob¬ 
abilistic description of financial markets. 
A conditional probability of some random
variable X is the probability of X given that a
particular value of another random variable
Y is known. Similarly, a conditional probability
distribution can be determined. For the con¬ 
ditional probability distribution, an expected 
value can be computed and is referred to as 
a conditional expected value or conditional 
mean or, more commonly, a conditional 
expectation. 
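
A small numerical illustration may help. The sketch below uses a made-up joint probability table for a return X and a two-state variable Y (the numbers, the state labels, and the function names are our own assumptions) and computes the conditional distribution and conditional expectation of X for each value of Y.

```python
import numpy as np

# Hypothetical joint probabilities P(X = x, Y = y).
# Rows index the state of Y ("low", "high"); columns index the values of X.
x_values = np.array([-0.02, 0.00, 0.03])   # possible returns X
joint = np.array([
    [0.20, 0.15, 0.05],   # Y = "low"
    [0.05, 0.25, 0.30],   # Y = "high"
])

def conditional_distribution(joint, y_index):
    """P(X = x | Y = y): the row of the joint table rescaled to sum to one."""
    row = joint[y_index]
    return row / row.sum()

def conditional_expectation(joint, x_values, y_index):
    """E[X | Y = y]: the mean of X under the conditional distribution."""
    return float(conditional_distribution(joint, y_index) @ x_values)

# Unconditional mean uses the marginal distribution of X (column sums).
unconditional_mean = float(joint.sum(axis=0) @ x_values)
mean_given_low = conditional_expectation(joint, x_values, 0)
mean_given_high = conditional_expectation(joint, x_values, 1)
```

In this made-up table the conditional mean of X differs across the two states of Y, which is exactly the situation in which knowing Y helps to forecast X.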

The statistical concept of independent and identically
distributed variables (denoted IID
variables) imposes two conditions on the probability
distributions of random variables. First
consider "independent." This means if we have 
a time series for some random variable, then at 
each time the random variable has a probability 
distribution. By independently distributed, it is 
meant that the probability distributions remain 
the same regardless of the history of past val¬ 
ues for the random variable. "Identically" dis¬ 
tributed means that all returns have the same 
distribution in every time period. These two 
conditions entail that, over time, the mean and 
the variance do not change from period to pe¬ 
riod. In the parlance of the statistician, we have 
a stationary time-series process. 

A strict white noise is a sequence of IID 
variables that have a mean equal to zero and 
a finite variance. Hence, a strict white noise
is unpredictable in the sense that the condi¬ 
tional probability distribution of the random 
variables is fixed and independent from the 
past. Because a strict white noise is unpre¬ 
dictable, expectations and higher moments 
are unpredictable. Moments are measures to 
summarize the probability distribution. The 
first four moments are expected value or mean 
(location), variance (dispersion), skewness
(asymmetry), and kurtosis (concentration in
the tails). The higher moments of a probability 
distribution are those beyond the mean and 
variance, that is skewness and kurtosis. 

A martingale difference sequence is a sequence
of uncorrelated random variables with a mean of
zero whose conditional expectation, given the
past values of the series, is always zero. Because
expectations and conditional expectations are
both zero, in a martingale difference sequence
expectations are unpredictable. However, if higher
moments exist, they might be predictable.

A white noise is a sequence of uncorrelated 
random variables with a mean of zero and 
a finite variance. Since the random variables 
are uncorrelated, in a white noise expectations
are linearly unpredictable. Higher moments,
if they exist, might be predictable. The
key here is that expectations are unpredictable using
a linear model. However, they may be predicted
as nonlinear functions of past values. It
is for this reason that certain statistical tech¬ 
niques that involve nonlinear functions such as 
neural networks have been used by some quan¬ 
titative asset management firms to try to predict 
expectations. 
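
To make the distinction concrete, the following sketch simulates a series that is uncorrelated (a white noise and, in fact, a martingale difference sequence) yet is not a strict white noise, because the size of each draw depends on the size of the previous one. The ARCH(1) recursion, the parameter values, and the function names are illustrative assumptions of ours; they are not taken from the discussion above.

```python
import numpy as np

def simulate_arch1(n, omega=0.6, alpha=0.4, seed=0):
    """Simulate e_t = sigma_t * z_t with sigma_t^2 = omega + alpha * e_{t-1}^2.

    z_t is IID standard normal, so E[e_t | past] = 0 (martingale difference),
    but the conditional variance depends on the past (not strict white noise).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    e = np.empty(n)
    prev_sq = omega / (1.0 - alpha)   # start at the unconditional variance
    for t in range(n):
        sigma2 = omega + alpha * prev_sq
        e[t] = np.sqrt(sigma2) * z[t]
        prev_sq = e[t] ** 2
    return e

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

e = simulate_arch1(20000)
rho_level = autocorr(e)         # linear dependence in the series itself
rho_square = autocorr(e ** 2)   # dependence in the squared series
```

Under these assumptions one would expect `rho_level` to be close to zero and `rho_square` to be clearly positive: the level of the series is linearly unpredictable, while its second moment is not.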

Random Walks and Martingales 

In the special case where the random vari¬ 
ables are normally distributed, it can be proven 
that strict white noise, martingale difference 
sequence, and white noise coincide. In fact, 
two uncorrelated, normally distributed random 
variables are also independent. 

We can now define what is meant by an arith¬ 
metic random walk, a martingale, and a strict 
arithmetic random walk that are used to de¬ 
scribe the stochastic process for returns and 
prices as follows: 

• An arithmetic random walk is the sum of 
white-noise terms. The mean of an arithmetic 
random walk is linearly unpredictable but 
might be predictable with nonlinear predictors.
Higher moments might be predictable.

• A martingale is the sum of martingale difference
sequence terms. The mean of a
martingale is unpredictable (linearly and 
nonlinearly); that is, the expectation of a 
martingale coincides with its present value. 
Higher moments might be predictable. 

• A strict random walk is the sum of strict
white-noise terms. A strict random walk is 
unpredictable: Its mean, variance, and higher 
moments are all unpredictable. 
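
The three noise concepts and the processes built from them can be summarized in symbols. The notation below ($\varepsilon_t$ for the noise term, $X_t$ for the cumulated process) is our own shorthand, and a finite variance is assumed throughout:

```latex
\begin{aligned}
\text{strict white noise:}\quad & \varepsilon_t \ \text{IID},\quad E[\varepsilon_t] = 0,\quad \operatorname{Var}(\varepsilon_t) < \infty \\
\text{martingale difference sequence:}\quad & E[\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots] = 0 \\
\text{white noise:}\quad & E[\varepsilon_t] = 0,\quad \operatorname{Cov}(\varepsilon_t, \varepsilon_s) = 0 \ \ (t \neq s) \\
\text{cumulated process:}\quad & X_t = X_{t-1} + \varepsilon_t
\end{aligned}
```

With these definitions each condition implies the one below it, so a strict random walk is a martingale and a martingale is an arithmetic random walk, but the converse does not hold in general.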

Error Terms and Innovations 

Any statistical process can be broken down into 
a predictable and an unpredictable component. 
The first component is that which can be pre¬ 
dicted from the past values of the process. The 
second component is that which cannot be pre¬ 
dicted. The component that cannot be predicted 
is called the innovation process. Innovation is
not specifically related to a model; it is a characteristic
of the process. Innovations are therefore
unpredictable processes. 

Now consider a model that is supposed to explain
empirical data, for example a model for predicting
future returns or prices. For a given observation, the
difference between the value predicted by the 
model and the observation is called the resid¬ 
ual. In econometrics, the residual is referred to 
as an error term or, simply, error of the model. 
It is not necessarily true that errors are inno¬ 
vations; that is, it is not necessarily true that 
errors are unpredictable. If errors are innova¬ 
tions, then the model offers the best possible 
explanation of data. If not, errors contain resid¬ 
ual forecastability. The previous discussion is 
relevant because it makes a difference if errors 
are strict white noise, martingale difference se¬ 
quences, or simply white noise. 

More specifically, a random walk whose 
changes (referred to as increments) are non¬ 
normal white noise contains a residual structure 
not explained by the model both at the level of 
expectations and higher moments. If data fol¬ 
low a martingale model, then expectations are 
completely explained by the model but higher 
moments are not. 


The Importance of the Statistical Concepts 

We have covered a good number of complex 
statistical concepts. What's more, many of these 
statistical concepts are not discussed in basic 
statistics courses offered in business schools. 
So, why are these apparently arcane statisti¬ 
cal considerations of practical significance to 
investors? The reason is that the properties of 
models that are used in attempting to forecast 
returns and prices depend on the assumptions 
made about "noise" in the data. For example, 
a linear model makes linear predictions of ex¬ 
pectations and cannot capture nonlinear events 
such as the clustering of volatility that has
been observed in real-world stock markets. It is 
therefore natural to assume that errors are white 
noise. In other models attempting to forecast 
returns and prices, however, different assump¬ 
tions about noise need to be made; otherwise 
the properties of the model conflict with the 
properties of the noise term. 

Now, the above considerations have impor¬ 
tant practical consequences when testing error 
terms to examine how well the models that will 
be described later in this entry perform. When 
testing a model, one has to make sure that the 
residuals have the properties that we assume 
they have. Thus, if we use a linear model, say a 
linear regression, we will have to make sure that 
residuals from time-series data are white noise; 
that is, that the residuals are uncorrelated over 
time. The correlation between the residuals at 
different times from a model based on time- 
series data is referred to as autocorrelation. In 
a linear regression using time-series data, the
presence of autocorrelation violates one of the ordinary
least squares assumptions used when estimating
the parameters of the statistical model. 8 In
general, it will suffice to add lags to the set of 
predictor variables to remove the existence of 
autocorrelation of the residuals. 9 However, if 
we have to check that residuals are martingale 
difference sequences or strict white noise, we 
will have to use more powerful tests. In ad¬ 
dition, adding lags will not be sufficient to re¬ 
move undesired properties of residuals. Models 
will have to be redesigned. These effects are not
marginal: They can have a significant impact on 
the profitability and performance of investment 
strategies. 
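
The residual check described above can be sketched on simulated data. In the example below the data-generating process, its coefficients, and the helper names are all hypothetical choices of ours: a regression that omits a relevant lag leaves autocorrelated residuals, and adding the lag to the set of predictors removes most of that autocorrelation.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

def ols_residuals(y, X):
    """Residuals from an ordinary least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Hypothetical data: y depends on a predictor x and on its own previous value.
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)
u = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * x[t] + 0.6 * y[t - 1] + u[t]

ones = np.ones(n - 1)
# Model A omits the lag, so the dynamics it misses end up in the residuals.
resid_a = ols_residuals(y[1:], np.column_stack([ones, x[1:]]))
# Model B adds the lagged dependent variable to the set of predictors.
resid_b = ols_residuals(y[1:], np.column_stack([ones, x[1:], y[:-1]]))

rho_a = autocorr(resid_a)   # expected to be clearly positive
rho_b = autocorr(resid_b)   # expected to be close to zero
```

A first-order autocorrelation is only the simplest diagnostic; checking that residuals are martingale difference sequences or strict white noise would require looking at nonlinear functions of the residuals as well, for example their squares.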


A CLOSER LOOK AT PRICING 
MODELS 

Armed with these concepts from statistics, let's 
now return to a discussion of pricing models. 
The first hypothesis on equity price processes 
that was advanced as a solution to the problem 
of forecastability was the random walk hypoth¬ 
esis. The strongest formulation assumes that re¬ 
turns are a sequence of IID variables, that is, a 
strict random walk. This means that, over time, 
the mean and the variance do not change from 
period to period. If returns are IID variables, it 
can be shown that the logarithms of prices fol¬ 
low a random walk and the prices themselves 
follow what is called a geometric random walk. 
The IID model is clearly a model without fore¬ 
castability as the distribution of future returns 
does not depend on any information set known 
at the present moment. It does, however, allow 
stock prices to have a fixed drift. 
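
As a minimal sketch of this construction, the code below draws IID returns with a fixed drift, cumulates them to obtain the logarithm of the price, and exponentiates to obtain the price itself. The normal distribution, the parameter values, and the function name are assumptions made for illustration only; returns are taken to be differences of log prices.

```python
import numpy as np

def geometric_random_walk(p0, mu, sigma, n, seed=0):
    """Simulate prices whose log returns are IID with drift mu.

    Returns the price path (length n + 1, starting at p0) and the n returns.
    """
    rng = np.random.default_rng(seed)
    # IID returns: the same distribution every period, independent of the past.
    log_returns = mu + sigma * rng.standard_normal(n)
    # Log prices follow an arithmetic random walk with drift.
    log_prices = np.log(p0) + np.concatenate(([0.0], np.cumsum(log_returns)))
    # Prices themselves follow a geometric random walk.
    return np.exp(log_prices), log_returns

prices, log_returns = geometric_random_walk(p0=100.0, mu=0.005, sigma=0.04, n=120)
```

Because prices are obtained by exponentiating, they stay positive along every simulated path, which is one practical reason for modeling log prices rather than prices.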

There is a weaker form of the random walk 
hypothesis that only requires that returns at any 
two different times be uncorrelated. According 
to this weaker definition, returns are a sequence 
formed by a constant drift plus white noise. If 
returns are a white noise, however, they are not necessarily
unpredictable. In fact, a white noise, although
uncorrelated at every lag, might be predictable 
in the sense that its expectation might depend 
on the present information set. 

At one time, it was believed that if one as¬ 
sumes investors make perfect forecasts, then the 
strict random walk model was the only possi¬ 
ble model. However, this conclusion was later 
demonstrated to be incorrect by LeRoy (1973). 
He showed that the class of admissible models
is actually much broader. That is, the strict
random walk model is too restrictive to be the
only possible model. LeRoy proposed instead the use of
the martingale model (i.e., the fair game model)
that we explain next.

The idea of a martingale has a long history 
in gambling. Actually the word "martingale" 
originally meant a gambling strategy in which 
the gambler continually doubles his or her bets. 
In modern statistics, a martingale embodies the 
idea of a fair game where, at every bet, the gam¬ 
bler has exactly the same probability of win¬ 
ning or losing. In fact, as explained earlier in 
this entry, the martingale is a process where 
the expected value of the process at any future 
date is the actual value of the process. If a price 
process or a game is represented by a martin¬ 
gale, then the expectation of gains or losses is 
zero. As our earlier discussion shows, a random walk
with uncorrelated increments is not necessarily 
a martingale as its expectations are only linearly 
unpredictable. 

Technically, the martingale model applies to 
the logarithms of prices. Returns are the differ¬ 
ences of the logarithms of prices. The martin¬ 
gale model requires that the expected value of 
returns is not predictable because it is zero or 
a fixed constant. However, there can be subtle 
patterns of forecastability for higher moments 
of the return distribution. Higher moments, to 
repeat, are those moments of a probability dis¬ 
tribution beyond the expected value (mean) 
and variance, for example, skewness and kurtosis.
In other words, the distribution of returns
can depend on the present information set pro¬ 
vided that the expected value of the distribution 
remains constant. 
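
In symbols (again our own notation, with $p_t$ the logarithm of the price, $r_{t+1} = p_{t+1} - p_t$ the return, $I_t$ the information set, and $\mu$ the constant expected return), the martingale model can be sketched as:

```latex
E[r_{t+1} \mid I_t] = \mu
\quad\Longleftrightarrow\quad
E[p_{t+1} \mid I_t] = p_t + \mu ,
\qquad \text{while, for example,} \quad
\operatorname{Var}(r_{t+1} \mid I_t) = \sigma_t^2 \ \text{may depend on } I_t .
```

With $\mu = 0$ this is the pure fair game; with $\mu \neq 0$ the log price is a martingale after the fixed drift is removed.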

The martingale model does not fully take into 
consideration risk premiums because it allows 
higher moments of returns to vary while ex¬ 
pected values remain constant. It cannot be a 
general solution to the problem of what processes
are compatible with the assumption that
investors can make perfect probabilistic fore¬ 
casts. 

The definitive answer is due to Harrison and 
Kreps (1979) and Harrison and Pliska (1981, 
1985). They demonstrated that stock prices 
must indeed be martingales, but only after multiplication
by a factor that takes risk into account.
The conclusion of their work (which involves 
a very complicated mathematical model), how¬ 
ever, is that a broad variety of predictable pro¬ 
cesses are compatible with the assumption that 
the market is populated by market agents capa¬ 
ble of making perfect forecasts in a probabilistic 
sense. Predictability is due to the interplay of 
risk and return. 

However, precisely because the market is
populated by market agents capable of making
perfect forecasts, it is not necessarily true
that successful predictions will lead to excess
returns. For example, it is generally accepted
that predicting volatility is easier than predict¬ 
ing returns. The usual explanation of this fact 
is that investors and portfolio managers are 
more interested in returns than in volatility. 
With the maturing of the quantitative meth¬ 
ods employed by asset managers coupled with 
the increased emphasis placed on risk-return, 
risk and returns have become equally impor¬ 
tant. However, this does not entail that both 
risk and returns have become unpredictable. It 
is now accepted that it is possible to predict
combinations of the two. 

PREDICTIVE RETURN 
MODELS 

Equity portfolio managers have used various 
statistical models for forecasting returns and 
risk. These models, referred to as predictive 
return models, make conditional forecasts of 
expected returns using the current informa¬ 
tion set. That information set could include 
past prices, company information, and financial 
market information such as economic growth or 
the level of interest rates. 

Most predictive return models employed in 
practice are statistical models. More specifi¬ 
cally, they use tools from the field of econo¬ 
metrics. We will provide a nontechnical review 
of econometric-based predictive return models 
below. 


Predictive return models can be classified into 
four general types: 10 

1. Regressive model. This model involves the use 
of regression analysis where the variables 
used to predict returns (also referred to as 
predictors or explanatory variables) are the 
factors that are believed to impact returns. 

2. Linear autoregressive model. In this model, the 
variables used to predict returns are the 
lagged returns (i.e., past returns). 

3. Dynamic factor model. Models of this type use 
a mix of prices and returns. 

4. Hidden-variable model. This type of model 
seeks to capture regime change. 

Although these models use traditional econo¬ 
metric techniques and are the most commonly 
used in practice, in recent years other mod¬ 
els based on the specialized area of machine 
learning have been proposed. The machine¬ 
learning approach in forecasting returns in¬ 
volves finding a model without any theoretical 
assumptions. This is done through a process 
of what is referred to as progressive adapta¬ 
tion. Machine-learning approaches, rooted in 
the fields of statistics and artificial intelligence 
(AI), include neural networks, decision trees, 
clustering, genetic algorithms, support vector 
machines, and text mining. 11 We will not de¬ 
scribe machine-learning based predictive re¬ 
turn models. However, in the 1990s, there were 
many exaggerated claims and hype about their 
potential value for forecasting stock returns that 
could completely revolutionize portfolio management.
Consequently, they received considerable
attention from the investment community
and the media. It seems these claims never 
panned out. 12 

As a prerequisite for the adoption of a predic¬ 
tive return model, there are a number of key 
questions that a portfolio manager must ad¬ 
dress. These include: 13 

• What are the statistical properties of the 
model? 

• How many predictor (explanatory) variables 
should be used in the model?

• What is the best statistical approach to estimate
the model and is commercial software
available for the task? 

• How does one statistically test whether the 
model is valid? 

• How can the consequences of errors in the 
choice of a model be mitigated? 

The first and last questions rely on the 
statistical concepts that we described ear¬ 
lier. These questions are addressed in more 
technically oriented equity investment management
books. 14 Consequently, we will limit our
discussion in this entry to only the first ques¬ 
tion, describing the statistical properties of the 
four types of predictive return models. That 
is, we describe the fundamental statistical con¬ 
cepts behind these models and their economic 
meaning, but we omit the mathematical details. 

Regressive Models 

Regressive models of returns are generally 
based on linear regressions on factors. Factors 
are also referred to as predictors. Linear 
regression models are used in several aspects 
of portfolio management beyond that of return 
forecasting. For example, an equity analyst 
may use such models to forecast future sales of 
a company being analyzed. 

Regressive models can be categorized as one 
of two fundamental kinds. The first is static 
regressive models. These models do not make 
predictions about the future but regress present 
returns on present factors. The second type is 
predictive regressive models. In such models 
future returns are regressed on present and 
past factors to make predictions. For both types 
of models, the statistical concepts and princi¬ 
ples are the same. What differs is the economic 
meaning of each type of model. 

Static Regressive Models 

Static regressive models for predicting returns 
should be viewed as timeless relationships that 
are valid at any moment. They are not useful for 
predictive purposes because there is no time lag 
between the return and the factor. For example,
consider the empirical analogue of the CAPM
as represented by the characteristic line given 
by the following regression model: 

$$r_t - r_{ft} = \alpha + \beta\,[r_{Mt} - r_{ft}] + e_t \qquad (1)$$

where

$r_t$ = return on the stock in month $t$
$r_{ft}$ = the risk-free rate in month $t$
$r_{Mt}$ = the return on the market index (say S&P 500) in month $t$
$e_t$ = the error term for the stock in month $t$
$\alpha$ and $\beta$ = parameters for the stock to be estimated by the regression model
$t$ = month ($t = 1, 2, \ldots, T$)

The above model says that the conditional expectation
of a stock's return at time t is proportional
to the excess return of the market index
at time t. This means that to predict the stock
return at time T + 1, the portfolio manager must
know the excess return of the market index at
time T + 1, which is, of course, unknown at
time T. Predictions would be possible only
if a portfolio manager could predict the excess
return of the market index at time T + 1 (i.e.,
$r_{M,T+1} - r_{f,T+1}$).
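
A characteristic line of this kind is typically estimated by ordinary least squares. The sketch below does so on simulated monthly excess returns; the data, the "true" parameter values, and the function name are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def fit_characteristic_line(stock_excess, market_excess):
    """OLS estimates of alpha and beta in
    stock_excess_t = alpha + beta * market_excess_t + e_t."""
    X = np.column_stack([np.ones_like(market_excess), market_excess])
    (alpha, beta), *_ = np.linalg.lstsq(X, stock_excess, rcond=None)
    residuals = stock_excess - (alpha + beta * market_excess)
    return float(alpha), float(beta), residuals

# Hypothetical monthly excess returns (T = 240 months).
rng = np.random.default_rng(42)
T = 240
market_excess = 0.005 + 0.04 * rng.standard_normal(T)
stock_excess = 0.001 + 1.2 * market_excess + 0.02 * rng.standard_normal(T)

alpha_hat, beta_hat, resid = fit_characteristic_line(stock_excess, market_excess)
```

Note that the fitted line relates the stock's excess return to the market's excess return in the same month, so using it for month T + 1 still requires a separate forecast of the market's excess return.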

There are also static multifactor models of return
where the return at time t is based on the
factor returns at time t. For example, suppose
that there are N factors. Letting $F_{nt}$ ($n = 1, 2, \ldots, N$;
$t = 1, 2, \ldots, T$) denote factor n in month t, a regression model for
a multifactor model for stock i (again dropping
the subscript i for stock i) would be

$$r_t - r_{ft} = \alpha + \beta_{F1}[r_{F1,t} - r_{ft}] + \beta_{F2}[r_{F2,t} - r_{ft}] + \cdots + \beta_{FN}[r_{FN,t} - r_{ft}] + e_t \qquad (2)$$

where

$r_t$ = return on the stock in month $t$
$r_{ft}$ = the risk-free rate in month $t$
$r_{FN,t}$ = the return on factor $N$ in month $t$
$e_t$ = the error term for the stock in month $t$
$\alpha$ and $\beta_{FN}$'s = parameters for the stock to be estimated by the regression model
$t$ = month ($t = 1, 2, \ldots, T$)

Thus, in order for a portfolio manager to build 
a portfolio or to compute portfolio risk mea¬ 
sures using the above multifactor model for 
month T + 1, just as in the case of the char¬ 
acteristic line, some assumption about how to 
forecast the excess returns (i.e., $r_{Fn,T+1} - r_{f,T+1}$)
for each factor is required. 

Predictive Regressive Models 
In the search for models to predict returns, predictive
regressive models have been developed. To explain
predictive regressive models, consider some stock return
and an assumed number of predictors. These predictors
could be financial measures and market measures. A
predictive linear regressive model assumes that the
stock return at any given time t is a weighted average
of its predictors at an earlier time plus a constant and
some error. Hence, predicting a stock's return does not
require forecasting the predictors used in the
regression model.
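As a minimal sketch of this idea, the following Python
code regresses the return at time t on a single predictor
observed at time t - 1 and then forms a forecast for
T + 1 from information available at T. The data are
synthetic, and the predictor `x`, the function name, and
the coefficient values are illustrative assumptions, not
taken from the text.

```python
import numpy as np

def fit_predictive_regression(returns, predictor):
    """OLS of returns[t] on a constant and predictor[t-1]."""
    y = returns[1:]
    X = np.column_stack([np.ones(len(y)), predictor[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [constant, slope]

# Synthetic data (assumption): r_t = 0.01 + 0.5 * x_{t-1} + noise
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
r = np.empty(T)
r[0] = 0.0
r[1:] = 0.01 + 0.5 * x[:-1] + 0.1 * rng.normal(size=T - 1)

a_hat, b_hat = fit_predictive_regression(r, x)

# The forecast for T + 1 needs only x at time T, which is already known.
forecast_next = a_hat + b_hat * x[-1]
```

Because the regressor is lagged, no forecast of the
predictor itself is needed to produce `forecast_next`.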

Predictive regressive models can also be de¬ 
fined by estimating a regression model where 
there are factors used as predictors at differ¬ 
ent lags. Such models, referred to as distributed 
lag models, have the advantage that they can 
capture the eventual dependence of returns not 
only on factors but also on the rate of change 
of factors. Here is the economic significance 
of such models. Suppose that a portfolio man¬ 
ager wants to create a predictive model based 
on, among other factors, "market sentiment." 
In practice, market sentiment is typically mea¬ 
sured as a weighted average of analysts' fore¬ 
casts. A reasonable assumption is that stock 
returns will be sensitive to the value of mar¬ 
ket sentiment but will be even more sensi¬ 
tive to changes in market sentiment. Hence, 
distributed lag models will be useful in this 
setting. 
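To see why lags capture rates of change, consider a
sketch with one predictor x (for instance, the sentiment
measure) entering at two lags. The symbols a, b_1, and
b_2 are illustrative and do not appear in the original.
Rearranging the two-lag model separates a level term
from a change term:

```latex
\begin{aligned}
r_t &= a + b_1 x_{t-1} + b_2 x_{t-2} + \varepsilon_t \\
    &= a + (b_1 + b_2)\, x_{t-1} - b_2\,(x_{t-1} - x_{t-2}) + \varepsilon_t
\end{aligned}
```

Under this rewriting, the sensitivity to the level of the
predictor is b_1 + b_2 and the sensitivity to its most
recent change is -b_2, so a model with two lags can fit a
return that responds more strongly to changes than to
levels.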


Linear Autoregressive Models 

In a linear autoregressive model, a variable is 
regressed on its own past values. Past values 
are referred to as lagged values and when they 
are used as predictors in the model they are 
referred to as lagged variables. In the case of 
predictive return models, one of the lagged 
variables would be the past values of the return 
of the stock. If the model involves only the 
lagged variable of the stock return, it is called 
an autoregressive model (AR model). An AR 
model prescribes that the value of a variable at 
time t be a weighted average of the values of 
the same variable at times t — 1, t - 2,..., and so 
on (depending on number of lags) plus an error 
term. The weighting coefficients are the model 
parameters that must be estimated. If the model 
includes p lags, then p parameters must be 
estimated. 
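The following is a minimal sketch of estimating an AR(p)
model by least squares, assuming a simulated series. The
function name `fit_ar`, the inclusion of a constant, and
the simulated coefficients are assumptions made for
illustration.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of an AR(p) model with a constant.
    Returns (constant, array of p lag coefficients)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    y = series[p:]
    # Column k holds the series lagged by k + 1 periods.
    lags = np.column_stack(
        [series[p - k - 1 : n - k - 1] for k in range(p)]
    )
    X = np.column_stack([np.ones(len(y)), lags])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Simulated AR(2) (assumption): r_t = 0.5 r_{t-1} - 0.3 r_{t-2} + e_t
rng = np.random.default_rng(1)
T = 5000
r = np.zeros(T)
for t in range(2, T):
    r[t] = 0.5 * r[t - 1] - 0.3 * r[t - 2] + rng.normal()

const_hat, phi_hat = fit_ar(r, p=2)
```

With p = 2 the routine estimates two weighting
coefficients (plus the constant added here), in line with
the statement that p lags require p parameters.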

If there are other lagged variables in addition 
to the lagged variable representing the past val¬ 
ues of the return on the stock included in the 
regression model, the model is referred to as a 
vector autoregressive model (VAR model). The 
model expresses each variable as a weighted av¬ 
erage of its own lagged values plus the lagged 
values of the other variables. A VAR model 
with p lags is denoted by VAR(p) model. The 
benefit of a VAR model is that it can capture 
cross-autocorrelations; that is, a VAR model can 
model how values of a variable at time t are 
linked to the values of another variable at some 
other time. An important question is whether 
these links are causal or simply correlations. 15 

For a model to be useful, the number of parameters to be
estimated needs to be small. In practice, the
implementation of a VAR is complicated by the fact that
such models can only deal with a small number of series.
When there is a large number of series (for example, the
return processes for the individual stocks making up such
aggregates as the S&P 500 Index), a large number of
parameters must be estimated. For example, if one wanted
to model the daily returns of the S&P 500 stocks with a
VAR model that included two lags, the number of
parameters to estimate would be 500,000 (two 500 x 500
coefficient matrices). To have at least as many data
points as parameters, one would need at least four years
of data, or 1,000 trading days, for each stock return
process, which is 1,000 x 500 = 500,000 data points.
Under these conditions, estimates would be extremely
noisy and the estimated model would be meaningless.
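The arithmetic behind these figures can be checked with a
short sketch. The function name is hypothetical, and the
count covers only the lag coefficient matrices (constants
and error covariances are left out, as in the text).

```python
def var_coefficient_count(n_series, n_lags):
    """Lag coefficients in a VAR: one n x n matrix per lag."""
    return n_series * n_series * n_lags

n_params = var_coefficient_count(500, 2)  # two 500 x 500 matrices
n_points = 1000 * 500                     # 1,000 days for each of 500 series
```

Both quantities equal 500,000, so four years of daily data
supply only one observation per parameter.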

Dynamic Factor Models 

Unlike a VAR model, which involves regress¬ 
ing returns on factors but does not model the 
factors, a dynamic factor model assumes fac¬ 
tors follow a VAR model and returns (or prices) 
are regressed on these factors. The advantage 
of such models is that unlike the large amount 
of data needed to estimate the large number of 
parameters in a VAR model, a dynamic factor 
model can significantly reduce the number of 
parameters to be estimated and therefore the 
amount of data needed. 

Hidden-Variable Models 

Hidden-variable models attempt to repre¬ 
sent states of the market using hidden vari¬ 
ables. Probably the best known hidden- 
variable model is the autoregressive conditional 
heteroscedasticity (ARCH) and generalized 
autoregressive conditional heteroscedasticity 
(GARCH) family. ARCH/GARCH models use 
an autoregressive process to model the volatil¬ 
ity of another process. The result is a rich repre¬ 
sentation of the behavior of the model volatility. 

Another category of hidden-variable models is the Markov
switching-vector autoregressive (MS-VAR) family. These
models do allow forecasting of expected returns. The
simplest MS-VAR model is the Hamilton model. 16 In
economics, this model is based on two random walk
models: one with a drift for periods of economic
expansion and the other with a smaller drift for periods
of economic recession. The switch between the two models
is governed by a probability transition table that
prescribes the probability of switching from recession
to expansion, and vice versa, and the probability of
remaining in the same state.
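A two-state switching random walk of this kind can be
sketched as follows. The transition probabilities,
drifts, and volatility are hypothetical values chosen
only for illustration; they are not estimates from
Hamilton's work, and the function names are assumptions.

```python
import numpy as np

# Rows: current state; columns: next state (0 = expansion, 1 = recession).
P = np.array([[0.90, 0.10],
              [0.25, 0.75]])
drift = np.array([0.010, 0.002])  # hypothetical; recession drift is smaller
sigma = 0.02                      # hypothetical common volatility

def simulate_regime_walk(P, drift, sigma, T, seed=0):
    """Simulate the hidden state path and the resulting random walk."""
    rng = np.random.default_rng(seed)
    states = np.empty(T, dtype=int)
    states[0] = 0
    for t in range(1, T):
        # Next state is drawn from the row of the current state.
        states[t] = rng.choice(2, p=P[states[t - 1]])
    increments = drift[states] + sigma * rng.normal(size=T)
    return states, np.cumsum(increments)

def stationary_distribution(P):
    """Long-run share of time in each state: pi P = pi, sum(pi) = 1."""
    n = len(P)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

states, level = simulate_regime_walk(P, drift, sigma, T=2000)
pi = stationary_distribution(P)
```

Each row of `P` is the transition table for one state: the
diagonal entries are the probabilities of remaining in
the same state and the off-diagonal entries are the
switching probabilities.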

IS FORECASTING MARKETS 
WORTH THE EFFORT? 

In the end, all of this discussion leads to the
question: What are the implications for portfolio
managers and investors who are building, or
contemplating building, predictive return models? That
is, how does this help portfolio managers and investors
decide whether there is potentially sufficient benefit
(i.e., trading profits and/or excess returns) in trying
to extract information from market price data through
quantitative modeling? There are three important points
regarding this potential benefit.

The first, as stated by Fabozzi, Focardi, and 
Kolm (2006a, 11), is the following: 

It is not true that progress in our ability to forecast
will necessarily lead to a simplification in price and
return processes. Even if investors were to become
perfect forecasters, price and return processes might
still exhibit complex patterns of forecastability in
both expected values and higher moments, insofar as they
might be martingales after dynamically adjusting for
risk. No simple conclusion can be reached simply by
assuming that investors are perfect forecasters: in
fact, it is not true that the ability to forecast prices
implies that prices are unpredictable random walks.

It is noteworthy that when the random walk hypothesis
was first proposed in the academic community, the belief
was that price forecasting was a worthless exercise
because prices were random walks. However, it seems
reasonable to conclude that price processes will always
be structured processes simply because investors are
trying to forecast them. Modeling and sophisticated
forecasting techniques will be needed to understand the
risk-return trade-offs offered by the market.

The second point is that the idealized behavior of
perfect forecasters does not have much to do with the
actual behavior of real-world investors. The behavior of
markets is the result, not of perfectly rational market
agents, but of the action of market agents with limited
intelligence, limited resources, and subject to
unpredictable exogenous events. Consequently, the action
of market agents is a source of uncertainty in itself.
As a result, there is no theoretical reason to maintain
that the multivariate random walk is the most robust
model.

Real-world investors use relatively simple forecasting
techniques such as linear regressions. It is reasonable
to believe that when real-world investors employ
judgment, there is the possibility of making large
forecasting errors. As the behavioral finance camp
argues, the preoccupation with the idealized behavior of
markets populated by perfect forecasters seems to be
misguided. Theorists who defend the assumption that
investors in the real world are perfect forecasters
believe that it is unreasonable to assume that investors
make systematic mistakes. Proponents of this assumption
claim that, on average, investors make correct
forecasts.

However, the evidence suggests that this 
claim is not true. Investors can make systematic 
mistakes and then hit some boundary, the con¬ 
sequences of which can be extremely painful 
in terms of wealth accumulation as we saw in 
the late 1990s with the bursting of the tech¬ 
nology, media, and telecommunications bub¬ 
ble. As Fabozzi, Focardi, and Kolm (2006a, 11) 
conclude: 

A pragmatic attitude prevails. Markets are consid¬ 
ered to be difficult to predict but to exhibit rather 
complex structures that can be (and indeed are) 
predicted, either qualitatively or quantitatively. 

Finally, an important point is that predictability is
not the only path to profitability/excess returns.
Citing once again from Fabozzi, Focardi, and Kolm
(2006a, 11-12):

If prices behaved as simple models such as the random
walk model or the martingale, they could nevertheless
exhibit high levels of persistent profitability. This is
because these models are characterized by a fixed
structure of expected returns. Actually, it is the
time-invariance of expected returns coupled with the
existence of risk premiums that makes these models
unsuitable as long-term models .... A model such as the
geometric random walk model of prices leads to
exponentially diverging expected returns. This is
unrealistic in the long run, as it would lead to the
concentration of all market capitalization in one asset.
As a consequence, models such as the random walk model
can only be approximate models over limited periods of
time. This fact, in turn, calls attention to robust
estimation methods. A random walk model is not an
idealization that represents the final benchmark model:
It is only a short-term approximation of what a model
able to capture the dynamic feedbacks present in
financial markets should be.

Hence, whether the random walk assumption 
is in fact the benchmark model of price pro¬ 
cesses must be addressed empirically. Yet, the 
view of portfolio managers is that markets offer 
patterns of predictability in returns, volatility 
(variance), and, possibly, higher moments. 
Because any such patterns might offer op¬ 
portunities for realizing excess returns, a 
portfolio manager who ignores these patterns 
will be risking lost opportunities to enhance 
performance. As Fabozzi, Focardi, and Kolm 
(2006a, 24) state: 

[S]imple random walk models with risk premiums 
are not necessarily the safest models. The joint as¬ 
sumptions that markets are unforecastable and that 
there are risk premiums is not necessarily the safest 
assumption. 


KEY POINTS 

• Despite the ongoing debate about the pre¬ 
dictability of stock prices and returns, asset 
management firms have adopted statistical 
models of various levels of complexity for 
forecasting these values. 

• The concept of forecastability rests on how 
one can forecast the future given the current 
information set known at that date. 

• Prices or returns are said to be forecastable 
if the knowledge of the past influences our 
forecast of the future. 

• The two beliefs that seem to be held in the investment
community are (1) predictable processes allow investors
to earn excess returns, and (2) unpredictable processes
do not allow investors to earn excess returns.

• Predictable processes do not necessarily produce excess
returns if they are associated with unfavorable risk,
and unpredictable expectations can be profitable if the
expected value is favorable.

• Probability theory is used in decision making to
represent and measure the level of uncertainty.

• The absence of predictability means that the
distribution of future returns does not change as a
function of the present and past values of prices and
returns.

• From this perspective, a price or return process is
said to be predictable if its probability distributions
depend on the current information set, and a price or
return process is said to be unpredictable if its
probability distributions do not vary over time. Using
this concept of predictability, we can understand why
prices and returns are difficult, perhaps even
impossible, to predict.

• The key is that any prediction that might lead to an
opportunity to generate a trading profit or an excess
return tends to make that opportunity disappear. If
stock return predictions are certain, then using simple
arbitrage arguments would dictate that all stocks should
have the same return. In fact, if stock returns could be
predicted with certainty and if there were different
returns, then investors would choose only those stocks
with the highest returns.

• Because stock return forecasts are not certain,
uncertain predictions are embodied in probability
distributions.

• The problem faced by investors is whether general
considerations of market efficiency are capable of
determining the mathematical form of price or return
processes. In particular, investors are interested in
understanding if stock prices or returns are necessarily
unpredictable.

• In solving this problem, the investor's objective is to
shun models that permit unreasonable inferences. The
following solutions have been proposed: (1) Returns
fluctuate randomly around a given mean (i.e., returns
follow multivariate random walks); (2) returns are a
fair game (i.e., returns are martingales); and (3)
returns are a fair game after adjusting for risk.

• Concepts from probability theory and statistics that
are relevant in understanding return forecasting models
are conditional probability, conditional expectation,
independent and identically distributed random
variables, strict white noise, martingale difference
sequence, white noise, error terms, and innovations.

• An arithmetic random walk, a martingale, and a strict
arithmetic random walk describe the stochastic process
for returns and prices. If stock prices or returns
follow an arithmetic random walk, the mean is linearly
unpredictable but higher moments might be predictable.

• In the case of a martingale, the mean is unpredictable
(linearly and nonlinearly), although higher moments
might be predictable.

• If stock prices or returns follow a strict random walk,
the mean, variance, and higher moments are all
unpredictable.

• The statistical-based predictive return models used by
portfolio managers make conditional forecasts of
expected returns using the current information set: past
prices, company information, and financial market
information. These models are classified as regressive
models, linear autoregressive models, dynamic factor
models, and hidden-variable models.

NOTES 

1. See Bernstein (2008). 

2. The contributions of Bachelier are too exhaustive
(and technical) to describe here. In addition to his
study of the behavior of prices, his work in the area of
random walks predated Albert Einstein's study of
Brownian motion in physics by five years. His work in
option pricing theory predated the well-known
Black-Scholes option pricing model by 73 years.

3. See Samuelson (1965) and Fama (1965).

4. See Bernstein (1998) for an account of the 
development of the concepts of risk and un¬ 
certainty from the beginning of civilization 
to modern risk management. 

5. The idea of probability as intensity of belief 
was introduced by Keynes (1921). 

6. The idea of probability as a relative fre¬ 
quency was introduced by von Mises 
(1921). 

7. See Rachev et al. (2007). 

8. More specifically, the presence of autocorre¬ 
lation does not bias the estimated parame¬ 
ters of the model but results in biases in the 
standard errors of the estimated parame¬ 
ters, which are used in testing the goodness 
of fit of the model. 

9. Statements like this are intended as ex¬ 
emplifications but do not strictly embody 
sound econometric procedures. Adding 
lags has side effects, such as making esti¬ 
mations noisier, and cannot be used indis¬ 
criminately. 

10. Fabozzi, Focardi, and Kolm (2006a, 66). 

11. For a nontechnical discussion of these mod¬ 
els, see Chapter 6 in Fabozzi, Focardi, and 
Kolm (2006a). For a more technical dis¬ 
cussion see Fabozzi, Focardi, and Kolm 
(2006b). 

12. For discussion of the merits and limits of AI 
from a practical perspective, see Leinweber 
and Beinart (1996). 

13. Fabozzi, Focardi, and Kolm (2006a, 66). 

14. See, for example, Fabozzi, Focardi, and 
Kolm (2006b). 

15. For a discussion of the analysis of causality 
in VAR models, see Fabozzi, Focardi, and 
Kolm (2006b). 

16. Hamilton (1989).

Abstract: Asset pricing models seek to estimate the relationship between an asset's expected return and
the factors that drive it. The factors that drive the expected returns are referred to as risk factors. Two
well-known asset pricing models are the capital asset pricing model and the arbitrage pricing
theory. The relationship between risk factors and expected return in these two equilibrium models
is based on various assumptions. In practice, multifactor models are estimated from observed asset
returns and sophisticated statistical techniques are employed to estimate the exposure of an asset
to each factor.


Given a set of assets or asset classes, an im¬ 
portant task in the practice of investment man¬ 
agement is to understand and estimate their 
expected returns and the associated risks. Fac¬ 
tor models are widely used by investors to link 
the risk exposures of the assets to a set of known 
or unknown factors. The known factors can be 
economic or political factors, industry factors 
or country factors, and the unknown factors are 
those that best describe the dynamics of the as¬ 
set returns in the factor models, but they are 
not directly observable or easily interpreted by 
investors and have to be estimated from the 
data. 

Applications of the mean-variance analysis and portfolio
selection theories in general require the estimation of
expected asset returns and their covariance matrix.
Market participants who can identify the true factors
that drive asset returns should have much better
estimates of the true expected asset returns and the
covariance matrix, and hence should be able to form a
much better portfolio than otherwise possible. Hence,
the investment community devotes a lot of research and
resources to analyzing factor models in practice. There
is an intellectual "arms race" to find the best
portfolio strategies to outperform competitors.

Factor model estimation depends crucially on whether the
factors are identified (known) or unidentified (latent),
and on the sample size and the number of assets. In
addition, factor models can be used not only for
explaining asset returns, but also for predicting future
returns. In this entry, we first review the factor
models in the case of known and latent factors in order
to provide a big picture, and then discuss the details
of estimation.

ARBITRAGE PRICING THEORY

One of the fundamental problems in finance is to explain
the cross-sectional differences in asset expected
returns. Specifically, what factors can explain the
observed differences? Those factors that systematically
affect the differences in expected returns are therefore
the risks that investors are compensated for. Hence, the
term "factors" is interchangeable with the term "risk
factors."

The arbitrage pricing theory (APT), formulated by Ross
(1976), posits that expected returns of assets are
linearly related to K systematic factors, and the
exposure to these factors is measured by factor betas;
that is,

$E[\tilde{r}_i] = r_f + \gamma_1 \beta_{i1} + \cdots + \gamma_K \beta_{iK}$ (1)

where $\beta_{ik}$ is the beta or risk exposure on the k-th
factor, and $\gamma_k$ is the factor risk premium, for
k = 1, 2, ..., K.

Technically, the APT assumes a K-factor model for the
return-generating process, that is, the asset returns
are influenced by K factors in the economy via linear
regression equations,

$\tilde{r}_{it} - r_{ft} = \alpha_i + \beta_{i1}\tilde{f}_{1t} + \cdots + \beta_{iK}\tilde{f}_{Kt} + \tilde{\varepsilon}_{it}$ (2)

where $\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_K$ are the systematic
factors that affect all the asset returns on the
left-hand side, i = 1, 2, ..., N; and $\tilde{\varepsilon}_{it}$ is the
asset-specific risk. Note that we have placed a tilde
sign (~) over the random asset returns, factors, and
specific risks. By so doing, we distinguish between
factors (random) and their realizations (data), which
are important for understanding the estimation procedure
below.

Theoretically, under the assumption of no arbitrage, the
asset pricing relation of the APT as given by equation
(1) must be true as demonstrated by Ross. There are two
important points to note. First, the return-generating
process as given by equation (2) is fundamentally
different from the asset pricing relation. The
return-generating process is a statistical model used to
measure the risk exposures of the asset returns. It does
not require drawing any economic conclusion, nor does it
say anything about what the expected returns on the
assets should be. In other words, the $\alpha_i$'s in the
return-generating process can statistically be any
numbers. Only when the no-arbitrage assumption is
imposed can one claim the APT, which says that the
$\alpha_i$'s should be linearly related to their risk
exposures (betas).

Second, the APT does not provide any spe¬ 
cific information about what the factors are. Nor 
does the theory make any claims on the number 
of factors. It simply assumes that if the returns 
are driven by the factors, and if the smart in¬ 
vestors know the betas (via learning or estimat¬ 
ing), then an arbitrage portfolio, which requires 
no investment but yields a positive return, can 
be formed if the APT pricing relation is violated 
in the market. Hence, in equilibrium if there are 
no arbitrage opportunities, we should not ob¬ 
serve deviations from the APT pricing relation. 

TYPES OF FACTOR MODELS 

In this section we describe the different types of 
factor models. 

Known Factors 

The simplest case of factor models is where the K
factors are assumed known or observable, so that we have
time-series data on them. In this case, the K-factor
model for the return-generating process as given by
equation (2) is a multiple regression for each asset and
is a multivariate regression if all of the individual
regressions are pooled together. For example, if one
believes that the gross domestic product (GDP) is the
driving force for a group of stock returns, one would
have a one-factor model,

$\tilde{r}_{it} - r_{ft} = \alpha_i + \beta_{i1} GDP_t + \tilde{\varepsilon}_{it}$

The above equation corresponds to equation (2) with
K = 1 and $\tilde{f}_1$ = GDP. In practice, one can obtain
time-series data on both the asset returns and GDP, and
then one can estimate regressions to obtain all the
parameters, including in particular the expected
returns.

Another popular one-factor model is the market model
regression

$\tilde{r}_{it} - r_{ft} = \alpha_i + \beta_{i1}(\tilde{r}_{mt} - r_{ft}) + \tilde{\varepsilon}_{it}$

where $\tilde{r}_{mt}$ is the return on a stock market index.

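To understand the covariance matrix estimation, it will
be useful to write the K-factor model in matrix form,

$R_t = \alpha + \beta f_t + \varepsilon_t$

where

$R_t$ = an N-vector of asset excess returns
$\alpha$ = an N-vector of the alphas
$\beta$ = an N x K matrix of betas or factor loadings
$f_t$ = a K-vector of the factors
$\varepsilon_t$ = an N-vector of the model residuals.

For example, we can write a model with N = 3 assets and
K = 2 factors as

$\begin{pmatrix} \tilde{r}_{1t} - r_{ft} \\ \tilde{r}_{2t} - r_{ft} \\ \tilde{r}_{3t} - r_{ft} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} + \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \end{pmatrix} \begin{pmatrix} \tilde{f}_{1t} \\ \tilde{f}_{2t} \end{pmatrix} + \begin{pmatrix} \tilde{\varepsilon}_{1t} \\ \tilde{\varepsilon}_{2t} \\ \tilde{\varepsilon}_{3t} \end{pmatrix}$

Taking covariance on both sides of equation (2), we have
the return covariance matrix

$\Sigma = \beta' \Sigma_f \beta + \Sigma_\varepsilon$ (3)

where $\Sigma_f$ is the covariance matrix of the factors, and
$\Sigma_\varepsilon$ is the covariance matrix of the residuals.
$\Sigma_f$ can be estimated by using the sample covariance
matrix from the historical returns. This works for
$\Sigma_\varepsilon$ too if N is small relative to T. However,
when N is large relative to T, the sample covariance
matrix of the residuals will be poorly behaved.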


Usually an additional assumption that the residuals are
uncorrelated is imposed, so that $\Sigma_\varepsilon$ becomes a
diagonal matrix and can then be estimated by using the
sample variances of the residuals. Plugging in the
estimates of all the parameters into the right-hand side
of equation (3), we obtain the covariance matrix needed
for applying mean-variance portfolio analysis.
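The plug-in procedure can be sketched as follows,
assuming known factors and synthetic data. The function
name `factor_covariance`, the data dimensions, and the
simulated loadings are illustrative assumptions. In the
code the loadings are stored as an N x K array, so the
factor part of the covariance is computed as
`beta @ Sigma_f @ beta.T`.

```python
import numpy as np

def factor_covariance(R, F):
    """R: T x N excess returns, F: T x K factor realizations.
    Returns (alpha, beta, Sigma) with beta of shape N x K."""
    T, K = F.shape
    X = np.column_stack([np.ones(T), F])
    # One multivariate regression: each column of R is one asset.
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)  # (K + 1) x N
    alpha, beta = coef[0], coef[1:].T
    resid = R - X @ coef
    Sigma_f = np.atleast_2d(np.cov(F, rowvar=False))
    # Uncorrelated residuals assumed: keep only residual variances.
    D = np.diag(resid.var(axis=0, ddof=K + 1))
    Sigma = beta @ Sigma_f @ beta.T + D
    return alpha, beta, Sigma

# Synthetic example (assumption): N = 3 assets, K = 2 factors.
rng = np.random.default_rng(2)
T, N, K = 600, 3, 2
true_beta = np.array([[1.0, 0.5], [0.8, -0.2], [1.2, 0.3]])
F = 0.04 * rng.normal(size=(T, K))
R = 0.001 + F @ true_beta.T + 0.02 * rng.normal(size=(T, N))

alpha, beta, Sigma = factor_covariance(R, F)
```

The estimate needs only N x K betas, a K x K factor
covariance, and N residual variances, rather than all
N(N + 1)/2 entries of a sample covariance matrix.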

In the estimation of a multifactor model, it 
is implicitly assumed that the number of time 
series observations T is far greater than K, the 
number of factors. Otherwise, the regressions 
will perform poorly. For the case in which K is 
close to T, some special treatments are needed. 
This will be addressed later in this entry. 


Examples of Multifactor Models with Known 
Factors 

Before discussing latent factors, let's briefly 
describe four multifactor models where known 
factors are used: (1) the Fama-French three- 
factor model (Fama and French, 1993), (2) the 
MSCI Barra fundamental factor model, (3) the 
Burmeister-Ibbotson-Roll-Ross (BIRR) macro- 
economic factor model (Burmeister, Roll, and 
Ross, 1994), and (4) the Barclay Group Inc. factor 
model. The first three are equity factor models 
and the last is a bond factor model. 

The widely used Fama-French three-factor model is a
special case of equation (2) with K = 3,

$\tilde{r}_{it} - r_{ft} = \alpha_i + \beta_{im}(\tilde{r}_{mt} - r_{ft}) + \beta_{is} SMB_t + \beta_{ih} HML_t + \tilde{\varepsilon}_{it}$

where $\tilde{r}_{mt}$, as before, is the return on a stock
market index, and $SMB_t$ and $HML_t$ are two additional
factors. $SMB_t$ (small minus big) is defined as the
difference between the returns on diversified portfolios
of small and big stocks (where small and big are
measured in terms of stock market capitalization), and
$HML_t$ (high minus low) is defined as the difference
between the returns on diversified portfolios of high
and low book value-to-market value (B/M) stocks. Fama
and French introduced these factors to better capture
the systematic variation in average return for typical
portfolios than is possible with a stock market index
alone. These factors are supported by empirical studies
and are consistent with classifying stocks in terms of
growth and value.

Fundamental factor models use company and 
industry attributes and market data as "de¬ 
scriptors." Examples are price/earnings ratios, 
book/price ratios, estimated earnings growth, 
and trading activity. The estimation of a funda¬ 
mental factor model begins with an analysis of 
historical stock returns and descriptors about a 
company. In the MSCI Barra model, for exam¬ 
ple, the process of identifying the factors begins 
with monthly returns for hundreds of stocks 
that the descriptors must explain. Descriptors 
are not the "risk factors" but instead they are the
candidates for risk factors. The descriptors are 
selected in terms of their ability to explain stock 
returns. That is, all of the descriptors are po¬ 
tential risk factors but only those that appear 
to be important in explaining stock returns are 
used in constructing risk factors. Once the de¬ 
scriptors that are statistically significant in ex¬ 
plaining stock returns are identified, they are 
grouped into "risk indexes" to capture related 
company attributes. For example, descriptors 
such as market leverage, book leverage, debt- 
to-equity ratio, and company's debt rating are 
combined to obtain a risk index referred to as 
"leverage." Thus, a risk index is a combina¬ 
tion of descriptors that captures a particular at¬ 
tribute of a company. For example, in the MSCI 
Barra fundamental multifactor model, there are 
13 risk indices and 55 industry groups. The 
55 industry classifications are further classified 
into sectors. 

In a macroeconomic factor model, the inputs to the model
are historical stock returns and observable
macroeconomic variables. In the BIRR macroeconomic
multifactor model, the macroeconomic variables that have
been pervasive in explaining excess returns and which
are therefore included in the model are


• The business cycle: Changes in real output that 
are measured by percentage changes in the 
index of industrial production. 

• Interest rates: Changes in investors' expecta¬ 
tions about future interest rates that are mea¬ 
sured by changes in long-term government 
bond yields. 

• Investor confidence: Expectations about future 
business conditions as measured by changes 
in the yield spread between high- and low- 
grade corporate bonds. 

• Short-term inflation: Month-to-month jumps in 
commodity prices, such as gold or oil, as mea¬ 
sured by changes in the consumer price index. 

• Inflationary expectations: Changes in expecta¬ 
tions of inflation as measured by changes 
in the short-term, risk-free nominal interest 
rate. 

Additional variables, such as the real GDP 
growth and unemployment rates, are also 
among the macroeconomic factors used by asset 
managers in other macroeconomic multifactor 
models. Moreover, some asset managers also 
have identified technical variables, such as trad¬ 
ing volume and market liquidity, as factors. 

The Barclay Group Inc. (BGI) bond factor model
(previously the Lehman bond factor model) uses two
categories of systematic risk factors: term structure
factors and non-term structure risk factors. The former
include changes in the level of interest rates and
changes in the shape of the yield curve. The non-term
structure factors are sector risk, credit risk,
optionality risk, and a series of risks associated with
investing in mortgage-backed securities.

The search for factors is a never-ending task of 
asset managers. In practice, many popular in¬ 
vestment software packages use dozens of fac¬ 
tors. Some academic studies, such as Ludvigson 
and Ng (2007), use hundreds of them. 

Latent Factors 

While some applications use observed factors, some use
entirely latent factors, that is, they take the view
that the factors $f_t$ in the K-factor model,

$R_t = \alpha + \beta f_t + \varepsilon_t$

are not directly observable. An argument for the use of
latent factors is that the observed factors may be
measured with errors or have been already anticipated by
investors. Without imposing what $f_t$ are from our likely
incorrect belief, we can statistically estimate the
factors based on the factor model and data.

It is important to understand that in the field of
statistics, there is a statistical methodology known as
"factor analysis" and the model generated is referred to
as a "factor model." Factor models as used by
statisticians are statistical models that try to explain
complex phenomena through a small number of basic causes
or factors with the factors being latent. Factor models
as used by statisticians serve two main purposes: (1)
They reduce the dimensionality of models to make
estimation possible, and/or (2) they find the true
causes that drive data. In our discussion of multifactor
models, we are using the statistical tool of factor
analysis to try to determine the latent factors driving
asset returns.

While the estimation procedures for determining
the set of factors will be discussed in
the next section, it will be useful to know some
of the properties of the factor model here. The
first property is that the factors are not uniquely
defined in the model, but all sets of factors are
linear combinations of each other. This is because
if $f_t$ is a set of factors, then, for any K x K
invertible matrix A, we have

$$R_t = \alpha + \beta f_t + \varepsilon_t = \alpha + (\beta A^{-1})(A f_t) + \varepsilon_t \qquad (4)$$

which says that if $f_t$ with regression coefficients
$\beta$ (known as factor loadings in the context
of factor models) explains asset returns
well, so does $f_t^* = A f_t$ with loadings $\beta A^{-1}$. The
linear transformation of $f_t$, $f_t^*$, is also known as
a rotation of $f_t$.
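
The rotation property in equation (4) can be checked numerically. The following is a minimal sketch, assuming NumPy; the sizes, the random loadings, and the particular matrix A are illustrative choices that do not come from the entry. Loadings are stored with one row per asset and factors with one column per period.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 5, 2, 100                      # illustrative sizes (assumptions)

beta = rng.normal(size=(N, K))           # loadings, one row per asset
f = rng.normal(size=(K, T))              # factor realizations, one column per period
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])               # an invertible K x K matrix (determinant 2.5)

f_star = A @ f                           # rotated factors f*_t = A f_t
beta_star = beta @ np.linalg.inv(A)      # rotated loadings beta A^{-1}

# The systematic part beta f_t in equation (4) is unchanged by the rotation.
rotation_preserves_fit = np.allclose(beta @ f, beta_star @ f_star)
print(rotation_preserves_fit)
```

The factors themselves change under the rotation while the fitted returns do not, which is why latent factors can only be recovered up to a linear transformation.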

The second property is that we can assume all
the factors have zero mean (i.e., $E[f_t] = 0$). This
is because if $\mu_f = E[f_t]$, then the factor model
can be written as

$$R_t = \alpha + \beta f_t + \varepsilon_t = (\alpha - \beta\mu_f) + \beta(f_t - \mu_f) + \varepsilon_t \qquad (5)$$

If we rename $\alpha - \beta\mu_f$ as the new alphas, and
$f_t - \mu_f$ as the new factors, then the new factors
will have zero means, and the new factor model
is statistically the same as the old one. Hence,
without loss of generality, we will assume that
the means of the factors are zero in our estimation
in the next section.

Note that the return covariance matrix formula,
equation (3) or

$$\Sigma = \beta' \Sigma_f \beta + \Sigma_\varepsilon \qquad (6)$$

holds regardless of whether the factors are
observable or latent. However, through factor
rotation, we can make a new set of factors so as to
have the identity covariance matrix. In this case
with $\Sigma_f = I_K$, we say that the factor model is
standardized, and the covariance equation then
simply becomes

$$\Sigma = \beta'\beta + \Sigma_\varepsilon \qquad (7)$$

In general, $\Sigma_\varepsilon$ can have nonzero off-diagonal
elements, implying that the residuals are correlated.
If we assume that the residuals are uncorrelated,
then $\Sigma_\varepsilon$ becomes a diagonal matrix,
and the factor model is known as a strict factor
model. If we assume further that $\Sigma_\varepsilon$ has
equal diagonal elements, i.e., $\Sigma_\varepsilon = \sigma^2 I_N$ for
some $\sigma > 0$ with $I_N$ an N-dimensional identity matrix, then
the factor model is known as a normal factor
model.


Both Types of Factors 

Rather than taking the view of only observable
factors or only latent factors, we can consider
a more general factor model with both types of
factors,

$$R_t = \alpha + \beta f_t + \beta_g g_t + \varepsilon_t \qquad (8)$$

where $f_t$ is a K-vector of latent factors, $g_t$ is
an L-vector of observable factors, and $\beta_g$ are
the betas associated with $g_t$. This model makes
intuitive sense. If we believe a few fundamental
or macroeconomic factors are the driving forces,
they can be used to create the $g_t$ vector. Since
we may not account for all the possible factors,
we need to add K unknown factors, which are
to be estimated from the data.

The estimation of the factor model
given by equation (8) usually involves two
steps. In the first step, a regression of the asset
returns on the known factors is run in order
to obtain $\hat{\beta}_g$, an estimate of $\beta_g$. This allows us to
compute the residuals,

$$\hat{u}_t = R_t - \hat{\beta}_g g_t \qquad (9)$$

that is, the difference of the asset returns from
their fitted values by using the observed factors
for all the time periods. Then, in the second step,
a factor estimation approach is used to estimate
the latent factors for $u_t$,

$$u_t = \alpha + \beta f_t + \varepsilon_t \qquad (10)$$

where $u_t$ is the random difference whose realized
values are $\hat{u}_t$. The estimation method for
this model is the same as estimating a latent
factor model and will be detailed in the next
section. With the factor estimates, we can treat
the latent factors as known, and then use equation
(8) to determine the expected asset returns
and covariance matrix.
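
The two-step procedure for equation (8) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `two_step_factor_estimate` is hypothetical, the returns are assumed to be stored as a T x N array and the observable factors as a T x L array, and the second step uses the eigenvector (PCA) approach that the next section describes.

```python
import numpy as np

def two_step_factor_estimate(R, G, K):
    """Two-step estimate for R_t = alpha + beta f_t + beta_g g_t + eps_t.

    R : T x N array of asset returns (assumed layout)
    G : T x L array of observable factors
    K : number of latent factors to extract
    """
    T = R.shape[0]
    # Step 1: OLS of returns on an intercept and the observable factors.
    X = np.column_stack([np.ones(T), G])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)      # (1 + L) x N
    beta_g = coef[1:].T                               # N x L estimate of beta_g
    U = R - G @ beta_g.T                              # residuals of equation (9)
    # Step 2: PCA on the de-meaned residuals to recover K latent factors.
    Y = U - U.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(Y.T @ Y / T)      # eigenvalues in ascending order
    loadings = eigvec[:, ::-1][:, :K]                 # N x K, largest eigenvalues first
    F = Y @ loadings                                  # T x K estimated factor realizations
    return beta_g, F, loadings
```

The latent factors returned here are identified only up to a rotation, so they should be compared with any "true" factors through correlations or regressions rather than element by element.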

Predictive Factor Models 

An important feature of factor models is that
they use time t factors to explain time t returns.
This is to estimate the long-run risk exposures
of the assets, which are useful for both risk control
and portfolio construction. On the other
hand, portfolio managers are also very concerned
about time-varying expected returns.
In this case, they often use a predictive factor
model such as the following to forecast the returns,

$$R_{t+1} = \alpha + \beta f_t + \beta_g g_t + \varepsilon_t \qquad (11)$$

where as before $f_t$ and $g_t$ are the latent and
observable factors, respectively. The single difference
is that the earlier $R_t$ is now replaced by
$R_{t+1}$. Equation (11) uses time t factors to forecast
the future return $R_{t+1}$.

Computationally, the estimation of the predictive
factor model is the same as for estimating
the standard factor models. However, it should
be emphasized that the regression $R^2$, a measure
of model fitting, is usually very good in
the explanatory factor models. In contrast, if a
predictive factor model is used to forecast the
expected returns of various assets, the $R^2$ rarely
exceeds 2%. This simply reflects the fact that asset
returns are extremely difficult to predict in
the real world. For example, Rapach, Strauss,
Tu, and Zhou (2009) find that the $R^2$s are
mostly less than 1% when forecasting industry
returns using a variety of past economic variables
and past industry returns.


FACTOR MODEL ESTIMATION

In this section, we provide first a step-by-step
procedure for estimating the factor model
based on the popular and implementable
approach, principal components analysis (PCA),
to which a detailed and intuitive introduction
is provided in the last section of this entry. PCA
is a statistical tool that is used by statisticians to
determine factors with statistical learning techniques
when factors are not observable. That is,
given a variance-covariance matrix, a statistician
can determine factors using the technique
of PCA. Then, after learning the computational
procedure, we provide an application to identify
three factors for bond returns. Finally, we
outline some alternative procedures for estimating
the factor models and their extensions.

Computational Procedure

Because we are working with latent factor models,
we need to consider only how to estimate the latent
factors $f_t$ from the K-factor model,

$$Y_t = \beta f_t + \varepsilon_t \qquad (12)$$

where

$$E[f_t] = 0, \quad E[Y_t] = 0$$

This version of the factor model is obtained in
two steps. We first de-mean the factors $f_t$ so that
the alphas are the expected returns of the assets.
Second, we de-mean the asset returns. In
other words, we let $Y_t = R_t - \alpha$.

In practice, suppose that we have return data
on N risky assets over T time periods. Then the
realizations of the random variable $Y_t$ can be
summarized by a matrix,

$$Y = \begin{pmatrix} Y_{11} & Y_{21} & \cdots & Y_{N1} \\ \vdots & \vdots & & \vdots \\ Y_{1T} & Y_{2T} & \cdots & Y_{NT} \end{pmatrix} \qquad (13)$$

where each row contains the N asset returns at time t,
with their sample means subtracted, for
t = 1, 2, ..., T. Our task is to estimate the
(unobserved) realizations of the K factors, $f_t$, over
the T periods,

$$F = \begin{pmatrix} F_{11} & F_{21} & \cdots & F_{K1} \\ \vdots & \vdots & & \vdots \\ F_{1T} & F_{2T} & \cdots & F_{KT} \end{pmatrix} \qquad (14)$$

We will now apply the PCA estimation methodology.

There are two important cases, each of which 
calls for a different way of applying PCA. The 
first case is the one of traditional factor analysis 
in which N is treated as fixed, and T is allowed 
to grow. We will refer to this case as the "fixed 
N" below. The second case is when N is allowed 
to grow but T is either fixed or allowed to grow. 
We will refer to this case simply as "large N." 


Case 1: Fixed N 

In the case of fixed N, we have a relatively
small number of assets and a relatively large
sample size. Then the covariance matrix of the
asset returns, which is the same as the covariance
matrix of $Y_t$, can be estimated by the sample
covariance matrix,

$$\hat{\Psi} = \frac{Y'Y}{T} \qquad (15)$$

which is an N by N matrix since Y is T by N.
For example, if we think there are K (say K = 5)
factors, we can use standard software to compute
the first K eigenvectors of $\hat{\Psi}$ corresponding
to the first K largest eigenvalues of $\hat{\Psi}$, each of
which is an N vector. Let $\hat{\beta}$ be the N by K matrix
formed by these K eigenvectors. Then $\hat{\beta}$ will be
an estimate of $\beta$. Based on this, the factors are
estimated by

$$\hat{F}_t = Y_t \hat{\beta}, \quad t = 1, 2, \ldots, T \qquad (16)$$

where $Y_t$ is the t-th row of Y, and $\hat{F}_t$ is the estimate
of $F_t$, the t-th row of F. The $\hat{F}_{jt}$'s are the
estimated realizations of the first K factors. Seber
(1984) explains why the $\hat{F}_{jt}$'s are good estimates
of the true and unobserved factor realizations.
However, theoretically, they, though close, will
not necessarily converge to the true values as
T increases, unless the factor model is normal.
Nevertheless, despite this problem, this procedure
is widely used in practice.
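
The fixed-N procedure of equations (15) and (16) can be sketched in a few lines. This is a minimal illustration assuming NumPy and a T x N array of returns; the function name `pca_fixed_n` is hypothetical and does not come from the entry.

```python
import numpy as np

def pca_fixed_n(R, K):
    """Fixed-N PCA estimate of loadings and factors.

    R : T x N array of asset returns (assumed layout)
    K : number of factors
    """
    Y = R - R.mean(axis=0)                    # de-meaned returns, T x N
    T = Y.shape[0]
    psi_hat = Y.T @ Y / T                     # N x N sample covariance, equation (15)
    eigval, eigvec = np.linalg.eigh(psi_hat)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]          # re-order from largest to smallest
    beta_hat = eigvec[:, order[:K]]           # N x K matrix of the first K eigenvectors
    F_hat = Y @ beta_hat                      # T x K factor estimates, equation (16)
    return eigval[order], beta_hat, F_hat
```

A useful check is that the estimated factors are mutually uncorrelated in sample and that their sample variances equal the K largest eigenvalues.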


Case 2: Large N 

In the case of large N, we have a large number
of assets. We now form a new matrix based on
the product of Y with Y',

$$\Omega = \frac{YY'}{T} \qquad (17)$$

which is a T by T matrix since Y is T by N. Given
K, we use standard software to compute the
first K eigenvectors of $\Omega$ corresponding to the
first K largest eigenvalues of $\Omega$, each of which
is a T vector. Letting $\hat{F}$ be the T by K matrix
formed by these K eigenvectors, the PCA says
that $\hat{F}$ is an estimate of the true and unknown
factor realizations F of equation (14), up to a
linear transformation. Connor and Korajczyk
(1986) provided the first study in the finance
literature to apply the PCA as described above.
The method is also termed "asymptotic PCA"
since it allows the number of assets to increase
without bound. In contrast, traditional PCA
keeps N fixed, while allowing the number of
time periods, T, to be large.

Theoretically, if the true factor model is the
strict factor model or is not too different
from it (i.e., the residual correlations are not too
strong), Bai (2003) shows that $\hat{F}$ converges to F
up to a linear transformation when both T and
N increase without limit. The estimation errors
are of order the larger of $1/T$ or $1/\sqrt{N}$, and converge
to zero as both T and N grow to infinity.
However, when T is fixed, we need the stronger
assumption that the true factor model is
close to a normal model; then the estimation errors
are of order $1/\sqrt{N}$. Intuitively, at each
time t, given that there are only a few factors
pricing so many assets, we should have enough
information to back out the factors accurately.

Based on the estimated factors, the factor
loadings are easily estimated from equation
(12). For example, we can obtain the loadings
for each asset by estimating the standard ordinary
least squares (OLS) regression of the asset
returns on the factors. Mathematically, this is
equivalent to computing all the loadings from
the formula

$$\hat{\beta}' = (\hat{F}'\hat{F})^{-1}\hat{F}'Y \qquad (18)$$

Under the same conditions as above, $\hat{\beta}$ also converges
to $\beta$ up to a linear transformation.
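
The large-N ("asymptotic PCA") procedure of equations (17) and (18) can be sketched along the same lines. The function name `pca_large_n` and the T x N data layout are assumptions made for illustration.

```python
import numpy as np

def pca_large_n(R, K):
    """Asymptotic PCA: factors from the T x T matrix YY'/T, loadings by OLS.

    R : T x N array of asset returns (assumed layout)
    K : number of factors
    """
    Y = R - R.mean(axis=0)                       # de-meaned returns, T x N
    T = Y.shape[0]
    omega = Y @ Y.T / T                          # T x T matrix, equation (17)
    eigval, eigvec = np.linalg.eigh(omega)       # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:K]
    F_hat = eigvec[:, order]                     # T x K estimated factor realizations
    # OLS loadings, equation (18): beta' = (F'F)^{-1} F'Y, a K x N matrix.
    beta_hat_t = np.linalg.solve(F_hat.T @ F_hat, F_hat.T @ Y)
    return F_hat, beta_hat_t.T                   # factors (T x K), loadings (N x K)
```

Because the factors are recovered only up to a linear transformation, a natural way to judge the estimate on simulated data is to regress the true factors on the estimated ones and look at the fit, rather than to compare the two matrices entry by entry.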

The remaining question is how to determine
K. In practice, this may be determined by trial
and error depending on how different K's perform
in model fitting and in meeting the objectives
where the model is applied. From an
econometrics perspective, there is a simple solution
in Case 2. Bai and Ng (2002) provide a
statistical criterion

$$IC(K) = \log(V(K)) + K \left(\frac{N+T}{NT}\right) \log\left(\frac{NT}{N+T}\right) \qquad (19)$$

where

$$V(K) = \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \hat{\beta}_{i1}\hat{f}_{1t} - \hat{\beta}_{i2}\hat{f}_{2t} - \cdots - \hat{\beta}_{iK}\hat{f}_{Kt}\right)^2 \qquad (20)$$

For a given K, V(K) is the sum of the fitted
squared residual errors of the factor model
across both asset and time. This is a measure
of model fitting. The smaller the V(K), the better
the K-factor model in explaining the asset
returns. So we want to choose such a K that
minimizes V(K). However, the more the factors,
the smaller the V(K), but at a cost of estimating
more factors with greater estimation errors.
Hence, we want to penalize too many factors.
This is the same as the case in linear regressions
where we also want to penalize too many
regressors. The second term in equation (19)
plays this role. It is an increasing function of K.
Therefore, the trade-off between model fitting
and estimation errors requires us to minimize
the IC(K) function. Theoretically, assuming that
the factor model is indeed true for some fixed
K*, Bai and Ng show that the K that minimizes
IC(K) will converge to K* as either N or T or
both increase to infinity.

An Application to Bond Returns

To illustrate the procedure, consider an application
of the PCA factor analysis to the excess
returns on Treasury bonds with maturities 12,
18, 24, 30, 36, 42, 48, 54, 60, 120, and beyond 120
months. Hence, there are N = 11 assets. With
monthly data from January 1980 to December
2008, available from the Center for Research in
Security Prices of the University of Chicago's
Graduate School of Business, we have a sample
size of T = 348. Since N is small relative to T,
this is a case of the fixed N.

The sample covariance matrix of equation (15), $\hat{\Psi} = Y'Y/T$,
is an 11 by 11 matrix. We can easily compute its
eigenvalues and eigenvectors. The largest three
eigenvalues are

$$(\lambda_1, \lambda_2, \lambda_3) = 10^{-2} \times (0.2403,\ 0133,\ 0012)$$


whose sum is more than 99% of the sum of
all the eigenvalues. Thus, it is enough to consider
K = 3 factors and use the first three
eigenvectors, PCAs, as proxies for the factors.
Denote them as $F_1$, $F_2$, and $F_3$.

Table 1 Factor Loadings and Explanatory Power

Maturity      β1        β2        β3      R² (F1)   R² (F1 and F2)   R² (all three)
12 month     0.0671   -0.1418    0.4046    0.67         0.80             0.96
18 month     0.1118   -0.2057    0.4227    0.79         0.84             0.99
24 month     0.1524   -0.2455    0.3371    0.85         0.87             1.00
32 month     0.1932   -0.2876    0.3199    0.88         0.89             1.00
38 month     0.2269   -0.2851    0.2101    0.91         0.92             1.00
42 month     0.2523   -0.2621   -0.0813    0.94         0.94             0.99
48 month     0.2837   -0.2415   -0.2531    0.95         0.96             1.00
54 month     0.3072   -0.1920   -0.3762    0.97         0.97             1.00
60 month     0.3368   -0.1819   -0.3246    0.97         0.98             0.99
120 month    0.4038    0.0426   -0.1507    0.99         0.99             0.99
Over 120     0.5966    0.7173    0.2394    0.92         0.93             1.00

Consider now the regression of the 11 excess
bond returns on the three factors,

$$R_{it} = \alpha_i + \beta_{i1}F_{1t} + \beta_{i2}F_{2t} + \beta_{i3}F_{3t} + \varepsilon_{it}$$

where i = 1, 2, ..., 11. The regression $R^2$s of using
all the factors for each of the assets are reported
in the last column of Table 1. All but
one are 99% or above, confirming the eigenvalue
analysis that three factors are sufficient to
explain almost all the variations of the bond
returns. However, when only the first two are
used, the $R^2$s are smaller, but the minimum
is still over 80%. When only the first factor
is used, the $R^2$s range from 67% on the first
bond return to 99% on the 10th. Overall, the
PCA factors are effective in explaining the asset
returns.

The factor loadings or regression coefficients
on the factors are also reported in Table 1. It
is interesting that the loadings on the first factor
are all positive. This implies that a positive
realization of $F_1$ will have a positive effect on
the returns of all the bonds. It is, however, clear
that $F_1$ affects long-term bonds more than short-term
bonds. As an approximation, $F_1$ is usually
interpreted as a level effect or parallel effect
that roughly shifts the returns on bonds across
maturity.

The second factor, however, has a different
pattern from the first. A positive realization
of $F_2$ will have a negative effect on short-term
bonds and a positive effect on the long-term
ones. This is equivalent to an increase in
the slope of the bond returns across maturity
(known as the yield curve). Therefore, $F_2$ is commonly
identified as a steepness factor.

Finally, a positive realization of $F_3$ will have
a positive effect on both short- and long-term
bonds, but a negative effect on the intermediate
ones. Hence $F_3$ is usually interpreted as a curvature
factor. Litterman and Scheinkman (1991)
appear to have been among the first to apply
the PCA to study bond returns and to have
identified the above three factors. Although the
data we used here are different, the three factors
we computed share the same properties as
those identified by them.
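
The two ways of choosing K used in this section, the share of the eigenvalue sum from the application and the criterion of equation (19), can be sketched as follows. The function names `eigen_share` and `bai_ng_ic` are hypothetical, and the data layout (a T x N array) is an assumption. V(K) is computed as the plain sum of squared residuals, as in equation (20); dividing it by NT would shift every IC(K) by the same constant and leave the minimizing K unchanged.

```python
import numpy as np

def eigen_share(R):
    """Cumulative share of the eigenvalue sum explained by the first k eigenvalues."""
    Y = R - R.mean(axis=0)
    eigval = np.linalg.eigvalsh(Y.T @ Y / Y.shape[0])[::-1]   # largest first
    return np.cumsum(eigval) / eigval.sum()

def bai_ng_ic(R, K_max):
    """IC(K) of equation (19) for K = 1, ..., K_max, using large-N PCA factors."""
    Y = R - R.mean(axis=0)
    T, N = Y.shape
    _, eigvec = np.linalg.eigh(Y @ Y.T / T)
    eigvec = eigvec[:, ::-1]                                   # largest eigenvalues first
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ic = {}
    for K in range(1, K_max + 1):
        F = eigvec[:, :K]                     # orthonormal columns, so OLS fit is F F'Y
        resid = Y - F @ (F.T @ Y)
        V = np.sum(resid ** 2)                # equation (20)
        ic[K] = np.log(V) + K * penalty
    return ic
```

A typical use is `K_hat = min(ic, key=ic.get)` on the dictionary returned by `bai_ng_ic`, with the eigenvalue shares serving as an informal cross-check.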

Alternative Approaches and 
Extensions 

The standard statistical approach for estimating
the factor model is the maximum likelihood
(ML) method. Consider the factor model given
by equation (12) where $E[f_t] = 0$, $E[Y_t] = 0$.
The de-meaned returns and standardized factors
are usually assumed to have normal distributions.

In addition, the factors are usually standardized
so that $\Sigma_f = I_K$, and the residuals are assumed
uncorrelated so that $\Sigma_\varepsilon$ is diagonal. Then
the log likelihood function, as the log density
function of the returns, is

$$\log L(\beta, \Sigma_\varepsilon) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\beta'\beta + \Sigma_\varepsilon| - \frac{1}{2}\sum_{t=1}^{T} Y_t'(\beta'\beta + \Sigma_\varepsilon)^{-1}Y_t \qquad (21)$$

The ML estimators of the parameters $\beta$ and $\Sigma_\varepsilon$
are those values that maximize the log likelihood
function. Since $\beta$ enters into the function
in a complex nonlinear way, an analytical solution
to the maximization problem is very difficult
to obtain. Numerically, it is still difficult to
maximize $\log L(\beta, \Sigma_\varepsilon)$ directly.

There is, however, a data-augmentation technique
known as the expectation maximization
(EM) algorithm that can be applied (see
Lehmann and Modest, 1998). The EM algorithm
can be effective in numerically solving the earlier
maximization problem. The idea of the EM
algorithm is simple. The key difficulty here is
that the factors are unobserved. But conditional
on the parameters and the factor model, we can
learn them. Consider now that given the factors
$f_t$, the log likelihood function conditional on $f_t$
is

$$\log L_c(\beta, \Sigma_\varepsilon) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma_\varepsilon| - \frac{1}{2}\sum_{t=1}^{T}(Y_t - \beta f_t)'\Sigma_\varepsilon^{-1}(Y_t - \beta f_t) \qquad (22)$$

Because it is conditional on $f_t$, the factor
model is the usual linear regression. In other
words, integrating out $f_t$ from equation (22)
yields the unconditional $\log L(\beta, \Sigma_\varepsilon)$. The
beta estimates conditional on $f_t$ are straightforward.
They are the usual OLS regression
coefficients, and the estimates for $\Sigma_\varepsilon$ are the
residual variances.

On the other hand, conditional on the parameters,
we can learn the factors by using their conditional
expected values obtained easily from
their joint distribution with the returns. Hence,
we can have an iterative algorithm. Starting
from an initial guess of the factors, we maximize
the conditional likelihood function to obtain the
OLS $\beta$ and $\Sigma_\varepsilon$ estimates, which is the M-step of
the EM algorithm. Based on these estimates, we
update the estimate of $f_t$ using their expected
value. This is the EM algorithm's E-step. Using
the new $f_t$, we learn new estimates of $\beta$ and $\Sigma_\varepsilon$
in the M-step. With the new estimates, we can
again update the $f_t$. Iterating between the E- and
M-steps, the parameter estimates converge to the
unconditional ML estimate and the factor estimates
converge to the true ones.
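
The EM iteration can be sketched as follows for standardized normal factors and a diagonal residual covariance matrix. This is a sketch under assumptions, not the procedure of any cited paper: the function names are hypothetical, the loadings are stored as an N x K matrix B so that the implied covariance is BB' plus the diagonal residual matrix, the starting values come from PCA, and the E-step carries the conditional second moment of the factors in addition to their conditional mean, a detail of the textbook EM algorithm for factor analysis that the description above does not spell out.

```python
import numpy as np

def factor_loglik(S, B, psi, T):
    """Gaussian log likelihood given the sample covariance S = Y'Y/T."""
    N = S.shape[0]
    sigma = B @ B.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * T * (N * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(sigma, S)))

def em_factor_model(Y, K, n_iter=200):
    """EM for Y_t = B f_t + eps_t with f_t ~ N(0, I_K), eps_t ~ N(0, diag(psi)).

    Y : T x N array of de-meaned returns (assumed layout)
    """
    T, N = Y.shape
    S = Y.T @ Y / T
    eigval, eigvec = np.linalg.eigh(S)
    B = eigvec[:, ::-1][:, :K] * np.sqrt(eigval[::-1][:K])   # PCA starting loadings
    psi = np.diag(S).copy()                                   # starting residual variances
    loglik = []
    for _ in range(n_iter):
        # E-step: conditional mean and second moment of the factors.
        Bp = B / psi[:, None]                                 # Psi^{-1} B
        G = np.linalg.inv(np.eye(K) + B.T @ Bp)               # Var(f_t | Y_t)
        Ef = Y @ Bp @ G                                       # T x K, E[f_t | Y_t]
        Sff = T * G + Ef.T @ Ef                               # sum of E[f_t f_t' | Y_t]
        # M-step: regression of returns on the (expected) factors.
        Syf = Y.T @ Ef                                        # N x K
        B = np.linalg.solve(Sff, Syf.T).T
        psi = np.maximum(np.diag(S) - np.sum(B * Syf, axis=1) / T, 1e-8)
        loglik.append(factor_loglik(S, B, psi, T))
    return B, psi, Ef, loglik
```

The recorded log likelihood values should not decrease from one iteration to the next, which gives a simple diagnostic for an implementation of this kind.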

As an alternative to the ML method, Geweke
and Zhou (1996) propose a Bayesian approach,
which treats all parameters as random variables.
It works in a way similar to the EM algorithm.
Conditional on parameters, we learn
the factors, and conditional on the factors, we
learn the parameters. After iterating a few thousand
times, we learn the entire joint distribution
of the factors and parameters, which is
all we need in a factor model. The advantage
of the Bayesian approach is that it can incorporate
prior information and can provide exact
inference. In contrast, the ML method cannot
use any priors, nor can it obtain the exact standard
errors of both parameters and functions
of interest due to the complexity of the factor
model. Nardari and Scruggs (2007) extend
the Bayesian approach to allow a more general
model in which the covariance matrix can
vary over time and the APT restrictions can be
imposed.

Finally, we provide two important extensions
of the factor model that are useful in practice.
Note that the factor models we have discussed thus
far assume independently and identically distributed
returns and factors. These are known
as static factor models. The first extension is
dynamic factor models, which allow the factors
to evolve over time according to a vector
autoregression,

$$f_t = A_1 f_{t-1} + A_2 f_{t-2} + \cdots + A_m f_{t-m} + v_t \qquad (23)$$

where the A's are the regression coefficient matrices,
m is the order of the autoregression that
determines how far past factor realizations still
affect today's realizations, and $v_t$ is the residual.
In practice, many economic variables are
highly persistent, and hence it will be important
to incorporate this as above. (See Amengual and
Watson [2007] for a discussion of estimation for
dynamic factor models.)
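
As a simple illustration of equation (23), the coefficient matrix of a first-order autoregression (m = 1) can be estimated by OLS once factor realizations are available, for example from PCA. This sketch assumes zero-mean factors stored as a T x K array; the function name `fit_var1` is hypothetical, and full dynamic factor model estimation, as discussed in the reference above, involves more than this single regression.

```python
import numpy as np

def fit_var1(F):
    """OLS estimate of A in f_t = A f_{t-1} + v_t.

    F : T x K array of (estimated) zero-mean factor realizations
    """
    X, Z = F[:-1], F[1:]                           # lagged and current factors
    A_t, *_ = np.linalg.lstsq(X, Z, rcond=None)    # solves X A' = Z in least squares
    return A_t.T                                   # K x K coefficient matrix
```

Persistence in the factors shows up as diagonal elements of the estimated matrix that are close to one.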

The second extension is to allow the case with
a large number of factors. Consider our earlier
factor model

$$R_t = \alpha + \beta f_t + \beta_g g_t + \varepsilon_t \qquad (24)$$

where $f_t$ is a K vector of latent factors, and $g_t$ is an L
vector of observable factors. The problem now
is that L is large, about 100 or 200, for instance.
This requires at least a few hundred or more
time series observations for the regression of $R_t$
on $g_t$ to be well behaved, and this can cause a
problem due to the lack of long-term time series
data or due to concerns of stationarity. The idea
is to break $g_t$ into two sets, $g_{1t}$ and $g_{2t}$, with the
first having a few key variables and the second
having the rest. We then consider the modified
model

$$R_t = \alpha + \beta f_t + \beta_{g1} g_{1t} + \beta_h h_t + \varepsilon_t \qquad (25)$$

where $h_t$ also has only a few variables, which represent
a few major driving forces that summarize
the potentially hundreds of variables of $g_{2t}$ via
another factor model,

$$g_{2t} = B h_t + u_t \qquad (26)$$

where $u_t$ is the residual. This second factor
model provides a large dimension reduction
that transforms the hundreds of variables into
a few, which can be estimated by the PCA. In
the end, we have only a few factors in equation
(25), making the analysis feasible based
on the methods we discussed earlier. Ludvigson
and Ng (2007) appear to be the first to apply
such a model in finance. They find that
the model can effectively incorporate a few
hundred variables so as to make a significant
difference in understanding stock market
predictability.
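
The dimension reduction of equations (25) and (26) can be sketched as follows. This is an illustration under assumptions: the function name `reduce_and_regress` is hypothetical, the arrays are laid out with one row per period, the summary factors are extracted by PCA on the T x T matrix as in Case 2, and the latent factors $f_t$ of equation (25) are left out of this sketch (they could be estimated from the regression residuals as in the two-step procedure described earlier).

```python
import numpy as np

def reduce_and_regress(R, G1, G2, n_h):
    """Summarize many observable variables by a few PCA factors, then regress.

    R   : T x N array of asset returns
    G1  : T x L1 array of a few key observable factors
    G2  : T x L2 array of the many remaining observable variables
    n_h : number of summary factors h_t to extract from G2
    """
    T = R.shape[0]
    Z = G2 - G2.mean(axis=0)
    _, eigvec = np.linalg.eigh(Z @ Z.T / T)           # T x T matrix, ascending eigenvalues
    H = eigvec[:, ::-1][:, :n_h] * np.sqrt(T)         # T x n_h, scaled so that H'H/T = I
    X = np.column_stack([np.ones(T), G1, H])          # intercept, key factors, summaries
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)      # (1 + L1 + n_h) x N coefficients
    return H, coef
```

The regression now involves only a handful of regressors, regardless of how many variables are contained in the second set.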


USE OF PRINCIPAL 
COMPONENTS ANALYSIS 

Principal components analysis (PCA) is a 
widely used tool in finance. It is useful not only 
for estimating factor models as explained in this 
entry, but also for extracting a few driving vari¬ 
ables in general out of many for the covariance 
matrix of asset returns. Hence, it is important 
to understand the statistical intuition behind it. 
To this end, we provide a simple introduction 
to it in the last section of the entry. 

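Perhaps the best way to understand the PCA
is to go through an example in detail. Suppose
there are two risky assets, whose returns are
denoted by $\tilde{r}_1$ and $\tilde{r}_2$, with covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 2.05 & 1.95 \\ 1.95 & 2.05 \end{bmatrix}$$

That is, we assume that they have the same
variances of 2.05 and covariance of 1.95. Our
objective is to find a linear combination of the
two assets so that it has a large component in the
covariance matrix, which will be clear below.
For notational brevity, we assume first that the
expected returns are zeros; that is,

$$E[\tilde{r}_1] = 0, \quad E[\tilde{r}_2] = 0$$

and will relax this assumption later.

Recall from linear algebra that we call any
vector $(a_1, a_2)'$ satisfying

$$\Sigma \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \lambda \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$$

an eigenvector of $\Sigma$, and the associated $\lambda$ the
eigenvalue. In our example here, it is easy to
verify that

$$\begin{bmatrix} 2.05 & 1.95 \\ 1.95 & 2.05 \end{bmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 4 \times \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{bmatrix} 2.05 & 1.95 \\ 1.95 & 2.05 \end{bmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 0.1 \times \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

so 4 and 0.1 are the eigenvalues, and (1, 1)' and
(1, -1)' are the eigenvectors.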
In practice, computer software is available to
compute the eigenvalues and eigenvectors of
any covariance matrix. The mathematical result
is that for a covariance matrix of N assets, there
are exactly N different eigenvectors and N associated
positive eigenvalues (these eigenvalues
can be equal in some cases). Moreover, the
eigenvectors are orthogonal to each other; that
is, their inner product or vector product is zero.
In our example, it is clear that

$$(1, 1)\,(1, -1)' = 1 - 1 = 0$$

It should be noted that the eigenvalue associated
with each eigenvector is unique, but any
scale of the eigenvector remains an eigenvector.
In our example, it is obvious that a double
of the first eigenvector, (2, 2)', is also an eigenvector.
However, the eigenvectors will be
unique if we standardize them, making the sum
of the squared elements 1. In our example,

$$A_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$$

are the standardized eigenvectors, which are
obtained by scaling the earlier eigenvectors by
$1/\sqrt{2}$. These are indeed standardized, since

$$A_1'A_1 = (1/\sqrt{2})^2 + (1/\sqrt{2})^2 = 1$$
$$A_2'A_2 = (1/\sqrt{2})^2 + (-1/\sqrt{2})^2 = 1$$

Now let us consider two linear combinations
(or portfolios without imposing the weights
summing to 1) of the two assets whose returns
are $\tilde{r}_1$ and $\tilde{r}_2$,

$$P_1 = \frac{1}{\sqrt{2}}\tilde{r}_1 + \frac{1}{\sqrt{2}}\tilde{r}_2 = A_1'R$$
$$P_2 = \frac{1}{\sqrt{2}}\tilde{r}_1 - \frac{1}{\sqrt{2}}\tilde{r}_2 = A_2'R$$

where $R = (\tilde{r}_1, \tilde{r}_2)'$. Both $P_1$ and $P_2$ are called
the principal components (PCs). There are three
important and interesting mathematical facts
about the PCs.

• Fact 1. The variances of the PCs are exactly
equal to the eigenvalues corresponding to the
eigenvectors used to form the PCs.

That is,

$$Var(P_1) = 4$$
$$Var(P_2) = 0.1$$

Note that the two PCs are random variables
since they are linear combinations of random
returns. So, their variances are well defined.
The equalities to the eigenvalues can be verified
directly.

• Fact 2. The returns can also be written as linear
combinations of the PCs.

The PCs are defined as linear combinations of
the returns. Inverting them, the returns are linear
functions of the PCs, too. Mathematically, with
$A = [A_1, A_2]$ and $P = (P_1, P_2)'$, we have
$P = A'R$, and so $R = (A')^{-1}P$. Since A is orthogonal,
$(A')^{-1} = A$, thus $R = AP$. That is, we have

$$\tilde{r}_1 = \frac{1}{\sqrt{2}}P_1 + \frac{1}{\sqrt{2}}P_2$$
$$\tilde{r}_2 = \frac{1}{\sqrt{2}}P_1 - \frac{1}{\sqrt{2}}P_2 \qquad (26)$$

• Fact 3. The asset return covariance matrix can
be decomposed as the sum of the products of
eigenvalues with the cross products of eigenvectors.

Mathematically, it is known that

$$\Sigma = [A_1, A_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} [A_1, A_2]' = \lambda_1 A_1 A_1' + \lambda_2 A_2 A_2' = 4 A_1 A_1' + 0.1 A_2 A_2'$$

which is also easy to verify in our example. The
economic interpretation is that the total risk
profile of the two assets, as captured by their
covariance matrix, is a sum of two components.
The first component is determined by the first
PC, and the second is determined by the second
PC. In other words, in the return linear combinations,
equation (26), if we ignore $P_2$, we will
get only $\lambda_1 A_1 A_1'$, the first component in the covariance
matrix decomposition, and only the
second if we ignore $P_1$. We obtain the entire $\Sigma$
if we ignore neither.

The purpose of the PCA is finally clear. Since
4 is 40 times as big as 0.1, the second component
in the $\Sigma$ decomposition has little impact,
and hence may be ignored. Then, ignoring $P_2$,
we can write the returns simply as, based on
equation (26),

$$\tilde{r}_1 \approx (1/\sqrt{2})P_1$$
$$\tilde{r}_2 \approx (1/\sqrt{2})P_1$$

This says that we can reduce the analysis of
$\tilde{r}_1$ and $\tilde{r}_2$ to analyzing simple functions of $P_1$.
In this example, the result tells us that the two
assets are almost the same. In practice, there
may be hundreds of assets. By using PCA, we
can reduce the dimensionality of the problem
substantially to an analysis of perhaps a few,
say five, PCs.
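
The two-asset example can be reproduced numerically. The following minimal sketch assumes NumPy and uses only the covariance matrix given in the example; the variable names are illustrative.

```python
import numpy as np

sigma = np.array([[2.05, 1.95],
                  [1.95, 2.05]])

eigval, eigvec = np.linalg.eigh(sigma)    # eigh returns eigenvalues in ascending order
lam = eigval[::-1]                        # largest eigenvalue first
A = eigvec[:, ::-1]                       # standardized eigenvectors as columns

# Fact 1: the variances of the PCs, A_i' Sigma A_i, equal the eigenvalues.
pc_var = np.diag(A.T @ sigma @ A)

# Fact 3: Sigma is the sum of eigenvalue-weighted outer products of eigenvectors.
first_component = lam[0] * np.outer(A[:, 0], A[:, 0])
second_component = lam[1] * np.outer(A[:, 1], A[:, 1])
rebuilt = first_component + second_component

print(lam)               # eigenvalues of the example
print(first_component)   # the part of Sigma kept when P_2 is ignored
```

The sign of each computed eigenvector is arbitrary, which is why the check relies on outer products and variances, both of which are unaffected by a sign flip.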

In general, when there are N assets with
return $R = (\tilde{r}_1, \ldots, \tilde{r}_N)'$, computer software can
be used to obtain the N eigenvalues and N
standardized eigenvectors. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \geq 0$
be the N eigenvalues in decreasing order,
and $A_i = (a_{1i}, a_{2i}, \ldots, a_{Ni})'$ be the standardized
eigenvector associated with $\lambda_i$, and A be an
N x N matrix formed by all the eigenvectors.
Then, the i-th PC is defined as $P_i = A_i'R$,
all of which can be computed in matrix form,

$$\begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_N \end{pmatrix} = \begin{pmatrix} A_1'R \\ A_2'R \\ \vdots \\ A_N'R \end{pmatrix} = A'R \qquad (27)$$

The decomposition for $\Sigma$ is

$$\Sigma = [A_1, \ldots, A_N] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_N \end{bmatrix} [A_1, \ldots, A_N]' = \lambda_1 A_1 A_1' + \lambda_2 A_2 A_2' + \cdots + \lambda_N A_N A_N'$$

It is usually the case that, for some K, the first
K eigenvalues are large, and the rest are too
small and can then be ignored. In such situations,
based on the first K PCs, we can approximate
the asset returns by

$$\tilde{r}_1 \approx a_{11}P_1 + a_{12}P_2 + \cdots + a_{1K}P_K,$$
$$\tilde{r}_2 \approx a_{21}P_1 + a_{22}P_2 + \cdots + a_{2K}P_K, \qquad (28)$$
$$\vdots$$
$$\tilde{r}_N \approx a_{N1}P_1 + a_{N2}P_2 + \cdots + a_{NK}P_K$$

In most studies, the K PCs may be interpreted
as K factors that (approximately) drive the
movements of all the N returns. Our earlier example
is a case with K = 1 and N = 2.

In the above PCA discussion, the expected
returns of the assets are assumed to be zero.
If they are nonzero and given by a vector
$(\mu_1, \mu_2, \ldots, \mu_N)'$, $\Sigma$ will remain the same, and
so will the eigenvalues and eigenvectors. However,
in this case we need to replace all the $\tilde{r}_i$'s
in equation (27) by $\tilde{r}_i - \mu_i$ and add the $\mu_i$'s on the
right-hand side of equation (28). The interpretation
will be, of course, the same as before.

In Case 1 of the factor model estimation (i.e., 
known or observable factors) discussed in the 
entry, the K PCs clearly provide a good approx¬ 
imation of the first K factors since they explain 
the asset variations the most given K. More¬ 
over, in either Case 1 or Case 2 (latent factors), 
the PCA is equivalent to minimizing the model 
errors, as given by equation (20), by choosing 
both the loadings and factors, and hence the so¬ 
lution should be close to the true factors and 
loadings. 


KEY POINTS 

• The arbitrage pricing theory is a general mul¬ 
tifactor model for pricing assets. The theory 
does not provide any specific information 
about what the factors are. Moreover, the APT 
does not make any claims on the number of 
factors either. 

• The APT asserts that only taking the system¬ 
atic risks is rewarded. 

• The APT simply assumes that if the returns
are driven by the factors, and if investors
know the betas for the factors, then an arbitrage
portfolio, which requires no investment
but yields a positive return, can be formed if
the APT pricing relation is violated in the market.
In equilibrium, therefore, if there are no
arbitrage opportunities, deviations from the
APT pricing relation should not be observed.

• In practice, factor models are widely used as a
tool for estimating expected asset returns and
their covariance matrix. The reason is that if
investors can identify the factors that drive
asset returns, they will have much better estimates
of the true expected asset returns and
the covariance matrix, and hence will be able
to form a much better portfolio than otherwise
possible.

• Factor model estimation depends crucially on
(1) whether the factors are identified (known)
or unidentified (latent), and (2) the sample
size and the number of assets. Furthermore,
factor models can be used not only for explaining
asset returns, but also for predicting
future returns.

• The simplest case of factor models is where
the factors are assumed to be known or observable,
so that time-series data on those
factors can be used to estimate the model.

• In practice there are three commonly used equity
multifactor models where known factors
are used: (1) the Fama-French three-factor
model, (2) the MSCI Barra fundamental factor
model, and (3) the Burmeister-Ibbotson-Roll-Ross
macroeconomic factor model. Fundamental
factor models use company and
industry attributes and market data as descriptors.
In a macroeconomic factor model,
the inputs to the model are historical stock
returns and observable macroeconomic
variables.

• An argument for the use of latent factors is
that the observed factors may be measured
with errors or have been already anticipated
by investors. Without imposing what the factors
are from likely incorrect beliefs, asset
managers can statistically estimate the factors
based on the factor model and data.

• Two important extensions of the static factor
model used in practice are (1) dynamic factor
models, which allow the factors to evolve
over time according to a vector autoregression,
and (2) allowance for a large number
of factors. This second factor model provides
a large dimension reduction that transforms
the hundreds of variables into a few, which
can be estimated by principal components
analysis.

• Principal components analysis is a sim¬ 
ple statistical approach that can be ap¬ 
plied to estimate a factor model easily and 
effectively. 
Abstract: In investment management, multifactor risk modeling is the most common application
of financial modeling. Multifactor risk models, or simply factor models, are linear regressions 
over a number of variables called factors. Factors can be exogenous variables or abstract variables 
formed by portfolios. Exogenous factors (or known factors) can be identified from traditional 
fundamental analysis or economic theory from macroeconomic factors. Abstract factors, also called 
unidentified or latent factors, can be determined with factor analysis or principal component 
analysis. Principal component analysis identifies the largest eigenvalues of the variance-covariance 
matrix or the correlation matrix. The largest eigenvalues correspond to eigenvectors that identify 
the entire market and sectors that correspond to industry classification. Factor analysis can be used 
to identify the structure of the latent factors. 


Principal component analysis (PCA) and factor 
analysis are statistical tools that allow a mod¬ 
eler to (1) reduce the number of variables in a 
model (i.e., to reduce the dimensionality), and 
(2) identify if there is structure in the relation¬ 
ships between variables (i.e., to classify vari¬ 
ables). In this entry, we explain PCA and factor 
analysis. We illustrate and compare both tech¬ 
niques using a sample of stocks. Because of its 
use in the estimation of factor models, we begin 
with a brief discussion of factor models. 


FACTOR MODELS 

Factor models are statistical models that try to 
explain complex phenomena through a small 
number of basic causes or factors. Factor models 
serve two main purposes: (1) They reduce the 
dimensionality of models to make estimation 
possible; and/or (2) they find the true causes 
that drive data. Factor models were introduced 
by Charles Spearman (1904), a leading psychol¬ 
ogist who developed many concepts of modern 
psychometrics. 
Spearman was particularly interested in
understanding how to measure human intel¬ 
lectual abilities. In his endeavor to do so, he 
developed the first factor model, known as 
the Spearman model, a model that explains 
intellectual abilities through one common fac¬ 
tor, the famous "general intelligence" g factor, 
plus another factor s, which is specific to each 
distinct ability. Spearman was persuaded that 
the factor g had an overwhelming importance. 
That is, he thought that any mental ability can 
be explained quantitatively through a common 
intelligence factor. According to this theory, 
outstanding achievements of, say, a painter, a 
novelist, and a scientist can all be ascribed to a 
common general intelligence factor plus a small 
contribution from specific factors. 

Some 30 years later, Louis Leon Thurstone 
(1938) developed the first true multifactor 
model of intelligence. Thurstone was among 
the first to propose and demonstrate that there 
are numerous ways in which a person can be 
intelligent. Thurstone's multiple-factors theory 
identified seven primary mental abilities. 

One might question whether factors are only 
statistical artifacts or if they actually correspond 
to any reality. In the modern operational inter¬ 
pretation of science, a classification or a factor is 
"real" if we can make useful predictions using 
that classification. For example, if the Spearman 
theory is correct, we can predict that a highly in¬ 
telligent person can obtain outstanding results 
in any field. Thus, a novelist could have ob¬ 
tained outstanding results in science. However, 
if many distinct mental factors are needed, peo¬ 
ple might be able to achieve great results in 
some field but be unable to excel in others. 

In the early applications of factor models to 
psychometrics, the statistical model was essen¬ 
tially a conditional multivariate distribution. 
The raw data were large samples of psycho¬ 
metric tests. The objective was to explain these 
tests as probability distributions conditional on 
the value of one or more factors. In this way, 
one can make predictions of, for example, the 


future success of young individuals in different 
activities. 

In finance, factor models are typically applied 
to time series. The objective is to explain the 
behavior of a large number of stochastic pro¬ 
cesses, typically price, returns, or rate processes, 
in terms of a small number of factors. These 
factors are themselves stochastic processes. In 
order to simplify both modeling and estima¬ 
tion, most factor models employed in financial 
econometrics are static models. This means that 
time series are assumed to be sequences of tem¬ 
porally independent and identically distributed 
(IID) random variables so that the series can be 
thought of as independent samples extracted 
from one common distribution. 

In financial econometrics, factor models are 
needed not only to explain data but to make 
estimation feasible. Given the large number 
of stocks presently available—in excess of 
15,000—the estimation of correlations cannot 
be performed without simplifications. Widely 
used ensembles such as the S&P 500 or the 
MSCI Europe include hundreds of stocks and 
therefore hundreds of thousands of individual 
correlations. Available samples are insufficient 
to estimate this large number of correlations. 
Hence factor models are able to explain all pair¬ 
wise correlations in terms of a much smaller 
number of correlations between factors. 
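
The dimensionality argument above can be made concrete with a simple count. The sketch below compares the number of distinct pairwise correlations among N series with the number of covariance parameters in a strict factor model. The choice of K = 5 factors is purely illustrative and is not taken from the text.

```python
def pairwise_correlations(n_assets: int) -> int:
    """Number of distinct pairwise correlations among n_assets series."""
    return n_assets * (n_assets - 1) // 2


def strict_factor_parameters(n_assets: int, n_factors: int) -> int:
    """Covariance parameters of a strict factor model.

    Counts the N x K loadings, the K(K+1)/2 distinct entries of the
    factor covariance matrix, and the N idiosyncratic variances.
    """
    loadings = n_assets * n_factors
    factor_cov = n_factors * (n_factors + 1) // 2
    noise_variances = n_assets
    return loadings + factor_cov + noise_variances


n, k = 500, 5  # k = 5 is an arbitrary illustrative choice
print(pairwise_correlations(n))        # 124750
print(strict_factor_parameters(n, k))  # 3015
```

For an index of 500 stocks, a handful of factors replaces well over a hundred thousand correlations with a few thousand parameters.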

Linear Factor Models Equations 

Linear factor models are regression models of the 
following type: 

$$X_i = \alpha_i + \sum_{j=1}^{K} \beta_{ij} f_j + \varepsilon_i$$

where

$X_i$ = a set of N random variables
$f_j$ = a set of K common factors
$\varepsilon_i$ = the noise terms associated with each
variable $X_i$

The $\beta_{ij}$'s are called the factor loadings or factor
sensitivities; they express the influence of the
j-th factor on the i-th variable.

In this formulation, factor models are essen¬ 
tially static models, where the variables and the 
factors are random variables without any ex¬ 
plicit dependence on time. It is possible to add 
a dynamic to both the variables and the factors, 
but that is beyond the scope of our basic intro¬ 
duction in this entry. 

As mentioned above, one of the key objectives
of factor models is to reduce the dimensionality
of the covariance matrix so that the covariances
between the variables $X_i$ are determined only
by the covariances between factors. Suppose
that the noise terms are mutually uncorrelated,
so that

$$E(\varepsilon_i \varepsilon_j) = \begin{cases} 0, & i \neq j \\ \psi_i^2, & i = j \end{cases}$$

and that the noise terms are uncorrelated with
the factors, that is, $E(\varepsilon_i f_j) = 0, \ \forall i, j$. Suppose also
that both factors and noise terms have a zero
mean, so that $E(X_i) = \alpha_i$. Factor models that
respect the above constraints are called strict
factor models.

Let's compute the covariances of a strict factor
model:

$$E\big((X_i - \alpha_i)(X_j - \alpha_j)\big) = E\Big[\Big(\sum_s \beta_{is} f_s + \varepsilon_i\Big)\Big(\sum_t \beta_{jt} f_t + \varepsilon_j\Big)\Big]$$

$$= E\Big[\Big(\sum_s \beta_{is} f_s\Big)\Big(\sum_t \beta_{jt} f_t\Big)\Big] + E\Big[\Big(\sum_s \beta_{is} f_s\Big)\varepsilon_j\Big] + E\Big[\varepsilon_i\Big(\sum_t \beta_{jt} f_t\Big)\Big] + E(\varepsilon_i \varepsilon_j)$$

$$= \sum_{s,t} \beta_{is} E(f_s f_t) \beta_{jt} + E(\varepsilon_i \varepsilon_j)$$

From this expression we can see that the
variances and covariances between the variables $X_i$
depend only on the covariances between the
factors and the variances of the noise term.

We can express the above compactly in matrix
form. Let's write a factor model in matrix form
as follows:

$$X = \alpha + \beta f + \varepsilon$$

where

$X = (X_1, \ldots, X_N)'$ = the N-vector of variables
$\alpha = (\alpha_1, \ldots, \alpha_N)'$ = the N-vector of means
$\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)'$ = the N-vector of idiosyncratic noise terms
$f = (f_1, \ldots, f_K)'$ = the K-vector of factors

$$\beta = \begin{bmatrix} \beta_{11} & \cdots & \beta_{1K} \\ \vdots & \ddots & \vdots \\ \beta_{N1} & \cdots & \beta_{NK} \end{bmatrix}$$

= the N x K matrix of factor loadings.

Let's define the following:

$\Sigma$ = the N x N variance-covariance matrix
of the variables X

$\Omega$ = the K x K variance-covariance matrix
of the factors

$\Psi$ = the N x N variance-covariance matrix of
the error terms $\varepsilon$

If we assume that our model is a strict factor
model, the matrix $\Psi$ will be a diagonal matrix
with the noise variances on the diagonal, that
is,

$$\Psi = \begin{pmatrix} \psi_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \psi_N^2 \end{pmatrix}$$

Under the above assumptions, we can express
the variance-covariance matrix of the variables
in the following way:

$$\Sigma = \beta \Omega \beta' + \Psi$$
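
The covariance decomposition of a strict factor model can be checked by simulation. The following is a minimal sketch, not a definitive implementation. It assumes normally distributed factors and noise, and the sizes N, K, T and all parameter values are arbitrary illustrative choices. It simulates the model and compares the sample covariance matrix with the matrix implied by the loadings, the factor covariance, and the diagonal noise covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 6, 2, 200_000          # illustrative sizes, not from the text

alpha = rng.normal(size=N)       # means of the variables
beta = rng.normal(size=(N, K))   # factor loadings
A = rng.normal(size=(K, K))
omega = A @ A.T + np.eye(K)      # a positive-definite factor covariance
psi = np.diag(rng.uniform(0.1, 0.5, size=N))   # diagonal noise covariance

# Simulate a strict factor model: zero-mean factors, uncorrelated noise.
f = rng.multivariate_normal(np.zeros(K), omega, size=T)    # T x K
eps = rng.normal(size=(T, N)) * np.sqrt(np.diag(psi))      # T x N
X = alpha + f @ beta.T + eps

sigma_model = beta @ omega @ beta.T + psi    # covariance implied by the model
sigma_sample = np.cov(X, rowvar=False)       # covariance estimated from data
max_abs_error = np.max(np.abs(sigma_sample - sigma_model))
print(max_abs_error)
```

The discrepancy is sampling error only and shrinks as T grows.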


In practice, the assumption of a strict factor 
model might be too restrictive. In applied work, 
factor models will often be approximate factor 
models. (See, for example, Bai, 2003.) Approx¬ 
imate factor models allow idiosyncratic terms 
to be weakly correlated among themselves and
with the factors. 

As many different factor models have been 
proposed for explaining stock returns, an im¬ 
portant question is whether a factor model is 
fully determined by the observed time series. 
In a strict factor model, factors are determined 
up to a nonsingular linear transformation. In 
fact, the above matrix notation makes it clear 
that the factors, which are hidden, nonobserv¬ 
able variables, are not fully determined by the 
above factor model. That is, an estimation pro¬ 
cedure cannot univocally determine the hidden 
factors and the factor loadings from the observ¬ 
able variables X. In fact, suppose that we mul¬ 
tiply the factors by any nonsingular matrix R. 
We obtain other factors 

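$$g = Rf$$

with a covariance matrix

$$\Omega_g = R \Omega R'$$

and we can write a new factor model:

$$X = \alpha + \beta f + \varepsilon = \alpha + \beta R^{-1} R f + \varepsilon = \alpha + \beta_g g + \varepsilon, \qquad \beta_g = \beta R^{-1}$$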

In order to solve this indeterminacy, we can 
always choose the matrix R so that the factors g 
are a set of orthonormal variables, that is, uncor¬ 
related variables (the orthogonality condition) 
with unit variance (the normality condition). 
In order to make the model uniquely identifi¬ 
able, we can stipulate that factors must be a set 
of orthonormal variables and that, in addition, 
the matrix of factor loadings is diagonal. Un¬ 
der this additional assumption, a strict factor 
model is called a normal factor model. Note
explicitly that under this assumption, factors are
simply a set of standardized uncorrelated
variables. The model is still undetermined under
rotation, that is, multiplication by any orthogonal
matrix R, i.e., any matrix such that RR' = I.
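
The indeterminacy discussed above can be verified numerically. The sketch below uses arbitrary illustrative matrices (none of the numbers come from the text). It shows that transforming the factors by a nonsingular matrix R and the loadings by the inverse of R leaves the systematic part and its covariance unchanged. It also shows one possible choice of R, built from a Cholesky factorization, that makes the transformed factors orthonormal.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 2
beta = rng.normal(size=(N, K))                # loadings (illustrative)
omega = np.array([[1.0, 0.3], [0.3, 2.0]])    # factor covariance (illustrative)
R = np.array([[2.0, 1.0], [0.5, 3.0]])        # any nonsingular K x K matrix

beta_g = beta @ np.linalg.inv(R)   # loadings on the transformed factors
omega_g = R @ omega @ R.T          # covariance of g = R f

f = rng.normal(size=K)
g = R @ f
same_systematic_part = np.allclose(beta @ f, beta_g @ g)
same_covariance = np.allclose(beta @ omega @ beta.T,
                              beta_g @ omega_g @ beta_g.T)

# One way to obtain orthonormal factors: with omega = L L' (Cholesky),
# R_orth = inverse of L gives a transformed covariance equal to the identity.
L = np.linalg.cholesky(omega)
R_orth = np.linalg.inv(L)
factors_orthonormal = np.allclose(R_orth @ omega @ R_orth.T, np.eye(K))

print(same_systematic_part, same_covariance, factors_orthonormal)
```

Because both models reproduce the same observable covariance, data on X alone cannot distinguish between them.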

In summary, a set of variables has a normal 
factor representation if it is represented by the 
following factor model: 

$$X = \alpha + \beta f + \varepsilon$$


where factors are orthonormal variables and
noise terms are such that the covariance matrix
can be represented as follows:

$$\Sigma = \beta \beta' + \Psi$$

where $\beta$ is the diagonal matrix of factor loadings
and $\Psi$ is a diagonal matrix.

How can we explain the variety of factor models
proposed given that a strict factor model
could be uniquely identified up to a linear
transformation of the factors? As mentioned, the
assumptions underlying strict factor models are often
too restrictive and approximate factor models 
have to be adopted. Approximate factor mod¬ 
els are uniquely identifiable only in the limit of 
an infinite number of series. The level of ap¬ 
proximation is implicit in practical models of 
returns. 


Types of Factors and Their 
Estimation 

In financial econometrics, the factors used in 
factor models can belong to three different cat¬ 
egories: macroeconomic factors, fundamental 
factors, and statistical factors. The first two are 
factor models that deal with known factors and 
will not be discussed here. 

Note that factors defined through statistical 
analysis are linear combinations of the vari¬ 
ables. That is, if the variables are asset returns, 
factors are portfolios of assets. They are hid¬ 
den variables insofar as one does not know the 
weights of the linear combinations. However, 
once the estimation process is completed, statis¬ 
tical factors are always linear combinations of 
variables. If data have a strict factor structure, 
we can always construct linear combinations of 
the series (e.g., portfolios of returns) that are 
perfectly correlated with a set of factors. Often 
they can be given important economic interpre¬ 
tations. In the following sections we describe 
the theory and estimation methods of principal 
components analysis and factor analysis. 
PRINCIPAL COMPONENTS ANALYSIS

Principal components analysis (PCA) was intro¬ 
duced by Harold Hotelling (1933). Hotelling 
proposed PCA as a way to determine fac¬ 
tors with statistical learning techniques when 
factors are not exogenously given. Given a 
variance-covariance matrix, one can determine 
factors using the technique of PCA. 

PCA implements a dimensionality reduction
of a set of observations. The concept of PCA
is the following. Consider a set of n stationary
time series $X_i$, for example the 500 series of
returns of the S&P 500. Consider next a linear
combination of these series, that is, a portfolio
of securities. Each portfolio P is identified by an
n-vector of weights $\omega_P$ and is characterized by a
variance $\sigma_P^2$. In general, the variance $\sigma_P^2$ depends
on the portfolio's weights $\omega_P$. Lastly, consider
a normalized portfolio, which has the largest
possible variance. In this context, a normalized
portfolio is a portfolio such that the squares of
the weights sum to one.

If we assume that returns are IID sequences,
jointly normally distributed with
variance-covariance matrix $\sigma$, a lengthy direct
calculation demonstrates that each portfolio's return
will be normally distributed with variance

$$\sigma_P^2 = \omega_P' \sigma \omega_P$$

The normalized portfolio of maximum variance
can therefore be determined in the following
way:

Maximize $\omega_P' \sigma \omega_P$
subject to the normalization condition
$$\omega_P' \omega_P = 1$$

where the product is a scalar product. It can
be demonstrated that the solution of this problem
is the eigenvector $\omega_1$ corresponding to the
largest eigenvalue $\lambda_1$ of the variance-covariance
matrix $\sigma$. As $\sigma$ is a variance-covariance matrix,
the eigenvalues are all real.
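
The claim that the normalized portfolio of maximum variance is the eigenvector of the largest eigenvalue can be illustrated numerically. The sketch below assumes a small synthetic covariance matrix (the data are random, not taken from the illustration in this entry) and compares the variance of the top eigenvector portfolio with the variances of many random normalized portfolios.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n))
sigma = A @ A.T        # a synthetic symmetric positive semidefinite matrix

# eigh is meant for symmetric matrices; eigenvalues come in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(sigma)
lambda_1 = eigenvalues[-1]
omega_1 = eigenvectors[:, -1]          # unit-norm eigenvector

var_top = omega_1 @ sigma @ omega_1    # variance of the eigenvector portfolio

# Variances of random portfolios whose squared weights sum to one.
w = rng.normal(size=(10_000, n))
w /= np.linalg.norm(w, axis=1, keepdims=True)
random_variances = np.einsum("ij,jk,ik->i", w, sigma, w)

print(var_top, lambda_1, random_variances.max())
```

No random normalized portfolio exceeds the largest eigenvalue, and the eigenvector portfolio attains it.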

Consider next the set of all normalized portfolios
orthogonal to $\omega_1$, that is, portfolios
completely uncorrelated with $\omega_1$. These portfolios
are identified by the following relationship:

$$\omega_1' \omega_P = \omega_P' \omega_1 = 0$$

We can repeat the previous reasoning. Among
this set, the portfolio of maximum variance is
given by the eigenvector $\omega_2$ corresponding to
the second largest eigenvalue $\lambda_2$ of the
variance-covariance matrix $\sigma$. If there are n distinct
eigenvalues, we can repeat this process n times. In
this way, we determine the n portfolios $P_i$ of
maximum variance. The weights of these portfolios
are the orthonormal eigenvectors of the
variance-covariance matrix $\sigma$. Note that each
portfolio is a time series that is a linear
combination of the original time series $X_i$. The
coefficients are the portfolios' weights.

These portfolios of maximum variance are all
mutually uncorrelated. It can be demonstrated
that we can recover all the original return time
series as linear combinations of these portfolios:

$$X_j = \sum_{i=1}^{n} \alpha_{j,i} P_i$$

Thus far we have succeeded in replacing the
original n correlated time series $X_j$ with n
uncorrelated time series $P_i$ with the additional
insight that each $X_j$ is a linear combination of
the $P_i$. Suppose now that only p of the portfolios
$P_i$ have a significant variance, while the
remaining n − p have very small variances. We
can then implement a dimensionality reduction
by choosing only those portfolios whose variance
is significantly different from zero. Let's
call these portfolios factors F.

It is clear that we can approximately represent
each series $X_j$ as a linear combination of the
factors plus a small uncorrelated noise. In fact
we can write

$$X_j = \sum_{i=1}^{p} \alpha_{j,i} P_i + \sum_{i=p+1}^{n} \alpha_{j,i} P_i = \sum_{i=1}^{p} \alpha_{j,i} F_i + \varepsilon_j$$

where the last term is a noise term. Therefore 
to implement PCA one computes the eigen¬ 
values and the eigenvectors of the variance- 
covariance matrix and chooses the eigenvalues 
significantly different from zero. The corresponding
eigenvectors are the weights of portfolios
that form the factors. Criteria of choice
are somewhat arbitrary.
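
The procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_factors` is hypothetical, the data are synthetic with two common factors, the covariance is estimated with divisor T, and the number of retained components is simply passed in by the caller since the text notes that the selection criterion is somewhat arbitrary.

```python
import numpy as np


def pca_factors(returns, n_factors):
    """Estimate statistical factors by PCA.

    returns: T x n array of return series (one column per asset).
    Gives back the eigenvalues in descending order, the n x n_factors
    matrix of portfolio weights, and the T x n_factors factor series.
    """
    centered = returns - returns.mean(axis=0)
    sigma = centered.T @ centered / returns.shape[0]   # empirical covariance
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # ascending order
    order = np.argsort(eigenvalues)[::-1]              # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    weights = eigenvectors[:, :n_factors]
    factors = centered @ weights                       # factor portfolios
    return eigenvalues, weights, factors


# Synthetic data driven by two common factors plus small noise.
rng = np.random.default_rng(5)
T, n = 500, 8
true_factors = rng.normal(size=(T, 2))
loadings = rng.normal(size=(2, n))
returns = true_factors @ loadings + 0.1 * rng.normal(size=(T, n))

eigenvalues, weights, factors = pca_factors(returns, n_factors=2)
explained = eigenvalues[:2].sum() / eigenvalues.sum()
print(round(float(explained), 3))
```

The factor series are mutually uncorrelated, and their variances equal the retained eigenvalues.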

Suppose, however, that there is a strict fac¬ 
tor structure, which means that returns follow 
a strict factor model as defined earlier in this 
entry: 

$$r = \alpha + \beta f + \varepsilon$$

The matrix $\beta$ can be obtained by diagonalizing
the variance-covariance matrix. In general, the
structure of factors will not be strict and one 
will try to find an approximation by choosing 
only the largest eigenvalues. 

Note that PCA works either on the variance- 
covariance matrix or on the correlation matrix. 
The technique is the same but results are gen¬ 
erally different. PCA applied to the variance- 
covariance matrix is sensitive to the units of 
measurement, which determine variances and 
covariances. This observation does not apply 
to returns, which are dimensionless quantities. 
However, if PCA is applied to prices and not 
to returns, the currency in which prices are ex¬ 
pressed matters; one obtains different results in 
different currencies. In these cases, it might be 
preferable to work with the correlation matrix. 
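
The sensitivity to units can be seen in a small experiment. The sketch below is an illustration on synthetic series, and the rescaling factor of 100 is an arbitrary stand-in for a change of currency or unit. It compares the leading eigenvector before and after one series is rescaled, first for the covariance matrix and then for the correlation matrix.

```python
import numpy as np


def leading_eigenvector(matrix):
    """Eigenvector of the largest eigenvalue, with the sign fixed for comparison."""
    _, vectors = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    v = vectors[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v


rng = np.random.default_rng(3)
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
series = rng.normal(size=(500, 3)) @ mixing         # three correlated series
rescaled = series * np.array([100.0, 1.0, 1.0])     # first series in other units

cov_changes = not np.allclose(
    leading_eigenvector(np.cov(series, rowvar=False)),
    leading_eigenvector(np.cov(rescaled, rowvar=False)),
    atol=1e-3,
)
corr_unchanged = np.allclose(
    leading_eigenvector(np.corrcoef(series, rowvar=False)),
    leading_eigenvector(np.corrcoef(rescaled, rowvar=False)),
)
print(cov_changes, corr_unchanged)
```

After rescaling, the first covariance-based component is dominated by the series with the largest units, while the correlation-based component is unaffected.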

We have described PCA in the case of time 
series, which is the relevant case in economet¬ 
rics. However, PCA is a generalized dimension¬ 
ality reduction technique applicable to any set 
of multidimensional observations. It admits a 
simple geometrical interpretation, which can 
be easily visualized in the three-dimensional 
case. Suppose a cloud of points in the three- 
dimensional Euclidean space is given. PCA 
finds the planes that cut the cloud of points in 
such a way as to obtain the maximum variance. 

Illustration of Principal 
Components Analysis 

Let's now show how PCA is performed. To 
do so, we used monthly observations for the 
following 10 stocks: Campbell Soup, General 


Dynamics, Sun Microsystems, Hilton, Martin 
Marietta, Coca-Cola, Northrop Grumman, 
Mercury Interactive, Amazon.com, and United 
Technologies for the period from December 
2000 to November 2005. Figure 1 shows the 
graphics of the 10 return processes. 

As explained earlier, performing PCA is 
equivalent to determining the eigenvalues and 
eigenvectors of the covariance matrix or of 
the correlation matrix. The two matrices yield 
different results. We perform both exercises, 
estimating the principal components using 
separately the covariance and the correlation 
matrices of the return processes. We estimate 
the covariance with the empirical covariance 
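matrix. Recall that the empirical covariance
$\sigma_{ij}$ between variables $(X_i, X_j)$ is defined as
follows:

$$\sigma_{ij} = \frac{1}{T} \sum_{t=1}^{T} \big(X_i(t) - \bar{X}_i\big)\big(X_j(t) - \bar{X}_j\big)$$

$$\bar{X}_i = \frac{1}{T} \sum_{t=1}^{T} X_i(t), \qquad \bar{X}_j = \frac{1}{T} \sum_{t=1}^{T} X_j(t)$$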

Table 1 shows the covariance matrix. 

Normalizing the covariance matrix with the 
standard deviations, we obtain the correlation 
matrix. Table 2 shows the correlation matrix. 
Note that the diagonal elements of the correla¬ 
tion matrix are all equal to one. In addition, a 
number of entries in the covariance matrix are 
close to zero. Normalization by the product of 
standard deviations makes the same elements 
larger. 
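
The normalization step can be reproduced on a small block of Table 1. The sketch below is a minimal illustration: the function names are hypothetical, and the divisor T in the empirical covariance is an assumption. It converts the SUNW, AMZN, and MERQ block of the covariance matrix into correlations, which can be compared with the corresponding entries of Table 2.

```python
import numpy as np


def empirical_covariance(x):
    """Empirical covariance of a T x n array of observations, dividing by T."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / x.shape[0]


def covariance_to_correlation(cov):
    """Normalize a covariance matrix by the product of standard deviations."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


# SUNW, AMZN, MERQ block of the covariance matrix in Table 1
cov_block = np.array([
    [0.02922, 0.017373, 0.020874],
    [0.017373, 0.032292, 0.020262],
    [0.020874, 0.020262, 0.0355],
])
corr_block = covariance_to_correlation(cov_block)
print(np.round(corr_block, 4))
```

Up to rounding of the published covariances, the off-diagonal results agree with the values 0.56558, 0.64812, and 0.59845 reported in Table 2.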

Let's now proceed to perform PCA using the 
covariance matrix. We have to compute the 
eigenvalues and the eigenvectors of the covari¬ 
ance matrix. Table 3 shows the eigenvectors 
(panel A) and the eigenvalues (panel B) of the 
covariance matrix. 

Each column of panel A of Table 3 represents 
an eigenvector. The corresponding eigenvalue 
is shown in panel B. Eigenvalues are listed in de¬ 
scending order; the corresponding eigenvectors 
go from left to right in the matrix of eigenvec¬ 
tors. Thus the leftmost eigenvector corresponds 
Figure 1 Graphics of the 10 Stock Return Processes



to the largest eigenvalue. Eigenvectors are not
uniquely determined. In fact, multiplying any
eigenvector by a nonzero real constant yields
another eigenvector. The eigenvectors in Table 3
are normalized in the sense that the sum of the
squares of the components of each eigenvector,
that is, of the elements in each column, is
equal to 1. This still
leaves an indeterminacy, as we can change the


sign of the eigenvector without affecting this 
normalization. 
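
The normalization can be checked directly on the first column of panel A of Table 3. The short sketch below copies that column from the table and verifies that the squares of its elements sum to approximately one (the published values are rounded), and that changing the sign of every element leaves the sum unchanged.

```python
# First eigenvector (first column of panel A, Table 3)
first_eigenvector = [
    -0.50374, -0.54013, -0.59441, 0.001884, 0.083882,
    -0.00085, -0.0486, -0.07443, -0.20647, -0.20883,
]

sum_of_squares = sum(w * w for w in first_eigenvector)

# Flipping the sign gives another normalized eigenvector.
flipped = [-w for w in first_eigenvector]
sum_of_squares_flipped = sum(w * w for w in flipped)

print(round(sum_of_squares, 4), round(sum_of_squares_flipped, 4))
```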

As explained earlier, if we form portfolios 
whose weights are the eigenvectors, we can 
form 10 portfolios that are orthogonal (i.e., 
uncorrelated). These orthogonal portfolios are 
called principal components. The variance of each
principal component will be equal to the
corresponding eigenvalue. Thus the first principal


Table 1 The Covariance Matrix of 10 Stock Returns

      SUNW       AMZN       MERQ       GD         NOC        CPB        KO         MLM        HLT        UTX
SUNW  0.02922    0.017373   0.020874   3.38E-05   -0.00256   -3.85E-05  0.000382   0.004252   0.006097   0.005467
AMZN  0.017373   0.032292   0.020262   5.03E-05   -0.00277   0.000304   0.001507   0.001502   0.010138   0.007483
MERQ  0.020874   0.020262   0.0355     -0.00027   -0.0035    -0.00011   0.003541   0.003878   0.007075   0.008557
GD    3.38E-05   5.03E-05   -0.00027   9.27E-05   0.000162   2.14E-05   -0.00015   3.03E-05   -4.03E-05  -3.32E-05
NOC   -0.00256   -0.00277   -0.0035    0.000162   0.010826   3.04E-05   -0.00097   0.000398   -0.00169   -0.00205
CPB   -3.85E-05  0.000304   -0.00011   2.14E-05   3.04E-05   7.15E-05   2.48E-05   -7.96E-06  -9.96E-06  -4.62E-05
KO    0.000382   0.001507   0.003541   -0.00015   -0.00097   2.48E-05   0.004008   -9.49E-05  0.001485   0.000574
MLM   0.004252   0.001502   0.003878   3.03E-05   0.000398   -7.96E-06  -9.49E-05  0.004871   0.00079    0.000407
HLT   0.006097   0.010138   0.007075   -4.03E-05  -0.00169   -9.96E-06  0.001485   0.00079    0.009813   0.005378
UTX   0.005467   0.007483   0.008557   -3.32E-05  -0.00205   -4.62E-05  0.000574   0.000407   0.005378   0.015017

Note: Sun Microsystems (SUNW), Amazon.com (AMZN), Mercury Interactive (MERQ), General Dynamics (GD),
Northrop Grumman (NOC), Campbell Soup (CPB), Coca-Cola (KO), Martin Marietta (MLM), Hilton (HLT), United
Technologies (UTX).
Table 2 The Correlation Matrix of the Same 10 Return Processes

      SUNW       AMZN       MERQ       GD         NOC        CPB        KO         MLM        HLT        UTX
SUNW  1          0.56558    0.64812    0.020565   -0.14407   -0.02667   0.035276   0.35642    0.36007    0.26097
AMZN  0.56558    1          0.59845    0.029105   -0.14815   0.20041    0.1325     0.11975    0.56951    0.33983
MERQ  0.64812    0.59845    1          -0.14638   -0.17869   -0.06865   0.29688    0.29489    0.37905    0.37061
GD    0.020565   0.029105   -0.14638   1          0.16217    0.26307    -0.24395   0.045072   -0.04227   -0.02817
NOC   -0.14407   -0.14815   -0.17869   0.16217    1          0.034519   -0.14731   0.054818   -0.16358   -0.16058
CPB   -0.02667   0.20041    -0.06865   0.26307    0.034519   1          0.046329   -0.01349   -0.0119    -0.04457
KO    0.035276   0.1325     0.29688    -0.24395   -0.14731   0.046329   1          -0.02147   0.23678    0.07393
MLM   0.35642    0.11975    0.29489    0.045072   0.054818   -0.01349   -0.02147   1          0.11433    0.047624
HLT   0.36007    0.56951    0.37905    -0.04227   -0.16358   -0.0119    0.23678    0.11433    1          0.44302
UTX   0.26097    0.33983    0.37061    -0.02817   -0.16058   -0.04457   0.07393    0.047624   0.44302    1

Note: Sun Microsystems (SUNW), Amazon.com (AMZN), Mercury Interactive (MERQ), General Dynamics (GD),
Northrop Grumman (NOC), Campbell Soup (CPB), Coca-Cola (KO), Martin Marietta (MLM), Hilton (HLT), United
Technologies (UTX).


component (i.e., the portfolio corresponding to 
the first eigenvalue), will have the maximum 
possible variance and the last principal compo¬ 
nent (i.e., the portfolio corresponding to the last 
eigenvalue) will have the smallest variance. Fig¬ 
ure 2 shows the graphics of the principal com¬ 
ponents of maximum and minimum variance. 


The 10 principal components thus obtained
are linear combinations of the original series,
$X = (X_1, \ldots, X_N)'$; that is, they are obtained by
multiplying X by the matrix of the eigenvectors.
If the eigenvalues and the corresponding
eigenvectors are all distinct, as is the case
in our illustration, we can apply the inverse


Table 3 Eigenvectors and Eigenvalues of the Covariance Matrix

Panel A: Eigenvectors

    1         2         3         4         5         6         7         8         9         10
1   -0.50374  0.50099   0.28903   -0.59632  -0.01824  -0.01612  0.22069   -0.08226  0.002934  -0.00586
2   -0.54013  -0.53792  0.51672   0.22686   -0.06092  0.25933   -0.10967  -0.12947  0.020253  0.016624
3   -0.59441  0.32924   -0.4559   0.52998   0.051976  0.015346  0.010496  0.21483   -0.01809  -0.00551
4   0.001884  -0.00255  0.018107  -0.01185  0.013384  0.01246   -0.01398  0.01317   -0.86644  0.4981
5   0.083882  0.10993   0.28331   0.19031   0.91542   -0.06618  0.14532   -0.02762  0.011349  -0.00392
6   -0.00085  -0.01196  0.016896  0.006252  -0.00157  0.01185   -0.00607  -0.02791  -0.49795  -0.86638
7   -0.0486   -0.02839  -0.1413   0.19412   -0.08989  -0.35435  0.31808   -0.8387   -0.01425  0.027386
8   -0.07443  0.19009   0.013485  -0.06363  0.11133   -0.22666  -0.90181  -0.27739  0.010908  0.002932
9   -0.20647  -0.36078  -0.01067  -0.1424   0.038221  -0.82197  0.052533  0.35591   -0.01155  -0.01256
10  -0.20883  -0.41462  -0.5835   -0.46223  0.3649    0.27388   -0.02487  -0.14688  0.001641  -0.00174

Panel B: Eigenvalues

1   0.0783
2   0.0164
3   0.0136
4   0.0109
5   0.0101
6   0.0055
7   0.0039
8   0.0028
9   0.0001
10  0.0001
Figure 2 Graphic of the Portfolios of Maximum and Minimum Variance Based on the Covariance
Matrix


transformation and recover the X as linear com¬ 
binations of the principal components. 

PCA is interesting if, in using only a small 
number of principal components, we neverthe¬ 
less obtain a good approximation. That is, we 
use PCA to determine principal components 
but we use only those principal components 
that have a large variance as factors of a factor 
model. Stated otherwise, we regress the origi¬ 
nal series X onto a small number of principal 
components. In this way, PCA implements a di¬ 
mensionality reduction as it allows one to retain 
only a small number of components. By choos¬ 
ing as factors the components with the largest 
variance, we can explain a large portion of the 
total variance of X. 
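
The effect of retaining only a few components can be sketched as follows. This is an illustration on synthetic data (10 series of 60 observations with one common driver, chosen only to mimic the dimensions of the example), not a reproduction of the results in this entry. Each series is projected onto the first k principal components, and the size of the residual of the first series is reported for k = 1, 5, and 10.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 60, 10
common = rng.normal(size=(T, 1))                      # one common driver
X = common @ rng.normal(size=(1, n)) + 0.3 * rng.normal(size=(T, n))
Xc = X - X.mean(axis=0)                               # centered series

_, W = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = W[:, ::-1]               # eigenvectors ordered by descending eigenvalue


def residuals(k):
    """Residuals after regressing the series on the first k components."""
    Wk = W[:, :k]
    return Xc - Xc @ Wk @ Wk.T


# Euclidean norm of the residual of the first series for k = 1, 5, 10
residual_norms = {k: float(np.linalg.norm(residuals(k)[:, 0])) for k in (1, 5, 10)}
print(residual_norms)
```

With all 10 components the residuals vanish up to floating-point error, which is why values of the order of 1E-16 appear in a complete decomposition.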

Table 4 shows the total variance explained by
a growing number of components. Thus the
first component explains 55.2784% of the total
variance, the first two components explain
66.8508% of the total variance, and so on.
Obviously 10 components explain 100% of the


total variance. The second, third, and fourth
columns of Table 5 show the residuals of the Sun
Microsystems return process with 1, 5, and all
10 components, respectively. There is a large
gain from 1 to 5, while the gain from 5 to all 10
components is marginal.
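
The percentages of explained variance follow directly from the eigenvalues. The sketch below uses the rounded eigenvalues of panel B of Table 3, so the results match Table 4 only approximately (for example, about 55.26% rather than 55.2784% for the first component).

```python
# Eigenvalues of the covariance matrix (panel B of Table 3, rounded values)
eigenvalues = [0.0783, 0.0164, 0.0136, 0.0109, 0.0101,
               0.0055, 0.0039, 0.0028, 0.0001, 0.0001]

total = sum(eigenvalues)   # equals the total variance (trace of the covariance matrix)

cumulative = []
running = 0.0
for value in eigenvalues:
    running += value
    cumulative.append(100.0 * running / total)

for k, pct in enumerate(cumulative, start=1):
    print(k, round(pct, 2))
```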


Table 4 Percentage of the Total Variance Explained
by a Growing Number of Components Based on the
Covariance Matrix

Principal    Percentage of Total
Component    Variance Explained
1            55.2784%
2            66.8508
3            76.4425
4            84.1345
5            91.2774
6            95.1818
7            97.9355
8            99.8982
9            99.9637
10           100.0000
Table 5 Residuals of the Sun Microsystems Return Process with 1, 5, and All Components Based on the Covariance
Matrix and the Correlation Matrix

             Residuals Based on Covariance Matrix    Residuals Based on Correlation Matrix
Month/Year   1 PC         5 PCs        10 PCs        1 PC         5 PCs        10 PCs
Dec. 2000    0.069044     0.018711     1.53E-16      0.31828      0.61281      -2.00E-15
Jan. 2001    -0.04723     -0.02325     1.11E-16      -0.78027     -0.81071     1.78E-15
Feb. 2001    -0.03768     0.010533     -1.11E-16     -0.47671     0.04825      2.22E-16
March 2001   -0.16204     -0.02016     2.50E-16      -0.47015     -0.82958     -2.78E-15
April 2001   -0.00819     -0.00858     -7.63E-17     -0.32717     -0.28034     -5.00E-16
May 2001     0.048814     -0.00399     2.08E-17      0.36321      0.016427     7.22E-16
June 2001    0.21834      0.025337     -2.36E-16     1.1437       1.37         7.94E-15
July 2001    -0.03399     0.02732      1.11E-16      -0.7547      0.35591      1.11E-15
Aug. 2001    0.098758     -0.00146     2.22E-16      1.0501       0.19739      -8.88E-16
Sept. 2001   0.042674     0.006381     -5.55E-17     0.40304      0.28441      2.00E-15
Oct. 2001    0.038679     -0.00813     -5.55E-17     0.50858      0.17217      4.44E-16
Nov. 2001    -0.11967     -0.01624     1.11E-16      -0.89512     -0.8765      -7.77E-16
Dec. 2001    -0.19192     0.030744     1.67E-16      -1.001       0.047784     -1.55E-15
Jan. 2002    -0.13013     -0.00591     5.55E-17      -1.1085      -0.68171     -1.33E-15
Feb. 2002    0.003304     0.017737     0             -0.05222     0.20963      -9.99E-16
March 2002   -0.07221     0.012569     5.55E-17      -0.35765     0.13344      2.22E-16
April 2002   -0.08211     -0.00916     2.78E-17      -0.38222     -0.47647     -2.55E-15
May 2002     -0.05537     -0.02103     0             -0.45957     -0.53564     4.22E-15
June 2002    -0.15461     0.004614     1.39E-16      -1.0311      -0.54064     -3.33E-15
July 2002    0.00221      0.013057     8.33E-17      0.24301      0.37431      -1.89E-15
Aug. 2002    -0.12655     0.004691     5.55E-17      -0.8143      -0.30497     2.00E-15
Sept. 2002   -0.07898     0.039666     5.55E-17      -0.25876     0.64902      -6.66E-16
Oct. 2002    0.15839      0.003346     -1.11E-16     0.98252      0.53223      -1.78E-15
Nov. 2002    -0.11377     0.013601     1.67E-16      -0.95263     -0.33884     -2.89E-15
Dec. 2002    -0.06957     0.012352     1.32E-16      -0.10309     0.029623     -4.05E-15
Jan. 2003    0.14889      -0.00118     -8.33E-17     1.193        0.73723      5.00E-15
Feb. 2003    -0.03359     -0.02719     -4.16E-17     -0.02854     -0.38331     4.05E-15
March 2003   -0.05314     -0.00859     2.78E-17      -0.38853     -0.40615     -2.22E-16
April 2003   0.10457      -0.01442     -2.22E-16     0.73075      0.097101     -1.11E-15
May 2003     0.078567     0.022227     -5.55E-17     0.52298      0.63772      -7.77E-16
June 2003    -0.1989      -0.02905     1.39E-16      -1.4213      -1.3836      -3.55E-15
July 2003    -0.0149      -0.00955     0             0.13876      -0.1059      3.44E-15
Aug. 2003    -0.12529     -0.00528     8.33E-17      -0.73819     -0.51792     9.99E-16
Sept. 2003   0.10879      -0.00645     -8.33E-17     0.69572      0.25503      -2.22E-15
Oct. 2003    0.07783      0.01089      -2.78E-17     0.36715      0.45274      -1.11E-15
Nov. 2003    0.038408     -0.01181     -5.55E-17     0.11761      -0.13271     3.33E-16
Dec. 2003    0.18203      0.012593     -1.39E-16     1.2655       0.98182      3.77E-15
Jan. 2004    0.063885     -0.00042     6.94E-18      0.33717      0.038477     0
Feb. 2004    -0.12552     -0.00225     1.11E-16      -0.70345     -0.49379     0
March 2004   -0.01747     0.016836     0             -0.1949      0.35348      -1.94E-16
April 2004   0.015742     0.013764     4.16E-17      0.2673       0.46969      -5.77E-15
May 2004     -0.03556     -0.02072     -6.94E-17     -0.60652     -0.68268     0
June 2004    0.14325      0.008155     -1.94E-16     0.54463      0.59768      3.22E-15
July 2004    0.030731     -0.00285     -4.16E-17     0.13011      0.028779     7.08E-16
Aug. 2004    0.032719     -0.00179     -5.55E-17     0.26793      0.18353      2.05E-15
Sept. 2004   0.083238     0.003664     0             0.58186      0.29544      3.77E-15
Oct. 2004    0.11722      -0.00356     -1.39E-16     0.77575      0.38959      2.22E-16
Nov. 2004    -0.04794     -0.00088     0             -0.47706     -0.35464     -3.13E-15
Dec. 2004    -0.1099      -0.01903     1.11E-16      -0.69439     -0.64663     -2.22E-16
Jan. 2005    0.0479       -0.00573     2.08E-17      0.24203      -0.04065     -4.45E-16
Feb. 2005    -0.015       0.003186     1.39E-17      -0.07198     0.054412     3.28E-15
March 2005   0.005969     -0.0092      -4.16E-17     0.035251     -0.02106     3.83E-15
April 2005   -0.00742     -0.01241     -4.16E-17     -0.09335     -0.42659     -1.67E-16
May 2005     0.14998      -0.01126     6.25E-17      1.0219       0.034585     -9.05E-15
June 2005    -0.05045     -0.00363     3.47E-17      -0.25655     -0.1229      -4.66E-15
July 2005    0.065302     -0.00421     -5.20E-17     0.56136      0.16602      3.08E-15
Aug. 2005    0.006719     -0.01174     1.39E-17      0.09319      -0.22119     -2.00E-15
Sept. 2005   0.12865      -0.00259     -8.33E-17     0.95602      0.33442      3.50E-15
Oct. 2005    -0.01782     0.011827     -8.33E-17     -0.2249      0.27675      1.53E-15
Nov. 2005    0.026312     -7.72E-05    -1.39E-17     0.26642      0.19725      1.67E-15

Note: PC = principal component; the columns give the residuals with 1, 5, and all 10 principal components.
Table 6 Eigenvectors and Eigenvalues of the Correlation Matrix

Panel A: Eigenvectors

     1          2          3          4          5          6          7          8          9          10
1    -0.4341    0.19295    -0.26841   0.040065   -0.19761   0.29518    -0.11161   -0.11759   -0.72535   -0.14857
2    -0.45727   0.18203    0.20011    0.001184   0.013236   0.37606    0.05077    0.19402    0.47275    -0.55894
3    -0.47513   -0.03803   -0.16513   0.16372    -0.01282   0.19087    -0.08297   -0.38843   0.37432    0.61989
4    0.06606    0.63511    0.18027    -0.16941   -0.05974   -0.24149   -0.66306   -0.14342   0.092295   0.02113
5    0.17481    0.33897    -0.21337   0.14797    0.84329    0.23995    0.091628   -0.07926   -0.06105   0.001886
6    -0.00505   0.42039    0.57434    0.40236    -0.15072   -0.05018   0.48758    -0.07382   -0.15788   0.19532
7    -0.18172   -0.397     0.28037    0.58674    0.26063    -0.26864   -0.38592   -0.16286   -0.11336   -0.24105
8    -0.1913    0.26851    -0.55744   0.32448    -0.09047   -0.58736   0.20083    0.19847    0.15935    -0.13035
9    -0.40588   -0.0309    0.20884    -0.20157   0.29193    -0.16641   -0.08666   0.67897    -0.1739    0.37201
10   -0.32773   -0.05042   0.14067    -0.51858   0.24871    -0.41444   0.30906    -0.4883    -0.06781   -0.17077

Panel B: Eigenvalues

1    3.0652
2    1.4599
3    1.1922
4    0.9920
5    0.8611
6    0.6995
7    0.6190
8    0.5709
9    0.3143
10   0.2258


We can repeat the same exercise for the correlation matrix. Table 6 shows the eigenvectors (panel A) and the eigenvalues (panel B) of the correlation matrix. Eigenvectors are normalized as in the case of the covariance matrix.

Table 7 shows the total variance explained by a growing number of components. Thus the first component explains 30.6522% of the total variance, the first two components explain 45.2509% of the total variance, and so on. Obviously 10 components explain 100% of the total variance. The increase in explanatory power with the number of components is slower than in the case of the covariance matrix.

Table 7 Percentage of the Total Variance Explained by a Growing Number of Components Using the Correlation Matrix

Principal Component    Percentage of Total Variance Explained
1                      30.6522%
2                      45.2509
3                      57.1734
4                      67.0935
5                      75.7044
6                      82.6998
7                      88.8901
8                      94.5987
9                      97.7417
10                     100.0000
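The cumulative percentages reported in Table 7 can be recovered from the eigenvalues in Panel B of Table 6. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative, and small differences in the last digits are expected because the printed eigenvalues are rounded.

```python
import numpy as np

# Eigenvalues of the correlation matrix, copied from Table 6, Panel B.
eigenvalues = np.array([3.0652, 1.4599, 1.1922, 0.9920, 0.8611,
                        0.6995, 0.6190, 0.5709, 0.3143, 0.2258])

# Each eigenvalue is the variance of one principal component. For the
# correlation matrix of 10 standardized series they sum to about 10.
share = eigenvalues / eigenvalues.sum()
cumulative_pct = 100.0 * np.cumsum(share)

for k, pct in enumerate(cumulative_pct, start=1):
    print(f"first {k:2d} component(s): {pct:8.4f}%")
```

Because the series are standardized, no single high-variance stock dominates the first component, which is consistent with the slower growth in explained variance noted above.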

Figure 3 plots the portfolios of maximum and minimum variance. The ratio between the variances of the two portfolios is smaller in this case than in the case of the covariance matrix.

The last three columns of Table 5 show the residuals of the Sun Microsystems return process with 1, 5, and all components based on the correlation matrix. Residuals are progressively reduced, but at a lower rate than with the covariance matrix.

Figure 3 Graphic of the Portfolios of Maximum and Minimum Variance Based on the Correlation Matrix

PCA and Factor Analysis with Stable Distributions

In the previous sections we discussed PCA and factor analysis without making any explicit reference to the distributional properties of the variables. These statistical tools can be applied provided that all variances and covariances exist. Therefore applying them does not require, per se, that distributions are normal, but only that they have finite variances and covariances. Variances and covariances are not robust but are sensitive to outliers. Robust equivalents of variances and covariances exist. In order to make PCA and factor analysis insensitive to outliers, one could use robust versions of variances and covariances and apply PCA and factor analysis to these robust estimates.

In many cases, however, distributions might exhibit fat tails and infinite variances. In this case, large values cannot be trimmed but must be taken into proper consideration. However, if variances and covariances are not finite, the least squares methods used to estimate factor loadings cannot be applied. In addition, the concept of PCA and factor analysis as illustrated in the previous sections cannot be applied. In fact, if distributions have infinite variances, it does not make sense to determine the portfolio of maximum variance as all portfolios will have infinite variance and it will be impossible, in general, to determine an ordering based on the size of variance.

Both PCA and factor analysis as well as the estimation of factor models with infinite-variance error terms are at the forefront of econometric research.

FACTOR ANALYSIS 

Thus far, we have seen how factors can be determined using principal components analysis. We retained as factors those principal components with the largest variance. In this section, we consider an alternative technique for determining factors: factor analysis (FA). Suppose we are given T independent samples of a random vector $X = (X_1, \ldots, X_N)'$. In the most common cases in financial econometrics, we will be given T samples of a multivariate time series. However, factor analysis can be applied to samples extracted from a generic multivariate distribution. To fix these ideas, suppose we are given N time series of stock returns at T moments, as in the case of PCA.

Assuming that the data are described by a strict factor model with K factors, the objective of factor analysis (FA) consists of determining a model of the type

$$X = \alpha + \beta f + \varepsilon$$

with covariance matrix

$$\Sigma = \beta \beta' + \Psi$$

The estimation procedure is performed in two steps. In the first step, we estimate the covariance matrix and the factor loadings. In the second step, we estimate factors using the covariance matrix and the factor loadings.

If we assume that the variables are jointly normally distributed and temporally IID, we can estimate the covariance matrix with maximum likelihood methods. Estimation of factor models with maximum likelihood methods is not immediate because factors are not observable. Iterative methods such as the expectation maximization (EM) algorithm are generally used.

After estimating the matrices $\beta$ and $\Psi$, factors can be estimated as linear regressions. In fact, assuming that factors have zero mean (an assumption that can always be made), we can write the factor model as

$$X - \alpha = \beta f + \varepsilon$$

which shows that, at any given time, factors can be estimated as the regression coefficients of the regression of $(X - \alpha)$ onto $\beta$. Using the standard formulas of regression analysis, we can now write factors, at any given time, as follows:

$$\hat{f}_t = (\beta' \Psi^{-1} \beta)^{-1} \beta' \Psi^{-1} (X_t - \alpha)$$
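The second estimation step can be sketched in a few lines. The example below is only an illustration under stated assumptions: it takes $\alpha$, $\beta$, and the diagonal of $\Psi$ as already estimated, uses NumPy, and runs on synthetic data rather than the stock returns discussed in the text. The function name `factor_scores` is hypothetical.

```python
import numpy as np

def factor_scores(X, alpha, beta, psi_diag):
    """Weighted least squares factor estimates, one row per time point.

    X        : (T, N) observed returns
    alpha    : (N,)   intercepts
    beta     : (N, K) factor loadings
    psi_diag : (N,)   idiosyncratic variances (diagonal of Psi)
    """
    w = beta / psi_diag[:, None]       # Psi^{-1} beta, valid because Psi is diagonal
    A = beta.T @ w                     # beta' Psi^{-1} beta, shape (K, K)
    b = w.T @ (X - alpha).T            # beta' Psi^{-1} (X_t - alpha), shape (K, T)
    return np.linalg.solve(A, b).T     # estimated factors, shape (T, K)

# Synthetic check (illustrative numbers only, not the data used in the text).
rng = np.random.default_rng(0)
T, N, K = 500, 10, 3
beta = rng.normal(size=(N, K))
alpha = rng.normal(scale=0.01, size=N)
psi_diag = rng.uniform(0.05, 0.2, size=N)
f_true = rng.normal(size=(T, K))
X = alpha + f_true @ beta.T + rng.normal(size=(T, N)) * np.sqrt(psi_diag)

f_hat = factor_scores(X, alpha, beta, psi_diag)
print(np.corrcoef(f_true[:, 0], f_hat[:, 0])[0, 1])
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of $\beta' \Psi^{-1} \beta$; this is a numerical convenience and does not change the formula.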

The estimation approach based on maximum likelihood estimates implies that the number of factors is known. In order to determine the number of factors, a heuristic procedure consists of iteratively estimating models with a growing number of factors. The correct number of factors, q, is reached when the estimates stabilize and the model with q factors cannot be rejected on the basis of p-values. A theoretical method for determining the number of factors was proposed by Bai and Ng (2002).

The factor loadings matrix can also be estimated with ordinary least squares (OLS) methods. The OLS estimator of the factor loadings coincides with the principal component estimator of factor loadings. However, in a strict factor model, OLS estimates of the factor loadings are inconsistent when the number of time points goes to infinity but the number of series remains finite, unless we assume that the idiosyncratic noise terms all have the same variance.

The OLS estimators, however, remain consistent if we allow both the number of processes and the time to go to infinity. Under this assumption, as explained by Bai (2003), we can also use OLS estimators for approximate factor models.

In a number of applications, we might want to enforce the condition $\alpha = 0$. This condition is the condition of absence of arbitrage. OLS estimates of factor models with this additional condition are an instance of constrained OLS methods.


An Illustration of Factor Analysis 

Let's now show how factor analysis is performed. To do so, we will use the same 10 stocks and return data for December 2000 to November 2005 that we used to illustrate principal components analysis.

As just described, to perform factor analysis, we need to estimate only the factor loadings and the idiosyncratic variances of noise terms. We assume that the model has three factors. Table 8 shows the factor loadings. Each row represents the loadings of the three factors corresponding to each stock. The last column of the table shows the idiosyncratic variances.

The idiosyncratic variances are numbers between 0 and 1, where 0 means that the variance is completely explained by common factors and 1 that common factors fail to explain variance.

Table 8 Factor Loadings and Idiosyncratic Variances

          Factor Loadings
          β1           β2           β3           Variance
SUNW      0.656940     0.434420     0.27910      0.301780
AMZN      0.959860     -0.147050    -0.00293     0.057042
MERQ      0.697140     0.499410     -0.08949     0.256570
GD        0.002596     -0.237610    0.43511      0.754220
NOC       -0.174710    -0.119960    0.23013      0.902130
CPB       0.153360     -0.344400    0.13520      0.839590
KO        0.170520     0.180660     -0.46988     0.717500
MLM       0.184870     0.361180     0.28657      0.753250
HLT       0.593540     0.011929     -0.18782     0.612300
UTX       0.385970     0.144390     -0.15357     0.806590
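A quick consistency check on the numbers in Table 8 is possible. Assuming the loadings are expressed on the correlation scale with orthonormal factors (an assumption on our part, although the printed figures appear consistent with it), the diagonal of $\Sigma = \beta \beta' + \Psi$ says that each stock's squared loadings plus its idiosyncratic variance should sum to approximately 1. A minimal NumPy sketch:

```python
import numpy as np

tickers = ["SUNW", "AMZN", "MERQ", "GD", "NOC", "CPB", "KO", "MLM", "HLT", "UTX"]

# Factor loadings (three factors per stock), copied from Table 8.
loadings = np.array([
    [0.656940, 0.434420, 0.27910],
    [0.959860, -0.147050, -0.00293],
    [0.697140, 0.499410, -0.08949],
    [0.002596, -0.237610, 0.43511],
    [-0.174710, -0.119960, 0.23013],
    [0.153360, -0.344400, 0.13520],
    [0.170520, 0.180660, -0.46988],
    [0.184870, 0.361180, 0.28657],
    [0.593540, 0.011929, -0.18782],
    [0.385970, 0.144390, -0.15357],
])

# Idiosyncratic variances, last column of Table 8.
idio_var = np.array([0.301780, 0.057042, 0.256570, 0.754220, 0.902130,
                     0.839590, 0.717500, 0.753250, 0.612300, 0.806590])

# Variance explained by the common factors (sum of squared loadings per stock).
communality = (loadings ** 2).sum(axis=1)
total = communality + idio_var

for name, c, t in zip(tickers, communality, total):
    print(f"{name:5s} common = {c:.4f}  common + idiosyncratic = {t:.4f}")
```

Read this way, AMZN is almost entirely explained by the common factors, whereas for NOC the common factors explain only about a tenth of the variance.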

The p-value turns out to be 0.6808, and we therefore fail to reject the null hypothesis of three factors. Estimating the model with 1 and 2 factors we obtain much lower p-values, while we run into numerical difficulties with 4 or more factors. We can therefore accept the null of three factors. Figure 4 shows the graphics of the three factors.


PCA AND FACTOR ANALYSIS COMPARED

The two illustrations of PCA and FA use the same data and will help clarify the differences between the two methods. Let's first observe that PCA does not imply, per se, any specific restriction on the process. Given a nonsingular covariance matrix, we can always perform PCA as an exact linear transformation of the series. When we consider a smaller number of principal components, we perform an approximation that has to be empirically justified. For example, in our PCA illustration, the first three components explain 76% of the total variance (based on the covariance matrix; see Table 4).

Figure 4 Graph of the three factors

Figure 5 Graphical Representation of Factor Loadings

Factor analysis, on the other hand, assumes that the data have a strict factor structure in the sense that the covariance matrix of the data can be represented as a function of the covariances between factors plus idiosyncratic variances. This assumption has to be verified, otherwise the estimation process might yield incorrect results.

In other words, PCA tends to be a dimensionality reduction technique that can be applied to any multivariate distribution and that yields incremental results. This means that there is a trade-off between the gain in estimation from dimensionality reduction and the percentage of variance explained. Consider that PCA is not an estimation procedure: It is an exact linear transformation of a time series. Estimation comes into play when a reduced number of principal components is chosen and each variable is regressed onto these principal components. At this point, a reduced number of principal components yields a simplified regression, which results in a more robust estimation of the covariance matrix of the original series though only a fraction of the variance is explained.

Factor analysis, on the other hand, tends to reveal the exact factor structure of the data. That is, FA tends to give an explanation in terms of which factors explain which processes. Factor rotation can be useful both in the case of PCA and FA. Consider FA. In our illustration, to make the factor model identifiable, we applied the restriction that factors are orthonormal variables. This restriction, however, might result in a matrix of factor loadings that is difficult to interpret.

For example, if we look at the loading matrix in Table 8, there is no easily recognizable structure, in the sense that each time series is influenced by all factors. Figure 5 shows graphically the relationship of the time series to the factors. In this graphic, each of the 10 time series is represented by its three loadings.

We can try to obtain a better representation through factor rotation. The objective is to create factors such that each series has only one large loading and thus is associated primarily with one factor. Several procedures have been proposed for doing so. For example, if we rotate factors using the "promax" method, we obtain factors that are no longer orthogonal but that often have a better explanatory power. Figure 6 shows graphically the relationship of time series to the factors after rotation. The association of the series to a factor is more evident. This fact can be seen from the matrix of new factor loadings in Table 9, which shows that nearly every stock has one large loading.

Figure 6 Relationship of Time Series to the Factors after Rotation


Table 9 Factor Loadings after Rotation

          F1           F2           F3
SUNW      0.214020     0.750690     0.101240
AMZN      0.943680     0.127310     0.104990
MERQ      0.218340     0.578050     -0.294340
GD        0.163360     0.073269     0.544220
NOC       -0.070130    -0.003990    0.278000
CPB       0.393120     -0.178070    0.301920
KO        0.032397     -0.100020    -0.545120
MLM       -0.137130    0.561640     0.123670
HLT       0.513660     0.048842     -0.168290
UTX       0.229400     0.133510     -0.204650


KEY POINTS 

• Principal component analysis (PCA) and factor analysis are statistical tools used in financial modeling to reduce the number of variables in a model (i.e., to reduce the dimensionality) and to identify a structure in the relationships between variables.

• Factor models seek to explain complex phenomena via a small number of basic causes or factors. In finance these models are typically applied to time series.

• The objective of a factor model in finance is to explain the behavior of a large number of stochastic processes (typically price, returns, or rate processes) in terms of a small number of factors (which themselves are stochastic processes). In financial modeling, factor models are needed not only to explain data but to make estimation feasible.

• Linear factor models are regression models. The coefficients are referred to as factor loadings or factor sensitivities, and they represent the influence of a factor on some variable.

• Principal components analysis is a tool to determine factors with statistical learning techniques when factors are not exogenously given. PCA implements a dimensionality reduction of a set of observations.

• Performing PCA is equivalent to determining the eigenvalues and eigenvectors of the covariance matrix or of the correlation matrix.

• Factor analysis is an alternative technique for determining factors. The estimation procedure is performed in two steps: (1) estimate the covariance matrix and the factor loadings, and (2) estimate factors using the covariance matrix and the factor loadings.

• The covariance matrix can be estimated with maximum likelihood methods, assuming that the variables are jointly normally distributed and temporally independently and identically distributed. The estimation of models with maximum likelihood methods is not immediate because factors are not observable, and consequently iterative methods such as the expectation maximization (EM) algorithm are generally used.
Abstract: Multifactor risk models seek to estimate and characterize the risk of a portfolio, either 
in absolute value or when compared against a benchmark. Risk is typically decomposed into a 
systematic and an idiosyncratic component. Systematic risk captures the exposures the portfolio 
has to broad risk factors. For equity portfolios these are typically countries, industries, fundamental 
(e.g., size), or technical (e.g., momentum). The portfolio systematic risk depends on its exposure to 
these risk factors, the volatility of the factors, and how they correlate with each other. Idiosyncratic 
risk captures the uncertainty associated with news affecting only individual issuers in the portfolio. 
This risk can be diversified by decreasing the importance of individual issuers in the portfolio. 
Intuitive multifactor risk models can provide relevant information regarding the major sources of 
risk in the portfolio. This information can be used to understand the important imbalances of the 
portfolio and guide the portfolio manager in constructing or rebalancing the portfolio. It can also 
be used in interpreting results from return attribution or scenario analysis. 


Risk management is an integral part of the portfolio management process. Risk models are central to this practice, allowing managers to quantify and analyze the risk embedded in their portfolios. Risk models provide managers insight into the major sources of risk in a portfolio, helping them to control their exposures and understand the contributions of different portfolio components to total risk. They help portfolio managers in their decision-making process by providing answers to important questions such as: How does my small-cap exposure affect portfolio risk? Does my underweight in diversified financials hedge my overweight in banks? Risk models are also widely used in various other areas such as in portfolio construction, performance attribution, and scenario analysis.

In this entry, we discuss the structure of multifactor equity risk models and the types of factors used in these models, and we describe certain estimation techniques. We also illustrate the use of equity risk factor models in various applications, namely the analysis of portfolio risk, portfolio construction, scenario analysis, and performance attribution.

Throughout this entry, we will be using the Barclays Global Risk Model¹ for illustration purposes. For completeness, we also refer to other approaches one can take to construct such a model.


MOTIVATION 

In this section, we discuss the motivation behind the multifactor equity risk models. Let's assume that a portfolio manager wants to estimate and analyze the volatility of a large portfolio of stocks. A straightforward idea would be to compute the volatility of the historical returns of the portfolio and use this measure to forecast future volatility. However, this framework does not provide any insight into the relationships between different securities in the portfolio or the major sources of risk. For instance, it does not assist a portfolio manager interested in diversifying her portfolio or constructing a portfolio that has better risk-adjusted performance.

Instead of estimating the portfolio volatility using historical portfolio returns, one could utilize a different strategy. The portfolio return is a function of stock returns and the market weights of these stocks in the portfolio. Using this, the forecasted volatility of the portfolio ($\sigma_p$) can be computed as a function of the weights ($w$) and the covariance matrix ($\Sigma_s$) of stock returns in the portfolio:

$$\sigma_p^2 = w^T \cdot \Sigma_s \cdot w$$

This covariance matrix can be decomposed into individual stock volatilities and the correlations between stock returns. Volatilities measure the riskiness of individual stock returns and correlations represent the relationships between the returns of different stocks. Looking into these correlations and volatilities, the portfolio manager can gain insight into her portfolio, namely the riskiness of different parts of the portfolio or how the portfolio can be diversified. As we outlined above, to estimate the portfolio volatility we need to estimate the correlation between each pair of stocks. Unfortunately, this means that the number of parameters to be estimated grows quadratically with the number of stocks in the portfolio.² For most practical portfolios, the relatively large number of stocks makes it difficult to estimate the relationship between stock returns in a robust way. Moreover, this framework uses the history of individual stock returns to forecast future stock volatility. However, stock characteristics are dynamic and hence using returns from different time periods may not produce good forecasts.³ Finally, the analysis does not provide much insight regarding the broad factors influencing the portfolio. These drawbacks constitute the motivation for the multifactor risk models detailed in this entry.
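As a small numerical illustration of the full-covariance approach just described, the sketch below rebuilds a covariance matrix from volatilities and correlations and evaluates $w^T \cdot \Sigma_s \cdot w$. All inputs are hypothetical values for three stocks chosen for illustration; they do not come from the text or from any risk model.

```python
import numpy as np

# Hypothetical inputs for three stocks (illustrative values only).
w = np.array([0.5, 0.3, 0.2])                 # portfolio weights
vol = np.array([0.30, 0.20, 0.25])            # individual stock volatilities
corr = np.array([[1.0, 0.4, 0.2],
                 [0.4, 1.0, 0.5],
                 [0.2, 0.5, 1.0]])            # pairwise correlations

# Covariance matrix rebuilt from volatilities and correlations:
# cov[i, j] = vol[i] * vol[j] * corr[i, j].
sigma_s = np.outer(vol, vol) * corr

port_var = w @ sigma_s @ w                    # w' Sigma_s w (a variance)
port_vol = np.sqrt(port_var)                  # portfolio volatility

print(f"portfolio variance   = {port_var:.4f}")
print(f"portfolio volatility = {port_vol:.4f}")
```

With only three stocks, three correlations are needed; the difficulty described in the text arises because that count grows with the square of the number of stocks.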

One of the major goals of multifactor risk models is to describe the return of a portfolio using a smaller set of variables, called factors. These factors should be designed to capture broad (systematic) market fluctuations, but should also be able to capture specific nuances of individual portfolios. For instance, a broad U.S. market factor would capture the general movement in the equity market, but not the varying behavior across industries. If our portfolio is heavily biased toward particular industries, the broad U.S. market factor may not allow for a good representation of our portfolio's return.

In the context of factor models, the total return of a stock is decomposed into a systematic and an idiosyncratic component. Systematic return is the component of total return due to movements in common risk factors, such as industry or size. On the other hand, idiosyncratic return can be described as the residual component that cannot be explained by the systematic factors. Under these models, the idiosyncratic return is uncorrelated across issuers. Therefore, correlations across securities are driven by their exposures to the systematic risk factors and the correlation between those factors.

The following equation demonstrates the systematic and the idiosyncratic components of total stock return:

$$r_s = L_s \cdot F + \varepsilon_s$$

The systematic return for security s is the product of the loadings of that security ($L_s$, also called sensitivities) to the systematic risk factors and the returns of these factors ($F$). The idiosyncratic return is given by $\varepsilon_s$. Under these models, the portfolio volatility can be estimated as

$$\sigma_p^2 = L_p^T \cdot \Sigma_F \cdot L_p + w^T \cdot \Omega \cdot w$$

Models represented by equations of this form are called linear factor models. Here $L_p$ represents the loadings of the portfolio to the risk factors (determined as the weighted average of individual stock loadings), $\Sigma_F$ is the covariance matrix of factor returns, $w$ is the vector of security weights in the portfolio, and $\Omega$ is the covariance matrix of idiosyncratic stock returns. Due to the uncorrelated nature of these returns, this covariance matrix is diagonal: all elements outside its diagonal are zero. As a result, the idiosyncratic risk of the portfolio is diversified away as the number of securities in the portfolio increases. This is the diversification benefit attained when combining uncorrelated exposures.
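The two terms of the linear factor model variance can be computed separately, which makes the diversification effect easy to see. The following sketch uses NumPy and purely hypothetical inputs: a single factor with 20% volatility, unit loadings, equal weights, and an idiosyncratic variance of 0.09 for every stock. The function name `factor_model_variance` is illustrative and not taken from any risk model.

```python
import numpy as np

def factor_model_variance(w, L, sigma_f, omega_diag):
    """Return (systematic, idiosyncratic) portfolio variance.

    w          : (N,)   security weights
    L          : (N, K) security loadings on the K factors
    sigma_f    : (K, K) factor covariance matrix
    omega_diag : (N,)   idiosyncratic variances (diagonal of Omega)
    """
    L_p = w @ L                                    # portfolio loadings (weighted average)
    systematic = L_p @ sigma_f @ L_p               # L_p' Sigma_F L_p
    idiosyncratic = np.sum(w ** 2 * omega_diag)    # w' Omega w for a diagonal Omega
    return systematic, idiosyncratic

sigma_f = np.array([[0.04]])                       # one factor with 20% volatility
for n in (10, 100, 1000):
    w = np.full(n, 1.0 / n)                        # equally weighted portfolio
    L = np.ones((n, 1))                            # every stock has unit loading
    omega_diag = np.full(n, 0.09)                  # same idiosyncratic variance for all
    sys_var, idio_var = factor_model_variance(w, L, sigma_f, omega_diag)
    print(f"N = {n:4d}  systematic = {sys_var:.4f}  idiosyncratic = {idio_var:.6f}")
```

In this stylized setting the systematic variance stays at 0.04 regardless of N, while the idiosyncratic variance equals 0.09/N and shrinks toward zero as securities are added.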


For most practical portfolios, the number of factors is significantly smaller than the number of stocks in the portfolio. Therefore, the number of parameters in $\Sigma_F$ is much smaller than in $\Sigma_s$, leading to a generally more robust estimation. Moreover, the factors can be designed in a way that they are relatively more stable than individual stock returns, leading to models with potentially better predictability.
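To make the size difference concrete, recall that a symmetric n x n covariance matrix has n(n + 1)/2 distinct entries (n variances plus n(n - 1)/2 covariances). The snippet below simply evaluates that count; the figures of 1,000 stocks and 50 factors are hypothetical and chosen only for illustration.

```python
def covariance_params(n):
    """Distinct entries of a symmetric n x n covariance matrix."""
    return n * (n + 1) // 2

n_stocks, n_factors = 1000, 50                 # hypothetical sizes
print("Sigma_s:", covariance_params(n_stocks))     # stock-level covariance matrix
print("Sigma_F:", covariance_params(n_factors))    # factor covariance matrix
```

A complete factor model also requires the loadings and the idiosyncratic variances, so this comparison covers only the two covariance matrices mentioned in the text.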

Another important advantage of using linear factor models is the detailed insight they provide into the structure and properties of portfolios. These models characterize stock returns in terms of systematic factors that (can) have intuitive economic interpretations. Linear factor models can provide important insights regarding the major systematic and idiosyncratic sources of risk and return. This analysis can help managers to better understand their portfolios and can guide them through the different tasks they perform, such as rebalancing, hedging, or the tilting of their portfolios. The Barclays Global Risk Model (the model used for illustration throughout this entry) is an example of such a linear factor model.

EQUITY RISK FACTOR MODELS

The design of a linear factor model usually starts with the identification of the major sources of risk embedded in the portfolios of interest. For an equity portfolio manager who invests in various markets across the globe, the major sources of risk are typically country, industry membership, and other fundamental or technical exposures such as size, value, and momentum. The relative significance of these components varies across different regions. For instance, for regional equity risk models in developed markets, industry factors tend to be more important than country factors, although in periods of financial distress country factors become more significant. On the other hand, for emerging markets models the country factor is still considered to be the most important source of risk. For regional models, the relative significance of industry factors depends on the level of financial integration across different local markets in that region. The importance of these factors is also time-varying, depending on the particular time period of the analysis. For instance, country risk used to be a large component of total risk for European equity portfolios. However, country factors have been generally losing their significance in this context due to financial integration in the region as a result of the European Union and a common currency, the euro. This is particularly true for larger European countries. Similarly, the relative importance of industry factors is higher over the course of certain industry-led crises, such as the dot-com bubble burst (2000-2002) and the 2007-2009 banking and credit crisis. As we will see, the relative importance of different risk factors varies also with the particular design and the estimation process chosen to calibrate the model.

A typical global or regional equity risk model has the following structure:

$$r_i = \beta_i^{MKT} \cdot F^{MKT} + \beta_i^{IND} \cdot F^{IND} + \beta_i^{CNT} \cdot F^{CNT} + \sum_{j=1}^{n} L_{ij} \cdot F_j^{FT} + \varepsilon_i$$

where

$r_i$ = the rate of return for stock i
$F^{MKT}$ = the market factor
$F^{IND}$ = the industry factor corresponding to stock i
$F^{CNT}$ = the country factor corresponding to stock i
$\beta_i$ = the exposure (beta) of the stock to the corresponding factor
$F^{FT}$ = the set of fundamental and technical factors
$L_{ij}$ = the loading of stock i to factor $F_j^{FT}$
$\varepsilon_i$ = the residual return for stock i

There are different ways in which these factors can be incorporated into an equity risk model. The choice of a particular model affects the interpretation of the factors. For instance, consider a model that has only market and industry factors. Industry factors in such a model would represent industry-specific moves net of the market return. On the other hand, if we remove the market factor from the equation, the industry factors now incorporate the overall market effect. Their interpretation would change, with their returns now being close to market value-weighted industry indexes. Country-specific risk models are a special case of the previous representation where the country factor disappears and the market factor is represented by the returns of the countrywide market. Macroeconomic factors are also used in some equity risk models, as discussed later.

The choice of estimation process also influences the interpretation of the factors. As an example, consider a model that has only industry and country factors. These factors can be estimated jointly in one step. In this case, both factors represent their own effect net of the other ones. On the other hand, these factors can be estimated in a multistep process (e.g., industry factors estimated in the first step and then the country factors estimated in the second step, using residual returns from the first step). In this case, the industry factors have an interpretation close to the market value-weighted industry index returns, while the country factors would now represent a residual country average effect, net of industry returns. We discuss this issue in more detail in the following section.

Model Estimation 

In terms of the estimation methodology, there 
are three major types of multi-factor equity risk 
models: cross-sectional, time series, and statisti¬ 
cal. All three of these methodologies are widely 
used to construct linear factor models in the eq¬ 
uity space. 4 In cross-sectional models, loadings 
are known and factors are estimated. Examples 
of loadings used in these models are industry 
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membership variables and fundamental secu¬ 
rity characteristics (e.g., the book-to-price ratio). 
Individual stock returns are regressed against 
these security-level loadings in every period, 
delivering estimation of factor returns for that 
period. The interpretation of these estimated 
factors is usually intuitive, although dependent 
on the estimation procedure and on the quality 
of the loadings. In time-series models, factors 
are known and loadings are estimated. Exam¬ 
ples of factors in these models are financial or 
macroeconomic variables, such as market re¬ 
turns or industrial production. Time series of 
individual equity returns are regressed against 
the factor returns, delivering empirical sensitiv¬ 
ities (loadings or betas) of each stock to the risk 
factors. In these models, factors are constructed
and not estimated; therefore, their interpretation
is straightforward. In statistical models
(e.g., principal component analysis), both fac¬ 
tors and loadings are estimated jointly in an 
iterative fashion. The resulting factors are sta¬ 
tistical in nature, not designed to be intuitive. 
That being said, a small set of the statistical fac¬ 
tors can be (and usually are) correlated with 
broad economic factors, such as the market. 
Table 1 summarizes some of the characteristics 
of these models. 
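
As a minimal sketch of the cross-sectional case, the Python snippet below regresses one period of stock returns on known loadings to recover that period's factor returns. The data, the two industry dummies, the standardized book-to-price loading, and the helper names `solve` and `estimate_factor_returns` are illustrative assumptions, not part of any model discussed in this entry.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_factor_returns(loadings, returns):
    """OLS of one period's stock returns on known loadings.

    Solves the normal equations (X'X) f = X'r for the factor returns f.
    """
    n, k = len(loadings), len(loadings[0])
    xtx = [[sum(loadings[i][p] * loadings[i][q] for i in range(n))
            for q in range(k)] for p in range(k)]
    xtr = [sum(loadings[i][p] * returns[i] for i in range(n)) for p in range(k)]
    return solve(xtx, xtr)


# Hypothetical loadings: [technology dummy, energy dummy, book-to-price z-score]
loadings = [
    [1, 0, 1.2],
    [1, 0, -0.4],
    [1, 0, -0.8],
    [0, 1, 0.9],
    [0, 1, 0.1],
    [0, 1, -1.0],
]
# Returns are generated from factor returns (2%, -1%, 0.5%) with no noise,
# so the regression should recover exactly these values.
true_factors = [0.02, -0.01, 0.005]
returns = [sum(x * f for x, f in zip(row, true_factors)) for row in loadings]

factor_returns = estimate_factor_returns(loadings, returns)
```

Repeating this regression in every period produces the time series of factor returns. With real data the returns contain noise, so the estimates only approximate the underlying factor moves.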

An important advantage of cross-sectional 
models is that the number of parameters to be
estimated is generally significantly smaller as
compared to the other two types of models. On 
the other hand, cross-sectional models require a 
much larger set of input data (company-specific 
loadings). Cross-sectional models tend to be 
relatively more responsive as loadings can 
adjust faster to changing market conditions. 
There are also hybrid models, which combine 
cross-sectional and time-series estimation in 
an iterative fashion; these models allow the 
combination of observed and estimated factors. 
Finally, statistical models require only a history 
of security returns as input to the process. They 
tend to work better when economic sources of 
risk are hard to identify and are primarily used 
in high-frequency applications. 
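
To make the statistical approach concrete, the sketch below extracts a first principal component from a covariance matrix of security returns using power iteration. The three-stock covariance matrix (built from hypothetical betas of 0.8, 1.0, and 1.2) and the function name `first_principal_component` are assumptions for illustration. Power iteration is only one of several ways to compute principal components.

```python
import math


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly apply the covariance matrix and renormalize."""
    n = len(cov)
    v = [1.0 / math.sqrt(n)] * n  # starting guess
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v'Cv is the variance captured by this component.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(n) for j in range(n))
    return v, variance


# Hypothetical one-factor world: covariance = market_var * beta beta' + idio_var * I
betas = [0.8, 1.0, 1.2]
market_var, idio_var = 0.04, 0.01
cov = [[market_var * bi * bj + (idio_var if i == j else 0.0)
        for j, bj in enumerate(betas)] for i, bi in enumerate(betas)]

loadings, variance = first_principal_component(cov)
share = variance / sum(cov[i][i] for i in range(len(cov)))
```

In this stylized setting the estimated loadings are proportional to the betas, so the first statistical factor behaves like a market factor, and `share` shows how much of total variance it captures.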

As we mentioned in the previous section, the
estimation process is a major determinant in the
interpretation of factors. Estimating all factors
jointly in a one-step regression allows for a
natural decomposition of total variance in stock
returns. However, it also complicates the
interpretation of factors, as each factor now
represents its own effect net of all other factors.
Moreover, multicollinearity problems arise
naturally in this setup, potentially making the
estimation procedure less robust and leading to
unintuitive factor realizations. This problem can
be serious when using factors that are highly
correlated.


Table 1 Cross-Sectional, Time-Series, and Statistical Factor Models

Model                 Cross-Sectional               Time-Series                   Statistical

Input set             Security-specific loadings    Factor and security returns   Security returns
                      and returns

Factors and loadings  Factors are estimated using   Factors are known (e.g.,      Both factors and loadings
                      the known loadings (e.g.,     market or industrial          are estimated
                      industry beta or              production) and loadings
                      momentum score)               are estimated (e.g.,
                                                    industry or momentum
                                                    betas)

Interpretation        Clean interpretation of       Straightforward               Factors may have no
                      loadings; generally           interpretation of factors     intuitive interpretation
                      intuitive interpretation of
                      factors

Number of parameters  (No. of factors) x (No. of    (No. of securities) x (No. of (No. of securities) x (No. of
                      time periods)                 factors)                      factors)

An alternative in this case is to use a multistep
estimation process where different sets
of factors are estimated sequentially, in sepa¬ 
rate regressions. In the first step, stock returns 
are used in a regression to estimate a certain 
set of factors, and then residual returns from 
this step are used to estimate the second step 
factors, and so on. The choice of the order of 
factors in such estimation influences the nature 
of the factors and their realizations. This choice 
should be guided by the significance and the de¬ 
sired interpretation of the resulting factors. The 
first-step factors have the most straightforward 
interpretation as they are estimated in isolation 
from all other factors using raw stock returns. 
For instance, in a country-specific equity risk 
model where there are industry, fundamental 
and technical factors, the return series of indus¬ 
try factors would be close to the industry index 
returns if they are estimated in isolation in the 
first step. This would not be the case if all in¬ 
dustry, fundamental, and technical factors are 
estimated in the same step. 
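
The difference between one-step and multistep estimation can be sketched with a toy example. The four stocks, their returns, the two industries, and the style loading below are hypothetical, and the regressions are unweighted for simplicity, so the step-one industry factors equal equal-weighted industry averages.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) f = X'y."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    return solve(xtx, xty)


returns = [0.03, 0.01, -0.02, 0.00]
industry = ["A", "A", "B", "B"]
style = [1.0, 0.0, 1.0, -1.0]  # hypothetical style loading

# Step 1: regress raw returns on industry dummies only. With 0/1 dummies and
# equal weights this is simply the average return of each industry.
step1 = {}
for name in sorted(set(industry)):
    members = [r for r, ind in zip(returns, industry) if ind == name]
    step1[name] = sum(members) / len(members)

# Step 2: regress the step-1 residuals on the style loading (no intercept).
residuals = [r - step1[ind] for r, ind in zip(returns, industry)]
step2_style = (sum(x * e for x, e in zip(style, residuals))
               / sum(x * x for x in style))

# Joint estimation: industry and style factors in a single regression.
X = [[1.0 if ind == "A" else 0.0, 1.0 if ind == "B" else 0.0, s]
     for ind, s in zip(industry, style)]
joint_a, joint_b, joint_style = ols(X, returns)
```

Here the step-one factor for industry A equals the industry's average return (2%), while the joint estimate differs (2.2%) because it is net of the style effect.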

An important input to the model estimation 
is the security weights used in the regressions. 
There is a variety of techniques employed in 
practice but generally more weight is assigned 
to less volatile stocks (usually represented by 
larger companies). This enhances the robust¬ 
ness of the factor estimates as stocks from these 
companies tend to have relatively more stable 
return distributions. 

Types of Factors 

In this section, we analyze in more detail the dif¬ 
ferent types of factors typically used in equity 
risk models. These can be classified under five 
major categories: market factors, classification 
variables, firm characteristics, macroeconomic 
variables, and statistical factors. 

Market Factors 

A market factor can be used as an observed
factor in a time-series setting (e.g., in the
capital asset pricing model, the market factor is
the only systematic factor driving returns). As
an example, for a U.S. equity factor model, the
S&P 500 can be used as a market factor, and
the loading to this factor (the market beta) can
be estimated by regressing individual stock
returns on the S&P 500 returns. On the other
hand, in a cross-sectional setting, the market
factor can be estimated by regressing stock
returns on their market betas for each time
period (this beta can be empirical, that is,
estimated via statistical techniques, or set as a
dummy loading, usually 1). When incorporated
into a cross-sectional regression with other
factors, the market factor generally works
as an intercept, capturing the broad average
return for that period. This changes the
interpretation of all other factors to returns
relative to that average (e.g., industry factor
returns would now represent industry-specific
moves net of the market).
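
A short sketch of the time-series case: the market beta is the slope of a regression of stock returns on market returns. The return series below are made up, and `market_beta` is a hypothetical helper name.

```python
def market_beta(stock_returns, market_returns):
    """Slope of an OLS regression (with intercept) of stock on market returns."""
    n = len(market_returns)
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    cov = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Hypothetical monthly market returns, and a stock that moves 1.2x the market
# plus a constant 0.1% per month.
market = [0.012, -0.020, 0.005, 0.031, -0.008, 0.015]
stock = [0.001 + 1.2 * m for m in market]

beta = market_beta(stock, market)
```

Because the stock series is built without noise, the estimated beta recovers 1.2; with real data the estimate depends on the sample window and return frequency.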

Classification Variables 

Industry and country are the most widely used 
classification variables in equity risk models. 
They can be used as observed factors in time- 
series models via country/industry indexes 
(e.g., return series of GICS indexes 5 can be 
used as observed industry factors). In a cross- 
sectional setting, these factors are estimated by 
regressing stock returns on industry/country
betas (either estimated or represented as a 0/1 
dummy loading). These factors constitute a sig¬ 
nificant part of total risk for a majority of eq¬ 
uity portfolios, especially for portfolios tilted 
toward specific industries or countries. 

Firm Characteristics 

Factors that represent firm characteristics can 
be classified as either fundamental or techni¬ 
cal factors. These factors are extensively used 
in equity risk models; exposures to these fac¬ 
tors represent tilts towards major investment 
themes such as size, value, and momentum. 
Fundamental factors generally employ a mix of
accounting and market variables (e.g., accounting
ratios), and technical factors commonly use
return and volume data (e.g., price momentum
or average daily volume traded).

In a time-series setting, these factors can be 
constructed as representative long-short portfo¬ 
lios (e.g., Fama-French factors). As an example, 
the value factor can be constructed by taking a 
long position in stocks that have a large book 
to price ratio and a short position in the stocks 
that have a small book to price ratio. On the 
other hand, in a cross-sectional setup, these fac¬ 
tors can be estimated by regressing the stock 
returns to observed firm characteristics. For in¬ 
stance, a book to price factor can be estimated 
by regressing stock returns to the book to price 
ratios of the companies. In practice, fundamen¬ 
tal and technical factors are generally estimated 
jointly in a multivariate setting. 

A popular technique in the cross-sectional set¬ 
ting is the standardization of the characteris¬ 
tic used as loading such that it has a mean of 
zero and a standard deviation of one. This im¬ 
plies that the loading to the corresponding fac¬ 
tor is expressed in relative terms, making the 
exposures more comparable across the differ¬ 
ent fundamental/technical factors. Also, sim¬ 
ilar characteristics can be combined to form 
a risk index and then this index can be used 
to estimate the relevant factor (e.g., different 
value ratios such as earnings to price and 
book to price can be combined to construct a 
value index, which would be the exposure to 
the value factor). The construction of an in¬ 
dex from similar characteristics can help re¬ 
duce the problem of multicollinearity referred 
to above. Unfortunately, it can also dilute the 
signal each characteristic has, potentially re¬ 
ducing its explanatory power. This trade-off 
should be taken into account while construct¬ 
ing the model. The construction of fundamental 
factors and their loadings requires careful han¬ 
dling of accounting data. These factors tend to 
become more significant for portfolios that are 
hedged with respect to the market or industry 
exposures. 
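
The standardization and risk-index construction described above can be sketched as follows. The characteristic values are hypothetical, and the equal-weighted mean, population standard deviation, and equal weighting of the two ratios are simplifying assumptions; practical models may use capitalization weights and outlier treatment.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a characteristic to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical characteristics for five stocks.
earnings_to_price = [0.08, 0.05, 0.02, 0.10, 0.04]
book_to_price = [0.90, 0.60, 0.20, 1.10, 0.50]

z_ep = standardize(earnings_to_price)
z_bp = standardize(book_to_price)

# Combine the two value ratios into one index, then standardize again so the
# resulting loading is comparable with other standardized loadings.
value_index = standardize([(a + b) / 2 for a, b in zip(z_ep, z_bp)])
```

The resulting `value_index` would serve as each stock's exposure to a single value factor in the cross-sectional regression.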


Macroeconomic Variables 

Macroeconomic factors, representing the state
of the economy, are generally used as observed
factors in time-series models. Widely used
examples include interest rates, commodity
indexes, and market volatility (e.g., the VIX
index). These factors tend to be better suited
for models with a long horizon. For short to
medium horizons, they tend to be relatively
insignificant when included in a model that
incorporates other standard factors such as
industry. The opposite is not true (industry
factors remain significant when macro factors
are added), suggesting that macro factors are
relatively less important for these horizons.
This does not mean that macroeconomic
variables are not relevant in explaining stock
returns; it means that a large majority of
macroeconomic effects can be captured through
the industry factors. Moreover, it is difficult to
directly estimate stock sensitivities to
slow-moving macroeconomic variables. These
considerations lead to the relatively infrequent
use of macro variables in short to medium
horizon risk models. 6

Statistical Factors 

Statistical factors are very different in nature 
from all the aforementioned factors as they do 
not have direct economic interpretation. They 
are estimated using statistical techniques such 
as principal component analysis where both 
factors and loadings are estimated jointly in an 
iterative fashion. Their interpretation can be dif¬ 
ficult, yet in certain cases they can be re-mapped 
to well-known factors. For instance, in a prin¬ 
cipal component analysis model for the U.S. 
equity market, the first principal component 
would represent the U.S. market factor. These 
models tend to have a relatively high in-sample 
explanatory power with a small set of factors 
and the marginal contribution of each factor 
tends to diminish significantly after the first few 
factors. Statistical factors can also be used to 
capture the residual risk in a model with eco¬ 
nomic factors. These factors tend to work better 
when there are unidentified sources of risk such 
as in the case of high-frequency models.

Other Considerations in Factor Models

Various quantitative and qualitative measures
can be employed to evaluate the relative
performance of different model designs. Generically,
better risk models are able to forecast
more accurately the risk of different types of 
portfolios across different economic environ¬ 
ments. Moreover, a better model allows for an 
intuitive analysis of the portfolio risk along the 
directions used to construct and manage the 
portfolio. The relative importance of these con¬ 
siderations should frame how we evaluate 
different models. 

A particular model is defined by its estima¬ 
tion framework and the selection of its factors 
and loadings. Typically, these choices are eval¬ 
uated jointly, as the contributions of specific 
components are difficult to measure in prac¬ 
tice. Moreover, decisions on one of these com¬ 
ponents (partially) determine the choice of the 
others. For instance, if a model uses fundamen¬ 
tal firm characteristics as loadings, it also uses 
estimated factors—more generally, decisions on 
the nature of the factors determine the nature 
of the loadings and vice-versa. 

Quantitative measures of factor selection in¬ 
clude the explanatory power or significance of 
the factor, predictability of the distribution of 
the factor, and correlations between factors. On 
a more qualitative perspective, portfolio man¬ 
agers usually look for models with factors and 
loadings that have clean and intuitive interpre¬ 
tation, factors that correspond to the way they 
think about the asset class, and models that re¬ 
flect their investment characteristics (e.g., short 
vs. long horizon, local vs. global investors). 

Idiosyncratic Risk 

Once all systematic factors and loadings are 
estimated, the residual return can be computed 
as the component of total stock return that 
cannot be explained by the systematic factors. 
Idiosyncratic return (also called residual,
nonsystematic, or name-specific return) can
be a significant component of total return for
individual stocks, but tends to become smaller 
for portfolios of stocks as the number of stocks 
increases and concentration decreases (the 
aforementioned diversification effect). The 
major input to the computation of idiosyncratic 
risk is the set of historical idiosyncratic returns 
of the stock. Because the nature of the company 
may change fast, a good idiosyncratic risk 
model should use only recent and relevant id¬ 
iosyncratic returns. Moreover, recent research 
suggests that there are other conditional vari¬ 
ables that may help improve the accuracy of id¬ 
iosyncratic risk estimates. For instance, there is 
substantial evidence that the market value of a 
company is highly correlated with its idiosyn¬ 
cratic risk, where larger companies exhibit 
relatively smaller idiosyncratic risk. The use 
of such variables as an extra adjustment factor 
can improve the accuracy of idiosyncratic risk 
estimates. 
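
One simple way to emphasize recent idiosyncratic returns is exponential weighting. The sketch below is an assumption about how such weighting could be done, not the method of any particular model; the half-life, the residual series, and the name `ewma_volatility` are illustrative, and residuals are assumed to have zero mean.

```python
import math


def ewma_volatility(residuals, half_life):
    """Exponentially weighted volatility; the most recent observation is last."""
    decay = 0.5 ** (1.0 / half_life)
    n = len(residuals)
    # The latest observation gets weight 1; weights halve every `half_life` periods.
    weights = [decay ** (n - 1 - t) for t in range(n)]
    variance = sum(w * e * e for w, e in zip(weights, residuals)) / sum(weights)
    return math.sqrt(variance)


# Hypothetical residual returns: a calm year followed by a turbulent one.
residuals = [0.01, -0.01] * 6 + [0.03, -0.03] * 6

equal_weighted = math.sqrt(sum(e * e for e in residuals) / len(residuals))
recent_weighted = ewma_volatility(residuals, half_life=3)
```

With these inputs the exponentially weighted estimate sits close to the recent 3% level, while the equal-weighted estimate is pulled down by the older, calmer observations.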

As mentioned before, idiosyncratic returns
of different issuers are assumed to be
uncorrelated. However, different securities from the
same issuer can show a certain level of
co-movement, as they are all exposed to specific
events affecting their common issuer.

Interestingly, this co-movement is not perfect 
or static. Certain news can potentially affect the 
different securities issued by the same company 
(e.g., equity, bonds, or equity options) in differ¬ 
ent ways. Moreover, this relationship changes 
with the particular circumstances of the firm. 
For instance, returns from securities with claims 
to the assets of the firm should be more highly 
correlated if the firm is in distress. A good 
risk model should be able to capture these 
phenomena. 

APPLICATIONS OF EQUITY 
RISK MODELS 

Multifactor equity risk models are employed 
in various applications such as the quantitative 
analysis of portfolio risk, hedging unwanted 
exposures, portfolio construction, scenario
analysis, and performance attribution. In this
section we discuss and illustrate some of these
applications.

Portfolio managers can be divided broadly 
into indexers (those that measure their returns 
relative to a benchmark index) and absolute 
return managers (typically hedge fund man¬ 
agers). In between stand the enhanced index¬ 
ers, those that are allowed to deviate from the 
benchmark index in order to express views, pre¬ 
sumably leading to superior returns. All are 
typically subject to a risk budget that prescribes 
how much risk they are allowed to take to 
achieve their objectives: minimize transaction 
costs and match the index return for the pure 
indexers, maximize the net return for the en¬ 
hanced indexers, or maximize absolute return 
for absolute return managers. In all of these 
cases, the manager has to merge all her views 
and constraints into a final portfolio. 

The investment process of a typical portfo¬ 
lio manager involves several steps. Given the 
investment universe and objective, the steps 
usually consist of portfolio construction, risk 
prediction, and performance evaluation. These 
steps are iterated throughout the investment cy¬ 
cle over each rebalancing period. The examples 
in this section are constructed following these 
steps. In particular, we start with a discussion 
on the portfolio construction process for three 
equity portfolio managers with different goals: 
The first aims to track a benchmark, the second 
to build a momentum portfolio, and the third 
to implement sector views in a portfolio. We 
conduct these exercises through a risk-based 
portfolio optimization approach at a monthly 
rebalancing frequency. For the index-tracking 
portfolio example, we then conduct a careful 
evaluation of its risk exposures and contribu¬ 
tions to ensure that the portfolio manager's 
views and intuition coincide with the actual 
portfolio exposures. Once comfortable with the 
positions and the associated risk, the portfolio 
is implemented. At the end of the monthly in¬ 
vestment cycle, the performance of the portfolio 
and return contributions of its different
components can be evaluated using performance
attribution.

Scenario analysis can be employed in both 
the portfolio construction and the risk eval¬ 
uation phases of the portfolio process. This 
exercise allows the manager to gain additional 
intuition regarding the exposures of her portfo¬ 
lio and how it may behave under particular eco¬ 
nomic circumstances. It usually takes the form 
of stress testing the portfolio under historical 
or hypothetical scenarios. It can also reveal the 
sensitivity of the portfolio to particular move¬ 
ments in economic and financial variables not 
explicitly considered during the portfolio con¬ 
struction process. The last application in this 
entry illustrates this kind of analysis. 

Throughout our discussion, we use a suite 
of global cash equity risk models available 
through POINT®, the Barclays portfolio ana¬ 
lytics and modeling platform. 7 

Portfolio Construction 

Broadly speaking there are two main ap¬ 
proaches to portfolio construction: a formal 
quantitative optimization-based approach and 
a qualitative approach that is based primarily 
on manager intuition and skill. There are many 
variations within and between these two ap¬ 
proaches. In this section, we focus on risk-based 
optimization using a linear factor model. We do 
not discuss other more qualitative or nonrisk- 
based approaches (e.g., a stratified sampling). A 
common objective in a risk-based optimization 
exercise is the minimization of volatility of the 
portfolio, either in isolation or when evaluated 
against a benchmark. In the context of multifac¬ 
tor risk models, total volatility is composed of 
a systematic and an idiosyncratic component, 
as described above. Typically, both of these 
components are used in the objective function 
of the optimization problems. We demonstrate 
three different portfolio construction exercises 
and discuss how equity factor models are em¬ 
ployed in this endeavor. The examples were 
constructed using the POINT® Optimizer. 8 All
optimization problems were run as of July 30,
2010.

Tracking an Index 

In our first example, we study the case of a port¬ 
folio manager whose goal is to create a port¬ 
folio that tracks a benchmark equity index as 
closely as possible, using a limited number of 
stocks. This is a very common problem in the 
investment industry since most assets under 
management are benchmarked to broad mar¬ 
ket indexes. Creating a benchmark-tracking 
portfolio provides a good starting point for 
implementing strategic views relative to that 
benchmark. For example, a portfolio manager 
might have a mandate to outperform a bench¬ 
mark under particular risk constraints. One 
way to implement this mandate is to dynam¬ 
ically tilt the tracking portfolio toward certain 
investment styles based on views on the future 
performance of those styles at a particular point 
in the business cycle. 

Consider a portfolio manager who is benchmarked
to the S&P 500 index and aims to build a
tracking portfolio composed of long-only
positions from the set of S&P 500 stocks. Because of
transaction cost and position management
limitations, the portfolio manager is restricted to a
maximum of 50 stocks in the tracking
portfolio. Her objective is to minimize the
tracking error volatility (TEV) between her
portfolio and the benchmark. Tracking error
volatility can be described as the volatility of the
return differential between the portfolio and the
benchmark (i.e., it measures a typical movement
in this net position). A portfolio's TEV is
commonly referred to as the risk or the (net)
volatility of the portfolio.

As mentioned before, the total TEV is
decomposed into a systematic TEV and an
idiosyncratic TEV. Moreover, because these two
components are assumed to be independent,

$$\text{Total TEV} = \sqrt{\text{Systematic TEV}^2 + \text{Idiosyncratic TEV}^2}$$


Table 2 Total Risk of Index-Tracking Portfolio vs. the Benchmark (bps/month)

Attribute            Realized Value

Total TEV            39.6
Idiosyncratic TEV    35.8
Systematic TEV       16.9


The minimization of systematic TEV is achieved 
by setting the portfolio's factor exposures (net 
of benchmark) as close to zero as possible, 
while respecting other potential constraints of 
the problem (e.g., maximum number of 50 se¬ 
curities in the portfolio). The minimization of 
idiosyncratic volatility is achieved through the 
diversification of the portfolio holdings. 

Table 2 illustrates the total risk for portfolio 
versus benchmark that comes out of the opti¬ 
mization problem. We see that total TEV of the 
net position is 39.6 bps/month with 16.9 bps/ 
month of systematic TEV and 35.8 bps/month 
of idiosyncratic TEV. If the portfolio manager 
wants to reduce her exposure to name-specific 
risk, she can increase the upper bound on the 
number of securities picked by the optimizer to 
construct the optimal portfolio (increasing the 
diversification effect). Another option would 
be to increase the relative weight of idiosyn¬ 
cratic TEV compared to the systematic TEV 
in the objective function. The portfolio result¬ 
ing from this exercise would have smaller id¬ 
iosyncratic risk but, unfortunately, would also 
have higher systematic risk. This trade-off can 
be managed based on the portfolio manager's 
preferences. 
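
As a quick check of the Table 2 figures, the snippet below combines the systematic and idiosyncratic TEV under the independence assumption. Only the two component values come from the table; the variable names are illustrative.

```python
import math

systematic_tev = 16.9      # bps/month, from Table 2
idiosyncratic_tev = 35.8   # bps/month, from Table 2

# Independent components add in variance, not in volatility.
total_tev = math.hypot(systematic_tev, idiosyncratic_tev)
idiosyncratic_share = idiosyncratic_tev ** 2 / total_tev ** 2

print(f"Total TEV: {total_tev:.1f} bps/month")  # prints "Total TEV: 39.6 bps/month"
print(f"Idiosyncratic share of variance: {idiosyncratic_share:.0%}")  # prints 82%
```

The result matches the 39.6 bps/month reported in the table, and it shows that name-specific risk accounts for most of the tracking error variance of this 50-stock portfolio.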

Figure 1 depicts the distribution of the 
position amount for individual stocks in the 
portfolio. We can see that the portfolio is well 
diversified across the 50 constituent stocks with 
no significant concentrations in any of the indi¬ 
vidual positions. The largest stock position is 
4.1%, about three times larger than the small¬ 
est holding. Later in this entry, we analyze 
the risk of this particular portfolio in more 
detail.

Figure 1 Position Amount of Individual Stocks in the Optimal Tracking Portfolio
(horizontal axis: stocks in the optimal portfolio)


Constructing a Factor-Mimicking 
Portfolio 

Factor-mimicking portfolios allow portfolio 
managers to capitalize on their views on vari¬ 
ous investment themes. For instance, the portfo¬ 
lio manager may forecast that small-cap stocks 
will outperform large-cap stocks or that value 
stocks will outperform growth stocks in the 
near future. By constructing long-short factor- 
mimicking portfolios, managers can place 
positions in line with their views on these in¬ 
vestment themes without taking explicit direc¬ 
tional views on the broader market. 

Considering another example, suppose our 
portfolio manager forecasts that recent winner 
(high momentum) stocks will outperform recent
losers (low momentum). To implement her
views, she constructs two portfolios, one with
winner stocks and one with loser stocks (100 
stocks from the S&P 500 universe in each port¬ 
folio). She then takes a long position in the win¬ 
ners portfolio and a short position in the losers 
portfolio. While a sensible approach, a long- 
short portfolio constructed in this way would 
certainly have exposures to risk factors other 
than momentum. For instance, the momentum 
view might implicitly lead to unintended sec¬ 
tor bets. If the portfolio manager wants to un¬ 
derstand and potentially limit or avoid these 
exposures, she needs to perform further anal¬ 
ysis. The use of a risk model will help her 
substantially. 

To illustrate this point, Table 3 shows one
of POINT®'s risk model outputs: the 10
largest risk factor exposures by their
contribution to TEV (last column in the table)
for the initial long-short portfolio. While
momentum has the largest contribution to
volatility, other risk factors also play a
significant role. As a result, major moves in
risk factors other than momentum can have
a significant, and potentially unintended,
impact on the portfolio's return.

Table 3 Largest Risk Factor Exposures for the Momentum Winners/Losers Portfolio (bps/month)

Factor Name                                 Sensitivity/Exposure  Net Exposure  Factor Volatility  Contribution to TEV

EQUITIES DEVELOPED MARKETS
U.S. Equity Energy                          Empirical beta              -0.094                651                 25.3
U.S. Equity Materials                       Empirical beta              -0.045                808                 15.9
U.S. Equity CYC Media                       Empirical beta               0.027                759                 -9.9
U.S. Equity FIN Banks                       Empirical beta               0.088                900                 13.0
U.S. Equity FIN Diversified Financials      Empirical beta              -0.108                839                 39.6
U.S. Equity FIN Real Estate                 Empirical beta               0.100                956                -19.0
U.S. Equity TEC Software                    Empirical beta              -0.057                577                 17.2
U.S. Equity TEC Semiconductors              Empirical beta              -0.029                809                  9.9
U.S. Equity Corporate Default Probability   CDP                         -0.440                 76                 23.2
U.S. Equity Momentum (9m)                   Momentum                     1.491                 73                 74.9
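
The text does not spell out how the "Contribution to TEV" column is computed. One common definition, assumed here purely for illustration, is the Euler decomposition, in which each factor's contribution is its exposure times its marginal covariance term, divided by total TEV, so that the contributions add up to the total. The two factors, their correlation, and the idiosyncratic variance below are hypothetical.

```python
import math


def tev_contributions(exposures, factor_cov, idio_var):
    """Euler decomposition of total TEV into factor and idiosyncratic parts."""
    k = len(exposures)
    # Marginal term (Sigma x)_i for each factor i.
    sigma_x = [sum(factor_cov[i][j] * exposures[j] for j in range(k))
               for i in range(k)]
    systematic_var = sum(x * s for x, s in zip(exposures, sigma_x))
    total_tev = math.sqrt(systematic_var + idio_var)
    factor_contrib = [x * s / total_tev for x, s in zip(exposures, sigma_x)]
    idio_contrib = idio_var / total_tev
    return total_tev, factor_contrib, idio_contrib


# Hypothetical two-factor example (volatilities in bps/month).
vols = [73.0, 651.0]          # e.g., a style factor and an industry factor
corr = -0.3                   # assumed correlation between the two factors
cov_12 = corr * vols[0] * vols[1]
factor_cov = [[vols[0] ** 2, cov_12], [cov_12, vols[1] ** 2]]
exposures = [1.5, -0.09]
idio_var = 40.0 ** 2

total_tev, factor_contrib, idio_contrib = tev_contributions(
    exposures, factor_cov, idio_var)
```

Under this definition a factor's contribution depends on its correlations with the other exposures as well as its own volatility, which is why a contribution can be negative when an exposure partly hedges the rest of the portfolio.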

Given this information, suppose our port¬ 
folio manager decides to avoid these expo¬ 
sures to the extent possible. She can do that 
by setting all exposures to factors other than 
momentum to zero (these types of constraints
may not always be feasible and one may need 
to relax them to achieve a solution). More¬ 
over, because she wants the portfolio to rep¬ 
resent a pure systematic momentum effect, 
she seeks to minimize idiosyncratic risk. There 
are many ways to implement these additional 
goals, but increasingly portfolio managers are 
turning to risk models (using an optimization 
engine) to construct their portfolios in a ro¬ 
bust and convenient way. She decides to set 
up an optimization problem where the objec¬ 
tive function is the minimization of idiosyn¬ 
cratic risk. The tradable universe is the set of 
S&P 500 stocks and the portfolio is constructed 
to be dollar-neutral. This problem also incor¬ 
porates the aforementioned factor exposure 
constraints. 
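
A stripped-down version of this optimization can be solved in closed form when all constraints are equalities. The sketch below minimizes idiosyncratic variance subject to dollar neutrality, unit momentum exposure, and zero exposure to one industry. The six stocks, their loadings and variances, and the function names are hypothetical, and the example ignores the practical constraints a commercial optimizer would also handle.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def min_idiosyncratic_portfolio(idio_var, constraints, targets):
    """Minimize sum_i w_i^2 * idio_var_i subject to constraints @ w = targets.

    With D = diag(idio_var) and constraint matrix A, the solution is
    w = D^-1 A' (A D^-1 A')^-1 b.
    """
    n, k = len(idio_var), len(constraints)
    gram = [[sum(constraints[p][i] * constraints[q][i] / idio_var[i]
                 for i in range(n)) for q in range(k)] for p in range(k)]
    mu = solve(gram, targets)
    return [sum(mu[p] * constraints[p][i] for p in range(k)) / idio_var[i]
            for i in range(n)]


momentum = [1.5, 0.8, 0.1, -0.3, -0.9, -1.2]    # standardized loadings
energy = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]         # industry dummy
idio_var = [0.04, 0.02, 0.03, 0.02, 0.05, 0.03]

constraints = [[1.0] * 6, momentum, energy]
targets = [0.0, 1.0, 0.0]  # dollar neutral, unit momentum, no energy exposure

weights = min_idiosyncratic_portfolio(idio_var, constraints, targets)
```

The resulting weights are long high-momentum names and short low-momentum names while netting out the industry exposure, which is the essence of a factor-mimicking portfolio.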


The resulting portfolio (not shown) has ex¬ 
actly the risk factor exposures that were spec¬ 
ified in the problem constraints. It exhibits a 
relatively low idiosyncratic TEV. Figure 2 de¬ 
picts the largest 10 positions on the long and 
short sides of the momentum factor-mimicking 
portfolio; we see that there are no significant 
individual stock concentrations. 

Implementing Sector Views

For our final portfolio construction example,
let's assume we are entering a recessionary 
environment. An equity portfolio manager 
forecasts that the consumer staples sector will 
outperform the consumer discretionary sector 
in the near future, so she wants to create a 
portfolio to capitalize on this view. One sim¬ 
ple idea would be to take a long position in the 
consumer staples sector (NCY: noncyclical) and 
a short position in the consumer discretionary 
sector (CYC: cyclical) by using, for example, sec¬ 
tor ETFs. Similar to the previous example, this 
could result in exposures to risk factors other 
than the industry factors. Table 4 illustrates 
the exposure of this long-short portfolio to the 
risk factors in the POINT® U.S. equity risk 
model. As we can see in the table, the portfo¬ 
lio has significant net exposures to certain fun¬ 
damental and technical factors, such as share 
turnover.

Figure 2 Largest 10 Positions on Long and Short Sides for the Momentum Portfolio

Table 4 Factor Exposures and Contributions for Consumer Staples vs. Consumer Discretionary Portfolio (bps/month)

Factor Name                                 Sensitivity/Exposure  Net Exposure  Factor Volatility  Contribution to TEV

CURRENCY
USD (U.S. dollar)                           Market weight (%)             0.00                  0                  0.0

EQUITIES DEVELOPED MARKETS
U.S. Equity CYC Automobiles                 Empirical beta              -0.069              1,086                 60.3
U.S. Equity CYC Consumer Durables           Empirical beta              -0.093                822                 59.1
U.S. Equity CYC Consumer Services           Empirical beta              -0.140                690                 71.1
U.S. Equity CYC Media                       Empirical beta              -0.292                759                172.8
U.S. Equity CYC Retailing                   Empirical beta              -0.317                745                185.1
U.S. Equity NCY Retailing                   Empirical beta               0.226                404                -44.9
U.S. Equity NCY Food                        Empirical beta               0.546                418                -96.4
U.S. Equity NCY Household                   Empirical beta               0.236                415                -55.3
U.S. Equity Total Yield                     Total yield                  0.269                 36                 -3.7
U.S. Equity Corporate Default Probability   CDP                         -0.201                 76                  9.0
U.S. Equity Share Turnover Rate             Share turnover              -0.668                 59                -10.3
U.S. Equity Momentum (9m)                   Momentum                    -0.144                 73                 -5.6
U.S. Equity Discretionary Accruals          Accruals                    -0.020                 31                 -0.2
U.S. Equity Market Value                    Size                         0.193                111                  1.7
U.S. Equity Realized Volatility             Realized volatility         -0.619                 97                  5.5
U.S. Equity Earnings to Price               Earnings-Price               0.024                 44                  0.0
U.S. Equity Book to Price                   Book-Price                  -0.253                 40                  5.8
U.S. Equity Earnings Forecast               Earnings forecast            0.038                 67                 -0.4


Suppose the portfolio manager decides to
limit exposures to fundamental and technical
factors. We can again use the optimizer to
construct a long-short portfolio, with an exposure
(beta) of 1 to the consumer staples sector and
a beta of -1 to the consumer discretionary
sector. To limit the exposure to fundamental and
technical risk factors, we further constrain the
exposure to each of these factors to be between
-0.2 and 0.2. 9 We also restrict the portfolio to
be dollar neutral, and allow for only long
positions in the consumer staples stocks and for
only short positions in consumer discretionary
stocks. Finally, we restrict the investment
universe to the members of the S&P 500 index. 10
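
The constraint set just described can be written down as a simple feasibility check for a candidate portfolio. This is only a sketch: it verifies constraints rather than solving the optimization, it treats sector exposure as the sum of weights in the sector (a 0/1 dummy loading, whereas the model in the text uses empirical betas), and all names and data are hypothetical.

```python
def check_sector_view_portfolio(weights, sectors, style_loadings,
                                bound=0.2, tol=1e-9):
    """Return a dict mapping each constraint name to True/False."""
    staples = [w for w, s in zip(weights, sectors) if s == "staples"]
    discretionary = [w for w, s in zip(weights, sectors) if s == "discretionary"]
    checks = {
        "dollar_neutral": abs(sum(weights)) <= tol,
        "staples_exposure_is_plus_one": abs(sum(staples) - 1.0) <= tol,
        "discretionary_exposure_is_minus_one": abs(sum(discretionary) + 1.0) <= tol,
        "long_only_staples": all(w >= 0 for w in staples),
        "short_only_discretionary": all(w <= 0 for w in discretionary),
    }
    # Each fundamental/technical exposure must lie within [-bound, bound].
    for name, loading in style_loadings.items():
        exposure = sum(w * x for w, x in zip(weights, loading))
        checks[f"{name}_within_bounds"] = -bound <= exposure <= bound
    return checks


# Hypothetical four-stock candidate portfolio.
weights = [0.6, 0.4, -0.5, -0.5]
sectors = ["staples", "staples", "discretionary", "discretionary"]
style_loadings = {
    "momentum": [0.5, -0.2, 0.3, 0.1],
    "share_turnover": [1.0, 0.5, -0.5, -0.2],
}

checks = check_sector_view_portfolio(weights, sectors, style_loadings)
```

In this example the candidate satisfies the sector, sign, and dollar-neutrality constraints and the momentum bound, but its share turnover exposure falls outside the allowed range, so an optimizer would have to adjust the weights.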

The resulting portfolio consists of 69 securities 
(approximately half of discretionary and sta¬ 
ples stocks in S&P 500) with 31 long positions 
in the consumer staples stocks and 38 short 
positions in consumer discretionary stocks. 
Table 5 depicts the factor exposures for this 
portfolio. As we can see in the table, the sum
of the exposures to the industry factors is 1
for the consumer staples stocks and -1 for the
consumer discretionary stocks. Exposures to
fundamental and technical factors are gener¬ 
ally significantly smaller when compared to 
the previous table, limiting the adverse effects 
of potential moves in these factors. Interest¬ 
ingly, no stocks from the automobiles indus¬ 
try are selected in the optimal portfolio, poten¬ 
tially due to excessive idiosyncratic risk of firms 
in that particular industry. The contribution to 
volatility from the cyclical sector is higher than 
that from the non-cyclical sector, due to higher 
volatility of industry factors in the former. 

Table 5 Factor Exposures and Contributions for the Optimal Sector View Portfolio (bps/month)

Factor Name                                 Sensitivity/Exposure  Net Exposure  Factor Volatility  Contribution to TEV

EQUITIES DEVELOPED MARKETS
U.S. Equity CYC Consumer Durables           Empirical beta              -0.118                822                 80.6
U.S. Equity CYC Consumer Services           Empirical beta              -0.222                690                124.9
U.S. Equity CYC Media                       Empirical beta              -0.242                759                151.0
U.S. Equity CYC Retailing                   Empirical beta              -0.417                745                264.2
U.S. Equity NCY Retailing                   Empirical beta               0.287                404                -67.9
U.S. Equity NCY Food                        Empirical beta               0.497                418               -112.6
U.S. Equity NCY Household                   Empirical beta               0.216                415                -56.2
U.S. Equity Total Yield                     Total yield                 -0.059                 36                  0.8
U.S. Equity Corporate Default Probability   CDP                         -0.042                 76                  1.7
U.S. Equity Share Turnover Rate             Share turnover              -0.196                 59                 -3.9
U.S. Equity Momentum (9m)                   Momentum                    -0.138                 73                 -4.7
U.S. Equity Discretionary Accruals          Accruals                    -0.014                 31                 -0.1
U.S. Equity Market Value                    Size                        -0.011                111                 -0.1
U.S. Equity Realized Volatility             Realized volatility         -0.199                 97                 -0.1
U.S. Equity Earnings to Price               Earnings-Price               0.027                 44                  0.0
U.S. Equity Book to Price                   Book-Price                  -0.070                 40                  1.4
U.S. Equity Earnings Forecast               Earnings forecast            0.085                 67                 -1.1

The bounds used for the fundamental and
technical factor exposures in the portfolio
construction process were set to force a reduction
in the exposure to these factors. However, there
is a trade-off between having smaller
exposures and having smaller idiosyncratic risk in
the final portfolio. The resolution of this
trade-off depends on the preferences of the portfolio
manager. When the bounds are more restrictive,
we are also decreasing the feasible set of
solutions available to the problem and therefore
potentially achieving a higher idiosyncratic
risk (remember that the objective is the mini¬ 
mization of idiosyncratic risk). In our example, 
the idiosyncratic TEV of the portfolio increases 
from 119 bps/month (before the optimization) 
to 158 bps/month on the optimized portfolio. 
This change is the price paid for the ability to 
limit certain systematic risk factor exposures. 

Analyzing Portfolio Risk Using Multifactor Models

Now that we have seen examples of using multifactor equity models for
portfolio construction and briefly discussed their risk outcomes, we take a
more in-depth look at portfolio risk. Risk analysis based on multifactor
models can take many forms, from a relatively high-level aggregate approach to
an in-depth analysis of the risk properties of individual stocks and groups of
stocks. Multifactor equity risk models provide the tools to perform the
analysis of portfolio risk in many different dimensions, including exposures
to risk factors, security factor contributions to total risk, analysis at the
ticker level, and scenario analysis. In this section, we provide an overview
of such detailed analysis using the S&P 500 index tracker example we created
in the previous section.

Table 6 Net Market Weights and Risk Contributions by Sector (bps/month)

                          Net Market    Contribution to TEV (CTEV)
Sector                    Weight (%)    Systematic   Idiosyncratic   Total
Total                        0.0           7.2           32.7         39.8
Energy                       1.4           1.3            4.4          5.7
Materials                   -2.1           1.0            1.3          2.3
Industrials                  2.1           0.3            3.8          4.1
Consumer discretionary      -3.6           1.7            4.7          6.3
Consumer staples            -0.5           0.5            2.2          2.6
Health care                 -3.3           1.3            2.2          3.4
Financials                   0.1           0.6            7.0          7.5
Information tech             5.2           0.6            5.5          6.1
Telecom services             2.4           0.2            0.8          1.0
Utilities                   -1.7          -0.1            0.9          0.8

Recall from Table 2 that the TEV of the optimized S&P 500 tracking portfolio
was 39.6 bps/month, composed mostly of idiosyncratic risk (35.8 bps/month) and
a relatively small amount of systematic risk (16.9 bps/month). To analyze
further the source of these numbers, we first compare the holdings of the
portfolio with those of the benchmark and then study the impact of the
mismatch on the risk of the net position (Portfolio-Benchmark). The first
column in Table 6 shows the net market weights (NMW) of the portfolio at the
sector level (GICS level 1). Our portfolio appears to be well balanced with
respect to the benchmark from a net market weight perspective. The largest
market value discrepancies are an overweight in information technology (+5.2%)
and an underweight in consumer discretionary (-3.6%) and health care (-3.3%)
companies. However, the sector with the largest contribution to overall risk
(contribution to TEV, or CTEV) is financials (7.5 bps/month). This may seem
unexpected, given the small NMW of this sector (0.1%). This result is
explained by the fact that contributions to risk (CTEV) are dependent on the
net market weight of an exposure, its risk, and also the correlation between
the different exposures. Looking into the decomposition of the CTEV, the table
also shows that most of the total contribution to risk from financials is
idiosyncratic (7.0 bps/month). This result is due to the small number of
securities our portfolio has in this sector and the underlying high volatility
of these stocks. In short, the diversification benefits across financial
stocks are small in our portfolio: We could potentially significantly reduce
total risk by constructing our financials exposure using more names. Note that
this analysis is only possible with a risk model.
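As a small worked check of the totals quoted above, the sketch below combines the systematic and idiosyncratic components into a total TEV and splits that total into two additive contributions. It assumes the two components are uncorrelated and uses the convention that each contribution equals the component's variance divided by total TEV; that convention is an assumption here, not something the text states.

```python
import math


def tev_decomposition(systematic, idiosyncratic):
    """Return (total TEV, systematic contribution, idiosyncratic contribution).

    Assumes the two components are uncorrelated, so variances add.
    Contributions use the variance-share convention (an assumption).
    """
    total = math.hypot(systematic, idiosyncratic)  # sqrt(sys^2 + idio^2)
    return total, systematic ** 2 / total, idiosyncratic ** 2 / total


# Figures quoted in the text for the S&P 500 tracking portfolio (bps/month).
total, ctev_sys, ctev_idio = tev_decomposition(16.9, 35.8)
print(round(total, 1), round(ctev_sys, 1), round(ctev_idio, 1))
```

Under these assumptions the total comes out at about 39.6 bps/month and the systematic contribution at about 7.2, in line with the figures quoted. The idiosyncratic contribution comes out slightly below the 32.7 reported in Table 6, so the report's exact convention or rounding may differ from this sketch.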

Table 7 Additional Risk Measures by Sector

                          Isolated TEV   Liquidation Effect   TEV Elasticity   Systematic
Sector                    (bps/month)    on TEV (bps/month)   (x100) (bps)     Beta (bps)
Total                        39.64           -39.64              100.00           1.00
Energy                       13.94            -3.38               14.25           0.89
Materials                    16.94             1.29                5.74           1.25
Industrials                  20.99             1.41               10.27           1.20
Consumer discretionary       29.34             4.25               15.89           1.11
Consumer staples             10.70            -1.20                6.59           0.70
Health care                  17.37             0.37                8.56           0.65
Financials                   20.77            -2.19               18.93           1.34
Information tech             31.90             6.20               15.30           0.99
Telecom services             11.58             0.67                2.53           0.76
Utilities                    10.30             0.56                1.93           0.79

Table 7 highlights additional risk measures by sector. What we see in the
first column is the isolated TEV, that is, the risk associated with the stocks
in that particular sector only. On an isolated basis, the information
technology sector has the highest risk in the portfolio. This top position in
terms of isolated risk does not translate into the highest contribution to
overall portfolio risk, as we saw in Table 6. The discrepancy between isolated
risk numbers and contributions to risk is explained by the correlation between
the exposures and allows us to understand the potential hedging effects
present across our portfolio. The liquidation effect reported in the table
represents the change in TEV when we completely hedge that particular
position, that is, enforce zero net exposure to any stock in that particular
sector. Interestingly, eliminating our exposure to information technology
stocks would actually increase our overall portfolio risk by 6.2 bps/month.
This happens because the overweight in this sector is effectively hedging out
risk contributions from other sectors. If we eliminate this exposure, the
portfolio balance is compromised. The TEV elasticity reported gives an
additional perspective regarding how the TEV in the portfolio changes when we
change the exposure to that sector. Specifically, it tells us the percentage
change in TEV for each 1% change in our exposure to that particular sector.
For example, if we double our exposure to the energy sector, our TEV would
increase by 14.25% (from 39.6 bps/month to 45.2 bps/month). Finally, the
report estimates the portfolio to have a beta of 1.00 to the benchmark, which
is, of course, in line with our index tracking objective. The beta statistic
measures the comovement between the systematic risk drivers of the portfolio
and the benchmark and should be interpreted only as that. In particular, a low
portfolio beta (relative to the benchmark) does not imply low portfolio risk.
It signals relatively low systematic comovement between the two universes or a
relatively high idiosyncratic risk for the portfolio. For example, if the
sources of systematic risk from the portfolio and the benchmark are distinct,
the portfolio beta is close to zero. The report also provides the systematic
beta associated with each sector. For instance, we see that a movement of 1%
in the benchmark leads to a 1.34% return in the financials component of our
portfolio. As expected, consumer staples and health care are two low beta
industries, as they tend to be more stable through the business cycle
(note 11).
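The measures discussed above (isolated TEV, contribution to TEV, liquidation effect, and TEV elasticity) can be illustrated on a toy example. The sketch below assumes that total tracking-error variance is a quadratic form in the net sector exposures, and it uses a hypothetical three-sector covariance matrix and exposure vector. The definitions are standard first-order risk analytics and are not necessarily the exact formulas behind the report.

```python
import math


def tev(exposures, cov):
    """Tracking error volatility: sqrt(e' * cov * e)."""
    n = len(exposures)
    variance = sum(exposures[i] * cov[i][j] * exposures[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


def sector_risk_report(exposures, cov):
    """Per-sector isolated TEV, CTEV, liquidation effect, and elasticity."""
    n = len(exposures)
    total = tev(exposures, cov)
    rows = []
    for i in range(n):
        # Marginal TEV: derivative of total TEV with respect to exposure i.
        marginal = sum(cov[i][j] * exposures[j] for j in range(n)) / total
        ctev = exposures[i] * marginal
        # Risk of the sector position taken on its own.
        isolated = abs(exposures[i]) * math.sqrt(cov[i][i])
        # Change in TEV if the net exposure to this sector is set to zero.
        hedged = list(exposures)
        hedged[i] = 0.0
        liquidation = tev(hedged, cov) - total
        rows.append({"isolated": isolated, "ctev": ctev,
                     "liquidation": liquidation, "elasticity": ctev / total})
    return total, rows


# Hypothetical inputs: sector 1 is a short position that partly hedges sector 0.
cov = [[100.0, 60.0, 20.0],
       [60.0, 144.0, 30.0],
       [20.0, 30.0, 64.0]]
exposures = [1.0, -0.5, 0.3]

total_tev, report = sector_risk_report(exposures, cov)
```

In this toy case, removing the hedging position in sector 1 raises total TEV, which mirrors the information technology result in Table 7. The elasticity defined this way equals CTEV divided by total TEV; as a rough consistency check, the Table 6 CTEV for energy (5.7) over the Table 6 total (39.8) is about 14.3%, close to the 14.25 reported in Table 7.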

Although important, the information we examined so far is still quite
aggregated. For instance, we know from Table 6 that a large component of
idiosyncratic risk comes from financials. But what names are contributing
most? What are the most volatile sectors? How are systematic exposures
distributed within each sector? Risk models should be able to provide answers
to all these questions, allowing for a detailed view of the portfolio's risk
exposures and contributions. As an example, Table 8 displays all systematic
risk factors the portfolio or the benchmark loads onto. It also provides the
portfolio, benchmark, and net exposures for each risk factor, the volatility
of each of these factors, and their contributions to total TEV. The table
shows that the net exposures to the risk factors are generally low, meaning
that the tracking portfolio has small active exposures. This finding is in
line with the evidence from Table 2, where we see that the systematic risk is
small (16.9 bps/month). If we look into the contributions of individual
factors to total TEV, the table shows that the top contributors are the size,
share turnover, and realized volatility factors. The optimal index tracking
portfolio tends to be composed of very large-cap names within the specified
universe, and that explains the net positive loading to the market value
(size) factor. This portfolio tilt is due to the generally low idiosyncratic
risk large companies have. This is seen favorably by the optimization engine,
as it tries to minimize idiosyncratic risk. This same tilt would explain our
net exposure to both the share turnover and realized volatility factors, as
larger companies tend to have lower realized volatility and share turnover
too. Interestingly, industry factors have relatively small contributions to
TEV, even though they exhibit significantly higher volatilities. This results
from the fact that the optimization engine specifically targets these factors
because of their high volatility and is successful in minimizing net exposure
to industry factors in the final portfolio.

U.S. Equity Realized Volatility      Realized volatility  -0.21  -0.08  -0.13  97  2.38
U.S. Equity Earnings to Price        Earnings-Price        0.09   0.04   0.05  44  0.19
U.S. Equity Book to Price            Book-Price           -0.06  -0.03  -0.03  40  0.02
U.S. Equity Earnings Forecast        Earnings forecast     0.08   0.05   0.03  67  0.12
U.S. Equity Other Market Volatility  Market weight         1.00   1.00   0.00  17  0.00

Finally, remember from Table 2 that the largest component of the portfolio
risk comes from name-specific exposures. Therefore, it is important to be
aware of which individual stocks in our portfolio contribute the most to
overall risk. Table 9 shows the set of stocks in our portfolio with the
largest idiosyncratic risk. The portfolio manager can use this information as
a screening device to filter out undesirable positions with high idiosyncratic
risk and to make sure her views on individual firms translate into risk as
expected. In particular, the list in the table should only include names about
which the portfolio manager has strong views, either positive (expressed with
a positive NMW) or negative (in which case we would expect a short net
position).

It should be clear from the above examples that although the factors used to
measure risk are predetermined in a linear factor model, there is a large
amount of flexibility in the way the risk numbers can be aggregated and
reported. Instead of sectors, we could have grouped risk by any other
classification of individual stocks, for example, by regions or market
capitalization. This allows the risk to be reported using the same investment
philosophy underlying the portfolio construction process (note 12), regardless
of the underlying factor model.


There are also many other risk analytics available, not mentioned in this
example, that give additional detail about specific risk properties of the
portfolio and the constituents. We have only discussed total, systematic, and
idiosyncratic risk (which can be decomposed into risk contributions on a
flexible basis), and referred to isolated and liquidation TEV, TEV elasticity,
and portfolio beta. Most users of multifactor risk models will find their own
preferred approach to risk analysis through experience.

Table 9 Individual Securities and Idiosyncratic Risk Exposures

                               Portfolio    Benchmark    Net          Idiosyncratic TEV
Company Name                   Weight (%)   Weight (%)   Weight (%)   (bps/month)
Vornado Realty Trust              2.80         0.13         2.67          7.42
Kohls Corp                        1.41         0.15         1.26          6.58
Bank of America Corp              2.71         1.41         1.29          6.16
ConocoPhillips                    2.29         0.82         1.47          6.03
Roper Industries Inc              1.62         0.06         1.56          5.98
Walt Disney Co                    2.26         0.66         1.60          5.48
Honeywell International Inc.      2.58         0.33         2.25          5.48
Cincinnati Financial Corp         1.88         0.05         1.83          5.35
Goldman Sachs                     0.00         0.78        -0.78          5.25

Performance Attribution

Now that we have discussed portfolio construction and risk analysis as the
first steps of the investment process, we give a brief overview of performance
attribution, an ex post analysis of performance typically conducted at the end
of the investment horizon. Performance attribution analysis provides an
evaluation of the portfolio manager's performance with respect to various
decisions made throughout the investment process. The underperformance or
outperformance of the portfolio manager when compared to the benchmark can be
due to different reasons, including effective sector allocation, security
selection, or tilting the portfolio toward certain risk factors. Attribution
analysis aims to unravel the major sources of this performance differential.
The exercise allows the portfolio manager to understand how her particular
views (translated into net exposures) performed during the period and reveals
whether some of the portfolio's performance was the result of unintended bets.

There are three basic forms of attribution analysis used for equity
portfolios. These are return decomposition, factor model-based attribution,
and style analysis. In the return decomposition approach, the performance of
the portfolio manager is generally attributed to top-down allocation (e.g.,
currency, country, or sector allocation) in a first step, followed by a
bottom-up security selection performance analysis. This is a widely used
technique among equity portfolio managers.
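A minimal sketch of the return decomposition approach follows. The text does not give formulas, so this uses a common Brinson-style convention as an assumption: the allocation effect of a sector is its active weight times the sector's benchmark return in excess of the total benchmark return, and the selection effect is the portfolio weight times the within-sector excess return (interaction is folded into selection). All weights and returns are hypothetical.

```python
def return_decomposition(wp, wb, rp, rb):
    """Brinson-style attribution by sector (one common convention, assumed).

    wp, wb -- portfolio and benchmark sector weights (each summing to 1)
    rp, rb -- portfolio and benchmark sector returns, in percent
    """
    rb_total = sum(wb[s] * rb[s] for s in wb)
    # Top-down step: reward for over/underweighting sectors.
    allocation = {s: (wp[s] - wb[s]) * (rb[s] - rb_total) for s in wb}
    # Bottom-up step: reward for picking stocks within each sector.
    selection = {s: wp[s] * (rp[s] - rb[s]) for s in wb}
    return allocation, selection


# Hypothetical three-sector example.
wp = {"staples": 0.30, "discretionary": 0.30, "tech": 0.40}
wb = {"staples": 0.25, "discretionary": 0.35, "tech": 0.40}
rp = {"staples": 1.0, "discretionary": 2.0, "tech": 3.0}
rb = {"staples": 0.5, "discretionary": 2.5, "tech": 2.0}

allocation, selection = return_decomposition(wp, wb, rp, rb)
active_return = (sum(wp[s] * rp[s] for s in wp)
                 - sum(wb[s] * rb[s] for s in wb))
```

With this convention the allocation and selection effects add up exactly to the active return, so nothing is left unexplained at the sector level.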

Factor model-based analysis attributes performance to exposures to risk
factors such as industry, size, and financial ratios. It is relatively more
complicated than the previous approach and is based on a particular risk model
that needs to be well understood. For example, let's assume that a portfolio
manager forecasts that value stocks will outperform growth stocks in the near
future. As a result, the manager tilts the portfolio toward value stocks as
compared to the benchmark, creating an active exposure to the value factor. In
an attribution framework without systematic factors, such sources of
performance cannot be identified and hence may be inadvertently attributed to
other reasons. Factor model-based attribution analysis adds value by
incorporating these factors (representing major investment themes) explicitly
into the return decomposition process and by identifying additional sources of
performance represented as active exposures to systematic risk factors.
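The value-tilt example can be sketched numerically. The code below assumes a linear factor model in which each factor's contribution to active return is the active exposure times the realized factor return, with whatever remains treated as the stock-specific residual. The factor names, exposures, and returns are hypothetical.

```python
def factor_attribution(active_exposure, factor_return, active_return):
    """Split active return into per-factor contributions plus a residual.

    Assumes a linear factor model: contribution = active exposure x factor return.
    """
    contributions = {k: active_exposure[k] * factor_return[k]
                     for k in active_exposure}
    residual = active_return - sum(contributions.values())
    return contributions, residual


# Hypothetical period: a value tilt of 0.3 when the value factor returns 2.0%.
active_exposure = {"value": 0.3, "size": -0.1}
factor_return = {"value": 2.0, "size": 1.0}   # percent
contributions, residual = factor_attribution(active_exposure, factor_return, 0.8)
```

Here 0.6% of the 0.8% active return is traced to the deliberate value tilt, and the small size exposure costs 0.1%. An attribution without systematic factors would have to assign all of this to allocation or selection.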

Style analysis, on the other hand, is based on a regression of the portfolio
return on a set of style benchmarks. It requires very little information
(e.g., we do not need to know the contents of the portfolio), but the outcome
depends significantly on the selection of style benchmarks. It also assumes
constant loadings to these styles across the regression period, which may be
unrealistic for managers with somewhat dynamic allocations.
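As an illustration of the regression idea, the sketch below estimates loadings on two style benchmarks by ordinary least squares without an intercept, solving the two-by-two normal equations directly. This is a simplification: practical style analysis often constrains the loadings (for example, non-negative and summing to one), which is not done here. The return series are hypothetical, and the portfolio is constructed as an exact 70/30 mix so that the regression should recover those loadings.

```python
def style_loadings(portfolio, style_a, style_b):
    """OLS loadings of portfolio returns on two style benchmarks (no intercept)."""
    saa = sum(a * a for a in style_a)
    sbb = sum(b * b for b in style_b)
    sab = sum(a * b for a, b in zip(style_a, style_b))
    sap = sum(a * p for a, p in zip(style_a, portfolio))
    sbp = sum(b * p for b, p in zip(style_b, portfolio))
    det = saa * sbb - sab * sab  # zero if the two benchmarks are collinear
    load_a = (sbb * sap - sab * sbp) / det
    load_b = (saa * sbp - sab * sap) / det
    return load_a, load_b


# Hypothetical monthly returns (percent) for two style benchmarks.
value = [1.0, -0.5, 2.0, 0.3]
growth = [0.5, 1.5, -1.0, 0.8]
# Portfolio built as a constant 70/30 mix, so only returns are needed as input.
portfolio = [0.7 * v + 0.3 * g for v, g in zip(value, growth)]

load_value, load_growth = style_loadings(portfolio, value, growth)
```

Only the return series enter the calculation, which reflects the low information requirement noted above. If the manager's mix had changed during the sample, a single pair of loadings would average over the changes.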


Factor-Based Scenario Analysis 

The last application we review goes over the use of equity risk factor models
in the context of scenario analysis. Many investment professionals utilize
scenario analysis in different shapes and forms for both risk and portfolio
construction purposes. Factor-based scenario analysis is a tool that helps
portfolio managers in their decision-making process by providing additional
intuition on the behavior of their portfolio under a specified scenario. A
scenario can be a historical episode, such as the equity market crash of 1987,
the war in Iraq, or the 2008 credit crisis. Alternatively, scenarios can be
defined as a collection of hypothetical views (e.g., user-defined scenarios)
in a variety of forms such as a view on a given portfolio or index (e.g.,
S&P 500 drops by 20%) or a factor (e.g., U.S. equity-size factor moves by
3 standard deviations) or correlation between factors (e.g., increasing
correlations across markets in episodes of flight to quality). In this
section, we use the POINT® Factor-Based Scenario Analysis Tool to illustrate
how we can utilize factor models to perform scenario analysis.

Table 10 Index Returns under Scenario 1 (VIX jumps by 50%)

Universe                         Type           Measure   Unit   Result
S&P 500                          Equity index   Return    %       -7.97
FTSE U.K. 100                    Equity index   Return    %       -9.34
DJ EURO STOXX 50                 Equity index   Return    %      -11.63
NIKKEI 225                       Equity index   Return    %       -4.99
MSCI-AC ASIA PACIFIC EX JAPAN    Equity index   Return    %      -10.33
MSCI-EMERGING MARKETS            Equity index   Return    %       -9.25

Before we start describing the example, let's take an overview of the
mechanics of the model. It allows for the specification of user views on
returns of portfolios, indexes, or risk factors. When the user specifies a
view on a portfolio or index, this is translated into a view on risk factor
realizations, through the linear factor model framework (note 13). These views
are combined with ones that are directly specified in terms of risk factors.
It is important to note that the portfolio manager does not need to specify
views on all risk factors, and typically has views only on a small subset of
them. Once the manager specifies this subset of original views, the next step
is to expand these views to the whole set of factors. The scenario analysis
engine achieves this by estimating the most likely realization of all other
factors, given the factor realizations on which views are specified, using the
risk model covariance matrix. Once all factor realizations are populated, the
scenario outcome for any portfolio or index can be computed by multiplying
their specific exposures to the risk factors by the factor realizations under
the scenario. The tool provides a detailed analysis of the portfolio behavior
under the specified scenario.
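The two steps described above can be sketched for the simplest case of a single view. The code assumes factors are jointly normal with zero mean, so the most likely realization of every other factor given the view is the conditional expectation: the covariance with the viewed factor, divided by that factor's variance, times the view. This is a textbook simplification rather than a description of the actual tool, and the three-factor covariance matrix and exposures are hypothetical.

```python
def expand_view(cov, view_index, view_value):
    """Most likely realization of all factors given a view on one factor.

    Assumes zero-mean jointly normal factors, so
    E[f_j | f_k = v] = cov[j][k] / cov[k][k] * v.
    """
    var_view = cov[view_index][view_index]
    return [cov[j][view_index] / var_view * view_value
            for j in range(len(cov))]


def scenario_return(exposures, realizations):
    """Scenario outcome: exposures multiplied by factor realizations, summed."""
    return sum(b * f for b, f in zip(exposures, realizations))


# Hypothetical covariance matrix for three factors (percent squared).
cov = [[4.0, 2.0, -1.0],
       [2.0, 9.0, 0.5],
       [-1.0, 0.5, 1.0]]

# View: factor 0 falls by 2%. Factors 1 and 2 are inferred from the covariances.
realizations = expand_view(cov, view_index=0, view_value=-2.0)

# Hypothetical index exposures to the three factors.
index_exposures = [0.5, 1.0, 0.2]
index_return = scenario_return(index_exposures, realizations)
```

With several simultaneous views the same idea applies, but the scalar division becomes a solve against the covariance sub-matrix of the viewed factors.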

We illustrate this tool using two different scenarios: a 50% shift in U.S.
equity market volatility, represented by the VIX index (scenario 1), and a 50%
jump in European credit spreads (scenario 2) (note 14). We use a set of equity
indexes from across the globe to illustrate the impact of these two scenarios.
We run the scenarios as of July 30, 2010, which specifies the date both for
the index loadings and the covariance matrix used. Base currency is set to
U.S. dollars (USD) and hence index returns presented below are in USD.

Table 10 shows the returns of the chosen equity indexes under the first
scenario. We see that all indexes experience significant negative returns with
Euro Stoxx plummeting the most and Nikkei experiencing the smallest drop. To
understand these numbers better, let's look into the contributions of
different factors to these index returns.

Table 11 illustrates return contributions for four of these equity indexes
under scenario 1. Specifically, for each index, it decomposes the total
scenario return into return coming from different factors each index has
exposure to. In this example, all currency factors are defined with respect to
USD. Moreover, equity factors are expressed in their corresponding local
currencies and can be described as broad market factors for their respective
regions.
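As a quick arithmetic check of this decomposition, the sketch below adds up the factor contributions for each index as read from Table 11 and compares the sums with the scenario returns reported in Table 10. The dictionary layout is just a convenient way to hold the table entries.

```python
# Factor contributions to index returns under scenario 1, in percent,
# transcribed from Table 11.
contributions = {
    "S&P 500": {"U.S. equity": -7.97},
    "FTSE U.K. 100": {
        "GBP": -1.77,
        "EUR": -0.38,
        "U.K. equity": -6.67,
        "EMG equity": -0.09,
        "Continental Europe equity": -0.43,
    },
    "DJ EURO STOXX 50": {"EUR": -4.80, "Continental Europe equity": -6.83},
    "NIKKEI 225": {"JPY": 1.21, "Japan equity": -6.20},
}

# Total scenario return per index is the sum of its factor contributions.
totals = {name: round(sum(parts.values()), 2)
          for name, parts in contributions.items()}
print(totals)
```

The sums reproduce the index returns in Table 10, which confirms that the rows of Table 11 are additive pieces of the same scenario return.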

Table 11 Return Contributions for Equity Indexes under Scenario 1 (in %)

Group   Factor                      S&P 500   FTSE U.K. 100   DJ EURO STOXX 50   NIKKEI 225
FX      GBP                                       -1.77
FX      JPY                                                                          1.21
FX      EUR                                       -0.38            -4.80
Equity  U.S. equity                  -7.97
Equity  U.K. equity                               -6.67
Equity  Japan equity                                                                -6.20
Equity  EMG equity                                -0.09
Equity  Continental Europe equity                 -0.43            -6.83
Total                                -7.97        -9.34           -11.63            -4.99

Not surprisingly, Table 11 shows that the majority of the return contributions
for selected indexes come from the reaction of equity market factors to the
scenario. However, foreign exchange (FX) can also be a significant portion of
total return for some indexes, such as in the case of the Euro Stoxx (-4.8%).
Nikkei experiences a relatively smaller drop in USD terms, mainly due to a
positive contribution coming from the JPY FX factor. This positive
contribution is due to the safe haven nature of the Japanese yen in case of
flight to quality under increased risk aversion in global markets.

Table 12 Factor Returns and Z-Scores under Scenario 1

Group    Name                        Measure   Unit   Value   Std. Dev.   Z-Score
Equity   U.K. equity                 Return    %      -7.85     4.99       -1.57
Equity   U.S. equity                 Return    %      -8.61     6.06       -1.42
Equity   Continental Europe equity   Return    %      -7.12     5.04       -1.41
Equity   Japan equity                Return    %      -5.96     4.73       -1.26
Equity   EMG equity                  Return    %      -8.50     6.88       -1.24
FX       EUR                         Return    %      -4.80     3.93       -1.22
FX       GBP                         Return    %      -1.93     3.42       -0.56
FX       JPY                         Return    %       1.21     3.39        0.36

Table 12 demonstrates the scenario-implied factor realizations ("value"),
factor volatilities, and the Z-scores for the risk factors given in Table 11.
The Z-score of the factor quantifies the effect of the scenario on that
specific factor. It is computed as

$$ z = \frac{r}{\sigma_r} $$

where $r$ is the return of the factor in the scenario and $\sigma_r$ is the
standard deviation of the factor. Hence, the Z-score measures how many
standard deviations a factor moves in a given scenario. Table 12 lists the
factors by increasing Z-score under scenario 1. The U.K. equity factor
experiences the largest negative move, at -1.57 standard deviations. FX
factors experience relatively smaller movements. JPY is the only factor with a
positive realization due to the aforementioned characteristic of the currency.
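The Z-score formula can be checked directly against the values and standard deviations listed in Table 12. The sketch below recomputes each Z-score and sorts the factors in increasing order; the variable names are illustrative only.

```python
# (scenario return in %, standard deviation in %) per factor, from Table 12.
factors = {
    "U.K. equity": (-7.85, 4.99),
    "U.S. equity": (-8.61, 6.06),
    "Continental Europe equity": (-7.12, 5.04),
    "Japan equity": (-5.96, 4.73),
    "EMG equity": (-8.50, 6.88),
    "EUR": (-4.80, 3.93),
    "GBP": (-1.93, 3.42),
    "JPY": (1.21, 3.39),
}

# Z-score: scenario return divided by the factor's standard deviation.
z_scores = {name: round(r / sigma, 2) for name, (r, sigma) in factors.items()}

# Order factors by increasing Z-score, as Table 12 does.
ranking = sorted(z_scores, key=z_scores.get)
print(ranking[0], z_scores[ranking[0]])
```

The recomputed values match the Z-score column of Table 12 to two decimals, and the ordering starts with U.K. equity and ends with JPY.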


In the second scenario, we shift European credit spreads by 50% (a 3.5-sigma
event) and explore the effect of credit market swings on the equity markets.
As we can see in Table 13, all equity indexes experience significant negative
returns, in line with the severity of the scenario (note 15). The result also
underpins the strong recent comovement between the credit and equity markets.
The exception is again the Nikkei, which realizes a relatively smaller loss.

Table 14 provides the return, volatility, and the Z-score of certain relevant
factors under scenario 2. As expected, the major mover on the equity side is
the continental Europe equity factor, followed by the United Kingdom. Given
the recent strong correlations between equity and credit markets across the
globe, the table suggests that a 3.5 standard deviation shift in the European
spread factor results in a 2 to 3 standard deviation movement of global equity
factors.

The two examples above illustrate the use of factor models in performing
scenario analysis to achieve a clear understanding of how a portfolio may
react under different circumstances.


Table 13 Index Returns under Scenario 2 (EUR Credit Spread Jumps by 50%)

Universe                         Type           Measure   Unit   Result
S&P 500                          Equity index   Return    %      -13.03
FTSE U.K. 100                    Equity index   Return    %      -18.62
DJ EURO STOXX 50                 Equity index   Return    %      -19.68
NIKKEI 225                       Equity index   Return    %       -8.92
MSCI-AC ASIA PACIFIC EX JAPAN    Equity index   Return    %      -18.40
MSCI-EMERGING MARKETS            Equity index   Return    %      -16.83

Table 14 Factor Returns and Z-Scores under Scenario 2

Group    Name                        Measure   Unit   Value    Std. Dev.   Z-Score
Equity   Continental Europe equity   Return    %      -14.02     5.04       -2.78
Equity   U.K. equity                 Return    %      -13.05     4.99       -2.62
Equity   Japan equity                Return    %      -11.53     4.73       -2.44
Equity   U.S. equity                 Return    %      -14.09     6.06       -2.33
Equity   EMG equity                  Return    %      -15.93     6.88       -2.32
FX       GBP                         Return    %       -6.54     3.42       -1.91
FX       EUR                         Return    %       -6.23     3.93       -1.59
FX       JPY                         Return    %        3.07     3.39        0.90

KEY POINTS 

• Multifactor equity risk models provide detailed insight into the structure
and properties of portfolios. These models characterize stock returns in terms
of systematic factors and an idiosyncratic component. Systematic factors are
generally designed to have intuitive economic interpretation and they
represent common movements across securities. On the other hand, the
idiosyncratic component represents the residual return due to stock-specific
events.

• Systematic factors used in equity risk models can be broadly classified
under five categories: market factors, classification variables, firm
characteristics, macroeconomic variables, and statistical factors.

• Relative significance of systematic risk factors depends on various
parameters such as the model horizon, region/country for which the model is
designed, existence of other factors, and the particular time period of the
analysis. For instance, in the presence of industry factors, macroeconomic
factors tend to be insignificant for short to medium horizon equity risk
models whereas they tend to be more significant for long-horizon models.
Moreover, for developed equity markets, industry factors are typically more
significant as compared to the country factors. The latter are still the
dominant effect for emerging markets.

• Choice of the model and the estimation technique affect the interpretation
of factors. For instance, in the existence of a market factor, industry
factors represent industry-specific movements net of market. If there is no
market factor, their interpretation is very close to market value-weighted
industry indexes.

• Multifactor equity risk models can be classified according to how their
loadings and factors are specified. The most common equity factor models
specify loadings based on classification (e.g., industry) and fundamental or
technical information, and estimate factor realizations every period. Certain
other models take factors as known (e.g., returns on industry indexes) and
estimate loadings based on time-series information. A third class of models is
based purely on statistical approaches without concern for economic
interpretation of factors and loadings. Finally, it is possible to combine
these approaches and construct hybrid models. Each of these approaches has its
own specific strengths and weaknesses.

• A good multifactor equity risk model provides detailed information regarding
the exposures of a complex portfolio and can be a valuable tool for portfolio
construction and risk management. It can help managers construct portfolios
tracking a particular benchmark, express views subject to a given risk budget,
and rebalance a portfolio while avoiding excessive transaction costs. Further,
by identifying the exposures where the portfolio has the highest risk
sensitivity it can help a portfolio manager reduce (or increase) risk in the
most effective way.

• Performance attribution based on multifactor equity risk models can give ex
post insight into how the portfolio manager's views and corresponding
investments translated into actual returns.

• Factor-based scenario analysis provides portfolio managers with a powerful
tool to perform stress testing of portfolio positions and gain insight into
the impact of specific market events on portfolio performance.


NOTES 

1. The Barclays Global Risk Model is available 
through POINT®, Barclays portfolio man¬ 
agement tool. It is a multicurrency cross¬ 
asset model that covers many different 
asset classes across the fixed income and eq¬ 
uity markets, including derivatives in these 
markets. At the heart of the model is a co- 
variance matrix of risk factors. The model 
has more than 500 factors, many specific to a 
particular asset class. The asset class mod¬ 
els are periodically reviewed. Structure is 
imposed to increase the robustness of the 
estimation of such a large covariance ma¬ 
trix. The model is estimated from histor¬ 
ical data. It is calibrated using extensive 
security-level historical data and is updated 
on a monthly basis. 

2. As an example, if the portfolio has 10 stocks, 
we need to estimate 45 parameters, with 
100 stocks we would need to estimate 4,950 
parameters. 

3. This is especially the case over crisis periods 
where stock characteristics can change dra¬ 
matically over very short periods of time. 

4. Fixed income managers typically use cross- 
sectional type of models. 


5. GICS is the Global Industry Classification 
Standard by Standard & Poor's, a widely 
used classification scheme by equity port¬ 
folio managers. 

6. An application of macro variables in the context of risk factor
models is as follows. First, we get the sensitivities of the
portfolio to the model's risk factors. Then we project the risk
factors onto the macro variables. We then combine the results from
these two steps to get the indirect loadings of the portfolio on the
macro factors. Therefore, instead of calculating the portfolio
sensitivities to macro factors by aggregating individual stock macro
sensitivities (which are always hard to estimate), we work with the
portfolio's macro loadings, estimated indirectly from the portfolio's
risk factor loadings as described above. This indirect approach may
lead to statistically more robust relationships between portfolio
returns and macro variables.

7. The equity risk model suite in POINT consists of six separate
equity risk models across the globe: the United States, United
Kingdom, Continental Europe, Japan, Asia (excluding Japan), and
global emerging markets (for details see Silva, Staal, and Ural,
2009). The suite incorporates many unique features related to factor
choice, industry and fundamental exposures, and risk prediction.

8. See Kumar (2010). 

9. The setting of these exposures and its trade-offs are discussed
later in this entry.

10. As the POINT® U.S. equity risk model incorporates industry-level
factors, a unit exposure to a sector is implemented by restricting
the exposures to the different industries within that sector to sum
to 1. Also, note that, as before, the objective in the optimization
problem is the minimization of idiosyncratic TEV to ensure that the
resulting portfolio represents systematic, not idiosyncratic,
effects.

11. Note that we can sum the sector betas into the portfolio beta,
using portfolio sector weights (not net weights) as weights in the
summation.

12. For a detailed methodology on how to perform this customized
analysis, see Silva (2009).

13. Specifically, we can back out factor realizations from the
portfolio or index returns by using their risk factor loadings.

14. For reference, as of July 30, 2010, scenario 1 would imply that
the VIX would move from 23.5 to 35.3, and scenario 2 would imply that
the credit spread for the Barclays European Credit Index would change
from 174 bps to 261 bps.

15. The same scenario results in a -8.12% move in the Barclays Euro
Credit Index.

Abstract: A factor is a common characteristic among a group of assets. In the equities market, for example,
it could be a particular financial ratio such as the price-earnings ratio or the book-price ratio.
Factors fall into three categories: macroeconomic influences, cross-sectional characteristics, and
statistical factors. Within asset management firms, factors and factor-forecasting models are used
for a number of purposes. Those purposes could be central to managing portfolios. Within a trading
strategy, for example, factors determine when to buy and sell securities. Factors are employed in
other areas of financial theory, such as asset pricing, risk management, and performance attribution.


Common stock investment strategies can be broadly classified into the
following categories: (1) factor-based trading strategies (also
called stock selection or alpha models), (2) statistical arbitrage,
(3) high-frequency strategies, and (4) event studies. Factors and
factor-based models form the core of a major part of today's
quantitative trading strategies. The focus of this entry is on
developing trading strategies based on factors constructed from
common (cross-sectional) characteristics of stocks. For this purpose,
we first provide a definition of factors. We then examine the major
sources of risk associated with trading strategies, and demonstrate
how factors are constructed from company characteristics and market
data. The quality of the data used in this process is critical. We
examine several data cleaning and adjustment techniques to account
for problems occurring with backfilling and restatements of data,
missing data, inconsistently reported data, as well as survivorship
and look-ahead biases. In the last section of this entry, we discuss
the analysis of the statistical properties of factors.

In a series of examples, we show the individual steps for developing
a basic trading strategy. The purpose of these examples is not to
provide yet another profitable trading strategy, but rather to
illustrate the process an analyst may follow when performing
research. In fact, the factors that we use for this purpose are well
known and have for years been exploited by industry practitioners.
The value added of these examples is in the concrete illustration of
the research and development process of a factor-based trading model.


FACTOR-BASED TRADING 

Since the classic text on security analysis by Benjamin Graham and
David Dodd¹ (considered to be the Bible on the fundamental approach
to security analysis) was first published in 1934, equity portfolio
management and trading strategies have developed considerably. Graham
and Dodd were early contributors to factor-based strategies because
they extended traditional valuation approaches by using information
throughout the financial statements² and by presenting concrete rules
of thumb to be used to determine the attractiveness of securities.³

Today's quantitative managers use factors as fundamental building
blocks for trading strategies. Within a trading strategy, factors
determine when to buy and sell securities. We define a factor as a
common characteristic among a group of assets. In the equities
market, it could be a particular financial ratio such as the
price-earnings (P/E) or the book-price (B/P) ratio. Some of the
best-known factors and their underlying economic rationale are listed
in Table 1.

Most often this basic definition is expanded to include additional
objectives. First, factors frequently are intended to capture some
economic intuition. For instance, a factor may help explain the
prices of assets by reference to their exposure to sources of
macroeconomic risk, fundamental characteristics, or basic market
behavior. Second, we should recognize that assets with similar
factors (characteristics) tend to behave in similar ways. This
attribute is critical to the success of a factor. Third, we would
like our factor to be able to differentiate across different markets
and samples. Fourth, we want our factor to be robust across different
time periods.


Table 1 Summary of Well-Known Factors and Their Underlying Economic Rationale

Factor | Economic Rationale
Dividend yield | Investors prefer to receive their investment returns immediately.
Value | Investors prefer stocks with low valuations.
Size (market capitalization) | Smaller companies tend to outperform larger companies.
Asset turnover | This measure evaluates the productivity of assets employed by a firm. Investors believe higher turnover correlates with higher future return.
Earnings revisions | Positive analysts' revisions indicate stronger business prospects and earnings for a firm.
Growth of fiscal year 1 and fiscal year 2 earnings estimates | Investors are attracted to companies with growing earnings.
Momentum | Investors prefer stocks that have had good past performance.
Return reversal | Investors overreact to information, that is, stocks with the highest returns in the current month tend to earn lower returns the following month.
Idiosyncratic risk | Stocks with high idiosyncratic risk in the current month tend to have lower returns the following month.
Earnings surprises | Investors like positive earnings surprises and dislike negative earnings surprises.
Accounting accruals | Companies with earnings that have a large cash component tend to have higher future returns.
Corporate governance | Firms with better corporate governance tend to have higher firm value, higher profits, higher sales growth, lower capital expenditures, and fewer corporate acquisitions.
Executive compensation factors | Firms that align compensation with shareholders' interest tend to outperform.
Accounting risk factors | Companies with lower accounting risk tend to have higher future returns.

Factors fall into three categories: macroeconomic influences,
cross-sectional characteristics, and statistical factors.
Macroeconomic influences are time series that measure observable
economic activity. Examples include interest rate levels, gross
domestic product, and industrial production. Cross-sectional
characteristics are observable asset specifics or firm
characteristics. Examples include dividend yield, book value, and
volatility. Statistical factors are unobservable or latent factors
common across a group of assets. These factors make no explicit
assumptions about the asset characteristics that drive commonality in
returns. Statistical factors are not derived using exogenous data but
are extracted from other variables such as returns. These factors are
calculated using various statistical techniques such as principal
components analysis or factor analysis.
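
To make the idea of a statistical factor concrete, the following minimal sketch extracts the first principal component from a panel of simulated returns. Everything in it is an assumption for illustration: the returns are generated from one hidden common driver, the function names are hypothetical, and power iteration stands in for whatever eigenvalue routine a production system would use.

```python
import math
import random


def covariance_matrix(returns):
    """Sample covariance of the columns; `returns` is a list of rows (periods)."""
    n_obs, n_assets = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n_obs for j in range(n_assets)]
    cov = [[0.0] * n_assets for _ in range(n_assets)]
    for i in range(n_assets):
        for j in range(n_assets):
            cov[i][j] = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in returns
            ) / (n_obs - 1)
    return cov


def first_principal_component(cov, n_iter=500):
    """Power iteration for the leading eigenvector and its eigenvalue."""
    n = len(cov)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return vec, eigenvalue


# Simulated returns: one hidden common driver plus asset-specific noise.
rng = random.Random(0)
betas = [0.8, 1.0, 1.2, 0.9]
returns = []
for _ in range(250):
    common = rng.gauss(0.0, 0.02)
    returns.append([b * common + rng.gauss(0.0, 0.01) for b in betas])

cov = covariance_matrix(returns)
weights, variance = first_principal_component(cov)
share = variance / sum(cov[i][i] for i in range(len(cov)))
print("PC1 weights:", [round(w, 3) for w in weights])
print("share of total variance:", round(share, 3))
```

The extracted weights define a latent factor using returns alone; no company characteristic or macroeconomic series enters the calculation, which is what distinguishes statistical factors from the other two categories.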

Within asset management firms, factors and forecasting models are
used for a number of purposes. Those purposes could be central to
managing portfolios. For example, a portfolio manager can directly
send the model output to the trading desk to be executed. In other
uses, models provide analytical support to analysts and portfolio
management teams. For instance, models are used as a way to reduce
the investable universe to a manageable number of securities so that
a team of analysts can perform fundamental analysis on a smaller
group of securities.

Factors are employed in other areas of financial theory, such as
asset pricing, risk management, and performance attribution. In asset
pricing, researchers use factors as proxies for common,
undiversifiable sources of risk in the economy to understand the
prices or values of securities with uncertain payments. Examples
include the dividend yield of the market or the yield spread between
a long-term bond yield and a short-term bond yield.⁴ In risk
management, risk managers use factors in risk models to explain and
to decompose variability of returns from securities, while portfolio
managers rely on risk models for covariance construction, portfolio
construction, and risk measurement. In performance attribution,
portfolio managers explain past portfolio returns based on the
portfolio's exposure to various factors. Within these areas, the role
of factors continues to expand. Recent research presents a
methodology for attributing active return, tracking error, and the
information ratio to a set of custom factors.⁵

The focus in this entry is on using factors to build equity
forecasting models, also referred to as alpha or stock selection
models. The models serve as mathematical representations of trading
strategies. The mathematical representation uses future returns as
dependent variables and factors as independent variables.
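
As a minimal sketch of such a representation, assume the simplest possible case: one factor and a linear relation fitted by ordinary least squares, with next-period return as the dependent variable. The exposures and returns below are made-up numbers, and `ols_fit` is a hypothetical helper, not a method prescribed by this entry.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical factor exposures observed today (e.g., standardized B/P) ...
factor_today = [-1.2, -0.5, 0.0, 0.4, 1.3]
# ... and the same five stocks' returns over the following period.
return_next = [-0.020, -0.004, 0.003, 0.010, 0.021]

intercept, slope = ols_fit(factor_today, return_next)
forecast = [intercept + slope * f for f in factor_today]
print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
```

The key design point is the timing: the factor is measured before the return it is asked to explain, so the fitted slope can be read as a forecast relation rather than a contemporaneous one.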


DEVELOPING FACTOR-BASED TRADING STRATEGIES

The development of a trading strategy has 
many similarities with an engineering project. 
We begin by designing a framework that is 
flexible enough so that the components can be 
easily modified, yet structured enough that we 
remain focused on our end goal of designing a 
profitable trading strategy. 

Basic Framework and Building Blocks

The typical steps in the development of a 
trading strategy are: 

• Defining a trading idea or investment 
strategy. 

• Developing factors.

• Acquiring and processing data.

• Analyzing the factors. 

• Building the strategy. 

• Evaluating the strategy. 

• Backtesting the strategy. 

• Implementing the strategy. 

In what follows, we take a closer look at each 
step. 


Defining a Trading Idea or Investment Strategy

A successful trading strategy often starts as an idea based on sound
economic intuition, market insight, or the discovery of an anomaly.
Background research helps us understand what others have tried or
implemented in the past.

We distinguish between a trading idea and a trading strategy based on
the underlying economic motivation. A trading idea has a more
short-term horizon, often associated with an event or mispricing. A
trading strategy has a longer horizon and is frequently based on the
exploitation of a premium associated with an anomaly or a
characteristic.

Developing Factors 

Factors provide the building blocks of the model used to build an
investment strategy. We introduced a general definition of factors
earlier in this entry. After having established the trading strategy,
we move from the economic concepts to the construction of factors
that may be able to capture our intuition. In this entry, we provide
a number of examples of factors based on the cross-sectional
characteristics of stocks.


Acquiring and Processing Data 

A trading strategy relies on accurate and clean data to build
factors. There are a number of third-party solutions and databases
available for this purpose, such as Thomson MarketQA,⁶ FactSet
Research Systems,⁷ and Compustat Xpressfeed.⁸

Analyzing the Factors 

A variety of statistical and econometric techniques are applied to
the data to evaluate the empirical properties of factors. This
empirical research is used to understand the risk and return
potential of a factor. The analysis is the starting point for
building a model of a trading strategy.

Building the Strategy 

The model represents a mathematical specification of the trading
strategy. There are two important considerations in this
specification: which factors are selected and how these factors are
combined. Both considerations need to be motivated by the economic
intuition behind the trading strategy. We advise against model
specification being strictly data driven because that approach often
results in overfitting the model and consequently overestimating the
forecasting quality of the model.

Evaluating, Backtesting, and Implementing the Strategy

The final step involves assessing the estimation, specification, and
forecast quality of the model. This analysis includes examining the
goodness of fit (often done in sample), forecasting ability (often
done out of sample), and sensitivity and risk characteristics of the
model.
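
The distinction between in-sample fit and out-of-sample forecasting ability can be sketched as follows. This is an illustration under stated assumptions, not the entry's procedure: the data are simulated with a weak true relation and heavy noise, the split point is arbitrary, and R-squared is only one of several possible quality measures.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """1 - SSE/SST for the given line, evaluated on the sample (x, y)."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


rng = random.Random(42)
factor = [rng.gauss(0.0, 1.0) for _ in range(200)]
# Simulated future returns: weak signal, large noise.
future_return = [0.002 * f + rng.gauss(0.0, 0.02) for f in factor]

split = 120  # first 120 observations are used for estimation
a, b = ols_fit(factor[:split], future_return[:split])
r2_in = r_squared(factor[:split], future_return[:split], a, b)
r2_out = r_squared(factor[split:], future_return[split:], a, b)
print("in-sample R^2:", round(r2_in, 4))
print("out-of-sample R^2:", round(r2_out, 4))
```

Note that the out-of-sample figure can be negative, meaning the fitted line forecasts the holdout data worse than the holdout mean would; the in-sample figure cannot, which is one reason in-sample fit alone tends to flatter a model.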


RISK TO TRADING STRATEGIES

In investment management, risk is a primary concern. The majority of
trading strategies are not risk free but rather subject to various
risks. It is important to be familiar with the most common risks in
trading strategies. By understanding the risks in advance, we can
structure our empirical research to identify how risks will affect
our strategies. Also, we can develop techniques to avoid these risks
in the model construction stage when building the strategy.

We describe the various risks that are common to factor trading
strategies as well as other trading strategies such as risk
arbitrage. Many of these risks have been categorized in the
behavioral finance literature.⁹ The risks discussed include
fundamental risk, noise trader risk, horizon risk, model risk,
implementation risk, and liquidity risk.

Fundamental risk is the risk of suffering adverse fundamental news.
For example, say our trading strategy focuses on purchasing stocks
with high earnings-to-price ratios. Suppose that the model shows a
pharmaceutical stock with a high score. After we purchase the stock,
the company releases a news report stating that it faces class-action
litigation because one of its drugs has undocumented adverse side
effects. While other stocks with high earnings-to-price ratios may
perform well during this period, this particular pharmaceutical stock
will perform poorly despite its attractive characteristic. We can
minimize the exposure to fundamental risk within a trading strategy
by diversifying across many companies. Fundamental risk may not
always be company specific; sometimes this risk can be systemic. Some
examples include the exogenous market shocks of the stock market
crash in 1987, the Asian financial crisis in 1997, and the bursting
of the tech bubble in 2000. In these cases, diversification was not
that helpful. Instead, portfolio managers who were sector or market
neutral in general fared better.

Noise trader risk is the risk that a mispricing may worsen in the
short run. The typical example includes companies that clearly are
undervalued (and should therefore trade at a higher price). However,
because noise traders may trade in the opposite direction, this
mispricing can persist for a long time. Closely related to noise
trader risk is horizon risk. The idea here is that the premium or
value takes too long to be realized, resulting in a realized return
lower than a target rate of return.

Model risk, also referred to as misspecification risk, refers to the
risk associated with making wrong modeling assumptions and decisions.
This includes the choice of variables, the methodology, and the
context the model operates in. There are different sources that may
result in model misspecification, and there are several remedies
based on information theory, Bayesian methods, shrinkage, and random
coefficient models.¹⁰

Implementation risk is another risk faced by investors implementing
trading strategies. This risk category includes transaction costs and
funding risk. Transaction costs such as commissions, bid-ask spreads,
and market impact can adversely affect the results from a trading
strategy. If the strategy involves shorting, other implementation
issues arise, such as the ability to locate securities to short and
the costs to borrow the securities. Funding risk occurs when the
portfolio manager is no longer able to get the funding necessary to
implement a trading strategy. For example, many statistical arbitrage
funds use leverage to increase the returns of their funds. If the
amount of leverage is constrained, then the strategy will not earn
attractive returns. Khandani and Lo (2007) confirm this example by
showing that greater competition and reduced profitability of
quantitative strategies today require more leverage to maintain the
same level of expected return.
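
A rough sketch of how these implementation costs enter a strategy's return follows. It rests on deliberately simple assumptions that are not taken from this entry or from Khandani and Lo (2007): transaction costs scale linearly with turnover, borrowed capital pays a flat funding rate, and all numbers are invented for illustration.

```python
def net_return(gross, turnover, cost_rate, leverage=1.0, funding_rate=0.0):
    """Annual return per unit of investor capital after costs.

    Assumes costs are linear in turnover and that the (leverage - 1)
    units of borrowed capital are charged a flat funding rate.
    """
    unlevered = gross - turnover * cost_rate
    return leverage * unlevered - (leverage - 1.0) * funding_rate


# Hypothetical strategy: 6% gross return, 4x annual turnover, 0.3% cost.
unlevered = net_return(gross=0.06, turnover=4.0, cost_rate=0.003)
levered = net_return(0.06, 4.0, 0.003, leverage=2.0, funding_rate=0.03)
print("net, unlevered:", round(unlevered, 4))
print("net, 2x leverage:", round(levered, 4))
```

Under these assumptions the levered return depends on the manager being able to borrow at all; if funding is withdrawn, the strategy falls back to the unlevered figure.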

Liquidity risk is a concern for investors. Liquidity is defined as
the ability to (1) trade quickly without significant price changes,
and (2) trade large volumes without significant price changes.
Cerniglia and Kolm (2009) discuss the effects of liquidity risk
during the "quant crisis" in August 2007. They show how the rapid
liquidation of quantitative funds affected the trading
characteristics and price impact of trading individual securities as
well as various factor-based trading strategies.

These risks can detract from or contribute to the success of a
trading strategy. It is obvious how these risks can detract from a
strategy. What is not always clear is when any one of these
unintentional risks contributes to a strategy. That is, sometimes
when we build a trading strategy we take on a bias that is not
obvious. If there is a premium associated with this unintended risk,
then a strategy will earn additional return. Later the premium to
this unintended risk may disappear. For example, a trading strategy
that focuses on price momentum performed strongly in the calendar
years of 1998 and 1999. What an investor might not notice is that
during this period the portfolio became increasingly weighted toward
technology stocks, particularly Internet-related stocks. During 2000,
these stocks severely underperformed.


DESIRABLE PROPERTIES OF FACTORS

Factors should be founded on sound economic intuition, market
insight, or an anomaly. In addition to the underlying economic
reasoning, factors should have other properties that make them
effective for forecasting.

It is an advantage if factors are intuitive to investors. Many
investors will only invest in a particular fund if they understand
and agree with the basic ideas behind the trading strategies. Factors
give portfolio managers a tool for communicating to investors what
themes they are investing in.

The search for economically meaningful factors should avoid relying
strictly on historical analysis. Factors used in a model should not
emerge from a sequential process of evaluating successful factors
while removing less favorable ones.

Most importantly, a group of factors should be parsimonious in its
description of the trading strategy. This requires careful evaluation
of the interaction between the different factors. For example, highly
correlated factors will cause the inferences made in a multivariate
approach to be less reliable. Another possible problem when using
multiple factors is the possibility of overfitting in the modeling
process.
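
One simple way to screen for the interaction problem described above is to compute pairwise correlations between candidate factors before combining them. The sketch below is illustrative only: the factor values are invented, and the 0.8 cutoff used to flag a pair is an arbitrary assumption rather than a recommended threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical factor values for the same six stocks.
factors = {
    "earnings_to_price": [0.04, 0.06, 0.08, 0.10, 0.12, 0.05],
    "book_to_price": [0.30, 0.45, 0.62, 0.80, 0.95, 0.40],
    "momentum": [0.15, -0.05, 0.10, -0.10, 0.02, 0.20],
}

THRESHOLD = 0.8  # assumed cutoff for flagging a pair
names = list(factors)
flagged = []
for i, first in enumerate(names):
    for second in names[i + 1:]:
        rho = pearson(factors[first], factors[second])
        if abs(rho) > THRESHOLD:
            flagged.append((first, second))
        print(f"{first} vs {second}: {rho:+.3f}")
print("highly correlated pairs:", flagged)
```

A flagged pair does not have to be dropped; it is a prompt to ask whether both factors capture distinct economic intuition or whether one of them is redundant.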

Any data set contains outliers, that is, observations that deviate
from the average properties of the data. Outliers are not always
trivial to handle: they could be erroneously reported values or
legitimate abnormal values, so sometimes we may want to exclude them
and other times not. Later in this entry we discuss a few standard
techniques to perform data cleaning. The success or failure of the
factors selected should not depend on a few outliers. In most cases,
it is desirable to construct factors that are reasonably robust to
outliers.
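
As one example of reducing the influence of outliers, the sketch below winsorizes a factor by replacing the `k` most extreme observations in each tail with the nearest retained value. Winsorization is a commonly used technique, but treating it as the method intended here is an assumption; the data and the choice of `k` are purely illustrative.

```python
def winsorize(values, k=1):
    """Clip the k lowest and k highest observations to the nearest kept value."""
    n = len(values)
    if k <= 0 or 2 * k >= n:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical earnings yields; the last value looks like a data error.
raw = [0.03, 0.05, 0.04, 0.06, 0.05, 0.07, 0.04, 0.05, 0.06, 2.50]
clean = winsorize(raw, k=1)

print("mean before:", round(sum(raw) / len(raw), 4))
print("mean after: ", round(sum(clean) / len(clean), 4))
```

Unlike deleting the observation, this keeps the stock in the cross-section, which matters when the extreme value turns out to be legitimate rather than erroneous.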


SOURCES FOR FACTORS 

How do we find factors? The sources are widespread, with no one
source clearly dominating. Employing a variety of sources seems to
provide the best opportunity to uncover factors that will be valuable
for developing a new model.

There are a number of ways to develop factors based on economic
foundations. It may start with thoughtful observation or study of how
market participants act. For example, we may ask ourselves how other
market participants will evaluate the prospects of the earnings or
business of a firm. We may also want to consider what stock
characteristics investors will reward in the future. Another common
approach is to look for inefficiencies in the way that investors
process information. For instance, research may discover that
consensus expectations of earnings estimates are biased.

A good source for factors is the various reports released by the
management of companies. Many reports contain valuable information
and may provide additional context on how management interprets the
company results and financial characteristics. For example, quarterly
earnings reports (10-Qs) may highlight particular financial metrics
relevant to the company and the competitive space it operates
within. Other company financial statements and SEC filings, such as
the 10-K or 8-K, also provide a source of information to develop
factors. It is often useful to look at the financial measures that
management emphasizes in its comments.

Factors can be found through discussions with market participants
such as portfolio managers and traders. Factors are uncovered by
understanding the heuristics experienced investors have used
successfully. These heuristics can be translated into factors and
models.

Wall Street analyst reports (also called sell-side reports or equity
research reports) may contain valuable information. The reader is
often not interested in the final conclusions, but rather in the
methodology or metrics the analysts use to forecast the future
performance of a company. It may also be useful to study the large
quantity of books written by portfolio managers and traders that
describe the process they use in stock selection.

Academic literature in finance, accounting, and economics provides
evidence of numerous factors and trading strategies that earn
abnormal returns. Not all strategies will earn abnormal profits when
implemented by practitioners, for example, because of institutional
constraints and transaction costs. Bushee and Raedy (2006) find that
trading strategy returns are significantly decreased due to issues
such as price pressure, restrictions against short sales, incentives
to maintain an adequately diversified portfolio, and restrictions to
hold no more than 5% ownership in a firm.

In uncovering factors, we should put economic intuition first and
data analysis second. This avoids performing pure data mining or
simply overfitting our models to past history. Research and
innovation are the key to finding new factors. Today, analyzing and
testing new factors and improving upon existing ones is itself a big
industry.


BUILDING FACTORS FROM COMPANY CHARACTERISTICS

The following sections focus on the techniques for building factors
from company characteristics. Often we desire our factors to relate
the financial data provided by a company to metrics that investors
use when making decisions about the attractiveness of a stock, such
as valuation ratios, operating efficiency ratios, profitability
ratios, and solvency ratios. Factors should also relate to market
data such as analysts' forecasts, prices and returns, and trading
volume.
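
A minimal sketch of turning company data into valuation-ratio factors and ranking stocks on them is shown below. The tickers, prices, and per-share figures are hypothetical, and the field names are assumptions chosen for readability rather than items from any vendor database.

```python
# Hypothetical per-share company data combined with market prices.
companies = [
    {"ticker": "AAA", "price": 50.0, "eps": 5.0, "book_per_share": 40.0},
    {"ticker": "BBB", "price": 20.0, "eps": 1.0, "book_per_share": 25.0},
    {"ticker": "CCC", "price": 80.0, "eps": 2.0, "book_per_share": 16.0},
    {"ticker": "DDD", "price": 10.0, "eps": -0.5, "book_per_share": 12.0},
]


def add_valuation_factors(rows):
    """Return copies of the rows with E/P and B/P factors added."""
    out = []
    for row in rows:
        out.append({
            **row,
            "earnings_to_price": row["eps"] / row["price"],
            "book_to_price": row["book_per_share"] / row["price"],
        })
    return out


def rank_by(rows, factor):
    """Tickers sorted from the highest to the lowest factor value."""
    ordered = sorted(rows, key=lambda r: r[factor], reverse=True)
    return [r["ticker"] for r in ordered]


scored = add_valuation_factors(companies)
print("by E/P:", rank_by(scored, "earnings_to_price"))
print("by B/P:", rank_by(scored, "book_to_price"))
```

The sketch uses earnings-to-price rather than price-to-earnings so that a company with negative earnings still receives a sensibly ordered value; the two factors also rank the same four stocks differently, which is why factor choice matters.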

WORKING WITH DATA 

In this section, we discuss how to work with data and data quality
issues, including some well-established techniques used to improve
the quality of the data. Though the work of getting and analyzing
data can be mundane and tedious, we must not forget that high-quality
data are critical to the success of a trading strategy. It is
important to realize that model output is only as good as the data
used to calibrate it. As the saying goes: "Garbage in, garbage out."

Understanding the structure of financial data is important. We
distinguish three different categories of financial data: time
series, cross-sectional, and panel data. Time series data consist of
information and variables collected over multiple time periods.
Cross-sectional data consist of data collected at one point in time
for many different companies (the cross-section of companies of
interest). A panel data set consists of cross-sectional data
collected at different points in time. We note that a panel data set
may not be homogeneous. For instance, the cross-section of companies
may change from one point in time to another.
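
The three data structures can be illustrated with a small, hypothetical panel keyed by date and ticker. The values and the dictionary layout are assumptions made for this sketch; a real project would more likely use a database or a data-frame library.

```python
# (date, ticker) -> book-to-price; all values are hypothetical.
panel = {
    ("2007-12", "AAA"): 0.80,
    ("2007-12", "BBB"): 1.25,
    ("2008-12", "AAA"): 0.95,
    ("2008-12", "BBB"): 1.40,
    ("2008-12", "CCC"): 0.20,  # CCC enters later: the panel is not homogeneous
}


def time_series(data, ticker):
    """All dated observations for one company."""
    return {d: v for (d, t), v in sorted(data.items()) if t == ticker}


def cross_section(data, date):
    """All companies observed at one point in time."""
    return {t: v for (d, t), v in sorted(data.items()) if d == date}


print("time series for AAA:", time_series(panel, "AAA"))
print("cross-section 2007-12:", cross_section(panel, "2007-12"))
print("cross-section 2008-12:", cross_section(panel, "2008-12"))
```

The two cross-sections contain different numbers of companies, which is the non-homogeneity noted above and something any averaging or ranking step has to handle explicitly.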

Data Integrity 

Quality data maintain several attributes, such as providing a
consistent view of history, maintaining good data availability,
containing no survivorship bias, and avoiding look-ahead bias. As all
data sets have their limitations, it is important for the
quantitative researcher to be able to recognize the limitations and
adjust the data accordingly.

Data used in research should provide a consistent view of history.
Two common problems that distort the consistency of financial data
are backfilling and restatements of data. Backfilling of data happens
when a company is first entered into a database at the current period
and its historical data are also added. This process of backfilling
data creates a selection bias because we now find historical data on
this recently added company when previously it was not available.
Restatements of data are another prevalent source of inconsistency.
For example, if a company revises its earnings per share numbers
after the initial earnings release, then many database companies will
overwrite the number originally recorded in the database with the
newly released figure.

A frequent and common concern with financial databases is data
availability. First, data items may only be available for a short
period of time. For example, there were many years when stock options
were granted to employees but the expense associated with the option
grant was not required to be disclosed in financial statements. It
was not until 2005 that accounting standards required companies to
recognize stock options directly as an expense on the income
statement. Second, data items may be available for only a subset of
the cross-section of firms. Some firms, depending on the business
they operate in, have research and development expenses while others
do not. For example, many pharmaceutical companies have research and
development expenses while utilities companies do not. A third issue
is that a data item may simply not be available because it was not
recorded at certain points in time. Sometimes this happens for just a
few observations; other times it is the case for the whole time
series for a specific data item for a company. Fourth, different data
items are sometimes combined. For example, sometimes depreciation and
amortization expenses are not a separate line item on an income
statement. Instead they are included in cost of goods sold. Fifth,
certain data items are only available at certain periodicities. For
instance, some companies provide more detailed financial reports
quarterly while others report more details annually. Sixth, data
items may be inconsistently reported across different companies,
sectors, or industries. This may happen as the financial data
provider translates financial measures from company reports to the
specific database items (incomplete mapping), thereby ignoring
adjustments or not making them correctly.

For these issues some databases provide specific codes to identify
the causes of missing data. It is important to have procedures in
place that can distinguish among the different reasons for the
missing data and be able to make adjustments and corrections.

Two other common problems with databases are survivorship and
look-ahead bias. Survivorship bias occurs when companies are removed
from the database when they no longer exist. For example, companies
can be removed because of a merger or bankruptcy. This bias skews the
results because only surviving firms are included in the entire
sample. Look-ahead bias occurs when data are used in a study that
would not have been available during the actual period analyzed. For
example, the use of year-end earnings data immediately at the end of
the reporting period is incorrect because the data are not released
by the firm until several days or weeks after the end of the
reporting period.
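
One common guard against look-ahead bias is to treat a reported figure as unavailable until a fixed lag after the period end. The sketch below assumes a 90-day lag, which is a conservative placeholder chosen for illustration and not a figure from this entry; where a point-in-time database records actual release dates, those should be used instead. The records and the helper name are hypothetical.

```python
from datetime import date, timedelta

REPORTING_LAG = timedelta(days=90)  # assumed lag, not taken from the text

# Hypothetical year-end earnings records.
fundamentals = [
    {"ticker": "AAA", "period_end": date(2007, 12, 31), "eps": 5.0},
    {"ticker": "AAA", "period_end": date(2008, 12, 31), "eps": 4.2},
]


def latest_available(records, ticker, as_of, lag=REPORTING_LAG):
    """Most recent record whose period end plus lag is on or before as_of."""
    usable = [
        r for r in records
        if r["ticker"] == ticker and r["period_end"] + lag <= as_of
    ]
    return max(usable, key=lambda r: r["period_end"]) if usable else None


# In mid-January 2009 the 2008 figure is not yet treated as known.
early = latest_available(fundamentals, "AAA", date(2009, 1, 15))
later = latest_available(fundamentals, "AAA", date(2009, 4, 1))
print("as of 2009-01-15:", early["period_end"], early["eps"])
print("as of 2009-04-01:", later["period_end"], later["eps"])
```

A backtest that calls a function like this on each rebalancing date only ever sees data that, under the lag assumption, an investor could have had at that time.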

Data alignment is another concern when working with multiple
databases. Many databases have different identifiers used to identify
a firm. Some databases have vendor-specific identifiers; others have
common identifiers such as CUSIPs or ticker symbols. Unfortunately,
CUSIPs and ticker symbols change over time and are often reused. This
practice makes it difficult to link an individual security across
multiple databases across time.

Example: The EBITDA/EV Factor 

This example illustrates how the nuances of data handling can
influence the results of a particular study. We use data from the
Compustat Point-In-Time database and calculate the EBITDA/EV
factor.¹¹ This factor is defined as earnings before interest, taxes,
depreciation, and amortization divided by enterprise value
(EBITDA/EV). Our universe of stocks is the Russell 1000 from December
1989 to December 2008, excluding financial companies. We calculate
EBITDA/EV using two approaches that should be equivalent. The
approaches differ in the data items used to calculate the numerator
(EBITDA):

1. EBITDA = Sales (Compustat data item 2) - Cost of goods sold
(Compustat data item 30) - Selling and general administrative
expenses (Compustat data item 1).

2. EBITDA = Operating income before depreciation (Compustat data
item 21).


According to the Compustat manual, the fol¬ 
lowing identity holds: 

Operating income before depreciation
= Sales - Cost of goods sold - Selling
and general administrative expenses

However, while this mathematical identity is 
true, this is not what we discover in the data. 
After we calculate the two factors, we form 
quintile portfolios of each factor and compare 
the individual holding rankings between the
portfolios. Figure 1 displays the percentage differences
in rankings for individual companies
between the two portfolios. We observe that the 
results are not identical. As a matter of fact, 
there are large differences, particularly in the 
early period. In other words, the two mathe¬ 
matically equivalent approaches do not deliver 
the same empirical results. 
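
The comparison described above can be sketched as follows. This is an illustrative example only: the ten firms and their factor values are made up, the function names are hypothetical, and both dictionaries are assumed to cover the same universe.

```python
def quintile_ranks(scores):
    """Map each name to a quintile 1..5 (1 = highest factor value)."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    n = len(ordered)
    return {name: (pos * 5) // n + 1 for pos, name in enumerate(ordered)}


def pct_different(scores_a, scores_b):
    """Percentage of common names whose quintile differs between two factors."""
    qa, qb = quintile_ranks(scores_a), quintile_ranks(scores_b)
    common = qa.keys() & qb.keys()
    differing = sum(1 for name in common if qa[name] != qb[name])
    return 100.0 * differing / len(common)


# Hypothetical EBITDA/EV values computed from the two definitions.
factor_1 = {f"F{i}": 0.20 - 0.02 * i for i in range(10)}
factor_2 = dict(factor_1)
factor_2["F1"] = 0.15  # the two data items disagree for these two firms
factor_2["F2"] = 0.19

difference = pct_different(factor_1, factor_2)
```

Here two of the ten firms change quintile, so `difference` is 20 percent. Repeating the calculation for each month would give a time series of the kind plotted in Figure 1.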


Potential Biases from Data 

Figure 1 Percentage of Companies in Russell 1000 with Different Ranking According to the EBITDA/EV Factor

There are numerous potential biases that may
arise from data quality issues. It is important to
recognize that the direct effects of these data issues
are not apparent a priori. We emphasize three
important effects: 12

1. Effect on average stock characteristics. When 
calculating cross-sectional averages of var¬ 
ious metrics such as book-to-price or market 
capitalization, data issues can skew statis¬ 
tics and lead to incorrect inference about the 
population characteristics used in the study. 

2. Effect on portfolio returns. The portfolio 
return implications of data issues are not 
always clear. For example, survivor bias 
results in firms being removed from the sam¬ 
ple. Typically firms are removed from the 
sample for one of two reasons—mergers and 
acquisitions or failure. In most cases firms 
are acquired at a premium from the prevail¬ 
ing stock price. Leaving these firms out of 
the sample would have a downward bias on 
returns. In cases where companies fail, the 
stock price falls dramatically and removing 
these firms from the sample will have an up¬ 
ward bias on returns. 

3. Effects on estimated moments of returns. A study 
by Kothari, Sabino, and Zach (2005) found 
that nonsurviving firms tend to be either ex¬ 
tremely bad or extremely good performers. 
Survivor bias implies truncation of such ex¬ 
treme observations. The authors of the study 
show that even a small degree of such nonrandom
truncation can have a strong impact
on the sample moments of stock returns. 


Dealing with Common Data Issues 

Most data sets are subject to some quality issues. 
To work effectively, we need to be familiar with 
data definitions and database design. We also 
need to use processes to reduce the potential 
impact of data problems as they could cause 
incorrect conclusions. 

The first step is to become familiar with the 
data standardization process vendors use to 
collect and process data. For example, many 
vendors use different templates to store data. 


Specifically, the Compustat US database has one 
template for reporting income statement data, 
while the Worldscope Global database has four 
different templates depending on whether a 
firm is classified as a bank, insurance company, 
industrial company, or other financial company. 
Other questions related to standardization a 
user should be familiar with include: 

• What are the sources of the data—publicly 
available financial statements, regulatory fil¬ 
ings, newswire services, or other sources? 

• Is there a uniform reporting template? 

• What is the delay between publication of in¬ 
formation and its availability in the database? 

• Is the data adjusted for stock splits? 

• Is history available for extinct or inactive com¬ 
panies? 

• How is data handled for companies with multiple
share classes?

• What is the process used to aggregate the 
data? 

Understanding of the accounting principles 
underlying the data is critical. Here, two principles
of importance are the valuation methodology
and data disclosure or presentation. For
the valuation, we should understand the type 
of cost basis used for the various accounting 
items. Specifically, are assets calculated using 
historical cost basis, fair value accounting, or 
another type? For accounting principles regard¬ 
ing disclosure and presentation, we need to 
know the definition of accounting terms, the 
format of the accounts, and the depth of detail 
provided. 

Researchers creating factors that use finan¬ 
cial statements should review the history of the 
underlying accounting principles. For example, 
the cash flow statement reported by companies 
has changed over the years. Effective for fiscal
years ending after July 15, 1988, Statement of Financial
Accounting Standards No. 95 (SFAS No. 95)
requires companies to report the Statement of
Cash Flows. Prior to the adoption of that accounting
standard, companies could report one
of three statements: Working Capital Statement,
Cash Statement by Source and Use of Funds, or
Cash Statement by Activity. Historical analysis
of any factor that uses cash flow items will re¬ 
quire adjustments to the definition of the factor 
to account for the different statements used by 
companies. 

Preferably, automated processes should be 
used to reduce the potential impact of data 
problems. We start by checking the data for 
consistency and accuracy. We can perform time 
series analysis on individual factors, looking for
outliers and missing data. We can use magnitude
tests to compare current data items with
the same items for prior periods, looking for 
data that are larger than a predetermined vari¬ 
ance. When suspicious cases are identified, the 
cause of the error should be researched and any 
necessary changes made. 
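
A magnitude test of the kind described above might look like the following sketch. The function name, the 50% threshold, and the sales series are assumptions made for illustration.

```python
def magnitude_flags(series, max_rel_change=0.5):
    """Flag missing values and period-over-period moves above a threshold.

    series is a list of (period, value) pairs in time order; value may be None.
    """
    flags = []
    prev = None
    for period, value in series:
        if value is None:
            flags.append((period, "missing"))
            continue
        if prev is not None and prev != 0:
            if abs(value - prev) / abs(prev) > max_rel_change:
                flags.append((period, "magnitude"))
        prev = value
    return flags


# Hypothetical quarterly sales; 2007Q4 looks like a misplaced decimal point.
sales = [
    ("2007Q1", 100.0),
    ("2007Q2", 104.0),
    ("2007Q3", None),
    ("2007Q4", 1070.0),
    ("2008Q1", 109.0),
]
flags = magnitude_flags(sales)
```

Note that this simple test flags both the suspicious jump in 2007Q4 and the return to a normal level in 2008Q1. The flags only identify cases to research; they do not decide which value is wrong.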

Methods to Adjust Factors 

At first, factors consist of raw data from a 
database combined in an economically mean¬ 
ingful way. After the initial setup, a factor may 
be adjusted using analytical or statistical tech¬ 
niques to be more useful for modeling. The fol¬ 
lowing three adjustments are common. 

Standardization 

Standardization rescales a variable while pre¬ 
serving its order. Typically, we choose the stan¬ 
dardized variable to have a mean of zero and 
a standard deviation of one by using the trans¬ 
formation 

$$x_i^{new} = \frac{x_i - \bar{x}}{\sigma_x}$$

where $x_i$ is the stock's factor score, $\bar{x}$ is the universe
average, and $\sigma_x$ is the universe standard
deviation. There are several reasons to scale a
variable in this way. First, it allows one to deter¬ 
mine a stock's position relative to the universe 
average. Second, it allows better comparison 
across a set of factors since means and standard 


deviations are the same. Third, it can be useful 
in combining multiple variables. 
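
As a small illustration of this transformation, the sketch below standardizes a hypothetical cross-section of factor scores. It assumes the population standard deviation is used for the universe; the text does not specify whether a sample or population estimate is intended.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale factor scores to mean zero and unit standard deviation."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {name: (x - mu) / sigma for name, x in scores.items()}


# Hypothetical raw factor scores for eight stocks.
raw = {"A": 2.0, "B": 4.0, "C": 4.0, "D": 4.0,
       "E": 5.0, "F": 5.0, "G": 7.0, "H": 9.0}
z = standardize(raw)
```

The universe mean is 5 and the standard deviation is 2, so stock A maps to -1.5 and stock H to 2.0. The ordering of the stocks is unchanged.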

Orthogonalization 

Sometimes the performance of our factor might 
be related to another factor. Orthogonalizing 
a factor for other specified factor(s) removes 
this relationship. We can orthogonalize by using 
averages or running regressions. 

To orthogonalize the factor using averages ac¬ 
cording to industries or sectors, we can proceed 
as follows. First, for each industry we calculate 
the industry scores 

$$s_k = \frac{\sum_{i=1}^{n} x_i \cdot ind_{i,k}}{\sum_{i=1}^{n} ind_{i,k}}$$

where $x_i$ is a factor and $ind_{i,k}$ represents the
weight of stock $i$ in industry $k$. Next, we subtract
the industry average of the industry scores, $s_k$,
from each stock. We compute

$$x_i^{new} = x_i - \sum_{k \in \text{Industries}} ind_{i,k} \cdot s_k$$

where $x_i^{new}$ is the new industry-neutral factor.
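
The two steps above can be sketched in a few lines. The stock names, industry labels, and weights below are hypothetical; each stock is assumed to belong entirely to one industry, although the function also accepts fractional weights.

```python
def industry_neutralize(scores, weights):
    """Subtract weighted industry scores from each stock's factor value.

    scores:  {stock: x_i}
    weights: {stock: {industry: ind_ik}}
    """
    num, den = {}, {}
    for stock, x in scores.items():
        for industry, w in weights[stock].items():
            num[industry] = num.get(industry, 0.0) + x * w
            den[industry] = den.get(industry, 0.0) + w
    s = {industry: num[industry] / den[industry] for industry in num}
    return {
        stock: x - sum(w * s[industry] for industry, w in weights[stock].items())
        for stock, x in scores.items()
    }


scores = {"A": 0.10, "B": 0.20, "C": 0.05, "D": 0.15}
weights = {
    "A": {"Tech": 1.0}, "B": {"Tech": 1.0},
    "C": {"Energy": 1.0}, "D": {"Energy": 1.0},
}
neutral = industry_neutralize(scores, weights)
```

In this example the Tech score is 0.15 and the Energy score is 0.10, so each stock ends up 0.05 above or below its own industry, and the industry effect is removed.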

We can use linear regression to orthogonalize 
a factor. We first determine the coefficients in 
the equation 

$$x_i = a + b \cdot f_i + \varepsilon_i$$

where $f_i$ is the factor to orthogonalize the factor
$x_i$ by, $b$ is the contribution of $f_i$ to $x_i$, and $\varepsilon_i$ is the
component of the factor $x_i$ not related to $f_i$. $\varepsilon_i$ is
orthogonal to $f_i$ (that is, $\varepsilon_i$ is uncorrelated with $f_i$)
and represents the neutralized factor

$$x_i^{new} = \varepsilon_i$$

In the same fashion, we can orthogonalize our 
variable relative to a set of factors by using the 
multivariate linear regression 

$$x_i = a + \sum_j b_j \cdot f_{i,j} + \varepsilon_i$$

and then setting $x_i^{new} = \varepsilon_i$.
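
For the single-factor case, the regression residual can be computed directly from the ordinary least squares formulas. The sketch below is illustrative: the data are made up and the function name `orthogonalize` is hypothetical. A multivariate version would normally rely on a linear algebra library.

```python
from statistics import mean


def orthogonalize(x, f):
    """Return OLS residuals of x regressed on f (with an intercept)."""
    x_bar, f_bar = mean(x), mean(f)
    cov = sum((fi - f_bar) * (xi - x_bar) for xi, fi in zip(x, f))
    var = sum((fi - f_bar) ** 2 for fi in f)
    b = cov / var
    a = x_bar - b * f_bar
    return [xi - a - b * fi for xi, fi in zip(x, f)]


# Hypothetical factor scores x and the factor f to neutralize against.
x = [0.12, 0.08, 0.15, 0.05, 0.10]
f = [1.0, 2.0, 3.0, 4.0, 5.0]
x_new = orthogonalize(x, f)
```

By construction the residuals sum to zero and have zero sample covariance with `f`, which is what orthogonality means in this setting.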

Often portfolio managers use a risk model to
forecast risk and an alpha model to forecast re¬ 
turns. The interaction between factors in a risk 
model and an alpha model often concerns port¬ 
folio managers. One possible approach to ad¬ 
dress this concern is to orthogonalize the factors 
or final scores from the alpha model against the 
factors used in the risk model. Later in the entry, 
we discuss this issue in more detail. 


Transformation 

It is common practice to apply transforma¬ 
tions to data used in statistical and econometric 
models. In particular, factors are often trans¬ 
formed such that the resulting series is sym¬ 
metric or close to being normally distributed. 
Frequently used transformations include natu¬ 
ral logarithms, exponentials, and square roots. 
For example, a factor such as market capi¬ 
talization has a large skew because a sample 
of large-cap stocks typically includes mega¬ 
capitalization stocks. To reduce the influence 
of mega-capitalization companies, we may in¬ 
stead use the natural logarithm of market capi¬ 
talization in a linear regression model. 
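
The effect of the logarithm on a skewed factor can be checked numerically. The market capitalizations below (in billions) are invented for illustration, and `skewness` is a simple moment-based estimate rather than any particular library's definition.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    mu = mean(values)
    m2 = mean([(v - mu) ** 2 for v in values])
    m3 = mean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical market capitalizations, including two mega-cap names.
caps = [1.5, 2.0, 3.0, 4.5, 6.0, 9.0, 15.0, 40.0, 120.0, 400.0]
log_caps = [log(c) for c in caps]

raw_skew = skewness(caps)
log_skew = skewness(log_caps)
```

The raw series is dominated by the largest company, while the log-transformed series is much closer to symmetric, so `log_skew` is well below `raw_skew`.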


Outlier Detection and Management 

Outliers are observations that seem to be in¬ 
consistent with the other values in a data set. 
Financial data contain outliers for a number of 
reasons including data errors, measurement er¬ 
rors, or unusual events. Interpretation of data 
containing outliers may therefore be mislead¬ 
ing. For example, our estimates could be biased 
or distorted, resulting in incorrect conclusions. 

Outliers can be detected by several methods. 
Graphs such as boxplots, scatter plots, or his¬ 
tograms can be useful to visually identify them. 
Alternatively, there are a number of numerical 
techniques available. One common method is to
compute the interquartile range and then identify
outliers as those values that lie more than some
multiple of the range below the first quartile or above
the third quartile. The interquartile range is a


measure of dispersion and is calculated as the 
difference between the third and first quartiles 
of a sample. This measure represents the mid¬ 
dle 50% of the data, removing the influence of 
outliers. 
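
A sketch of an interquartile-range screen is shown below. The multiple of 1.5 is a common convention that is assumed here and is not prescribed by the text, and the quartiles are computed with the default method of Python's `statistics.quantiles`; other quartile conventions give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values, multiple=1.5):
    """Return values lying more than multiple * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - multiple * iqr, q3 + multiple * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical factor values with one suspicious observation.
factor_values = [0.02, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.95]
outliers = iqr_outliers(factor_values)
```

Only the value 0.95 falls outside the fences in this example. The low value 0.02 is unusual but remains inside the lower fence.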

After outliers have been identified, we need to 
reduce their influence in our analysis. Trimming 
and winsorization are common procedures for 
this purpose. Trimming discards extreme val¬ 
ues in the data set. This transformation requires 
the researcher to determine the direction (sym¬ 
metric or asymmetric) and the amount of trim¬ 
ming to occur. 

Winsorization is the process of transforming 
extreme values in the data. First, we calculate 
percentiles of the data. Next we define outliers 
by referencing a certain percentile ranking. For 
example, any data observation that is greater 
than the 97.5 percentile or less than the 2.5 per¬ 
centile could be considered an outlier. Finally, 
we set all values greater or less than the refer¬ 
ence percentile ranking to particular values. In 
our example, we may set all values greater than 
the 97.5 percentile to the 97.5 percentile value 
and all values less than 2.5 percentile set to the 
2.5 percentile value. It is important to fully in¬ 
vestigate the practical consequences of using 
either one of these procedures. 
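
The difference between the two procedures is easy to see in code. The sketch below uses the 2.5 and 97.5 percentiles from the example above; the linear-interpolation percentile rule and the data are assumptions made for illustration.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, lower=2.5, upper=97.5):
    """Pull values beyond the reference percentiles back to those percentiles."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


def trim(values, lower=2.5, upper=97.5):
    """Discard values beyond the reference percentiles."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo_cut <= v <= hi_cut]


# Hypothetical data: 0..40 with the two end points replaced by extreme values.
data = [float(i) for i in range(41)]
data[0], data[40] = -100.0, 500.0

winsorized = winsorize(data)
trimmed = trim(data)
```

Winsorization keeps all 41 observations but moves the two extremes to 1.0 and 39.0, whereas trimming drops them and leaves 39 observations. The choice therefore affects the sample size as well as the tails.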


ANALYSIS OF FACTOR DATA 

After constructing factors for all securities in 
the investable universe, each factor is analyzed 
individually. Presenting the time-series and
cross-sectional averages of the mean, standard
deviations, and key percentiles of the distribution
provides useful information for understanding
the behavior of the chosen factors.

Although we often rely on techniques that as¬ 
sume the underlying data generating process is 
normally distributed, or at least approximately, 
most financial data is not. The underlying data 
generating processes that embody aggregate in¬ 
vestor behavior and characterize the financial 
markets are unknown and exhibit significant 
uncertainty. Investor behavior is uncertain because
not all investors make rational decisions
or have the same goals. Analyzing the proper¬ 
ties of data may help us better understand how 
uncertainty affects our choice and calibration of 
a model. 

Below we provide some examples of the 
cross-sectional characteristics of various fac¬ 
tors. For ease of exposition we use histograms to 
evaluate the data rather than formal statistical 
tests. We let particular patterns or properties of 
the histograms guide us in the choice of the ap¬ 
propriate technique to model the factor. We rec¬ 
ommend that an intuitive exploration should 
be followed by a more formal statistical test¬ 
ing procedure. Our approach here is to analyze 
the entire sample, all positive values, all nega¬ 
tive values, and zero values. Although omitted 
here, a thorough analysis should also include 
separate subsample analysis. 

Example 1: EBITDA/EV 

The first factor we discuss is the earnings before
interest, taxes, depreciation, and amortization to enterprise
value (EBITDA/EV) factor. Enterprise value is
calculated as the market value of the capital 
structure. This factor measures the price (enter¬ 
prise value) investors pay to receive the cash 
flows (EBITDA) of a company. The economic 
intuition underlying this factor is that the valu¬ 
ation of a company's cash flow determines the 
attractiveness of companies to an investor. 

Figure 2(A) presents a histogram of all cross- 
sectional values of the EBITDA/EV factor 
throughout the entire history of the study. The 
distribution is close to normal, showing there 
is a fairly symmetric dispersion among the val¬ 
uations companies receive. Figure 2(B) shows 
that the distribution of all the positive values of 
the factor is also almost normally distributed. 
On the other hand, Figure 2(C) shows that the
distribution of the negative values is skewed to 
the left. However, because there are only a small 
number of negative values, it is likely that they 
will not greatly influence our model. 


Example 2: Revisions 

We evaluate the cross-sectional distribution of 
the earnings revisions factor. 13 The revisions 
factor we use is derived from sell-side ana¬ 
lyst earnings forecasts from the IBES database. 
The factor is calculated as the number of 
analysts who revise their earnings forecast up¬ 
ward minus the number of downward fore¬ 
casts, divided by the total number of forecasts. 
The economic intuition underlying this factor
is that there should be a positive relation between
changes in forecasts of earnings and subsequent
returns.
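
The definition above translates directly into a short function. Treating a stock with no forecasts as undefined (returning `None`) is an assumption of this sketch, not something stated in the text.

```python
def revisions_factor(up, down, total):
    """(upward revisions - downward revisions) / total number of forecasts."""
    if total == 0:
        return None  # assumption: factor is undefined without analyst coverage
    return (up - down) / total


# Hypothetical stock covered by 10 analysts: 4 raise and 1 lowers the forecast.
score = revisions_factor(up=4, down=1, total=10)

# No analyst changes a forecast this month.
unchanged = revisions_factor(up=0, down=0, total=10)
```

The factor is bounded between -1 and 1, and it equals zero whenever no analyst revises or upward and downward revisions offset each other.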

In Figure 3(A) we see that the distribution of 
revisions is symmetric and leptokurtic around 
a mean of about zero. This distribution ties with 
the economic intuition behind the revisions. 
Since business prospects of companies typically 
do not change from month-to-month, sell-side 
analysts will not revise their earnings forecast 
every month. Consequently, we expect and find
the cross-sectional distribution to be peaked at zero.
Figure 3(B) and (C), respectively, show there is
a smaller number of both positive and negative
earnings revisions and that each of these distributions
is skewed.


Example 3: Share Repurchase 

We evaluate the cross-sectional distribution of
the share repurchase factor. This factor is calculated
as the difference between the current number
of common shares outstanding and the num¬ 
ber of shares outstanding 12 months ago, di¬ 
vided by the number of shares outstanding 12 
months ago. The economic intuition underly¬ 
ing this factor is that share repurchase provides 
information to investors about future earnings 
and valuation of the company's stock. 14 We ex¬ 
pect there to be a positive relationship between 
a reduction in shares outstanding and subse¬ 
quent returns. 
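
As a worked illustration of this definition, the sketch below computes the factor for two hypothetical share counts (in millions). Note the sign convention that follows from the formula: a net repurchase produces a negative value.

```python
def share_repurchase_factor(shares_now, shares_12m_ago):
    """Relative change in common shares outstanding over 12 months."""
    return (shares_now - shares_12m_ago) / shares_12m_ago


# Hypothetical firm that bought back 5% of its shares.
buyback = share_repurchase_factor(95.0, 100.0)

# Hypothetical firm that issued 10% more shares.
issuance = share_repurchase_factor(110.0, 100.0)
```

Here `buyback` is about -0.05 and `issuance` is about 0.10, so firms with a shrinking share count sit in the negative tail of the distribution.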

Figure 2 Histograms of the Cross-Sectional Values for the EBITDA/EV Factor

Figure 3 Histograms of the Cross-Sectional Values for the Revisions Factor

Figure 4 Histograms of the Cross-Sectional Values for the Share Repurchase Factor

We see in Figure 4(A) that the distribution
is leptokurtic. The positive values (see Figure 4(B))
are skewed to the right and the
negative values (see Figure 4(C)) are clustered
in a small band. The economic intuition un¬ 
derlying share repurchases is the following. 
Firms with increasing share count indicate they 
require additional sources of cash. This need 
could be an early sign that the firm is experienc¬ 
ing higher operating risks or financial distress. 
We would expect these firms to have lower 
future returns. Firms with decreasing share 
count have excess cash and are returning value 
back to shareholders. Decreasing share count 
could result because management believes the 
shares are undervalued. As expected, we find 
the cross-sectional range to be peaked at zero 
(see Figure 4(D)) since not all firms issue or re¬ 
purchase shares on a regular basis. 

KEY POINTS 

* A factor is a common characteristic among a 
group of assets. Factors should be founded on 
sound economic intuition, market insight, or 
an anomaly. 

* Factors fall into three categories:
macroeconomic, cross-sectional, and statistical
factors.

* The main steps in the development of a factor- 
based trading strategy are (1) defining a 
trading idea or investment strategy, (2) de¬ 
veloping factors, (3) acquiring and processing 
data, (4) analyzing the factors, (5) building the 
strategy, (6) evaluating the strategy, (7) back¬ 
testing the strategy, and (8) implementing the 
strategy. 

* Most trading strategies are exposed to risk. 
The main sources of risk are fundamental risk, 
noise trader risk, horizon risk, model risk, im¬ 
plementation risk, and liquidity risk. 

* Factors are often derived from company 
characteristics and metrics, and market data. 
Examples of company characteristics and 
metrics include valuation ratios, operating efficiency
ratios, profitability ratios, and solvency
ratios. Examples of useful market data
include analysts' forecasts, prices and returns,
and trading volume. 


* High-quality data are critical to the success
of a trading strategy. Model output is only as 
good as the data used to calibrate it. 

* Some common data problems and biases are 
backfilling and restatements of data, missing 
data, inconsistently reported data, and sur¬ 
vivorship and look-ahead biases. 

* The ability to detect and adjust outliers is cru¬ 
cial to a quantitative investment process. 

* Common methods used for adjusting data are 
standardization, orthogonalization, transfor¬ 
mation, trimming, and winsorization. 

* The statistical properties of factors need to be 
carefully analyzed. Basic statistical measures 
include the time-series and cross-sectional av¬ 
erages of the mean, standard deviations, and 
key percentiles. 


NOTES 

1. Graham and Dodd (1962). 

2. Graham (1949). 

3. See Bernstein (1992). 

4. See Fama and French (1988). 
(2008).

financial_products/quantitative_analysis/quantitative_analytics.

7. Factset Research Systems, http://www.factset.com.

8. Compustat Xpressfeed, http://www.compustat.com.

9. See Barberis and Thaler (2003). 

10. For a discussion of the sources of model 
misspecification and remedies, see Fabozzi, 
Focardi, and Kolm (2010). 

11. The ability of EBITDA/EV to forecast future
returns is discussed in, for example,
Dechow, Kothari, and Watts (1988).

12. See Nagel (2001). 

13. For a representative study see, for example, 
Bercel (1994). 

14. See Grullon and Michaely (2004). 

Abstract: Quantitative asset managers construct and apply models that can be used for dynamic
multifactor trading strategies. These models incorporate a number of common institutional con¬ 
straints such as turnover, transaction costs, sector, and tracking error. Approaches for the evaluation 
of return premiums and risk characteristics to factors include portfolio sorts, factor models, factor 
portfolios, and information coefficients. Several techniques are used to combine several factors into 
a single model—a trading strategy. These techniques include data driven, factor model, heuristic, 
and optimization approaches. 


In the construction of factor models, factors are 
constructed from company characteristics and 
market data. In this entry, we explain and il¬ 
lustrate how to include multiple factors with 
the purpose of developing a dynamic multifac¬ 
tor trading strategy that incorporates a num¬ 
ber of common institutional constraints such 
as turnover, transaction costs, sector, and track¬ 
ing error. For this purpose, we use a combina¬ 
tion of growth, value, quality, and momentum 
factors. For the purpose of our illustration, our 
universe of stocks is the Russell 1000 from 
December 1989 to December 2008, and we 
construct our factors by using the Compustat 


Point-In-Time and IBES databases. A complete 
list of the factors and data sets used is provided 
in the appendix. 

We begin by reviewing several approaches 
for the evaluation of return premiums and risk 
characteristics to factors, including portfolio 
sorts, factor models, factor portfolios, and infor¬ 
mation coefficients. We then turn to techniques 
that are used to combine several factors into a 
single model—a trading strategy. In particular, 
we discuss the data driven, factor model, 
heuristic, and optimization approaches. It is 
critical to perform out-of-sample backtests of a 
trading strategy to understand its performance 
and risk characteristics. We cover the split-sample
and recursive out-of-sample tests.

Throughout this entry, we provide a series of 
examples, including backtests of a multifactor 
trading strategy. The purpose of these examples 
is not to attempt to provide a profitable trading 
strategy, but rather to illustrate the process a fi¬ 
nancial modeler may follow when performing 
research. We emphasize that the factors that we 
use are well known and have for years been ex¬ 
ploited by industry practitioners. We think that 
the value added of these examples is in the con¬ 
crete illustration of the research and develop¬ 
ment process of a factor-based trading model. 

CROSS-SECTIONAL METHODS FOR EVALUATION OF FACTOR PREMIUMS

There are several approaches used for the eval¬ 
uation of return premiums and risk character¬ 
istics to factors. In this section, we discuss the 
four most commonly used approaches: portfo¬ 
lio sorts, factor models, factor portfolios, and 
information coefficients. We examine the 
methodology of each approach and summarize 
its advantages and disadvantages. 

In practice, to determine the right approach 
for a given situation there are several issues to 
consider. One determinant is the structure of 
the financial data. A second determinant is the 
economic intuition underlying the factor. For 
example, sometimes we are looking for a mono¬ 
tonic relationship between returns and factors 
while at other times we care only about extreme 
values. A third determinant is whether the 
underlying assumptions of each approach are 
valid for the data-generating process at hand. 

Portfolio Sorts 

In the asset pricing literature, the use of portfolio 
sorts can be traced back to the earliest tests of the 
capital asset pricing model (CAPM). The goal 
of this particular test is to determine whether a 


factor earns a systematic premium. The portfo¬ 
lios are constructed by grouping together se¬ 
curities with similar characteristics (factors). 
For example, we can group stocks by market 
capitalization into 10 portfolios—from small¬ 
est to largest—such that each portfolio contains 
stocks with similar market capitalization. The 
next step is to calculate and evaluate the returns 
of these portfolios. 

The return for each portfolio is calculated by 
equally weighting the individual stock returns. 
The portfolios provide a representation of how 
returns vary across the different values of a fac¬ 
tor. By studying the return behavior of the fac¬ 
tor portfolios, we may assess the return and 
risk profile of the factor. In some cases, we may 
identify a monotonic relationship of the returns 
across the portfolios. In other cases, we may 
identify a large difference in returns between 
the extreme portfolios. In still other cases, there 
may be no relationship between the portfolio 
returns. Overall, the return behavior of the port¬ 
folios will help us conclude whether there is a 
premium associated with a factor and describe 
its properties. 

One application of the portfolio sort is the con¬ 
struction of a factor mimicking portfolio (FMP). 
An FMP is a long-short portfolio that goes long 
stocks with high values of a factor and short 
stocks with low values of a factor, in equal dol¬ 
lar amounts. An FMP is a zero-cost factor trad¬ 
ing strategy. 

Portfolio sorts have become so widespread 
among practitioners and academics alike that 
they elicit few econometric queries, and often 
no econometric justification for the technique 
is offered. While a detailed discussion of these 
topics is beyond the scope of this book, we 
would like to point out that asset pricing tests 
used on sorted portfolios may exhibit a bias 
that favors rejecting the asset pricing model 
under consideration. 1 

The construction of portfolios sorted on a fac¬ 
tor is straightforward: 

• Choose an appropriate sorting methodology. 

• Sort the assets according to the factor. 

• Group the sorted assets into N portfolios (usually
N = 5 or N = 10).

• Compute average returns (and other statis¬ 
tics) of the assets in each portfolio over sub¬ 
sequent periods. 

The standard statistical testing procedure for 
portfolio sorts is to use a Student's t-test to evaluate
the significance of the mean return differential
between the portfolios of stocks with the
highest and lowest values of the factor. 
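
The steps above, together with the t-test on the high-minus-low return differential, can be sketched as follows. Everything in the example is hypothetical: the ten stocks, the factor values, and the four periods of returns. For brevity the portfolio assignment is held fixed across periods, whereas in a real study it would be recomputed each period from point-in-time factor values.

```python
from math import sqrt
from statistics import mean, stdev


def sort_into_portfolios(factor, n_portfolios=5):
    """Return {stock: portfolio}, with 1 holding the highest factor values."""
    ordered = sorted(factor, key=lambda s: factor[s], reverse=True)
    n = len(ordered)
    return {s: (pos * n_portfolios) // n + 1 for pos, s in enumerate(ordered)}


def portfolio_returns(assignment, returns, n_portfolios=5):
    """Equally weighted return of each portfolio for one period."""
    buckets = {q: [] for q in range(1, n_portfolios + 1)}
    for stock, q in assignment.items():
        buckets[q].append(returns[stock])
    return {q: mean(r) for q, r in buckets.items()}


def t_statistic(spreads):
    """t-statistic for the null hypothesis that the mean spread is zero."""
    return mean(spreads) / (stdev(spreads) / sqrt(len(spreads)))


factor = {f"S{i}": 10.0 - i for i in range(10)}
assignment = sort_into_portfolios(factor)

spreads = []
for slope in [0.002, 0.001, 0.003, 0.002]:  # made-up return patterns
    period_returns = {f"S{i}": 0.02 - slope * i for i in range(10)}
    by_portfolio = portfolio_returns(assignment, period_returns)
    spreads.append(by_portfolio[1] - by_portfolio[5])

t_stat = t_statistic(spreads)
```

With only four periods the resulting statistic is purely illustrative. In practice the test would be run on a long monthly history, and the standard error may need adjustment for serial correlation.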

Choosing the Sorting Methodology 

The sorting methodology should be consistent 
with the characteristics of the distribution of the 
factor and the economic motivation underlying 
its premium. We list six ways to sort factors: 

Method 1 

• Sort stocks with factor values from the highest 
to lowest. 

Method 2 

• Sort stocks with factor values from the lowest 
to highest. 

Method 3 

• First allocate stocks with zero factor values 
into the bottom portfolio. 

• Sort the remaining stocks with nonzero factor 
values into the remaining portfolios. 

For example, the dividend yield factor would 
be suitable for this sorting approach. This ap¬ 
proach aligns the factor's distributional charac¬ 
teristics of dividend and nondividend-paying 
stocks with the economic rationale. Typically, 
nondividend-paying stocks maintain character¬ 
istics that are different from dividend paying 
stocks. So we group nondividend-paying stocks 
into one portfolio. The remaining stocks are 
then grouped into portfolios depending on the 
size of their nonzero dividend yields. We differentiate
among stocks by dividend yield
for two reasons: (1) the size of the dividend
yield is related to the maturity of the company, 
and (2) some investors prefer to receive their 
investment return as dividends. 


Method 4 

• Allocate stocks with zero factor values into 
the middle portfolio. 

• Sort stocks with positive factor values into the 
remaining higher portfolios (greater than the 
middle portfolio). 

• Sort stocks with negative factor values into 
the remaining lower portfolios (less than the 
middle portfolio). 

Method 5 

• Sort stocks into partitions. 

• Rank assets within each partition. 

• Combine assets with the same ranking from 
the different partitions into portfolios. 

An example will clarify this procedure. Sup¬ 
pose we want to rank stocks according to earn¬ 
ings growth on a sector-neutral basis. First, 
we separate stocks into groups corresponding 
to their sector. Within each sector, we rank 
the stocks according to their earnings growth. 
Lastly, we group all stocks with the same ranking
of earnings growth into the same final portfolio.
This process ensures that each portfolio will
contain an equal number of stocks from every
sector, so that the resulting portfolios are sector
neutral.
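
The sector-neutral procedure can be sketched as below. The two sectors, the stock names, and the earnings growth figures are hypothetical, and the function name is illustrative.

```python
def sector_neutral_sort(factor, sector, n_portfolios=5):
    """Rank stocks within each sector, then pool equal ranks into portfolios."""
    by_sector = {}
    for stock, sec in sector.items():
        by_sector.setdefault(sec, []).append(stock)
    assignment = {}
    for members in by_sector.values():
        ordered = sorted(members, key=lambda s: factor[s], reverse=True)
        for pos, stock in enumerate(ordered):
            assignment[stock] = (pos * n_portfolios) // len(ordered) + 1
    return assignment


# Hypothetical earnings growth: every Tech stock grows faster than any Utility.
growth = {"T0": 0.50, "T1": 0.40, "T2": 0.30, "T3": 0.20, "T4": 0.10,
          "U0": 0.09, "U1": 0.07, "U2": 0.05, "U3": 0.03, "U4": 0.01}
sector = {name: ("Tech" if name.startswith("T") else "Utilities")
          for name in growth}

assignment = sector_neutral_sort(growth, sector)
top_portfolio = sorted(s for s, q in assignment.items() if q == 1)
```

A plain sort on growth would put only Tech stocks in the top portfolio. The sector-neutral sort instead places the best stock of each sector there, so `top_portfolio` contains one Tech and one Utilities name.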

Method 6 

• Separate all the stocks with negative factor 
values. Split the group of stocks with negative 
values into two portfolios using the median 
value as the break point. 

• Allocate stocks with zero factor values into 
one portfolio. 

• Sort the remaining stocks with nonzero factor 
values into portfolios based on their factor 
values. 

An example of method 6 is the share repur¬ 
chase factor. We are interested in the extreme 
positive and negative values of this factor. As 
we see in Figure 3(A), the distribution of this
factor is leptokurtic with the positive values
skewed to the right and the negative values 
clustered in a small range. By choosing method 
6 to sort this variable, we can distinguish 
between those values we view as extreme. The
negative values are clustered so we want to dis¬ 
tinguish among the magnitudes of those values. 
We accomplish this because our sorting method 
separates the negative values by the median of 
the negative values. The largest negative values 
form the extreme negative portfolio. The posi¬ 
tive values are skewed to the right, so we want 
to differentiate between the larger and smaller 
positive values. When implementing portfolio 
method 6, we would also separate the zero val¬ 
ues from the positive values. 
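
A rough sketch of method 6 is given below. The portfolio labels, the number of positive portfolios, the treatment of a value exactly equal to the median, and the data are all assumptions made for illustration; the text does not specify these details.

```python
from statistics import median


def method6_sort(factor, n_positive=2):
    """Split negatives at their median, isolate zeros, and sort positives."""
    negatives = {s: v for s, v in factor.items() if v < 0}
    positives = {s: v for s, v in factor.items() if v > 0}
    assignment = {s: "zero" for s, v in factor.items() if v == 0}
    if negatives:
        cut = median(negatives.values())
        for s, v in negatives.items():
            # Values at the median go to the less extreme group (assumption).
            assignment[s] = "neg_large" if v < cut else "neg_small"
    ordered = sorted(positives, key=lambda s: positives[s])
    for pos, s in enumerate(ordered):
        assignment[s] = f"pos_{(pos * n_positive) // len(ordered) + 1}"
    return assignment


# Hypothetical share repurchase factor values.
repurchase = {"A": -0.08, "B": -0.05, "C": -0.02, "D": -0.01,
              "E": 0.0, "F": 0.0,
              "G": 0.01, "H": 0.03, "I": 0.10, "J": 0.40}
groups = method6_sort(repurchase)
```

Stocks A and B, which have the largest reductions in share count, form the extreme negative portfolio, while I and J form the portfolio with the largest increases.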

The portfolio sort methodology has several 
advantages. The approach is easy to implement 
and can easily handle stocks that drop out or 
enter into the sample. The resulting portfolios 
diversify away idiosyncratic risk of individual 
assets and provide a way of assessing how av¬ 
erage returns differ across different magnitudes 
of a factor. 


The portfolio sort methodology has several 
disadvantages. The resulting portfolios may be 
exposed to different risks beyond the factor the 
portfolio was sorted on. In those instances, it 
is difficult to know which risk characteristics 
have an impact on the portfolio returns. Because 
portfolio sorts are nonparametric, they do not 
give insight as to the functional form of the rela¬ 
tion between the average portfolio returns and 
the factor. 

Next we provide three examples to illustrate 
how the economic intuition of the factor and 
cross-sectional statistics can help determine the 
sorting methodology. 

Example 1: Portfolio Sorts Based on the 
EBITDA/EV Factor 

Figure 1 Portfolio Sorts Based on the EBITDA/EV Factor (Panel A: All Factor Values; Panel B: Monthly Average Returns for the Sorted Portfolios q1 to q5 and ls)

Panel A of Figure 1 contains the cross-sectional
distribution of the EBITDA/EV factor. This distribution
is approximately normal
around a mean of 0.1, with a slight right skew.
We use method 1 to sort the variables into five 
portfolios (denoted by q1, ..., q5) because this
sorting method aligns the cross-sectional distribution
of factor returns with our economic intuition
that there is a linear relationship between
the factor and subsequent return. In Figure 1(B),
we see that there is a large difference between
the equally weighted monthly returns of portfolio
1 (q1) and portfolio 5 (q5). Therefore, a
trading strategy (denoted by ls in the graph)
that goes long portfolio 1 and short portfolio 5 
appears to produce abnormal returns. 
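The sort in this example can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: we assume that method 1 splits the universe into groups of (nearly) equal size, that portfolio q1 holds the highest factor values, and that returns are equally weighted. The function name `sorted_portfolio_returns` and its arguments are hypothetical.

```python
import numpy as np

def sorted_portfolio_returns(factor, next_ret, n_portfolios=5):
    """Equal-weighted next-period return of each factor-sorted portfolio.

    Assumption: q1 holds the highest factor values, so the long-short
    strategy (ls) is long q1 and short the last portfolio.
    """
    factor = np.asarray(factor, dtype=float)
    next_ret = np.asarray(next_ret, dtype=float)
    order = np.argsort(-factor)                    # indices, highest factor first
    groups = np.array_split(order, n_portfolios)   # near-equal group sizes
    port_ret = np.array([next_ret[g].mean() for g in groups])
    long_short = port_ret[0] - port_ret[-1]        # return of the ls strategy
    return port_ret, long_short
```

Repeating the calculation each month and averaging the results over time gives monthly average returns of the kind plotted in panel B.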

Example 2: Portfolio Sorts Based on the Revisions Factor

In Figure 2(A), we see that the distribution
of earnings revisions is leptokurtic around a
mean of about zero, with the remaining values
symmetrically distributed around the peak.
The pattern in this cross-sectional distribution
provides insight on how we should sort this
factor. We use method 3 to sort the variables
into five portfolios. The firms with no change
in revisions we allocate to the middle portfolio
(portfolio 3). The stocks with positive
revisions we sort into portfolios 1 and 2,
according to the size of the revisions, while we
sort stocks with negative revisions into
portfolios 4 and 5, according to the size of the
revisions. In Figure 2(B), we see there is a
relationship between the portfolios and
subsequent monthly returns. The positive
relationship between revisions and subsequent
returns agrees with the factor's underlying
economic intuition: We expect that firms with
improving earnings should outperform. The
trading strategy that goes long portfolio 1 and
short portfolio 5 (denoted by ls in the graph)
appears to produce abnormal returns.

Figure 2 The Revisions Factor
A. All Factor Values
B. Monthly Average Returns for the Sorted Portfolios

Figure 3 The Share Repurchase Factor
A. All Factor Values
B. Monthly Average Returns for the Sorted Portfolios (q-2, q-1, q1, q2, q3, q4, q5, ls)

Example 3: Portfolio Sorts Based on the Share Repurchase Factor

In Figure 3(A), we see the distribution of
share repurchase is asymmetric and leptokurtic
around a mean of zero. The pattern in this
cross-sectional distribution provides insight on
how we should sort this factor. We use method
6 to sort the variables into seven portfolios.
We group stocks with positive repurchases into
portfolios 1 through 5 (denoted by q1, ..., q5
in the graph) according to the magnitude of
the share repurchase factor. We allocate stocks
with negative repurchases into portfolios q-2
and q-1, where the median of the negative
values determines their membership. We split the
negative numbers because we are interested in
large changes in the shares outstanding. In
Figure 3(B), unlike the previous factors, we
see that there is not a linear relationship
between the portfolios. However, there is a large
difference in return between the extreme
portfolios (denoted by ls in the graph). This large
difference agrees with the economic intuition
of this factor. Changes in the number of shares
outstanding are a potential signal for the
future value and prospects of a firm. On the one
hand, a large increase in shares outstanding
may signal to investors (1) the need for
additional cash because of financial distress, or
(2) that the firm may be overvalued. On the
other hand, a large decrease in the number
of shares outstanding may indicate that
management believes the shares are undervalued.
Finally, small changes in shares outstanding,
positive or negative, typically do not have an
impact on stock price and therefore are not
significant.
Information Ratios for Portfolio Sorts

The information ratio (IR) is a statistic for sum¬ 
marizing the risk-adjusted performance of an 
investment strategy. It is defined as the ratio of 
the average excess return to the standard devia¬ 
tion of return. For actively managed equity long 
portfolios, the IR measures the risk-adjusted 
value a portfolio manager is adding relative to a 
benchmark.² IR can also be used to capture the
risk-adjusted performance of long-short portfo¬ 
lios from portfolio sorts. When comparing port¬ 
folios built using different factors, the IR is an 
effective measure for differentiating the perfor¬ 
mance between the strategies. 
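As a small illustration of the definition, the sketch below computes the IR of a return series. The function name, the optional benchmark argument, and the square-root-of-time annualization are our own assumptions and are not prescribed by the text; for a long-short portfolio from a sort, the series itself is treated as the excess return.

```python
import numpy as np

def information_ratio(returns, benchmark=None, periods_per_year=12):
    """Average excess return divided by its standard deviation.

    Assumptions: `benchmark` is optional (a long-short return series is
    already an excess return) and the ratio is annualized by the square
    root of the number of periods per year.
    """
    r = np.asarray(returns, dtype=float)
    excess = r if benchmark is None else r - np.asarray(benchmark, dtype=float)
    ir_per_period = excess.mean() / excess.std(ddof=1)
    return ir_per_period * np.sqrt(periods_per_year)
```

Setting `periods_per_year=1` returns the per-period ratio, which is the form given in the definition.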

New Research on Portfolio Sorts 

As we mentioned earlier in this section, the 
standard statistical testing procedure for portfolio
sorts is to use a Student's t-test to evaluate
the mean return differential between the two 
portfolios containing stocks with the highest 
and lowest values of the sorting factor. How¬ 
ever, evaluating the return between these two 
portfolios ignores important information about 
the overall pattern of returns among the remain¬ 
ing portfolios. 

Recent research by Patton and Timmermann 
(2009) provides new analytical techniques to in¬ 
crease the robustness of inference from portfo¬ 
lio sorts. The technique tests for the presence 
of a monotonic relationship between the port¬ 
folios and their expected returns. To find out 
if there is a systematic relationship between a 
factor and portfolio returns, they use the mono¬ 
tonic relation (MR) test to reveal whether the 
null hypothesis of no systematic relationship 
can be rejected in favor of a monotonic re¬ 
lationship predicted by economic theory. By 
MR it is meant that the expected returns of a 
factor should rise or decline monotonically in 
one direction as one goes from one portfolio 
to another. Moreover, Patton and Timmermann 
develop separate tests to determine the direc¬ 
tion of deviations in support of or against the 
theory. 


The authors emphasize several advantages in 
using this approach. The test is nonparametric 
and applicable to other cases of portfolios such 
as two-way and three-way sorts. This test is 
easy to implement via bootstrap methods. Fur¬ 
thermore, this test does not require specifying 
the functional form (e.g., linear) in relating the 
sorting variable to expected returns. 
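To give a flavor of how such a bootstrap test might look, the following is a simplified sketch in the spirit of the MR test, not a reproduction of the authors' procedure. We assume the test statistic is the smallest difference in mean return between adjacent portfolios, and we resample time periods independently with replacement; a block-type bootstrap that respects time dependence would be the more careful choice. The function name and arguments are hypothetical.

```python
import numpy as np

def mr_test_pvalue(port_returns, n_boot=2000, seed=0):
    """Simplified bootstrap test for increasing mean returns.

    port_returns: T x N array, columns ordered so that theory predicts
    rising expected returns from the first to the last portfolio.
    Assumption: i.i.d. resampling of time periods (a simplification).
    """
    r = np.asarray(port_returns, dtype=float)
    T = r.shape[0]
    diffs = np.diff(r, axis=1)        # return of portfolio i minus portfolio i-1
    delta = diffs.mean(axis=0)        # mean spread between adjacent portfolios
    j_stat = delta.min()              # smallest adjacent spread
    rng = np.random.default_rng(seed)
    j_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, T, size=T)                  # resampled periods
        # re-center on the sample spreads to impose the null
        j_boot[b] = (diffs[idx].mean(axis=0) - delta).min()
    p_value = float(np.mean(j_boot > j_stat))
    return j_stat, p_value
```

A small p-value suggests that the null of no monotonic relationship can be rejected. For a factor where theory predicts declining returns, the columns can simply be reversed before calling the function.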


FACTOR MODELS 

Classical financial theory states that the average 
return of a stock is the payoff to investors for 
taking on risk. One way of expressing this risk- 
reward relationship is through a factor model. 
A factor model can be used to decompose the re¬ 
turns of a security into factor-specific and asset- 
specific returns 

$$r_{i,t} = \alpha_i + \beta_{i,1} f_{1,t} + \cdots + \beta_{i,K} f_{K,t} + \varepsilon_{i,t}$$

where $\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,K}$ are the factor exposures
of stock $i$, $f_{1,t}, f_{2,t}, \ldots, f_{K,t}$ are the factor returns,
$\alpha_i$ is the average abnormal return of stock $i$, and
$\varepsilon_{i,t}$ is the residual.

This factor model specification is contemporaneous,
that is, both left- and right-hand side
variables (returns and factors) have the same
time subscript, $t$. For trading strategies one
generally applies a forecasting specification where
the time subscripts of the return and the factors
are $t + h$ ($h \geq 1$) and $t$, respectively. In this case,
the econometric specification becomes

$$r_{i,t+h} = \alpha_i + \beta_{i,1} f_{1,t} + \cdots + \beta_{i,K} f_{K,t} + \varepsilon_{i,t+h}$$

How do we interpret a trading strategy based 
on a factor model? The explanatory variables 
represent different factors that forecast security 
returns, and each factor has an associated fac¬ 
tor premium. Therefore, future security returns 
are proportional to the stock's exposure to the 
factor premium 

$$E(r_{i,t+h} \mid f_{1,t}, \ldots, f_{K,t}) = \alpha_i + \beta_i' f_t$$

and the variance of future stock return is
given by

$$\mathrm{Var}(r_{i,t+h} \mid f_{1,t}, \ldots, f_{K,t}) = \beta_i' E(f_t f_t') \beta_i$$

where $\beta_i = (\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,K})'$ and
$f_t = (f_{1,t}, f_{2,t}, \ldots, f_{K,t})'$.
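The two expressions above can be evaluated directly once the exposures and factor values are in hand. The sketch below is illustrative only: the function name is hypothetical, and estimating $E(f_t f_t')$ by the sample average of outer products over a factor history is our own assumption.

```python
import numpy as np

def forecast_moments(alpha_i, beta_i, f_t, f_history):
    """Expected return and variance of one stock from a factor model.

    alpha_i: scalar abnormal return; beta_i: K exposures;
    f_t: K current factor values; f_history: T x K past factor values.
    Assumption: E(f f') is estimated by the sample second moment.
    """
    beta_i = np.asarray(beta_i, dtype=float)
    f_t = np.asarray(f_t, dtype=float)
    f_history = np.asarray(f_history, dtype=float)
    expected = alpha_i + beta_i @ f_t                          # alpha_i + beta_i' f_t
    second_moment = f_history.T @ f_history / len(f_history)   # sample E(f f')
    variance = beta_i @ second_moment @ beta_i                 # beta_i' E(f f') beta_i
    return expected, variance
```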

In the next section we discuss some specific 
econometric issues regarding cross-sectional re¬ 
gressions and factor models. 

Econometric Considerations for Cross-Sectional Factor Models

In cross-sectional regressions, where the
dependent variable³ is a stock's return and the
independent variables are factors, inference
problems may arise that are the result of
violations of classical linear regression theory. The
three most common problems are measurement
problems, common variations in residuals, and
multicollinearity.

Measurement Problems 

Some factors are not explicitly given, but need 
to be estimated. These factors are estimated 
with an error. This error can have an impact on 
the inference from a factor model. This problem 
is commonly referred to as the "errors in
variables problem." For example, a factor based
on a stock's beta is estimated with error
because beta is determined from a regression
of stock excess returns on the excess returns
of a market index. While beyond the scope of
this entry, several approaches have been
suggested to deal with this problem.⁴

Common Variation in Residuals 

The residuals from a regression often contain
a source of common variation. Sources of
common variation in the residuals are
heteroskedasticity and serial correlation.⁵ We note that when
the form of heteroskedasticity and serial
correlation is known, we can apply generalized least
squares (GLS). If the form is not known, it has
to be estimated, for example as part of feasible
generalized least squares (FGLS). We
summarize some additional possibilities next.

Heteroskedasticity occurs when the variance
of the residual differs across observations and
affects the statistical inference in a linear
regression. In particular, the estimated
standard errors will be underestimated and the
t-statistics will therefore be inflated. Ignoring
heteroskedasticity may lead the researcher to
find significant relationships where none
actually exist. Several procedures have been
developed to calculate standard errors that are
robust to heteroskedasticity, also known as
heteroskedasticity-consistent standard errors.

Serial correlation occurs when residual terms
in a linear regression are correlated, violating
the assumptions of regression theory. If the
serial correlation is positive, then the standard
errors are underestimated and the t-statistics
will be inflated. Cochrane (2005) suggests that
the standard errors in cross-sectional regressions
using financial data are often off by a factor of 10.
Procedures are available to correct for serial
correlation when calculating standard errors.

When the residuals from a regression are both 
heteroskedastic and serially correlated, proce¬ 
dures are available to correct them. One com¬ 
monly used procedure is the one proposed 
by Newey and West (1987) referred to as the 
"Newey-West corrections," and its extension by 
Andrews (1991). 
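To make the idea concrete, the sketch below applies a Newey-West style correction to the simplest possible case, the standard error of the mean of a single time series (for example, a series of monthly factor returns). It uses Bartlett weights on the sample autocovariances. The function name is hypothetical and the number of lags is an input that the user must choose; this is an illustration rather than a full regression implementation.

```python
import numpy as np

def newey_west_se_of_mean(x, lags):
    """Standard error of the sample mean, robust to heteroskedasticity
    and serial correlation up to `lags` lags (Bartlett weights)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    e = x - x.mean()
    long_run_var = e @ e / T                      # lag-0 autocovariance
    for lag in range(1, lags + 1):
        gamma = e[lag:] @ e[:-lag] / T            # lag-`lag` autocovariance
        weight = 1.0 - lag / (lags + 1.0)         # Bartlett kernel weight
        long_run_var += 2.0 * weight * gamma
    return np.sqrt(long_run_var / T)
```

With `lags=0` the result collapses to the usual standard error computed with a divisor of T, which makes it easy to see how much the correction changes the inference.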

Petersen (2009) provides guidance on choosing
the appropriate method to use for correctly
calculating standard errors in panel data
regressions when the residuals are correlated.
He shows the relative accuracy of the different
methods depends on the structure of the
data. In the presence of firm effects, where
the residuals of a given firm may be correlated
across years, ordinary least squares (OLS),
Newey-West (modified for panel data sets), or
Fama-MacBeth,⁶ corrected for first-order
autocorrelation, all produce biased standard errors.
To correct for this, Petersen recommends using
standard errors clustered by firms. If the firm
effect is permanent, the fixed effects and
random effects models produce unbiased standard
errors. In the presence of time effects, where
the residuals of a given period may be correlated
across different firms (cross-sectional
dependence), Fama-MacBeth produces unbiased
standard errors. Furthermore, standard errors
clustered by time are unbiased when there are a
sufficient number of clusters. To select the
correct approach he recommends determining the
form of dependence in the data and comparing
the results from several methods.

Gow, Ormazabal, and Taylor (2009) evaluate 
empirical methods used in accounting research 
to correct for cross-sectional and time-series de¬ 
pendence. They review each of the methods, 
including several methods from the account¬ 
ing literature that have not previously been 
formally evaluated, and discuss when each
method produces valid inferences.


Multicollinearity

Multicollinearity occurs when two or more
independent variables are highly correlated. We
may encounter several problems when this
happens. First, it is difficult to determine which
factors influence the dependent variable. Second,
the individual p values can be misleading: a
p value can be high even if the variable is
important. Third, the confidence intervals for the
regression coefficients will be wide. They may
even include zero. This implies that we cannot
determine whether an increase in the independent
variable is associated with an increase or
a decrease in the dependent variable. There
is no formal solution based on theory to correct
for multicollinearity. The best way to correct for
multicollinearity is by removing one or more of
the correlated independent variables. It can also
be reduced by increasing the sample size.

Fama-MacBeth Regression

To address the inference problem caused by
the correlation of the residuals, Fama and
MacBeth (1973) proposed the following methodology
for estimating cross-sectional regressions
of returns on factors. For notational simplicity,
we describe the procedure for one factor. The
multifactor generalization is straightforward.

First, for each point in time $t$ we perform a
cross-sectional regression:

$$r_{i,t} = \beta_{i,t} f_t + \varepsilon_{i,t}, \quad i = 1, 2, \ldots, N$$

In the academic literature, the regressions are
typically performed using monthly or quarterly
data, but the procedure could be used at any
frequency.

The mean and standard errors of the time
series of slopes and residuals are evaluated to
determine the significance of the cross-sectional
regression. We estimate $f$ and $\varepsilon_i$ as the average
of their cross-sectional estimates, therefore,

$$\hat{f} = \frac{1}{T} \sum_{t=1}^{T} \hat{f}_t, \qquad \hat{\varepsilon}_i = \frac{1}{T} \sum_{t=1}^{T} \hat{\varepsilon}_{i,t}$$

The variations in the estimates determine the
standard error and capture the effects of
residual correlation without actually estimating the
correlations.⁷ We use the standard deviations of
the cross-sectional regression estimates to
calculate the sampling errors for these estimates

$$\sigma^2(\hat{f}) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{f}_t - \hat{f})^2, \qquad \sigma^2(\hat{\varepsilon}_i) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{\varepsilon}_{i,t} - \hat{\varepsilon}_i)^2$$

Cochrane (2005) provides a detailed
analysis of this procedure and compares it to
cross-sectional OLS and pooled time-series
cross-sectional OLS. He shows that when the
factors do not vary over time and the residuals
are cross-sectionally correlated, but not
correlated over time, then these procedures are all
equivalent.
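The two steps of the procedure can be sketched as follows for the one-factor case. This is a minimal illustration, not a production implementation: we assume a balanced panel with no missing data, a regression without intercept as in the equation above, and a standard error equal to the time-series standard deviation of the period-by-period slopes divided by the square root of T. The function name and argument layout are hypothetical.

```python
import numpy as np

def fama_macbeth(returns, exposures):
    """One-factor Fama-MacBeth estimate.

    returns, exposures: T x N arrays (T periods, N stocks).
    Step 1: one cross-sectional OLS regression per period (no intercept).
    Step 2: average the slopes and use their dispersion for the
    standard error.
    """
    r = np.asarray(returns, dtype=float)
    x = np.asarray(exposures, dtype=float)
    f_t = (x * r).sum(axis=1) / (x * x).sum(axis=1)   # OLS slope for each period
    T = f_t.size
    f_hat = f_t.mean()                                # time-series average slope
    se = np.sqrt(((f_t - f_hat) ** 2).sum() / T**2)   # sampling error of f_hat
    return f_hat, se, f_hat / se                      # estimate, std. error, t-statistic
```

Because the standard error comes from the time-series variation of the slopes, cross-sectional correlation of the residuals within a period is absorbed without being estimated explicitly.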


Information Coefficients 

To determine the forecast ability of a model, 
practitioners commonly use a statistic called 
the information coefficient (IC). The IC is a lin¬ 
ear statistic that measures the cross-sectional 
correlation between a factor and its subsequent
realized return:⁸

$$IC_{t,t+k} = \mathrm{corr}(f_t, r_{t,t+k})$$

where $f_t$ is a vector of cross-sectional factor
values at time $t$ and $r_{t,t+k}$ is a vector of returns over
the time period $t$ to $t + k$.

Just like the standard correlation coefficient, 
the values of the IC range from -1 to +1. A positive
IC indicates a positive relation between the
factor and return. A negative IC indicates a neg¬ 
ative relation between the factor and return. ICs 
are usually calculated over an interval, for ex¬ 
ample, daily or monthly. We can evaluate how 
a factor has performed by examining the time 
series behavior of the ICs. Looking at the mean 
IC tells how predictive the factor has been over 
time. 

An alternate specification of this measure is to
make $f_t$ the rank of a cross-sectional factor. This
calculation is similar to the Spearman rank correlation
coefficient. By using the rank of the factor, we focus
on the ordering of the factor instead of its value. 
Ranking the factor value reduces the undue in¬ 
fluence of outliers and reduces the influence of 
variables with unequal variances. For the same 
reasons, we may also choose to rank the returns 
instead of using their numerical value. 
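Both versions of the IC are a single correlation per cross-section. The sketch below is an illustration with hypothetical names; for simplicity it ranks both the factor and the returns when `use_ranks` is set, and it does not average tied ranks, which a careful Spearman calculation would do.

```python
import numpy as np

def information_coefficient(factor, fwd_ret, use_ranks=False):
    """Cross-sectional correlation between factor values at time t
    and returns over the period t to t + k.

    Assumption: with use_ranks=True both inputs are replaced by their
    ranks (ties are not averaged in this simple version).
    """
    f = np.asarray(factor, dtype=float)
    r = np.asarray(fwd_ret, dtype=float)
    if use_ranks:
        f = f.argsort().argsort().astype(float)   # rank of each factor value
        r = r.argsort().argsort().astype(float)   # rank of each return
    return np.corrcoef(f, r)[0, 1]
```

An outlier in the factor can pull the plain IC well away from the rank-based one, which is the effect the ranking is meant to remove.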

Sorensen, Qian, and Hua (2007) present a 
framework for factor analysis based on ICs. 
Their measure of IC is the correlation between 
the factor ranks, where the ranks are the
normalized z-score of the factor,⁹ and subsequent
return. Intuitively, this IC calculation measures 
the return associated with a one standard devia¬ 
tion exposure to the factor. Their IC calculation 
is further refined by risk adjusting the value. 
To risk adjust, the authors remove systematic 
risks from the IC and accommodate the IC for 
specific risk. By removing these risks, Qian and 
Hua (2004) show that the resulting ICs provide 
a more accurate measure of the return forecast¬ 
ing ability of the factor. 

The subsequent realized returns to a fac¬ 
tor typically vary over different time horizons. 
For example, the return to a factor based on 


price reversal is realized over short horizons, 
while valuation metrics such as EBITDA/EV 
are realized over longer periods. It therefore 
makes sense to calculate multiple ICs for a 
set of factor forecasts whereby each calculation 
varies the horizon over which the returns are 
measured. 

The IC methodology has many of the same ad¬ 
vantages as regression models. The procedure 
is easy to implement. The functional relation¬ 
ship between factor and subsequent returns is 
known (linear). 

ICs can also be used to assess the risk of
factors and trading strategies. The standard
deviation of the time series (with respect to $t$) of
ICs for a particular factor, $\mathrm{std}(IC_{t,t+k})$, can be
interpreted as the strategy risk of a factor.
Examining the time series behavior of $\mathrm{std}(IC_{t,t+k})$
over different time periods may give a better
understanding of how often a particular factor
may fail. Qian and Hua show that $\mathrm{std}(IC_{t,t+k})$
can be used to more effectively understand the
active risk of investment portfolios. Their
research demonstrates that ex post tracking
error often exceeds the ex ante tracking error provided
by risk models. The difference in tracking error
occurs because tracking error is a function of
both ex ante tracking error from a risk model
and the variability of information coefficients,
$\mathrm{std}(IC_{t,t+k})$. They define the expected tracking
error as

$$\sigma_{TE} = \mathrm{std}(IC_{t,t+k}) \sqrt{N} \, \sigma_{\mathrm{model}} \, \mathrm{dis}(R_t)$$

where $N$ is the number of stocks in the
universe (breadth), $\sigma_{\mathrm{model}}$ is the risk model
tracking error, and $\mathrm{dis}(R_t)$ is the dispersion of returns¹⁰
defined by

$$\mathrm{dis}(R_t) = \mathrm{std}(r_{1,t}, r_{2,t}, \ldots, r_{N,t})$$
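The expected tracking error is a product of four terms, so it is simple to evaluate once the inputs are available. In the sketch below the function and argument names are hypothetical, and the use of sample standard deviations (divisor one less than the number of observations) is our own assumption.

```python
import numpy as np

def expected_tracking_error(ic_series, n_stocks, sigma_model, cross_section_returns):
    """std(IC) * sqrt(N) * sigma_model * dis(R_t).

    ic_series: time series of ICs for one factor and horizon;
    cross_section_returns: returns of the N stocks in one period.
    Assumption: sample standard deviations (ddof=1) are used.
    """
    std_ic = np.std(ic_series, ddof=1)                 # strategy risk
    dispersion = np.std(cross_section_returns, ddof=1) # dis(R_t)
    return std_ic * np.sqrt(n_stocks) * sigma_model * dispersion
```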

Example: Information Coefficients 
Figure 4 displays the time-varying behavior of
ICs for each one of the factors EBITDA/EV,
growth of fiscal year 1 and fiscal year 2
earnings estimates, revisions, and momentum. The
graph shows the time series average of
information coefficients:

$$\overline{IC}_k = \mathrm{mean}(IC_{t,t+k})$$

The graph depicts the information horizons for
each factor, showing how subsequent return is
realized over time. The vertical axis shows the
size of the average information coefficient $\overline{IC}_k$
for $k = 1, 2, \ldots, 15$.

Figure 4 Information Coefficients over Various Horizons for EBITDA/EV, Growth of Fiscal Year 1 and
Fiscal Year 2 Earnings Estimates, Revisions, and Momentum Factors
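An information horizon curve of this kind can be computed by repeating the IC calculation for each holding period and averaging over time. The sketch below is illustrative: the function name is hypothetical, and we assume a balanced panel in which row t of `returns` is the return earned over period t, so that the k-period return following the factor observation at t compounds rows t + 1 through t + k.

```python
import numpy as np

def mean_ic_by_horizon(factor, returns, max_k):
    """Average IC for horizons k = 1, ..., max_k.

    factor, returns: T x N arrays with T > max_k.
    Assumption: returns[t] is the simple return over period t.
    """
    f = np.asarray(factor, dtype=float)
    r = np.asarray(returns, dtype=float)
    T = f.shape[0]
    mean_ics = []
    for k in range(1, max_k + 1):
        ics = []
        for t in range(T - k):
            # compounded return from t to t + k
            fwd = np.prod(1.0 + r[t + 1 : t + k + 1], axis=0) - 1.0
            ics.append(np.corrcoef(f[t], fwd)[0, 1])
        mean_ics.append(np.mean(ics))                 # time-series average IC
    return np.array(mean_ics)
```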

Specifically, the EBITDA/EV factor starts at 
almost 0.03 and monotonically increases as the 
investment horizon lengthens from one month 
to 15 months. At 15 months, the EBITDA/EV 
factor has an IC of 0.09, the highest value among 
all the factors presented in the graph. This re¬ 
lationship suggests that the EBITDA/EV fac¬ 
tor earns higher returns as the holding period 
lengthens. 

The other ICs of the factors in the graph are 
also interesting. The growth of fiscal year 1 
and fiscal year 2 earnings estimates factor is 
defined as the growth in current fiscal year 
(fy1) earnings estimates to the next fiscal year
(fy2) earnings estimates provided by sell-side
analysts.¹¹ We call the growth of fiscal year 1
and fiscal year 2 earnings estimates factor the 
earnings growth factor throughout the remainder 
of the entry. The IC is negative and decreases 
as the investment horizon lengthens. The mo¬ 
mentum factor starts with a positive IC of 0.02 


and increases to approximately 0.055 in the fifth 
month. After the fifth month, the IC decreases. 
The revisions factor starts with a positive IC 
and increases slightly until approximately the 
eleventh month at which time the factor begins 
to decay. 

Looking at the overall patterns in the graph, 
we see that the return realization pattern to dif¬ 
ferent factors varies. One notable observation 
is that the returns to factors don't necessarily 
decay but sometimes grow with the holding 
period. Understanding the multiperiod effects 
of each factor is important when we want to 
combine several factors. This information may 
influence how one builds a model. For exam¬ 
ple, we can explicitly incorporate this informa¬ 
tion about information horizons into our model 
by using a function that describes the decay or 
growth of a factor as a parameter to be cali¬ 
brated. Implicitly, we could incorporate this in¬ 
formation by changing the holding period for a 
security traded for our trading strategy. Specifi¬ 
cally, Sneddon (2008) discusses an example that 
combines one signal that has short-range pre¬ 
dictive power with another that has long-range 
power. Incorporating this information about the 
information horizon often improves the return 
potential of a model. Kolm (2010) describes a 
general multiperiod model that combines in¬ 
formation decay, market impact costs, and real 
world constraints. 
Factor Portfolios

Factor portfolios are constructed to measure the 
information content of a factor. The objective 
is to mimic the return behavior of a factor and 
minimize the residual risk. Similar to portfolio 
sorts, we evaluate the behavior of these factor 
portfolios to determine whether a factor earns 
a systematic premium. 

Typically, a factor portfolio has a unit expo¬ 
sure to a factor and zero exposure to other fac¬ 
tors. Construction of factor portfolios requires 
holding both long and short positions. We can 
also build a factor portfolio that has exposure 
to multiple attributes, such as beta, sectors, or 
other characteristics. For example, we could 
build a portfolio that has a unit exposure to 
book-to-price and small size stocks. Portfolios 
with exposures to multiple factors provide the 
opportunity to analyze the interaction of differ¬ 
ent factors. 

A Factor Model Approach 

By using a multifactor model, we can build factor
portfolios that control for different risks.¹²
We decompose return and risk at a point in time 
into a systematic and specific component using 
the regression: 

r = Xb + u 

where r is an N vector of excess returns of the 
stocks considered, X is an N by K matrix of fac¬ 
tor loadings, b is a K vector of factor returns, 
and u is an N vector of firm specific returns 
(residual returns). Here, we assume that factor 
returns are uncorrelated with the firm specific 
return. Further assuming that firm specific re¬ 
turns of different companies are uncorrelated, 
the N by N covariance matrix of stock returns 
V is given by 

V = XFX' + A 

where F is the K by K factor return covariance 
matrix and A is the N by N diagonal matrix of 
variances of the specific returns. 


We can use the Fama-MacBeth procedure dis¬ 
cussed earlier to estimate the factor returns over 
time. Each month, we perform a GLS regression 
to obtain 

$$b = (X' A^{-1} X)^{-1} X' A^{-1} r$$

OLS would give us an unbiased estimate, but
since the residuals are heteroskedastic the GLS
methodology is preferred and will deliver a
more efficient estimate. The resulting holdings
for each factor portfolio are given by the rows
of $(X' A^{-1} X)^{-1} X' A^{-1}$.
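The GLS estimate and the implied factor portfolio holdings can be computed together. The sketch below is an illustration with hypothetical names; it assumes that the specific variances on the diagonal of A are known and strictly positive, and it exploits the diagonal structure rather than inverting an N by N matrix.

```python
import numpy as np

def gls_factor_returns(X, r, specific_var):
    """GLS factor returns and factor portfolio holdings for one period.

    X: N x K factor loadings; r: N excess returns;
    specific_var: N specific variances (the diagonal of A).
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    inv_var = 1.0 / np.asarray(specific_var, dtype=float)   # diagonal of A^{-1}
    Xt_Ainv = X.T * inv_var                                  # K x N matrix X' A^{-1}
    holdings = np.linalg.solve(Xt_Ainv @ X, Xt_Ainv)         # rows are factor portfolios
    b = holdings @ r                                         # estimated factor returns
    return b, holdings
```

Multiplying the holdings by X gives the identity matrix, which confirms that each factor portfolio has unit exposure to its own factor and zero exposure to the others.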

An Optimization-Based Approach 
A second approach to build factor portfolios 
uses mean-variance optimization. Using op¬ 
timization techniques provides a flexible ap¬ 
proach for implementing additional objectives 
and constraints.¹³

Using the notation from the previous
subsection, we denote by $X$ the set of factors. We
would like to construct a portfolio that has
maximum exposure to one target factor from $X$ (the
alpha factor), zero exposure to all other factors,
and minimum portfolio risk. Let us denote the
alpha factor by $X_\alpha$, and all the remaining ones
by $X_\sigma$. Then the resulting optimization problem
takes the form

$$\max_w \left\{ w' X_\alpha - \frac{1}{2} w' V w \right\} \quad \text{s.t.} \quad w' X_\sigma = 0$$

The analytical solution to this optimization
problem is given by

$$h^* = V^{-1} \left[ I - X_\sigma (X_\sigma' V^{-1} X_\sigma)^{-1} X_\sigma' V^{-1} \right] X_\alpha$$
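The closed-form solution can be evaluated with a few linear solves. The sketch below is illustrative and rests on our reading of the problem: the objective is the exposure to the alpha factor minus one half of the portfolio variance, subject to zero exposure to the remaining factors. The function name is hypothetical, and V is assumed to be positive definite.

```python
import numpy as np

def optimized_factor_portfolio(V, x_alpha, X_sigma):
    """Holdings with zero exposure to the columns of X_sigma.

    V: N x N covariance matrix; x_alpha: N exposures to the alpha
    factor; X_sigma: N x M exposures to the other factors.
    """
    V = np.asarray(V, dtype=float)
    x_alpha = np.asarray(x_alpha, dtype=float)
    X_sigma = np.asarray(X_sigma, dtype=float)
    Vinv_xa = np.linalg.solve(V, x_alpha)      # V^{-1} X_alpha
    Vinv_Xs = np.linalg.solve(V, X_sigma)      # V^{-1} X_sigma
    # multipliers that enforce the zero-exposure constraints
    mu = np.linalg.solve(X_sigma.T @ Vinv_Xs, X_sigma.T @ Vinv_xa)
    return Vinv_xa - Vinv_Xs @ mu              # equals the bracketed formula applied to X_alpha
```

Checking that `X_sigma.T @ h` is numerically zero is a quick way to confirm that the constraints hold.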

We may want to add additional constraints to 
the problem. Constraints are added to make fac¬ 
tor portfolios easier to implement and meet ad¬ 
ditional objectives. Some common constraints 
include limitations on turnover, transaction 
costs, the number of assets, and liquidity
preferences. These constraints¹⁴ are typically
implemented as linear inequality constraints. When
no analytical solution is available to solve the
optimization with linear inequality constraints,
we have to resort to quadratic programming
(QP).¹⁵

PERFORMANCE EVALUATION OF FACTORS

Analyzing the performance of different factors 
is an important part of the development of a 
factor-based trading strategy. A researcher may 
construct and analyze over a hundred different 
factors, so a process to evaluate and compare 
these factors is needed. Most often this process 
starts by trying to understand the time-series 
properties of each factor in isolation and then 
study how they interact with each other. 

To give a basic idea of how this process may be 
performed, we use the five factors introduced 
earlier in this entry: EBITDA/EV, revisions, 
share repurchase, momentum, and earnings 
growth. These are a subset of the factors that 
we use in the factor trading strategy model 
discussed later in the entry. We choose a limited 
number of factors for ease of exposition. In par¬ 
ticular, we emphasize those factors that possess 
more interesting empirical characteristics. 

Figure 5(A) presents summary statistics of
monthly returns of long-short portfolios
constructed from these factors. We observe that the
average monthly return ranges from -0.05%
for the earnings growth factor to 0.90% for the
momentum factor. The t-statistics for the mean
return are significant at the 95% level for the
EBITDA/EV and share repurchase factors, and
fall just short of that level for the momentum
factor (t = 1.90). The monthly volatility ranges from
3.77% for the revisions factor to 7.13% for the
momentum factor. In other words, the return
and risk characteristics among factors vary
significantly. We note that the greatest monthly
drawdown has been large to very large for
all of the factors, implying significant
downside risk. Overall, the results suggest that there
is a systematic premium associated with the
EBITDA/EV, share repurchase, and momentum
factors.

Let pctPos and pctNeg denote the fraction of
positive and negative returns over time,
respectively. These measures offer another way of
interpreting the strength and consistency of the
returns to a factor. For example, EBITDA/EV
and momentum have t-statistics of 2.16 and
1.90, respectively, indicating that the former
is stronger. However, pctPos (pctNeg) are 0.55
versus 0.61 (0.45 versus 0.39), showing that
positive returns to momentum occur more
frequently. This may provide reassurance of the
usefulness of the momentum factor, despite the
fact that its t-statistic is below the 95% level.

A. Summary Statistics of Monthly Returns of Long-Short Portfolios

                   Mean   Stdev  Median  t-stat    Max     Min  pctPos  pctNeg
Revisions          0.29    3.77    0.77    1.17  10.43  -19.49    0.55    0.45
EBITDA/EV          0.83    5.80    0.72    2.16  31.61  -30.72    0.55    0.45
Share repurchase   0.72    3.89    0.43    2.78  22.01  -14.06    0.61    0.39
Momentum           0.90    7.13    0.97    1.90  25.43  -42.71    0.61    0.39
Earnings growth   -0.05    4.34    0.25   -0.18  14.03  -23.10    0.53    0.47

B. Correlations between Long-Short Portfolios

Figure 5 Results from Portfolio Sorts

Figure 6 Cumulative Returns of Long-Short Portfolios
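The columns of panel A can be reproduced from a series of long-short returns with a short helper. This is a sketch with hypothetical names; we assume the t-statistic is the mean divided by the sample standard deviation over the square root of the number of observations.

```python
import numpy as np

def summarize_returns(returns):
    """Summary statistics of one long-short return series."""
    r = np.asarray(returns, dtype=float)
    T = r.size
    mean = r.mean()
    stdev = r.std(ddof=1)
    return {
        "mean": mean,
        "stdev": stdev,
        "median": float(np.median(r)),
        "t_stat": mean / (stdev / np.sqrt(T)),   # test of a zero mean return
        "max": r.max(),
        "min": r.min(),
        "pctPos": float(np.mean(r > 0)),         # fraction of positive returns
        "pctNeg": float(np.mean(r < 0)),         # fraction of negative returns
    }
```

As a rough consistency check, and assuming about 228 monthly observations between 12/1989 and 12/2008, the EBITDA/EV figures give 0.83 / (5.80 / √228) ≈ 2.16 and the momentum figures give 0.90 / (7.13 / √228) ≈ 1.90, in line with the reported t-statistics.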

Figure 5(B) presents unconditional correla¬ 
tion coefficients of monthly returns for long- 
short portfolios. The comovement of factor 
returns varies among the factors. The lowest 
correlation is -0.28 between EBITDA/EV and
revisions. The highest correlation is 0.79 be¬ 
tween momentum and revisions. In addition, 
we observe that the correlation between re¬ 
visions and share repurchase, and between 
EBITDA/EV and earnings growth are close to 
zero. The broad range of correlations provides 
evidence that combining uncorrelated factors 
could produce a successful strategy. 

Figure 6 presents the cumulative returns for 
the long-short portfolios. The returns of the 
long-short factor portfolios experience sub¬ 
stantial volatility. We highlight the following 
patterns of cumulative returns for the different 
factors: 

• The cumulative return of the revisions factor
is positive in the early periods (12/1989 to
6/1998). While it is volatile, its cumulative
return is higher in the next period (7/1998 to
7/2000). It deteriorates sharply in the
following period (8/2000 to 6/2003), and levels out
in the later periods (7/2003 to 12/2008).

• The performance of the EBITDA/EV factor
is consistently positive in the early periods
(12/1989 to 9/1998), deteriorates in the next
period (10/1998 to 1/2000) and rebounds
sharply (2/2000 to 7/2002), grows at a slower
but more historically consistent rate in the
later periods (8/2002 to 4/2007), deteriorates
in the next period (5/2007 to 9/2007), and
returns to more historically consistent returns
in the last period (10/2007 to 12/2008).

• The cumulative return of the share repurchase
factor grows at a slower pace in the early
years (12/1989 to 5/1999), falls slightly in the
middle periods (6/1999 to 1/2000), rebounds
sharply (2/2000 to 7/2002), falls then flattens
out in the next period (8/2002 to 4/2008),
and increases at a large rate late in the graph
(5/2008 to 12/2008).

• The momentum factor experiences the largest
volatility. This factor performs consistently
well in the early period (12/1989 to 12/1998),
experiences sharp volatility in the middle
period (1/1999 to 5/2003), flattens out (6/2003
to 6/2007), and grows at an accelerating rate
from 7/2007 to 12/2008.

• The performance of the earnings growth
factor is flat or negative throughout the entire
period.

The overall pattern of the cumulative returns
among the factors clearly illustrates that factor
returns and correlations are time varying.

A. Basic Statistics for Monthly Information Coefficients

                   Mean   Stdev  Median  t-stat    Max     Min  pctPos  pctNeg
Revisions          0.02    0.10    0.02    2.51   0.31   -0.29    0.58    0.42
EBITDA/EV          0.03    0.13    0.02    3.13   0.48   -0.41    0.59    0.41
Share repurchase  -0.01    0.10   -0.00   -2.13   0.20   -0.45    0.48    0.52
Momentum           0.03    0.18    0.05    2.86   0.50   -0.57    0.59    0.41
Earnings growth   -0.00    0.13    0.00   -0.56   0.26   -0.28    0.51    0.49

B. Correlations for Monthly Average Information Coefficients

Figure 7 Summary of Monthly Factor Information Coefficients

In Figure 7(A), we present summary statistics
of the monthly information coefficients of the
factors. The average monthly information
coefficients range from 0.03 for EBITDA/EV and
momentum to -0.01 for the share repurchase
factor. The t-statistics for the mean ICs are
significant at the 95% level for all factors except
earnings growth. With the exception of share
repurchase and earnings growth, the fraction of
months with positive values (pctPos) is significantly
greater than the fraction with negative values (pctNeg).
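The columns of Figure 7(A) can be reproduced from a series of monthly ICs. The following is a minimal sketch, assuming the t-statistic is the mean divided by its standard error and that pctPos and pctNeg are simple fractions of months; the function name `ic_summary` and the sample values are hypothetical and not taken from the source.

```python
import math
import statistics


def ic_summary(ics):
    """Summary statistics for a list of monthly information coefficients."""
    n = len(ics)
    mean = statistics.fmean(ics)
    stdev = statistics.stdev(ics)  # sample standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "median": statistics.median(ics),
        # Assumption: t-statistic of the mean IC = mean / standard error.
        "t_stat": mean / (stdev / math.sqrt(n)),
        "max": max(ics),
        "min": min(ics),
        "pct_pos": sum(ic > 0 for ic in ics) / n,
        "pct_neg": sum(ic < 0 for ic in ics) / n,
    }


# Hypothetical monthly ICs, for illustration only.
example_ics = [0.05, -0.02, 0.10, 0.03, -0.04, 0.08, 0.01, -0.01]
summary = ic_summary(example_ics)
```

With a long enough history, a t-statistic of roughly 2 or more in absolute value corresponds to the 95% significance level referred to in the text.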

The share repurchase factor requires some
comments. The information coefficient is
negative, in contrast to the positive return in
the long-short portfolio sorts, because negative
share repurchases are correlated with
subsequent return. The information coefficient is
lower than we would expect because there is
not a strong linear relation between the return
and the measures. As the results from the
portfolio sorts indicate, the extreme values of this
factor provide the highest returns.

Figure 7(B) displays unconditional correlation
coefficients of the monthly information
coefficients. The comovement of the ICs
varies among the factors. The lowest
correlation is -0.66 between EBITDA/EV and
share repurchases. Again, this should be
interpreted with caution because it is negative
repurchases that we view as attractive. The
highest correlation reported in the exhibit is 0.79
between momentum and revisions. Similar to
the correlation of long-short factor portfolio
returns, the diverse set of correlations provides
evidence that combining uncorrelated factors
may produce a successful strategy.

In Figure 8(A), we present summary statis¬ 
tics of the time series average of the monthly 
coefficients from the Fama-MacBeth (FM) re¬ 
gressions of the factors. The information pro¬ 
vided by the FM coefficients differs from the 
information provided by portfolio sorts. The 
FM coefficients show the linear relationship be¬ 
tween the factor and subsequent returns, while 
the results from the portfolio sorts provide in¬ 
formation on the extreme values of the factors 
and subsequent returns. The difference in the

A. Basic Statistics for Fama-MacBeth Regression Coefficients

                      Mean   Stdev  Median  t-stat     Max     Min  pctPos  pctNeg
Revisions             0.09    1.11    0.22    1.22    3.36   -5.26    0.59    0.41
EBITDA/EV             0.27    1.61    0.14    2.50    8.69   -7.81    0.59    0.41
Share repurchase     -0.18    0.96   -0.06   -2.90    3.21   -5.91    0.44    0.56
Momentum              0.31    2.42    0.29    1.94    9.97  -12.37    0.60    0.40
Earnings growth      -0.08    0.99   -0.04   -1.20    2.83   -4.13    0.48    0.52

B. Correlations for Fama-MacBeth Regression Coefficients

                                                 Share                Earnings
                     Revisions   EBITDA/EV  Repurchase    Momentum      Growth
Revisions                 1.00       -0.27        0.05        0.77       -0.26
EBITDA/EV                -0.27        1.00       -0.75       -0.18       -0.58
Share repurchase          0.05       -0.75        1.00       -0.04        0.64
Momentum                  0.77       -0.18       -0.04        1.00       -0.18
Earnings growth          -0.26       -0.58        0.64       -0.18        1.00

Figure 8 Summary of Monthly Fama-MacBeth Regression Coefficients


size of the mean returns between the FM
coefficients and portfolio sorts exists partially because
the intercept terms from the FM regressions are
not reported in the exhibit.

The average monthly FM coefficient ranges
from -0.18 for share repurchase to 0.31 for the
momentum factor. Again, the share repurchase
results should be interpreted with caution
because it is negative repurchases that we view as
attractive. The t-statistics are significant at the
95% level for the EBITDA/EV and share
repurchase factors.
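The two-step logic behind these statistics can be sketched in a few lines. This is a simplified illustration, assuming a single factor per regression (the regressions summarized in Figure 8 may include several factors at once): each month a cross-sectional OLS regression of subsequent returns on the factor is run, and the time series of slopes is then averaged and tested. The names `ols_slope` and `fama_macbeth` and the toy data are hypothetical.

```python
import math
import statistics


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (intercept included)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def fama_macbeth(factor_by_month, return_by_month):
    """Step 1: one cross-sectional slope per month.
    Step 2: time series mean of the slopes and its t-statistic."""
    slopes = [ols_slope(f, r) for f, r in zip(factor_by_month, return_by_month)]
    mean = statistics.fmean(slopes)
    t_stat = mean / (statistics.stdev(slopes) / math.sqrt(len(slopes)))
    return slopes, mean, t_stat


# Toy data: three months, four stocks (illustrative values only).
factors = [[1.0, 2.0, 3.0, 4.0]] * 3
returns = [
    [0.01, 0.02, 0.03, 0.04],
    [0.02, 0.04, 0.06, 0.08],
    [0.03, 0.03, 0.03, 0.03],
]
slopes, fm_mean, fm_t = fama_macbeth(factors, returns)
```

Because the t-statistic is computed from the time series variation of the monthly slopes, it does not rely on the residuals within any single cross-section being uncorrelated.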

Also, we compare the results of portfolio sorts
in Figure 7(A) with the FM coefficients in
Figure 8(A). The rank ordering of the magnitude of
factor returns is similar between the two
panels. The t-statistics are slightly higher in the FM
regressions than the portfolio sorts. The
correlation coefficients for the portfolio sorts in Figure
7(B) are consistent with the FM coefficients in
Figure 8(B) for all the factors except for share
repurchases. The results for share repurchases
need to be interpreted with caution because it is
negative repurchases that we view as attractive.
The portfolio sorts take that into account while
FM regressions do not.

To better understand the time variation of
the performance of these factors, we calculate
rolling 24-month mean returns and correlations
of the factors. The results are presented in
Figure 9. We see that the returns and correlations to
all factors are time varying. A few of the time
series experience large volatility in the rolling
24-month returns. The EBITDA/EV factor shows
the largest variation followed by the momentum
and share repurchase factors. All factors
experience periods where the rolling average
returns are both positive and negative.

Figure 10 presents the rolling correlation be¬ 
tween pairs of the factors. There is substantial 
variability in many of the pairs. In most cases 
the correlation moves in a wave-like pattern. 
This pattern highlights the time-varying prop¬ 
erty of the correlations among the factors. This 
property will be important to incorporate in a 
factor trading model. The most consistent cor¬ 
relation is between momentum and revisions 
factors and this correlation is, in general, fairly 
high. 
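The rolling statistics behind Figures 9 and 10 can be sketched as follows. This is a minimal illustration, assuming each window covers the most recent 24 monthly observations and that the correlation is the ordinary Pearson coefficient; the function names and toy series are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rolling_mean(series, window=24):
    """Mean over each trailing window; the first value uses series[0:window]."""
    return [statistics.fmean(series[i - window:i])
            for i in range(window, len(series) + 1)]


def rolling_corr(x, y, window=24):
    """Correlation of x and y over each trailing window."""
    return [pearson(x[i - window:i], y[i - window:i])
            for i in range(window, len(x) + 1)]


# Toy series with a short window so the result is easy to follow.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1.0, 2.0, 3.0, 2.0, 1.0]
means = rolling_mean(a, window=3)
corrs = rolling_corr(a, b, window=3)
```

In the toy example the correlation moves from strongly positive to strongly negative across consecutive windows, which is the kind of time variation the text describes.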

MODEL CONSTRUCTION 
METHODOLOGIES FOR A 
FACTOR-BASED TRADING 
STRATEGY 

In the previous section, we analyzed the
performance of each factor. The next step in
building our trading strategy is to determine
how to combine the factors into one model. The
key aspects of building this model are to
(1) determine which factors to use out of the
universe of factors that we have, and (2) decide
how to weight them.

Figure 9 Rolling 24-Month Mean Returns for the Factors


We describe four methodologies to combine
and weight factors to build a model for a trading
strategy. These methodologies are used to
translate the empirical work on factors into
a working model. Most of the methodologies
are flexible in their specification and there is
some overlap between them. Though the list
is not exhaustive, we highlight those processes
frequently used by quantitative portfolio managers
and researchers today. The four methodologies
are the data driven, the factor model,
the heuristic, and the optimization approaches.

It is important to be careful how each method¬ 
ology is implemented. In particular, it is crit¬ 
ical to balance the iterative process of finding 
a robust model with good forecasting ability 
versus finding a model that is a result of data 
mining. 


The Data Driven Approach 

A data driven approach uses statistical methods
to select and weight factors in a forecasting
model. This approach uses returns as the
dependent variables and factors as the independent
variables. There are a variety of estimation
procedures, such as neural nets, classification
trees, and principal components, that can be
used to estimate these models. Usually a statistic
is established to determine the criteria for a
successful model. The algorithm of the statistical
method evaluates the data and compares
the results against the criteria.

Many data driven approaches impose no structural
assumptions on the potential relationships the
statistical method finds. Therefore, it is sometimes
difficult to understand or even explain
the relationships among the variables
used in the model.

Deistler and Hamann (2005) provide an
example of a data driven approach to model
development. The model they develop is used
for forecasting the returns to financial stocks.
To start, they split their data sample into
two parts: an in-sample part for building the
model and an out-of-sample part to validate
the model. They use three different types of
factor models for forecasting stock returns:
quasistatic principal components, quasistatic factor
models with idiosyncratic noise, and reduced
rank regression. For model selection Deistler
and Hamann use an iterative approach where
they find the optimal mix of factors based
on Akaike's information criterion and the
Bayesian information criterion. A large number
of different models are compared using the
out-of-sample data. They find that the reduced
rank model provides the best performance. This
model produced the highest out-of-sample R² values,
hit rates,¹⁶ and Diebold-Mariano test statistic¹⁷
among the different models evaluated.


The Factor Model Approach 

In this section, we briefly address the use of fac¬ 
tor models for forecasting. The goal of the factor 
model is to develop a parsimonious model that 
forecasts returns accurately. One approach is for 
the researcher to predetermine the variables to 
be used in the factor model based on economic 
intuition. The model is estimated and then the 
estimated coefficients are used to produce the 
forecasts. 

A second approach is to use statistical tools 
for model selection. In this approach we con¬ 
struct several models—often by varying the fac¬ 
tors and the number of factors used—and have 
them compete against each other, just like a 
horse race. We then choose the best perform¬ 
ing model. 

Factor model performance can be evaluated in
three ways. We can evaluate the fit, forecast
ability, and economic significance of the model. The
fit of a model is evaluated with
statistical measures including the model's
R² and adjusted R², and the F- and t-statistics of the
model coefficients.

There are several methods to evaluate how
well a model will forecast. West (2004) discusses
the theory and conventions of several measures
of relative model quality. These methods use the
resulting time series of predictions and prediction
errors from a model. In the case where we
want to compare models, West suggests ratios
or differences of mean, mean-square, or
mean-absolute prediction errors; correlation between
one model's prediction and another model's
realization (also known as forecast encompassing);
or comparison of utility or profit-based measures
of predictive ability. In other cases where
we want to assess a single model, he suggests
measuring the correlation between prediction
and realization, the serial correlation in
one-step-ahead prediction errors, the ability to predict
direction of change, and the model prediction
bias.
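Several of the single-model diagnostics listed above are simple functions of the prediction and realization series. The sketch below is illustrative only: it assumes the prediction error is defined as realization minus prediction, and it measures "direction of change" as agreement in sign between predicted and realized returns. The function name `forecast_diagnostics` and the sample values are hypothetical.

```python
import statistics


def forecast_diagnostics(predicted, realized):
    """Bias, mean-square error, mean-absolute error, and directional hit rate."""
    errors = [r - p for p, r in zip(predicted, realized)]
    # Assumption: a "hit" means the prediction and realization share a sign.
    hits = [(p > 0) == (r > 0) for p, r in zip(predicted, realized)]
    return {
        "bias": statistics.fmean(errors),
        "mse": statistics.fmean([e * e for e in errors]),
        "mae": statistics.fmean([abs(e) for e in errors]),
        "hit_rate": sum(hits) / len(hits),
    }


# Illustrative values only (for example, returns in percent).
predicted = [1.0, -1.0, 2.0, -2.0]
realized = [2.0, -2.0, -1.0, -1.0]
diag = forecast_diagnostics(predicted, realized)
```

To compare two models in the spirit of the text, the same function can be applied to each model's predictions and the resulting mean-square errors compared as a ratio or a difference.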

We can evaluate economic significance by 
using the model to predict values and using 
the predicted values to build portfolios. The 
profitability of the portfolios is evaluated by ex¬ 
amining statistics such as mean returns, infor¬ 
mation ratios, dollar profits, and drawdown. 


The Heuristic Approach 

The heuristic approach is another technique used 
to build trading models. Heuristics are based 
on common sense, intuition, and market in¬ 
sight and are not formal statistical or mathe¬ 
matical techniques designed to meet a given set 
of requirements. Heuristic-based models result 
from the judgment of the researcher. The re¬ 
searcher decides the factors to use, creates rules 
in order to evaluate the factors, and chooses 
how to combine the factors and implement the 
model. 

Piotroski (2000) applies a heuristic approach
in developing an investment strategy for
high-value stocks (high book-to-market firms). He
selects nine fundamental factors¹⁸ to measure
three areas of the firm's financial condition:
profitability, financial leverage and liquidity,
and operating efficiency. Depending on the
factor's implication for future prices and
profitability, each factor is classified as either "good"
or "bad." An indicator variable for the factor is
equal to one (zero) if the factor's realization is
good (bad). The sum of the nine binary factors
is the F_SCORE. This aggregate score measures
the overall quality, or strength, of the firm's
financial position. According to the historical
results provided by Piotroski, this trading
strategy is very profitable. Specifically, a trading
strategy that buys expected winners and shorts
expected losers would have generated a 23%
annual return between 1976 and 1996.
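The scoring rule itself is easy to express in code. The sketch below only illustrates the aggregation step described above (one point per "good" realization, summed over nine signals); the signal names are placeholders, since the nine factors are not listed in this passage.

```python
def f_score(signals):
    """Sum of binary indicators: 1 if a signal's realization is good, else 0."""
    return sum(1 if is_good else 0 for is_good in signals.values())


# Placeholder names: the nine fundamental signals are not enumerated here.
firm_signals = {
    "signal_1": True,
    "signal_2": True,
    "signal_3": False,
    "signal_4": True,
    "signal_5": False,
    "signal_6": True,
    "signal_7": True,
    "signal_8": False,
    "signal_9": True,
}
score = f_score(firm_signals)
```

The score ranges from 0 (all signals bad) to 9 (all signals good), so firms can be ranked or bucketed by it when forming portfolios.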

There are different approaches to evaluate 
a heuristic approach. Statistical analysis can 
be used to estimate the probability of incor¬ 
rect outcomes. Another approach is to evaluate 
economic significance. For example, Piotroski 
determines economic significance by forming 
portfolios based on the firm's aggregate score 
(F_SCORE) and then evaluates the size of the 
subsequent portfolio returns. 

There is no theory that can provide guidance 
when making modeling choices in the heuristic 
approach. Consequently, the researcher has to 
be careful not to fall into the data-mining trap. 


The Optimization Approach 

In this approach, we use optimization to select
and weight factors in a forecasting model. An
optimization approach allows us flexibility in
calibrating the model and simultaneously optimizing
an objective function specifying a desirable
investment criterion.

There is substantial overlap between opti¬ 
mization use in forecast modeling and portfolio 
construction. There is frequently an advantage 
in working with the factors directly, as opposed 
to all individual stocks. The factors provide a 
lower dimensional representation of the com¬ 
plete universe of the stocks considered. Besides 
the dimensionality reduction, which reduces 
computational time, the resulting optimization 
problem is typically more robust to changes in 
the inputs. 

Sorensen, Hua, Qian, and Schoen (2004)
present a process that uses an optimization
framework to combine a diverse set of factors
(alpha sources) into a multifactor model. Their
procedure assigns optimal weights across the
factors to achieve the highest information
ratio. They show that the optimal weights are a
function of average ICs and IC covariances.
Specifically,

$$ w \propto \operatorname{cov}(\mathrm{IC})^{-1} \times \mathrm{IC} $$

where $w$ is the vector of factor weights, $\mathrm{IC}$ is the
vector of the average of the risk-adjusted ICs,
and $\operatorname{cov}(\mathrm{IC})^{-1}$ is the inverse of the covariance
matrix of the ICs.
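As a numerical illustration of this proportionality, the sketch below solves the linear system cov(IC) x = IC and rescales the solution. Two points are assumptions rather than part of the source: the weights are normalized to sum to one (the formula only fixes them up to a constant), and the inputs are made-up numbers for two factors. The helper `solve_linear` is a small Gaussian elimination routine written here to keep the example free of external libraries.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ic_weights(mean_ic, cov_ic):
    """Weights proportional to cov(IC)^-1 x IC.
    Assumption: rescaled to sum to one, which requires a nonzero total."""
    raw = solve_linear(cov_ic, mean_ic)
    total = sum(raw)
    return [w / total for w in raw]


# Made-up inputs: two factors with uncorrelated ICs.
mean_ic = [0.03, 0.02]
cov_ic = [[0.01, 0.00],
          [0.00, 0.04]]
weights = ic_weights(mean_ic, cov_ic)
```

In this example the first factor has both the higher average IC and the lower IC variance, so it receives most of the weight.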

In a subsequent paper, Sorensen, Hua, and 
Qian (2005) apply this optimization technique 
to capture the idiosyncratic return behavior of 
different security contexts. The contexts are de¬ 
termined as a function of stock risk character¬ 
istics (value, growth, or earnings variability). 
They build a multifactor model using the histor¬ 
ical risk-adjusted IC of the factors, determining 
the weights of the multifactor model by max¬ 
imizing the IR of the combined factors. Their 
research demonstrates that the weights to fac¬ 
tors of an alpha model (trading strategy) differ 
depending on the security contexts (risk dimen¬ 
sions). The approach improves the ex post in¬ 
formation ratio compared to a model that uses 
a one-size-fits-all approach. 

Importance of Model Construction 
and Factor Choice 

Empirical research shows that the factors and 
the weighting scheme of the factors are impor¬ 
tant in determining the efficacy of a trading 
strategy model. Using data from the stock se¬ 
lection models of 21 major quantitative funds, 
the quantitative research group at Sanford 
Bernstein analyzed the degree of overlap in 
rankings and factors.¹⁹ They found that the
models maintained similar exposures to many 
of the same factors. Most models showed high 
exposure to cash flow-based valuations (e.g., 
EV/EBITDA) and price momentum, and less 
exposure to capital use, revisions, and normal¬ 
ized valuation factors. Although they found 
commonality in factor exposures, the stock 
rankings and performance of the models were 
substantially different. This surprising finding
indicates that model construction differs among
the various stock selection models and provides
evidence that the efficacy of common signals
has not been completely arbitraged away.

A second study by the same group showed
commonality across models among cash flow
and price momentum factors, while stock
rankings and realized performance were vastly
different.²⁰ They hypothesize that the difference
between good and poor performing models
may be related to a few unique factors identified
by portfolio managers, better methodologies
for model construction (e.g., static, dynamic, or
contextual models), or good old-fashioned luck.

Example: A Factor-Based 
Trading Strategy 

In building this model, we hope to accomplish 
the following objectives: identify stocks that 
will outperform and underperform in the fu¬ 
ture, maintain good diversification with regard 
to alpha sources, and be robust to changing 
market conditions such as time varying returns, 
volatilities, and correlations. 

We have identified 10 factors that have an
ability to forecast stock returns.²¹ Of the four
model construction methodologies discussed
previously, we use the optimization framework
to build the model as it offers the greatest
flexibility.

We determine the allocation to specific factors 
by solving the following optimization problem: 

$$
\begin{aligned}
\min_{w} \quad & w'\Sigma w \\
\text{subject to} \quad & w \ge 0, \\
& \sum_{v \in \text{Value}} w_v \ge 0.35, \\
& \sum_{g \in \text{Growth}} w_g \ge 0.20, \\
& 3 \le \sum_{i=1}^{10} \delta_i \le 7,
\end{aligned}
$$

with the budget constraint

$$ w'e = 1, \qquad e = (1, \ldots, 1)' $$

where $\Sigma$ is the covariance matrix of factor
returns, Value and Growth are the sets of value
and growth factors, and $\delta_i$ is equal to one if
$w_i > 0$ or zero otherwise.

Figure 11 Factor Weights of the Trading Strategy


We constrain the minimum exposure to value
factors to be greater than or equal to 35%
of the weight in the model based on the belief
that there is a systematic long-term premium to
value.

Using the returns of our factors, we perform 
this optimization monthly to determine which 
factors to hold and in what proportions. Figure 
11 displays how the factor weights change over 
time. 

In the next step, we use the factor weights to 
determine the attractiveness of the stocks in our 
universe. We score each stock in the universe by 
multiplying the standardized values of the fac¬ 
tors by the weights provided by the optimiza¬ 
tion of our factors. Stocks with high scores are 
deemed attractive and stocks with low scores 
are deemed unattractive. 
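The scoring step can be sketched as follows. This is a minimal illustration, assuming that "standardized" means a cross-sectional z-score (here computed with the population standard deviation) and that the score is the weighted sum of those z-scores; the factor names, weights, and values are hypothetical.

```python
import statistics


def zscores(values):
    """Cross-sectional z-scores (assumption: population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def model_scores(factor_values, weights):
    """factor_values maps factor name -> list of raw values, one per stock.
    weights maps factor name -> factor weight from the optimization."""
    z = {name: zscores(vals) for name, vals in factor_values.items()}
    n_stocks = len(next(iter(factor_values.values())))
    return [sum(weights[name] * z[name][i] for name in weights)
            for i in range(n_stocks)]


# Hypothetical inputs: two factors, three stocks.
factor_values = {
    "value": [1.0, 2.0, 3.0],
    "momentum": [3.0, 2.0, 1.0],
}
weights = {"value": 0.75, "momentum": 0.25}
scores = model_scores(factor_values, weights)
```

Here the third stock ranks highest because it looks best on the more heavily weighted factor, even though it looks worst on the other one.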

To evaluate how the model performs, we sort
the stocks by score into five equally weighted
portfolios and evaluate the returns of these
portfolios. Table 1(A) provides summary statistics
of the returns for each portfolio. Note that
mean returns decline monotonically across the
portfolios, with portfolio 1 (q1) earning
the highest return and portfolio 5 (q5) earning
the lowest return. Over the entire period,
the long-short portfolio (LS) that is long
portfolio 1 and short portfolio 5 averages about 1%
per month with a monthly Sharpe ratio of 0.33.
Its return is statistically significant at the 97.5%
level.
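The long-short figures can be checked directly from the LS column of Table 1(A). The sketch below assumes that the reported ratio is the mean monthly return divided by its standard deviation and that the t-statistic is that ratio scaled by the square root of the number of months; both assumptions are consistent with the tabulated values.

```python
import math

# LS column of Table 1(A): mean and standard deviation in percent per month.
mean, stdev, n = 0.94, 2.82, 169

ir = mean / stdev              # monthly ratio of mean to standard deviation
t_stat = ir * math.sqrt(n)     # mean divided by its standard error
```

Rounded to two decimals, these reproduce the 0.33 ratio and the 4.33 t-statistic shown in the table.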


Table 1 Summary of Model Results

A. Summary Statistics of the Model Returns

                  q1      q2      q3      q4      q5      LS
Mean            1.06    0.98    0.83    0.65    0.12    0.94
Stdev           5.64    5.18    4.98    5.31    5.88    2.82
Median          1.61    1.61    1.58    1.55    1.11    0.71
Max            15.79   11.18   10.92   13.26   13.01   12.84
Min           -23.59  -23.32  -19.45  -21.25  -24.51   -6.87
Num              169     169     169     169     169     169
t-statistic     2.44    2.45    2.17    1.59    0.27    4.33
IR              0.19    0.19    0.17    0.12    0.02    0.33

B. Summary Statistics of Turnover for Portfolio 1 (q1) and Portfolio 5 (q5)

                  q1      q5
Mean            0.20    0.17
Stdev           0.07    0.06
Median          0.19    0.16
Max             0.53    0.39
Min             0.07    0.05
Num              169     169
t-statistic    36.74   39.17
Table 1(B) shows the monthly average stock
turnover of portfolio 1 (q1) and portfolio 5
(q5). Understanding how turnover varies from
month to month for a trading strategy is
important. If turnover is too high, the strategy might
be prohibitively expensive to implement because of
execution costs. While beyond the scope of this
entry, we could explicitly incorporate transaction
costs in this trading strategy using a market
impact model.²² Due to the dynamic nature of
our trading strategy, where active factors may
change from month to month, our turnover of
20% is a bit higher than what would be expected
using a static approach.

We evaluate the monthly information
coefficient between the model scores and
subsequent return. This analysis provides information
on how well the model forecasts return.
The monthly mean information coefficient of
the model score is 0.03 and is statistically
significant at the 99% level. The monthly standard
deviation is 0.08. We note that both the information
coefficients and returns were stronger and
more consistent in the earlier periods.

Figure 12 displays the cumulative return to 
portfolio 1 through portfolio 5. Throughout the 
entire period there is a monotonic relationship 
between the portfolios. To evaluate the over¬ 
all performance of the model, we analyze the 
performance of the long-short portfolio returns. 
We observe that the model performs well in 
December 1994 to May 2007 and April 2008 
to June 2008. This is due to the fact that our 
model correctly picked the factors that per¬ 
formed well in those periods. We note that 
the model performs poorly in the period July
2007-April 2008, losing an average of 1.09% a
month. The model appears to suffer from the
same problems many quantitative equity funds
and hedge funds faced during this period.²³





Figure 12 Cumulative Return of the Model













The worst performance in a single month was
-6.87%, occurring in January 2001, and the
maximum drawdown of the model was -13.7%,
occurring during the period from May 2006 (peak)
to June 2008 (trough).²⁴
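A maximum drawdown of this kind can be computed from a return series in a single pass. The sketch below assumes that drawdown is measured on compounded cumulative wealth, as the percentage decline from the running peak; the source may define it differently in its notes, and the return series shown is made up.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of compounded wealth (a negative fraction)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r                # compound the period return
        peak = max(peak, wealth)         # running high-water mark
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Made-up monthly returns expressed as fractions.
example_returns = [0.10, -0.05, -0.10, 0.20, -0.02]
mdd = max_drawdown(example_returns)
```

In the example the peak is reached after the first month and the trough after the third, so the drawdown spans two consecutive losing months rather than a single bad one.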

To more completely understand the return
and risk characteristics of the strategy, we would
have to perform a more detailed analysis, in¬ 
cluding risk and performance attribution, and 
model sensitivity analysis over the full period 
as well as over subperiods. As the turnover is 
on the higher side, we may also want to in¬ 
troduce turnover constraints or use a market 
impact model. 

Periods of poor performance of a strategy 
should be disconcerting to any analyst. The 
poor performance of the model during the pe¬ 
riod June 2007-March 2008 indicates that many 
of the factors we use were not working. We need 
to go back to each individual factor and analyze 
them in isolation over this time frame. In addi¬ 
tion, this highlights the importance of research 
to improve existing factors and develop new 
ones using unique data sources. 


BACKTESTING 

In the research phase of the trading strategy, 
model scores are converted into portfolios and 
then examined to assess how these portfolios 
perform over time. This process is referred to as 
backtesting a strategy. The backtest should mir¬ 
ror as closely as possible the actual investing en¬ 
vironment incorporating both the investment's 
objectives and the trading environment. 

When it comes to mimicking the trading
environment in backtests, special attention needs
to be given to transaction costs and liquidity
considerations. The inclusion of transaction
costs is important because they may have a
major impact on the total return. Realistic market
impact and trading cost estimates affect what
securities are chosen during portfolio
construction. Liquidity is another attribute that needs
to be evaluated. The investable universe of
stocks should be limited to stocks where there
is enough liquidity to be able to get in and out
of positions.

Portfolio managers may use a number of 
constraints during portfolio construction. Fre¬ 
quently these constraints are derived from the 
portfolio policy of the firm, risk management 
policy, or investor objectives. Common con¬ 
straints include upper and lower bounds for 
each stock, industry, or risk factor—as well as 
holding size limits, trading size limits, turnover, 
and the number of assets long or short. 

To ensure the portfolio construction process 
is robust we use sensitivity analysis to evaluate 
our results. In sensitivity analysis we vary the 
different input parameters and study their im¬ 
pact on the output parameters. If small changes 
in inputs give rise to large changes in out¬ 
puts, our process may not be robust enough. 
For example, we may eliminate the five best 
and worst performing stocks from the model, 
rerun the optimization, and evaluate the per¬ 
formance. The results should be similar as the 
success of a trading strategy should not depend 
on a handful of stocks. 

We may want to determine the effect of small 
changes in one or more parameters used in 
the optimization. The performance of the op¬ 
timal portfolio should in general not differ 
significantly after we have made these small 
changes. 

Another useful test is to evaluate a model by 
varying the investment objective. For example, 
we may evaluate a model by building a low- 
tracking-error portfolio, a high-tracking-error 
portfolio, and a market-neutral portfolio. If the 
returns from each of these portfolios are decent, 
the underlying trading strategy is more likely 
to be robust. 

Understanding In-Sample and 
Out-of-Sample Methodologies 

There are two basic backtesting methodologies:
in-sample and out-of-sample. It is important
to understand the nuances of each.

We refer to a backtesting methodology as an
in-sample methodology when the researcher
uses the same data sample to specify, calibrate,
and evaluate a model.

An out-of-sample methodology is a backtest¬ 
ing methodology where the researcher uses a 
subset of the sample to specify and calibrate a 
model, and then evaluates the forecasting abil¬ 
ity of the model on a different subset of data. 
There are two approaches for implementing an 
out-of-sample methodology. One approach is 
the split-sample method. This method splits the 
data into two subsets of data where one subset 
is used to build the model while the remaining 
subset is used to evaluate the model. 

A second method is the recursive out-of-sample
test. This approach uses a sequence of
recursive or rolling windows of past history to
forecast a future value and then evaluates that
value against the realized value. For example,
in a rolling regression-based model we will use
data up to time t to calculate the coefficients
in the regression model. The regression model
then forecasts the dependent variable at t + h, where
h > 0. The prediction error is the difference
between the realized value at t + h and the
predicted value from the regression model. At t + 1
we recalculate the regression model and
evaluate the predicted value of t + 1 + h against the
realized value. We continue this process throughout
the sample.
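The recursive procedure can be sketched with a single-predictor regression. The example below is illustrative only and makes several assumptions: an expanding (recursive) estimation window rather than a fixed-length rolling one, one predictor, and a hypothetical minimum of `min_obs` training pairs before the first forecast. At each date t only pairs whose realization is already known at t enter the estimation.

```python
import statistics


def ols_fit(x, y):
    """Intercept and slope of a univariate OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def recursive_oos_errors(x, y, min_obs=12, h=1):
    """x[s] is the predictor observed at s; y[s] is the variable realized at s.
    Returns the h-step-ahead prediction errors (realized minus predicted)."""
    errors = []
    for t in range(min_obs + h - 1, len(x) - h):
        # Training pairs (x[s], y[s + h]) with s + h <= t are known at time t.
        intercept, slope = ols_fit(x[: t - h + 1], y[h : t + 1])
        forecast = intercept + slope * x[t]
        errors.append(y[t + h] - forecast)
    return errors


# Toy data with an exact linear link y[s + 1] = 2 * x[s] + 1.
x = [float(s) for s in range(20)]
y = [0.0] + [2.0 * v + 1.0 for v in x[:-1]]
oos_errors = recursive_oos_errors(x, y, min_obs=5, h=1)
```

Replacing the slice start `0` with `t - h + 1 - window` (for a chosen window length) would turn the expanding window into a fixed-length rolling window.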

The conventional thinking among econometricians
is that in-sample tests tend to reject
the null hypothesis of no predictability more
often than out-of-sample tests. This view is
supported by many researchers because they
reason that in-sample tests are unreliable, often
finding spurious predictability. Two reasons
given to support this view are the presence of
unmodeled structural changes in the data and
the use of techniques that result in data mining
and model overfitting.

Inoue and Kilian (2002) question this
conventional thinking. They use asymptotic theory
to evaluate the "trade-offs between in-sample
tests and out-of-sample tests of predictability
in terms of their size and power." They argue
that strong in-sample results and weak
out-of-sample results are not necessarily evidence that
in-sample tests are not reliable. Out-of-sample
tests using sample-splitting result in a loss of
information and lower power for small samples.
As a result, an out-of-sample test may fail
to detect predictability while the in-sample test
will correctly identify predictability. They also
show that out-of-sample tests are not more
robust to parameter instability that results from
unmodeled structural changes.

A Comment on the Interaction 
between Factor-Based Strategies and 
Risk Models 

Frequently, different factor models are used to 
calculate the risk inputs and the expected return 
forecasts in a portfolio optimization. A common 
concern is the interaction between factors in the 
models for risk and expected returns. Lee and 
Stefek (2008) evaluate the consequences of us¬ 
ing different factor models, and conclude that 
(1) using different models for risk and alpha 
can lead to unintended portfolio exposures that 
may worsen performance; (2) aligning risk fac¬ 
tors with alpha factors may improve informa¬ 
tion ratios; and (3) modifying the risk model by 
including some of the alpha factors may miti¬ 
gate the problem. 


BACKTESTING OUR FACTOR 
TRADING STRATEGY 

Using the model scores from the trading
strategy example, we build two optimized
portfolios and evaluate their performance. Unlike
the five equally weighted portfolios built only
from model scores, the models we now
discuss were built to mirror as closely as possible
tradable portfolios a portfolio manager would
build in real time. Our investable universe is the
Russell 1000. We assign alphas for all stocks in
the Russell 1000 with our dynamic factor model.

Table 2 Total Return Report (annualized)
From 01/1995 to 06/2008

                                    QTD     YTD  1 Year  2 Year  3 Year  5 Year 10 Year  Since Inception
Portfolio: Low-tracking error     -0.86  -10.46  -11.86    4.64    7.73   11.47    6.22            13.30
Portfolio: High-tracking error    -1.43  -10.47  -11.78    4.15    8.29   13.24    7.16            14.35
S&P 500: Total return             -2.73  -11.91  -13.12    2.36    4.41    7.58    2.88             9.79

The portfolios are long only and benchmarked 
to the S&P 500. The difference between the port¬ 
folios is in their benchmark tracking error. For 
the low-tracking error portfolio the risk aver¬ 
sion in the optimizer is set to a high value, sec¬ 
tors are constrained to plus or minus 10% of the 
sector weightings in the benchmark, and port¬ 
folio beta is constrained to 1.00. For the high- 
tracking error portfolio, the risk aversion is set 
to a low value, the sectors are constrained to 
plus or minus 25% of the sector weightings 
in the benchmark, and portfolio beta is con¬ 
strained to 1.00. Rebalancing is performed once 
a month. Monthly turnover is limited to 10% 
of the portfolio value for the low-tracking error 
portfolio and 15% of the portfolio value for the 
high-tracking error portfolio. 

Table 2 presents the results of our backtest. 
The performance numbers are gross of fees and 
transaction costs. Performance over the entire 
period is good and consistent throughout. The 
portfolios outperform the benchmark over the 
various time periods. The resulting annualized 
Sharpe ratios over the full period are 0.66 for the 
low-tracking error portfolio, 0.72 for the high- 
tracking error portfolio, and 0.45 for the S&P 
500. 25 
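
Note 25 defines the Sharpe ratio as portfolio excess return over the risk-free rate divided by the standard deviation of that excess return. A minimal Python sketch of that calculation follows. The monthly return series are hypothetical, and annualizing by the square root of 12 is an assumption that the text does not spell out.

```python
import math
import statistics


def annualized_sharpe(monthly_returns, monthly_risk_free):
    """Sharpe ratio from monthly data: mean excess return over its sample
    standard deviation, scaled by sqrt(12) (assumed annualization)."""
    excess = [r - rf for r, rf in zip(monthly_returns, monthly_risk_free)]
    mean_excess = statistics.mean(excess)
    vol_excess = statistics.stdev(excess)  # sample standard deviation
    return (mean_excess / vol_excess) * math.sqrt(12)


# Hypothetical monthly portfolio returns and risk-free rates.
portfolio = [0.021, -0.013, 0.034, 0.008, -0.004, 0.017]
risk_free = [0.003] * len(portfolio)

print(f"Annualized Sharpe ratio: {annualized_sharpe(portfolio, risk_free):.2f}")
```

The same function can be applied to the benchmark series to compare portfolios on a like-for-like basis.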


KEY POINTS 

• The four most commonly used approaches for 
the evaluation of return premiums and risk 
characteristics to factors are portfolio sorts, 
factor models, factor portfolios, and informa¬ 
tion coefficients. 

• The portfolio sorts approach ranks stocks by a
particular factor into a number of portfolios.
The sorting methodology should be consistent
with the characteristics of the distribution
of the factor and the economic motivation
underlying its premium.

• The information ratio (IR) is a statistic for 
summarizing the risk-adjusted performance 
of an investment strategy and is defined as 
the ratio of average excess return to the stan¬ 
dard deviation of return. 

• We distinguish between contemporaneous 
and forecasting factor models, dependent on 
whether both left- and right-hand side vari¬ 
ables (returns and factors) have the same time 
subscript, or the time subscript of the left- 
hand side variable is greater. 

• The three most common violations of classical 
regression theory that occur in cross-sectional 
factor models are (1) the errors in variables 
problem, (2) common variation in residuals 
such as heteroskedasticity and serial correla¬ 
tion, and (3) multicollinearity. There are sta¬ 
tistical techniques that address the first two. 
The third issue is best dealt with by removing 
collinear variables from the regression, or by 
increasing the sample size. 

• The Fama-MacBeth regression addresses the 
inference problem caused by the correlation 
of the residuals in cross-sectional regressions. 

• The information coefficient (IC) is used to 
evaluate the return forecast ability of a fac¬ 
tor. It measures the cross-sectional correlation 
between a factor and its subsequent realized 
return. 

• Factor portfolios are used to measure the in¬ 
formation content of a factor. The objective is 
to mimic the return behavior of a factor and 
minimize the residual risk. We can build fac¬ 
tor portfolios using a factor model or an opti¬ 
mization. An optimization is more flexible as 
it is able to incorporate constraints. 
• Analyzing the performance of different factors
is an important part of the development
of a factor-based trading strategy. This
process begins with understanding the time-series
properties of each factor in isolation
and then studying how they interact with
each other.

• Techniques used to combine and weight fac¬ 
tors to build a trading strategy model include 
the data driven, the factor model, the heuris¬ 
tic, and the optimization approaches. 

• An out-of-sample methodology is a backtest¬ 
ing methodology where the researcher uses a 
subset of the sample to specify a model and 
then evaluates the forecasting ability of the 
model on a different subset of data. There 
are two approaches for implementing an out- 
of-sample methodology: the split-sample ap¬ 
proach and the recursive out-of-sample test. 

• Caution should be exercised if different factor 
models are used to calculate the risk inputs 
and the expected return forecasts in a portfo¬ 
lio optimization. 
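
The information coefficient described in the key points above can be illustrated with a short Python sketch. It computes a plain Pearson cross-sectional correlation between factor values and the following period's returns; the five-stock data are hypothetical, and a rank correlation would be an equally reasonable choice that the text does not rule out.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical cross-section: factor scores at month t and
# realized returns over month t+1 for the same five stocks.
factor_scores = [1.2, 0.4, -0.3, -1.1, 0.0]
next_month_returns = [0.030, 0.010, 0.000, -0.020, 0.005]

ic = pearson_correlation(factor_scores, next_month_returns)
print(f"Information coefficient: {ic:.2f}")
```

Repeating the calculation each month yields a time series of ICs whose mean and variability summarize the factor's forecasting ability.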


APPENDIX: THE COMPUSTAT 
POINT-IN-TIME, IBES 
CONSENSUS DATABASES 
AND FACTOR DEFINITIONS 

The factors used in this entry were constructed 
on a monthly basis with data from the Com- 
pustat Point-In-Time and IBES Consensus 
databases. Our sample includes the largest 
1,000 stocks by market capitalization over the 
period December 31, 1989, to December 31, 
2008. 

The Compustat Point-In-Time database (Capital
IQ, Compustat, http://www.compustat.com)
contains quarterly financial data from
the income, balance sheet, and cash flow statements
for active and inactive companies. This
database provides a consistent view of historical
financial data, both reported data and subsequent
restatements, the way it appeared at the
end of any month. Using these data allows the
researcher to avoid common data issues such as
survivorship and look-ahead bias. The data are
available from March 1987.

The Institutional Brokers Estimate System
(IBES) database (Thomson Reuters,
http://www.thomsonreuters.com) provides
actual earnings from companies and estimates
of various financial measures from sell-side 
analysts. The estimated financial measures 
include estimates of earnings, revenue and 
sales, operating profit, analyst recommenda¬ 
tions, and other measures. The data are offered 
on a summary (consensus) level or detailed 
(analyst-by-analyst) basis. The U.S. data cover 
reported earnings estimates and results since 
January 1976. 

The factors used in this entry are defined as 
follows. (LTM refers to the last four reported 
quarters.) 

Value Factors 

Operating income before depreciation to
enterprise value = EBITDA ÷ EV

where

EBITDA = Sales LTM (Compustat Item 2)
         − Cost of goods sold LTM (Compustat Item 30)
         − SG&A Exp (Compustat Item 1)

and

EV = Long-term debt (Compustat Item 51)
     + Common shares outstanding (Compustat Item 61)
       × Price (PRCCM)
     − Cash (Compustat Item 36)

Book to price = Stockholders' equity total (Compustat Item 60)
                ÷ [Common shares outstanding (Compustat Item 59)
                   × Price (PRCCM)]

Sales to price = Sales LTM (Compustat Item 2)
                 ÷ [Common shares outstanding (Compustat Item 61)
                    × Price (PRCCM)]

Quality Factors 

Share repurchase = [Common shares outstanding (Compustat Item 61)
                    − Common shares outstanding (Compustat Item 61)
                      from 12 months ago]
                   ÷ Common shares outstanding (Compustat Item 61)
                     from 12 months ago

Asset turnover = Sales LTM (Compustat Item 2)
                 ÷ [(Assets (Compustat Item 44)
                     + Assets (Compustat Item 44)
                       from 12 months ago) ÷ 2]

Return on invested capital = Income ÷ Invested capital

where

Income = Income before extra items LTM (Compustat Item 8)
         + Interest expense LTM (Compustat Item 22)
         + Minority interest expense LTM (Compustat Item 3)

and

Invested capital = Common equity (Compustat Item 59)
                   + Long-term debt (Compustat Item 51)
                   + Minority interest (Compustat Item 53)
                   + Preferred stock (Compustat Item 55)

Debt to equity = Total debt ÷ Stockholders' equity

where

Total debt = Debt in current liabilities (Compustat Item 45)
             + Long-term debt, total (Compustat Item 51)

and

Stockholders' equity = Stockholders' equity (Compustat Item 60)

Chg. debt to equity = (Total debt − Total debt from 12 months ago)
                      ÷ [(Stockholders' equity
                          + Stockholders' equity
                            from 12 months ago) ÷ 2]

Growth 

Revisions = [Number of up revisions (IBES item NUMUP)
             − Number of down revisions (IBES item NUMDOWN)]
            ÷ Number of estimates revisions (IBES item NUMEST)

Growth of fiscal year 1 and fiscal year 2
earnings estimates = Consensus mean of FY2 (IBES item MEANFY2)
                     ÷ Consensus mean of FY1 (IBES item MEANFY1) − 1

Momentum 

Momentum = Total return of last 11 months,
           excluding the return from
           the most recent month

Summary Statistics 

The following table contains monthly summary 
statistics of the factors defined previously. Fac¬ 
tor values greater than the 97.5 percentile or less 
than the 2.5 percentile are considered outliers. 
We set factor values greater than the 97.5 per¬ 
centile value to the 97.5 percentile value, and 
factor values less than the 2.5 percentile value 
to the 2.5 percentile value, respectively. 
                                        Mean   Standard    Median   25 Percentile   75 Percentile
                                               Deviation
EBITDA/EV                               0.11     0.06       0.11        0.07            0.15
Book to price                           0.46     0.30       0.40        0.24            0.62
Sales to price                          0.98     0.91       0.69        0.36            1.25
Share repurchase                        0.03     0.09       0.00       -0.01            0.03
Asset turnover                          1.83     1.89       1.46        0.64            2.56
Return on invested capital              0.13     0.11       0.11        0.07            0.17
Debt to equity                          0.97     1.08       0.62        0.22            1.26
Change in debt to equity                0.10     0.31       0.01       -0.04            0.17
Revisions                              -0.02     0.33       0.00       -0.17            0.11
Growth of fiscal year 1 and fiscal
  year 2 earnings estimates             0.37     3.46       0.15        0.09            0.24
Momentum                               13.86    36.03      11.00       -7.96           31.25
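
The outlier treatment described under Summary Statistics (setting values beyond the 2.5 and 97.5 percentiles to those percentile values) can be sketched in Python as follows. The linear-interpolation percentile rule and the sample data are assumptions made for illustration; the text does not say which percentile convention was used.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an already sorted list (pct in 0..100)."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def winsorize(values, lower_pct=2.5, upper_pct=97.5):
    """Clip each value to the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical factor values for one month's cross-section.
raw = [float(i) for i in range(101)]
clipped = winsorize(raw)
print(min(clipped), max(clipped))
```

In practice the clipping would be applied separately to each factor and each month, so that the percentile cutoffs reflect that month's cross-section only.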


NOTES 

1. For a good overview of the most com¬ 
mon issues, see Berk (2000) and references 
therein. 

2. Grinold and Kahn (1999) discuss the differences
between the t-statistic and the information
ratio. Both measures are closely
related in their calculation. The t-statistic is
the ratio of mean return of a strategy to its
standard error. Grinold and Kahn state that the
related calculations should not obscure the
distinction between the two ratios. The t-statistic
measures the statistical significance
of returns while the IR measures the risk-reward
trade-off and the value added by an
investment strategy.

3. See, for example, Fama and French (2004). 

4. One approach is to use the Bayesian or
model averaging techniques. For more details
on the Bayesian approach, see, for
example, Rachev, Hsu, Bagasheva, and
Fabozzi (2008).

5. For a discussion of dealing with these 
econometric problems, see Chapter 2 in 
Fabozzi, Focardi, and Kolm (2010). 

6. We cover Fama-MacBeth regression in this 
section. 

7. Fama and French (2004). 

8. See, for example, Grinold and Kahn (1999)
and Qian, Hua, and Sorensen (2007).

9. A factor normalized z-score is given by the
formula $z\text{-score} = (f - \bar{f})/\mathrm{std}(f)$, where $f$ is
the factor, $\bar{f}$ is the mean, and $\mathrm{std}(f)$ is the
standard deviation of the factor.

10. We are conforming to the notation used in
Qian and Hua (2004). To avoid confusion,
Qian and Hua use dis() to describe the cross-sectional
standard deviation and std() to describe
the time series standard deviation.

11. The earnings estimates come from the IBES 
database. See the appendix for a more de¬ 
tailed description of the data. 

12. This derivation of factor portfolios is pre¬ 
sented in Grinold and Kahn (1999). 

13. See Melas, Suryanarayanan, and Cavaglia 
(2009). 

14. An exception is the constraint on the num¬ 
ber of assets that results in integer con¬ 
straints. 

15. For a more detailed discussion on portfo¬ 
lio optimization problems and optimization 
software see, for example, Fabozzi, Kolm, 
Pachamanova, and Focardi (2007). 

16. The hit rate is calculated as

$$h = \frac{1}{T_2 - T_1} \sum_{t=T_1+1}^{T_2} \operatorname{sign}\left(y_t \, \hat{y}_{t|t-1}\right)$$

where $y_t$ is the one-step ahead realized value and
$\hat{y}_{t|t-1}$ is the one-step ahead predicted value.

17. For calculation of this measure, see Diebold 
and Mariano (2005). 

18. The nine factors are return on assets,
change in return on assets, cash flow from
operations scaled by total assets, cash compared
to net income scaled by total assets,
change in long-term debt/assets, change in
current ratio, change in shares outstanding,
change in gross margin, and change in asset
turnover.

19. Zlotnikov, Larson, Cheung, Kalaycioglu, 
Lao, and Apoian (2007). 

20. Zlotnikov, Larson, Wally, Kalaycioglu, Lao, 
and Apoian (2007). 

21. We use a combination of growth, value, 
quality, and momentum factors. The ap¬ 
pendix to this entry contains definitions of 
all of them. 

22. Cerniglia and Kolm (2010). 

23. See Rothman (2007) and Daniel (2007). 

24. We ran additional analysis on the model by 
extending the holding period of the model 
from 1 to 3 months. The results were much 
stronger as returns increased to 1.6% per 
month for a two-month holding period and 
1.9% per month for a three-month holding 
period. The risk as measured by drawdown 
was higher at -17.4% for a two-month hold¬ 
ing period and -29.5% for the three-month 
holding period. 

25. Here we calculate the Sharpe ratio as port¬ 
folio excess return (over the risk-free rate) 
divided by the standard deviation of the 
portfolio excess return. 
Abstract: Fundamental factor risk models have been used in equity portfolio management and risk
management for decades now. There persists, however, the notion that fundamental factor models 
are "quantitative" models that are divorced from fundamental analysis, the realm of traditional 
equity analysts. This perception is inaccurate in that the basic building blocks of analysts and factor 
modelers are in fact similar; both try to identify microeconomic traits that drive the risk and returns 
of individual securities. The differences between fundamental factor models and fundamental 
analysis lie not in their ideology but in their objectives. The objective of the fundamental analyst is 
to forecast return (or future stock values) for a particular stock. The objective of the fundamental 
factor model is to forecast the fluctuation of a portfolio around its expected return. Most importantly, 
the factor model captures the interaction of the firm's microeconomic characteristics at the portfolio 
level. This is important because as names are added to the portfolio, company-specific returns are 
diversified away, and the common factor (systematic) portion becomes an increasingly larger part 
of the portfolio risk and return. Fundamental factor models are in fact complementary as opposed 
to antithetical to traditional security analysis. 


Fundamental analysis is the process of determining
a security's future value by analyzing a combination
of macro- and microeconomic events
and company-specific characteristics. Though
fundamental analysis focuses on the valuation
of individual companies, most institutional investors
recognize that there are common factors
affecting all stocks. (Common factors are shared
characteristics between firms that affect their
returns.) For example, macroeconomic events,
like sudden changes in interest rate, inflation, or
exchange rate expectations, can affect all stocks
to varying degrees, depending on the stock's
characteristics.

Barr Rosenberg and Vinay Marathe (1976)
developed the theory that the effects of
macroeconomic events on individual securities
could be captured through microeconomic
characteristics, essentially common factors
such as industry membership, financial structure,
or growth orientation.

Rosenberg and Marathe (1976, p. 3) discuss 
possible effects of a money market crisis. They 
say a money market crisis would: 

result in possible bankruptcy for some firms, dislocation
of the commercial paper market, and a dearth
of new bank lending beyond existing commitments.
Firms with high financial risk (shown in extreme
leverage, poor coverage of fixed charges, and inadequate
liquid assets) might be driven to bankruptcy.
Almost all firms would suffer to some degree from
higher borrowing costs and worsened economic expectations:
Firms with high financial risk would
be impacted most; the market portfolio, which is a
weighted average of all firms, would be somewhat
less exposed; and firms with abnormally low financial
risk would suffer the least. Moreover, some industries
such as construction would suffer greatly
because of their special exposure to interest rates.
Others such as liquor might be unaffected.

This early insight into the linkage between 
macroeconomic events and microeconomic 
characteristics has had a profound impact on 
the asset management industry ever since. In 
this entry, we discuss the intuition behind a 
fundamental factor model based on microeco¬ 
nomic traits, showing how it is linked to tra¬ 
ditional fundamental analysis. When building 
a fundamental factor model, we look for vari¬ 
ables that explain return, just as fundamental 
analysts do. We highlight the complementary 
role of the fundamental factor model to tradi¬ 
tional security analysis and point out the in¬ 
sights these models can provide. 


FUNDAMENTAL ANALYSIS
AND THE BARRA
FUNDAMENTAL FACTOR
MODEL

Fundamental analysts use many criteria when
researching companies; they may investigate a
firm's financial statements, talk to senior management,
visit facilities and plants, or analyze
a product pipeline. Most are seeking undervalued
companies with good fundamentals or
companies with strong growth potential. They
typically review a range of quantitative and
qualitative information to help predict future
stock values. Table 1 summarizes key areas.

Table 1 Main Areas of Stock Research

Qualitative              Quantitative
Business Model           Capital Structure
Competitive Advantage    Revenue, Expenses, and Earnings Growth
Management Quality       Cash Flows
Corporate Governance

Note: Balance sheet and income statement data are readily
available from 10K filings while access to company
management and information about the business model
and competitor landscape will vary on a case-by-case
basis.

Similarly, the goal of a fundamental fac¬ 
tor model is to identify traits that are im¬ 
portant in forecasting security risk. These 
models may analyze microeconomic character¬ 
istics, such as industry membership, earnings 
growth, cash flow, debt-to-assets, and company 
specific traits. Figure 1 shows the cumulative re¬ 
turns to Merck, GlaxoSmithKline, and Bristol- 
Myers, three of the largest pharmaceutical 
companies in the United States. The chart il¬ 
lustrates the similarities in the return behavior 
of these stocks, primarily because they are U.S. 
large-cap equities within the same industry. We 
also see that Bristol-Myers underperformed the 
other two companies in recent years, indicating 
that other firm-specific factors also impacted its 
performance. 

The first task when building a fundamental
factor model is to identify microeconomic
traits. These range from industry membership
and financial ratios to technical indicators
like price momentum and recent volatility, and
they explain return variation across
a relevant security universe. The next step is
to determine the impact certain events may
have on individual stocks, such as the sensitivity
or weight of an individual security to a
change in a given fundamental factor. 1 Finally,
the remaining part of the returns needs to be
modeled, which is the company-specific behavior
of stocks.

Figure 1 Industry Membership Drives Similarities between Stocks

How does the model we have described com¬ 
pare with the way a fundamental analyst or 
portfolio manager analyzes stocks? The basic 
building blocks of analysts and factor mod¬ 
elers are in fact similar; both try to identify 
microeconomic traits that drive the risk and 
returns of individual securities. Figure 2 com¬ 
pares the two perspectives. In both views, there 
are clearly firm-specific traits driving risk and 
return. There are also sources of risk and return 
from a stock's exposure, or beta, to the overall 
market, its industry, and certain financial and 
technical ratios. But the objective of the funda¬ 
mental analyst is to forecast return (or future 
stock values), whereas the fundamental factor 
model forecasts the fluctuation of a security 
or a portfolio of securities around its expected 
return. 

Both the analyst and the factor model researcher
look at similar macro- and microeconomic
data and events. After all, both are
seeking traits that explain the risk and returns
of stocks. Table 2 shows examples used
in the Barra equity models (specifically the
U.S. and Japan Equity Models). Variables like
profitability and debt loads are incorporated
in our models through factors like Earnings
Yield, Growth, and Leverage. Expectations of
future revenue growth and cost savings are
incorporated through variables like analyst
consensus views. What about popular metrics
that aren't included? Some factors may not be
good risk factors despite being good return factors.
(A good return factor has persistent direction
though not a lot of volatility; the ability
of a company to beat earnings estimates is one
of these factors.) Other factors may be relevant
only for a particular sector or industry. (These
risk factors would only be included in industry
or sector risk models.)

Figure 2 Overview of Stock Determinants: Fundamental
Analysis versus Factor Model Analysis
(Diagram labels: Industry News/Trends; Company Fundamentals;
Risk Modeler: Forecast Risk; Portfolio Manager: Forecast Return)

Table 2 Sample Fundamental Data Used in Barra Models

Factor               Sample descriptors
Value                Book value; analyst predicted earnings; trailing
                     earnings; forecast operating income; sales;
                     forecast sales
Growth               Five-year payout; variability in capital structure;
                     growth in total assets; growth in revenues;
                     historical earnings growth; analyst-predicted
                     earnings growth; recent earnings changes
Earnings Variation   Variability in earnings; standard deviation of
                     analyst-predicted earnings; variability in cash
                     flows; extraordinary items in earnings
Leverage             Market leverage; book leverage; debt to assets;
                     senior debt rating; pension liabilities
Foreign Sensitivity  Exchange rate sensitivity; oil price sensitivity;
                     sensitivity to other market indices; export
                     revenue as percentage of total

Note that adjustments of financial statements 
are incorporated in several ways. A key task for 
the fundamental analyst is to adjust financial 
statements—each analyst wants to get at the 
"real" number rather than what is reported in 
financial statements. Even under generally ac¬ 
cepted accounting principles, management can 
be aggressive with basic principles, such as rev¬ 
enue/expense recognition; usage of unusual, 
infrequent, or extraordinary items; and timing 
issues that may lead to violations of the match¬ 
ing principle. In a factor model, these types of 
adjustments are accounted for through the in¬ 
clusion of forward-looking, analyst-derived de¬ 
scriptors. 


In addition to fundamental data, techni¬ 
cal variables such as price momentum, beta, 
option-implied volatility, and so on may also 
be used. For instance, price momentum has 
been shown to significantly explain returns (see 
Carhart, 1997). 

How are the fundamental data used in a factor
model? Certain factors are found to explain
stock returns over time, for example,
industries and certain financial and technical
ratios. If such factors explain returns across
a broad universe of stocks, they are deemed
important. In financial theory, these factors
are "priced" across assets; examples are the
Fama-French value (book-to-market) and size
(small-cap) factors (Fama and French, 1992).

Once we have identified the factors, we
need to link each stock to each factor. For
this, we use microeconomic characteristics. We
start by identifying a set of characteristics we
call descriptors. For instance, if the factor is
growth, a few descriptors might include earnings
growth, revenue growth, and asset growth
(see Table 2). These include both historical
and forward-looking descriptors, such as forecast
earnings growth. After we identify the
important descriptors, we standardize them
across a universe of stocks, typically the constituents
of a broad market index. 2 Table 3 illustrates
how Microsoft's exposures for the Barra
U.S. factors (size, value, and yield) are calculated.
For each factor, we subtract the estimation universe
average 3 of the descriptor from the stock's
descriptor value and divide the result by the
standard deviation of the descriptor across the
universe of stocks.

Table 3 Calculating Exposures from Raw Data (April 1, 2010)

Barra Factor                  Size              Value           Yield
Descriptor for Factor         Capitalization    Book to Price   Predicted
                              (USD Bn)*                         Dividend Yield
Microsoft                     256.7             0.15            0.02
Estimation Universe Average   69.8              0.39            0.02
Estimation Universe Std Dev   21.1              0.37            0.02
Exposure                      1.64              -0.62           0.06

*Note: The actual descriptor for the USE3 Size factor uses the log of market capitalization. The log of market cap for
Microsoft is 12.46. The estimation universe average is 10.22 and the standard deviation 1.36. The resulting exposure
for Microsoft is 1.64.
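
The standardization step can be sketched in a few lines of Python using the values printed in Table 3 and its note. Because those inputs are rounded, the results (about 1.65 for Size and about -0.65 for Value) differ slightly from the exposures of 1.64 and -0.62 reported in the table; the function name is a hypothetical one chosen for this illustration.

```python
def standardize(value, universe_mean, universe_std):
    """Z-score of a descriptor relative to the estimation universe."""
    return (value - universe_mean) / universe_std


# Inputs as printed in Table 3 and its note (rounded values).
size_exposure = standardize(12.46, 10.22, 1.36)  # log of market cap
value_exposure = standardize(0.15, 0.39, 0.37)   # book to price

print(f"Size exposure:  {size_exposure:.2f}")
print(f"Value exposure: {value_exposure:.2f}")
```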

In some cases, factors reflect several charac¬ 
teristics. This occurs when multiple descriptors 
help explain the same factor. The Barra U.S. 
Growth factor, for instance, reflects five-year 
payouts, variability in capital structure, growth 
in total assets, recent large earnings changes, 
and forecast and historical earnings growth. 
Table 4 shows how we calculate Microsoft's ex¬ 
posure to the Growth factor. Each descriptor is 
first standardized and then the descriptors are 
combined to form the exposure. 
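
As a rough check on how standardized descriptors combine into a single exposure, the following Python sketch forms a weighted sum of the standardized descriptor values reported in Table 4. It takes the first weight as 0.34 so that the six weights sum to one, which is an assumption about the table's intended value, and the dictionary keys are labels chosen for this illustration.

```python
# Standardized descriptors for Microsoft, as reported in Table 4.
standardized = {
    "growth_rate_of_total_assets": -0.95,
    "recent_earnings_change": 0.06,
    "analyst_predicted_earnings_growth": -0.40,
    "variability_in_capital_structure": 0.24,
    "earnings_growth_rate_last_5_years": 0.03,
    "five_year_payout": 0.09,
}

# Descriptor weights from Table 4 (first weight assumed to be 0.34).
weights = {
    "growth_rate_of_total_assets": 0.34,
    "recent_earnings_change": 0.20,
    "analyst_predicted_earnings_growth": 0.15,
    "variability_in_capital_structure": 0.13,
    "earnings_growth_rate_last_5_years": 0.10,
    "five_year_payout": 0.08,
}

# The factor exposure is the weighted sum of standardized descriptors.
growth_exposure = sum(weights[name] * standardized[name] for name in weights)
print(f"Growth exposure: {growth_exposure:.2f}")
```

Under these assumptions the weighted sum rounds to the Growth exposure of -0.33 shown in the table.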


In addition to factors like Value, Size, Yield, 
and Growth, which we call style factors, a 
stock's returns are also a function of its in¬ 
dustry. Industry exposures are calculated in a 
different way. A company like Google, for in¬ 
stance, is engaged solely in Internet-related ac¬ 
tivities. It has an exposure of 100% (1.0) to the 
Internet industry factor in the Barra U.S. Equity 
Model. Its exposure to all other industry factors 
is zero. Some companies, like General Electric, 
have business activities that span multiple in¬ 
dustries. In the U.S. model, industry exposures 
are based on sales, assets, and operating income 
in each industry. 4 

What does a factor exposure mean? In the
same way the classic capital asset pricing model
beta measures how much a stock price moves
with every percentage change in the market,
a factor exposure measures how much a stock
price moves with every percentage change in
a factor. Thus, if the value factor rises by 10%,
a stock or portfolio with an exposure of 0.5 to
the value factor will see a return of 5%, all else
equal. 5

Table 4 Calculating Exposures When There Are Multiple Characteristics (April 1, 2010)

Factor: Growth

Descriptor                               Microsoft   Estimation   Estimation   Standardized   Weight of
                                                     Universe     Universe     Descriptor     Each
                                                     Average      Std Dev                     Descriptor
Growth Rate of Total Assets              -0.01%      0.03%        0.04%        -0.95          0.34
Recent Earnings Change                   -0.14       -2.76        47.08        0.06           0.20
Analyst-Predicted Earnings Growth        -0.31       1.44         4.36         -0.40          0.15
Variability in Capital Structure         25%         15%          39%          0.24           0.13
Earnings Growth Rate Over Last 5 Years   0%          -1%          18%          0.03           0.10
5-Year Payout                            0.69        0.39         3.28         0.09           0.08

Exposure: -0.33

Once we have predetermined the factor expo¬ 
sures for all stocks based on their underlying 
characteristics, we estimate the factor returns 
using a regression-based method. 6 
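
The text specifies only that factor returns are estimated with a regression-based method. As one possible illustration, the Python sketch below runs an ordinary least squares cross-sectional regression of one period's stock returns on predetermined exposures. The unweighted OLS choice, the simulated data, and the helper names are all assumptions; production risk models may use weighted or constrained regressions.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def estimate_factor_returns(exposures, returns):
    """OLS estimate f = (X'X)^-1 X'r for a single cross-section."""
    k = len(exposures[0])
    xtx = [[sum(row[i] * row[j] for row in exposures) for j in range(k)]
           for i in range(k)]
    xtr = [sum(row[i] * r for row, r in zip(exposures, returns))
           for i in range(k)]
    return solve_linear(xtx, xtr)


# Simulated cross-section: 500 stocks, two style factors (hypothetical).
random.seed(0)
true_factor_returns = [0.02, -0.01]
exposures = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
returns = [sum(x * f for x, f in zip(row, true_factor_returns))
           + random.gauss(0, 0.02)  # company-specific return
           for row in exposures]

estimated = estimate_factor_returns(exposures, returns)
print([round(f, 4) for f in estimated])
```

With a large cross-section, the estimates should land close to the simulated factor returns, and the regression residuals play the role of company-specific returns.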

A stock's return can then be described by the 
returns of its subcomponents: its Size exposure 
times the return of the Size factor plus its Value 
exposure times the pure return of the Value fac¬ 
tor, and so on. This process can account for a 
substantial proportion of a stock's return. The 
remainder of the stock's return is deemed com¬ 
pany specific and unique to each security. For 
example, the return to Microsoft over the last 
month can be viewed as: 

$$r_{MSFT} = x_{Industry\_1}\, r_{Industry\_1} + x_{Industry\_2}\, r_{Industry\_2} + \cdots + x_{Size}\, r_{Size} + x_{Value}\, r_{Value} + \cdots + r_{Firm\text{-}Specific}$$

where $x$ is the exposure of Microsoft to the
various factors and $r_{factor}$ denotes returns to the
factors.
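
The decomposition above can be illustrated with a short Python sketch. The style exposures are the Microsoft values from Tables 3 and 4; the single industry exposure of 1.0, all factor returns, and the total stock return are hypothetical numbers chosen only to show the arithmetic.

```python
# Exposures: style values from Tables 3 and 4; industry exposure is hypothetical.
exposures = {
    "Industry": 1.0,
    "Size": 1.64,
    "Value": -0.62,
    "Yield": 0.06,
    "Growth": -0.33,
}

# Hypothetical one-month factor returns (not from the source).
factor_returns = {
    "Industry": 0.015,
    "Size": 0.004,
    "Value": -0.010,
    "Yield": 0.002,
    "Growth": 0.006,
}

total_return = 0.030  # hypothetical one-month stock return

# Each factor contributes exposure times factor return.
contributions = {name: exposures[name] * factor_returns[name]
                 for name in exposures}
common_factor_return = sum(contributions.values())

# Whatever the factors do not explain is the firm-specific return.
specific_return = total_return - common_factor_return

for name, value in contributions.items():
    print(f"{name:<10}{value:+.4f}")
print(f"{'Specific':<10}{specific_return:+.4f}")
```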

The returns to the factors are important. They 
are returns to the particular style or character¬ 
istic net of all other factors. For instance, the 
Value factor is the return to stocks with low 
price to book ratio with all the industry effects 
and other style effects removed. Industry re¬ 
turns have a similar interpretation and differ 
from a Global Industry Classification Standard 
(GICS®) industry-based return. They are esti¬ 
mated returns that reflect the returns to that 
industry net of all other style characteristics. 
They offer insight into the pure returns to the 
industry. 

The final building block of our fundamental
factor model is the modeling of company-specific
returns. Predicting specific returns and
risk is a difficult task that has been approached
in a number of ways. The simplest approach
is to assume that specific returns and/or risk
will be the same as they have been historically.
Another approach is to use a structural model
where the predicted specific risk of a company
depends on its industry, size, and other fundamental
characteristics. Both approaches
(simple historical and modeled) are used in the
Barra models, depending on the market. The
modeled approach has the advantage of using
fundamental data.


CRITICAL INSIGHTS FROM 
THE BARRA FUNDAMENTAL 
FACTOR MODEL 

Fundamental analysis and fundamental factor 
models may begin with the same ideology but 
they offer different insights. Fundamental anal¬ 
ysis ultimately focuses on in-depth company 
research, while factor models tie the informa¬ 
tion together at the portfolio level. The critical 
value of the factor model is that it shows the 
interaction of the firm's microeconomic char¬ 
acteristics. The value of the factor model at the 
company level is magnified at the portfolio level 
as the company-specific component becomes 
less important. Figure 3 illustrates this princi¬ 
ple of diversification. As names are added to the 
portfolio, company-specific returns are diversi¬ 
fied away. Because the common factor (system¬ 
atic) portion stays roughly the same, it becomes 
an increasingly larger part of the portfolio risk 
and return. 

This means that at the portfolio level com¬ 
mon factors are more important than company- 
specific drivers in determining a portfolio's 
return and risk. Understanding and managing 
the common factor component becomes critical 
to the portfolio's performance. 
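
The diversification effect can be reproduced numerically with the stylized parameters given in the note to Figure 3: a specific risk of 20% per stock, two factors with risks of 10% and 5%, a factor correlation of 0.25, and equal weights. In the Python sketch below, the portfolio exposures are fixed at 1.0 to each factor and specific returns are taken to be uncorrelated across stocks; both are simplifying assumptions, since the figure draws exposures from a random distribution.

```python
import math

SPECIFIC_VOL = 0.20         # specific risk of every stock
FACTOR_VOLS = (0.10, 0.05)  # risk of the two factors
FACTOR_CORR = 0.25          # correlation between the factors
EXPOSURES = (1.0, 1.0)      # assumed portfolio exposures (hypothetical)


def risk_breakdown(n_stocks):
    """Return (factor risk, specific risk, factor share of variance)
    for an equally weighted portfolio of n_stocks names."""
    x1, x2 = EXPOSURES
    s1, s2 = FACTOR_VOLS
    factor_var = ((x1 * s1) ** 2 + (x2 * s2) ** 2
                  + 2 * FACTOR_CORR * x1 * x2 * s1 * s2)
    # Uncorrelated specific returns diversify at rate 1/N.
    specific_var = SPECIFIC_VOL ** 2 / n_stocks
    total_var = factor_var + specific_var
    return math.sqrt(factor_var), math.sqrt(specific_var), factor_var / total_var


for n in (1, 5, 20, 100):
    factor_vol, specific_vol, factor_share = risk_breakdown(n)
    print(f"N={n:>3}  factor risk={factor_vol:.3f}  "
          f"specific risk={specific_vol:.3f}  factor share={factor_share:.0%}")
```

Under these assumptions the factor risk is unchanged as names are added while the specific risk shrinks, so the factor share of total variance rises toward one.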

The complementary character of fundamental 
factor models and individual security analysis 
allows managers to use factor models to analyze 
portfolio characteristics. Next, we discuss the 
benefits of using fundamental factor models, 
including: 

• Monitoring and managing portfolio exposures
over time

• Understanding the contribution of factors
and individual stocks to portfolio risk and
tracking error relative to the relevant benchmark
(risk decomposition)

• Attributing portfolio performance to factors
and individual stocks to understand the return
contribution of intended and accidental
bets

Figure 3 The Number of Stocks and the Impact on the Risk Composition
(horizontal axis: Number of Stocks in Portfolio)

Note: This chart shows a stylized example of adding stocks to the portfolio where all the stocks have
the same specific risk of 20%, there are two factors with risk of 10% and 5%, and correlation between
them is 0.25. Factor exposures are drawn from a random distribution. Stocks are weighted equally in the
portfolio.


Monitoring Portfolio Exposures 

To illustrate, we use a portfolio of U.S. airline 
stocks. The concepts can be applied to any sec¬ 
tor, multisector, or multicountry portfolio. 

Since the middle of 2009, airline stocks have
performed well. UAL (United), Delta, and
Southwest saw big gains in December 2009 and
February 2010. Table 5 lists the largest U.S. airline
stocks as of April 30, 2010, with at least USD
1 billion market capitalization and their recent
performance.

For the remainder of this section, we look at an equal-weighted portfolio of the stocks in Table 5. Figure 4 shows how the exposures of the airline portfolio to Barra factors evolved over time. The figure shows the top three exposures that changed the most in absolute terms between January 1995 and May 2010. The portfolio had an exposure to the Value factor of 1.8 in January 1995, and by May 2010 the exposure had declined to -0.9. Essentially, the portfolio went from being relatively cheap to relatively expensive during this time. Airlines have also seen a long-term decrease in exposure to currency sensitivity, most likely due to changes in oil exposure management and global air traffic patterns.

Table 5 Largest Stocks in U.S. Airline Industry and Recent Performance

Company | Ticker | Market Cap (USD Bn) | 1 Year (3/31/09-3/31/10) | 2009 Return | 2008 Return
DELTA AIR LINES INC DE | DAL | 10.4 | 111% | -1% | -23%
SOUTHWEST AIRLS CO | LUV | 10.2 | 101% | 33% | -29%
UAL CORP | UAUA | 3.6 | 367% | 17% | -67%
CONTINENTAL AIRLS [B] | CAL | 3.1 | 109% | -1% | -19%
AMR CORP | AMR | 2.8 | 63% | -28% | -24%
JETBLUE AIRWAYS CORP | JBLU | 1.7 | 32% | -23% | 20%
ALASKA AIR GROUP INC | ALK | 1.5 | 161% | 18% | 17%
ALLEGIANT TRAVEL CO | ALGT | 1.1 | 3% | 97% | 68%
U S AIRWAYS GROUP INC | LCC | 1.1 | 75% | -37% | -47%

There can also be important differences in the distribution of the stocks' exposures to a factor. Figure 5 shows the distribution of individual stock exposures to two of the U.S. factors as of May 2010: Value, which has the widest distribution, and Growth, which has among the narrowest. Two portfolios can have the same overall exposure to a factor but very different distributions.

Monitoring unintentional risk exposures that may not be visible on the surface can be critical. At the portfolio level, these exposures can be unintended bets that can impact overall performance. In addition, the distribution of exposures may be important. For example, a portfolio of companies with a leverage exposure of zero has a very different economic profile than a portfolio with a barbell distribution where half the companies are overleveraged and potentially vulnerable to a collapse in credit conditions.

RISK DECOMPOSITION 

Factor exposures highlight how sensitive a portfolio is to different sources of risk. However, to truly understand how risky these exposures are, we can use the factor model to attribute risk fully. The combination of exposures and factor volatilities determines the riskiness of each position. For example, a portfolio can have a large exposure to a factor, but if the factor isn't particularly risky, it won't be a major contributor to portfolio risk. Furthermore, the relationship between factors also matters. A large exposure to two factors that are highly correlated will also increase portfolio risk.

Figure 4 Airline Portfolio Exposures over Time

Figure 5 Monitoring the Distribution of Exposures (May 3, 2010) (series: Growth, Value; stocks shown: AMR, UAUA, LCC, CAL, DAL, ALGT, ALK, LUV, JBLU)


Continuing with the airline portfolio, we decompose risk as of April 30, 2010, across the four major sources (see Figure 6A). Since the stocks are within a single industry, industry risk contributes the most risk. Most importantly, we see that even with just nine names in the portfolio, style risk far outweighs company-specific risk. The former contributes nearly three times that of the latter (16% versus 5.5%).
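The way exposures, factor volatilities, and factor correlations combine can be sketched with the stylized setup described in the note to Figure 3 (two factors with risk of 10% and 5%, correlation 0.25, specific risk of 20% per stock, equal weights). This is only an illustration, not the calculation behind the figures: the function name `risk_split` is hypothetical, and the exposure distribution (normal with mean 1 and standard deviation 1) is an assumption, since the note says only that exposures are random.

```python
import math
import random

# Stylized inputs from the note to Figure 3.
FACTOR_VOLS = (0.10, 0.05)   # risk of factor 1 and factor 2
FACTOR_CORR = 0.25           # correlation between the two factors
SPECIFIC_VOL = 0.20          # specific risk of every stock


def risk_split(n_stocks, seed=0):
    """Return (factor variance, specific variance) of an equal-weighted portfolio."""
    rng = random.Random(seed)
    # Assumption: exposures are drawn from N(1, 1); the source only says "random".
    exposures = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(n_stocks)]
    w = 1.0 / n_stocks
    # Portfolio exposure to each factor is the weighted average of stock exposures.
    x1 = sum(w * e1 for e1, _ in exposures)
    x2 = sum(w * e2 for _, e2 in exposures)
    v1, v2 = FACTOR_VOLS
    # Factor variance: exposures combined with factor variances and covariance.
    factor_var = (x1 * v1) ** 2 + (x2 * v2) ** 2 + 2 * x1 * x2 * FACTOR_CORR * v1 * v2
    # Specific returns are assumed uncorrelated across stocks.
    specific_var = n_stocks * (w * SPECIFIC_VOL) ** 2
    return factor_var, specific_var


for n in (1, 5, 10, 50, 200):
    f, s = risk_split(n)
    print(f"{n:4d} stocks: total risk {math.sqrt(f + s):.1%}, "
          f"specific share of variance {s / (f + s):.0%}")
```

Under these assumptions the specific variance falls as 1/N while the factor variance does not, so the common factor share of risk grows as names are added.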


Figure 6 Sources of Risk in an Airline Portfolio, April 30, 2010, Using the Barra U.S. Equity Long-Term Model (USE3L)
Panel A categories: Airline Industry Factor; Covariance between Airline Industry and Styles; Style Factors; Company-Specific
Panel B categories: Volatility; Leverage; Earnings Variation; Currency Sensitivity; Size
Table 6 Exposure to Volatility of Stocks in an Airline Portfolio, April 30, 2010, Using the Barra U.S. Equity Long-Term Model (USE3L)

Holding | Exposure to Volatility
Portfolio | 1.82
USAir | 3.28
UAL Corp | 3.19
AMR | 2.70
Continental | 1.95
Delta | 1.81
JetBlue | 1.52
Alaska | 1.01
Southwest | 0.49
Allegiant | 0.39




Which specific style factors drive the style risk? As seen in Figure 6B, the Volatility factor is the biggest contributor by far to risk. This risk stems mostly from USAir and United's high exposure to the factor (see Table 6).

To summarize, risk decomposition provides two critical insights. First, as we move from the stock level to the portfolio level, style and industry risk become more important, overtaking company-specific risk. Second, we see that certain styles contribute more risk than others at the stock and portfolio levels. For example, the risk of United (UAL Corp) and USAir will be heavily impacted by the Volatility factor.

Performance Attribution 

The fundamental factor model also provides insight into performance attribution. Managers can use the model to analyze past performance, attributing realized portfolio return to its various sources. This can include allocations to certain countries or sectors, or allocations to certain segments such as small-cap names, emerging markets, or high-beta names.

Table 7 shows the decomposition of realized returns for the airline portfolio for April 2010. The first column displays the portfolio return attribution. The subsequent columns show the return attribution for each individual airline stock in isolation. The portfolio of airline stocks returned -4.3% for the month despite a positive contribution of 4.3% coming from style factors. JetBlue, for instance, was flat for the month, as its gain from style factors largely offset losses from the industry component. Similarly, Continental and UAL were helped by strong contributions from both company-specific returns and style exposures. In contrast, positive gains from style factors were not enough to offset the company-specific losses suffered by USAir, Delta, AMR, and Allegiant. In fact, only about half the stocks realized positive company-specific returns.

Table 8 takes the last row in Table 7 and breaks it down into the individual styles in the model. The main source of positive return was the Size factor, followed by the Currency Sensitivity, Leverage, and Volatility factors. In other words, airlines benefited from being smaller in cap size relative to the market (exposure of -1.7 to Size). They also benefited from the appreciation of the U.S. dollar (exposure of -2.7 to Currency Sensitivity). In addition, they were helped by being relatively levered (exposure of 2.6 to Leverage) and by having relatively higher overall volatility and higher beta to the market (exposure of 1.7 to Volatility).

At the stock level, most of the airlines benefited from being relatively small. UAL and USAir benefited the most from the appreciation of the U.S. dollar. UAL, USAir, and AMR benefited the most from being relatively more levered than the other airlines. These three stocks also benefited the most from having relatively higher beta to the market and higher volatility.


Table 7 Return Attribution for Airline Portfolio and Stocks, %, March 31, 2010-April 30, 2010, Barra U.S. Equity Long-Term Model (USE3L)

Source | Portfolio | Alaska | Allegiant | AMR | Continental | Delta | JetBlue | Southwest | UAL | USAir
Total | -4.3 | 0.4 | -11.1 | -19.0 | 1.7 | -17.2 | 0.2 | -0.3 | 10.4 | -3.8
Company-Specific | -4.4 | 2.6 | -9.9 | -22.6 | 1.5 | -17.6 | 0.4 | 2.3 | 10.5 | -6.6
Airline Industry | -4.2 | -4.2 | -4.2 | -4.2 | -4.2 | -4.2 | -4.2 | -4.2 | -4.2 | -4.2
Styles | 4.3 | 2.1 | 3.0 | 7.9 | 4.5 | 4.6 | 4.0 | 1.6 | 4.1 | 7.0
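The additive structure of Table 7 (total return as the sum of the company-specific, industry, and style contributions) can be checked with a short sketch. The figures are copied from Table 7; the names `TABLE_7` and `attribution_gap` are hypothetical, and small gaps are assumed to come from the one-decimal rounding in the published table.

```python
# Table 7 figures in percent: (company-specific, airline industry, styles, reported total).
TABLE_7 = {
    "Portfolio":   (-4.4, -4.2, 4.3, -4.3),
    "Alaska":      (2.6, -4.2, 2.1, 0.4),
    "Allegiant":   (-9.9, -4.2, 3.0, -11.1),
    "AMR":         (-22.6, -4.2, 7.9, -19.0),
    "Continental": (1.5, -4.2, 4.5, 1.7),
    "Delta":       (-17.6, -4.2, 4.6, -17.2),
    "JetBlue":     (0.4, -4.2, 4.0, 0.2),
    "Southwest":   (2.3, -4.2, 1.6, -0.3),
    "UAL":         (10.5, -4.2, 4.1, 10.4),
    "USAir":       (-6.6, -4.2, 7.0, -3.8),
}


def attribution_gap(specific, industry, styles, total):
    """Difference between the sum of the three sources and the reported total."""
    return (specific + industry + styles) - total


gaps = {name: attribution_gap(*row) for name, row in TABLE_7.items()}
for name, gap in gaps.items():
    print(f"{name:12s} sum of sources minus reported total = {gap:+.1f}")
```

Every column reconciles to within about 0.1 percentage point, which is consistent with rounding.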
Table 8 Return Attribution for Styles Only in Percent, March 31, 2010-April 30, 2010, Barra U.S. Equity Long-Term Model (USE3L)

Style | Portfolio | Alaska | Allegiant | AMR | Continental | Delta | JetBlue | Southwest | UAL | USAir
Size | 2.3 | 3.0 | 3.2 | 2.2 | 2.2 | 0.9 | 3.1 | 1.1 | 2.2 | 3.2
Currency Sensitivity | 1.1 | 0.6 | 1.3 | 1.3 | 1.1 | 1.3 | 0.8 | -0.1 | 2.0 | 2.0
Leverage | 1.0 | 0.6 | 0.2 | 1.6 | 1.3 | 1.2 | 0.9 | 0.0 | 1.4 | 1.8
Volatility | 0.9 | 0.7 | 0.1 | 1.3 | 1.0 | 0.9 | 0.5 | 0.3 | 1.6 | 1.5
Earnings Yield | 0.8 | -0.5 | -0.1 | 4.0 | 0.7 | 1.9 | 0.1 | 0.4 | -0.7 | 1.3
Trading Activity | 0.1 | 0.2 | 0.0 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 | 0.2 | 0.2
Momentum | 0.0 | 0.0 | 0.0 | -0.1 | -0.1 | 0.0 | 0.0 | 0.0 | -0.1 | -0.1
Growth | -0.1 | 0.0 | -0.5 | 0.1 | 0.0 | -0.4 | 0.2 | 0.1 | -0.1 | -0.1
Value | -0.2 | 0.0 | -0.1 | -0.8 | -0.2 | -0.2 | 0.4 | 0.1 | -0.7 | -0.4
Yield | -0.3 | -0.3 | -0.3 | -0.3 | -0.3 | -0.3 | -0.3 | -0.3 | -0.3 | -0.3
Size Nonlinearity | -0.3 | -0.6 | -0.7 | -0.2 | -0.2 | 0.1 | -0.6 | 0.1 | -0.2 | -0.7
Earnings Variation | -1.0 | -1.6 | 0.0 | -1.5 | -1.2 | -0.8 | -1.2 | -0.1 | -1.1 | -1.4

Styles can contribute significantly to a manager's performance. In our example, the U.S. Volatility factor was the main driver. Looking at individual factors and stocks, we can also see that certain factors and stocks made a significant contribution to performance due to stock-specific performance or style contribution.

In summary, portfolio performance can be strongly impacted by unintended bets. The manager may be taking major risks without adequate compensation. The factor model helps uncover these issues.

KEY POINTS 

• Fundamental analysis is the process of determining a security's future value by analyzing a combination of macro- and microeconomic events and company-specific characteristics.

• Though fundamental analysis focuses on the valuation of individual companies, most institutional investors recognize that there are common factors affecting all stocks. Common factors are shared characteristics between firms that affect their returns.

• Fundamental factor models are in fact complementary as opposed to antithetical to traditional security analysis. The basic building blocks of analysts and factor modelers are in fact similar: Both try to identify microeconomic traits that drive the risk and returns of individual securities.


• The objective of the fundamental analyst is to forecast return (or future stock values), whereas the fundamental factor model forecasts the fluctuation of a security or a portfolio of securities around its expected return. Some factors may help managers forecast return but not be good risk factors. A good return factor has persistent direction though not a lot of volatility; the ability of a company to beat earnings estimates is one of these factors. A good risk factor may be persistent or not but must be adequately volatile.

• Fundamental analysis and fundamental factor models may begin with the same ideology but they offer different insights. The most critical difference is that the factor model captures the interaction of the firm's microeconomic characteristics at the portfolio level. This is important because as names are added to the portfolio, company-specific returns are diversified away, and the common factor (systematic) portion becomes an increasingly larger part of the portfolio risk and return.

• There are three major benefits of using fundamental factor models: (1) monitoring and managing portfolio exposures over time; (2) understanding the contribution of factors and individual stocks to portfolio risk and tracking error relative to the relevant benchmark (risk decomposition); and (3) attributing portfolio performance to factors and individual stocks to understand the return contribution of intended and accidental bets.

• Managers can use the model to analyze past performance, attributing realized portfolio return to its various sources. Portfolio performance can be strongly impacted by unintended bets. The manager may be taking major risks without adequate compensation. The factor model helps uncover these issues.

• The distribution of exposures may be important. For example, a portfolio of companies with a leverage exposure of zero has a very different economic profile than a portfolio with a barbell distribution where half the companies are overleveraged and potentially vulnerable to a collapse in credit conditions.

NOTES 

1. In the Barra U.S. equity model, for example, we allow companies to be split up into five different industries, depending on their business structure.

2. All existing Barra models focus on a particular market, using an equity universe that includes all sectors and large to mid-caps with some small-caps.

3. The estimation universe average is a market-cap weighted average.

4. In effect, we build three separate valuation models. The results of each valuation model determine a set of weights, based on fundamental information. The final industry weights are a weighted average of the three weighting schemes. Further details are available in the Barra U.S. Equity Model Handbook.

5. Specifically, the effects of other factors as well as specific returns remain the same, and the risk-free rate is unchanged.

6. Details of the model construction are available in The Barra Risk Model Handbook or Barra U.S. Equity Model Handbook.
Abstract: Multifactor equity risk models are classified as statistical models, macroeconomic models, and fundamental models. The most popular types of models used in practice are fundamental models. Many of the inputs used in a multifactor risk model are those used in traditional fundamental analysis. There are several commercially available fundamental multifactor risk models. There are asset management companies that develop proprietary models. Brokerage firms have developed models that they make available to institutional clients.


Quantitative-oriented common stock portfolio managers typically employ a multifactor equity risk model in constructing and rebalancing a portfolio and then for evaluating performance. The most popular type of multifactor equity risk model used is a fundamental factor model. 1 While some asset management firms develop their own model, most use commercially available models. In this entry we use one commercially available model to illustrate the general features of fundamental models and how they are used to construct portfolios. In our illustration, we will use an old version of a model developed by Barra (now MSCI Barra). Although that model has been updated, the discussion and illustrations provide the essential points for appreciating the value of using multifactor equity models.



Factor Models for Portfolio Construction 


MODEL DESCRIPTION AND ESTIMATION

The basic relationship to be estimated in a multifactor risk model is

$$R_i - R_f = \beta_{i,F1} R_{F1} + \beta_{i,F2} R_{F2} + \cdots + \beta_{i,FH} R_{FH} + e_i \qquad (1)$$

where

$R_i$ = rate of return on stock $i$
$R_f$ = risk-free rate of return
$\beta_{i,Fj}$ = sensitivity of stock $i$ to risk factor $j$
$R_{Fj}$ = rate of return on risk factor $j$
$e_i$ = nonfactor (specific) return on security $i$

The above function is referred to as a return 
generating function. 

Fundamental factor models use company and industry attributes and market data as descriptors. Examples are price/earnings ratios, book/price ratios, estimated earnings growth, and trading activity. The estimation of a fundamental factor model begins with an analysis of historical stock returns and descriptors about a company. In the Barra model, for example, the process of identifying the risk factors begins with monthly returns for 1,900 companies that the descriptors must explain. Descriptors are not the "risk factors" but instead they are the candidates for risk factors. The descriptors are selected in terms of their ability to explain stock returns. That is, all of the descriptors are potential risk factors but only those that appear to be important in explaining stock returns are used in constructing risk factors.

Once the descriptors that are statistically significant in explaining stock returns are identified, they are grouped into risk indexes to capture related company attributes. For example, descriptors such as market leverage, book leverage, debt-to-equity ratio, and company's debt rating are combined to obtain a risk index referred to as "leverage." Thus, a risk index is a combination of descriptors that captures a particular attribute of a company.


The Barra fundamental multifactor risk model, the "E3 model" being the latest version, has 13 risk indexes and 55 industry groups. (The descriptors are the same variables that have been consistently found to be important in many well-known academic studies on risk factors.) Table 1 lists the 13 risk indexes in the Barra model. 2 Also shown in the table are the descriptors used to construct each risk index. The 55 industry classifications are grouped into 13 sectors. For example, the following three industries comprise the energy sector: energy reserves and production, oil refining, and oil services. The consumer noncyclicals sector consists of the following five industries: food and beverages, alcohol, tobacco, home products, and grocery stores. The 13 sectors in the Barra model are basic materials, energy, consumer noncyclicals, consumer cyclicals, consumer services, industrials, utility, transport, health care, technology, telecommunications, commercial services, and financial.

Given the risk factors, information about the exposure of every stock to each risk factor ($\beta_{i,Fj}$) is estimated using statistical analysis. For a given time period, the rate of return for each risk factor ($R_{Fj}$) also can be estimated using statistical analysis. The prediction for the expected return can be obtained from equation (1) for any stock. The nonfactor return ($e_i$) is found by subtracting the return predicted by the risk factors from the actual excess return of the stock for the period.
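The return generating function and the definition of the nonfactor return can be written out as a minimal sketch. The function names and the numbers in the usage lines are hypothetical and purely illustrative; they are not taken from the Barra model.

```python
def predicted_excess_return(betas, factor_returns):
    """Factor-explained part of the excess return: sum over j of beta_{i,Fj} * R_{Fj}."""
    if len(betas) != len(factor_returns):
        raise ValueError("need one factor return per exposure")
    return sum(b * r for b, r in zip(betas, factor_returns))


def specific_return(actual_return, risk_free, betas, factor_returns):
    """Nonfactor return e_i: actual excess return minus the factor-predicted return."""
    return (actual_return - risk_free) - predicted_excess_return(betas, factor_returns)


# Hypothetical stock with exposures to two risk factors.
betas = [1.0, 0.5]
factor_returns = [0.02, -0.01]
print(predicted_excess_return(betas, factor_returns))
print(specific_return(0.04, 0.01, betas, factor_returns))
```

Rearranging equation (1) in this way makes the sign convention explicit: a positive nonfactor return means the stock did better than its factor exposures alone would suggest.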

Moving from individual stocks to portfolios, the predicted return for a portfolio can be computed. The exposure to a given risk factor of a portfolio is simply the weighted average of the exposure of each stock in the portfolio to that risk factor. For example, suppose a portfolio has 42 stocks. Suppose further that stocks 1 through 40 are equally weighted in the portfolio at 2.2%, stock 41 is 5% of the portfolio, and stock 42 is 7% of the portfolio. Then the exposure of the portfolio to risk factor $j$ is

$$0.022\,\beta_{1,Fj} + 0.022\,\beta_{2,Fj} + \cdots + 0.022\,\beta_{40,Fj} + 0.050\,\beta_{41,Fj} + 0.070\,\beta_{42,Fj}$$
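The weighted-average calculation for the 42-stock example can be sketched as follows. The weights come from the text; the function name `portfolio_exposure` and the stock exposures used in the usage line are hypothetical.

```python
def portfolio_exposure(weights, exposures):
    """Portfolio exposure to one risk factor: weighted average of stock exposures."""
    if len(weights) != len(exposures):
        raise ValueError("need one exposure per holding")
    return sum(w * b for w, b in zip(weights, exposures))


# Stocks 1-40 at 2.2% each, stock 41 at 5%, stock 42 at 7% (weights sum to 100%).
WEIGHTS = [0.022] * 40 + [0.05, 0.07]

# Hypothetical exposures: zero for the first 40 stocks, 1.0 and 2.0 for the last two.
example_exposures = [0.0] * 40 + [1.0, 2.0]
print(portfolio_exposure(WEIGHTS, example_exposures))
```

In this hypothetical case only the two largest holdings carry any exposure, so the portfolio exposure is 0.05 x 1.0 + 0.07 x 2.0.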
Table 1 Barra E3 Model Risk Definitions

Risk Index | Descriptors in Risk Index
Volatility | Beta times sigma; Daily standard deviation; High-low price; Log of stock price; Cumulative range; Volume beta; Serial dependence; Option-implied standard deviation
Momentum | Relative strength; Historical alpha
Size | Log of market capitalization
Size Nonlinearity | Cube of log of market capitalization
Trading Activity | Share turnover rate (annual); Share turnover rate (quarterly); Share turnover rate (monthly); Share turnover rate (five years); Indicator for forward split; Volume to variance
Growth | Payout ratio over five years; Variability in capital structure; Growth rate in total assets; Earnings growth rate over the last five years; Analyst-predicted earnings growth; Recent earnings change
Earnings Yield | Analyst-predicted earnings-to-price; Trailing annual earnings-to-price; Historical earnings-to-price
Value | Book-to-price ratio
Earnings Variability | Variability in earnings; Variability in cash flows; Extraordinary items in earnings; Standard deviation of analyst-predicted earnings-to-price
Leverage | Market leverage; Book leverage; Debt to total assets; Senior debt rating
Currency Sensitivity | Exposure to foreign currencies
Dividend Yield | Predicted dividend yield
Non-Estimation Universe Indicator | Indicator for firms outside US-E3 estimation universe

Adapted from Table 8-1 in Barra (1998, pp. 71-73). Adapted with permission.


The nonfactor error term is measured in the 
same way as in the case of an individual stock. 
However, in a well-diversified portfolio, the 
nonfactor error term will be considerably less 
for the portfolio than for the individual stocks 
in the portfolio. 

The same analysis can be applied to a stock 
market index because an index is nothing more 
than a portfolio of stocks. 

RISK DECOMPOSITION 

The real usefulness of a linear multifactor model lies in the ease with which the risk of a portfolio with several assets can be estimated. Consider a portfolio with 100 assets. Risk is commonly defined as the variance of the portfolio's returns. So, in this case, we need to find the variance-covariance matrix of the 100 assets. That would require us to estimate 100 variances (one for each of the 100 assets) and 4,950 covariances among the 100 assets. That is, in all we need to estimate 5,050 values, a very difficult undertaking. Suppose, instead, that we use a three-factor model to estimate risk. Then, we need to estimate (1) the three factor loadings for each of the 100 assets (i.e., 300 values), (2) the six values of the factor variance-covariance matrix, and (3) the 100 residual variances (one for each asset). That is, we need to estimate only 406 values in all. This represents a reduction of about 92% from having to estimate 5,050 values, a huge improvement. Thus, with well-chosen factors, we can substantially reduce the work involved in estimating a portfolio's risk.
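The parameter counts in this comparison follow from simple formulas, sketched here for any number of assets and factors. The function names are hypothetical; the formulas count the distinct entries of a symmetric covariance matrix.

```python
def full_covariance_params(n_assets):
    """Distinct variances and covariances in an n x n covariance matrix: n(n+1)/2."""
    return n_assets * (n_assets + 1) // 2


def factor_model_params(n_assets, n_factors):
    """Factor loadings + distinct factor covariance entries + residual variances."""
    loadings = n_assets * n_factors
    factor_cov = n_factors * (n_factors + 1) // 2
    residual_vars = n_assets
    return loadings + factor_cov + residual_vars


full = full_covariance_params(100)        # 100 variances + 4,950 covariances
reduced = factor_model_params(100, 3)     # 300 loadings + 6 + 100
reduction = 1 - reduced / full
print(full, reduced, f"{reduction:.1%}")
```

The saving grows with the number of assets, because the full matrix scales with the square of the asset count while the factor model scales roughly linearly for a fixed number of factors.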

Multifactor risk models allow a manager and a client to decompose risk in order to assess the exposure of a portfolio to the risk factors and to assess the potential performance of a portfolio relative to a benchmark. This is the portfolio construction and risk control application of the model. Also, the actual performance of a portfolio relative to a benchmark can be assessed. This is the performance attribution analysis application of the model.

Barra suggests that there are various ways that a portfolio's total risk can be decomposed when employing a multifactor risk model. 3 Each decomposition approach can be useful to managers depending on the equity portfolio management strategy that they pursue. The four approaches are (1) total risk decomposition, (2) systematic-residual risk decomposition, (3) active risk decomposition, and (4) active systematic-active residual risk decomposition.

Figure 1 Total Risk Decomposition
Source: Figure 4.2 in Barra (1998, p. 34). Reprinted with permission.

In all of these approaches to risk decomposition, the total return is first divided into the risk-free return and the total excess return. The total excess return is the difference between the actual return realized by the portfolio and the risk-free return. The risk associated with the total excess return, called total excess risk, is what is further partitioned in the four approaches.

Total Risk Decomposition 

There are managers who seek to minimize total risk. For example, a manager pursuing a long-short or market neutral strategy seeks to construct a portfolio that minimizes total risk. For such managers, total risk decomposition that breaks down the total excess risk into two components, common factor risks (e.g., capitalization and industry exposures) and specific risk, is useful. This decomposition is shown in Figure 1. There is no provision for market risk, only risk attributed to the common factor risks and company-specific influences (i.e., risk unique to a particular company and therefore uncorrelated with the specific risk of other companies). Thus, the market portfolio is not a risk factor considered in this decomposition.

Systematic-Residual Risk Decomposition

There are managers who seek to time the market or who intentionally make bets to create a different exposure from that of a market portfolio. Such managers would find it useful to decompose total excess risk into systematic risk and residual risk as shown in Figure 2. Unlike in the total risk decomposition approach just described, this view brings market risk into the analysis. It is the type of decomposition where systematic risk is the risk related to a portfolio's beta.

Residual risk in the systematic-residual risk decomposition is defined in a different way from residual risk in the total risk decomposition. In the systematic-residual risk decomposition, residual risk is risk that is uncorrelated with the market portfolio. In turn, residual risk is partitioned into specific risk and common factor risk. Notice that the partitioning of risk described here is different from that in the arbitrage pricing theory model where all risk factors that could not be diversified away were referred to as "systematic risks." In the discussion here, risk factors that cannot be diversified away are classified as market risk and common factor risk. Specific risk, by contrast, can be diversified to a negligible level.

Figure 2 Systematic-Residual Risk Decomposition
Source: Figure 4.3 in Barra (1998, p. 34). Reprinted with permission.


Active Risk Decomposition 

The active risk decomposition approach is useful for assessing a portfolio's risk exposure and actual performance relative to a benchmark index. In this type of decomposition, shown in Figure 3, the total excess risk is divided into benchmark risk and active risk. Benchmark risk is defined as the risk associated with the benchmark portfolio.

Figure 3 Active Risk Decomposition
Source: Figure 4.4 in Barra (1998, p. 34). Reprinted with permission.
Figure 4 Active Systematic-Active Residual Risk Decomposition
Source: Figure 4.5 in Barra (1998, p. 37). Reprinted with permission.

Active risk is the risk that results from the manager's attempt to generate a return that will outperform the benchmark. Another name for active risk is tracking error. The active risk is further partitioned into common factor risk and specific risk. This decomposition is useful for managers of index funds and traditionally managed active funds.

Active Systematic-Active Residual Risk Decomposition

There are managers who overlay a market-timing strategy on their stock selection. That is, they not only try to select stocks they believe will outperform but also try to time their purchases. For a manager who pursues such a strategy, it will be important in evaluating performance to separate market risk from common factor risks. In the active risk decomposition approach just discussed, there is no market risk identified as one of the risk factors.


Since market risk (i.e., systematic risk) is an element of active risk, its inclusion as a source of risk is preferred by managers. When market risk is included, we have the active systematic-active residual risk decomposition approach shown in Figure 4. Total excess risk is again divided into benchmark risk and active risk. However, active risk is further divided into active systematic risk (i.e., active market risk) and active residual risk. Then active residual risk is divided into common factor risks and specific risk.

Summary of Risk Decomposition 

The four approaches to risk decomposition are just different ways of slicing up risk to help a manager in constructing and controlling the risk of a portfolio and for a client to understand how the manager performed. Figure 5 provides an overview of the four approaches to carving up risk into specific/common factor, systematic/residual, and benchmark/active risks.

Figure 5 Risk Decomposition Overview (panel labels: Systematic, Residual)
Source: Figure 4.6 in Barra (1998, p. 38). Reprinted with permission.

APPLICATIONS IN PORTFOLIO CONSTRUCTION AND RISK CONTROL

The power of a multifactor risk model is that given the risk factors and the risk factor sensitivities, a portfolio's risk exposure profile can be quantified and controlled. The three examples below show how this can be done so that a manager can avoid making unintended bets. In the examples, we use the Barra E3 factor model. 4


A fundamental multifactor risk model can be used to assess whether the current portfolio is consistent with a manager's strengths. Table 2 is a list of the top 15 holdings of Portfolio ABC as of December 31, 2008. Table 3 is a summary risk decomposition report for the same portfolio. The portfolio had a total market value of $5.4 billion, 868 holdings, and a predicted beta of 1.15. The risk report also shows that the portfolio had an active risk of 6.7%. This is its tracking error with respect to the benchmark, the S&P 500 index. Notice that nearly 93% of the active risk variance (which is 44.8) came from common factor variance (which is 41.6), and only a small proportion came from stock-specific risk variance (also known as asset selection variance, which is 3.2). Clearly, the manager of this portfolio had placed fairly large factor bets.
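The relationship between the variance and standard deviation panels of Table 3 can be verified with a few lines. The variances are copied from Table 3; the names `VARIANCES`, `risk`, and `common_share` are hypothetical, and the sketch assumes that each risk figure is simply the square root of the corresponding variance.

```python
import math

# Variance figures from Table 3 (percent squared).
VARIANCES = {
    "Asset Selection": 3.2,
    "Common Factor": 41.6,
    "Active": 44.8,
    "Benchmark": 749.4,
    "Total": 1016.6,
}


def risk(variance):
    """Risk (standard deviation, in percent) from a variance in percent squared."""
    return math.sqrt(variance)


# Share of active variance attributable to common factors.
common_share = VARIANCES["Common Factor"] / VARIANCES["Active"]

print(f"Active risk (tracking error): {risk(VARIANCES['Active']):.1f}%")
print(f"Benchmark risk: {risk(VARIANCES['Benchmark']):.1f}%")
print(f"Total risk: {risk(VARIANCES['Total']):.1f}%")
print(f"Common factor share of active variance: {common_share:.1%}")
```

Note that variances add (3.2 + 41.6 = 44.8) while standard deviations do not, which is why the report shows the decomposition in variance terms before converting to risk.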

The top portion of Table 4 lists the factor risk exposures of Portfolio ABC relative to those of the S&P 500 index, its benchmark. The first column shows the exposures of the portfolio, and the second column shows the exposures of the benchmark. The last column shows the active exposure, which is the difference between the portfolio exposure and the benchmark exposure. The exposures to the risk index factors are measured in units of standard deviation.


Table 2 Portfolio ABC's Holdings (only the top 15 holdings shown)

Ticker | Security Name | Shares | Price ($) | Weight | Beta | Industry
XOM | Exxon Mobil Corp. | 3,080,429 | 79.83 | 4.56 | 0.92 | Oil Refining
MSFT | Microsoft Corp. | 6,235,154 | 19.44 | 2.25 | 0.95 | Computer Software
CVX | Chevron Corp. | 1,614,879 | 73.97 | 2.21 | 0.98 | Energy Reserves & Production
IBM | International Business Machines Corp. | 1,100,900 | 84.16 | 1.72 | 0.83 | Computer Software
T | AT&T Inc. | 3,226,744 | 28.50 | 1.70 | 0.80 | Telephone
HPQ | Hewlett-Packard Co. | 2,464,100 | 36.29 | 1.66 | 0.84 | Computer Hardware & Business Machines
INTC | Intel Corp. | 5,997,300 | 14.66 | 1.63 | 0.87 | Semiconductors
COP | ConocoPhillips | 1,634,986 | 51.80 | 1.57 | 1.24 | Energy Reserves & Production
CSCO | Cisco Systems Inc. | 5,186,400 | 16.30 | 1.57 | 0.95 | Computer Hardware & Business Machines
JNJ | Johnson & Johnson | 1,403,544 | 59.83 | 1.56 | 0.54 | Medical Products & Supplies
OXY | Occidental Petroleum Corp. | 1,324,426 | 59.99 | 1.47 | 1.26 | Energy Reserves & Production
PG | Procter & Gamble Co. | 1,249,446 | 61.82 | 1.43 | 0.57 | Home Products
GE | General Electric Co. | 4,762,984 | 16.20 | 1.43 | 1.41 | Heavy Electrical Equipment
PFE | Pfizer Inc. | 4,339,092 | 17.71 | 1.42 | 0.61 | Drugs
TWX | Time Warner Inc. | 1,948,880 | 30.18 | 1.09 | 1.32 | Media
Table 3 Portfolio ABC's Summary Risk Decomposition Report

Number of Securities | 868
Number of Shares | 298,371,041
Average Share Price | $24.91
Weighted Average Share Price | $35.30
Portfolio Ending Market Value | $5,396,530,668
Predicted Beta (vs. Benchmark, S&P 500) | 1.15

Barra Risk Decomposition (Variance)
Asset Selection Variance | 3.2
Common Factor Variance:
  Risk Indexes | 22.5
  Industries | 11.7
  Covariance x 2 | 7.5
Common Factor Variance | 41.6
Active Variance | 44.8
Benchmark Variance | 749.4
Total Variance | 1,016.6

Barra Risk Decomposition (Std. Dev.)
Asset Selection Risk | 1.8
Common Factor Risk:
  Risk Indexes | 4.7
  Industries | 3.4
  Covariance x 2 |
Common Factor Risk | 6.5
Active Risk | 6.7
Benchmark Risk | 27.4
Total Risk | 31.9

while the exposures to the industry factors are measured in
percentages. The portfolio had a high active exposure to the
Volatility risk index factor. That is, the stocks in the portfolio
were far more volatile than the stocks in the benchmark. On the
other hand, the portfolio had a low active exposure to the Size risk
index. That is, the stocks in the portfolio were smaller than the
benchmark average in terms of market capitalization. The lower
portion of Table 4 is an abbreviated list of the industry factor
exposures.

An important use of such risk reports is the 
identification of portfolio bets, both explicit 
and implicit. If, for example, the manager of 
Portfolio ABC did not intend to place the large 
bet on the Volatility risk index, then he has 
to make appropriate changes in the portfolio 
holdings until the active exposure to this factor 
is closer to zero. 
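
As a rough illustration of how such a report can be screened for unintended bets, the Python sketch below recomputes the active exposures from the risk-index rows of Table 4 and flags those whose magnitude exceeds a tolerance. The 0.25 tolerance and the function names are assumptions made for this example; they are not part of the Barra report.

```python
# Risk-index exposures from Table 4, in standard deviation units:
# name -> (managed portfolio, benchmark).
TABLE_4 = {
    "Volatility": (0.321, -0.089),
    "Value": (0.199, -0.024),
    "Earnings Variation": (0.149, -0.053),
    "Earnings Yield": (0.243, 0.053),
    "Trading Activity": (0.161, 0.052),
    "Leverage": (-0.036, -0.110),
    "Growth": (0.004, -0.069),
    "Non-Estimation Universe": (0.027, 0.000),
    "Currency Sensitivity": (-0.013, 0.007),
    "Momentum": (-0.183, -0.043),
    "Yield": (-0.115, 0.078),
    "Size Non-Linearity": (-0.107, 0.123),
    "Size": (-0.244, 0.356),
}

# Hypothetical tolerance: active exposures larger than this are treated as bets.
TOLERANCE = 0.25


def active_exposures(table):
    """Active exposure = managed exposure minus benchmark exposure."""
    return {name: managed - bench for name, (managed, bench) in table.items()}


def flag_bets(active, tolerance):
    """Return (factor, active exposure) pairs beyond the tolerance, largest first."""
    flagged = [(name, a) for name, a in active.items() if abs(a) > tolerance]
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)


active = active_exposures(TABLE_4)
bets = flag_bets(active, TOLERANCE)
for name, value in bets:
    print(f"{name:>12s}: {value:+.3f}")
```

With this tolerance, only the Size and Volatility exposures are flagged, which matches the two bets singled out in the discussion of Portfolio ABC.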

Table 4 Analysis of Portfolio ABC's Factor Exposures

Risk Indexes (std. dev. units)        Managed^a   Benchmark^b   Active^c
U.S. Volatility                         0.321       -0.089        0.410
U.S. Value                              0.199       -0.024        0.223
U.S. Earnings Variation                 0.149       -0.053        0.202
U.S. Earnings Yield                     0.243        0.053        0.191
U.S. Trading Activity                   0.161        0.052        0.109
U.S. Leverage                          -0.036       -0.110        0.074
U.S. Growth                             0.004       -0.069        0.073
U.S. Non-Estimation Universe            0.027        0.000        0.027
U.S. Currency Sensitivity              -0.013        0.007       -0.019
U.S. Momentum                          -0.183       -0.043       -0.139
U.S. Yield                             -0.115        0.078       -0.194
U.S. Size Non-Linearity                -0.107        0.123       -0.230
U.S. Size                              -0.244        0.356       -0.600

Top Three Industries (percentages)    Managed     Benchmark     Active
U.S. Energy Reserves                    0.098        0.064        0.033
U.S. Semiconductors                     0.052        0.023        0.028
U.S. Mining and Metals                  0.036        0.009        0.027

a Managed return.
b Benchmark return (S&P 500).
c Active return = Managed return - Benchmark return.

Risk Control against a Stock Market Index

The objective of equity indexing is to match the performance of some
specified stock market index with little tracking error. To do this,
the risk profile of the indexed portfolio must match the risk profile
of the designated stock market index. Put in other terms, the factor
risk exposure of the indexed portfolio must match as closely as
possible the exposure of the designated stock market index to the
same factors. Any differences in the factor risk exposures result in
tracking error. Identification of any differences allows the indexer
to rebalance the portfolio to reduce tracking error.

Table 5 Factor Exposures of a 50-Stock Portfolio that Optimally Matches the S&P 500 Index

Risk Indexes (std. dev. units)        Managed^a   Benchmark^b   Active^c
U.S. Volatility                        -0.153       -0.089       -0.063
U.S. Momentum                          -0.062       -0.043       -0.018
U.S. Size                               0.795        0.356        0.440
U.S. Size Non-Linearity                 0.164        0.123        0.041
U.S. Trading Activity                  -0.001        0.052       -0.053
U.S. Growth                            -0.052       -0.069        0.016
U.S. Earnings Yield                     0.076        0.053        0.023
U.S. Value                             -0.019       -0.024        0.005
U.S. Earnings Variation                -0.122       -0.053       -0.069
U.S. Leverage                          -0.176       -0.110       -0.066
U.S. Currency Sensitivity              -0.048        0.007       -0.055
U.S. Yield                              0.140        0.078        0.061
U.S. Non-Estimation Universe            0.000        0.000        0.000

a Managed return.
b Benchmark return (S&P 500).
c Active return = Managed return - Benchmark return.

To illustrate this, suppose that an index manager has constructed a
portfolio of 50 stocks to match the S&P 500 index. Table 5 lists the
exposures to the Barra risk indexes of the 50-stock portfolio and the
S&P 500 index. The last column in the exhibit shows the difference in
exposures. The differences are very small except for the exposures to
the Size risk index factor. Though not shown in this exhibit, there
is a similar list of exposures to the 55 industry factors.

The illustration in Table 5 uses price data as of December 31, 2008.
It demonstrates how a multifactor risk model can be combined with an
optimization model to construct an indexed portfolio when a given
number of holdings are sought. Specifically, the portfolio analyzed
in the exhibit is the result of an application in which the manager
wants a portfolio constructed that matches the S&P 500 index with
only 50 stocks and that minimizes tracking error. The optimization
model uses the multifactor risk model to construct a 50-stock
portfolio with a tracking error versus the S&P 500 index of just
2.75%. Since this is the optimal 50-stock portfolio to replicate the
S&P 500 index with a minimum tracking error risk, this tells the
index manager that if he seeks a lower tracking error, then more
stocks must be held. Note, however, that the optimal portfolio
changes as time passes and prices move.


Tilting a Portfolio 

Now let's look at how an active manager can construct a portfolio to
make intentional bets. Suppose that a portfolio manager seeks to
construct a portfolio that generates superior returns relative to the
S&P 500 by tilting it toward low P/E stocks. At the same time, the
manager does not want to increase tracking error significantly. An
obvious approach may seem to be to identify all the stocks in the
universe that have a lower than average P/E. The problem with this
approach is that it introduces unintentional bets with respect to the
other risk indexes.
Table 6 Factor Exposures of a Portfolio Tilted Toward Earnings Yield

Risk Indexes (std. dev. units)        Managed^a   Benchmark^b   Active^c
U.S. Volatility                        -0.050       -0.089        0.039
U.S. Momentum                          -0.096       -0.043       -0.052
U.S. Size                               0.284        0.356       -0.072
U.S. Size Non-Linearity                 0.096        0.123       -0.027
U.S. Trading Activity                   0.114        0.052        0.062
U.S. Growth                            -0.096       -0.069       -0.027
U.S. Earnings Yield                     0.553        0.053        0.500
U.S. Value                              0.076       -0.024        0.100
U.S. Earnings Variation                -0.091       -0.053       -0.038
U.S. Leverage                          -0.153       -0.110       -0.043
U.S. Currency Sensitivity               0.066        0.007        0.059
U.S. Yield                              0.179        0.078        0.100
U.S. Non-Estimation Universe            0.000        0.000        0.000

a Managed return.
b Benchmark return (S&P 500).
c Active return = Managed return - Benchmark return.


Instead, an optimization method combined with a multifactor risk
model can be used to construct the desired portfolio. The necessary
inputs to this process are the tilt exposure sought and the benchmark
stock market index. Additional constraints can be placed, for
example, on the number of stocks to be included in the portfolio. The
Barra optimization model can also handle additional specifications
such as forecasts of expected returns or alphas on the individual
stocks.

In our illustration, the tilt exposure sought is toward low P/E
stocks, that is, toward high earnings yield stocks (since earnings
yield is the inverse of P/E). The benchmark is the S&P 500. We seek a
portfolio whose average earnings yield is at least 0.5 standard
deviations higher than that of the benchmark. We do not place any
limit on the number of stocks to be included in the portfolio. We
also do not want the active exposure to any other risk index factor
(other than earnings yield) to be more than 0.1 standard deviations
in magnitude. This way we avoid placing unintended bets. While we do
not report the holdings of the optimal portfolio here, Table 6
provides an analysis of that portfolio by comparing the risk exposure
of the 50-stock optimal portfolio to that of the S&P 500. Though not
shown in Table 6, there is a similar list of exposures to the 55
industry factors.
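
The two constraints described above can be checked directly against the exposures in Table 6. The Python sketch below is only a verification aid, not the Barra optimizer: it recomputes each active exposure and lists any factor that breaks the tilt specification. The rounding allowance `TOL` and all function and variable names are assumptions introduced for this example.

```python
# Risk-index exposures from Table 6, in standard deviation units:
# name -> (managed portfolio, benchmark).
TABLE_6 = {
    "Volatility": (-0.050, -0.089),
    "Momentum": (-0.096, -0.043),
    "Size": (0.284, 0.356),
    "Size Non-Linearity": (0.096, 0.123),
    "Trading Activity": (0.114, 0.052),
    "Growth": (-0.096, -0.069),
    "Earnings Yield": (0.553, 0.053),
    "Value": (0.076, -0.024),
    "Earnings Variation": (-0.091, -0.053),
    "Leverage": (-0.153, -0.110),
    "Currency Sensitivity": (0.066, 0.007),
    "Yield": (0.179, 0.078),
    "Non-Estimation Universe": (0.000, 0.000),
}

TILT_FACTOR = "Earnings Yield"
MIN_TILT = 0.5    # tilt of at least 0.5 std. dev. (from the text)
MAX_OTHER = 0.1   # other active exposures at most 0.1 std. dev. (from the text)
TOL = 0.0015      # assumed allowance for three-decimal rounding in the table


def tilt_violations(table, tilt_factor, min_tilt, max_other, tol):
    """Return the factors whose active exposure breaks the tilt constraints."""
    violations = []
    for name, (managed, bench) in table.items():
        active = managed - bench
        if name == tilt_factor:
            if active < min_tilt - tol:
                violations.append(name)
        elif abs(active) > max_other + tol:
            violations.append(name)
    return violations


violations = tilt_violations(TABLE_6, TILT_FACTOR, MIN_TILT, MAX_OTHER, TOL)
print("constraint violations:", violations)
```

Value and Yield sit right at the 0.1 bound, which suggests these constraints are binding in the optimal portfolio.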

KEY POINTS 

• There are three types of multifactor equity risk models that are
  used in practice: statistical, macroeconomic, and fundamental. The
  most popular is the fundamental model.

• A multifactor equity risk model assumes that stock returns (and
  hence portfolio returns) can be explained by a linear model with
  multiple factors, consisting of "risk index" factors such as
  company size, volatility, momentum, and so on, and "industry"
  factors. The portion of the stock return that is not explained by
  this model is the stock-specific return.

• The risk index factors are measured in standard deviation units,
  while the industry factors are measured in percentages.

• The real usefulness of a linear multifactor model lies in the ease
  with which the risk (i.e., the volatility) of a portfolio with
  several assets can be estimated. Instead of estimating the
  variance-covariance matrix of its assets, it is only necessary to
  estimate the portfolio's factor exposures and the
  variance-covariance matrix of the factors, a computationally much
  easier task.

• The variance-covariance matrix of the factors and the factor
  exposures of stocks are calculated based on a mix of historical and
  current data and are updated periodically.

• Total risk of a portfolio can be decomposed in several ways. The
  partitioning method chosen is based on what is useful given the
  manager's strategy. The active risk decomposition method is useful
  for managers of index funds and traditionally managed active funds.

• The level of active risk of a portfolio and the split of the
  tracking error variance between the common factor portion and the
  stock-specific portion are useful in assessing if the portfolio is
  constructed in a way that is consistent with the manager's
  strengths.

• The list of active factor exposures of a portfolio helps the
  manager identify its bets, both explicit and implicit. If a manager
  discovers some unintended bets, then the portfolio can be
  rebalanced so as to minimize such bets.

• Using a multifactor risk model and an optimization model, a
  portfolio that has the minimum active risk relative to its
  benchmark for a given number of assets held can be constructed.
  This application is useful for passive managers.

• Similarly, a manager can construct a portfolio that tilts toward a
  specified factor and has no material active exposure to any other
  factor. This application is useful for active managers.
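
The key point about estimating portfolio risk from factor exposures rather than from the full asset covariance matrix can be sketched numerically. The Python example below uses randomly generated, purely hypothetical inputs (exposures `B`, factor covariance `F`, specific variances `spec_var`, weights `w`); it illustrates the linear factor structure in general and is not the Barra model's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_factors = 500, 13  # hypothetical sizes

# Hypothetical inputs: exposures, factor covariance, specific variances, weights.
B = rng.normal(size=(n_assets, n_factors))        # asset-by-factor exposures
A = rng.normal(size=(n_factors, n_factors))
F = (A @ A.T) / n_factors * 1e-3                  # positive semidefinite factor covariance
spec_var = rng.uniform(1e-3, 4e-3, size=n_assets) # stock-specific variances
w = rng.dirichlet(np.ones(n_assets))              # long-only weights summing to 1

# Factor route: portfolio exposures, then a small quadratic form.
x = B.T @ w                                       # portfolio factor exposures
var_factor_route = x @ F @ x + np.sum(w**2 * spec_var)

# Full route: build the n-by-n asset covariance implied by the same model.
full_cov = B @ F @ B.T + np.diag(spec_var)
var_full_route = w @ full_cov @ w

# Number of distinct covariance terms each route needs to estimate.
n_full_cov_terms = n_assets * (n_assets + 1) // 2
n_factor_cov_terms = n_factors * (n_factors + 1) // 2

print(np.sqrt(var_factor_route), np.sqrt(var_full_route))
print(n_full_cov_terms, n_factor_cov_terms)
```

Both routes give the same portfolio variance, but the factor route only requires the small factor covariance matrix plus the exposures and specific variances.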

NOTES 

1. For a discussion of the different types of factor models, see
   Connor (1995).

2. For a more detailed description of each descriptor, see Appendix A
   in Barra (1998). A listing of the 55 industry groups is provided
   in Table 4 in this entry.

3. See Chapter 4 in Barra (1998). The discussion to follow in this
   section follows that in the Barra publication.

4. The illustrations were created by the authors based on
   applications suggested in Chapter 6 of Barra (1996).
Abstract: Multifactor risk models seek to estimate and characterize the risk of a portfolio, either
in absolute value or when compared against a benchmark. Risk is typically decomposed into a 
systematic and an idiosyncratic component. Systematic risk captures the exposures the portfolio 
has to broad risk factors, such as interest rates or spreads. This risk is driven by the exposure 
of the portfolio to these risk factors, their volatility, and the correlation between these different 
sources of risk. Idiosyncratic risk captures the uncertainty associated with the particular issuers 
in the portfolio. Idiosyncratic risk is diversifiable by spreading the exposure to a large number 
of individual issuers. Multifactor risk models allow for the decomposition of the total risk by risk 
factor (or sets of risk factors). If the factors are economically meaningful, the risk model can provide 
relevant intuition regarding the major variables influencing the volatility of the portfolio and be a 
useful tool in portfolio construction. 


In this entry, we discuss the construction of risk models and their
application to fixed income portfolios. Although they share a similar
framework, multifactor models in fixed income use different building
blocks and provide a different analysis of the risk of a portfolio.


When analyzing their holdings, portfolio managers constantly monitor
their exposures, typically net of a benchmark: What is the portfolio
net duration? How risky is the overweight to credit? How does it
relate to the exposure to mortgages? What is the exposure to specific
issuers? Even when portfolio holdings and exposures are well known,
portfolio managers increasingly rely on quantitative techniques to
translate this information into a common risk language. Risk models
can present a coherent view of the portfolio, its exposures, and how
they correlate to each other. They can quantify the risk of each
exposure and its contribution to the overall risk of the portfolio.

Fixed income securities are exposed to many different types of risk.
Multifactor risk models in this area capture these risks by first
identifying common sources along different dimensions, the systematic
risk factors. All risk not captured by systematic factors is
considered idiosyncratic or security-specific. Typically, fixed
income systematic risk factors are divided into two sets: those that
influence securities across asset classes (e.g., yield curve risk)
and those specific to a particular asset class (e.g., prepayment risk
in securitized products).

There are many ways to define systematic risk factors. For instance,
they can be defined purely by statistical methods, observed in the
markets, or estimated from asset returns. In fixed income, the
standard approach is to use pricing models to calculate the analytics
that are the natural candidates for risk factor loadings (also called
sensitivities). In this setting, the risk factors are estimated from
cross-sectional asset returns. This is the approach taken in the
Barclays Global Risk Model,¹ which is the model used for illustration
throughout this entry.

In this risk model, the forecasted risk of the portfolio is driven by
both a systematic and an idiosyncratic (also called specific,
nonsystematic, and concentration) component. The forecasted
systematic risk is a function of the mismatch between the portfolio
and the benchmark in the exposures to the risk factors, such as yield
curve or spreads. The exposures are aggregated from security-level
analytics. The systematic risk is also a function of the volatility
of the risk factors, as well as the correlations between them. In
this setting, the correlation of returns across securities is driven
by the correlation of the systematic risk factors these securities
load on. As the model uses security-level returns and analytics to
estimate the factors, we can recover the idiosyncratic return for
each security. This is the return net of all systematic factors. The
model uses these idiosyncratic returns to estimate rich
specifications for the idiosyncratic risk.


APPROACHES USED TO 
ANALYZE RISK 

In what follows, we turn to the analysis of the risk of a particular
portfolio, going through the different approaches typically used.
Specifically, consider a portfolio manager who is benchmarked against
the Barclays U.S. Aggregate Index. Moreover, suppose she believes
interest rates are coming down, so she wants to be long duration, and
that she wants some extra yield in her portfolio, which means
investing in bonds with relatively higher spreads. Finally, let us
assume that she is mandated to keep the difference between the
returns of the portfolio and the benchmark at around 15 basis points,
on a monthly basis. Therefore, she has to track a benchmark, but is
allowed to deviate from it up to a point in order to express views
that hopefully lead to superior returns. A portfolio manager with
such a mandate is called an enhanced indexer. The amount of deviation
allowed is called the risk budget (15 basis points in our example)
and can be quantified using a risk model. The risk model produces an
estimate of the volatility of the difference of the portfolio and
benchmark returns, called tracking error volatility (TEV). (In this
entry, we refer to TEV, risk, and the standard deviation of the
portfolio net returns interchangeably.) The portfolio manager should
keep the TEV at a level equal to or less than her risk budget. For
illustration, we construct a portfolio with 50 securities that is
consistent with the portfolio manager's views and risk budget and
analyze it throughout this entry.

Market Structure and Exposure Contributions

The first level of analysis that any portfolio manager usually
performs is to compare the portfolio holdings in terms of market
value with the holdings from the benchmark. For instance, Table 1
shows that the composition of the portfolio has several important
mismatches when compared with the benchmark. The portfolio is
underweighted in Treasuries and government-related securities by
8.4%. This is compensated with an overweight of 12.3% in corporates,
especially in the financials sector. Other mismatches include an
underweight in mortgage-backed securities (MBS) (-5.8%) and an
overweight in commercial mortgage-backed securities (CMBS) (+2.1%).

Table 1 Market Weights for Portfolio and Benchmark

Asset Class               Portfolio   Benchmark   Difference
Total                       100.0       100.0        0.0
Treasury                     30.2        32.1       -1.9
Government-related            5.8        12.3       -6.5
Corporate industrials         9.0         9.7       -0.7
Corporate utilities           2.9         2.1        0.8
Corporate financials         18.6         6.4       12.2
MBS agency                   28.4        34.1       -5.8
ABS                           0.0         0.3       -0.3
CMBS                          5.2         3.1        2.1

Interestingly, for an equity manager, this kind of information (for
example, applied to the different industries or sectors of the
portfolio) would be of paramount importance to the analysis of the
risk of her portfolio. For a fixed income portfolio, this is not the
case. Although important, this analysis tells us very little about
the true active exposures of a fixed income portfolio. What if the
Treasuries in the portfolio have significantly longer duration than
those in the benchmark: would we be really "short" in this asset
class? What if the spreads from financials in the portfolio are much
smaller than those in the benchmark: is the weight mismatch that
important?

To answer these kinds of questions, we turn to another typical
dimension of analysis: the exposure of the portfolio to major sources
of risk. An example of such a risk exposure is the duration of the
portfolio. Other exposures typically monitored are the spread
duration, convexity, spread level, and vega (if the portfolio has
many securities with optionality, such as mortgages or callable
bonds).

Table 2 Aggregate Analytics

Analytics           Portfolio   Benchmark   Difference
Duration               4.55        4.30        0.25
Spread duration        4.67        4.56        0.11
Convexity             -0.15       -0.29        0.13
Vega                  -0.02       -0.01       -0.01
Spread                  157          57         100

Table 2 shows these analytics at the aggregate level for our
portfolio, benchmark, and the difference between the two. In
particular, we can see that the portfolio is long duration (+0.25
years), consistent with the forecast the manager has regarding yield
curve moves. In terms of spread duration, the mismatch is somewhat
smaller. We can also see that the portfolio has significantly less
negative convexity than the benchmark (-0.15 versus -0.29), probably
because MBS have a smaller weight in the portfolio. The portfolio
also has a more negative vega, but the number is reasonably small for
both universes. Finally, the portfolio has significantly higher
spreads (100 basis points) than the benchmark. This mismatch is
consistent with the manager's goal of having a higher yield in her
portfolio, when compared with the benchmark.

The analysis in Tables 1 and 2 can be combined to deliver a more
detailed picture of where the different exposures are coming from.
Table 3 shows that analysis for the duration of the portfolio. This
exhibit shows that the majority of the mismatch in duration
contribution (market-weighted duration exposures) comes from the
Treasury component of our portfolio (+0.21). Interestingly, even
though we are short in Treasuries, we are actually long in duration
for that asset class. This means that our Treasury portfolio will be
negatively impacted, when compared with the benchmark, by an increase
in interest rates. Because we are short in Treasuries, this result
must mean that our Treasury portfolio is longer in duration than the
Treasury component of the benchmark. Conversely, we have a relatively
small contribution to excess duration coming from our very large
overexposure to corporates. This means that on average the corporate
bonds in the portfolio are significantly shorter in duration than
those in the benchmark.

Table 3 Duration Contribution per Asset Class

Duration Contribution   Portfolio   Benchmark   Difference
Total                      4.55        4.30        0.25
Treasury                   1.92        1.71        0.21
Government-related         0.40        0.49       -0.09
Corporate                  1.31        1.19        0.11
Securitized                0.92        0.90        0.02

Adding Volatility and Correlations into the Analysis

The analysis above gives us some basic understanding of our exposures
to different kinds of risk. However, it is still hard to understand
how we can compare the level of risk across these different
exposures. What is more risky, the long duration exposure of 0.25
years, or the extra spread of 100 basis points? How can we quantify
how serious the vega mismatch in the portfolio is? Specifically, the
risk of the portfolio is a function of the exposures to the risk
factors, but also of how volatile (how "risky") each of the factors
is. So to enhance the analysis we bring volatilities into the
picture. Table 4 shows the outcome of this addition to our example.
In particular, it displays the risk of the different exposures of the
portfolio in isolation (that is, if the only active imbalances were
those from that particular set of risk factors).

Table 4 Isolated Risk per Category

Risk Factor Categories        Risk (bps/month)
Curve                               8.5
Volatility                          1.7
Spread government-related           3.0
Spread corporate                    5.1
Spread securitized                  3.0

For example, in Table 4 one can see that if the only active weight in
the portfolio were the mismatch in the yield curve exposures, the
risk of the portfolio would be 8.5 basis points per month. By adding
volatilities into the analysis, we can now quantify that the mismatch
of +0.25 years in duration "costs" the portfolio 8.5 basis points per
month of extra volatility, when taken in isolation.² Similarly, if
the only mismatch were the exposure to corporate spreads, the risk of
the portfolio would be 5.1 basis points. Interestingly, we also see
that both government-related and securitized sectors have nontrivial
risk, despite having smaller imbalances in terms of market weights.
By bringing volatilities into the analysis, we can now compare and
quantify the impact of each of the imbalances in the portfolio.

For future reference, consider the volatility of the portfolio if all
these sources of risk were independent (i.e., correlations were
zero). That number would be 10.9 basis points per month.³ Of course,
this scenario is unrealistic, as these sources of risk are not
independent. Also, this analysis does not allow us to understand the
interplay between the different imbalances. For instance, we know
that the isolated risk associated with the curve is 8.5. But this
value can be achieved by being either long or short duration. So the
isolated number does not allow us to understand the impact of the
curve imbalance on the total risk of the portfolio. The net impact
certainly depends on the sign of the imbalance. For instance, if the
long exposure in curve is diversified away by a long exposure in
credit (due, for instance, to negative correlation between rates and
credit spreads), a symmetric (short) curve exposure would add to the
risk of the long exposure in credit. The risk is clearly smaller in
the first case.

Table 5 Correlated Risk per Category

Risk Factor Categories        Risk (bps/month)
Total                               9.3
Curve                               5.9
Volatility                          0.1
Spread government-related           0.1
Spread corporate                    2.4
Spread securitized                  0.7

To alleviate these shortcomings, we bring correlations into the
picture. They allow us to understand the net impact of the different
exposures to the portfolio's total risk and to detect potential
sources of diversification among the imbalances in the portfolio.
Table 5 reports the contribution of each of the risk factor groups to
the total risk, once all correlations are taken into account. The
total risk (9.3 bps/month) is smaller than the zero-correlation risk
calculated before (10.9 bps/month) due to generally negative
correlations between the curve and the spread factors. The exhibit
also allows us to isolate the main sources of risk as being curve
(5.9 bps/month) and credit spreads (2.4 bps/month), in line with the
evidence from the earlier analysis. In particular, the risk of the
government-related and securitized spreads is significantly smaller
once correlations are taken into account.
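
The figures quoted from Tables 4 and 5 can be reproduced with a few lines of arithmetic. The Python sketch below is only a check on the published numbers; the dictionary and variable names are chosen for this example.

```python
import math

# Isolated risk per category (Table 4) and correlated risk per category
# (Table 5), both in basis points per month.
isolated = {
    "Curve": 8.5,
    "Volatility": 1.7,
    "Spread government-related": 3.0,
    "Spread corporate": 5.1,
    "Spread securitized": 3.0,
}
correlated = {
    "Curve": 5.9,
    "Volatility": 0.1,
    "Spread government-related": 0.1,
    "Spread corporate": 2.4,
    "Spread securitized": 0.7,
}

# With zero correlations, variances add, so risks combine as a root sum of squares.
zero_corr_risk = math.sqrt(sum(v**2 for v in isolated.values()))

# Correlated contributions are additive: they sum to the total systematic risk
# (up to the one-decimal rounding used in the table).
sum_of_contributions = sum(correlated.values())

print(f"zero-correlation risk: {zero_corr_risk:.1f} bps/month")
print(f"sum of correlated contributions: {sum_of_contributions:.1f} bps/month")
```

The root sum of squares reproduces the 10.9 bps/month zero-correlation figure, and the correlated contributions add up to roughly the 9.3 bps/month total, the small gap being rounding in the published table.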

The difference in analysis between the isolated and correlated risks
reported in Tables 4 and 5 deserves a bit more discussion. For
simplicity, assume there are only two sources of risk in the
portfolio: yield curve (Y) and spreads (S). The total systematic
variance of the portfolio (P) can be illustrated as follows:

\[
\begin{aligned}
\mathrm{VAR}(P) &= \mathrm{VAR}(Y + S) \\
                &= \mathrm{VAR}(Y) + \mathrm{VAR}(S) + 2\,\mathrm{COV}(Y, S) \\
                &= Y \times Y + S \times S + 2\,(Y \times S)
\end{aligned}
\]

where we use the product (×) to represent variances and covariances.
Another way to represent this summation is using the following
matrix:

\[
\begin{bmatrix}
Y \times Y & Y \times S \\
Y \times S & S \times S
\end{bmatrix}
\]

The sum of the four elements in the matrix is the variance of the
portfolio. The isolated risk (in standard deviation units) reported
in Table 4 is the square root of the diagonal terms. So the isolated
risk due to spreads is represented as

\[
\mathrm{Risk}_{S}^{\text{isolated}} = \sqrt{S \times S}
\]

It would be a function of the exposure to all spread factors, the
volatilities of all these factors, and the correlations among them.

The correlated risk reported in Table 5 is

\[
\mathrm{Risk}_{S}^{\text{correlated}} = \frac{Y \times S + S \times S}{\sqrt{\mathrm{VAR}(P)}}
\]

that is, we sum all elements in the row of interest (row 1 for Y, row
2 for S) from the matrix above, and normalize it by the standard
deviation of the portfolio. This statistic (1) takes into account
correlations and (2) ensures that the correlated risks of all factors
add up to the total risk of the portfolio
(\(\mathrm{Risk}_{Y}^{\text{correlated}} + \mathrm{Risk}_{S}^{\text{correlated}} = \sqrt{\mathrm{VAR}(P)} = \mathrm{STD}(P)\)).⁴
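
The two-source decomposition above can be sketched numerically. In the Python example below the volatilities of the curve and spread components and their correlation are hypothetical numbers chosen for illustration (they are not taken from Tables 4 or 5); the point is only to show that the isolated risks are the square roots of the diagonal terms, while the correlated risks are normalized row sums that add up to the portfolio standard deviation.

```python
import numpy as np

# Hypothetical inputs (bps/month): volatility of the curve (Y) and spread (S)
# components of the net return, and their correlation.
vol_y, vol_s, rho = 8.0, 5.0, -0.3

# 2x2 matrix [[YxY, YxS], [YxS, SxS]] of variances and covariances.
cov = np.array([
    [vol_y**2,            rho * vol_y * vol_s],
    [rho * vol_y * vol_s, vol_s**2],
])

total_var = cov.sum()          # VAR(P): sum of all four elements
total_std = np.sqrt(total_var) # STD(P)

# Isolated risk: square root of each diagonal term.
isolated = np.sqrt(np.diag(cov))

# Correlated risk: row sum for each source, normalized by STD(P).
correlated = cov.sum(axis=1) / total_std

print("isolated  :", isolated)
print("correlated:", correlated)
print("STD(P)    :", total_std)
```

With a negative correlation, each correlated risk is smaller than its isolated counterpart, which mirrors the pattern seen when moving from Table 4 to Table 5.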

The generic analysis we just performed constitutes the first step
into the description of the risk associated with a portfolio. The
analysis refers to categories of risk factors (such as "curve" or
"spreads"). However, a factor-based risk model allows for a
significantly deeper analysis of the imbalances the portfolio may
have. Each of the risk categories referred to above can be described
with a rich set of detailed risk factors. Typically in a fixed income
factor model, each asset class has a specific set of risk factors, in
addition to the potential set of factors common to all (e.g., curve
factors). These asset-specific risk factors are designed to capture
the particular sources of risk the asset class is exposed to. In the
following section, we go through a risk report built in such a way,
emphasizing risk factors that are common or particular to the
different asset classes. Along the way, we demonstrate how the report
offers insights from both a risk management and a portfolio
construction perspective.

A Detailed Risk Report 

In this section, we continue the analysis of the portfolio introduced
previously, a 50-bond portfolio benchmarked against the Barclays US
Aggregate Index. The report package we present was generated using
POINT®, Barclays' cross-asset portfolio analysis and construction
system, and gives a very detailed picture of the risk embedded in the
portfolio. The package is divided into four types of reports: summary
reports, factor exposure reports, issue/issuer level reports, and
scenario analysis reports. Some of the information we reviewed
earlier can be thought of as summary reports.

Summary Report 

Table 6 illustrates a typical risk summary statis¬ 
tics report. It shows that the portfolio has 50 po¬ 
sitions, but from only 27 issuers. This number 
implies limited ability to diversify idiosyncratic 
risk, as we will see below. The report confirms 
that the portfolio is long duration (OAD of 4.55 
years versus 4.30 years for the benchmark) and 
has higher yield (yield to worst of 3.71% versus 
2.83% for the benchmark) and coupon (4.73% 
versus 4.46% for the benchmark). 

The table also reports that the total volatil¬ 
ity of the portfolio (163.3 bps/month) is higher 
than that of the benchmark (158.1 bps/month). 
This is not surprising: longer duration, higher 
spread and less diversification all tend to in¬ 
crease the volatility of a portfolio. Because of 
its higher volatility, we refer to the portfolio 
as riskier than the benchmark. Looking into 
the different components of the portfolio's total 
volatility, the table reports that the idiosyncratic 
volatility of the portfolio is significantly smaller 


Table 6 Summary Statistics Report 



Portfolio 

Benchmark 


A. Parameter 

Positions 

50 

8,191 


Issuers 

27 

787 


Currencies 

1 

1 


Market value 

200 

14,762 


($ millions) 

Notional 

187 

13,750 


($ millions) 

B. Analytics 

Portfolio 

Benchmark 

Difference 

Coupon 

4.73 

4.46 

0.27 

Average life 

6.63 

6.35 

0.27 

Yield to worst 

3.71 

2.83 

0.88 

Spread 

157 

57 

100 

Duration 

4.55 

4.30 

0.25 

Vega 

-0.02 

-0.01 

-0.01 

Spread duration 

4.67 

4.56 

0.11 

Convexity 

-0.15 

-0.29 

0.13 

C. Volatility 

Portfolio 

Benchmark 

TEV 

Systematic 

162.9 

158.0 

9.3 

Idiosyncratic 

11.1 

5.6 

10.1 

Total 

163.3 

158.1 

13.7 

D. Portfolio Beta 



1.03 


than that of the systematic (11.1 bps/month 
versus 162.9 bps/month, respectively). This is 
also expected from a portfolio of investment- 
grade bonds. Given the fact that by construction 
the systematic and idiosyncratic components of 
risk are independent, we can calculate the total 
volatility of the portfolio as 

TEVptf = Vl62.9 2 + ll.l 2 = 163.3 

There are two interesting observations regarding this number:
first, the total volatility is smaller than the sum of the
volatilities of the two components. This is the diversification
benefit that comes from combining independent sources of risk.
Second, the total volatility is very close to the systematic one.
This may suggest that the idiosyncratic risk is irrelevant. That
is an erroneous and dangerous conclusion. In particular, when
managing against a benchmark, the focus should be on the net
exposures and risk, not on their absolute counterparts. In Table 6
the total TEV is reported as 13.7 bps/month. This means that the
model forecasts the portfolio return to be typically no more than
14 bps/month higher or lower than the return of the benchmark.
This number is in line with the risk budget of our manager. The
exhibit also reports idiosyncratic TEV of 10.1 bps/month, which is
greater than the systematic TEV (9.3). When measured against the
benchmark, our major source of risk is idiosyncratic, contrary to
the conclusion one could draw by looking only at the portfolio's
volatility. The TEV of our portfolio is also bigger than the
difference between the volatilities of the portfolio and benchmark.
Again, this is not surprising: The volatility depends on the
absolute exposures, while the TEV measures imbalances between these
absolute exposures from the portfolio and the benchmark. For the
TEV what matters most is the correlation between these absolute
exposures. Depending on this correlation, the TEV can be much
bigger than the difference in volatilities; the two coincide only
when the portfolio and benchmark returns are perfectly correlated.

Table 7 Factor Partition: Risk Analysis

Risk Factor Group               Isolated  Contribution    Liquidation  TEV Elasticity
                                     TEV        to TEV  Effect on TEV             (%)
Total                               13.7          13.7          -13.7             1.0
Systematic risk                      9.3           6.3           -3.6             0.5
  Curve                              8.5           4.0           -1.5             0.3
  Volatility                         1.7           0.1            0.0             0.0
  Government-related spreads         3.0           0.1            0.2             0.0
  Corporate spreads                  5.1           1.6           -0.7             0.1
  Securitized spreads                3.0           0.5           -0.2             0.1
Idiosyncratic risk                  10.1           7.4           -4.4             0.5
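
The Table 6 figures discussed above can be tied together with a few lines of arithmetic. The sketch below is only illustrative: it assumes that systematic and idiosyncratic components are independent (as stated in the text) and that the reported portfolio beta is the usual regression beta, that is, covariance with the benchmark divided by benchmark variance. The text does not spell out that definition, so treat it as an assumption; the function name `combine_independent` is ours.

```python
import math

# Volatility panel of Table 6, in bps/month.
ptf_sys, ptf_idio = 162.9, 11.1
bmk_sys, bmk_idio = 158.0, 5.6
tev_sys, tev_idio = 9.3, 10.1


def combine_independent(*vols):
    """Total volatility of independent components (root sum of squares)."""
    return math.sqrt(sum(v * v for v in vols))


ptf_vol = combine_independent(ptf_sys, ptf_idio)
bmk_vol = combine_independent(bmk_sys, bmk_idio)
tev = combine_independent(tev_sys, tev_idio)

# TEV^2 = ptf_vol^2 + bmk_vol^2 - 2 * rho * ptf_vol * bmk_vol, solved for rho.
implied_corr = (ptf_vol**2 + bmk_vol**2 - tev**2) / (2 * ptf_vol * bmk_vol)

# Assumed definition: beta = cov(ptf, bmk) / var(bmk) = rho * ptf_vol / bmk_vol.
implied_beta = implied_corr * ptf_vol / bmk_vol

print(f"portfolio vol {ptf_vol:.1f}, benchmark vol {bmk_vol:.1f}, TEV {tev:.1f}")
print(f"implied correlation {implied_corr:.4f}, implied beta {implied_beta:.2f}")
```

Under these assumptions the three totals of panel C are recovered, and the implied beta is close to the value reported in panel D. The correlation is very high, which is why a TEV of under 14 bps/month can coexist with volatilities above 150 bps/month.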

Finally, the report estimates the portfolio to have a beta of 1.03
to the benchmark. This statistic measures the co-movement between
the portfolio and the benchmark. We can read it as follows: The
model forecasts that a movement of 10 bps in the benchmark leads to
a movement of 10.3 bps in the portfolio in the same direction. Note
that a beta of less than one does not mean that the portfolio is
less risky than the benchmark. In the limit, if the portfolio and
benchmark are uncorrelated, the portfolio beta is zero, but
obviously that does not mean that the portfolio has zero risk. In
addition, one can compute many different "betas" for the portfolio
or subcomponents of it. 5 A simple and widely used one is the
"duration beta," given by the ratio of the portfolio duration to
that of the benchmark. In our case this ratio is 4.55/4.30 = 1.06.
This implies that the portfolio has a return from yield curve
movements around 1.06 times larger than that of the benchmark. This
beta is larger than the portfolio beta (1.03), meaning that net
exposures to other factors (e.g., spreads) "hedge" the portfolio's
curve risk.

This first summary report (Table 6) allows us to get a glimpse into
the risk of the portfolio. However, we want to know in more detail
what the source of this risk is. To do that, we turn to the next
two summary reports. In the first, risk is partitioned across
different groups of risk factors. In the second, the partition is
across groups of securities/asset classes.

Table 7 shows four different statistics associated with each set of
risk factors. The first two were somewhat explored in Tables 4 and
5. 6 The exhibit reports in the first column the isolated TEV, that
is, the risk associated with that particular set of risk factors
only. We see that in an isolated analysis, the systematic and
idiosyncratic risks are balanced, at 9.3 and 10.1 bps/month,
respectively. The report also shows the isolated risk associated
with the major components of systematic risk. As discussed before,
all components of systematic risk have nontrivial isolated risk,
but only curve and credit spreads are significant when we look into
the contributions to TEV. If we look across factors, the major
contributors are idiosyncratic risk, curve, and credit spreads.
Other systematic exposures are relatively small.

Table 8 Security Partition: Risk Analysis I

                                           Contribution to TEV
Security Partition Bucket  NMW (%)   Systematic  Idiosyncratic   Total
Total                          0.0          6.3            7.4    13.7
Treasuries                    -2.0          2.9            0.2     3.1
Government agencies           -5.4          0.5            0.4     0.9
Government nonagencies        -1.0         -1.4            0.1    -1.3
Corporates                    12.4          3.4            4.3     7.7
MBS                           -5.8          0.9            0.8     1.7
ABS                           -0.3          0.0            0.0     0.0
CMBS                           2.1          0.0            1.6     1.6

Another look into the correlation comes when we analyze the
liquidation effect reported in the table. This number represents
the change in TEV when we completely hedge that particular group of
risk factors. For instance, if we hedge the curve component of our
portfolio, our TEV drops by 1.5 bps/month, from 13.7 to 12.2. One
may think that the drop is rather small, given the magnitude of
isolated risk the curve represents. However, if we hedge the curve,
we also eliminate the beneficial effect that the negative
correlation between curve and spreads has on the overall risk of
the portfolio. Therefore, we have a more limited impact when
hedging the curve risk. In fact, for this portfolio we see that
hedging any particular set of risk factors has a limited effect on
the overall risk.

The TEV elasticity reported in the last column gives another
perspective into how the TEV in the portfolio changes when we
change the risk loadings. Specifically, it tells us what the
percentage change in TEV would be if we changed our exposure to
that particular set of factors by 1%. We can see that if we reduce
our exposure to corporate spreads by 1%, our TEV would decrease by
0.1%.
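
To make the four statistics concrete, the following sketch computes them for a linear factor model with net exposures `x` and factor covariance matrix `cov`. The formulas are the standard textbook ones and are an assumption on our part; the text does not state the exact definitions used by the risk model. The function names `tev` and `partition_report` are hypothetical. The example treats the systematic and idiosyncratic blocks of Table 7 as two independent "factors" with unit exposure.

```python
import numpy as np


def tev(x, cov):
    """Tracking error volatility of net exposures x under covariance cov."""
    return float(np.sqrt(x @ cov @ x))


def partition_report(x, cov, groups):
    """Isolated TEV, contribution, liquidation effect, elasticity per group."""
    total = tev(x, cov)
    marginal = cov @ x / total           # derivative of TEV w.r.t. each exposure
    report = {}
    for name, idx in groups.items():
        only = np.zeros_like(x)
        only[idx] = x[idx]               # keep this group only
        without = x.copy()
        without[idx] = 0.0               # hedge this group away entirely
        contribution = float(x[idx] @ marginal[idx])
        report[name] = {
            "isolated": tev(only, cov),
            "contribution": contribution,            # sums to total TEV
            "liquidation": tev(without, cov) - total,
            "elasticity": contribution / total,      # % TEV change per 1% exposure
        }
    return total, report


# Two independent blocks with the isolated TEVs of Table 7 (bps/month).
x = np.array([1.0, 1.0])
cov = np.diag([9.3**2, 10.1**2])
groups = {"systematic": [0], "idiosyncratic": [1]}

total, report = partition_report(x, cov, groups)
for name, stats in report.items():
    print(name, {k: round(v, 1) for k, v in stats.items()})
```

With these inputs the sketch recovers the "Systematic risk" and "Idiosyncratic risk" rows of Table 7 to one decimal place. Note that contributions add up to the total TEV while isolated TEVs do not, and that the elasticity is simply the contribution expressed as a fraction of total TEV.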

We perform a similar analysis in Table 8, but applied to a security
partition. That is, instead of looking at individual sources of
risk (e.g., curve) across all securities, we now aggregate all
sources of risk within a security and report analytics for
different groups of these securities (e.g., subportfolios). In
particular, Table 8 reports the results by asset class. We can see
that the majority of risk (7.7 bps/month) is coming from the
corporate component of the portfolio. 7 Corporates are also the
primary contributors to the portfolio's systematic and
idiosyncratic components of risk. This is not surprising, given the
portfolio's large net market weight (NMW) to this sector. There are
two other important sources of risk. The first is the Treasuries
subportfolio, with 3.1 bps/month of risk. This risk comes mainly
from the mismatch in duration. The second comes from the
idiosyncratic risk of the CMBS component of the portfolio. Even
though the NMW and systematic risk are not significant for this
asset class, the relatively small number of (risky) CMBS positions
in the portfolio causes it to have significant idiosyncratic risk
(three securities in the portfolio versus 1,735 in the index).
Since the portfolio manager is trying to replicate a very large
benchmark with only 50 positions, she has to be very confident in
the issuers selected. This report highlights the significant name
risk the portfolio is exposed to.

Table 9 completes the analysis, reporting other important risk
statistics about the different asset classes within the portfolio.
These statistics mimic the analysis done in terms of risk factor
partitions in Table 7, so we will not repeat their definitions. We
focus on the numbers. In particular, the isolated TEV from the
corporate sector is 15.2 bps/month, higher than the total risk of
the portfolio. This means that the exposures to the other asset
classes, on average, hedge our credit portfolio. The exhibit also
reports that the isolated risk of agencies is very large (9.1
bps/month). This is due to the large negative net exposure (-5.4%)
we have to this asset class. But the risk is fully hedged by the
other exposures of the portfolio (e.g., long exposure to credit or
long duration on Treasuries), so overall the risk contribution of
this asset class is small, as previously discussed. We can even
take the analysis a bit further: Table 9 shows us through the
liquidation effect that if we eliminate the imbalance the portfolio
has on agencies, we actually would increase the total risk of the
portfolio by 2.0 bps/month. In short, we would be eliminating the
hedge this asset class provides to the global portfolio, therefore
increasing its risk. The exposures to this asset class were clearly
built to counteract other exposures in the portfolio. Finally,
Table 9 also reports the TEV elasticity of the different components
of the portfolio. This number represents the percentage change in
TEV if the NMW to that subportfolio changes by 1%, so we need to
read the numbers with an opposite sign if the NMW is negative. In
particular, if we increase the weight of the agency portfolio in
absolute value (making it "more short") by 1%, we would actually
increase the TEV by 0.1%. This result shows that the position in
agencies provides hedging "on average," but marginally it is
already increasing the risk of the portfolio. In other words, the
hedging went beyond its optimal value.

Table 9 Security Partition: Risk Analysis II

Security Partition Bucket   Isolated    Liquidation  TEV Elasticity
                                 TEV  Effect on TEV             (%)
Total                           13.7          -13.7             1.0
Treasuries                       7.4           -1.1             0.2
Government agencies              9.1            2.0             0.1
Government nonagencies           6.7            2.7            -0.1
Corporates                      15.2            0.6             0.6
MBS                              5.8           -0.5             0.1
ABS                              1.1            0.1             0.0
CMBS                             5.1           -0.7             0.1

This set of summary reports gives us a very 
clear picture of the major sources of risk and 
how they relate to each other. In what follows, 
we focus on the more detailed analysis of the 
individual systematic sources of risk. 

Factor Exposure Reports 

At the heart of a multifactor risk model is the definition of the
set of systematic factors that drive risk across the portfolio. As
described above, there are different types of risk a fixed income
portfolio is exposed to. In what follows, we focus on the three
major types: curve, credit, and prepayment risk. For the latter
two, we use the credit and MBS components of the portfolio,
respectively, to illustrate how to measure risks along these
dimensions. Moreover, to keep the example simple, we show only a
partial view of all relevant factors for these sources of risk.
Later in this section we refer briefly to other sources of risk a
fixed income portfolio may be exposed to.

Curve Risk. As the previous analysis shows (e.g., Table 7), curve
is the major source of risk in our portfolio. This kind of risk is
embedded in virtually all fixed income securities (exceptions are,
for instance, floaters and distressed securities), so mismatches
are very penalizing.

When analyzing curve risk, we should use the curve of reference we
are interested in. Depending on the portfolio and circumstances,
this is typically the government or swap curve. 8 In calm periods,
the behavior of the swap curve tends to match that of the
government curve. However, during liquidity crises (e.g., the
Russian crisis in 1998 or the credit crisis in 2008), they can
diverge significantly. To capture these different behaviors
adequately, we analyze curve risk using the following
decomposition: For government products, the curve risk is assessed
using the government curve. For all other products in our portfolio
(that usually trade off the swap curve), this risk is measured
using both the Treasury curve and swap spreads (i.e., the spreads
between the swap and the government curve). Other decompositions
are also possible.

The risk associated with each of these curves can be described by
the exposure the portfolio has to different points along the curve
and how volatile and correlated the movements in these points of
the curve are. An additional convexity term is sometimes used to
capture the non-linear components of curve risk. For a typical
portfolio, a good description of the curve can be achieved by
looking at a relatively small number of points along the curve
(called key rates), for example, 6-month, 2-year, 5-year, 10-year,
20-year, and 30-year. An alternative set of factors used to capture
yield curve risk can be defined using statistical analysis of the
historical realizations of the various yield curve points. The
statistical method used most often is called principal component
analysis (PCA). This method defines factors that are statistically
uncorrelated with each other. Typically three or four such factors
are sufficient to explain the risk associated with changes of
yields across the yield curve. PCA has several shortcomings and
must be used with caution. Using a larger set of economic factors,
such as the key rate points described above, is more intuitive and
captures the risk of specialized portfolios better. In our
analysis, we follow the key rates approach.
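
For readers who want to see what the PCA alternative involves, here is a minimal sketch using NumPy. Everything about the data is hypothetical: the yield changes are simulated from an assumed "level plus slope plus noise" structure and are not taken from the portfolio in this entry. The function name `pca_factors` is ours.

```python
import numpy as np


def pca_factors(yield_changes):
    """PCA of an (observations x key rates) array of yield changes.

    Returns the share of variance explained by each component, the
    loadings (one column per component) and the component scores.
    """
    centered = yield_changes - yield_changes.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = centered @ eigvecs
    return explained, eigvecs, scores


# Simulated monthly changes (bps) at the six key rates; purely illustrative.
rng = np.random.default_rng(0)
tenors = np.array([0.5, 2.0, 5.0, 10.0, 20.0, 30.0])
n_obs = 500
level = rng.normal(0.0, 40.0, size=(n_obs, 1)) * np.ones(len(tenors))
slope = rng.normal(0.0, 10.0, size=(n_obs, 1)) * (tenors - tenors.mean()) / tenors.std()
noise = rng.normal(0.0, 2.0, size=(n_obs, len(tenors)))
changes = level + slope + noise

explained, loadings, scores = pca_factors(changes)
print("variance explained by first three components:", explained[:3].sum())
```

By construction the component scores are uncorrelated in sample, which is the property referred to above. The price is interpretability: the loadings are whatever the data dictate, whereas key rate factors map directly to points on the curve.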

Table 10 details the risk in our portfolio associated with the US
Treasury curve. It starts by describing all risk factors our
portfolio or benchmark load on. As discussed above, we identify the
six key rate (KR) points in the curve plus the convexity term as
the risk factors associated with US Treasury risk. They are
described in the first column of panel A in the exhibit. They
measure the risk associated with moves in that particular point in
the curve. Exposure to these risk factors is measured by the key
rate durations (KRD) for each of the six points. The description of
the loading is in the second column of the exhibit, while its value
for the portfolio, benchmark, and the difference is displayed in
the next columns. Key rate durations are also called partial
durations, as they add up to approximately the duration of the
portfolio. Their loadings are constructed by aggregating partial
durations across (virtually) all the securities. For instance, for
our portfolio, the sum of the key rate durations is 0.14 + 0.86 +
1.30 + 0.77 + 1.02 + 0.47 = 4.56, very close to the total duration
of our portfolio (4.55).

Looking at the table, we see significant mismatches in the duration
profiles between our portfolio and its benchmark, namely at the
10-year and 20-year points on the curve. Specifically, we are short
0.36 years at the 10-year point and long 0.49 years at the 20-year
point. How serious is this mismatch? Looking at the factor
volatility column, it can be seen that these points on the curve
have been very volatile at around 40 bps/month. If we interpret
this volatility as a typical move, the first two columns of panel B
show us the potential impact of such a movement in the return of
our portfolio, net of benchmark. For instance, a typical move up
(+44.2 bps/month) in the 10-year point of the Treasury curve, when
considered in isolation, will deliver a positive net return of 15.9
bps. 9 In isolation, the positive impact is expected because we are
short that point of the curve. More interesting may be the
correlated number on the exhibit. It states the return impact but
in a correlated fashion. In the scenario under analysis, a movement
in the 10-year point will almost certainly involve a movement of
the neighboring points in the curve. So, contrary to the positive
isolated effect documented above, the correlated impact of a change
up in the 10-year point is actually negative, at -5.0 bps. This
result is in line with the overall positive duration exposure the
portfolio has: General (correlated) movements up in the curve have
negative impact in the portfolio's performance. 10 Finally, and
broadly speaking, the (negative of the) ratio of the correlated
impact to the factor volatility gives us the model-implied partial
empirical duration of the portfolio. For instance, if we focus on
the 10-year point, we get -(-5.0/44.2) = 0.11. This smaller
empirical duration is typical in portfolios with spread exposure.
The spread exposure tends to empirically hedge some of the curve
exposure, given the negative correlation between these two sources
of risk. Finally, the exhibit shows the risk associated with
convexity. We can see that the benchmark is significantly more
negatively convex, so the portfolio is less responsive than the
benchmark to higher order changes in the yield curve.

Table 10 Treasury Curve Risk

A. Exposures and Factor Volatility

                                      Exposure
Factor Name       Units     Portfolio  Benchmark    Net  Factor Volatility
USD 6M key rate   KRD (Yr)       0.14       0.15  -0.01               36.0
USD 2Y key rate   KRD (Yr)       0.86       0.70   0.15               38.0
USD 5Y key rate   KRD (Yr)       1.30       1.25   0.05               44.3
USD 10Y key rate  KRD (Yr)       0.77       1.13  -0.36               44.2
USD 20Y key rate  KRD (Yr)       1.02       0.53   0.49               39.6
USD 30Y key rate  KRD (Yr)       0.47       0.53  -0.06               39.7
USD convexity     OAC           -0.15      -0.29   0.13                8.4

B. Other Risk Statistics

                    Return Impact of a        Marginal
                       Typical Move       Contribution  TEV Elasticity
Factor Name         Isolated  Correlated        to TEV             (%)
USD 6M key rate          0.5        -2.4           6.3             0.0
USD 2Y key rate         -5.8        -4.5          12.2             0.1
USD 5Y key rate         -2.0        -4.5          14.5             0.0
USD 10Y key rate        15.9        -5.0          15.9            -0.4
USD 20Y key rate       -19.5        -5.2          14.9             0.5
USD 30Y key rate         2.5        -5.2          14.8            -0.1
USD convexity            1.1         2.0           1.2             0.0
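
The arithmetic behind the isolated impact and the empirical duration can be checked directly from Table 10. The sketch below assumes the usual first-order relation (net return is approximately minus net duration times the yield move); the helper names `isolated_impact` and `empirical_duration` are ours.

```python
# Key rate durations (years) and factor volatilities (bps/month) from Table 10.
ptf_krd = {"6M": 0.14, "2Y": 0.86, "5Y": 1.30, "10Y": 0.77, "20Y": 1.02, "30Y": 0.47}
bmk_krd = {"6M": 0.15, "2Y": 0.70, "5Y": 1.25, "10Y": 1.13, "20Y": 0.53, "30Y": 0.53}
factor_vol = {"6M": 36.0, "2Y": 38.0, "5Y": 44.3, "10Y": 44.2, "20Y": 39.6, "30Y": 39.7}


def isolated_impact(net_duration, move_bps):
    """First-order net return (bps) when a single key rate moves by move_bps."""
    return -net_duration * move_bps


def empirical_duration(correlated_impact_bps, move_bps):
    """Model-implied partial duration: minus correlated impact over the move."""
    return -correlated_impact_bps / move_bps


total_duration = sum(ptf_krd.values())
net_krd = {k: ptf_krd[k] - bmk_krd[k] for k in ptf_krd}

impact_10y = isolated_impact(net_krd["10Y"], factor_vol["10Y"])
emp_dur_10y = empirical_duration(-5.0, factor_vol["10Y"])

print(f"sum of portfolio KRDs: {total_duration:.2f}")
print(f"isolated impact of a typical 10Y move: {impact_10y:.1f} bps")
print(f"empirical partial duration at 10Y: {emp_dur_10y:.2f}")
```

The 10-year figures match the table. For other key rates the same calculation agrees with panel B only approximately, presumably because the published exposures are rounded to two decimals.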

There are many other statistics of interest one can analyze
regarding the Treasury curve risk of the portfolio. Portfolio
managers frequently have questions such as: If I want to reduce the
risk of my portfolio by manipulating my Treasury curve exposure,
what should I change? What is the most effective move? By how much
would my risk actually change? The statistics reported in the
columns "Marginal Contribution to TEV" and "TEV Elasticity (%)" of
panel B are typically used to answer these questions. Regarding the
marginal contributions, the 10-year point has the largest value,
indicating that an increase (reduction) of one unit of exposure (in
this case one year of duration) to the 10-year point leads to an
increase (reduction) of around 16 bps in the TEV. 11 In other
words, if we want to reduce risk by manipulating our exposure to
the yield curve, the 10-year point seems to present the fastest
track. In addition, the exhibit shows that all Treasury risk
factors are associated with positive marginal contributions. This
means that an increase in the exposure to any of these factors
increases the risk (TEV) of the portfolio. This conclusion holds
even for factors for which we have negative exposure (e.g., the
10-year key rate). The reason behind this result is our overall
long duration exposure. If we add exposure to it, regardless of the
specific point where we add it, we extend our duration even
further, increasing the mismatch our portfolio has in terms of
duration, and so increasing its risk. 12 This result holds because
we take into consideration the correlations between the different
points in the Treasury curve. Without correlations, the analysis
would be significantly less clear. The exhibit also reports the TEV
elasticity of each of the risk factors, a concept introduced
earlier. The interpretation is similar to the marginal
contribution, but with normalized changes (percentage changes).
This normalization makes the numbers more comparable across risk
factors of very different nature. It is also useful when
considering leveraging the entire portfolio proportionally. In our
case, if we increase the exposure to the 10-year key rate point by
10%, from -0.36 to something around -0.40 (effectively reducing our
long duration exposure), our TEV would be reduced by 4% (from 13.7
to 13.2 bps/month).

Table 11 Swap Spread (SS) Risk

                                Exposure (SS-KRD)             Factor  Return Impact      Marginal
Factor Name                 Portfolio  Benchmark    Net   Volatility     Correlated  Contribution
                                                                                           to TEV
6M SS                            0.14       0.13   0.01         39.1           -2.1           5.8
2Y SS                            0.52       0.47   0.04         20.4           -2.1           3.0
5Y SS                            0.84       0.75   0.09          9.6           -2.0           1.4
10Y SS                           0.71       0.68   0.03         14.1            1.7          -1.8
20Y SS                           0.34       0.33   0.01         17.0            2.2          -2.7
30Y SS                           0.06       0.20  -0.15         20.1            2.4          -3.5
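
The link between the two columns of Table 10, panel B, can be illustrated with a short calculation. We assume here that the elasticity equals the marginal contribution times the net exposure, divided by total TEV; this relation is our reading of the numbers rather than a definition given in the text, and the function name `tev_elasticity` is hypothetical. The projection of the new TEV is a first-order approximation.

```python
def tev_elasticity(marginal_contribution, net_exposure, tev):
    """Percentage change in TEV per 1% change in exposure (assumed relation)."""
    return marginal_contribution * net_exposure / tev


tev = 13.7  # total TEV, bps/month

# Marginal contributions and net exposures from Table 10.
elasticity_10y = tev_elasticity(15.9, -0.36, tev)
elasticity_20y = tev_elasticity(14.9, 0.49, tev)

# Scale the 10-year exposure by 10%, using the elasticity reported in the table.
reported_elasticity_10y = -0.4          # % change in TEV per 1% change in exposure
change_pct = 10.0
new_exposure = -0.36 * (1 + change_pct / 100)
new_tev = tev * (1 + reported_elasticity_10y * change_pct / 100)

print(f"10Y elasticity {elasticity_10y:.2f}, 20Y elasticity {elasticity_20y:.2f}")
print(f"new 10Y exposure {new_exposure:.2f}, projected TEV {new_tev:.1f}")
```

Both computed elasticities round to the values in the table, and the projected TEV after the 10% change matches the 13.2 bps/month quoted above. For large changes in exposure the linear projection becomes unreliable and the TEV should be recomputed from the full covariance matrix.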

We now turn the analysis to the other component of the curve risk
described above: the risk embedded into the portfolio exposure to
the swap spread, that is, the spread between the swap and the
Treasury curves. All securities that trade against the swap curve
(e.g., all typical credit and securitized bonds) are exposed to
this risk. Its analysis follows very closely that of the Treasury
curve, so we only highlight the major risk characteristics of the
portfolio along this dimension. Table 11 shows that in general our
exposure to the swap spreads is smaller than the exposure to the
Treasury curve. Remember that Treasuries do not load on this set of
risk factors, so the market-weighted exposures are consequently
smaller. Looking at the profile of factor volatilities, one can see
that the term structure of volatilities is U-shaped, with the short
end extremely volatile and the five-year point having the lowest
volatility. When comparing with the Treasury curve volatility
profile (see Table 10), we can see significant differences, the
aftermath of a strong liquidity crisis. Regarding net exposures,
the exhibit shows that our largest mismatch is at the 30-year
point, where we are short by 0.15 years. Interestingly, this is not
the most expensive mismatch in terms of risk: When looking at the
last column, we see that we would be able to change risk the most
by manipulating the short end of our exposure to the swap spread
curve, namely the six-month point.

The previous tables allow us to understand 
our exposures to the different types of curve 
risk and their impact both on the return and risk 
of our portfolios. They also guide us regarding 
what changes we can introduce to modify the 
risk profile of the portfolio. We now turn our 
attention to sources of risk that are more specific 
to particular asset classes. In particular, we start 
with the analysis of credit risk. 

Credit Risk. Instruments in the portfolio issued by corporations or
entities that may default are said to have credit risk. The holders
of these securities demand some extra yield, on top of the
risk-free yield, to compensate for that risk. The extra yield is
usually measured as a spread to a reference curve. For instance,
for corporate bonds the reference curve is usually the swap curve.
The level of credit spreads determines to a large extent the credit
risk exposure associated with the portfolio. 13

There are several characteristics of credit bonds that are
naturally associated with systematic sources of credit spread risk.
For instance, depending on the business cycle, particular
industries may be going through especially tough times. So industry
membership is a natural systematic source of risk. Similarly, bonds
with different credit ratings are usually treated as having
different levels of credit risk. Credit rating could be another
dimension we can use to measure systematic exposure to credit risk.
Given these observations, it is common to see factor models for
credit risk using industry and rating as the major systematic risk
factors. Recent research suggests that risk models that directly
use the spreads of the bonds instead of their ratings to assess
risk perform better for relatively short/medium horizons of
analysis. 14 Under this approach, the loading of a particular bond
to a credit risk factor would be the commonly used spread duration
multiplied by the bond's spread (the loading is termed DTS =
Duration Times Spread = OASD × OAS). By directly using the spread
of the bond in the definition of the loading to the credit risk
factors, we do not need to assign specific risk factors to capture
the rating or any similar quality-like effect. It will be
automatically captured by the bond's loading to the credit risk
factor and will adjust as the spread of the bond changes. We use
different systematic risk factors only to distinguish among credit
risk coming from different industries. 15
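
A small sketch may help fix the DTS idea. The bonds, weights, and spreads below are entirely hypothetical, and the aggregation (market-value-weighted DTS summed by industry, portfolio minus benchmark) is a plausible convention that we assume for illustration; the text does not specify the units or weighting behind the exposures in its tables. The names `Bond`, `dts_exposure`, and `net_dts_exposure` are ours.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bond:
    industry: str
    weight: float   # market-value weight in its portfolio (fraction)
    oasd: float     # option-adjusted spread duration, in years
    oas: float      # option-adjusted spread, in percent

    @property
    def dts(self):
        """Duration Times Spread = OASD x OAS."""
        return self.oasd * self.oas


def dts_exposure(bonds):
    """Market-weighted DTS loading on each industry factor."""
    exposure = defaultdict(float)
    for bond in bonds:
        exposure[bond.industry] += bond.weight * bond.dts
    return dict(exposure)


def net_dts_exposure(portfolio, benchmark):
    """Portfolio minus benchmark DTS exposure, industry by industry."""
    ptf, bmk = dts_exposure(portfolio), dts_exposure(benchmark)
    return {ind: ptf.get(ind, 0.0) - bmk.get(ind, 0.0) for ind in sorted(set(ptf) | set(bmk))}


# Hypothetical holdings.
portfolio = [Bond("Banking", 0.10, 5.0, 2.0), Bond("Banking", 0.05, 4.0, 1.0)]
benchmark = [Bond("Banking", 0.08, 5.0, 1.5), Bond("Energy", 0.02, 6.0, 1.0)]

print(net_dts_exposure(portfolio, benchmark))
```

Because the spread enters the loading directly, a bond whose spread doubles automatically doubles its exposure to its industry factor, which is how the approach captures quality effects without a separate rating factor.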

The results of such an approach to the analysis of our portfolio
are displayed in Table 12, which shows the typical industry risk
factors associated with credit risk. The portfolio has net
positions in 27 industries, spanning all three major sectors:
Industrials (IND), Utilities (UTI) and Financials (FIN). We saw
before that we have a significant net exposure to financials in
terms of market weights (12.2%, see Table 1). In terms of risk
exposure, Table 12 shows that the net DTS exposure to the Banking
industry is 0.32, clearly the highest across all sectors. 16
However, the marginal contribution to TEV that comes from that
industry, although high, is comparable to other industries, namely
Brokerage, for which the net exposure is close to zero. This means
that these two industries are close substitutes in terms of the
current portfolio holdings. Actually, what is very interesting is
the fact that the marginal contribution is negative for these
industries, even though we are significantly overweighting them.
The analysis suggests that if we increase our risk exposure to
Banking, our risk would actually decrease. This result is again
driven by the strong negative correlation between spreads in
financials and the yield curve. Therefore, the exposure in banking
is actually helping hedge out our (more risky) long duration
position. This kind of analysis is only possible when you account
for the correlations across factors. It is of course also dependent
on the quality of the correlation estimations the model has.

Table 12 Credit Spread Risk

                                   Exposure (DTS)             Factor  Return Impact      Marginal
Factor Name                 Portfolio  Benchmark    Net   Volatility     Correlated  Contribution
                                                                                           to TEV
IND Chemicals                    0.00       0.03  -0.03        15.01          -0.39          0.43
IND Metals                       0.00       0.06  -0.06        20.01          -0.16          0.23
IND Paper                        0.00       0.01  -0.01        17.04          -0.40          0.49
IND Capital Goods                0.00       0.05  -0.05        14.98          -0.02          0.02
IND Div. Manufacturing           0.00       0.03  -0.03        14.21          -0.62          0.64
IND Auto                         0.00       0.01  -0.01        22.18          -0.53          0.85
IND Consumer Cyclical            0.10       0.05   0.06        17.05          -0.26          0.32
IND Retail                       0.00       0.05  -0.05        16.95           0.14         -0.17
IND Cons. Non-cyclical           0.00       0.13  -0.13        14.62          -0.22          0.24
IND Health Care                  0.00       0.02  -0.02        14.07           0.13         -0.13
IND Pharmaceuticals              0.19       0.06   0.12        15.13          -0.34          0.37
IND Energy                       0.12       0.20  -0.07        16.39          -0.29          0.34
IND Technology                   0.00       0.06  -0.06        15.52          -0.11          0.12
IND Transportation               0.00       0.05  -0.05        15.09          -0.26          0.29
IND Media Cable                  0.24       0.06   0.18        15.83           0.51         -0.58
IND Media Non-cable              0.00       0.04  -0.04        15.94           0.20         -0.23
IND Wirelines                    0.09       0.17  -0.08        15.26           0.41         -0.45
IND Wireless                     0.00       0.03  -0.03        14.87           1.06         -1.13
UTI Electric                     0.28       0.20   0.08        15.79          -0.16          0.18
UTI Gas                          0.09       0.10  -0.01        18.51          -0.41          0.55
FIN Banking                      0.88       0.56   0.32        18.61           1.19         -1.59
FIN Brokerage                    0.00       0.02  -0.02        15.90           1.47         -1.68
FIN Finance Companies            0.08       0.10  -0.02        20.64           0.68         -1.01
FIN Life & Health Insurance      0.12       0.11   0.01        19.96           0.58         -0.84
FIN P&C Insurance                0.00       0.06  -0.06        11.76           0.34         -0.29
FIN Reits                        0.14       0.04   0.10        17.68           0.80         -1.02
Non Corporate                    0.06       0.23  -0.17        25.27           0.28         -0.50

Table 13 Risk per Rating

                                         TEV                            Systematic
Rating   NMW (%)  Contribution  Isolated  Liquidation  Elasticity (%)        Beta
Total        0.0          13.7      13.7        -13.7             1.0        1.03
AAA         -7.2          10.9      37.4         22.2             0.8        1.12
AA1         -0.3          -0.2       1.0          0.2             0.0        0.00
AA2          0.2           0.3       3.3          0.1             0.0        1.10
AA3         -2.3          -1.3       6.7          2.6            -0.1        0.00
A1          -0.5           0.3       4.2          0.4             0.0        1.51
A2           7.1           3.6      11.2          1.0             0.3        0.77
A3           4.7           1.7       5.8         -0.5             0.1        0.65
BAA1        -0.1           0.3       3.7          0.2             0.0        1.51
BAA2        -3.3          -2.3      11.5          5.9            -0.2        0.00
BAA3         1.7           0.3       7.7          1.7             0.0        0.37

Although the risk factors used to measure risk are predetermined in
a linear factor model, there is extreme flexibility in the way the
risk numbers can be aggregated and reported. 17 For example, as
explained above, the risk model we use to generate the current risk
reports does not use credit ratings as drivers of systematic credit
risk. Instead, it relies on the DTS concept. However, once
generated, the risk numbers can be reported using any portfolio
partition. As an example, Table 13 shows the risk breakdown by
rating. As reported in this table, the majority of risk is coming
from our AAA exposure (10.9 bps/month), the bucket with the biggest
mismatch in terms of net weight (-7.2%). This bucket includes
Treasury and government-related securities, sectors that are
underweighted in the portfolio, leading to significant risk. This
is even clearer when we look into the isolated TEV numbers. If we
had mismatches only on AAAs, the risk of our portfolio would be
37.4 bps/month, instead of the actual 13.7: our other exposures
(namely the one to single As) hedge the risk from AAAs. This table
also reports the systematic betas associated with each of the
rating subportfolios. These betas add up to the portfolio beta,
when we use the portfolio weights (not NMW) as weights in the
summation. Systematic betas of zero identify buckets for which the
portfolio has (close to) no holdings. The table shows that a
movement of 10 basis points in the benchmark leads to an 11.2 basis
point return in the AAA subcomponent of the portfolio. The beta of
0.37 for the BAA3 component of the portfolio does not signal low
volatility for this subportfolio. It indicates mainly low
correlation with the benchmark. This is probably due to a larger
component of idiosyncratic risk for this set of bonds.

Table 14 MBS (spread) Prepayment Risk

                                   Exposure (OASD)            Factor  Return Impact      Marginal
Factor Name                 Portfolio  Benchmark    Net   Volatility     Correlated  Contribution
                                                                                           to TEV
MBS New Discount                 0.00       0.00   0.00         36.8           -1.2           3.3
MBS New Current                  0.00       0.04  -0.04         24.5           -0.3           0.6
MBS New Premium                  0.38       0.59  -0.21         29.7           -0.1           0.3
MBS Seasoned Current             0.00       0.00   0.00         25.5           -0.6           1.2
MBS Seasoned Premium             0.65       0.46   0.19         29.8            0.1          -0.2
MBS Ginnie Mae 30Y               0.31       0.21   0.10          6.1           -0.1           0.0
MBS Fannie Mae 15Y               0.00       0.11  -0.11         15.7            0.4          -0.4
MBS Ginnie Mae 15Y               0.00       0.01  -0.01         12.3            0.5          -0.4

Prepayment Risk. Securitized products are generally exposed to
prepayment risk. The most common of the securitized products are
the residential MBS (RMBS or simply MBS). These securities
represent pools of deals that allow the borrower to prepay their
debt before the maturity of the loan/deal, typically when
prevailing lending rates are lower. This option means an extra risk
to the holder of the security: the risk of holding cash exactly
when reinvestment rates are low. Therefore, these securities have
two major sources of risk: interest rates (including convexity) and
prepayment risk.

Some part of the prepayment risk can be expressed as a function of
interest rates via a prepayment model. This risk will be captured
as part of interest-rate risk using the key rate durations and the
convexity. These securities usually have negative convexity because
prepayments usually increase (decrease) with decreasing
(increasing) interest rates, thereby reducing price appreciation
(increasing price depreciation). The remaining part of prepayment
risk, the part not captured by the prepayment model, must be
modeled with additional systematic risk factors. Typically, the
volatility of prepayment speeds (and therefore of risk) on MBS
securities depends on three characteristics: the program/term of
the deal, whether the bond is priced at a discount or a premium
(e.g., whether the coupon on the bond is bigger than current
mortgage rates), and how seasoned the bond is. This analysis
suggests that the systematic risk factors in a risk model should
span these three characteristics of the securities.

Table 14 shows a potential set of risk factors that capture the three
characteristics discussed above. Programs identified as having
different prepayment characteristics are the conventional (Fannie Mae)
30-year bonds (the base case used for the analysis), the 15-year
conventional (Fannie Mae) bonds, and the Ginnie Mae 30- and 15-year
bonds. The age of bonds is captured by factors distinguishing between
new and seasoned (aged) deals. Finally, each bond is also classified by
the price of the security: discount, current, or premium. In this
example there are no seasoned discount bonds, given the unprecedentedly
low level of mortgage rates as of June 2010. In terms of risk
exposures, the exhibit shows that we are currently underweighting
15-year conventional bonds and overweighting 30-year Ginnie Mae bonds.
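
To make the three-way classification concrete, the sketch below maps a pool to Table 14 style factor names. It is only an illustration: the seasoning cutoff, the width of the "current" coupon band, the `MbsPool` type, and the rule that a pool loads on one age/price factor plus at most one program factor are all assumptions made here, not details given in the text.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, for illustration only (not taken from the text).
SEASONED_AFTER_MONTHS = 30   # pools older than this are treated as "Seasoned"
CURRENT_BAND_PCT = 0.25      # coupon within this band of the mortgage rate is "Current"


@dataclass
class MbsPool:
    agency: str        # "FNMA" (conventional) or "GNMA"
    term_years: int    # 30 or 15
    coupon_pct: float  # pool coupon, in percent
    age_months: int    # seasoning of the pool


def price_bucket(coupon_pct: float, mortgage_rate_pct: float) -> str:
    """Classify a pool as Discount, Current, or Premium from its coupon."""
    if coupon_pct > mortgage_rate_pct + CURRENT_BAND_PCT:
        return "Premium"
    if coupon_pct < mortgage_rate_pct - CURRENT_BAND_PCT:
        return "Discount"
    return "Current"


def prepayment_factors(pool: MbsPool, mortgage_rate_pct: float) -> list:
    """Names of the Table 14 style factors a pool would load on (assumed scheme)."""
    age = "Seasoned" if pool.age_months > SEASONED_AFTER_MONTHS else "New"
    factors = [f"MBS {age} {price_bucket(pool.coupon_pct, mortgage_rate_pct)}"]
    # Conventional 30-year pools are the base case and get no program factor.
    program = {
        ("GNMA", 30): "MBS Ginnie Mae 30Y",
        ("FNMA", 15): "MBS Fannie Mae 15Y",
        ("GNMA", 15): "MBS Ginnie Mae 15Y",
    }.get((pool.agency, pool.term_years))
    if program is not None:
        factors.append(program)
    return factors
```

In a risk model the loading on each selected factor would be the pool's OASD, as in the exposure columns of Table 14; the sketch only selects the factors.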

Interaction between Sources of Risk

So far we have analyzed the major sources of spread risk: credit and
prepayment. To do this, we conveniently used two asset classes (credit
and agency RMBS, respectively) where one can argue that these sources
of risk appear relatively isolated. However, recent developments have
made very clear that these sources of risk appear simultaneously in
other major asset classes, including non-agency RMBS, home equity
loans, and CMBS.18 When designing a risk model for a particular asset
class, one should be able to anticipate the nature of the risks the
asset class exhibits currently or may encounter in the future. The
design of the model and its ability to separate these two kinds of risk
also depend on the richness of the bond indicatives and analytics
available to the researcher. On this last point, it is imperative that
the researcher understand well the pricing model and the assumptions
made to generate the analytics typically used as inputs in a risk
model. This allows the user to fully understand the output of the
model, as well as its applicability and shortcomings.

Other Sources of Risk

There are other sources of systematic risk that we did not detail in
this section. They may be important sources of risk for particular
portfolios. Specific risk models can be designed to address them. We
now mention some of them briefly.

Implied Volatility Risk

Many fixed income securities have embedded options (e.g., callable
bonds). This means that the expected future volatility (implied
volatility19) of the interest rate or other discount curves used to
price the security plays a role in the value of that option. If
expected volatility increases, options generally become more expensive,
affecting the prices of bonds with embedded options. For example,
callable bonds become cheaper as implied volatility increases, because
the bondholder is short optionality (the right of the issuer to call
the bond). Therefore, the exposure of the portfolio to the implied
volatility of the yield curve is also a source of risk that should be
accounted for. The sensitivity of securities to changes in implied
volatility is measured by vega, which is calculated using the security
pricing model. Implied volatility factors can be either calculated from
the market prices of liquid fixed income options (caps, floors, and
swaptions) or implied by the returns of bonds with embedded options
within each asset class.


Liquidity Risk

Many fixed income securities are traded over-the-counter, in
decentralized markets. Some trade infrequently, making them illiquid,
so it is hard to establish their fair price. These bonds are said to be
exposed to liquidity risk. The holder of illiquid bonds would have to
pay a higher price to liquidate its position, which usually means
selling at a discount. This discount is uncertain and varies across the
business cycle. For instance, the discount can be significant in a
liquidity crisis, such as the one we experienced in 2008. The
uncertainty about this discount means that, all else equal, a more
illiquid bond will be riskier. This extra risk can be captured through
liquidity risk factors. For instance, in the Treasury markets, one
generally refers to the difference in volatility between an on-the-run
and an off-the-run Treasury bond as liquidity risk.

Inflation Risk

Inflation-linked securities are priced based on the expectation of
future inflation. Uncertainty about this variable adds to the
volatility of the bond over and above the volatility from other sources
of risk, such as the nominal interest rates. Expected inflation is not
an observed variable in the marketplace but can be extracted from the
prices of inflation-linked government bonds and inflation swaps.
Expected inflation risk factors can be constructed by summarizing this
information. The sensitivity of securities to expected inflation is
calculated using a specialized pricing model and is usually called
inflation duration.

Tax-Policy Risk

Many municipal securities are currently tax-exempt. This results in an
added benefit to their holders. This benefit, which is incorporated in
the price of the security, depends on the level of exemption allowed.
Uncertainty around tax policy (tax-policy risk) adds to the risk of
these securities. Once again, tax-policy risk factors cannot be
observed in the marketplace and must be extracted from the prices of
municipal securities. The return of municipal securities in excess of
interest rates is driven partially by changes in tax-policy
expectations. However, it is also driven by changes in the
creditworthiness of the municipal issuers, as well as other factors. In
this case it is difficult to separate tax-policy risk factors from
other factors driving municipal bond spreads. Therefore, instead of
specific tax-policy factors we usually extract factors representing the
overall spread risk of municipal securities. This exercise is performed
in a similar way to the credit risk model, where securities are
partitioned into groups of "similar" risk by geography, bond type
(general obligation versus revenue), tax status, and the like.20

Issue-Level Reports 

The previous analysis focused on the systematic sources of risk. We now
turn our attention to the idiosyncratic or security-specific risk
embedded in our portfolio. This risk measures the volatility the
portfolio has due to news or demand-supply imbalances specific to the
individual issues/issuers it holds. Therefore, the idiosyncratic risk
is independent across issuers and diversifies away as the number of
issues in the portfolio increases: Negative news about some issuers is
canceled by positive news about others. For relatively small
portfolios, the idiosyncratic risk may be a substantial component of
the total risk. This can be seen in our example, as our portfolio has
only 27 issuers. Table 6 shows that the idiosyncratic volatility of our
portfolio is 11.1 bps/month, about twice the idiosyncratic volatility
of the benchmark (5.6 bps/month). When looking at the tracking error
volatility net of benchmark, Table 6 shows that our specific risk is
10.1 bps/month, larger than the systematic component (9.3 bps/month).
This means that, typically, a major component of the monthly net return
is driven by events affecting only individual issues or issuers.
Therefore, monitoring these individual exposures is of paramount
importance.
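
The two ideas in this paragraph, combining systematic and idiosyncratic components and diversifying across independent issuers, can be sketched numerically. The sketch assumes the systematic and idiosyncratic components are uncorrelated, so that variances add; the function names and the equal-weight example are illustrative only.

```python
import math


def combine_independent(*vols):
    """Volatility of a sum of independent components: root of the sum of squares."""
    return math.sqrt(sum(v * v for v in vols))


def equal_weight_idio_vol(single_name_vol, n_issuers):
    """Idiosyncratic volatility of an equally weighted portfolio of n independent
    issuers that all share the same stand-alone idiosyncratic volatility."""
    return single_name_vol / math.sqrt(n_issuers)


# Systematic (9.3) and idiosyncratic (10.1) TEV quoted in the text, in bps/month.
total_tev = combine_independent(9.3, 10.1)

# Hypothetical single-name volatility of 100 bps/month: going from 25 to 100
# issuers halves the idiosyncratic volatility of the equal-weight portfolio.
vol_25 = equal_weight_idio_vol(100.0, 25)
vol_100 = equal_weight_idio_vol(100.0, 100)
```

Under the independence assumption the two components combine to a total of roughly 13.7 bps/month, which shows why neither component can be ignored in this portfolio.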

The idiosyncratic risk of each bond is a function of two variables: its
net market weight and its idiosyncratic volatility. This last parameter
depends on the nature of the bond issuer. For instance, a bond from a
distressed firm has much higher idiosyncratic volatility than one from
a government-related agency.
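
A minimal sketch of this relationship follows, assuming the stand-alone contribution is simply the absolute net market weight times the bond's idiosyncratic volatility. The product form and the volatility inputs are assumptions for illustration; the text only states that the risk depends on these two variables.

```python
def issue_idio_tev(net_weight_pct, idio_vol_bps):
    """Stand-alone idiosyncratic TEV of one bond, in bps/month (assumed product form)."""
    return abs(net_weight_pct) / 100.0 * idio_vol_bps


# Hypothetical volatilities: the same 1% net weight is far riskier in a distressed name.
distressed_bond = issue_idio_tev(1.0, 300.0)
agency_bond = issue_idio_tev(1.0, 20.0)
```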

Table 15 provides a summary of the idiosyncratic risk for the top 10
positions by market weight in our portfolio. Not surprisingly, our top
seven holdings are Treasuries and MBS securities, in line with the
constitution of the index we are using as benchmark. Moreover, these
positions have significant market weights, given that our portfolio
contains only 50 positions. Even though we see large concentrations,
the idiosyncratic TEV for the top holdings is small, as they are not
exposed to significant name risk. The last column of the table shows
that, within this group, the largest idiosyncratic risk comes from two
corporate bonds (issued by Comcast Cable Communication, "CMCSA," and
Merrill Lynch, "BAC"). This is not surprising, as these are the type of
securities with larger event risk. Even within corporates,
idiosyncratic risk can be quite diverse. In particular, it usually
depends on the industry, duration, and level of distress of the issuer
(usually proxied by rating, but in our model by the spread of the
bond). For instance, the net position for both the CMCSA and BAC bonds
is similar (2.2% and 2.1%, respectively), but even though the maturity
of the BAC bond is significantly shorter, its spread is higher,
delivering a higher idiosyncratic risk (2.9 versus 2.4 bps/month). The
fact that BAC is a firm from an industry (Financials) that experienced
significant volatility in the recent past also contributes to higher
idiosyncratic volatility. To manage the idiosyncratic risk in the
portfolio, one should pay particular attention to mismatches between
the portfolio and benchmark for bonds with large spreads or long
durations. These tend to affect the idiosyncratic risk of the portfolio
disproportionately.

Table 15  Issue Specific Risk

                                                                 Spread  Market Weight (%)  Idiosyncratic
Identifier  Ticker  Description                      Maturity    (bps)   Portfolio  Net     TEV
912828KF    US/T    US Treasury Notes                2/28/2014   4       5.4        5.2     0.4
912828KJ    US/T    US Treasury Notes                3/31/2014   3       5.0        4.8     0.4
912828JW    US/T    US Treasury Notes                12/31/2013  1       4.7        4.5     0.4
912828KN    US/T    US Treasury Notes                4/30/2014   2       3.8        3.6     0.3
FNA04409    FNMA    FNMA Conventional Long T. 30yr   3/1/2039    20      3.2        1.1     0.4
FGB04409    FHLMC   FHLM Gold Guar Single F. 30yr    3/1/2039    25      2.7        1.1     0.4
912810FT    US/T    US Treasury Bonds                2/15/2036   -1      2.3        2.1     0.7
20029PAG    CMCSA   Comcast Cable Communication      5/1/2017    222     2.2        2.2     2.4
59018YSU    BAC     Merrill Lynch & Co.              2/3/2014    300     2.1        2.1     2.9
912828KV    US/T    US Treasury Notes                5/31/2014   1       2.1        1.9     0.2

Although important, the information in Table 15 is not enough to fully
assess the idiosyncratic risk embedded in the portfolio. For instance,
one could buy credit protection on BAC through a credit default swap
(CDS). In this case, our exposure to this issuer may not be
significant, even though, taken separately, the position reported in
this exhibit is relevant. More generally, idiosyncratic risk is
independent across issuers, but what happens within a particular
issuer? A good risk model should have the ability to account for the
fact that the idiosyncratic risks of two securities from the same
issuer are correlated, as they are both subject to the same
company-specific events. This is especially the case for corporates and
emerging market securities. Moreover, it is important to note that the
correlation between issues from the same issuer is not constant either.
For an issuer in financial distress, all claims to its assets (bonds,
equities, convertibles, etc.) tend to move together, in the absence of
specific circumstances. This means that the idiosyncratic correlation
between issues from that issuer should be high. Therefore, adding more
issues from that issuer to the portfolio does not deliver additional
diversification. At the other extreme, securities from firms that enjoy
very strong financial health can move quite differently, driven by
liquidity or other factors. In this case, one can have some
diversification of idiosyncratic risk (although limited) even when
adding issues from that same issuer to the portfolio.
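
The effect of within-issuer correlation can be sketched as follows. This is a simplified illustration, not the model's actual aggregation rule: it assumes a single common pairwise correlation `rho` between all issues of one issuer and takes each issue's stand-alone idiosyncratic TEV as a signed number (positive for net long, negative for a position that offsets the name, such as bought CDS protection).

```python
import math


def issuer_idio_tev(signed_issue_tevs, rho):
    """Issuer-level idiosyncratic TEV from signed stand-alone issue TEVs,
    assuming one common pairwise correlation rho between issues of the issuer."""
    variance = 0.0
    for i, a in enumerate(signed_issue_tevs):
        for j, b in enumerate(signed_issue_tevs):
            variance += a * b * (1.0 if i == j else rho)
    return math.sqrt(max(variance, 0.0))


# Two bonds of one issuer with 2.0 bps/month each (hypothetical figures).
distressed = issuer_idio_tev([2.0, 2.0], 1.0)   # no diversification
healthy = issuer_idio_tev([2.0, 2.0], 0.5)      # limited diversification
# A bond fully offset by a perfectly correlated hedge (hypothetical).
hedged = issuer_idio_tev([2.9, -2.9], 1.0)
```

With perfect correlation the two positions simply add, while with a correlation of 0.5 the issuer-level risk is somewhat below the sum, which is the limited diversification the text describes.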

To help us understand the net effect of all these points, we need to
know the issuers that contribute the most to idiosyncratic risk. When
aggregating risk from the issue level (as shown in Table 15) to the
issuer level, the correlations referred to above should be fully taken
into account. Table 16 shows the results of this exercise for the 10
issuers with the highest idiosyncratic TEV. Our riskiest exposure comes
from Johnson & Johnson (JNJ), with 3.7 bps/month of issuer risk. We can
also observe that idiosyncratic TEV is not monotonic in the NMW: We
have JNJ and President & Fellows of Harvard ("HARVRD") with the same
NMW, but the former is significantly riskier (3.7 versus 2.0
bps/month). It is possible to have important issuer risk even for names
we do not have in our portfolio, if they have significant market weight
in the benchmark. Finally, note that because the idiosyncratic risk
across issuers is independent, we can easily calculate the cumulative
risk of several issuers. For example, the total idiosyncratic risk of
the first two issuers is given by

$$\mathrm{TEV}^{\mathrm{idio}}_{\mathrm{JNJ}+\mathrm{D}} = \sqrt{3.7^2 + 2.8^2} \approx 4.6$$
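
The same root-sum-of-squares rule extends to any number of issuers. The sketch below applies it cumulatively to the ten issuer-level figures reported in Table 16; the variable names are illustrative, and the calculation relies only on the independence across issuers stated in the text.

```python
import math
from itertools import accumulate

# Issuer-level idiosyncratic TEV (bps/month) as listed in Table 16.
issuer_tev = [
    ("JNJ", 3.7), ("D", 2.8), ("CMCSA", 2.1), ("BBT", 2.1), ("HARVRD", 2.0),
    ("AXP", 1.8), ("MS", 1.7), ("C", 1.7), ("BAC", 1.6), ("RBS", 1.4),
]

# Independent risks: variances add, so accumulate squared TEVs and take roots.
cumulative_variance = list(accumulate(tev * tev for _, tev in issuer_tev))
cumulative_tev = [math.sqrt(v) for v in cumulative_variance]

first_two = cumulative_tev[1]    # JNJ and D together
all_ten = cumulative_tev[-1]     # all ten issuers in the table
```

The cumulative figure grows much more slowly than the simple sum of the individual TEVs, which is the diversification across issuers at work.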

Another important interpretation of Table 16 is that these are the
biggest name exposures in our portfolio. In this case, we are
overweight in all of them. Therefore, we should not have negative views
about any of them. If this is not the case, then we are assuming an
unintended name risk. This risk should be promptly taken out of the
portfolio, in favor of another issuer with similar characteristics
about which we do not have negative views. This interactive exercise
can easily be performed with a good and flexible optimizer.

Table 16  Issuer Specific Risk

Ticker  Name                          Sector            NMW (%)  Idiosyncratic TEV
JNJ     Johnson & Johnson             Pharmaceuticals   2.0      3.7
D       Dominion Resources Inc        Electric          1.8      2.8
CMCSA   Comcast Cable Communication   Media cable       2.0      2.1
BBT     BB&T Corporation              Banking           2.0      2.1
HARVRD  Pres&Fellows of Harvard       Industrial other  2.0      2.0
AXP     American Express Credit       Banking           1.7      1.8
MS      Morgan Stanley Dean Witter    Banking           1.3      1.7
C       Citigroup Inc                 Banking           1.5      1.7
BAC     Merrill Lynch & Co.           Banking           1.6      1.6
RBS     Charter One Bank Fsb          Banking           1.6      1.4

Scenario Analysis Report 

Scenario analysis is another useful way to gain additional perspective
on the portfolio's risk. There are many ways to perform this exercise.
For instance, one may want to reprice the whole portfolio under a
particular interest rate or spread scenario, and look at the
hypothetical return under that scenario. Alternatively, one may look at
the holdings of the portfolio and see how they would have performed
under particular stressed historical scenarios (e.g., the 1987 equity
crash or the Asian crisis in 1997). One particular problem with this
approach is that, given the dynamic nature of the securities, the
current portfolio did not exist with its current characteristics
throughout all these historical episodes. A solution may be to try to
price the current securities with the market variables at the time.
Another solution is to represent the current portfolio as the set of
loadings to all systematic risk factors in the factor risk model. We
can then multiply these loadings by the historical realizations of the
risk factors. The result is a set of historical systematic simulated
returns. Figure 1 presents these returns for our portfolio over the
last five years. As expected, the largest volatility came with the
crisis of 2008, when the portfolio registered returns between -200 and
+300 basis points. The largest underperformance against the benchmark
appeared in September 2008, followed by the largest outperformance two
months later, both at around 20 basis points.

Figure 1  Historical Systematic Simulated Returns (basis points)
[Chart not reproduced. Series: Portfolio Return; Net Return (right axis).]
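
The second solution, multiplying today's loadings by past factor realizations, can be sketched in a few lines. The factor names, loadings, and factor histories below are hypothetical; loadings are expressed as return sensitivities (so a duration enters with a negative sign when the factor is a yield change in bps).

```python
def simulated_returns(loadings, factor_history):
    """Historical systematic simulated returns: for each past period, the sum over
    factors of today's loading times that period's realized factor move."""
    return [
        sum(loading * period.get(name, 0.0) for name, loading in loadings.items())
        for period in factor_history
    ]


# Hypothetical current loadings (return in bps per 1 bp factor move).
loadings = {"rates_10y": -4.5, "credit_spread": -1.2}

# Hypothetical factor realizations (bps) for two past months.
factor_history = [
    {"rates_10y": 10.0, "credit_spread": -5.0},
    {"rates_10y": -20.0, "credit_spread": 40.0},
]

history = simulated_returns(loadings, factor_history)
```

Running the same function on net loadings (portfolio minus benchmark) gives the simulated net return series instead of the absolute one.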

This analysis has some limitations, especially for the portfolio under
consideration, where idiosyncratic exposure is a major source of risk.
This kind of risk is always very hard to pin down and is obviously less
relevant from a historical perspective, as the issuers in our current
portfolio may not have witnessed any particular major idiosyncratic
event in the past. However, these and other kinds of historical
scenario analysis are very important, as they give us some indication
of the magnitude of historical returns our portfolio might have
encountered. They are usually the starting point for any stress
testing. The researcher should always complement these with other
nonhistorical scenarios relevant for the particular portfolio under
analysis. One way to use the risk model to express such scenarios is
discussed in the following section.


APPLICATIONS OF RISK MODELING

In this section, we illustrate several risk model applications
typically employed for portfolio management. All applications make use
of the fact that the risk model translates the imbalances the portfolio
may have across many different dimensions into a common, comparable set
of numbers. In some of the applications (risk budgeting and portfolio
rebalancing), an optimizer that uses the risk model as an input is the
optimal setting in which to perform the exercise.

Portfolio Construction and Risk Budgeting

Portfolio managers can be divided broadly into indexers (those that
measure their returns relative to a benchmark index) and absolute
return managers (typically hedge fund managers). In between stand the
enhanced indexers we introduced earlier in this entry. All are
typically subject to a risk budget that prescribes how much risk they
are allowed to take to achieve their objectives: minimize transaction
costs and match the index returns for the pure indexers, maximize the
net return for the enhanced indexers, or maximize absolute returns for
absolute return managers. In any of these cases, the manager has to
merge all her views and constraints into a final portfolio. When
constructing the portfolio, how can she manage the competing views
while respecting the risk budget? How can the views be combined to
minimize the risk? What trade-offs can be made? Many different
techniques can be used to structure portfolios in accordance with the
manager's views. In particular, risk models are widely used to perform
this exercise. They perform this task in a simple and objective manner:
They can measure how risky each view is and how correlated the views
are. The manager can then compare the risk with the expected return of
each of the views and decide on the optimal allocation across her
views.

An example of a portfolio construction exercise using the risk model is
the one we performed to construct the portfolio analyzed in the
previous section.21 Figure 2 shows the exact problem we asked the
optimizer to solve. We start the problem by defining an initial
portfolio (empty in our case) and a tradable universe, the set of
securities we allow the optimizer to buy or sell. In our case, this is
the Barclays US Aggregate index with issues having at least $300
million of amount outstanding (in the Tradable Universe Options pane of
the POINT® Optimizer window shown in Figure 2). The selection of this
universe allows us to avoid having small issues in our portfolio,
potentially increasing its liquidity. Regarding the use of the risk
model (in the Objectives pane of the POINT® Optimizer window), the
objective function used in the problem is to minimize Total TEV. This
means that we are giving leeway to the risk model to choose a portfolio
from the tradable universe that minimizes the risk relative to the
benchmark, in our case the Barclays US Aggregate index. In the Common
Constraints pane, additional generic constraints have been imposed: a
$100 million final portfolio with a maximum of 50 securities. In the
Constraints on values aggregated by Buckets pane, we force the
optimizer to tilt our portfolio to respect the portfolio manager's
views: long duration against the benchmark of between 0.25 and 0.30
years and spreads between 100 and 150 bps higher than the benchmark. In
the Constraints on each Issue/Issuer/Ticker pane, we impose a maximum
under-/overweight of 2% per issuer, to ensure proper diversification.22
The characteristics of the portfolio resulting from this optimization
problem were extensively analyzed in the previous section.

Figure 2  Portfolio Construction Optimization Setup in the POINT® Optimizer
[Screenshot not reproduced. Panes shown: Tradable Universe Options;
Objectives; Common Constraints; Constraints on values aggregated by
Buckets; Constraints on each Issue/Issuer/Ticker.]
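
As a rough illustration of what such an optimizer evaluates, the sketch below computes a Total TEV objective from net factor exposures and net issuer weights, and checks the constraints described above. It is not the POINT® Optimizer's interface: the function names, the treatment of idiosyncratic risk as independent across issuers and of the factors, and the one-factor example inputs are assumptions made for illustration.

```python
import math


def total_tev(net_exposures, factor_cov, net_weights, idio_vols):
    """Total TEV = sqrt(systematic variance + idiosyncratic variance)."""
    k = len(net_exposures)
    systematic_var = sum(
        net_exposures[i] * factor_cov[i][j] * net_exposures[j]
        for i in range(k)
        for j in range(k)
    )
    # Issuer-specific risks are treated as independent of each other and of the factors.
    idio_var = sum((w * s) ** 2 for w, s in zip(net_weights, idio_vols))
    return math.sqrt(systematic_var + idio_var)


def within_guidelines(net_oad, net_oas, n_securities, issuer_net_weights_pct):
    """Constraints described in the text for the construction exercise."""
    return (
        0.25 <= net_oad <= 0.30
        and 100.0 <= net_oas <= 150.0
        and n_securities <= 50
        and all(abs(w) <= 2.0 for w in issuer_net_weights_pct)
    )


# Hypothetical one-factor example: net exposure of 0.1 to a factor with volatility
# 30 bps/month, plus one 2% net position with 200 bps/month idiosyncratic volatility.
example_tev = total_tev([0.1], [[30.0 ** 2]], [0.02], [200.0])

# Figures quoted in this entry: net OAD 0.25, net OAS 100 bps, 50 positions,
# and issuer net weights no larger than 2.0%.
feasible = within_guidelines(0.25, 100.0, 50, [2.0, 1.8, 2.0, 1.3])
```

An actual optimizer would search over holdings in the tradable universe to minimize this objective subject to the constraints; the sketch only evaluates one candidate.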


Portfolio Rebalancing 

Managers need to rebalance their portfolios regularly. For instance, as
time goes by, the characteristics of the portfolio may drift away from
targeted levels. This may be due to the aging of its holdings, market
moves, or issuer-specific events such as downgrades or defaults. The
periodic realignment of a portfolio to its investment guidelines is
called portfolio rebalancing. Similar needs arise in many different
contexts: when managers receive extra cash to invest, get small changes
to their mandates, want to tilt their views, and the like. As with
portfolio construction, a risk model is very useful in the rebalancing
exercise. During rebalancing, the portfolio manager typically seeks to
sell bonds currently held and replace them with others having
properties more consistent with the overall portfolio goals. Such buy
and sell transactions are costly, and their cost must be weighed
against the benefit of moving the portfolio closer to its initial
specifications. A risk model can tell the manager how much risk
reduction (or increase) a particular set of transactions can achieve,
so that she can evaluate the risk adjustment benefits relative to the
transaction cost.

Figure 3  Portfolio Rebalancing Optimization Setup in the POINT® Optimizer
[Screenshot not reproduced. Panes shown: Common Constraints;
Constraints on values aggregated by Buckets.]

As an example, suppose our portfolio manager wants to tone down the
heavy overweight she has in banking. She wants to cap that overweight
at 5% and wants to do it with no more than 10 trades. Finally, assume
she wants no change to the market value of the final portfolio. We can
use a setup similar to that of Figure 2, adjusting some of the
constraints. Figure 3 shows two of the constraint panes in the POINT®
Optimizer window, changed to allow for the new constraints.
Specifically, in the first pane we allow for 10 trades and, in the
second, we include an extra constraint for the banking industry.

Table 17 shows the trading list suggested by the POINT® Optimizer. Not
surprisingly, the biggest sells are of financial companies. To replace
them, the optimizer (using the risk model) recommends more holdings of
Treasury and corporate bonds. (We need the latter to keep the net yield
of the portfolio high.) Remember that we concluded that our financial
holdings were highly correlated with Treasuries, so the proposed swap
is not surprising.
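
A quick check of the proposed trades against two of the manager's requirements (no more than 10 trades, no change in market value) can be sketched as follows. The market values are those listed in Table 17; the dictionary layout and variable names are illustrative.

```python
# Market values (USD) of the proposed trades, as listed in Table 17.
buys = {
    "912828KV": 1_000_000,
    "126650BK": 1_518_408,
    "98385XAJ": 2_508_567,
    "FNA05009": 2_708_258,
    "912828KF": 3_882_263,
}
sells = {
    "16132NAV": -3_370_981,
    "05531FAF": -2_499_505,
    "0258M0BZ": -2_208_231,
    "3133XN4B": -2_085_812,
    "740816AB": -1_452_968,
}

n_trades = len(buys) + len(sells)
net_market_value = sum(buys.values()) + sum(sells.values())

# The trade-count limit is met, and buys offset sells to within one dollar.
meets_trade_limit = n_trades <= 10
cash_neutral = abs(net_market_value) <= 1
```

The one-dollar residual presumably reflects rounding of the individual market values in the table.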

Interestingly, the extra constraint imposed on the optimization problem
did not materially change the risk of the portfolio. Results show that
the risk actually decreased to around 13 bps/month. This is due to the
three extra positions added to the portfolio, which now has 53
securities. These extra securities allowed the portfolio to reduce both
its systematic and its idiosyncratic risk.

Scenario Analysis 

As described in the previous section, scenario analysis is a very
popular tool both for risk management and portfolio construction. In
this section, we illustrate another way to construct scenarios, this
time using the covariance matrix of the risk model. In this context,
users express views on the returns of particular financial variables,
indexes, securities, or risk factors, and the scenario analysis tool
(using the risk model) calculates their impact on the portfolio's (net)
return.

Table 17  Proposed Trading List

BUYS
Identifier  Description                      Position Amount  Market Value
912828KV    US Treasury Notes                967,403          1,000,000
126650BK    CVS Corp-Global                  1,696,069        1,518,408
98385XAJ    XTO Energy Inc                   2,097,746        2,508,567
FNA05009    FNMA Conventional Long T. 30yr   2,547,359        2,708,258
912828KF    US Treasury Notes                3,786,070        3,882,263
Total                                                         11,617,497

SELLS
Identifier  Description                      Position Amount  Market Value
16132NAV    Charter One Bank FSB             -3,229,847       -3,370,981
05531FAF    BB&T Corporation                 -2,425,413       -2,499,505
0258M0BZ    American Express Credit          -2,021,013       -2,208,231
3133XN4B    Federal Home Loan Bank           -1,818,417       -2,085,812
740816AB    Pres&Fellows of Harvard          -1,281,616       -1,452,968
Total                                                         -11,617,497

Typically in this kind of scenario analysis, the views one has are only
partial views. This means we can have specific views on how particular
macro variables, asset classes, or risk factors will behave, but we
hardly ever have views on all the risk factors the portfolio under
analysis is exposed to. This is where risk models may be useful. At the
heart of the linear factor models described in this entry is a set of
risk factors and the covariance matrix between them. They are being
increasingly used in the context of scenario analysis as a way to
"complete" specific partial views or scenarios, delivering a full
picture of the impact of the scenario on the return of the portfolio.
Mechanically, what happens is the following: First, one translates the
views into realizations of a subset of risk factors. Then the scenario
is completed, using the risk model covariance matrix, to get the
realizations of all risk factors. Finally, the portfolio's (net)
loadings to all risk factors are used to get its (net) return under
that scenario (by multiplying the loadings by the factor realizations
under the scenario). This construction implies a set of assumptions
that should be carefully understood. For instance, we assume that we
can represent or translate our views as risk factor returns. So, if we
have a view about the unemployment rate, and this is not a risk
factor,23 we cannot use this procedure to test our scenario. Also, to
"complete" the scenario, we generally assume a stationary multivariate
normal distribution for all factors. These assumptions make this
analysis less appropriate for looking at extreme events or regime
shifts, for instance. But the analysis can be very useful in many
circumstances.
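
The three mechanical steps can be sketched under the stated normality assumption. For zero-mean jointly normal factors, the expected move of the unviewed factors given the viewed ones is the conditional mean, Sigma_uv Sigma_vv^{-1} x. The sketch below implements that formula in plain Python; the function names, the two-factor covariance matrix, and the loadings are hypothetical, and a production tool may complete scenarios differently.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def complete_scenario(cov, views):
    """cov: full factor covariance matrix; views: {factor index: assumed move}.
    Returns moves for all factors, filling unviewed ones with their conditional mean."""
    viewed = sorted(views)
    cov_vv = [[cov[i][j] for j in viewed] for i in viewed]
    weights = solve(cov_vv, [views[i] for i in viewed])   # Sigma_vv^{-1} x
    full = []
    for k in range(len(cov)):
        if k in views:
            full.append(views[k])
        else:
            full.append(sum(cov[k][j] * w for j, w in zip(viewed, weights)))
    return full


def scenario_return(loadings, factor_moves):
    """Portfolio return: loadings (return sensitivities) times factor moves."""
    return sum(l * f for l, f in zip(loadings, factor_moves))


# Hypothetical factors: 0 = rates (vol 30), 1 = spreads (vol 20), correlation -0.5.
cov = [[900.0, -300.0], [-300.0, 400.0]]
moves = complete_scenario(cov, {1: -10.0})          # view: spreads tighten by 10
ret = scenario_return([-4.5, -2.0], moves)          # hypothetical net loadings
```

In this example the negative correlation makes the completed scenario include a rise in rates alongside the spread tightening, so part of the gain from tighter spreads is offset by the duration exposure.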

As an example, consider using the scenario analysis to compute the
model-implied empirical durations (MED) of the portfolio we analyzed in
detail earlier in this entry. To do this, we express our views as
changes in the curve factors. In our risk model, these are represented
by the six key rate factors illustrated in Table 10. In particular, to
calculate the model-implied empirical duration, we are going to assume
that all six decrease by 25 bps/month, broadly in line with our
managers' views.

Panel a of Table 18 shows that under this scenario, the portfolio
returns 99 basis points, against 93 for the benchmark. As expected
given our longer duration, we outperform the benchmark. Due to the
other exposures present in the portfolio and benchmark (e.g., spreads)
and their average negative correlation with the curve factors, the
duration implied by the scenario (MED) for our portfolio is only 3.96
(= 99/25), against the analytical 4.55. The scenario shows a similar
decrease in the benchmark's duration.
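
The arithmetic behind these figures can be written out as follows. The symbols $R_{\text{scenario}}$ (scenario return in bps) and $\Delta y$ (parallel key-rate move in bps, here $-25$) are notation introduced for this illustration only.

```latex
\[
\mathrm{MED} = \frac{R_{\text{scenario}}}{-\Delta y}, \qquad
\mathrm{MED}_{\text{portfolio}} = \frac{99}{25} = 3.96, \qquad
\mathrm{MED}_{\text{benchmark}} = \frac{93}{25} = 3.72
\]
\[
\frac{\mathrm{MED}_{\text{portfolio}}}{\text{analytical}} = \frac{3.96}{4.55} \approx 0.87, \qquad
\frac{\mathrm{MED}_{\text{benchmark}}}{\text{analytical}} = \frac{3.72}{4.30} \approx 0.87
\]
```

Both model-implied durations come out at roughly 87% of their analytical counterparts, which is the "similar decrease" noted for the benchmark.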

Another characteristic imposed while con¬ 
structing the portfolio was a targeted higher 
spread. As shown in Table 2, this resulted 
in an OAS for the portfolio of 157 bps 
against the 57 of the benchmark. It would be 
Table 18 Spread Analysis

a. Analytical and Model-Implied Durations

                                          Durations
Universe    Return under Scenario (bp)    MED (scenario)    Analytical
Portfolio   99                            3.96              4.55
Benchmark   93                            3.72              4.30

b. Spread Contraction of 10%

            Restriction on YC movement
Universe    No Movement    Correlated
Portfolio   31             -3
Benchmark   32             0

interesting to evaluate the impact on the portfolio's
net return of a credit spread contraction
of 10%. The portfolio is long spread duration
(net OASD = 0.11, see Table 2), so we might
expect it to outperform in this scenario.
To check this, we analyze the results under
two spread contraction scenarios: imposing no
change in the yield curve (that is, an unchanged
yield curve is part of the view) or allowing
this change to be implied by the correlation
matrix. (That is, the change in the yield curve
is not part of the scenario. We have no views
about it, but we allow it to change in a way
historically consistent with our spread view.)
Contrary to what one might expect, panel b
of Table 18 shows that the effect on the net
return is minimal under both scenarios. The
higher spreads deliver no return advantage under
this scenario. However, the absolute returns
are quite different across the scenarios. When
one allows the rates to move in a correlated
fashion, the absolute return drops close to zero: All
positive return from the spread contraction is
cancelled by the probable increase in the level
of the curve and our long-duration exposure.

These very simple examples illustrate how
one can use reasonable scenarios to study the
behavior of the portfolio or the benchmark under
different environments. Scenario analysis
of this kind can significantly improve the
portfolio manager's intuition about the results
from the risk model.


KEY POINTS 

• Risk models describe the different imbalances 
of a portfolio using a common language. The 
imbalances are combined into a consistent 
and coherent analysis reported by the risk 
model. 

• Risk models provide important insights re¬ 
garding the different trade-offs existing in the 
portfolio. They provide guidance regarding 
how to balance them. 

• Risk models in fixed income are unique in two 
different ways: First, the existence of good 
pricing models allows us to robustly calculate 
important analytics regarding the securities. 
These analytics can be used confidently as in¬ 
puts into a risk model. Second, returns are not 
typically used directly to calibrate risk factors. 
Instead returns are first normalized into more 
invariant series (e.g., returns normalized by 
the duration of the bond). 

• The fundamental systematic risk of all fixed 
income securities is interest rate and term 
structure risk. This is captured by factors rep¬ 
resenting risk-free rates and swap spreads of 
various maturities. 

• Excess (of interest rates) systematic risk is cap¬ 
tured by factors specific to each asset class. 
The most important components of such risk 
are credit risk and prepayment risk. Other 
risk factors that can be important are implied 
volatility, liquidity, inflation, and tax policy. 
• Idiosyncratic risk is diversified away in large
portfolios and indices but can become a very
significant component of the total risk in small
portfolios. The correlation of idiosyncratic
risk of securities of the same issuer is nonzero
and must be modeled very carefully.

• A good risk model provides detailed infor¬ 
mation about the exposures of a complex 
portfolio and can be a valuable tool for port¬ 
folio construction and management. It can 
help managers construct portfolios tracking a 
particular benchmark, express views subject 
to a given risk budget, and rebalance a port¬ 
folio while avoiding excessive transaction 
costs. Further, by identifying the exposures 
where the portfolio has the highest risk sensi¬ 
tivity it can help a portfolio manager reduce 
(or increase) risk in the most effective way. 


NOTES 

1. The Barclays Global Risk Model is available
through POINT®, Barclays' portfolio management
tool. It is a multi-currency cross-asset
model that covers many different
asset classes across the fixed income and equity
markets, including derivatives in these
markets. At the heart of the model is a covariance
matrix of risk factors. The model
has more than 700 factors, many specific
to a particular asset class. The asset class
models are periodically reviewed. Structure
is imposed to increase the robustness
of the estimation of such a large covariance
matrix. The model is estimated from historical
data. It is calibrated using extensive
security-level historical data and is
updated on a monthly basis.

2. Later in this entry, we refer to this risk num¬ 
ber as Isolated TEV. 

3. We arrive at this number by taking the
square root of the sum of squares of all the
numbers in the table:
$10.9 = (8.5^2 + 1.7^2 + 3.0^2 + 5.1^2 + 3.0^2)^{0.5}$.
Moreover, this number
would represent the total systematic risk of
the portfolio. This definition is developed
later in the entry.

4. In this example, we focus only on the sys¬ 
tematic component of risk. Later, the nor¬ 
malization is with respect to the total risk of 
the portfolio, including idiosyncratic risk. 

5. For example, see Table 13 later in this entry. 

6. Note that the contribution numbers are 
different from those from Table 5 be¬ 
cause there we reported the contribution to 
systematic—not total—risk. 

7. This result does not contradict the findings 
in Table 7, where we see that curve is the ma¬ 
jor source of risk. Remember that the curve 
risk can come from our corporate subport¬ 
folio. 

8. Other curves that can be used are, for 
instance, the municipals (tax free) curve, 
derivatives-based curves, and the like. 

9. This number is obtained by simply multiplying
the net exposure by the factor
volatility. The sign of the move depends on
the interpretation of the factor. In the case
of the yield curve movements we know
that $R = -\mathrm{KRD} \times \Delta \mathrm{KR}$. In our example,
$-(-0.36) \times 44.2 = 15.9$.

10. This reversal is clearly related to the fact that 
the 10-year and the 20-year points in the 
curve are usually highly correlated. In our 
case, our short position on the 10-year point 
is more than compensated by the positive 
exposure in the 20-year. Netting out, the 20- 
year effect (long duration) dominates when 
all changes are taken in a correlated fashion. 

11. The marginal contribution is the derivative 
of the TEV with respect to the loading of 
each factor, so its interpretation holds only 
locally. Therefore, a more realistic reading 
may be that if we reduce the exposure to 
the 10-year by 0.1 years, the TEV would be 
reduced by around 1.6 basis points. 

12. This is a rationale very similar to the one 
used before, where we see all correlated im¬ 
pacts with the same sign. 

13. Spreads are also compensation for sources
of risk other than credit (e.g., liquidity), but
for the sake of our argument, we treat them
primarily as major indicators of credit risk.

14. For details, see Ben Dor et al. (2010). 

15. The general principle of a risk model is that 
the historical returns of assets contain in¬ 
formation that can be used to estimate the 
future volatility of portfolio returns. How¬ 
ever, good risk models must have the abil¬ 
ity to interpret the historical asset returns 
in the context of the current environment. 
This translation is made when designing a 
particular risk model/factor and delivers 
risk factors that are as invariant as possi¬ 
ble. This invariance makes the estimation 
of the factor distribution much more robust. 
In the particular case of the DTS, by includ¬ 
ing the spread in the loading (instead of 
using only the typical spread duration), we 
change the nature of the risk factor being 
estimated. The factor now represents per¬ 
centage change in spreads, instead of ab¬ 
solute changes in spreads. The former has 
a significantly more invariant distribution. 
For more details, see Silva (2009a). 

16. The DTS units used in the report are based
on an OASD stated in years and an OAS in
percentage points. Therefore, a bond with
an OASD = 5 and an OAS = 200 basis points
would have a DTS of $5 \times 2 = 10$. The DTS
industry exposures are the weighted sum of
the DTS of each of the securities in that industry,
the weights being the market weight
of each security.

17. For a detailed methodology on how to
perform this customized analysis, see Silva
(2009b).


18. For a further discussion, see Gabudean 
(2009). 

19. The volatility is called implied because it is 
calculated from the market prices of liquid 
options with the help of an option-pricing 
model. 

20. For more discussion, see Staal (2009). 

21. The example is constructed using the 
POINT® Optimizer. For more details, 
refer to Kumar (2010). 

22. Another way to ensure diversification 
would be to include the minimization of 
the idiosyncratic TEV as a specific goal in 
the objective function. 

23. Unemployment rate is not used as a fac¬ 
tor in most short- and medium-term risk 
models. 
Abstract: Financial econometrics is the econometrics of financial markets. It is a quest for models
that describe financial time series such as prices, returns, interest rates, financial ratios, defaults, and
so on. The economic equivalent of the laws of physics, econometrics represents the quantitative,
mathematical laws of economics. The development of a quantitative, mathematical approach to
economics started at the end of the 19th century in a period of great enthusiasm for the achievements
of science and technology. Robert Engle and Clive Granger, two econometricians who shared
the 2003 Nobel Prize in Economic Sciences, have contributed greatly to the field of financial
econometrics.


Econometrics is the branch of economics that
draws heavily on statistics for testing and analyzing
economic relationships. Within econometrics,
there are theoretical econometricians
who analyze statistical properties of estimators
of models. Several recipients of the Nobel
Prize in Economic Sciences received the award
as a result of their lifetime contribution to this
branch of economics. To appreciate the importance
of econometrics to the discipline of economics,
consider that when the first Nobel Prize in Economic
Sciences was awarded in 1969, the co-recipients
were two econometricians, Jan Tinbergen and
Ragnar Frisch (who is credited with first using
the term "econometrics" in the sense that it
is known today). A further specialization within
econometrics, and the subject of this entry, is
financial econometrics.

As Jianqing Fan (2004) writes, financial econo¬ 
metrics uses statistical techniques and eco¬ 
nomic theory to address a variety of problems 
from finance. These include building financial 
models, estimation and inferences of financial 
models, volatility estimation, risk management, 
testing financial economics theory, capital asset 
pricing, derivative pricing, portfolio allocation, 
risk-adjusted returns, simulating financial sys¬ 
tems, and hedging strategies, among others. 
In this entry, we provide an overview of financial
econometrics and the methods employed.


THE DATA GENERATING 
PROCESS 

The basic principles for formulating quanti¬ 
tative laws in financial econometrics are the 
same as those that have characterized the de¬ 
velopment of quantitative science over the last 
four centuries. We write mathematical models, 
that is, relationships between different variables 
and/or variables in different moments and different
places. The basic tenet of quantitative science
is that there are relationships that do not
change regardless of the moment or the place 
under consideration. For example, while sea 
waves might look like an almost random move¬ 
ment, in every moment and location the basic 
laws of hydrodynamics hold without change. 
Similarly, asset price behavior might appear 
to be random, but econometric laws should 
hold in every moment and for every set of 
assets. 

There are similarities between financial eco¬ 
nometric models and models of the physical sci¬ 
ences, but there are also important differences. 
The physical sciences aim at finding immutable 
laws of nature; econometric models model the 
economy or financial markets—artifacts subject 
to change. For example, financial markets in the 
form of stock exchanges have been in opera¬ 
tion for two centuries. During this period, they 
have changed significantly both in the number 
of stocks listed and the type of trading. And 
the information available on transactions has 
also changed. Consider that in the 1950s, we
had access only to daily closing prices, and
typically only on the following day; now we have
instantaneous information on every single transaction.
Because the economy and financial markets are 
artifacts subject to change, econometric models 
are not unique representations valid through¬ 
out time; they must adapt to the changing en¬ 
vironment. 


While basic physical laws are expressed as 
differential equations, financial econometrics 
uses both continuous-time and discrete-time 
models. For example, continuous-time models 
are used in modeling derivatives where both 
the underlying and the derivative price are 
represented by stochastic (i.e., random) differ¬ 
ential equations. In order to solve stochastic 
differential equations with computerized 
numerical methods, derivatives are replaced 
with finite differences. (Note that the stochastic
nature of differential equations introduces
fundamental mathematical complications. The
definition of stochastic differential equations
is a delicate mathematical process invented,
independently, by the mathematicians Itô
and Stratonovich. In the Itô-Stratonovich
definition, the path of a stochastic differential
equation is not the solution of a corresponding
differential equation. However, the numerical
solution procedure yields a discrete model
that holds pathwise. See Focardi and Fabozzi
[2004].) This process of discretization of time
yields discrete-time models. However, the discrete-time
models used in financial econometrics
are not necessarily the result of a process of
discretization of continuous-time models.
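
To make the discretization step concrete, the sketch below applies the Euler-Maruyama scheme, a standard finite-difference method for stochastic differential equations, to geometric Brownian motion. Both the scheme and the example equation dS = mu S dt + sigma S dW are assumptions chosen for illustration; the entry does not prescribe either, and the function name and parameter values are hypothetical.

```python
import math
import random


def euler_maruyama_gbm(s0, mu, sigma, horizon, n_steps, rng):
    """Simulate one discretized path of dS = mu*S*dt + sigma*S*dW (illustrative)."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        # A Brownian increment over a step of length dt is N(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        s = path[-1]
        # The differential dS is replaced by the finite difference S[k+1] - S[k].
        path.append(s + mu * s * dt + sigma * s * dw)
    return path


rng = random.Random(0)  # fixed seed so the path is reproducible
path = euler_maruyama_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon=1.0, n_steps=252, rng=rng)
print(len(path), path[0], path[-1])
```

The result is a discrete-time recursion: each value depends only on the previous value and a fresh random draw, which is exactly the form of the discrete-time models discussed next.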

Let's focus on models in discrete time, the 
bread-and-butter of econometric models used 
in asset management. There are two types 
of discrete-time models: static and dynamic. 
Static models involve different variables at the 
same time. The well-known capital asset pricing 
model (CAPM), for example, is a static model. 
Dynamic models involve one or more variables 
at two or more moments. (This is true in dis¬ 
crete time. In continuous time, a dynamic model 
might involve variables and their derivatives at 
the same time.) Momentum models, for exam¬ 
ple, are dynamic models. 

In a dynamic model, the mathematical rela¬ 
tionship between variables at different times 
is called the data generating process (DGP). This 
terminology reflects the fact that, if we know 
the DGP of a process, we can simulate the pro¬ 
cess recursively, starting from initial conditions. 
Consider the time series of a stock price $p_t$, that
is, the series formed with the prices of that stock
taken at fixed points in time, say daily. Let's now
write a simple econometric model of the prices
of a stock as follows:

$$p_{t+1} = \mu + \rho p_t + \varepsilon_{t+1} \qquad (1)$$

This model tells us that if we consider any
time $t+1$, the price of that stock at time $t+1$ is
equal to a constant plus the price in the previous
moment $t$ multiplied by $\rho$ plus a zero-mean
random disturbance independent from
the past, which always has the same statistical
characteristics. If we want to apply this model to
real-world price processes, the constants $\mu$ and
$\rho$ must be estimated. The parameter $\mu$ determines
the trend and $\rho$ defines the dependence
between the prices. Typically $\rho$ is less than but
close to 1. A random disturbance of the type
shown in the above equation is called a white
noise.

If we know the initial price $p_0$ at time $t = 0$,
using a computer program to generate random
numbers, we can simulate a path of the price
process with the following recursive equations:

$$p_1 = \mu + \rho p_0 + \varepsilon_1$$

$$p_2 = \mu + \rho p_1 + \varepsilon_2$$

That is, we can compute the price at time
$t = 1$ from the initial price $p_0$ and a computer-generated
random number $\varepsilon_1$ and then use this
new price to compute the price at time $t = 2$,
and so on. The $\varepsilon_t$ are independent and identically
distributed random variables with zero
mean. Typical choices for the distribution of $\varepsilon$
are the normal distribution, the $t$-distribution, and
stable non-Gaussian distributions. The distribution
parameters are estimated from the sample.
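
The recursion can be written in a few lines of code. The sketch below is a minimal illustration, not a prescribed implementation: the function name `simulate_dgp` is hypothetical, the noise is assumed to be normal with unit standard deviation, and the starting price of 10 is arbitrary. The values 1 and 0.9 for the constant and the coefficient on the lagged price are the ones the entry uses for Figure 1.

```python
import random


def simulate_dgp(p0, mu, rho, n_steps, rng, noise_std=1.0):
    """Return prices p_0..p_n and noise terms eps_1..eps_n for p_{t+1} = mu + rho*p_t + eps_{t+1}."""
    prices = [p0]
    noise = []
    for _ in range(n_steps):
        # Zero-mean disturbance drawn independently of everything before it.
        eps = rng.gauss(0.0, noise_std)
        noise.append(eps)
        # Each new price uses only the previous price and the new disturbance.
        prices.append(mu + rho * prices[-1] + eps)
    return prices, noise


rng = random.Random(42)  # fixed seed so the simulated path is reproducible
prices, noise = simulate_dgp(p0=10.0, mu=1.0, rho=0.9, n_steps=100, rng=rng)
print(prices[:3])
```

Replacing `rng.gauss` with a draw from a fat-tailed distribution changes the character of the simulated path without changing the recursion itself.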

It is clear that if we have a DGP we can 
generate any path. An econometric model that 
involves two or more different times can be 
regarded as a DGP. However, there is a more 
general way of looking at econometric models 
that encompasses both static and dynamic 
models. That is, we can look at econometric
models from a perspective other than that of the
recursive generation of stochastic paths. In fact,
we can rewrite our previous model as follows:

$$p_{t+1} - \mu - \rho p_t = \varepsilon_{t+1} \qquad (2)$$

This formulation shows that, if we consider
any two consecutive instants of time, there is
a combination of prices that behaves as random
noise. More generally, an econometric model
can be regarded as a mathematical device that
reconstructs a noise sequence from empirical
data.

This concept is visualized in Figure 1, which
shows a time series of numbers $p_t$ generated
by a computer program according to the rule
given by (2) with $\rho = 0.9$ and $\mu = 1$ and the
corresponding time series $\varepsilon_t$. If we choose any
pair of consecutive points in time, say $(t+1, t)$,
the difference $p_{t+1} - \mu - \rho p_t$ is always equal to
the corresponding noise term $\varepsilon_{t+1}$. For example, consider the points
$p_{13} = 10.2918$, $p_{14} = 12.4065$. The difference
$p_{14} - 0.9 p_{13} - 1 = 2.1439$ has the same value
as $\varepsilon_{14}$. If we move to a different pair we obtain
the same result, that is, if we compute
$p_{t+1} - 1 - 0.9 p_t$, the result will always be the
noise sequence $\varepsilon_{t+1}$.
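
The same check can be done numerically. The sketch below uses a hypothetical helper, `recover_noise`, that applies the combination of consecutive prices described above to any price series; it is then applied to the two points quoted in the text, with the constant equal to 1 and the coefficient on the lagged price equal to 0.9.

```python
def recover_noise(prices, mu, rho):
    """Noise reconstruction: eps_{t+1} = p_{t+1} - mu - rho * p_t."""
    return [p_next - mu - rho * p_prev for p_prev, p_next in zip(prices, prices[1:])]


# The two consecutive points quoted in the text (rounded to four decimals).
p13, p14 = 10.2918, 12.4065
eps14 = recover_noise([p13, p14], mu=1.0, rho=0.9)[0]
print(round(eps14, 4))  # 2.1439
```

Applied to a whole simulated path, the same function returns the full noise sequence that generated it, which is the sense in which the model acts as a device that reconstructs noise from data.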

To help intuition, imagine that our model is 
a test instrument: Probing our time series with 
our test instrument, we always obtain the same 
reading. Actually, what we obtain is not a con¬ 
stant reading but a random reading with mean 
zero and fixed statistical characteristics. The ob¬ 
jective of financial econometrics is to find pos¬ 
sibly simple expressions of different financial 
variables such as prices, returns, or financial ra¬ 
tios in different moments that always yield, as 
a result, a zero-mean random disturbance. 

Static models (i.e., models that involve only 
one instant) are used to express relationships 
between different variables at any given time. 
Static models are used, for example, to deter¬ 
mine exposure to different risk factors. How¬ 
ever, because they involve only one instant, 
static models cannot be used to make forecasts; 
forecasting requires models that link variables 
in two or more instants in time. 
Figure 1 DGP and Noise Terms


FINANCIAL ECONOMETRICS 
AT WORK 

Applying financial econometrics involves three 
key steps: (1) model selection, (2) model estimation, 
and (3) model testing. 

In the first step, model selection, the mod¬ 
eler chooses (or might write ex novo) a family 
of models with given statistical properties. This 
entails the mathematical analysis of the model 
properties as well as economic theory to jus¬ 
tify the model choice. It is in this step that the 
modeler decides to use, for example, regression 
on financial ratios or other variables to model 
returns. 

In general, models include a number of free 
parameters that have to be estimated from sam¬ 
ple data, the second step in applying financial 
econometrics. Suppose that we have decided 
to model returns with a regression model. This 
requires the estimation of the regression coef¬ 
ficients, performed using historical data. Esti¬ 
mation provides the link between reality and 
models. As econometric models are probabilis¬ 
tic models, any model can in principle describe 
our empirical data. We choose a family of models
in the model selection phase and then
determine the optimal model in the estimation
phase.

As mentioned, model selection and estima¬ 
tion are performed on historical data. As mod¬ 
els are adapted (or fitted) to historical data 
there is always the risk that the fitting process 
captures ephemeral features of the data. Thus 
there is the need to test the models on data 
different from the data on which the models 
were estimated. This is the third step in ap¬ 
plying financial econometrics, model testing. 
We assess the performance of models on fresh 
data. 
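
The three steps can be sketched on synthetic data. The example below is an illustration under stated assumptions, not the procedure of any particular firm: the model family is the first-order autoregressive price model used earlier in this entry, estimation is by ordinary least squares, the data are simulated, and the 400/200 split between estimation and test data is arbitrary. The names `fit_ar1` and `mean_squared_error` are hypothetical.

```python
import random


def fit_ar1(series):
    """OLS estimates (mu, rho) for p_{t+1} = mu + rho * p_t + eps_{t+1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    rho = sxy / sxx
    return mean_y - rho * mean_x, rho


def mean_squared_error(series, mu, rho):
    """Average squared one-step-ahead forecast error on a series."""
    errors = [b - (mu + rho * a) for a, b in zip(series, series[1:])]
    return sum(e * e for e in errors) / len(errors)


# Synthetic data standing in for a historical series (assumed parameters).
rng = random.Random(7)
series = [10.0]
for _ in range(600):
    series.append(1.0 + 0.9 * series[-1] + rng.gauss(0.0, 1.0))

train, test = series[:400], series[400:]      # estimate on one part, test on the other
mu_hat, rho_hat = fit_ar1(train)              # model estimation
mse_train = mean_squared_error(train, mu_hat, rho_hat)
mse_test = mean_squared_error(test, mu_hat, rho_hat)  # model testing on fresh data
print(mu_hat, rho_hat, mse_train, mse_test)
```

A test error much larger than the training error would suggest that the fit captured ephemeral features of the estimation sample.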

We can take a different approach to 
model selection and estimation, namely sta¬ 
tistical learning. Statistical learning combines 
the two steps—model selection and model 
estimation—insofar as it makes use of a class of 
universal models that can fit any data. Neural 
networks are an example of universal models. 
The critical step in the statistical learning ap¬ 
proach is estimation. This calls for methods to 
restrict model complexity (i.e., the number of 
parameters used in a model). 

Within this basic scheme for applying finan¬ 
cial econometrics, we can now identify a num¬ 
ber of modeling issues, such as: 
• How do we apply statistics given that there is
only one realization of financial series?

• Given a sample of historical data, how do we 
choose between linear and nonlinear models, 
or the different distributional assumptions or 
different levels of model complexity? 

• Can we exploit more data using, for example, 
high-frequency data? 

• How can we make our models more robust, 
reducing model risk? 

• How do we measure not only model perfor¬ 
mance but also the ability to realize profits? 

Implications of Empirical Series 
with Only One Realization 

As mentioned, econometric models are proba¬ 
bilistic models: Variables are random variables 
characterized by a probability distribution. 
Generally speaking, probability concepts can¬ 
not be applied to single "individuals" (at least, 
not if we use a frequentist concept of proba¬ 
bility). Probabilistic models describe "popula¬ 
tions" formed by many individuals. However, 
empirical financial time series have only one 
realization. For example, there is only one his¬ 
torical series of prices for each stock—and we 
have only one price at each instant of time. This 
makes the application of probability concepts
problematic. How, for example, can we meaningfully
discuss the distribution of prices at a
specific time given that there is only one price 
observation? Applying probability concepts to 
perform estimation and testing would require 
populations made up of multiple time series 
and samples made up of different time series 
that can be considered a random draw from 
some distribution. 

As each financial time series is unique, the so¬ 
lution is to look at the single elements of the time 
series as the individuals of our population. For 
example, because there is only one realization 
of each stock's price time series, we have to look 
at the price of each stock at different moments. 
However, the price of a stock (or of any other 
asset) at different moments is not a random
independent sample. For example, it makes little
sense to consider the distribution of the prices of 
a single stock in different moments because the 
level of prices typically changes over time. Our 
initial time series of financial quantities must be 
transformed; that is, a unique time series must 
be transformed into populations of individu¬ 
als to which statistical methods can be applied. 
This holds not only for prices but for any other 
financial variable. 

Econometrics includes transformations of 
the above type as well as tests to verify that 
the transformation has obtained the desired 
result. The DGP is the most important of these 
transformations. Recall that we can interpret a 
DGP as a method for transforming a time series 
into a sequence of noise terms. The DGP, as we 
have seen, constructs a sequence of random 
disturbances starting from the original series; 
it allows one to go backwards and infer the 
statistical properties of the series from the noise 
terms and the DGP. However, these properties 
cannot be tested independently. 

The DGP is not the only transformation that 
allows statistical estimates. Differencing time 
series, for example, is a process that may trans¬ 
form nonstationary time series into stationary time 
series. A stationary time series has a constant 
mean that, under specific assumptions, can be 
estimated as an empirical average. 
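
As a small illustration of differencing, the sketch below builds a series of price levels that follows a random walk with drift (an assumed example of a nonstationary series, with hypothetical parameter values) and compares averages of the levels with averages of the first differences over the two halves of the sample.

```python
import random


def difference(series):
    """First differences: d_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(3)
drift = 0.05  # hypothetical drift per period
levels = [100.0]
for _ in range(2000):
    # Random walk with drift: the level itself has no constant mean.
    levels.append(levels[-1] + drift + rng.gauss(0.0, 1.0))

changes = difference(levels)  # drift plus noise: a series with a constant mean
half = len(changes) // 2

print("level averages: ", mean(levels[:half]), mean(levels[half:]))
print("change averages:", mean(changes[:half]), mean(changes[half:]))
```

In this construction the differenced series is the drift plus an independent noise term, so its empirical average is a meaningful estimate of a constant mean, whereas an average of the levels depends on which stretch of the sample is used.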

Determining the Model 

As we have seen, econometric models are math¬ 
ematical relationships between different vari¬ 
ables at different times. An important question 
is whether these relationships are linear or non¬ 
linear. Consider that every econometric model 
is an approximation. Thus the question is: 
Which approximation—linear or nonlinear—is 
better? 

To answer this, it is generally necessary to con¬ 
sider jointly the linearity of models, the distri¬ 
butional assumptions, and the number of time 
lags to introduce. The simplest models are lin¬ 
ear models with a small number of lags under 
the assumption that variables are normal variables.
A widely used example of normal linear
models is the regression model in which returns are
linearly regressed on lagged factors under the
assumption that noise terms are normally distributed.
A model of this type can be written
as:

$$r_{t+1} = \beta f_t + \varepsilon_{t+1} \qquad (3)$$

where $r_t$ are the returns at time $t$ and $f_t$ are
factors, that is, economic or financial variables.
Given the linearity of the model, if factors and
noise are jointly normally distributed, returns
are also normally distributed.

However, the distribution of returns, at least 
at some time horizons, is not normal. If we pos¬ 
tulate a nonlinear relationship between factors 
and returns, normally distributed factors yield 
a non-normal return distribution. However, 
we can maintain the linearity of the regression 
relationship but assume a non-normal distribution
of noise terms and factors. Thus nonlinear
models transform normally distributed noise
into non-normal variables, but it is not true that
non-normal distributions of variables imply
nonlinear models.
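
The distinction can be seen in a small simulation. The sketch below is illustrative only: it uses sample excess kurtosis as a rough gauge of departure from normality, a Student t distribution with five degrees of freedom as the non-normal noise, and a squared factor as the nonlinearity. All of these choices, the coefficient 0.5, and the helper names are assumptions, not part of the entry.

```python
import math
import random


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def student_t(rng, dof):
    """Draw from a Student t distribution as normal / sqrt(chi-square / dof)."""
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square with `dof` degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / dof)


rng = random.Random(11)
n, beta = 20000, 0.5
factors = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Linear model, normal factor and normal noise: returns are normal.
linear_normal = [beta * f + rng.gauss(0.0, 1.0) for f in factors]
# Linear model, normal factor but fat-tailed noise: returns are non-normal.
linear_fat_tailed = [beta * f + student_t(rng, 5) for f in factors]
# Nonlinear model, normal factor and normal noise: returns are non-normal.
nonlinear_normal = [f * f + rng.gauss(0.0, 1.0) for f in factors]

for name, r in [("linear, normal noise", linear_normal),
                ("linear, t noise", linear_fat_tailed),
                ("nonlinear, normal noise", nonlinear_normal)]:
    print(name, round(excess_kurtosis(r), 2))
```

Both the second and the third series depart from normality, but only the third comes from a nonlinear relationship, which is the point made in the text.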

If we add lags (i.e., factor values from earlier times),
the above model becomes sensitive to the shape 
of the factor paths. For example, a regression 
model with two lags will behave differently if 
the factor is going up or down. Adding lags 
makes models more flexible but more brittle. In 
general, the optimal number of lags is dictated 
not only by the complexity of the patterns that 
we want to model but also by the number of 
points in our sample. If sample data are abun¬ 
dant, we can estimate a rich model. 

Typically there is a trade-off between model 
flexibility and the size of the data sample. By 
adding time lags and nonlinearities, we make 
our models more flexible, but the demands in 
terms of estimation data are greater. An opti¬ 
mal compromise has to be made between the 
flexibility given by nonlinear models and/or 
multiple lags and the limitations due to the size 
of the data sample. 


TIME HORIZON OF MODELS 

There are trade-offs between model flexibility 
and precision that depend on the size of sam¬ 
ple data. To expand our sample data, we would 
like to use data with small time spacing in order 
to multiply the number of available samples. 
High-frequency data or HFD (i.e., data on individual
transactions) have the highest possible
frequency and
are irregularly spaced. To give an idea of the ratio
in terms of numbers, consider that there are
approximately 2,100 ticks per day for the median
stock in the Russell 3000 (see Falkenberry,
2002). Thus the size of the HFD data set of one
day for a typical stock in the Russell 3000 is
2,100 times larger than the size of closing data
for the same day!

In order to exploit all available data, we 
would like to adopt models that work over 
time intervals of the order of minutes and, from 
these models, compute the behavior of finan¬ 
cial quantities over longer periods. Given the 
number of available sample data at high fre¬ 
quency, we could write much more precise laws 
than those established using longer time inter¬ 
vals. Note that the need to compute solutions 
over forecasting horizons much longer than the 
time spacing is a general problem that applies 
at any time interval. For example, in asset allo¬ 
cation we need to understand the behavior of fi¬ 
nancial quantities over long time horizons. The 
question we need to ask is if models estimated 
using daily intervals can correctly capture the 
process dynamics over longer periods, such as 
years. 

It is not necessarily true that models estimated 
on short time intervals, say minutes, offer bet¬ 
ter forecasts at longer time horizons than mod¬ 
els estimated on longer time intervals, say days. 
This is because financial variables might have 
a complex short-term dynamics superimposed 
on a long-term dynamics. It might be that us¬ 
ing high-frequency data, one captures the short¬ 
term dynamics without any improvement in 
the estimation of the long-term dynamics. That 
is, with high-frequency data it might be that
models get more complex (and thus more data- 
hungry) because they describe short-term be¬ 
havior superimposed on long-term behavior. 
This possibility must be resolved for each class 
of models. 

Another question is whether it is possible to use the
same model at different time horizons. To do so
is to imply that the behavior of financial quantities
is similar at different time horizons. This
conjecture was first made by Benoit Mandelbrot
(1963), who observed that long series of
cotton prices were very similar at different time
aggregations.

Model Risk and Model Robustness 

Not only are econometric models probabilis¬ 
tic models, as we have already noted; they are 
only approximate models. That is, the probabil¬ 
ity distributions themselves are only approxi¬ 
mate and uncertain. The theory of model risk and 
model robustness assumes that all parameters of 
a model are subject to uncertainty and attempts 
to determine the consequence of model uncer¬ 
tainty and strategies for mitigating errors. 

The growing use of models in finance has 
heightened the attention to model risk and 
model-risk mitigation techniques. Asset man¬ 
agement firms are beginning to address the 
need to implement methodologies that allow 
both robust estimation and robust optimization 
in the portfolio management process. 

Performance Measurement 
of Models 

It is not always easy to understand ex ante just 
how well (or how poorly) a forecasting model 
will perform. Because performance evaluations 
made on training data are not reliable, the eval¬ 
uation of model performance requires separate 
data sets for training and for testing. Models 
are estimated on training data and tested on 
the test data. Poor performance might be due to
model misspecification, that is, models might
not reflect the true DGP of the data (assuming
one exists), or there might simply be no DGP.

Various measures of model performance have 
been proposed. For example, one can compute 
the correlation coefficient between the fore¬ 
casted variables and their actual realizations. 
Each performance measure is a single number 
and therefore conveys only one aspect of the 
forecasting performance. Often it is crucial to 
understand if errors can become individually 
very large or if they might be correlated. Note 
that a simple measure of model performance 
does not ensure the profitability of strategies. 
This can be due to a number of reasons, includ¬ 
ing, for example, the risk inherent in appar¬ 
ently profitable forecasts, market impact, and 
transaction costs. 
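
The separation of training and test data described above can be sketched in a few lines. The example below is only an illustration on simulated data: the signal strength, the sample size, and the helper names `fit_line` and `correlation` are assumptions, not part of the original text. It estimates a one-predictor model on the first half of the sample, forecasts the second half, and reports two of the measures discussed above: the correlation between forecasts and realizations, and the largest individual error.

```python
import random

def mean(v):
    return sum(v) / len(v)

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def fit_line(x, y):
    """Least squares intercept and slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Simulated data (illustrative only): a weak linear signal buried in noise.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
realized = [0.2 * s + rng.gauss(0.0, 1.0) for s in signal]

# Estimate on the first half (training data), evaluate on the second half (test data).
split = len(signal) // 2
intercept, slope = fit_line(signal[:split], realized[:split])
forecasts = [intercept + slope * s for s in signal[split:]]
errors = [r - f for r, f in zip(realized[split:], forecasts)]

out_of_sample_corr = correlation(forecasts, realized[split:])
largest_abs_error = max(abs(e) for e in errors)
print(f"out-of-sample correlation: {out_of_sample_corr:.3f}")
print(f"largest absolute error:    {largest_abs_error:.3f}")
```

A single summary number such as the correlation says nothing about costs or risk, which is why the text cautions that it does not by itself establish profitability.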

APPLICATIONS 

There has been a greater use of econometric 
models in investment management since the 
turn of the century. Application areas include: 

• Portfolio construction and optimization 

• Risk management 

• Asset and liability management 

Each type of application requires different 
modeling approaches. 

Portfolio Construction and 
Optimization 

Portfolio construction and optimization require 
models to forecast returns: There is no way to es¬ 
cape the need to predict future returns. Passive 
strategies apparently eschew the need to fore¬ 
cast future returns of individual stocks by in¬ 
vesting in broad indexes. They effectively shift 
the need to forecast to a higher level of analysis 
and to longer time horizons. 

Until recently, the mainstream view was
that financial econometric models could perform
dynamic forecasts of volatility but not of
expected returns. However, volatility forecasts
are rarely used in portfolio management. With
the exception of some proprietary applications,
the most sophisticated models used in portfolio
construction until recently were factor models
where forecasts are not dynamic but consist of
estimating a drift (i.e., a constant trend) plus a
variance-covariance matrix.

Since the late 1990s, the possibility of making
dynamic forecasts of both volatility and
expected returns has gained broad acceptance.
During the same period, it became more widely
recognized that returns are not normally distributed,
evidence that had been reported by
Mandelbrot (1963). Higher moments of distributions
are therefore important in portfolio
management, which in turn requires the representation
and estimation of nonnormal distributions.

As observed above, the ability to correctly
forecast expected returns does not imply, per se,
that there are profit opportunities. In fact, we
have to take into consideration the interplay between
expected returns, higher moments, and
transaction costs. As dynamic forecasts typically
involve higher portfolio turnover, transaction
costs might wipe out profits. As a general
comment, portfolio management based on dynamic
forecasts calls for a more sophisticated
framework for optimization and risk management
than portfolio management
based on static forecasts.

Regression models appear to form the core of
the modeling efforts to predict future returns
at many asset management firms. Regression
models regress returns on a number of predictors.
Stated otherwise, future returns are a
function of the value of present and past predictors.
Predictors include financial ratios such
as earnings-to-price ratio or book-to-price ratio
and other fundamental quantities; predictors
might also include behavioral variables such
as market sentiment. A typical formula of a
regressive model is the following:

$$r_{i,t+1} = \alpha_i + \sum_{j=1}^{s} \beta_{ij} f_{j,t} + \varepsilon_{i,t+1} \tag{4}$$

where

$$r_{i,t+1} = \frac{P_{i,t+1} - P_{i,t}}{P_{i,t}}$$

is the return at time $t+1$ of the $i$-th asset, and the
$f_{j,t}$ are factors observed at time $t$. While regressions
are generally linear, nonlinear models are
also used.
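
The time alignment in regression (4), where factors observed at time t explain the return realized at time t+1, is easy to get wrong in practice. The following minimal sketch uses one simulated factor and one asset; the parameter values and variable names are assumptions chosen for illustration. It builds returns from a price series, pairs each return with the previous period's factor value, and estimates the intercept and slope by least squares.

```python
import random

rng = random.Random(1)
T = 500
alpha_true, beta_true = 0.001, 0.05   # hypothetical parameters used for the simulation

# Simulate one factor f_t and a price path whose next-period return depends on f_t.
factor = [rng.gauss(0.0, 1.0) for _ in range(T)]
prices = [100.0]
for t in range(T - 1):
    r_next = alpha_true + beta_true * factor[t] + rng.gauss(0.0, 0.02)
    prices.append(prices[-1] * (1.0 + r_next))

# returns[t] = (P_{t+1} - P_t) / P_t, i.e. the return realized at time t+1.
returns = [(prices[t + 1] - prices[t]) / prices[t] for t in range(T - 1)]
x = factor[: T - 1]   # f_t, known at time t
y = returns           # r_{t+1}, realized one period later

# Least squares estimates for a single predictor.
mx, my = sum(x) / len(x), sum(y) / len(y)
beta_hat = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha_hat = my - beta_hat * mx

# One-step-ahead forecast from the most recent factor observation.
forecast = alpha_hat + beta_hat * factor[-1]
print(f"alpha_hat={alpha_hat:.4f}  beta_hat={beta_hat:.4f}  forecast={forecast:.4f}")
```

With several factors the same alignment applies; only the least squares step changes, from a ratio of sums to the solution of a linear system.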

In general, the forecasting horizon in asset 
management varies from a few days for actively 
managed or hedge funds to several weeks for 
more traditionally managed funds. Dynamic 
models typically have a short forecasting hori¬ 
zon as they capture short-term dynamics. This 
contrasts with static models, such as the widely 
used multifactor models, which tend to cap¬ 
ture long-term trends and ignore short-term 
dynamics. 

The evolution of forecasting models over the 
last two decades has also changed the way fore¬ 
casts are used. A basic utilization of forecasts is 
in stock picking/ranking systems, which have 
been widely implemented at asset management 
firms. The portfolio manager builds his or her 
portfolio combining the model ranking with the 
manager's personal views and within the con¬ 
straints established by the firm. A drawback 
in using such an approach is the difficulty in 
properly considering the structure of correla¬ 
tions and the role of higher moments. 

Alternatively, forecasts can be fed to an opti¬ 
mizer that automatically computes the portfolio 
weights. But because an optimizer implements 
an optimal trade-off between returns and some 
measure of risk, the forecasting model must 
produce not only returns forecasts but also mea¬ 
sures of risk. If risk is measured by portfolio 
variance or standard deviation, the forecasting 
model must be able to provide an estimated 
variance-covariance matrix. 

Estimating the variance-covariance matrix is 
the most delicate of the estimation tasks. Here 
is why. The number of entries of a variance- 
covariance matrix grows with the square of the 
number of stocks. As a consequence, the num¬ 
ber of entries in a variance-covariance matrix 
rapidly becomes very large. For example, the
variance-covariance matrix of the stocks in the 
S&P 500 is a symmetric matrix that includes 
some 125,000 entries. If our universe were the 
Russell 5000, the variance-covariance matrix 
would include more than 12 million entries. 
The problem with estimating matrices of this 
size is that estimates are very noisy because the 
number of sample data is close to the number 
of parameters to estimate. For example, if we 
use three years of data for estimation, we have, 
on average, less than three data points per es¬ 
timated entry in the case of the S&P 500; in the 
case of the Russell 5000, the number of data 
points would be one fourth of the number of 
entries to estimate! Robust estimation methods 
are called for. 
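
The arithmetic behind these figures can be reproduced directly. The sketch below counts the distinct entries of a symmetric matrix, N(N+1)/2, and divides the number of observed returns by that count. The figure of 250 trading days per year is an assumption, so the ratios are approximate; under it, the 5,000-stock case comes out near 0.3, the same order of magnitude as the one fourth quoted above.

```python
def distinct_entries(n_assets):
    """Distinct variances and covariances in a symmetric n x n matrix."""
    return n_assets * (n_assets + 1) // 2

def data_points_per_entry(n_assets, n_periods):
    """Observed returns available per distinct matrix entry."""
    return n_assets * n_periods / distinct_entries(n_assets)

TRADING_DAYS_3Y = 3 * 250   # assumption: about 250 daily observations per year

for n in (500, 5000):
    print(n, distinct_entries(n), round(data_points_per_entry(n, TRADING_DAYS_3Y), 2))
```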

Note that if we use forecasting mod¬ 
els we typically have (1) an equilibrium 
variance-covariance matrix that represents 
the covariances of the long-run relationships 
between variables, plus (2) a short-term, 
time-dependent, variance-covariance matrix. If 
returns are not normally distributed, optimizers 
might require the matrix of higher moments. 

A third utilization of forecasting models and 
optimizers is to construct model portfolios. In 
other words, the output of the optimizer is used 
to construct not an actual but a model portfolio. 
This model portfolio is used as input by portfo¬ 
lio managers. 


Risk Management 

Risk management has different meanings in dif¬ 
ferent contexts. In particular, when optimiza¬ 
tion is used, risk management is intrinsic to the 
optimization process, itself a risk-return trade¬ 
off optimization. In this case, risk management 
is an integral part of the portfolio construction 
process. 

However, in most cases, the process of constructing
portfolios is entrusted to human portfolio
managers who might use various inputs
including, as noted above, ranking systems
or model portfolios. In these cases, portfolios
might not be optimal from the point of view
of risk management and it is therefore necessary
to ensure independent risk oversight. This
oversight might take various forms. One form is
similar to the type of risk oversight adopted by
banks. The objective is to assess potential deviations
from expectations. In order to perform
this task, the risk manager receives as input
the composition of portfolios and makes return
projections using static forecasting models.

Another form of risk oversight, perhaps
the most widespread in portfolio management,
assesses portfolio exposures to specific risk
factors. As portfolio management is often
performed relative to a benchmark and risk is
defined as underperformance relative to the
benchmark, it is important to understand the
sensitivity of portfolios to different risk factors.
This type of risk oversight does not entail the
forecasting of returns. The risk manager uses
various statistical techniques to estimate how
portfolios move as a function of different risk factors.
In most cases, linear regressions are used.
A typical model will have the following form:

$$r_{i,t} = \sum_{j=1}^{s} \beta_{ij} f_{j,t} + \varepsilon_{i,t} \tag{5}$$

where

$$r_{i,t} = \frac{P_{i,t} - P_{i,t-1}}{P_{i,t-1}}$$

is the return observed at time $t$ of the $i$-th asset,
and the $f_{j,t}$ are factors observed at time $t$.
Note that this model is fundamentally different
from a regressive model with time lags as given
by (4).
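
The difference between the two models is one of time alignment, as the following sketch on simulated data suggests. The exposure of 1.2, the volatilities, and the helper `ols_slope` are assumptions made for illustration, and the helper demeans the data, so an intercept is implicitly included. The asset moves with the factor in the same period, so the same-period regression recovers the exposure, while the lagged regression finds essentially nothing because the simulated factor carries no information about the next period.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x (data are demeaned, so an intercept is implied)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

rng = random.Random(2)
T = 2000
factor = [rng.gauss(0.0, 0.01) for _ in range(T)]            # f_t
asset = [1.2 * f + rng.gauss(0.0, 0.005) for f in factor]    # r_t, driven by f_t

exposure = ols_slope(factor, asset)               # same-period alignment, as in (5)
predictive = ols_slope(factor[:-1], asset[1:])    # f_t against r_{t+1}, as in (4)
print(f"exposure (same period): {exposure:.3f}")
print(f"predictive (lagged):    {predictive:.3f}")
```

A large estimated exposure therefore describes how the portfolio co-moves with a risk factor; it is not a forecast of returns.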

Asset-Liability Management 

Asset-liability management (ALM) is typical of 
those asset management applications that re¬ 
quire the optimization of portfolio returns at 
some fixed time horizon plus a stream of con¬ 
sumption throughout the entire life of the port¬ 
folio. ALM is important for managing portfolios 
of institutional investors such as pension funds
or foundations. It is also important for wealth 
management, where the objective is to cover 
the investor's financial needs over an extended 
period. 

ALM requires forecasting models able to cap¬ 
ture the asset behavior at short-, medium-, and 
long-term time horizons. Models of the long¬ 
term behavior of assets exist but are clearly diffi¬ 
cult to test. Important questions related to these 
long-term forecasting models include: 

• Do asset prices periodically revert to one or 
many common trends in the long run? 

• Can we assume that the common trends (if 
they exist) are deterministic trends such as 
exponentials or are common trends stochastic 
(i.e., random) processes? 

• Can we recognize regime shifts over long pe¬ 
riods of time? 


KEY POINTS 

• Financial econometrics employs the same ba¬ 
sic principles for formulating quantitative 
laws that characterized the development of 
quantitative science. 

• Although there are similarities between fi¬ 
nancial econometric models and models of 
the physical sciences, important differences 
exist. For example, physical sciences seek 
immutable laws of nature, while economet¬ 
ric models model the economy or finan¬ 
cial markets, which are artifacts subject to 
change. 

• Econometric models are mathematical rela¬ 
tionships between different variables at dif¬ 
ferent times, and every econometric model is 
an approximation. 

• Both continuous-time and discrete-time mod¬ 
els are used in financial econometrics. 

• Static models express relationships between
different variables at any given time. Because
they involve only one instant in time, static
models cannot be used to make forecasts;
forecasting requires models that link variables
at two or more instants in time.

• Dynamic models involve one or more vari¬ 
ables at two or more points in time; the data 
generating process in dynamic models is the 
mathematical relationship between variables 
at different times. 

• Applying financial econometrics involves 
three key steps: (1) model selection, (2) model 
estimation, and (3) model testing. 

• In model selection, the modeler selects the 
model based on an assessment of a model's 
properties and its fit to economic theory. 

• Estimation provides the link between reality 
and models. In model estimation, the mod¬ 
eler applies financial econometric techniques 
to estimate the model's free parameters from 
sample data. 

• The evaluation of model performance re¬ 
quires separate data sets for training and 
for testing because performance evaluations 
made on training data are not reliable. 

• In investment management there has been 
increased use of econometric models in 
portfolio construction and optimization, 
risk management, and asset and liability 
management. A different modeling approach 
is needed for each type of application. 

Abstract: The tools of financial econometrics play an important role in financial model building. The
most basic tool in financial econometrics is regression analysis. The purpose of regression analysis
is to estimate the relationship between a random variable and one or more independent variables. To
understand and apply regression analysis one must understand the theory and the methodologies
for estimating the parameters of the regression model. Moreover, when the assumptions underlying
the model are violated, it is necessary to know how to remedy the problem.


Our first basic tool in econometrics is regression
analysis. In regression analysis, we estimate the
relationship between a random variable Y and
one or more variables $X_i$. The variables $X_i$ can be
either deterministic variables or random variables.
The variable Y is said to be the dependent
variable because its value is assumed to
be dependent on the value of the $X_i$'s. The $X_i$'s
are referred to as the independent variables, regressor
variables, or explanatory variables. Our
primary focus is on the linear regression model.
We will be more precise about what we mean
by a "linear" regression model later in this entry.
Let's begin with a discussion of the concept
of dependence.


THE CONCEPT OF 
DEPENDENCE 

Regressions are about dependence between vari¬ 
ables. In this section we provide a brief discus¬ 
sion of how dependence is represented in both a 
deterministic setting and a probabilistic setting. 
In a deterministic setting, the concept of depen¬ 
dence is embodied in the mathematical notion 
of function. A function is a correspondence be¬ 
tween the individuals of a given domain A and 
the individuals of a given range B. In particular, 
numerical functions establish a correspondence 
between numbers in a domain A and numbers 
in a range B. 

In quantitative science, we work with variables
obtained through a process of observation
or measurement. For example, price is the observation
of a transaction, time is the reading of
a clock, position is determined with measurements
of the coordinates, and so on. In quantitative
science, we are interested in numerical
functions $y = f(x_1, \ldots, x_n)$ that link the results of
measurements so that by measuring the independent
variables $(x_1, \ldots, x_n)$ we can predict the
value of the dependent variable $y$. Being the results
of measurements, variables are themselves
functions that link a set $\Omega$ of unobserved "states
of the world" to observations. Different states of
the world result in different values for the variables
but the link among the variables remains
constant. For example, a column of mercury in
a thermometer is a physical object that can be in
different "states." If we measure the length and
the temperature of the column (in steady conditions),
we observe that the two measurements
are linked by a well-defined (approximately linear)
function. Thus, by measuring the length,
we can predict the temperature.

In order to model uncertainty, we keep the
logical structure of variables as real-valued
functions defined on a set $\Omega$ of unknown states
of the world. However, we add to the set $\Omega$ the
structure of a probability space. A probability
space is a triple formed by a set of individuals
(the states of the world), a structure of events,
and a probability function: $(\Omega, \Im, P)$. Random
variables represent measurements as in the deterministic
case, but with the addition of a probability
structure that represents uncertainty. In
financial econometrics, a "state of the world"
should be understood as a complete history of the
underlying economy, not as an instantaneous
state.

Our objective is to represent dependence be¬ 
tween random variables, as we did in the deter¬ 
ministic case, so that we can infer the value of 
one variable from the measurement of the other. 
In particular, we want to infer the future values 
of variables from present and past observations. 
The probabilistic structure offers different possibilities.
For simplicity, let's consider only two
variables X and Y; our reasoning extends immediately
to multiple variables. The first case
of interest is the case when the dependent vari¬ 
able Y is a random variable while the indepen¬ 
dent variable X is deterministic. This situation 
is typical of an experimental setting where we 
can fix the conditions of the experiment while 
the outcome of the experiment is uncertain. 

In this case, the dependent variable Y has to
be thought of as a family of random variables
$Y_x$, all defined on the same probability space
$(\Omega, \Im, P)$, indexed with the independent variable
$x$. Dependence means that the probability
distribution of the dependent random variable
depends on the value of the deterministic independent
variable. To represent this dependence
we use the notation $F(y|x)$ to emphasize the fact
that $x$ enters as a parameter in the distribution.
An obvious example is the dependence of a
price random variable on a time variable in a
stochastic price process.

In this setting, where the independent variable
is deterministic, the distributions $F(y|x)$ can
be arbitrarily defined. Important for the discussion
of linear regressions in this entry is the
case when the shape of the distribution $F(y|x)$
remains constant and only the mean of the distribution
changes as a function of $x$.

Consider now the case where both X and Y
are random variables. For example, Y might be
the uncertain price of IBM stock tomorrow and
X the uncertain level of the S&P 500 tomorrow.
One way to express the link between these
two variables is through their joint distribution
$F(x,y)$ and, if it exists, their joint density $f(x,y)$.
We define the joint and marginal distributions
as follows:

$$F_{XY}(x,y) = P(X \le x, Y \le y), \quad F_X(x) = P(X \le x), \quad F_Y(y) = P(Y \le y)$$

$$F_{XY}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v)\,dv\,du$$

$$F_X(x) = \int_{-\infty}^{x} \left( \int_{-\infty}^{+\infty} f(u,v)\,dv \right) du = \int_{-\infty}^{x} f_X(u)\,du$$

$$F_Y(y) = \int_{-\infty}^{y} \left( \int_{-\infty}^{+\infty} f(u,v)\,du \right) dv = \int_{-\infty}^{y} f_Y(v)\,dv$$

We will also use the short notation:

$$f_X(x) = f(x), \quad f_Y(y) = f(y), \quad f_{X|Y}(x|y) = f(x|y), \quad f_{Y|X}(y|x) = f(y|x)$$

Given a joint density $f(x,y)$, we can also represent
the functional link between the two variables
as the dependence of the distribution of
one variable on the value assumed by the other
variable. In fact, we can write the joint density
$f(x,y)$ as the product of two factors, the conditional
density $f(y|x)$ and the marginal density
$f_X(x)$:

$$f(x,y) = f(y|x) f_X(x) \tag{1}$$

This factorization —that is, expressing a joint 
density as a product of a marginal density and 
a conditional density—is the conceptual basis 
of financial econometrics. There are significant 
differences in cases where both variables X and 
Y are random variables, compared to the case 
where the variable X is deterministic. First, as 
both variables are uncertain, we cannot fix the 
value of one variable as if it were independent. 
We have to adopt a framework of conditioning 
where our knowledge of one variable influences 
our knowledge of the other variable. 
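
The factorization of a joint density into a conditional and a marginal density can be checked on a small discrete example, where densities become probabilities. The joint probabilities below are hypothetical values chosen only for illustration. The sketch computes the marginal of X, derives the conditional probabilities of Y given X, and confirms that their product rebuilds the joint table.

```python
# Hypothetical joint probabilities P(X = x, Y = y) for two discrete variables.
joint = {
    ("down", "down"): 0.30, ("down", "up"): 0.10,
    ("up", "down"): 0.15,   ("up", "up"): 0.45,
}

# Marginal f_X(x): sum the joint probabilities over y.
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p

# Conditional f(y | x) = f(x, y) / f_X(x), defined where f_X(x) > 0.
conditional = {(x, y): p / marginal_x[x] for (x, y), p in joint.items()}

# Recombine: f(x, y) = f(y | x) * f_X(x).
rebuilt = {(x, y): conditional[(x, y)] * marginal_x[x] for (x, y) in joint}

for key in joint:
    print(key, joint[key], round(conditional[key], 4), round(rebuilt[key], 4))
```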

The impossibility of making experiments is 
a major issue in econometrics. In the physical 
sciences, the ability to create the desired ex¬ 
perimental setting allows the scientist to isolate 
the effects of single variables. The experimenter 
tries to create an environment where the effects 
of variables other than those under study are 
minimized. In economics, however, all the vari¬ 
ables change together and cannot be controlled. 
Back in the 1950s, there were serious doubts 
that econometrics was possible. In fact, it was 
believed that estimation required the indepen¬ 
dence of samples while economic samples are 
never independent. 


However, the framework of conditioning ad¬ 
dresses this problem. After conditioning, the 
joint densities of a process are factorized into 
initial and conditional densities that behave 
as independent distributions. An econometric 
model is a probe that extracts independent 
samples—the noise terms—from highly depen¬ 
dent variables. 

Let's briefly see, at the heuristic level, how
conditioning works. Suppose we learn that the
random variable X has the value x, that is,
X = x. Recall that X is a random variable that
is a real-valued function defined over the set
$\Omega$. If we know that X = x, we do not know the
present state of the world but we do know that
it must be in the subspace $\{\omega \in \Omega: X(\omega) = x\}$.
We call (Y|X = x) the variable Y defined on this
subspace. If we let x vary, we create a family
of random variables defined on the family of
subspaces $\{\omega \in \Omega: X(\omega) = x\}$ and indexed by the
value assumed by the variable X.

It can be demonstrated that the sets $\{\omega \in \Omega: X(\omega) = x\}$
can be given a structure of probability
space, that the variables (Y|X = x) are indeed
random variables on these probability spaces,
and that they have (if they exist) the conditional
densities:

$$f(y|x) = \frac{f(x,y)}{f_X(x)} \tag{2}$$

for $f_X(x) > 0$. In the discrete setting we can write

$$f(y|x) = P(Y = y \mid X = x)$$

$$f(x,y) = P(X = x, Y = y)$$

The conditional expectation E[Y|X = x] is the 
expectation of the variable (Y|X = x). Con¬ 
sider the previous example of the IBM stock 
price tomorrow and of the S&P 500 level to¬ 
morrow. Both variables have unconditional ex¬ 
pectations. These are the expectations of IBM's 
stock tomorrow and of S&P 500's level tomor¬ 
row considering every possible state of the 
world. However, we might be interested in 
computing the expected value of IBM's stock 
price tomorrow if we know S&P 500's value to¬ 
morrow. This is the case if, for example, we are 
creating scenarios based on S&P 500's value. 
If we know the level of the S&P 500, we do not
know the present state of the world but we do
know the subset of states of the world to which
the present state of the world belongs. If we only know
the value of the S&P 500, IBM's stock price is not 
known because it is different in each state that 
belongs to this restricted set. IBM's stock price 
is a random variable on this restricted space and 
we can compute its expected value. 

If we consider a discrete setting, that is, if
we consider only a discrete set of possible IBM
stock prices and S&P 500 values, then the computation
of the conditional expectation can be
performed using the standard definition of conditional
probability. In particular, the conditional
expectation of a random variable Y given
the event B is equal to the unconditional expectation
of the variable Y set to zero outside of B
and divided by the probability of B: $E[Y|B] = E[1_B Y]/P(B)$,
where $1_B$ is the indicator function
of the set B, equal to 1 for all elements of B, zero
elsewhere. Thus, in this example,

$$E[\text{IBM stock price} \mid \text{S\&P 500 value} = s] = \frac{E[1_{(\text{S\&P 500 value} = s)} (\text{IBM stock price})]}{P(\text{S\&P 500 value} = s)}$$

However, in a continuous-state setting there is 
a fundamental difficulty: The set of states of 
the world corresponding to any given value of 
the S&P 500 has probability zero; therefore we 
cannot normalize dividing by P(B). As a conse¬ 
quence we cannot use the standard definition 
of conditional probability to compute directly 
the conditional expectation. 
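
The discrete-case formula, the expectation of Y set to zero outside B divided by P(B), can be sketched as follows. The states of the world, index levels, and prices are hypothetical numbers chosen for illustration. The function refuses to divide when the conditioning event has probability zero, which is exactly the obstacle that arises for every single value in the continuous case.

```python
# Hypothetical equally likely states of the world: (S&P 500 level, IBM price).
states = [
    (4000, 120.0), (4000, 130.0),
    (4100, 135.0), (4100, 145.0), (4100, 140.0),
    (4200, 150.0),
]
prob = 1.0 / len(states)

def conditional_expectation(level):
    """E[IBM | S&P = level], computed as E[1_B * IBM] / P(B)."""
    p_b = sum(prob for s, _ in states if s == level)
    if p_b == 0.0:
        raise ValueError("conditioning event has probability zero")
    e_indicator_y = sum(prob * ibm for s, ibm in states if s == level)
    return e_indicator_y / p_b

for level in (4000, 4100, 4200):
    print(level, conditional_expectation(level))
```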

To overcome this difficulty, we define the
conditional expectation indirectly, using only
unconditional expectations. We define the conditional
expectation of IBM's stock price given
the S&P 500 level as that variable that has the
same unconditional expectation as IBM's stock
price on each set that can be identified by the
value of the S&P 500. This is a random variable
which is uniquely defined for each state of the
world up to a set of probability zero.


If the conditional density exists, conditional
expectation is computed as follows:

$$E[Y|X = x] = \int_{-\infty}^{+\infty} y f(y|x)\,dy \tag{3}$$

We know from probability theory that the law
of iterated expectations holds

$$E[E[Y|X]] = E[Y] \tag{4}$$

and that the following relationship also holds

$$E[XY] = E[XE[Y|X]] \tag{5}$$

Rigorously proving all these results requires
a considerable body of mathematics and the
rather difficult language and notation of $\sigma$-algebras.
However, the key ideas should be sufficiently
clear.
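
Although the general proofs are demanding, the law of iterated expectations and the relationship E[XY] = E[X E[Y|X]] can be verified by direct computation in a discrete setting. The joint probabilities below are hypothetical and serve only as a numerical check.

```python
# Hypothetical joint probabilities P(X = x, Y = y) on a small grid.
joint = {(1, 10): 0.2, (1, 20): 0.2, (2, 10): 0.1, (2, 30): 0.5}

# Marginal probabilities of X.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# E[Y | X = x] for each x, using the conditional probabilities f(y | x).
cond_mean = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
    for x in p_x
}

e_y = sum(y * p for (_, y), p in joint.items())          # E[Y]
e_xy = sum(x * y * p for (x, y), p in joint.items())     # E[XY]

iterated = sum(cond_mean[x] * p_x[x] for x in p_x)       # E[E[Y|X]]
weighted = sum(x * cond_mean[x] * p_x[x] for x in p_x)   # E[X E[Y|X]]

print(e_y, iterated)
print(e_xy, weighted)
```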

What is the bearing of the above on the dis¬ 
cussion of regressions in this entry? Regressions 
have a twofold nature: They can be either (1) the 
representation of dependence in terms of con¬ 
ditional expectations and conditional distribu¬ 
tions or (2) the representation of dependence of 
random variables on deterministic parameters. 
The above discussion clarifies the probabilistic 
meaning of both. 


REGRESSIONS AND LINEAR 
MODELS 

In this section we discuss regressions and, in 
particular, linear regressions. 

Case Where All Regressors Are 
Random Variables 

Let's start our discussion of regression with the
case where all regressors are random variables.
Given a set of random variables $X = (Y, X_1, \ldots, X_N)'$
with a joint probability density $f(y, x_1, \ldots, x_N)$,
consider the conditional expectation of Y
given the other variables $(X_1, \ldots, X_N)'$:

$$E[Y|X_1, \ldots, X_N]$$

As we saw in the previous section, the conditional
expectation is a random variable. We can
therefore consider the residual:

$$\varepsilon = Y - E[Y|X_1, \ldots, X_N]$$

The residual is another random variable defined
over the set $\Omega$. We can rewrite the above
equation as a regression equation:

$$Y = E[Y|X_1, \ldots, X_N] + \varepsilon \tag{6}$$

The deterministic function $y = \varphi(z)$ where

$$y = \varphi(z) = E[Y|X_1 = z_1, \ldots, X_N = z_N] \tag{7}$$

is called the regression function.

The following properties of regression equations
hold.

Property 1. The conditional mean of the residual is
zero: $E[\varepsilon|X_1, \ldots, X_N] = 0$. In fact, taking conditional
expectations on both sides of equation
(6), we can write

$$E[Y|X_1, \ldots, X_N] = E[E[Y|X_1, \ldots, X_N]|X_1, \ldots, X_N] + E[\varepsilon|X_1, \ldots, X_N]$$

Because

$$E[E[Y|X_1, \ldots, X_N]|X_1, \ldots, X_N] = E[Y|X_1, \ldots, X_N]$$

is a property that follows from the law of
iterated expectations, we can conclude that
$E[\varepsilon|X_1, \ldots, X_N] = 0$.

Property 2. The unconditional mean of the residual
is zero: $E[\varepsilon] = 0$. This property follows immediately
from the multivariate formulation of the
law of iterated expectations (4): $E[E[Y|X_1, \ldots, X_N]] = E[Y]$.
In fact, taking expectations of both
sides of equation (6) we can write

$$E[Y] = E[E[Y|X_1, \ldots, X_N]] + E[\varepsilon]$$

hence $E[\varepsilon] = 0$.

Property 3. The residuals are uncorrelated with
the variables $X_1, \ldots, X_N$: $E[\varepsilon X] = 0$. This follows
from equation (5) by multiplying both sides of
equation (6) by $X_1, \ldots, X_N$ and taking expectations.
Note, however, that the residuals are not
necessarily independent of the regressors X.

If the regression function is linear, we can
write the following linear regression equation:

$$Y = \alpha + \sum_{i=1}^{N} b_i X_i + \varepsilon \tag{8}$$

and the following linear regression function:

$$y = \alpha + \sum_{i=1}^{N} b_i x_i \tag{9}$$

The rest of this entry deals with linear regressions.
If the vector $Z = (Y, X_1, \ldots, X_N)'$ is jointly
normally distributed, then the regression function
is linear. To see this, partition $Z$, the vector
of means $\mu$, and the covariance matrix $\Sigma$ conformably
in the following way:

$$Z = \begin{pmatrix} Y \\ X \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \Sigma_{xx} \end{pmatrix}$$

where $\mu$ is the vector of means and $\Sigma$ is the covariance
matrix. It can be demonstrated that the
conditional density (Y|X = x) has the following
expression:

$$(Y|X = x) \sim N(\alpha + \beta'x, \sigma^2) \tag{10}$$

where

$$\beta = \Sigma_{xx}^{-1} \sigma_{xy}, \quad \alpha = \mu_y - \beta'\mu_x, \quad \sigma^2 = \sigma_{yy} - \sigma_{yx} \Sigma_{xx}^{-1} \sigma_{xy} \tag{11}$$

The regression function can be written as follows:

$$y = \alpha + \beta'x, \quad \text{or explicitly:} \quad y = \alpha + \sum_{i=1}^{N} \beta_i x_i \tag{12}$$
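
As a numerical illustration of the jointly normal case, the sketch below computes the regression coefficients from a partitioned mean vector and covariance matrix, using the standard formulas β = Σ_xx⁻¹σ_xy, α = μ_y − β'μ_x, and σ² = σ_yy − σ_yx Σ_xx⁻¹σ_xy. The moments are hypothetical values for one dependent variable and two regressors, and the 2 × 2 inverse is written out explicitly to keep the example free of external libraries.

```python
# Hypothetical moments of Z = (Y, X1, X2)'.
mu_y, mu_x = 0.05, [0.02, 0.03]
sigma_yy = 0.04
sigma_xy = [0.012, 0.010]            # Cov(X_i, Y)
sigma_xx = [[0.020, 0.004],
            [0.004, 0.010]]          # Cov(X_i, X_j)

# Invert the 2 x 2 block Sigma_xx explicitly.
(a, b), (c, d) = sigma_xx
det = a * d - b * c
inv_xx = [[d / det, -b / det], [-c / det, a / det]]

# Coefficients of the conditional distribution of Y given X = x.
beta = [sum(inv_xx[i][j] * sigma_xy[j] for j in range(2)) for i in range(2)]
alpha = mu_y - sum(bi * mi for bi, mi in zip(beta, mu_x))
cond_var = sigma_yy - sum(sigma_xy[i] * beta[i] for i in range(2))

def regression_function(x):
    """E[Y | X = x] = alpha + beta'x."""
    return alpha + sum(bi * xi for bi, xi in zip(beta, x))

print("beta:", [round(v, 4) for v in beta])
print("alpha:", round(alpha, 4), " conditional variance:", round(cond_var, 5))
```

Two sanity checks follow from the formulas: the regression function evaluated at the mean of X returns the mean of Y, and the conditional variance is smaller than the unconditional variance of Y.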

The normal distribution is not the only
joint distribution that yields linear regressions.
Spherical and elliptical distributions also yield
linear regressions. Spherical distributions extend
the multivariate normal distribution $N(0, I)$
(i.e., the joint distribution of independent normal
variables). Spherical distributions are characterized
by the property that their density is
constant on a sphere, so that their joint density
can be written as

$$f(x_1, \ldots, x_N) = g(x_1^2 + \cdots + x_N^2)$$

for some function $g$.

Spherical distributions have the property that
their marginal distributions are uncorrelated
but not independent, and can be viewed as
multivariate normal random variables with a
random covariance matrix. An example of a
spherical distribution used in financial econometrics
is the multivariate t-distribution with
$m$ degrees of freedom, whose density has the
following form:

$$f(x_1, \ldots, x_N) = C \left[ 1 + \frac{1}{m}(x_1^2 + \cdots + x_N^2) \right]^{-\frac{m+N}{2}}$$

The multivariate t-distribution is important
in econometrics for several reasons. First,
some sampling distributions are actually
t-distributions. Second, the t-distribution
has proved to be an adequate description of fat-tailed
error terms in some econometric models
(although not as good as the stable Paretian distribution).
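
The description of spherical distributions as normal variables with a random covariance matrix can be illustrated by simulation. The sketch below draws a bivariate t-distributed vector as a pair of independent standard normals divided by a common random scale, the square root of a chi-square variable over its degrees of freedom. The degrees of freedom, sample size, and seed are arbitrary choices made for illustration. The two components should show a linear correlation near zero, while their squares remain positively correlated, which is the sense in which they are uncorrelated but not independent.

```python
import random

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(3)
m, n = 12, 200_000            # degrees of freedom and sample size (both arbitrary)
x1, x2 = [], []
for _ in range(n):
    # chi-square(m) / m, drawn as a gamma variate with shape m/2 and scale 2
    w = rng.gammavariate(m / 2.0, 2.0) / m
    scale = w ** -0.5         # common random scale shared by both components
    x1.append(rng.gauss(0.0, 1.0) * scale)
    x2.append(rng.gauss(0.0, 1.0) * scale)

linear_corr = corr(x1, x2)
squared_corr = corr([v * v for v in x1], [v * v for v in x2])
print(f"correlation of components: {linear_corr:.4f}")
print(f"correlation of squares:    {squared_corr:.4f}")
```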

Elliptical distributions generalize the multivariate
normal distribution $N(0, \Sigma)$. (See
Bradley and Taqqu [2003].) Because they are
constant on an ellipsoid, their joint density can
be written as

$$f(x) = g((x - \mu)' \Sigma (x - \mu)), \quad x' = (x_1, \ldots, x_N)$$

where $\mu$ is a vector of constants and $\Sigma$ is a
strictly positive-definite matrix. Spherical distributions
are a subset of elliptical distributions.
Conditional distributions and linear combinations
of elliptical distributions are also elliptical.

The fact that elliptical distributions yield linear
regressions is closely related to the fact that
the linear correlation coefficient is a meaningful
measure of dependence only for elliptical
distributions. There are distributions that do
not factorize as linear regressions. The linear
correlation coefficient is not a meaningful measure
of dependence for these distributions. The
copula function of a given random vector $X = (X_1, \ldots, X_N)'$
completely describes the dependence
structure of the joint distribution of random
variables $X_i$, $i = 1, \ldots, N$. (See Embrechts,
McNeil, and Straumann [2002].)

Linear Models and Linear 
Regressions 

Let's now discuss the relationship between linear
regressions and linear models. In applied
work, we are given a set of multivariate data
that we want to explain through a model of
their dependence. Suppose we want to explain
the data through a linear model of the type:

$$Y = \alpha + \sum_{i=1}^{N} \beta_i X_i + \varepsilon$$

We might know from theoretical reasoning
that linear models are appropriate or we might
want to try a linear approximation to nonlinear
models. A linear model such as the above
is not, per se, a linear regression unless we
apply appropriate constraints. In fact, linear regressions
must satisfy the three properties mentioned
above. We call linear regressions those linear
models of the above type that satisfy the following
set of assumptions, so that

$$\alpha + \sum_{i=1}^{N} \beta_i X_i$$

is the conditional expectation of Y.

Assumption 1. The conditional mean of the residual
is zero: $E[\varepsilon|X_1, \ldots, X_N] = 0$.

Assumption 2. The unconditional mean of the
residual is zero: $E[\varepsilon] = 0$.

Assumption 3. The correlation between the residuals
and the variables $X_1, \ldots, X_N$ is zero:
$E[\varepsilon X] = 0$.
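
The distinction between a linear model and a linear regression can be made concrete with a simulated example; all numbers below are assumptions chosen for illustration. The data are generated from a linear model with intercept 1 and slope 2 whose error term depends on X, so the third assumption fails for those stated coefficients. Least squares instead recovers the coefficients of the conditional expectation (a slope near 2.5 here), and its residuals satisfy the sample counterparts of the second and third assumptions by construction.

```python
import random

def mean(v):
    return sum(v) / len(v)

rng = random.Random(4)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# A linear model Y = 1 + 2 X + e whose error e = 0.5 X + noise depends on X.
e = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

# Sample analogue of E[eX] for the stated coefficients: clearly nonzero.
moment_stated = mean([ei * xi for ei, xi in zip(e, x)])

# Least squares coefficients and their residuals.
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

resid_mean = mean(resid)                                     # sample analogue of E[e]
moment_ols = mean([ri * xi for ri, xi in zip(resid, x)])     # sample analogue of E[eX]

print(f"stated model:  mean(e*X) = {moment_stated:.3f}")
print(f"least squares: slope = {slope:.3f}, mean(resid) = {resid_mean:.1e}, "
      f"mean(resid*X) = {moment_ols:.1e}")
```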

The above set of assumptions is not the full set of assumptions
used when estimating a linear model as a regression but only
consistency conditions to interpret a linear model as a
regression. We will introduce additional assumptions relative to
how the model is sampled in the section on estimation. Note that
the linear regression equation does not fully specify the joint
conditional distribution of the dependent variables and the
regressors. (This is a rather subtle point related to the concept
of exogeneity of variables. See Hendry [1995] for a further
discussion.)
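
One step may be worth spelling out: Assumption 1 is the strongest
of the three, because the law of iterated expectations lets
Assumptions 2 and 3 be derived from it. A brief sketch, for each
regressor $X_i$:

$$E[\varepsilon] = E\big[E[\varepsilon \mid X_1, \ldots, X_N]\big] = E[0] = 0$$

$$E[\varepsilon X_i] = E\big[X_i \, E[\varepsilon \mid X_1, \ldots, X_N]\big] = E[X_i \cdot 0] = 0, \qquad i = 1, \ldots, N$$

The converse does not hold in general: a residual can be
uncorrelated with the regressors and still have a conditional mean
that depends on them.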

Case Where Regressors Are 
Deterministic Variables 

In many applications of interest to the financial modeler, the
regressors are deterministic variables. Conceptually, regressions
with deterministic regressors are different from cases where
regressors are random variables. In particular, as we have seen in
a previous section, one cannot consider the regression as a
conditional expectation. However, we can write a linear regression
equation:

$$Y = \alpha + \sum_{i=1}^{N} \beta_i X_i + \varepsilon \qquad (13)$$

and the following linear regression function:

$$y = \alpha + \sum_{i=1}^{N} \beta_i x_i \qquad (14)$$

where the regressors are deterministic variables. As we will see
in the following section, in both cases the least squares
estimators are the same though the variances of the regression
parameters as functions of the samples are different.


ESTIMATION OF LINEAR 
REGRESSIONS 

In this section, we discuss how to estimate the linear regression
parameters. We consider two main estimation techniques: the
maximum likelihood method and the least squares method. A
discussion of the sampling distributions of linear regression
parameters follows. The method of moments and the instrumental
variables method are other methods that are used but are not
discussed in this entry.


Maximum Likelihood Estimates 

Let's reformulate the regression problem in a matrix form that is
standard in regression analysis and that we will use in the
following sections. Let's start with the case of a dependent
variable Y and one independent regressor X. This case is referred
to as the bivariate case or the simple linear regression. Suppose
that we are empirically given T pairs of observations of the
regressor and the dependent variable. In financial econometrics
these observations could represent, for example, the returns Y of
a stock and the returns X of a factor taken at fixed intervals of
time t = 1, 2, ..., T. Using a notation that is standard in
regression estimation, we place the given data in a vector Y and a
matrix X:

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_T \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_T \end{pmatrix} \qquad (15)$$

The column of 1s represents constant terms. The regression
equation can be written as a set of T samples from the same
regression equation, one for each point in time:

$$Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1$$

$$\vdots$$

$$Y_T = \beta_0 + \beta_1 X_T + \varepsilon_T$$

that we can rewrite in matrix form,

$$Y = X\beta + \varepsilon$$

where $\beta$ is the vector of regression coefficients,

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$

and $\varepsilon$ are the residuals.

We now make a set of assumptions that are standard in regression
analysis and that we will progressively relax. The assumptions for
the linear regression model with normally distributed residuals
are:

1. The residuals are zero-mean, normally distributed independent
   variables $\varepsilon \sim N(0, \sigma^2 I)$, where $\sigma^2$
   is the common variance of the residuals and $I$ is the identity
   matrix.

2. X is distributed independently of the residuals $\varepsilon$.

(16)

The regression equation can then be written: $E(Y \mid X) = X\beta$.
The residuals form a sequence of independent variables. They can
therefore be regarded as a strict white-noise sequence. As the
residuals are independent draws from the same normal distribution,
we can compute the log-likelihood function as follows:

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\left[\frac{(Y_t - \beta_0 - \beta_1 X_t)^2}{2\sigma^2}\right] \qquad (17)$$

The maximum likelihood (ML) principle requires maximization of the
log-likelihood function. Maximizing the log-likelihood function
entails first solving the equations:

$$\frac{\partial \log L}{\partial \beta_0} = 0, \qquad \frac{\partial \log L}{\partial \beta_1} = 0, \qquad \frac{\partial \log L}{\partial \sigma^2} = 0$$

These equations can be explicitly written as follows:

$$\sum_{t=1}^{T} (Y_t - \beta_0 - \beta_1 X_t) = 0$$

$$\sum_{t=1}^{T} X_t (Y_t - \beta_0 - \beta_1 X_t) = 0$$

$$T\sigma^2 - \sum_{t=1}^{T} \left[(Y_t - \beta_0 - \beta_1 X_t)^2\right] = 0$$

A little algebra shows that solving the first two equations yields

$$\hat{\beta}_1 = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\hat{\sigma}_X^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \qquad (18)$$

where

$$\bar{X} = \frac{1}{T}\sum_{t=1}^{T} X_t, \qquad \bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t, \qquad \overline{XY} = \frac{1}{T}\sum_{t=1}^{T} X_t Y_t$$

and where $\hat{\sigma}_X$, $\hat{\sigma}_Y$ are the empirical
standard deviations of the sample variables X, Y respectively.
Substituting these expressions in the third equation

$$\frac{\partial \log L}{\partial \sigma^2} = 0$$

yields the variance of the residuals:

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\left[(Y_t - \hat{\beta}_0 - \hat{\beta}_1 X_t)^2\right] \qquad (19)$$

In the matrix notation established above, we can write the
estimators as follows:

For parameters:

$$\hat{\beta} = (X'X)^{-1}X'Y \qquad (20)$$

For the variance of the regression:

$$\hat{\sigma}^2 = \frac{1}{T}(Y - X\hat{\beta})'(Y - X\hat{\beta}) \qquad (21)$$

A comment is in order. We started with T pairs of given data
$(X_t, Y_t)$, $t = 1, \ldots, T$ and then attempted to explain
these data as a linear regression $Y = \beta_1 X + \beta_0 +
\varepsilon$. We estimated the coefficients $(\beta_0, \beta_1)$
with maximum likelihood estimation (MLE) methods. Given this
estimate of the regression coefficients, the estimated variance of
the residuals is given by equation (21). Note that equation (21)
is the empirical variance of residuals computed using the
estimated regression parameters. A large variance of the residuals
indicates that the level of noise in the process (i.e., the size
of the unexplained fluctuations of the process) is high.
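
The bivariate estimators can be checked numerically. The sketch
below is only an illustration: the eight data points and all
variable names are hypothetical, and NumPy is assumed to be
available. It computes the closed-form ML estimates from sample
moments, repeats the calculation in matrix form, and evaluates the
Gaussian log-likelihood so that the maximum can be compared with
nearby parameter values.

```python
import numpy as np

# Hypothetical sample: T = 8 pairs of factor returns X and stock returns Y.
X = np.array([0.030, -0.096, -0.068, 0.073, 0.002, -0.028, -0.014, -0.067])
Y = np.array([0.049, -0.011, -0.031, 0.054, 0.035, 0.128, -0.015, -0.142])
T = len(X)

# Closed-form ML estimates from 1/T sample moments.
x_bar, y_bar, xy_bar = X.mean(), Y.mean(), (X * Y).mean()
var_x = (X ** 2).mean() - x_bar ** 2
beta1 = (xy_bar - x_bar * y_bar) / var_x
beta0 = y_bar - beta1 * x_bar

# ML estimate of the residual variance (divides by T, not T - 2).
resid = Y - beta0 - beta1 * X
sigma2 = (resid ** 2).mean()

# Same estimates in matrix form: beta = (X'X)^{-1} X'Y.
design = np.column_stack([np.ones(T), X])
beta_hat = np.linalg.solve(design.T @ design, design.T @ Y)
resid_matrix = Y - design @ beta_hat
sigma2_matrix = resid_matrix @ resid_matrix / T


def log_likelihood(b0, b1, s2):
    """Gaussian log-likelihood of the bivariate regression."""
    e = Y - b0 - b1 * X
    return (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(s2)
            - (e ** 2).sum() / (2 * s2))


print("beta0, beta1, sigma2:", beta0, beta1, sigma2)
print("log-likelihood at the estimates:", log_likelihood(beta0, beta1, sigma2))
```

Moving any of the three parameters away from the estimates and
calling `log_likelihood` again should give a lower value.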

Generalization to Multiple Independent 
Variables 

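The above discussion of the MLE method generalizes to N
independent variables. We are empirically given a set of T
observations that we organize in matrix form,

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_T \end{pmatrix}, \qquad X = \begin{pmatrix} X_{11} & \cdots & X_{N1} \\ \vdots & \ddots & \vdots \\ X_{1T} & \cdots & X_{NT} \end{pmatrix} \qquad (22)$$

and the regression coefficients and error terms in the vectors

$$\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_N \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{pmatrix} \qquad (23)$$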


The matrix X which contains all the regressors is called the
design matrix. The regressors X can be deterministic, the
important condition being that the residuals are independent. One
of the columns can be formed by 1s to allow for a constant term
(intercept). Our objective is to explain the data as a linear
regression:

$$Y = X\beta + \varepsilon$$


We make the same set of assumptions given by (16) as we made in
the case of a single regressor. Using the above notation, the
log-likelihood function will have the form

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta) \qquad (24)$$

The maximum likelihood conditions are written as

$$\frac{\partial \log L}{\partial \beta} = 0, \qquad \frac{\partial \log L}{\partial \sigma^2} = 0 \qquad (25)$$

These equations are called normal equations. Solving the system of
normal equations gives the same form for the estimators as in the
single-regressor case:

$$\hat{\beta} = (X'X)^{-1}X'Y, \qquad \hat{\sigma}^2 = \frac{1}{T}(Y - X\hat{\beta})'(Y - X\hat{\beta}) \qquad (26)$$

The variance estimator is not unbiased. It can be demonstrated
that to obtain an unbiased estimator we have to apply a correction
that takes into account the number of variables by replacing T
with $T - N$, assuming $T > N$:

$$\hat{\sigma}^2 = \frac{1}{T - N}(Y - X\hat{\beta})'(Y - X\hat{\beta}) \qquad (27)$$


The MLE method requires that we know the 
functional form of the distribution. If the dis¬ 
tribution is known but not normal, we can still 
apply the MLE method but the estimators will 
be different. We will not discuss MLE for nonnormal distributions
further here.
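
As a hedged illustration of the matrix estimators, the following
sketch simulates a small data set (the sample size, coefficients,
and noise level are assumptions chosen only for the example),
solves the normal equations, and compares the ML variance estimate
with the corrected one that divides by T - N. Here N counts every
column of the design matrix, including the column of 1s.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): T observations and N columns
# in the design matrix, the first being a column of 1s for the intercept.
T, N = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, N - 1))])
beta_true = np.array([0.5, 1.2, -0.7])
Y = X @ beta_true + rng.normal(scale=0.3, size=T)

# Solve the normal equations X'X beta = X'Y rather than inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
ssr = resid @ resid

sigma2_ml = ssr / T              # ML estimator of the residual variance
sigma2_unbiased = ssr / (T - N)  # corrected estimator, T replaced by T - N

print("beta_hat:", beta_hat)
print("sigma2 (ML):", sigma2_ml, " sigma2 (unbiased):", sigma2_unbiased)
```

Using `np.linalg.solve` instead of forming the inverse explicitly
is a numerical design choice; both give the same estimator in exact
arithmetic.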


Ordinary Least Squares Method 

We now establish the relationship between the MLE principle and
the ordinary least squares (OLS) method. OLS is a general method
to approximate a relationship between two or more variables. We
use the matrix notation defined above for the MLE method; that is,
we assume that observations are organized in the vector Y and the
design matrix X introduced above, while the regression
coefficients and the residuals are described by equation (23).

If we use the OLS method, the assumptions 
of linear regressions can be weakened. In par¬ 
ticular, we need not assume that the residuals 
are normally distributed but only assume that 
they are uncorrelated and have finite variance. 
The residuals can therefore be regarded as a 
white-noise sequence (and not a strict white- 
noise sequence as in the previous section). We 
summarize the linear regression assumptions 
as follows: 

Assumptions for the linear regression model:

1. The mean of the residuals is zero: $E(\varepsilon) = 0$.

2. The residuals are mutually uncorrelated:
   $E(\varepsilon\varepsilon') = \sigma^2 I$, where $\sigma^2$ is
   the variance of the residuals and $I$ is the identity matrix.

3. X is distributed independently of the residuals $\varepsilon$.

(28)

In the general case of a multivariate regression, the OLS method
requires minimization of the sum of the squared residuals.
Consider the vector of residuals:

$$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{pmatrix}$$

The sum of the squared residuals (SSR) $= (\varepsilon_1^2 +
\cdots + \varepsilon_T^2)$ can be written as SSR $=
\varepsilon'\varepsilon$. As $\varepsilon = Y - X\beta$, we can
also write

$$\text{SSR} = (Y - X\beta)'(Y - X\beta)$$

The OLS method requires that we minimize the SSR. To do so, we
equate to zero the first derivatives of the SSR:

$$\frac{\partial (Y - X\beta)'(Y - X\beta)}{\partial \beta} = 0$$

that is, $-2X'(Y - X\beta) = 0$. This is a system of N equations.
Solving this system, we obtain the estimators:

$$\hat{\beta} = (X'X)^{-1}X'Y$$

These estimators are the same estimators obtained with the MLE
method; they have an optimality property. In fact, the
Gauss-Markov theorem states that the above OLS estimators are the
best linear unbiased estimators (BLUE). "Best" means that no other
linear unbiased estimator has a lower variance. It should be noted
explicitly that OLS and MLE are conceptually different
methodologies: MLE seeks the optimal parameters of the
distribution of the error terms, while OLS seeks to minimize the
variance of error terms. The fact that the two estimators coincide
was an important discovery.

SAMPLING DISTRIBUTIONS
OF REGRESSIONS

Estimated regression parameters depend on the sample. They are
random variables whose distribution is to be determined. As we
will see in this section, the sampling distributions differ
depending on whether the regressors are assumed to be fixed
deterministic variables or random variables.

Let's first assume that the regressors are fixed deterministic
variables. Thus only the error terms and the dependent variable
change from sample to sample. The $\hat{\beta}$ are unbiased
estimators and $E[\hat{\beta}] = \beta$ therefore holds. It can
also be demonstrated that the following expression for the
variance of $\hat{\beta}$ holds

$$E[(\beta - \hat{\beta})(\beta - \hat{\beta})'] = \sigma^2 (X'X)^{-1} \qquad (29)$$

where an estimate $\hat{\sigma}^2$ of $\sigma^2$ is given by (27).

Under the additional assumption that the residuals are normally
distributed, it can be demonstrated that the regression
coefficients are jointly normally distributed as follows:

$$\hat{\beta} \sim N_N[\beta, \sigma^2 (X'X)^{-1}] \qquad (30)$$

These expressions are important because they allow us to compute
confidence intervals for the regression parameters.

Let's now suppose that the regressors are random variables. Under
the assumptions set forth in (28), it can be demonstrated that the
variance of the estimators $\hat{\beta}$ can be written as
follows:

$$V(\hat{\beta}) = E[(X'X)^{-1}]\, V(X'\varepsilon)\, E[(X'X)^{-1}] \qquad (31)$$

where the terms $E[(X'X)^{-1}]$ and $V(X'\varepsilon)$ are the
empirical expectation of $(X'X)^{-1}$ and the empirical variance
of $(X'\varepsilon)$, respectively.

The following terms are used to describe this estimator of the
variance: sandwich estimator, robust estimator, and White
estimator. The term sandwich estimator is due to the fact that the
term $V(X'\varepsilon)$ is sandwiched between the terms
$E[(X'X)^{-1}]$. These estimators are robust because they take
into account not only the variability of the dependent variables
but also that of the independent variables. Consider that if the
sample of regressors is large, the sandwich and the classical
estimators are close to each other.
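
The difference between the classical and the sandwich covariance
can be made concrete with a short sketch. The data are simulated
(an assumption made only for illustration), and the sandwich is
computed in the common HC0 sample form, $(X'X)^{-1} X'
\mathrm{diag}(e_t^2) X (X'X)^{-1}$, which is one standard way of
estimating the terms described above rather than a formula taken
from this entry.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (assumed for illustration): random regressor,
# homoskedastic noise, large sample.
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([0.1, 1.5]) + rng.normal(scale=0.5, size=T)
N = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat

# Classical covariance: sigma^2 (X'X)^{-1}, with sigma^2 estimated
# by the unbiased estimator SSR / (T - N).
sigma2 = resid @ resid / (T - N)
cov_classical = sigma2 * XtX_inv

# Sandwich (White, HC0) covariance: the "meat" sum_t e_t^2 x_t x_t'
# sits between two copies of (X'X)^{-1}.
meat = (X * resid[:, None] ** 2).T @ X
cov_sandwich = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(cov_classical))
se_sandwich = np.sqrt(np.diag(cov_sandwich))

# Approximate 95% confidence intervals based on joint normality.
ci_low = beta_hat - 1.96 * se_classical
ci_high = beta_hat + 1.96 * se_classical

print("classical s.e.:", se_classical)
print("sandwich s.e.: ", se_sandwich)
```

With homoskedastic noise and a sample of this size the two sets of
standard errors should be close, in line with the remark above.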

DETERMINING THE
EXPLANATORY POWER 
OF A REGRESSION 

The above computations to estimate regression 
parameters were carried out under the assump¬ 
tion that the data were generated by a linear 
regression function with uncorrelated and nor¬ 
mally distributed noise. In general, we do not 
know if this is indeed the case. Though we can 
always estimate a linear regression model on 
any data sample by applying the estimators dis¬ 
cussed above, we must now ask the question: 
When is a linear regression applicable and how 
can one establish the goodness (i.e., explanatory 
power) of a linear regression? 

Quite obviously, a linear regression model 
is applicable if the relationship between the 
variables is approximately linear. How can we 
check if this is indeed the case? What happens if 
we fit a linear model to variables that have non¬ 
linear relationships, or if distributions are not 
normal? A number of tests have been devised 
to help answer these questions. 

Intuitively, a measure of the quality of approx¬ 
imation offered by a linear regression is given 
by the variance of the residuals. Squared residu¬ 
als are used because a property of the estimated 
relationship is that the sum of the residuals is 
zero. If residuals are large, the regression model 
has little explanatory power. However, the size 
of the average residual in itself is meaningless as 
it has to be compared with the range of the vari¬ 
ables. For example, if we regress stock prices 
over a broad-based stock index, other things 
being equal, the residuals will be numerically 
different if the price is in the range of dollars or 
in the range of hundreds of dollars. 


Coefficient of Determination 

A widely used measure of the quality and use¬ 
fulness of a regression model is given by the 
coefficient of determination denoted by R 2 or R- 
squared. The idea behind R 2 is the following. 


The dependent variable Y has a total variation given by the
following expression:

$$\text{Total variation} = S_Y^2 = \frac{1}{T - 1}\sum_{t=1}^{T} (Y_t - \bar{Y})^2 \qquad (32)$$

where

$$\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t$$

This total variation is the sum of the variation of the variable Y
due to the variation of the regressors plus the variation of
residuals: $S_Y^2 = S_R^2 + S_\varepsilon^2$. We can therefore
define the coefficient of determination:

$$\text{Coefficient of determination} = R^2 = \frac{S_R^2}{S_Y^2}, \qquad 1 - R^2 = \frac{S_\varepsilon^2}{S_Y^2} \qquad (33)$$

as the portion of the total fluctuation of the de¬ 
pendent variable, Y, explained by the regression 
relation. R 2 is a number between 0 and 1 : R 2 = 
0 means that the regression has no explanatory 
power, R 2 = 1 means that the regression has 
perfect explanatory power. The quantity R 2 is 
computed by software packages that perform 
linear regressions. 

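It can be demonstrated that, when the residuals are normally
distributed and all slope coefficients are zero, the ratio
$[R^2/p] / [(1 - R^2)/(T - p - 1)]$, where p is the number of
regressors excluding the constant and T is the number of
observations, follows the well-known F distribution with p and
$T - p - 1$ degrees of freedom. This fact allows one to assess the
statistical significance of a regression.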


Adjusted R 2 

The quantity $R^2$ as a measure of the usefulness of a regression
model suffers from the problem that a regression might fit data
very well in-sample but have no explanatory power out-of-sample.
This occurs if the number of regressors is too high. Therefore an
adjusted $R^2$ is sometimes used. The adjusted $R^2$ is defined as
$R^2$ corrected by a penalty function that takes into account the
number p of regressors in the model:

$$\text{Adjusted } R^2 = 1 - \frac{T - 1}{T - p - 1}\,(1 - R^2) \qquad (34)$$
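
A minimal sketch of these quantities follows. The data are
simulated and every name is hypothetical. The adjusted $R^2$ is
computed with the usual textbook penalty $(T - 1)/(T - p - 1)$,
where p counts the regressors excluding the constant; this is an
assumption about the convention, not something confirmed by the
entry.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (assumed for illustration): p = 2 regressors plus a constant.
T, p = 60, 2
Z = rng.normal(size=(T, p))
Y = 0.2 + Z @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=T)

X = np.column_stack([np.ones(T), Z])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
fitted = X @ beta_hat
resid = Y - fitted

# Variation of Y, of the fitted values, and of the residuals, all measured
# about the mean of Y (residuals have zero mean when a constant is included).
s2_y = ((Y - Y.mean()) ** 2).sum() / (T - 1)
s2_r = ((fitted - Y.mean()) ** 2).sum() / (T - 1)
s2_e = (resid ** 2).sum() / (T - 1)

r2 = s2_r / s2_y
adj_r2 = 1 - (T - 1) / (T - p - 1) * (1 - r2)

# F statistic for the hypothesis that all slope coefficients are zero.
f_stat = (r2 / p) / ((1 - r2) / (T - p - 1))

print("R^2:", r2, " adjusted R^2:", adj_r2, " F:", f_stat)
```

The decomposition of the total variation into an explained part and
a residual part holds exactly here because the regression includes
a constant term.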


Relation of R 2 to Correlation 
Coefficient 

The $R^2$ is the squared correlation coefficient. The correlation
coefficient is a number between $-1$ and $+1$ that measures the
strength of the dependence between two variables. If a linear
relationship is assumed, the correlation coefficient has the usual
product-moment expression:

$$r = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{S_Y S_X} \qquad (35)$$
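
For a regression with a single regressor, the identity between
$R^2$ and the squared correlation coefficient is easy to verify
numerically. The sketch below uses simulated data (an assumption
for illustration) and 1/T sample moments throughout, so that the
normalization cancels in the ratio.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated bivariate sample (assumed for illustration).
T = 100
X = rng.normal(size=T)
Y = 0.3 + 0.8 * X + rng.normal(scale=0.6, size=T)

# Product-moment correlation from 1/T sample moments.
cov_xy = (X * Y).mean() - X.mean() * Y.mean()
r = cov_xy / (X.std() * Y.std())  # np.std uses the 1/T normalization by default

# R^2 of the simple regression of Y on X.
beta1 = cov_xy / X.var()
beta0 = Y.mean() - beta1 * X.mean()
resid = Y - beta0 - beta1 * X
r2 = 1 - (resid ** 2).sum() / ((Y - Y.mean()) ** 2).sum()

print("r:", r, " r squared:", r ** 2, " R^2:", r2)
```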


USING REGRESSION 
ANALYSIS IN FINANCE 

This section provides several illustrations of 
regression analysis in finance as well as the 
data for each illustration. However, in order to 
present the data, we limit our sample size. 


Characteristic Line for 
Common Stocks 

The characteristic line of a security is the regression of the
excess returns of that security on the market excess returns:

$$r_i = \alpha_i + \beta_i r_M$$

where

$r_i$ = the excess return of the security over the risk-free rate

$r_M$ = the excess return of the market over the risk-free rate

We computed the characteristic lines of two 
common stocks, Oracle and General Motors 
(GM), and a randomly created portfolio con¬ 
sisting of 20 stocks equally weighted. We used 
the S&P 500 Index as a proxy for the market 
returns and the 90-day Treasury rate as a proxy 
for the risk-free rate. The return and excess re¬ 
turn data are shown in Table 1. Note that there 
are 60 monthly observations used to estimate 
the characteristic line from December 2000 to 
November 2005. The 20 stocks comprising the 
portfolio are shown at the bottom of Table 1. 


Table 1 Return and Excess Return Data for S&P 500, Oracle, GM, and Portfolio^a: 12/1/2000-11/1/2005

Date         S&P 500 Risk-Free S&P Minus    Oracle    Oracle        GM        GM Portfolio Portfolio
              Return      Rate Risk-Free    Return    Excess    Return    Excess    Return    Excess
                                    Rate              Return              Return              Return
12/1/2000    0.03464   0.00473   0.02990   0.00206  -0.00267   0.05418   0.04945   0.01446   0.00973
1/1/2001    -0.09229   0.00413  -0.09642  -0.34753  -0.35165  -0.00708  -0.01120  -0.07324  -0.07736
2/1/2001    -0.06420   0.00393  -0.06813  -0.21158  -0.21550  -0.02757  -0.03149  -0.07029  -0.07421
3/1/2001     0.07681   0.00357   0.07325   0.07877   0.07521   0.05709   0.05352   0.11492   0.11135
4/1/2001     0.00509   0.00321   0.00188  -0.05322  -0.05643   0.03813   0.03492   0.01942   0.01621
5/1/2001    -0.02504   0.00302  -0.02805   0.24183   0.23881   0.13093   0.12791  -0.03050  -0.03351
6/1/2001    -0.01074   0.00288  -0.01362  -0.04842  -0.05130  -0.01166  -0.01453  -0.03901  -0.04189
7/1/2001    -0.06411   0.00288  -0.06698  -0.32467  -0.32754  -0.13915  -0.14203  -0.08264  -0.08552
8/1/2001    -0.08172   0.00274  -0.08447   0.03030   0.02756  -0.21644  -0.21918  -0.13019  -0.13293
9/1/2001     0.01810   0.00219   0.01591   0.07790   0.07571  -0.03683  -0.03902   0.05969   0.05749
10/1/2001    0.07518   0.00177   0.07341   0.03466   0.03289   0.20281   0.20104   0.11993   0.11816
11/1/2001    0.00757   0.00157   0.00601  -0.01568  -0.01725  -0.02213  -0.02370   0.02346   0.02190
12/1/2001   -0.01557   0.00148  -0.01706   0.24982   0.24834   0.05226   0.05078   0.05125   0.04976
1/1/2002    -0.02077   0.00144  -0.02221  -0.03708  -0.03852   0.03598   0.03454   0.02058   0.01914
2/1/2002     0.03674   0.00152   0.03522  -0.22984  -0.23136   0.14100   0.13948   0.02818   0.02667
3/1/2002    -0.06142   0.00168  -0.06309  -0.21563  -0.21730   0.06121   0.05953  -0.00517  -0.00684
4/1/2002    -0.00908   0.00161  -0.01069  -0.21116  -0.21276  -0.03118  -0.03279  -0.02664  -0.02825
5/1/2002    -0.07246   0.00155  -0.07401   0.19571   0.19416  -0.13998  -0.14153  -0.04080  -0.04235
6/1/2002    -0.07900   0.00149  -0.08050   0.05702   0.05553  -0.12909  -0.13058  -0.05655  -0.05804
7/1/2002     0.00488   0.00142   0.00346  -0.04196  -0.04337   0.02814   0.02673  -0.01411  -0.01553
8/1/2002    -0.11002   0.00133  -0.11136  -0.18040  -0.18173  -0.18721  -0.18855  -0.09664  -0.09797
9/1/2002     0.08645   0.00133   0.08512   0.29644   0.29510  -0.14524  -0.14658   0.06920   0.06787
10/1/2002    0.05707   0.00130   0.05577   0.19235   0.19105   0.19398   0.19268   0.08947   0.08817
11/1/2002   -0.06033   0.00106  -0.06139  -0.11111  -0.11217  -0.07154  -0.07259  -0.04623  -0.04729
12/1/2002   -0.02741   0.00103  -0.02845   0.11389   0.11286  -0.01438  -0.01541  -0.00030  -0.00134
1/1/2003    -0.01700   0.00100  -0.01800  -0.00582  -0.00682  -0.07047  -0.07147  -0.03087  -0.03187
2/1/2003     0.00836   0.00098   0.00737  -0.09365  -0.09463  -0.00444  -0.00543  -0.00951  -0.01049
3/1/2003     0.08104   0.00094   0.08010   0.09594   0.09500   0.07228   0.07134   0.06932   0.06838
4/1/2003     0.05090   0.00095   0.04995   0.09512   0.09417  -0.01997  -0.02092   0.06898   0.06803
5/1/2003     0.01132   0.00090   0.01042  -0.07686  -0.07776   0.01896   0.01806   0.00567   0.00477
6/1/2003     0.01622   0.00077   0.01546  -0.00167  -0.00243   0.03972   0.03896   0.03096   0.03019
7/1/2003     0.01787   0.00079   0.01708   0.07006   0.06927   0.09805   0.09726   0.03756   0.03677
8/1/2003    -0.01194   0.00086  -0.01280  -0.12315  -0.12401  -0.00414  -0.00499  -0.03145  -0.03231
9/1/2003     0.05496   0.00084   0.05412   0.06400   0.06316   0.04251   0.04167   0.07166   0.07082
10/1/2003    0.00713   0.00083   0.00630   0.00418   0.00334   0.00258   0.00174   0.00832   0.00749
11/1/2003    0.05077   0.00085   0.04992   0.10067   0.09982   0.24825   0.24740   0.06934   0.06849
12/1/2003    0.01728   0.00083   0.01645   0.04762   0.04679  -0.06966  -0.07049   0.00012  -0.00070
1/1/2004     0.01221   0.00081   0.01140  -0.07143  -0.07224  -0.03140  -0.03221   0.01279   0.01198
2/1/2004    -0.01636   0.00083  -0.01718  -0.06760  -0.06842  -0.01808  -0.01890  -0.03456  -0.03538
3/1/2004    -0.01679   0.00083  -0.01762  -0.06250  -0.06333   0.00360   0.00277  -0.00890  -0.00972
4/1/2004     0.01208   0.00091   0.01118   0.01333   0.01243  -0.04281  -0.04372   0.02303   0.02212
5/1/2004     0.01799   0.00109   0.01690   0.04649   0.04540   0.02644   0.02535  -0.00927  -0.01036
6/1/2004    -0.03429   0.00133  -0.03562  -0.11903  -0.12036  -0.07405  -0.07538  -0.05173  -0.05307
7/1/2004     0.00229   0.00138   0.00090  -0.05138  -0.05276  -0.04242  -0.04380  -0.00826  -0.00965
8/1/2004     0.00936   0.00143   0.00793   0.13139   0.12996   0.02832   0.02689   0.01632   0.01488
9/1/2004     0.01401   0.00156   0.01246   0.12234   0.12078  -0.09251  -0.09407   0.00577   0.00421
10/1/2004    0.03859   0.00167   0.03693   0.00632   0.00465   0.00104  -0.00063   0.05326   0.05159
11/1/2004    0.03246   0.00189   0.03057   0.07692   0.07503   0.03809   0.03620   0.02507   0.02318
12/1/2004   -0.02529   0.00203  -0.02732   0.00364   0.00162  -0.08113  -0.08315  -0.03109  -0.03311
1/1/2005     0.01890   0.00218   0.01673  -0.05955  -0.06172  -0.03151  -0.03369   0.01225   0.01008
2/1/2005    -0.01912   0.00231  -0.02143  -0.03629  -0.03860  -0.17560  -0.17790  -0.01308  -0.01538
3/1/2005    -0.02011   0.00250  -0.02261  -0.07372  -0.07622  -0.09221  -0.09471  -0.03860  -0.04110
4/1/2005     0.02995   0.00254   0.02741   0.10727   0.10472   0.18178   0.17924   0.04730   0.04476
5/1/2005    -0.00014   0.00257  -0.00271   0.03125   0.02868   0.07834   0.07577  -0.02352  -0.02609
6/1/2005     0.03597   0.00261   0.03336   0.02803   0.02542   0.08294   0.08033   0.04905   0.04644
7/1/2005    -0.01122   0.00285  -0.01407  -0.04274  -0.04559  -0.07143  -0.07428  -0.02185  -0.02470
8/1/2005     0.00695   0.00305   0.00390  -0.04542  -0.04847  -0.10471  -0.10776   0.00880   0.00575
9/1/2005    -0.01774   0.00306  -0.02080   0.02258   0.01952  -0.10487  -0.10793  -0.04390  -0.04696
10/1/2005    0.03519   0.00333   0.03186  -0.00631  -0.00963  -0.20073  -0.20405   0.01649   0.01316
11/1/2005    0.01009   0.00346   0.00663  -0.00714  -0.01060   0.01050   0.00704   0.01812   0.01466

^a Portfolio includes the following 20 stocks: Honeywell, Alcoa, Campbell Soup, Boeing, General Dynamics, Oracle,
Sun, General Motors, Procter & Gamble, Wal-Mart, Exxon, ITT, Unilever, Hilton, Martin Marietta, Coca-Cola,
Northrop Grumman, Mercury Interact, Amazon, and United Technologies.


The estimated parameters for the two stocks 
and the portfolios are reported in Table 2. As can 
be seen from the table, the intercept term is not 
statistically significant; however, the slope, re¬ 
ferred to as the beta of the characteristic line, is 
statistically significant. Typically for individual 
stocks, the R 2 ranges from 0.15 to 0.65. For Ora¬ 
cle and GM the R 2 is 0.23 and 0.26, respectively. 


In contrast, for a randomly created portfolio, 
the R 2 is considerably higher. For our 20-stock 
portfolio, the R 2 is 0.79. 

Table 2 Characteristic Line of the Common Stock of
General Motors, Oracle, and Portfolio:
12/1/2000-11/1/2005

               Coefficient   Standard
Coefficient    Estimate      Error      t-statistic   p-value
GM
  α              -0.005       0.015       -0.348       0.729
  β               1.406       0.339        4.142       0.000
  R²              0.228
  p-value         0.000
Oracle
  α              -0.009       0.011       -0.812       0.420
  β               1.157       0.257        4.501       0.000
  R²              0.259
  p-value         0.000
Portfolio
  α               0.003       0.003        1.027       0.309
  β               1.026       0.070       14.711       0.000
  R²              0.787
  p-value         0.000

Note that some researchers estimate a stock's beta by using
returns rather than excess returns. The regression estimated is
referred to as the single-index market model. This model was first
suggested by Markowitz as a proxy measure of the covariance of a
stock with an index so that the full mean-variance analysis need
not be performed. While the approach was mentioned by Markowitz
(1959) in a footnote in his book, it was Sharpe (1963) who
investigated this further. It turns out that the betas estimated
using the characteristic line and the single-index market model do
not differ materially. For example, for our 20-stock portfolio,
the betas differed only because of rounding off.
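
The two ways of estimating beta can be compared on a small scale.
The sketch below uses only the first 12 monthly observations of
Table 1 for the S&P 500, the risk-free rate, and GM, so its
estimates are not expected to reproduce the 60-month results of
Table 2. The helper `fit_line` is a hypothetical name introduced
for this illustration.

```python
import numpy as np

# First 12 monthly observations from Table 1 (12/1/2000 to 11/1/2001).
sp500 = np.array([0.03464, -0.09229, -0.06420, 0.07681, 0.00509, -0.02504,
                  -0.01074, -0.06411, -0.08172, 0.01810, 0.07518, 0.00757])
risk_free = np.array([0.00473, 0.00413, 0.00393, 0.00357, 0.00321, 0.00302,
                      0.00288, 0.00288, 0.00274, 0.00219, 0.00177, 0.00157])
gm = np.array([0.05418, -0.00708, -0.02757, 0.05709, 0.03813, 0.13093,
               -0.01166, -0.13915, -0.21644, -0.03683, 0.20281, -0.02213])


def fit_line(x, y):
    """Return the OLS intercept, slope, and R^2 for a single regressor."""
    X = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return coef[0], coef[1], r2


# Characteristic line: excess returns regressed on market excess returns.
alpha_cl, beta_cl, r2_cl = fit_line(sp500 - risk_free, gm - risk_free)

# Single-index market model: raw returns regressed on raw market returns.
alpha_sim, beta_sim, r2_sim = fit_line(sp500, gm)

print("beta (characteristic line):", beta_cl)
print("beta (single-index model): ", beta_sim)
```

Because the monthly risk-free rate varies very little compared
with stock returns, subtracting it from both sides changes the
slope only slightly.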

Empirical Duration of 
Common Stock 

A commonly used measure of the interest-rate 
sensitivity of an asset's value is its duration. 
Duration is interpreted as the approximate per¬ 
centage change in the value of an asset for a 
100-basis-point change in interest. Duration can 
be estimated by using a valuation model or em¬ 
pirically by estimating from historical returns 
the sensitivity of the asset's value to changes 
in interest rates. When duration is measured in 
the latter way, it is called empirical duration. 
Since it is estimated using regression analysis, 
it is sometimes referred to as regression-based 
duration. 


A simple linear regression for computing empirical duration using
monthly historical data (see Reilly, Wright, and Johnson, 2007) is

$$y_{it} = \alpha_i + \beta_i x_t + \varepsilon_{it}$$

where

$y_{it}$ = the percentage change in the value of asset i for
month t

$x_t$ = the change in the Treasury yield for month t

The estimated $\beta_i$ is the empirical duration for asset i.

We will apply this linear regression to 
monthly data from October 1989 to October 
2003 shown in Table 3 1 for the following asset 
indexes: 

• Electric Utility sector of the S&P 500 

• Commercial Bank sector of the S&P 500 

• Lehman U.S. Aggregate Bond Index (now the 
Barclays Capital U.S. Aggregate Bond Index) 

The yield change ($x_t$) is measured by the Lehman Treasury Index.
The regression results are shown in Table 4. We report the
empirical duration ($\beta_i$), the t-statistic, the p-value, the
$R^2$, and the intercept term. Negative values are reported for
the empirical duration. In practice, however, the duration is
quoted as a positive value. For the Electric Utility sector and
the Lehman U.S. Aggregate Bond Index, the empirical duration is
statistically significant at any reasonable level of significance.

A multiple regression model to estimate the empirical duration
that has been suggested is

$$y_{it} = \alpha_i + \beta_{1i} x_{1t} + \beta_{2i} x_{2t} + \varepsilon_{it}$$

where $y_{it}$ and $x_{1t}$ are the same as for the simple linear
regression and $x_{2t}$ is the return on the S&P 500. The results
for this model are also shown in Table 4.
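
Both duration regressions can be sketched in a few lines. The
example below uses only the first 12 rows of Table 3 (Oct-89 to
Sep-90) for the Lehman U.S. Aggregate Bond Index, so its output is
illustrative and is not expected to match the full-sample results
of Table 4. Variable names are hypothetical.

```python
import numpy as np

# First 12 rows of Table 3 (Oct-89 to Sep-90).
yield_change = np.array([-0.46, -0.10, 0.12, 0.43, 0.09, 0.20,
                         0.34, -0.46, -0.20, -0.21, 0.37, -0.06])
sp500_return = np.array([-2.33, 2.08, 2.36, -6.71, 1.29, 2.63,
                         -2.47, 9.75, -0.70, -0.32, -9.03, -4.92])
bond_return = np.array([2.46, 0.95, 0.27, -1.19, 0.32, 0.07,
                        -0.92, 2.96, 1.61, 1.38, -1.34, 0.83])
T = len(yield_change)

# Simple regression: bond index return on the change in the Treasury yield.
X1 = np.column_stack([np.ones(T), yield_change])
alpha_s, beta_s = np.linalg.lstsq(X1, bond_return, rcond=None)[0]

# Multiple regression: add the S&P 500 return as a second regressor.
X2 = np.column_stack([np.ones(T), yield_change, sp500_return])
coef_m = np.linalg.lstsq(X2, bond_return, rcond=None)[0]
resid_m = bond_return - X2 @ coef_m

# Duration is quoted as a positive number, so the slope's sign is flipped.
empirical_duration = -beta_s

print("empirical duration (simple regression):", empirical_duration)
print("multiple regression coefficients:", coef_m)
```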

The results of the multiple regression indicate 
that the returns for the Electric Utility sector are 
affected by both the change in Treasury rates 
and the return on the stock market as proxied 

Table 3 Data for Empirical Duration Illustration

         Change in  S&P 500  Electric  Commercial Lehman U.S.
       Lehman Bros   Return   Utility        Bank   Aggregate
          Treasury             Sector      Sector  Bond Index
Month        Yield             Return      Return      Return
Oct-89       -0.46    -2.33     2.350     -11.043      2.4600
Nov-89       -0.10     2.08     2.236      -3.187      0.9500
Dec-89        0.12     2.36     3.794      -1.887      0.2700
Jan-90        0.43    -6.71    -4.641     -10.795     -1.1900
Feb-90        0.09     1.29     0.193       4.782      0.3200
Mar-90        0.20     2.63    -1.406      -4.419      0.0700
Apr-90        0.34    -2.47    -5.175      -4.265     -0.9200
May-90       -0.46     9.75     5.455      12.209      2.9600
Jun-90       -0.20    -0.70     0.966      -5.399      1.6100
Jul-90       -0.21    -0.32     1.351      -8.328      1.3800
Aug-90        0.37    -9.03    -7.644     -10.943     -1.3400
Sep-90       -0.06    -4.92     0.435     -15.039      0.8300
Oct-90       -0.23    -0.37    10.704     -10.666      1.2700
Nov-90       -0.28     6.44     2.006      18.892      2.1500
Dec-90       -0.23     2.74     1.643       6.620      1.5600
Jan-91       -0.13     4.42    -1.401       8.018      1.2400
Feb-91        0.01     7.16     4.468      12.568      0.8500
Mar-91        0.03     2.38     2.445       5.004      0.6900
Apr-91       -0.15     0.28    -0.140       7.226      1.0800
May-91        0.06     4.28    -0.609       7.501      0.5800
Jun-91        0.15    -4.57    -0.615      -7.865     -0.0500
Jul-91       -0.13     4.68     4.743       7.983      1.3900
Aug-91       -0.37     2.35     3.226       9.058      2.1600
Sep-91       -0.33    -1.64     4.736      -2.033      2.0300
Oct-91       -0.17     1.34     1.455       0.638      1.1100
Nov-91       -0.15    -4.04     2.960      -9.814      0.9200
Dec-91       -0.59    11.43     5.821      14.773      2.9700
Jan-92        0.42    -1.86    -5.515       2.843     -1.3600
Feb-92        0.10     1.28    -1.684       8.834      0.6506
Mar-92        0.27    -1.96    -0.296      -3.244     -0.5634
Apr-92       -0.10     2.91     3.058       4.273      0.7215
May-92       -0.23     0.54     2.405       2.483      1.8871
Jun-92       -0.26    -1.45     0.492       1.221      1.3760
Jul-92       -0.41     4.03     6.394      -0.540      2.0411
Aug-92       -0.13    -2.02    -1.746      -5.407      1.0122
Sep-92       -0.26     1.15     0.718       1.960      1.1864
Oct-92        0.49     0.36    -0.778       2.631     -1.3266
Nov-92        0.26     3.37    -0.025       7.539      0.0228
Dec-92       -0.24     1.31     3.247       5.010      1.5903
Jan-93       -0.36     0.73     3.096       4.203      1.9177
Feb-93       -0.29     1.35     6.000       3.406      1.7492
Mar-93        0.02     2.15     0.622       3.586      0.4183
Apr-93       -0.10    -2.45    -0.026      -5.441      0.6955
May-93        0.25     2.70    -0.607      -0.647      0.1268
Jun-93       -0.30     0.33     2.708       4.991      1.8121
Jul-93        0.05    -0.47     2.921       0.741      0.5655
Aug-93       -0.31     3.81     3.354       0.851      1.7539
Sep-93        0.00    -0.74    -1.099       3.790      0.2746
Oct-93        0.05     2.03    -1.499      -7.411      0.3732
Nov-93        0.26    -0.94    -5.091      -1.396     -0.8502
Dec-93        0.01     1.23     2.073       3.828      0.5420
Jan-94       -0.17     3.35    -2.577       4.376      1.3502
Feb-94        0.55    -2.70    -5.683      -4.369     -1.7374
Mar-94        0.55    -4.35    -4.656      -3.031     -2.4657
Apr-94        0.37     1.30     0.890       3.970     -0.7985
May-94        0.18     1.63    -5.675       6.419     -0.0138
Jun-94        0.16    -2.47    -3.989      -2.662     -0.2213
Jul-94       -0.23     3.31     5.555       2.010      1.9868
Aug-94        0.12     4.07     0.851       3.783      0.1234
Sep-94        0.43    -2.41    -2.388      -7.625     -1.4717
Oct-94        0.18     2.29     1.753       1.235     -0.0896
Nov-94        0.37    -3.67     2.454      -7.595     -0.2217
Dec-94        0.11     1.46     0.209      -0.866      0.6915
Jan-95       -0.33     2.60     7.749       6.861      1.9791
Feb-95       -0.41     3.88    -0.750       6.814      2.3773
Mar-95        0.01     2.96    -2.556      -1.434      0.6131
Apr-95       -0.18     2.91     3.038       4.485      1.3974
May-95       -0.72     3.95     7.590       9.981      3.8697
Jun-95       -0.05     2.35    -0.707       0.258      0.7329
Jul-95        0.14     3.33    -0.395       4.129     -0.2231
Aug-95       -0.10     0.27    -0.632       5.731      1.2056
Sep-95       -0.05     4.19     6.987       5.491      0.9735
Oct-95       -0.21    -0.35     2.215      -1.906      1.3002
Nov-95       -0.23     4.40    -0.627       7.664      1.4982
Dec-95       -0.18     1.85     6.333       0.387      1.4040
Jan-96       -0.13     3.44     2.420       3.361      0.6633
Feb-96        0.49     0.96    -3.590       4.673     -1.7378
Mar-96        0.31     0.96    -1.697       2.346     -0.6954
Apr-96        0.25     1.47    -4.304      -1.292     -0.5621
May-96        0.18     2.58     1.864       2.529     -0.2025
Jun-96       -0.14     0.41     5.991      -0.859      1.3433
Jul-96        0.08    -4.45    -7.150       0.466      0.2736
Aug-96        0.15     2.12     1.154       4.880     -0.1675
Sep-96       -0.23     5.62     0.682       6.415      1.7414
Oct-96       -0.35     2.74     4.356       8.004      2.2162
Nov-96       -0.21     7.59     1.196      10.097      1.7129
Dec-96        0.30    -1.96    -0.323      -4.887     -0.9299
Jan-97        0.06     6.21     0.443       8.392      0.3058
Feb-97        0.11     0.81     0.235       5.151      0.2485
Mar-97        0.36    -4.16    -4.216      -7.291     -1.1083
Apr-97       -0.18     5.97    -2.698       5.477      1.4980
May-97       -0.07     6.14     4.240       3.067      0.9451
Jun-97       -0.11     4.46     3.795       4.834      1.1873
Jul-97       -0.43     7.94     2.627      12.946      2.6954
Aug-97        0.30    -5.56    -2.423      -6.205     -0.8521
Sep-97       -0.19     5.48     5.010       7.956      1.4752
Oct-97       -0.21    -3.34     1.244      -2.105      1.4506
Nov-97        0.06     4.63     8.323       3.580      0.4603
Dec-97       -0.11     1.72     7.902       3.991      1.0063
Jan-98       -0.25     1.11    -4.273      -4.404      1.2837
Feb-98        0.17     7.21     2.338       9.763     -0.0753
Table 3 (Continued)

                                     Monthly Returns for
          Change in                  Electric  Commercial  Lehman U.S.
          Lehman Bros      S&P 500   Utility   Bank        Aggregate
Month     Treasury Yield   Return    Sector    Sector      Bond Index

Mar-98     0.05     5.12     7.850     7.205    0.3441
Apr-98     0.00     1.01    -3.234     2.135    0.5223
May-98    -0.08    -1.72    -0.442    -3.200    0.9481
Jun-98    -0.09     4.06     3.717     2.444    0.8483
Jul-98     0.03    -1.06    -4.566     0.918    0.2122
Aug-98    -0.46   -14.46     7.149   -24.907    1.6277
Sep-98    -0.53     6.41     5.613     2.718    2.3412
Oct-98     0.05     8.13    -2.061     9.999   -0.5276
Nov-98     0.17     6.06     1.631     5.981    0.5664
Dec-98     0.02     5.76     2.608     2.567    0.3007
Jan-99    -0.01     4.18    -6.072    -0.798    0.7143
Feb-99     0.55    -3.11    -5.263     0.524   -1.7460
Mar-99    -0.05     4.00    -2.183     1.370    0.5548
Apr-99     0.05     3.87     6.668     7.407    0.3170
May-99     0.31    -2.36     7.613    -6.782   -0.8763
Jun-99     0.11     5.55    -4.911     5.544   -0.3194
Jul-99     0.11    -3.12    -2.061    -7.351   -0.4248
Aug-99     0.10    -0.50     1.508    -4.507   -0.0508
Sep-99    -0.08    -2.74    -5.267    -6.093    1.1604
Oct-99     0.11     6.33     1.800    15.752    0.3689
Nov-99     0.16     2.03    -8.050    -7.634   -0.0069
Dec-99     0.24     5.89    -0.187    -9.158   -0.4822
Jan-00     0.19    -5.02     5.112    -2.293   -0.3272
Feb-00    -0.13    -1.89   -10.030   -12.114    1.2092
Mar-00    -0.20     9.78     1.671    18.770    1.3166
Apr-00     0.17    -3.01    14.456    -5.885   -0.2854
May-00     0.07    -2.05     2.985    11.064   -0.0459
Jun-00    -0.26     2.47    -5.594   -14.389    2.0803
Jul-00    -0.08    -1.56     6.937     6.953    0.9077
Aug-00    -0.17     6.21    13.842    12.309    1.4497
Sep-00    -0.03    -5.28    12.413     1.812    0.6286
Oct-00    -0.06    -0.42    -3.386    -1.380    0.6608
Nov-00    -0.31    -7.88     3.957    -3.582    1.6355
Dec-00    -0.33     0.49     4.607    12.182    1.8554
Jan-01    -0.22     3.55   -11.234     3.169    1.6346
Feb-01    -0.16    -9.12     6.747    -3.740    0.8713
Mar-01    -0.08    -6.33     1.769     0.017    0.5018
Apr-01     0.22     7.77     5.025    -1.538   -0.4151
May-01     0.00     0.67     0.205     5.934    0.6041
Jun-01     0.01    -2.43    -7.248     0.004    0.3773
Jul-01    -0.40    -0.98    -5.092     2.065    2.2357
Aug-01    -0.14    -6.26    -0.149    -3.940    1.1458
Sep-01    -0.41    -8.08   -10.275    -4.425    1.1647
Oct-01    -0.39     1.91     1.479    -7.773    2.0930
Nov-01     0.41     7.67    -0.833     7.946   -1.3789
Dec-01     0.21     0.88     3.328     3.483   -0.6357
Jan-02     0.00    -1.46    -3.673     1.407    0.8096
Feb-02    -0.08    -1.93    -2.214    -0.096    0.9690
Mar-02     0.56     3.76    10.623     7.374   -1.6632
Apr-02    -0.44    -6.06     1.652     2.035    1.9393

(Continued)
Table 3 (Continued)

                                     Monthly Returns for
          Change in                  Electric  Commercial  Lehman U.S.
          Lehman Bros      S&P 500   Utility   Bank        Aggregate
Month     Treasury Yield   Return    Sector    Sector      Bond Index

May-02    -0.06    -0.74    -3.988     1.247    0.8495
Jun-02    -0.23    -7.12    -4.194    -3.767    0.8651
Jul-02    -0.50    -7.80   -10.827    -4.957    1.2062
Aug-02    -0.17     0.66     2.792     3.628    1.6882
Sep-02    -0.45   -10.87    -8.677   -10.142    1.6199
Oct-02     0.11     8.80    -2.802     5.143   -0.4559
Nov-02     0.34     5.89     1.620     0.827   -0.0264
Dec-02    -0.45    -5.88     5.434    -2.454    2.0654
Jan-03     0.11    -2.62    -3.395    -0.111    0.0855
Feb-03    -0.21    -1.50    -2.712    -1.514    1.3843
Mar-03     0.05     0.97     4.150    -3.296   -0.0773
Apr-03    -0.03     8.24     5.438     9.806    0.8254
May-03    -0.33     5.27    10.519     5.271    1.8645
Jun-03     0.08     1.28     1.470     1.988   -0.1986
Jul-03     0.66     1.76    -5.649     3.331   -3.3620
Aug-03     0.05     1.95     1.342    -1.218    0.6637
Sep-03    -0.46    -1.06     4.993    -0.567    2.6469
Oct-03     0.33     5.66     0.620     8.717   -0.9320
Nov-03     0.13     0.88     0.136     1.428    0.2391
Dec-03    -0.14     5.24        NA        NA        NA


by the S&P 500. For the Commercial Bank sector, the coefficient
of the changes in Treasury rates is not statistically significant;
however, the coefficient of the return on the S&P 500 is
statistically significant. The opposite is the case for the
Lehman U.S. Aggregate Bond Index. It is interesting to note that
the duration for the Lehman U.S. Aggregate Bond Index as reported
by Lehman Brothers was about 4.55 in November 2003. The empirical
duration is 4.1. While the sign of the coefficient that is an
estimate of duration is negative (which means the price moves in
the opposite direction to the change in interest rates), market
participants talk in terms of the positive value of duration for
a bond that has this characteristic.
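
As a rough illustration of how the duration coefficient is read, the sketch below plugs one month of data into the two fitted equations for the Lehman U.S. Aggregate Bond Index. The coefficients are copied from Table 4 and the inputs are the November 2003 row of Table 3; the function and variable names are ours and purely illustrative.

```python
# Coefficients for the Lehman U.S. Aggregate Bond Index, copied from Table 4.
SIMPLE = {"intercept": 0.5308, "d_yield": -4.1062}
MULTIPLE = {"intercept": 0.5029, "d_yield": -4.0885, "sp500": 0.0304}


def fitted_return_simple(d_yield):
    """Fitted monthly return (%) from the change in the Treasury yield alone."""
    return SIMPLE["intercept"] + SIMPLE["d_yield"] * d_yield


def fitted_return_multiple(d_yield, sp500_return):
    """Fitted monthly return (%) from the yield change and the S&P 500 return."""
    return (MULTIPLE["intercept"]
            + MULTIPLE["d_yield"] * d_yield
            + MULTIPLE["sp500"] * sp500_return)


# Empirical duration is quoted as the positive value of the yield coefficient.
empirical_duration = -MULTIPLE["d_yield"]

# November 2003 row of Table 3: yield change 0.13, S&P 500 return 0.88.
simple_fit = fitted_return_simple(0.13)
multiple_fit = fitted_return_multiple(0.13, 0.88)
print(empirical_duration, simple_fit, multiple_fit)
```

Both fitted values can be compared with the 0.2391 return reported for the index in that month; the gap is that month's residual.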

Predicting the 10-Year 
Treasury Yield 2 

The U.S. Department of the Treasury issues two types of
securities: zero-coupon securities and coupon securities.
Securities issued with one year or less to maturity are called
Treasury bills; they are issued as zero-coupon instruments.
Treasury securities with more than one year to maturity are
issued as coupon-bearing securities. Treasury securities from
more than one year up to 10 years of maturity are called
Treasury notes; Treasury securities with a maturity in excess of
10 years are called Treasury bonds. The U.S. Treasury auctions
securities of specified maturities on a regular calendar basis.
The Treasury currently issues 30-year Treasury bonds but
suspended their issuance from October 2001 to January 2006.

An important Treasury coupon security is the 10-year Treasury
note. In this illustration we will try to forecast its yield
based on two independent variables suggested by economic theory.
A well-known theory of interest rates is that the interest rate
in any economy consists of two components. This relationship is
known as Fisher's law. The first is the expected
Table 4 Estimation of Regression Parameters for Empirical Duration

                                     Electric    Commercial   Lehman U.S.
                                     Utility     Bank         Aggregate
                                     Sector      Sector       Bond Index

a. Simple Linear Regression

Intercept α_i                          0.6376      1.1925       0.5308
  t-statistic                          1.8251      2.3347      21.1592
  p-value                              0.0698      0.0207       0.0000
Change in the Treasury yield β_i      -4.5329     -2.5269      -4.1062
  t-statistic                         -3.4310     -1.3083     -43.2873
  p-value                              0.0008      0.1926       0.0000
R²                                     0.0655      0.0101       0.9177
F-value                               11.7717      1.7116    1873.8000
  p-value                              0.0007      0.1926       0.0000

b. Multiple Linear Regression

Intercept α_i                          0.3937      0.2199       0.5029
  t-statistic                          1.1365      0.5835      21.3885
  p-value                              0.2574      0.5604       0.0000
Change in the Treasury yield β_1i     -4.3780     -1.9096      -4.0885
  t-statistic                         -3.4143     -1.3686     -46.9711
  p-value                              0.0008      0.1730       0.0000
Return on the S&P 500 β_2i             0.2664      1.0620       0.0304
  t-statistic                          3.4020     12.4631       5.7252
  p-value                              0.0008      0.0000       0.0000
R²                                     0.1260      0.4871       0.9312
F-value                               12.0430     79.3060    1130.5000
  p-value                              0.00001     0.00000      0.00000


rate of inflation. The second is the real rate of interest. We
use regression analysis to produce a model that forecasts the
yield on the 10-year Treasury note (simply, the 10-year Treasury
yield), which is the dependent variable, from two independent
variables: the expected rate of inflation (simply, expected
inflation) and the real rate of interest (simply, real rate).

The 10-year Treasury yield is observable, but the two
independent variables (i.e., the expected rate of inflation and
the real rate of interest) are not observable at the time of the
forecast. Keep in mind that since we are forecasting, we cannot
use as an independent variable any information that is
unavailable at the time of the forecast. Consequently, for each
independent variable we need a proxy that is available at that
time.


The inflation rate is available from the U.S. Department of
Commerce. However, we need a proxy for expected inflation. We can
use some type of average of past inflation as a proxy. In our
model, we use a 5-year moving average. There are more
sophisticated methodologies for calculating expected inflation;
for example, one can use an exponential smoothing of actual
inflation, a methodology used by the OECD. The 5-year moving
average is, however, sufficient for our illustration. For the
real rate, we use the rate on 3-month certificates of deposit
(CDs). Again, we use a 5-year moving average.
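
The proxy construction described above can be sketched as a trailing moving average. The sketch assumes monthly observations, so that five years correspond to a 60-month window, and assumes the window ends at the current month; the text does not state the exact window convention, and the function name is ours.

```python
import numpy as np


def trailing_moving_average(values, window=60):
    """Mean of the last `window` observations at each date.

    The first window - 1 entries are NaN because a full window is not
    yet available. window=60 assumes monthly data and a 5-year average.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if len(values) < window:
        return out
    # Cumulative sums make every window sum a difference of two entries.
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


# Small check with a 3-observation window instead of 60.
demo = trailing_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(demo)
```

Applied to a monthly inflation series, the entry at each date uses only data up to that date, which is what the forecasting setup requires.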

The monthly data for the three variables from November 1965 to
December 2005 (482 observations) are provided in Table 5. The
regression results are reported in Table 6. As can be seen,
the coefficients of both independent variables
Table 5 Monthly Data for 10-Year Treasury Yield, Expected Inflation, and Real Rate: November 1965-December 2005

Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate

1965

Nov   4.45   1.326   2.739

Dec   4.62   1.330   2.757

1966

Jan   4.61   1.334   2.780

1970 

Jan 

7.80 

3.621 

3.061 

1974 

Jan 

6.99 

4.652 

3.330 

Feb 

4.83 

1.348 

2.794 

Feb 

7.24 

3.698 

3.064 

Feb 

6.96 

4.653 

3.332 

Mar 

4.87 

1.358 

2.820 

Mar 

7.07 

3.779 

3.046 

Mar 

7.21 

4.656 

3.353 

Apr 

4.75 

1.372 

2.842 

Apr 

7.39 

3.854 

3.035 

Apr 

7.51 

4.657 

3.404 

May 

4.78 

1.391 

2.861 

May 

7.91 

3.933 

3.021 

May 

7.58 

4.678 

3.405 

June 

4.81 

1.416 

2.883 

June 

7.84 

4.021 

3.001 

June 

7.54 

4.713 

3.419 

July 

5.02 

1.440 

2.910 

July 

7.46 

4.104 

2.981 

July 

7.81 

4.763 

3.421 

Aug 

5.22 

1.464 

2.945 

Aug 

7.53 

4.187 

2.956 

Aug 

8.04 

4.827 

3.401 

Sept 

5.18 

1.487 

2.982 

Sept 

7.39 

4.264 

2.938 

Sept 

8.04 

4.898 

3.346 

Oct 

5.01 

1.532 

2.997 

Oct 

7.33 

4.345 

2.901 

Oct 

7.9 

4.975 

3.271 

Nov 

5.16 

1.566 

3.022 

Nov 

6.84 

4.436 

2.843 

Nov 

7.68 

5.063 

3.176 

Dec 

4.84 

1.594 

3.050 

Dec 

6.39 

4.520 

2.780 

Dec 

7.43 

5.154 

3.086 

1967 

Jan 

4.58 

1.633 

3.047 

1971 

Jan 

6.24 

4.605 

2.703 

1975 

Jan 

7.5 

5.243 

2.962 

Feb 

4.63 

1.667 

3.050 

Feb 

6.11 

4.680 

2.627 

Feb 

7.39 

5.343 

2.827 

Mar 

4.54 

1.706 

3.039 

Mar 

5.70 

4.741 

2.565 

Mar 

7.73 

5.431 

2.710 

Apr 

4.59 

1.739 

3.027 

Apr 

5.83 

4.793 

2.522 

Apr 

8.23 

5.518 

2.595 

May 

4.85 

1.767 

3.021 

May 

6.39 

4.844 

2.501 

May 

8.06 

5.585 

2.477 

June 

5.02 

1.801 

3.015 

June 

6.52 

4.885 

2.467 

June 

7.86 

5.639 

2.384 

July 

5.16 

1.834 

3.004 

July 

6.73 

4.921 

2.436 

July 

8.06 

5.687 

2.311 

Aug 

5.28 

1.871 

2.987 

Aug 

6.58 

4.947 

2.450 

Aug 

8.4 

5.716 

2.271 

Sept 

5.3 

1.909 

2.980 

Sept 

6.14 

4.964 

2.442 

Sept 

8.43 

5.738 

2.241 

Oct 

5.48 

1.942 

2.975 

Oct 

5.93 

4.968 

2.422 

Oct 

8.15 

5.753 

2.210 

Nov 

5.75 

1.985 

2.974 

Nov 

5.81 

4.968 

2.411 

Nov 

8.05 

5.759 

2.200 

Dec 

5.7 

2.027 

2.972 

Dec 

5.93 

4.964 

2.404 

Dec 

8 

5.761 

2.186 

1968 

Jan 

5.53 

2.074 

2.959 

1972 

Jan 

5.95 

4.959 

2.401 

1976 

Jan 

7.74 

5.771 

2.166 

Feb 

5.56 

2.126 

2.943 

Feb 

6.08 

4.959 

2.389 

Feb 

7.79 

5.777 

2.164 

Mar 

5.74 

2.177 

2.937 

Mar 

6.07 

4.953 

2.397 

Mar 

7.73 

5.800 

2.138 

Apr 

5.64 

2.229 

2.935 

Apr 

6.19 

4.953 

2.403 

Apr 

7.56 

5.824 

2.101 

May 

5.87 

2.285 

2.934 

May 

6.13 

4.949 

2.398 

May 

7.9 

5.847 

2.060 

June 

5.72 

2.341 

2.928 

June 

6.11 

4.941 

2.405 

June 

7.86 

5.870 

2.034 

July 

5.5 

2.402 

2.906 

July 

6.11 

4.933 

2.422 

July 

7.83 

5.900 

1.988 

Aug 

5.42 

2.457 

2.887 

Aug 

6.21 

4.924 

2.439 

Aug 

7.77 

5.937 

1.889 

Sept 

5.46 

2.517 

2.862 

Sept 

6.55 

4.916 

2.450 

Sept 

7.59 

5.981 

1.813 

Oct 

5.58 

2.576 

2.827 

Oct 

6.48 

4.912 

2.458 

Oct 

7.41 

6.029 

1.753 

Nov 

5.7 

2.639 

2.808 

Nov 

6.28 

4.899 

2.461 

Nov 

7.29 

6.079 

1.681 

Dec 

6.03 

2.697 

2.798 

Dec 

6.36 

4.886 

2.468 

Dec 

6.87 

6.130 

1.615 

1969 

Jan 

6.04 

2.745 

2.811 

1973 

Jan 

6.46 

4.865 

2.509 

1977 

Jan 

7.21 

6.176 

1.573 

Feb 

6.19 

2.802 

2.826 

Feb 

6.64 

4.838 

2.583 

Feb 

7.39 

6.224 

1.527 

Mar 

6.3 

2.869 

2.830 

Mar 

6.71 

4.818 

2.641 

Mar 

7.46 

6.272 

1.474 

Apr 

6.17 

2.945 

2.827 

Apr 

6.67 

4.795 

2.690 

Apr 

7.37 

6.323 

1.427 

May 

6.32 

3.016 

2.862 

May 

6.85 

4.776 

2.734 

May 

7.46 

6.377 

1.397 

June 

6.57 

3.086 

2.895 

June 

6.90 

4.752 

2.795 

June 

7.28 

6.441 

1.340 

July 

6.72 

3.156 

2.929 

July 

7.13 

4.723 

2.909 

July 

7.33 

6.499 

1.293 

Aug 

6.69 

3.236 

2.967 

Aug 

7.40 

4.699 

3.023 

Aug 

7.4 

6.552 

1.252 

Sept 

7.16 

3.315 

3.001 

Sept 

7.09 

4.682 

3.110 

Sept 

7.34 

6.605 

1.217 

Oct 

7.1 

3.393 

3.014 

Oct 

6.79 

4.668 

3.185 

Oct 

7.52 

6.654 

1.193 

Nov 

7.14 

3.461 

3.045 

Nov 

6.73 

4.657 

3.254 

Nov 

7.58 

6.710 

1.154 

Dec 

7.65 

3.539 

3.059 

Dec 

6.74 

4.651 

3.312 

Dec 

7.69 

6.768 

1.119 
Table 5 (Continued)

Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate

1978 

Jan 

7.96 

6.832 

1.068 

1982 

Jan 

14.59 

9.285 

2.497 

1986 

Jan 

9.19 

6.154 

5.284 

Feb 

8.03 

6.890 

0.995 

Feb 

14.43 

9.334 

2.612 

Feb 

8.7 

6.043 

5.249 

Mar 

8.04 

6.942 

0.923 

Mar 

13.86 

9.375 

2.741 

Mar 

7.78 

5.946 

5.225 

Apr 

8.15 

7.003 

0.854 

Apr 

13.87 

9.417 

2.860 

Apr 

7.3 

5.858 

5.143 

May 

8.35 

7.063 

0.784 

May 

13.62 

9.456 

2.958 

May 

7.71 

5.763 

5.055 

June 

8.46 

7.124 

0.716 

June 

14.3 

9.487 

3.095 

June 

7.8 

5.673 

4.965 

July 

8.64 

7.191 

0.598 

July 

13.95 

9.510 

3.183 

July 

7.3 

5.554 

4.878 

Aug 

8.41 

7.263 

0.482 

Aug 

13.06 

9.524 

3.259 

Aug 

7.17 

5.428 

4.789 

Sept 

8.42 

7.331 

0.397 

Sept 

12.34 

9.519 

3.321 

Sept 

7.45 

5.301 

4.719 

Oct 

8.64 

7.400 

0.365 

Oct 

10.91 

9.517 

3.363 

Oct 

7.43 

5.186 

4.671 

Nov 

8.81 

7.463 

0.322 

Nov 

10.55 

9.502 

3.427 

Nov 

7.25 

5.078 

4.680 

Dec 

9.01 

7.525 

0.284 

Dec 

10.54 

9.469 

3.492 

Dec 

7.11 

4.982 

4.655 

1979 

Jan 

9.1 

7.582 

0.254 

1983 

Jan 

10.46 

9.439 

3.553 

1987 

Jan 

7.08 

4.887 

4.607 

Feb 

9.1 

7.645 

0.224 

Feb 

10.72 

9.411 

3.604 

Feb 

7.25 

4.793 

4.558 

Mar 

9.12 

7.706 

0.174 

Mar 

10.51 

9.381 

3.670 

Mar 

7.25 

4.710 

4.493 

Apr 

9.18 

7.758 

0.108 

Apr 

10.4 

9.340 

3.730 

Apr 

8.02 

4.627 

4.445 

May 

9.25 

7.797 

0.047 

May 

10.38 

9.288 

3.806 

May 

8.61 

4.551 

4.404 

June 

8.91 

7.821 

-0.025 

June 

10.85 

9.227 

3.883 

June 

8.4 

4.476 

4.335 

July 

8.95 

7.834 

-0.075 

July 

11.38 

9.161 

3.981 

July 

8.45 

4.413 

4.296 

Aug 

9.03 

7.837 

-0.101 

Aug 

11.85 

9.087 

4.076 

Aug 

8.76 

4.361 

4.273 

Sept 

9.33 

7.831 

-0.085 

Sept 

11.65 

9.012 

4.152 

Sept 

9.42 

4.330 

4.269 

Oct 

10.3 

7.823 

0.011 

Oct 

11.54 

8.932 

4.204 

Oct 

9.52 

4.302 

4.259 

Nov 

10.65 

7.818 

0.079 

Nov 

11.69 

8.862 

4.243 

Nov 

8.86 

4.285 

4.243 

Dec 

10.39 

7.818 

0.154 

Dec 

11.83 

8.800 

4.276 

Dec 

8.99 

4.279 

4.218 

1980 

Jan 

10.8 

7.825 

0.261 

1984 

Jan 

11.67 

8.741 

4.324 

1988 

Jan 

8.67 

4.274 

4.180 

Feb 

12.41 

7.828 

0.418 

Feb 

11.84 

8.670 

4.386 

Feb 

8.21 

4.271 

4.149 

Mar 

12.75 

7.849 

0.615 

Mar 

12.32 

8.598 

4.459 

Mar 

8.37 

4.268 

4.104 

Apr 

11.47 

7.879 

0.701 

Apr 

12.63 

8.529 

4.530 

Apr 

8.72 

4.270 

4.075 

May 

10.18 

7.926 

0.716 

May 

13.41 

8.460 

4.620 

May 

9.09 

4.280 

4.036 

June 

9.78 

7.989 

0.702 

June 

13.56 

8.393 

4.713 

June 

8.92 

4.301 

3.985 

July 

10.25 

8.044 

0.695 

July 

13.36 

8.319 

4.793 

July 

9.06 

4.322 

3.931 

Aug 

11.1 

8.109 

0.716 

Aug 

12.72 

8.241 

4.862 

Aug 

9.26 

4.345 

3.879 

Sept 

11.51 

8.184 

0.740 

Sept 

12.52 

8.164 

4.915 

Sept 

8.98 

4.365 

3.844 

Oct 

11.75 

8.269 

0.795 

Oct 

12.16 

8.081 

4.908 

Oct 

8.8 

4.381 

3.810 

Nov 

12.68 

8.356 

0.895 

Nov 

11.57 

7.984 

4.919 

Nov 

8.96 

4.385 

3.797 

Dec 

12.84 

8.446 

1.004 

Dec 

12.5 

7.877 

4.928 

Dec 

9.11 

4.384 

3.787 

1981 

Jan 

12.57 

8.520 

1.132 

1985 

Jan 

11.38 

7.753 

4.955 

1989 

Jan 

9.09 

4.377 

3.786 

Feb 

13.19 

8.594 

1.242 

Feb 

11.51 

7.632 

4.950 

Feb 

9.17 

4.374 

3.792 

Mar 

13.12 

8.649 

1.336 

Mar 

11.86 

7.501 

4.900 

Mar 

9.36 

4.367 

3.791 

Apr 

13.68 

8.700 

1.477 

Apr 

11.43 

7.359 

4.954 

Apr 

9.18 

4.356 

3.784 

May 

14.1 

8.751 

1.619 

May 

10.85 

7.215 

5.063 

May 

8.86 

4.344 

3.758 

June 

13.47 

8.802 

1.755 

June 

10.16 

7.062 

5.183 

June 

8.28 

4.331 

3.723 

July 

14.28 

8.877 

1.897 

July 

10.31 

6.925 

5.293 

July 

8.02 

4.320 

3.679 

Aug 

14.94 

8.956 

2.037 

Aug 

10.33 

6.798 

5.346 

Aug 

8.11 

4.306 

3.644 

Sept 

15.32 

9.039 

2.155 

Sept 

10.37 

6.664 

5.383 

Sept 

8.19 

4.287 

3.623 

Oct 

15.15 

9.110 

2.256 

Oct 

10.24 

6.528 

5.399 

Oct 

8.01 

4.273 

3.614 

Nov 

13.39 

9.175 

2.305 

Nov 

9.78 

6.399 

5.360 

Nov 

7.87 

4.266 

3.609 

Dec 

13.72 

9.232 

2.392 

Dec 

9.26 

6.269 

5.326 

Dec 

7.84 

4.258 

3.611 


(Continued)
Table 5 (Continued)

Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.
1990 

Jan 

8.418 

4.257 

3.610 

1994 

Jan 

5.642 

4.256 

1.739 

1998 

Jan 

5.505 

2.828 

Feb 

8.515 

4.254 

3.595 

Feb 

6.129 

4.224 

1.663 

Feb 

5.622 

2.806 

Mar 

8.628 

4.254 

3.585 

Mar 

6.738 

4.195 

1.586 

Mar 

5.654 

2.787 

Apr 

9.022 

4.260 

3.580 

Apr 

7.042 

4.166 

1.523 

Apr 

5.671 

2.765 

May 

8.599 

4.264 

3.586 

May 

7.147 

4.135 

1.473 

May 

5.552 

2.744 

June 

8.412 

4.272 

3.589 

June 

7.32 

4.106 

1.427 

June 

5.446 

2.725 

July 

8.341 

4.287 

3.568 

July 

7.111 

4.079 

1.394 

July 

5.494 

2.709 

Aug 

8.846 

4.309 

3.546 

Aug 

7.173 

4.052 

1.356 

Aug 

4.976 

2.695 

Sept 

8.795 

4.335 

3.523 

Sept 

7.603 

4.032 

1.315 

Sept 

4.42 

2.680 

Oct 

8.617 

4.357 

3.503 

Oct 

7.807 

4.008 

1.289 

Oct 

4.605 

2.666 

Nov 

8.252 

4.371 

3.493 

Nov 

7.906 

3.982 

1.278 

Nov 

4.714 

2.653 

Dec 

8.067 

4.388 

3.471 

Dec 

7.822 

3.951 

1.278 

Dec 

4.648 

2.641 

1991 

Jan 

8.007 

4.407 

3.436 

1995 

Jan 

7.581 

3.926 

1.269 

1999 

Jan 

4.651 

2.631 

Feb 

8.033 

4.431 

3.396 

Feb 

7.201 

3.899 

1.261 

Feb 

5.287 

2.621 

Mar 

8.061 

4.451 

3.360 

Mar 

7.196 

3.869 

1.253 

Mar 

5.242 

2.605 

Apr 

8.013 

4.467 

3.331 

Apr 

7.055 

3.840 

1.240 

Apr 

5.348 

2.596 

May 

8.059 

4.487 

3.294 

May 

6.284 

3.812 

1.230 

May 

5.622 

2.586 

June 

8.227 

4.504 

3.267 

June 

6.203 

3.781 

1.222 

June 

5.78 

2.572 

July 

8.147 

4.517 

3.247 

July 

6.426 

3.746 

1.223 

July 

5.903 

2.558 

Aug 

7.816 

4.527 

3.237 

Aug 

6.284 

3.704 

1.228 

Aug 

5.97 

2.543 

Sept 

7.445 

4.534 

3.223 

Sept 

6.182 

3.662 

1.232 

Sept 

5.877 

2.527 

Oct 

7.46 

4.540 

3.207 

Oct 

6.02 

3.624 

1.234 

Oct 

6.024 

2.515 

Nov 

7.376 

4.552 

3.177 

Nov 

5.741 

3.587 

1.229 

Nov 

6.191 

2.502 

Dec 

6.699 

4.562 

3.133 

Dec 

5.572 

3.549 

1.234 

Dec 

6.442 

2.490 

1992 

Jan 

7.274 

4.569 

3.092 

1996 

Jan 

5.58 

3.505 

1.250 

2000 

Jan 

6.665 

2.477 

Feb 

7.25 

4.572 

3.054 

Feb 

6.098 

3.458 

1.270 

Feb 

6.409 

2.464 

Mar 

7.528 

4.575 

3.014 

Mar 

6.327 

3.418 

1.295 

Mar 

6.004 

2.455 

Apr 

7.583 

4.574 

2.965 

Apr 

6.67 

3.376 

1.328 

Apr 

6.212 

2.440 

May 

7.318 

4.571 

2.913 

May 

6.852 

3.335 

1.359 

May 

6.272 

2.429 

June 

7.121 

4.567 

2.864 

June 

6.711 

3.297 

1.387 

June 

6.031 

2.421 

July 

6.709 

4.563 

2.810 

July 

6.794 

3.261 

1.417 

July 

6.031 

2.412 

Aug 

6.604 

4.556 

2.757 

Aug 

6.943 

3.228 

1.449 

Aug 

5.725 

2.406 

Sept 

6.354 

4.544 

2.682 

Sept 

6.703 

3.195 

1.481 

Sept 

5.802 

2.398 

Oct 

6.789 

4.533 

2.624 

Oct 

6.339 

3.163 

1.516 

Oct 

5.751 

2.389 

Nov 

6.937 

4.522 

2.571 

Nov 

6.044 

3.131 

1.558 

Nov 

5.468 

2.382 

Dec 

6.686 

4.509 

2.518 

Dec 

6.418 

3.102 

1.608 

Dec 

5.112 

2.374 

1993 

Jan 

6.359 

4.495 

2.474 

1997 

Jan 

6.494 

3.077 

1.656 

2001 

Jan 

5.114 

2.368 

Feb 

6.02 

4.482 

2.427 

Feb 

6.552 

3.057 

1.698 

Feb 

4.896 

2.366 

Mar 

6.024 

4.466 

2.385 

Mar 

6.903 

3.033 

1.746 

Mar 

4.917 

2.364 

Apr 

6.009 

4.453 

2.330 

Apr 

6.718 

3.013 

1.795 

Apr 

5.338 

2.364 

May 

6.149 

4.439 

2.272 

May 

6.659 

2.990 

1.847 

May 

5.381 

2.362 

June 

5.776 

4.420 

2.214 

June 

6.5 

2.968 

1.899 

June 

5.412 

2.363 

July 

5.807 

4.399 

2.152 

July 

6.011 

2.947 

1.959 

July 

5.054 

2.363 

Aug 

5.448 

4.380 

2.084 

Aug 

6.339 

2.926 

2.016 

Aug 

4.832 

2.365 

Sept 

5.382 

4.357 

2.020 

Sept 

6.103 

2.909 

2.078 

Sept 

4.588 

2.365 

Oct 

5.427 

4.333 

1.958 

Oct 

5.831 

2.888 

2.136 

Oct 

4.232 

2.366 

Nov 

5.819 

4.309 

1.885 

Nov 

5.874 

2.866 

2.189 

Nov 

4.752 

2.368 

Dec 

5.794 

4.284 

1.812 

Dec 

5.742 

2.847 

2.247 

Dec 

5.051 

2.370 
Table 5 (Continued)

Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate    Date  10-Yr. Trea. Yield  Exp. Infl.  Real Rate

2002 




2004 




Jan 

5.033 

2.372 

2.950 

Jan 

4.134 

2.172 

1.492 

Feb 

4.877 

2.372 

2.888 

Feb 

3.973 

2.157 

1.442 

Mar 

5.396 

2.371 

2.827 

Mar 

3.837 

2.149 

1.385 

Apr 

5.087 

2.369 

2.764 

Apr 

4.507 

2.142 

1.329 

May 

5.045 

2.369 

2.699 

May 

4.649 

2.136 

1.273 

June 

4.799 

2.367 

2.636 

June 

4.583 

2.134 

1.212 

July 

4.461 

2.363 

2.575 

July 

4.477 

2.129 

1.156 

Aug 

4.143 

2.364 

2.509 

Aug 

4.119 

2.126 

1.097 

Sept 

3.596 

2.365 

2.441 

Sept 

4.121 

2.124 

1.031 

Oct 

3.894 

2.365 

2.374 

Oct 

4.025 

2.122 

0.966 

Nov 

4.207 

2.362 

2.302 

Nov 

4.351 

2.124 

0.903 

Dec 

3.816 

2.357 

2.234 

Dec 

4.22 

2.129 

0.840 

2003 




2005 




Jan 

3.964 

2.351 

2.168 

Jan 

4.13 

2.131 

0.783 

Feb 

3.692 

2.343 

2.104 

Feb 

4.379 

2.133 

0.727 

Mar 

3.798 

2.334 

2.038 

Mar 

4.483 

2.132 

0.676 

Apr 

3.838 

2.323 

1.976 

Apr 

4.2 

2.131 

0.622 

May 

3.372 

2.312 

1.913 

May 

3.983 

2.127 

0.567 

June 

3.515 

2.300 

1.850 

June 

3.915 

2.120 

0.520 

July 

4.408 

2.288 

1.786 

July 

4.278 

2.114 

0.476 

Aug 

4.466 

2.267 

1.731 

Aug 

4.016 

2.107 

0.436 

Sept 

3.939 

2.248 

1.681 

Sept 

4.326 

2.098 

0.399 

Oct 

4.295 

2.233 

1.629 

Oct 

4.553 

2.089 

0.366 

Nov 

4.334 

2.213 

1.581 

Nov 

4.486 

2.081 

0.336 

Dec 

4.248 

2.191 

1.537 

Dec 

4.393 

2.075 

0.311 

Note:
Expected Infl. (%) = expected rate of inflation as proxied by the 5-year moving average of the actual inflation rate.
Real Rate (%) = real rate of interest as proxied by the 5-year moving average of the interest rate on 3-month certificates of deposit.








Table 6 Results of Regression for Forecasting 10-Year Treasury Yield

Regression Statistics

Multiple R            0.908318
R²                    0.825042
Adjusted R²           0.824312
Standard Error        1.033764
Observations          482

Analysis of Variance

              df    SS          MS          F           Significance F
Regression      2   2413.914    1206.957    1129.404    4.8E-182
Residual      479   511.8918    1.068668
Total         481   2925.806

                      Coefficients   Standard Error   t Statistic   p-value
Intercept             1.89674        0.147593         12.85118      1.1E-32
Expected Inflation    0.996937       0.021558         46.24522      9.1E-179
Real Rate             0.352416       0.039058         9.022903      4.45E-18

are positive (as would be predicted by economic
theory) and highly significant.
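
The figures in Table 6 can be cross-checked against one another, and the fitted equation can be evaluated for a single month. The sketch below does both, using only numbers copied from Tables 5 and 6; the variable and function names are ours.

```python
# Analysis-of-variance figures copied from Table 6.
ss_regression, ss_residual, ss_total = 2413.914, 511.8918, 2925.806
df_regression, df_residual = 2, 479

# R-squared is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# The F statistic is the ratio of the two mean squares.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# Estimated coefficients copied from Table 6.
INTERCEPT, B_INFLATION, B_REAL_RATE = 1.89674, 0.996937, 0.352416


def fitted_yield(expected_inflation, real_rate):
    """Fitted 10-year Treasury yield (%) from the two proxies (%)."""
    return INTERCEPT + B_INFLATION * expected_inflation + B_REAL_RATE * real_rate


# December 2005 row of Table 5: expected inflation 2.075, real rate 0.311.
dec_2005_fit = fitted_yield(2.075, 0.311)
print(r_squared, f_stat, dec_2005_fit)
```

The December 2005 fitted value can be set against the observed yield of 4.393 in Table 5; the difference is well within the regression's standard error of about 1.03.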

NONNORMALITY AND 
AUTOCORRELATION OF THE 
RESIDUALS 

In the above discussion we assumed that there is no correlation
between the residual terms. Let's now relax this assumption. The
correlation of the residuals is critical from the point of view
of estimation. Autocorrelation of residuals is quite common in
financial estimation, where we regress quantities that are time
series.

A time series is said to be autocorrelated if each term is
correlated with its predecessor, so that the variance of each
term is partially explained by regressing each term on its
predecessor.

Recall from the previous section that we organized regressor
data in a matrix called the design matrix. Suppose that both the
regressors and the variable Y are time series data, that is,
every row of the design matrix corresponds to a moment in time.
The regression equation is written as follows:

$$Y = X\beta + \varepsilon$$

Suppose that residuals are correlated. This means that in
general $E[\varepsilon_i \varepsilon_j] = \sigma_{ij} \neq 0$.
Thus the variance-covariance matrix of the residuals
$\{\sigma_{ij}\}$ will not be a diagonal matrix as in the case of
uncorrelated residuals, but will exhibit nonzero off-diagonal
terms. We assume that we can write

$$\{\sigma_{ij}\} = \sigma^2 \Omega$$

where $\Omega$ is a positive definite symmetric matrix and
$\sigma$ is a parameter to be estimated.

If residuals are correlated, the regression parameters can still
be estimated without bias using the formula given by (26).
However, this estimate will not be optimal, in the sense that
there are other estimators with lower variance of the sampling
distribution. An optimal linear unbiased estimator has been
derived. It is called Aitken's generalized least squares (GLS)
estimator and is given by

$$\hat{\beta} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} Y \qquad (36)$$

where $\Omega$ is the residual correlation matrix.

The GLS estimator varies from sample to sample; that is, it has a
sampling distribution. It can be demonstrated that the variance
of the GLS estimator is given by the following "sandwich"
formula:

$$V(\hat{\beta}) = E\big((\hat{\beta} - \beta)(\hat{\beta} - \beta)'\big) = \sigma^2 (X'\Omega^{-1}X)^{-1} \qquad (37)$$

This expression is similar to equation (28) with the exception
of the sandwiched term $\Omega^{-1}$. Unfortunately, (37) cannot
be estimated without first knowing the regression coefficients.
For this reason, in the presence of correlation of residuals, it
is common practice to replace static regression models with
models that explicitly capture autocorrelations and produce
uncorrelated residuals.
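
To make equation (36) concrete, the following sketch computes the GLS estimate on simulated data. It is an illustration under strong assumptions, not a recipe: the residuals are taken to follow a first-order autoregression with a known coefficient `phi`, so that the correlation matrix has entries `phi ** |i - j|`. In practice this matrix is unknown and must itself be estimated. All names and parameter values are ours.

```python
import numpy as np


def ar1_omega(n, phi):
    """Correlation matrix of a stationary AR(1) process: phi ** |i - j|."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])


def gls(X, y, omega):
    """Aitken's GLS estimator (X' O^-1 X)^-1 X' O^-1 y, as in (36)."""
    xt_oi = X.T @ np.linalg.inv(omega)
    return np.linalg.solve(xt_oi @ X, xt_oi @ y)


def ols(X, y):
    """Ordinary least squares estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Simulated data (hypothetical): intercept 1, slope 2, AR(1) residuals.
rng = np.random.default_rng(0)
n, phi = 200, 0.8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eta = rng.normal(size=n)
eps = np.zeros(n)
eps[0] = eta[0] / np.sqrt(1.0 - phi ** 2)  # draw from the stationary law
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + eta[t]
y = X @ np.array([1.0, 2.0]) + eps

beta_ols = ols(X, y)
beta_gls = gls(X, y, ar1_omega(n, phi))
print(beta_ols, beta_gls)
```

Two properties are worth noting. With an identity correlation matrix the GLS formula collapses to OLS, and GLS is numerically the same as running OLS after premultiplying both sides by the inverse Cholesky factor of the correlation matrix.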

The key idea here is that autocorrelated residuals signal that
the modeling exercise has not been completed. If residuals are
autocorrelated, the residuals at a generic time t can be
predicted from residuals at an earlier time. For example,
suppose that we are linearly regressing a time series of returns
$r_t$ on N factors:

$$r_t = \alpha_1 f_{1,t-1} + \cdots + \alpha_N f_{N,t-1} + \varepsilon_t$$

Suppose that the residual terms $\varepsilon_t$ are
autocorrelated and that we can write regressions of the type

$$\varepsilon_t = \varphi \varepsilon_{t-1} + \eta_t$$

where $\eta_t$ are now uncorrelated variables. If we ignore this
autocorrelation, valuable forecasting information is lost. Our
initial model has to be replaced with the following model:

$$r_t = \alpha_1 f_{1,t-1} + \cdots + \alpha_N f_{N,t-1} + \varepsilon_t$$

$$\varepsilon_t = \varphi \varepsilon_{t-1} + \eta_t$$

with the initial condition $\varepsilon_0$.
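
A minimal sketch of the second equation follows, assuming the residuals of the first regression have already been computed. It estimates the autoregressive coefficient by regressing each residual on its predecessor without an intercept, then recovers the uncorrelated innovations. The function names are ours, and the no-intercept regression is one simple choice among several.

```python
import numpy as np


def ar1_coefficient(residuals):
    """Least squares slope of e_t on e_{t-1}, with no intercept."""
    e = np.asarray(residuals, dtype=float)
    lagged, current = e[:-1], e[1:]
    return float(lagged @ current / (lagged @ lagged))


def innovations(residuals, phi):
    """eta_t = e_t - phi * e_{t-1}, for t = 1, ..., T - 1."""
    e = np.asarray(residuals, dtype=float)
    return e[1:] - phi * e[:-1]


# A residual series that halves at each step has phi = 0.5 exactly.
example = [1.0, 0.5, 0.25, 0.125]
phi_hat = ar1_coefficient(example)
eta_hat = innovations(example, phi_hat)
print(phi_hat, eta_hat)
```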
Detecting Autocorrelation

How do we detect the autocorrelation of residuals? Suppose that
we believe that there is a reasonable linear relationship
between two variables, for instance stock returns and some
fundamental variable. We then perform a linear regression
between the two variables and estimate regression parameters
using the OLS method. After estimating the regression
parameters, we can compute the sequence of residuals. At this
point, we can apply tests such as the Durbin-Watson test or the
Dickey-Fuller test to gauge the autocorrelation of residuals. If
residuals are autocorrelated, we should modify the model.
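
The Durbin-Watson statistic mentioned above is simple to compute from a residual series: it is the sum of squared successive differences divided by the sum of squared residuals. The sketch below shows only the statistic, not the test's critical values; the function name is ours.

```python
import numpy as np


def durbin_watson(residuals):
    """Sum of squared first differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Residuals that never change sign: strong positive autocorrelation.
dw_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])

# Residuals that flip sign every period: strong negative autocorrelation.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])

print(dw_persistent, dw_alternating)
```

The statistic lies between 0 and 4 and is approximately 2(1 - ρ), where ρ is the first-order sample autocorrelation of the residuals. Values near 2 therefore suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.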


PITFALLS OF REGRESSIONS 

It is important to understand when regressions are correctly
applicable and when they are not. In addition to the
autocorrelation of residuals, there are other situations where
it would be inappropriate to use regressions. In particular, we
analyze the following cases, which represent possible pitfalls
of regressions:

• Spurious regressions with integrated variables

• Collinearity

• Increasing the number of regressors

Spurious Regressions 

The phenomenon of spurious regressions, observed by Yule in
1927, led to the study of cointegration. We encounter spurious
regressions when we perform an apparently meaningful regression
between variables that are independent. A typical case is a
regression between two independent random walks. Regressing two
independent random walks, one might find very high values of
$R^2$ even though the two processes are independent. More
generally, one might find high values of $R^2$ in the regression
of two or more integrated variables, even if residuals are
highly correlated.


Testing for regressions implies testing for 
cointegration. Anticipating what will be dis¬ 
cussed there, it is always meaningful to per¬ 
form regressions between stationary variables. 
When variables are integrated, regressions are 
possible only if variables are cointegrated. This 
means that residuals are a stationary (though 
possibly autocorrelated) process. As a rule of 
thumb. Granger and Newbold (1974) observe 
that if the R 2 is greater than the Durbin-Watson 
statistics, it is appropriate to investigate if cor¬ 
relations are spurious. 
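The following sketch simulates the typical case mentioned above: two independent random walks regressed on each other in levels and then in first differences. The sample size, seed, and helper name `ols_r2_dw` are assumptions for the example. It reports R² and the Durbin-Watson statistic so that the Granger-Newbold rule of thumb can be checked by eye; the outcome for any single simulated path will vary.

```python
import numpy as np

def ols_r2_dw(x, y):
    """Regress y on a constant and x; return (R-squared, Durbin-Watson statistic)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return r2, dw

rng = np.random.default_rng(1)
T = 1000
dx = rng.normal(size=T)   # independent increments
dy = rng.normal(size=T)
x_walk = np.cumsum(dx)    # two independent random walks (integrated variables)
y_walk = np.cumsum(dy)

r2_levels, dw_levels = ols_r2_dw(x_walk, y_walk)  # regression in levels
r2_diffs, dw_diffs = ols_r2_dw(dx, dy)            # regression on stationary differences

# Rule of thumb: R-squared above the Durbin-Watson statistic warrants a closer look
suspicious = r2_levels > dw_levels
print(f"levels: R2={r2_levels:.3f}, DW={dw_levels:.3f}, flag={suspicious}")
print(f"diffs:  R2={r2_diffs:.3f}, DW={dw_diffs:.3f}")
```

In the levels regression the Durbin-Watson statistic is typically close to zero, which is the symptom the rule of thumb exploits; in differences it returns to the neighborhood of 2 and R² collapses.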


Collinearity 

Collinearity, also referred to as multicollinearity, occurs when two
or more regressors have a linear deterministic relationship. For
example, there is collinearity if the design matrix

$$X = \begin{pmatrix} X_{11} & \cdots & X_{N1} \\ \vdots & \ddots & \vdots \\ X_{1T} & \cdots & X_{NT} \end{pmatrix}$$

exhibits two or more columns that are perfectly proportional.
Collinearity is essentially a numerical problem. Intuitively, it is
clear that it creates indeterminacy as we are regressing twice on the
same variable. In particular, the standard estimators given by (26)
and (27) cannot be used because the relative formulas become
meaningless.

In principle, collinearity can be easily resolved by eliminating one
or more regressors. The problem with collinearity is that some
variables might be very close to collinearity, thus leading to
numerical problems and indeterminacy of results. In practice, this
might happen because of many different numerical artifacts. Detecting
and analyzing collinearity is a rather delicate problem. In principle
one could detect collinearity by computing the determinant of X'X. The
difficulty resides in analyzing situations where this determinant is
very small but not zero. One possible strategy for detecting and
removing collinearity is to go through a process of orthogonalization
of variables. (See Hendry [1995].)
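A small numerical sketch of exact and near collinearity follows. The simulated regressors and the perturbation size `1e-6` are assumptions for the example. Besides the determinant of X'X mentioned above, the sketch also reports the rank and the condition number of the design matrix; these are diagnostics swapped in here for illustration, not ones the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
x3_exact = 2.0 * x1                             # perfectly proportional to x1
x3_near = 2.0 * x1 + 1e-6 * rng.normal(size=T)  # almost proportional to x1

X_ok = np.column_stack([x1, x2])
X_exact = np.column_stack([x1, x2, x3_exact])
X_near = np.column_stack([x1, x2, x3_near])

rank_exact = np.linalg.matrix_rank(X_exact)  # fewer than 3: one column is redundant
cond_ok = np.linalg.cond(X_ok)               # modest for unrelated regressors
cond_near = np.linalg.cond(X_near)           # very large: nearly collinear
det_ok = np.linalg.det(X_ok.T @ X_ok)
det_near = np.linalg.det(X_near.T @ X_near)  # very small but not exactly zero

print(rank_exact, cond_ok, cond_near, det_ok, det_near)
```

The near-collinear case is the delicate one: the determinant is tiny rather than zero, so no hard threshold separates it from a well-behaved design.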

Increasing the Number of 
Regressors 

Increasing the number of regressors does not always improve
regressions. The econometric theorem known as Pyrrho's lemma relates
to the number of regressors. (See Dijkstra [1995].) Pyrrho's lemma
states that by adding one special regressor to a linear regression, it
is possible to arbitrarily change the size and sign of regression
coefficients as well as to obtain an arbitrary goodness of fit. This
rather technical result seems artificial, as the regressor is an
artificially constructed variable. It is, however, a perfectly
rigorous result; it tells us that, if we add regressors without a
proper design and testing methodology, we risk obtaining spurious
results.

Pyrrho's lemma is the proof that modeling results can be arbitrarily
manipulated in-sample even in the simple context of linear
regressions. In fact, by adding regressors one might obtain an
excellent fit in-sample though these regressors might have no
predictive power out-of-sample. In addition, the size and even the
sign of the regression relationships can be artificially altered
in-sample.

The above observations are especially important for those financial
models that seek to forecast prices, returns, or rates based on
regressions over economic or fundamental variables. With modern
computers, by trial and error, one might find a complex structure of
regressions that gives very good results in-sample but has no real
forecasting power.
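The sketch below does not construct the special regressor of Pyrrho's lemma. It illustrates only the weaker, related point made above: adding regressors that are pure noise raises the in-sample fit without adding out-of-sample forecasting power. The sample sizes, the true coefficient, and the number of noise regressors are assumptions for the example.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination of predictions y_hat against observations y."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n_train, n_test, n_noise = 60, 60, 40
n = n_train + n_test

x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
noise = rng.normal(size=(n, n_noise))  # regressors unrelated to y by construction

in_sample, out_of_sample = [], []
for k in range(0, n_noise + 1, 10):    # use 0, 10, 20, 30, 40 noise regressors
    X = np.column_stack([np.ones(n), x, noise[:, :k]])
    X_tr, X_te = X[:n_train], X[n_train:]
    beta, *_ = np.linalg.lstsq(X_tr, y[:n_train], rcond=None)
    in_sample.append(r2(y[:n_train], X_tr @ beta))
    out_of_sample.append(r2(y[n_train:], X_te @ beta))

for k, r_in, r_out in zip(range(0, n_noise + 1, 10), in_sample, out_of_sample):
    print(f"{k:2d} noise regressors: in-sample R2={r_in:.2f}, out-of-sample R2={r_out:.2f}")
```

Because the models are nested, the in-sample R² cannot decrease as regressors are added, which is why in-sample fit alone is a poor guide to model selection.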

KEY POINTS 

• In regression analysis, the relationship between a random variable,
called the dependent variable, and one or more variables referred to
as the independent variables, regressors, or explanatory variables
(which can be random variables or deterministic variables) is
estimated.

• Factorization, which involves expressing a joint density as a
product of a marginal density and a conditional density, is the
conceptual basis of financial econometrics.

• An econometric model is a probe that extracts independent samples
(the noise terms) from highly dependent variables.

• Regressions have a twofold nature: they can be either (1) the
representation of dependence in terms of conditional expectations and
conditional distributions or (2) the representation of dependence of
random variables on deterministic parameters.

• In many applications in financial modeling, the regressors are
deterministic variables. Therefore, on a conceptual level, regressions
with deterministic regressors are different from cases where
regressors are random variables. In particular, a financial modeler
cannot view the regression as a conditional expectation.

• There are two main estimation techniques for estimating the
parameters of a regression: the maximum likelihood method and the
ordinary least squares method. The maximum likelihood principle
requires maximization of the log-likelihood function. The ordinary
least squares method requires minimization of the sum of the squared
residuals. Under the standard regression assumptions, the ordinary
least squares estimators are the best linear unbiased estimators.

• Because the estimated regression parameters depend on the sample,
they are random variables whose distribution is to be determined. The
sampling distributions differ depending on whether the regressors are
assumed to be fixed deterministic variables or random variables.

• A measure of the quality of approximation offered by a linear
regression is given by the variance of the residuals. If residuals are
large, the regression model has little explanatory power. However, the
size of the average residual in itself is meaningless as it has to be
compared with the range of the variables. A widely used measure of the
quality and usefulness of a regression model is given by the
coefficient of determination, denoted by R² or R-squared, that can
attain a value from zero to one. The adjusted R² is defined as R²
corrected by a penalty function that takes into account the number of
regressors in the model.

• Stepwise regression is a model-building technique for regression
designs. The two methodologies for stepwise regression are the
backward stepwise method and the backward removal method.

• A time series is said to be autocorrelated if each term is
correlated with its predecessor so that the variance of each term is
partially explained by regressing each term on its predecessor.
Autocorrelation of residuals, a violation of the regression model
assumptions, is quite common in financial estimation where financial
modelers regress quantities that are time series. When there is
autocorrelation present in a time series, the generalized least
squares estimation method is used. The Durbin-Watson test or the
Dickey-Fuller test can be used to test for the presence of
autocorrelation in the residuals.

• Three other situations where there are possible pitfalls of using
regressions are spurious regressions with integrated variables,
collinearity, and increasing the number of regressors. Spurious
regressions occur when an apparently meaningful regression between
variables that are independent is estimated. Collinearity occurs when
two or more regressors in a regression model have a linear
deterministic relationship.

• Pyrrho's lemma, which relates to the number of regressors in a
regression model, states that by adding one special regressor to a
linear regression, it is possible to arbitrarily change the size and
sign of regression coefficients as well as to obtain an arbitrary
goodness of fit. Pyrrho's lemma is the proof that modeling results can
be arbitrarily manipulated in-sample even in the simple context of
linear regressions.

NOTES 

1. The data were supplied by David Wright of 
Northern Illinois University. 

2. We are grateful to Robert Scott of the Bank for International
Settlements for suggesting this illustration and for providing the
data.

Abstract: In the application of regression analysis there are many
situations where either the dependent variable or one or more of the
regressors are categorical variables. When one or more categorical
variables are used as regressors, a financial modeler must understand
how to code the data, test for the significance of the categorical
variables, and, based on the coding, how to interpret the estimated
parameters. When the dependent variable is a categorical variable, the
model is a probability model.


There are many times in the application of regression analysis when
the financial modeler will need to include a categorical variable
rather than a continuous variable as a regressor. Categorical
variables are variables that represent group membership. For example,
given a set of bonds, the rating is a categorical variable that
indicates to what category (AA, BB, and so on) each bond belongs. A
categorical variable does not have a numerical value or a numerical
interpretation in itself. Thus the fact that a bond is in category AA
or BB does not, in itself, measure any quantitative characteristic of
the bond, though quantitative attributes such as a bond's yield spread
can be associated with each category.

In this entry, we will discuss how to deal with regressors that are
categorical variables in a regression model. There are also
applications where the dependent variable may be a categorical
variable. For example, the dependent variable could be bankruptcy or
nonbankruptcy of a company over some period of time. In such cases,
the product of a regression is a probability. Probability models of
this type include linear probability, logit regression, and probit
regression models.

INDEPENDENT CATEGORICAL VARIABLES

Categorical input variables are used to cluster input data into
different groups. That is, suppose we are given a set of input-output
data and a partition of the data set in a number of subsets $A_i$ so
that each data point belongs to one and only one set. The $A_i$
represent a categorical input variable. In financial econometrics
categories might represent, for example, different market regimes,
economic states, ratings, countries, industries, or sectors.

We cannot, per se, mix quantitative input variables and categorical
variables. For example, we cannot sum yield spreads and their ratings.
However, we can perform a transformation that allows the mixing of
categorical and quantitative variables. Let's see how. Suppose first
that there is only one categorical input variable D, one quantitative
input variable X, and one quantitative output variable Y. Consider our
set of quantitative data, that is, quantitative observations. We
organize data in a matrix form as usual:



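$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_T \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} \\ \vdots & \vdots \\ 1 & X_{T1} \end{pmatrix}$$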


Suppose data belong to two categories. An explanatory variable that
distinguishes only two categories is called a dichotomous variable.
The key is to represent a dichotomous categorical variable as a
numerical variable D, called a dummy variable, that can assume the two
values 0 and 1. We can now add the variable D to the input variables
to represent membership in one or the other group:

$$X = \begin{pmatrix} D_1 & 1 & X_{11} \\ \vdots & \vdots & \vdots \\ D_T & 1 & X_{T1} \end{pmatrix}$$

If $D_i = 0$, the data $X_i$ belong to the first category; if
$D_i = 1$, the data $X_i$ belong to the second category.

Consider now the regression equation

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

In financial econometric applications, the index i will be time or a
variable that identifies a cross section of assets, such as bond
issues. Consider that we can write three separate regression
equations, one for those data that correspond to D = 1, one for those
data that correspond to D = 0, and one for the fully pooled data.
Suppose now that the three equations differ by the intercept term but
have the same slope. Let's explicitly write the two equations for
those data that correspond to D = 1 and for those data that correspond
to D = 0:

$$Y_i = \begin{cases} \beta_{00} + \beta_1 X_i + \varepsilon_i, & \text{if } D_i = 0 \\ \beta_{01} + \beta_1 X_i + \varepsilon_i, & \text{if } D_i = 1 \end{cases}$$

where i defines the observations that belong to the first category
when the dummy variable D assumes value 0 and also defines the
observations that belong to the second category when the dummy
variable D assumes value 1. If the two categories are recession and
expansion, the first equation might hold in periods of expansion and
the second in periods of recession. If the two categories are
investment-grade bonds and noninvestment-grade bonds, the two
equations apply to different cross sections of bonds, as will be
illustrated in an example later in this entry.

Observe now that, under the assumption that only the intercept term
differs in the two equations, the two equations can be combined into a
single equation in the following way:

$$Y_i = \beta_{00} + \gamma D_i + \beta_1 X_i + \varepsilon_i$$

where $\gamma = \beta_{01} - \beta_{00}$ represents the difference of
the intercept for the two categories. In this way we have defined a
single regression equation with two independent quantitative
variables, X and D, to which we can apply all the usual tools of
regression analysis, including the ordinary least squares (OLS)
estimation method and all the tests. By estimating the coefficients of
this regression, we obtain the common slope and two intercepts.
Observe that we would obtain the same result if the categories were
inverted. However, the interpretation of the estimated parameter for
the categorical variable would differ depending on which category is
omitted.
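The combined equation can be sketched numerically as follows. The simulated data and the true values `beta00`, `beta01`, and `beta1` are assumptions for the example. The design matrix carries a constant, the dummy, and the quantitative regressor; the second fit inverts the coding to show that the fitted intercepts and slope are unchanged while the sign of the dummy coefficient flips.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 400
D = rng.integers(0, 2, size=T)         # dummy: 0 = first category, 1 = second category
x = rng.normal(size=T)

beta00, beta01, beta1 = 1.0, 3.0, 2.0  # hypothetical intercepts and common slope
y = np.where(D == 1, beta01, beta00) + beta1 * x + 0.1 * rng.normal(size=T)

# Columns: constant, dummy D, quantitative regressor X
X = np.column_stack([np.ones(T), D, x])
(b00_hat, gamma_hat, b1_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
b01_hat = b00_hat + gamma_hat          # intercept of the second category

# Inverted coding: the dummy now marks the first category
X_inv = np.column_stack([np.ones(T), 1 - D, x])
(a_hat, gamma_inv_hat, b1_inv_hat), *_ = np.linalg.lstsq(X_inv, y, rcond=None)

print(f"intercepts: {b00_hat:.3f}, {b01_hat:.3f}; slope: {b1_hat:.3f}")
print(f"gamma: {gamma_hat:.3f}; gamma with inverted coding: {gamma_inv_hat:.3f}")
```

With the inverted coding the constant estimates the intercept of the second category, so the dummy coefficient measures the same difference with the opposite sign.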

Thus far we have assumed that there is no interaction between the
categorical and the quantitative variable, that is, the slope of the
regression is the same for the two categories. This means that the
effects of variables are additive; that is, the effect of one variable
is added regardless of the value taken by the other variable. In many
applications, this is an unrealistic assumption.

Using dummy variables, the treatment of slopes is the same as that
applied to intercepts. Consider the regression equation
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and write two regression
equations for the two categories as we did above:

$$Y_i = \begin{cases} \beta_0 + \beta_{10} X_i + \varepsilon_i, & \text{if } D_i = 0 \\ \beta_0 + \beta_{11} X_i + \varepsilon_i, & \text{if } D_i = 1 \end{cases}$$

We can couple these two equations in a single equation as follows:

$$Y_i = \beta_0 + \beta_{10} X_i + \delta (D_i X_i) + \varepsilon_i$$

where $\delta = \beta_{11} - \beta_{10}$. In fact, the above equation
is identical to the first equation for $D_i = 0$ and to the second for
$D_i = 1$. This regression can be estimated with the usual OLS
methods.

In practice, it is rarely appropriate to consider only interactions
and not the intercept, which is the main effect. We call
marginalization the fact that the interaction effect is marginal with
respect to the main effect. However, we can easily construct a model
that combines both effects. In fact we can write the following
regression adding two variables, the dummy D and the interaction DX:

$$Y_i = \beta_0 + \gamma D_i + \beta_1 X_i + \delta (D_i X_i) + \varepsilon_i$$

This regression equation, which now includes three regressors,
combines both effects.
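A minimal sketch of the regression with both the dummy and the interaction term follows, on simulated data with assumed coefficients. It checks numerically that the single pooled regression with D and DX reproduces the intercepts and slopes obtained by fitting each category separately; the helper `fit_line` is introduced only for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 300
D = rng.integers(0, 2, size=T)
x = rng.normal(size=T)
# Hypothetical data: both intercept and slope differ between the two categories
y = np.where(D == 1, 3.0 + 0.5 * x, 1.0 + 2.0 * x) + 0.2 * rng.normal(size=T)

# Single regression with constant, dummy D, regressor X, and interaction D*X
X = np.column_stack([np.ones(T), D, x, D * x])
(b0, gamma, b1, delta), *_ = np.linalg.lstsq(X, y, rcond=None)

def fit_line(x_g, y_g):
    """OLS intercept and slope for one category."""
    A = np.column_stack([np.ones(len(x_g)), x_g])
    coef, *_ = np.linalg.lstsq(A, y_g, rcond=None)
    return coef

icpt0, slope0 = fit_line(x[D == 0], y[D == 0])
icpt1, slope1 = fit_line(x[D == 1], y[D == 1])

print(f"category 0: intercept {b0:.3f} vs {icpt0:.3f}, slope {b1:.3f} vs {slope0:.3f}")
print(f"category 1: intercept {b0 + gamma:.3f} vs {icpt1:.3f}, "
      f"slope {b1 + delta:.3f} vs {slope1:.3f}")
```

The coefficient estimates coincide because the dummy and its interaction let each category have its own line; what the pooled form adds is a common error variance and direct tests on gamma and delta.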

The above process of introducing dummy variables can be generalized to
regressions with multiple variables. Consider the following
regression:

$$Y_i = \beta_0 + \sum_{j=1}^{N} \beta_j X_{ij} + \varepsilon_i$$

where data can be partitioned in two categories with the use of a
dummy variable:

$$X = \begin{pmatrix} D_1 & 1 & X_{11} & \cdots & X_{1N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ D_T & 1 & X_{T1} & \cdots & X_{TN} \end{pmatrix}$$

We can introduce the dummy D as well as its interaction with the N
quantitative variables and thus write the following equation:

$$Y_i = \beta_0 + \gamma D_i + \sum_{j=1}^{N} \beta_j X_{ij} + \sum_{j=1}^{N} \delta_j (D_i X_{ij}) + \varepsilon_i$$

The above discussion depends critically on the fact that there are
only two categories, a fact that allows one to use the numerical
variable taking the values 0 and 1 to identify the two categories.
However, the process can be easily extended to multiple categories by
adding dummy variables. Suppose there are K > 2 categories. An
explanatory variable that distinguishes between more than two
categories is called a polytomous variable.

Suppose there are three categories, A, B, and C. Consider a dummy
variable D1 that assumes the value one on the elements of A and zero
on all the others. Let's now add a second dummy variable D2 that
assumes the value one on the elements of the category B and zero on
all the others. The three categories are now completely identified: A
is identified by the values (1,0) of the two dummy variables, B by the
values (0,1), and C by the values (0,0). Note that the values (1,1) do
not identify any category. This process can be extended to any number
of categories. If there are K categories, we need K − 1 dummy
variables.
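The coding of three categories with two dummy variables can be sketched as follows. The label list and the helper `dummy_code` are hypothetical. The last lines illustrate why K − 1 dummies are used alongside a constant: with all K dummies plus the constant, the columns are linearly dependent.

```python
import numpy as np

def dummy_code(labels, reference):
    """Return K-1 dummy columns (one per non-reference category) and their names."""
    categories = [c for c in sorted(set(labels)) if c != reference]
    labels = np.asarray(labels)
    columns = np.column_stack([(labels == c).astype(float) for c in categories])
    return columns, categories

labels = ["A", "B", "C", "A", "C", "B", "A"]  # hypothetical category memberships
dummies, names = dummy_code(labels, reference="C")
# A -> (1, 0), B -> (0, 1), C -> (0, 0); the pair (1, 1) never occurs

const = np.ones((len(labels), 1))
X_k_minus_1 = np.hstack([const, dummies])     # constant + K-1 dummies

all_k = np.column_stack([(np.asarray(labels) == c).astype(float) for c in "ABC"])
X_k = np.hstack([const, all_k])               # constant + K dummies

print(names, dummies.tolist())
print(np.linalg.matrix_rank(X_k_minus_1), np.linalg.matrix_rank(X_k))
```

The omitted category (C here) becomes the baseline against which the dummy coefficients are interpreted.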

How can we determine if a given categorization is useful? It is quite
obvious that many categorizations will be totally useless for the
purpose of any econometric regression. If we categorize bonds
according to the color of the logo of the issuer, it is quite obvious
that we obtain meaningless results. In other cases, however,
distinctions can be subtle and important. Consider the question of
market regime shifts or structural breaks. These are delicate
questions that can be addressed only with appropriate statistical
tests.

A word of caution about statistical tests is in order. Statistical
tests typically work under the assumptions of the model and might be
misleading if these assumptions are violated. If we try to fit a
linear model to a process that is inherently nonlinear, tests might be
misleading. It is good practice to use several tests and to be
particularly attentive to inconsistencies between test results.
Inconsistencies signal potential problems in applying tests, typically
model misspecification.

The t-statistics applied to the regression coefficients of dummy
variables offer a set of important tests to judge which regressors are
significant. The t-statistics are the coefficients divided by their
respective standard errors. The p-value associated with each
coefficient estimate is the probability, computed under the hypothesis
that the corresponding coefficient is zero (that is, that the
corresponding variable is irrelevant), of obtaining a t-statistic at
least as extreme as the one observed.

We can also use the F-test to test the significance of each specific
dummy variable. To do so we can run the regression with and without
that variable and form the corresponding F-test. The Chow test is the
F-test to gauge if all the dummy variables are collectively irrelevant
(see Chow, 1960). The Chow test is an F-test of mutual exclusion,
written as follows:

$$F = \frac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k + 1)}{k + 1}$$

where

$SSR_1$ = the sum of squared residuals of the regression run with data
in the first category without dummy variables

$SSR_2$ = the sum of squared residuals of the regression run with data
in the second category without dummy variables

$SSR$ = the sum of squared residuals of the regression run with fully
pooled data without dummy variables

$n$ = the total number of observations

$k$ = the number of regressors, so that each regression has $k + 1$
parameters including the intercept

Observe that $SSR_1 + SSR_2$ is equal to the sum of squared residuals
of the regression run on fully pooled data but with dummy variables.
Thus the Chow test is the F-test comparing the regression without
dummy variables (the restricted regression) with the regression with
dummy variables (the unrestricted regression).
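The Chow statistic can be sketched on simulated data as follows. The sample split, the coefficient vectors `beta_1` and `beta_2`, and the helper `ssr` are assumptions for the example. The sketch follows the formula above and also checks the observation that the two per-category sums of squared residuals add up to the sum of squared residuals of the pooled regression that includes the dummy and all its interactions.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals of the OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(6)
n, k = 200, 3
D = (np.arange(n) >= 150).astype(float)  # hypothetical: last 50 observations in category 2
Z = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), Z])     # constant + k regressors

beta_1 = np.array([1.0, 0.5, -0.3, 0.2])  # assumed coefficients, category 1
beta_2 = np.array([2.0, 0.1, -0.3, 0.6])  # assumed coefficients, category 2
y = np.where(D == 1, X @ beta_2, X @ beta_1) + rng.normal(size=n)

ssr_pooled = ssr(X, y)                   # fully pooled, no dummy variables
ssr_1 = ssr(X[D == 0], y[D == 0])        # first category only
ssr_2 = ssr(X[D == 1], y[D == 1])        # second category only

chow = ((ssr_pooled - (ssr_1 + ssr_2)) / (ssr_1 + ssr_2)) * ((n - 2 * (k + 1)) / (k + 1))

# Pooled regression with the dummy and its interaction with every regressor
X_dummy = np.column_stack([X, D[:, None] * X])
ssr_dummy = ssr(X_dummy, y)

print(f"Chow statistic: {chow:.2f}")
print(f"SSR1 + SSR2 = {ssr_1 + ssr_2:.4f}, SSR with dummies = {ssr_dummy:.4f}")
```

Under the hypothesis that the dummy variables are collectively irrelevant, the statistic would be compared with an F distribution with k + 1 and n − 2(k + 1) degrees of freedom.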


Illustration: Predicting Corporate Bond Yield Spreads

To illustrate the use of dummy variables, we will estimate a model to
predict corporate bond spreads (see note 1). The regression is
relative to a cross section of bonds. The regression equation is the
following:

$$\text{Spread}_i = \beta_0 + \beta_1 \text{Coupon}_i + \beta_2 \text{CoverageRatio}_i + \beta_3 \text{LoggedEBIT}_i + \varepsilon_i$$

where

Spread_i = option-adjusted spread (in basis points) for the bond issue
of company i

Coupon_i = coupon rate for the bond of company i, expressed without
considering percentage sign (i.e., 7.5% = 7.5)

CoverageRatio_i = earnings before interest, taxes, depreciation and
amortization (EBITDA) divided by interest expense for company i

LoggedEBIT_i = logarithm of earnings (earnings before interest and
taxes, EBIT, in millions of dollars) for company i

The dependent variable, Spread, is not measured by the typical nominal
spread but by the option-adjusted spread. This spread measure adjusts
for any embedded options in a bond (see Chapter 6 in Fabozzi, 2006).

Theory would suggest the following properties for the estimated
coefficients:

• The higher the coupon rate, the greater the issuer's default risk
and hence the larger the spread. Therefore, a positive coefficient for
the coupon rate is expected.

• A coverage ratio is a measure of a company's ability to satisfy
fixed obligations, such as interest, principal repayment, or lease
payments. There are various coverage ratios. The one used in this
illustration is the ratio of the earnings before interest, taxes,
depreciation, and amortization (EBITDA) divided by interest expense.
Since the higher the coverage ratio the lower the default risk, an
inverse relationship is expected between the spread and the coverage
ratio; that is, the estimated coefficient for the coverage ratio is
expected to be negative.

• There are various measures of earnings reported in financial
statements. Earnings in this illustration is defined as the trailing
12-months earnings before interest and taxes (EBIT). Holding other
factors constant, it is expected that the larger the EBIT, the lower
the default risk and therefore an inverse relationship (negative
coefficient) is expected.

We used 100 observations at two different dates, 6/6/05 and 11/28/05;
thus there are 200 observations in total. This will allow us to test
if there is a difference in the spread regression for investment-grade
and noninvestment-grade bonds using all observations. We will then
test to see if there is any structural break between the two dates. We
organize the data in matrix form as usual. Data are shown in Table 1.
The second column indicates that data belong to two categories and
suggests the use of one dummy variable. Another dummy variable is used
later to distinguish between the two dates. Let's first estimate the
regression equation for the fully pooled data, that is, all data
without any distinction in categories. The estimated coefficients for
the model and their corresponding t-statistics are shown below:


Coefficient | Estimated Coefficient | Standard Error | t-statistic | p-value
β0 | 157.01 | 89.56 | 1.753 | 0.081
β1 | 61.27 | 8.03 | 7.630 | 9.98E-13
β2 | -13.20 | 2.27 | -5.800 | 2.61E-08
β3 | -90.88 | 16.32 | -5.568 | 8.41E-08

Other regression results are:

SSR: 2.3666e+006
F-statistic: 89.38
p-value: 0
R²: 0.57


Given the high value of the F-statistic and the p-value close to zero,
the regression is significant. The coefficients for the three
regressors are statistically significant and have the expected signs.
However, the intercept term is not statistically significant. The
residuals are given in the first column of Table 2.
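As a quick arithmetic check, the reported t-statistics can be recomputed as each estimated coefficient divided by its standard error. The values below are copied from the table of pooled-regression estimates above; small differences from the reported figures are to be expected because the table entries are rounded.

```python
# Values copied from the pooled-regression table (beta_0 through beta_3)
coef = [157.01, 61.27, -13.20, -90.88]
std_err = [89.56, 8.03, 2.27, 16.32]
reported_t = [1.753, 7.630, -5.800, -5.568]

# t-statistic = estimated coefficient / standard error
t_stats = [b / s for b, s in zip(coef, std_err)]

for j, (t, r) in enumerate(zip(t_stats, reported_t)):
    print(f"beta_{j}: recomputed t = {t:.3f}, reported t = {r:.3f}")
```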

Let's now analyze if we obtain a better fit if we consider the two
categories of investment-grade and below investment-grade bonds. It
should be emphasized that this is only an exercise to show the
application of regression analysis. The conclusions we reach are not
meaningful from an econometric point of view given the small size of
the database. The new equation is written as follows:

$$\text{Spread}_i = \beta_0 + \beta_1 D1_i + \beta_2 \text{Coupon}_i + \beta_3 D1_i \text{Coupon}_i + \beta_4 \text{CoverageRatio}_i + \beta_5 D1_i \text{CoverageRatio}_i + \beta_6 \text{LoggedEBIT}_i + \beta_7 D1_i \text{LoggedEBIT}_i + \varepsilon_i$$

There are now seven variables and eight parameters to estimate. The
estimated model coefficients and the t-statistics are shown below:

Coefficient | Estimated Coefficient | Standard Error | t-statistic | p-value
β0 | 284.52 | 73.63 | 3.86 | 0.00
β1 | 597.88 | 478.74 | 1.25 | 0.21
β2 | 37.12 | 7.07 | 5.25 | 3.96E-07
β3 | -45.54 | 38.77 | -1.17 | 0.24
β4 | -10.33 | 1.84 | -5.60 | 7.24E-08
β5 | 50.13 | 40.42 | 1.24 | 0.22
β6 | -83.76 | 13.63 | -6.15 | 4.52E-09
β7 | -0.24 | 62.50 | -0.00 | 1.00

Other regression results are:

SSR: 1.4744e+006
F-statistic: 76.83
p-value: 0
R²: 0.73

The Chow test has the value 16.60. The 
F-statistic and the Chow test suggest that the 
use of dummy variables has greatly improved 
the goodness of fit of the regression, even after 
compensating for the increase in the number of

Table 1 Regression Data for the Bond Spread Application: 11/28/2005 and 06/06/2005

Issue # | Spread, 11/28/05 | CCC+ and Below | Coupon | Coverage Ratio | Logged EBIT | Spread, 6/6/05 | CCC+ and Below | Coupon | Coverage Ratio | Logged EBIT
1 | 509 | 0 | 7.400 | 2.085 | 2.121 | 473 | 0 | 7.400 | 2.087 | 2.111
2 | 584 | 0 | 8.500 | 2.085 | 2.121 | 529 | 0 | 8.500 | 2.087 | 2.111
3 | 247 | 0 | 8.375 | 9.603 | 2.507 | 377 | 0 | 8.375 | 5.424 | 2.234
4 | 73 | 0 | 6.650 | 11.507 | 3.326 | 130 | 0 | 6.650 | 9.804 | 3.263
5 | 156 | 0 | 7.125 | 11.507 | 3.326 | 181 | 0 | 7.125 | 9.804 | 3.263
6 | 240 | 0 | 7.250 | 2.819 | 2.149 | 312 | 0 | 7.250 | 2.757 | 2.227
7 | 866 | 1 | 9.000 | 1.530 | 2.297 | 852 | 1 | 9.000 | 1.409 | 1.716
8 | 275 | 0 | 5.950 | 8.761 | 2.250 | 227 | 0 | 5.950 | 11.031 | 2.166
9 | 515 | 0 | 8.000 | 2.694 | 2.210 | 480 | 0 | 8.000 | 2.651 | 2.163
10 | 251 | 0 | 7.875 | 8.289 | 1.698 | 339 | 0 | 7.875 | 8.231 | 1.951
11 | 507 | 0 | 9.375 | 2.131 | 2.113 | 452 | 0 | 9.375 | 2.039 | 2.042
12 | 223 | 0 | 7.750 | 4.040 | 2.618 | 237 | 0 | 7.750 | 3.715 | 2.557
13 | 71 | 0 | 7.250 | 7.064 | 2.348 | 90 | 0 | 7.250 | 7.083 | 2.296
14 | 507 | 0 | 8.000 | 2.656 | 1.753 | 556 | 0 | 8.000 | 2.681 | 1.797
15 | 566 | 1 | 9.875 | 1.030 | 1.685 | 634 | 1 | 9.875 | 1.316 | 1.677
16 | 213 | 0 | 7.500 | 11.219 | 3.116 | 216 | 0 | 7.500 | 10.298 | 2.996
17 | 226 | 0 | 6.875 | 11.219 | 3.116 | 204 | 0 | 6.875 | 10.298 | 2.996
18 | 192 | 0 | 7.750 | 11.219 | 3.116 | 201 | 0 | 7.750 | 10.298 | 2.996
19 | 266 | 0 | 6.250 | 3.276 | 2.744 | 298 | 0 | 6.250 | 3.107 | 2.653
20 | 308 | 0 | 9.250 | 3.276 | 2.744 | 299 | 0 | 9.250 | 3.107 | 2.653
21 | 263 | 0 | 7.750 | 2.096 | 1.756 | 266 | 0 | 7.750 | 2.006 | 3.038
22 | 215 | 0 | 7.190 | 7.096 | 3.469 | 259 | 0 | 7.190 | 6.552 | 3.453
23 | 291 | 0 | 7.690 | 7.096 | 3.469 | 315 | 0 | 7.690 | 6.552 | 3.453
24 | 324 | 0 | 8.360 | 7.096 | 3.469 | 331 | 0 | 8.360 | 6.552 | 3.453
25 | 272 | 0 | 6.875 | 8.612 | 1.865 | 318 | 0 | 6.875 | 9.093 | 2.074
26 | 189 | 0 | 8.000 | 4.444 | 2.790 | 209 | 0 | 8.000 | 5.002 | 2.756
27 | 383 | 0 | 7.375 | 2.366 | 2.733 | 417 | 0 | 7.375 | 2.375 | 2.727
28 | 207 | 0 | 7.000 | 2.366 | 2.733 | 200 | 0 | 7.000 | 2.375 | 2.727
29 | 212 | 0 | 6.900 | 4.751 | 2.847 | 235 | 0 | 6.900 | 4.528 | 2.822
30 | 246 | 0 | 7.500 | 19.454 | 2.332 | 307 | 0 | 7.500 | 16.656 | 2.181
31 | 327 | 0 | 6.625 | 3.266 | 2.475 | 365 | 0 | 6.625 | 2.595 | 2.510
32 | 160 | 0 | 7.150 | 3.266 | 2.475 | 237 | 0 | 7.150 | 2.595 | 2.510
33 | 148 | 0 | 6.300 | 3.266 | 2.475 | 253 | 0 | 6.300 | 2.595 | 2.510
34 | 231 | 0 | 6.625 | 3.266 | 2.475 | 281 | 0 | 6.625 | 2.595 | 2.510
35 | 213 | 0 | 6.690 | 3.266 | 2.475 | 185 | 0 | 6.690 | 2.595 | 2.510
36 | 350 | 0 | 7.130 | 3.266 | 2.475 | 379 | 0 | 7.130 | 2.595 | 2.510
37 | 334 | 0 | 6.875 | 4.310 | 2.203 | 254 | 0 | 6.875 | 5.036 | 2.155
38 | 817 | 1 | 8.625 | 1.780 | 1.965 | 635 | 0 | 8.625 | 1.851 | 1.935
39 | 359 | 0 | 7.550 | 2.951 | 3.078 | 410 | 0 | 7.550 | 2.035 | 3.008
40 | 189 | 0 | 6.500 | 8.518 | 2.582 | 213 | 0 | 6.500 | 13.077 | 2.479
41 | 138 | 0 | 6.950 | 25.313 | 2.520 | 161 | 0 | 6.950 | 24.388 | 2.488
42 | 351 | 0 | 9.500 | 3.242 | 1.935 | 424 | 0 | 9.500 | 2.787 | 1.876
43 | 439 | 0 | 8.250 | 2.502 | 1.670 | 483 | 0 | 8.250 | 2.494 | 1.697
44 | 347 | 0 | 7.700 | 4.327 | 3.165 | 214 | 0 | 7.700 | 4.276 | 3.226
45 | 390 | 0 | 7.750 | 4.327 | 3.165 | 260 | 0 | 7.750 | 4.276 | 3.226
46 | 149 | 0 | 8.000 | 4.327 | 3.165 | 189 | 0 | 8.000 | 4.276 | 3.226
47 | 194 | 0 | 6.625 | 4.430 | 3.077 | 257 | 0 | 6.625 | 4.285 | 2.972
48 | 244 | 0 | 8.500 | 4.430 | 3.077 | 263 | 0 | 8.500 | 4.285 | 2.972
49 | 566 | 1 | 10.375 | 2.036 | 1.081 | 839 | 1 | 10.375 | 2.032 | 1.014
50 | 185 | 0 | 6.300 | 7.096 | 3.469 | 236 | 0 | 6.300 | 6.552 | 3.453
51 | 196 | 0 | 6.375 | 7.096 | 3.469 | 221 | 0 | 6.375 | 6.552 | 3.453
52 | 317 | 0 | 6.625 | 3.075 | 2.587 | 389 | 0 | 6.625 | 2.785 | 2.551
53 | 330 | 0 | 8.250 | 3.075 | 2.587 | 331 | 0 | 8.250 | 2.785 | 2.551
54 | 159 | 0 | 6.875 | 8.286 | 3.146 | 216 | 0 | 6.875 | 7.210 | 3.098
55 | 191 | 0 | 7.125 | 8.286 | 3.146 | 257 | 0 | 7.125 | 7.210 | 3.098
56 | 148 | 0 | 7.375 | 8.286 | 3.146 | 117 | 0 | 7.375 | 7.210 | 3.098
57 | 112 | 0 | 7.600 | 8.286 | 3.146 | 151 | 0 | 7.600 | 7.210 | 3.098
58 | 171 | 0 | 7.650 | 8.286 | 3.146 | 221 | 0 | 7.650 | 7.210 | 3.098
59 | 319 | 0 | 7.375 | 3.847 | 1.869 | 273 | 0 | 7.375 | 4.299 | 1.860
60 | 250 | 0 | 7.375 | 12.656 | 2.286 | 289 | 0 | 7.375 | 8.713 | 2.364
61 | 146 | 0 | 5.500 | 5.365 | 3.175 | 226 | 0 | 5.500 | 5.147 | 3.190
62 | 332 | 0 | 6.450 | 5.365 | 3.175 | 345 | 0 | 6.450 | 5.147 | 3.190
63 | 354 | 0 | 6.500 | 5.365 | 3.175 | 348 | 0 | 6.500 | 5.147 | 3.190
64 | 206 | 0 | 6.625 | 7.140 | 2.266 | 261 | 0 | 6.625 | 5.596 | 2.091
65 | 558 | 0 | 7.875 | 2.050 | 2.290 | 455 | 0 | 7.875 | 2.120 | 2.333
66 | 190 | 0 | 6.000 | 2.925 | 3.085 | 204 | 0 | 6.000 | 3.380 | 2.986
67 | 232 | 0 | 6.750 | 2.925 | 3.085 | 244 | 0 | 6.750 | 3.380 | 2.986
68 | 913 | 1 | 11.250 | 2.174 | 1.256 | 733 | 0 | 11.250 | 2.262 | 1.313
69 | 380 | 0 | 9.750 | 4.216 | 1.465 | 340 | 0 | 9.750 | 4.388 | 1.554
70 | 174 | 0 | 6.500 | 4.281 | 2.566 | 208 | 0 | 6.500 | 4.122 | 2.563
71 | 190 | 0 | 7.450 | 10.547 | 2.725 | 173 | 0 | 7.450 | 8.607 | 2.775
72 | 208 | 0 | 7.125 | 2.835 | 3.109 | 259 | 0 | 7.125 | 2.813 | 3.122
73 | 272 | 0 | 6.500 | 5.885 | 2.695 | 282 | 0 | 6.500 | 5.927 | 2.644
74 | 249 | 0 | 6.125 | 5.133 | 2.682 | 235 | 0 | 6.125 | 6.619 | 2.645
75 | 278 | 0 | 8.750 | 6.562 | 2.802 | 274 | 0 | 8.750 | 7.433 | 2.785
76 | 252 | 0 | 7.750 | 2.822 | 2.905 | 197 | 0 | 7.750 | 2.691 | 2.908
77 | 321 | 0 | 7.500 | 2.822 | 2.905 | 226 | 0 | 7.500 | 2.691 | 2.908
78 | 379 | 0 | 7.750 | 4.093 | 2.068 | 362 | 0 | 7.750 | 4.296 | 2.030
79 | 185 | 0 | 6.875 | 6.074 | 2.657 | 181 | 0 | 6.875 | 5.294 | 2.469
80 | 307 | 0 | 7.250 | 5.996 | 2.247 | 272 | 0 | 7.250 | 3.610 | 2.119
81 | 533 | 0 | 10.625 | 1.487 | 1.950 | 419 | 0 | 10.625 | 1.717 | 2.081
82 | 627 | 0 | 8.875 | 1.487 | 1.950 | 446 | 0 | 8.875 | 1.717 | 2.081
83 | 239 | 0 | 8.875 | 2.994 | 2.186 | 241 | 0 | 8.875 | 3.858 | 2.161
84 | 240 | 0 | 7.375 | 8.160 | 2.225 | 274 | 0 | 7.375 | 8.187 | 2.075
85 | 634 | 0 | 8.500 | 2.663 | 2.337 | 371 | 0 | 8.500 | 2.674 | 2.253
86 | 631 | 1 | 7.700 | 2.389 | 2.577 | 654 | 1 | 7.700 | 2.364 | 2.632
87 | 679 | 1 | 9.250 | 2.389 | 2.577 | 630 | 1 | 9.250 | 2.364 | 2.632
88 | 556 | 1 | 9.750 | 1.339 | 1.850 | 883 | 1 | 9.750 | 1.422 | 1.945
89 | 564 | 1 | 9.750 | 1.861 | 2.176 | 775 | 1 | 9.750 | 1.630 | 1.979
90 | 209 | 0 | 6.750 | 8.048 | 2.220 | 223 | 0 | 6.750 | 7.505 | 2.092
91 | 190 | 0 | 6.500 | 4.932 | 2.524 | 232 | 0 | 6.500 | 4.626 | 2.468
92 | 390 | 0 | 6.875 | 6.366 | 1.413 | 403 | 0 | 6.875 | 5.033 | 1.790
93 | 377 | 0 | 10.250 | 2.157 | 2.292 | 386 | 0 | 10.250 | 2.057 | 2.262
94 | 143 | 0 | 5.750 | 11.306 | 2.580 | 110 | 0 | 5.750 | 9.777 | 2.473
95 | 207 | 0 | 7.250 | 2.835 | 3.109 | 250 | 0 | 7.250 | 2.813 | 3.122
96 | 253 | 0 | 6.500 | 4.918 | 2.142 | 317 | 0 | 6.500 | 2.884 | 1.733
97 | 530 | 1 | 8.500 | 0.527 | 2.807 | 654 | 1 | 8.500 | 1.327 | 2.904
98 | 481 | 0 | 6.750 | 2.677 | 1.858 | 439 | 0 | 6.750 | 3.106 | 1.991
99 | 270 | 0 | 7.625 | 2.835 | 3.109 | 242 | 0 | 7.625 | 2.813 | 3.122
100 | 190 | 0 | 7.125 | 9.244 | 3.021 | 178 | 0 | 7.125 | 7.583 | 3.138


Notes:

Spread = option-adjusted spread (in basis points) 

Coupon = coupon rate, expressed without considering percentage sign (i.e., 7.5% = 7.5) 
Coverage Ratio = EBITDA divided by interest expense for company 
Logged EBIT = logarithm of earnings (EBIT in millions of dollars) 
Illustration of Residuals and Leverage for Corporate Bond Spread


Residuals 

Residuals 

Dummy 1 

Residuals 
Dummy 2 

118.79930 

148.931400 

162.198700 

126.39350 

183.097400 

200.622000 

-68.57770 

-39.278100 

-26.716500 

-37.26080 

-60.947500 

-71.034400 

16.63214 

4.419645 

-3.828890 

-128.76600 

-104.569000 

-92.122000 

386.42330 

191.377200 

217.840000 

73.53972 

48.516800 

56.58778 

104.15990 

146.400600 

160.438900 

-124.78700 

-98.020100 

-71.374300 

-4.28874 

73.473220 

94.555400 

-117.58200 

-88.168700 

-82.883100 

-223.61800 

-213.055000 

-202.748000 

54.13075 

99.735710 

123.153000 

-29.42160 

-132.755000 

-179.955000 

27.74192 

26.913670 

24.308960 

79.04072 

63.114850 

58.091160 

-8.57759 

-3.366800 

-5.003930 

18.62462 

13.109110 

9.664499 

-123.21000 

-56.256500 

-48.090100 

-181.64800 

-140.494000 

-118.369000 

26.43157 

27.457990 

14.487850 

71.79254 

84.897050 

73.862080 

63.73623 

93.025400 

84.583560 

-23.09740 

-22.603200 

-3.106990 

-146.00700 

-112.938000 

-110.018000 

53.72288 

78.075810 

78.781050 

-99.29780 

-84.003500 

-84.749600 

-46.31030 

-41.105600 

-43.489200 

98.22006 

79.285040 

96.588250 

32.05062 

37.541930 

41.075430 

-167.12000 

-148.947000 

-143.382000 

-127.03400 

-129.393000 

-127.118000 

-63.94940 

-58.458100 

-54.924600 

-85.93250 

-78.871000 

-75.085900 

24.10520 

41.795380 

47.283410 

12.86740 

23.326060 

33.884440 

333.53890 

101.376800 

173.584400 

58.02881 

82.472150 

77.040360 

-19.14100 

-32.550700 

-29.298900 

118.41190 

67.990200 

81.986050 

-169.48100 

-90.625700 

-64.883800 

-38.74030 

13.936980 

39.950520 

62.91014 

86.397490 

80.392250 

102.84620 

127.541400 

121.729700 

-153.47300 

-122.739000 

-127.583000 

-30.81510 

-32.968700 

-41.285200 

-95.711400 

-52.572300 

-53.631800 

-101.678000 

-219.347000 

-237.977000 

50.969050 

30.496460 

14.081700 

57.373200 

38.712320 

22.587840 

29.717770 

34.958870 

36.101100 

-56.859100 

-12.364200 

-4.932630 

-23.959100 

-31.659900 

-38.650000 

-7.278620 

-8.940330 

-14.962800 
(Continued)


Residuals 

Residuals 

Dummy 1 

Residuals 
Dummy 2 

-65.598100 

-61.220800 

-66.275700 

-115.386000 

-105.573000 

-109.757000 

-59.449600 

-48.429300 

-52.419900 

-69.299000 

-43.044000 

-23.885700 

15.946800 

13.880220 

28.513500 

11.362190 

-21.353800 

-35.607900 

139.148000 

129.380400 

118.803100 

158.084100 

149.524300 

139.140600 

-56.785300 

-60.952000 

-51.339900 

153.651800 

194.149900 

205.750200 

-15.653600 

-28.630900 

-40.227500 

-19.612200 

-14.472300 

-23.166100 

209.488200 

144.261600 

67.891100 

-185.659000 

-100.217000 

-63.396000 

-91.541800 

-92.646100 

-91.015000 

-36.623800 

-33.937000 

-29.003400 

-65.586300 

-51.301800 

-59.080100 

39.294110 

32.661770 

32.391920 

28.197460 

14.759650 

12.952710 

-73.910000 

-28.902200 

-22.353300 

-78.608000 

-47.733800 

-48.902600 

5.711553 

30.546620 

28.410290 

-10.926100 

22.258560 

38.888810 

-71.611400 

-69.462200 

-67.416900 

-10.848000 

3.505179 

15.383910 

-78.195700 

32.775440 

61.748590 

123.041000 

191.738700 

213.938800 

-223.662000 

-160.978000 

-142.925000 

-58.977600 

-47.671100 

-33.850800 

203.727300 

257.223800 

270.556600 

267.904600 

-65.208100 

89.636310 

220.923600 

-4.162260 

42.473790 

-12.621600 

-142.213000 

-168.474000 

31.862060 

-127.616000 

-134.267000 

-53.593800 

-57.028600 

-45.579800 

-70.794900 

-73.470000 

-70.669700 

24.164780 

34.342730 

62.098550 

-171.291000 

-73.744300 

-52.943000 

17.439710 

-22.092800 

-20.420000 

-74.246100 

-56.942100 

-64.236600 

-42.690600 

-42.602900 

-31.958300 

114.168900 

-66.109500 

-66.049500 

114.578500 

129.177300 

145.600600 

-34.225400 

-7.862790 

-13.705900 

-6.958960 

-10.488100 

-13.508000 

81.920940 

112.117900 

101.420600 

70.515070 

127.283800 

120.844000 

-18.587600 

24.683610 

20.132390 

-8.443100 

-26.784100 

-28.884400 

13.449820 

6.582981 

6.321103 

-50.430600 

-26.617000 

-36.781100 

318.056000 

133.403000 

130.828300 

47.876010 

16.919350 

5.068270 

64.341610 

107.038200 

99.281600 

-14.573200 

10.557760 

3.393970 


(Continued)


Residuals 

Residuals 

Dummy 1 

Residuals 
Dummy 2 

-66.995600 

11.539420 

7.987728 

-113.425000 

-82.640800 

-88.147800 

-209.054000 

-198.177000 

-205.892000 

107.522000 

152.737700 

142.464600 

41.638860 

-76.825800 

-145.458000 

7.647833 

10.327540 

9.887700 

33.946630 

21.528710 

18.669900 

-22.671700 

-13.952900 

-13.425200 

40.107630 

35.729610 

24.798540 

-142.727000 

-74.636000 

-73.956000 

-63.286100 

-31.013100 

-33.970100 

61.774140 

64.481450 

64.302480 

87.135110 

101.920500 

103.676700 

62.078800 

93.048860 

97.398200 

48.320900 

45.935300 

36.150130 

-121.736000 

-90.029000 

-92.609500 

87.253680 

111.626800 

105.229900 

-106.767000 

-91.452500 

-99.300700 

-28.566900 

-22.540100 

-29.135400 

108.560100 

98.752280 

95.570570 

64.418690 

71.586810 

60.886980 

-95.752300 

-75.902200 

-84.570100 

-27.665900 

-28.348600 

-40.306300 

-19.581300 

-12.413200 

-23.113000 

-119.564000 

-110.826000 

-121.274000 

47.473260 

66.840260 

58.094960 

-61.953700 

-53.237800 

-64.316600 

149.786400 

211.505100 

204.226300 

90.609530 

118.184700 

114.258300 

55.650810 

29.860840 

23.239180 

126.240500 

78.712630 

79.050720 

-107.826000 

-27.243600 

-31.116800 

7.614932 

60.121850 

50.036220 

-65.174500 

-41.979400 

-42.794500 

-22.238400 

2.164489 

1.542950 

-108.558000 

-78.116000 

-77.769900 

20.679750 

19.696850 

12.963030 

-88.216600 

-43.906700 

-43.383600 

165.253100 

48.262590 

-23.500200 

93.311620 

74.519920 

70.896340 

73.715770 

56.735780 

53.402470 

94.629570 

100.961000 

90.629950 

-62.947300 

-17.362000 

-21.403800 

14.480140 

10.216950 

6.659433 

40.160620 

41.936480 

39.346550 

-115.159000 

-107.344000 

-108.966000 

-94.946500 

-81.696400 

-82.447900 

-28.010400 

-13.552500 

-14.110500 

-110.127000 

-85.111400 

-96.632900 

9.959282 

18.682370 

12.662020 

89.889700 

57.689740 

48.509480 

150.675500 

141.424000 

135.920500 

150.611600 

142.567900 

137.258000 

-38.040900 

-36.521000 

-48.754100 

55.443990 

95.437610 

88.132530 
Table 2 (Continued)

Issue #       Residuals    Residuals Dummy 1    Residuals Dummy 2
166           -4.652580           -18.233400           -27.698600
167          -10.611100            -6.074840           -12.637200
168           35.778970           164.163000           162.921500
169         -215.328000          -131.013000          -135.422000
170          -59.986400           -60.605400           -70.729300
171          -74.693600           -66.782400           -69.716200
172          -13.734800             0.523639            -3.905600
173           45.295840            38.898770            30.164940
174           30.476800            13.024800             3.159872
175          -67.888500           -25.271900           -23.635500
176         -135.061000          -103.830000          -107.375000
177          -90.741200           -65.550000           -70.062300
178          -28.683300             4.187387            -4.706060
179         -103.027000           -97.290000          -106.078000
180          -88.975000           -66.845700           -77.367900
181         -177.281000           -67.904100           -66.493200
182          -43.044700            24.059160            18.696920
183         -212.505000          -152.131000          -155.963000
184          -38.210800           -25.916400           -34.173800
185          -66.764700           -12.702000           -17.886300
186          295.611300           -36.578800           106.036400
187          176.630300           -47.533000           -13.126100
188          324.060100           189.413000           136.666400
189          221.951100            76.029960            34.046210
190          -58.422000           -59.380500           -70.254000
191          -37.907200           -39.303500           -49.850800
192           53.841660            65.166450            51.559780
193         -166.323000           -68.275700           -66.904900
194          -45.521100           -79.888400           -90.959200
195          -30.394500           -13.116600           -17.062000
196          -42.709500           -33.855500           -50.285700
197          257.550200            34.224540            70.337910
198           90.307160           102.727000            89.148700
199          -61.373800           -35.037300           -37.531400
200          -30.310400           -29.889500           -32.034600


Notes: 

Residuals: residuals from the pooled regression without dummy variables for investment grade. 
Residuals Dummy 1: inclusion of dummy variable for investment grade. 

Residuals Dummy 2: inclusion of dummy variable to test for regime shift. 


parameters. The residuals of the model without and with dummy
variable D1 are shown, respectively, in the second and third
columns of Table 2.

Now let's use dummy variables to test if there is a regime shift
between the two dates. This is a common use for dummy variables in
practice. To this end we create a new dummy variable D2 that has the
value 0 for the first date 11/28/05 and 1 for the second date 6/6/05.
The new equation is written as follows:

$$
\begin{aligned}
\text{Spread}_i = {} & \beta_0 + \beta_1 D2_i + \beta_2 \text{Coupon}_i + \beta_3 D2_i \text{Coupon}_i \\
& + \beta_4 \text{CoverageRatio}_i + \beta_5 D2_i \text{CoverageRatio}_i \\
& + \beta_6 \text{LoggedEBIT}_i + \beta_7 D2_i \text{LoggedEBIT}_i + \varepsilon_i
\end{aligned}
$$

as in the previous case but with a different dummy variable. There
are seven independent variables and eight parameters to estimate. The
estimated model coefficients and t-statistics are shown below:


Coefficient    Estimated Coefficient    Standard Error    t-statistic    p-value
$\beta_0$                     257.26             79.71           3.28       0.00
$\beta_1$                      82.17             61.63           1.33       0.18
$\beta_2$                      33.25              7.11           4.67       5.53E-06
$\beta_3$                      28.14              2.78          10.12       1.45E-19
$\beta_4$                     -10.79              2.50          -4.32       2.49E-05
$\beta_5$                       0.00              3.58           0.00       1.00
$\beta_6$                     -63.20             18.04          -3.50       0.00
$\beta_7$                     -27.48             24.34          -1.13       0.26


Other regression statistics are:

SSR: 1.5399e+006
F-statistic: 72.39
p-value: 0
$R^2$: 0.71

The Chow test has the value 14.73. The F-statistics and the Chow test
suggest that there is indeed a regime shift and that the spread
regressions at the two different dates are different. Again, the use
of dummy variables has greatly improved the goodness of fit of the
regression, even after compensating for the increase in the number of
parameters. The residuals of the model with dummy variables D2 are
shown in the next-to-the-last column of Table 2.
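The Chow statistic quoted above compares a restricted (pooled) regression with the unrestricted regression that includes the dummy and all of its interactions. The sketch below uses the standard single-break form of the test; the function name `chow_f` is ours, and the restricted SSR and sample size in the example call are hypothetical, since they are not reported in this passage.

```python
def chow_f(ssr_restricted, ssr_unrestricted, k, n):
    """Chow F-statistic for a single break splitting n observations.

    ssr_restricted   -- sum of squared residuals of the pooled model
    ssr_unrestricted -- SSR of the model with the dummy and all interactions
    k                -- parameters in one regime (intercept plus slopes)
    n                -- total number of observations
    """
    numerator = (ssr_restricted - ssr_unrestricted) / k
    denominator = ssr_unrestricted / (n - 2 * k)
    return numerator / denominator


# Only the unrestricted SSR (1.5399e6) comes from the text; the restricted
# SSR and the sample size are made-up values for illustration.
f_stat = chow_f(ssr_restricted=2.0e6, ssr_unrestricted=1.5399e6, k=4, n=200)
print(f"Chow F = {f_stat:.2f}")
```

A large value relative to the F distribution with k and n - 2k degrees of freedom points to a regime shift.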

Illustration: Testing the Mutual 
Fund Characteristic Lines in 
Different Market Environments 

The characteristic line of a mutual fund is the regression of the
excess returns of a mutual fund on the market's excess returns:

$$y_{it} = \alpha_i + \beta_i x_t$$

where

$y_{it}$ = mutual fund i's excess return over the risk-free rate

$x_t$ = market excess return over the risk-free rate

$\alpha_i$ and $\beta_i$ = the regression parameters to be estimated
for mutual fund i

We will first estimate the characteristic line for two large-cap
mutual funds. Since we would prefer not to disclose the name of each
fund, we simply refer to them as A and B. (Neither mutual fund
selected is an index fund.) Because the two mutual funds are large-cap
funds, the S&P 500 was used as the benchmark. The risk-free rate used
was the 90-day Treasury bill rate. Ten years of monthly data were
used, from January 1, 1995 to December 31, 2004. The data are reported
in Table 3. The first column in the table shows the month. The second
and third columns give the market return ($r_{Mt}$) and the risk-free
rate ($r_{ft}$), respectively. The fifth column is the excess market
return, which is $x_t$ in the regression equation. The seventh and
eighth columns show the returns for mutual funds A and B,
respectively. The excess returns for the two mutual funds ($y_{it}$)
are given in the last two columns. The other columns will be explained
shortly.

The results of the above regression for both mutual funds are shown
in Table 4. The estimated $\beta$ for both mutual funds is
statistically significantly different from zero.

Let's now perform a simple application of the use of dummy variables
by determining if the slope (beta) of the two mutual funds is
different in a rising stock market ("up market") and a declining stock
market ("down market"). To test this, we can write the following
multiple regression model:

$$y_{it} = \alpha_i + \beta_{1i} x_t + \beta_{2i} (D_t x_t) + \varepsilon_{it}$$

where $D_t$ is the dummy variable that can take on a value of 1 or 0.
We will let

$D_t = 1$ if period t is classified as an up market
$D_t = 0$ if period t is classified as a down market

The coefficient for the dummy variable is $\beta_{2i}$. If that
coefficient is statistically significant, then for the mutual fund:

In an up market: $\beta_i = \beta_{1i} + \beta_{2i}$

In a down market: $\beta_i = \beta_{1i}$
Table 3 Data for Estimating Mutual Fund Characteristic Line with a Dummy Variable

($x_t = r_{Mt} - r_{ft}$; "A r_t" and "B r_t" are the returns of mutual funds A and B; "A y_t" and "B y_t" are their excess returns.)

Month Ended    r_Mt    r_ft     D_t     x_t D_t x_t   A r_t   B r_t   A y_t   B y_t
01/31/1995     2.60    0.42       0    2.18       0    0.65    1.28    0.23    0.86
02/28/1995     3.88    0.40       0    3.48       0    3.44    3.16    3.04    2.76
03/31/1995     2.96    0.46       1    2.50     2.5    2.89    2.58    2.43    2.12
04/30/1995     2.91    0.44       1    2.47    2.47    1.65    1.81    1.21    1.37
05/31/1995     3.95    0.54       1    3.41    3.41    2.66    2.96    2.12    2.42
06/30/1995     2.35    0.47       1    1.88    1.88    2.12    2.18    1.65    1.71
07/31/1995     3.33    0.45       1    2.88    2.88    3.64    3.28    3.19    2.83
08/31/1995     0.27    0.47       1   -0.20    -0.2   -0.40    0.98   -0.87    0.51
09/30/1995     4.19    0.43       1    3.76    3.76    3.06    3.47    2.63    3.04
10/31/1995    -0.35    0.47       1   -0.82   -0.82   -1.77   -0.63   -2.24   -1.10
11/30/1995     4.40    0.42       1    3.98    3.98    4.01    3.92    3.59    3.50
12/31/1995     1.85    0.49       1    1.36    1.36    1.29    1.73    0.80    1.24
01/31/1996     3.44    0.43       1    3.01    3.01    3.36    2.14    2.93    1.71
02/29/1996     0.96    0.39       1    0.57    0.57    1.53    1.88    1.14    1.49
03/31/1996     0.96    0.39       1    0.57    0.57    0.59    1.65    0.20    1.26
04/30/1996     1.47    0.46       1    1.01    1.01    1.46    1.83    1.00    1.37
05/31/1996     2.58    0.42       1    2.16    2.16    2.17    2.20    1.75    1.78
06/30/1996     0.41    0.40       1    0.01    0.01   -0.63    0.00   -1.03   -0.40
07/31/1996    -4.45    0.45       1   -4.90    -4.9   -4.30   -3.73   -4.75   -4.18
08/31/1996     2.12    0.41       0    1.71       0    2.73    2.24    2.32    1.83
09/30/1996     5.62    0.44       0    5.18       0    5.31    4.49    4.87    4.05
10/31/1996     2.74    0.42       1    2.32    2.32    1.42    1.34    1.00    0.92
11/30/1996     7.59    0.41       1    7.18    7.18    6.09    5.30    5.68    4.89
12/31/1996    -1.96    0.46       1   -2.42   -2.42   -1.38   -0.90   -1.84   -1.36
01/31/1997     6.21    0.45       1    5.76    5.76    4.15    5.73    3.70    5.28
02/28/1997     0.81    0.39       1    0.42    0.42    1.65   -1.36    1.26   -1.75
03/31/1997    -4.16    0.43       1   -4.59   -4.59   -4.56   -3.75   -4.99   -4.18
04/30/1997     5.97    0.43       1    5.54    5.54    4.63    3.38    4.20    2.95
05/31/1997     6.14    0.49       1    5.65    5.65    5.25    6.05    4.76    5.56
06/30/1997     4.46    0.37       1    4.09    4.09    2.98    2.90    2.61    2.53
07/31/1997     7.94    0.43       1    7.51    7.51    6.00    7.92    5.57    7.49
08/31/1997    -5.56    0.41       1   -5.97   -5.97   -4.40   -3.29   -4.81   -3.70
09/30/1997     5.48    0.44       1    5.04    5.04    5.70    4.97    5.26    4.53
10/31/1997    -3.34    0.42       1   -3.76   -3.76   -2.76   -2.58   -3.18   -3.00
11/30/1997     4.63    0.39       0    4.24       0    3.20    2.91    2.81    2.52
12/31/1997     1.72    0.48       1    1.24    1.24    1.71    2.41    1.23    1.93
01/31/1998     1.11    0.43       1    0.68    0.68   -0.01   -0.27   -0.44   -0.70
02/28/1998     7.21    0.39       1    6.82    6.82    5.50    6.84    5.11    6.45
03/31/1998     5.12    0.39       1    4.73    4.73    5.45    3.84    5.06    3.45
04/30/1998     1.01    0.43       1    0.58    0.58   -0.52    1.07   -0.95    0.64
05/31/1998    -1.72    0.40       1   -2.12   -2.12   -1.25   -1.30   -1.65   -1.70
06/30/1998     4.06    0.41       1    3.65    3.65    3.37    4.06    2.96    3.65
07/31/1998    -1.06    0.40       1   -1.46   -1.46    0.10   -1.75   -0.30   -2.15
08/31/1998   -14.46    0.43       1  -14.89  -14.89  -15.79  -13.44  -16.22  -13.87
09/30/1998     6.41    0.46       0    5.95       0    5.00    4.86    4.54    4.40
10/31/1998     8.13    0.32       0    7.81       0    5.41    4.56    5.09    4.24
11/30/1998     6.06    0.31       0    5.75       0    5.19    5.56    4.88    5.25
12/31/1998     5.76    0.38       1    5.38    5.38    7.59    7.18    7.21    6.80
01/31/1999     4.18    0.35       1    3.83    3.83    2.60    3.11    2.25    2.76
02/28/1999    -3.11    0.35       1   -3.46   -3.46   -4.13   -3.01   -4.48   -3.36
03/31/1999     4.00    0.43       1    3.57    3.57    3.09    3.27    2.66    2.84
04/30/1999     3.87    0.37       1    3.50     3.5    2.26    2.22    1.89    1.85

(Continued)
Table 3 (Continued)

Month Ended    r_Mt    r_ft     D_t     x_t D_t x_t   A r_t   B r_t   A y_t   B y_t
05/31/1999    -2.36    0.34       1   -2.70    -2.7   -2.12   -1.32   -2.46   -1.66
06/30/1999     5.55    0.40       1    5.15    5.15    4.43    5.36    4.03    4.96
07/31/1999    -3.12    0.38       1   -3.50    -3.5   -3.15   -1.72   -3.53   -2.10
08/31/1999    -0.50    0.39       0   -0.89       0   -1.05   -2.06   -1.44   -2.45
09/30/1999    -2.74    0.39       1   -3.13   -3.13   -2.86   -1.33   -3.25   -1.72
10/31/1999     6.33    0.39       0    5.94       0    5.55    2.29    5.16    1.90
11/30/1999     2.03    0.36       1    1.67    1.67    3.23    3.63    2.87    3.27
12/31/1999     5.89    0.44       1    5.45    5.45    8.48    7.09    8.04    6.65
01/31/2000    -5.02    0.41       1   -5.43   -5.43   -4.09   -0.83   -4.50   -1.24
02/29/2000    -1.89    0.43       1   -2.32   -2.32    1.43    2.97    1.00    2.54
03/31/2000     9.78    0.47       0    9.31       0    6.84    5.86    6.37    5.39
04/30/2000    -3.01    0.46       1   -3.47   -3.47   -4.04   -4.55   -4.50   -5.01
05/31/2000    -2.05    0.50       1   -2.55   -2.55   -2.87   -4.47   -3.37   -4.97
06/30/2000     2.46    0.40       1    2.06    2.06    0.54    6.06    0.14    5.66
07/31/2000    -1.56    0.48       0   -2.04       0   -0.93    1.89   -1.41    1.41
08/31/2000     6.21    0.50       0    5.71       0    7.30    6.01    6.80    5.51
09/30/2000    -5.28    0.51       1   -5.79   -5.79   -4.73   -4.81   -5.24   -5.32
10/31/2000    -0.42    0.56       0   -0.98       0   -1.92   -4.84   -2.48   -5.40
11/30/2000    -7.88    0.51       0   -8.39       0   -6.73  -11.00   -7.24  -11.51
12/31/2000     0.49    0.50       0   -0.01       0    2.61    3.69    2.11    3.19
01/31/2001     3.55    0.54       0    3.01       0    0.36    5.01   -0.18    4.47
02/28/2001    -9.12    0.38       0   -9.50       0   -5.41   -8.16   -5.79   -8.54
03/31/2001    -6.33    0.42       0   -6.75       0   -5.14   -5.81   -5.56   -6.23
04/30/2001     7.77    0.39       0    7.38       0    5.25    4.67    4.86    4.28
05/31/2001     0.67    0.32       0    0.35       0    0.47    0.45    0.15    0.13
06/30/2001    -2.43    0.28       1   -2.71   -2.71   -3.48   -1.33   -3.76   -1.61
07/31/2001    -0.98    0.30       1   -1.28   -1.28   -2.24   -1.80   -2.54   -2.10
08/31/2001    -6.26    0.31       0   -6.57       0   -4.78   -5.41   -5.09   -5.72
09/30/2001    -8.08    0.28       0   -8.36       0   -6.46   -7.27   -6.74   -7.55
10/31/2001     1.91    0.22       0    1.69       0    1.01    2.30    0.79    2.08
11/30/2001     7.67    0.17       0    7.50       0    4.49    5.62    4.32    5.45
12/31/2001     0.88    0.15       1    0.73    0.73    1.93    2.14    1.78    1.99
01/31/2002    -1.46    0.14       1   -1.60    -1.6   -0.99   -3.27   -1.13   -3.41
02/28/2002    -1.93    0.13       1   -2.06   -2.06   -0.84   -2.68   -0.97   -2.81
03/31/2002     3.76    0.13       0    3.63       0    3.38    4.70    3.25    4.57
04/30/2002    -6.06    0.15       0   -6.21       0   -4.38   -3.32   -4.53   -3.47
05/31/2002    -0.74    0.14       0   -0.88       0   -1.78   -0.81   -1.92   -0.95
06/30/2002    -7.12    0.13       0   -7.25       0   -5.92   -5.29   -6.05   -5.42
07/31/2002    -7.80    0.15       0   -7.95       0   -6.37   -7.52   -6.52   -7.67
08/31/2002     0.66    0.14       0    0.52       0   -0.06    1.86   -0.20    1.72
09/30/2002   -10.87    0.14       0  -11.01       0   -9.38   -6.04   -9.52   -6.18
10/31/2002     8.80    0.14       0    8.66       0    3.46    5.10    3.32    4.96
11/30/2002     5.89    0.12       0    5.77       0    3.81    1.73    3.69    1.61
12/31/2002    -5.88    0.11       1   -5.99   -5.99   -4.77   -2.96   -4.88   -3.07
01/31/2003    -2.62    0.10       1   -2.72   -2.72   -1.63   -2.34   -1.73   -2.44
02/28/2003    -1.50    0.09       0   -1.59       0   -0.48   -2.28   -0.57   -2.37
03/31/2003     0.97    0.10       0    0.87       0    1.11    1.60    1.01    1.50
04/30/2003     8.24    0.10       0    8.14       0    6.67    5.44    6.57    5.34
05/31/2003     5.27    0.09       1    5.18    5.18    4.96    6.65    4.87    6.56
06/30/2003     1.28    0.10       1    1.18    1.18    0.69    1.18    0.59    1.08
07/31/2003     1.76    0.07       1    1.69    1.69    1.71    3.61    1.64    3.54
08/31/2003     1.95    0.07       1    1.88    1.88    1.32    1.13    1.25    1.06
Table 3 (Continued)

Month Ended    r_Mt    r_ft     D_t     x_t D_t x_t   A r_t   B r_t   A y_t   B y_t
09/30/2003    -1.06    0.08       1   -1.14   -1.14   -1.34   -1.12   -1.42   -1.20
10/31/2003     5.66    0.07       1    5.59    5.59    5.30    4.21    5.23    4.14
11/30/2003     0.88    0.07       1    0.81    0.81    0.74    1.18    0.67    1.11
12/31/2003     5.24    0.08       1    5.16    5.16    4.87    4.77    4.79    4.69
01/31/2004     1.84    0.07       1    1.77    1.77    0.87    2.51    0.80    2.44
02/29/2004     1.39    0.06       1    1.33    1.33    0.97    1.18    0.91    1.12
03/31/2004    -1.51    0.09       1   -1.60    -1.6   -0.89   -1.79   -0.98   -1.88
04/30/2004    -1.57    0.08       1   -1.65   -1.65   -2.59   -1.73   -2.67   -1.81
05/31/2004     1.37    0.06       0    1.31       0    0.66    0.83    0.60    0.77
06/30/2004     1.94    0.08       0    1.86       0    1.66    1.56    1.58    1.48
07/31/2004    -3.31    0.10       1   -3.41   -3.41   -2.82   -4.26   -2.92   -4.36
08/31/2004     0.40    0.11       0    0.29       0   -0.33    0.00   -0.44   -0.11
09/30/2004     1.08    0.11       0    0.97       0    1.20    1.99    1.09    1.88
10/31/2004     1.53    0.11       0    1.42       0    0.33    1.21    0.22    1.10
11/30/2004     4.05    0.15       1    3.90     3.9    4.87    5.68    4.72    5.53
12/31/2004     3.40    0.16       1    3.24    3.24    2.62    3.43    2.46    3.27


Notes: 

1. The following information is used for determining the value of the dummy variable for the first three months:

            r_Mt     r_ft    r_Mt - r_ft
Sep-94     -2.41     0.37          -2.78
Oct-94      2.29     0.38           1.91
Nov-94     -3.67     0.37          -4.04
Dec-94      1.46     0.44           1.02

2. The dummy variable is defined as follows:

$D_t x_t = x_t$ if $(r_{Mt} - r_{ft})$ for the prior three months > 0
$D_t x_t = 0$ otherwise

If $\beta_{2i}$ is not statistically significant, then there is no
difference in $\beta_i$ for up and down markets.

In our illustration, we have to define what we mean by an up and a
down market. We will


Table 4 Characteristic Line for Mutual Funds A and B

Coefficient      Coefficient Estimate    Standard Error    t-statistic(a)    p-value

Mutual Fund A
$\alpha$                        0.206             0.102            -2.014      0.046
$\beta$                         0.836             0.022            37.176      0.000
$R^2$                            0.92
p-value                         0.000

Mutual Fund B
$\alpha$                        0.010             0.140             0.073      0.942
$\beta$                         0.816             0.031            26.569      0.000
$R^2$                            0.86
p-value                         0.000

(a) Null hypothesis is that $\beta$ is equal to zero.


define an up market precisely as one where the average excess return
(market return over the risk-free rate, or $r_{Mt} - r_{ft}$) for the
prior three months is greater than zero. Then

$D_t = 1$ if the average $(r_{Mt} - r_{ft})$ for the prior three months > 0

$D_t = 0$ otherwise

The regressor will then be

$D_t x_t = x_t$ if the average $(r_{Mt} - r_{ft})$ for the prior three months > 0
$D_t x_t = 0$ otherwise
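The up-market rule can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `up_market_dummies` is ours, and the inputs are the Oct-94 to Dec-94 excess returns from the notes to Table 3 followed by the first four sample months.

```python
def up_market_dummies(excess, history):
    """Return D_t for each month in `excess`.

    excess  -- market excess returns x_t for the sample months, in order
    history -- the three excess returns immediately preceding the sample
    """
    series = list(history) + list(excess)
    dummies = []
    for t in range(len(excess)):
        prior_three = series[t:t + 3]  # the three months before month t
        dummies.append(1 if sum(prior_three) / 3 > 0 else 0)
    return dummies


history = [1.91, -4.04, 1.02]      # Oct-94 to Dec-94 (notes to Table 3)
excess = [2.18, 3.48, 2.50, 2.47]  # Jan-95 to Apr-95 (Table 3)
d = up_market_dummies(excess, history)
regressor = [dt * xt for dt, xt in zip(d, excess)]
print(d)          # dummy codes D_t
print(regressor)  # interaction regressor D_t * x_t
```

The resulting codes for January through April 1995 agree with the dummy column of Table 3.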

The data are presented in Table 3. The fourth column provides the
coding for the dummy variable, $D_t$, and the sixth column shows the
Table 5 Regression Results for Dummy Variable Regression for Mutual Funds A and B

Coefficient    Coefficient Estimate    Standard Error    t-statistic    p-value

Fund A
$\alpha$                      -0.23              0.10          -2.36     0.0198
$\beta_1$                      0.75              0.03          25.83     4E-50
$\beta_2$                      0.18              0.04           4.29     4E-05

Fund B
$\alpha$                       0.00              0.14          -0.03     0.9762
$\beta_1$                      0.75              0.04          18.02     2E-35
$\beta_2$                      0.13              0.06           2.14     0.0344


product of $D_t$ and $x_t$. The regression results for the two mutual
funds are shown in Table 5. The adjusted $R^2$ is 0.93 and 0.83 for
mutual funds A and B, respectively.

For both funds, $\beta_{2i}$ is statistically significantly different
from zero. Hence, for these two mutual funds, there is a difference in
the $\beta_i$ for up and down markets. From the results reported
previously, we would find that:



                                                  Mutual Fund A        Mutual Fund B
Down market $\beta_i$ (= $\beta_{1i}$)                     0.75                 0.75
Up market $\beta_i$ (= $\beta_{1i} + \beta_{2i}$)          0.93                 0.88
                                                (= 0.75 + 0.18)      (= 0.75 + 0.13)
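To make the mechanics concrete, the sketch below fits $y_t = \alpha + \beta_1 x_t + \beta_2 (D_t x_t)$ by ordinary least squares using the normal equations. It is an illustration under stated assumptions: the helper names `solve` and `ols` are ours, and the data are synthetic and noise-free, generated with coefficients chosen to resemble the Fund A estimates, so the fit simply recovers them.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data (hypothetical, not taken from Table 3).
x = [2.0, -1.0, 3.0, -2.5, 1.5, -0.5]
d = [1, 0, 1, 0, 0, 1]
y = [-0.2 + 0.75 * xt + 0.18 * dt * xt for xt, dt in zip(x, d)]

design = [[1.0, xt, dt * xt] for xt, dt in zip(x, d)]
alpha, beta1, beta2 = ols(design, y)
down_beta = beta1          # beta in a down market
up_beta = beta1 + beta2    # beta in an up market
print(round(down_beta, 2), round(up_beta, 2))
```

With real return data the fit would not be exact, and the significance of $\beta_2$ would be judged from its t-statistic as in Table 5.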


DEPENDENT CATEGORICAL 
VARIABLES 

Thus far we have discussed models where the independent variables can
be either quantitative or categorical while the dependent variable is
quantitative. Let's now discuss models where the dependent variable is
categorical. Recall that a regression model can be interpreted as a
conditional probability distribution. Suppose that the dependent
variable is a categorical variable Y that can assume two values, which
we represent conventionally as 0 and 1. The probability distribution
of the dependent variable is then a discrete function:

$$P(Y = 1) = p$$

$$P(Y = 0) = q = 1 - p$$

A regression model where the dependent variable is a categorical
variable is therefore a probability model; that is, it is a model of
the probability p given the values of the independent variables X:

$$P(Y = 1 \mid X) = f(X)$$

In the following sections we will discuss three probability models:
the linear probability model, the probit regression model, and the
logit regression model.

Linear Probability Model 

The linear probability model assumes that the function $f(X)$ is
linear. For example, a linear probability model of default assumes
that there is a linear relationship between the probability of default
and the factors that determine default.

$$P(Y = 1 \mid X) = f(X)$$

The parameters of the model can be obtained by using ordinary least
squares, applying the estimation methods of multiple regression
models. Once the parameters of the model are estimated, the predicted
value for P(Y) can be interpreted as the event probability, such as
the probability of default in our previous example. Note, however,
that when using a linear probability model, the $R^2$ is used only if
all the independent variables are also binary variables.

A major drawback of the linear probability model is that the
predicted value may be negative or greater than 1. In the probit
regression and logit regression models described below, the predicted
probability is forced to be between 0 and 1.
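The drawback is easy to reproduce. The following sketch fits a one-regressor linear probability model by least squares to a small made-up sample (both the data and the helper name `fit_line` are hypothetical) and evaluates the fitted line at the ends of the sample.

```python
def fit_line(x, y):
    """Simple least squares fit y = a + b x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical sample: a risk factor x and a binary outcome y (1 = event).
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

a, b = fit_line(x, y)
low = a + b * 1    # fitted "probability" at the smallest x
high = a + b * 6   # fitted "probability" at the largest x
print(f"P(Y=1 | x=1) = {low:.3f}, P(Y=1 | x=6) = {high:.3f}")
```

Both fitted values fall outside the unit interval, which is what motivates the probit and logit specifications.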

Probit Regression Model 

The probit regression model is a nonlinear regression model where the
dependent variable is a binary variable. Due to its nonlinearity, one
cannot estimate this model with least squares methods. We have to use
maximum likelihood (ML) methods as described below. Because what is
being predicted is the standard normal cumulative probability
distribution, the predicted values are between 0 and 1.

The general form for the probit regression model is

$$P(Y = 1 \mid X_1, X_2, \ldots, X_K) = N(a + b_1 X_1 + b_2 X_2 + \cdots + b_K X_K)$$

where N is the cumulative standard normal distribution function.

To see how ML methods work, consider a model of the probability of
corporate bond defaults. Suppose that there are three factors that
have been found to historically explain corporate bond defaults. The
probit regression model is then

$$P(Y = 1 \mid X_1, X_2, X_3) = N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)$$

$$P(Y = 0 \mid X_1, X_2, X_3) = 1 - N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)$$

The likelihood function is formed from the products

$$\prod_i N(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i})^{Y_i} \left(1 - N(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i})\right)^{1 - Y_i}$$

extended to all the samples, where the variable Y assumes a value of
1 for defaulted companies and 0 for nondefaulted companies, so that
$N(\cdot)$ is the probability of default. Parameters are estimated by
maximizing the likelihood.
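In practice one maximizes the logarithm of this product. The sketch below evaluates the probit log-likelihood for a given coefficient vector; it does not perform the maximization, which would normally be handed to a numerical optimizer. The function names and the three-observation sample are hypothetical, with Y = 1 taken to mean default.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_log_likelihood(beta, xs, ys):
    """Sum over observations of Y*log N(.) + (1 - Y)*log(1 - N(.))."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        p = norm_cdf(index)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical sample: three companies, three factors each, Y = 1 if defaulted.
xs = [(0.2, 0.9, 1.0), (1.5, 0.1, 0.4), (0.1, 0.2, 0.3)]
ys = [0, 1, 0]

ll_zero = probit_log_likelihood((0.0, 0.0, 0.0, 0.0), xs, ys)
ll_candidate = probit_log_likelihood((-2.1, 1.9, 0.3, 0.8), xs, ys)
print(f"{ll_zero:.3f} {ll_candidate:.3f}")
```

An optimizer would search over the coefficients for the vector with the highest log-likelihood; here the second candidate fits this toy sample better than the all-zero vector.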

Suppose that the following parameters are estimated:

$\beta_0 = -2.1 \quad \beta_1 = 1.9 \quad \beta_2 = 0.3 \quad \beta_3 = 0.8$

Then

$$N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3) = N(-2.1 + 1.9 X_1 + 0.3 X_2 + 0.8 X_3)$$

Now suppose that the probability of default of a company with the
following values for the independent variables is sought:

$X_1 = 0.2 \quad X_2 = 0.9 \quad X_3 = 1.0$

Substituting these values we get

$$N(-2.1 + 1.9(0.2) + 0.3(0.9) + 0.8(1.0)) = N(-0.65)$$

The standard normal cumulative probability for N(-0.65) is 25.8%.
Therefore, the probability of default for a company with these
characteristics is 25.8%.
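The arithmetic of this example can be checked directly. The snippet below is a small sketch that uses the error function from the Python standard library to evaluate the standard normal cumulative distribution; the variable names are ours.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


beta = (-2.1, 1.9, 0.3, 0.8)   # estimated parameters from the example
x = (0.2, 0.9, 1.0)            # company characteristics X1, X2, X3

index = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
prob_default = norm_cdf(index)
print(f"index = {index:.2f}, probability of default = {prob_default:.1%}")
```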

Application to Hedge Fund Survival 

An illustration of the probit regression model is provided by Malkiel
and Saha (2005), who use it to calculate the probability of the demise
of a hedge fund. The dependent variable in the regression is 1 if a
fund is defunct (did not survive) and 0 if it survived. The
explanatory variables, their estimated coefficients, and the standard
errors of the coefficients using hedge fund data from 1994 to 2003 are
given below:


Explanatory Variable                                          Coefficient    Standard Deviation

1. Return for the first quarter before the end of
   fund performance                                                 -1.47                  0.36

2. Return for the second quarter before the end of
   fund performance                                                 -4.93                  0.32

3. Return for the third quarter before the end of
   fund performance                                                 -2.74                  0.33

4. Return for the fourth quarter before the end of
   fund performance                                                 -3.71                  0.35

5. Standard deviation for the year prior to the end
   of fund performance                                              17.76                  0.92

6. Number of times in the final three months the
   fund's monthly return fell below the monthly
   median of all funds in the same primary category                  0.00                  0.33

7. Assets of the fund (in billions of dollars)
   estimated at the end of performance                              -1.30                 -7.76

Constant term                                                       -0.37                  0.07
For only one explanatory variable, the sixth one, the coefficient is
not statistically significantly different from zero. That explanatory
variable is a proxy for peer comparison of the hedge fund versus
similar hedge funds. The results suggest that there is a lower
probability of the demise of a hedge fund if there is good recent
performance (the negative coefficients of the first four variables
above) and the more assets under management (the negative coefficient
for the last variable above). The greater the hedge fund performance
return variability, the higher the probability of demise (the positive
coefficient for the fifth variable above).


Logit Regression Model 

As with the probit regression model, the logit regression model is a
nonlinear regression model where the dependent variable is a binary
variable and the predicted values are between 0 and 1. The predicted
value is again the value of a cumulative probability distribution.
However, rather than being the standard normal cumulative probability
distribution, it is the cumulative probability distribution of the
standard logistic distribution.

The general formula for the logit regression 
model is 

$$P(Y = 1 \mid X_1, X_2, \ldots, X_N) = F(a + b_1 X_1 + b_2 X_2 + \cdots + b_N X_N) = \frac{1}{1 + e^{-W}}$$

where $W = a + b_1 X_1 + b_2 X_2 + \cdots + b_N X_N$.

As with the probit regression model, the 
logit regression model is estimated with ML 
methods. 

Using our previous illustration, $W = -0.65$. Therefore

$$\frac{1}{1 + e^{-W}} = \frac{1}{1 + e^{-(-0.65)}} = 0.343 = 34.3\%$$


The probability of default for the company 
with these characteristics is 34.3%. 
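
The arithmetic above can be checked with a minimal Python sketch. The function names are illustrative and not part of the original illustration. The probit value is computed from the same index W purely for comparison; in practice the two models would be estimated separately and would generally produce different coefficients.

```python
import math


def logit_probability(w: float) -> float:
    """Logistic cumulative distribution evaluated at the linear index W."""
    return 1.0 / (1.0 + math.exp(-w))


def probit_probability(w: float) -> float:
    """Standard normal cumulative distribution evaluated at the same index."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


w = -0.65  # linear index W from the illustration in the text
p_logit = logit_probability(w)
p_probit = probit_probability(w)

print(f"logit probability:  {p_logit:.1%}")   # 34.3%, as in the text
print(f"probit probability: {p_probit:.1%}")  # same W, normal CDF instead
```

Because the logistic distribution has heavier tails than the standard normal, the same negative index maps to a larger probability under the logit model than under the probit model.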


KEY POINTS 

• Categorical variables are variables that represent group membership
and can appear in a regression equation either as a regressor or as
the dependent variable.

• A dichotomous variable is an explanatory variable that distinguishes
only two categories; the key is to represent a dichotomous categorical
variable as a numerical variable, referred to as a dummy variable,
that can assume the two values 0 and 1.

• When a dummy variable is a regressor, the t-statistic can be used to
determine if that variable is statistically significant. The Chow test
can also be used to test if all the dummy variables in a regression
model are collectively relevant.

• A regression model where the dependent variable is a categorical
variable is a probability model, and there are three types of such
models: the linear probability model, the probit regression model, and
the logit regression model.

• The linear probability model assumes that the probability model to
be estimated is linear and can be estimated using least squares.

• The probit regression model is a nonlinear regression model where
the dependent variable is a binary variable. The model cannot be
estimated using least squares because it is a nonlinear model and is
instead estimated using maximum likelihood methods.

• The logit regression model is a nonlinear regression model where the
dependent variable is a binary variable and the predicted values are
between 0 and 1 and represent a cumulative probability distribution.
Rather than being the standard normal cumulative probability
distribution, it is the cumulative probability distribution of the
standard logistic distribution.

NOTE

1. The model presented in this illustration was developed by
FridsonVision and is described in "Focus Issues Methodology," Leverage
World (May 30, 2003). The data for this illustration were provided by
Greg Braylovskiy of FridsonVision. The firm uses about 650 companies
in its analysis. Only 100 observations were used in this illustration.

Abstract: Many of the statistical methods that are most commonly used by researchers and practitioners
in finance are mainly focused on identifying the central tendency within a data set. However,
there are numerous situations where it may be equally or more important to understand the dispersion
between outcomes that are higher or lower than the central tendency. One statistical method
that can be useful in such investigations is quantile regression, which conceptually can be viewed
as a logical extension of ordinary least squares methods.


Many investors use regression methods to gauge the relative
attractiveness of different firms, the risks inherent in active or
passive portfolios, the historical performance of investment factors,
and similar topics. Such research often focuses on understanding the
"central tendency" within a data set, and for this purpose perhaps the
most commonly used tool is regression based on ordinary least squares
(OLS) approaches. OLS methods are designed to find the "line of best
fit" by minimizing the sum of squared errors from individual data
points. OLS analysis generally does a good job of describing the
central tendency within a data set, but typically will be much less
effective at describing the behavior of data points that are distant
from the line of best fit. Quantile regressions, however, can be
useful in such investigations. This statistical approach can be viewed
conceptually as a logical extension of ordinary least squares methods.
We present a brief overview of quantile regression approaches,
together with some examples of how such methods can be applied in
practical situations.

COMPARING QUANTILE 
AND OLS APPROACHES 

Conceptually, OLS statistical analysis can be summarized by the
following equation, as expressed in a univariate context where a
single independent variable is being used to explain or predict a
single dependent variable:

$$Y_i = \alpha + \sum_{i=1}^{N} \beta X_i + \varepsilon$$

where $Y$ represents the dependent variable, $X$ represents the
observed value of an independent variable, $i = 1, \ldots, N$ indexes
the data points, $\alpha$ represents the intercept (in other words,
the value on the vertical axis when the horizontal axis is zero),
$\beta$ represents the slope of the relationship between $X$ and $Y$,
and $\varepsilon$ is an error term with an expected mean value of
zero.

The material discussed here does not necessarily represent the opinions, methods, or views of
Delaware Investments.

As a hypothetical example, suppose that $X$ reflects the expected
dividend in dollars for a universe of firms, and $Y$ represents the
stock price for each of those firms. Then the value of $\beta$ will
reflect the value that the market is assigning to each dollar of
dividend payment, while the value of $\alpha$ will reflect the
expected price of a stock that does not pay a dividend. (Please note
that we are not proposing that such an equation would provide a usable
investment thesis.) It is possible to adapt this simple OLS equation
to a multivariate context, in which several different independent
variables are being used together to explain or predict the value of
the dependent variable.

Similarly, quantile regression approaches can 
be summarized by the following equation, 
again in a univariate context: 

$$Y_i = \alpha^{p} + \sum_{i=1}^{N} \beta^{p} X_i + \varepsilon^{p}$$

where $\alpha^{p}$ represents the intercept for a specified quantile,
$\beta^{p}$ represents the slope of the relationship between $X$ and
$Y$ for a specified quantile, and $\varepsilon^{p}$ similarly
represents the error term for that specified quantile. (The specific
form for these two equations has been adapted from Meligkotsidou,
Vrontos, and Vrontos, 2007; other authors might use different
terminology, but the underlying concepts are the same.) And just as
OLS methods can be used in both univariate and multivariate contexts,
the same is true for quantile regression approaches.

In this context, what is a quantile? It is a generalized form of a
percentile: in other words, a cutoff value below which a specified
share of the observations in a distribution falls. A quantile can
conveniently be expressed in terms of percentages, so that the median
will be the 50th quantile. But the same method can be used for any
quantile, not just the 50th quantile. In this sense, quantile methods
are somewhat similar to value-at-risk (VaR) approaches, which seek to
measure the "95th percentile" or "99th percentile" of potential losses
in a portfolio.
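
For reference, the standard way of making "the p-th quantile" operational in a regression, following Koenker and Bassett (1978), is to replace the sum of squared errors minimized by OLS with an asymmetrically weighted sum of absolute errors. The symbol $\rho_p$ below is the usual "check" loss; it is notation introduced here for illustration and is not used elsewhere in this entry, and $p$ is expressed as a fraction between 0 and 1.

```latex
\min_{\alpha^{p},\,\beta^{p}} \; \sum_{i=1}^{N} \rho_{p}\!\left(Y_i - \alpha^{p} - \beta^{p} X_i\right),
\qquad
\rho_{p}(u) = u\,\bigl(p - \mathbf{1}\{u < 0\}\bigr),
\qquad 0 < p < 1 .
```

With $p = 0.5$ the loss is half the absolute error, so the fitted line is the regression analogue of the median; other values of $p$ penalize positive and negative errors unequally and so pull the line toward the upper or lower part of the data.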

REASONS FOR USING 
QUANTILE METHODS 

If a data set is distributed in an approximately normal fashion, and
if the analysis focuses specifically on the 50th quantile, then the
results will often be quite similar to those derived from conventional
OLS analysis. However, OLS methods tend to provide unreliable results
if a data set is skewed, has "fat tails," or has some extreme
outliers, any or all of which can exist when the relevant data are
drawn from economics or finance (Koenker and Hallock, 2001). In such
circumstances, quantile regression focusing on the 50th quantile will
often provide a more robust estimate of the central tendency than
would be available from OLS approaches. Figure 1 provides a
hypothetical example of a situation where quantile regression might be
useful.

Figure 1 shows a scatter plot of a hypothetical relationship that has
three main traits: (1) positive slope, (2) higher dispersion of
results when the independent variable is small, and (3) a single
outlier toward the top end of the range. The graph shows that the
outlier exerts considerable influence on the OLS analysis by tending
to skew the relationship upward. A conventional OLS approach might
decide to exclude the outlier, but this would effectively mean
throwing away the information contained in that data point. By
contrast, the quantile analysis includes the outlier, but is less
affected by its presence. As a consequence, quantile regression does a
better job of identifying the "central tendency" within this data set,
in exactly the same way as an analyst might choose to use the median
rather than the mean when describing a distribution that has a heavy
weight in the left or right tail.
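
The effect of a single outlier can be reproduced with a small Python sketch. The data are made up for illustration and are not the data behind Figure 1, and the function names are hypothetical. The quantile fit uses a brute-force search that relies on a known property of the linear-programming formulation: at least one optimal line passes through two of the data points. This is practical only for tiny samples; statistical packages use linear programming routines instead.

```python
from itertools import combinations


def ols_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pinball_loss(xs, ys, intercept, slope, q):
    """Asymmetric absolute loss minimized by the q-th quantile line."""
    total = 0.0
    for x, y in zip(xs, ys):
        u = y - (intercept + slope * x)
        total += u * (q - (1.0 if u < 0 else 0.0))
    return total


def quantile_line(xs, ys, q):
    """Brute-force (intercept, slope) for quantile q, with 0 < q < 1.

    Tries every line through two data points and keeps the one with
    the smallest pinball loss.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line, not a valid regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = pinball_loss(xs, ys, intercept, slope, q)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: an exact line y = 1 + 2x with one outlier at the top end.
xs = [float(x) for x in range(1, 11)]
ys = [1.0 + 2.0 * x for x in xs]
ys[-1] = 60.0

ols_a, ols_b = ols_line(xs, ys)
med_a, med_b = quantile_line(xs, ys, 0.5)

print(f"OLS:    intercept {ols_a:.2f}, slope {ols_b:.2f}")  # slope about 4.13
print(f"Median: intercept {med_a:.2f}, slope {med_b:.2f}")  # recovers 1 and 2
```

On this toy data the outlier roughly doubles the OLS slope, while the median regression line still passes through the nine undisturbed points. Calling `quantile_line` with `q=0.1` or `q=0.9` gives lines for the lower and upper parts of the data in the same way.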

Figure 1 Effect of Outliers on OLS and Quantile Analysis
Note: Data are hypothetical and based on a simulated relationship.

The above analysis shows that quantile regression is more robust than
OLS methods in the presence of outliers and other potentially
distorting influences. Another useful feature of quantile approaches
is that they allow analysis of areas away from the middle of the
distribution. Conventional regression techniques focus on the "central
tendency" of the data, and thus tend to prioritize describing the
relationship that is most representative of the average. However, from
the perspective of an active investor or a risk manager, the most
interesting information may well be in the tails of the distribution,
where the standard OLS approaches are not generally very informative,
but where quantile methods can be readily applied.

Figure 2 shows the same scatter plot as Figure 1, but instead of
showing the median quantile line, it shows estimated lines for the
10th and 90th quantiles. The lines form a funnel-like shape,
indicating that there is greater variation on the left of the
distribution than the right. From the perspective of an investor, this
suggests that the range of possible outcomes from investing in
companies on the left of the distribution may be larger, and thus
require more careful analysis. From the perspective of a risk manager,
the difference in slope between the 10th and 90th percentiles might
suggest that greater provisioning would be appropriate if a portfolio
tends to have greater weight in the left of the distribution. Once
again, the outlier is included in the analysis, but its impact on the
estimated intercept and slope for the 10th and 90th percentiles is
considerably muted by comparison with what would be expected using
OLS-like methods.

Figure 2 Effect of Outliers on OLS and Quantile Analysis: Estimated Lines for the 10th and 90th
Quantiles
Note: Data are hypothetical and based on a simulated relationship.


BACKGROUND AND 
FURTHER EXAMPLES 

Quantile regression methods were first developed in the 1970s in the
discipline of statistics (Koenker and Bassett, 1978). Koenker (2005)
provides a comprehensive overview of quantile regression in general,
with numerous examples drawn from finance and from other subject
areas. The statistical packages R, S-Plus, Stata, SAS, and SPSS all
have quantile regression capabilities, either as part of their base
distribution or as separate modules. These packages typically focus on
linear quantile regression, but extensions to nonlinear applications
are also feasible (Koenker and Hallock, 2001).

In recent years, quantile regression methods have become increasingly
popular in finance and economics. Chernozhukov and Umantsev (2000)
applied quantile methods to estimate VaR, noting that the basic
structure could be applied to various possible modeling approaches. Wu
and Xiao (2002) also used quantile methods to estimate VaR and
provided an example of how such approaches could be used in the
context of an index fund. Engle and Manganelli (2004) provided an
example of how to use quantile regression approaches in calculating a
conditional VaR measure. Kuester, Mittnik, and Paolella (2006)
proposed extending the conditional VaR approach by incorporating some
additional autoregressive elements.


An important area of research for academics and practitioners has been
the influence of investment style on portfolio returns. One way to
perform such analysis is through the analysis of portfolio holdings,
but these are typically only available periodically and with a
considerable lag. Another approach has been to focus on portfolio
returns, which may be available at higher frequency and with a smaller
delay. Early work in this area, such as Sharpe (1992) and Carhart
(1997), generally relied on OLS approaches, which led to a focus on a
portfolio's "central tendency" relative to its benchmark. Bassett and
Chen (2001) extended this earlier work by applying quantile methods,
and showed that this permits examination of active performance during
periods when the portfolio and/or its benchmark are far away from
their central tendency.

As shown above, quantile regression provides a more complete picture
than OLS approaches of the conditional relationship among financial
variables. Landajo, de Andres, and Lorca (2008) used quantile methods
to gauge the relationship between size and profitability for
publishing firms in Spain, and showed that the patterns for small
firms were rather different from those for their larger peers.
Similarly, Lee, Chan, Yeh, and Chan (2010) used quantile methods on a
sample of firms from Taiwan in order to assess how increasing
internationalization affects relative valuation.

Quantile methods can also be used to test whether the
quantile-specific parameters are stable over different quantiles and
over time, as noted by Koenker and Xiao (2006). Quantile models can
thus demonstrate how different variables affect the location, scale,
and shape of the conditional distribution of the response. Such
methods therefore constitute a significant extension of classical
constant coefficient time series models, in which the effect of
conditioning is typically confined to a shift of the intercept and/or
the slope of the central tendency. Fattouh, Scaramozzino, and Harris
(2005) used quantile methods to analyze how the capital structure of
firms in Korea had changed over time. Billett and Xue (2008) used
quantile approaches to analyze the motivations behind open market
share repurchases, and found that firms are generally more likely to
repurchase shares when they are at higher risk of being taken over.
Pires, Pereira, and Martins (2010) used quantile methods to analyze
the determinants of credit default swap spreads over time, and
reported that some previously reported anomalous results may have
occurred due to the emphasis on the conditional mean of the
distribution, rather than on the upper and lower tails.

KEY POINTS 

• Quantile regression methods are well established in the statistical
literature, and are increasingly being used in finance.

• Quantile regression methods are more robust than conventional OLS
approaches to skewed distributions, fat tails, and the presence of
outliers, all of which are frequently encountered in real-world
financial data.

• Quantile regression approaches can be used to assess the central
tendency of a data set, and in this sense can be viewed as a
regression-based analogue of the median of a distribution. The same
approaches can also be used to examine the upper or lower reaches of a
data set, which is not possible using conventional OLS methods.

• For active investors and risk managers, the upper or lower tails of
a distribution may well be more interesting than the central tendency,
and quantile regression is an appropriate tool for such work.

• Quantile regression methods can be applied to data from a single
period, but can also be applied in a time-series context. Such methods
can help in analyzing how relationships may have changed over time.

Abstract: Volatility is a key parameter used in many financial applications, from derivatives valuation
to asset management and risk management. Volatility measures the size of the errors made in
modeling returns and other financial variables. It was discovered that, for vast classes of models, the
average size of volatility is not constant but changes with time and is predictable. Autoregressive
conditional heteroskedasticity (ARCH), generalized autoregressive conditional heteroskedasticity
(GARCH) models, and stochastic volatility models are the main tools used to model and forecast
volatility. Moving from single assets to portfolios made of multiple assets, not only are there
idiosyncratic volatilities but also correlations and covariances between assets that are time varying
and predictable. Multivariate ARCH/GARCH models and dynamic factor models, possibly in a
Bayesian framework, are the basic tools used to forecast correlations and covariances.


In this entry we discuss the modeling of the time behavior of the
uncertainty related to many econometric models when applied to
financial data. Finance practitioners know that errors made in
predicting markets are not of a constant magnitude. There are periods
when unpredictable market fluctuations are larger and periods when
they are smaller. This behavior, known as heteroskedasticity, refers
to the fact that the size of market volatility tends to cluster in
periods of high volatility and periods of low volatility. The
discovery that it is possible to formalize and generalize this
observation was a major breakthrough in econometrics. In fact, we can
describe many economic and financial data with models that predict,
simultaneously, the economic variables and the average magnitude of
the squared prediction error.

In this entry, we show how the average error size can be modeled as an
autoregressive process. Given their autoregressive nature, these
models are called autoregressive conditional heteroskedasticity (ARCH)
or generalized autoregressive conditional heteroskedasticity (GARCH).
This discovery is particularly important in financial econometrics,
where the error size is, in itself, a variable of great interest.

REVIEW OF LINEAR 
REGRESSION AND 
AUTOREGRESSIVE MODELS 

Let's first discuss two examples of basic econometric models, the
linear regression model and the autoregressive model, and illustrate
the meaning of homoskedasticity or heteroskedasticity in each case.

The linear regression model is the workhorse of economic modeling. A
univariate linear regression represents a proportionality relationship
between two variables:

$$y = \alpha + \beta x + \varepsilon$$

The preceding linear regression model states that the expectation of
the variable $y$ is $\beta$ times the expectation of the variable $x$
plus a constant $\alpha$. The proportionality relationship between $y$
and $x$ is not exact but subject to an error $\varepsilon$.

In standard regression theory, the error $\varepsilon$ is assumed to
have a zero mean and a constant standard deviation $\sigma$. The
standard deviation is the square root of the variance, which is the
expectation of the squared error: $\sigma^2 = E(\varepsilon^2)$. It is
a positive number that measures the size of the error. We call
homoskedasticity the assumption that the expected size of the error is
constant and does not depend on the size of the variable $x$. We call
heteroskedasticity the assumption that the expected size of the error
term is not constant.

The assumption of homoskedasticity is convenient from a mathematical
point of view and is standard in regression theory. However, it is an
assumption that must be verified empirically. In many cases,
especially if the range of variables is large, the assumption of
homoskedasticity might be unreasonable. For example, assuming a linear
relationship between consumption and household income, we can expect
that the size of the error depends on the size of household income. In
fact, high-income households have more freedom in the allocation of
their income.

In the preceding household-income example, the linear regression
represents a cross-sectional model without any time dimension.
However, in finance and economics in general, we deal primarily with
time series, that is, sequences of observations at different moments
of time. Let's call $X_t$ the value of an economic time series at time
$t$. Since the groundbreaking work of Haavelmo (1944), economic time
series are considered to be realizations of stochastic processes. That
is, each point of an economic time series is considered to be an
observation of a random variable.

We can look at a stochastic process as a sequence of variables
characterized by joint-probability distributions for every finite set
of different time points. In particular, we can consider the
distribution $f_t$ of each variable $X_t$ at each moment. Intuitively,
we can visualize a stochastic process as a very large (infinite)
number of paths. A process is called weakly stationary if all of its
second moments are constant. In particular this means that the mean
and variance are constants $\mu_t = \mu$ and $\sigma_t^2 = \sigma^2$
that do not depend on the time $t$. A process is called strictly
stationary if none of its finite distributions depends on time. A
strictly stationary process is not necessarily weakly stationary as
its finite distributions, though time-independent, might have infinite
moments.

The terms $\mu_t$ and $\sigma_t^2$ are the unconditional mean and
variance of a process. In finance and economics, however, we are
typically interested in making forecasts based on past and present
information. Therefore, we consider the distribution
$f_{t_2}(x \mid I_{t_1})$ of the variable $X_{t_2}$ at time $t_2$
conditional on the information $I_{t_1}$ known at time $t_1$. Based on
information available at time $t - 1$, $I_{t-1}$, we can also define
the conditional mean and the conditional variance
$(\mu_t \mid I_{t-1})$, $(\sigma_t^2 \mid I_{t-1})$.

A process can be weakly stationary but have time-varying conditional
variance. If the conditional mean is constant, then the unconditional
variance is the unconditional expectation of the conditional variance.
If the conditional mean is not constant, the unconditional variance is
not equal to the unconditional expectation of the conditional
variance; this is due to the dynamics of the conditional mean.
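
Both statements follow from the law of total variance, a standard result that is written out here for convenience (the $\operatorname{Var}$ and $\mathrm{E}$ notation is introduced for this illustration):

```latex
\operatorname{Var}(X_t)
  = \mathrm{E}\!\left[\operatorname{Var}(X_t \mid I_{t-1})\right]
  + \operatorname{Var}\!\left(\mathrm{E}[X_t \mid I_{t-1}]\right)
```

When the conditional mean $\mathrm{E}[X_t \mid I_{t-1}]$ is constant, the second term is zero and the unconditional variance equals the expected conditional variance. When the conditional mean moves over time, the second term is positive and the unconditional variance exceeds the expected conditional variance.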

In describing ARCH/GARCH behavior, we focus on the error process. In
particular, we assume that the errors are an innovation process, that
is, we assume that the conditional mean of the errors is zero. We
write the error process as $\varepsilon_t = \sigma_t z_t$, where
$\sigma_t$ is the conditional standard deviation and the $z$ terms are
a sequence of independent, zero-mean, unit-variance, normally
distributed variables. Under this assumption, the unconditional
variance of the error process is the unconditional mean of the
conditional variance. Note, however, that the unconditional variance
of the process variable does not, in general, coincide with the
unconditional variance of the error terms.

In financial and economic models, conditioning is often stated as
regressions of the future values of the variables on the present and
past values of the same variable. For example, if we assume that time
is discrete, we can express conditioning as an autoregressive model:

$$X_{t+1} = \alpha_0 + \beta_0 X_t + \cdots + \beta_n X_{t-n} + \varepsilon_{t+1}$$

The error term $\varepsilon_t$ is conditional on the information $I_t$
that, in this example, is represented by the present and the past $n$
values of the variable $X$. The simplest autoregressive model is the
random walk model of the logarithms of prices $p_t$:

$$p_{t+1} = \mu + p_t + \varepsilon_t$$

In terms of returns, the random walk model is simply:

$$r_t = \Delta p_t = \mu + \varepsilon_t$$

A major breakthrough in econometric modeling was the discovery that,
for many families of econometric models, linear and nonlinear alike,
it is possible to specify a stochastic process for the error terms and
predict the average size of the error terms when models are fitted to
empirical data. This is the essence of ARCH modeling introduced by
Engle (1982).

Two observations are in order. First, we have introduced two different
types of heteroskedasticity. In the first example, regression errors
are heteroskedastic because they depend on the value of the
independent variables: The average error is larger when the
independent variable is larger. In the second example, however, error
terms are conditionally heteroskedastic because they vary with time
and do not necessarily depend on the value of the process variables.
Later in this entry we will describe a variant of the ARCH model where
the size of volatility is correlated with the level of the variable.
However, in the basic specification of ARCH models, the level of the
variables and the size of volatility are independent.

Second, let's observe that the volatility (or the variance) of the
error term is a hidden, nonobservable variable. Later in this entry,
we will describe realized volatility models that treat volatility as
an observed variable. Theoretically, however, time-varying volatility
can be only inferred, not observed. As a consequence, the error term
cannot be separated from the rest of the model. This occurs both
because we have only one realization of the relevant time series and
because the volatility term depends on the model used to forecast
expected returns. The ARCH/GARCH behavior of the error term depends on
the model chosen to represent the data. We might use different models
to represent data with different levels of accuracy. Each model will
be characterized by a different specification of heteroskedasticity.

Consider, for example, the following model for returns:

$$r_t = m + \varepsilon_t$$

In this simple model, the clustering of volatility is equivalent to
the clustering of the squared returns (minus their constant mean). Now
suppose that we discover that returns are predictable through a
regression on some predictor $f$:

$$r_t = m + f_{t-1} + \varepsilon_t$$

As a result of our discovery, we can expect that the model will be
more accurate, the size of the errors will decrease, and the
heteroskedastic behavior will change.

Note that in the model $r_t = m + \varepsilon_t$, the errors coincide
with the fluctuations of returns around their unconditional mean. If
errors are an innovation process, that is, if the conditional mean of
the errors is zero, then the variance of returns coincides with the
variance of errors, and ARCH behavior describes the fluctuations of
returns. However, if we were able to make conditional forecasts of
returns, then the ARCH model describes the behavior of the errors and
it is no longer true that the unconditional variance of errors
coincides with the unconditional variance of returns. Thus, the
statement that ARCH models describe the time evolution of the variance
of returns is true only if returns have a constant expectation.

ARCH/GARCH effects are important because they are very general. It has
been found empirically that most model families presently in use in
econometrics and financial econometrics exhibit conditionally
heteroskedastic errors when applied to empirical economic and
financial data. The heteroskedasticity of errors has not disappeared
with the adoption of more sophisticated models of financial variables.
The ARCH/GARCH specification of errors allows one to estimate models
more accurately and to forecast volatility.

ARCH/GARCH MODELS 

In this section, we discuss univariate ARCH and GARCH models. Because
in this entry we focus on financial applications, we will use
financial notation. Let the dependent variable, which might be the
return on an asset or a portfolio, be labeled $r_t$. The mean value
$m$ and the variance $h$ will be defined relative to a past
information set. Then the return $r$ in the present will be equal to
the conditional mean value of $r$ (that is, the expected value of $r$
based on past information) plus the conditional standard deviation of
$r$ (that is, the square root of the variance) times the error term
for the present period:

$$r_t = m_t + \sqrt{h_t}\, z_t$$

The econometric challenge is to specify how the information is used to
forecast the mean and variance of the return conditional on the past
information. While many specifications have been considered for the
mean return and used in efforts to forecast future returns, rather
simple specifications have proven surprisingly successful in
predicting conditional variances.

First, note that if the error terms were strict white noise (that is,
zero-mean, independent variables with the same variance), the
conditional variance of the error terms would be constant and equal to
the unconditional variance of errors. We would be able to estimate the
error variance with the empirical variance:

$$h = \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^2$$

using the largest possible available sample. However, it was
discovered that the residuals of most models used in financial
econometrics exhibit a structure that includes heteroskedasticity and
autocorrelation of their absolute values or of their squared values.

ARCH/GARCH Models in Applied Financial Econometrics

The simplest strategy to capture the time dependency of the variance
is to use a short rolling window for estimates. In fact, before ARCH,
the primary descriptive tool to capture time-varying conditional
standard deviation and conditional variance was the rolling standard
deviation or the rolling variance. This is the standard deviation or
variance calculated using a fixed number of the most recent
observations. For example, a rolling standard deviation or variance
could be calculated every day using the most recent month (22 business
days) of data. It is convenient to think of this formulation as the
first ARCH model; it assumes that the variance of tomorrow's return is
an equally weighted average of the squared residuals of the last 22
days.
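
As a minimal sketch of this rolling estimator in Python (the function name and the toy residual series are hypothetical, and the residuals are assumed to have zero mean so that their squares estimate the variance):

```python
def rolling_variance(residuals, window=22):
    """Equally weighted average of the last `window` squared residuals.

    Entry k of the result uses residuals k .. k + window - 1 and can be
    read as the variance forecast for the period that follows them.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    forecasts = []
    running_sum = 0.0
    for t, e in enumerate(residuals):
        running_sum += e * e
        if t >= window:
            # Drop the squared residual that has left the window.
            running_sum -= residuals[t - window] ** 2
        if t >= window - 1:
            forecasts.append(running_sum / window)
    return forecasts


# Toy series: a calm month followed by a month with residuals twice as large.
residuals = [1.0] * 22 + [2.0] * 22
forecasts = rolling_variance(residuals, window=22)

print(forecasts[0])   # 1.0: window contains only the calm month
print(forecasts[11])  # 2.5: window is half calm, half turbulent
print(forecasts[-1])  # 4.0: window contains only the turbulent month
```

The estimate adapts to the new volatility level only gradually, and every observation inside the window counts equally while anything older counts for nothing.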

The idea behind the use of a rolling window is that the variance
changes slowly over time, and it is therefore approximately constant
on a short rolling-time window. However, given that the variance
changes over time, the assumption of equal weights seems unattractive:
It is reasonable to consider that more recent events are more relevant
and should therefore have higher weights. The assumption of zero
weights for observations more than one month old is also unappealing.

In the ARCH model proposed by Engle (1982), 
these weights are parameters to be estimated. 
Engle's ARCH model thereby allows the data to 
determine the best weights to use in forecasting 
the variance. In the original formulation of the 
ARCH model, the variance is forecasted as a 
moving average of past error terms: 

\[
h_t = \omega + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2
\]

where the coefficients \(\alpha_i\) must be estimated
from empirical data. The errors themselves will
have the form

\[
\varepsilon_t = \sqrt{h_t}\, z_t
\]

where the z terms are independent, standard
normal variables (that is, zero-mean,
unit-variance, normal variables). In order to
ensure that the variance is nonnegative, the
constants \((\omega, \alpha_i)\) must be nonnegative.
If \(\sum_{i=1}^{p} \alpha_i < 1\), the ARCH process is
weakly stationary with constant unconditional
variance:

\[
\sigma^2 = \frac{\omega}{1 - \sum_{i=1}^{p} \alpha_i}
\]


Two remarks should be made. First, ARCH is 
a forecasting model insofar as it forecasts the 
error variance at time t on the basis of informa¬ 
tion known at time t — 1. Second, forecasting is 
conditionally deterministic, that is, the ARCH 
model does not leave any uncertainty on the ex¬ 
pectation of the squared error at time t knowing 
past errors. This must always be true of a fore¬ 
cast, but, of course, the squared error that occurs 
can deviate widely from this forecast value. 
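
As a rough illustration of the ARCH forecast and of the stationarity condition, the following sketch computes the conditional variance from the most recent errors and the implied unconditional variance. The function names and argument layout are assumptions made for this example.

```python
def arch_forecast(omega, alphas, past_errors):
    """Conditional variance h_t = omega + sum_i alpha_i * eps_{t-i}^2.

    `past_errors` is in chronological order, so past_errors[-1] is
    eps_{t-1}; alphas[0] multiplies the most recent squared error.
    """
    p = len(alphas)
    if len(past_errors) < p:
        raise ValueError("need at least p past errors")
    recent_first = past_errors[::-1][:p]
    return omega + sum(a * e * e for a, e in zip(alphas, recent_first))


def arch_unconditional_variance(omega, alphas):
    """omega / (1 - sum(alphas)), defined only when sum(alphas) < 1."""
    total = sum(alphas)
    if total >= 1:
        raise ValueError("sum of alphas must be below 1 for weak stationarity")
    return omega / (1 - total)
```

Given the past errors, the forecast is a deterministic number, which is the sense in which the text calls forecasting conditionally deterministic.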

A useful generalization of this model is 
the GARCH parameterization introduced by 
Bollerslev (1986). This model is also a weighted 
average of past squared residuals, but it has 
declining weights that never go completely to 
zero. In its most general form, it is not a Marko¬ 
vian model, as all past errors contribute to fore¬ 
cast volatility. It gives parsimonious models 
that are easy to estimate and, even in its sim¬ 
plest form, has proven surprisingly successful 
in predicting conditional variances. 

The most widely used GARCH specification 
asserts that the best predictor of the variance 
in the next period is a weighted average of 
the long-run average variance, the variance 
predicted for this period, and the new infor¬ 
mation in this period that is captured by the 
most recent squared residual. Such an updat¬ 
ing rule is a simple description of adaptive or 
learning behavior and can be thought of as 
Bayesian updating. Consider the trader who 
knows that the long-run average daily stan¬ 
dard deviation of the Standard and Poor's 500 
is 1%, that the forecast he made yesterday was 
2%, and the unexpected return observed to¬ 
day is 3%. Obviously, this is a high-volatility 
period, and today is especially volatile, sug¬ 
gesting that the volatility forecast for tomorrow 
could be even higher. However, the fact that the 
long-term average is only 1% might lead the 
forecaster to lower his forecast. The best strat¬ 
egy depends on the dependence between days. 
If these three numbers are each squared and
weighted equally, then the new forecast would
be \(2.16 = \sqrt{(1 + 4 + 9)/3}\). However, rather than
weighting these equally, for daily data it is
generally found that weights such as those
in the empirical example of (0.02, 0.9, 0.08)
are much more accurate. Hence, the forecast
is \(2.08 = \sqrt{0.02 \times 1 + 0.9 \times 4 + 0.08 \times 9}\).
To be precise, we can use \(h_t\) to define the
variance of the residuals of a regression
\(r_t = m_t + \sqrt{h_t}\,\varepsilon_t\). In this definition, the
variance of \(\varepsilon_t\) is one. Therefore, a GARCH(1,1)
model for variance looks like this:

\[
h_{t+1} = \omega + \alpha (r_t - m_t)^2 + \beta h_t = \omega + \alpha h_t \varepsilon_t^2 + \beta h_t
\]

This model forecasts the variance of date t 
return as a weighted average of a constant, yes¬ 
terday's forecast, and yesterday's squared error. 
If we apply the previous formula recursively, 
we obtain an infinite weighted moving aver¬ 
age. Note that the weighting coefficients are 
different from those of a standard exponentially 
weighted moving average (EWMA). The econometrician
must estimate the constants \(\omega, \alpha, \beta\);
updating simply requires knowing the previous
forecast h and the residual.
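
The trader example can be checked numerically. The sketch below is only an illustration: the helper name `garch_update` is ours, and the weights are ordered as (long-run variance, previous forecast, latest squared residual), matching the (0.02, 0.9, 0.08) triple quoted above.

```python
import math


def garch_update(long_run_var, prev_forecast_var, squared_residual, weights):
    """Weighted average of the three variance inputs of a GARCH(1,1) update."""
    w_long, w_prev, w_news = weights
    return (w_long * long_run_var
            + w_prev * prev_forecast_var
            + w_news * squared_residual)


# Inputs in percent: long-run s.d. 1, yesterday's forecast 2, today's shock 3.
inputs = (1.0 ** 2, 2.0 ** 2, 3.0 ** 2)

equal_weight_sd = math.sqrt(garch_update(*inputs, (1 / 3, 1 / 3, 1 / 3)))
typical_sd = math.sqrt(garch_update(*inputs, (0.02, 0.90, 0.08)))

print(round(equal_weight_sd, 2))  # 2.16
print(round(typical_sd, 2))       # 2.08
```

Both forecasts are standard deviations in percent, obtained by averaging variances and only then taking the square root.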

The weights are \((1 - \alpha - \beta, \beta, \alpha)\) and the
long-run average variance is \(\omega/(1 - \alpha - \beta)\),
so that the long-run standard deviation is
\(\sqrt{\omega/(1 - \alpha - \beta)}\). It should be noted that
this works only if \(\alpha + \beta < 1\) and it really
makes sense only if the weights are positive,
requiring \(\alpha > 0\), \(\beta > 0\), \(\omega > 0\). In fact,
the GARCH(1,1) process is weakly stationary
if \(\alpha + \beta < 1\). If \(E[\log(\beta + \alpha z^2)] < 0\), the
process is strictly stationary. The GARCH model
with \(\alpha + \beta = 1\) is called an integrated GARCH
or IGARCH. It is a strictly stationary process
with infinite variance.

The GARCH model described above and typ¬ 
ically referred to as the GARCH(1,1) model 
derives its name from the fact that the 1,1 in 
parentheses is a standard notation in which the 
first number refers to the number of autoregres¬ 
sive lags (or ARCH terms) that appear in the 
equation and the second number refers to the 
number of moving average lags specified (often 
called the number of GARCH terms). Models 
with more than one lag are sometimes needed 
to find good variance forecasts. Although this 
model is directly set up to forecast for just one
period, it turns out that, based on the one-period
forecast, a two-period forecast can be
made. Ultimately, by repeating this step,
long-horizon forecasts can be constructed. For the
GARCH(1,1), the two-step forecast is a little
closer to the long-run average variance than
is the one-step forecast, and, ultimately, the
distant-horizon forecast is the same for all time
periods as long as \(\alpha + \beta < 1\). This is just the
unconditional variance. Thus, GARCH models
are mean reverting and conditionally
heteroskedastic but have a constant unconditional
variance.
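
The mean reversion of multi-step forecasts can be made concrete. Taking expectations of the GARCH(1,1) recursion gives, for each further step, the next expected variance as omega plus (alpha + beta) times the current one. The sketch below iterates that relation; the function name and the parameter values in the usage note are illustrative assumptions.

```python
def garch_multistep(omega, alpha, beta, h_next, horizon):
    """Expected conditional variances for 1, 2, ..., `horizon` steps ahead.

    `h_next` is the one-step forecast. Each further step applies
    E[h_{t+k+1}] = omega + (alpha + beta) * E[h_{t+k}].
    """
    persistence = alpha + beta
    if not 0 <= persistence < 1:
        raise ValueError("alpha + beta must lie in [0, 1)")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    forecasts = [h_next]
    for _ in range(horizon - 1):
        forecasts.append(omega + persistence * forecasts[-1])
    return forecasts
```

For example, with omega = 0.1, alpha = 0.1, beta = 0.8 the unconditional variance is 1.0, and a one-step forecast of 4.0 decays to 3.7, 3.43, and so on toward 1.0.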

Let's now address the question of how the
econometrician can estimate an equation like
the GARCH(1,1) when the only variable on
which there are data is \(r_t\). One possibility is to
use maximum likelihood by substituting \(h_t\) for
\(\sigma^2\) in the normal likelihood and then maximizing
with respect to the parameters. GARCH
estimation is implemented in commercially
available software such as EViews, GAUSS,
Matlab, RATS, SAS, or TSP. The process is quite
straightforward: For any set of parameters
\(\omega, \alpha, \beta\) and a starting estimate for the variance
of the first observation, which is often taken
to be the observed variance of the residuals,
it is easy to calculate the variance forecast for
the second observation. The GARCH updating
formula takes the weighted average of the
unconditional variance, the squared residual for
the first observation, and the starting variance
and estimates the variance of the second
observation. This is input into the forecast of the third
variance, and so forth. Eventually, an entire
time series of variance forecasts is constructed.
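
The recursion just described, together with the normal log-likelihood that an optimizer would maximize, can be sketched as follows. This is a simplified illustration rather than what any of the packages named above actually do; the function names are ours, and the residuals are assumed to be raw regression residuals (return minus conditional mean) with mean close to zero.

```python
import math


def garch_variances(residuals, omega, alpha, beta):
    """Variance forecasts h[0..n-1] for raw residuals e[0..n-1].

    h[0] is the starting estimate (here the average squared residual),
    and h[t+1] = omega + alpha * e[t]**2 + beta * h[t].
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("residuals must be non-empty")
    h = [sum(e * e for e in residuals) / n]
    for e in residuals[:-1]:
        h.append(omega + alpha * e * e + beta * h[-1])
    return h


def gaussian_loglik(residuals, omega, alpha, beta):
    """Normal log-likelihood with h_t substituted for the variance."""
    h = garch_variances(residuals, omega, alpha, beta)
    return -0.5 * sum(math.log(2 * math.pi * ht) + e * e / ht
                      for e, ht in zip(residuals, h))
```

In practice `gaussian_loglik` would be handed to a numerical optimizer that searches over omega, alpha, and beta subject to the positivity and stationarity constraints discussed earlier.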

Ideally, this series is large when the residuals
are large and small when the residuals are
small. The likelihood function provides a
systematic way to adjust the parameters \(\omega, \alpha, \beta\) to
give the best fit. Of course, it is possible that
the true variance process is different from the
one specified by the econometrician. In order to
check this, a variety of diagnostic tests are
available. The simplest is to construct the series of
\(\{\varepsilon_t\}\), which are supposed to have constant mean
and variance if the model is correctly specified.
Various tests, such as tests for autocorrelation
in the squares, can detect model failures. The
Ljung-Box test with 15 lagged autocorrelations
is often used.
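
A minimal version of this diagnostic is sketched below. It computes the Ljung-Box Q statistic, which would be applied to the squared standardized residuals; the function name is an assumption, and the comparison with a chi-square critical value is left to the reader.

```python
def ljung_box_q(series, lags=15):
    """Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..lags."""
    n = len(series)
    if n <= lags:
        raise ValueError("series must be longer than the number of lags")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series has zero variance")
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho * rho / (n - k)
    return n * (n + 2) * total
```

To use it for the check described above, divide each residual by the square root of its variance forecast, square the result, and pass that series to `ljung_box_q`. Under the null of no autocorrelation, Q with 15 lags is approximately chi-square with 15 degrees of freedom, so values well above about 25 (the 5% critical value) point to remaining ARCH effects.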


Application to Value at Risk 

Applications of the ARCH/GARCH approach 
are widespread in situations where the volatil¬ 
ity of returns is a central issue. Many banks 
and other financial institutions use the idea of 
value at risk (VaR) as a way to measure the risks 
in their portfolios. The 1% VaR is defined as 
the number of dollars that one can be 99% cer¬ 
tain exceeds any losses for the next day. Let's 
use the GARCH(1,1) tools to estimate the 1% 
VaR of a $1 million portfolio on March 23, 2000.
This portfolio consists of 50% Nasdaq, 30% Dow
Jones, and 20% long bonds. We chose this date
because, with the fall of equity markets in the
spring of 2000, it was a period of high volatility.
First, we construct the hypothetical historical
portfolio. (All calculations in this example
were done with the EViews software program.)
Figure 1 shows the pattern of the Nasdaq, Dow
Jones, and long Treasury bonds. In Table 1,
we present some illustrative statistics for each
of these three investments separately and, in
the final column, for the portfolio as a whole.
Then we forecast the standard deviation of the
portfolio and its 1% quantile. We carry out this
calculation over several different time frames:
the entire 10 years of the sample up to March 23,
2000, the year before March 23, 2000, and from
January 1, 2000 to March 23, 2000.


Table 1 Portfolio Data

Sample: 3/23/1990 3/23/2000

             NQ        DJ        RATE      PORT
Mean         0.0009    0.0005    0.0001    0.0007
Std. Dev.    0.0115    0.0090    0.0073    0.0083
Skewness    -0.5310   -0.3593   -0.2031   -0.4738
Kurtosis     7.4936    8.3288    4.9579    7.0026


Table 2 GARCH(1,1)

Dependent Variable: PORT
Sample (adjusted): 3/26/1990 3/23/2000
Convergence achieved after 16 iterations
Bollerslev-Wooldridge robust standard errors and covariance

Variance Equation
            Coefficient   Std. Error   z-Statistic   Prob.
C           0.0000        0.0000        3.1210       0.0018
ARCH(1)     0.0772        0.0179        4.3046       0.0000
GARCH(1)    0.9046        0.0196       46.1474       0.0000

S.E. of regression   0.0083      Akaike info criterion   -6.9186
Sum squared resid    0.1791      Schwarz criterion       -6.9118
Log likelihood       9028.2809   Durbin-Watson stat       1.8413

Consider first the quantiles of the historical 
portfolio at these three different time horizons. 
Over the full 10-year sample, the 1% quantile 
times $1 million produces a VaR of $22,477. 
Over the last year, the calculation produces a
VaR of $24,653, somewhat higher, but not
significantly so. However, if the 1% quantile is
calculated based on the data from January 1, 2000,
to March 23, 2000, the VaR is $35,159. Thus, the
level of risk has increased significantly over the
last quarter.

The basic GARCH(1,1) results are given in 
Table 2. Notice that the coefficients sum up to 
a number slightly less than one. The forecasted 
standard deviation for the next day is 0.014605, 
which is almost double the average standard 
deviation of 0.0083 presented in the last col¬ 
umn of Table 1. If the residuals were normally 
distributed, then this would be multiplied by 
2.326348, giving a VaR equal to $33,977. As it 
turns out, the standardized residuals, which are
the estimated values of \(\{\varepsilon_t\}\), have a 1% quantile
of 2.8437, which is well above the normal quantile.
The estimated 1% VaR is $39,996. Notice
that this VaR has risen to reflect the increased 
risk in 2000. 
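
The normal-distribution VaR quoted above can be reproduced approximately from the published inputs. The sketch below is an illustration only; the helper name is ours, and the small gap to the $33,977 in the text presumably reflects rounding of the forecast standard deviation.

```python
from statistics import NormalDist


def var_from_forecast(sigma, portfolio_value, quantile_multiplier):
    """Dollar VaR = forecast s.d. x quantile multiplier x portfolio value."""
    return sigma * quantile_multiplier * portfolio_value


# 99% point of the standard normal, about 2.326348.
z_99 = NormalDist().inv_cdf(0.99)

# Forecast s.d. of 0.014605 on a $1 million portfolio: roughly $33,976.
normal_var = var_from_forecast(0.014605, 1_000_000, z_99)
```

Replacing `z_99` with an empirical quantile of the standardized residuals gives a VaR that does not rely on the normality assumption.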


Finally, the VaR can be computed based solely 
on estimation of the quantile of the forecast dis¬ 
tribution. This has been proposed by Engle and 
Manganelli (2001), adapting the quantile regression
methods of Koenker and Bassett (1978).
Application of their method to this dataset de¬ 
livers a VaR of $38,228. Instead of assuming the 
distribution of return series, Engle and Man¬ 
ganelli (2004) propose a new VaR modeling 
approach, conditional autoregressive value at risk 
(CAViaR), to directly compute the quantile of 
an individual financial asset. On a theoretical 
level, due to structural changes of the return 
series, the constant-parameter CAViaR model 
can be extended. Huang et al. (2010) formulate 
a time-varying CAViaR model, which they call 
an index-exciting time-varying CAViaR model. 
The model incorporates the market index infor¬ 
mation to deal with the unobservable structural 
break points for the individual risky asset. 

WHY ARCH/GARCH? 

The ARCH/GARCH framework proved to be 
very successful in predicting volatility changes. 
Empirically, a wide range of financial and eco¬ 
nomic phenomena exhibit the clustering of 
volatilities. As we have seen, ARCH/GARCH 
models describe the time evolution of the average
size of squared errors, that is, the evolution
of the magnitude of uncertainty. Despite
the empirical success of ARCH/GARCH models,
there is no real consensus on the economic
reasons why uncertainty tends to cluster. That
is why models tend to perform better in some
periods and worse in other periods.

It is relatively easy to induce ARCH behavior 
in simulated systems by making appropriate 
assumptions on agent behavior. For example, 
one can reproduce ARCH behavior in artificial 
markets with simple assumptions on agent 
decision-making processes. The real economic 
challenge, however, is to explain ARCH/ 
GARCH behavior in terms of features of agent
behavior and/or economic variables that could 
be empirically ascertained. 

In classical physics, the amount of uncer¬ 
tainty inherent in models and predictions can 
be made arbitrarily low by increasing the preci¬ 
sion of initial data. This view, however, has been 
challenged in at least two ways. First, quan¬ 
tum mechanics has introduced the notion that 
there is a fundamental uncertainty in any mea¬ 
surement process. The amount of uncertainty 
is prescribed by the theory at a fundamental 
level. Second, the theory of complex systems 
has shown that nonlinear complex systems are 
so sensitive to changes in initial conditions that, 
in practice, there are limits to the accuracy of 
any model. ARCH/GARCH models describe 
the time evolution of uncertainty in a complex 
system. 

In financial and economic models, the future 
is always uncertain but over time we learn 
new information that helps us forecast this 
future. As asset prices reflect our best fore¬ 
casts of the future profitability of companies 
and countries, these change whenever there 
is news. ARCH/GARCH models can be inter¬ 
preted as measuring the intensity of the news 
process. Volatility clustering is most easily un¬ 
derstood as news clustering. Of course, many 
things influence the arrival process of news and 
its impact on prices. Trades convey news to 
the market and the macroeconomy can mod¬ 
erate the importance of the news. These can 
all be thought of as important determinants 
of the volatility that is picked up by ARCH/ 
GARCH. 


GENERALIZATIONS OF THE 
ARCH/GARCH MODELS 

Thus far, we have described the fundamental 
ARCH and GARCH models and their applica¬ 
tion to VaR calculations. The ARCH/GARCH 
framework proved to be a rich framework and 
many different extensions and generalizations 
of the initial ARCH/GARCH models have been 
proposed. We will now describe some of these 
generalizations and extensions. We will focus 
on applications in finance and will continue to 
use financial notation assuming that our vari¬ 
ables represent returns of assets or of portfolios. 

Let's first discuss why we need to general¬ 
ize the ARCH/GARCH models. There are three 
major extensions and generalizations: 

1. Integration of first, second, and higher mo¬ 
ments 

2. Generalization to high-frequency data 

3. Multivariate extensions 


Integration of First, Second, and 
Higher Moments 

In the ARCH/GARCH models considered thus 
far, returns are assumed to be normally dis¬ 
tributed and the forecasts of the first and sec¬ 
ond moments independent. These assumptions 
can be generalized in different ways: by
allowing the conditional distribution of the error
terms to be non-normal, by integrating the
first and second moments, or both.

Let's first consider asymmetries in volatil¬ 
ity forecasts. There is convincing evidence that 
the direction does affect volatility. Particularly 
for broad-based equity indexes and bond mar¬ 
ket indexes, it appears that market declines 
forecast higher volatility than do comparable 
market increases. There are now a variety of 
asymmetric GARCH models, including the
exponential GARCH (EGARCH) model of Nelson
(1991), the threshold ARCH (TARCH) model
attributed to Rabemananjara and Zakoian (1993)
and Glosten, Jagannathan, and Runkle (1993),
and a collection and comparison by Engle and
Ng (1993).

In order to illustrate asymmetric GARCH, 
consider, for example, the asymmetric 
GARCH(1,1) model of Glosten, Jagannathan, 
and Runkle (1993). In this model, we add a
term \(\gamma I_{\{\varepsilon_t < 0\}} \varepsilon_t^2\) to the basic GARCH:

\[
h_{t+1} = \omega + \alpha h_t \varepsilon_t^2 + \gamma I_{\{\varepsilon_t < 0\}} \varepsilon_t^2 + \beta h_t
\]

The term \(I_{\{\varepsilon_t < 0\}}\) is an indicator function that
is zero when the error is positive and 1 when
it is negative. If \(\gamma\) is positive, negative errors
are leveraged. The parameters of the model are
assumed to be positive. The relationship
\(\alpha + \beta + \gamma/2 < 1\) is assumed to hold.
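
A one-step update for the asymmetric model might look as follows. This is a sketch under a stated assumption: both the symmetric and the asymmetric coefficient are applied to the squared raw residual (return minus conditional mean), which is one common way of writing the model. The function name is ours.

```python
def gjr_update(omega, alpha, gamma, beta, h_t, residual):
    """Next-period variance with an extra weight gamma on negative shocks.

    `residual` is the raw error for the current period; the indicator is
    1 when it is negative and 0 otherwise.
    """
    indicator = 1.0 if residual < 0 else 0.0
    return omega + (alpha + gamma * indicator) * residual ** 2 + beta * h_t
```

With a positive gamma, a negative residual of a given size raises the next variance forecast by more than a positive residual of the same size, which is the leverage effect described in the text.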

In addition to asymmetries, it has been empir¬ 
ically found that residuals of ARCH/GARCH 
models fitted to empirical financial data ex¬ 
hibit excess kurtosis. One way to handle this 
problem is to consider non-normal distribu¬ 
tions of errors. Non-normal distributions of er¬ 
rors were considered by Bollerslev (1987), who 
introduced a GARCH model where the variable
z follows a Student-t distribution.

Let's now discuss the integration of first 
and second moments through the GARCH-M 
model. ARCH/GARCH models imply that the 
risk inherent in financial markets varies over 
time. Given that financial markets implement 
a risk-return trade-off, it is reasonable to ask 
whether changing risk entails changing returns. 
Note that, in principle, predictability of returns
as a function of predictability of risk is not a
violation of market efficiency. To correlate changes in
volatility with changes in returns, Engle, Lilien,
and Robins (1987) proposed the GARCH-M
model (not to be confused with the multivariate
MGARCH model that will be described shortly).
The GARCH-M model, or GARCH in mean
model, is a complete nonlinear model of asset
returns and not only a specification of the error
behavior. In the GARCH-M model, returns are
assumed to be a constant plus a term
proportional to the conditional variance:

\[
r_{t+1} = \mu_t + \sigma_t z_t, \qquad \mu_t = \mu_0 + \mu_1 \sigma_t^2
\]

where \(\sigma_t^2\) follows a GARCH process and the
z terms are independent and identically
distributed (IID) normal variables. Alternatively,
the GARCH-M process can be specified making
the mean linear in the standard deviation
rather than in the variance.
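
To make the model concrete, the following sketch simulates returns whose conditional mean is linear in a GARCH(1,1) conditional variance. It is an illustration under assumptions of our own: the function name, the choice to start the variance at its unconditional level, and the use of Python's standard random generator.

```python
import math
import random


def simulate_garch_m(n, mu0, mu1, omega, alpha, beta, seed=0):
    """Simulate n returns with mean mu0 + mu1 * variance and GARCH(1,1) variance."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1")
    rng = random.Random(seed)
    variance = omega / (1 - alpha - beta)  # start at the unconditional level
    returns = []
    for _ in range(n):
        shock = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        # The conditional mean moves with the conditional variance.
        returns.append(mu0 + mu1 * variance + shock)
        variance = omega + alpha * shock ** 2 + beta * variance
    return returns
```

Setting `mu1` to zero recovers a constant-mean GARCH(1,1) return series, so the two specifications can be compared on the same random draws by reusing the seed.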

The integration of volatilities and expected re¬ 
turns, that is the integration of risk and returns, 
is a difficult task. The reason is that not only 
volatilities but also correlations should play a 
role. The GARCH-M model was extended by 
Bollerslev (1986) in a multivariate context. The 
key challenge of these extensions is the explo¬ 
sion in the number of parameters to estimate; 
we will see this when discussing multivariate 
extensions in the following sections. 

Generalizations to High-Frequency 
Data 

With the advent of electronic trading, a growing 
amount of data has become available to practi¬ 
tioners and researchers. In many markets, data 
at transaction level, called tick-by-tick data or 
ultra-high-frequency data, are now available. The 
increase of data points in moving from daily 
data to transaction data is significant. For exam¬ 
ple, the average number of daily transactions 
for U.S. stocks in the Russell 1000 is in the order 
of 2,000. Thus, we have a 2,000-fold increase in 
data going from daily data to tick-by-tick data. 

The interest in high-frequency data is twofold. 
First, researchers and practitioners want to find 
events of interest. For example, the measure¬ 
ment of intraday risk and the discovery of trad¬ 
ing profit opportunities at short time horizons 
are of interest to many financial institutions. 
Second, researchers and practitioners would 
like to exploit high-frequency data to obtain 
more precise forecasts at the usual forecasting 
horizon. Let's focus on the latter objective. 

As observed by Merton (1980), while in diffusive
processes the estimation of trends requires
long stretches of data, the estimation of volatility
can be done with arbitrary precision using
data extracted from arbitrarily short time
periods provided that the sampling rate is
arbitrarily high. In other words, in diffusive models,
the estimation of volatility greatly profits
from high-frequency data. It therefore seems 
tempting to use data at the highest possible 
frequency, for example spaced at a few min¬ 
utes, to obtain better estimates of volatility at 
the frequency of practical interest, say daily or 
weekly. As we will see, the question is not so 
straightforward and the answer is still being 
researched. 

We will now give a brief account of the main 
modeling strategies and the main obstacles in 
using high-frequency data for volatility esti¬ 
mates. We will first assume that the return 
series are sampled at a high but fixed fre¬ 
quency. In other words, we initially assume 
that data are taken at fixed intervals of time. 
Later, we will drop this assumption and con¬ 
sider irregularly spaced tick-by-tick data, what 
Engle (2000) refers to as "ultra-high-frequency 
data." 

Let's begin by reviewing some facts about 
the temporal aggregation of models. The ques¬ 
tion of temporal aggregation is the question of 
whether models maintain the same form when 
used at different time scales. This question has 
two sides: empirical and theoretical. From the 
empirical point of view, it is far from being obvi¬ 
ous that econometric models maintain the same 
form under temporal aggregation. In fact, pat¬ 
terns found at some time scales might disap¬ 
pear at another time scale. For example, at very 
short time horizons, returns exhibit autocorre¬ 
lations that disappear at longer time horizons. 
Note that it is not a question of the precision 
and accuracy of models. Given the uncertainty 
associated with financial modeling, there are 
phenomena that exist at some time horizon and 
disappear at other time horizons. 

Time aggregation can also be explored from 
a purely theoretical point of view. Suppose that
a time series is characterized by a given
data-generating process (DGP). We want to
investigate what DGPs are closed under temporal
aggregation; that is, we want to investigate
what DGPs, possibly with different parameters,
can represent the same series sampled at
different time intervals.

The question of time aggregation for GARCH 
processes was explored by Drost and Nijman 
(1993). Consider an infinite series \(\{x_t\}\) with
given fixed-time intervals \(\Delta x_t = x_{t+1} - x_t\).
Suppose that the series \(\{x_t\}\) follows a GARCH(p,q)
process. Suppose also that we sample this series
at intervals that are multiples of the basic
intervals: \(\Delta y_t = h \Delta x_t = x_{t+h} - x_t\). We obtain a new
series \(\{y_t\}\). Drost and Nijman found that the new
series \(\{y_t\}\) does not, in general, follow another
GARCH(p',q') process. The reason is that, in the
standard GARCH definition presented in the
previous sections, the series \(\{x_t = \sigma_t z_t\}\) is
supposed to be a martingale difference sequence
(that is, a process with zero conditional mean).
This property is not conserved at longer time 
horizons. 

To solve this problem, Drost and Nijman in¬ 
troduced weak GARCH processes, processes 
that do not assume the martingale difference 
condition. They were able to show that weak 
GARCH(p,q) models are closed under tempo¬ 
ral aggregation and established the formulas to 
obtain the parameters of the new process after 
aggregation. One consequence of their formu¬ 
las is that the fluctuations of volatility tend to 
disappear when the time interval becomes very 
large. This conclusion is quite intuitive given 
that conditional volatility is a mean-reverting 
process. 

Christoffersen, Diebold, and Schuermann
(1998) use the Drost and Nijman formula to
show that the usual scaling of volatility, which
assumes that volatility scales with the square
root of time as in the random walk, can be
seriously misleading. In fact, the usual scaling
magnifies the GARCH effects when the time horizon
increases while the Drost and Nijman analysis
shows that the GARCH effect tends to disappear
with growing time horizons. If, for example,
we fit a GARCH model to daily returns
and then scale to monthly volatility by multiplying
by the square root of the number of days in
a month, we obtain a seriously biased estimate
of monthly volatility.
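
A simple way to see why square-root-of-time scaling misleads is to compare it with the sum of the mean-reverting daily forecasts. The sketch below does not implement the Drost and Nijman aggregation formulas; it only contrasts naive scaling with the cumulated GARCH(1,1) forecasts, assuming uncorrelated zero-mean daily returns. The function name and parameter values are illustrative.

```python
def monthly_variance_forecasts(omega, alpha, beta, h_next, days=22):
    """Return (naive, cumulated) forecasts of the variance over `days` days.

    naive     : days * h_next, i.e. square-root-of-time scaling of volatility
    cumulated : sum of the expected daily variances, each reverting toward
                the long-run level omega / (1 - alpha - beta)
    """
    persistence = alpha + beta
    if not 0 <= persistence < 1:
        raise ValueError("alpha + beta must lie in [0, 1)")
    naive = days * h_next
    cumulated, h = 0.0, h_next
    for _ in range(days):
        cumulated += h
        h = omega + persistence * h
    return naive, cumulated
```

With omega = 0.02, alpha = 0.08, beta = 0.90 (long-run daily variance 1) and a current forecast of 4, naive scaling gives 88 while the cumulated forecast is about 75.8. In a high-volatility period the naive rule overstates monthly variance, and in a calm period it understates it.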

Various proposals to exploit high-frequency 
data to estimate volatility have been made. 
Meddahi and Renault (2004) proposed a class 
of autoregressive stochastic volatility models— 
the SR-SARV model class—that are closed un¬ 
der temporal aggregation; they thereby avoid 
the limitations of the weak GARCH models. 
Andersen and Bollerslev (1998) proposed real¬ 
ized volatility as a virtually error-free measure 
of instantaneous volatility. To compute real¬ 
ized volatility using their model, one simply 
sums intraperiod high-frequency squared 
returns. 
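
The computation is short enough to write out. The sketch below assumes that the intraperiod returns are log returns computed from a list of prices sampled at regular intervals within the period; the function name is ours.

```python
import math


def realized_variance(intraday_prices):
    """Sum of squared log returns between consecutive intraday prices."""
    if len(intraday_prices) < 2:
        raise ValueError("need at least two prices")
    log_returns = [math.log(later / earlier)
                   for earlier, later in zip(intraday_prices, intraday_prices[1:])]
    return sum(r * r for r in log_returns)
```

The square root of the result is the realized volatility for the period, for example one trading day of five-minute prices.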

Thus far, we have briefly described models 
based on regularly spaced data. However, the 
ultimate objective in financial modeling is us¬ 
ing all the available information. The maximum 
possible level of information on returns is con¬ 
tained in tick-by-tick data. Engle and Russell 
(1998) proposed the autoregressive conditional du¬ 
ration (ACD) model to represent sequences of 
random times subject to clustering phenomena. 
In particular, the ACD model can be used to 
represent the random arrival of orders or the 
random time of trade execution. 

The arrival of orders and the execution of 
trades are subject to clustering phenomena in¬ 
sofar as there are periods of intense trading 
activity with frequent trading followed by pe¬ 
riods of calm. The ACD model is a point pro¬ 
cess. The simplest point process is likely the 
Poisson process, where the time between point 
events is distributed as an exponential vari¬ 
able independent of the past distribution of 
points. The ACD model is more complex than a 
Poisson process because it includes an autore¬ 
gressive effect that induces the point process 
equivalent of ARCH effects. As it turns out, the 
ACD model can be estimated using standard 
ARCH/GARCH software. Different extensions 
of the ACD model have been proposed. In par¬ 
ticular, Bauwens and Giot (1997) introduced the 
logarithmic ACD model to represent the bid- 
ask prices in the Nasdaq stock market. 


Ghysels and Jasiak (1997) introduced a class
of approximate ARCH models of returns se¬ 
ries sampled at the time of trade arrivals. This 
model class, called ACD-GARCH, uses the ACD 
model to represent the arrival times of trades. 
The GARCH parameters are set as a function 
of the duration between transactions using in¬ 
sight from the Drost and Nijman weak GARCH. 
The model is bivariate and can be regarded as 
a random coefficient GARCH model. 


Multivariate Extensions 

The models described thus far are models of 
single assets. However, in finance, we are also 
interested in the behavior of portfolios of assets. 
If we want to forecast the returns of portfolios 
of assets, we need to estimate the correlations 
and covariances between individual assets. We 
are interested in modeling correlations not only 
to forecast the returns of portfolios but also to 
respond to important theoretical questions. For 
example, we are interested in understanding if 
there is a link between the magnitude of correla¬ 
tions and the magnitude of variances and how 
correlations propagate between different mar¬ 
kets. Questions like these have an important 
bearing on investment and risk management 
strategies. 

Conceptually, we can address covariances 
in the same way as we addressed variances. 
Consider a vector of N return processes:
\(r_t = \{r_{i,t}\}\), \(i = 1, \ldots, N\), \(t = 1, \ldots, T\). At every
moment t, the vector \(r_t\) can be represented as
\(r_t = m_t(\theta) + \varepsilon_t\), where \(m_t(\theta)\) is the vector of
conditional means that depend on a finite vector
of parameters \(\theta\) and the term \(\varepsilon_t\) is written
as:

\[
\varepsilon_t = H_t^{1/2}(\theta)\, z_t
\]

where \(H_t^{1/2}(\theta)\) is a positive definite matrix that
depends on the finite vector of parameters
\(\theta\). We also assume that the N-vector \(z_t\) has
the following moments: \(E(z_t) = 0\), \(\mathrm{Var}(z_t) = I_N\),
where \(I_N\) is the N x N identity matrix.

To explain the nature of the matrix \(H_t^{1/2}(\theta)\),
consider that we can write:

\[
\mathrm{Var}(r_t \mid I_{t-1}) = \mathrm{Var}_{t-1}(r_t) = \mathrm{Var}_{t-1}(\varepsilon_t)
= H_t^{1/2}\, \mathrm{Var}_{t-1}(z_t)\, \bigl(H_t^{1/2}\bigr)' = H_t
\]

where \(I_{t-1}\) is the information set at time t - 1.
For simplicity, we left out in the notation the
dependence on the parameters \(\theta\). Thus \(H_t^{1/2}\) is
any positive definite N x N matrix such that
\(H_t\) is the conditional covariance matrix of the
process \(r_t\). The matrix \(H_t^{1/2}\) could be obtained by
Cholesky factorization of \(H_t\). Note the formal
analogy with the definition of the univariate
process.
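
As an illustration of the Cholesky route, the sketch below factors a small covariance matrix into a lower-triangular matrix whose product with its own transpose returns the original matrix. It is a textbook implementation written for clarity, with names of our own choosing, and is not meant for large or ill-conditioned matrices.

```python
import math


def cholesky(h):
    """Lower-triangular L with L L' = h, for a symmetric positive definite h."""
    n = len(h)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = h[i][i] - partial
                if diag <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (h[i][j] - partial) / lower[j][j]
    return lower
```

For the 2 x 2 matrix with rows (4, 2) and (2, 3), the factor has rows (2, 0) and (1, 1.4142...). Multiplying a vector of independent unit-variance shocks by this factor produces errors with the required covariance matrix.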

Consider that both the vector \(m_t(\theta)\) and the
matrix \(H_t^{1/2}(\theta)\) depend on the vector of
parameters \(\theta\). If the vector \(\theta\) can be partitioned into
two subvectors, one for the mean and one for
the variance, then the mean and the variance are
independent. Otherwise, there will be an
integration of mean and variance as was the case in
the univariate GARCH-M model. Let's abstract
from the mean, which we assume can be modeled
through some autoregressive process, and
focus on the process \(\varepsilon_t = H_t^{1/2}(\theta)\, z_t\).

We will now define a number of specifica¬ 
tions for the variance matrix H t . In principle, 
we might consider the covariance matrix het- 
eroskedastic and simply extend the ARCH/ 
GARCH modeling to the entire covariance 
matrix. There are three major challenges in 
MGARCH models: 

1. Determining the conditions that ensure that 
the matrix \(H_t\) is positive definite for every t.

2. Making estimates feasible by reducing the 
number of parameters to be estimated. 

3. Stating conditions for the weak stationarity 
of the process. 

In a multivariate setting, the number of 
parameters involved makes the (conditional) 
covariance matrix very noisy and virtually 
impossible to estimate without appropriate 
restrictions. Consider, for example, a large 
aggregate such as the S&P 500. Due to symmetries,
there are approximately 125,000 entries
in the conditional covariance matrix of the S&P 
500. If we consider each entry as a separate 
GARCH(1,1) process, we would need to esti¬ 
mate a minimum of three GARCH parameters 
per entry. Suppose we use three years of data 
for estimation, that is, approximately 750 data 
points for each stock's daily returns. In total, 
there are then 500 x 750 = 375,000 data points 
to estimate 3 x 125,000 = 375,000 parameters. 
Clearly, data are insufficient and estimation 
is therefore very noisy. To solve this problem, 
the number of independent entries in the 
covariance matrix has to be reduced. 

Consider that the problem of estimating large 
covariance matrices is already severe if we want 
to estimate the unconditional covariance matrix 
of returns. Using the theory of random matrices,
Potters, Bouchaud, Laloux, and Cizeau (1999)
show that only a small number of the eigenvalues
of the covariance matrix of a large aggregate
carry information, while the vast majority of
the eigenvalues cannot be distinguished from
the eigenvalues of a random matrix. Techniques
that impose constraints on the matrix entries,
such as factor analysis or principal components
analysis, are typically employed to make
the estimation of large covariance matrices
less noisy.

Assuming that the conditional covariance ma¬ 
trix is time varying, the simplest estimation 
technique is using a rolling window. Estimating 
the covariance matrix on a rolling window suf¬ 
fers from the same problems already discussed 
in the univariate case. Nevertheless, it is one 
of the two methods used in RiskMetrics. The 
second method is the EWMA method. EWMA 
estimates the covariance matrix using the fol¬ 
lowing equation: 

$$H_t = \alpha \varepsilon_t \varepsilon_t' + (1 - \alpha) H_{t-1}$$

where $\alpha$ is a small constant.
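
The EWMA recursion above can be sketched in a few lines of plain Python. This is a minimal illustration, not the RiskMetrics implementation: the function names, the starting matrix, and the example value of the smoothing constant are assumptions made for the example.

```python
def outer(u, v):
    """Outer product u v' of two vectors given as lists."""
    return [[ui * vj for vj in v] for ui in u]


def ewma_update(h_prev, eps, alpha):
    """One EWMA step: H_t = alpha * eps eps' + (1 - alpha) * H_{t-1}."""
    shock = outer(eps, eps)
    n = len(eps)
    return [
        [alpha * shock[i][j] + (1 - alpha) * h_prev[i][j] for j in range(n)]
        for i in range(n)
    ]


def ewma_covariance(residuals, h0, alpha):
    """Run the recursion over a sequence of residual vectors, starting from h0."""
    h = h0
    for eps in residuals:
        h = ewma_update(h, eps, alpha)
    return h


# Hypothetical two-asset example: identity start, one residual vector, alpha = 0.5.
h1 = ewma_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 0.5)
```

Because each update is a convex combination of a positive semidefinite outer product and the previous matrix, the estimate stays positive semidefinite whenever the starting matrix is.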

Let's now turn to multivariate GARCH speci¬ 
fications, or MGARCH, and begin by introduc¬ 
ing the vech notation. The vech operator stacks 
the lower triangular portion of an N x N matrix 
as an $N(N+1)/2 \times 1$ vector. In the vech notation,
the MGARCH(1,1) model, called the VEC
model, is written as follows: 

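$$h_t = \omega + A \eta_{t-1} + B h_{t-1}$$

where $h_t = \mathrm{vech}(H_t)$, $\eta_t = \mathrm{vech}(\varepsilon_t \varepsilon_t')$,
$\omega$ is an $N(N+1)/2 \times 1$
vector, and $A$, $B$ are $N(N+1)/2 \times N(N+1)/2$
matrices.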

The number of parameters in this model
makes its estimation impossible except in the bivariate
case. In fact, for $N = 3$ we should already
estimate 78 parameters. In order to reduce the
number of parameters, Bollerslev, Engle, and 
Wooldridge (1988) proposed the diagonal VEC 
model (DVEC), imposing the restriction that 
the matrices A, B be diagonal matrices. In the 
DVEC model, each entry of the covariance ma¬ 
trix is treated as an individual GARCH process. 
Conditions to ensure that the covariance matrix 
H t is positive definite are derived in Attanasio 
(1991). The number of parameters of the DVEC 
model, though much smaller than the number 
of parameters in the full VEC formulation, is 
still very high: 3 N (N + 1) /2. 
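
The parameter counts quoted for the VEC and DVEC models follow directly from the dimensions of the vector and the two matrices in the vech equation. A short sketch (the function names are illustrative, not from the source) reproduces the 78 parameters for three assets and the order of magnitude quoted earlier for the S&P 500:

```python
def vech_length(n):
    """Number of distinct entries of a symmetric n x n matrix: n(n+1)/2."""
    return n * (n + 1) // 2


def vec_param_count(n):
    """Full VEC(1,1): intercept vector plus two full square matrices A and B."""
    m = vech_length(n)
    return m + 2 * m * m


def dvec_param_count(n):
    """Diagonal VEC: intercept vector plus the diagonals of A and B."""
    return 3 * vech_length(n)


for n in (2, 3, 500):
    print(n, vech_length(n), vec_param_count(n), dvec_param_count(n))
```

For 500 assets the diagonal model still has 375,750 parameters, which is the roughly 375,000 figure used in the data-versus-parameters comparison above.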

To simplify the conditions that ensure that $H_t$ is positive
definite, Engle and Kroner (1995) proposed
the BEKK model (the acronym BEKK stands 
for Baba, Engle, Kraft, and Kroner). In its most 
general formulation, the BEKK(1,1,K) model is 
written as follows: 

$$H_t = CC' + \sum_{k=1}^{K} A_k' \varepsilon_{t-1} \varepsilon_{t-1}' A_k + \sum_{k=1}^{K} B_k' H_{t-1} B_k$$

where $C$, $A_k$, $B_k$ are $N \times N$ matrices and $C$ is
upper triangular. The BEKK(1,1,1) model simplifies
as follows:

$$H_t = CC' + A' \varepsilon_{t-1} \varepsilon_{t-1}' A + B' H_{t-1} B$$

which is a multivariate equivalent of the 
GARCH(1,1) model. The number of parameters 
in this model is very large; the diagonal BEKK 
was proposed to reduce the number of param¬ 
eters. 
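
One step of the BEKK(1,1,1) recursion can be sketched with small list-based matrix helpers. The matrices and residual vector below are made-up illustrative values, not estimates; the point is only that each of the three terms is a quadratic form, so the result is symmetric and positive semidefinite by construction.

```python
def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    return [list(row) for row in zip(*x)]


def bekk_step(c, a, b, eps_prev, h_prev):
    """H_t = C C' + A' eps eps' A + B' H_{t-1} B for the BEKK(1,1,1) model."""
    e = [[v] for v in eps_prev]                      # column vector
    shock = matmul(e, transpose(e))                  # eps eps'
    terms = [
        matmul(c, transpose(c)),
        matmul(matmul(transpose(a), shock), a),
        matmul(matmul(transpose(b), h_prev), b),
    ]
    n = len(h_prev)
    return [[sum(t[i][j] for t in terms) for j in range(n)] for i in range(n)]


# Hypothetical bivariate example with an upper triangular C.
C = [[1.0, 0.5], [0.0, 1.0]]
A = [[0.3, 0.0], [0.0, 0.3]]
B = [[0.9, 0.0], [0.0, 0.9]]
H1 = bekk_step(C, A, B, [1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]])
```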

The VEC model can be weakly (covariance)
stationary but exhibit a time-varying conditional
covariance matrix. The stationarity conditions
require that the eigenvalues of the matrix
$A + B$ are less than one in modulus. Similar
conditions can be established for the BEKK
model. The unconditional covariance matrix $H$
is the unconditional expectation of the conditional
covariance matrix. In vech notation, we can write:

$$\mathrm{vech}(H) = [I_{N^*} - A - B]^{-1} \omega, \quad N^* = N(N+1)/2$$

MGARCH based on factor models offers a 
different modeling strategy. Standard (strict) 
factor models represent returns as linear regres¬ 
sions on a small number of common variables 
called factors: 

$$r_t = m + B f_t + \varepsilon_t$$

where $r_t$ is a vector of returns, $f_t$ is a vector of
$K$ factors, $B$ is a matrix of factor loadings, and $\varepsilon_t$ is
noise with diagonal covariance, so that the covariance
between returns is accounted for only
by the covariance between the factors. In this
formulation, factors are static factors without
a specific time dependence. The unconditional
covariance matrix of returns $\Omega$ can be written
as:

$$\Omega = B \Omega_K B' + \Sigma$$

where $\Omega_K$ is the covariance matrix of the factors
and $\Sigma$ is the diagonal covariance matrix of the noise.
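
To make the factor decomposition concrete, the sketch below builds the return covariance matrix from loadings, a factor covariance matrix, and idiosyncratic variances, and gives a rough count of the free parameters involved. All names and numbers are illustrative assumptions, and the count ignores the usual identification restrictions on factor models.

```python
def factor_covariance(loadings, factor_cov, idio_var):
    """Return covariance: loadings * factor_cov * loadings' + diag(idio_var)."""
    n, k = len(loadings), len(factor_cov)
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            common = sum(
                loadings[i][p] * factor_cov[p][q] * loadings[j][q]
                for p in range(k)
                for q in range(k)
            )
            omega[i][j] = common + (idio_var[i] if i == j else 0.0)
    return omega


def free_parameters(n, k):
    """Loadings (n*k) + factor covariance (k(k+1)/2) + idiosyncratic variances (n)."""
    return n * k + k * (k + 1) // 2 + n


# Hypothetical example: three assets, one factor with variance 0.04.
omega = factor_covariance([[1.0], [0.5], [2.0]], [[0.04]], [0.01, 0.02, 0.03])
```

With 500 assets and five factors this rough count gives about 3,000 parameters, compared with roughly 125,000 distinct entries in the unrestricted covariance matrix.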

We can introduce dynamics in the expectations
of returns of factor models by making
some or all of the factors dynamic, for example,
assuming an autoregressive relationship:

$$r_t = m + B f_t + \varepsilon_t$$

$$f_{t+1} = a + b f_t + \eta_t$$

We can also introduce volatility dynamics by
assuming a GARCH structure for factors. Engle,
Ng, and Rothschild (1990) used the notion
of factors in a dynamic conditional covariance 
context assuming that one factor, the market 
factor, is dynamic. Various GARCH factor mod¬ 
els have been proposed: the F-GARCH model 
of Lin (1992); the full factor FF-GARCH model 
of Vrontos, Dellaportas, and Politis (2003); the 
orthogonal O-GARCH model of Kariya (1988)
and Alexander and Chibumba (1997).

Another strategy is followed by Bollerslev
(1990), who proposed a class of GARCH models
in which the conditional correlations are
constant and only the idiosyncratic variances
are time varying (CCC model). Engle (2002)
proposed a generalization of Bollerslev's CCC
model called the dynamic conditional correlation
(DCC) model.


KEY POINTS 

• Volatility, a key parameter used in many fi¬ 
nancial applications, measures the size of the 
errors made in modeling returns and other fi¬ 
nancial variables. For vast classes of models, 
the average size of volatility is not constant 
but changes with time and is predictable. 

• In standard regression theory, the assump¬ 
tion of homoskedasticity is convenient from 
a mathematical point of view. The homo¬ 
skedasticity assumption means that the ex¬ 
pected size of the error is constant and does 
not depend on the size of the explanatory 
variable. When it is assumed in regression 
analysis that the expected size of the error 
term is not constant, this means the error 
terms are assumed to be heteroskedastic. 

• A major breakthrough in econometric 
modeling was the discovery that for many 
families of econometric models it is possi¬ 
ble to specify a stochastic process for the 
error terms and predict the average size 
of the error terms when models are fitted 
to empirical data. This is the essence of 
ARCH modeling. This original modeling of 
conditional heteroskedasticity has developed 
into a full-fledged econometric theory of the 
time behavior of the errors of a large class of 
univariate and multivariate models. 

• The availability of more and better data and 
the availability of low-cost, high-performance 
computers allowed the development of a vast 
family of ARCH/GARCH models. Among 
these are the EGARCH, IGARCH, GARCH- 
M, MGARCH, and ACD models. 


• While the forecasting of expected returns 
remains a rather elusive task, predicting 
the level of uncertainty and the strength 
of comovements between asset returns has 
become a fundamental pillar of financial 
econometrics. 

Abstract: Classification and regression trees (CART) are nonparametric and nonlinear modeling
techniques that do not rely upon the many stringent assumptions required by classical parametric
models. Researchers in many fields have regularly found trees to be an attractive
way to express underlying relationships, yet trees remain relatively unfamiliar to financial modelers,
whose historical focus has been on parametric regression. Although the linear
type of regression analysis is convenient and sometimes intuitive, it may not fully capture the 
complexity of financial markets. As the quantity and variety of financial information available 
to data exploration have increased over time, there has been a commensurate need for a more 
robust and versatile process to analyze these data. CART offers a valuable alternative to traditional 
methods for modeling financial data. 


Classification and regression trees (CART) are non¬ 
parametric and nonlinear modeling techniques 
that essentially use recursive partitioning tech¬ 
niques to separate observations in a binary and 
sequential fashion. There are two varieties: (1) 
classification trees when the dependent variable 
is categorical, and (2) regression trees when the 
dependent variable is continuous. 

Although the approach is not widely uti¬ 
lized within the investment community, the 
applications of CART to financial markets
nevertheless include the classification of financially
distressed firms by Frydman, Altman,
and Kao (1985), asset allocation by Sorensen, 
Mezrich, and Miller (1998), equity style timing 
by Kao and Shumaker (1999), and stock selec¬ 
tion by Sorensen, Miller, and Ooi (2000). In this 
entry we provide an introduction to the CART 
framework and contrast it to more traditional 
modeling methods. We then illustrate the tech¬ 
nique by applying it to stock selection across 
the North American equity markets. 

TECHNICAL DETAILS

We begin by introducing the standard tree ter¬ 
minology. The root is the top node, which 
includes all observations in the learning sam¬ 
ple. The splitting condition at each node is 
expressed as an "if-then-else" rule that is de¬ 
termined by a specific splitting criterion. The 
splitting node is also called the parent and the 
two descendant subnodes are called children. 
A node keeps splitting until a terminal node or 
leaf is reached. 

The fundamental idea behind CART is to recursively
partition the space until all the subspaces
are sufficiently homogeneous that simple models
can be applied to them. This is in contrast
to linear and logistic regressions, the linear
parametric counterparts of regression and clas¬ 
sification trees, respectively, which are global 
models where a single predictive formula is 
imposed over the entire data space. When the 
dataset has multiple features that interact in 
complicated and nonlinear ways, as is often the 
case with financial data, a single global model 
may inadequately capture the underlying rela¬ 
tionships. 

There are two major steps in the CART analy¬ 
sis: (1) Build a tree using a recursive splitting of 
nodes, and (2) prune the tree in order to obtain 
the optimal tree size so as to prevent overfit¬ 
ting. Each of these two steps will be discussed in 
more detail below. Breiman et al. (1984) provide 
a detailed overview of the theory and method¬ 
ology of CART, including a number of exam¬ 
ples from many disciplinary areas. There are 
also many software packages that implement 
the CART algorithm. Popular ones include R
packages such as rpart and tree and the Matlab
function classregtree.

Binary Recursive Partitioning 

Let $\mathcal{L}$ be a learning sample, $\mathcal{L} = \{(x_1, y_1), \ldots,
(x_n, y_n)\}$, where $x_i$ is a vector of attributes; $y_i$
is the response, which can be categorical or
continuous; and $n$ is the number of observations.
The attribute vector $x_i$ belongs to $X$, the
attribute space. The tree-building algorithm involves
repeatedly splitting subsets of $\mathcal{L}$ into two
descendant subsets, beginning with $\mathcal{L}$ itself. For
a continuous variable $x_j$, the allowed splits are
of the form $x_j \le c$ versus $x_j > c$. For categorical
variables the levels are divided into two classes.
Therefore, for a categorical variable with $K$ levels,
there are $2^{K-1} - 1$ possible splits, disallowing
the empty split and ignoring the order.

Figure 1 A split generates two children of the
node $t$, denoted by $t_L$ and $t_R$. A proportion $p_L$ of
the initial data go into the left child and a proportion
$p_R$ go into the right child.
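
The count of possible splits for a categorical variable can be checked by brute-force enumeration. In this sketch (the function name is illustrative) the first level is always placed in the left group, which is one simple way to ignore the order of the two groups:

```python
from itertools import combinations


def categorical_splits(levels):
    """Enumerate all two-way splits of a set of levels, ignoring order."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(levels) - left
            if right:  # disallow the empty split
                splits.append((left, right))
    return splits


for k in range(1, 6):
    levels = "ABCDE"[:k]
    print(k, len(categorical_splits(levels)), 2 ** (k - 1) - 1)
```

The exponential growth in the number of candidate splits is why categorical variables with many levels are expensive to search exhaustively.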

In choosing the best splitting rule, CART seeks
to maximize the average purity of the two child
nodes. Hence, some criterion measuring data
homogeneity or, alternatively, impurity should
be introduced. These impurity measures are
loosely called splitting criteria. Let us introduce,
for any node $t$, a measure $i(t)$ that signifies
the impurity of the node. Suppose that a candidate
split $s$ divides the node into $t_L$ and $t_R$ such
that a proportion $p_L$ of the cases in $t$ go into $t_L$
and a proportion $p_R$ go into $t_R$ (see Figure 1).
Then the goodness of the split is defined to be
the decrease of impurity

$$\Delta i(s, t) = i(t) - p_L i(t_L) - p_R i(t_R)$$

For an arbitrary node $t$ and a set of splitting
candidates $S$, the optimal split is chosen to be
the one

$$s^* = \arg\max_{s \in S} \Delta i(s, t)$$

In other words, the optimal split is the one 
that reduces impurity by the greatest amount. 
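
As a minimal sketch of this search for a single continuous attribute, the code below evaluates every candidate threshold and keeps the one with the largest impurity decrease. It assumes the Gini index (one of the measures defined later in this entry) as the impurity function; the function names and the toy data are illustrative only.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(parent, left, right, impurity=gini):
    """Delta i(s, t) = i(t) - p_L * i(t_L) - p_R * i(t_R)."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return impurity(parent) - p_left * impurity(left) - p_right * impurity(right)


def best_threshold(x, y, impurity=gini):
    """Search splits of the form x <= c versus x > c; return (c, decrease)."""
    best_c, best_gain = None, 0.0
    for c in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = impurity_decrease(y, left, right, impurity)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain


split = best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])
```

A full tree builder would repeat this search over every attribute at every node and recurse on the two children, which is what makes the procedure greedy.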

Classification and regression trees use essentially
the same partitioning method, which is
based on reducing impurity.
However, they use different measures of impurity
to decide the split.

In a classification problem, suppose that we
want to classify data into $K$ classes. At each
node $t$ of a classification tree we have a probability
distribution $p_{tk}$, $k = 1, \ldots, K$, over all $K$
categories. The probabilities are conventionally
estimated from the node proportions, such that
$p_{tk} = n_{tk}/n_t$, where $n_{tk}$ is the number of observations
in the $k$-th class, and $n_t$ is the sample
size at node $t$.

The two most common measures of impurity
are the Gini index

$$i(t) = \sum_{k=1}^{K} p_{tk} (1 - p_{tk}) = 1 - \sum_{k=1}^{K} p_{tk}^2$$

and entropy or information

$$i(t) = -\sum_{k=1}^{K} p_{tk} \log(p_{tk})$$

where $0 \log(0) = 0$.
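
Both impurity measures are easy to compute from the class proportions at a node. The sketch below (function names are illustrative) uses the natural logarithm for the entropy and applies the convention that a zero proportion contributes nothing:

```python
import math
from collections import Counter


def class_proportions(labels):
    """Estimate the class probabilities at a node from the node proportions."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(p):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(pk ** 2 for pk in p)


def entropy(p):
    """Entropy with the convention 0 * log(0) = 0 (natural logarithm)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


pure = class_proportions(["out", "out", "out", "out"])
mixed = class_proportions(["out", "und", "out", "und"])
print(gini_index(pure), entropy(pure))
print(gini_index(mixed), entropy(mixed))
```

Both measures are zero for a pure node and reach their maximum when the classes are equally represented.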

As for regression trees, the most popular impurity
measure is

$$i(t) = \sum_{i=1}^{n_t} (y_i - \mu_t)^2$$

where the constant $\mu_t$ for node $t$ is estimated
by the mean of the values of the training data
falling into node $t$.
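
For a regression tree the node impurity is the sum of squared deviations of the responses from the node mean. A minimal sketch, with an illustrative function name and made-up numbers:

```python
def node_sse(values):
    """Sum of squared deviations of the node's responses from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


parent = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
left, right = parent[:3], parent[3:]
print(node_sse(parent), node_sse(left), node_sse(right))
```

Splitting the hypothetical sample between its two clusters leaves only a small within-node sum of squares in each child, which is the kind of split the impurity criterion favors.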

TREE PRUNING 

However, the use of partitioning rules alone 
cannot guarantee a useful tree model. If reduc¬ 
ing impurity is the only goal in tree induction, 
we will eventually end up with a maximal tree,
which has one observation or one class in each
leaf, whichever is reached first. This kind of tree
adapts too well to the features of the learning
sample and has a very high risk of being
overfitted. Tree pruning is a way to improve
the robustness of the model by trading off in-sample
fitting against out-of-sample accuracy.
This is particularly important if the model is 
being used to make predictions. 

The best-known procedure for tree pruning 
is the cost-complexity pruning proposed by 
Breiman et al. (1984). Let $T$ be a subtree of the
maximal tree grown without pruning. Let the
size of a tree be the number of terminal nodes.
The optimal tree is the one that minimizes the
following cost-complexity measure

$$R_\alpha(T) = R(T) + \alpha \, \mathrm{size}(T)$$

where $\alpha$ is a complexity parameter that penalizes
the tree size, and $R$ is the cost, which is
commonly taken as misclassification errors in
classification cases and deviance in regression
cases. For a given value of the complexity parameter
$\alpha$, an optimal tree can be determined. In
general, finding the optimal value for $\alpha$ would
require an independent set of data (i.e., a testing
sample). This requirement is often avoided in
practice by using a cross-validation procedure.
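
The trade-off in the cost-complexity measure can be illustrated with a handful of candidate subtrees. The costs and sizes below are invented for the example, and the sketch simply picks the minimizer among a given list rather than implementing the full weakest-link pruning sequence of Breiman et al. (1984):

```python
def cost_complexity(cost, size, alpha):
    """Cost-complexity measure: cost plus alpha times the number of leaves."""
    return cost + alpha * size


def best_subtree(candidates, alpha):
    """Pick the candidate subtree with the smallest cost-complexity measure."""
    return min(candidates, key=lambda t: cost_complexity(t["cost"], t["size"], alpha))


# Hypothetical subtrees: larger trees fit the learning sample better.
candidates = [
    {"size": 1, "cost": 0.40},
    {"size": 3, "cost": 0.25},
    {"size": 8, "cost": 0.20},
]

for alpha in (0.0, 0.02, 0.1):
    print(alpha, best_subtree(candidates, alpha)["size"])
```

With no penalty the largest tree wins, and as the complexity parameter grows the preferred tree shrinks toward the root.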

STRENGTHS AND 
WEAKNESSES OF CART 

Compared to classical parametric models, 
CART offers a number of benefits in data ex¬ 
ploration. In particular, it has a very high 
degree of interpretability. CART efficiently 
compresses a large volume of data into an easy- 
to-understand graphical form that identifies the 
essential characteristics. It is also very flexible 
in terms of the structure of the input variables, 
as either categorical or continuous factors or 
a combination can be used as inputs. Further¬ 
more, CART is quite robust in the presence of 
outliers and well suited to noisy datasets, both 
of which tend to be features of financial data. 

Being nonparametric it does not require any 
assumptions to be made about the underlying 
distribution of the variables being modeled. 
The high incidence of extreme events in the
financial markets suggests that the supposition
of distributional normality is questionable in
many areas of finance. While the assumption
may in many cases be a fair approximation
to the underlying structural relationship, it
is quite rare that tests for non-normality or
nonlinearity are explicitly carried out in advance,
even though this information would help to
inform the appropriate choice of modeling
technique.

The CART approach also departs from traditional
modeling methods by determining a hierarchy
of input variables that may be closer to
human decision-making processes. Indeed,
a key strength of CART over classical model¬ 
ing methods is that it allows one to represent 
various types of interactions between variables, 
particularly conditional relevance. Conditional 
relevance occurs if a factor is relevant only when 
it is conditioned upon some other factor. For ex¬ 
ample, only if a certain condition is met by the 
first high-level attribute is a second attribute 
taken into consideration. The same holds for 
the next attribute in the tree hierarchy, and 
so on. 

Another possible benefit for financial model¬ 
ers using CART is the diversification of model 
risk as argued by Philpotts et al. (2011). The 
widespread use of linear modeling methodolo¬ 
gies among quantitative asset managers, taken 
together with the similarity in data sources and 
risk models, may in turn have contributed to 
model risk in financial markets leading to a 
high degree of commonality in investment de¬ 
cisions. As a less used technique, CART is ap¬ 
pealing in the context of potentially offering 
a degree of model diversification. Philpotts et 
al. (2011) present empirical evidence highlight¬ 
ing the favorable performance of tree-based 
models compared to more traditional modeling 
techniques. 

One potential weakness of the recursive par¬ 
titioning tree construction process is local op¬ 
timization instead of global optimization. That 
is, the sequential node-splitting process chooses 
the next split without attempting to optimize 
the performance of the whole tree. The result¬ 
ing tree structure therefore does not guarantee 
global optimization. Instability is another possible
problem in CART solutions: perturbing a
small proportion of the learning sample, or
resampling the learning sample, often results
in a solution with a very different tree structure.
Several alternatives to CART have been
developed to address these problems, such as
random forests (see Breiman, 2001) and a hybrid
approach that combines CART with logistic
regression (see Zhu et al., 2011).

APPLICATION OF CART IN 
STOCK SELECTION 

In this section, we provide a detailed example of 
the CART algorithm as applied to the problem 
of identifying profitable stocks. This example 
was specifically chosen so as to provide a con¬ 
trast with the vast majority of the linear model¬ 
ing techniques used by financial practitioners. 
The model was built with monthly stock data 
from December 1986 to August 2010 covering 
all liquid stocks listed on the North American 
equity markets but excluding financial stocks 
because they would require their own specific 
model. 1 The number of total observations is 
279,188 (or 980 stocks per month on average). 

At the end of each month, forward total stock 
returns (price return plus dividends) were cal¬ 
culated. Using the median return of all sample 
companies in the same period as a proxy of 
the market return, the excess returns were then 
computed as the total returns minus the market 
returns. 

A broad spectrum of company valuation and 
quality-based characteristics, as well as mea¬ 
sures of investor sentiment such as price mo¬ 
mentum and earnings revisions were selected 
as reported in Table 1. Instead of using raw val¬ 
ues, we use rank orders in order to improve 
the robustness of the analyses. At each month, 
the rank order for each variable was computed 
by first ranking n stocks according to the corre¬ 
sponding variable value, and then dividing the 
rank by n to scale it between 0 and 1. Further¬ 
more, in order to overcome the high correlation 
among some of the explanatory variables, nine 
composite factors were promoted as potential 
explanatory variables, which were constructed 
as an equally weighted average of multiple vari¬ 
ables as described in Table 1. 
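
The rank transformation and the equally weighted composites can be sketched as follows. The function names are illustrative, and ties are broken by original position, which is an assumption since the text does not say how ties were handled.

```python
def scaled_ranks(values):
    """Rank n values in ascending order and divide each rank by n."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = position / n
    return ranks


def composite(*rank_lists):
    """Equally weighted average of several rank-scaled variables, stock by stock."""
    return [sum(stock) / len(stock) for stock in zip(*rank_lists)]


# Hypothetical month with four stocks and two value metrics.
book_to_price = scaled_ranks([0.4, 1.2, 0.8, 2.0])
sales_to_price = scaled_ranks([1.5, 0.3, 0.9, 0.6])
value = composite(book_to_price, sales_to_price)
```

Because ranks are bounded, a single extreme raw value cannot dominate a composite, which is the robustness benefit the text refers to.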

Table 1 Input Variables

Composite factor                  Description

Value (VAL)                       An equally weighted average of value metrics including dividends to
                                  price, cash flow to price, sales to price, and book to price.

Profitability (PROF)              An equally weighted average of profitability terms including return on
                                  equity, cash return on equity, pretax margins, and asset turnover.

Leverage (LEVERAGE)               An equally weighted average of financial strength terms including debt
                                  to equity and debt to market cap.

Debt Service (DEBT.SERVICE)       An equally weighted average of debt sustainability measures including
                                  interest cover and free cash flow to debt.

Momentum (MOM)                    An equally weighted average of momentum terms measured over various
                                  time horizons including 6 months and 12 months.

Stability (STAB)                  A composite term that captures the volatility in earnings, sales, and
                                  cash flows over the previous 5 years.

Historic Growth (HIST.GROWTH)     An equally weighted average of 3-year historic growth in earnings,
                                  sales, and cash flow.

Forward Growth (FWD.GROWTH)       An equally weighted average of I/B/E/S forecasted earnings growth
                                  expectation for FY1 and FY2.

Earnings Revisions (EREV)         An equally weighted average of the 3-month change in I/B/E/S
                                  forecasted earnings expectations for FY1 and FY2.



We built a classification tree with the pur¬ 
pose of predicting subsequent stock perfor¬ 
mance. Stocks were sorted into two groups, 
"outperformers" for those with positive excess 
returns and "underperformers" for the remain¬ 
der. The induced categorical variable was then 
used as the dependent variable in the subse¬ 
quent modeling process. One of the benefits 
of working with categorical responses instead 
of raw returns lies in the fact that it alleviates 
the impact of extreme returns, which may have 
multiple causes. The tree model was built with 
the data up to and including April 2007 while 
the data between May 2007 and August 2010 
were reserved for out-of-sample testing. Figure 
2 graphically illustrates the hierarchical structure 
of the stock selection tree. 
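
The construction of the dependent variable described above (excess return over the cross-sectional median, then a two-class label) can be sketched as follows. The tickers and returns are made up, and the labels "Out" and "Und" follow the abbreviations used in the caption of Figure 2.

```python
from statistics import median


def label_outperformers(total_returns):
    """Label each stock by the sign of its excess return over the median return."""
    market = median(total_returns.values())  # proxy for the market return
    return {
        stock: ("Out" if ret - market > 0 else "Und")
        for stock, ret in total_returns.items()
    }


# Hypothetical forward one-month total returns for three stocks.
labels = label_outperformers({"AAA": 0.05, "BBB": -0.02, "CCC": 0.01})
```

A stock sitting exactly at the median has zero excess return and therefore falls into the "remainder" group, consistent with the definition in the text.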

The first observation to note is that the pri¬ 
mary split is valuation based. More specifi¬ 
cally, the tree makes a distinction between those 
stocks that are relatively expensive (the right- 
hand branch) and those that are not expensive. 
One of the most attractive nodes splits again 
on high value and therefore identifies cheap
stocks as having a 59.2% probability of outperforming
the universe (Node 1). In contrast, the
worst performing stocks are characterized by 
being expensive and exhibiting low profitabil¬ 
ity (Node 14). Companies with these attributes 
only have a 42% chance of outperforming. 

The tree is able to identify the exception to 
the rule. For example, while identifying that 
value was the most important driver of stock re¬ 
turns, the tree also suggests that more expensive 
stocks still have a good chance of outperforming
the market provided that they are blessed
with profitability, stability in earnings, strong
momentum, and strong
earnings revisions (Node 10).

Similarly, the decision tree framework also
highlights the nonlinear response of stock
returns to the underlying predictor variables.
For example, stocks in Nodes 3 and 5 have
similar outperforming probabilities but are
of opposite preference with regard to leverage.
Conditional on above-average debt cover,
Node 3 actually prefers some degree of leverage
and more significantly penalizes overly
conservative firms (with too low leverage).
In contrast, leverage is a characteristic to be
avoided among firms that cannot service their
debts (Node 5).

Figure 2 Decision Tree for North American stocks built using data from December 1986 to April 2007 to model the chance of a stock
outperforming the overall market. The dependent variable is set as an "outperformer" (Out) if a stock subsequently achieves a higher return
than the market, and "underperformer" (Und) otherwise. The outperforming probabilities are reported in percentages at each terminal node
along with the splitting criteria.

Table 2 Out-of-Sample Performance (May 2007-August 2010). Portfolios were rebalanced monthly and
transaction costs were not taken into account.

Portfolio    Excess Return (%)    Tracking Error (%)    IR       Monthly Win Rate

Long          2.6                 2.9                    0.89    0.57

Short        -2.8                 3.4                   -0.82    0.43


Table 2 is an out-of-sample test of the model. 
Each month from May 2007 until August 2010, 
we ranked all stocks based upon the predicted 
outperforming probabilities by the tree model 
and formed two portfolios. One portfolio is an 
equal weighting of stocks with the highest half 
of outperforming probabilities (long), and the 
second is an equal weighting of the rest ex¬ 
pected to underperform (short). Table 2 reports 
the annualized excess return, the tracking er¬ 
ror, the information ratio, and the monthly win 
rate of the two portfolios. The long portfolio 
outperformed the benchmark by 2.6% with a 
similar relative risk. The short portfolio un¬ 
derperformed by 2.8% with a slightly higher 
tracking error. The monthly win rate is the 
proportion of months that a portfolio outper¬ 
formed the benchmark out-of-sample. The tree 
model achieved a monthly win rate of 57%. 
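
The statistics in Table 2 can be related to one another with two small helper functions. This sketch assumes the conventional definition of the information ratio as annualized excess return divided by tracking error, which the text does not spell out; because the table inputs are rounded, the recomputed ratios only match the reported ones approximately.

```python
def information_ratio(excess_return, tracking_error):
    """Information ratio under the assumed definition: excess return / tracking error."""
    return excess_return / tracking_error


def win_rate(monthly_excess_returns):
    """Proportion of months in which the portfolio beat the benchmark."""
    wins = sum(1 for r in monthly_excess_returns if r > 0)
    return wins / len(monthly_excess_returns)


# Rounded figures from Table 2 (percent per year).
ir_long = information_ratio(2.6, 2.9)
ir_short = information_ratio(-2.8, 3.4)
print(round(ir_long, 2), round(ir_short, 2))
```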

KEY POINTS 

• CART is a flexible modeling technique that of¬ 
fers significant potential to assist in financial 
decision making. 

• CART is a nonparametric modeling technique 
that does not impose the stringent assump¬ 
tions required by classical regression analysis, 
and therefore sidesteps many of the known 
issues associated with traditional parametric 
models. 

• CART is well suited to identifying nonlinearities
and complex interactions in the data. It
is minimally affected by missing values, outliers,
or multicollinearity.

• Unlike many other methods, CART can be 
easily visualized, which helps financial deci¬ 
sion makers to assess the theoretical support 
behind the resulting investment insights. 

• The hierarchical structure of a tree model may 
more closely resemble how the human brain 
makes decisions. In particular, the "if-then- 
else" nature of the rules in the model emulates 
an expert system that is able to incorporate the 
exception to the rule. 

• CART also embraces the important feature of
conditional relevance, which is widespread in 
financial data. In the CART framework, input 
variables are allowed to interact and have dif¬ 
ferent influences under varying conditions. 

• As with any other quantitative model development
process, care must be taken to ensure
the integrity of the input data and that the 
intended application makes intuitive sense. 

NOTE 

1. Financial stocks were excluded due to 
their different accounting structure, which 
makes comparisons with nonfinancials trou¬ 
blesome, although similarly structured stock 
selection models can also be applied within 
the sector. 

Abstract: Financial time series data tend to exhibit stochastic trends. To uncover relationships among
financial variables it is important to model changes in stochastic trends over time. Cointegration 
can be used to identify common stochastic trends among different financial variables. If financial 
variables are cointegrated, it can also be shown that the variables exhibit a long-run relationship. If 
this long-run relationship is severed, this may indicate the presence of a financial bubble. 


The long-term relationships among economic 
variables, such as short-term versus long-term 
interest rates, or stock prices versus dividends, 
have long interested finance practitioners. For 
certain types of trends, multiple regression 
analysis needs modification to uncover these 
relationships. A trend represents a long-term 
movement in the variable. One type of trend, 
a deterministic trend, has a straightforward so¬ 
lution. Since a deterministic trend is a function 
of time, we merely include this time function 
in the regression. For example, if the variables 
are increasing or decreasing as a linear func¬ 
tion of time, we may simply include time as 
a variable in the regression equation. The is¬ 
sue becomes more complex when the trend is 
stochastic. A stochastic trend is defined (Stock 
and Watson, 2003) as "a persistent but random 
long-term movement of the variable over time." 
Thus a variable with a stochastic trend may
exhibit prolonged long-run increases followed
by prolonged long-run declines and perhaps 
another period of long-term increases. 

Most financial theorists believe stochastic 
trends better describe the behavior of financial 
variables than deterministic trends. For exam¬ 
ple, if stock prices are rising, there is no reason 
to believe they will continue to do so in the fu¬ 
ture. Or, even if they continue to increase in the 
future, they may not do so at the same rate as in 
the past. This is because stock prices are driven 
by a variety of economic factors and the impact 
of these factors may change over time. One way 
of capturing these common stochastic trends is 
by using an econometric technique usually re¬ 
ferred to as cointegration. 

In this entry, we explain the concept of coin¬ 
tegration. There are two major ways of testing 
for cointegration. We outline both econometric 
methods and the underlying theory for each 
method. We illustrate the first technique with
an example of the first type of cointegration 
problem, testing market efficiency. Specifically, 
we examine the present value model of stock 
prices. We illustrate the second technique with 
an example of the second type of cointegration 
problem, examining market linkages. In partic¬ 
ular, we test the linkage and the dynamic in¬ 
teractions among stock market indexes of three 
European countries. Finally, we also use cointe¬ 
gration to test for the presence of an asset price 
bubble. Specifically, we test for the possibility 
of bubbles in the real estate markets. 

STATIONARY AND 
NONSTATIONARY 
VARIABLES AND 
COINTEGRATION 

The presence of stochastic trends may lead a 
researcher to conclude that two economic vari¬ 
ables are related over time when in fact they 
are not. This problem is referred to as spuri¬ 
ous regression. For example, during the 1980s 
the U.S. stock market and the Japanese stock 
market were both rising. An ordinary least 
squares (OLS) regression of the Japanese Morgan
Stanley Stock Index (USD) on the U.S. Morgan
Stanley Stock Index for the time period
1980-1990 using monthly data yields

Japanese Stock Index = 76.74 + 19 U.S. Stock Index

t-statistic (-13.95) (26.51) R-square = 0.86

The t-statistic on the slope coefficient (26.51)
is quite large, indicating a strong positive rela¬ 
tionship between the two stock markets. This 
strong relationship is reinforced with a very 
high R-square value. However, estimating the 
same regression over a different time period, 
1990-2007, reveals 

Japanese Stock Index = 2905.67 - 0.29 U.S. Stock Index

t-statistic (30.54) (-2.80) R-square = 0.04

This regression equation suggests there is a
statistically significant negative relationship between the two
stock market indexes. Although the t-statistic
on the slope coefficient (-2.80) is large in absolute value, the low
R-square value suggests that the relationship is
rather weak.

The reason behind these contradictory results 
is the presence of stochastic trends in both se¬ 
ries. During the first time span, these stochastic 
trends were aligned, but not during the latter 
time span. Since different economic forces in¬ 
fluence the stochastic trends and these forces 
change over time, during some periods they 
will line up and in some periods they will not. 
In summary, when the variables have stochas¬ 
tic trends, the OLS technique may provide 
misleading results—the spurious regression 
problem. 

Another problem is that when the variables 
contain a stochastic trend, the t-values of the
regressors no longer follow a normal distribu¬ 
tion, even for large samples. Standard hypothe¬ 
sis tests are no longer valid for these nonnormal 
distributions. 

At first, researchers attempted to deal with 
these problems by removing the trend through 
differencing these variables. That is, they focused
on the change in these variables, $X_t - X_{t-1}$,
rather than the level of these variables,
$X_t$. Although this technique was successful for
univariate Box-Jenkins analysis, there are two 
problems with this approach in a multivariate 
scenario. First, we can only make statements 
about the changes in the variables rather than 
the level of the variables. This will be particu¬ 
larly troubling if our major interest is the level 
of the variable. Second, if the variables are sub¬ 
ject to a stochastic trend, then focusing on the 
changes in the variables will lead to a specifica¬ 
tion error in our regressions. 

The cointegration technique allows re¬ 
searchers to investigate variables that share the 
same stochastic trend and at the same time 
avoid the spurious regression problem. Cointegration
analysis uses regression analysis to
study the long-run linkages among economic
variables and allows us to consider the short-run
adjustments to deviations from the long-run
equilibrium.

The use of cointegration in finance has 
grown significantly. Surveying this vast liter¬ 
ature would take us beyond the scope of this 
entry. To narrow our focus, we note that coin¬ 
tegration analysis has been used mainly for 
two types of problems in finance. First, it has 
been used to evaluate the efficiency of finan¬ 
cial markets in a wide variety of contexts. For 
example, it was used to evaluate the purchas¬ 
ing power parity theory (see Enders, 1988), the 
rational expectations theory of the term struc¬ 
ture, the present value model of stock prices 
(Campbell and Shiller, 1987), and the relation¬ 
ship between the forward and spot exchange 
rates (Liu and Maddala, 1992). The second type 
of cointegration study investigates market link¬ 
ages. For example, Hendry and Juselius (2000) 
examine how gasoline prices at different sta¬ 
tions are linked to the world price of oil. Ar- 
shanapalli and Doukas (1993) investigate the 
linkages and dynamic interactions among stock 
market indexes of several countries. 

Before explaining cointegration it is first nec¬ 
essary to distinguish between stationary and 
nonstationary variables. A variable is said to be 
stationary (more formally, weakly stationary) if 
its mean and variance are constant and its au¬ 
tocorrelation depends on the lag length, that 
is 

$E(X_t) = \mu$, $\mathrm{Var}(X_t) = \sigma^2$, and

$\mathrm{Cov}(X_t, X_{t-l}) = \gamma(l)$

Stationary means that the variable fluctuates 
about its mean with constant variation. Another 
way to put it is that the variable exhibits mean 
reversion and so displays no stochastic trend. 
In contrast, nonstationary variables may wan¬ 
der arbitrarily far from the mean. Thus, only 
nonstationary variables exhibit a stochastic 
trend. 

The simplest example of a nonstationary variable
is a random walk. A variable is a random
walk if $X_t = X_{t-1} + e_t$, where $e_t$ is a
random error term with mean 0 and standard
deviation $\sigma$. It can be shown that, for a given
starting value, the standard deviation is
$\sigma(X_t) = \sqrt{t}\,\sigma$ (see Stock and Watson,
1993), where $t$ is time. Since the standard
deviation depends on time, a random walk is
nonstationary.
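
The growth of a random walk's dispersion over time can be checked by simulation. The following is a minimal sketch, assuming normally distributed shocks and a starting value of zero (both assumptions made for the illustration): it simulates many independent paths and measures the cross-sectional standard deviation of the level after t steps, which should be close to the square root of t times sigma.

```python
import math
import random


def terminal_std(t, sigma=1.0, n_paths=4000, seed=1):
    """Sample standard deviation of X_t across simulated random walks with X_0 = 0."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        level = 0.0
        for _ in range(t):
            level += rng.gauss(0.0, sigma)  # X_t = X_{t-1} + e_t
        finals.append(level)
    mean = sum(finals) / n_paths
    var = sum((v - mean) ** 2 for v in finals) / (n_paths - 1)
    return math.sqrt(var)
```

For example, `terminal_std(25)` should be near 5 and `terminal_std(100)` near 10 when sigma is 1, so quadrupling the horizon roughly doubles the dispersion.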

Nonstationary time series are often referred
to as unit root series. The unit root reflects the
coefficient of the $X_{t-1}$ term in an autoregressive
relationship of order one. In higher-order
autoregressive models, the condition of nonstationarity
is more complex. Consider the autoregressive model of
order $p$ in equation (1), where the $a_i$ terms are
coefficients and $L^i$ is the lag operator. If the
sum of the coefficients $a_1 + \dots + a_p$ equals 1, then
the $X_t$ series is nonstationary and again is
referred to as a unit root process.

$(1 - a_1 L^1 - \dots - a_p L^p) X_t = e_t + a_0$    (1)
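
The unit root condition in equation (1) amounts to checking whether L = 1 solves the lag polynomial, that is, whether one minus the sum of the autoregressive coefficients is zero. The short sketch below encodes only that check; the function name and tolerance are hypothetical, and it does not test for other roots of the polynomial.

```python
def has_unit_root(ar_coeffs, tol=1e-9):
    """True if 1 - a_1 - ... - a_p = 0, so that L = 1 is a root of the lag polynomial."""
    return abs(1.0 - sum(ar_coeffs)) < tol
```

A random walk corresponds to `has_unit_root([1.0])`, and an AR(2) with coefficients 0.5 and 0.5 also has a unit root, whereas coefficients 0.5 and 0.3 do not sum to one.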

If all the variables under consideration are 
stationary, then there is no spurious regression 
problem and standard OLS applies. If some of 
the variables are stationary, and some are non¬ 
stationary, then no economically significant re¬ 
lationships exist. Since nonstationary variables 
contain a stochastic trend, they will not exhibit 
any relationship with the stationary variables 
that lack this trend. The spurious regression 
problem occurs only when all the variables in 
the system are nonstationary. 

If the variables share a common stochastic 
trend, we may overcome the spurious regres¬ 
sion problem. In this case, cointegration anal¬ 
ysis may be used to uncover the long-term 
relationship and the short-term dynamics. Two 
or more nonstationary variables are cointe¬ 
grated if there exists a linear combination of the 
variables that is stationary. This suggests cointe¬ 
grated variables share long-run links. They may 
deviate in the short run but are likely to get back 
to some sort of equilibrium in the long run. The 
term "equilibrium" is not the same as used in 
economics. To economists equilibrium means 
the desired amount equals the actual amount, 
and there is no inherent tendency to change. In 
contrast, equilibrium in cointegration analysis 
means that if variables are apart, they show a
greater likelihood to move closer together than 
further apart. 

More formally, consider two time series $x_t$
and $y_t$. Assume that both series are nonstationary
and integrated of order one. (Integrated of order
one means that if we difference the variable one
time, the resultant series is stationary.) These
series are cointegrated if $z_t = x_t - a y_t$ is
stationary for some value of $a$. In the multivariate
case, the definition is similar with vector notation.
Let $A$ and $Y$ be the vectors $(a_1, a_2, \dots, a_n)$ and
$(y_{1t}, y_{2t}, \dots, y_{nt})'$. Then the variables in $Y$ are
cointegrated if each of the $y_{1t}, \dots, y_{nt}$ is nonstationary
and $Z = AY$ is stationary. $A$ represents a
cointegrating vector.
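
A simulated pair makes the definition concrete. In the sketch below, both series are built from one shared random walk plus independent noise, so each level wanders but the combination $x_t - a y_t$ removes the common trend. The data-generating process, the value a = 2, and the function names are assumptions made purely for illustration.

```python
import random


def simulate_cointegrated_pair(n=2000, a=2.0, seed=2):
    """Return (x, y) where x_t = a*trend_t + noise and y_t = trend_t + noise."""
    rng = random.Random(seed)
    trend, x, y = 0.0, [], []
    for _ in range(n):
        trend += rng.gauss(0.0, 1.0)              # shared stochastic trend
        y.append(trend + rng.gauss(0.0, 1.0))
        x.append(a * trend + rng.gauss(0.0, 1.0))
    return x, y


def sample_variance(series):
    """Unbiased sample variance."""
    n = len(series)
    mean = sum(series) / n
    return sum((v - mean) ** 2 for v in series) / (n - 1)


def spread(x, y, a):
    """The candidate stationary combination z_t = x_t - a*y_t."""
    return [xi - a * yi for xi, yi in zip(x, y)]
```

With these settings the spread is just the difference of the two noise terms, so its variance stays near 1 + a^2 = 5 regardless of sample length, while the sample variance of either level grows with the length of the sample.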

Cointegration represents a special case. We 
should not expect most nonstationary variables 
to be cointegrated. If two variables lack coin¬ 
tegration, then they do not share a long-run 
relationship or a common stochastic trend be¬ 
cause they can move arbitrarily far away from 
each other. In terms of the present value model 
of stock prices, suppose stock prices and div¬ 
idends lack cointegration. Then stock prices 
could rise arbitrarily far above the level of their 
dividends. This would be consistent with a 
stock market bubble (see Gurkaynak, 2005, for a 
more rigorous discussion of cointegration tests 
of financial market bubbles) and even if it is 
not a bubble, it is still inconsistent with the ef¬ 
ficient market theory. In terms of stock market 
linkages, if the stock price indexes of different 
countries lack cointegration, then stock prices 
can wander arbitrarily far apart from each other. 
This possibility should encourage international 
portfolio diversification. 

TESTING FOR 
COINTEGRATION 

There are two popular methods of testing for 
cointegration: the Engle-Granger tests and the 
Johansen-Juselius tests. We discuss and illus¬ 
trate both in the remainder of this entry. 


Engle-Granger Cointegration Tests 

The Engle-Granger cointegration test, developed
by Engle and Granger (1987), involves the
following four-step process:

Step 1 

First determine whether the time series vari¬ 
ables under investigation are stationary. We 
may consider both informal and formal meth¬ 
ods. Informal methods entail an examination of 
a graph of the variable over time and an examination
of the autocorrelation function. The
autocorrelation function describes the autocorrelation
of the series for various lags. The correlation
coefficient between $x_t$ and $x_{t-l}$ is called the
lag-$l$ autocorrelation. For nonstationary variables,
the lag one autocorrelation coefficient
should be very close to one and the autocorrelations
should decay slowly as the lag length increases. Thus, examining
the autocorrelation function allows us to deter¬ 
mine the stationarity of a variable. This method 
is not perfect. For stationary series that are 
very close to unit root processes, the autocor¬ 
relation function may exhibit the slow-fading 
behavior described above. If more formal meth¬ 
ods are desired, the researcher may employ the 
Dickey-Fuller statistic, the augmented Dickey- 
Fuller statistic, or the Phillips-Perron statis¬ 
tic. These statistics test the hypothesis that the 
variables have a unit root, against the alterna¬ 
tive that they do not (Dickey and Fuller, 1979, 
1981; Phillips and Perron, 1988). The Phillips- 
Perron test makes weaker assumptions than 
both Dickey-Fuller statistics and is generally 
considered more reliable (Phillips and Perron, 
1988). If it is determined that the variable is 
nonstationary and the differenced variable is 
stationary, proceed to step 2. 
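
The informal autocorrelation check in Step 1 can be sketched as follows. The estimator below is the usual sample autocorrelation; the simulated series and the helper `gaussian_series` are hypothetical and serve only to contrast a random walk with white noise.

```python
import random


def autocorrelation(series, lag):
    """Sample lag-l autocorrelation of a series (lag >= 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def gaussian_series(n, seed, cumulate=False):
    """White noise, or its running sum (a random walk) when cumulate is True."""
    rng = random.Random(seed)
    out, level = [], 0.0
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        level += shock
        out.append(level if cumulate else shock)
    return out
```

Calling `autocorrelation(gaussian_series(1000, 4, cumulate=True), 1)` gives a value close to one, while the same call on the white-noise series gives a value close to zero. As the text cautions, a stationary series with a root near one can look much like the random walk on this measure, so the check is suggestive rather than conclusive.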

Step 2 

Estimate the following regression: 

$y_t = c + d x_t + z_t$    (2)

To make this concrete, let $y_t$ represent some
U.S. stock market index, let $x_t$ represent stock
dividends on that stock market index, and let $z_t$ be
the error term; $c$ and $d$ are regression parameters.
For cointegration tests, the null hypothesis
states that the variables lack cointegration, and
the alternative claims that they are cointegrated.

Step 3 

To test for cointegration, we test for stationarity
in $z_t$. The Dickey-Fuller test is the most obvious
candidate. That is, we should consider the
following autoregression of the error term:

$\Delta z_t = \rho z_{t-1} + u_t$    (3)

where $z_t$ is the estimated residual from equation (2).
The test focuses on the significance of
the estimated $\rho$. If the estimate of $\rho$ is significantly
negative, we conclude that the residuals,
$z_t$, are stationary and reject the hypothesis of no
cointegration.
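
Steps 2 and 3 can be sketched in a few lines. The code below fits the cointegrating regression (2) by OLS, then runs the Dickey-Fuller regression (3) on the residuals and returns the estimate of rho with its t-statistic. It is a simplified illustration: the simulated data and all function names are assumptions, no lagged differences are included, and the resulting t-statistic must still be compared with critical values appropriate for residual-based tests rather than with standard t tables.

```python
import math
import random


def ols_fit(x, y):
    """Step 2: fit y_t = c + d*x_t + z_t by OLS; return (c, d, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    d = sxy / sxx
    c = my - d * mx
    return c, d, [yi - c - d * xi for xi, yi in zip(x, y)]


def dickey_fuller_t(z):
    """Step 3: regress dz_t on z_{t-1} without a constant; return (rho, t-stat)."""
    lagged = z[:-1]
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]
    szz = sum(v * v for v in lagged)
    rho = sum(l * dv for l, dv in zip(lagged, dz)) / szz
    sse = sum((dv - rho * l) ** 2 for l, dv in zip(lagged, dz))
    se = math.sqrt(sse / (len(dz) - 1) / szz)
    return rho, rho / se


def simulate_pair(n=500, seed=3):
    """Hypothetical data: x is a random walk and y = 1 + 0.5*x + stationary noise."""
    rng = random.Random(seed)
    x, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        x.append(level)
    y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y
```

For the simulated pair, which is cointegrated by construction, the estimated rho is strongly negative and the t-statistic lies far below any plausible critical value, so the null hypothesis of no cointegration would be rejected.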

The residuals of equation (3) should be
checked to ensure they are white noise. If
they are not, we should employ the augmented
Dickey-Fuller test (ADF). The augmented
Dickey-Fuller test is analogous to the
Dickey-Fuller test but includes additional lags
of $\Delta z_t$ as shown in equation (4). The ADF test
for stationarity, like the Dickey-Fuller test, tests
the hypothesis of $\rho = 0$ against the alternative
hypothesis of $\rho < 0$ for equation (4):

$\Delta z_t = \rho z_{t-1} + a_1 \Delta z_{t-1} + \dots + a_n \Delta z_{t-n} + u_t$    (4)

Generally, the OLS-produced residuals tend
to have as small a sample variance as possible,
thereby making residuals look as stationary
as possible. Thus, the standard t-statistic
or ADF statistic may reject the null hypothesis
of nonstationarity too often. Hence, it is
important to use the correct critical values; fortunately,
Engle and Yoo (1987) provide them.
Furthermore, if it is believed that the variable
under investigation has a long-run growth
component, it is appropriate to test the series for
stationarity around a deterministic time trend
for both the DF and ADF tests. This is accomplished
by adding a time trend to equations (3)
or (4).

Step 4 

The final step involves estimating the error-correction
model. Engle and Granger (1987)
showed that if two variables are cointegrated,
then these variables can be described in an error-correction
format described in the following two
equations:

$\Delta y_t = b_{10} + \sum_{i=1}^{n} b_{1i} \Delta y_{t-i} + \sum_{j=1}^{n} c_{1j} \Delta x_{t-j} + d_1 (y_{t-1} - a x_{t-1}) + e_{1t}$    (5)

$\Delta x_t = b_{20} + \sum_{i=1}^{n} b_{2i} \Delta y_{t-i} + \sum_{j=1}^{n} c_{2j} \Delta x_{t-j} + d_2 (y_{t-1} - a x_{t-1}) + e_{2t}$    (6)

Equation (5) tells us that the changes in $y_t$
depend on its own past changes, the past
changes in $x_t$, and the disequilibrium between
$x_{t-1}$ and $y_{t-1}$, $(y_{t-1} - a x_{t-1})$. The size of the
coefficient on the error-correction term, $d_1$, captures
the speed of adjustment of $x_t$ and $y_t$ to the previous period's
disequilibrium. Equation (6) has a corresponding
interpretation.

The appropriate lag length is found by experimenting
with different lag lengths. For each
lag length, the Akaike information criterion (AIC) or the
Schwarz information criterion (also known as the
Bayesian information criterion) is calculated, and the
lag length with the lowest value of the criterion is
employed. 1 

The value of $(y_{t-1} - a x_{t-1})$ is estimated with
the lagged residuals from the cointegrating equation
(2), $z_{t-1}$. This procedure is only legitimate if the
variables are cointegrated. The error-correction
term, $z_t$, will be stationary by definition if and
only if they are cointegrated. The remaining
terms in the equation, the lagged differences of each
variable, are also stationary because the levels
were assumed to be nonstationary and integrated of
order one. This guarantees
the stationarity of all the variables in equations
(5) and (6) and justifies the use of OLS.

Figure 1 S&P 500 Index and Dividends, 1962-2006

Empirical Illustration Using the 
Dividend Growth Model 

The dividend growth model of stock price val¬ 
uation claims the fundamental value of a stock 
is determined by the present value of its fu¬ 
ture dividend stream. This model may be rep¬ 
resented as: 

$P_0 = \sum_{i=1}^{\infty} d_i / (1 + r)^i$

where

$P_0$ is the current stock price
$d_i$ is the dividend in period $i$
$r$ is the discount rate

If the discount rate exceeds the growth rate of 
dividends and the discount rate remains con¬ 
stant over time, then one can test for cointe¬ 
gration between stock prices and dividends. In 
brief, if the present value relationship holds, one 


does not expect stock prices and dividends to 
meander arbitrarily far from each other. 
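
The present value relationship can be checked numerically. The sketch below discounts a finite stream of dividends and compares the result with the closed-form value for dividends growing at a constant rate g below r. The constant-growth special case and the function names are assumptions added for illustration; the model in the text does not require constant growth.

```python
def present_value(dividends, r):
    """Sum of d_i / (1 + r)**i for i = 1, 2, ..., len(dividends)."""
    return sum(d / (1.0 + r) ** i for i, d in enumerate(dividends, start=1))


def constant_growth_value(d1, r, g):
    """Closed form d1 / (r - g), valid only when the discount rate exceeds growth."""
    if r <= g:
        raise ValueError("discount rate must exceed the dividend growth rate")
    return d1 / (r - g)


def growing_dividends(d1, g, periods):
    """Dividend stream d1, d1*(1+g), d1*(1+g)**2, ... of the given length."""
    return [d1 * (1.0 + g) ** k for k in range(periods)]
```

With a first dividend of 1, a discount rate of 8%, and growth of 3%, a long enough stream has a present value that converges to 1 / (0.08 - 0.03) = 20. The convergence depends on the discount rate exceeding the growth rate, which is the condition the text places on the test.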

Before starting any analysis it is useful to ex¬ 
amine the plot of the underlying time series 
variables. Figure 1 presents a plot of stock prices 
and dividends for the years 1962 through 2006. 
The stock prices are represented by the S&P 
500 index and the dividends represent the div¬ 
idend received by the owner of $1,000 worth of 
the S&P 500 index. The plot shows that the vari¬ 
ables move together until the early 1980s. As a 
result of this visual analysis, we will entertain 
the possibility that the variables were cointe¬ 
grated until the 1980s. After that, the common 
stochastic trend may have dissipated. We will 
first test for cointegration in the 1962-1982 pe¬ 
riod and then for the whole 1962-2006 period. 

In accordance with the first step of the 
cointegration protocol, we must first establish 
the nonstationarity of the variables. To iden¬ 
tify nonstationarity, we will use both formal 
and informal methods. The first informal test 
consists of analyzing the plot of the series 
shown in Figure 1. Neither series exhibits mean 
reversion. The dividend series wanders less
from its mean than the stock prices. Nevertheless,
neither series appears stationary.

Table 1 Autocorrelation Functions of the S&P 500 Index and Dividends

Lag          1     2     3     4     5     6     7     8     9     10    11    12
S&P 500     .993  .986  .979  .973  .967  .961  .954  .948  .940  .933  .926  .918
Dividend    .991  .983  .974  .966  .958  .979  .941  .933  .925  .916  .908  .900

Lag          13    14    15    16    17    18    19    20    21    22    23    24
S&P 500     .911  .903  .896  .889  .881  .874  .866  .858  .851  .843  .835  .827
Dividend    .891  .883  .876  .868  .860  .852  .845  .837  .830  .822  .815  .808

Lag          25    26    27    28    29    30    31    32    33    34    35    36
S&P 500     .819  .811  .804  .796  .789  .782  .775  .768  .761  .754  .748  .741
Dividend    .801  .794  .788  .781  .775  .769  .763  .758  .753  .747  .743  .738

The second informal method involves exam¬ 
ining the autocorrelation function. We present 
in Table 1 the autocorrelation function for 36 
lags of the S&P 500 index and the dividends 
for the 1962-2006 period using monthly data. 
The autocorrelations for the early lags are quite 
close to one. Furthermore, the autocorrelation 
function exhibits a slow decay at higher lags. 
This provides sufficient evidence to conclude 
that stock prices and dividends are nonstation¬ 
ary. When we inspect the autocorrelation func¬ 
tion of their first differences (not shown), the 
autocorrelation of the first lag is not close to 
one. We may conclude the series are stationary 
in the first differences. 

In Table 2, we present the results of formal 
tests of nonstationarity. The lag length for the 
ADF test was determined by the Schwarz cri¬ 
terion. The null hypothesis is that the S&P 500 


stock index (dividends) contains a unit root; the 
alternative is that it does not. For both statistics, 
the ADF and the Phillips-Perron, the results in¬ 
dicate that the S&P 500 index is nonstationary 
and the changes in that index are stationary. The 
results for dividends are mixed. The ADF statistic
supports the presence of a unit root in dividends,
while the Phillips-Perron statistic does
not. Since both the autocorrelation function and
the ADF statistic indicate a unit root
process, we shall presume that the dividend series
is nonstationary. In sum, our analysis suggests
that the S&P 500 index and dividends
series each contain a stochastic trend in the levels,
but not in their first differences.

In the next step of the protocol we examine
whether the S&P 500 index and dividends
are cointegrated. This is accomplished by estimating
the long-run equilibrium relation by
regressing the logarithm (log) of the S&P 500
index on the log of the dividends. We use the
logarithms of both variables to help smooth the
series. The results using monthly data are reported
in Table 3 for both the 1962-1982 period
and the 1962-2006 period. We pay little attention
to the high t-statistic on the dividends variable
because the t-test is not appropriate unless
the variables are cointegrated. This is, of course,
the issue.

Table 2 Stationarity Test for the S&P 500 Index and Dividends 1962-2006

Variable        Augmented Dickey-Fuller (ADF)    Phillips-Perron
S&P 500                   1.22                        1.12
Δ S&P 500               -19.07                      -19.35
Dividends                 1.52                        4.64
Δ Dividends              -2.13                      -31.68

Critical values of the test statistics: -3.44 (1%), -2.87 (5%), -2.56 (10%).

Null hypothesis: Variable is nonstationary.

The lag length for the ADF test was determined by the Schwarz criterion. For the S&P 500 index and its first
difference, the lag length was 1. For the dividends and its first difference, the lag lengths were 12 and 11, respectively.

Table 3 Cointegration Regression: S&P 500 and Dividends

Log S&P 500 = a + b log dividends + $z_t$

Period        Constant    Coefficient of Dividends    t-Stat Dividends
1962-1982      4.035               .404                    17.85
1962-2006      2.871              1.336                    68.54

Once we estimate the regression in step 2,
the next step involves testing the residuals of
the regression, $z_t$, for stationarity. By definition,
the residuals have a zero mean and lack a time
trend. This simplifies the test for stationarity,
which is carried out by estimating equation (4).
The null hypothesis is that the variables lack
cointegration. If we conclude that $\rho$ in equation (4)
is significantly negative, then we reject
the null hypothesis and conclude that the evidence
is consistent with the presence of cointegration
between the stock index and dividends.

The appropriate lag lengths may be determined
by the Akaike information criterion or theoretical
and practical considerations. We decided to
use a lag length of three periods representing
one quarter. The results are presented in Table
4. For the 1962-1982 period, we may reject the
null hypothesis of no cointegration at the 10%
level of statistical significance. For the entire period
(1962-2006), we cannot reject the null hypothesis
($\rho = 0$) of no cointegration. Apparently,
the relationship between stock prices and dividends
unraveled in the 1980s and the 1990s.
This evidence is consistent with the existence of
an Internet stock bubble in the 1990s.

Having established that the S&P 500 index
and dividends are cointegrated from 1962-1982,
in the final step of the protocol we examine the
interaction between stock prices and dividends
by estimating the error-correction model, equations
(5) and (6). It is useful at this point to
review our interpretation of equations (5) and
(6). Equation (5) claims that changes in the S&P
500 index depend upon past changes in the S&P
500 index, past changes in dividends, and
the extent of disequilibrium between the S&P
500 index and dividends. Equation (6) has a
similar statistical interpretation. However, from
a theoretical point of view, equation (6) is meaningless.
Financial theory does not claim that
changes in dividends are impacted either by
past changes in stock prices or the extent of the
disequilibrium between stock prices and dividends.
As such, equation (6) degenerates into
an autoregressive model of dividends.

Table 4 Augmented Dickey-Fuller Tests of Residuals for Cointegration

Variable              Coefficient    t-Stat    p-Value

Panel A: 1962-1982, n = 248
$z_{t-1}$               -.063        -3.23      .001
$\Delta z_{t-1}$         .272         4.32      .000
$\Delta z_{t-2}$        -.030         -.46      .642
$\Delta z_{t-3}$         .090         1.40      .162
t-statistic of $\rho$ = -3.23; critical values: -3.36 (5%), -3.06 (10%)

Panel B: 1962-2006, n = 536
$z_{t-1}$               -.008        -1.81      .070
$\Delta z_{t-1}$         .265         6.13      .000
$\Delta z_{t-2}$        -.048        -1.08      .280
$\Delta z_{t-3}$         .031          .71      .477
t-statistic of $\rho$ = -1.81; critical values: -3.35 (5%), -3.05 (10%)

The critical values of the augmented Dickey-Fuller (ADF) statistic are from Engle and Yoo (1987). The ADF test
on the cointegration equation errors is based on the following regression:

$\Delta z_t = \rho z_{t-1} + a \Delta z_{t-1} + b \Delta z_{t-2} + c \Delta z_{t-3} + e_t$

where $\Delta z_t$ is the change in the error term from the cointegration regression and $e_t$ is a random error. If $\rho$ is negative
and significantly different from zero, the $z$ residuals from the equilibrium equation are stationary, so we may reject
the null hypothesis of no cointegration. In both equations the error terms are white noise, so no further stationarity tests
were performed.

We estimated the error-correction equations
using three lags. The error term, $z_{t-1}$, used in
these error-correction regressions was obtained
from OLS estimation of the cointegration equation
reported in Table 3. Estimates of the error-correction
equations are reported in Table 5.
By construction, the error-correction term represents
the degree to which the stock prices and
dividends deviate from long-run equilibrium.
The error-correction term is included in both
equations to guarantee that the variables do not
drift too far apart. If the variables are cointegrated,
Engle and Granger (1987) showed that
the coefficient on the error-correction term
$(y_{t-1} - a x_{t-1})$ in at least one of the equations must
be nonzero. The t-value of the error-correction
term in equation (5) is statistically different
from zero. The coefficient of -0.07 is known
as the speed of adjustment coefficient. It suggests
that 7% of the previous month's disequilibrium
between the stock index and dividends
is eliminated in the current month. In general,
the larger the speed of adjustment coefficient in
absolute value, the faster the long-run equilibrium is restored.
Since the speed of adjustment coefficient for the
dividend equation is statistically indistinguishable
from zero, all of the adjustment falls on the
stock price.
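
To give a sense of what a speed of adjustment of -0.07 implies, the sketch below computes how much of an initial disequilibrium would remain after a number of months and how long it takes for half of it to disappear. This is a back-of-the-envelope illustration: it assumes the error-correction term is the only force at work, ignoring the lagged-difference terms and any new shocks, and the function names are hypothetical.

```python
import math


def remaining_gap(speed, periods):
    """Fraction of an initial disequilibrium left after the given number of periods."""
    return (1.0 + speed) ** periods


def half_life(speed):
    """Number of periods until half of the disequilibrium has been eliminated."""
    return math.log(0.5) / math.log(1.0 + speed)
```

With `speed = -0.07`, `half_life(-0.07)` is roughly 9.6 months, and `remaining_gap(-0.07, 12)` is about 0.42, so under these simplifying assumptions a little under 60% of a gap would be closed within a year.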

An interesting observation from Table 5 re¬ 
lates to the lag structure of equation (5). The 
first lag on past stock price changes is statis¬ 
tically significant. This means that the change 
in the stock index this month depends upon 
the change during the last month. This is in¬ 
consistent with the efficient market hypothesis. 
On the other hand, the change in dividend lags 
is not statistically different from zero. The effi¬ 
cient market theory suggests, and the estimated 
equation confirms, that past changes in divi¬ 
dends do not affect the current changes in stock 
prices. 

Johansen-Juselius Cointegration Tests 

The Engle-Granger method does have some
problems (see Enders, 1995). These problems
are magnified in a multivariate (three or
more variables) context. In principle, when the
cointegrating equation is estimated (even in a
two-variable problem), we may use any variable
as the dependent variable. In our last example,
this would entail placing dividends on
the left-hand side of equation (2) and the S&P
500 index on the right-hand side. As the sample
size approaches infinity, Engle and Granger
(1987) showed the cointegration tests produce
the same results irrespective of which variable
is used as the dependent variable. The question
then is: How large a sample is large enough?

Table 5 Error Correction Model: S&P 500 Index and Dividends 1962-1982

$\Delta Y_t = b_{10} + b_{11} \Delta Y_{t-1} + b_{12} \Delta Y_{t-2} + b_{13} \Delta Y_{t-3} + c_{11} \Delta X_{t-1} + c_{12} \Delta X_{t-2} + c_{13} \Delta X_{t-3} + d_1 (Y_{t-1} - a X_{t-1}) + e_{1t}$    (5)

$\Delta X_t = b_{20} + b_{21} \Delta Y_{t-1} + b_{22} \Delta Y_{t-2} + b_{23} \Delta Y_{t-3} + c_{21} \Delta X_{t-1} + c_{22} \Delta X_{t-2} + c_{23} \Delta X_{t-3} + d_2 (Y_{t-1} - a X_{t-1}) + e_{2t}$    (6)

            Equation 5                            Equation 6
            Coefficient    t-stat                 Coefficient    t-stat
$b_{10}$      -.009        -2.42      $b_{20}$      .001          2.91
$b_{11}$       .251         4.00      $b_{21}$      .002           .63
$b_{12}$      -.043         -.66      $b_{22}$     -.003          -.88
$b_{13}$       .081         1.27      $b_{23}$      .004          1.07
$c_{11}$       .130          .11      $c_{21}$      .939         14.60
$c_{12}$      -.737         -.46      $c_{22}$     -.005          -.06
$c_{13}$      -.78         -0.65      $c_{23}$     -.006           .87
$d_1$         -.07         -3.64      $d_2$         .000           .30

The change in the S&P 500 index is denoted as $\Delta Y_t$ and the change in dividends is denoted as $\Delta X_t$.

A second problem is that the errors we use
to test for cointegration are only estimates and
not the true errors. Thus any mistakes made in
estimating the error term, $z_t$, in equation (2) are
carried forward into the equation (3) regression.
Finally, the Engle-Granger procedure is unable
to detect multiple cointegrating relationships.

The procedures developed by Johansen and 
Juselius (1990) avoid these problems. Consider 
the following multivariate model: 

$y_t = A y_{t-1} + u_t$    (7)

where

$y_t$ is an $n \times 1$ vector $(y_{1t}, y_{2t}, \dots, y_{nt})'$
$u_t$ is an $n$-dimensional error term at $t$
$A$ is an $n \times n$ matrix of coefficients

If the variables display a time trend, we may
wish to add the matrix $A_0$ to equation (7). This
would reflect a deterministic time trend. The
same applies to equation (8) presented below. It
does not change the nature of our analysis.

The model (without the deterministic time
trend) can then be represented, after subtracting
$y_{t-1}$ from both sides of equation (7), as:

$\Delta y_t = (A - I) y_{t-1} + u_t$    (8)

Let $B = A - I$, where $I$ is the identity matrix of
dimension $n$. The cointegration of the system is
determined by the rank of the $B$ matrix. The highest
rank of $B$ one can obtain is $n$, the number
of variables under consideration. If the rank of
the matrix equals zero, then the $B$ matrix is null.
This means $\Delta y_t = 0 + u_t$, where 0 is the null vector.
In this case each $y_{it}$ will follow a random walk
($y_t = y_{t-1} + u_t$) and no linear combination of the
elements of $y_t$ will be stationary, so there are no
cointegrating vectors.

If the rank of $B$ is $n$, then each $y_{it}$ is an autoregressive
process. This means each $y_{it}$ is stationary
and thus they cannot be cointegrated. For
any rank between 1 and $n - 1$, the system is
cointegrated and the rank of the matrix is the
number of cointegrating vectors.
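
The link between the rank of the matrix and the number of cointegrating vectors can be illustrated with known coefficient matrices. The sketch below forms A - I (which has the same rank as I - A) and computes its rank by Gaussian elimination. The example matrices are made up for illustration, and in practice A is unknown and the rank must be inferred statistically, as described next.

```python
def matrix_rank(matrix, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def cointegration_rank(a):
    """Rank of A - I for the first-order system y_t = A y_{t-1} + u_t."""
    n = len(a)
    b = [[a[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return matrix_rank(b)
```

For A equal to the identity matrix the rank is 0 (independent random walks, no cointegration). For A = [[0.5, 0.5], [0.5, 0.5]] the rank is 1: the difference of the two variables is stationary while their sum follows a random walk, so there is one cointegrating vector. For A = [[0.5, 0], [0, 0.5]] the rank is 2 and both variables are stationary.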

The higher-order autoregressive representa¬ 
tion is similar. Although it is more involved, the 
Johansen and Juselius estimation procedure can 
still handle it easily. Since the rank of a matrix 
equals the number of distinct nonzero charac¬ 
teristic roots of a matrix, the Johansen-Juselius 
procedure attempts to determine the number of 
nonzero characteristic roots of the relevant ma¬ 
trices. The procedure estimates the matrices and 
hence the characteristic roots with a maximum 
likelihood method. 

The Johansen-Juselius procedure employs
two statistics to test for nonzero characteristic
roots. First they order the estimated characteristic roots
from high to low, $\lambda_1^* > \lambda_2^* > \dots > \lambda_n^*$,
where $\lambda^*$ denotes an estimated characteristic root.

The first statistic, the trace test statistic,
tests the null hypothesis that at most $i$
characteristic roots are different from zero. The
alternative hypothesis is that more than $i$ characteristic
roots are nonzero. The statistic employed is:

$\lambda_{trace}(i) = -T[\ln(1 - \lambda_{i+1}^*) + \ln(1 - \lambda_{i+2}^*) + \dots + \ln(1 - \lambda_n^*)]$    (9)

where $T$ is the number of included time periods.
If all the characteristic roots are zero, then since $\ln(1) = 0$,
the statistic will equal zero. Thus low values
of the test statistic will lead us to fail to reject the
null hypothesis. The larger any characteristic
root is, the more negative $\ln(1 - \lambda_i^*)$, the larger
the test statistic, and the more likely we will
reject the null hypothesis.

The alternative test is called the maximum
eigenvalue test since it is based on the largest
eigenvalue. This statistic tests the null hypothesis
that there are $i$ cointegrating vectors against
the alternative hypothesis of $i + 1$. This statistic
is:

$\lambda_{max}(i, i + 1) = -T \ln(1 - \lambda_{i+1}^*)$    (10)

Again, if $\lambda_{i+1}^* = 0$, then the test statistic will
equal zero. So low (high) values of $\lambda_{i+1}^*$ will
lead to a failure to reject (rejection of) the null
hypothesis.
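
Both statistics are simple functions of the ordered characteristic roots. The sketch below implements equations (9) and (10) directly; the function names are hypothetical. As a check it can be run on the three estimated roots from this entry's stock market illustration (.227, .057, and .028, with T = 96), where small differences from the reported values are to be expected because the roots are rounded to three decimals.

```python
import math


def trace_statistic(roots, i, t_obs):
    """Equation (9): -T times the sum of ln(1 - root) over roots i+1, ..., n."""
    ordered = sorted(roots, reverse=True)
    return -t_obs * sum(math.log(1.0 - lam) for lam in ordered[i:])


def max_eigen_statistic(roots, i, t_obs):
    """Equation (10): -T * ln(1 - root_{i+1}), using the (i+1)-th largest root."""
    ordered = sorted(roots, reverse=True)
    return -t_obs * math.log(1.0 - ordered[i])
```

For those roots, `trace_statistic(roots, 0, 96)` is about 33.1 and `max_eigen_statistic(roots, 0, 96)` is about 24.7. Each statistic is then compared with the appropriate critical value to decide whether to reject the null hypothesis.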

Johansen and Juselius derive critical values
for both test statistics. The critical values are
different if there is a deterministic time trend
and an $A_0$ matrix is included. Enders (1995) provides
tables of critical values for both statistics with and
without the trend terms. Software programs
often provide critical values and the relevant
p-values.

Testing of the Dynamic 
Relationships among Country 
Stock Markets 

Many portfolio managers seek international di¬ 
versification. If stock market returns in differ¬ 
ent countries were not highly correlated, then 
portfolio managers could obtain risk reduction 
without significant loss of return by investing 
in different countries. But with the advent of 
globalization and the simultaneous integration 
of capital markets, the risk-diversifying bene¬ 
fits of international investing have been subject 
to challenge. In this section, we illustrate how 
cointegration can shed light on this issue and 
apply the Johansen-Juselius technique. 

The idea of a common currency for the Eu¬ 
ropean countries is to reduce transactions costs 
and more closely link the economies. We shall 
use cointegration to examine whether the stock 
markets of France, Germany, and the Nether¬ 
lands are linked following the introduction of 
the Euro in 1999. We use monthly data for the 
period 1999-2006. 

The first step to test for cointegration is to establish
that the three stock indexes are nonstationary
in the levels and stationary in the first
differences. In testing the present value model,
we presented the autocorrelation function, the
ADF statistic, and the Phillips-Perron statistic.
For reasons of space, we will not repeat
this. Next we should establish the appropriate
lag length for equation (8). This is typically
done by estimating a traditional vector autoregressive
(VAR) model and applying a multivariate
version of the Akaike information criterion
or Schwarz criterion. For our model, we use one
lag, and thus the model takes the form:

$y_t = A_0 + A_1 y_{t-1} + u_t$    (11)

where $y_t$ is the $3 \times 1$ vector $(y_{1t}, y_{2t}, y_{3t})'$ of the
logs of the stock market indexes for France, Germany,
and the Netherlands (i.e., element $y_{1t}$ is
the log of the French index at time $t$; $y_{2t}$ is the
log of the German index at time $t$; and $y_{3t}$ is
the log of the Netherlands index at time $t$). We
use logs of the stock market indexes to smooth
the series. $A_0$ is a $3 \times 1$ vector of parameters, $A_1$ is a
$3 \times 3$ matrix of parameters, and $u_t$ is the $3 \times 1$ error vector.

The next step is to estimate the model. This
means fitting equation (8). We incorporated a
linear time trend, hence the inclusion of the matrix
$A_0$. Since there are restrictions across the
equations, the procedure uses maximum likelihood
estimation and not OLS. The
focus of this estimation is not on the parameters
of the $A$ matrices. Few software programs
present these estimates; rather, the emphasis
is on the characteristic roots of the matrix $B$,
which are estimated to determine the rank of the
matrix.

The estimates of the characteristic roots are presented in Table 6. We want to establish whether the indexes are cointegrated. Thus, we test the null hypothesis that the indexes lack cointegration. To accomplish this, the $\lambda_{\text{trace}}(0)$ statistic is calculated. Table 6 also provides this statistic. To ensure comprehension of this important statistic, we detail its calculation.

We have 96 usable observations. 

$$\begin{aligned}
\lambda_{\text{trace}}(0) &= -T[\ln(1 - \lambda_1^*) + \ln(1 - \lambda_2^*) + \ln(1 - \lambda_3^*)] \\
&= -96[\ln(1 - 0.227) + \ln(1 - 0.057) + \ln(1 - 0.028)] \\
&= 33.05
\end{aligned}$$
Table 6 Cointegration Test

Hypothesized No. of     Characteristic   Trace Statistic   5% Critical              Max Statistic   5% Critical
Cointegrating Vectors   Roots            λ_trace           Value         p-Value    λ_max           Value         p-Value
None                    .227             33.05             29.80         .02        24.72           21.13         .01
At most 1               .057              8.32             15.49         .43         5.61           14.26         .66
At most 2               .028              2.72              3.84         .10         2.72            3.84         .10


As reported in Table 6, this exceeds the critical value for 5% significance of 29.80 and has a p-value of 0.02. Thus, we may reject the null hypothesis at a 5% level of significance and conclude that the evidence is consistent with at least one cointegrating vector. Next we can examine $\lambda_{\text{trace}}(1)$ to test the null hypothesis of at most 1 cointegrating vector against the alternative of 2 cointegrating vectors. Table 6 shows that $\lambda_{\text{trace}}(1)$ at 8.32 is less than the critical value of 15.49 necessary to establish statistical significance at the 5% level. We do not reject the null hypothesis. We therefore conclude that there is at least one cointegrating vector. There is no need to evaluate $\lambda_{\text{trace}}(2)$.

The $\lambda_{\max}$ statistic reinforces our conclusion. We can use $\lambda_{\max}(0, 1)$ to test the null hypothesis that the variables lack cointegration against the alternative that they are cointegrated with one cointegrating vector. Table 6 presents the value of $\lambda_{\max}(0, 1)$. Again, for pedagogic reasons we outline the calculation of $\lambda_{\max}(0, 1)$.

$$\lambda_{\max}(0, 1) = -T\ln(1 - \lambda_1^*) = -96\ln(1 - 0.227) = 24.72$$

The computed value of 24.72 exceeds the critical value of 21.13 at the 5% significance level and has a p-value of 0.01. Once again, this leads us to reject the null hypothesis that the indexes lack cointegration and conclude that there exists at least one cointegrating vector.
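
The two calculations above can be reproduced in a few lines. The following is a minimal Python sketch, assuming only the three characteristic roots and the 96 usable observations reported in Table 6; the function names are illustrative and do not come from any econometrics package. Because the roots are rounded to three decimals, the results agree with Table 6 only approximately.

```python
import math

def trace_statistic(roots, T, r):
    """lambda_trace(r): -T times the sum of ln(1 - root) over all roots beyond the r largest."""
    ordered = sorted(roots, reverse=True)
    return -T * sum(math.log(1.0 - lam) for lam in ordered[r:])

def max_statistic(roots, T, r):
    """lambda_max(r, r + 1): -T times ln(1 - root) for the (r + 1)-th largest root."""
    ordered = sorted(roots, reverse=True)
    return -T * math.log(1.0 - ordered[r])

roots = [0.227, 0.057, 0.028]   # characteristic roots from Table 6
T = 96                          # usable observations

for r in range(len(roots)):
    # One line per null hypothesis: none, at most 1, at most 2 cointegrating vectors
    print(r, round(trace_statistic(roots, T, r), 2), round(max_statistic(roots, T, r), 2))
```

Each statistic is then compared with its 5% critical value in Table 6; critical values are not computed here because they come from simulated distributions tabulated in the literature.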

The next step requires a presentation of the cointegrating equation and an analysis of the error-correction model. Table 7 presents both. The cointegrating equation is a multivariate representation of $z_{t-1}$ in the Engle-Granger method. This is presented in panel A of Table 7. The error-correction model takes the following representation.

$$\Delta y_t = b_{10} + \sum_{i=1}^{n} b_{1i}\,\Delta y_{t-i} + \sum_{i=1}^{n} c_{1i}\,\Delta x_{t-i} + d_1(y_{t-1} - a x_{t-1}) + e_{1t} \qquad (12)$$

The notation of equation (12) differs somewhat from the notation of equations (5) and (6). The notation used in equation (12) reflects the matrix notation adopted for the Johansen-Juselius method in equation (8). Nevertheless, for expositional convenience, we did not use the matrix notation for the error-correction term. Again, the $\Delta$ means the first difference of the variable; thus $\Delta y_{1t-1}$ means the change in the log of the French stock index in period $t - 1$, $(y_{1t-1} - y_{1t-2})$. Equation (12) states that changes in the log of the French stock index are due to changes in the French stock index during the last two periods; changes in the German stock index during the last two periods; changes in the Netherlands stock index during the last two periods; and finally deviations of the French stock index from its stochastic trend with Germany and the Netherlands. An analogous equation could be written for both Germany and the Netherlands.

Panel B of Table 7 presents the error-correction 
model estimates for each of the three countries. 
The software used a two-period lag for the past 
values of the changes in the stock indexes as 
indicated by the Schwarz criterion. 

The error-correction term in each equation re¬ 
flects the deviation from the long-run stochas¬ 
tic trend of that stock index in the last period. 
Table 7 Cointegrating Equation and Error Correction Equations 1999-2007

Panel A: Cointegrating Equation

France = 4.82 + 2.13 Germany - 1.71 Netherlands
                [-8.41]        [5.25]

Panel B: Error Correction Equations

Country               Δ(France)     Δ(Germany)    Δ(Netherlands)
z_{t-1}               -0.151477     -0.057454     -0.179129
                      [-2.21470]    [-0.66835]    [-2.52373]
Δ(France(-1))          0.087360      0.245750      0.225357
                      [0.27222]     [0.60927]     [0.67667]
Δ(France(-2))         -0.200773     -0.218331     -0.324250
                      [-0.68179]    [-0.58990]    [-1.06105]
Δ(Germany(-1))        -0.189419     -0.024306     -0.094891
                      [-0.82197]    [-0.08392]    [-0.39680]
Δ(Germany(-2))        -0.155386     -0.109070     -0.127301
                      [-0.67237]    [-0.37551]    [-0.53081]
Δ(Netherlands(-1))     0.079881     -0.189775     -0.188295
                      [0.34284]     [-0.64805]    [-0.77875]
Δ(Netherlands(-2))     0.439569      0.446368      0.483929
                      [1.89288]     [1.52936]     [2.00810]
C                      0.005967      0.002575      0.002688
                      [1.02860]     [0.35321]     [0.44641]

France(-1) represents the log return of the French stock index one month ago. Germany(-1) and Netherlands(-1) have a similar interpretation. The values in [ ] are t-statistics.


It should be noted that in contrast to the two-step procedure of the Engle-Granger approach, the Johansen-Juselius approach estimates the speed of adjustment coefficient in one step. It provides insight into the short-run dynamics. This coefficient is insignificant (at the 5% level) for Germany. This means that stock prices in Germany do not change in response to deviations from their stochastic trend with France and the Netherlands. Because the variables are cointegrated, we are guaranteed that at least one speed of adjustment coefficient will be significant. In fact, the speed of adjustment coefficients of both France and the Netherlands attain statistical significance (at the 5% level) and are about the same size. This shows that when the economies of France and the Netherlands deviate from the common stochastic trend, they adjust. In France about 15% and in the Netherlands about 18% of the last-period deviation is corrected during this period.
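
The reading of the speed of adjustment coefficients can be checked directly against Panel B of Table 7. The sketch below is illustrative only: the coefficients and t-statistics are copied from the table, while the cutoff of 1.96 is an assumption (the usual two-sided 5% normal approximation), and the helper name `summarize` is hypothetical.

```python
# Error-correction (z_{t-1}) coefficients and t-statistics from Panel B of Table 7.
adjustment = {
    "France":      (-0.151477, -2.21470),
    "Germany":     (-0.057454, -0.66835),
    "Netherlands": (-0.179129, -2.52373),
}

CRITICAL_T = 1.96  # assumption: two-sided 5% cutoff under a normal approximation

def summarize(adjustment, critical=CRITICAL_T):
    """Map each country to (significant at 5%?, percent of deviation corrected per month)."""
    return {
        country: (abs(t_stat) > critical, round(-100.0 * coef, 1))
        for country, (coef, t_stat) in adjustment.items()
    }

summary = summarize(adjustment)
for country, (significant, pct) in summary.items():
    print(f"{country}: significant={significant}, corrected per month={pct}%")
```

A negative coefficient means that a positive deviation from the common trend last month is followed by a downward adjustment this month, which is why the sign is flipped when the coefficient is expressed as a percentage corrected.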

For France, neither past changes in its own stock index nor the past changes in Germany's stock index appear to affect French stock prices. The changes in the lagged values of both indexes lack statistical significance. Only the second lag of the Netherlands stock index attained significance. For Germany, the past changes in its own stock prices and the past changes in the stock indexes of the other countries failed to obtain significance at the 5% level. For the Netherlands, its own second-period lag obtained statistical significance. Nevertheless, the failure of individual lags to obtain significance does not mean that jointly the lags are insignificant.

To see this, we turn to an examination of Granger causality in the error-correction models. Granger causality helps us to classify the variables into dependent and independent. A variable Granger causes another variable when past values of that variable improve our ability to forecast the other variable. To test for Granger causality, an F-test is employed to verify whether the lagged changes in, say, the stock index of France are jointly zero in the German equation. Table 8 reports the results of pairwise Granger causality tests. We find that France and Germany do not Granger cause each other
Table 8 Cointegration Test Results 1975-2000

                                             Hypothesized                  Trace       0.05 Critical
Cointegration between                        No. of CE(s)    Eigenvalue    Statistic   Value           Prob.**
Home Prices vs. Household Debt Ratio         None            0.09          14.72       25.87           0.60
                                             At most 1       0.05           5.28       12.52           0.56
Home Prices vs. Housing Affordability Index  None            0.13          20.34       25.87           0.21
                                             At most 1       0.07           7.06       12.52           0.34
Home Prices vs. Mortgage Rate                None            0.10          18.16       25.87           0.33
                                             At most 1       0.08           7.73       12.52           0.27
Home Prices vs. Homebuilders Stock Index     None            0.15          25.69       25.87           0.05
                                             At most 1       0.09           9.49       12.52           0.15
Home Prices vs. Unemployment Rate            None *          0.15          25.90       25.87           0.05
                                             At most 1       0.09           9.66       12.52           0.14
Home Prices vs. Mean of Middle Fifth         None *          0.20          32.58       25.87           0.01
  of Income                                  At most 1       0.10           9.90       12.52           0.13
Home Prices vs. Mean of Top Fifth            None *          0.21          32.29       25.87           0.01
  of Income                                  At most 1       0.08           8.70       12.52           0.20

Source: This table is reprinted from Arshanapalli and Nelson (2008) with permission of The International Journal of Business and Finance Research.

* Denotes rejection of the null hypothesis of no cointegration at the 0.05 level
** Denotes the p-value

at any conventional levels of significance. The 
smaller Netherlands economy finds its stock 
prices Granger caused by both France and Ger¬ 
many but the Netherlands does not Granger 
cause either French or German stock prices at 
conventional levels of significance. 
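
The joint test described above can be sketched as a comparison of a restricted and an unrestricted regression. The Python code below is a simplified illustration under stated assumptions: it uses NumPy, runs on simulated series rather than the stock index data, and omits the error-correction term that the equations in Table 7 contain. The function name `granger_f` is hypothetical.

```python
import numpy as np

def granger_f(y, x, lags=2):
    """F-statistic for H0: the lagged values of x are jointly zero in the equation for y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    target = y[lags:]
    own = [y[lags - k : n - k] for k in range(1, lags + 1)]     # y_{t-1}, ..., y_{t-lags}
    other = [x[lags - k : n - k] for k in range(1, lags + 1)]   # x_{t-1}, ..., x_{t-lags}
    const = np.ones(n - lags)

    def rss(columns):
        # Residual sum of squares from an OLS fit of target on the given columns
        X = np.column_stack(columns)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return float(resid @ resid)

    rss_restricted = rss([const] + own)             # lags of x excluded
    rss_unrestricted = rss([const] + own + other)   # lags of x included
    dof = len(target) - (1 + 2 * lags)
    return ((rss_restricted - rss_unrestricted) / lags) / (rss_unrestricted / dof)

# Simulated illustration (not the stock index data): x leads y by one period.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + rng.standard_normal(199)

f_x_to_y = granger_f(y, x)   # expected to be large: past x helps forecast y
f_y_to_x = granger_f(x, y)   # expected to be small: past y does not help forecast x
print(f_x_to_y, f_y_to_x)
```

In practice the statistic is compared with an F critical value with `lags` and `dof` degrees of freedom, and the series passed in would be the changes in the log indexes.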

Empirical Illustration of a Test for 
the Presence of a Housing Bubble 

The third application demonstrates the use of cointegration to test the possibility of a bubble in the housing market. As we illustrated in our previous examples, the beginning of any analysis is a picture of the time series under examination. Figure 2 shows the trend in the U.S. housing index from 1975 to the third quarter of 2007. Clearly, since 2001 the United States has experienced several years of strong home price increases. Also, the figure illustrates that the rise began to slow in 2005. We now know that housing prices collapsed in 2008. This sort of evidence has led the financial and the general press to conclude that the U.S. housing market has experienced a bubble. However, the detection of a bubble after the fact is of little practical use. The question is, can cointegration provide evidence of a bubble before the bubble bursts?

The widely accepted efficient market the¬ 
ory claims that financial asset prices reflect all 
the publicly available information at all times. 
This denies the possibility of a bubble. While 
some may believe prices are too high relative 
to fundamental factors, according to the the¬ 
ory they are wrong, because investors recog¬ 
nize immediately if the price of anything is 
too high (or too low) and respond by selling 
(or purchasing) the asset until the over-(under) 
pricing is eliminated. A mountainous body of 
academic research (see Fama, 1970, for a sam¬ 
pling) supports this view. 

Nevertheless, the efficient market theory has 
been subject to much serious criticism (Shiller, 
2003). Furthermore, much of the research fo¬ 
cused on financial assets. The efficient market 
theory assumes that investors can sell an asset 
short to eliminate overpricing. Real estate is a 
real and illiquid asset. During the period of the 
housing price run-up there was no mechanism 
known to us for shorting a residential home. 
A futures market for housing is a relatively re¬ 
cent innovation. These markets do not function 
well enough to fulfill the assumptions of the 
Figure 2 Home Prices in the United States

Note: The bursting of a real estate bubble has important implications for the U.S. economy. Residential real estate is an important component of householder wealth. In 1996, it represented 39% of household wealth. This figure is reprinted from Arshanapalli and Nelson (2008) with the permission of the Institute for Business and Finance Research.


efficient market theory. Thus, we should not dis¬ 
miss the possibility of a housing bubble out of 
hand. 

Arshanapalli and Nelson (2008) tested for 
the existence of a housing bubble, examining 
the stability of the underlying relationship of 
home prices and the economic forces that deter¬ 
mine them. A relationship suddenly becomes 
unstable when rising home prices are not jus¬ 
tified by the underlying economic fundamen¬ 
tals. Cointegration is well suited to test for this. 
Cointegration implies that two variables share a 
common stochastic trend. A common stochastic 
trend does not simply mean that they move up¬ 
ward or downward together, but rather that the 
variables may share both prolonged upward 
and prolonged downward movements. 

If housing prices are cointegrated with an economic variable and a bubble develops in the housing market, then housing prices rise without a corresponding rise in the variable. This implies the severing of a long-term relationship between housing prices and the variable. In other words, the cointegration should cease. In summary, if there were a housing bubble beginning in about 2000, then the variables that were cointegrated with housing prices before 2000 will no longer remain cointegrated after 2000.


Data 

Quarterly data are used and the study covers 
the period 1975Q1-2007Q2. We employ the U.S. 
Office of Federal Housing Enterprise Oversight 
(OFHEO) Home Price quarterly index to mea¬ 
sure housing prices. The index is not seasonally 
adjusted. 

Next, we consider a series of seven variables that reflect the fundamental economic forces determining housing prices. The most important of these is income. Case and Shiller (2003) conclude that in nonbubble markets income explains most of the rise in housing prices. We employ two separate measures of income. The first is the mean of the middle quintile of the income distribution, denoted as the Middle Fifth. Second, we use the mean of the highest quintile of the income distribution, denoted as the Top Fifth. This attempts to account for the possibility that the wealthiest segment of the population influences housing prices disproportionately because of their greater mobility. The U.S. Census Bureau, Historical Income Tables-Families (all races), and the National Association of Realtors provided these data.

The mortgage rate represents a strong in¬ 
fluence on consumer demand for housing. 
We obtained the 30-year conventional mort¬ 
gage rate (fixed rate, first mortgages) from the 
Board of Governors of the U.S. Federal Re¬ 
serve System. The civilian unemployment rate 
measures the state of the economy. The U.S. 
Bureau of Labor Statistics provided the sea¬ 
sonally adjusted percentage of civilian unem¬ 
ployment. We converted the monthly data for 
both variables to quarterly data by a simple 
mean. The Homebuilders Stock Index provides 
an indication of the state of the housing market. 
A capitalization-weighted, price-level index of 
homebuilding stocks based on stocks included 
in the S&P 500 stock index was obtained from 
Merrill Lynch. 
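
The conversion of the monthly mortgage and unemployment rates to quarterly data by a simple mean, described above, amounts to averaging each block of three months. A minimal Python sketch follows; the function name and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def quarterly_means(monthly):
    """Average consecutive groups of three monthly values into quarterly values."""
    if len(monthly) % 3 != 0:
        raise ValueError("expected a whole number of quarters")
    return [sum(monthly[i:i + 3]) / 3.0 for i in range(0, len(monthly), 3)]

# Hypothetical monthly mortgage rates (percent) covering two quarters, for illustration only.
rates = [6.0, 6.3, 6.6, 7.0, 7.0, 7.3]
print(quarterly_means(rates))
```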

The final variables measure the ability of con¬ 
sumers to handle mortgage debt. The house¬ 
hold debt ratio is the ratio of household credit 
market debt outstanding to annualized per¬ 
sonal disposable income. The data also came 
from the Board of Governors of the U.S. Fed¬ 
eral Reserve System. The Housing Affordabil¬ 
ity Index for all homebuyers (HAI) measures 
whether or not a typical family could qualify 
for a mortgage loan on a typical home, assum¬ 
ing a 20% down payment. We define a typical 
home as the national median-priced, existing 
single-family home as calculated by NAR. In 
its final form used here, the HAI is essentially 
"median family income divided by qualifying 
income." The index is interpreted as follows: 
A value of 100 means that a family with the 
median family income (from the U.S. Bureau 
of the Census and NAR) has exactly enough 
income to qualify for a mortgage on a median- 
priced home. National Association of Realtors 
(NAR) provided the data. In this research, 
the monthly HAI values result from quarterly 
samples. 2 


Again, the first step in establishing cointegration is to test the variables for stationarity. To establish nonstationarity we employed the ADF (augmented Dickey-Fuller) test and the Phillips-Perron test. Although we do not display the results here, we conclude all the variables are nonstationary.

Next, we examine whether home prices and 
the seven fundamental variables are cointe¬ 
grated. This is accomplished by examining a 
cointegrating regression for each of the seven 
variables with home prices. Table 8 presents 
the results of these cointegration tests for the 
1975Q1-2000Q4 period. The Trace Statistic Test 
shows that for three of the seven variables, 
top fifth, middle fifth, and the unemployment 
rate, we may reject the null hypothesis of no 
cointegration at a 5% level of significance. Fur¬ 
thermore, we may reject the null hypothesis 
of no cointegration at the 10% level of statisti¬ 
cal significance for one additional variable, the 
Homebuilders stock index. Thus for the period 
preceding the runup in home prices there ap¬ 
pears to have been a strong link between home 
prices and both the income variables and the 
unemployment rate and a marginal link with 
the Homebuilders Stock Index. 

Table 9 presents the results of these cointegration tests for the period 1975-2007Q3. The trace tests indicate the eigenvalues are not statistically distinguishable from zero in any equation at the 5% level. However, the p-value for the middle fifth of income was .0502. Recognizing the belief that the bubble burst in late 2005, we did the cointegration test for the period 1975-2005Q2. The p-value (hypothesized no. of CE(s) = none) for home prices vs. middle fifth of income was 11%. (Although we did not display the results, we cannot reject the hypothesis of no cointegration for any of the other fundamental variables during this period.) This suggests that in the post-2005 period the normal relationship between home prices and income was reasserting itself. This result suggests that the linkage between home prices and fundamental variables has been substantially reduced
Table 9 Cointegration Test Results for the Whole Period: 1975-2007Q3

                                             Hypothesized                  Trace       0.05 Critical
Cointegration between                        No. of CE(s)    Eigenvalue    Statistic   Value           Prob.*
Home Prices vs. Household Debt Ratio         None            0.07           7.68       15.49           0.50
                                             At most 1       0.00           0.29        3.84           0.59
Home Prices vs. Housing Affordability Index  None            0.12          13.64       15.49           0.09
                                             At most 1       0.01           0.89        3.84           0.35
Home Prices vs. Mortgage Rate                None            0.10          10.63       15.49           0.24
                                             At most 1       0.00           0.42        3.84           0.52
Home Prices vs. Homebuilders Stock Index     None            0.10          12.28       15.49           0.14
                                             At most 1       0.02           1.78        3.84           0.18
Home Prices vs. Unemployment Rate            None            0.10          12.13       15.49           0.15
                                             At most 1       0.02           1.81        3.84           0.18
Home Prices vs. Mean of Middle Fifth         None            0.13          15.48       15.49           0.05
  of Income                                  At most 1       0.02           2.02        3.84           0.16
Home Prices vs. Mean of Top Fifth            None            0.09           8.85       15.49           0.38
  of Income                                  At most 1       0.00           0.03        3.84           0.87

Source: This table is reprinted from Arshanapalli and Nelson (2008) with permission of The International Journal of Business and Finance Research.

* Denotes the p-value


after 2000. The evidence is consistent with a real 
estate bubble. 
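
The comparison between the two sample periods can be made explicit. The following Python sketch takes the trace statistics and 5% critical values for the null of no cointegration from Tables 8 and 9 and lists the variables that were linked to home prices through 2000 but not over the whole period. The dictionary and function names are illustrative.

```python
# (trace statistic, 5% critical value) for the null of no cointegration with home prices.
table8_1975_2000 = {
    "Household Debt Ratio":        (14.72, 25.87),
    "Housing Affordability Index": (20.34, 25.87),
    "Mortgage Rate":               (18.16, 25.87),
    "Homebuilders Stock Index":    (25.69, 25.87),
    "Unemployment Rate":           (25.90, 25.87),
    "Middle Fifth of Income":      (32.58, 25.87),
    "Top Fifth of Income":         (32.29, 25.87),
}
table9_1975_2007q3 = {
    "Household Debt Ratio":        (7.68, 15.49),
    "Housing Affordability Index": (13.64, 15.49),
    "Mortgage Rate":               (10.63, 15.49),
    "Homebuilders Stock Index":    (12.28, 15.49),
    "Unemployment Rate":           (12.13, 15.49),
    "Middle Fifth of Income":      (15.48, 15.49),
    "Top Fifth of Income":         (8.85, 15.49),
}

def cointegrated(results):
    """Variables whose trace statistic exceeds the 5% critical value."""
    return {name for name, (trace, critical) in results.items() if trace > critical}

before = cointegrated(table8_1975_2000)
whole = cointegrated(table9_1975_2007q3)
severed = before - whole   # links present through 2000 but absent over the whole sample
print(sorted(severed))
```

The middle fifth of income misses the 5% cutoff over the whole period by a very small margin (15.48 against 15.49), which is why the text treats that case separately.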


KEY POINTS 

* Many of the variables of interest to finance professionals are nonstationary.

* The relationships among them can be fruitfully analyzed if they share a common stochastic trend. A way of capturing this common stochastic trend is the application of cointegration.

* Cointegration analysis can reveal interesting long-run relationships between the variables.

* It is possible that cointegrating variables may deviate in the short run from their relationship, but the error correction model shows how these variables adjust to the long-run equilibrium.

* Cointegration analysis can reveal interesting short-run asset pricing adjustments.

* The error-correction models tend to have a better forecasting performance than simple vector autoregressive models.

* Cointegration analysis shows when fundamental long-run relationships are severed. This is consistent with the presence of an asset price bubble.

NOTES 

1. For a summary of these criteria, see Chap¬ 
ter 12 in Focardi and Fabozzi (2004). 

2. For more details on the exact calculation, go 
to www.realtor.org. 
Abstract: Many financial and economic data exhibit nonlinear characteristics. Prices of commodities
such as crude oil often rise quickly but decline slowly. The monthly U.S. unemployment rate exhibits 
sharp increases followed by slow decreases. To model these characteristics in a satisfactory manner, 
one must employ nonlinear econometric models or use nonparametric statistical methods. For most 
applications, it suffices to employ simple nonlinear models. For example, the quarterly growth 
rate of the U.S. gross domestic product can be adequately described by the Markov switching or 
threshold autoregressive models. These models typically classify the state of the U.S. economy into 
two categories corresponding roughly to expansion and contraction. 


In this entry, we study nonlinearity in financial 
data, discuss various nonlinear models avail¬ 
able in the literature, and demonstrate appli¬ 
cation of nonlinear models in finance with 
real examples. The models discussed include 
bilinear models, threshold autoregressive mod¬ 
els, smooth threshold autoregressive models, 
Markov switching models, and nonlinear addi¬ 
tive autoregressive models. We also consider 
nonparametric methods and neural networks, and 
apply nonparametric methods to estimate in¬ 
terest models. To detect nonlinearity in finan¬ 
cial data, we introduce various nonlinearity tests 
available in the literature and apply the tests 
to some financial series. Finally, we analyze the 
monthly U.S. unemployment rate and compare 
out-of-sample prediction of nonlinear models 
with linear ones via several criteria. 


STUDY OF NONLINEARITY 
IN ECONOMETRICS AND 
STATISTICS 

Assume, for simplicity, a univariate time series $x_t$ is observed at equally spaced time points. We denote the observations by $\{x_t \mid t = 1, \ldots, T\}$, where $T$ is the sample size. A purely stochastic time series $x_t$ is said to be linear if it can be written as

$$x_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i} \qquad (1)$$

where $\mu$ is a constant, $\psi_i$ are real numbers with $\psi_0 = 1$, and $\{a_t\}$ is a sequence of independent and identically distributed (IID) random variables with a well-defined distribution function. We assume that the distribution of $a_t$ is continuous and $E(a_t) = 0$. In many cases, we further assume that $\mathrm{Var}(a_t) = \sigma_a^2$ or, even stronger, that $a_t$ is Gaussian. If $\sigma_a^2 \sum_{i=1}^{\infty} \psi_i^2 < \infty$, then $x_t$ is weakly stationary (i.e., the first two moments of $x_t$ are time-invariant). The well-known autoregressive moving-average (ARMA) process of Box et al. (2008) is linear because it has a moving-average (MA) representation in equation (1). Any stochastic process that does not satisfy the condition of equation (1) is said to be nonlinear. The prior definition of nonlinearity is for purely stochastic time series. One may extend the definition by allowing the mean of $x_t$ to be a linear function of some exogenous variables, including the time index and some periodic functions. But such a mean function can be handled easily by using a regression model with time series errors discussed in Tsay (2010, Chapter 2), and we shall not consider the extension here. Mathematically, a purely stochastic time series model for $x_t$ is a function of an IID sequence consisting of the current and past shocks, that is,

$$x_t = f(a_t, a_{t-1}, \ldots) \qquad (2)$$

The linear model in equation (1) says that $f(.)$ is a linear function of its arguments. Any nonlinearity in $f(.)$ results in a nonlinear model. The general nonlinear model in equation (2) is too vague to be useful in practice. Further assumptions are needed to make the model applicable.
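
As a concrete instance of the linear representation in equation (1), consider an AR(1) model $x_t = \phi x_{t-1} + a_t$ with $|\phi| < 1$, whose MA weights are $\psi_i = \phi^i$. The Python sketch below, with illustrative parameter values that do not come from the text, truncates the infinite sum and compares the implied variance $\sigma_a^2 \sum \psi_i^2$ with the closed form $\sigma_a^2 / (1 - \phi^2)$.

```python
def ar1_psi_weights(phi, n_terms):
    """psi_i = phi**i: MA weights of the AR(1) model x_t = phi * x_{t-1} + a_t."""
    return [phi ** i for i in range(n_terms)]

def implied_variance(psi, sigma2):
    """Variance of a linear process with IID shocks: sigma2 times the sum of psi_i squared."""
    return sigma2 * sum(w * w for w in psi)

phi, sigma2 = 0.5, 1.0                   # illustrative values, not from the text
psi = ar1_psi_weights(phi, 200)          # truncation of the infinite sum
print(implied_variance(psi, sigma2))     # compare with sigma2 / (1 - phi**2)
```

The sum is finite because the weights decay geometrically, which is the weak stationarity condition stated above.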

To put nonlinear models available in the literature in a proper perspective, we write the model of $x_t$ in terms of its conditional moments. Let $F_{t-1}$ be the $\sigma$-field generated by available information at time $t - 1$ (inclusive). Typically, $F_{t-1}$ denotes the collection of linear combinations of elements in $\{x_{t-1}, x_{t-2}, \ldots\}$ and $\{a_{t-1}, a_{t-2}, \ldots\}$. The conditional mean and variance of $x_t$ given $F_{t-1}$ are

$$\mu_t = E(x_t \mid F_{t-1}) \equiv g(F_{t-1}), \qquad \sigma_t^2 = \mathrm{Var}(x_t \mid F_{t-1}) \equiv h(F_{t-1}) \qquad (3)$$

where $g(.)$ and $h(.)$ are well-defined functions with $h(.) > 0$. Thus, we restrict the model to

$$x_t = g(F_{t-1}) + \sqrt{h(F_{t-1})}\,\epsilon_t$$

where $\epsilon_t = a_t / \sigma_t$ is a standardized shock (or innovation). For the linear series $x_t$ in equation (1), $g(.)$ is a linear function of elements of $F_{t-1}$ and $h(.) = \sigma_a^2$. The development of nonlinear models involves making extensions of the two equations in equation (3). If $g(.)$ is nonlinear, $x_t$ is said to be nonlinear in mean. If $h(.)$ is time-variant, then $x_t$ is nonlinear in variance. The conditional heteroscedastic models, for example, the GARCH model of Bollerslev (1986), are nonlinear in variance because their conditional variances $\sigma_t^2$ evolve over time. Based on the well-known Wold decomposition, a weakly stationary and purely stochastic time series can be expressed as a linear function of uncorrelated shocks. For stationary volatility series, these shocks are uncorrelated, but dependent. The models discussed in this entry represent another extension to nonlinearity derived from modifying the conditional mean equation in equation (3).

Many nonlinear time series models have been proposed in the statistical literature, such as the bilinear models of Granger and Andersen (1978), the threshold autoregressive (TAR) model of Tong (1978), the state-dependent model of Priestley (1980), and the Markov switching model of Hamilton (1989). The basic idea underlying these nonlinear models is to let the conditional mean $\mu_t$ evolve over time according to some simple parametric nonlinear function. Recently, a number of nonlinear models have been proposed by making use of advances in computing facilities and computational methods. Examples of such extensions include the nonlinear state-space modeling of Carlin, Polson, and Stoffer (1992), the functional-coefficient autoregressive model of Chen and Tsay (1993a), the nonlinear additive autoregressive model of Chen and Tsay (1993b), the multivariate adaptive regression spline of Lewis and Stevens (1991), and the generalized autoregressive score (GAS) model of Creal et al. (2010). The basic idea of these extensions is either using simulation methods to describe the evolution of the conditional distribution of $x_t$ or using data-driven methods to explore the nonlinear characteristics of a series. Finally, nonparametric and semiparametric methods such as kernel regression and artificial neural networks have also been applied to explore the nonlinearity in a time series. We discuss some nonlinear models in this entry that are applicable to financial time series. The discussion includes some nonparametric and semiparametric methods.

Apart from the development of various non¬ 
linear models, there is substantial interest in 
studying test statistics that can discriminate lin¬ 
ear series from nonlinear ones. Both paramet¬ 
ric and nonparametric tests are available. Most 
parametric tests employ either the Lagrange 
multiplier or likelihood ratio statistics. Non¬ 
parametric tests depend on either higher order 
spectra of x t or the concept of dimension cor¬ 
relation developed for chaotic time series. We 
review some nonlinearity tests, discuss model¬ 
ing and forecasting of nonlinear models, and 
provide an application of nonlinear models. 

NONLINEAR MODELS 

Most nonlinear models developed in the sta¬ 
tistical literature focus on the conditional mean 
equation in equation (3); see Priestley (1988) and 
Tong (1990) for summaries of nonlinear models. 
Our goal here is to introduce some nonlinear 
models that are useful in finance. 

Bilinear Model 

The linear model in equation (1) is simply the first-order Taylor series expansion of the $f(.)$ function in equation (2). As such, a natural extension to nonlinearity is to employ the second-order terms in the expansion to improve the approximation. This is the basic idea of bilinear models, which can be defined as

$$x_t = c + \sum_{i=1}^{p} \phi_i x_{t-i} - \sum_{j=1}^{q} \theta_j a_{t-j} + \sum_{i=1}^{m} \sum_{j=1}^{s} \beta_{ij} x_{t-i} a_{t-j} + a_t \qquad (4)$$

where $p$, $q$, $m$, and $s$ are nonnegative integers. This model was introduced by Granger and Andersen (1978) and has been widely investigated. Subba Rao and Gabr (1984) discuss some properties and applications of the model, and Liu and Brockwell (1988) study general bilinear models. Properties of bilinear models such as stationarity conditions are often derived by (a) putting the model in a state-space form and (b) using the state transition equation to express the state as a product of past innovations and random coefficient vectors. A special generalization of the bilinear model in equation (4) has conditional heteroscedasticity. For example, consider the model

$$x_t = \mu + \sum_{i=1}^{s} \beta_i a_{t-i} a_t + a_t \qquad (5)$$

where $\{a_t\}$ is a white noise series. The first two conditional moments of $x_t$ are

$$E(x_t \mid F_{t-1}) = \mu, \qquad \mathrm{Var}(x_t \mid F_{t-1}) = \left(1 + \sum_{i=1}^{s} \beta_i a_{t-i}\right)^2 \sigma_a^2$$

which confirm that the model has time-varying volatility.
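
The time-varying volatility of model (5) can be seen by simulating it. The Python sketch below assumes Gaussian white noise and uses illustrative coefficient values that do not come from the text; the function name is hypothetical. It returns both the simulated series and the conditional variance implied by the formula above.

```python
import random

def simulate_model5(mu, betas, sigma, n, seed=0):
    """Simulate x_t = mu + (1 + sum_i beta_i * a_{t-i}) * a_t with Gaussian white noise a_t."""
    rng = random.Random(seed)
    s = len(betas)
    shocks = [rng.gauss(0.0, sigma) for _ in range(n + s)]   # first s shocks seed the lags
    xs, cond_vars = [], []
    for t in range(s, n + s):
        scale = 1.0 + sum(b * shocks[t - i] for i, b in enumerate(betas, start=1))
        xs.append(mu + scale * shocks[t])
        cond_vars.append(scale ** 2 * sigma ** 2)            # Var(x_t | F_{t-1})
    return xs, cond_vars

xs, cond_vars = simulate_model5(mu=0.0, betas=[0.4, -0.3], sigma=1.0, n=500)
print(min(cond_vars), max(cond_vars))   # the conditional variance changes over time
```

With all coefficients set to zero the scale factor is identically 1 and the conditional variance collapses to the constant $\sigma_a^2$, which is the linear case.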

Example 1. Consider the monthly simple returns of the CRSP equal-weighted index from January 1926 to December 2008 for 996 observations. Denote the series by $R_t$. The sample partial autocorrelation function (PACF) of $R_t$ shows significant serial correlations at lags 1 and 3 so that an AR(3) model is used for the mean equation. The squared series of the AR(3) residuals suggests that the conditional heteroscedasticity might depend on lags 1, 3 and 8 of the residuals. Therefore, we employ the special bilinear model

$$R_t = \mu + \phi_1 R_{t-1} + \phi_3 R_{t-3} + (1 + \beta_1 a_{t-1} + \beta_3 a_{t-3}) a_t$$

for the series, where $a_t = \beta_0 \epsilon_t$ with $\epsilon_t$ being an IID series with mean zero and variance 1. Note that lag 8 is omitted for simplicity. Assuming that the conditional distribution of $a_t$ is normal, we use the conditional maximum
Figure 1 Time Plot of a Simulated 2-Regime TAR(1) Series


likelihood method and obtain the fitted model 

R, = 0.0114 + 0.167K,_i - 0.095R,_3 
+0.071(1 + 0.377fl f _i - 0.646fl f _ 3 )e f 

( 6 ) 

where the standard errors of the parameters 
are, in the order of appearance, 0.0023, 0.032, 
0.027, 0.002, 0.147, and 0.136, respectively. All 
estimates are significantly different from zero 
at the 5% level. Define 

$$\hat{\epsilon}_t = \frac{R_t - 0.0114 - 0.167 R_{t-1} + 0.095 R_{t-3}}{0.071(1 + 0.377 a_{t-1} - 0.646 a_{t-3})}$$

where $\hat{\epsilon}_t = 0$ for $t \le 3$, as the standardized residual
series of the model. The sample autocorrelation
function (ACF) of $\hat{\epsilon}_t$ shows no significant
serial correlations, but the series is not independent
because the squared series $\hat{\epsilon}_t^2$ has significant
serial correlations. The validity of model
(6) deserves further investigation. For compari¬ 
son, we also consider an AR(3)-ARCH(3) model 
for the series and obtain 

$$R_t = 0.013 + 0.223 R_{t-1} + 0.006 R_{t-2} - 0.013 R_{t-3} + a_t$$

$$\sigma_t^2 = 0.002 + 0.185 a_{t-1}^2 + 0.301 a_{t-2}^2 + 0.197 a_{t-3}^2 \qquad (7)$$

where all estimates but the coefficients of $R_{t-2}$
and $R_{t-3}$ are highly significant. The standardized
residual series of the model shows no serial
correlations, but the squared residuals show
Q(10) = 19.78 with a p-value of 0.031. Models (6) 
and (7) appear to be similar, but the latter seems 


to fit the data better. Further study shows that 
an AR(1)-GARCH(1,1) model fits the data well.


Threshold Autoregressive 
(TAR) Model 

This model is motivated by several nonlin¬ 
ear characteristics commonly observed in prac¬ 
tice such as asymmetry in declining and rising 
patterns of a process. It uses piecewise linear 
models to obtain a better approximation of the 
conditional mean equation. However, in con¬ 
trast to the traditional piecewise linear model 
that allows for model changes to occur in the 
"time" space, the TAR model uses threshold 
space to improve linear approximation. Let us 
start with a simple 2-regime AR(1) model 

$$x_t = \begin{cases} -1.5\, x_{t-1} + a_t & \text{if } x_{t-1} < 0 \\ 0.5\, x_{t-1} + a_t & \text{if } x_{t-1} \ge 0 \end{cases} \qquad (8)$$

where the $a_t$ are IID $N(0,1)$. Here the threshold
variable is $x_{t-1}$ and the threshold is 0.

Figure 1 shows the time plot of a simulated 
series of x t with 200 observations. A horizontal 
line of zero is added to the plot, which illustrates 
several characteristics of TAR models. First, de¬ 
spite the coefficient —1.5 in the first regime, 
the process x t is geometrically ergodic and sta¬ 
tionary. In fact, the necessary and sufficient 
condition for model (8) to be geometrically ergodic
is $\phi_1^{(1)} < 1$, $\phi_1^{(2)} < 1$, and $\phi_1^{(1)} \phi_1^{(2)} < 1$, where
$\phi_1^{(i)}$ is the AR coefficient of regime $i$; see Petruccelli
and Woolford (1984) and Chen and Tsay
(1991).

Ergodicity is an important concept in time se¬ 
ries analysis. For example, the statistical theory 
showing that the sample mean $\bar{x} = (\sum_{t=1}^{T} x_t)/T$
of $x_t$ converges to the mean of $x_t$ is referred to
as the ergodic theorem, which can be regarded
as the counterpart of the central limit theory for
the IID case. Second, the series exhibits an asymmetric
increasing and decreasing pattern. If $x_{t-1}$
is negative, then $x_t$ tends to switch to a positive
value due to the negative and explosive coefficient
$-1.5$. Yet when $x_{t-1}$ is positive, it tends to
take multiple time periods for $x_t$ to reduce to a
negative value. Consequently, the time plot of x t 
shows that regime 2 has more observations than 
regime 1, and the series contains large upward 
jumps when it becomes negative. The series is 
therefore not time-reversible. Third, the model 
contains no constant terms, but E(x t ) is not zero. 
The sample mean of the particular realization is 
0.61 with a standard deviation of 0.07. In gen¬ 
eral, E(xt) is a weighted average of the con¬ 
ditional means of the two regimes, which are 
nonzero. The weight for each regime is simply 
the probability that x t is in that regime under its 
stationary distribution. It is also clear from the 
discussion that, for a TAR model to have zero 
mean, nonzero constant terms in some of the 
regimes are needed. This is very different from 
a stationary linear model for which a nonzero 
constant implies that the mean of x t is not zero. 
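
The three features just described can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `simulate_tar1`, the burn-in length, the seed, and the sample size of 5,000 are illustrative choices and not part of the original example (which uses 200 observations).

```python
import numpy as np

def simulate_tar1(n, phi_neg=-1.5, phi_pos=0.5, threshold=0.0,
                  burn_in=200, seed=0):
    """Simulate the 2-regime TAR(1) model (8) with IID N(0,1) shocks."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        # The regime is chosen by the lagged value x[t-1] (threshold variable).
        phi = phi_neg if x[t - 1] < threshold else phi_pos
        x[t] = phi * x[t - 1] + a[t]
    return x[burn_in:]  # drop the burn-in so the start value has little effect

x = simulate_tar1(5000)
# Share of time points whose lagged value falls in regime 2 (x[t-1] >= 0).
share_regime2 = np.mean(x[:-1] >= 0.0)
print("sample mean:", round(float(x.mean()), 2))
print("share of observations in regime 2:", round(float(share_regime2), 2))
```

In such a run the sample mean is positive even though the model has no constant term, and regime 2 holds more than half of the observations, in line with the discussion above.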

A time series $x_t$ is said to follow a $k$-regime
self-exciting TAR (SETAR) model with threshold
variable $x_{t-d}$ if it satisfies

$$x_t = \phi_0^{(j)} + \phi_1^{(j)} x_{t-1} + \cdots + \phi_p^{(j)} x_{t-p} + a_t^{(j)}, \quad \text{if } \gamma_{j-1} \le x_{t-d} < \gamma_j \qquad (9)$$

where $k$ and $d$ are positive integers, $j = 1, \ldots, k$,
the $\gamma_i$ are real numbers such that
$-\infty = \gamma_0 < \gamma_1 < \cdots < \gamma_{k-1} < \gamma_k = \infty$, the superscript $(j)$ is
used to signify the regime, and $\{a_t^{(j)}\}$ are IID
sequences with mean 0 and variance $\sigma_j^2$ and
are mutually independent for different $j$. The
parameter $d$ is referred to as the delay parameter
and $\gamma_j$ are the thresholds. Here it is
understood that the AR models are different 
for different regimes; otherwise, the number of 
regimes can be reduced. Equation (9) says that a 
SETAR model is a piecewise linear AR model 
in the threshold space. It is similar in spirit to 
the usual piecewise linear models in regression 
analysis, where model changes occur in the or¬ 
der in which observations are taken. The SETAR 
model is nonlinear provided that k > 1. 

Properties of general SETAR models are hard 
to obtain, but some of them can be found in 
Tong (1990), Chan (1993), Chan and Tsay (1998), 
and the references therein. In recent years, there 
has been increasing interest in TAR models and their
applications; see, for instance, Hansen (1997), 
Tsay (1998), and Montgomery et al. (1998). Tsay 
(1989) proposed a testing and modeling proce¬ 
dure for univariate SETAR models. The model 
in equation (9) can be generalized by using a 
threshold variable $z_t$ that is measurable with respect
to $F_{t-1}$ (i.e., a function of elements of $F_{t-1}$).
The main requirements are that $z_t$ is stationary
with a continuous distribution function over a
compact subset of the real line and that $z_{t-d}$ is
known at time $t$. Such a generalized model is
referred to as an open-loop TAR model. 

Example 2. To demonstrate the application of 
TAR models, consider the U.S. monthly civilian 
unemployment rate, seasonally adjusted and 
measured in percentage, from January 1948 to 
March 2009 for 735 observations. The data are 
obtained from the Bureau of Labor Statistics, 
Department of Labor, and are shown in Fig¬ 
ure 2. The plot shows two main characteris¬ 
tics of the data. First, there appears to be a 
slow but upward trend in the overall unem¬ 
ployment rate. Second, the unemployment rate 
tends to increase rapidly and decrease slowly. 
Thus, the series is not time-reversible and may 
not be unit-root stationary, either. 

Figure 2 Time Plot of Monthly U.S. Civilian Unemployment Rate, Seasonally Adjusted, from January
1948 to March 2009

Because the sample autocorrelation function
decays slowly, we employ the first differenced
series $y_t = (1 - B)u_t$ in the analysis, where
$u_t$ is the monthly unemployment rate. Using
univariate ARIMA models, we obtain the
model

$$(1 - 1.13B + 0.27B^2)(1 - 0.51B^{12})\, y_t = (1 - 1.12B + 0.44B^2)(1 - 0.82B^{12})\, a_t \qquad (10)$$

where $\hat{\sigma}_a = 0.187$ and all estimates but the
AR(2) coefficient are statistically significant at
the 5% level. The t-ratio of the estimate of the AR(2)
coefficient is $-1.66$. The residuals of model (10)
give Q(12) = 12.3 and Q(24) = 25.5, respectively.
The corresponding p-values are 0.056 and 0.11,
respectively, based on $\chi^2$ distributions with 6
and 18 degrees of freedom. Thus, the fitted
model adequately describes the serial dependence
of the data. Note that the seasonal AR and
MA coefficients are highly significant, with standard
errors 0.049 and 0.035, respectively, even
though the data were seasonally adjusted. The
adequacy of seasonal adjustment deserves fur¬ 
ther study. Using model (10), we obtain the 
1-step ahead forecast of 8.8 for the April 2009 
unemployment rate, which is close to the actual 
data of 8.9. 

To model nonlinearity in the data, we employ 
TAR models and obtain the model 


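$$y_t = \begin{cases} 0.083 y_{t-2} + 0.158 y_{t-3} + 0.118 y_{t-4} - 0.180 y_{t-12} + a_{1t} & \text{if } y_{t-1} \le 0.1 \\ 0.421 y_{t-2} + 0.239 y_{t-3} - 0.127 y_{t-12} + a_{2t} & \text{if } y_{t-1} > 0.1 \end{cases} \qquad (11)$$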


where the standard errors of $a_{it}$ are 0.180 and
0.217, respectively, the standard errors of the AR
parameters in regime 1 are 0.046, 0.043, 0.042,
and 0.037, whereas those of the AR parameters
in regime 2 are 0.054, 0.057, and 0.075, respectively.
The numbers of data points in regimes 1
and 2 are 460 and 262, respectively. The standardized
residuals of model (11) show only
some minor serial correlation at lag 12. Based on
the fitted TAR model, the dynamic dependence
in the data appears to be stronger when the
change in monthly unemployment rate is greater
than 0.1%. This is understandable because a
substantial increase in the unemployment rate 
is indicative of weakening in the U.S. economy, 
and policy makers might be more inclined to 
take action to help the economy which in turn 
may affect the dynamics of the unemployment 
rate series. Consequently, model (11) is capable 
of describing the time-varying dynamics of the 
U.S. unemployment rate. 

The MA representation of model (10) is 

$$\psi(B) \approx 1 + 0.01B + 0.18B^2 + 0.20B^3 + 0.18B^4 + 0.15B^5 + \cdots$$

Because the coefficient of $B$ is close to zero, it is
not surprising to see that no $y_{t-1}$ term
appears in model (11).
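
The leading $\psi$-weights can be reproduced from the coefficients of model (10). The sketch below uses only plain Python; the helper names `poly_mul` and `psi_weights` are hypothetical, and the recursion solves $\phi(B)\psi(B) = \theta(B)$ term by term, where $\phi(B)$ and $\theta(B)$ are the full AR and MA polynomials of model (10).

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def psi_weights(ar_poly, ma_poly, n):
    """Return psi_0..psi_n solving ar_poly(B) * psi(B) = ma_poly(B).

    Both polynomials are assumed to have leading coefficient 1.
    """
    psi = []
    for k in range(n + 1):
        theta_k = ma_poly[k] if k < len(ma_poly) else 0.0
        acc = sum(ar_poly[i] * psi[k - i]
                  for i in range(1, min(k, len(ar_poly) - 1) + 1))
        psi.append(theta_k - acc)
    return psi

# Seasonal factors (1 - 0.51 B^12) and (1 - 0.82 B^12) as coefficient lists.
seasonal_ar = [1.0] + [0.0] * 11 + [-0.51]
seasonal_ma = [1.0] + [0.0] * 11 + [-0.82]
ar = poly_mul([1.0, -1.13, 0.27], seasonal_ar)
ma = poly_mul([1.0, -1.12, 0.44], seasonal_ma)

psi = psi_weights(ar, ma, 5)
print([round(w, 2) for w in psi])
```

The seasonal factors only affect lags 12 and beyond, so the first five weights depend on the regular AR(2) and MA(2) parts alone.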

Threshold models can be used in finance to
handle the leverage effect, that is, volatility responds
differently to prior positive and negative
returns. The models can also be used to
study arbitrage trading in index futures and
cash prices. See Tsay (2010, chap. 8) for discussions
and demonstration. Here we focus on
volatility modeling and introduce an alternative
approach to parameterization of threshold
GARCH (TGARCH) models. In some applications,
this new general TGARCH model fares
better than the model of Glosten et al. (1993).

Figure 3 Time Plot of the Daily Log Returns, in Percentages, for IBM Stock from January 2, 2001 to
December 31, 2009

Example 3. Consider the daily log returns, in 
percentages and including dividends, of IBM 
stock from January 2, 2001 to December 31, 
2009 for 2,263 observations. Figure 3 shows the 
time plot of the series. The volatility seems to 
be larger at the beginning and end of the data 
span. If GARCH models are entertained, we ob¬ 
tain the following GARCH(1,1) model for the 
series: 

$$r_t = 0.058 + a_t, \qquad a_t = \sigma_t \epsilon_t$$

$$\sigma_t^2 = 0.041 + 0.093 a_{t-1}^2 + 0.894 \sigma_{t-1}^2 \qquad (12)$$

where $r_t$ is the log return, $\{\epsilon_t\}$ is a Gaussian
white noise sequence with mean zero
and variance 1.0, the standard error of the 
constant term in the mean equation is 0.026, 
and those of the volatility equation are 0.012, 
0.020, and 0.021, respectively. All estimates 
are statistically significant at the 5% level. 
The Ljung-Box statistics of the standardized
residuals, $\hat{\epsilon}_t = \hat{a}_t/\hat{\sigma}_t$, give Q(10) = 10.08(0.43)
and Q(20) = 23.24(0.28), where the number in
parentheses denotes the p-value obtained using the
asymptotic $\chi^2$ distribution. For the squared
standardized residuals, we obtain Q(10) = 
7.38(0.69) and Q(20) = 15.43(0.75). The model 
is adequate in modeling the serial dependence 
and conditional heteroscedasticity of the data. 
But the unconditional mean for rt of model (12) 
is 0.058, which is substantially larger than the 
sample mean 0.024, indicating that the model 
might be misspecified. 

Next, we employ the TGARCH model of 
Glosten et al. (1993) and obtain 

$$r_t = 0.015 + a_t, \qquad a_t = \sigma_t \epsilon_t$$

$$\sigma_t^2 = 0.032 + 0.033 a_{t-1}^2 + 0.091 N_{t-1} a_{t-1}^2 + 0.911 \sigma_{t-1}^2 \qquad (13)$$

where $N_{t-1}$ is the indicator for negative $a_{t-1}$
such that $N_{t-1} = 1$ if $a_{t-1} < 0$ and $= 0$ otherwise,
the standard error of the parameter in
the mean equation is 0.026, and those of the 
volatility equation are 0.005, 0.005, 0.006, and 
0.008, respectively. All estimates except the con¬ 
stant term of the mean equation are highly 
significant. Let $\tilde{a}_t$ be the standardized residuals
of model (13). We have Q(10) = 9.81(0.46)
and Q(20) = 22.17(0.33) for the $\{\tilde{a}_t\}$ series and
Q(10) = 22.12(0.01) and Q(20) = 31.15(0.05)
for $\{\tilde{a}_t^2\}$. The model fails to describe the
conditional heteroscedasticity of the data at the
5% level.
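
To make the asymmetry in model (13) concrete, the following sketch runs the volatility recursion with the reported coefficients, assuming NumPy is available. The function name `gjr_garch_variance`, the argument names, and the choice of starting variance are illustrative assumptions; no estimation is performed here.

```python
import numpy as np

def gjr_garch_variance(a, omega=0.032, alpha=0.033, gamma=0.091,
                       beta=0.911, sigma2_init=None):
    """Conditional variances implied by the TGARCH recursion of model (13).

    a           : sequence of shocks a_t
    sigma2_init : starting variance; defaults to the sample variance of a
                  (an assumption made for this sketch)
    """
    a = np.asarray(a, dtype=float)
    sigma2 = np.empty(len(a))
    sigma2[0] = np.var(a) if sigma2_init is None else sigma2_init
    for t in range(1, len(a)):
        neg = 1.0 if a[t - 1] < 0 else 0.0   # indicator N_{t-1}
        sigma2[t] = (omega + (alpha + gamma * neg) * a[t - 1] ** 2
                     + beta * sigma2[t - 1])
    return sigma2

# Same-sized positive and negative shocks, starting from unit variance.
s_pos = gjr_garch_variance([2.0, 0.0], sigma2_init=1.0)[1]
s_neg = gjr_garch_variance([-2.0, 0.0], sigma2_init=1.0)[1]
print(s_pos, s_neg)
```

A negative shock of size 2 raises next-period variance more than a positive shock of the same size, because the coefficient of $a_{t-1}^2$ is 0.033 + 0.091 rather than 0.033.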

The idea of TAR models can be used to refine 
the prior TGARCH model by allowing for in¬ 
creased flexibility in modeling the asymmetric 
response in volatility. More specifically, we con¬ 
sider a TAR-GARCH(1,1) model for the series 
and use the constrained optimization method 
L-BFGS-B to perform estimation. The resulting 
model is 

$$r_t = 0.023 + a_t, \qquad a_t = \sigma_t \epsilon_t$$

$$\sigma_t^2 = 0.086 + 0.044 a_{t-1}^2 + \text{[coefficient illegible in source]}\, \sigma_{t-1}^2 + (-0.114 + 0.052 a_{t-1}^2 + 0.214 \sigma_{t-1}^2)\, N_{t-1} \qquad (14)$$

where all estimates are significant at the 5%
level and $N_{t-1}$ is defined in equation (13). The
estimate $-0.114$ is only marginally significant
because its standard error is 0.055. The coefficient
of $\sigma_{t-1}^2$ is greater than 1 when $a_{t-1} < 0$, but
it is not significantly different from 1 in view of
its standard error.

Let $\tilde{a}_t$ be the standardized residuals of model
(14). We obtain Q(10) = 9.10(0.52) and Q(20) =
21.82(0.35) for $\{\tilde{a}_t\}$ and Q(10) = 19.80(0.03) and
Q(20) = 27.41(0.12) for $\{\tilde{a}_t^2\}$. Thus, model (14)
is adequate in modeling the serial correlation 
and conditional heteroscedasticity of the daily 
log returns of IBM stock considered. The 
unconditional mean return of model (14) is 
0.023, which is much closer to the sample mean 
0.024 than those implied by models (12) and 
(13). Comparing the fitted TAR-GARCH and 
TGARCH models, we see that the asymmetric 
behavior in daily IBM stock volatility is much 
stronger than what is allowed in a TGARCH 
model. Specifically, the coefficient of $\sigma_{t-1}^2$ also
depends on the sign of $a_{t-1}$.

Smooth Transition AR 
(STAR) Model 

A criticism of the SETAR model is that its con¬ 
ditional mean equation is not continuous. The 


thresholds $\{\gamma_j\}$ are the discontinuity points of
the conditional mean function $\mu_t$. In response
to this criticism, smooth TAR models have
been proposed; see Chan and Tong (1986) and
Teräsvirta (1994) and the references therein. A
time series $x_t$ follows a 2-regime STAR($p$) model
if it satisfies

$$x_t = c_0 + \sum_{i=1}^{p} \phi_{0,i} x_{t-i} + F\!\left(\frac{x_{t-d} - \Delta}{s}\right) \left(c_1 + \sum_{i=1}^{p} \phi_{1,i} x_{t-i}\right) + a_t \qquad (15)$$

where $d$ is the delay parameter, $\Delta$ and $s$ are parameters
representing the location and scale of
model transition, and $F(.)$ is a smooth transition
function. In practice, $F(.)$ often assumes one of
three forms, namely, logistic, exponential, or
a cumulative distribution function. From equation
(15) and with $0 \le F(.) \le 1$, the conditional
mean of a STAR model is a weighted linear combination
between the following two equations:

$$\mu_{1t} = c_0 + \sum_{i=1}^{p} \phi_{0,i} x_{t-i}$$

$$\mu_{2t} = (c_0 + c_1) + \sum_{i=1}^{p} (\phi_{0,i} + \phi_{1,i})\, x_{t-i}$$

The weights are determined in a continuous
manner by $F((x_{t-d} - \Delta)/s)$. The prior two
equations also determine properties of a STAR
model. For instance, a prerequisite for the stationarity
of a STAR model is that all zeros of
both AR polynomials are outside the unit circle.
An advantage of the STAR model over the TAR
model is that the conditional mean function is
differentiable. However, experience shows that
the transition parameters $\Delta$ and $s$ of a STAR
model are hard to estimate. In particular, most
empirical studies show that standard errors of
the estimates of $\Delta$ and $s$ are often quite large,
resulting in t-ratios about 1.0; see Teräsvirta
(1994). This uncertainty leads to various complications
in interpreting an estimated STAR
model.

Example 4. To illustrate the application of
STAR models in financial time series analysis,
we consider the monthly simple stock returns
for Minnesota Mining and Manufacturing (3M)
Company from February 1946 to December
2008. If ARCH models are entertained, we obtain
the following ARCH(2) model

$$R_t = 0.013 + a_t, \qquad a_t = \sigma_t \epsilon_t$$

$$\sigma_t^2 = 0.003 + 0.088 a_{t-1}^2 + 0.109 a_{t-2}^2 \qquad (16)$$

where standard errors of the estimates are 0.002, 
0.0003, 0.047, and 0.050, respectively. As dis¬ 
cussed before, such an ARCH model fails to 
show the asymmetric responses of stock volatil¬ 
ity to positive and negative prior shocks. The 
STAR model provides a simple alternative that 
may overcome this difficulty. Applying STAR
models to the monthly returns of 3M stock, we 
obtain the model 

$$R_t = 0.015 + a_t, \qquad a_t = \sigma_t \epsilon_t$$

$$\sigma_t^2 = (0.003 + 0.205 a_{t-1}^2 + 0.092 a_{t-2}^2) + \frac{0.001 - 0.239 a_{t-1}^2}{1 + \exp(-1000\, a_{t-1})} \qquad (17)$$

where the standard error of the constant term 
in the mean equation is 0.002 and the standard 
errors of the estimates in the volatility equa¬ 
tion are 0.0002, 0.074, 0.043, 0.0004, and 0.080, 
respectively. The scale parameter 1000 of the lo¬ 
gistic transition function is fixed a priori to sim¬ 
plify the estimation. This STAR model provides 
some support for asymmetric responses to positive
and negative prior shocks. For a large negative
$a_{t-1}$, the volatility model approaches the
ARCH(2) model

$$\sigma_t^2 = 0.003 + 0.205 a_{t-1}^2 + 0.092 a_{t-2}^2$$

Yet for a large positive $a_{t-1}$, the volatility process
behaves like the ARCH(2) model

$$\sigma_t^2 = 0.004 - 0.034 a_{t-1}^2 + 0.092 a_{t-2}^2$$

The negative coefficient of $a_{t-1}^2$ in the prior
model is counterintuitive, but the magnitude is
small. As a matter of fact, for a large positive
shock $a_{t-1}$, the ARCH effects appear to be weak
even though the parameter estimates remain
statistically significant.
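
The two limiting ARCH(2) models can be verified by evaluating the volatility equation of model (17) directly. The sketch below uses only the standard library; the function names `logistic` and `star_variance` are hypothetical, and the logistic function is written in a numerically stable form because the scale parameter 1000 makes a naive `exp` call overflow for large negative shocks.

```python
import math

def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def star_variance(a1, a2, scale=1000.0):
    """Conditional variance of model (17) given a_{t-1} = a1 and a_{t-2} = a2."""
    base = 0.003 + 0.205 * a1 ** 2 + 0.092 * a2 ** 2
    shift = 0.001 - 0.239 * a1 ** 2
    return base + shift * logistic(scale * a1)

v_neg = star_variance(-0.1, 0.0)   # large negative prior shock
v_pos = star_variance(0.1, 0.0)    # large positive prior shock
print(v_neg, v_pos)
```

For $a_{t-1} = -0.1$ the transition weight is essentially zero and the result matches $0.003 + 0.205 a_{t-1}^2$; for $a_{t-1} = 0.1$ the weight is essentially one and the result matches $0.004 - 0.034 a_{t-1}^2$.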


Markov Switching Model 

The idea of using probability switching in 
nonlinear time series analysis is discussed in 
Tong (1983). Using a similar idea, but em¬ 
phasizing aperiodic transition between various 
states of an economy, Hamilton (1989) consid¬ 
ers the Markov switching autoregressive (MSA) 
model. Here the transition is driven by a hid¬ 
den two-state Markov chain. A time series x t 
follows an MSA model if it satisfies 

$$x_t = \begin{cases} c_1 + \sum_{i=1}^{p} \phi_{1,i} x_{t-i} + a_{1t} & \text{if } s_t = 1 \\ c_2 + \sum_{i=1}^{p} \phi_{2,i} x_{t-i} + a_{2t} & \text{if } s_t = 2 \end{cases} \qquad (18)$$

where $s_t$ assumes values in {1,2} and is
a first-order Markov chain with transition
probabilities

$$P(s_t = 2 \mid s_{t-1} = 1) = w_1, \qquad P(s_t = 1 \mid s_{t-1} = 2) = w_2$$

The innovational series $\{a_{1t}\}$ and $\{a_{2t}\}$ are sequences
of IID random variables with mean
zero and finite variance and are independent of
one another. A small $w_i$ means that the model
tends to stay longer in state $i$. In fact, $1/w_i$ is
the expected duration of the process to stay in
state i. From the definition, an MSA model uses 
a hidden Markov chain to govern the transition 
from one conditional mean function to another. 
This is different from that of a SETAR model 
for which the transition is determined by a par¬ 
ticular lagged variable. Consequently, a SETAR 
model uses a deterministic scheme to govern 
the model transition whereas an MSA model 
uses a stochastic scheme. 
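
The link between the transition probability $w_i$ and the expected time spent in state $i$ can be illustrated by simulating the hidden chain alone. This is a minimal sketch, assuming NumPy is available; the function names, the seed, the chain length, and the values $w_1 = 0.1$ and $w_2 = 0.25$ are illustrative and are not estimates from the text.

```python
import numpy as np

def simulate_states(n, w1, w2, seed=0):
    """Simulate a two-state first-order Markov chain with
    P(s_t = 2 | s_{t-1} = 1) = w1 and P(s_t = 1 | s_{t-1} = 2) = w2."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    s = np.empty(n, dtype=int)
    s[0] = 1
    for t in range(1, n):
        if s[t - 1] == 1:
            s[t] = 2 if u[t] < w1 else 1
        else:
            s[t] = 1 if u[t] < w2 else 2
    return s

def mean_duration(s, state):
    """Average length of completed runs of `state`; a run still open at the
    end of the sample is dropped."""
    runs, length = [], 0
    for v in s:
        if v == state:
            length += 1
        elif length > 0:
            runs.append(length)
            length = 0
    return float(np.mean(runs))

s = simulate_states(200_000, w1=0.1, w2=0.25)
d1 = mean_duration(s, 1)
d2 = mean_duration(s, 2)
print(round(d1, 2), round(d2, 2))   # close to 1/w1 = 10 and 1/w2 = 4
```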

In practice, the stochastic nature of the states 
implies that one is never certain about which 
state x t belongs to in an MSA model. When the 
sample size is large, one can use some filtering 
techniques to draw inference on the state of x t . 
Yet as long as $x_{t-d}$ is observed, the regime of
$x_t$ is known in a SETAR model. This difference
has important practical implications in forecasting.
For instance, forecasts of an MSA model
are always a linear combination of forecasts
produced by submodels of individual states.
But those of a SETAR model only come from
a single regime provided that $x_{t-d}$ is observed.

Figure 4 Time Plot of the Growth Rate of the U.S. Quarterly Real GNP from 1947.II to 1991.I
Note: The data are seasonally adjusted and in percentages.


Forecasts of a SETAR model also become a lin¬ 
ear combination of those produced by models 
of individual regimes when the forecast hori¬ 
zon exceeds the delay d. It is much harder 
to estimate an MSA model than other mod¬ 
els because the states are not directly observ¬ 
able. Hamilton (1990) uses the EM algorithm, 
which is a statistical method iterating between 
taking expectation and maximization. McCul¬ 
loch and Tsay (1994) consider a Markov chain 
Monte Carlo (MCMC) method to estimate gen¬ 
eral MSA models. For applications of MCMC 
methods in finance, see Tsay (2010, Chapter 12). 

McCulloch and Tsay (1993) generalize the
MSA model in equation (18) by letting the transition
probabilities $w_1$ and $w_2$ be logistic, or
probit, functions of some explanatory variables
available at time $t - 1$. Chen, McCulloch, and
Tsay (1997) use the idea of Markov switching as 
a tool to perform model comparison and selec¬ 
tion between nonnested nonlinear time series 
models (e.g., comparing bilinear and SETAR 
models). Each competing model is represented 
by a state. This approach to select a model is a 
generalization of the odds ratio commonly used 
in Bayesian analysis. Finally, the MSA model 
can easily be generalized to the case of more 
than two states. The computational intensity 


involved increases rapidly, however. For more 
discussions of Markov switching models in 
econometrics, see Hamilton (1994, Chapter 22). 

Example 5. Consider the growth rate, in per¬ 
centages, of the U.S. quarterly real gross na¬ 
tional product (GNP) from the second quarter 
of 1947 to the first quarter of 1991. The data 
are seasonally adjusted and shown in Figure 4, 
where a horizontal line of zero growth is also 
given. It is reassuring to see that a majority of 
the growth rates are positive. This series has 
been widely used in nonlinear analysis of eco¬ 
nomic time series. Tiao and Tsay (1994) and 
Potter (1995) use TAR models, whereas Hamil¬ 
ton (1989) and McCulloch and Tsay (1994) em¬ 
ploy Markov switching models. 

Employing the MSA model in equation (18)
with $p = 4$ and using a Markov chain Monte
Carlo method, McCulloch and Tsay (1994) obtain
the estimates shown in Table 1. The results
show several interesting findings. First, the
mean growth rate of the marginal model for
state 1 is $0.909/(1 - 0.265 - 0.029 + 0.126 + 0.110) = 0.965$
and that of state 2 is
$-0.420/(1 - 0.216 - 0.628 + 0.073 + 0.097) = -1.288$. Thus,
state 1 corresponds to quarters with positive
growth, or expansion periods, whereas state 2

Table 1 Estimation Results of a Markov Switching Model with p = 4 for the Growth Rate of U.S. Quarterly Real
GNP, Seasonally Adjusted

Parameter     $c_i$     $\phi_1$   $\phi_2$   $\phi_3$   $\phi_4$   $\sigma_i$   $w_i$

State 1
Estimate       0.909    0.265    0.029   -0.126   -0.110    0.816    0.118
Std. Error     0.202    0.113    0.126    0.103    0.109    0.125    0.053

State 2
Estimate      -0.420    0.216    0.628   -0.073   -0.097    1.017    0.286
Std. Error     0.324    0.347    0.377    0.364    0.404    0.293    0.064

Note: The estimates and their standard errors are posterior means and standard errors of a Gibbs sampling with 5000
iterations.


consists of quarters with negative growth, or a 
contraction period. Second, the relatively large 
posterior standard deviations of the parameters 
in state 2 reflect that there are few observations 
in that state. This is expected as Figure 4 shows 
few quarters with negative growth. Third, the 
transition probabilities appear to be different 
for different states. The estimates indicate that 
it is more likely for the U.S. GNP to get out
of a contraction period than to jump into one
(0.286 versus 0.118). Fourth, treating $1/w_i$ as
the expected duration for the process to stay in
state $i$, we see that the expected durations for a
contraction period and an expansion period are
approximately 3.69 and 11.31 quarters. Thus,
on average, a contraction in the U.S. economy
lasts about a year, whereas an expansion can
last for 3 years. Finally, the estimated AR coefficients
of $x_{t-2}$ differ substantially between the
two states, indicating that the dynamics of the 
U.S. economy are different between expansion 
and contraction periods. 
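
The state-specific mean growth rates quoted in this example follow from the usual AR mean formula $c_i/(1 - \phi_1 - \cdots - \phi_4)$ applied to the estimates in Table 1. A short sketch in plain Python is given below; the dictionary layout and the function name `marginal_mean` are illustrative.

```python
# Posterior-mean estimates from Table 1: constant and AR(4) coefficients.
estimates = {
    1: {"c": 0.909, "phi": [0.265, 0.029, -0.126, -0.110]},
    2: {"c": -0.420, "phi": [0.216, 0.628, -0.073, -0.097]},
}

def marginal_mean(c, phi):
    """Mean of a stationary AR(p) model: c / (1 - sum of AR coefficients)."""
    return c / (1.0 - sum(phi))

means = {state: marginal_mean(p["c"], p["phi"]) for state, p in estimates.items()}
for state, m in means.items():
    print(f"state {state}: mean growth rate = {m:.3f}")
```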

Nonparametric Methods 

In some financial applications, we may not have 
sufficient knowledge to prespecify the nonlin¬ 
ear structure between two variables Y and X. 
In other applications, we may wish to take ad¬ 
vantage of the advances in computing facili¬ 
ties and computational methods to explore the 
functional relationship between Y and X. These 
considerations lead to the use of nonparametric 


methods and techniques. Nonparametric meth¬ 
ods, however, are not without cost. They are 
highly data dependent and can easily result in 
overfitting. Our goal here is to introduce some 
nonparametric methods for financial applica¬ 
tions and some nonlinear models that make use 
of nonparametric methods and techniques. The 
nonparametric methods discussed include ker¬ 
nel regression, local least squares estimation, 
and neural networks.

The essence of nonparametric methods is 
smoothing. Consider two financial variables Y 
and X, which are related by 

$$Y_t = m(X_t) + a_t \qquad (19)$$

where m(.) is an arbitrary, smooth, but un¬ 
known function and {a t } is a white noise se¬ 
quence. We wish to estimate the nonlinear 
function $m(.)$ from the data. For simplicity, consider
the problem of estimating $m(.)$ at a particular
date for which $X = x$. That is, we are
interested in estimating $m(x)$. Suppose that at
$X = x$ we have repeated independent observations
$y_1, \ldots, y_T$. Then the data become

$$y_t = m(x) + a_t, \qquad t = 1, \ldots, T$$

Taking the average of the data, we have

$$\frac{\sum_{t=1}^{T} y_t}{T} = m(x) + \frac{\sum_{t=1}^{T} a_t}{T}$$

By the law of large numbers, the average of the
shocks converges to zero as $T$ increases. Therefore,
the average $\bar{y} = (\sum_{t=1}^{T} y_t)/T$ is a consistent
estimate of $m(x)$. That the average $\bar{y}$ provides a
consistent estimate of $m(x)$ or, alternatively, that
the average of shocks converges to zero shows
the power of smoothing.

In financial time series, we do not have repeated
observations available at $X = x$. What
we observe are $\{(y_t, x_t)\}$ for $t = 1, \ldots, T$. But if
the function $m(.)$ is sufficiently smooth, then the
value of $Y_t$ for which $X_t \approx x$ continues to provide
an accurate approximation of $m(x)$. The value
of $Y_t$ for which $X_t$ is far away from $x$ provides
a less accurate approximation of $m(x)$. As a compromise,
one can use a weighted average of the $y_t$
instead of the simple average to estimate $m(x)$.
The weight should be larger for those $Y_t$ with
$X_t$ close to $x$ and smaller for those $Y_t$ with $X_t$
far away from $x$. Mathematically, the estimate
of $m(x)$ for a given $x$ can be written as

$$\hat{m}(x) = \frac{1}{T} \sum_{t=1}^{T} w_t(x)\, y_t \qquad (20)$$

where the weights $w_t(x)$ are larger for those $y_t$
with $x_t$ close to $x$ and smaller for those $y_t$ with
$x_t$ far away from $x$. In equation (20), we assume
that the weights sum to $T$. One can treat $1/T$ as
part of the weights and make the weights sum
to one.

From equation (20), the estimate m(x) is sim¬ 
ply a local weighted average with weights de¬ 
termined by two factors. The first factor is the 
distance measure (i.e., the distance between $x_t$
and $x$). The second factor is the assignment
of weight for a given distance. Different ways
to determine the distance between $x_t$ and $x$
and to assign the weight using the distance 
give rise to different nonparametric methods. 
In what follows, we discuss the commonly used 
kernel regression and local linear regression 
methods. 

Kernel Regression 

Kernel regression is perhaps the most com¬ 
monly used nonparametric method in smooth¬ 
ing. The weights here are determined by a 
kernel, which is typically a probability density 


function, is denoted by $K(x)$, and satisfies

$$K(x) \ge 0, \qquad \int K(z)\, dz = 1$$

However, to increase the flexibility in distance
measure, one often rescales the kernel using a
variable $h > 0$, which is referred to as the bandwidth.
The rescaled kernel becomes

$$K_h(x) = \frac{1}{h} K(x/h), \qquad \int K_h(z)\, dz = 1 \qquad (21)$$

The weight function can now be defined as 


$$w_t(x) = \frac{K_h(x - x_t)}{\sum_{t=1}^{T} K_h(x - x_t)} \qquad (22)$$


where the denominator is a normalization con¬ 
stant that makes the smoother adaptive to the 
local intensity of the X variable and ensures the 
weights sum to one. Plugging equation (22) into 
the smoothing formula (20), we have the well- 
known Nadaraya-Watson kernel estimator 


$$\hat{m}(x) = \sum_{t=1}^{T} w_t(x)\, y_t = \frac{\sum_{t=1}^{T} K_h(x - x_t)\, y_t}{\sum_{t=1}^{T} K_h(x - x_t)} \qquad (23)$$


see Nadaraya (1964) and Watson (1964). In practice,
many choices are available for the kernel
$K(x)$. However, theoretical and practical considerations
lead to a few choices, including the
Gaussian kernel

$$K_h(x) = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2h^2}\right)$$

and the Epanechnikov kernel (Epanechnikov,
1969)

$$K_h(x) = \frac{0.75}{h} \left(1 - \frac{x^2}{h^2}\right) I\!\left(\left|\frac{x}{h}\right| \le 1\right)$$

where $I(A)$ is an indicator such that $I(A) = 1$
if $A$ holds and $I(A) = 0$ otherwise. Figure 5
shows the Gaussian and Epanechnikov kernels
for $h = 1$.

To gain insight into the bandwidth $h$, we
evaluate the Nadaraya-Watson estimator with
the Epanechnikov kernel at the observed values
$\{x_t\}$ and consider two extremes. First, if
$h \to 0$, then

$$\hat{m}(x_t) \to \frac{K_h(0)\, y_t}{K_h(0)} = y_t$$


Figure 5 Standard Normal Kernel (Solid Line) and Epanechnikov Kernel (Dashed Line) with Band¬ 
width h = 1 


indicating that small bandwidths reproduce the
data. Second, if $h \to \infty$, then

$$\hat{m}(x_t) \to \frac{\sum_{t=1}^{T} K_h(0)\, y_t}{\sum_{t=1}^{T} K_h(0)} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y}$$

suggesting that large bandwidths lead to an
oversmoothed curve, namely the sample mean. In
general, the bandwidth $h$ acts as follows.
If $h$ is very small, then the weights focus
on a few observations that are in the neighborhood
around each $x_t$. If $h$ is very large, then the
weights will spread over a larger neighborhood
of $x_t$. Consequently, the choice of $h$ plays an
important role in kernel regression. This is the
well-known problem of bandwidth selection in
kernel regression.
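
The estimator in equation (23) and the two bandwidth extremes can be sketched as follows, assuming NumPy is available. The function names, the simulated data, and the specific bandwidths are illustrative assumptions; the kernels are written for the standardized argument $u = (x - x_t)/h$ and divided by $h$ as in equation (21).

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density K(u)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def nadaraya_watson(x0, x, y, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate of m(.) at the points x0, as in equation (23)."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x0[:, None] - x[None, :]) / h
    k = kernel(u) / h                      # rescaled kernel K_h(x0 - x_t)
    return (k @ y) / k.sum(axis=1)

# Illustrative data: a smooth signal plus noise on an evenly spaced grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(50)

fit_small = nadaraya_watson(x, x, y, h=1e-6, kernel=epanechnikov_kernel)
fit_large = nadaraya_watson(x, x, y, h=1e6)
fit_mid = nadaraya_watson(x, x, y, h=0.05)
```

With a tiny bandwidth the fit at the observed points returns the data themselves, and with a huge bandwidth it collapses to the sample mean; a moderate bandwidth gives a smooth curve between these extremes. Note that with a compactly supported kernel the denominator can be zero at evaluation points far from all data.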


Bandwidth Selection 

There are several approaches for bandwidth
selection; see Härdle (1990) and Fan and Yao
(2003). The first approach is the plug-in method,
which is based on the asymptotic expansion of
the mean integrated squared error (MISE) for
kernel smoothers

$$\mathrm{MISE} = E \int_{-\infty}^{\infty} [\hat{m}(x) - m(x)]^2\, dx$$

where $m(.)$ is the true function. The quantity
$E[\hat{m}(x) - m(x)]^2$ of the MISE is a pointwise measure
of the mean squared error (MSE) of $\hat{m}(x)$
evaluated at $x$.

Under some regularity conditions, one can 
derive the optimal bandwidth that minimizes 
the MISE. The optimal bandwidth typically de¬ 
pends on several unknown quantities that must 
be estimated from the data with some prelim¬ 
inary smoothing. Several iterations are often 
needed to obtain a reasonable estimate of the 
optimal bandwidth. In practice, the choice of 
preliminary smoothing can become a problem. 
Fan and Yao (2003) give a normal reference 
bandwidth selector as 


$$\hat{h}_{\mathrm{opt}} = \begin{cases} 1.06\, s\, T^{-1/5} & \text{for the Gaussian kernel} \\ 2.34\, s\, T^{-1/5} & \text{for the Epanechnikov kernel} \end{cases}$$

where $s$ is the sample standard deviation of the
independent variable, which is assumed to be
stationary.

The second approach to bandwidth selection
is the leave-one-out cross-validation. First, one
observation $(x_j, y_j)$ is left out. The remaining
$T - 1$ data points are used to obtain the following
smoother at $x_j$:

$$\hat{m}_{h,j}(x_j) = \frac{1}{T-1} \sum_{t \ne j} w_t(x_j)\, y_t$$

which is an estimate of $y_j$, where the weights
$w_t(x_j)$ sum to $T - 1$. Second, perform step 1 for
$j = 1, \ldots, T$ and define the function

$$CV(h) = \frac{1}{T} \sum_{j=1}^{T} [y_j - \hat{m}_{h,j}(x_j)]^2\, W(x_j)$$

where $W(.)$ is a nonnegative weight function
satisfying $\sum_{j=1}^{T} W(x_j) = T$, that can be used to
down-weight the boundary points if necessary.
Decreasing the weights assigned to data points
close to the boundary is needed because those
points often have fewer neighboring observations.
The function $CV(h)$ is called the cross-validation
function because it validates the
ability of the smoother to predict $\{y_t\}_{t=1}^{T}$. One
chooses the bandwidth $h$ that minimizes the
$CV(.)$ function.
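
Both selectors can be sketched in a few lines, assuming NumPy is available. The function names, the simulated data, and the bandwidth grid are illustrative assumptions; the cross-validation score below uses the Gaussian kernel and the uniform weight $W(x_j) = 1$, so no boundary down-weighting is applied.

```python
import numpy as np

def normal_reference_bandwidth(x, kernel="gaussian"):
    """Normal reference selector: c * s * T^(-1/5), c depending on the kernel."""
    const = {"gaussian": 1.06, "epanechnikov": 2.34}[kernel]
    x = np.asarray(x, dtype=float)
    return const * np.std(x, ddof=1) * len(x) ** (-0.2)

def loo_cv_score(h, x, y):
    """Leave-one-out CV(h) for the Gaussian-kernel smoother with W(x_j) = 1."""
    u = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(k, 0.0)              # leave observation j out of its own fit
    fitted = (k @ y) / k.sum(axis=1)
    return float(np.mean((y - fitted) ** 2))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(200)

grid = np.linspace(0.02, 0.5, 25)         # candidate bandwidths (illustrative)
scores = [loo_cv_score(h, x, y) for h in grid]
h_cv = float(grid[int(np.argmin(scores))])
h_ref = float(normal_reference_bandwidth(x))
print("CV bandwidth:", h_cv, " normal reference bandwidth:", round(h_ref, 3))
```

The grid should not extend to bandwidths so small that a left-out point has no neighbors with nonzero weight, since the denominator of the smoother then vanishes.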

Local Linear Regression Method 

Assume that the second derivative of $m(.)$ in
model (19) exists and is continuous at $x$, where
$x$ is a given point in the support of $m(.)$. Denote
the data available by $\{(y_t, x_t)\}_{t=1}^{T}$. The local linear
regression method to nonparametric regression
is to find $a$ and $b$ that minimize

$$L(a, b) = \sum_{t=1}^{T} [y_t - a - b(x - x_t)]^2 K_h(x - x_t) \qquad (24)$$

where $K_h(.)$ is a kernel function defined in equation
(21) and $h$ is a bandwidth. Denote the resulting
value of $a$ by $\hat{a}$. The estimate of $m(x)$ is
then defined as $\hat{a}$. In practice, $x$ assumes an observed
value of the independent variable. The
estimate $\hat{b}$ can be used as an estimate of the first
derivative of $m(.)$ evaluated at $x$.

Under the least squares theory, equation (24) is a weighted least squares problem and one can derive a closed-form solution for $a$. Specifically, taking the partial derivatives of $L(a, b)$ with respect to both $a$ and $b$ and equating the derivatives to zero, we have a system of two equations with two unknowns:

$$
\sum_{t=1}^{T} K_h(x - x_t)\, y_t = a \sum_{t=1}^{T} K_h(x - x_t) + b \sum_{t=1}^{T} (x - x_t) K_h(x - x_t)
$$

$$
\sum_{t=1}^{T} y_t (x - x_t) K_h(x - x_t) = a \sum_{t=1}^{T} (x - x_t) K_h(x - x_t) + b \sum_{t=1}^{T} (x - x_t)^2 K_h(x - x_t)
$$


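Define

$$
s_{T,i} = \sum_{t=1}^{T} K_h(x - x_t)(x - x_t)^i, \quad i = 0, 1, 2.
$$

The prior system of equations becomes

$$
\begin{bmatrix} s_{T,0} & s_{T,1} \\ s_{T,1} & s_{T,2} \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
=
\begin{bmatrix} \sum_{t=1}^{T} K_h(x - x_t)\, y_t \\ \sum_{t=1}^{T} (x - x_t) K_h(x - x_t)\, y_t \end{bmatrix}
$$

Consequently, we have

$$
\hat{a} = \frac{s_{T,2} \sum_{t=1}^{T} K_h(x - x_t)\, y_t - s_{T,1} \sum_{t=1}^{T} (x - x_t) K_h(x - x_t)\, y_t}{s_{T,0}\, s_{T,2} - s_{T,1}^2}
$$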

The numerator and denominator of the prior fraction can be further simplified as

$$
s_{T,2} \sum_{t=1}^{T} K_h(x - x_t)\, y_t - s_{T,1} \sum_{t=1}^{T} (x - x_t) K_h(x - x_t)\, y_t
= \sum_{t=1}^{T} \left[ K_h(x - x_t)\big(s_{T,2} - (x - x_t) s_{T,1}\big) \right] y_t
$$

$$
s_{T,0}\, s_{T,2} - s_{T,1}^2
= \sum_{t=1}^{T} K_h(x - x_t)\, s_{T,2} - \sum_{t=1}^{T} (x - x_t) K_h(x - x_t)\, s_{T,1}
= \sum_{t=1}^{T} K_h(x - x_t)\big[s_{T,2} - (x - x_t) s_{T,1}\big]
$$


In summary, we have

$$
\hat{a} = \frac{\sum_{t=1}^{T} w_t y_t}{\sum_{t=1}^{T} w_t} \tag{25}
$$

where $w_t$ is defined as

$$
w_t = K_h(x - x_t)\big[s_{T,2} - (x - x_t) s_{T,1}\big]
$$

In practice, to avoid a possible zero in the denominator, we use $\hat{m}(x)$ next to estimate $m(x)$:

$$
\hat{m}(x) = \frac{\sum_{t=1}^{T} w_t y_t}{\sum_{t=1}^{T} w_t + 1/T^2} \tag{26}
$$

Notice that a nice feature of equation (26) is that the weight $w_t$ satisfies

$$
\sum_{t=1}^{T} (x - x_t)\, w_t = 0
$$

Also, if one assumes that $m(\cdot)$ of equation (19) has the first derivative and finds the minimizer of

$$
\sum_{t=1}^{T} (y_t - a)^2 K_h(x - x_t)
$$

then the resulting estimator is the Nadaraya-Watson estimator mentioned earlier. In general, if one assumes that $m(x)$ has a bounded $k$th derivative, then one can replace the linear polynomial in equation (24) by a $(k-1)$-order polynomial. We refer to the estimator in equation (26) as the local linear regression smoother. Fan (1993) shows that, under some regularity conditions, the local linear regression estimator has some important sampling properties. The selection of bandwidth can be carried out via the same methods as before.
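The closed-form weights make the local linear smoother straightforward to compute. The sketch below is one possible implementation of equation (26), assuming a Gaussian kernel for $K_h$; the function names are hypothetical.

```python
import math

def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) with bandwidth h (an assumed choice of kernel)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def local_linear(x0, xs, ys, h):
    """Local linear estimate of m(x0) via equation (26); also returns the weights w_t."""
    T = len(xs)
    k = [gaussian_kernel(x0 - xt, h) for xt in xs]
    # s_{T,1} and s_{T,2} as defined in the text
    s1 = sum(kt * (x0 - xt) for kt, xt in zip(k, xs))
    s2 = sum(kt * (x0 - xt) ** 2 for kt, xt in zip(k, xs))
    # w_t = K_h(x - x_t) [s_{T,2} - (x - x_t) s_{T,1}]
    w = [kt * (s2 - (x0 - xt) * s1) for kt, xt in zip(k, xs)]
    num = sum(wt * yt for wt, yt in zip(w, ys))
    den = sum(w) + 1.0 / T ** 2  # the 1/T^2 term guards against a zero denominator
    return num / den, w

# Usage: an exactly linear relationship is reproduced up to the small 1/T^2 correction
xs = [i / 50 for i in range(51)]
ys = [2.0 + 3.0 * xi for xi in xs]
estimate, weights = local_linear(0.5, xs, ys, h=0.2)
```

Because the weights satisfy $\sum_t (x - x_t) w_t = 0$, a linear $m(\cdot)$ is recovered without the boundary bias of a local constant fit.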

Financial Time Series Application 

In time series analysis, the explanatory vari¬ 
ables are often the lagged values of the series. 
Consider the simple case of a single explanatory 
variable. Here model (19) becomes 

$$
x_t = m(x_{t-1}) + a_t
$$

and the kernel regression and local linear re¬ 
gression method discussed before are directly 
applicable. When multiple explanatory vari¬ 
ables exist, some modifications are needed to 
implement the nonparametric methods. For the 
kernel regression, one can use a multivari¬ 
ate kernel such as a multivariate normal den¬ 
sity function with a prespecified covariance 
matrix: 

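$$
K_h(\mathbf{x}) = \frac{1}{(h\sqrt{2\pi})^p\, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2h^2}\, \mathbf{x}' \Sigma^{-1} \mathbf{x} \right)
$$

where $p$ is the number of explanatory variables and $\Sigma$ is a prespecified positive-definite matrix. Alternatively, one can use the product of univariate kernel functions as a multivariate kernel, for example,

$$
K_h(\mathbf{x}) = \prod_{i=1}^{p} \frac{0.75}{h_i} \left( 1 - \frac{x_i^2}{h_i^2} \right) I\left( \left| \frac{x_i}{h_i} \right| < 1 \right)
$$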

This latter approach is simple, but it over¬ 
looks the relationship between the explanatory 
variables. 


Example 6. To illustrate the application of 
nonparametric methods in finance, consider the 
weekly 3-month Treasury bill secondary market 
rate from 1970 to 1997 for 1,461 observations. 
The data are obtained from the Federal Reserve 
Bank of St. Louis and are shown in Figure 6. 
This series has been used in the literature as 
an example of estimating stochastic diffusion 
equations using discretely observed data. Here 
we consider a simple model 

$$
y_t = \mu(x_{t-1})\,dt + \sigma(x_{t-1})\,dw_t
$$

where $x_t$ is the 3-month Treasury bill rate, $y_t = x_t - x_{t-1}$, $w_t$ is a standard Brownian motion, and $\mu(\cdot)$ and $\sigma(\cdot)$ are smooth functions of $x_{t-1}$, and apply the local smoothing function lowess of R or S-Plus to obtain nonparametric estimates of $\mu(\cdot)$ and $\sigma(\cdot)$; see Cleveland (1979). For simplicity, we use $|y_t|$ as a proxy of the volatility of $x_t$.

Figure 6 Time Plot of U.S. Weekly 3-Month Treasury Bill Rate in the Secondary Market from 1970 to 1997

For the simple model considered, $\mu(x_{t-1})$ is the conditional mean of $y_t$ given $x_{t-1}$, that is, $\mu(x_{t-1}) = E(y_t | x_{t-1})$. Figure 7(a) shows the scatterplot of $y_t$ versus $x_{t-1}$. The plot also contains the local smooth estimate of $\mu(x_{t-1})$ obtained by the method of lowess in the statistical package R. The estimate is essentially zero. However, to better understand the estimate, Figure 7(b) shows the estimate $\hat{\mu}(x_{t-1})$ on a finer scale. It is interesting to see that $\hat{\mu}(x_{t-1})$ is positive when $x_{t-1}$ is small, but becomes negative when $x_{t-1}$ is large. This is in agreement with the common sense that when the interest rate is high, it is expected to come down, and when the rate is low, it is expected to increase. Figure 7(c) shows the scatterplot of $|y_t|$ versus $x_{t-1}$ and the estimate of $\sigma(x_{t-1})$ via lowess. The plot confirms that the higher the interest rate, the larger the volatility. Figure 7(d) shows the estimate $\hat{\sigma}(x_{t-1})$ on a finer scale. Clearly the volatility is an increasing function of $x_{t-1}$ and the slope seems to accelerate when $x_{t-1}$ is approaching 10%. This example demonstrates that simple nonparametric methods can be helpful in understanding the dynamic structure of a financial time series.

The following nonlinear models are derived 
with the help of nonparametric methods. 


Figure 7 Estimation of Conditional Mean and Volatility of Weekly 3-Month Treasury Bill Rate via a Local Smoothing Method: (a) $y_t$ versus $x_{t-1}$, where $y_t = x_t - x_{t-1}$ and $x_t$ is the interest rate; (b) estimate of $\mu(x_{t-1})$; (c) $|y_t|$ versus $x_{t-1}$; and (d) estimate of $\sigma(x_{t-1})$

Functional Coefficient AR Model

Recent advances in nonparametric techniques 
enable researchers to relax parametric con¬ 
straints in proposing nonlinear models. In 
some cases, nonparametric methods are used 
in a preliminary study to help select a para¬ 
metric nonlinear model. This is the approach 
taken by Chen and Tsay (1993a) in proposing 
the functional-coefficient autoregressive (FAR) 
model that can be written as 

$$
x_t = f_1(\mathbf{X}_{t-1})\, x_{t-1} + \cdots + f_p(\mathbf{X}_{t-1})\, x_{t-p} + a_t \tag{27}
$$

where $\mathbf{X}_{t-1} = (x_{t-1}, \ldots, x_{t-k})'$ is a vector of lagged values of $x_t$. If necessary, $\mathbf{X}_{t-1}$ may also include other explanatory variables available at time $t-1$. The functions $f_i(\cdot)$ of equation (27) are assumed to be continuous, even twice differentiable, almost surely with respect to their arguments. Most of the nonlinear models discussed before are special cases of the FAR model. In application, one can use nonparametric methods such as kernel regression or local linear regression to estimate the functional coefficients $f_i(\cdot)$, especially when the dimension of $\mathbf{X}_{t-1}$ is low (e.g., $\mathbf{X}_{t-1}$ is a scalar). Recently, Cai, Fan, and Yao (2000) applied the local linear regression method to estimate $f_i(\cdot)$ and showed that substantial improvements in 1-step ahead forecasts can be achieved by using FAR models.

Nonlinear Additive AR Model 

A major difficulty in applying nonparametric methods to nonlinear time series analysis is the "curse of dimensionality." Consider a general nonlinear AR($p$) process $x_t = f(x_{t-1}, \ldots, x_{t-p}) + a_t$. A direct application of nonparametric methods to estimate $f(\cdot)$ would require $p$-dimensional smoothing, which is hard to do when $p$ is large, especially if the number of data points is not large. A simple, yet effective way to overcome this difficulty is to entertain an additive model that only requires lower dimensional smoothing. A time series $x_t$ follows a nonlinear additive AR (NAAR) model if

$$
x_t = f_0(t) + \sum_{i=1}^{p} f_i(x_{t-i}) + a_t \tag{28}
$$

where the $f_i(\cdot)$ are continuous functions almost surely. Because each function $f_i(\cdot)$ has a single argument, it can be estimated nonparametrically using one-dimensional smoothing techniques and hence avoids the curse of dimensionality. In application, an iterative estimation method that estimates $f_i(\cdot)$ nonparametrically conditioned on estimates of $f_j(\cdot)$ for all $j \neq i$ is used to estimate a NAAR model; see Chen and Tsay (1993b) for further details and examples of NAAR models.
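The iterative scheme described above resembles a backfitting loop. The sketch below is only an illustration under simplifying assumptions that are not taken from Chen and Tsay (1993b): $f_0$ is treated as a constant, each $f_i$ is smoothed with a Gaussian-kernel regression evaluated at the observed lags, and each component is centered for identifiability. All names are hypothetical.

```python
import math

def kernel_smooth(x, y, h):
    """Kernel-regression fit of y on x, evaluated at each observed x (Gaussian weights)."""
    fitted = []
    for x0 in x:
        k = [math.exp(-0.5 * ((x0 - xt) / h) ** 2) for xt in x]
        fitted.append(sum(kt * yt for kt, yt in zip(k, y)) / sum(k))
    return fitted

def backfit_naar(series, p, h, n_iter=10):
    """Backfitting sketch for x_t = f_0 + sum_i f_i(x_{t-i}) + a_t with constant f_0."""
    T = len(series)
    y = series[p:]
    n = len(y)
    # lags[i-1][t] holds x_{t-i} aligned with y[t]
    lags = [[series[t - i] for t in range(p, T)] for i in range(1, p + 1)]
    f0 = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for i in range(p):
            # Partial residual: remove f_0 and every component except the i-th
            partial = [y[t] - f0 - sum(f[j][t] for j in range(p) if j != i)
                       for t in range(n)]
            fit = kernel_smooth(lags[i], partial, h)
            mean_fit = sum(fit) / n
            f[i] = [v - mean_fit for v in fit]  # center each component
    return f0, f
```

Each pass smooths one lag at a time, so only one-dimensional smoothing is ever required.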

The additivity assumption is rather restric¬ 
tive and needs to be examined carefully in 
application. Chen, Liu, and Tsay (1995) con¬ 
sider test statistics for checking the additivity 
assumption. 

Nonlinear State-Space Model 

Making use of recent advances in MCMC methods (Gelfand and Smith, 1990), Carlin, Polson, and Stoffer (1992) propose a Monte Carlo approach for nonlinear state-space modeling. The model considered is

$$
S_t = f_t(S_{t-1}) + u_t, \qquad x_t = g_t(S_t) + v_t \tag{29}
$$

where $S_t$ is the state vector, $f_t(\cdot)$ and $g_t(\cdot)$ are known functions depending on some unknown parameters, $\{u_t\}$ is a sequence of IID multivariate random vectors with zero mean and nonnegative definite covariance matrix $\Sigma_u$, $\{v_t\}$ is a sequence of IID random variables with mean zero and variance $\sigma_v^2$, and $\{u_t\}$ is independent of $\{v_t\}$.

Monte Carlo techniques are employed to handle the nonlinear evolution of the state transition equation because the whole conditional distribution function of $S_t$ given $S_{t-1}$ is needed for a nonlinear system. Other numerical smoothing methods for nonlinear time series analysis have been considered by Kitagawa (1998) and the references therein. MCMC methods (or computing-intensive numerical methods) are powerful tools for nonlinear time series analysis. Their potential has not been fully explored. However, the assumption of knowing $f_t(\cdot)$ and $g_t(\cdot)$ in model (29) may hinder practical use of the proposed method. A possible solution to overcome this limitation is to use nonparametric methods such as the analyses considered in FAR and NAAR models to specify $f_t(\cdot)$ and $g_t(\cdot)$ before using nonlinear state-space models.


Neural Networks 

A popular topic in modern data analysis is the neural network, which can be classified as a semiparametric method. The literature on neural networks is enormous, and their application spreads over many scientific areas with varying degrees of success; see Ripley (1993, Sections 2 and 10). Cheng and Titterington (1994) provide information on neural networks from a statistical viewpoint. In this subsection, we focus solely on the feed-forward neural networks in which inputs are connected to one or more neurons, or nodes, in the input layer, and these nodes are connected forward to further layers until they reach the output layer. Figure 8 shows an example of a simple feed-forward network for univariate time series analysis with one hidden layer. The input layer has two nodes, and the hidden layer has three. The input nodes are connected forward to each and every node in the hidden layer, and these hidden nodes are connected to the single node in the output layer. We call the network a 2-3-1 feed-forward network. More complicated neural networks, including those with feedback connections, have been proposed in the literature, but the feed-forward networks are most relevant to our study.

Figure 8 A Feed-Forward Neural Network with One Hidden Layer for Univariate Time Series Analysis


Feed-Forward Neural Networks 

A neural network processes information from one layer to the next by an "activation function." Consider a feed-forward network with one hidden layer. The $j$th node in the hidden layer is defined as

$$
h_j = f_j\left( \alpha_{0j} + \sum_{i \to j} w_{ij} x_i \right) \tag{30}
$$

where $x_i$ is the value of the $i$th input node, $f_j(\cdot)$ is an activation function typically taken to be the logistic function

$$
f_j(z) = \frac{\exp(z)}{1 + \exp(z)}
$$

$\alpha_{0j}$ is called the bias, the summation $i \to j$ means summing over all input nodes feeding to $j$, and $w_{ij}$ are the weights. For illustration, the $j$th node of the hidden layer of the 2-3-1 feed-forward network in Figure 8 is

$$
h_j = \frac{\exp(\alpha_{0j} + w_{1j} x_1 + w_{2j} x_2)}{1 + \exp(\alpha_{0j} + w_{1j} x_1 + w_{2j} x_2)}, \quad j = 1, 2, 3. \tag{31}
$$

For the output layer, the node is defined as

$$
o = f_o\left( \alpha_{0o} + \sum_{j \to o} w_{jo} h_j \right) \tag{32}
$$

where the activation function $f_o(\cdot)$ is either linear or a Heaviside function. If $f_o(\cdot)$ is linear, then

$$
o = \alpha_{0o} + \sum_{j=1}^{k} w_{jo} h_j
$$

where $k$ is the number of nodes in the hidden layer. By a Heaviside function, we mean $f_o(z) = 1$ if $z > 0$ and $f_o(z) = 0$ otherwise. A neuron with a Heaviside function is called a threshold neuron, with "1" denoting that the neuron fires its message. For example, the output of the 2-3-1 network in Figure 8 is

$$
o = \alpha_{0o} + w_{1o} h_1 + w_{2o} h_2 + w_{3o} h_3
$$

if the activation function is linear; it is

$$
o = \begin{cases}
1 & \text{if } \alpha_{0o} + w_{1o} h_1 + w_{2o} h_2 + w_{3o} h_3 > 0 \\
0 & \text{if } \alpha_{0o} + w_{1o} h_1 + w_{2o} h_2 + w_{3o} h_3 \le 0
\end{cases}
$$

if $f_o(\cdot)$ is a Heaviside function.

Combining the layers, the output of a feed-forward neural network can be written as

$$
o = f_o\left[ \alpha_{0o} + \sum_{j \to o} w_{jo}\, f_j\left( \alpha_{0j} + \sum_{i \to j} w_{ij} x_i \right) \right] \tag{33}
$$

If one also allows for direct connections from the input layer to the output layer, then the network becomes

$$
o = f_o\left[ \alpha_{0o} + \sum_{i \to o} \alpha_{io} x_i + \sum_{j \to o} w_{jo}\, f_j\left( \alpha_{0j} + \sum_{i \to j} w_{ij} x_i \right) \right] \tag{34}
$$

where the first summation is summing over the input nodes. When the activation function of the output layer is linear, the direct connections from the input nodes to the output node represent a linear function between the inputs and output. Consequently, in this particular case model (34) is a generalization of linear models. For the 2-3-1 network in Figure 8, if the output activation function is linear, then equation (33) becomes

$$
o = \alpha_{0o} + \sum_{j=1}^{3} w_{jo} h_j
$$

where $h_j$ is given in equation (31). The network thus has 13 parameters. If equation (34) is used, then the network becomes

$$
o = \alpha_{0o} + \sum_{i=1}^{2} \alpha_{io} x_i + \sum_{j=1}^{3} w_{jo} h_j
$$

where again $h_j$ is given in equation (31). The number of parameters of the network increases to 15.
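To make equations (33) and (34) concrete, the following sketch evaluates a single-hidden-layer network with logistic hidden nodes and counts its parameters. The function names and the argument layout (`hidden_w[j][i]` as the weight from input $i$ to hidden node $j$) are illustrative assumptions.

```python
import math

def logistic(z):
    """Numerically stable logistic function exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, hidden_bias, hidden_w, out_bias, out_w, skip_w=None, heaviside=False):
    """Output of equation (33), or of equation (34) when skip_w is supplied."""
    # Hidden layer, equation (30)
    h = [logistic(b + sum(w * xi for w, xi in zip(row, x)))
         for b, row in zip(hidden_bias, hidden_w)]
    z = out_bias + sum(w * hj for w, hj in zip(out_w, h))
    if skip_w is not None:
        # Direct input-to-output connections of the skip-layer network
        z += sum(a * xi for a, xi in zip(skip_w, x))
    if heaviside:
        return 1.0 if z > 0 else 0.0
    return z  # linear output activation

def count_parameters(n_in, n_hidden, skip=False):
    """Biases plus weights of an n_in - n_hidden - 1 network."""
    n = n_hidden * (n_in + 1) + (n_hidden + 1)
    return n + n_in if skip else n

# The 2-3-1 network has 13 parameters, or 15 with skip-layer connections
print(count_parameters(2, 3), count_parameters(2, 3, skip=True))
```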

We refer to the function in equation (33) or (34) as a semiparametric function because its functional form is known, but the number of nodes and their biases and weights are unknown. The direct connections from the input layer to the output layer in equation (34) mean that the network can skip the hidden layer. We refer to such a network as a skip-layer feed-forward network.

Feed-forward networks are known as multilayer perceptrons in the neural network literature. They can approximate any continuous function uniformly on compact sets by increasing the number of nodes in the hidden layer; see Hornik, Stinchcombe, and White (1989), Hornik (1993), and Chen and Chen (1995). This property of neural networks is the universal approximation property of the multilayer perceptrons. In short, feed-forward neural networks with a hidden layer can be seen as a way to parameterize a general continuous nonlinear function.

Training and Forecasting 

Application of neural networks involves two 
steps. The first step is to train the network (i.e., 
to build a network, including determining the 
number of nodes and estimating their biases 
and weights). The second step is inference, es¬ 
pecially forecasting. The data are often divided 
into two nonoverlapping subsamples in the 
training stage. The first subsample is used to es¬ 
timate the parameters of a given feed-forward 
neural network. The network so built is then 
used in the second subsample to perform fore¬ 
casting and compute its forecasting accuracy. By 
comparing the forecasting performance, one se¬ 
lects the network that outperforms the others as 
the "best" network for making inference. This is 
the idea of cross-validation widely used in sta¬ 
tistical model selection. Other model selection 
methods are also available. 

In a time series application, let $\{(r_t, \mathbf{x}_t) \mid t = 1, \ldots, T\}$ be the available data for network training, where $\mathbf{x}_t$ denotes the vector of inputs and $r_t$ is the series of interest (e.g., log returns of an asset). For a given network, let $o_t$ be the output of the network with input $\mathbf{x}_t$; see equation (34). Training a neural network amounts to choosing its biases and weights to minimize some fitting criterion, for example, the least squares

$$
S^2 = \sum_{t=1}^{T} (r_t - o_t)^2
$$

This is a nonlinear estimation problem that can be solved by several iterative methods. To ensure the smoothness of the fitted function, some additional constraints can be added to the prior minimization problem. In the neural network literature, the back propagation (BP) learning algorithm is a popular method for network training. The BP method, introduced by Bryson and Ho (1969), works backward starting with the output layer and uses a gradient rule to modify the biases and weights iteratively. (Appendix 2A of Ripley, 1993, provides a derivation of back propagation.) Once a feed-forward neural network is built, it can be used to compute forecasts in the forecasting subsample.
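A minimal sketch of gradient-based training for the least squares criterion $S^2$ follows. It assumes a network without skip-layer connections, logistic hidden nodes, a linear output, and plain batch gradient descent with a fixed step size; it is not the specific algorithm of Bryson and Ho (1969), and every name and tuning value is hypothetical.

```python
import math

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(params, x):
    """Return the linear output o and the hidden values h for one input vector x."""
    b_h, W_h, b_o, w_o = params
    h = [logistic(b + sum(w * xi for w, xi in zip(row, x))) for b, row in zip(b_h, W_h)]
    return b_o + sum(w * hj for w, hj in zip(w_o, h)), h

def loss_and_grad(params, X, r):
    """S^2 = sum (r_t - o_t)^2 and its gradient, propagated backward from the output."""
    b_h, W_h, b_o, w_o = params
    k, p = len(b_h), len(X[0])
    g_bh, g_Wh = [0.0] * k, [[0.0] * p for _ in range(k)]
    g_bo, g_wo, loss = 0.0, [0.0] * k, 0.0
    for x, target in zip(X, r):
        o, h = predict(params, x)
        err = o - target
        loss += err ** 2
        d_o = 2.0 * err                      # derivative of S^2 with respect to o_t
        g_bo += d_o
        for j in range(k):
            g_wo[j] += d_o * h[j]
            d_z = d_o * w_o[j] * h[j] * (1.0 - h[j])  # back through the logistic node
            g_bh[j] += d_z
            for i in range(p):
                g_Wh[j][i] += d_z * x[i]
    return loss, (g_bh, g_Wh, g_bo, g_wo)

def train(params, X, r, lr=0.001, n_iter=100):
    """Batch gradient descent on S^2 with a fixed learning rate lr."""
    for _ in range(n_iter):
        _, (g_bh, g_Wh, g_bo, g_wo) = loss_and_grad(params, X, r)
        b_h, W_h, b_o, w_o = params
        params = ([b - lr * g for b, g in zip(b_h, g_bh)],
                  [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W_h, g_Wh)],
                  b_o - lr * g_bo,
                  [w - lr * g for w, g in zip(w_o, g_wo)])
    return params
```

In practice the step size, the starting values, and the stopping rule all affect the fitted network, which is one reason results can vary from one estimation to another.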

Example 7. To illustrate applications of the 
neural network in finance, we consider the 
monthly log returns, in percentages and includ¬ 
ing dividends, for IBM stock from January 1926 
to December 1999. We divide the data into two 
subsamples. The first subsample consisting of 
returns from January 1926 to December 1997 
for 864 observations is used for modeling. Us¬ 
ing model (34) with three inputs and two nodes 
in the hidden layer, we obtain a 3-2-1 network 
for the series. The three inputs are $r_{t-1}$, $r_{t-2}$, and $r_{t-3}$ and the biases and weights are given next:

$$
\hat{r}_t = 3.22 - 1.81 f_1(\mathbf{r}_{t-1}) - 2.28 f_2(\mathbf{r}_{t-1}) - 0.09 r_{t-1} - 0.05 r_{t-2} - 0.12 r_{t-3} \tag{35}
$$

where $\mathbf{r}_{t-1} = (r_{t-1}, r_{t-2}, r_{t-3})$ and the two logistic functions are

$$
f_1(\mathbf{r}_{t-1}) = \frac{\exp(-8.34 - 18.97 r_{t-1} + 2.17 r_{t-2} - 19.17 r_{t-3})}{1 + \exp(-8.34 - 18.97 r_{t-1} + 2.17 r_{t-2} - 19.17 r_{t-3})}
$$

$$
f_2(\mathbf{r}_{t-1}) = \frac{\exp(39.25 - 22.17 r_{t-1} - 17.34 r_{t-2} - 5.98 r_{t-3})}{1 + \exp(39.25 - 22.17 r_{t-1} - 17.34 r_{t-2} - 5.98 r_{t-3})}
$$

The standard error of the residuals for the prior model is 6.56. For comparison, we also built an AR model for the data and obtained

$$
r_t = 1.101 + 0.077 r_{t-1} + a_t, \quad \sigma_a = 6.61 \tag{36}
$$

The residual standard error is slightly greater than that of the feed-forward model in equation (35).
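The fitted 3-2-1 network and the AR(1) benchmark can be evaluated directly from the reported coefficients. The sketch below simply codes equations (35) and (36) as functions of the three most recent percentage log returns; the function names are hypothetical, and a numerically stable form of the logistic function is used because the exponents can be very large.

```python
import math

def logistic(z):
    """Numerically stable logistic function exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def f1(r1, r2, r3):
    """First hidden node of the fitted network; r1, r2, r3 are lags 1 to 3."""
    return logistic(-8.34 - 18.97 * r1 + 2.17 * r2 - 19.17 * r3)

def f2(r1, r2, r3):
    """Second hidden node of the fitted network."""
    return logistic(39.25 - 22.17 * r1 - 17.34 * r2 - 5.98 * r3)

def nn_forecast(r1, r2, r3):
    """One-step ahead fitted value from the 3-2-1 skip-layer network, equation (35)."""
    return (3.22 - 1.81 * f1(r1, r2, r3) - 2.28 * f2(r1, r2, r3)
            - 0.09 * r1 - 0.05 * r2 - 0.12 * r3)

def ar1_forecast(r1):
    """One-step ahead forecast from the AR(1) model, equation (36)."""
    return 1.101 + 0.077 * r1

# Usage with illustrative (made-up) lagged returns in percent
print(nn_forecast(1.5, -0.8, 2.1), ar1_forecast(1.5))
```

Because the logistic arguments have large coefficients, each hidden node is close to 0 or 1 for most inputs, so the network behaves much like a switching linear model.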

Forecast Comparison 

The monthly returns of IBM stock in 1998 and 1999 form the second subsample and are used to evaluate the out-of-sample forecasting performance of neural networks. As a benchmark for comparison, we use the sample mean of $r_t$ in the first subsample as the 1-step ahead forecast for all the monthly returns in the second subsample. This corresponds to assuming that the log monthly price of IBM stock follows a random walk with drift. The mean squared forecast error (MSFE) of this benchmark model is 91.85. For the AR(1) model in equation (36), the MSFE of 1-step ahead forecasts is 91.70. Thus, the AR(1) model slightly outperforms the benchmark. For the 3-2-1 feed-forward network in equation (35), the MSFE is 91.74, which is essentially the same as that of the AR(1) model.

Example 8. Nice features of the feed-forward 
network include its flexibility and wide ap¬ 
plicability. For illustration, we use the net¬ 
work with a Heaviside activation function for 
the output layer to forecast the direction of 
price movement for IBM stock considered in 
Example 7. Define a direction variable as

$$
d_t = \begin{cases} 1 & \text{if } r_t > 0 \\ 0 & \text{if } r_t \le 0 \end{cases}
$$

We use eight input nodes consisting of the first four lagged values of both $r_t$ and $d_t$ and four nodes in the hidden layer to build an 8-4-1 feed-forward network for $d_t$ in the first subsample. The resulting network is then used to compute the 1-step ahead probability of an "upward movement" (i.e., a positive return) for the following month in the second subsample.

Figure 9 shows a typical output of probability forecasts and the actual directions in the second subsample with the latter denoted by circles. A horizontal line of 0.5 is added to the plot. If we take a rigid approach by letting $\hat{d}_t = 1$ if the probability forecast is greater than or equal to 0.5 and $\hat{d}_t = 0$ otherwise, then the neural network has a success rate of 0.58. The success rate of the network varies substantially from one estimation to another, and the network uses 49 parameters.

Figure 9 One-Step Ahead Probability Forecasts for a Positive Monthly Return for IBM Stock Using an 8-4-1 Feed-Forward Neural Network

Note: The forecasting period is from January 1998 to December 1999.

To gain more insight, we did a simulation 
study of running the 8-4-1 feed-forward net¬ 
work 500 times and computed the number of 
errors in predicting the upward and downward 
movement using the same method as before. 
The mean and median of errors over the 500 
runs are 11.28 and 11, respectively, whereas the 
maximum and minimum number of errors are 
18 and 4. For comparison, we also did a simu¬ 
lation with 500 runs using a random walk with 
drift, that is,

$$
\hat{d}_t = \begin{cases} 1 & \text{if } \hat{r}_t = 1.19 + \epsilon_t > 0 \\ 0 & \text{otherwise} \end{cases}
$$

where 1.19 is the average monthly log return for IBM stock from January 1926 to December 1997 and $\{\epsilon_t\}$ is a sequence of IID $N(0, 1)$ random variables. The mean and median of the number of forecast errors become 10.53 and 11, whereas the maximum and minimum number of errors are 17 and 5, respectively. Figure 10 shows the histograms of the number of forecast errors for the two simulations. The results show that the 8-4-1 feed-forward neural network does not outperform the simple model that assumes a random walk with drift for the monthly log price of IBM stock.


NONLINEARITY TESTS 

In this section, we discuss some nonlinearity 
tests available in the literature that have decent 
power against the nonlinear models considered 
earlier in this entry. The tests discussed include 
both parametric and nonparametric statistics. 
The Ljung-Box statistics of squared residuals, 
the bispectral test, and the Brock, Dechert, 
and Scheinkman (BDS) test are nonparametric 
methods. The RESET test (Ramsey, 1969), the F 
tests of Tsay (1986, 1989), and other Lagrange 
multiplier and likelihood ratio tests depend on 
specific parametric functions. Because nonlinearity may occur in many ways, there exists no single test that dominates the others in detecting nonlinearity.

Figure 10 Histograms of the Number of Forecasting Errors for the Directional Movements of Monthly Log Returns of IBM Stock (panels: neural network; random walk with a drift)

Note: The forecasting period is from January 1998 to December 1999.

Nonparametric Tests 

Under the null hypothesis of linearity, residu¬ 
als of a properly specified linear model should 
be independent. Any violation of independence 
in the residuals indicates inadequacy of the 
entertained model, including the linearity as¬ 
sumption. This is the basic idea behind various 
nonlinearity tests. In particular, some of the 
nonlinearity tests are designed to check for pos¬ 
sible violation in quadratic forms of the under¬ 
lying time series. 

Q-Statistic of Squared Residuals 

McLeod and Li (1983) apply the Ljung- 
Box statistics to the squared residuals of an 
ARMA(p, q) model to check for model inade¬ 
quacy. The test statistic is 

$$
Q(m) = T(T + 2) \sum_{i=1}^{m} \frac{\hat{\rho}_i^2(a_t^2)}{T - i}
$$

where $T$ is the sample size, $m$ is a properly chosen number of autocorrelations used in the test, $a_t$ denotes the residual series, and $\hat{\rho}_i(a_t^2)$ is the lag-$i$ ACF of $a_t^2$. If the entertained linear model is adequate, $Q(m)$ is asymptotically a chi-squared random variable with $m - p - q$ degrees of freedom. The prior Q-statistic is useful in detecting conditional heteroscedasticity of $a_t$ and is asymptotically equivalent to the Lagrange multiplier test statistic of Engle (1982) for ARCH models. The null hypothesis of the statistics is $H_0: \beta_1 = \cdots = \beta_m = 0$, where $\beta_i$ is the coefficient of $a_{t-i}^2$ in the linear regression

$$
a_t^2 = \beta_0 + \beta_1 a_{t-1}^2 + \cdots + \beta_m a_{t-m}^2 + e_t
$$

for $t = m + 1, \ldots, T$. Because the statistic is computed from residuals (not directly from the observed returns), the number of degrees of freedom is $m - p - q$.
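The Q-statistic of squared residuals is simple to compute once the residuals are available. The sketch below uses the usual sample ACF with the full-sample sum of squares in the denominator (an assumption about the estimator); the function names are hypothetical, and the comparison with the chi-squared distribution with $m - p - q$ degrees of freedom is left to the reader.

```python
def acf(series, lag):
    """Lag-`lag` sample autocorrelation, using the full-sample variance in the denominator."""
    T = len(series)
    mean = sum(series) / T
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, T))
    return num / denom

def mcleod_li_q(residuals, m):
    """Q(m) = T (T + 2) * sum_{i=1}^{m} rho_i^2(a_t^2) / (T - i) for squared residuals."""
    squared = [a * a for a in residuals]
    T = len(squared)
    return T * (T + 2) * sum(acf(squared, i) ** 2 / (T - i) for i in range(1, m + 1))

# Usage on a tiny illustrative residual series
print(mcleod_li_q([1.0, 2.0, 1.0, 2.0], 1))  # prints 4.5
```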

Bispectral Test 

This test can be used to test for linearity and 
Gaussianity. It depends on the result that a 
properly normalized bispectrum of a linear 
time series is constant over all frequencies 
and that the constant is zero under normality. 
The bispectrum of a time series is the Fourier transform of its third-order moments. For a stationary time series $x_t$ in equation (1), the third-order moment is defined as

$$
c(u, v) = g \sum_{k=-\infty}^{\infty} \psi_k \psi_{k+u} \psi_{k+v} \tag{37}
$$

where $u$ and $v$ are integers, $g = E(a_t^3)$, $\psi_0 = 1$, and $\psi_k = 0$ for $k < 0$. Taking Fourier transforms of equation (37), we have

$$
b_3(w_1, w_2) = \frac{g}{4\pi^2}\, \Gamma[-(w_1 + w_2)]\, \Gamma(w_1)\, \Gamma(w_2) \tag{38}
$$

where $\Gamma(w) = \sum_{u=0}^{\infty} \psi_u \exp(-iwu)$ with $i = \sqrt{-1}$, and $w_i$ are frequencies. Yet the spectral density function of $x_t$ is given by

$$
p(w) = \frac{\sigma_a^2}{2\pi} |\Gamma(w)|^2
$$

where $w$ denotes the frequency. Consequently, the function

$$
b(w_1, w_2) = \frac{|b_3(w_1, w_2)|^2}{p(w_1)\, p(w_2)\, p(w_1 + w_2)} = \text{constant for all } (w_1, w_2) \tag{39}
$$

The bispectrum test makes use of the property in equation (39). Basically, it estimates the function $b(w_1, w_2)$ in equation (39) over a suitably chosen grid of points and applies a test statistic similar to Hotelling's $T^2$ statistic to check the constancy of $b(w_1, w_2)$. For a linear Gaussian series, $E(a_t^3) = g = 0$ so that the bispectrum is zero for all frequencies $(w_1, w_2)$. For further details of the bispectral test, see Priestley (1988), Subba Rao and Gabr (1984), and Hinich (1982).
Limited experience shows that the test has de¬ 
cent power when the sample size is large. 

BDS Statistic 

Brock, Dechert, and Scheinkman (1987) propose a test statistic, commonly referred to as the BDS test, to test the IID assumption of a time series. The statistic is, therefore, different from other test statistics discussed because the latter mainly focus on either the second- or third-order properties of $x_t$. The basic idea of the BDS test is to make use of a "correlation integral" popular in chaotic time series analysis. Given a $k$-dimensional time series $X_t$ and observations $\{X_t\}_{t=1}^{T_k}$, define the correlation integral as

$$
C_k(\delta) = \lim_{T_k \to \infty} \frac{2}{T_k (T_k - 1)} \sum_{i < j} I_\delta(X_i, X_j) \tag{40}
$$

where $I_\delta(u, v)$ is an indicator variable that equals one if $\|u - v\| < \delta$, and zero otherwise, where $\|\cdot\|$ is the supnorm. The correlation integral measures the fraction of data pairs of $\{X_t\}$ that are within a distance of $\delta$ from each other.

Consider next a time series $x_t$. Construct $k$-dimensional vectors $X_t^k = (x_t, x_{t+1}, \ldots, x_{t+k-1})'$, which are called $k$-histories. The idea of the BDS test is as follows. Treat a $k$-history as a point in the $k$-dimensional space. If $\{x_t\}_{t=1}^{T}$ are indeed IID random variables, then the $k$-histories $\{X_t^k\}_{t=1}^{T_k}$ should show no pattern in the $k$-dimensional space. Consequently, the correlation integrals should satisfy the relation $C_k(\delta) = [C_1(\delta)]^k$. Any departure from the prior relation suggests that $x_t$ are not IID. As a simple but informative example, consider a sequence of IID random variables from the uniform distribution over $[0, 1]$. Let $[a, b]$ be a subinterval of $[0, 1]$ and consider the "2-history" $(x_t, x_{t+1})$, which represents a point in the two-dimensional space. Under the IID assumption, the expected number of 2-histories in the subspace $[a, b] \times [a, b]$ should equal the square of the expected number of $x_t$ in $[a, b]$.

This idea can be formally examined by using sample counterparts of correlation integrals. Define

$$
C_\ell(\delta, T) = \frac{2}{T_k (T_k - 1)} \sum_{i < j} I_\delta(X_i^*, X_j^*), \quad \ell = 1, k
$$

where $T_\ell = T - \ell + 1$ and $X_i^* = x_i$ if $\ell = 1$ and $X_i^* = X_i^k$ if $\ell = k$. Under the null hypothesis that $\{x_t\}$ are IID with a nondegenerated distribution function $F(\cdot)$, Brock, Dechert, and Scheinkman (1987) show that

$$
C_k(\delta, T) \to [C_1(\delta)]^k \quad \text{with probability 1, as } T \to \infty
$$

for any fixed $k$ and $\delta$. Furthermore, the statistic $\sqrt{T}\{C_k(\delta, T) - [C_1(\delta, T)]^k\}$ is asymptotically distributed as normal with mean zero and variance

$$
\sigma_k^2(\delta) = 4 \left( N^k + 2 \sum_{j=1}^{k-1} N^{k-j} C^{2j} + (k - 1)^2 C^{2k} - k^2 N C^{2k-2} \right)
$$

where $C = \int [F(z + \delta) - F(z - \delta)]\, dF(z)$ and $N = \int [F(z + \delta) - F(z - \delta)]^2\, dF(z)$. Note that $C_1(\delta, T)$ is a consistent estimate of $C$, and $N$ can be consistently estimated by

$$
N(\delta, T) = \frac{6}{T_k (T_k - 1)(T_k - 2)} \sum_{t < s < u} I_\delta(x_t, x_s)\, I_\delta(x_s, x_u)
$$

The BDS test statistic is then defined as

$$
D_k(\delta, T) = \sqrt{T}\, \{C_k(\delta, T) - [C_1(\delta, T)]^k\} / \sigma_k(\delta, T) \tag{41}
$$

where $\sigma_k(\delta, T)$ is obtained from $\sigma_k(\delta)$ when $C$ and $N$ are replaced by $C_1(\delta, T)$ and $N(\delta, T)$, respectively. This test statistic has a standard normal limiting distribution. For further discussion and examples of applying the BDS test, see Hsieh (1989) and Brock, Hsieh, and LeBaron (1991). In application, one should remove linear dependence, if any, from the data before applying the BDS test. The test may be sensitive to the choices of $\delta$ and $k$, especially when $k$ is large.
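The sample correlation integrals at the heart of the BDS test can be computed by brute force. The sketch below returns $C_k(\delta, T)$, $C_1(\delta, T)$, and the gap $C_k(\delta, T) - [C_1(\delta, T)]^k$; it does not compute the variance estimate or the standardized statistic $D_k$. Using only the first $T_k$ observations for $C_1$ is an assumption made to match the common normalization, and the function names are hypothetical.

```python
def sup_distance(u, v):
    """Supnorm distance between two points of equal dimension."""
    return max(abs(a - b) for a, b in zip(u, v))

def correlation_integral(points, delta):
    """Fraction of pairs (i < j) whose supnorm distance is strictly below delta."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if sup_distance(points[i], points[j]) < delta:
                close += 1
    return 2.0 * close / (n * (n - 1))

def bds_gap(x, k, delta):
    """Return C_k, C_1, and C_k - C_1**k built from the k-histories of x."""
    T_k = len(x) - k + 1
    histories = [tuple(x[t:t + k]) for t in range(T_k)]
    scalars = [(x[t],) for t in range(T_k)]  # assumption: same T_k points for C_1
    c_k = correlation_integral(histories, delta)
    c_1 = correlation_integral(scalars, delta)
    return c_k, c_1, c_k - c_1 ** k

# Usage: a perfectly alternating (clearly non-IID) series gives a gap far from zero
print(bds_gap([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], k=2, delta=0.5))
```

The double loop costs on the order of $T^2$ comparisons, which is acceptable for the sample sizes typical of monthly or weekly data.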


Parametric Tests 

Turning to parametric tests, we consider the 
RESET test of Ramsey (1969) and its general¬ 
izations. We also discuss some test statistics for 
detecting threshold nonlinearity. 

The RESET Test 

Ramsey (1969) proposes a specification test for
linear least squares regression analysis. The test
is referred to as a RESET test and is readily applicable
to linear AR models. Consider the linear AR(p) model

$$x_t = X_{t-1}'\phi + a_t \tag{42}$$

where $X_{t-1} = (1, x_{t-1}, \ldots, x_{t-p})'$ and
$\phi = (\phi_0, \phi_1, \ldots, \phi_p)'$. The first step of the RESET test is to
obtain the least squares estimate $\hat\phi$ of equation
(42) and compute the fit $\hat x_t = X_{t-1}'\hat\phi$, the residual
$\hat a_t = x_t - \hat x_t$, and the sum of squared residuals
$SSR_0 = \sum_{t=p+1}^{T} \hat a_t^2$, where T is the sample size.
In the second step, consider the linear regression

$$\hat a_t = X_{t-1}'\alpha_1 + M_{t-1}'\alpha_2 + v_t \tag{43}$$

where $M_{t-1} = (\hat x_t^2, \ldots, \hat x_t^{s+1})'$ for some $s \ge 1$, and
compute the least squares residuals

$$\hat v_t = \hat a_t - X_{t-1}'\hat\alpha_1 - M_{t-1}'\hat\alpha_2$$

and the sum of squared residuals $SSR_1 = \sum_{t=p+1}^{T} \hat v_t^2$
of the regression. The basic idea of
the RESET test is that if the linear AR(p) model
in equation (42) is adequate, then $\alpha_1$ and $\alpha_2$ of
equation (43) should be zero. This can be tested
by the usual F statistic of equation (43) given by

$$F = \frac{(SSR_0 - SSR_1)/g}{SSR_1/(T - p - g)} \quad \text{with} \quad g = s + p + 1 \tag{44}$$

which, under the linearity and normality assumption,
has an F distribution with degrees of
freedom g and T − p − g.

Because $\hat x_t^k$ for $k = 2, \ldots, s + 1$ tend to be
highly correlated with $X_{t-1}$ and among themselves,
principal components of $M_{t-1}$ that are
not collinear with $X_{t-1}$ are often used in fitting
equation (43).

Keenan (1985) proposes a nonlinearity test for
time series that uses $\hat x_t^2$ only and modifies the
second step of the RESET test to avoid multicollinearity
between $\hat x_t^2$ and $X_{t-1}$. Specifically,
the linear regression (43) is divided into two
steps. In step 2(a), one removes linear dependence
of $\hat x_t^2$ on $X_{t-1}$ by fitting the regression

$$\hat x_t^2 = X_{t-1}'\beta + u_t$$

and obtaining the residual $\hat u_t = \hat x_t^2 - X_{t-1}'\hat\beta$. In
step 2(b), consider the linear regression

$$\hat a_t = \hat u_t \alpha + v_t$$

and obtain the sum of squared residuals
$SSR_1 = \sum_{t=p+1}^{T} (\hat a_t - \hat u_t \hat\alpha)^2 = \sum_{t=p+1}^{T} \hat v_t^2$
to test the null hypothesis $\alpha = 0$.
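
The two-step RESET procedure of equations (42) to (44) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `ols` and `reset_test` are hypothetical, the least squares fit uses plain normal equations rather than the principal-components refinement mentioned in the text, and the simulated AR(1) series at the end is made up for demonstration only.

```python
import random


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, residuals)."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid


def reset_test(x, p, s=1):
    """Return (F, g, T - p - g) for the RESET test of a linear AR(p) model."""
    T = len(x)
    X = [[1.0] + [x[t - i] for i in range(1, p + 1)] for t in range(p, T)]
    y = [x[t] for t in range(p, T)]
    # Step 1: fit the AR(p) model, keep residuals and fitted values.
    _, a_hat = ols(X, y)
    ssr0 = sum(a * a for a in a_hat)
    fit = [yi - ai for yi, ai in zip(y, a_hat)]
    # Step 2: regress the residuals on X_{t-1} and powers 2..s+1 of the fit.
    X2 = [r + [f ** k for k in range(2, s + 2)] for r, f in zip(X, fit)]
    _, v_hat = ols(X2, a_hat)
    ssr1 = sum(v * v for v in v_hat)
    g = s + p + 1
    F = ((ssr0 - ssr1) / g) / (ssr1 / (T - p - g))
    return F, g, T - p - g


# Illustrative data: a simulated linear AR(1) series (hypothetical parameters).
random.seed(0)
x = [0.0]
for _ in range(299):
    x.append(0.5 * x[-1] + random.gauss(0.0, 1.0))
print(reset_test(x, p=1, s=1))
```

The returned F value would be compared with an F distribution with the two returned degrees of freedom. For larger s, the powers of the fitted values become nearly collinear, which is why the text recommends principal components in practice.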

The F Test 

To improve the power of Keenan's test and
the RESET test, Tsay (1986) uses a different
choice of the regressor $M_{t-1}$. Specifically, he
suggests using $M_{t-1} = \mathrm{vech}(X_{t-1}X_{t-1}')$, where
vech(A) denotes the half-stacking vector of the
matrix A using elements on and below the diagonal
only. For example, if p = 2, then
$M_{t-1} = (x_{t-1}^2, x_{t-1}x_{t-2}, x_{t-2}^2)'$. The dimension of $M_{t-1}$
is p(p + 1)/2 for an AR(p) model. In practice,
the test is simply the usual partial F statistic for
testing $\alpha = 0$ in the linear least squares regression

$$x_t = X_{t-1}'\phi + M_{t-1}'\alpha + e_t$$

where $e_t$ denotes the error term. Under the
assumption that $x_t$ is a linear AR(p) process, the
partial F statistic follows an F distribution with
degrees of freedom g and T − p − g − 1, where
g = p(p + 1)/2. We refer to this F test as the
Ori-F test. Luukkonen, Saikkonen, and
Terasvirta (1988) further extend the test by
augmenting $M_{t-1}$ with cubic terms $x_{t-i}^3$ for
i = 1,..., p.
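
To make the construction of $M_{t-1}$ concrete, the following Python sketch builds the half-stacked cross-product terms from the lagged values. The function names `vech_cross_terms` and `tsay_regressors` are hypothetical, and the sketch assumes (consistent with the stated dimension p(p + 1)/2) that the constant term of $X_{t-1}$ is excluded when forming the cross products.

```python
def vech_cross_terms(lags):
    """Half-stack lags * lags' using elements on and below the diagonal.

    `lags` is (x_{t-1}, ..., x_{t-p}) without the constant term.
    """
    p = len(lags)
    # Column j of the lower triangle, rows i = j, ..., p-1
    return [lags[i] * lags[j] for j in range(p) for i in range(j, p)]


def tsay_regressors(x, p):
    """Rows (1, x_{t-1}, ..., x_{t-p}, M_{t-1}) for every usable time index."""
    rows = []
    for t in range(p, len(x)):
        lags = [x[t - i] for i in range(1, p + 1)]
        rows.append([1.0] + lags + vech_cross_terms(lags))
    return rows


# For p = 2 the quadratic block is (x_{t-1}^2, x_{t-1} x_{t-2}, x_{t-2}^2).
print(vech_cross_terms([2.0, 3.0]))
```

The Ori-F statistic is then the partial F statistic for the quadratic block in a regression of $x_t$ on these rows.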

Threshold Test 

When the alternative model under study is a 
SETAR model, one can derive specific test statis¬ 
tics to increase the power of the test. One of the 
specific tests is the likelihood ratio statistic. This 
test, however, encounters the difficulty of unde¬ 
fined parameters under the null hypothesis of 
linearity because the threshold is undefined for 
a linear AR process. Another specific test seeks 
to transform testing threshold nonlinearity into 
detecting model changes. It is then interesting 
to discuss the differences between these two 
specific tests for threshold nonlinearity. 

To simplify the discussion, let us consider the
simple case that the alternative model is a 2-regime
SETAR model with threshold variable
$x_{t-d}$. The null hypothesis is $H_0$: $x_t$ follows the
linear AR(p) model

$$x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + a_t \tag{45}$$

whereas the alternative hypothesis is $H_a$: $x_t$ follows
the SETAR model

$$x_t = \begin{cases}
\phi_0^{(1)} + \sum_{i=1}^{p} \phi_i^{(1)} x_{t-i} + a_{1t} & \text{if } x_{t-d} < r_1 \\
\phi_0^{(2)} + \sum_{i=1}^{p} \phi_i^{(2)} x_{t-i} + a_{2t} & \text{if } x_{t-d} \ge r_1
\end{cases} \tag{46}$$

where $r_1$ is the threshold. For a given realization
$\{x_t\}_{t=1}^{T}$ and assuming normality, let
$l_0(\hat\phi, \hat\sigma_a^2)$ be the log likelihood function evaluated
at the maximum likelihood estimates of
$\phi = (\phi_0, \ldots, \phi_p)'$ and $\sigma_a^2$. This is easy to compute.
The likelihood function under the alternative
is also easy to compute if the threshold $r_1$ is
given. Let $l_1(r_1; \hat\phi_1, \hat\sigma_1^2; \hat\phi_2, \hat\sigma_2^2)$ be the log likelihood
function evaluated at the maximum likelihood
estimates of $\phi_i = (\phi_0^{(i)}, \ldots, \phi_p^{(i)})'$ and $\sigma_i^2$
conditioned on knowing the threshold $r_1$. The
log likelihood ratio $l(r_1)$ defined as

$$l(r_1) = l_1(r_1; \hat\phi_1, \hat\sigma_1^2; \hat\phi_2, \hat\sigma_2^2) - l_0(\hat\phi, \hat\sigma_a^2)$$

is then a function of the threshold $r_1$, which is
unknown. Yet under the null hypothesis, there
is no threshold and $r_1$ is not defined. The parameter
$r_1$ is referred to as a nuisance parameter
under the null hypothesis. Consequently, the
asymptotic distribution of the likelihood ratio
is very different from that of the conventional
likelihood ratio statistics. (See Chan, 1991, for
further details and critical values of the test.) A
common approach is to use $l_{\max} = \sup_{v < r_1 < u} l(r_1)$
as the test statistic, where v and u are prespecified
lower and upper bounds of the threshold.
Davis (1987) and Andrews and Ploberger
(1994) provide further discussion on hypothesis
testing involving nuisance parameters under
the null hypothesis. Simulation is often used to
obtain empirical critical values of the test statistic
$l_{\max}$, which depends on the choices of v and
u. The average of $l(r_1)$ over $r_1 \in [v, u]$ is also
considered by Andrews and Ploberger as a test
statistic.

Tsay (1989) makes use of arranged autoregression
and recursive estimation to derive an
alternative test for threshold nonlinearity. The
arranged autoregression seeks to transfer the
SETAR model under the alternative hypothesis
$H_a$ into a model change problem with the
threshold $r_1$ serving as the change point. To
see this, the SETAR model in equation (46) says
that $x_t$ follows essentially two linear models depending
on whether $x_{t-d} < r_1$ or $x_{t-d} \ge r_1$. For
a realization $\{x_t\}_{t=1}^{T}$, $x_{t-d}$ can assume values
$\{x_1, \ldots, x_{T-d}\}$. Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(T-d)}$ be
the ordered statistics of $\{x_t\}_{t=1}^{T-d}$ (i.e., arranging
the observations in increasing order). The SETAR
model can then be written as

$$x_{(j)+d} = \beta_0 + \sum_{i=1}^{p} \beta_i x_{(j)+d-i} + a_{(j)+d}, \qquad j = 1, \ldots, T - d \tag{47}$$

where $\beta_i = \phi_i^{(1)}$ if $x_{(j)} < r_1$ and $\beta_i = \phi_i^{(2)}$ if $x_{(j)} \ge r_1$.
Consequently, the threshold $r_1$ is a change
point for the linear regression in equation (47),
and we refer to equation (47) as an arranged autoregression
(in increasing order of the threshold
variable $x_{t-d}$). Note that the arranged autoregression
in (47) does not alter the dynamic dependence
of $x_t$ on $x_{t-i}$ for i = 1,..., p because $x_{(j)+d}$ still depends
on $x_{(j)+d-i}$ for i = 1,..., p. What is done
is simply to present the SETAR model in the
threshold space instead of in the time space.
That is, the equation with a smaller $x_{t-d}$ appears
before that with a larger $x_{t-d}$. The threshold test
of Tsay (1989) is obtained as follows.

• Step 1. Fit equation (47) using j = 1,..., m,
where m is a prespecified positive integer
(e.g., 30). Denote the least squares estimates
of $\beta_i$ by $\hat\beta_{i,m}$, where m denotes the number of
data points used in estimation.

• Step 2. Compute the predictive residual

$$\hat a_{(m+1)+d} = x_{(m+1)+d} - \hat\beta_{0,m} - \sum_{i=1}^{p} \hat\beta_{i,m} x_{(m+1)+d-i}$$

and its standard error. Let $\hat e_{(m+1)+d}$ be the standardized
predictive residual.

• Step 3. Use the recursive least squares method
to update the least squares estimates to $\hat\beta_{i,m+1}$
by incorporating the new data point $x_{(m+1)+d}$.

• Step 4. Repeat steps 2 and 3 until all data
points are processed.

• Step 5. Consider the linear regression of the
standardized predictive residual

$$\hat e_{(m+j)+d} = \alpha_0 + \sum_{i=1}^{p} \alpha_i x_{(m+j)+d-i} + v_t, \qquad j = 1, \ldots, T - d - m \tag{48}$$

and compute the usual F statistic for testing
$\alpha_i = 0$ in equation (48) for i = 0,..., p. Under
the null hypothesis that $x_t$ follows a linear
AR(p) model, the F ratio has a limiting F distribution
with degrees of freedom p + 1 and
T − d − m − p.

We refer to the earlier F test as a TAR-F test.
The idea behind the test is that under the null
hypothesis there is no model change in the arranged
autoregression in equation (47) so that
the standardized predictive residuals should be
close to IID with mean zero and variance 1. In
this case, they should have no correlations with
the regressors $x_{(m+j)+d-i}$. For further details including
formulas for a recursive least squares
method and some simulation study on performance
of the TAR-F test, see Tsay (1989). The
TAR-F test avoids the problem of nuisance parameters
encountered by the likelihood ratio
test. It does not require knowing the threshold
$r_1$. It simply tests that the predictive residuals
have no correlations with regressors if the null
hypothesis holds. Therefore, the test does not
depend on knowing the number of regimes in
the alternative model. Yet the TAR-F test is not
as powerful as the likelihood ratio test if the
true model is indeed a 2-regime SETAR model
with a known innovational distribution.
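
The rearrangement behind equation (47) is simple to express in code. The Python sketch below only builds the arranged cases, that is, it sorts the regression equations by the value of the threshold variable; it does not carry out the recursive estimation or the F test. The function name `arranged_autoregression` is hypothetical, and as a simplifying assumption the sketch keeps only time indexes for which all p lags and the threshold variable are available.

```python
def arranged_autoregression(x, p, d):
    """Return AR(p) cases (threshold value, response, lags) sorted by x_{t-d}."""
    start = max(p, d)  # first index with all lags and x_{t-d} available
    cases = []
    for t in range(start, len(x)):
        lags = [x[t - i] for i in range(1, p + 1)]
        cases.append((x[t - d], x[t], lags))
    # Present the equations in the threshold space instead of the time space.
    cases.sort(key=lambda case: case[0])
    return cases


# Tiny illustrative series (made-up numbers).
for threshold, response, lags in arranged_autoregression([3, 1, 2, 5, 4], p=1, d=1):
    print(threshold, response, lags)
```

Each case keeps its own lags, so the dynamic dependence of $x_t$ on its past is unchanged; only the order in which the equations are presented differs.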

Applications 

In this subsection, we apply some of the nonlinearity
tests discussed previously to five time
series. For a real financial time series, an AR
model is used to remove any serial correlation
in the data, and the tests apply to the residual
series of the model. The five series employed
are as follows:

1. $a_{1t}$: A simulated series of IID N(0,1) with 500
observations.

2. $a_{2t}$: A simulated series of IID Student-t distribution
with 6 degrees of freedom. The sample
size is 500.

3. $a_{3t}$: The residual series of monthly log returns
of CRSP equal-weighted index from 1926 to
1997 with 864 observations. The linear AR
model used is

$$(1 - 0.180B + 0.099B^3 - 0.105B^9) r_{3t} = 0.0086 + a_{3t}$$

4. $a_{4t}$: The residual series of monthly log returns
of CRSP value-weighted index from 1926 to
1997 with 864 observations. The linear AR
model used is

$$(1 - 0.098B + 0.111B^3 - 0.088B^5) r_{4t} = 0.0078 + a_{4t}$$

5. $a_{5t}$: The residual series of monthly log returns
of IBM stock from 1926 to 1997 with 864 observations.
The linear AR model used is

$$(1 - 0.077B) r_{5t} = 0.011 + a_{5t}$$

Table 2 Nonlinearity Tests for Simulated Series and Some Log Stock Returns

                               BDS ($\delta = 1.5\hat\sigma_a$)
Data       Q(5)    Q(10)     k=2      k=3      k=4      k=5
N(0,1)     3.2     6.5      -0.32    -0.14    -0.15    -0.33
$t_6$      0.9     1.7      -0.87    -1.18    -1.56    -1.71
ln(ew)     2.9     4.9       9.94    11.72    12.83    13.65
ln(vw)     1.0     9.8       8.61     9.88    10.70    11.29
ln(ibm)    0.6     7.1       4.96     6.09     6.68     6.82

                   TAR-F       BDS ($\delta = \hat\sigma_a$)
Data       Ori-F   (d = 1)   k=2      k=3      k=4      k=5
N(0,1)     1.13    0.87     -0.77    -0.71    -1.04    -1.27
$t_6$      0.69    0.81     -0.35    -0.76    -1.25    -1.49
ln(ew)     5.05    6.77     10.01    11.85    13.14    14.45
ln(vw)     4.95    6.85      7.01     7.83     8.64     9.53
ln(ibm)    1.32    1.51      3.82     4.70     5.45     5.72

Note: The sample size of simulated series is 500 and that of stock returns is 864. The BDS
test uses k = 2,..., 5.


Table 2 shows the results of the nonlinearity
test. For the simulated series and IBM returns,
the F tests are based on an AR(6) model. For the
index returns, the AR order is the same as the
model given earlier. For the BDS test, we chose
$\delta = \hat\sigma_a$ and $\delta = 1.5\hat\sigma_a$ with k = 2,..., 5. Also
given in the table are the Ljung-Box statistics
that confirm no serial correlation in the residual
series before applying nonlinearity tests. Compared
with their asymptotic critical values, the
BDS test and F tests are insignificant at the 5%
level for the simulated series. However, the BDS
tests are highly significant for the real financial
time series. The F tests also show significant
results for the index returns, but they fail to
suggest nonlinearity in the IBM log returns. In
summary, the tests confirm that the simulated
series are linear and suggest that the stock returns
are nonlinear.

MODELING

Nonlinear time series modeling necessarily involves
subjective judgment. However, there are
some general guidelines to follow. It starts with
building an adequate linear model on which
nonlinearity tests are based. For financial time
series, the Ljung-Box statistics and Engle's test
are commonly used to detect conditional heteroscedasticity.
For general series, other tests
discussed in the previous section apply. If nonlinearity
is statistically significant, then one
chooses a class of nonlinear models to entertain.
The selection here may depend on the experience
of the analyst and the substantive matter
of the problem under study.

For volatility models, the order of an ARCH
process can often be determined by checking
the partial autocorrelation function of the
squared series. For GARCH and exponential
GARCH models, only lower orders such as
(1,1), (1,2), and (2,1) are considered in most applications.
Higher order models are hard to estimate
and understand. For TAR models, one
may use the procedures given in Tong (1990)
and Tsay (1989, 1998) to build an adequate
model. When the sample size is sufficiently
large, one may apply nonparametric techniques
to explore the nonlinear feature of the data and
choose a proper nonlinear model accordingly;
see Chen and Tsay (1993a) and Cai, Fan, and
Yao (2000). The MARS procedure of Lewis and
Stevens (1991) can also be used to explore the
dynamic structure of the data.

Finally, information criteria such as the
Akaike information criterion (Akaike, 1974)
and the generalized odds ratios in Chen, McCulloch,
and Tsay (1997) can be used to discriminate
between competing nonlinear models. The
chosen model should be carefully checked before
it is used for prediction.


FORECASTING 

Unlike the linear model, there exist no closed-form
formulas to compute forecasts of most
nonlinear models when the forecast horizon is
greater than 1. We use parametric bootstraps to
compute nonlinear forecasts. It is understood
that the model used in forecasting has been rigorously
checked and is judged to be adequate
for the series under study. By a model, we mean
the dynamic structure and innovational distributions.
In some cases, we may treat the estimated
parameters as given.

Parametric Bootstrap 

Let T be the forecast origin and $\ell$ be the forecast
horizon ($\ell > 0$). That is, we are at time index
T and interested in forecasting $x_{T+\ell}$. The parametric
bootstrap considered computes realizations
$x_{T+1}, \ldots, x_{T+\ell}$ sequentially by (a) drawing a
new innovation from the specified innovational
distribution of the model, and (b) computing
$x_{T+i}$ using the model, data, and previous forecasts
$x_{T+1}, \ldots, x_{T+i-1}$. This results in a realization
for $x_{T+\ell}$. The procedure is repeated M times
to obtain M realizations of $x_{T+\ell}$ denoted by
$\{x_{T+\ell}^{(j)}\}_{j=1}^{M}$. The point forecast of $x_{T+\ell}$ is then the
sample average of $x_{T+\ell}^{(j)}$. Let the forecast be $x_T(\ell)$.
We used M = 3000 in some applications and the
results seem fine. The realizations $\{x_{T+\ell}^{(j)}\}_{j=1}^{M}$ can
also be used to obtain an empirical distribution
of $x_{T+\ell}$. We make use of this empirical distribution
later to evaluate forecasting performance.
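
The bootstrap steps above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the cited applications: the function names `bootstrap_forecast` and `tar_step` are hypothetical, and the 2-regime model inside `tar_step` (threshold 0, coefficients 0.5 and -0.3, standard normal innovations) is made up purely to have something to simulate.

```python
import random


def bootstrap_forecast(history, step, horizon, n_paths=3000, seed=None):
    """Parametric bootstrap: simulate n_paths futures and average x_{T+horizon}.

    `step(past, rng)` returns the next value given the list of past values
    (observed data followed by previously simulated values) and a
    random.Random instance used to draw the new innovation.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_paths):
        path = list(history)
        for _ in range(horizon):
            path.append(step(path, rng))  # steps (a) and (b) of the text
        draws.append(path[-1])            # one realization of x_{T+horizon}
    return sum(draws) / n_paths, draws


def tar_step(past, rng):
    """Hypothetical 2-regime TAR(1) model with threshold 0 and N(0,1) noise."""
    if past[-1] < 0.0:
        return 0.5 * past[-1] + rng.gauss(0.0, 1.0)
    return -0.3 * past[-1] + rng.gauss(0.0, 1.0)


point, draws = bootstrap_forecast([0.2, -0.4, 1.1], tar_step, horizon=3, seed=1)
print(point, len(draws))
```

The list `draws` plays the role of the M realizations: its average is the point forecast, and its empirical distribution can be reused for distributional evaluation.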

Forecasting Evaluation 

There are many ways to evaluate the fore¬ 
casting performance of a model, ranging from 
directional measures to magnitude measures 
to distributional measures. A directional 
measure considers the future direction (up 
or down) implied by the model. Predicting 
that tomorrow's S&P 500 index will go up or 
down is an example of directional forecasts 
that are of practical interest. Predicting the 
year-end value of the daily S&P 500 index 
belongs to the case of magnitude measure. 
Finally, assessing the likelihood that the daily 
S&P 500 index will go up 10% or more between 
now and the year end requires knowing the 
future conditional probability distribution of 
the index. Evaluating the accuracy of such an 
assessment needs a distributional measure. 

In practice, the available data set is divided
into two subsamples. The first subsample of the
data is used to build a nonlinear model, and
the second subsample is used to evaluate the
forecasting performance of the model. We refer
to the two subsamples of data as estimation
and forecasting subsamples. In some studies, a
rolling forecasting procedure is used in which
a new data point is moved from the forecasting
subsample into the estimation subsample
as the forecast origin advances. In what follows,
we briefly discuss some measures of forecasting
performance that are commonly used
in the literature. Keep in mind, however, that
there exists no widely accepted single measure
to compare models. A utility function based on
the objective of the forecast might be needed to
better understand the comparison.

Directional Measure 

A typical measure here is to use a 2 × 2 contingency
table that summarizes the number of
"hits" and "misses" of the model in predicting
ups and downs of $x_{T+\ell}$ in the forecasting
subsample. Specifically, the contingency table
is given as

                 Predicted
Actual       up          down
up           $m_{11}$    $m_{12}$    $m_{10}$
down         $m_{21}$    $m_{22}$    $m_{20}$
             $m_{01}$    $m_{02}$    $m$

where m is the total number of $\ell$-step ahead
forecasts in the forecasting subsample, $m_{11}$ is the
number of "hits" in predicting upward movements,
$m_{21}$ is the number of "misses" in predicting
downward movements of the market, and
so on. Larger values in $m_{11}$ and $m_{22}$ indicate
better forecasts. The test statistic

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(m_{ij} - m_{i0} m_{0j}/m)^2}{m_{i0} m_{0j}/m}$$

can then be used to evaluate the performance of
the model. A large $\chi^2$ signifies that the model
outperforms the chance of random choice. Under
some mild conditions, $\chi^2$ has an asymptotic
chi-squared distribution with 1 degree of freedom.
For further discussion of this measure, see
Dahl and Hylleberg (1999).

For illustration of the directional measure,
consider the 1-step ahead probability forecasts
of the 8-4-1 feed-forward neural network shown
in Figure 9. The 2 × 2 table of "hits" and
"misses" of the network is

                 Predicted
Actual       up      down
up           12      2       14
down         8       2       10
             20      4       24

The table shows that the network predicts the
upward movement well, but fares poorly in
forecasting the downward movement of the
stock. The chi-squared statistic of the table
is 0.137 with p-value 0.71. Consequently, the
network does not significantly outperform a
random-walk model with equal probabilities
for "upward" and "downward" movements.
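
The reported chi-squared statistic and p-value can be reproduced directly from the 2 × 2 table. The Python sketch below is a minimal illustration; the function name `directional_chi2` is hypothetical, and the p-value uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def directional_chi2(table):
    """Chi-squared statistic and p-value (1 df) for a 2x2 table of counts.

    table[i][j] is the number of cases with actual direction i and
    predicted direction j (0 = up, 1 = down).
    """
    row = [sum(r) for r in table]          # m_10, m_20
    col = [sum(c) for c in zip(*table)]    # m_01, m_02
    m = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / m
            chi2 += (table[i][j] - expected) ** 2 / expected
    # P(Z^2 > chi2) for standard normal Z
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = directional_chi2([[12, 2], [8, 2]])
print(round(chi2, 3), round(p_value, 2))  # 0.137 0.71
```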

Magnitude Measure 

Three statistics are commonly used to measure
performance of point forecasts. They are the
mean squared error (MSE), mean absolute deviation
(MAD), and mean absolute percentage
error (MAPE). For $\ell$-step ahead forecasts, these
measures are defined as

$$\mathrm{MSE}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} \left[ x_{T+\ell+j} - x_{T+j}(\ell) \right]^2 \tag{49}$$

$$\mathrm{MAD}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} \left| x_{T+\ell+j} - x_{T+j}(\ell) \right| \tag{50}$$

$$\mathrm{MAPE}(\ell) = \frac{1}{m} \sum_{j=0}^{m-1} \left| \frac{x_{T+j}(\ell)}{x_{T+\ell+j}} - 1 \right| \tag{51}$$

where m is the number of $\ell$-step ahead forecasts
available in the forecasting subsample.
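
Equations (49) to (51) translate almost line by line into code. In the Python sketch below, the function name `forecast_errors` is hypothetical, `actual[j]` stands for $x_{T+\ell+j}$ and `forecast[j]` for $x_{T+j}(\ell)$; the two short input lists are made-up numbers used only to show the arithmetic.

```python
def forecast_errors(actual, forecast):
    """Return (MSE, MAD, MAPE) of paired actual values and point forecasts."""
    m = len(actual)
    pairs = list(zip(actual, forecast))
    mse = sum((a - f) ** 2 for a, f in pairs) / m
    mad = sum(abs(a - f) for a, f in pairs) / m
    # MAPE divides by the actual value, so it is undefined if any actual is 0.
    mape = sum(abs(f / a - 1.0) for a, f in pairs) / m
    return mse, mad, mape


print(forecast_errors([2.0, 4.0], [1.0, 5.0]))  # (1.0, 1.0, 0.375)
```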

In application, one often chooses one of the
above three measures, and the model with
the smallest magnitude on that measure is regarded
as the best $\ell$-step ahead forecasting
model. It is possible that different $\ell$ may result
in selecting different models. The measures
also have other limitations in model comparison;
see, for instance, Clements and Hendry
(1993).

Figure 11 Time Plot of the U.S. Quarterly Unemployment Rate, Seasonally Adjusted, from 1948 to 1993

Distributional Measure 

Practitioners recently began to assess forecasting
performance of a model using its predictive
distributions. Strictly speaking, a predictive distribution
incorporates parameter uncertainty in
forecasts. We call it conditional predictive distribution
if the parameters are treated as fixed.
The empirical distribution of $x_{T+\ell}$ obtained by
the parametric bootstrap is a conditional predictive
distribution. This empirical distribution
is often used to compute a distributional measure.
Let $u_T(\ell)$ be the percentile of the observed
$x_{T+\ell}$ in the prior empirical distribution. We then
have a set of m percentiles $\{u_{T+j}(\ell)\}_{j=0}^{m-1}$, where
again m is the number of $\ell$-step ahead forecasts
in the forecasting subsample. If the model entertained
is adequate, $\{u_{T+j}(\ell)\}$ should be a
random sample from the uniform distribution
on [0, 1]. For a sufficiently large m, one can
compute the Kolmogorov-Smirnov statistic of
$\{u_{T+j}(\ell)\}$ with respect to uniform [0, 1]. The
statistic can be used for both model checking
and forecasting comparison.
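
A minimal Python sketch of this distributional measure follows. The function names `percentile_in_draws` and `ks_uniform` are hypothetical; the first converts an observed value into its percentile within a set of bootstrap realizations, and the second computes the Kolmogorov-Smirnov distance between a sample of percentiles and the uniform distribution on [0, 1]. Critical values for the statistic are not computed here.

```python
def percentile_in_draws(observed, draws):
    """Fraction of bootstrap realizations that are <= the observed value."""
    return sum(1 for v in draws if v <= observed) / len(draws)


def ks_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and uniform[0, 1]."""
    s = sorted(u)
    m = len(s)
    # Compare the uniform CDF (value v at v) with the empirical CDF just
    # after and just before each ordered observation.
    return max(max((i + 1) / m - v, v - i / m) for i, v in enumerate(s))


print(percentile_in_draws(2.0, [1.0, 2.0, 3.0, 4.0]))  # 0.5
print(ks_uniform([0.25, 0.75]))                        # 0.25
```

A small distance is consistent with the percentiles being uniform, while a large one points to a predictive distribution that is misplaced or has the wrong spread.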


APPLICATION

In this section, we illustrate nonlinear time se¬ 
ries models by analyzing the quarterly U.S. 
civilian unemployment rate, seasonally ad¬ 
justed, from 1948 to 1993. This series was an¬ 
alyzed in detail by Montgomery, Zarnowitz, 
Tsay, and Tiao (1998). We repeat some of the 
analyses here using nonlinear models. Figure 11 
shows the time plot of the data. Well-known 
characteristics of the series include that (a) it 
tends to move countercyclically with U.S. busi¬ 
ness cycles, and (b) the rate rises quickly but 
decays slowly. The latter characteristic suggests 
that the dynamic structure of the series is non¬ 
linear. 

Denote the series by $x_t$ and let $\Delta x_t = x_t - x_{t-1}$
be the change in unemployment rate. The linear
model

$$(1 - 0.31B^4)(1 - 0.65B)\Delta x_t = (1 - 0.78B^4)a_t, \qquad \hat\sigma_a^2 = 0.090 \tag{52}$$

was built by Montgomery et al. (1998), where
the standard errors of the three coefficients are
0.11, 0.06, and 0.07, respectively. This is a seasonal
model even though the data were seasonally
adjusted. It indicates that the seasonal
adjustment procedure used did not successfully
remove the seasonality. This model is used as a
benchmark model for forecasting comparison.

Table 3 Nonlinearity Test for Changes in the U.S. Quarterly Unemployment Rate: 1948.II-1993.IV

Type       Ori-F    LST      TAR(1)   TAR(2)   TAR(3)   TAR(4)
Test       2.80     2.83     2.41     2.16     2.84     2.98
p value    .0007    .0002    .0298    .0500    .0121    .0088

Note: An AR(5) model was used in the tests, where LST denotes the test of Luukkonen et al. (1988)
and TAR(d) means threshold test with delay d.


To test for nonlinearity, we apply some of the
nonlinearity tests discussed earlier in this entry
with an AR(5) model for the differenced series
$\Delta x_t$. The results are given in Table 3. All of the
tests reject the linearity assumption. In fact, the
linearity assumption is rejected for all AR(p)
models we applied, where p = 2,..., 10.

Using a modeling procedure similar to that of
Tsay (1989), Montgomery et al. (1998) build the
following TAR model for the $\Delta x_t$ series:

$$\Delta x_t = \begin{cases}
0.01 + 0.73\Delta x_{t-1} + 0.10\Delta x_{t-2} + a_{1t} & \text{if } \Delta x_{t-2} < 0.1 \\
0.18 + 0.80\Delta x_{t-1} - 0.56\Delta x_{t-2} + a_{2t} & \text{otherwise}
\end{cases} \tag{53}$$

The sample variances of $a_{1t}$ and $a_{2t}$ are 0.76 and
0.165, respectively, the standard errors of the
three coefficients of regime 1 are 0.03, 0.10, and
0.12, respectively, and those of regime 2 are 0.09,
0.1, and 0.16. This model says that the change
in the U.S. quarterly unemployment rate, $\Delta x_t$,
behaves like a piecewise linear model in the reference
space of $x_{t-2} - x_{t-3}$ with threshold 0.1.
Intuitively, the model implies that the dynamics
of unemployment act differently depending on
the recent change in the unemployment rate. In
the first regime, the unemployment rate has had
either a decrease or a minor increase. Here the
economy should be stable, and essentially the
change in the rate follows a simple AR(1) model
because the lag-2 coefficient is insignificant. In
the second regime, there is a substantial jump
in the unemployment rate (0.1 or larger). This
typically corresponds to the contraction phase
in the business cycle. It is also the period during
which government interventions and industrial
restructuring are likely to occur. Here $\Delta x_t$ follows
an AR(2) model with a positive constant,
indicating an upward trend in $x_t$. The AR(2)
polynomial contains two complex characteristic
roots, which indicate possible cyclical behavior
in $\Delta x_t$. Consequently, the chance of having
a turning point in $x_t$ increases, suggesting that
the period of large increases in $x_t$ should be
short. This implies that the contraction phases
in the U.S. economy tend to be shorter than the
expansion phases.

Applying a Markov chain Monte Carlo
method, Montgomery et al. (1998) obtain the
following Markov switching model for $\Delta x_t$:

$$\Delta x_t = \begin{cases}
-0.07 + 0.38\Delta x_{t-1} - 0.05\Delta x_{t-2} + \epsilon_{1t} & \text{if } s_t = 1 \\
0.16 + 0.86\Delta x_{t-1} - 0.38\Delta x_{t-2} + \epsilon_{2t} & \text{if } s_t = 2
\end{cases} \tag{54}$$

The conditional means of $\Delta x_t$ are −0.10 for
$s_t = 1$ and 0.31 for $s_t = 2$. Thus, the first state represents
the expansionary periods in the economy,
and the second state represents the contractions.
The sample variances of $\epsilon_{1t}$ and $\epsilon_{2t}$
are 0.031 and 0.192, respectively. The standard
errors of the three parameters in state $s_t = 1$
are 0.03, 0.14, and 0.11, and those of state $s_t = 2$
are 0.04, 0.13, and 0.14, respectively. The
state transition probabilities are
$P(s_t = 2 \mid s_{t-1} = 1) = 0.084\,(0.060)$ and
$P(s_t = 1 \mid s_{t-1} = 2) = 0.126\,(0.053)$, where the number in parentheses
is the corresponding standard error. This model
implies that in the second state the unemployment
rate $x_t$ has an upward trend with an AR(2)
polynomial possessing complex characteristic
roots. This feature of the model is similar to
the second regime of the TAR model in equation
(53). In the first state, the unemployment
rate $x_t$ has a slightly decreasing trend with a
much weaker autoregressive structure.
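
The conditional means quoted for equation (54) follow from the usual AR(2) mean formula, constant divided by one minus the sum of the AR coefficients, and the complex roots in the second state can be checked from the AR polynomial. The Python sketch below illustrates both calculations; the function names `ar2_mean` and `ar2_roots` are hypothetical. It solves $1 - \phi_1 z - \phi_2 z^2 = 0$, whose solutions are the inverses of the characteristic roots, so one set is complex exactly when the other is.

```python
import cmath


def ar2_mean(c, phi1, phi2):
    """Mean of a stationary AR(2) process with constant c."""
    return c / (1.0 - phi1 - phi2)


def ar2_roots(phi1, phi2):
    """Solutions z of 1 - phi1*z - phi2*z**2 = 0."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return (-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)


mean_state1 = ar2_mean(-0.07, 0.38, -0.05)
mean_state2 = ar2_mean(0.16, 0.86, -0.38)
print(round(mean_state1, 2), round(mean_state2, 2))  # -0.1 0.31

# State 2: a nonzero imaginary part means the roots form a complex pair.
print(ar2_roots(0.86, -0.38))
```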

Forecasting Performance 

A rolling procedure was used by Montgomery 
et al. (1998) to forecast the unemployment rate 
x t . The procedure works as follows: 

1. Begin with forecast origin T = 83, corresponding
to 1968.II, which was used in the
literature to monitor the performance of various
econometric models in forecasting the
unemployment rate. Estimate the linear, TAR,
and MSA models using the data from 1948.I
to the forecast origin (inclusive).

2. Perform 1-quarter to 5-quarter ahead fore¬ 
casts and compute the forecast errors of 
each model. Forecasts of nonlinear models 
used are computed by using the parametric 
bootstrap method explained earlier in this 
entry. 

3. Advance the forecast origin by 1 and repeat 
the estimation and forecasting processes un¬ 
til all data are employed. 

4. Use MSE and mean forecast error to compare 
performance of the models. 

Table 4 shows the relative MSE of forecasts 
and mean forecast errors for the linear model in 
equation (52), the TAR model in equation (53), 
and the MSA model in equation (54), using the 
linear model as a benchmark. The comparisons 
are based on overall performance as well as the 
status of the U.S. economy at the forecast origin. 
From the table, we make the following obser¬ 
vations: 

1. For the overall comparison, the TAR model
and the linear model are very close in MSE,
but the TAR model has smaller biases. Yet
the MSA model has the highest MSE and
smallest biases.

2. For forecast origins in economic contractions,
the TAR model shows improvements
over the linear model both in MSE and bias.
The MSA model also shows some improvement
over the linear model, but the improvement
is not as large as that of the TAR model.

3. For forecast origins in economic expansions,
the linear model outperforms both nonlinear
models.

Table 4 Out-of-Sample Forecast Comparison Among
Linear, TAR, and MSA Models for the U.S. Quarterly
Unemployment Rate

(A) Relative MSE of Forecast

Model      1-step   2-step   3-step   4-step   5-step

(a) Overall Comparison
Linear     1.00     1.00     1.00     1.00     1.00
TAR        1.00     1.04     0.99     0.98     1.03
MSA        1.19     1.39     1.40     1.45     1.61
MSE        0.08     0.31     0.67     1.13     1.54

(b) Forecast Origins in Economic Contractions
Linear     1.00     1.00     1.00     1.00     1.00
TAR        0.85     0.91     0.83     0.72     0.72
MSA        0.97     1.03     0.96     0.86     1.02
MSE        0.22     0.97     2.14     3.38     3.46

(c) Forecast Origins in Economic Expansions
Linear     1.00     1.00     1.00     1.00     1.00
TAR        1.06     1.13     1.10     1.15     1.17
MSA        1.31     1.64     1.73     1.84     1.87
MSE        0.06     0.21     0.45     0.78     1.24

(B) Mean of Forecast Errors

Model      1-step   2-step   3-step   4-step   5-step

(a) Overall Comparison
Linear     0.03     0.09     0.17     0.25     0.33
TAR       -0.10    -0.02    -0.03    -0.03    -0.01
MSA        0.00    -0.02    -0.04    -0.07    -0.12

(b) Forecast Origins in Economic Contractions
Linear     0.31     0.68     1.08     1.41     1.38
TAR        0.24     0.56     0.87     1.01     0.86
MSA        0.20     0.41     0.57     0.52     0.14

(c) Forecast Origins in Economic Expansions
Linear    -0.01     0.00     0.03     0.08     0.17
TAR       -0.05    -0.11    -0.17    -0.19    -0.14
MSA       -0.03    -0.08    -0.13    -0.17    -0.16

Note: The starting forecast origin is 1968.II, where the
row marked by "MSE" shows the MSE of the benchmark
linear model.

The results suggest that the contributions of
nonlinear models over linear ones in forecasting
the U.S. quarterly unemployment rate are
mainly in the periods when the U.S. economy
is in contraction. This is not surprising
because, as mentioned before, it is during the
economic contractions that government interventions
and industrial restructuring are most
likely to occur. These external events could introduce
nonlinearity in the U.S. unemployment
rate. Intuitively, such improvements are important
because it is during the contractions
that people pay more attention to economic
forecasts.


KEY POINTS 

• Nonlinearity exists in many financial data, in¬ 
cluding log returns of widely used market in¬ 
dexes such as CRSP equal- and value-weight 
indexes. 

• Nonlinearity also appears in asset volatility. 
Indeed, simple threshold models such as the 
threshold GARCH model can be used to bet¬ 
ter describe the behavior of asset volatility. 
The model has been used to model the lever¬ 
age effect between return and volatility. 

• Simple nonparametric methods such as the 
local linear regression method can be used to 
provide a deeper understanding of interest 
rate dynamics. 

• The unemployment rate example shows that, 
even though nonlinear models may not out¬ 
perform linear ones in all forecast origins, 
they can provide more accurate forecasts 
when the U.S. economy is under contraction. 
This is useful because people in general pay 
more attention to forecasts during economic 
recession. 

• Among the nonlinear models, the Markov
switching model has the smallest bias in out-of-sample
prediction. The model, however,
has a larger mean square of forecast errors
than the threshold autoregressive model. This
behavior is consistent with the structure of the
model because the true states of the economy
are never certain under the switching model.

Abstract: The Theil-Sen estimator is an exceptionally simple and robust linear regression estimator,
affording estimates of slope and intercept that are virtually identical to their ordinary least squares 
counterparts in the absence of outliers, but which do not change appreciably in the presence of 
outliers. In fact, with univariate data, it improves on ordinary least squares in almost every way 
imaginable, and it is therefore a striking fact that this remarkable estimator is not universally known 
and used. It can be used to derive robust estimates of beta and the correlation coefficient that are 
virtually identical to their classical counterparts when asset returns are normally distributed, and 
which are significantly more robust when asset returns are highly skewed or contaminated with 
outliers. 


Point estimates of betas and correlations are
most often obtained using ordinary least squares
(OLS) and the standard maximum likelihood
estimator, respectively. While these estimators
are clearly optimal when asset returns are
normally distributed, and when we hold no view
on their prior distribution, they can be far from
optimal when these conditions are not satisfied.
In this entry, a novel explanation of OLS is
provided and is then used to motivate a robust
univariate regression algorithm due to Theil (1950)
and Sen (1968). This estimator is then used to
obtain remarkably robust (i.e., outlier resistant)
estimates of asset betas, asset correlations, and
non-negative definite correlation and covariance
matrices.


OLS REVISITED 

Generations of students have learned OLS in
the way depicted pictorially in Figure 1. We are
given a set of points, each with an abscissa (or
x value) and an ordinate (or y value), which
are displayed on a scatter plot in the X-Y
plane. All errors are assumed to be concentrated
in the ordinates. The abscissae are assumed to be
known with certainty. The $i$th point has
coordinates $(x_i, y_i)$, and the collection of points
visually evidences a noisy, but linear, relationship
between the x's and the y's. The object of OLS
is to find a straight line, the line of best fit, with
slope $\beta_{OLS}$ and intercept $\alpha_{OLS}$, which
minimizes the sum of squared vertical distances (or
errors) from the points to this line.
Figure 1 Ordinary Least Squares: Classical Depiction


This pictorial representation has become so
firmly embedded in our consciousness that
we take its geometry and its formulation for
granted. But consider that the method dates
back to 1800, and the fact that it was
independently discovered by Carl Friedrich Gauss
and Joseph-Louis Lagrange, who surely rank
among the greatest mathematicians of all time,
and it should come as no surprise that this
textbook depiction of OLS hides more than one
secret. In this section, we expose two of its secrets.


We start our exploration of OLS with Figure 2,
which plots the same set of points as Figure 1,
but now, instead of drawing a single line of
best fit through the entire data set, we choose
two specific points, $(x_i, y_i)$ and $(x_j, y_j)$, draw the
unique straight line joining them and project it
back till it intersects the Y axis. This line has
slope $\beta_{ij}$ and intercept $\alpha_{ij}$, where $\beta_{ij}$ and
$\alpha_{ij}$ are given by:

$$\beta_{ij} = \frac{y_j - y_i}{x_j - x_i} \qquad (1)$$



Figure 2 Ordinary Least Squares—Alternative Depiction 
and

$$\alpha_{ij} = \frac{x_j \times y_i - x_i \times y_j}{x_j - x_i} \qquad (2)$$


On comparing Figures 1 and 2, it is clear
that $\beta_{OLS}$ must necessarily lie between $\min_{i,j} \beta_{ij}$
and $\max_{i,j} \beta_{ij}$, both endpoints inclusive, and that
$\alpha_{OLS}$ must likewise lie between $\min_{i,j} \alpha_{ij}$ and
$\max_{i,j} \alpha_{ij}$. The OLS slope and intercept can
therefore be written as weighted averages of all
$\binom{N}{2} = \frac{N(N-1)}{2}$ pairwise slopes and intercepts for
some nonnegative sets of weights, that is,

$$\beta_{OLS} = \sum_i \sum_j w_{ij} \beta_{ij}, \quad \sum_i \sum_j w_{ij} = 1, \quad w_{ij} \ge 0 \qquad (3)$$

and

$$\alpha_{OLS} = \sum_i \sum_j v_{ij} \alpha_{ij}, \quad \sum_i \sum_j v_{ij} = 1, \quad v_{ij} \ge 0 \qquad (4)$$

In any particular situation, these weights are
not unique, as equations (3) and (4) are
enormously overdetermined, and we therefore seek
a set of strictly positive weights that
simultaneously solves both equations for an arbitrary
collection of points. Such a set of weights can
be identified using some clever guesswork
motivated by the following observation: If $(x_i, y_i)$
and $(x_j, y_j)$ are close together, then any error
in either ordinate will induce significant errors
in $\beta_{ij}$ and $\alpha_{ij}$. Pairs of points that are far apart
are much less susceptible to this problem. We
ought, therefore, to overweight slopes and
intercepts derived from pairs of points that are
far apart relative to those that are derived from
pairs of points that are close together.

Next, as all the error is concentrated in the
ordinates, and as the abscissae are known with
certainty, the weights must be a function only of
$(x_i - x_j)$; they cannot depend on $(y_i - y_j)$.
Finally, the function must be even, because some
weights would be negative if it were odd. Some
tedious and not particularly enlightening algebra
shows our intuition to be correct, that is,

$$w_{ij} = v_{ij} = \frac{(x_i - x_j)^2}{\sum_k \sum_l (x_k - x_l)^2} \qquad (5)$$

It follows that

$$\beta_{OLS} = \frac{\sum_i \sum_j \beta_{ij} (x_i - x_j)^2}{\sum_k \sum_l (x_k - x_l)^2} \qquad (6)$$

and

$$\alpha_{OLS} = \frac{\sum_i \sum_j \alpha_{ij} (x_i - x_j)^2}{\sum_k \sum_l (x_k - x_l)^2} \qquad (7)$$

Equations (6) and (7) yield OLS' first little 
secret—the line of best fit is just an appropri¬ 
ately weighted average of all possible lines that 
could be drawn using this data set! While this 
argument does not readily extend to the mul¬ 
tivariate case, it does give us a fresh perspec¬ 
tive on OLS, which now stands exposed as a 
clever and computationally efficient weighting 
scheme over the set of unique straight lines 
drawn through all possible pairs of points. A 
proof of this result, which is usually credited to 
Sen (1968), can be found in Heitman and Ord 
(1985). 
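Equations (6) and (7) are easy to check numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the function name `pairwise_lines`, the sample size, and the data-generating line are illustrative choices, not part of the source. The sums run over pairs with i < j only, which is equivalent to the double sums above because each term is symmetric in i and j.

```python
import numpy as np

def pairwise_lines(x, y):
    """Slope, intercept, and weight for the line through every pair i < j."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0                      # pairs with equal abscissae define no slope
    i, j, dx = i[keep], j[keep], dx[keep]
    slopes = (y[j] - y[i]) / dx                       # equation (1)
    intercepts = (x[j] * y[i] - x[i] * y[j]) / dx     # equation (2)
    weights = dx ** 2 / np.sum(dx ** 2)               # equation (5)
    return slopes, intercepts, weights

# Synthetic, illustrative data: a noisy straight line.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=50)

slopes, intercepts, weights = pairwise_lines(x, y)
beta_weighted = np.sum(weights * slopes)        # equation (6)
alpha_weighted = np.sum(weights * intercepts)   # equation (7)

# Reference values from an ordinary least squares fit.
beta_ols, alpha_ols = np.polyfit(x, y, 1)
print(beta_weighted, beta_ols)
print(alpha_weighted, alpha_ols)
```

The two printed pairs should agree to floating-point precision, which is the sense in which the line of best fit is a weighted average of all pairwise lines.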


Its second little secret lies in its focus on 
squared errors. Why should it be the second, 
and not the fourth or the sixty-fourth power 
of the error that is minimized? To answer this 
question, recall the way in which the OLS slope 
and intercept are defined: 


$$\alpha_{OLS}, \beta_{OLS} = \arg\min \sum_i \text{error}_i^2 = \arg\min \sum_i (y_i - \alpha_{OLS} - \beta_{OLS} \times x_i)^2 \qquad (8)$$

Solving this minimization problem requires
us to compute the partial derivatives of the
sum of squared errors with respect to $\alpha_{OLS}$ and
$\beta_{OLS}$, and to equate the resulting expressions to
0. This results in a set of linear equations that
can be solved in closed form (the solution was
known to both Gauss and Lagrange). If, however,
the error is raised to a power other than
two, we would have to solve a set of nonlinear
equations, which do not, in general, admit
a closed form solution; they must be solved
numerically on a computer, a tool that neither
Gauss nor Lagrange had access to. That said,
raising the error to any even power (or even
making it the argument of any even function)
and then performing the indicated minimization
numerically will result in a line that is optimal
under that measure, though its slope and
intercept will not, in general, equal the OLS
slope and intercept.

All of this leads to our second insight—the 
mathematical formulation of OLS is driven by 
thoroughly practical considerations. In 1800, 
anything else simply couldn't be (and for 
that matter, still can't be) solved analytically! 
Having exposed these two little secrets of OLS, 
we now describe a better way in which to 
compute univariate regressions and explore its 
application to the estimation of beta and the cor¬ 
relation coefficient, as well as to the estimation 
of positive definite correlation and covariance 
matrices. 


THEIL-SEN REGRESSION 

The Theil-Sen regression algorithm (Theil, 1950;
Sen, 1968) is a robust alternative to univariate
OLS that performs particularly well in the
presence of outliers (loosely, in the presence of large,
sporadic errors that are anything but Gaussian).
It has long been known that OLS is acutely
sensitive to errors in its inputs, and it is
immediately apparent from equations (6) and (7) that
even a single outlier can induce arbitrarily large
errors in $\beta_{OLS}$ and $\alpha_{OLS}$.

Theil (1950) and Sen (1968) propose a novel
solution to this problem: they propose using
the median of all $\binom{N}{2} = \frac{N(N-1)}{2}$ slopes to
estimate the slope of the regression line, and choose
the intercept to force the median error to 0. The
primary difference between their methods is
that Theil uses all available observations, while
Sen restricts attention to the subset of
observations with distinct abscissae, that is, the set of
points for which $x_i \ne x_j$, and replaces each set
of points that share the same abscissa with a
single point whose ordinate is the average of
their ordinates.

Formally, the Theil-Sen estimates of slope and 
intercept are given by: 

$$\beta_{TS} = \operatorname{median}_{i,j} \{\beta_{ij}\} \qquad (9)$$

and

$$\alpha_{TS} = \operatorname{median}_{i} \{y_i - \beta_{TS} \times x_i\} \qquad (10)$$
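Equations (9) and (10) translate almost line for line into code. The sketch below assumes NumPy; the function name `theil_sen` and the toy data are illustrative, and pairs with equal abscissae are simply skipped, which is a simplification of Sen's treatment of ties rather than a faithful implementation of it.

```python
import numpy as np

def theil_sen(x, y):
    """Return (slope, intercept) per equations (9) and (10)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)       # all pairs with i < j
    dx = x[j] - x[i]
    keep = dx != 0                            # skip pairs with equal abscissae
    slopes = (y[j][keep] - y[i][keep]) / dx[keep]
    beta = np.median(slopes)                  # equation (9)
    alpha = np.median(y - beta * x)           # equation (10): median error is 0
    return beta, alpha

# Toy data: an exact line y = 2 + 3x with one gross outlier.
x = np.arange(20, dtype=float)
y = 2.0 + 3.0 * x
y[5] = 500.0

beta_ts, alpha_ts = theil_sen(x, y)
beta_ols, alpha_ols = np.polyfit(x, y, 1)
print("Theil-Sen:", beta_ts, alpha_ts)
print("OLS:      ", beta_ols, alpha_ols)
```

Only 19 of the 190 pairwise slopes involve the outlier, so the median slope and the median residual are unaffected, while the OLS fit is pulled far from the true line. SciPy also provides `scipy.stats.theilslopes`; its intercept convention may differ from equation (10), so its documentation should be checked before substituting it.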

This regression has been widely studied. 
Peng, Wang, and Wang (2008), for example, 
show that it is strongly consistent and superef¬ 
ficient, and derive its asymptotic distribution. 
Interestingly, the median has long been used 
as a robust estimator of the mean for symmet¬ 
ric distributions, but this appears to be the first 
known application of the median to the estima¬ 
tion of regression coefficients. 

We illustrate the superiority of Theil-Sen
regression over OLS via simulations, the results
of which are displayed in Tables 1 and 2. When
the distribution of errors is normal, the
distributions of $\beta_{TS}$ and $\alpha_{TS}$ are almost identical to those
of $\beta_{OLS}$ and $\alpha_{OLS}$. When the errors are drawn
from a highly skewed distribution, or when the
data are contaminated with significant amounts
of noise, the distributions of $\beta_{TS}$ and $\alpha_{TS}$ are far
less variable than those of $\beta_{OLS}$ and $\alpha_{OLS}$.

These results are generated as follows.
Using a high-quality random number generator
(Mersenne twister), we create two independent
random vectors, X and Y, both of length 100,
and drawn from one of two distributions: unit
normal and Pareto(2). We then regress Y against
X using both OLS and the Theil-Sen regression.
As the vectors are independent, the
distribution of the slope and the intercept of the
regression lines should be centered at 0 and E[Y],
respectively (E[Y] = E[X] here, as both vectors
are drawn from the same distribution).

Table 1 Theil-Sen Regression vs. OLS: Normally Distributed Random Variables

Percentiles                             1       5      10      25      50      75      90      95      99
Theil-Sen Slope                     -0.26   -0.17   -0.14   -0.07       0    0.07    0.13    0.18    0.26
Least Squares Slope                 -0.24   -0.17   -0.13   -0.07       0    0.07    0.13    0.17    0.24
Theil-Sen Intercept                  -0.3   -0.21   -0.16   -0.09       0    0.08    0.16    0.21     0.3
Least Squares Intercept             -0.23   -0.17   -0.13   -0.07       0    0.07    0.13    0.17    0.23
Theil-Sen Mean Square Error          0.69    0.77    0.81    0.89    0.98    1.08    1.17    1.24    1.35
Least Squares Mean Square Error      0.69    0.76    0.81    0.88    0.98    1.07    1.16    1.23    1.33
Theil-Sen Median Error                  0       0       0       0       0       0       0       0       0
Least Squares Median Error          -0.17   -0.12   -0.09   -0.05       0    0.05    0.09    0.12    0.18

We run 10,000 simulations to ensure that
the 99% confidence intervals on our estimates
are extremely tight (the width of the confidence
interval is inversely proportional to the square
root of the number of simulation runs), and
Tables 1 and 2 display various percentiles of the
distribution of the slope, the intercept, and the
mean squared error (i.e., the sum of squared
errors divided by 100) for both OLS and the
Theil-Sen regression.
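The experiment described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Tables 1 and 2: it uses NumPy's default generator (PCG64 rather than the Mersenne twister), takes Pareto(2) to mean a classical Pareto distribution with minimum 1 and shape 2, and defaults to 200 runs instead of 10,000 so that it finishes quickly. The names `theil_sen` and `simulate` are hypothetical.

```python
import numpy as np

def theil_sen(x, y):
    """Median-of-pairwise-slopes fit, per equations (9) and (10)."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    beta = np.median((y[j][keep] - y[i][keep]) / dx[keep])
    return beta, np.median(y - beta * x)

def simulate(n_sims=200, n=100, dist="normal", seed=0):
    """Regress independent Y on X repeatedly; return one row per run:
    (Theil-Sen slope, OLS slope, Theil-Sen intercept, OLS intercept)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_sims, 4))
    for s in range(n_sims):
        if dist == "normal":
            x, y = rng.standard_normal(n), rng.standard_normal(n)
        else:
            # Generator.pareto draws Lomax samples; adding 1 gives a
            # classical Pareto with minimum 1 and shape 2 (an assumption).
            x = rng.pareto(2.0, n) + 1.0
            y = rng.pareto(2.0, n) + 1.0
        b_ts, a_ts = theil_sen(x, y)
        b_ols, a_ols = np.polyfit(x, y, 1)
        out[s] = (b_ts, b_ols, a_ts, a_ols)
    return out

percentiles = [1, 5, 10, 25, 50, 75, 90, 95, 99]
results = simulate(n_sims=200, dist="pareto")
table = np.percentile(results, percentiles, axis=0).T   # one row per statistic
labels = ["TS slope", "OLS slope", "TS intercept", "OLS intercept"]
for label, row in zip(labels, table):
    print(f"{label:14s}", np.round(row, 2))
```

Raising `n_sims` to 10,000 reproduces the scale of the experiment in the text, at the cost of a longer run time.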

The first simulation, for normal random 
variables, illustrates how close the Theil-Sen 
algorithm is to OLS in the special case when 
OLS is clearly optimal. The second simulation 
illustrates its robustness with Pareto(2) random 
variables, whose distribution is highly skewed, 
and whose long tails serve as proxies for 
outliers. 

When X and Y are normally distributed 
(Table 1), the median slope, the interquartile 
range for the slope (the difference between the 
75th and the 25th percentiles), and the MSE 
for the two algorithms are essentially identi¬ 
cal. The same holds true even when we look 
at a 90% range (the difference between the 5th 
and the 95th percentiles). However, when X 
and Y are drawn from a Pareto(2) distribution 
(Table 2), the performance of the two algorithms 
diverges: The interquartile range for the slope 
is 40% smaller for the Theil-Sen regression (0.06 
vs. 0.10 for OLS) and an astonishing 60% smaller 


for the 90% range (0.16 vs. 0.41), though the me¬ 
dian MSE rises by about 12%. 

The median slope remains 0 for the Theil-Sen 
regression, but exhibits a slight downward bias 
for OLS. The range of the intercept for the Theil-Sen
regression is slightly larger than it is for 
OLS, but this is driven entirely by the fact that 
the Theil-Sen intercept is chosen to force the 
median error to 0, while the OLS intercept is 
chosen to minimize the sum of squared errors. 

These simulations clearly show that the Theil- 
Sen regression gives up nothing to OLS when 
X and Y are normally distributed and is at a 
significant advantage when they are not. Sim¬ 
ilar results are obtained when either X or Y is 
contaminated with outliers. In all such experi¬ 
ments, the advantage of the Theil-Sen approach 
is readily apparent. In fact, it can be shown that 
as many as 29% of the data points can be cor¬ 
rupted with errors of arbitrary size before the 
Theil-Sen estimates of slope and intercept start 
to break down. 
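The 29% figure can be motivated by a short counting argument. This is a heuristic sketch for large samples, added for illustration rather than taken from the source; the symbol $\varepsilon$ for the corrupted fraction is introduced here. If a fraction $\varepsilon$ of the points is corrupted, a pairwise slope is trustworthy only when both of its points are clean, which happens for roughly a fraction $(1-\varepsilon)^2$ of all pairs. The median of the pairwise slopes stays under control as long as the clean pairs remain in the majority:

```latex
(1 - \varepsilon)^2 \ge \tfrac{1}{2}
\quad\Longleftrightarrow\quad
\varepsilon \le 1 - \frac{1}{\sqrt{2}} \approx 0.293
```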

Table 2 Theil-Sen Regression vs. OLS: Pareto(2) Distributed Random Variables

Percentiles                             1       5      10      25      50      75      90      95      99
Theil-Sen Slope                     -0.12   -0.07   -0.05   -0.03       0    0.03    0.06    0.09    0.14
Least Squares Slope                 -0.35   -0.18   -0.13   -0.07   -0.02    0.03    0.13    0.23    0.65
Theil-Sen Intercept                  1.17    1.24    1.28    1.34    1.41    1.49    1.57    1.62    1.74
Least Squares Intercept              0.98    1.49    1.62    1.78    1.95    2.16    2.41    2.62    3.44
Theil-Sen Mean Square Error          0.59    0.88    1.11    1.67    2.83    5.46   12.29   22.67  107.54
Least Squares Mean Square Error      0.52    0.78    0.98    1.47    2.53    4.97   11.54   21.47   104.8
Theil-Sen Median Error                  0       0       0       0       0       0       0       0       0
Least Squares Median Error          -1.51   -0.97   -0.83   -0.64    -0.5    -0.4   -0.32   -0.28   -0.22

Given these results, and the accompanying
fact that the vast majority of regressions run in
practice are univariate, it is surprising that the
Theil-Sen regression is not more widely used
and appreciated. This may in part be driven
by the fact that Theil-Sen regression, unlike
OLS, does not generalize naturally to the case
where there are many independent variables,
as the median is inherently a one-dimensional
measure.

A number of attempts have been made to 
create multivariate extensions of the Theil-Sen 
regression, the two most popular ones being 
the iterative Gauss-Seidel method described by 
Hastie and Tibshirani (1990) and the elemen¬ 
tal subset method of Oja and Niinimaa (1984), 
which is described in Rousseeuw and Leroy 
(1987). Unfortunately, neither approach is en¬ 
tirely reliable in practice, and it is easy to find 
simple examples for which they converge to the 
wrong solution, particularly when the relation¬ 
ship being modeled is nonlinear. 


ROBUST ESTIMATES OF BETA

The beta of an asset Y with respect to the
market portfolio X plays a central role in modern
finance as a result of the capital asset pricing
model (Treynor, 1961; Sharpe, 1964; Lintner,
1965; and Mossin, 1966), and is defined to be

$$\beta_{Y|X} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} \qquad (11)$$


This quantity is, of course, just the slope
coefficient in a univariate regression, and is
precisely what OLS estimates. The application of
the Theil-Sen regression algorithm to the
estimation of beta is obvious: the Theil-Sen
estimate of slope ought to provide us a more robust
estimate of the historical beta of a security than the
corresponding OLS estimate.

The advantages of the Theil-Sen estimator of
beta are made clear by the following estimate
of the beta of IBM around the crash of 1987.
Starting on July 1, 1987, and ending on
December 31, 1987, we estimate IBM's beta by regressing
its daily return for the most recent 132 days
(or 6 months) against the corresponding daily
return of the S&P 500. As can be seen from
Figure 3, the Theil-Sen estimate is far more stable
than the OLS estimate. In particular, it does not
jump sharply after the 20% drop in the S&P 500
on October 19 as does the OLS beta, just as one
would expect given its robustness.
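The rolling-window procedure can be sketched on synthetic data. Everything below is illustrative: the returns are simulated rather than IBM and S&P 500 data, the true beta of 1, the crash-day returns, and the names `theil_sen_slope` and `rolling_betas` are assumptions made for the example. Only the 132-day window comes from the text.

```python
import numpy as np

def theil_sen_slope(x, y):
    """Median of all pairwise slopes, per equation (9)."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    return np.median((y[j][keep] - y[i][keep]) / dx[keep])

def rolling_betas(market, asset, window=132):
    """Theil-Sen and OLS betas over each trailing window of daily returns."""
    ts, ols = [], []
    for end in range(window, len(market) + 1):
        x, y = market[end - window:end], asset[end - window:end]
        ts.append(theil_sen_slope(x, y))
        ols.append(np.polyfit(x, y, 1)[0])
    return np.array(ts), np.array(ols)

# Simulated daily returns with a true beta of 1 (illustrative only).
rng = np.random.default_rng(1)
n_days, window, crash_day = 260, 132, 200
market = rng.normal(0.0, 0.01, n_days)
asset = 1.0 * market + rng.normal(0.0, 0.01, n_days)
market[crash_day] = -0.20     # hypothetical one-day market crash
asset[crash_day] = -0.10      # hypothetical asset return on that day

ts, ols = rolling_betas(market, asset, window)
first = crash_day + 1 - window     # first window that contains the crash
print("OLS jump:      ", ols[first] - ols[first - 1])
print("Theil-Sen jump:", ts[first] - ts[first - 1])
```

The single crash observation dominates the OLS sums of squares and cross-products, so the OLS beta moves sharply when it enters the window, whereas it contributes only 131 of the 8,646 pairwise slopes that determine the Theil-Sen estimate.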

Figure 3 OLS vs. Theil-Sen Estimates of Beta: July 1, 1987, to December 31, 1987

While this is clearly an extreme example of
a single outlier corrupting an estimate of beta,
outliers in financial data are far more common
than is usually assumed, and they are not easily
detected, as they influence many classical
estimators in a way that masks their presence. One
popular method of identifying and removing
outliers is to remove points that lie more than
three sample standard deviations away from
the regression line.

Unfortunately, outliers can so distort the slope 
and intercept of the regression line, as well as 
the sample standard deviation of the errors, 
that all the points, including the outliers, will 
be found to lie within three sample standard 
deviations of the regression line! In general, fil¬ 
tering data using classical estimators to identify 
outliers works poorly in practice, and it proves 
far more effective to use estimators that are in¬ 
herently robust to outliers. 

The Theil-Sen estimate of beta can be fur¬ 
ther adjusted for the effects of nonsynchronous 
trading using the Scholes-Williams (1977) or 
Dimson (1979) corrections and can be shrunk 
cross-sectionally using a Bayesian correction 
as is done in Vasicek (1973). In each case, the 
Theil-Sen estimates of beta will provide a more 
robust point from which to start building an 
enhanced estimate of beta. 
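As a rough illustration of the two adjustments just named, whose formulas are equations (12) to (14), the sketch below sums univariate Theil-Sen slopes on contemporaneous and lagged market returns and then shrinks the result toward a cross-sectional mean. It assumes NumPy; the function names, the simulated returns, and the variance inputs are hypothetical, and summing univariate slopes is one reading of the description in the text (Dimson's original formulation uses a multiple regression on the lagged returns).

```python
import numpy as np

def theil_sen_slope(x, y):
    """Median of all pairwise slopes, per equation (9)."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    return np.median((y[j][keep] - y[i][keep]) / dx[keep])

def dimson_beta(market, asset, k=4):
    """Equation (12): sum of slopes of asset_t on market_{t-l}, l = 0..k."""
    n = len(market)
    return sum(theil_sen_slope(market[k - l:n - l], asset[k:])
               for l in range(k + 1))

def vasicek_beta(beta, var_beta, beta_avg, var_cross):
    """Equations (13) and (14): shrink beta toward the cross-sectional mean."""
    w = var_cross / (var_beta + var_cross)
    return w * beta + (1.0 - w) * beta_avg

# Simulated returns: the asset responds to today's and yesterday's market move.
rng = np.random.default_rng(2)
market = rng.standard_normal(800)
asset = 0.6 * market + 0.1 * rng.standard_normal(800)
asset[1:] += 0.3 * market[:-1]

beta_dimson = dimson_beta(market, asset, k=1)
# Equal variances give w = 0.5, the simple implementation mentioned in the text.
beta_shrunk = vasicek_beta(beta_dimson, var_beta=0.04, beta_avg=1.0, var_cross=0.04)
print(beta_dimson, beta_shrunk)
```

With a contemporaneous response of 0.6 and a one-day lagged response of 0.3, the summed beta should come out close to 0.9.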

The Dimson correction sums contemporaneous
and lagged betas for the asset to create an
overall beta that accounts for the fact that an
asset may have both a contemporaneous and a
lagged response to market shocks, that is,

$$\beta^{Dimson}_{Y|X} = \sum_{l=0}^{k} \beta_{Y_t|X_{t-l}} \qquad (12)$$

When using daily data, k is commonly set to 4
(i.e., one week's data), and when using monthly
data, it is most commonly set to 1, so as not to
pick up spurious responses from shocks in the
distant past.

Vasicek's (1973) correction is a Bayesian
correction, which allows the user to reflect
information gleaned from the (known) cross-sectional
distribution of betas to enhance an
unconditional estimate of beta. In particular, Vasicek
(1973) assumes that the prior distribution of
betas is normal and shows that the maximum a
posteriori estimate of beta is a linear
combination of its initial estimate and its cross-sectional
mean, that is,

$$\beta^{Vasicek}_{Y|X} = w_Y \times \beta_{Y|X} + (1 - w_Y) \times \beta_{average} \qquad (13)$$

where

$$w_Y = \frac{\sigma^2_{Cross\text{-}Sectional}}{\sigma^2_{\beta(Y|X)} + \sigma^2_{Cross\text{-}Sectional}} \qquad (14)$$

$\sigma^2_{\beta(Y|X)}$ is the variance of $\beta_{Y|X}$, and $\sigma^2_{Cross\text{-}Sectional}$ is
the cross-sectional variance of the betas of the
entire universe of securities under
consideration at this point in time. A particularly
simple and reasonably effective implementation of
this method sets $w_Y = 0.5$ for all assets and at
all points in time. Both techniques see use in
the enhanced estimation of beta across a wide
range of asset classes in Frazzini and Pedersen
(2010).


ROBUST ESTIMATES OF CORRELATION

To derive a robust estimate of the correlation
coefficient, we rewrite and re-interpret the
expression for the correlation coefficient in a novel
way, and then show how it can be estimated
using two Theil-Sen regressions. Recall the
definition of the correlation between two random
variables X and Y:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \times \sigma_Y} \qquad (15)$$


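where Cov(X, Y) is the covariance between X
and Y, and $\sigma_X$ and $\sigma_Y$ are the standard
deviations of X and Y, respectively. This expression
can be rewritten as

$$\rho_{X,Y} = \sqrt{\frac{\mathrm{Cov}(X, Y)^2}{\sigma_X^2 \times \sigma_Y^2}} = \sqrt{\frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} \times \frac{\mathrm{Cov}(X, Y)}{\sigma_Y^2}} = \sqrt{\beta_{Y|X} \times \beta_{X|Y}} \qquad (16)$$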
Table 3 Distribution of Theil-Sen Estimates of Correlation vs. Standard Maximum Likelihood Estimate: Normally
Distributed Random Variables

Percentiles                             1       5      10      25      50      75      90      95      99
Theil-Sen correlation               -0.24   -0.17   -0.14   -0.07       0    0.07    0.13    0.18    0.25
Maximum Likelihood correlation      -0.23   -0.17   -0.13   -0.07       0    0.07    0.13    0.16    0.23


Factored in this way, the correlation
coefficient stands revealed as the geometric mean
of two betas and is interpreted as follows. If
a causal linear relationship runs from X to Y
(i.e., if X causes Y), the logical quantity to focus
on is $\beta_{Y|X}$. Likewise, if a causal linear
relationship runs from Y to X (i.e., if Y causes X), the
logical quantity to focus on is $\beta_{X|Y}$.

But when we don't know which way the cau¬ 
sation flows, or even if the relationship is lin¬ 
ear, we throw our hands up, take the geometric 
mean of these two betas, and call this quantity 
the correlation coefficient! For jointly normally 
distributed random variables, the correlation 
coefficient fully captures and encapsulates their 
covariation. For all other distributions, it serves 
merely as a useful shortcut that measures their 
covariation in a standardized way, as evidenced 
by the fact that its value is bounded between
-1 and 1.

The application of the Theil-Sen regression to
the robust estimation of correlation is now
obvious. Given two random vectors, X and Y, first
regress X on Y, and then regress Y on X, using
the Theil-Sen regression both times. The
geometric mean of the two slopes is a robust
estimate of the correlation coefficient, that is,

$$\rho^{Robust}_{X,Y} = \sqrt{\beta^{Theil\text{-}Sen}_{Y|X} \times \beta^{Theil\text{-}Sen}_{X|Y}} \qquad (17)$$
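Equation (17) can be sketched in a few lines. The square root alone discards the sign of the relationship, so the sketch below makes three choices that are assumptions of this illustration rather than part of the source: it takes the sign from the slope of Y on X, returns 0 when the two slopes disagree in sign, and clips the result to [-1, 1]. The names `theil_sen_slope` and `robust_correlation` and the simulated data are likewise illustrative.

```python
import numpy as np

def theil_sen_slope(x, y):
    """Median of all pairwise slopes of y on x, per equation (9)."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    return np.median((y[j][keep] - y[i][keep]) / dx[keep])

def robust_correlation(x, y):
    """Signed geometric mean of the two Theil-Sen slopes, per equation (17)."""
    b_yx = theil_sen_slope(x, y)      # regress Y on X
    b_xy = theil_sen_slope(y, x)      # regress X on Y
    prod = b_yx * b_xy
    if prod <= 0:                     # slopes disagree in sign (assumption: report 0)
        return 0.0
    return float(np.clip(np.sign(b_yx) * np.sqrt(prod), -1.0, 1.0))

# Simulated bivariate normal data with a true correlation of 0.6.
rng = np.random.default_rng(3)
x = rng.standard_normal(600)
y = 0.6 * x + 0.8 * rng.standard_normal(600)
rho_clean = robust_correlation(x, y)

# Contaminate five observations with gross outliers.
y_bad = y.copy()
y_bad[:5] = 50.0
rho_bad = robust_correlation(x, y_bad)
pearson_bad = np.corrcoef(x, y_bad)[0, 1]
print(rho_clean, rho_bad, pearson_bad)
```

On the contaminated sample the robust estimate should stay close to 0.6, while the classical sample correlation is dragged toward zero by the five corrupted points.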

When the random vectors are drawn from a
normal distribution and are not corrupted by
noise, we expect that this approach will work
just as well as the standard maximum
likelihood estimator. In the presence of outliers, or
if the distribution of X and Y is highly skewed, it
ought to do much better. And so it is, as the data
in Tables 3 and 4 demonstrate.

Table 3 compares the performance of equation
(17) to the standard maximum likelihood
estimator when X and Y are drawn from a normal
distribution, while Table 4 performs an
identical comparison for Pareto(2) random variables.
Both tables are created by extending the
simulations used to illuminate the performance of the
Theil-Sen regression to compute correlations as
well.

The results follow the pattern established in 
Tables 1 and 2 for the slope coefficient. When X 
and Y are drawn from a normal distribution, the 
distribution of the Theil-Sen estimate of corre¬ 
lation is essentially identical to that of the max¬ 
imum likelihood estimate; and when they are 
drawn from a Pareto(2) distribution, the Theil- 
Sen estimate of correlation is far more stable 
than the maximum likelihood estimate. Similar 
results are obtained when either X or Y (or both) 
are contaminated with noise (i.e., with outliers). 

Table 4 Distribution of Theil-Sen Estimates of Correlation vs. Standard Maximum Likelihood Estimate: Pareto(2)
Distributed Random Variables

Percentiles                             1       5      10      25      50      75      90      95      99
Theil-Sen correlation               -0.11   -0.07   -0.05   -0.03       0    0.03    0.06    0.08    0.13
Maximum Likelihood correlation      -0.15   -0.11    -0.1   -0.06   -0.02    0.04    0.12    0.19    0.37

It is a short step from estimating individual
correlations to estimating correlation matrices,
and the repeated use of the Theil-Sen
estimator across a set of random variables gives us a
computationally inefficient but robust estimate
of a correlation matrix $\rho$, whose $i,j$th element
is denoted by $\rho_{ij}$ and whose diagonal elements
are all 1. Unfortunately, there is no guarantee
that this correlation matrix will be nonnegative
definite.

This, however, is no cause for alarm. It is
relatively easy to transform this matrix into a
nearby nonnegative definite correlation matrix
$\rho^*$. Ideally, the transformation will minimally
distort $\rho$, and the many available solutions to
this problem differ largely in the metric (or
norm) that they use to measure the distance
between $\rho$ and $\rho^*$. In general, they solve the
following optimization problem:

Minimize $\|\rho^* - \rho\|$, s.t. $\rho^*$ is a nonnegative
definite correlation matrix. (18)

Lindskog (2000), Rousseeuw and Molenberghs
(1993), and Higham (2002) describe a
number of different ways in which the nearest
correlation matrix can be identified using both
linear and nonlinear transformations of $\rho$. The
method that seems to work best in practice is the
iterative method described by Higham (2002),
which iteratively identifies the closest valid
correlation matrix under a Frobenius norm (the
square root of the sum of squared
element-by-element differences)
by factoring the correlation matrix in a
particular way, forcing its negative eigenvalues to
0, then recombining its constituent pieces and
forcing its diagonal elements to 1. The
algorithm is described here for the sake of
completeness and can be found in the NAG Fortran
and C Libraries, as well as the NAG Toolbox for
Matlab.

We first define two operators, $P_S(A)$ and
$P_U(A)$, that can be applied to any symmetric
matrix A. As A is symmetric, it admits a spectral
decomposition $A = QDQ^T$, where Q is
orthogonal, and $D = \mathrm{diag}(\lambda_i)$ is a square matrix whose
diagonal elements are the eigenvalues of A, and
whose off-diagonal elements are 0. $P_S(A)$ and
$P_U(A)$ are defined via

$$P_S(A) = Q D^* Q^T, \quad D^*_{ii} = \max(D_{ii}, 0), \text{ and} \qquad (19)$$

$$P_U(A): \text{ set } A_{ii} = 1, \text{ i.e., replace all diagonal elements of } A \text{ by 1.} \qquad (20)$$


The algorithm proceeds as follows, with both
$X_k$ and $Y_k$ converging linearly to $\rho^*$:

Algorithm H (Higham, 2002)

1. $\Delta S_0 = 0$, $X_0 = I$, $Y_0 = \rho$, $k = 0$

2. While $\|Y_k - X_k\| > \varepsilon$, Do
   a. $k = k + 1$
   b. $R_k = Y_{k-1} - \Delta S_{k-1}$ (Dykstra's correction
      to speed convergence)
   c. $X_k = P_S(R_k)$
   d. $\Delta S_k = X_k - R_k$
   e. $Y_k = P_U(X_k)$

3. $\rho^* = Y_k$
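The steps of Algorithm H can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not the NAG routine: the function names, the tolerance, the iteration cap, and the test matrix are all hypothetical, and the convergence check is made at the end of each pass rather than at the top of a while loop.

```python
import numpy as np

def proj_psd(a):
    """P_S: rebuild the matrix after clipping negative eigenvalues to 0."""
    eigvals, q = np.linalg.eigh(a)
    return (q * np.maximum(eigvals, 0.0)) @ q.T

def proj_unit_diag(a):
    """P_U: set every diagonal element to 1."""
    out = a.copy()
    np.fill_diagonal(out, 1.0)
    return out

def nearest_correlation(rho, tol=1e-8, max_iter=1000):
    """Alternating projections with Dykstra's correction (Algorithm H)."""
    y = rho.copy()
    delta_s = np.zeros_like(rho)
    for _ in range(max_iter):
        r = y - delta_s               # step 2b
        x = proj_psd(r)               # step 2c
        delta_s = x - r               # step 2d
        y = proj_unit_diag(x)         # step 2e
        if np.linalg.norm(y - x, "fro") <= tol:
            break
    return y

# A symmetric matrix with unit diagonal that is not nonnegative definite.
rho = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
rho_star = nearest_correlation(rho)
print(np.round(rho_star, 4))
print(np.linalg.eigvalsh(rho_star))
```

The returned matrix has a unit diagonal by construction, and its smallest eigenvalue should be zero up to the chosen tolerance.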

It is but a short step from estimating a robust
nonnegative definite correlation matrix to
estimating a similarly robust nonnegative definite
covariance matrix. Given robust estimates of the
volatility of each security, say $\sigma_i^{Robust}$, we can
form a matrix whose diagonal elements are the
robust volatilities of the assets, and whose
off-diagonal elements are all 0, that is,

$$\Sigma = \begin{bmatrix} \sigma_1^{Robust} & 0 & \cdots & 0 \\ 0 & \sigma_2^{Robust} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_N^{Robust} \end{bmatrix} \qquad (21)$$

and we can then define a robust nonnegative
definite covariance matrix C via:

$$C = \Sigma \rho^* \Sigma \qquad (22)$$

If the correlation matrix is nonnegative
definite, the covariance matrix described in
equation (22) is nonnegative definite as well.
Rousseeuw and Croux (1993) describe a
number of robust estimators of volatility, their
preferred one being $Q_n(X)$, which is defined to be
2.222 times the 25th percentile of the set of
distances $\{|x_i - x_j|, i < j\}$. They explore the
properties of this estimator, which is similar in spirit
to the Hodges-Lehmann (1963) estimate of the
mean, show that its efficiency for the normal
distribution is high (82%), and that it is robust
to errors of arbitrary size in approximately half
the points.
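The volatility estimator and equation (22) can be sketched together. The code assumes NumPy and follows the description in the text literally, using an interpolated 25th percentile; Rousseeuw and Croux define the estimator through a specific order statistic and finite-sample corrections, which this illustration omits. The names `qn_scale` and `robust_covariance` and the example inputs are hypothetical.

```python
import numpy as np

def qn_scale(x):
    """2.222 times the 25th percentile of the pairwise distances |x_i - x_j|."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return 2.222 * np.percentile(np.abs(x[i] - x[j]), 25)

def robust_covariance(rho_star, vols):
    """Equation (22): C = Sigma rho* Sigma, with Sigma = diag(vols)."""
    sigma = np.diag(vols)
    return sigma @ rho_star @ sigma

# Simulated returns with a true volatility of 2, plus a few gross outliers.
rng = np.random.default_rng(4)
returns = rng.normal(0.0, 2.0, 1000)
returns[:10] = 100.0
vol_robust = qn_scale(returns)
vol_classical = returns.std()
print(vol_robust, vol_classical)

# Combine a (valid) correlation matrix with per-asset volatilities.
rho_star = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
cov = robust_covariance(rho_star, np.array([2.0, 3.0]))
print(cov)
```

The ten corrupted observations inflate the classical standard deviation several times over, while the pairwise-distance estimate should remain close to the true value of 2.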
The robust covariance matrix defined by
equation (22) can be used in a variety of
applications such as mean-variance portfolio analysis
and risk budgeting. It proves remarkably useful
in practice, as it reduces and often completely
eliminates the need for various constraints to
ensure positive solutions that accord with a
thoughtful portfolio manager's intuition.

KEY POINTS 

• The Theil-Sen regression algorithm is an 
extraordinarily simple, intuitive, and robust 
algorithm for performing univariate regres¬ 
sions. 

• The Theil-Sen estimator should be used 
routinely in place of OLS when perform¬ 
ing univariate regressions, and in place of 
the standard maximum likelihood estimator 
when estimating correlations. 

• The fact that the Theil-Sen estimator does not 
generalize naturally to multivariate regres¬ 
sion should not be held against it—the vast 
majority of regressions that are carried out in 
practice are univariate, and a wide range of 
robust algorithms that work well with multi¬ 
variate data are known. 

• The Theil-Sen regression algorithm can be 
used to obtain robust estimates of beta, which 
can be further enhanced by the application 
of Dimson's correction for nonsynchronous 
trading and Vasicek's Bayesian adjustment. 

• The robust estimates of correlation obtained 
from the Theil-Sen regression algorithm can 
be used as inputs to Higham's projection 
algorithm to estimate a nonnegative defi¬ 
nite correlation matrix. This nonnegative def¬ 
inite correlation matrix can be combined with 
Rousseeuw and Croux's robust estimator of 
volatility to estimate a nonnegative definite 
covariance matrix. This nonnegative definite 
covariance matrix is of particular use in a 
wide range of mean-variance portfolio opti¬ 
mization and risk budgeting applications, in¬ 
cluding, but not limited to, the construction 
of minimum variance portfolios. 
Abstract: High-frequency trading (HFT) has exploded into the popular press as a major development
affecting securities markets around the world. Unlike more established trading approaches 
that examine daily data and tactically rebalance portfolios every month or quarter, HFT parses trade- 
by-trade data at the highest speeds available. This typically implies that high-frequency traders 
monitor every tick of many securities concurrently and make their portfolio allocation decisions 
at lightning speeds with ultra-short investment horizons in mind. In fact, hedge fund managers 
consider strategies to be high frequency when their holding periods range from microseconds to 
several hours, without any positions held overnight. To process reams of data and make informed 
and rational decisions at such high speeds would be difficult even for the most accomplished 
traders. Thankfully, computer technology has evolved to become robust and inexpensive enough 
to aid any willing portfolio manager to take up the high-frequency craft. 


This entry examines high-frequency data, the 
particularities and opportunities they bring, 
and compares these data with their low- 
frequency counterparts, wherever appropriate. 
High-frequency trading (HFT) strategies by their 
nature use a different population of data, and 
the traditional methods of data analysis need 
to be adjusted accordingly. Specifically, this 
entry examines the topics of volume, time¬ 
spacing, and bid-ask-bounce inherent in the 
high-frequency data. 

WHAT ARE HIGH-FREQUENCY DATA?

High-frequency data, also known as "tick data,"
are a record of live market activity. Every time
a customer, a dealer, or another entity posts a
so-called limit order to buy s units of a specific
security with ticker X at price q, a bid quote $q^b_{t_b}$
is logged at time $t_b$ to buy $s^b_{t_b}$ units of X. (Market
orders are incorporated into tick data in a
different way as discussed below.) When the newly
arrived bid quote $q^b_{t_b}$ has the highest price
relative to all other previously arrived bid quotes
in force, $q^b_{t_b}$ becomes known as "the best bid"
available at time $t_b$. Similarly, when a trading
entity posts a limit order to sell s units of X at
price q, an ask quote $q^a_{t_a}$ is logged at time $t_a$ to
sell $s^a_{t_a}$ units of X. If the latest $q^a_{t_a}$ is lower than
all other available ask quotes for security X, $q^a_{t_a}$
becomes known as "the best ask" at time $t_a$.

What happens to quotes from the moment 
they arrive largely depends on the venue where 
the orders are posted. Best bids and asks posted 
directly on an exchange will be broadcast to all 
exchange participants and other parties track¬ 
ing quote data. In situations when the new best 
bid exceeds the best ask already in force on the
exchange, $q^b_{t_b} > q^a_{t_a}$, most exchanges will
immediately "match" such quotes, executing a trade
at the preexisting best ask, $q^a_{t_a}$, at time $t_b$.
Conversely, should the newly arrived best ask fall
below the current best bid, $q^a_{t_a} < q^b_{t_b}$, the trade
is executed at the preexisting best bid, $q^b_{t_b}$, at
time $t_a$.

Most dark pools match bids and asks "crossing 
the spread," but may not broadcast the newly 
arrived quotes (hence the mysterious moniker, 
the "dark pools"). Similarly, quotes destined for 
the interdealer networks may or may not be 
disseminated to other market participants, de¬ 
pending on the venue. 

Market orders contribute to high-frequency 
data in the form of "last trade" informa¬ 
tion. Unlike a limit order that is an order 
to buy a specified quantity of a security at 
a certain price, a market order is an or¬ 
der to buy a specified quantity of a security 
at the best price available at the moment the or¬ 
der is "posted" on the trading venue. As such, 
market orders are executed immediately at the 
best available bid or best ask prices, with each 
market buy order executed at the best ask and 
each market sell matched with the best bid, and 
the transaction is recorded in the quote data as 
the "last trade price" and the "last trade size." 

A large market order may need to be matched 
with one or several best quotes, generating sev¬ 
eral "last trade" data points. For example, if 
the newly arrived market buy order is smaller 
in size than that of the best ask, the best ask 
quote may still remain in force on most trading 
venues, but the best ask size will be reduced to 
reflect that the portion of the best ask quote has 
been matched with the market order. When the 
size of the incoming market buy order is big¬ 
ger than the size of the corresponding best ask, 
the market order consumes the best ask in its 
entirety, and then proceeds to be matched se¬ 
quentially with the next available best ask until 
the size of the market order is fulfilled. The re¬ 
maining lowest-priced ask quote becomes the 
best ask available on the trading venue. 
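
The sequential matching just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `match_market_buy`, the `(price, size)` tuple layout, and the example quotes are hypothetical, and real venues also apply time priority and other rules that are ignored here.

```python
from typing import List, Tuple

Quote = Tuple[float, int]  # (price, size); a hypothetical layout


def match_market_buy(asks: List[Quote], order_size: int):
    """Match a market buy order against resting ask quotes.

    Returns (trades, remaining_asks, unfilled). Each element of `trades`
    corresponds to one "last trade" data point (price, size).
    """
    book = sorted(asks)           # lowest ask first: book[0] is the best ask
    trades: List[Quote] = []
    remaining_asks: List[Quote] = []
    remaining = order_size
    for price, size in book:
        if remaining == 0:
            remaining_asks.append((price, size))   # untouched quote
            continue
        fill = min(size, remaining)
        trades.append((price, fill))               # one "last trade" print
        remaining -= fill
        if size > fill:
            # Quote only partially consumed: it stays in force, smaller.
            remaining_asks.append((price, size - fill))
    return trades, remaining_asks, remaining


# Made-up quotes: the order for 350 units consumes the best ask (200 units)
# entirely and part of the next best ask.
asks = [(100.02, 300), (100.01, 200), (100.03, 500)]
trades, book_after, unfilled = match_market_buy(asks, 350)
print(trades)      # [(100.01, 200), (100.02, 150)]
print(book_after)  # [(100.02, 150), (100.03, 500)]
```

After the order is filled, the first entry of `book_after` is the new best ask, with its size reduced by the matched portion.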


Most limit and market orders are placed in so-called "lot sizes":
increments of a certain number of units, known as a lot. In foreign
exchange, a standard trading lot today is US$5 million, a considerable
reduction from a minimum of $25 million entertained by high-profile
brokers just a few years ago. On equity exchanges, a lot can be as low
as one share, but dark pools may still enforce a 100-share minimum
requirement for orders. An order for an amount other than an integer
multiple of a lot size is called an "odd lot."

Small limit and market "odd lot" orders 
posted through a broker-dealer may be aggre¬ 
gated, or "packaged," by the broker-dealer into 
larger-size orders in order to obtain volume dis¬ 
counts at the orders' execution venue. In the 
process, the brokers may "sit" on quotes with¬ 
out transmitting them to an executing venue, 
delaying execution of customers' orders. 

HOW ARE HIGH-FREQUENCY 
DATA RECORDED? 

The highest-frequency data are a collection of 
sequential "ticks," arrivals of the latest quote, 
trade, price, order size, and volume infor¬ 
mation. Tick data usually have the following 
properties: 

• A timestamp 

• A financial security identification code 

• An indicator of what information it carries 

• Bid price 

• Ask price 

• Available bid size 

• Available ask size 

• Last trade price 

• Last trade size 

• Security-specific data, such as implied volatil¬ 
ity for options 

• The market value information, such as the 
actual numerical value of the price, available 
volume, or size 
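
As a rough illustration of how these properties might be held in memory, the sketch below defines a single tick record. The field names, the comma-separated input layout, and the sample values are assumptions made for this example only; actual data vendors each use their own formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Tick:
    timestamp: datetime           # when the quote or trade was recorded
    symbol: str                   # security identification code
    kind: str                     # what the tick carries: "bid", "ask", "trade"
    price: float                  # bid, ask, or last trade price
    size: int                     # available size or last trade size
    extra: Optional[dict] = None  # security-specific data, e.g. implied vol


def parse_tick(line: str) -> Tick:
    """Parse one line of a hypothetical 'time,symbol,kind,price,size' feed."""
    ts, symbol, kind, price, size = line.strip().split(",")
    return Tick(
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"),
        symbol=symbol,
        kind=kind,
        price=float(price),
        size=int(size),
    )


# Made-up sample line, not taken from any real feed.
tick = parse_tick("2009-11-09 14:00:16.400,SPY,bid,109.71,3700")
print(tick.timestamp.microsecond)  # 400000, i.e. 400 milliseconds
```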

A timestamp records the date and time at which the quote originated. It
may be the time at which the exchange or the broker-dealer released the
quote, or the time when the trading system has received the quote. At
the time this entry is written, the standard "round-trip" travel time
of an order quote from the ordering customer to the exchange and back
to the customer with the acknowledgement of order receipt is 15
milliseconds or less in New York. Brokers have been known to be fired
by their customers if they are unable to process orders at this now
standard speed. Sophisticated quotation systems, therefore, include
milliseconds and even microseconds as part of their timestamps.

Another part of the quote is an identifier of 
the financial security. In equities, the identifica¬ 
tion code can be a ticker, or, for tickers simulta¬ 
neously traded on multiple exchanges, a ticker 
followed by the exchange symbol. For futures, 
the identification code can consist of the un¬ 
derlying security, futures expiration date, and 
exchange code. 

The last trade price shows the price at which 
the last trade in the security cleared. Last trade 
price can differ from the bid and ask. The differ¬ 
ences can arise when a customer posts a favor¬ 
able limit order that is immediately matched by 
the broker without broadcasting the customer's 
quote. Last trade size shows the actual size of 
the last executed trade. 

The best bid is the highest price at which the security can currently
be sold in the market. The best ask is the lowest price at which the
security can be bought at any particular time. In addition to the best
bid and best ask, quotation systems may disseminate "market depth"
information: the bid and ask quotes posted on the trading venue at
prices worse than the best bid and ask, as well as aggregate order
sizes corresponding to each bid and ask recorded on the trading venue's
"books." Market depth information is sometimes referred to as Level II
data and may be disseminated as a premium subscription service only. In
contrast, the best bid, best ask, last trade price, and size
information ("Level I data") is often available for a small nominal fee.


Panels (a) and (b) of Figure 1 illustrate a nearly two-minute log of
Level I high-frequency data recorded by NYSE Arca for SPDR S&P 500 ETF
(ticker SPY) from 14:00:16:400 to 14:02:00:000 GMT on November 9, 2009.
Panel (a) shows quote data: best bid, best ask, and last trade
information, while panel (b) displays corresponding position sizes
(best bid size, best ask size, and last trade size).

PROPERTIES OF 
HIGH-FREQUENCY DATA 

High-frequency securities data have been studied for many years. Yet,
the concept is still something of a novelty to many academics and
practitioners. Unlike daily or monthly data sets commonly used in much
of financial research and related applications, high-frequency data
have distinct properties, which simultaneously can be advantageous and
intimidating to researchers. Table 1 summarizes the properties of
high-frequency data. Each property, its advantages, and its
disadvantages are discussed in detail later in the entry.

HIGH-FREQUENCY DATA 
ARE VOLUMINOUS 

The nearly two-minute sample of tick data for SPDR S&P 500 ETF (ticker
SPY) shown in Figure 1 contained over 100 observations of Level I data:
best bid quotes and sizes, best ask quotes and sizes, and last trade
prices and sizes. Table 2 summarizes the breakdown of the data points
provided by NYSE Arca for SPY from 14:00:16:400 to 14:02:00:000 GMT on
November 9, 2009, and SPY, Japanese yen futures, and a euro call option
throughout the day on November 9, 2009. Other Level I data omitted from
Table 2 include cumulative daily trade volume for SPY and Japanese yen
futures, and "Greeks" for the euro call option. The number of quotes
observed on November 9, 2009, for SPY alone would comprise over 160
years of daily open, high, low, close, and volume data points, assuming
an average of 252 trading days per year.

Figure 1 Level I High-Frequency Data Recorded by NYSE Arca for SPDR S&P 500 ETF (ticker SPY)
from 14:00:16:400 to 14:02:00:000 GMT on November 9, 2009
Panel (a): Best Bid, Best Ask, and Last Trade Data
Panel (b): Bid Size, Ask Size, and Last Trade Size
The quality of data does not always match its quantity. Centralized
exchanges generally provide accurate data on bids, asks, and the volume
of any trades. The information on the limit order book is less commonly
available. In decentralized markets, such as foreign exchange and the
interbank money market, no market-wide quotes are available at any
given time. In such markets, participants are aware of the current
price levels, but each institution quotes its own prices adjusted for
its order book. In decentralized markets, each dealer provides his or
her own tick data to clients. As a result, a specific quote on a given
financial instrument at any given time may vary from dealer to dealer.
Reuters, Telerate, and Knight Ridder, among others, collect quotes from
different dealers and disseminate them back, improving the efficiency
of the decentralized markets.

Table 1 Summary of Properties of High-Frequency Data

Property of HF Data: Voluminous
  Description: Each day of high-frequency data contains the number of
    observations equivalent to 30 years of daily data.
  Pros: Large numbers of observations carry lots of information.
  Cons: High-frequency data are difficult to handle manually.

Property of HF Data: Irregularly spaced in time
  Description: Arrival of tick data is asynchronous.
  Pros: Durations between data arrivals carry information.
  Cons: Most traditional models require regularly spaced data; need to
    convert high-frequency data to some regular intervals, or "bars" of
    data. Converted data is often sparse (populated with zero returns),
    once again making traditional econometric inferences difficult.

Property of HF Data: Subject to bid-ask bounce
  Description: Unlike traditional data based on just closing prices,
    tick data carries additional supply and demand information in the
    form of bid and ask prices and offering sizes.
  Pros: Bid and ask quotes can carry valuable information about
    impending market moves and can be harnessed to researcher's
    advantage.
  Cons: Bid and ask quotes are separated by a spread. Continuous
    movement from bid to ask and back introduces a jump process,
    difficult to deal with through many conventional models.

There are generally thought to be three anomalies in interdealer quote
discrepancies. First, each dealer's quotes reflect that dealer's own
inventory. For example, a dealer that has just sold a customer $100
million of USD/CAD would be eager to diversify the risk of the position
and avoid selling any more of USD/CAD. Most dealers are, however,
obligated to transact with their clients on tradable quotes. To incite
clients to place sell orders on USD/CAD, the dealer temporarily raises
the bid quote on USD/CAD. At the same time, to encourage clients to
withhold placing buy orders, the dealer raises the ask quote on
USD/CAD. Thus, dealers tend to raise both bid and ask prices whenever
they are short in a particular financial instrument and lower both bid
and ask prices whenever they are disproportionally long in a financial
instrument.


Table 2 Summary Statistics for Level I Quotes for Selected Securities on November 9, 2009

Quote Type        SPY, 14:00:16:400    SPY, all day    USD/JPY Dec 2009    EUR/USD Call Expiring
                  to 14:02:00:000                      Futures, all day    Dec 2009 with Strike
                  GMT                                                      Price of 1.5100, all day
Best Bid Quote    4 (3%)               5,467 (3%)      6,320 (5%)          1,521 (3%)
Best Bid Size     36 (29%)             38,948 (19%)    39,070 (32%)        5,722 (11%)
Best Ask Quote    4 (3%)               4,998 (2%)      6,344 (5%)          1,515 (3%)
Best Ask Size     35 (28%)             38,721 (19%)    38,855 (32%)        5,615 (11%)
Last Trade Price  6 (5%)               9,803 (5%)      3,353 (3%)          14 (0%)
Last Trade Size   20 (16%)             27,750 (14%)    10,178 (8%)         25 (0%)
Total             125                  203,792         123,216             49,982
Figure 2 Average Hourly Bid-Ask Spread on EUR/USD Spot for the Last Two Weeks of October 2008
on a Median Transaction Size of USD 5 million
Source: Aldridge (2009).


Second, in an anonymous marketplace, such as a dark pool, dealers as
well as other market makers may "fish" for market information by
sending indicative quotes that are far from the previously quoted price
to assess the available demand or supply.

Third, Dacorogna et al. (2001) note that some 
dealers' quotes may lag real market prices. The 
lag is thought to vary from milliseconds to a 
minute. Some dealers quote moving averages 
of quotes of other dealers. The dealers who 
provide delayed quotes usually do so to ad¬ 
vertise their market presence in the data feed. 
This was particularly true when most order 
prices were negotiated over the telephone, al¬ 
lowing a considerable delay between quotes 
and orders. Fast-paced electronic markets dis¬ 
courage lagged quotes, improving the quality of 
markets. 

HIGH-FREQUENCY DATA 
ARE SUBJECT TO BID-ASK 
BOUNCE 

In addition to trade price and volume data long available in
low-frequency formats, high-frequency data comprise bid and ask quotes
and the associated order sizes. Bid and ask data arrive asynchronously
and introduce noise in the quote process.

The difference between the bid quote and the ask quote at any given
time is known as the bid-ask spread. The bid-ask spread is the cost of
instantaneously buying and selling the security. The higher the bid-ask
spread, the higher the gain the security must produce in order to cover
the spread along with other transaction costs. Most low-frequency price
changes are large enough to make the bid-ask spread negligible in
comparison. In tick data, on the other hand, incremental price changes
can be comparable to or smaller than the bid-ask spread.

Bid-ask spreads usually vary throughout the day. Figure 2 illustrates
the average bid-ask spread cycles observed in the institutional EUR/USD
market for the last two weeks of October 2008. As Figure 2 shows, the
average spread increases significantly during Tokyo trading hours when
the market is quiet. The spread then reaches its lowest levels during
the overlap of the London and New York trading sessions when the market
has many active buyers and sellers. The spike in the spread over the
weekend of October 18-19, 2008, reflects the market concern over the
subpoenas issued on October 17, 2008, to senior Lehman executives in a
case relating to potential securities fraud at Lehman Brothers.

Figure 3 Comparison of Average Bid-Ask Spreads for Different Hours of the Day during Normal
Market Conditions and Crisis Conditions (horizontal axis: Hour of the Day, GMT)

Bid-ask spreads typically increase during periods of market uncertainty
or instability. Figure 3, for example, compares average bid-ask spreads
on EUR/USD in the stable market conditions of July-August 2008 and the
crisis conditions of September-October 2008. As the figure shows, the
intraday spread pattern is persistent in both crisis and normal market
conditions, but the spreads are significantly higher during crisis
months than during normal conditions at all hours of the day. As Figure
3 also shows, the spread increase is not uniform at all hours of the
day. The average hourly EUR/USD spreads increased by 0.0048% (0.48
basis points or pips) between the hours of 12 GMT and 16 GMT, when the
London and New York trading sessions overlap. From 0 to 2 GMT, during
the Tokyo trading hours, the spread increased by 0.0156%, over three
times the average increase during the New York/London hours.

As a result of increasing bid-ask spreads during periods of uncertainty
and crises, the profitability of high-frequency strategies decreases
during those times. For example, high-frequency EUR/USD strategies
running over Asian hours incurred significantly higher costs during
September and October 2008 as compared with normal market conditions. A
strategy that executed 100 trades during Asian hours alone resulted in
1.56% evaporating from daily profits due to the increased spreads,
while the same strategy running during London and New York hours
resulted in a smaller but still significant daily profit decrease of
0.48%. The situation can be even more severe for high-frequency
strategies built for less liquid instruments. For example, bid-ask
spreads for NZD/USD (not shown) on average tripled during
September-October in comparison with market conditions of July-August
2008.
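
The profit figures quoted for the 100-trade strategy follow from multiplying the number of trades by the per-trade spread increase. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes that every trade bears the full average spread increase for its session.

```python
def added_spread_cost_pct(trades_per_day: int, spread_increase_pct: float) -> float:
    """Extra daily cost, in percent, when each trade pays a wider spread."""
    return trades_per_day * spread_increase_pct


# Spread increases reported in the text for EUR/USD, crisis vs. normal months.
asian_hours = added_spread_cost_pct(100, 0.0156)      # Tokyo session
overlap_hours = added_spread_cost_pct(100, 0.0048)    # London/New York overlap

print(round(asian_hours, 2))    # 1.56
print(round(overlap_hours, 2))  # 0.48
```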

While tick data carries information about market dynamics, it is also
distorted by the same processes that make the data so valuable in the
first place. Dacorogna et al. (2001) report that sequential trade price
bounces between the bid and ask quotes during market execution of
orders introduce significant distortions into estimation of
high-frequency parameters. Corsi, Zumbach, Muller, and Dacorogna
(2001), for example, show that the bid-ask bounce introduces a
considerable bias into volatility estimates. The authors calculate that
the bid-ask bounce on average results in a first-order autocorrelation
of tick data of about -40%. Corsi et al. (2001) as well as Voev and
Lunde (2007) propose to remedy the bias by filtering the bid-ask noise
out of the data prior to estimation.

To use standard econometric techniques in the presence of the bid-ask
bounce, many practitioners convert the tick data to "mid-quote" format:
the simple average of the latest bid and ask quotes. The mid-quote is
used to approximate the price level at which the market is
theoretically willing to trade if buyers and sellers agreed to meet
each other halfway on the price spectrum. Mathematically, the mid-quote
can be expressed as follows:

$$\hat{q}^m_{t_m} = \frac{1}{2}\left(q^a_{t_a} + q^b_{t_b}\right), \quad \text{where } t_m = \begin{cases} t_a, & \text{if } t_a > t_b \\ t_b, & \text{otherwise} \end{cases} \qquad (1)$$

The latter condition for $t_m$ reflects the continuous updating of the
mid-quote estimate: $\hat{q}^m_{t_m}$ is updated whenever the latest
best bid, $q^b_{t_b}$, or best ask quote, $q^a_{t_a}$, arrives, at
$t_b$ or $t_a$ respectively.

Another way to sample tick quotes into a cohesive data series is by
weighting the latest best bid and best ask quotes by their accompanying
order sizes:

$$\tilde{q}^s_t = \frac{q^b_{t_b} s^b_{t_b} + q^a_{t_a} s^a_{t_a}}{s^b_{t_b} + s^a_{t_a}} \qquad (2)$$

where $q^b_{t_b}$ and $s^b_{t_b}$ are the best bid quote and the best
bid available size recorded at time $t_b$ (when $q^b_{t_b}$ became the
best bid), and $q^a_{t_a}$ and $s^a_{t_a}$ are the best ask quote and
the best ask available size recorded at time $t_a$.
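
The two aggregation schemes can be sketched as follows. The code assumes, following the wording of the text, that each quote is weighted by its own accompanying size; some practitioners instead weight each quote by the size on the opposite side of the book. The function names, the tuple layout, and the sample ticks are hypothetical.

```python
def mid_quote(bid: float, ask: float) -> float:
    """Simple average of the latest best bid and best ask."""
    return 0.5 * (bid + ask)


def size_weighted_mid_quote(bid: float, bid_size: float,
                            ask: float, ask_size: float) -> float:
    """Average of best bid and best ask, each weighted by its own size."""
    return (bid * bid_size + ask * ask_size) / (bid_size + ask_size)


def aggregate(ticks):
    """Update both estimates whenever a new best bid or best ask arrives.

    `ticks` is an iterable of (time, side, price, size) with side "bid" or
    "ask". Returns a list of (time, mid_quote, size_weighted_mid_quote).
    """
    bid = ask = None
    out = []
    for t, side, price, size in ticks:
        if side == "bid":
            bid = (price, size)
        else:
            ask = (price, size)
        if bid is not None and ask is not None:
            out.append((t,
                        mid_quote(bid[0], ask[0]),
                        size_weighted_mid_quote(bid[0], bid[1], ask[0], ask[1])))
    return out


# Made-up quotes for illustration only.
ticks = [(0.0, "bid", 100.00, 300),
         (0.4, "ask", 100.02, 100),
         (1.1, "bid", 100.01, 100)]
series = aggregate(ticks)
for row in series:
    print(row)
```

In this example the first output row is stamped 0.4, the arrival time of the later of the two quotes, which matches the rule for updating the mid-quote time.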

Figure 5 compares the histograms of simple returns computed from
mid-quote (panel a), size-weighted mid-quote (panel b), and trade-price
(panel c) processes for SPDR S&P 500 ETF data recorded as they arrive
throughout November 9, 2009. The data neglect the time difference
between the adjacent quotes, treating each sequential quote as an
independent observation. Figure 6 contrasts the quantile distribution
plots of the same data sets with the quantiles of a standard normal
distribution.

As Figures 4 and 5 show, the basic mid-quote distribution is
constrained by the minimum "step size": The minimum changes in the
mid-quote can occur at half-tick increments (at present, the minimum
tick size is $0.01 in equities). The size-weighted mid-quote forms the
most continuous distribution among the three distributions discussed.
Figure 6 confirms this notion further and also illustrates the fat
tails present in all three types of data distributions.

In addition to real-time adjustments to bid-ask data, researchers
deploy forecasting techniques to estimate the impending bid-ask spread
and adjust for it in models ahead of time. Future realizations of the
bid-ask spread can be estimated using the model suggested by Roll
(1984), where the price of an asset at time $t$, $p_t$, is assumed to
equal an unobservable fundamental value, $m_t$, offset by a value equal
to half of the bid-ask spread, $s$. The price offset is positive when
the next market order is a buy, and negative when the trade is a sell,
as shown in equation (3):

$$p_t = m_t + \frac{s}{2} I_t, \quad \text{where } I_t = \begin{cases} 1, & \text{market buy at ask} \\ -1, & \text{market sell at bid} \end{cases} \qquad (3)$$

If either a buy or a sell order can arrive next with equal probability,
then $E[I_t] = 0$, and $E[\Delta p_t] = 0$, absent changes in the
fundamental asset value, $m_t$. The covariance of subsequent price
changes, however, is different from 0:

$$\operatorname{cov}[\Delta p_t, \Delta p_{t+1}] = E[\Delta p_t \Delta p_{t+1}] = -\frac{s^2}{4} \qquad (4)$$

As a result, the future expected spread can be estimated as follows:

$$E[s] = 2\sqrt{-\operatorname{cov}[\Delta p_t, \Delta p_{t+1}]} \quad \text{whenever } \operatorname{cov}[\Delta p_t, \Delta p_{t+1}] < 0$$
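
A minimal sketch of the Roll (1984) estimator described above, checked on simulated trades. The simulation assumes a constant fundamental value and independent, equally likely buys and sells; the function name `roll_spread`, the random seed, and all numerical inputs are illustrative choices rather than anything prescribed by the model.

```python
import math
import random


def roll_spread(prices):
    """Estimate the bid-ask spread from the first-order autocovariance
    of price changes. Returns None when the covariance is not negative,
    in which case the estimator is undefined."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    x, y = dp[:-1], dp[1:]                 # adjacent price changes
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    if cov >= 0:
        return None
    return 2.0 * math.sqrt(-cov)


# Simulate trade prices bouncing between bid and ask around a constant
# fundamental value (an assumption made purely for illustration).
random.seed(0)
true_spread = 0.02
fundamental = 100.0
prices = [fundamental + 0.5 * true_spread * random.choice((1, -1))
          for _ in range(20000)]

estimate = roll_spread(prices)
print(estimate)  # should be close to the true spread of 0.02
```

On real data the fundamental value moves as well, so the estimate is noisier, and the covariance can turn out positive, in which case the function returns `None`.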
Figure 4 Bid-Ask Aggregation Techniques on Data for SPDR S&P 500 ETF (ticker SPY) Recorded by
NYSE Arca on November 9, 2009, from 14:00:16:400 to 14:02:00:000 GMT


Numerous extensions of Roll's model have 
been developed to account for contemporary 
market conditions along with numerous other 
variables. Hasbrouck (2007) provides a good 
summary of the models. 

HIGH-FREQUENCY DATA 
ARE IRREGULARLY SPACED 
IN TIME 

Most modern computational techniques have 
been developed to work with regularly spaced 
data, presented in monthly, weekly, daily, 
hourly, or other consistent intervals. The tra¬ 
ditional reliance of researchers on fixed time 
intervals is due to: 

• Relative availability of daily data (newspa¬ 
pers have published daily quotes since the 
1920s). 

• Relative ease of processing regularly spaced 
data. 

• An outdated view that "whatever drove se¬ 
curity prices and returns, it probably did not 
vary significantly over short time intervals." 
(Goodhart and O'Hara, 1997, pp. 80-81) 


In contrast, high-frequency observations are 
separated by varying time intervals. One way 
to overcome the irregularities in the data is 
to sample it at certain predetermined periods 
of time—for example, every hour or minute. 
For example, if the data are to be converted 
from tick data to minute "bars," then under 
the traditional approach, the bid or ask price 
for any given minute would be determined as 
the last quote that arrived during that particu¬ 
lar minute. If no quotes arrived during a cer¬ 
tain minute, then the previous minute's closing 
prices would be taken as the current minute's 
closing prices, and so on. Figure 7, panel (a) il¬ 
lustrates this idea. This approach implicitly as¬ 
sumes that in the absence of new quotes, the 
prices stay constant, which does not have to be 
the case. 

Dacorogna et al. (2001) propose a potentially more precise way to
sample quotes: linear time-weighted interpolation between adjacent
quotes. At the core of the interpolation technique is an assumption
that at any given time, unobserved quotes lie on a straight line that
connects two neighboring observed quotes. Figure 7, panel (b)
illustrates linear interpolation sampling.

As shown in Figure 7, panels (a) and (b), the two quote-sampling
methods produce quite different results.

Mathematically, the two sampling methods can be expressed as follows:

Quote sampling using closing prices:

$$\hat{q}_t = q_{t,\text{last}} \qquad (5)$$

Quote sampling using linear interpolation:

$$\hat{q}_t = q_{t,\text{last}} + \left(q_{t,\text{next}} - q_{t,\text{last}}\right) \frac{t - t_{\text{last}}}{t_{\text{next}} - t_{\text{last}}} \qquad (6)$$

where $\hat{q}_t$ is the resulting sampled quote, $t$ is the desired
sampling time (start of a new minute, for example), $t_{\text{last}}$
is the timestamp of the last observed quote prior to the sampling time
$t$, $q_{t,\text{last}}$ is the value of the last quote prior to the
sampling time $t$, $t_{\text{next}}$ is the timestamp of the first
observed quote after the sampling time $t$, and $q_{t,\text{next}}$ is
the value of the first quote after the sampling time $t$.

Figure 5 Histograms of Simple Returns Computed from Mid-Quote (panel a), Size-Weighted
Mid-Quote (panel b), and Trade-Price (panel c) Processes for SPDR S&P 500 ETF Data Recorded as They
Arrive Throughout November 9, 2009

Figure 6 Quantile Plots of Simple Returns of Mid-Quote (panel a), Size-Weighted Mid-Quote
(panel b), and Trade-Price (panel c) Processes for SPDR S&P 500 ETF Data Recorded as They Arrive
Throughout November 9, 2009

Figure 7 Data-Sampling Methodologies

Figure 8 Mid-Quote "Closing Quotes" Sampled at 200 ms (left) and 15s Intervals
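
The two sampling rules in equations (5) and (6) can be sketched as follows. The function names and sample data are hypothetical, timestamps are assumed to be sorted in ascending order, and a quote stamped exactly at the sampling time is treated as the "last" quote, which is a choice made here for simplicity.

```python
from bisect import bisect_right


def sample_last(times, quotes, t):
    """Closing-price sampling: the last quote observed at or before t."""
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no quote observed at or before the sampling time")
    return quotes[i]


def sample_linear(times, quotes, t):
    """Linear time-weighted interpolation between the neighboring quotes."""
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no quote observed at or before the sampling time")
    if i == len(times) - 1:
        return quotes[i]          # no later quote yet: fall back to the last one
    t_last, t_next = times[i], times[i + 1]
    q_last, q_next = quotes[i], quotes[i + 1]
    return q_last + (q_next - q_last) * (t - t_last) / (t_next - t_last)


# Made-up mid-quotes stamped in seconds; sample at the 60-second mark.
times = [0.0, 20.0, 100.0]
quotes = [100.00, 100.04, 100.12]
closing = sample_last(times, quotes, 60.0)
interpolated = sample_linear(times, quotes, 60.0)
print(closing, interpolated)
```

Here the closing-price rule returns the quote from second 20, while interpolation lands halfway between the quotes at seconds 20 and 100. Note that interpolation needs the first quote after the sampling time, so it can be computed only after that quote has arrived.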

Figures 8 and 9 compare histograms of the mid-quote data sampled as
closing prices and interpolated at frequencies of 200 ms and 15s.
Figure 10 compares quantile plots of closing prices and interpolated
distributions. As Figures 8 and 9 show, frequently sampled
distributions are sparse, that is, they contain more zero returns than
distributions sampled at lower frequencies. At the same time, returns
computed from interpolated quotes are more continuous than closing
prices, as Figure 10 illustrates.

Instead of manipulating the interquote intervals into convenient
regularly spaced formats, several researchers have studied whether the
time distance between subsequent quote arrivals itself carries
information. For example, most researchers agree that intertrade
intervals indeed carry information on securities for which short sales
are disallowed; the lower the intertrade duration, the more likely the
yet-to-be-observed good news and the higher the impending price change.

Figure 9 Mid-Quote "Time-Interpolated Quotes" Sampled at 200 ms (left) and 15s Intervals

Figure 10 Quantile Plots: Closing Prices vs. Interpolated Mid-Quotes Sampled at 200 ms

Duration models are used to estimate the factors affecting the time
between any two sequential ticks. When the ticks are quotes or trades,
such models are known as quote processes and trade processes,
respectively. Duration models are also used to measure the time elapsed
between price changes of a prespecified size, as well as the time
interval between predetermined trade volume increments. The models
working with fixed price increments are known as price processes; the
models estimating variation in duration of fixed volume increments are
known as volume processes.

Durations are often modeled using Poisson processes that assume that
sequential events, like quote arrivals, occur independently of one
another. The number of arrivals between any two time points $t$ and
$(t + \tau)$ is assumed to have a Poisson distribution. In a Poisson
process, $\lambda$ arrivals occur on average per unit time. In other
words, the average time between arrivals is $1/\lambda$. The average
arrival rate may be assumed to hold constant, or it may vary with time.
If the average arrival rate is constant, the probability of observing
exactly $k$ arrivals between times $t$ and $(t + \tau)$ is

$$P[N(t+\tau) - N(t) = k] = \frac{1}{k!} e^{-\lambda\tau} (\lambda\tau)^k, \quad k = 0, 1, 2, \ldots \qquad (7)$$
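
Equation (7) is straightforward to evaluate numerically. The sketch below computes the probability of k arrivals for a constant arrival rate and confirms that the probabilities sum to one with mean λτ; the rate and window length are arbitrary illustrative values, with λ read as the average number of arrivals per unit of time.

```python
import math


def poisson_pmf(k: int, lam: float, tau: float) -> float:
    """Probability of exactly k arrivals in a window of length tau
    when arrivals occur at a constant average rate lam per unit time."""
    return math.exp(-lam * tau) * (lam * tau) ** k / math.factorial(k)


lam = 2.0   # illustrative: two quote arrivals per second on average
tau = 1.5   # illustrative: a 1.5-second window

probs = [poisson_pmf(k, lam, tau) for k in range(60)]
total = sum(probs)
mean_arrivals = sum(k * p for k, p in enumerate(probs))

print(round(total, 6))          # 1.0
print(round(mean_arrivals, 6))  # 3.0, i.e. lam * tau
```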


Diamond and Verrecchia (1987) and Easley 
and O'Hara (1992) were the first to suggest that 
the duration between subsequent ticks carries 
information. Their models posit that in the pres¬ 
ence of short-sale constraints, intertrade dura¬ 
tion can indicate the presence of good news; 
in markets of securities where short selling is 
disallowed, the shorter the intertrade duration, 
the higher is the likelihood of unobserved good 
news. The reverse also holds: In markets with 
limited short selling and normal liquidity lev¬ 
els, the longer the duration between subse¬ 
quent trade arrivals, the higher the probabil¬ 
ity of yet-unobserved bad news. A complete 
absence of trades, however, indicates a lack 
of news. 

Easley and O'Hara (1992) further point out 
that trades that are separated by a time interval 
have a much different information content than 
trades occurring in close proximity. One of the 
implications of Easley and O'Hara (1992) is that 
the entire price sequence conveys information 
and should be used in its entirety whenever 
possible, strengthening the argument for high- 
frequency trading. 

Table 3 shows summary statistics for a duration measure computed on all
trades recorded for S&P 500 Depository Receipts ETF (SPY) on May 13,
2009. As Table 3 illustrates, the average intertrade duration was the
longest outside of regular market hours, and the shortest during the
hour preceding the market close (3-4 p.m. ET).

Table 3 Hourly Distributions of Intertrade Duration Observed on May 13, 2009 for S&P 500 Depository Receipts
ETF (SPY)

                            Intertrade Duration (milliseconds)
Hour (ET)   No. of Trades   Average     Median    Std Dev     Skewness   Kurtosis
4-5 AM      170             19074.58    5998      47985.39    8.430986   91.11571
5-6 AM      306             11556.95    4781.5    18567.83    3.687372   21.92054
6-7 AM      288             12606.81    4251      20524.15    3.208992   16.64422
7-8 AM      514             7096.512    2995      11706.72    4.288352   29.86546
8-9 AM      767             4690.699    1997      7110.478    3.775796   23.56566
9-10 AM     1089            2113.328    1934      24702.9     3.5185     24.6587
10-11 AM    1421            2531.204    1373      3409.889    3.959082   28.53834
11-12 PM    1145            3148.547    1526      4323.262    3.240606   17.24866
12-1 PM     749             4798.666    1882      7272.774    2.961139   13.63373
1-2 PM      982             3668.247    1739.5    5032.795    2.879833   13.82796
2-3 PM      1056            3408.969    1556      4867.061    3.691909   23.90667
3-4 PM      1721            2094.206    1004      2684.231    2.9568     15.03321
4-5 PM      423             8473.593    1500      24718.41    7.264483   69.82157
5-6 PM      47              73579.23    30763     113747.8    2.281743   7.870699
6-7 PM      3               1077663     19241     1849464     0.707025   1.5

The variation in duration between subsequent trades may be due to
several other causes. While the lack of trading may be due to a lack of
new information, trading inactivity may also be due to low levels of
liquidity, trading halts on exchanges, and strategic motivations of
traders. Foucault, Kadan, and Kandel (2005) consider that patiently
providing liquidity using limit orders may itself be a profitable
trading strategy, as liquidity providers should be compensated for
their waiting. The compensation usually comes in the form of a bid-ask
spread and is a function of the waiting time until the limit order is
"hit" by liquidity takers; lower intertrade durations induce lower
spreads. However, Dufour and Engle (2000) and Saar and Hasbrouck (2002)
find that spreads are actually higher when traders observe short
durations, contradicting the time-based limit order compensation
hypothesis.

In addition to durations between subsequent trades and quotes,
researchers have also been modeling durations between fixed changes in
security prices and volumes. The time interval between subsequent price
changes of a specified magnitude is known as price duration. Price
duration has been shown to decrease with increases in volatility.
Similarly, the time interval between subsequent volume changes of a
prespecified size is known as the volume duration. Volume duration has
been shown to decrease with increases in liquidity.
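
As an illustration, price durations can be extracted from a tick series by recording how long the price takes to move by at least a chosen threshold from the last reference point. The function name, the convention of resetting the reference price after each qualifying move, and the sample data (prices in cents, times in milliseconds) are assumptions made for this sketch.

```python
def price_durations(times, prices, threshold):
    """Times elapsed between successive price moves of at least `threshold`.

    After each qualifying move the reference time and price are reset to
    the tick that completed the move.
    """
    durations = []
    ref_time, ref_price = times[0], prices[0]
    for t, p in zip(times[1:], prices[1:]):
        if abs(p - ref_price) >= threshold:
            durations.append(t - ref_time)
            ref_time, ref_price = t, p
    return durations


# Made-up trade prints: timestamps in milliseconds, prices in cents.
times_ms = [0, 400, 900, 1500, 1600, 4000]
prices_cents = [10000, 10001, 10003, 10002, 10006, 10001]

durations = price_durations(times_ms, prices_cents, threshold=2)
print(durations)  # [900, 700, 2400]
```

Volume durations can be computed in the same way by accumulating traded size instead of tracking the price change.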

The information content of quote, trade, price, 
and volume durations introduces biases into 
the estimation process, however. If the avail¬ 
able information determines the time between 
subsequent trades, time itself ceases to be an 
independent variable, introducing substantial 
endogeneity bias into estimation. As a result, 
traditional estimates of variance of transaction 
prices are too high in comparison with the true 
variance of the price series. 

KEY POINTS 

• High-frequency data are different from daily
or lower-frequency data. Whereas low-frequency
data typically comprise regularly
spaced open, high, low, close, and volume
information for a given financial security
recorded during a specific period of time,
high-frequency data include bid and ask
quotes, sizes, and latest trade characteristics
that are recorded sequentially at irregular
time intervals.

• The differences affect trading strategy modeling,
introducing new opportunities and pitfalls
for researchers.

• Numerous data points allow researchers to
draw statistically significant inferences from
even short samples of high-frequency data.

• Different sampling approaches have been
developed to convert high-frequency data
into a more regular format that is more familiar
to researchers, yet diverse sampling methodologies
result in datasets with drastically
dissimilar statistical properties.

• Some properties of high-frequency data, like 
intertrade duration, carry important market 
information unavailable at lower frequencies.

Abstract: The origins of financial modeling can be traced back to the development of mathematical
equilibrium at the end of the nineteenth century, followed in the beginning of the twentieth cen¬ 
tury with the introduction of sophisticated mathematical tools for dealing with the uncertainty of 
prices and returns. In the 1950s and 1960s, financial modelers had tools for dealing with proba¬ 
bilistic models for describing markets, the principles of contingent claims analysis, an optimization 
framework for portfolio selection based on mean and variance of asset returns, and an equilibrium 
model for pricing capital assets. The 1970s ushered in models for pricing contingent claims and 
a new model for pricing capital assets based on arbitrage pricing. Consequently, by the end of 
the 1970s, the frameworks for financial modeling were well known. It was the advancement of 
computing power and refinements of the theories to take into account real-world markets starting 
in the 1980s that facilitated implementation and broader acceptance of mathematical modeling of 
financial decisions. 


The mathematical development of present-day 
economic and finance theory began in Lau¬ 
sanne, Switzerland at the end of the nineteenth 
century, with the development of the math¬ 
ematical equilibrium theory by Leon Walras 
(1874) and Vilfredo Pareto (1906). Shortly there¬ 
after, at the beginning of the twentieth cen¬ 
tury, Louis Bachelier (1900) in Paris and Filip 
Lundberg (1903) in Uppsala (Sweden) made 
two seminal contributions: They developed 
sophisticated mathematical tools to describe 
uncertain price and risk processes. These de¬ 
velopments were well in advance of their time. 
Further progress was to be made only much
later in the twentieth century, thanks to the
development of digital computers. By making it
possible to compute approximate solutions to 
complex problems, digital computers enabled 
the large-scale application of mathematics to 
business problems. 

A first round of innovation occurred in the 
1950s and 1960s. Kenneth Arrow and Gérard
Debreu (1954) introduced a probabilistic model 
of markets and the notion of contingent claims. 
Harry Markowitz (1952) described mathemat¬ 
ically the principles of the investment process 
in terms of utility optimization. Franco
Modigliani and Merton Miller (1961) clarified
the nature of economic value, working out the
implications of absence of arbitrage. Between
1964 and 1966, William Sharpe (1964), John Lintner
(1965), and Jan Mossin (1966) developed
a theoretical model of market prices based on 
the principles of financial decision making laid 
down by Markowitz. The notion of efficient 
markets was introduced by Paul Samuelson 
(1965), and five years later, further developed 
by Eugene Fama (1970). 

The second round of innovation started in the
1970s. In 1973, Fischer Black, Myron
Scholes (1973), and Robert Merton (1973a) dis¬ 
covered how to determine option prices using 
continuous hedging. Three years later, Stephen 
Ross (1976) introduced arbitrage pricing the¬ 
ory (APT). Both were major developments that 
were to result in a comprehensive mathematical 
methodology for investment management and 
the valuation of derivative financial products. 
At about the same time, Merton introduced a 
continuous-time intertemporal, dynamic opti¬ 
mization model of asset allocation. Major re¬ 
finements in the methodology of mathematical 
optimization and new econometric tools were 
to change the way investments are managed. 

More recently, the diffusion of electronic 
transactions has made available a huge amount 
of empirical data. The availability of this data 
created the hope that economics could be 
given a more solid scientific grounding. A new 
field—econophysics—opened with the expec¬ 
tation that the proven methods of the physical 
sciences and the newly born science of complex
systems could be applied with benefit to
economics. It was hypothesized that economic 
systems could be studied as physical systems 
with only minimal a priori economic assump¬ 
tions. Classical econometrics is based on a sim¬ 
ilar approach; but while the scope of classical 
econometrics is limited to dynamic models of 
time series, econophysics uses all the tools of 
statistical physics and complex systems analy¬ 
sis, including the theory of interacting multia¬ 
gent systems. 

In this entry, we will describe the milestones 
in financial modeling. 


THE PRECURSORS: PARETO, 
WALRAS, AND THE 
LAUSANNE SCHOOL 

The idea of formulating quantitative laws of 
economic behavior in ways similar to the phys¬ 
ical sciences started in earnest at the end of 
the 19th century. Though quite accurate eco¬ 
nomic accounting on a large scale dates back to 
Assyro-Babylonian times, a scientific approach 
to economics is a recent endeavor. 

Leon Walras and Vilfredo Pareto, founders of
the so-called Lausanne School at the University 
of Lausanne in Switzerland, were among the 
first to explicitly formulate quantitative princi¬ 
ples of market economies, stating the principle 
of economic equilibrium as a mathematical the¬ 
ory. Both worked at a time of great social and 
economic change. In Pareto's work in particu¬ 
lar, pure economics and political science occupy 
a central place. 

Convinced that economics should become a 
mathematical science, Walras set himself the 
task of writing the first mathematical gen¬ 
eral equilibrium system. The British economist 
Stanley Jevons and the Austrian economist Carl 
Menger had already formulated the idea of eco¬ 
nomic equilibrium as a situation where sup¬ 
ply and demand match in interrelated markets. 
Walras's objective—to prove that equilibrium 
was indeed possible—required the explicit 
formulation of the equations of supply-and-demand
equilibrium.

Walras introduced the idea of tâtonnement
(French for groping) as a process of exploration 
by which a central auctioneer determines 
equilibrium prices. A century before, in 1776, 
Adam Smith had introduced the notion of the 
"invisible hand" that coordinates the activity 
of independent competitive agents to achieve 
desirable global goals. In the modern parlance 
of complex systems, the "invisible hand" 
would be called an "emerging property" of 
competitive markets. Much recent work on 
complex systems and artificial life has focused
on understanding how the local interaction of
individuals might result in complex and pur¬ 
poseful global behavior. Walras was to make 
the hand "visible" by defining the process of 
price discovery. 

Pareto followed Walras in the Chair of Eco¬ 
nomics at the University of Lausanne. Pareto's 
focus was the process of economic decision 
making. He replaced the idea of supply-and-demand
equilibrium with a more general idea
of the ordering of preferences through utility 
functions. (Pareto used the word "ophelimity" 
to designate what we would now call util¬ 
ity. The concept of ophelimity is slightly dif¬ 
ferent from the concept of utility insofar as 
ophelimity includes constraints on people's 
preferences.) Equilibrium is reached where 
marginal utilities are zero. The Pareto system 
hypothesized that agents are able to order their 
preferences and take into account constraints 
in such a way that a numerical index—"utility" 
in today's terminology—can be associated with 
each choice. Note that it was not until 1944 that 
utility theory was formalized in a set of nec¬ 
essary and sufficient axioms by von Neumann 
and Morgenstern and applied to decision mak¬ 
ing under risk and uncertainty. 

Economic decision making is therefore based 
on the maximization of utility. As Pareto as¬ 
sumed utility to be a differentiable function, 
global equilibrium is reached where marginal 
utilities (i.e., the partial derivatives of utility) 
vanish. Pareto was especially interested in the 
problem of the global optimum of utility. The 
Pareto optimum is a state in which nobody can 
be better off without making others worse off. 
A Pareto optimum does not imply the equal di¬ 
vision of resources; quite the contrary, a Pareto 
optimum might be a maximally unequal distri¬ 
bution of wealth. 

A lasting contribution of Pareto is the formulation
of a law of income distribution. Known
as the Pareto law, this law states that there is a
linear relationship between the logarithm of the
income I and the logarithm of the number N of
people who earn more than this income:

log N = A + s log I

where A and s are appropriate constants.
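
Because the Pareto law is linear in logarithms, its constants can be estimated by an ordinary least-squares line through the points (log I, log N). The sketch below assumes base-10 logarithms and uses synthetic, noise-free data generated from illustrative values A = 9 and s = -1.5; the function name `fit_pareto_law` and the numbers are hypothetical. Since N counts the people earning more than I, the slope s is negative.

```python
import math
from typing import Sequence, Tuple


def fit_pareto_law(incomes: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(N) = A + s * log10(I); returns (A, s)."""
    xs = [math.log10(income) for income in incomes]
    ys = [math.log10(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (illustrative only): N = 10**9 * I**(-1.5).
incomes = [1e3, 1e4, 1e5, 1e6]
counts = [10 ** (9 - 1.5 * math.log10(income)) for income in incomes]
a_hat, s_hat = fit_pareto_law(incomes, counts)
print(round(a_hat, 6), round(s_hat, 6))
```

With real income data the points would scatter around the line, and the fit would typically be restricted to the upper tail of the distribution.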

The importance of the works of Walras and
Pareto was not appreciated at the time. Without
digital computers, the equilibrium systems
they conceived were purely abstract: There was 
no way to compute solutions to economic equi¬ 
librium problems. In addition, the climate at 
the turn of the century did not allow a serene 
evaluation of the scientific merit of their work. 
The idea of free markets was at the center of 
heated political debates; competing systems in¬ 
cluded mercantile economies based on trade re¬ 
strictions and privileges as well as the emerging 
centrally planned Marxist economies. 

PRICE DIFFUSION: BACHELIER

In 1900, the Sorbonne University student Louis
Bachelier presented a doctoral dissertation,
Théorie de la Spéculation, that was to anticipate
much of today's work in finance theory. Bachelier's
advisor was the great French mathematician
Henri Poincaré. There were three notable
aspects in Bachelier's thesis: (1) He argued that 
in a purely speculative market stock prices 
should be random; (2) he developed the math¬ 
ematics of Brownian motion; and (3) he com¬ 
puted the prices of several options. 

To appreciate the importance of Bachelier's 
work, it should be remarked that at the be¬ 
ginning of the 20th century, the notion of 
probability was not yet rigorous; the formal 
mathematical theory of probability was devel¬ 
oped only in the 1930s. In particular, the pre¬ 
cise notion of the propagation of information 
essential for the definition of conditional prob¬ 
abilities in continuous time had not yet been 
formulated. 

Anticipating the development of the theory 
of efficient markets 60 years later, the key eco¬ 
nomic idea of Bachelier was that asset prices in 
a speculative market should be a fair game, that
is, a martingale process such that the expected
return is zero. According to Bachelier, "The ex¬ 
pectation of the speculator is zero." The formal 
concept of a martingale (i.e., of a process such 
that its expected value at any moment coincides 
with the present value) had not yet been introduced
in probability theory. In fact, the rigorous
notions of conditional probability and filtration
were developed only in the 1930s. In formulating
his hypothesis on market behavior, Bachelier
relied on intuition.

Bachelier actually went much further. He assumed
that stock prices evolve as a continuous-time
Markov process. This was a brilliant
intuition: Markov was to start working on these 
problems only in 1906. Bachelier established 
the differential equation for the time evolution 
of the probability distribution of prices, noting 
that this equation was the same as the heat dif¬ 
fusion equation. Five years later, in 1905, Albert 
Einstein used the same diffusion equation for 
the Brownian motion (i.e., the motion of a small 
particle suspended in a fluid). Bachelier also 
made the connection with the continuous limit 
of random walks, thus anticipating the work 
of the Japanese mathematician Kiyosi Ito at the 
end of the 1940s and the Russian mathematician 
and physicist Ruslan Stratonovich on stochastic 
integrals at the end of the 1950s. 
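
The fair-game property and the random-walk limit mentioned above can be checked numerically. The following is a minimal sketch, not a reconstruction of Bachelier's argument: it simulates symmetric random walks with unit steps (the function name, path counts, and seed are arbitrary choices for this illustration) and reports the sample mean and variance of the terminal position. For a fair game the mean should be close to zero, and for a diffusion the variance should grow in proportion to the number of steps.

```python
import random
from statistics import fmean, pvariance
from typing import List


def simulate_terminal_positions(n_paths: int, n_steps: int, seed: int = 0) -> List[int]:
    """Terminal positions of symmetric +1/-1 random walks started at zero."""
    rng = random.Random(seed)
    return [sum(rng.choice((-1, 1)) for _ in range(n_steps))
            for _ in range(n_paths)]


terminal = simulate_terminal_positions(n_paths=5000, n_steps=100)
print("sample mean:", fmean(terminal))           # expected to be near 0
print("sample variance:", pvariance(terminal))   # expected to be near 100
```

Rescaling the step size and the time increment so that the variance per unit of time stays fixed leads, in the limit, to Brownian motion.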

By computing the extremes of Brownian mo¬ 
tion, Bachelier computed the price of several 
options. He also computed the distributions of 
a number of functionals of Brownian motion. 
These were remarkable mathematical results 
in themselves. Formal proof was given only 
much later. Even more remarkable, Bachelier 
established option pricing formulas well before 
the formal notion of absence of arbitrage was 
formulated. 
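
For illustration, the call price that is today usually associated with Bachelier's name can be written in a few lines. The sketch assumes that the price follows arithmetic Brownian motion with zero drift and that interest rates are zero; it is the modern textbook form of the "normal" model, not a transcription of the formulas in the 1900 thesis. The function and parameter names are hypothetical.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bachelier_call(spot: float, strike: float, abs_vol: float, maturity: float) -> float:
    """Call price under arithmetic Brownian motion (zero drift, zero rates).

    `abs_vol` is the standard deviation of price changes per unit of time,
    expressed in price units rather than as a percentage.
    """
    std = abs_vol * math.sqrt(maturity)
    if std == 0.0:
        return max(spot - strike, 0.0)
    d = (spot - strike) / std
    return (spot - strike) * normal_cdf(d) + std * normal_pdf(d)


# Illustrative inputs: an at-the-money option with an absolute volatility of 2.
print(bachelier_call(spot=100.0, strike=100.0, abs_vol=2.0, maturity=1.0))
```

At the money the formula reduces to the standard deviation of the terminal price divided by the square root of 2π, so the price grows with the square root of time to maturity.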

Bachelier's work was outside the mainstream 
of contemporary mathematics but was too 
mathematically complex for the economists of 
his time. It wasn't until the formal development
of probability theory in the 1930s that his ideas
became mainstream mathematics and only in the
1960s, with the development of the theory of
efficient markets, that his ideas became part of
mainstream finance theory. In an efficient mar¬ 
ket, asset prices should, in each instant, reflect 
all the information available at the time, and 
any event that causes prices to move must be 
unexpected (i.e., a random disturbance). As a 
consequence, prices move as martingales, as ar¬ 
gued by Bachelier. Bachelier was, in fact, the 
first to give a precise mathematical structure in 
continuous time to price processes subject to 
competitive pressure by many agents. 

THE RUIN PROBLEM IN 
INSURANCE: LUNDBERG 

In Uppsala, Sweden, in 1903, three years after 
Bachelier defended his doctoral dissertation in 
Paris, Filip Lundberg defended a thesis that was 
to become a milestone in actuarial mathematics: 
He was the first to define a collective theory of 
risk and to apply a sophisticated probabilistic 
formulation to the insurance ruin problem. The
ruin problem of an insurance company in a non-life
sector can be defined as follows. Suppose
that an insurance company receives a stream 
of sure payments (premiums) and is subject to 
claims of random size that occur at random 
times. What is the probability that the insurer 
will not be able to meet its obligations (i.e., the 
probability of ruin)? 

Lundberg solved the problem as a collec¬ 
tive risk problem, pooling together the risk of 
claims. To define collective risk processes, he 
introduced marked Poisson processes. Marked 
Poisson processes are processes where the ran¬ 
dom time between two events is exponentially 
distributed. The magnitude of events is random 
with a distribution independent of the time of 
the event. Based on this representation, Lund¬ 
berg computed an estimate of the probability 
of ruin. 
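
The ruin problem described above lends itself to a simple Monte Carlo sketch. The code below is an illustration under stated assumptions, not Lundberg's method: claims arrive with exponentially distributed waiting times, claim sizes are also assumed to be exponential (a convenient choice made here), premiums accrue at a constant rate, and ruin is checked over a finite horizon. All names and parameter values are hypothetical.

```python
import random


def simulate_ruin_probability(initial_capital: float, premium_rate: float,
                              claim_rate: float, mean_claim: float,
                              horizon: float, n_paths: int, seed: int = 0) -> float:
    """Estimate the probability that the insurer's surplus falls below zero
    before `horizon`, for a compound Poisson claims process."""
    rng = random.Random(seed)
    ruined_paths = 0
    for _ in range(n_paths):
        clock = 0.0
        total_claims = 0.0
        while True:
            # Exponential waiting time until the next claim.
            clock += rng.expovariate(claim_rate)
            if clock > horizon:
                break
            # Claim size, independent of the claim time.
            total_claims += rng.expovariate(1.0 / mean_claim)
            # Premiums accrue continuously, so ruin can occur only at a claim time.
            if initial_capital + premium_rate * clock - total_claims < 0.0:
                ruined_paths += 1
                break
    return ruined_paths / n_paths


estimate = simulate_ruin_probability(initial_capital=10.0, premium_rate=1.2,
                                     claim_rate=1.0, mean_claim=1.0,
                                     horizon=100.0, n_paths=2000)
print(estimate)
```

In this example premiums exceed expected claims by 20%, so ruin is driven by unlucky clusters of claims early on; raising the initial capital or the premium rate lowers the estimate.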

Lundberg's work anticipated many future 
developments of probability theory, including 
what was later to be known as the theory 
of point processes. In the 1930s, the Swedish 
mathematician and probabilist Harald Cramér
gave a rigorous mathematical formulation
to Lundberg's work. A more comprehensive 
formal theory of insurance risk was later 
developed. This theory now includes Cox 
processes—point processes more general than 
Poisson processes—and fat-tailed distributions 
of claim size. 

A strong connection between actuarial math¬ 
ematics and asset pricing theory has since 
been established. (See, for example, Embrechts,
Klüppelberg, and Mikosch, 1996.) In well-behaved,
complete markets, establishing insurance
premiums entails principles that mirror
asset prices. In the presence of complete mar¬ 
kets, insurance would be a risk-free business: 
There is always the possibility of reinsurance. 
In markets that are not complete—essentially 
because they make unpredictable jumps— 
hedging is not possible; risk can only be diver¬ 
sified and options are inherently risky. Option 
pricing theory again mirrors the setting of in¬ 
surance premiums. 

Lundberg's work went unnoticed by the actu¬ 
arial community for nearly 30 years, though this 
did not stop him from enjoying a successful ca¬ 
reer as an insurer. Both Bachelier and Lundberg 
were in advance of their time; they anticipated, 
and probably inspired, the subsequent devel¬ 
opment of probability theory. But the type of 
mathematics implied by their work could not 
be employed in full earnest prior to the devel¬ 
opment of digital computers. It was only with 
digital computers that we were able to tackle 
complex mathematical problems whose solu¬ 
tions go beyond closed-form formulas. 

THE PRINCIPLES OF 
INVESTMENT: MARKOWITZ 

Just how an investor should allocate his re¬ 
sources has long been debated. Classical wis¬ 
dom suggested that investments should be 
allocated to those assets yielding the highest 
returns, without the consideration of correlations.
Before the modern formulation of efficient
markets, speculators widely acted on the
belief that positions should be taken only if they
had a competitive advantage in terms of information.
A large amount of resources was therefore
spent on analyzing financial information.
John Maynard Keynes suggested that investors 
should carefully evaluate all available informa¬ 
tion and then make a calculated bet. The idea of 
diversification was anathema to Keynes, who 
was actually quite a successful investor. 

In 1952, Harry Markowitz, then a graduate 
student at the University of Chicago, published 
a seminal article on optimal portfolio selection 
that upset established wisdom. He advocated 
that, being risk averse, investors should diversify
their portfolios. (The principles in his article
were developed further in his 1959 book.)
The idea of making risk bearable through risk 
diversification was not new: It was widely 
used by medieval merchants. Markowitz un¬ 
derstood that the risk-return trade-off of in¬ 
vestments could be improved by diversification 
and cast diversification in the framework of 
optimization. 

Markowitz was interested in the investment 
decision-making process. Along the lines set 
forth by Pareto 60 years earlier, Markowitz assumed
that investors order their preferences according
to a utility index, with utility as a convex
function that takes into account investors' risk-return
preferences. Markowitz assumed that
stock returns are jointly normal. As a conse¬ 
quence, the return of any portfolio is a nor¬ 
mal distribution, which can be characterized 
by two parameters: the mean and the vari¬ 
ance. Utility functions are therefore defined on 
two variables—mean and variance—and the 
Markowitz framework for portfolio selection is 
commonly referred to as mean-variance analy¬ 
sis. The mean and variance of portfolio returns 
are in turn a function of a portfolio's weights. 
Given the variance-covariance matrix, utility is 
a function of portfolio weights. The investment 
decision-making process involves maximizing 
utility in the space of portfolio weights. 
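
A minimal sketch of this decision process for two assets follows. The quadratic form of the utility (portfolio mean minus half the risk-aversion coefficient times the portfolio variance) is a common textbook choice assumed here for illustration; the text above says only that utility is defined on mean and variance. The expected returns, covariance matrix, risk-aversion coefficient, and function names are all hypothetical.

```python
from typing import List, Sequence


def portfolio_mean(weights: Sequence[float], expected_returns: Sequence[float]) -> float:
    """Expected portfolio return: weighted sum of expected asset returns."""
    return sum(w * m for w, m in zip(weights, expected_returns))


def portfolio_variance(weights: Sequence[float], cov: Sequence[Sequence[float]]) -> float:
    """Portfolio variance: the quadratic form w' C w."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


def mean_variance_utility(weights: Sequence[float], expected_returns: Sequence[float],
                          cov: Sequence[Sequence[float]], risk_aversion: float) -> float:
    """Assumed utility: mean - (risk_aversion / 2) * variance."""
    return (portfolio_mean(weights, expected_returns)
            - 0.5 * risk_aversion * portfolio_variance(weights, cov))


# Illustrative inputs for two assets.
mu = [0.08, 0.12]
cov = [[0.04, 0.01],
       [0.01, 0.09]]

# Search the fully invested, long-only weights on a 1% grid.
candidates: List[List[float]] = [[k / 100, 1 - k / 100] for k in range(101)]
best_weights = max(candidates,
                   key=lambda w: mean_variance_utility(w, mu, cov, risk_aversion=3.0))
print(best_weights)
```

A grid search is used only to keep the sketch dependency-free; in practice the problem is solved with quadratic programming, which scales to many assets and constraints.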

The inputs to the mean-variance analysis include
expected returns, variance of returns, and
either covariance or correlation of returns between
each pair of securities. For example, an
analysis that allows 200 securities as possible 
candidates for portfolio selection requires 200 
expected returns, 200 variances of return, and 
19,900 correlations or covariances. An invest¬ 
ment team tracking 200 securities may reason¬ 
ably be expected to summarize their analyses 
in terms of 200 means and variances, but it 
is clearly unreasonable for them to produce 
19,900 carefully considered correlation coeffi¬ 
cients or covariances. It was clear to Markowitz 
that some kind of model of the covariance struc¬ 
ture was needed for the practical application of 
the model. He did little more than point out the
problem and suggest some possible models of
covariance for research on large portfolios. In
1963, William Sharpe suggested the single in¬ 
dex market model as a proxy for the covariance 
structure of security returns. 
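
The arithmetic behind the figure of 19,900, and the way a single-index model reduces the estimation burden, can be sketched as follows. The covariance construction below is the standard single-index form (the product of two betas and the market variance off the diagonal, plus residual variance on the diagonal); the function names and sample numbers are hypothetical.

```python
from typing import List, Sequence


def pairwise_count(n_securities: int) -> int:
    """Number of distinct covariances (or correlations) among n securities."""
    return n_securities * (n_securities - 1) // 2


def single_index_covariance(betas: Sequence[float],
                            residual_variances: Sequence[float],
                            market_variance: float) -> List[List[float]]:
    """Covariance matrix implied by a single-index model:
    cov(i, j) = beta_i * beta_j * market_variance for i != j,
    var(i)    = beta_i**2 * market_variance + residual_variance_i."""
    n = len(betas)
    cov = [[betas[i] * betas[j] * market_variance for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += residual_variances[i]
    return cov


print(pairwise_count(200))   # direct estimation: 19900 pairwise terms
print(2 * 200 + 1)           # single index: 200 betas, 200 residual variances, 1 market variance

# Illustrative two-security example.
cov_matrix = single_index_covariance(betas=[1.0, 0.5],
                                     residual_variances=[0.01, 0.02],
                                     market_variance=0.04)
print(cov_matrix)
```

The full covariance matrix is thus generated from a number of inputs that grows linearly, rather than quadratically, with the number of securities.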

Markowitz joined the Rand Corporation, 
where he met George Dantzig, who introduced 
him to computer-based optimization technol¬ 
ogy. Markowitz was quick to appreciate the role 
that computers would have in bringing math¬ 
ematics to bear on business problems. Opti¬ 
mization and simulation were on the way to 
becoming the tools of the future, replacing the 
quest for closed-form solutions of mathematical 
problems. 

In the following years, Markowitz developed 
a full theory of the investment management 
process based on optimization. His optimiza¬ 
tion theory had the merit of being applicable to 
practical problems, even outside of the realm of 
finance. With the progressive diffusion of high¬ 
speed computers, the practice of financial opti¬ 
mization has found broad application. 

UNDERSTANDING VALUE: 
MODIGLIANI AND MILLER 

At about the same time that Markowitz was 
tackling the problem of how investors should 
behave, taking asset price processes as a given, 
other economists were trying to understand
how markets determine value. Adam Smith
had introduced the notion of perfect compe¬ 
tition (and therefore perfect markets) in the sec¬ 
ond half of the 18th century. In a perfect market, 
there are no impediments to trading: Agents 
are price takers who can buy or sell as many 
units as they wish. The neoclassical economists 
of the 1960s took the idea of perfect markets as a 
useful idealization of real free markets. In par¬ 
ticular, they argued that financial markets are 
very close to being perfect markets. The theory 
of asset pricing was subsequently developed to 
explain how prices are set in a perfect market. 

In general, a perfect market results when the 
number of buyers and sellers is sufficiently 
large, and all participants are small enough rel¬ 
ative to the market so that no individual market 
agent can influence a commodity's price. Con¬ 
sequently, all buyers and sellers are price takers, 
and the market price is determined where there 
is equality of supply and demand. This condi¬ 
tion is more likely to be satisfied if the commod¬ 
ity traded is fairly homogeneous (for example, 
corn or wheat). 

There is more to a perfect market than market 
agents being price takers. It is also required that 
there are no transaction costs or impediments 
that interfere with the supply and demand of 
the commodity. Economists refer to these vari¬ 
ous costs and impediments as "frictions." 

The costs associated with frictions generally 
result in buyers paying more than in the absence 
of frictions, and/or sellers receiving less. In the 
case of financial markets, frictions include: 

• Commissions charged by brokers. 

• Bid-ask spreads charged by dealers. 

• Order handling and clearance charges. 

• Taxes (notably on capital gains) and 
government-imposed transfer fees. 

• Costs of acquiring information about the fi¬ 
nancial asset. 

• Trading restrictions, such as exchange-imposed
restrictions on the size of a position
in the financial asset that a buyer or seller may
take.

• Restrictions on market makers.

• Halts to trading that may be imposed by reg¬ 
ulators where the financial asset is traded. 

Modigliani-Miller Irrelevance 
Theorems and the Absence 
of Arbitrage 

A major step was taken in 1958 when Franco 
Modigliani and Merton Miller published a 
then-controversial article in which they main¬ 
tained that the value of a company does not de¬ 
pend on the capital structure of the firm. (In a 
1963 article, they corrected their analysis for the 
impact of corporate taxes.) The capital structure 
is the mix of debt and equity used to finance the 
firm. The traditional view prior to the publica¬ 
tion of the article by Modigliani and Miller was 
that there existed a capital structure that maxi¬ 
mized the value of the firm (i.e., there is an op¬ 
timal capital structure). Modigliani and Miller 
demonstrated that in the absence of taxes and 
in a perfect capital market, the capital structure 
was irrelevant (i.e., the capital structure does 
not affect the value of a firm). By extension, the 
irrelevance principle applies to the type of debt 
a firm may select (e.g., senior, subordinated, se¬ 
cured, and unsecured). 

In 1961, Modigliani and Miller published yet 
another controversial article in which they ar¬ 
gued that the value of a company does not 
depend on the dividends it pays but on its 
earnings. The basis for valuing a firm—earnings 
or dividends—had always attracted consider¬ 
able attention. Because dividends provide the 
hard cash that remunerates investors, they were 
considered by many as key to a firm's value. 

Modigliani and Miller's challenge to the tra¬ 
ditional view that capital structure and divi¬ 
dends matter when determining a firm's value 
was founded on the principle that the tradi¬ 
tional views were inconsistent with the work¬ 
ings of competitive markets where securities are 
freely traded. In their view, the value of a com¬ 
pany is independent of its financial structure: 
From a valuation standpoint, it does not matter
whether the firm keeps its earnings or distributes
them to shareholders.

Known as the Modigliani-Miller theorems, these 
theorems paved the way for the development 
of arbitrage pricing theory. In fact, to establish 
their theorems, Modigliani and Miller made use 
of the notion of absence of arbitrage. Absence of 
arbitrage means that there is no possibility of 
making a risk-free profit without an investment. 
This implies that the same stream of cash flows 
should be priced in the same way across differ¬ 
ent markets. Absence of arbitrage is the funda¬ 
mental principle for relative asset pricing; it is 
the pillar on which derivative pricing rests. 

EFFICIENT MARKETS: FAMA 
AND SAMUELSON 

Absence of arbitrage entails market efficiency. 
Shortly after the Modigliani-Miller theorems 
had been established, Paul Samuelson in 1965 
and Eugene Fama in 1970 developed the no¬ 
tion of efficient markets: A market is efficient if 
prices reflect all available information. Bachelier
had argued that prices in a competitive
market should be random conditional on the
present state of affairs. Fama and Samuelson
put this concept into a theoretical framework, 
linking prices to information. 

In general, an efficient market refers to a market 
where prices at all times fully reflect all avail¬ 
able information that is relevant to the valuation 
of securities. That is, relevant information about 
the security is quickly impounded into the price 
of securities. 

Fama and Samuelson define "fully reflects" in 
terms of the expected return from holding a se¬ 
curity. The expected return over some holding 
period is equal to expected cash distributions 
plus the expected price change, all divided by 
the initial price. The price formation process 
defined by Fama and Samuelson is that the ex¬ 
pected return one period from now is a stochas¬ 
tic variable that already takes into account the 
"relevant" information set. They argued that 
in a market where information is shared by
all market participants, prices should fluctuate
randomly. 
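
In symbols, the expected holding-period return described above can be written as follows. The notation is introduced here for illustration and is not taken from Fama or Samuelson: P_0 is the initial price, P_1 the price at the end of the holding period, and D the cash distributions received over the period.

```latex
E(R) = \frac{E(D) + E(P_1) - P_0}{P_0}
```

For example, with an initial price of 50, expected cash distributions of 2, and an expected end-of-period price of 53, the expected return is (2 + 3)/50 = 10%.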

A price-efficient market has implications for 
the investment strategy that investors may wish 
to pursue. In an active strategy, investors seek to
capitalize on what they perceive to be the mispricing
of financial instruments (cash instruments
or derivative instruments). In a market
that is price efficient, active strategies will not
consistently generate a return that, after taking
into consideration transaction costs and the
risks associated with the strategy, is greater than
that from simply buying and holding securities.
This has led investors in certain sectors of the
capital market, where empirical evidence suggests
the sector is price efficient, to pursue a
strategy of indexing, which simply seeks to match
the performance of some financial index. However,
Samuelson was careful to remark that the notion
of efficient markets does not make investment
analysis useless; rather, investment analysis is a
condition for efficient markets.

Another facet in this apparent contradiction of 
the pursuit of active strategies despite empirical 
evidence on market efficiency was soon to be 
clarified. Agents optimize a risk-return trade¬ 
off based on the stochastic features of price pro¬ 
cesses. Price processes are not simply random 
but exhibit a rich stochastic behavior. The ob¬ 
jective of investment analysis is to reveal this 
behavior. 


CAPITAL ASSET PRICING 
MODEL: SHARPE, LINTNER, 
AND MOSSIN 

Absence of arbitrage is a powerful economic 
principle for establishing relative pricing. In it¬ 
self, however, it is not a market equilibrium 
model. William Sharpe (1964), John Lintner 
(1965), and Jan Mossin (1966) developed a the¬ 
oretical equilibrium model of market prices 
called the capital asset pricing model (CAPM). 
As anticipated 60 years earlier by Walras and 
Pareto, Sharpe, Lintner, and Mossin developed
the consequences of Markowitz's portfolio selection
into a full-fledged stochastic general
equilibrium theory. 

Asset pricing models categorize risk factors 
into two types. The first type is risk factors that 
cannot be diversified away via the Markowitz 
framework. That is, no matter what the investor 
does, the investor cannot eliminate these risk 
factors. These risk factors are referred to as sys¬ 
tematic risk factors or nondiversifiable risk factors. 
The second type is risk factors that can be elim¬ 
inated via diversification. These risk factors are 
unique to the asset and are referred to as unsys¬ 
tematic risk factors or diversifiable risk factors. 

The CAPM has only one systematic risk 
factor—the risk of the overall movement of the 
market. This risk factor is referred to as "mar¬ 
ket risk." This is the risk associated with hold¬ 
ing a portfolio consisting of all assets, called the 
"market portfolio." In the market portfolio, an 
asset is held in proportion to its market value. 
So, for example, if the total market value of all 
assets is $X and the market value of asset j is $Y,
then asset j will comprise $Y/$X of the market 
portfolio. 

The expected return for an asset i according 
to the CAPM is equal to the risk-free rate plus a 
risk premium. The risk premium is the product 
of (1) the sensitivity of the return of asset i to 
the return of the market portfolio, and (2) the 
difference between the expected return on the 
market portfolio and the risk-free rate. It mea¬ 
sures the potential reward for taking on the risk 
of the market above what can be earned by in¬ 
vesting in an asset that offers a risk-free rate. 
Taken together, the risk premium is a product 
of the quantity of market risk and the poten¬ 
tial compensation of taking on market risk (as 
measured by the second component). 
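
The relationships described in the last two paragraphs can be sketched in code. The example assumes that the sensitivity of an asset to the market is measured by beta, the covariance of the asset's return with the market return divided by the variance of the market return; the function names and all input values are hypothetical.

```python
from typing import Dict, Sequence


def market_weights(market_values: Dict[str, float]) -> Dict[str, float]:
    """Weight of each asset in the market portfolio: its share of total market value."""
    total = sum(market_values.values())
    return {name: value / total for name, value in market_values.items()}


def beta(asset_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Sample beta: cov(asset, market) / var(market)."""
    n = len(market_returns)
    mean_asset = sum(asset_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum((a - mean_asset) * (m - mean_market)
                     for a, m in zip(asset_returns, market_returns)) / n
    variance = sum((m - mean_market) ** 2 for m in market_returns) / n
    return covariance / variance


def capm_expected_return(risk_free_rate: float, beta_i: float,
                         expected_market_return: float) -> float:
    """Risk-free rate plus beta times the market risk premium."""
    return risk_free_rate + beta_i * (expected_market_return - risk_free_rate)


# Illustrative inputs.
print(market_weights({"A": 30.0, "B": 70.0}))
print(capm_expected_return(risk_free_rate=0.03, beta_i=1.2,
                           expected_market_return=0.08))
```

Here the quantity of market risk is the beta of 1.2 and the compensation per unit of market risk is the 5% market risk premium, giving a risk premium of 6% on top of the 3% risk-free rate.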

The CAPM was highly appealing from the 
theoretical point of view. It was the first general-equilibrium
model of a market that admitted
testing with econometric tools. A critical challenge
to the empirical testing of the CAPM, as
pointed out by Roll (1977), is the identification
of the market portfolio.


Milestones in Financial Modeling 




THE MULTIFACTOR CAPM: 
MERTON 

The CAPM assumes that the only risk that an 
investor is concerned with is uncertainty about 
the future price of a security. Investors, how¬ 
ever, are usually concerned with other risks that 
will affect their ability to consume goods and 
services in the future. Three examples would be 
the risks associated with future labor income, 
the future relative prices of consumer goods, 
and future investment opportunities. 

Recognizing these other risks that investors 
face, Robert Merton (1973b) extended the 
CAPM based on consumers deriving their opti¬ 
mal lifetime consumption when they face these 
"extramarket" sources of risk. These extramar¬ 
ket sources of risk are also referred to as "fac¬ 
tors," hence the model derived by Merton is 
called a multifactor CAPM. 

The multifactor CAPM says that investors 
want to be compensated for the risk associ¬ 
ated with each source of extramarket risk, in 
addition to market risk. In the case of the 
CAPM, investors hedge the uncertainty asso¬ 
ciated with future security prices by diver¬ 
sifying. This is done by holding the market 
portfolio. In the multifactor CAPM, in addition 
to investing in the market portfolio, investors 
will also allocate funds to something equivalent 
to a mutual fund that hedges a particular 
extramarket risk. While not all investors are 
concerned with the same sources of extramarket 
risk, those who are concerned with a specific 
extramarket risk will basically hedge it in the 
same way. 

The multifactor CAPM is an attractive model 
because it recognizes nonmarket risks. The pric¬ 
ing of an asset by the marketplace, then, must 
reflect risk premiums to compensate for these 
extramarket risks. Unfortunately, it may be dif¬ 
ficult to identify all the extramarket risks and to 
value each of these risks empirically. Further¬ 
more, when these risks are taken together, the 
multifactor CAPM begins to resemble the arbi¬ 
trage pricing theory model described next. 


ARBITRAGE PRICING 
THEORY: ROSS 

An alternative to the equilibrium asset pricing 
model just discussed, an asset pricing model 
based purely on arbitrage arguments, was de¬ 
rived by Stephen Ross (1976). The model, called 
the arbitrage pricing theory (APT) model, postu¬ 
lates that an asset's expected return is influ¬ 
enced by a variety of risk factors, as opposed to 
just market risk as assumed by the CAPM. The 
APT model states that the return on a security 
is linearly related to H systematic risk factors. 
The model does not specify what these factors 
are; it assumes only that the relationship 
between asset returns and the risk factors is 
linear. 

The APT model as given asserts that investors 
want to be compensated for all the risk factors 
that systematically affect the return of a security. 
The compensation is the sum of the products of 
each risk factor's systematic risk and the risk 
premium assigned to it by the capital market. 
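
The "sum of the products" can be written out as a short sketch. We assume here, as is standard but not stated explicitly above, that the compensation is added to the risk-free rate to give the expected return; the three factor sensitivities and premiums are hypothetical.

```python
def apt_expected_return(risk_free, sensitivities, premiums):
    """Expected return under a linear factor model in the spirit of the APT.

    sensitivities : the asset's exposure to each systematic risk factor
    premiums      : the market's risk premium for each factor (same order)
    """
    if len(sensitivities) != len(premiums):
        raise ValueError("one premium is needed for each factor")
    compensation = sum(b * p for b, p in zip(sensitivities, premiums))
    return risk_free + compensation


# Hypothetical example with H = 3 unnamed factors.
expected = apt_expected_return(
    risk_free=0.03,
    sensitivities=[1.0, 0.5, -0.2],
    premiums=[0.04, 0.02, 0.01],
)
print(f"{expected:.4f}")
```

With a single factor equal to the market, the same function reduces to the CAPM relation.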

Proponents of the APT model argue that it has 
several major advantages over the CAPM. First, 
it makes less restrictive assumptions about in¬ 
vestor preferences toward risk and return. As 
explained earlier, the CAPM theory assumes 
investors trade off between risk and return 
solely on the basis of the expected returns 
and standard deviations of prospective invest¬ 
ments. The APT model, in contrast, simply re¬ 
quires that some rather unobtrusive bounds be 
placed on potential investor utility functions. 
Second, no assumptions are made about the dis¬ 
tribution of asset returns. Finally, since the APT 
model does not rely on the identification of the 
true market portfolio, the theory is potentially 
testable. The model simply assumes that no ar¬ 
bitrage is possible. That is, using no additional 
funds (wealth) and without increasing risk, it is 
not possible for an investor to create a portfolio 
to increase return. 

The APT model provides theoretical support 
for an asset pricing model where there is more 
than one risk factor. Consequently, models of 
this type are referred to as multifactor risk 
models. These models are applied to portfolio 
management. 

ARBITRAGE, HEDGING, AND 
OPTION THEORY: BLACK, 
SCHOLES, AND MERTON 

The idea of arbitrage pricing can be extended 
to any price process. A general model of asset 
pricing will include a number of independent 
price processes plus a number of price processes 
that depend on the first process by arbitrage. 
The entire pricing structure may or may not be 
cast in a general equilibrium framework. 

Arbitrage pricing made derivative pricing possible. 
With the development of derivatives trading, 
the need for a derivative valuation and 
pricing model made itself felt. The first formal 
solution of the option pricing problem was 
developed by Fisher Black and Myron Scholes 
(1973), working together, and independently in 
the same year by Robert Merton (1973a). 

The solution of the option pricing problem 
proposed by Black, Scholes, and Merton was 
simple and elegant. Suppose that a market 
contains a risk-free bond, a stock, and an op¬ 
tion. Suppose also that the market is arbitrage- 
free and that stock price processes follow a 
continuous-time geometric Brownian motion. 
Black, Scholes, and Merton demonstrated that 
it is possible to construct a portfolio made up of 
the stock plus the bond that perfectly replicates 
the option. The replicating portfolio can be 
exactly determined, without anticipation, by 
solving a partial differential equation. 
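
The replication idea is easiest to see in a discrete setting. The sketch below is not the continuous-time Black-Scholes-Merton construction; it swaps in a one-period, two-state (binomial) market, where the replicating portfolio follows from two linear equations. All prices and the interest rate are hypothetical.

```python
def replicate_one_period(s0, s_up, s_down, payoff_up, payoff_down, rate):
    """Replicate an option with stock and bond in a one-period, two-state market.

    Returns (delta, bond, price): shares of stock held, amount invested in the
    risk-free bond today (negative means borrowing), and the portfolio's cost.
    """
    # Choose delta so that the portfolio's payoff gap matches the option's.
    delta = (payoff_up - payoff_down) / (s_up - s_down)
    # Choose the bond position so that the down-state payoff matches exactly.
    bond = (payoff_down - delta * s_down) / (1.0 + rate)
    price = delta * s0 + bond  # absence of arbitrage: option price = this cost
    return delta, bond, price


# Hypothetical call option with strike 100 on a stock worth 100 today that
# moves to 120 or 90; the one-period risk-free rate is 5%.
delta, bond, price = replicate_one_period(100.0, 120.0, 90.0, 20.0, 0.0, 0.05)
print(f"delta={delta:.4f}, bond={bond:.4f}, price={price:.4f}")
```

Holding two-thirds of a share and borrowing about 57.14 reproduces the option payoff in both states, so the option must cost what the portfolio costs.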

The idea of replicating portfolios has impor¬ 
tant consequences. Whenever a financial instru¬ 
ment (security or derivative instrument) pro¬ 
cess can be exactly replicated by a portfolio of 
other securities, absence of arbitrage requires 
that the price of the original financial instru¬ 
ment coincide with the price of the replicating 
portfolio. Most derivative pricing algorithms 
are based on this principle: To price a 
derivative instrument, one must identify a 
replicating portfolio whose price is known. 

Pricing by portfolio replication received a 
powerful boost with the discovery that calcu¬ 
lations can be performed in a risk-neutral prob¬ 
ability space where processes assume a simpli¬ 
fied form. The foundation was thus laid for the 
notion of equivalent martingales, developed by 
Michael Harrison and David Kreps (1979) and 
Michael Harrison and Stanley Pliska (1981). Not 
all price processes can be reduced in this way: If 
price processes do not behave sufficiently well 
(i.e., if the risk does not vanish with the van¬ 
ishing time interval), then replicating portfolios 
cannot be found. In these cases, risk can be min¬ 
imized but not hedged. 
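
A minimal sketch of the risk-neutral idea, again in a hypothetical one-period, two-state market rather than in continuous time: there is a probability q under which the discounted stock price is a martingale, and the option price is the discounted expected payoff under q. The numbers are illustrative assumptions.

```python
def risk_neutral_probability(s0, s_up, s_down, rate):
    """Probability q of the up state under which the stock earns the risk-free rate."""
    return ((1.0 + rate) * s0 - s_down) / (s_up - s_down)


def risk_neutral_price(q, payoff_up, payoff_down, rate):
    """Discounted expected payoff under the risk-neutral probability q."""
    return (q * payoff_up + (1.0 - q) * payoff_down) / (1.0 + rate)


# Hypothetical market: stock at 100 moving to 120 or 90, risk-free rate 5%.
q = risk_neutral_probability(100.0, 120.0, 90.0, 0.05)
call_price = risk_neutral_price(q, 20.0, 0.0, 0.05)  # call with strike 100
discounted_stock = risk_neutral_price(q, 120.0, 90.0, 0.05)
print(f"q={q:.2f}, call={call_price:.4f}, discounted stock={discounted_stock:.2f}")
```

The discounted expected stock price under q equals today's price of 100, which is the martingale property in its simplest form.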

KEY POINTS 

• The development of mathematical finance be¬ 
gan at the end of the nineteenth century with 
work on general equilibrium theory by Wal¬ 
ras and Pareto. 

• At the beginning of the twentieth century, 
Bachelier and Lundberg made a seminal con¬ 
tribution, introducing respectively Brownian 
motion price processes and Markov Poisson 
processes for collective risk events. 

• The advent of digital computers enabled the 
large-scale application of advanced mathe¬ 
matics to finance theory, ushering in opti¬ 
mization and simulation. 

• In 1952, Markowitz introduced the theory of 
portfolio optimization, which advocates the 
strategy of portfolio diversification. 

• In 1961, Modigliani and Miller argued that the 
value of a company is based not on its divi¬ 
dends and capital structure, but on its earn¬ 
ings; their formulation was to be called the 
Modigliani-Miller theorem. 

• In the 1960s, major developments included 
the efficient market hypothesis (Samuelson 
and Fama), the capital asset pricing model 
(Sharpe, Lintner, and Mossin), and the multi¬ 
factor CAPM (Merton). 


• In the 1970s, major developments included 
the arbitrage pricing theory (Ross) that led 
to multifactor models and option pricing for¬ 
mulas (Black, Scholes, and Merton) based on 
replicating portfolios, which are used to price 
derivatives if the underlying price processes 
are known. 
Abstract: It is often said that investment management is an art, not a science. However, since 
the early 1990s the market has witnessed a progressive shift toward a more industrial view of the 
investment management process. There are several reasons for this change. First, with globalization 
the universe of investable assets has grown many times over. Asset managers might have to choose 
from among several thousand possible investments from around the globe. The S&P 500 index 
is itself chosen from a pool of 8,000 investable U.S. stocks. Second, institutional investors, often 
together with their investment consultants, have encouraged asset management firms to adopt 
an increasingly structured process with documented steps and measurable results. Pressure from 
regulators and the media is another factor. Lastly, the sheer size of the markets makes it imperative 
to adopt safe and repeatable methodologies. The volumes are staggering. 


In its modern sense, financial modeling is 
the design (or engineering) of contracts and 
portfolios of contracts that result in prede¬ 
termined cash flows contingent on different 
events. Broadly speaking, financial models are 
employed to manage investment portfolios and 
risk. The objective is the transfer of risk from 
one entity to another via appropriate contracts. 
Though the aggregate risk is a quantity that can¬ 
not be altered, risk can be transferred if there is 
a willing counterparty. 

Financial modeling came to the forefront of 
finance in the 1980s with the broad diffusion 
of derivative instruments. However, the con¬ 
cept and practice of financial modeling are 
quite old. Evidence of the use of sophisticated 
cross-border instruments of credit and payment 
dating from the time of the First Crusade 
(1095-1099) has come down to us from the 
letters of Jewish merchants in Cairo. The notion 
of the diversification of risk (central to modern 
risk management) and the quantification of in¬ 
surance risk (a requisite for pricing insurance 
policies) were already understood, at least in 
practical terms, in the 14th century. The rich 
epistolary of Francesco Datini, a 14th-century 
merchant, banker, and insurer from Prato 
(Tuscany, Italy), contains detailed instructions 
to his agents on how to diversify risk and in¬ 
sure cargo. It also gives us an idea of insurance 
costs: Datini charged 3.5% to insure a cargo of 
wool from Malaga to Pisa and 8% to insure 
a cargo of malmsey (sweet wine) from Genoa 
to Southampton, England. These, according to 
one of Datini's agents, were low rates: He 
considered 12-15% a fair insurance premium for 
similar cargo. 

What is specific to modern financial model¬ 
ing is the quantitative management of risk. Both 
the pricing of contracts and the optimization of 
investments require some basic capabilities of 
statistical modeling of financial contingencies. 
It is the size, diversity, and efficiency of 
modern competitive markets that make the use of 
modeling imperative. 

THE ROLE OF INFORMATION 
TECHNOLOGY 

Advances in information technology are be¬ 
hind the widespread adoption of modeling in 
finance. The most important advance has been 
the enormous increase in the amount of com¬ 
puting power, concurrent with a steep fall in 
prices. Government agencies have long been 
using computers for economic modeling, but 
private firms found it economically justifiable 
only as of the 1980s. Back then, economic mod¬ 
eling was considered one of the "Grand Chal¬ 
lenges" of computational science (a term coined 
by Kenneth Wilson [1989], recipient of the 1982 
Nobel Prize in Physics, and later adopted by 
the U.S. Department of Energy in its High 
Performance Computing and Communications 
Program, which included economic modeling 
among the grand challenges). 

In the late 1980s, firms such as Merrill Lynch 
began to acquire supercomputers to perform 
derivative pricing computations. The overall 
cost of these supercomputing facilities, in the 
range of several million dollars, limited their 
diffusion to the largest firms. Today, 
computational facilities ten times more powerful 
cost only a few thousand dollars. To place today's 
computing power in perspective, consider that 
a 1990 run-of-the-mill Cray supercomputer 
cost several million U.S. dollars and had a 
clock cycle of 4 nanoseconds (i.e., 4 billionths 
of a second, or 250 million cycles per second, 
written as 250 MHz). Today's fast laptop 
computers are 10 times faster, with a clock rate 
of 2.5 GHz and, at a few thousand dollars, cost 
only a fraction of the price. Supercomputer per¬ 
formance has itself improved significantly, with 
top computing speed in the range of several 
teraflops compared to the several megaflops 
of a Cray supercomputer in the 1990s. (Flops, 
which stands for floating point operations per 
second, is a measure of computational speed. 
A teraflop computer is a computer able to 
perform a trillion floating point operations 
per second.) In the space of 15 years, sheer 
performance has increased 1,000 times while 
the price-performance ratio has decreased by a 
factor of 10,000. Storage capacity has followed 
similar dynamics. 
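
The clock figures quoted above can be checked with a line or two of arithmetic, sketched here in Python. Note that a clock-rate ratio is only a rough proxy for actual computing speed; the variable names are our own.

```python
# Clock cycle of the 1990 Cray quoted in the text: 4 nanoseconds.
cray_cycle_s = 4e-9
cray_clock_hz = 1.0 / cray_cycle_s  # cycles per second

# Clock rate of the laptop quoted in the text: 2.5 GHz.
laptop_clock_hz = 2.5e9

clock_ratio = laptop_clock_hz / cray_clock_hz
print(f"Cray clock rate: {cray_clock_hz / 1e6:.0f} MHz")
print(f"Laptop-to-Cray clock ratio: {clock_ratio:.0f}x")
```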

The diffusion of low-cost, high-performance 
computers has allowed the broad use of numer¬ 
ical methods. Computations that were once per¬ 
formed by supercomputers in air-conditioned 
rooms are now routinely performed on desk¬ 
top machines. This has changed the landscape 
of financial modeling. The importance of find¬ 
ing closed-form solutions and the consequent 
search for simple models has been dramatically 
reduced. Computationally intensive methods 
such as Monte Carlo simulations and the nu¬ 
merical solution of differential equations are 
now widely used. As a consequence, it has be¬ 
come feasible to represent prices and returns 
with relatively complex models. Non-normal 
probability distributions have become common¬ 
place in many sectors of financial modeling. It 
is fair to say that the key limitation of finan¬ 
cial econometrics is now the size of available 
data samples or training sets, not the compu¬ 
tations; it is the data that limit the complexity 
of estimates. 
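
To make the point concrete, here is a minimal sketch of a Monte Carlo simulation of the kind that now runs in seconds on a desktop machine. It prices a European call under geometric Brownian motion and compares the estimate with the closed-form Black-Scholes value. The parameter values, the number of paths, and the random seed are illustrative assumptions, not taken from the text.

```python
import math
import random


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def monte_carlo_call(spot, strike, rate, vol, maturity, n_paths, seed=42):
    """Monte Carlo estimate of the same price under the risk-neutral dynamics."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total_payoff += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total_payoff / n_paths


exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
print(f"closed form: {exact:.4f}, Monte Carlo: {estimate:.4f}")
```

The closed form exists only because the model is simple; the simulation approach carries over to payoffs and price processes for which no formula is available.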

Mathematical modeling has also undergone 
major changes. Techniques such as equivalent 
martingale methods are being used in derivative 
pricing, while cointegration, the theory of 
fat-tailed processes, and state-space modeling 
(including ARCH/GARCH and stochastic volatility 
models) are being used in econometrics. 


From Art to Financial Modeling 




Powerful specialized mathematical lan¬ 
guages and vast statistical software libraries 
have been developed. The ability to program 
sequences of statistical operations within a sin¬ 
gle programming language has been a big 
step forward. Software firms such as Wolfram 
Research (Mathematica) and MathWorks, and major 
suppliers of statistical tools such as SAS, have 
created simple computer languages for the pro¬ 
gramming of complex sequences of statisti¬ 
cal operations. This ability is key to financial 
econometrics, which entails the analysis of large 
portfolios. (Note that although a number of 
highly sophisticated statistical packages are 
available to economists, these packages do not 
serve the needs of the financial econometrician 
who has to analyze a large number of time 
series.) 

Presently only large or specialized firms write 
complex applications from scratch; this is typ¬ 
ically done to solve specific problems, often in 
the derivatives area. The majority of financial 
modelers make use of high-level software pro¬ 
gramming tools and statistical libraries. It is dif¬ 
ficult to overestimate the advantage brought by 
these software tools; they cut development time 
and costs by orders of magnitude. 

In addition, there is a wide range of off-the- 
shelf financial applications that can be used 
directly by operators who have a general un¬ 
derstanding of the problem but no advanced 
statistical or mathematical training. For exam¬ 
ple, powerful complete applications from firms 
such as MSCI Barra and component applica¬ 
tions from firms such as FEA make sophisti¬ 
cated analytical methods available to a large 
number of professionals. 

Data have, however, remained a significant 
expense. The diffusion of electronic transactions 
has made available large amounts of data, 
including high-frequency data (HFD), which 
give us information at the transaction level. 
As a result, in budgeting for financial modeling, 
data have become an important factor in 
deciding whether to undertake a new modeling 
effort. 


A lot of data are now available free on the 
Internet. If the required granularity of data is 
not high, these data allow one to study the vi¬ 
ability of models and to perform rough tun¬ 
ing. However, real-life applications, especially 
applications based on finely grained data, re¬ 
quire data streams of a higher quality than those 
typically available free on the Internet. 


INTEGRATING 
QUALITATIVE AND 
QUANTITATIVE 
INFORMATION 

Textual information has remained largely out¬ 
side the domain of quantitative modeling, having 
long been considered the domain of judgment. 
This is now changing as financial firms begin to 
tackle the problem of what is commonly called 
information overload; advances in computer 
technology are again behind the change (see Jonas 
and Focardi, 2002). Reuters publishes the 
equivalent of three bibles of (mostly financial) news 
daily; it is estimated that five new research doc¬ 
uments come out of Wall Street every minute; 
asset managers at medium-sized firms report 
receiving up to 1,000 e-mails daily and work 
with as many as five screens on their desk. 
Conversely, there is also a lack of "digested" in¬ 
formation. It has been estimated that only one 
third of the roughly 10,000 U.S. public com¬ 
panies are covered by meaningful Wall Street 
research; there are thousands of companies 
quoted on the U.S. exchanges with no Wall 
Street research at all. It is unlikely the situa¬ 
tion is better relative to the tens of thousands 
of firms quoted on other exchanges through¬ 
out the world. Yet increasingly companies are 
providing information, including press releases 
and financial results, on their Web sites. 

Such unstructured (textual) information is 
progressively being transformed into 
self-describing, semistructured information that can 
be automatically categorized and searched by 




Financial Modeling Principles 


computers. A number of developments are 
making this possible. These include: 

• The development of XML (Extensible 
Markup Language) standards for tagging textual 
data. This is taking us from free-text 
search to queries on semistructured data. 

• The development of RDF (Resource Descrip¬ 
tion Framework) standards for appending 
metadata. This provides a description of the 
content of documents. 

• The development of algorithms and software 
that generate taxonomies and perform auto¬ 
matic categorization and indexation. 

• The development of database query functions 
with a high level of expressive power. 

• The development of high-level text mining 
functionality that allows "discovery." 

The emergence of standards for the han¬ 
dling of "meaning" is a major development. It 
implies that unstructured textual information, 
which some estimates put at 80% of all content 
stored in computers, will be largely replaced by 
semistructured information ready for machine 
handling at a semantic level. Today's standard 
structured databases store data in a prespecified 
format so that the position of all elementary in¬ 
formation is known. For example, in a trading 
transaction, the date, the amount exchanged, 
the names of the stocks traded, and so on are 
all stored in predefined fields. However, textual 
data such as news or research reports do not al¬ 
low such a strict structuring. To enable the com¬ 
puter to handle such information, a descriptive 
metafile is appended to each unstructured file. 
The descriptive metafile is a structured file that 
contains the description of the key information 
stored in the unstructured data. The result is a 
semistructured database made up of unstruc¬ 
tured data plus descriptive metafiles. 
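
A minimal sketch of such a semistructured record follows, using Python's standard XML library. The tag names (`metafile`, `date`, `topic`, `company`), the ticker, and the news text are hypothetical and do not follow any particular industry standard.

```python
import xml.etree.ElementTree as ET


def build_metafile(doc_id, date, topic, companies):
    """Build a small structured description of an unstructured document."""
    root = ET.Element("metafile", attrib={"document": doc_id})
    ET.SubElement(root, "date").text = date
    ET.SubElement(root, "topic").text = topic
    for ticker in companies:
        ET.SubElement(root, "company").text = ticker
    return root


# Unstructured data: free text that a structured database cannot hold in fields.
news_text = "Acme Corp. raised its full-year earnings guidance after a strong quarter."

# Semistructured record: the free text plus its descriptive metafile.
record = {
    "text": news_text,
    "meta": build_metafile("news-0001", "2012-03-15", "earnings", ["ACME"]),
}

# The metafile can be queried by field even though the text itself cannot.
mentioned = [node.text for node in record["meta"].findall("company")]
print(ET.tostring(record["meta"], encoding="unicode"))
print(mentioned)
```

A query such as "all earnings news mentioning a given company" then runs against the metafiles, leaving the underlying text untouched.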

Industry-specific and application-specific 
standards are being developed around the 
general-purpose XML. At the time of this writ¬ 
ing, there are numerous initiatives established 
with the objective of defining XML standards 
for applications in finance, from time series to 
analyst and corporate reports and news. While 
it is not yet clear which of the competing efforts 
will emerge as the de facto standards, attempts 
are now being made to coordinate standardiza¬ 
tion efforts, eventually adopting the ISO 15022 
central data repository as an integration point. 

Technology for handling unstructured data 
has already made its way into the industry. 
Factiva, a Dow Jones-Reuters company, uses 
commercially available text mining software to 
automatically code and categorize more than 
400,000 news items daily, in real time (prior to 
adopting the software, they manually coded 
and categorized some 50,000 news articles 
daily). Users can search the Factiva database, 
which covers 118 countries and includes some 
8,000 publications and more than 30,000 
company reports, with simple, intuitive queries 
expressed in a language close to natural 
language. Several firms use text mining technology 
in their Web-based research portals for clients 
on the buy and sell sides. Such services 
typically offer classification, indexation, tagging, 
filtering, navigation, and search. 

These technologies are helping to organize re¬ 
search flows. They allow us to automatically 
aggregate, sort, and simplify information and 
provide the tools to compare and analyze the in¬ 
formation. In serving to pull together material 
from myriad sources, these technologies will 
not only form the basis of an internal knowl¬ 
edge management system but allow us to better 
structure the whole investment management 
process. Ultimately, the goal is to integrate data 
and text mining in applications such as 
fundamental research and event analysis, linking 
news and financial time series. 


PRINCIPLES FOR 
ENGINEERING A SUITE 
OF MODELS 

Creating a suite of models to satisfy the needs of 
a financial firm is engineering in full earnest. It 
begins with a clear statement of the objectives. 
In the case of financial modeling, the objective is 
identified by the type of decision-making 
process that a firm wants to implement. The 
engineering of a suite of financial models requires 
that the process by which decisions are made 
be fully specified and that the appropriate 
information be supplied at every step. This statement 
is not as banal as it might seem. 

We have now reached the stage where, in 
some markets, financial decision making can be 
completely automated through optimizers. As 
we will see in the following entries, one can de¬ 
fine models able to construct a conditional prob¬ 
ability distribution of returns. An optimizer will 
then translate the forecast into a tradable port¬ 
folio. The manager becomes a kind of high-level 
supervisor of an otherwise automated process. 

However, not all financial decision-making 
applications are, or can be, fully automated. 
In many cases, it is the human operator who 
makes the decision, with models supplying the 
information needed to arrive at the decision. 
Building an effective suite of financial models 
requires explicit decisions as to (1) what level 
of automation is feasible and desirable, and (2) 
what information or knowledge is required. 

The integration of different models and of 
qualitative and quantitative information is a 
fundamental need. This calls for integration 
of different statistical measures and points of 
view. For example, an asset management firm 
might want to complement a portfolio opti¬ 
mization methodology based on Gaussian fore¬ 
casting with a risk management process based 
on extreme value theory. The two processes 
offer complementary views. In many cases, 
however, different methodologies give different 
results though they work on similar principles 
and use the same data. In these cases, 
integration is delicate and might run against 
statistical principles. 

In deciding which modeling efforts to invest 
in, many firms have in place a sophisticated 
evaluation system. Firms evaluate a model's re¬ 
turn on investment and how much it will cost 
to buy the data necessary to run the model. 

KEY POINTS 

* Key to a quantitative framework is the mea¬ 
surement and management of uncertainty 
(i.e., risk) and financial modeling. 

* Modeling is the tool to achieve these objec¬ 
tives; advances in information technology are 
the enabler. 

* Unstructured textual information is progres¬ 
sively being transformed into self-describing, 
semistructured information, allowing a better 
structuring of the research process. 

* After nearly two decades of experience with 
quantitative methods, market participants 
now more clearly perceive the benefits and 
the limits of modeling; given today's tech¬ 
nology and markets, the need to better 
integrate qualitative and quantitative infor¬ 
mation is clearly felt. 
Abstract: We are confronted with data every day, constantly. Daily newspapers contain information 
on stock prices, economic figures, quarterly business reports on earnings and revenues, and much 
more. These data offer observed values of given quantities. The basic data types can be qualitative, 
ordinal, or quantitative. 


In this entry, we will present the first essentials 
of data description. We describe all data types 
and levels. We explain and illustrate why one 
has to be careful about the permissible compu¬ 
tations concerning each data level. 1 

We will restrict ourselves to univariate data, 
that is, data of only one dimension. For ex¬ 
ample, if you follow the daily returns of one 
particular stock, you obtain a one-dimensional 
series of observations. If you had observed two 
stocks, then you would have obtained a two- 
dimensional series of data, and so on. More¬ 
over, the notion of frequency distributions, 
empirical frequency distributions, and cumula¬ 
tive frequency distributions is introduced. The 
goal of this entry is to provide the first methods 
necessary to begin data analysis. After reading 
this entry you will learn how to formalize the 
first impression you obtain from the data in or¬ 
der to retrieve the most basic structure inherent 
in the data. That is essential for any subsequent 
tasks you may undertake with the data. Above 
all, though, you will have to be fully aware of 
what you want to learn from the data. That step 
is perhaps the most important task before you 
start investigating the data. For example, 
you may just want to know what the minimum 
return has been of your favorite stock during 
the last year before you decide to purchase. Or 
you are interested in all returns from last year 
to learn how this stock typically performs, that 
is, which returns occur more often than others, 
and how often. In the latter case, you definitely 
have to be more involved to obtain the 
necessary information than in the first case. 


DATA TYPES 

Data are gathered by several methods. In the 
financial industry, we have market data based 
on regular trades recorded by the exchanges. 
These data are directly observable. Aside from 
the regular trading process, there is so-called 
over-the-counter (OTC) business whose data 
are less accessible. Annual reports and 
quarterly reports, on the other hand, are published 
by companies themselves in print or 
electronically. These data are also available in the 
business and finance sections of most major 
business-oriented print media and on the Internet. 
The fields of marketing and the social sciences 
use additional data collection methods, such as 
telephone surveys, mail questionnaires, and even 
experiments. 

If one does research on certain financial quan¬ 
tities of interest, one might find the data avail¬ 
able from either free or commercial databases. 
Hence, one must be concerned with the quality 
of the data. Unfortunately, very often databases 
of unrestricted access such as those available on 
the Internet may be of limited credibility. In con¬ 
trast, there are many commercial purveyors of 
financial data who are generally acknowledged 
as providing accurate data. But, as always, qual¬ 
ity may have its price. 


Information Contained in the Data 

Once the data are gathered, it is the objective 
of descriptive statistics to visually and compu¬ 
tationally convert the amount of information 
given into quantities revealing the essentials in 
which we are interested. Commonly in this con¬ 
text, visual support is added since very often 
that allows for a much easier grasp of the infor¬ 
mation. 


The field of descriptive statistics discerns dif¬ 
ferent types of data. Very generally, there are 
two types: qualitative and quantitative data. 

If certain attributes of an item can only be 
assigned to categories, these data are referred 
to as qualitative. For example, stocks listed on 
the New York Stock Exchange (NYSE) can be 
categorized as belonging to a specific indus¬ 
try sector such as "banking," "energy," "media 
and telecommunications," and so on. That way, 
we assign the item stock as its attribute sector 
one or possibly more values from the set con¬ 
taining banking, energy, media and telecommu¬ 
nications, and so on. (Instead of attribute, we 
will most of the time use the term "variable.") 
Another example would be the credit ratings 
assigned to debt obligations by commercial 
rating companies such as Standard & Poor's, 
Moody's, and Fitch Ratings. Except for retriev¬ 
ing the value of an attribute, nothing more can 
be done with qualitative data. One may use a 
numerical code to indicate the different sectors 
(e.g., 1 = banking, 2 = energy, and so on). How¬ 
ever, we are not allowed to perform any compu¬ 
tation with these figures since they are simply 
proxies of the underlying attribute sector. 

However, if an item is assigned a quantita¬ 
tive variable, the value of this variable is nu¬ 
merical. Generally, all real numbers are eligible. 
Depending on the case, however, one will 
use discrete values only, such as integers. 
Stock prices or dividends, for example, are 
quantitative data drawing from—up to some 
digits—positive real numbers. Quantitative 
data have the feature that one can perform 
transformations and computations with them. 
One can easily think of the average price of all 
companies comprising some index on a certain 
day, while it would make absolutely no sense 
to do the same with qualitative data. 
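
The distinction can be illustrated with a small sketch. The four stocks, their sectors, and their prices are hypothetical; the point is only which operations are meaningful for each data type.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: "sector" is qualitative, "price" is quantitative.
stocks = [
    {"name": "S1", "sector": "banking", "price": 40.0},
    {"name": "S2", "sector": "energy", "price": 60.0},
    {"name": "S3", "sector": "banking", "price": 20.0},
    {"name": "S4", "sector": "media and telecommunications", "price": 80.0},
]

# Qualitative data: we can retrieve values and count them, nothing more.
sector_counts = Counter(stock["sector"] for stock in stocks)

# Quantitative data: computations such as an average are meaningful.
average_price = mean(stock["price"] for stock in stocks)

# A numerical code for the sectors is only a label. Python would happily
# average these codes, but the result would have no interpretation.
sector_codes = {"banking": 1, "energy": 2, "media and telecommunications": 3}

print(sector_counts)
print(average_price)
```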

Data Levels and Scale 

In descriptive statistics, we group data accord¬ 
ing to measurement levels. The measurement 
level gives an indication as to the sophistica¬ 
tion of the analysis techniques that one can 
apply to the data collected. Typically, a 
hierarchy with five levels of measurement (nominal, 
ordinal, interval, ratio, and absolute) is used 
to group data. The latter three form the set of 
quantitative data. If the data are of a certain 
measurement level, they are said to be scaled 
accordingly. That is, the data are referred to as 
nominally scaled, and so on. 

Nominally scaled data are on the bottom of the 
hierarchy. Despite the low level of sophistication, 
this type of data is commonly used. An 
example is the attribute sector of stocks. We al¬ 
ready learned that, even though we can assign 
numbers as proxies to nominal values, these 
numbers have no numerical meaning whatso¬ 
ever. We might just as well assign letters to the 
individual nominal values, for example, "B = 
banking," "E = energy," and so on. 

Ordinally scaled data are one step higher in the 
hierarchy. We also refer to this type as "rank 
data," since we can already perform a ranking 
within the set of values. We can make use of a re¬ 
lationship among the different values by treat¬ 
ing them as quality grades. For example, we 
can divide the stocks listed in a particular stock 
index according to their market capitalization 
into five groups of equal size. Let "A" denote
the top 20% of the stocks. Also, let "B"
denote the next 20% below, and so on, until we
obtain the five groups: A, B, C, D, and E. After
ordinal scaling, we can make statements such
as "Group A is better than group C." Hence,
we have a natural ranking or order among the
values. However, we cannot quantify the
difference between them. The credit rating of
debt obligations is also ordinally scaled.

Until now, we can summarize that while we 
can test the relationship between nominal data 
for equality only, we can additionally deter¬ 
mine a greater or less than relationship between 
ordinal data. 

Data on an interval scale are given if they can 
be reasonably transformed by a linear equation. 
Suppose we are given values x. It is now feasible
to express a new variable y by the relationship
y = a · x + b, where the x's are our original
data. If x has a meaning, then so does y. It is
obvious that data have to possess a numerical 
meaning and therefore be quantitative in or¬ 
der to be measured on an interval scale. For 
example, consider the temperature F given in 
degrees Fahrenheit. Then, the corresponding 
temperature in degrees Celsius, C, will result 
from the equation C = (F - 32)/1.8. Similarly,
if one is familiar with physics, the same
temperature measured in degrees Kelvin, K,
will result from K = C + 273.15. So, say it is
55° Fahrenheit for Americans, the same temperature
will mean approximately 13° Celsius
for Europeans, and they will not feel any cooler.
Generally, interval data allow for the calculation 
of differences. For example, 70° — 60° Fahren¬ 
heit = 10° Fahrenheit may reasonably express 
the difference in temperature between Los An¬ 
geles and San Francisco. But be careful—the 
difference in temperature measured in Celsius 
between the two cities is not the same. How 
much is it? 
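
The question above can be checked with a few lines of Python. This is only a minimal sketch: the function name and the two city temperatures (taken from the example in the text) are chosen for illustration. It shows that differences on an interval scale are meaningful within one unit but change with the unit, and that ratios are not preserved at all.

```python
def fahrenheit_to_celsius(f):
    """Linear transformation y = a*x + b with a = 1/1.8 and b = -32/1.8."""
    return (f - 32.0) / 1.8


# Temperatures from the example in the text (degrees Fahrenheit).
los_angeles_f, san_francisco_f = 70.0, 60.0

diff_f = los_angeles_f - san_francisco_f
diff_c = fahrenheit_to_celsius(los_angeles_f) - fahrenheit_to_celsius(san_francisco_f)

ratio_f = los_angeles_f / san_francisco_f
ratio_c = fahrenheit_to_celsius(los_angeles_f) / fahrenheit_to_celsius(san_francisco_f)

print(f"difference: {diff_f:.2f} F corresponds to {diff_c:.2f} C")
print(f"ratio in Fahrenheit: {ratio_f:.3f}, ratio in Celsius: {ratio_c:.3f}")
```

The intercept b cancels when taking differences, so only the slope 1/1.8 rescales them; it does not cancel in ratios, which is why ratios of interval data carry no meaning.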

Data measured on a ratio scale share all the 
properties of interval data. In addition, ratio 
data have a fixed or true zero point. This is not 
the case with interval data. Their intercept, b, 
can be arbitrarily changed through transforma¬ 
tion. Since the zero point of ratio data is in¬ 
variable, one can only transform the slope, a. 
So, for example, y = a · x is always a multiple
of x. In other words, there is a relationship
between y and x given by the ratio a, hence the
name used to describe this type of data. One
would not have this feature if one permitted
some b different from zero in the transformation.
Consider, for example, the stock price,
E, of some European stock given in euro units.
The same price in U.S. dollars, D, would be D
equals E times the exchange rate between euros
and U.S. dollars. But if the company's price went
to zero after bankruptcy, the price in either
currency would be zero, whatever the exchange
rate of U.S. dollars per euro.
This is a result of the invariant zero point.
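
As a small illustration of the invariant zero point, the following sketch converts euro prices into dollars. The exchange rate and the two prices are hypothetical values made up for this example; only the transformation D = E times the exchange rate comes from the text.

```python
USD_PER_EUR = 1.30  # hypothetical exchange rate, for illustration only


def eur_to_usd(price_eur, rate=USD_PER_EUR):
    """Ratio-scale transformation y = a*x: only the slope a may change."""
    return price_eur * rate


# Two hypothetical stock prices in euros.
price_a_eur, price_b_eur = 40.0, 20.0

ratio_eur = price_a_eur / price_b_eur
ratio_usd = eur_to_usd(price_a_eur) / eur_to_usd(price_b_eur)

print(f"price ratio in EUR: {ratio_eur:.2f}, in USD: {ratio_usd:.2f}")
print(f"a price of zero stays zero: {eur_to_usd(0.0)}")
```

Because no intercept is involved, statements such as "stock A costs twice as much as stock B" hold in either currency.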

Absolute data are given by quantitative data 
measured on a scale even stricter than for
ratio data. Here, along with the zero point, the
units are invariant as well. Data measured on 
an absolute scale occur when transformation 
would be mathematically feasible but lacks any 
interpretational implication. A common exam¬ 
ple is provided by counting numbers. Anybody 
would agree on the number of stocks listed in 
a certain stock index. There is no ambiguity as 
to the zero point and the count increments. If 
one stock is added to the index, it is immediately
clear that the difference to the content of
the old index is exactly one unit of stock,
assuming that no stock is deleted. This absolute
scale is the most intuitive and needs no further
discussion.


Figure 1 Relationship between Cross-Sectional
and Time Series Data
(Diagram: cross-sectional data run along the Sector axis, time series data along the Time axis.)


Cross-Sectional and Time Series Data

There is another way of classifying data. Imag¬ 
ine collecting data from one and the same quan¬ 
tity of interest or variable. A variable is some 
quantity that can assume values from a value 
set. For example, the variable "stock price" can 
technically assume any nonnegative real num¬ 
ber of currency but only one value at a time. 
Each day, it assumes a certain value, which is 
the day's stock price. As another example, a 
variable could be the dividend payments from 
a specific company over some period of time. 
In the case of dividends, the observations are 
made each quarter. The accumulated data then 
form what is called time series data. In contrast, 
one could pick a particular time period of inter¬ 
est such as the first quarter of the current year 
and observe the dividend payments of all com¬ 
panies listed in the Standard & Poor's 500 index. 
By doing so, one would obtain cross-sectional 
data of the universe of stocks in the S&P 500 
index at that particular time. 

Summarizing, time series data are data re¬ 
lated to a variable successively observed at a 
sequence of points in time. Cross-sectional data 
are values of a particular variable across some 
universe of items observed at a unique point in 
time. This is visualized in Figure 1. 
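
The distinction can be sketched in code. The dividend figures and company names below are hypothetical and serve only to show the two ways of slicing the same collection of observations.

```python
# Hypothetical quarterly dividends per share, keyed by (company, quarter).
dividends = {
    ("Company A", "2006Q1"): 0.20,
    ("Company A", "2006Q2"): 0.20,
    ("Company A", "2006Q3"): 0.22,
    ("Company B", "2006Q1"): 0.35,
    ("Company B", "2006Q2"): 0.35,
    ("Company B", "2006Q3"): 0.35,
    ("Company C", "2006Q1"): 0.10,
    ("Company C", "2006Q2"): 0.12,
    ("Company C", "2006Q3"): 0.12,
}


def time_series(data, company):
    """One variable for one item, observed at successive points in time."""
    return [(q, v) for (c, q), v in sorted(data.items()) if c == company]


def cross_section(data, quarter):
    """One variable across all items, observed at a single point in time."""
    return {c: v for (c, q), v in sorted(data.items()) if q == quarter}


print(time_series(dividends, "Company A"))
print(cross_section(dividends, "2006Q1"))
```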


FREQUENCY DISTRIBUTIONS

Sorting and Counting Data 

One of the most important aspects when deal¬ 
ing with data is that they are effectively orga¬ 
nized and transformed in order to convey the 
essential information contained in them. This 
processing of the original data helps to display 
the inherent meaning in a way that is more ac¬ 
cessible for intuition. But before advancing to 
the graphical presentation of the data, we will 
first describe the methods of structuring data. 

Suppose that we are interested in a particular 
variable that can assume a set of either finite 
or infinitely many values. These values may be 
qualitative or quantitative by nature. In either 
case, the initial step when obtaining a data sam¬ 
ple for some variable is to sort the values of 
each observation and then to determine the fre¬ 
quency distribution of the dataset. This is done 
simply by counting the number of observations 
for each possible value of the variable. Alter¬ 
natively, if the variable can assume values on 
all or part of the real line, the frequency can be 
determined by counting the number of obser¬ 
vations that fall into nonoverlapping intervals 
partitioning the real line. 

In our illustration, we will begin with qualitative
data first and then move on to the
quantitative aspects in the sequel. For example,
suppose we want to analyze the frequency
of the industry subsectors of the components
listed in the Dow Jones Industrial Average
(DJIA), an index comprised of 30 U.S. stocks.
Table 1 displays the 30 companies in the index
along with their respective industry sectors
as of December 12, 2006. By counting the
observed number of each possible Industry
Classification Benchmark (ICB) subsector, we obtain
Table 2, which shows the frequency distribution
of the variable subsector. Note in the table
that many subsector values appear only once.
Hence, this might suggest employing a coarser
set for the ICB subsector values in order to
reduce the amount of information in the data to a
necessary minimum.

Table 1 DJIA Components as of December 12, 2006

Company                                 Industry Classification Benchmark (ICB) Subsector
3M Co.                                  Diversified Industrials
Alcoa Inc.                              Aluminum
Altria Group Inc.                       Tobacco
American Express Co.                    Consumer Finance
American International Group Inc.       Full Line Insurance
AT&T Inc.                               Fixed Line Telecommunications
Boeing Co.                              Aerospace
Caterpillar Inc.                        Commercial Vehicles & Trucks
Citigroup Inc.                          Banks
Coca-Cola Co.                           Soft Drinks
E.I. DuPont de Nemours & Co.            Commodity Chemicals
Exxon Mobil Corp.                       Integrated Oil & Gas
General Electric Co.                    Diversified Industrials
General Motors Corp.                    Automobiles
Hewlett-Packard Co.                     Computer Hardware
Home Depot Inc.                         Home Improvement Retailers
Honeywell International Inc.            Diversified Industrials
Intel Corp.                             Semiconductors
International Business Machines Corp.   Computer Services
Johnson & Johnson                       Pharmaceuticals
JPMorgan Chase & Co.                    Banks
McDonald's Corp.                        Restaurants & Bars
Merck & Co. Inc.                        Pharmaceuticals
Microsoft Corp.                         Software
Pfizer Inc.                             Pharmaceuticals
Procter & Gamble Co.                    Nondurable Household Products
United Technologies Corp.               Aerospace
Verizon Communications Inc.             Fixed Line Telecommunications
Wal-Mart Stores Inc.                    Broadline Retailers
Walt Disney Co.                         Broadcasting & Entertainment

Table 2 Frequency Distribution of the Industry Subsectors

ICB Subsector                    Frequency a_i
Aerospace                        2
Aluminum                         1
Automobiles                      1
Banks                            2
Broadcasting & Entertainment     1
Broadline Retailers              1
Commercial Vehicles & Trucks     1
Commodity Chemicals              1
Computer Hardware                1
Computer Services                1
Consumer Finance                 1
Diversified Industrials          3
Fixed Line Telecommunications    2
Full Line Insurance              1
Home Improvement Retailers       1
Integrated Oil & Gas             1
Nondurable Household Products    1
Pharmaceuticals                  3
Restaurants & Bars               1
Semiconductors                   1
Soft Drinks                      1
Software                         1
Tobacco                          1
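
The counting step can be sketched in Python with the standard library's `collections.Counter`. The list below transcribes the subsector column of Table 1 in the order of the companies; the variable names are chosen for illustration.

```python
from collections import Counter

# ICB subsectors of the 30 DJIA components, transcribed from Table 1.
djia_subsectors = [
    "Diversified Industrials", "Aluminum", "Tobacco", "Consumer Finance",
    "Full Line Insurance", "Fixed Line Telecommunications", "Aerospace",
    "Commercial Vehicles & Trucks", "Banks", "Soft Drinks",
    "Commodity Chemicals", "Integrated Oil & Gas", "Diversified Industrials",
    "Automobiles", "Computer Hardware", "Home Improvement Retailers",
    "Diversified Industrials", "Semiconductors", "Computer Services",
    "Pharmaceuticals", "Banks", "Restaurants & Bars", "Pharmaceuticals",
    "Software", "Pharmaceuticals", "Nondurable Household Products",
    "Aerospace", "Fixed Line Telecommunications", "Broadline Retailers",
    "Broadcasting & Entertainment",
]

# Absolute frequency a_i of each observed subsector value.
frequency = Counter(djia_subsectors)

# Print the distribution sorted by subsector name, as in Table 2.
for subsector in sorted(frequency):
    print(f"{subsector:32s}{frequency[subsector]:3d}")
```

Since only equality of values is tested, the same approach works for data on any measurement level, including nominal data.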

Now suppose you would like to compare
this to the Dow Jones Global Titans 50 Index
(DJGTI). This index includes the 50
largest-capitalization and best-known blue-chip
companies listed on the NYSE. The companies
contained in this index are listed in Table 3 along
with their respective ICB subsectors. The next
step would also be to sort the data according to
their values and count each hit of a value, finally
listing the respective count numbers for each
value. A problem arises now, however, when
you want to directly compare the numbers with
those obtained for the DJIA because the number
of stocks contained in each index is not the
same. Hence, we cannot compare the respective
absolute frequencies. Instead, we have to resort
to something that creates comparability of the
two datasets. This is done by expressing the
number of observations of a particular value as
the proportion of the total number of observations
in a specific dataset. That means we have
to compute the relative frequency. See Table 4.

Table 3 Dow Jones Global Titans 50 Index as of December 12, 2006

Company Name                            ICB Subsector
Abbott Laboratories                     Pharmaceuticals
Altria Group Inc.                       Tobacco
American International Group Inc.       Full Line Insurance
Astrazeneca PLC                         Pharmaceuticals
AT&T Inc.                               Fixed Line Telecommunications
Bank of America Corp.                   Banks
Barclays PLC                            Banks
BP PLC                                  Integrated Oil & Gas
Chevron Corp.                           Integrated Oil & Gas
Cisco Systems Inc.                      Telecommunications Equipment
Citigroup Inc.                          Banks
Coca-Cola Co.                           Soft Drinks
ConocoPhillips                          Integrated Oil & Gas
Dell Inc.                               Computer Hardware
ENI S.p.A.                              Integrated Oil & Gas
Exxon Mobil Corp.                       Integrated Oil & Gas
General Electric Co.                    Diversified Industrials
GlaxoSmithKline PLC                     Pharmaceuticals
HBOS PLC                                Banks
Hewlett-Packard Co.                     Computer Hardware
HSBC Holdings PLC (UK Reg)              Banks
ING Groep N.V.                          Life Insurance
Intel Corp.                             Semiconductors
International Business Machines Corp.   Computer Services
Johnson & Johnson                       Pharmaceuticals
JPMorgan Chase & Co.                    Banks
Merck & Co. Inc.                        Pharmaceuticals
Microsoft Corp.                         Software
Mitsubishi UFJ Financial Group Inc.     Banks
Morgan Stanley                          Investment Services
Nestle S.A.                             Food Products
Nokia Corp.                             Telecommunications Equipment
Novartis AG                             Pharmaceuticals
PepsiCo Inc.                            Soft Drinks
Pfizer Inc.                             Pharmaceuticals
Procter & Gamble Co.                    Nondurable Household Products
Roche Holding AG Part. Cert.            Pharmaceuticals
Royal Bank of Scotland Group PLC        Banks
Royal Dutch Shell PLC A                 Integrated Oil & Gas
Samsung Electronics Co. Ltd.            Semiconductors
Siemens AG                              Electronic Equipment
Telefonica S.A.                         Fixed Line Telecommunications
Time Warner Inc.                        Broadcasting & Entertainment
Total S.A.                              Integrated Oil & Gas
Toyota Motor Corp.                      Automobiles
UBS AG                                  Banks
Verizon Communications Inc.             Fixed Line Telecommunications
Vodafone Group PLC                      Mobile Telecommunications
Wal-Mart Stores Inc.                    Broadline Retailers
Wyeth                                   Pharmaceuticals

Table 4 Comparison of Relative Frequencies of DJIA and DJGTI

                                 Relative Frequencies
ICB Subsector                    DJIA     DJGTI
Aerospace                        0.067    0.000
Aluminum                         0.033    0.000
Automobiles                      0.033    0.020
Banks                            0.067    0.180
Broadcasting & Entertainment     0.033    0.020
Broadline Retailers              0.033    0.020
Commercial Vehicles & Trucks     0.033    0.000
Commodity Chemicals              0.033    0.000
Computer Hardware                0.033    0.040
Computer Services                0.033    0.020
Consumer Finance                 0.033    0.000
Diversified Industrials          0.100    0.020
Electronic Equipment             0.000    0.020
Fixed Line Telecommunications    0.067    0.060
Food Products                    0.000    0.020
Full Line Insurance              0.033    0.020
Home Improvement Retailers       0.033    0.000
Integrated Oil & Gas             0.033    0.140
Investment Services              0.000    0.020
Life Insurance                   0.000    0.020
Mobile Telecommunications        0.000    0.020
Nondurable Household Products    0.033    0.020
Pharmaceuticals                  0.100    0.180
Restaurants & Bars               0.033    0.000
Semiconductors                   0.033    0.040
Soft Drinks                      0.033    0.040
Software                         0.033    0.020
Telecommunications Equipment     0.000    0.040
Tobacco                          0.033    0.020

Formal Presentation of Frequency 

For a better formal presentation, we denote the
(absolute) frequency by a and, in particular, by
$a_i$ for the ith value of the variable. Formally, the
relative frequency $f_i$ of the ith value is, then,
defined by

$$f_i = \frac{a_i}{n}$$

where n is the total number of observations.
With k being the number of the different values,
the following holds:

$$n = \sum_{i=1}^{k} a_i$$

In our illustration, let $n_1 = 30$ be the number
of total observations in the DJIA and $n_2 = 50$
the total number of observations in the DJGTI.
Table 4 shows the relative frequencies for all 
possible values. Notice that each index has 
some values that were observed with zero fre¬ 
quency, which still have to be listed for com¬ 
parison. When we look at the DJIA, we find 
out that the sectors Diversified Industrials and 
Pharmaceuticals each account for 10% of all sectors
and therefore are the sectors with the highest
frequencies. Comparing these two sectors
to the DJGTI, we find out that Pharmaceuticals
also play an important role there, with an 18%
share, while Diversified Industrials are of minor
importance. In this index, Banks are a very
important sector with 18% also. A comparison
of this sort can now be carried through for all 
subsectors thanks to the relative frequencies. 
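
A minimal sketch of this comparison follows. It uses only four of the subsectors; the DJIA counts come from Table 2, and the DJGTI counts are the ones implied by the relative frequencies in Table 4 with 50 observations. The function name is an assumption made for this illustration.

```python
def relative_frequencies(counts, n):
    """Return f_i = a_i / n for every value in counts."""
    return {value: a / n for value, a in counts.items()}


n_djia, n_djgti = 30, 50  # total number of observations in each index

# Absolute frequencies a_i for a selection of subsectors.
djia_counts = {"Banks": 2, "Diversified Industrials": 3,
               "Integrated Oil & Gas": 1, "Pharmaceuticals": 3}
djgti_counts = {"Banks": 9, "Diversified Industrials": 1,
                "Integrated Oil & Gas": 7, "Pharmaceuticals": 9}

f_djia = relative_frequencies(djia_counts, n_djia)
f_djgti = relative_frequencies(djgti_counts, n_djgti)

for subsector in sorted(djia_counts):
    print(f"{subsector:26s}{f_djia[subsector]:8.3f}{f_djgti[subsector]:8.3f}")
```

Dividing by the respective number of observations puts both indexes on a common scale between 0 and 1, which the absolute counts alone do not provide.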

Naturally, frequency (absolute and relative) 
distributions can be computed for all types of 
data since they do not require that the data have 
a numerical value. 

EMPIRICAL CUMULATIVE FREQUENCY DISTRIBUTION

Accumulating Frequencies

In addition to the frequency distribution, there 
is another quantity of interest for comparing 
data that is closely related to the absolute or 
relative frequency distribution. Suppose that 
one is interested in the percentage of all large- 
capitalization stocks in the DJIA with closing 
prices of at most US $50 on a specific day. One 
can sort the observed closing prices by their 
numerical values in ascending order to obtain 
something like the array shown in Table 5 for 
market prices as of December 15, 2006. Note that
since each value occurs only once, we have to
assign each value an absolute frequency of 1 or
a relative frequency of 1/30, respectively, since
there are 30 component stocks in the DJIA. We
start with the lowest entry ($20.77) and advance
up to the largest value still less than $50, which
is $49 (Coca-Cola). Each time we observe a value
less than or equal to $50, we add 1/30, accounting
for the frequency of each company, to obtain an
accumulated frequency of 18/30 representing
the total share of closing prices of at most $50. This
accumulated frequency is called the "empirical
cumulative frequency" at the value $50. If one
computes this for all values, one obtains the
empirical cumulative frequency distribution. The
term "empirical" is used because the distribution
is computed from observed data.

Table 5 DJIA Stocks by Share Price in Ascending
Order as of December 15, 2006

Company                                 Share Price
Intel Corp.                             20.77
Pfizer Inc.                             25.56
General Motors Corp.                    29.77
Microsoft Corp.                         30.07
Alcoa Inc.                              30.76
Walt Disney Co.                         34.72
AT&T Inc.                               35.66
Verizon Communications Inc.             36.09
General Electric Co.                    36.21
Hewlett-Packard Co.                     39.91
Home Depot Inc.                         39.97
Honeywell International Inc.            42.69
Merck & Co. Inc.                        43.60
McDonald's Corp.                        43.69
Wal-Mart Stores Inc.                    46.52
JPMorgan Chase & Co.                    47.95
E.I. DuPont de Nemours & Co.            48.40
Coca-Cola Co.                           49.00
Citigroup Inc.                          53.11
American Express Co.                    61.90
United Technologies Corp.               62.06
Caterpillar Inc.                        62.12
Procter & Gamble Co.                    63.35
Johnson & Johnson                       66.25
American International Group Inc.       72.03
Exxon Mobil Corp.                       78.73
3M Co.                                  78.77
Altria Group Inc.                       84.97
Boeing Co.                              89.93
International Business Machines Corp.   95.36

Source: www.dj.com/TheCompany/FactSheets.htm,
December 15, 2006.


Formal Presentation of Cumulative 
Frequency Distributions 

Formally, the empirical cumulative frequency
distribution $F_{emp}$ is defined as

$$F_{emp}(x) = \sum_{i=1}^{k} a_i$$

where k is the index of the largest value observed
that is still less than or equal to x. In our example, k is
18. When we use relative frequencies, we obtain
the empirical relative cumulative frequency
distribution defined analogously to the empirical
cumulative frequency distribution, this time
using relative frequencies. Hence, we have

$$F_{emp}^{f}(x) = \sum_{i=1}^{k} f_i$$

In our example, $F_{emp}^{f}(50) = 18/30 = 0.6 = 60\%$.

Note that the empirical cumulative frequency
distribution can be evaluated at any real x even
though x need not be an observation. For any
value x between two successive observations
$x_{(i)}$ and $x_{(i+1)}$, the empirical cumulative
frequency distribution as well as the empirical
cumulative relative frequency distribution remain
at their respective levels at $x_{(i)}$; that is, they are
of constant level $F_{emp}(x_{(i)})$ and $F_{emp}^{f}(x_{(i)})$,
respectively. For example, consider the empirical
relative cumulative frequency distribution for the
data shown in Table 5. We can extend the
distribution to a function that determines the value
of the distribution at each possible value of the
share price. The function is given in Table 6.
Notice that if no value is observed more than
once, then the empirical relative cumulative
frequency distribution jumps by 1/N at each
observed value. In our illustration, the jump size
is 1/30.
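
The step function just described can be sketched as follows, using the share prices from Table 5. The function name is chosen for illustration, and `bisect_right` from the standard library is used to count the observations that are less than or equal to x.

```python
from bisect import bisect_right

# DJIA share prices from Table 5, already in ascending order.
prices = [
    20.77, 25.56, 29.77, 30.07, 30.76, 34.72, 35.66, 36.09, 36.21, 39.91,
    39.97, 42.69, 43.60, 43.69, 46.52, 47.95, 48.40, 49.00, 53.11, 61.90,
    62.06, 62.12, 63.35, 66.25, 72.03, 78.73, 78.77, 84.97, 89.93, 95.36,
]


def empirical_relative_cdf(sorted_values, x):
    """Share of observations that are less than or equal to x."""
    k = bisect_right(sorted_values, x)  # number of observations <= x
    return k / len(sorted_values)


print(empirical_relative_cdf(prices, 50.0))   # 18 of 30 observations
print(empirical_relative_cdf(prices, 49.0))   # same level between 49.00 and 53.11
print(empirical_relative_cdf(prices, 10.0))   # below the smallest observation
```

The function can be evaluated at any real number, and it stays constant between two successive observations, as stated above.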

In Figure 2 the empirical relative cumulative
frequency distribution is shown as a graph.
Note that the values of the function are constant
on the extended line between two successive
observations, indicated by the solid point to the
left of each horizontal line. At each observation,
the vertical distance between the horizontal line
extending to the right from the preceding
observation and the value of the function is exactly
the increment 1/30.

Table 6 Empirical Relative Cumulative Frequency
Distribution of DJIA Stocks from Table 5

F_emp^f(x)    Range of x
0.00                  x < 20.77
0.03          20.77 ≤ x < 25.56
0.07          25.56 ≤ x < 29.77
0.10          29.77 ≤ x < 30.07
0.13          30.07 ≤ x < 30.76
0.17          30.76 ≤ x < 34.72
0.20          34.72 ≤ x < 35.66
0.23          35.66 ≤ x < 36.09
0.27          36.09 ≤ x < 36.21
0.30          36.21 ≤ x < 39.91
0.33          39.91 ≤ x < 39.97
0.37          39.97 ≤ x < 42.69
0.40          42.69 ≤ x < 43.60
0.43          43.60 ≤ x < 43.69
0.47          43.69 ≤ x < 46.52
0.50          46.52 ≤ x < 47.95
0.53          47.95 ≤ x < 48.40
0.57          48.40 ≤ x < 49.00
0.60          49.00 ≤ x < 53.11
0.63          53.11 ≤ x < 61.90
0.67          61.90 ≤ x < 62.06
0.70          62.06 ≤ x < 62.12
0.73          62.12 ≤ x < 63.35
0.77          63.35 ≤ x < 66.25
0.80          66.25 ≤ x < 72.03
0.83          72.03 ≤ x < 78.73
0.87          78.73 ≤ x < 78.77
0.90          78.77 ≤ x < 84.97
0.93          84.97 ≤ x < 89.93
0.97          89.93 ≤ x < 95.36
1.00          95.36 ≤ x

Figure 2 Empirical Relative Cumulative Frequency
Distribution of DJIA Stocks from Table 5
(Step-function graph: vertical axis from 0% to 100%, horizontal axis from 0 to 120.)


The computation of either form of empirical 
cumulative distribution function is obviously 
not intuitive for categorical data unless we as¬ 
sign some meaningless numerical proxy to each 
value such as "Sector A" = 1, "Sector B" = 2, 
and so on. 

DATA CLASSES 

Reasons for Classifying 

When quantitative variables are such that the 
set of values—whether observed or theoreti¬ 
cally possible— includes intervals or the entire 
real numbers, then the variable is continuous. 
This is in contrast to discrete variables, which 
assume values only from a limited or count¬ 
able set. Variables on a nominal scale cannot 
be considered in this context. And because of 
the difficulties with interpreting the results, we 
will not attempt to explain the issue of classes 
for rank data either. 

When one counts the frequency of observed 
values of a continuous variable, one notices 
that hardly any value occurs more than once. 
(Naturally, the precision given by the number 
of digits rounded may result in higher occur¬ 
rences of certain values.) Theoretically, with 
100% chance, all observations will yield differ¬ 
ent values. Thus, the method of counting the 
frequency of each value is not feasible. Instead, 
the continuous set of values is divided into mu¬ 
tually exclusive intervals. Then, for each such 
interval, the number of values falling within 
that interval can be counted again. In other 
words, one groups the data into classes for 
which the frequencies can be computed. Classes 
should be such that their respective lower and 
upper bounds are real numbers. Also, whether 
the class bounds are elements of the classes or 
not must be specified. The class bounds of a 
class must be bounds of the respective adjacent 
classes as well, such that the classes seamlessly 
cover the entire data. The width should be the 
same for all classes. However, if there are areas
where the data are very dense
in contrast to areas of lesser density, then the
class width can vary according to significant
changes in value density. In certain cases, most 
of the data are relatively evenly scattered within 
some range, while there are extreme values that 
are located in isolated areas on either end of the 
data array. Then, it is sometimes advisable to 
specify no lower bound to the lowest class and 
no upper bound to the uppermost class. Classes 
of this sort are called "open classes." Moreover, 
one should consider the precision to which the 
data are given. If values are rounded to the first 
decimal but there is the chance that the exact 
value might vary within half a decimal about 
the value given, class bounds have to consider 
this lack of certainty by admitting plus half a 
decimal on either end of the class. 

Formal Procedure of Classifying 

Formally, there are four criteria that the classes 
need to meet: 

Criterion 1: Mutual Exclusiveness: Each value 
can be placed in only one class. 

Criterion 2: Completeness: The set of classes 
needs to cover all values. 

Criterion 3: Equidistance: If possible, form 
classes of equal width. 

Criterion 4: Nonemptiness: If possible, avoid 
forming empty classes. 

It is intuitive that the number of classes should
increase with an increasing range of values
and increasing number of data. Though there
are no stringent rules, two rules of thumb are
given here with respect to the advised number
of classes (first rule) and the best class width
(second rule). The first, the so-called Sturges'
rule, states that for a given set of continuous
data of size n, one should use the nearest integer
figure to

$$1 + \log_2 n = 1 + 3.322 \log_{10} n.$$

Here, $\log_a n$ denotes the logarithm of n to the
base a, with a being either 2 or 10.
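
As a quick check of the rule, the sketch below evaluates both forms of the expression. The function name is an assumption for this illustration; the sample size 235 is the one used in the classing example of this entry.

```python
import math


def sturges_classes(n):
    """Nearest integer to 1 + log2(n), the number of classes under Sturges' rule."""
    return round(1 + math.log2(n))


n = 235
base2 = 1 + math.log2(n)
base10 = 1 + math.log10(n) / math.log10(2)  # 1 / log10(2) is roughly 3.322

print(f"1 + log2({n}) = {base2:.3f}")
print(f"1 + 3.322 * log10({n}) = {base10:.3f}")
print(f"suggested number of classes: {sturges_classes(n)}")
```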

The second guideline is the so-called
Freedman-Diaconis rule for the appropriate class
width or bin size. Before turning to the second
rule of thumb in more detail, we have to
introduce the notion of the interquartile range (IQR).
This quantity measures the distance between
the value where $F_{emp}^{f}$ is closest to 0.25 (that
is, the so-called 0.25-quantile), and the value
where $F_{emp}^{f}$ is closest to 0.75 (that is, the
so-called 0.75-quantile). (The term "percentile" is
used interchangeably with "quantile.") So the
IQR states how remote the lowest 25% of
the observations are from the highest 25%.² As
a consequence, the IQR comprises the central
50% of a data sample. A little more attention
will be given to the determination of the
above-mentioned quantiles when we discuss sample
moments and quantiles, since formally there
might arise some ambiguity when computing
them. (Note that the IQR cannot be computed
for nominal or categorical data in a natural way.)

Now we can return to the Freedman-Diaconis
rule. It states that a good class width is given by
the nearest integer to

$$2 \times IQR \times N^{-1/3}$$

where N is the number of observations in the
dataset. Note that there is an inverse relationship
between the class width and the number
of classes for each set of data. That is, given
that the partitioning of the values into classes
covers all observations, the number of classes n
has to be equal to the difference between largest
and smallest value divided by the class width,
if classes are all of equal size w. Mathematically,
that means

$$n = (x_{max} - x_{min}) / w$$

where $x_{max}$ denotes the largest value and
$x_{min}$ denotes the smallest value considered,
respectively.
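
Both formulas can be sketched in a few lines. The function names are assumptions for this illustration; the inputs (an IQR of 14.7, 235 observations, and a range from -18.3 to 41.3) are the figures from the fund return example in this entry.

```python
def freedman_diaconis_width(iqr, n_obs):
    """Class width 2 * IQR * N^(-1/3), before rounding to a convenient value."""
    return 2 * iqr * n_obs ** (-1 / 3)


def number_of_classes(x_min, x_max, width):
    """Number of equal-width classes needed to cover [x_min, x_max]."""
    return (x_max - x_min) / width


width = freedman_diaconis_width(iqr=14.7, n_obs=235)
classes = number_of_classes(x_min=-18.3, x_max=41.3, width=width)

print(f"class width: {width:.3f}")
print(f"number of classes: {classes:.2f}")
```

The inverse relationship is visible directly: halving the width doubles the number of classes needed to cover the same range.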

One should not be intimidated by all these 
rules. Generally, by mere ordering of the data 
in an array, intuition produces quite a good feel¬ 
ing of what the classes should look like. Some 
thought can be given to the timing of the for¬ 
mation of the classes. That is, when classes are 
formed prior to the data-gathering process, one 
does not have to store the specific values but
rather count only the number of hits within each
class. 

Example of Classing Procedures 

Let's illustrate these rules. Table 7 gives the 12- 
month returns (in percent) of the 235 Franklin 
Templeton Investments Funds on January 11, 
2007. With this many data, it becomes obvious 
that it cannot be helpful to anyone to know the 
relative performance for the 235 funds. To ob¬ 
tain an overall impression of the distribution of 
the data without getting lost in detail, one has to 
aggregate the information given by classifying 
the data. 

For the sake of a better overview, the ordered 
array is given in Table 8. A quick glance at the 
data sorted in ascending order gives us the low¬ 
est (minimum) and largest (maximum) return, 
respectively. Here, we have $x_{min} = -18.3\%$ and
$x_{max} = 41.3\%$, respectively, yielding a range of
59.6% to cover.

We first classify the data according to Sturges'
rule. For the number of classes, n, we obtain the
nearest integer to $1 + \log_2 235 = 8.877$, which
is 9. The class width is then determined by
the range divided by the number of classes,
59.6%/9, yielding a width of roughly 6.62%.
This is not a nice number to deal with, so
we may choose 7% instead without deviating
noticeably from the exact numbers given
by Sturges' rule. We now cover a range of
9 × 7% = 63%, which is slightly larger than the
original range of the data.

Selecting a value for the lower class bound of
the lowest class slightly below our minimum,
say -20.0%, and an upper class bound of the
highest class, say 43.0%, we spread the surplus
of the range (3.4%) evenly. The resulting classes
can be viewed in Table 9, where in the first row
the index of the respective class is given. The
second row contains the class bounds. Brackets
indicate that the value belongs to the class,
whereas parentheses exclude given values. So,
we obtain a half-open interval for each class
containing all real numbers between the lower
bound and just below the upper bound, thus
excluding that value. In row three, we have the
number of observations that fall into the
respective classes.

We can check for the compliance with the four
criteria given earlier. Because we use half-open
intervals, we guarantee that Criterion 1 is
fulfilled. Since the lowest class starts at -20%, and
the highest class ends at 43%, Criterion 2 is
satisfied. All nine classes are of width 7%, which
complies with Criterion 3. Finally, the
compliance with Criterion 4 can be checked easily.
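
The construction of the nine classes can be sketched as follows. The helper names are assumptions for this illustration, and since the full return data are not reproduced here, only the class bounds and the assignment of individual values are shown.

```python
from bisect import bisect_right


def class_bounds(lower, width, count):
    """Bounds of `count` adjacent classes of equal width, starting at `lower`."""
    return [lower + i * width for i in range(count + 1)]


def class_index(x, bounds):
    """1-based index of the half-open class [lower, upper) that contains x."""
    if not bounds[0] <= x < bounds[-1]:
        raise ValueError(f"{x} is not covered by the classes")
    return bisect_right(bounds, x)


# Nine classes of width 7% from -20% to 43%, as in the example.
bounds = class_bounds(lower=-20, width=7, count=9)
print(bounds)

for value in (-18.3, -13.0, 41.3):
    i = class_index(value, bounds)
    print(f"{value:6.1f}% falls into class {i}: [{bounds[i - 1]}, {bounds[i]})")
```

Because each class includes its lower bound and excludes its upper bound, a value such as -13.0% lies in exactly one class, which is what Criterion 1 requires.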

Next, we apply the Freedman-Diaconis rule.
With our ordered array of data, we can determine
the 0.25-quantile by selecting the observation
whose index is the first to exceed 0.25 ×
N = 0.25 × 235 = 58.75. This yields the value of
observation 59, which is 4.2%. Accordingly, the
0.75-quantile is given by the value whose index
is the first to exceed 0.75 × 235 = 176.25. For
our return data, it is $x_{(177)}$, which is 18.9%. The
IQR is computed as

18.9% - 4.2% = 14.7%

such that the bin size of the classes (or class
width) is now determined according to
$w = 2 \times IQR \times 235^{-1/3} = 4.764\%$. Taking the data
range of 59.6% from the previous calculation,
we obtain as the suggested number of classes
59.6%/4.764% = 12.511. Once again, this is not
a neat-looking figure. We stick with the initial
class width of w = 4.764% as closely as possible
by selecting the next integer, say 5%. And,
without any loss, we extend the range artificially
to 60%. So, we obtain for the number of
classes 60%/5% = 12, which is close to our original
real number, 12.511, computed according to
the Freedman-Diaconis rule but much nicer to
handle. We again spread the range surplus of
0.4% (60% - 59.6%) evenly across either end of
the range such that we begin our lowest class at
-18.5% and end our highest class at 41.5%. The
classes are given in Table 10. The first row of the
table indicates the index of the respective class,
while the second row gives the class bounds.
The number of observations that fall into each
class is shown in the last row. (One can easily

Table 7 12-Month Returns (in %) for the 235 Franklin Templeton Investment Funds (Luxembourg) on January 11,
2007


Aggr Growth A Acc 

1.9 

Mut Gb Discov A Acc EUR 

8.1 

Asian Grth A Dis USD 

21.3 

Glbl Bd A Dis GBP 

-0.7 

Aggr Growth A Dis 

-6.9 

Mut Gb Discov A Acc USD 

16.4 

Asian Grth C Acc 

20.6 

Glbl Bd B Dis 

7.5 

Aggr Growth B Acc 

0.9 

Mut Gb Discov A Dis GBP 

5.9 

Asian Grth I Acc EUR 

14.0 

Glbl Bd C Dis 

8.2 

Aggr Growth I Acc 

2.9 

Mut Gb Discov B Acc USD 

14.9 

Asian Grth I Acc USD 

22.6 

Glbl Bd I Acc EUR 

1.4 

Biotech Disc A Acc 

-0.4 

Mut Gb Discov C Acc USD 

15.7 

BRIC A Acc EUR 

25.3 

Glbl Bd I Acc USD 

9.9 

Biotech Disc B Acc 

-1.8 

Mut Gb Discov I Acc EUR 

9.1 

BRIC A Acc USD 

34.8 

Glbl Bd(Euro) A Acc 

1.5 

Biotech Disc I Acc 

0.6 

Mut Gb Discov I Acc USD 

17.4 

BRIC A Dis GBP 

22.7 

Glbl Bd(Euro) A Dis 

1.4 

Europ Growth A Acc 

20.0 

T Japan A Acc EUR 

-18.3 

BRIC B Acc USD 

33.2 

Glbl Bd(Euro) I Acc 

2.0 

Europ Growth I Acc 

21.3 

T Japan A Acc JPY 

-8.3 

BRIC C Acc USD 

34.1 

Global Euro A Acc 

11.1 

EurSMidCapGr A Acc 

33.1 

T Japan A Acc USD 

-12.2 

BRIC I Acc USD 

36.4 

Global Euro A Dis 

11.1 

EurSMidCapGr I Acc 

33.3 

T Japan C Acc USD 

-12.6 

China A Acc 

36.1 

Global Euro I Acc 

12.1 

EurSMidCapGrBAccUSD 

41.3 

T Japan I Acc EUR 

-17.6 

China A Dis 

23.7 

Global A Acc 

20.5 

Global Growth A Acc 

15.5 

T Japan I Acc USD 

-11.4 

China I Acc 

37.6 

Global A Dis 

20.4 

GlblMidCapGr A Acc 

10.7 

T Gib Gr&Val A Acc 

16.7 

Eastern Europ A Acc EUR 

13.3 

Global B Acc 

18.9 

GlblMidCapGr B Acc 

9.3 

T Gib Gr&Val B Acc 

15.3 

Eastern Europ A Acc USD 

21.9 

Global C Acc 

19.7 

GlblRealEst A Acc EUR 

19.7 

T Gib Gr&Val C Acc 

16.0 

Eastern Europ A Dis EUR 

13.3 

Global I Acc 

21.5 

GlblRealEst I Acc EUR 

20.7 

T Gib Gr&Val I Acc 

17.8 

Eastern Europ A Dis GBP 

11.0 

Gib Eq Inc A Acc EUR 

11.4 

GlblRealEst A Dis GBP 

17.1 

Technology A Acc 

-0.4 

Eastern Europ C Acc EUR 

12.6 

Gib Eq Inc A Acc USD 

19.9 

GlblRealEst A Acc USD 

22.1 

Technology B Acc 

-1.4 

Eastern Europ C Acc USD 

21.2 

Gib Eq Inc A Dis 

19.9 

GlblRealEst A Dis USD 

22.1 

US Eqty A Acc EUR 

0.2 

Eastern Europ I Acc 

14.7 

Gib Eq Inc B Dis 

18.4 

GlblRealEst B Dis USD 

20.5 

US Eqty A Acc EUR Hdg 

4.9 

Emg Mkt A Acc 

14.4 

Gib Eq Inc C Dis 

19.3 

GlblRealEst C Dis USD 

21.4 

US Eqty A Acc USD 

7.7 

Emg Mkt A Dis 

14.4 

Gib Eq Inc I Acc 

20.5 

GlblRealEst I Acc USD 

23.1 

US Eqty B Acc 

6.4 

Emg Mkt B Acc 

13.0 

Gib Inc A Acc EUR 

10.4 

GlblRealEst I Dis USD 

23.1 

US Eqty C Acc 

7.1 

Emg Mkt C Acc 

13.7 

Gib Inc A Acc USD 

18.7 

High Yield A Acc 

6.9 

US Eqty I Acc EUR 

-5.9 

Emg Mkt I Acc 

15.8 

Gib Inc A Dis 

18.7 

High Yield A Dis 

7.1 

US Eqty I Acc USD 

8.9 

EmMktBd A Dis EUR 

5.2 

Gib Inc B Dis 

17.2 

High Yield B Dis 

5.6 

US Gov A Dis 

3.1 

EmMktBd A Dis USD 

13.2 

Gib Inc C Dis 

17.9 

High Yield C Acc 

6.2 

US Gov B Dis 

1.8 

Emg Mkt Bd B Dis 

11.9 

Gib Inc I Acc 

19.4 

High Yield I Dis 

7.8 

US Gov B Acc 

1.9 

Emg Mkt Bd C Acc 

12.6 

Glbl Sm Co A Acc 

21.3 

High Yld Eur A Acc 

8.3 

US Gov C Acc 

2.2 

Emg Mkt Bd I Acc 

14.3 

Glbl Sm Co A Dis 

21.3 

High Yld Eur A Dis 

8.3 

US Gov I Dis 

3.8 

Euro Liq Res A Acc 

1.9 

Glbl Sm Co C Acc 

12.1 

High Yld Eur I Acc 

9.1 

US Growth A Acc 

3.8 

Euro Liq Res A Dis 

1.9 

Glbl Sm Co I Acc 

22.4 

High Yld Eur I Dis 

9.1 

US Growth B Acc 

2.5 

Euroland Bd A Dis 

-1.8 

Dlbl Tot Ret A Acc 

12.6 

Income A Dis 

12.8 

US Growth C Acc 

3.3 

Euroland Bd I Acc 

-1.2 

Dlbl Tot Ret A Dis 

12.6 

Income B Dis 

11.4 

US Growth I Acc 

6.4 

Euroland A Acc 

18.5 

Dlbl Tot Ret B Acc 

10.9 

Income C Acc 

12.1 

US Ultra Sh Bd A Dis 

3.7 

Euroland A Dis 

19.8 

Dlbl Tot Ret B Dis 

10.9 

Income C Dis 

12.1 

US Ultra Sh Bd B Acc 

2.5 

Euroland C Acc 

17.8 

Dlbl Tot Ret C Dis 

11.8 

Income I Acc 

13.7 

US Ultra Sh Bd B Dis 

2.5 

Euroland I Acc 

19.6 

Dlbl Tot Ret I Acc 

13.1 

India A Acc EUR 

29.0 

US Ultra Sh Bd C Dis 

2.6 

European A Acc USD 

24.0 

Dlbl Tot Ret I Dis 

10.0 

India A Acc USD 

38.7 

US Ultra Sh Bd I Acc 

4.2 

European A Acc EUR 

15.3 

Growth(Euro) A Acc 

7.5 

India A Dis GBP 

26.2 

US SmMidCapGro A Ac 

2.5 

European A Dis EUR 

15.2 

Growth(Euro) A Dis 

7.4 

India B Acc USD 

36.9 

US SmMidCapGro B Ac 

1.2 

European A Dis USD 

24.0 

Growth(Euro) I Acc 

8.4 

India C Acc USD 

37.9 

US SmMidCapGro C Ac 

2.0 

European C Acc EUR 

14.6 

Growth(Euro) I Dis 

8.4 

India I Acc EUR 

30.2 

US Tot Rtn A Acc 

4.1 

European I Acc 

16.4 

Japan A Acc 

-8.0 

India I Acc USD 

40.0 

US Tot Rtn A Dis 

4.2 

Euro Tot Ret A Acc 

-0.4 

Korea A Acc 

-3.8 

Mut Beacon AAccEUR 

7.4 

US Tot Rtn B Acc 

2.6 

Euro Tot Ret A Dis EUR 

-0.5 

Latin Amer A Acc 

35.9 

Mut Beacon AAccUSD 

15.5 

US Tot Rtn B Dis 

2.7 

Euro Tot Ret A Dis GBP 

-2.6 

Latin Amer A Dis GBP 

23.6 

Mut Beacon ADisUSD 

15.5 

US Tot Rtn C Dis 

3.1 

Euro Tot Ret A Dis USD 

7.1 

Latin Amer A Dis USD 

35.9 

Mut Beacon Bacc 

14.0 

US Tot Rtn I Acc 

4.8 

Euro Tot Ret C Acc EUR 

-1.3 

Latin Amer I Acc USD 

37.4 

Mut Beacon Cacc 

14.8 

Asian Bond A Acc EUR 

5.9 

Euro Tot Ret C Dis USD 

6.2 

Thailand A Acc 

-11.0 

Mut Beacon IAcc 

16.6 

Asian Bond A Acc USD 

14.1 

Euro Tot Ret I Acc 

-0.3 

US$ Liq Res A Acc 

4.2 

Mut Europ AAcc EUR 

15.9 

Asian Bond A Dis USD 

14.0 

Glbl Bal A Acc EUR 

6.5 

US$ Liq Res A Dis 

4.1 

Mut Europ AAcc USD 

24.7 

Asian Bond B Dis USD 

12.4 

Glbl Bal A Acc USD 

14.6 

US$ Liq Res B Dis 

3.1 

Mut Europ ADis EUR 

15.9 

Asian Bond C Dis USD 

13.0 

Glbl Bal A Dis 

14.6 

US$ Liq Res C Acc 

3.2 

Mut Europ ADis GBP 

14.0 

Asian Bond I Acc USD 

14.6 

Glbl Bal B Acc 

13.1 

US Value A Acc 

14.5 

Mut Europ B Acc 

23.1 

Asian Grth A Acc EUR 

12.7 

Glbl Bal C Dis 

13.9 

US Value B Acc 

13.0 

Mut Europ C Acc USD 

23.9 

Asian Grth A Acc USD 

21.4 

Glbl Bd A Dis USD 

9.2 

US Value C Acc 

13.8 

Mut Europ C Acc EUR 

15.2 

Asian Grth A Dis EUR 

12.8 

Glbl Bd A Acc EUR 

1.5 

US Value I Acc 

15.6 

Mut Europ I Acc 

16.9 

Asian Grth A Dis GBP 

10.4 

Glbl Bd A Dis EUR 

1.5 

Table 8 Ordered Array of the 235 12-month Returns for the Franklin Templeton Investment Funds (Luxembourg)


Obs. (i) Value 


x(1)

-18.3 

x(40) 

2.2 

x(79) 

7.5 

x(118) 

12.7 

x(157) 

15.7 

x(196) 

21.3 

x(2) 

-17.6 

x (41) 

2.5 

x(80) 

7.5 

x(119) 

12.8 

x(158) 

15.8 

x(197) 

21.3 

x(3) 

-12.6 

x (42) 

2.5 

x (81) 

7.7 

x(120) 

12.8 

x(159) 

15.9 

x(198) 

21.4 

x(4) 

-12.2 

x(43) 

2.5 

x (82) 

7.8 

x (121) 

13 

x(160) 

15.9 

x(199) 

21.4 

x(5) 

-11.4 

x(44) 

2.5 

x(83) 

8.1 

x (122) 

13 

x(161) 

16 

x(200) 

21.5 

x(6) 

-11 

x(45) 

2.6 

x(84) 

8.2 

x(123) 

13 

x(162) 

16.4 

x(201) 

21.9 

x(7) 

-8.3 

x(46) 

2.6 

x(85) 

8.3 

x(124) 

13.1 

x(163) 

16.4 

x(202) 

22.1 

x(8) 

-8 

x(47)

2.7 

x(86) 

8.3 

x(125) 

13.1 

x(164) 

16.6 

x(203) 

22.1 

x(9) 

-6.9 

x(48) 

2.9 

x(87)

8.4 

x(126) 

13.2 

x(165) 

16.7 

x(204) 

22.4 

x(10) 

-5.9 

x(49) 

3.1 

x(88) 

8.4 

x (127) 

13.3 

x(166) 

16.9 

x(205) 

22.6 

x(11)

-3.8 

x(50) 

3.1 

x(89) 

8.9 

x(128) 

13.3 

x(167) 

17.1 

x(206) 

22.7 

x(12) 

-2.6 

x (51) 

3.1 

x(90) 

9.1 

x(129) 

13.7 

x(168) 

17.2 

x(207) 

23.1 

x(13) 

-1.8 

x (52) 

3.2 

x (91) 

9.1 

x(130) 

13.7 

x(169) 

17.4 

x(208) 

23.1 

x(14) 

-1.8 

x(53) 

3.3 

x (92) 

9.1 

x(131) 

13.8 

x(170) 

17.8 

x(209) 

23.1 

x(15) 

-1.4 

x(54) 

3.7 

x (93) 

9.2 

x(132) 

13.9 

x (171) 

17.8 

x(210) 

23.6 

x (16) 

-1.3 

x(55) 

3.8 

x(94) 

9.3 

x(133) 

14 

x (172) 

17.9 

x (211) 

23.7 

x(17)

-1.2 

x(56) 

3.8 

x(95) 

9.9 

x(134) 

14 

x(173) 

18.4 

x(212) 

23.9 

x (18) 

-0.7 

x (57) 

4.1 

x(96) 

10 

x(135) 

14 

x(174) 

18.5 

x(213) 

24 

x (19) 

-0.5 

x(58) 

4.1 

x (97) 

10.4 

x(136) 

14 

x(175) 

18.7 

x(214) 

24 

x (20) 

-0.4 

x(59) 

4.2 

x(98) 

10.4 

x(137) 

14.1 

x(176) 

18.7 

x(215) 

24.7 

x(21) 

-0.4 

x(60) 

4.2 

x(99) 

10.7 

x(138) 

14.3 

x (177) 

18.9 

x(216) 

25.3 

x(22) 

-0.4 

x (61) 

4.2 

x(100) 

10.9 

x(139) 

14.4 

x(178) 

19.3 

x (217) 

26.2 

x (23) 

-0.3 

x (62) 

4.8 

x(101) 

10.9 

x(140) 

14.4 

x(179) 

19.4 

x(218) 

29 

x(24) 

0.2 

x(63) 

4.9 

x(102) 

11 

x(141) 

14.5 

x(180) 

19.6 

x(219) 

30.2 

x(25) 

0.6 

x (64) 

5.2 

x(103) 

11.1 

x(142) 

14.6 

x(181) 

19.7 

x(220) 

33.1 

x (26) 

0.9 

x(65) 

5.6 

x(104) 

11.1 

x(143) 

14.6 

x(182) 

19.7 

x (221) 

33.2 

x(27) 

1.2 

x(66) 

5.9 

x(105) 

11.4 

x(144) 

14.6 

x(183) 

19.8 

x(222) 

33.3 

x (28) 

1.4 

x(67)

5.9 

x(106) 

11.4 

x(145) 

14.6 

x(184) 

19.9 

x(223) 

34.1 

x (29) 

1.4 

x(68) 

6.2 

x(107) 

11.8 

x(146) 

14.7 

x(185) 

19.9 

x(224) 

34.8 

x(30) 

1.5 

x(69) 

6.2 

x(108) 

11.9 

x(147) 

14.8 

x(186) 

20 

x(225) 

35.9 

x (31) 

1.5 

x (70) 

6.4 

x(109) 

12.1 

x(148) 

14.9 

x(187) 

20.4 

x(226) 

35.9 

x (32) 

1.5 

x (71) 

6.4 

x(110) 

12.1 

x(149) 

15.2 

x(188) 

20.5 

x(227) 

36.1 

x(33) 

1.8 

x(72) 

6.5 

x(111)

12.1 

x(150) 

15.2 

x(189) 

20.5 

x(228) 

36.4 

x(34) 

1.9 

x(73) 

6.9 

x (112) 

12.1 

x(151) 

15.3 

x(190) 

20.5 

x(229) 

36.9 

x(35) 

1.9 

x(74) 

7.1 

x(113) 

12.4 

x(152) 

15.3 

x(191) 

20.6 

x(230) 

37.4 

x(36) 

1.9 

x (75) 

7.1 

x(114) 

12.6 

x(153) 

15.5 

x(192) 

20.7 

x(231) 

37.6 

x (37) 

1.9 

x(76) 

7.1 

x(115) 

12.6 

x(154) 

15.5 

x(193) 

21.2 

x(232) 

37.9 

x(38) 

2 

x(77) 

7.4 

x(116) 

12.6 

x(155) 

15.5 

x(194) 

21.3 

x(233) 

38.7 

x (39) 

2 

x(78) 

7.4 

x(117) 

12.6 

x(156) 

15.6 

x(195) 

21.3 

x(234) 

40 











x(235) 

41.3 


check that the four requirements for the classes 
are met again.) 
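
The Freedman-Diaconis calculation above can be retraced in a few lines of Python. This is a minimal sketch, not a library routine: the function names `quantile_index` and `freedman_diaconis_width` are illustrative, and the quantiles are located with the index rule used in the text (the first index to exceed alpha × N) rather than with any interpolation scheme.

```python
import math


def quantile_index(alpha: float, n: int) -> int:
    """1-based index of the first ordered observation whose index exceeds alpha * n."""
    return math.floor(alpha * n) + 1


def freedman_diaconis_width(iqr: float, n: int) -> float:
    """Class width w = 2 * IQR * n^(-1/3)."""
    return 2.0 * iqr * n ** (-1.0 / 3.0)


n = 235
i_low = quantile_index(0.25, n)    # position of the 0.25-quantile in the ordered array
i_high = quantile_index(0.75, n)   # position of the 0.75-quantile in the ordered array

# Values read off Table 8 at those positions, in percent.
q_low, q_high = 4.2, 18.9
iqr = q_high - q_low

w = freedman_diaconis_width(iqr, n)
data_range = 41.3 - (-18.3)          # largest minus smallest return in Table 8
suggested_classes = data_range / w   # not an integer, hence the rounding below

# Rounded design chosen in the text: width 5%, range stretched to 60%.
rounded_width = 5.0
extended_range = 60.0
n_classes = round(extended_range / rounded_width)
surplus = extended_range - data_range
lower = -18.3 - surplus / 2          # start of the lowest class
upper = 41.3 + surplus / 2           # end of the highest class
```

Keeping the rounding step separate from the rule itself makes it clear that the final class width of 5% is a judgment call layered on top of the computed width.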

Let us next compare Tables 9 and 10. We observe a finer distribution
when the Freedman-Diaconis rule is employed because this rule generates
more classes for the same data. However, it is generally difficult to
judge which rule provides us with the better information because, as is
seen, the two rules set up completely different classes. But the choice
of class bounds is essential. By just slightly shifting the bounds
between two adjacent classes, many observations may fall from one class
into the other due to this alteration. As a result, this might produce
a totally different picture about the data distribution. So, we have to
be very careful when we interpret the two different results.

Table 9 Classes for the 235 Fund Returns According to Sturges' Rule

Class index I 1           2           3           4           5           6           7           8           9
[a_I; b_I)    [-20, -13)  [-13, -6)   [-6, 1)     [1, 8)      [8, 15)     [15, 22)    [22, 29)    [29, 36)    [36, 43)
a_I           2           7           17          56          66          53          16          9           9

For example, class 7, that is, [22,29) in Table 9 contains 16
observations. Classes 9 and 10 of Table 10 cover approximately the same
range, [21.5,31.5). Together they account for 20 observations. We could
now easily present two scenarios that would provide rather different
conceptions about the frequency. In scenario one, suppose one assumes
that two observations are between 21.5 and 22.0. Then, there would have
to be 16 observations between 22.0 and 26.5 to add up to 18
observations in class 9 of Table 10. This, in return, would mean that
the 16 observations of class 7 from Table 9 would all have to lie
between 22.0 and 26.5 as well. Then, the two observations from class 10
of Table 10 must lie beyond 29.0. The other scenario could assume that
we have four observations between 21.5 and 22.0. Then, for similar
reasons as before, we would have 14 observations between 22.0 and 26.5.
The two observations from class 10 of Table 10 would now have to be
between 26.5 and 29.0, so that the total of 16 observations in class 7
of Table 9 is met. See how easily slightly different classes can lead
to ambiguous interpretation? Looking at all classes at once, many of
these puzzles can be solved. However, some uncertainty remains. As can
be seen, the choice of the number of classes and thus the class bounds
can have a significant impact on the information that the data conveys
when condensed into classes.
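
When the raw observations are available, the ambiguity between the two scenarios disappears, because the classes can simply be recounted. The sketch below does this for the 20 ordered observations x(200) to x(219) of Table 8, which are the only ones falling in [21.5, 31.5). The helper name `count_in_class` is illustrative.

```python
def count_in_class(values, lower, upper):
    """Number of values in the half-open class [lower, upper)."""
    return sum(1 for v in values if lower <= v < upper)


# Ordered observations x(200) to x(219) from Table 8, in percent.
values = [
    21.5, 21.9, 22.1, 22.1, 22.4, 22.6, 22.7, 23.1, 23.1, 23.1,
    23.6, 23.7, 23.9, 24.0, 24.0, 24.7, 25.3, 26.2, 29.0, 30.2,
]

sturges_class_7 = count_in_class(values, 22.0, 29.0)   # class 7 of Table 9
fd_class_9 = count_in_class(values, 21.5, 26.5)        # class 9 of Table 10
fd_class_10 = count_in_class(values, 26.5, 31.5)       # class 10 of Table 10
between_bounds = count_in_class(values, 21.5, 22.0)    # gap between the two lower bounds
```

For these data the counts reproduce the table entries (16, 18, and 2), and exactly two observations lie between 21.5 and 22.0, which corresponds to the first scenario. A reader who sees only the two tables cannot tell this.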

CUMULATIVE FREQUENCY 
DISTRIBUTIONS 

In contrast to the empirical cumulative frequency distributions, in
this section we will introduce functions that convey basically the same
information, that is, the frequency distribution, but rely on a few more
assumptions. These cumulative frequency distributions introduced here,
however, should not be confused with the theoretical definitions given
in probability theory even though the notion is akin to both.

The absolute cumulative frequency at each class bound states how many
observations have been counted up to this particular class bound.
However, we do not exactly know how the data are distributed within the
classes. When relative frequencies are used, though, the cumulative
relative frequency distribution states the overall proportion of all
values up to a certain lower or upper bound of some class.

So far, things are not much different from the definition of the
empirical cumulative frequency distribution and empirical cumulative
relative frequency distribution. At each bound, the empirical cumulative
frequency distribution and cumulative frequency coincide. However, an
additional assumption is made regarding the distribution of the values
between bounds of each class when computing the cumulative frequency
distribution. The data are thought of as being continuously distributed
and equally spread between the particular bounds. (This type of assumed
behavior is defined as a "uniform distribution of data.") Hence, both
forms (absolute and relative) of the cumulative frequency distributions
increase in a linear fashion between the two class bounds. So, for both
forms of cumulative distribution functions, one can compute the
accumulated frequencies at values inside of classes.


Table 10 Classes for the 235 Fund Returns According to the Freedman-Diaconis Rule

I           1               2               3               4               5               6               7               8               9               10              11              12
[a_I; b_I)  [-18.5; -13.5)  [-13.5; -8.5)   [-8.5; -3.5)    [-3.5; 1.5)     [1.5; 6.5)      [6.5; 11.5)     [11.5; 16.5)    [16.5; 21.5)    [21.5; 26.5)    [26.5; 31.5)    [31.5; 36.5)    [36.5; 41.5)
a_I         2               4               5               18              42              35              57              36              18              2               9               7

For a more thorough summary of this, let's use a more formal
presentation. Let $I$ denote the set of all class indexes $i$, with $i$
being some integer value between 1 and $n_I = |I|$ (that is, the number
of classes). Moreover, let $a_j$ and $f_j$ denote the (absolute)
frequency and relative frequency of some class $j$, respectively. The
cumulative frequency distribution at some upper bound, $x_u^i$, of a
given class $i$ is computed as

$$F(x_u^i) = \sum_{j:\, x_u^j \le x_u^i} a_j = \sum_{j:\, x_u^j < x_u^i} a_j + a_i \tag{1}$$

In words, this means that we sum up the frequencies of all classes
whose upper bound is less than $x_u^i$ plus the frequency of class $i$
itself. The corresponding cumulative relative frequency distribution at
the same value is then

$$F_f(x_u^i) = \sum_{j:\, x_u^j \le x_u^i} f_j = \sum_{j:\, x_u^j < x_u^i} f_j + f_i \tag{2}$$

This describes the same procedure as in equation (1) using relative
frequencies instead of frequencies. For any value $x$ in between the
boundaries of, say, class $i$, $x_l^i$ and $x_u^i$, the cumulative
relative frequency distribution is defined by

$$F_f(x) = F_f(x_l^i) + \frac{x - x_l^i}{x_u^i - x_l^i}\, f_i \tag{3}$$

In words, this means that we compute the cumulative relative frequency
distribution at value x as the sum of two things. First, we take the
cumulative relative frequency distribution at the lower bound of class
i. Second, we add that share of the relative frequency of class i that
is determined by the part of the whole interval of class i that is
covered by x.
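
The interpolation rule can be sketched in Python using the classes of Table 10. This is an illustration under the uniform-within-class assumption described above. The function name and the handling of values outside the covered range (0 below the lowest bound, 1 at or above the highest) are choices made for this example.

```python
from bisect import bisect_right

# Class bounds and frequencies from Table 10 (width 5%, from -18.5% to 41.5%).
bounds = [-18.5 + 5.0 * k for k in range(13)]
counts = [2, 4, 5, 18, 42, 35, 57, 36, 18, 2, 9, 7]


def cumulative_relative_frequency(x, bounds, counts):
    """Cumulative relative frequency at x, linear between class bounds."""
    n = sum(counts)
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1          # class whose interval [lower, upper) contains x
    below = sum(counts[:i])                  # observations in all classes below class i
    share = (x - bounds[i]) / (bounds[i + 1] - bounds[i])
    return (below + share * counts[i]) / n


at_bound = cumulative_relative_frequency(1.5, bounds, counts)    # lower bound of class 5
inside = cumulative_relative_frequency(4.0, bounds, counts)      # midpoint of class 5
```

At the lower bound of class 5 the function returns 29/235, the share of the first four classes. At the midpoint 4.0 it adds half of the 42 observations of class 5, giving 50/235.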

Figure 3 might appeal more to intuition. At the bounds of class $i$, we
have values of the cumulative relative frequency given by $F_f(x_l^i)$
and $F_f(x_u^i)$, respectively. We assume that the cumulative relative
frequency increases linearly along the line connecting $F_f(x_l^i)$ and
$F_f(x_u^i)$. Then, at any value $x^*$ inside of class $i$, we find the
corresponding value $F_f(x^*)$ by the intersection of the dashed line
and the vertical axis as shown. The dashed line is obtained by
extending a horizontal line through the intersection of the vertical
line through $x^*$ and the line connecting $F_f(x_l^i)$ and
$F_f(x_u^i)$ with slope $(F_f(x^*) - F_f(x_l^i))/(x^* - x_l^i)$.

Figure 3 Determination of Frequency Distribution within Class Bounds

KEY POINTS 

• The field of descriptive statistics discerns different types of data.
Very generally, there are two types: qualitative and quantitative data.
If certain attributes of an item can only be assigned to categories,
these data are referred to as qualitative. However, if an item is
assigned a quantitative variable, the value of this variable is
numerical. Generally, all real numbers are eligible.

• In descriptive statistics, data are grouped according to measurement
levels. The measurement level gives an indication as to the
sophistication of the analysis techniques that can be applied to the
data collected. Typically, a hierarchy with five levels of measurement
(nominal, ordinal, interval, ratio, and absolute data) is used to group
data. The latter three form the set of quantitative data. If the data
are of a certain measurement level, they are said to be scaled
accordingly. That is, the data are referred to as nominally scaled, and
so on.

• Another way of classifying data is in terms of cross-sectional and
time series data. Cross-sectional data are values of a particular
variable across some universe of items observed at a unique point in
time. Time series data are data related to a variable successively
observed at a sequence of points in time.

• Frequency (absolute and relative) distributions can be computed for
all types of data since they do not require that the data have a
numerical value. The cumulative frequency distribution is another
quantity of interest for comparing data that is closely related to the
absolute or relative frequency distribution.

• Four criteria that data classes need to satisfy are (1) each value
can be placed in only one class (mutual exclusiveness), (2) the set of
classes needs to cover all values (completeness), (3) if possible, form
classes of equal width (equidistance), and (4) if possible, avoid
forming empty classes (nonemptiness).

NOTES 

1. For a more detailed discussion, see Rachev 
et al. (2010). 

2. The 0.75-quantile divides the data into the 
lowest 75% and the highest 25%. 

Abstract: A stochastic process is a time-dependent random variable. Stochastic processes such
as Brownian motion and Ito processes develop in continuous time. This means that time
is a real variable that can assume any real value. In many financial modeling applications,
however, it is convenient to constrain time to assume only discrete values. A time series is a
discrete-time stochastic process; that is, it is a collection of random variables $X_t$ indexed with the
integers ..., -n, ..., -2, -1, 0, 1, 2, ..., n, ....


In this entry, we introduce models of discrete-time stochastic
processes (that is, time series). In financial modeling, both
continuous-time and discrete-time models are used. In many instances,
continuous-time models allow simpler and more concise expressions as
well as more general conclusions, though at the expense of conceptual
complication. For instance, in the limit of continuous time, apparently
simple processes such as white noise cannot be meaningfully defined.
The mathematics of asset management tends to prefer discrete-time
processes, while the mathematics of derivatives tends to prefer
continuous-time processes.

The first issue to address in financial econometrics is the spacing of
discrete points of time. An obvious choice is regular, constant
spacing. In this case, the time points are placed at multiples of a
single time interval: $t = i\,\Delta t$. For instance, one might
consider the closing prices at the end of each day. The use of fixed
spacing is appropriate in many applications. Spacing of time points
might also be irregular but deterministic. For instance, weekends
introduce irregular spacing in a sequence of daily closing prices.
These questions can be easily handled within the context of
discrete-time series.

In this entry, we discuss only time series at 
discrete and fixed intervals of time, introducing 
concepts, representations, and models of time 
series. 1 


CONCEPTS OF TIME SERIES 

A time series is a collection of random variables $X_t$ indexed with a
discrete time index $t = \ldots, -2, -1, 0, 1, 2, \ldots$. The variables
$X_t$ are defined over a probability space $(\Omega, P, \Im)$, where
$\Omega$ is the set of states, $P$ is a probability measure, and $\Im$
is the $\sigma$-algebra of events, equipped with a discrete filtration
$\{\Im_t\}$ that determines the propagation of information (see the
Appendix). A realization of a time series is a countable sequence of
real numbers, one for each time point.

The variables $X_t$ are characterized by finite-dimensional
distributions as well as by conditional distributions,
$F_s(x_s \mid \Im_t)$, $s > t$. The latter are the distributions of the
variable $x$ at time $s$ given the $\sigma$-algebra $\Im_t$ at time $t$.
Note that conditioning is always conditioning with respect to a
$\sigma$-algebra, though we will not always strictly use this notation
and will condition with respect to the value of variables, for instance:

$$F_s(x_s \mid x_t), \quad s > t$$

If the series starts from a given point, initial conditions must be
fixed. Initial conditions might be a set of fixed values or a set of
random variables. If the initial conditions are not fixed values but
random variables, one has to consider the correlation between the
initial values and the random shocks of the series. A usual assumption
is that the initial conditions and the random shocks of the series are
statistically independent.

How do we describe a time series? One way to describe a time series is
to determine the mathematical form of the conditional distribution.
This description is called an autopredictive model because the model
predicts future values of the series from past values. However, we can
also describe a time series as a function of another time series. This
is called an explanatory model as one variable is explained by another.
The simplest example is a regression model where a variable is
proportional to another exogenously given variable plus a constant
term. Time series can also be described as random fluctuations or
adjustments around a deterministic path. These models are called
adjustment models. Explanatory, autopredictive, and adjustment models
can be mixed in a single model. The data generation process (DGP) of a
series is a mathematical process that computes the future values of the
variables given all information known at time t.

An important concept is that of a stationary time series. A series is
stationary in the "strict sense" if all finite dimensional
distributions are invariant with respect to a time shift. A series is
stationary in a "weak sense" if only the moments up to a given order
are invariant with respect to a time shift. In this entry, time series
will be considered (weakly) stationary if the first two moments are
time-independent. Note that a stationary series cannot have a starting
point but must extend over the entire infinite time axis. Note also
that a series can be strictly stationary (i.e., have all distributions
time-independent) even though its moments might not exist. Thus a
strictly stationary series is not necessarily weakly stationary.

A time series can be univariate or multivariate. A multivariate time
series is a time-dependent random vector. The principles of modeling
remain the same but the problem of estimation might become very
difficult given the large numbers of parameters to be estimated.

Models of time series are essential building blocks for financial
forecasting and, therefore, for financial decision-making. In
particular, asset allocation and portfolio optimization, when performed
quantitatively, are based on some model of financial prices and
returns. This entry lays down the basic financial econometric theory
for financial forecasting. We will introduce a number of specific
models of time series and of multivariate time series, presenting the
basic facts about the theory of these processes. We will consider
primarily models of financial assets, though most theoretical
considerations apply to macroeconomic variables as well. These models
include:

• Correlated random walks. The simplest model of multiple financial
assets is that of correlated random walks. This model is only a rough
approximation of equity price processes and presents serious problems
of estimation in the case of a large number of processes.

• Factor models. Factor models address the problem of estimation in the
case of a large number of processes. In a factor model there are
correlations only among factors and between each factor and each time
series. Factors might be exogenous or endogenously modeled.

• Cointegrated models. In a cointegrated model there are portfolios
that are described by autocorrelated, stationary processes. All
processes are linear combinations of common trends that are represented
by the factors.

The above models are all linear. However, nonlinearities are at work in
financial time series. One way to model nonlinearities is to break down
models into two components, the first being a linear autoregressive
model of the parameters, the second a regressive or autoregressive
model of empirical quantities whose parameters are driven by the first.
This is the case with most of today's nonlinear models (e.g.,
ARCH/GARCH models, Hamilton models, and Markov switching models).

There is a coherent modeling landscape, from correlated random walks
and factor models to the modeling of factors, and, finally, the
modeling of nonlinearities by making the model parameters vary. Before
describing models in detail, however, let's present some key empirical
facts about financial time series.

STYLIZED FACTS OF 
FINANCIAL TIME SERIES 

Most sciences are stratified in the sense that theories are organized
on different levels. The empirical evidence that supports a theory is
generally formulated in a lower level theory. In physics, for instance,
quantum mechanics cannot be formulated as a stand-alone theory but
needs classical physics to give meaning to measurement. Economics is no
exception. A basic level of knowledge in economics is represented by
the so-called stylized facts. Stylized facts are statistical findings
of a general nature on financial and economic time series; they cannot
be considered raw data insofar as they are formulated as statistical
hypotheses. On the other hand, they are not full-fledged theories.

Among the most important stylized facts from the point of view of
finance theory, we can mention the following:

• Returns of individual stocks exhibit nearly zero autocorrelation at
every lag.

• Returns of some equity portfolios exhibit significant
autocorrelation.

• The volatility of returns exhibits hyperbolic decay with significant
autocorrelation.

• The distribution of stock returns is not normal. The exact shape is
difficult to ascertain but power law decay cannot be rejected.

• There are large stock price drops (that is, market crashes) that seem
to be outliers with respect to both normal distributions and power law
distributions.

• Stock return time series exhibit significant cross-correlation.

These findings are, in a sense, model-dependent. For instance, the
distribution of returns, a subject that has received a lot of
attention, can be fitted by different distributions. There is no firm
evidence on the exact value of the power exponent, with alternative
proposals based on variable exponents. The autocorrelation is
model-dependent while the exponential decay of return autocorrelation
can be interpreted only as absence of linear dependence.
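
Statements about autocorrelation "at every lag" refer to a quantity that can be estimated from a finite sample. The sketch below uses one common estimator, the lagged sample autocovariance divided by the sample variance. The text does not say which estimator underlies the stylized facts, so the choice here is an assumption, as is the function name.

```python
def sample_autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a non-negative lag."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Products of deviations that are `lag` steps apart.
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


# A deterministic alternating series, used only to check the arithmetic.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho_1 = sample_autocorrelation(alternating, 1)
rho_2 = sample_autocorrelation(alternating, 2)
```

Because the estimator measures only linear dependence, a value near zero at every lag does not rule out other forms of dependence, for example in the volatility of returns.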

It is fair to say that these stylized facts set the stage for financial
modeling but leave ample room for model selection. Financial time
series seem to be nearly random processes that exhibit significant
cross correlations and, in some instances, cross autocorrelations. The
global structure of auto and cross correlations, if it exists at all,
must be fairly complex and there is no immediate evidence that
financial time series admit a simple DGP.

One more important feature of financial time series is the presence of
trends. Prima facie trends of economic and financial variables are
exponential trends. Trends are not quantities that can be independently
measured. Trends characterize an entire stochastic model. Therefore
there is no way to arrive at an assessment of trends independent from
the model. Exponential trends are a reasonable first approximation.

Given the finite nature of world resources, exponential trends are not
sustainable in the long run. However, they might still be a good
approximation over limited time horizons. An additional insight into
financial time series comes from the consideration of investors'
behavior. If investors are risk averse, as required by the theory of
investment, then price processes must exhibit a trade-off between risk
and returns. The combination of this insight with the assumption of
exponential trends yields market models with possibly diverging
exponential trends for prices and market capitalization.

Again, diverging exponential trends are difficult to justify in the
long run as they would imply that after a while only one entity would
dominate the entire market. Some form of reversion to the mean or more
disruptive phenomena that prevent time series from diverging
exponentially must be at work.

In the following sections we will proceed to describe the theory and
the estimation procedures of a number of market models that have been
proposed. We will present the multivariate random walk model and
introduce cointegration and autoregressive models.


INFINITE MOVING-AVERAGE AND
AUTOREGRESSIVE REPRESENTATION
OF TIME SERIES

There are several general representations (or models) of time series.
This section introduces representations based on infinite moving
averages or infinite autoregressions useful from a theoretical point of
view. In the practice of econometrics, however, more parsimonious
models such as the ARMA models (described in the next section) are
used. Representations are different for stationary and nonstationary
time series. Let's start with univariate stationary time series.

Univariate Stationary Series 

The most fundamental model of a univariate 
stationary time series is the infinite moving av¬ 
erage of a white noise process. In fact, it can be 
demonstrated that under mild regularity condi¬ 
tions, any univariate stationary causal time se¬ 
ries admits the following infinite moving-average 
representation: 

$$x_t = \sum_{i=0}^{\infty} h_i \varepsilon_{t-i} + m$$

where the $h_i$ are coefficients and $\varepsilon_t$ is a
one-dimensional zero-mean white-noise process.
This is a causal time series as the present value 
of the series depends only on the present and 
past values of the noise process. A more general 
infinite moving-average representation would 
involve a summation that extends from $-\infty$
to $+\infty$. Because this representation would not
make sense from an economic point of view, we 
will restrict ourselves only to causal time series. 

A sufficient condition for the above series to 
be stationary is that the coefficients $h_i$ are
absolutely summable:

$$\sum_{i=0}^{\infty} |h_i|^2 < \infty$$

The Lag Operator L 

Let's now simplify the notation by introducing 
the lag operator L. The lag operator L is an oper¬ 
ator that acts on an infinite series and produces 
another infinite series shifted one place to the 
left. In other words, the lag operator replaces
every element of a series with the one delayed
by one time lag:

$$L(x_t) = x_{t-1}$$

The n-th power of the lag operator shifts a series
by n places:

$$L^n(x_t) = x_{t-n}$$

Negative powers of the lag operator yield the 
forward operator F, which shifts places to the 
right. The lag operator can be multiplied by a 
scalar and different powers can be added. In 
this way, linear functions of different powers of 
the lag operator can be formed as follows: 

$$A(L) = \sum_{i=1}^{N} a_i L^i$$

Note that if the lag operator is applied to a series 
that starts from a given point, initial conditions 
must be specified. 

Within the domain of stationary series, infi¬ 
nite power series of the lag operator can also 
be formed. In fact, given a stationary series, if 
the coefficients $h_i$ are absolutely summable, the
series

$$\sum_{i=0}^{\infty} h_i L^i x_t$$

is well defined in the sense that it converges
and defines another stationary series. It therefore
makes sense to define the operator:

$$A(L) = \sum_{i=0}^{\infty} h_i L^i$$

Now consider the operator $I - \lambda L$. If $|\lambda| < 1$,
this operator can be inverted and its inverse is
given by the infinite power series

$$(I - \lambda L)^{-1} = \sum_{i=0}^{\infty} \lambda^i L^i$$

as can be seen by multiplying $I - \lambda L$ by the
power series $\sum_{i=0}^{\infty} \lambda^i L^i$:

$$(I - \lambda L) \sum_{i=0}^{\infty} \lambda^i L^i = L^0 = I$$

On the basis of this relationship, it can be 
demonstrated that any operator of the type 

$$A(L) = \sum_{i=1}^{N} a_i L^i$$

can be inverted provided that the solutions of
the equation

$$\sum_{i=1}^{N} a_i z^i = 0$$

have absolute values strictly greater than 1. The
inverse operator is an infinite power series

$$A^{-1}(L) = \sum_{i=0}^{\infty} \lambda_i L^i$$

Given two linear functions of the operator L, it 
is possible to define their product 

$$A(L) = \sum_{i=1}^{M} a_i L^i$$

$$B(L) = \sum_{j=1}^{N} b_j L^j$$

$$P(L) = A(L)B(L) = \sum_{i=1}^{M+N} p_i L^i$$

$$p_i = \sum_{r=1}^{i} a_r b_{i-r}$$

The convolution product of two infinite series 
in the lag operator is defined in a similar way 

$$A(L) = \sum_{i=0}^{\infty} a_i L^i$$

$$B(L) = \sum_{j=0}^{\infty} b_j L^j$$

$$C(L) = A(L) \times B(L) = \sum_{k=0}^{\infty} c_k L^k$$

$$c_k = \sum_{s=0}^{k} a_s b_{k-s}$$
We can define the left-inverse (right-inverse) of
an infinite series as the operator $A^{-1}(L)$ such
that $A^{-1}(L) \times A(L) = I$. The inverse can always
be computed by solving an infinite set of recursive
equations provided that $a_0 \neq 0$. However, the
inverse series will not necessarily be stationary. 
A sufficient condition for stationarity is that the 
coefficients of the inverse series are absolutely 
summable. 
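
The recursive computation of the inverse series can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the helper names `convolve_series` and `invert_series` are hypothetical, the series are truncated to a finite number of terms, and the example inverts $I - \lambda L$ with an assumed value $\lambda = 0.5$.

```python
import numpy as np

def convolve_series(a, b, n_terms):
    """Coefficients c_k = sum_{s=0}^{k} a_s b_{k-s}, truncated to n_terms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.convolve(a, b)[:n_terms]

def invert_series(a, n_terms):
    """First n_terms coefficients of the formal inverse of sum_i a_i L^i."""
    a = np.asarray(a, dtype=float)
    if a[0] == 0:
        raise ValueError("a_0 must be nonzero for the inverse to exist")
    inv = np.zeros(n_terms)
    inv[0] = 1.0 / a[0]
    for k in range(1, n_terms):
        # The coefficient of L^k in inverse(L) * A(L) must vanish for k >= 1.
        acc = sum(a[s] * inv[k - s] for s in range(1, min(k, len(a) - 1) + 1))
        inv[k] = -acc / a[0]
    return inv

lam = 0.5                                   # assumed value with |lam| < 1
inv = invert_series([1.0, -lam], 8)         # inverse of I - lam L
identity = convolve_series([1.0, -lam], inv, 8)
```

Under these assumptions the computed coefficients coincide with the geometric series $\lambda^i$, and the truncated convolution reproduces the identity operator up to the truncation order.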

In general, it is possible to perform on the 
symbolic series 

$$H(L) = \sum_{i=0}^{\infty} h_i L^i$$

the same operations that can be performed on
the series

$$H(z) = \sum_{i=0}^{\infty} h_i z^i$$

with $z$ a complex variable. However, operations
performed on a series of lag operators neither 
assume nor entail convergence properties. In 
fact, one can think of z simply as a symbol. In 
particular, the inverse does not necessarily ex¬ 
hibit absolutely summable coefficients. 

Stationary Univariate 
Moving Average 

Using the lag operator L notation, the infinite 
moving-average representation can be written 
as follows: 

$$x_t = \left( \sum_{i=0}^{\infty} h_i L^i \right) \varepsilon_t + m = H(L)\varepsilon_t + m$$

Consider now the inverse series: 

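$$\Pi(L) = \sum_{i=0}^{\infty} \pi_i L^i, \qquad \Pi(L)H(L) = I$$

If the coefficients $\pi_i$ are absolutely summable,
we can write

$$\varepsilon_t = \Pi(L)x_t = \sum_{i=0}^{\infty} \pi_i x_{t-i}$$

and the series is said to be invertible.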
i =0 


Multivariate Stationary Series 

The concepts of infinite moving-average rep¬ 
resentation and of invertibility defined above 
for univariate series carry over immediately to 
the multivariate case. In fact, it can be demon¬ 
strated that under mild regularity conditions, 
any multivariate stationary causal time series 
admits the following infinite moving-average 
representation: 

$$\mathbf{x}_t = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i} + \mathbf{m}$$

where the $\mathbf{H}_i$ are $n \times n$ matrices, $\boldsymbol{\varepsilon}_t$ is an
n-dimensional, zero-mean, white noise process
with nonsingular variance-covariance matrix
$\boldsymbol{\Omega}$, and $\mathbf{m}$ is an n-vector of constants. The
coefficients $\mathbf{H}_i$ are called Markov coefficients.
This moving-average representation is called
the Wold representation. The Wold representation
states that any series where only the past in¬ 
fluences the present can be represented as an 
infinite moving average of white noise terms. 
Note that, as in the univariate case, the infinite 
moving-average representation can be written 
in more general terms as a sum that extends 
from $-\infty$ to $+\infty$. However, a series of this type
is not suitable for financial modeling as it is not 
causal (that is, the future influences the present). 
Therefore we consider only moving averages 
that extend to past terms. 

Suppose that the Markov coefficients are an 
absolutely summable series: 

$$\sum_{i=0}^{\infty} \|\mathbf{H}_i\| < +\infty$$

where $\|\mathbf{H}\|^2$ indicates the largest eigenvalue of
the matrix $\mathbf{H}\mathbf{H}'$. Under this assumption, it can
be demonstrated that the series is stationary 
and that the (time-invariant) first two moments 
can be computed in the following way: 

$$\operatorname{cov}(\mathbf{x}_t, \mathbf{x}_{t-h}) = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\Omega} \mathbf{H}'_{i-h}$$

$$E[\mathbf{x}_t] = \mathbf{m}$$

with the convention $\mathbf{H}_i = 0$ if $i < 0$. Note that the
assumption that the Markov coefficients are an
absolutely summable series is essential; otherwise
the covariance matrix would not exist. For
instance, if the $\mathbf{H}_i$ were identity matrices, the
variances of the series would become infinite. 
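
The covariance formula can be sketched numerically for a finite list of Markov coefficients. This is only an illustration under stated assumptions: the function name `ma_autocovariance`, the bivariate MA(1) coefficients, and the noise covariance matrix are all hypothetical, and the lag `h` is taken to be nonnegative.

```python
import numpy as np

def ma_autocovariance(H, omega, h):
    """Gamma(h) = sum_i H_i @ omega @ H_{i-h}.T for a finite list H of Markov coefficients."""
    n = omega.shape[0]
    gamma = np.zeros((n, n))
    for i in range(h, len(H)):          # H_{i-h} = 0 whenever i < h
        gamma += H[i] @ omega @ H[i - h].T
    return gamma

# Hypothetical bivariate MA(1): x_t = eps_t + H_1 eps_{t-1}
H = [np.eye(2), np.array([[0.5, 0.1], [0.0, 0.3]])]
omega = np.array([[1.0, 0.2], [0.2, 2.0]])

gamma0 = ma_autocovariance(H, omega, 0)
gamma1 = ma_autocovariance(H, omega, 1)
gamma2 = ma_autocovariance(H, omega, 2)   # beyond the MA order
```

With only two nonzero coefficients the sum is finite, so the covariances exist trivially; the absolute summability condition matters when the list of coefficients is infinite.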

As the second moments are all constants, the 
series is weakly stationary. We can write the 
time-independent autocovariance function of 
the series, which is an n x n matrix whose en¬ 
tries are a function of the lag h, as 

$$\boldsymbol{\Gamma}_x(h) = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\Omega} \mathbf{H}'_{i-h}$$

Under the assumption that the Markov coef¬ 
ficients are an absolutely summable series, we 
can use the lag-operator L representation and 
write the operator 

$$\mathbf{H}(L) = \sum_{i=0}^{\infty} \mathbf{H}_i L^i$$

so that the Wold representation of a series can
be written as

$$\mathbf{x}_t = \mathbf{H}(L)\boldsymbol{\varepsilon}_t + \mathbf{m}$$

The concept of invertibility carries over to 
the multivariate case. A multivariate stationary 
time series is said to be invertible if it can be 
represented in autoregressive form. Invertibil¬ 
ity means that the white noise process can be 
recovered as a function of the series. In order to 
explain the notion of invertible processes, it is 
useful to introduce the generating function of 
the operator H, defined as the following matrix 
power series: 

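$$\mathbf{H}(z) = \sum_{i=0}^{\infty} \mathbf{H}_i z^i$$

It can be demonstrated that, if $\mathbf{H}_0 = \mathbf{I}$, then
$\mathbf{H}(0) = \mathbf{H}_0$ and the power series $\mathbf{H}(z)$ is invertible
in the sense that it is possible to formally derive
the inverse series,

$$\boldsymbol{\Pi}(z) = \sum_{i=0}^{\infty} \boldsymbol{\Pi}_i z^i$$

such that

$$\boldsymbol{\Pi}(z)\mathbf{H}(z) = (\boldsymbol{\Pi} \times \mathbf{H})(z) = \mathbf{I}$$

where the product is intended as a convolution
product. If the coefficients $\boldsymbol{\Pi}_i$ are absolutely
summable, as the process $\mathbf{x}_t$ is assumed to be
stationary, it can be represented in infinite
autoregressive form:

$$\boldsymbol{\Pi}(L)(\mathbf{x}_t - \mathbf{m}) = \boldsymbol{\varepsilon}_t$$

In this case the process $\mathbf{x}_t$ is said to be invertible.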

From the above, it is clear that the infinite 
moving average representation is a more general
linear representation of a stationary time series
than the infinite autoregressive form. A process
that admits both representations is called
invertible.

Nonstationary Series 

Let's now look at nonstationary series. As there 
is no very general model of nonstationary time 
series valid for all nonstationary series, we have 
to restrict somehow the family of admissible 
models. Let's consider a family of linear, moving- 
average, nonstationary models of the following 
type: 

$$\mathbf{x}_t = \sum_{i=0}^{t} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i} + \mathbf{h}(t)\mathbf{z}_{-1}$$

where the $\mathbf{H}_i$ are left unrestricted and do not
necessarily form an absolutely summable series,
$\mathbf{h}(t)$ is deterministic, and $\mathbf{z}_{-1}$ is a random
vector called the initial conditions, which is
supposed to be uncorrelated with the white noise
process. The essential differences of this linear 
model with respect to the Wold representation 
of stationary series are: 

* The presence of a starting point and of initial 
conditions. 

* The absence of restrictions on the coefficients. 

* The index $t$, which restricts the number of
summands. 

The first two moments of a linear process 
are not constant. They can be computed in a 
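way similar to the infinite moving average
case:

$$\operatorname{cov}(\mathbf{x}_t, \mathbf{x}_{t-h}) = \sum_{i=0}^{t} \mathbf{H}_i \boldsymbol{\Omega} \mathbf{H}'_{i-h} + \mathbf{h}(t)\operatorname{var}(\mathbf{z})\mathbf{h}'(t-h)$$

$$E[\mathbf{x}_t] = \mathbf{m}_t = \mathbf{h}(t)E[\mathbf{z}]$$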

Let's now see how a linear process can be 
expressed in autoregressive form. To simplify 
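notation let's introduce the processes $\tilde{\boldsymbol{\varepsilon}}_t$ and
$\tilde{\mathbf{x}}_t$ and the deterministic series $\tilde{\mathbf{h}}(t)$ defined as
follows:

$$\tilde{\boldsymbol{\varepsilon}}_t = \begin{cases} \boldsymbol{\varepsilon}_t & \text{if } t \ge 0 \\ 0 & \text{if } t < 0 \end{cases} \qquad \tilde{\mathbf{x}}_t = \begin{cases} \mathbf{x}_t & \text{if } t \ge 0 \\ 0 & \text{if } t < 0 \end{cases}$$

$$\tilde{\mathbf{h}}(t) = \begin{cases} \mathbf{h}(t) & \text{if } t \ge 0 \\ 0 & \text{if } t < 0 \end{cases}$$

It can be demonstrated that, due to the initial
conditions, a linear process always satisfies the
following autoregressive equation:

$$\boldsymbol{\Pi}(L)\tilde{\mathbf{x}}_t = \tilde{\boldsymbol{\varepsilon}}_t + \boldsymbol{\Pi}(L)\tilde{\mathbf{h}}(t)\mathbf{z}_{-1}$$

A random walk model

$$x_t = x_{t-1} + \varepsilon_t = \varepsilon_t + \sum_{i=1}^{t} \varepsilon_{t-i}$$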

is an example of a linear nonstationary model. 

The above linear model can also represent 
processes that are nearly stationary in the sense 
that they start from initial conditions but then 
converge to a stationary process. A process 
that converges to a stationary process is called 
asymptotically stationary. 
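
Asymptotic stationarity can be illustrated with a simple variance recursion. The sketch below assumes a univariate first-order autoregression $x_t = \phi x_{t-1} + \varepsilon_t$ started from the fixed value $x_0 = 0$; the function name and parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def ar1_variance_path(phi, sigma2, n_steps):
    """var(x_t) for x_t = phi * x_{t-1} + eps_t started from the fixed value x_0 = 0."""
    v = np.zeros(n_steps + 1)
    for t in range(1, n_steps + 1):
        # eps_t is uncorrelated with x_{t-1}, so the variances add.
        v[t] = phi**2 * v[t - 1] + sigma2
    return v

asymptotically_stationary = ar1_variance_path(phi=0.8, sigma2=1.0, n_steps=200)
random_walk = ar1_variance_path(phi=1.0, sigma2=1.0, n_steps=200)
stationary_limit = 1.0 / (1.0 - 0.8**2)
```

With $|\phi| < 1$ the variance starts at zero and converges to the constant value of the corresponding stationary process, while with $\phi = 1$ (the random walk) it grows linearly and never settles.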

We can summarize the previous discussion 
as follows. Under mild regularity conditions, 
any causal stationary series can be represented 
as an infinite moving average of a white noise 
process. If the series can also be represented in 
an autoregressive form, then the series is said to 
be invertible. Nonstationary series do not have 
corresponding general representations. Linear 
models are a broad class of nonstationary mod¬ 
els and of asymptotically stationary models that 
provide the theoretical base for ARMA and 
state-space processes that will be discussed in 
the following sections. 


ARMA REPRESENTATIONS 

The infinite moving average or autoregres¬ 
sive representations of the previous section are 
useful theoretical tools but they cannot be 
applied to estimate processes. One needs a 
parsimonious representation with a finite num¬ 
ber of coefficients. Autoregressive moving average 
(ARMA) models and state-space models pro¬ 
vide such representation; though apparently 
conceptually different, they are statistically 
equivalent. 

Stationary Univariate 
ARMA Models 

Let's start with univariate stationary processes. 
An autoregressive process of order p, AR(p), is
a process of the form:

$$x_t + a_1 x_{t-1} + \dots + a_p x_{t-p} = \varepsilon_t$$

which can be written using the lag operator as

$$A(L)x_t = (1 + a_1 L + \dots + a_p L^p)x_t = x_t + a_1 L x_t + \dots + a_p L^p x_t = \varepsilon_t$$

Not all processes that can be written in autore¬ 
gressive form are stationary. In order to study 
the stationarity of an autoregressive process, 
consider the following polynomial: 

$$A(z) = 1 + a_1 z + \dots + a_p z^p$$

where z is a complex variable.

The equation

$$A(z) = 1 + a_1 z + \dots + a_p z^p = 0$$

is called the inverse characteristic equation. It 
can be demonstrated that if the roots of this 
equation, that is, its solutions, are all strictly 
greater than 1 in modulus (that is, the roots 
are outside the unit circle), then the opera¬ 
tor A(L) is invertible and admits the inverse 
representation: 

$$x_t = A^{-1}(L)\varepsilon_t = \sum_{i=0}^{+\infty} \lambda_i \varepsilon_{t-i}, \quad \text{with } \sum_{i=0}^{+\infty} |\lambda_i| < +\infty$$
In order to avoid possible confusion, note that
the solutions of the inverse characteristic equation
are the reciprocals of the solutions of the
characteristic equation, defined as

$$A(z) = z^p + a_1 z^{p-1} + \dots + a_p = 0$$

Therefore an autoregressive process is invert¬ 
ible with an infinite moving average represen¬ 
tation that only involves positive powers of the 
operator L if the solutions of the characteristic 
equation are all strictly smaller than 1 in abso¬ 
lute value. This is the condition of invertibility 
often stated in the literature. 
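
The relationship between the two equations can be checked numerically. The following sketch uses `numpy.roots`; the helper names and the AR(2) coefficients are hypothetical, chosen so that the roots are easy to verify by hand (the inverse characteristic roots are 5 and 10/3).

```python
import numpy as np

def inverse_characteristic_roots(a):
    """Roots of 1 + a_1 z + ... + a_p z^p, with a = [a_1, ..., a_p]."""
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    # numpy.roots expects the coefficient of the highest power first.
    return np.roots(coeffs[::-1])

def characteristic_roots(a):
    """Roots of z^p + a_1 z^(p-1) + ... + a_p."""
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    return np.roots(coeffs)

def is_invertible(a):
    """True if all inverse characteristic roots lie strictly outside the unit circle."""
    return bool(np.all(np.abs(inverse_characteristic_roots(a)) > 1.0))

# Hypothetical AR(2): x_t - 0.5 x_{t-1} + 0.06 x_{t-2} = eps_t
a = [-0.5, 0.06]
inv_roots = inverse_characteristic_roots(a)
char_roots = characteristic_roots(a)
```

In this example the characteristic roots (0.2 and 0.3) are the reciprocals of the inverse characteristic roots, so the two ways of stating the condition agree.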

Let's now consider finite moving-average 
representations. A process is called a moving 
average process of order q, MA(q), if it admits
the following representation:

$$x_t = (1 + b_1 L + \dots + b_q L^q)\varepsilon_t = \varepsilon_t + b_1 \varepsilon_{t-1} + \dots + b_q \varepsilon_{t-q}$$

In a way similar to the autoregressive case, if 
the roots of the equation 

$$B(z) = 1 + b_1 z + \dots + b_q z^q = 0$$

are all strictly greater than 1 in modulus, then
the MA(q) process is invertible and, therefore,
admits the infinite autoregressive representation:

$$\varepsilon_t = B^{-1}(L)x_t = \sum_{i=0}^{+\infty} \pi_i x_{t-i}, \quad \text{with } \sum_{i=0}^{+\infty} |\pi_i| < +\infty$$

As in the previous case, if one considers the 
characteristic equation, 

$$B(z) = z^q + b_1 z^{q-1} + \dots + b_q = 0$$

then the MA(q) process admits a causal autoregressive
representation if the roots of the characteristic
equation are strictly smaller than 1 in
modulus.

Let's now consider, more in general, an 
ARMA process of order p,q. We say that a sta¬ 
tionary process admits a minimal ARMA(p, q) 
representation if it can be written as 

$$x_t + a_1 x_{t-1} + \dots + a_p x_{t-p} = \varepsilon_t + b_1 \varepsilon_{t-1} + \dots + b_q \varepsilon_{t-q}$$

or equivalently in terms of the lag operator

$$A(L)x_t = B(L)\varepsilon_t$$

where $\varepsilon_t$ is a serially uncorrelated white noise
with nonzero variance, $a_0 = b_0 = 1$, $a_p \neq 0$, $b_q \neq 0$,
the polynomials A and B have roots strictly
greater than 1 in modulus and do not have any
root in common.

Generalizing the reasoning in the pure MA or 
AR case, it can be demonstrated that a generic 
process that admits the ARMA(p,q) representation
$A(L)x_t = B(L)\varepsilon_t$ is stationary if both polynomials
A and B have roots strictly different from
1. In addition, if all the roots of the polynomial
A(z) are strictly greater than 1 in modulus, then
the ARMA(p,q) process can be expressed as a
moving average process:

$$x_t = \frac{B(L)}{A(L)} \varepsilon_t$$


Conversely, if all the roots of the polynomial
B(z) are strictly greater than 1 in modulus, then the
ARMA(p,q) process can be expressed as an
autoregressive process:

$$\varepsilon_t = \frac{A(L)}{B(L)} x_t$$


Note that in the above discussions every pro¬ 
cess was centered—that is, it had zero constant 
mean. As we were considering stationary processes,
this condition is not restrictive, as any
nonzero mean can be subtracted.

Note also that stationary ARMA processes extend
through the entire time axis. An ARMA
process that begins from some initial conditions
at starting time t = 0 is not stationary
even if its roots are strictly outside the unit cir¬ 
cle. It can be demonstrated, however, that such 
a process is asymptotically stationary. 
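
The moving-average form $x_t = [B(L)/A(L)]\varepsilon_t$ of a stationary ARMA process can be sketched by matching coefficients in $A(L)\Psi(L) = B(L)$. The function name `arma_to_ma` and the ARMA(1,1) coefficients are hypothetical; the sign convention follows the text, with $A(L) = 1 + a_1 L + \dots$ and $B(L) = 1 + b_1 L + \dots$.

```python
import numpy as np

def arma_to_ma(a, b, n_terms):
    """First n_terms weights of B(L)/A(L), given a = [a_1..a_p] and b = [b_1..b_q]."""
    a = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    b = np.concatenate(([1.0], np.asarray(b, dtype=float)))
    psi = np.zeros(n_terms)
    for k in range(n_terms):
        b_k = b[k] if k < len(b) else 0.0
        # Match the coefficient of L^k on both sides of A(L) * Psi(L) = B(L).
        acc = sum(a[s] * psi[k - s] for s in range(1, min(k, len(a) - 1) + 1))
        psi[k] = b_k - acc
    return psi

# Hypothetical ARMA(1,1): x_t - 0.5 x_{t-1} = eps_t + 0.4 eps_{t-1}
psi = arma_to_ma(a=[-0.5], b=[0.4], n_terms=6)
```

For this example the weights decay geometrically after the first lag, which is consistent with the root of A(z) lying outside the unit circle.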


Nonstationary Univariate 
ARMA Models 

So far we have considered only stationary pro¬ 
cesses. However, ARMA equations can also rep¬ 
resent nonstationary processes if some of the 
roots of the polynomial A(z) are equal to 1 in
modulus. A process defined by the equation

$$A(L)x_t = B(L)\varepsilon_t$$

is called an autoregressive integrated moving-
average (ARIMA) process if at least one of the
roots of the polynomial A is equal to 1 in modulus.
Suppose that $\lambda$ is a root with multiplicity
d. In this case the ARMA representation can be
written as

$$A'(L)(I - \lambda L)^d x_t = B(L)\varepsilon_t$$

$$A(L) = A'(L)(I - \lambda L)^d$$

However this formulation is not satisfactory 
as the operator A is not invertible if initial conditions
are not provided; it is therefore preferable
to offer a more rigorous definition, which
includes initial conditions. Therefore, we give 
the following definition of nonstationary inte¬ 
grated ARMA processes. 

A process $x_t$ defined for $t \ge 0$ is called
an autoregressive integrated moving-average process,
ARIMA(p,d,q), if it satisfies a relationship
of the type

$$A(L)(I - \lambda L)^d x_t = B(L)\varepsilon_t$$

where:

• The polynomials A(L) and B(L) have roots
strictly greater than 1 in modulus.

• $\varepsilon_t$ is a white noise process defined for $t \ge 0$.

• A set of initial conditions $(x_{-1}, \dots, x_{-p-d}, \varepsilon_{-1}, \dots, \varepsilon_{-q})$
independent from the white noise is given.

Later in this entry we discuss the interpre¬ 
tation and further properties of the ARIMA 
condition. 

Stationary Multivariate 
ARMA Models 

Let's now move on to consider stationary multi¬ 
variate processes. A stationary process that ad¬ 
mits an infinite moving-average representation 


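of the type

$$\mathbf{x}_t = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i}$$

where $\boldsymbol{\varepsilon}_t$ is an n-dimensional, zero-mean,
white-noise process with nonsingular variance-
covariance matrix $\boldsymbol{\Omega}$, is called an autoregressive
moving-average, ARMA(p,q), model if it satisfies
a difference equation of the type

$$\mathbf{A}(L)\mathbf{x}_t = \mathbf{B}(L)\boldsymbol{\varepsilon}_t$$

where A and B are matrix polynomials in the
lag operator L of order p and q respectively:

$$\mathbf{A}(L) = \sum_{i=0}^{p} \mathbf{A}_i L^i, \quad \mathbf{A}_0 = \mathbf{I}, \quad \mathbf{A}_p \neq 0$$

$$\mathbf{B}(L) = \sum_{j=0}^{q} \mathbf{B}_j L^j, \quad \mathbf{B}_0 = \mathbf{I}, \quad \mathbf{B}_q \neq 0$$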

If q = 0, the process is purely autoregressive of
order p; if p = 0, the process is purely a moving
average of order q. Rearranging the terms of the
difference equation, it is clear that an ARMA
process is a process where the i-th component
of the process at time t, $x_{i,t}$, is a linear function of
all the components at different lags plus a finite 
moving average of white noise terms. 

It can be demonstrated that the ARMA rep¬ 
resentation is not unique. The nonuniqueness 
of the ARMA representation is due to differ¬ 
ent reasons, such as the existence of a com¬ 
mon polynomial factor in the autoregressive 
and the moving-average part. It entails that 
the same process can be represented by mod¬ 
els with different pairs p,q. For this reason, one 
would need to determine at least a minimal 
representation, that is, an ARMA(p,q) representation
such that any other ARMA(p',q') representation
would have $p' \ge p$, $q' \ge q$. With the
exception of the univariate case, these problems 
are very difficult from a mathematical point of 
view and we will not examine them in detail. 

Let's now explore what restrictions on the 
polynomials A(L) and B(L) ensure that the rela¬ 
tive ARMA process is stationary. Generalizing 
the univariate case, the mathematical analysis 
of stationarity is based on the analysis of the
polynomial det[A(z)] obtained by formally re¬ 
placing the lag operator L with a complex vari¬ 
able z in the matrix A(L) whose entries are finite 
polynomials in L. 

It can be demonstrated that if the complex 
roots of the polynomial det[A(z)], that is, the 
solutions of the algebraic equation det[A(z)] = 
0, which are in general complex numbers, all lie 
outside the unit circle, that is, their modulus is 
strictly greater than one, then the process that 
satisfies the ARMA conditions, 

$$\mathbf{A}(L)\mathbf{x}_t = \mathbf{B}(L)\boldsymbol{\varepsilon}_t$$

is stationary. As in the univariate case, if one
considers the equations in 1/z, the same
reasoning applies but with roots strictly inside 
the unit circle. 
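
One common way to check the root condition numerically, sketched here as an assumption rather than as the method used in the text, is to form the companion matrix of the autoregressive part: its nonzero eigenvalues are the reciprocals of the roots of det[A(z)] = 0, so all roots lie outside the unit circle exactly when all eigenvalues lie strictly inside it. The helper names and the example matrices are hypothetical.

```python
import numpy as np

def companion_matrix(A_list):
    """Companion matrix for A(L) = I + A_1 L + ... + A_p L^p, given A_list = [A_1, ..., A_p]."""
    p = len(A_list)
    n = A_list[0].shape[0]
    # x_t = -A_1 x_{t-1} - ... - A_p x_{t-p} + noise, hence the sign change.
    top = np.hstack([-A for A in A_list])
    if p == 1:
        return top
    bottom = np.hstack([np.eye(n * (p - 1)), np.zeros((n * (p - 1), n))])
    return np.vstack([top, bottom])

def det_roots_outside_unit_circle(A_list):
    """True if every eigenvalue of the companion matrix has modulus below 1."""
    eigenvalues = np.linalg.eigvals(companion_matrix(A_list))
    return bool(np.all(np.abs(eigenvalues) < 1.0))

stable = [np.array([[-0.5, 0.0], [0.0, -0.3]])]      # hypothetical stationary case
unit_root = [-np.eye(2)]                             # x_t = x_{t-1} + eps_t
second_order = [np.array([[-0.5]]), np.array([[0.06]])]
```

The last example is the scalar polynomial $1 - 0.5z + 0.06z^2$, whose roots 5 and 10/3 lie outside the unit circle.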

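A stationary ARMA(p,q) process is an
autocorrelated process. Its time-independent
autocorrelation function satisfies a set of linear
difference equations. Consider an ARMA(p,q)
process that satisfies the following equation:

$$\mathbf{A}_0\mathbf{x}_t + \mathbf{A}_1\mathbf{x}_{t-1} + \dots + \mathbf{A}_p\mathbf{x}_{t-p} = \mathbf{B}_0\boldsymbol{\varepsilon}_t + \mathbf{B}_1\boldsymbol{\varepsilon}_{t-1} + \dots + \mathbf{B}_q\boldsymbol{\varepsilon}_{t-q}$$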

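where $\mathbf{A}_0 = \mathbf{I}$. By expanding the expression for
the autocovariance function, it can be demonstrated
that the autocovariance function satisfies
the following set of linear difference
equations:

$$\mathbf{A}_0\boldsymbol{\Gamma}_h + \mathbf{A}_1\boldsymbol{\Gamma}_{h-1} + \dots + \mathbf{A}_p\boldsymbol{\Gamma}_{h-p} = \begin{cases} 0 & \text{if } h > q \\ \sum_{j=0}^{q-h} \mathbf{B}_{j+h}\boldsymbol{\Omega}\mathbf{H}'_j & \text{if } 0 \le h \le q \end{cases}$$

where $\boldsymbol{\Omega}$ and $\mathbf{H}_i$ are, respectively, the covariance
matrix and the Markov coefficients of the
process in its infinite moving-average representation:

$$\mathbf{x}_t = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i}$$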

From the above representation, it is clear that 
if the process is purely MA, that is, if p = 0, 


then the autocovariance function vanishes for 
lag h > q. 

It is also possible to demonstrate the converse 
of this theorem. If a linear stationary process 
admits an autocovariance function that satisfies 
the following equations, 

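$$\mathbf{A}_0\boldsymbol{\Gamma}_h + \mathbf{A}_1\boldsymbol{\Gamma}_{h-1} + \dots + \mathbf{A}_p\boldsymbol{\Gamma}_{h-p} = 0 \quad \text{if } h > q$$

then the process admits an ARMA(p,q) representation.
In particular, a stationary process is a
purely finite moving-average process MA(q), if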
and only if its autocovariance functions vanish 
for h > q, where q is an integer. 


Nonstationary Multivariate 
ARMA Models 

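Let's now consider nonstationary series. Consider
a series defined for $t \ge 0$ that satisfies the
following set of difference equations:

$$\mathbf{A}_0\mathbf{x}_t + \mathbf{A}_1\mathbf{x}_{t-1} + \dots + \mathbf{A}_p\mathbf{x}_{t-p} = \mathbf{B}_0\boldsymbol{\varepsilon}_t + \mathbf{B}_1\boldsymbol{\varepsilon}_{t-1} + \dots + \mathbf{B}_q\boldsymbol{\varepsilon}_{t-q}$$

where, as in the stationary case, $\boldsymbol{\varepsilon}_t$ is an
n-dimensional zero-mean, white noise process
with nonsingular variance-covariance matrix
$\boldsymbol{\Omega}$, $\mathbf{A}_0 = \mathbf{I}$, $\mathbf{B}_0 = \mathbf{I}$, $\mathbf{A}_p \neq 0$, $\mathbf{B}_q \neq 0$. Suppose,
in addition, that initial conditions
$(\mathbf{x}_{-1}, \dots, \mathbf{x}_{-p}, \boldsymbol{\varepsilon}_{-1}, \dots, \boldsymbol{\varepsilon}_{-q})$ are given. Under these conditions,
we say that the process $\mathbf{x}_t$, which is well defined,
admits an ARMA representation.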

A process $\mathbf{x}_t$ is said to admit an ARIMA representation
if, in addition to the above, it satisfies
the following two conditions: (1) det[B(z)] has
all its roots strictly outside of the unit circle, and
(2) det[A(z)] has all its roots on or outside the unit
circle, with at least one root equal to 1. In other
words, an ARIMA process is an ARMA process 
that satisfies some additional conditions. Later 
in this entry we will clarify the meaning of in¬ 
tegrated processes. 
Markov Coefficients and
ARMA Models 

For the theoretical analysis of ARMA processes, 
it is useful to state what conditions on the 
Markov coefficients ensure that the process 
admits an ARMA representation. Consider a 
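process $\mathbf{x}_t$, stationary or not, which admits a
moving-average representation either as

$$\mathbf{x}_t = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i}$$

or as a linear model:

$$\mathbf{x}_t = \sum_{i=0}^{t} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i} + \mathbf{h}(t)\mathbf{z}$$

The process $\mathbf{x}_t$ admits an ARMA representation
if and only if there is an integer q and a set of
matrices $\mathbf{A}_i$, $i = 0, \dots, p$, such that the Markov
coefficients $\mathbf{H}_i$ satisfy the following linear difference
equation starting from q:

$$\sum_{j=0}^{p} \mathbf{A}_j \mathbf{H}_{l-j} = 0, \quad l > q$$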

Therefore, any ARMA process admits an 
infinite moving-average representation whose 
Markov coefficients satisfy a linear differ¬ 
ence equation starting from a certain point. 
Conversely, any such linear infinite moving- 
average representation can be expressed par¬ 
simoniously in terms of an ARMA process. 

Hankel Matrices and ARMA Models 

For the theoretical analysis of ARMA processes 
it is also useful to restate the above conditions in 
terms of infinite Hankel matrices. (A Hankel
matrix is a matrix in which the elements along
each antidiagonal are the same.) It can be demonstrated
that a process, stationary or not, which admits
either the infinite moving average representation

$$\mathbf{x}_t = \sum_{i=0}^{\infty} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i}$$

or a linear moving average model

$$\mathbf{x}_t = \sum_{i=0}^{t} \mathbf{H}_i \boldsymbol{\varepsilon}_{t-i} + \mathbf{h}(t)\mathbf{z}$$

also admits an ARMA representation if and 
only if the Hankel matrix formed with the 
sequence of its Markov coefficients has finite 
rank or, equivalently, a finite column rank or 
row rank. 
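
The finite-rank property can be illustrated on truncated Hankel matrices. This sketch makes several assumptions that are not in the text: the series are univariate, the Hankel matrix is built with entry (i, j) equal to $h_{i+j}$ starting from $h_0$, only a finite 10 x 10 block is examined, and the parameter values $\phi = 0.5$, $\theta = 0.4$ are hypothetical.

```python
import numpy as np

def hankel_from_coefficients(h, size):
    """size x size Hankel matrix whose (i, j) entry is h[i + j]."""
    h = np.asarray(h, dtype=float)
    idx = np.arange(size)[:, None] + np.arange(size)[None, :]
    return h[idx]

n = 10
phi, theta = 0.5, 0.4

# Markov coefficients of x_t = phi x_{t-1} + eps_t: h_i = phi^i.
ar1 = phi ** np.arange(2 * n)

# Markov coefficients of x_t = phi x_{t-1} + eps_t + theta eps_{t-1}:
# h_0 = 1 and h_i = (phi + theta) phi^(i-1) for i >= 1.
arma11 = np.concatenate(([1.0], (phi + theta) * phi ** np.arange(2 * n - 1)))

rank_ar1 = np.linalg.matrix_rank(hankel_from_coefficients(ar1, n))
rank_arma11 = np.linalg.matrix_rank(hankel_from_coefficients(arma11, n))
```

In both cases the numerical rank stays small and does not grow with the size of the truncated block, which is the behavior the finite-rank condition describes.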

INTEGRATED SERIES 
AND TRENDS 

This section introduces the fundamental no¬ 
tions of trend stationary series, difference sta¬ 
tionary series, and integrated series. Consider a 
one-dimensional time series. A trend stationary 
series is a series formed by a deterministic trend 
plus a stationary process. It can be written as 

$$X_t = f(t) + \varepsilon(t)$$

A trend stationary process can be transformed 
into a stationary process by subtracting the 
trend. Removing the deterministic trend entails 
that the deterministic trend is known. A trend 
stationary series is an example of an adjustment 
model. 

Consider now a time series $X_t$. The operation
of differencing a series consists of forming
a new series $Y_t = \Delta X_t = X_t - X_{t-1}$. The
operation of differencing can be repeated an
arbitrary number of times. For instance, differencing
the series $X_t$ twice yields the following
series:

$$Z_t = \Delta Y_t = \Delta(\Delta X_t) = (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$$

Differencing d times can be written in terms of the lag
operator as

$$\Delta^d X_t = (1 - L)^d X_t$$

A difference stationary series is a series that 
is transformed into a stationary series by dif¬ 
ferencing. A difference stationary series can be 
written as

$$\Delta X_t = \mu + \varepsilon(t)$$

$$X_t = X_{t-1} + \mu + \varepsilon(t)$$

where $\varepsilon(t)$ is a zero-mean stationary process and
$\mu$ is a constant. A trend stationary series with a
linear trend is also difference stationary, if spacings
are regular. The opposite is not generally
true. A time series is said to be integrated of 
order n if it can be transformed into a stationary 
series by differencing n times. 
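
The differencing operator is easy to try out numerically. The sketch below uses `numpy.diff` on a hypothetical deterministic series with a quadratic trend; the helper `difference_weights` is an illustrative name for the binomial expansion of $(1 - L)^d$.

```python
import numpy as np
from math import comb

def difference_weights(d):
    """Coefficients of (1 - L)^d applied to X_t, X_{t-1}, ..., X_{t-d}."""
    return np.array([(-1) ** k * comb(d, k) for k in range(d + 1)], dtype=float)

t = np.arange(10, dtype=float)
x = 3.0 + 2.0 * t + t**2               # hypothetical series with a quadratic trend

first = np.diff(x, n=1)                # X_t - X_{t-1}
second = np.diff(x, n=2)               # X_t - 2 X_{t-1} + X_{t-2}
manual_second = x[2:] - 2.0 * x[1:-1] + x[:-2]
```

Differencing once leaves a linear trend, while differencing twice reduces the quadratic trend to a constant, which matches the expansion of $(1 - L)^2$ with weights 1, -2, 1.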

Note that the concept of integrated series as 
defined above entails that a series extends on 
the entire time axis. If a series starts from a set 
of initial conditions, the difference sequence can 
only be asymptotically stationary. 

There are a number of obvious differences be¬ 
tween trend stationary and difference station¬ 
ary series. A trend stationary series experiences 
stationary fluctuation, with constant variance, 
around an arbitrary trend. A difference station¬ 
ary series meanders arbitrarily far from a linear 
trend, producing fluctuations of growing vari¬ 
ance. The simplest example of difference sta¬ 
tionary series is the random walk. 
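
The contrast in variance behavior can be illustrated with a small simulation. This is a sketch under stated assumptions: Gaussian IID innovations with unit variance, a hypothetical drift of 0.1, and a fixed random seed; the cross-sectional variance over simulated paths stands in for the theoretical variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, mu = 5000, 100, 0.1
eps = rng.standard_normal((n_paths, n_steps))
t = np.arange(1, n_steps + 1)

# Trend stationary: X_t = mu * t + eps_t
trend_stationary = mu * t + eps

# Difference stationary (random walk with drift): X_t = X_{t-1} + mu + eps_t
difference_stationary = mu * t + np.cumsum(eps, axis=1)

var_ts = trend_stationary.var(axis=0)        # roughly constant over time
var_ds = difference_stationary.var(axis=0)   # grows roughly linearly with t
```

Both series share the same linear trend, but only the difference stationary one accumulates its innovations, so its variance at step 100 is roughly ten times its variance at step 10.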

An integrated series is characterized by a 
stochastic trend. In fact, a difference stationary 
series can be written as 


$$X_t = \mu t + \sum_{s=0}^{t-1} \varepsilon(s) + \varepsilon(t)$$

The difference $X_t - X_t^*$ between the value of
a process at time t and the best affine prediction
at time t - 1 is called the innovation of the
process. In the above linear equation, the stationary
process $\varepsilon(t)$ is the innovation process.
A key aspect of integrated processes is that innovations
$\varepsilon(t)$ never decay but keep on
accumulating. In a trend stationary process, on the
other hand, past innovations disappear at every 
new step. 

These considerations carry over immediately 
in a multidimensional environment. Multi¬ 
dimensional trend stationary series will ex¬ 
hibit multiple trends, in principle one for 


each component. Multidimensional difference¬ 
stationary series will yield a stationary process 
after differencing. 

Let's now see how these concepts fit into 
the ARMA framework, starting with univari¬ 
ate ARMA model. Recall that an ARIMA pro¬ 
cess is defined as an ARMA process in which 
the polynomial B has all roots outside the unit 
circle while the polynomial A has one or more 
roots equal to 1. In the latter case the process 
can be written as 

$$A'(L)\Delta^d x_t = B(L)\varepsilon_t$$

$$A(L) = (1 - L)^d A'(L)$$

and we say that the process is integrated of order
d. If initial conditions are supplied, the
process can be inverted and the difference sequence
is asymptotically stationary. 

The notion of integrated processes carries 
over naturally in the multivariate case but with 
a subtle difference. Recall from earlier discus¬ 
sion in this entry that an ARIMA model is an 
ARMA model: 

$$\mathbf{A}(L)\mathbf{x}_t = \mathbf{B}(L)\boldsymbol{\varepsilon}_t$$

which satisfies two additional conditions: (1)
det[B(z)] has all its roots strictly outside of the
unit circle, and (2) det[A(z)] has all its roots on or
outside the unit circle, with at least one root
equal to 1.

Now suppose that, after differencing d times, 
the multivariate series $\Delta^d \mathbf{x}_t$ can be represented
as follows:

$$\mathbf{A}'(L)\mathbf{x}_t = \mathbf{B}'(L)\boldsymbol{\varepsilon}_t, \quad \text{with } \mathbf{A}'(L) = \mathbf{A}(L)\Delta^d$$

In this case, if (1) B'(z) is of order q and det[B'(z)]
has all its roots strictly outside of the unit circle
and (2) A'(z) is of order p and det[A'(z)] has all
its roots outside the unit circle, then the process
is called ARIMA(p,d,q). Not all ARIMA models
can be put in this framework as different
components might have a different order of in¬ 
tegration. 

Note that in an ARIMA (p,d,q) model each 
component series of the multivariate model is 
individually integrated. A multivariate series is 
integrated of order d if every component series
is integrated of order d. 

Note also that ARIMA processes are not in¬ 
vertible as infinite moving averages, but as 
discussed, they can be inverted in terms of 
a generic linear moving-average model with 
stochastic initial conditions. In addition, the 
process in the d-differences is asymptotically 
stationary.

In both trend stationary and difference sta¬ 
tionary processes, innovations can be serially 
autocorrelated. In the ARMA representations 
discussed in the previous section, innovations 
are serially uncorrelated white noise as all the 
autocorrelations are assumed to be modeled in 
the ARMA model. If there is residual autocorre¬ 
lation, the ARMA or ARIMA model is somehow 
misspecified. 

The notion of an integrated process is essen¬ 
tially linear. A process is integrated if station¬ 
ary innovations keep on adding indefinitely. 
Note that innovations could, however, cumu¬ 
late in ways other than addition, producing 
essentially nonlinear processes. In ARCH and 
GARCH processes for instance, innovations do 
not simply add to past innovations. 

The behavior of integrated and nonintegrated 
time series is quite different and the estima¬ 
tion procedures are different as well. It is 
therefore important to ascertain if a series is 
integrated or not. Often a preliminary analysis 
to ascertain integratedness suggests what type 
of model should be used. 

A number of statistical tests to ascertain if a 
univariate series is integrated are available. Per¬ 
haps the most widely used and known are the 
Dickey-Fuller (DF) and the Augmented Dickey- 
Fuller (ADF) tests. The DF test assumes as a null 
hypothesis that the series is integrated of order 
1 with uncorrelated innovations. Under this as¬ 
sumption, the series can be written as a random 
walk in the following form: 

$$X_{t+1} = \rho X_t + b + \varepsilon_t$$

$$\rho = 1$$

$$\varepsilon_t \ \text{IID}$$

where IID denotes an independent and identically
distributed sequence.

In a sample generated by a model of this
type, the value of $\rho$ estimated on the sample
is stochastic. Estimation can be performed
with the ordinary least squares (OLS) method.
Dickey and Fuller determined the theoretical
distribution of $\rho$ and computed the critical
values of $\rho$ that correspond to different
confidence intervals. The theoretical distribution of
$\rho$ is determined by computing a functional of the
Brownian motion.

Given a sample of a series, for instance a series
of log prices, application of the DF test entails
computing the autoregressive parameter $\rho$ on
the given sample and comparing it with the
known critical values for different confidence
intervals. The strict hypothesis of random walk
is too strong for most econometric applications. 
The DF test was extended to cover the case of 
correlated residuals that are modeled as a linear 
model. In the latter case, the DF test is called 
the Augmented Dickey-Fuller or ADF test. The 
Phillips and Perron test is the DF test in the 
general case of autocorrelated residuals. 
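
The estimation step of the DF test can be sketched in a few lines. The following is a minimal illustration, not a full test: it simulates one random walk and one stationary series (the sample size, seed, and unit-variance Gaussian innovations are assumptions made for the example), estimates $\rho$ by OLS, and forms the statistic $n(\hat{\rho} - 1)$ that would then be compared with tabulated Dickey-Fuller critical values, which are not reproduced here.

```python
import random

def ols_ar1(x):
    """OLS fit of x[t+1] = rho * x[t] + b + e[t]; returns (rho_hat, b_hat)."""
    lagged, lead = x[:-1], x[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_lead = sum(lead) / n
    sxx = sum((u - mean_lag) ** 2 for u in lagged)
    sxy = sum((u - mean_lag) * (v - mean_lead) for u, v in zip(lagged, lead))
    rho_hat = sxy / sxx
    b_hat = mean_lead - rho_hat * mean_lag
    return rho_hat, b_hat

def simulate_ar1(rho, b, n, seed):
    """Simulate n steps of x[t+1] = rho * x[t] + b + e[t] with Gaussian e[t]."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(rho * x[-1] + b + rng.gauss(0.0, 1.0))
    return x

n_obs = 2000  # illustrative sample size
walk = simulate_ar1(rho=1.0, b=0.0, n=n_obs, seed=1)        # integrated series
stationary = simulate_ar1(rho=0.5, b=0.0, n=n_obs, seed=1)  # nonintegrated series

rho_walk, _ = ols_ar1(walk)
rho_stat, _ = ols_ar1(stationary)

# Statistic to be compared with tabulated Dickey-Fuller critical values.
df_stat_walk = n_obs * (rho_walk - 1.0)
df_stat_stat = n_obs * (rho_stat - 1.0)
```

For the random walk the estimate of $\rho$ stays close to one, while for the stationary series it stays close to the true value of 0.5, so the statistic is strongly negative.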

APPENDIX 

We will begin with several concepts from prob¬ 
ability theory. 

Stochastic Processes 

When it is necessary to emphasize the dependence
of the random variable on both time
$t$ and the element $\omega$, a stochastic process is
explicitly written as a function of two variables:
$X = X(t,\omega)$. Given $\omega$, the function $X = X_t(\omega)$ is
a function of time that is referred to as the path
of the stochastic process.

The variable $X$ might be a single random
variable or a multidimensional random vector.
A stochastic process is therefore a function
$X = X(t,\omega)$ from the product space $[0,T] \times \Omega$ into the
$n$-dimensional real space $\mathbb{R}^n$. Because to each
$\omega$ corresponds a time path of the process (in
general formed by a set of functions
$X = X_t(\omega)$), it is possible to identify the space $\Omega$
with a subset of the real functions defined over
an interval $[0,T]$.

Let's now discuss how to represent a stochastic
process $X = X(t,\omega)$ and the conditions of
identity of two stochastic processes. As a stochastic
process is a function of two variables, we can
define equality as pointwise identity for each
couple $(t,\omega)$. However, as processes are defined
over probability spaces, pointwise identity is
seldom used. It is more fruitful to define equality
modulo sets of measure zero or equality with
respect to probability distributions. In general,
two random variables $X$, $Y$ will be considered
equal if the equality $X(\omega) = Y(\omega)$ holds for every
$\omega$ with the exception of a set of probability
zero. In this case, it is said that the equality holds
almost everywhere (denoted a.e.).

A rather general (but not complete) representation
is given by the finite-dimensional probability
distributions. Given any set of indices
$t_1, \ldots, t_m$, consider the distributions

$$\mu_{t_1,\ldots,t_m}(H) = P[(X_{t_1}, \ldots, X_{t_m}) \in H], \qquad H \in \mathcal{B}^n$$

These probability measures are, for any choice
of the $t_i$, the finite-dimensional joint probabilities
of the process. They determine many, but
not all, properties of a stochastic process. For
example, the finite-dimensional distributions of
a Brownian motion do not determine whether
or not the process paths are continuous.

In general, the various concepts of equality 
between stochastic processes can be described 
as follows: 

• Property 1. Two stochastic processes are 
weakly equivalent if they have the same 
finite-dimensional distributions. This is the 
weakest form of equality. 

• Property 2. The process $X = X(t,\omega)$ is said to
be equivalent or to be a modification of the
process $Y = Y(t,\omega)$ if, for all $t$,

$$P(X_t = Y_t) = 1$$

• Property 3. The process $X = X(t,\omega)$ is said to
be strongly equivalent to or indistinguishable
from the process $Y = Y(t,\omega)$ if

$$P(X_t = Y_t, \ \text{for all } t) = 1$$

Property 3 implies Property 2, which in turn
implies Property 1. Implications do not hold in
the opposite direction. Two processes having
the same finite-dimensional distributions might
have completely different paths. However, it is
possible to demonstrate that if one assumes that
paths are continuous functions of time, Properties 2
and 3 become equivalent.

Information Structures 

Let's now turn our attention to the question of 
time. We must introduce an appropriate repre¬ 
sentation of time as well as rules that describe 
the evolution of information, that is, informa¬ 
tion propagation, over time. The concepts of 
information and information propagation are 
fundamental in economics and finance theory. 

The concept of information in finance is dif¬ 
ferent from both the intuitive notion of infor¬ 
mation and that of information theory in which 
information is a quantitative measure related 
to the a priori probability of messages. In our 
context, information means the (progressive) 
revelation of the set of events to which the 
current state of the economy belongs. Though 
somewhat technical, this concept of informa¬ 
tion sheds light on the probabilistic structure 
of finance theory. The point is the following. 
Assets are represented by stochastic processes, 
that is, time-dependent random variables. But 
the probabilistic states on which these random 
variables are defined represent entire histories 
of the economy. To embed time into the prob¬ 
abilistic structure of states in a coherent way 
calls for information structures and filtrations 
(a concept we explain next). 

It is assumed that the economy is in one 
of many possible states and that there is un¬ 
certainty on the state that has been realized.
Consider a time period of the economy. At the
beginning of the period, there is complete un¬ 
certainty on the state of the economy (i.e., there 
is complete uncertainty on what path the econ¬ 
omy will take). Different events have different 
probabilities, but there is no certainty. As time 
passes, uncertainty is reduced as the number 
of states to which the economy can belong is 
progressively reduced. Intuitively, revelation of 
information means the progressive reduction 
of the number of possible states; at the end of 
the period, the realized state is fully revealed. 
In continuous time and continuous states, the 
number of events is infinite at each instant. Thus 
its cardinality remains the same. We cannot 
properly say that the number of events shrinks. 
A more formal definition is required. 

The progressive reduction of the set of pos¬ 
sible states is formally expressed in the con¬ 
cepts of information structure and filtration. 
Let's start with information structures. Information
structures apply only to discrete probabilities
defined over a discrete set of states. At the
initial instant $T_0$, there is complete uncertainty
on the state of the economy; the actual state is
known only to belong to the largest possible
event (that is, the entire space $\Omega$). At the following
instant $T_1$, assuming that instants are discrete,
the states are separated into a partition, a
partition being a denumerable class of disjoint
sets whose union is the space itself. The actual
state belongs to one of the sets of the partition.
The revelation of information consists in ruling
out all sets but one. For all the states of each
set of the partition, and only for these, random
variables assume the same values.

Suppose, to exemplify, that only two assets
exist in the economy and that each can assume
only two possible prices and pay only two possible
cash flows. At every moment there are 16
possible price-cash flow combinations. We can
thus see that at the moment $T_1$ all the states are
partitioned into 16 sets, each containing only
one state. Each set of the partition includes all
the states that have a given set of prices and cash
distributions at the moment $T_1$. The same reasoning
can be applied to each instant. The evolution of
information can thus be represented by a tree
structure in which every path represents a state
and every point a partition. Obviously the tree
structure does not have to develop as symmetrically
as in the above example; the tree might
have a very generic structure of branches.
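
The tree of progressively finer partitions can be made concrete with a small enumeration. The sketch below is only an illustration under stated assumptions: it takes two instants after $T_0$, labels the two possible prices and cash flows of each asset as "lo" and "hi", and treats a state as a full history of observed outcomes. The names `outcomes`, `states`, `partition_at`, and `refines` are hypothetical helpers introduced for the example.

```python
from itertools import product

# One observable outcome at an instant: (price_1, cash_1, price_2, cash_2),
# each taking one of two values, hence 2**4 = 16 combinations.
outcomes = list(product(("lo", "hi"), repeat=4))

# A state is a full history of outcomes over the instants T_1 and T_2.
states = list(product(outcomes, repeat=2))

def partition_at(t):
    """Group states that share the same observed history up to instant T_t."""
    blocks = {}
    for s in states:
        blocks.setdefault(s[:t], set()).add(s)
    return list(blocks.values())

def refines(fine, coarse):
    """True if every set of `fine` is contained in some set of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)

p0, p1, p2 = partition_at(0), partition_at(1), partition_at(2)
sizes = (len(p0), len(p1), len(p2))
```

At $T_0$ the only set is the whole space, at $T_1$ there are 16 sets, and at $T_2$ every state is isolated; each partition refines the previous one.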

Filtration 

The concept of information structure based 
on partitions provides a rather intuitive rep¬ 
resentation of the propagation of information 
through a tree of progressively finer parti¬ 
tions. However, this structure is not sufficient 
to describe the propagation of information in 
a general probabilistic context. In fact, the set 
of possible events is much richer than the 
set of partitions. It is therefore necessary to 
identify not only partitions but also a struc¬ 
ture of events. The structure of events used 
to define the propagation of information is 
called a filtration. In the discrete case, however, 
the two concepts—information structure and 
filtration—are equivalent. 

The concept of filtration is based on identifying
all events that are known at any given
instant. It is assumed that it is possible to associate
to each trading moment $t$ a $\sigma$-algebra
of events $\mathfrak{I}_t \subset \mathfrak{I}$ formed by all events that are
known prior to or at time $t$. It is assumed that
events are never "forgotten," that is, that $\mathfrak{I}_t \subset \mathfrak{I}_s$
if $t < s$. An ordering of time is thus created. This
ordering is formed by an increasing sequence of
$\sigma$-algebras, each associated to the time at which
all its events are known. This sequence is a filtration.
Indicated as $\{\mathfrak{I}_t\}$, a filtration is therefore
an increasing sequence of $\sigma$-algebras $\mathfrak{I}_t$, each
associated to an instant $t$.

In the finite case, it is possible to create a
mutual correspondence between filtrations and
information structures. In fact, given an information
structure, it is possible to associate to
each partition the algebra generated by the
same partition. Observe that a tree information
structure is formed by partitions that create
increasing refinement: By going from one instant
to the next, every set of the partition is
decomposed. One can then conclude that the
algebras generated by an information structure
form a filtration.

On the other hand, given a filtration $\{\mathfrak{I}_t\}$, it
is possible to associate a partition to each $\mathfrak{I}_t$.
In fact, given any element that belongs to $\Omega$,
consider any other element that belongs to $\Omega$
such that, for each set of $\mathfrak{I}_t$, both either belong to
or are outside this set. It is easy to see that classes
of equivalence are thus formed, that these create
a partition, and that the algebra generated by
each such partition is precisely the $\mathfrak{I}_t$ that has
generated the partition.
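
The correspondence between partitions and algebras in the finite case can be checked by brute force. The following sketch assumes a four-element state space and two hypothetical partitions, `coarse` and `fine`; the function names are ours. It builds the algebra generated by a partition as the collection of all unions of its sets, and recovers the partition from the algebra through the equivalence relation described above.

```python
from itertools import combinations

def algebra_from_partition(partition):
    """All unions of sets of the partition (including the empty union)."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

def partition_from_algebra(omega, algebra):
    """Two states are equivalent if every event contains both or neither."""
    classes = {}
    for w in omega:
        signature = frozenset(e for e in algebra if w in e)
        classes.setdefault(signature, set()).add(w)
    return {frozenset(c) for c in classes.values()}

omega = {1, 2, 3, 4}
coarse = [{1, 2}, {3, 4}]    # partition at an earlier instant
fine = [{1}, {2}, {3, 4}]    # refinement at a later instant

alg_coarse = algebra_from_partition(coarse)
alg_fine = algebra_from_partition(fine)
recovered = partition_from_algebra(omega, alg_fine)
```

Because `fine` refines `coarse`, the algebra generated by `coarse` is contained in the one generated by `fine`, which is the increasing property required of a filtration.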

A stochastic process is said to be adapted
to the filtration $\{\mathfrak{I}_t\}$ if the variable $X_t$ is
measurable with respect to the $\sigma$-algebra $\mathfrak{I}_t$. It
is assumed that the price and cash distribution
processes $S_t(\omega)$ and $d_t(\omega)$ of every asset are
adapted to $\{\mathfrak{I}_t\}$. This means that, for each $t$, no
measurement of any price or cash distribution
variable can identify events not included in the
respective algebra or $\sigma$-algebra. Every random
variable is a partial image of the set of states
seen from a given point of view and at a given
moment.

The concepts of filtration and of processes 
adapted to a filtration are fundamental. They 
ensure that information is revealed without 
anticipation. Consider the economy and asso¬ 
ciate at every instant a partition and an al¬ 
gebra generated by the partition. Every ran¬ 
dom variable defined at that moment assumes 
a value constant on each set of the partition. The 
knowledge of the realized values of the random 
variables does not allow identifying sets of 
events finer than partitions. 

One might well ask: Why introduce the complex
structure of $\sigma$-algebras as opposed to simply
defining random variables? The point is
that, from a logical point of view, the primitive
concept is that of states and events. The
evolution of time has to be defined on the primitive
structure; it cannot simply be imposed
on random variables. In practice, filtrations become
an important concept when dealing with
conditional probabilities in a continuous environment.
As the probability that a continuous
random variable assumes a specific value is
zero, the definition of conditional probabilities
requires the machinery of filtration.


Conditional Probability and 
Conditional Expectation 

Conditional probabilities and conditional aver¬ 
ages are fundamental in the stochastic descrip¬ 
tion of financial markets. For instance, one is 
generally interested in the probability distribu¬ 
tion of the price of an asset at some date given 
its price at an earlier date. The widely used re¬ 
gression models are an example of conditional 
expectation models. 

The conditional probability of event $A$ given
event $B$ was defined earlier as

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

This simple definition cannot be used in the context
of continuous random variables because
the conditioning event (i.e., one variable assuming
a given value) has probability zero. To avoid
this problem, we condition on $\sigma$-algebras and
not on single zero-probability events. In general,
as each instant is characterized by a $\sigma$-algebra
$\mathfrak{I}_t$, the conditioning elements are the $\mathfrak{I}_t$.
The general definition of conditional expectation
is the following. Consider a probability
space $(\Omega, \mathfrak{I}, P)$ and a $\sigma$-algebra $\mathfrak{G}$ contained in
$\mathfrak{I}$ and suppose that $X$ is an integrable random
variable on $(\Omega, \mathfrak{I}, P)$. We define the conditional
expectation of $X$ with respect to $\mathfrak{G}$, written as
$E[X|\mathfrak{G}]$, as a random variable measurable with
respect to $\mathfrak{G}$ such that

$$\int_G E[X|\mathfrak{G}]\,dP = \int_G X\,dP$$

for every set $G \in \mathfrak{G}$. In other words, the conditional
expectation is a random variable whose average
on every event that belongs to $\mathfrak{G}$ is equal
to the average of $X$ over those same events, but
it is $\mathfrak{G}$-measurable while $X$ in general is not. It is
possible to demonstrate that such variables exist and are
unique up to a set of measure zero.

Econometric models usually condition a random
variable given another variable. In the
previous framework, conditioning one random
variable $X$ with respect to another random
variable $Y$ means conditioning $X$ given $\sigma\{Y\}$
(i.e., given the $\sigma$-algebra generated by $Y$). Thus
$E[X|Y]$ means $E[X|\sigma\{Y\}]$.

This notion might seem to be abstract and
to miss a key aspect of conditioning: intuitively,
conditional expectation is a function of
the conditioning variable. For example, given
a stochastic price process, $X_t$, one would like
to visualize conditional expectation $E[X_t|X_s]$,
$s < t$, as a function of $X_s$ that yields the expected
price at a future date given the present price.
This intuition is not wrong insofar as the conditional
expectation $E[X|Y]$ of $X$ given $Y$ is a
random variable that is a function of $Y$.

However, we need to specify how conditional
expectations are formed, given that the usual
conditional probabilities cannot be applied as
the conditioning event has probability zero.
Here is where the above definition comes into
play. The conditional expectation of a variable
$X$ given a variable $Y$ is defined in full generality
as a variable that is measurable with respect to
the $\sigma$-algebra $\sigma(Y)$ generated by the conditioning
variable $Y$ and has the same expected value
as $X$ on each set of $\sigma(Y)$. Later in this section
we will see how conditional expectations can
be expressed in terms of the joint p.d.f. of the
conditioning and conditioned variables.

One can define conditional probabilities starting
from the concept of conditional expectations.
Consider a probability space $(\Omega, \mathfrak{I}, P)$,
a sub-$\sigma$-algebra $\mathfrak{G}$ of $\mathfrak{I}$, and two events $A \in \mathfrak{I}$,
$B \in \mathfrak{I}$. If $I_A$, $I_B$ are the indicator functions of the
sets $A$, $B$ (the indicator function of a set assumes
value 1 on the set, 0 elsewhere), we can define
conditional probabilities of the event $A$, respectively,
given $\mathfrak{G}$ or given the event $B$ as

$$P(A|\mathfrak{G}) = E[I_A|\mathfrak{G}] \qquad P(A|B) = E[I_A|I_B]$$


Using these definitions, it is possible to demonstrate
that given two random variables $X$ and $Y$
with joint density $f(x, y)$, the conditional density
of $X$ given $Y$ is

$$f(x|y) = \frac{f(x, y)}{f_Y(y)}$$

where the marginal density, defined as

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

is assumed to be strictly positive.

In the discrete case, the conditional expectation
is a random variable that takes a constant
value over the sets of the finite partition associated
to $\mathfrak{I}_t$. Its value for each element of $\Omega$ is
defined by the classical concept of conditional
probability. Conditional expectation is simply
the average over a partition assuming the classical
conditional probabilities.
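
The discrete case can be verified directly. The sketch below uses a hypothetical four-state space with made-up probabilities and payoffs (all values are assumptions chosen for exact arithmetic). It computes the conditional expectation as the probability-weighted average of $X$ over each set of a partition and then checks the defining property, namely that $E[X|\mathfrak{G}]$ and $X$ have the same integral over every set of the partition.

```python
from fractions import Fraction

# Hypothetical state space, probabilities, and random variable X.
prob = {"a": Fraction(1, 4), "b": Fraction(1, 4),
        "c": Fraction(1, 3), "d": Fraction(1, 6)}
X = {"a": 10, "b": 20, "c": 30, "d": 60}
partition = [{"a", "b"}, {"c", "d"}]

def conditional_expectation(X, prob, partition):
    """Return E[X | partition] as a map from states to values.

    On each set of the partition the value is the average of X
    weighted by the classical conditional probabilities.
    """
    cond = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        average = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            cond[w] = average
    return cond

cond = conditional_expectation(X, prob, partition)

# Defining property: equal integrals of E[X|G] and X over each set G.
integrals_match = all(
    sum(cond[w] * prob[w] for w in block) == sum(X[w] * prob[w] for w in block)
    for block in partition
)
```

The result is constant on each set of the partition (15 on the first set and 40 on the second), so it is measurable with respect to the algebra the partition generates, whereas $X$ itself is not.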

An important econometric concept related to
conditional expectations is that of a martingale.
Given a probability space $(\Omega, \mathfrak{I}, P)$ and a filtration
$\{\mathfrak{I}_i\}$, a sequence of $\mathfrak{I}_i$-measurable random
variables $X_i$ is called a martingale if the following
condition holds:

$$E[X_{i+1}|\mathfrak{I}_i] = X_i$$

A martingale translates the idea of a "fair
game" as the expected value of the variable at
the next period is the present value of the same
variable.
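
The martingale condition can be checked exhaustively on a small tree. As an illustration only, the sketch below assumes a symmetric coin-tossing game over four periods in which every path is equally likely; conditioning on $\mathfrak{I}_i$ is implemented as averaging over all paths that share the first $i$ steps. The helper names are hypothetical.

```python
from itertools import product
from fractions import Fraction

n = 4
paths = list(product((-1, 1), repeat=n))  # equally likely states

def walk(path, i):
    """Cumulated gain after i fair bets."""
    return sum(path[:i])

def squared(path, i):
    """Square of the cumulated gain."""
    return walk(path, i) ** 2

def compensated(path, i):
    """Squared gain minus elapsed time."""
    return walk(path, i) ** 2 - i

def cond_exp_next(process, prefix):
    """E[X_{i+1} | first i steps equal prefix], averaging over matching paths."""
    i = len(prefix)
    matching = [p for p in paths if p[:i] == prefix]
    return Fraction(sum(process(p, i + 1) for p in matching), len(matching))

def is_martingale(process):
    return all(
        cond_exp_next(process, prefix) == process(prefix, i)
        for i in range(n)
        for prefix in product((-1, 1), repeat=i)
    )

results = (is_martingale(walk), is_martingale(squared), is_martingale(compensated))
```

The cumulated gain is a martingale, its square is not (its conditional expectation grows by one each period), and the square minus elapsed time is again a martingale.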


KEY POINTS 

• Stochastic processes are time-dependent ran¬ 
dom variables. 

• An information structure is a collection of par¬ 
titions of events associated to each instant of 
time that become progressively finer with the 
evolution of time. A filtration is an increasing 
collection of a -algebras associated to each in¬ 
stant of time. 

• The states of the economy, intended as full
histories of the economy, are represented as a
probability space. The revelation of information
with time is represented by information
structures or filtrations. Prices and other financial
quantities are represented by adapted
stochastic processes.

• By conditioning is meant the change in probabilities
due to the acquisition of some information.
It is possible to condition with respect
to an event if the event has nonzero probability.
In general terms, conditioning means
conditioning with respect to a filtration or an
information structure.

• A martingale is a stochastic process such
that the conditional expected value is always
equal to its present value. It embodies the idea
of a fair game where today's wealth is the best
forecast of future wealth.

• A time series is a discrete-time stochastic process,
that is, a denumerable collection of random
variables indexed by integer numbers.
Any stationary time series admits an infinite
moving average representation, that is to say,
it can be represented as an infinite sum of
white noise terms with appropriate coefficients.

• A time series is said to be invertible if it can
also be represented as an infinite autoregression,
that is, an infinite sum of all past terms
with appropriate coefficients.


• ARMA models are parsimonious represen¬ 
tations that involve only a finite number of 
moving average and autoregressive terms. 

• An ARMA model is stationary if all the roots
of the inverse characteristic equation of the
AR part have modulus strictly greater than
one; the same condition on the MA part
makes the model invertible.

• A process is said to be integrated of order 
p if it becomes stationary after differencing 
p times. 


NOTE 

1. See Enders (2009), Gourieroux and Monfort 
(1997), Hamilton (1994), and Tsay (2001) for 
a comprehensive discussion of modern time 
series econometrics.

Abstract: Information on risk-neutral density is valuable in financial markets for a wide range of
participants. This density can be used to mark-to-market exotic options that are not very liquid on 
the market, for anticipation of effects determined by new policy or possible extreme events such 
as crashes, and even for designing new trading strategies. There are many models that have been 
proposed in the past for estimating the risk-neutral density, each with their pros and cons. 


The concept of risk-neutral density (RND) plays 
an important theoretical role in asset pricing 
as outlined in Cox and Ross (1976), published 
very shortly after the publication of the Black- 
Scholes model. Since then, the estimation of 
RND has become an essential tool for central 
banks in monitoring the stability of the finan¬ 
cial system and for measuring the impact of 
new policies. Investment banks also rely on the 
RND calibrated from liquid European vanilla 
options to determine the price of more exotic 
positions on their balance sheet that are not very 
liquid. Moreover, the first moments of the RND, 
such as implied volatility and skewness, can be 
used to design trading strategies. 

One may argue that the information con¬ 
tained in option prices is redundant to the 
information provided by historical prices of 


the underlying asset. However, based on the 
1987 stock market crash, Jackwerth and Rubin¬ 
stein (1996) demonstrated that this is not the 
case. Prior to the crash, the RND estimated at 
one-month horizon had been close to lognor¬ 
mal but subsequently the shape of the RND 
changed considerably. At the same time, they 
also revealed that the historical distribution had 
been lognormal and it remained like that after 
the crash. In other words, the option prices in 
the equity market contain different information 
from the historical equity prices. 

In this entry we highlight the main steps 
for estimating the RND associated with an eq¬ 
uity index. We exemplify the estimation proce¬ 
dure by applying a model based on both the 
generalized inverse Gaussian distribution that has 
been advocated in the literature for financial
modeling and the well-known lognormal mix¬
ture model that has been widely used by in¬ 
vestment houses and central banks. These two 
models are straightforward and easy to apply 
since the option pricing formulas can be derived 
in closed form. 


AN APPROPRIATE 
PARAMETRIC MODEL 


The RND is recovered from a bundle of European
vanilla call and put option prices on the
same underlying asset $X$ and with the same
maturity $T$. The options differ in the exercise
price $K$. Denoting with $f(\cdot)$ the probability density
function of the underlying asset $X$ under
the risk-neutral probability measure $Q$, the European
vanilla call price for strike $K$ is

$$C(K) = e^{-rT} \int_K^{\infty} (X_T - K) f(X_T)\,dX_T \qquad (1)$$

where $r$ is the continuously compounded
risk-free rate.

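The partial derivative of (1) with respect to the
strike price $K$ is

$$\frac{\partial C}{\partial K} = e^{-rT} \frac{\partial}{\partial K} \int_K^{\infty} (X_T - K) f(X_T)\,dX_T = -e^{-rT} \int_K^{\infty} f(X_T)\,dX_T = -e^{-rT}[1 - F(K)]$$

where $F(\cdot)$ is the cumulative distribution function
under the risk-neutral measure. Thus

$$F(K) = e^{rT} \frac{\partial C}{\partial K} + 1 \qquad (2)$$

The RND probability density function $f$ can be
obtained by differentiation of the cumulative
distribution function $F$

$$f(K) = e^{rT} \frac{\partial^2 C}{\partial K^2} \qquad (3)$$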

One could then try to reconstruct either F 
or / from a grid of option prices using fi¬ 
nite difference schemes. However, such numer¬ 
ical methods are notoriously unreliable and 
very sensitive to the sample of option prices 
available. 
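
Both the relation in (3) and its numerical fragility can be seen in a small experiment. The sketch below is an illustration under assumptions that are not part of the entry: call prices are generated from a single lognormal RND (the Black-Scholes case) with made-up parameters, so that the exact density is known, and the second derivative in (3) is replaced by a central finite difference with strike spacing `h`. A second run rounds the prices to two decimals, as market quotes would be.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(K, F=100.0, r=0.05, T=0.5, sigma=0.2):
    """Call price when X_T is lognormal with forward F (illustrative values)."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    d2 = d1 - s
    return math.exp(-r * T) * (F * norm_cdf(d1) - K * norm_cdf(d2))

def lognormal_rnd(K, F=100.0, T=0.5, sigma=0.2):
    """Exact risk-neutral density of X_T for the same lognormal model."""
    s = sigma * math.sqrt(T)
    d2 = (math.log(F / K) - 0.5 * s * s) / s
    return math.exp(-0.5 * d2 * d2) / (K * s * math.sqrt(2.0 * math.pi))

def rnd_by_finite_difference(call, K, h, r=0.05, T=0.5):
    """Equation (3) with the second derivative replaced by a central difference."""
    second = (call(K - h) - 2.0 * call(K) + call(K + h)) / (h * h)
    return math.exp(r * T) * second

exact = lognormal_rnd(100.0)
from_clean = rnd_by_finite_difference(bs_call, 100.0, h=1.0)
from_rounded = rnd_by_finite_difference(lambda K: round(bs_call(K), 2), 100.0, h=1.0)
```

With exact prices the finite difference reproduces the density closely. With rounded prices each quote can be off by 0.005, so the second difference can be off by up to about $4 \times 0.005 / h^2$, which for this spacing is of the same order as the density itself.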

Over the years, two main classes of methods 
have emerged. First, parametric methods are 


underpinned by univariate distributions such 
as the Weibull distribution (see Savickas, 2002, 
2005), the generalized beta distribution (see Mc¬ 
Donald and Xu, 1995; Anagnou et al., 2005), the 
generalized lambda distribution (see Corrado, 
2001), the generalized gamma distribution (see 
Albota et al., 2009), the g-and-h distribution as 
proposed by Dutta and Babbel (2005); and a 
mixture of univariate distributions such as that 
proposed by Gemmill and Saflekos (2000) for 
two lognormals, and Melick and Thomas (1997) 
for three lognormals. 

The second class is defined by semiparamet- 
ric and nonparametric methods such as (1) ex¬ 
pansion methods as used by Jarrow and Rudd 
(1982) and Corrado and Su (1997), (2) direct fit¬ 
ting of the implied volatility curve with splines 
or other interpolation methods as described 
by Shimko (1993), Anagnou et al. (2003), and 
Brunner and Hafner (2003), (3) kernel methods 
developed in Ait-Sahalia and Lo (1998) and Ait- 
Sahalia and Duarte (2003), and (4) maximum 
entropy methods as applied by Buchen and 
Kelly (1996) and Avellaneda (1998). 

The nonparametric approach usually requires 
a large sample of data in order to achieve a good 
fit. In financial markets, for many asset classes, 
large samples may simply not be available. In 
this entry, we focus on the fully parametric ap¬ 
proach. 

The strategy for parametric models represented
by a vector of parameters $\theta$ is to
minimize some type of discrepancy measure
between the theoretical options prices and the
observed market prices.

Given the availability of $N$ European call
options $\{C(K_i)\}_{i=1,\ldots,N}$ and $M$ put options
$\{P(K_j)\}_{j=1,\ldots,M}$, all with the same maturity $T$,
the problem that must be solved is the minimization
of the function

$$H_1(\theta) = \sum_{i=1}^{N} [C(K_i) - C^{mkt}(K_i)]^2 + \sum_{j=1}^{M} [P(K_j) - P^{mkt}(K_j)]^2 \qquad (4)$$

subject to the forward constraint $E^Q[X_T] = F_0$,
where $F_0$ is the forward price on the same underlying
asset $X$ and the last term of the sum
accounts for the forward martingale condition
that must be satisfied for any parametric model.
The notation $C^{mkt}$ and $P^{mkt}$ relates, respectively,
to the actual call and put option prices from the
market. The function $H_1$ is a discrepancy measure
between the theoretical prices obtained under the
chosen parametric RND $f(\cdot\,; \theta)$ and the market prices.

While the $H_1$ in (4) is widely used in practice, it
is sometimes useful to consider other potential
discrepancy measures such as

$$H_2(\theta) = \sum_{i=1}^{N} \frac{[C(K_i) - C^{mkt}(K_i)]^2}{C^{mkt}(K_i)} + \sum_{j=1}^{M} \frac{[P(K_j) - P^{mkt}(K_j)]^2}{P^{mkt}(K_j)}$$

$$H_3(\theta) = \sum_{i=1}^{N} \frac{[C(K_i) - C^{mkt}(K_i)]^2}{C(K_i)} + \sum_{j=1}^{M} \frac{[P(K_j) - P^{mkt}(K_j)]^2}{P(K_j)}$$

$$H_4(\theta) = \sum_{i=1}^{N} |C(K_i) - C^{mkt}(K_i)| + \sum_{j=1}^{M} |P(K_j) - P^{mkt}(K_j)|$$

Since the market option prices that do not sat¬ 
isfy put-call parity are filtered out of the data 
used for calibration, it is possible to work with 
call prices only or with put prices only, if that is 
more convenient numerically. 
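
The four discrepancy measures are straightforward to compute once model and market prices are aligned by strike. The sketch below assumes the prices are passed as two lists of equal length (calls and puts can simply be concatenated, since each measure is a sum over instruments); the function names and the two-price example are hypothetical.

```python
def h1(model, market):
    """Sum of squared pricing errors, as in (4)."""
    return sum((c - m) ** 2 for c, m in zip(model, market))

def h2(model, market):
    """Squared errors scaled by the market price."""
    return sum((c - m) ** 2 / m for c, m in zip(model, market))

def h3(model, market):
    """Squared errors scaled by the model price."""
    return sum((c - m) ** 2 / c for c, m in zip(model, market))

def h4(model, market):
    """Sum of absolute pricing errors."""
    return sum(abs(c - m) for c, m in zip(model, market))

# Made-up prices, only to show the calls.
model_prices = [10.0, 20.0]
market_prices = [11.0, 18.0]
values = (h1(model_prices, market_prices), h2(model_prices, market_prices),
          h3(model_prices, market_prices), h4(model_prices, market_prices))
```

The scaled versions give more weight to errors on cheap options, which matters when the strikes span deep in-the-money and out-of-the-money contracts.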


TWO PARAMETRIC MODELS 
FOR RND ESTIMATION 

In order to be able to reverse engineer the RND
from options prices, a pricing formula for European
vanilla options under the chosen distribution
is needed. There is a great advantage in
having the pricing formulas in closed form;
otherwise numerical integral approximation methods
must be employed, and this means that there
is a risk of introducing errors in the estimation
procedure.

Here we illustrate the RND estimation procedure
for two special cases, the generalized inverse
Gaussian (GIG) distribution and the lognormal
mixture (LnMix) distribution. For both models,
closed-form solutions for pricing European options
are available.


Pricing Options with the GIG 
Distribution 

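The GIG distribution has been advocated for
applications in financial modeling due to its
flexibility to fit heavy tails (see Bibby and
Sorensen, 2003). The probability density function
of the GIG distribution is¹

$$f_{GIG}(x; \lambda, \chi, \psi) = \frac{x^{\lambda - 1} \exp\left[-\frac{1}{2}(\chi x^{-1} + \psi x)\right]}{k_\lambda(\chi, \psi)}\, 1_{(0,\infty)}(x) \qquad (5)$$

where

$$k_\lambda(\chi, \psi) = \int_0^{\infty} x^{\lambda - 1} \exp\left[-\frac{1}{2}(\chi x^{-1} + \psi x)\right] dx$$

is a normalizing constant that is related to the
modified Bessel function of the third kind

$$K_\nu(z) = \frac{1}{2} \int_0^{\infty} t^{\nu - 1} \exp\left[-\frac{z}{2}(t + t^{-1})\right] dt \qquad (6)$$

via

$$k_\lambda(\chi, \psi) = 2 \left(\frac{\chi}{\psi}\right)^{\lambda/2} K_\lambda(\sqrt{\chi\psi}) \qquad (7)$$

Further technical details on this distribution can
be found in Paolella (2007).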
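
Equations (5) to (7) can be evaluated with nothing more than a quadrature rule. The sketch below is a minimal illustration, not a production implementation: it computes $K_\nu(z)$ from (6) after the substitution $t = e^u$, which turns the integral into $\int_0^\infty \cosh(\nu u) \exp(-z \cosh u)\,du$, and uses a plain trapezoid rule. The truncation point, the number of nodes, and the parameter values are assumptions, and the routine is only intended for moderate values of $\nu$ and $z$.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def bessel_k(nu, z, upper=20.0, n=4000):
    """K_nu(z) from equation (6) with t = exp(u); moderate nu and z only."""
    return trapezoid(
        lambda u: math.cosh(nu * u) * math.exp(-z * math.cosh(u)), 0.0, upper, n
    )

def k_norm(lam, chi, psi):
    """Normalizing constant of the GIG density via equation (7)."""
    return 2.0 * (chi / psi) ** (lam / 2.0) * bessel_k(lam, math.sqrt(chi * psi))

def make_gig_pdf(lam, chi, psi):
    """Return the density (5); the constant is computed once."""
    k = k_norm(lam, chi, psi)
    def pdf(x):
        if x <= 0.0:
            return 0.0
        return x ** (lam - 1.0) * math.exp(-0.5 * (chi / x + psi * x)) / k
    return pdf

# Known closed form K_{1/2}(z) = sqrt(pi / (2 z)) * exp(-z), used as a check.
k_half = bessel_k(0.5, 1.0)
k_half_exact = math.sqrt(math.pi / 2.0) * math.exp(-1.0)

# Illustrative parameters; the density should integrate to one.
pdf = make_gig_pdf(lam=-0.5, chi=1.0, psi=1.0)
total_mass = trapezoid(pdf, 0.0, 40.0, 20000)
```

In practice a library routine for the modified Bessel function would normally replace `bessel_k`; the hand-written quadrature is used here only to keep the example self-contained.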

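The GIG distribution is well defined, or
"proper," for the parameter domain

$$\{(\lambda, \chi, \psi) \in \mathbb{R} \times (0, \infty) \times (0, \infty)\}$$

There are also two boundary cases possible: (1)
$\lambda > 0$, $\chi = 0$ and $\psi > 0$ and (2) $\lambda < 0$, $\chi > 0$
and $\psi = 0$. Applying some standard algebraic
routine leads to

$$\begin{aligned}
P(K) &= K e^{-rT} F_{GIG}(K; \lambda, \chi, \psi) - e^{-rT} \int_0^K x f_{GIG}(x; \lambda, \chi, \psi)\,dx \\
&= K e^{-rT} F_{GIG}(K; \lambda, \chi, \psi) - e^{-rT} \frac{k_{\lambda+1}(\chi, \psi)}{k_\lambda(\chi, \psi)} \int_0^K f_{GIG}(x; \lambda + 1, \chi, \psi)\,dx \\
&= K e^{-rT} F_{GIG}(K; \lambda, \chi, \psi) - e^{-rT} \sqrt{\frac{\chi}{\psi}}\, \frac{K_{\lambda+1}(\sqrt{\chi\psi})}{K_\lambda(\sqrt{\chi\psi})}\, F_{GIG}(K; \lambda + 1, \chi, \psi)
\end{aligned}$$

This formula can be rewritten in terms of the
forward price $F_0 = E^Q(X_T)$ as

$$P(K) = K e^{-rT} F_{GIG}(K; \lambda, \chi, \psi) - F_0 e^{-rT} F_{GIG}(K; \lambda + 1, \chi, \psi) \qquad (8)$$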
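
Formula (8) can be cross-checked against direct integration of the put payoff. The following sketch makes several assumptions for the sake of a self-contained example: the Bessel function and the GIG distribution function are obtained by a plain trapezoid rule, the parameter values are made up rather than calibrated, and the helper names are ours.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def bessel_k(nu, z, upper=20.0, n=4000):
    """K_nu(z) from equation (6) with t = exp(u); moderate nu and z only."""
    return trapezoid(
        lambda u: math.cosh(nu * u) * math.exp(-z * math.cosh(u)), 0.0, upper, n
    )

def make_gig_pdf(lam, chi, psi):
    """GIG density (5) with the constant from (7)."""
    k = 2.0 * (chi / psi) ** (lam / 2.0) * bessel_k(lam, math.sqrt(chi * psi))
    def pdf(x):
        if x <= 0.0:
            return 0.0
        return x ** (lam - 1.0) * math.exp(-0.5 * (chi / x + psi * x)) / k
    return pdf

def gig_cdf(K, lam, chi, psi, n=4000):
    """Numerical distribution function F_GIG(K; lam, chi, psi)."""
    return trapezoid(make_gig_pdf(lam, chi, psi), 0.0, K, n)

def gig_forward(lam, chi, psi):
    """Forward price E^Q(X_T) = sqrt(chi/psi) K_{lam+1}(z) / K_lam(z)."""
    z = math.sqrt(chi * psi)
    return math.sqrt(chi / psi) * bessel_k(lam + 1.0, z) / bessel_k(lam, z)

def gig_put(K, r, T, lam, chi, psi):
    """European put from equation (8)."""
    f0 = gig_forward(lam, chi, psi)
    return math.exp(-r * T) * (
        K * gig_cdf(K, lam, chi, psi) - f0 * gig_cdf(K, lam + 1.0, chi, psi)
    )

def put_by_integration(K, r, T, lam, chi, psi, n=4000):
    """Discounted expectation of (K - X_T)^+ computed directly."""
    pdf = make_gig_pdf(lam, chi, psi)
    return math.exp(-r * T) * trapezoid(lambda x: (K - x) * pdf(x), 0.0, K, n)

params = dict(K=1.5, r=0.05, T=0.5, lam=1.0, chi=2.0, psi=2.0)  # illustrative
put_formula = gig_put(**params)
put_direct = put_by_integration(**params)
```

The two numbers agree because $x f_{GIG}(x; \lambda, \chi, \psi) = F_0 f_{GIG}(x; \lambda + 1, \chi, \psi)$, which is the step used in the derivation above.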


RND Estimation with the LnMix 
Distribution 

The importance of fat tails and non-normal dis¬ 
tributions in modeling equity stock and vanilla 
options has become prominent in the aftermath 
of the Black Monday 1987 crisis. The LnMix 
model is a convex combination of several log¬ 
normal individual models. Bahra (1997) was the 
first to propose using the LnMix model for RND 
estimation. An exact solution for options pric¬ 
ing of vanilla European call and put options 


can be derived as a weighted sum of standard 
Black-Scholes prices. In practice, the preferred 
mixture model is the one based on two individ¬ 
ual lognormal models. 

If $LN(x; \alpha, \beta)$ is the lognormal distribution
with parameters $\alpha$ and $\beta$, then the LnMix distribution
is given by the following probability
density function

$$f_{LnMix}(x; \alpha_1, \beta_1, \alpha_2, \beta_2, \eta) = \eta\, LN(x; \alpha_1, \beta_1) + (1 - \eta)\, LN(x; \alpha_2, \beta_2) \qquad (9)$$

Bahra (1997) described the formulas for pricing
European vanilla call and put options

$$C(K) = e^{-rT}\{\eta[e^{\alpha_1 + 0.5\beta_1^2} N(d_1) - K N(d_2)] + (1 - \eta)[e^{\alpha_2 + 0.5\beta_2^2} N(d_3) - K N(d_4)]\}$$

$$P(K) = e^{-rT}\{\eta[-e^{\alpha_1 + 0.5\beta_1^2} N(-d_1) + K N(-d_2)] + (1 - \eta)[-e^{\alpha_2 + 0.5\beta_2^2} N(-d_3) + K N(-d_4)]\}$$

where

$$d_1 = \frac{\alpha_1 + \beta_1^2 - \log(K)}{\beta_1}, \qquad d_2 = d_1 - \beta_1$$

$$d_3 = \frac{\alpha_2 + \beta_2^2 - \log(K)}{\beta_2}, \qquad d_4 = d_3 - \beta_2$$

and $N$ is the standard normal cumulative distribution
function.

This model has five parameters $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$,
and $\eta$, and one should expect a better fit of data
with this model compared to the GIG model
that has only three parameters. If the calibration
goodness-of-fit results are very similar between
the two models, then the model with fewer parameters
should be preferred based on the principle
of parsimony.
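
The LnMix pricing formulas translate directly into code. The sketch below implements the call and put as weighted sums of two lognormal (Black-Scholes type) prices and checks them through put-call parity, $C(K) - P(K) = e^{-rT}(F_0 - K)$, where $F_0$ is the mean of the mixture. The parameter values are made up for illustration and the function name is hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lnmix_prices(K, r, T, a1, b1, a2, b2, eta):
    """Return (call, put, forward) under a two-component lognormal mixture."""
    df = math.exp(-r * T)
    m1 = math.exp(a1 + 0.5 * b1 * b1)   # mean of the first lognormal
    m2 = math.exp(a2 + 0.5 * b2 * b2)   # mean of the second lognormal
    d1 = (a1 + b1 * b1 - math.log(K)) / b1
    d2 = d1 - b1
    d3 = (a2 + b2 * b2 - math.log(K)) / b2
    d4 = d3 - b2
    call = df * (eta * (m1 * norm_cdf(d1) - K * norm_cdf(d2))
                 + (1.0 - eta) * (m2 * norm_cdf(d3) - K * norm_cdf(d4)))
    put = df * (eta * (K * norm_cdf(-d2) - m1 * norm_cdf(-d1))
                + (1.0 - eta) * (K * norm_cdf(-d4) - m2 * norm_cdf(-d3)))
    forward = eta * m1 + (1.0 - eta) * m2
    return call, put, forward

# Illustrative (not calibrated) parameters.
r, T, K = 0.05, 0.5, 100.0
call, put, forward = lnmix_prices(K, r, T, a1=4.6, b1=0.1, a2=4.5, b2=0.3, eta=0.7)
parity_gap = (call - put) - math.exp(-r * T) * (forward - K)
```

A parity gap at the level of rounding error confirms that the call and put formulas are mutually consistent and that the forward constraint is built into the mixture mean.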


Table 1 Call Option Prices on May 29, 1998, on the FTSE100 Index

T           F_0    DF     5700     5750     5800     5850     5900     5950     6000     6050
Sep-98  5915.50  0.98   418.81   385.79   354.00   323.51   294.41   266.76   240.62   216.06
Dec-98  6000.11  0.96   586.56   553.83   521.93   490.86   460.62   431.28   402.89   375.54
Mar-99  6079.46  0.95   727.12   694.31   662.15   630.57   599.56   569.19   539.56   510.77
Jun-99  6128.55  0.93   837.32   804.79   772.79   741.21   710.02   679.35   649.34   620.15
Sep-99  6195.66  0.91   950.32   917.58   885.27   853.33   821.71   790.54   759.96   730.09
Dec-99  6269.07  0.90  1061.20  1028.20   995.70   963.43   931.46   899.90   868.85   838.46
Mar-00  6341.98  0.88  1167.80  1134.70  1001.90  1069.40  1037.20  1005.40   974.12   943.36
Jun-00  6383.58  0.87  1250.20  1217.30  1184.70  1152.30  1120.30  1088.60  1057.40  1026.70


Note: Initial value X_0 = 5843.32. In the second column the forward prices are reported. The third column reports the
discount factors. Strike prices range from 5700 to 6050.

Table 2 Discrepancy Measures across Maturities for the Data in Table 1

             Sep-98    Dec-98    Mar-99    Jun-99    Sep-99    Dec-99    Mar-00    Jun-00
H_1 GIG    2.66E-05  3.21E-05  4.97E-05  5.32E-05  6.98E-05  8.03E-05  2.87E-04  1.41E-04
H_1 LnMix  2.64E-05  3.96E-05  5.12E-05  6.75E-05  8.85E-05  1.03E-04  2.90E-04  1.36E-04
H_2 GIG    5.26E-04  4.04E-04  4.82E-04  4.33E-04  4.91E-04  4.99E-04  1.58E-03  7.30E-04
H_2 LnMix  5.23E-04  4.98E-04  4.94E-04  5.50E-04  6.23E-04  6.37E-04  1.61E-03  7.00E-04
H_3 GIG    5.12E-04  4.93E-04  4.90E-04  5.46E-04  6.19E-04  6.34E-04  1.57E-03  6.98E-04
H_3 LnMix  5.15E-04  4.00E-04  4.78E-04  4.31E-04  4.89E-04  4.97E-04  1.55E-03  7.27E-04
H_4 GIG      0.0128    0.0140    0.0176    0.0181    0.0207    0.0222    0.0374    0.0294
H_4 LnMix    0.0127    0.0156    0.0177    0.0203    0.0233    0.0251    0.0376    0.0288



Figure 1 Absolute Percentage Errors for the First Four Maturities
Panels: (a) Sep 98, (b) Dec 98, (c) Mar 99, (d) Jun 99. Horizontal axis: K/X_0.

FITTING THE MODELS
TO DATA

For the RND estimation with a parametric
model, the main elements are (1) formulas for
pricing either European call or European put
options, together with a formula for the forward
price, (2) a minimization procedure for a nonlinear
function such as $H_1$ given
by (4), and (3) a set of market option prices.



Here we illustrate the calibration of the GIG
and LnMix models using a dataset reported in
Table 1, which is described in Rebonato (2004,
pp. 290-291) and is a typical example for the
UK equity market.

The goodness of fit of the two models can
be assessed to some extent from the results in
Table 2, which reports the values obtained for
the sum of squared residuals $H_1(\hat{\theta}; m)$, where
$\hat{\theta} = \arg\min_\theta H_1(\theta; m)$ and $m$ is the vector with
components $K_j/X_0$ reflecting moneyness. The
smaller the value, the better is the fit. It is interesting
that the GIG distribution, having three
parameters, seems to calibrate across maturities
very closely and even gives a better fit than
the lognormal mixture (LnMix) model that uses
five parameters.

Figure 2 Absolute Percentage Errors for the Last Four Maturities
Panels: (a) Sep 99, (b) Dec 99, (c) Mar 00, (d) Jun 00.

A more informative comparison can be made
by looking at the error structure versus moneyness.
The fitting errors for the two models
and for each maturity are plotted in Figures 1
and 2 as absolute percentage errors, defined
for the European call option prices as
$100 \times |C(\hat{\theta}; m) - C_{mkt}(m)| / C_{mkt}(m)$, where $C(\hat{\theta}; m)$ is
the theoretical price established
in equation (1), calculated at the estimated
parameter vector $\hat{\theta}$ obtained from the
minimization of the function
given in (4). In the neighborhood of at-the-
money prices, the absolute percentage error is
less than 1%, while for out-of-the-money or in-the-
money options it may be higher.
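
The absolute percentage error defined above can be sketched in a few lines of Python. This is only an illustration: the function name and the three price pairs are hypothetical and are not taken from Table 1 or from the fitted GIG or LnMix models.

```python
def absolute_percentage_errors(model_prices, market_prices):
    """Return 100 * |C_model - C_mkt| / C_mkt for each option."""
    if len(model_prices) != len(market_prices):
        raise ValueError("price lists must have the same length")
    return [
        100.0 * abs(c_model - c_mkt) / c_mkt
        for c_model, c_mkt in zip(model_prices, market_prices)
    ]


# Hypothetical call prices ordered by increasing strike (not from Table 1).
model = [250.0, 101.0, 19.0]
market = [252.0, 100.0, 20.0]
errors = absolute_percentage_errors(model, market)

for strike_rank, err in enumerate(errors, start=1):
    print(f"option {strike_rank}: {err:.2f}%")
```

Because the error is scaled by the market price, a small price difference on a cheap out-of-the-money option translates into a large percentage error, which is consistent with the pattern described for the figures.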

Which parametric model to use depends on
the task at hand. It is possible that some parametric
models perform better for some asset
classes (such as foreign exchange), while other
models perform better for different asset classes
(such as equity). Some models may have a superior
fit in the tails.


KEY POINTS 

• The information contained in the risk-neutral
density is useful to many participants in financial
markets. Central banks use this information
in monitoring the stability of the
financial system and for assessing the impact
of new policies, and banks use it for marking
positions in exotic derivatives that they hold.

• To recover the RND, a bundle of market prices
for European vanilla call and put options on
the same underlying asset and with the same
maturity is used.

• Parametric and nonparametric models have
been proposed for estimating the RND. For
several reasons, parametric models are
preferable in practice. The main elements
of a parametric model to estimate the RND are
an option pricing formula combined with a
forward price formula, a minimization procedure,
and a database of observed option
prices.

• RND estimation can be done easily with parametric
models for which pricing formulas are
available for European vanilla options. The
generalized inverse Gaussian model and the
lognormal mixture model are examples of
such models.

• The calibration is done by minimizing a discrepancy
measure between the theoretical
model prices and the observed option market
prices.

NOTE 

1. $I_A(x)$ is the indicator function, equal to
1 when $x \in A$ and zero otherwise.
Abstract: Much of the financial data that is used in financial modeling is drawn from the company's
financial statements. The four basic financial statements are the balance sheet, the income statement, 
the statement of cash flows, and the statement of shareholders' equity. It is important to understand 
these data so that the information conveyed by them is interpreted properly in financial modeling. 
The financial statements are created using several assumptions that affect how to use and interpret 
the financial data. 


Financial statements are summaries of the operating,
financing, and investment activities of
a business. Financial statements should provide
information useful to both investors and
creditors in making credit, investment, and
other business decisions. This usefulness
means that investors and creditors can use
these statements to predict, compare, and evaluate
the amount, timing, and uncertainty of
future cash flows.¹ In other words, financial
statements provide the information needed to
assess a company's future earnings and, therefore,
the cash flows expected to result from
those earnings.

Information from financial statements is typically
used in financial modeling for forecasting
and valuation purposes. In this entry, we
discuss the general principles that guide the
preparation of financial statements (generally
accepted accounting principles), the four basic
financial statements (balance sheet, income
statement, statement of cash flows, and statement
of shareholders' equity), and the assumptions
underlying the preparation of financial
statements.


ACCOUNTING PRINCIPLES 

The accounting data in financial statements are
prepared by the firm's management according
to a set of standards, referred to as generally
accepted accounting principles (GAAP). Generally
accepted accounting principles consist of
the FASB Accounting Standards Codification,
and, for publicly traded companies, the rules
and releases of the Securities and Exchange
Commission.²

The financial statements of a company whose
stock is publicly traded must, by law, be audited
at least annually by independent public
accountants (i.e., accountants who are not employees
of the firm). In such an audit, the accountants
examine the financial statements and
the data from which these statements are prepared
and attest, through the published auditor's
opinion, that these statements have been
prepared according to GAAP. The auditor's
opinion focuses on whether the statements conform
to GAAP and whether there is adequate disclosure
of any material change in accounting
principles.

The financial statements are created using
several assumptions that affect how we use and
interpret the financial data:

• Transactions are recorded at historical cost. Therefore,
the values shown in the statements are
not market or replacement values, but rather
reflect the original cost (adjusted for depreciation
in the case of depreciable assets).

• The appropriate unit of measurement is the dollar.
While this seems logical, the effects of inflation,
combined with the practice of recording
values at historical cost, may cause problems
in using and interpreting these values.

• The statements are recorded for predefined periods
of time. Generally, statements are produced to
cover a chosen fiscal year or quarter, with the
income statement and the statement of cash
flows spanning a period's time and the balance
sheet and statement of shareholders' equity
as of the end of the specified period. But
because the end of the fiscal year is generally
chosen to coincide with the low point of activity
in the operating cycle, the annual balance
sheet and statement of shareholders' equity
may not be representative of values for the
year.

• Statements are prepared using accrual accounting
and the matching principle. Most businesses
use accrual accounting, where revenues and
expenses are matched in timing such that income
is recorded in the period in which it is
earned and expenses are reported in the period
in which they are incurred in an attempt
to generate revenues. The result of the use
of accrual accounting is that reported income
does not necessarily coincide with cash flows.
Because the financial analyst is concerned ultimately
with cash flows, he or she often must
understand how reported income relates to a
company's cash flows.

• It is assumed that the business will continue as a
going concern. The assumption that the business
enterprise will continue indefinitely justifies
the appropriateness of using historical
costs instead of current market values because
these assets are expected to be used up
over time instead of sold.

• Full disclosure requires providing information
beyond the financial statements. The requirement
that there be full disclosure means
that, in addition to the accounting numbers
for such accounting items as revenues,
expenses, and assets, narrative and additional
numerical disclosures are provided in
notes accompanying the financial statements.
An analysis of financial statements is, therefore,
not complete without this additional
information.

• Statements are prepared assuming conservatism.
In cases in which more than one interpretation
of an event is possible, statements
are prepared using the most conservative
interpretation.

The financial statements and the auditors'
findings are published in the firm's annual and
quarterly reports sent to shareholders and the
10-K and 10-Q filings with the Securities and Exchange
Commission (SEC). Also included in the
reports, among other items, is a discussion by
management, providing an overview of company
events. The annual reports are much more
detailed and disclose more financial information
than the quarterly reports.

INFORMATION CONVEYED BY THE BASIC FINANCIAL STATEMENTS

In this section we will discuss the four basic
financial statements and the information that
they convey.

The Balance Sheet 

The balance sheet is a report of the assets, liabilities,
and equity of a firm at a point in time, generally
at the end of a fiscal quarter or fiscal year.
Assets are resources of the business enterprise,
which comprise current and long-lived
assets. How did the company finance these resources?
It did so with liabilities and equity.
Liabilities are obligations of the business enterprise
that must be repaid at a future point in
time, whereas equity is the ownership interest
of the business enterprise. The relation between
assets, liabilities, and equity is simple, as
reflected in the balance of what is owned and
how it is financed, referred to as the accounting
identity:

Assets = Liabilities + Equity

Assets 

Assets are anything that the company owns that
has a value. These assets may have a physical
existence or not. Examples of physical assets
include inventory items held for sale, office furniture,
and production equipment. If an asset
does not have a physical existence, we refer to
it as an intangible asset, such as a trademark
or a patent. You cannot see or touch an intangible
asset, but it still contributes value to the
company.

Assets may also be current or long-term, depending
on how fast the company would be
able to convert them into cash. Assets are generally
reported in the balance sheet in order of
liquidity, with the most liquid asset listed first
and the least liquid listed last.

The most liquid assets of the company are
the current assets. Current assets are assets that
can be turned into cash in one operating cycle
or one year, whichever is longer. This contrasts
with the noncurrent assets, which cannot be liquidated
quickly.

There are different types of current assets. The 
typical set of current assets is the following: 

• Cash, which includes bills, currency, and assets that are
equivalent to cash (e.g., a bank account).

• Marketable securities, which are securities 
that can be readily sold. 

• Accounts receivable, which are amounts 
due from customers arising from trade 
credit. 

• Inventories, which are investments in raw 
materials, work-in-process, and finished 
goods for sale. 

A company's need for current assets is dictated,
in part, by its operating cycle. The operating
cycle is the length of time it takes to turn
the investment of cash into goods and services
for sale back into cash in the form of collections
from customers. The longer the operating cycle,
the greater a company's need for liquidity.
Most firms' operating cycle is less than or equal
to one year.

Noncurrent assets comprise both physical
and nonphysical assets. Plant assets are physical
assets, such as buildings and equipment,
and are reflected in the balance sheet as gross
plant and equipment and net plant and equipment.
Gross plant and equipment, or gross
property, plant, and equipment, is the total cost
of investment in physical assets; that is, what
the company originally paid for the property,
plant, and equipment that it currently owns.
Net plant and equipment, or net property, plant,
and equipment, is the difference between gross
plant and equipment and accumulated depreciation,
and represents the book value of the
plant and equipment assets. Accumulated depreciation
is the sum of depreciation taken for
physical assets in the firm's possession. Therefore,

Gross plant and equipment
− Accumulated depreciation
= Net plant and equipment

Companies may present just the net plant and
equipment figure on the balance sheet, placing
the detail with respect to accumulated depreciation
in a footnote. Interpreting financial statements
requires knowing a bit about how assets
are depreciated for financial reporting purposes.
Depreciation is the allocation of the cost
of an asset over its useful life (or economic life).
In the case of the fictitious Sample Company,
whose balance sheet is shown in Table 1, the
original cost of the fixed assets (i.e., plant, property,
and equipment), less any write-downs for
impairment, for Year 2 is $900 million. The accumulated
depreciation for Sample in Year 2 is
$270 million; this means that the total depreciation
taken on existing fixed assets over time is
$270 million. The net property, plant, and equipment
account balance is $630 million. This is
also referred to as the book value or carrying
value of these assets.

Table 1 The Sample Company Balance Sheet for
Years 1 and 2 (in millions)

                                          Year 2    Year 1
Cash                                         $40       $30
Accounts receivable                          100        90
Inventory                                    180       200
Other current assets                          10        10
TOTAL CURRENT ASSETS                        $350      $330
Property, plant, and equipment              $900      $800
Less accumulated depreciation                270       200
Net property, plant, and equipment           630       600
Intangible assets                             20        20
TOTAL ASSETS                              $1,000      $950

Accounts payable                            $150      $140
Current maturities of long-term debt          60        40
TOTAL CURRENT LIABILITIES                   $180      $165
Long-term debt                               300       250
TOTAL LIABILITIES                           $380      $325
Minority interest                             30        15
Common stock                                  50        50
Additional paid-in capital                   100       100
Retained earnings                            500       400
TOTAL SHAREHOLDERS' EQUITY                   650       550
TOTAL LIABILITIES AND
SHAREHOLDERS' EQUITY                      $1,000      $950
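
As a small worked check of the relation between gross and net plant and equipment, the following sketch applies it to the property, plant, and equipment figures reported for the Sample Company in Table 1 (in millions). The function name is hypothetical and chosen only for illustration.

```python
def net_plant_and_equipment(gross, accumulated_depreciation):
    """Book (carrying) value: gross cost less accumulated depreciation."""
    return gross - accumulated_depreciation


# Sample Company figures from Table 1, in millions of dollars.
net_year2 = net_plant_and_equipment(gross=900, accumulated_depreciation=270)
net_year1 = net_plant_and_equipment(gross=800, accumulated_depreciation=200)

print(f"Year 2 net PP&E: ${net_year2} million")
print(f"Year 1 net PP&E: ${net_year1} million")
```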

Intangible assets are assets that are not financial
instruments, yet have no physical existence,
such as patents, trademarks, copyrights, franchises,
and formulas. Intangible assets may be
amortized over some period, which is akin to
depreciation. Keep in mind that a company may
own a number of intangible assets that are not
reported on the balance sheet. A company may
only include an intangible asset's value on its
balance sheet if (1) there are likely future benefits
attributable specifically to the asset, and (2)
the cost of the intangible asset can be measured.

Suppose a company has an active, ongoing
investment in research and development to develop
new products. It must expense what is
spent on research and development each year
because a given investment in R&D is unlikely
to meet the two criteria: It is not until
much later, after the R&D expense is made,
that the economic viability of the investment is
determined. If, on the other hand, a company
buys a patent from another company, this cost
may be capitalized and then amortized over the
remaining life of the patent. So when you look
at a company's assets on its balance sheet, you
may not be getting the complete picture of what
it owns.

Liabilities 

We generally use the terms "liability" and
"debt" as synonymous terms, though "liability"
is actually a broader term, encompassing
not only the explicit contracts that a company
has, in terms of short-term and long-term debt
obligations, but also obligations that are not
specified in a contract, such as environmental
obligations or asset retirement obligations. Liabilities
may be interest-bearing, such as a bond
issue, or noninterest-bearing, such as amounts
due to suppliers.

In the balance sheet, liabilities are presented
in order of their due date and are often presented
in two categories, current liabilities and
long-term liabilities. Current liabilities are obligations
due within one year or one operating
cycle (whichever is longer). Current liabilities
consist of:

• Accounts payable, which are amounts due to suppliers
for purchases on credit.

• Wages and salaries payable, which are amounts due
to employees.

• Current portion of long-term indebtedness.

• Short-term bank loans.

Long-term liabilities are obligations that are
due beyond one year. There are different types
of long-term liabilities, including:

• Notes payable and bonds, which are indebtedness
(loans) in the form of securities

• Capital leases, which are rental obligations
that are long-term, fixed commitments

• Asset retirement liability, which is the contractual
or statutory obligation to retire or decommission
the asset and restore the site to
required standards at the end of the asset's
life

• Deferred taxes, which are taxes that may have
to be paid in the future that are currently not
due, though they are expensed for financial
reporting purposes. Deferred taxes arise from
differences between accounting and tax methods
(e.g., depreciation methods).

Note that although deferred income taxes are
often referred to as liabilities, some analysts
will classify them as equity if the deferral is perceived
to be perpetual. For example, a company
that buys new depreciable assets each year will
always have some level of deferred taxes; in
that case, an analyst will classify deferred taxes
as equity.

Equity

The equity of a company is the ownership interest.
The book value of equity, which for a
corporation is often referred to as shareholders'
equity or stockholders' equity, is basically
the amount that investors paid the company
for their ownership interest, plus any earnings
(or less any losses), and minus any distributions
to owners. For a corporation, equity is the
amount that investors paid the corporation for
the stock when it was initially sold, plus or minus
any earnings or losses, less any dividends
paid. Keep in mind that for any company, the
reported amount of equity is an accumulation
over time since the company's inception (or incorporation,
in the case of a corporation).

Shareholders' equity is the carrying or book
value of the ownership of a company. Shareholders'
equity comprises:

+ Par value: A nominal amount per share of
stock (sometimes prescribed by law), or the
stated value, which is a nominal amount per
share of stock assigned for accounting
purposes if the stock has no par value.

+ Additional paid-in capital: Also referred to as
capital surplus, the amount paid for shares of
stock by investors in excess of par or stated
value.

− Treasury stock: The accounting value of shares
of the firm's own stock bought by the firm.

+ Retained earnings: The accumulation of prior
and current periods' earnings and losses, less
any prior or current periods' dividends.

± Accumulated comprehensive income or loss: The
total amount of income or loss that arises from
transactions that result in income or losses, yet
are not reported through the income statement.
Items giving rise to this income include foreign
currency translation adjustments and unrealized
gains or losses on available-for-sale investments.

= Shareholders' equity
A Note on Minority Interest

On many companies' consolidated financial
statements, you will notice a balance sheet account
titled Minority Interest. When a company
owns a substantial portion of another company,
the accounting principles require that the
company consolidate that company's financial
statements into its own. Basically what happens
in consolidating the financial statements
is that the parent company will add the accounts
of the subsidiary to its accounts (i.e.,
subsidiary inventory + parent inventory = consolidated
inventory).³ If the parent does not
own 100% of the subsidiary's ownership interest,
an account is created, referred to as minority
interest, which reflects the amount of the subsidiary's
assets not owned by the parent. This
account will be presented between liabilities
and equity on the consolidated balance sheet.
Is it a liability or an equity account? It is neither.

A similar adjustment takes place on the income
statement. The minority interest account
on the income statement reflects the income (or
loss) in proportion to the equity in the subsidiary
not owned by the parent.

Structure of the Balance Sheet 

Consider a simple balance sheet for the Sample
Company shown in Table 1 for fiscal years
Year 1 and Year 2. The most recent fiscal year's
data is presented in the left-most column of
data. Notice that the accounting identity holds;
that is, total assets are equal to the sum of
the total liabilities and the total shareholders'
equity.

The Income Statement 

The income statement is a summary of operating
performance over a period of time (e.g., a fiscal
quarter or a fiscal year). We start with the
revenue of the company over a period of time
and then subtract the costs and expenses related
to that revenue. The bottom line of the income
statement consists of the owners' earnings for
the period. To arrive at this "bottom line," we
need to compare revenues and expenses. The
basic structure of the income statement includes
the following:

Sales or revenues: Represent the amount of goods
or services sold, in terms of price paid by
customers

Less: Cost of goods sold (or cost of sales): The
amount of goods or services sold, in terms of
cost to the firm

Gross profit: The difference between sales and
cost of goods sold

Less: Selling and general expenditures: Salaries,
administrative, marketing expenditures, etc.

Operating profit: Income from operations (ignores
effects of financing decisions and taxes); also
referred to as earnings before interest and taxes
(EBIT), operating income, and operating earnings

Less: Interest expense: Interest paid on debt

Net income before taxes: Earnings before taxes

Less: Taxes: Tax expense for the current period

Net income: Operating profit less financing
expenses (e.g., interest) and taxes

Less: Preferred stock dividends: Dividends paid to
preferred shareholders

Earnings available to common shareholders: Net
income less preferred stock dividends; residual
income


Though the structure of the income statement
varies by company, the basic idea is to present
the operating results first, followed by nonoperating
results. The cost of sales, also referred
to as the cost of goods sold, is deducted from
revenues, producing a gross profit; that is, a
profit without considering all other, general operating
costs. These general operating expenses
are those expenses related to the support of the
general operations of the company, which include
salaries, marketing costs, and research
and development. Depreciation, which is the
amortized cost of physical assets, is also deducted
from gross profit. The amount of the
depreciation expense represents the cost of the
wear and tear on the property, plant, and equipment
of the company.
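
The step-by-step structure described above can be traced with the Sample Company figures reported in Table 2 (Year 2, in millions). The sketch below is illustrative only; the function and key names are hypothetical, and the layout assumes the simple structure used in this entry (no interest income, extraordinary items, or preferred dividends).

```python
def income_statement(sales, cogs, depreciation, sga, interest, taxes):
    """Walk from sales down to net income, returning each subtotal."""
    gross_profit = sales - cogs
    operating_profit = gross_profit - depreciation - sga
    income_before_taxes = operating_profit - interest
    net_income = income_before_taxes - taxes
    return {
        "gross_profit": gross_profit,
        "operating_profit": operating_profit,
        "income_before_taxes": income_before_taxes,
        "net_income": net_income,
    }


# Sample Company, Year 2 (Table 2), in millions of dollars.
result = income_statement(
    sales=1000, cogs=600, depreciation=50, sga=160, interest=23, taxes=67
)

for line_item, amount in result.items():
    print(f"{line_item:>20}: ${amount}")
```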
Table 2 The Sample Company Income Statement for
Year 2 (in millions)

Sales                                           $1,000
Cost of goods sold                                 600
Gross profit                                      $400
Depreciation                                        50
Selling, general, and administrative expenses      160
Operating profit                                  $190
Interest expense                                    23
Income before taxes                               $167
Taxes                                               67
Net income                                        $100

Once we have the operating income, we have
summarized the company's performance with
respect to the operations of the business. But
there is generally more to a company's performance.
From operating income, we deduct
interest expense and add any interest income.
Further, adjustments are made for any other income
or cost that is not a part of the company's
core business.

There are a number of other items that may
appear as adjustments to arrive at net income.
One of these is extraordinary items, which are
defined as unusual and infrequent gains or
losses. Another adjustment would be for the
expense related to the write-down of an asset's
value.

In the case of the Sample Company, whose
income statement is presented in Table 2, the
income from operations (its core business) is
$190 million, whereas the net income (i.e., the
"bottom line") is $100 million.

Earnings Per Share

Companies provide information on earnings per
share (EPS) in their annual and quarterly financial
statement information, as well as in their
periodic press releases. Generally, EPS is calculated
as net income, divided by the number
of shares outstanding. Companies must report
both basic and diluted earnings per share.

Basic earnings per share is net income (minus
preferred dividends), divided by the average
number of shares outstanding. Diluted earnings
per share is net income (minus preferred dividends),
divided by the number of shares outstanding
considering all dilutive securities (e.g.,
convertible debt, options). Diluted earnings per
share, therefore, gives the shareholder information
about the potential dilution of earnings.
For companies with a large number of dilutive
securities (e.g., stock options, convertible preferred
stock, or convertible bonds), there can be
a significant difference between basic and diluted
EPS. You can see the effect of dilution by
comparing the basic and diluted EPS.
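
A minimal sketch of the two earnings per share measures follows. It uses the simplified definitions given in this entry and ignores refinements that reporting standards require (for example, adjusting the numerator for interest on convertible debt). All inputs are hypothetical and are not Sample Company data.

```python
def basic_eps(net_income, preferred_dividends, average_shares):
    """Earnings available to common shareholders per average share."""
    return (net_income - preferred_dividends) / average_shares


def diluted_eps(net_income, preferred_dividends, average_shares,
                potential_shares):
    """Same numerator, with shares from dilutive securities added."""
    return (net_income - preferred_dividends) / (
        average_shares + potential_shares
    )


# Hypothetical inputs: dollars and shares both in millions.
basic = basic_eps(net_income=100.0, preferred_dividends=10.0,
                  average_shares=45.0)
diluted = diluted_eps(net_income=100.0, preferred_dividends=10.0,
                      average_shares=45.0, potential_shares=5.0)

print(f"basic EPS:   ${basic:.2f}")
print(f"diluted EPS: ${diluted:.2f}")
```

The gap between the two figures grows with the number of potential shares, which is the dilution effect the text describes.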


More on Depreciation 

There are different methods that can be used
to allocate an asset's cost over its life. Generally,
if the asset is expected to have value at the
end of its economic life, the expected value, referred
to as a salvage value (or residual value),
is not depreciated; rather, the asset is depreciated
down to its salvage value. There are different
methods of depreciation that we classify
as either straight-line or accelerated. Straight-line
depreciation allocates the cost (less salvage
value) in a uniform manner (equal amount per
period) throughout the asset's life. Accelerated
depreciation allocates the asset's cost (less salvage
value) such that more depreciation is taken
in the earlier years of the asset's life. There
are alternative accelerated methods available,
including:

• Declining balance method, in which a constant
rate is applied to a declining amount
(the undepreciated cost)

• Sum-of-the-years' digits method, in which a
declining rate is applied to the asset's depreciable
basis

Another method is the units-of-activity
method, in which the useful life is defined in
terms of a measure of units of production or
some other metric of use (e.g., hours, miles).
The depreciation expense in any period is determined
by the usage in that period.

A common declining balance method is
the double-declining balance method (DDB),
which applies a rate that is twice that of the
straight-line rate. In this case, the straight-line
rate is 10% per year; therefore, the declining
balance rate is 20% per year. We apply this rate
of 20% against the original cost of $1,000,000,
resulting in a depreciation expense in the first
year of $200,000. In the second year, we apply
this 20% against the undepreciated balance of
$1,000,000 − 200,000 = $800,000, resulting in a
depreciation of $160,000.

Because the declining balance methods result
in more depreciation sooner, relative to straight-line,
and lower depreciation in the later years,
companies may switch to straight-line in these
later years. The same amount is depreciated
over the life of the asset, but the pattern (and
depreciation's impact on earnings) is modified
slightly. In the case of the declining balance
method, salvage value is not considered in the
calculation of depreciation until the undepreciated
balance reaches the salvage value.

For this same asset, the sum-of-the-years' digits
(SYD) depreciation for the first year is the
rate of 10/55, or 18.18%, applied against the
depreciable basis of $1,000,000 − 100,000 =
$900,000:

SYD first year = $900,000 × (10/55) = $163,636

We calculate the denominator as the "sum of
the years": 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 +
2 + 1 = 55. In the second year, the rate is 9/55
applied against the $900,000, and so on.
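
The three schedules discussed above can be sketched as follows for the asset used in the text: an original cost of $1,000,000, a salvage value of $100,000, and a 10-year life. The function names are hypothetical, and the declining balance sketch makes one simplifying assumption that should be read as such: it never switches to straight-line and only caps the expense so that book value does not fall below salvage value.

```python
def straight_line(cost, salvage, life):
    """Equal expense each year on the depreciable basis."""
    return [(cost - salvage) / life] * life


def declining_balance(cost, salvage, life, multiple=2.0):
    """Constant rate (multiple / life) applied to the undepreciated cost.

    Salvage value is ignored until the book value would drop below it.
    """
    rate = multiple / life
    book = cost
    schedule = []
    for _ in range(life):
        expense = min(book * rate, max(book - salvage, 0.0))
        schedule.append(expense)
        book -= expense
    return schedule


def sum_of_years_digits(cost, salvage, life):
    """Declining rate (life/sum, (life-1)/sum, ...) on the depreciable basis."""
    digits_sum = life * (life + 1) // 2
    basis = cost - salvage
    return [basis * (life - year) / digits_sum for year in range(life)]


sl = straight_line(1_000_000, 100_000, 10)
ddb = declining_balance(1_000_000, 100_000, 10)
syd = sum_of_years_digits(1_000_000, 100_000, 10)

print("year   straight-line        DDB          SYD")
for year in range(10):
    print(f"{year + 1:>4} {sl[year]:>15,.0f} {ddb[year]:>10,.0f} {syd[year]:>12,.0f}")
```

The first two DDB figures and the first SYD figure reproduce the amounts worked out in the text; the later years show how both accelerated schedules fall below the straight-line expense.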

Accelerated methods result in higher depreciation
expenses in earlier years, relative to
straight-line, as can be seen in Figure 1. As
a result, accelerated methods result in lower
reported earnings in earlier years, relative to
straight-line. When comparing companies, it is
important to understand whether the companies
use different methods of depreciation because
the choice of depreciation method affects
both the balance sheet (through the carrying
value of the asset) and the income statement
(through the depreciation expense).


A major source of deferred income taxes and
deferred tax assets is the difference between the
accounting methods used for financial reporting
purposes and those used for tax
purposes. In the case of financial accounting
purposes, the company chooses the method
that best reflects how its assets lose value over
time, though most companies use the straight-line
method. However, for tax purposes the
company has no choice but to use the prescribed
rates of depreciation, using the Modified Accelerated
Cost Recovery System (MACRS). For
tax purposes, a company does not have discretion
over the asset's depreciable life or the rate
of depreciation; it must use the MACRS
system.

The MACRS system does not incorporate salvage
value and is based on a declining balance
system. The depreciable life for tax purposes
may be longer than or shorter than that used
for financial reporting purposes. For example,
the MACRS rates for 3- and 5-year assets are as
follows:

Year    3-year    5-year
1       33.33%    20.00%
2       44.45%    32.00%
3       14.81%    19.20%
4        7.41%    11.52%
5                 11.52%
6                  5.76%


Notice that a 3-year asset is
depreciated over four years and a 5-year asset
is depreciated over six years. That is the result
of using what is referred to as a half-year
convention: using only half a year's worth of
depreciation in the first year of an asset's life.
This system results in a leftover amount that
must still be depreciated in the last year (i.e.,
the fourth year in the case of a 3-year asset
and the sixth year in the case of a 5-year asset).
We provide a comparison of straight-line
and MACRS depreciation in Figure 2. You can
see that the methods produce different depreciation
expenses, which result in different
income amounts for tax and financial reporting
purposes.
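
To make the comparison concrete, the sketch below applies the 5-year MACRS rates listed above to the asset described in the Figure 2 caption (a cost of $200,000, a salvage value of $20,000, and an 8-year useful life for financial reporting) and sets them beside straight-line depreciation. The names are hypothetical, the rate table is copied from the text rather than from a tax authority publication, and the figure itself may be based on slightly different assumptions.

```python
# MACRS rates as listed in the text (half-year convention).
MACRS_RATES = {
    3: [0.3333, 0.4445, 0.1481, 0.0741],
    5: [0.2000, 0.3200, 0.1920, 0.1152, 0.1152, 0.0576],
}


def macrs_schedule(cost, asset_class):
    """Tax depreciation: rates apply to full cost; salvage is ignored."""
    return [cost * rate for rate in MACRS_RATES[asset_class]]


def straight_line_schedule(cost, salvage, life):
    """Financial reporting depreciation: equal amounts on cost less salvage."""
    return [(cost - salvage) / life] * life


tax = macrs_schedule(200_000, asset_class=5)
book = straight_line_schedule(200_000, salvage=20_000, life=8)

print("year   straight-line      MACRS")
for year in range(max(len(tax), len(book))):
    b = book[year] if year < len(book) else 0.0
    t = tax[year] if year < len(tax) else 0.0
    print(f"{year + 1:>4} {b:>15,.0f} {t:>10,.0f}")
```

In the early years the tax deduction exceeds the financial reporting expense, so taxable income is lower than reported income; this timing difference is what gives rise to deferred taxes.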
[Figure 1 panels: Panel A: Depreciation Expense; Panel B: Book Value of the Asset]

Figure 1 Comparison of Depreciation Expense and Book Value

Depreciation expense each year for an asset with an original cost of $1,000,000, a salvage value of $100,000,
and a 10-year useful life


The Statement of Cash Flows 

The statement of cash flows is the summary of a 
firm's cash flows, summarized by operations, 
investment activities, and financing activities. 
A simplified cash flow statement is provided in 
Table 3 for the fictitious Sample Company. Cash 
flow from operations is cash flow from day- 
to-day operations. Cash flow from operating 
activities is basically net income adjusted for 
(1) noncash expenditures, and (2) changes in 
working capital accounts. The adjustment for 


changes in working capital accounts is neces¬ 
sary to adjust net income that is determined us¬ 
ing the accrual method to a cash flow amount. 
Increases in current assets and decreases in 
current liabilities are negative adjustments to 
arrive at the cash flow; decreases in current 
assets and increases in current liabilities are 
positive adjustments to arrive at the cash flow. 

Cash flow for/from investing activities is the 
cash flow related to the acquisition (purchase) 
of plant, equipment, and other assets, as well 
as the proceeds from the sale of assets. Cash 
flow for/from financing activities is the cash 
flow from activities related to the sources of 
capital funds (e.g., buying back common stock, 
paying dividends, issuing bonds). 

Figure 2 Depreciation for Financial Accounting Purposes versus Tax Purposes (Panel A: Depreciation Expense; Panel B: Carrying Value) 

Consider an asset that costs $200,000 and has a salvage value of $20,000. If the asset has a useful life of 
8 years, but is classified as a 5-year asset for tax purposes, the depreciation and book value of the asset 
will be different between the financial accounting records and the tax records. 

Not all of the classifications required by ac¬ 
counting principles are consistent with the true 
flow for the three types of activities. For exam¬ 
ple, interest expense is a financing cash flow, yet 
it affects the cash flow from operating activities 
because it is a deduction to arrive at net income. 
This inconsistency is also the case for interest 


income and dividend income, both of which re¬ 
sult from investing activities, but show up in 
the cash flow from operating activities through 
their contribution to net income. 

The sources of a company's cash flows can 
reveal a great deal about the company and its 
prospects. For example, a financially healthy 
company tends to consistently generate cash 
flows from operations (that is, positive 
operating cash flows) and invests cash flows 
(that is, negative investing cash flows). To 
remain viable, a company must be able to 
generate funds from its operations; to grow, a 
company must continually make capital 
investments. 

Table 3 The Sample Company Statement of Cash Flows, for the period ending December 31, 2006 (in millions) 

Net income                                    $100
Add depreciation                                50
Subtract increase in accounts receivable       -10
Add decrease in inventory                       20
Add increase in accounts payable                50
Cash flow from operations                                $210
Retire debt                                  -$100
Cash flow for financing                                  -100
Purchase of equipment                        -$100
Cash flow for investment                                 -100
Change in cash flow                                       $10

The change in cash flow—also called net cash 
flow—is the bottom line in the statement of cash 
flows and is equal to the change in the cash 
account as reported on the balance sheet. For 
the Sample Company, shown in Table 3, the 
net change in cash flow is a positive $10 mil¬ 
lion; this is equal to the change in the cash ac¬ 
count from $50 million in Year 1 to $60 million in 
Year 2. 
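The arithmetic behind Table 3 can be traced in a short Python sketch. It uses only the Sample Company line items from the table (in millions of dollars); the variable names are illustrative assumptions, not terms from any accounting standard.

```python
# Line items from Table 3 for the Sample Company (in $ millions).
net_income = 100
depreciation = 50                      # noncash expense, added back
increase_in_accounts_receivable = 10   # current asset rose: subtract
decrease_in_inventory = 20             # current asset fell: add
increase_in_accounts_payable = 50      # current liability rose: add

cash_flow_from_operations = (
    net_income
    + depreciation
    - increase_in_accounts_receivable
    + decrease_in_inventory
    + increase_in_accounts_payable
)

cash_flow_for_financing = -100    # retire debt
cash_flow_for_investment = -100   # purchase of equipment

change_in_cash = (
    cash_flow_from_operations
    + cash_flow_for_financing
    + cash_flow_for_investment
)

print("Cash flow from operations:", cash_flow_from_operations)
print("Change in cash flow:", change_in_cash)
```

The final figure is the same $10 million by which the cash account rises from $50 million to $60 million.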

By studying the cash flows of a company 
over time, we can gauge a company's finan¬ 
cial health. For example, if a company relies 
on external financing to support its operations 
(that is, reliant on cash flows from financing and 
not from operations) for an extended period of 
time, this is a warning sign of financial trouble 
up ahead. 

The Statement of Stockholders' Equity 

The statement of stockholders' equity (also re¬ 
ferred to as the statement of shareholders' equity) 
is a summary of the changes in the equity ac¬ 
counts, including information on stock options 
exercised, repurchases of shares, and Treasury 
shares. The basic structure is to include a rec¬ 
onciliation of the balance in each component 
of equity from the beginning of the fiscal year 
with the end of the fiscal year, detailing changes 


attributed to net income, dividends, and purchases 
or sales of Treasury stock. The components are 
common stock, additional paid-in capital, re¬ 
tained earnings, and Treasury stock. For each 
of these components, the statement begins with 
the balance of each at the end of the previous 
fiscal period and then adjustments are shown 
to produce the balance at the end of the current 
fiscal period. 

In addition, there is a reconciliation of any 
gains or losses that affect stockholders' equity 
but which do not flow through the income state¬ 
ment, such as foreign-currency translation ad¬ 
justments and unrealized gains on investments. 
These items are of interest because they are part 
of comprehensive income, and hence income 
to owners, but they are not represented on the 
company's income statement. 

Why Bother About the Footnotes? 

Footnotes to the financial statements contain 
additional information, supplementing or ex¬ 
plaining financial statement data. These notes 
are presented in both the annual report and the 
10-K filing (with the SEC), though the latter usu¬ 
ally provides a greater depth of information. 

The footnotes to the financial statements pro¬ 
vide information pertaining to: 

• The significant accounting policies and practices 
that the company uses. This helps the analyst 
with the interpretation of the results, com¬ 
parability of the results to other companies 
and to other years for the same company, 
and in assessing the quality of the reported 
information. 

• Income taxes. The footnotes tell us about 
the company's current and deferred income 
taxes, breakdowns by the type of tax (e.g., 
federal versus state), and the effective tax rate 
that the company is paying. 

• Pension plans. The detail about pension plans, 
including the pension assets and the pension 
liability, is important in determining whether 
a company's pension plan is overfunded or 
underfunded. 

• Leases. You can learn about both the capital 
leases, which are the long-term lease obliga¬ 
tions that are reported on the balance sheet, 
and about the future commitments under op¬ 
erating leases, which are not reflected on the 
balance sheet. 

• Long-term debt. You can find detailed infor¬ 
mation about the maturity dates and interest 
rates on the company's debt obligations. 

The phrase "the devil is in the details" applies 
aptly to the footnotes of a company's financial 
statement. Through the footnotes, a company 
is providing information that is crucial in 
analyzing a company's financial health and 
performance. If footnotes are vague or 
confusing, as they were in the case of Enron 
before its scandal broke, the analyst must ask 
questions to help understand this information. 

ACCOUNTING FLEXIBILITY 

The generally accepted accounting principles 
provide some choices in the manner in which 
some transactions and assets are accounted. For 
example, a company may choose to account 
for inventory, and hence costs of sales, using 
Last-in, First-out (LIFO) or First-in, First-out 
(FIFO). This is intentional because these prin¬ 
ciples are applied to a broad set of companies 
and no single set of methods offers the best rep¬ 
resentation of a company's condition or perfor¬ 
mance for all companies. Ideally, a company's 
management, in consultation with the accoun¬ 
tants, chooses those accounting methods and 
presentations that are most appropriate for the 
company. 

A company's management has always had 
the ability to manage earnings through the ju¬ 
dicious choice of accounting methods within 
the GAAP framework. The company's "watch¬ 
dogs" (i.e., the accountants) should keep the 
company's management in check. However, re¬ 
cent scandals have revealed that the watch¬ 
dog function of the accounting firms was not 
working well. Additionally, some companies' 


management used manipulation of financial re¬ 
sults and outright fraud to distort the financial 
picture. 

The Sarbanes-Oxley Act of 2002 offers some 
comfort by creating an oversight board for the 
accounting firms that perform audits. In 
addition, the Securities and Exchange 
Commission, the Financial Accounting 
Standards Board, and the International 
Accounting Standards Board are reducing some 
of the flexibility that companies had in the 
past. 

Pro Forma Financial Data 

Pro forma financial information is really a 
misnomer—the information is neither pro 
forma (that is, forward looking), nor reliable 
financial data. What is it? Creative accounting. 
It started during the Internet-tech boom in the 
1990s and persists today: Companies release 
financial information that is prepared according 
to their own liking, using accounting methods 
that they create. 

Why did companies start doing this? What 
is wrong with generally accepted accounting 
principles (GAAP)? During the Internet-tech 
stock boom, many startup companies quickly 
went public and then felt the pressures to gen¬ 
erate profits. However, profits in that indus¬ 
try were hard to come by during that period 
of time. What some companies did is generate 
financial data that they included in company 
releases that reported earnings not calculated 
using GAAP—but rather by methods of their 
own. In some cases, these alternative methods 
hid a lot of the ills of these companies. 

The use of pro forma financial data may 
be helpful, but also may be misleading to 
investors. Analysts routinely adjust published 
financial statement data to remove unusual, 
nonrecurring items. This can give the analyst a 
better predictor of the continued performance 
of the company. So what is wrong with the 
company itself doing this? Nothing, unless it 
becomes misleading, such as a company includ¬ 
ing its nonrecurring gains, but not including its 
nonrecurring losses. Out of concern that 
investors could be given misleading 
information, the Securities and Exchange 
Commission now requires that companies 
releasing pro forma financial data also 
reconcile these data with GAAP.4 


KEY POINTS 

• There are four basic financial statements: the 
balance sheet, the income statement, the state¬ 
ment of cash flows, and the statement of 
stockholders' equity. 

• The balance sheet and the statement of 
shareholders' equity are statements with values 
of accounts at a point in time. In the case of 
the balance sheet, the company presents data as 
of the end of the most recent two years; in the 
case of the statement of shareholders' equity, 
the company reconciles the balances at the end 
of the previous fiscal year with those at the 
end of the latest fiscal year. 
The income statement and the statement of 
cash flows provide data on earnings and cash 
flows over the period, whether that period is 
a fiscal quarter or year. 

• The information conveyed in the footnotes 
is essential to the understanding of financial 
statements. There is detail in these footnotes 
that gives us a better idea of the financial 
health of the company. The financial state¬ 
ments and the accompanying footnotes pro¬ 
vide the accounting principles that guide 
companies in the preparation of financial 
statements. 


• Not only must the accounting methods that a 
company uses be understood, but the choices 
that a company has made among the available 
accounting methods should be understood. 

NOTES 

1. The purpose, focus, and objectives of 
financial statements are detailed in Financial 
Accounting Standards Board (1978, 1980). 

2. Effective July 1, 2009, Financial Accounting 
Standards Board (FASB) Accounting Stan¬ 
dards Codification. 

3. There are other adjustments made for inter¬ 
corporate transactions, but we will not go 
into these in this entry. 

4. Securities and Exchange Commission 
RIN 3235-AI69, "Conditions for Use of 
Non-GAAP Financial Measures," effective 
March 28, 2003. 

Abstract: Financial analysis involves the selection, evaluation, and interpretation of financial data 
and other pertinent information to assist in evaluating the operating performance and financial 
condition of a company. The operating performance of a company is a measure of how well a 
company has used its resources—its assets, both tangible and intangible—to produce a return on its 
investment. The financial condition of a company is a measure of its ability to satisfy its obligations, 
such as the payment of interest on its debt in a timely manner. The analyst has many tools available 
in the analysis of financial information. These tools include financial ratio analysis and quantitative 
analysis. The analyst must understand how to use these tools, along with economics and accounting 
information, in the most effective manner. 


Financial analysis is one of the many tools 
useful in valuation because it helps analysts 
and investors gauge returns and risks. In 
this entry, we explain and illustrate financial 
ratios—one of the tools of financial analysis. In 
financial ratio analysis we select the relevant 
information—primarily the financial statement 
data—and evaluate it. We show how to incorpo¬ 
rate market data and economic data in the anal¬ 
ysis of financial ratios. Finally, we show how to 
interpret financial ratio analysis, identifying the 
pitfalls that occur when it's not done properly. 

RATIOS AND THEIR 
CLASSIFICATION 

A ratio is a mathematical relation between two 
quantities. Suppose you have 200 apples and 


100 oranges. The ratio of apples to oranges is 
200/100, which we can conveniently express 
as 2:1 or 2. A financial ratio is a comparison 
between one bit of financial information and 
another. Consider the ratio of current assets 
to current liabilities, which we refer to as the 
current ratio. This ratio is a comparison be¬ 
tween assets that can be readily turned into 
cash—current assets—and the obligations that 
are due in the near future—current liabilities. 
A current ratio of 2 or 2:1 means that we have 
twice as much in current assets as we need to 
satisfy obligations due in the near future. 

Ratios can be classified according to the 
way they are constructed and the financial 
characteristic they are describing. For 
example, we will see that the current ratio is 
constructed as a coverage ratio (the ratio of 
current assets, the available funds, to current 
liabilities, the obligation) that we use to 
describe a firm's liquidity (its ability to meet 
its immediate needs). 

There are as many different financial ratios as 
there are possible combinations of items ap¬ 
pearing on the income statement, balance sheet, 
and statement of cash flows. We can classify 
ratios according to how they are constructed 
or according to the financial characteristic that 
they capture. 

Ratios can be constructed in the following 
four ways: 

1. As a coverage ratio. A coverage ratio is a mea¬ 
sure of a firm's ability to "cover," or meet, a 
particular financial obligation. The denomi¬ 
nator may be any obligation, such as interest 
or rent, and the numerator is the amount of 
the funds available to satisfy that obligation. 

2. As a return ratio. A return ratio indicates a 
net benefit received from a particular invest¬ 
ment of resources. The net benefit is what 
is left over after expenses, such as operating 
earnings or net income, and the resources 
may be total assets, fixed assets, inventory, 
or any other investment. 

3. As a turnover ratio. A turnover ratio is a mea¬ 
sure of how much a firm gets out of its assets. 
This ratio compares the gross benefit from an 
activity or investment with the resources em¬ 
ployed in it. 

4. As a component percentage. A component 
percentage is the ratio of one amount in a 
financial statement, such as net profit, to the 
total of amounts in that financial statement, 
such as sales. 
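To make the four constructions concrete, the sketch below computes one ratio of each type from the current-year figures for Fictitious Corporation given in Tables 1 and 2 (in thousands of dollars). Which ratio represents each type is our choice for illustration, and the variable names are assumptions made for this sketch.

```python
# Current-year figures for Fictitious Corporation, in $ thousands (Tables 1 and 2).
current_assets = 3_000
current_liabilities = 500 + 500   # accounts payable + other current liabilities
total_assets = 11_000
sales = 10_000
ebit = 2_000
net_income = 1_200

# 1. Coverage ratio: funds available relative to an obligation (current ratio).
coverage = current_assets / current_liabilities

# 2. Return ratio: net benefit relative to resources invested (basic earning power).
return_ratio = ebit / total_assets

# 3. Turnover ratio: gross benefit relative to resources employed (total asset turnover).
turnover = sales / total_assets

# 4. Component percentage: one income statement amount relative to sales.
component_pct = net_income / sales

print(f"Coverage (current ratio):       {coverage:.2f}")
print(f"Return (basic earning power):   {return_ratio:.2%}")
print(f"Turnover (total assets):        {turnover:.4f} times")
print(f"Component (net income / sales): {component_pct:.2%}")
```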

In addition, we can also express financial data 
in terms of time—say, how many days' worth of 
inventory we have on hand—or on a per-share 
basis—say, how much a firm has earned for each 
share of common stock. Both are measures we 
can use to evaluate operating performance or 
financial condition. 

When we assess a firm's operating perfor¬ 
mance, a concern is whether the company is 


applying its assets in an efficient and profitable 
manner. When an analyst assesses a firm's fi¬ 
nancial condition, a concern is whether the com¬ 
pany is able to meet its financial obligations. The 
analyst can use financial ratios to evaluate five 
aspects of operating performance and financial 
condition: 

1. Return on investment 

2. Liquidity 

3. Profitability 

4. Activity 

5. Financial leverage 

There are several ratios reflecting each of the 
five aspects of a firm's operating performance 
and financial condition. We apply these ratios 
to the Fictitious Corporation, whose balance 
sheets, income statements, and statement of 
cash flows for two years are shown in Tables 1, 
2, and 3, respectively. We refer to the most 
recent fiscal year for which financial 
statements are available as the "current year." 
The "prior year" is the fiscal year prior to the 
current year. 

Table 1 Fictitious Corporation Balance Sheets for Years Ending December 31, in Thousands 

                                            Current Year    Prior Year
ASSETS
Cash                                               $400          $200
Marketable securities                               200             0
Accounts receivable                                 600           800
Inventories                                       1,800         1,000
Total current assets                             $3,000        $2,000
Gross plant and equipment                       $11,000       $10,000
Accumulated depreciation                         (4,000)       (3,000)
Net plant and equipment                           7,000         7,000
Intangible assets                                 1,000         1,000
Total assets                                    $11,000       $10,000

LIABILITIES AND SHAREHOLDERS' EQUITY
Accounts payable                                   $500          $400
Other current liabilities                           500           200
Long-term debt                                    4,000         5,000
Total liabilities                                $5,000        $5,600
Common stock, $1 par value;
  Authorized 2,000,000 shares
  Issued 1,500,000 and 1,200,000 shares           1,500         1,200
Additional paid-in capital                        1,500           800
Retained earnings                                 3,000         2,400
Total shareholders' equity                        6,000         4,400
Total liabilities and shareholders' equity      $11,000       $10,000

Table 2 Fictitious Corporation Income Statements for Years Ending December 31, in Thousands 

                                            Current Year    Prior Year
Sales                                           $10,000        $9,000
Cost of goods sold                               (6,500)       (6,000)
Gross profit                                     $3,500        $3,000
Lease expense                                    (1,000)         (500)
Administrative expense                             (500)         (500)
Earnings before interest and taxes (EBIT)        $2,000        $2,000
Interest                                           (400)         (500)
Earnings before taxes                            $1,600        $1,500
Taxes                                              (400)         (500)
Net income                                       $1,200        $1,000
Preferred dividends                                (100)         (100)
Earnings available to common shareholders        $1,100          $900
Common dividends                                   (500)         (400)
Retained earnings                                  $600          $500

The ratios we introduce here are by no means 
the only ones that can be formed using financial 
data, though they are some of the more com¬ 
monly used. After becoming comfortable with 
the tools of financial analysis, an analyst will 
be able to create ratios that serve a particular 
evaluation objective. 


Table 3 Fictitious Company Statement of Cash Flows, Years Ended December 31, in Thousands 

                                                  Current Year    Prior Year
Cash flow from (used for) operating activities
Net income                                             $1,200        $1,000
Add or deduct adjustments to cash basis:
  Change in accounts receivables                         $200        $(200)
  Change in accounts payable                              100           400
  Change in marketable securities                        (200)          200
  Change in inventories                                  (800)         (600)
  Change in other current liabilities                     300             0
  Depreciation                                          1,000         1,000
                                                          600           800
Cash flow from operations                              $1,800        $1,800

Cash flow from (used for) investing activities
Purchase of plant and equipment                       $(1,000)           $0
Cash flow from (used for) investing activities        $(1,000)           $0

Cash flow from (used for) financing activities
Sale of common stock                                   $1,000            $0
Repayment of long-term debt                            (1,000)       (1,500)
Payment of preferred dividends                           (100)         (100)
Payment of common dividends                              (500)         (400)
Cash flow from (used for) financing activities           (600)       (1,900)

Increase (decrease) in cash flow                         $200        $(100)
Cash at the beginning of the year                         200           300
Cash at the end of the year                              $400          $200


RETURN-ON-INVESTMENT RATIOS 

Return-on-investment ratios compare measures 
of benefits, such as earnings or net income, with 
measures of investment. For example, if an 
analyst wants to evaluate how well the firm uses 
its assets in its operations, the analyst could 
calculate the return on assets (sometimes called 
the basic earning power ratio) as the ratio of 
earnings before interest and taxes (EBIT) (also 
known as operating earnings) to total assets: 

\[ \text{Basic earning power} = \frac{\text{Earnings before interest and taxes}}{\text{Total assets}} \]

For Fictitious Corporation, for the current year: 

\[ \text{Basic earning power} = \frac{\$2{,}000{,}000}{\$11{,}000{,}000} = 0.1818 \text{ or } 18.18\% \]

For every dollar invested in assets, Fictitious 
earned about 18 cents in the current year. This 
measure deals with earnings from operations; 
it does not consider how these operations are 
financed. 

Another return-on-assets ratio uses net income 
(operating earnings less interest and taxes) 
instead of earnings before interest and taxes: 

\[ \text{Return on assets} = \frac{\text{Net income}}{\text{Total assets}} \]

(In actual application the same term, return on 
assets, is often used to describe both ratios. It 
is only in the actual context or through an 
examination of the numbers themselves that we 
know which return ratio is presented. We use 
two different terms to describe these two 
return-on-asset ratios in this entry simply to 
avoid any confusion.) 

For Fictitious in the current year: 

\[ \text{Return on assets} = \frac{\$1{,}200{,}000}{\$11{,}000{,}000} = 0.1091 \text{ or } 10.91\% \]

Thus, without taking into consideration how 
assets are financed, the return on assets for 
Fictitious is 18%. Taking into consideration how 
assets are financed, the return on assets is 11%. 
The difference arises because net income, the 
numerator of the return-on-assets ratio, is 
measured after interest of $400,000 (incurred 
because Fictitious finances part of its total 
assets with debt) and after taxes of $400,000 in 
the current year. 

If we look at Fictitious's liabilities and 
equities, we see that the assets are financed in 
part by liabilities ($1 million short term, $4 
million long term) and in part by equity 
($800,000 preferred stock, $5.2 million common 
stock). Shareholders may be less interested in 
the return the firm gets from its total 
investment (debt plus equity) than in the return 
the firm can generate on their own investment. 
The return on equity is the ratio of the net 
income shareholders receive to their equity in 
the stock: 

\[ \text{Return on equity} = \frac{\text{Net income}}{\text{Book value of shareholders' equity}} \]

For Fictitious Corporation, there is only one 
type of shareholder: common. For the current 
year: 

\[ \text{Return on equity} = \frac{\$1{,}200{,}000}{\$6{,}000{,}000} = 0.2000 \text{ or } 20.00\% \]

Recap: Return-on-Investment Ratios 

The return-on-investment ratios for Fictitious 
Corporation for the current year are: 

Basic earning power = 18.18% 

Return on assets = 10.91% 

Return on equity = 20.00% 

These return-on-investment ratios indicate: 

• Fictitious earns over 18% from operations, or 
about 11% overall, from its assets. 

• Shareholders earn 20% from their investment 
(measured in book value terms). 

These ratios do not provide information on: 

• Whether this return is due to the profit 
margins (that is, due to costs and revenues) or 
to how efficiently Fictitious uses its assets. 

• The return shareholders earn on their actual 
investment in the firm, that is, what 
shareholders earn relative to their actual 
investment, not the book value of their 
investment. For example, $100 may be invested 
in the stock, but its value according to the 
balance sheet may be greater than or, more 
likely, less than $100. 

DuPont System 

The return-on-investment ratios provide a 
"bottom line" on the performance of a company, 
but do not tell us anything about the "why" 
behind this performance. For an understanding 
of the "why," an analyst must dig a bit deeper 
into the financial statements. A method that is 
useful in examining the source of performance 
is the DuPont system. The DuPont system is a 
method of breaking down return ratios into 
their components to determine which areas are 
responsible for a firm's performance. To see how 
it is used, let us take a closer look at the 
first definition of the return on assets: 

\[ \text{Basic earning power} = \frac{\text{Earnings before interest and taxes}}{\text{Total assets}} \]

Suppose the return on assets changes from 
20% in one period to 10% the next period. We do 
not know whether this decreased return is due 
to a less efficient use of the firm's assets—that 
is, lower activity—or to less effective manage¬ 
ment of expenses (that is, lower profit margins). 
A lower return on assets could be due to lower 
activity, lower margins, or both. Because an an¬ 
alyst is interested in evaluating past operating 
performance to evaluate different aspects of the 
management of the firm and to predict future 
performance, knowing the source of these re¬ 
turns is valuable. 

Let us take a closer look at the return on assets 
and break it down into its components: mea¬ 
sures of activity and profit margin. We do this 
by relating both the numerator and the denom¬ 
inator to sales activity. Divide both the numer¬ 
ator and the denominator of the basic earning 
power by revenues: 

\[ \text{Basic earning power} = \frac{\text{Earnings before interest and taxes}/\text{Revenues}}{\text{Total assets}/\text{Revenues}} \]

which is equivalent to: 

\[ \text{Basic earning power} = \left( \frac{\text{Earnings before interest and taxes}}{\text{Revenues}} \right) \left( \frac{\text{Revenues}}{\text{Total assets}} \right) \]

This says that the earning power of the 
company is related to profitability (in this 
case, operating profit) and a measure of 
activity (total asset turnover). 

\[ \text{Basic earning power} = (\text{Operating profit margin})(\text{Total asset turnover}) \]

When analyzing a change in the company's 
basic earning power, an analyst could look at 
this breakdown to see the change in its compo¬ 
nents: operating profit margin and total asset 
turnover. 

This method of analyzing return ratios in 
terms of profit margin and turnover ratios, 
referred to as the DuPont System, is credited to 
the E.I. DuPont Corporation, whose management 
developed a system of breaking down return 
ratios into their components. 

Let's look at the return on assets of Fictitious 
for the two years. Its returns on assets were 20% 
in the prior year and 18.18% in the current year. 
We can decompose the firm's returns on assets 
for the two years to obtain: 



           Basic Earning    Operating        Total Asset
Year       Power            Profit Margin    Turnover
Prior      20.00%           22.22%           0.9000 times
Current    18.18            20.00            0.9091 times


We see that operating profit margin declined 
over the two years, yet asset turnover improved 
slightly, from 0.9000 to 0.9091. Therefore, the 
return-on-assets decline is attributable to lower 
profit margins. 
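The two-year decomposition above can be reproduced with a short Python sketch. It uses only the sales, EBIT, and total assets figures for Fictitious Corporation from Tables 1 and 2 (in thousands of dollars); the function name is an assumption made for this illustration.

```python
# Sales, EBIT, and total assets for Fictitious Corporation, in $ thousands.
years = {
    "Prior":   {"sales": 9_000,  "ebit": 2_000, "total_assets": 10_000},
    "Current": {"sales": 10_000, "ebit": 2_000, "total_assets": 11_000},
}


def basic_earning_power_breakdown(sales, ebit, total_assets):
    """Return (operating profit margin, total asset turnover, basic earning power)."""
    margin = ebit / sales
    turnover = sales / total_assets
    return margin, turnover, margin * turnover


for label, data in years.items():
    margin, turnover, bep = basic_earning_power_breakdown(**data)
    print(f"{label:<8} BEP {bep:7.2%}   margin {margin:7.2%}   "
          f"turnover {turnover:.4f} times")
```

Holding one component at its prior-year value while changing the other is a simple way to see that the lower margin, not the turnover, accounts for the decline.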

The return on assets can be broken down into 
its components in a similar manner: 

\[ \text{Return on assets} = \left( \frac{\text{Net income}}{\text{Revenues}} \right) \left( \frac{\text{Revenues}}{\text{Total assets}} \right) \]

or 

\[ \text{Return on assets} = (\text{Net profit margin})(\text{Total asset turnover}) \]

The basic earning power ratio relates to the 
return on assets. Recognizing that: 

\[ \text{Net income} = \text{Earnings before taxes} \times (1 - \text{Tax rate}) \]

then 

\[ \text{Net income} = \text{Earnings before interest and taxes} \times \underbrace{\left( \frac{\text{Earnings before taxes}}{\text{Earnings before interest and taxes}} \right)}_{\text{equity's share of earnings}} \times \underbrace{(1 - \text{Tax rate})}_{\text{tax retention \%}} \]

The ratio of earnings before taxes to earnings 
before interest and taxes reflects the interest 
burden of the company, whereas the term 
(1 - tax rate) reflects the company's tax burden. 
Therefore, 

\[ \text{Return on assets} = \left( \frac{\text{Earnings before interest and taxes}}{\text{Revenues}} \right) \left( \frac{\text{Revenues}}{\text{Total assets}} \right) \left( \frac{\text{Earnings before taxes}}{\text{Earnings before interest and taxes}} \right) (1 - \text{Tax rate}) \]

or 

\[ \text{Return on assets} = (\text{Operating profit margin})(\text{Total asset turnover})(\text{Equity's share of earnings})(\text{Tax retention \%}) \]

The breakdown of a return-on-equity ratio 
requires a bit more decomposition because, 
instead of total assets, the denominator in the 
return is shareholders' equity. Because activity 
ratios reflect the use of all of the assets, not 
just the proportion financed by equity, we need 
to adjust the activity ratio for the proportion 
of assets financed by equity (that is, the ratio 
of the book value of shareholders' equity to 
total assets), which we do by multiplying by its 
reciprocal: 

\[ \text{Return on equity} = (\text{Return on assets}) \left( \frac{\text{Total assets}}{\text{Shareholders' equity}} \right) = \left( \frac{\text{Net income}}{\text{Total assets}} \right) \underbrace{\left( \frac{\text{Total assets}}{\text{Shareholders' equity}} \right)}_{\text{Equity multiplier}} \]

The ratio of total assets to shareholders' 
equity is referred to as the equity multiplier. 
The equity multiplier, therefore, captures the 
effects of how a company finances its assets, 
referred to as its financial leverage. 
Multiplying the total asset turnover ratio by 
the equity multiplier allows us to break down 
the return-on-equity ratios into three 
components: profit margin, asset turnover, and 
financial leverage. For example, the return on 
equity can be broken down into three parts: 

\[ \text{Return on equity} = (\text{Net profit margin})(\text{Total asset turnover})(\text{Equity multiplier}) \]

Applying this breakdown to Fictitious for the 
two years: 



           Return on    Net Profit    Total Asset     Total Debt    Equity
Year       Equity       Margin        Turnover        to Assets     Multiplier
Prior      22.73%       11.11%        0.9000 times    56.00%        2.2727
Current    20.00        12.00         0.9091          45.45%        1.8333


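The return on equity decreased over the two 
years primarily because of less use of financial 
leverage (a lower equity multiplier). The 
operating profit margin was also lower, although 
the net profit margin and total asset turnover 
rose slightly. 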
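A brief Python sketch of the three-part breakdown, using the Fictitious Corporation figures from Tables 1 and 2 (in thousands of dollars), shows which component drives the change. The function name and the tuple layout are assumptions made for this illustration.

```python
def roe_three_part(net_income, sales, total_assets, equity):
    """Return (net profit margin, total asset turnover, equity multiplier, ROE)."""
    net_margin = net_income / sales
    turnover = sales / total_assets
    multiplier = total_assets / equity
    return net_margin, turnover, multiplier, net_margin * turnover * multiplier


# Fictitious Corporation, in $ thousands.
prior = roe_three_part(net_income=1_000, sales=9_000,
                       total_assets=10_000, equity=4_400)
current = roe_three_part(net_income=1_200, sales=10_000,
                         total_assets=11_000, equity=6_000)

for label, (margin, turnover, multiplier, roe) in (("Prior", prior),
                                                   ("Current", current)):
    print(f"{label:<8} ROE {roe:7.2%} = margin {margin:7.2%} "
          f"x turnover {turnover:.4f} x multiplier {multiplier:.4f}")
```

Of the three components, only the equity multiplier falls between the two years, so it alone accounts for the lower return on equity in this breakdown.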

The analyst can decompose the return on eq¬ 
uity further by breaking out the equity's share 
of before-tax earnings (represented by the ratio 
of earnings before and after interest) and tax 
retention percentage. Consider the example in 
Figure 1, in which we provide a DuPont 
breakdown of the return on equity for Microsoft 
Corporation for the fiscal year ending June 30, 2006, 
in Panel A. The return on equity of 31.486% can 
be broken down into three and then five compo¬ 
nents, as shown in this figure. We can also use 
this breakdown to compare the return on equity 
for the 2005 and 2006 fiscal years, as shown in 
Panel B. As you can see, the return on equity im¬ 
proved from 2005 to 2006 and, using this break¬ 
down, we can see that this was due primarily 
to the improvement in the asset turnover and 
the increased financial leverage. 

This decomposition allows the analyst to take 
a closer look at the factors that are control¬ 
lable by a company's management (e.g., asset 
turnover) and those that are not controllable 
(e.g., tax retention). The breakdowns lead the 
analyst to information on both the balance sheet 
and the income statement. And this is not the 
only breakdown of the return ratios; further
decomposition is possible.

Panel A: For the fiscal year ending June 30, 2006,

\[
\text{Return on equity} = \frac{\text{Net income}}{\text{Shareholders' equity}} = \frac{\$12{,}599}{\$40{,}014} = 0.31486 \text{ or } 31.486\%
\]

Breaking return on equity into three components:

\[
\text{Return on equity} = \frac{\text{Net income}}{\text{Revenues}} \times \frac{\text{Revenues}}{\text{Total assets}} \times \frac{\text{Total assets}}{\text{Shareholders' equity}}
\]

\[
= \frac{\$12{,}599}{\$44{,}282} \times \frac{\$44{,}282}{\$69{,}597} \times \frac{\$69{,}597}{\$40{,}014} = 0.31486 \text{ or } 31.486\%
\]

Breaking the return on equity into five components:

\[
\text{Return on equity} = \frac{\text{Earnings before interest and taxes}}{\text{Revenues}} \times \frac{\text{Earnings before taxes}}{\text{Earnings before interest and taxes}} \times (1 - \text{Tax rate}) \times \frac{\text{Revenues}}{\text{Total assets}} \times \frac{\text{Total assets}}{\text{Shareholders' equity}}
\]

\[
= \frac{\$18{,}262}{\$44{,}282} \times \frac{\$18{,}262}{\$18{,}262} \times (1 - 0.31010) \times \frac{\$44{,}282}{\$69{,}597} \times \frac{\$69{,}597}{\$40{,}014}
\]

\[
= 0.41240 \times 1.0 \times 0.68990 \times 0.63626 \times 1.73932 = 0.31486 \text{ or } 31.486\%
\]

Panel B: Comparing the components between the June 30, 2006 fiscal year and the June 30, 2005 fiscal year,

\[
\text{Return on equity} = \frac{\text{Earnings before interest and taxes}}{\text{Revenues}} \times \frac{\text{Earnings before taxes}}{\text{Earnings before interest and taxes}} \times (1 - \text{Tax rate}) \times \frac{\text{Revenues}}{\text{Total assets}} \times \frac{\text{Total assets}}{\text{Shareholders' equity}}
\]

Return on equity June 30, 2006 = 0.41240 x 1.0 x 0.68990 x 0.63626 x 1.73932 = 31.486%

Return on equity June 30, 2005 = 0.41791 x 1.0 x 0.73695 x 0.56186 x 1.47179 = 25.468%

Figure 1 The DuPont System Applied to Microsoft Corporation


LIQUIDITY 

Liquidity reflects the ability of a firm to meet its 
short-term obligations using those assets that 
are most readily converted into cash. Assets
that may be converted into cash in a short period
of time are referred to as liquid assets; they
are listed in financial statements as current assets.
Current assets are often referred to as working
capital, since they represent the resources
needed for the day-to-day operations of the 
firm's long-term capital investments. Current 
assets are used to satisfy short-term obligations, 
or current liabilities. The amount by which cur¬ 
rent assets exceed current liabilities is referred 
to as the net working capital. 

Operating Cycle 

How much liquidity a firm needs depends on
its operating cycle. The operating cycle is the duration
from the time cash is invested in goods
and services to the time that investment produces
cash. For example, a firm that produces
and sells goods has an operating cycle comprising
four phases:

1. Purchase raw materials and produce goods, 
investing in inventory. 

2. Sell goods, generating sales, which may or 
may not be for cash. 

3. Extend credit, creating accounts receivable. 

4. Collect accounts receivable, generating cash. 

The four phases make up the cycle of cash 
use and generation. The operating cycle would 
be somewhat different for companies that pro¬ 
duce services rather than goods, but the idea 
is the same—the operating cycle is the length 
of time it takes to generate cash through the 
investment of cash. 

What does the operating cycle have to
do with liquidity? The longer the operating
cycle, the more current assets are needed
(relative to current liabilities) since it takes
longer to convert inventories and receivables
into cash. In other words, the longer the operating
cycle, the greater the amount of net working
capital required.

To measure the length of an operating cycle
we need to know:

• The time it takes to convert the investment in
inventory into sales (that is, cash -> inventory
-> sales -> accounts receivable).

• The time it takes to collect sales on credit (that
is, accounts receivable -> cash).

We can estimate the operating cycle for Fictitious
Corporation for the current year, using
the balance sheet and income statement data.
The number of days Fictitious ties up funds in
inventory is determined by the total amount
of money represented in inventory and the average
day's cost of goods sold. The current investment
in inventory (that is, the money "tied
up" in inventory) is the ending balance of inventory
on the balance sheet. The average day's
cost of goods sold is the cost of goods sold on
an average day in the year, which can be estimated
by dividing the cost of goods sold (which
is found on the income statement) by the number
of days in the year. The average day's cost
of goods sold for the current year is:

\[
\text{Average day's cost of goods sold} = \frac{\text{Cost of goods sold}}{365 \text{ days}} = \frac{\$6{,}500{,}000}{365 \text{ days}} = \$17{,}808 \text{ per day}
\]

In other words, Fictitious incurs, on average, a
cost of producing goods sold of $17,808 per day.

Fictitious has $1.8 million of inventory on
hand at the end of the year. How many days'
worth of goods sold is this? One way to look at
this is to imagine that Fictitious stopped buying
more raw materials and just finished producing
whatever was on hand in inventory, using
available raw materials and work-in-process.
How long would it take Fictitious to run out of
inventory?

We compute the days sales in inventory (DSI),
also known as the number of days of inventory, by
calculating the ratio of the amount of inventory
on hand (in dollars) to the average day's cost of
goods sold (in dollars per day):

\[
\text{Days sales in inventory} = \frac{\text{Amount of inventory on hand}}{\text{Average day's cost of goods sold}} = \frac{\$1{,}800{,}000}{\$17{,}808 \text{ per day}} = 101 \text{ days}
\]

In other words, Fictitious has approximately
101 days of goods on hand at the end of the current
year. If sales continued at the same pace,
it would take Fictitious 101 days to run out of
inventory.

If the ending inventory is representative of 
the inventory throughout the year, then it takes 
about 101 days to convert the investment in 
inventory into sold goods. Why worry about 
whether the year-end inventory is representa¬ 
tive of inventory at any day throughout the 
year? Well, if inventory at the fiscal
year-end is lower than on any other day of
the year, we have understated the DSI. Indeed,
in practice most companies try to choose fiscal 
year-ends that coincide with the slow period of 
their business. That means the ending balance 
of inventory would be lower than the typical 
daily inventory of the year. To get a better pic¬ 
ture of the firm, we could, for example, look 
at quarterly financial statements and take aver¬ 
ages of quarterly inventory balances. However, 
here for simplicity we make a note of the problem
of representativeness and deal with it later in
the discussion of financial ratios.

It should be noted that as an attempt to make 
the inventory figure more representative, some 
suggest taking the average of the beginning and 
ending inventory amounts. This does nothing 
to remedy the representativeness problem because
the beginning inventory is simply the
ending inventory from the previous year and,
like the ending value from the current year, is
measured at the low point of the operating cycle.
A preferred method, if data are available, is
to calculate the average inventory for the four
quarters of the fiscal year.

We can extend the same logic for calculating 
the number of days between a sale—when an 
account receivable is created—and the time it 
is collected in cash. If we assume that Fictitious 
sells all goods on credit, we can first calculate 
the average credit sales per day and then figure out 
how many days' worth of credit sales are rep¬ 
resented by the ending balance of receivables. 

The average credit sales per day are:

\[
\text{Credit sales per day} = \frac{\text{Credit sales}}{365 \text{ days}} = \frac{\$10{,}000{,}000}{365 \text{ days}} = \$27{,}397 \text{ per day}
\]

Therefore, Fictitious generates $27,397 of 
credit sales per day. With an ending balance 
of accounts receivable of $600,000, the days sales 
outstanding (DSO), also known as the number of 
days of credit, in this ending balance is calculated 
by taking the ratio of the balance in the accounts 
receivable account to the credit sales per day:

\[
\text{Days sales outstanding} = \frac{\text{Accounts receivable}}{\text{Credit sales per day}} = \frac{\$600{,}000}{\$27{,}397 \text{ per day}} = 22 \text{ days}
\]

If the ending balance of receivables at the end 
of the year is representative of the receivables 
on any day throughout the year, then it takes, 
on average, approximately 22 days to collect 
the accounts receivable. In other words, it takes 
22 days for a sale to become cash. 

Using what we have determined for the in¬ 
ventory cycle and cash cycle, we see that for 
Fictitious: 

Operating cycle = DSI + DSO 

= 101 days + 22 days 
= 123 days 
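
The two day counts and their sum can be reproduced with a short sketch. The helper name `days_on_hand` is an assumption; the inputs are the Fictitious figures from the text. As in the text, each component is rounded to whole days before being added.

```python
DAYS_PER_YEAR = 365


def days_on_hand(balance, annual_flow, days=DAYS_PER_YEAR):
    """Days of activity represented by a balance-sheet amount."""
    return balance / (annual_flow / days)


dsi = days_on_hand(1_800_000, 6_500_000)   # inventory vs. cost of goods sold
dso = days_on_hand(600_000, 10_000_000)    # receivables vs. credit sales
operating_cycle = round(dsi) + round(dso)

print(f"DSI = {dsi:.1f} days, DSO = {dso:.1f} days")
print(f"Operating cycle = {operating_cycle} days")
```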


We also need to look at the liabilities on the 
balance sheet to see how long it takes a firm 
to pay its short-term obligations. We can apply 
the same logic to accounts payable as we did to 
accounts receivable and inventories. How long
does it take a firm, on average, to go from creating
a payable (buying on credit) to paying for
it in cash?

First, we need to determine the amount of an 
average day's purchases on credit. If we assume all 
the Fictitious purchases are made on credit, then 
the total purchases for the year would be the 
cost of goods sold less any amounts included 
in cost of goods sold that are not purchases. For 
example, depreciation is included in the cost of 
goods sold yet is not a purchase. Since we do 
not have a breakdown on the company's cost of 
goods sold showing how much was paid for in 
cash and how much was on credit, let us assume 
for simplicity that purchases are equal to cost of 
goods sold less depreciation. The average day's 
purchases then become: 

\[
\text{Average day's purchases} = \frac{\text{Cost of goods sold} - \text{Depreciation}}{365 \text{ days}} = \frac{\$6{,}500{,}000 - \$1{,}000{,}000}{365 \text{ days}} = \$15{,}068 \text{ per day}
\]

The days payables outstanding (DPO), also 
known as the number of days of purchases, 
represented in the ending balance in accounts 
payable, is calculated as the ratio of the balance 
in the accounts payable account to the average 
day's purchases: 

\[
\text{Days payables outstanding} = \frac{\text{Accounts payable}}{\text{Average day's purchases}}
\]

For Fictitious in the current year:

\[
\text{Days payables outstanding} = \frac{\$500{,}000}{\$15{,}068 \text{ per day}} = 33 \text{ days}
\]

This means that on average Fictitious takes
33 days to pay out cash for a purchase.

The operating cycle tells us how long it takes
to convert an investment in cash back into cash
(by way of inventory and accounts receivable).
The number of days of payables tells us how
long it takes to pay on purchases made to create
the inventory. If we put these two pieces
of information together, we can see how long,
on net, we tie up cash. The difference between
the operating cycle and the number of days of
purchases is the cash conversion cycle (CCC), also
known as the net operating cycle:

Cash conversion cycle = Operating cycle - DPO

Or, substituting for the operating cycle, 

CCC = DSI + DSO - DPO 

The cash conversion cycle for Fictitious in the 
current year is: 

CCC = 101 + 22 - 33 = 90 days 

The CCC is how long it takes for the firm 
to get cash back from its investments in in¬ 
ventory and accounts receivable, considering 
that purchases may be made on credit. By not 
paying for purchases immediately (that is, us¬ 
ing trade credit), the firm reduces its liquidity 
needs. Therefore, the longer the net operating 
cycle, the greater the required liquidity. 
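
The role of trade credit can be illustrated by computing the days payables outstanding and netting it against the operating cycle. This is a sketch under the text's simplifying assumption that purchases equal cost of goods sold less depreciation; the variable names are illustrative, and the DSI and DSO inputs are the rounded values derived earlier.

```python
DAYS_PER_YEAR = 365

# Rounded day counts from the operating cycle calculation in the text.
dsi = 101
dso = 22

# Purchases approximated as cost of goods sold less depreciation.
purchases = 6_500_000 - 1_000_000
accounts_payable = 500_000
dpo = accounts_payable / (purchases / DAYS_PER_YEAR)

operating_cycle = dsi + dso
cash_conversion_cycle = operating_cycle - round(dpo)

print(f"DPO = {dpo:.1f} days")
print(f"Cash tied up without trade credit: {operating_cycle} days")
print(f"Cash tied up with trade credit:    {cash_conversion_cycle} days")
```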


Measures of Liquidity 

The analyst can describe a firm's ability to meet 
its current obligations in several ways. The cur¬ 
rent ratio indicates the firm's ability to meet 
or cover its current liabilities using its current 
assets:

\[
\text{Current ratio} = \frac{\text{Current assets}}{\text{Current liabilities}}
\]

For the Fictitious Corporation, the current ratio
for the current year is the ratio of current
assets, $3 million, to current liabilities, the sum
of accounts payable and other current liabilities,
or $1 million.

\[
\text{Current ratio} = \frac{\$3{,}000{,}000}{\$1{,}000{,}000} = 3.0 \text{ times}
\]


The current ratio of 3.0 indicates that Fictitious
has three times as much in current assets as it needs to
cover its current obligations during the year.
However, the current ratio groups all current
asset accounts together, as if they were all equally
easy to convert to cash. Even though, by definition,
current assets can be transformed into
cash within a year, not all current assets can be
transformed into cash in a short period of time.

An alternative to the current ratio is the quick 
ratio, also called the acid-test ratio, which uses a 
slightly different set of current accounts to cover 
the same current liabilities as in the current ra¬ 
tio. In the quick ratio, the least liquid of the 
current asset accounts, inventory, is excluded. 
Hence: 

\[
\text{Quick ratio} = \frac{\text{Current assets} - \text{Inventory}}{\text{Current liabilities}}
\]

We typically leave out inventories in the quick 
ratio because inventories are generally per¬ 
ceived as the least liquid of the current assets. 
By leaving out the least liquid asset, the quick 
ratio provides a more conservative view of 
liquidity. 

For Fictitious in the current year: 


\[
\text{Quick ratio} = \frac{\$3{,}000{,}000 - \$1{,}800{,}000}{\$1{,}000{,}000} = \frac{\$1{,}200{,}000}{\$1{,}000{,}000} = 1.2 \text{ times}
\]


Still another way to measure the firm's abil¬ 
ity to satisfy short-term obligations is the net 
working capital-to-sales ratio, which compares 
net working capital (current assets less current 
liabilities) with sales: 


\[
\text{Net working capital-to-sales ratio} = \frac{\text{Net working capital}}{\text{Sales}}
\]

This ratio tells us the "cushion" available to 
meet short-term obligations relative to sales. 
Consider two firms with identical working capital
of $100,000, but one has sales of $500,000 and
the other sales of $1 million. If they have identical
operating cycles, this means that the firm
with the greater sales has more funds flowing
in and out of its current asset investments (inventories
and receivables). The company with
more funds flowing in and out needs a larger
cushion to protect itself in case of a disruption in
the cycle, such as a labor strike or unexpected
delays in customer payments. The longer the
operating cycle, the more of a cushion (net
working capital) a firm needs for a given level
of sales.

For Fictitious Corporation: 


\[
\text{Net working capital-to-sales ratio} = \frac{\$3{,}000{,}000 - \$1{,}000{,}000}{\$10{,}000{,}000} = 0.2000 \text{ or } 20\%
\]


The ratio of 0.20 tells us that for every dollar
of sales, Fictitious has 20 cents of net working
capital to support it.
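
The three liquidity measures can be computed side by side. The following is a minimal sketch using the current-year Fictitious balances from the text; the variable names are illustrative.

```python
current_assets = 3_000_000
inventory = 1_800_000
current_liabilities = 1_000_000
sales = 10_000_000

# Current ratio: all current assets against current liabilities.
current_ratio = current_assets / current_liabilities

# Quick (acid-test) ratio: exclude inventory, the least liquid current asset.
quick_ratio = (current_assets - inventory) / current_liabilities

# Net working capital per dollar of sales.
net_working_capital = current_assets - current_liabilities
nwc_to_sales = net_working_capital / sales

print(f"Current ratio:      {current_ratio:.1f} times")
print(f"Quick ratio:        {quick_ratio:.1f} times")
print(f"NWC-to-sales ratio: {nwc_to_sales:.0%}")
```

The gap between the current ratio and the quick ratio shows how much of the apparent liquidity rests on inventory.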


Recap: Liquidity Ratios 

Operating cycle and liquidity ratio information 
for Fictitious using data for the current year, in 
summary, is: 

Days sales in inventory = 101 days
Days sales outstanding = 22 days
Operating cycle = 123 days
Days payables outstanding = 33 days
Cash conversion cycle = 90 days
Current ratio = 3.0
Quick ratio = 1.2
Net working capital-to-sales ratio = 20%

Given the measures of time related to the 
current accounts—the operating cycle and the 
cash conversion cycle—and the three measures 
of liquidity—current ratio, quick ratio, and net 
working capital-to-sales ratio—we know the 
following about Fictitious Corporation's ability 
to meet its short-term obligations: 

• Inventory is less liquid than accounts receiv¬ 
able (comparing days of inventory with days 
of credit). 


• Current assets are greater than needed to sat¬ 
isfy current liabilities in a year (from the cur¬ 
rent ratio). 

• The quick ratio tells us that Fictitious can 
meet its short-term obligations even without 
resorting to selling inventory. 

• The net working capital "cushion" is 20 cents
for every dollar of sales (from the net working
capital-to-sales ratio).

What don't ratios tell us about liquidity?

They don't provide us with answers to the fol¬ 
lowing questions: 

• How liquid are the accounts receivable? How
much of the accounts receivable will be collectible?
Whereas we know it takes, on average,
22 days to collect, we do not know how
much will never be collected.

• What is the nature of the current liabilities?
How much of current liabilities consists of
items that recur (such as accounts payable
and wages payable) each period and how
much consists of occasional items (such as income
taxes payable)?

• Are there any unrecorded liabilities (such as 
operating leases) that are not included in 
current liabilities? 


PROFITABILITY RATIOS 

Liquidity ratios indicate a firm's ability to meet 
its immediate obligations. Now we extend the 
analysis by adding profitability ratios, which help 
the analyst gauge how well a firm is managing 
its expenses. Profit margin ratios compare com¬ 
ponents of income with sales. They give the an¬ 
alyst an idea of which factors make up a firm's 
income and are usually expressed as a portion 
of each dollar of sales. For example, the profit 
margin ratios we discuss here differ only in the 
numerator. It is in the numerator that we can 
evaluate performance for different aspects of 
the business. 

For example, suppose the analyst wants to
evaluate how well production facilities are
managed. The analyst would focus on gross
profit (sales less cost of goods sold), a measure
of income that is the direct result of production
management. Comparing gross profit with
sales produces the gross profit margin:


\[
\text{Gross profit margin} = \frac{\text{Revenues} - \text{Cost of goods sold}}{\text{Revenues}}
\]

This ratio tells us the portion of each dollar 
of sales that remains after deducting produc¬ 
tion expenses. For Fictitious Corporation for the 
current year: 


\[
\text{Gross profit margin} = \frac{\$10{,}000{,}000 - \$6{,}500{,}000}{\$10{,}000{,}000} = \frac{\$3{,}500{,}000}{\$10{,}000{,}000} = 0.3500 \text{ or } 35\%
\]


For each dollar of revenues, the firm's gross 
profit is 35 cents. Looking at sales and cost of 
goods sold, we can see that the gross profit mar¬ 
gin is affected by: 

• Changes in sales volume, which affect cost of 
goods sold and sales. 

• Changes in sales price, which affect revenues. 

• Changes in the cost of production, which af¬ 
fect cost of goods sold. 

Any change in gross profit margin from one 
period to the next is caused by one or more 
of those three factors. Similarly, differences in 
gross margin ratios among firms are the result 
of differences in those factors. 

To evaluate operating performance, we need 
to consider operating expenses in addition to 
the cost of goods sold. To do this, remove 
operating expenses (e.g., selling and general 
administrative expenses) from gross profit, 
leaving operating profit, also referred to as earnings
before interest and taxes (EBIT). The operating
profit margin is therefore:

\[
\text{Operating profit margin} = \frac{\text{Revenues} - \text{Cost of goods sold} - \text{Operating expenses}}{\text{Revenues}} = \frac{\text{Earnings before interest and taxes}}{\text{Revenues}}
\]

For Fictitious in the current year:

\[
\text{Operating profit margin} = \frac{\$2{,}000{,}000}{\$10{,}000{,}000} = 0.20 \text{ or } 20\%
\]

Therefore, for each dollar of revenues, Fictitious
has 20 cents of operating income. The operating
profit margin is affected by the same factors
as gross profit margin, plus operating expenses
such as:

• Office rent and lease expenses

• Miscellaneous income (e.g., income from
investments)

• Advertising expenditures

• Bad debt expense

Most of these expenses are related in some way
to revenues, though they are not included directly
in the cost of goods sold. Therefore, the
difference between the gross profit margin and
the operating profit margin is due to these indirect
items that are included in computing the
operating profit margin.

Both the gross profit margin and the operating
profit margin reflect a company's operating performance.
But they do not consider how these
operations have been financed. To evaluate both
operating and financing decisions, the analyst
must compare net income (that is, earnings after
deducting interest and taxes) with revenues.
The result is the net profit margin:

\[
\text{Net profit margin} = \frac{\text{Net income}}{\text{Revenues}}
\]

The net profit margin tells the analyst the net
income generated from each dollar of revenues;
it considers financing costs that the operating
profit margin does not consider. For Fictitious
for the current year:

\[
\text{Net profit margin} = \frac{\$1{,}200{,}000}{\$10{,}000{,}000} = 0.12 \text{ or } 12\%
\]

For every dollar of revenues, Fictitious generates
12 cents in profits.


Recap: Profitability Ratios

The profitability ratios for Fictitious in the cur¬ 
rent year are: 

Gross profit margin = 35% 

Operating profit margin = 20% 

Net profit margin = 12% 

They indicate the following about the operat¬ 
ing performance of Fictitious: 

• Each dollar of revenues contributes 35 cents 
to gross profit and 20 cents to operating profit. 

• Every dollar of revenues contributes 12 cents 
to owners' earnings. 

• By comparing the 20-cent operating profit
margin with the 12-cent net profit margin,
we see that Fictitious has 8 cents of interest
and taxes for every dollar of revenues.
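
The three margins differ only in the numerator, which a short sketch makes explicit. The inputs are the current-year Fictitious figures from the text; the variable names are illustrative, and the two "share" figures are simply differences between adjacent margins.

```python
revenues = 10_000_000
cost_of_goods_sold = 6_500_000
ebit = 2_000_000          # operating profit
net_income = 1_200_000

gross_margin = (revenues - cost_of_goods_sold) / revenues
operating_margin = ebit / revenues
net_margin = net_income / revenues

# Portion of each revenue dollar absorbed between successive profit lines.
operating_items_share = gross_margin - operating_margin    # indirect operating items
interest_and_taxes_share = operating_margin - net_margin   # interest plus taxes

print(f"Gross {gross_margin:.0%}, operating {operating_margin:.0%}, net {net_margin:.0%}")
print(f"Operating items: {operating_items_share:.0%} of revenues")
print(f"Interest and taxes: {interest_and_taxes_share:.0%} of revenues")
```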

What these ratios do not indicate about prof¬ 
itability is the sensitivity of gross, operating, 
and net profit margins to: 

• Changes in the sales price 

• Changes in the volume of sales 

Looking at the profitability ratios for one firm 
for one period gives the analyst very little infor¬ 
mation that can be used to make judgments re¬ 
garding future profitability. Nor do these ratios 
provide the analyst any information about why 
current profitability is what it is. We need more 
information to make these kinds of judgments, 
particularly regarding the future profitability of 
the firm. For that, turn to activity ratios, which 
are measures of how well assets are being used. 

ACTIVITY RATIOS 

Activity ratios —for the most part, turnover 
ratios—can be used to evaluate the benefits pro¬ 
duced by specific assets, such as inventory or 
accounts receivable, or to evaluate the benefits 
produced by the totality of the firm's assets. 

Inventory Management 

The inventory turnover ratio indicates how
quickly a firm has used inventory to generate
the goods and services that are sold. The inventory
turnover is the ratio of the cost of goods
sold to inventory:

\[
\text{Inventory turnover ratio} = \frac{\text{Cost of goods sold}}{\text{Inventory}}
\]

For Fictitious for the current year:

\[
\text{Inventory turnover ratio} = \frac{\$6{,}500{,}000}{\$1{,}800{,}000} = 3.61 \text{ times}
\]

This ratio indicates that Fictitious turns over
its inventory 3.61 times per year. On average,
cash is invested in inventory, goods and services
are produced, and these goods and services
are sold 3.61 times a year. Looking back to
the number of days of inventory, we see that this
turnover measure is consistent with the results
of that calculation: There are 101 calendar days
of inventory on hand at the end of the year;
dividing 365 days by 101 days, we find that
inventory cycles through (from cash to sales)
3.61 times a year.

Accounts Receivable Management 

In much the same way inventory turnover can 
be evaluated, an analyst can evaluate a firm's 
management of its accounts receivable and its 
credit policy. The accounts receivable turnover 
ratio is a measure of how effectively a firm is 
using credit extended to customers. The rea¬ 
son for extending credit is to increase sales. 
The downside to extending credit is the pos¬ 
sibility of default—customers not paying when 
promised. The benefit obtained from extending 
credit is referred to as net credit sales —sales on 
credit less returns and refunds. 

\[
\text{Accounts receivable turnover} = \frac{\text{Net credit sales}}{\text{Accounts receivable}}
\]

Looking at the Fictitious Corporation income 
statement, we see an entry for sales, but we do 
not know how much of the amount stated is 
on credit. In the case of evaluating a firm, an
analyst would have an estimate of the amount
of credit sales. Let us assume that the entire
sales amount represents net credit sales. For
Fictitious for the current year:

\[
\text{Accounts receivable turnover} = \frac{\$10{,}000{,}000}{\$600{,}000} = 16.67 \text{ times}
\]

Therefore, almost 17 times in the year there is, 
on average, a cycle that begins with a sale on 
credit and finishes with the receipt of cash for 
that sale. In other words, there are 17 cycles of 
sales to credit to cash during the year. 

The number of times accounts receivable cy¬ 
cle through the year is consistent with the 
days sales outstanding (22) that we calculated 
earlier—accounts receivable turn over 17 times 
during the year, and the average number of 
days of sales in the accounts receivable balance 
is 365 days/16.67 times = 22 days. 
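
The consistency between turnover ratios and day counts follows from the fact that each is 365 divided by the other. A brief sketch, with illustrative helper names and the Fictitious figures from the text:

```python
DAYS_PER_YEAR = 365


def turnover(annual_flow, balance):
    """Number of times per year the balance cycles through."""
    return annual_flow / balance


def days_from_turnover(times_per_year, days=DAYS_PER_YEAR):
    """Average length of one cycle, in days."""
    return days / times_per_year


inventory_turnover = turnover(6_500_000, 1_800_000)     # cost of goods sold / inventory
receivables_turnover = turnover(10_000_000, 600_000)    # credit sales / receivables

print(f"Inventory:   {inventory_turnover:.2f} times, "
      f"{days_from_turnover(inventory_turnover):.0f} days")
print(f"Receivables: {receivables_turnover:.2f} times, "
      f"{days_from_turnover(receivables_turnover):.0f} days")
```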


Overall Asset Management 

The inventory and accounts receivable turnover 
ratios reflect the benefits obtained from the use 
of specific assets (inventory and accounts re¬ 
ceivable). For a more general picture of the pro¬ 
ductivity of the firm, an analyst can compare 
the sales during a period with the total assets 
that generated these revenues. 

One way is with the total asset turnover ratio,
which indicates how many times during the
year the value of a firm's total assets is generated
in revenues:

\[
\text{Total asset turnover} = \frac{\text{Revenues}}{\text{Total assets}}
\]

For Fictitious in the current year:

\[
\text{Total asset turnover} = \frac{\$10{,}000{,}000}{\$11{,}000{,}000} = 0.91 \text{ times}
\]

The turnover ratio of 0.91 indicates that in the
current year, every dollar invested in total assets
generates 91 cents of sales. Or, stated differently,
the total assets of Fictitious turn over
almost once during the year. Because total assets
include both tangible and intangible assets,
this turnover indicates how efficiently all assets
were used.

An alternative is to focus only on fixed assets,
the long-term, tangible assets of the firm. The
fixed-asset turnover is the ratio of revenues to
fixed assets:

\[
\text{Fixed asset turnover ratio} = \frac{\text{Revenues}}{\text{Fixed assets}}
\]

For Fictitious in the current year:

\[
\text{Fixed asset turnover ratio} = \frac{\$10{,}000{,}000}{\$7{,}000{,}000} = 1.43 \text{ times}
\]


Therefore, for every dollar of fixed assets, Fictitious
is able to generate $1.43 of revenues.
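
The two asset-level turnover ratios can be sketched together. The inputs are the Fictitious figures from the text; the variable names are illustrative.

```python
revenues = 10_000_000
total_assets = 11_000_000
fixed_assets = 7_000_000

# Revenue generated per dollar of assets, for all assets and for fixed assets only.
total_asset_turnover = revenues / total_assets
fixed_asset_turnover = revenues / fixed_assets

print(f"Total asset turnover: {total_asset_turnover:.2f} times")
print(f"Fixed asset turnover: {fixed_asset_turnover:.2f} times")
```

Because fixed assets are a subset of total assets, the fixed asset turnover can never be lower than the total asset turnover for the same revenues.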


Recap: Activity Ratios 

The activity ratios for Fictitious Corporation 
are: 

Inventory turnover ratio = 3.61 times
Accounts receivable turnover ratio = 16.67 times
Total asset turnover ratio = 0.91 times
Fixed-asset turnover ratio = 1.43 times

From these ratios the analyst can determine 
that: 

• Inventory flows in and out almost four times 
a year (from the inventory turnover ratio). 

• Accounts receivable are collected in cash, on 
average, 22 days after a sale (from the number 
of days of credit). In other words, accounts 
receivable flow in and out almost 17 times 
during the year (from the accounts receivable 
turnover ratio). 

Here is what these ratios do not indicate about
the firm's use of its assets:

• The sales not made because credit policies are
too stringent.

• How much of credit sales is not collectible.

• Which assets contribute most to the turnover.


FINANCIAL LEVERAGE RATIOS

A firm can finance its assets with equity or with 
debt. Financing with debt legally obligates the 
firm to pay interest and to repay the principal 
as promised. Equity financing does not obli¬ 
gate the firm to pay anything because dividends 
are paid at the discretion of the board of direc¬ 
tors. There is always some risk, which we re¬ 
fer to as business risk, inherent in any business 
enterprise. But how a firm chooses to finance 
its operations—the particular mix of debt and 
equity—may add financial risk on top of busi¬ 
ness risk. Financial risk is risk associated with a 
firm's ability to satisfy its debt obligations, and 
is often measured using the extent to which debt 
financing is used relative to equity. 

Financial leverage ratios are used to assess how 
much financial risk the firm has taken on. There 
are two types of financial leverage ratios: com¬ 
ponent percentages and coverage ratios. Com¬ 
ponent percentages compare a firm's debt with 
either its total capital (debt plus equity) or its 
equity capital. Coverage ratios reflect a firm's 
ability to satisfy fixed financing obligations, 
such as interest, principal repayment, or lease 
payments. 


Component Percentage Ratios 

A ratio that indicates the proportion of assets fi¬ 
nanced with debt is the debt-to-assets ratio, which 
compares total liabilities (short-term + long¬ 
term debt) with total assets: 


\[
\text{Total debt-to-assets ratio} = \frac{\text{Debt}}{\text{Total assets}}
\]

For Fictitious in the current year:

\[
\text{Total debt-to-assets ratio} = \frac{\$5{,}000{,}000}{\$11{,}000{,}000} = 0.4545 \text{ or } 45.45\%
\]


This ratio indicates that 45% of the firm's assets 
are financed with debt (both short term and 
long term). 

Another way to look at the financial risk is in
terms of the use of debt relative to the use of equity.
The debt-to-equity ratio indicates how the
firm finances its operations with debt relative
to the book value of its shareholders' equity:


\[
\text{Debt-to-equity ratio} = \frac{\text{Debt}}{\text{Book value of shareholders' equity}}
\]

For Fictitious for the current year, using the
book-value definition:

\[
\text{Debt-to-equity ratio} = \frac{\$5{,}000{,}000}{\$6{,}000{,}000} = 0.8333 \text{ or } 83.33\%
\]


For every $1 of book value of shareholders'
equity, Fictitious uses 83 cents of debt.
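
The two component percentages, and their link to the equity multiplier used in the DuPont breakdown, can be sketched as follows. The inputs are the Fictitious book values from the text; the variable names are illustrative, and the identities shown assume that total assets equal debt plus book equity, as they do here.

```python
debt = 5_000_000
equity = 6_000_000              # book value of shareholders' equity
total_assets = debt + equity    # $11,000,000, matching the text

debt_to_assets = debt / total_assets
debt_to_equity = debt / equity
equity_multiplier = total_assets / equity

# Identities that hold when total assets = debt + equity.
implied_debt_to_equity = debt_to_assets / (1 - debt_to_assets)
implied_multiplier = 1 + debt_to_equity

print(f"Debt-to-assets:    {debt_to_assets:.2%}")
print(f"Debt-to-equity:    {debt_to_equity:.2%}")
print(f"Equity multiplier: {equity_multiplier:.4f}")
```

Any one of the three measures therefore determines the other two, which is why they tell the same story about financial leverage.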

Both of these ratios can be stated in terms of 
total debt, as above, or in terms of long-term 
debt or even simply interest-bearing debt. And 
it is not always clear in which form—total, long¬ 
term debt, or interest-bearing—the ratio is cal¬ 
culated. Additionally, it is often the case that the 
current portion of long-term debt is excluded 
in the calculation of the long-term versions of 
these debt ratios. 


Book Value versus Market Value

One problem with using a financial ratio based
on the book value of equity to analyze financial 
risk is that there is seldom a strong relationship 
between the book value and market value of a 
stock. The distortion in values on the balance 
sheet is obvious by looking at the book value of 
equity and comparing it with the market value 
of equity. The book value of equity consists of: 

• The proceeds to the firm of all the stock issues 
since it was first incorporated, less any stock 
repurchased by the firm. 

• The cumulative earnings of the firm, less
any dividends, since it was first incorporated.

Let's look at an example of the book value versus
the market value of equity. IBM was incorporated
in 1911, so the book value of its equity
represents the sum of all its stock issued and
all its earnings, less any dividends paid since
1911. As of the end of 2006, IBM's book value of
equity was approximately $28.5 billion, yet its
market value was $142.8 billion.

Book value generally does not give a true pic¬ 
ture of the investment of shareholders in the 
firm because: 

• Earnings are recorded according to account¬ 
ing principles, which may not reflect the true 
economics of transactions. 

• Due to inflation, the earnings and proceeds 
from stock issued in the past do not reflect 
today's values. 

Market value, on the other hand, is the value 
of equity as perceived by investors. It is what 
investors are willing to pay. So why bother with 
book value? For two reasons: First, it is easier to 
obtain the book value than the market value of 
a firm's securities, and second, many financial 
services report ratios using book value rather 
than market value. 

However, any of the ratios presented in this 
entry that use the book value of equity can be 
restated using the market value of equity. For 
example, instead of using the book value of eq¬ 
uity in the debt-to-equity ratio, the market value 
of equity to measure the firm's financial lever¬ 
age can be used. 
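
As a rough illustration of restating a book-value ratio in market-value terms, the sketch below computes IBM's market-to-book multiple from the two figures quoted above and then applies both equity measures to a debt-to-equity ratio. The debt amount and the function name are hypothetical, chosen only to show the mechanics.

```python
def debt_to_equity(total_debt: float, equity: float) -> float:
    """Debt-to-equity ratio; pass either the book or the market value of equity."""
    return total_debt / equity


# IBM at the end of 2006, in $ billions (figures quoted in the text)
book_equity = 28.5
market_equity = 142.8
market_to_book = market_equity / book_equity

# Hypothetical debt amount for illustration only (not IBM's actual debt)
assumed_debt = 20.0
de_book = debt_to_equity(assumed_debt, book_equity)
de_market = debt_to_equity(assumed_debt, market_equity)

print(f"Market-to-book multiple: {market_to_book:.2f}")
print(f"Debt-to-equity (book):   {de_book:.1%}")
print(f"Debt-to-equity (market): {de_market:.1%}")
```

Because market value is roughly five times book value here, the same amount of debt looks far less significant when leverage is measured against the market value of equity.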

Coverage Ratios 

The ratios that compare debt to equity or debt to 
assets indicate the amount of financial leverage, 
which enables an analyst to assess the financial 
condition of a firm. Another way of looking at 
the financial condition and the amount of finan¬ 
cial leverage used by the firm is to see how well 
it can handle the financial burdens associated 
with its debt or other fixed commitments. 

One measure of a firm's ability to handle fi¬ 
nancial burdens is the interest coverage ratio, also 
referred to as the times interest-covered ratio. This 
ratio tells us how well the firm can cover or meet 
the interest payments associated with debt. The 
ratio compares the funds available to pay interest
(that is, earnings before interest and taxes)
with the interest expense:

$$\text{Interest coverage ratio} = \frac{\text{EBIT}}{\text{Interest expense}}$$

The greater the interest coverage ratio, the bet¬ 
ter able the firm is to pay its interest expense. 
For Fictitious for the current year: 

$$\text{Interest coverage ratio} = \frac{\$2{,}000{,}000}{\$400{,}000} = 5 \text{ times}$$

An interest coverage ratio of 5 means that the 
firm's earnings before interest and taxes are five 
times greater than its interest payments. 
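
The calculation can be sketched in a few lines of Python. The function name is an illustrative choice; the inputs are the Fictitious figures used above.

```python
def interest_coverage(ebit: float, interest_expense: float) -> float:
    """Earnings before interest and taxes divided by interest expense."""
    return ebit / interest_expense


# Fictitious Corporation, current year (figures from the text)
ratio = interest_coverage(ebit=2_000_000, interest_expense=400_000)
print(f"Interest coverage ratio: {ratio:.2f} times")
```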

The interest coverage ratio provides informa¬ 
tion about a firm's ability to cover the interest 
related to its debt financing. However, there are 
other costs that do not arise from debt but that 
nevertheless must be considered in the same 
way we consider the cost of debt in a firm's 
financial obligations. For example, lease pay¬ 
ments are fixed costs incurred in financing op¬ 
erations. Like interest payments, they represent 
legal obligations. 

What funds are available to pay debt and 
debt-like expenses? Start with EBIT and add 
back expenses that were deducted to arrive at 
EBIT. The ability of a firm to satisfy its fixed
financial costs (its fixed charges) is measured by
the fixed-charge coverage ratio. One definition
of the fixed-charge coverage considers only the
lease payments:

$$\text{Fixed-charge coverage ratio} = \frac{\text{EBIT} + \text{Lease expense}}{\text{Interest} + \text{Lease expense}}$$

For Fictitious for the current year:

$$\text{Fixed-charge coverage ratio} = \frac{\$2{,}000{,}000 + \$1{,}000{,}000}{\$400{,}000 + \$1{,}000{,}000} = 2.14 \text{ times}$$

This ratio tells us that Fictitious's earnings can 
cover its fixed charges (interest and lease pay¬ 
ments) more than two times over. 
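
A minimal sketch of this lease-only definition, using the Fictitious figures from the text; the function and parameter names are illustrative.

```python
def fixed_charge_coverage(ebit: float, lease_expense: float, interest: float) -> float:
    """(EBIT + lease expense) / (interest + lease expense)."""
    return (ebit + lease_expense) / (interest + lease_expense)


# Fictitious Corporation, current year (figures from the text)
fcc = fixed_charge_coverage(ebit=2_000_000, lease_expense=1_000_000, interest=400_000)
print(f"Fixed-charge coverage ratio: {fcc:.2f} times")
```

The lease expense is added to the numerator because it was deducted in arriving at EBIT, and to the denominator because it is one of the charges being covered.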

What fixed charges to consider is not entirely 
clear-cut. For example, if the firm is required 
to set aside funds to eventually or periodically 
retire debt—referred to as sinking funds—is the
amount set aside a fixed charge? As another 
example, since preferred dividends represent a 
fixed financing charge, should they be included 
as a fixed charge? From the perspective of the 
common shareholder, the preferred dividends 
must be covered either to enable the payment 
of common dividends or to retain earnings for 
future growth. Because debt principal repay¬ 
ment and preferred stock dividends are paid 
on an after-tax basis—paid out of dollars re¬ 
maining after taxes are paid—this fixed charge 
must be converted to before-tax dollars. The 
fixed charge coverage ratio can be expanded to 
accommodate the sinking funds and preferred 
stock dividends as fixed charges. 
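
One way such an expanded ratio might be computed is sketched below. This is an assumption about the form of the expansion, not a formula given in the text: after-tax charges are converted to before-tax dollars by dividing by (1 - tax rate). The sinking-fund amount, the preferred dividend amount, and the 25% tax rate are hypothetical.

```python
def expanded_fixed_charge_coverage(
    ebit: float,
    lease_expense: float,
    interest: float,
    sinking_fund: float,
    preferred_dividends: float,
    tax_rate: float,
) -> float:
    """Fixed-charge coverage with after-tax charges grossed up to before-tax dollars."""
    # Sinking-fund payments and preferred dividends are paid out of after-tax
    # dollars, so each dollar requires 1 / (1 - tax_rate) dollars before tax.
    before_tax_equivalent = (sinking_fund + preferred_dividends) / (1.0 - tax_rate)
    return (ebit + lease_expense) / (interest + lease_expense + before_tax_equivalent)


expanded = expanded_fixed_charge_coverage(
    ebit=2_000_000,               # from the text
    lease_expense=1_000_000,      # from the text
    interest=400_000,             # from the text
    sinking_fund=150_000,         # hypothetical
    preferred_dividends=150_000,  # hypothetical
    tax_rate=0.25,                # assumed
)
print(f"Expanded fixed-charge coverage: {expanded:.2f} times")
```

With these assumed inputs the $300,000 of after-tax charges requires $400,000 of before-tax earnings, so coverage falls below the lease-only figure.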

Up to now we considered earnings before 
interest and taxes as funds available to meet 
fixed financial charges. EBIT includes noncash 
items such as depreciation and amortization. 
If an analyst is trying to compare funds avail¬ 
able to meet obligations, a better measure of 
available funds is cash flow from operations, 
as reported in the statement of cash flows. A 
ratio that considers cash flows from operations 
as funds available to cover interest payments is 
referred to as the cash-flow interest coverage ratio. 

$$\text{Cash-flow interest coverage ratio} = \frac{\text{Cash flow from operations} + \text{Interest} + \text{Taxes}}{\text{Interest}}$$

The amount of cash flow from operations that 
is in the statement of cash flows is net of interest 
and taxes. So we have to add back interest and 
taxes to cash flow from operations to arrive at 
the cash flow amount before interest and taxes 
in order to determine the cash flow available to 
cover interest payments. 
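
The add-back described above can be sketched as follows, using the Fictitious figures for the current year (cash flow from operations of $1,800,000, interest of $400,000, and taxes of $400,000). The function name is illustrative.

```python
def cash_flow_interest_coverage(cash_flow_from_ops: float, interest: float, taxes: float) -> float:
    """Cash flow before interest and taxes, divided by interest."""
    # Reported cash flow from operations is net of interest and taxes,
    # so both are added back to obtain the cash available to pay interest.
    cash_before_interest_and_taxes = cash_flow_from_ops + interest + taxes
    return cash_before_interest_and_taxes / interest


cf_coverage = cash_flow_interest_coverage(1_800_000, 400_000, 400_000)
print(f"Cash-flow interest coverage ratio: {cf_coverage:.1f} times")
```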

For Fictitious for the current year:

$$\text{Cash-flow interest coverage ratio} = \frac{\$1{,}800{,}000 + \$400{,}000 + \$400{,}000}{\$400{,}000} = \frac{\$2{,}600{,}000}{\$400{,}000} = 6.5 \text{ times}$$

This coverage ratio indicates that, in terms of
cash flows, Fictitious has 6.5 times more cash
than is needed to pay its interest. This is a better
picture of interest coverage than the five
times reflected by EBIT. Why the difference?
Because cash flow considers not just the accounting
income, but noncash items as well.
In the case of Fictitious, depreciation is a noncash
charge that reduced EBIT but not cash flow
from operations: it is added back to net income
to arrive at cash flow from operations.


Recap: Financial Leverage Ratios

Summarizing, the financial leverage ratios for
Fictitious Corporation for the current year are:

Debt-to-assets ratio              = 45.45%
Debt-to-equity ratio              = 83.33%
Interest coverage ratio           = 5.00 times
Fixed-charge coverage ratio       = 2.14 times
Cash-flow interest coverage ratio = 6.50 times

These ratios indicate that Fictitious uses its
financial leverage as follows:

• Assets are 45% financed with debt, measured
using book values.

• Long-term debt is approximately two-thirds
of equity. When equity is measured in market
value terms, long-term debt is approximately
one-sixth of equity.

These ratios do not indicate:

• What other fixed, legal commitments the firm
has that are not included on the balance sheet
(for example, operating leases).

• What the intentions of management are regarding
taking on more debt as the existing
debt matures.


COMMON-SIZE ANALYSIS 

An analyst can evaluate a company's operating 
performance and financial condition through 
ratios that relate various items of information 
contained in the financial statements. Another 
way to analyze a firm is to look at its financial
data more comprehensively. 

Common-size analysis is a method of analysis in 
which the components of a financial statement 
are compared with each other. The first step in 
common-size analysis is to break down a finan¬ 
cial statement—either the balance sheet or the 
income statement—into its parts. The next step 
is to calculate the proportion that each item rep¬ 
resents relative to some benchmark. This form 
of common-size analysis is sometimes referred 
to as vertical common-size analysis. Another form 
of common-size analysis is horizontal common- 
size analysis, which uses either an income 
statement or a balance sheet in a fiscal year and 
compares accounts to the corresponding items 
in another year. In common-size analysis of the 
balance sheet, the benchmark is total assets. For 
the income statement, the benchmark is sales. 

Let us see how it works by doing some
common-size financial analysis for the Fictitious
Corporation. The company's balance
sheet is restated in Table 4. This statement does
not look precisely like the balance sheet we have
seen before. Nevertheless, the data are the same
but reorganized. Each item in the original balance
sheet has been restated as a proportion
of total assets for the purpose of common-size
analysis. Hence, we refer to this as the common-size
balance sheet.


Table 4 Fictitious Corporation Common-Size Balance
Sheets for Years Ending December 31

                                          Current Year   Prior Year
Asset components
  Cash                                         3.6%          2.0%
  Marketable securities                        1.8%          0.0%
  Accounts receivable                          5.5%          8.0%
  Inventory                                   16.4%         10.0%
  Current assets                              27.3%         20.0%
  Net plant and equipment                     63.5%         70.0%
  Intangible assets                            9.2%         10.0%
  Total assets                               100.0%        100.0%

Liability and shareholders' equity components
  Accounts payable                             4.6%          4.0%
  Other current liabilities                    4.6%          2.0%
  Long-term debt                              36.4%         50.0%
  Total liabilities                           45.4%         56.0%
  Shareholders' equity                        54.6%         44.0%
  Total liabilities and
    shareholders' equity                     100.0%        100.0%

In this balance sheet, we see, for example, that
in the current year cash is 3.6% of total assets,
or $400,000/$11,000,000 = 0.036. The largest
investment is in plant and equipment, which
comprises 63.6% of total assets. On the liabilities
side, we see that current liabilities are a small
portion (9.1%) of liabilities and equity.

The common-size balance sheet indicates in 
very general terms how Fictitious has raised 
capital and where this capital has been invested. 
As with financial ratios, however, the picture 
is not complete until trends are examined and 
compared with those of other firms in the same 
industry. 
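
Vertical common-size analysis of a balance sheet amounts to dividing each item by total assets. The sketch below illustrates this. Only the cash balance ($400,000) and total assets ($11,000,000) come from the text; the other dollar amounts are assumptions chosen to be roughly consistent with the percentages in Table 4, so small rounding differences from the table are expected.

```python
def common_size(items: dict[str, float], benchmark: float) -> dict[str, float]:
    """Restate each item as a proportion of the benchmark amount."""
    return {name: amount / benchmark for name, amount in items.items()}


total_assets = 11_000_000  # from the text
assets = {
    "Cash": 400_000,                       # from the text
    "Marketable securities": 200_000,      # assumed
    "Accounts receivable": 600_000,        # assumed
    "Inventory": 1_800_000,                # assumed
    "Net plant and equipment": 7_000_000,  # assumed
    "Intangible assets": 1_000_000,        # assumed
}

shares = common_size(assets, total_assets)
for name, share in shares.items():
    print(f"{name:<26}{share:>7.1%}")
```

The same function works for the income statement by passing sales as the benchmark instead of total assets.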

In the income statement, as with the balance 
sheet, the items may be restated as a propor¬ 
tion of sales; this statement is referred to as 
the common-size income statement. The common- 
size income statements for Fictitious for the two 
years are shown in Table 5. For the current year, 
the major costs are associated with goods sold 
(65%); lease expense, other expenses, interest, 
taxes, and dividends make up smaller portions 
of sales. Looking at gross profit, EBIT, and net 
income, these proportions are the profit mar¬ 
gins we calculated earlier. The common-size in¬ 
come statement provides information on the 
profitability of different aspects of the firm's 
business. Again, the picture is not yet complete. 

Table 5 Fictitious Corporation Common-Size Income
Statement for Years Ending December 31

                                      Current Year   Prior Year
Sales                                    100.0%        100.0%
Cost of goods sold                        65.0%         66.7%
Gross profit                              35.0%         33.3%
Lease and administrative expenses         15.0%         11.1%
Earnings before interest and taxes        20.0%         22.2%
Interest expense                           4.0%          5.6%
Earnings before taxes                     16.0%         16.6%
Taxes                                      4.0%          5.5%
Net income                                12.0%         11.1%
Common dividends                           6.0%          5.6%
Retained earnings                          6.0%          5.5%

For a more complete picture, the analyst must
look at trends over time and make comparisons 
with other companies in the same industry. 
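
A common-size income statement can be built the same way, with sales as the benchmark. In the sketch below, EBIT ($2,000,000) and interest ($400,000) are the Fictitious figures used earlier. The sales figure of $10,000,000 and the remaining amounts are assumptions inferred from the current-year percentages in Table 5, not figures stated in this section.

```python
sales = 10_000_000                         # assumed (EBIT of $2,000,000 is 20% of sales)
cost_of_goods_sold = 6_500_000             # assumed (65% of sales)
lease_and_admin_expenses = 1_500_000       # assumed (15% of sales)
interest_expense = 400_000                 # from the text
taxes = 400_000                            # from the text

gross_profit = sales - cost_of_goods_sold
ebit = gross_profit - lease_and_admin_expenses
earnings_before_taxes = ebit - interest_expense
net_income = earnings_before_taxes - taxes

income_statement = {
    "Sales": sales,
    "Cost of goods sold": cost_of_goods_sold,
    "Gross profit": gross_profit,
    "Lease and administrative expenses": lease_and_admin_expenses,
    "Earnings before interest and taxes": ebit,
    "Interest expense": interest_expense,
    "Earnings before taxes": earnings_before_taxes,
    "Taxes": taxes,
    "Net income": net_income,
}

# Each line item as a proportion of sales
common_size_income = {name: amount / sales for name, amount in income_statement.items()}
for name, share in common_size_income.items():
    print(f"{name:<36}{share:>7.1%}")
```

The gross profit, EBIT, and net income lines of the output are the gross, operating, and net profit margins.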


USING FINANCIAL 
RATIO ANALYSIS 

Financial analysis provides information con¬ 
cerning a firm's operating performance and fi¬ 
nancial condition. This information is useful 
for an analyst in evaluating the performance 
of the company as a whole, as well as of divisions,
products, and subsidiaries. An analyst
should also be aware that outside analysts and
investors use financial analysis to gauge the
financial performance of the company.

But financial ratio analysis cannot tell the 
whole story and must be interpreted and used 
with care. Financial ratios are useful but, as 
noted in the discussion of each ratio, there is 
information that the ratios do not reveal. For 
example, in calculating inventory turnover, we 
need to assume that the inventory shown on 
the balance sheet is representative of inventory 
throughout the year. Another example is in the 
calculation of accounts receivable turnover. We 
assumed that all sales were on credit. If we 
are on the outside looking in—that is, evalu¬ 
ating a firm based on its financial statements 
only, such as the case of a financial analyst or 
investor—and therefore do not have data on 
credit sales, assumptions must be made that 
may or may not be correct. 

In addition, there are other areas of concern 
that an analyst should be aware of in using fi¬ 
nancial ratios: 

• Limitations in the accounting data used to 
construct the ratios. 

• Selection of an appropriate benchmark firm 
or firms for comparison purposes. 

• Interpretation of the ratios. 

• Pitfalls in forecasting future operating perfor¬ 
mance and financial condition based on past 
trends. 


KEY POINTS 

• The basic data for financial analysis are the fi¬ 
nancial statement data. These data are used 
to analyze relationships between different 
elements of a firm's financial statements. 
Through this analysis, a picture of the operat¬ 
ing performance and financial condition of a 
firm can be developed. 

• Looking at the calculated financial ratios, 
in conjunction with industry and economic 
data, judgments about past and future finan¬ 
cial performance and condition can be made. 

• Financial ratios can be classified by type
(coverage, return, turnover, or component
percentage) or by the financial characteristic
that we wish to measure (liquidity,
profitability, activity, financial leverage, or
return).

• Liquidity ratios indicate a firm's ability to satisfy
short-term obligations. These ratios are
closely related to a firm's operating cycle,
which tells us how long it takes a firm to turn
its investment in current assets back into cash.

• Profitability ratios indicate how well a firm 
manages its assets, typically in terms of the 
proportion of revenues that are left over after 
expenses. 

• Activity ratios measure how efficiently a firm 
manages its assets, that is, how effectively a 
firm uses its assets to generate sales. 

• Financial leverage ratios indicate (1) to what 
extent a firm uses debt to finance its oper¬ 
ations and (2) its ability to satisfy debt and 
debt-like obligations. 

• Return-on-investment ratios provide a gauge 
for how much of each dollar of an investment 
is generated in a period. 

• The DuPont system breaks down return ratios 
into their profit margin and activity ratios, 
allowing us to analyze changes in return on 
investments. 

• Common-size analysis expresses financial
statement data relative to some benchmark
item, usually total assets for the balance
sheet and sales for the income statement.
Representing financial data in this way allows
an analyst to spot trends in investments and
profitability.

• Interpretation of financial ratios requires an
analyst to put the trends and comparisons
in perspective with the company's significant
events. In addition to company-specific
events, issues that can cause the analysis of
financial ratios to become more challenging
include the use of historical accounting values,
changes in accounting principles, and accounts
that are difficult to classify.

• Comparison of financial ratios across time
and with competitors is useful in gauging
performance. In comparing ratios over time, an
analyst should consider changes in accounting
and significant company events. In comparing
ratios with a benchmark, an analyst
must take care in the selection of the companies
that constitute the benchmark and the
method of calculation.
Abstract: An objective of financial analysis is to assess a company's operating performance and
financial condition. The information that is available for analysis includes economic, market, and 
financial information. But some of the most important financial data are provided by the company 
in its annual and quarterly financial statements. These choices make it quite difficult to compare 
financial performance and condition across companies, and also provide an opportunity for the 
management of financial numbers through judicious choice of accounting methods. Cash flows 
provide a way of transforming net income based on an accrual system to a more comparable 
basis. Additionally, cash flows are essential ingredients in valuation: The value of a company 
today is the present value of its expected future cash flows. Therefore, understanding past and 
current cash flows may help in forecasting future cash flows and, hence, determine the value of 
the company. Moreover, understanding cash flow allows the assessment of the ability of a firm to
maintain current dividends and its current capital expenditure policy without relying on external 
financing. 


One of the key financial measures that an ana¬ 
lyst should understand is the company's cash 
flow. This is because the cash flow aids the an¬ 
alyst in assessing the ability of the company 
to satisfy its contractual obligations and main¬ 
tain current dividends and current capital ex¬ 
penditure policy without relying on external 
financing. Moreover, an analyst must understand
why this measure is important for external
parties, specifically stock analysts covering
the company. The reason is that the basic valuation
principle followed by stock analysts is that
the value of a company today is the present 


value of its expected future cash flows. In this 
entry we discuss cash-flow analysis. 


DIFFICULTIES WITH 
MEASURING CASH FLOW 

The primary difficulty with measuring a cash 
flow is that it is a flow: Cash flows into the com¬ 
pany (cash inflows) and cash flows out of the 
company (cash outflows). At any point in time 
there is a stock of cash on hand, but the stock of 
cash on hand varies among companies because 
of the size of the company, the cash demands of
the business, and a company's management of 
working capital. So what is cash flow? Is it the 
total amount of cash flowing into the company 
during a period? Is it the total amount of cash 
flowing out of the company during a period? 
Is it the net of the cash inflows and outflows 
for a period? Well, there is no specific definition 
of cash flow—and that's probably why there is 
so much confusion regarding the measurement 
of cash flow. Ideally, a measure of the com¬ 
pany's operating performance that is compa¬ 
rable among companies is needed—something 
other than net income. 

A simple, yet crude method of calculating 
cash flow requires simply adding noncash ex¬ 
penses (e.g., depreciation and amortization) to 
the reported net income amount to arrive at 
cash flow. For example, the estimated cash flow 
for Procter & Gamble (P&G) for 2002 is: 

Estimated cash flow
= Net income + Depreciation and amortization
= $4,352 million + $1,693 million
= $6,045 million

This amount is not really a cash flow, but sim¬ 
ply earnings before depreciation and amorti¬ 
zation. Is this a cash flow that stock analysts 
should use in valuing a company? Though not 
a cash flow, this estimated cash flow does allow 
a quick comparison of income across firms that 
may use different depreciation methods and
depreciable lives. (As an example of the use of this
estimate of cash flow, The Value Line Investment
Survey, published by Value Line, Inc., reports a
cash flow per share amount, calculated as reported
earnings plus depreciation, minus any
preferred dividends, stated per share of common
stock.) [Guide to Using the Value Line Investment
Survey (New York: Value Line, Inc.),
p. 19.]

The problem with this measure is that it ig¬ 
nores the many other sources and uses of cash 
during the period. Consider the sale of goods 
for credit. This transaction generates sales for 


the period. Sales and the accompanying cost 
of goods sold are reflected in the period's net 
income and the estimated cash flow amount. 
However, until the account receivable is col¬ 
lected, there is no cash from this transaction. 
If collection does not occur until the next pe¬ 
riod, there is a misalignment of the income and 
cash flow arising from this transaction. There¬ 
fore, the simple estimated cash flow ignores 
some cash flows that, for many companies, are 
significant. 
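
The simple estimate and the effect of uncollected credit sales can be sketched as follows. The net income and depreciation and amortization figures are the P&G 2002 amounts quoted above, in $ millions; the increase in accounts receivable is a hypothetical number used only to show the misalignment between income and cash.

```python
def estimated_cash_flow(net_income: float, depreciation_and_amortization: float) -> float:
    """Crude estimate: net income plus noncash depreciation and amortization."""
    return net_income + depreciation_and_amortization


# Procter & Gamble, 2002, in $ millions (figures from the text)
pg_estimate = estimated_cash_flow(net_income=4_352, depreciation_and_amortization=1_693)

# Hypothetical: credit sales not yet collected raised receivables by $300 million.
# That revenue is already in net income, but no cash has been received for it.
assumed_increase_in_receivables = 300
cash_after_receivables = pg_estimate - assumed_increase_in_receivables

print(f"Estimated cash flow:                 ${pg_estimate:,} million")
print(f"After uncollected credit sales:      ${cash_after_receivables:,} million")
```

The gap between the two figures is the kind of working capital effect that the simple estimate leaves out.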

Another estimate of cash flow that is simple to 
calculate is earnings before interest, taxes, de¬ 
preciation, and amortization (EBITDA). How¬ 
ever, this measure suffers from the same 
accrual-accounting bias as the previous mea¬ 
sure, which may result in the omission of sig¬ 
nificant cash flows. Additionally, EBITDA does 
not consider interest and taxes, which may also 
be substantial cash outflows for some compa¬ 
nies. (For a more detailed discussion of the 
EBITDA measure, see Eastman [1997].) 

These two rough estimates of cash flows are 
used in practice not only for their simplicity, but 
because they experienced widespread use prior 
to the disclosure of more detailed information in 
the statement of cash flows. Currently, the measures
of cash flow are wide ranging, including
the simplistic cash flow measures, measures
developed from the statement of cash flows, and
measures that seek to capture the theoretical
concept of free cash flow.


CASH FLOWS AND THE 
STATEMENT OF CASH 
FLOWS 

Prior to the adoption of the statement of cash
flows, the information regarding cash flows was
quite limited. The first statement that addressed
the issue of cash flows was the statement of
changes in financial position, which was required
starting in 1971 (APB Opinion No. 19, "Reporting
Changes in Financial Position"). This statement
was quite limited, requiring an analysis of the
sources and uses of funds in a variety of formats.
In its earlier years of adoption, most companies
provided this information using what
is referred to as the working capital concept, a
presentation of working capital provided and
applied during the period. Over time, many
companies began presenting this information
using the cash concept, which is a more detailed
presentation of the cash flows provided by
operations, investing, and financing activities.

Consistent with the cash concept format of 
the funds flow statement, the statement of cash 
flows is now a required financial statement. The 
requirement that companies provide a state¬ 
ment of cash flows applies to fiscal years after 
1987 (Statement of Financial Accounting Stan¬ 
dards No. 95, "Statement of Cash Flows"). This 
statement requires the company to classify cash 
flows into three categories, based on the activ¬ 
ity: operating, investing, and financing. Cash 
flows are summarized by activity and within 
activity by type (e.g., asset dispositions are re¬ 
ported separately from asset acquisitions). 

The reporting company may report the 
cash flows from operating activities on the 
statement of cash flows using either the di¬ 
rect method —reporting all cash inflows and 
outflows—or the indirect method —starting with 
net income and making adjustments for de¬ 
preciation and other noncash expenses and for 
changes in working capital accounts. Though 
the direct method is recommended, it is also the 
most burdensome for the reporting company 
to prepare. Most companies report cash flows 
from operations using the indirect method. The 
indirect method has the advantage of provid¬ 
ing the financial statement user with a recon¬ 
ciliation of the company's net income with the 
change in cash. The indirect method produces a 
cash flow from operations that is similar to the 
estimated cash flow measure discussed previ¬ 
ously, yet it encompasses the changes in work¬ 
ing capital accounts that the simple measure 
does not. For example, Procter & Gamble's cash
flow from operating activities (taken from their
2002 statement of cash flows) is $7,742 million,
which is over $1 billion more than the cash flow
that we estimated earlier. (Procter & Gamble's
fiscal year ends June 30, 2002.)

The classification of cash flows into the three 
types of activities provides useful information 
that can be used by an analyst to see, for ex¬ 
ample, whether the company is generating suf¬ 
ficient cash flows from operations to sustain 
its current rate of growth. However, the clas¬ 
sification of particular items is not necessarily 
as useful as it could be. Consider some of the 
classifications: 

• Cash flows related to interest expense are clas¬ 
sified in operations, though they are clearly 
financing cash flows. 

• Income taxes are classified as operating cash 
flows, though taxes are affected by financing 
(e.g., deduction for interest expense paid on 
debt) and investment activities (e.g., the re¬ 
duction of taxes from tax credits on invest¬ 
ment activities). 

• Interest income and dividends received are 
classified as operating cash flows, though 
these flows are a result of investment 
activities. 

Whether these items have a significant ef¬ 
fect on the analysis depends on the particular 
company's situation. Procter & Gamble, for ex¬ 
ample, has very little interest and dividend in¬ 
come, and its interest expense of $603 million is 
not large relative to its earnings before interest 
and taxes ($6,986 million). Table 1 shows that
adjusting P&G's cash flows for the interest
expense only (and related taxes) changes the
complexion of its cash flows slightly to reflect
greater cash-flow generation from operations
and less cash flow reliance on financing
activities.

The adjustment is for $603 million of inter¬ 
est and other financing costs, less its tax shield 
(the amount that the tax bill is reduced by the 
interest deduction) of $211 (estimated from the 
average tax rate of 35% of $603): adjustment = 
$603 (1 - 0.35) = $392. 
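
The adjustment can be reproduced with a short calculation. The sketch below uses the interest expense, tax rate, and reported cash flows quoted in the text (in $ millions) and moves the after-tax interest cost out of operations and into financing. Variable names are illustrative.

```python
interest_expense = 603   # $ millions, from the text
tax_rate = 0.35          # average tax rate used in the text

tax_shield = interest_expense * tax_rate              # about $211 million
adjustment = round(interest_expense * (1 - tax_rate)) # about $392 million

reported_operating = 7_742  # reported cash flow from operations
reported_financing = 197    # reported cash flow from financing

# Treat after-tax interest as a financing flow rather than an operating flow.
adjusted_operating = reported_operating + adjustment
adjusted_financing = reported_financing - adjustment

print(f"Tax shield:         ${tax_shield:,.0f} million")
print(f"Adjustment:         ${adjustment:,} million")
print(f"Adjusted operating: ${adjusted_operating:,} million")
print(f"Adjusted financing: ${adjusted_financing:,} million")
```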
Table 1 Adjusted Cash Flow for P&G (2002)

(In Millions)                              As Reported   As Adjusted
Cash flow from operations                     $7,742        $8,134
Cash flow for investing activities            (6,835)       (6,835)
Cash flow from (for) financing activities        197          (195)

Source: Procter & Gamble 2002 Annual Report.


For other companies, however, this adjustment
may provide a less flattering view of cash
flows. Consider Amazon.com's 2001 fiscal year
results. Reclassifying interest expense to financing,
along with its estimated tax effect, results in
more reliance on cash flow from financing as
can be seen in Table 2.

Looking at the relation among the three cash 
flows in the statement provides a sense of the ac¬ 
tivities of the company. A young, fast-growing 
company may have negative cash flows from 
operations, yet positive cash flows from financ¬ 
ing activities (that is, operations may be fi¬ 
nanced in large part with external financing). 
As a company grows, it may rely to a lesser 
extent on external financing. The typical, ma¬ 
ture company generates cash from operations 
and reinvests part or all of it back into the company.
Therefore, cash flow related to operations
is positive (that is, a source of cash) and cash
flow related to investing activities is negative
(that is, a use of cash). As a company matures,
it may seek less financing externally and may 
even use cash to reduce its reliance on exter¬ 
nal financing (e.g., repay debts). We can clas¬ 
sify companies on the basis of the pattern of 
their sources of cash flows, as shown in Table 3. 


Table 2 Adjusted Cash Flow, Amazon.com (2001)

(In Millions)                              As Reported   As Adjusted
Cash flow from operations                     $(120)         $(30)
Cash flow for investing activities             (253)         (253)
Cash flow from financing activities              107            17

The adjustment is based on interest expense of
$139 million, and a tax rate of 35%.

Source: Amazon.com 2001 10-K.


Though additional information is required to 
assess a company's financial performance and 
condition, examination of the sources of cash 
flows, especially over time, gives us a general 
idea of the company's operations. P&G's cash 
flow pattern is consistent with that of a mature 
company, whereas Amazon.com's cash flows 
are consistent with those of a fast-growing com¬ 
pany that is reliant on outside funds for growth. 
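
The sign patterns in Table 3 lend themselves to a simple lookup. The sketch below is one possible encoding, not a method given in the text: it returns every Table 3 pattern that matches the signs of a company's three cash flows. Treating a zero cash flow as positive is an assumption made for simplicity. Several patterns can match the same signs, so the result is a list.

```python
# Sign patterns from Table 3: (operations, investing, financing).
# "+/-" means either sign is consistent with the pattern.
PATTERNS = {
    "Financing growth externally and internally": ("+", "-", "+"),
    "Financing growth internally": ("+", "-", "-"),
    "Mature": ("+", "-", "+/-"),
    "Temporary financial downturn": ("-", "+", "+"),
    "Financial distress": ("-", "-", "-"),
    "Downsizing": ("+", "+", "-"),
}


def sign(amount: float) -> str:
    """Return '+' for zero or positive amounts (an assumption), '-' otherwise."""
    return "+" if amount >= 0 else "-"


def matching_patterns(operations: float, investing: float, financing: float) -> list[str]:
    """List the Table 3 patterns consistent with the three cash flow signs."""
    observed = (sign(operations), sign(investing), sign(financing))
    return [
        name
        for name, pattern in PATTERNS.items()
        if all(p == "+/-" or p == o for p, o in zip(pattern, observed))
    ]


# P&G 2002 as reported and as adjusted, in $ millions (figures from Table 1)
pg_reported = matching_patterns(7_742, -6_835, 197)
pg_adjusted = matching_patterns(8_134, -6_835, -195)
print(pg_reported)
print(pg_adjusted)
```

A single year's signs rarely settle the classification; as the text notes, the pattern over time is more informative.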

Fridson (2002) suggests reformatting the 
statement of cash flows as shown in Table 4. 
From the basic cash flow, the nondiscretionary 
cash needs are subtracted resulting in a cash 
flow referred to as discretionary cash flow. By 
restructuring the statement of cash flows in this 
way, it can be seen how much flexibility the 
company has when it must make business de¬ 
cisions that may adversely impact the long-run 
financial health of the enterprise. 

For example, consider a company with a 
basic cash flow of $800 million and operating 
cash flow of $500 million. Suppose that this 
company pays dividends of $130 million and 
that its capital expenditure is $300 million. Then 
the discretionary cash flow for this company is 
$200 million found by subtracting the $300 mil¬ 
lion capital expenditure from the operating cash 
flow of $500 million. This means that even after 
maintaining a dividend payment of $130 mil¬ 
lion, its cash flow is positive. Notice that asset 
sales and other investing activity are not needed 
to generate cash to meet the dividend payments 
because in Table 4 these items are subtracted 
after accounting for the dividend payments. In 
fact, if this company planned to increase its cap¬ 
ital expenditures, the format in Table 4 can be 
used to assess how much that expansion can 
be before affecting dividends and / or increasing 
financing needs. 
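
The arithmetic of this example can be laid out step by step. The sketch below uses the amounts from the paragraph above (in $ millions); the implied increase in working capital and the remaining headroom are simple derivations from those amounts, and the variable names are illustrative.

```python
basic_cash_flow = 800        # from the text
operating_cash_flow = 500    # from the text
capital_expenditures = 300   # from the text
dividends = 130              # from the text

# Basic cash flow less the increase in adjusted working capital gives
# operating cash flow, so the increase is implied by the two figures.
implied_increase_in_working_capital = basic_cash_flow - operating_cash_flow

discretionary_cash_flow = operating_cash_flow - capital_expenditures
cash_after_dividends = discretionary_cash_flow - dividends

print(f"Implied increase in working capital: ${implied_increase_in_working_capital} million")
print(f"Discretionary cash flow:             ${discretionary_cash_flow} million")
print(f"Cash flow after dividends:           ${cash_after_dividends} million")
```

The cash left after dividends is the amount by which capital expenditures could rise, other things equal, before dividends or financing needs are affected.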

Though we can classify a company based
on the sources and uses of cash flows, more
data are needed to put this information in perspective.
What is the trend in the sources and
uses of cash flows? What market, industry, or
company-specific events affect the company's
cash flows? How does the company being
analyzed compare with other companies in the
same industry in terms of the sources and uses
of funds?


Table 3 Patterns of Sources of Cash Flows

                                                         Cash Flow
                                             ---------------------------------
                                                          Investing  Financing
Pattern                                      Operations  activities activities
Financing growth externally and internally       +           -          +
Financing growth internally                      +           -          -
Mature                                           +           -        + or -
Temporary financial downturn                     -           +          +
Financial distress                               -           -          -
Downsizing                                       +           +          -

Let's take a closer look at the incremental 
information provided by cash flows. Consider 
Wal-Mart Stores, Inc., which had growing sales 
and net income from 1990 to 2005, as summa¬ 
rized in Figure 1. We see that net income grew 
each year, with the exception of 1995, and that 
sales grew each year. 

We get additional information by looking at
the cash flows and their sources, as graphed in
Figure 2. We see that the growth in Wal-Mart
was supported both by internally generated
funds and, to a lesser extent, through external
financing. Wal-Mart's pattern of cash flows suggests
that Wal-Mart is a mature company that
has become less reliant on external financing,
funding most of its growth in recent years (with
the exception of 1999) with internally generated
funds.


Table 4 Suggested Reformatting of Cash Flow
Statement to Analyze a Company's Flexibility

         Basic cash flow
Less:    Increase in adjusted working capital
         Operating cash flow
Less:    Capital expenditures
         Discretionary cash flow
Less:    Dividends
Less:    Asset sales and other investing activities
         Cash flow before financing
Less:    Net (increase) in long-term debt
Less:    Net (increase) in notes payable
Less:    Net purchase of company's common stock
Less:    Miscellaneous
         Cash flow

Notes:

1. The basic cash flow includes net earnings, depreciation,
and deferred income taxes, less items in net income
not providing cash.

2. The increase in adjusted working capital excludes
cash and payables.

Source: This format was suggested by Fridson (1995).

FREE CASH FLOW 

Cash flows without any adjustment may be misleading because they do not reflect the cash outflows that are necessary for the future existence of a firm. An alternative measure, free cash flow, was developed by Jensen (1986) in his theoretical analysis of agency costs and corporate takeovers. In theory, free cash flow is the cash flow left over after the company funds all positive net present value projects. Positive net present value projects are those capital investment projects for which the present value of expected future cash flows exceeds the present value of project outlays, all discounted at the cost of capital. (The cost of capital is the cost to the company of funds from creditors and shareholders. The cost of capital is basically a hurdle: If a project returns more than its cost of capital, it is a profitable project.) In other words, free cash flow is the cash flow of the firm, less capital expenditures necessary to stay in business (that is, replacing facilities as necessary) and grow at the expected rate (which requires increases in working capital).

The theory of free cash flow was developed by Jensen to explain behaviors of companies that could not be explained by existing economic theories. Jensen observed that companies that generate free cash flow should disgorge that cash rather than invest the funds in less profitable investments. There are many ways in which companies can disgorge this excess cash flow, including the payment of cash dividends, the repurchase of stock, and debt issuance in exchange for stock. The debt-for-stock exchange, for example, increases the company's leverage and future debt obligations, obligating the future use of excess cash flow. If a company does not disgorge this free cash flow, there is the possibility that another company (a company whose cash flows are less than its profitable investment opportunities, or a company that is willing to purchase and lever up the company) will attempt to acquire the free-cash-flow-laden company.

Figure 1 Wal-Mart Stores, Inc., Revenues, Operating Profit, and Net Income, 1990-2005
Source: Wal-Mart Stores, Inc., Annual Report, various years.

As a case in point, Jensen observed that the oil industry illustrates the case of wasting resources: The free cash flows generated in the 1980s were spent on low-return exploration and development and on poor diversification attempts through acquisitions. He argues that these companies would have been better off paying these excess cash flows to shareholders through share repurchases or exchanges with debt.

Figure 2 Wal-Mart Stores, Inc., Cash Flows, 1990-2005
Source: Wal-Mart Stores, Inc., Annual Report, various years.

By itself, the fact that a company generates free cash flow is neither good nor bad. What the company does with this free cash flow is what is important. And this is where it is important to measure the free cash flow as that cash flow in excess of profitable investment opportunities. Consider the simple numerical exercise with the Winner Company and the Loser Company:

                                                                Winner       Loser
                                                               Company     Company
Cash flow before capital expenditures                           $1,000      $1,000
Capital expenditures, positive net present value projects        (750)       (250)
Capital expenditures, negative net present value projects            0       (500)
Cash flow                                                         $250        $250
Free cash flow                                                    $250        $750


These two companies have identical cash flows and the same total capital expenditures. However, the Winner Company spends only on profitable projects (in terms of positive net present value projects), whereas the Loser Company spends on both profitable projects and wasteful projects. The Winner Company has a lower free cash flow than the Loser Company, indicating that it is using the generated cash flows in a more profitable manner. The lesson is that the existence of a high level of free cash flow is not necessarily good; it may simply suggest that the company is either a very good takeover target or has the potential for investing in unprofitable investments.
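
The Winner and Loser comparison can be reproduced with a short Python sketch. The class and field names are illustrative assumptions, not part of the original exercise; the figures are those of the table. Free cash flow is computed in its theoretical sense, that is, cash flow left after funding only the positive net present value projects.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    cash_flow_before_capex: float
    positive_npv_capex: float  # spending on positive net present value projects
    negative_npv_capex: float  # spending on negative net present value projects

    def cash_flow(self) -> float:
        """Cash flow after all capital expenditures, profitable or not."""
        return (self.cash_flow_before_capex
                - self.positive_npv_capex
                - self.negative_npv_capex)

    def free_cash_flow(self) -> float:
        """Theoretical free cash flow: only positive-NPV spending is deducted."""
        return self.cash_flow_before_capex - self.positive_npv_capex


winner = Company("Winner", 1_000, 750, 0)
loser = Company("Loser", 1_000, 250, 500)

for company in (winner, loser):
    print(f"{company.name}: cash flow = {company.cash_flow():,.0f}, "
          f"free cash flow = {company.free_cash_flow():,.0f}")
```

Both companies report the same cash flow of 250, but the Loser Company's free cash flow of 750 exceeds the Winner Company's 250 because 500 of its spending went to projects that should not have been funded.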

Positive free cash flow may be good or bad news; likewise, negative free cash flow may be good or bad news:

                          Good News                               Bad News
Positive free cash flow   The company is generating substantial   The company is generating more cash
                          operating cash flows, beyond those      flows than it needs for profitable
                          necessary for profitable projects.      projects and may waste these cash
                                                                  flows on unprofitable projects.

Negative free cash flow   The company has more profitable         The company is unable to generate
                          projects than it has operating cash     sufficient operating cash flows to
                          flows and must rely on external         satisfy its investment needs for
                          financing to fund these projects.       future growth.


Therefore, once the free cash flow is calculated, other information (e.g., trends in profitability) must be considered to evaluate the operating performance and financial condition of the firm.

CALCULATING FREE CASH FLOW

There is some confusion when this theoretical concept is applied to actual companies. The primary difficulty is that the amount of capital expenditures necessary to maintain the business at its current rate of growth is generally not known; companies do not report this item and may not even be able to determine how much of a period's capital expenditures are attributed to maintenance and how much are attributed to expansion.

Consider Procter & Gamble's property, plant, and equipment for 2002, which comprise some, but not all, of P&G's capital investment:

Additions to property, plant, and equipment       $1,679 million
Dispositions of property, plant, and equipment     (227)
Net change before depreciation                    $1,452 million

(In addition to the traditional capital expenditures [that is, changes in property, plant, and equipment], P&G also has cash flows related to investment securities and acquisitions. These investments are long-term and are hence part of P&G's investment activities cash outflow of $6,835 million.)

How much of the $1,679 million is for maintaining P&G's current rate of growth and how much is for expansion? Though there is a positive net change of $1,452 million, does it mean that P&G is expanding? Not necessarily: The additions are at current costs, whereas the dispositions are at historical costs. The additions of $1,679 million are less than P&G's depreciation and amortization expense for 2002 of $1,693 million, yet it is not disclosed in the financial reports how much of this latter amount reflects amortization. (P&G's depreciation and amortization are reported together as $1,693 million on the statement of cash flows.) The amount of necessary capital expenditures is therefore elusive.

Some estimate free cash flow by assuming that all capital expenditures are necessary for the maintenance of the current growth of the company. Though there is little justification in using all expenditures, this is a practical solution to an impractical calculation. This assumption allows us to estimate free cash flows using published financial statements.

Another issue in the calculation is defining what is truly "free" cash flow. Generally, we think of "free" cash flow as that being left over after all necessary financing expenditures are paid; this means that free cash flow is after interest on debt is paid. Some calculate free cash flow before such financing expenditures, others calculate free cash flow after interest, and still others calculate free cash flow after both interest and dividends (assuming that dividends are a commitment, though not a legal commitment).

There is no one correct method of calculating free cash flow and different analysts may arrive at different estimates of free cash flow for a company. The problem is that it is impossible to measure free cash flow as dictated by the theory, so many methods have arisen to calculate this cash flow. A simple method is to start with the cash flow from operations and then deduct capital expenditures. For P&G in 2002 (in millions),

Cash flow from operations         $7,742
Deduct capital expenditures      (1,692)
Free cash flow                    $6,050

Though this approach is rather simple, the cash flow from operations amount includes a deduction for interest and other financing expenses. Making an adjustment for the after-tax interest and financing expenses, as we did earlier for Procter & Gamble,

Cash flow from operations (as reported)       $7,742
Adjustment                                       392
Cash flow from operations (as adjusted)       $8,134
Deduct capital expenditures                  (1,692)
Free cash flow                                $6,442
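
As a rough sketch, the two simple calculations can be written in Python. The function and parameter names are hypothetical. The 392 adjustment is taken from the figures as given; the comparison with 603 × (1 − 0.35) is only a consistency check, under the assumption that the adjustment is the after-tax interest at the 35% rate used in Table 5.

```python
def simple_free_cash_flow(cash_flow_from_operations,
                          capital_expenditures,
                          after_tax_financing_adjustment=0):
    """Cash flow from operations, optionally adjusted, less capital expenditures."""
    adjusted_cfo = cash_flow_from_operations + after_tax_financing_adjustment
    return adjusted_cfo - capital_expenditures


CFO_2002 = 7_742    # cash flow from operations, in millions
CAPEX_2002 = 1_692  # capital expenditures used in the simple method

fcf_unadjusted = simple_free_cash_flow(CFO_2002, CAPEX_2002)
fcf_adjusted = simple_free_cash_flow(CFO_2002, CAPEX_2002,
                                     after_tax_financing_adjustment=392)

# Assumption: the adjustment equals interest of 603 taxed at 35%.
implied_adjustment = round(603 * (1 - 0.35))

print(fcf_unadjusted, fcf_adjusted, implied_adjustment)
```

The gap between the two estimates is exactly the financing adjustment, which is why analysts who disagree on whether free cash flow is measured before or after interest arrive at different figures for the same company.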


We can relate free cash flow directly to a company's income. Starting with net income, we can estimate free cash flow using four steps:

Step 1: Determine earnings before interest and taxes (EBIT).

Step 2: Calculate earnings before interest but after taxes.

Step 3: Adjust for noncash expenses (e.g., depreciation).

Step 4: Adjust for capital expenditures and changes in working capital.

Using these four steps, we can calculate the free cash flow for Procter & Gamble for 2002, as shown in Table 5.
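
The four steps can be sketched in Python as follows. The function name, parameter names, and the returned dictionary are illustrative assumptions; the inputs are the Procter & Gamble figures for 2002 (in millions) reported in Table 5, and taxes on EBIT are rounded to whole millions as in that table.

```python
def free_cash_flow_from_net_income(net_income, taxes, interest, tax_rate,
                                   noncash_expenses, capital_expenditures,
                                   working_capital_cash_flows):
    """Return the intermediate and final figures of the four-step estimate."""
    # Step 1: earnings before interest and taxes (EBIT).
    ebit = net_income + taxes + interest
    # Step 2: earnings before interest but after taxes.
    earnings_before_interest = ebit - round(ebit * tax_rate)
    # Step 3: add back noncash expenses.
    earnings_before_noncash = earnings_before_interest + sum(noncash_expenses)
    # Step 4: deduct capital expenditures, add cash flow from working capital changes.
    free_cash_flow = (earnings_before_noncash
                      - capital_expenditures
                      + sum(working_capital_cash_flows))
    return {
        "ebit": ebit,
        "earnings_before_interest": earnings_before_interest,
        "earnings_before_noncash": earnings_before_noncash,
        "free_cash_flow": free_cash_flow,
    }


result = free_cash_flow_from_net_income(
    net_income=4_352,
    taxes=2_031,
    interest=603,
    tax_rate=0.35,
    noncash_expenses=[1_693, 389],              # depreciation and amortization, deferred taxes
    capital_expenditures=1_679,
    working_capital_cash_flows=[96, 159, 684, -98],
)

for label, value in result.items():
    print(f"{label}: {value:,}")
```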


NET FREE CASH FLOW 

There are many variations in the calculation of cash flows that are used in analyses of companies' financial condition and operating performance. As an example of these variations, consider the alternative to free cash flow developed by Fitch, a company that rates corporate debt instruments. This cash flow measure, referred to as net free cash flow (NFCF), is free cash flow less interest and other financing costs and taxes. In this approach, free cash flow is defined as earnings before depreciation, interest, and taxes, less capital expenditures. Capital expenditures encompass all capital spending, whether for maintenance or expansion, and no changes in working capital are considered.

The basic difference between NFCF and free cash flow is that the financing expenses (interest and, in some cases, dividends) are deducted. If preferred dividends are perceived as nondiscretionary (that is, investors come to expect the dividends), dividends may be included with the interest commitment to arrive at net free cash flow. Otherwise, dividends are deducted from net free cash flow to produce cash flow. Another difference is that NFCF does not consider changes in working capital in the analysis.







Table 5 Calculation of Procter & Gamble's Free Cash Flow for 2002, in Millions*

Step 1:
  Net income                                                    $4,352
  Add taxes                                                      2,031
  Add interest                                                     603
  Earnings before interest and taxes                            $6,986
Step 2:
  Earnings before interest and taxes                            $6,986
  Deduct taxes (@35%)                                          (2,445)
  Earnings before interest                                      $4,541
Step 3:
  Earnings before interest                                      $4,541
  Add depreciation and amortization                              1,693
  Add increase in deferred taxes                                   389
  Earnings before noncash expenses                              $6,623
Step 4:
  Earnings before noncash expenses                              $6,623
  Deduct capital expenditures                                  (1,679)
  Add decrease in receivables                              $96
  Add decrease in inventories                              159
  Add cash flows from changes in accounts payable,
    accrued expenses, and other liabilities                684
  Deduct cash flow from changes in other operating assets
    and liabilities                                       (98)
  Cash flow from change in working capital accounts                841
  Free cash flow                                                $5,785

*Procter & Gamble's fiscal year ended June 30, 2002. Changes in operating accounts are taken from Procter & Gamble's Statement of Cash Flows.


Further, cash taxes are deducted to arrive at net free cash flow. Cash taxes are the income tax expense restated to reflect the actual cash flow related to this obligation, rather than the accrued expense for the period. Cash taxes are the income tax expense (from the income statement) adjusted for the change in deferred income taxes (from the balance sheets). For Procter & Gamble in 2002,

Income tax expense                          $2,031
Deduct increase in deferred income tax       (389)
Cash taxes                                  $1,642

(Note that cash taxes require taking the tax 
expense and either increasing this to reflect any 
decrease in deferred taxes [that is, the payment 
this period of tax expense recorded in a prior 
period] or decreasing this amount to reflect any 
increase in deferred taxes [that is, the deferment 
of some of the tax expense].) 


In the case of Procter & Gamble for 2002,

EBIT                                  $6,986
Add depreciation and amortization      1,693
EBITDA                                $8,679
Deduct capital expenditures          (1,679)
Free cash flow                        $7,000
Deduct interest                        (603)
Deduct cash taxes                    (1,642)
Net free cash flow                    $4,755
Deduct cash common dividends         (2,095)
Net cash flow                         $2,660


The free cash flow amount per this calculation differs from the $5,785 that we calculated earlier for two reasons: Changes in working capital and the deduction of taxes on operating earnings were not considered.
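
The net free cash flow calculation can be traced with a short Python sketch. The function names and the returned dictionary are illustrative assumptions; the inputs are the Procter & Gamble figures for 2002 (in millions) given in the text. A decrease in deferred taxes would be passed as a negative increase, which raises cash taxes.

```python
def cash_taxes(income_tax_expense, increase_in_deferred_taxes):
    """Tax expense restated to a cash basis."""
    return income_tax_expense - increase_in_deferred_taxes


def net_free_cash_flow(ebit, depreciation_and_amortization, capital_expenditures,
                       interest, taxes_paid_in_cash, common_dividends):
    """Follow the sequence EBITDA -> free cash flow -> NFCF -> net cash flow."""
    ebitda = ebit + depreciation_and_amortization
    free_cash_flow = ebitda - capital_expenditures        # no working capital changes
    nfcf = free_cash_flow - interest - taxes_paid_in_cash
    net_cash_flow = nfcf - common_dividends
    return {
        "ebitda": ebitda,
        "free_cash_flow": free_cash_flow,
        "net_free_cash_flow": nfcf,
        "net_cash_flow": net_cash_flow,
    }


pg_cash_taxes = cash_taxes(income_tax_expense=2_031, increase_in_deferred_taxes=389)
pg = net_free_cash_flow(
    ebit=6_986,
    depreciation_and_amortization=1_693,
    capital_expenditures=1_679,
    interest=603,
    taxes_paid_in_cash=pg_cash_taxes,
    common_dividends=2_095,
)

print(f"cash taxes: {pg_cash_taxes:,}")
for label, value in pg.items():
    print(f"{label}: {value:,}")
```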

Net cash flow gives an idea of the unconstrained cash flow of the company. This cash flow measure may be useful from a creditor's perspective in terms of evaluating the company's ability to fund additional debt. From a shareholder's perspective, net cash flow (that is, net free cash flow net of dividends) may be an appropriate measure because this represents the cash flow that is reinvested in the company.

USEFULNESS OF CASH FLOWS IN FINANCIAL ANALYSIS

The usefulness of cash flows for financial analysis depends on whether cash flows provide unique information or provide information in a manner that is more accessible or convenient for the analyst. The cash flow information provided in the statement of cash flows, for example, is not necessarily unique because most, if not all, of the information is available through analysis of the balance sheet and income statement. What the statement does provide is a classification scheme that presents information in a manner that is easier to use and, perhaps, more illustrative of the company's financial position.

An analysis of cash flows and the sources of 
cash flows can reveal the following information: 

• The sources of financing the company's capital spending. Does the company generate internally (that is, from operations) a portion or all of the funds needed for its investment activities? If a company cannot generate cash flow from operations, this may indicate problems up ahead. Reliance on external financing (e.g., equity or debt issuance) may indicate a company's inability to sustain itself over time.

• The company's dependence on borrowing. Does the company rely heavily on borrowing that may result in difficulty in satisfying future debt service?

• The quality of earnings. Large and growing differences between income and cash flows suggest a low quality of earnings.

Consider the financial results of Krispy Kreme Doughnuts, Inc., a wholesaler and retailer of donuts. Krispy Kreme grew from having fewer than 200 stores before its initial public offering (IPO) in 2000 to over 400 stores at the end of its 2005 fiscal year. Accompanying this growth in stores is the growth in operating and net income, as we show in Figure 3. The growth in income continued after the IPO as the number of stores increased, but the tide in income turned in the 2004 fiscal year and losses continued into the 2005 fiscal year as well.

Figure 3 Krispy Kreme Doughnuts, Inc., Income, 1997-2006
Source: Krispy Kreme Doughnuts, Inc., 10-K filings, various years.

Figure 4 Krispy Kreme Doughnuts, Inc., Cash Flows, 1997-2006
Source: Krispy Kreme Doughnuts, Inc., 10-K filings, various years.


Krispy Kreme's growth just after its IPO was financed by both operating activities and external financing, as we show in Figure 4. However, approximately half of the funds to support its rapid growth and to purchase some of its franchised stores in the 2000-2003 fiscal years came from long-term financing. This resulted in problems as the company's debt burden became almost three times its equity as revenue growth slowed by the 2005 fiscal year. Krispy Kreme demonstrated some ability to turn itself around in the 2006 fiscal year, partly by slowing its expansion through new stores.

Ratio Analysis 

One use of cash-flow information is in ratio analysis, primarily with the balance sheet and income statement information. One such cash flow-based ratio is the cash-flow interest coverage ratio, which is a measure of financial risk. There are a number of other cash flow-based ratios that an analyst may find useful in evaluating the operating performance and financial condition of a company.

A useful ratio to help further assess a company's cash flow is the cash flow-to-capital expenditures ratio, or capital expenditures coverage ratio:

$$\text{Cash flow-to-capital expenditures} = \frac{\text{Cash flow}}{\text{Capital expenditures}}$$

The cash-flow measure in the numerator should be one that has not already removed capital expenditures; for example, including free cash flow in the numerator would be inappropriate.
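
A minimal Python sketch of this ratio follows. The function name is hypothetical, and the inputs are the Procter & Gamble 2002 cash flow from operations and capital expenditures (in millions) used in the simple free cash flow calculation earlier in this entry; cash flow from operations is chosen as the numerator because it has not yet had capital expenditures removed.

```python
def capex_coverage_ratio(cash_flow_before_capex, capital_expenditures):
    """Cash flow-to-capital expenditures ratio.

    The numerator must be a cash flow measure that has not already
    had capital expenditures deducted (so not free cash flow).
    """
    if capital_expenditures <= 0:
        raise ValueError("capital expenditures must be positive")
    return cash_flow_before_capex / capital_expenditures


ratio = capex_coverage_ratio(cash_flow_before_capex=7_742,
                             capital_expenditures=1_692)
print(f"Capital expenditures coverage: {ratio:.2f}")
```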

This ratio provides information about the financial flexibility of the company and is particularly useful for capital-intensive firms and utilities (see Fridson, 2002, p. 173). The larger the ratio, the greater the financial flexibility. However, one must carefully examine the reasons why this ratio may be changing over time and why it might be out of line with comparable firms in the industry. For example, a declining ratio can be interpreted in two ways. First, the firm may eventually have difficulty adding to capacity via capital expenditures without the need to borrow funds. The second interpretation is that the firm may have gone through a period of major capital expansion and therefore it will take time for revenues to be generated that will increase the cash flow from operations to bring the ratio to some normal long-run level.

Another useful cash flow ratio is the cash flow-to-debt ratio:

$$\text{Cash flow to debt} = \frac{\text{Cash flow}}{\text{Debt}}$$

where debt can be represented as total debt, long-term debt, or a debt measure that captures a specific range of maturity (e.g., debt maturing in five years). This ratio gives a measure of a company's ability to meet maturing debt obligations. A more specific formulation of this ratio is Fitch's CFAR ratio, which compares a company's three-year average net free cash flow to its maturing debt over the next five years (see McConville, 1996). By comparing the company's average net free cash flow to the expected obligations in the near term (that is, five years), this ratio provides information on the company's credit quality.
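
The following Python sketch illustrates both ratios. All figures are hypothetical, as are the function names. The CFAR-style function assumes that the denominator is the sum of the debt maturities scheduled over the next five years; the text only says that average net free cash flow is compared with that maturing debt, so Fitch's exact definition may differ.

```python
from statistics import mean


def cash_flow_to_debt(cash_flow, debt):
    """Generic cash flow-to-debt ratio for any chosen debt measure."""
    if debt <= 0:
        raise ValueError("debt must be positive")
    return cash_flow / debt


def cfar_style_ratio(nfcf_last_three_years, maturities_next_five_years):
    """Three-year average NFCF relative to debt maturing over five years.

    Assumption: the five scheduled maturities are summed.
    """
    if len(nfcf_last_three_years) != 3 or len(maturities_next_five_years) != 5:
        raise ValueError("expected three NFCF values and five maturities")
    return cash_flow_to_debt(mean(nfcf_last_three_years),
                             sum(maturities_next_five_years))


# Hypothetical company, figures in millions.
nfcf_history = [400, 500, 600]
debt_schedule = [100, 150, 200, 250, 300]

ratio_total_debt = cash_flow_to_debt(cash_flow=500, debt=2_000)
ratio_cfar = cfar_style_ratio(nfcf_history, debt_schedule)
print(ratio_total_debt, ratio_cfar)
```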

Using Cash-Flow Information 

The analysis of cash flows provides information that can be used along with other financial data to help assess the financial condition of a company. Consider the cash flow-to-debt ratio calculated using three different measures of cash flow (EBITDA, free cash flow, and cash flow from operations from the statement of cash flows), each compared with long-term debt, as shown in Figure 5 for Weirton Steel.

This example illustrates the need to understand the differences among the cash flow measures. The effect of capital expenditures in the 1988-1991 period can be seen by the difference between the free-cash-flow measure and the other two measures of cash flow; both EBITDA and cash flow from operations ignore capital expenditures, which were substantial outflows for this company in the earlier period.

Cash-flow information may help a stock or bond analyst identify companies that may encounter financial difficulties. Consider the study by Largay and Stickney (1980) that analyzed the financial statements of W. T. Grant during the 1966-1974 period preceding its bankruptcy in 1975 and ultimate liquidation. They noted that financial indicators such as profitability ratios, turnover ratios, and liquidity ratios showed some downtrends, but provided no definite clues to the company's impending bankruptcy. A study of cash flows from operations, however, revealed that company operations were causing an increasing drain on cash, rather than providing cash. (For the period investigated, a statement of changes of financial position [on a working capital basis] was required to be reported prior to 1988.) This necessitated an increased use of external financing, the required interest payments on which exacerbated the cash-flow drain. Cash-flow analysis clearly was a valuable tool in this case since W. T. Grant had been running a negative cash flow from operations for years. Yet none of the traditional ratios discussed above take into account the cash flow from operations. Use of the cash flow-to-capital expenditures ratio and the cash flow-to-debt ratio would have highlighted the company's difficulties.

Dugan and Samson (1996) examined the use of operating cash flow as an early warning signal of a company's potential financial problems. The subject of the study was Allied Products Corporation because for a decade this company exhibited a significant divergence between cash flow from operations and net income. For parts of the period, net income was positive while cash flow from operations was a large negative value. In contrast to W. T. Grant, which went into bankruptcy, the auditor's report in the 1991 Annual Report of Allied Products Corporation did issue a going-concern warning. Moreover, the stock traded in the range of $2 to $3 per share. There was then a turnaround of the company by 1995. In its 1995 annual report, net income increased dramatically from prior periods (to $34 million) and there was a positive cash flow from operations ($29 million). The stock traded in the $25 range by the spring of 1996. As with the W. T. Grant study, Dugan and Samson (1996) found that the economic realities of a firm are better reflected in its cash flow from operations.

Figure 5 Cash Flow to Debt Using Alternative Estimates of Cash Flow for Weirton Steel, 1988-1996
Source: Weirton Steel's 10-K reports, various years.

The importance of cash-flow analysis in bankruptcy prediction is supported by the study by Foster and Ward (1997), who compared trends in the statement of cash flows components (cash flow from operations, cash flow for investment, and cash flow for financing) between healthy companies and companies that subsequently sought bankruptcy. They observe that healthy companies tend to have relatively stable relations among the cash flows for the three sources, correcting any given year's deviation from their norm within one year. They also observe that unhealthy companies exhibit declining cash flows from operations and financing and declining cash flows for investment one and two years prior to the bankruptcy. Further, unhealthy companies tend to expend more cash flows to financing sources than they bring in during the year prior to bankruptcy. These studies illustrate the importance of examining cash flow information in assessing the financial condition of a company.


KEY POINTS 

• The term "cash flow" has many meanings and the challenge is to determine the cash-flow definition and calculation that is appropriate. The simplest calculation of cash flow is the sum of net income and noncash expenses. This measure, however, does not consider other sources and uses of cash during the period.

• The statement of cash flows provides a useful breakdown of the sources of cash flows: operating activities, investing activities, and financing activities. Though attention is generally focused on the cash flows from operations, what the company does with the cash flows (that is, investing or paying off financing obligations) and what the sources of invested funds are (that is, operations versus external financing) must be investigated. Minor adjustments can be made to the items classified in the statement of cash flows to improve the classification.

• Examination of the different patterns of cash flows is necessary to get a general idea of the activities of the company. For example, a company whose only source of cash flow is from investing activities, suggesting the sale of property or equipment, may be experiencing financial distress.

• Free cash flow is a company's cash flow that remains after making capital investments that maintain the company's current rate of growth. It is not possible to calculate free cash flow precisely, resulting in many different variations in calculations of this measure. A company that generates free cash flow is not necessarily performing well or poorly; the existence of free cash flow must be taken in context with other financial data and information on the company.

• One of the variations in the calculation of a cash-flow measure is net free cash flow, which is, essentially, free cash flow less any financing obligations. This is a measure of the funds available to service additional obligations to suppliers of capital.


Abstract: Probability theory can be understood as a particular field in mathematics. Hence, it is only
to be expected that it relies intensely on theory from analysis and algebra. For example, the fact 
that the cumulative probability over all values a random variable can assume has to be equal to 
one is not always feasible to check for without a profound knowledge of mathematics. Continuous 
probability distributions involve a good deal of analysis and the more sophisticated a distribution 
is, the more mathematics is necessary to handle it. 


In this entry, we review the functions that are used in financial modeling: continuous functions, the indicator function, the derivative of a function, monotonic functions, and the integral. Moreover, as special functions, we get to know the factorial, the gamma, beta, and Bessel functions as well as the characteristic function of random variables. (For a more detailed discussion of these functions, see Khuri [2003], MacCluer [2009], and Richardson [2008].)


CONTINUOUS FUNCTION 

In this section, we introduce general continuous 
functions. 

General Idea 

Let $f(x)$ be a continuous function for some real-valued variable $x$. The general idea behind continuity is that the graph of $f(x)$ does not exhibit gaps. In other words, $f(x)$ can be thought of as being seamless. We illustrate this in Figure 1. For increasing $x$, from $x = 0$ to $x = 2$, we can move along the graph of $f(x)$ without ever having to jump. In the figure, the graph is generated by the two functions $f(x) = x^2$ for $x \in [0, 1)$, and $f(x) = \ln(x) + 1$ for $x \in [1, 2)$.

Figure 1 Continuous Function $f(x)$
Note: For $x \in [0, 1)$, $f(x) = x^2$ and for $x \in [1, 2)$, $f(x) = 1 + \ln(x)$.

Note that the function $f(x) = \ln(x)$ is the natural logarithm. It is the inverse function to the exponential function $g(x) = e^x$ where $e \approx 2.7183$ is Euler's number. The inverse has the effect that $f(g(x)) = \ln(e^x) = x$, that is, $\ln$ and $e$ cancel each other out.

A function $f(x)$ is discontinuous if we have to jump when we move along the graph of the function. For example, consider the graph in Figure 2. Approaching $x = 1$ from the left, we have to jump from $f(x) = 1$ to $f(1) = 0$. Thus, the function $f$ is discontinuous at $x = 1$. Here, $f$ is given by $f(x) = x^2$ for $x \in [0, 1)$, and $f(x) = \ln(x)$ for $x \in [1, 2)$.

Figure 2 Discontinuous Function $f(x)$
Note: For $x \in [0, 1)$, $f(x) = x^2$ and for $x \in [1, 2)$, $f(x) = \ln(x)$.
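
The difference between the two piecewise functions can be checked numerically. In the following Python sketch, the function names and the helper `jump_at` are illustrative assumptions; the helper simply compares the value at a point with the value a tiny step to its left.

```python
import math


def f_continuous(x):
    """Function of Figure 1: x^2 on [0, 1) and ln(x) + 1 on [1, 2)."""
    if 0 <= x < 1:
        return x ** 2
    if 1 <= x < 2:
        return math.log(x) + 1
    raise ValueError("x must lie in [0, 2)")


def f_discontinuous(x):
    """Function of Figure 2: x^2 on [0, 1) and ln(x) on [1, 2)."""
    if 0 <= x < 1:
        return x ** 2
    if 1 <= x < 2:
        return math.log(x)
    raise ValueError("x must lie in [0, 2)")


def jump_at(func, point, step=1e-9):
    """Approximate jump: value at the point minus the value just to its left."""
    return func(point) - func(point - step)


print(jump_at(f_continuous, 1.0))     # close to 0: no gap at x = 1
print(jump_at(f_discontinuous, 1.0))  # close to -1: the graph drops from 1 to 0

# ln and e cancel each other out.
print(math.log(math.exp(0.7)))
```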

Formal Derivation 

For a formal treatment of continuity, we first concentrate on the behavior of $f$ at a particular value $x^*$.

We say that a function $f(x)$ is continuous at $x^*$ if, for any positive distance $\delta$, we obtain a related distance $\varepsilon(\delta)$ such that

$$f(x^*) - \delta < f(x) < f(x^*) + \delta, \quad \text{for all } x \in (x^* - \varepsilon(\delta),\, x^* + \varepsilon(\delta))$$

What does that mean? We use Figure 3 to illustrate. (The function is $f(x) = \sin(x)$ with $x^* = 0.2$.) At $x^*$, we have the value $f(x^*)$. Now, we select a neighborhood around $f(x^*)$ of some arbitrary distance $\delta$ as indicated by the dashed horizontal lines through $f(x^*) - \delta$ and $f(x^*) + \delta$, respectively. From the intersections of these horizontal lines and the function graph (solid line), we extend two vertical dash-dotted lines down to the x-axis so that we obtain the two values $x_L$ and $x_U$, respectively. Now, we measure the distance between $x_L$ and $x^*$ and also the distance between $x_U$ and $x^*$. The smaller of the two yields the distance $\varepsilon(\delta)$. With this distance $\varepsilon(\delta)$ on the x-axis, we obtain the environment $(x^* - \varepsilon(\delta), x^* + \varepsilon(\delta))$ about $x^*$. (Note that $x_L = x^* - \varepsilon(\delta)$, since the distance between $x_L$ and $x^*$ is the shorter one.) The environment is indicated by the dashed lines extending vertically above $x^* - \varepsilon(\delta)$ and $x^* + \varepsilon(\delta)$, respectively. We require that all $x$ that lie in $(x^* - \varepsilon(\delta), x^* + \varepsilon(\delta))$ yield values $f(x)$ inside of the environment $[f(x^*) - \delta, f(x^*) + \delta]$. We can see by Figure 3 that this is satisfied.

Let us repeat this procedure for a smaller distance $\delta$. We obtain new environments $[f(x^*) - \delta, f(x^*) + \delta]$ and $(x^* - \varepsilon(\delta), x^* + \varepsilon(\delta))$. If, for all $x$ in $(x^* - \varepsilon(\delta), x^* + \varepsilon(\delta))$, the $f(x)$ are inside of $[f(x^*) - \delta, f(x^*) + \delta]$, again, then we can take an even smaller $\delta$. We continue this for successively smaller values of $\delta$ just short of becoming 0 or until the condition on the $f(x)$ is no longer satisfied. As we can easily see in Figure 3, we could go on forever and the condition on the $f(x)$ would always be satisfied. Hence, the graph of $f$ is seamless or continuous at $x^*$.

Figure 3 Continuity Criterion
Note: Function $f = \sin(x)$, for $-1 < x < 1$.
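
The procedure can be imitated numerically for $f(x) = \sin(x)$ at $x^* = 0.2$. The Python sketch below is only an illustration on a finite grid of sample points, not a proof; the function names are hypothetical, and the construction of $x_L$ and $x_U$ through the inverse sine relies on the sine function being increasing on the range considered.

```python
import math

X_STAR = 0.2


def epsilon_for(delta, x_star=X_STAR):
    """Distance epsilon(delta) obtained as in Figure 3 for f = sin."""
    y_star = math.sin(x_star)
    x_lower = math.asin(y_star - delta)  # x_L: where the graph meets f(x*) - delta
    x_upper = math.asin(y_star + delta)  # x_U: where the graph meets f(x*) + delta
    return min(x_star - x_lower, x_upper - x_star)


def condition_holds(delta, x_star=X_STAR, samples=1000):
    """Check |f(x) - f(x*)| < delta on sample points of the x-environment."""
    eps = epsilon_for(delta, x_star)
    y_star = math.sin(x_star)
    return all(
        abs(math.sin(x_star + eps * k / samples) - y_star) < delta
        for k in range(-samples + 1, samples)
    )


for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, round(epsilon_for(delta), 6), condition_holds(delta))
```

For each smaller $\delta$ the sketch finds a smaller but still positive $\varepsilon(\delta)$, in line with the observation that the procedure could be continued indefinitely.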

Finally, we say that the function $f$ is continuous if it is continuous at all $x$ for which $f$ is defined, that is, in the domain of $f$. Note that only the domain of $f$ is of interest. For example, the square root function $f(x) = \sqrt{x}$ is only defined for $x \geq 0$. Thus, we do not care about whether $f$ is continuous for any $x$ other than $x \geq 0$.


INDICATOR FUNCTION 

The indicator function acts like a switch. Often, 
it is denoted by 1 A (X) where A is the event of 
interest and X is a random variable. So, 1 A (X) 
is 1 if the event A is true, that is, if X assumes a 
value in A. Otherwise, 1 A (X) is 0. Formally, this 
is expressed as 


U(X) = 


1 X e A 
0 otherwise 


Usually, indicator functions are applied if we 
are interested in whether a certain event has 
occurred or not. For example, in a simple way. 


the value V of a company may be described 
by a real numbered random variable X on £2 = 
R with a particular probability distribution P. 
Now, the value V of the company may be equal 
to X as long as X is greater than 0. In the case 
where X assumes a negative value or 0, then 
V is automatically 0, that is, the company is 
bankrupt. So, the event of interest is A = [0, 
oo), that is, we want to know whether X is still 
positive. Using the indicator function this can 
be expressed as 


l[0,oo)(X) 


1 X e [0, oo) 
0 otherwise 


Finally, the company value can be given as 


v = l[ 0 ,oo)(X) ■ X = 


X X e [0, oo) 
0 otherwise 


The company value $V$ as a function is depicted in Figure 4. We can clearly detect the kink at $x = 0$ where the indicator function becomes 1 and, hence, $V = X$.
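As a minimal sketch of the company-value example, the indicator function and the resulting value can be written in a few lines of Python. The function names `indicator` and `company_value` are illustrative choices, not notation from the text.

```python
def indicator(x, lower=0.0):
    """Indicator of the event A = [lower, infinity): 1 if x is in A, else 0."""
    return 1 if x >= lower else 0


def company_value(x):
    """Company value V: equal to X when X lies in [0, infinity), 0 otherwise.

    This matches the product indicator(x) * x; the conditional form is used
    only to avoid returning a negative zero for negative inputs.
    """
    return x if indicator(x) == 1 else 0.0


# Evaluate V for a negative value, the boundary value, and a positive value.
for x in (-2.5, 0.0, 3.0):
    print(x, indicator(x), company_value(x))
```

The kink at zero shows up as the switch from the constant branch to the branch that returns the argument itself.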



Figure 4 The Company Value $V$ as a Function of the Random Variable $X$ Using the Indicator Function $1_{[0,\infty)}(X) \cdot X$

Figure 5 Function $f$ (solid) with Derivatives $f'(x)$ at $x$, for $0 < x < 0.5$ (dashed), $x = 1$ (dash-dotted), and $x = 1.571$ (dotted)


DERIVATIVES 

Suppose we have some continuous function $f$ with the graph given by the solid line in Figure 5. We now might be interested in the growth rate of $f$ at some position $x$. That is, we might want to know by how much $f$ increases or decreases when we move from some $x$ by a step of a given size, say $\Delta x$, to the right. This difference in $f$ we denote by $\Delta f$. This $\Delta$ symbol is called delta.

Let us next have a look at the graphs given by the solid lines in Figure 6. These represent the graphs of $f$ and $g$. The important difference between $f$ and $g$ is that, while $g$ is linear, $f$ is not, as can be seen by $f$'s curvature.

Figure 6 Functions $f$ and $g$ with Slopes Measured at the Points $(x^*, f(x^*))$ and $(x^+, g(x^+))$ Indicated by the • Symbol

We begin the analysis of the graphs' slopes with function $g$ on the top right of the figure. Let us focus on the point $(x^+, g(x^+))$ given by the solid circle at the lower end of graph $g$. Now, when we move to the right by $\Delta x_4$ along the horizontal dashed line, the corresponding increase in $g$ is given by $\Delta y_4$, as indicated by the vertical dashed line. If, on the other hand, we moved to the right by the longer distance, $\Delta x_5$, the corresponding increment of $g$ would be given by $\Delta y_5$. (This vertical increment $\Delta y_5$ is also indicated by a vertical dashed line.) Since $g$ is linear, it has constant slope everywhere and, hence, also at the point $(x^+, g(x^+))$. We denote that slope by $s_4$. This implies that the ratios representing the relative increments (i.e., the slopes) have to be equal. That is,

$$s_4 = \frac{\Delta y_4}{\Delta x_4} = \frac{\Delta y_5}{\Delta x_5}$$

Next, we focus on the graph of $f$ on the lower left of Figure 6. Suppose we measured the slope of $f$ at the point $(x^*, f(x^*))$. If we extended a step along the dashed line to the right by $\Delta x_1$, the corresponding increment in $f$ would be $\Delta y_1$, as indicated by the leftmost vertical dashed line. If we moved, instead, by the longer $\Delta x_2$ to the right, the corresponding increment in $f$ would be $\Delta y_2$. And a horizontal increment of $\Delta x_3$ would result in an increase of $f$ by $\Delta y_3$.

In contrast to the graph of $g$, the graph of $f$ does not exhibit the property of a constant increment $\Delta y$ in $f$ per unit step $\Delta x$ to the right. That is, there is no constant slope of $f$, which results in the fact that the three ratios of the relative increase of $f$ are different. To be precise, we have

$$\frac{\Delta y_1}{\Delta x_1} \neq \frac{\Delta y_2}{\Delta x_2} \neq \frac{\Delta y_3}{\Delta x_3}$$

as can be seen in Figure 6. So, the shorter our step $\Delta x$ to the right, the steeper the slopes of the thin solid lines through $(x^*, f(x^*))$ and the corresponding points on the curve, $(x^* + \Delta x_1, f(x^* + \Delta x_1))$, $(x^* + \Delta x_2, f(x^* + \Delta x_2))$, and $(x^* + \Delta x_3, f(x^* + \Delta x_3))$, respectively. That means that, the smaller the increment $\Delta x$, the higher the relative increment $\Delta y$ of $f$. So, finally, if we moved only a minuscule step to the right from $(x^*, f(x^*))$, we would obtain the steepest thin line and, consequently, the highest relative increase in $f$ given by

$$\frac{\Delta y}{\Delta x} \tag{1}$$


By letting $\Delta x$ approach 0, we obtain the marginal increment, in case the limit of (1) exists (i.e., if the ratio has a finite limit). Formally,

$$\frac{\Delta y}{\Delta x} \xrightarrow{\Delta x \to 0} s(x) \quad \text{with } -\infty < s(x) < \infty$$

This marginal increment $s(x)$ differs from point to point on the graph of $f$, while we have seen that it is constant for all points on the graph of $g$.
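The shrinking-step argument can be illustrated numerically. The sketch below assumes $f = \sin$ and the point $x^* = 0.5$ purely for illustration (the function plotted in Figure 6 is not specified in the text); `difference_quotient` is a hypothetical helper name.

```python
import math


def difference_quotient(f, x, dx):
    """Relative increment (f(x + dx) - f(x)) / dx for a step dx to the right."""
    return (f(x + dx) - f(x)) / dx


x_star = 0.5                       # assumed evaluation point
steps = [0.5, 0.1, 0.01, 0.001]    # successively shorter steps to the right
slopes = [difference_quotient(math.sin, x_star, dx) for dx in steps]

for dx, slope in zip(steps, slopes):
    print(f"dx = {dx:<6} slope = {slope:.6f}")

# For comparison: the limit of the ratio as dx approaches 0 is cos(x_star).
print(f"limit = {math.cos(x_star):.6f}")
```

Because the sine curve bends downward on this stretch, the slopes grow as the step shrinks and settle near the limiting value, which is the behavior described for $f$ above.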


Construction of the Derivative 

The limit analysis of marginal increments now brings us to the notion of a derivative that we discuss next. Earlier we introduced the limit growth rate of some continuous function at some point $(x_0, f(x_0))$. To represent the slope of the line through $(x_0, f(x_0))$ and $(x_0 + \Delta x, f(x_0 + \Delta x))$, we define the difference quotient

$$\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \tag{2}$$

If we let $\Delta x \to 0$, we obtain the limit of the difference quotient (2). If this limit is not finite, then we say that it does not exist. Suppose we were not only interested in the behavior of $f$ when moving $\Delta x$ to the right but also wanted to analyze the reaction by $f$ to a step $\Delta x$ to the left. We would then obtain two limits of (2). The first with $\Delta x^+ > 0$ (i.e., a step to the right) would be the upper limit $L^u$

$$\frac{f(x_0 + \Delta x^+) - f(x_0)}{\Delta x^+} \xrightarrow{\Delta x^+ \to 0} L^u$$

and the second with $\Delta x^- < 0$ (i.e., a step to the left), would be the lower limit $L^l$

$$\frac{f(x_0 + \Delta x^-) - f(x_0)}{\Delta x^-} \xrightarrow{\Delta x^- \to 0} L^l$$


If $L^u$ and $L^l$ are equal, $L^u = L^l = L$, then $f$ is said to be differentiable at $x_0$. The limit $L$ is the derivative of $f$. We commonly write the derivative in the fashion

$$f'(x_0) = \frac{df(x)}{dx} = \frac{dy}{dx} \tag{3}$$

On the right side of (3), we have replaced $f(x)$ by the variable $y$ as we will often do, for convenience. If the derivative (3) exists for all $x$, then $f$ is said to be differentiable.

Let us now return to Figure 5. Recall that the graph of the continuous function $f$ is given by the solid line. We start at $x = -1$. Since $f$ is not continuous at $x = -1$, we omit this end point $(-1, 1)$ from our analysis. For $-1 < x < 0$, we have that $f$ is linear with constant slope $s = -1$. Consequently, the derivative $f'(x) = -1$, for these $x$.

At $x = 0$, we observe that $f$ is linear to the left with $f'(x) = -1$ and that it is also linear to the right, however, with $f'(x) = 1$, for $0 < x < 0.5$. So, at $x = 0$, $L^u = 1$ while $L^l = -1$. Since here $L^u \neq L^l$, the derivative of $f$ does not exist at $x = 0$.

For $0 < x < 0.5$, we have the constant derivative $f'(x) = 1$. The corresponding slope of 1 through $(0, 0)$ and $(0.5, 0.5)$ is indicated by the dashed line. At $x = 0.5$, the left side limit $L^l = 1$ while the right side limit $L^u = 0.8776$. (This value of $\cos(0.5) = 0.8776$ is a result from calculus.) Hence, the two limits are not equal and, consequently, $f$ is not differentiable at $x = 0.5$.

Without formal proof, we state that $f$ is differentiable for all $0.5 < x < 2$. For example, at $x = 1$, $L^l = L^u = 0.5403$ and, thus, the derivative $f'(1) = 0.5403$. The dash-dotted line indicating this derivative is called the tangent of $f$ at $x = 1$. In Figure 5, the arrow indexed $f'(1)$ points at this tangent. As another example, we select $x = 1.571$ where $f$ assumes its maximum value. Here, the derivative $f'(1.571) = 0$ and, hence, the tangent at $x = 1.571$ is flat as indicated by the horizontal dotted line. In Figure 5, the arrow indexed $f'(1.571)$ points at this tangent.
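The comparison of the two one-sided limits can be sketched numerically. The code below is an illustration under assumptions: it uses the absolute value function as a stand-in for the kink at $x = 0$ and the sine function for the smooth part, and it approximates each limit with one small step `h`. The helper names are hypothetical.

```python
import math


def one_sided_slopes(f, x0, h=1e-6):
    """Approximate the lower (left) and upper (right) limits of the difference quotient."""
    right = (f(x0 + h) - f(x0)) / h       # step to the right, dx > 0
    left = (f(x0 - h) - f(x0)) / (-h)     # step to the left, dx < 0
    return left, right


def is_differentiable_at(f, x0, tol=1e-4):
    """Numerical check: both one-sided slopes agree up to the tolerance."""
    left, right = one_sided_slopes(f, x0)
    return abs(left - right) < tol


print(one_sided_slopes(abs, 0.0))         # kink: the two slopes disagree
print(one_sided_slopes(math.sin, 1.0))    # smooth point: both near cos(1)
print(one_sided_slopes(math.sin, 1.571))  # near the maximum: both near 0
```

A finite step only approximates the limits, so the tolerance `tol` is a practical choice rather than part of the definition.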


MONOTONIC FUNCTION 

Suppose we have some function $f(x)$ for real-valued $x$. For example, the graph of $f$ may look like that in Figure 7. We see that on the interval $[0, 1]$, the graph is increasing from $f(0) = 0$ to $f(1) = 1$. For $1 < x < 2$, the graph remains at the level $f(1) = 1$ like a platform. And, finally, between $x = 2$ and $x = 3$, the graph is increasing, again, from $f(2) = 1$ to $f(3) = 2$.

In contrast, we may have another function, $g(x)$. Its graph is given by Figure 8. It looks somewhat similar to the graph in Figure 7, however, without the platform. The graph of $g$ never remains at a level, but increases constantly. Even for the smallest increments from one value of $x$, say $x_1$, to the next higher, say $x_2$, there is always an upward slope in the graph.

Both functions, $f$ and $g$, never decrease. The distinction is that $f$ is monotonically increasing since the graph can remain at some level, while $g$ is strictly monotonic increasing since its graph never remains at any level. If we can differentiate $f$ and $g$, we can express this in terms of the derivatives of $f$ and $g$. Let $f'$ be the derivative of $f$ and $g'$ the derivative of $g$. Then, we have the following definitions of monotonicity for continuous functions with existing derivatives:

Monotonically increasing functions: A continuous function $f$ with derivative $f'$ is monotonically increasing if its derivative $f' \ge 0$.

Strictly monotonic increasing functions: A continuous function $g$ with derivative $g'$ is strictly monotonic increasing if its derivative $g' > 0$.

Analogously, a function $f(x)$ is monotonically decreasing if it behaves in the opposite manner. That is, $f$ never increases when moving from some $x$ to any higher value $x_1 > x$. When $f$ is continuous with derivative $f'$, then we say that $f$ is monotonically decreasing if $f'(x) \le 0$ and that it is strictly monotonic decreasing if $f'(x) < 0$ for all $x$. For these two cases, illustrations are given by mirroring the graphs in Figures 7 and 8 against their vertical axes, respectively.

Figure 7 Monotonically Increasing Function $f$
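The distinction between monotonically and strictly monotonic increasing functions can be checked on sampled values. In this sketch, `f` is a hypothetical piecewise function built to match the description of Figure 7 (rising, flat platform, rising again), and `g` is an assumed straight line standing in for Figure 8.

```python
def is_increasing(values, strict=False):
    """Check that consecutive sampled values never decrease (or always rise if strict)."""
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


def f(x):
    """Rises on [0, 1], stays on a platform on [1, 2], rises again on [2, 3]."""
    if x <= 1.0:
        return x
    if x <= 2.0:
        return 1.0
    return x - 1.0


def g(x):
    """Assumed strictly increasing function without a platform."""
    return 2.0 * x / 3.0


xs = [i / 10 for i in range(31)]   # grid on [0, 3]
f_values = [f(x) for x in xs]
g_values = [g(x) for x in xs]

print(is_increasing(f_values), is_increasing(f_values, strict=True))
print(is_increasing(g_values), is_increasing(g_values, strict=True))
```

A check on a finite grid can only refute monotonicity, not prove it; it is meant to make the two definitions concrete.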

INTEGRAL 

Here we derive the concept of integration necessary to understand the probability density and continuous distribution function. The integral of some function over some set of values represents the area between the function values and the horizontal axis. To sketch the idea, we start with an intuitive graphical illustration.

We begin by analyzing the area $A$ between the graph (solid line) of the function $f(t)$ and the horizontal axis between $t = 0$ and $t = T$ in Figure 9. Looking at the graph, it appears quite complicated to compute this area $A$ in comparison to, for example, the area of a rectangle where we would only need to know its width and length. However, we can approximate this area by rectangles as will be done next.

Figure 8 Strictly Monotonic Increasing Function $g$

Figure 9 Approximation of the Area $A$ between Graph of $f(t)$ and the Horizontal Axis, for $0 \le t \le T$

Approximation of the Area 
through Rectangles 

Let's approximate the area $A$ under the function graph in Figure 9 as follows. As a first step, we dissect the interval between 0 and $T$ into $n$ equidistant intervals of length $\Delta t = t_{i+1} - t_i$ for $i = 0, 1, \ldots, n - 1$. For each such interval, we consider the function value $f(t_{i+1})$ at the rightmost point, $t_{i+1}$. To obtain an estimate of the area under the graph for the respective interval, we multiply the value $f(t_{i+1})$ at $t_{i+1}$ by the interval width $\Delta t$ yielding $A(t_{i+1}) = \Delta t \cdot f(t_{i+1})$, which equals the area of the rectangle above interval $i + 1$ as displayed in Figure 9. Finally, we add up the areas $A(t_1), A(t_2), \ldots, A(T)$ of all rectangles resulting in the desired estimate of the area $A$

$$\sum_{i=0}^{n-1} A(t_{i+1}) = \sum_{i=0}^{n-1} \Delta t \cdot f(t_{i+1}) \tag{4}$$

We repeat the just described procedure for decreasing interval widths $\Delta t$.
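The rectangle sum in equation (4) translates directly into code. The following sketch assumes the integrand $f(t) = t^2$ on $[0, 1]$, chosen only because its exact area (1/3) is known; `right_point_sum` is a hypothetical name.

```python
def right_point_sum(f, T, n):
    """Sum of rectangle areas dt * f(t_{i+1}) over n equal intervals of [0, T]."""
    dt = T / n
    return sum(dt * f((i + 1) * dt) for i in range(n))


def f(t):
    """Assumed integrand; the exact area under it on [0, 1] is 1/3."""
    return t ** 2


exact = 1.0 / 3.0
errors = []
for n in (10, 100, 1000):
    estimate = right_point_sum(f, 1.0, n)
    errors.append(abs(estimate - exact))
    print(f"n = {n:<5} estimate = {estimate:.6f}")
```

As the interval width shrinks, the estimate approaches the exact area, which is the repetition for decreasing widths described above.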


Integral as the Limiting Area 

To derive the perfect approximation of the area under the curve in Figure 9, we let the interval width $\Delta t$ gradually vanish until it almost equals 0, proceeding as before. We denote this infinitesimally small width by the step rate $dt$. Now, the difference between the function values at either end, that is, $f(t_i)$ and $f(t_{i+1})$, of the interval $i + 1$ will be nearly indistinguishable since $t_i$ and $t_{i+1}$ almost coincide. Hence, the corresponding rectangle with area $A(t_{i+1})$ will turn into a dash with infinitesimally small base $dt$.

Summation as in equation (4) of the areas of the dashes becomes infeasible. For this purpose, the integral has been introduced as the limit of (4) as $\Delta t \to 0$. (Conditions under which these limits exist are omitted here.) It is denoted by

$$\int_0^T f(t)\,dt \tag{5}$$

where the limits 0 and $T$ indicate which interval the integration is performed on. In our case, the integration variable is $t$ while the function $f(t)$ is called the integrand. In words, equation (5) is the integral of the function $f(t)$ over $t$ from 0 to $T$. It is immaterial how we denote the integration variable. The same result as in equation (5) would result if we wrote

$$\int_0^T f(y)\,dy$$

instead. The important factors are the integrand and the integral limits.

Note that instead of using the function values of the right boundaries of the intervals $f(t_{i+1})$ in equation (4), referred to as the right-point rule, we might as well have taken the function values of the left boundaries $f(t_i)$, referred to as the left-point rule, which would have led to the same integral. Moreover, we might have taken the function values $f(0.5 \cdot (t_{i+1} + t_i))$ evaluated at the mid-points of the intervals and still obtained the same integral. This latter procedure is called the mid-point rule.
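The three rules differ only in where the function is evaluated inside each interval. The sketch below assumes the exponential function on $[0, 1]$ as integrand (exact area $e - 1$) to show that all three estimates approach the same number; `riemann_sum` and its `rule` argument are illustrative names.

```python
import math


def riemann_sum(f, T, n, rule="right"):
    """Rectangle approximation of the area under f on [0, T] with n equal intervals.

    rule selects the evaluation point in each interval: "left", "mid", or "right".
    """
    offsets = {"left": 0.0, "mid": 0.5, "right": 1.0}
    shift = offsets[rule]
    dt = T / n
    return sum(dt * f((i + shift) * dt) for i in range(n))


exact = math.e - 1.0   # exact integral of exp(t) over [0, 1]
for rule in ("left", "mid", "right"):
    estimate = riemann_sum(math.exp, 1.0, 10_000, rule)
    print(f"{rule:<5} estimate = {estimate:.6f}  error = {abs(estimate - exact):.2e}")
```

For a fixed number of intervals the estimates differ slightly (here the left rule undershoots and the right rule overshoots because the integrand is increasing), but the gaps vanish as the interval width shrinks.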

If we keep 0 as the lower limit of the integral in equation (5) and vary $T$, then equation (5) becomes a function of the variable $T$. We may denote this function by

$$F(T) = \int_0^T f(t)\,dt \tag{6}$$
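Treating the integral as a function of its upper limit can be sketched numerically. The code assumes the integrand $f(t) = 3t^2$, for which the integral from 0 to $T$ is known to be $T^3$, and uses a mid-point approximation; the names `F` and `f` mirror the notation of equation (6), and the number of intervals is an arbitrary choice.

```python
def F(f, T, n=10_000):
    """Mid-point approximation of the integral of f over [0, T]."""
    dt = T / n
    return sum(dt * f((i + 0.5) * dt) for i in range(n))


def f(t):
    """Assumed integrand; its integral from 0 to T equals T**3."""
    return 3.0 * t ** 2


# F changes with the upper limit T.
for T in (0.5, 1.0, 2.0):
    print(f"T = {T}  F(T) = {F(f, T):.6f}")

# Slope of F at T = 1, approximated with a small step h.
h = 1e-3
slope = (F(f, 1.0 + h) - F(f, 1.0)) / h
print(f"slope of F at T = 1: {slope:.4f}, integrand at T = 1: {f(1.0):.4f}")
```

In this example the approximate slope of $F$ at $T = 1$ is close to the value of the integrand at that point.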


Relationship Between Integral and Derivative

In equation (6) the relationship between $f(t)$ and $F(T)$ is as follows. Suppose we compute the derivative of $F(T)$ with respect to $T$ and assume that $F(T)$ is differentiable, for $T > 0$. The result is

$$F'(T) = \frac{dF(T)}{dT} = f(T) \tag{7}$$

Hence, from equation (7) we see that the marginal increment of the integral at any point (i.e., its derivative) is exactly equal to the integrand evaluated at the corresponding value. This need not generally be true. But in most cases, particularly in financial modeling, this statement is valid.

The implication of this discussion for probability theory is as follows. Let $P$ be a continuous probability measure with probability distribution function $F$ and (probability) density function $f$. There is the unique link between $f$ and $P$ given through

$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt \tag{8}$$

Formally, the integration of $f$ over $x$ is always from $-\infty$ to $\infty$, even if the support is not on the entire real line. This is no problem, however, since the density is zero outside the support and, hence, integration over those parts yields 0 contribution to the integral. For example, suppose that some density function were

$$f(x) = \begin{cases} h(x), & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{9}$$

where $h(x)$ is just some function such that $f$ satisfies the requirements for a density function. That is, the support is only on the positive part of the real line. Substituting the function from equation (9) into equation (8) yields the equality

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} f(x)\,dx = \int_{0}^{\infty} h(x)\,dx \tag{10}$$


SOME FUNCTIONS 

Here we introduce some functions needed in probability theory to describe probability distributions of random variables: factorials, gamma function, beta function, Bessel function of the third kind, and characteristic function. While the first four are functions of very special shape, the characteristic function is of a more general structure. It is the function characterizing the probability distribution of some random variable and, hence, is of unique form for each random variable.

Factorial 

Let $k \in \mathbb{N}$ (i.e., $k = 1, 2, \ldots$). Then the factorial of this natural number $k$, denoted by the symbol !, is given by

$$k! = k \cdot (k - 1) \cdot (k - 2) \cdot \ldots \cdot 1 \tag{11}$$

A factorial is the product of this number and all natural numbers smaller than $k$ including 1. By definition, the factorial of zero is one (i.e., $0! = 1$). For example, the factorial of 3 is $3! = 3 \cdot 2 \cdot 1 = 6$.

Gamma Function 

The gamma function for positive values $x$ is defined by

$$\Gamma(x) = \int_0^\infty e^{-t}\, t^{x-1}\,dt, \quad x > 0 \tag{12}$$

The gamma function has the following properties. If $x$ corresponds to a natural number $n \in \mathbb{N}$ (i.e., $n = 1, 2, \ldots$), then we have that equation (12) equals the factorial given by equation (11) of $n - 1$. Formally, this is

$$\Gamma(n) = (n - 1)! = (n - 1) \cdot (n - 2) \cdot \ldots \cdot 1$$

Furthermore, for any $x > 0$, it holds that $\Gamma(x + 1) = x\Gamma(x)$.

In Figure 10, we have displayed part of the gamma function for $x$ values between 0.1 and 5. Note that, for either $x \to 0$ or $x \to \infty$, $\Gamma(x)$ goes to infinity.
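Both properties of the gamma function can be checked with Python's standard `math` module, which provides `math.gamma` and `math.factorial`. The particular values of `n` and `x` below are arbitrary illustrations.

```python
import math

# Gamma at a natural number n equals the factorial of n - 1.
table = [(n, math.gamma(n), math.factorial(n - 1)) for n in range(1, 7)]
for n, gamma_n, factorial_n_minus_1 in table:
    print(n, gamma_n, factorial_n_minus_1)

# Recursion Gamma(x + 1) = x * Gamma(x) at an arbitrary non-integer point.
x = 2.5
lhs = math.gamma(x + 1.0)
rhs = x * math.gamma(x)
print(lhs, rhs)

# Growth toward both ends of the positive axis.
print(math.gamma(0.1), math.gamma(5.0), math.gamma(20.0))
```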


Beta Function 

The beta function with parameters $c$ and $d$ is defined as

$$B(c, d) = \int_0^1 u^{c-1}(1 - u)^{d-1}\,du = \frac{\Gamma(c)\,\Gamma(d)}{\Gamma(c + d)}$$

where $\Gamma$ is the gamma function from equation (12).
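The equality between the integral form and the gamma-function form of the beta function can be illustrated numerically. This sketch uses a mid-point rule for the integral; the parameter values $c = 2$, $d = 3$ and the helper names are assumptions made for the example.

```python
import math


def beta_via_gamma(c, d):
    """Beta function from the gamma-function representation."""
    return math.gamma(c) * math.gamma(d) / math.gamma(c + d)


def beta_via_integral(c, d, n=10_000):
    """Mid-point approximation of the integral of u^(c-1) * (1-u)^(d-1) over [0, 1]."""
    du = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += du * u ** (c - 1.0) * (1.0 - u) ** (d - 1.0)
    return total


c, d = 2.0, 3.0
print(beta_via_gamma(c, d), beta_via_integral(c, d))
```

The mid-point rule is used here because it never evaluates the integrand at the end points, where the integrand is unbounded when a parameter is below 1.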


Bessel Function of the Third Kind 


The Bessel function of the third kind is defined as

$$K_1(x) = \frac{1}{2}\int_0^\infty \exp\left\{-\frac{1}{2}\,x\left(y + y^{-1}\right)\right\} dy$$

This function is often a component of other, more complex functions such as the density function of the NIG distribution.


Characteristic Function 

Before advancing to introduce the characteristic function, we briefly explain complex numbers.

Figure 10 Gamma Function $\Gamma(x)$

Figure 11 Graphical Representation of the Complex Number $z = 0.8 + 0.9i$

Suppose we were to take the square root of the number $-1$, that is, $\sqrt{-1}$. So far, our calculus has no solution for this since the square root of negative numbers has not yet been introduced. However, by introducing the imaginary number $i$, which is defined as

$$i = \sqrt{-1}$$

we can solve square roots of any real number. Now, we can represent any number as the combination of a real (Re) part $a$ plus some units $b$ of $i$, which we refer to as the imaginary (Im) part. Then, any number $z$ will look like

$$z = a + i \cdot b \tag{13}$$

The number given by equation (13) is a complex number. The set of complex numbers is symbolized by $\mathbb{C}$. This set contains the real numbers that are those complex numbers with $b = 0$. Graphically, we can represent the complex numbers on a two-dimensional space as given in Figure 11.
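Complex numbers are built into Python, which writes the imaginary unit as `j` rather than $i$. The short sketch below reproduces the number from Figure 11 and the square root of $-1$ using the standard `cmath` module.

```python
import cmath

# The complex number from Figure 11; Python's j plays the role of i.
z = 0.8 + 0.9j
print(z.real, z.imag)      # real part a and imaginary part b

# The imaginary unit as the square root of -1.
i = cmath.sqrt(-1)
print(i, i * i)

# Square root of another negative real number.
print(cmath.sqrt(-4))

# A real number is a complex number whose imaginary part b is 0.
r = complex(2.5, 0.0)
print(r.imag == 0.0)
```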

Now, we can introduce the characteristic function as some function $\varphi$ mapping real numbers into the complex numbers. Formally, we write this as $\varphi: \mathbb{R} \to \mathbb{C}$. Suppose we have some random variable $X$ with density function $f$. The characteristic function is then defined as

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx \tag{14}$$

which transforms the density $f$ into some complex number at any real position $t$. Equation (14) is commonly referred to as the Fourier transformation of the density.

The relationship between the characteristic function $\varphi$ and the density function $f$ of some random variable is unique. So, when we state either one, the probability distribution of the corresponding random variable is unmistakably determined.
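Equation (14) can be approximated by replacing the integral with a mid-point sum over a wide but finite interval. The sketch assumes a standard normal density as the example, for which the characteristic function is known to be $e^{-t^2/2}$; the truncation to $[-10, 10]$ and the number of intervals are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def normal_density(x):
    """Density of the standard normal distribution (assumed example density)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def characteristic_function(density, t, lower=-10.0, upper=10.0, n=20_000):
    """Mid-point approximation of the integral of exp(i*t*x) * density(x) over [lower, upper]."""
    dx = (upper - lower) / n
    total = 0j
    for k in range(n):
        x = lower + (k + 0.5) * dx
        total += cmath.exp(1j * t * x) * density(x) * dx
    return total


for t in (0.0, 1.0, 2.0):
    approx = characteristic_function(normal_density, t)
    print(t, approx, math.exp(-0.5 * t * t))
```

The result is a complex number for each real $t$; for this symmetric density the imaginary part is numerically zero.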

KEY POINTS 

* Continuous functions are an integral component of mathematical analysis. They are useful whenever jumps in the function values are undesirable. This is often the case when financial asset returns are modeled; that is, one assumes that returns, in particular logarithmic returns, may assume any value on the real line such that the related probability distribution is continuous with continuous probability density.

* The indicator function is defined as a function yielding one for certain specified argument values and zero in any other case. It is helpful in expressing so-called exclusive either-or behavior of random variables (i.e., when random variables can only assume exactly one of two values). For example, when one models call option prices where, at maturity, the value of the option is equal to either zero or the difference between the market value of the underlying and the strike price, one resorts to the indicator function.

* The derivative of some function expresses the function's rate of growth at some point for infinitesimally small increments. In words, it expresses by how much the function changes if one takes a very small step. In probability theory, a derivative is used in the context of a continuous probability distribution to express by how much the distribution function increases at a certain value (i.e., the marginal rate of probability at a certain value).

* The integral is the continuous analogue of the sum of discrete values. In probability theory, the probability of individual outcomes is always zero when the distribution is continuous. In order to express the probability of at most a certain value, we cannot sum the individual probabilities of all values less than or equal to the critical value. Instead, at each value, we have the density function which we integrate up to the critical value, yielding the requested probability.

* The characteristic function is the unique representation of a probability distribution. For certain distributions, the probability density function or the distribution function is unknown. Instead, it is necessary to resort to the characteristic function. Technically, the characteristic function is a function involving complex numbers (i.e., numbers including the square root of minus one) to express the behavior of some function at certain frequencies. It is closely linked to the Fourier transform used in engineering.

Abstract: Investing decisions require the valuation of investments and the determination of yields on investments. Necessary for the valuation and yield determination are the financial mathematics that involve the time value of money. With these mathematics, future cash flows can be translated to a value in the present, a value today can be converted into a value at some future point in time, and the yield on an investment can be computed.


In this entry, we introduce the mathematical process of translating a value today into a value at some future point in time, and then show how this process can be reversed to determine the value today of some future amount. We then show how to extend the time value of money mathematics to include multiple cash flows and the special cases of annuities and loan amortization. Finally, we demonstrate how these mathematics can be used to calculate the yield on an investment. 1


IMPORTANCE OF THE TIME 
VALUE OF MONEY 

The notion that money has a time value is one of 
the most basic concepts in investment analysis. 
Making decisions today regarding future cash 
flows requires understanding that the value of 
money does not remain the same throughout 
time. 


A dollar today is worth more than a dollar some time in the future for two reasons:

Reason 1: Cash flows occurring at different 
points in time have different values relative 
to any one point in time. 

One dollar one year from now is not as 
valuable as one dollar today. After all, you 
can invest a dollar today and earn interest so 
that the value it grows to next year is greater 
than the one dollar today. This means we 
have to take into account the time value of 
money to quantify the relation between cash 
flows at different points in time. 

Reason 2: Cash flows are uncertain.

Expected cash flows may not materialize. Uncertainty stems from the nature of forecasts of the timing and/or the amount of cash flows. We do not know for certain when, whether, or how much cash flows will be in the future. This uncertainty regarding future cash flows must somehow be taken into account in assessing the value of an investment.

Translating a current value into its equivalent future value is referred to as compounding. Translating a future cash flow or value into its equivalent value in a prior period is referred to as discounting. This entry outlines the basic mathematical techniques used in compounding and discounting.

Suppose someone wants to borrow $100 today and promises to pay back the amount borrowed in one month. Would the repayment of only the $100 be fair? Probably not. There are two things to consider. First, if the lender didn't lend the $100, what could he or she have done with it? Second, is there a chance that the borrower may not pay back the loan? So, when considering lending money, we must consider the opportunity cost (that is, what could have been earned or enjoyed), as well as the uncertainty associated with getting the money back as promised.

Let's say that someone is willing to lend the money, but that they require repayment of the $100 plus some compensation for the opportunity cost and any uncertainty the loan will be repaid as promised. The amount of the loan, the $100, is the principal. The compensation required for allowing someone else to use the $100 is the interest.

Looking at this same situation from the perspective of time and value, the amount that you are willing to lend today is the loan's present value. The amount that you require to be paid at the end of the loan period is the loan's future value. Therefore, the future period's value is comprised of two parts:

Future value = Present value + Interest 

The interest is compensation for the use of funds for a specific period. It consists of (1) compensation for the length of time the money is borrowed and (2) compensation for the risk that the amount borrowed will not be repaid exactly as set forth in the loan agreement.


DETERMINING THE FUTURE 
VALUE 

Suppose you deposit $1,000 into a savings account at the Surety Savings Bank and you are promised 10% interest per period. At the end of one period you would have $1,100. This $1,100 consists of the return of your principal amount of the investment (the $1,000) and the interest or return on your investment (the $100). Let's label these values:

• $1,000 is the value today, the present value, 
PV. 

• $1,100 is the value at the end of one period, 
the future value, FV. 

• 10% is the rate interest is earned in one period, 
the interest rate, i. 

To get to the future value from the present value:

$$FV = \underbrace{PV}_{\text{principal}} + \underbrace{(PV \times i)}_{\text{interest}}$$

This is equivalent to:

$$FV = PV(1 + i)$$

In terms of our example,

$$FV = \$1{,}000 + (\$1{,}000 \times 0.10) = \$1{,}000(1 + 0.10) = \$1{,}100$$

If the $100 interest is withdrawn at the end of the period, the principal is left to earn interest at the 10% rate. Whenever you do this, you earn simple interest. It is simple because it repeats itself in exactly the same way from one period to the next as long as you take out the interest at the end of each period and the principal remains the same. If, on the other hand, both the principal and the interest are left on deposit at the Surety Savings Bank, the balance earns interest on the previously paid interest, referred to as compound interest. Earning interest on interest is called compounding because the balance at any time is a combination of the principal, interest on principal, and interest on accumulated interest (or simply, interest on interest).

If you compound interest for one more period in our example, the original $1,000 grows to $1,210.00:

$$\begin{aligned} FV &= \text{Principal} + \text{First period interest} + \text{Second period interest} \\ &= \$1{,}000.00 + (\$1{,}000.00 \times 0.10) + (\$1{,}100.00 \times 0.10) \\ &= \$1{,}210.00 \end{aligned}$$

The present value of the investment is $1,000, 
the interest earned over two years is $210, and 
the future value of the investment after two 
years is $1,210. 

The relation between the present value and the future value after two periods, breaking out the second period interest into interest on the principal and interest on interest, is:

$$FV = \underbrace{PV}_{\text{principal}} + \underbrace{(PV \times i)}_{\substack{\text{first period's interest} \\ \text{on the principal}}} + \underbrace{(PV \times i)}_{\substack{\text{second period's interest} \\ \text{on the principal}}} + \underbrace{(PV \times i \times i)}_{\substack{\text{second period's interest on} \\ \text{the first period's interest}}}$$

or, collecting the PVs from each term and applying a bit of elementary algebra,

$$FV = PV(1 + 2i + i^2) = PV(1 + i)^2$$

The balance in the account two years from now, 
$1,210, comprises three parts: 

1. The principal, $1,000. 

2. Interest on principal, $100 in the first period 
plus $100 in the second period. 

3. Interest on interest, 10% of the first period's 
interest, or $10. 

To determine the future value with compound interest for more than two periods, we follow along the same lines:

$$FV = PV(1 + i)^N \tag{1}$$

The value of N is the number of compounding periods, where a compounding period is the unit of time after which interest is paid at the rate i. A period may be any length of time: a minute, a day, a month, or a year. The important thing is to make sure the same compounding period is reflected throughout the problem being analyzed. The term "(1 + i)^N" is referred to as the compound factor. It is the rate of exchange between present dollars and dollars N compounding periods into the future. Equation (1) is the basic valuation equation, the foundation of financial mathematics. It relates a value at one point in time to a value at another point in time, considering the compounding of interest.

Figure 1 The Value of $1,000 Invested 10 Years in an Account That Pays 10% Compounded Interest per Year (horizontal axis: Number of compound periods)

The relation between present and future values for a principal of $1,000 and interest of 10% per period through 10 compounding periods is shown graphically in Figure 1. For example, the value of $1,000, earning interest at 10% per period, is $2,593.70 ten periods into the future:

$$FV = \$1{,}000\,(1 + 0.10)^{10} = \$1{,}000\,(2.5937) = \$2{,}593.70$$

As you can see in this figure the $2,593.70 balance in the account at the end of 10 periods is comprised of three parts:

1. The principal, $1,000. 

2. Interest on the principal of $1,000, which is 
$100 per period for 10 periods or $1,000. 

3. Interest on interest totaling $593.70. 
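The three-part breakdown of a compounded balance can be reproduced with a short calculation. The function names below are illustrative. Note that the code carries the compound factor at full precision, so the ten-period balance comes out a few cents above the figure in the text, which rounds the factor to 2.5937.

```python
def future_value(pv, i, n):
    """Future value of pv after n compounding periods at rate i per period."""
    return pv * (1.0 + i) ** n


def decompose(pv, i, n):
    """Split the future value into principal, interest on principal, and interest on interest."""
    fv = future_value(pv, i, n)
    interest_on_principal = pv * i * n
    interest_on_interest = fv - pv - interest_on_principal
    return pv, interest_on_principal, interest_on_interest


for periods in (2, 10):
    principal, on_principal, on_interest = decompose(1000.0, 0.10, periods)
    total = principal + on_principal + on_interest
    print(f"N = {periods:<2} principal = {principal:.2f}  "
          f"interest on principal = {on_principal:.2f}  "
          f"interest on interest = {on_interest:.2f}  FV = {total:.2f}")
```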

We can express the change in the value of the savings balance (that is, the difference between the ending value and the beginning value) as a growth rate. A growth rate is the rate at which a value appreciates (a positive growth) or depreciates (a negative growth) over time. Our $1,000 grew at a rate of 10% per year over the 10-year period to $2,593.70. The average annual growth rate of our investment of $1,000 is 10%: the value of the savings account balance increased 10% per year.

We could also express the appreciation in our savings balance in terms of a return. A return is the income on an investment, generally stated as a change in the value of the investment over each period divided by the amount of the investment at the beginning of the period. We could also say that our investment of $1,000 provides an average annual return of 10% per year. The average annual return is not calculated by taking the change in value over the entire 10-year period ($2,593.70 − $1,000) and dividing it by $1,000. This would produce an arithmetic average return of 159.37% over the 10-year period, or 15.937% per year. But the arithmetic average ignores the process of compounding. The correct way of calculating the average annual return is to use a geometric average return:

$$\text{Geometric average return} = \sqrt[N]{\frac{FV}{PV}} - 1$$

which is a rearrangement of equation (1). Using the values from the example,

$$\sqrt[10]{\frac{\$2{,}593.70}{\$1{,}000.00}} - 1 = \left(\frac{\$2{,}593.70}{\$1{,}000.00}\right)^{1/10} - 1 = 1.100 - 1 = 10\%$$

Therefore, the annual return on the investment, sometimes referred to as the compound average annual return or the true return, is 10% per year.
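The contrast between the arithmetic and the geometric average can be made concrete with the figures from the example. The function names are illustrative, not terminology from the text.

```python
def geometric_average_return(pv, fv, n):
    """Per-period return that compounds pv into fv over n periods."""
    return (fv / pv) ** (1.0 / n) - 1.0


def arithmetic_average_return(pv, fv, n):
    """Total change divided by the starting value, spread evenly over n periods."""
    return (fv - pv) / pv / n


pv, fv, years = 1000.00, 2593.70, 10
print(f"geometric:  {geometric_average_return(pv, fv, years):.4%}")
print(f"arithmetic: {arithmetic_average_return(pv, fv, years):.4%}")
```

Compounding the geometric figure for ten periods reproduces the ending balance, while compounding the arithmetic figure would overshoot it.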

Here is another example for calculating a future value. A common investment product of a life insurance company is a guaranteed investment contract (GIC). With this investment, an insurance company guarantees a specified interest rate for a period of years. Suppose that the life insurance company agrees to pay 6% annually for a five-year GIC and the amount invested by the policyholder is $10 million. The amount of the liability (that is, the amount this life insurance company has agreed to pay the GIC policyholder) is the future value of $10 million when invested at 6% interest for five years. In terms of equation (1), PV = $10,000,000, i = 6%, and N = 5, so that the future value is:

$$FV = \$10{,}000{,}000\,(1 + 0.06)^{5} = \$13{,}382{,}256$$

Compounding More Than One Time 
per Year 

An investment may pay interest more than one 
time per year. For example, interest may be 
paid semiannually, quarterly, monthly, weekly, 
or daily, even though the stated rate is quoted 
on an annual basis. If the interest is stated as, 
say, 10% per year, compounded semiannually, 
the nominal rate—often referred to as the annual 
percentage rate (APR)—is 10%. The basic valua¬ 
tion equation handles situations in which there 
is compounding more frequently than once a 
year if we translate the nominal rate into a rate 
per compounding period. Therefore, an APR of 
10% with compounding semiannually is 5% per 
period—where a period is six months—and the 
number of periods in one year is 2. 

Consider a deposit of $50,000 in an account for
five years that pays 8% interest, compounded
quarterly. The interest rate per period, i, is 8%/4
= 2% and the number of compounding periods
is 5 × 4 = 20. Therefore, the balance in the
account at the end of five years is:

    FV = $50,000(1 + 0.02)^20 = $50,000(1.4859474)
       = $74,297.37
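
The translation from an APR to a rate per period and a number of periods can be sketched in a few lines. This is an illustrative helper only; the name `future_value` and its parameters are assumptions made here, not notation from the text.

```python
def future_value(pv, apr, years, periods_per_year=1):
    """FV = PV (1 + i)^N, with i = APR / m and N = years * m."""
    i = apr / periods_per_year      # rate per compounding period
    n = years * periods_per_year    # number of compounding periods
    return pv * (1.0 + i) ** n


# Five-year GIC at 6%, compounded annually
gic = future_value(10_000_000, 0.06, 5)

# $50,000 for five years at 8%, compounded quarterly
deposit = future_value(50_000, 0.08, 5, periods_per_year=4)

print(f"GIC liability:   ${gic:,.0f}")
print(f"Deposit balance: ${deposit:,.2f}")
```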

As shown in Figure 2, through 50 years with 
both annual and quarterly compounding, the 
investment's value increases at a faster rate with 
the increased frequency of compounding. 

The last example illustrates the need to
correctly identify the "period" because this
dictates the interest rate per period and the number
of compounding periods. Because interest rates
are often quoted in terms of an APR, we need
to be able to translate the APR into an interest
rate per period and to adjust the number of
periods. To see how this works, let's use an
example of a deposit of $1,000 in an account that
pays interest at a rate of 12% per year, with
interest compounded at different compounding
frequencies. How much is in the account after,
say, five years depends on the compounding
frequency:

Compounding                    Rate per         Number of Periods   FV at the End
Frequency     Period           Compounding      in Five Years, N    of Five Years
                               Period, i
Annual        One year         12%               5                  $1,762.34
Semiannual    Six months        6%              10                   1,790.85
Quarterly     Three months      3%              20                   1,806.11
Monthly       One month         1%              60                   1,816.70

As you can see, both the rate per period, i,
and the number of compounding periods, N,
are adjusted and depend on the frequency of
compounding. Interest can be compounded for
any frequency, such as daily or hourly.

Let's work through another example of
compounding more than once
a year. Suppose we invest $200,000 in an
investment that pays 4% interest per year,
compounded quarterly. What will be the future
value of this investment at the end of 10 years?

The given information is i = 4%/4 = 1% and
N = 10 × 4 = 40 quarters. Therefore,

    FV = $200,000(1 + 0.01)^40 = $297,772.75

Figure 2 Value of $50,000 Invested in the Account
that Pays 8% Interest per Year: Quarterly
versus Annual Compounding

Continuous Compounding

The extreme frequency of compounding is
continuous compounding: interest is compounded
instantaneously. The factor for compounding
continuously for one year is e^{APR}, where e is
2.71828..., the base of the natural logarithm.
And the factor for compounding continuously
for two years is e^{APR} e^{APR}, or e^{2(APR)}. The future
value of an amount that is compounded
continuously for N years is:

    FV = PV e^{N(APR)}     (3)

where APR is the annual percentage rate and
e^{N(APR)} is the compound factor.

If $1,000 is deposited in an account for five
years with interest of 12% per year,
compounded continuously,

    FV = $1,000 e^{5(0.12)} = $1,000(e^{0.60})
       = $1,000(1.82212) = $1,822.12

Comparing this future value with that if interest
is compounded annually at 12% per year for
five years, $1,762.34, we see the effects of this
extreme frequency of compounding.
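
The effect of compounding frequency, including the continuous limit, can be tabulated with a short script. This is a sketch under the assumptions of the $1,000, 12%, five-year example; the function names are introduced here for illustration, and the daily row (365 periods per year) is an added assumption not shown in the text.

```python
import math


def fv_discrete(pv, apr, years, m):
    """Future value with m compounding periods per year."""
    return pv * (1.0 + apr / m) ** (years * m)


def fv_continuous(pv, apr, years):
    """Future value with continuous compounding: PV * e^(N * APR)."""
    return pv * math.exp(apr * years)


pv, apr, years = 1_000.0, 0.12, 5
frequencies = {"Annual": 1, "Semiannual": 2, "Quarterly": 4,
               "Monthly": 12, "Daily (assumed 365)": 365}

for label, m in frequencies.items():
    print(f"{label:<20} {fv_discrete(pv, apr, years, m):>10,.2f}")
print(f"{'Continuous':<20} {fv_continuous(pv, apr, years):>10,.2f}")
```

Each increase in frequency raises the future value, but by less and less; the continuous figure is the upper bound that the discrete values approach.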

Multiple Rates 

In our discussion thus far, we have assumed 
that the investment will earn the same periodic 
interest rate, i. We can extend the calculation 
of a future value to allow for different inter¬ 
est rates or growth rates for different periods. 
Suppose an investment of $10,000 pays 9% dur¬ 
ing the first year and 10% during the second 
year. At the end of the first period, the value of 
the investment is $10,000 (1 + 0.09), or $10,900. 
During the second period, this $10,900 earns in¬ 
terest at 10%. Therefore, the future value of this 
$10,000 at the end of the second period is: 

FV = $10,000(1 + 0.09)(1 + 0.10) = $11,990 

We can write this more generally as:

    FV = PV(1 + i_1)(1 + i_2)(1 + i_3) ... (1 + i_N)     (4)

where i_N is the interest rate for period N.

Consider a $50,000 investment made today in a
one-year bank certificate of deposit (CD) and rolled
over annually for the next two years into
one-year CDs. The future value of the $50,000
investment will depend on the one-year CD rate
each time the funds are rolled over. Assuming
that the one-year CD rate today is 5%, that
the one-year CD rate one
year from now is expected to be 6%, and that the
one-year CD rate two years from now is expected to
be 6.5%, then we know:


FV = $50,000(1 + 0.05)(1 + 0.06)(1 + 0.065) 
= $59,267.25 


Continuing this example, what is the average 
annual interest rate over this period? We know 
that the future value is $59,267.25, the present 
value is $50,000, and N = 3: 


    i = ($59,267.25 / $50,000.00)^{1/3} - 1 = (1.185345)^{1/3} - 1 = 5.8315%


which is also: 


    i = [(1 + 0.05)(1 + 0.06)(1 + 0.065)]^{1/3} - 1
      = 5.8315%
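
The rollover example can be reproduced by multiplying the period-by-period growth factors and then taking their geometric average. This is a minimal sketch; the function names are assumptions made for illustration.

```python
from math import prod


def future_value_varying(pv, rates):
    """FV = PV (1 + i_1)(1 + i_2)...(1 + i_N) for a list of period rates."""
    return pv * prod(1.0 + r for r in rates)


def average_rate(rates):
    """Geometric average of the period rates."""
    growth = prod(1.0 + r for r in rates)
    return growth ** (1.0 / len(rates)) - 1.0


cd_rates = [0.05, 0.06, 0.065]          # one-year CD rates for years 1, 2, 3
fv = future_value_varying(50_000, cd_rates)
avg = average_rate(cd_rates)
print(f"FV after three rollovers: ${fv:,.2f}")
print(f"Average annual rate:      {avg:.4%}")
```

Note that the average is taken over the product of the growth factors, not their sum, which is why it differs slightly from the simple mean of 5%, 6%, and 6.5%.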


DETERMINING THE 
PRESENT VALUE 

Now that we understand how to compute fu¬ 
ture values, let's work the process in reverse. 
Suppose that for borrowing a specific amount 
of money today, the Yenom Company promises 
to pay lenders $5,000 two years from today. 
How much should the lenders be willing to 
lend Yenom in exchange for this promise? This 
dilemma is different than figuring out a future 
value. Here we are given the future value and 
have to figure out the present value. But we can 
use the same basic idea from the future value 
problems to solve present value problems. 

If you can earn 10% on other investments that 
have the same amount of uncertainty as the 
$5,000 Yenom promises to pay, then: 

• The future value, FV = $5,000. 

• The number of compounding periods, N = 2. 

• The interest rate, i = 10%. 


We also know the basic relation between the 
present and future values: 

    FV = PV(1 + i)^N

Substituting the known values into this
equation:

    $5,000 = PV(1 + 0.10)^2


To determine how much you are willing to lend
now, PV, to get $5,000 two years from now, FV,
requires solving this equation for the unknown
present value:

    PV = $5,000 / (1 + 0.10)^2
       = $5,000 [1 / (1 + 0.10)]^2
       = $5,000(0.82645) = $4,132.25


Therefore, you would be willing to lend
$4,132.25 to receive $5,000 two years from
today if your opportunity cost is 10%. We can
check our work by reworking the problem from
the reverse perspective. Suppose you invested
$4,132.25 for two years and it earned 10% per
year. What is the value of this investment at the
end of the two years?

We know: PV = $4,132.25, i = 10% or 0.10,
and N = 2.

Therefore, the future value is:

    FV = PV(1 + i)^N = $4,132.25(1 + 0.10)^2
       = $5,000.00


Compounding translates a value in one point 
in time into a value at some future point in 
time. The opposite process translates future val¬ 
ues into present values: Discounting translates 
a value back in time. From the basic valuation 
equation: 

    FV = PV(1 + i)^N

we divide both sides by (1 + i)^N and exchange
sides to get the present value,

    PV = FV / (1 + i)^N     (5)

    or PV = FV [1 / (1 + i)]^N

    or PV = FV [1 / (1 + i)^N]

Figure 3 Present Value of $5,000 Discounted at 10%
(horizontal axis: number of discount periods)


The term in brackets [ ] is referred to as the 
discount factor since it is used to translate a fu¬ 
ture value to its equivalent present value. The 
present value of $5,000 for discount periods 
ranging from 0 to 10 is shown in Figure 3. 

If the frequency of compounding is greater
than once a year, we make adjustments to the
rate per period and the number of periods as
we did in compounding. For example, if the
future value five years from today is $100,000
and the interest is 6% per year, compounded
semiannually, i = 6%/2 = 3% and N = 5 × 2 =
10, and the present value is:

    PV = $100,000 / (1 + 0.03)^10 = $100,000(0.74409)
       = $74,409


Here is an example of calculating a present
value. Suppose that the goal is to have $75,000
in an account by the end of four years. And
suppose that interest on this account is paid at a
rate of 5% per year, compounded semiannually.
How much must be deposited in the account
today to reach this goal? We are given FV =
$75,000, i = 5%/2 = 2.5% per six months, and
N = 4 × 2 = 8 six-month periods. The amount
of the required deposit is therefore:

    PV = $75,000 / (1 + 0.025)^8 = $61,555.99
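
Discounting is the mirror image of compounding, which a small helper makes explicit. The sketch below is illustrative only; `present_value` and its parameters are names assumed here, not taken from the text. The round-trip check at the end compounds the result forward again to confirm that it recovers the target amount.

```python
def present_value(fv, apr, years, periods_per_year=1):
    """PV = FV / (1 + i)^N, with i = APR / m and N = years * m."""
    i = apr / periods_per_year
    n = years * periods_per_year
    return fv / (1.0 + i) ** n


# $75,000 needed in four years; 5% per year, compounded semiannually
deposit = present_value(75_000, 0.05, 4, periods_per_year=2)
print(f"Required deposit today: ${deposit:,.2f}")

# Round trip: compounding the deposit forward should give back $75,000
check = deposit * (1.0 + 0.025) ** 8
print(f"Value after 8 periods:  ${check:,.2f}")

# The Yenom loan: $5,000 due in two years at a 10% opportunity cost.
# Without rounding the discount factor, the result is a couple of cents
# below the $4,132.25 obtained with the rounded factor 0.82645.
yenom = present_value(5_000, 0.10, 2)
print(f"Yenom loan amount:      ${yenom:,.2f}")
```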


DETERMINING THE 
UNKNOWN INTEREST RATE 

As we saw earlier in our discussion of growth
rates, we can rearrange the basic equation to
solve for i:

    i = (FV / PV)^{1/N} - 1

As an example, suppose that the value of an
investment today is $100 and the value
of the investment in five years is expected to be
$150. What is the annual rate of appreciation
in value of this investment over the five-year
period?

    i = ($150 / $100)^{1/5} - 1
      = (1.5)^{1/5} - 1 = 0.0845 or 8.45% per year

There are many applications in finance where 
it is necessary to determine the rate of change 
in values over a period of time. If values are in¬ 
creasing over time, we refer to the rate of change 
as the growth rate. To make comparisons easier, 
we usually specify the growth rate as a rate per 
year. 

For example, if we wish to determine the rate
of growth in these values, we solve for the
unknown interest rate. Consider the growth rate of
dividends for General Electric. General Electric
pays dividends each year. In 1996, for example,
General Electric paid dividends of $0.317
per share of its common stock, whereas in 2006
the company paid $1.03 in dividends per share.
This represents a growth rate of 12.507%:

    Growth rate of dividends = ($1.03 / $0.317)^{1/10} - 1
                             = (3.2492)^{1/10} - 1
                             = 12.507%

The 12.507% is the average annual rate of
growth during this 10-year span.
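
Solving for the unknown rate is a one-line calculation once the values and the number of periods are known. The following sketch is illustrative; `implied_rate` is a name assumed here.

```python
def implied_rate(pv, fv, n):
    """Rate per period that grows PV into FV over N periods."""
    return (fv / pv) ** (1.0 / n) - 1.0


# Investment growing from $100 to $150 over five years
appreciation = implied_rate(100.0, 150.0, 5)

# Dividends per share of $0.317 (1996) and $1.03 (2006): ten years of growth
dividend_growth = implied_rate(0.317, 1.03, 10)

print(f"Annual appreciation:    {appreciation:.2%}")
print(f"Annual dividend growth: {dividend_growth:.3%}")
```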


DETERMINING THE 
NUMBER OF 

COMPOUNDING PERIODS 

Given the present and future values, calculating
the number of periods when we know the
interest rate is a bit more complex than
calculating the interest rate when we know the
number of periods. Nevertheless, we can develop
an equation for determining the number of
periods, beginning with the valuation formula
given by equation (1) and rearranging to solve
for N,

    N = (ln FV - ln PV) / ln(1 + i)     (6)

where ln indicates the natural logarithm, which
is the log of the base e. (e is approximately equal
to 2.718. The natural logarithm function can be
found on most calculators, usually indicated by
"ln".)

Suppose that the present value of an investment
is $100 and you wish to determine how
long it will take for the investment to double
in value if the investment earns 6% per year,
compounded annually:

    N = (ln 200 - ln 100) / ln 1.06 = (5.2983 - 4.6052) / 0.0583
      = 11.8885 or approximately 12 years


You'll notice that we round up to the next
whole period. To see why, consider this last
example. After 11.8885 years, we would have doubled
our money if interest were paid 88.85% of the way
through the 12th year. But we stated earlier that
interest is paid at the end of each period, not
part of the way through. At the end of the
11th year, our investment is worth $189.83, and
at the end of the 12th year, our investment is
worth $201.22. So our investment's value
doubles by the 12th period, with a little extra,
$1.22.
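
The calculation of N, and the reason for rounding up, can be traced with a few lines of code. This is a sketch with assumed function names. Because it uses unrounded logarithms, the fractional result differs slightly in the second decimal from the hand calculation above, which works with logarithms rounded to four places.

```python
import math


def periods_needed(pv, fv, i):
    """N = (ln FV - ln PV) / ln(1 + i); may be fractional."""
    return (math.log(fv) - math.log(pv)) / math.log(1.0 + i)


pv, target, rate = 100.0, 200.0, 0.06
n_exact = periods_needed(pv, target, rate)
n_whole = math.ceil(n_exact)     # interest is paid only at period ends

print(f"Fractional periods: {n_exact:.4f}")
print(f"Whole periods:      {n_whole}")

# Balance just before and at the period in which the target is reached
for n in (n_whole - 1, n_whole):
    print(f"End of year {n}: ${pv * (1.0 + rate) ** n:,.2f}")
```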


THE TIME VALUE OF A 
SERIES OF CASH FLOWS 

Applications in finance may require the deter¬ 
mination of the present or future value of a se¬ 
ries of cash flows rather than simply a single 
cash flow. The principles of determining the fu¬ 
ture value or present value of a series of cash 
flows are the same as for a single cash flow, yet 
the math becomes a bit more cumbersome. 

Suppose that the following deposits are made 
in a Thrifty Savings and Loan account paying 
5% interest, compounded annually: 


Time When Deposit Is Made 

Amount of Deposit 

Today 

$1,000 

At the end of the first year 

2,000 

At the end of the second year 

1,500 


What is the balance in the savings account at 
the end of the second year if no withdrawals 
are made and interest is paid annually? 

Let's simplify any problem like this by referring
to today as the end of period 0, and
identifying the end of the first and each successive
period as 1, 2, 3, and so on. Represent each
end-of-period cash flow as "CF" with a subscript
specifying the period to which it corresponds.
Thus, CF_0 is a cash flow today, CF_10 is a cash
flow at the end of period 10, CF_25 is a cash
flow at the end of period 25, and so on.

Representing the information in our example
using cash flow and period notation:

Period    Cash Flow    End-of-Period Cash Flow
0         CF_0         $1,000
1         CF_1         $2,000
2         CF_2         $1,500


The future value of the series of cash flows at
the end of the second period is calculated as
follows:

          End-of-Period    Number of Periods     Compounding    Future
Period    Cash Flow        Interest Is Earned    Factor         Value
0         $1,000           2                     1.1025         $1,102.50
1          2,000           1                     1.0500          2,100.00
2          1,500           0                     1.0000          1,500.00
                                                                $4,702.50


The last cash flow, $1,500, was deposited at the 
very end of the second period—the point of 
time at which we wish to know the future value 
of the series. Therefore, this deposit earns no 
interest. In more formal terms, its future value 
is precisely equal to its present value. 

Today, the end of period 0, the balance in the 
account is $1,000 since the first deposit is made 
but no interest has been earned. At the end of 
period 1, the balance in the account is $3,050, 
made up of three parts: 

1. The first deposit, $1,000. 

2. $50 interest on the first deposit. 

3. The second deposit, $2,000. 

The balance in the account at the end of period 
2 is $4,702.50, made up of five parts: 

1. The first deposit, $1,000. 

2. The second deposit, $2,000. 

3. The third deposit, $1,500. 

4. $102.50 interest on the first deposit, $50 
earned at the end of the first period, $52.50 
more earned at the end of the second period. 

5. $100 interest earned on the second deposit at 
the end of the second period. 

These cash flows can also be represented in
a time line. A time line is used to help
graphically depict and sort out each cash flow in a
series. The time line for this example is shown
in Figure 4. From this example, you can see that
the future value of the entire series is the sum of
each of the compounded cash flows comprising
the series. In much the same way, we can
determine the future value of a series comprising any
number of cash flows. And if we need to, we can
determine the future value of a number of cash
flows before the end of the series.

End of period   0                  1                  2
Time            |------------------|------------------|
Cash flows      CF_0 = $1,000.00   CF_1 = $2,000.00   CF_2 = $1,500.00
                                   $2,000.00(1.05)   =       2,100.00
                $1,000.00(1.05)^2                    =       1,102.50
                                                 FV  =      $4,702.50

Figure 4 Time Line for the Future Value of a Series
of Uneven Cash Flows Deposited to Earn 5%
Compounded Interest per Period

For example, suppose you are planning to de¬ 
posit $1,000 today and at the end of each year 
for the next ten years in a savings account pay¬ 
ing 5% interest annually. If you want to know 
the future value of this series after four years, 
you compound each cash flow for the number 
of years it takes to reach four years. That is, you 
compound the first cash flow over four years, 
the second cash flow over three years, the third 
over two years, the fourth over one year, and 
the fifth you don't compound at all because you 
will have just deposited it in the bank at the end 
of the fourth year. 
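
The future value of an uneven series, including the value part of the way through a longer series, can be computed by compounding each cash flow for the number of periods remaining. The sketch below is illustrative; the function name and the `horizon` parameter are assumptions introduced here.

```python
def future_value_of_series(cash_flows, i, horizon=None):
    """Value at the end of period `horizon` of cash_flows[t], where
    cash_flows[t] occurs at the end of period t (t = 0 is today).
    Cash flows after the horizon are ignored."""
    if horizon is None:
        horizon = len(cash_flows) - 1
    return sum(cf * (1.0 + i) ** (horizon - t)
               for t, cf in enumerate(cash_flows[:horizon + 1]))


# Thrifty Savings and Loan deposits at 5%
thrifty = future_value_of_series([1_000, 2_000, 1_500], 0.05)
print(f"Balance at end of period 2: ${thrifty:,.2f}")

# $1,000 today and at the end of each of the next ten years,
# valued after four years (five deposits made by then)
eleven_deposits = [1_000] * 11
after_four = future_value_of_series(eleven_deposits, 0.05, horizon=4)
print(f"Balance at end of period 4: ${after_four:,.2f}")
```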

To determine the present value of a series of 
future cash flows, each cash flow is discounted 
back to the present, where the beginning of the 
first period, today, is designated as 0. As an 
example, consider the Thrifty Savings & Loan 
problem from a different angle. Instead of cal¬ 
culating what the deposits and the interest on 
these deposits will be worth in the future, let's 
calculate the present value of the deposits. The 
present value is what these future deposits are 
worth today. 

In the series of cash flows of $1,000 today,
$2,000 at the end of period 1, and $1,500 at
the end of period 2, each is discounted to the
present, 0, as follows:

          End-of-Period    Number of Periods    Discount    Present
Period    Cash Flow        of Discounting       Factor      Value
0         $1,000           0                    1.00000     $1,000.00
1         $2,000           1                    0.95238      1,904.76
2         $1,500           2                    0.90703      1,360.54
                                                       PV = $4,265.30


The present value of the series is the sum 
of the present value of these three cash flows, 
$4,265.30. For example, the $1,500 cash flow at 
the end of period 2 is worth $1,428.57 at the end 
of the first period and is worth $1,360.54 today. 
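
The same discounting can be expressed as a sum over the cash flows. This is a minimal sketch with an assumed function name. Because it does not round the individual discount factors, the total comes out one cent higher than the sum of the rounded present values in the table.

```python
def present_value_of_series(cash_flows, i):
    """Sum of CF_t / (1 + i)^t, where cash_flows[t] occurs at the
    end of period t and t = 0 is today (not discounted)."""
    return sum(cf / (1.0 + i) ** t for t, cf in enumerate(cash_flows))


deposits = [1_000, 2_000, 1_500]
pv_total = present_value_of_series(deposits, 0.05)
print(f"Present value of the series: ${pv_total:,.2f}")

# The $1,500 cash flow one period and two periods before it is received
print(f"$1,500 at end of period 1:   ${1_500 / 1.05:,.2f}")
print(f"$1,500 today:                ${1_500 / 1.05 ** 2:,.2f}")
```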

The present value of a series of cash flows can
be represented in notation form as:

    PV = CF_0 [1/(1 + i)]^0 + CF_1 [1/(1 + i)]^1
         + CF_2 [1/(1 + i)]^2 + ... + CF_N [1/(1 + i)]^N

For example, if there are cash flows today and
at the end of periods 1 and 2, today's cash flow
is not discounted, the first period cash flow is
discounted one period, and the second period
cash flow is discounted two periods.

We can represent the present value of a series
using summation notation as shown below:

    PV = \sum_{t=0}^{N} CF_t [1/(1 + i)]^t     (7)

This equation tells us that the present value of a
series of cash flows is the sum of the products of
each cash flow and its corresponding discount
factor.


Shortcuts: Annuities

There are valuation problems that require us to
evaluate a series of level cash flows (each cash
flow is the same amount as the others) received
at regular intervals. Let's suppose you expect to
deposit $2,000 at the end of each of the next four
years in an account earning 8% compounded
interest. How much will you have available at
the end of the fourth year?

As we just did for the future value of a series of
uneven cash flows, we can calculate the future
value (as of the end of the fourth year) of each
$2,000 deposit, compounding interest at 8%:

    FV = $2,000(1 + 0.08)^3 + $2,000(1 + 0.08)^2
         + $2,000(1 + 0.08)^1 + $2,000(1 + 0.08)^0
       = $2,519.40 + $2,332.80 + $2,160.00
         + $2,000 = $9,012.20

Figure 5 Balance in an Account in Which Deposits
of $2,000 Each Are Made Each Year (The
Balance in the Account Earns 8%)
(stacked bars by end of period, showing the 1st
through 4th deposits and accumulated interest)

Figure 5 shows the contribution of each deposit
and the accumulated interest at the end of
each period.

• At the end of the first year, there is $2,000.00
in the account because you have just made
your first deposit.

• At the end of the second year, there is $4,160.00 in
the account: two deposits of $2,000 each, plus
$160 interest (8% of $2,000).

• At the end of the third year, there is $6,492.80
in the account: three deposits of $2,000.00
each, plus accumulated interest of $492.80
[$160.00 + (0.08 × $4,000) + (0.08 × $160)].

• At the end of the fourth year, you would
have $9,012.20 available: four deposits of
$2,000 each, plus $1,012.20 accumulated
interest [$492.80 + (0.08 × $6,000) +
(0.08 × $492.80), allowing for rounding].

Notice that in our calculations, each deposit of
$2,000 is multiplied by a factor that corresponds
to an interest rate of 8% and the number of
periods that the deposit has been in the savings
account. Since the deposit of $2,000 is common
to each multiplication, we can simplify the math
a bit by multiplying the $2,000 by the sum of the
factors to get the same answer:

FV = $2,000(1.2597) + $2,000(1.1664) 

+ $2,000(1.0800) + $2,000(1.0000) 

= $9,012.20 
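
The year-by-year build-up of the account can also be followed by rolling the balance forward one period at a time: last period's balance earns interest, and then the new deposit is added. The sketch below uses assumed names and unrounded factors, so the final balance shows as $9,012.22 rather than the $9,012.20 obtained above with factors rounded to four decimal places.

```python
def running_balances(deposit, i, n):
    """End-of-period balances for n level deposits made at period ends."""
    balance = 0.0
    balances = []
    for _ in range(n):
        balance = balance * (1.0 + i) + deposit   # interest, then deposit
        balances.append(balance)
    return balances


balances = running_balances(2_000, 0.08, 4)
for year, balance in enumerate(balances, start=1):
    deposited = 2_000 * year
    print(f"Year {year}: balance ${balance:,.2f} "
          f"(deposits ${deposited:,}, interest ${balance - deposited:,.2f})")
```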

A series of cash flows of equal amount occurring
at even intervals is referred to as an annuity.
Determining the value of an annuity, whether
compounding or discounting, is simpler than
valuing uneven cash flows. If each CF_t is equal
(that is, all the cash flows are the same value)
and the first one occurs at the end of the first
period (t = 1), we can express the future value
of the series as:

    FV = \sum_{t=1}^{N} CF_t (1 + i)^{N-t}

where N is the last period and t indicates the time
period corresponding to a particular cash flow,
starting at 1 for an ordinary annuity. Since CF_t is
shorthand for CF_1, CF_2, CF_3, ..., CF_N, and we know
that CF_1 = CF_2 = CF_3 = ... = CF_N, let's make things
simple by using CF to indicate the same value
for the periodic cash flows. Rearranging the
future value equation we get:

    FV = CF \sum_{t=1}^{N} (1 + i)^{N-t}     (8)

This equation tells us that the future value of 
a level series of cash flows, occurring at regu¬ 
lar intervals beginning one period from today 
(notice that t starts at 1), is equal to the amount 
of cash flow multiplied by the sum of the com¬ 
pound factors. 

In a like manner, the equation for the present
value of a series of level cash flows beginning
after one period simplifies to:

    PV = \sum_{t=1}^{N} CF_t [1/(1 + i)]^t = CF \sum_{t=1}^{N} [1/(1 + i)]^t

or

    PV = CF \sum_{t=1}^{N} 1/(1 + i)^t     (9)


This equation tells us that the present value 
of an annuity is equal to the amount of one 
cash flow multiplied by the sum of the discount 
factors. 

Equations (8) and (9) are the valuation— 
future and present value—formulas for an or¬ 
dinary annuity. An ordinary annuity is a special 
form of annuity, where the first cash flow occurs 
at the end of the first period. 

To calculate the future value of an annuity 
we multiply the amount of the annuity (that is, 
the amount of one periodic cash flow) by the 
sum of the compound factors. The sum of these 
compounding factors for a given interest rate, i, 
and number of periods, N, is referred to as the 
future value annuity factor. Likewise, to calculate 
the present value of an annuity we multiply 
one cash flow of the annuity by the sum of the 
discount factors. The sum of the discounting 
factors for a given i and N is referred to as the 
present value annuity factor. 

Suppose you wish to determine the future 
value of a series of deposits of $1,000, deposited 
each year in the No Fault Vault Bank for five 
years, with the first deposit made at the end of 
the first year. If the NFV Bank pays 5% inter¬ 
est on the balance in the account at the end of 
each year and no withdrawals are made, what 
is the balance in the account at the end of the 
five years? 

Each $1,000 is deposited at a different time,
so it contributes a different amount to the
future value. For example, the first deposit
accumulates interest for four periods, contributing
$1,215.50 to the future value (at the end of
period 5), whereas the last deposit contributes
only $1,000 to the future value since it is
deposited at exactly the point in time when we
are determining the future value; hence, there is
no interest on this deposit.

The future value of an annuity is the sum of
the future value of each deposit:

          Amount of    Number of Periods     Compounding    Future
Period    Deposit      Interest Is Earned    Factor         Value
1         $1,000       4                     1.2155         $1,215.50
2          1,000       3                     1.1576          1,157.60
3          1,000       2                     1.1025          1,102.50
4          1,000       1                     1.0500          1,050.00
5          1,000       0                     1.0000          1,000.00
Total                                        5.5256         $5,525.60


The future value of the series of $1,000 deposits, 
with interest compounded at 5%, is $5,525.60. 
Since we know the value of one of the level pe¬ 
riod flows is $1,000, and the future value of the 
annuity is $5,525.60, and looking at the sum of 
the individual compounding factors, 5.5256, we 
can see that there is an easier way to calculate 
the future value of an annuity. If the sum of the 
individual compounding factors for a specific 
interest rate and a specific number of periods 
were available, all we would have to do is mul¬ 
tiply that sum by the value of one cash flow to 
get the future value of the entire annuity. 

In this example, the shortcut is multiplying
the amount of the annuity, $1,000, by the sum
of the compounding factors, 5.5256:

    FV = $1,000 × 5.5256 = $5,525.60

For large numbers of periods, summing the
individual factors can be a bit clumsy, with
possibilities of errors along the way. An
alternative formula for the sum of the
compound factors (that is, the future value annuity
factor) is:

    Future value annuity factor = [(1 + i)^N - 1] / i     (10)

In the last example, N = 5 and i = 5%:

    Future value annuity factor = [(1 + 0.05)^5 - 1] / 0.05
                                = (1.2763 - 1.0000) / 0.05
                                = 5.5256

Let's use the long method to find the present
value of the series of five deposits of $1,000 each,
with the first deposit at the end of the first
period. Then we'll do it using the shortcut method.
The calculations are similar to the future value
of an ordinary annuity, except we are taking
each deposit back in time, instead of forward:

          Amount of    Discounting    Discounting    Present
Period    Deposit      Periods        Factor         Value
1         $1,000       1              0.9524         $952.40
2          1,000       2              0.9070          907.00
3          1,000       3              0.8638          863.80
4          1,000       4              0.8227          822.70
5          1,000       5              0.7835          783.50
Total                                 4.3294        $4,329.40

The present value of this series of five deposits
is $4,329.40.

This same value is obtained by multiplying
the annuity amount of $1,000 by the sum of the
discounting factors, 4.3294:

    PV = $1,000 × 4.3294 = $4,329.40

Another, more convenient way of solving for
the present value of an annuity is to rewrite the
factor as:

    Present value annuity factor = [1 - 1/(1 + i)^N] / i     (11)

If there are many discount periods, this formula
is a bit easier to calculate. In our last example,

    Present value annuity factor = [1 - 1/(1 + 0.05)^5] / 0.05
                                 = (1 - 0.7835) / 0.05
                                 = 4.3295

which is different from the sum of the factors,
4.3294, due to rounding.
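
The closed-form annuity factors can be compared against the factor-by-factor sums to confirm that they agree. This is a sketch with assumed function names; it works with unrounded factors, so any differences between the two methods are at the level of floating-point error only.

```python
def fv_annuity_factor(i, n):
    """Closed form for the sum of (1 + i)^(N - t), t = 1..N."""
    return ((1.0 + i) ** n - 1.0) / i


def pv_annuity_factor(i, n):
    """Closed form for the sum of 1 / (1 + i)^t, t = 1..N."""
    return (1.0 - (1.0 + i) ** -n) / i


i, n = 0.05, 5
fv_sum = sum((1.0 + i) ** (n - t) for t in range(1, n + 1))
pv_sum = sum(1.0 / (1.0 + i) ** t for t in range(1, n + 1))

print(f"FV factor: closed form {fv_annuity_factor(i, n):.4f}, sum {fv_sum:.4f}")
print(f"PV factor: closed form {pv_annuity_factor(i, n):.4f}, sum {pv_sum:.4f}")
print(f"FV of $1,000 annuity: ${1_000 * fv_annuity_factor(i, n):,.2f}")
print(f"PV of $1,000 annuity: ${1_000 * pv_annuity_factor(i, n):,.2f}")
```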

We can turn this present value of an annuity
problem around to look at it from another
angle. Suppose you borrow $4,329.40 at an interest
rate of 5% per period and are required to pay
back this loan in five installments (N = 5): one
payment per period for five periods, starting
one period from now. The payments are
determined by equating the present value with the
product of the cash flow and the sum of the
discount factors:

    PV = CF(sum of discount factors)
       = CF \sum_{t=1}^{5} 1/(1 + 0.05)^t
       = CF(0.9524 + 0.9070 + 0.8638 + 0.8227 + 0.7835)
       = CF(4.3294)

substituting the known present value,

    $4,329.40 = CF(4.3294)

and rearranging to solve for the payment:

    CF = $4,329.40 / 4.3294 = $1,000.00

We can convince ourselves that five install¬ 
ments of $1,000 each can pay off the loan of 
$4,329.40 by carefully stepping through the cal¬ 
culation of interest and the reduction of the 
principal: 


Beginning-of-                Interest       Reduction in Loan    End-of-Period
Period Loan                  (Principal     Balance (Payment     Loan Balance
Balance        Payment       × 5%)          - Interest)
$4,329.40      $1,000.00     $216.47        $783.53              $3,545.87
 3,545.87       1,000.00      177.29         822.71               2,723.16
 2,723.16       1,000.00      136.16         863.84               1,859.32
 1,859.32       1,000.00       92.97         907.03                 952.29
   952.29       1,000.00       47.61         952.29 a                    0


a The small difference between calculated reduction 
($952.38) and reported reduction is due to rounding 
differences. 


For example, the first payment of $1,000
is used to: (1) pay interest on the loan at
5% ($4,329.40 × 0.05 = $216.47) and (2) pay
down the principal or loan balance ($1,000.00 -
$216.47 = $783.53 paid off). Each successive
payment pays off a greater amount of the loan:
as the principal amount of the loan is reduced,
less of each payment goes to paying off interest
and more goes to reducing the loan principal.
This analysis of the repayment of a loan is
referred to as loan amortization. Loan amortization
is the repayment of a loan with equal payments,
over a specified period of time. As we can see
from the example of borrowing $4,329.40, each
payment can be broken down into its interest
and principal components.
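
An amortization schedule of this kind can be generated mechanically. The sketch below is one possible implementation under stated assumptions: it derives the level payment from the present value annuity factor rather than taking $1,000 as given, so the payment comes out a couple of cents below $1,000 and the final balance is zero without a rounding adjustment. The function names are introduced here for illustration.

```python
def level_payment(principal, i, n):
    """Payment that amortizes `principal` over n periods at rate i:
    principal divided by the present value annuity factor."""
    return principal * i / (1.0 - (1.0 + i) ** -n)


def amortization_schedule(principal, i, n):
    """Rows of (period, payment, interest, reduction, ending balance)."""
    payment = level_payment(principal, i, n)
    balance = principal
    rows = []
    for period in range(1, n + 1):
        interest = balance * i            # interest on the opening balance
        reduction = payment - interest    # remainder pays down principal
        balance -= reduction
        rows.append((period, payment, interest, reduction, balance))
    return rows


schedule = amortization_schedule(4_329.40, 0.05, 5)
print("Period   Payment  Interest  Reduction   Balance")
for period, payment, interest, reduction, balance in schedule:
    print(f"{period:>6} {payment:>9,.2f} {interest:>9,.2f} "
          f"{reduction:>10,.2f} {abs(balance):>9,.2f}")
```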


VALUING CASH FLOWS 
WITH DIFFERENT TIME 
PATTERNS 

Valuing a Perpetual Stream of 
Cash Flows 

There are some circumstances where cash flows
are expected to continue forever. For example,
a corporation may promise to pay dividends on
preferred stock forever, or a company may
issue a bond that pays interest every six months,
forever. How do you value these cash flow
streams? Recall that when we calculated the
present value of an annuity, we took the amount
of one cash flow and multiplied it by the sum of
the discount factors that corresponded to the
interest rate and number of payments. But what if
the number of payments extends forever, into
infinity?

A series of cash flows that occur at regular
intervals, forever, is a perpetuity. Valuing a
perpetual cash flow stream is just like valuing an
ordinary annuity. It looks like this:

    PV = CF_1 [1/(1 + i)]^1 + CF_2 [1/(1 + i)]^2
         + CF_3 [1/(1 + i)]^3 + ... + CF_\infty [1/(1 + i)]^\infty

Simplifying, recognizing that the cash flows CF_t
are the same in each period, and using
summation notation,

    PV = CF \sum_{t=1}^{\infty} [1/(1 + i)]^t

As the number of discounting periods
approaches infinity, the summation approaches
1/i. To see why, consider the present value
annuity factor for an interest rate of 10%, as
the number of payments goes from 1 to 200:

Number of Discounting    Present Value
Periods, N               Annuity Factor
  1                      0.9091
 10                      6.1446
 40                      9.7791
100                      9.9993
200                      9.9999

For greater numbers of payments, the factor
approaches 10, or 1/0.10. Therefore, the present
value of a perpetual annuity is very close to:

    PV = CF / i     (12)

Suppose you are considering an investment 
that promises to pay $100 each period forever, 
and the interest rate you can earn on alterna¬ 
tive investments of similar risk is 5% per pe¬ 
riod. What are you willing to pay today for this 
investment? 


    PV = $100 / 0.05 = $2,000


Therefore, you would be willing to pay $2,000 
today for this investment to receive, in return, 
the promise of $100 each period forever. 

Let's look at the value of a perpetuity
another way. Suppose that you are given the
opportunity to purchase an investment for $5,000
that promises to pay $50 at the end of every
period forever. What is the interest rate
per period (the return) associated with this
investment?

We know that the present value is PV = $5,000
and the periodic, perpetual payment is CF =
$50. Inserting these values into the formula for
the present value of a perpetuity:

    $5,000 = $50 / i

Solving for i,

    i = $50 / $5,000 = 0.01 or 1% per period


Therefore, an investment of $5,000 that gener¬ 
ates $50 per period provides 1% compounded 
interest per period. 
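
The convergence of the annuity factor toward 1/i, and the two perpetuity calculations, can be verified with a short script. This is an illustrative sketch; the function names are assumptions introduced here.

```python
def pv_annuity_factor(i, n):
    """Sum of 1 / (1 + i)^t for t = 1..N, in closed form."""
    return (1.0 - (1.0 + i) ** -n) / i


def perpetuity_pv(cf, i):
    """Present value of a level cash flow received every period forever."""
    return cf / i


def perpetuity_rate(cf, pv):
    """Rate per period implied by paying `pv` for `cf` per period forever."""
    return cf / pv


# The factor at 10% approaches 1 / 0.10 = 10 as N grows
for n in (1, 10, 40, 100, 200):
    print(f"N = {n:>3}: factor = {pv_annuity_factor(0.10, n):.4f}")

print(f"PV of $100 forever at 5%: ${perpetuity_pv(100, 0.05):,.2f}")
print(f"Return on $5,000 paying $50 forever: {perpetuity_rate(50, 5_000):.2%}")
```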


Valuing an Annuity Due 

The ordinary annuity cash flow analysis as¬ 
sumes that cash flows occur at the end of each 
period. However, there is another fairly com¬ 
mon cash flow pattern in which level cash flows 
occur at regular intervals, but the first cash flow 
occurs immediately. This pattern of cash flows 
is called an annuity due. For example, if you win 
the Florida Lottery Lotto grand prize, you will 
receive your winnings in 20 installments (after 
taxes, of course). The 20 installments are paid 
out annually, beginning immediately. The lot¬ 
tery winnings are therefore an annuity due. 

Like the cash flows we have considered thus 
far, the future value of an annuity due can be 
determined by calculating the future value of 
each cash flow and summing them. And, the 
present value of an annuity due is determined 
in the same way as a present value of any stream 
of cash flows. 

Let's consider first an example of the future 
value of an annuity due, comparing the val¬ 
ues of an ordinary annuity and an annuity due, 
each comprising three cash flows of $500, com¬ 
pounded at the interest rate of 4% per period. 
The calculation of the future value of both the 
ordinary annuity and the annuity due at the end 
of three periods is: 

Ordinary annuity:

\[ FV = \$500 \sum_{t=1}^{3} (1 + 0.04)^{3-t} \]

Annuity due:

\[ FV_{due} = \$500 \sum_{t=1}^{3} (1 + 0.04)^{3-t+1} \]


The future value of each of the $500 payments 
in the annuity due calculation is compounded 
for one more period than for the ordinary annu¬ 
ity. For example, the first deposit of $500 earns 
interest for two periods in the ordinary annuity
situation [$500(1 + 0.04)^2], whereas the first
$500 in the annuity due case earns interest for
three periods [$500(1 + 0.04)^3].

In general terms,

\[ FV_{due} = CF \sum_{t=1}^{N} (1 + i)^{N-t+1} \qquad (13) \]

which is equal to the future value of an ordinary 
annuity multiplied by a factor of 1 + i: 

\[ FV_{due} = CF \times [\text{Future value annuity factor (ordinary) for } N \text{ and } i] \times (1 + i) \]

The present value of the annuity due is cal¬ 
culated in a similar manner, adjusting the ordi¬ 
nary annuity formula for the different number 
of discount periods: 

\[ PV_{due} = CF \sum_{t=1}^{N} \frac{1}{(1 + i)^{t-1}} \qquad (14) \]

Since the cash flows in the annuity due situa¬ 
tion are each discounted one less period than 
the corresponding cash flows in the ordinary 
annuity, the present value of the annuity due is 
greater than the present value of the ordinary 
annuity for an equivalent amount and number 
of cash flows. As with the future value of an annuity
due, we can specify the present value in terms
of the ordinary annuity factor:

\[ PV_{due} = CF \times [\text{Present value annuity factor (ordinary) for } N \text{ and } i] \times (1 + i) \]
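The relation between the two annuity types can be verified by summing the cash flows directly. This sketch uses the three $500 cash flows at 4% from the example; the function names are illustrative only.

```python
def fv_ordinary_annuity(cf: float, i: float, n: int) -> float:
    """Future value at the end of period n; cash flows at the end of each period."""
    return cf * sum((1 + i) ** (n - t) for t in range(1, n + 1))


def fv_annuity_due(cf: float, i: float, n: int) -> float:
    """Future value at the end of period n; cash flows at the start of each period."""
    return cf * sum((1 + i) ** (n - t + 1) for t in range(1, n + 1))


def pv_ordinary_annuity(cf: float, i: float, n: int) -> float:
    """Present value; first cash flow one period from today."""
    return cf * sum(1 / (1 + i) ** t for t in range(1, n + 1))


def pv_annuity_due(cf: float, i: float, n: int) -> float:
    """Present value; first cash flow occurs today."""
    return cf * sum(1 / (1 + i) ** (t - 1) for t in range(1, n + 1))


cf, i, n = 500.0, 0.04, 3
print(f"FV ordinary: {fv_ordinary_annuity(cf, i, n):,.2f}")
print(f"FV due:      {fv_annuity_due(cf, i, n):,.2f}")
print(f"PV ordinary: {pv_ordinary_annuity(cf, i, n):,.2f}")
print(f"PV due:      {pv_annuity_due(cf, i, n):,.2f}")
```

In both cases the annuity due value is the ordinary annuity value multiplied by (1 + i), since every cash flow is shifted one period earlier.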


Valuing a Deferred Annuity

A deferred annuity has a stream of cash flows 
of equal amounts at regular periods starting at 
some time after the end of the first period. When 
we calculated the present value of an annuity, 
we brought a series of cash flows back to the 
beginning of the first period (equivalently,
the end of period 0). With a deferred annuity,
we determine the present value of the ordinary 
annuity and then discount this present value to 
an earlier period. 

To illustrate the calculation of the future
value of an annuity due, suppose you deposit
$20,000 per year in an account for 10 years, starting
today, for a total of 10 deposits. What will
be the balance in the account at the end of 10
years if the balance in the account earns 5% per
year? The future value of this annuity due is:

\[ FV_{due,10} = \$20{,}000 \sum_{t=1}^{10} (1 + 0.05)^{10-t+1} \]

\[ = \$20{,}000 \times [\text{Future value annuity factor (ordinary) for 10 periods and 5\%}] \times (1 + 0.05) \]

\[ = \$20{,}000(12.5779)(1 + 0.05) = \$264{,}135.74 \]

Suppose you want to deposit an amount to¬ 
day in an account such that you can withdraw 
$5,000 per year for four years, with the first 
withdrawal occurring five years from today. We 
can solve this problem in two steps: 

Step 1: Solve for the present value of the with¬ 
drawals. 

Step 2: Discount this present value to the 
present. 


The first step requires determining the present 
value of a four-cash-flow ordinary annuity of 
$5,000. This calculation provides the present 
value as of the end of the fourth year (one pe¬ 
riod prior to the first withdrawal): 


\[ PV_4 = \$5{,}000 \sum_{t=1}^{4} \frac{1}{(1 + 0.04)^t} \]

\[ = \$5{,}000 \times (\text{present value annuity factor, } N = 4,\ i = 4\%) \]

\[ = \$18{,}149.48 \]


This means that there must be a balance in the 
account of $18,149.48 at the end of the fourth 
period to satisfy the withdrawals of $5,000 per 
year for four years. 

The second step requires discounting the 
$18,149.48—the savings goal—to the present, 
providing the deposit today that produces the 
goal: 


\[ PV_0 = \frac{\$18{,}149.48}{(1 + 0.04)^4} = \$15{,}514.25 \]


The balance in the account throughout the entire
eight-year period is shown in Figure 6 with
the balance indicated both before and after the
$5,000 withdrawals.

Figure 6 Balance in the Account that Requires a Deposit Today (Year 0) that Permits Withdrawals of
$5,000 Each Starting at the End of Year 5 [horizontal axis: Year (Today = 0)]
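The two-step calculation, and the year-by-year balance it implies, can be reproduced with a short simulation. This is a sketch assuming the 4% annual rate used in the equations above; the variable names are illustrative.

```python
rate = 0.04
withdrawal = 5_000.0

# Step 1: value at the end of year 4 of four withdrawals in years 5-8.
pv_year4 = withdrawal * sum(1 / (1 + rate) ** t for t in range(1, 5))

# Step 2: discount that amount four more years to today.
deposit_today = pv_year4 / (1 + rate) ** 4
print(f"Needed at end of year 4: {pv_year4:,.2f}")
print(f"Deposit today:           {deposit_today:,.2f}")

# Track the balance before and after each withdrawal.
balance = deposit_today
for year in range(1, 9):
    balance *= 1 + rate                  # interest earned during the year
    before = balance
    if year >= 5:
        balance -= withdrawal            # withdrawal at the end of the year
    print(f"Year {year}: before {before:>10,.2f}  after {balance:>10,.2f}")
```

The final balance is zero up to floating-point error, confirming that the deposit exactly funds the four withdrawals.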

Let's look at a more complex deferred annuity. 
Consider making a series of deposits, beginning 
today, to provide for a steady cash flow begin¬ 
ning at some future time period. If interest is 
earned at a rate of 4% compounded per year, 
what amount must be deposited in a savings 
account each year for four years, starting today 
so that $1,000 may be withdrawn each year for 
five years, beginning five years from today? As 
with any deferred annuity, we need to perform 
this calculation in steps: 

Step 1: Calculate the present value of the $1,000 
per year five-year ordinary annuity as of the 
end of the fourth year: 

The present value of the annuity deferred to 
the end of the fourth period is 

\[ PV_4 = \$1{,}000 \sum_{t=1}^{5} \frac{1}{(1 + 0.04)^t} = \$1{,}000(4.4518) = \$4{,}451.80 \]


Therefore, there must be $4,451.80 in the ac¬ 
count at the end of the fourth year to permit 
five $1,000 withdrawals at the end of each of 
the years 5, 6, 7, 8, and 9. 

Step 2: Calculate the cash flow needed to ar¬ 
rive at the future value of that annuity due 
comprising four annual deposits earning 4% 
compounded interest, starting today. 

The present value of the annuity at the end 
of the fourth year, $4,451.80, is the future value 
of the annuity due of four payments of an un¬ 
known amount. Using the formula for the fu¬ 
ture value of an annuity due, 

\[ \$4{,}451.80 = CF \sum_{t=1}^{4} (1 + 0.04)^{4-t+1} = CF(4.2465)(1.04) \]

and rearranging, 

CF = $4,451.80/4.4164 = $1,008.02 

Therefore, by depositing $1,008.02 today and 
the same amount on the same date each of the 
next three years, we will have a balance in the
account of $4,451.80 at the end of the fourth pe¬ 
riod. With this period 4 balance, we will be able 
to withdraw $1,000 at the end of the following 
five periods. 
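The two steps can be sketched in code and checked by simulating the account. Because this sketch carries unrounded factors rather than the four-decimal table factors, the deposit it computes differs from $1,008.02 by about two cents; the variable names are illustrative.

```python
rate = 0.04

# Step 1: value at the end of year 4 of five $1,000 withdrawals (years 5-9).
target_year4 = 1_000.0 * sum(1 / (1 + rate) ** t for t in range(1, 6))

# Step 2: future value factor of a four-payment annuity due, then solve for CF.
fv_due_factor = sum((1 + rate) ** (4 - t + 1) for t in range(1, 5))
deposit = target_year4 / fv_due_factor
print(f"Target at end of year 4: {target_year4:,.2f}")
print(f"Annual deposit:          {deposit:,.2f}")

# Check: deposits at the start of years 1-4, withdrawals at the end of years 5-9.
balance = 0.0
for year in range(1, 10):
    if year <= 4:
        balance += deposit       # deposit at the beginning of the year
    balance *= 1 + rate          # interest for the year
    if year >= 5:
        balance -= 1_000.0       # withdrawal at the end of the year
print(f"Balance after last withdrawal: {balance:.6f}")
```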


LOAN AMORTIZATION 


There are securities backed by various types 
of loans. These include asset-backed secu¬ 
rities, residential mortgage-backed securities, 
and commercial mortgage-backed securities. 
Consequently, it is important to understand the 
mathematics associated with loan amortization. 

If an amount is loaned and then repaid in in¬ 
stallments, we say that the loan is amortized. 
Therefore, loan amortization is the process of cal¬ 
culating the loan payments that amortize the 
loaned amount. We can determine the amount 
of the loan payments once we know the fre¬ 
quency of payments, the interest rate, and the 
number of payments. 

Consider a loan of $100,000. If the loan is re¬ 
paid in 24 annual installments (at the end of 
each year) and the interest rate is 5% per year, 
we calculate the amount of the payments by 
applying the relationship: 


\[ PV = \sum_{t=1}^{N} \frac{CF}{(1 + i)^t} \]

\[ \text{Amount loaned} = \sum_{t=1}^{N} \frac{\text{Loan payment}}{(1 + i)^t} \]

\[ \$100{,}000 = \sum_{t=1}^{24} \frac{\text{Loan payment}}{(1 + 0.05)^t} \]


We want to solve for the loan payment, that
is, the amount of the annuity. Using a financial
calculator or spreadsheet, the periodic loan
payment is $7,247.09 (PV = $100,000; N = 24;
i = 5%). Therefore, the annual payments are
$7,247.09 each. In other words, if payments of
$7,247.09 are made each year for 24 years (at
the end of each year), the $100,000 loan will
be repaid and the lender earns a return that is
equivalent to a 5% interest on this loan.
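What the calculator or spreadsheet does here can be sketched with the standard closed form of the annuity sum, 1/(1+i) + ... + 1/(1+i)^N = [1 - (1+i)^(-N)] / i. The function name `level_payment` is an illustrative choice.

```python
def level_payment(principal: float, i: float, n: int) -> float:
    """Level end-of-period payment that fully amortizes principal over n periods."""
    annuity_factor = (1 - (1 + i) ** -n) / i   # closed form of sum of 1/(1+i)^t
    return principal / annuity_factor


payment = level_payment(100_000.0, 0.05, 24)
print(f"Annual payment: {payment:,.2f}")

# Cross-check against the explicit summation used in the text.
explicit_factor = sum(1 / 1.05 ** t for t in range(1, 25))
print(f"Check:          {100_000.0 / explicit_factor:,.2f}")
```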

We can calculate the amount of interest and
principal repayment associated with each loan
payment using a loan amortization schedule, as
shown in Table 1.

The loan payments are determined such that 
after the last payment is made there is no loan 
balance outstanding. Thus, the loan is referred 
to as a fully amortizing loan. Even though the loan 
payment each year is the same, the proportion 
of interest and principal differs with each pay¬ 
ment: The interest is 5% of the principal amount 
of the loan that remains at the beginning of the 
period, whereas the principal repaid with each 
payment is the difference between the payment 
and the interest. As the payments are made, the 
remainder is applied to repayment of the princi¬ 
pal; this is referred to as the scheduled principal 
repayment or the amortization. As the principal 
remaining on the loan declines, less interest is 
paid with each payment. We show the decline 
in the loan's principal graphically in Figure 7. 
The decline in the remaining principal is not
linear but curvilinear, due to the compounding
of interest.
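The split of each payment into interest and principal can be generated period by period. The following is a minimal sketch of an amortization schedule for the $100,000, 24-year, 5% loan; the function name and the row layout are assumptions made for illustration.

```python
def amortization_schedule(principal: float, i: float, n: int):
    """Return rows of (period, payment, interest, principal_paid, remaining)."""
    payment = principal * i / (1 - (1 + i) ** -n)
    balance = principal
    rows = []
    for period in range(1, n + 1):
        interest = balance * i               # interest on beginning-of-period principal
        principal_paid = payment - interest  # remainder of the payment reduces principal
        balance -= principal_paid
        rows.append((period, payment, interest, principal_paid, balance))
    return rows


schedule = amortization_schedule(100_000.0, 0.05, 24)
for period, payment, interest, paid, remaining in schedule[:3] + schedule[-1:]:
    print(f"{period:>2}  {payment:>10,.2f}  {interest:>9,.2f}  {paid:>9,.2f}  {remaining:>11,.2f}")
```

The interest column falls and the principal column rises from one period to the next, which is why the remaining balance declines along a curve rather than a straight line.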

Loan amortization works the same whether 
this is a mortgage loan, a term loan, or any other 
loan in which the interest paid is determined on 
the basis of the remaining amount of the loan. 
The calculation of the loan amortization can be 
modified to suit different principal repayments, 
such as additional lump-sum payments, known 
as balloon payments. For example, if there is a 
$10,000 balloon payment at the end of the loan 
in the loan of $100,000 repaid over 24 years, the 
calculation of the payment is modified as: 


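\[ \text{Amount loaned} = \sum_{t=1}^{N} \frac{\text{Loan payment}}{(1 + i)^t} + \frac{\text{balloon payment}}{(1 + i)^N} \]

\[ \$100{,}000 = \sum_{t=1}^{24} \frac{\text{Loan payment}}{(1 + 0.05)^t} + \frac{\$10{,}000}{(1 + 0.05)^{24}} \]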


The loan payment that solves this equation
is $7,022.38 (PV = $100,000; N = 24; i = 5%;
FV = $10,000). The last payment (that is, at the
end of the 24th year) is the regular payment of
$7,022.38, plus the balloon payment, for a total
of $17,022.38. As you can see in Figure 8, the
loan amortization is slower when compared to
the loan without the balloon payment.

Table 1 Loan Amortization on a $100,000 Loan for 24 Years and an Interest Rate of 5% per Year

Payment  Loan         Beginning-of-the-  Interest     Principal Paid Off    Remaining
         Payment      Year Principal     on Loan      = Payment - Interest  Principal

 0                                                                          $100,000.00
 1       $7,247.09    $100,000.00        $5,000.00    $2,247.09             $97,752.91
 2       $7,247.09    $97,752.91         $4,887.65    $2,359.44             $95,393.47
 3       $7,247.09    $95,393.47         $4,769.67    $2,477.42             $92,916.05
 4       $7,247.09    $92,916.05         $4,645.80    $2,601.29             $90,314.76
 5       $7,247.09    $90,314.76         $4,515.74    $2,731.35             $87,583.41
 6       $7,247.09    $87,583.41         $4,379.17    $2,867.92             $84,715.49
 7       $7,247.09    $84,715.49         $4,235.77    $3,011.32             $81,704.17
 8       $7,247.09    $81,704.17         $4,085.21    $3,161.88             $78,542.29
 9       $7,247.09    $78,542.29         $3,927.11    $3,319.98             $75,222.32
10       $7,247.09    $75,222.32         $3,761.12    $3,485.97             $71,736.34
11       $7,247.09    $71,736.34         $3,586.82    $3,660.27             $68,076.07
12       $7,247.09    $68,076.07         $3,403.80    $3,843.29             $64,232.78
13       $7,247.09    $64,232.78         $3,211.64    $4,035.45             $60,197.33
14       $7,247.09    $60,197.33         $3,009.87    $4,237.22             $55,960.11
15       $7,247.09    $55,960.11         $2,798.01    $4,449.08             $51,511.03
16       $7,247.09    $51,511.03         $2,575.55    $4,671.54             $46,839.49
17       $7,247.09    $46,839.49         $2,341.97    $4,905.12             $41,934.37
18       $7,247.09    $41,934.37         $2,096.72    $5,150.37             $36,784.00
19       $7,247.09    $36,784.00         $1,839.20    $5,407.89             $31,376.11
20       $7,247.09    $31,376.11         $1,568.81    $5,678.28             $25,697.83
21       $7,247.09    $25,697.83         $1,284.89    $5,962.20             $19,735.63
22       $7,247.09    $19,735.63         $986.78      $6,260.31             $13,475.32
23       $7,247.09    $13,475.32         $673.77      $6,573.32             $6,901.99
24       $7,247.09    $6,901.99          $345.10      $6,901.99             $0.00
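The balloon case changes only the amount that the level payments must amortize: the loan less the present value of the balloon. A small sketch, with an illustrative function name:

```python
def payment_with_balloon(principal: float, i: float, n: int, balloon: float) -> float:
    """Level payment when a lump sum (balloon) is also due with the last payment."""
    pv_balloon = balloon / (1 + i) ** n        # present value of the balloon
    annuity_factor = (1 - (1 + i) ** -n) / i   # sum of 1/(1+i)^t for t = 1..n
    return (principal - pv_balloon) / annuity_factor


regular = payment_with_balloon(100_000.0, 0.05, 24, 10_000.0)
print(f"Regular payment: {regular:,.2f}")
print(f"Final payment:   {regular + 10_000.0:,.2f}")
```

Setting the balloon to zero recovers the fully amortizing payment of $7,247.09.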


Figure 7 Loan Amortization
[End-of-period loan principal for a $100,000 loan with interest of 5% per year
over the life of the loan. Vertical axis: loan principal remaining after the
payment, $0 to $100,000. Horizontal axis label as printed: Month.]

Figure 8 Loan Amortization with Balloon Payment
[End-of-period loan principal for a $100,000 loan with interest of 5% per year
over the life of the loan and a $10,000 balloon payment at the end of the loan.]

The same mathematics works with term loans.
Term loans are usually repaid in installments
either monthly, quarterly, semiannually, or annually.
Let's look at the typical repayment
schedule for a term loan. Suppose that BigRock
Corporation seeks a four-year term loan of $100
million. Let's assume for now that the term loan
carries a fixed interest rate of 8% and that level
payments are made monthly. If the annual interest
rate is 8%, the rate per month is 8% ÷ 12 =
0.6667% per month. In a typical term loan, the
payments are structured such that the loan is
fully amortizing.

For this four-year, $100 million term loan
with an 8% interest rate, the monthly payment
is $2,441,292.23 (PV = $100,000,000; N = 48;
i = 0.6667%). This amount is determined by
solving for the annuity payment that equates
the present value of the payments with the
amount of the loan, considering a discount
rate of 0.6667%. In Table 2 we show for each
month the beginning monthly balance, the interest
payment for the month, the amount of the
monthly payment applied to principal, and the
ending loan balance. Notice
that in our illustration, the ending loan balance
is zero. That is, it is a fully amortizing loan.

In the loan amortization examples so far, we 
have assumed that the interest rate is fixed 
throughout the loan. However, in many loans 
the interest rate may change during the loan, 
as in the case of a floating-rate loan. The new 
loan rate at the reset date is determined by a 
formula. The formula is typically composed of 
two parts. The first is the reference rate. For
example, in a monthly pay loan, the loan rate
might be one-month London Interbank Offered 
Rate (LIBOR). The second part is a spread that 
is added to the reference rate. This spread is re¬ 
ferred to as the quoted margin and depends on 
the credit of the borrower. 

A floating-rate loan requires a recalculation
of the loan payment and payment schedule
each time the loan rate is reset. Suppose in the
case of BigRock's term loan that the rate remains
constant for the first three years, but is reset to
9% in the fourth year. This requires BigRock to
pay off the principal remaining at the end of
three years, $28,064,562.84, in the remaining
12 payments. The revised schedule of payments
and payoff for the fourth year requires a
payment of $2,454,287.47 (PV = $28,064,562.84;
N = 12; i = 0.09 ÷ 12 = 0.75%), as shown in
Table 3.
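The reset can be sketched as three steps: compute the original payment, roll the balance forward 36 months, and re-amortize the remaining principal at the new rate. The function names are illustrative, and the sketch carries unrounded monthly rates (0.08/12 and 0.09/12), so results may differ from the printed figures in the last cent or so.

```python
def level_payment(principal: float, i: float, n: int) -> float:
    """Level end-of-period payment that fully amortizes principal over n periods."""
    return principal * i / (1 - (1 + i) ** -n)


def balance_after(principal: float, i: float, payment: float, k: int) -> float:
    """Remaining principal after k level payments."""
    balance = principal
    for _ in range(k):
        balance = balance * (1 + i) - payment   # add interest, subtract payment
    return balance


loan = 100_000_000.0
old_rate = 0.08 / 12
new_rate = 0.09 / 12

old_payment = level_payment(loan, old_rate, 48)
remaining = balance_after(loan, old_rate, old_payment, 36)
new_payment = level_payment(remaining, new_rate, 12)

print(f"Payment, months 1-36:     {old_payment:,.2f}")
print(f"Principal after 36 months: {remaining:,.2f}")
print(f"Payment, months 37-48:    {new_payment:,.2f}")
```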


Table 2 Term Loan Schedule: Fixed Rate, Fully Amortized

Amount of loan     $100,000,000
Interest rate      8% per year
Number of years    4
Monthly payment    $2,441,292.23

Month  Beginning-of-the-   Interest       Principal Paid Off    Remaining
       Month Principal     on Loan        = Payment - Interest  Principal

 1     $100,000,000.00     $666,666.67    $1,774,625.57         $98,225,374.43
 2     $98,225,374.43      $654,835.83    $1,786,456.40         $96,438,918.03
 3     $96,438,918.03      $642,926.12    $1,798,366.11         $94,640,551.91
 4     $94,640,551.91      $630,937.01    $1,810,355.22         $92,830,196.69
 5     $92,830,196.69      $618,867.98    $1,822,424.26         $91,007,772.44
 6     $91,007,772.44      $606,718.48    $1,834,573.75         $89,173,198.69
 7     $89,173,198.69      $594,487.99    $1,846,804.24         $87,326,394.44
 8     $87,326,394.44      $582,175.96    $1,859,116.27         $85,467,278.17
 9     $85,467,278.17      $569,781.85    $1,871,510.38         $83,595,767.79
10     $83,595,767.79      $557,305.12    $1,883,987.12         $81,711,780.68
11     $81,711,780.68      $544,745.20    $1,896,547.03         $79,815,233.65
12     $79,815,233.65      $532,101.56    $1,909,190.68         $77,906,042.97
13     $77,906,042.97      $519,373.62    $1,921,918.61         $75,984,124.36
14     $75,984,124.36      $506,560.83    $1,934,731.41         $74,049,392.95
15     $74,049,392.95      $493,662.62    $1,947,629.61         $72,101,763.34
16     $72,101,763.34      $480,678.42    $1,960,613.81         $70,141,149.52
17     $70,141,149.52      $467,607.66    $1,973,684.57         $68,167,464.95
18     $68,167,464.95      $454,449.77    $1,986,842.47         $66,180,622.49
19     $66,180,622.49      $441,204.15    $2,000,088.08         $64,180,534.40
20     $64,180,534.40      $427,870.23    $2,013,422.00         $62,167,112.40
21     $62,167,112.40      $414,447.42    $2,026,844.82         $60,140,267.58
22     $60,140,267.58      $400,935.12    $2,040,357.12         $58,099,910.46
23     $58,099,910.46      $387,332.74    $2,053,959.50         $56,045,950.96
24     $56,045,950.96      $373,639.67    $2,067,652.56         $53,978,298.40
25     $53,978,298.40      $359,855.32    $2,081,436.91         $51,896,861.49
26     $51,896,861.49      $345,979.08    $2,095,313.16         $49,801,548.33
27     $49,801,548.33      $332,010.32    $2,109,281.91         $47,692,266.42
28     $47,692,266.42      $317,948.44    $2,123,343.79         $45,568,922.63
29     $45,568,922.63      $303,792.82    $2,137,499.42         $43,431,423.21
30     $43,431,423.21      $289,542.82    $2,151,749.41         $41,279,673.80
31     $41,279,673.80      $275,197.83    $2,166,094.41         $39,113,579.39
32     $39,113,579.39      $260,757.20    $2,180,535.04         $36,933,044.35
33     $36,933,044.35      $246,220.30    $2,195,071.94         $34,737,972.42
34     $34,737,972.42      $231,586.48    $2,209,705.75         $32,528,266.66
35     $32,528,266.66      $216,855.11    $2,224,437.12         $30,303,829.54
36     $30,303,829.54      $202,025.53    $2,239,266.70         $28,064,562.84
37     $28,064,562.84      $187,097.09    $2,254,195.15         $25,810,367.69
38     $25,810,367.69      $172,069.12    $2,269,223.12         $23,541,144.57
39     $23,541,144.57      $156,940.96    $2,284,351.27         $21,256,793.30
40     $21,256,793.30      $141,711.96    $2,299,580.28         $18,957,213.02
41     $18,957,213.02      $126,381.42    $2,314,910.81         $16,642,302.21
42     $16,642,302.21      $110,948.68    $2,330,343.55         $14,311,958.66
43     $14,311,958.66      $95,413.06     $2,345,879.18         $11,966,079.48
44     $11,966,079.48      $79,773.86     $2,361,518.37         $9,604,561.11
45     $9,604,561.11       $64,030.41     $2,377,261.83         $7,227,299.28
46     $7,227,299.28       $48,182.00     $2,393,110.24         $4,834,189.04
47     $4,834,189.04       $32,227.93     $2,409,064.31         $2,425,124.74
48     $2,425,124.74       $16,167.50     $2,425,124.74         $0.00

Table 3 Term Loan Schedule: Reset Rate, Fully Amortized

Amount of loan     $100,000,000
Interest rate      8% per year for the first 3 years, 9% thereafter
Number of years    4
Monthly payment    $2,441,292.23 for the first 3 years, $2,454,287.47 for the fourth year

Month  Beginning-of-the-   Interest       Principal Paid Off    Remaining
       Month Principal     on Loan        = Payment - Interest  Principal

37     $28,064,562.84      $210,484.22    $2,243,803.24         $25,820,759.59
38     $25,820,759.59      $193,655.70    $2,260,631.77         $23,560,127.82
39     $23,560,127.82      $176,700.96    $2,277,586.51         $21,282,541.32
40     $21,282,541.32      $159,619.06    $2,294,668.41         $18,987,872.91
41     $18,987,872.91      $142,409.05    $2,311,878.42         $16,675,994.49
42     $16,675,994.49      $125,069.96    $2,329,217.51         $14,346,776.99
43     $14,346,776.99      $107,600.83    $2,346,686.64         $12,000,090.35
44     $12,000,090.35      $90,000.68     $2,364,286.79         $9,635,803.56
45     $9,635,803.56       $72,268.53     $2,382,018.94         $7,253,784.62
46     $7,253,784.62       $54,403.38     $2,399,884.08         $4,853,900.54
47     $4,853,900.54       $36,404.25     $2,417,883.21         $2,436,017.33
48     $2,436,017.33       $18,270.13     $2,436,017.34         $0.00


THE CALCULATION OF
INTEREST RATES AND
YIELDS

The calculation of the present or future value
of a lump-sum or set of cash flows requires
information on the timing of cash flows and the
compound or discount rate. However, there are
many applications in which we are presented
with values and cash flows, and wish to calculate
the yield or implied interest rate associated
with these values and cash flows. By calculating
the yield or implied interest rate, we can
then compare investment or financing opportunities.
We first look at how interest rates are
stated and how the effective interest rate can be
calculated based on this stated rate, and then
we look at how to calculate the yield, or rate of
return, on a set of cash flows.

Annual Percentage Rate versus 
Effective Annual Rate 

A common problem in finance is comparing 
alternative financing or investment opportuni¬ 
ties when the interest rates are specified in a 
way that makes it difficult to compare terms. 
The Truth in Savings Act requires institutions 
to provide the annual percentage yield for 
savings accounts. As a result of this law, con¬ 
sumers can compare the yields on different 
savings arrangements. But this law does not ap¬ 
ply beyond savings accounts. One investment 
may pay 10% interest compounded semiannu¬ 
ally, whereas another investment may pay 9% 
interest compounded daily. One financing arrangement
may require interest compounding
quarterly, whereas another may require inter¬ 
est compounding monthly. To compare invest¬ 
ments or financing with different frequencies of 
compounding, we must first translate the stated 
interest rates into a common basis. There are 
two ways to convert interest rates stated over 
different time intervals so that they have a com¬ 
mon basis: the annual percentage rate and the 
effective annual interest rate. 

One obvious way to represent rates stated in 
various time intervals on a common basis is 
to express them in the same unit of time—so 
we annualize them. The annualized rate is the 
product of the stated rate of interest per com¬ 
pound period and the number of compounding 
periods in a year. Let i be the rate of interest per 
period and n be the number of compounding 
periods in a year. The annualized rate, also re¬ 
ferred to as the nominal interest rate or the annual 
percentage rate (APR) is: 

APR = i x n 

Consider the following example. Suppose the 
Lucky Break Loan Company has simple loan 
terms: Repay the amount borrowed, plus 50%, 
in six months. Suppose you borrow $10,000 
from Lucky. After six months, you must pay 
back the $10,000 plus $5,000. The APR on financing
with Lucky is the interest rate per period
(50% for six months) multiplied by the
number of compound periods in a year (two six- 
month periods in a year). For the Lucky Break 
financing arrangement: 

APR = 0.50 x 2 = 1.00 or 100% per year 

But what if you cannot pay Lucky back after 
six months? Lucky will let you off this time, but 
you must pay back the following at the end of 
the next six months: 

* The $10,000 borrowed. 

* The $5,000 interest from the first six months. 

* The 50% of interest on both the unpaid 
$10,000 and the unpaid $5,000 interest 
($15,000 (0.50) = $7,500). 

So, at the end of the year, knowing what is 
good for you, you pay off Lucky: 

Amount of the original loan          $10,000
Interest from first six months         5,000
Interest on second six months          7,500
Total payment at end of the year     $22,500

Using the Lucky Break method of financing,
you have to pay $12,500 interest to borrow
$10,000 for one year's time. You therefore
pay not 100% interest, but rather 125% interest
per year ($12,500/$10,000 = 1.25 = 125%).
What's going on here? It
looks like the APR in the Lucky Break example 
ignores the compounding (interest on interest) 
that takes place after the first six months. And 
that's the way it is with all APRs. The APR ig¬ 
nores the effect of compounding. Therefore, this 
rate understates the true annual rate of interest 
if interest is compounded at any time prior to 
the end of the year. Nevertheless, APR is an ac¬ 
ceptable method of disclosing interest on many 
lending arrangements, since it is easy to un¬ 
derstand and simple to compute. However, be¬ 
cause it ignores compounding, it is not the best 
way to convert interest rates to a common basis. 
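The gap between the 100% APR and the 125% actually paid can be reproduced by rolling the Lucky Break balance forward one six-month period at a time. This is a small sketch; the variable names are illustrative.

```python
principal = 10_000.0
rate_per_period = 0.50        # 50% per six-month period
periods_per_year = 2

# APR simply scales the periodic rate and ignores interest on interest.
apr = rate_per_period * periods_per_year

# Roll the unpaid balance forward, charging interest on the full balance each period.
balance = principal
for _ in range(periods_per_year):
    balance += balance * rate_per_period

interest_paid = balance - principal
effective_rate = interest_paid / principal

print(f"APR:            {apr:.0%}")
print(f"Total repaid:   {balance:,.0f}")
print(f"Interest paid:  {interest_paid:,.0f}")
print(f"Effective rate: {effective_rate:.0%}")
```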


Another way of converting stated interest 
rates to a common basis is the effective rate of 
interest. The effective annual rate (EAR) is the 
true economic return for a given time period— 
it takes into account the compounding of 
interest—and is also referred to as the effective 
rate of interest. 

Using our Lucky Break example, we see that 
we must pay $12,500 interest on the loan of 
$10,000 for one year. Effectively, we are paying 
125% annual interest. Thus, 125% is the effec¬ 
tive annual rate of interest. In this example, we 
can easily work through the calculation of in¬ 
terest and interest on interest. But for situations 
where interest is compounded more frequently, 
we need a direct way to calculate the effective 
annual rate. We can calculate it by resorting 
once again to our basic valuation equation: 

\[ FV = PV(1 + i)^n \]

Next, we consider that a return is the change 
in the value of an investment over a period and 
an annual return is the change in value over 
a year. Using our basic valuation equation, the 
relative change in value is the difference be¬ 
tween the future value and the present value, 
divided by the present value: 

\[ EAR = \frac{FV - PV}{PV} = \frac{PV(1 + i)^n - PV}{PV} \]

Canceling PV from both the numerator and 
the denominator, 

\[ EAR = (1 + i)^n - 1 \qquad (15) \]

Let's look at how the EAR is affected by the
compounding. Suppose that the Safe Savings 
and Loan promises to pay 6% interest on ac¬ 
counts, compounded annually. Since interest is 
paid once, at the end of the year, the effective 
annual return, EAR, is 6%. If the 6% interest 
is paid on a semiannual basis—3% every six 
months—the effective annual return is larger 
than 6% since interest is earned on the 3% inter¬ 
est earned at the end of the first six months. In 
this case, to calculate the EAR, the interest rate 
per compounding period—six months—is 0.03 
(that is, 0.06/2) and the number of compounding
periods in an annual period is 2:

\[ EAR = (1 + 0.03)^2 - 1 = 1.0609 - 1 = 0.0609 \text{ or } 6.09\% \]

Extending this example to the case of quar¬ 
terly compounding with a nominal interest rate 
of 6%, we first calculate the interest rate per 
period, i, and the number of compounding periods
in a year, n:

\[ i = 0.06/4 = 0.015 \text{ per quarter} \]
\[ n = 4 \text{ quarters in a year} \]

The EAR is:

\[ EAR = (1 + 0.015)^4 - 1 = 1.0614 - 1 = 0.0614 \text{ or } 6.14\% \]

As we saw earlier, the extreme frequency 
of compounding is continuous compounding. 
Continuous compounding is when interest is 
compounded at the smallest possible increment 
of time. In continuous compounding, the rate 
per period becomes extremely small: 

\[ i = \frac{APR}{\infty} \]

And the number of compounding periods in a
year, n, is infinite. The EAR is therefore:

\[ EAR = e^{APR} - 1 \qquad (16) \]

where e is the natural logarithmic base.

For the stated 6% annual interest rate compounded
continuously, the EAR is:

\[ EAR = e^{0.06} - 1 = 1.0618 - 1 = 0.0618 \text{ or } 6.18\% \]

The relation between the frequency of com¬ 
pounding for a given stated rate and the ef¬ 
fective annual rate of interest for this example 
indicates that the greater the frequency of com¬ 
pounding, the greater the EAR. 


Frequency of      Calculation             Effective
Compounding                               Annual Rate

Annual            (1 + 0.060)^1 - 1       6.00%
Semiannual        (1 + 0.030)^2 - 1       6.09%
Quarterly         (1 + 0.015)^4 - 1       6.14%
Continuous        e^0.06 - 1              6.18%
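The pattern above can be reproduced directly from equations (15) and (16). A minimal sketch, with illustrative function names, for a stated rate of 6%:

```python
import math


def ear_discrete(apr: float, n: int) -> float:
    """Effective annual rate with n compounding periods per year: (1 + APR/n)^n - 1."""
    return (1 + apr / n) ** n - 1


def ear_continuous(apr: float) -> float:
    """Effective annual rate under continuous compounding: e^APR - 1."""
    return math.exp(apr) - 1


apr = 0.06
for label, n in (("Annual", 1), ("Semiannual", 2), ("Quarterly", 4)):
    print(f"{label:<11} {ear_discrete(apr, n):.2%}")
print(f"{'Continuous':<11} {ear_continuous(apr):.2%}")
```

Raising n further (for example to 365) moves the discrete result toward, but never above, the continuous figure.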


Figuring out the effective annual rate is use¬ 
ful when comparing interest rates for different 
investments. It doesn't make sense to compare 
the APRs for different investments having a dif¬ 
ferent frequency of compounding within a year. 
But since many investments have returns stated 
in terms of APRs, we need to understand how 
to work with them. 

To illustrate how to calculate effective annual
rates, consider the rates offered by two banks,
Bank A and Bank B. Bank A offers 9.2% compounded
semiannually and Bank B offers
9% compounded daily. We can compare these
rates using the EARs. Which bank offers the
higher interest rate? The effective annual rate
for Bank A is (1 + 0.046)^2 - 1 = 9.41%. The effective
annual rate for Bank B is (1 + 0.09/365)^365 -
1 = 9.42%. Therefore, Bank B offers the higher
interest rate.
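The comparison is close enough that it is worth computing with the unrounded daily rate. A short sketch (names are illustrative):

```python
def effective_annual_rate(stated_rate: float, periods_per_year: int) -> float:
    """EAR for a stated annual rate compounded periods_per_year times a year."""
    return (1 + stated_rate / periods_per_year) ** periods_per_year - 1


ear_a = effective_annual_rate(0.092, 2)    # Bank A: 9.2% compounded semiannually
ear_b = effective_annual_rate(0.09, 365)   # Bank B: 9% compounded daily
better = "Bank B" if ear_b > ear_a else "Bank A"

print(f"Bank A EAR: {ear_a:.4%}")
print(f"Bank B EAR: {ear_b:.4%}")
print(f"Higher effective rate: {better}")
```

The two effective rates differ by less than one hundredth of a percentage point, so rounding either rate to one decimal place would hide the difference.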


Yields on Investments 


Suppose an investment opportunity requires an 
investor to put up $1 million and offers cash in¬ 
flows of $500,000 after one year and $600,000 
after two years. The return on this investment, 
or yield, is the discount rate that equates the 
present values of the $500,000 and $600,000 cash 
inflows to equal the present value of the $1 mil¬ 
lion cash outflow. This yield is also referred to as 
the internal rate of return (IRR) and is calculated 
as the rate that solves the following: 


\[ \$1{,}000{,}000 = \frac{\$500{,}000}{(1 + IRR)^1} + \frac{\$600{,}000}{(1 + IRR)^2} \]


Unfortunately, there is in general no direct mathematical
solution (that is, closed-form solution) for the
IRR, so we must use an iterative procedure.
Fortunately, financial calculators and
financial software ease our burden in this calculation.
The IRR that solves this equation is
6.3941%:


\[ \$1{,}000{,}000 = \frac{\$500{,}000}{(1.063941)^1} + \frac{\$600{,}000}{(1.063941)^2} \]


In other words, if you invest $1 million today 
and receive $500,000 in one year and $600,000 
in two years, the return on your investment is
6.3941%. 

Another way of looking at this same yield 
is to consider that an investment's IRR is the 
discount rate that makes the present value of 
all expected future cash flows—both the cash 
outflows for the investment and the subsequent 
inflows—equal to zero. We can represent the 
IRR as the rate that solves: 

\[ \$0 = \sum_{t=0}^{N} \frac{CF_t}{(1 + IRR)^t} \]

Consider another example. Suppose an in¬ 
vestment of $1 million produces no cash flow 
in the first year but cash flows of $200,000, 
$300,000, and $900,000 two, three, and four 
years from now, respectively. The IRR for this 
investment is the discount rate that solves: 

\[ \$0 = -\frac{\$1{,}000{,}000}{(1 + IRR)^0} + \frac{\$0}{(1 + IRR)^1} + \frac{\$200{,}000}{(1 + IRR)^2} + \frac{\$300{,}000}{(1 + IRR)^3} + \frac{\$900{,}000}{(1 + IRR)^4} \]

Using a calculator or a computer, we get the 
precise answer of 10.172% per year. 
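One simple iterative procedure is bisection on the net present value. The sketch below is only one of several possible approaches (calculators and spreadsheets may use different root-finding methods); it assumes the NPV changes sign exactly once between the bracketing rates, and the function names are illustrative.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Present value of cash_flows, where cash_flows[t] occurs at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Find the rate where NPV = 0 by bisection on the interval [lo, hi]."""
    npv_lo = npv(lo, cash_flows)
    if (npv_lo > 0) == (npv(hi, cash_flows) > 0):
        raise ValueError("NPV does not change sign on the interval [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        npv_mid = npv(mid, cash_flows)
        if (npv_mid > 0) == (npv_lo > 0):
            lo, npv_lo = mid, npv_mid    # root lies in the upper half
        else:
            hi = mid                     # root lies in the lower half
    return (lo + hi) / 2


first = irr([-1_000_000.0, 500_000.0, 600_000.0])
second = irr([-1_000_000.0, 0.0, 200_000.0, 300_000.0, 900_000.0])
print(f"First example IRR:  {first:.4%}")
print(f"Second example IRR: {second:.3%}")
```

The initial outflow is entered as a negative cash flow at t = 0, which is what makes the sum of discounted cash flows equal zero at the IRR.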

We can use this approach to calculate the yield 
on any type of investment, as long as we know 
the cash flows—both positive and negative— 
and the timing of these flows. Consider the 
case of the yield to maturity on a bond. Most 
bonds pay interest semiannually—that is, ev¬ 
ery six months. Therefore, when calculating the 
yield on a bond, we must consider the timing 
of the cash flows to be such that the discount 
period is six months. 

Consider a bond that has a current price of 
90; that is, if the par value of the bond is $1,000, 
the bond's price is 90% of $1,000 or $900. And 
suppose that this bond has five years remain¬ 
ing to maturity and an 8% coupon rate. With 
five years remaining to maturity, the bond has 
10 six-month periods remaining. With a coupon 
rate of 8%, this means that the cash flow for
interest is $40 every six months. For this
bond, we therefore have the following
information:


1. Present value = $900 

2. Number of periods to maturity = 10 

3. Cash flow every six months = $40 

4. Additional cash flow at maturity = $1,000 


The six-month yield, r_d, is the discount rate
that solves the following:

\[ \$900 = \sum_{t=1}^{10} \frac{\$40}{(1 + r_d)^t} + \frac{\$1{,}000}{(1 + r_d)^{10}} \]


Using a calculator or spreadsheet, the six- 
month yield is 5.315%. Bond yields are gen¬ 
erally stated on the basis of an annualized 
yield, referred to as the yield to maturity 
(YTM) on a bond-equivalent basis. This YTM 
is analogous to the APR with semiannual 
compounding. Therefore, yield to maturity is 
10.63%. 
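The bond calculation above can be sketched in a few lines. This is an illustrative sketch only; the function names and the yield bracket of 0% to 100% per six-month period are assumptions for the example.

```python
def bond_price(six_month_yield, coupon, face, periods):
    """Present value of the coupons plus the face value, discounted per six-month period."""
    pv_coupons = sum(coupon / (1.0 + six_month_yield) ** t for t in range(1, periods + 1))
    return pv_coupons + face / (1.0 + six_month_yield) ** periods


def solve_six_month_yield(price, coupon, face, periods, low=0.0, high=1.0, tol=1e-12):
    """Bisection: the price falls as the yield rises, so move toward the target price."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if bond_price(mid, coupon, face, periods) > price:
            low = mid             # computed price too high, so the yield must be higher
        else:
            high = mid
    return (low + high) / 2.0


r_d = solve_six_month_yield(price=900.0, coupon=40.0, face=1000.0, periods=10)
ytm_bond_equivalent = 2.0 * r_d   # bond-equivalent basis: double the six-month yield
print(f"six-month yield = {r_d:.3%}, YTM = {ytm_bond_equivalent:.2%}")
```

Doubling the six-month yield follows the bond-equivalent convention described in the text; it does not compound the two half-year periods.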


KEY POINTS 

• A present value can be translated into a value in the future through compounding. The extreme frequency of compounding is continuous compounding.

• A future value can be converted into an equivalent value today through discounting.

• Applications in finance may require the determination of the present or future value of a series of cash flows rather than simply a single cash flow. The principles of determining the future value or present value of a series of cash flows are the same as for a single cash flow. That is, any number of cash flows can be translated into a present or future value.

• When faced with a series of cash flows, a financial modeler must value each cash flow individually, and then sum these individual values to arrive at the present value of the series.

• The tools of the time value of money can be used to value many different patterns of cash flows, including perpetuities, annuities due, and deferred annuities. Applying the tools to these different patterns of cash flows requires specifying the timing of the various cash flows.

• The interest on alternative investments is stated in different terms, so these interest rates must be placed on a common basis so that investment alternatives can be compared. Typically, an interest rate on an annual basis is specified, using either the annual percentage rate or the effective annual rate. The latter method is preferred since it takes into consideration the compounding of interest within a year.

• The yield on an investment (also referred to as internal rate of return) is the interest rate that makes the present value of the future cash flows equal to the cost of the investment.


NOTE

1. For a more detailed treatment of this topic, see Drake and Fabozzi (2009). The topic is covered in finite mathematics textbooks. See, for example, Barnett, Ziegler, and Byleen (2002), Mizrahi and Sullivan (1999), and Rolf (2007).

Abstract: Ordinary algebra deals with operations such as addition and multiplication performed on individual numbers. In many applications, however, it is useful to consider operations performed on ordered arrays of numbers. This is the domain of matrix algebra. Ordered arrays of numbers are called vectors and matrices while individual numbers are called scalars.


In financial modeling, it is useful to consider operations performed on ordered arrays of numbers. Ordered arrays of numbers are called vectors and matrices while individual numbers are called scalars. In this entry, we will discuss some concepts, operations, and results of matrix algebra used in financial modeling.

VECTORS AND MATRICES DEFINED

We begin by defining the concepts of vector and matrix. Though vectors can be thought of as particular matrices, in many cases it is useful to keep the two concepts (vectors and matrices) distinct. In particular, a number of important concepts and properties can be defined for vectors but do not generalize easily to matrices (see Note 1).

Vectors 

An n-dimensional vector is an ordered array of n numbers. Vectors are generally indicated with boldface lowercase letters, although we do not always follow that convention in this book. Thus a vector x is an array of the form:

\[ x = [x_1, \ldots, x_n] \]

The numbers \(x_i\) are called the components of the vector x.

A vector is identified by the set of its components. Vectors can be row vectors or column vectors. If the vector components appear in a horizontal row, then the vector is called a row vector, as for instance the vector:

\[ x = [1, 2, 8, 7] \]

Here are two examples. Suppose that we let \(w_n\) be a risky asset's weight in a portfolio. Assume that there are N risky assets. Then the following vector, w, is a row vector that represents a portfolio's holdings of the N risky assets:

\[ w = [w_1 \; w_2 \; \cdots \; w_N] \]

As a second example of a row vector, suppose that we let \(r_n\) be the excess return for a risky asset. (The excess return is the difference between the return on a risky asset and the risk-free rate.) Then the following row vector is the excess return vector:

\[ r = [r_1 \; r_2 \; \cdots \; r_N] \]


If the vector components are arranged in a column, then the vector is called a column vector.

For example, we know that a portfolio's excess return will be affected by different characteristics or attributes that affect all asset prices. A few examples would be the price-earnings ratio, market capitalization, and industry. For a particular attribute, let us denote by a the column vector that shows the exposure of each risky asset to that attribute, with \(a_n\) denoting the exposure of asset n:

\[ a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \]

Matrices 

An n x m matrix is a bidimensional ordered array of n x m numbers. Matrices are usually indicated with boldface uppercase letters. Thus, the generic matrix A is an n x m array of the form:

\[ A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,j} & \cdots & a_{1,m} \\ \vdots & & \vdots & & \vdots \\ a_{i,1} & \cdots & a_{i,j} & \cdots & a_{i,m} \\ \vdots & & \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,m} \end{bmatrix} \]

Note that the first subscript indicates rows while the second subscript indicates columns. The entries \(a_{ij}\), called the elements of the matrix A, are the numbers at the crossing of the i-th row and the j-th column. The commas between the subscripts of the matrix entries are omitted when there is no risk of confusion: \(a_{i,j} = a_{ij}\). A matrix A is often indicated by its generic element between brackets:

\[ A = \{a_{ij}\}_{nm} \quad \text{or} \quad A = [a_{ij}]_{nm} \]

where the subscripts nm are the dimensions of the matrix.

There are several types of matrices. First there is a broad classification of square and rectangular matrices. A rectangular matrix can have different numbers of rows and columns; a square matrix is a rectangular matrix with the same number n of rows as of columns. Because of the important role that they play in applications, we focus on square matrices in the next section.


SQUARE MATRICES 

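The n x n identity matrix, indicated as the matrix \(I_n\), is a square matrix whose diagonal elements (i.e., the entries with the same row and column suffix) are equal to one while all other entries are zero:

\[ I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \]

A matrix whose entries are all zero is called a zero matrix.

A diagonal matrix is a square matrix whose elements are all zero except the ones on the diagonal:

\[ A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]

Given a square n x n matrix A, the matrix dg A is the diagonal matrix extracted from A. The diagonal matrix dg A is a matrix whose elements are all zero except the elements on the diagonal that coincide with those of the matrix A:

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \;\Rightarrow\; \operatorname{dg} A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]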

The trace of a square matrix A is the sum of its diagonal elements:

\[ \operatorname{tr} A = \sum_{i=1}^{n} a_{ii} \]

A square matrix is called symmetric if the elements above the diagonal are equal to the corresponding elements below the diagonal: \(a_{ij} = a_{ji}\). A matrix is said to be skew-symmetric if the diagonal elements are zero and the elements above the diagonal are the opposite of the corresponding elements below the diagonal: \(a_{ij} = -a_{ji}\), \(i \neq j\), \(a_{ii} = 0\).

The most commonly used symmetric matrix in financial economics and econometrics is the covariance matrix, also referred to as the variance-covariance matrix. For example, suppose that there are N risky assets and that the variance of the excess return for each risky asset and the covariances between each pair of risky assets are estimated. As the number of risky assets is N, there are \(N^2\) elements, consisting of N variances (along the diagonal) and \(N^2 - N\) covariances. Symmetry restrictions reduce the number of independent elements. In fact, the covariance between risky asset i and risky asset j will be equal to the covariance between risky asset j and risky asset i, so only \(N(N + 1)/2\) elements are independent. Notice that the variance-covariance matrix is a symmetric matrix.


DETERMINANTS 

Consider a square, n x n, matrix A. The determinant of A, denoted |A|, is defined as follows:

\[ |A| = \sum (-1)^{t(j_1, \ldots, j_n)} \prod_{i=1}^{n} a_{i j_i} \]

where the sum is extended over all permutations \((j_1, \ldots, j_n)\) of the set (1, 2, ..., n) and \(t(j_1, \ldots, j_n)\) is the number of transpositions (or inversions of positions) required to go from (1, 2, ..., n) to \((j_1, \ldots, j_n)\). Otherwise stated, a determinant is the sum of all products formed taking exactly one element from each row and each column, with each product multiplied by \((-1)^{t(j_1, \ldots, j_n)}\). Consider, for instance, the case n = 2, where there is only one possible transposition: 1,2 => 2,1. The determinant of a 2 x 2 matrix is therefore computed as follows:

\[ |A| = (-1)^0 a_{11}a_{22} + (-1)^1 a_{12}a_{21} = a_{11}a_{22} - a_{12}a_{21} \]

Consider a square matrix A of order n. Consider the matrix \(M_{ij}\) obtained by removing the ith row and the jth column. The matrix \(M_{ij}\) is a square matrix of order (n − 1). The determinant \(|M_{ij}|\) of the matrix \(M_{ij}\) is called the minor of \(a_{ij}\). The signed minor \((-1)^{(i+j)}|M_{ij}|\) is called the cofactor of \(a_{ij}\) and is generally denoted as \(\alpha_{ij}\).

A square matrix A is said to be singular if its determinant is equal to zero. An n x m matrix A is of rank r if at least one of its (square) r-minors, that is, the determinants of its r x r submatrices, is different from zero while all (r + 1)-minors, if any, are zero. A nonsingular square matrix is said to be of full rank if its rank r is equal to its order n.
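The permutation definition and the cofactor definition can be checked against each other on a small matrix. The sketch below is illustrative only (it enumerates all n! permutations, so it is practical only for tiny matrices); the function names and the sample matrix are assumptions made for the example.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs that appear out of order; its parity gives the sign of the term."""
    return sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
               if perm[i] > perm[j])


def det_by_permutations(a):
    """Sum over all permutations of signed products, one element per row and column."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= a[i][perm[i]]
        total += (-1) ** inversions(perm) * product
    return total


def minor_matrix(a, i, j):
    """Matrix obtained by removing row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def cofactor(a, i, j):
    """Signed minor of element (i, j)."""
    return (-1) ** (i + j) * det_by_permutations(minor_matrix(a, i, j))


m = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
det_m = det_by_permutations(m)
# Expanding along the first row with cofactors gives the same value.
expansion = sum(m[0][j] * cofactor(m, 0, j) for j in range(3))
print(det_m, expansion)
```

The indices in the code are 0-based, whereas the text counts rows and columns from 1; the sign \((-1)^{i+j}\) is unaffected because shifting both indices by one leaves the parity of their sum unchanged.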


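SYSTEMS OF LINEAR EQUATIONS

A system of n linear equations in m unknown variables is a set of n simultaneous equations of the following form:

\[ \begin{aligned} a_{1,1}x_1 + \cdots + a_{1,m}x_m &= b_1 \\ &\;\;\vdots \\ a_{n,1}x_1 + \cdots + a_{n,m}x_m &= b_n \end{aligned} \]

The n x m matrix:

\[ A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,j} & \cdots & a_{1,m} \\ \vdots & & \vdots & & \vdots \\ a_{i,1} & \cdots & a_{i,j} & \cdots & a_{i,m} \\ \vdots & & \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,m} \end{bmatrix} \]

formed with the coefficients of the variables is called the coefficient matrix. The terms \(b_i\) are called the constant terms. The augmented matrix [A b], formed by adding to the coefficient matrix a column formed with the constant terms, is represented below:

\[ [A \; b] = \begin{bmatrix} a_{1,1} & \cdots & a_{1,j} & \cdots & a_{1,m} & b_1 \\ \vdots & & \vdots & & \vdots & \vdots \\ a_{i,1} & \cdots & a_{i,j} & \cdots & a_{i,m} & b_i \\ \vdots & & \vdots & & \vdots & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,m} & b_n \end{bmatrix} \]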

If the constant terms on the right side of the equations are all zero, the system is called homogeneous. If at least one of the constant terms is different from zero, the system is said to be nonhomogeneous. A system is said to be consistent if it admits a solution, that is, if there is a set of values of the variables that simultaneously satisfy all the equations. A system is referred to as inconsistent if there is no set of numbers that satisfy the system equations.

Let's first consider the case of nonhomogeneous linear systems. The fundamental theorems of linear systems state that:

Theorem 1: A system of n linear equations in m unknowns is consistent (i.e., it admits a solution) if and only if the coefficient matrix and the augmented matrix have the same rank.
Theorem 2: If a consistent system of n equations in m variables is of rank r < m, it is possible to choose m − r unknowns so that the coefficient matrix of the remaining r unknowns is of rank r. When these m − r variables are assigned any arbitrary value, the value of the remaining variables is uniquely determined.

An immediate consequence of the two fundamental theorems is that a system of n equations in n unknown variables admits a unique solution if and only if both the coefficient matrix and the augmented matrix are of rank n.

Let's now examine homogeneous systems. The coefficient matrix and the augmented matrix of a homogeneous system always have the same rank and thus a homogeneous system is always consistent. In fact, the trivial solution \(x_1 = \cdots = x_m = 0\) always satisfies a homogeneous system.

Consider now a homogeneous system of n 
equations in n unknowns. If the rank of the 
coefficient matrix is n, the system has only the 
trivial solution. If the rank of the coefficient 
matrix is r < n, then Theorem 2 ensures that 
the system has a solution other than the trivial 
solution. 
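Theorem 1 can be tried out numerically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The sketch below computes rank by Gaussian elimination rather than by minors, which gives the same number; exact fractions are used to avoid rounding issues. The function names and the small test systems are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank computed by row reduction with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def is_consistent(a, b):
    """Theorem 1: consistent iff rank(A) equals rank([A b])."""
    augmented = [row + [b_i] for row, b_i in zip(a, b)]
    return rank(a) == rank(augmented)


dependent = [[1, 2], [2, 4]]               # second row is twice the first
print(rank(dependent))
print(is_consistent(dependent, [3, 6]))    # constants follow the same dependence
print(is_consistent(dependent, [3, 7]))    # constants break it
print(rank([[1, 2], [3, 4]]))              # full rank: a unique solution for any b
```

For the rank-deficient homogeneous case discussed above, `rank(dependent)` is 1 < 2, so nontrivial solutions such as \(x_1 = 2, x_2 = -1\) exist.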

LINEAR INDEPENDENCE AND RANK

Consider an n x m matrix A. A set of p columns extracted from the matrix A:

\[ \begin{bmatrix} a_{1,i_1} & \cdots & a_{1,i_p} \\ \vdots & & \vdots \\ a_{n,i_1} & \cdots & a_{n,i_p} \end{bmatrix} \]

are said to be linearly independent if it is not possible to find p constants \(\beta_s\), s = 1, ..., p, not all zero, such that the following n equations are simultaneously satisfied:

\[ \begin{aligned} \beta_1 a_{1,i_1} + \cdots + \beta_p a_{1,i_p} &= 0 \\ &\;\;\vdots \\ \beta_1 a_{n,i_1} + \cdots + \beta_p a_{n,i_p} &= 0 \end{aligned} \]

Analogously, a set of q rows extracted from the matrix A are said to be linearly independent if it is not possible to find q constants \(\lambda_s\), s = 1, ..., q, not all zero, such that the following m equations are simultaneously satisfied:

\[ \begin{aligned} \lambda_1 a_{i_1,1} + \cdots + \lambda_q a_{i_q,1} &= 0 \\ &\;\;\vdots \\ \lambda_1 a_{i_1,m} + \cdots + \lambda_q a_{i_q,m} &= 0 \end{aligned} \]

It can be demonstrated that in any matrix the maximum number p of linearly independent columns is the same as the maximum number q of linearly independent rows. This number is equal, in turn, to the rank r of the matrix. Recall that an n x m matrix A is said to be of rank r if at least one of its (square) r-minors is different from zero while all (r + 1)-minors, if any, are zero. This number is the same for rows and for columns. We can now give an alternative definition of the rank of a matrix:

Given an n x m matrix A, its rank, denoted rank(A), is the number r of linearly independent rows or columns, as the row rank is always equal to the column rank.


VECTOR AND MATRIX OPERATIONS

Let's now introduce the most common operations performed on vectors and matrices. An operation is a mapping that operates on scalars, vectors, and matrices to produce new scalars, vectors, or matrices. The notion of operations performed on a set of objects to produce another object of the same set is the key concept of algebra. Let's start with vector operations.

Vector Operations

The following three operations are usually defined on vectors: transpose, addition, and multiplication.

Transpose

The transpose operation transforms a row vector into a column vector and vice versa. Given the row vector \(x = [x_1, \ldots, x_n]\), its transpose, denoted as \(x^T\) or \(x'\), is the column vector:

\[ x^T = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \]

Clearly the transpose of the transpose is the original vector: \((x^T)^T = x\).

Addition

Two row (or column) vectors \(x = [x_1, \ldots, x_n]\), \(y = [y_1, \ldots, y_n]\) with the same number n of components can be added. The addition of two vectors is a new vector whose components are the sums of the components:

\[ x + y = [x_1 + y_1, \ldots, x_n + y_n] \]

This definition can be generalized to any number N of summands:

\[ \sum_{s=1}^{N} x_s = \left[ \sum_{s=1}^{N} x_{s,1}, \ldots, \sum_{s=1}^{N} x_{s,n} \right] \]

where \(x_{s,k}\) denotes the kth component of the sth summand.

The summands must be both column or row vectors; it is not possible to add row vectors to column vectors.

It is clear from the definition of addition that addition is a commutative operation in the sense that the order of the summands does not matter: x + y = y + x. Addition is also an associative operation in the sense that x + (y + z) = (x + y) + z.

Multiplication

We define two types of multiplication:

(1) multiplication of a scalar and a vector, and

(2) scalar multiplication of two vectors (inner product) (see Note 2).

The multiplication of a scalar a and a row (or column) vector x, denoted as ax, is defined as the multiplication of each component of the vector by the scalar:

\[ ax = [ax_1, \ldots, ax_n] \]

A similar definition holds for column vectors. It is clear from this definition that multiplication by a scalar is distributive with respect to vector addition:

\[ a(x + y) = ax + ay \]

The scalar product (also called the inner product) of two vectors x, y, denoted as x · y, is defined between a row vector and a column vector. The scalar product between two vectors produces a scalar according to the following rule:

\[ x \cdot y = \sum_{i=1}^{n} x_i y_i \]

Two vectors x, y are said to be orthogonal if their scalar product is zero.

MATRIX OPERATIONS 

Let's now define operations on matrices. The following five operations on matrices are usually defined: transpose, addition, multiplication, inverse, and adjoint.

Transpose

The definition of the transpose of a matrix is an extension of the transpose of a vector. The transpose operation consists in exchanging rows with columns. Consider the n x m matrix \(A = \{a_{ij}\}_{nm}\). The transpose of A, denoted \(A^T\) or \(A'\), is the m x n matrix whose ith row is the ith column of A:

\[ A^T = \{a_{ji}\}_{mn} \]

The following should be clear from this definition:

\[ (A^T)^T = A \]

and that a matrix is symmetric if and only if

\[ A^T = A \]

Addition

Consider two n x m matrices \(A = \{a_{ij}\}_{nm}\) and \(B = \{b_{ij}\}_{nm}\). The sum of the matrices A and B is defined as the n x m matrix obtained by adding the respective elements:

\[ A + B = \{a_{ij} + b_{ij}\}_{nm} \]

Note that it is essential for the definition of addition that the two matrices have the same order n x m.

The operation of addition can be extended to any number N of summands as follows:

\[ \sum_{s=1}^{N} A_s = \left\{ \sum_{s=1}^{N} a_{s,ij} \right\}_{nm} \]

where \(a_{s,ij}\) is the generic i,j element of the sth summand.

Multiplication

Consider a scalar c and a matrix \(A = \{a_{ij}\}_{nm}\). The product cA = Ac is the n x m matrix obtained by multiplying each element of the matrix by c:

\[ cA = Ac = \{c\,a_{ij}\}_{nm} \]

Multiplication of a matrix by a scalar is distributive with respect to matrix addition:

\[ c(A + B) = cA + cB \]

Let's now define the product of two matrices. Consider two matrices \(A = \{a_{it}\}_{np}\) and \(B = \{b_{sj}\}_{pm}\). The product C = AB is defined as follows:

\[ C = AB = \{c_{ij}\} = \left\{ \sum_{t=1}^{p} a_{it} b_{tj} \right\} \]

The product C = AB is therefore an n x m matrix whose generic element \(c_{ij}\) is the scalar product of the ith row of the matrix A and the jth column of the matrix B. This definition generalizes the definition of scalar product of vectors: The scalar product of two n-dimensional vectors is the product of a 1 x n matrix (a row vector) and an n x 1 matrix (a column vector).
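The row-by-column rule can be made concrete with nested lists. The following is a small sketch, not a substitute for a numerical library; the function names and the portfolio weights and excess returns are hypothetical values chosen for illustration.

```python
def transpose(a):
    """Exchange rows with columns."""
    return [list(column) for column in zip(*a)]


def matmul(a, b):
    """Element (i, j) of the product is the scalar product of row i of A and column j of B."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("number of columns of A must equal number of rows of B")
    cols = len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(a))]


# A 1 x 3 row vector of weights times a 3 x 1 column vector of excess returns
# gives a 1 x 1 matrix: the scalar product, here a portfolio excess return.
w = [[0.5, 0.3, 0.2]]
r = [[0.04], [0.02], [-0.01]]
portfolio_excess_return = matmul(w, r)[0][0]

# A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 matrix.
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0], [0, 1], [2, 2]]
c = matmul(a, b)
print(portfolio_excess_return, c)
```

With these definitions one can also confirm on examples that the transpose of a product reverses the order of the factors: `transpose(matmul(a, b))` equals `matmul(transpose(b), transpose(a))`.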

Inverse and Adjoint 

Consider two square matrices A and B of order n. If AB = BA = I, then the matrix B is called the inverse of A and is denoted as \(A^{-1}\). It can be demonstrated that the two following properties hold:

Property 1: A square matrix A admits an inverse \(A^{-1}\) if and only if it is nonsingular, that is, if and only if its determinant is different from zero. Otherwise stated, a matrix A admits an inverse if and only if it is of full rank.
Property 2: The inverse of a square matrix, if it exists, is unique. This property is a consequence of the property that, if A is nonsingular, then AB = AC implies B = C.

Consider now a square matrix of order n, \(A = \{a_{ij}\}\), and consider its cofactors \(\alpha_{ij}\). Recall that the cofactors \(\alpha_{ij}\) are the signed minors \((-1)^{(i+j)}|M_{ij}|\) of the matrix A. The adjoint of the matrix A, denoted as Adj(A), is the following matrix:

\[ \operatorname{Adj}(A) = \begin{bmatrix} \alpha_{1,1} & \cdots & \alpha_{1,j} & \cdots & \alpha_{1,n} \\ \vdots & & \vdots & & \vdots \\ \alpha_{i,1} & \cdots & \alpha_{i,j} & \cdots & \alpha_{i,n} \\ \vdots & & \vdots & & \vdots \\ \alpha_{n,1} & \cdots & \alpha_{n,j} & \cdots & \alpha_{n,n} \end{bmatrix}^{T} = \begin{bmatrix} \alpha_{1,1} & \cdots & \alpha_{i,1} & \cdots & \alpha_{n,1} \\ \vdots & & \vdots & & \vdots \\ \alpha_{1,j} & \cdots & \alpha_{i,j} & \cdots & \alpha_{n,j} \\ \vdots & & \vdots & & \vdots \\ \alpha_{1,n} & \cdots & \alpha_{i,n} & \cdots & \alpha_{n,n} \end{bmatrix} \]

The adjoint of a matrix A is therefore the transpose of the matrix obtained by replacing the elements of A with their cofactors.

If the matrix A is nonsingular, and therefore admits an inverse, it can be demonstrated that:

\[ A^{-1} = \frac{\operatorname{Adj}(A)}{|A|} \]
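The adjoint formula for the inverse can be verified on small matrices. The sketch below assumes integer (or `Fraction`) entries so that the arithmetic is exact; it expands determinants recursively by cofactors, which is fine for illustration but far too slow for large matrices. All function names are illustrative.

```python
from fractions import Fraction


def minor(a, i, j):
    """Matrix obtained by removing row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1                  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjoint(a):
    """Transpose of the cofactor matrix: entry (i, j) is the cofactor of a[j][i]."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)] for i in range(n)]


def inverse(a):
    """Adjoint divided by the determinant; defined only for nonsingular matrices."""
    d = det(a)
    if d == 0:
        raise ValueError("a singular matrix has no inverse")
    return [[Fraction(entry) / d for entry in row] for row in adjoint(a)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


a2 = [[2, 1], [5, 3]]
a3 = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(inverse(a2))
print(matmul(a3, inverse(a3)) == identity3)
```

For the 2 x 2 example the determinant is 1, so the inverse coincides with the adjoint.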


A square matrix A of order n is said to be orthogonal if the following property holds:

\[ AA' = A'A = I_n \]

Because in this case A must be of full rank, the transpose of an orthogonal matrix coincides with its inverse: \(A^{-1} = A'\).


EIGENVALUES AND EIGENVECTORS

Consider a square matrix A of order n and the set of all n-dimensional vectors. The matrix A is a linear operator on the space of vectors. This means that A operates on each vector producing another vector subject to the following restriction:

\[ A(ax + by) = aAx + bAy \]

Consider now the set of vectors x such that the following property holds:

\[ Ax = \lambda x \]

Any nonzero vector such that the above property holds is called an eigenvector of the matrix A and the corresponding value of \(\lambda\) is called an eigenvalue.

To determine the eigenvectors of a matrix and the relative eigenvalues, consider that the equation \(Ax = \lambda x\) can be written as:

\[ (A - \lambda I)x = 0 \]

which can, in turn, be written as a system of linear equations:

\[ (A - \lambda I)x = \begin{bmatrix} a_{1,1} - \lambda & \cdots & a_{1,j} & \cdots & a_{1,n} \\ \vdots & & \vdots & & \vdots \\ a_{i,1} & \cdots & a_{i,i} - \lambda & \cdots & a_{i,n} \\ \vdots & & \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,n} - \lambda \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = 0 \]

This system of equations has nontrivial solutions only if the matrix \(A - \lambda I\) is singular. To determine the eigenvectors and the eigenvalues of the matrix A we must therefore solve the equation:

\[ |A - \lambda I| = \begin{vmatrix} a_{1,1} - \lambda & \cdots & a_{1,j} & \cdots & a_{1,n} \\ \vdots & & \vdots & & \vdots \\ a_{i,1} & \cdots & a_{i,i} - \lambda & \cdots & a_{i,n} \\ \vdots & & \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,n} - \lambda \end{vmatrix} = 0 \]

The expansion of this determinant yields a polynomial \(\phi(\lambda)\) of degree n known as the characteristic polynomial of the matrix A. The equation \(\phi(\lambda) = 0\) is known as the characteristic equation of the matrix A. In general, this equation will have n roots \(\lambda_s\) which are the eigenvalues of the matrix A. To each of these eigenvalues corresponds a solution of the system of linear equations as illustrated below:

\[ \begin{bmatrix} a_{1,1} - \lambda_s & \cdots & a_{1,j} & \cdots & a_{1,n} \\ \vdots & & \vdots & & \vdots \\ a_{i,1} & \cdots & a_{i,i} - \lambda_s & \cdots & a_{i,n} \\ \vdots & & \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,j} & \cdots & a_{n,n} - \lambda_s \end{bmatrix} \begin{bmatrix} x_{1_s} \\ \vdots \\ x_{i_s} \\ \vdots \\ x_{n_s} \end{bmatrix} = 0 \]

Each solution represents the eigenvector \(x_s\) corresponding to the eigenvalue \(\lambda_s\). The determination of eigenvalues and eigenvectors is the basis for principal component analysis.
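For a 2 x 2 matrix the characteristic equation reduces to the quadratic \(\lambda^2 - (\operatorname{tr} A)\lambda + |A| = 0\), which makes the procedure easy to trace by hand. The sketch below handles only symmetric 2 x 2 matrices (the case of a two-asset covariance matrix); the function name and the covariance numbers are hypothetical.

```python
import math


def eigen_symmetric_2x2(a):
    """Eigenvalues and unit-length eigenvectors of a symmetric 2 x 2 matrix."""
    (a11, a12), (a21, a22) = a
    if a12 != a21:
        raise ValueError("matrix must be symmetric")
    if a12 == 0:
        # Already diagonal: the coordinate axes are eigenvectors.
        return [(a11, (1.0, 0.0)), (a22, (0.0, 1.0))]
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    root = math.sqrt(trace ** 2 - 4 * det)   # non-negative for symmetric matrices
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # The first row of (A - lam*I)x = 0 gives (a11 - lam)*x1 + a12*x2 = 0.
        v1, v2 = a12, lam - a11
        norm = math.hypot(v1, v2)
        pairs.append((lam, (v1 / norm, v2 / norm)))
    return pairs


covariance = [[0.04, 0.01], [0.01, 0.09]]    # illustrative two-asset covariance matrix
for lam, (v1, v2) in eigen_symmetric_2x2(covariance):
    # Check Ax = lam * x component by component.
    residual_1 = covariance[0][0] * v1 + covariance[0][1] * v2 - lam * v1
    residual_2 = covariance[1][0] * v1 + covariance[1][1] * v2 - lam * v2
    print(f"eigenvalue {lam:.5f}, residuals {residual_1:.1e}, {residual_2:.1e}")
```

In principal component analysis the eigenvector associated with the largest eigenvalue of the covariance matrix gives the direction of greatest variance; for matrices larger than 2 x 2 a numerical library routine would normally be used instead of the closed-form quadratic.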


KEY POINTS 

• An n-dimensional vector is an ordered array of n numbers with the numbers referred to as the components. An n x m matrix is a bidimensional ordered array of n x m numbers.

• A rectangular matrix can have different numbers of rows and columns; a square matrix is a rectangular matrix with the same number of rows and columns. An identity matrix is a square matrix whose diagonal elements are equal to one while all other entries are zero. A diagonal matrix is a square matrix whose elements are all zero except the ones on the diagonal.

• The trace of a square matrix is the sum of its diagonal elements. A symmetric matrix is a square matrix where the elements above the diagonal are equal to the corresponding elements below the diagonal. The most commonly used symmetric matrix in finance is the covariance matrix (or variance-covariance matrix).

• The rank of a matrix is used to determine the number of solutions of a system of linear equations.

• An operation is a mapping that operates on scalars, vectors, and matrices to produce new scalars, vectors, or matrices. The notion of operations performed on a set of objects to produce another object of the same set is the key concept of algebra. Five operations on matrices are transpose, addition, multiplication, inverse, and adjoint.

NOTES

1. Vectors can be thought of as the elements of an abstract linear space while matrices are operators that operate on linear spaces.

2. A third type of product between vectors, the vector (or outer) product between vectors, produces a third vector. We do not define it here as it is not typically used in economics, though widely used in the physical sciences.

Abstract: The theory of linear difference equations has found applications in many areas in finance. A difference equation is an equation that involves differences between successive values of a function of a discrete variable. The theory of linear difference equations covers three areas: solving difference equations, describing the behavior of difference equations, and identifying the equilibrium (or critical value) and stability of difference equations.


Linear difference equations are important in the context of dynamic econometric models. Stochastic models in finance are expressed as linear difference equations with random disturbances added. Understanding the behavior of solutions of linear difference equations helps develop intuition about the behavior of these models. The relationship between difference equations (the subject of this entry) and differential equations is as follows. The latter are well suited to modeling situations in finance where there is a continually changing value. The problem is that not all changes in value occur continuously. If the change in value occurs incrementally rather than continuously, then differential equations have their limitations. Instead, a financial modeler can use difference equations, which are recursively defined sequences.

In this entry we explain the theory of linear difference equations and describe how to compute explicit solutions of different types of equations.

THE LAG OPERATOR L 

The lag operator L is a linear operator that acts on doubly infinite time series by shifting positions by one place:

\[ Lx_t = x_{t-1} \]

The difference operator \(\Delta x_t = x_t - x_{t-1}\) can be written in terms of the lag operator as

\[ \Delta x_t = (1 - L)x_t \]

Products and thus powers of the lag operator are defined as follows:

\[ (L \times L)x_t = L^2 x_t = L(Lx_t) = x_{t-2} \]

From the previous definition, we can see that the i-th power of the lag operator shifts the series by i places:

\[ L^i x_t = x_{t-i} \]

The lag operator is linear, that is, given scalars a and b we have

\[ (aL^i + bL^j)x_t = ax_{t-i} + bx_{t-j} \]

Hence we can define the polynomial operator:

\[ A(L) = (1 - a_1 L - \cdots - a_p L^p) = \left(1 - \sum_{i=1}^{p} a_i L^i\right) \]


HOMOGENEOUS DIFFERENCE EQUATIONS

Homogeneous difference equations are linear conditions that link the values of variables at different time lags. Using the lag operator L, they can be written as follows:

\[ A(L)x_t = (1 - a_1 L - \cdots - a_p L^p)x_t = (1 - \lambda_1 L) \times \cdots \times (1 - \lambda_p L)x_t = 0 \]

where the \(\lambda_i\), i = 1, 2, ..., p are the solutions of the characteristic equation:

\[ z^p - a_1 z^{p-1} - \cdots - a_{p-1}z - a_p = (z - \lambda_1) \times \cdots \times (z - \lambda_p) = 0 \]

Suppose that time extends from 0 to ∞, t = 0, 1, 2, ..., and that the initial conditions \((x_{-1}, x_{-2}, \ldots, x_{-p})\) are given.

Real Roots 

Consider first the case of real roots. In this case, as we see later in this entry, solutions are sums of exponentials. First suppose that the roots of the characteristic equation are all real and distinct. It can be verified by substitution that any series of the form

\[ x_t = C\lambda_i^t \]

where C is a constant, solves the homogeneous difference equation. In fact, we can write

\[ (1 - \lambda_i L)(C\lambda_i^t) = C\lambda_i^t - \lambda_i C\lambda_i^{t-1} = 0 \]

In addition, given the linearity of the lag operator, any linear combination of solutions of the homogeneous difference equation is another solution. We can therefore state that the following series solves the homogeneous difference equation:

\[ x_t = \sum_{i=1}^{p} C_i \lambda_i^t \]

By solving the linear system

\[ \begin{aligned} x_{-1} &= \sum_{i=1}^{p} C_i \lambda_i^{-1} \\ &\;\;\vdots \\ x_{-p} &= \sum_{i=1}^{p} C_i \lambda_i^{-p} \end{aligned} \]

that states that the p initial conditions are satisfied, we can determine the p constants \(C_i\).

Suppose now that all p roots of the characteristic equation are real and coincident. In this case, we can represent a difference equation in the following way:

\[ A(L) = 1 - a_1 L - \cdots - a_p L^p = (1 - \lambda L)^p \]

It can be demonstrated by substitution that, in this case, the general solution of the process is the following:

\[ x_t = C_1 \lambda^t + C_2 t \lambda^t + \cdots + C_p t^{p-1} \lambda^t \]

In the most general case, assuming that all roots are real, there will be m ≤ p distinct roots \(\lambda_i\), i = 1, 2, ..., m, each of order \(n_i \geq 1\),

\[ \sum_{i=1}^{m} n_i = p \]

and the general solution of the process will be

\[ x_t = C_1^1 \lambda_1^t + C_2^1 t \lambda_1^t + \cdots + C_{n_1}^1 t^{n_1 - 1} \lambda_1^t + \cdots + C_1^m \lambda_m^t + C_2^m t \lambda_m^t + \cdots + C_{n_m}^m t^{n_m - 1} \lambda_m^t \]

We can therefore conclude that the solutions of a homogeneous difference equation whose characteristic equation has only real roots are formed by sums of exponentials. If these roots have modulus greater than unity, then solutions are diverging exponentials; if they have modulus smaller than unity, solutions are exponentials that go to zero. If the roots are unity, solutions are either constants or, if the roots have multiplicity greater than 1, polynomials.

Figure 1 illustrates the simple equation

\[ A(L)x_t = (1 - 0.8L)x_t = 0, \quad \lambda = 0.8, \quad t = 1, 2, \ldots \]

whose solution, with initial condition \(x_1 = 1\), is

\[ x_t = 1.25(0.8)^t \]

The behavior of the solution is that of an exponential decay.

Figure 1 Solution of the Equation \((1 - 0.8L)x_t = 0\) with Initial Condition \(x_1 = 1\)

Figure 2 illustrates the equation

\[ A(L)x_t = (1 + 0.8L)x_t = 0, \quad \lambda = -0.8, \quad t = 1, 2, \ldots \]

whose solution, with initial condition \(x_1 = 1\), is

\[ x_t = -1.25(-0.8)^t \]

The behavior of the solution is that of an exponential decay with oscillations at each step. The oscillations are due to the change in sign of the exponential at odd and even time steps. Simulations were run for 100 time steps.

Figure 2 Solution of the Equation \((1 + 0.8L)x_t = 0\) with Initial Condition \(x_1 = 1\)

If the equation has more than one real root, then the solution is a sum of exponentials. Figure 3 illustrates the equation

\[ A(L)x_t = (1 - 1.7L + 0.72L^2)x_t = 0, \quad \lambda_1 = 0.8, \quad \lambda_2 = 0.9, \quad t = 1, 2, \ldots, n, \ldots \]

whose solution, with initial conditions \(x_1 = 1\), \(x_2 = 1.5\), is

\[ x_t = -7.5(0.8)^t + 7.7778(0.9)^t \]

The behavior of the solution is that of an exponential decay after a peak.

Figure 3 Solution of the Equation \((1 - 1.7L + 0.72L^2)x_t = 0\) with Initial Conditions \(x_1 = 1\), \(x_2 = 1.5\)

Figure 4 illustrates the equation

\[ A(L)x_t = (1 - 1.9L + 0.88L^2)x_t = 0, \quad \lambda_1 = 0.8, \quad \lambda_2 = 1.1, \quad t = 1, 2, \ldots, n, \ldots \]

whose solution, with initial conditions \(x_1 = 1\), \(x_2 = 1.5\), is

\[ x_t = -1.6667(0.8)^t + 2.1212(1.1)^t \]

The behavior is that of exponential explosion due to the exponential with modulus greater than 1.
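The constants in the two-root examples can be recovered by solving the two initial-condition equations, and the closed form can be compared with direct iteration of the recursion. The sketch below does this for the equation with roots 0.8 and 0.9; the function names are illustrative, and the initial conditions are taken at t = 1 and t = 2 as in the figures.

```python
def constants_for_two_real_roots(lam1, lam2, x1, x2):
    """Solve C1*lam1 + C2*lam2 = x1 and C1*lam1**2 + C2*lam2**2 = x2 (Cramer's rule)."""
    det = lam1 * lam2 ** 2 - lam2 * lam1 ** 2
    c1 = (x1 * lam2 ** 2 - lam2 * x2) / det
    c2 = (lam1 * x2 - lam1 ** 2 * x1) / det
    return c1, c2


def iterate(a1, a2, x1, x2, last_t):
    """Return {t: x_t} for t = 1..last_t from x_t = a1*x_{t-1} + a2*x_{t-2}."""
    x = {1: x1, 2: x2}
    for t in range(3, last_t + 1):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2]
    return x


# (1 - 1.7L + 0.72L^2) x_t = 0 means x_t = 1.7*x_{t-1} - 0.72*x_{t-2}.
c1, c2 = constants_for_two_real_roots(0.8, 0.9, 1.0, 1.5)
path = iterate(1.7, -0.72, 1.0, 1.5, 50)
closed_form = {t: c1 * 0.8 ** t + c2 * 0.9 ** t for t in path}
max_gap = max(abs(path[t] - closed_form[t]) for t in path)
print(f"C1 = {c1:.4f}, C2 = {c2:.4f}, largest gap = {max_gap:.2e}")

# Same calculation for the explosive example with roots 0.8 and 1.1.
c1_explosive, c2_explosive = constants_for_two_real_roots(0.8, 1.1, 1.0, 1.5)
```

The same two-equation system gives the constants for the explosive example; only the roots change.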


Figure 4 Solution of the Equation \((1 - 1.9L + 0.88L^2)x_t = 0\) with Initial Conditions \(x_1 = 1\), \(x_2 = 1.5\)

Complex Roots

Now suppose that some of the roots are complex. In this case, solutions exhibit an oscillating behavior with a period that depends on the model coefficients. For simplicity, consider initially a second-order homogeneous difference equation:

\[ A(L)x_t = (1 - a_1 L - a_2 L^2)x_t = 0 \]

Suppose that its characteristic equation given by

\[ A(z) = z^2 - a_1 z - a_2 = 0 \]

admits the two complex conjugate roots:

\[ \lambda_1 = a + ib, \quad \lambda_2 = a - ib \]

Let's write the two roots in polar notation:

\[ \lambda_1 = re^{i\omega}, \quad \lambda_2 = re^{-i\omega}, \quad r = \sqrt{a^2 + b^2}, \quad \omega = \arctan\frac{b}{a} \]

It can be demonstrated that the general solution of the above difference equation has the following form:

\[ x_t = r^t (C_1 \cos(\omega t) + C_2 \sin(\omega t)) = C r^t \cos(\omega t + \vartheta) \]

where \(C_1\) and \(C_2\), or C and \(\vartheta\), are constants to be determined as a function of the initial conditions. If the imaginary part of the roots vanishes, then \(\omega\) vanishes and a = r, the two complex conjugate roots become a real root, and we find again the expression \(x_t = Cr^t\).

Consider now a homogeneous difference equation of order $2n$. Suppose that the characteristic equation has only two distinct complex conjugate roots, each with multiplicity $n$. We can write the difference equation as follows:

$$A(L)x_t = (1 - a_1 L - \cdots - a_{2n}L^{2n})x_t = \left[(1 - \lambda L)^n(1 - \bar{\lambda}L)^n\right]x_t = 0$$

and its general solution as follows:

$$x_t = r^t\left(C_1^1\cos(\omega t) + C_2^1\sin(\omega t)\right) + \cdots + t^{n-1}r^t\left(C_1^n\cos(\omega t) + C_2^n\sin(\omega t)\right)$$

The general solution of a homogeneous difference equation that admits both real and complex roots with different multiplicities is a sum of the different types of solutions. The above formulas show that real roots correspond to a sum of exponentials while complex roots correspond to oscillating series with exponential damping or explosive behavior. The above formulas confirm that in both the real and the complex case, solutions decay if the roots of the inverse characteristic equation lie outside the unit circle and explode if they lie inside the unit circle.
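The stability criterion can be checked mechanically for a second-order equation. The sketch below is illustrative only (the names `characteristic_roots` and `classify`, and the tolerance `tol`, are assumptions made here). It works with the roots $\lambda$ of $z^2 - a_1 z - a_2 = 0$, which are the reciprocals of the roots of the inverse characteristic equation, so "inverse roots outside the unit circle" is the same condition as $|\lambda| < 1$.

```python
import cmath


def characteristic_roots(a1, a2):
    """Roots of z**2 - a1*z - a2 = 0 for the equation (1 - a1*L - a2*L**2) x_t = 0."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


def classify(a1, a2, tol=1e-12):
    """Label the long-run behavior by the largest modulus among the roots."""
    rho = max(abs(z) for z in characteristic_roots(a1, a2))
    if rho < 1.0 - tol:
        return "decay"
    if rho > 1.0 + tol:
        return "explode"
    return "borderline"   # modulus equal to 1 up to the tolerance


if __name__ == "__main__":
    # Coefficients taken from the four worked examples in the text.
    for a1, a2 in [(1.7, -0.72), (1.9, -0.88), (1.2, -1.0), (1.0, -0.89)]:
        print(a1, a2, classify(a1, a2))
```

Note that the lag polynomial $(1 - 1.7L + 0.72L^2)$ corresponds to $a_1 = 1.7$, $a_2 = -0.72$ in this parameterization.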

Figure 5 illustrates the equation 

$$A(L)x_t = (1 - 1.2L + 1.0L^2)x_t = 0, \quad t = 1, 2, \ldots$$

which has two complex conjugate roots,

$$\lambda_1 = 0.6 + i0.8, \quad \lambda_2 = 0.6 - i0.8$$

or in polar form,

$$\lambda_1 = e^{i0.9273}, \quad \lambda_2 = e^{-i0.9273}$$

and whose solution, with initial conditions $x_1 = 1$, $x_2 = 1.5$, is

$$x_t = -0.3\cos(0.9273t) + 1.475\sin(0.9273t)$$

Figure 5 Solutions of the Equation $(1 - 1.2L + 1.0L^2)x_t = 0$ with Initial Conditions $x_1 = 1$, $x_2 = 1.5$

The behavior of the solutions is that of un¬ 
damped oscillations with frequency deter¬ 
mined by the model. 

Figure 6 illustrates the equation 

$$A(L)x_t = (1 - 1.0L + 0.89L^2)x_t = 0, \quad t = 1, 2, \ldots$$

which has two complex conjugate roots,

$$\lambda_1 = 0.5 + i0.8, \quad \lambda_2 = 0.5 - i0.8$$

or in polar form,

$$\lambda_1 = 0.9434e^{i1.0122}, \quad \lambda_2 = 0.9434e^{-i1.0122}$$

and whose solution, with initial conditions $x_1 = 1$, $x_2 = 1.5$, is

$$x_t = 0.9434^t\left(-0.5618\cos(1.0122t) + 1.6011\sin(1.0122t)\right)$$


The behavior of the solutions is that of damped 
oscillations with frequency determined by the 
model. 
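The modulus, frequency, and constants quoted in the two oscillating examples can be recovered from the model coefficients and the initial conditions. The following sketch assumes the parameterization $x_t = a_1 x_{t-1} + a_2 x_{t-2}$ with $a_1^2 + 4a_2 < 0$; the function names are illustrative.

```python
import math


def oscillating_solution(a1, a2, x1, x2):
    """Return (r, omega, C1, C2) for x_t = a1*x_{t-1} + a2*x_{t-2} with complex roots.

    Assumes a1**2 + 4*a2 < 0, so the roots are a +/- i*b with b > 0.
    """
    a = a1 / 2.0
    b = math.sqrt(-(a1 * a1 + 4.0 * a2)) / 2.0
    r, omega = math.hypot(a, b), math.atan2(b, a)
    # Match x_1 and x_2 in x_t = r^t (C1 cos(omega t) + C2 sin(omega t)).
    m11, m12 = r * math.cos(omega), r * math.sin(omega)
    m21, m22 = r**2 * math.cos(2 * omega), r**2 * math.sin(2 * omega)
    det = m11 * m22 - m12 * m21
    c1 = (x1 * m22 - m12 * x2) / det
    c2 = (m11 * x2 - m21 * x1) / det
    return r, omega, c1, c2


def closed_form(a1, a2, x1, x2, t):
    """Value of the closed-form solution at time t."""
    r, omega, c1, c2 = oscillating_solution(a1, a2, x1, x2)
    return r**t * (c1 * math.cos(omega * t) + c2 * math.sin(omega * t))


def iterate(a1, a2, x1, x2, n):
    """Direct iteration of the recursion; returns [x_1, ..., x_n]."""
    xs = [x1, x2]
    while len(xs) < n:
        xs.append(a1 * xs[-1] + a2 * xs[-2])
    return xs
```

For example, `oscillating_solution(1.0, -0.89, 1.0, 1.5)` corresponds to the damped case, and `oscillating_solution(1.2, -1.0, 1.0, 1.5)` to the undamped case with $r = 1$.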


NONHOMOGENEOUS 
DIFFERENCE EQUATIONS 

Consider now the following $p$-th order difference equation:

$$A(L)x_t = (1 - a_1 L - \cdots - a_p L^p)x_t = y_t$$

where $y_t$ is a given sequence of real numbers. Recall that we are in a deterministic setting, that is, the $y_t$ are given. The general solution of the above difference equation will be the sum of two solutions $x_{1,t} + x_{2,t}$ where $x_{1,t}$ is the solution of the associated homogeneous equation,

$$A(L)x_t = (1 - a_1 L - \cdots - a_p L^p)x_t = 0$$

and $x_{2,t}$ solves the given nonhomogeneous equation.

Figure 6 Solutions of the Equation $(1 - 1.0L + 0.89L^2)x_t = 0$ with Initial Conditions $x_1 = 1$, $x_2 = 1.5$


Real Roots 

To determine the general form of $x_{2,t}$ in the case of real roots, we begin by considering the case of a first-order equation:

$$A(L)x_t = (1 - a_1 L)x_t = y_t$$

We can compute the solution as follows:

$$x_{2,t} = \frac{1}{1 - a_1 L}\,y_t = \left(\sum_{j=0}^{\infty}(a_1 L)^j\right)y_t$$

which is meaningful only for $|a_1| < 1$. If, however, $y_t$ starts at $t = -1$, that is, if $y_t = 0$ for $t = -2, -3, \ldots$, we can rewrite the above formula as

$$x_{2,t} = \left(\sum_{j=0}^{t+1}(a_1 L)^j\right)y_t$$

This latter formula, which is valid for any real value of $a_1$, yields

$$x_{2,0} = y_0 + a_1 y_{-1}$$

$$x_{2,1} = y_1 + a_1 y_0 + a_1^2 y_{-1}$$

$$x_{2,t} = y_t + a_1 y_{t-1} + \cdots + a_1^{t+1}y_{-1}$$

and so on. These formulas can be easily verified by direct substitution. If $y_t = y = \text{constant}$, then

$$x_{2,t} = y\left(1 + a_1 + a_1^2 + \cdots + a_1^{t+1}\right)$$
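The "direct substitution" check mentioned above is easy to carry out numerically. This is a small sketch under the stated convention that $y_t$ starts at $t = -1$; representing the forcing sequence as a dictionary keyed by time, and the name `particular_first_order`, are choices made here for illustration.

```python
def particular_first_order(a1, y, t):
    """x_{2,t} = sum_{j=0}^{t+1} a1**j * y_{t-j}, with y_s = 0 for s < -1.

    `y` maps integer times (starting at -1) to values; missing times count as 0.
    """
    return sum(a1**j * y.get(t - j, 0.0) for j in range(t + 2))


if __name__ == "__main__":
    a1 = 1.5                                  # |a1| > 1 is allowed for the finite sum
    y = {s: 2.0 for s in range(-1, 11)}       # constant forcing y_t = 2
    for t in range(0, 11):
        lhs = particular_first_order(a1, y, t) - a1 * particular_first_order(a1, y, t - 1)
        print(t, lhs, y[t])                   # lhs should reproduce y_t
```

Because the sum is finite, the check works for any real $a_1$, including values with $|a_1| \geq 1$ for which the infinite sum would not converge.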


Consider now the case of a second-order 
equation: 

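$$A(L)x_t = (1 - a_1 L - a_2 L^2)x_t = (1 - \lambda_1 L)(1 - \lambda_2 L)x_t = y_t$$

where $\lambda_1$, $\lambda_2$ are the solutions of the characteristic equation (the reciprocals of the solutions of the inverse characteristic equation). We can write the solution of the above equation as

$$x_{2,t} = \frac{1}{1 - a_1 L - a_2 L^2}\,y_t = \frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)}\,y_t$$

Recall that, if $|\lambda_i| < 1$, $i = 1, 2$, we can write:

$$\frac{1}{(1 - \lambda_1 L)(1 - \lambda_2 L)} = \frac{1}{\lambda_1 - \lambda_2}\left(\frac{\lambda_1}{1 - \lambda_1 L} - \frac{\lambda_2}{1 - \lambda_2 L}\right) = \frac{\lambda_1}{\lambda_1 - \lambda_2}\sum_{j=0}^{\infty}(\lambda_1 L)^j - \frac{\lambda_2}{\lambda_1 - \lambda_2}\sum_{j=0}^{\infty}(\lambda_2 L)^j$$

so that the solution can be written as

$$x_{2,t} = \frac{\lambda_1}{\lambda_1 - \lambda_2}\left(\sum_{j=0}^{\infty}(\lambda_1 L)^j\right)y_t - \frac{\lambda_2}{\lambda_1 - \lambda_2}\left(\sum_{j=0}^{\infty}(\lambda_2 L)^j\right)y_t$$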


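If the two solutions are coincident, $\lambda_1 = \lambda_2$, reasoning as in the homogeneous case, we can establish that the general solution can be written as follows:

$$x_{2,t} = \frac{1}{(1 - \lambda_1 L)^2}\,y_t = \left(\sum_{j=0}^{\infty}(\lambda_1 L)^j\right)y_t + \left(\sum_{j=0}^{\infty}j(\lambda_1 L)^j\right)y_t$$

If $y_t$ starts at $t = -2$, that is, if $y_t = 0$ for $t = -3, -4, \ldots, -n, \ldots$, we can rewrite the above formulas respectively as

$$x_{2,t} = \frac{\lambda_1}{\lambda_1 - \lambda_2}\left(\sum_{j=0}^{t+2}(\lambda_1 L)^j\right)y_t - \frac{\lambda_2}{\lambda_1 - \lambda_2}\left(\sum_{j=0}^{t+2}(\lambda_2 L)^j\right)y_t$$

if the solutions are distinct, and as

$$x_{2,t} = \frac{1}{(1 - \lambda_1 L)^2}\,y_t = \left(\sum_{j=0}^{t+2}(\lambda_1 L)^j\right)y_t + \left(\sum_{j=0}^{t+2}j(\lambda_1 L)^j\right)y_t$$

if the solutions are coincident. These formulas are valid for any real value of $\lambda_i$.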
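The finite-sum formulas for the second-order case can be verified by substituting them back into the difference equation. The sketch below assumes that $y_t$ starts at $t = -2$ and stores it in a dictionary; the function names are illustrative. For distinct roots the weight on $y_{t-j}$ is $(\lambda_1^{j+1} - \lambda_2^{j+1})/(\lambda_1 - \lambda_2)$, and for a double root $\lambda$ it is $(1 + j)\lambda^j$.

```python
import math


def particular_distinct(l1, l2, y, t):
    """x_{2,t} for distinct real roots l1 != l2, with y_s = 0 for s < -2."""
    w1, w2 = l1 / (l1 - l2), l2 / (l1 - l2)
    return sum((w1 * l1**j - w2 * l2**j) * y.get(t - j, 0.0) for j in range(t + 3))


def particular_coincident(lam, y, t):
    """x_{2,t} for a double real root lam: weights (1 + j) * lam**j."""
    return sum((1 + j) * lam**j * y.get(t - j, 0.0) for j in range(t + 3))


def residual(x, a1, a2, y, t):
    """x_t - a1*x_{t-1} - a2*x_{t-2} - y_t, which should vanish for a solution."""
    return x(t) - a1 * x(t - 1) - a2 * x(t - 2) - y.get(t, 0.0)


if __name__ == "__main__":
    y = {s: 0.1 * math.sin(0.4 * s) for s in range(-2, 41)}
    l1, l2 = 0.8, 0.9
    x = lambda t: particular_distinct(l1, l2, y, t)
    print(max(abs(residual(x, l1 + l2, -l1 * l2, y, t)) for t in range(0, 41)))
```

Here $a_1 = \lambda_1 + \lambda_2$ and $a_2 = -\lambda_1\lambda_2$, which follows from expanding $(1 - \lambda_1 L)(1 - \lambda_2 L)$.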

The above formulas can be generalized to cover the case of an $n$-th order difference equation. In the most general case of an $n$-th order difference equation, assuming that all roots are real, there will be $m \leq n$ distinct roots $\lambda_i$, $i = 1, 2, \ldots, m$, each of order $n_i \geq 1$,

$$\sum_{i=1}^{m} n_i = n$$

and the general solution of the process will be of the form (constant weights on the individual terms are omitted)

$$x_{2,t} = \sum_{j=0}^{\infty}\left((\lambda_1 L)^j + j(\lambda_1 L)^j + \cdots + j^{n_1 - 1}(\lambda_1 L)^j + \cdots + (\lambda_m L)^j + j(\lambda_m L)^j + \cdots + j^{n_m - 1}(\lambda_m L)^j\right)y_t$$

if $|\lambda_i| < 1$, $i = 1, 2, \ldots, m$, and

$$x_{2,t} = \sum_{j=0}^{t+n}\left((\lambda_1 L)^j + j(\lambda_1 L)^j + \cdots + j^{n_1 - 1}(\lambda_1 L)^j + \cdots + (\lambda_m L)^j + j(\lambda_m L)^j + \cdots + j^{n_m - 1}(\lambda_m L)^j\right)y_t$$

if $y_t$ starts at $t = -n$, that is, if $y_t = 0$ for $t = -(n + 1), -(n + 2), \ldots$, for any real value of the $\lambda_i$.

Therefore, if the roots are all real, the general solution of a difference equation is a sum of exponentials. Figure 7 illustrates the case of the same difference equation as in Figure 3 with the same initial conditions $x_1 = 1$, $x_2 = 1.5$ but with an exogenous sinusoidal forcing term:

$$(1 - 1.7L + 0.72L^2)x_t = 0.1 \times \sin(0.4 \times t)$$

The solution of the equation is the sum of $x_{1,t} = -7.5(0.8)^t + 7.7778(0.9)^t$ plus

$$x_{2,t} = \sum_{i=0}^{t-1}\left[\left((0.8)^i + (0.9)^i\right)0.1 \times \sin(0.4 \times (t - i))\right]$$

After the initial phase dominated by the solu¬ 
tion of the homogeneous equation, the forcing 
term dictates the shape of the solution. 


Complex Roots 

Consider now the case of complex roots. For 
simplicity, consider initially a second-order dif¬ 
ference equation: 

$$A(L)x_t = (1 - a_1 L - a_2 L^2)x_t = y_t$$

Suppose that its characteristic equation,

$$A(z) = z^2 - a_1 z - a_2 = 0$$

admits the two complex conjugate roots,

$$\lambda_1 = a + ib, \quad \lambda_2 = a - ib$$

We write the two roots in polar notation:

$$\lambda_1 = re^{i\omega}, \quad \lambda_2 = re^{-i\omega}$$

$$r = \sqrt{a^2 + b^2}, \quad \omega = \arctan\frac{b}{a}$$

Figure 7 Solutions of the Equation $(1 - 1.7L + 0.72L^2)x_t = 0.1 \times \sin(0.4 \times t)$ with Initial Conditions $x_1 = 1$, $x_2 = 1.5$

It can be demonstrated that the solution $x_{2,t}$ of the above difference equation has the following general form:

$$x_{2,t} = \sum_{i=1}^{\infty}\left(r^i\left(\cos(\omega i) + \sin(\omega i)\right)y_{t-i}\right)$$

which is meaningful only if $|r| < 1$. If $y_t$ starts at $t = -2$, that is, if $y_t = 0$ for $t = -3, -4, \ldots, -n, \ldots$, we can rewrite the previous formula as

$$x_{2,t} = \sum_{i=1}^{t+2}\left(r^i\left(\cos(\omega i) + \sin(\omega i)\right)y_{t-i}\right)$$

This latter formula is meaningful for any real value of $r$. Note that the constant $\omega$ is determined by the structure of the model while the constants $C_1$, $C_2$ that appear in $x_{1,t}$ need to be determined from the initial conditions. If the imaginary part of the roots vanishes, then $\omega$ vanishes and $a = r$, the two complex conjugate roots become a real root, and we again find the expression $x_t = Cr^t$.

Figure 8 illustrates the case of the same difference equation as in Figure 5 with the same initial conditions $x_1 = 1$, $x_2 = 1.5$ but with an exogenous sinusoidal forcing term:

$$(1 - 1.2L + 1.0L^2)x_t = 0.5 \times \sin(0.4 \times t)$$

The solution of the equation is the sum of $x_{1,t} = -0.3\cos(0.9273t) + 1.475\sin(0.9273t)$ plus

$$x_{2,t} = \sum_{i=0}^{t-1}\left[\left(\cos(0.9273i) + \sin(0.9273i)\right)0.5\sin(0.4 \times (t - i))\right]$$

After the initial phase dominated by the solution of the homogeneous equation, the forcing term dictates the shape of the solution. Note that the model produces amplification and phase shift of the forcing term $0.5 \times \sin(0.4 \times t)$, represented by a dotted line.

SYSTEMS OF LINEAR 
DIFFERENCE EQUATIONS 

In this section, we discuss systems of linear dif¬ 
ference equations of the type 

$$\begin{aligned} x_{1,t} &= a_{11}x_{1,t-1} + \cdots + a_{1k}x_{k,t-1} + y_{1,t} \\ &\ \ \vdots \\ x_{k,t} &= a_{k1}x_{1,t-1} + \cdots + a_{kk}x_{k,t-1} + y_{k,t} \end{aligned}$$

or in vector notation:

$$\mathbf{x}_t = A\mathbf{x}_{t-1} + \mathbf{y}_t$$

Figure 8 Solutions of the Equation $(1 - 1.2L + 1.0L^2)x_t = 0.5 \times \sin(0.4 \times t)$ with Initial Conditions $x_1 = 1$, $x_2 = 1.5$

Observe that we need to consider only first- 
order systems, that is, systems with only one 
lag. In fact, a system of an arbitrary order can be 
transformed into a first-order system by adding 
one variable for each additional lag. For exam¬ 
ple, a second-order system of two difference 
equations, 

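$$\begin{aligned} x_{1,t} &= a_{11}x_{1,t-1} + a_{12}x_{2,t-1} + b_{11}x_{1,t-2} + b_{12}x_{2,t-2} + y_{1,t} \\ x_{2,t} &= a_{21}x_{1,t-1} + a_{22}x_{2,t-1} + b_{21}x_{1,t-2} + b_{22}x_{2,t-2} + y_{2,t} \end{aligned}$$

can be transformed into a first-order system by adding two variables:

$$\begin{aligned} x_{1,t} &= a_{11}x_{1,t-1} + a_{12}x_{2,t-1} + b_{11}z_{1,t-1} + b_{12}z_{2,t-1} + y_{1,t} \\ x_{2,t} &= a_{21}x_{1,t-1} + a_{22}x_{2,t-1} + b_{21}z_{1,t-1} + b_{22}z_{2,t-1} + y_{2,t} \\ z_{1,t} &= x_{1,t-1} \\ z_{2,t} &= x_{2,t-1} \end{aligned}$$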


Transformations of this type can be generalized 
to systems of any order and any number of 
equations. 
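The transformation can be sketched in code by stacking the current and lagged variables into one state vector. This is an illustrative sketch only: the helper names and the numerical values used in the usage note are hypothetical, and a constant forcing vector is assumed for simplicity.

```python
def apply(M, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def stack(A, B):
    """First-order matrix [[A, B], [I, 0]] for x_t = A x_{t-1} + B x_{t-2} + y_t."""
    k = len(A)
    top = [list(A[i]) + list(B[i]) for i in range(k)]
    bottom = [[1.0 if j == i else 0.0 for j in range(2 * k)] for i in range(k)]
    return top + bottom


def simulate_second_order(A, B, x0, x1, y, n):
    """Iterate the original two-lag system from x_0, x_1; returns x_n."""
    prev, cur = list(x0), list(x1)
    for _ in range(n - 1):
        nxt = [p + q + f for p, q, f in zip(apply(A, cur), apply(B, prev), y)]
        prev, cur = cur, nxt
    return cur


def simulate_stacked(A, B, x0, x1, y, n):
    """Iterate the one-lag system on the state (x_t, z_t) with z_t = x_{t-1}; returns x_n."""
    k = len(A)
    M = stack(A, B)
    state = list(x1) + list(x0)
    forcing = list(y) + [0.0] * k          # the added equations carry no forcing
    for _ in range(n - 1):
        state = [s + f for s, f in zip(apply(M, state), forcing)]
    return state[:k]
```

Both simulations should agree, for instance with the hypothetical inputs `A = [[0.5, 0.1], [0.0, 0.4]]`, `B = [[0.2, 0.0], [0.1, 0.3]]`, `x0 = [1.0, 2.0]`, `x1 = [0.5, -1.0]`, and `y = [0.1, -0.2]`.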

A system of difference equations is called homogeneous if the exogenous variable $\mathbf{y}_t$ is zero, that is, if it can be written as

$$\mathbf{x}_t = A\mathbf{x}_{t-1}$$

while it is called nonhomogeneous if the exoge¬ 
nous term is present. 

There are different ways to solve first-order 
systems of difference equations. One method 
consists in eliminating variables as in ordinary 
algebraic systems. In this way, the original first- 
order system in k equations is solved by solving 
a single difference equation of order k with the 
methods explained above. This observation im¬ 
plies that solutions of systems of linear differ¬ 
ence equations are of the same nature as those 
of difference equations (i.e., sums of exponen¬ 
tial and/or sinusoidal functions). In the follow¬ 
ing section we will show a direct method for 
solving systems of linear difference equations. 
This method could be used to solve equations of 
any order, as they are equivalent to first-order 
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systems. In addition, it gives a better insight 
into vector autoregressive processes. 


SYSTEMS OF 

HOMOGENEOUS LINEAR 
DIFFERENCE EQUATIONS 

Consider a homogeneous system of the following type:

$$\mathbf{x}(t) = A\mathbf{x}(t - 1), \quad t = 1, 2, \ldots, n, \ldots$$

where $A$ is a $k \times k$, real-valued, nonsingular matrix of constant coefficients. Using the lag operator notation, we can also write the above system in the following form:

$$(I - AL)\mathbf{x}_t = 0, \quad t = 1, 2, \ldots$$

If a vector of initial conditions $\mathbf{x}(0)$ is given, the above system is called an initial value problem.

Through recursive computation, that is, starting at $t = 0$ and computing forward, we can write

$$\begin{aligned} \mathbf{x}(1) &= A\mathbf{x}(0) \\ \mathbf{x}(2) &= A\mathbf{x}(1) = A^2\mathbf{x}(0) \\ &\ \ \vdots \\ \mathbf{x}(t) &= A^t\mathbf{x}(0) \end{aligned}$$

The following theorem can be demonstrated: Any homogeneous system of the type $\mathbf{x}(t) = A\mathbf{x}(t - 1)$, where $A$ is a $k \times k$, real-valued, nonsingular matrix, coupled with given initial conditions $\mathbf{x}(0)$, admits one and only one solution.
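The identity $\mathbf{x}(t) = A^t\mathbf{x}(0)$ can be illustrated by comparing forward recursion with an explicit matrix power. The sketch below uses plain Python lists and repeated squaring; it is only one of several ways to compute $A^t$ and is not the eigenvalue-based method discussed in the text. All names are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(A, t):
    """A**t for an integer t >= 0, by repeated squaring."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in A]
    while t > 0:
        if t & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        t >>= 1
    return result


def apply(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve_recursively(A, x0, t):
    """Compute x(t) by starting at x(0) and applying A forward t times."""
    x = list(x0)
    for _ in range(t):
        x = apply(A, x)
    return x
```

For any square `A` and starting vector `x0`, `apply(matpow(A, t), x0)` and `solve_recursively(A, x0, t)` should agree up to rounding.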

A set of $k$ solutions $\mathbf{x}_i(t)$, $i = 1, \ldots, k$, $t = 0, 1, 2, \ldots$ are said to be linearly independent if

$$\sum_{i=1}^{k} c_i\mathbf{x}_i(t) = 0, \quad t = 0, 1, 2, \ldots$$

implies $c_i = 0$, $i = 1, \ldots, k$. Suppose now that $k$ linearly independent solutions $\mathbf{x}_i(t)$, $i = 1, \ldots, k$ are given. Consider the matrix

$$\Phi(t) = [\mathbf{x}_1(t) \ \cdots \ \mathbf{x}_k(t)]$$

The following matrix equation is clearly satisfied:

$$\Phi(t) = A\Phi(t - 1)$$

The solutions $\mathbf{x}_i(t)$, $i = 1, \ldots, k$ are linearly independent if and only if the matrix $\Phi(t)$ is nonsingular for every value $t \geq 0$, that is, if $\det[\Phi(t)] \neq 0$, $t = 0, 1, \ldots$. Any nonsingular matrix $\Phi(t)$, $t = 0, 1, \ldots$ such that the matrix equation

$$\Phi(t) = A\Phi(t - 1)$$

is satisfied is called a fundamental matrix of the system $\mathbf{x}(t) = A\mathbf{x}(t - 1)$, $t = 1, \ldots, n, \ldots$ and it satisfies the equation

$$\Phi(t) = A^t\Phi(0)$$

In order to compute an explicit solution of this system, we need an efficient algorithm to compute the matrix sequence $A^t$. We will discuss one algorithm for this computation.¹ Recall that an eigenvalue of the $k \times k$ real-valued matrix $A = (a_{ij})$ is a real or complex number $\lambda$ that satisfies the matrix equation:

$$(A - \lambda I)\boldsymbol{\xi} = 0$$

where $\boldsymbol{\xi} \in \mathbb{C}^k$ is a $k$-dimensional complex vector. The above equation has a nonzero solution if and only if

$$|A - \lambda I| = 0$$

or

$$\det\begin{pmatrix} a_{11} - \lambda & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} - \lambda \end{pmatrix} = 0$$

The above condition can be expressed by the following algebraic equation:

$$z^k + a_1 z^{k-1} + \cdots + a_{k-1}z + a_k = 0$$

which is called the characteristic equation of the matrix $A = (a_{ij})$.

To see the relationship of this equation with the characteristic equations of single equations, consider the $k$-order equation:

$$(1 - a_1 L - \cdots - a_k L^k)x(t) = 0$$

$$x_t = a_1 x(t - 1) + \cdots + a_k x(t - k)$$

which is equivalent to the first-order system,

$$\begin{aligned} x_t &= a_1 x_{t-1} + a_2 z^1_{t-1} + \cdots + a_k z^{k-1}_{t-1} \\ z^1_t &= x_{t-1} \\ z^2_t &= z^1_{t-1} \\ &\ \ \vdots \\ z^{k-1}_t &= z^{k-2}_{t-1} \end{aligned}$$

so that $z^{k-1}_{t-1} = x_{t-k}$. The matrix

$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

is called the companion matrix. By induction, it can be demonstrated that the characteristic equation of the system $\mathbf{x}(t) = A\mathbf{x}(t - 1)$, $t = 1, \ldots, n, \ldots$ and of the $k$-order equation above coincide.
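The link between the companion matrix and the single $k$-order equation can be illustrated numerically. The sketch below is illustrative (function names are assumptions made here). It uses the fact that if $\lambda$ solves $\lambda^k = a_1\lambda^{k-1} + \cdots + a_k$, then the vector $(\lambda^{k-1}, \ldots, \lambda, 1)$ is an eigenvector of the companion matrix with eigenvalue $\lambda$.

```python
def companion(coeffs):
    """Companion matrix of x_t = a_1 x_{t-1} + ... + a_k x_{t-k}; coeffs = [a_1, ..., a_k]."""
    k = len(coeffs)
    rows = [list(coeffs)]
    for i in range(k - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(k)])
    return rows


def apply(A, v):
    """Matrix-vector product (works for real or complex entries)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def eigen_residual(coeffs, lam):
    """max |A v - lam v| for the candidate eigenvector v = (lam^(k-1), ..., lam, 1)."""
    k = len(coeffs)
    v = [lam ** (k - 1 - i) for i in range(k)]
    Av = apply(companion(coeffs), v)
    return max(abs(a - lam * b) for a, b in zip(Av, v))
```

For the equation $(1 - 1.7L + 0.72L^2)x_t = 0$ the coefficients are `[1.7, -0.72]`, and the residual is numerically zero at the roots 0.8 and 0.9 but not at other values.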

Given a system $\mathbf{x}(t) = A\mathbf{x}(t - 1)$, $t = 1, \ldots, n, \ldots$, we now consider separately two cases: (1) all, possibly complex, eigenvalues of the real-valued matrix $A$ are distinct, and (2) two or more eigenvalues coincide.

Recall that if $\lambda$ is a complex eigenvalue with corresponding complex eigenvector $\boldsymbol{\xi}$, the complex conjugate number $\bar{\lambda}$ is also an eigenvalue with corresponding complex eigenvector $\bar{\boldsymbol{\xi}}$.

If the eigenvalues of the real-valued matrix $A$ are all distinct, then the matrix can be diagonalized. This means that $A$ is similar to a diagonal matrix, according to the matrix equation

$$A = S\begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}S^{-1}, \quad S = [\boldsymbol{\xi}_1 \ \cdots \ \boldsymbol{\xi}_n]$$

and

$$A^t = S\begin{bmatrix} \lambda_1^t & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n^t \end{bmatrix}S^{-1}$$

We can therefore write the general solution of the system $\mathbf{x}(t) = A\mathbf{x}(t - 1)$ as follows:

$$\mathbf{x}(t) = c_1\lambda_1^t\boldsymbol{\xi}_1 + \cdots + c_n\lambda_n^t\boldsymbol{\xi}_n$$

The $c_i$ are complex numbers that need to be determined for the solutions to be real and to satisfy initial conditions. We therefore see the parallel between the solutions of first-order systems of difference equations and the solutions of $k$-order difference equations that we have determined above. In particular, if the solutions of the characteristic equation are all real, the solutions of the system exhibit exponential decay if the modulus of these roots is less than 1 or exponential growth if it is greater than 1. Real negative roots produce damped or undamped oscillations with a period equal to two time steps. If the solutions of the characteristic equation are complex, then solutions might exhibit damped or undamped oscillating behavior with any period.

To illustrate the above, consider the following 
second-order system: 

$$\begin{aligned} x_{1,t} &= 0.6x_{1,t-1} - 0.1x_{2,t-1} - 0.7x_{1,t-2} + 0.15x_{2,t-2} \\ x_{2,t} &= -0.12x_{1,t-1} + 0.7x_{2,t-1} + 0.22x_{1,t-2} - 0.8x_{2,t-2} \end{aligned}$$

This system can be transformed into the following first-order system:

$$\begin{aligned} x_{1,t} &= 0.6x_{1,t-1} - 0.1x_{2,t-1} - 0.7z_{1,t-1} + 0.15z_{2,t-1} \\ x_{2,t} &= -0.12x_{1,t-1} + 0.7x_{2,t-1} + 0.22z_{1,t-1} - 0.8z_{2,t-1} \\ z_{1,t} &= x_{1,t-1} \\ z_{2,t} &= x_{2,t-1} \end{aligned}$$

with matrix

$$A = \begin{bmatrix} 0.6 & -0.1 & -0.7 & 0.15 \\ -0.12 & 0.7 & 0.22 & -0.8 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

The eigenvalues of the matrix $A$ are distinct and complex:

$$\lambda_1 = 0.2654 + 0.7011i, \quad \lambda_2 = \bar{\lambda}_1 = 0.2654 - 0.7011i$$

$$\lambda_3 = 0.3846 + 0.8887i, \quad \lambda_4 = \bar{\lambda}_3 = 0.3846 - 0.8887i$$

The corresponding eigenvector matrix $S$ is

$$S = \begin{bmatrix} 0.1571 + 0.4150i & 0.1571 - 0.4150i & -0.1311 - 0.3436i & -0.1311 + 0.3436i \\ 0.0924 + 0.3928i & 0.0924 - 0.3928i & 0.2346 + 0.5419i & 0.2346 - 0.5419i \\ 0.5920 & 0.5920 & -0.3794 - 0.0167i & -0.3794 + 0.0167i \\ 0.5337 + 0.0702i & 0.5337 - 0.0702i & 0.6098 & 0.6098 \end{bmatrix}$$
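The quoted eigenvalues and eigenvectors can be checked against the $4 \times 4$ matrix $A$ of this example without any linear algebra library. The sketch below is an illustrative check only: it uses the matrix with the entry $-0.8$ in the second row, evaluates $\det(A - \lambda I)$ by cofactor expansion (adequate for a matrix this small), and measures $A\boldsymbol{\xi} - \lambda\boldsymbol{\xi}$ for two of the eigenvector columns. Since the printed values are rounded to four decimals, the residuals are small but not exactly zero.

```python
A = [[0.6, -0.1, -0.7, 0.15],
     [-0.12, 0.7, 0.22, -0.8],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, m in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * m * det(minor)
    return total


def char_poly_at(lam):
    """det(A - lam*I), which vanishes when lam is an eigenvalue."""
    n = len(A)
    return det([[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])


def eigvec_residual(lam, xi):
    """max |A xi - lam xi| over the components."""
    Axi = [sum(a * x for a, x in zip(row, xi)) for row in A]
    return max(abs(u - lam * v) for u, v in zip(Axi, xi))


eigs = [complex(0.2654, 0.7011), complex(0.2654, -0.7011),
        complex(0.3846, 0.8887), complex(0.3846, -0.8887)]
xi1 = [complex(0.1571, 0.4150), complex(0.0924, 0.3928), 0.5920, complex(0.5337, 0.0702)]
xi3 = [complex(-0.1311, -0.3436), complex(0.2346, 0.5419), complex(-0.3794, -0.0167), 0.6098]

residuals = [abs(char_poly_at(lam)) for lam in eigs]
```

The vectors `xi1` and `xi3` are the first and third columns of $S$; the second and fourth columns are their complex conjugates.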
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Each column of the matrix is an eigenvector. 
The solution of the system is given by

$$\mathbf{x}(t) = c_1\lambda_1^t\boldsymbol{\xi}_1 + c_2\bar{\lambda}_1^t\bar{\boldsymbol{\xi}}_1 + c_3\lambda_3^t\boldsymbol{\xi}_3 + c_4\bar{\lambda}_3^t\bar{\boldsymbol{\xi}}_3$$

$$= c_1(0.2654 + 0.7011i)^t\begin{pmatrix} 0.1571 + 0.4150i \\ 0.0924 + 0.3928i \\ 0.5920 \\ 0.5337 + 0.0702i \end{pmatrix} + c_2(0.2654 - 0.7011i)^t\begin{pmatrix} 0.1571 - 0.4150i \\ 0.0924 - 0.3928i \\ 0.5920 \\ 0.5337 - 0.0702i \end{pmatrix}$$

$$+ \ c_3(0.3846 + 0.8887i)^t\begin{pmatrix} -0.1311 - 0.3436i \\ 0.2346 + 0.5419i \\ -0.3794 - 0.0167i \\ 0.6098 \end{pmatrix} + c_4(0.3846 - 0.8887i)^t\begin{pmatrix} -0.1311 + 0.3436i \\ 0.2346 - 0.5419i \\ -0.3794 + 0.0167i \\ 0.6098 \end{pmatrix}$$

The four constants $c$ can be determined using the initial conditions: $x(1) = 1$; $x(2) = 1.2$; $y(1) = 1.5$; $y(2) = -2$. Figure 9 illustrates the behavior of solutions.

Figure 9 Solution of the System

$$\begin{aligned} x_{1,t} &= 0.6x_{1,t-1} - 0.1x_{2,t-1} - 0.7x_{1,t-2} + 0.15x_{2,t-2} \\ x_{2,t} &= -0.12x_{1,t-1} + 0.7x_{2,t-1} + 0.22x_{1,t-2} - 0.8x_{2,t-2} \end{aligned}$$

Now consider the case in which two or more solutions of the characteristic equation are coincident. In this case the matrix $A$ may or may not be diagonalizable. A sufficient condition for diagonalizability is that $A$ is normal, that is, that

$$A^T A = AA^T$$

If the matrix $A$ cannot be diagonalized, it can nevertheless be put in Jordan canonical form. In fact, it can be demonstrated that any nonsingular real-valued matrix $A$ is similar to a matrix in Jordan canonical form,

$$A = PJP^{-1}$$

where the matrix $J$ has the form $J = \text{diag}[J_1, \ldots, J_k]$, that is, it is formed by Jordan diagonal blocks:

$$J = \begin{bmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_k \end{bmatrix}$$

where each Jordan block has the form

$$J_i = \begin{bmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \vdots & \ddots & 1 \\ 0 & 0 & \cdots & \lambda_i \end{bmatrix}$$

The Jordan canonical form is characterized by two sets of multiplicity parameters, the algebraic multiplicity and the geometric multiplicity. The geometric multiplicity of an eigenvalue is the number of Jordan blocks corresponding to that eigenvalue, while the algebraic multiplicity of an eigenvalue is the number of times the eigenvalue is repeated. An eigenvalue that is repeated $s$ times can have from 1 to $s$ Jordan blocks. For example, suppose a matrix has only one eigenvalue $\lambda = 5$ that is repeated three times. There are four possible matrices with the following Jordan representation:

$$\begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \quad \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \quad \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix}, \quad \begin{pmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{pmatrix}$$

These four matrices all have algebraic multiplicity 3 but geometric multiplicity, from left to right, 3, 2, 2, 1, respectively.
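The number of Jordan blocks for an eigenvalue can be computed as the dimension of the null space of $J - \lambda I$, that is, the matrix size minus the rank of $J - \lambda I$. The sketch below applies this standard fact to the four matrices of the example; the helper names are illustrative, and exact rational arithmetic is used so that the rank computation is not affected by rounding.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def geometric_multiplicity(J, lam):
    """Number of Jordan blocks for eigenvalue lam: n - rank(J - lam*I)."""
    n = len(J)
    shifted = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


examples = [
    [[5, 0, 0], [0, 5, 0], [0, 0, 5]],
    [[5, 1, 0], [0, 5, 0], [0, 0, 5]],
    [[5, 0, 0], [0, 5, 1], [0, 0, 5]],
    [[5, 1, 0], [0, 5, 1], [0, 0, 5]],
]
multiplicities = [geometric_multiplicity(J, 5) for J in examples]
```

The diagonal matrix consists of three $1 \times 1$ blocks, the two middle matrices of one $2 \times 2$ block and one $1 \times 1$ block, and the last matrix of a single $3 \times 3$ block.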

KEY POINTS 

• Homogeneous difference equations are linear
conditions that link the values of variables at
different time lags.

• In the case of real roots, solutions are sums of
exponentials. Any linear combination of solutions
of the homogeneous difference equation
is another solution.

• When some of the roots are complex, the so¬ 
lutions of a homogeneous difference equation 
exhibit an oscillating behavior with a period 
that depends on the model coefficients. 

• The general solution of a homogeneous dif¬ 
ference equation that admits both real and 
complex roots with different multiplicities is 
a sum of the different types of solutions. 

• A system of difference equations is called 
homogeneous if the system's exogenous vari¬ 
able is zero, and nonhomogeneous if the ex¬ 
ogenous term is present. 

• One method of solving first-order systems of 
difference equations is by eliminating vari¬ 
ables as in ordinary algebraic systems; an¬ 
other way is a direct method that can be used 
to solve systems of linear difference equations 
of any order. 

NOTE 

1. This discussion of systems of difference 
equations draws on Elaydi (2002). 
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Abstract: In financial modeling, the goal is to be able to represent the problem at hand as a 
mathematical function. In a mathematical function, the dependent variable depends on one or 
more variables that are referred to as independent variables. In standard calculus, there are two 
basic operations with mathematical functions: differentiation and integration. The differentiation 
operation leads to derivatives. When a mathematical function has only one independent variable, 
then the derivative is referred to as an ordinary derivative. Typically in financial applications, the 
independent variable is time. The derivative of a mathematical function that has more than one 
independent variable (one of which is typically time) is called a partial derivative. A differential 
equation is an equation that contains derivatives. When it contains only an ordinary derivative, it 
is referred to as an ordinary differential equation; when the differential equation contains partial 
derivatives, the differential equation is called a partial differential equation. 


In nontechnical terms, differential equations are 
equations that express a relationship between 
a function and one or more derivatives (or dif¬ 
ferentials) of that function. The highest order of 
derivatives included in a differential equation 
is referred to as its order. In financial modeling, 
differential equations are used to specify the 
laws governing the evolution of price distribu¬ 
tions, deriving solutions to simple and complex 
options, and estimating term structure models. 
In most applications in finance, only first- and 
second-order differential equations are found. 

Differential equations are classified as or¬ 
dinary differential equations and partial dif¬ 
ferential equations depending on the type of 


derivatives included in the differential equa¬ 
tion. When there is only an ordinary derivative 
(i.e., a derivative of a mathematical function 
with only one independent variable), the dif¬ 
ferential equation is called an ordinary differen¬ 
tial equation. For differential equations where 
there are partial derivatives (i.e., a derivative 
of a mathematical function with more than 
one independent variable), then the differential 
equation is called a partial differential equation. 
Typically in differential equations, one of the 
independent variables is time. A differential 
equation may have a derivative of a mathe¬ 
matical function where one or more of the in¬ 
dependent variables is a random variable or a 
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stochastic process. In such instances, the dif¬ 
ferential equation is referred to as a stochastic 
differential equation. 

The solutions to a differential equation or sys¬ 
tem of differential equations can be as simple as 
explicit formulas. When an explicit formula is 
not possible to obtain, various numerical meth¬ 
ods can be used to approximate a solution. Even 
in the absence of an exact solution, properties 
of solutions of a differential equation can be de¬ 
termined. A large number of properties of dif¬ 
ferential equations have been established over 
the last three centuries. This entry provides only 
a brief introduction to the concept of differen¬ 
tial equations and their properties, limiting our 
discussion to the principal concepts. We do not 
cover stochastic differential equations. 

DIFFERENTIAL EQUATIONS 
DEFINED 

A differential equation is a condition expressed 
as a functional link between one or more 
functions and their derivatives. It is expressed 
as an equation (that is, as an equality between 
two terms). 

A solution of a differential equation is a func¬ 
tion that satisfies the given condition. For ex¬ 
ample, the condition 

Y"(x) + aY'(x) + fY(x) - b(x) = 0 

equates to zero a linear relationship between 
an unknown function Y(x), its first and second 
derivatives Y'(x),Y"(x), and a known function 
b(x). (In some equations we will denote the first 
and second derivatives by a single and dou¬ 
ble prime, respectively.) The unknown function 
Y(x) is the solution of the equation that is to be 
determined. 

There are two broad types of differential equa¬ 
tions: ordinary differential equations and par¬ 
tial differential equations. Ordinary differential 
equations are equations or systems of equa¬ 
tions involving only one independent variable. 
Another way of saying this is that ordinary 


differential equations involve only total deriva¬ 
tives. In contrast, partial differential equations 
are differential equations or systems of equa¬ 
tions involving partial derivatives. That is, there 
is more than one independent variable. 

ORDINARY DIFFERENTIAL 
EQUATIONS 

In full generality, an ordinary differential equa¬ 
tion (ODE) can be expressed as the following 
relationship: 

$$F[x, Y(x), Y^{(1)}(x), \ldots, Y^{(n)}(x)] = 0$$

where $Y^{(m)}(x)$ denotes the $m$-th derivative of an unknown function $Y(x)$. If the equation can be solved for the $n$-th derivative, it can be put in the form:

$$Y^{(n)}(x) = G[x, Y(x), Y^{(1)}(x), \ldots, Y^{(n-1)}(x)]$$

Order and Degree of an ODE 

A differential equation is classified in terms of its order and its degree. The order of a differential equation is the order of the highest derivative in the equation. For example, the above differential equation is of order $n$ since the highest order derivative is $Y^{(n)}(x)$. The degree of a differential equation is determined by looking at the highest derivative in the differential equation. The degree is the power to which that derivative is raised.

For example, the following ordinary differential equations are first-degree differential equations of different orders:

$$Y^{(1)}(x) - 10Y(x) + 40 = 0 \quad \text{(order 1)}$$

$$4Y^{(3)}(x) + Y^{(2)}(x) + Y^{(1)}(x) - 0.5Y(x) + 100 = 0 \quad \text{(order 3)}$$

The following ordinary differential equations are of order 3 and fifth degree:

$$4[Y^{(3)}(x)]^5 + [Y^{(2)}(x)]^2 + Y^{(1)}(x) - 0.5Y(x) + 100 = 0$$

$$4[Y^{(3)}(x)]^5 + [Y^{(2)}(x)]^3 + Y^{(1)}(x) - 0.5Y(x) + 100 = 0$$
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When an ordinary differential equation is of the 
first degree, it is said to be a linear ordinary dif¬ 
ferential equation. 

Solution to an ODE 

Let's return to the general ODE. A solution of this equation is any function $y(x)$ such that:

$$F[x, y(x), y^{(1)}(x), \ldots, y^{(n)}(x)] = 0$$

In general there will be not one but an infinite family of solutions. For example, the equation

$$Y^{(1)}(x) = aY(x)$$

admits, as a solution, all the functions of the form

$$y(x) = C\exp(ax)$$

To identify one specific solution among the 
possible infinite solutions that satisfy a differ¬ 
ential equation, additional restrictions must be 
imposed. Restrictions that uniquely identify a 
solution to a differential equation can be of var¬ 
ious types. For instance, one could impose that 
a solution of an n-th order differential equation 
passes through n given points. A common type 
of restriction—called an initial condition —is ob¬ 
tained by imposing that the solution and some 
of its derivatives assume given initial values at 
some initial point. 

Given an ODE of order $n$, to ensure the uniqueness of solutions it will generally be necessary to specify a starting point and the initial value of $n - 1$ derivatives. It can be demonstrated, given the differential equation

$$F[x, Y(x), Y^{(1)}(x), \ldots, Y^{(n)}(x)] = 0$$

that if the function $F$ is continuous and all of its partial derivatives up to order $n$ are continuous in some region containing the values $y_0, \ldots, y_0^{(n-1)}$, then there is a unique solution $y(x)$ of the equation in some interval $I = (M < x < L)$ such that $y_0 = Y(x_0), \ldots, y_0^{(n-1)} = Y^{(n-1)}(x_0)$.¹

Note that this theorem states that there is an 
interval in which the solution exists. Existence 
and uniqueness of solutions in a given interval 


is a more delicate matter and must be examined 
for different classes of equations. 

The general solution of a differential equation of order $n$ is a function of the form

$$y = \varphi(x, C_1, \ldots, C_n)$$

that satisfies the following two conditions:

• Condition 1. The function $y = \varphi(x, C_1, \ldots, C_n)$ satisfies the differential equation for any $n$-tuple of values $(C_1, \ldots, C_n)$.

• Condition 2. Given a set of initial conditions $y(x_0) = y_0, \ldots, y^{(n-1)}(x_0) = y_0^{(n-1)}$ that belong to the region where solutions of the equation exist, it is possible to determine $n$ constants in such a way that the function $y = \varphi(x, C_1, \ldots, C_n)$ satisfies these conditions.

The coupling of differential equations with initial conditions embodies the notion of universal determinism of classical physics. Given initial conditions, the future evolution of a system that obeys those equations is completely determined. This notion was forcefully expressed by Pierre-Simon Laplace in the eighteenth century: A supernatural mind who knows the laws of physics and the initial conditions of each atom could perfectly predict the future evolution of the universe with unlimited precision.

In the twentieth century, the notion of univer¬ 
sal determinism was challenged twice in the 
physical sciences. First in the 1920s the devel¬ 
opment of quantum mechanics introduced the 
so-called indeterminacy principle which estab¬ 
lished explicit bounds to the precision of mea¬ 
surements. Later, in the 1970s, the development 
of nonlinear dynamics and chaos theory showed 
how arbitrarily small initial differences might 
become arbitrarily large: The flapping of a but¬ 
terfly's wings in the southern hemisphere might 
cause a tornado in the northern hemisphere. 

SYSTEMS OF ORDINARY 
DIFFERENTIAL EQUATIONS 

Differential equations can be combined to form 
systems of differential equations. These are sets 
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of differential conditions that must be satisfied 
simultaneously. A first-order system of differential 
equations is a system of the following type: 

$$\begin{aligned} \frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\ \frac{dy_2}{dx} &= f_2(x, y_1, \ldots, y_n) \\ &\ \ \vdots \\ \frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n) \end{aligned}$$

Solving this system means finding a set of functions $y_1, \ldots, y_n$ that satisfy the system as well as the initial conditions:

$$y_1(x_0) = y_{10}, \ldots, y_n(x_0) = y_{n0}$$

Systems of orders higher than one can be reduced to first-order systems in a straightforward way by adding new variables defined as the derivatives of existing variables. As a consequence, an $n$-th order differential equation can be transformed into a first-order system of $n$ equations. Conversely, a system of first-order differential equations is equivalent to a single $n$-th order equation.

To illustrate this point, let's differentiate the first equation to obtain

$$\frac{d^2 y_1}{dx^2} = \frac{\partial f_1}{\partial x} + \frac{\partial f_1}{\partial y_1}\frac{dy_1}{dx} + \cdots + \frac{\partial f_1}{\partial y_n}\frac{dy_n}{dx}$$

Replacing the derivatives

$$\frac{dy_1}{dx}, \ldots, \frac{dy_n}{dx}$$

with their expressions $f_1, \ldots, f_n$ from the system's equations, we obtain

$$\frac{d^2 y_1}{dx^2} = F_2(x, y_1, \ldots, y_n)$$

If we now reiterate this process, we arrive at the $n$-th order equation:

$$\frac{d^{(n)} y_1}{dx^{(n)}} = F_n(x, y_1, \ldots, y_n)$$

We can thus write the following system:

$$\begin{aligned} \frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\ \frac{d^2 y_1}{dx^2} &= F_2(x, y_1, \ldots, y_n) \\ &\ \ \vdots \\ \frac{d^{(n)} y_1}{dx^{(n)}} &= F_n(x, y_1, \ldots, y_n) \end{aligned}$$

We can express $y_2, \ldots, y_n$ as functions of $x, y_1, y_1', \ldots, y_1^{(n-1)}$ by solving, if possible, the system formed with the first $n - 1$ equations:

$$\begin{aligned} y_2 &= \varphi_2(x, y_1, y_1', \ldots, y_1^{(n-1)}) \\ y_3 &= \varphi_3(x, y_1, y_1', \ldots, y_1^{(n-1)}) \\ &\ \ \vdots \\ y_n &= \varphi_n(x, y_1, y_1', \ldots, y_1^{(n-1)}) \end{aligned}$$

Substituting these expressions into the $n$-th equation of the previous system, we arrive at the single equation:

$$\frac{d^{(n)} y_1}{dx^{(n)}} = \Phi(x, y_1, \ldots, y_1^{(n-1)})$$

Solving, if possible, this equation, we find the general solution

$$y_1 = y_1(x, C_1, \ldots, C_n)$$

Substituting this expression for $y_1$ into the previous system, $y_2, \ldots, y_n$ can be computed.
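The equivalence between a higher-order equation and a first-order system is also what makes numerical integration convenient. As an illustration, the sketch below rewrites the second-order equation $y'' = -y$ (chosen here as an example; it does not appear in the text) as the first-order system $y_1' = y_2$, $y_2' = -y_1$ and integrates it with the classical fourth-order Runge-Kutta scheme. With $y(0) = 1$, $y'(0) = 0$ the exact solution is $y = \cos x$, which gives a reference for the numerical result.

```python
import math


def rk4(f, x0, y0, h, steps):
    """Classical fourth-order Runge-Kutta for the first-order system y' = f(x, y)."""
    y = list(y0)
    for i in range(steps):
        x = x0 + i * h
        k1 = f(x, y)
        k2 = f(x + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(x + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(x + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def oscillator(x, y):
    """y'' = -y as a first-order system: y[0] is y, y[1] is its derivative."""
    return [y[1], -y[0]]


if __name__ == "__main__":
    y_end = rk4(oscillator, 0.0, [1.0, 0.0], 0.01, 100)   # integrate from x = 0 to x = 1
    print(y_end, math.cos(1.0), -math.sin(1.0))
```

The new variable `y[1]` is exactly the "derivative of an existing variable" introduced to lower the order of the equation.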


CLOSED-FORM SOLUTIONS 
OF ORDINARY 
DIFFERENTIAL EQUATIONS 

Let's now consider the methods for solving two 
types of common differential equations: equa¬ 
tions with separable variables and equations of 
linear type. Let's start with equations with sep¬ 
arable variables. Consider the equation 

$$\frac{dy}{dx} = f(x)g(y)$$
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This equation is said to have separable variables 
because it can be written as an equality between 
two sides, each depending on only y or only x. 
We can rewrite our equation in the following 
way: 

= f{x)dx 

This equation can be regarded as an equality be¬ 
tween two differentials in y and x respectively 
Their indefinite integrals can differ only by a 
constant. Integrating the left side with respect 
to y and the right side with respect to x, we 
obtain the general solution of the equation: 


fl§)=I fMdx+ 


C 


For example, if g(y) = y, the previous equation 
becomes 

— = f(x)dx 

whose solution is 

I ~y~ = I + ^ 

logy = Jf(x)dx + C =b y = A exp (Jf(x)dx) 
where A = exp(C). 

A differential equation of this type describes the continuous compounding of time-varying interest rates. Consider, for example, the growth of capital $C$ deposited in a bank account that earns the variable but deterministic rate $r = f(t)$. When interest rates $R_i$ are constant for discrete periods of time $\Delta t_i$, compounding is obtained by purely algebraic formulas as follows:

$$ R_i \Delta t_i = \frac{C(t_i) - C(t_i - \Delta t_i)}{C(t_i - \Delta t_i)} $$

Solving for $C(t_i)$:

$$ C(t_i) = (1 + R_i \Delta t_i)\, C(t_i - \Delta t_i) $$

By recursive substitution we obtain

$$ C(t_i) = (1 + R_i \Delta t_i)(1 + R_{i-1} \Delta t_{i-1}) \cdots (1 + R_1 \Delta t_1)\, C(t_0) $$

However, market interest rates are subject to rapid change. In the limit of very short time intervals, the instantaneous rate $r(t)$ would be defined as the limit, if it exists, of the discrete interest rate:

$$ r(t) = \lim_{\Delta t \to 0} \frac{C(t + \Delta t) - C(t)}{\Delta t\, C(t)} $$

The above expression can be rewritten as a simple first-order differential equation in $C$:

$$ r(t)\,C(t) = \frac{dC(t)}{dt} $$

In a simple intuitive way, the above equation can be obtained considering that in the elementary time $dt$ the bank account increments by the amount $dC = C(t)\,r(t)\,dt$. In this equation, variables are separable. It admits the family of solutions:

$$ C = A \exp\left(\int r(t)\,dt\right) $$

where $A$ is the initial capital.
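The link between the discrete product formula and the exponential solution can be checked numerically. The sketch below is only an illustration: the linear rate curve `rate`, the five-year horizon, and the function names are assumptions introduced here, not taken from the text. It compounds over ever shorter periods and compares the result with $A \exp(\int_0^T r(t)\,dt)$.

```python
import math

def rate(t):
    # Hypothetical deterministic rate curve: 2% today, rising by 1% per year.
    return 0.02 + 0.01 * t

def discrete_compounding(c0, rate_fn, t_end, n_steps):
    """Apply C(t_i) = (1 + R_i * dt) * C(t_i - dt) over n_steps equal periods,
    taking R_i as the rate at the start of each period."""
    dt = t_end / n_steps
    capital = c0
    for i in range(n_steps):
        capital *= 1.0 + rate_fn(i * dt) * dt
    return capital

def continuous_compounding(c0, t_end):
    # Closed form for the linear curve above:
    # integral of r over [0, T] = 0.02*T + 0.005*T**2.
    return c0 * math.exp(0.02 * t_end + 0.005 * t_end ** 2)

exact = continuous_compounding(100.0, 5.0)
for n in (5, 50, 5000):
    approx = discrete_compounding(100.0, rate, 5.0, n)
    print(f"periods={n:5d}  discrete={approx:.4f}  continuous={exact:.4f}")
```

As the number of periods grows, the discrete product approaches the continuous value, which is the limit described above.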


Linear Differential Equation 

Linear differential equations are equations of the following type:

$$ a_n(x)\,y^{(n)} + a_{n-1}(x)\,y^{(n-1)} + \cdots + a_1(x)\,y^{(1)} + a_0(x)\,y + b(x) = 0 $$

If the function $b$ is identically zero, the equation is said to be homogeneous.

In cases where the coefficients $a_i$ are constant, Laplace transforms provide a powerful method for solving linear differential equations. (Laplace transforms are one of two popular integral transforms, the other being Fourier transforms, used in financial modeling. Integral transforms are operations that take any function into another function of a different variable through an improper integral.) Consider, without loss of generality, the following linear equation with constant coefficients:

$$ a_n y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y^{(1)} + a_0 y = b(x) $$

together with the initial conditions: $y(0) = y_0, \ldots, y^{(n-1)}(0) = y_0^{(n-1)}$. In cases in which the initial point is not the origin, by a variable transformation we can shift the origin.


Laplace Transform 

For one-sided Laplace transforms the following formulas hold:

$$ \mathcal{L}\left(\frac{df(x)}{dx}\right) = s\,\mathcal{L}[f(x)] - f(0) $$

$$ \mathcal{L}\left(\frac{d^n f(x)}{dx^n}\right) = s^n \mathcal{L}[f(x)] - s^{n-1} f(0) - \cdots - f^{(n-1)}(0) $$

Suppose that a function $y = y(x)$ satisfies the previous linear equation with constant coefficients and that it admits a Laplace transform. Apply the one-sided Laplace transform to both sides of the equation. If $Y(s) = \mathcal{L}[y(x)]$ and $B(s) = \mathcal{L}[b(x)]$, the following relationships hold:

$$ \mathcal{L}\left(a_n y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y^{(1)} + a_0 y\right) = \mathcal{L}[b(x)] $$

$$
\begin{aligned}
& a_n\left[s^n Y(s) - s^{n-1} y(0) - \cdots - y^{(n-1)}(0)\right] \\
& \quad + a_{n-1}\left[s^{n-1} Y(s) - s^{n-2} y(0) - \cdots - y^{(n-2)}(0)\right] \\
& \quad + \cdots + a_0 Y(s) = B(s)
\end{aligned}
$$

Solving this equation for $Y(s)$, that is, $Y(s) = g[s, y(0), \ldots, y^{(n-1)}(0)]$, the inverse Laplace transform $y(x) = \mathcal{L}^{-1}[Y(s)]$ uniquely determines the solution of the equation.

Because inverse Laplace transforms are in¬ 
tegrals, with this method, when applicable, 
the solution of a differential equation is re¬ 
duced to the determination of integrals. Laplace 
transforms and inverse Laplace transforms are 
known for large classes of functions. Because of 
the important role that Laplace transforms play 
in solving ordinary differential equations in en¬ 
gineering problems, there are published refer¬ 
ence tables. Laplace transform methods also 
yield closed-form solutions of many ordinary 
differential equations of interest in economics 
and finance. 
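A short worked case may make the procedure concrete. The equation $y'' + y = 0$ with initial values $y(0) = y_0$, $y'(0) = y_0'$ is chosen here only for illustration; the transform pairs used in the last step are the standard table entries for cosine and sine.

```latex
\begin{aligned}
\mathcal{L}[y''] + \mathcal{L}[y] &= 0 \\
s^2 Y(s) - s\,y_0 - y_0' + Y(s) &= 0 \\
Y(s) &= \frac{s\,y_0 + y_0'}{s^2 + 1}
      = y_0\,\frac{s}{s^2+1} + y_0'\,\frac{1}{s^2+1} \\
y(x) &= \mathcal{L}^{-1}[Y(s)] = y_0 \cos x + y_0' \sin x
\end{aligned}
```

The differential equation has been turned into an algebraic equation in $Y(s)$, and the initial conditions enter automatically through the transform of the derivative.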


NUMERICAL SOLUTIONS OF
ORDINARY DIFFERENTIAL
EQUATIONS

Closed-form solutions are solutions that can be 
expressed in terms of known functions such 
as polynomials or exponential functions. Be¬ 
fore the advent of fast digital computers, the 
search for closed-form solutions of differential 
equations was an important task. Today, thanks 
to the availability of high-performance comput¬ 
ing, most problems are solved numerically. This 
section looks at methods for solving ordinary 
differential equations numerically. 

The Finite Difference Method 

Among the methods used to numerically solve 
ordinary differential equations subject to ini¬ 
tial conditions, the most common is the fi¬ 
nite difference method. The finite difference 
method is based on replacing derivatives with 
difference equations; differential equations are 
thereby transformed into recursive difference 
equations. 

Key to this method of numerical solution is 
the fact that ODEs subject to initial conditions 
describe phenomena that evolve from some 
starting point. In this case, the differential 
equation can be approximated with a system 
of difference equations that compute the next 
point based on previous points. This would 
not be possible should we impose boundary 
conditions instead of initial conditions. In this 
latter case, we have to solve a system of linear 
equations. 

To illustrate the finite difference method, consider the following simple ordinary differential equation and its solution in a finite interval:

$$ f'(x) = f(x) $$

$$ \frac{df}{f} = dx $$

$$ \log f(x) = x + C $$

$$ f(x) = \exp(x + C) $$

As shown, the closed-form solution of the equation is obtained by separation of variables, that is, by transforming the original equation into another equation where the function $f$ appears only on the left side and the variable $x$ only on the right side.

Figure 1 Numerical Solutions of the Equation f' = f with the Euler Approximation for Different Step Sizes

Suppose that we replace the derivative with its forward finite difference approximation and solve

$$ \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} = f(x_i) $$

$$ f(x_{i+1}) = \left[1 + (x_{i+1} - x_i)\right] f(x_i) $$

If we assume that the step size is constant for all $i$:

$$ f(x_i) = [1 + \Delta x]^i f(x_0) $$

The replacement of derivatives with finite differences is often called the Euler approximation. The differential equation is replaced by a recursive formula based on approximating the derivative with a finite difference. The $i$-th value of the solution is computed from the $(i-1)$-th value. Given the initial value of the function $f$, the solution of the differential equation can be approximated to arbitrary accuracy by choosing a sufficiently small interval. Figure 1 illustrates this computation for different values of $\Delta x$.
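The recursion can be written in a few lines. The sketch below is a minimal illustration, assuming the interval $[0, 1]$ and the initial value $f(0) = 1$ (both chosen here, not given in the text); the function name is likewise hypothetical.

```python
import math

def euler_exponential(f0, x_end, n_steps):
    """Forward Euler for f' = f on [0, x_end]; returns the approximate f(x_end)."""
    dx = x_end / n_steps
    f = f0
    for _ in range(n_steps):
        f = (1.0 + dx) * f  # f(x_{i+1}) = [1 + dx] * f(x_i)
    return f

exact = math.exp(1.0)
for n in (10, 100, 1000):
    approx = euler_exponential(1.0, 1.0, n)
    print(f"steps={n:5d}  approx={approx:.6f}  error={exact - approx:.6f}")
```

Each tenfold reduction of the step size reduces the error at $x = 1$ by roughly a factor of ten, which is the first-order behavior expected of the Euler approximation.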

In the previous example of a first-order linear equation, only one initial condition was involved. Let's now consider a second-order equation:

$$ f''(x) + k f(x) = 0 $$

This equation describes oscillatory motion, such as the elongation of a pendulum or the displacement of a spring.

To approximate this equation we must approximate the second derivative. This could be done, for example, by combining difference quotients as follows:

$$ f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x} $$

$$ f'(x + \Delta x) \approx \frac{f(x + 2\Delta x) - f(x + \Delta x)}{\Delta x} $$

$$
\begin{aligned}
f''(x) &\approx \frac{f'(x + \Delta x) - f'(x)}{\Delta x} \\
&= \frac{\dfrac{f(x + 2\Delta x) - f(x + \Delta x)}{\Delta x} - \dfrac{f(x + \Delta x) - f(x)}{\Delta x}}{\Delta x} \\
&= \frac{f(x + 2\Delta x) - 2 f(x + \Delta x) + f(x)}{(\Delta x)^2}
\end{aligned}
$$

With this approximation, the original equation becomes

$$ f''(x) + k f(x) \approx \frac{f(x + 2\Delta x) - 2 f(x + \Delta x) + f(x)}{(\Delta x)^2} + k f(x) = 0 $$

$$ f(x + 2\Delta x) - 2 f(x + \Delta x) + \left(1 + k(\Delta x)^2\right) f(x) = 0 $$

We can thus write the approximation scheme:

$$ f(x + \Delta x) = f(x) + \Delta x\, f'(x) $$

$$ f(x + 2\Delta x) = 2 f(x + \Delta x) - \left(1 + k(\Delta x)^2\right) f(x) $$

Given the increment $\Delta x$ and the initial values $f(0)$, $f'(0)$, using the above formulas we can recursively compute $f(0 + \Delta x)$, $f(0 + 2\Delta x)$, and so on. Figure 2 illustrates this computation.

Figure 2 Numerical Solution of the Equation f'' + f = 0 with the Euler Approximation
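The two-line scheme translates directly into code. In this sketch the values $k = 1$, $f(0) = 1$, $f'(0) = 0$ and the step size are assumptions made for the example, so the exact solution to compare against is $\cos x$.

```python
import math

def euler_oscillator(f0, df0, k, dx, n_steps):
    """Forward scheme for f'' + k f = 0.
    Returns the list [f(0), f(dx), ..., f(n_steps * dx)]."""
    values = [f0, f0 + dx * df0]  # f(dx) = f(0) + dx * f'(0)
    for _ in range(n_steps - 1):
        prev, curr = values[-2], values[-1]
        # f(x + 2dx) = 2 f(x + dx) - (1 + k dx^2) f(x)
        values.append(2.0 * curr - (1.0 + k * dx * dx) * prev)
    return values

path = euler_oscillator(f0=1.0, df0=0.0, k=1.0, dx=0.001, n_steps=1000)
print(f"numerical f(1) = {path[-1]:.6f}, exact cos(1) = {math.cos(1.0):.6f}")
```

With a coarse step the numerical amplitude slowly grows, which is one reason the text goes on to describe more precise schemes.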


In practice, the Euler approximation scheme is often not sufficiently precise and more sophisticated approximation schemes are used. For example, a widely used approximation scheme is the Runge-Kutta method. We give an example of the Runge-Kutta method in the case of the equation $f'' + f = 0$, which is equivalent to the linear system:

$$ x' = y $$

$$ y' = -x $$

In this case the Runge-Kutta approximation scheme, where $h$ is the step size, is the following:

$$
\begin{aligned}
k_1 &= h\,y(i) \\
h_1 &= -h\,x(i) \\
k_2 &= h\left[y(i) + \tfrac{1}{2} h_1\right] \\
h_2 &= -h\left[x(i) + \tfrac{1}{2} k_1\right] \\
k_3 &= h\left[y(i) + \tfrac{1}{2} h_2\right] \\
h_3 &= -h\left[x(i) + \tfrac{1}{2} k_2\right] \\
k_4 &= h\left[y(i) + h_3\right] \\
h_4 &= -h\left[x(i) + k_3\right] \\
x(i+1) &= x(i) + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4) \\
y(i+1) &= y(i) + \tfrac{1}{6}(h_1 + 2h_2 + 2h_3 + h_4)
\end{aligned}
$$

Figure 3 Numerical Solution of the Equation f' = f with the Runge-Kutta Method After 10 Steps

Figures 3 and 4 illustrate the results of this method in the two cases $f' = f$ and $f'' + f = 0$.
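The scheme for the system $x' = y$, $y' = -x$ can be sketched as follows. The initial values $x(0) = 1$, $y(0) = 0$ and the step size $h = 0.1$ are assumptions made for this example; with them the exact solution is $x = \cos t$, $y = -\sin t$.

```python
import math

def rk4_oscillator(x0, y0, h, n_steps):
    """Classical fourth-order Runge-Kutta for x' = y, y' = -x.
    The k's are increments of x, the h's (h1..h4) increments of y."""
    x, y = x0, y0
    for _ in range(n_steps):
        k1 = h * y
        h1 = -h * x
        k2 = h * (y + 0.5 * h1)
        h2 = -h * (x + 0.5 * k1)
        k3 = h * (y + 0.5 * h2)
        h3 = -h * (x + 0.5 * k2)
        k4 = h * (y + h3)
        h4 = -h * (x + k3)
        x += (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        y += (h1 + 2.0 * h2 + 2.0 * h3 + h4) / 6.0
    return x, y

x_end, y_end = rk4_oscillator(1.0, 0.0, h=0.1, n_steps=10)
print(f"x(1) = {x_end:.8f} vs cos(1)  = {math.cos(1.0):.8f}")
print(f"y(1) = {y_end:.8f} vs -sin(1) = {-math.sin(1.0):.8f}")
```

Even with only ten steps the result is far closer to the exact solution than the Euler scheme with a thousand steps, at the cost of four evaluations of the right-hand side per step.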

As mentioned above, this numerical method 
depends critically on our having as givens (1) 
the initial values of the solution, and (2) its 
first derivative. Suppose that instead of initial 
values two boundary values were given, for instance the initial value of the solution and its value 1,000 steps ahead, that is, $f(0) = f_0$, $f(0 + 1{,}000\,\Delta x) = f_{1000}$. Conditions like these are
rarely used in the study of dynamical systems as 
they imply foresight, that is, knowledge of the 
future position of a system. However, they of¬ 
ten appear in static systems and when trying 
to determine what initial conditions should be 
imposed to reach a given goal at a given date. 

In the case of boundary conditions, one cannot write a direct recursive scheme; it's necessary to solve a system of equations. For instance, we could introduce the initial derivative $f'(0) = \delta$ as an unknown quantity. The difference quotient that approximates the derivative becomes an unknown. We can now write a system of linear equations in the following way:

$$
\begin{aligned}
f(\Delta x) &= f_0 + \delta\,\Delta x \\
f(2\Delta x) &= 2 f(\Delta x) - \left(1 + k(\Delta x)^2\right) f_0 \\
f(3\Delta x) &= 2 f(2\Delta x) - \left(1 + k(\Delta x)^2\right) f(\Delta x) \\
&\;\;\vdots \\
f_{1000} &= 2 f(999\,\Delta x) - \left(1 + k(\Delta x)^2\right) f(998\,\Delta x)
\end{aligned}
$$

This is a system of 1,000 equations in 1,000 
unknowns. Solving the system we compute the 
entire solution. In this system two equations, 
the first and the last, are linked to boundary 
values; all other equations are transfer equa¬ 
tions that express the dynamics (or the law) of 
the system. This is a general feature of bound¬ 
ary value problems. We will encounter it again 
when discussing numerical solutions of partial 
differential equations. 
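One way to solve this particular system, sketched here under the assumption that the end coefficient of $\delta$ is nonzero, exploits its linearity: every unknown value can be written as $p + q\,\delta$, the transfer equations propagate the pair $(p, q)$ forward, and the last equation fixes $\delta$. This elimination strategy and the function name are choices made for the illustration, not a method prescribed by the text; a general-purpose linear solver would serve equally well.

```python
import math

def solve_boundary_problem(f0, f_end, k, dx, n_steps):
    """Solve the forward-difference system for f'' + k f = 0 with
    f(0) = f0 and f(n_steps * dx) = f_end.
    Returns (delta, values), where delta approximates f'(0)."""
    c = 1.0 + k * dx * dx
    p = [f0, f0]   # constant parts of f(0) and f(dx) = f0 + delta * dx
    q = [0.0, dx]  # coefficients multiplying the unknown delta
    for _ in range(n_steps - 1):
        # Transfer equation applied separately to both parts.
        p.append(2.0 * p[-1] - c * p[-2])
        q.append(2.0 * q[-1] - c * q[-2])
    # Boundary equation at the far end: p_N + q_N * delta = f_end.
    delta = (f_end - p[-1]) / q[-1]
    values = [pi + qi * delta for pi, qi in zip(p, q)]
    return delta, values

# Hypothetical data: f(0) = 0 and f(1) = sin(1), so the exact solution is sin(x).
delta, values = solve_boundary_problem(0.0, math.sin(1.0), k=1.0, dx=0.001, n_steps=1000)
print(f"recovered initial slope = {delta:.5f} (exact value 1)")
```

The first and last equations carry the boundary data, while the loop applies the transfer equations, mirroring the structure described above.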





Figure 4 Numerical Solution of the Equation f'' + f = 0 with the Runge-Kutta Method


In the above example, we chose a forward 
scheme where the derivative is approximated 
with the forward difference quotient. One 
might use a different approximation scheme, 
computing the derivative in intervals centered 
around the point x. When derivatives of higher 
orders are involved, the choice of the approximation scheme becomes critical. Recall that when we approximated first and second derivatives using forward differences, we were required to evaluate the function at two points $(i, i+1)$ and three points $(i, i+1, i+2)$ ahead respectively. If purely forward schemes are employed,
computing higher-order derivatives requires 
many steps ahead. This fact might affect the pre¬ 
cision and stability of numerical computations. 

We saw in the examples that the accuracy of a finite difference scheme depends on the discretization interval. In general, a finite difference scheme works, that is, it is consistent and stable, if the numerical solution converges uniformly to the exact solution when the length of the discretization interval tends to zero. Suppose that the precision of an approximation scheme depends on the length of the discretization interval $\Delta x$. Consider the difference $\delta f = \hat{f}(x) - f(x)$ between the approximate and the exact solutions. We say that $\delta f \to 0$ uniformly in the interval $[a, b]$ when $\Delta x \to 0$ if, given any $\varepsilon$ arbitrarily small, it is possible to find a $\Delta x$ such that $|\delta f| < \varepsilon$, $\forall x \in [a, b]$.
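Uniform convergence can be observed numerically by tracking the largest error over the whole interval rather than at a single point. The sketch below does this for the Euler scheme applied to $f' = f$ on $[0, 1]$ with $f(0) = 1$; the interval, the initial value, and the function name are assumptions made for the example.

```python
import math

def max_euler_error(dx, x_end=1.0):
    """Largest |approximate - exact| over the grid for f' = f, f(0) = 1."""
    n_steps = round(x_end / dx)
    f, worst = 1.0, 0.0
    for i in range(1, n_steps + 1):
        f *= 1.0 + dx  # Euler step
        worst = max(worst, abs(f - math.exp(i * dx)))
    return worst

for dx in (0.1, 0.01, 0.001):
    print(f"dx={dx:<6} max error on [0, 1] = {max_euler_error(dx):.6f}")
```

For any tolerance $\varepsilon$ one can therefore pick a step size small enough that the error stays below $\varepsilon$ at every grid point of the interval.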

NONLINEAR DYNAMICS 
AND CHAOS 

Systems of differential equations describe dy¬ 
namical systems that evolve starting from initial 
conditions. A fundamental concept in the the¬ 
ory of dynamical systems is that of the stability 
of solutions. This topic has become of paramount 
importance with the development of nonlin¬ 
ear dynamics and with the discovery of chaotic 
phenomena. We can only give a brief introduc¬ 
tory account of this subject whose role in eco¬ 
nomics is still the subject of debate. 

Intuitively, a dynamical system is consid¬ 
ered stable if its solutions do not change much 
when the system is only slightly perturbed. 
There are different ways to perturb a system:
changing parameters in its equations, changing 
the known functions of the system by a small 
amount, or changing the initial conditions. 

Consider an equilibrium solution of a dynam¬ 
ical system, that is, a solution that is time in¬ 
variant. If a stable system is perturbed when it 
is in a position of equilibrium, it tends to return 
to the equilibrium position or, in any case, not 
to diverge indefinitely from its equilibrium po¬ 
sition. For example, a damped pendulum—if 
perturbed from a position of equilibrium—will 
tend to go back to an equilibrium position. If 
the pendulum is not damped it will continue to 
oscillate forever. 

Consider a system of n equations of first or¬ 
der. (As noted above, systems of higher orders 
can always be reduced to first-order systems 
by enlarging the set of variables.) Suppose that 
we can write the system explicitly in the first 
derivatives as follows: 

$$
\begin{aligned}
\frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\
\frac{dy_2}{dx} &= f_2(x, y_1, \ldots, y_n) \\
&\;\;\vdots \\
\frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n)
\end{aligned}
$$

If the equations are all linear, a complete the¬ 
ory of stability has been developed. Essentially, 
linear dynamical systems are stable except pos¬ 
sibly at singular points where solutions might 
diverge. In particular, a characteristic of linear 
systems is that they incur only small changes in 
the solution as a result of small changes in the 
initial conditions. 

However, during the 1970s, it was discovered that nonlinear systems have a different behavior. Suppose that a nonlinear system has at least three degrees of freedom (that is, it has three independent nonlinear equations). The dynamics of such a system can then become chaotic in the sense that solutions starting from arbitrarily close initial conditions might diverge. This sensitivity


to initial conditions is one of the signatures of 
chaos. Note that while discrete systems such as 
discrete maps can exhibit chaos in one dimen¬ 
sion, continuous systems require at least three 
degrees of freedom (that is, three equations). 
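Sensitivity to initial conditions is easy to observe in a one-dimensional discrete map. The sketch below uses the logistic map $x_{n+1} = r x_n (1 - x_n)$ with $r = 4$, a standard example chosen here for illustration (the text does not name a specific map); the starting point and the size of the perturbation are likewise assumptions.

```python
def logistic_orbit(x0, r=4.0, n_steps=100):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # almost identical starting point
gaps = [abs(u - v) for u, v in zip(a, b)]

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:3d}: separation = {gaps[n]:.3e}")
```

The separation between the two orbits grows roughly geometrically until it is as large as the orbits themselves, after which the two trajectories bear no visible relation to each other.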

Sensitive dependence on initial conditions was first observed in 1960 by the meteorologist Edward Lorenz of the Massachusetts Institute of Technology. Lorenz remarked that
computer simulations of weather forecasts 
starting, apparently, from the same meteoro¬ 
logical data could yield very different results. 
He argued that the numerical solutions of ex¬ 
tremely sensitive differential equations such as 
those he was using produced diverging results 
due to rounding-off errors made by the com¬ 
puter system. His discovery was published in 
a meteorological journal where it remained un¬ 
noticed for many years. 

Fractals 

While in principle deterministic chaotic sys¬ 
tems are unpredictable because of their sensi¬ 
tivity to initial conditions, the statistics of their 
behavior can be studied. Consider, for exam¬ 
ple, the chaos laws that describe the evolution 
of weather: While the weather is basically un¬ 
predictable over long periods of time, long-run 
simulations are used to predict the statistics of 
weather. 

It was discovered that probability distribu¬ 
tions originating from chaotic systems exhibit 
fat tails in the sense that very large, extreme 
events have nonnegligible probabilities. (See 
Brock, Hsieh, and LeBaron [1991] and Hsieh 
[1991].) It was also discovered that chaotic sys¬ 
tems exhibit complex unexpected behavior. The 
motion of chaotic systems is often associated 
with self-similarity and fractal shapes. 

Fractals were introduced in the 1960s by 
Benoit Mandelbrot, a mathematician working 
at the IBM research center in Yorktown Heights, 
New York. Starting from the empirical observa¬ 
tion that cotton price time-series are similar at 
different time scales, Mandelbrot developed a 
powerful theory of fractal geometrical objects. 
Fractals are geometrical objects that are geometrically similar to part of themselves. Stock prices
exhibit this property insofar as price time-series 
look the same at different time scales. 

Chaotic systems are also sensitive to changes 
in their parameters. In a chaotic system, only 
some regions of the parameter space exhibit 
chaotic behavior. The change in behavior is ab¬ 
rupt and, in general, it cannot be predicted ana¬ 
lytically. In addition, chaotic behavior appears 
in systems that are apparently very simple. 

While the intuition that chaotic systems might 
exist is not new, the systematic exploration of 
chaotic systems started only in the 1970s. The 
discovery of the existence of nonlinear chaotic 
systems marked a conceptual crisis in the phys¬ 
ical sciences: It challenges the very notion of the 
applicability of mathematics to the description 
of reality. Chaos laws are not testable on a large 
scale; their applicability cannot be predicted an¬ 
alytically. Nevertheless, the statistics of chaos 
theory might still prove to be meaningful. 

The economy being a complex system, the ex¬ 
pectation was that its apparently random be¬ 
havior could be explained as a deterministic 
chaotic system of low dimensionality. Despite 
the fact that tests to detect low-dimensional 
chaos in the economy have produced a sub¬ 
stantially negative response, it is easy to 
make macroeconomic and financial economet¬ 
ric models exhibit chaos. (See Brock, Dechert, 
Scheinkman, and LeBaron [1996] and Brock 
and Hommes [1997].) As a matter of fact, most 
macroeconomic models are nonlinear. Though 
chaos has not been detected in economic time- 
series, most economic dynamic models are non¬ 
linear in more than three dimensions and thus 
potentially chaotic. At this stage of the research, 
we might conclude that if chaos exists in eco¬ 
nomics it is not of the low-dimensional type. 

PARTIAL DIFFERENTIAL 
EQUATIONS 

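To illustrate the notion of a partial differential equation (PDE), let's start with equations in two dimensions. An $n$-order PDE in two dimensions $x, y$ is an equation of the form

$$ F\left(x, y, f, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \ldots, \frac{\partial^i f}{\partial x^k\, \partial y^{i-k}}\right) = 0, \quad 0 \le k \le i, \quad 0 \le i \le n $$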

A solution of the previous equation will be any 
function that satisfies the equation. 

In the case of PDEs, the notion of initial con¬ 
ditions must be replaced with the notion of 
boundary conditions or initial plus boundary 
conditions. Solutions will be defined in a mul¬ 
tidimensional domain. To identify a solution 
uniquely, the value of the solution on some 
subdomain must be specified. In general, this 
subdomain will coincide with the boundary (or 
some portion of the boundary) of the domain. 

Diffusion Equation 

Different equations will require and admit dif¬ 
ferent types of boundary and initial conditions. 
The question of the existence and uniqueness 
of solutions of PDEs is a delicate mathematical 
problem. We can only give a brief account by 
way of an example. 

Let's consider the diffusion equation. This 
equation describes the propagation of the 
probability density of stock prices under the 
random-walk hypothesis: 

$$ \frac{\partial f}{\partial t} = a^2 \frac{\partial^2 f}{\partial x^2} $$

The Black-Scholes equation, which describes the 
evolution of option prices, can be reduced to 
the diffusion equation. 

The diffusion equation describes propagat¬ 
ing phenomena. Call f(t,x ) the probability 
density that prices have value x at time t. In 
finance theory, the diffusion equation describes 
the time-evolution of the probability density 
function f(t,x) of stock prices that follow a ran¬ 
dom walk. 2 It is therefore natural to impose 
initial and boundary conditions on the distri¬ 
bution of prices. 

In general, we distinguish two different problems related to the diffusion equation: the first boundary value problem and the Cauchy initial value problem, named after the French mathematician Augustin Cauchy who first formulated it. The two problems refer to the same diffusion equation but consider different domains and different initial and boundary conditions. It can be demonstrated that both problems admit a unique solution.

The first boundary value problem seeks to find in the rectangle $0 \le x \le l$, $0 \le t \le T$ a continuous function $f(t, x)$ that satisfies the diffusion equation in the interior $Q$ of the rectangle plus the following initial condition,

$$ f(0, x) = \varphi(x), \quad 0 \le x \le l $$

and boundary conditions,

$$ f(t, 0) = f_1(t), \quad f(t, l) = f_2(t), \quad 0 \le t \le T $$

The functions $f_1, f_2$ are assumed to be continuous and $f_1(0) = \varphi(0)$, $f_2(0) = \varphi(l)$.

The Cauchy problem is related to an infinite half plane instead of a finite rectangle. It is formulated as follows. The objective is to find for any $x$ and for $t \ge 0$ a continuous and bounded function $f(t, x)$ that satisfies the diffusion equation and which, for $t = 0$, is equal to a continuous and bounded function $f(0, x) = \varphi(x)$, $\forall x$.


Solution of the Diffusion Equation

The first boundary value problem of the diffusion equation can be solved exactly. We illustrate here a widely used method based on the separation of variables, which is applicable if the boundary conditions on the vertical sides vanish (that is, if $f_1(t) = f_2(t) = 0$). The method involves looking for a tentative solution in the form of a product of two functions, one that depends only on $t$ and the other that depends only on $x$: $f(t, x) = h(t)\,g(x)$.

If we substitute the previous tentative solution in the diffusion equation

$$ \frac{\partial f}{\partial t} = a^2 \frac{\partial^2 f}{\partial x^2} $$

we obtain an equation where the left side depends only on $t$ while the right side depends only on $x$:

$$ \frac{dh(t)}{dt}\, g(x) = a^2 h(t) \frac{d^2 g(x)}{dx^2} $$

$$ \frac{dh(t)}{dt} \frac{1}{h(t)} = a^2 \frac{d^2 g(x)}{dx^2} \frac{1}{g(x)} $$

This condition can be satisfied only if the two sides are equal to a constant. The original diffusion equation is therefore transformed into two ordinary differential equations:

$$ \frac{1}{a^2} \frac{dh(t)}{dt} = b\,h(t) $$

$$ \frac{d^2 g(x)}{dx^2} = b\,g(x) $$

with boundary conditions $g(0) = g(l) = 0$. From the above equations and boundary conditions, it can be seen that $b$ can assume only the negative values,

$$ b = -\frac{k^2 \pi^2}{l^2}, \quad k = 1, 2, \ldots $$

while the functions $g$ can only be of the form

$$ g(x) = B_k \sin\left(\frac{k\pi}{l} x\right) $$

Substituting for $h$, we obtain

$$ h(t) = B_k \exp\left(-\frac{a^2 k^2 \pi^2}{l^2} t\right) $$

Therefore, we can see that there are denumerably infinite solutions of the diffusion equation of the form

$$ f_k(t, x) = C_k \exp\left(-\frac{a^2 k^2 \pi^2}{l^2} t\right) \sin\left(\frac{k\pi}{l} x\right) $$

All these solutions satisfy the boundary conditions $f(t, 0) = f(t, l) = 0$. By linearity, we know that the infinite sum

$$ f(t, x) = \sum_{k=1}^{\infty} f_k(t, x) = \sum_{k=1}^{\infty} C_k \exp\left(-\frac{a^2 k^2 \pi^2}{l^2} t\right) \sin\left(\frac{k\pi}{l} x\right) $$

will satisfy the diffusion equation. Clearly $f(t, x)$ satisfies the boundary conditions $f(t, 0) = f(t, l) = 0$. In order to satisfy the initial condition, given that $\varphi(x)$ is bounded and continuous and that $\varphi(0) = \varphi(l) = 0$, it can be demonstrated that the coefficients $C_k$ can be uniquely determined through the following integrals, which are called the Fourier integrals:

$$ C_k = \frac{2}{l} \int_0^l \varphi(\xi) \sin\left(\frac{k\pi}{l} \xi\right) d\xi $$
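The series solution can be evaluated numerically by truncating the sum and approximating each Fourier integral. The sketch below is an illustration under stated assumptions: the midpoint quadrature rule, the truncation at 50 terms, the default values $a = 1$, $l = 1$, and the function names are all choices made here, not part of the text.

```python
import math

def fourier_coefficient(phi, k, length, n_quad=2000):
    """C_k = (2/l) * integral over [0, l] of phi(xi) * sin(k*pi*xi/l),
    approximated with the midpoint rule."""
    d = length / n_quad
    total = 0.0
    for j in range(n_quad):
        xi = (j + 0.5) * d
        total += phi(xi) * math.sin(k * math.pi * xi / length)
    return 2.0 / length * total * d

def diffusion_series(phi, t, x, a=1.0, length=1.0, n_terms=50):
    """Truncated separation-of-variables solution of f_t = a^2 f_xx
    with f(t, 0) = f(t, l) = 0 and f(0, x) = phi(x)."""
    total = 0.0
    for k in range(1, n_terms + 1):
        c_k = fourier_coefficient(phi, k, length)
        decay = math.exp(-((a * k * math.pi / length) ** 2) * t)
        total += c_k * decay * math.sin(k * math.pi * x / length)
    return total

# Hypothetical initial profile that vanishes at both ends of [0, 1].
def parabola(x):
    return x * (1.0 - x)

for t in (0.0, 0.01, 0.1):
    print(f"t={t:<5} f(t, 0.5) = {diffusion_series(parabola, t, 0.5):.6f}")
```

Because the exponential factor decays like $\exp(-a^2 k^2 \pi^2 t / l^2)$, the high-order terms die out quickly for $t > 0$ and only a few terms matter at later times.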

The previous method applies to the first boundary value problem but cannot be applied to the Cauchy problem, which admits only an initial condition. It can be demonstrated that the solution of the Cauchy problem can be expressed in terms of a convolution with a Green's function. In particular, it can be demonstrated that the solution of the Cauchy problem can be written in closed form as follows:

$$ f(t, x) = \frac{1}{2a\sqrt{\pi t}} \int_{-\infty}^{\infty} \varphi(\xi) \exp\left(-\frac{(x - \xi)^2}{4 a^2 t}\right) d\xi $$

for $t > 0$ and $f(0, x) = \varphi(x)$. It can be demonstrated that the Black-Scholes equation, which is an equation of the form

$$ \frac{\partial f}{\partial t} + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2 f}{\partial x^2} + r x \frac{\partial f}{\partial x} - r f = 0 $$

can be reduced through transformation of variables to the standard diffusion equation to be solved with the Green's function approach.
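The convolution with the Green's function can be evaluated by simple quadrature. The sketch below assumes the kernel of the equation $\partial f/\partial t = a^2\, \partial^2 f/\partial x^2$, truncates the integral to a finite window, and uses the midpoint rule; the window, the number of quadrature points, and the box-shaped initial condition are assumptions made for the example. For that initial condition the integral has a known closed form in terms of the error function, which serves as a check.

```python
import math

def heat_kernel_solution(phi, t, x, a=1.0, lower=-10.0, upper=10.0, n_quad=4000):
    """Approximate f(t, x) = 1/(2a*sqrt(pi*t)) * integral of
    phi(xi) * exp(-(x - xi)^2 / (4 a^2 t)) over [lower, upper]."""
    d = (upper - lower) / n_quad
    total = 0.0
    for j in range(n_quad):
        xi = lower + (j + 0.5) * d
        total += phi(xi) * math.exp(-((x - xi) ** 2) / (4.0 * a * a * t))
    return total * d / (2.0 * a * math.sqrt(math.pi * t))

def box(xi):
    # Hypothetical initial condition: 1 on [-1, 1], 0 elsewhere.
    return 1.0 if abs(xi) <= 1.0 else 0.0

def box_exact(t, x, a=1.0):
    # Closed form of the same integral for the box initial condition.
    s = 2.0 * a * math.sqrt(t)
    return 0.5 * (math.erf((x + 1.0) / s) - math.erf((x - 1.0) / s))

for x in (0.0, 1.0, 2.0):
    print(f"x={x}: quadrature={heat_kernel_solution(box, 0.5, x):.6f}  "
          f"closed form={box_exact(0.5, x):.6f}")
```

The output shows the initially sharp box spreading beyond $[-1, 1]$, consistent with the diffusion of the solution over the whole half plane.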


Numerical Solution of PDEs 

There are different methods for the numerical 
solution of PDEs. We illustrate the finite differ¬ 
ence methods, which are based on approximat¬ 
ing derivatives with finite differences. Other 
discretization schemes such as finite elements 
and spectral methods are possible but, being 
more complex, they go beyond the scope of this 
book. 

Finite difference methods result in a set of re¬ 
cursive equations when applied to initial con¬ 
ditions. When finite difference methods are 
applied to boundary problems, they require 
the solution of systems of simultaneous linear 
equations. PDEs might exhibit boundary con¬ 
ditions, initial conditions, or a mix of the two. 
Figure 5 Solution of the Cauchy Problem by the Finite Difference Method
Figure 6 Solution of the First Boundary Problem by the Finite Difference Method


The Cauchy problem of the diffusion equation 
is an example of initial conditions. The simplest 
discretization scheme for the diffusion equation 
replaces derivatives with their difference quo¬ 
tients. As for ordinary differential equations, 
the discretization scheme can be written as 
follows: 

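$$ \frac{\partial f}{\partial t} \approx \frac{f(t + \Delta t, x) - f(t, x)}{\Delta t} $$

$$ \frac{\partial^2 f}{\partial x^2} \approx \frac{f(t, x + \Delta x) - 2 f(t, x) + f(t, x - \Delta x)}{(\Delta x)^2} $$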

In the case of the Cauchy problem, this approximation scheme defines the forward recursive algorithm. It can be proved that the algorithm is stable only if the Courant-Friedrichs-Lewy (CFL) conditions are satisfied.

Different approximation schemes can be 
used. In particular, the forward approximation 
to the derivative used above could be replaced 
by centered approximations. Figure 5 illustrates 
the solution of a Cauchy problem for initial con¬ 
ditions that vanish outside of a finite interval. 
The simulation shows that solutions diffuse in 
the entire half space. 

Applying the same discretization to a first 
boundary problem would require the solution 
of a system of linear equations at every step. 
Figure 6 illustrates this case. 
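The forward recursive algorithm for the Cauchy problem can be sketched as follows. Several ingredients are assumptions made for the illustration: the computational grid is truncated to $[-5, 5]$ with the end values held at zero, the initial condition is a box that vanishes outside $[-0.5, 0.5]$, and the stability bound quoted in the comments, $a^2 \Delta t / (\Delta x)^2 \le 1/2$, is the standard condition for this explicit scheme rather than a formula taken from the text.

```python
def explicit_diffusion(f_init, a, dx, dt, n_steps):
    """Advance f_t = a^2 f_xx by n_steps of the forward-time,
    centered-space scheme. The two end values are held at zero."""
    ratio = a * a * dt / (dx * dx)  # standard stability bound: ratio <= 1/2
    f = list(f_init)
    for _ in range(n_steps):
        new = [0.0] * len(f)
        for j in range(1, len(f) - 1):
            new[j] = f[j] + ratio * (f[j + 1] - 2.0 * f[j] + f[j - 1])
        f = new
    return f

dx = 0.1
grid = [-5.0 + j * dx for j in range(101)]
f0 = [1.0 if abs(x) <= 0.5 else 0.0 for x in grid]

stable = explicit_diffusion(f0, a=1.0, dx=dx, dt=0.004, n_steps=250)    # ratio about 0.4
unstable = explicit_diffusion(f0, a=1.0, dx=dx, dt=0.006, n_steps=250)  # ratio about 0.6

print("ratio 0.4: largest value =", max(stable))
print("ratio 0.6: largest magnitude =", max(abs(v) for v in unstable))
```

With the smaller time step the profile spreads smoothly and stays between 0 and 1; with the larger one, grid-scale oscillations are amplified at every step and the result is meaningless.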

KEY POINTS 

• Basically, differential equations are equations that express a relationship between a function and one or more derivatives (or differentials) of that function.

• The two classifications of differential equations are ordinary differential equations and partial differential equations. The classification depends on the type of derivatives included in the differential equation: an ordinary differential equation contains only ordinary derivatives, and a partial differential equation contains partial derivatives.

• Typically in differential equations, one of the 
independent variables is time. 

• The term stochastic differential equation 
refers to a differential equation in which a 
derivative of one or more of the independent 
variables is a random variable or a stochastic 
process. 

• Differential equations are conditions that 
must be satisfied by their solutions. Differ¬ 
ential equations generally admit infinite so¬ 
lutions. Initial or boundary conditions are 
needed to identify solutions uniquely. 

• Differential equations are the key mathe¬ 
matical tools for the development of mod¬ 
ern science; in finance they are used in 
arbitrage pricing, to define stochastic pro¬ 
cesses, and to compute the time evolution of 
averages. 

• Differential equations can be solved in closed 
form or with numerical methods. Finite 
difference methods approximate derivatives 
with difference quotients. Initial conditions 
yield recursive algorithms. 

• Boundary conditions require the solution of 
linear equations. 


NOTES 

1. The condition of existence and continuity of 
derivatives is stronger than necessary. The 
Lipschitz condition, which requires that the 
incremental ratio be uniformly bounded in a 
given interval, would suffice. 

2. In physics, the diffusion equation describes 
phenomena such as the diffusion of particles 
suspended in some fluid. In this case, the 
diffusion equation describes the density of 
particles at a given moment at a given point. 
Abstract: Partial differential equations are useful in finance in various contexts, in particular for
the pricing of European and American options, for stochastic portfolio optimization, and for cal¬ 
ibration. They can be used for simple options as well as for more exotic ones, such as Asian or 
lookback options. They are particularly useful for nonlinear models. They allow for the numerical 
computations of several spot prices at the same time. Numerical aspects, discretization methods, 
algorithms, and analysis of the numerical schemes have been under constant development during 
the last three decades. Finite difference methods are the simplest and most basic approaches. Finite 
element methods allow the use of nonuniform meshes and refinement procedures can then be 
applied and improve accuracy near a region of interest. Deterministic approaches based on partial 
differential equation formulations can also be used for calibration of various volatility models (such 
as local, stochastic, or Levy-driven volatility models) and by making use of Dupire's formula. Cur¬ 
rent research directions include the development of discretization methods for high-dimensional 
problems. 


Numerical methods based on partial differen¬ 
tial equations (PDEs) in finance are not very 
popular. Indeed, the models are usually de¬ 
rived from probabilistic arguments and Monte 
Carlo methods are therefore much more nat¬ 
ural. Stochastic methods are also often simpler 
to implement than the algorithms used for solv¬ 
ing the related PDEs. However, when it is pos¬ 
sible to efficiently discretize the PDE (which 


is not always the case, the typical counterex¬ 
ample being high-dimensional problems), de¬ 
terministic methods are usually more efficient 
than stochastic ones. Moreover, the solution 
to the partial differential equation gives more 
information. In the context of option pricing, 
one obtains, for example, the price of the op¬ 
tion for all values of the maturity and for all 
spot prices, while the probabilistic formulation 
typically gives the value of the option for a fixed
maturity and a fixed spot price. In particular, 
this is useful for computing derivatives of the 
option's price with respect to some parameters 
of the model (the so-called "Greeks"). 

The PDEs obtained in finance have several characteristics. First, they are posed
on a bounded domain in time $(0, T)$, typically with a singular final condition
at the maturity $t = T$, and very often on an unbounded domain in the spot
variable. This requires imposing suitable "boundary conditions" at infinity to
get well-posed problems, and using appropriate numerical approximations
(truncation to a bounded domain and artificial boundary conditions). These PDEs
are usually of parabolic type, but often with degenerate diffusions. Because of
operational constraints, the numerical methods used for the discretization of the
PDE must be sufficiently fast and accurate to be useful in practice. These
peculiarities of PDEs in finance explain the need for up-to-date and sometimes
involved numerical methods.

In this entry we focus on numerical issues and try to review the main numerical
methods used for solving PDEs in finance. This presentation relies heavily on
Achdou and Pironneau (2005), as well as Lamberton and Lapeyre (1997), Karatzas
and Shreve (1991), and Wilmott, Dewynne, and Howison (1993).

PARTIAL DIFFERENTIAL 
EQUATIONS FOR OPTION 
PRICING 

In this section, we present the main arguments to derive a PDE for the price of
various European and American options.

A Primer: The Black and Scholes 
Model for European Options 

The aim of this section is to recall the basic tools needed to derive a PDE in
the context of option pricing, without providing all the detailed assumptions
required on the data to perform this derivation. Karatzas and Shreve (1991) and
Lamberton and Lapeyre (1997), for example, provide more details on the
mathematical aspects. We adopt the standard Black and Scholes model (Black and
Scholes, 1973; Merton, 1973) with a risky asset whose price at time $t$ is $S_t$
and a risk-free asset whose price at time $t$ is $S^0_t$, such that:

$$dS_t = S_t(\mu\,dt + \sigma\,dB_t), \qquad dS^0_t = r S^0_t\,dt$$

The process $B_t$ is a standard Brownian motion defined on a probability space
$(\Omega, \mathcal{F}, \mathcal{F}_t, Q)$, and $\mu$ (the mean rate of return),
$r$ (the interest rate), and $\sigma > 0$ (the volatility) are three constants.
However, the following can be generalized to the case where $\mu$, $r$, and
$\sigma > 0$ are functions of $t$ and $S$ (under suitable smoothness
assumptions).

We introduce the stochastic process $W_t = B_t + \frac{\mu - r}{\sigma}\,t$.
Under the so-called risk-neutral probability $P$ defined by its Radon-Nikodym
derivative with respect to $Q$ by

$$\frac{dP}{dQ}\bigg|_{\mathcal{F}_T} = \exp\left(-\frac{\mu - r}{\sigma}\,B_T - \frac{1}{2}\left(\frac{\mu - r}{\sigma}\right)^2 T\right)$$

$W_t$ is a Brownian motion and $S_t/S^0_t$ is a martingale. This is one of the
fundamental properties of the stochastic process needed in the following. The
process $S_t$ satisfies the following stochastic differential equation (SDE)
under $P$:

$$dS_t = S_t(r\,dt + \sigma\,dW_t) \tag{1}$$

Let us now consider a portfolio with $H_t$ risky assets and $H^0_t$ risk-free
assets. Its value at time $t$ is:

$$P_t = H_t S_t + H^0_t S^0_t \tag{2}$$

We suppose that this portfolio is self-financing (any manipulation on this
portfolio, i.e., any change of the values of $H_t$ or $H^0_t$, is done without
any inflows or outflows of money), which translates into

$$dP_t = H_t\,dS_t + H^0_t\,dS^0_t \tag{3}$$

In other words, the value of a self-financing portfolio changes only through
changes in the prices of the assets it holds. Using (3), it is possible to show
that $P_t/S^0_t$ is also a martingale.

We consider the following problem: For a given function $\phi$ (the payoff
function) and a given time $T > 0$ (the maturity), is it possible to build a
self-financing portfolio such that $P_T = \phi(S_T)$? Classical examples of
function $\phi$ are $\phi(S) = (S - K)_+$ (vanilla call) or
$\phi(S) = (S - K)_-$ (vanilla put), where, for any real $x$,
$x_+ = \max(x, 0)$ and $x_- = \max(-x, 0)$. The answer is affirmative (this is
typically based on a martingale representation theorem, the fact that
$P_t/S^0_t$ is a martingale, and the fact that the payoff $\phi(S_T)$ is
$\mathcal{F}_T$-measurable), and it is then possible to show that such a
portfolio has the following value at time $t$:

$$P_t = E\left(\exp\left(-\int_t^T r\,ds\right)\phi(S_T)\,\Big|\,\mathcal{F}_t\right) \tag{4}$$

where here and in the following, $E$ denotes an expectation with respect to the
risk-neutral probability $P$. By the so-called arbitrage-free principle, $P_t$
is actually the "fair price" at time $t$ of the option, which enables its owner
to get the payoff $\phi(S_T)$ at time $T$. In the particular context of vanilla
options, the solution is analytically known, at least if $r$ and $\sigma$ are
constant: This is the celebrated Black and Scholes formula. However, in the case
when $r$ and $\sigma$ are functions of $t$ and $S$, (4) cannot be estimated
without a numerical method. We are interested in deterministic numerical
methods, based on a PDE related to (4).
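
For constant $r$ and $\sigma$, the closed-form value of (4) for vanilla payoffs can be evaluated directly. The following is a minimal sketch of the classical Black and Scholes formula at time 0; the function names `norm_cdf` and `black_scholes` and the sample parameters are illustrative choices, not taken from the text.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(spot, strike, r, sigma, maturity, kind="call"):
    """Black-Scholes price at time 0 of a vanilla call or put (constant r, sigma)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discount = math.exp(-r * maturity)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)

# Illustrative parameters (assumptions): S = K = 100, r = 0.1, sigma = 0.2, T = 1
call = black_scholes(100.0, 100.0, 0.1, 0.2, 1.0, "call")
put = black_scholes(100.0, 100.0, 0.1, 0.2, 1.0, "put")

# Call-put parity: call - put = S - K exp(-rT)
parity_gap = call - put - (100.0 - 100.0 * math.exp(-0.1))
```

Such a closed form is mainly useful as a reference value against which a numerical scheme can be checked.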

The second fundamental property of the stochastic process $S_t$ required to
obtain a PDE formulation of this problem is a Markov property. Roughly speaking,
it states that the expectation of any function of $(S_s)_{t \le s \le T}$
conditionally on $\mathcal{F}_t$ is actually a function of the price $S_t$ of
the risky asset at time $t$. In our context, this property shows that $P_t$ can
be written as

$$P_t = p(t, S_t) \tag{5}$$

where $p$ is a function of $t \in [0, T]$ and $S \in [0, \infty)$, called the
pricing function of the option. Notice that even if (5) only involves the value
of $p$ at point $(t, S_t)$, the pricing function $p$ is a deterministic function
defined for all values of $t \ge 0$ and $S \ge 0$. By the Markov property of
$S_t$, we also have the following representation formula for $p$:

$$p(t, x) = E\left(\exp\left(-\int_t^T r\,ds\right)\phi\big(S_T^{t,x}\big)\right) \tag{6}$$

where $(S_\theta^{t,x})_{t \le \theta \le T}$ denotes the process solution to (1)
starting from $x$ at time $t$:

$$\begin{cases} dS_\theta^{t,x} = S_\theta^{t,x}(r\,d\theta + \sigma\,dW_\theta), & \theta \ge t, \\ S_t^{t,x} = x \end{cases} \tag{7}$$


By using Ito's calculus and the fact that $P_t/S^0_t$ is a martingale, we then
obtain that $p$ should satisfy the following backward-in-time PDE:

$$\frac{\partial p}{\partial t} + rS\frac{\partial p}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2} - rp = 0, \qquad p(T, S) = \phi(S) \tag{8}$$
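
The step from the martingale property to the PDE can be sketched as follows, assuming constant $r$ (so that $S^0_t$ is proportional to $e^{rt}$) and a pricing function $p$ smooth enough to apply Ito's formula under the dynamics (1):

```latex
d\big(e^{-rt}\,p(t, S_t)\big)
  = e^{-rt}\left(\frac{\partial p}{\partial t}
      + rS_t\,\frac{\partial p}{\partial S}
      + \frac{\sigma^2 S_t^2}{2}\,\frac{\partial^2 p}{\partial S^2}
      - r\,p\right)(t, S_t)\,dt
  \;+\; e^{-rt}\,\sigma S_t\,\frac{\partial p}{\partial S}(t, S_t)\,dW_t .
```

Since the discounted price is a martingale, the $dt$ term must vanish along every path, which gives the backward equation; the final condition comes from $P_T = \phi(S_T)$.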


Conversely, it is possible (using again a martingale representation theorem) to
show that if $p$ satisfies (8), then $p(t, S_t)$ is the value of a
self-financing portfolio with value $\phi(S_T)$ at time $T$. Moreover, one can
check that $\frac{\partial p}{\partial S}(t, S_t) = H_t$, which shows that
obtaining an accurate approximation of $\frac{\partial p}{\partial S}$ is
important in order to estimate the quantity of risky asset $H_t$ needed at time
$t$ to build the portfolio with value $P_t$ (this is the hedging strategy).
Collectively, equations (4)-(5) and (8) provide an example of so-called
Feynman-Kac formulas, which are used in many other contexts (quantum chemistry
or transport equations, for example) either to give a probabilistic
interpretation to a PDE, or to recast the computation of an expectation into a
PDE problem.

For problem (8) to be well posed (i.e., for one and only one solution to exist),
one needs to supply the system with "boundary conditions" when $S = 0$ or
$S \to \infty$. More precisely, one needs to specify the functional space in
which the function $p$ is sought. This will be explained in the next section.







From the PDE (8) and the so-called maximum principle, it is possible to derive
many qualitative properties and a priori bounds on the price $p$ (like the
call-put parity, for example; see Achdou and Pironneau, 2005). Roughly speaking,
the maximum principle states that if the data (initial condition, boundary
conditions, right-hand side) for the PDE (8) are positive, then the solution is
positive. Any meaningful price must satisfy this property. It is also an
important property to check on the numerical schemes (where it is called a
discrete maximum principle, as discussed below).

It is also possible to obtain the PDE without introducing the risk-neutral
probability (see Wilmott, Dewynne, and Howison, 1993) by considering a portfolio
containing some options and some risky assets and by using an arbitrage-free
argument.

It is important to recall that the Black and Scholes model for the evolution of
the risky asset (1) compares poorly with observed data. We discuss later in this
entry some possible refinements that have been introduced in order to better fit
the observations (see the discussion on calibration below). However, this model
remains very important in practice because it is used as a prototypical
description of the evolution of the asset. Moreover, for a given observed price
of a derivative, there exists a constant volatility $\sigma$ (called the implied
volatility; see the section on calibration below) for which the Black-Scholes
price is the observed price. The implied volatility is a major quantity
used in practice to compare derivatives.
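
As an illustration of the implied volatility, the sketch below inverts the Black and Scholes call price by bisection, relying on the fact that the call price increases with $\sigma$. The helper names, the search bracket `[lo, hi]`, and the sample parameters are assumptions made for this example only.

```python
import math

def bs_call(spot, strike, r, sigma, maturity):
    """Black-Scholes call price at time 0 (constant r and sigma)."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma**2) * maturity) / (sigma * math.sqrt(maturity))
    d2 = d1 - sigma * math.sqrt(maturity)
    return spot * cdf(d1) - strike * math.exp(-r * maturity) * cdf(d2)

def implied_volatility(price, spot, strike, r, maturity, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma: find the volatility for which bs_call matches price."""
    if not bs_call(spot, strike, r, lo, maturity) <= price <= bs_call(spot, strike, r, hi, maturity):
        raise ValueError("price is not attainable with sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, r, mid, maturity) < price:
            lo = mid   # the model price is too low: the volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip on synthetic data: price with sigma = 0.25, then recover it
observed = bs_call(100.0, 110.0, 0.05, 0.25, 0.5)
recovered = implied_volatility(observed, 100.0, 110.0, 0.05, 0.5)
```

Bisection is chosen here for robustness rather than speed; a Newton iteration on the vega is a common alternative.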

Other Options 

The argument presented for the Black-Scholes model is prototypical. In
particular, the derivation of a PDE satisfied by the pricing function of an
option always relies on the two fundamental properties stressed above: the
martingale and the Markov properties of a suitable stochastic process. In this
section, we present PDEs for the prices of various options without providing all
the details of the derivation.


Basket Options 

In many cases, the payoff of the option depends on the values of more than one
asset, which typically do not evolve independently. Let us, for example,
consider the case of two assets, which evolve according to the following SDEs
under the risk-neutral probability:

$$dS^1_t = S^1_t(r\,dt + \sigma_1\,dW^1_t), \qquad dS^2_t = S^2_t(r\,dt + \sigma_2\,dW^2_t)$$

where $W^1_t$ and $W^2_t$ are possibly correlated standard Brownian motions. We
call $\rho$ the correlation of $W^1_t$ and $W^2_t$:
$d\langle W^1, W^2\rangle_t = \rho\,dt$. We suppose that the maturity is
$T > 0$ and the payoff is $\phi(S^1_T, S^2_T)$, where $\phi$ is a given
function. It is then possible to show that the price of the option at time $t$
is $p(t, S^1_t, S^2_t)$ where $p$ satisfies

$$\frac{\partial p}{\partial t} + rS_1\frac{\partial p}{\partial S_1} + rS_2\frac{\partial p}{\partial S_2} + \frac{\sigma_1^2 S_1^2}{2}\frac{\partial^2 p}{\partial S_1^2} + \frac{\sigma_2^2 S_2^2}{2}\frac{\partial^2 p}{\partial S_2^2} + \rho\,\sigma_1\sigma_2 S_1 S_2\frac{\partial^2 p}{\partial S_1 \partial S_2} - rp = 0, \qquad p(T, S_1, S_2) = \phi(S_1, S_2) \tag{9}$$

Here again, $r$, $\sigma_1$, and $\sigma_2$ may be functions of $t$ and
$(S_1, S_2)$. It is possible to solve such PDEs by standard numerical methods up
to dimension 3 or 4. As discussed later, deriving appropriate discretizations
for higher dimensions is not an easy task and is still the subject of current
research.


Barrier Options 

Again, let us consider an option on a single asset. For some options, the payoff
becomes 0 if there exists a time $t \in [0, T]$ such that $S_t$ goes below $a$
or above $b$, where $a$ and $b$ are two given values, $0 < a < b$ (the case
$a = 0$ or $b = \infty$ can be treated similarly). Mathematically, the payoff is
$1_{\{\forall t \in [0,T],\ S_t \in [a,b]\}}\,\phi(S_T)$ where, for any event
$A \subset \Omega$, $1_A$ denotes the characteristic function of $A$, and $S_t$
satisfies (1). In this case, the relevant stochastic process for deriving the
PDE is $S_{t \wedge \tau}$, where
$\tau = \inf\{t \in [0, T],\ S_t > b \text{ or } S_t < a\}$ is a stopping time,
and, for any real $x$ and $y$, $x \wedge y = \inf(x, y)$. It can be checked that
$S_{t \wedge \tau}$ is a Markov process, and that
$S_{t \wedge \tau}/S^0_{t \wedge \tau}$ is a martingale. It is then possible to
show that the price of the option at time $t$ is
$p(t \wedge \tau, S_{t \wedge \tau})$ where $p$ is defined for $t \in [0, T]$
and $S \in [a, b]$ and satisfies:

$$\frac{\partial p}{\partial t} + rS\frac{\partial p}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2} - rp = 0, \qquad p(T, S) = \phi(S), \qquad p(t, a) = p(t, b) = 0 \tag{10}$$

Here again, $r$ and $\sigma$ may be functions of $t$ and $S$. Moreover, the
generalization to basket options is straightforward, as explained above. In this
case, it is possible to consider more general barriers, namely a payoff of the
form $1_{\{\forall t \in [0,T],\ (S^1_t, S^2_t, \ldots, S^d_t) \in \mathcal{D}\}}\,\phi(S_T)$,
where $d$ denotes the number of underlying assets and $\mathcal{D}$ is any
simply connected domain of $\mathbb{R}^d$. The appropriate discretization for
general domains $\mathcal{D}$ is the finite element method that will be
discussed later on.


Options on the Maximum

For some options (the so-called lookback options), the payoff involves the
maximum of the risky asset. For example, it reads $\phi(S_T, M_T)$ where
$M_t = \max_{0 \le r \le t} S_r$ and $S_t$ satisfies (1). One can check that
$(S_t, M_t)$ is a Markov process. It is then possible to show that the price of
the option at time $t$ is $p(t, S_t, M_t)$ where $p$ is defined for
$t \in [0, T]$ and $(S, M) \in \{(S, M) \in \mathbb{R}^2,\ 0 \le S \le M\}$ and
satisfies:

$$\frac{\partial p}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2} + rS\frac{\partial p}{\partial S} - rp = 0, \qquad p(T, S, M) = \phi(S, M), \qquad \frac{\partial p}{\partial M}(t, S, S) = 0 \tag{11}$$

If the payoff is of the form $\phi(S, M) = M\phi(S/M)$, it is possible to reduce
the problem to a two-dimensional one (including the time variable). Indeed, one
can check by straightforward computations that $p(t, S, M) = M w(t, S/M)$ where
$w$ is a function of $t \in [0, T]$ and $\xi \in [0, 1]$, which satisfies:

$$\frac{\partial w}{\partial t} + \frac{\sigma^2 \xi^2}{2}\frac{\partial^2 w}{\partial \xi^2} + r\xi\frac{\partial w}{\partial \xi} - rw = 0, \qquad w(T, \xi) = \phi(\xi), \qquad \frac{\partial w}{\partial \xi}(t, 1) = w(t, 1) \tag{12}$$

Notice that this reduction is not generally possible for $(t, S, M)$-dependent
interest rate and volatility (except for very peculiar dependencies).


Options on the Average

Some options (the so-called Asian options) involve the average of the risky
asset. More precisely, the payoff reads $\phi(S_T, A_T)$ where
$A_t = \frac{1}{t}\int_0^t S_r\,dr$ and $S_t$ satisfies (1). One can check that
$(S_t, A_t)$ is a Markov process. Using this property, it is possible to show
that the price of the option at time $t$ is $p(t, S_t, A_t)$ where $p$ is
defined for $t \in [0, T]$ and $(S, A) \in [0, \infty)^2$, and $p$ satisfies:

$$\frac{\partial p}{\partial t} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2} + rS\frac{\partial p}{\partial S} + \frac{1}{t}(S - A)\frac{\partial p}{\partial A} - rp = 0, \qquad p(T, S, A) = \phi(S, A) \tag{13}$$

In some cases (see Rogers and Shi, 1995), it is possible to reduce this problem
to a one-dimensional PDE. More precisely, for fixed strike call
($\phi(S, A) = (A - K)_+$) or fixed strike put ($\phi(S, A) = (K - A)_+$), we
have $p(t, S, A) = S f\left(t, \frac{K - tA/T}{S}\right)$ where $f$ satisfies

$$\frac{\partial f}{\partial t} + \frac{\sigma^2 \xi^2}{2}\frac{\partial^2 f}{\partial \xi^2} - \left(\frac{1}{T} + r\xi\right)\frac{\partial f}{\partial \xi} = 0, \qquad f(T, \xi) = \phi(\xi) \tag{14}$$

and $\phi(\xi) = \xi_-$ (resp. $\phi(\xi) = \xi_+$). This reduction of (13) to
(14) is also possible for floating strike call ($\phi(S, A) = (S - A)_+$) (resp.
for floating strike put ($\phi(S, A) = (A - S)_+$)) by setting
$p(t, S, A) = S f\left(t, -\frac{tA}{TS}\right)$ and
$\phi(\xi) = (1 + \xi)_+$ (resp. $\phi(\xi) = (1 + \xi)_-$). However, this
reduction is generally not possible for general payoff functions or
$(t, S, A)$-dependent interest rate and volatility (except for very peculiar
dependencies).

Bermudan Options

As a transition between European and American options, we would like to mention
that it is very easy to price Bermudan options with the PDE approach. For such
options, the contract can be exercised only on certain dates between the present
time and the maturity. Mathematically, for an option on a single asset (the spot
price is called $S$) and if $\phi$ denotes the payoff, the pricing function
satisfies $p(t_i^-, S) = \max(p(t_i^+, S), \phi(S))$ at each exercise time
$t_i$ (the value just before an exercise date is the larger of the continuation
value just after it and the payoff), and (8) between the exercise times;
see Duffie (1992, p. 211).
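
The recipe above (solve (8) backward between exercise dates, take the maximum with the payoff at each exercise date) can be sketched as follows for a put. This is an illustration under stated assumptions, not the entry's own implementation: it uses a centered finite difference in $S$, an implicit Euler step in time, a zero value at a hypothetical truncation level `s_max`, and exercise dates that coincide with time-grid points. All function and parameter names are made up for the example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def bermudan_put(strike, r, sigma, maturity, s_max, n_space, n_time, exercise_steps):
    """Put price on the grid S_i = i * ds at time 0, exercisable at steps in exercise_steps."""
    dt = maturity / n_time
    ds = s_max / n_space
    payoff = [max(strike - i * ds, 0.0) for i in range(n_space + 1)]
    values = payoff[:]                       # final condition p(T, S) = phi(S)
    # Rows i = 0..I-1 of (Id - dt * L), using S_i / ds = i; the last node is set to 0.
    lower = [-dt * (0.5 * sigma**2 * i * i - 0.5 * r * i) for i in range(n_space)]
    diag = [1.0 + dt * (sigma**2 * i * i + r) for i in range(n_space)]
    upper = [-dt * (0.5 * sigma**2 * i * i + 0.5 * r * i) for i in range(n_space)]
    for n in range(n_time - 1, -1, -1):      # march backward from T to 0
        values = solve_tridiagonal(lower, diag, upper, values[:n_space]) + [0.0]
        if n in exercise_steps:              # exercise allowed at t_n
            values = [max(v, g) for v, g in zip(values, payoff)]
    return values, ds

# Illustrative parameters (assumptions): K = 100, r = 0.1, sigma = 0.2, T = 1
european, ds = bermudan_put(100.0, 0.1, 0.2, 1.0, 300.0, 300, 400, exercise_steps=set())
bermudan, _ = bermudan_put(100.0, 0.1, 0.2, 1.0, 300.0, 300, 400, exercise_steps={100, 200, 300})
i0 = round(100.0 / ds)                       # grid index of the spot S = 100
```

With an empty set of exercise dates the routine reduces to a European put, which gives a simple consistency check: the Bermudan value at the spot should not be smaller.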

The Case of American Options 

We have so far presented so-called European options, that is, options that
enable their owners to get $\phi(S_T)$ at a fixed time $T$. On the other hand,
American options can be exercised at any time up to the maturity. Hence the
price of an American option of payoff $\phi$ and maturity $T$ is the supremum of
all possible expectations such as (6) over stopping times $\tau$ between $t$ and
$T$, that is, for $t \in [0, T]$ and $x > 0$,

$$p(t, x) = \sup_{\tau \in \mathcal{T}_{[t,T]}} E\left(e^{-\int_t^\tau r\,ds}\,\phi\big(S_\tau^{t,x}\big)\right) \tag{15}$$

where $\mathcal{T}_{[t,T]}$ denotes the set of stopping times $\tau$ of the
filtration $\mathcal{F}_t$, with values in $[t, T]$.

The PDE for American Options 

We now present the main arguments to derive a PDE on $p$ defined by (15) (or
more precisely a system of partial differential inequalities).

Notice first that taking $\tau = t$ in (15) yields the inequality

$$p(t, x) \ge \phi(x) \tag{16}$$

Moreover, we clearly have from (15) $p(T, x) = \phi(x)$.


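Let $t$ and $\delta t$ be such that $0 \le t < t + \delta t \le T$. From (15) we
have:

$$E\left(e^{-\int_t^{t+\delta t} r\,ds}\,p\big(t + \delta t, S^{t,x}_{t+\delta t}\big)\right) = \sup_{\tau \in \mathcal{T}_{[t+\delta t,T]}} E\left(e^{-\int_t^{\tau} r\,ds}\,\phi\Big(S_\tau^{t+\delta t,\,S^{t,x}_{t+\delta t}}\Big)\right) \le \sup_{\tau \in \mathcal{T}_{[t,T]}} E\left(e^{-\int_t^{\tau} r\,ds}\,\phi\big(S_\tau^{t,x}\big)\right) = p(t, x)$$

where we have used the fact that
$S_\tau^{t+\delta t,\,S^{t,x}_{t+\delta t}} = S_\tau^{t,x}$. By Ito's calculus
(taking the limit $\delta t \to 0$), we thus obtain

$$-\frac{\partial p}{\partial t} + Ap \ge 0 \tag{17}$$

where we have introduced the linear PDE operator

$$Ap = -rS\frac{\partial p}{\partial S} - \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2} + rp \tag{18}$$

Combined with (16), we then obtain

$$\min\left(-\frac{\partial p}{\partial t} + Ap,\ p - \phi\right) \ge 0 \tag{19}$$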

Our aim is now to show that the inequality in (19) is actually an equality. This
is done in several steps, and requires us to identify an optimal stopping time
$\tau^*$ for which the supremum in (15) is attained. For a fixed $(t, x)$, let
us introduce the stopping time $\tau^* \in \mathcal{T}_{[t,T]}$ defined by

$$\tau^* = \inf\left\{\theta \ge t,\ p\big(\theta, S_\theta^{t,x}\big) = \phi\big(S_\theta^{t,x}\big)\right\}, \quad \text{a.s.} \tag{20}$$

(notice that $\tau^* \le T$ since $p(T, x) = \phi(x)$). It can be shown (see
Appendix) that

$$p(t, x) = E\left(e^{-\int_t^{\tau^*} r\,ds}\,\phi\big(S_{\tau^*}^{t,x}\big)\right) = E\left(e^{-\int_t^{\tau^*} r\,ds}\,p\big(\tau^*, S_{\tau^*}^{t,x}\big)\right) \tag{21}$$

Using a decreasing property (65) proved in the Appendix, one then obtains that
for any $\delta t > 0$,

$$p(t, x) = E\left(e^{-\int_t^{\tau^*_{\delta t}} r\,ds}\,p\big(\tau^*_{\delta t}, S_{\tau^*_{\delta t}}^{t,x}\big)\right), \quad \text{where } \tau^*_{\delta t} = (t + \delta t) \wedge \tau^* \tag{22}$$

This can be seen as a dynamic programming principle (or Bellman's principle).
For a European option we would have more simply

$$p(t, x) = E\left(e^{-\int_t^{t+\delta t} r\,ds}\,p\big(t + \delta t, S_{t+\delta t}^{t,x}\big)\right)$$

Now if we suppose that $p(t, x) > \phi(x)$, then for any $\delta t > 0$ we have
$P(\tau^*_{\delta t} > t) = 1$. Considering Ito's formula in (22), and by (17),
we obtain $\left(-\frac{\partial p}{\partial t} + Ap\right)(\theta, S_\theta^{t,x}) = 0$
for $t < \theta < \tau^*_{\delta t}$, thus leading to
$\left(-\frac{\partial p}{\partial t} + Ap\right)(t, x) = 0$. This shows that
the inequality in (19) is actually an equality.

Hence the PDE for the American option is

$$\min\left(-\frac{\partial p}{\partial t} + Ap,\ p - \phi\right) = 0, \quad t \in [0, T],\ x > 0, \qquad p(T, x) = \phi(x), \quad x > 0 \tag{23}$$

where $A$ is defined by (18). The major difference between the PDE (23) for
American options and the PDE (8) for European options is that (23) is a
nonlinear equation. This makes the theory of existence and uniqueness as well as
the numerical approximation more difficult than for European options.

In the presentation above, we have used Ito's formula, which requires that $p$
is $C^1$ in time and $C^2$ in the spot variable. This is not true in general. It
is however possible, following the same lines, to prove that $p$ is a weak
solution to (23) in the viscosity sense. For a historical derivation of this
PDE, see Bensoussan and Lions (1978) or El Karoui (1981) where a variational
formulation of (23) is derived (see (52) below). We also refer to Oksendal and
Rekvam (1998) for an infinite horizon-related problem, Crandall, Ishii, and
Lions (1992) for general results, Pham (1998) for an approach of optimal
stopping including jump diffusion processes, and to Barles (1994) for the case
of a discontinuous payoff $\phi$.



PRICING EUROPEAN 
OPTIONS WITH PDEs 

The aim of this section is to present two classes of methods for solving partial
differential equations with some applications to the PDEs derived previously. We
first introduce the finite difference method, which is based on approximation of
the differential operators by Taylor expansions, and then the finite element
methods, which belong to the wider class of Galerkin methods and are based on a
variational formulation of the PDE. We try to stress the most important aspects
of the numerical methods and refer, for example, to Achdou and Pironneau (2005
and 2009) for a more comprehensive presentation.


The Finite Difference Method for 
European Options 

We first present the simplest approach to dis¬ 
cretize a PDE: the finite difference method. 


Basic Schemes 

Let us introduce the finite difference method on the simple PDE (8). Let us
first concentrate on the discretization of (8) with respect to the variable $S$.
The principle is to divide the interval $[0, S_{\max}]$ into $I$ intervals of
length $\delta S = S_{\max}/I$ (where $S_{\max}$ has to be chosen large enough,
see below), and to approximate the derivatives by finite differences. A possible
semidiscretization of (8) is: for $i \in \{0, 1, \ldots, I\}$,

$$\frac{dP_i}{dt} + rS_i\frac{P_{i+1} - P_{i-1}}{2\,\delta S} + \frac{\sigma^2 S_i^2}{2}\frac{P_{i+1} - 2P_i + P_{i-1}}{\delta S^2} - rP_i = 0, \qquad P_i(T) = \phi(S_i) \tag{24}$$

where $S_i = i\,\delta S$ denotes the $i$-th discretization point, and $P_i(t)$
is intended to be an approximation of $p(t, S_i)$. Now, (24) is a system of
coupled ordinary differential equations (ODEs). The generalization to the case
of a time and spot dependent $r$ or $\sigma$ is straightforward.

Notice that for $S = 0$, $P_0$ can be solved independently (since $S_0 = 0$):
$P_0(t) = \phi(0)\exp\left(-\int_t^T r\,ds\right)$. In order to obtain a
solution of the whole system of ODEs, one needs to define an appropriate
boundary condition at $S = S_{\max}$. Indeed, (24) taken at $i = I$ involves
$P_{I+1}$, which is a priori not defined. There are basically two methods to
deal with this issue. The first one consists of using some a priori knowledge on
the values of $p(t, S)$ when $S$ is large and making some approximation of
$p(t, S_{\max})$. In this case, the value of $P_I$ is given as data (this is a
so-called Dirichlet boundary condition), and the unknowns are
$(P_i)_{0 \le i \le I-1}$. For example, in the case of a put
($\phi(S) = (S - K)_-$) (resp. a call ($\phi(S) = (S - K)_+$)), it is known that
$\lim_{S \to \infty} p(t, S) = 0$ (resp., in the limit $S \to \infty$,
$p(t, S) \sim S - K\exp\left(-\int_t^T r\,ds\right)$), so that one can set
$P_I(t) = 0$ (resp. $P_I(t) = S_{\max} - K\exp\left(-\int_t^T r\,ds\right)$).
The error introduced by these artificial boundary conditions can be estimated.
Another method is based on some knowledge of the asymptotic behavior of the
derivatives of $p$. For example, in the case of the put, one can use the
so-called homogeneous Neumann boundary condition, which reads
$\frac{\partial p}{\partial S}(t, S_{\max}) = 0$ at the continuous level and
$\frac{P_{I+1}(t) - P_I(t)}{\delta S} = 0$ at the discrete level. In this case,
the unknowns are $(P_i)_{0 \le i \le I}$. For both methods, $S_{\max}$ should be
chosen sufficiently large. In practice, the quality of the method may be
assessed by measuring how sensitive the result is to the value of $S_{\max}$.

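Let us now consider the time discretization. Here again, the idea is to divide
the time interval $[0, T]$ into $N$ intervals of length $\delta t = T/N$ and to
replace the time derivative by a finite difference. Three numerical methods are
classically used:

$$\frac{P_i^{n+1} - P_i^n}{\delta t} + rS_i\frac{P_{i+1}^{n+1} - P_{i-1}^{n+1}}{2\,\delta S} + \frac{\sigma^2 S_i^2}{2}\frac{P_{i+1}^{n+1} - 2P_i^{n+1} + P_{i-1}^{n+1}}{\delta S^2} - rP_i^{n+1} = 0, \qquad P_i^N = \phi(S_i) \tag{25}$$

$$\frac{P_i^{n+1} - P_i^n}{\delta t} + rS_i\frac{P_{i+1}^{n} - P_{i-1}^{n}}{2\,\delta S} + \frac{\sigma^2 S_i^2}{2}\frac{P_{i+1}^{n} - 2P_i^{n} + P_{i-1}^{n}}{\delta S^2} - rP_i^{n} = 0, \qquad P_i^N = \phi(S_i) \tag{26}$$

or

$$\frac{P_i^{n+1} - P_i^n}{\delta t} + \frac{1}{2}\left(rS_i\frac{P_{i+1}^{n+1} - P_{i-1}^{n+1}}{2\,\delta S} + \frac{\sigma^2 S_i^2}{2}\frac{P_{i+1}^{n+1} - 2P_i^{n+1} + P_{i-1}^{n+1}}{\delta S^2} - rP_i^{n+1}\right) + \frac{1}{2}\left(rS_i\frac{P_{i+1}^{n} - P_{i-1}^{n}}{2\,\delta S} + \frac{\sigma^2 S_i^2}{2}\frac{P_{i+1}^{n} - 2P_i^{n} + P_{i-1}^{n}}{\delta S^2} - rP_i^{n}\right) = 0, \qquad P_i^N = \phi(S_i) \tag{27}$$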

where $P_i^n$ is intended to be an approximation of $p(t_n, S_i)$, with
$t_n = n\,\delta t$. Notice that using the discretization scheme (25) (the
so-called explicit Euler scheme, EE), the values of $(P_i^n)_{0 \le i \le I}$
are explicitly obtained from the values of $(P_i^{n+1})_{0 \le i \le I}$. On the
contrary, in the two other schemes (26) (implicit Euler scheme, IE) or (27)
(Crank-Nicolson scheme, CN), the values of $(P_i^n)_{0 \le i \le I}$ are
obtained from the values of $(P_i^{n+1})_{0 \le i \le I}$ through the resolution
of a linear system, which is more demanding from the computational viewpoint.
Various numerical methods can be used for solving this linear system; here, we
cannot describe them in detail. Let us simply mention that basically, there
exist two classes of methods: the direct methods, which are based on Gaussian
elimination, and the iterative methods, which consist of computing the solution
as the limit of a sequence of approximations and which only require
matrix-vector multiplications. The method of choice depends on the
characteristics of the problem.
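
The three time discretizations can be written as one family indexed by a weight `theta` on the unknown time level: `theta = 0` gives the explicit Euler scheme (25), `theta = 1` the implicit Euler scheme (26), and `theta = 0.5` the Crank-Nicolson scheme (27). The sketch below prices a call with the Dirichlet condition at $S_{\max}$ described above and a direct tridiagonal solve. The function names and the test parameters ($\sigma = 0.2$ rather than the value used in the tables) are assumptions chosen for illustration.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (Gaussian elimination specialized to tridiagonal systems)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def theta_scheme_call(strike, r, sigma, maturity, s_max, n_space, n_time, theta):
    """Values P_i^0 of a European call on the grid S_i = i * ds."""
    dt = maturity / n_time
    ds = s_max / n_space
    # Spatial operator: L P_i = a_i P_{i-1} + b_i P_i + c_i P_{i+1}, with S_i / ds = i
    a = [0.5 * sigma**2 * i * i - 0.5 * r * i for i in range(n_space)]
    b = [-(sigma**2 * i * i + r) for i in range(n_space)]
    c = [0.5 * sigma**2 * i * i + 0.5 * r * i for i in range(n_space)]
    lower = [-theta * dt * x for x in a]
    diag = [1.0 - theta * dt * x for x in b]
    upper = [-theta * dt * x for x in c]
    values = [max(i * ds - strike, 0.0) for i in range(n_space + 1)]   # P_i^N = phi(S_i)
    for n in range(n_time - 1, -1, -1):
        boundary = s_max - strike * math.exp(-r * (maturity - n * dt))  # P_I at t_n
        rhs = []
        for i in range(n_space):
            left = values[i - 1] if i > 0 else 0.0
            lp = a[i] * left + b[i] * values[i] + c[i] * values[i + 1]
            rhs.append(values[i] + (1.0 - theta) * dt * lp)
        rhs[-1] -= upper[-1] * boundary     # known boundary value moved to the right-hand side
        values = solve_tridiagonal(lower, diag, upper, rhs) + [boundary]
    return values

params = dict(strike=100.0, r=0.1, sigma=0.2, maturity=1.0, s_max=300.0)
implicit = theta_scheme_call(n_space=300, n_time=300, theta=1.0, **params)[100]
crank = theta_scheme_call(n_space=300, n_time=300, theta=0.5, **params)[100]
# The explicit scheme needs a much smaller time step relative to the spot step
explicit = theta_scheme_call(n_space=60, n_time=2000, theta=0.0, **params)[20]
```

For `theta = 0` the matrix is the identity, so the solve is trivial; it is kept only so that the three schemes share one code path.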


Notions of Stability and Consistency 

In order to analyze the convergence of the three discretization schemes (25),
(26), and (27), and to understand the differences between these schemes, we need
to introduce two important notions. The first notion is consistency. A numerical
method is said to be consistent if, when the exact solution is plugged into the
numerical scheme, the error tends to zero when the discretization parameters
tend to zero. In our context, it consists of replacing $P_i^n$ in (25), (26), or
(27) by $p(t_n, S_i)$, where $p$ satisfies (8), and checking that the remaining
terms tend to zero when $\delta t$ and $\delta S$ tend to zero. By using Taylor
expansions, one can check that for (25) and (26) (resp. for (27)), the remaining
terms are bounded from above by $C(\delta t + \delta S^2)$ (resp. by
$C(\delta t^2 + \delta S^2)$), where $C$ denotes a constant, which depends on
some norms of the derivatives of $p$. Therefore (25) and (26) (resp. (27)) are
consistent discretization schemes of order 2 in the spot variable, and of order
1 (resp. 2) in time. The second important notion is stability. A numerical
method is said to be stable if the norm of the solution to the numerical scheme
is bounded from above by a constant (independent of the discretization
parameters) multiplied by the norm of the data (initial condition, boundary
conditions, right-hand side). This property is clearly satisfied if the
numerical method is convergent, that is, if the numerical approximation
converges to the solution of the PDE when the discretization parameters tend to
zero. A general result states that, conversely, a consistent and stable
discretization scheme is indeed convergent. The estimate of convergence is given
by the estimate of the consistency error. For example, if $p$ is smooth enough,
the error for the IE scheme is bounded from above by
$C(\delta t + \delta S^2)$. Notice that the constant $C$ in these estimates
depends on the solution $p$. Higher order schemes will lead to better error
estimates as soon as the solution of the continuous problem is smooth enough:
The higher the order, the more regular $p$ must be in order to take full
advantage of the scheme. For example, for some parameters, it may happen that
the results obtained with the CN scheme around $t = T$ are not better than those
obtained with an order one scheme (IE or EE) since the solution is not
sufficiently regular in time around $t = T$.

To give a precise meaning to all these results would require us to specify the
norms used to measure the errors and to prove the stability. Let us simply
mention that two norms are used in practice: The stability in $L^\infty$-norm
(the supremum of the absolute values of the components) is related to a discrete
maximum principle (see below); and the stability in $L^2$-norm (the Euclidean
norm of the vector) is related to an energy estimate on the variational
formulation. We refer, for example, to Achdou and Pironneau (2005) for more
details.

The discrete maximum principle is the counterpart at the discrete level of the
maximum principle at the continuous level mentioned above. It states that if the
data for the numerical schemes are positive, then the solution is positive. Such
schemes are by construction stable in $L^\infty$-norm. There exist deterministic
numerical methods based on a probabilistic representation of the stock evolution
on a binomial or a trinomial tree. Such methods can be interpreted as explicit
finite difference methods to solve the PDE (8) and naturally satisfy a discrete
maximum principle.

Convergence Analysis 

Let us now discuss the properties of the three discretization schemes. We
already mentioned that they are all consistent. On the other hand, it can be
shown that the explicit scheme (25) is stable under an additional assumption (a
so-called CFL condition; see Courant, Friedrichs, and Lewy, 1967) of the form
$\delta t \le C\,\delta S^2$, where $C$ denotes a positive constant. The other
two schemes (26) and (27) are unconditionally stable (in $L^2$-norm). In
conclusion, with the explicit scheme, the values of $(P_i^n)_{0 \le i \le I}$
can be very rapidly obtained from the values of $(P_i^{n+1})_{0 \le i \le I}$,
but the time step must be sufficiently small with respect to the spot step to
guarantee stability and hence convergence. On the other hand, the implicit
schemes (26) and (27) require the resolution of a linear system at each
time-step, but converge without any restriction on the time-step. This situation
is very general for the parabolic PDEs obtained in finance. In terms of
computational costs, the balance is generally in favor of the implicit schemes,
since the CFL condition appears to be very stringent in practice. Concerning the
stability in $L^\infty$-norm, let us just mention that the implicit schemes
above do not satisfy the discrete maximum principle and are not
$L^\infty$-stable as such. These properties are, however, satisfied after a
small modification of the discretization of the advection term
$rS\frac{\partial p}{\partial S}$ (this is a so-called upwinding technique),
which amounts to adding a diffusion term of order $\delta S$, which implies that
this modified scheme becomes only of order 1 in the spot variable. Thus, the
price to pay to get $L^\infty$-stability is a loss of one order of convergence.

Table 1  Error on the Value of a Call as a Function of the Number of Intervals I
in the Variable S, for the Implicit Euler (IE) Scheme

N = 1000    I = 150    I = 300    I = 600    I = 1200
IE          0.165      0.0356     0.00103    0.000452
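
To see where a restriction of the form $\delta t \le C\,\delta S^2$ can come from, one may solve the explicit scheme (25) for $P_i^n$, using $S_i = i\,\delta S$ and $I = S_{\max}/\delta S$. The rearrangement below is a sketch; the positivity argument that follows gives a sufficient condition for a discrete maximum principle and is an illustration rather than the sharp stability bound.

```latex
P_i^{n} = \Big(1 - \delta t\,(\sigma^2 i^2 + r)\Big)\,P_i^{n+1}
        + \frac{\delta t}{2}\,\big(\sigma^2 i^2 + r\,i\big)\,P_{i+1}^{n+1}
        + \frac{\delta t}{2}\,\big(\sigma^2 i^2 - r\,i\big)\,P_{i-1}^{n+1},
\qquad
\delta t \;\le\; \frac{1}{\sigma^2 I^2 + r}
        \;=\; \frac{\delta S^2}{\sigma^2 S_{\max}^2 + r\,\delta S^2}.
```

When $\sigma^2 i \ge r$ and the bound on $\delta t$ holds, the three weights are nonnegative, so positive data at time $t_{n+1}$ give a positive solution at time $t_n$. The bound scales like $\delta S^2$, which is why refining the spot grid forces a much finer time grid.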

In Tables 1 and 2, we illustrate this analysis by computing the error on the
price of a call with $r = 0.1$, $\sigma = 0.01$, $K = 100$, $T = 1$,
$S_0 = 100$, and $S_{\max} = 300$ for the three discretization schemes (25),
(26), and (27), and various values of the numerical parameters $I$ and $N$. The
reference value ($P = 9.51625$) is obtained by the analytic Black and Scholes
formula. In particular, one can check that the rates of convergence with respect
to $\delta t$ and $\delta S$ are indeed those predicted
by the analysis.
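
One way to read a rate of convergence off Table 2 is to compare errors at successive numbers of time-steps. The sketch below does this for the implicit Euler row, assuming (as the column headers suggest) that $N$ doubles from one column to the next; the variable names are illustrative.

```python
import math

# Errors of the implicit Euler (IE) scheme from Table 2, for N = 5, 10, 20, 40, 80, 160
ie_errors = [0.0892, 0.0449, 0.0225, 0.0113, 0.00554, 0.00226]

# Halving the time step divides an O(dt^k) error by 2^k, so k is close to log2(e_N / e_2N)
observed_orders = [
    math.log2(coarse / fine) for coarse, fine in zip(ie_errors, ie_errors[1:])
]
```

The first ratios are close to 2, that is, an observed order close to 1 in time. The last one departs from this, which is consistent with the table's note that for large $N$ the remaining error comes from the discretization in $S$.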

Before presenting an extension of this discretization method to Asian options,
we mention a classical change of variable for the spot variable. It is indeed
well known that by the change of variable $x = \ln S$, it is possible to get rid
of the dependency in $S$ of the advection and diffusion terms in (8). However,
it is not better to discretize the PDE after this change of variable, since it
corresponds to taking a grid refined near $S = 0$, which is useless in this
case. As we will see below, what actually matters is to refine the grid around
the singularity of $p$ (i.e., around $S = K$). A finite element approach is
better suited in order to implement these refinements.

Application to Asian Options 

We now present a less straightforward implementation of a finite difference
method, for pricing Asian options (see Dubois and Lelievre, 2005). More
precisely, we focus on computing numerical solutions to (14) for a fixed strike
call:

$$\phi(\xi) = \xi_- \tag{28}$$

We have seen in the previous section that a simple finite difference scheme
leads to very satisfactory results when computing the solution of the classical
Black-Scholes equation (8). On the other hand, when one uses a simple finite
difference scheme on (14), very bad results are obtained, especially when the
volatility $\sigma$ is small (see Table 1 in Dubois and Lelievre, 2005). These
bad results are due to the fact that when $\xi$ is close to zero, the advection
term $\left(\frac{1}{T} + r\xi\right)$ is much larger than the diffusion term
$\sigma^2 \xi^2/2$ in (14). This is known to deteriorate the stability of the
numerical scheme, particularly with respect to the $L^\infty$-norm. In practice,
the numerical solution exhibits some oscillations and does not satisfy the
discrete maximum principle. Moreover, the finite difference method introduces
numerical diffusion, which leads to unsatisfactory results for purely advective
equations.


Table 2 Error on the Value of a Call as a Function of the Number of Time-Steps N

| I = 500 | N = 5 | N = 10 | N = 20 | N = 40 | N = 80 | N = 160 |
|---|---|---|---|---|---|---|
| EE | 28.53 | 0.386 | 0.398 | 0.0739 | 0.0162 | 0.00714 |
| IE | 0.0892 | 0.0449 | 0.0225 | 0.0113 | 0.00554 | 0.00226 |
| CN | 0.0299 | 0.00758 | 0.00103 | 0.00169 | 0.00169 | 0.00168 |

Note: We observe that the Euler explicit (EE) scheme is unstable for N = 5. The convergence in time of the Crank-Nicolson (CN) scheme is much faster than for the implicit Euler (IE) scheme. The remaining error when N is large is due to the discretization with respect to the variable S.
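The first-order convergence in time of the implicit Euler scheme can be read directly off Table 2. The short sketch below estimates the observed order from the IE row, assuming (as the column headings suggest) that N doubles from one column to the next; the function name `observed_orders` is ours and purely illustrative.

```python
import math


def observed_orders(errors):
    """Observed convergence order between consecutive runs.

    Assumes the number of time steps doubles from one entry to the next,
    so that an error behaving like C * dt**q gives log2(e_k / e_{k+1}) = q.
    """
    return [math.log2(errors[k] / errors[k + 1]) for k in range(len(errors) - 1)]


# IE row of Table 2 (I = 500), copied from the table.
ie_errors = [0.0892, 0.0449, 0.0225, 0.0113, 0.00554, 0.00226]

for order in observed_orders(ie_errors):
    print(round(order, 2))
```

The first few ratios stay close to 1, which is what an O(δt) scheme should give; the last entries are affected by the residual space discretization error mentioned in the note.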
One way to handle this problem is to use a characteristic method (based on the solution of $d\xi/dt = -1/T$) in order to get rid of the term $1/T$. This means that the following change of variable is introduced:

$$g(t,x) = f(t, x - t/T) \tag{29}$$

One can easily show that g is a solution of:¹

$$\frac{\partial g}{\partial t} + \frac{\sigma^2 (x - t/T)^2}{2}\frac{\partial^2 g}{\partial x^2} - r(x - t/T)\frac{\partial g}{\partial x} = 0, \qquad g(T,x) = \phi(x-1) = (1-x)_+ \tag{30}$$

The PDE (30) satisfied by g is such that when the advection term $r(x - t/T)$ is small, the diffusion term $\sigma^2 (x - t/T)^2/2$ is also small. As shown below, a finite difference scheme applied to (30) will indeed lead to satisfactory results.
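For completeness, here is a sketch of the chain-rule computation behind (30). It assumes that (14) has the form suggested by the advection and diffusion terms quoted above, namely $\partial_t f + \tfrac{\sigma^2\xi^2}{2}\partial_{\xi\xi} f - (1/T + r\xi)\,\partial_\xi f = 0$; this form is inferred from the text and should be checked against (14) itself.

```latex
% With g(t,x) = f(t,\xi), \xi = x - t/T:
\begin{align*}
\frac{\partial g}{\partial x} &= \frac{\partial f}{\partial \xi}, \qquad
\frac{\partial^2 g}{\partial x^2} = \frac{\partial^2 f}{\partial \xi^2}, \qquad
\frac{\partial g}{\partial t} = \frac{\partial f}{\partial t} - \frac{1}{T}\frac{\partial f}{\partial \xi}, \\
\frac{\partial g}{\partial t}
 &= \Big(-\frac{\sigma^2\xi^2}{2}\frac{\partial^2 f}{\partial \xi^2}
    + \Big(\frac{1}{T} + r\xi\Big)\frac{\partial f}{\partial \xi}\Big)
    - \frac{1}{T}\frac{\partial f}{\partial \xi}
  = -\frac{\sigma^2\xi^2}{2}\frac{\partial^2 g}{\partial x^2} + r\xi\,\frac{\partial g}{\partial x}.
\end{align*}
% Hence g_t + (\sigma^2 (x - t/T)^2 / 2) g_{xx} - r (x - t/T) g_x = 0,
% and g(T,x) = f(T, x - 1) = (x - 1)_- = (1 - x)_+ .
```

The $1/T$ part of the advection is exactly cancelled by the moving frame, which is why the remaining advection and diffusion coefficients vanish together at $x = t/T$.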

An important property of the solution to (30) for $\phi(\xi) = \xi_-$ is that (see Rogers and Shi, 1995), $\forall \xi \le 0$,

$$f(t,\xi) = \frac{1}{rT}\left(1 - e^{-r(T-t)}\right) - \xi\, e^{-r(T-t)} \tag{31}$$

and therefore, $\forall x \le t/T$,

$$g(t,x) = \frac{1}{rT}\left(1 - e^{-r(T-t)}\right) - (x - t/T)\, e^{-r(T-t)} \tag{32}$$

To prove (31), one can notice that f given by (31) is the solution to (14) with $\phi(\xi) = -\xi$, and that, due to the fact that the diffusion term is null for $\xi = 0$ and that the advection term is negative, the solution to (14) for $\phi(\xi) = \xi_-$ on $\xi \le 0$ is the same as the solution to (14) for $\phi(\xi) = -\xi$ on $\xi \le 0$.
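As a rough numerical sanity check of (31), the sketch below evaluates the closed form and measures, by central differences, the residual of a PDE of the form $\partial_t f + \tfrac{\sigma^2\xi^2}{2}\partial_{\xi\xi} f - (1/T + r\xi)\partial_\xi f = 0$. This form of (14) is an assumption inferred from the advection and diffusion terms quoted in the text; the function names and the step size `h` are illustrative choices.

```python
import math


def f_closed(t, xi, r, T):
    """Closed form (31), stated for xi <= 0."""
    disc = math.exp(-r * (T - t))
    return (1.0 - disc) / (r * T) - xi * disc


def pde_residual(t, xi, r, sigma, T, h=1e-4):
    """Central-difference residual of the assumed form of (14) at (t, xi)."""
    f_t = (f_closed(t + h, xi, r, T) - f_closed(t - h, xi, r, T)) / (2.0 * h)
    f_x = (f_closed(t, xi + h, r, T) - f_closed(t, xi - h, r, T)) / (2.0 * h)
    f_xx = (f_closed(t, xi + h, r, T) - 2.0 * f_closed(t, xi, r, T)
            + f_closed(t, xi - h, r, T)) / h ** 2
    return f_t + 0.5 * sigma ** 2 * xi ** 2 * f_xx - (1.0 / T + r * xi) * f_x


# Terminal condition: f(T, xi) should equal -xi.
print(f_closed(1.0, -0.5, 0.15, 1.0))
# Residual at an interior point with xi < 0; expected to be close to zero.
print(pde_residual(0.3, -0.5, 0.15, 0.3, 1.0))
```

Because (31) is affine in $\xi$, the diffusion term plays no role in the check; the cancellation happens between the time derivative and the advection term.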

To discretize (30), a Crank-Nicolson time scheme is used, with a uniform time step $\delta t = T/N$. In order to use the fact that g is analytically known on $x \le t/T$ (see (32)), a mesh that properly discretizes the boundary $x = t/T$ is used. Therefore, the space interval (0,1) is also discretized with N space steps of length $\delta x = 1/N$ (see Figure 1). The mesh is completed by adding J intervals on the right-hand side of $x = 1$, so that $x \in (0, x_{\max})$ with $x_{\max} = (N + J)\delta x$. The value $J = N/2$ has been found to be sufficient to guarantee the independence of the results on the position of $x_{\max}$.

Figure 1 The Mesh and the Computational Domain for the Finite Difference Scheme Used to Discretize (30)

Notice that at time $t_n = n\delta t$, the number of unknowns is $(N + J - n)$. This means that the dimension of the linear system to solve depends on the time-step.

As far as boundary conditions are concerned, we use a Dirichlet boundary condition on $x = t/T$ (using (32)) and an artificial zero Neumann boundary condition on $x = x_{\max}$.

Let us now give some numerical results. In Table 3, a few comparisons of the results obtained with the characteristic method and other methods are given. The characteristic method appears to be accurate for both small and large volatilities. For any values of the parameters, at least 5 digits of precision are obtained in less than one second. Notice that the Thompson bounds and the characteristic method are implemented in Premia.²

The Finite Element Method for 
European Options 

We would now like to introduce the finite element method. This technique is more flexible than the finite difference method. In particular, it allows for local refinements of the spot grid (even in dimensions greater than one), possibly based on the local error estimators mentioned below. This is particularly important for American options, because the pricing function is singular near the exercise boundary, and this curve is not known a priori. Let us emphasize that the use of a refined mesh around the singularities of the solution (for example, for vanilla option pricing problems, around $t = T$ and $S = K$) is very important in practice to rapidly obtain accurate results. The finite element method can also be used in a flexible way when the geometry of the computational domain becomes complex, which may be of interest for barrier options in dimensions greater than one. Finally, finite element methods are interesting since they are naturally stable (in $L^2$-norm) and optimal error bounds (in $L^2$-norm) can be derived.

Table 3 Comparisons of the Prices for an Asian Fixed Call Obtained with Various Finite Difference Methods: Characteristic Method, Zvan et al. (1998), Vecer (2001), and Thompson (1999)

| σ | K | Charact. Method | (N) | Zvan et al. | Vecer | Thompson (low) | Thompson (up) |
|---|---|---|---|---|---|---|---|
| 0.05 | 95 | 11.09409 | (300) | 11.094 | 11.094 | 11.094094 | 11.094096 |
| | 100 | 6.7943 | (1000) | 6.793 | 6.795 | 6.794354 | 6.794465 |
| | 105 | 2.7444 | (3000) | 2.748 | 2.744 | 2.744406 | 2.744581 |
| 0.30 | 90 | 16.512 | (300) | 16.514 | 16.516 | 16.512024 | 16.523720 |
| | 100 | 10.209 | (300) | 10.210 | 10.215 | 10.208724 | 10.214085 |
| | 110 | 5.730 | (1000) | 5.729 | 5.736 | 5.728161 | 5.735488 |

Note: Values of parameters: $T = 1$, $r = 0.15$, $S_0 = 100$, $J = N/2$. For the characteristic method, the number of time-steps $N \ge 300$ needed to obtain at least 5 digits of precision is given.

In the following, we first present the finite 
element method on a simple example, namely 
equation (8). We then show how to treat more 
complex European options. 


Variational Formulation and Finite Element 
Space 

The conforming finite element method is based on two ingredients: a so-called variational formulation of the PDE on a functional space V and the choice of an appropriate sequence of finite dimensional spaces $V_h \subset V$, which tends to V when h (which is the typical diameter of the cells of the space mesh) tends to 0. Let us illustrate this on (8).

To derive a variational formulation of (8), the principle is to multiply the equation by a test function of the spot variable and to integrate by parts. For these computations to be well defined, the functions need to be sufficiently smooth. We thus introduce the functional spaces $H = L^2(\mathbb{R}_+) = \{q : [0,\infty) \to \mathbb{R},\ \int_0^\infty q^2 < \infty\}$ and $V = \{q \in L^2(\mathbb{R}_+),\ S\,(dq/dS) \in L^2(\mathbb{R}_+)\}$. Assuming that $\phi$ is square integrable, a variational formulation of (8) is then (for an S-dependent volatility $\sigma$): Find $p \in L^2((0,T),V) \cap C^0([0,T],H)$ such that for all $q \in V$,

$$\frac{d}{dt}\int_0^\infty p\,q - \int_0^\infty \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial S} + \int_0^\infty \left(r - \sigma^2 - S\sigma\frac{\partial \sigma}{\partial S}\right) S\frac{\partial p}{\partial S}\,q - r\int_0^\infty p\,q = 0, \qquad p(T,S) = \phi(S) \tag{33}$$

All the integrals are with respect to $S \in [0,\infty)$. This rewrites: Find $p \in L^2((0,T),V) \cap C^0([0,T],H)$ such that for all $q \in V$,

$$\frac{d}{dt}(p,q)_H - a(p,q) = 0, \qquad p(T,S) = \phi(S) \tag{34}$$

where a is the bilinear form

$$a(p,q) = \int_0^\infty \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial S} - \int_0^\infty \left(r - \sigma^2 - S\sigma\frac{\partial \sigma}{\partial S}\right) S\frac{\partial p}{\partial S}\,q + r\int_0^\infty p\,q \tag{35}$$

Under suitable assumptions on the data ($r$, $\sigma$, and $\phi$), it is possible to prove that this variational problem is well posed (see Achdou and Pironneau, 2005).

The second step is to introduce a sequence of meshes in the spot variable indexed by the maximal step h and related finite dimensional functional spaces $V_h \subset V$. In the case of (33), the problem is posed on an infinite domain, and one needs to first localize the PDE in a finite domain $[0, S_{\max}]$ by using an artificial boundary condition at $S = S_{\max}$, as already explained for finite difference discretizations. We consider, for example, a zero Neumann boundary condition on $S = S_{\max}$: $\frac{\partial p}{\partial S}(t, S_{\max}) = 0$. Then, a mesh of $[0, S_{\max}]$ consists of a finite number of intervals $(S_i, S_{i+1})$ with $S_0 = 0$ and $S_I = S_{\max}$. We set $h = \max_{0 \le i \le I-1}(S_{i+1} - S_i)$. The intervals $(S_i, S_{i+1})$ are called elements. We then need to define a functional space $V_h$ associated with the mesh. A classical example is the P1 finite element space, which contains continuous and piecewise affine functions, namely, continuous functions that are affine on each interval $(S_i, S_{i+1})$, for $0 \le i \le I-1$. In this case, a basis of the vector space $V_h$ is given by the so-called hat functions $q_i \in V_h$ such that for $0 \le i, j \le I$,

$$q_i(S_j) = \delta_{i,j} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

($\delta_{i,j}$ is the Kronecker symbol). Notice that higher order finite element methods may be easily obtained by taking continuous and element-wise polynomial functions of degree $k > 1$.

The discretization in the spot price variable now simply consists in replacing the functional space V by the finite dimensional space $V_h$ in (33) or (34) (this is the principle of Galerkin methods): Find $p_h \in C^0([0,T], V_h)$ such that for all $q_h \in V_h$,

$$\frac{d}{dt}\int_0^{S_{\max}} p_h\, q_h - a(p_h, q_h) = 0, \qquad p_h(T,S) = \phi_h(S) \tag{36}$$

where $\phi_h$ is an approximation of $\phi$ in the space $V_h$, and where the integrals in the bilinear form a are here for $S \in [0, S_{\max}]$ (see (35)). One can take, for example, $\phi_h$ such that $\int_0^{S_{\max}} (\phi - \phi_h)\, q_h = 0$ for all $q_h \in V_h$ ($\phi_h$ is then the $L^2$ projection of $\phi$ onto $V_h$). Problem (36) is a finite-dimensional problem in space of the form $M_h\, dP_h/dt = A_h P_h$, where $P_h(t)$ is a vector of dimension I containing the values of $p_h$ at the nodes of the mesh ($p_h(t,x) = \sum_{j=0}^{I} p_{h,j}(t)\, q_j(x)$) and $M_h$, $A_h$ are $I \times I$ matrices. The matrix $M_h$ (resp. $A_h$), with (i, j)-th component $\int_0^{S_{\max}} q_i q_j$ (resp. $a(q_j, q_i)$), is classically called the mass (resp. stiffness) matrix, because the finite element method was originally popularized by the mechanical engineering community. When using the nodal basis (hat functions), these matrices are very sparse (tridiagonal for one-dimensional problems). Problem (36) is somewhat similar to (24) obtained by the finite difference method; the two problems (24) and (36) are actually equivalent if a mesh with uniform space steps is used, and if $M_h$ is replaced by a close diagonal matrix (mass-lumping).
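To make the sparsity pattern concrete, here is a minimal sketch of the assembly of the P1 mass matrix on a possibly nonuniform mesh, together with its lumped (diagonal) version obtained by summing each row. It stores one entry per mesh node and only the three diagonals; the function names are illustrative, and the element contributions h/3 and h/6 are the exact integrals of products of hat functions on an interval of length h.

```python
def p1_mass_matrix(nodes):
    """Tridiagonal P1 mass matrix on the mesh `nodes` (sorted spot values).

    Returns (lower, diag, upper); lower[0] and upper[-1] are unused zeros.
    """
    n = len(nodes)
    lower = [0.0] * n
    diag = [0.0] * n
    upper = [0.0] * n
    for e in range(n - 1):
        h = nodes[e + 1] - nodes[e]
        # Integral of q_e^2 and q_{e+1}^2 over the element is h/3.
        diag[e] += h / 3.0
        diag[e + 1] += h / 3.0
        # Integral of q_e * q_{e+1} over the element is h/6.
        upper[e] += h / 6.0
        lower[e + 1] += h / 6.0
    return lower, diag, upper


def lumped_mass(nodes):
    """Mass lumping: replace the mass matrix by the diagonal of its row sums."""
    lower, diag, upper = p1_mass_matrix(nodes)
    return [lower[i] + diag[i] + upper[i] for i in range(len(nodes))]


# A mesh of [0, 150] refined around a strike K = 100 (illustrative values).
mesh = [0.0, 50.0, 90.0, 100.0, 110.0, 150.0]
print(lumped_mass(mesh))
```

Since the hat functions sum to one, the entries of the mass matrix add up to the length of the domain, which gives a quick consistency check on any assembly routine.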

A fundamental result (Céa's lemma) states that the norm of $(p - p_h)$ (the discretization error) is bounded from above by a constant times the infimum of the norm of $(p - q_h)$, over all $q_h \in V_h$ (the best fit error). Using this result, if $V_h$ gets closer to V when h tends to 0, that is, if the best fit error tends to 0 when h tends to zero, so does the discretization error. In particular, the finite element discretization is thus naturally stable in this norm. A precise meaning for this statement requires us to define the norm and study the best fit error. Let us simply mention that the norms used in this context are related to the $L^2$-norm introduced for finite difference schemes. We refer to Achdou and Pironneau (2005) or Quarteroni and Valli (1997) for the details. In our specific example, it is possible to prove that, if the payoff function is regular enough, then

$$\|p - p_h\|_{L^\infty([0,T],H)} + \|p - p_h\|_{L^2([0,T],V)} \le C h$$

and that

$$\|p - p_h\|_{L^2([0,T],H)} \le C h^2$$

For the discretization in time, the situation is exactly the same as for the finite difference method: One can use the explicit Euler scheme, implicit Euler scheme, or Crank-Nicolson scheme, and the rate of convergence is $O(\delta t)$ for the Euler schemes and $O(\delta t^2)$ for the Crank-Nicolson scheme.


Finite Element Methods for Other Options

We have introduced the finite element method in a very simple case. The aim of this section is to explain how it applies for other options.

Let us first consider basket options, or basket options with barriers, in dimension 2 and 3. The derivation of a variational formulation for (9) is very similar to the one-dimensional case. However, the construction of the mesh is much more complicated in dimension 2 and 3 than in dimension 1. It consists of partitioning the domain into non-overlapping cells (elements) whose shapes are simple and fixed (for example, triangles or quadrilaterals in dimension 2, or tetrahedra or hexahedra in dimension 3). The functional spaces $V_h$ can then be constructed as in dimension 1, for example, by considering continuous piecewise affine functions. One interest of the finite element method in this context is that it is possible to mesh any domain $\mathcal{D}$ for barrier options. In the finite difference method, meshing nonquadrilateral (or nonhexahedral) domains is complicated.

Let us now consider the case of lookback options whose prices satisfy (11). This is a natural variational formulation of (11) (written here for a constant volatility $\sigma$): Find $p : \mathcal{D} \to \mathbb{R}$ such that, for all $q : \mathcal{D} \to \mathbb{R}$,

$$\frac{d}{dt}\int_{\mathcal{D}} p\,q - \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial S} - \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial M} + \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial M}\frac{\partial q}{\partial S} + \int_{\mathcal{D}} \sigma^2 S\,\frac{\partial p}{\partial M}\,q + \int_{\mathcal{D}} (r - \sigma^2)\, S\frac{\partial p}{\partial S}\,q - r\int_{\mathcal{D}} p\,q = 0, \qquad p(T,S,M) = \phi(S,M) \tag{37}$$

where $\mathcal{D} = \{(S,M) \in \mathbb{R}^2,\ 0 < S < M\}$. The boundary condition $\partial p/\partial M(t,S,S) = 0$ is naturally contained in this variational formulation since, by integration by parts over $\mathcal{D}$:

$$-\int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial S} - \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial S}\frac{\partial q}{\partial M} + \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial M}\frac{\partial q}{\partial S} + \int_{\mathcal{D}} \sigma^2 S\left(\frac{\partial p}{\partial M} - \frac{\partial p}{\partial S}\right) q = \int_{\mathcal{D}} \frac{\sigma^2 S^2}{2}\frac{\partial^2 p}{\partial S^2}\,q + \frac{1}{\sqrt{2}}\int_{\{S = M\}} \frac{\sigma^2 S^2}{2}\frac{\partial p}{\partial M}\,q$$

The first term corresponds to the diffusion term in (11). The second term is an integral over the boundary $\{S = M\}$ of $\mathcal{D}$ and naturally enforces the boundary condition $\partial p/\partial M(t,S,S) = 0$. In Figure 2, we represent the price of a fixed strike call obtained using the formulation (11), an implicit Euler scheme, and P1 finite elements. The computations are made with FreeFem++.³


A Posteriori Error Estimates 

A frequently mentioned advantage of the Monte Carlo methods is that they naturally provide a posteriori error bounds through a confidence interval, typically built upon the central limit theorem. It is also possible to obtain such a posteriori error estimates in the framework of the finite element method (this is one additional advantage of this method compared to finite difference methods). Moreover, these a posteriori estimates have two very important features:

* They depend on local error indicators.

* They can be proved to be reliable and efficient, that is, the actual error is bounded above and below by some fixed constants times the a posteriori error, and these estimates can be made local.

Therefore, in the finite element method, the a posteriori error estimates enable us to refine the mesh in space and time adaptively. We will give a numerical illustration for American options and refer to Ern, Villeneuve, and Zanette (2004), Achdou and Pironneau (2005 and 2009), or Achdou, Hecht, and Pommier (2008) for more details.

Figure 2 Price of a Lookback Option for a Fixed Strike Call: $\phi(S, M) = (M - K)_+$
Note: The parameters are: $\sigma = 0.3$, $r = 0.1$, $K = 100$, $T = 1$.

High-Dimensional Problems 

In practical problems, options often involve more than three assets. In this case, the PDE is posed in a space of dimension larger than 4, and the finite element or difference methods cannot be used, since the number of unknowns typically grows exponentially with respect to the problem's dimension. This is the so-called curse of dimensionality. Let us mention that such high-dimensional problems also appear in other scientific fields, quantum chemistry, for example, and that it is still a subject of current research to build appropriate discretizations for high-dimensional PDEs. Roughly speaking, the problem is to find an appropriate sequence of functional spaces $V_h$ (whose basis is called a Galerkin basis), such that their dimensions do not grow too rapidly with the dimension of the problem. One approach is the sparse tensor product (see Bungartz and Griebel, 2004; von Petersdorff and Schwab, 2004). The main difficulty when using this approach is actually to project the initial condition on $V_h$. Another approach used in other contexts for solving high-dimensional problems by deterministic methods is the low separation rank method (see Beylkin and Mohlenkamp, 2002) and the related greedy algorithms (see Ammar et al., 2002; Temlyakov, 2008; Le Bris, Lelièvre, and Maday, 2009; and Nouy, 2009). Let us finally mention that another possible approach for building an appropriate Galerkin basis would be the reduced basis method, where some solutions for a given set of parameters are used to approximate the solution for other values of the parameters. Such methods are currently actively investigated (see, for example, Boyaval, Le Bris, Lelièvre, Maday, Nguyen, and Patera, 2010).
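The exponential growth can be made tangible with a one-line count. The sketch below assumes a full tensor-product grid with the same number of points along every axis, which is the simplest (and most pessimistic) setting; the function name is illustrative.

```python
def grid_unknowns(points_per_axis, dimension):
    """Number of nodes of a full tensor-product grid in the given dimension."""
    return points_per_axis ** dimension


# With 100 points per axis, each additional asset multiplies the count by 100.
for d in range(1, 7):
    print(d, grid_unknowns(100, d))
```

Sparse tensor products and low-rank formats aim precisely at avoiding this full tensor-product count.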

The Uncertain Volatility Model: An Example 
of a Nonlinear PDE 

One major interest of the PDE approach is that it can be applied to nonlinear models. This will be the case for American options, see below, but we would like to give here another example of such a situation. The principle of the uncertain volatility model introduced by Avellaneda, Levy, and Paras (1995) is to give a price for a European option when the volatility is only supposed to be in an interval $[\sigma_{\min}, \sigma_{\max}]$. The principle is the following. For a European option with convex payoff, it is easy to check that the price should be the Black-Scholes price obtained with the maximum volatility $\sigma_{\max}$. In this case, the profit and loss for the hedging strategy is indeed zero if the realized volatility is constant equal to $\sigma_{\max}$. A similar reasoning holds for concave payoffs: In this case, one should consider the Black-Scholes price with the minimum volatility $\sigma_{\min}$. For a general payoff, it is thus natural (and it can be checked that this is indeed an approach that leads to a very good hedging strategy, with small profit and loss, and thus cheap price) to consider the solution p to the PDE:

$$\frac{\partial p}{\partial t} + rS\frac{\partial p}{\partial S} + \frac{1}{2}\left(\sigma_{\max}^2\, 1_{\frac{\partial^2 p}{\partial S^2} \ge 0} + \sigma_{\min}^2\, 1_{\frac{\partial^2 p}{\partial S^2} < 0}\right) S^2 \frac{\partial^2 p}{\partial S^2} - rp = 0, \qquad p(T,S) = \phi(S) \tag{38}$$

In other words, $\sigma_{\max}$ (resp. $\sigma_{\min}$) is used where the price is convex (resp. concave), as a function of the spot. This PDE can be solved using extensions of the discretization techniques presented above; see, for instance, section 2.4 in van der Pijl and Oosterlee (2011).
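The nonlinearity in (38) amounts to a pointwise switch driven by the sign of the second derivative. A minimal sketch of that switch on a uniform spot grid is given below; the sign of the discrete second difference is used as a proxy for the sign of $\partial^2 p/\partial S^2$, and the function name and grid values are illustrative assumptions, not part of any particular solver.

```python
def select_volatility(values, sigma_min, sigma_max):
    """Volatility used at each interior node of a uniform grid.

    sigma_max where the discrete second difference is nonnegative (convex),
    sigma_min where it is negative (concave), mirroring the switch in (38).
    """
    chosen = []
    for i in range(1, len(values) - 1):
        second_diff = values[i + 1] - 2.0 * values[i] + values[i - 1]
        chosen.append(sigma_max if second_diff >= 0.0 else sigma_min)
    return chosen


spots = [80.0, 90.0, 100.0, 110.0, 120.0]
call = [max(s - 100.0, 0.0) for s in spots]        # convex payoff
short_call = [-v for v in call]                     # concave payoff
print(select_volatility(call, 0.1, 0.3))
print(select_volatility(short_call, 0.1, 0.3))
```

For the convex payoff every node selects the upper volatility, in line with the remark that the price is then the Black-Scholes price at $\sigma_{\max}$; for the concave payoff the lower volatility is selected at the kink.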


PRICING AMERICAN 
OPTIONS WITH PDES 

This section is devoted to the discretization of the system (23) for the price of an American option. Notice that no closed formulas such as the Black-Scholes formula are available for an American put, or for an American call with a dividend rate, so that efficient discretization of this system is needed even for these simple payoffs.

The Finite Difference Approach for 
American Options 

We first present the extension of the finite difference approach presented above for European options to American options.


Some Finite Difference Schemes 

We consider a regular mesh discretization $S_i = i\,\delta S$ and a time discretization $t_n = n\,\delta t$ with $\delta t = T/N$. As in the European case, it is natural to consider the following three iterative numerical schemes for $P_i^n$, an approximation of $p(t_n, S_i)$. In all cases, the scheme is initialized by $P_i^N = \phi(S_i)$. Let A be the matrix such that

$$(AP^{n+1})_i = -rS_i\,\frac{P_{i+1}^{n+1} - P_{i-1}^{n+1}}{2\,\delta S} - \frac{\sigma^2 S_i^2}{2}\,\frac{P_{i+1}^{n+1} - 2P_i^{n+1} + P_{i-1}^{n+1}}{\delta S^2} + rP_i^{n+1} \tag{39}$$

The explicit Euler (EE) scheme for (23) is, for $n = N-1, N-2, \ldots, 0$,

$$\min\left(-\frac{P_i^{n+1} - P_i^n}{\delta t} + (AP^{n+1})_i,\ P_i^n - \phi(S_i)\right) = 0 \tag{40}$$

The scheme computes $P^n = (P_i^n)_{i=0,\ldots,I-1}$ from the knowledge of $P^{n+1} = (P_i^{n+1})_{i=0,\ldots,I-1}$. Similarly, we can propose an implicit Euler (IE) scheme:

$$\min\left(-\frac{P_i^{n+1} - P_i^n}{\delta t} + (AP^n)_i,\ P_i^n - \phi(S_i)\right) = 0 \tag{41}$$

and an (implicit) Crank-Nicolson (CN) scheme

$$\min\left(-\frac{P_i^{n+1} - P_i^n}{\delta t} + \frac{1}{2}\left((AP^n)_i + (AP^{n+1})_i\right),\ P_i^n - \phi(S_i)\right) = 0 \tag{42}$$

In the case of the EE scheme, it is easy to see that we have the equivalent formulation

$$P_i^n = \max\left(\left((I_d - \delta t\,A)P^{n+1}\right)_i,\ \phi(S_i)\right) \tag{43}$$

where $I_d$ denotes the identity matrix.

We now have two new difficulties compared to the European case: First, the well-posedness of the schemes (41) or (42) is not immediate (for European options, we obtained a linear system, but this is no longer true for American options), and second, studying the convergence is more difficult.

One way to circumvent the first difficulty is to introduce a splitting method (see Barles and Souganidis, 1991; Barles, Daher, and Romano, 1995; and Lions and Mercier, 1979). For (23), it writes (a similar modification of (42) could also be considered, yielding a Crank-Nicolson-splitting (CN-S) scheme):

$$\text{compute } P^{n,1} \text{ s.t. } -\frac{P_i^{n+1} - P_i^{n,1}}{\delta t} + (AP^{n,1})_i = 0 \tag{44a}$$

$$\text{and then compute } P_i^n = \max\left(P_i^{n,1},\ \phi(S_i)\right) \tag{44b}$$

Hereafter, (44) will be referred to as the implicit Euler-splitting (IE-S) scheme. The first step (44a) consists of solving a linear system, as in the European case. The second step is a projection on the set $\{v = (v_i),\ v_i \ge \phi(S_i),\ \forall i\}$, as for the EE scheme (43).

Notice that as for European options, we set the equation on a truncated domain $(0, S_{\max})$ and use artificial boundary conditions on $S = S_{\max}$. We refer to Barles, Daher, and Romano (1995) for error estimates between the truncated problem on $(0, S_{\max})$ and the exact problem.
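The two steps of the IE-S scheme (a linear solve followed by a projection on the payoff) can be sketched in a few lines of pure Python for an American put. Several choices below are our own assumptions rather than the chapter's: a centered discretization of the advection term in A, Dirichlet values equal to the payoff at both ends of the truncated domain, and a plain Thomas algorithm for the tridiagonal solve.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def american_put_ie_s(r, sigma, strike, maturity, s_max, n_space, n_time):
    """Implicit Euler-splitting scheme: solve (I + dt*A) P = P_next, then project."""
    ds = s_max / n_space
    dt = maturity / n_time
    spots = [i * ds for i in range(n_space + 1)]
    payoff = [max(strike - s, 0.0) for s in spots]
    # Rows of B = I + dt*A at the interior nodes i = 1, ..., n_space - 1.
    lower, diag, upper = [], [], []
    for i in range(1, n_space):
        diffusion = 0.5 * sigma ** 2 * spots[i] ** 2 / ds ** 2
        advection = 0.5 * r * spots[i] / ds
        lower.append(dt * (advection - diffusion))
        diag.append(1.0 + dt * (2.0 * diffusion + r))
        upper.append(dt * (-advection - diffusion))
    values = payoff[:]                      # terminal condition at t = T
    for _ in range(n_time):                 # march backward in time
        rhs = values[1:n_space]
        rhs[0] -= lower[0] * payoff[0]      # Dirichlet value at S = 0
        rhs[-1] -= upper[-1] * payoff[-1]   # Dirichlet value at S = s_max
        trial = thomas(lower, diag, upper, rhs)          # linear solve step
        for i in range(1, n_space):
            values[i] = max(trial[i - 1], payoff[i])     # projection step
    return spots, values


spots, values = american_put_ie_s(0.1, 0.1, 100.0, 1.0, 150.0, 150, 200)
print(values[100])  # approximate price at S = 100
```

With these parameters the grid has a node at S = 100, so the price can be read off without interpolation; the projection guarantees that the computed values never fall below the payoff.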


An Abstract Convergence Result 
Assuming for the moment that the schemes are well posed, it is possible to study the convergence in the general framework of finite difference schemes for Hamilton-Jacobi equations. Possibly under some restrictions on the mesh sizes $\delta t$ and $\delta S$, we can obtain convergence to the viscosity solution of the PDE (23). We refer to Barles (1994) or Barles, Daher, and Romano (1995) for a short introduction, and Crandall, Ishii, and Lions (1992) for a more detailed overview. To give a rough idea of the convergence results for such schemes, we consider a general Hamilton-Jacobi equation of the form

$$H\left(t, S, p, \frac{\partial p}{\partial t}, \frac{\partial p}{\partial S}, \frac{\partial^2 p}{\partial S^2}\right) = 0 \tag{45}$$

with a terminal condition on $p(T, \cdot)$, where H is assumed to be Lipschitz continuous and "backward parabolic" in the sense that

$$\text{if } \psi_1 \le \psi_2 \text{ then } H(t, S, p, u, v, \psi_1) \ge H(t, S, p, u, v, \psi_2) \tag{46a}$$

$$\text{and if } u_1 \le u_2 \text{ then } H(t, S, p, u_1, v, \psi) \ge H(t, S, p, u_2, v, \psi) \tag{46b}$$

Equation (23) is indeed of the form of (45) with, for $(t,S) \in (0,T) \times (0, S_{\max})$, $H(t, S, p, u, v, \psi) = \min\left(-u - rSv - \tfrac{1}{2}\sigma^2 S^2 \psi + rp,\ p - \phi(S)\right)$, which obviously satisfies (46).

First convergence results were given in the fundamental work of Crandall and Lions (1984) for Lipschitz continuous final condition $\phi$ (and without dependence on $\partial^2 p/\partial S^2$ in (45)).

An abstract and general convergence result is given by Barles and Souganidis (1991), and we now give a simplified presentation of this result.

We first assume that H satisfies a comparison principle, which can be seen as an extension of the maximum principle to some nonlinear equations. The comparison principle is roughly the following (see Crandall, Ishii, and Lions, 1992; Barles, 1994; or Pham, 1998): Assume that u (resp. v) is a subsolution (resp. supersolution) of (45), that is,

$$H\left(t, S, u, \frac{\partial u}{\partial t}, \frac{\partial u}{\partial S}, \frac{\partial^2 u}{\partial S^2}\right) \le 0 \qquad \left(\text{resp. } H\left(t, S, v, \frac{\partial v}{\partial t}, \frac{\partial v}{\partial S}, \frac{\partial^2 v}{\partial S^2}\right) \ge 0\right)$$

for $(t,S) \in (0,T) \times (0, S_{\max})$, and that $u \le v$ on the boundaries $S = S_{\max}$ and $t = T$, then $u \le v$ everywhere.

Now, suppose that we can write the scheme in the abstract form: $\forall i \in \{0, \ldots, I\}$, $\forall n \in \{0, \ldots, N\}$,

$$\mathcal{S}_\rho(t_n, S_i, P_i^n, [P]) = 0 \tag{47}$$

where $\rho = (\delta t, \delta S)$, and $[P]$ stands for a continuous function that takes values $(P_j^k)_{0 \le k \le N,\ 0 \le j \le I}$ on the corresponding grid points $(t_k, S_j)$.⁴ We suppose that (47) admits at least one solution denoted $P_\rho$. Then, in the limit when $\rho$ goes to zero, $P_\rho$ converges to p solution to (45) if the following conditions are satisfied:

(i) A stability condition, which reads $\max_{0 \le n \le N,\ 0 \le i \le I} |P_i^n| \le C$, for some constant C independent of N and I (i.e., independent of $\rho$).

(ii) A consistency condition: for any regular function $\psi$,

$$\lim_{\xi \to 0,\ \rho \to 0,\ t_n \to t,\ S_i \to S} \mathcal{S}_\rho\left(t_n, S_i, \psi(t_n, S_i) + \xi, \psi + \xi\right) = H\left(t, S, \psi, \frac{\partial \psi}{\partial t}, \frac{\partial \psi}{\partial S}, \frac{\partial^2 \psi}{\partial S^2}\right)(t, S)$$

For a weaker statement see Barles and Souganidis (1991).

(iii) A monotonicity condition, which reads

$$\varphi \le \psi \implies \mathcal{S}_\rho((t,S), P, \varphi) \ge \mathcal{S}_\rho((t,S), P, \psi)$$

For most standard financial options, a comparison principle holds. The stability and consistency conditions are close to the stability and consistency conditions already introduced in the case of the schemes for European options. Hence the new condition to check is the monotonicity assumption (which is related to the property (46a) satisfied by H). It is actually related to a discrete maximum principle.

Error estimates can also be obtained for the finite difference schemes (40)-(41)-(42). For example, for the EE scheme, an error estimate of order $\delta S^{1/2}$ in $L^\infty$-norm can be proved under a CFL condition and for Lipschitz initial data (see Jakobsen, 2003). In the context of the finite element method (see below) an error estimate of order $\delta S^2$ can be proved, but in the weaker $L^2$-norm.


Implementation and Convergence of the Finite Difference Schemes

It is easy to see, in view of (43), that the EE scheme is stable and monotone if the components of the matrix $(I_d - \delta t\,A)$ are nonnegative. This is exactly what is needed to prove a discrete maximum principle in the European case. This property holds under a CFL condition of the form $\delta t \le C\,\delta S^2$, C constant, and with an appropriate discretization of the advection term. The CN scheme is also stable and monotone under a CFL-like condition. On the other hand, it can be shown that the IE-S scheme as well as the IE scheme are stable and monotone without any CFL condition.

Now let us explain how to solve the implicit schemes (41) or (42) in practice. Let us consider the IE scheme (41). At each time step, setting $b = P^{n+1}$, $B = I_d + \delta t\,A$ and $g = (\phi(S_i))_i$, the problem is equivalent to finding $x = P^n$ such that

$$\min\left((Bx - b)_i,\ (x - g)_i\right) = 0, \quad \forall i \tag{48}$$

The Howard algorithm (see Howard, 1960; also called the policy iteration algorithm) is the method of choice to solve (48). To present this algorithm, we rewrite (48) in the following form: Find x such that

$$\min_{\alpha \in \{0,1\}^I} \left(B(\alpha)x - b(\alpha)\right)_i = 0, \quad \forall i \tag{49}$$

where

$$B_{i,j}(\alpha) = \begin{cases} B_{i,j} & \text{if } \alpha_i = 0 \\ \delta_{i,j} & \text{if } \alpha_i = 1 \end{cases} \qquad \text{and} \qquad b_i(\alpha) = \begin{cases} b_i & \text{if } \alpha_i = 0 \\ g_i & \text{if } \alpha_i = 1 \end{cases}$$

(where $\delta_{i,j}$ is again the Kronecker symbol, i.e., the (i, j)-th component of $I_d$). The i-th component of $B(\alpha)x - b(\alpha)$ only depends on the i-th component of $\alpha$, so that the minimum for the i-th component in (49) is indeed taken with respect to the i-th component of $\alpha$. Thus, for a given x and $\alpha$ realizing the minimum in (49), the component $\alpha_i$ is equal to 0 (resp. to 1) if, at the i-th node, the minimum in (48) is $(Bx - b)_i$ (resp. $(x - g)_i$). For an initial value⁵ $\alpha^0 \in \{0,1\}^I$, the algorithm is written as follows: Iterate for $k \ge 0$,

(i) Compute $x^k$ such that $B(\alpha^k)\,x^k = b(\alpha^k)$

(ii) $\alpha_i^{k+1} = \arg\min_{\alpha_i \in \{0,1\}} \left(B(\alpha)x^k - b(\alpha)\right)_i$

Santos and Rust (2004) and Bokanowski, Maroso, and Zidani (2009) provide some convergence results. Under suitable assumptions on the matrix B (which are satisfied for the schemes considered above, which satisfy the monotonicity condition), it can be proved that this method converges in at most I iterations. In practice, only a few iterations are needed for solving (41).
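The policy iteration can be sketched compactly when B is tridiagonal, as it is for the one-dimensional schemes above. The code below is a minimal illustration under our own assumptions: B is given by its three diagonals, the initial policy is $\alpha^0 = 0$ everywhere, ties in step (ii) are resolved in favor of $\alpha_i = 0$, and the iteration stops when the policy no longer changes. The names `howard` and `thomas` are illustrative.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def howard(lower, diag, upper, b, g, max_iter=100):
    """Policy iteration for min((Bx - b)_i, (x - g)_i) = 0 with tridiagonal B."""
    n = len(diag)
    alpha = [0] * n
    x = list(g)
    for _ in range(max_iter):
        # Step (i): rows with alpha_i = 1 are replaced by identity rows, rhs g_i.
        lo = [0.0 if alpha[i] else lower[i] for i in range(n)]
        di = [1.0 if alpha[i] else diag[i] for i in range(n)]
        up = [0.0 if alpha[i] else upper[i] for i in range(n)]
        rhs = [g[i] if alpha[i] else b[i] for i in range(n)]
        x = thomas(lo, di, up, rhs)
        # Step (ii): pick, node by node, the smaller of (Bx - b)_i and (x - g)_i.
        new_alpha = []
        for i in range(n):
            bx = diag[i] * x[i] - b[i]
            if i > 0:
                bx += lower[i] * x[i - 1]
            if i < n - 1:
                bx += upper[i] * x[i + 1]
            new_alpha.append(0 if bx <= x[i] - g[i] else 1)
        if new_alpha == alpha:
            break
        alpha = new_alpha
    return x


# Small illustrative obstacle problem with B = tridiag(-1, 3, -1).
solution = howard([0.0, -1.0, -1.0], [3.0, 3.0, 3.0], [-1.0, -1.0, 0.0],
                  [1.0, 1.0, 1.0], [0.0, 2.0, 0.0])
print(solution)
```

In this example the obstacle is active only at the middle node, and the policy stabilizes after two linear solves, consistent with the observation that few iterations are needed in practice.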

This algorithm can also be seen as:

• A Newton's method on the function F defined by $F_i(x) = \min\left((Bx - b)_i,\ (x - g)_i\right)$. More precisely, it is a semismooth Newton's method applied to the slantly differentiable function⁶ F.

• A primal-dual algorithm to implement the fully implicit Euler scheme (41) as introduced in Hintermüller, Ito, and Kunisch (2002).

Another well-known method for solving (48) is the projected successive over relaxation (PSOR) method, which is a modification of the successive over relaxation (SOR) method to solve iteratively systems of linear equations (see Saad, 2003). In its simplest form, it consists of decomposing $B = L + U$ where L is a lower triangular matrix and U is an upper triangular matrix with zero coefficients on the diagonal. The algorithm consists of choosing an initial guess $x^0$ and then computing iteratively, for $n \ge 1$, the vector $x^n$ such that $\min\left\{\left(Lx^n - (b - Ux^{n-1})\right)_i,\ (x^n - g)_i\right\} = 0$ for all i. This method converges only if B is a diagonally dominant matrix, and the convergence is rather slow in practice. However, it leads to a highly efficient method for the finite element method discussed below, when combined with a suitable splitting scheme.

For the specific case of an American put option on a single asset, a fast algorithm was proposed by Brennan and Schwartz (1977) for solving (41) and proved to converge in Jaillet, Lamberton, and Lapeyre (1990) in the finite element setting (see also Bokanowski, Maroso, and Zidani [2009] in the finite difference setting). Also in this case it can be proved that the continuation region (namely $\Gamma_t = \{x \in \mathbb{R}_+,\ p(t,x) > \phi(x)\}$, where the option is not exercised) is of the form $\Gamma_t = [\gamma(t), \infty[$ where $\gamma$ is continuous under some regularity assumption of the data. Then the Howard algorithm takes a simple form, which is known as the front-tracking algorithm (see, for instance, Achdou and Pironneau, 2005). However, these algorithms are very specific to the one-dimensional case and do not apply for general payoff functions.

Table 4 Error on the Value of an American Put as a Function of the Number I of Intervals in the Variable S (and for N = 1000)

| (N = 1000) | I = 100 | I = 200 | I = 400 | I = 800 | I = 1600 |
|---|---|---|---|---|---|
| IE-S | 0.00267 | 0.0361 | 0.00180 | 0.00210 | 0.00210 |
| IE | 0.00379 | 0.0146 | 0.00011 | 0.00024 | 0.00018 |

Numerical Results for the 
American Put Option 

In Table 4, we give numerical results obtained 
with the IE-S and IE schemes for an Ameri¬ 
can put option with r = 0.1, a — 0.1, K — 100, 
T = 1, S = 100, and S max = 150. We have com¬ 
puted all error values by taking the reference 
value P = 1.63380 (obtained with a Cox-Ross- 
Rubinstein algorithm with N — 10 5 , CPU-time 
= 1750s.; see Cox, Ross, and Rubinstein [1979] 
and Lamberton and Lapeyre [1997]). In this ex¬ 
ample, the IE scheme is one digit more accurate 
than the IE-S scheme. With these numerical pa¬ 
rameters, the EE scheme would yield bad re¬ 
sults since the CFL condition is not respected. 
The IE scheme has been implemented using the 
Howard algorithm. The remaining error when 
I is large is due to the time discretization. 

In Table 5, we compare the EE, IE-S, and IE schemes. Since the error is of order $O(\delta t) + O(\delta S^2)$, we have used parameters $N$ and $I$ such that $\delta t \sim \delta S^2$ (i.e., $N \sim I^2$), and such that the CFL condition is satisfied. We remark that the EE scheme gives satisfactory results in less than a few seconds. The IE scheme is more accurate but more costly than IE-S. Hence, in view of the CPU-time, it is more advantageous here to use simply the EE or the IE-S scheme. This conclusion holds for a finite difference discretization, but may be different for a finite element discretization, or for another set of parameters.

Table 5 Error and CPU-Times for the Value of an American Put as a Function of the Number N of Time-Steps and the Number I of Intervals in the Variable S

Scheme  Quantity          I = 100    I = 200    I = 400    I = 800    I = 1600
                          N = 100    N = 400    N = 1600   N = 6400   N = 25000
EE      Error             0.00593    0.00069    0.00045    0.00003    0.00003
        CPU-time (sec.)   0.01       0.10       0.5        2.6        10.7
IE-S    Error             0.01177    0.00616    0.00098    0.00029    0.00007
        CPU-time (sec.)   0.05       0.22       1.23       7.06       44.31
IE      Error             0.00201    0.00181    0.00016    0.00004    0.00001
        CPU-time (sec.)   0.2        0.9        7.3        75.0       1033.0
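To make the explicit Euler (EE) scheme of this comparison concrete, the following Python sketch prices the same American put ($r = 0.1$, $\sigma = 0.1$, $K = 100$, $T = 1$, $S_{\max} = 150$) on a uniform grid. It is only an illustration under assumptions that do not come from the text: centered differences for the first derivative, boundary values $P(0) = K$ and $P(S_{\max}) = 0$, and a hypothetical function name `american_put_explicit` with arbitrarily chosen default grid sizes. It is not the implementation used to produce the tables.

```python
import numpy as np


def american_put_explicit(K=100.0, r=0.1, sigma=0.1, T=1.0,
                          s_max=150.0, I=150, N=400):
    """Explicit Euler finite difference sketch for an American put.

    Marches backward from maturity and, after each explicit step,
    takes the maximum with the payoff (early exercise constraint).
    """
    dS = s_max / I
    dt = T / N
    S = np.linspace(0.0, s_max, I + 1)

    # CFL-type restriction used here (an assumption of this sketch):
    # dt <= dS^2 / (sigma^2 * s_max^2).
    if dt > dS**2 / (sigma**2 * s_max**2):
        raise ValueError("CFL condition violated: increase N or decrease I")

    payoff = np.maximum(K - S, 0.0)
    P = payoff.copy()          # value at maturity
    Si = S[1:-1]               # interior nodes

    for _ in range(N):
        d2 = (P[2:] - 2.0 * P[1:-1] + P[:-2]) / dS**2   # second derivative
        d1 = (P[2:] - P[:-2]) / (2.0 * dS)              # centered first derivative
        AP = -0.5 * sigma**2 * Si**2 * d2 - r * Si * d1 + r * P[1:-1]
        # Explicit step followed by projection onto the obstacle.
        P[1:-1] = np.maximum(P[1:-1] - dt * AP, payoff[1:-1])
        # Boundary values P[0] = K and P[-1] = 0 are left unchanged.
    return S, P


if __name__ == "__main__":
    S, P = american_put_explicit()
    print("American put at S = 100:", float(np.interp(100.0, S, P)))
```

Refining `I` requires increasing `N` roughly like $I^2$ to keep the stability check satisfied, which is the scaling $\delta t \sim \delta S^2$ discussed above.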


Markov Chains Approximations 

There exist related discretization schemes for American options based on Markov chain approximations. Markov chain schemes (see Kushner and Dupuis, 2001) are based on the approximation of the dynamic programming principle between times $t$ and $t + \delta t$ and on the use of a spatial interpolation over a mesh $(S_j)$. This leads to another class of schemes that are also in finite difference form. Their convergence can be established by showing the convergence to the dynamic programming equation, or by using the Barles-Souganidis theorem (see Barles and Souganidis, 1991). Finite difference schemes enter this framework, as do semi-Lagrangian schemes (Capuzzo-Dolcetta and Falcone, 1989; Falcone and Ferretti, 1994). An inverse CFL condition, typically a constraint on the ratio $\delta S^2/\delta t$, can then be needed. Notice that the Cox-Ross-Rubinstein algorithm (Cox, Ross, and Rubinstein, 1979) can also be seen as a discrete Markov chain approximation scheme using a very particular spatial mesh such that no interpolation appears at the end.
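The remark on the Cox-Ross-Rubinstein algorithm can be illustrated with a short Python sketch of a binomial tree for an American put. The recombining lattice $S_0 u^{2j-n}$ is the "particular spatial mesh": each backward step of the dynamic programming principle lands exactly on lattice nodes, so no interpolation is needed. The function name `crr_american_put` and the number of steps are illustrative choices, not taken from the text.

```python
import numpy as np


def crr_american_put(S0, K, r, sigma, T, N):
    """American put on a Cox-Ross-Rubinstein binomial tree with N steps."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))        # up factor
    d = 1.0 / u                            # down factor
    q = (np.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = np.exp(-r * dt)

    # Values at maturity: node j has had j up-moves out of N.
    j = np.arange(N + 1)
    V = np.maximum(K - S0 * u ** (2 * j - N), 0.0)

    # Backward induction: discounted expectation, then early exercise.
    for n in range(N - 1, -1, -1):
        j = np.arange(n + 1)
        S = S0 * u ** (2 * j - n)
        continuation = disc * (q * V[1:n + 2] + (1.0 - q) * V[0:n + 1])
        V = np.maximum(continuation, K - S)
    return float(V[0])


if __name__ == "__main__":
    # Parameters of the numerical example: r = 0.1, sigma = 0.1, K = S = 100, T = 1.
    print(crr_american_put(100.0, 100.0, 0.1, 0.1, 1.0, 1000))
```

With a moderate number of steps the result should be close to the reference value $P = 1.63380$ quoted for the numerical example; the cost grows like $N^2$, which is consistent with the long CPU time reported for $N = 10^5$.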


Portfolio Optimization 

The techniques developed above for pricing American options can be used in the context of portfolio optimization. A portfolio optimization problem (or stochastic optimization problem) is typically of the form

$$
p(t,x) = \max_{\alpha \in L^\infty([t,T],K)} \mathbb{E}\left( \int_t^T e^{-\int_t^u r(s)\,ds}\, f\big(u, X_u^{t,x,\alpha}, \alpha(u)\big)\,du + e^{-\int_t^T r(s)\,ds}\, \varphi\big(X_T^{t,x,\alpha}\big) \right) \tag{50}
$$

where $K$ is compact, $\alpha$ is a progressively measurable function with values in $K$, and with

$$
\begin{cases}
dX_u^{t,x,\alpha} = b\big(u, X_u^{t,x,\alpha}, \alpha(u)\big)\,du + \sigma\big(u, X_u^{t,x,\alpha}, \alpha(u)\big)\,dW_u, & u \ge t, \\
X_t^{t,x,\alpha} = x
\end{cases}
$$

The corresponding PDE can be shown to be

$$
\min_{\alpha \in K} \left( -\frac{\partial p}{\partial t} - \frac{1}{2}\sigma^2(t,S,\alpha)\frac{\partial^2 p}{\partial S^2} - b(t,S,\alpha)\frac{\partial p}{\partial S} + rp - f(t,S,\alpha) \right) = 0
$$

in the viscosity sense (see Pham, 2006). Finite difference schemes similar to those presented above for American options can be applied. Implicit schemes, if considered, can be solved by the Howard algorithm. This can also be generalized to an optimal stopping time problem, adding in (50) a supremum over stopping times with values in $[t, T]$ (as in (15)). For such general HJB equations, a discretization by a finite element approach is not always possible because an appropriate variational formulation cannot always be obtained; see Bensoussan and Lions
(1978).

The Finite Element Approach for American Options

As in the European case, the finite element approach requires a variational formulation of the PDE (23). Let us consider the case of the American put option. Let $V$ be the functional space used for the variational formulation, and

$$
K = \{ q \in V,\ q \ge \varphi \}
$$

We first notice that (23) is equivalent to the set of inequalities (see Note 7), together with $p(T, S) = \varphi(S)$:

$$
-\frac{\partial p}{\partial t} + Ap \ge 0, \qquad p - \varphi \ge 0, \qquad \left( -\frac{\partial p}{\partial t} + Ap \right)(p - \varphi) = 0 \tag{51}
$$

We can check that this is equivalent (for a sufficiently smooth function $p$) to the following variational formulation for (23): find $p \in L^2([0, T], K) \cap C^0([0, T], L^2(\mathbb{R}_+))$ such that for all $t \in [0, T)$,

$$
\forall q \in K, \quad -\left( \frac{\partial p}{\partial t},\, q - p \right) + a(p, q - p) \ge 0 \tag{52}
$$

where $a$ is the bilinear form (35) defined above (recall that for compactly supported functions $p$ and $q$, $a(p, q) = \int (Ap)\,q$), with the final condition

$$
p(T, S) = \varphi(S)
$$

Indeed, by writing $q - p = (q - \varphi) - (p - \varphi)$, it is clear that (51) implies (52). Conversely, choosing a sufficiently large $q \in K$ in (52), we obtain that $-\frac{\partial p}{\partial t} + Ap \ge 0$. Taking then $q = \varphi$ in (52), we obtain that $\left( -\frac{\partial p}{\partial t} + Ap,\ \varphi - p \right) \ge 0$, but this inequality is actually an equality since $-\frac{\partial p}{\partial t} + Ap \ge 0$ and $\varphi - p \le 0$.

Notice that if we take $K = V$ in (52), we recover the variational formulation (34) for the European option. Precise existence and uniqueness results for such variational inequalities can be found in Bensoussan and Lions (1978). For results and applications in the finance context, we refer to Achdou and Pironneau (2005 and 2009).

Now, as in the case of the finite element method for European options, we introduce a sequence of finite dimensional functional spaces $V_h \subset V$, such that the functions in $V$ are better and better approximated by functions in $V_h$ as $h$ goes to 0. One can, for example, consider a P1 finite element space on a mesh $(S_i)_{0 \le i \le I}$. Remember that a basis of $V_h$ is given by a set of functions $(\psi_i)_{0 \le i \le I}$. The finite element approximation of (52) is obtained by replacing $V$ by $V_h$: Find $p_h \in C^0([0, T], K \cap V_h)$ such that for all $t \in [0, T)$,

$$
\forall q_h \in K \cap V_h, \quad -\left( \frac{\partial p_h}{\partial t},\, q_h - p_h \right) + a(p_h, q_h - p_h) \ge 0 \tag{53}
$$

with the final condition $p_h(T) = \varphi_h$, where $\varphi_h \in V_h$ is an approximation of $\varphi$.

For time discretization, one can again use the schemes we have introduced in the case of the discretization of European options. For example, the implicit Euler scheme applied to (53) is naturally defined as follows: Find $p_h^N, p_h^{N-1}, \ldots, p_h^0$ in $V_h \cap K$ such that $p_h^N = \varphi_h$ (initialization) and, for $n = N - 1, \ldots, 0$:

$$
\forall q_h \in V_h \cap K, \quad -\left( \frac{p_h^{n+1} - p_h^n}{\delta t},\, q_h - p_h^n \right) + a(p_h^n, q_h - p_h^n) \ge 0 \tag{54}
$$

One can easily check that

$$
q_h \in V_h \cap K \iff q_h \in V_h \text{ and } q_h(S_i) \ge \varphi(S_i),\ \forall i
$$

Denoting by $M_h$ and $A_h$ the mass and stiffness matrices as in the case of the finite element method for European options, and reasoning as for the equivalence between (23), (51), and (52), it can be checked that (54) is equivalent to solving, for the vector $P^n$ of nodal values,

$$
\min\left( \left( -M_h \frac{P^{n+1} - P^n}{\delta t} + A_h P^n \right)_i,\ (P^n - g)_i \right) = 0, \quad \forall i
$$

where $g_i = \varphi(S_i)$ and $P_i^n = p_h^n(S_i)$. Equivalently, the problem is to find $P^n$ such that

$$
\min\left( \big( (M_h + \delta t\, A_h) P^n - M_h P^{n+1} \big)_i,\ (P^n - g)_i \right) = 0, \quad \forall i
$$

This is a similar problem as for the IE finite difference scheme (see (48)), where the matrix $(\mathrm{Id} + \delta t\, A)$ is now replaced by $(M_h + \delta t\, A_h)$. It can be solved by the Howard algorithm previously presented. For the particular American put problem, under some assumptions on the mesh steps $\delta t$ and $h$, it can also be solved by the Brennan and Schwartz algorithm or the front-tracking algorithm mentioned above.

Figure 3 The Adapted Mesh and the Contours of P One Year to Maturity. $\sigma_1 = 0.2$, $\sigma_2 = 0.1$, $\rho = -0.6$.
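A problem of the form $\min((Bx - b)_i, (x - g)_i) = 0$ for all $i$, with for instance $B = M_h + \delta t\, A_h$ and $b = M_h P^{n+1}$, can be handled by a policy iteration of Howard type. The Python sketch below is a generic dense-matrix illustration, not the authors' implementation: the function name `howard`, the stopping rule, and the small test problem are assumptions made for the example, and finite termination is only expected when $B$ has suitable monotonicity properties (for example, an M-matrix).

```python
import numpy as np


def howard(B, b, g, max_iter=100):
    """Policy iteration for min((B x - b)_i, (x - g)_i) = 0 for all i.

    At each iteration, row i of the linear system is taken from B
    (equation B x = b) when (B x - b)_i <= (x - g)_i, and from the
    identity (equation x_i = g_i) otherwise.
    """
    n = len(b)
    identity = np.eye(n)
    x = np.array(g, dtype=float)      # initial guess: the obstacle itself
    previous_policy = None
    for _ in range(max_iter):
        policy = (B @ x - b) <= (x - g)
        if previous_policy is not None and np.array_equal(policy, previous_policy):
            return x                  # policy is stable: x solves the problem
        G = np.where(policy[:, None], B, identity)
        rhs = np.where(policy, b, g)
        x = np.linalg.solve(G, rhs)
        previous_policy = policy
    raise RuntimeError("Howard iteration did not converge")


if __name__ == "__main__":
    # Small illustrative problem: tridiagonal, diagonally dominant matrix.
    n = 50
    s = np.linspace(0.0, 2.0, n)
    g = np.maximum(1.0 - s, 0.0)      # put-like obstacle
    c, rho = 5.0, 0.1
    B = (1.0 + rho + 2.0 * c) * np.eye(n) - c * np.eye(n, k=1) - c * np.eye(n, k=-1)
    x = howard(B, g.copy(), g)
    print("max residual:", np.max(np.abs(np.minimum(B @ x - g, x - g))))
```

Each iteration costs one linear solve; in a real implementation the tridiagonal (or sparse) structure of $M_h + \delta t\, A_h$ would be exploited instead of a dense solve.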


Notice that a Crank-Nicolson scheme can be derived in a similar way. The expected error (in $L^2$-norm) is (as in the European case) $O(h^2) + O(\delta t)$ for the IE scheme and $O(h^2) + O(\delta t^2)$ for the CN scheme.

We conclude this section by a numerical illustration of the mesh refinement procedure (that can be implemented by using a posteriori error estimates) applied to the pricing of an American option on two assets. Such an automatic refinement procedure is particularly useful for American options because the pricing function is not smooth at any given time $t \in [0, T]$. Figures 3 and 4 illustrate such a mesh refinement for a typical two-asset American option with payoff $\varphi(S_1, S_2) = (K - \max(S_1, S_2))_+$. The artificial boundary $\Gamma_0$ is $\{\max(S_1, S_2) = \bar{S} = 200\}$. Homogeneous Dirichlet conditions are imposed on $\Gamma_0$. We have chosen two examples. In the first example, the parameters are $\sigma_1 = 0.2$, $\sigma_2 = 0.1$, $r = 0.05$, $\rho = -0.6$, and $K = 100$. In the second example, the parameters are $\sigma_1 = \sigma_2 = 0.2$, $r = 0.05$, $\rho = 0$, and $K = 50$. The implicit Euler scheme has been used with a uniform time step of 1/250 year. For the variables $S_1$ and $S_2$, we have used adaptive finite elements. For solving the linear complementarity problems, we have used the Howard algorithm. Mesh adaption in the $(S_1, S_2)$ variable has been performed every 1/10 year. In Figure 3, we have plotted the adapted mesh (left) and the contours of the pricing function (right) one year to maturity for the first example. Note that the contours exhibit right angles in the exercise region. In Figure 4, we plot the exercise region one year to maturity for the first example (left) and for the second example (right). One sees that the exercise boundary has singularities. It is also visible that the mesh has been adapted near the exercise boundary.

Figure 4 The Exercise Region One Year to Maturity. Left: $K = 100$, $\sigma_1 = 0.2$, $\sigma_2 = 0.1$, $\rho = -0.6$. Right: $K = 50$, $\sigma_1 = \sigma_2 = 0.2$, $\rho = 0$.

CALIBRATION 

Let us now discuss the question of the determination of the parameters that appear in the models we introduced, with an emphasis on the calibration of the local volatility.

Limitation of the Black-Scholes 
Model: The Need for Calibration 

Consider a European-style option on a given stock with a maturity $T$ and a payoff function $\varphi$, and assume that this option is on the market. Call $p$ its present price. Also, assume the risk-free interest rate is the constant $r$. One may associate with $p$ the so-called implied volatility, that is, the volatility $\sigma_{imp}$ such that the price given by formula (4) at time $t = 0$ with $\sigma = \sigma_{imp}$ coincides with $p$. If the Black-Scholes model was sharp, then the implied volatility would not depend on the payoff function $\varphi$. Unfortunately,


for vanilla European puts or calls, for example, 
it is often observed that the implied volatility 
is far from constant. Rather, it is often a convex 
function of the strike price. This phenomenon is 
known as the volatility smile. A possible expla¬ 
nation for the volatility smile is that the deeply 
out-of-the-money options are less liquid, thus 
relatively more expensive than the options in- 
the-money. 
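As an illustration of how an implied volatility is extracted in practice, the Python sketch below inverts the closed-form Black-Scholes price of a vanilla call by bisection. It assumes a call without dividends; the function names `bs_call` and `implied_vol` and the bracketing interval are illustrative choices rather than anything prescribed in the text. Bisection works here because the call price is increasing in the volatility.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


def implied_vol(price, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-10):
    """Volatility sigma such that bs_call(S, K, r, sigma, T) == price."""
    if not (bs_call(S, K, r, lo, T) <= price <= bs_call(S, K, r, hi, T)):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, T) < price:
            lo = mid      # price too low: volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p = bs_call(100.0, 110.0, 0.05, 0.3, 0.5)
    print(implied_vol(p, 100.0, 110.0, 0.05, 0.5))
```

Applying `implied_vol` to observed prices of calls with the same maturity and different strikes would produce the strike-dependent curve referred to above as the volatility smile.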

This shows that the critical parameter in the Black-Scholes model is the volatility $\sigma$. Assuming $\sigma$ constant and using (8) often leads to poor predictions of the options' prices. The volatility smile is the price paid for the excessive simplicity of the Black-Scholes assumptions.

Let us now discuss some of the possible en¬ 
richments of the Black-Scholes model: 

• Local volatility models: The volatility is a function of time and of the spot price, that is, $\sigma_t = \sigma(t, S_t)$. With suitable assumptions on the regularity and the behavior at infinity of the function $\sigma$, (4) holds, and $P_t = p(t, S_t)$, where $p$ satisfies the final value problem (8), in which $\sigma$ varies with $t$ and $S$. Calibrating the model consists of tuning the function $\sigma$ in such a way that the prices computed, for example, with the PDE coincide with the observed prices. This will be discussed in detail below.

• Stochastic volatility models: One assumes that $\sigma_t = f(y_t)$, where $y_t$ is a continuous time stochastic process, correlated or not to the process driving $S_t$; see Fouque, Papanicolaou, and Sircar (2000) for a nice presentation. Several models have been proposed, among which are the following:

1. Hull-White model (see Hull and White, 1987): $f(y) = \sqrt{y}$ and $y_t$ is a lognormal process.

2. Scott model: $f(y) = \sqrt{y}$ and $y_t$ is a mean-reverting Ornstein-Uhlenbeck process:

$$
dy_t = \alpha(m - y_t)\,dt + \beta\, dZ_t \tag{55}
$$

where $\alpha$ and $\beta$ are positive constants, and $Z_t$ is a Brownian motion.

3. Heston model (see Heston, 1993): $f(y) = \sqrt{y}$ and $y_t$ is a Cox-Ingersoll-Ross process,

$$
dy_t = \kappa(m - y_t)\,dt + \lambda\sqrt{y_t}\, dZ_t \tag{56}
$$

where $\kappa$, $m$, and $\lambda$ are positive constants.

4. Stein-Stein model (see Stein and Stein, 1991): $f(y) = \sqrt{y}$ and $y_t$ is a mean-reverting Ornstein-Uhlenbeck process.

There are two risk factors, one for the stock price and the other for the volatility. If the two driving processes are not completely correlated, it is not possible to construct a hedged portfolio containing simply one option and shares of the underlying asset. One says that the market is incomplete. Nevertheless, if one fixes the contribution of the second source of randomness $dZ_t$ to the risk premium, that is, the market price of the volatility risk or the risk premium factor as a function of $t$, $S_t$, and $y_t$, then it is possible to prove that the option's price is of the form $P_t = p(t, S_t, y_t)$, where the pricing function satisfies a PDE in the variables $(t, S, y)$. The PDE may be degenerate for the values of $y$ at which the volatility vanishes. Calibrating the model consists of tuning the parameters of the process $y_t$ and the function $f$ in order to match the observed prices.

• Lévy-driven spot price: One may generalize the Black-Scholes model by assuming that the spot price is driven by a more general stochastic process, for example, a Lévy process (see Cont and Tankov, 2003; Merton, 1976; and Carr, Geman, and Yor, 2002). Lévy processes are processes with stationary and independent increments which are continuous in probability. For a Lévy process $X_\tau$ on a filtered probability space with probability $\mathbb{P}^*$, the Lévy-Khintchine formula says that there exists a function $\chi : \mathbb{R} \to \mathbb{C}$ such that

$$
\mathbb{E}^*\big(e^{iuX_\tau}\big) = e^{-\tau\chi(u)}, \qquad
\chi(u) = \frac{\sigma^2 u^2}{2} - i\beta u - \int_{|z|<1} \big(e^{iuz} - 1 - iuz\big)\,\nu(dz) - \int_{|z| \ge 1} \big(e^{iuz} - 1\big)\,\nu(dz)
$$

for $\sigma \ge 0$, $\beta \in \mathbb{R}$, and a positive measure $\nu$ on $\mathbb{R}\setminus\{0\}$ such that $\int_{\mathbb{R}} \min(1, z^2)\,\nu(dz) < +\infty$. The measure $\nu$ is called the Lévy measure of $X$. We focus on the Lévy measure with a density, $\nu(dz) = k(z)\,dz$. It is assumed that the discounted price of the risky asset is a square integrable martingale under $\mathbb{P}^*$, and that it is represented as the exponential of a Lévy process:

$$
e^{-r\tau} S_\tau = S_0\, e^{X_\tau}
$$

The martingale property is that $\mathbb{E}^*(e^{X_\tau}) = 1$, i.e.,

$$
\int_{|z| \ge 1} e^{z}\,\nu(dz) < \infty
\quad \text{and} \quad
\beta = -\frac{\sigma^2}{2} - \int_{\mathbb{R}} \big(e^{z} - 1 - z\,1_{|z|<1}\big)\,k(z)\,dz
$$

and the square integrability comes from the condition $\int_{|z| \ge 1} e^{2z} k(z)\,dz < \infty$.

With such models, the pricing function for a European option is obtained by solving a partial integrodifferential equation (PIDE), with a nonlocal term. Calibrating the model consists of tuning $\sigma$ and the function $k$ in such a way that the prices computed with the PIDE, for example, match the observed prices (see Cont and Tankov, 2004).


Local Volatility and Dupire's Formula

We consider a local volatility model and call $(t, S) \mapsto C(t, S, \tau, x)$ the pricing function for a vanilla European call with maturity $\tau$ and strike $x$. It satisfies the final value problem: for $t \in [0, \tau)$ and $S \in \mathbb{R}_+$,

$$
\frac{\partial C}{\partial t} + \frac{\sigma^2(t, S)\,S^2}{2}\frac{\partial^2 C}{\partial S^2} + (r - q)\,S\,\frac{\partial C}{\partial S} - rC = 0, \qquad C(\tau, S) = (S - x)_+ \tag{57}
$$

where we have supposed that the underlying asset yields a distributed dividend, $qS_t\,dt$. By reasoning directly on (4) or by using PDE arguments, it can be proved that the function $(\tau, x) \mapsto C(t, S, \tau, x)$ (now $t$ and $S$ are fixed) satisfies the forward parabolic PDE:

$$
\frac{\partial C}{\partial \tau} - \frac{1}{2}\sigma^2(\tau, x)\,x^2\,\frac{\partial^2 C}{\partial x^2} + (r - q)\,x\,\frac{\partial C}{\partial x} + qC = 0 \tag{58}
$$

for $\tau > t$ and $x \in \mathbb{R}_+$. This observation was first made by Dupire (1994), and the proof of (58) by PDE arguments can be found in Achdou and Pironneau (2005) or Pironneau (2007). We also mention that similar partial differential equations can be derived for other options, like binary options, barrier options, options on Lévy-driven assets, or basket options (see Pironneau, 2007).

Using (58) is useful for two reasons. First, consider a family of calls on the same stock with different maturities and strikes $(\tau_i, x_i)$, $i \in I$, where $I$ is a finite set. Assume that the spot price is known, that is, $S = S_0$. In order to numerically compute the prices of the calls, that is, $C(0, S_0, \tau_i, x_i)$, $i \in I$, one may solve (58) for $0 < \tau \le \max_{i \in I} \tau_i$ and initial data $C(\tau = 0, x) = (S_0 - x)_+$ with, for example, a finite difference or a finite element method. Only one initial value problem is needed. On the contrary, using (8) would necessitate solving $\#I$ initial value problems. We see that (58) may save a lot of work.

Second, (58) may be used for local volatility calibration. Indeed, if all the possible vanilla options were on the market, the local volatility in (57) could be computed:

$$
\sigma^2(\tau, x) = 2\,\frac{\dfrac{\partial C}{\partial \tau}(\tau, x) + (r - q)\,x\,\dfrac{\partial C}{\partial x}(\tau, x) + q\,C(\tau, x)}{x^2\,\dfrac{\partial^2 C}{\partial x^2}(\tau, x)} \tag{59}
$$

This is known as Dupire's formula for the local volatility. In practice, (59) cannot be used directly because only a finite number of options are on the market.
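To see formula (59) at work, the Python sketch below applies it to a price surface that is known in closed form: Black-Scholes call prices with a constant volatility and a continuous dividend yield $q$. The derivatives in (59) are approximated by central finite differences, and the formula should then return the constant volatility. The names `bs_call_div` and `dupire_local_vol` and the step sizes are illustrative assumptions; with real market data the surface is only known at a few points, which is precisely the difficulty discussed in the text.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_div(S0, x, r, q, sigma, tau):
    """Black-Scholes call price with dividend yield q, strike x, maturity tau."""
    d1 = (log(S0 / x) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S0 * exp(-q * tau) * norm_cdf(d1) - x * exp(-r * tau) * norm_cdf(d2)


def dupire_local_vol(C, tau, x, r, q, h_tau=1e-4, rel_h_x=1e-3):
    """Local volatility from formula (59) for a smooth surface C(tau, x).

    Partial derivatives are replaced by central finite differences.
    """
    h_x = rel_h_x * x
    C_tau = (C(tau + h_tau, x) - C(tau - h_tau, x)) / (2.0 * h_tau)
    C_x = (C(tau, x + h_x) - C(tau, x - h_x)) / (2.0 * h_x)
    C_xx = (C(tau, x + h_x) - 2.0 * C(tau, x) + C(tau, x - h_x)) / h_x**2
    sigma2 = 2.0 * (C_tau + (r - q) * x * C_x + q * C(tau, x)) / (x**2 * C_xx)
    return sqrt(sigma2)


if __name__ == "__main__":
    S0, r, q, sigma = 100.0, 0.05, 0.02, 0.2
    surface = lambda tau, x: bs_call_div(S0, x, r, q, sigma, tau)
    print(dupire_local_vol(surface, 1.0, 110.0, r, q))   # close to 0.2
```

The division by $x^2\,\partial^2 C/\partial x^2$ is the fragile step: for strikes far from the money the second derivative is tiny, so small errors in the prices are strongly amplified.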

Assume that the observations are the prices $(C_i)_{i \in I}$ of a family of calls with maturity/strike $(\tau_i, x_i)_{i \in I}$. Finding a function $(\tau, x) \mapsto \sigma(\tau, x)$ such that the solution of (58) with $C(0, x) = (S_0 - x)_+$ takes the value $C_i$ at $(\tau_i, x_i)$, $i \in I$, is called an inverse problem.


A natural idea is to somehow interpolate the observed prices by a sufficiently smooth function $\hat{C} : [0, \max_{i \in I} \tau_i] \times \mathbb{R}_+ \to \mathbb{R}_+$, then use (59) with $C = \hat{C}$. For example, bicubic splines may be used. This approach has several serious drawbacks:

• It is difficult to design an interpolation process such that $\frac{\partial^2 \hat{C}}{\partial x^2}$ does not take the value 0, and such that the right-hand side of (59) is nonnegative.

• There is an infinity of possible interpolations of $C_i$ at $(\tau_i, x_i)$, $i \in I$, and for two possible choices, the volatility obtained by (59) may differ considerably.

We see that financially relevant additional information has to be added to the interpolation process.

Least-Square Methods 

Here, we show how (58) can be used for calibration. The first idea is to use least squares, that is, to minimize a functional $J : \sigma \mapsto \sum_{i \in I} \omega_i\, |C_i - C(\tau_i, x_i)|^2$ for $\sigma$ in a suitable function set $\mathcal{T}$, where $\omega_i$ are positive weights, and the pricing function $C$ is the solution of (58) with $C(0, x) = (S_0 - x)_+$. The evaluation of $J$ requires the solution of an initial value problem. The set $\mathcal{T}$ where the volatility is to be found must be chosen in order to ensure that from a minimizing sequence one can extract at least a subsequence that converges in $\mathcal{T}$, and that its limit is indeed a solution of the least square problem. For example, $\mathcal{T}$ may be a compact subset of a Hilbert space $W$ (in principle $W$ could be a more general Banach space, but it is easier to work in Hilbert spaces if gradients are needed) such that the mapping $J$ is continuous in $W$. In practice, $W$ has a finite dimension and is compactly embedded in the space of bounded and continuous functions $\sigma$ such that $x\,\partial_x \sigma$ is bounded. Thus, the existence of a solution to the minimization problem is most often guaranteed. What is more difficult to guarantee is uniqueness and stability: Is there a unique solution to the least square problem? If yes, is the solution insensitive to small variations of the data? The answer to these questions is no in general, and we say that the problem is ill-posed.

As a possible cure to ill-posedness, one usually modifies the problem by minimizing the functional $\sigma \mapsto J(\sigma) + J_R(\sigma)$ instead of $J$, where $J_R$ is a sufficiently large strongly convex functional defined on $W$ and containing some financially relevant information. For example, one may choose $J_R(\sigma) = \omega \|\sigma - \bar{\sigma}\|^2$, where $\omega$ is some positive weight, $\|\cdot\|$ is a norm in $W$, and $\bar{\sigma}$ is a prior local volatility, which may come from historical knowledge. The difficulty is that $\omega$ must be small enough not to perturb the inverse problem too much, yet large enough to guarantee some stability. The art of the practitioner lies in the choice of $J_R$.
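The role of the regularization weight can be seen on a deliberately simplified example. The Python sketch below calibrates a single constant volatility (rather than a function $\sigma(\tau, x)$) to a handful of call prices, using the closed-form Black-Scholes price in place of a numerical solution of (58), and minimizes $J(\sigma) + \omega(\sigma - \bar{\sigma})^2$ by a golden-section search. All names (`calibrate_constant_vol`, `omega_reg`, `prior`), the synthetic data, and the search interval are hypothetical choices made for the illustration.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S0, x, r, sigma, tau):
    """Black-Scholes call price with strike x and maturity tau (no dividends)."""
    d1 = (log(S0 / x) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S0 * norm_cdf(d1) - x * exp(-r * tau) * norm_cdf(d2)


def calibrate_constant_vol(market, S0, r, prior, omega_reg,
                           lo=0.01, hi=2.0, tol=1e-8):
    """Minimize sum_i (C_i - C(tau_i, x_i))^2 + omega_reg * (sigma - prior)^2.

    `market` is a list of (tau_i, x_i, C_i); unit weights are used for the
    data term. The minimization is a golden-section search on [lo, hi].
    """
    def cost(sigma):
        data = sum((c - bs_call(S0, x, r, sigma, tau)) ** 2
                   for tau, x, c in market)
        return data + omega_reg * (sigma - prior) ** 2

    ratio = (sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


if __name__ == "__main__":
    S0, r, true_sigma = 100.0, 0.05, 0.25
    market = [(1.0, x, bs_call(S0, x, r, true_sigma, 1.0))
              for x in (80.0, 90.0, 100.0, 110.0, 120.0)]
    print(calibrate_constant_vol(market, S0, r, prior=0.2, omega_reg=0.0))
    print(calibrate_constant_vol(market, S0, r, prior=0.2, omega_reg=5000.0))
```

With `omega_reg = 0` the synthetic volatility is recovered; with a large weight the result is pulled toward the prior, which is the trade-off between fidelity to the data and stability described above.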

Once the least square problem is chosen, we are left with proposing a strategy for the construction of minimizing sequences. If $J$ and $J_R$ are $C^1$ functionals, then gradient methods may be used. The drawbacks and advantages of such methods are well known: On the one hand, they do not guarantee convergence to the global minimum if the functional is not convex, because the iterates can be trapped near a local minimum. On the other hand, they are fast and accurate when the initial guess is close enough to the minimum. For these reasons, gradient methods are often combined with techniques that permit us to localize the global minimum but that are slow, like simulated annealing or evolutionary algorithms.

Anyhow, gradient methods require the evaluation of the functional's gradient. Since $J_R$ explicitly depends on $\sigma$, its gradient is easily computed. The gradient of $J$ is more difficult to evaluate, because the prices $C(\tau_i, x_i)$ depend on $\sigma$ in an indirect way: One needs to evaluate the variations of $C(\tau_i, x_i)$ caused by a small variation of $\sigma$; calling $\delta\sigma$ the variation of $\sigma$ and $\delta C$ the induced variation of $C$, one sees by differentiating (58) that $\delta C(\tau = 0, \cdot) = 0$ and

$$
\partial_\tau \delta C - \frac{\sigma^2(\tau, x)\,x^2}{2}\,\partial^2_{xx}\,\delta C + (r - q)\,x\,\partial_x \delta C + q\,\delta C = \sigma\,\delta\sigma\,x^2\,\partial_{xx} C \tag{60}
$$

To express $\delta J$ in terms of $\delta\sigma$, an adjoint state function $P$ is introduced as the solution to the adjoint problem: Find the function $P$ such that $P(\bar{\tau}, \cdot) = 0$ and for $\tau < \bar{\tau}$,

$$
\partial_\tau P + \partial^2_{xx}\left( \frac{\sigma^2 x^2}{2} P \right) + \partial_x\big( P\,(r - q)\,x \big) - qP = 2\sum_{i \in I} \omega_i \big( C(\tau_i, x_i) - C_i \big)\,\delta_{\tau_i, x_i} \tag{61}
$$

where $\bar{\tau}$ is an arbitrary time greater than $\max_{i \in I} \tau_i$ and, in the right-hand side, the $\delta_{\tau_i, x_i}$ denote Dirac functions in time and strike at $(\tau_i, x_i)$. The meaning of (61) is the following:

$$
-\int_Q \left( \partial_\tau v - \frac{\sigma^2 x^2}{2}\,\partial_{xx} v + (r - q)\,x\,\partial_x v + qv \right) P = 2\sum_{i \in I} \omega_i \big( C(\tau_i, x_i) - C_i \big)\, v(\tau_i, x_i) \tag{62}
$$

where $Q = (0, \bar{\tau}) \times \mathbb{R}_+$, and $v$ is any function such that $v \in L^2((0, \bar{\tau}), V)$ with $\partial_\tau v \in L^2(Q)$ and $x^2 \partial_{xx} v \in L^2(Q)$. Taking $v = \delta C$ in (62) and using (60), one finds

$$
\begin{aligned}
2\sum_{i \in I} \omega_i \big( C(\tau_i, x_i) - C_i \big)\,\delta C(\tau_i, x_i)
&= 2\sum_{i \in I} \omega_i \big( C(\tau_i, x_i) - C_i \big)\,\langle \delta_{\tau_i, x_i}, \delta C \rangle \\
&= -\int_Q \left( \partial_\tau \delta C - \frac{\sigma^2 x^2}{2}\,\partial_{xx} \delta C + (r - q)\,x\,\partial_x \delta C + q\,\delta C \right) P \\
&= -\int_Q \sigma\,\delta\sigma\,x^2\,P\,\partial_{xx} C
\end{aligned}
$$

We have worked in a formal way, but all the integrations above can be justified. This leads to the estimate

$$
\left| \delta J + \int_Q \sigma\,\delta\sigma\,x^2\,P\,\partial_{xx} C \right| \le c\,\|\delta\sigma\|^2_{L^\infty(Q)}
$$

which implies that $J$ is differentiable, and that its differential at point $\sigma$ is given by

$$
DJ(\sigma) : \eta \mapsto -\int_Q \sigma\,\eta\,x^2\,P(\sigma)\,\partial^2_{xx} C(\sigma)
$$

where $P(\sigma)$ satisfies (61), and $C(\sigma)$ satisfies (58). We see that the gradient of $J$ can be evaluated. When (58) is discretized with, for example, finite elements, all that has been done can be repeated with a discrete adjoint problem, and the gradient of the functional can be evaluated in the same way. Let us stress that the gradient $DJ(\sigma)$ is computed exactly, which would not be the case with, for example, a finite difference method.

Local volatility can also be calibrated with American options, but it is not possible to find the analogue of Dupire's equation. Thus, in the context of a least square approach, the evaluation of the cost function requires the solution of $\#I$ variational inequalities, which is computationally expensive (see Achdou and Pironneau, 2005). In this case, it is also possible to find the necessary optimality conditions involving an adjoint state (see Achdou, 2005).

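Appendix: Proof of (21)

First, from the definition (15) of $p$ we have, for any stopping time $\rho \in \mathcal{T}_{[t,T]}$,

$$
e^{-\int_t^\rho r\,ds}\, p\big(\rho, S_\rho^{t,x}\big) = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_{[\rho,T]}} \mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \,\Big|\, \mathcal{F}_\rho \right), \quad \text{a.s.} \tag{63}
$$

where $\mathcal{T}_{[\rho,T]}$ denotes the set of stopping times $\tau$ such that $\rho \le \tau \le T$. Then it is possible to show that (see, for instance, Karatzas and Shreve, 2010, Eq. (D.7)), for any stopping time $\rho \in \mathcal{T}_{[t,T]}$,

$$
\mathbb{E}\left( e^{-\int_t^\rho r\,ds}\, p\big(\rho, S_\rho^{t,x}\big) \right) = \sup_{\tau \in \mathcal{T}_{[\rho,T]}} \mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \right) \tag{64}
$$

We obtain from (64) the decreasing property: For all stopping times $\rho_1, \rho_2 \in \mathcal{T}_{[t,T]}$ such that $\rho_1 \ge \rho_2$,

$$
\mathbb{E}\left( e^{-\int_t^{\rho_1} r\,ds}\, p\big(\rho_1, S_{\rho_1}^{t,x}\big) \right) \le \mathbb{E}\left( e^{-\int_t^{\rho_2} r\,ds}\, p\big(\rho_2, S_{\rho_2}^{t,x}\big) \right) \tag{65}
$$

We deduce from (63) that, for any $\tau \in \mathcal{T}_{[\tau^*,T]}$,

$$
\mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \,\Big|\, \mathcal{F}_{\tau^*} \right) \le e^{-\int_t^{\tau^*} r\,ds}\, p\big(\tau^*, S_{\tau^*}^{t,x}\big) = e^{-\int_t^{\tau^*} r\,ds}\, \varphi\big(S_{\tau^*}^{t,x}\big) \tag{66}
$$

where the last identity comes from the definition (20) of $\tau^*$. Then, for any stopping time $\tau \in \mathcal{T}_{[t,T]}$, we have (by decomposing on the events $\{\tau \le \tau^*\}$ and $\{\tau > \tau^*\}$, and using (66) for $\tau > \tau^*$):

$$
\mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \right) \le \mathbb{E}\left( e^{-\int_t^{\tau \wedge \tau^*} r\,ds}\, \varphi\big(S_{\tau \wedge \tau^*}^{t,x}\big) \right)
$$

Hence, by taking the supremum over all the stopping times $\tau \in \mathcal{T}_{[t,T]}$,

$$
p(t,x) \le \sup_{\tau \in \mathcal{T}_{[t,T]}} \mathbb{E}\left( e^{-\int_t^{\tau \wedge \tau^*} r\,ds}\, \varphi\big(S_{\tau \wedge \tau^*}^{t,x}\big) \right) = \sup_{\tau \le \tau^*,\ \tau \in \mathcal{T}_{[t,T]}} \mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \right) \tag{67}
$$

By (15), the right-hand side of (67) is bounded from above by $p(t,x)$, and thus we obtain the equality

$$
p(t,x) = \sup_{\tau \le \tau^*,\ \tau \in \mathcal{T}_{[t,T]}} \mathbb{E}\left( e^{-\int_t^\tau r\,ds}\, \varphi\big(S_\tau^{t,x}\big) \right) \tag{68}
$$

In fact the supremum in (68) is reached only for $\tau = \tau^*$ a.s. Indeed, for $\tau \in \mathcal{T}_{[t,T]}$, if $\tau \le \tau^*$ and $\mathbb{P}(\tau < \tau^*) > 0$, we have, by the definition of $\tau^*$, $\mathbb{E}\big( e^{-\int_t^\tau r\,ds}\, \varphi(S_\tau^{t,x}) \big) < \mathbb{E}\big( e^{-\int_t^\tau r\,ds}\, p(\tau, S_\tau^{t,x}) \big) \le p(t, x)$. This concludes the proof of (21).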

KEY POINTS 

• When a deterministic method is available to price an option, it is generally more efficient than a brute force Monte Carlo algorithm.

• Deterministic techniques are usually more involved to implement than stochastic approaches and typically require specific developments for each targeted pricing problem.

• Deterministic approaches are particularly useful for nonlinear problems (including the pricing of American options and portfolio optimization) and for calibration.

• Future research subjects for such approaches include the development of efficient discretization methods for high-dimensional problems, and the combination of deterministic and stochastic approaches to take advantage of both techniques (using variance-reduction techniques or predictor-corrector methods, for example).

NOTES

1. Notice that the same equation has been considered by Vecer (2001) using some financial arguments.

2. http://www-rocq.inria.fr/mathfi/Premia/index.html

3. http://www.freefem.org/

4. More precisely, the interpolating operator $[\cdot]$ should also satisfy $[P] \le [Q]$ everywhere as soon as $P_j^k \le Q_j^k$ for all $k, j$.

5. A good initial guess is indeed the vector a obtained at the previous time iteration.

6. $F$ is slantly differentiable if there exist $C > 0$ and a matrix $G(x)$ such that $\forall x$, $\|G(x)^{-1}\|_\infty \le C$ and $F(x + h) = F(x) + G(x + h)h + o(h)$ as $h \to 0$. Here $G(x)$ can be defined by $G(x)_{ij} = B_{ij}$ if $(Bx - b)_i \le (x - g)_i$, and $G(x)_{ij} = \delta_{ij}$ otherwise.

7. Such a problem is called a linear complementarity problem.

Abstract: Model risk is the risk of error in pricing or risk-forecasting (such as value at risk, or VaR) models. It arises in part because any model involves simplification and calibration, and both of these require subjective judgments that are inevitably open to error. Model risk can also arise where a model is used inappropriately. Model risk is therefore an inescapable consequence of model use, and there is abundant anecdotal and other evidence that it is a major problem, especially for VaR models. However, there are also many ways in which risk managers and financial institutions can manage this problem.


This entry examines the subject of model risk. 
Loosely speaking, model risk is the risk of error 
in the valuations produced by a pricing model 
or in the estimated risk measures produced by a 
risk model. The nature of model risk and its di¬ 
verse causes and manifestations are examined. 
The entry also briefly addresses the scale of the 
problem and the dangers it entails, and then 
goes on to discuss ways in which model risk 
can be managed. 

MODELS AND MODEL RISK 

A model can be defined as "a simplified descrip¬ 
tion of reality that is at least potentially use¬ 
ful in decision-making" (Geweke, 2005, p. 7). A 
model attempts to identify the key features of 
whatever it is meant to represent and is, by its 
very nature, a highly simplified structure. We 
should therefore not expect any model to give 
a perfect answer: Some degree of error is to be 
expected, and we can think of this risk of error 
as a form of model risk. 


However, the term model risk is more subtle 
than it looks, and not all output errors are due 
to model inadequacy. For example, simulation 
methods generally produce errors due to sam¬ 
pling variation, so even the best simulation- 
based model will produce results affected by 
random noise. Conversely, models that are the¬ 
oretically inappropriate can sometimes provide 
good results. The most obvious cases in point 
are the well-known "holes in Black-Scholes": 
Simple option pricing models often work well 
even when some of the assumptions on which 
they are based are known to be invalid. They 
work well not because they are accurate, but 
because those who use them are aware of their 
limitations and use them discerningly. 

In finance, we are concerned with both pricing (or valuation) models and risk (or VaR) models. The former are models that enable us to price a financial instrument, and with these model risk boils down to the risk of mispricing. These models are typically used on a stand-alone basis and it is often very important that they give precise answers: Mispricing can lead to rapid and large arbitrage losses. Their exposure to this risk depends on such factors as the complexity of the position, the presence or otherwise of unobserved variables (such as volatilities), interactions between risk factors, the use of numerical approximations, and so on.

Risk models are models that forecast financial risks or probabilities. These models are exposed to many of the same problems as pricing models, but are often also affected by the difficulties of trying to integrate risks across different positions or business units, and this raises a host of issues (e.g., aggregation problems, potential inconsistencies across constituent positions or models, etc.) that do not (typically) arise in stand-alone pricing models. So risk models are exposed to more sources of model risk than pricing models typically are. However, with risk models there is far less need for accuracy: Errors in risk estimates do not lead directly to arbitrage losses, and the old engineering principle applies that the end output is only as good as the weakest link in the system. With risk models, we therefore want to be approximate and right, and efforts to achieve high levels of precision would be pointless because any reported precision would be spurious.

We are particularly concerned in this entry 
with how models can go wrong, and to appre¬ 
ciate these problems it helps to understand how 
our models are constructed in the first place. To 
get to know our models we should: 

• Understand the securities involved and the 
markets in which they are traded. 

• Isolate the most important variables and sep¬ 
arate out the causal variables (or exogenous 
variables) from the caused (or endogenous) 
variables. 

• Decide which exogenous variables are deter¬ 
ministic and which are stochastic or random, 
decide how the exogenous variables are to 
be modeled, and decide how the exogenous 
variables affect the endogenous ones. 


• Decide which variables are observable or 
measurable and which are not; decide how 
the former are measured, and consider 
whether and how the unobservable variables 
can be proxied or implicitly solved from other 
variables. 

• Try to ensure that the model captures all key 
features of the problem at hand, but also has 
no unnecessary complexity. 

• Consider how the model can be solved and 
look for the simplest possible solutions. We 
should also consider the possible benefits and 
drawbacks of using approximations instead 
of exact solutions. 

• Program the model, taking account of 
programming considerations, computational 
time, and so on. 

• Calibrate the model using suitable methods: 
For example, we might estimate parameters 
using maximum likelihood methods and then 
adjust them using subjective judgments about 
factors such as changing market conditions 
that might not be fully reflected in our data 
set. 

• Test the model using data not used to calibrate 
the model. 

• Implement the model, regularly evaluate its 
performance, and identify its strengths and 
weaknesses. 

• Keep a log of all these activities and their 
outcomes. 

SOURCES OF MODEL RISK 

Incorrect Model Specification 

One of the most important sources of model 
risk is incorrect model specification, and this 
can manifest itself in many ways: 

• Stochastic processes might be misspecified. 
We might assume that a stochastic process 
follows a geometric Brownian motion when 
it is in fact heavy-tailed, we might fit a symmetric 
distribution to skewed data, and so 
forth. It is very easy to misspecify stochastic 
processes, because the "true" stochastic 
process is very difficult to identify and it is 
impossible in practice to distinguish between 
a "true" process and a similar but false alternative. 
The misspecification of stochastic processes 
can lead to major errors in estimates of 
risk measures: The classic example is where 
we incorrectly assume normality in the presence 
of heavy tails, an error that can lead to 
major underestimates of VaR and other risk 
measures. 

• Incorrect calibration of parameters. Even if 
we do manage to identify the "true" model, 
the model might be calibrated with incorrect 
parameter values. Parameters might be estimated 
with error, not kept up to date, estimated 
over inappropriate sample periods, 
and so forth. This problem is often referred to 
as parameter risk, and it arises everywhere in 
risk management because it is practically impossible 
to determine "true" parameter values. 
Besides leading to major errors in risk 
estimates, incorrect calibration can also lead 
to major losses if the models are used to price 
traded instruments. A good example was the 
£90 million loss made by the NatWest Bank 
from 1995 to 1997, where a trader had fed his 
own (artificially high) estimates of volatility 
into a model used to price long-dated over-the-counter 
(OTC) interest rate options. We 
can also get problems when correlations unexpectedly 
polarize in a crisis: In such cases, 
the portfolio loses much of its effective diversification, 
and the "true" risks taken can be 
much greater than estimates based on earlier 
correlations might suggest. 

• Missing risk factors and misspecified relationships. 
We might ignore stochastic volatility 
or fail to consider enough points across the 
term structure of interest rates, ignore background 
risk factors such as macroeconomic 
ones, or we might misspecify important relationships 
(e.g., by ignoring correlations). 

• Ignoring transaction costs, liquidity, and 
crisis factors. Many models ignore transaction 
costs and assume that markets are 
perfectly liquid. Such assumptions are very 
convenient for modeling purposes, but can 
lead to major errors where transaction costs 
are significant, where market liquidity is limited, 
or where a crisis occurs. These sorts 
of problems were highlighted by the difficulties 
experienced by portfolio insurance 
strategies in the October 1987 crash, where 
strategies predicated on dynamic hedging 
were unhinged by the inability to unwind 
positions as the market fell. The failure to allow 
for illiquidity led to much larger losses 
than the models anticipated, a classic form 
of model risk. 
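
The effect of assuming normality in the presence of heavy tails can be sketched with a small simulation. The example below is only an illustration under stated assumptions: returns are drawn from a Student-t distribution with 4 degrees of freedom (our choice, not a claim about any market), the mean is taken to be zero, and the function names are hypothetical. It compares the VaR implied by a fitted normal distribution with the VaR read directly from the simulated sample.

```python
import math
import random
from statistics import NormalDist, pstdev


def student_t_sample(n, df, rng):
    """Draw n Student-t variates as Z / sqrt(chi2 / df)."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        out.append(z / math.sqrt(chi2 / df))
    return out


def empirical_var(returns, confidence):
    """Loss quantile read directly from the sample."""
    losses = sorted(-r for r in returns)
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]


def normal_var(returns, confidence):
    """VaR under a zero-mean normal assumption using the sample volatility."""
    return NormalDist().inv_cdf(confidence) * pstdev(returns)


rng = random.Random(42)
returns = student_t_sample(100_000, df=4, rng=rng)  # heavy-tailed by construction

for confidence in (0.99, 0.999):
    print(confidence,
          round(normal_var(returns, confidence), 2),
          round(empirical_var(returns, confidence), 2))
```

In this setting the gap between the two figures widens as the confidence level moves further into the tail, which is where the normal assumption is weakest.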

There is empirical evidence that model misspecification 
risk is a major problem. To give a 
couple of examples: Hendricks (1996) investigated 
differences between alternative VaR estimation 
procedures applied to 1,000 randomly 
selected simple foreign exchange portfolios, 
finding that these differences were sometimes 
substantial; more alarmingly, a famous study 
by Beder (1995) examined eight common VaR 
methodologies used by a sample of commercial 
institutions applied to three hypothetical portfolios, 
and among other worrying results found 
that alternative VaR estimates for the same portfolio 
could differ by a factor of up to 14. Some 
further evidence is provided by Berkowitz and 
O'Brien (2001), who examined the VaR models 
used by six leading U.S. financial institutions. 
Their results indicated that these models can 
be highly inaccurate: Banks sometimes experienced 
losses very much larger than their 
models predicted, and this suggests that these 
models are poor at dealing with heavy tails or 
extreme risks. Their results also suggest that 
banks' structural models embody so many approximations 
and other implementation compromises 
that they lose any edge over much 
simpler models such as generalized autoregressive 
conditional heteroskedasticity (GARCH) 
ones. The implication is that financial institutions' 
risk models are very exposed to model 
risk, and one suspects many risk managers are 
not aware of the extent of the problem. 

Incorrect Model Application 

Model risk can also arise because a good 
model is incorrectly applied. To quote Emanuel 
Derman: 

There are always implicit assumptions behind 
a model and its solution method. But human 
beings have limited foresight and great imagination, 
so that, inevitably, a model will be used 
in ways its creator never intended. This is especially 
true in trading environments, where 
not enough time can be spent on making interfaces 
fail-safe, but it's also a matter of principle: 
you just cannot foresee everything. So, even a 
"correct" model, "correctly" solved, can lead 
to problems. The more complex the model, the 
greater this possibility. (Derman, 1997, p. 86) 

One can give very many instances of this 
problem: We might use the wrong model in a 
particular context (e.g., we might use a Black-Scholes 
model for pricing options when we 
should have used a stochastic volatility model, 
etc.); we might have initially had the right 
model, but have fallen behind best market practice 
and not kept the model up to date, or not 
replaced it when a superior model became 
available; we might run Monte Carlo simulations 
with a poor random number generator 
or an insufficient number of trials, and so on. 
We can also get "model creep," where a model 
is initially designed for one type of problem 
and performs well on that problem, but is then 
gradually applied to more diverse situations to 
which it is less suited or not suited at all. A perfectly 
good model can then end up as a major 
liability not because there is anything wrong 
with it, but because users don't appreciate its 
limitations. 
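
The point about an insufficient number of Monte Carlo trials can be illustrated with a rough sketch. The setup is our own assumption, chosen for simplicity: the profit and loss is standard normal, the VaR confidence level is 99%, and the function names are hypothetical. The code reruns the same simulation many times and measures how much the VaR estimate moves from run to run.

```python
import random
from statistics import pstdev


def simulated_var(n_trials, confidence, rng):
    """VaR of a standard normal P&L estimated from n_trials simulated outcomes."""
    losses = sorted(-rng.gauss(0.0, 1.0) for _ in range(n_trials))
    return losses[min(int(confidence * n_trials), n_trials - 1)]


def spread_of_estimates(n_trials, repeats=30, confidence=0.99, seed=1):
    """Standard deviation of the VaR estimate across independent reruns."""
    rng = random.Random(seed)
    estimates = [simulated_var(n_trials, confidence, rng) for _ in range(repeats)]
    return pstdev(estimates)


spread_small = spread_of_estimates(n_trials=200)
spread_large = spread_of_estimates(n_trials=10_000)
print("run-to-run spread with 200 trials:   ", round(spread_small, 3))
print("run-to-run spread with 10,000 trials:", round(spread_large, 3))
```

With few trials, two runs of the "same" model can report noticeably different VaR figures, so part of the reported number is simulation noise rather than risk.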

Implementation Risk 

Model risk also arises from the ways in which 
models are implemented. No model can provide 
a complete specification of model implementation 
in every conceivable circumstance 
because of the very large number of possible 
instruments and markets, and because of their 
varying institutional, statistical, and other properties. 
However complete the model, implementation 
decisions still need to be made about 
such factors as valuation (e.g., mark to market 
versus mark to model, whether to use the mean 
bid-ask spread, etc.), whether and how to clean 
data, how to map instruments, how to deal with 
legacy systems, and so on. 

The possible extent of implementation risk is 
illustrated by the results of a study by Marshall 
and Siegel (1997). They sought to quantify 
implementation risk by looking at differences 
between how various commercial systems 
applied the RiskMetrics variance-covariance 
approach to specified positions based on a 
common set of assumptions (that is, a one-day 
holding period, a 95% VaR confidence 
level, delta-valuation of derivatives, RiskMetrics 
mapping systems, etc.). They found that 
any two sets of VaR estimates were always different, 
and that VaR estimates could vary by 
up to nearly 30% depending on the instrument 
class; they also found these variations were in 
general positively related to complexity: The 
more complex the instrument or portfolio, the 
greater the range of variation of reported VaRs. 
These results suggested that: 

[A] naive view of risk assessment systems as 
straightforward implementations of models is 
incorrect. Although software is deterministic 
(i.e., given a complete description of all the 
inputs to the system, it has well-defined outputs), 
as software and the embedded model 
become more complex, from the perspective 
of the only partially knowledgeable user, they 
behave stochastically.... Perhaps the most critical 
insight of our work is that as models and 
their implementations become more complex, 
treating them as entirely deterministic black 
boxes is unwise, and leads to real implementation 
and model risks. (Marshall and Siegel, 
1997, pp. 105-106) 

Endogenous Model Risk 

There is also a particularly subtle and invidious 
form of model risk that arises from the ways in 
which traders or asset managers respond to the 
models themselves: Traders or asset managers 
will "game" against the model. Traders are 
likely to have a reasonable idea of the errors in 
the parameters (particularly volatility or correlation 
parameters) used to estimate VaR, and 
such knowledge will give the traders an idea 
of which positions have under- and overestimated 
risks. If traders face VaR limits or face 
risk-adjusted remuneration with risks specified 
in VaR terms, they will therefore have an incentive 
to seek out such positions and trade them. 
To the extent they do, they will take on more risk 
than suggested by VaR estimates, which will 
therefore be biased downward. Indeed, VaR estimates 
are likely to be biased even if traders 
do not have superior knowledge of underlying 
parameter values. The reason for this is that if 
a trader uses an estimated variance-covariance 
matrix to select trading positions, then he or she 
will tend to select positions with low estimated 
risks, and the resulting changes in position sizes 
mean that the initial variance-covariance matrix 
will tend to underestimate the resulting portfolio 
risk. As Shaw nicely puts it: 

[M]any factor models fail to pick up the risks 
of typical trading strategies which can be the 
greatest risks run by an investment bank. According 
to naive yield factor models, huge 
spread positions between on-the-run bonds 
and off-the-run bonds are riskless! According 
to naive volatility factor models, hedging one 
year (or longer dated) implied volatility with 
three month implied volatility is riskless, provided 
it is done in the "right" proportions, i.e., 
the proportions built into the factor model! It is 
the rule, not the exception, for traders to put on 
spread trades which defeat factor models since 
they use factor type models to identify richness and 
cheapness! (Shaw, 1997, p. 215; his emphasis) 
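
The selection effect described above, in which picking positions with low estimated risk biases the risk estimate downward, can be sketched in a deliberately simplified setting. The assumptions are ours: 100 independent assets that all have the same true volatility of 1, volatilities estimated from 60 observations each, and a trader who picks the ten assets that look least risky. No superior information is involved; estimation noise alone produces the bias.

```python
import random
from statistics import mean, pstdev

TRUE_VOL = 1.0  # every asset has the same true volatility (an assumption of this sketch)


def estimated_vols(n_assets, n_obs, rng):
    """Sample volatilities of assets that all share the same true volatility."""
    return [
        pstdev([rng.gauss(0.0, TRUE_VOL) for _ in range(n_obs)])
        for _ in range(n_assets)
    ]


rng = random.Random(7)
vols = estimated_vols(n_assets=100, n_obs=60, rng=rng)

# The trader selects the ten positions with the lowest estimated risk.
chosen = sorted(vols)[:10]

print("true volatility of every asset:        ", TRUE_VOL)
print("average estimate across all assets:    ", round(mean(vols), 3))
print("average estimate for selected assets:  ", round(mean(chosen), 3))
```

The selected positions look safer than they are simply because they were selected on the basis of a noisy estimate.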

Other Sources of Model Risk 

There are also other sources of model risk. Programs 
might have errors or bugs in them, simulation 
methods might use poor random number 
generators or suffer from discretization errors, 
approximation routines might be inaccurate or 
fail to converge to sensible solutions, rounding 
errors might add up, and so on. We can also get 
problems when programs are revised by people 
who did not originally write them, when programs 
are not compatible with user interfaces or 
other systems (e.g., datafeeds), and when programs 
become complex or hard to read (e.g., when 
programs are rewritten to make them computationally 
more efficient but then become less 
easy to follow). We can also get simple blunders. 
Derman (1997, p. 87) reported the example 
of a convertible bond model that was good 
at pricing many of the options features embedded 
in convertible bonds, but sometimes miscounted 
the number of coupon payments left to 
maturity. 

Finally, models can give incorrect answers because 
poor data are fed into them: "garbage 
in, garbage out," as the saying goes. Data problems 
can arise from many sources: from the way 
data are constructed (e.g., whether we mark to 
market or mark to model, whether we use actual 
trading data or end-of-day data, how we 
deal with bid-ask spreads, etc.), from the way 
time is handled (e.g., whether we use calendar 
time, trading time, how we deal with holidays, 
etc.), from the way in which data are 
cleansed or standardized, from data being nonsynchronous, 
and from many other sources. 


MANAGING MODEL RISK 

Some Guidelines for Risk Managers 

Given that risk managers can never eliminate 
model risk, the only option left is to learn to live 
with it and, hopefully, manage it. Practitioners 
can do so in a number of ways: 

• Be aware of model risk. First and foremost, 
practitioners should simply be aware of it, 
and be aware of the limitations of the models 
they use. They should also be aware of 
the comparative strengths and weaknesses of 
different models, be knowledgeable of which 
models suit which problems, and be on the 
lookout for models that are applied inappropriately. 

• Identify, evaluate, and check key assumptions. 
Users should explicitly set out the key 
assumptions on which a model is based, 
evaluate the extent to which the model's results 
depend on these assumptions, and check 
them as much as possible (e.g., using statistical 
tests). 

• Choose the simplest reasonable model. 
Exposure to model risk is reduced if practitioners 
always choose the simplest reasonable 
model for the task at hand. Occam's 
razor applies just as much in model selection 
as in anything else: Unnecessary complexity 
is never a virtue. Whenever we choose 
a more complex model over a simpler one, 
we should always have a clear reason for 
doing so. 

• Don't ignore small problems. Practitioners 
should resist the temptation to explain away 
small discrepancies in results and sweep them 
under the rug. Small discrepancies are often 
good warning signals of larger problems that 
will manifest themselves later if they are not 
sorted out. 

• Test models against known problems. It is 
always a good idea to check a model on simple 
problems to which one already knows the 
answer, and many problems can be distilled 
to simple special cases that have known answers. 
If the model fails to give the correct answer 
to a problem whose solution is already 
known, then we immediately know that there 
must be something wrong with it. 

• Plot results and use nonparametric statistics. 
Graphical outputs can be extremely revealing, 
and simple histograms or plots often 
show up errors that might otherwise be very 
hard to detect. For example, a plot might have 
the wrong slope or shape or have odd features 
such as kinks that flag an underlying 
problem. Summary statistics and simple nonparametric 
tests can also be useful for helping 
to impart a feel for data and results. 

• Back-test and stress-test the model. Practitioners 
should evaluate model adequacy using 
stress tests and back tests. 

• Estimate model risk quantitatively. Where 
feasible, practitioners should seek to estimate 
model risk quantitatively (e.g., using simulation 
methods). However, it helps to keep in 
mind that any quantitative estimate of model 
risk is almost certainly an underestimate because 
not all model risk is quantifiable. 

• Reevaluate models periodically. Models 
should be recalibrated and reestimated on 
a regular basis, and the methods used should 
be kept up to date. 
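
As an illustration of testing a model against a problem with a known answer, the sketch below feeds a simple sample-quantile VaR routine with simulated normal data, for which the 99% VaR is known in closed form. The routine, the volatility of 2.0, and the sample size are our own illustrative choices; any VaR engine could be checked in the same way on a special case it ought to reproduce.

```python
import random
from statistics import NormalDist


def historical_var(pnl, confidence):
    """VaR read off as a quantile of the loss distribution in the sample."""
    losses = sorted(-x for x in pnl)
    return losses[min(int(confidence * len(losses)), len(losses) - 1)]


SIGMA = 2.0  # volatility of the synthetic P&L (chosen for the test)
rng = random.Random(123)
pnl = [rng.gauss(0.0, SIGMA) for _ in range(100_000)]

estimate = historical_var(pnl, 0.99)
known = NormalDist(0.0, SIGMA).inv_cdf(0.99)  # exact 99% VaR for a normal P&L

print("model output:", round(estimate, 3))
print("known answer:", round(known, 3))
```

If the two numbers disagreed by more than sampling error could explain, we would know that something is wrong with the routine before trusting it on real data.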

Some Institutional Guidelines 

Financial institutions themselves can also combat 
model risk through appropriate institutional 
devices. One defense is a sound system to 
vet models before they are approved for use and 
then periodically review them. A good model-vetting 
procedure is proposed by Crouhy et al. 
(2001, pp. 607-608) and involves the following 
four steps: 

1. Documentation. The risk manager should 
ask for a complete specification of the 
model, including its mathematics, components, 
computer code, and implementation 
features (e.g., numerical methods and pricing 
algorithms used). The information 
should be in sufficient detail to enable the 
risk manager to reproduce the model from 
the information provided. 

2. Soundness. The risk manager should check 
that the model is a reasonable one for the 
instrument(s) or portfolio concerned. 

3. Benchmark modeling. The risk manager 
should develop a benchmark model and test 
it against well-understood approximation or 
simulation methods. 

4. Check results and test the proposed model. 
The final stage involves the risk manager 
using the benchmark model to check the 
performance of the proposed model. The 
model should also be checked for zero-arbitrage 
properties such as put-call parity, 
and should then be stress tested to help determine 
the range of parameter values for 
which it will give reasonable estimates. 
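
A put-call parity check of the kind mentioned in step 4 might look like the following sketch. The pricer here is a textbook Black-Scholes formula for European options on a non-dividend-paying asset, used purely as a stand-in for "the proposed model"; the parameter grid and function names are illustrative assumptions. The check verifies that the call price minus the put price equals the spot price minus the discounted strike across a range of volatilities and maturities.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, vol, maturity, kind):
    """European option price, no dividends (stand-in for the model under review)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)


def parity_gap(spot, strike, rate, vol, maturity):
    """(call - put) minus (spot - discounted strike); zero if parity holds."""
    call = black_scholes(spot, strike, rate, vol, maturity, "call")
    put = black_scholes(spot, strike, rate, vol, maturity, "put")
    return (call - put) - (spot - strike * math.exp(-rate * maturity))


# Sweep a grid of parameter values, in the spirit of a simple stress test.
gaps = [
    abs(parity_gap(100.0, strike, 0.03, vol, maturity))
    for strike in (60.0, 100.0, 140.0)
    for vol in (0.05, 0.2, 0.8, 2.0)
    for maturity in (0.1, 1.0, 10.0)
]
print("largest parity violation on the grid:", max(gaps))
```

A violation well above rounding error anywhere on the grid would flag either a coding error or a region of parameter values in which the model should not be used.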

All these stages should be carried out free 
of undue pressures from the front office, and 
traders should not be allowed to vet their own 
pricing models. It is also important to keep 
good records, so each model should be fully 
documented in the middle (or risk) office. Risk 
managers should have full access to the model 
at all times, as well as access to real trading 
and other data that might be necessary to check 
models and validate results. The ideal should 
be to give the middle office enough information 
to be able to check any model or model results 
at any time, and do so using appropriate 
(that is, up to date) data sets. This information 
set should include a log of model performance 
with particular attention to any problems encountered 
and what (if anything) has been done 
about them. There should also be a periodic review 
(as well as occasional spot checks) of the 
models in use, to ensure that model calibration 
is up to date and that models are upgraded in 
line with market best practice, and to ensure 
that obsolete models are identified as such and 
taken out of use. Such risk audits should 
address not just the risk models but all aspects 
of the firm's risk management. And, of course, 
all these measures should take place in the context 
of a strong and independent risk oversight 
or middle office function. 

KEY POINTS 

• A model attempts to identify the key features 
of whatever it is meant to represent and is, by 
its very nature, a highly simplified structure. 

• In financial modeling, the concern is with 
both pricing (or valuation) models and risk 
(or VaR) models. The risk of error in pricing 
or risk-forecasting models is referred to 
as model risk. 

• Model risk is an inescapable consequence of 
model use and affects both pricing models 
and VaR models. 

• The main sources of model risk include 
incorrect specification, incorrect application, 
implementation risk, and the problem of en¬ 
dogenous model risk where traders "game" 
against the model. 

• There are ways in which practitioners can 
manage model risk. These include (1) recognizing 
model risk, (2) identifying, evaluating, 
and checking the model's key assumptions, 
(3) selecting the simplest reasonable model, 
(4) resisting the temptation to ignore small 
discrepancies in results, (5) testing the model 
against known problems, (6) plotting results 
and employing nonparametric statistics, (7) 
back-testing and stress-testing the model, (8) 
estimating model risk quantitatively, and (9) 
reevaluating models periodically. 

Abstract: Financial modelers have to solve the critical problem of selecting or perhaps building the 
optimal model to represent the phenomena they seek to study. The task calls for a combination of 
personal creativity, theory, and machine learning. 


In this entry we discuss methods for model 
selection and analyze the many pitfalls of the 
model selection process. 


MODEL SELECTION AND 
ESTIMATION 

In his book Complexity, Mitchell Waldrop (1992) 
describes the 1987 Global Economy Workshop 
held at The Santa Fe Institute, a research center 
dedicated to the study of complex phenomena 
and related issues. Organized by the 
economist Brian Arthur and attended by distinguished 
economists and physicists, the seminar 
introduced the idea that economic laws 
might be better understood by applying the principles 
of physics and, in particular, the newly 
developed theory of complex systems. The seminar 
proceedings were to become the influential 
book The Economy as an Evolving Complex System 
(Anderson, Arrow, and Pines, 1998). 

An anecdote from the book is revealing of 
the issues specific to economics as a scientific 
endeavor. According to Waldrop, physicists 
attending the seminar were surprised to 
learn that economists used highly sophisticated 
mathematics. 

A physicist attending the seminar reportedly 
asked Kenneth Arrow, the 1972 Nobel Prize 
winner in economics, why, given the lack of 
data to support theories, economists use such 
sophisticated mathematics. Arrow replied, "It 
is just because we do not have enough data 
that we use sophisticated mathematics. We have 
to ensure the logical consistency of our arguments." 
For physicists, on the other hand, explaining 
empirical data is the best guarantee 
of the logical consistency of theories. If theories 
work empirically, then mathematical details are 
not so important and will be amended later; if 
theories do not work empirically, no logical subtlety 
will improve them. 

This anecdote is revealing of one of the 
key problems that any modeler of economic 
phenomena has to confront. On the one hand, 
economics is an empirical science based on empirical 
facts. On the other hand, as data are scarce, many 
theories and models fit the same data. One is 
tempted to rely on "clear reasoning" to compensate 
for the scarcity of data. In economics, 
there is always a tension between the use of 
pure reasoning to develop ex ante economic 
theories and the need to conform to generally 
accepted principles of empirical science. The 
development of high-performance computing 
has aggravated the problem, making it possible 
to discover subtle patterns in data and to build 
models that fit data samples with arbitrary 
precision. But patterns and models selected in 
this way are meaningless and reveal no true 
economic feature. 

Given the importance of model selection, let 
us discuss this issue before actually discussing 
estimation issues. It is perhaps useful to compare 
again the methods of economics and of 
physics. In physics, the process of model choice 
is largely based on human creativity. Facts and 
partial theories are accumulated until scientists 
make a major leap forward, discovering 
a new unifying theory. Theories are generally 
expressed through differential equations and 
often contain constants (i.e., numerical parameters) 
to be empirically ascertained. Note that the 
discovery of laws and the determination of constants 
are separate moments. Theories are often 
fully developed before the constants are determined; 
physical constants often survive major 
theoretical overhauls in the sense that new theories 
must include the same constants plus, possibly, 
additional ones. 

Physicists are not concerned with problems 
of "data snooping," that is, of fitting models 
to the same sample that one wants to predict. 
In general, data are overabundant and models 
are not determined through a process of fitting 
and adaptation. Once a physical law that accurately 
fits all available data is discovered, scientists 
are confident that it will fit similar data in 
the future. The key point is that physical laws 
are known with a high level of precision. Centuries 
of theoretical thinking and empirical research 
have resulted in mathematical models 
that exhibit an amazing level of correspondence 
with reality. Any minor discrepancy between predictions 
and experiments entails a major scientific 
reevaluation. Often new laws have completely 
different forms but produce quite similar results. 
Experiments are devised to choose the 
winning theory. 

Now consider economics, where the conceptual 
framework is totally different. First, though 
apparently many data are available, these data 
come in vastly different patterns. For example, 
the details of economic development are very 
different from year to year and from country to 
country. Asset prices seem to wander about in 
random ways. Introducing a concept that plays 
a fundamental role later in this entry, we can 
state: From the point of view of statistical estimation, 
economic data are always scarce given 
the complexity of their patterns. 

Attempts to discover simple deterministic 
laws that accurately fit empirical economic data 
have proved futile. Furthermore, as economic 
data are the product of human artifacts, it is reasonable 
to believe that they will not follow the 
same laws for very long periods of time. Simply 
put, the structure of any economy changes too 
much over time to believe that economic laws 
are time-invariant laws of nature. One is, therefore, 
inclined to believe that only approximate 
laws can be discovered. 

However, the above considerations create 
an additional problem: The precise meaning 
of approximation must be defined. The usual 
response is to have recourse to probability 
theory. Here is the reasoning. Economic data 
are considered one realization of stochastic (i.e., 
random) data. In particular, economic time series 
are considered one realization of a stochastic 
process. The modeler's attention must 
therefore switch from discovering deterministic 
paths to determining the time evolution of 
probability distributions. In physics, this switch 
was made at the end of the 19th century, with 
the introduction of statistical physics. It later became 
an article of scientific faith that we can arrive 
at no better than a probabilistic description 
of nature. 

The adoption of probability as a descriptive 
framework is not without a cost: Discovering 
probabilistic laws with confidence requires 
working with very large populations (or samples). 
In physics, this is not a problem as we 
have very large populations of particles. (Although 
this statement needs some qualification 
because physics has now reached the stage 
where it is possible to experiment with small 
numbers of elementary particles, it is sufficient 
for our discussion here.) In economics, however, 
populations are too small to allow for a 
safe estimate of probability laws; small changes 
in the sample induce changes in the laws. We 
can, therefore, make the following statement: 
Economic data are too scarce to allow us to 
make reliable probability estimates. 

For example, Gopikrishnan, Meyer, Nunes 
Amaral, and Stanley (1998) conducted a study 
to determine the distribution of stock returns at 
short time horizons, from a few minutes to a few 
days. They found that returns had a power tail 
distribution with exponent $\alpha \approx 3$. One would 
expect that the same measurement repeated 
several times over would give the same result. 
But this is not the case. Since the publication of 
the aforementioned paper, the return distribution 
has been estimated several times, obtaining 
vastly different results. Each successive measurement 
was made in good faith, but a slightly 
different empirical setting produced different 
results. 
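
To make the notion of estimating a tail exponent concrete, here is a minimal sketch using the Hill estimator, one common estimator of power-law tails. It is not necessarily the method used in the study cited above. The data are synthetic draws from an exact Pareto distribution with exponent 3 (our assumption), and the number of tail observations k is a choice the analyst must make.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the tail exponent from the k largest observations."""
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]  # the (k + 1)-th largest value
    return k / sum(math.log(x / threshold) for x in ordered[:k])


rng = random.Random(2024)
sample = [rng.paretovariate(3.0) for _ in range(50_000)]  # exact power tail, exponent 3

for k in (100, 500, 2000):
    print("k =", k, " estimated exponent =", round(hill_estimator(sample, k), 2))
```

On exact Pareto data the estimate is stable across choices of k. Real return data only follow a power law approximately and only in the tail, so the estimate generally depends on k, on the sampling frequency, and on the sample period, which is one way a "slightly different empirical setting" can produce different results.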

As a result of the scarcity of economic data, 
many statistical models, even simple ones, can 
be compatible with the same data with roughly 
the same level of statistical confidence. For example, 
if we consider stock price processes, 
many statistical models, including the random 
walk, compete to describe each process 
with the same level of significance. Before discussing 
the many issues surrounding model selection 
and estimation, we will briefly discuss 
the subject of machine learning and the machine-learning 
approach to modeling. 

THE (MACHINE) LEARNING 
APPROACH TO MODEL 
SELECTION 

There is a fundamental distinction between (1) 
estimating parameters in a well-defined model 
and (2) estimating models through a process of 
learning. Models, as mentioned, are determined 
by human modelers using their creativity. For 
example, a modeler might decide that stock returns 
in a given market are influenced by a set 
of economic variables and then write a linear 
model as follows: 

$$r_{i,t} = \sum_{k=1}^{K} \beta_{i,k} f_{k,t} + \varepsilon_{i,t}$$

where the $f_{k,t}$ are stochastic processes that represent 
a set of given economic variables, the $\beta_{i,k}$ are 
coefficients to be estimated, and $\varepsilon_{i,t}$ is an error term. The 
modeler must then estimate the fit and test the 
validity of his model. 

In the machine-learning approach to 
modeling—ultimately a byproduct of the 
diffusion of computers—the process is the 
following: 

• There is a set of empirical data to explain. 

• Data are explained by a family of models 
that include an unbounded number of 
parameters. 

• Models fit with arbitrary precision any set of 
data. 

That models can fit any given set of data 
with arbitrary precision is illustrated by neural 
networks, one of the many machine learning 
tools (genetic algorithms are another) used to 
model data. As first demonstrated by Cybenko 
(1989), neural networks are universal function 
approximators. If we allow a sufficient 
number of layers and nodes, a neural network 
can approximate any given function with 
arbitrary precision. The idea of universal function 
approximators is well known in calculus. 
The Taylor and Fourier series are universal 
approximators for broad classes of functions. 

Suppose a modeler wants to model the unknown 
data generation process (DGP) of a time 
series $X(t)$ using a neural network. A DGP is 
a possibly nonlinear function of the following 
type: 

$$X(t) = F(X(t-1), \ldots, X(t-k))$$

that links the present value of the series to its 
past. A neural network will try to learn the 
function $F$ using empirical data from the series. 
If the number of layers and nodes is not 
constrained, the network can learn $F$ with unlimited 
precision. 

However, the key concept of the theory of 
machine learning is that a model that can fit 
any data set with arbitrary precision has no explanatory 
power, that is, it does not capture 
any true feature of the data, neither in a deterministic 
setting nor in a statistical setting. 
In an economic context, machine learning perfectly 
explains sample data but has no forecasting 
power. It is only a mathematical device; it 
does not correspond to any economic property. 

We can illustrate this point in a simplified setting. 
Let us generate an autoregressive trend 
stationary process according to the following 
model: 

$$X(i) = X(i-1) + \lambda\,(D\,i - X(i-1)) + \sigma\,\varepsilon(i)$$

$$\lambda = 0.1, \quad D = 0.1, \quad \sigma = 0.5$$

where $\varepsilon(i)$ are normally distributed zero-mean 
unit-variance random numbers generated with 
a random number generator. The initial condition 
is X = 1. This process is asymptotically 
trend stationary. Using the ordinary least 
squares (OLS) method, let us fit to the process X 
two polynomials of degree 2 and 20 respectively 
on a training window of 200 steps. We continue 
the polynomials five steps after the training 
window. Figure 1 represents the process 
plot and the two polynomials. Observe from the 
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Figure 1 Polynomial Fitting of a Trend Stationary Process Using Two Polynomials of Degree 2 and 20 
Respectively on a Training Window of 200 Steps 
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exhibit the different behavior of the two poly¬ 
nomials. The polynomial of degree 2 essentially 
repeats the linear trend, while the polynomial 
of degree 20 follows the random fluctuations of 
the process quite accurately Immediately after, 
however, the training window it diverges. 
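
The experiment just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of Figure 1: the random seed, the use of NumPy's `Polynomial.fit` for the OLS fit, and the mean squared error as the measure of fit are choices made here.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

# Trend-stationary process: X(i) = X(i-1) + lam*(D*i - X(i-1)) + sigma*eps(i)
lam, D, sigma = 0.1, 0.1, 0.5
n_train, n_ahead = 200, 5
n = n_train + n_ahead
x = np.empty(n)
x[0] = 1.0  # initial condition
for i in range(1, n):
    x[i] = x[i - 1] + lam * (D * i - x[i - 1]) + sigma * rng.standard_normal()

t = np.arange(n, dtype=float)
t_train, x_train = t[:n_train], x[:n_train]

def fit_and_score(degree):
    # Polynomial.fit rescales t to [-1, 1], which keeps the degree-20
    # least-squares problem numerically well behaved
    p = Polynomial.fit(t_train, x_train, degree)
    mse_in = np.mean((p(t_train) - x_train) ** 2)           # training window
    mse_out = np.mean((p(t[n_train:]) - x[n_train:]) ** 2)  # five steps ahead
    return mse_in, mse_out

mse_in_2, mse_out_2 = fit_and_score(2)
mse_in_20, mse_out_20 = fit_and_score(20)
print(f"degree  2: in-sample MSE {mse_in_2:.4f}, 5-step-ahead MSE {mse_out_2:.4f}")
print(f"degree 20: in-sample MSE {mse_in_20:.4f}, 5-step-ahead MSE {mse_out_20:.4f}")
```

Because the two models are nested, the degree-20 fit can never be worse in sample; how badly it behaves beyond the window depends on the particular random draw.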

To address the problem, the theory of machine 
learning suggests criteria to constrain models so 
that they fit sample data only partially but, as 
a trade-off, retain some forecasting power. The 
intuitive meaning is the following: The struc¬ 
ture of the data and the sample size dictate the com¬ 
plexity of the laws that can be learned by computer 
algorithms. 

This is a fundamental point. If we have only a 
small sample data set we can learn only simple 
patterns, provided that these patterns indeed 
exist. The theory of machine learning constrains 
the dimensionality of models to make them 
adapt to the sample size and structure. 

In most practical applications, the theory
of machine learning works by introducing a
penalty function that constrains the models. The
penalty function is a function of the size of the
sample and of the complexity of the model. One
compares models by adding the penalty
function to the likelihood function (a definition of
the likelihood function is provided later). In this
way one can obtain an ideal trade-off between
model complexity and forecasting ability.

Several proposals have been made as regards 
the shape of the penalty function. Three criteria 
are in general use: 

• The Akaike Information Criterion (AIC)

• The Bayesian Information Criterion (BIC) of
Schwarz

• The Minimum Description Length principle
of Rissanen
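
For orientation, the first two criteria have simple closed forms. The statement below is the standard textbook one; the symbols are notation introduced here rather than taken from this entry: $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters, and $n$ the sample size.

```latex
\mathrm{AIC} = -2 \log \hat{L} + 2k,
\qquad
\mathrm{BIC} = -2 \log \hat{L} + k \log n
```

In both cases the model with the smallest value is preferred. The first term rewards fit and the second penalizes complexity; since $\log n > 2$ once $n \geq 8$, the BIC penalizes each extra parameter more heavily than the AIC for all but the smallest samples.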

More recently, Vapnik and Chervonenkis
(1974) have developed a full-fledged quantitative
theory of machine learning. While this theory
goes well beyond the scope of this book, the
practical implication of the theory of learning is
important to note: Model complexity must be
constrained as a function of the sample.


Consider that some "learning" appears in
most financial econometric endeavors. For
example, determining the number of lags in an
autoregressive model is a problem typically
solved with methods of learning theory, that
is, by selecting the number of lags that
minimizes the sum of the loss function of the model
plus a penalty function. Ultimately, in modern
computer-based financial econometrics, there is
no clear-cut distinction between a learning
approach and a theory-based a priori approach.
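
Lag selection by "loss plus penalty" can be sketched as follows. The sketch assumes a simulated AR(2) series with illustrative coefficients, OLS estimation, and the BIC in its Gaussian form (sample size times the log of the mean squared residual, plus the number of parameters times the log of the sample size); none of these choices comes from the text.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Simulate an AR(2) series; the coefficients are illustrative assumptions
n, phi1, phi2 = 1000, 0.5, -0.3
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.standard_normal()

max_lag = 8
y = x[max_lag:]   # common estimation sample for every candidate lag
m = len(y)

def rss_for_lag(p):
    # Regressors: a constant plus x[t-1], ..., x[t-p]
    cols = [np.ones(m)] + [x[max_lag - j : n - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rss = {p: rss_for_lag(p) for p in range(1, max_lag + 1)}

# Loss term m*log(RSS/m) always improves with more lags;
# the penalty (p + 1)*log(m) grows with the number of parameters
bic = {p: m * np.log(rss[p] / m) + (p + 1) * np.log(m) for p in rss}
best_lag = min(bic, key=bic.get)
print("BIC-selected number of lags:", best_lag)
```

The residual sum of squares alone would always favor the largest number of lags; it is the penalty that makes the minimum land on a small model.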

Note, however, that the theory of machine 
learning offers no guarantee of success. To see 
this point, let's generate a random walk and 
fit two polynomials of degree 3 and 20, respec¬ 
tively. Figure 2 illustrates the random path and 
the two polynomials. The two polynomials ap¬ 
pear to fit the random path quite well. Fol¬ 
lowing the above discussion, the polynomial 
of order 3 seems to capture some real behav¬ 
ior of the data. But as the data are random, the 
fit is spurious. This is by no means a special 
case. In general, it is often possible to fit mod¬ 
els to sample data even if the data are basically 
unpredictable. 

Figures 1 and 2 are examples of the simplest
cases of model fitting. One might be tempted
to object that fitting a curve with a polynomial
is not a good modeling strategy for prices or
returns. This is true, as one should model a
dynamic DGP. However, fitting a DGP implies a
multivariate curve fitting. For illustration
purposes, we chose the polynomial fitting of a
univariate curve: It is easy to visualize and contains
all the essential elements of model fitting.

Figure 2 Polynomial Fitting of a Random Walk Using Two Polynomials of Degree 3 and 20 Respectively
on a 100-Step Sample

SAMPLE SIZE AND MODEL
COMPLEXITY

The four key conclusions reached thus far are

• Economic data are generally scarce for
statistical estimation given the complexity of their
patterns.

• Economic data are too scarce for sure
statistical estimates.

• The scarcity of data means that the data might
be compatible with many different models.

• There is a trade-off between model
complexity and the size of the data sample.

The last two considerations are critical. To illus¬ 
trate the quantitative trade-off between the size 
of a data sample and model complexity, con¬ 
sider an apparently straightforward case: esti¬ 
mating a correlation matrix. 

It is well known from the theory of random
matrices that the eigenvalues of the correlation
matrix of independent random walks are
distributed according to the following law:

$$\rho(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$$

where Q is the ratio between the number N of 
sample points and the number M of time series. 
Figure 3 illustrates the theoretical distribution 
of eigenvalues for three values of Q: Q = 1.8, 
Q = 4, and Q = 16. 

As can be easily predicted by examining the
above formula, the distribution of eigenvalues
is broader when Q is smaller. The corresponding
$\lambda_{\max}$ is larger for the broader distribution.
The $\lambda_{\max}$ are respectively:

$\lambda_{\max}$ = 3.0463 for Q = 1.8
$\lambda_{\max}$ = 2.2500 for Q = 4
$\lambda_{\max}$ = 1.5625 for Q = 16
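
The three cut-off values can be checked numerically. The sketch below assumes the usual random-matrix expression for the edges of the eigenvalue band, λ_max and λ_min equal to σ²(1 + 1/Q ± 2√(1/Q)) with σ² = 1 for a correlation matrix; this expression is not stated in the text, but it reproduces the three values listed above. The function names are hypothetical.

```python
import math

def mp_bounds(q, sigma2=1.0):
    """Lower and upper edge of the random-matrix eigenvalue band for Q = N/M >= 1."""
    shift = 2.0 * math.sqrt(1.0 / q)
    return sigma2 * (1.0 + 1.0 / q - shift), sigma2 * (1.0 + 1.0 / q + shift)

def mp_density(lam, q, sigma2=1.0):
    """Eigenvalue density rho(lambda); zero outside the band."""
    lam_min, lam_max = mp_bounds(q, sigma2)
    if lam <= lam_min or lam >= lam_max:
        return 0.0
    root = math.sqrt((lam_max - lam) * (lam - lam_min))
    return q / (2.0 * math.pi * sigma2) * root / lam

for q in (1.8, 4, 16):
    lam_min, lam_max = mp_bounds(q)
    print(f"Q = {q}: lambda_min = {lam_min:.4f}, lambda_max = {lam_max:.4f}")

# Midpoint-rule check that the density integrates to roughly one for Q = 4
lam_min, lam_max = mp_bounds(4)
steps = 20000
width = (lam_max - lam_min) / steps
area = sum(mp_density(lam_min + (i + 0.5) * width, 4) for i in range(steps)) * width
print(f"integral of the density for Q = 4: {area:.4f}")
```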

The eigenvalues of a random matrix do not
carry any true correlation information. If we
now compute the eigenvalues of an empirical
correlation matrix of asset returns with a given
Q (i.e., the ratio between number of samples and
the number of series), we find that only a few
eigenvalues carry information as they are
outside the area of pure randomness corresponding
to the Q. In fact, with good approximation,
$\lambda_{\max}$ is the cut-off point that separates meaningful
correlation information from noise. (The
application of random matrices to the estimation
of correlation and covariance matrices is
developed in Plerou, Gopikrishnan, Rosenow, Nunes
Amaral, Guhr, and Stanley [2002].) Therefore, as
the ratio of sample points to the number of asset
prices grows (i.e., we have more points for each
price process) the "noise area" gets smaller.

Figure 3 The Theoretical Distribution of Eigenvalues for Three Values of Q: Q = 1.8, Q = 4, and Q = 16
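
The role of the cut-off can be illustrated on simulated data. The sketch below is an illustration under assumptions, not the MSCI Europe computation discussed next: it uses independent Gaussian series as the pure-noise case and a single common factor with an assumed pairwise correlation of 0.1 as the informative case, with 1,600 observations on 100 series so that Q = 16.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n_obs, n_series = 1600, 100     # Q = 16
q = n_obs / n_series
lam_max = 1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q)  # theoretical noise cut-off

def sorted_corr_eigs(returns):
    corr = np.corrcoef(returns, rowvar=False)  # columns are the series
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

# Case 1: independent series, so there is no true correlation to find
noise = rng.standard_normal((n_obs, n_series))
eigs_noise = sorted_corr_eigs(noise)

# Case 2: one common factor giving a pairwise correlation of about 0.1
rho = 0.1  # assumed value
factor = rng.standard_normal((n_obs, 1))
correlated = np.sqrt(rho) * factor + np.sqrt(1.0 - rho) * noise
eigs_factor = sorted_corr_eigs(correlated)

print("theoretical cut-off:", lam_max)
print("pure noise, eigenvalues above cut-off:", int((eigs_noise > lam_max).sum()))
print("one factor, eigenvalues above cut-off:", int((eigs_factor > lam_max).sum()))
print("one factor, largest eigenvalue:", eigs_factor[0])
```

In the one-factor case the largest eigenvalue stands far above the cut-off, while the bulk of the spectrum remains inside the noise band.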

To show the effects of the ratio Q on the es¬ 
timation of empirical correlation matrices, let's 
compute the correlation matrix for three sets of 
900, 400, and 100 stock prices that appeared in 
the MSCI Europe in a six-year period from De¬ 
cember 1998 to February 2005. The return series 
contain in total 1,611 sample points, each corre¬ 
sponding to a trading day. 

First we compute the correlation matrices. 
The average correlation (excluding the diagonal)
is approximately 10% for the three sets of
100, 400, and 900 stocks. Then we compute the
eigenvalues. The plot of sorted eigenvalues for 
the three samples is shown in Figures 4, 5, and 
6. One can see from these exhibits that when 
the ratio Q is equal to 16 (i.e., we have more 
sample points per stock price process), the plot 
of eigenvalues decays more slowly. 

Now compare the distribution of empirical
eigenvalues with the theoretical cut-off point
$\lambda_{\max}$ that we computed above. The parameter Q
was chosen to approximately represent the
ratios between 1,611 sample points and 100, 400,
and 900 stocks. Results are tabulated in Table 1.
This exhibit shows that the percentage of
meaningful eigenvalues grows as the ratio between
the number of sample points and the number
of processes increases. If we hold the number
of sample points constant (i.e., 1,611) and
increase the number of time series from 100 to
900, a larger percentage of eigenvalues becomes
essentially noise (i.e., they do not carry
information). Obviously the number of meaningful
eigenvalues increases with the number of
series, but, due to loss of information, it does so
more slowly than does the number of series.

Two main conclusions can be drawn from 
Table 1: 

• Meaningful eigenvalues represent a small
percentage of the total, even when Q = 16.

• The ratio of meaningful eigenvalues to the
total grows with Q, but the gain is not linear.

The above considerations apply to estimating
a correlation matrix. As we will see, however,
they carry over, at least qualitatively, to the
estimation of any linear dynamic model. In fact, the
estimation of linear dynamic models is based on
estimating correlation and covariance matrices.

Figure 4 Plot of Eigenvalues for 900 Prices, Q = 1.8

Figure 5 Plot of Eigenvalues for 400 Prices, Q = 4

Figure 6 Plot of Eigenvalues for 100 Prices, Q = 16

Table 1 Comparison of the Distribution of Empirical Eigenvalues with the Theoretical
Cutoff Point for Different Values of Q

Number of      Average       Max          Number of meaningful   Percentage of meaningful
processes      correlation   eigenvalue   eigenvalues            eigenvalues
900; Q = 1.8   10%           118          26                     2.9%
400; Q = 4     9.5%          50           15                     3.8%
100; Q = 16    9.8%          14           6                      6.0%


DANGEROUS PATTERNS OF 
BEHAVIOR 

One of the most serious mistakes that a financial 
modeler can make is to look for rare or unique 
patterns that look profitable in-sample but pro¬ 
duce losses out-of-sample. This mistake is made 
easy by the availability of powerful computers 
that can explore large amounts of data: Any 


large data set contains a huge number of pat¬ 
terns, many of which look very profitable. Oth¬ 
erwise expressed, any large set of data, even 
if randomly generated, can be represented by 
models that appear to produce large profits. 
To see the point, perform the following sim¬ 
ple experiment. Using a good random number 
generator, generate a large number of indepen¬ 
dent random walks with zero drift. In sample, 
these random walks exhibit large profit oppor¬ 
tunities. There are numerous reasons for this. In 
fact, if we perform a sufficiently large number of 
simulations, we will generate a number of paths 
that are arbitrarily close to any path we want. 
Many paths will look autocorrelated and will
be indistinguishable from trend-stationary
processes. In addition, many stochastic trends will
be indistinguishable from deterministic drifts.
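
A minimal version of this experiment is sketched below, assuming 1,000 Gaussian random walks split into an in-sample half and an out-of-sample half of 250 steps each; the sample sizes and the rule of keeping the 20 best in-sample paths are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n_paths, n_in, n_out = 1000, 250, 250

# Zero-drift random walks: i.i.d. N(0, 1) increments, one row per path
steps = rng.standard_normal((n_paths, n_in + n_out))
gain_in = steps[:, :n_in].sum(axis=1)   # in-sample change of each path
gain_out = steps[:, n_in:].sum(axis=1)  # out-of-sample change of each path

# "Discover" the 20 paths with the strongest in-sample upward move
winners = np.argsort(gain_in)[-20:]
print("mean in-sample gain of the winners:    ", gain_in[winners].mean())
print("mean out-of-sample gain of the winners:", gain_out[winners].mean())
print("mean out-of-sample gain of all paths:  ", gain_out.mean())
```

The selected paths show a large in-sample gain purely because they were selected for it. Their out-of-sample increments are independent of the selection, so their expected out-of-sample gain is zero.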

There is nothing surprising in the above phe¬ 
nomena. A stochastic process or a discrete 
time series is formed by all possible paths. 
For example, a trend-stationary process and a 
random walk are formed by the same paths. 
What makes the difference between a trend¬ 
stationary process and a random walk are not 
the paths—which are exactly the same—but the 
probability assignments. Suppose processes are 
discrete, for example because time is discrete 
and prices move by only discrete amounts. Any 
computer simulation is ultimately a discrete 
process, though the granularity of the process is 
very small. In this discrete case, we can assign a 
discrete probability to each path. The difference 
between processes is the probability assigned to 
each path. In a large sample, even low proba¬ 
bility paths will occur, albeit in small numbers. 

In a very large data set, almost any path will 
be approximated by some path in the sample. 
If the computer generates a sufficiently large 
number of random paths, we will come arbi¬ 
trarily close to any given path, including, for 
example, to any path that passes the test for 
trend stationarity. In any large set of price pro¬ 
cesses, one will therefore always find numer¬ 
ous interesting paths, such as cointegrated pairs 
and trend-stationary processes. 

To avoid looking for ephemeral patterns, we 
must stick rigorously to the paradigm of ma¬ 
chine learning and statistical tests. This sounds 
conceptually simple, but it is very difficult to 
do in practice. It means that we have to decide 
the level of confidence that we find acceptable 
and then compute probability distributions for 
the entire sample. This has somewhat counter¬ 
intuitive consequences. We illustrate this point 
using as an example the search for cointegrated 
pairs; the same reasoning applies to any statis¬ 
tical property. 

Suppose that we have to decide whether a 
given pair of time series is cointegrated or not. 
We can use one of the many cointegration tests. 


If the time series are short, no test will be con¬ 
vincing; the longer the time series, the more 
convincing the test. The problem with economic 
data is that no test is really convincing as the 
confidence level is generally in the range of 95% 
or 99%. Whatever confidence level we choose, 
given one or a small number of pairs, we de¬ 
cide the cointegration properties of each pair 
individually. For example, in macroeconomic 
studies where only a few time series are given, 
we decide if a given pair of time series is coin¬ 
tegrated or not by looking at the cointegration 
test for that pair. 

Does having a large number of data series,
for example 500 price time series, require any
change in the testing methodology? The
answer, in fact, is that additional care is required:
In a large data set, for the reasons we outlined
above, any pattern can be approximated. One
has to look at the probability that a pattern will
appear in that data set. In the example of
cointegration, if one finds, say, ten cointegrated pairs
in 500 time series, the question to ask is: What
is the probability that among 500 time series ten
pairs test as cointegrated? Answering this
question is not easy because the properties of pairs
are not independent. In fact, given three
series a, b, and c we can form three distinct pairs
whose cointegration properties are not,
however, mutually independent. This makes
calculations difficult.

To illustrate the above, let us generate a
simulated random walk using the following
formula:

$$X(i) = X(i-1) + \varepsilon(i)$$

$$X(1) = 1$$

where X(i) is a random vector with 500
elements, and the noise term is generated with
a random number generator as 500 independent
normally distributed zero-mean unit-variance
numbers. Now run simulations for 500
steps. Next, eliminate linear trends from each
realization. (Cointegration tests can handle
linear trends. We detrended for clarity of
illustration.) A sample of three typical realizations of
the random walks is illustrated in Figure 7 and
the corresponding residuals after detrending in
Figure 8.

Figure 7 A Sample of Three Typical Realizations of a 500-Step Random Walk with Their Trends

Now run the cointegration test at a 99%
confidence level on each possible pair. In a sample
of 10 simulation runs, we obtain the following
number of pairs that pass the cointegration test:
74, 75, 89, 73, 65, 91, 91, 93, 84, 62. There are in
total

$$\frac{500 \times 499}{2} = 124{,}750 \text{ distinct pairs}$$

Figure 8 The Residuals of the Same Random Walks after Detrending

If cointegration properties of pairs were
independent, given 500 random walks, on
average we should find about 1,247 pairs (1% of the
124,750 distinct pairs) that pass the
cointegration test at the 99% confidence level.
However, cointegration properties of pairs are
not independent for the reasons mentioned
above. This explains why we obtained a smaller
number of pairs than expected. This example
illustrates the usefulness of running Monte Carlo
experiments to determine the number of
cointegrated pairs found in random walks.

If, however, the patterns we are looking for
are all independent, calculations are relatively
straightforward. Suppose we are looking for
stationary series applying an appropriate test
at a 99% confidence level. This means that
a sample random walk has a 1% probability of
passing the test (i.e., of being wrongly accepted
as stationary) and a 99% probability of being
correctly rejected. In this case, the probability
distribution of the number of paths that pass
the stationarity test given a sample of 500
generated random walks is a binomial distribution
with probabilities p = 0.01 and q = 1 − p = 0.99
and mean 5.
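
The binomial calculation can be made explicit with the standard library alone. The sketch below assumes exactly the setting of the paragraph above (500 independent random walks, each with a 1% chance of passing the test); the threshold of ten false positives is an arbitrary example.

```python
from math import comb

n, p = 500, 0.01  # number of random walks, probability of a false "stationary" verdict

# Binomial probabilities of observing exactly k false positives
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
p_at_least_one = 1.0 - pmf[0]
p_ten_or_more = sum(pmf[10:])

print(f"expected number of false positives: {mean:.2f}")
print(f"probability of at least one:        {p_at_least_one:.4f}")
print(f"probability of ten or more:         {p_ten_or_more:.4f}")
```

Finding a handful of "stationary" series among 500 random walks is therefore the expected outcome rather than evidence of a real pattern.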

We apply criteria of this type very often in 
our professional and private lives. For example, 
suppose that an inspector has to decide whether 
to accept or reject a supply of spare parts. The 
inspector knows that on average one part in 100 
is defective. He randomly chooses a part in a lot 
of 100 parts. If the part is defective, he is likely to 
ask for additional tests before accepting the lot. 
Suppose now that he tests 100 parts from 100 
different lots of 100 parts and finds only one 
defective part. He is likely to accept the 100 lots 
because the incidence of faulty parts is what he 
expected it to be, that is, one in 100. The point 
is that we are looking for statistical properties, 
not real identifiable objects. 

A profitable price time series is not a rec¬ 
ognizable object. We find what seems to be a 
profitable time series but we cannot draw any 
conclusion because the level of the "authenticity 
test" of each series is low. When looking at very 
large data sets, we have to make data work for 


us and not against us, examining the entire sam¬ 
ple. For example, consider a strategy known 
as "pair trading." In this strategy, an investor 
selects pairs from a stock universe and main¬ 
tains a market neutral (i.e., zero beta) long-short 
portfolio of several pairs of stocks with a mean- 
reverting spread. When there are imbalances in 
the market causing the spread to diverge, the in¬ 
vestor seeks to determine the reason for the di¬ 
vergence. If the investor believes that the spread 
will revert, he or she takes a position in the two 
stocks to capitalize on the reversion. A modeler 
who would define a pair trading strategy based 
on the cointegrated pair in the previous exam¬ 
ple would be disappointed. Based on extensive 
Monte Carlo simulations to compare the num¬ 
ber of cointegrated pairs among the stocks in the 
S&P 500 index for the period 2001-2004 and in 
computer-generated random walks, the num¬ 
ber of cointegrated pairs we found was slightly 
larger in the real series than in the simulated 
random walks. 

We can conclude that it is always good
practice to test any model or pattern recognition
method against a surrogate random sample
generated with the same statistical
characteristics as the empirical ones. For example, it is
always good practice to test any model and
any strategy intended to find excess returns on
a set of computer-generated random walks. If
the proposed strategy finds profit in
computer-generated random walks, it is highly advisable
to rethink the strategy.

DATA SNOOPING 

Given the scarcity of data and the basically
uncertain nature of any econometric model, it
is generally required to calibrate models on
some data set, the so-called training set, and
test them on another data set, the test set. In
other words, it is necessary to perform an
out-of-sample validation on a separate test set. The
rationale for this procedure is that any
machine-learning process, or even the calibration
mechanism itself, is a heuristic methodology, not
a true discovery process. Models determined
through a machine-learning process must be
checked against the reality of out-of-sample
validation. Failure to do so is referred to as data
snooping, that is, performing training and tests
on the same data set.

Out-of-sample validation is typical of 
machine-learning methods. Learning entails 
models with unbounded capabilities of ap¬ 
proximation constrained by somewhat artificial 
mechanisms such as a penalty function. This 
learning mechanism is often effective but there 
is no guarantee that it will produce a good 
model. Therefore, the learning process is con¬ 
sidered discovery heuristics. The true validation 
test, say the experiments, has to be performed 
on the test set. Needless to say, the test set 
must be large and cover all possible patterns, 
at least in some approximate sense. For exam¬ 
ple, in order to test a trading strategy one would 
need to test data in many different market con¬ 
ditions: with high volatility and low volatil¬ 
ity, in expansionary and recessionary economic 
periods, under different correlation situations, 
and so on. 

Data snooping is not always easy to under¬ 
stand or detect. Suppose that a modeler wants 
to build the DGP of a time series. A DGP is 
often embodied in a set of difference equa¬ 
tions with parameters to be estimated. Suppose 
that four years of data of a set of time series 
are available. A modeler might be tempted to 
use the entire four years to perform a "robust" 
model calibration and to "test" the model on 
the last year. This is an example of data snoop¬ 
ing that might be difficult to recognize and to 
avoid. In fact, one might (erroneously) reason 
as follows. If there is a true DGP, it is more 
likely that it is "discovered" on a four-year 
sample than on shorter samples. If there is a 
true DGP, data snooping is basically innocu¬ 
ous and it is therefore correct to use the entire 
data set. On the other hand, if there is no stable 
DGP, then it does not make sense to calibrate 
models as their coefficients would be basically 
noise. 


This reasoning is wrong. In general, there is
no guarantee that, even if a true DGP exists, a
learning algorithm will learn it. Among the
reasons for learning failure are (1) the slow
convergence of algorithms, which might require more
data than are available, and (2) the possibility
of getting stuck in local optima. However, the
real danger is the possibility that no true DGP
exists. Should this be the case, the learning
algorithm might converge to a false solution or not
converge at all. We illustrated this fact earlier in
this entry where we showed how it is possible
to successfully fit a low dimensionality
polynomial to a randomly generated path.

There are other forms of data snooping.
Suppose that a modeling team works on a sample of
stock price data to find a profitable trading
strategy. Suppose that they respect all of the above
criteria of separation of the training set and the
test set. Different strategies are tried and those
that do not perform are rejected. Though sound
criteria are used, there is still the possibility that
by trial and error the team hits a strategy that
performs well in sample but poorly when
applied in the real world. Another form of hidden
data snooping is when a methodology is finely
calibrated to sample data. Again, there is the
possibility that by trial and error one finds a
calibration parameterization that works well in
sample and poorly in the real world.

There is no sound theoretical way to avoid
this problem ex ante. In practice, the answer is to
separate the sets of training data and test data,
and to decide on the existence of a DGP as a
function of performance on the test data. However,
this type of procedure requires a lot of data.
"Resampling" techniques have been proposed
to alleviate the problem. Intuitively, the idea
behind resampling methods is that a stable DGP
calibrated on any portion of the data should
work on the remaining data. Widely used
resampling techniques include "leave-one-out"
and "bootstrapping." The bootstrap technique
creates surrogate data from the initial sample
data. (The bootstrap is an important technique
but its description goes beyond the scope of this
entry. For a review of bootstrapping, see
Davison and Hinkley [1997].)
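
To give a flavor of how surrogate data are created, the sketch below applies the simplest form of the bootstrap, resampling observations with replacement, to estimate the standard error of a mean return. The return series is simulated and hypothetical, and the i.i.d. resampling scheme is an assumption made here: it destroys any serial dependence, so dependent time series call for variants such as the block bootstrap.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
# Hypothetical daily returns: 250 independent draws, for illustration only
returns = 0.0005 + 0.01 * rng.standard_normal(250)

n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Surrogate sample: draw observation indices with replacement
    idx = rng.integers(0, len(returns), size=len(returns))
    boot_means[b] = returns[idx].mean()

se_boot = boot_means.std(ddof=1)                             # bootstrap standard error
se_formula = returns.std(ddof=1) / np.sqrt(len(returns))     # textbook formula
print(f"bootstrap standard error of the mean: {se_boot:.6f}")
print(f"formula-based standard error:         {se_formula:.6f}")
```

For the mean of independent data the two numbers should be close; the value of the bootstrap lies in statistics for which no simple formula exists.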

Data snooping is a defect of training pro¬ 
cesses which must be controlled but which is 
very difficult to avoid given the size of data 
samples currently available. Suppose samples 
in the range of ten years are available. (Tech¬ 
nically much longer data sets on financial mar¬ 
kets, up to 50 years of price data, are available. 
While useful for some applications, these data 
are useless for most asset management appli¬ 
cations given the changes in the structure of 
the economy.) One can partition these data and 
perform a single test free from data snooping bi¬ 
ases. However, if the test fails, one has to start all 
over again and design a new strategy. The pro¬ 
cess of redesigning the modeling strategy might 
have to be repeated several times over before an 
acceptable solution is found. Inevitably,
repeating the process on the same data includes the
risk of data snooping. The real danger in data
snooping is the possibility that by trial and
error or by optimization, one hits upon a model
that by chance performs well on the sample data
but that will perform poorly in real-world
forecasts. Fabozzi, Focardi, and Ma (2005) explore
at length different ways in which data
snooping and other biases might enter the model
discovery process and propose a methodology to
minimize the risk of biases, as will be explained
in the last section of this entry.


SURVIVORSHIP BIASES AND 
OTHER SAMPLE DEFECTS 

We now examine possible defects of the sample
data themselves. In addition to errors and
missing data, one of the most common (and
dangerous) defects of sample data is the so-called
survivorship bias. The survivorship bias is a
consequence of selecting time series, in
particular asset price time series, based on criteria that
apply at the end of the period. For example,
suppose a sample contains 10 years of price data for
all stocks that are in the S&P 500 today and that
existed for the last 10 years. This sample,
apparently well formed, is, however, biased: The
selection, in fact, is made on the stocks of
companies that are in the S&P 500 today, that is,
those companies that have "survived" in
sufficiently good shape to still be in the S&P 500
aggregate. The bias comes from the fact that
many of the surviving companies successfully
passed through some difficult period.
Surviving the difficulty is a form of reversion to the
mean that produces trading profits. However,
at the moment of the crisis it was impossible
to predict which companies in difficulty would
indeed survive.
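
The mechanism can be reproduced on artificial data. The sketch below assumes 1,000 zero-drift log-price random walks over 2,500 days and a deliberately crude end-of-period rule, keeping the 500 best performers, as a stand-in for "is in the index today"; all of these numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
n_stocks, n_days = 1000, 2500   # roughly ten years of daily data

# Zero-drift log returns: by construction no stock has any edge
log_ret = 0.02 * rng.standard_normal((n_stocks, n_days))
total_log_ret = log_ret.sum(axis=1)

# End-of-period selection rule (an assumption): keep the 500 best performers
survivors = np.argsort(total_log_ret)[-500:]

print("mean total log return, all stocks:    ", total_log_ret.mean())
print("mean total log return, survivors only:", total_log_ret[survivors].mean())
```

A model calibrated on the survivors alone would see a positive average drift that exists only because of how the sample was selected.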

To gauge the importance of the survivorship 
bias, consider a strategy that goes short on a 
fraction of the assets with the highest price and 
long on the corresponding fraction with the 
lowest price. This strategy might appear highly 
profitable in sample. Looking at the behavior 
of this strategy, however, it becomes clear that 
profits are very large in the central region of the 
sample and disappear approaching the present 
day. This behavior should raise flags. Although 
any valid trading strategy will have good and 
bad periods, profit reduction when approach¬ 
ing the present day should command height¬ 
ened attention. 

Avoiding the survivorship bias seems simple 
in principle: It might seem sufficient to base
any sample selection on information available at
the moment the forecast begins, so that no invalid
information enters the strategy prior to trading. However,
the fact that companies are founded, merged, 
and closed plays havoc with simple models. In 
fact, calibrating a simple model requires data of 
assets that exist over the entire training period. 
This in itself introduces a potentially substantial 
training bias. 

A simple model cannot handle processes that 
start or end in the middle of the training pe¬ 
riod. On the other hand, models that take into 
account the foundation or closing of firms can¬ 
not be simple. Consider, for example, a simple 
linear autoregressive model. Any addition or 
deletion of companies introduces a nonlinear¬ 
ity in the model and precludes using standard 
tools such as the OLS method.

There is no ideal solution. Care is required in
estimating possible performance biases conse¬ 
quent to sample biases. Suppose that we make 
a forecast of return processes based on models 
trained on the past three or four years of re¬ 
turns data on the same processes that we want 
to forecast. Clearly there is no data snooping, 
as we use only information available prior to 
forecasting. However, it should be understood 
that we are estimating our models on data that 
contain biases. If the selection of companies to 
forecast is subject to strong criteria, for exam¬ 
ple companies that belong to a major index, it 
is likely that the model will suffer a loss of per¬ 
formance. This is due to the fact that models 
will be trained on spurious past performance. 
If the modeler is constrained to work on a spe¬ 
cific stock selection, for example because he 
has to create an active strategy against a se¬ 
lected benchmark, he might want to consider 
Bayesian techniques to reduce the biases. 

The survivorship bias is not the only possible
bias of sample data. More generally,
any selection of data contains some bias. Some 
of these biases are intentional. For example, 
selecting large caps or small caps introduces 
special behavioral biases that are intentional. 
However, other selection biases are more dif¬ 
ficult to appreciate. In general, any selec¬ 
tion based on belonging to indexes introduces 
index-specific biases in addition to the survivor¬ 
ship bias. Consider that presently thousands of 
indexes are in use—the FTSE alone has created 
some 60,000. Institutional investors and their 
consultants use these indexes to create asset al¬ 
location strategies and then give the indexes to 
asset managers for active management. 

Anyone creating active management strate¬ 
gies based on these indexes should be aware of 
the biases inherent in the indexes when build¬ 
ing their strategies. Data snooping applied to 
carefully crafted stock selection can result in 
poor performance because the asset selection 
process inherent in the index formation process 
can produce very good results in sample; these
results vanish out-of-sample like snow in
the sun.


MOVING TRAINING 
WINDOWS 

Thus far we assumed that the DGP exists as a 
time-invariant model. Can we also assume that 
the DGP varies and that it can be estimated on 
a moving window? If yes, how can it be tested? 
These are complex questions that do not ad¬ 
mit an easy answer. It is often assumed that 
the economy undergoes "structural breaks" or 
"regime shifts" (i.e., that the economy under¬ 
goes discrete changes at fixed or random time 
points). 

If the economy is indeed subject to breaks or 
shifts and the time between breaks is long, mod¬ 
els would perform well for a while and then, at 
the point of the break, performance would de¬ 
grade until a new model is learned. If regime 
changes are frequent and the interval between 
the changes short, one could use a model that 
includes the changes. The result is typically a 
nonlinear model such as the Markov-switching 
models. Estimating models of this type is very 
onerous given the nonlinearities inherent in the 
model and the long training period required. 

There is, however, another possibility that is 
common in modeling. Consider a model that 
has a defined structure, for example a linear 
VAR model, but whose coefficients are allowed 
to change in time with the moving of the train¬ 
ing window. In practice, most models used 
work in this way as they are periodically re¬ 
calibrated. The rationale of this strategy is that 
models are assumed to be approximate and suf¬ 
ficiently stable for only short periods of time. 
Clearly there is a trade-off between the advan¬ 
tage of using long training sets and the disad¬ 
vantage that a long training set includes too 
much change. 
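
Periodic recalibration on a moving window can be sketched as follows. The example assumes an AR(1) process whose coefficient drifts slowly from 0.2 to 0.8, a 250-step window, and OLS without intercept; these are illustrative choices, not a recommendation for any particular window length.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
n, window = 1000, 250

# AR(1) whose coefficient drifts slowly over time (an illustrative assumption)
phi_true = np.linspace(0.2, 0.8, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true[t] * x[t - 1] + rng.standard_normal()

def ar1_ols(segment):
    # OLS slope of x[t] on x[t-1], without intercept
    lagged, current = segment[:-1], segment[1:]
    return float(lagged @ current / (lagged @ lagged))

# Re-estimate the coefficient on every window ending at `end` (exclusive)
ends = np.arange(window, n + 1)
phi_hat = np.array([ar1_ols(x[end - window:end]) for end in ends])

print("first window estimate:", phi_hat[0])
print("last window estimate: ", phi_hat[-1])
```

A shorter window tracks the drift more closely but yields noisier estimates; a longer window smooths the noise but averages over coefficients that are no longer current.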

Intuitively, if model coefficients change 
rapidly, this means that the model coefficients 
are noisy and do not carry genuine information. 
We have seen an example above in the simple 
case of estimating a correlation matrix. There¬ 
fore, it is not sufficient to simply reestimate the 
model: One must determine how to separate the 
noise from the information in the coefficients. 
For example, a large VAR model used to represent
prices or returns will generally be unstable.
It would not make sense to reestimate the model 
frequently; one should first reduce model di¬ 
mensionality with, for example, factor analysis. 
Once model dimensionality has been reduced, 
coefficients should change slowly. If they con¬ 
tinue to change rapidly, the model structure 
cannot be considered appropriate. One might, 
for example, have ignored fat tails or essential 
nonlinearities. 

How can we quantitatively estimate an ac¬ 
ceptable rate of change for model coefficients? 
Are we introducing a special form of data 
snooping in calibrating the training window? 
Clearly the answer depends on the nature of the 
true DGP—assuming that one exists. It is easy to 
construct artificially DGPs that change slowly 
in time so that the learning process can progres¬ 
sively adapt to them. It is also easy to construct 
true DGPs that will play havoc with any method 
based on a moving training window. For exam¬ 
ple, if one constructs a linear model where co¬ 
efficients change systematically at a frequency 
comparable with a minimum training window, 
it will not be possible to estimate the process as 
a linear model estimated on a moving window. 

Calibrating a training window is clearly an 
empirical question. However, it is easy to see 
that calibration can introduce a subtle form of 
data snooping. Suppose a rather long set of time 
series is given, say six to eight years, and that 
one selects a family of models to capture the 
DGP of the series and to build an investment 
strategy. Testing the strategy calls for calibrating 
a moving window. Different moving windows 
are tested. Even if training and test data are kept 
separate so that forecasts are never performed 
on the training data, clearly the methodology is 
tested on the same data on which the models 
are learned. 

Other problems with data snooping stem 
from the psychology of modeling. A key pre¬ 
cept that helps to avoid biases is the following: 
Modeling hunches should be based on theoret¬ 
ical reasoning and not on looking at the data. 


This statement might seem inimical to an em¬ 
pirical enterprise, an example of the danger of 
"clear reasoning" mentioned above. Still, it is 
true that by looking at data too long one might 
develop hunches that are sample-specific. There 
is some tension between looking at empirical 
data to discover how they behave and avoiding
the capture of the idiosyncratic behavior of the
available data.

In his best-seller Chaos: Making a New Sci¬ 
ence, James Gleick (1987) reports that one of the 
initiators of chaos theory used to spend long 
hours flying planes (at his own expense) just 
to contemplate clouds to develop a feeling for 
their chaotic movement. Obviously there is no 
danger of data snooping in this case as there 
are plenty of clouds on which any modeling 
idea can be tested. In other cases, important 
discoveries have been made working on rel¬ 
atively small data samples. The 20th-century 
English hydrologist Harold Hurst developed 
his ideas of rescaled range analysis from the 
yearly behavior of the Nile River, approxi¬ 
mately 500 years of sample data, not a huge data 
sample. 

Clearly simplicity (i.e., having only a small 
number of parameters to calibrate) is a virtue 
in modeling. A simple model that works well 
should be favored over a complex model that 
might produce unpredictable results. Nonlin¬ 
ear models in particular are always subject to 
the danger of unpredictable chaotic behavior. 
It was a surprising discovery that even simple 
maps originate highly complex behavior. The 
conclusion is that every step of the discovery 
process has to be checked for empirical, theo¬ 
retical, and logical consistency. 

MODEL RISK 

As we have seen above, any model choice 
and estimation process might result in biases 
and poor performance. In other words, any 
model selection process is subject to model risk. 
One might well ask if it is possible to mitigate 
model risk. In statistics, there is a long tradition,
initiated by the 18th-century English mathematician
Thomas Bayes, of considering
uncertain not only individual outcomes but 
the probability distribution itself. It is therefore 
natural to see if ideas from Bayesian statistics 
and related concepts could be applied to 
mitigate model risk. 

A simple idea that is widely used in practice 
is to take the average of different models. This 
idea can take different forms. Suppose that we 
have to estimate a variance-covariance matrix. 
It makes sense to take radically different estimates,
such as noisy empirical estimates and
capital asset pricing model (CAPM) estimates
that only consider covariances with the market
portfolio, and average them. Averaging is done with
the principle of shrinkage; that is, one does not
form a pure average but weights the two matrices
with weights $\alpha$ and $1 - \alpha$, choosing $\alpha$
according to some optimality principle. This idea can
be extended to dynamic models, weighting all 
coefficients in a model with a probability distri¬ 
bution. Here we want to make some additional 
qualitative considerations that lead to strategies 
in model selection. 
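
A minimal sketch of covariance shrinkage follows. It assumes a single-index target in which off-diagonal covariances come only from market betas (one common way to build a CAPM-style estimate), a fixed weight of 0.5 instead of an optimally chosen one, and made-up return series. The function names are hypothetical.

```python
def sample_cov(returns):
    """Sample covariance matrix of several return series (one list per asset)."""
    n_obs = len(returns[0])
    k = len(returns)
    means = [sum(r) / n_obs for r in returns]
    return [[sum((returns[i][t] - means[i]) * (returns[j][t] - means[j])
                 for t in range(n_obs)) / (n_obs - 1)
             for j in range(k)] for i in range(k)]

def single_index_target(returns, market):
    """Structured estimate: covariances between assets arise only via the market."""
    cov = sample_cov(returns + [market])
    k = len(returns)
    var_m = cov[k][k]
    betas = [cov[i][k] / var_m for i in range(k)]
    # Keep sample variances on the diagonal, model covariances elsewhere.
    return [[cov[i][i] if i == j else betas[i] * betas[j] * var_m
             for j in range(k)] for i in range(k)]

def shrink(sample, target, alpha):
    """Weighted average: alpha * target + (1 - alpha) * sample."""
    k = len(sample)
    return [[alpha * target[i][j] + (1.0 - alpha) * sample[i][j]
             for j in range(k)] for i in range(k)]

# Made-up return series, for illustration only.
market = [0.010, -0.020, 0.015, 0.000, -0.005, 0.020]
asset_a = [0.012, -0.025, 0.020, 0.003, -0.010, 0.018]
asset_b = [0.004, -0.010, 0.009, -0.006, 0.002, 0.015]

sample = sample_cov([asset_a, asset_b])
target = single_index_target([asset_a, asset_b], market)
blended = shrink(sample, target, alpha=0.5)   # alpha = 0.5 is arbitrary here
print(blended)
```

In practice the weight would be chosen by an optimality criterion, which this sketch does not implement.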

There are two principal reasons for applying 
model risk mitigation. First, we might be uncer¬ 
tain as to which model is best, and so mitigate 
risk by diversification. Second, perhaps more 
cogent, we might believe that different models 
will perform differently under different circum¬ 
stances. By averaging, we hope to reduce the 
volatility of our forecasts. It should be clear that 
averaging model results or working to produce 
an average model (i.e., averaging coefficients) 
are two different techniques. The level of diffi¬ 
culty involved is also different. 

Averaging results is a simple matter. One es¬ 
timates different models with different tech¬ 
niques, makes forecasts and then averages the 
forecasts. This simple idea can be extended to 
different contexts. For example, in rating stocks 
one might want to do an exponential averaging 
over past ratings, so that the proposed rating 
today is an exponential average of the model 
rating today and model ratings in the past. 


Obviously parameters must be set correctly, 
which again forces a careful analysis of possible 
data snooping biases. Whatever the averaging 
process one uses, the methodology should be 
carefully checked for statistical consistency. For 
example, one obtains quite different results ap¬ 
plying methodologies based on averaging to 
stationary or nonstationary processes. The key 
principle is that averaging is used to eliminate 
noise, not genuine information. 
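
Both forms of averaging results can be sketched in a few lines. The forecasts, the ratings, and the decay parameter below are made-up illustrations, and the function names are hypothetical.

```python
def average_forecasts(forecasts):
    """Equal-weight average of forecasts produced by different models."""
    return sum(forecasts) / len(forecasts)

def exponential_average(ratings, decay):
    """Exponentially averaged ratings, oldest first.

    Each smoothed value is decay * previous smoothed value
    plus (1 - decay) * the current model rating.
    """
    smoothed = ratings[0]
    out = [smoothed]
    for rating in ratings[1:]:
        smoothed = decay * smoothed + (1.0 - decay) * rating
        out.append(smoothed)
    return out

# Three hypothetical models forecast the same return.
combined = average_forecasts([0.02, 0.04, 0.03])

# A hypothetical rating history; a decay of 0.7 is an arbitrary choice
# that would itself need to be checked for data snooping.
proposed = exponential_average([3.0, 3.0, 5.0, 4.0], decay=0.7)
print(combined, proposed[-1])
```

A larger decay gives past ratings more weight and therefore a smoother, slower-moving proposed rating.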

Averaging models is more difficult than av¬ 
eraging results. In this case, the final result is a 
single model, which is, in a sense, the average of 
other models. Shrinkage of the covariance ma¬ 
trix is a simple example of averaging models. 


MODEL SELECTION IN A 
NUTSHELL 

It is now time to turn all the caveats into some 
positive approach to model selection. As re¬ 
marked in Fabozzi, Focardi, and Ma (2005), 
any process of model selection must start with 
strong economic intuition. Data mining and 
machine learning alone are unlikely to produce 
significant positive results. The possibility that 
scientific discovery, and any creative process 
in general, can be "outsourced" to computers 
is still far from today's technological reality. A 
number of experimental artificial intelligence 
(AI) programs have indeed shown the ability 
to "discover" scientific laws. For example, the 
program KAM developed by Yip (1989) is able 
to analyze nonlinear dynamic patterns and the 
program TETRAD developed at Carnegie Mel¬ 
lon is able to discover causal relationships in 
data (see Glymour, Scheines, Spirtes, and Kelly,
1987). However, practical applications of ma¬ 
chine intelligence use AI as a tool to help per¬ 
form specific tasks. 

Economic intuition clearly entails an element 
of human creativity. As in any other scientific 
and technological endeavor, it is inherently de¬ 
pendent on individual abilities. Is there a body 
of true, shared science that any modeler can 
use? Or do modelers have to content themselves
with only partial and uncertain findings
reported in the literature? As of the writing of 
this book, the answer is probably a bit of both. 

One would have a hard time identifying eco¬ 
nomic laws that have the status of true scientific 
laws. Principles such as the absence of arbitrage 
are probably what comes closest to a true sci¬ 
entific law but are not, per se, very useful in 
finding, say, profitable trading strategies. Most 
economic findings are of an uncertain nature 
and are conditional on the structure of the econ¬ 
omy or the markets. 

It is fair to say that economic intuition is based 
on a number of broad economic principles plus 
a set of findings of an uncertain and local nature. 
Economic findings are statistically validated on 
a limited sample and probably hold only for a 
finite time span. Consider, for example, findings 
such as volatility clustering. One might claim 
that volatility clustering is ubiquitous and that 
it holds for every market. In a broad sense this 
is true. However, no volatility clustering model 
can claim the status of a law of nature as all 
volatility clustering models fail to explain some 
essential fact. 

It is often argued that profitable investment 
strategies can be based only on secret propri¬ 
etary discoveries. This is probably true but its 
importance should not be exaggerated. Secrecy 
is typically inimical to knowledge building. Se¬ 
crets are also difficult to keep. Historically, the 
largest secret knowledge-building endeavors 
were related to military efforts. Some of these 
efforts were truly gigantic, such as the Manhat¬ 
tan Project to develop the first atomic bomb. 
Industrial projects of a non-military nature are 
rarely based on a truly scientific breakthrough. 
They typically exploit existing knowledge. 

Financial econometrics is probably no excep¬ 
tion. Proprietary techniques are, in most cases, 
the application of more or less shared knowl¬ 
edge. There is no record of major economic 
breakthroughs made in secrecy by investment 
teams. Some firms have advantages in terms 
of data. Custodian banks, for example, can ex¬ 


ploit data on economic flows that are not avail¬ 
able to (or in any case are very expensive for) 
other entities. Until the recent past, availability 
of computing power was also a major advan¬ 
tage, reserved to only the biggest Wall Street 
firms; however, computing power is now a 
commodity. 

As a consequence, it is fair to say that eco¬ 
nomic intuition can be based on a vast amount 
of shared knowledge plus some proprietary dis¬ 
covery or interpretation. In the last 25 years, a 
number of computer methodologies were ex¬ 
perimented with in the hope of discovering po¬ 
tentially important sources of profits. Among 
the most fascinating of these were nonlinear dy¬ 
namics and chaos theory, as well as neural net¬ 
works and genetic algorithms. None has lived 
up to initial expectations. With the maturing of 
techniques, one discovers that many new pro¬ 
posals are only a different language for exist¬ 
ing ideas. In other cases, there is a substantial 
equivalence between theories. 

After using intuition to develop an ex ante 
hypothesis, the process of model selection and 
calibration begins in earnest. This implies se¬ 
lecting a sample free from biases and deter¬ 
mining a quality-control methodology. In the 
production phase, an independent risk control 
mechanism will be essential. A key point is 
that the discovery process should be linear. If 
at any point the development process does not 
meet the quality standards, one should resist 
the temptation of adjusting parameters and go 
back to develop new economic intuition. 

This process implies that there is plenty of eco¬ 
nomic intuition to work on. The modeler must 
have many ideas to develop. Ideas might range 
from the intuition that certain market segments 
have some specific behavior to the discovery 
that there are specific patterns of behavior with 
unexploited opportunities. In some cases it 
will be the application of ideas that are well 
known but have never been applied on a large 
scale. 

A special feature of the model selection pro¬ 
cess is the level of uncertainty and noise. 


Figure 9 Process of Quantitative Research and Investment Strategy
Source: Fabozzi, Focardi, and Ma (2005, p. 73) 


Models capture small amounts of informa¬ 
tion in a vast "sea of noise." Models are 
always uncertain, and so is their potential 
longevity. The psychology of discovery plays 
an important role. These considerations suggest 
the adoption of a rigorous objective research 
methodology. Figure 9 illustrates the work 
flow for a sound process of discovery of prof¬ 
itable strategies. (For a further discussion, see 
Fabozzi, Focardi, and Ma (2005).) 

A modeler working in financial econometrics 
is always confronted with the risk of finding an 
artifact that does not, in reality, exist. And, as we 
have seen, paradoxically one cannot look too 
hard at the data; this risks introducing biases 
formed by available but insufficient data sets. 
Even trying too many possible solutions, one 
risks falling into the trap of data snooping. 


KEY POINTS 

• Model selection in financial econometrics requires
a blend of theory, creativity, and machine
learning.

• The machine-learning approach starts with a
set of empirical data that we want to explain.
Data are explained by a family of models that
include an unbounded number of parameters
and are able to fit data with arbitrary
precision.

• There is a trade-off between model complexity
and the size of the data sample. To implement
this trade-off, ensuring that models
have forecasting power, the fitting of sample
data is constrained to avoid fitting noise.
Constraints are embodied in criteria such as
the Akaike Information Criterion (AIC) or the
Bayesian Information Criterion (BIC).

• Economic data are generally scarce given the
complexity of their patterns. This scarcity in¬ 
troduces uncertainty as regards our statisti¬ 
cal estimates. It means that the data might be 
compatible with many different models with 
the same level of statistical confidence. 

• A serious mistake in model selection is to 
look for models that fit rare or unique pat¬ 
terns; such patterns are purely random and 
lack predictive power. 

• Another mistake in model selection is data 
snooping, that is, fitting models to the same 
data that we want to explain. A sound 
model selection approach calls for a sepa¬ 
ration of sample data and test data: Models 
are fitted to sample data and tested on test 
data. 

• Because data are scarce, techniques have been 
devised to make optimal use of data; perhaps 
the most widely used of such techniques is 
bootstrapping. 

• Financial data are also subject to "survivor¬ 
ship bias," that is, data are selected using 
criteria known only a posteriori, for example 
companies that are presently in the S&P 500. 
Survivorship bias induces biases in models 
and results in forecasting errors. 

• Model risk is the risk that models are subject 
to forecasting errors in real data. Techniques 
to mitigate model risk include Bayesian tech¬ 
niques, averaging/shrinkage, and random 
coefficient models. 

• A sound model selection methodology includes
strong theoretical considerations, the
rigorous separation of sample and testing
data, and discipline to avoid data snooping.

Abstract: Practical applications of financial models require a proper assessment of the model risk
due to uncertainty of the model parameters. Methods of the probabilistic decision theory achieve 
this objective. Probabilistic decision making starts from the Bayesian inference process, which sup¬ 
plies the posterior distribution of parameters. Bayesian incorporation of priors, or opinions, which 
influence posterior confidence intervals for the model parameters, is indispensable in real-world 
financial applications. Then, the utility function is used to evaluate practical implications of uncer¬ 
tainty of parameters by comparing the relative expected values of differing decisions. Probabilistic 
decision making involves computer simulations in all realistic situations. Still, a complete analytical 
treatment is possible in simple cases. 


Practical applications of financial models re¬ 
quire their parameters to be given concrete 
numerical values. These values are typically 
fitted to empirical data to ensure that the 
model predictions match historical observa¬ 
tions. Parameter values obtained by such fitting 
procedures never propagate into the future un¬ 
changed: Tracing the model's steps back in time, 
we find that its parameters are always more or 
less in error. The convention is that predictions 
made by the model are better if its parameters 
are known with better precision. 

Thus, financial models are always in error—to 
an extent. Additional variability of actual out¬ 
comes due to models themselves, or model risks, 
can be loosely associated with Knightian un¬ 
certainty. Methods of Bayesian inference estimate 


the extent of this uncertainty, whereas the utility 
theory helps evaluate relative costs of decisions 
made under this uncertainty. Probabilistic deci¬ 
sion theory, which combines Bayesian inference 
with the concept of utility, is the natural and 
powerful tool for handling intrinsic risks of fi¬ 
nancial models. The purpose of this chapter is 
to demonstrate how it works in practice. 


AN OUTLINE OF 
PROBABILISTIC DECISION
THEORY 

As McKay (2008) cleverly puts it, probabilistic 
decision theory is trivial—apart from compu¬ 
tational details. It has its roots in the Bayesian 
inference and in the concept of the utility, or the
loss, function. Bayesian inference with its pure 
probabilistic methods is now gaining its long- 
deserved position in financial applications. 

The utility function $U : d \to V$ that maps the
outcomes of possible decisions $d$ onto the value
space (or, conversely, the cost space) $V$ is a concept
that embodies personal choice and individual
risk preferences. In its simplest form,
the cost space is one-dimensional. This makes 
it possible to order decisions by their costs. The 
decision that has a minimum cost (or a maxi¬ 
mum value) is the best decision in the sense of 
the utility function U. 

We will proceed to the formulation of the 
probabilistic decision-making theory according 
to Jaynes (2003) and McKay (2008). If $E(\cdot)$ is the
expectation, $d$ is the decision, $U(d)$ is the utility
function of the decision, $\theta$ is the probable future
state of the world, and $P(\theta, d)$ is the probability
of $\theta$, possibly influenced by the decision, then
the optimal decision that maximizes the expectation
of the utility function is

$$d^{*} = \arg\max_{d} \left\{ E(U(d)) = \int d\theta \, U(\theta, d) \, P(\theta, d) \right\}$$
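
The rule of choosing the decision with the highest expected utility can be sketched numerically. The setup below is an assumption made for illustration: the state of the world is the next-period return of a risky asset with a normal distribution that does not depend on the decision, the decision is the fraction of capital invested, and the utility is exponential with a hypothetical risk tolerance `ETA`. The integral over states is replaced by a sum on a fine grid.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def expected_utility(decision, utility, states, weights):
    """Discretized integral of U(theta, d) * P(theta) over the states theta."""
    return sum(w * utility(theta, decision) for theta, w in zip(states, weights))

def best_decision(decisions, utility, states, weights):
    """Return the candidate decision with the highest expected utility."""
    return max(decisions,
               key=lambda d: expected_utility(d, utility, states, weights))

# Hypothetical numbers: mean return, volatility, and risk tolerance.
MEAN, STD, ETA = 0.05, 0.20, 0.4
N_STEPS = 4000
step = 16.0 * STD / N_STEPS
states = [MEAN - 8.0 * STD + i * step for i in range(N_STEPS + 1)]
weights = [normal_pdf(theta, MEAN, STD) * step for theta in states]

def utility(theta, d):
    """Exponential utility of the return d * theta earned on the invested fraction."""
    return -math.exp(-d * theta / ETA)

decisions = [i / 10 for i in range(11)]   # invest 0%, 10%, ..., 100%
best = best_decision(decisions, utility, states, weights)
print("best fraction:", best)
```

With a nonsmooth utility or a state distribution known only through samples, the same `best_decision` loop applies with the grid replaced by simulated states and equal weights.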

In exact sciences, the states of the world $\theta$ are
represented by objective quantities such as tem¬ 
perature, energy density, barometric pressure, 
acidity, and the like. Measurements of these 
quantities are subject to errors whose distribu¬ 
tion is often fairly well known from the theory 
of the underlying physical process. For exam¬ 
ple, in electronics, the probability of an error of 
a weak signal is closely linked to the ambient 
temperature, which is an objective and measur¬ 
able quantity. In engineering the contribution of 
side factors can often be accounted for and con¬ 
trolled for to a great degree. The existence of the 
underlying theory capable of quantitative de¬ 
scription of the noise and other factors greatly 
simplifies decision making under uncertainty 
in engineering and in other exact sciences in 
comparison with financial applications. 

It is customary to employ the same reasoning 
in finance. When we talk about "more precise 


prediction of volatility" or "an accurate correla¬ 
tion coefficient" we implicitly assume that these 
quantities and parameters in finance are objec¬ 
tive. They are not. Not unless we supply an 
underlying micro-model derived from the first 
principles, as we routinely do in exact sciences. 
In contrast, states of the world $\theta$ in finance are
not inexact measurements of some "true quanti¬ 
ties" linked to natural phenomena. Rather, they 
are mental constructs, which help us reason 
about financial phenomena—with more or less 
success. In financial observations, controlling 
for other factors is not possible, so the concept of 
ceteris paribus does not exist in nontrivial cases 
of any practical significance. It is better to think 
about states of the world in financial applica¬ 
tions as relatively stable properties of markets 
and financial instruments. Depending on cir¬ 
cumstances, such mental concepts as volatility, 
correlations, liquidity, expected time to default, 
and so on can be regarded as states of the world 
in finance. 

States $\theta$ are functions of the model employed,
$\theta = \theta(M)$. Given the set of observations $Y$ and
subjective priors $I$, each state $\theta$ is assigned a
probability:

$$P(\theta, d) = P(\theta(M), d \mid Y, I)$$

Being a function of the model, the data,
prior beliefs, and, possibly, the decision, the
probability of the state $\theta$ encapsulates all that
is known to be relevant about the phenomenon 
under consideration. 

Probabilistic inference, apart from very special
cases, is often tractable only by computer
techniques: $P(\theta \mid Y)$ has no analytical representation
and must ultimately be sampled from the
data.

The utility function $U(\theta, d)$ introduces the cost
(or utility) of each decision in each state of 
the world. In academic research, one typically 
chooses a smooth and convex utility function. 
This should not necessarily be the case in the 
real world of financial applications where var¬ 
ious smooth and nonsmooth constraints must 
be satisfied, such as risk tolerance, tax considerations,
and strict and soft budget constraints.

Note that except for the observable data, all 
other components in the probabilistic decision¬ 
making process are user-dependent—the 
model, the beliefs, and the utility preferences. 
In the world of subjective views, there is no uni¬ 
versal truth, there are no unconditionally good 
or poor decisions. All decisions are ultimately 
conditional on personal preferences. 

Let's consider how it works in two simple fi¬ 
nancial applications: risk management of a sim¬ 
ple portfolio and valuation of a risky bond. 


MODEL RISK OF A SIMPLE 
PORTFOLIO 

A portfolio manager considers creating an in¬ 
vestment vehicle based on the instrument Y. 
The portfolio manager's objective is to extract 
as much idiosyncratic alpha as possible from Y 
while reducing the risks associated with the fac¬ 
tor X. Instruments highly correlated with X are 
available for short selling, or instruments highly 
negatively correlated with X are available for 
purchase. There are costs associated with these 
actions. The portfolio manager has an amount 
of capital equal to C and access to an abundant 
and relatively low-risk security Z, which can be 
used to preserve capital. The objective is to meet 
investment goals G(T), which include return on 
capital and risk parameters over a definite time 
horizon T. 

The portfolio manager's decisions are based 
on prior beliefs and the data. The portfolio man¬ 
ager begins splitting capital among X, Y, and Z 
such that 

$$C = C_X + C_Y + C_Z$$

The allocation of capital is determined by the
optimization of the utility function given by:

$$C_X, C_Y, C_Z = \arg\max \left( E\left( U(C(T) - C) \right) \right)$$

Expectations of future returns depend on the 
model parameters. In the Bayesian decision 


framework, the distributions of these param¬ 
eters are important: 

1. Distribution of future returns of X. 

2. Uncertainty of knowledge about how Y and 
X are related. 

3. Distribution of idiosyncratic risk of Y after 
Y's relationship to X is accounted for. 

4. Uncertainty of expectations about future 
alpha. 

In the list, the first risk can be understood as 
the true risk; the last three risks are the model 
risks or uncertainty. 

Consider the model that links contemporaneous
data $y_t$ and $x_t$ in a linear fashion:

$$y_t = \beta x_t + \varepsilon_t$$

This model is a simplification of the industry- 
standard factor risk model and is akin to that 
used in the capital asset pricing theory (Sharpe, 
1964), where a similar relationship is defined 
implicitly, or Fama and French (1992), where 
several factors are used. 

The probability of observing the datum $y_t$
given the unknown parameter of the model $\beta$
is

$$P(y_t \mid \beta x_t) = P(y_t - \beta x_t) = P(\varepsilon_t)$$

It is customary to select a normal model of
the idiosyncratic noise, $P(\varepsilon_t) \propto N(\mu, \sigma)$, as it is
a well-behaved distribution that falls off very
fast and for this reason has all its moments
well defined. This, in turn, assists in
obtaining clear analytical results with helpful
illustrative properties.

One needs to remember, however, that real fi¬ 
nancial noise is neither normal, nor log-normal: 
It has fat tails, which can be so poorly behav¬ 
ing that the distribution may not even have its 
first moment well defined. In the probabilistic 
decision framework, it is almost never possi¬ 
ble to obtain a neat analytical expression for 
the final result. Consequently, the advantages 
of the normally distributed noise fade in com¬ 
parison with more realistic models. Another ad¬ 
vantage of the probabilistic framework is that 
Figure 1 Distribution of $\beta$ of Daily Price Changes over Three Years for Microsoft Corporation (MSFT)
and the S&P 500 ETF (SPY)


one can easily compare evidence in favor of or 
against any conceivable model. In the presentation
here, we retain the normality of the residual
noise, bearing in mind that it is used for the sole 
purpose of illustrating the main idea. 

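Assuming that the noise values are independent
and identically distributed, which again is not a
requirement of probabilistic decision theory, the
probability of observing the data set consisting
of $N$ points $x_t, y_t$, $t = 1, \ldots, N$, is

$$P(Y \mid \beta X) = \prod_{t} P(y_t \mid \beta x_t) = \prod_{t} P(y_t - \beta x_t) = \prod_{t} P(\varepsilon_t)$$

It is easier to see the properties of the likelihood
function by taking the logarithm:

$$\log P(Y \mid \beta X) = -\frac{1}{2\sigma^2} \sum_{t} (y_t - \beta x_t - \mu)^2 - \frac{N}{2} \log \sigma^2 + \text{const}$$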

As a function of $\beta$, the log-likelihood attains a
maximum at the same point where the ordinary
least squares (OLS) method finds its optimum
value, $\beta = \beta_{OLS}$. In contrast to OLS, which
boils down all the available data to one number
that is then taken as a real objective quantity,
the probabilistic framework retains more
information about the relationship between $Y$
and $X$, preserving it in the distribution
$P(\beta \mid XY)$.

In Figure 1 we show the distribution of $\beta$,
$P(\beta \mid XY)$, when the dependent instrument $Y$ is
the daily change in the price of Microsoft Corporation
stock and the independent instrument
$X$ is the daily price change of the exchange-
traded fund SPY corresponding to the Standard
& Poor's 500 index. Three years of daily data are
used in the estimates of $P(\beta \mid XY)$. In Figure 2
the same amount of data is used to estimate
$P(\beta \mid XY)$ when $Y$ is the daily change in the
price of the stock of a natural resource company,
the Mosaic Company, and $X$ is, again,
the set of contemporaneous daily price changes
in SPY.
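
A distribution of the slope of this kind can be approximated on a grid. The sketch below is not the computation behind the figures: it uses simulated data in place of market prices, assumes zero-mean normal noise with known volatility, and puts a flat prior on the slope. All numerical values and names are hypothetical.

```python
import math
import random

# Simulated daily changes (assumptions): true slope, noise level, sample size.
random.seed(1)
TRUE_BETA, SIGMA, N_OBS = 1.2, 0.01, 250
xs = [random.gauss(0.0, 0.01) for _ in range(N_OBS)]
ys = [TRUE_BETA * x + random.gauss(0.0, SIGMA) for x in xs]

def log_likelihood(beta, xs, ys, sigma):
    """Gaussian log-likelihood of the slope, with additive constants dropped."""
    sse = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return -sse / (2.0 * sigma ** 2)

# Flat prior on a grid of slopes from 0 to 2.4, so posterior ~ likelihood.
grid = [i / 1000 for i in range(2401)]
log_post = [log_likelihood(b, xs, ys, SIGMA) for b in grid]
peak = max(log_post)                      # subtract the peak to avoid underflow
unnormalized = [math.exp(v - peak) for v in log_post]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]   # probability mass per grid point

post_mean = sum(b * p for b, p in zip(grid, posterior))
post_std = math.sqrt(sum((b - post_mean) ** 2 * p
                         for b, p in zip(grid, posterior)))

# The OLS slope through the origin sits at the peak of the posterior.
sxx = sum(x * x for x in xs)
beta_ols = sum(x * y for x, y in zip(xs, ys)) / sxx
print("OLS slope:", beta_ols)
print("posterior mean and std:", post_mean, post_std)
```

OLS reports only the location of the peak; the grid keeps the whole distribution, including its width.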

Having obtained distributions of the model
parameters $\beta$, $\mu$, and $\sigma$ from the data, the portfolio
manager blends likelihoods with opinions
about the distribution of the residual returns.
The portfolio manager's alpha model is that
the expectation of daily returns of $Y$ is $\mu_0$
with the confidence band $\pm\sigma_0$: $\mu \sim N(\mu_0, \sigma_0)$.
Combined with subjective opinions, the
Figure 2 Distribution of $\beta$ of Daily Price Changes over Three Years for the Mosaic Company (MOS)
and the S&P 500 ETF


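idiosyncratic distribution is, again, normal:

$$\prod_{t} P(\varepsilon_t) \propto N(\tilde{\mu}, \tilde{\sigma}), \qquad \tilde{\mu} = \frac{\dfrac{\mu_0}{\sigma_0^2} + \dfrac{\hat{\mu} N}{\sigma^2}}{\dfrac{1}{\sigma_0^2} + \dfrac{N}{\sigma^2}}$$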


In order to overcome the evidence extracted
from the data and given by $\hat{\mu}$, the portfolio
manager's confidence must be greater than
the confidence range of the data. When the portfolio
manager's confidence is high, that is, when
$\sigma_0 \ll \sigma/\sqrt{N}$, the posterior expectation of alpha is
governed by the portfolio manager's prognosis.
In the opposite case, the data are trusted more
than the portfolio manager's judgment.
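
The blending of a view with the data can be sketched with the standard precision-weighted update for a normal mean with known noise volatility. This is an assumption about the form of the update; the function name and all numbers are hypothetical.

```python
import math

def posterior_alpha(mu0, sigma0, mu_hat, sigma, n_obs):
    """Precision-weighted blend of a prior view and a sample estimate.

    mu0, sigma0: the manager's expected alpha and confidence band.
    mu_hat:      the alpha estimated from n_obs observations.
    sigma:       the volatility of the idiosyncratic noise (taken as known).
    """
    prior_precision = 1.0 / sigma0 ** 2
    data_precision = n_obs / sigma ** 2
    return ((prior_precision * mu0 + data_precision * mu_hat)
            / (prior_precision + data_precision))

# Hypothetical daily figures.
MU0, MU_HAT, SIGMA, N_OBS = 0.0010, 0.0002, 0.02, 750
data_band = SIGMA / math.sqrt(N_OBS)      # confidence range of the data

confident = posterior_alpha(MU0, 0.0001, MU_HAT, SIGMA, N_OBS)   # sigma0 << data_band
vague = posterior_alpha(MU0, 0.0100, MU_HAT, SIGMA, N_OBS)       # sigma0 >> data_band
balanced = posterior_alpha(MU0, data_band, MU_HAT, SIGMA, N_OBS)

print("confident view:", confident)
print("vague view:    ", vague)
print("equal weight:  ", balanced)
```

A narrow confidence band pulls the result toward the manager's view, a wide one toward the sample estimate, and equal precision gives their midpoint.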

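The portfolio manager sets risk preferences
with the utility function

$$U(C(T), C, \eta) = -\exp\left( -\frac{C(T) - C}{\eta C} \right)$$

Taking the expectations over one period $T = 1$
we obtain:

$$E(U) = -\int d\beta \, \exp\left( -\frac{1}{\eta}\left( \tilde{\mu} w_y + \mu_x w_x + \mu_z w_z \right) + \frac{1}{2\eta^2}\left( w_y^2 \tilde{\sigma}^2 + w_x^2 \sigma_x^2 + \sigma_x^2 w_y^2 \beta^2 + 2 w_y w_x \beta \sigma_x^2 \right) \right) P(\beta)$$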


Here

$$w_x = \frac{C_x}{C}, \quad w_z = \frac{C_z}{C}, \quad w_y = 1 - w_x - w_z$$

First, we focus on the problem of optimum
allocation when there is no hedging: $w_x = 0$,
$\mu_x = 0$. Define the certainty equivalent (CE) of
the investment in $Y$ and $Z$ as the guaranteed
change in $C$ that results in the same utility as a
risky investment in $Y$ and $Z$. Mathematically, it
is defined through the inverse of the utility function:

$$CE(C(T), C) = U^{-1}(E(U))$$

For the exponential utility function we obtain

$$CE(C(T), C) = -\eta C \log\left( -E(U(C(T), C, \eta)) \right)$$


Adopting $P(\beta) = N(\beta_0, \tau)$ and integrating expected
utility over the model parameter $\beta$, we
finally arrive at

$$CE(w_y) = \tilde{\mu} w_y + \mu_z (1 - w_y) - \frac{1}{2\eta} \tilde{\sigma}^2 w_y^2 - \frac{1}{2\eta} \frac{\beta_0^2 \sigma_x^2 w_y^2}{1 - \dfrac{\tau^2 \sigma_x^2 w_y^2}{\eta^2}} + \frac{\eta}{2} \log\left( 1 - \frac{\tau^2 \sigma_x^2 w_y^2}{\eta^2} \right)$$

The first three terms in this equation represent
the certainty equivalent of the investment
without the risk model, $\beta_0 = 0$, and without the
model risk, $\tau = 0$. The optimal fraction of the
capital invested in $Y$ is a well-known expression
(see, for example, Merton, 1969):

$$w_y = \eta \, \frac{\tilde{\mu} - \mu_z}{\tilde{\sigma}^2}$$

In this case, the fraction invested in the risky
instrument is proportional to the portfolio manager's
risk tolerance and inversely proportional
to the instrument's idiosyncratic risk, which, in
the absence of any model, is the total risk of the
instrument.

Introduction of the risk model without uncertainty,
$\beta_0 \neq 0$, $\tau = 0$, results in the obvious
extension:

$$w_y = \eta \, \frac{\tilde{\mu} - \mu_z}{\tilde{\sigma}^2 + \beta_0^2 \sigma_x^2}$$

Here $\tilde{\sigma}^2 + \beta_0^2 \sigma_x^2$ is, again, the total risk of $Y$ as
given by the model, split into the idiosyncratic
part and the part coming from the influence
of $X$.

When $\Gamma \neq 0$, the last two terms in the equation for
$CE(w_y)$ represent the model risk. In some situations, the term
$\Gamma^2 \sigma_x^2 w_y^2$ can be thought of as the contribution to
the expected variance due to the model risk. Indeed, if
$\Gamma^2 \sigma_x^2 w_y^2 \ll \eta^2$ (i.e., when the risk tolerance
is much greater than the possible risk associated with the factor
$X$), the model risk in the expression for the certainty
equivalent is simply added to the total risk:

$$CE(w_y) \approx \mu w_y + \mu_z (1 - w_y) - \frac{1}{2\eta}\left(\sigma^2 + (\beta_0^2 + \Gamma^2)\sigma_x^2\right) w_y^2 + O\left(\Gamma^4 \sigma_x^4 w_y^4/\eta^3\right)$$

In this expression, the last term is proportional to the magnitude
of the expression in parentheses and is small in comparison with
the preceding terms.

The contribution of the model risk is not so obvious in a general
case. Clearly, when $\Gamma^2 \sigma_x^2 \sim \eta^2$, the model risk
significantly affects optimal allocations.
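
The effect of the model risk on the allocation can be checked
numerically. The following Python sketch evaluates the exact
$CE(w_y)$ given above and finds the maximizing weight by a plain
grid search, then compares it with the weight implied by the
quadratic approximation. The function names, the grid search, and
all parameter values are illustrative assumptions and are not
taken from the text.

```python
import math


def certainty_equivalent(w_y, mu, mu_z, sigma, sigma_x, beta0, gamma, eta):
    """Exact CE(w_y) for exponential utility with beta ~ N(beta0, gamma).

    Returns -inf when gamma^2 sigma_x^2 w_y^2 >= eta^2, i.e. when the
    model-risk terms make the portfolio infeasible.
    """
    x = (gamma * sigma_x * w_y / eta) ** 2
    if x >= 1.0:
        return float("-inf")
    no_model = mu * w_y + mu_z * (1.0 - w_y) - sigma ** 2 * w_y ** 2 / (2.0 * eta)
    factor_risk = beta0 ** 2 * sigma_x ** 2 * w_y ** 2 / (2.0 * eta * (1.0 - x))
    log_term = 0.5 * eta * math.log(1.0 - x)
    return no_model - factor_risk + log_term


def approximate_weight(mu, mu_z, sigma, sigma_x, beta0, gamma, eta):
    """Maximizer of the quadratic approximation (model risk added to total risk)."""
    total_risk = sigma ** 2 + (beta0 ** 2 + gamma ** 2) * sigma_x ** 2
    return eta * (mu - mu_z) / total_risk


def optimal_weight(params, grid=10001):
    """Grid search over w_y in [0, 1]; a simple stand-in for a real optimizer."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda w: certainty_equivalent(w, **params))


# Hypothetical inputs, chosen only for illustration.
params = dict(mu=0.08, mu_z=0.03, sigma=0.20, sigma_x=0.15,
              beta0=1.0, gamma=0.5, eta=0.5)
w_exact = optimal_weight(params)
w_approx = approximate_weight(**params)
print(f"exact optimum {w_exact:.4f}, quadratic approximation {w_approx:.4f}")
```

With these inputs $\Gamma^2 \sigma_x^2 w_y^2 \ll \eta^2$, so the two
weights nearly coincide; increasing `gamma` or lowering `eta`
moves the problem toward the regime in which the logarithmic term
dominates.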


Position Hedging 

Now the portfolio manager aims to reduce the influence of the
factor $X$ on the variability of returns. The portfolio manager
adds a position in $X$ to the portfolio. Weight $w_x$ allocated to
$X$ is chosen to maximize $CE$. Positive weight corresponds to a
long position in $X$, whereas a negative weight corresponds to a
short position or its equivalent. In the case when $X$ is the
daily performance of the Standard & Poor's 500 market index, a
short position can be roughly replicated by taking a long position
in an exchange-traded fund (ETF) whose daily returns correspond by
design to the inverse, up to a constant factor, of the daily
performance of the S&P 500 index.

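The certainty equivalent of the portfolio is

$$CE(w_y, w_x) = \mu w_y - \frac{1}{2\eta}\sigma^2 w_y^2 - \frac{1}{2\eta}\,\frac{\sigma_x^2 (w_x + \beta_0 w_y)^2}{1 - \Gamma^2 \sigma_x^2 w_y^2/\eta^2} + \frac{\eta}{2}\log\left(1 - \frac{\Gamma^2 \sigma_x^2 w_y^2}{\eta^2}\right)$$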

The first two terms in this expression are the idiosyncratic alpha
and risk of the instrument $Y$.

The third term introduces the risk associated with the dependence
of the portfolio returns on $X$. Let's take a closer look at it.
Its structure is similar to the term describing the idiosyncratic
risk: variance of the portfolio due to $X$ divided by the
portfolio manager's risk tolerance. In the third term, the
contribution from the risk model comes in two forms. In the
numerator, $w_x + \beta_0 w_y$ is the total weight of $X$ in the
portfolio: the sum of the weight of the position in $X$, $w_x$,
and the estimate of the contribution from the exposure to $X$ of
the position in $Y$, $\beta_0 w_y$. The fact that the total
contribution of $X$ is the same as in the standard portfolio
theory is purely accidental and is due to the choice of the model
distribution of $\beta$.

In the denominator, the portfolio manager's risk tolerance is
augmented by a factor that depends on the uncertainty of $\beta$:

$$1 - \frac{\Gamma^2 \sigma_x^2 w_y^2}{\eta^2}$$

This factor being less than unity, uncertainty effectively reduces
the portfolio manager's risk tolerance.

The fourth term is the contribution to $CE$ from the model risk.
Terms associated with the model risk indicate that when the
uncertainty of the model approaches a critical value,
$\Gamma^2 \sigma_x^2 w_y^2 \sim \eta^2$, the portfolio becomes
unfeasible unless $w_y$ is sufficiently small.

In the absence of a risk model, $\beta_0 = 0$, $\Gamma = 0$, optimal
allocations maximizing $CE$ of the portfolio are

$$w_x = 0, \qquad w_y = \eta\,\frac{\mu}{\sigma^2}$$

When the risk model is present, but the uncertainty of the model
is much bigger than its prediction, $\Gamma \gg \beta_0$, we obtain
another useful result:

$$w_y = \eta\,\frac{\mu}{\sigma^2 + \Gamma^2 \sigma_x^2}$$

In this case the optimal allocation in $Y$ is determined by the
total risk of the instrument composed of the idiosyncratic risk
and the uncertainty of the model.

When the risk model is present and is absolutely precise,
$\beta_0 \neq 0$, $\Gamma = 0$, the usual hedging ratio
$w_x / w_y = -\beta_0$ completely eliminates the dependency of
portfolio returns and their $CE$ on $X$, which is the result
conventionally obtained in the traditional formulation of the risk
management problem.

From the probabilistic point of view, however, an absolutely
precise model is nonsensical. Moreover, situations when both the
model's optimal parameters and the uncertainty of the parameters
are of the same order of magnitude are most likely to occur in
real applications.

Contributions from the risk model and from the uncertainty of the
model become separated and especially simple when the portfolio
manager's risk tolerance is sufficiently large,
$\Gamma^2 \sigma_x^2 w_y^2 \ll \eta^2$:

$$CE(w_y, w_x) \approx \mu w_y - \frac{1}{2\eta}\left(\sigma^2 + \Gamma^2 \sigma_x^2\right) w_y^2 - \frac{1}{2\eta}\,\sigma_x^2 (w_x + \beta_0 w_y)^2$$

Note that there is no combination of the instruments $Y$ and $X$
that can eliminate the effect of $X$. That the effect of the
instrument $X$ may never be eliminated completely is a better
depiction of the everyday experience of the portfolio manager.
Probabilistic decision theory accounting for the model risk,
however, gives a reasonable indication of what the portfolio
manager can expect from one or another composition of the
portfolio when its components are mutually dependent.
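
A small numerical illustration of this point follows. The Python
sketch below evaluates the exact hedged certainty equivalent
$CE(w_y, w_x)$ from this section and shows that the conventional
hedge $w_x = -\beta_0 w_y$ removes the factor term but leaves the
logarithmic model-risk term untouched. The function name and all
numbers are hypothetical and serve only as an example.

```python
import math


def hedged_ce(w_y, w_x, mu, sigma, sigma_x, beta0, gamma, eta):
    """Exact CE(w_y, w_x) of a position in Y hedged with a position in X."""
    x = (gamma * sigma_x * w_y / eta) ** 2
    if x >= 1.0:
        return float("-inf")  # infeasible: model uncertainty too large
    idiosyncratic = mu * w_y - sigma ** 2 * w_y ** 2 / (2.0 * eta)
    factor = sigma_x ** 2 * (w_x + beta0 * w_y) ** 2 / (2.0 * eta * (1.0 - x))
    model_risk = 0.5 * eta * math.log(1.0 - x)  # always <= 0
    return idiosyncratic - factor + model_risk


# Hypothetical inputs for illustration only.
p = dict(mu=0.08, sigma=0.20, sigma_x=0.15, beta0=1.0, gamma=0.5, eta=0.5)
w_y = 0.4

unhedged = hedged_ce(w_y, 0.0, **p)
hedged = hedged_ce(w_y, -p["beta0"] * w_y, **p)
# Same hedge, but pretending the model is exact (gamma = 0).
hedged_exact_model = hedged_ce(w_y, -p["beta0"] * w_y, **dict(p, gamma=0.0))

print(f"unhedged CE           : {unhedged:.6f}")
print(f"hedged CE             : {hedged:.6f}")
print(f"hedged CE, gamma = 0  : {hedged_exact_model:.6f}")
```

The gap between the last two numbers is the part of the risk that
no choice of $w_x$ can remove as long as $\Gamma > 0$.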

In more complicated settings, once the portfolio manager
introduces the costs of hedging, the decision whether to hedge or
not comes naturally as the consequence of the interplay between
the value of hedging and the costs. Let
$\gamma\,|w_x C| = \gamma\,|\beta_0 w_y C|$ be the cost associated
with the hedge. Then one should hedge the position if

$$-\frac{1}{2\eta}\left(\sigma^2 + \Gamma^2 \sigma_x^2\right) w_y^2 - \gamma\,|\beta_0 w_y C| > -\frac{1}{2\eta}\,\sigma_y^2 w_y^2$$

Hedging is justified if the model risk of the hedge plus the cost
of implementing the model is smaller than the original risk that
the hedge is meant to reduce.

In the equation above all quantities are evaluated from the data
and the subjective prior beliefs using the methods of Bayesian
inference. Even when the model and the model parameters are
relatively stable, the decision whether to hedge or not to hedge
depends on the portfolio manager's risk tolerance, which in turn
can be represented by a combination of external constraints, or be
inferred from another model.

A portfolio manager can readily extend the methodology of the
preceding sections to more complicated cases of many interrelated
instruments and many factors. The probability distribution of the
correlation matrix, however, will not necessarily appear in the
calculations in place of the probability distribution of $\beta$:
Noise models that have no concept of second or higher moments
completely rule out correlation matrices in the calculations.
Moreover, these distributions naturally lead to decisions being
determined by a few extreme outliers. Fortunately, even
pathological noise distributions, which seem to be the norm rather
than an exception in finance, are treated equally well by the
methods of the probabilistic decision theory, which is designed to
incorporate all available data plus the portfolio manager's
preferences and constraints.

In the next section we will address a problem 
of the model risk in an investment when the risk 
profile is different from that of an investment in 
an equity portfolio. 


INVESTMENT IN A 
RISKY BOND 

Let $P$ be the face value of the zero-coupon bond, $r$ the
benchmark rate over the period of interest, $\rho$ the
multiplicative spread rate for the bond, so that

$$V = \frac{P}{(1 + r)(1 + \rho)}$$

is the current fair or market price of the bond, possibly unknown.
An alternative investment vehicle $Z$ is available as in the
previous section, the rate of return for this instrument being
$r_z$.

Let there be two possible states of the world. In the first state
the bond is redeemed at the face value at the end of the period.
In the second state of the world the bond is redeemed at $P_r$.
The situation when $P_r = 0$ is possible, in which case the
investment is a total loss. If the investor purchases $N$ units of
the risky bond and the remainder of the capital is preserved in
the alternative vehicle, then, at the end of the period, the
investor's capital is

$$C_1 = \begin{cases} N P + (1 + r_z)(C_0 - N V), & \text{with probability } 1 - p_d \\ N P_r + (1 + r_z)(C_0 - N V), & \text{with probability } p_d \end{cases}$$

In the traditional formulation the investment is justified if the
expected return on capital when $N > 0$ is greater than the
expected return when $N = 0$. This translates into the following
expression, which links all the input data of the problem and the
unknown value of the bond:

$$P (1 - p_d) + P_r\, p_d > (1 + r_z) V$$

This traditional approach is a reasonably good approximation under
certain conditions. A much richer view along with the set of
quantitative tools is required in a general case.

From the probabilistic decision theory viewpoint, the
probabilities and other relevant parameters entering the
decision-making process must be inferred from the model, from the
data, and from the investor's prior beliefs, and are best
represented by distributions of possible states of the world. We
consider now a simple one-parametric risk model and show how the
model risk contributes to the decision process.

Parameter Inference in the 
Bernoulli Model 

In the Bernoulli-like model, the investment vehicle under
consideration belongs to a class of essentially similar bonds.
They are financial obligations issued by debtors facing
essentially the same economic (financial, market, etc.)
conditions. Given these conditions, it is customary to assume that
the failure of each instrument is a random event. Failures in the
class occur with the same probability $p_d$ per unit time, which,
for simplicity, will coincide in our analysis with the maturity
time of the instrument.

The model of the random process, the empirical data, and the
investor's prior beliefs determine all that we know about the
model parameter $p_d$.

Assume that the empirical data are the sample of $n$ observations
of the class, and $m$ is the number of cases when a debtor
defaults. Adopting a beta-distribution of the model parameter, we
obtain the following posterior distribution given the data and the
prior beliefs:

$$\pi(p_d, \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\; p_d^{\alpha - 1} (1 - p_d)^{\beta - 1}$$

where

$$\alpha = \alpha_0 + m, \qquad \beta = \beta_0 + n - m$$

and $\alpha_0$, $\beta_0$ are the parameters representing the
investor's prior beliefs. In the prior distribution, $\alpha_0$ can
be interpreted as the number of cases of default and $\beta_0$ as
the number of cases when the bond was repaid in full. The prior
distribution's parameters can come from the investor's own
experience, or from the consensus of experts, or be inferred from
agency ratings. The magnitude of $\alpha_0$, $\beta_0$ versus $n$,
$m$ determines the relative weight the investor assigns to prior
beliefs. Prior beliefs dominate the data when
$\alpha_0 + \beta_0 \gg n$.

In Figure 3, the investor's prior beliefs follow the prior
probability of default 0.1. Parameters of the prior distribution
are $\alpha_0 = 2$, $\beta_0 = 11$. The newly arriving data point to
the probability of default 0.2. Observe the change in the shape of
the distribution: Its mode moves from approximately 0.1 to
approximately 0.2 as the new data gradually overcome the
investor's prior beliefs. Note that the model risk (the width of
the distribution) remains relatively high.

Figure 3 Distribution of the Probability of Default $p_d$
Note: Prior distribution is defined by $\alpha_0 = 2$,
$\beta_0 = 11$. Newly arriving data follow the new probability of
default 0.2.
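
The updating rule is simple enough to reproduce in a few lines. The
Python sketch below applies $\alpha = \alpha_0 + m$,
$\beta = \beta_0 + n - m$ to the prior $\alpha_0 = 2$, $\beta_0 = 11$
used in Figure 3 and reports the mode and the standard deviation of
the resulting beta distribution. The sample sizes are hypothetical
(the text does not state the data behind the figure); they are
chosen so that the observed default frequency is 0.2.

```python
import math


def beta_posterior(alpha0, beta0, n, m):
    """Posterior beta parameters after observing m defaults in n bonds."""
    return alpha0 + m, beta0 + n - m


def beta_mode(a, b):
    """Mode of Beta(a, b); valid for a > 1 and b > 1."""
    return (a - 1.0) / (a + b - 2.0)


def beta_std(a, b):
    """Standard deviation of Beta(a, b), used here as the 'width'."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))


alpha0, beta0 = 2, 11  # prior from the text; prior mode is 1/11, about 0.09
for n in (0, 10, 40, 160):          # hypothetical sample sizes
    m = n // 5                      # observed default frequency of 0.2
    a, b = beta_posterior(alpha0, beta0, n, m)
    print(f"n={n:4d}  mode={beta_mode(a, b):.3f}  std={beta_std(a, b):.3f}")
```

As $n$ grows, the mode drifts from the prior value toward 0.2
while the standard deviation shrinks only slowly, which is the
behavior described for Figure 3.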

In the Bayesian perspective, the distribution 
of the probability of default is a convenient vehicle 
that carries all that the investor knows from 
the set of observations and the investor's prior 
beliefs: what is the most probable state of the 
world and what is the spread of possible states 
of the world given the investor's choice of the 
model of the world. 

The rich framework offered by the Bayesian inference of the
probability of default consequently brings in a rich set of
valuation methods that naturally account for the model risk. In
the next sections we will study the valuation effects of the risk
of models.


Model Risk Contribution to the Fair 
Price of the Bond 

First, we obtain an interesting estimate of the 
model risk contribution to the fair price of 
the bond under the assumption of the infinite 
risk tolerance. This is a degenerate case most 
closely resembling the traditional formulation. 
The utility function is linear if the investor's risk 
tolerance is infinite. 

We obtain formally:

$$P \bigl(1 - E(p_d)\bigr) + P_r\, E(p_d) > (1 + r_z) V$$

Assume that the sample size is $n$ of which there are $m$
defaults. A flat prior distribution
$\pi(p_d, \alpha_0, \beta_0) = \text{const}$ describes an investor
who initially is ignorant. Expectation of the probability of
default is then governed by the rule of succession (originally
developed by Laplace):

$$E(p_d) = \frac{m + 1}{n + 2}$$

The difference between this posterior expectation and the naive
probability of default $p_d = m/n$ is the contribution of the
model risk to the fair price of the bond:

$$\delta V = \frac{P - P_r}{1 + r_z}\left(\frac{m + 1}{n + 2} - \frac{m}{n}\right)$$

For example, if the naive default rate estimate is 0.1 and it is
based on 100 observations, the contribution of the model risk to
the fair price of the bond can be as big as 78 basis points, which
is not an insignificant amount. Thus, the sampling risk and the
prior beliefs bias yield a substantial contribution to the overall
risk of the investment.

Even in the simplest Bernoulli-like model, the 
contribution of the model risk to the value of 
the bond is nonnegligible. This contribution is 
especially pronounced when the probability of 
default is small. 
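
The 78 basis point figure can be reproduced directly. The Python
sketch below computes the rule-of-succession estimate and the
resulting shift $\delta V$; it assumes, for illustration only, a
unit face value, zero recovery ($P_r = 0$), and $r_z = 0$, so that
$\delta V$ is expressed as a fraction of the face value. The
function names are not from the text.

```python
def succession_estimate(n, m):
    """Posterior mean of p_d under a flat prior (Laplace's rule of succession)."""
    return (m + 1) / (n + 2)


def model_risk_price_shift(face, recovery, r_z, n, m):
    """delta V = (P - P_r) / (1 + r_z) * ((m + 1)/(n + 2) - m/n)."""
    return (face - recovery) / (1.0 + r_z) * (succession_estimate(n, m) - m / n)


# Example from the text: naive default rate 0.1 based on 100 observations.
shift = model_risk_price_shift(face=1.0, recovery=0.0, r_z=0.0, n=100, m=10)
print(f"delta V = {shift * 1e4:.1f} basis points of face value")

# Smaller default probabilities: the relative correction grows.
for m in (10, 5, 1):
    naive = m / 100
    print(f"naive {naive:.2f} -> posterior mean {succession_estimate(100, m):.4f}")
```

With one default in 100 observations the posterior mean is almost
twice the naive estimate, in line with the remark that the
contribution is most pronounced when the probability of default is
small.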

Now we will proceed to a case when the investor's risk tolerance
is not infinite. We will show that average probabilities are
likely insufficient for making an informed investment decision.
Relying on just expected probabilities can result in catastrophic
consequences for the investor.


Model Risk of Agency Ratings 

Currently financial regulators recommend that expected losses be
quantified as the expected probability of default times the
exposure at default (see Basel, 2008). Consequently, credit
scoring and rating agencies aim at developing models that generate
expected probabilities of default. These models are calibrated by
minimizing the difference between predicted and empirically
observed probabilities of default (see, for example, Korablev and
Dwyer, 2007). From the preceding section, it follows that the
average rates based on thousands of credit events used in the
calibration of the agency model alone are insufficient for making
investment decisions concerning a portfolio of an arbitrary,
possibly small, subset of instruments. Moreover, the naive
probability of default is likely to be useless in the valuation of
a singular derivative instrument, such as a credit default swap
(CDS). For a financial practitioner it is important to know,
however, that agencies possess and disclose substantially more
information than ratings, scoring, or expected probabilities
alone. We will now discuss briefly how this information is used in
the probabilistic decision framework.

Korablev and Dwyer (2007) report that for a certain group of
companies the Moody's KMV EDF™ model was predicting 2.5% as the
mean probability of default in 2002. The value of 1.8% was
actually observed. The 10 and 90 percentiles of the distribution
of predicted rates were 0.5% and 5.4%. This information is
sufficient to reconstruct the parameters of the beta-distribution
discussed earlier. An approximate match is $\alpha = 1.12$,
$\beta = 56.35$. In Figure 4 we show the set of implied
distributions for the four years preceding 2002. The inferred
distribution $\pi(p_d, \alpha, \beta)$ for the year 2002 is almost
identical to that for 2000.

We will now show how the inferred model of the probability of
default is used in the decision-making process.

Figure 4 Implied Distribution of the Probability of Default $p_d$
According to the Moody's Data for 1998, 1999, 2000, and 2001

It appears from the following analysis that due to idiosyncrasies
of the distribution of the probability of default, the effects of
the model risk can be profound, even catastrophic. Describe the
investor's risk preferences with the following disutility
function:

$$U(p_d) = -e^{p_d/\rho}$$

This function describes an investor who is progressively reluctant
to tolerate deviations from the expected probability of default
when these deviations exceed $\rho$. Note that disutility of
positive deviations from the expected value is growing
exponentially, while the beta-distribution of $p_d$ falls off
around its mode much slower, approximately as a power function.
Using a beta-distributed probability of default
$\pi(p_d, \alpha, \beta)$, we find for the certainty equivalent

$$CE(p_d) = U^{-1}\bigl(E(U(p_d))\bigr) = \rho \log\left[\Gamma(\alpha + \beta)\, \tilde{F}\left(\alpha, \alpha + \beta, \frac{1}{\rho}\right)\right]$$

where $\tilde{F}(a, b, z)$ is the regularized confluent
hypergeometric function ${}_1F_1(a, b, z)/\Gamma(b)$ (Weisstein,
2010). The certainty equivalent $CE(p_d)$ can be interpreted as an
equivalent certain probability of default, which supplies the same
value for the investor as the uncertain probability of default,
given the investor's risk preferences.

In the limit $\rho \to \infty$

$$CE(p_d) \to \frac{\alpha}{\alpha + \beta}\left(1 + \frac{\beta}{2\rho\,(\alpha + \beta)(\alpha + \beta + 1)} + O(\rho^{-2})\right)$$

At high tolerances $CE(p_d)$ coincides with the mean naive
probability of default. As the investor's risk tolerance
decreases, however, the certainty equivalent grows more and more
rapidly. A plot of the exact certainty equivalent probability of
default as the function of the model risk tolerance is shown in
Figure 5. The parameters of the distribution are $\alpha = 1.45$
and $\beta = 45$, and the dashed line is the asymptote
$\alpha/(\alpha + \beta)$.

Figure 5 $CE(p_d)$ versus Risk Tolerance
Note: Dashed line is the asymptote value $\alpha/(\alpha + \beta)$,
$\alpha = 1.45$, $\beta = 45$

The catastrophic divergence of the certainty equivalent
probability occurs at the values of the tolerance that are close
to the width of the distribution $\pi(p_d, \alpha, \beta)$: At the
tolerance level $\rho = 0.01$ the $CE(p_d)$ is as big as 0.23, more
than seven times the naive value of the probability. From the
practical decision-making standpoint it means that if the investor
accepts the price of the bond or associated instruments defined by
the naive probability 0.031, it is likely that the investor is
grossly mistaken about the value of the bond given the investor's
risk tolerance and the model risk.
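
The numbers quoted for Figure 5 can be checked with a short
calculation. Since $\Gamma(\alpha + \beta)\,\tilde{F}(\alpha,
\alpha + \beta, 1/\rho) = {}_1F_1(\alpha; \alpha + \beta; 1/\rho)$,
the Python sketch below sums the Kummer series for ${}_1F_1$
directly and evaluates $CE(p_d)$ for $\alpha = 1.45$, $\beta = 45$.
The series summation and the function names are choices made for
this illustration; a library routine for the confluent
hypergeometric function could be used instead.

```python
import math


def hyp1f1_series(a, b, z, tol=1e-15, max_terms=100000):
    """Kummer series sum_k (a)_k / (b)_k * z^k / k! for z >= 0.

    All terms are positive for a, b, z > 0, so there is no cancellation.
    Intended for moderate z; very large z overflows double precision.
    """
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if not math.isfinite(total):
            raise OverflowError("1/rho is too large for this simple series")
        if term < tol * total:
            break
    return total


def ce_default_probability(alpha, beta, rho):
    """CE(p_d) = rho * log E[exp(p_d / rho)] for p_d ~ Beta(alpha, beta)."""
    return rho * math.log(hyp1f1_series(alpha, alpha + beta, 1.0 / rho))


alpha, beta = 1.45, 45.0
naive = alpha / (alpha + beta)
print(f"naive probability: {naive:.4f}")
for rho in (10.0, 1.0, 0.1, 0.03, 0.01):
    ce = ce_default_probability(alpha, beta, rho)
    print(f"rho={rho:6.2f}  CE={ce:.4f}  ratio to naive={ce / naive:.2f}")
```

The output approaches $\alpha/(\alpha + \beta)$ for large $\rho$ and
rises steeply once $\rho$ becomes comparable to the width of the
distribution.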

KEY POINTS 

• Probabilistic decision theory is a blend of the probabilistic,
also called Bayesian, inference and the concept of utility.

• In the probabilistic decision theory optimal decisions maximize
the expected value of the user's utility over all possible states
of the world.

• Probabilities of the states of the world are inferred from the
empirical data, the model, and the user's beliefs.

• Uncertainty in the model parameters results in the model risk; a
financial model that is free of the model risk is an exception.

• Practical consequences of the model risk are evaluated using the
utility function.

• Model risk significantly affects optimal allocations in equity
portfolios and can result in a prospective portfolio being ruled
out.

• Valuation of a risky bond is significantly affected by the model
risk; ratings and expected probabilities of default alone are
likely insufficient for the decision-making process.

• Failure to account for the model risk can lead to catastrophic
consequences for the investor.

• Unhandled or unknown model risk produces risk exposure that
remains indeterminate.

Abstract: Accounting for the likelihood of observing extreme returns and for return asymmetry
is paramount in financial modeling. In addition to recognizing essential features of the returns' 
temporal dynamics, such as autocorrelations, volatility clustering, and long memory, a successful 
univariate model employs a distributional assumption flexible enough to accommodate various 
degrees of skewness and heavy-tailedness. At the same time, a model's usefulness depends on 
its scalability and practicality—the extent to which the univariate model can be extended to a 
multivariate one covering a large number of assets. 


Risk models are employed in financial modeling to provide a
measure of risk that could be employed in portfolio construction,
risk management, and derivatives pricing. A risk model is
typically a combination of a probability distribution model and a
risk measure. In this entry, we discuss alternatives for building
the probability distribution model, as well as the pros and cons
of various heavy-tailed distributional choices. Our focus is
univariate models; their multivariate extensions are only briefly
mentioned. We start with the fundamentals: the Gaussian
distribution. Then, we introduce fat-tailed alternatives, such as
the Student's t distribution and its asymmetric version and the
Pareto stable class of distributions and their tempered
extensions. Next, we discuss extreme value theory's risk modeling
approach. We conclude with a comparative empirical example to
contrast the models' performance over a 10-year period.


THE FUNDAMENTALS: 
NORMAL DISTRIBUTION 

The use of the normal (Gaussian) distribution in financial
modeling has a long and distinguished tradition. The main reasons
for its traditional popularity are several. First, its analytical
tractability means that deriving theoretical results and employing
it in applications is (relatively) straightforward; numerical
methods are widely available and implementable.1 Second, certain
central results in statistics underlie the importance of the
Gaussian distribution.2 Third, it has an intuitive appeal: random
variables distributed with the Gaussian distribution tend to
assume values around the average, with the odds of deviation from
the average decreasing exponentially as one moves away from it.

Some of the most prominent financial frameworks built around the
normal distribution are Markowitz's modern portfolio theory, the
capital asset pricing model, and the Black-Scholes option pricing
model. All of them assume (or imply) that asset returns follow a
normal distribution and reflect a long-standing paradigm that
rational investors' preferences can be described exclusively in
terms of expected returns and risk as measured by the variance of
the return distribution. However, they are inherently static
frameworks. The underlying dynamic is either given exogenously or
is based on the assumption that returns have independent and
identical distributions. Such characteristics do not fit
adequately with the empirically observed features of financial
returns and investor choice.

In this section, we describe the fundamentals of a risk modeling
approach based on the Gaussian distribution. We start with a
review of some of its basic properties and facts.

Basics and Properties of the 
Gaussian Distribution 

The normal distribution is characterized by two parameters: a
location (mean) parameter and a scale (volatility, standard
deviation) parameter.3 The location parameter serves to displace
(shift) the whole distribution, while the scale parameter changes
the shape of the distribution. For small values of the scale, the
distribution is narrow and peaked, while for (relatively) larger
values, it widens and flattens. Since the normal distribution is
symmetric around its mean, the location (mean) coincides with the
center of the distribution. Commonly, the mean is denoted by $\mu$
and the standard deviation by $\sigma$.

Two important properties of the normal distribution are
location-scale invariance and summation stability. They are
directly related to the central role of the normal distribution in
traditional financial modeling.

Location-Scale Invariance Property 

Let us suppose a random variable $X$ is normally distributed with
parameters $\mu$ and $\sigma$. Now consider another random
variable, $Y$, obtained as a linear function of $X$, that is,
$Y = aX + b$ with $a \neq 0$. The variable $Y$ is also normally
distributed with parameters $\mu_Y = a\mu + b$ and
$\sigma_Y = |a|\sigma$. That is, if a normal random variable is
multiplied by a constant and/or is shifted, it remains distributed
with the normal distribution.

Summation Stability Property 

Let us take $n$ independent random variables distributed with the
Gaussian distribution with parameters $\mu_i$ and $\sigma_i$. The
sum of the variables is normal as well. The resulting normal
distribution has a mean and standard deviation obtained,
respectively, as

$$\mu = \mu_1 + \mu_2 + \cdots + \mu_n$$

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$$

Location-scale invariance and summation stability are not
universal properties of statistical distributions. In financial
applications, however, they are clearly desirable properties.

The property of summation stability is often used to justify the
predominant use of normal distributions in financial modeling. A
statistical result, called the central limit theorem, states that,
under certain technical conditions, the sum of a large number of
random variables behaves like a normally distributed random
variable. More generally, we say that the normal distribution
possesses a domain of attraction. In fact, the normal distribution
is not the only distribution with this feature. As we will see
later in the entry, it is the class of stable distributions (to
which the Gaussian distribution belongs) that is unique with that
property. A large sum of random variables can only converge to a
stable distribution.


Density Function and Fitting of the Normal 
Distribution 

The density function of a random variable $X$ distributed with the
normal distribution with mean $\mu$ and standard deviation
$\sigma$ is given by the following expression

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{1}$$

We denote this distribution as $N(\mu, \sigma)$. The variable $X$
and the parameter $\mu$ can take any real value, while $\sigma$
can only take positive values. A normal random variable with zero
mean and standard deviation of one is said to be distributed with
the standard normal distribution ($N(0, 1)$). The presence of the
exponential function in the normal density implies that the
probability of events away from the mean decays at an exponential
rate. In contrast, heavy-tailed distributions are characterized by
power-law behaviors for large (small) values of the random
variables, leading to increased chance for extreme events relative
to the Gaussian setting.

Fitting of the Gaussian distribution is usually performed by
maximizing the logarithm of the likelihood function given by

$$\mathcal{L}(\mu, \sigma \mid x_1, x_2, \ldots, x_n) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2 \tag{2}$$

where $x_1, x_2, \ldots, x_n$ is the sample of observed data used
for fitting. The resulting estimators of the mean and the variance
(the square of the standard deviation) are, using standard
notation:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{3}$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{4}$$
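
As a brief illustration of equations (2) through (4), the Python
sketch below computes the maximum likelihood estimates for a small
sample and evaluates the log-likelihood at and near the optimum.
The five return values are made up for the example, and the
function names are not from the text.

```python
import math


def fit_normal(xs):
    """Maximum likelihood estimates: sample mean and 1/n variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


def log_likelihood(xs, mu, var):
    """Gaussian log-likelihood of the sample for given mean and variance."""
    n = len(xs)
    squared = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(var)
            - squared / (2.0 * var))


# Hypothetical daily returns, for illustration only.
returns = [0.010, -0.020, 0.015, 0.000, -0.005]
mean_hat, var_hat = fit_normal(returns)
print(f"mean = {mean_hat:.6f}, std = {math.sqrt(var_hat):.6f}")
print(f"log-likelihood at the estimates: "
      f"{log_likelihood(returns, mean_hat, var_hat):.4f}")
```

Moving either parameter away from the estimates lowers the
log-likelihood, which is a quick way to confirm that the
closed-form estimators are the maximizers.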

Unconditional models imply that returns are independent and
identically distributed (IID), so that (among other implications)
the returns' means and variances remain unchanged through time.
However, empirical evidence abounds that financial returns exhibit
time-series properties such as autocorrelation and volatility
clustering, which make unconditional return modeling inadequate.
The time-series properties of returns need to be modeled in a
conditional framework with appropriate time series models. We
consider conditional normal models next.


Conditional Normal Models and 
Time-Series Properties of Returns 

Properly computing the risk of a portfolio depends on recognizing
a number of essential features of the evolution of returns through
time. We begin with the two most commonly accounted for by
academics and practitioners: autocorrelation and volatility
clustering.

Sometimes, a portfolio's past performance influences future
performance. Current returns of a financial asset may depend on
its past returns. This property of autocorrelation is modeled by
including lagged (past) values of the return and/or lagged
innovations (informational surprises). The resulting conditional
model of the expected return is called an autoregressive moving
average (ARMA) model.

Time-varying volatility concerns the empirically observed fact
that large returns (of either sign) tend to be followed by large
ones and small returns by small ones. To describe this volatility
clustering effect, the class of autoregressive conditional
heteroskedasticity (ARCH) models, as well as their generalized
(GARCH) extensions, are widely used. GARCH models assume that
volatility on a given day depends on the volatilities and also
squared innovations of the one or more previous days.

The typical approach to building a risk model includes at least
the elements of an autoregressive component and a volatility
clustering component by means of a GARCH or alternative ARCH-type
process. A conditional normal ARMA(1,1)-GARCH(1,1) model combines
the returns' conditional mean and volatility models with the
assumption that returns are distributed with the Gaussian
distribution. Analytically, the model is represented as

$$r_t = \mu_t + \epsilon_t \tag{5}$$

$$\mu_t = \phi_0 + \phi_1 r_{t-1} + \theta_1 \epsilon_{t-1} \tag{6}$$

$$\sigma_t^2 = \omega + a\,\sigma_{t-1}^2 + b\,\epsilon_{t-1}^2 \tag{7}$$

where $r_t$, $\mu_t$, and $\sigma_t^2$ are the return, expected
return, and return variance at time $t$, respectively, and
$\epsilon_t$ is the innovation at time $t$. The innovation is
normally distributed with mean 0 and variance $\sigma_t^2$.4
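
To make the recursion concrete, the Python sketch below simulates
a conditional normal ARMA(1,1)-GARCH(1,1) process of the form
(5) through (7) and recovers the standardized residuals
$\epsilon_t/\sigma_t$. The coefficient names (`phi0`, `phi1`,
`theta1`, `omega`, `a`, `b`) and their values are assumptions made
for this illustration, with `a` multiplying the lagged variance
and `b` the lagged squared innovation; the simulation is a sketch,
not an estimation procedure.

```python
import math
import random


def simulate_arma_garch(n, phi0, phi1, theta1, omega, a, b, seed=0):
    """Simulate returns and standardized residuals; requires a + b < 1."""
    rng = random.Random(seed)
    var_prev = omega / (1.0 - a - b)   # start at the unconditional variance
    r_prev = phi0 / (1.0 - phi1)       # start at the unconditional mean
    eps_prev = 0.0
    returns, std_residuals = [], []
    for _ in range(n):
        mu_t = phi0 + phi1 * r_prev + theta1 * eps_prev      # conditional mean
        var_t = omega + a * var_prev + b * eps_prev ** 2     # conditional variance
        eps_t = math.sqrt(var_t) * rng.gauss(0.0, 1.0)       # Gaussian innovation
        r_t = mu_t + eps_t
        returns.append(r_t)
        std_residuals.append(eps_t / math.sqrt(var_t))
        r_prev, eps_prev, var_prev = r_t, eps_t, var_t
    return returns, std_residuals


# Hypothetical coefficients, for illustration only.
rets, z = simulate_arma_garch(20000, phi0=0.0005, phi1=0.1, theta1=0.05,
                              omega=1e-6, a=0.85, b=0.10)
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
print(f"standardized residuals: mean {z_mean:.3f}, variance {z_var:.3f}")
```

In simulated data the standardized residuals are Gaussian white
noise by construction; applying the same filtering to observed
returns is what allows the distributional assumption to be tested.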

The standardized fitted residuals, $\epsilon_t/\sigma_t$, are the
original returns with the effects of autocorrelation and
volatility clustering removed (filtered out). Since the model
innovations are assumed to be Gaussian, if the model is correctly
specified, these filtered returns must exhibit the dynamics of a
Gaussian white noise with variance one. Therefore, one easy way to
determine whether the distributional assumptions are valid is to
examine the properties of these residuals. Indeed, numerous
studies have confirmed that in the case of financial returns the
standardized fitted residuals are not Gaussian. That is, even
after removing the autocorrelation and volatility clustering, fat
tails, though smaller in magnitude, continue to be present in
returns. Time-varying volatility then is not sufficient to explain
the extreme events observable in returns.5 Therefore, a more
realistic risk model should allow for fat-tailed innovations. In
the next section, we discuss parametric fat-tailed models,
specifically, models based on the classical Student's t
distribution and its asymmetric version, as well as on the stable
and tempered stable distributions.


INCORPORATING HEAVY 
TAILS AND SKEWNESS: 
PARAMETRIC FAT-TAILED 
MODELS 

The Student's t distribution has become the "go to" mainstream
alternative to the normal distribution when attempting to address
asset returns' heavy-tailedness. Further below, we introduce an
extension, called the skewed Student's t distribution, designed to
deal with data asymmetries. First, we turn to discussing the
"classical" Student's t distribution.

The "Classical" Student's t 
Distribution 

The Student's t distribution (or simply the t-distribution) is
symmetric and mound-shaped, like the normal distribution. However,
it is more peaked around the center and has fatter tails. This
makes it better suited for return modeling than the Gaussian
distribution. Additionally, numerical methods for the
t-distribution are widely available and easy to implement.

The t-distribution has a single parameter, called degrees of
freedom (DOF), that controls the heaviness of the tails and,
therefore, the likelihood for extreme returns. The DOF takes only
positive values, with lower values signifying heavier tails.
Values less than 2 imply infinite variance, while values less than
1 imply infinite mean. The t-distribution becomes arbitrarily
close to the normal distribution as DOF increases above 30.

Density Function of the Student's t
Distribution

A random variable $X$ (taking any real value) distributed with the
Student's t distribution with $\nu$ degrees of freedom has a
density function given by

$$f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2} \tag{8}$$

where $\Gamma$ is the Gamma function. We denote this distribution
by $t_\nu$. The mean of $X$ is zero and its variance is given by

$$\operatorname{var}(X) = \frac{\nu}{\nu - 2} \tag{9}$$

The variance exists for values of $\nu$ greater than two and the
mean exists for $\nu$ greater than one.

The t-distribution above is sometimes referred
to as the "standard" Student's t
distribution. 6 In financial applications, it is often
necessary to define the Student's t distribution
in a more general manner so that we
allow for the mean (location) and scale to be
different from zero and one, respectively. The
density function of such a "scaled" Student's t
distribution is described by

$$ f(x \mid \nu, \mu, \sigma) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\,\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left(1 + \frac{1}{\nu}\left(\frac{x - \mu}{\sigma}\right)^2\right)^{-(\nu+1)/2} \tag{10} $$

where the mean $\mu$ can take any real value and
$\sigma$ is positive. The variance of X is then equal to
$\sigma^2 \nu/(\nu - 2)$. We denote the distribution above
by $t_\nu(\mu, \sigma)$. 7
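
As a minimal sketch of how the scaled density in (10) and the variance formula might be evaluated, the following Python functions use only the standard library. The function names are illustrative assumptions, not part of the original text; the log-gamma function is used to avoid overflow for large degrees of freedom.

```python
import math


def student_t_pdf(x, nu, mu=0.0, sigma=1.0):
    """Density of the scaled Student's t distribution t_nu(mu, sigma), as in (10)."""
    if nu <= 0 or sigma <= 0:
        raise ValueError("nu and sigma must be positive")
    z = (x - mu) / sigma
    # Log of the normalizing constant Gamma((nu+1)/2) / (sigma Gamma(nu/2) sqrt(nu pi)).
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))


def student_t_variance(nu, sigma=1.0):
    """Variance sigma^2 nu / (nu - 2); infinite when nu <= 2."""
    if nu <= 2:
        return math.inf
    return sigma ** 2 * nu / (nu - 2)
```

With $\nu = 1$ the function reproduces the Cauchy density, and for very large $\nu$ it approaches the normal density, which is one way to sanity-check an implementation.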

Finally, we note an equivalent representation
of the Student's t distribution that
is useful for simulating from it. The
$t_\nu(\mu, \sigma)$ distribution can be expressed
as a scale mixture of normal distributions
in which the mixing variable W follows the
inverse-gamma distribution,

$$ X \mid W \sim N\left(\mu, \sqrt{W}\,\sigma\right), \qquad W \sim \text{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right) $$

where the second argument of N denotes the
standard deviation. Later in this entry we will
again come across mixture representations in the
context of our discussion of the skewed Student's t,
the stable Paretian, and the classical tempered stable
distributions.


Degrees of Freedom Across Assets and Time 
The typical approach to risk modeling based
on the Student's t distribution includes building
autoregressive and volatility clustering
components, as well as assuming that the DOF is
the same for all assets' returns. This assumption
is essential if we want to extend the classical
Student's t model to a multivariate one.
It is, however, an empirical fact that assets are
not homogeneous with respect to the degree of
non-normality of their returns. Moreover, tail
thickness and shape are not constant through
time either.

Consider the result of an empirical study
of the constituent stocks of the S&P 500 stock
index during the period from January 2, 1991 to
June 30, 2011. We calibrate the Student's t
distribution after filtering the equity returns for
GARCH effects. The estimated DOF is shown
in Figure 1. It is evident that tail behavior
diverges dramatically across stocks. Around 44%
of equity returns are very heavy-tailed, with a
DOF estimate below five. Around 54% of equities
have an estimated DOF parameter between
five and 10. Only three stocks exhibit characteristics
closer to the Gaussian, with a DOF exceeding 15.
Obtaining accurate DOF estimates
across assets is important in risk management,
since these estimates can form the basis of an
analysis of portfolio risk contributors and
diversifiers.

Figure 1 Fitted Degrees-of-Freedom Parameter for S&P 500 Index Stock Returns
Note: The Student's t distribution is calibrated on the residuals from a GARCH model fitted to the returns
of the stocks in the S&P 500 index.

Not only does tail behavior vary across assets,
it also changes through time. In relatively
calm periods, asset returns are almost
Gaussian, while in turbulent periods, the tails
become fatter. Figure 2 illustrates the
time-varying behavior of the DOF, suggesting temporal
tail dynamics for the Dow Jones Industrial
Average (DJIA) returns for the period from
January 1, 1997 to June 30, 2011. The top and middle
plots show the value and return of the DJIA,
respectively. The bottom plot shows the DOF
parameter estimates. 8 In periods of "normal"
market volatility, returns are almost normally
distributed, with a fitted DOF over 30. However,
when markets are unsettled, return tails
grow heavier. Accounting for these time dynamics
is important in risk budgeting and management,
since the DOF estimate can serve as an indicator
of the transition between market regimes, from calm
to turbulent or vice versa.

As pointed out earlier, a major limitation of
employing the classical Student's t distribution
for risk modeling is its symmetry. If there is
significant asymmetry in the data, it will not
be reflected in the risk estimate. There are
several versions of the skewed Student's t
distribution, depending on the analytical way
in which asymmetry is introduced into the
distribution. 9 Below, we consider the skewed
Student's t distribution obtained as a mixture of
Gaussian and inverse-gamma distributions. 10

Figure 2 Fitted Degrees-of-Freedom Parameter for DJIA Returns
Note: The Student's t distribution is calibrated on the residuals from a GARCH model fitted to the return
on the DJIA using a 500-day rolling window.

The Skewed Student's t Distribution
Suppose that a random variable X is distributed
with the skewed Student's t distribution,
obtained as a mixture of a Gaussian distribution
and an inverse-gamma distribution,

$$ X = \mu + \gamma W + Z\sqrt{W} \tag{11} $$

where

• W is an inverse-gamma random variable with
both parameters equal to $\nu/2$, $W \sim \text{Inv-Gamma}(\nu/2, \nu/2)$.

• Z is a Gaussian random variable, $Z \sim N(0, \sigma)$,
independent of W.

The parameters $\mu$ and $\gamma$ are real-valued.
The sign and magnitude of $\gamma$ control the
asymmetry in X. We say that X's distribution
is a mean-variance mixture of the normal
distribution, since the mixing variable W
modifies both the mean and the variance of
the Gaussian Z. Notice that conditional on the
value of W, the distribution of X is normal:

$$ X \mid W = w \sim N\left(\mu + \gamma w, \sigma\sqrt{w}\right) \tag{12} $$

X's unconditional distribution is what is
defined as the skewed Student's t distribution
and its density is given by the expression


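$$ f(x \mid \mu, \sigma, \gamma, \nu) = A \times \frac{K_{(\nu+1)/2}(B)\,\exp\left(\frac{(x-\mu)\gamma}{\sigma^2}\right)}{B^{-(\nu+1)/2}\left(1 + \frac{(x-\mu)^2}{\nu\sigma^2}\right)^{(\nu+1)/2}} $$

where

$$ A = \frac{2^{1-(\nu+1)/2}}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu)^{1/2}\,\sigma}, \qquad B = \sqrt{\left(\nu + \frac{(x-\mu)^2}{\sigma^2}\right)\frac{\gamma^2}{\sigma^2}} $$

and $K_\lambda(\cdot)$ is the modified Bessel
function of the second kind with index $\lambda$.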


Fitting and Simulation of the Classical and 
Skewed Student's t Distributions 

Estimation of the classical and skewed Student's
t distributions is carried out using
the method of maximum likelihood. Simulations
from the two distributions make use of
their normal mixture representations. For given
parameters $\nu$, $\mu$, $\sigma$, and $\gamma$ (e.g., the maximum
likelihood estimates), generation of t and skewed t
observations consists of the steps below:

• Generate an observation w from the inverse-gamma
distribution with both parameters equal to $\nu/2$.

• Generate an observation z from the normal
distribution with mean 0 and variance $\sigma^2$.

• Compute the corresponding observation of
the t or skewed t distribution, respectively, as

$$ x = \mu + \sqrt{w}\,z \quad \text{and} \quad y = \mu + \gamma w + \sqrt{w}\,z \tag{13} $$
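
The three simulation steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name is hypothetical, and the inverse-gamma draw is obtained as the reciprocal of a gamma variate with shape $\nu/2$ and rate $\nu/2$ (the standard library's `gammavariate` takes a shape and a scale, so the scale is $2/\nu$).

```python
import math
import random


def sample_t_and_skewed_t(nu, mu, sigma, gamma, rng=random):
    """Return one (t, skewed t) pair built from the same mixing draw, as in (13)."""
    # Step 1: w ~ Inv-Gamma(nu/2, nu/2), i.e. 1 / Gamma(shape=nu/2, scale=2/nu).
    w = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Step 2: z ~ N(0, sigma^2); gauss takes the standard deviation.
    z = rng.gauss(0.0, sigma)
    # Step 3: combine according to (13).
    x = mu + math.sqrt(w) * z
    y = mu + gamma * w + math.sqrt(w) * z
    return x, y
```

For $\nu > 2$, the sample variance of the t draws should be close to $\sigma^2\nu/(\nu-2)$, and the sample mean of the skewed t draws close to $\mu + \gamma\nu/(\nu-2)$, which gives a quick check on a simulated batch.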

Stable Paretian and Classical 
Tempered Stable Distributions 

Research on stable distributions in the field
of finance has a long history. 11 In 1963, the
mathematician Benoit Mandelbrot first used the
stable distribution to model empirical distributions
that have skewness and fat tails. The practical
application of stable distributions to
risk modeling, however, has only recently been
developed. Reasons for the late penetration are
the complexity of the associated algorithms for
fitting and simulating stable models, as well as
of their multivariate extensions. To distinguish
between Gaussian and non-Gaussian stable
distributions, the latter are commonly referred
to as stable Paretian, Lévy stable, or $\alpha$-stable
distributions.

Stable Paretian tails decay more slowly than
the tails of the normal distribution and therefore
better describe the extreme events present
in the data. Like the Student's t distribution,
stable Paretian distributions have a parameter
responsible for the tail behavior, called the tail
index or index of stability.

Definition of Stable Paretian Distributions 

We offer two definitions of the stable Paretian
distribution. The first one establishes the stable
distribution as having a domain of attraction.
That is, (properly normalized) sums of
IID random variables are distributed with the
$\alpha$-stable distribution as the number of summands
n goes to infinity. Formally, let
$Y_1, Y_2, \ldots, Y_n$ be IID random variables and $\{a_n\}$
and $\{b_n\}$ be sequences of real and positive numbers,
respectively. A variable X is said to have
the stable Paretian distribution if

$$ \frac{\sum_{i=1}^{n} Y_i - a_n}{b_n} \xrightarrow{d} X \tag{14} $$

where the symbol $\xrightarrow{d}$ denotes convergence in
distribution.

The density function of the stable Paretian
distribution is not available in a closed-form
expression in the general case. Therefore, the
distribution of a stable random variable X is
alternatively defined through its characteristic
function. The density function can be obtained
through a numerical method, as we explain further
below. The characteristic function of the
$\alpha$-stable distribution is given by

$$ \varphi_X(t) = \begin{cases} \exp\left\{ i\mu t - \sigma^\alpha |t|^\alpha \left(1 - i\beta\,\mathrm{sign}(t)\tan\frac{\pi\alpha}{2}\right)\right\}, & \alpha \ne 1 \\ \exp\left\{ i\mu t - \sigma |t| \left(1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(t)\log|t|\right)\right\}, & \alpha = 1 \end{cases} \tag{15} $$

where sign(t) is 1 if t > 0, 0 if t = 0, and −1 if
t < 0. The four parameters uniquely determining
the $\alpha$-stable distribution are:

• $\alpha$: index of stability or tail index, $0 < \alpha \le 2$.

• $\beta$: skewness parameter, $-1 \le \beta \le 1$.

• $\sigma$: scale parameter, $\sigma > 0$.

• $\mu$: location parameter, $\mu \in \mathbb{R}$.

We denote the distribution by $S_\alpha(\sigma, \beta, \mu)$. The
roles of $\alpha$ and $\beta$ are illustrated in Figure 3. The
sign of $\beta$ reflects the asymmetry of the distribution.
Positive (negative) $\beta$ implies skewness
to the right (left). As noted earlier, the
index of stability controls the degree of
heavy-tailedness of the distribution. Smaller values
imply a fatter tail. The closer the tail index
is to two, the more Gaussian-like the distribution
is. Indeed, for $\alpha = 2$, we arrive at the
Gaussian distribution: if X is distributed with
$S_2(\sigma, \beta, \mu)$, then it has the normal distribution
with mean equal to $\mu$ and variance equal to $2\sigma^2$.
In this case, the parameter $\beta$ loses its meaning
as a skewness parameter and becomes irrelevant.
Nevertheless, the normal distribution
is usually associated with $\beta = 0$. Apart from
the Gaussian distribution, there are two more
special cases for which the density function of
the stable distribution is available in a closed
form: the Cauchy distribution ($\alpha = 1$, $\beta = 0$)
and the completely skewed Lévy distribution
($\alpha = 1/2$, $\beta = \pm 1$).

Figure 3 Stable Density: $\mu = 0$, $\sigma = 1$, $\beta = 0$, and varying $\alpha$ (left); $\alpha = 1.5$, $\mu = 0$, $\sigma = 1$, and varying
$\beta$ (right)

Basic Properties of the Stable Distribution 

We outline three basic properties of the $\alpha$-stable
distribution:

• Power-tail decay. The tail of the stable distribution's
density decays like a power function
(slower than the exponential decay). It is this
property that allows the stable distribution to
capture the occurrence of extreme events. For
a constant C, the property can be expressed as

$$ P(|X| > x) \propto C x^{-\alpha}, \quad \text{as } x \to \infty \tag{16} $$

• Existence of raw moments. The magnitude of
the tail index determines the order up to
which raw moments exist:

$$ E|X|^p < \infty \ \text{for any } p: 0 < p < \alpha; \qquad E|X|^p = \infty \ \text{for any } p: p \ge \alpha \tag{17} $$

This property implies that, for non-Gaussian
$\alpha$-stable distributions ($\alpha < 2$), the variance
(as well as higher moments such as skewness
and kurtosis) does not exist. When the index
of stability is one or less, the mean
does not exist either. Since the variance does not
exist, one cannot express risk in terms of the
variance. However, the scale parameter can
play the role of a risk measure, in the same
way that the standard deviation does in the
normal distribution case.

• Stability. The property of stability characterizes
the preservation of the distributional
form under linear transformations.
It is governed by the index of stability
$\alpha$ and expressed as follows. Suppose that
$X_1, X_2, \ldots, X_n$ are IID random variables,
independent copies of a random variable X.
Then X follows the stable distribution if, for
every n, there exist a positive constant $C_n$ and
a real number $D_n$ such that

$$ X_1 + X_2 + \cdots + X_n \overset{d}{=} C_n X + D_n \tag{18} $$

The notation $\overset{d}{=}$ denotes equality in distribution.
The constant $C_n = n^{1/\alpha}$ determines
the stability property. The stability property
means that the "classical" central limit theorem
does not apply in the non-Gaussian case.
A large sum of appropriately standardized
IID random variables is distributed with the
stable Paretian distribution as the number of
terms increases indefinitely, not with the normal
distribution.

Figure 4 Fitted Stable Tail Index for DJIA Returns
Note: The stable Paretian distribution is calibrated on the residuals from a GARCH model fitted to the
return on the DJIA using a 500-day rolling window.

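When the variables $X_i$, i = 1, ..., n, are
themselves independent and distributed with the
$\alpha$-stable distribution, $X_i \sim S_\alpha(\sigma_i, \beta_i, \mu_i)$, the stability
property can be extended further:

1. The distribution of $Y = \sum_{i=1}^{n} X_i$ is $\alpha$-stable
with index of stability $\alpha$ and parameters:

$$ \beta = \frac{\sum_{i=1}^{n} \beta_i \sigma_i^\alpha}{\sum_{i=1}^{n} \sigma_i^\alpha}, \quad \sigma = \left(\sum_{i=1}^{n} \sigma_i^\alpha\right)^{1/\alpha}, \quad \mu = \sum_{i=1}^{n} \mu_i \tag{19} $$

2. The distribution of $Y = X_1 + a$ for some
real constant a is $\alpha$-stable with index of
stability $\alpha$ and parameters:

$$ \beta = \beta_1, \quad \sigma = \sigma_1, \quad \mu = \mu_1 + a \tag{20} $$

3. The distribution of $Y = aX_1$ for some real
constant a ($a \ne 0$) is $\alpha$-stable with index of
stability $\alpha$ and parameters:

$$ \beta = \mathrm{sign}(a)\,\beta_1, \quad \sigma = |a|\,\sigma_1, \quad \mu = \begin{cases} a\mu_1, & \alpha \ne 1 \\ a\mu_1 - \frac{2}{\pi}\,a \ln|a|\,\sigma_1\beta_1, & \alpha = 1 \end{cases} $$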
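
To make the aggregation rule in (19) concrete, the short Python sketch below combines the parameters of independent $\alpha$-stable variables that share the same index of stability. The function name and the tuple layout (scale, skewness, location) are assumptions made for illustration only.

```python
def stable_sum_parameters(alpha, components):
    """Parameters (sigma, beta, mu) of the sum of independent S_alpha variables.

    `components` is a list of (sigma_i, beta_i, mu_i) tuples, all with the
    same index of stability alpha, combined as in (19).
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    # Sum of sigma_i ** alpha appears in both the scale and the skewness.
    scale_pow = sum(s ** alpha for s, _, _ in components)
    sigma = scale_pow ** (1.0 / alpha)
    beta = sum(b * s ** alpha for s, b, _ in components) / scale_pow
    mu = sum(m for _, _, m in components)
    return sigma, beta, mu
```

For two identical components with unit scale, the resulting scale is $2^{1/\alpha}$, which agrees with the constant $C_n = n^{1/\alpha}$ in the stability property for n = 2.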

In empirical analysis, the time-varying tail behavior
of assets is reflected in the nonconstancy
of the tail index of the $\alpha$-stable distribution,
as demonstrated in Figure 4. As in the earlier
illustration, the tail index is estimated by fitting
a stable distribution to the filtered returns
(after removing the volatility clustering with a
GARCH model). The tail index of the DJIA returns
is very close to two in the upward market
environment from 2003 to 2005 but starts decreasing
right before the 2008 market crash and
is smallest at the time of the crash itself. This implies
that tail thickness is smallest in the bullish
market from 2003 to 2005 and is largest during
the crisis period.

As noted above, the variance of non-Gaussian
stable distributions does not exist. To address
this potentially undesirable feature, smoothly
truncated stable distributions and various types
of tempered stable distributions have been proposed.
They are all obtained with a procedure
known as "tempering" applied to the tails of
the distribution to ensure that the variance is
finite. This procedure replaces the power decay
very far out in the tails of the distribution with
an exponential (or faster) decay. We discuss the
classical tempered stable distributions next.

Definition of Classical Tempered 
Stable Distributions 

The characteristic function of the classical tempered
stable (CTS) distribution is given by the
following expression:

$$ \varphi_X(t) = \exp\Big\{ imt - itC\,\Gamma(1-\alpha)\left(\lambda_+^{\alpha-1} - \lambda_-^{\alpha-1}\right) + C\,\Gamma(-\alpha)\left((\lambda_+ - it)^\alpha - \lambda_+^\alpha + (\lambda_- + it)^\alpha - \lambda_-^\alpha\right) \Big\} \tag{21} $$

We denote the distribution by CTS($\alpha$, C, $\lambda_+$,
$\lambda_-$, m). The distribution parameters are characterized
as follows:

• $\alpha$: tail index, $\alpha \in (0, 1) \cup (1, 2)$.

• m: location parameter, $m \in \mathbb{R}$.

• C: scale parameter, C > 0.

• $\lambda_+$ and $\lambda_-$: parameters controlling the decay
in the right and left tail, respectively;
$\lambda_+, \lambda_- > 0$.

The relative magnitudes of $\lambda_+$ and $\lambda_-$ determine
the degree of skewness of the CTS
distribution. When $\lambda_+ > \lambda_-$ ($\lambda_+ < \lambda_-$), the distribution
is skewed to the left (right). Symmetry
is obtained for $\lambda_+ = \lambda_-$. Tail heaviness is
determined in a more flexible manner in the
CTS distribution than in the stable Paretian
distribution. Three parameters play a role in
that: $\lambda_+$, $\lambda_-$, and $\alpha$. The former two have the
effect of scaling the tails (smaller values correspond
to heavier tails), while the latter has the effect of
shaping the tails (as before, small values imply
fatter tails). The effect of different values of
these three parameters on the CTS distribution
can be seen in Figures 5, 6, and 7.

Figure 5 Probability Density of the CTS Distribution:
Dependence on $\lambda_+$ and $\lambda_-$
Note: CTS Parameter Values: $\alpha = 0.8$, C = 1,
m = 0, and varying $\lambda_+$ and $\lambda_-$

Figure 6 Probability Density of the Symmetric
CTS Distribution: Dependence on $\lambda_+$ and $\lambda_-$
Note: CTS Parameter Values: $\alpha = 1.1$, C = 1,
m = 0, and varying $\lambda_+$ and $\lambda_-$

Linear transformations of a CTS-distributed random
variable are also distributed with the
CTS distribution. First, we define the standard
CTS distribution. A random variable X has the
standard CTS distribution if its location parameter
is m = 0 and

$$ C = \left(\Gamma(2 - \alpha)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)\right)^{-1} \tag{22} $$

The distribution is denoted by X ~ stdCTS($\alpha$, $\lambda_+$, $\lambda_-$).
Its mean and variance are zero and
one, respectively.

For a positive number a and a real number
m, the linear combination Y = aX + m has the
CTS distribution:

$$ Y \sim \text{CTS}\left(\alpha,\ \frac{a^\alpha}{\Gamma(2 - \alpha)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)},\ \frac{\lambda_+}{a},\ \frac{\lambda_-}{a},\ m\right) \tag{23} $$

The mean and variance of Y are m and $a^2$,
respectively.
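
The standardization in (22) and the transformation in (23) can be checked numerically with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the variance expression $C\,\Gamma(2-\alpha)(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2})$ is an assumption that is consistent with (22) yielding unit variance.

```python
import math


def std_cts_scale(alpha, lam_plus, lam_minus):
    """Scale parameter C of the standard CTS distribution, as in (22)."""
    return 1.0 / (math.gamma(2.0 - alpha)
                  * (lam_plus ** (alpha - 2.0) + lam_minus ** (alpha - 2.0)))


def cts_affine(alpha, lam_plus, lam_minus, a, m):
    """CTS parameters (alpha, C, lam_plus, lam_minus, m) of Y = a X + m, as in (23)."""
    if a <= 0:
        raise ValueError("a must be positive")
    c = a ** alpha * std_cts_scale(alpha, lam_plus, lam_minus)
    return alpha, c, lam_plus / a, lam_minus / a, m


def cts_variance(alpha, c, lam_plus, lam_minus):
    """Variance implied by the CTS parameters (assumed formula, see lead-in)."""
    return c * math.gamma(2.0 - alpha) * (
        lam_plus ** (alpha - 2.0) + lam_minus ** (alpha - 2.0))
```

Applying `cts_variance` to the output of `cts_affine` should return $a^2$, matching the statement that Y has variance $a^2$.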

Figure 7 Probability Density of the CTS Distribution:
Dependence on $\alpha$
Note: CTS Parameter Values: C = 1, $\lambda_+$ = 50, $\lambda_-$ =
50, m = 0, and varying $\alpha$

Subordinated Representation of the α-Stable
and CTS Distributions

Similar to the Student's t distribution, stable
distributions can be represented as mixtures
of other distributions. More generally,
(continuous) mixture representations are analyzed
within the framework of intrinsic time
change and subordination. The price and return
dynamics can be considered under two different
time scales: the physical (calendar) time
and an intrinsic (also called operational, trading,
or market) time. The intrinsic time is best
thought of as the cumulative trading volume
process, which measures the cumulative
volume of the transactions up to a point
on the calendar-time scale. It is a measure of
market activity and a reflection of the empirical
observation that price changes are larger when
market activity is more intense. Let us denote
the intrinsic time process by T(t) and the
time-evolving random variable such as price or
return by X(t). X(t) is assumed independent of
T(t). The compound process X(T(t)) is said to
be subordinated to X by the intrinsic time T(t),
and T(t) is referred to as a subordinator. 12 Since
the intrinsic time is nondecreasing, its increments
$\Delta T(t) = T(t) - T(t - \Delta t)$ are nonnegative, and
distributions such as the gamma, Poisson, and
inverse-Gaussian can be used to describe them
in probabilistic terms. 13 Another distributional
alternative is the $\alpha$-stable distribution completely
skewed to the right, $S_\alpha(\sigma, 1, 0)$, for $0 < \alpha < 1$,
whose support is the positive real line. Therefore,
when $0 < \alpha < 2$, the subordinator can be a stable
distribution given by $S_{\alpha/2}(\sigma, 1, 0)$.

Subordinated models with random intrinsic
time, such as X(T(t)), are leptokurtic. They have
heavier tails and higher peaks around the mode
of zero than the normal distribution. As such,
they provide a natural way to model the tail
effects observed in prices and returns.

The usefulness of subordinated representations
lies in allowing for practical ways of simulating
random numbers from the corresponding models.
Subordinated processes are especially important
in multiasset settings, where each marginal
distribution has a different tail heaviness. This
across-asset heterogeneity can be modeled by
having subordinators with different parameters
for each asset. As noted earlier, this characteristic
of multivariate asset returns is crucially
important for a realistic risk model able to
identify tail risk contributors and tail risk
diversifiers.

The subordinated representation of the $\alpha$-stable
distribution can be expressed in the
following way. Let Z be a standard normal random
variable, $Z \sim N(0, 1)$, and Y be a positive
$\alpha/2$-stable random variable independent of Z,
$Y \sim S_{\alpha/2}(s, 1, 0)$, where

$$ s = 2\sigma^2 \left(\cos\frac{\pi\alpha}{4}\right)^{2/\alpha} \tag{24} $$

Then, the variable

$$ X = Y^{1/2} Z $$

is symmetric $\alpha$-stable: $X \sim S_\alpha(\sigma, 0, 0)$. This implies
that every symmetric stable variable
is conditionally Gaussian (conditional on the
value of the stable subordinator). Unconditionally,
the symmetric $\alpha$-stable distribution
is expressed as a scale mixture of normal
distributions. 14

The CTS distribution has a subordinated representation
as well and can be expressed as a
mean-scale mixture of Gaussian distributions.
For details, see Madan and Yor (2005).


Fitting and Simulations of α-Stable and CTS
Distributions

Fitting techniques for the $\alpha$-stable distributions
can be divided into three categories:
quantile methods, characteristic function-based
methods, and maximum likelihood methods.
The quantile method is similar to the method
of moments in that it
uses predetermined empirical quantiles to estimate
the distribution parameters. 15 The characteristic
function-based methods also resemble
the method of moments and rely on transformations
of the sample characteristic function. 16
Finally, the third method involves maximization
of the likelihood function, which is computed
numerically. Comparative studies of the
three fitting approaches show that the method
of maximum likelihood is superior in terms of
estimation accuracy. The quantile method requires
the least amount of computational time
but is the least accurate one. The second category
of methods also has the benefit of computational
simplicity but may have difficulty
estimating the skewness parameter. 17 For these
reasons, here we focus on the method of maximum
likelihood in some more detail.

In statistical theory, the relationship between
the probability density function (pdf) and the
characteristic function is expressed as follows:

$$ f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-itx)\,\varphi_X(t)\,dt \tag{25} $$

where $f(\cdot)$ and $\varphi(\cdot)$ are the density
and characteristic functions, respectively. The
pdf of the $\alpha$-stable and CTS distributions can
be computed by numerical evaluation of the
integral above. A fast and computationally efficient
method of numerical integration is the
fast Fourier transform (FFT) algorithm. 18 Consider
the pdf computation in (25). The main
idea of FFT is to evaluate the integral for a
grid of equally spaced values of the random
variable X:

$$ x_k = \left(k - 1 - \frac{N}{2}\right)h, \quad k = 1, \ldots, N \tag{26} $$

where h > 0. That is, after the change of variable
$t = 2\pi\omega$, equation (25) can be expressed as

$$ f_X(x_k) = \int_{-\infty}^{\infty} \exp\left(-i2\pi\omega\left(k - 1 - \frac{N}{2}\right)h\right)\varphi_X(2\pi\omega)\,d\omega $$

This integral can be approximated by a
Riemann sum, after choosing a small
enough lower bound and a large enough upper bound:

$$ f_X(x_k) \approx \sum_{n=1}^{N} \varphi_X\left(2\pi s\left(n - 1 - \frac{N}{2}\right)\right) \exp\left(-i2\pi\left(k - 1 - \frac{N}{2}\right)\left(n - 1 - \frac{N}{2}\right)sh\right) s \tag{27} $$

for k = 1, ..., N. Here, the lower and upper
bounds equal $-sN/2$ and $sN/2$, respectively. The
distance between the grid points $s\left(n - 1 - \frac{N}{2}\right)$,
n = 1, ..., N, is s. If $s = 1/(hN)$, we arrive at the
following expression for the density, after some
algebraic rearrangement:

$$ f_X(x_k) \approx \frac{(-1)^{k-1+N/2}}{hN} \sum_{n=1}^{N} (-1)^{n-1}\,\varphi_X\left(\frac{2\pi}{hN}\left(n - 1 - \frac{N}{2}\right)\right) \exp\left(-\frac{i2\pi(n-1)(k-1)}{N}\right), \quad k = 1, \ldots, N \tag{28} $$


To compute the sum above, one can use the
FFT implemented by many numerical analysis
software packages. The parameters of the
FFT method are N, the number of summands
in the Riemann sum, and h, the grid spacing.
Their values can be chosen appropriately, so
as to achieve a balance between approximation
accuracy and computational burden. 19 Finally,
the maximum-likelihood estimates of the parameters
of the $\alpha$-stable and CTS distributions
are obtained by numerical maximization of the
log-likelihood function.
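
The density computation in (28) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production routine: the function names are hypothetical, a small recursive radix-2 FFT is written out so that the example needs only the standard library (in practice a numerical package's FFT would be used), N is assumed to be a power of two, and the characteristic function follows (15) with $\log|t|$ in the $\alpha = 1$ branch. Indices are zero-based in the code, so the code's k corresponds to k − 1 in the text.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT: out[k] = sum_n a[n] exp(-2 pi i k n / N)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def stable_cf(t, alpha, beta, sigma, mu):
    """Characteristic function of S_alpha(sigma, beta, mu), as in (15)."""
    if t == 0.0:
        return 1.0 + 0j
    sgn = 1.0 if t > 0 else -1.0
    if alpha != 1.0:
        psi = (sigma * abs(t)) ** alpha * (
            1 - 1j * beta * sgn * math.tan(math.pi * alpha / 2))
    else:
        psi = sigma * abs(t) * (
            1 + 1j * beta * (2 / math.pi) * sgn * math.log(abs(t)))
    return cmath.exp(1j * mu * t - psi)


def pdf_on_grid(cf, N=2 ** 12, h=0.02):
    """Approximate the density on x_k = (k - N/2) h, k = 0..N-1, via (28)."""
    step = 2 * math.pi / (h * N)
    # Alternating signs absorb the shift of both grids by N/2.
    seq = [(-1) ** n * cf(step * (n - N / 2)) for n in range(N)]
    spec = fft(seq)
    xs = [(k - N / 2) * h for k in range(N)]
    dens = [((-1) ** (k + N // 2) * spec[k] / (h * N)).real for k in range(N)]
    return xs, dens
```

A usage example: `xs, dens = pdf_on_grid(lambda t: stable_cf(t, 1.7, -0.3, 1.0, 0.0))` returns a grid and the corresponding approximate density values. The closed-form special cases ($\alpha = 2$ and the Cauchy case) offer a convenient check; for heavy-tailed cases the grid range N·h should be wide enough that the density outside it is negligible.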

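Simulations of the $\alpha$-stable distribution can be
accomplished using an algorithm called the
Chambers-Mallows-Stuck generator. To generate
a random number from $S_\alpha(\sigma, \beta, \mu)$, the steps
are as follows:

• Generate two independent random numbers:
E from an exponential distribution with mean 1,
E ~ exp(1), and U from a uniform distribution,
$U \sim U\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$.

• If $\alpha \ne 1$, compute

$$ Z = S_{\alpha,\beta}\,\frac{\sin\left(\alpha(U + b_{\alpha,\beta})\right)}{(\cos U)^{1/\alpha}} \left(\frac{\cos\left(U - \alpha(U + b_{\alpha,\beta})\right)}{E}\right)^{(1-\alpha)/\alpha} \tag{29} $$

where

$$ S_{\alpha,\beta} = \left[1 + \beta^2 \tan^2\frac{\pi\alpha}{2}\right]^{1/(2\alpha)}, \qquad b_{\alpha,\beta} = \frac{\arctan\left(\beta\tan\frac{\pi\alpha}{2}\right)}{\alpha} \tag{30} $$

• If $\alpha = 1$, compute

$$ Z = \frac{2}{\pi}\left[\left(\frac{\pi}{2} + \beta U\right)\tan U - \beta\log\left(\frac{\frac{\pi}{2}\,E\cos U}{\frac{\pi}{2} + \beta U}\right)\right] \tag{31} $$

• The random variable Z has a standardized
stable distribution with location parameter
equal to zero and scale parameter equal to
one, $Z \sim S_\alpha(1, \beta, 0)$. To obtain an observation
from $S_\alpha(\sigma, \beta, \mu)$ with arbitrary values of $\sigma$ and
$\mu$, transform Z according to 20

$$ S = \sigma Z + \mu \tag{32} $$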
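
The generator steps above can be sketched in Python using only the standard library. The function name is an illustrative assumption, the symbol `s` stands for the bracketed factor in (30), and the final line applies the transformation (32) as stated in the text.

```python
import math
import random


def stable_rvs(alpha, beta, sigma=1.0, mu=0.0, rng=random):
    """One draw from S_alpha(sigma, beta, mu) following steps (29)-(32)."""
    u = (rng.random() - 0.5) * math.pi   # U ~ Uniform(-pi/2, pi/2)
    e = rng.expovariate(1.0)             # E ~ Exp(1)
    if alpha != 1.0:
        t = beta * math.tan(math.pi * alpha / 2)
        b = math.atan(t) / alpha                      # b_{alpha,beta} in (30)
        s = (1 + t * t) ** (1 / (2 * alpha))          # S_{alpha,beta} in (30)
        z = (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1 / alpha)
             * (math.cos(u - alpha * (u + b)) / e) ** ((1 - alpha) / alpha))
    else:
        c = math.pi / 2 + beta * u
        z = (2 / math.pi) * (
            c * math.tan(u)
            - beta * math.log((math.pi / 2) * e * math.cos(u) / c))
    return sigma * z + mu                             # transformation (32)
```

The closed-form special cases give simple checks on a simulated batch: with $\alpha = 2$ the draws should have variance close to $2\sigma^2$, with $\alpha = 1$ and $\beta = 0$ the upper quartile should be close to one (Cauchy), and with $\alpha = 1/2$ and $\beta = 1$ all draws should be positive (Lévy).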

Conditional Parametric 
Fat-Tailed Models 

A conditional fat-tailed parametric model includes
the following main components:

• An autoregressive component to capture
autoregressive behavior.

• A volatility clustering component, usually a
GARCH-type model.

• A fat-tailed distribution (stable Paretian or
skewed Student's t) to explain the heavy tails
and the skewness of the residuals from the
ARMA-GARCH model.

• A treatment of tail thickness that changes
with time and across assets.

INCORPORATING HEAVY 
TAILS AND SKEWNESS: 
SEMI-PARAMETRIC 
FAT-TAILED MODELS 

In this section, we review semi-parametric
models, which combine an empirical distribution
for the body of the data distribution, where
plenty of observations are available, with a
parametric model for the tails based on extreme
value theory (EVT). EVT has a long history
of applications in modeling the occurrence
of severe weather, earthquakes, and other
extreme natural phenomena. In general terms,
extreme value distributions are the asymptotic
distributions for the normalized largest
observations of IID random variables. There are
two main categories of models for extreme
values: block maxima models and threshold
exceedances models.

In the financial applications context, block
maxima could refer to the maximal observations
within certain predefined periods of time.
For example, daily return data could be divided
into quarterly (or semiannual or yearly) blocks
and the largest daily observations within these
blocks collected and analyzed. The distribution
of the maximal values is generally not known.
However, when the block size is large, so that
block maxima can be treated as independent
(irrespective of whether the underlying data are
dependent), the limit distribution is given by EVT. 21 The
number of blocks determines the size of the
data sample available for analysis and fitting.
In contrast, in threshold exceedances models,
the sample size is not predetermined but,
naturally, depends on the a priori selected threshold
level.

The first model category is represented by the
so-called generalized extreme value (GEV)
distribution. Its distribution function has the form

$$ F_X(x \mid \xi, \mu, \sigma) = \begin{cases} \exp\left(-\left(1 + \xi\,\frac{x - \mu}{\sigma}\right)^{-1/\xi}\right), & \xi \ne 0 \\ \exp\left(-e^{-(x - \mu)/\sigma}\right), & \xi = 0 \end{cases} \tag{33} $$

where $1 + \xi(x - \mu)/\sigma > 0$. The parameters
$\xi \in \mathbb{R}$, $\mu \in \mathbb{R}$, and $\sigma > 0$ are the shape,
location, and scale parameters, respectively. The
value of $\xi$ determines the three distributions
encompassed by the parametric form above: the
Weibull distribution ($\xi < 0$), the Gumbel
distribution ($\xi = 0$), and the Fréchet distribution
($\xi > 0$). Of the three, the latter one has the
heaviest tails, 22 while the first one is short-tailed, with
a finite right endpoint and, thus, not favored in
modeling financial losses. 23
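
For illustration, the GEV distribution function in (33) might be coded as below. The function name is hypothetical, and the handling of points outside the support (returning 0 below the lower endpoint when $\xi > 0$ and 1 above the upper endpoint when $\xi < 0$) is an assumption that extends (33) to the whole real line.

```python
import math


def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """Distribution function of the GEV distribution, as in (33)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))       # Gumbel case
    arg = 1.0 + xi * z
    if arg <= 0.0:
        # Outside the support: left of the Frechet lower endpoint,
        # or right of the Weibull upper endpoint.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-arg ** (-1.0 / xi))
```

Evaluating the function at the location parameter gives $e^{-1}$ for any shape value, which is a quick consistency check across the three cases.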

The block maxima method's major drawback
is its "wastefulness" of data: all but the largest
observation within each block are discarded.
For this reason, a more common approach
to EVT modeling is the threshold exceedance
method. In it, the extreme events exceeding a
predetermined high level (that is, events in the
tail) are modeled with the generalized Pareto
distribution (GPD). Its distribution function is
given by

$$ F_X(x \mid \xi, \sigma) = \begin{cases} 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}, & \xi \ne 0 \\ 1 - \exp\left(-\frac{x}{\sigma}\right), & \xi = 0 \end{cases} \tag{34} $$

where $\sigma > 0$, and $x \ge 0$ when $\xi \ge 0$ and
$0 \le x \le -\sigma/\xi$ when $\xi < 0$. The parameters $\xi$ and
$\sigma$ are the shape and scale parameters, respectively.
Like the GEV, the GPD contains several
special cases defined by the value of $\xi$. When
$\xi > 0$, we get the Pareto distribution with parameters
$\alpha = 1/\xi$ and $k = \sigma/\xi$, whose tails exhibit
slow, power-law decay. The exponential
distribution is obtained for $\xi = 0$; its tails decay
at an exponential rate. A short (finite)-tailed
distribution, called Pareto type II distribution,
arises when $\xi < 0$. 24


Fitting and Simulations of the GPD 

In empirical modeling, there is generally a perceived
trade-off between fitting the bulk and
the tails of the data. Data around the mode are
numerous and relatively easy to fit, while data
in the tails are sparse and present an estimation
challenge. Most commonly, the choice of
model is based on how well it fits the bulk of
the data, with the tails relegated to a somewhat
secondary role. The semiparametric approach
we consider in this section is to describe the
majority of the data in a nonparametric fashion
and use the GPD to fit the tails. Since the GPD
describes the excess distribution over a threshold,
we now define this concept formally.

For a random variable X with cumulative distribution
function G, the excess distribution over
the threshold u is denoted by $G_u$ and is given by

$$ G_u(x) = P(X - u \le x \mid X > u) = \frac{G(x + u) - G(u)}{1 - G(u)} $$

for $0 \le x < x_F - u$, where $x_F$ is the right
endpoint (a finite number or infinity) of X's
distribution function G. 25 A statistical result
known as the Pickands-Balkema-de Haan
theorem implies that the excess distributions
of a large class of underlying distributions converge
to a GPD as the threshold level increases.
That is, the GPD is the limiting distribution as u
increases to infinity.

Denote the available data sample by
$X_1, \ldots, X_N$ and define an upper and a lower
threshold level $u_U$ and $u_L$, respectively. The
data points beyond the threshold levels constitute
the tails of the data distribution that are
to be modeled with EVT. Naturally, separate
modeling of the two tails has the purpose of
accounting for the potential skewness in the
data distribution. Let us define the exceedances
of $u_U$ by $Y_{k,U} = X_k - u_U$, where $X_k > u_U$, and
the exceedances of $u_L$ by $Y_{k,L} = u_L - X_k$, where
$X_k < u_L$, k = 1, ..., K. 26 The estimates of the
scale and shape parameters are most conveniently
obtained by maximizing the GPD
log-likelihood function for each of the sets of data
$Y_{k,U}$ and $Y_{k,L}$. 27 It is written as

$$ \ln L(\xi, \sigma \mid Y_1, \ldots, Y_K) = \sum_{k=1}^{K} \ln f_{Y_k}(\xi, \sigma) = -K\ln\sigma - \left(1 + \frac{1}{\xi}\right)\sum_{k=1}^{K} \ln\left(1 + \xi\,\frac{Y_k}{\sigma}\right) \tag{35} $$

where $f(\xi, \sigma)$ denotes the GPD density
function.
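
The log-likelihood in (35) is straightforward to evaluate, as the following Python sketch suggests. The function names are hypothetical. The grid search is only a crude stand-in, used here to keep the example dependency-free; in practice a numerical optimizer would be applied to the same function. The $\xi = 0$ branch uses the exponential limit of the GPD, which (35) does not cover explicitly.

```python
import math


def gpd_loglik(xi, sigma, exceedances):
    """GPD log-likelihood (35) for a list of positive exceedances Y_k."""
    if sigma <= 0:
        return -math.inf
    k = len(exceedances)
    if xi == 0.0:
        # Exponential limit of the GPD.
        return -k * math.log(sigma) - sum(exceedances) / sigma
    total = 0.0
    for y in exceedances:
        arg = 1.0 + xi * y / sigma
        if arg <= 0.0:
            return -math.inf     # observation outside the support
        total += math.log(arg)
    return -k * math.log(sigma) - (1.0 + 1.0 / xi) * total


def gpd_fit_grid(exceedances, xis, sigmas):
    """Crude maximizer: best (loglik, xi, sigma) over a user-supplied grid."""
    return max((gpd_loglik(xi, s, exceedances), xi, s)
               for xi in xis for s in sigmas)
```

For example, `gpd_fit_grid(y, [0.1, 0.2, 0.3], [0.5, 1.0, 1.5])` returns the grid point with the highest log-likelihood for the exceedances `y`; the same function can be applied separately to the upper-tail and lower-tail exceedances.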

The empirical distribution is usually estimated
using the kernel density estimation approach.
The kernel density estimate can be
roughly thought of as a smoothed-out histogram.
A parameter, called bandwidth or window
width, controls the degree of smoothness
of the resulting density estimate. More formally,
the kernel density estimate is defined as

$$ \hat{f}(x; x_i, h) = \frac{1}{hn}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{36} $$

where $x_1, x_2, \ldots, x_n$ is a data sample coming
from some unspecified distribution and
assumed to be IID. The bandwidth, h, takes positive
values and K is called the kernel, a symmetric
function that integrates to one. The normal
density is often chosen as the kernel in (36). The
bandwidth's value can be selected in an optimal
way. 28
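
A kernel density estimate with the normal kernel, as in (36), can be sketched in a few lines of Python. The function name is an illustrative assumption, and the bandwidth is taken as given rather than selected optimally.

```python
import math


def gaussian_kde(x, sample, h):
    """Kernel density estimate (36) at point x with a standard normal kernel."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
```

A larger bandwidth h spreads each observation's contribution more widely and yields a smoother estimate; a smaller one follows the data more closely.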

The approach to scenario generation from a model based
on GPD is also semiparametric: the body of the
distribution is simulated from the empirical density
and GPD tails are attached to it. Generating
observations from a GPD with a given shape parameter
$\xi$, a scale parameter $\sigma$, and a threshold level
$u$ can be accomplished in the following three steps:


• Generate an observation U from a uniform 
distribution on the interval (0,1). 

• Compute the quantity

$$
Z = \frac{U^{-\xi} - 1}{\xi} \qquad (37)
$$

• Compute the GPD realization as

$$
Y = u + \sigma \times Z \qquad (38)
$$
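The three steps can be sketched as follows. This is a minimal illustration, assuming a nonzero shape parameter and the reading of (37) and (38) as $Z = (U^{-\xi} - 1)/\xi$ and $Y = u + \sigma Z$; the function names and the seeding convention are ours.

```python
import random


def gpd_from_uniform(u_draw, xi, sigma, threshold):
    """Map one uniform draw to a GPD realization via (37) and (38); xi != 0."""
    z = (u_draw ** (-xi) - 1.0) / xi
    return threshold + sigma * z


def simulate_gpd(n, xi, sigma, threshold, seed=None):
    """Generate n GPD realizations by repeating the three steps."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], which avoids raising 0 to a negative power.
        u_draw = 1.0 - rng.random()
        draws.append(gpd_from_uniform(u_draw, xi, sigma, threshold))
    return draws
```

For the lower tail, the simulated exceedances would be subtracted from the lower threshold instead of added to the upper one.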


Scenarios from the body of the distribution 
are generated nonparametrically, via histor¬ 
ical simulation known as bootstrapping (or 
resampling, more generally). The procedure in¬ 
volves drawing randomly, with replacement, 
from the set of historically observed data points. 
The simulated tails of the distribution are then 
"attached" to the scenarios from the body to ob¬ 
tain semiparametric scenarios from the whole 
data distribution. 


Threshold Selection 

We consider two of the most popular tools for 
selection of the threshold level—the mean ex¬ 
cess function plot and the Hill plot. Both of 
them rely on visual inspection to determine the 
threshold. 

The mean excess function is closely related to 
the concept of excess distribution. It describes 
the average exceedance above a threshold u, as 
a function of u. 29 Formally, it is defined as 

$$
m(u) = E(X - u \mid X > u) \qquad (39)
$$
In the case of the GPD, $m(u)$ can be shown to equal

$$
m(u) = \frac{\sigma}{1 - \xi} + \frac{\xi}{1 - \xi}\, u
$$

where $0 \le u < \infty$ if $0 \le \xi < 1$ and
$0 \le u \le -\sigma/\xi$ if $\xi < 0$. The mean excess
function does not exist for $\xi \ge 1$. The mean excess
function is linear in the threshold level. This
linearity is used to motivate a graphical check that
the data conform to a GPD model: If the plot is
approximately linear for high threshold values, the GPD
may be employed to describe the distribution of the
exceedances. The level above which linearity is evident
may be taken as the threshold level.
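The points of a mean excess plot can be computed directly from the sample counterpart of (39). The sketch below is an illustration only; the function names and the choice to return `None` when no observation exceeds the threshold are our assumptions.

```python
def mean_excess(sample, u):
    """Sample mean of X - u over the observations with X > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        return None  # no data above the threshold
    return sum(excesses) / len(excesses)


def mean_excess_curve(sample, thresholds):
    """Pairs (u, m(u)) to be plotted and inspected for approximate linearity."""
    return [(u, mean_excess(sample, u)) for u in thresholds]
```

In practice the thresholds are often taken to be the observed data points themselves, and the highest few are discarded because they average over very few exceedances.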

Plots of the Hill estimator are another EVT model
selection method. The Hill approach offers a way to
estimate the tail index $\alpha = 1/\xi$. Denote the
$i$th order statistic of the data sample by $X_{(i)}$. 30
The Hill estimator of $\alpha$ is defined as

$$
H_{m,n} = \left( \frac{1}{m} \sum_{i=1}^{m} \ln X_{(i)}
  - \ln X_{(m)} \right)^{-1}
\qquad (40)
$$

where $2 \le m \le n$ and $m$ is a sufficiently high
number. For $\xi > 0$, the Hill estimator is equal to
$\alpha$ asymptotically, as the sample size $n$ and the
number of extremes $m$ increase without bound. In
practical applications, the Hill estimator is computed
for different values of $m$ and plotted against these
values. The plot is expected to stabilize above a
certain value of $m$, so that the Hill estimates
constructed from a different number of order statistics
remain approximately the same. The threshold level $u$
is then estimated by $X_{(m)}$.
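A Hill plot can be produced from a short routine such as the one below. This is a sketch under a stated assumption: it uses the common form of the estimator, the reciprocal of the average of the $m$ largest log-observations minus the log of the $m$th largest, which may differ in small details from the exact expression intended in (40). The function names are ours.

```python
import math


def hill_estimator(sample, m):
    """Hill estimate of the tail index alpha from the m largest observations."""
    ordered = sorted(sample, reverse=True)  # ordered[0] is the sample maximum
    n = len(ordered)
    if not 2 <= m <= n:
        raise ValueError("m must satisfy 2 <= m <= n")
    if ordered[m - 1] <= 0:
        raise ValueError("the m largest observations must be positive")
    logs = [math.log(x) for x in ordered[:m]]
    gap = sum(logs) / m - logs[-1]
    if gap == 0:
        return float("inf")  # all m largest observations are equal
    return 1.0 / gap


def hill_plot_points(sample):
    """Pairs (m, Hill estimate) for m = 2, ..., n, to be plotted against m."""
    return [(m, hill_estimator(sample, m)) for m in range(2, len(sample) + 1)]
```

For loss data, the routine would be applied to the losses expressed as positive numbers, so that the largest order statistics correspond to the tail of interest.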

The semiparametric approach described in 
this section is a source of two major challenges. 
First, in order to obtain a sufficiently large num¬ 
ber of observations in the tail, a large sample of 
historical data is needed. Second, even though 
the plots of the Hill estimator and the mean 
excess function provide a method for threshold 
identification, such identification is intrinsically 
subjective, as it is based on visual inspection. 


Moreover, it is difficult to automate it for large- 
scale applications. 31 

Conditional GPD Approach 

The semiparametric approach described above 
is unconditional, since it implicitly assumes that 
the observed data are IID. A typical conditional
GPD approach involves the following components:

• Autoregressive model to capture linear de¬ 
pendencies in the data. 

• GARCH-type model to capture the volatility 
clustering in the data. 

• Semi-parametric model applied to the stan¬ 
dardized residuals (which are approximately 
IID) to explain the data's heavy-tailedness 
and asymmetry. 

COMPARISON AMONG 
RISK MODELS 

Using the DJIA daily returns from February 7, 1992 to
June 30, 2011, we conduct a backtesting analysis to
compare the three fat-tailed distribution models
(stable Paretian, Student's t, and EVT) alongside the
normal distribution model. The data used in all models
are first filtered for autoregression and volatility
clustering using ARMA-GARCH.

The particular models we use in this section 
are the univariate analogs of the typical ap¬ 
proaches to modeling in the multivariate case. 
A short discussion will help clarify what this 
means. Earlier we explained that, in a multi¬ 
asset setting, taking into account the varying 
tail behavior of the returns of different assets 
is of principal importance for risk analysis and 
management. However, employing the classi¬ 
cal Student's t distribution in the multivari¬ 
ate case necessarily implies the same value of 
the DOF parameter for all assets. That value 
would "average out" the tail-fatness of assets, 
so that the risk of some risk drivers will be
underestimated, while the risk of others,
overestimated. To reflect this typical multivariate
application, in our backtesting analysis we choose to
fix the DOF of the Student's t distribution to four.

In the case of the stable Paretian model, similar
considerations about the heterogeneity of tail behavior
across risk drivers lead us to use the subordinated
representation of the α-stable distribution. As
mentioned in an earlier section, that representation
allows for modeling the individual tail behavior of
assets.

The backtesting analysis in this section, therefore,
can be understood as a comparison among models with
increasing degree of sophistication. We start with the
classical parametric approach (the "Gaussian model").
Then, a "non-sophisticated" fat-tailed model,
represented by the fixed-DOF Student's t model (the
"T-model"), is tested. Finally, a state-of-the-art
fat-tailed model, the stable subordinated model (the
"stable model"), is considered. For each of the four
models, exceedances of value-at-risk (VaR), that is,
the number of times the realized loss is larger than
the predicted VaR level, are tracked. 32 We run the
backtest with the following settings:

• Backtest period: January 2, 2004, to June 30, 2011.

• VaR confidence level: 99%.

• Time window: 500 rolling days for normal, classical
Student's t, and stable Paretian distributions and
3,000 rolling days for EVT. 33

• EVT threshold: 1.02% (as suggested by Goldberg,
Miller, and Weinstein (2008)).

The number of exceedances of the daily 99% 
VaR in the backtesting analysis for the four 
models is summarized below: 


Model                 Number of Exceedances
Stable                21
Student's t           26
Gaussian (normal)     42
EVT                   1


The number of exceedances is compared using a 95%
confidence interval estimated to be [10, 27]. The
results show that the Gaussian model is too optimistic:
with 42 exceedances, its VaR forecasts are too low. In
contrast, the EVT approach is overly pessimistic: Its
predicted VaR is only exceeded once in the backtesting
period. The Student's t model and the stable model both
produce exceedances within the confidence interval,
with the former (26 exceedances) being very close to
the upper bound.
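The text does not state how the confidence interval for the number of exceedances was computed. One common choice is the normal approximation to the binomial distribution, sketched below. The function name and the figure of roughly 1,887 trading days in the backtest period are our assumptions, not values taken from the text.

```python
import math


def exceedance_interval(n_days, var_level=0.99, z=1.96):
    """Approximate 95% interval for the number of VaR exceedances.

    Treats each day as an independent trial with exceedance probability
    1 - var_level and applies the normal approximation to the binomial.
    """
    p = 1.0 - var_level
    mean = n_days * p
    sd = math.sqrt(n_days * p * (1.0 - p))
    lower = max(0.0, mean - z * sd)
    upper = mean + z * sd
    return lower, upper


# Assumed number of trading days between January 2, 2004 and June 30, 2011.
lower, upper = exceedance_interval(1887)
```

Under these assumptions the bounds are about 10.4 and 27.3, which round down to 10 and 27. For short subperiods the normal approximation is rough, so intervals obtained this way need not match those quoted for the one-year and two-year windows.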

The DJIA performance during the backtest period is
presented in Figure 8. Figure 9 plots the daily 99% VaR
forecast produced by the Gaussian model, the Student's
t model, and the stable model against the daily DJIA
returns for the full backtest period. Since the EVT
model's VaR predictions are too conservative, we have
excluded it from the exhibit for the sake of
presentation clarity. It can be seen from the figure
that in times of low market volatility, the VaR
forecasts of the three models are almost
indistinguishable. However, during periods of greater
market turmoil, differences in predicted risk levels
are substantial across models. This point is further
elaborated in Figure 10, which shows the spreads
between the 99% VaR forecasts for the Student's
t-Gaussian and the stable-Gaussian model pairs, along
with the values and returns of DJIA. We observe that
the stable-Gaussian VaR spread stays at zero for the
period from 2004 to late 2006, suggesting "normal"
market conditions. (The estimated tail index parameter
of the α-stable distribution is close to or equal to
two during that period.) This is an essential feature
of the stable model: Despite being a fat-tailed
approach, it does not overpenalize the portfolio by
assessing unnecessarily high risk estimates during calm
market periods. On the other hand, even in times of
severe market circumstances the number of exceedances
of the stable VaR is within an acceptable range. For
the period from June 1, 2008 to June 1, 2009, the
stable VaR has one exceedance, which is within the 95%
confidence interval for the number of exceedances
([0, 4]). By comparison, the Student's t model's VaR is
exceeded four times, while the Gaussian model has seven
exceedances.

Figure 8 Dow Jones Industrial Average Performance: January 2, 2004-June 30, 2011

Figure 9 Backtest of the 99% Daily VaR Predicted by Different Distributional Methodologies

Figure 10 Spreads between 99% VaR Predictions for Student's t-Gaussian and Stable-Gaussian Model
Pairs: Full Period

It is interesting to analyze whether the VaR forecasts
can anticipate the transition from a calm market regime
to a turbulent one. To investigate, we "zoom in" on the
VaR spread dynamics for the two-year period leading up
to the September 2008 crash. Figure 11 shows the VaR
spreads for the period September 1, 2006-September 1,
2008, relative to the Gaussian VaR forecast. We notice
that the stable-Gaussian relative spread starts
increasing in late 2006. This is the result of the
increased tail-fatness estimated by the stable model
(α decreases). At the same time, the Student's
t-Gaussian relative spread is fairly constant due to
the fact that the DOF (and, therefore, the
tail-fatness) is fixed. 34 Over the two-year period, we
can see a pronounced increase in the stable-Gaussian
VaR relative spread. There are two time segments (in
spring 2007 and spring 2008) in which the spread
actually decreases. Both are associated with periods
following major negative news and market tremors. 35 In
these periods, the Gaussian model's VaR "catches up"
post factum due to the increase in the estimates of the
conditional GARCH volatility.

In general, one can interpret the upward trend of the
stable-Gaussian VaR relative spread as an indicator of
markets accumulating higher probability of extreme
events before the actual market volatility goes up.
This predictive behavior is only possible due to the
time-varying estimates of the tail-fatness (the α
parameter in the stable model). Thus, in the fixed-DOF
Student's t model, such a predictive trend cannot be
observed. During the two-year period, the number of
exceedances is eight for the stable model, ten for the
Student's t model, and 16 for the Gaussian model, while
the 95% confidence interval is [0, 9].

Finally, to test the significance of the
stable-Gaussian VaR relative spread, we build a
confidence interval for it. We do that by altering the
tail index α at each point in time during the
backtesting period with plus and minus one standard
deviation of α and then recomputing the stable VaR and
the associated stable-Gaussian relative spread. The
standard deviation of α is estimated using parametric
bootstrap, based on 200 bootstrap samples of 500 random
draws each generated from an α-stable model with the
corresponding α. 36 Figure 12 shows the confidence
bounds of the stable-Gaussian VaR relative spread for
the two-year period running up to September 2008,
together with the Student's t-Gaussian VaR relative
spread. Even the lower bound of the confidence interval
of the stable-Gaussian VaR relative spread is more
indicative than the Student's t-Gaussian relative
spread over this time period. Although the upward trend
of the lower confidence bound is not as strong as that
of the upper confidence bound, the results support the
conclusion that the stable model's VaR forecasts have
the ability to anticipate a switch from a calm to a
volatile market regime.

Figure 11 Relative Spreads between 99% VaR Predictions for Student's t-Gaussian and Stable-Gaussian
Model Pairs: September 1, 2006-September 1, 2008

Figure 12 Relative Spreads between 99% VaR Predictions for Student's t-Gaussian and Stable-Gaussian
Model Pairs with Confidence Bounds: September 1, 2006-September 1, 2008

KEY POINTS

• The Gaussian distribution is not adequate to describe
the empirical features of asset returns. The
standardized residuals from a conditional Gaussian
model exhibit heavy-tailedness and asymmetry.

• The Student's t distribution has fatter tails than
the normal distribution. To account for skewness,
however, the "classical" Student's t distribution needs
to be modified.

• The skewed Student's t distribution can be
represented as a mean-scale mixture of normal
distributions; that is, a normal distribution with
random mean and variance.

• The tails of the stable Paretian distributions decay
more slowly than the tails of the normal distribution
and therefore better describe the extreme events
present in the data.

• In the non-Gaussian case, a large sum of
appropriately standardized IID random variables is
distributed with the stable Paretian distribution in
the limit.

• To address the issue of infinite variance, the stable
Paretian distribution may be modified by tempering of
the distribution's tail. This gives rise to the
tempered stable distributions.

• There are two main categories of distributions for
extreme values: block maxima models and threshold
exceedances models. The latter category is more often
employed in risk modeling, since it is less "wasteful"
of historical data than the former category.

• Selection of the threshold from where the tail of the
data distribution starts is based on a subjective
judgment and, together with data scarcity, is the main
bottleneck in EVT applications.

• In all cases, before applying a fat-tailed model, an
ARMA-GARCH filter should be used to remove the temporal
dependence in asset returns.

• A realistic distributional assumption for a model
should allow for tail-fatness that changes over time
and from asset to asset. Such models can serve as early
warning indicators when moving to a new market regime
(from calm to turbulent and vice versa) and can
identify tail-risk contributors and tail-risk
diversifiers.

NOTES 

1. The Gaussian distribution's analytical
tractability in the multivariate setting is
an additional important factor behind its
widespread use. See, for example, Kotz,
Johnson, and Balakrishnan (2000) for details
on the multivariate normal distribution.

2. However, later in this entry we provide an 
important clarification regarding a statisti¬ 
cal result, called the central limit theorem. 

3. In general, the scale parameter does not al¬ 
ways coincide with the standard deviation 
(volatility), as we will see, for instance, in 
our discussion of the scaled Student's t dis¬ 
tribution later in the entry. 

4. See, for example, Rachev, Stoyanov, 
Biglova, and Fabozzi (2005). 

5. The conditional distribution of returns ac¬ 
cording to the model above is Gaussian. The 
unconditional distribution, however, is not 
normal but a mixture of normal distribu¬ 
tions (due to the time-varying mean and 
variance). Its tails are fatter than those of the 
normal distribution but not fat enough to 
account for the empirically observed heavy 
tails. 

6. Note that this is not the same as "standard¬ 
ized," since the standard deviation of X is 
not one. 

7. Notice that the Student's t distribution de¬ 
fined in equation (8) has a location of zero 
and a scale of one. 

8. More precisely, we estimate a GARCH
model on a 500-day rolling window of returns
and then fit a t-distribution to the
(standardized) GARCH residuals.

9. Skewed Student's t models have been proposed
by Fernandez and Steel (1998), Azzalini
and Capitanio (2003), and Rachev and
Rüschendorf (1994), among others.

10. The skewed Student's t distribution belongs
to a more general class of distributions
called generalized hyperbolic distributions,
introduced by Barndorff-Nielsen (1978). It
contains the Student's t and normal
distributions as limiting cases.

11. See Rachev and Mittnik (2000) and 
Samorodnitsky and Taqqu (1994). A de¬ 
tailed description of the stable methodology 
is available in Rachev, Martin, Racheva- 
Yotova, and Stoyanov (2009). 


12. For details on the statistical properties of 
subordinated processes, see Feller (1966) 
and Clark (1973). Rachev and Mittnik (2000) 
provide discussions of subordinated pro¬ 
cesses in financial applications. 

13. More generally, the family of infinitely
divisible distributions to which the gamma,
Poisson, inverse-Gaussian, and all stable
Paretian distributions belong is a natural
choice of distributions for the increments
of the intrinsic time process T(t).

14. In a multivariate setting, Z would be
distributed with a multivariate normal
distribution and Y can be a vector whose
components are stable subordinators with
different tail-fatness. The resulting distribution
is a generalization of the multivariate
sub-Gaussian stable distribution.

15. McCulloch (1986)'s estimation procedure
generalized the quantile method for symmetric
α-stable distributions of Fama and
Roll (1971).

16. See Press (1972). Kogon and Williams 
(1998) and Koutrouvelis (1980) suggested 
regression-type estimator algorithms, also 
based on the characteristic function. 

17. Comparison among the three types of esti¬ 
mation categories is provided in Stoyanov 
and Racheva-Yotova (2004). 

18. A detailed description of the stable fitting 
methodology is available in Rachev and 
Mittnik (2000). 

19. Rachev and Mittnik (2000) show that
selecting $h = 0.01$ and $N = 2^{13}$ reduces
the approximation error in computing
the α-stable pdf to the satisfactory level
of $10^{-6}$.

20. The algorithm for simulations from the 
CTS distribution is rather involved and de¬ 
scribed in detail in Rachev, Kim, Bianchi, 
and Fabozzi (2011). 

21. The role of EVT in modeling maxima of 
random variables is similar to the one the 
central limit theorem plays in modeling the 
sums of random variables. Both character¬ 
ize the limiting distributions. 
22. The tails of the Fréchet distribution decay
like a power function at a rate $\alpha = 1/\xi$,
the tail index parameter characterizing the
α-stable distribution.

23. Comprehensive discussion of EVT is available
in McNeil, Frey, and Embrechts (2005).

24. For more details on GPD, see Embrechts,
Klüppelberg, and Mikosch (1997).

25. Notice that the threshold level u is in fact the
location parameter in the GPD distribution.

26. The thresholds $u_L$ and $u_U$ are usually
defined in terms of symmetric empirical quantiles,
for example, the 5% and 95% quantiles.
In that case, the number of observed data
points in each tail is equal to K.

27. To be precise, when fitting the left tail, (35) is
maximized over the absolute values of $Y_{k,L}$,
$k = 1, \ldots, K$.

28. See, for example, Silverman (1986).

29. m(u) is also known as the mean residual life
function in survival analysis and characterizes
the expected residual lifetime of a component
that has functioned for u units of time
already.

30. The ith order statistic is the ith largest
observation in a data sample. The first order
statistic is the maximum of the sample,
while the nth order statistic is the minimum
of a sample of size n.

31. See Rachev, Racheva-Yotova, and Stoyanov 
(2010) for a detailed discussion of these 
challenges. 

32. Value-at-risk is defined as the minimum 
loss at a given confidence level for a pre¬ 
defined time horizon. 

33. Goldberg, Miller, and Weinstein (2008) use 
time windows ranging from approximately 
1,500 to 7,600 days. 

34. The small variations are due to the vari¬ 
ability in the estimates of the Student's t 
GARCH model. 

35. In February 2007, Freddie Mac announced
that it would no longer buy the most risky
subprime mortgages and mortgage-related
securities and in April 2007, New Century
Financial Corporation, a leading subprime
mortgage lender, filed for Chapter 11
bankruptcy protection. Then, in the spring
of 2008, we observed the collapse of Bear
Stearns and the associated events.

36. The bootstrap sample size is 500, since this 
is the length of the time window used to 
calibrate the model. 
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for assessing operational risk, 
777:92-93 

in asset pricing, 77:474 
defined, 7:394 

and discount factor model, 7:65-66 
and investor risk, 7:73-74 
using assumptions under, 7:68-69 

Caps 

defined, 7:248-251 
value of, 7:248,777:552-553 
valuing of, with floors, 7:249-250, 
7:256 

Carry, 7:423^26,7:481 

Carry costs, 7:424-425, 7:426, 7:435, 

7:437-438, 7:455n, 7:481. See also 
net cost of carry 

CART (classification and regression 
trees) 

defined, 77:375 

example, input variables for, 77:3797 
example, out-of-sample 
performance, 77:3817 
fundamentals of, 77:376-377 
in stock selection, 77:378-381 
strengths and weaknesses of, 
77:377-378 
uses of, 77:381 

Cash-and-carry trade, 7:480, 7:481, 

7:487 

Cash concept, 77:567 

Cash flows 
accounting for, 777:306 
analysis of, 77:574-577,777:4-5 
for bond class, III:9t 
of bonds, 7:211 

cash flow at risk (CFaR), 777:376-378 
classification of, 77:567 
defined, 7:209-210, 77:539, 777:4 
direct vs. indirect reporting method, 
77:567 


discounted, 7:225 
discrete, 7:429 

distribution analysis vs. benchmark, 
777:310 

estimation of, 7:209-210,77:21-23 
expected, 7:211 
factors in, 777:31-32, 777:377 
form residential mortgage loans, 
777:62 

futures vs. forwards, 7:431f 
future value of, 77:603/ 
influences on, 777:44 
interest coverage ratio of, 77:561, 
77:575 

interim, 7:482 
for loan pool, 777:9f 
measurement of, 77:565-566, 777:14 
monthly, 777:52-54, 777:537 
net free (NFCF), 77:572-574, 77:578 
in OAS analysis, 7:259 
perpetual stream of, 77:607-608 
sources of, 77:540-541, 77:5697 
in state dependent models, 
7:351-352 

statement of, 77:539-541, 77:566-567 
time patterns of, 77:607-611 
and time value of money, 77:595-596 
time value of series of, 77:602-607 
for total return receivers, 7:542 
for Treasuries, 7:219, 777:564-565 
types of in assessing liquidity risk, 
777:378 

use of information on, 77:576-577 
valuation of, 77:618-619 
us. free cash flow, 77:22-23 
Cash flow statements 
example of, 77:541 
form of, 77:26f 

information from, 77:577-578 
reformatting of, 77:569f 
restructuring of, 77:568 
sample, 77:5477 
use of, 77:24-26 

Cash flow-to-debt ratio, 77:576 
Cash-out refinancing, 777:66,777:69 
Cash payments, 7:486-487, 777:377 
Categorizations, determining 
usefulness of, 77:335 
Cauchy, Augustin, 77:655 
Cauchy initial value problem, 77:655, 
77:656, 77:656/, 77:657 
CAViaR (conditional autoregressive 
value at risk), 77:366 
CDOs (collateralized debt 

obligations), 7:299,7:525,777:553, 
777:645 

CDRs (conditional default rates) 
in cash flow calculators, 777:34 
defaults measured by, 777:58-59 


defined, 777:30-31 
monthly, 777:627 
projections for, 777:35/ 
in transition matrices, 777:35/ 

CDSs (credit default swaps) 
basis, 7:232 
bids on, 7:527 
cash basis, 7:402 
discussion of, 7:230-232 
fixed premiums of, 7:530-531 
hedging with, 7:418 
illustration of, 7:527 
initial value of, 7:538 
maturity dates, 7:526 
payoff and payment structure of, 
7:534/ 

premium payments, 7:231/, 

7:533-535 

pricing models for, 7:538-539 
pricing of by static replication, 
7:530-532 

pricing of single-name, 7:532-538 
quotations for, 7:413 
risk and sensitivities of, 7:536-537 
spread of, 7:526 
unwinding of, 7:538 
use of, 7:403, 7:413, 77:284 
valuation of, 7:535-536 
volume of market, 7:414 
Central limit theorem 
defined, 7:149n, 777:209-210, 777:640 
and the law of large numbers, 
777:263-264 

and random number generation, 
777:646 

and random variables, 77:732-733 
Central tendencies, 77:353, 77:354, 77:355 
Certainty equivalents, 77:723-724, 
77:724-725 

CEV (constant elasticity of variance), 
777:550, 777:551/, 777:654-655 
Chambers-Mallows-Stuck generator, 
77:743-744 

Change of measures, 777:509-517, 
777:5167 

Change of time methods (CTM) 
applications of, 777:522-527 
discussion of, 777:519-522 
general theory of, 777:520-521 
main idea of, 777:519-520,777:527 
in martingale settings, 777:522-523 
in stochastic differential equation 
setting, 777:523 
Chaos, defined, 77:653 
Chaos: Making a New Science (Gleick), 
77:714 

Characteristic function 

us. probability density function, 
77:743 
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Characteristic lines, II: 316,11:3181, 
11:344-348,11:345-3471 
Chebychev inequalities. III: 210,111:225 
Chen model, 1:493 
Chi-square distributions, 1:388-389, 

111: 212-213 
Cholesky factor, 1:380 
Chow test, 11:336,11:343,11:344,11:350 
CID (conditionally independent 
defaults) models, 1:320, 
1:321-322,1:333 

CIR model, 1:498,1:500-501,1:502 
Citigroup, 1:302,1:408/, 1:409 f 
CLA (critical line algorithm), 1:73 
Classes 

criteria for, 11:494 
Classical tempered stable (CTS) 

distribution, 11:741-742,11:741/ 
11:742/ 11:743-744,111:512 
Classification, and Bayes' Theorem, 
1:145 

Classification and regression trees 
(CART). See CART 
(classification and regression 
trees) 

Classing, procedure for, 11:494^98 
Clearinghouses, 1:478 
CME Group, 1:489^90 
CMOs (collateralized mortgage 
obligations). III: 598, III :645 
Coconut markets, 1:70 
Coefficients 

binomial, 111:171,111:187-191 
of determination, 11:315 
estimated, 11:336-337 
Coherent risk measures, 111:327-329 
and VaR, 111:329 

Coins, fair/unfair, 111:169, 111:326-327 
Cointegrated models, 11:503 
Cointegration 
analysis of, 11:3811 
defined, 11:383 

empirical illustration of, 11:388-393 
technique of, 11:384—385 
testing for, 11:386-387 
test of, 11:3941,11:3961 
use of, 11:397 

Collateralized debt obligations 

(CDOs), 1:299,1:525,111:553, 
111:645 

Collateralized mortgage obligations 
(CMOs), 111:598,111:645 
Collinearity, 11:329-330 
Commodities, 1:279,1:556,1:566 
Companies. See firms 
Comparison principals, 11:676 
Comparisons ys. testing, 1.T56 
Complete markets, 1:103-104,1:119, 
1:133,1:461 


Complexity, profiting from, 11:57-58 
Complexity (Waldrop), 11:699 
Complex numbers, 11:591-592,11:592/ 
Compounding. See also interest 
and annual percentage rates, 11:616 
continuous, 11:599,11:617 
determining number of periods, 
11:602 

discrete us. continuous, ill:570-571 
formula for growth rate, 11:8 
more than once per year, II: 598-599 
and present value, 11:618 
Comprehensive Capital Analysis and 
Review, 1:300 

Comprehensive Capital Assessment 
Review, 1:412 

Computational burden. III: 643-644 
Computers. See also various software 
applications 

increased use of. III: 137-138 
introduction of into finance, 11:480 
modeling with, 1:511, II: 695 
random walk generation of, 11:7 08 
in stochastic programing, ill: 124, 

111: 125-126 

Concordance, defined, 1:327 
Conditional autoregressive value at 
risk (CAViaR), 11:366 
Conditional default rate (CDR). See 
CDRs (conditional default 
rates) 

Conditionally independent defaults 
(CID) models, 1:320,1:321-322, 
1:323 

Conditioning/conditions, 1:24, 
11:307-308,11:361,11:645 
Confidence, 1:200,1:201,11:723,111:319 
Confidence intervals, 11:440,111:3381, 
III:399-400, 111:400/ 
Conglomerate discounts, 11:43 
Conseco, debt restructure of, 1:529 
Consistency, notion of, 11:666-667 
Constant elasticity of variance (CEV), 
111:550,111:551/ 111:654-655 
Constant growth dividend discount 
model, 11:7-9 
Constraints, portfolio 
cardinality, 11:64-65 
common, 111:146 
commonly used, 11:62-66,11:84 
holding, 11:62-63 

minimum holding/transaction size, 
11:65 

nonnegativity, 1:73 
real world, 11:224-225 
round lot, 11:65-66 
setting, 1:192 
turnover, 11:63 
on weights of, I.T91-192 


Constraint sets, 1:21,1:28,1:29 
Consumer Price Index (CPI), 

1:277-278,1:291/ 1:292,1:292/ 
Consumption, 1:59-60,11:360,111:570 
Contagion, 1:320,1:324,1:333 
Contingent claims 
financial instruments as, 1:462 
incomplete markets for, 1:461^62 
unit, 1:458 

use of analysis, 1:463 
utility maximization in markets, 
1:459^61 

value of, 1:458^159 
Continuity, formal treatment of, 

11:583-584 

Continuous distribution function 

(c.d.f.), 111:167,111:196,111:205, 
111:345-346,111:345/ 

Continuous distribution function F(a), 
111:196 

Continuous time/continuous state, 
111:578 

Continuous-time processes, change of 
measure for, 111:511-512 
Control flow statements in VBA, 
111:458-460 

Control methods, stochastic, 1:560 
Convenience yields, 1:424,1:439 
Convergence analysis, II: 667-668 
Conversion, 1:274,1:445 
Convexity 

in callable bonds. III: 302-303 
defined, 1:258-259,111:309 
effective. 111: 13, III: 300-304, Ill:617f 
measurement of, 111:13-14, 

Ill:304-305 

negative, 1II.T4,111:49,111:303 
positive, 111:13 
use of. III: 299-300 
Convex programming, 1:29,1:31-32 
Cootner, Paul, 111:242 
Copulas 

advantages of. Ill:284 
defined, 111:283 
mathematics of. III:284-286 
usefulness of. III:287 
visualization of bivariate 
independence, 111:285/ 
visualization of Gaussian, 111:287/ 
Corner solutions, 1:200 
Correlation coefficients 
relation to R 2 ,11:316 
and Theil-Sen regression, 11:444 
use of, 111:286-287 
Correlation matrices, 11:1601, /1:1631, 
Ill:396-397 
Correlations 

in binomial distribution, 1:118 
computation of, 1:92-93 


Index 


763 


concept of. III :283 
drawbacks of. III: 283-284 
between periodic increments, 
777:540f 

and portfolio risk, 7:11 
robust estimates of, 77:443^46 
serial, 77:220 
undesirable, 7:293 
use of, 77:271 

Costs, net financing, 7:481 
Cotton prices, model of, 777:383 
Countable additivity, 777:158 
Counterparts, robust, 77:81 
Countries, low- vs. high inflation, 

7:290 

Coupon payments, 7:212,777:4 
Coupon rates, computing of, 
777:548-549 

Courant-Friedrichs-Lewy (CFL) 
conditions, 77:657 
Covariance 

calculation of between assets, 7:8-9 
estimators for, 7:38^0, 7:194-195 
matrix, 7:38-39, 7:155, 7:190 
relationship with correlation, 7:9 
reliability of sample estimates, 77:77 
use of, 77:370-371 
Covariance matrices 
decisions for interest rates, 777:406 
eigenvectors / eigenvalues, 77:1607 
equally weighted moving average, 
777:402-403 

frequency of observations for, 

777:404 

graphic of, 77:1617 
residuals of return process of, 

77:1627 

of RiskMetrics™ Group, 777:412-413 
statistical methodology for, 
777:398-399 

of ten stock returns, 77:1597 
use of, 77:158-159, 77:169 
using EWMA in, 777:411 
Coverage ratios, 77:560-561 
Cox-Ingersoll-Ross (CIR) model, 7:260, 
7:491-492, 7:547, 7:548, 
777:546-547, 777:656 
Cox processes, 7:315-316, 77:470-471 
Cox-Ross-Rubenstein model, 7:510, 
7:522, 77:678 

CPI (Consumer Price Index), 

7:277-278, 7:291/, 7:292, 7:292/ 
CPRs (conditional prepayment rates). 

See prepayment, conditional 
CPR vector, 777:74. See also 

prepayment, conditional 
Cramer, Harald, 77:470-471 
Crank-Nicolson schemes, 77:666, 

77:669, 77:674, 77:680 


Crank Nicolson-splitting (CN-S) 
schemes, 77:675 

Crashmetrics, use of, 777:379, 777:380 
Credible intervals, 7:156 
Credit-adjusted spread trees, 7:274 
Credit crises 
of 2007,777:74 
of 2008,777:381 

data from and DTS model, 7:396 
in Japan, 7:417 
Credit curing, 777:73 
Credit default swaps (CDSs). See CDSs
(credit default swaps)

Credit events 
and credit loss, 7:379 
in default swaps, 7:526,7:528-530 
definitions of, 7:528 
descriptions of most used, 7:528f 
exchanges/payments in, 7:231/ 
in MBS turnover, 777:66 
prepayments from, 777:49-50 
protection against, 7:230 
and simultaneous defaults, 7:323 
Credit hedging, 7:405 
Credit inputs, interaction of, 777:36-38 
Credit loss 

computation of, 7:382-383 
distribution of, 7:369/ 
example of distribution of, 7:386/ 
simulated, 7:389 

steps for simulation of, 7:379-380 
Credit models, 7:300, 7:302,7:303 
Credit performance, evolution of, 
777:32-36 
Credit ratings 
categories of, 7:362 
consumer, 7:302 
disadvantages of, 7:300-301 
implied, 7:381-382 
maturity of, 7:301 
reasons for, 7:300 
risks for, 77:280-281,77:2807 
use of, 7:309 
Credit risk 
common, 7:322 
counterparty, 7:413 
in credit default swaps, 7:535 
defined, 7:361 
distribution of, 7:377 
importance of, 777:81 
measures for, 7:386/ 
modeling, 7:299-300, 7:322, 777:183 
quantification of, 7:369-372 
reports on, 77:278-281 
shipping, 7:566 

and spread duration, 7:391-392 
vs. cash flow risk, 777:377-378 
Credit scores, 7:300-302, 7:301-302, 
7:309, 7:310n 


Credit spreads 

alternative models of, I:405-406
analysis with stock prices, I:305f
applications of, 7:404-405 
decomposition, 7:401-402 
drivers of, 7:402 
interpretation of, 7:403-404 
model specification, 7:403 
relationship with stock prices, 7:304 
risk in, 77:279f 
use of, 7:222-223 

Credit support, evaluation of, III:39-40
Credit value at risk (CVaR). See CVaR 
Crisis situations, estimating liquidity 
in, 777:378-380 

Critical line algorithm (CLA), 7:73 
Cross-trading, 77:85n 
Cross-validation, leave-one-out, 
77:413-414 

Crude oil, 7:561, 7:562 
Cumulation, defined, 777:471 
Cumulative default rate (CDX), 777:58 
Cumulative frequency distributions, 
77:493/ 77:4937, 77:498-499 
formal presentation of, 77:492-493 
Currency put options, 7:515 
Current ratio, 77:554 
Curve imbalances, 77:270-271 
Curve options, 777:553 
Curve risk, 77:275-278 
CUSIPs/ticker symbols, changes in, 
77:202-203 

CVaR (credit value at risk), 7:384-385, 
7:385-386, 77:68, 77:85n, 777:3927. 
See also value at risk (VaR) 

Daily increments of volatility, 777:534 
Daily log returns, 77:407-408 
Dark pools, 77:450, 77:454 
Data. See also operational loss data 
absolute, 77:487-488 
acquisition and processing of, 

77:198 

alignment of, 77:202-203 
amount of, 7:196 
augmentation of, 7:186n 
availability of, 77:202, 77:486 
backfilling of, 77:202 
bias of, 77:204, 77:713 
bid-ask aggregation techniques for, 
77:457/ 

classification of, 77:499-500 
collection of, 77:102, 77:103/ 
cross-sectional, 77:201, 77:488, 77:488/ 
in forecasting models, 77:230 
frequency of, 77:113, 77:368, 
77:462-463, 77:500 
fundamental, 77:246-247 
generation of, 77:295-296 
high-frequency (HFD) (See high-frequency data (HFD))
historical, II:77-78, II:122, II:172
housing bubble, II:397-399
importing into MATLAB, III:433-434
industry-specific, II:105
integrity of, II:201-203
levels and scale of, II:486-487
long-term, 777:389-390 
in mean-variance, 7:193-194 
misuse of, 77:108 
on operational loss, 777:99 
from OTC business, 77:486 
patterns in, 77:707-708 
pooling of, 777:96 
of precision, 7:158 
preliminary analysis of, 777:362 
problems in for operational risk, 
777:97-98 

qualitative vs. quantitative, 

77:486 

quality of, 77:204, 77:211, 77:452-453, 
77:486,77:695 

reasons for classification of, 
77:493-494 

for relative valuation, II:34-35
restatements of, II:202
sampling of, II:459f, II:711
scarcity of, II:699-700, II:703-704, II:718
sorting and counting of, II:488-491
standardization of, 77:204,7/7:228 
structure/sample size of, 77:703 
types of, 77:486—488 
underlying signals, 77:111 
univariate, defined, 77:485 
working with, 77:201-206 
Databases 

Compustat Point-In-Time, 77:238 
Factiva, 77:482 

Institutional Brokers Estimate 
System (IBES), 77:238 
structured, 77:482 
third-party, 77:198,77:211n 
Data classes, criteria for, 77:500 
Data generating processes (DGPs), 

77:295-296, 77:298/, 77:502, 77:702, 
777:278 

Data periods, length of, 7/7:404 
Data series, effect of large number of, 
77:708-709 

Data sets, training/test, 77:710-711 
Data snooping, 77:700,77:710-712, 
77:714, 77:717,77:718 
Datini, Francesco, II:479-480
Davis-Lo infectious defaults model, 
7:324 


Days payables outstanding (DPO), 
calculation of, 77:553-554 
Days sales outstanding (DSO), 
calculation of, 77:553 
DCF (discounted cash flow) models, 
77:16, 77:44-45 

DDM (dividend discount models). See 
dividend discount models 
(DDM) 

Debt 

long-term, in financial statements, 
77:542 

models of risky, 7:304-307 
restructuring of, 7:230 
risky, 7:307-308 
Debt-to-assets ratio, 77:559 
Debt-to-equity ratio, 77:559 
Decomposition models 
active/passive, 777:19 
Default correlation, 7:317-318 
contagion, 7:353-354 
cyclical, 7:352,7:353 
linear, 7:320-321 
measures of, 7:320-321 
tools for modeling, 7:319-333 
Default intensity, 777:225 
Default models, 7:321-322, 7:370/ 
Default probabilities 
adjustments in real time, I:300-301
between companies, I:412-413
cyclical rise and fall, I:408f, I:409f
defined, I:299-300
effect of business cycle on, I:408
effect of rating outlooks on,
I:365-366
empirical approach to, I:362-363
five-year (Bank of America and
Citigroup), I:301f, I:302f
merits of approaches to, I:365
Merton's approach to, I:363-365
probability of, II:727, II:727f, II:728f
and survival, I:533-535
and survival probability, I:323-324
term structure of, I:303
time span of, I:302-303
vs. ratings and credit scores,
I:300-302
for Washington Mutual, I:415f, I:416f
of Washington Mutual, I:415f, I:416f

Defaults 

annual rates of, 7:363 
and Bernoulli distributions, 
777:169-170 

calculation of monthly, III:61f
clustering of, I:324-325
contagion, 7:320 
copulas for times, 7:329-331 


correlation of between companies, 
7:411 

cost of, 7:401,7:404/ 
dollar amounts of, 777:59/ 
effect of, 7:228, 777:645 
event vs. liquidation, 7:349 
factors influencing, III:74-75
first passage model of, 7:349 
historical database of, 7:414 
intensity of, 7:330, 7:414 
looping, 7:324-325 
measures of, 777:58-59 
in Merton approach, 7:306 
Moody's definition of, 7:363 
predictability of, 7:346-347 
and prepayments, III:49-50,
III:76-77

process, relationship to recovery 
rate, 7:372 

pseudo intensities, 7:330 
rates of cumulative/conditional, 
777:63 

recovery after, 7:316-317 
risk of, 7:210 

simulation of times, 7:322-324,7:325 
threshold of, 7:345-346 
times simulation of, 7:319 
triggers for, 7:347-348 
variables in, 7:307-308 
Default swaps 

assumptions about, 7:531-532 
and credit events, 7:530 
digital, 7:537 
discussion of, 7:526-528 
market relationship with cash 
market, 7:530 

and restructuring, 7:528-529 
value of spread, 7:534 
Default times, 7:332 
Definite covariance matrix, 77:445 
Deflators, 7:129, 7:136 
Degrees, in ordinary differential 
equations, 77:644-645 
Degrees of freedom (DOF) 
across assets and time, 77:735-736 
in chi-square distribution, 777:212 
defined, 77:734 

for Dow Jones Industrial Average 
(DJIA), 77:735-737, 77:737/ 
prior distribution for, 7:177 
range of, 7:187n 

for S&P 500 index stock returns, 
77:735-736,77:736/ 

Delinquency measures, III:57-58
Delivery date, 7:478 
Delta, 7:509, 7:516-518, 7:521 
Delta-gamma approximation, I:519,
III:644-645

Delta hedging, 7:413,7:416,7:418, 7:517 
Delta profile, I:518f

Densities
beta, III:108f
Burr, III:110f
closed-form solutions for, III:243
exponential, III:105-106, III:105f
gamma, III:108f
Pareto, III:109f
posterior, I:170f
two-point lognormal, III:111f
Density curves, I:147f
Density functions 
asymmetric, III:205f
of beta distribution, III:222f
chi-square distributions, III:213f
common means, different variances,
III:203f
computing probabilities from, III:201
discussion of, III:197-200
of F-distribution, III:217f
histogram of, III:198f
of log-normal distribution, III:223f
and normal distribution, II:733
and probability, III:206
rectangular distributions, III:220
requirements of, III:198-200
symmetric, III:204f
of t-distribution, III:214f
Dependence, I:326-327, II:305-308
Depreciation, 11:22 

accumulated, II:533-534
expense vs. book value, II:539f
expense vs. carrying value, II:540f
in financial statements, II:537-539
on income statements, II:536
methods of allocation, II:537-538
Derivatives 

construction of, 11:586-587 
described, 11:585-586 
embedded, 1:462 
energy, 1:558 
exotic, 1:558,1:559-560 
of functions, defined, 11:593 
and incomplete markets, 1:462 
interest rate, III:589-590
nonlinearity of, III:644-645
OTC, 1:538 

pricing of, 1:58, 111:594-596 
pricing of financial, 111:642-643 
relationship with integrals, 11:590 
for shipping assets, 1:555,1:558, 
1:565-566 

use of instruments, 1:477 
valuation and hedging of, 1:558-560 
vanilla, 1:559 
Derman, Emanuel, 11:694 
Descriptors, 11:140,11:246-247,11:256 
Determinants, 11:623 


Deterministic methods 
usefulness of, 11:685 
Diagonal VEC model (DVEC), 11:372 
Dice, and probability, III:152, III:153,
III:155-156, III:156f
Dickey-Fuller statistic, 11:386-387 
Dickey-Fuller tests, 11:514 
Difference, notation of, 1:80 
Differential equations 
classification of, 11:657-658 
defined, 1:95,11:644,11:657 
first-order system of, 11:646 
general solutions to, 11:645 
linear, 11:647-648 
linear ordinary, 11:644—645 
partial (PDE), 11:643,11:654-657 
stochastic, 11:643-644 
systems of ordinary, 11:645-646 
usefulness of, 11:658 
Diffusion, 111:539, 111:554-555 
Diffusion invariance principle, 1:132 
Dimensionality, curse of, 11:673,111:127 
Dirac measures, 111:271 
Directional measures, 11:428,11:429 
Dirichlet boundary conditions, 11:666 
Dirichlet distribution, I:181-183,
I:186-187n

Discounted cash flow (DCF) models, 
11:16,11:44-45 

Discount factors, 1:57-58,1:59-62,1:60, 
11:600-601 
Discount function 
calculation of, III:571
defined, III:563
discussion of, III:563-565
forward rates from, III:566-567
graph of, III:563f
for on-the-run Treasuries, III:564-565

Discounting, defined, 11:596 
Discount rates, 1:211,1:212,1:215-216, 
11:6 

Discovery heuristics, 11:711 
Discrepancies, importance of small, 
11:696 

Discrete law, III:165-169
Discrete maximum principle, II:668
Discretization, I:265, II:669f, II:672
Disentangling, 11:51-56 
complexities of, 11:55-56 
predictive power of, 11:54-55 
return revelation of, 11:52-54 
usefulness of, 11:52,11:58 
Dispersion measures, 111:352, 

111:353-354,111:357 
Dispersion parameters. 111:202-205 
Distress events, 1:351 
Distributional measures, 11:428 
Distribution analysis, cash flow, 111:310 


Distribution function, III:218f, III:224f
Distributions 

application of hypergeometric, 
111:177-178 

beliefs about, 1:152-153 
Bernoulli, III:169-170, III:185t
beta, I:148, III:108
binomial, I:81f, III:170-174, III:185t,
III:363
Burr, III:109-110
categories for extreme values, II:752
common loss, III:112t
commonly used, III:225
conditional, III:219
conditional posterior, I:178-179,
I:182-183, I:184-185
conjugate prior, I:154
continuous probability, III:195-196
discrete, III:185t
discrete cumulative, III:166
discrete uniform, III:183-184,
III:185t, III:638f

empirical, 11:498,111:104-105,111:105/ 
exponential, 111:105-106 
finite-dimensional, 11:502 
of Frechet, Gumbel and Weibull, 
111:267/ 

gamma, 111:107-108, 111:221-222 
Gaussian, 111:210-212 
Gumbel, 111:228,111:230 
heavy-tailed, I:186n, II:733, III:109,
III:260

hypergeometric, III:174-178, III:185t
indicating location of, III:235
infinitely divisible, III:253-256,
III:253t

informative prior, 1:152-153 
inverted Wishart, 1:172 
light- vs. heavy-tailed, III:111-112
lognormal, III:106, III:106f,
III:538-539

mixture loss, III:110-111
for modeling applications, III:257
multinomial, III:179-182, III:185t
non-Gaussian, III:254
noninformative prior, I:153-154
normal (See normal distributions)
parametric, III:201
Poisson, I:142, III:182-183, III:185t,
III:217-218
Poisson probability, III:187t
posterior, I:147-148, I:165, I:166-167,
I:169-170, I:177, I:183-184
power-law, III:262-263
predictive, I:167
prior, I:177, I:181-182, I:196
proposal, I:183-184
representation of stable and CTS,
II:742-743
spherical, II:310

stable, 777:238, 777:242, 777:264-265, 
777:384 (See also a-stable 
distributions) 

subexponential, 777:261-262 
tails of, 777:112/, 777:648 
tempered stable, 777:257, 777:382 
testing applied to truncated, 777:367 
Diversification, 77:57-58 
achieving, 7:10 
and cap weighting, 7:38 
and credit default swaps, 7:413-414 
example of, 7:15 
international, 77:393-396 
Markowitz's work on, 77:471 
Diversification effect, 777:321 
Diversification indicators, 7:192 
Dividend discount models (DDM) 
applied to electric utilities, 77:127 
applied to stocks, 77:16-17 
basic, 77:5 

constant growth, 77:7-9, 77:17-18 
defined, 77:14 
finite life general, 77:5-7 
free cash flow model, 77:21-23 
intuition behind, 77:18-19 
multiphase, 77:9-10 
non-constant growth, 77:18 
predictive power of, 77:54 
in the real world, 77:19-20 
stochastic, 77:10-12, 77:127 
Dividend payout ratio, 77:4, 77:20 
Dividends 

expected growth in, 77:19 
forecasting of, 77:6 
measurement of, 77:3-4, 77:14 
per share, 77:3—4 
reasons for not paying, 77:27 
required rate of return, 77:19 
and stock prices, 77:4-5 
Dividend yield, 77:4, 77:19 
Documentation 

of model risk, 77:696, 77:697 
Dothan model, 7:491, 7:493 
Dow Jones Global Titans 500 (DJGTI), 
77:4907, 77:4917 

Dow Jones Industrial Average (DJIA) 
in comparison of risk models, 
77:747-751 

components of, 77:4897 
fitted stable tail index for, 77:740/ 
frequency distribution in, 77:4897 
performance (January 2004 to June 
2011), 77:749/ 

relative frequencies, 77:4917 
stocks by share price, 77:4927 
Drawing without replacement, 
777:174-177 


Drawing with replacement, 777:170, 
777:174, 777:179-180 

Drift 

effects of, 777:537 
of interest rates, 7:263 
in randomness calculations, 777:535 
in random walks, 7:84,7:86 
time increments of, 7:83 
of time series, 7:80 
as variable, 777:536 
DTS (duration times spread), 7:392, 
7:393-394, 7:396-398 
Duffie-Singleton model, 7:542-543 
Dupire's formula, 77:682-683,77:685 
DuPont system, 77:548-551, 77:551/ 
Duration 

calculations of real yield and 
inflation, 7:286 
computing of, 7:285 
defined, 7:284, 777:309 
effective, 777:300-304, 777:6177 
effective/option adjusted, 777:13 
empirical, of common stock, 
77:318-322,77:319-3227 
estimation of, 77:3237 
measurement of, 777:12-13, 
777:304-305 
models of, 77:461 
modified vs. effective, III:299
Duration/convexity, effective, 7:255, 
7:256/ 

Duration times spread (DTS). See DTS
(duration times spread)
Durbin-Watson test, 777:647 
Dynamical systems 

equilibrium solution of, 77:653 
study of, 77:651 

Dynamic conditional correlation 
(DCC) model, 77:373 
Dynamic term structures, 777:576-577, 
777:578-579,777:591 

Early exercise, 7:447, 7:455. See calls, 
American-style; options 
Earnings before interest, taxes, 

depreciation and amortization 
(EBITDA), 77:566 

Earnings before interest and taxes 
(EBIT), 77:23, 77:547, 77:556 
Earnings growth factor, 77:223 
Earnings per share (EPS), 77:20-21, 
77:38-39, 77:537 

Earnings revisions factor, 77:207,77:209/ 
EBITDA/EV factor 
correlations with, 77:226 
examples of, 77:203, 77:203/ 77:207, 
77:208/ 

in models, 77:232, 77:238-239 
use of, 77:222-223 


Econometrics 
financial, 77:295,77:298-300, 
77:301-303 

modeling of, 77:373, 77:654 
Economic cycles, I:537, II:42-43
Economic intuition, 77:715-716 
Economic laws, changes in, 77:700 
Economy 

states of, 7:49-50,77:518-519, 777:476 
term structures in certain, 
777:567-568 

time periods of, 77:515-516 
Economy as an Evolving Complex 

System, The (Anderson, Arrow, 
& Pines), 77:699 

Educated guesses, use of, 7:511 
EE (explicit Euler) scheme, 77:674, 
77:677-678 

Effective annual rate (EAR), interest, 
77:616-617 
Efficiency 

in estimation, 777:641-642 
Efficient frontier, 7:13-14, 7:17/ 7:289/ 
Efficient market theory, 77:396,777:92 
Eggs, rotten, 7:457-458 
Eigenvalues, 77:627-628,77:705, 
77:706-707/ 77:707f 
Einstein, Albert, 77:470 
Elements, defined, 777:153-154 
Embedding problem, and change of 
time method, 777:520 
Emerging markets, transaction costs 
in, 777:628 

EM (expectation maximization) 
algorithm, 77:146, 77:165 
Empirical rule, 777:210, 777:225 
Endogenous parameterization, 
777:580-581 
Energy 

cargoes of, 7:561-562 
commodity price models, 7:556-558 
forward curves of, 7:564-565 
power plants and refineries, 7:563 
storage of, 7:560-561, 7:563-564 
Engle-Granger cointegration test, 
77:386-388, 77:391-392, 77:395 
Entropy, 777:354 

EPS (earnings per share), 77:20-21, 
77:38-39, 77:537 

Equally weighted moving average,
III:400-402, III:406-407,
III:408-409

Equal to earnings before interest and 
taxes (EBIT), 77:23,77:547, 77:556 
Equal-variance assumption, I:164,
I:167

Equations 

difference, homogenous vs. 
nonhomogeneous, 77:638 
difference vs. differential, II:629
diffusion, II:654-656, II:658n
error-correction, II:391, II:395f
homogeneous linear difference,
II:639-642, II:641f
homogenous difference, II:630-634,
II:631-632f, II:633-634f, II:642
linear, II:623-624
linear difference, systems of,
II:637-639
matrix characteristics of, II:628
no arbitrage, III:612, III:617-619
nonhomogeneous difference,
II:634-637, II:635f, II:637-638f
stochastic, III:478
Equilibrium 

and absolute valuation models,
I:260
defined, II:385-386
dimensions of, III:601
in dynamic term structure models,
III:576
expectations for, II:112
expected returns from, II:112
modeling of, III:577, III:594
in supply and demand, III:568
Equilibrium models
use of, III:603-604
Equilibrium term structure models,
III:601
Equities, 1:279 
investing in, II:89-90 
Equity 

on the balance sheet, II:535
changes in homeowner, III:73
in homes, III:69
as option on assets, I:304-305
shareholders', II:535
Equity markets, II:48
Equity multipliers, II:550
Equity risk factor models, II:173-178
Equivalent probability measures,
I:111, III:510-511
Ergodicity, defined, II:405
Erlang distribution, III:221-222
Errors. See also estimation error; 
standard errors 

absolute percentages of, 11:525/ 
11:526/ 

estimates of, 11:676 
in financial models, 11:719 
a posteriori estimates, II:672-673 
sources of, 11:720 
terms for, 11:126 
in variables problem, 11:220 
Esscher transform, 111:511, III: 514 
Estimates/estimation 
confidence in, 1:199 
consensus, 11:34-35 


equations for, 1:348-349 
in EVT, 111:272-274 
factor models in, 11:154 
with GARCH models, 11:364-365 
in-house from firms, II: 35 
maximum likelihood, 11:311-313 
methodology for, 11:174—176 
and PCA, 11:167/ 
posterior, 1:176 
posterior point, 1:155-156 
processes for, 1:193,11:176 
properties of for EWMA, III:410-411
robust, 1:189 
techniques of, 11:330 
use of, 11:304 
Estimation errors 

accumulation of, II:78

in the Black-Litterman model, 1:201 

covariance matrix of, 111:139-140 

effect of, 1:18 

pessimism in, 111:143 

in portfolio optimization, II: 82, 

III: 138-139 
sensitivity to, 1:191 
and uncertainty sets, 111:141 
Estimation risk, 1:193 
minimizing. III: 145 
Estimators 
bias in. III: 641 
efficiency in, 111:641-642 
equally weighted average, 
111:400-402 
factor-based, 1:39 
terms used to describe, 11:314 
unbiased. III: 399 
variance, 11:313 

ETL (expected tail loss), III:355-356
Euler approximation, II:649-650,
II:649f, II:650f

Euler constant, III:182

Euler schemes, explicit/implicit, II:666

Europe 

common currency for, 11:393 
risk factors of, 11:174 
European call options 
Black-Scholes formula for, 

III: 639-640 

computed by different methods, 

III: 650-651,111:651/ 
explicit option pricing formula, 

III: 526-527 

pricing by simulation in VBA, 
111:465-466 

pricing in Black-Scholes setting, 

III: 649 

simulation of pricing, III:444-445,
III:462-463

and term structure models, 
111:544-545 


European Central Bank, 1:300 
Events 

defined, III:85, III:162, III:508
effects of macroeconomic, II:243-244
extreme, III:245-246, III:260-261,
III:407
identification of, II:516
mutually exclusive, III:158
in probability, III:156
rare, III:645
rare vs. normal, I:262
tail, III:88n, III:111, III:118
three-,5, III: 381-382 
EVT (extreme value theory). See 

extreme value theory (EVT) 
EWMA (exponentially weighted
moving averages), III:409-413
Exceedance observations, III:362-363
Exceedances, of VaR, III:325-326,
III:339

Excel 

accessing VBA in, III:477
add-ins for, I:93, III:651
data series correlation in, I:92-93
determining corresponding
probabilities in, III:646
Excel Link, III:434
Excel Solver, II:70
interactions with MATLAB, III:448
macros in, III:449, III:454-455
notations in, III:477n
random number generation in,
III:645-646
random walks with, I:83, I:85, I:87,
I:90
@RISK in, II:12f
syntax for functions in, III:456
Exchange-rate intervention, study on, 
111:177-178 

Exercise prices, 1:452,1:484,1:508 
Expectation maximization (EM) 
algorithm, 11:146,11:165 
Expectations, conditional, 1:122, 

II: 517-518, III: 508-509 
Expectations hypothesis. III: 568-569, 
III: 601n 

Expected shortfall (ES), 1:385-386, 

111:332. See also average value at 
risk (AVaR) 

Expected tail loss (ETL), III:291, 

111:293/ 111:345-347,111:347/ 

III:355-356 

Expected value (EV), 1:511 
Expenses, noncash, 11:25 
Experiments, possibility of, 11:307 
Explicit costs, defined. III: 623 
Explicit Euler (EE) scheme, 11:674, 

II:677-678 

Exponential density function, 111:218/ 
Exponential distribution, III:217-219
applications in finance, III:219
Exponentially weighted moving 
averages (EWMA) 
discussion of, III:409-413
forecasting model of, III:411
properties of the estimates,
III:410-411
standard errors for, III:411-412
statistical methodology in, III:409
usefulness of, III:413-414
volatility estimates for, III:410f
Exposures 

calculation of, II:247t 
correlation between, 77:186 
distribution of, 77:250/, 17:251/, 77:254 
management of, 77:182-183 
monitoring of portfolio, 77:249-250 
name-specific, 17:188 
Extrema, characterization of local, 

7:23 

Extremal random variables, 777:267 
Extreme value distributions, 
generalized, 777:269 
Extreme value theory (EVT), 
77:744-746, 777:95, 777:228 
defined, 777:238 
for IID processes, 777:265-274 
in IID sequences, 777:275 
role of in modeling, 77:753n 

Factor analysis 
application of, 77:165 
based on information coefficients, 
77:222 

defined, 77:141, 77:169 
discussion of, 77:164-166 
importance of, 77:238 
vs. principal component analysis, 
77:166-168 

Factor-based strategies 
vs. risk models, 77:236 
Factor-based trading, 77:196-197 
model construction for, 77:228-235 
performance evaluation of, 
77:225-228 

Factor exposures, 77:247-248, 
77:275-283 

Factorials, computing of, 777:456 
Factorization, defined, 77:307 
Factor mimicking portfolio (FMP), 
77:214 

Factor model estimation, 77:142-147, 
77:150 

alternative approaches and 
extensions, 77:145-147 
applied to bond returns, 77:144-145 
computational procedure for, 
77:142-144 


fixed N, 77:143 
large N, 77:143-144 
Factor models 

in the Black-Litterman framework, 
7:200 

commonly used, 77:150 
considerations in, 77:178 
cross-sectional, 77:220-221 
defined, 77:153 
fixed income, 77:271-272 
in forecasting, 77:230-231 
linear, 77:154-156,77:168 
normal, 77:156 
predictive, 77:142 
static/dynamic, 77:146-147, 

77:155 

in statistical methodology, 77:141 
strict, 77:155-156 
types of, 77:138-142 
usefulness of, 77:154, 77:503 
use of, 7:354, 77:137,77:150, 77:168, 
77:219-225 

Factor portfolios, 77:224-225 
Factor premiums, cross-sectional 
methods for evaluation of, 
77:214-219 

Factor returns, 77:1917, 77:1927 
calculation of, 77:248 
Factor risk models, 77:113, 77:119 
Factors 

adjustment of, 77:205-206 
analysis of data of, 77:206-211 
categories of, 77:197 
choice of, 77:232-235 
defined, 77:196, 77:211 
desirable properties of, 77:200 
development of, 77:198 
estimation of types of, 77:156 
graph of, 77:166/ 
known, 77:138-139 
K systematic, 77:138-139 
latent, 77:140-141,77:150 
loadings of, 77:144, 77:1457, 77:155, 
77:1667, 77:167/ 77:1687 
market, 77:176 

orthogonalization of, 77:205-206 
relationship to time series, 77:168/ 
sorting of, 77:215 
sources for, 77:200-201 
statistical, 77:197 

summary of well-known, 77:1967 
transformations applied to, 77:206 
use of multiple, 77:141-142 
Failures, probability of, 77:726-727 
Fair equilibrium, between multiple 
accounts, 77:76 
Fair value 

determination of, 777:584-585 
Fair value, assessment of, 77:6-7 


Fama, Eugene, II:468, II:473-474
Fama-French three-factor model, 
77:139-140, 77:177 

Fama-MacBeth regression, 77:220-221, 
77:224, 77:227-228,77:228/ 77:237, 
77:240n 

Fannie Mae/Freddie Mac, 

writedowns of, 777:77n 
Fast Fourier transform algorithm, 
77:743 
Fat tails 

of asset return distributions, 

777:242 

in chaotic systems, 77:653 
class 2, 777:261-263 
comparison between risk models, 
77:749-750 
effects of, 77:354 
importance of, 77:524 
properties of, 777:260-261 
in Student's t distribution, 77:734 
Favorable selection, 777:76-77 
F-distribution, 777:216-217 
Federal Reserve 

effects of on inflation risk premium, 
7:281 

study by Cleveland Bank, 
777:177-178 

timing of interventions of, 777:178 
Feynman-Kac formulas, 77:661 
FFAs (freight forward agreements), 
7:566 

Filtered probability spaces, 7:314-315, 
7:334n 

Filtration, 77:516-517, 111:476-477, 
777:489-490, 777:508 
Finance, three major revolutions in, 
777:350 

Finance companies, captive, 7:366-369 
Finance theory 
development of, 11:467-468 
effect of computers on, 77:476 
in the nineteenth century, 
77:468-469,77:476 
in the 1960s, 77:476 
in the 1970s, 77:476 
stochastic laws in, 777:472 
in the twentieth century, 77:476 
Financial assets, price distribution of, 
777:349-350 

Financial crisis (2008), 777:71 
Financial data, pro forma, II:542-543
Financial distress, defined, 7:351 
Financial institutions, model risk of, 
77:693 

Financial leverage ratios, 77:559-561, 
77:563 

Financial modelers, mistakes of, 
77:707-710 
Financial planning, III:126-127, III:128,
III:129

Financial ratios, II: 546,11:563-564 
Financial statements 

assumptions used in creating, 

II: 532 

data in, 11:563 

information in, 11:533-542,11:543 
pro forma, 11:22-23 
time statements for, 11:532 
usefulness of, 11:531 
use of, 11:204-205,11:246 
Financial time series, 1:79-80, 

1:386-387,11:415-416,11:503-504 
Financial variables, modeling of, 
111:280 

Find, in MATLAB, 111:422 
Finite difference methods, 11:648-652, 
11:656-657,11:665-666, 

11:674-675,11:676-677,111:19 
Finite element methods, 11:669-670, 
11:672,11:679-681 
Finite element space, 11:670-672 
Finite life general DDM, 11:5-7 
Finite states, assumption of, 1:100-101 
Firms 

assessment of, 11:546-547 
and capital structure, 11:473 
characteristics of, 11:94,11:176-177, 
11:201 

clientele of, 11:36 
comparable, 11:34,11:35-36 
geographic location of, 11:36 
history vs. future prospects, II:92
phases of, II:9-10
retained earnings of, II:20
valuation of, II:26-27, II:473
value of, II:27-31, II:39
vs. characteristics of group, II:90-91
First boundary problem, 11:655-656, 
11:657/ 

First Interstate Bancorp, 1:304 
analysis of credit spreads, 1:305f 
debt ratings of, 1:410 
First passage models (FPMs), 1:342, 
1:344-348 

Fisher-Tippett theorem, III:266-267
Fisher, Ronald, 1:140 
Fisherian, defined, 1:140 
Fisher's information matrix, 1:160n 
Fisher's law, 11:322-323 
Fixed-asset turnover ratio, 11:558 
Fixed-charge coverage ratio, 

11:560-561 

Flesaker-Hughston (FH) model, 

111:548-549 

Flows, discrete, 1:448-453 
FMP (factor mimicking portfolio), 
11:214 


Footnotes, in financial statements, 

11:541-542 

Ford Motor Company, 1:408/ 1:409/ 
Forecastability, 11:132 
Forecastability, concept of, 11:123 
Forecast encompassing 
defined, 11:230-231 
Forecasts 

of bid-ask spreads, II:456-457
comparisons of, II:420-421
contingency tables, II:429f
development of, II:110-114
directional, II:428
effect on future of, II:122-123
errors in, II:422f
evaluation of, II:428-430, III:368-370
machine-learning approach to,
II:128
measures of, II:429-430, II:430
need for, II:110-111
in neural networks, II:419-420
one-step ahead, II:421f
parametric bootstraps for,
II:428-430

response to macroeconomic shocks, 
11:55/ 

usefulness of, II: 131-132 
use of models for, 11:302 
of volatility, 111:412 
Foreclosures, III: 31,111:75 
Forward contracts 
advantages of, I:430
buying assets of, I:439
defined, I:426, I:478
equivalence to futures prices,
I:432-433
hedging with, I:429, I:429t
as OTC instruments, I:479
prepaid, I:428
price paths of, I:428f
short vs. long, I:437-438, I:438f
valuing of, I:426-430
vs. futures, I:430-431, I:433
vs. options, I:437-439
Forward curves 
graph of, 1:434/ 
modeling of, 1:533,1:557-558, 
1:564-565 

normal vs. inverted, 1:434 
of physical commodities, 1:555 
Forward freight agreements (FFAs), 
1:555,1:558,1:566 

Forward measure, use of, 1:543-544 
Forward rates 

calculation of, 1:491,111:572 

defined, 1:509-510 

from discount function, 111:566-567 

implied, 111:565-567 

models of. 111:543-544 


from spot yields, 111:566 
of term structure, 111:586 
Fourier integrals, 11:656 
Fourier methods, 1:559-560 
Fourier transform, 111:265 
FPMs (first passage models), 1:342, 
1:344-348 

Fractals, II:653-654, III:278-280,
III:479-480
Franklin Templeton Investment
Funds, II:496f, II:497f, II:498f
Frechet distribution, II:754n, III:228,
III:230, III:265, III:267, III:268
Frechet-Hoeffding copulas, I:327,
I:329

Freddie Mac, ll:77n, !!:754n. 111:49 
Free cash flow (FCF), 11:21-23 
analysis of, 11:570-571 
calculation of, 11:23-24,11:571-572 
defined, 11:569-571,11:578 
expected for XYZ, Inc., II:30t
financial adjustments to, II:25-26
statement of, direct method,
II:24-25, II:24f
statement of, indirect method,
II:24-25, II:24f
vs. cash flow, II:22-23
Freedman-Diaconis rule, 11:494,11:495, 
11:497 
Frequencies 

accumulating, 11:491-492 
distributions of, 11:488-491,11:499/ 
empirical cumulative, 11:492 
formal presentation of, 11:491 
Frequentist, 1:140,1:148 
Frictions, costs of, 11:472-473 
Friedman, Milton, 1:123 
Frontiers, true, estimated and actual 
efficient, 1:190-191 
F_SCORE, use of, 11:230-231 
F-test, 11:336,11:337,11:344,11:425, 

11:426 

FTSE 100, volatility in. 111:412-413 
Fuel costs, 1:561,1:562-563. See also 
energy 

Full disclosure, defined, 11:532 
Functional, defined, 1:24 
Functional-coefficient autoregressive 
(FAR) model, 11:417 
Functions 
affine, 1:31 

Archimedean, 1:329,1:330-331,1:331 
Bessel, of the third kind, 11:591 
beta, 11:591 

characteristic, 11:591-592,11:593 
choosing and calibrating of, 
1:331-333 

Clayton, Frank, Gumbel, and 
Product, 1:329 
continuous, 77:581-584,77:582/, 
77:583,77:592-593 

continuous/discontinuous, 77:582/ 
convex, 7:24-27, 7:25, 7:25/ 7:26/ 
convex quadratic, 7:26, 7:31/ 
copula, 7:320, 7:325-333, 7:407-408 
for default times, 7:329-331 
defined, 7:24, 7:333 
density, 7:141 
with derivatives, 77:585/ 
elementary, 777:474 
elliptical, 7:328-329 
empirical distribution, 777:270 
factorial, 77:590-591 
gamma, 77:591, 77:591/ 777:212 
gradients of, 7:23 
Heaviside, 77:418-419 
hypergeometric, 777:256, 777:257 
indicator, 77:584-585, 77:584/ 77:593 
likelihood function, 7:141-143, 
7:143/ 7:144/ 7:148,7:176, 7:177 
measurable, 777:159-160, 777:160/ 
777:201 

minimization and maximization of 
values, 7:22, 7:22/ 
monotonically increasing, 
77:587-588, 77:588/ 
nonconvex quadratic, 7:26-27 
nondecreasing, 777:154-155,777:155/ 
normal density, 777:226/ 
optimization of, 7:24 
parameters of copulas, 7:331-332 
properties of quasi-convex, 7:28 
quasi-concave, 7:27-28, 7:27/ 
right-continuous, 777:154—155, 
777:155/ 

surface of linear, 7:33/ 
with two local maxima, 7:23/ 
usefulness of, 7:411—412 
utility, 7:4-5, 7:14-15, 7:461 
Fund management, art of, 7:273 
Fund separation theorems, 7:36 
Futures
Eurodollar, I:503
hedging with, I:433
market for housing, II:396-397
prices of, and interest rates, I:435n
telescoping positions of, I:431-432
theoretical, I:487
valuing of, I:430-433
vs. forward contracts, I:430-431
Futures contracts
defined, I:478
determining price of, I:481
pricing model for, I:479-481
theoretical price of, I:481-484
vs. forward contracts, I:433,
I:478-479
Futures options, defined, I:453
Future value, II:618
determining of money,
II:596-600

Galerkin methods, principle of,
II:671
Gamma, I:509, I:518-520
Gamma process, III:498
Gamma profile, I:519f

Gapping effect, I:509
GARCH (generalized autoregressive
conditional heteroskedastic)
models
asymmetric, II:367-368
exponential (EGARCH), II:367-368
extensions of, III:657
factor models, II:372
GARCH-M (GARCH in mean),
II:368
Markov-switching, I:180-184
time aggregation in, II:369-370
type of, II:131
usefulness of, III:414
use of, I:175-176, I:185-186, II:371,
II:733-734, III:388
and volatility, I:179
weights in, II:363-364
GARCH (1,1) model
Bayesian estimation of, I:176-180
defined, II:364
results from, II:366, II:366t
skewness of, III:390-391
strengths of, III:388-389
Student's t, I:182
use of, I:550-551, III:656-657
GARCH (1,1) process, I:551t
Garman-Kohlhagen system, I:510-511,
I:522
Gaussian density, III:98f
Gaussian model, III:547-548
Gaussian processes, III:280, III:504
Gaussian variables, and Brownian
motion, III:480-481
Gauss-Markov theorem, II:314
GBM (geometric Brownian motion),
I:95, I:97

GDP (gross domestic product), I:278,
I:282, II:138, II:140
General inverse Gaussian (GIG)
distribution, II:523-524
Generalized autoregressive
conditional heteroskedastic
(GARCH) models. See GARCH
(generalized autoregressive
conditional heteroskedastic)
models
Generalized central limit theorem,
III:237, III:239
Generalized extreme value (GEV)
distribution, II:745, III:228-230,
III:272-273
Generalized inverse Gaussian
distribution, use of, II:521-522
Generalized least squares (GLS),
I:198-199, II:328
Generalized tempered stable (GTS)
processes, III:512
Generally accepted accounting
principles (GAAP), II:21-22,
II:531-532, II:542-543
Geometric mean reversion (GMR)
model, I:91-92
computation of, I:91
Gibbs sampler, I:172n, I:179, I:184-185
GIG models, calibration of, II:526-527
Gini index of dissimilarity (Gini
measure), III:353-354
Ginnie Mae/Fannie Mae/Freddie
Mac, actions of, III:49
Girsanov's theorem
and Black-Scholes option pricing
formula, I:132-133
with Brownian motion, III:511
and equivalent martingale
measures, I:130-133
use of, I:263, III:517
Glivenko-Cantelli theorem, III:270,
III:272, III:348n, III:646
Global Economy Workshop, Santa Fe
Institute, II:699
Global Industry Classification
Standard (GICS®), II:36-37,
II:248
Global minimum variance (GMV)
portfolios, I:39
GMR (geometric mean reversion)
model, I:91-92
GMV (global minimum variance)
portfolios, I:15, I:194-195
GNP, growth rate of (1947-1991),
II:410-411, II:410f
Gradient methods, use of, II:684
Granger causality, II:395-396
Graphs, in MATLAB, III:428-433
Greeks, the, I:516-522
beta and omega, I:522
delta, I:516-518
gamma, I:518-520
rho, I:521-522
theta, I:509, I:520-521
use of, I:559, II:660, III:643-644
vega, I:521
Greenspan, Alan, I:140-141
Growth, I:283f, II:239, II:597-598,
II:601-602
Gumbel distribution, III:265, III:267,
III:268-269
Hamilton-Jacobi equations, II:675
Hankel matrices, II:512
Hansen-Jagannathan bound, I:59,
I:61-62
Harrison, Michael, II:476
Hazard, defined, III:85
Hazard (failure) rate, calculation of,
III:94-95
Heat diffusion equation, II:470
Heath-Jarrow-Morton framework,
I:503, I:557
Heavy tails, III:227, III:382
Hedge funds, and probit regression
model, II:349-350
Hedge ratios, I:416-417, I:509
Hedges
importance of, I:300
improvement using DTS, I:398
in the Merton context, I:409
rebalancing of, I:519
risk-free, I:532f
Hedge test, I:409, I:411
Hedging
costs of, I:514, II:725
and credit default swaps, I:413-414
determining, I:303-304
with forward contracts, I:429, I:429f
of fuel costs, I:561
with futures, I:433
gamma, I:519
portfolio-level, I:412-413
of positions, II:724-726
ratio for, II:725
with swaps, I:434-435
transaction-level, I:412
usefulness of, I:418
use of, I:125-126
using macroeconomic indices,
I:414-417
Hessian matrix, I:23-24, I:25, I:186n,
III:645

Heston model, I:547, I:548, I:552,
II:682
with change of time, III:522
Heteroskedasticity, II:220, II:359,
II:360, II:403
HFD (high-frequency data). See
high-frequency data (HFD)
Higham's projection algorithm,
II:446
High-dimensional problems, II:673
High-frequency data (HFD)
and bid-ask bounce, II:454-457
defined, II:449-450
generalizations to, II:368-370
Level I, II:451-452, II:452f, II:453t
Level II, II:451
properties of, II:451, II:453t
recording of, II:450-451
time intervals of, II:457-462
use of, II:300, II:481
volume of, II:451-454
Hilbert spaces, II:683
Hill estimator, II:747, III:273-274
Historical method
drawbacks of, III:413
weighting of data in, III:397-398
Hit rate, calculation of, II:240n
HJM framework, I:498
HJM methodology, I:496-497
Holding period return, I:6
Ho-Lee model
continuous variant for, I:497
defined, I:492
in history, I:493
interest rate lattice, III:614f
as short rate model, III:23
for short rates, III:605
as single factor model, III:549
Home equity prepayment (HEP)
curve, III:55-56, III:56f
Homeowners, refinancing behavior of,
III:25
Home prices, I:412, II:397f, II:399f,
III:74-75
Homoskedasticity, II:360, II:373
Horizon prices, III:598
Housing, II:396-399, III:48
Howard algorithm (policy iteration
algorithm), II:676-677, II:680
Hull-White (HW) models
binomial lattice, III:610-611
for calibration, II:681
defined, I:492
interest rate lattice, III:614f
and short rates, III:545-546
for short rates, III:605
trinomial lattice, III:613, III:616f
usefulness of, I:503
use of, III:557, III:604
valuing zero-coupon bond calls
with, I:500
Hume, David, I:140
Hurst, Harold, II:714
Hypercubes, use of, III:648

IBM stock, log returns of, II:407f
Ignorance, prior, I:153-154
Implementation risk, II:694
Implementation shortfall approach,
III:627
Implicit costs, III:631
Implicit Euler (IE) scheme, II:674,
II:677-678
Implied forward rates, III:565-567
Impurity, measures of, II:377
Income, defined for public
corporation, II:21-22
Income statements
common-size, II:562-563, II:562f
defined, II:536
in financial statements, II:536-537
sample, II:537t, II:547t
structure of, II:536
XYZ Inc. (example), II:28f
Income taxes. See taxes
Independence, I:372-373, II:624-625,
III:363-364, III:368
Independence function, in VaR
models, III:365-366
Independently and identically
distributed (IID) concept,
I:164, I:171, II:127, III:274-280,
III:367, III:414
Indexes
characteristics of efficient, I:427
defined, II:67
of dissimilarity, III:353-354
equity, I:157, II:190t, II:262-263
tail, II:740-741, II:740f, III:234
tracking of, II:64, II:180
use of weighted market cap, I:38
value weighted, I:76-77
volatility, III:550-552, III:552f
Index returns, scenarios of, II:190t,
II:191t
Indifference curves, I:4-5, I:5f, I:14
Industries, characteristics of, II:36-37,
II:39-40
Inference, I:155-158, I:169t
Inflation
effect on after-tax real returns,
I:286-287
and GDP growth, I:282
indexing for, I:278-279
in regression analysis, II:323
risk of, II:282
risk premiums for, I:280-283
seasonal factors in, I:292
shifts in, I:285f
volatility of, I:281
Information
anticipation of, III:476
from arrays in MATLAB, III:421
completeness of, I:353-354
contained in high volatility stocks,
III:629
and filtration, III:517
found in data, II:486
and information propagation,
II:515
insufficient, III:44
integration of, II:481-482
overload of, II:481
prior in Bayesian analysis,
I:151-155, I:152
propagation of, I:104
Information (Continued)
structures of, I:106f, II:515-517
unstructured vs. semistructured,
II:481-482
Information coefficients (ICs), II:98-99,
II:221-223, II:223f, II:227f, II:234
Information ratios
defined, II:86n, II:115, II:119, II:237
determining, II:100f
for portfolio sorts, II:219
use of, II:99-100
Information sets, II:123
Information structures
defined, II:518
Information technology, role of,
II:480-481
Ingersoll models, I:271-273, I:275f
Initial conditions, fixing of, II:502
Initial margins, I:478
Initial value problems, II:639
Inner quartile range (IQR), II:494
Innovations, II:126
Insurance, credit, I:413-414
Integrals, II:588-590, II:593. See also
stochastic integrals
Integrated series, and trends,
II:512-514
Integration, stochastic, III:472, III:473,
III:483
Intelligence, general, II:154
Intensity-based frameworks, and the
Poisson process, I:315
Interarrival time, III:219, III:225
Intercepts, treatment of, II:334-335
Interest
accumulated, II:604-605, II:604f
annual vs. quarterly compounding,
II:599f

compound, 11:597,11:59// 
computing accrued, and clean price,
I:214-215
coverage ratio, II:560
defined, II:596
determining unknown rates,
II:601-602
effective annual rate (EAR),
II:616-617
mortgage, II:398
simple vs. compound, II:596
terms of, II:619
from TIPS, I:277
Interest rate models
binomial, III:173-174, III:174f
classes of, III:600
confusions about, III:600
importance of, III:600
properties of lattices, III:610
realistic, arbitrage-free, III:599
risk-neutral/arbitrage-free, III:597
Interest rate paths, III:6-9, III:7, III:8f
Interest rate risk, III:12-14
Interest rates
absolute vs. relative changes in,
III:533-534
approaches in determining future,
III:591
binomial model of, III:173-174
binomial trees, I:236, I:236f, I:237f,
I:240f, I:244, I:244f, III:174f
borrowing vs. lending, I:482-483
calculation of, II:613-618
calibration of, I:495
caps/caplets of, III:589-590
caps on, I:248-249
categories of term structure, III:561
computing sensitivities, III:22-23
continuous, 1:428,1:439^88 
derivatives of, III:589-590
determination of appropriate,
I:210-211
distribution of, III:538-539
dynamic of process, I:262
effect of, I:514-515
effect of shocks, III:23
effect on putable bonds, III:303-304
future course of, III:567, III:573
and futures prices, I:435n
importance of models, III:600
jumps of, III:539-541
jumpy and continuous, III:539f
long vs. short, III:538
market spot/forward, I:495t
mean reversion of, III:7
modeling of, I:261-265, I:267, I:318,
I:491, I:503, III:212-213
multiple, II:599-600
negative, III:538
nominal, II:615-616
and option prices, I:486-487
and prepayment risk, III:48
risk-free, I:442
shocks/shifts to, III:585-596
short-rate, I:491-494, III:595
simulation of, III:541
stochastic, I:344, I:346
structures of, III:573, III:576
use of for control, I:489
volatility of, III:405, III:533
Intermarket relations, no-arbitrage,
I:453-455
Internal consistency rule, in OAS
analysis, I:265
Internal rate of return (IRR), II:617-618
in MBSs, III:36
International Monetary Fund
Global Stability Report, I:299
International Swap and Derivatives
Association (ISDA). See ISDA
Interpolated spread (I-spread), I:227
Interrate relationship, arbitrage-free,
III:544
Intertemporal dependence, and risk,
III:351
Intertrade duration, II:460-461,
II:462t
Intertrade intervals, II:460-461
Intervals, credible, I:170
Interval scales, data on, II:487
Intrinsic value, I:441, I:511, I:513,
II:16-17
Invariance property, III:328-329
Inventory, II:542, II:557
Inverse Gaussian process, III:499
Investment, goals of, II:114-115
Investment management, III:146
Investment processes
activities of integrated, II:61
evaluation of results of, II:117-118
model creation, II:96
monitoring of performance, II:104
quantitative, II:95, II:95f
quantitative equity, II:95f, II:96f,
II:105
research, II:95-102
sell-structured, II:108
steps for equity investment, II:119
testing of, II:109
Investment risk measures, III:350-351
Investments, I:77-78n, II:50-51,
II:617-618

Investment strategies, II:66-67,
II:198
Investment styles, quantamental,
II:93-94, II:93f
Investors
behavior of, II:207, II:504
comfort with risk, I:193
completeness of information of,
I:353-354
focus of, I:299, II:90-91
fundamental vs. quantitative,
II:90-94, II:91f, II:92f, II:105
goals/objectives of, II:114-115,
II:179, III:631
individual accounts of, II:74
monotonic preferences of, I:57
number of stocks considered, II:91
preferences of, I:5, I:260, II:48, II:56,
II:92-93
prior beliefs of, II:727
real-world, II:132
risk aversion of, II:82-83, II:729
SL-CAPM assumptions about, I:66
sophistication of, II:108
in uncertain markets, II:54
views of, I:197-199
Invisible hand, notion of, II:468-469
ISDA (International Swap and
Derivatives Association)
Credit Derivative Definitions (1999),
I:230, I:528
Master Agreement, I:538
organized auctions, I:526-527
supplement definition, I:230
I-spread (interpolated spread), I:227
Ito, Kiyosi, II:470
Ito definition, III:486-487
Ito integrals, I:122, III:475, III:481,
III:490-491
Ito isometry, III:475
Ito processes
defined, I:95
generic univariate, I:125
and Girsanov's theorem, I:131
under HJM methodology, I:497
properties of, III:487-488
and smooth maps, III:493
Ito's formula, I:126, III:488-489
Ito's lemma
defined, I:98
discussion of, I:95-97
in estimation, I:348
and the Heston model, I:548

James-Stein shrinkage estimator, I:194
Japan, credit crisis in, I:417
Jarrow-Turnbull model, I:307
Jarrow-Yu propensity model, I:324-325
Jeffreys' prior, I:153, I:160n, I:171-172
Jensen's inequality, I:86, III:569
Jevons, Stanley, II:468
Johansen-Juselius cointegration tests,
II:391-393, II:395
Joint jumps/defaults, I:322-324
Joint survival probability, I:323-324
Jordan diagonal blocks, II:641-642
Jorion shrinkage estimator, I:194, I:202
Jump-diffusion, III:554-557, III:657
Jumps
default, I:322-324
diffusions, I:559-560
downward, I:347
idiosyncratic, I:323
incorporation of, I:93-94
in interest rates, III:539-541
joint, I:322-324
processes of, III:496
pure processes, III:497-501, III:506
size of, III:540

Kalotay-Williams-Fabozzi (KWF)
model, III:604, III:606-607,
III:615f
Kamakura Corporation, I:301, I:307,
I:308-309, I:310n
Kappa, I:521
Karush-Kuhn-Tucker conditions (KKT
conditions), I:28-29
Kendall's tau, I:327, I:332
Kernel regression, II:403, II:412-413,
II:415
Kernels, II:412, II:413f, II:746
Kernel smoothers, II:413
Keynes, John Maynard, II:471
Key rate durations (KRD), II:276,
III:311-315, III:317
Key rates, II:276, III:311
Kim-Rachev (KR) process, III:512-513
KKT conditions (Karush-Kuhn-Tucker
conditions), I:28-29, I:31, I:32
KoBoL distribution, III:257n
Kolmogorov extension theorem,
III:477-478
Kolmogorov-Smirnov (KS) test, II:430,
III:366, III:647
Kolmogorov equation, use of, III:581
Kreps, David, II:476
Krispy Kreme Doughnuts, II:574-575,
II:574f
Kronecker product, I:172, I:173n
Kuiper test, III:366
Kurtosis, I:41, III:234

Lag operator L, II:504-506, II:507,
II:629-630
Lagrange multipliers, I:28, I:29-31,
I:30, I:32
Lag times, II:387, III:31
Laplace transforms, II:647-648
Last trades, price and size of, II:450
Lattice frameworks
bushy trees in, I:265, I:266f
calibration of, I:238-240
fair, I:235
interest rate, I:235-236, I:236-238
one-factor model, I:236f
for pricing options, I:487
usefulness of, I:235
use of, I:240, I:265-266, III:14
value at nodes, I:237-238
1-year rates, I:238f, I:239f
Law of iterated expectations, I:110,
I:122, II:308
Law of large numbers, I:267, I:270n,
III:263-264, III:275
Law of one α, II:50
Law of one price (LOP), I:52-55,
I:99-100, I:102, I:119, I:260
LCS (liquidity cost score), I:402
use of, I:403
LDIs (liability-driven investments),
I:36
LD (loss on default), I:370-371
Leases, in financial statements, II:542
Least-square methods, II:683-685


Leavens, D. H., I:10
Legal loss data
Cruz study, III:113, III:115t
Lewis study, III:117, III:117t
Lehman Brothers, bankruptcy of, I:413
Level (parallel) effect, II:145
Levy-Khinchine formula, III:253-254,
III:257
Levy measures, III:254, III:254t
Levy processes
and Brownian motion, III:504
in calibration, II:682
change of measure for, III:511-512
conditions for, III:505
construction of, III:506
from Girsanov's theorem, III:511
and Poisson process, III:496
as stochastic process, III:505-506
as subordinators, III:521
for tempered stable processes,
III:512-514, III:514t
and time change, III:527
Levy stable distribution, III:242,
III:339, III:382-386, III:392
LGD (loss given default), I:366, I:370,
I:371
Liabilities, II:533, II:534-535, III:132
Liability-driven investments (LDIs),
I:36
Liability-hedging portfolios (LHPs),
I:36
LIBOR (London Interbank Offered
Rate)
and asset swaps, I:227
changes in, by type, III:539-540
curve of, I:226
interest rate models, I:494
market model of, III:589
spread of, I:530
in total return swaps, I:541
use of in calibration, III:7
Likelihood maximization, I:176
Likelihood ratio statistic, II:425
Limited liability rule, I:363
Limit order books, use of, III:625,
III:632n
Lintner, John, II:474
Lipschitz condition, II:658n, III:489,
III:490
Liquidation
effect of, II:186
procedures for, I:350-351
process models for, I:349-351
time of, I:350
vs. default event, I:349
Liquidity
assumption of, III:371
in backtesting, II:235
changes in, I:405
Liquidity (Continued)
cost of, I:401
creation of, III:624-625, III:631
defined, III:372, III:380
effect of, II:284
estimating in crises, III:378-380
in financial analysis, II:551-555
and LCS, I:404
and market costs, III:624
measures of, II:554-555
premiums on, I:294, I:307
ratios for, II:555
in risk modeling, II:693
shortages in, I:347-348
and TIPS, I:293, I:294
and transaction costs, III:624-625
Liquidity cost, III:373-374, III:375-376
II:422, II:427-428
delinquent, III:63
recoverability of, III:31-32
Lookback options, I:114, III:24
price (LOP)
expected, I:369-370, I:373-374
I:375-376
severity of, III:44
unexpected, I:371-372, I:374-375
Loss functions, I:160n, III:369
Loss given default (LGD), I:366, I:370,
Malliavin calculus, III:644
Management, active, II:115
Mandelbrot, Benoit, II:653, II:738,
costs of, III:623-624, III:627
forecasting/modeling of,
III:628-631
forecasting models for, III:632
forecasting of, III:628-629,
III:629-631
measurement of, III:626-628
and transaction costs, II:70
Market model regression, II:139
Market portfolios, I:66-67, I:72-73
Market prices, I:57, III:372
Market risk
approaches to estimation of, III:380
in bonds, III:595
in CAPM, I:68-69, II:474
importance of, III:81
models for, III:361-362
premium for, I:203n, I:404
Markets
approach to segmented, II:48-51
arbitrage-free, I:118
complete, I:51-52, III:578
models of, III:589
systematic fluctuations in,
unified approach to, II:49
Market standards, I:257
Market weights, II:269f
Markov chain approximations, II:678
Markov chain Monte Carlo (MCMC)
methods, II:410f, II:417-418
Markov coefficients, II:506-507, II:512
Markov matrix, I:368
Markov models, I:114
Markov processes
in dynamic term structures, III:579
hidden, I:182
use of, III:509, III:517
Markov property, I:82, I:180-181, I:183,
II:661, III:193n
Markov switching (MS) models
discussion of, I:180-184
usefulness of, II:433
use of, II:409-411, II:411t
Markowitz diversification, I:10-11,
Markowitz efficient frontiers, I:191f
Markowitz model
Mark-to-market (MTM)
defined, I:535
and telescoping futures, I:431-432
Marshall and Siegel, II:694
Marshall-Olkin copula, I:323-324,
and arbitrage, I:111-112, I:124
defined, I:110-111
and Girsanov's theorem, I:130-133
and state prices, I:133-134
use of, I:130-131
with change of time methods
development of concept, II:469-470
measures of, I:110-111
use of conditions, I:116
use of in forward rates, III:586
array operations in, III:420-421
control flow statements in,
III:427-428
European call option pricing with,
functions built into, III:421-422
graphs in, III:428-433, III:429-430f,
M-files in, III:418-419, III:423,
III:447
Optimization Tool, III:435-436,
III:436f, III:440f, III:441f
overview of desktop and editor,
III:418-419
quadprog function, II:70
quadratic optimization with,
III:444
for simulations, III:651
Sobol sequences in, III:445-446
for stable distributions, III:344
surf function in, III:432-433
syntax of, III:426-427
toolboxes in, III:417-418
user-defined functions in,
characteristic polynomial of, II:628
coefficient, II:624
companion, II:639-640
diagonal, II:622-623, II:640
eigenvalues of random, II:704-705
eigenvectors of, II:640-641
in MATLAB, III:422, III:432
square, II:622-623, II:626-627
traces of, II:623
approach, I:141, I:348
methods, II:348-349, II:737-738,
Maximum principle, II:662, II:667
Max-stable distributions, III:269,
MBS (mortgage-backed securities),
cash flow characteristics of, III:48
default assumptions about, III:8
negative convexity of, III:49
performance of, III:74
prices of, III:26
III:34f
time-related factors in, III:73-74
valuation of, III:62
valuing of, III:645
MBS (mortgage-backed securities),
nonagency
analysis of, III:44-45
defined, III:48
estimation of returns, III:36-44
evaluation of, III:29
factors impacting returns of,
III:30-32
yield tables for, III:41t
III:353
Mean absolute moment (MAM(q)),
III:353
Mean colog (M-colog), III:354
Mean entropy (M-entropy), III:354
Mean excess function, II:746-747
Mean/first moment, III:201-202
Mean reversion
discussion of, I:88-92
geometric, I:91-92
in HW models, III:605
and market stability, III:537-538
models of, I:97
parameter estimation, I:90-91
risk-neutral asset model, III:526
simulation of, I:90
in spot rate models, III:580
stabilization by, III:538
Mean-reverting asset model (MRAM),
Means, I:148, I:155, I:380, III:166-167
Mean-variance
efficiency, I:190-191
efficient portfolios, I:13, I:68, I:69-70
nonrobust formulation, III:139-140
optimization, I:192
constraints on, I:191
estimation errors and, I:17-18
practical problems in, I:190-194
risk aversion formulation, II:70
Mean variance analysis, I:3, I:15f,
Median tail loss (MTL), III:341
default probabilities with, I:307-308
drawbacks of, I:410
as first modern structural model,
calculation of expected/unexpected
expected loss under, I:373-374

Miller, Merton, II:467, II:473
MiniMax (MM) risk measure, III:356
Minimization problems, solutions to,
I:69
Mispricing, risk of, II:691-692
Model creep, II:694
framework, III:278
calibration of structure, III:549-550
changes in mathematical, II:480-481
nonlinear time series, II:427-428,
nonlinear, II:402-421, II:417-418
penalty functions in, II:703
performance measurement of, II:301
predictive regressive, II:130
predictive return, II:128-131
for pricing, II:127-128
pricing errors in, I:322
principles for engineering,
II:482-483
probabilistic, II:299
properties of good, I:320
ranking alternative, III:368-370
recalibration of, II:713-714
reduced form default, I:310, I:313
regressive, II:128, II:129-130
relative valuation, I:260
return forecasting, II:119
returns of, II:233t
robustness of, II:301
selection of, I:145, II:298, II:692-693,
II:699-701
short-rate, I:494
single-index market, II:317-318
static, II:297, III:573
usefulness of, II:122
binomial, III:610, III:610f
Black-Karasinski (BK) lattice, III:611
Hull White trinomial, III:613
trinomial, III:610, III:610f,
components of, II:717
importance of, II:700
II:701-703, II:717
uncertainty/noise in, II:716-717
Modified tempered stable (MTS)
processes, III:513
Modigliani, Franco, II:467, II:473
Modigliani-Miller theorem, I:343,
I:344, II:473, II:476
Moment ratio estimators, III:274
exponential, III:255-256
first, III:201-202
of higher order, III:202-205
integration of, II:367-368
raw, II:739
second, III:202
types of, II:125
Momentum
formula for analysis of, II:239
portfolios based on, II:181
Momentum factor, II:226-227
Money, future value of, II:596-600
Money funds, European options on,
I:498-499
Money markets, I:279, I:282, I:314,
Monotonicity property, III:327
Monte Carlo methods
advantages of, II:672
approach to estimation, I:193
defined, I:273
examples of, III:637-639
foundations of, I:377-378
for interest rate structure, I:494
main ideas of, III:637-642
usefulness of, I:389
use of, I:266-268, III:651
effect of sampling process, I:384
sequences in, I:378-379
speed of, III:644
use of, III:10-11, III:642
Moody's Investors Service, I:362
Moody's KMV, I:364-365
Mortgage-backed securities (MBS). See
MBS (mortgage-backed
securities)
Mortgage Bankers Association (MBA)
method, III:57-58
Mortgage pools
composition of, III:52
defined, III:23, III:65
nonperforming loans and, III:75
seasoning of, III:20, III:22
Mortgages, III:48-49, III:65, III:69,
III:71
changes of, II:723f
Mossin, Jan, II:468, II:474
Moving averages, infinite, II:504-508
MSCI Barra model, II:140
MSCI World Index, I:15-17
analysis of 18 countries, I:16t
MS GARCH model, I:185-186
estimation of, I:182
sampling algorithm for, I:184
MSR (maximum Sharpe Ratio), I:36-37
MS-VAR models, II:131
Multiaccount optimization, II:75-77
Multicollinearity, II:221
III:191-192
MATLAB, III:432-433, III:433f
Multivariate stationary series,
II:506-507
Multivariate t distribution, loss
simulation, I:388-389

Nadaraya-Watson estimator, II:412, II:415
Near-misses, management of, III:84-85
Net cost of carry, I:424-425, I:428, I:437, I:439-440, I:455
Net free cash flow (NFCF), II:572-574, II:578
Net profit margin, II:556
II:554-555
Network investment models, III:129-130, III:129f
Neumann boundary condition, II:666, II:671
Neural networks, II:403, II:418-421, II:418f, II:701-702
Newey-West corrections, II:220
NIG distribution, III:257n
9/11 attacks, effects of, III:402-403
No-arbitrage condition, in certain economy, III:567-568
No arbitrage models, use of, III:604
No-arbitrage relations, I:423
Noise
continuous-time, III:486
in financial models, II:721-722
in model selection, II:716-717
models for, II:726
reduction of, II:51-52
Noise, white
defined, I:82, II:297
qualities of, II:127
sequences, II:312, II:313
in stochastic differential equations, III:486
strict, II:125
vs. colored noise, III:275
Nonlinear additive AR (NAAR) model, II:417
Nonlinear dynamics and chaos, II:645, II:652-654
Nonlinearity, II:433
in econometrics, II:401-403
tests of, II:421-427
Non-normal probability distributions, II:480
Nonparametric methods, II:411-416
Normal distributions, I:81, I:82f, I:177-178, III:638f
and AVaR, III:334
comparison with α-stable, III:234f
fundamentals of, II:731-734
inverse Gaussian, III:231-233, III:232f, III:233f (See also Gaussian distribution)
likelihood function, I:142-143
for logarithmic returns, III:211-212
mixtures of for downside risk estimation, III:387-388
III:98-99
multivariate, and tail dependence, I:387
properties of, II:732-733, III:209-210
relaxing assumption of, I:386-387
standard, III:208
standardized residuals from, II:751
use of, II:752n
using to approximate binomial distribution, III:211
for various parameter values, III:209f
vs. normal inverse Gaussian distribution, III:232-233
Normal mean, and posterior tradeoff, I:158-159
Normal tempered stable (NTS) processes, III:513
Normative theory, I:3
Notes, step-up callable, I:251-252, I:251f, I:252f
Novikov condition, I:131-132
NTS distribution, III:257n
Null hypothesis, I:157, I:170, III:362
Numeraire, change of, III:588-589
Numerical approximation, I:265
Numerical models for bonds, I:273-275

OAS (option-adjusted spread). See option-adjusted spread
Obligations, deliverable, I:231, I:526
Observations, frequency of, III:404
Occam's razor, in model selection, II:696
Odds ratio, posterior, I:157
Office of Thrift Supervision (OTS) method, III:57-58
Oil industry, free cash flows of, II:570
OLS (ordinary least squares). See ordinary least squares (OLS)
Operating cycles, II:551-554
Operating profit margin, II:556
Student's t-test, II:219
Sturge's rule, II:495
Style analysis, II:189
Style factors, II:247
Style indexes, II:48
Stylized facts, II:503-504
Subadditivity property, III:328
Subordinated processes, I:186n, III:277, III:521-522


Successive over relaxation (SOR) method, II:677
Summation stability property (Gaussian distribution), II:732-733
Supervisory Capital Assessment Program, I:300, I:412
Support, defined, III:200
Survey bias, I:293
Survival probability, I:533-535
Swap agreements, I:434, I:435-436n
Swap curves, I:226, II:275-276
Swap rates, I:226, III:536f
Swaps
with change of time method, III:522
covariance/correlation, I:547-548, I:549-550, I:552
duration-matched, I:285
freight rate, I:558
modeling and pricing of, I:548-550
summary of studies on, I:546f
valuing of, I:434-435
Swap spread (SS) risk, II:278, II:278t
Swaptions, I:502-503, III:550
Synergies, in conglomerates, II:43-44
Systematic risk, II:290
Systems
homogenous, II:624
linear, II:624
types of, II:47, II:58

Tailing the hedge, defined, I:433
Tail losses
in loss functions, III:369-370
Tail probability, III:320
Tail risk, I:377, I:385, II:752
Tails
across assets through time, II:735-736
behavior of in operational losses, III:111-112
in density functions, III:203
dependence, I:327-328, I:387
Gaussian, III:98-99, III:260
heavy, II:734-744, III:238
modeling heaviness of, II:742-743
for normal and STS distributions, III:246t
power tail decay property, II:739, III:244
properties of, III:261-262
tempering of, II:741
Takeovers, probability of, I:144-145
Tangential contour lines, I:29-30, I:30f, I:32f
Tanker market, I:565
TAR-F test, II:426
TAR(1) series, simulated time plot of, II:404f


Tatonnement, concept of, II:468
Taxes
and bonds, I:226
capital gains, II:73
cash, II:573
for cash/futures transactions, I:484
complexity of, II:73-74
deferred income, II:535, II:538
effect on returns, II:83-84, II:84, II:85n
in financial statements, II:541
impact of, I:286-287
incorporating expense of, II:73-75
managing implications of, III:146
and Treasury strips, I:218
Tax policy risk, II:282-283
Technology, effect of on relative values, II:37
Telescoping futures strategy, I:433
Tempered stable distributions
discussions of, III:246-252, III:384-386
generalized (GTS), III:249
Kim-Rachev (KRTS), III:251-252
modified (MTS), III:249-250
normal (NTS), III:250-251
probability densities of, III:247f, III:248f, III:250f, III:252f
rapidly decreasing (RDTS), III:252
tempering function in, III:254, III:258n
Tempered stable processes, III:499-501, III:500t, III:512-517
Tempering functions, III:254, III:255t
Templates, for data storage, II:204
Terminal profit, options and forwards, I:438f, I:439f
Terminal values, II:45
Terminology
of delinquency, default and loss, III:56
of prepayment, III:49-50
standard, of tree models, II:376
Term structure
in contiguous time, III:572-573
continuous time models of, III:570-571
defined, III:560
eclectic theory of, III:570
of forward rates, III:586
mathematical relationships of, III:562
modeling of, I:490-494, III:560
of partial differential equations, III:583-584
in real world, III:568-570
Term structure modeling
applications of, III:584-586
arbitrage-free, III:594
calibration of, III:580-581
discount function in, III:565
discussion of, III:560-561
Term structure models
approaches to, III:603-604
defined, I:262, I:263
discrete time, III:562-563
discussion of, III:561-562
of interest rates, I:314
internal consistency checks for, III:581
with no mean reversion, III:613-616
for OAS, I:265-267
quantitative, III:563
static vs. dynamic, III:561-562
Term structures, III:567-568, III:570, III:579, III:587

Tests
Anderson-Darling (AD), III:112-113
BDS statistic, II:423-424, II:427
bispectral, II:422-423
cointegration, II:708-710
Kolmogorov-Smirnov (KS), III:112-113
monotonic relation (MR), II:219
nonlinearity, II:426-427, II:427f
nonparametric, II:422-424
out-of-sample vs. in-sample, II:236
parametric, II:424-426
RESET, II:424-425
run tests, III:364
threshold, II:425-426
for uniformity, III:366
TEV (tracking error volatility), II:180, II:186, II:272-274, II:286-287
Theil-Sen regression algorithm, II:440-442, II:443-446, II:444f
The Internal Measurement Approach (BIS), III:100n
Theoretical value, determination of, III:10-11
Theorie de la Speculation (The Theory of Speculation) (Bachelier), II:121-122, II:469
Theory of point processes, II:470-471
Three Mile Island power plant crisis, II:51-52
Three-stage growth model, II:9-10
Threshold autoregressive (TAR) models, II:404-408
Thresholds, II:746-747
Through the cycle, defined, I:302-303, I:309-310
Thurstone, Louis Leon, II:154
Tick data. See high-frequency data (HFD)


Time
in differential equations, II:643-644
physical vs. intrinsic scales of, II:742
use of for financial data, II:546-547
Time aggregation, II:369
Time decay, I:509, I:513, I:521f
Time dependency, capture of, II:362-363
Time discretization, II:666, II:679
Time increments
models of, I:79
in parameter estimation, I:83
Time intervals, size of, II:300-301
Time lags, II:299-300
Time points, spacing of, II:501
Time premiums, I:485
Time series
autocorrelation of, II:331
causal, II:504
concepts of, II:501-503
continuity of, I:80
defined, II:501-502, II:519
fractal nature of, III:480
importance of, II:360
multivariate, II:502
stationary, II:502
stationary/nonstationary, II:299
for stock prices, II:296
Time to expiry, I:513
Time value, I:513, I:513f, II:595-596
TIPS (Treasury inflation-protected securities)
and after-tax inflation risk, I:287
apparent real yield premium, I:293f
effect of inflation and flexible price CPI, I:292f
features of, I:277
and flexible price CPI, I:291f
and inflation, I:290, I:294
performance link with short-term inflation, I:291-292
real yields on, I:278
spread to nominal yield curve, I:281f
volatility of, I:288-290, I:294
vs. real yield, I:293-294
10-year data, I:279-280
yield of, I:284
yields from, I:278
TLF model, strengths of, III:388-389
Total asset turnover ratio, II:558
Total return reports, II:237t
Total return swaps, I:540-542, I:541-542

Trace test statistic, II:392
Tracking error
actual vs. predicted, II:69
alternate definitions of, II:67-68
defined, II:115, II:119
estimates of future, II:69
as measure of consistency, II:99-100
reduction of, II:262-263
standard definition, II:67
with TIPS, I:293
Tracking error volatility (TEV). See TEV (tracking error volatility)
Trade optimizers, role of, II:116-117
Trades
amount needed for market impact, III:624
cash-and-carry, I:487
crossing of, II:75
importance of execution of, III:623, III:631
measurement of size, III:628
in portfolio construction, II:104, II:116-117
round-trip time of, II:451
size effects of, III:372, III:630
speed of, II:105
timing of, III:628-629
Trading costs, II:118, III:627-628, III:631-632
Trading gains, defined, I:122, I:123
Trading horizons, extending, III:624
Trading lists, II:289f
Trading strategies
backtesting of, II:236-237
categories of, II:195
in continuous-state, continuous-time, I:122
development of factor-based, II:197-198, II:211
factor-based, II:195, II:232-235
factor weights in, II:233f
in multiperiod settings, I:105
risk to, II:198-200
self-financing, I:126-127, I:136
Trading venues, electronic, II:57
Training windows, moving, II:713-714
Tranches, III:38, III:39t, III:45
Transaction costs
in backtesting, II:235
in benchmarking, II:67
components of, II:119
consideration of, II:64, II:85-86n
dimensions of, III:631
effect of, I:483
figuring, II:85n
fixed, II:72-73
forecasting of, II:113-114
incorporation of, II:69-73, II:84
international, III:629
linear, II:70
and liquidity, III:624-625
managing, III:146
measurement of, III:626
piecewise-linear, II:70-72, II:71f
quadratic, II:72
in risk modeling, II:693
types of, III:623
Transformations, nonlinear, III:630-631
Transition probabilities, I:368, I:381f
Treasuries
correlations of, III:405f
covariance matrix of, III:406f
curve risk, II:277t
discount function for, III:564-565
futures, I:482
inflation-indexed, I:286
movements of, III:403f
on-the-run, I:227, III:7, III:560
par yield curve, I:218f
spot rates, I:220
3-month, II:415-416, II:416f
volatility of, III:404-406, III:406f
Treasury bill rates, weekly data, I:89f
Treasury inflation-protected securities (TIPS). See TIPS (Treasury inflation-protected securities)
Treasury Regulation T (Reg T), I:67
Treasury securities, I:210-211
comparable, defined, III:5
in futures contracts, I:483
hypothetical, illustration of duration/convexity, III:308-310, III:308t
maturities of, I:226
options on, I:490
par rates for, I:217
prediction of 10-year yield, II:322-328
valuation of, I:216
yield of, II:324-327t
Treasury strips, I:218t, I:220-221, I:286, III:560
Treasury yield curves, I:226, III:561
Trees/lattices
adjusted to current market price, I:496f
bushy trees, I:265, I:266f
calibrated, I:495
convertible bond value, I:274-275
extended pricing tree, III:23f
from historical data, III:131f
pruning of, II:377
stock price, I:274
three-period scenario, III:131f
trinomial, I:81, I:273, I:495-496
use of in modeling, I:494-496
Trees/lattices, binomial
building of, I:273
for convertible bonds, I:275f
discussion of, I:80-81
interest rate, I:244
model of, I:273-275
stock price model, III:173
term structure evolution, I:495f
use of, I:114-115, I:114f
Trends
deterministic, II:383
in financial time series, II:504
and integrated series, II:512-514
stochastic, II:383, II:384
Treynor-Black model, I:203n
Trinomial stochastic models, II:11-12
Truncated Levy flight (TLF), III:382, III:384-386
IDD in, III:386
time scaling of, III:385f
Truncation, III:385-386
Truth in Savings Act, II:615
T-statistic, II:240n, II:336, II:350, II:390
Tuple, defined, III:157
Turnover
assessment of, III:68
defined, III:66
in MBSs, III:48
in portfolios, II:234, II:235
Two beta trap, I:74-77
Two-factor models, III:553-554
Two-stage growth model, II:9

U.K. index-linked gilts, tax treatment of, I:287
Uncertainties
and Bayesian statistics, I:140
in measurement processes, II:367
modeling of, II:306, III:124, III:131-132
and model risk, II:729
quantification of, I:101
representation of, III:128
time behavior of, II:359
Uncertainty sets
effect of size of, III:143
in portfolio allocation, II:80
selection of, III:140-141
structured, III:143-144
in three dimensions, II:81f
use of, III:138, III:140
Uncertain volatility model, II:673-674
Underperformance, finding reasons for, II:118
Underwater, on homeowner's equity, III:73
Unemployment rate
as an economic measure, II:398
application of TAR models to, II:405-406
characteristics of series, II:430
forecasts from, II:433
performance of forecasting, II:432-433, II:432f
and risk, II:292n
test of nonlinearity, II:431, II:431f
time plot of, II:406f, II:430f
Uniqueness, theorem of, III:490
Unit root series, II:385
Univariate linear regression model, I:163-170
Univariate stationary series, II:504
U.S. Bankruptcy Code. See also bankruptcy
Chapter 7, I:350
Chapter 11, I:342, I:350
Utility, I:56, II:469, II:471, II:719-720

Validation, out of sample, II:711
Valuation
arbitrage-free, I:216-217, I:220-222, I:221t
and cash flows, I:223
defined, I:209
effect of business cycle on, I:303-304
fundamental principle of, I:209
with Monte Carlo simulation, III:6-12
of natural gas/oil storage, I:560-561
of non-Treasury securities, I:222-223
relative, I:225, II:34-40, II:44-45
risk-neutral, I:557, III:595-596, III:601
total firm, II:21-23
uncertainty in, II:15
use of lattices for, I:240
Value
absolute vs. relative basis of, I:259-260
analysis of relative, I:225
arbitrage-free, I:221
book vs. market of firms, II:559-560
determining present, II:600-601
formulas for analysis of, II:238-239
identification of relative, I:405
intrinsic, I:484-485
present, discounted, II:601f
relative, I:405, II:37-38
vs. price, I:455n

Value at risk (VaR). See also CVaR (credit value at risk)
in backtesting, II:748
backtesting of, II:749f, III:325-327, III:365-367
boxplot of, III:325f
and coherent risk measures, III:329
conditional, III:332, III:355-356, III:382
deficiencies in, I:407, III:321, III:331-332, III:347
defined, II:754n, III:319-322
density and distribution functions, III:320f
determining from simulation, III:639f
distribution-free confidence intervals for, III:292-293
estimation of, II:366, III:289-290, III:373-376, III:644, III:644f
exceedances of, III:325-326
IDD in, III:290
interest rate covariance matrix in, III:403
levels of confidence with, III:290-291
liquidity-adjusted, III:374, III:376
in low market volatility, II:748
measurements by, II:354
methods of computation, III:323
modeling of, II:130-131, III:375-376
and model risk, II:695
normal against confidence level, III:294f
portfolio problem, I:193
in practice, III:321-325
relative spreads between predictions, II:750f, II:751f, II:752f
as safety-first risk measure, III:355
standard normal distribution of, III:324t
use of, II:365
vs. deviation measures, III:320-321
Value of operations, process for finding, II:307
Values, lagged, II:130
Van der Corput sequences, III:650
Variables
antithetic, III:647-648
application of macro, II:193n
behavior of, III:152-153
categorical, II:333-334, II:350
classification, II:176
declaration of in VBA, III:457-458
dependence between, II:306-307
dependent categorical, II:348-350
dependent/independent in CAPM, I:67
dichotomous, II:350
dummy, II:334
exogenous vs. endogenous, II:692
fat-tailed, III:280
independent and identically distributed, II:125
independent categorical, II:333-348
interactions between, II:378
large numbers of, II:147
macroeconomic, II:54-55, II:177
in maximum likelihood calculations, II:312-313
mixing of categorical and quantitative, II:334-335
nonstationary, II:388-393
as observation or measurement, II:306
random, I:159n
in regression analysis, II:330
separable, II:647
slope, III:553
split formation of, III:130f
spread, II:336
standardization of, II:205
stationary, II:385, II:386
stationary/nonstationary, II:384-386
stochastic, III:159-164
use of dummy, II:335, II:343-344
Variables, random, II:297
α-stable, III:242-244, III:244-245
Bernoulli, III:169
continuous, III:200-201, III:205-206
on countable spaces, III:160-161, III:166
defined, III:162
discrete, III:165
infinitely divisible, III:253
in probability, III:159-164
sequences of, I:389
on uncountable spaces, III:161-162
use of, I:82
Variance gamma process, III:499, III:504
Variance matrix, II:370-371
Variances
addressing inequality of, I:168
based on covariance matrix, II:161t, II:163t, II:164f
conditional, I:180
conditional/unconditional, II:361
in dispersion parameters, III:202-203
equal, I:164
as measure of risk, I:8
in probability, III:167-169
reduction in, III:647-651
unequal, I:167-168, I:172
Variances/covariances, II:112-113, II:302-303, III:395-396
Variance swaps, I:545-547, I:549, I:552
Variational formulation, and finite element space, II:670-672
Variation margins, I:478
Vasicek model
with change of time, III:523-524
for coupon-bond call options, I:501-502
distribution of, I:493
in history, I:491
for short rates, III:545-546
use of, I:89, I:497
valuing zero-coupon bond calls with, I:499-500

VBA (Visual Basic for Applications)
built-in numeric functions of, III:456
comments in, III:453
control flow statements, III:458-460
debugging in, III:461
debugging tools of, III:461, III:477
example programs, III:449-452, III:461-466
in Excel, III:449, III:450f
FactorialFun1, III:455-456
functions, user-defined, III:463f
functions in, III:477
generating Brownian motion paths in, III:463-465
If statements, III:459
For loops, III:458-459
methods (actions) in, III:452-453
modules, defined, III:455
as object-oriented language, III:452, III:466
objects in, III:452
operators in, III:459-460
Option Explicit command, III:458
pricing European call options, III:465-466
programming of input dialog boxes, III:460-461
programming tips for, III:454-461
properties in, III:453
random numbers in, III:464-465
subroutines and user-defined functions in, III:466-477
subroutines vs. user-defined functions in, III:455-457
use of Option Explicit command, III:458
user-defined functions, III:463f
user interaction with, III:460-461
variable declaration in, III:457-458
With/End structure in, III:453-454
writing code in, III:453-454
Vech notation, II:371-372
VEC model, II:372
Vector autoregressive (VAR) model, II:393
Vectors, II:621-622, II:625-626, II:628
Vega, I:521
Vichara Technology, III:41-42, III:437
Visual Basic for Applications (VBA). See VBA
Volatilities
absolute vs. relative, III:404-405
actual, I:514
aim of models of, I:176
analysis of, II:270-272
and ARCH models, II:409
assumptions about, III:7
calculation of, II:272, III:534f
calculation of daily, III:533-534
calibration of local, II:681-685
clustering of, II:359, II:716, III:402
confidence intervals for, III:399-400
constant, III:653
decisions for measuring, III:403-404
defined, III:533, III:653
with different mean reversions, III:538f
of the diffusion, I:125
effect of local, III:609
effect on hedging, I:517-518
of energy commodities, I:556-557
estimation of, II:368-369
in EWMA estimates, III:410-411
exposure to, II:252f, II:252f
forecasts of, I:179-180, II:172, II:367-368
in FTSE 100, III:412-413
historical, I:513, III:534, III:654
hypothetical modelers of, III:408
implied, I:513-514, II:282, II:662, III:654
in interest rate structure models, I:492
jump-diffusion, III:657
level-dependent, III:654-655, III:656
local, II:681, II:682-683, III:655
as a measure, I:545, II:373
measurement of, I:393, III:403-406
minimization of, II:179
in models, II:302
models of, II:428
in option pricing, I:513-514
patterns in, I:395
in random walks, I:84
and risk, II:270
in risk-neutral measures, III:587
smile of, III:557
and the smoothing constant, III:409-410
states of, I:180-181
stochastic, I:94, I:547, I:548, III:655-658, III:656, III:658
stochastic models, II:681
time increments of, I:83
of time series, I:80
time-varying, II:733-734
types of, III:658
vs. annual standard deviation, III:534
Volatility clustering, III:242, III:388
Volatility curves, III:534-535, III:535f


Volatility measures, nonstochastic, III:654-655
Volatility multiples, use of, III:536
Volatility risk, I:509
Volatility skew, III:550, III:551f, III:555-556, III:654
measuring, III:550
Volatility smile, II:681, III:555-557, III:556f, III:654, III:656
Volatility swaps, I:545-547, I:552
for S&P Canada index (example), I:550-552
valuing of, I:549
Volume-weighted average price (VWAP), II:117, III:626-627
VPRs (voluntary prepayment rates)
calculation of, III:76
in cash flow calculators, III:34
defined, III:30
impacts of, III:38

W. T. Grant, cash flows of, II:576
Waldrop, Mitchell, II:699
Wal-Mart, II:569, II:570f
Walras, Leon, II:467, II:468-469, II:474
Waterfalls, development of, III:8
Weak laws of large numbers (WLLN), III:263
Wealth, I:460f, III:130
Weather, as chaotic system, II:653
Weibull density, III:107f
Weibull distributions, III:106-107, III:112, III:229, III:262, III:265, III:267, III:268
Weighting, efficient, I:41-42
Weights, II:115, II:185t, II:231-232, II:724
Weirton Steel, cash flows of, II:577f
What's the hedge, I:300, I:303, I:306, I:417. See also hedge test
White noise. See noise, white
Wiener processes, I:95, I:491, I:497, III:534-535, III:579, III:581
Wilson, Kenneth, II:480
Wind farms, valuation of, I:563-564
Wold representation, II:506
Working capital, II:551
concept of, II:567

XML (Extensible Markup Language), development of, II:482

Yield and bond loss matrix, III:41t
Yield curve risk, III:307, III:316-317
Yield curves
horizon, III:585
initial consistency with, III:544
issuer par, I:238f, I:244f
nonparallel, III:309-310
parallel shifts in, III:308-309
par-coupon, III:585
reshaping duration, III:315-316
in scenario analysis, II:290
SEDUR/LEDUR, III:316, III:317
shifts in, III:586
slope of, III:315
in term structures, III:560
in valuation, I:235
Yields
calculation of, II:613-618
Preface


It is often said that investment management is an art, not a science. However, since the early 1990s the market has witnessed a progressive shift toward a more industrial view of the investment management process. There are several reasons for this change. First, with globalization the universe of investable assets has grown many times over. Asset managers might have to choose from among several thousand possible investments from around the globe. Second, institutional investors, often together with their consultants, have encouraged asset management firms to adopt an increasingly structured process with documented steps and measurable results. Pressure from regulators and the media is another factor. Finally, the sheer size of the markets makes it imperative to adopt safe and repeatable methodologies.

In its modern sense, financial modeling is the design (or engineering) of financial instruments and portfolios of financial instruments that result in predetermined cash flows contingent upon different events. Broadly speaking, financial models are employed to manage investment portfolios and risk. The objective is the transfer of risk from one entity to another via appropriate financial arrangements. Though the aggregate risk is a quantity that cannot be altered, risk can be transferred if there is a willing counterparty.

Financial modeling came to the forefront of finance in the 1980s, with the broad diffusion of derivative instruments. However, the concept and practice of financial modeling are quite old. The notion of the diversification of risk (central to modern risk management) and the quantification of insurance risk (a requisite for pricing insurance policies) were already understood, at least in practical terms, in the 14th century. The rich epistolary of Francesco Datini, a 14th-century merchant, banker, and insurer from Prato (Tuscany, Italy), contains detailed instructions to his agents on how to diversify risk and insure cargo.

What is specific to modern financial modeling is the quantitative management of risk. Both the pricing of contracts and the optimization of investments require some basic capabilities of statistical modeling of financial contingencies. It is the size, diversity, and efficiency of modern competitive markets that make the use of financial modeling imperative.

This three-volume encyclopedia not only covers the fundamentals of and advances in financial modeling but also provides the mathematical and statistical techniques needed to develop and test financial models, and addresses the practical issues associated with implementation. The encyclopedia offers the following unique features:

• The entries for the encyclopedia were written by experts from around the world. This diverse collection of expertise has created the most definitive coverage of established and cutting-edge financial models, applications, and tools in this ever-evolving field.

• The series emphasizes both technical and managerial issues. This approach provides researchers, educators, students, and practitioners with a balanced understanding of the topics and the necessary background to deal with issues related to financial modeling.

• Each entry follows a format that includes the author, entry abstract, introduction, body, listing of key points, notes, and references. This enables readers to pick and choose among various sections of an entry, and creates consistency throughout the entire encyclopedia.

• The numerous illustrations and tables throughout the work highlight complex topics and assist further understanding.

• Each volume includes a complete table of contents and index for easy access to various parts of the encyclopedia.

TOPIC CATEGORIES 

As is the practice in the creation of an encyclopedia, the topic categories are presented alphabetically. The topic categories and a brief description of each topic follow.

VOLUME I 
Asset Allocation 

A major activity in the investment management process is establishing policy guidelines to satisfy the investment objectives. Setting policy begins with the asset allocation decision. That is, a decision must be made as to how the funds to be invested should be distributed among the major asset classes (e.g., equities, fixed income, and alternative asset classes). The term "asset allocation" includes (1) policy asset allocation, (2) dynamic asset allocation, and (3) tactical asset allocation. Policy asset allocation decisions can loosely be characterized as long-term asset allocation decisions, in which the investor seeks to assess an appropriate long-term "normal" asset mix that represents an ideal blend of controlled risk and enhanced return. In dynamic asset allocation the asset mix (i.e., the allocation among the asset classes) is mechanistically shifted in response to changing market conditions. Once the policy asset allocation has been established, the investor can turn his or her attention to the possibility of active departures from the normal asset mix established by policy. If a decision to deviate from this mix is based upon rigorous objective measures of value, it is often called tactical asset allocation. The fundamental model used in establishing the policy asset allocation is the mean-variance portfolio model formulated by Harry Markowitz in 1952, popularly referred to as the theory of portfolio selection and modern portfolio theory.

Asset Pricing Models 

Asset pricing models seek to formalize the relationship that should exist between asset returns and risk if investors behave in a hypothesized manner. At its most basic level, asset pricing is mainly about transforming asset payoffs into prices. The two best-known asset pricing models are the arbitrage pricing theory and the capital asset pricing model. The fundamental theorem of asset pricing asserts the equivalence of three key issues in finance: (1) absence of arbitrage; (2) existence of a positive linear pricing rule; and (3) existence of an investor who prefers more to less and who has maximized his or her utility. There are two types of arbitrage opportunities. The first is paying nothing today and obtaining something in the future, and the second is obtaining something today with no future obligations. Although the principle of absence of arbitrage is fundamental for understanding asset valuation in a competitive market, there are well-known limits to arbitrage resulting from restrictions imposed on rational traders, and, as a result, pricing inefficiencies may exist for a period of time.

Bayesian Analysis and Financial 
Modeling Applications 

Financial models describe in mathematical 
terms the relationships between financial 
random variables through time and / or across 
assets. The fundamental assumption is that the 
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model relationship is valid independent of the 
time period or the asset class under consider¬ 
ation. Financial data contain both meaningful 
information and random noise. An adequate 
financial model not only extracts optimally the 
relevant information from the historical data 
but also performs well when tested with new 
data. The uncertainty brought about by the 
presence of data noise makes imperative the use 
of statistical analysis as part of the process of fi¬ 
nancial model building, model evaluation, and 
model testing. Statistical analysis is employed 
from the vantage point of either of the two main 
statistical philosophical traditions—frequentist 
and Bayesian. An important difference be¬ 
tween the two lies with the interpretation of the 
concept of probability. As the name suggests, 
advocates of the frequentist approach interpret 
the probability of an event as the limit of its 
long-run relative frequency (i.e., the frequency 
with which it occurs as the amount of data in¬ 
creases without bound). Since the time financial 
models became a mainstream tool to aid in un¬ 
derstanding financial markets and formulating 
investment strategies, the framework applied 
in finance has been the frequentist approach. 
However, strict adherence to this interpretation 
is not always possible in practice. When study¬ 
ing rare events, for instance, large samples of 
data may not be available, and in such cases 
proponents of frequentist statistics resort to 
theoretical results. The Bayesian view of the 
world is based on the subjectivist interpretation 
of probability: Probability is subjective, a de¬ 
gree of belief that is updated as information or 
data are acquired. Only in the last two decades 
has Bayesian statistics started to gain greater 
acceptance in financial modeling, despite its 
introduction about 250 years ago. It has been 
the advancements of computing power and the 
development of new computational methods 
that have fostered the growing use of Bayesian 
statistics in financial modeling. 
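
The idea of a degree of belief that is updated as data are acquired can be made concrete with a conjugate prior. The following is only a minimal sketch: the Beta prior, the binomial data, and all the numbers are hypothetical choices made for illustration and do not come from the text.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with
    binomial observations gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: the probability of a rare event is about 2%.
prior = (2.0, 98.0)

# Hypothetical data: the event occurred 3 times in 100 observations.
posterior = update_beta(*prior, successes=3, failures=97)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)

print(f"prior mean     = {prior_mean:.4f}")
print(f"posterior mean = {posterior_mean:.4f}")
```

The posterior mean lies between the prior mean and the observed frequency, which is how the subjective belief and the data are blended when the sample is small.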

Bond Valuation 

The value of any financial asset is the present 
value of its expected future cash flows. To value
a bond (also referred to as a fixed-income secu¬ 
rity), one must be able to estimate the bond's 
remaining cash flows and identify the appro¬ 
priate discount rate(s) at which to discount the 
cash flows. The traditional approach to bond 
valuation is to discount every cash flow with 
the same discount rate. Simply put, the relevant
term structure of interest rates used in valuation
is assumed to be flat. This approach,
however, permits opportunities for arbitrage. 
Alternatively, the arbitrage-free valuation ap¬ 
proach starts with the premise that a bond 
should be viewed as a portfolio or package 
of zero-coupon bonds. Moreover, each of the
bond's cash flows is valued using a unique discount
rate that depends on the term structure of interest
rates and on when the cash flow is received. The
relevant set of discount rates (that is, spot rates)
is derived from an appropriate term structure of
interest rates and, when used to value risky bonds,
is augmented with a suitable risk spread or premium.
Rather than modeling a bond's fair value, one can
take the market price as given and compute a
yield measure or a spread measure.
Popular yield measures are the yield to matu¬ 
rity, yield to call, yield to put, and cash flow 
yield. Nominal spread, static (or zero-volatility) 
spread, and option-adjusted spread are popu¬ 
lar relative value measures quoted in the bond 
market. Complications in bond valuation arise 
when a bond has one or more embedded op¬ 
tions such as call, put, or conversion features. 
For bonds with embedded options, the finan¬ 
cial modeling draws from options theory, more 
specifically, the use of the lattice model to value 
a bond with embedded options. 
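
The contrast between the traditional and the arbitrage-free approach for an option-free bond can be sketched in a few lines. This is an illustration under stated assumptions: annual cash flows, annual compounding, and hypothetical spot rates; the function names are invented here, and embedded options are ignored.

```python
def price_flat(cash_flows, discount_rate):
    """Traditional approach: every cash flow is discounted at one rate."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def price_spot(cash_flows, spot_rates, spread=0.0):
    """Arbitrage-free approach: the cash flow at year t is discounted at
    its own spot rate, plus an optional risk spread for risky bonds."""
    return sum(cf / (1 + s + spread) ** t
               for t, (cf, s) in enumerate(zip(cash_flows, spot_rates), start=1))


# Hypothetical 3-year bond, 5% annual coupon, face value 100.
cash_flows = [5.0, 5.0, 105.0]
# Hypothetical upward-sloping spot curve.
spot_rates = [0.03, 0.04, 0.05]

flat_value = price_flat(cash_flows, 0.05)
spot_value = price_spot(cash_flows, spot_rates)
risky_value = price_spot(cash_flows, spot_rates, spread=0.01)

print(f"flat 5%      : {flat_value:.4f}")
print(f"spot curve   : {spot_value:.4f}")
print(f"spot + spread: {risky_value:.4f}")
```

When the spot curve is not flat, the two approaches give different values for the same cash flows, which is the gap an arbitrageur could exploit by stripping the bond into zero-coupon pieces.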

Credit Risk Modeling 

Credit risk is a broad term used to refer to three 
types of risk: default risk, credit spread risk, and 
downgrade risk. Default risk is the risk that the 
counterparty to a transaction will fail to satisfy 
the terms of the obligation with respect to the 
timely payment of interest and repayment of 
the amount borrowed. The counterparty could 
be the issuer of a debt obligation or an entity on
the other side of a private transaction such as a 
derivative trade or a collateralized loan agree¬ 
ment (i.e., a repurchase agreement or a secu¬ 
rities lending agreement). The default risk of 
a counterparty is often initially gauged by the 
credit rating assigned by one of the three rat¬ 
ing companies—Standard & Poor's, Moody's 
Investors Service, and Fitch Ratings. Although 
default risk is the one that most market partici¬ 
pants think of when reference is made to credit 
risk, even in the absence of default, investors 
are concerned about the decline in the market 
value of their portfolio bond holdings due to 
a change in credit spread or the price performance
of their holdings relative to a bond index.
When this risk is due to an adverse change in
credit spreads, it is referred to as credit spread
risk; when it is attributed solely to the downgrade
of the credit rating of an entity, it is called
downgrade risk. Financial modeling of credit risk is
used (1) to measure, monitor, and control a port¬ 
folio's credit risk, and (2) to price credit risky 
debt instruments. There are two general cate¬ 
gories of credit risk models: structural models 
and reduced-form models. There is consider¬ 
able debate as to which type of model is the 
best to employ. 

Derivatives Valuation 

A derivative instrument is a contract whose 
value depends on some underlying asset. The 
term "derivative" is used to describe this prod¬ 
uct because its value is derived from the value 
of the underlying asset. The underlying asset, 
simply referred to as the "underlying," can be 
either a commodity, a financial instrument, or 
some reference entity such as an interest rate or 
stock index, leading to the classification of com¬ 
modity derivatives and financial derivatives. 
Although there are close conceptual relations 
between derivative instruments and cash mar¬ 
ket instruments such as debt and equity, the two 
classes of instruments are used differently: Debt 
and equity are used primarily for raising funds 
from investors, while derivatives are primarily
used for dividing up and trading risks. More¬ 
over, debt and equity are direct claims against a 
firm's assets, while derivative instruments are 
usually claims on a third party. A derivative's 
value depends on the value of the underly¬ 
ing, but the derivative instrument itself repre¬ 
sents a claim on the "counterparty" to the trade. 
Derivative instruments are classified in terms
of their payoff characteristics: linear and nonlinear
payoffs. The former, also referred to as symmetric
payoff derivatives, include forward,
futures, and swap contracts, while the latter include
options. Basically, a linear payoff derivative
is a risk-sharing arrangement between the
counterparties since both are sharing the risk re¬ 
garding the price of the underlying. In contrast, 
nonlinear payoff derivative instruments (also 
referred to as asymmetric payoff derivatives) 
are insurance arrangements because one party 
to the trade is willing to insure the counter¬ 
party of a minimum or maximum (depending 
on the contract) price. The amount received by 
the insuring party is referred to as the contract 
price or premium. Derivative instruments are 
used for controlling risk exposure with respect 
to the underlying. Hedging is a special case of 
risk control where a party seeks to eliminate 
the risk exposure. Derivative valuation or pric¬ 
ing is developed based on no-arbitrage price 
relations, relying on the assumption that two 
perfect substitutes must have the same price. 
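
The difference between linear (symmetric) and nonlinear (asymmetric) payoffs can be seen by comparing a long forward with a long call option at expiry. The sketch below uses hypothetical prices and function names chosen for illustration; it shows payoffs only and is not a valuation model.

```python
def long_forward_payoff(price_at_expiry, forward_price):
    """Linear payoff: gains and losses mirror the underlying one for one."""
    return price_at_expiry - forward_price


def long_call_payoff(price_at_expiry, strike, premium=0.0):
    """Nonlinear payoff: the loss is capped at the premium paid to the
    insuring party, while the gain is not capped."""
    return max(price_at_expiry - strike, 0.0) - premium


# Hypothetical contract terms: forward price and strike of 100, premium of 3.
for price in (80.0, 90.0, 100.0, 110.0, 120.0):
    fwd = long_forward_payoff(price, 100.0)
    call = long_call_payoff(price, 100.0, premium=3.0)
    print(f"underlying={price:6.1f}  forward={fwd:7.1f}  call={call:7.1f}")
```

The forward holder shares the price risk in both directions, whereas the call holder has paid a premium to limit the downside, which is the insurance character described above.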

VOLUME II 

Difference Equations and Differential 
Equations 

The tools of linear difference equations and 
differential equations have found many ap¬ 
plications in finance. A difference equation is 
an equation that involves differences between 
successive values of a function of a discrete 
variable. A function of such a variable is 
one that provides a rule for assigning values 
in sequences to it. The theory of linear dif¬ 
ference equations covers three areas: solving 
difference equations, describing the behavior 
of difference equations, and identifying the
equilibrium (or critical value) and stability 
of difference equations. Linear difference 
equations are important in the context of dy¬ 
namic econometric models. Stochastic models 
in finance are expressed as linear difference 
equations with random disturbances added. 
Understanding the behavior of solutions of 
linear difference equations helps develop 
intuition for the behavior of these models. In 
nontechnical terms, differential equations are 
equations that express a relationship between 
a function and one or more derivatives (or 
differentials) of that function. Differential
equations are invaluable for modeling situations
in finance where a value changes continuously.
However, not all changes in value occur continuously.
If the change in value occurs incrementally rather
than continuously, differential equations
have their limitations, and a financial
modeler can instead use difference equations, which
are recursively defined sequences. It would
be difficult to overemphasize the importance 
of differential equations in financial modeling 
where they are used to express laws that govern 
the evolution of price probability distributions, 
the solution of economic variational problems 
(such as intertemporal optimization), and 
conditions for continuous hedging (such as in 
the Black-Scholes option pricing model). The 
two broad types of differential equations are 
ordinary differential equations and partial dif¬ 
ferential equations. The former are equations or 
systems of equations involving only one inde¬ 
pendent variable. Another way of saying this 
is that ordinary differential equations involve 
only total derivatives. Partial differential equa¬ 
tions are differential equations or systems of 
equations involving partial derivatives. When 
one or more of the variables is a stochastic pro¬ 
cess, we have the case of stochastic differential 
equations and the solution is also a stochastic 
process. An assumption must be made about
what drives the noise in a stochastic differential
equation. In most applications, it is assumed
that the noise term is a Gaussian random
variable, although other types of random
variables can be assumed.
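
The three themes of linear difference equations mentioned above (solution, behavior, and equilibrium and stability) can be illustrated with the simplest case, a first-order equation x(t+1) = a x(t) + b, optionally with a Gaussian disturbance added. The sketch below is an assumption-laden illustration: the coefficients, starting value, and function names are hypothetical.

```python
import random


def iterate(a, b, x0, steps, noise_sd=0.0, seed=0):
    """Generate the path of x(t+1) = a*x(t) + b + shock, where the shock
    is Gaussian with standard deviation noise_sd (zero means no noise)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        shock = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        path.append(a * path[-1] + b + shock)
    return path


def equilibrium(a, b):
    """Fixed point x* = b / (1 - a); it is stable only if |a| < 1."""
    if a == 1:
        raise ValueError("no unique equilibrium when a == 1")
    return b / (1 - a)


stable_path = iterate(a=0.5, b=1.0, x0=10.0, steps=60)
unstable_path = iterate(a=1.5, b=1.0, x0=10.0, steps=20)
noisy_path = iterate(a=0.5, b=1.0, x0=10.0, steps=60, noise_sd=0.1)

print("equilibrium:", equilibrium(0.5, 1.0))
print("stable path ends at:", round(stable_path[-1], 6))
print("unstable path ends at:", round(unstable_path[-1], 1))
print("noisy path ends at:", round(noisy_path[-1], 3))
```

With |a| < 1 the deterministic path converges to the equilibrium and the noisy path fluctuates around it; with |a| > 1 the path moves away from the equilibrium.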

Equity Models and Valuation 

Traditional fundamental equity analysis in¬ 
volves the analysis of a company's opera¬ 
tions for the purpose of assessing its economic 
prospects. The analysis begins with the finan¬ 
cial statements of the company in order to in¬ 
vestigate the earnings, cash flow, profitability, 
and debt burden. The fundamental analyst will 
look at the major product lines, the economic 
outlook for the products (including existing 
and potential competitors), and the industries 
in which the company operates. The result of 
this analysis will be the growth prospects of 
earnings. Based on the growth prospects 
of earnings, a fundamental analyst attempts 
to determine the fair value of the stock using 
one or more equity valuation models. The two 
most commonly used approaches for valuing a 
firm's equity are based on discounted cash flow 
and relative valuation models. The principal 
idea underlying discounted cash flow models 
is that what an investor pays for a share of stock 
should reflect what is expected to be received 
from it—return on the investor's investment. 
What an investor receives are cash dividends 
in the future. Therefore, the value of a share of 
stock should be equal to the present value of 
all the future cash flows an investor expects to 
receive from that share. To value stock, there¬ 
fore, an investor must project future cash flows, 
which, in turn, means projecting future divi¬ 
dends. Popular discounted cash flow models in¬ 
clude the basic dividend discount model, which 
assumes a constant dividend growth, and the 
multiple-phase models, which include the two- 
stage dividend growth model and the stochas¬ 
tic dividend discount models. Relative valua¬ 
tion methods use multiples or ratios—such as 
price/earnings, price/book, or price/free cash 
flow—to determine whether a stock is trad¬ 
ing at higher or lower multiples than its peers. 
There are two critical assumptions in using relative
valuation: (1) the firms selected for the peer
group are in fact comparable, and (2) the average
multiple across the
universe of firms can be treated as a reason¬ 
able approximation of "fair value" for those 
firms. This second assumption may be prob¬ 
lematic during periods of market panic or eu¬ 
phoria. Managers of quantitative equity firms 
employ techniques that allow them to identify 
attractive stock candidates, focusing not on a 
single stock as is done with traditional funda¬ 
mental analysis but rather on stock character¬ 
istics in order to explain why one stock out¬ 
performs another stock. They do so by statis¬ 
tically identifying a group of characteristics to 
create a quantitative selection model. In con¬ 
trast to the traditional fundamental stock se¬ 
lection, quantitative equity managers create a 
repeatable process that utilizes the stock selec¬ 
tion model to identify attractive stocks. Equity 
portfolio managers have used various statistical 
models for forecasting returns and risk. These 
models, referred to as predictive return models, 
make conditional forecasts of expected returns 
using the current information set. Predictive re¬ 
turn models include regressive models, linear 
autoregressive models, dynamic factor models, 
and hidden-variable models. 
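
The basic dividend discount model with constant dividend growth can be checked numerically against the present value of a long stream of projected dividends. This is a minimal sketch assuming annual dividends, a hypothetical required return and growth rate, and function names invented for illustration.

```python
def constant_growth_value(next_dividend, required_return, growth):
    """Value of a share when dividends grow at a constant rate forever:
    next_dividend / (required_return - growth)."""
    if required_return <= growth:
        raise ValueError("required return must exceed the growth rate")
    return next_dividend / (required_return - growth)


def discounted_dividends(next_dividend, required_return, growth, years):
    """Present value of the first `years` projected dividends."""
    return sum(
        next_dividend * (1 + growth) ** (t - 1) / (1 + required_return) ** t
        for t in range(1, years + 1)
    )


# Hypothetical inputs: next dividend 2.00, required return 8%, growth 3%.
closed_form = constant_growth_value(2.0, 0.08, 0.03)
long_sum = discounted_dividends(2.0, 0.08, 0.03, years=2000)

print(f"closed form         : {closed_form:.4f}")
print(f"2000-year cash flows: {long_sum:.4f}")
```

The closed form is simply the limit of the discounted dividend stream, so the model's output is only as good as the growth and discount rate assumptions fed into it.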

Factor Models and Portfolio 
Construction 

Quantitative asset managers typically employ 
multifactor risk models for the purpose of 
constructing and rebalancing portfolios and 
analyzing portfolio performance. A multifactor 
risk model, or simply factor model, attempts to 
estimate and characterize the risk of a portfolio, 
either relative to a benchmark such as a market 
index or in absolute value. The model allows 
the decomposition of risk into a systematic
and an idiosyncratic component. The
portfolio's risk exposure to broad risk factors 
is captured by the systematic risk. For equity
portfolios these are typically fundamental
factors (e.g., market capitalization and value
vs. growth), technical factors (e.g., momentum), and
industry, sector, and country factors. For fixed-income
portfolios, systematic risk captures a portfolio's 
exposure to broad risk factors such as the 
term structure of interest rates, credit spreads, 
optionality (call and prepayment), credit, and 
sectors. The portfolio's systematic risk depends 
not only on its exposure to these risk factors but 
also on the volatility of the risk factors and how
they correlate with each other. In contrast to 
systematic risk, idiosyncratic risk captures the 
uncertainty associated with news affecting the 
holdings of individual issuers in the portfolio. 
In equity portfolios, idiosyncratic risk can be 
easily diversified by reducing the importance 
of individual issuers in the portfolio. For bond
portfolios, however, this is a difficult task because
of the larger number of issuers in bond indexes.
There are different types of factor models depending on the
factors. Factors can be exogenous variables or 
abstract variables formed by portfolios. Exoge¬ 
nous factors (or known factors) can be identified 
from traditional fundamental analysis or from 
economic theory that suggests macroeconomic 
factors. Abstract factors, also called unidenti¬ 
fied or latent factors, can be determined with 
the statistical tool of factor analysis or principal 
component analysis. The simplest type of
factor model is one where the factors are assumed
to be known or observable, so that time-series
data on those factors can be used to
estimate the model. The four most commonly
used approaches for the evaluation of return 
premiums and risk characteristics to factors are 
portfolio sorts, factor models, factor portfolios, 
and information coefficients. Despite their use by
quantitative asset managers, the basic building 
blocks of factor models used by model builders 
and by traditional fundamental analysts are 
the same: They both seek to identify the drivers 
of returns for the asset class being analyzed. 
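
The split of portfolio risk into a systematic and an idiosyncratic component can be sketched for a linear factor model. The example assumes that idiosyncratic returns are uncorrelated across issuers and with the factors; the two-asset, two-factor inputs and the function name are hypothetical.

```python
def portfolio_risk(weights, exposures, factor_cov, specific_var):
    """Return (systematic variance, idiosyncratic variance) of a portfolio.

    exposures[i][k] is the exposure of asset i to factor k,
    factor_cov[j][k] is the covariance between factors j and k,
    specific_var[i] is the idiosyncratic variance of asset i.
    """
    n_factors = len(factor_cov)
    # Portfolio exposure to each factor: weighted sum of asset exposures.
    port_exp = [sum(w * row[k] for w, row in zip(weights, exposures))
                for k in range(n_factors)]
    # Systematic variance depends on exposures, factor volatilities,
    # and the correlations between factors.
    systematic = sum(port_exp[j] * factor_cov[j][k] * port_exp[k]
                     for j in range(n_factors) for k in range(n_factors))
    # Idiosyncratic variance shrinks as individual weights get smaller.
    idiosyncratic = sum(w * w * v for w, v in zip(weights, specific_var))
    return systematic, idiosyncratic


weights = [0.5, 0.5]
exposures = [[1.0, 0.2], [0.8, -0.1]]
factor_cov = [[0.04, 0.01], [0.01, 0.09]]
specific_var = [0.02, 0.03]

systematic, idiosyncratic = portfolio_risk(
    weights, exposures, factor_cov, specific_var)
print(f"systematic variance   : {systematic:.6f}")
print(f"idiosyncratic variance: {idiosyncratic:.6f}")
```

Because the idiosyncratic term is a sum of squared weights times issuer variances, spreading the same capital over more issuers reduces it, while the systematic term is unaffected by diversification across issuers with similar exposures.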

Financial Econometrics 

Econometrics is the branch of economics that 
draws heavily on statistics for testing and 
analyzing economic relationships. The economic
equivalent of the laws of physics,
econometrics represents the quantitative, mathematical
laws of economics. Financial econometrics
is the econometrics of financial markets.
It is a quest for models that describe financial 
time series such as prices, returns, interest rates, 
financial ratios, defaults, and so on. Although 
there are similarities between financial econo¬ 
metric models and models of the physical sci¬ 
ences, there are two important differences. First, 
the physical sciences aim at finding immutable 
laws of nature; econometric models model the 
economy or financial markets—artifacts subject 
to change. Because the economy and financial 
markets are artifacts subject to change, econo¬ 
metric models are not unique representations 
valid throughout time; they must adapt to the 
changing environment. Second, while basic 
physical laws are expressed as differential 
equations, financial econometrics uses both 
continuous-time and discrete-time models. 

Financial Modeling Principles 

The origins of financial modeling can be traced 
back to the development of mathematical equi¬ 
librium at the end of the nineteenth century, fol¬ 
lowed in the beginning of the twentieth century 
with the introduction of sophisticated mathe¬ 
matical tools for dealing with the uncertainty 
of prices and returns. In the 1950s and 1960s, 
financial modelers had tools for dealing with 
probabilistic models for describing markets, the 
principles of contingent claims analysis, an op¬ 
timization framework for portfolio selection 
based on mean and variance of asset returns, 
and an equilibrium model for pricing capital 
assets. The 1970s ushered in models for pricing 
contingent claims and a new model for pricing 
capital assets based on arbitrage pricing. Con¬ 
sequently, by the end of the 1970s, the frame¬ 
works for financial modeling were well known. 
It was the advancement of computing power 
and refinements of the theories to take into 
account real-world market imperfections and
conventions starting in the 1980s that facilitated
implementation and broader acceptance of
mathematical modeling of financial decisions.
The diffusion of low-cost high-performance
computers has allowed the broad use of numerical
methods, changing the landscape of financial modeling.
The importance of finding closed-form
solutions and the consequent search for simple 
models has been dramatically reduced. Com¬ 
putationally intensive methods such as Monte 
Carlo simulations and the numerical solution 
of differential equations are now widely used. 
As a consequence, it has become feasible to 
represent prices and returns with relatively 
complex models. Nonnormal probability dis¬ 
tributions have become commonplace in many 
sectors of financial modeling. It is fair to say 
that the key limitation of financial modeling is 
now the size of available data samples or train¬ 
ing sets, not the computations; it is the data 
that limit the complexity of estimates. Math¬ 
ematical modeling has also undergone major 
changes. Techniques such as equivalent martin¬ 
gale methods are being used in derivative pric¬ 
ing, and cointegration, the theory of fat-tailed 
processes, and state-space modeling (including 
ARCH/GARCH and stochastic volatility models)
are being used in financial modeling.

Financial Statement Analysis 

Much of the financial data that are used in 
constructing financial models for forecasting 
and valuation purposes are drawn from the financial
statements that companies are required to
provide to investors. The four basic financial 
statements are the balance sheet, the income 
statement, the statement of cash flows, and 
the statement of shareholders' equity. It is im¬ 
portant to understand these data so that the 
information conveyed by them is interpreted 
properly in financial modeling. The financial 
statements are created using several assump¬ 
tions that affect how to use and interpret the 
financial data. The analysis of financial state¬ 
ments involves the selection, evaluation, and 
interpretation of financial data and other pertinent
information to assist in evaluating the
operating performance and financial condition 
of a company. The operating performance of a 
company is a measure of how well a company 
has used its resources—its assets, both tangible 
and intangible—to produce a return on its in¬ 
vestment. The financial condition of a company 
is a measure of its ability to satisfy its obliga¬ 
tions, such as the payment of interest on its 
debt in a timely manner. There are many tools 
available in the analysis of financial informa¬ 
tion. These tools include financial ratio analysis 
and cash flow analysis. Cash flows are essen¬ 
tial ingredients in valuation. Therefore, under¬ 
standing past and current cash flows may help 
in forecasting future cash flows and, hence, de¬ 
termine the value of the company. Moreover, 
understanding cash flow allows the assessment 
of the ability of a firm to maintain current divi¬ 
dends and its current capital expenditure policy 
without relying on external financing. Financial 
modelers must understand how to use these fi¬ 
nancial ratios and cash flow information in the 
most effective manner in building models. 

Finite Mathematics and Basic Functions 
for Financial Modeling 

The collection of mathematical tools that does 
not include calculus is often referred to as 
"finite mathematics." This includes matrix 
algebra, probability theory, and statistical anal¬ 
ysis. Ordinary algebra deals with operations 
such as addition and multiplication performed 
on individual numbers. In financial modeling, 
it is useful to consider operations performed on 
ordered arrays of numbers. Ordered arrays of 
numbers are called vectors and matrices while 
individual numbers are called scalars. Prob¬ 
ability theory is the mathematical approach 
to formalize the uncertainty of events. Even 
though a decision maker may not know which 
one of the set of possible events may finally 
occur, with probability theory a decision maker 
has the means of providing each event with
a certain probability. Furthermore, it provides 
the decision maker with the axioms to compute 
the probability of a composed event in a 
unique way. The rather formal environment 
of probability theory translates in a reasonable 
manner to the problems related to risk and 
uncertainty in finance such as, for example, the 
future price of a financial asset. Today, investors 
may be aware of the price of a certain asset, but 
they cannot say for sure what value it might 
have tomorrow. To make a prudent decision, 
investors need to assess the possible scenarios 
for tomorrow's price and assign to each sce¬ 
nario a probability of occurrence. Only then can 
investors reasonably determine whether the 
financial asset satisfies an investment objective 
included within a portfolio. Probability models 
are theoretical models of the occurrence of 
uncertain events. In contrast, statistics is about 
empirical data and can be broadly defined as 
a set of methods used to make inferences from 
a known sample to a larger population that is 
in general unknown. In finance, a particularly
important example is making inferences from 
the past (the known sample) to the future 
(the unknown population). There are impor¬ 
tant mathematical functions with which the 
financial modeler should be acquainted. These 
include the continuous function, the indicator 
function, the derivative of a function, the 
monotonic function, and the integral, as well 
as special functions such as the characteristic 
function of random variables and the factorial, 
the gamma, beta, and Bessel functions. 
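
Assigning a probability to each scenario for tomorrow's price, as described above, leads directly to summary quantities such as the expected price and its standard deviation. The scenarios and probabilities in this sketch are hypothetical, as is the function name.

```python
import math


def scenario_summary(prices, probabilities):
    """Expected value and standard deviation of a discrete set of
    price scenarios; the probabilities must sum to one."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for x, p in zip(prices, probabilities))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(prices, probabilities))
    return mean, math.sqrt(variance)


# Hypothetical scenarios for tomorrow's price of an asset trading at 100 today.
prices = [95.0, 100.0, 108.0]
probabilities = [0.2, 0.5, 0.3]

expected_price, price_sd = scenario_summary(prices, probabilities)
print(f"expected price    : {expected_price:.2f}")
print(f"standard deviation: {price_sd:.2f}")
```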

Liquidity and Trading Costs 

In broad terms, liquidity refers to the ability 
to execute a trade or liquidate a position with 
little or no cost or inconvenience. Liquidity de¬ 
pends on the market where a financial instru¬ 
ment is traded, the type of position traded, and 
sometimes the size and trading strategy of an 
individual trade. Liquidity risks are those as¬ 
sociated with the prospect of imperfect mar¬ 
ket liquidity and can relate to risk of loss or 
risk to cash flows. There are two main aspects
to liquidity risk measurement: the measure¬ 
ment of liquidity-adjusted measures of mar¬ 
ket risk and the measurement of liquidity risks 
per se. Market practitioners often assume that 
markets are liquid—that is, that they can liq¬ 
uidate or unwind positions at going market 
prices—usually taken to be the mean of bid 
and ask prices—without too much difficulty or 
cost. This assumption is very convenient and 
provides a justification for the practice of mark¬ 
ing positions to market prices. However, it is 
often empirically questionable, and the failure 
to allow for liquidity can undermine the mea¬ 
surement of market risk. Because liquidity risk 
is a major risk factor in its own right, port¬ 
folio managers and traders will need to mea¬ 
sure this risk in order to formulate effective 
portfolio and trading strategies. A consider¬ 
able amount of work has been done in the eq¬ 
uity market in estimating liquidity risk. Because 
transaction costs are incurred when buying or 
selling stocks, poorly executed trades can ad¬ 
versely impact portfolio returns and therefore 
relative performance. Transaction costs are clas¬ 
sified as explicit costs such as brokerage and 
taxes, and implicit costs, which include market 
impact cost, price movement risk, and opportu¬ 
nity cost. Broadly speaking, market impact cost 
is the price that a trader has to pay for obtain¬ 
ing liquidity in the market and is a key com¬ 
ponent of trading costs that must be modeled 
so that effective trading programs for execut¬ 
ing trades can be developed. Typical forecast¬ 
ing models for market impact costs are based 
on a statistical factor approach where the in¬ 
dependent variables are trade-based factors or 
asset-based factors. 
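
The gap between marking a position at the mean of bid and ask prices and actually liquidating it can be illustrated with a deliberately simple calculation. The sketch assumes the whole position is sold at the quoted bid with no further market impact, which is itself an optimistic assumption; the quotes, quantity, and function names are hypothetical.

```python
def mid_price(bid, ask):
    """Mean of bid and ask, the usual mark-to-market price."""
    return (bid + ask) / 2


def liquidation_shortfall(bid, ask, quantity):
    """Difference between the marked value of a long position and the
    proceeds from selling it at the bid (half the spread per unit)."""
    return (mid_price(bid, ask) - bid) * quantity


# Hypothetical quotes and position size.
bid, ask, quantity = 99.5, 100.5, 1000

marked_value = mid_price(bid, ask) * quantity
shortfall = liquidation_shortfall(bid, ask, quantity)
print(f"marked value: {marked_value:.2f}")
print(f"shortfall   : {shortfall:.2f}")
```

A market impact model of the kind described above would add a further cost term that depends on trade-based and asset-based factors, which this sketch leaves out.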

VOLUME III 

Model Risk and Selection 

Model risk is the risk of error in pricing or 
risk-forecasting models. In practice, model risk 
arises because (1) any model involves simplification
and calibration, and both of these require
subjective judgments that are prone to error,
and/or (2) a model is used inappropriately.
Although model risk cannot be avoided, there 
are many ways in which financial modelers can 
manage this risk. These include (1) recogniz¬ 
ing model risk, (2) identifying, evaluating, and 
checking the model's key assumptions, (3) selecting
the simplest reasonable model, (4) resisting
the temptation to ignore small discrepancies
in results, (5) testing the model against known 
problems, (6) plotting results and employing 
nonparametric statistics, (7) back-testing and 
stress-testing the model, (8) estimating model 
risk quantitatively, and (9) reevaluating mod¬ 
els periodically. In financial modeling, model 
selection requires a blend of theory, creativity, 
and machine learning. The machine-learning 
approach starts with a set of empirical data that 
the financial modeler wants to explain. Data are 
explained by a family of models that include 
an unbounded number of parameters and are 
able to fit data with arbitrary precision. There 
is a trade-off between model complexity and 
the size of the data sample. To implement this 
trade-off, ensuring that models have forecast¬ 
ing power, the fitting of sample data is con¬ 
strained to avoid fitting noise. Constraints are 
embodied in criteria such as the Akaike infor¬ 
mation criterion or the Bayesian information 
criterion. Economic and financial data are gen¬ 
erally scarce given the complexity of their pat¬ 
terns. This scarcity introduces uncertainty as 
regards statistical estimates obtained by the fi¬ 
nancial modeler. It means that the data might 
be compatible with many different models with 
the same level of statistical confidence. Methods 
of probabilistic decision theory can be used to 
deal with model risk due to uncertainty regard¬ 
ing the model's parameters. Probabilistic deci¬ 
sion making starts from the Bayesian inference 
process and involves computer simulations in 
all realistic situations. Since a risk model is typi¬ 
cally a combination of a probability distribution 
model and a risk measure, a critical assump¬ 
tion is the probability distribution assumed for 
the random variable of interest. Too often, the
Gaussian distribution is the model of choice. 
Empirical evidence supports the use of proba¬ 
bility distributions that exhibit fat tails such as 
the Student's t distribution and its asymmetric 
version and the Pareto stable class of distribu¬ 
tions and their tempered extensions. Extreme 
value theory offers another approach for risk 
modeling. 
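
The trade-off between fit and complexity embodied in the Akaike and Bayesian information criteria can be shown with their standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of parameters, n the sample size, and L the maximized likelihood. The candidate models and their log-likelihoods below are hypothetical numbers chosen for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits on the same sample of 60 observations:
# name -> (maximized log-likelihood, number of parameters).
n_obs = 60
candidates = {"simple": (-120.0, 2), "complex": (-118.5, 6)}

scores = {name: (aic(ll, k), bic(ll, k, n_obs))
          for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name:8s} AIC={a:8.2f}  BIC={b:8.2f}")
print("preferred by AIC:", best_by_aic, "| by BIC:", best_by_bic)
```

In this example the more complex model fits slightly better, but not by enough to pay for its four extra parameters, so both criteria prefer the simpler model.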

Mortgage-Backed Securities Analysis 
and Valuation 

Mortgage-backed securities are fixed-income 
securities backed by a pool of mortgage loans. 
Residential mortgage-backed securities (RMBS) 
are backed by a pool of residential mortgage 
loans (one-to-four family dwellings). The RMBS 
market includes agency RMBS and nonagency 
RMBS. The former are securities issued by 
the Government National Mortgage Associa¬ 
tion (Ginnie Mae), Fannie Mae, and Freddie 
Mac. Agency RMBS include passthrough secu¬ 
rities, collateralized mortgage obligations, and 
stripped mortgage-backed securities (interest- 
only and principal-only securities). The valua¬ 
tion of RMBS is complicated due to prepayment 
risk, a form of call risk. In contrast, nonagency 
RMBS are issued by private entities, have no 
implicit or explicit government guarantee, and 
therefore require one or more forms of credit 
enhancement in order to be assigned a credit 
rating. The analysis of nonagency RMBS must 
take into account both prepayment risk and 
credit risk. The most commonly used method 
for valuing RMBS is the Monte Carlo method, 
although other methods have garnered favor, 
in particular the decomposition method. The 
analysis of RMBS requires an understanding of 
the factors that impact prepayments. 

Operational Risk 

Operational risk has been regarded as a mere 
part of a financial institution's "other" risks. 
However, failures of major financial entities
have made regulators and investors aware of 
the importance of this risk. In general terms, 
operational risk is the risk of loss resulting from 
inadequate or failed internal processes, people, 
or systems or from external events. This risk 
encompasses legal risk, which includes, but is
not limited to, exposure to fines, penalties, or 
punitive damages resulting from supervisory 
actions, as well as private settlements. Opera¬ 
tional risk can be classified according to several 
principles: nature of the loss (internally inflicted 
or externally inflicted), direct losses or indirect 
losses, degree of expectancy (expected or unex¬ 
pected), risk type, event type or loss type, and 
by the magnitude (or severity) of loss and the 
frequency of loss. Operational risk can be the 
cause of reputational risk, a risk that can occur 
when the market reaction to an operational loss 
event results in reduction in the market value 
of a financial institution that is greater than the 
amount of the initial loss. The two principal 
approaches in modeling operational loss dis¬ 
tributions are the nonparametric approach and 
the parametric approach. It is important to em¬ 
ploy a model that captures tail events, and for 
this reason in operational risk modeling, dis¬ 
tributions that are characterized as light-tailed 
distributions should be used with caution. The 
models that have been proposed for assessing 
operational risk can be broadly classified into 
top-down models and bottom-up models. Top- 
down models quantify operational risk without 
attempting to identify the events or causes of 
losses. Bottom-up models quantify operational 
risk on a micro level, being based on identified 
internal events. The obstacle hindering the im¬ 
plementation of these models is the scarcity of 
available historical operational loss data. 

Optimization Tools 

Optimization is an area in applied mathematics 
that, most generally, deals with efficient algorithms
for finding an optimal solution among
a set of solutions that satisfy given constraints. 
Mathematical programming, a management
science tool that uses mathematical optimization
models to assist in decision making,
includes linear programming, integer programming,
mixed-integer programming, nonlinear
programming, stochastic programming, and 
goal programming. Unlike other mathematical 
tools that are available to decision makers such 
as statistical models (which tell the decision 
maker what occurred in the past), forecasting 
models (which tell the decision maker what 
might happen in the future), and simulation 
models (which tell the decision maker what 
will happen under different conditions), 
mathematical programming models allow the 
decision maker to identify the "best" solution. 
Markowitz's mean-variance model for portfolio
selection is an example of an application
of one type of mathematical programming
(quadratic programming). Traditional optimization
modeling assumes that the inputs
to the algorithms are certain, but there are 
also branches of optimization such as robust 
optimization that study the optimal decision 
under uncertainty about the parameters of the 
problem. Stochastic programming deals with 
both the uncertainty about the parameters and 
a multiperiod decision-making framework. 
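
As a minimal illustration of mean-variance optimization as a quadratic program, the sketch below solves the simplest case by hand: the minimum-variance mix of two risky assets whose weights sum to one. The function names, the two-asset restriction, and the sample volatilities are assumptions made for illustration only. Problems with many assets or with inequality constraints would normally be handed to a numerical quadratic programming solver.

```python
def portfolio_variance(w1, sigma1, sigma2, rho):
    """Variance of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    return ((w1 * sigma1) ** 2 + (w2 * sigma2) ** 2
            + 2.0 * w1 * w2 * rho * sigma1 * sigma2)


def min_variance_weight(sigma1, sigma2, rho):
    """Weight on asset 1 that minimizes variance (from the first-order condition)."""
    covariance = rho * sigma1 * sigma2
    denominator = sigma1 ** 2 + sigma2 ** 2 - 2.0 * covariance
    if denominator == 0.0:
        raise ValueError("variance does not depend on the weight in this case")
    # No short-sale constraint is imposed, so the weight may fall outside [0, 1].
    return (sigma2 ** 2 - covariance) / denominator


# Hypothetical inputs: volatilities of 20% and 10%, zero correlation.
w1 = min_variance_weight(sigma1=0.20, sigma2=0.10, rho=0.0)
print(w1, portfolio_variance(w1, 0.20, 0.10, 0.0))
```

Because the objective is quadratic in the weight and the only constraint is linear, the optimum has a closed form here. That is what makes this the smallest possible instance of the quadratic programming structure mentioned above.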

Probability Distributions 

In financial models where the outcome of 
interest is a random variable, an assumption 
must be made about the random variable's 
probability distribution. There are two types 
of probability distributions: discrete and 
continuous. Discrete probability distributions 
are needed whenever the random variable is 
to describe a quantity that can assume values 
from a countable set, either finite or infinite. 
A discrete probability distribution (or law) is
quite intuitive in that it assigns positive
probabilities, adding up to one, to certain values,
while any other value automatically has zero
probability. Continuous probability distributions are
needed when the random variable of interest 
can assume any value inside of one or more
intervals of real numbers such as, for example,
any number greater than zero. Asset returns,
for example, whether measured monthly,
weekly, daily, or at an even higher frequency,
are commonly modeled as continuous random
variables. In contrast to discrete probability
distributions that assign positive probability to
certain discrete values, continuous probability
distributions assign zero probability to any single
real number. Instead, only entire intervals of
real numbers can have positive probability such
as, for example, the event that some asset return
is not negative. For each continuous probability
distribution, this necessitates the so-called
probability density, a function that determines
how the entire probability mass of one is distributed.
The density often serves as the proxy
for the respective probability distribution. To
model the behavior of certain financial assets in 
a stochastic environment, a financial modeler 
can usually resort to a variety of theoretical 
distributions. Most commonly, probability distributions
are selected that are analytically well
known. For example, the normal distribution (a
continuous distribution), also called the Gaussian
distribution, is often the distribution of
choice when asset returns are modeled. Similarly, the
exponential distribution is applied to characterize
the randomness of the time between two
successive defaults of firms in a bond portfolio.
Many other distributions are related to them or
built on them in a well-known manner. These
distributions often display pleasant features
such as stability under summation, meaning
that the return of a portfolio of assets whose
returns follow a certain distribution again
follows the same distribution. However, one
has to be careful using these distributions since 
their advantage of mathematical tractability 
is often outweighed by the fact that the 
stochastic behavior of the true asset returns 
is not well captured by these distributions. 
For example, although the normal distribution 
generally renders modeling easy because all 
moments of the distribution exist, it fails to 
reflect stylized facts commonly encountered in 
asset returns, namely the possibility of very
extreme movements and skewness. To remedy 
this shortcoming, probability distributions 
accounting for such extreme price changes 
have become increasingly popular. Some of 
these distributions concentrate exclusively on 
the extreme values while others permit any real 
number, but in a way capable of reflecting market
behavior. Consequently, a financial modeler
has available a great selection of probability 
distributions to realistically reproduce asset 
price changes. Their common shortcoming is 
generally that they are mathematically difficult 
to handle. 
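
To make the point about extreme movements concrete, the following sketch compares how much probability a normal distribution and a heavier-tailed alternative assign to moves beyond k standard deviations. The choice of a Student t distribution with 3 degrees of freedom, rescaled to unit variance, is an assumption made purely for illustration (it has a simple closed-form distribution function). It is not a distribution singled out in the text above.

```python
import math


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))


def student_t3_cdf(t):
    """Closed-form CDF of a Student t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi


def unit_variance_t3_two_sided_tail(k):
    """P(|X| > k), where X is a t(3) variable rescaled to variance one."""
    # A t(3) variable has variance 3, so X = T / sqrt(3) has unit variance.
    t = k * math.sqrt(3.0)
    return 2.0 * (1.0 - student_t3_cdf(t))


for k in (2, 3, 4, 5):
    print(k, normal_two_sided_tail(k), unit_variance_t3_two_sided_tail(k))
```

For large k the heavy-tailed probabilities exceed the normal ones by orders of magnitude, which is the sense in which the normal distribution understates very extreme movements.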

Risk Measures 

The standard assumption in financial models is 
that the distribution for the return on financial 
assets follows a normal (or Gaussian) distribution
and therefore the standard deviation
(or variance) is an appropriate measure of risk 
in the portfolio selection process. This is the 
risk measure that is used in the well-known 
Markowitz portfolio selection model (that is, 
mean-variance model), which is the foundation 
for modern portfolio theory. Mounting evidence
since the early 1960s strongly suggests
that return distributions do not follow a normal
distribution, but instead exhibit heavy tails
and, possibly, skewness. The "tails" of the distribution
are where the extreme values occur,
and these extreme values are more likely than 
would be predicted by the normal distribution. 
This means that between periods where the 
market exhibits relatively modest changes in 
prices and returns, there will be periods where 
there are changes that are much higher (that 
is, crashes and booms) than predicted by the 
normal distribution. This is of major concern to 
financial modelers in seeking to generate probability
estimates for financial risk assessment.
To more effectively implement portfolio selection,
researchers have proposed alternative
risk measures. These risk measures fall into
two disjoint categories: dispersion measures
and safety-first measures. Dispersion measures
include mean standard deviation, mean absolute
deviation, mean absolute moment, index
of dissimilarity, mean entropy, and mean colog. 
Safety-first risk measures include classical 
safety first, value-at-risk, average value-at-risk, 
expected tail loss, MiniMax, lower partial 
moment, downside risk, probability-weighted 
function of deviations below a specified target 
return, and power conditional value-at-risk. 
Despite these alternative risk measures, the 
most popular risk measure used in financial 
modeling is volatility as measured by the 
standard deviation. There are different types 
of volatility: historical, implied volatility, 
level-dependent volatility, local volatility, 
and stochastic volatility (e.g., jump-diffusion 
volatility). There are also risk measures commonly
used for bond portfolio management. These
measures include duration, convexity, key rate
duration, and spread duration.
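
Two of the safety-first measures listed above, value-at-risk and expected tail loss, can be estimated directly from a sample of returns. The sketch below uses one common empirical convention (the loss quantile, and the average of losses at or beyond it). The function names and the quantile convention are assumptions for illustration, and other conventions exist.

```python
import math


def value_at_risk(returns, alpha=0.95):
    """Empirical VaR: the alpha-quantile of the loss sample (loss = -return)."""
    losses = sorted(-r for r in returns)
    # Smallest k with k / n >= alpha; the tolerance guards against
    # floating-point noise in alpha * n.
    k = math.ceil(alpha * len(losses) - 1e-9)
    return losses[max(k, 1) - 1]


def expected_tail_loss(returns, alpha=0.95):
    """Average of the losses that are at least as large as the VaR."""
    var = value_at_risk(returns, alpha)
    tail = [-r for r in returns if -r >= var]
    return sum(tail) / len(tail)


# Hypothetical sample: one hundred returns from -1% down to -100%.
sample = [-i / 100 for i in range(1, 101)]
print(value_at_risk(sample), expected_tail_loss(sample))
```

Unlike the standard deviation, both measures look only at the loss tail, which is why they respond to heavy tails and skewness that a dispersion measure can miss.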

Software for Financial Modeling 

The development of financial models requires
the modeler to be familiar with spreadsheets
such as Microsoft Excel and/or a platform to
implement concepts and algorithms such as
the Palisade Decision Tools Suite and other
Excel-based software (mostly @RISK, Solver,
VBA), and MATLAB. Financial modelers can
choose one or the other, depending on their
level of familiarity and comfort with spreadsheet
programs and their add-ins versus programming
environments such as MATLAB.
Some tasks and implementations are easier in
one environment than in the other. MATLAB
is a modeling environment that allows for input
and output processing, statistical analysis,
simulation, and other types of model building
for the purpose of analysis of a situation.
MATLAB uses a number-array-oriented
programming language, that is, a programming
language in which vectors and matrices
are the basic data structures. Reliable built-in
functions, a wide range of specialized toolboxes,
easy interface with widespread software
like Microsoft Excel, and beautiful graphing
capabilities for data visualization make
implementation with MATLAB efficient and useful
for the financial modeler. Visual Basic for
Applications (VBA) is a programming language
environment that allows Microsoft Excel users to
automate tasks, create their own functions, perform
complex calculations, and interact with
spreadsheets. VBA shares many of the same
concepts as object-oriented programming languages.
Despite some important limitations,
VBA does add useful capabilities to spreadsheet
modeling, and it is a good tool to know because
Excel is the platform of choice for many finance
professionals.

Stochastic Processes and Tools 

Stochastic integration provides a coherent way
to represent how instantaneous uncertainty (or
volatility) accumulates over time. It is thus
fundamental to the representation of financial
processes such as interest rates, security prices, or
cash flows. Stochastic integration operates on
stochastic processes and produces random variables
or other stochastic processes. Stochastic
integration is a process defined on each path as
the limit of a sum. However, these sums are
different from the sums of the Riemann-Lebesgue
integrals because the paths of stochastic
processes are generally not of bounded variation.
Stochastic integrals in the sense of Ito are
defined through a process of approximation by
(1) defining Brownian motion, which is the
continuous limit of a random walk, (2) defining
stochastic integrals for elementary functions as
the sums of the products of the elementary
functions multiplied by the increments of the
Brownian motion, and (3) extending this
definition to any function through approximating
sequences. The major application of integration
to financial modeling involves stochastic
integrals. An understanding of stochastic
integrals is needed to understand an important
tool in contingent claims valuation: stochastic
differential equations. The dynamics of financial
asset returns and prices can be expressed
using a deterministic process if there is no
uncertainty about their future behavior or, in the
more likely case when the value is uncertain, with a
stochastic process. Stochastic processes in
continuous time are the most used tool to explain
the dynamics of financial asset returns
and prices. They are the building blocks to
construct financial models for portfolio optimization,
derivatives pricing, and risk management.
Continuous-time processes allow for more elegant
theoretical modeling compared to discrete-time
models, and many results proven in probability
theory can be applied to obtain a simple
evaluation method.
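
The three-step construction of the stochastic integral described above can be illustrated numerically. The sketch below approximates Brownian motion by a Gaussian random walk and forms the sum of an integrand (here the Brownian path itself, evaluated at the left endpoint of each interval) times the Brownian increments. The function names, the choice of integrand, and the step count are assumptions for illustration.

```python
import math
import random


def brownian_increments(n_steps, horizon, rng):
    """Independent Gaussian increments of a discretized Brownian motion."""
    dt = horizon / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def ito_sum_w_dw(increments):
    """Left-endpoint sum approximating the integral of W dW; returns (sum, W_T)."""
    w = 0.0
    total = 0.0
    for dw in increments:
        total += w * dw   # integrand taken at the start of the interval
        w += dw
    return total, w


horizon = 1.0
increments = brownian_increments(10_000, horizon, random.Random(0))
total, w_T = ito_sum_w_dw(increments)
# Compare the discrete sum with (W_T^2 - T) / 2.
print(total, 0.5 * (w_T ** 2 - horizon))
```

The discrete sum equals (W_T^2 minus the sum of squared increments) / 2 exactly, and the sum of squared increments approaches T as the grid is refined. The extra term, absent in ordinary calculus where the limit would be W_T^2 / 2, reflects the fact noted above that Brownian paths are not of bounded variation.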


Statistics 

Probability models are theoretical models of
the occurrence of uncertain events. In contrast,
statistics is about empirical data and can be
broadly defined as a set of methods used to
make inferences from a known sample to a
larger population that is in general unknown. In
finance, a particularly important example is making
inferences from the past (the known sample)
to the future (the unknown population). In
statistics, probabilistic models are applied using
data so as to estimate the parameters of
these models. It is not assumed that all parameter
values in the model are known. Instead,
the data for the variables in the model are used
to estimate the values of the parameters, and
the estimates are then used to test hypotheses
or make inferences about the parameters. In financial
modeling, the statistical technique of regression
models is the workhorse. However, because
regression models are part of the field of financial
econometrics, this topic is covered in that topic
category. Understanding dependences or functional
links between variables is a key theme in
financial modeling. In general terms, functional
dependencies are represented by dynamic
models. Many important models are linear
models whose coefficients are correlation
coefficients. In many instances in financial modeling,
it is important to arrive at a quantitative measure
of the strength of dependencies. The correlation
coefficient provides such a measure. In
many instances, however, the correlation coefficient
might be misleading. In particular, there
are cases of nonlinear dependencies that result
in a zero correlation coefficient. From the point
of view of financial modeling, this situation is
particularly dangerous as it leads to substantially
underestimated risk. Different measures
of dependence have been proposed, in particular
copula functions. The copula overcomes
the drawbacks of the correlation as a measure
of dependency by allowing for a more general
measure than linear dependence, allowing for
the modeling of dependence for extreme events,
and being invariant under strictly increasing
transformations. Another essential tool in
financial modeling is Monte Carlo simulation,
because it allows the incorporation of
uncertainty in financial models and
consideration of additional layers of complexity
that are difficult to incorporate in analytical
models. The main
idea of Monte Carlo simulation is to represent
the uncertainty in market variables through scenarios,
and to evaluate parameters of interest
that depend on these market variables in complex
ways. The advantage of such an approach
is that it can easily capture the dynamics of
underlying processes and the otherwise complex
effects of interactions among market variables.
A substantial amount of research in recent years
has been dedicated to making scenario generation
more accurate and efficient, and a number
of sophisticated computational techniques are
now available to the financial modeler.
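
The warning that a nonlinear dependence can produce a zero correlation coefficient is easy to verify with a small example. In the sketch below, one variable is an exact function of the other, yet the Pearson correlation is zero. The data and the function name are hypothetical and chosen only to make the arithmetic transparent.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]   # y is completely determined by x
print(pearson_correlation(xs, ys))
```

Because the values of x are symmetric around zero and y depends only on the size of x, the positive and negative contributions to the covariance cancel. A modeler relying on correlation alone would treat the two variables as unrelated.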

Term Structure Modeling 

The arbitrage-free valuation approach to the
valuation of option-free bonds, bonds with embedded
options, and option-type derivative instruments
requires that a financial instrument
be viewed as a package of zero-coupon bonds.
Consequently, in financial modeling, it is essential
to be able to discount each expected cash
flow by the appropriate interest rate. That rate
is referred to as the spot rate. The term structure
of interest rates provides the relationship
between spot rates and maturity. Because of its
role in valuation of cash bonds and option-type
derivatives, the estimation of the term structure
of interest rates is of critical importance as
an input into a financial model. In addition to
their role in valuation modeling, term structure
models are fundamental to expressing value
and risk and to establishing relative value across the
spectrum of instruments found in the various
interest-rate or bond markets. The term structure
is most often specified for a specific market
such as the U.S. Treasury market, the bond market
for double-A rated financial institutions,
the interest rate market for LIBOR, and swaps.
Static models of the term structure are
characterizations that are devoted to relationships
based on a given market and do not serve future
scenarios where there is uncertainty. Standard
static models include those known as the spot
yield curve, discount function, par yield curve,
and the implied forward curve. Instantiations of
these models may be found in both a discrete-
and continuous-time framework. An important
consideration is establishing how these term
structure models are constructed and how to
transform one model into another. In modeling
the behavior of interest rates, stochastic
differential equations (SDEs) are commonly used.
The SDEs used to model interest rates must capture
the market properties of interest rates such
as mean reversion and/or a volatility that depends
on the level of interest rates. For a one-factor
model, the SDE is used to model the
behavior of the short-term rate, referred to as
simply the "short rate." The addition of another
factor (i.e., a two-factor model) involves extending
the SDE to represent the behavior of the
short rate and a long-term rate (i.e., long rate).
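
A minimal sketch of a one-factor short-rate model with mean reversion is given below. It assumes, purely for illustration, an SDE of the form dr = kappa (theta - r) dt + sigma dW with constant volatility, discretized with a simple Euler scheme. The parameter names and values are hypothetical, and this particular specification is a choice made here rather than one prescribed by the text. It also permits negative rates.

```python
import math
import random


def simulate_short_rate(r0, kappa, theta, sigma, dt, n_steps, rng):
    """Euler discretization of dr = kappa * (theta - r) dt + sigma dW."""
    path = [r0]
    r = r0
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0)
        # Drift pulls r toward theta; the shock scales with sqrt(dt).
        r = r + kappa * (theta - r) * dt + sigma * math.sqrt(dt) * shock
        path.append(r)
    return path


# Hypothetical parameters: monthly steps over ten years.
path = simulate_short_rate(r0=0.02, kappa=0.5, theta=0.04, sigma=0.01,
                           dt=1.0 / 12.0, n_steps=120, rng=random.Random(42))
print(len(path), path[-1])
```

The drift term is what produces mean reversion: whenever the rate is above theta the expected change is negative, and whenever it is below theta the expected change is positive. Making sigma a function of r would give a volatility that depends on the level of rates.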

The entries can serve as material for a wide
spectrum of courses, such as the following: 

• Financial engineering 

• Financial mathematics 

• Financial econometrics 

• Statistics with applications in finance 


• Quantitative asset management

• Asset and derivative pricing

• Risk management

Frank J. Fabozzi 
Editor, Encyclopedia of Financial Models 



Guide to the Encyclopedia of 
Financial Models 


The Encyclopedia of Financial Models provides
comprehensive coverage of the field of financial
modeling. This reference work consists of
three separate volumes and 127 entries. Each
entry provides coverage of the selected topic
intended to inform a broad spectrum of readers
ranging from finance professionals to academicians
to students to fiduciaries. To derive
the greatest possible benefit from the Encyclopedia
of Financial Models, we have provided this
guide. It explains how the information within
the encyclopedia can be located.

ORGANIZATION 

The Encyclopedia of Financial Models is organized 
to provide maximum ease of use for its readers. 

Table of Contents 

A complete table of contents for the entire encyclopedia
appears in the front of each volume.
This list of titles represents topics that have been
carefully selected by the editor, Frank J. Fabozzi.
The Preface includes a more detailed description
of the volumes and the topic categories that
the entries are grouped under.

Index 

A Subject Index for the entire encyclopedia is
located at the end of each volume. The subjects
in the index are listed alphabetically and
indicate the volume and page number where
information on this topic can be found.

Entries 

Each entry in the Encyclopedia of Financial Models
begins on a new page, so that the reader may
quickly locate it. The author's name and affiliation
are displayed at the beginning of the entry.
All entries in the encyclopedia are organized
according to a standard format, as follows:

• Title and author 

• Abstract 

• Introduction 

• Body 

• Key points 

• Notes 

• References 

Abstract 

The abstract for each entry gives an overview of 
the topic, but not necessarily the content of the 
entry. This is designed to put the topic in the 
context of the entire Encyclopedia, rather than 
give an overview of the specific entry content. 

Introduction 

The text of each entry begins with an introductory
section that defines the topic under
discussion and summarizes the content. By
reading this section, the reader gets a general
idea about the content of a specific entry.

Body 

The body of each entry explains the purpose, 
theory, and math behind each model. 

Key Points 

The key points section provides in bullet point
format a review of the materials discussed in
each entry. It imparts to the reader the most
important issues and concepts discussed.

Notes 

The notes provide more detailed information 
and citations of further readings. 

References 

The references section lists the publications 
cited in the entry. 

Valuing Mortgage-Backed and
Asset-Backed Securities 

Abstract: Valuing (or pricing) a bond without an embedded option (that is, an option-free
bond) is straightforward. The value is equal to the present value of the expected cash flows. 
Ignoring defaults, for an option-free bond the cash flows are known and consist of the periodic 
interest payments and principal at the maturity date. The interest or discount rates for computing 
the present value of the cash flows begin with the spot rates for a benchmark security and to those 
rates an appropriate spread is added. Moving from valuing option-free bonds to corporate bonds 
and agency debentures with embedded options is not simple. The interest rate-sensitive options 
that can be embedded into these bonds are call options, put options, accelerated sinking fund provisions,
and, for floating-rate securities, caps on the interest rate. The reason valuation is complicated is 
that the embedded options must be taken into account and the theoretical option-free value of the 
bond must be adjusted accordingly. The technique typically used for valuing corporate bonds and 
agency debentures with embedded options is the lattice method. Mortgage-backed securities also 
have embedded options: the right of the borrowers in a loan pool to prepay their mortgage loan. 
However, because future cash flows for a loan pool are sensitive not only to the current interest
rate but also to the history of rates since the loans were originated, the lattice method, which is solved
using backward induction, cannot be employed. Instead, the most common methodology used
for valuing mortgage-backed securities and mortgage-related asset-backed securities is the Monte 
Carlo simulation model. Other types of asset-backed securities are straightforward to value. In 
addition to the complications in valuing mortgage-backed securities and mortgage-related asset- 
backed securities, there is the difficulty in estimating their price sensitivity to changes in interest 
rates (that is, duration and convexity). The Monte Carlo simulation model can be used to compute 
the effective duration of these securities. This duration measure takes into consideration how a 
change in interest rates can impact a security's cash flow. 


In this entry we will explain the methodology
for valuing asset-backed securities (ABS) and
mortgage-backed securities (MBS) and measures
of relative value. We begin by reviewing
cash-flow yield analysis and the limitations
of the spread measure that is a result of that
analysis: the nominal spread. We then look at a
better spread measure called the zero-volatility
spread, but point out its limitation as a measure
of relative value for MBS products because
of the borrower's prepayment option and for
ABS products where the prepayment option has
value. Finally, we look at the methodology for
valuing MBS and for ABS products where the
prepayment option has value: the Monte Carlo
simulation model. A by-product of this model
is a spread measure called the option-adjusted
spread (OAS). This measure is superior to the
nominal spread and the zero-volatility spread
for ABS products where the prepayment option
has a value because it takes into account
how cash flows may change when interest rates
change. That is, it recognizes the borrower's
prepayment option and how that affects prepayments
when interest rates may change in
the future. While the OAS is superior to the two
other spread measures, it is based on assumptions
that must be understood by an investor,
and the sensitivity of the security's value and
OAS to changes in those assumptions must be
investigated.


CASH-FLOW YIELD 
ANALYSIS 

The yield on any financial instrument is the 
interest rate that makes the present value of 
the expected cash flow equal to its market 
price plus accrued interest. For ABS and MBS, 
the yield calculated is called a cash-flow yield. 
The problem in calculating the cash-flow yield 
of MBS and ABS is that because of prepayments 
the cash flow is unknown. A prepayment is the
amount of the payment made by the obligor
in the loan pool that is in excess of the scheduled
principal payment. Prepayments can be
voluntary, such as for refinancing the loan, or
involuntary, such as for a default by the obligor.
Consequently, to determine a cash-flow yield
some assumption about the prepayment rate
and recovery rate in the case of defaults must
be made.

The cash flow for MBS and ABS is typically
monthly. The convention is to compare the yield
on MBS and ABS to that of a Treasury coupon
security by calculating the security's
bond-equivalent yield. The bond-equivalent yield for
a Treasury coupon security is found by doubling
the semiannual yield. However, it is incorrect
to do this for MBS and ABS because
the investor has the opportunity to generate
greater interest by reinvesting the more frequent
cash flows. The market practice is to
calculate a yield so as to make it comparable
to the yield to maturity on a bond-equivalent
yield basis. The formula for annualizing the
monthly cash-flow yield for MBS and ABS is
as follows:

$$\text{Bond-equivalent yield} = 2\left[(1 + i_M)^6 - 1\right]$$

where $i_M$ is the monthly interest rate that
will equate the present value of the projected
monthly cash flow to the market price (plus accrued
interest) of the security.
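
The calculation just described has two steps: find the monthly rate that equates the present value of the projected cash flows to the price, then annualize it on a bond-equivalent basis. The sketch below is one way to do this, using a bisection search. The function names, the search bounds (a monthly rate between 0% and 100%), and the sample cash flows are assumptions for illustration, and the cash flows are taken as given (that is, a prepayment assumption has already been applied).

```python
def present_value(cash_flows, monthly_rate):
    """PV of monthly cash flows, the first one arriving in one month."""
    return sum(cf / (1.0 + monthly_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def monthly_cash_flow_yield(cash_flows, price, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection search for the monthly rate that equates PV and price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if present_value(cash_flows, mid) > price:
            lo = mid   # PV too high, so the rate must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def bond_equivalent_yield(monthly_rate):
    """Compound six months to a semiannual yield, then double it."""
    return 2.0 * ((1.0 + monthly_rate) ** 6 - 1.0)


# Hypothetical security: twelve level monthly payments, priced (plus accrued
# interest) at 116.0.
flows = [10.0] * 12
i_m = monthly_cash_flow_yield(flows, price=116.0)
print(i_m, bond_equivalent_yield(i_m))
```

Compounding the monthly rate for six months before doubling is what distinguishes this from simply multiplying the monthly rate by twelve: it credits the investor with reinvestment of the monthly cash flows within each semiannual period.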

All yield measures suffer from problems that
limit their use in assessing a security's potential
return. The yield to maturity for a Treasury,
agency, or corporate bond has two major
shortcomings as a measure of a bond's potential
return. To realize the stated yield to maturity,
the investor must: (1) reinvest the coupon
payments at a rate equal to the yield to maturity
and (2) hold the bond to the maturity date. The
reinvestment of the coupon payments is critical
and for long-term bonds can comprise as much
as 80% of the bond's return. The risk of having
to reinvest the interest payments at less than
the computed yield is called reinvestment risk.
The risk associated with a decline in the value
of a security due to a rise in interest rates is
called interest rate risk and in practice is quantified
by computing the security's duration and
convexity.

These shortcomings are equally applicable to
the cash-flow yield measure for ABS and MBS:
(1) the projected cash flows are assumed to be
reinvested at the computed cash-flow yield and
(2) the security is assumed to be held until the final
payout based on some prepayment assumption.
Reinvestment risk, the
risk that the cash flow will be reinvested at a
rate less than the calculated cash-flow yield, is
particularly important for amortizing MBS and
ABS products, because payments are monthly
and both interest and principal must be reinvested.
Moreover, an additional assumption is
that the projected cash flow is actually realized.
If the prepayment experience and the recovery
rate realized differ from those assumed, the cash-flow
yield will not be realized.

Given the computed cash-flow yield and the
average life for a security based on some prepayment
assumption and default/recovery assumption,
the next step is to compare the yield
to the yield for a comparable Treasury security.
"Comparable" is typically defined as a Treasury
security with the same maturity as the
(weighted) average life or the duration of the
security. The difference between the cash-flow
yield and the yield on a comparable Treasury
security is called the nominal spread.

Unfortunately, it is the nominal spread that
investors will too often use as a measure of relative
value for ABS and MBS. However, this
spread masks the fact that a portion of the nominal
spread may be compensation for accepting
prepayment risk. Instead of nominal spread, investors
need a measure that indicates the compensation
after adjusting for prepayment risk
for all MBS and for ABS where the prepayment
option has value. This measure is called
the option-adjusted spread. Before discussing
this measure, we describe another spread measure
commonly quoted for MBS and ABS called
the zero-volatility spread. This measure takes
into account another problem with the nominal
spread. Specifically, the nominal spread is
computed assuming that all the cash flows for
a security should be discounted at only one interest
rate. That is, it fails to recognize the term
structure of interest rates.


ZERO-VOLATILITY SPREAD 

The proper procedure to compare ABS and MBS
to a Treasury is to compare them to a portfolio of
Treasury securities that have the same cash flow.
The value of the security is then equal to the
present value of all of the cash flows. The security's
value, assuming the cash flows are default
free, will equal the present value of the replicating
portfolio of Treasury securities. In turn,
these cash flows are valued at the Treasury spot
rates.

The zero-volatility spread is a measure of the
spread that the investor would realize over
the entire Treasury spot rate curve if the non-Treasury
security being analyzed is held to
maturity. It is not a spread off one point on the
Treasury yield curve, as is the nominal spread.
The zero-volatility spread (also called the Z-spread
and the static spread) is the spread that will
make the present value of the cash flows from
the non-Treasury security, when discounted
at the Treasury spot rate plus the spread, equal
to the market price plus accrued interest of the
non-Treasury security. A trial-and-error procedure
(or search algorithm) is required to determine
the zero-volatility spread.
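
The trial-and-error procedure can be sketched as a bisection search over the spread. In the sketch below each cash flow is discounted at its own spot rate plus a constant spread, and the spread is adjusted until the present value matches the price plus accrued interest. The function names, the search bounds, and the sample inputs are assumptions for illustration. Spot rates and the spread are expressed per cash-flow period, so converting to a quoted annualized convention is left out.

```python
def price_with_spread(cash_flows, spot_rates, spread):
    """Discount each cash flow at its own per-period spot rate plus a spread."""
    return sum(cf / (1.0 + s + spread) ** t
               for t, (cf, s) in enumerate(zip(cash_flows, spot_rates), start=1))


def zero_volatility_spread(cash_flows, spot_rates, dirty_price,
                           lo=-0.01, hi=0.5, tol=1e-12):
    """Bisection search for the constant spread over the spot curve."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if price_with_spread(cash_flows, spot_rates, mid) > dirty_price:
            lo = mid   # PV too high, so a larger spread is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical four-period security and upward-sloping spot curve.
flows = [5.0, 5.0, 5.0, 105.0]
spots = [0.010, 0.012, 0.015, 0.017]
print(zero_volatility_spread(flows, spots, dirty_price=112.0))
```

Because the spread is added to every point on the spot curve rather than to a single yield, the result reflects the shape of the term structure, which is the feature that separates it from the nominal spread.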

In general, the shorter the average life of
the ABS/MBS, the less the zero-volatility
spread will deviate from the nominal spread.
The magnitude of the difference between the 
nominal spread and the zero-volatility spread 
also depends on the shape of the yield curve. 
The steeper the yield curve, the greater the 
difference. 

If borrowers in the underlying loan pool have
the right to prepay but do not typically take advantage
of a decline in interest rates below the
loan's rate to refinance, then the zero-volatility
spread is the appropriate measure of relative
value and it should be used in valuing cash
flows to determine the value of ABS. This is
the case, for example, for automobile loan ABS.
While borrowers have the right to refinance
when rates decline below the loan rate, they
typically do not. In contrast, for standard residential
mortgage loans, home equity loan ABS,
and manufactured housing ABS, the borrowers
in the underlying pool do refinance when interest
rates decline below the loan rate. The next
methodology and spread measure are used for
products with this characteristic. Basically, they
are used for all residential MBS and mortgage-related
ABS.


VALUATION USING MONTE
CARLO SIMULATION AND 
OAS ANALYSIS 

In fixed income valuation modeling, there are
two methodologies commonly used to value
securities with embedded options: the Monte
Carlo simulation model and the lattice model.
The Monte Carlo simulation model involves
simulating a large number of potential interest
rate paths in order to assess the value of a
security on those different paths. This model is
the more flexible of the two valuation methodologies
for valuing interest rate-sensitive instruments
where the history of interest rates
is important. MBS and mortgage-related ABS
are commonly valued using this model. As
explained below, a by-product of this valuation
model is the OAS. (An alternative model
for valuing agency passthrough securities that
does not require a prepayment model is provided
in Kalotay, Yang, and Fabozzi, 2004.)

A lattice model is used to value callable 
agency debentures and corporate bonds. This 
valuation model accommodates securities in 
which the decision to exercise a call option is not 
dependent on how interest rates evolved over 
time. That is, the decision of an issuer to call a 
bond will depend on the prevailing interest rate 
at which the issue can be refunded relative to 
the issue's coupon rate and the costs associated 
with refunding, and not the path interest rates 
took to get to that rate. MBS and mortgage- 
related ABS which allow prepayments have 
periodic cash flows that are interest rate path 
dependent. This means that the cash flow re¬ 
ceived in one period is determined not only by 
the current interest rate level, but also by the 
path that interest rates took to get to the current 
level. 

Prepayments for MBS and mortgage-related 
ABS are interest rate path dependent because 
this month's prepayment rate depends on 
whether there have been prior opportunities to 
refinance since the underlying loans were
originated. Moreover, the cash flows to be received
in the current month by investors in a bond
class of an MBS or mortgage-related ABS
transaction depend on the outstanding balances of
the other bond classes in the transaction. For
example, in the case of a planned amortization class
(PAC) bond in a collateralized mortgage obli¬ 
gation structure, all prepayments from the time 
the security was issued up to the valuation date 
affect the amount of support bonds outstand¬ 
ing and therefore the cash flow at the valuation 
date for the PAC bond. 3 Thus, we need the his¬ 
tory of prepayments to calculate the balances of 
bond classes in a structure. 

Conceptually, valuation using the Monte 
Carlo simulation model is simple. In practice, 
however, it is very complex. The simulation 
involves generating a set of cash flows based 
on simulated future refinancing rates, which in 
turn imply simulated prepayment and default/recovery
rates. The objective is to figure out how
the value of the collateral gets transmitted to the 
bond classes in the structure. More specifically, 
modeling is used to identify where the value in 
a transaction has been allocated and where the 
risk (prepayment risk and credit risk) has been 
distributed in order to identify the bond classes 
with low risk and high value. 

Simulating Interest Rate Paths and 
Cash Flows 

Monte Carlo simulation is a management sci¬ 
ence/operations research technique that is com¬ 
monly employed in finance. 4 The purpose of 
Monte Carlo simulation is to generate a proba¬ 
bility distribution for the outcome of some ran¬ 
dom variable of interest. In its application to 
valuing securities, it is used to generate interest 
rate paths so that potential cash flows on those 
paths can be determined and then each path is 
valued. (In the parlance of simulation, an inter¬ 
est rate path is referred to as a trial.) The value 
for the security on each of those interest rate 
paths is then one value in determining the esti¬ 
mated probability distribution for the security's 
value. 

The procedure for generating the interest rate
paths begins with a benchmark term structure
of interest rates and the market prices of the
associated benchmark securities. Given the
benchmark term structure of interest rates, the
interest rate paths are adjusted (that is,
calibrated) so that the average price produced
by the model for each benchmark security
equals the market price of that benchmark
security.

Most models use the on-the-run Treasury is¬ 
sues in this calibration process. Other model 
developers use off-the-run Treasury issues as 
well. The argument for using off-the-run Trea¬ 
sury issues is that the price/yield of on-the-run 
Treasury issues will not reflect their true eco¬ 
nomic value because the market price reflects 
their value for financing purposes (that is, an is¬ 
sue may be on special in the repo market). Some 
models use the London Interbank Offered Rate 
(LIBOR) curve instead of the Treasury curve. 
The reason is that some investors are interested 
in spreads that they can earn relative to their 
funding costs and LIBOR, for many investors, 
is a better proxy for that cost than Treasury rates. 

To generate the interest rate paths, an assump¬ 
tion about the evolution of future interest rates 
is required. Most Monte Carlo simulation mod¬ 
els use some form of one-factor interest rate 
model. The one factor used is the short-term in¬ 
terest rate. When using a particular one-factor 
interest rate model, several further assumptions 
must be made. The first, and the most impor¬ 
tant, is the assumption about the volatility of 
the short-term interest rate. The volatility as¬ 
sumption determines the dispersion of future 
interest rates in the simulation. Many model 
developers do not use one volatility number 
for the yield volatility of all maturities for the 
benchmark curve. Instead, they use either a 
short/long yield volatility or a term structure 
of yield volatility. A short/long yield volatil¬ 
ity means that volatility is specified for matu¬ 
rities up to a certain number of years (short 
yield volatility) and a different yield volatility 
for greater maturities (long yield volatility). The 


short yield volatility is assumed to be greater 
than the long yield volatility. A term structure
of yield volatilities means that a yield
volatility is assumed for each maturity. (In practice,
interest rate volatility is extracted from
interest rate cap market prices, and from these
prices a term structure of yield volatility is obtained.)
Differences in the assumption about volatility 
of short-term interest rates can have a material 
impact on the resulting value derived for the 
security. 

Another assumption relates to the speed of 
mean reversion of the short-term interest rate. 
Mean reversion in an interest rate model has to
do with not allowing interest rates to fall below
a lower barrier or to rise above an upper barrier
before they revert to some average interest
rate specified by the model developer or user.

The random paths of interest rates should be 
generated from an arbitrage-free model of the 
future term structure of interest rates. By arbi¬ 
trage free it is meant that the model replicates 
today's term structure of interest rates, an input 
of the model, and that for all future dates there 
is no possible arbitrage within the model. 
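
To illustrate how the volatility and mean-reversion assumptions shape the simulated paths, the following is a minimal sketch that generates monthly short-rate paths from a simple mean-reverting (Vasicek-type) process. The choice of process, the function name `simulate_short_rate_paths`, and every parameter value are assumptions made for illustration only. The sketch is not calibrated to benchmark prices, so it is not arbitrage free in the sense just described, and this particular process can produce negative rates.

```python
import numpy as np

def simulate_short_rate_paths(r0, long_run_mean, reversion_speed, volatility,
                              n_months, n_paths, seed=0):
    """Return an (n_months, n_paths) array of annualized one-month rates.

    Hypothetical Vasicek-type dynamics, discretized monthly:
        r[t+1] = r[t] + speed * (mean - r[t]) * dt + vol * sqrt(dt) * z
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / 12.0
    rates = np.empty((n_months, n_paths))
    rates[0] = r0
    for t in range(1, n_months):
        shock = rng.standard_normal(n_paths)
        # Mean reversion pulls each path back toward the long-run mean.
        drift = reversion_speed * (long_run_mean - rates[t - 1]) * dt
        # Volatility controls how widely the paths disperse.
        rates[t] = rates[t - 1] + drift + volatility * np.sqrt(dt) * shock
    return rates

# Illustrative inputs only: 1,000 paths of 360 monthly rates.
paths = simulate_short_rate_paths(r0=0.03, long_run_mean=0.04,
                                  reversion_speed=0.2, volatility=0.01,
                                  n_months=360, n_paths=1000)
```

Each column of `paths` plays the role of one interest rate path (one trial), and each row corresponds to one month.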

The simulation works by generating many 
scenarios of future interest rate paths. In each 
month of a given scenario (that is, path), a 
monthly interest rate and a refinancing rate are 
generated. The monthly interest rates are used 
to discount the projected cash flows in the sce¬ 
nario. The refinancing rate is needed to deter¬ 
mine the cash flows because it represents the 
opportunity cost the borrower is facing at that 
time. 

If the refinancing rates are high relative to the 
borrower's loan rate, the borrower will have no 
incentive to refinance. For MBS and mortgage- 
related ABS, there is a disincentive to prepay 
(that is, the homeowner may avoid moving in 
order to avoid refinancing). If the refinancing 
rate is low relative to the borrower's loan rate, 
the borrower has an incentive to refinance. 

Prepayments (voluntary and involuntary) 
and recoveries are projected by feeding the re¬ 
financing rate and loan characteristics into a 

Table 1 Simulated Paths of One-Month Future Interest Rates

        Interest Rate Path Number
Month   1            2            3            ...   n            ...   N
1       f_1(1)       f_1(2)       f_1(3)       ...   f_1(n)       ...   f_1(N)
2       f_2(1)       f_2(2)       f_2(3)       ...   f_2(n)       ...   f_2(N)
3       f_3(1)       f_3(2)       f_3(3)       ...   f_3(n)       ...   f_3(N)
...
t       f_t(1)       f_t(2)       f_t(3)       ...   f_t(n)       ...   f_t(N)
...
M-2     f_{M-2}(1)   f_{M-2}(2)   f_{M-2}(3)   ...   f_{M-2}(n)   ...   f_{M-2}(N)
M-1     f_{M-1}(1)   f_{M-1}(2)   f_{M-1}(3)   ...   f_{M-1}(n)   ...   f_{M-1}(N)
M       f_M(1)       f_M(2)       f_M(3)       ...   f_M(n)       ...   f_M(N)

Notation: f_t(n) = one-month future interest rate for month t on path n; N = total number of interest rate paths; M = number of months for the loan pool.


prepayment model and default model. (In the 
case of agency MBS [Ginnie Mae, Fannie Mae, 
and Freddie Mac] no assumption about defaults 
is required.) Given the projected prepayments, 
the cash flows along an interest rate path can 
be determined. To be able to do this, the entire 
deal must be reverse engineered. That is, the 
deal's waterfall (that is, the rules for distribu¬ 
tion of interest, principal repayment, and loss 
allocation) must be specified so that the cash 
flow for the bond class being valued can be de¬ 
termined. Model developers do not reverse en¬ 
gineer the deals. Rather, there are vendors who 
provide the waterfall for deals that are used in 
conjunction with the Monte Carlo simulation 
model. 

To make this more concrete, consider a newly 
issued loan pool with a maturity of M months 


that is the collateral for an MBS or mortgage- 
related ABS. Table 1 shows N simulated inter¬ 
est rate path scenarios. Each scenario consists 
of a path of M simulated 1-month future 
interest rates. (The determination of the num¬ 
ber of paths generated is based on a variance- 
reduction method. 5 ) So, the first assumption 
made to generate the short-term interest rate 
paths in Table 1 is the volatility of short-term 
interest rates. 

Table 2 shows the paths of simulated refinanc¬ 
ing rates corresponding to the scenarios shown 
in Table 1. In going from Table 1 to Table 2, an as¬ 
sumption must be made about the relationship 
between the benchmark short-term interest rate 
and the refinancing rate. The assumption is that 
there is a constant spread relationship between 
the refinancing rate and the interest rate for a 


Table 2 Simulated Paths of Refinancing Rates

        Interest Rate Path Number
Month   1            2            3            ...   n            ...   N
1       r_1(1)       r_1(2)       r_1(3)       ...   r_1(n)       ...   r_1(N)
2       r_2(1)       r_2(2)       r_2(3)       ...   r_2(n)       ...   r_2(N)
3       r_3(1)       r_3(2)       r_3(3)       ...   r_3(n)       ...   r_3(N)
...
t       r_t(1)       r_t(2)       r_t(3)       ...   r_t(n)       ...   r_t(N)
...
M-2     r_{M-2}(1)   r_{M-2}(2)   r_{M-2}(3)   ...   r_{M-2}(n)   ...   r_{M-2}(N)
M-1     r_{M-1}(1)   r_{M-1}(2)   r_{M-1}(3)   ...   r_{M-1}(n)   ...   r_{M-1}(N)
M       r_M(1)       r_M(2)       r_M(3)       ...   r_M(n)       ...   r_M(N)

Notation: r_t(n) = refinancing rate for month t on path n; N = total number of interest rate paths; M = number of months for the loan pool.

Table 3 Simulated Cash Flows for the Loan Pool

        Interest Rate Path Number
Month   1            2            3            ...   n            ...   N
1       C_1(1)       C_1(2)       C_1(3)       ...   C_1(n)       ...   C_1(N)
2       C_2(1)       C_2(2)       C_2(3)       ...   C_2(n)       ...   C_2(N)
3       C_3(1)       C_3(2)       C_3(3)       ...   C_3(n)       ...   C_3(N)
...
t       C_t(1)       C_t(2)       C_t(3)       ...   C_t(n)       ...   C_t(N)
...
M-2     C_{M-2}(1)   C_{M-2}(2)   C_{M-2}(3)   ...   C_{M-2}(n)   ...   C_{M-2}(N)
M-1     C_{M-1}(1)   C_{M-1}(2)   C_{M-1}(3)   ...   C_{M-1}(n)   ...   C_{M-1}(N)
M       C_M(1)       C_M(2)       C_M(3)       ...   C_M(n)       ...   C_M(N)

Notation: C_t(n) = loan pool's cash flow for month t on path n; N = total number of interest rate paths; M = number of months for the loan pool.


maturity that is the best proxy for the borrow¬ 
ing rate. Typically, it is the 10-year rate that is 
used as a proxy. 

Given the refinancing rates, the collateral's 
cash flows on each interest rate path can be 
generated. This requires a prepayment and
default/recovery model. So our next assumption
is that the prepayment and default/recovery 
models used to generate the loan pool's cash 
flows are correct. The resulting cash flows are 
depicted in Table 3. 
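
The step from refinancing rates to collateral cash flows can be sketched for a single path as follows. This is a toy under stated assumptions, not a production prepayment model: the function `pool_cash_flows`, the 6% baseline CPR, the sensitivity of 10 and the 60% cap are all hypothetical placeholders, the refinancing rate is taken to be a proxy rate plus an assumed constant 150 basis point spread, and defaults and recoveries are ignored.

```python
import numpy as np

def pool_cash_flows(refi_rates, loan_rate, balance, n_months):
    """Project monthly cash flows for a level-pay loan pool on one path.

    refi_rates: annualized refinancing rates, one per month.
    The prepayment rule below is a made-up placeholder, not a fitted model.
    """
    monthly_rate = loan_rate / 12.0
    flows = np.zeros(n_months)
    for t in range(n_months):
        if balance <= 0.0:
            break
        remaining = n_months - t
        # Level payment that amortizes the current balance over the remaining term.
        payment = balance * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -remaining)
        interest = balance * monthly_rate
        scheduled = payment - interest
        # Placeholder rule: prepayments rise with the refinancing incentive.
        incentive = max(loan_rate - refi_rates[t], 0.0)
        cpr = min(0.06 + 10.0 * incentive, 0.60)
        smm = 1.0 - (1.0 - cpr) ** (1.0 / 12.0)   # annual CPR to monthly rate
        prepaid = smm * (balance - scheduled)
        flows[t] = interest + scheduled + prepaid
        balance -= scheduled + prepaid
    return flows

proxy_rates = np.full(12, 0.045)      # hypothetical path of the proxy rate
refi_rates = proxy_rates + 0.015      # constant-spread assumption
flows = pool_cash_flows(refi_rates, loan_rate=0.06, balance=100.0, n_months=12)
```

Running the function once per path fills one column of Table 3. Because each month's balance depends on all earlier prepayments, the cash flow in month t depends on the whole path of refinancing rates up to t.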

Given the loan pool's cash flow for each 
month on each interest rate path, the next step 
is to use the waterfall for the structure to deter¬ 
mine how the cash flow is distributed to the 
bond class being valued. Let us use BCC to 
denote the cash flow for that bond class. Ta¬ 
ble 4 shows the simulated cash flows on each of 


the interest rate paths for the bond class being 
valued. 

Calculating the Present Value of a 
Bond Class for a Scenario Interest 
Rate Path 

Given the cash flows for the bond class on an
interest rate path, the path's present value can
be calculated. The discount rate for determining
the present value is the simulated spot rate for
each month on the interest rate path plus an
appropriate spread. The spot rate on a path can be
determined from the simulated future monthly
rates. The relationship that holds between the
simulated spot rate for month t on path n and
the simulated future one-month rates is:

$$z_t(n) = \{[1 + f_1(n)][1 + f_2(n)] \cdots [1 + f_t(n)]\}^{1/t} - 1$$
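
The conversion from one-month future rates to spot rates can be sketched with a cumulative product. The function name and the array layout (months in rows, paths in columns, matching Tables 1 and 5) are assumptions of this sketch, and the rates are treated as periodic monthly rates in decimal form.

```python
import numpy as np

def spot_rates_from_future_rates(future_rates):
    """Convert simulated one-month future rates to monthly spot rates.

    future_rates: 2-D array of shape (M, N); entry [t-1, n-1] is f_t(n).
    Returns an array of the same shape holding z_t(n).
    """
    future_rates = np.asarray(future_rates, dtype=float)
    months = np.arange(1, future_rates.shape[0] + 1).reshape(-1, 1)
    # Running product [1 + f_1(n)] ... [1 + f_t(n)] down each path.
    growth = np.cumprod(1.0 + future_rates, axis=0)
    # Take the t-th root and subtract 1 to get z_t(n).
    return growth ** (1.0 / months) - 1.0

# Two months, two paths (illustrative numbers).
z = spot_rates_from_future_rates([[0.01, 0.02],
                                  [0.02, 0.02]])
```

On a path where the one-month rate is constant, the spot rate equals that rate in every month, which gives a quick sanity check.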


Table 4 Simulated Cash Flows for the Bond Class Being Valued

        Interest Rate Path Number
Month   1              2              3              ...   n              ...   N
1       BCC_1(1)       BCC_1(2)       BCC_1(3)       ...   BCC_1(n)       ...   BCC_1(N)
2       BCC_2(1)       BCC_2(2)       BCC_2(3)       ...   BCC_2(n)       ...   BCC_2(N)
3       BCC_3(1)       BCC_3(2)       BCC_3(3)       ...   BCC_3(n)       ...   BCC_3(N)
...
t       BCC_t(1)       BCC_t(2)       BCC_t(3)       ...   BCC_t(n)       ...   BCC_t(N)
...
M-2     BCC_{M-2}(1)   BCC_{M-2}(2)   BCC_{M-2}(3)   ...   BCC_{M-2}(n)   ...   BCC_{M-2}(N)
M-1     BCC_{M-1}(1)   BCC_{M-1}(2)   BCC_{M-1}(3)   ...   BCC_{M-1}(n)   ...   BCC_{M-1}(N)
M       BCC_M(1)       BCC_M(2)       BCC_M(3)       ...   BCC_M(n)       ...   BCC_M(N)

Notation: BCC_t(n) = bond class's cash flow for month t on path n; N = total number of interest rate paths; M = number of months for the loan pool.

Table 5 Simulated Paths of Monthly Spot Rates

        Interest Rate Path Number
Month   1            2            3            ...   n            ...   N
1       z_1(1)       z_1(2)       z_1(3)       ...   z_1(n)       ...   z_1(N)
2       z_2(1)       z_2(2)       z_2(3)       ...   z_2(n)       ...   z_2(N)
3       z_3(1)       z_3(2)       z_3(3)       ...   z_3(n)       ...   z_3(N)
...
t       z_t(1)       z_t(2)       z_t(3)       ...   z_t(n)       ...   z_t(N)
...
M-2     z_{M-2}(1)   z_{M-2}(2)   z_{M-2}(3)   ...   z_{M-2}(n)   ...   z_{M-2}(N)
M-1     z_{M-1}(1)   z_{M-1}(2)   z_{M-1}(3)   ...   z_{M-1}(n)   ...   z_{M-1}(N)
M       z_M(1)       z_M(2)       z_M(3)       ...   z_M(n)       ...   z_M(N)

Notation: z_t(n) = spot rate for month t on path n; N = total number of interest rate paths; M = number of months for the loan pool.


where

$z_t(n)$ = simulated spot rate for month t on path n
$f_j(n)$ = simulated future one-month rate for month j on path n

Consequently, the interest rate path for the
simulated future one-month rates can be
converted to the interest rate path for the
simulated monthly spot rates as shown in Table 5.
Therefore, the present value of the cash flows
for month t on interest rate path n discounted
at the simulated spot rate for month t plus some
spread is:

$$\mathrm{PV}[\mathrm{BCC}_t(n)] = \frac{\mathrm{BCC}_t(n)}{[1 + z_t(n) + K]^t} \tag{1}$$

where

$\mathrm{PV}[\mathrm{BCC}_t(n)]$ = present value of the cash flow for the bond class for month t on path n
$\mathrm{BCC}_t(n)$ = cash flow for the bond class for month t on path n
$z_t(n)$ = spot rate for month t on path n
$K$ = spread

The present value for path n is the sum of the
present value of the cash flows for each month
on path n. That is,

$$\mathrm{PV}[\mathrm{Path}(n)] = \mathrm{PV}[\mathrm{BCC}_1(n)] + \mathrm{PV}[\mathrm{BCC}_2(n)] + \cdots + \mathrm{PV}[\mathrm{BCC}_M(n)] \tag{2}$$

where PV[Path(n)] is the present value of
interest rate path n.
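
Equations (1) and (2) translate directly into array operations. In the sketch below, the function name and the (months by paths) array layout are assumptions made for illustration, and both the spot rates and the spread K are periodic monthly rates in decimal form.

```python
import numpy as np

def path_present_values(bond_cash_flows, spot_rates, spread):
    """Return PV[Path(n)] for every path n.

    bond_cash_flows, spot_rates: 2-D arrays of shape (M, N), laid out
    like Tables 4 and 5 (months in rows, paths in columns).
    """
    cash = np.asarray(bond_cash_flows, dtype=float)
    z = np.asarray(spot_rates, dtype=float)
    months = np.arange(1, cash.shape[0] + 1).reshape(-1, 1)
    # Equation (1): discount each month's cash flow at z_t(n) + K.
    discounted = cash / (1.0 + z + spread) ** months
    # Equation (2): sum the discounted cash flows down each path.
    return discounted.sum(axis=0)

# One path, two months (illustrative numbers).
pv = path_present_values([[10.0], [110.0]], [[0.01], [0.02]], spread=0.005)
```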

Determining the Theoretical Value 

The present value of a given interest rate path 
is treated as the theoretical value of a bond class 
if that path is realized. The theoretical value 
of the bond class using the Monte Carlo simu¬ 
lation model is determined by calculating the 
average of the theoretical values of all the in¬ 
terest rate paths. That is, the theoretical value is 
equal to 

$$\text{Theoretical value} = \frac{\mathrm{PV}[\mathrm{Path}(1)] + \cdots + \mathrm{PV}[\mathrm{Path}(N)]}{N} \tag{3}$$

where N is the number of interest rate paths. 

Notice that the results of the Monte Carlo sim¬ 
ulation model produce one value, the average 
value, and that value is taken as the theoretical 
value. However, as noted earlier, the purpose 
of a Monte Carlo simulation model is to
estimate the probability distribution for the
variable of interest. While a probability distribution
can easily be obtained from the values for each
path, and summary information in addition to
the mean (such as dispersion and skewness
measures) can be computed, that information
is rarely provided. Basically, the reason
is that investors rarely seek that information
because too often they do not understand the
Monte Carlo simulation process.
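
As a small sketch of what that additional summary information could look like, the function below takes the present values of the individual paths and returns the mean, which is the theoretical value of equation (3), together with the standard deviation and skewness. The function name and the sample values are hypothetical.

```python
import numpy as np

def summarize_path_values(path_values):
    """Mean (the theoretical value) plus dispersion and skewness of path PVs."""
    values = np.asarray(path_values, dtype=float)
    mean = values.mean()
    std = values.std(ddof=0)          # population standard deviation
    # Skewness as the average cubed standardized deviation from the mean.
    skew = 0.0 if std == 0.0 else float(np.mean(((values - mean) / std) ** 3))
    return {"theoretical_value": float(mean), "std_dev": float(std), "skewness": skew}

summary = summarize_path_values([98.0, 100.0, 102.0])
```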

Moreover, it should be apparent how the 
Monte Carlo simulation model is driven by as¬ 
sumptions. Hence, a user of a model such as 
the one described here is subject to modeling 
risk. To mitigate modeling risk, an investor can 
test the sensitivity of the value produced by 
the model to alternative assumptions. For ex¬ 
ample, regarding the volatility assumption, the 
model can be rerun assuming both proportionally
lower and higher volatility than initially
assumed. The sensitivity to prepayments can 
be analyzed in the same way. From the sensitiv¬ 
ity analysis, an investor can determine which 
assumptions appear to be more important for 
the security being considered for purchase. 6 

Option-Adjusted Spread 

Thus far we have seen how the theoretical 
value of a security can be determined using the 
Monte Carlo simulation model. Recall that in 
the model, a spread ( K ) is added to the monthly 
spot rates on all the interest rate paths in Table 5 
in order to determine the discount rate used for 
calculating the present value of the cash flows. 
The spread should reflect the risk associated 
with the security as required by the market. 
However, the reverse can be done. Given (1) 
the cash flows in Table 4 for the bond class be¬ 
ing valued, (2) the spot rates in Table 5, and (3) 
the market price of the security being valued, 
one can determine the spread that will make 
the average value for the interest rate paths 
equal to the market price (plus accrued interest). 
That spread is what is referred to as the option- 
adjusted spread (OAS). Mathematically, OAS is 
the spread that will make 

$$\text{Market price} + \text{Accrued interest} = \frac{\mathrm{PV}[\mathrm{Path}(1)] + \cdots + \mathrm{PV}[\mathrm{Path}(N)]}{N} \tag{4}$$

where N is the number of interest rate paths. 
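
Because the spread enters every discount factor, equation (4) has no closed-form solution and is solved numerically. The following is a minimal sketch using bisection. The function names, the search bracket, and the sample cash flows are assumptions of this sketch, and the spread is expressed as a periodic monthly rate. It relies on the average path value falling as the spread rises, which holds when the cash flows are non-negative.

```python
import numpy as np

def average_path_value(cash, spot, spread):
    """Average over paths of PV[Path(n)] at a given spread."""
    months = np.arange(1, cash.shape[0] + 1).reshape(-1, 1)
    return (cash / (1.0 + spot + spread) ** months).sum(axis=0).mean()

def solve_oas(cash, spot, price_plus_accrued, lo=-0.005, hi=0.02, tol=1e-12):
    """Bisection search for the spread that satisfies equation (4)."""
    cash = np.asarray(cash, dtype=float)
    spot = np.asarray(spot, dtype=float)
    gap_lo = average_path_value(cash, spot, lo) - price_plus_accrued
    gap_hi = average_path_value(cash, spot, hi) - price_plus_accrued
    if gap_lo * gap_hi > 0.0:
        raise ValueError("spread bracket does not contain the OAS")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gap_mid = average_path_value(cash, spot, mid) - price_plus_accrued
        if gap_lo * gap_mid <= 0.0:
            hi = mid                     # root lies in [lo, mid]
        else:
            lo, gap_lo = mid, gap_mid    # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Three months, two paths (illustrative numbers laid out like Tables 4 and 5).
cash = np.array([[5.0, 20.0], [5.0, 85.0], [105.0, 0.0]])
spot = np.array([[0.004, 0.003], [0.0045, 0.0035], [0.005, 0.004]])
price = average_path_value(cash, spot, 0.002)   # price implied by a 0.2% monthly spread
oas = solve_oas(cash, spot, price)
```

Here the price is constructed from a known spread, so the solver should recover that spread. With a real market price, the same routine returns the OAS implied by the model's cash flows and spot rates.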


Basically, the OAS is used to reconcile the 
model's value [that is, the value determined 
by the Monte Carlo simulation model given by 
equation (3)] with the market price. On the left- 
hand side of equation (4) is the market's valua¬ 
tion of the security as represented by the market 
price. On the right-hand side of the equation 
is the model's evaluation of the security (that 
is, the theoretical value), which is the average 
present value over all the interest rate paths. 
Basically, the OAS was developed as a measure 
of the spread that can be used to convert dollar 
differences between model value and market 
price. But what is it a "spread" over? In de¬ 
scribing the model above, we can see that the 
OAS is measuring the average spread over the 
benchmark spot rate. It is an average spread 
since the OAS is found by averaging over the 
interest rate paths for the possible future bench¬ 
mark spot rate curves. 

This spread measure is superior to the nom¬ 
inal spread, which gives no recognition to the 
prepayment risk. The OAS is "option adjusted" 
because the cash flows on the interest rate paths 
are adjusted for the option of the borrowers to 
prepay. 

Option Cost 

The implied cost of the option embedded in a
security can be obtained by calculating the
difference between the zero-volatility spread
and the OAS. That is,

$$\text{Option cost} = \text{Zero-volatility spread} - \text{OAS}$$

The option cost measures the prepayment (or 
option) risk embedded in MBS and ABS. Note 
that the cost of the option is a by-product of the 
OAS analysis, not valued explicitly with some 
option pricing model. 
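
As a purely illustrative calculation (the spread levels below are hypothetical and not taken from any actual security), suppose a mortgage-related ABS has a zero-volatility spread of 110 basis points and an OAS of 75 basis points:

```latex
\[
\text{Option cost} = \text{Zero-volatility spread} - \text{OAS}
                   = 110\ \text{bp} - 75\ \text{bp} = 35\ \text{bp}
\]
```

Under these assumed numbers, 35 of the 110 basis points compensate the investor for the prepayment option held by the borrowers.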

When the option cost is zero because the
borrower tends not to exercise the prepayment
option when interest rates decline below the loan
rate or when there is no prepayment option,
then substituting zero for the option cost in the
previous equation and solving for the
zero-volatility spread, we get:

$$\text{Zero-volatility spread} = \text{OAS}$$

Consequently, when the value of the option is 
zero (that is, the option cost is zero) for a partic¬ 
ular ABS, simply computing the zero-volatility 
spread for relative value purposes or for valu¬ 
ing that ABS is sufficient. Even if there is a small 
value for the option, the zero-volatility spread 
should be adequate rather than calculating an 
OAS using the Monte Carlo simulation model. 


Simulated Average Life

The average life of a security when using the
Monte Carlo simulation model is the weighted
average time to receipt of principal payments
(scheduled payments and projected
prepayments). The average life reported in a Monte
Carlo model is the average of the average lives
along the interest rate paths. That is, for each
interest rate path, there is an average life. The
average of these average lives is the average life
reported by the model.

Additional information is conveyed by the
distribution of the average life. The greater the
range and standard deviation of the average
life, the more uncertainty there is about the
security's average life.


MEASURING INTEREST RISK

There are two measures of interest rate risk that
are commonly used: duration and convexity. 7
Duration is a first approximation as to how the
value of an individual security or the value
of a portfolio will change when interest rates
change. Convexity measures the change in the
value of a security or portfolio that is not
explained by duration. How these measures are
computed when using the Monte Carlo
simulation model is described in this section.

Duration

The most obvious way to measure a bond's
price sensitivity as a percentage of its current
price to changes in interest rates is to change
rates by a small number of basis points and
calculate how its price will change. To do this, we
introduce the following notation. Let

$V_0$ = initial value or price of the security
$\Delta y$ = change in the yield of the security (in decimal)
$V_-$ = the estimated value of the security if the yield is decreased by $\Delta y$
$V_+$ = the estimated value of the security if the yield is increased by $\Delta y$

There are two key points to keep in mind 
in the foregoing discussion. First, the change 
in yield referred to above is the same change 
in yield for all maturities. This assumption is 
commonly referred to as a "parallel yield curve 
shift assumption." Thus, the foregoing discus¬ 
sion about the price sensitivity of a security to 
interest rate changes is limited to parallel shifts 
in the yield curve. Second, the notation refers to 
the estimated value of the security. This value 
is obtained from a valuation model. Conse¬ 
quently, the resulting measure of the price sen¬ 
sitivity of a security to interest rate changes is 
only as good as the valuation model employed 
to obtain the estimated value of the security. 

Now let's focus on the measure of interest. We 
are interested in the percentage change in the 
price of a security when interest rates change. 
This measure is referred to as duration. It can 
be demonstrated that duration can be estimated 
using the following formula: 


$$\text{Duration} = \frac{V_- - V_+}{2V_0(\Delta y)} \tag{5}$$


The duration of a security can be interpreted 
as the approximate percentage change in price 
for a 100 basis point parallel shift in the yield 
curve. Thus, a bond with a duration of 5 will 
change by approximately 5% for a 100 basis 
point parallel shift in the yield curve. For a 
50 basis point parallel shift in the yield curve, 
the bond's price will change by approximately 
2.5%; for a 25 basis point parallel shift in the 
yield curve, 1.25%, and so on. 
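
A short numerical sketch of equation (5) and of the interpretation just given follows. The three values and the 25 basis point shock are hypothetical inputs chosen so that the duration works out to 5.

```python
def duration(v_minus, v_plus, v0, dy):
    """Equation (5): approximate duration from three model values."""
    return (v_minus - v_plus) / (2.0 * v0 * dy)

def approx_pct_price_change(dur, dy):
    """Approximate percentage price change (in percent) for a parallel yield change dy."""
    return -dur * dy * 100.0

# Hypothetical values from a valuation model for a 25 basis point shock.
dur = duration(v_minus=101.30, v_plus=98.80, v0=100.0, dy=0.0025)
change_100bp = approx_pct_price_change(dur, 0.01)    # about -5%
change_50bp = approx_pct_price_change(dur, 0.005)    # about -2.5%
```

The negative sign reflects that prices fall when yields rise. A yield decline of the same size gives the same percentage with the opposite sign.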

What this means is that in calculating the
values of $V_-$ and $V_+$ in the duration formula, the
same cash flows used to calculate $V_0$ are used.
Therefore, the change in the bond's price when 
the yield curve is shifted by a small number of 
basis points is due solely to discounting at the 
new yields. This assumption makes sense for 
option-free bonds such as Treasury securities 
and nonmortgage ABS such as credit card ABS 
and auto loan-backed ABS. However, the same 
cannot be said for MBS and mortgage-related 
ABS because for these products the cash flows 
are sensitive to changes in interest rates. Rather, 
for these products a change in yield will alter 
the expected cash flows because it will change 
expected prepayments. 

The Monte Carlo simulation model takes into 
account how parallel shifts in the yield curve 
will affect the cash flows. Thus, when $V_-$ and
$V_+$ are the values produced from the valuation
model, the resulting duration takes into account 
both the discounting at different interest rates 
and how the cash flows can change. When du¬ 
ration is calculated in this manner, it is referred 
to as effective duration or option-adjusted duration. 

To calculate effective duration, the value of a 
security must be estimated when interest rates 
are shocked (that is, changed) up and down a 
given number of basis points. In terms of the 
Monte Carlo simulation model, the yield curve 
used is shocked up and down and the new 
curve is used to generate the values to be used 
in equation (5) to obtain effective duration. 

There are two important aspects of this pro¬ 
cess of generating the values when the rates are 
shocked that are critical to understand. First, the 
assumption is that the relationships assumed 
do not change when rates are shocked up and 
down. Specifically, (1) the interest rate volatil¬ 
ity is assumed to be unchanged to derive the 
new interest rate paths for a given shock (that 
is, the new Table 1), as well as the other assump¬ 
tions made to generate the new Table 2 from the 
newly constructed Table 1, and (2) the OAS is 
assumed to be constant. The constancy of the 
OAS comes into play because when discount¬ 


ing the new cash flows (that is, the cash flows in 
the new Table 4), the current OAS that was com¬ 
puted is assumed to be the same and is added 
to the new rates in the new Table 1. 
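
The shocking procedure can be sketched as a function that reruns a valuation model three times with the spread held constant. In the sketch below, `toy_value` is a hypothetical two-period stand-in for the full Monte Carlo model: its linear prepayment rule and all numbers are invented for illustration, and the only point is that its cash flows respond to the rate shift.

```python
def effective_duration(value_at_shift, shock=0.0025):
    """Equation (5), with each value coming from a full model rerun."""
    v0 = value_at_shift(0.0)
    v_minus = value_at_shift(-shock)
    v_plus = value_at_shift(+shock)
    return (v_minus - v_plus) / (2.0 * v0 * shock)

def toy_value(shift, cash_flows_respond=True, base_yield=0.05, oas=0.0):
    """Hypothetical two-year, 6% coupon security valued at a flat annual yield."""
    y = base_yield + shift
    # Placeholder rule: more principal is prepaid in year 1 as rates fall.
    response = shift if cash_flows_respond else 0.0
    prepaid_share = min(max(0.5 - 20.0 * response, 0.0), 1.0)
    cf1 = 6.0 + 100.0 * prepaid_share
    cf2 = 100.0 * (1.0 - prepaid_share) * 1.06
    discount = 1.0 + y + oas          # the OAS is held constant across shocks
    return cf1 / discount + cf2 / discount ** 2

eff = effective_duration(toy_value)
fixed = effective_duration(lambda s: toy_value(s, cash_flows_respond=False))
```

In this toy, the duration computed with responsive cash flows (`eff`) comes out lower than the one computed with the cash flows frozen at their base-case levels (`fixed`), because faster prepayments when rates fall limit the price gain of a security trading above par.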


Convexity 

The duration measure indicates that regardless 
of whether interest rates increase or decrease, 
the approximate percentage price change is the 
same. However, this does not agree with the 
price volatility property of a bond. Specifically, 
while for small changes in yield the percentage 
price change will be the same for an increase or 
decrease in yield, for large changes in yield this 
is not true. This suggests that duration is only 
a good approximation of the percentage price 
change for a small change in yield. 

The reason for this result is that duration is 
in fact a first approximation for a small change 
in yield. The approximation can be improved 
by using a second approximation. This approx¬ 
imation is referred to as "convexity." (The use 
of this term in the industry is unfortunate since 
the term "convexity" is also used to describe 
the shape or curvature of the price/yield
relationship.) The convexity measure of a security
can be used to approximate the change in price 
that is not explained by duration. 

The convexity measure of a bond can be ap¬ 
proximated using the following formula: 


$$\text{Convexity measure} = \frac{V_+ + V_- - 2V_0}{2V_0(\Delta y)^2} \tag{6}$$


where the notation is the same as used earlier 
for duration. When the values for the inputs 
in the convexity measure as given in equation 
(6) are obtained from a Monte Carlo simulation 
model, the resulting convexity is referred to as 
effective convexity. Note that dealers often quote 
convexity by dividing the convexity measure 
by 100. 

When the convexity measure is positive, we 
have the situation where the gain is greater than 
the loss for a given large change in rates. That 
is, the security exhibits positive convexity. Most 
nonmortgage ABS have positive convexity. 
However, if the convexity measure is negative,
we have the situation where the loss will be 
greater than the gain. A security with this char¬ 
acteristic is said to have negative convexity and 
it occurs with MBS and mortgage-related ABS. 
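
The sign of the convexity measure can be checked with a short sketch of equation (6). The model values below are hypothetical: in the first case the gain from a rate decline exceeds the loss from an equal rate increase, and in the second case the loss exceeds the gain.

```python
def convexity_measure(v_minus, v_plus, v0, dy):
    """Equation (6): convexity measure from three model values."""
    return (v_plus + v_minus - 2.0 * v0) / (2.0 * v0 * dy ** 2)

# Gain of 1.30 versus loss of 1.20 for a 25 basis point shock.
positive = convexity_measure(v_minus=101.30, v_plus=98.80, v0=100.0, dy=0.0025)

# Gain of 1.00 versus loss of 1.20 for the same shock.
negative = convexity_measure(v_minus=101.00, v_plus=98.80, v0=100.0, dy=0.0025)
```

With these inputs the first measure is about 80 and the second about -160. Under the dealer convention of dividing by 100, they would be quoted as roughly 0.80 and -1.60.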

KEY POINTS 

• Valuing securities with interest rate-sensitive 
options requires the employment of a model 
that recognizes how future interest rates can 
change and how that impacts the expected 
cash flows. 

• For bonds with embedded options such as 
callable bonds and putable bonds, as well as 
bonds that have an accelerated sinking fund 
provision, the lattice method can be used. 
Unfortunately, the lattice model cannot be
used for MBS and mortgage-related ABS
because these securities have path-dependent
cash flows, and a lattice does not keep track of
how interest rates evolved to reach a given node.

• Instead of the lattice model, the Monte Carlo 
simulation model is used to value MBS and 
mortgage-related ABS. There are many as¬ 
sumptions in the model and therefore, sen¬ 
sitivity analysis should be used to test the 
sensitivity of the model's value to changes 
in the major assumptions. 

• For ABS that do not have an embedded op¬ 
tion (that is, no prepayment option) or where 
there is a prepayment option but for all in¬ 
tents and purposes the prepayment option is 
unlikely to be exercised, valuation is fairly 
straightforward—assuming a good model for 
estimating defaults and recoveries. It is sim¬ 
ply the present value of the expected cash 
flow discounted at the benchmark spot rates 
plus an appropriate spread. 

• The cash-flow yield measure for MBS and 
ABS is a flawed measure of value. The cor¬ 
responding nominal spread is therefore sim¬ 
ilarly flawed. A better measure for ABS 
where the prepayment option has little value 
is the zero-volatility spread. For MBS and 
mortgage-related ABS, the commonly used 


measure is the OAS. This measure adjusts the 
spread for the embedded option by adjusting 
the cash flows in the Monte Carlo simulation 
model (as well as in the lattice model). 

• Because the OAS is derived from the 
Monte Carlo simulation model, it is also 
an assumption-driven product and therefore 
subject to modeling risk. 

• The appropriate interest risk measures for 
MBS and mortgage-related ABS are effective 
duration and effective convexity. These mea¬ 
sures require, as inputs, the estimated value of 
the security obtained by shocking the Monte 
Carlo simulation model. 

NOTES 

1. For a discussion of MBS, see Fabozzi, 
Bhattacharya, and Berliner (2011). Asset-
backed securities are described in Fabozzi
(2012).

2. For a discussion of prepayment models 
for MBS, see Fabozzi, Bhattacharya, and 
Berliner (2011). 

3. PACs are described in Fabozzi, Bhat¬ 
tacharya, and Berliner (2011). 

4. For applications of Monte Carlo simulation 
to finance, see Pachamanova and Fabozzi
(2010).

5. Variance-reduction methods in Monte 
Carlo simulation are explained in 
Pachamanova and Fabozzi (2010). 

6. For an illustration applied to an actual 
CMO transaction, see Fabozzi, Richard, and 
Horowitz (2006). 

7. For an explanation of duration and convex¬ 
ity, see Fabozzi (1999, 2011). 

Abstract: Even a simple mortgage pass-through is a path-dependent financial instrument, valuation
of which depends on prepayment "burnout." The burnout is caused by observed or unobserved 
heterogeneity of borrowers; as a result, a mortgage pool's composition changes in the presence of 
refinancing incentives. An attractive modeling approach for dealing with this is to split a mort¬ 
gage pool into mutually exclusive "active" and "passive" groups. Not only does such a method 
explain the burnout, it effectively decomposes the path-dependent valuation problem into two 
easy-to-solve path-independent ones. The method is faster than the traditional Monte Carlo sam¬ 
pling approach while delivering the full set of interest rate risk measures at no additional cost 
of computing time. The method can be applied to an attractive prepayment model specification 
where the speed is a function of the pool's objective price, and not an interest rate. This makes 
universal refinancing modeling feasible as the same curve or curves can apply to both fixed- and 
adjustable-rate mortgages. 


The active-passive decomposition (APD) method
of mortgage-backed securities (MBS) modeling
and valuation was introduced in Levin (2001,
2002, 2003). An efficient alternative to
brute-force Monte Carlo simulation, the APD
method splits a mortgage pass-through into two
path-independent components, the active
(refinanceable) and the passive
(nonrefinanceable). Once this is done, the most
time-efficient pricing structures operating
backwards on probability trees or
finite-difference grids can be employed. This
valuation method runs faster than Monte Carlo
simulation while delivering a much richer
outcome (all stressed values required by
mandatory risk assessments) at no additional
cost. Risk managers and traders of unstructured
mortgage instruments such as agency
pass-through MBS, whole loans, stripped (IO/PO)
derivatives, and mortgage servicing rights
(MSRs) are immediate beneficiaries of the
method.

The APD approach simulates the burnout effect
in a natural and explicit way through modeling
the heterogeneity of the collateral. Hence,
it presents an analytical advantage over any
other approach that requires ad hoc judgments
about the achieved degree of burnout.
Structured instruments, such as a
collateralized mortgage obligation (CMO) and
asset-backed security (ABS), though they retain
heavy sources of path-dependence (other than
the burnout) and still rely on Monte Carlo
pricing, can benefit from better, more robust
prepay modeling.

The extended APD model and its implementation presented here have greatly benefited from joint work
with Andrew Davidson and Dan Szakallas.

The multi-population view of mortgage col¬ 
lateral is a known approach used to explain 
the burnout effect. In one of the earliest mod¬ 
eling attempts, Davidson (1987) and Davidson 
et al. (1988) proposed the refinancing threshold 
model, in which collateral is split into three or 
more American option bonds having differing 
strikes. A conceptually similar approach pro¬ 
posed by Kalotay and Young (2002) divides 
collateral into bonds differing by their exercise 
timing. Such structures naturally call for the 
backward induction pricing, but they fall short 
in replicating actually observed, probabilisti¬ 
cally smoothed, prepayment behavior—even if 
many constituent bonds are used. On the other 
hand, analytical systems used in practice of¬ 
ten employ multi-population mortgage models 
(see Hayre, 1994, 2000), but do not seek any 
computational benefits as they rely heavily on 
Monte Carlo simulation pricing anyway. 

The APD is a "mortgage-like" model with 
refinancing S-curve, aging, and other ad hoc 
features, which are meant to capture noneffi¬ 
cient, empirical option exercise. Therefore, the 
APD model is capable of generating realistic 
prepayment behavior with only two constituent 
components, the active and the passive. This en¬ 
try introduces an extended APD model and its 
applications. 


PATH-DEPENDENCE AND 
PRICING PARTIAL 
DIFFERENTIAL EQUATION 

Let us consider a hypothetical dynamic asset
(a "mortgage") whose market price P(t, x)
depends on time t and one market factor x. The
latter can be formally anything and does not
necessarily have to be the short market rate or
the yield on the security analyzed. We treat x(t)
as a random process having a (generally
variable) drift rate μ and a volatility rate σ,
and being disturbed by a standard Brownian
motion z(t), that is,

$$dx = \mu\,dt + \sigma\,dz \tag{1}$$


We assume further that the asset continuously
pays the c(t, x) coupon rate and its balance B
is amortized at the λ(t, x) rate, that is,
dB/dt = −λB. Then one can prove that the price
function P(t, x) should solve the following
partial differential equation (PDE):


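$$
\underbrace{r + \mathrm{OAS}}_{\text{expected return}}
= \underbrace{\frac{1}{P}\frac{\partial P}{\partial t}
  + \frac{1}{P}(c + \lambda) - \lambda}_{\text{time return}}
+ \underbrace{\frac{1}{P}\,\mu\,\frac{\partial P}{\partial x}}_{\text{drift return}}
+ \underbrace{\frac{1}{2P}\,\sigma^{2}\,\frac{\partial^{2} P}{\partial x^{2}}}_{\text{diffusion return}}
\tag{2}
$$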


A derivation of this PDE can be found in Levin
(1998), but it goes back at least to Fabozzi and
Fong (1994). A notable feature of the PDE
written above is that it does not contain the
balance variable, B. The entire effect of
possibly random prepayments is represented by
the amortization rate function, λ(t, x).
Although the total cash flow observed for each
accrual period does depend on the
beginning-period balance, construction of a
finite-difference scheme and the backward
induction will require the knowledge of
λ(t, x), not the balance. This observation
agrees with a trivial practical rule stating
that the relative price is generally independent
of the investment size.

Another interesting observation follows. If we
transform the economy by shifting all the
rates, r(t, x) and c(t, x), by the amortization
rate λ(t, x), then PDE (2) reduces to the
constant-par asset's pricing PDE. It means that
a probability tree or finite-difference pricing
grid built in the "λ-shifted" economy should,
in principle, have as many dimensions as the
total number of factors or state variables that
affect r, c, and λ. In particular, if the
coupon rate is fixed, and the amortization
rate λ depends only on current time (loan age)
and the immediate market factor x, the entire
valuation problem can be solved backwards on
a two-dimensional (x, t) lattice. To implement
this method, we would start our valuation
process from maturity T when we surely know
that the price is par, P(T, x) = 1, regardless
of the value of factor x.
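
To make the shift explicit, one can multiply PDE (2) by P and move
the term −λP to the left-hand side. Writing μ and σ for the drift
and volatility of x from equation (1), this short rearrangement
(offered here as a reading aid, not as part of the original
derivation) gives

$$
(r + \lambda + \mathrm{OAS})\,P
= \frac{\partial P}{\partial t} + (c + \lambda)
+ \mu\,\frac{\partial P}{\partial x}
+ \frac{1}{2}\,\sigma^{2}\,\frac{\partial^{2} P}{\partial x^{2}}
$$

With the shifted rates r' = r + λ and c' = c + λ, this has the form
of the pricing PDE for an asset that pays the coupon c' on a
constant par balance and is discounted at r' + OAS.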

Working backwards, we derive prices at age
t − 1 from prices already found at age t. In
doing so, we replace derivatives in PDE (2) by
finite-difference approximations, or weigh
branches of the lattice by explicitly computed
probabilities. If the market is multifactor,
then x should be considered a vector; the
lattice will require more dimensions.
Generally, the efficiency of finite-difference
methods deteriorates quickly on
high-dimensional grids because the number of
nodes and cash flows grows geometrically;
probability trees may maintain their speed, but
at the cost of accuracy, if the same number of
emanating nodes is used to capture multifactor
dynamics. If we decide to operate on a
probability tree instead of employing a
finite-difference grid, then, for every branch,


$$
P_k = \frac{c_k + P_{k+1} + \lambda_k\,(1 - P_{k+1})}{1 + r_k + \mathrm{OAS}}
\tag{3}
$$


where P_k is the previous-node value deduced
from the next-node value P_{k+1}. Of course,
probability weighting of the values thus
obtained applies to all emanating branches.
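
The backward step can be sketched in a few lines of code. The
following is a minimal illustration, not the author's
implementation: it assumes a recombining binomial lattice with a
hypothetical rate rule r = r0 + dr × (2j − k), equal branch
probabilities by default, and a user-supplied function smm(k, r)
playing the role of λ. All names are introduced here for
illustration only.

```python
def price_on_lattice(n_steps, coupon, smm, r0, dr, oas=0.0, p_up=0.5):
    """Root price (per unit of balance) of a path-independent amortizing asset.

    Node (k, j) has j up-moves after k steps and a one-period rate of
    r0 + dr * (2*j - k); this rate rule is an illustrative assumption.
    smm(k, r) returns the amortization rate lambda_k at that node.
    """
    # Terminal condition: the price is par at maturity for every rate level.
    prices = [1.0] * (n_steps + 1)
    for k in range(n_steps - 1, -1, -1):
        new_prices = []
        for j in range(k + 1):
            r = r0 + dr * (2 * j - k)
            lam = smm(k, r)
            disc = 1.0 + r + oas
            # Branch formula: coupon, next-node price, and prepayment at par.
            down = (coupon + prices[j] + lam * (1.0 - prices[j])) / disc
            up = (coupon + prices[j + 1] + lam * (1.0 - prices[j + 1])) / disc
            # Probability weighting of the two branch values.
            new_prices.append(p_up * up + (1.0 - p_up) * down)
        prices = new_prices
    return prices[0]
```

Calling the function with dr = 0 and a coupon equal to the rate
returns par for any prepayment speed, which is a convenient sanity
check; a rate-dependent smm and dr > 0 turn it into a small
stochastic valuation.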


EXTENDED ACTIVE-PASSIVE 
DECOMPOSITION MODEL 

Even for a simple fixed-rate mortgage
pass-through, total amortization speed λ cannot
be modeled as a function of time and the
immediate market. Prepayment burnout is a
strong source of path-dependence because the
future refinancing activity is affected by the
past incentives. One can think of a mortgage
pool as a heterogeneous population of
participants having different refinancing
propensities. Some borrowers have higher rates,
better credit, larger loans, or perhaps they
face smaller state-enforced transaction costs.
Once they leave the pool, the future prepayment
activity gradually declines.

Instead of considering the pricing PDE for the
entire collateral, we propose decomposing it
first into two components, "active" and
"passive," differing in refinanceability. Under
the following two conditions, path-dependent
mortgage collateral can be deemed a simple
portfolio of two path-independent instruments:

1. Active and passive components prepay
differently, but follow the immediate market
and loan age.

2. Any migration between components is
prohibited.

The Details 

Here is a permissible example: 

$$
\begin{aligned}
\text{ActiveSMM} &= \text{RefiSMM} + \text{TurnoverSMM} \\
\text{PassiveSMM} &= \beta \cdot \text{RefiSMM} + \text{TurnoverSMM}
\end{aligned}
\tag{4}
$$

where RefiSMM denotes refinancing speed
measured in terms of the single monthly
mortality rate (SMM), TurnoverSMM is the
turnover speed, and both are assumed to depend
on market rates and loan age only. Parameter β
quantifies relative refinancing activity for
the passive component; it takes values between
0 and 1.

In order to find the total speed, we have to
know the collateral composition. Denote by ψ
the ratio of the active group to the total; then

$$
\lambda = \text{TotalSMM}
= \psi \cdot \text{ActiveSMM} + (1 - \psi) \cdot \text{PassiveSMM}
\tag{5}
$$

All variables are time-dependent, but we
omitted subscript t for simplicity. The initial
value of ψ, denoted ψ_0, describes the
composition of collateral at origination; both
ψ_0 and β are parameters of the particular
prepay model. The dynamic evolution of ψ from
one time moment (t) to the next (t + 1) is as
follows:

$$
\psi_{t+1} = \psi_t \, \frac{1 - \text{ActiveSMM}_t}{1 - \text{TotalSMM}_t}
\tag{6}
$$

It is worth considering a few trivial special
cases. First, if ψ is zero at any instance of
time, it will remain zero for life. Second, if
ψ is 1 at any time, then it will retain this
value as well because TotalSMM is identical to
ActiveSMM from equation (5). Indeed, if the
mortgage pool is either totally passive (ψ = 0)
or totally active (ψ = 1), it will retain its
status due to the complete absence of
migration. In either of these two special
cases, variables ψ and TotalSMM are
path-independent, leading us to a key
conclusion: The separate consideration of
active and passive components avoids the
problem of path-dependence altogether.

How the Model Works Forward 

If 0 < ψ < 1, then TotalSMM < ActiveSMM,
the fraction on the right-hand side of formula
(6) is less than 1, and ψ gradually falls. If we
employed the APD model for prepay modeling
while using Monte Carlo simulation for
valuation, we could update compositional
variable ψ month after month. First, we would
compute refinancing and turnover speeds at time
t from their respective models. Then, we would
produce active, passive, and total speeds, all
still at time t, from formulas (4) and (5). This
information is not only sufficient to generate
the t-month cash flow, but it also allows for
finding the next-month composition, ψ_{t+1},
from formula (6), and proceeding forward.

Note that prepay speeds RefiSMM and
TurnoverSMM depend only on current market rates
and time, that is, they are path-independent.
Naturally, ActiveSMM and PassiveSMM found
from (4) will be path-independent as well. In
contrast, variables ψ and TotalSMM are
generally path-dependent except when ψ is
either 0 or 1.
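
The month-by-month recursion can be sketched as follows. This is a
minimal illustration under simplifying assumptions: the refinancing
and turnover speeds are held constant (in practice they would come
from their respective models at each time step), and the function
name and arguments are hypothetical.

```python
def simulate_composition(psi0, refi_smm, turnover_smm, beta, months):
    """Run formulas (4)-(6) forward with constant monthly speeds.

    Returns two lists: psi_t and TotalSMM_t for t = 0 .. months - 1.
    """
    active = refi_smm + turnover_smm            # formula (4), active part
    passive = beta * refi_smm + turnover_smm    # formula (4), passive part
    psi, psis, totals = psi0, [], []
    for _ in range(months):
        total = psi * active + (1.0 - psi) * passive   # formula (5)
        psis.append(psi)
        totals.append(total)
        psi = psi * (1.0 - active) / (1.0 - total)     # formula (6)
    return psis, totals
```

With 0 < psi0 < 1, a positive refinancing speed, and beta < 1, both
returned series decline over time even though the input speeds never
change, which is the burnout effect in its simplest form.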

Let us visualize how the APD model works.
Suppose we have a pool with ψ_0 = 0.8, that is,
the active part constitutes 80% of the total at
origination. Consider two possible scenarios:

Scenario A: Rates drop and remain low, induc¬ 
ing refinancing activity. 

Scenario B: Rates rise and remain high. 


Figures 1A and 1B show how the pool
composition will evolve in these two cases. For
scenario A, pool balance is amortized quickly
due to the refinancing wave, but, more
importantly, the active group (darker bars)
evaporates much faster than the passive group
(lighter bars). As a result, variable ψ drops
from the original 80% to under 30% and,
correspondingly, the total speed (as measured
by conditional prepayment rate and denoted by
CPR) declines, in the complete absence of any
rate dynamics. A sizable speed reduction from
45 CPR to 30 CPR is caused exclusively by the
burnout effect and reflected by ψ. This effect
is not seen in scenario B where the active and
the passive groups retire at similar rates.
Pool composition barely changes, as does the
total prepayment speed.

We could give the prepayment behaviors
depicted in Figures 1A and 1B another
interesting practical interpretation. Let us
assume that we wish to compare a regular
fixed-rate pool (Figure 1A) with a
prepayment-penalty pool (Figure 1B) under the
same low-rate market conditions. The regular
pool burns out, unlike the prepay-penalty one,
which faces additional refinancing barriers. At
the end of its penalty window (assume 60
months), this pool retains a relatively high
level of ψ (71.7%). Looking at a matching speed
level in Figure 1A, we conclude that, once the
penalty window is over, the prepay speed will
jump above 40 CPR (compared to 29 CPR of the
regular pool). Therefore, the APD model
naturally explains the "catch-up" effect
actually known for prepay-penalty mortgages.

Above, we assumed a newly originated pool,
the population of which is determined by
parameter ψ_0. In practice, a pool may be
already seasoned, and today's value of ψ,
denoted ψ(t_0), needs to be determined first.
We will cover this task shortly.

Figure 1 Simple APD Model: How It Works Forward

How the Model Works in
Backward Induction

If we decide to employ the APD model for
backward valuation, we do not need to update
the path-dependent variables, ψ and TotalSMM,
or keep track of their dynamics. Here are a few
simple steps to perform:

Step 1: Recover today's value of the population
variable, ψ(t_0).

Step 2 Active: Generate cash flows on each node
of a pricing grid (tree) for the active part
only and value it using a backward induction
scheme that solves pricing equation (2).

Step 2 Passive: Do the same for the passive part.

Step 3: Combine the values thus obtained as

$$
P = \psi(t_0)\,P_{\text{active}} + [1 - \psi(t_0)]\,P_{\text{passive}}
\tag{7}
$$

Interestingly enough, formula (7) applies to
today's prices obtained for all interest rate
levels of the pricing grid. As we mentioned
above, computing prices on the entire grid is an
inseparable part of backward valuation.
Therefore, the total price can also be found on
the grid, at no additional cost. In particular,
the measures representing the sensitivities of
an MBS price to the interest rates are found
immediately, without any repetitive efforts
with a stressed market (compare to Monte Carlo
simulation). However, we cannot apply formula
(7) at future nodes because we know only
ψ(t_0), today's value of ψ.
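
The equivalence behind Steps 1 to 3 can be checked numerically on a
single deterministic scenario. The sketch below is an illustration
only: it assumes a flat rate, constant speeds, no scheduled
amortization, and repayment of the remaining balance at par at
maturity; all function names are hypothetical. It compares the
two-component backward valuation with a forward, path-dependent
valuation of the whole pool in which ψ is evolved month by month.

```python
def backward_price(n, coupon, smm, r, oas=0.0):
    """Backward recursion of the branch formula with constant rate and speed."""
    price = 1.0  # par at maturity
    for _ in range(n):
        price = (coupon + price + smm * (1.0 - price)) / (1.0 + r + oas)
    return price


def apd_price(psi0, n, coupon, refi, turnover, beta, r):
    """Value each component backward, then combine them with today's psi."""
    p_active = backward_price(n, coupon, refi + turnover, r)
    p_passive = backward_price(n, coupon, beta * refi + turnover, r)
    return psi0 * p_active + (1.0 - psi0) * p_passive


def forward_price(psi0, n, coupon, refi, turnover, beta, r):
    """Path-dependent check: discount pool cash flows while evolving psi."""
    active, passive = refi + turnover, beta * refi + turnover
    psi, balance, pv = psi0, 1.0, 0.0
    for t in range(n):
        total = psi * active + (1.0 - psi) * passive       # total pool speed
        pv += balance * (coupon + total) / (1.0 + r) ** (t + 1)
        psi = psi * (1.0 - active) / (1.0 - total)          # composition update
        balance *= 1.0 - total
    return pv + balance / (1.0 + r) ** n  # remaining balance repaid at par
```

The two functions agree to rounding error for the same inputs,
although only forward_price ever tracks ψ beyond its initial value.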


Initializing the Burnout Factor 

If the pool is already seasoned, we have to
assess ψ(t_0) first before we can employ the
APD model either for forward simulation or
backward induction. There exist two main
approaches to solve this problem: an analytical
closed-form method and historical simulation.

Suppose that we know the pool's age, t_0,
factor, F(t_0), and a constant turnover rate,
λ_turnover (see note 1). Then, we can assess
the turnover factor
F_turnover(t_0) = exp(−λ_turnover t_0) along
with the scheduled factor, F_scheduled(t_0).
Since the entire pool's amortization is driven
by refinancing, turnover, and the scheduled
payoff, the knowledge of two out of three
factors along with the total pool's factor is
enough to restore the entire time-t_0
composition. It is easy to show that the
unknown ψ(t_0) satisfies the following,
generally transcendental, algebraic equation:

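$$
\psi(t_0) + \alpha\,\psi(t_0)^{\beta} = 1
\tag{8}
$$

where α is a known parameter:

$$
\alpha = \frac{1 - \psi_0}{\psi_0^{\beta}}
\left[ \frac{F_{\text{turnover}}(t_0)\,F_{\text{scheduled}}(t_0)}{F(t_0)} \right]^{1-\beta}
$$

and β is the same speed-reducing multiplier
that enters the APD model (4).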

Of course, no numerical iterations are needed
if β is 0, 1, or 0.5. For instance, β = 1 is a
trivial case when the pool is homogeneous and
is not subject to burnout, ψ(t_0) = ψ_0. Case
β = 0 was considered in Levin (2001, 2002); it
leads to

$$
\psi(t_0) = 1 - (1 - \psi_0)\,
\frac{F_{\text{turnover}}(t_0)\,F_{\text{scheduled}}(t_0)}{F(t_0)}
$$

A simple quadratic equation for ψ(t_0)
(quadratic in its square root) arises when
β = 0.5, with only one meaningful positive
solution. For all other values of β, numerical
methods will suffice.
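
As one possible numerical method, the sketch below solves equation
(8) by bisection. It reads the equation as ψ + αψ^β = 1 with
α = (1 − ψ_0)/ψ_0^β × [F_turnover F_scheduled / F]^(1−β), which is
our reading of the formulas in this section; the function names are
hypothetical. Bisection is chosen only for its simplicity: on
[0, 1] the left-hand side increases with ψ, so the root is unique
whenever one exists.

```python
def alpha_param(psi0, beta, f_turnover, f_scheduled, f_total):
    """Parameter alpha of the burnout equation (as read in the lead-in)."""
    ratio = f_turnover * f_scheduled / f_total
    return (1.0 - psi0) / psi0 ** beta * ratio ** (1.0 - beta)


def solve_psi(alpha, beta, tol=1e-14):
    """Bisection for psi in [0, 1] solving psi + alpha * psi**beta = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + alpha * mid ** beta < 1.0:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies at or below mid
    return 0.5 * (lo + hi)
```

A quick consistency check is to build a pool factor from a known
composition (active balance reduced by a refinancing factor R,
passive balance by R^β) and confirm that solve_psi recovers the
active share.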

Solving equation (8) is an attractive way to
initialize the burnout stage, as it does not
require historical simulation of past
refinancing incentives. However, it is valid
only for very specific forms of the APD model,
presented by formulas (4) and (5). Any possible
extension of the model (such as discussed
below) will make it impossible to recover the
burnout stage using the pool's factor and age
information only. An alternative method to
estimate ψ(t_0) would be a historical
simulation of all prepayment components, that
is, running the APD model forward from a pool
origination until today. A relevant historical
interest rate dataset will be required to
facilitate this process.


EXTENSIONS AND NUANCES 

In this section, we discuss several possibilities 
of exploring and extending the APD frame¬ 
work. We complete the section by disclosing 
its expected accuracy and limitations. 

Computing Interest Rate 
Sensitivities Directly Off a 
Pricing Tree 

Let us illustrate how interest rate exposures can
be efficiently computed using prices produced
on a pricing tree. The idea is to augment the
tree with "ghost nodes" as shown in Figure 2;
for simplicity and clarity, we illustrate the idea
with a recombining binomial tree (see note 2).

The tree contains the usual nodes and links
(solid lines) that refer to market conditions
(interest rates) and their changes. The root
node refers to today's market. We assume
application of the pricing formula (3) for every
transition. We carry this process from maturity
backward until we reach the root. This process
is carried out separately for the Active and
Passive components of the mortgage pool; at the
root, we combine prices using formula (7).

Figure 2 Extended Pricing Tree

Let us augment the actual tree with some
nodes marked by "Up" and "Down" in Figure 2.
Those nodes cannot be reached from the root,
but can be perceived theoretically as results
of immediate market shocks. We can add as many
nodes at time t = 0 as we would like. These
nodes and the emanating transitions are marked
by dashed lines in Figure 2. If we assign
transitional probabilities according to the
law of our interest rate model and carry out
the backward valuation process, we will end up
with Active, Passive, and Total prices at time
t = 0. We can now measure duration and
convexity using up and down shifts in the
interest rate factor; we can also compile a
risk report covering a substantial range of
interest rate moves. These calculations will
require carrying out the backward induction
algorithm on a somewhat expanded tree, but
otherwise, no extra computing efforts.
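
Once the root and ghost-node prices are available, the risk
measures follow from standard central differences. The sketch below
uses the usual effective duration and effective convexity formulas;
it assumes the "Up" and "Down" nodes sit at equal distances (shift)
above and below the root in the chosen rate factor, and the function
name is hypothetical.

```python
def effective_duration_convexity(p_down, p_base, p_up, shift):
    """Central-difference risk measures from three t = 0 node prices.

    p_down / p_up are the prices at the nodes where the rate factor is
    lower / higher than at the root by `shift` (in decimal, for example
    0.0025 for 25 basis points); p_base is the root price.
    """
    duration = (p_down - p_up) / (2.0 * p_base * shift)
    convexity = (p_down + p_up - 2.0 * p_base) / (p_base * shift ** 2)
    return duration, convexity
```

Applied to the Active, Passive, and Total prices in turn, the same
function yields the component and pool sensitivities from one
backward pass.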

One practical question a user may have is
whether interest rate shocks that are reflected
in the up, the down, and other nodes are, in
fact, parallel moves. In most cases, they are
not. Each node of the valuation tree represents
the full set of market conditions altered by a
single factor (e.g., the short rate). The
entire yield curve becomes known via the
relevant law of the term structure model. For
example, long rates move less than the short
rate if the single-factor model is mean
reverting; the rate's move may be comparable on
a relative, not absolute, basis if the model is
lognormal, and so on. These examples illustrate
nonparallel moves in the yield curve. In these
cases, it would be practically advisable to
measure the Greeks with respect to the "most
important" rate, such as the MBS current coupon
rate or the 10-year reference rate.

Among a vast family of known short-rate
models, there exists one special model whose
internal law is consistent with the notion of
parallel shocks. This is the Hull-White model
with a zero mean reversion, also known as the
Ho-Lee model (see note 3 and, for example,
Hull, 2005). When the short rate moves by x
basis points, every zero-coupon rate will move
by the same amount, regardless of its maturity.

If the Ho-Lee model is not employed and the 
sensitivity to parallel shocks of interest rates is 
a must (no approximation accepted), the tree- 
based valuation will have to be repeated using 
user-defined parallel moves of the yield curve. 
Whereas some advantages of the backward in¬ 
duction's superior speed will be forfeited, the 
method will still stand as a viable alternative to 
the Monte Carlo method. 

More Components, More 
Prepay Sources 

The APD model given by (4), (5), and (6)
is a two-component pool model exposed to
two sources of prepayment, refinancing and
turnover. Each of these features can be
generalized. A mortgage pool can be thought of
as a blend of many prepayment patterns
(superactive, active, moderately active, and so
on). On the other hand, there may exist
prepayment sources that contribute to each of
the groups, but are distinctly different from
refinancing and turnover. Let us briefly
discuss both ways to extend the model.

As we already pointed out, even a
two-component model ensures smooth prepayment
behavior if each component does so. Within the
APD framework, a refinancing model may include
the traditional S-like curve, aging, and
perhaps some other known empirical mortgage
effects that can be attributed to a nonoptimal
option exercise. The total prepayment speed is
proven to be between RefiSMM and TurnoverSMM,
being continuously weighted as controlled by
variable ψ(t). Adding more components into the
model does not alter this fact nor does it add
any smoothness in the prepay model. It is also
more difficult to fit a three- or
four-component model than the APD model
presented here. We believe that even a simple,
but dynamic, APD model captures the main
prepayment factors, including burnout.

The APD model (4) assumes that the active
and passive components share the same
turnover rate, and their refinancing speeds
relate to one another as 1 to β. We can consider
some other prepayment source that is not
propagated to the active and passive components
identically, or with the 1 to β ratio. For
example, we may introduce both default
termination and credit cure prepay sources,
additive to refinancing and turnover, but
likely having a higher effect on the passive
part than on the active part (see note 4).

Of course, additional prepayment sources can
be formally included in the refinancing without
assuming any more that active and passive
refinancing models relate to one another. In
that case, we will not be able to initialize
ψ(t_0) by solving equation (8), and we must use
historical simulation for this purpose as
discussed above. In principle, we may assume
unrelated refinancing models built for the
active and passive components, gaining
generality with little sacrifice of
convenience.

Residual Sources of 
Path-Dependence 

The APD model takes care of the burnout
effect, the major source of path-dependence for
fixed-rate mortgages. After the decomposition
is done, we need to review residual sources
of path-dependence and arrange the numerical
valuation procedure to reduce or eliminate
potential pricing errors.

Prepayment lag, a lookback option feature, 
is such a source. Applications to obtain a new 
mortgage replacing an old one enter the orig¬ 
ination pipeline 30 to 90 days before the loan 
is actually closed and the existing debt is paid 
off. Even if the prepayment model features a 
lag, but the backward valuation scheme is un¬ 
aware of its existence, the pricing results can 
be somewhat inaccurate. This ignorance of the 
lag by the backward induction scheme usually 
causes small errors for pass-through securities. 
However, mortgage strip derivatives are highly 
prepayment sensitive, and the lag may change 
their values in a sizable way. 

It is generally known that lookbacks with
fairly short lag periods can be accounted for
in the course of a backward induction process.
Let us assume, for example, that, on a
trinomial monthly tree, speed λ_k actually
depends on market rates lagging one month.
Hence, the MBS value will also depend on both
the current market and the 1-month lagged
market. This is to say that each valuation node
of the tree should be "sliced" into three
subnodes keeping track of prices matching three
possible historical nodes, one month back. Of
course, this costs computational time;
efficiency may deteriorate quickly for deeper
lags and more complex trees.

Approximate alternatives do exist and it
is feasible to reduce pricing errors without
much trouble. AD&Co employs a progressively
sparse recombining pentagonal tree, which
does not branch off every month. Branches of
the tree are made from two to 12 months long
so that the lagged market rates are explicitly
known for most monthly steps. The lookback
correction can also be adapted for the
"fractional" prepayment lag that almost always
exists due to the net payment delay between the
accrued-month-end and the actual cash flow
date. In such a case, λ_k could be interpolated
between the current-month and the
previous-month values. Thus, the total lag
processing should account for both prepay
lookback and payment delay.

Another example of path-dependence not 
cured by pool decomposition is the coupon re¬ 
set for adjustable-rate mortgages (ARMs). Both 
reset caps and nonlinear relationships between 
prepayments and coupons make it difficult for a 
backward induction scheme to account for this 
feature. One possible solution is to extend the 
state space and create an additional dimension 
that would keep track of the coupon rate for 
an ARM (Dorigan et al., 2001). This state-space 
extension will come at a cost of both compu¬ 
tational efficiency and memory consumption. 
Levin (2002) suggests that the reset provisions 
found in typical ARMs allow for backward val¬ 
uation with a practically acceptable accuracy, 
without any special measures on curing this 
path-dependence. 

Modeling Prepayments Universally: 
Refinancing Speed as a Function 
of Price 

We finish the entry with a rather interesting,
if not unique, application of the APD idea where
backward valuation of MBS is not an option,
but a necessity. The academic literature
contains quite a few works on the rational
prepayment exercise models (see note 5). Our
APD model is not of that sort as it is a
"mortgage-like" approach that can accommodate
empirical features such as an S-curve or aging.
Yet, it can address some shortcomings typically
known for purely ad hoc empirical models. As we
have already asserted, the APD model can value
MBS backward provided that its refinancing and
turnover constituents depend only on the
current market. A likely implementation of this
rule would rely on some experimental
relationship between the SMMs and a relevant
mortgage index. Although this is the way most
mortgage practitioners envision prepayment
modeling, it is not the only possible approach.
In fact, the refinancing behavior of homeowners
also depends on the type of mortgage in hand.
Given coupon and market, the economic incentive
to prepay vanishes when maturity, balloon, or
ARM reset date approaches. Hence, each type of
mortgage and each seasoning stage calls for its
own refinancing model.

An attractive alternative would be linking the 
refinancing speed of a mortgage (still measured 
on the grid nodes, separately for the active 
and passive pieces) directly to its price appre¬ 
ciation, using path-independent specification 
RefiSMM(Price) instead of RefiSMM(Rate). 
This is the same hint as the one used for val¬ 
uation of American option bonds except the 
refinancing model can still be an exogenous 
S-curve, not the "optimal" or "rational" exer¬ 
cise rule. This model would state the refinanc¬ 
ing speed, RefiSMM, as a function of the pool's 
price, for example, 15 CPR if collateral is priced 
at 102, 30 CPR for 105, and so on, asymptot¬ 
ically approaching its "ultimate" speed. For¬ 
mulas (4), (5) still allow computing the active, 
passive, and total speeds. In particular, the pas¬ 
sive component will still run off at a beta- 
reduced speed for the same price premium as 
the active component. 

In essence, variable λ in the pricing PDE (2)
becomes a function of the unknown P. Such
an equation will still be path-independent,
presenting no theoretical or computational
issues for the backward solution. Moreover, if
the refinancing behavior is indeed driven by
price appreciation and such a universal
relationship can be experimentally established,
then the APD modeling approach and its backward
implementation become a natural, if not the
only, way to price an MBS. Any
Monte-Carlo-based valuation method simply would
not allow assessing future prices and, hence,
prepayment speeds.
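
One simple way to handle a speed that depends on the unknown node
price is a fixed-point iteration at each node. The sketch below is
an assumption made for illustration and is not taken from the
source: the logistic S-curve, its parameters, and the function
names are all hypothetical, and next_price stands for the
probability-weighted next-node value of the component being priced.

```python
import math


def refi_smm_of_price(price, max_smm=0.06, mid=1.04, steepness=50.0):
    """Hypothetical S-curve: monthly refinancing speed as a function of price."""
    return max_smm / (1.0 + math.exp(-steepness * (price - mid)))


def node_price(coupon, next_price, turnover_smm, r, oas=0.0, tol=1e-14):
    """Solve P = [c + P_next + lambda(P) * (1 - P_next)] / (1 + r + OAS)."""
    disc = 1.0 + r + oas
    price = (coupon + next_price) / disc  # start from the no-prepay value
    for _ in range(200):
        lam = refi_smm_of_price(price) + turnover_smm
        new_price = (coupon + next_price + lam * (1.0 - next_price)) / disc
        if abs(new_price - price) < tol:
            return new_price
        price = new_price
    raise RuntimeError("fixed-point iteration did not converge")
```

For monthly speeds the dependence of λ on P is weak relative to the
discounting, so the iteration settles in a handful of steps for
these illustrative parameters; the passive component would be
handled the same way with the refinancing speed scaled by β.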

Arguably, the RefiSMM(Price) function can be
viewed as one universal refinancing rule that
can serve many collateral types. Furthermore,
such a model can directly account for additional
loan-specific transaction costs and cost saving
opportunities. For example, the knowledge of
prepayment penalties, average loan sizes, or
state-imposed taxes can easily be used to
modify the S-curve.

Furthermore, the RefiSMM(Price) formulation
can be used for modeling collateral behavior
for CMOs as well. Although a typical CMO is
path-dependent well beyond its collateral and
necessitates Monte Carlo sampling, it is the
prepayment modeling stage that can be done via
the APD scheme. We will start with valuing
collateral first on the grid or a tree, and
then compute and store ActiveSMM and
PassiveSMM for every node of the tree as a
result of the backward induction process
described in this entry. We then run Monte
Carlo simulations for the CMO in question and
apply precomputed SMMs. As we pointed out, the
key compositional variable ψ(t) is known going
forward (but not backward), thereby enabling
construction of the full prepayment rate,
hence, the cash flow, for every node and every
path.

This approach's details and an illustration of
how the same S-curve can "serve" both
fixed-rate and adjustable-rate mortgages are
given in Levin (2006). Pricing PDE (2) with
λ = λ(P) has been given mathematical
consideration by Goncharov (2003, 2006), who
studied the existence and uniqueness of its
solution.

KEY POINTS 

• The prices of mortgage-backed securities 
follow a partial differential equation that in¬ 
cludes interest rates, coupon rates, and pre¬ 
payment rates. Even for a simple mortgage 
pass-through security, this valuation PDE is 
path-dependent as it depends on the attained 
stage of burnout (hence, on past refinancing 
incentives). 

• The active-passive decomposition model 
splits a pool into two path-independent, mu¬ 
tually exclusive borrower groups. APD natu¬ 
rally simulates the burnout effect. 

• For mortgage pass-through securities (and
their strip derivatives), APD splits valuation
into two quick backward induction steps and
produces the entire pricing grid for risk
measurement at no additional cost (unlike Monte
Carlo simulation).

• Whereas CMOs will still rely on Monte Carlo 
simulation as being heavily path-dependent 
beyond the burnout, they will benefit from 
better prepay modeling. 

• The backward induction pricing technique 
makes future values accessible and new valu¬ 
ation and modeling tasks feasible. For exam¬ 
ple, one can assume that the refinancing curve 
is a function of a loan's objective price rather 
than interest rates. Such an approach can be 
viewed as a universal model that applies to 
both fixed and adjustable rate pools. 


NOTES 

1. We can relax this condition just assuming 
that the historical turnover rate is known, 
not necessarily constant. 

2. When using finite difference grids for solving 
the pricing PDE, the ghost nodes are part of 
the grid. 

3. Historical calibration of the Hull-White
model to the swaption volatility surface often
reveals a small-to-zero level of the mean
reversion constant.

4. One reason a borrower is "passive" can be 
due to credit-related issues. 

5. See Longstaff (2003) and Stanton (1995). 
Abstract: The transformation of groups of mortgage loans with common attributes into tradable
and liquid MBS occurs using one of two mechanisms. Loans that meet the guidelines of the agencies 
(i.e., Fannie Mae, Freddie Mac, and Ginnie Mae) in terms of credit quality, underwriting standards, 
and balance are assigned an insurance premium (called a guaranty fee) by the agency in question 
and securitized as an agency pool. Loans that either do not qualify for agency treatment, or for 
which agency pooling execution is not efficient, can be securitized in nonagency or "private-label" 
transactions when such transactions are economically feasible. These types of securities do not 
have an agency guaranty, and must therefore be issued under the registration entity or "shelf" 
of the issuer. Although the analysis of private-label mortgage-backed securities utilizes many 
of the techniques employed to assess agency securities, the analysis must be extended in order 
to incorporate credit risk and adjust returns for expected principal losses, requiring additional 
analysis and metrics. 


While the evaluation of private-label mortgage-backed securities
(MBS) utilizes many of the techniques used in the evaluation of
agency MBS (i.e., Ginnie Mae, Fannie Mae, and Freddie Mac MBS), the
need to incorporate credit risk and adjust returns for expected
principal losses requires additional analysis and metrics. The fact
that the credit risk in these securities is not assumed by the
government, either explicitly or implicitly, forces investors to
evaluate and judge both the timing of the return of principal as
well as the amount of principal, if any, that investors can expect
to receive. Moreover, credit analysis has moved up what is called
the credit stack. A major change stemming from the subprime
mortgage crisis is that investors can no longer assume that senior
private-label mortgage-backed securities have virtually the same
credit risk as agency MBS. Any bond that does not have agency
credit support must be treated as a "credit piece" requiring the
analysis of a variety of internal and external factors.

In this entry, we outline the various elements that drive the
performance of nonagency MBS, and also examine the interactions of
these factors.¹ We then examine a useful framework for
understanding the evolution of a population's credit profile and
discuss a variety of techniques used to evaluate the credit risk
and expected returns of private-label securities.

FACTORS IMPACTING 
RETURNS FROM 
NONAGENCY MBS 

The analysis of agency MBS is focused on estimating the timing of
principal cash flows since the government backing of these
securities eliminates investors' exposure to principal writedowns.
Private-label securities require layers of additional analysis.
This is because of the introduction of a series of additional
factors that determine the bonds' cash flows and thus their
projected returns. These factors can be broadly characterized as:

• The amount of principal expected to be returned.

• The timing of principal returns.

• The allocation of principal within the transaction.

Before proceeding, it will be helpful to review a few concepts.
Prepayments on nonagency securities must be classified based on
their causation. Unlike agency securities, the return of principal
to the securitization (or, more specifically, the investment trust)
must be treated differently depending on whether it resulted from a
voluntary action by the borrower or is forced by credit-related
difficulties. Modeling the impact of voluntary prepayments is
relatively straightforward; investors can assume that 100% of
principal being prepaid will be returned on the next payment date.
By contrast, projecting the impact of involuntary prepayments
requires an estimate of both how much of every principal dollar
prepaid will actually be paid to the investor, as well as when
principal payments will be received by the trust.

The Amount and Timing of 
Principal Return 

Before proceeding, a brief discussion of terminology will be
helpful. For private-label securities, voluntary prepayments
encompass traditional prepayment activity. Involuntary prepayments
are credit-related prepayments that result from defaults or other
events specifically related to credit events (such as short sales
of homes), while also accounting for the likelihood that less than
the full amount of principal will be returned to the transaction
(or, more accurately, the trust holding the deal's collateral).
Voluntary prepayments are typically quoted as VPRs, which stands
for voluntary prepayment rate. They are calculated similarly to a
conditional prepayment rate (CPR), in which a monthly percentage of
prepaid principal (sometimes denoted by VMM) is annualized.
Involuntary prepayment speeds are quoted as conditional default
rates (CDRs),² which are the annualized rate of default. CDRs are
calculated by annualizing the monthly rate of default as a
percentage of the current balance, or the MDR. The sum of the
monthly VMMs and MDRs equals the total deal single monthly
mortality (SMM) rate for any particular month.
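
The relationship between the monthly and annualized speeds can be sketched in a few lines of Python. The sketch assumes the standard compounding convention used for CPR and SMM (one minus the twelfth power of the monthly survival rate); the function names are illustrative and are not taken from any particular analytics package.

```python
def annualize(monthly_rate: float) -> float:
    """Compound a monthly rate (VMM or MDR) into an annual rate (VPR or CDR)."""
    return 1.0 - (1.0 - monthly_rate) ** 12


def to_monthly(annual_rate: float) -> float:
    """Invert the compounding: convert a VPR or CDR into its monthly rate."""
    return 1.0 - (1.0 - annual_rate) ** (1.0 / 12.0)


# Hypothetical speeds, chosen only for illustration.
vmm = to_monthly(0.04)  # monthly voluntary rate implied by a 4% VPR
mdr = to_monthly(0.06)  # monthly default rate implied by a 6% CDR
smm = vmm + mdr         # total monthly speed, as described in the text
```

Note that the monthly rates add, while the annualized rates do not: annualizing the combined SMM gives slightly less than the sum of the VPR and the CDR.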

The issue of how much principal is projected to be received as a
result of involuntary prepayments is a straightforward function of
the assumed default rate and loss severity. Loss severities are
simply the percentage of the defaulted principal that ultimately
will not be returned to the investment trust. The complement of
loss severity (i.e., 100% minus the severity) is the recovery
percentage.

The issues associated with the timing of principal return are more
complex. Since the CDR is by definition the involuntary prepayment
rate, a higher default rate assumes the faster return of at least
some principal to investors. As a result, the faster return of
principal to the trust due to higher default rates can offset the
effects of principal loss. This effect is a function of the price
of the security, the loss severity, and the tranche's position in
the transaction's structure (i.e., under what circumstances the
security will absorb losses).

In addition, the amount of time between when a default occurs and
recovered principal is received by the trust (the lag) can have a
major influence on investor returns, especially for bonds that are
more junior in priority. A longer lag between the time of default
and the receipt of recovered principal delays the writedown of the
junior bond's principal value. This means that the investor may
receive interest payments for a longer period of time, improving
the value of securities for which the interest payments comprise
the bulk of expected cash flows. In fact, lower-priority
subordinates are sometimes referred to as credit IOs, since
investors assume that no principal will be returned, and the only
cash flows that they expect to receive are coupon payments. Since
the outstanding principal is written off more slowly, investors
holding the tranche receive a larger and longer stream of interest
payments as the lag extends.

There are a variety of factors that influence the lag. Both the
amount of seriously delinquent loans at a point in time and the
actions of servicers play major roles in the timing of defaults and
principal recoveries. The period after 2007, for example, saw a
huge increase in the number of seriously delinquent loans
outstanding. At the same time, servicers (i.e., the entities that
process borrower payments and manage the foreclosure process) were
unable to effectively manage the huge surge in problem loans. This
resulted in an enormous backup in the foreclosure pipeline, and led
to long lags between the time when loans stopped performing and the
properties were liquidated.


Legal and political factors also impact lag times. Since real
estate transactions are governed by state and local laws, there are
differences in the timing of principal returns based on the state
in which a loan resides. Some states, which are referenced as
judicial states, require that a foreclosure be approved by a judge,
which typically slows the foreclosure process. Foreclosures in
nonjudicial states can be processed faster, resulting in shorter
lags. Also, the foreclosure process itself can become a matter of
controversy. In 2010, for example, problems with the legal
documentation of foreclosure filings led to the suspension of
foreclosure proceedings in some states, as well as calls for a
national foreclosure moratorium.

Generally speaking, the amount and timing of cash flows to the
trust are impacted by a variety of actions and decisions taken by
both borrowers and servicers, and are also influenced by exogenous
factors. We discuss how these behaviors can be understood and
modeled later in this entry.

Deal-Specific Factors 

There are also a series of other subtle and obscure factors that
can impact the cash flows and returns of nonagency securities. Some
of these factors result from decisions by the servicer, while
others vary depending on how an individual transaction's governing
documents were written. These factors include (but are not limited
to) the following:

• Servicers are required to advance principal and interest on
delinquent loans. However, the governing documents of most deals
state that the servicer is not required to advance any amount it
deems "nonrecoverable" through the foreclosure process. The
interpretation of "recoverability" depends on servicers' policies
with respect to how long they will advance against seriously
delinquent loans, along with the loan-to-value ratios (LTVs) of
properties backing these loans. (Since expected recoveries are a
function of the current LTV, servicers often will stop advancing on
loans where the current LTV exceeds a certain threshold.)

• The treatment of "modified" loans (i.e., loans for which the
terms were altered in order to help borrowers meet their
obligations) within individual transactions was rarely outlined in
deals issued prior to the mortgage crisis. For example, there have
been controversies regarding whether "forborne" (i.e., deferred)
principal resulting from loan modifications should be written off
immediately (which typically benefits the senior bondholders in a
transaction) or deferred until the point where principal losses are
realized by the trust, which would result in more interest flowing
to the subordinates.

• The allocation of losses due to principal and interest
"shortfalls" can become highly complex and deal-specific,
particularly once the subordinate bonds in an overcollateralization
structure are paid off. For example, some deals (typically those
issued before mid-2005) only allow for the balances of senior bonds
to be reduced by payments actually made by borrowers. These
structures can experience a phenomenon called "negative
overcollateralization," which means that losses for the seniors are
"implied." As a result, losses on the senior tranches are only
realized when the collateral pool is entirely paid off and the
trust is terminated with some bond balances still outstanding.

One conclusion that can be drawn is that investors in private-label
MBS must have the willingness and ability to read and understand
the documents governing their holdings. Events and factors that
were either not contemplated or were viewed as highly improbable
can, under adverse conditions, become important in determining
investor returns.

UNDERSTANDING THE
EVOLUTION OF CREDIT
PERFORMANCE WITHIN
A TRANSACTION

As discussed previously, the actions and decisions taken by both
borrowers and servicers, along with outside environmental factors,
determine both the amount and timing of cash flows received by the
trust. This behavior can be conceptualized through the use of
transition matrices. Such matrices show the probability of loans
moving from one credit status (or "state") to another in any month.
This technique is often used as a foundation for formally modeling
voluntary and involuntary speeds. We address it here, however, to
help conceptualize the "life cycle" of a transaction's credit
profile. The methodology offers useful techniques for demonstrating
how the credit problems of obligors evolve into delinquencies and
defaults and flow through a transaction over time. It is also
useful in describing and quantifying how changes in the overall
credit environment might impact the performance of a loan
population.

Table 1 contains a hypothetical example of a roll matrix for a loan
population, which can be defined either narrowly (e.g., for a
single transaction) or more broadly (to represent a particular
product and vintage). The vertical axis of the matrix shows the
current (or "from") states of the population, while the horizontal
axis shows the future (i.e., "to") states of the loans, typically
one month hence. The horizontal axis also allows for two additional
states, which would represent termination of the loans either
through "payoff" (i.e., prepaid voluntarily) or "liquidation"
(involuntarily prepaid). Each row must sum to 100%, as every loan
in the population at time zero must transition to some state in the
following month.

Table 1 Hypothetical Transition Matrix

                                      T1 ("to") State
T0 ("from") State    Payoff Current     D30     D60    D90+      Bk     Fcl     REO     Liq   Total
Current                0.6%   94.6%    4.6%    0.0%    0.0%    0.1%    0.0%    0.0%    0.0%  100.0%
D30                    0.2%   20.0%   42.4%   36.9%    0.0%    0.4%    0.0%    0.0%    0.1%  100.0%
D60                    0.1%    2.8%    8.9%   34.1%   52.8%    0.5%    0.7%    0.0%    0.2%  100.0%
D90+                   0.1%    1.9%    0.7%    1.0%   85.7%    0.7%    8.3%    0.2%    1.5%  100.0%
Bk                     0.1%    0.1%    0.3%    0.2%    3.7%   86.8%    8.3%    0.4%    0.1%  100.0%
Fcl                    0.1%    0.7%    0.1%    0.0%    4.2%    1.3%   88.7%    3.4%    1.5%  100.0%
REO                    0.7%    0.0%    0.0%    0.0%    0.2%    0.1%    0.4%   82.3%   16.3%  100.0%

Table 2 Applying the Current Population Profile to the Transition Matrix

A. Current Deal Profile

State      Percent of UPB
Current             61.1%
D30                  4.6%
D60                  2.1%
D90+                16.6%
Bk                   2.2%
Fcl                 11.4%
REO                  2.0%

B. Multiply Current Performance by Transition Matrix

                                      T1 ("to") State
T0 ("from") State    Payoff Current     D30     D60    D90+      Bk     Fcl     REO     Liq   Total
Current                0.3%   57.8%    2.8%    0.0%    0.0%    0.1%    0.0%    0.0%    0.0%
D30                    0.0%    0.9%    2.0%    1.7%    0.0%    0.0%    0.0%    0.0%    0.0%
D60                    0.0%    0.1%    0.2%    0.7%    1.1%    0.0%    0.0%    0.0%    0.0%
D90+                   0.0%    0.3%    0.1%    0.2%   14.2%    0.1%    1.4%    0.0%    0.2%
Bk                     0.0%    0.0%    0.0%    0.0%    0.1%    1.9%    0.2%    0.0%    0.0%
Fcl                    0.0%    0.1%    0.0%    0.0%    0.5%    0.1%   10.1%    0.4%    0.2%
REO                    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    0.0%    1.6%    0.3%
Subtotal               0.4%   59.2%    5.1%    2.6%   15.9%    2.3%   11.7%    2.1%    0.8%   98.8%
Normalized Total(a)           59.9%    5.1%    2.7%   16.1%    2.3%   11.8%    2.1%           100.0%

(a) Excluding payoffs and liquidations.

The matrix itself can be created through a variety of techniques.
In some cases, the matrix simply represents historical experience
(over either a short- or long-term horizon), while other analysts
use loan-level simulations to generate the matrix. Note that not
all cells have values greater than zero, as some transitions are
impossible; for example, a loan cannot go from current to 60-days
delinquent without first residing in the 30-days delinquent bucket.


Once a transition matrix is created, it can be applied to the
population's current profile (i.e., at time T0) as a means of
projecting the population's credit performance in a future month.
Table 2 illustrates the matrix math involved in generating the
population's profile in month T1, treating the T0 profile as a
1 × 7 matrix shown in Table 2(A) to be multiplied times the 7 × 9
transition matrix in Table 1.³ Table 2(B) shows the resulting
profile one month hence (i.e., at time T1) after summing each
column, along with the percentage of loans that drop out of the
population through voluntary or involuntary prepayment. The
remaining population profile is then normalized by dividing the
percentages of remaining loans in each credit state (i.e.,
excluding loans that are paid off or liquidated) by the remaining
percentage in the pool. (In the exhibit, 98.8% represents the
portion of the population that remains active; the 59.2% of loans
expected to be current in month T1 is divided by this percentage to
get the 59.9% normalized total.)
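
The one-month projection can be reproduced with a short Python sketch. The figures are those shown in Table 1 and Table 2(A); the function and variable names are our own and purely illustrative. Because the published tables are rounded to one decimal place, the results agree with Table 2(B) only to within rounding.

```python
# "To" states are ordered Payoff, Current, D30, D60, D90+, Bk, Fcl, REO, Liq.
# Rows are the "from" states Current, D30, D60, D90+, Bk, Fcl, REO (Table 1, in percent).
TABLE_1 = [
    [0.6, 94.6, 4.6, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
    [0.2, 20.0, 42.4, 36.9, 0.0, 0.4, 0.0, 0.0, 0.1],
    [0.1, 2.8, 8.9, 34.1, 52.8, 0.5, 0.7, 0.0, 0.2],
    [0.1, 1.9, 0.7, 1.0, 85.7, 0.7, 8.3, 0.2, 1.5],
    [0.1, 0.1, 0.3, 0.2, 3.7, 86.8, 8.3, 0.4, 0.1],
    [0.1, 0.7, 0.1, 0.0, 4.2, 1.3, 88.7, 3.4, 1.5],
    [0.7, 0.0, 0.0, 0.0, 0.2, 0.1, 0.4, 82.3, 16.3],
]

# Table 2(A): percent of UPB in each "from" state at time T0.
PROFILE_T0 = [61.1, 4.6, 2.1, 16.6, 2.2, 11.4, 2.0]


def step(profile, matrix):
    """Multiply a 1 x 7 profile by the 7 x 9 matrix and normalize the survivors."""
    n_to = len(matrix[0])
    column_totals = [
        sum(profile[i] * matrix[i][j] / 100.0 for i in range(len(profile)))
        for j in range(n_to)
    ]
    payoff, liquidated = column_totals[0], column_totals[-1]
    active = column_totals[1:-1]      # Current through REO
    remaining = sum(active)           # portion of the pool still outstanding
    normalized = [100.0 * share / remaining for share in active]
    return payoff, liquidated, active, remaining, normalized


payoff, liquidated, active, remaining, normalized = step(PROFILE_T0, TABLE_1)
```

Here `active` corresponds to the Subtotal row of Table 2(B) and `normalized` to the Normalized Total row.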
Figure 1 Projected Long-Term Performance Trends for Subject Deal
(x-axis: age in months; series: Liquidated; 60+ days delinquent,
incl. Bk, F/C, and REO; Paid Off; Current and 30-59 Days Down)


The process can be performed iteratively in order to show how the
population's profile can, given unchanged transition behavior, be
expected to evolve over time. (This means that the profile at T1
can be multiplied by the transition matrix to generate a profile
for time T2, etc.) Figure 1 shows the projected profile of the
population over the next 10 years by iteratively applying the
transition matrix in Table 1 to the evolving population. The chart
indicates that liquidated loans (i.e., loans that go into default
and are removed from the pool through the foreclosure process) will
comprise the largest single cohort in around three years if current
transition probabilities hold. Moreover, two-thirds of the current
population can be expected to be liquidated in 10 years.

The iterative calculation can also be used to generate projections
for voluntary and involuntary prepayment speeds. VPR and CDR
vectors can then be utilized in yield and cash flow calculators.⁴
Figure 2 shows the vectors generated by the analysis over 120
months.
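
A minimal sketch of the iterative calculation follows. It assumes that each month's payoff and liquidation percentages, taken relative to that month's starting population, serve as the VMM and MDR, and that these are annualized with the usual compounding convention. It works with the rounded Table 1 figures and ignores scheduled amortization, so the vectors it produces are illustrative and will not match Figure 2 exactly.

```python
# Table 1 in percent; columns: Payoff, Current, D30, D60, D90+, Bk, Fcl, REO, Liq.
TABLE_1 = [
    [0.6, 94.6, 4.6, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
    [0.2, 20.0, 42.4, 36.9, 0.0, 0.4, 0.0, 0.0, 0.1],
    [0.1, 2.8, 8.9, 34.1, 52.8, 0.5, 0.7, 0.0, 0.2],
    [0.1, 1.9, 0.7, 1.0, 85.7, 0.7, 8.3, 0.2, 1.5],
    [0.1, 0.1, 0.3, 0.2, 3.7, 86.8, 8.3, 0.4, 0.1],
    [0.1, 0.7, 0.1, 0.0, 4.2, 1.3, 88.7, 3.4, 1.5],
    [0.7, 0.0, 0.0, 0.0, 0.2, 0.1, 0.4, 82.3, 16.3],
]
PROFILE_T0 = [61.1, 4.6, 2.1, 16.6, 2.2, 11.4, 2.0]  # Table 2(A)


def project_speeds(profile, matrix, months):
    """Roll the profile forward month by month; return annualized VPR and CDR vectors."""
    vpr_vector, cdr_vector = [], []
    current = list(profile)
    for _ in range(months):
        totals = [
            sum(current[i] * matrix[i][j] for i in range(len(current))) / 100.0
            for j in range(len(matrix[0]))
        ]
        base = sum(totals)           # guards against rows that round to 99.9 or 100.1
        vmm = totals[0] / base       # monthly voluntary prepayment rate
        mdr = totals[-1] / base      # monthly default (liquidation) rate
        vpr_vector.append(1.0 - (1.0 - vmm) ** 12)
        cdr_vector.append(1.0 - (1.0 - mdr) ** 12)
        active = totals[1:-1]
        remaining = sum(active)
        current = [100.0 * share / remaining for share in active]  # renormalize survivors
    return vpr_vector, cdr_vector


vpr_vector, cdr_vector = project_speeds(PROFILE_T0, TABLE_1, 120)
```

The resulting lists can be passed to a cash flow model in place of constant speed assumptions.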

Interestingly, the vectors are neither constant nor linear; note
that the CDR vector increases fairly steadily for the first few
years before leveling off around month 60. This pattern highlights
the intrinsic nature of population transitions. Loans flow through
the different credit states at varying rates that are a function of
transition probabilities captured by the matrix. Therefore, the
levels of VPRs and CDRs over time will vary even if transition
activity is assumed to remain stable.

Figure 2 VPRs and CDRs for Subject Deal Using Unchanged Transition
Matrix (x-axis: age in months)

However, transition patterns normally do vary over time, reflecting
changes in the economic and lending landscape as well as in the
actions of servicers. The impact of changing behaviors can be
captured by altering the transition matrix at a point in time. For
example, a move on the part of servicers to more aggressively clean
up the foreclosure pipeline would be captured in a transition
framework by increasing the percentages in "late-stage" transitions
(i.e., D90+ to FC, FC to REO, and REO to Liq) at a point in the
future. Conversely, a full foreclosure moratorium (which was
discussed in 2010) would be taken into account by changing all
probabilities "from" the D90+, FC and REO buckets to zero for the
expected length of the moratorium. Finally, improved borrower
performance would be captured by increasing "cures" (i.e., D30 and
D60 to Current) while decreasing the Current to D30 percentage. The
updated matrix would be utilized at the point when the changes in
behavior were expected to go into effect.
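
One way to implement such adjustments is to move probability between two cells of the same row, so that the row continues to sum to 100%, and then switch matrices at the month when the new behavior is expected to begin. The sketch below illustrates a "Borrower Improvement" style change using two rows of Table 1. The size of each shift (1 and 5 percentage points), the choice of which cell gives up the probability, and the helper names are hypothetical assumptions made for illustration; they are not the values behind Figure 3.

```python
TO_STATES = ["Payoff", "Current", "D30", "D60", "D90+", "Bk", "Fcl", "REO", "Liq"]

# Two rows of Table 1 (percent); the remaining rows are omitted to keep the sketch short.
BASE = {
    "Current": [0.6, 94.6, 4.6, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
    "D30": [0.2, 20.0, 42.4, 36.9, 0.0, 0.4, 0.0, 0.0, 0.1],
}


def move_probability(matrix, from_state, source_to, target_to, points):
    """Return a copy of the matrix with `points` percentage points moved
    from one "to" cell to another within a single "from" row."""
    row = list(matrix[from_state])
    source = TO_STATES.index(source_to)
    target = TO_STATES.index(target_to)
    if not 0.0 <= points <= row[source]:
        raise ValueError("cannot move more probability than the source cell holds")
    row[source] -= points
    row[target] += points
    adjusted = dict(matrix)          # other rows are shared, the edited row is new
    adjusted[from_state] = row
    return adjusted


def matrix_for_month(month, base, scenario, switch_month=12):
    """Use the base matrix through the switch month and the scenario matrix afterward."""
    return base if month <= switch_month else scenario


# Hypothetical improvement: fewer new delinquencies and more D30 cures.
improved = move_probability(BASE, "Current", "D30", "Current", 1.0)
improved = move_probability(improved, "D30", "D60", "Current", 5.0)
```

A "Foreclosure Cleanup" scenario could be built the same way by moving probability from the diagonal cells of the D90+, Fcl, and REO rows into the next later-stage state.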


Incorporating such changes in servicer and/or borrower behavior
would result in discontinuities in the VPR and CDR vectors. Along
with the base vector, Figure 3 shows the projected CDRs for the
subject population if the vectors are calculated using transition
matrices after month 12 that reflect the "Foreclosure Cleanup" and
"Borrower Improvement" scenarios described above.

Figure 3 CDR Projections for Different Scenarios after 12 Months

THE PROCESS OF 
ESTIMATING 
PRIVATE-LABEL 
MBS RETURNS 

The analysis and valuation of private-label MBS is complicated by
the need to project and account for a number of variables over and
above those required to evaluate agency securities. As noted
previously, the analysis requires additional metrics necessary to
project the principal and interest cash flows paid to the trust, as
well as how they will be allocated to the different tranches under
a variety of scenarios.

The additional complexity associated with private-label MBS means
that the dominant metric used to assess expected returns is
loss-adjusted yield. This represents the internal rate of return
(IRR) for a security's projected cash flows using the additional
factors and variables discussed previously after adjusting for the
normal MBS-specific issues such as payment frequency and delay. The
increased complexity associated with the product means that some
methodologies, such as total return analysis, are infrequently
utilized in evaluating credit pieces. For example, total return
requires the estimation of a terminal value at the horizon for each
scenario being analyzed. The complexity involved in projecting
future prices makes them, and thus the analysis, quite subjective.
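
As a rough illustration of the IRR calculation, the sketch below solves for the monthly discount rate that equates the present value of a projected cash flow vector to the price paid. It is a simplified sketch: it assumes nonnegative monthly cash flows received at the end of each month, ignores payment delay and accrued interest, and quotes the annual figure as 12 times the monthly rate. The function names and the sample cash flows are hypothetical.

```python
def present_value(cash_flows, monthly_rate):
    """Discount cash flows received at the end of months 1, 2, ... at a flat monthly rate."""
    return sum(cf / (1.0 + monthly_rate) ** t for t, cf in enumerate(cash_flows, start=1))


def monthly_irr(price, cash_flows, low=-0.5, high=1.0, tolerance=1e-12):
    """Find the monthly IRR by bisection.

    With nonnegative cash flows, present value falls as the rate rises,
    so the root is bracketed whenever PV(low) > price > PV(high).
    """
    for _ in range(200):
        mid = (low + high) / 2.0
        if present_value(cash_flows, mid) > price:
            low = mid    # discounting too lightly: the rate must be higher
        else:
            high = mid
        if high - low < tolerance:
            break
    return (low + high) / 2.0


# Hypothetical loss-adjusted cash flows per 100 of face: interest for 11 months,
# then interest plus the principal that is expected to be returned.
projected = [0.5] * 11 + [0.5 + 92.0]
annualized_yield = 12.0 * monthly_irr(90.0, projected)
```

In practice the cash flow vector would come from a deal model run at the chosen VPR, CDR, severity, and lag assumptions.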

In the following sections, we illustrate the technique described in
this entry using a series of tranches, as well as the collateral,
from a representative 2007-vintage hybrid ARM transaction.⁵ The
three tranches examined include a super-senior (SS) tranche with
24.2% original credit support; a senior mezzanine (SM) tranche
(i.e., a bond originally rated triple-A but junior in priority to
the SS) with original credit support of 5.25%; and a subordinate
("sub") bond or tranche that originally had 3.85% credit
enhancement.


Differentiating between Collateral 
and Tranche Losses 

The various factors outlined above have interesting effects and
interactions within individual transactions with respect to losses.
For one thing, it is important to differentiate between losses on a
deal's collateral pool (i.e., at the trust level) and those
impacting individual bonds within a transaction. Private-label MBS
have a variety of internal mechanisms that allocate cash flows and
principal losses within the structure to tranches having different
degrees of seniority. Therefore, losses absorbed by individual
bonds are a function of both the losses absorbed by the trust and
the amount of credit support available to them.

Figure 4 shows projected losses, as a percentage of original face,
for both the overall collateral pool of the deal as well as the
three tranches described above. Losses were calculated using
different loss severity assumptions while assuming a constant 4%
VPR and CDR. (These levels are hypothetical and used for
illustrative purposes only.) While the line showing projected
losses on the collateral has a linear upward slope, the profiles of
projected losses for the tranches are quite different. For example,
the SS tranche suffers no losses until severities are greater than
50%, while the SM begins to experience losses at severities greater
than 40%. The sub tranche, however, has a unique loss profile. It
experiences no losses until severities exceed 30%, but at that
point losses spike higher; virtually the entire principal value of
the bond is written off once the assumed loss severity reaches 45%.
The chart highlights a critical conclusion: in addition to being
different from the collateral, each bond's exposure to losses is a
function of its place in the transaction's capital structure.
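
The dependence of tranche losses on capital structure can be illustrated with a deliberately simplified, static allocation in which cumulative collateral losses are absorbed from the bottom of the structure upward. This sketch is not the deal model behind Figure 4: it ignores the timing of losses, principal paydowns, excess spread, and step-down triggers. The credit support levels are the original figures quoted earlier for the three tranches; treating the sub as sitting directly beneath the SM, and the 10% cumulative loss, are assumptions made only for the example.

```python
def tranche_loss_fraction(collateral_loss, attach, detach):
    """Fraction of a tranche's balance written off under a static allocation.

    All inputs are fractions of the deal balance. `attach` is the credit
    support beneath the tranche; `detach` is the point at which the tranche
    is fully written off (the credit support of the next tranche above it).
    """
    thickness = detach - attach
    absorbed = min(max(collateral_loss - attach, 0.0), thickness)
    return absorbed / thickness


cumulative_loss = 0.10  # hypothetical cumulative collateral loss

ss_loss = tranche_loss_fraction(cumulative_loss, attach=0.242, detach=1.0)
sm_loss = tranche_loss_fraction(cumulative_loss, attach=0.0525, detach=0.242)
# Assumption: the sub detaches where the SM attaches (not stated in the text).
sub_loss = tranche_loss_fraction(cumulative_loss, attach=0.0385, detach=0.0525)
```

Even in this simplified form, the thin subordinate tranche moves from no loss to a complete writedown over a narrow range of collateral losses, which is the cliff-like behavior described above.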

The Interaction of Credit Inputs 

Figure 4 Collateral and Bond Projected Losses at Different Loss
Severities (assumption: 4% VPR, 4% CDR, 12-month lag)

There are also a series of interesting observations that can be
made by comparing the yields of the three bonds under a variety of
scenarios. For the purposes of the analysis, the bonds were all run
at the hypothetical level of a 10% yield to assumptions of 4% VPR,
a 6% CDR, a 60% loss severity, and a 12-month lag. (This resulted
in prices of 64-12, 48-00, and 6-22 for the three securities.)
Using those base-case prices, we ran a few representative scenarios
in which different variables were altered, with the goal of
exploring some of the subtleties of the different tranches'
returns.

Figure 5 shows yields on the three securities calculated using
different CDR projections, assuming a constant 4% VPR along with a
60% loss severity and a 12-month lag. The yield on the SS tranche
remains fairly stable (and actually increases slightly until the
CDR reaches 6%). The reason for this behavior is that faster CDRs
effectively increase the overall rate of prepayments to the SS
tranche; however, the bond does not absorb losses until the 6% CDR
level is breached due to its credit support. Given the tranche's
highly discounted dollar price, the faster rate of prepayments
increases its yield. By contrast, yields on the more junior
tranches decline as CDRs are increased since both bonds realize
losses once their limited credit support is exhausted.

Figure 5 Projected Yields on Different Tranches Using Different CDR
(assumption: 4% VPR, 60% severity, 12-month lag)

The rate of voluntary prepayments also influences returns for some
bonds in the transaction. Figure 6 shows projected yields on the
three tranches at different assumed VPRs, using a constant 6% CDR
(and, as before, 60% severity and a 12-month lag assumption). While
their profiles partially reflect the impact of faster prepayments
on bonds with deeply discounted prices, voluntary prepayments also
have a subtle impact on nonagency MBS. When voluntary prepayments
increase, principal is paid back to investors at 100% of face
value. This means that there is less principal outstanding that can
later go into default, even if the CDR and loss severity are held
constant. As a result, yields for the senior tranches are
influenced (and in the SM's case, strongly so) by the expected
voluntary prepayment speed.

By contrast, the yield on the sub tranche is insensitive to changes
in the VPR assumption, in part as a result of its place in the
deal's structure. (Subordinates generally don't receive voluntary
prepayments in an overcollateralization structure unless the deal
"steps down," which does not happen under these assumptions.) Its
returns, however, are highly sensitive to the combination of
assumptions used for CDRs, loss severities, and lags. In
particular, the severity assumption plays a key role despite the
fact that the bond does not receive principal under most scenarios.
As a credit IO, the tranche's outstanding principal value serves as
its notional value by dictating how much interest is paid to
investors in any single month. Since the severity strongly
influences how fast the tranche's face value is written off, it
(along with the lag assumption) dictates how long the bond will
remain outstanding and thus how much interest investors can expect
to receive.



Figure 6 Projected Yields on Different Tranches Using Different VPR (assumption: 6% CDR, 60% 
severity, 12-month lag) 






Analysis of Nonagency Mortgage-Backed Securities 




Evaluating Available Credit Support 

Before evaluating projected yields and cash flows for a tranche, a
prudent step is to assess the security's remaining credit support
relative to the expected level of losses. The objective is to
evaluate whether a bond's remaining credit support (i.e., the
amount and proportion of bonds junior in priority) is adequate
given the losses that the transaction is expected to absorb. The
following discussion outlines a simple yet useful methodology for
gauging a security's credit support relative to expected losses by
using its current performance profile.

The analysis begins by evaluating a transaction's capital
structure. Table 3(A) shows the original and current credit
structure of a deal. (While hypothetical, the deal's structure and
profile are representative of transactions issued in 2006 and
2007.) The next step, shown in Table 3(B), uses a simple technique
to estimate future cumulative losses for the transaction. Utilizing
the current performance profile of the transaction, each
performance cohort is assigned a probability of ultimate default,
along with an assumed loss severity. The example uses a 10%
estimate of ultimate default on current loans, a 50% estimate for
loans that are D30, and a 90% estimate for loans that are D60,
while 100% of loans that are seriously delinquent (D90, FC, and
REO) are expected to ultimately default. (Note that loans in
bankruptcy are not included in this calculation since they are
generally captured in other delinquency buckets.) Each delinquency
cohort is multiplied by its assigned percentages and the loss
severity assumption; the sum of these figures represents the
percentage of losses that the deal will ultimately be expected to
absorb.

Table 3 Calculating "Coverage Ratios" for Tranches in a Transaction

A. Original and Current Deal Credit Structure

Tranche             Orig. Rating  Orig. C/E  Curr. C/E  Curr. Factor
A1 (super senior)   AAA               25.0%      23.2%        0.6950
A2 (senior mezz)    AAA                7.5%       4.5%        0.6950
M1                  AA                 4.0%       2.4%        1.0000
M2                  A                  3.5%       1.6%        1.0000
M3                  BBB                3.0%       0.8%        1.0000
M4                  BB                 2.5%       0.0%        1.0000
M5                  B                  1.5%       0.0%        0.0180
M6                  NR                 0.0%        n/a        0.0000

B. Current Credit Profile of Transaction

                       Eventual   Assumed
Performance     UPB     Default  Severity  Expected Loss
Current       63.1%         10%       75%          4.73%
D30            4.4%         50%       75%          1.65%
D60            3.1%         90%       75%          2.11%
D90           14.6%        100%       75%         10.96%
FC            12.8%        100%       75%          9.61%
REO            2.0%        100%       75%          1.48%
Total                                             30.54%

C. Calculating Coverage Ratio

                                  Coverage Ratio
Tranche             Curr. C/E  (curr. C/E / expected loss)
A1 (super senior)       23.2%                        0.760
A2 (senior mezz)         4.5%                        0.147
M1                       2.4%                        0.079
M2                       1.6%                        0.052
M3                       0.8%                        0.026
M4                       0.0%                        0.000
M5                       0.0%                        0.000
M6                        n/a                          n/a

The final step is to divide each tranche's current
credit support percentage by the transaction's
total expected losses, as shown in Table
3(C). This coverage ratio measures how much
credit support is available to each tranche if
the expected losses are eventually realized. In
the example, the 76% coverage ratio of the A1
tranche (its credit support covers only about
three-quarters of expected losses) suggests that
the bond is likely to experience significant
future losses, despite its
current sizeable cushion. Expected losses will
probably also be large enough to eventually
cause the other outstanding bonds in the capital
structure (i.e., the A2 down to the M5) to be
entirely written down.
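
As a minimal sketch of this last step, the coverage ratios in Table 3(C) can be reproduced by dividing each tranche's current credit enhancement by the 30.54% expected loss. The variable names and the 1.0 threshold used for the flag are illustrative assumptions; the threshold simply marks tranches whose support is smaller than expected losses.

```python
EXPECTED_LOSS = 30.54  # total expected loss (%) from Table 3(B)

# Current credit enhancement (%) from Table 3(A); M6 is omitted (n/a).
current_ce = {
    "A1": 23.2, "A2": 4.5, "M1": 2.4, "M2": 1.6,
    "M3": 0.8, "M4": 0.0, "M5": 0.0,
}


def coverage_ratio(ce_pct, expected_loss_pct):
    """Current credit enhancement divided by expected deal losses."""
    return ce_pct / expected_loss_pct


ratios = {t: coverage_ratio(ce, EXPECTED_LOSS) for t, ce in current_ce.items()}
for tranche, ratio in ratios.items():
    flag = "" if ratio >= 1.0 else "  (support below expected losses)"
    print(f"{tranche}: {ratio:.3f}{flag}")
```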

While this analysis serves as a useful first step 
in evaluating individual tranches, it is limited 
by its simplistic approach. The default percent¬ 
ages assigned to each credit bucket are arbi¬ 
trary, and also cannot account for changes in 
the credit environment. It also doesn't take into 
account the issue of time, that is, when losses 
will accrue and bonds will be written down. 
This limits its usefulness in evaluating credit 
IOs and more junior securities. Finally, the anal¬ 
ysis doesn't take some forms of credit support, 
such as excess spread and insurance wraps, into 
account. 

Despite its limitations, however, the method¬ 
ology serves as a useful first step in evaluating 
the credit enhancement currently supporting a 
tranche. In addition, investors evaluating po¬ 
tential purchases of newer securities will find 
this and related techniques particularly help¬ 
ful in evaluating both the adequacy of a bond's 
credit support and whether it is vulnerable to a 
downgrade by the rating agencies. 

Yield and Loss Matrix Analysis 

As noted previously, the complexities associ¬ 
ated with the product have made loss-adjusted 


yield the primary metric for evaluating and 
comparing credit-related MBS. However, stan¬ 
dard yield matrices must be altered in order to 
account for the numerous additional inputs and 
outputs necessary to properly evaluate private- 
label MBS. The additional inputs include 
separate entries for voluntary and involuntary 
prepayments, along with the inclusion of ex¬ 
pected loss severities, lags, and servicer ad¬ 
vances. In some cases, the analysis must also 
account for the presence of insurance wraps and 
how long they might remain in place; expec¬ 
tations for how long servicers will continue to 
advance principal and interest; and whether the 
deal will pass its triggers (i.e., the tests that dic¬ 
tate cash flow distributions within individual 
transactions). 

A number of additional outputs are also
necessary in order to assess a bond's value.
Beyond average life, spreads, and durations,
investors need to assess expected losses
on both the tranche and the deal's collateral at
different levels of the inputs. Also useful are the
point in time, if applicable, at which the bond will
experience its first principal loss, along with the
amount of liquidations and losses previously
realized.

Table 4 contains examples of yield matrices 
that might be used to evaluate the super-senior 
and senior mezzanine tranches introduced in an 
earlier section. Tables 4(A) and 4(B) show matrices
for the SS (super senior) tranche and the senior 
mezzanine (SM) tranche, respectively, priced 
(as before) at a 10% loss-adjusted yield to a 4% 
VPR/6% CDR base assumption. The tables in 
the exhibit show loss-adjusted yields and credit 
performance data for a range of CDRs, while 
holding the other variables (i.e., VPR, loss sever¬ 
ity, and lag) constant. In addition to yields and 
average lives, the matrices show the durations, 
the dates of the first writedown, and the per¬ 
centages of bond and collateral losses at the dif¬ 
ferent CDR assumptions (which are the same 
for both tranches in this case). 

However, the necessity of holding multiple 
inputs constant makes this format somewhat 
Table 4 Example of Yield Tables for Private-Label MBS Tranches Pricing at 10% Yield at 4% VPR/6% CDR, 60%
Severity, and a 12-Month Lag

A. Super-Senior Tranche (Px 64-12)

VPR                           4            4            4            4            4            4
CDR                           2            4            6            8            10           12
Yield                         9.725        9.978        9.977        9.711        9.310        8.835
WAL                           10.39        9.96         8.93         7.95         7.11         6.4
Duration                      6.49         6.09         5.71         5.36         5.05         4.77
First-Loss Dt                 N/A          09/25/2032   08/25/2023   11/25/2019   12/25/2017   09/25/2016
% Tranche Loss (orig. face)   0.0          0.4          3.7          7.6          11.1         13.9
% Collat. Loss (orig. face)   9.0          15.5         20.4         24.0         26.8         28.9
% Collat. Loss (curr. face)   13.0         22.5         29.6         34.8         38.8         41.9

B. Senior Mezzanine Tranche (Px 48-00)

VPR                           4            4            4            4            4            4
CDR                           2            4            6            8            10           12
Yield                         15.066       14.406       9.995        3.784        -2.785       -9.137
WAL                           10.38        6.69         4.1          3.01         2.46         2.13
Duration                      5.18         4.37         3.59         3.08         2.74         2.52
First-Loss Dt                 N/A          06/25/2019   10/25/2015   05/25/2014   08/25/2013   04/25/2013
% Tranche Loss (orig. face)   0.0          18.8         34.0         41.2         45.1         47.6
% Collat. Loss (orig. face)   9.0          15.5         20.4         24.0         26.8         28.9
% Collat. Loss (curr. face)   13.0         22.5         29.6         34.8         38.8         41.9


awkward and time-consuming. For example, 
the tables would need to be recalculated mul¬ 
tiple times in order to account for other as¬ 
sumptions for VPRs, loss severities, and lags. 
An alternative and somewhat more flexible 
scheme displays two variables as the axes, with 
yields and/or bond losses as the output (creat¬ 
ing three-dimensional "surfaces" of yields and 
losses). Table 5 contains a matrix for the SM 
tranche showing VPRs on the vertical axis and 
CDRs on the horizontal, while holding the loss 
severity and lag assumptions constant. As with 
other forms of matrices, however, this format 


is also limited to showing two variables at any 
one time. Additional matrices would need to 
be constructed in order to display different fac¬ 
tors, depending on how relevant they were to 
the analysis. 
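
To illustrate how a two-variable matrix of this kind is assembled, the sketch below builds a VPR-by-CDR grid of cumulative collateral losses for a deliberately simplified pool. It is a toy model under stated assumptions, not the cash flow engine behind Tables 4 and 5: scheduled amortization is ignored, speeds are constant, severity is fixed at 60%, and the liquidation lag is left out because it shifts the timing of losses rather than their cumulative amount. The function names are hypothetical, and the resulting figures are not expected to match the tables.

```python
def monthly_rate(annual_rate):
    """Convert an annual rate (a VPR or CDR as a decimal) to a monthly rate."""
    return 1.0 - (1.0 - annual_rate) ** (1.0 / 12.0)


def cumulative_loss(vpr, cdr, severity, months=360):
    """Cumulative collateral loss as a fraction of the starting balance.

    Assumption: no scheduled amortization; each month a constant share of
    the remaining balance defaults and another share prepays voluntarily.
    """
    mdr, smm = monthly_rate(cdr), monthly_rate(vpr)
    balance, loss = 1.0, 0.0
    for _ in range(months):
        defaults = balance * mdr
        prepays = balance * smm
        loss += defaults * severity
        balance -= defaults + prepays
    return loss


vprs = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12]
cdrs = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12]
grid = {(v, c): cumulative_loss(v, c, severity=0.60) for v in vprs for c in cdrs}

# Rows are VPRs, columns are CDRs, matching the orientation of Table 5.
print("VPR\\CDR " + " ".join(f"{c:>7.0%}" for c in cdrs))
for v in vprs:
    print(f"{v:>7.0%} " + " ".join(f"{grid[(v, c)]:>7.1%}" for c in cdrs))
```

Even in this stripped-down form, the grid shows the interaction the matrices are meant to expose: losses rise with CDR along each row and fall as faster voluntary prepayments remove balance before it can default.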

Model-Generated Analysis 

The variables used in the above analysis can be 
generated in a variety of ways, depending on 
both investors' practices and the prevailing cir¬ 
cumstances. During periods of relatively stable 
credit and housing performance, for example,


Table 5 Yield and Bond Loss Matrix for Senior Mezzanine Tranche at Base-Case Price

Each cell shows loss-adjusted yield / bond loss. Rows are VPRs; columns are CDRs.

VPR \ CDR   2              4               6               8               10              12
2           13.420/0.0%    12.166/25.1%    7.081/39.9%     0.309/46.0%     -6.665/49.2%    -13.249/51.1%
4           15.067/0.0%    14.402/18.8%    9.992/34.1%     3.781/41.2%     -2.786/45.1%    -9.137/47.6%
6           16.893/0.0%    16.681/13.9%    12.847/29.1%    7.128/36.9%     0.922/41.4%     -5.209/44.4%
8           18.890/0.0%    19.010/9.9%     15.681/24.8%    10.396/33.1%    4.515/38.1%     -1.409/41.4%
10          21.050/0.0%    21.394/6.6%     18.515/21.1%    13.621/29.7%    8.033/35.0%     2.306/38.6%
12          23.368/0.0%    23.843/4.0%     21.368/17.9%    16.830/26.6%    11.509/32.2%    5.968/36.0%
some investors may choose to simply utilize recent
history for inputs such as VPRs, CDRs, and
loss severities, while making subjective adjustments
based on an examination of the transaction's
current collateral profile.

Alternatively, some investors may choose to 
utilize more sophisticated analysis, which can 
incorporate both the attributes of a deal's col¬ 
lateral along with exogenous economic and 
market variables. The models can be further 
incorporated into integrated systems that gen¬ 
erate yield and loss figures while simultane¬ 
ously analyzing and stratifying the collateral. 
Partial output from such an integrated sys¬ 
tem is shown in Table 6. The exhibit shows 
a yield matrix from Vichara Technology's sys¬ 
tem for the SS tranche. 6 The matrix shows a 
variety of outputs at different multiples of the 
prepayment and default models, assuming un¬ 
changed home prices and interest rates. In ad¬ 
dition, separate tables generated by the analysis 
(not shown) display the current credit structure 
of the deal, the tranche's cash flows, and analy¬ 
ses of the collateral. (The model also allows for 
the generation of a "credit OAS," although this 
metric is not widely utilized by investors at this 
writing due to its sensitivity to modeling error.) 

Additional analysis can be generated for 
different home price appreciation (HPA) and 
interest rate assumptions. For example, a 
conservative set of assumptions might call 
for a 100 basis point parallel increase in rates 
accompanied by a 10% immediate decline in 
home prices. In addition, models for HPA that 
project different appreciation rates based on 
geographic and economic factors can also be 
utilized. 

In the case of private-label securities, the 
normal challenge of assessing a model's "rea¬ 
sonableness" is complicated by the interactive 
nature of the variables. Unlike agency securi¬ 
ties, where "model-equivalent CPRs" can be 
easily estimated (i.e., the bond's average life 
is iteratively calculated at various CPRs until 
it equals the model's calculated WAL), the 
division of prepayments into voluntary and 
involuntary categories means that a model- 


equivalent CDR cannot be calculated unless 
the VPR is held constant, and vice versa. This
creates the need for additional output in
order to view and judge the model's VPR and
CDR projections.
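
The iterative search described for agency securities can be sketched as a bisection on CPR. The example below is a simplified illustration under stated assumptions: a single level-pay pool with a hypothetical 6% note rate and 360-month term, a constant CPR, and a target WAL that stands in for a model's output. The function names are illustrative only.

```python
def wal_at_cpr(cpr, note_rate=0.06, term=360):
    """Weighted average life (years) of a level-pay pool at a constant CPR."""
    smm = 1.0 - (1.0 - cpr) ** (1.0 / 12.0)
    r = note_rate / 12.0
    balance, weighted, total = 1.0, 0.0, 0.0
    for month in range(1, term + 1):
        remaining = term - month + 1
        payment = balance * r / (1.0 - (1.0 + r) ** -remaining)
        scheduled = payment - balance * r          # scheduled principal
        prepaid = (balance - scheduled) * smm      # unscheduled principal
        principal = scheduled + prepaid
        weighted += month * principal
        total += principal
        balance -= principal
    return weighted / total / 12.0


def model_equivalent_cpr(target_wal, lo=0.0, hi=0.99, tol=1e-6):
    """Bisect on CPR until the pool's WAL matches target_wal.

    Relies on WAL falling monotonically as CPR rises.
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if wal_at_cpr(mid) > target_wal:
            lo = mid   # WAL too long: a faster speed is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


target = wal_at_cpr(0.08)  # stand-in for a model-generated WAL
print(round(model_equivalent_cpr(target), 4))
```

The search works here because a single speed determines the WAL. With both a VPR and a CDR in play, many pairs produce the same WAL, which is why one of the two must be held constant before the other can be solved for.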

Interpreting the Outputs 

The analysis and valuation of most securities 
(and virtually all fixed income investments) can 
be broadly summarized as assessing the "cor¬ 
rect" level of expected returns given both mar¬ 
ket conditions and the bond's risks. This means 
that a number of factors need to be evaluated, 
including: 

• The security's base-case yields and returns. 

• Its returns in best- and worst-case scenarios. 

• The likelihood of different scenarios being 

realized. 

The relative complexity of analyzing private- 
label MBS, particularly compared to evaluat¬ 
ing agency-backed securities, results from both 
the multiplicity of factors influencing returns as 
well as the many exogenous elements that drive 
these factors. 

For example, a cursory evaluation of the yield
matrix for the SM tranche (contained in Table
4(B)) indicates that the bond's projected
yields decline rapidly as CDRs are increased.
However, the matrix in Table 5 also shows
that the tranche's yields remain relatively high
if VPRs increase commensurately with CDRs
(i.e., in the lower-right quadrant of the matrix).
Alternatively, its projected yields are negative
when higher CDRs are paired with lower VPRs
(in the upper-right quadrant), while yields
greater than 20% can be achieved with a combination
of fast VPRs and slowing CDRs (the
lower-left quadrant). If an investor decides that
the combination of VPRs and CDRs in the
upper-right quadrant represents a likely scenario,
the negative yields projected for such
scenarios indicate that the base-case yield assumption
is too low to compensate investors for
the risks being accepted.
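
The quadrant reading above can be checked mechanically against the yields in Table 5. The short sketch below transcribes the yield half of each cell and lists the scenarios with negative yields and those above 20%; the variable names and the way the scenarios are filtered are illustrative assumptions.

```python
vprs = [2, 4, 6, 8, 10, 12]   # rows of Table 5
cdrs = [2, 4, 6, 8, 10, 12]   # columns of Table 5
yields = [
    [13.420, 12.166,  7.081,  0.309, -6.665, -13.249],
    [15.067, 14.402,  9.992,  3.781, -2.786,  -9.137],
    [16.893, 16.681, 12.847,  7.128,  0.922,  -5.209],
    [18.890, 19.010, 15.681, 10.396,  4.515,  -1.409],
    [21.050, 21.394, 18.515, 13.621,  8.033,   2.306],
    [23.368, 23.843, 21.368, 16.830, 11.509,   5.968],
]

# Flatten the matrix into (VPR, CDR, yield) scenarios.
scenarios = [(v, c, y) for v, row in zip(vprs, yields)
             for c, y in zip(cdrs, row)]

negative = [s for s in scenarios if s[2] < 0]
above_20 = [s for s in scenarios if s[2] > 20]

print("Negative-yield scenarios (VPR, CDR, yield):", negative)
print("Scenarios yielding above 20% (VPR, CDR, yield):", above_20)
```

All of the negative-yield scenarios pair CDRs of 10 or 12 with VPRs of 8 or less, while the yields above 20% require VPRs of 10 or 12.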

Utilizing just these two variables, the analy¬ 
sis requires investors to assess the returns of 
Table 6 Partial Output of Integrated Model for SS Bond

HPI FLAT/+0 IR Shock Scenario

Percent of                                   Percent of Default Model
Prepay Model   Analytics                     75%            100%           125%

75%            Yield                         10.561         8.416          7.295
               Price                         64.38          64.38          64.38
               WAL                           5.454          5.295          4.979
               MDuration                     3.661          3.878          3.855
               Convexity                     0.263          0.303          0.308
               Present Value                 157,012,886    157,012,886    157,012,886
               Present Value + Accrued       157,021,422    157,021,422    157,021,422
               Collateral Loss %             37.84%         44.74%         47.97%
               Bond Collateral Loss          37.84%         44.74%         47.97%
               Bond Principal Window         1-333          1-356          1-379
               Bond Principal Writedown      31,919,013     51,373,711     61,802,013
               First Period Writedown        37             31             27
               Bond Principal Writedown %    13.09%         21.06%         25.34%
               Total Interest Shortfall      -              -              -

100%           Yield                         14.344         10.668         8.359
               Price                         64.38          64.38          64.38
               WAL                           4.252          4.336          4.268
               MDuration                     2.695          3.062          3.250
               Convexity                     0.144          0.186          0.216
               Present Value                 157,012,886    157,012,886    157,012,886
               Present Value + Accrued       157,021,422    157,021,422    157,021,422
               Collateral Loss %             29.62%         39.49%         45.29%
               Bond Collateral Loss          29.62%         39.49%         45.29%
               Bond Principal Window         1-328          1-355          1-388
               Bond Principal Writedown      17,307,456     40,600,481     56,914,682
               First Period Writedown        40             30             27
               Bond Principal Writedown %    7.10%          16.65%         23.33%
               Total Interest Shortfall      -              -              -

125%           Yield                         17.718         14.530         10.856
               Price                         64.38          64.38          64.38
               WAL                           3.495          3.576          3.631
               MDuration                     2.183          2.379          2.650
               Convexity                     0.094          0.112          0.140
               Present Value                 157,012,886    157,012,886    157,012,886
               Present Value + Accrued       157,021,422    157,021,422    157,021,422
               Collateral Loss %             24.23%         32.30%         40.38%
               Bond Collateral Loss          24.23%         32.30%         40.38%
               Bond Principal Window         1-328          1-356          1-394
               Bond Principal Writedown      9,427,796      26,037,741     45,698,364
               First Period Writedown        44             32             27
               Bond Principal Writedown %    3.87%          10.68%         18.74%
               Total Interest Shortfall      -              -              -


Source: Vichara Technologies. Analysis utilizes deal libraries of Intex Solutions, and models and data provided by 
CoreLogic. 


potential investments in a range of different 
prepayment and default scenarios with vary¬ 
ing degrees of plausibility. Further inquiries 
should be made regarding expected principal 
losses on the investment under the assumed 


scenarios, taking the availability and adequacy 
of credit support into account. The sensitivity of 
the bond's returns to changes in other relevant 
factors must then be examined. As an example, 
expectations for real estate prices will directly 
impact expected loss severities, which will in
turn affect an investor's willingness to buy se¬ 
curities that are more junior in priority. Another 
example relates to the state of the foreclosure 
pipeline and its influence on lags. During much 
of 2009 and 2010, the backup in the foreclosure 
pipeline meant that buying credit IOs, which 
benefited from the extended lag, was a prof¬ 
itable strategy as long as servicers continued to 
advance P&I. 

The most difficult aspect of the analysis is generating
expectations for factors that are difficult
or impossible to quantify. The value of credit IOs,
discussed above, is a case in point.
Beyond the dearth of significant information
from servicers (who treat much of the information
as having proprietary value), certain
factors simply defy quantification. In addition,
investors must continuously check their analysis
to be certain that they understand what factors
are driving their results. This means that
the sort of analyses performed earlier in this
entry (particularly in the section describing the
interaction of factors) is highly useful in developing
intuitions for how bonds can be expected
to perform under varying conditions.

Note that this entry's discussions were fo¬ 
cused on the evaluation of legacy bonds, that is, 
private-label MBS issued in the period prior to 
mid-2007. The techniques described in this en¬ 
try, however, can also be used to evaluate newly 
issued securities, although some adjustments to 
the methodologies might need to be made. In¬ 
vestors analyzing the adequacy of credit sup¬ 
port using the "coverage ratio" methodology 
demonstrated in Table 3, for example, would 
need to replace the use of a transaction's current 
credit profile with alternative ways of predict¬ 
ing future losses. 

Finally, noticeably absent from these discussions
was any mention of the rating agencies.
Bond ratings cannot and should never
substitute for rigorous analysis, as investors
who experienced the post-2007 credit meltdown
can attest. Ratings are relevant mainly due to
constraints and restrictions on the holdings of 
regulated investors; when bond holdings are 


downgraded to below investment grade, many 
investors are forced to liquidate them, causing 
their prices to crater. Techniques similar to the 
coverage ratios outlined previously can be used 
to monitor the adequacy of bonds' credit sup¬ 
port and identify bonds that are vulnerable to 
being downgraded. 


KEY POINTS 

• In the analysis of agency MBS, since the gov¬ 
ernment backing of these securities eliminates 
investors' exposure to principal writedown, 
the focus is on estimating the timing of prin¬ 
cipal cash flows. 

• Private-label securities require layers of addi¬ 
tional analysis because of the introduction of 
a series of additional factors that determine 
the bond's cash flows and thus its projected
returns. These factors can be broadly
characterized as (1) the amount of principal 
expected to be returned, (2) the timing of 
principal returns, and (3) the allocation of 
principal within the transaction. 

• The issue of how much principal is projected
to be received as a result of prepayments is a
straightforward function of the assumed default
rate and loss severity. Loss severity is
measured as the percentage of the defaulted
principal that will ultimately not be returned
to the investment trust; the complement of this
measure is the recovery percentage.

• The amount and timing of cash flows to the 
trust are impacted by a variety of actions and 
decisions taken by both borrowers and servicers,
and are also influenced by exogenous
factors. 

• The analysis and valuation of private-label 
MBS is complicated by the need to project 
and account for a number of variables over 
and above those required to evaluate agency 
securities, requiring additional metrics neces¬ 
sary to project the principal and interest cash 
flows paid to the trust, as well as how they 
will be allocated to the different tranches, un¬ 
der a variety of scenarios. 



• The additional complexity associated with 
private-label MBS means that the dominant 
metric used to assess expected returns is loss- 
adjusted yield. 

• Before evaluating projected yields and cash 
flows for a tranche, a prudent step is to as¬ 
sess the security's remaining credit support 
relative to the expected level of losses. The 
objective is to evaluate whether a bond's re¬ 
maining credit support (i.e., the amount and 
proportion of bonds junior in priority) is ad¬ 
equate given the losses that the transaction is 
expected to absorb. 

• The relative complexity of analyzing private- 
label MBS, particularly compared to evalu¬ 
ating agency-backed securities, results from 
both the multiplicity of factors influencing re¬ 
turns as well as the many exogenous elements 
that drive these factors. The most difficult as¬ 
pect of the analysis is generating expectations 
for factors that are difficult or impossible to 
quantify. 

NOTES 

1. For an explanation of nonagency MBS, see 
Fabozzi (2005) and Fabozzi, Bhattacharya, 
and Berliner (2011). 


2. See Chapter 4 in Fabozzi, Bhattacharya, and 
Berliner (2011). 

3. The example uses the common notation 
where loans that are 30 to 59 days delinquent 
are shown as D30, loans that are 90 or more 
days delinquent are D90+, and so on. "Pay¬ 
off" accounts for loans that are voluntarily 
prepaid; "Liq" are seriously delinquent loans 
that are liquidated, with Ti representing the 
month when recoveries are received by the 
trust. 

4. The vectors technically are not the equivalent 
of VPRs and CPRs since they don't account 
for the effects of amortization on the cash 
flows. 

5. The analysis utilized CWALT 07-HY8C A1,
A2, and M1.

6. The system utilizes the deal libraries of Intex 
Solutions; the analysis shown used models 
and data provided by CoreLogic. 
Abstract: The valuation of residential mortgage-backed securities begins with a projection of a
subject security's cash flow. The monthly cash flow from the underlying pool of mortgage loans 
includes three components: (1) scheduled principal payments (also referred to as amortization), 
(2) interest payments, and (3) any prepayments. Prepayments are any payments made by borrow¬ 
ers that are in excess of the scheduled principal payment. Consequently, the cash flow depends 
on the prepayment behavior of the borrowers in the mortgage pool. In addition to prepayments, 
the expected credit performance of the underlying loans must be projected to estimate a residential
mortgage-backed security's cash flow. The sharp deterioration in mortgage performance that
emerged in late 2006 led to the realization that prepayments and defaults often had related effects 
on the performance of these securities, even though they represent very different phenomena. As 
a result, new terminology has emerged to clarify the different circumstances that result in the early 
return of principal to investors. Understanding the terms used in the market to define prepayments 
and default experience, as well as the methodologies used to generate these metrics, is important 
for the following reasons: efficient risk-based pricing at the origination level; evaluation of relative 
value within the residential mortgage-backed securities sector (as well as across the fixed income 
universe); effective hedging and management of prepayment and credit risk exposure; and ex post 
performance attribution. 




Mortgage-Backed Securities Analysis and Valuation 


Securities backed by a pool of residential mort¬ 
gage loans, referred to as mortgage-backed se¬ 
curities (MBS) or mortgage-related securities, 
have complex cash flow characteristics compared
to the traditional government, corporate,
or municipal security. Residential MBS are
classified as agency MBS and nonagency MBS.
The former include MBS issued by Ginnie
Mae (a federally related government entity)
and two government-sponsored enterprises
(Fannie Mae and Freddie Mac). Residential
MBS not issued by these entities are called nonagency
or private-label MBS. In turn, nonagency
MBS are categorized based on the credit quality
of the underlying borrower or lien. There are
nonagency MBS backed by prime loans, along 
with those backed by borrowers with blem¬ 
ished credit histories or an inferior lien on the 
mortgaged property (e.g., a second mortgage 
lien). The latter nonagency MBS are generically 
referred to as subprime MBS. 1 

Complicating the cash flow projection of a
residential MBS is that borrowers can prepay
their loans and will in fact do so for a variety of
reasons. Virtually all mortgage loans have
a "due on sale" clause, which means that the re¬ 
maining balance of the loan must be paid when 
the house is sold. Existing mortgages can also 
be refinanced by the obligor if the prevailing 
level of mortgage rates declines, or if a more at¬ 
tractive financing vehicle is proposed to them. 
In addition, homeowners can make partial pre¬ 
payments on their loan, which serve to reduce 
the remaining balance and shorten the loan's 
remaining term. Prepayments strongly impact 
the returns and performance of MBS, and in¬ 
vestors devote significant resources to studying 
and modeling them. 

For the holder of a mortgage-related security 
asset, the borrower's prepayment option cre¬ 
ates a unique form of risk. In cases where the 
obligor refinances the loan in order to capital¬ 
ize on a drop in market rates, the investor has 
a high-yielding asset pay off, and it can be re¬ 
placed only with an asset carrying a lower yield. 


Prepayment risk is analogous to "call risk" for 
corporate and municipal bonds in terms of 
its impact on returns, and also creates uncer¬ 
tainty with respect to the timing of investors' 
cash flows. In addition, changing prepayment 
"speeds" due to interest rate moves causes vari¬ 
ations in the cash flows of mortgages and se¬ 
curities collateralized by mortgage products, 
strongly influencing their relative performance 
and making them difficult and expensive to 
hedge. 

Prepayments are phenomena resulting from 
decisions made by the borrower and/or the 
lender and occur for the following reasons: (1)
sale of the property (due to normal mobility, as
well as death and divorce), (2) destruction of
the property by fire or other disaster, (3) default
on the part of the borrower, and (4) refinancing.
Prepayments attributable to the first two
reasons are referred to under the broad rubric 
of "turnover." Turnover rates tend to be fairly 
stable over time, but are strongly influenced by 
the health of the housing market, specifically 
the levels of real estate appreciation and the 
volume of existing home sales. Refinancing ac¬ 
tivity is categorized as either "rate and term" or 
"cash-out" refinancings. Rate-and-term (or "no 
cash") transactions generally depend on a bor¬ 
rower's ability to obtain a new loan with either 
a lower rate or a smaller payment. This activ¬ 
ity is therefore dependent on the level of inter¬ 
est rates, the shape of the yield curve, and the 
availability of alternative loan products. These 
factors also impact cash-out activity, although 
the primary driver of cash-out refinancings re¬ 
mains home price appreciation; the ability to 
borrow additional funds against a property is 
contingent on the property having appreciated 
in price. 

The paradigm in mortgages is thus fairly 
straightforward. Mortgages with low note rates 
(that are "out-of-the-money," to borrow a term 
from the option market) normally prepay 
fairly slowly and steadily, while loans carrying 
higher rates (that are "in-the-money") are prone
to experience spikes in prepayments due to
refinancings when rates decline. In turn, the
relationship between a loan's note rate and 
the prevailing level of mortgage rates dic¬ 
tates whether the borrower has an incentive to 
refinance. 

It is important to understand how changes 
in prepayment rates impact the performance 
of mortgages and MBS. Since prepayments in¬ 
crease as bond prices rise and market yields 
are declining, mortgages shorten in average 
life and duration when the bond market ral¬ 
lies, constraining their price appreciation. Con¬ 
versely, rising yields cause prepayments to slow 
and bond durations to extend, resulting in a 
greater drop in price than experienced by more 
traditional (i.e., option-free) fixed income prod¬ 
ucts. As a result, the price performance of 
mortgages and MBS tends to lag that of compa¬ 
rable fixed maturity instruments (such as Trea¬ 
sury notes) when the prevailing level of yields 
changes. This phenomenon is generically de¬ 
scribed as negative convexity. The effect of chang¬ 
ing prepayment speeds on mortgage durations, 
based on movements in interest rates, is pre¬ 
cisely the opposite of what a bondholder would 
desire. (Fixed income portfolio managers, for 
example, extend durations as rates decline, and 
shorten them when rates rise.) The price per¬ 
formance of mortgages and MBS is, therefore, 
decidedly nonlinear in nature, and the prod¬ 
uct will underperform assets that do not exhibit 
negatively convex behavior as rates fluctuate. 

Consequently, it is essential for participants 
in the residential MBS market to understand 
the general prepayment and credit performance 
nomenclature. The market is characterized by 
the usage of a variety of terms; some terms de¬ 
scribe general phenomena, while others are spe¬ 
cific to certain types of loan products and assets. 
In this entry, the basic terms used to characterize 
residential mortgage-related prepayments and 
losses are discussed. Our focus is on describing 
the terminology and outlining the methodolo¬ 
gies used in calculating relevant metrics, not 
on the determinants of prepayment and default 
behavior. 


PREPAYMENT 

TERMINOLOGY 

For fixed-rate fully amortizing assets, such as
home equity loans (HELs) and manufactured
housing loans (MHs), the monthly scheduled
payment (consisting of scheduled principal 
and interest) is constant throughout the amor¬ 
tization term. If the borrower pays more than 
the monthly scheduled payment, the extra 
payment will be used to pay down the out¬ 
standing balance faster than the original amor¬ 
tization schedule, resulting in a prepayment (or, 
as it is sometimes referenced, an unscheduled 
principal payment). If the outstanding balance 
is paid off in full, the prepayment is a complete 
prepayment; if only a portion of the outstand¬ 
ing balance is prepaid, the prepayment is called 
either a partial prepayment or curtailment. Pre¬ 
payments can be the result of natural turnover, 
refinancings, defaults, partial paydowns, and 
credit-related events. 
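
As a minimal sketch of these definitions, the code below computes the constant scheduled payment on a fixed-rate fully amortizing loan and splits an actual payment into interest, scheduled principal, and prepayment. The loan terms (a $200,000 balance, a 6% note rate, 360 months) and the function names are hypothetical, chosen only for illustration.

```python
def level_payment(balance, annual_rate, months):
    """Constant monthly payment that fully amortizes the balance."""
    r = annual_rate / 12.0
    return balance * r / (1.0 - (1.0 + r) ** -months)


def split_payment(balance, annual_rate, months_remaining, amount_paid):
    """Split one month's payment into interest, scheduled principal,
    and any unscheduled principal (prepayment)."""
    r = annual_rate / 12.0
    scheduled = level_payment(balance, annual_rate, months_remaining)
    interest = balance * r
    scheduled_principal = scheduled - interest
    max_prepay = balance - scheduled_principal   # balance left after schedule
    prepayment = min(max(amount_paid - scheduled, 0.0), max_prepay)
    if prepayment == 0.0:
        kind = "none"
    elif prepayment >= max_prepay:
        kind = "complete prepayment"
    else:
        kind = "curtailment"
    return {
        "scheduled_payment": scheduled,
        "interest": interest,
        "scheduled_principal": scheduled_principal,
        "prepayment": prepayment,
        "kind": kind,
    }


example = split_payment(200_000.0, 0.06, 360, 1_500.0)
for key, value in example.items():
    print(key, round(value, 2) if isinstance(value, float) else value)
```

Only the amount above the scheduled payment counts as a prepayment. The scheduled principal portion is amortization and is excluded when prepayment speeds are calculated.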

The evaluation of prepayments is further 
complicated by the fact that there is an in¬ 
terplay between defaults, which are effectively 
credit-related prepayments, and prepayments 
attributable specifically to declining interest 
rates. In agency MBS (i.e., pools issued by 
Ginnie Mae, Fannie Mae, and Freddie Mac) 
there have at times been large numbers of se¬ 
riously delinquent loans in pools for which 
Freddie Mac and Fannie Mae continued to 
pay interest and scheduled principal. In 2010, 
however, the two government-sponsored enter¬ 
prises (Fannie Mae and Freddie Mac) changed 
their policies and began buying loans that were 
120 days or more delinquent out of pools. These 
buyouts initially resulted in a surge in prepay¬ 
ment speeds. Moreover, the new policy meant 
that pools containing large numbers of lower- 
quality loans would tend to experience con¬ 
sistently faster prepayment speeds than those 
pools backed by better-credit loans. 

However, for private-label MBS, prepayments
resulting from credit events must be
treated differently than those attributable to
refinancings. This is because a default means
that the investor will probably not receive the 
entire amount of the defaulted principal, but 
only the amount recovered after the foreclosure 
process is completed. Moreover, the timing of 
payments is also at issue. There is typically a 
sizeable delay between the time a borrower be¬ 
comes delinquent on a loan and its ultimate 
liquidation. This has resulted in the conven¬ 
tion where prepayments in private-label securi¬ 
ties are separated into voluntary and involuntary 
prepayments. Voluntary prepayments occur as 
a result of a refinancing, the sale of the prop¬ 
erty, or other events (e.g., the death of the prop¬ 
erty owner) where the full principal amount is 
paid immediately to the bondholder. Involun¬ 
tary prepayments occur as a result of a credit 
event, for which both the timing and net prin¬ 
cipal received are uncertain. 

Prepayments and defaults can be analyzed 
on both the loan and pool level. Loan-level 
prepayment analysis, which requires detailed 
loan-level information, is more accurate than 
pool-level prepayment analysis, but is also 
more computationally intensive. Additionally, 
this type of analysis allows the inclusion of 
specific obligor and property characteristics as 
determinants of prepayments and defaults. 
Loan-level analysis involves tracking defaults 
and prepayments on an individual loan ba¬ 
sis, projecting each loan's cash flows, and com¬ 
bining these amounts to calculate aggregated 
metrics. Due to the diversity of the characteristics
of the underlying loans in most deals, loan-level
analysis is generally more accurate and
has greater predictive capabilities.


CALCULATING 
PREPAYMENT SPEEDS 

The first critical step in calculating prepayment
speed is to define a prepayment. For the purposes
of this discussion, a prepayment is defined
as the early return of principal to the
investor. By definition, this means that amortization
(or scheduled principal payments)
must be excluded from the calculation, leaving
only unscheduled principal payments to be
analyzed.

Conditional Prepayment Rate 

The approach most commonly used to generate 
prepayment speeds is to calculate monthly pre¬ 
paid principal as a percentage of the security's 
outstanding balance and then annualize that 
percentage. Most current approaches to prepayment
calculations either quote this annualized
periodic speed, known as the conditional prepayment
rate (CPR), directly or use it as an input
to generate other quotation benchmarks. 2
This methodology is useful in that it allows analysts
both to calculate the historical prepayment
experience of a security and to project
prepayment speeds (and thus a security's cash
flows) into the future. When used as part of
a model to generate projected cash flows, the 
CPR calculation assumes that some fraction of 
the unpaid principal balance (or UPB) of the 
pool is prepaid each month for the remaining 
term of the mortgage. The advantages of this 
approach are its simplicity and its flexibility. 
For example, changes in economic conditions 
that impact prepayment rates or changes in the 
historical prepayment pattern of a pool can be 
analyzed quickly. In addition, the CPR can be 
used as an input to other models and quotation 
mechanisms, as noted already. 

The CPR is an annual rate. However, because
mortgage cash flows are a monthly
phenomenon, calculating the CPR requires the
generation of a monthly prepayment rate,
called the single monthly mortality rate (SMM).
The SMM is the most fundamental measure
of prepayment speeds. SMM measures the
monthly prepayment amount as a percentage
of the previous month's outstanding balance
minus the scheduled principal payment. Mathematically,
the SMM is calculated as follows:

$$\text{SMM} = \frac{\text{Total payment, including prepayments} - \text{Scheduled interest payment} - \text{Scheduled principal payment}}{\text{Unpaid principal balance} - \text{Scheduled principal payment}}$$


For example, if the pool balance at month zero
is $10,000,000, assuming an interest rate of 12%,
the scheduled principal and interest payments
are $2,861.26 and $100,000 in month one, respectively.
If the actual payment received by
investors in month one is $202,891.25, the SMM
rate is approximately 1%, calculated as

$$\text{SMM} = \frac{\$202{,}891.25 - \$100{,}000 - \$2{,}861.26}{\$10{,}000{,}000 - \$2{,}861.26} \approx 0.01$$

Therefore, if a mortgage loan prepaid at 1% 
SMM in a particular month, this means that 1% 
of that month's scheduled balance (last month's 
outstanding balance minus the scheduled prin¬ 
cipal payment) has been prepaid. 
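As a minimal sketch, the SMM arithmetic in the example above can be written as a short Python function. The function and argument names are illustrative assumptions; only the formula and the dollar amounts come from the text.

```python
def single_monthly_mortality(total_payment, scheduled_interest,
                             scheduled_principal, beginning_balance):
    """Unscheduled principal as a fraction of the scheduled balance."""
    # Prepaid principal is whatever remains after scheduled interest
    # and scheduled principal are removed from the total payment.
    prepaid_principal = total_payment - scheduled_interest - scheduled_principal
    # The scheduled balance is last month's balance less scheduled principal.
    scheduled_balance = beginning_balance - scheduled_principal
    return prepaid_principal / scheduled_balance


# Figures from the example: $10,000,000 pool at a 12% interest rate.
smm = single_monthly_mortality(
    total_payment=202_891.25,
    scheduled_interest=100_000.00,
    scheduled_principal=2_861.26,
    beginning_balance=10_000_000.00,
)
print(f"SMM = {smm:.4%}")
```

The result is very close to, though not exactly, 1%, which is why the example is best read as an approximation.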

Given the SMM, a CPR can be computed using
the following formula:

$$\text{CPR} = 1 - (1 - \text{SMM})^{12}$$

For example, if the SMM is 1%, then the CPR is

$$\text{CPR} = 1 - (0.99)^{12} = 11.36\%$$

Conversely, CPRs can be converted into
SMMs (and thus be used to generate monthly
cash flows) through the following formula:

$$\text{SMM} = 1 - (1 - \text{CPR})^{1/12}$$

For example, suppose that the CPR used to estimate
prepayments is 6%. The corresponding
SMM is

$$\text{SMM} = 1 - (1 - 0.06)^{1/12} = 1 - (0.94)^{0.08333} = 0.5143\%$$
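The two conversions are inverses of one another, which a short Python sketch makes easy to check. The helper names smm_to_cpr and cpr_to_smm are hypothetical; the formulas are the ones given above.

```python
def smm_to_cpr(smm):
    """Annualize a single monthly mortality rate by compounding 12 months."""
    return 1.0 - (1.0 - smm) ** 12


def cpr_to_smm(cpr):
    """Deannualize a conditional prepayment rate into a monthly rate."""
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)


print(f"1% SMM -> {smm_to_cpr(0.01):.2%} CPR")
print(f"6% CPR -> {cpr_to_smm(0.06):.4%} SMM")
```

Note that the CPR is less than 12 times the SMM because each month's prepayment rate is applied to a balance that earlier prepayments have already reduced.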

PSA Prepayment Benchmark 

The Public Securities Association (PSA) prepay¬ 
ment benchmark is expressed as a monthly se¬ 
ries of annual prepayment rates. 3 The basic PSA 
model assumes that prepayment rates are low 
for newly originated mortgages and then in¬ 
crease linearly as the mortgages age or season. 

The PSA standard benchmark assumes the 
following prepayment rates for 30-year mort¬ 
gages: 

1. A CPR of 0.2% for the first month, increased 
by 0.2% per year per month for the next 29 
months when it reaches 6% per year. 

2. A 6% CPR for the remaining years. 

This benchmark, referred to as "100% PSA" 
or simply "100 PSA," is graphically depicted in 
the middle graph in Figure 1. Mathematically, 
100 PSA can be expressed as follows: 

$$\text{CPR} = \begin{cases} 6\% \times (t/30) & \text{if } t \le 30 \\ 6\% & \text{if } t > 30 \end{cases}$$

where t is the number of months since the mortgage
was originated. Since the CPR prior to
month 30 rises at a constant rate, this period is
sometimes referred to as the "ramp," and loans
are considered to be "on the ramp" when they
are less than 30 months old.

Figure 1 Graphical Depiction of 50 PSA, 100 PSA, and 150 PSA (horizontal axis: mortgage age in months)

Slower or faster speeds are then referred to 
as some percentage of PSA. For example, 50 
PSA means one-half the CPR of the PSA bench¬ 
mark prepayment rate; 150 PSA means 1.5 times 
the CPR of the PSA benchmark prepayment 
rate; 300 PSA means three times the CPR of 
the benchmark prepayment rate. This is illus¬ 
trated graphically in Figure 1 for 50 PSA, 100 
PSA, and 150 PSA. A prepayment rate of 0 PSA 
means that no prepayments are assumed. 
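A minimal Python sketch of the benchmark follows, assuming the 100 PSA ramp described above (0.2% CPR per month of age, capped at 6% from month 30). The function name psa_cpr and its arguments are illustrative, not a standard API.

```python
def psa_cpr(age_months, psa_speed=100.0):
    """Annual CPR implied by a PSA speed for a loan of the given age."""
    # 100 PSA: CPR rises linearly to 6% at month 30, then stays flat.
    base_cpr = 0.06 * min(age_months, 30) / 30.0
    # Other speeds scale the benchmark CPR proportionally.
    return base_cpr * psa_speed / 100.0


for speed in (50, 100, 150):
    curve = [psa_cpr(age, speed) for age in (1, 15, 30, 120)]
    print(speed, [f"{cpr:.2%}" for cpr in curve])
```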

It is important to note that mortgage pools 
will typically be comprised of loans having dif¬ 
ferent origination months and, therefore, differ¬ 
ent ages. In practice, the weighted average loan 
age (WALA) of a pool or security is used as a 
proxy for its age. However, a large dispersion 
of loan ages within a pool will distort the PSA 
calculation. 

It is helpful to outline the CPRs and SMMs 
assumed at different PSA assumptions for dif¬ 
ferent loan ages. The SMMs for month 5, month 
20, and months 31 through 360 assuming 100 
PSA are calculated as follows: 

For month 5:

$$\text{CPR} = 6\% \times (5/30) = 1\% = 0.01$$

$$\text{SMM} = 1 - (1 - 0.01)^{1/12} = 1 - (0.99)^{0.083333} = 0.000837$$

For month 20:

$$\text{CPR} = 6\% \times (20/30) = 4\% = 0.04$$

$$\text{SMM} = 1 - (1 - 0.04)^{1/12} = 1 - (0.96)^{0.083333} = 0.003396$$

For months 31-360:

$$\text{CPR} = 6\%$$

$$\text{SMM} = 1 - (1 - 0.06)^{1/12} = 1 - (0.94)^{0.083333} = 0.005143$$


The SMMs for month 5, month 20, and months 
31 through 360 assuming 165 PSA are computed 
as follows: 

For month 5:

$$\text{CPR} = 6\% \times (5/30) = 1\% = 0.01$$

$$165\ \text{PSA} = 1.65 \times 0.01 = 0.0165$$

$$\text{SMM} = 1 - (1 - 0.0165)^{1/12} = 1 - (0.9835)^{0.083333} = 0.001386$$

For month 20:

$$\text{CPR} = 6\% \times (20/30) = 4\% = 0.04$$

$$165\ \text{PSA} = 1.65 \times 0.04 = 0.066$$

$$\text{SMM} = 1 - (1 - 0.066)^{1/12} = 1 - (0.934)^{0.083333} = 0.005674$$

For months 31 through 360:

$$\text{CPR} = 6\%$$

$$165\ \text{PSA} = 1.65 \times 0.06 = 0.099$$

$$\text{SMM} = 1 - (1 - 0.099)^{1/12} = 1 - (0.901)^{0.083333} = 0.008650$$

Notice that the SMM assuming 165 PSA is not 
1.65 times the SMM at 100 PSA. Rather, the CPR 
for the pool's age at 100 PSA is multiplied by 
1.65 to generate the CPR representing 165 PSA 
at that age. 
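The point that the PSA multiple is applied to the CPR, not to the SMM, can be checked numerically. The sketch below assumes the 100 PSA ramp defined earlier; psa_smm is a hypothetical helper name.

```python
def psa_smm(age_months, psa_speed=100.0):
    """SMM at a given loan age and PSA speed."""
    # Scale the benchmark CPR first, then deannualize.
    cpr = 0.06 * min(age_months, 30) / 30.0 * psa_speed / 100.0
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)


for age in (5, 20, 31):
    smm_100 = psa_smm(age, 100)
    smm_165 = psa_smm(age, 165)
    # The ratio exceeds 1.65 because deannualization is not linear.
    print(age, f"{smm_100:.6f}", f"{smm_165:.6f}", f"{smm_165 / smm_100:.4f}")
```

Because the conversion from CPR to SMM is nonlinear, the ratio of the two SMMs is slightly above 1.65 at every age.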

Illustration of Monthly Cash Flow 
Construction 

We now show how to construct a monthly cash 
flow for a hypothetical agency pass-through 
given a PSA assumption. For the purpose of 
this illustration, the underlying mortgages for 
this hypothetical pass-through are assumed to 
be fixed rate fully amortizing mortgages with a 
weighted average coupon (WAC) rate of 6.0%. 
It will be assumed that the mortgage pass¬ 
through rate is 5.5% with a weighted average 
maturity (WAM) of 358 months. 

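Table 1 Monthly Cash Flow for a $400 Million Mortgage Pass-Through with a 5.5% Pass-Through Rate, a WAC of
6.0%, and a WAM of 358 Months, Assuming 100% PSA

| (1) Month | (2) Outstanding Balance | (3) SMM | (4) Mortgage Payment | (5) Net Interest | (6) Scheduled Principal | (7) Prepayments | (8) Total Principal | (9) Cash Flow |
|---|---|---|---|---|---|---|---|---|
| 1 | 400,000,000 | 0.00050 | 2,402,998 | 1,833,333 | 402,998 | 200,350 | 603,349 | 2,436,682 |
| 2 | 399,396,651 | 0.00067 | 2,401,794 | 1,830,568 | 404,810 | 266,975 | 671,785 | 2,502,353 |
| 3 | 398,724,866 | 0.00084 | 2,400,187 | 1,827,489 | 406,562 | 333,463 | 740,025 | 2,567,514 |
| 4 | 397,984,841 | 0.00101 | 2,398,177 | 1,824,097 | 408,253 | 399,780 | 808,033 | 2,632,130 |
| 5 | 397,176,808 | 0.00117 | 2,395,766 | 1,820,394 | 409,882 | 465,892 | 875,773 | 2,696,167 |
| 6 | 396,301,034 | 0.00134 | 2,392,953 | 1,816,380 | 411,447 | 531,764 | 943,211 | 2,759,591 |
| 7 | 395,357,823 | 0.00151 | 2,389,738 | 1,812,057 | 412,949 | 597,362 | 1,010,311 | 2,822,368 |
| 8 | 394,347,512 | 0.00168 | 2,386,124 | 1,807,426 | 414,386 | 662,652 | 1,077,038 | 2,884,464 |
| 9 | 393,270,474 | 0.00185 | 2,382,110 | 1,802,490 | 415,758 | 727,600 | 1,143,357 | 2,945,847 |
| 10 | 392,127,117 | 0.00202 | 2,377,698 | 1,797,249 | 417,063 | 792,172 | 1,209,235 | 3,006,484 |
| 11 | 390,917,882 | 0.00219 | 2,372,890 | 1,791,707 | 418,300 | 856,336 | 1,274,636 | 3,066,343 |
| 12 | 389,643,247 | 0.00236 | 2,367,686 | 1,785,865 | 419,470 | 920,057 | 1,339,527 | 3,125,391 |
| 13 | 388,303,720 | 0.00253 | 2,362,089 | 1,779,725 | 420,571 | 983,303 | 1,403,873 | 3,183,599 |
| 14 | 386,899,847 | 0.00271 | 2,356,101 | 1,773,291 | 421,602 | 1,046,041 | 1,467,643 | 3,240,934 |
| 15 | 385,432,204 | 0.00288 | 2,349,724 | 1,766,564 | 422,563 | 1,108,239 | 1,530,802 | 3,297,366 |
| 16 | 383,901,402 | 0.00305 | 2,342,961 | 1,759,548 | 423,454 | 1,169,864 | 1,593,318 | 3,352,866 |
| 17 | 382,308,084 | 0.00322 | 2,335,813 | 1,752,245 | 424,273 | 1,230,887 | 1,655,159 | 3,407,405 |
| 18 | 380,652,925 | 0.00340 | 2,328,284 | 1,744,659 | 425,020 | 1,291,274 | 1,716,294 | 3,460,953 |
| 19 | 378,936,632 | 0.00357 | 2,320,377 | 1,736,793 | 425,694 | 1,350,996 | 1,776,690 | 3,513,483 |
| 20 | 377,159,941 | 0.00374 | 2,312,095 | 1,728,650 | 426,296 | 1,410,023 | 1,836,319 | 3,564,968 |
| 21 | 375,323,622 | 0.00392 | 2,303,442 | 1,720,233 | 426,824 | 1,468,325 | 1,895,148 | 3,615,382 |
| 22 | 373,428,474 | 0.00409 | 2,294,420 | 1,711,547 | 427,278 | 1,525,872 | 1,953,150 | 3,664,697 |
| 23 | 371,475,324 | 0.00427 | 2,285,034 | 1,702,595 | 427,657 | 1,582,637 | 2,010,294 | 3,712,889 |
| 24 | 369,465,030 | 0.00444 | 2,275,288 | 1,693,381 | 427,962 | 1,638,590 | 2,066,553 | 3,759,934 |
| 25 | 367,398,478 | 0.00462 | 2,265,185 | 1,683,910 | 428,192 | 1,693,706 | 2,121,898 | 3,805,808 |
| 26 | 365,276,580 | 0.00479 | 2,254,730 | 1,674,184 | 428,347 | 1,747,956 | 2,176,303 | 3,850,488 |
| 27 | 363,100,276 | 0.00497 | 2,243,928 | 1,664,210 | 428,427 | 1,801,315 | 2,229,742 | 3,893,952 |
| 28 | 360,870,534 | 0.00514 | 2,232,783 | 1,653,990 | 428,430 | 1,853,758 | 2,282,189 | 3,936,178 |
| 29 | 358,588,346 | 0.00514 | 2,221,300 | 1,643,530 | 428,358 | 1,842,021 | 2,270,379 | 3,913,909 |
| 30 | 356,317,967 | 0.00514 | 2,209,875 | 1,633,124 | 428,286 | 1,830,345 | 2,258,631 | 3,891,755 |
| 100 | 223,414,587 | 0.00514 | 1,540,329 | 1,023,984 | 423,256 | 1,146,847 | 1,570,104 | 2,594,087 |
| 101 | 221,844,483 | 0.00514 | 1,532,407 | 1,016,787 | 423,185 | 1,138,773 | 1,561,958 | 2,578,745 |
| 102 | 220,282,525 | 0.00514 | 1,524,526 | 1,009,628 | 423,114 | 1,130,740 | 1,553,853 | 2,563,482 |
| 103 | 218,728,672 | 0.00514 | 1,516,686 | 1,002,506 | 423,042 | 1,122,749 | 1,545,791 | 2,548,297 |
| 104 | 217,182,881 | 0.00514 | 1,508,885 | 995,422 | 422,971 | 1,114,799 | 1,537,770 | 2,533,191 |
| 105 | 215,645,111 | 0.00514 | 1,501,125 | 988,373 | 422,900 | 1,106,891 | 1,529,790 | 2,518,164 |
| 200 | 100,719,066 | 0.00514 | 919,770 | 461,629 | 416,174 | 515,859 | 932,033 | 1,393,662 |
| 201 | 99,787,032 | 0.00514 | 915,039 | 457,357 | 416,104 | 511,066 | 927,170 | 1,384,527 |
| 202 | 98,859,862 | 0.00514 | 910,333 | 453,108 | 416,034 | 506,298 | 922,332 | 1,375,439 |
| 203 | 97,937,531 | 0.00514 | 905,651 | 448,880 | 415,964 | 501,555 | 917,518 | 1,366,399 |
| 204 | 97,020,012 | 0.00514 | 900,994 | 444,675 | 415,893 | 496,836 | 912,730 | 1,357,405 |
| 205 | 96,107,283 | 0.00514 | 896,360 | 440,492 | 415,823 | 492,142 | 907,966 | 1,348,457 |
| 300 | 28,001,417 | 0.00514 | 549,218 | 128,340 | 409,211 | 141,907 | 551,118 | 679,457 |
| 301 | 27,450,299 | 0.00514 | 546,393 | 125,814 | 409,142 | 139,073 | 548,215 | 674,028 |
| 302 | 26,902,085 | 0.00514 | 543,583 | 123,301 | 409,073 | 136,254 | 545,326 | 668,628 |
| 303 | 26,356,758 | 0.00514 | 540,787 | 120,802 | 409,003 | 133,450 | 542,453 | 663,255 |
| 304 | 25,814,305 | 0.00514 | 538,006 | 118,316 | 408,934 | 130,660 | 539,595 | 657,910 |
| 305 | 25,274,710 | 0.00514 | 535,239 | 115,842 | 408,865 | 127,885 | 536,751 | 652,593 |
| 350 | 3,725,850 | 0.00514 | 424,402 | 17,077 | 405,773 | 17,075 | 422,848 | 439,925 |
| 351 | 3,303,002 | 0.00514 | 422,219 | 15,139 | 405,704 | 14,901 | 420,605 | 435,744 |
| 352 | 2,882,397 | 0.00514 | 420,048 | 13,211 | 405,636 | 12,738 | 418,374 | 431,585 |
| 353 | 2,464,023 | 0.00514 | 417,887 | 11,293 | 405,567 | 10,587 | 416,154 | 427,447 |
| 354 | 2,047,869 | 0.00514 | 415,738 | 9,386 | 405,499 | 8,447 | 413,946 | 423,332 |
| 355 | 1,633,924 | 0.00514 | 413,600 | 7,489 | 405,430 | 6,318 | 411,749 | 419,237 |
| 356 | 1,222,175 | 0.00514 | 411,473 | 5,602 | 405,362 | 4,201 | 409,563 | 415,164 |
| 357 | 812,613 | 0.00514 | 409,357 | 3,724 | 405,294 | 2,095 | 407,388 | 411,113 |
| 358 | 405,224 | 0.00514 | 407,251 | 1,857 | 405,225 | 0 | 405,225 | 407,082 |

Note: Since the WAM is 358 months, the underlying mortgage pool is seasoned an average of two months. Therefore, the
CPR for month 28 is 6%.

Table 1 shows the cash flow for selected
months assuming 100 PSA. The cash flow is
broken down into three components: (1) interest
(based on the pass-through rate), (2) the
regularly scheduled principal payment, and (3)
prepayments based on 100 PSA. Let's walk
through Table 1 column by column: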
671,785 

2,502,353 

3 

398,724,866 

0.00084 

2,400,187 

1,827,489 

406,562 

333,463 

740,025 

2,567,514 

4 

397,984,841 

0.00101 

2,398,177 

1,824,097 

408,253 

399,780 

808,033 

2,632,130 

5 

397,176,808 

0.00117 

2,395,766 

1,820,394 

409,882 

465,892 

875,773 

2,696,167 

6 

396,301,034 

0.00134 

2,392,953 

1,816,380 

411,447 

531,764 

943,211 

2,759,591 

7 

395,357,823 

0.00151 

2,389,738 

1,812,057 

412,949 

597,362 

1,010,311 

2,822,368 

8 

394,347,512 

0.00168 

2,386,124 

1,807,426 

414,386 

662,652 

1,077,038 

2,884,464 

9 

393,270,474 

0.00185 

2,382,110 

1,802,490 

415,758 

727,600 

1,143,357 

2,945,847 

10 

392,127,117 

0.00202 

2,377,698 

1,797,249 

417,063 

792,172 

1,209,235 

3,006,484 

11 

390,917,882 

0.00219 

2,372,890 

1,791,707 

418,300 

856,336 

1,274,636 

3,066,343 

12 

389,643,247 

0.00236 

2,367,686 

1,785,865 

419,470 

920,057 

1,339,527 

3,125,391 

13 

388,303,720 

0.00253 

2,362,089 

1,779,725 

420,571 

983,303 

1,403,873 

3,183,599 

14 

386,899,847 

0.00271 

2,356,101 

1,773,291 

421,602 

1,046,041 

1,467,643 

3,240,934 

15 

385,432,204 

0.00288 

2,349,724 

1,766,564 

422,563 

1,108,239 

1,530,802 

3,297,366 

16 

383,901,402 

0.00305 

2,342,961 

1,759,548 

423,454 

1,169,864 

1,593,318 

3,352,866 

17 

382,308,084 

0.00322 

2,335,813 

1,752,245 

424,273 

1,230,887 

1,655,159 

3,407,405 

18 

380,652,925 

0.00340 

2,328,284 

1,744,659 

425,020 

1,291,274 

1,716,294 

3,460,953 

19 

378,936,632 

0.00357 

2,320,377 

1,736,793 

425,694 

1,350,996 

1,776,690 

3,513,483 

20 

377,159,941 

0.00374 

2,312,095 

1,728,650 

426,296 

1,410,023 

1,836,319 

3,564,968 

21 

375,323,622 

0.00392 

2,303,442 

1,720,233 

426,824 

1,468,325 

1,895,148 

3,615,382 

22 

373,428,474 

0.00409 

2,294,420 

1,711,547 

427,278 

1,525,872 

1,953,150 

3,664,697 

23 

371,475,324 

0.00427 

2,285,034 

1,702,595 

427,657 

1,582,637 

2,010,294 

3,712,889 

24 

369,465,030 

0.00444 

2,275,288 

1,693,381 

427,962 

1,638,590 

2,066,553 

3,759,934 

25 

367,398,478 

0.00462 

2,265,185 

1,683,910 

428,192 

1,693,706 

2,121,898 

3,805,808 

26 

365,276,580 

0.00479 

2,254,730 

1,674,184 

428,347 

1,747,956 

2,176,303 

3,850,488 

27 

363,100,276 

0.00497 

2,243,928 

1,664,210 

428,427 

1,801,315 

2,229,742 

3,893,952 

28 

360,870,534 

0.00514 

2,232,783 

1,653,990 

428,430 

1,853,758 

2,282,189 

3,936,178 

29 

358,588,346 

0.00514 

2,221,300 

1,643,530 

428,358 

1,842,021 

2,270,379 

3,913,909 

30 

356,317,967 

0.00514 

2,209,875 

1,633,124 

428,286 

1,830,345 

2,258,631 

3,891,755 

100 

223,414,587 

0.00514 

1,540,329 

1,023,984 

423,256 

1,146,847 

1,570,104 

2,594,087 

101 

221,844,483 

0.00514 

1,532,407 

1,016,787 

423,185 

1,138,773 

1,561,958 

2,578,745 

102 

220,282,525 

0.00514 

1,524,526 

1,009,628 

423,114 

1,130,740 

1,553,853 

2,563,482 

103 

218,728,672 

0.00514 

1,516,686 

1,002,506 

423,042 

1,122,749 

1,545,791 

2,548,297 

104 

217,182,881 

0.00514 

1,508,885 

995,422 

422,971 

1,114,799 

1,537,770 

2,533,191 

105 

215,645,111 

0.00514 

1,501,125 

988,373 

422,900 

1,106,891 

1,529,790 

2,518,164 

200 

100,719,066 

0.00514 

919,770 

461,629 

416,174 

515,859 

932,033 

1,393,662 

201 

99,787,032 

0.00514 

915,039 

457,357 

416,104 

511,066 

927,170 

1,384,527 

202 

98,859,862 

0.00514 

910,333 

453,108 

416,034 

506,298 

922,332 

1,375,439 

203 

97,937,531 

0.00514 

905,651 

448,880 

415,964 

501,555 

917,518 

1,366,399 

204 

97,020,012 

0.00514 

900,994 

444,675 

415,893 

496,836 

912,730 

1,357,405 

205 

96,107,283 

0.00514 

896,360 

440,492 

415,823 

492,142 

907,966 

1,348,457 

300 

28,001,417 

0.00514 

549,218 

128,340 

409,211 

141,907 

551,118 

679,457 

301 

27,450,299 

0.00514 

546,393 

125,814 

409,142 

139,073 

548,215 

674,028 

302 

26,902,085 

0.00514 

543,583 

123,301 

409,073 

136,254 

545,326 

668,628 

303 

26,356,758 

0.00514 

540,787 

120,802 

409,003 

133,450 

542,453 

663,255 

304 

25,814,305 

0.00514 

538,006 

118,316 

408,934 

130,660 

539,595 

657,910 

305 

25,274,710 

0.00514 

535,239 

115,842 

408,865 

127,885 

536,751 

652,593 

350 

3,725,850 

0.00514 

424,402 

17,077 

405,773 

17,075 

422,848 

439,925 

351 

3,303,002 

0.00514 

422,219 

15,139 

405,704 

14,901 

420,605 

435,744 

352 

2,882,397 

0.00514 

420,048 

13,211 

405,636 

12,738 

418,374 

431,585 

353 

2,464,023 

0.00514 

417,887 

11,293 

405,567 

10,587 

416,154 

427,447 

354 

2,047,869 

0.00514 

415,738 

9,386 

405,499 

8,447 

413,946 

423,332 

355 

1,633,924 

0.00514 

413,600 

7,489 

405,430 

6,318 

411,749 

419,237 

356 

1,222,175 

0.00514 

411,473 

5,602 

405,362 

4,201 

409,563 

415,164 

357 

812,613 

0.00514 

409,357 

3,724 

405,294 

2,095 

407,388 

411,113 

358 

405,224 

0.00514 

407,251 

1,857 

405,225 

0 

405,225 

407,082 


a Since the WAM is 358 months, the underlying mortgage pool is seasoned an average of two months. Therefore, the 
CPR for month 28 is 6%. 
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regularly scheduled principal payment, and (3) 
prepayments based on 100 PSA. Let's walk 
through Table 1 column by column: 

Column 1 . This is the month. 

Column 2. This column gives the outstanding 
mortgage balance at the beginning of the 
month. It is equal to the outstanding balance 
at the beginning of the previous month re¬ 
duced by the total principal payment in the 
previous month. 

Column 3. This column shows the SMM for 100 
PSA. Two things should be noted in this col¬ 
umn. First, for month 1, the SMM is for a 
pass-through that has been seasoned three 
months because the WAM is 357 months. 
This results in a CPR of 0.8%. Second, from 
month 27 on, the SMM is 0.00514, which cor¬ 
responds to a CPR of 6%. 

Column 4. The aggregate monthly mortgage 
payments using a 6% note rate are shown in 
this column. Notice that the total monthly 
mortgage payment declines over time, as 
prepayments reduce the mortgage balance 
outstanding. (In the absence of prepay¬ 
ments, this figure would remain constant.) 
In essence, the payment is calculated each 
month as a function of the WAC, the remain¬ 
ing balance at the end of the prior month, and 
the remaining term (i.e., the original WAM 
minus the number of months since issuance). 
For example, the payment in month 10 of 
$2,376,474 can be generated on a calculator 
by inputting $391,508,422 as the balance or 
present value, 0.5% (6.0% divided by 12) as 
the rate, and 348 months as the remaining 
term. 4 

Column 5. The monthly interest paid to the 
pass-through investor is found in this col¬ 
umn. This value is determined by multiply¬ 
ing the outstanding mortgage balance at the 
beginning of the month by the pass-through 
rate of 5.5% and dividing by 12. 

Column 6. This column shows the scheduled 
principal repayment, or amortization. This 
is the difference between the total monthly 


mortgage payment [the amount shown in 
column (4)] and the gross coupon interest for 
the month. The gross coupon interest is 6.0% 
multiplied by the outstanding mortgage bal¬ 
ance at the beginning of the month, then di¬ 
vided by 12. 

Column 7. The dollar value of prepayments for 
the month is reported in this column. This 
amount is calculated by using the following 
equation: 

Prepayments* 

= SMM(Beginning principal balance* 

— Scheduled principal balance*) 

So, for example, in month 100, the be¬ 
ginning mortgage balance is $223,414,587, 
the scheduled principal payment is $423,356, 
and the SMM at 100 PSA is 0.00514301 (only 
0.00514 is shown in the table to save space), 
so the prepayment is 

0.00514301 x ($223,414,587 - $423,356) 

= $1,146,847 

Column 8. The total principal payment, which 
is the sum of columns (6) and (7), is shown 
in this column. 

Column 9. The projected monthly cash flow for 
this pass-through is shown in this last col¬ 
umn. The monthly cash flow is the sum of 
the interest paid to the pass-through investor 
[column (5)] and the total principal payments 
for the month [column (8)]. 
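The column definitions above can be sketched as a simple projection loop in Python. This is an illustration under stated assumptions rather than a definitive implementation: the function name and dictionary keys are hypothetical, the pool is treated as a single loan, and seasoning is inferred as 360 minus the WAM (that is, a 30-year original term).

```python
def project_pass_through(balance, wac, pass_through_rate, wam, psa_speed=100.0):
    """Return one dict per month with the nine columns of the cash flow table."""
    seasoning = 360 - wam            # assumption: 30-year original term
    monthly_rate = wac / 12.0
    rows = []
    for month in range(1, wam + 1):
        # Column 3: SMM from the PSA ramp at the pool's current age.
        age = seasoning + month
        cpr = 0.06 * min(age, 30) / 30.0 * psa_speed / 100.0
        smm = 1.0 - (1.0 - cpr) ** (1.0 / 12.0)
        # Column 4: level payment on the current balance over the remaining term.
        remaining_term = wam - month + 1
        payment = balance * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -remaining_term)
        # Column 5: interest passed through to the investor.
        net_interest = balance * pass_through_rate / 12.0
        # Column 6: payment less gross coupon interest at the WAC.
        scheduled_principal = payment - balance * monthly_rate
        # Column 7: SMM applied to the balance after scheduled principal.
        prepayment = smm * (balance - scheduled_principal)
        # Columns 8 and 9.
        total_principal = scheduled_principal + prepayment
        rows.append({
            "month": month, "balance": balance, "smm": smm, "payment": payment,
            "net_interest": net_interest, "scheduled_principal": scheduled_principal,
            "prepayment": prepayment, "total_principal": total_principal,
            "cash_flow": net_interest + total_principal,
        })
        balance -= total_principal
    return rows


rows = project_pass_through(400_000_000.0, wac=0.06, pass_through_rate=0.055, wam=358)
for row in (rows[0], rows[9], rows[99]):
    print(row["month"], round(row["balance"]), round(row["payment"]), round(row["cash_flow"]))
```

Under these assumptions the loop should track the figures in Table 1 to within rounding; any remaining differences would come from details of the original calculation that the text does not spell out.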

Prospectus Prepayment Curve 

A more recent addition to MBS prepayment terminology
is the prospectus prepayment curve
(PPC). While the logic underlying the PSA convention
(i.e., that loans prepay faster as they
age, all other factors constant) remains in force,
a PPC curve allowed its creator (typically the
underwriter of a private-label deal) to specify
the prepayment ramp that was used to structure
the deal. Evidence suggested that loans
seasoned faster than the 30-month period
implied by the PSA curve, especially for some
products (such as alt-A loans) that were believed
to season faster than normal. Rather than
use a percentage of a publicly utilized ramp,
many nonagency transactions between 2004 and
2007 used PPC curves (which are quoted in a
transaction's prospectus supplement).

Typically, 100% PPC is the base-case pre¬ 
payment assumption used to create a particu¬ 
lar deal. PPC curves (or ramps) are generally 
specified as a beginning and terminal CPR, 
along with the associated time period. A typ¬ 
ical ramp might be specified as "8-20% CPR 
over 12 months." This translates to an assump¬ 
tion of 8% CPR in the first month, increasing 
1.09% per month for the next 11 months, and 
terminating at 20% CPR in month 12. How¬ 
ever, there is no industry standardization for 
the usage of this terminology, as the specifica¬ 
tion is issue-dependent. As a result, investors 
must confirm how "100% PPC" is defined for 
each particular issue before performing further 
analysis. 
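A ramp quoted as "8-20% CPR over 12 months" can be sketched in Python as a linear interpolation between the beginning and terminal CPRs. The function name, its defaults, and the convention that other PPC percentages scale the whole ramp proportionally are assumptions for illustration; as noted above, the actual definition must be confirmed in each deal's prospectus supplement.

```python
def ppc_cpr(month, start_cpr=0.08, end_cpr=0.20, ramp_months=12, ppc_speed=100.0):
    """Annual CPR for a given month under a linear PPC ramp."""
    if month >= ramp_months:
        base_cpr = end_cpr
    else:
        # Equal monthly steps from the starting CPR in month 1
        # to the terminal CPR in the last month of the ramp.
        step = (end_cpr - start_cpr) / (ramp_months - 1)
        base_cpr = start_cpr + step * (month - 1)
    # Assumption: other PPC percentages scale the ramp proportionally.
    return base_cpr * ppc_speed / 100.0


print([f"{ppc_cpr(m):.2%}" for m in (1, 2, 6, 12, 24)])
```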

The language utilized in a deal's prospectus 
supplement is illuminating. For example, the 
document for the CWALT 2005-J9 deal has lan¬ 
guage as follows: 

Prepayments of mortgage loans commonly are mea¬ 
sured relative to a prepayment standard or model. 
The model used in this prospectus supplement as¬ 
sumes a constant prepayment rate (i.e., CPR) or 
an assumed rate of prepayment each month of the 
then-outstanding principal balance of a pool of new 
mortgage loans. A 100% prepayment assumption 
for loan group 1 (the "prepayment assumption")
assumes a CPR of 8.0% per annum of the then outstanding
principal balance of the applicable mortgage
loans in the first month of the life of the
mortgage loans and an additional approximately
1.0909090909% (precisely 12%/11) per annum in
the second through 11th months. Beginning in the 
12th month and in each month thereafter during 
the life of the mortgage loans, a 100% prepayment 
assumption assumes a CPR of 20.0% per annum 
each month. 

Note that the prospectus supplement does not
directly refer to a "PPC," but rather defines
the prepayment ramp as "a 100% prepayment
assumption."

Prepayment Conventions for 
Securities Backed by Home Equity 
and Manufactured Housing Loans 

While the expression of prepayments in the 
MBS market is fairly standardized and com¬ 
prises a combination of PSA curves and CPR 
calculations as previously described, a variety 
of descriptions are used to express the pay- 
down behavior of securities backed by home 
equity and manufactured housing loans. While 
issuance of securities backed by these loans fell 
out of favor in the mid-2000s, a brief discussion 
of these conventions will nonetheless be help¬ 
ful in understanding how prepayment conven¬ 
tions have been adjusted in order to represent 
an asset's unique behavior. Despite the diver¬ 
sity in terminology, most of the concepts used 
to indicate prepayments for these two sectors 
of the mortgage market use the CPR concept 
as the numeraire while incorporating the PSA 
ramping methodology. 

Home Equity Prepayment Speeds 

In the early stages of the development of the 
securitized market for home equity loans, the 
majority of the loans were fixed rate, closed-end 
loans. Over the years, the balance has slowly 
shifted in favor of adjustable rate loans, par¬ 
ticularly subprime ARMs. The earliest defini¬ 
tion of prepayment speeds in the home equity 
market was the home equity prepayment (HEP) 
curve. 5 The primary motivation for using a dif¬ 
ferent prepayment methodology for home eq¬ 
uity loans was to capture the faster seasoning 
ramp observed for the asset class. Typically, 
home equity loans season faster than traditional 
single-family loans, making the PSA ramp an 
inappropriate description of the behavior of 
prepayments. 

The HEP curve reflects the observed behavior
in historic HEL data: it has a ramp of
10 months and a variable long-term CPR to
reflect individual issuer speeds. A faster long-term
speed means faster CPRs on the ramp
because the ramp is fixed at 10 months regardless
of the long-term speed. For example, a
20% HEP projection would mean a 10-month
ramp increasing to 20% in the 10th month
from 2% in the first month and a constant 20%
thereafter. Figure 2 shows HEP curves
at 20% HEP and at 24% HEP; for the latter, month 1
speeds of 2.4% CPR increase over 10 months to
24% CPR.

Figure 2 HEP Curves
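Because the HEP ramp length is fixed and only the long-term speed varies, the curve reduces to a one-line calculation. The sketch below is illustrative; hep_cpr is a hypothetical name.

```python
def hep_cpr(month, hep=0.20):
    """Annual CPR under a HEP curve with the given long-term speed."""
    # Fixed 10-month ramp: one-tenth of the long-term speed in month 1,
    # rising in equal steps to the full speed in month 10.
    return hep * min(month, 10) / 10.0


for hep in (0.20, 0.24):
    print(hep, [f"{hep_cpr(m, hep):.1%}" for m in (1, 5, 10, 36)])
```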

In addition to utilizing the HEP curve, a PPC 
ramp is also commonly used to define the base- 
case prepayment assumption for the product. 
As with other mortgage products, the spec¬ 
ification of the ramp will be dependent on 
the attributes of the underlying loan collateral, 
with respect to both the beginning and termi¬ 
nal speeds as well as the duration of the ramp. 
Occasionally, deals are also priced to a constant 
CPR assumption, ignoring the impact of sea¬ 
soning in generating the deal's cash flows. 

Manufactured Housing 
Prepayment Curve 

The manufactured housing prepayment (MHP)
curve is a measure of prepayment behavior for
manufactured housing, based on the Green Tree
Financial manufactured housing prepayment
experience. MHP is similar to the PSA curve, except
that the seasoning ramp is slightly different
to account for the specific behavior of manufactured
housing loans: 100% MHP is equivalent to 3.6%
CPR at month zero and increases 0.1% CPR every
month until month 24, when it plateaus at
6% CPR. Figure 3 shows the prepayment speeds
at 50% MHP, 100% MHP, and 200% MHP.
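The MHP ramp can be sketched in the same way as the PSA ramp. The function name is hypothetical, and the proportional scaling for speeds other than 100% MHP is an assumption carried over from the PSA convention.

```python
def mhp_cpr(month, mhp_speed=100.0):
    """Annual CPR under the MHP curve at the given percentage of MHP."""
    # 100% MHP: 3.6% CPR at month zero, plus 0.1% CPR per month,
    # reaching a 6% plateau at month 24.
    base_cpr = 0.036 + 0.001 * min(month, 24)
    return base_cpr * mhp_speed / 100.0


for speed in (50, 100, 200):
    print(speed, [f"{mhp_cpr(m, speed):.2%}" for m in (0, 12, 24, 60)])
```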

DELINQUENCY, DEFAULT, 
AND LOSS TERMINOLOGY 

The measurement of potential and actual cash 
flow impairment resulting from borrower credit 
problems is critically important to the analysis 
of private label or nonagency MBS. Historically, 
the importance of these measures stemmed 
from their role in allowing investors in subordi¬ 
nate MBS tranches to assess relative value and 
risk. However, the mortgage crisis that began 
in 2007 demonstrated to investors that all nona¬ 
gency securities have exposure to defaults and 
losses; put differently, it is impossible to invest 
in nonagency MBS without taking on a material 
degree of credit risk. This means that any diver¬ 
gence in realized default and loss experience 
from investors' initial expectations can result in 
writedowns and losses on the investment.

Figure 3 MHP Curves (horizontal axis: age in months)


Despite the importance of delinquencies, 
losses, and defaults in the mortgage-related 
markets, the terminology is not standardized. 
For instance, static pool losses may be reported 
on a monthly or annualized basis as a percent¬ 
age of either current or original balance, with 
the metric based upon current balance being the 
preferred method to ensure consistency with 
prepayment reporting. 

Before we discuss the measurement of de¬ 
faults and losses, it is instructive to briefly re¬ 
view the various outcomes of a loan when the 
obligor ceases making scheduled payments. A 
loan becomes delinquent when the obligor fails 
to make the contractual payment on the stated 
date. If the underlying property has appreci¬ 
ated from the initial purchase price, the home- 
owner can often sell the home and use the 
proceeds to settle the mortgage debt. (This gen¬ 
erally is categorized as a voluntary prepayment 
and is considered part of housing turnover.) If 
the homeowner cannot sell the property at a 
high enough price and remains delinquent, the 
loan is declared to be in default once all collec¬ 
tion (and modification) efforts have failed. At 
that point, the issuer (or the servicer) has several
options. There may be a short sale, in which
the borrower sells the property in a negotiated
transaction subject to approval by the servicer;
alternatively, the property may go into the foreclosure
or repossession process and eventually
be sold by the servicer. Therefore, the process
chain is delinquency to default to foreclosure
(or repossession) to liquidation, at which time
the severity of loss can be assessed.

Delinquency Measures 

As mentioned, when a borrower fails to make 
one or more timely payments, the loan is said 
to be delinquent. Delinquency measures are de¬ 
signed to gauge whether borrowers are current 
on their loan payment as well as to stratify un¬ 
paid loans according to the seriousness of the 
delinquency. The calculation method used is de¬ 
termined by the servicer. When the underlying 
pool of assets is comprised of mortgage loans, 
the two commonly used methods for classifying 
delinquencies are those recommended by the 
now-defunct Office of Thrift Supervision (OTS) 
and the Mortgage Bankers Association (MBA). 

The OTS method uses the following loan 
delinquency classifications: 

• Payment due date to 30 days late: Current
• 30-60 days late: 30 days delinquent
• 60-90 days late: 60 days delinquent
• More than 90 days late: 90+ days delinquent

The MBA method is a somewhat more stringent
classification method, classifying a loan as
30 days delinquent once payments are not received
after the due date. Thus, a past-due loan classified
as "current" under the OTS method would
be listed as "30 days delinquent" under the
MBA method. The two methods can report significantly
different delinquencies. 6
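The difference between the two conventions can be illustrated with a small Python sketch. Several details here are assumptions made only for illustration: the function names, the treatment of the bucket boundaries (30, 60, and 90 days) as the start of the next bucket, and the idea that every MBA bucket is simply one step ahead of the corresponding OTS bucket. The text above states only the OTS buckets and the MBA treatment of a newly past-due loan.

```python
def ots_status(days_past_due):
    """Delinquency bucket under the OTS convention described above."""
    if days_past_due < 30:
        return "Current"
    if days_past_due < 60:
        return "30 days delinquent"
    if days_past_due < 90:
        return "60 days delinquent"
    return "90+ days delinquent"


def mba_status(days_past_due):
    """Delinquency bucket under the MBA convention (later buckets assumed)."""
    if days_past_due <= 0:
        return "Current"
    if days_past_due < 30:
        return "30 days delinquent"
    if days_past_due < 60:
        return "60 days delinquent"   # assumption: one bucket ahead of OTS
    return "90+ days delinquent"      # assumption: one bucket ahead of OTS


for days in (0, 10, 45, 75, 120):
    print(days, ots_status(days), "|", mba_status(days))
```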

Default Measures 

The conditions that result in classification of 
some loans as delinquent (such as the loss of 
a job or illness) may change, resulting in the 
resumption of timely principal and interest pay¬ 
ments. However, some portion of the loans clas¬ 
sified as delinquent typically end up in default. 
By definition, default is the point where the 
borrower loses title to the property in question. 

Two broadly used measures for quantifying
default are the cumulative default rate and the
conditional default rate. The cumulative default
rate (denoted as the CDX) is the total face value
of loans in a pool that have gone into default,
expressed as a percentage of the total
face value of the security.

The conditional default rate (CDR) is the annualized
value of the unpaid principal balance
of newly defaulted loans over the course of a
month as a percentage of the unpaid balance of
the pool at the beginning of the month, net of
that month's scheduled principal payment. It is computed by
first calculating the monthly default rate (MDR)
as shown below:

$$\text{MDR}_t = \frac{\text{Default loan balance in month } t}{\text{Beginning balance for month } t - \text{Scheduled principal payment in month } t}$$

This is then annualized as follows to get the
CDR:

$$\text{CDR}_t = 1 - (1 - \text{MDR}_t)^{12}$$

Note that the conversion of MDR to CDR is
identical to the formula for converting SMMs
to CPRs. As described earlier, the default rate
represents involuntary prepayments, and the
CDR represents the involuntary prepayment
speed calculated for nonagency MBS. Voluntary
prepayment speeds (i.e., those resulting from refinancing
activity and housing turnover) must
be calculated separately.

Let's use the following as an example. As¬ 
sume that a nonagency pool 7 with an 8% note 
rate and 300 months left to maturity has a bal¬ 
ance at time t of $10,000,000. The pool's sched¬ 
uled monthly payment is $77,181.62, comprised 
of $66,666.67 in interest and $10,514.96 in sched¬ 
uled principal. Assume that the pool receives 
$20,000 of voluntary prepayments and $15,000 
in involuntary prepayments. 8 

The monthly voluntary prepayment speed is
calculated as follows:

$$\text{Voluntary SMM} = \frac{\$20{,}000}{\$10{,}000{,}000 - \$10{,}514.96} = 0.002$$

This can then be converted to 2.37% CPR.
The MDR is calculated similarly:

$$\text{MDR} = \frac{\$15{,}000}{\$10{,}000{,}000 - \$10{,}514.96} = 0.0015$$

which can be converted to 1.79% CDR.

In some cases, the involuntary and voluntary
prepayment speeds are combined to calculate a
single prepayment speed. In this case, the calculation
of a "total CPR" is as follows:

$$\text{Total SMM} = \frac{\$35{,}000}{\$10{,}000{,}000 - \$10{,}514.96} = 0.0035$$

which can be converted to a total CPR of 4.12%.

There are a number of issues implied by these
calculations. First, note that the sum of the voluntary SMM
and the MDR equals the pool's total SMM. (It is
not true, however, that CPRs and CDRs sum to
equal the total pool CPR; it is only the monthly
rates that are additive.) In using the output of a
model, it is also important to ascertain what the
vendor means when they quote a "CPR." Since
many systems will show CPRs as the annualized
rate of all prepayments (i.e., total CPRs)
and show CDRs separately, the voluntary
prepayment speed must be calculated independently.
This can be accomplished by deannualizing
the CPRs and CDRs (i.e., converting them
to SMMs and MDRs), subtracting the MDR
from the SMM, and annualizing the difference.
In the above example, the voluntary SMM is
0.0035 less 0.0015 or 0.002, which annualizes to
a 2.37% voluntary CPR.

Figure 4 Monthly Dollar Amounts of Defaults on a $100 Million Pool Using 8% CDR at Different
Voluntary Prepayment Speeds (horizontal axis: age in months)
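The additivity of monthly rates, and the recovery of a voluntary CPR from a vendor's total CPR and CDR, can be sketched in Python using the dollar amounts from the example. The helper names annualize and deannualize are hypothetical, and the code works with unrounded monthly rates, so its results may differ in the last digit from figures computed with the rounded rates quoted in the text.

```python
def annualize(monthly_rate):
    """Convert an SMM or MDR into a CPR or CDR."""
    return 1.0 - (1.0 - monthly_rate) ** 12


def deannualize(annual_rate):
    """Convert a CPR or CDR into an SMM or MDR."""
    return 1.0 - (1.0 - annual_rate) ** (1.0 / 12.0)


balance = 10_000_000.00
scheduled_principal = 10_514.96
voluntary, involuntary = 20_000.00, 15_000.00

scheduled_balance = balance - scheduled_principal
voluntary_smm = voluntary / scheduled_balance
mdr = involuntary / scheduled_balance
total_smm = (voluntary + involuntary) / scheduled_balance

voluntary_cpr, cdr, total_cpr = annualize(voluntary_smm), annualize(mdr), annualize(total_smm)

# Monthly rates add up exactly; the annualized rates do not.
print(f"SMM + MDR = {voluntary_smm + mdr:.6f}, total SMM = {total_smm:.6f}")
print(f"CPR + CDR = {voluntary_cpr + cdr:.4%}, total CPR = {total_cpr:.4%}")

# Backing the voluntary CPR out of a quoted total CPR and CDR.
recovered_voluntary_cpr = annualize(deannualize(total_cpr) - deannualize(cdr))
print(f"recovered voluntary CPR = {recovered_voluntary_cpr:.4%}")
```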

Also note that the CDR metric measures only
the amount of defaults and not the amount of
losses because actual losses depend upon the
amounts that can be recovered on loans in default,
adjusted for the costs of collection and
servicer advances, if applicable. In the extreme
case, if there is full recovery of the unpaid prin¬ 
cipal balance of the defaulted loans, the losses 
will be zero except for the costs of recovery. 
However, depending upon the timing of the 
recovery of the defaulted loan balances, the 
cash flows to certain bondholders may be in¬ 
terrupted. 

There is also an interesting and important relationship
between the voluntary prepayment
speed and the dollar amount of defaults in a
pool. Every dollar of principal that is prepaid
voluntarily is returned at 100 cents on the dollar
and cannot subsequently go into default. Therefore,
the dollar amount of a pool's principal that
goes into default declines as voluntary prepayment
speeds increase, even if the assumed CDR
remains constant. This is illustrated in Figure 4.
The figure shows the projected dollar amounts
of defaults on a $100 million pool with an 8.5%
note rate at 8% CDR for two different voluntary
CPRs. At a combination of 15% CPR and 8%
CDR, a total of $21.9 million in face value is
projected to default; the projected amount of
defaulted principal using 8% CPR and 8% CDR
increases to $29.0 million.
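The direction of this effect can be checked with a simplified Python projection. This sketch is not intended to reproduce the dollar amounts quoted above, since the assumptions behind Figure 4 are not stated; here the pool is treated as a single new 360-month loan, and both the MDR and the voluntary SMM are applied to the balance after scheduled principal. The function name is hypothetical.

```python
def projected_defaults(balance, note_rate, term, voluntary_cpr, cdr):
    """Total principal entering default over the life of the pool."""
    smm = 1.0 - (1.0 - voluntary_cpr) ** (1.0 / 12.0)
    mdr = 1.0 - (1.0 - cdr) ** (1.0 / 12.0)
    monthly_rate = note_rate / 12.0
    total_defaults = 0.0
    for month in range(1, term + 1):
        remaining_term = term - month + 1
        payment = balance * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -remaining_term)
        scheduled_balance = balance - (payment - balance * monthly_rate)
        # Defaults and voluntary prepayments both come out of the scheduled balance.
        defaults = mdr * scheduled_balance
        prepayments = smm * scheduled_balance
        total_defaults += defaults
        balance = scheduled_balance - defaults - prepayments
    return total_defaults


fast = projected_defaults(100_000_000.0, 0.085, 360, voluntary_cpr=0.15, cdr=0.08)
slow = projected_defaults(100_000_000.0, 0.085, 360, voluntary_cpr=0.08, cdr=0.08)
print(f"15% CPR: {fast:,.0f}   8% CPR: {slow:,.0f}")
```

Holding the CDR constant, the faster voluntary speed leaves less principal outstanding in every later month, so fewer dollars are exposed to default.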

As with prepayment analysis, there are dis¬ 
advantages to using constant CDRs that tend 
to distort credit analysis. A constant CDR as¬ 
sumption is not necessarily consistent with the 
actual behavior of defaults, and also does not al¬ 
low the analysis to take variations in the timing 
of defaults into account. As with prepayments, 
credit problems have historically tended to be 
very low immediately after the loans are closed, 
but generally increase with time as the pool in 
question ages. 

Figure 5 100% SDA Without Effects of Prepayments (horizontal axis: age in months)

One time-honored methodology is to utilize the Standard Default Assumption (SDA) convention, which assumes that defaults (as measured in annual terms using CDRs) have a fairly consistent pattern over the life of the pool. The SDA model is similar in concept to the PSA convention used in prepayment analysis, and is specified as follows:

• A 0.02% CDR in month 1, rising by 0.02% CDR per month until reaching 0.6% CDR in month 30.

• A constant 0.6% CDR from month 30 to month 60.

• A linear decline of 0.0095% CDR per month from month 61 to month 120, reaching 0.03% CDR in month 120.

• A constant 0.03% CDR for the remaining term.
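
The four segments of the curve can be written as a small function of loan age. This is a minimal sketch of the specification listed above; the function name and the `multiple` parameter (1.0 for 100% SDA) are illustrative.

```python
def sda_cdr(age_months: int, multiple: float = 1.0) -> float:
    """Annual default rate (CDR, as a decimal) under the SDA curve."""
    if age_months < 1:
        raise ValueError("loan age must be at least 1 month")
    if age_months <= 30:
        base = 0.0002 * age_months                    # ramp: +0.02% per month
    elif age_months <= 60:
        base = 0.006                                  # plateau at 0.6%
    elif age_months <= 120:
        base = 0.006 - 0.000095 * (age_months - 60)   # decline: 0.0095% per month
    else:
        base = 0.0003                                 # tail at 0.03%
    return multiple * base


for age in (1, 30, 60, 90, 120, 200):
    print(age, f"{sda_cdr(age):.4%}")
```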

The base SDA curve is shown in Figure 5. 

In addition to the prescribed CDR curve described above, the base SDA model explicitly accounts for the effects of voluntary prepayments by assuming a prepayment speed of 150% PSA. One hundred percent SDA at 150% PSA results in cumulative defaults of around 2.73%. The dollar amount of monthly defaults is calculated as the product of the monthly default rate or MDR (i.e., the deannualized CDR) and the monthly balance factor at the projected prepayment speed. Cumulative defaults are the sum of this vector. Table 2 shows how 100% SDA would be calculated, assuming a pass-through with a 5.5% pass-through rate and a 6.0% WAC (as in the prior examples). A depiction of monthly defaults using the base assumptions of the SDA model at 150% PSA is shown in Figure 6.
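
The calculation can be sketched as follows for a pool with a 6.0% WAC and a 357-month WAM. Several points are assumptions made for this illustration rather than statements from the text: the loans are taken to have a 360-month original term (so that month 1 corresponds to a loan age of 4 months), the balance factor reflects only scheduled amortization and prepayments at 150% PSA, and the end-of-month factor is used in the product.

```python
def sda_cdr(age):
    """100% SDA curve: annual default rate as a decimal."""
    if age <= 30:
        return 0.0002 * age
    if age <= 60:
        return 0.006
    if age <= 120:
        return 0.006 - 0.000095 * (age - 60)
    return 0.0003


def psa_cpr(age, multiple):
    """PSA benchmark: 0.2% CPR per month of age, capped at 6%, times a multiple."""
    return multiple * min(0.002 * age, 0.06)


def sda_table(wac=0.06, wam=357, psa=1.5, original_term=360):
    """Rows of (month, CDR, MDR, bond factor, factor-adjusted MDR)."""
    r = wac / 12
    seasoning = original_term - wam  # assumed loan age at the start
    factor = 1.0
    rows = []
    for month in range(1, wam + 1):
        age = seasoning + month
        cdr = sda_cdr(age)
        mdr = 1 - (1 - cdr) ** (1 / 12)
        smm = 1 - (1 - psa_cpr(age, psa)) ** (1 / 12)
        remaining = wam - month + 1
        payment = factor * r / (1 - (1 + r) ** -remaining)
        scheduled = payment - factor * r
        prepaid = smm * (factor - scheduled)
        factor -= scheduled + prepaid
        rows.append((month, cdr, mdr, factor, mdr * factor))
    return rows


rows = sda_table()
cumulative = sum(row[4] for row in rows)
month, cdr, mdr, factor, adjusted = rows[0]
print(f"month {month}: CDR {cdr:.3%}, MDR {mdr:.3%}, "
      f"factor {factor:.5f}, adjusted MDR {adjusted:.4%}")
print(f"cumulative defaults: {cumulative:.2%}")
```

Under these assumptions, the first rows should be close to those shown in Table 2; small differences in later rows or in the cumulative figure would reflect rounding or timing conventions not stated in the text.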

Loss Severity Measures 

Where the lender has a lien on the property, a 
portion of the value of the loan can be recovered 
through the legal recovery process (i.e., through 
foreclosure and repossession) and subsequent 
sale of the asset. The difference between the pro¬ 
ceeds received from the recovery process (after 
all transaction costs) and principal balance of 
the loan is the loss in dollars. The historical 
loss severity rate in any month is defined as 
follows: 

$$\text{Loss severity rate} = 1 - \frac{\text{Liquidation proceeds}}{\text{Liquidation balance}}$$

The loss severity rate ranges from 0 to 1 (or 0% to 100%). If the loss severity rate is zero, then liquidation proceeds are equal to the liquidated loan balance. A loss severity rate of 1 (or 100%) means that there are no liquidation proceeds. The loss rate is equal to the annual default rate multiplied by the loss severity assumption. In projecting future cash flows and losses, investors will often use a constant loss severity assumption based on a combination of loan attributes, projected changes in home prices, and the length of time until liquidation. The percentage of loss severity is then applied to the monthly default amount (generated by using the applicable MDR) in order to calculate monthly losses.

Table 2 Calculation of Monthly Defaults Using 100% SDA at 150% PSA for a Pass-Through with a 5.5% Pass-Through Rate, a WAC of 6.0%, and a WAM of 357 Months

(1)      (2)          (3)            (4)             (5)
         100% SDA     100% SDA       Bond Factor     Factor-Adjusted
Month    (in CDRs)    (in MDRs)(a)   (@ 150% PSA)    MDR(b)
1        0.080%       0.007%         0.99798         0.0067%
2        0.100%       0.008%         0.99571         0.0083%
3        0.120%       0.010%         0.99318         0.0099%
4        0.140%       0.012%         0.99041         0.0116%
5        0.160%       0.013%         0.98738         0.0132%
6        0.180%       0.015%         0.98410         0.0148%
7        0.200%       0.017%         0.98057         0.0164%
8        0.220%       0.018%         0.97680         0.0179%
9        0.240%       0.020%         0.97278         0.0195%
10       0.260%       0.022%         0.96853         0.0210%
11       0.280%       0.023%         0.96403         0.0225%
12       0.300%       0.025%         0.95930         0.0240%
13       0.320%       0.027%         0.95433         0.0255%
14       0.340%       0.028%         0.94914         0.0269%
15       0.360%       0.030%         0.94372         0.0284%
16       0.380%       0.032%         0.93807         0.0298%
17       0.400%       0.033%         0.93220         0.0311%
18       0.420%       0.035%         0.92612         0.0325%
19       0.440%       0.037%         0.91982         0.0338%
20       0.460%       0.038%         0.91332         0.0351%
21       0.480%       0.040%         0.90661         0.0363%
22       0.500%       0.042%         0.89970         0.0376%
23       0.520%       0.043%         0.89260         0.0388%
24       0.540%       0.045%         0.88531         0.0399%
25       0.560%       0.047%         0.87783         0.0411%
26       0.580%       0.048%         0.87017         0.0422%
27       0.600%       0.050%         0.86233         0.0432%
28       0.600%       0.050%         0.85456         0.0428%
29       0.600%       0.050%         0.84685         0.0425%
30       0.600%       0.050%         0.83920         0.0421%
...
100      0.192%       0.016%         0.43487         0.0069%
101      0.182%       0.015%         0.43064         0.0065%
102      0.173%       0.014%         0.42644         0.0061%
103      0.163%       0.014%         0.42228         0.0057%
104      0.154%       0.013%         0.41815         0.0054%
105      0.144%       0.012%         0.41406         0.0050%
...
200      0.030%       0.003%         0.14894         0.0004%
201      0.030%       0.003%         0.14715         0.0004%
202      0.030%       0.003%         0.14538         0.0004%
203      0.030%       0.003%         0.14363         0.0004%
204      0.030%       0.003%         0.14188         0.0004%
205      0.030%       0.003%         0.14016         0.0004%
...
300      0.030%       0.003%         0.03093         0.0001%
301      0.030%       0.003%         0.03022         0.0001%
302      0.030%       0.003%         0.02952         0.0001%
303      0.030%       0.003%         0.02882         0.0001%
304      0.030%       0.003%         0.02814         0.0001%
305      0.030%       0.003%         0.02745         0.0001%
...
350      0.030%       0.003%         0.00289         0.0000%
351      0.030%       0.003%         0.00247         0.0000%
352      0.030%       0.003%         0.00204         0.0000%
353      0.030%       0.003%         0.00163         0.0000%
354      0.030%       0.003%         0.00121         0.0000%
355      0.030%       0.003%         0.00080         0.0000%
356      0.030%       0.003%         0.00040         0.0000%
357      0.030%       0.003%         0.00000         0.0000%

Cumulative Defaults                                  2.75%

(a) CDRs are converted to MDRs by using the following formula: $\text{MDR} = 1 - (1 - \text{CDR})^{1/12}$
(b) Column (3) × Column (4)

Figure 6 Monthly CDRs for 100% SDA Using 150% PSA (horizontal axis: age in months)
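
The loss severity definition and the monthly loss calculation described in this section can be illustrated with a short sketch. The function names and the sample inputs (a $100 million balance, an 8% CDR, and a 40% severity) are hypothetical.

```python
def loss_severity(liquidation_proceeds: float, liquidation_balance: float) -> float:
    """Loss severity rate = 1 - liquidation proceeds / liquidation balance."""
    return 1.0 - liquidation_proceeds / liquidation_balance


def monthly_loss(balance: float, cdr: float, severity: float) -> float:
    """Apply a constant severity to the month's defaults (balance x MDR)."""
    mdr = 1.0 - (1.0 - cdr) ** (1.0 / 12.0)
    return balance * mdr * severity


# A loan liquidated for $60,000 against a $100,000 balance.
print(f"severity: {loss_severity(60_000, 100_000):.0%}")

# One month of losses on a $100 million balance at 8% CDR and 40% severity.
print(f"monthly loss: ${monthly_loss(100e6, 0.08, 0.40):,.0f}")
```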

Default and loss severity assumptions (which 
translate into expected losses) are critical met¬ 
rics for holders of mortgages and MBS that have 
exposure to mortgage credit performance. From 
the viewpoint of issuers, the assumptions used 
to value and capitalize investments in retained 
tranches are critical for assessing a firm's value, 
as any deterioration in the performance of re¬ 
tained tranches can negatively impact overall 
corporate valuations. Investors in whole-loan 
mortgages and subordinate MBS routinely use 
the credit metrics discussed above to analyze 
the relative value of different alternatives by 
generating default- and loss-adjusted returns 
and valuations. 

KEY POINTS 

• The monthly cash flow from the underlying pool of mortgage loans for a residential mortgage-backed security includes scheduled principal payments, interest payments, and any principal payments made by borrowers that are in excess of the scheduled principal payment. The last component is referred to as prepayments.

• The valuation of residential mortgage-backed securities requires the generation of a residential MBS's cash flow. Prepayment speeds and default rates must be projected in order to do so.

• The performance of a residential MBS depends on the prepayments and performance of the loan pool.

• The measurement of potential and actual cash flow impairment resulting from borrower credit problems is critically important to the analysis of nonagency or private label MBS.

• Complicating the evaluation of prepayments is the interplay between defaults, which are effectively credit-related prepayments, and prepayments attributable specifically to declining interest rates.

• The approach most commonly used to measure prepayment speeds is the conditional prepayment rate (CPR), which calculates monthly prepaid principal (i.e., principal received in excess of scheduled amortization) as a percentage of the security's outstanding balance and then annualizes that percentage. The CPR is an annual rate; the corresponding monthly rate is the single monthly mortality rate (SMM).

• The Public Securities Association (PSA) prepayment benchmark is expressed as a monthly series of annual prepayment rates that assumes prepayment rates are low for newly originated mortgages and then speed up as the mortgages age.

• A loan is classified as delinquent when a borrower fails to make one or more timely payments. Measures of delinquency are designed to gauge whether borrowers are current on their loan payments and to stratify unpaid loans according to the seriousness of the delinquency. The calculation method used is determined by the servicer. The two commonly used methods for classifying delinquencies are those recommended by the now-defunct Office of Thrift Supervision (OTS) and the Mortgage Bankers Association (MBA).

• Cumulative default rate and conditional default rate are the two broadly used metrics for quantifying defaults for a mortgage pool. The cumulative default rate is the total face value of loans in a pool that have gone into default, expressed as a percentage of the total face value of the collateral pool. The conditional default rate is the annualized value of the unpaid principal balance of newly defaulted loans over the course of a month as a percentage of the unpaid balance of the pool (before scheduled principal payment) at the beginning of the month. To compute this measure, the monthly default rate must first be calculated.

NOTES 

1. For a detailed discussion of the types of mortgage loans and residential MBS, see Fabozzi, Bhattacharya, and Berliner (2011).

2. Also called the constant prepayment rate.

3. This benchmark is commonly referred to as a "prepayment model," suggesting that it can be used to estimate prepayments. Characterization of this benchmark as a prepayment model is inaccurate. It is simply a market convention. While the PSA has changed its name to the Securities Industry and Financial Markets Association, or SIFMA, the benchmark is still referred to as the "PSA prepayment benchmark."

4. The calculation can also be presented as a series of formulas, which are available in Chapter 21 of Fabozzi (2006).

5. The HEP curve was developed by Prudential Securities based on the prepayment experience of $10 billion of home equity loan deals.

6. For example, a June 9, 2000, report by Moody's titled "Contradictions in Terms: Variations in Terminology in the Mortgage Market" shows that the reported delinquencies can differ dramatically when the different conventions are used.

7. For clarity's sake, we assume a simple pool with no credit enhancement.

8. These payments are reported in the monthly remittance reports compiled by a transaction's trustee.

Abstract: Prepayments and their impact on principal cash flows are critical components of the
valuation, trading, and risk management of residential mortgage-backed securities. Because of 
this, substantial resources are expended by investors and dealers in understanding and modeling 
prepayment "speeds." However, prepayment behavior is not static and has evolved repeatedly 
since the first prepayment waves in the early 1990s. Moreover, the very definition of "prepayments" 
has evolved from one focused primarily on borrowers' refinancing options to one encompassing a 
plethora of actions and decisions. 


In general, a mortgage is a loan that is secured by underlying assets that can be repossessed in the event of default. In the residential housing market, a mortgage is defined as a loan made to the owner of a one- to four-family residential dwelling and secured by the underlying property (i.e., the land, the structure, and any improvements). The fundamental unit in the residential mortgage-backed securities (MBS) market is the pool. At their most basic, mortgage-backed pools are aggregations of large numbers of mortgage loans with similar (but not identical) characteristics. Loans with a commonality of attributes such as note rate (i.e., the interest rate paid by the borrower on the loan), term to maturity, credit quality, loan balance, and product type are combined using a variety of legal mechanisms to create relatively fungible investment vehicles.

To value a residential MBS, a financial modeler must project the cash flow. For an individual mortgage, the monthly cash flow includes the scheduled principal payments (also referred to as amortization), interest payments, and any prepayments. Prepayments are any payments made by borrowers that are in excess of the scheduled principal payment. Consequently, the cash flow depends on the prepayment behavior of the borrowers in the mortgage pool. This risk faced by investors is referred to as prepayment risk and is similar to the risk faced by investors in callable corporate bonds.

Both the valuation and the subsequent per¬ 
formance of a residential MBS depend on 
prepayments—projected in the former case and 
realized in the latter case. In this entry, we dis¬ 
cuss the underlying factors impacting principal 
repayment rates. We also draw distinctions be¬ 
tween the traditional view of prepayments and 
a broader one that puts credit-related factors 
into context. 


PREPAYMENT FUNDAMENTALS

Traditional prepayment analysis has focused on 
borrowers' option to retire their loans prior to 
maturity. Virtually all mortgage loans allow for 
the early repayment of principal. Prepayment 
behavior can be divided into several categories. 
The first of these is referred to as turnover, which 
occurs when the underlying property is sold 
and the associated loan is retired. Turnover can 
occur for a number of reasons: 

• The homeowner moves or trades up to a larger house.

• The obligor relocates as part of changes in their job or employment.

• The property is sold subsequent to the death of the homeowner or as part of a divorce settlement.

• The property is destroyed by a fire or other natural disaster.

In all these cases, the resulting proceeds (from either the property's sale or an insurance settlement) are passed on as prepaid principal to the holder of the mortgage. In the event of the sale of the property, the loan is paid off from the proceeds of the sale; in fact, most loans contain a "due-on-sale" clause ensuring that the loan is retired once the property is sold. Properties are also sold in the event that the obligors encounter financial difficulties. While we discuss credit-related factors at several points in this entry, it is important to note that prepayments resulting from credit events are sometimes taken into account under the broad umbrella of "turnover."

A second form of prepayment can be broadly 
ascribed to refinancing. This behavior can take 
a number of forms. A rate-and-term refinancing 
is undertaken solely to reduce the borrower's 
monthly payment, most commonly due to a de¬ 
cline in the level of consumer mortgage rates. 
Such a change puts the market rate for new 
mortgages below the rate of existing loans, cre¬ 
ating incentives to refinance. A related activity 
takes place when borrowers refinance in order 
to liquefy their home's equity by increasing the 
balance on their new loan. Such transactions, re¬ 
ferred to as cash-out refinancings, often are taken 
as an alternative to second lien loans. Cash-out 
activity is strongly correlated with rates of home 
price appreciation which, logically enough, cre¬ 
ates the borrower equity extracted through the 
transaction. Such activity can also be relatively 
insensitive to traditional refinancing incentives, 
and has at times boosted prepayment speeds for 
lower-coupon MBS. 

At various points in time, borrowers have also 
been inclined to refinance from one product into 
a different one that offers a payment savings. 
A simple form of product transition is to refi¬ 
nance from a fixed-rate loan into an adjustable- 
rate mortgage (ARM) that offers a lower rate. 
Borrowers have also transitioned into prod¬ 
ucts with alternative amortization schemes, 
such as interest-only and negative amortization 
loans, in order to reduce their monthly pay¬ 
ment burdens. Such transitions are contingent 
on the availability and popularity of alterna¬ 
tive products, as well as borrowers' ability (ei¬ 
ther through lower rates or other nontraditional 
means) to achieve payment reductions. 
Figure 1 Prepayment S-Curves for Different Years for 30-year Fixed Rate Conventional Loans (horizontal axis: incentive, i.e., note rate less the Freddie Mac PMMS rate)
Data Source: eMBS.


Another critical factor in prepayments is
the borrower's financial situation.
However, the impact of borrower credit on pre¬ 
payments is quite complex. Prepayments often 
result directly from changes to homeowners' fi¬ 
nancial situation. At its simplest, principal is 
returned to investors when borrowers default 
on their loans, although the amount and tim¬ 
ing of principal cash flows is subject to many 
variables. However, credit-related factors also 
exert more subtle effects on prepayment behav¬ 
ior. For example, borrowers with weak credit, or 
who don't have significant equity in their home, 
may not be able to take advantage of declining 
interest rates by obtaining new loans. 

Taken together, these factors and activities result in prepayment speeds that vary across the MBS market. The most common way to assess prepayment speeds within a product group is to view speeds, as measured by conditional prepayment rates (CPRs), at various levels of refinancing incentive. Prepayment S-curves show prepayment speeds for different levels of mortgage rates and/or refinancing incentives. S-curves can be created using a number of different methodologies and data sources. Either projected or historical prepayment speeds can be shown; additionally, the level of prepayments can be compared by showing either the absolute level of rates or the relative degree of refinancing incentive.

An example of S-curves for different periods of time is shown in Figure 1. The figure shows historical prepayment speeds for 30-year conventional fixed-rate pools plotted by refinancing incentive (defined as the cohort's weighted-average coupon, or WAC, less the Freddie Mac 30-year fixed survey rate for that period). The different shapes of the S-curves are indicative of different consumer behaviors. For example, the curve for 2003 was quite steep, indicating that borrowers were extremely sensitive to refinancing opportunities; borrowers that had an incentive to refinance (or, to borrow a term from the option market, were "in-the-money") did so in large numbers. At the same time, prepayments on "out-of-the-money" pools (i.e., those with lower WACs and no apparent refinancing incentive) were relatively slow, reflecting slow housing turnover and limited cash-out activity. By contrast, the S-curves for 2004 and 2005 were increasingly flat. This reflected faster housing turnover, brisk levels of cash-out activity, and growing product transition activity for loans with minimal or negative incentives, while in-the-money borrowers were less responsive to apparent refinancing opportunities.
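
One simple way to build a historical S-curve of this kind is to group pools by refinancing incentive and average their observed CPRs within each group. The sketch below is only one possible approach: the bucket width, the balance-weighted averaging of CPRs (an approximation), and the sample data are all hypothetical and are not drawn from Figure 1.

```python
from collections import defaultdict


def s_curve(pools, survey_rate, bucket_width=0.25):
    """Balance-weighted CPR by incentive bucket.

    pools: iterable of (wac_pct, balance, cpr_pct) tuples.
    survey_rate: prevailing mortgage survey rate, in percent.
    Incentive is defined as WAC less the survey rate, in percentage points.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bucket -> [sum(balance*cpr), sum(balance)]
    for wac, balance, cpr in pools:
        incentive = wac - survey_rate
        bucket = round(incentive / bucket_width) * bucket_width
        sums[bucket][0] += balance * cpr
        sums[bucket][1] += balance
    return {bucket: weighted / total
            for bucket, (weighted, total) in sorted(sums.items())}


# Hypothetical pools: (WAC %, balance in $ millions, observed CPR %)
sample = [(5.0, 100, 8.0), (5.1, 300, 12.0), (6.5, 200, 40.0)]
for incentive, cpr in s_curve(sample, survey_rate=5.0).items():
    print(f"incentive {incentive:+.2f}: {cpr:.1f}% CPR")
```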

The following subsections discuss the pri¬ 
mary drivers of prepayment speeds in more 
detail. 

Turnover 

As previously described, turnover refers to ac¬ 
tivity in which the underlying property is sold 
or liquidated, with the proceeds of the sale sub¬ 
sequently passed through to the holder of the 
mortgage as a prepayment. There are a num¬ 
ber of ways to observe the level of turnover. A 
simple way to assess turnover is to look at the 
prepayment speeds of out-of-the-money MBS 
pools, such as, for example, prepayment speeds 
on Fannie 4.0s when mortgage rates are 5% or 
higher. 

However, prepayment speeds for lower- 
coupon MBS can also be influenced by fac¬ 
tors other than turnover. For example, high 
levels of cash-out refinancings (when borrow¬ 
ers refinance primarily to monetize the equity 
in their homes) will also increase prepayment 
speeds on out-of-the-money coupons. Product 
transition activity, which was widespread from 
2004 through early 2007, can also distort the 
normal calculation of "in-the-moneyness." As 
discussed later in this entry, transitions typi¬ 
cally are associated with the widespread avail¬ 
ability and popularity of products that allow 
borrowers to reduce their monthly payment 
obligations through either lower loan rates or 
alternative amortization schemes. 

A truer estimate of housing turnover can be obtained by calculating existing home sales for single-family homes as a percentage of the number of such homes owned. Existing home sales data are published monthly by the National Association of Realtors, while the number of single-family homes outstanding is reported by the Census Bureau on a quarterly basis, subject to periodic adjustments. Research indicates that turnover has varied over time, primarily reflecting changes in the level of home sales.

It is tempting to associate elevated housing 
turnover with robust growth in home prices. 
Strictly speaking, however, housing turnover is
not directly associated with real estate price ap¬ 
preciation, but rather with the level of home 
sales activity and the number of completed 
transactions. While home prices and sales are 
highly correlated, it is conceivable that home 
prices could stagnate while sales activity re¬ 
mains firm, and vice versa. 

Refinancing 

Refinancing ("refi") activity can be broadly de¬ 
fined as transactions where borrowers replace 
their existing mortgage with a new loan, us¬ 
ing the proceeds from the new loans to pay off 
their preexisting mortgage obligations. While it 
encompasses a number of different activities, 
it most commonly occurs when the prevailing 
level of interest rates declines to the point where 
borrowers can take out new loans and reduce 
their monthly payments (after accounting for 
transaction costs and potential penalties). 1 

As noted already, refinancing activity can be 
broadly categorized as rate-and-term refinanc¬ 
ings, where borrowers act solely to reduce their 
mortgage payments, and cash-out refinancings 
for which the new loan is larger than the one be¬ 
ing retired. Rate-and-term refis are easily con¬ 
ceptualized as a form of option exercise. In a 
fashion similar to a corporation calling a debt 
issue, homeowners can reduce their required 
debt service obligations by calling their current 
loans carrying above-market rates and issuing 
new debt. 

However, the nature of mortgage lending complicates borrowers' refinancing decisions. Homeowners refinancing their loans are subject to a variety of costs and fees, many of which are fixed. The expected monthly savings, by contrast, is a function of the size of the loan in question. This implies that refinancing incentives are strongly impacted by loan size, as smaller loans typically require a greater refinancing incentive in order to trigger refinancing activity. Take, for example, two loans with 5% note rates and balances of $200,000 and $400,000, respectively. A 50 basis point rate savings reduces the payment on the $200,000 loan by roughly $60 per month, while the same rate savings reduces the larger loan's monthly payment by roughly $120. If both loans are subject to $1,000 in refinancing costs, the borrower with the $400,000 loan will recoup the initial outlay after a little more than eight months; the borrower with the smaller loan needs roughly twice as long (about 17 months) to break even. This makes loan size a critical variable in modeling and projecting future prepayment speeds.
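
The arithmetic behind this example can be reproduced with a standard level-payment formula. The sketch assumes that both the old and new loans are 30-year, fully amortizing, fixed-rate loans with the same balance, and it ignores discounting and taxes; these assumptions and the function names are illustrative.

```python
import math


def monthly_payment(balance: float, annual_rate: float, term_months: int = 360) -> float:
    """Level monthly payment on a fully amortizing fixed-rate loan."""
    r = annual_rate / 12
    return balance * r / (1 - (1 + r) ** -term_months)


def refi_breakeven(balance, old_rate, new_rate, costs, term_months=360):
    """Return (monthly savings, first month in which cumulative savings cover costs)."""
    savings = (monthly_payment(balance, old_rate, term_months)
               - monthly_payment(balance, new_rate, term_months))
    return savings, math.ceil(costs / savings)


for size in (200_000, 400_000):
    savings, month = refi_breakeven(size, 0.05, 0.045, costs=1_000)
    print(f"${size:,}: saves ${savings:.2f} per month, breaks even in month {month}")
```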

Cash-out refinancings are commonly viewed 
as a subset of overall refinancing activity. For 
example, Freddie Mac defines cash-out refis as 
transactions where the new loan is at least 5% 
larger than the original one, and reports cash¬ 
outs as a percentage of overall prepayment ac¬ 
tivity. The level of cash-out activity has varied 
significantly over time. For example, the rel¬ 
ative level of cash-out activity was extremely 
high in the late 1980s and 1990s, as well as in 
the period between 2003 and 2007. 

The primary driver of cash-out activity at any point in time is the amount of equity borrowers have in their homes. In turn, equity is a function of both the original equity in the home (i.e., the complement of a loan's loan-to-value (LTV) ratio, or 1 minus the LTV) and the rate of home price appreciation since the home was purchased.

Aggregate refinancing incentives can be observed by examining the distribution of note rates within the MBS universe at various points in time. Keep in mind that the outstanding mortgage population is always changing, as new loans are issued and older loans are retired. The distribution of note rates for the population of outstanding loans is strongly impacted by refinancing activity, which can be thought of as recycling older high-rate loans into new mortgages with lower rates.

A useful technique is to compare the out¬ 
standing balances and the cumulative percent¬ 
ages of note rates for MBS products at different 
points in time. The cumulative balance percent¬ 
ages are calculated as follows: 

• Divide the outstanding market balances into discrete segments or "buckets" by WAC. (The following analysis uses 12.5 basis point WAC buckets.)

• For each WAC bucket, calculate the percentage of the remaining balances with note rates in or below that bucket.

For example, if the lowest WAC bucket is 5.0% 
to 5.124% and it represents 2% of the remain¬ 
ing balance, its cumulative percentage is 2%. If 
the next WAC bucket (5.125% to 5.249%) com¬ 
prises 6% of the unpaid balance of the mar¬ 
ket, its cumulative balance is therefore 8%. This 
process is completed for all WAC buckets. This 
technique is particularly useful in assessing the 
"refinanceability" of the market at particular 
points in time. 
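
The bucketing procedure amounts to a running sum over WAC buckets sorted from lowest to highest. The sketch below uses hypothetical bucket balances chosen to match the 2% and 6% shares in the example above; the third bucket and the function name are invented for illustration.

```python
def cumulative_balance_pct(bucket_balances):
    """Cumulative percentage of balance at or below each WAC bucket.

    bucket_balances: dict mapping each bucket's lower WAC bound (in %)
    to the outstanding balance in that bucket.
    """
    total = sum(bucket_balances.values())
    running = 0.0
    cumulative = {}
    for wac in sorted(bucket_balances):
        running += bucket_balances[wac]
        cumulative[wac] = 100 * running / total
    return cumulative


# Hypothetical balances (in $ billions) for 12.5 basis point WAC buckets.
balances = {5.0: 2, 5.125: 6, 5.25: 92}
for wac, pct in cumulative_balance_pct(balances).items():
    print(f"{wac:.3f}% bucket: {pct:.0f}% cumulative")
```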

FACTORS INFLUENCING PREPAYMENT SPEEDS

In understanding and evaluating prepayment behavior, the level of consumer mortgage rates is the single factor to which the most attention is paid. However, there is no single "market" rate that analysts can observe. There are always differences in the rate offerings of different lenders; since loans are the "product" they offer, it's not surprising that there are pricing discrepancies. Individual lenders also have a variety of offerings, with different combinations of interest rates and up-front fees (or "points," which vary inversely with the rate offered). While these options give borrowers choices between up-front costs and monthly payments, the relationship between rates and points is highly lender-specific and a function of their pricing algorithms. Finally, lenders seek to price in the risk of loans to various borrowers in a series of activities broadly classified under "risk-based pricing." 2

Figure 2 MBA Refi Index versus Freddie Mac 30-year Survey Rate
Data Sources: Mortgage Bankers Association and Freddie Mac.

However, a variety of outside factors that 
influence prepayment speeds and refinancing 
behavior can be outlined. These include ex¬ 
ogenous factors, mortgage industry economics, 
and consumer behaviors and preferences. 

Borrower Inefficiencies 

Rational borrowers will always seek to lower 
their borrowing costs by refinancing their debts. 
Refinancing opportunities present themselves 
to both institutional and individual borrow¬ 
ers. Unlike corporations and municipalities, 
however, residential borrowers are relatively 
inefficient in capitalizing on refinancing op¬ 
portunities. (If mortgagors were efficient, for 
example, few if any premium pools would 
be outstanding; however, there were approxi¬ 
mately $110 billion of 30-year Fannie Maes with 
coupons of 6.5% and higher at the end of 2010.) 

Borrower inefficiencies exist for a number of reasons. Homeowners have varying degrees of awareness of financial market rates and conditions, and as a result are not always cognizant of refinancing opportunities. Borrowers often hear about declines in rates from their friends and co-workers; they also may read about it in the financial press or see it discussed on news programs. These are collectively referred to as media effects. While the growth of the financial press (with information available from print, television, and the Internet) has improved refinancing efficiency over time, it often takes a significant and noteworthy drop in rates to generate conversation and media "buzz." This explains the tendency for refinancings to occur in waves, as illustrated in Figure 2. The figure shows mortgage rates (using Freddie Mac's 30-year survey rate as a proxy, shown on a reverse scale) versus refinancing activity, using the Mortgage Bankers Association's refinancing applications index. The figure indicates that refinancing activity often remains tepid for long periods of time, but spikes when mortgage rates decline beyond some indeterminate threshold.

In addition, the costs associated with refinancing alter the refinancing economics for borrowers. The need to overcome cost hurdles serves to inhibit refinancing activity and complicates refinancing decisions. As noted previously, this is particularly relevant for borrowers with smaller loan balances, who typically require a greater refinancing incentive before engaging in rate-and-term refinancings.

Refinancing efficiency has also been impacted 
by the structure of the mortgage industry. Be¬ 
ginning in the mid-1990s, lenders became in¬ 
creasingly adept at marketing their products 
and generating refinancing activity. Some of 
these activities involve directly contacting exist¬ 
ing customers, while others involve mass mar¬ 
keting through television commercials, print 
advertisements, and direct mail and phone so¬ 
licitations. Also contributing to the marketing 
effort was a cadre of mortgage brokers and 
other "third-party originators" who acted as 
agents linking lenders and borrowers. These de¬ 
velopments contributed to improved refinanc¬ 
ing efficiency. 

The events that culminated in the financial crisis in 2008, however, led to a sharp contraction in "wholesale" lending activities. Brokers were blamed for poor loan quality and sloppy paperwork; since they did not make loans directly, they arguably had no incentive to ensure the quality of their loans. As a result, many smaller originators that were dependent on the wholesale channel failed, while a number of large originators curtailed or severely limited their interaction with third-party lenders. This development in turn served to impair borrowers' ability and/or willingness to capitalize on refinancing opportunities.

Finally, additional factors impact refinancing 
activities. After 2007, for example, a combina¬ 
tion of significantly tighter lending standards, 
fewer product offerings, and declining bor¬ 
rower equity due to falling home prices acted to 
further depress refinancing activity. Referring 
to Figure 2, the inability of the MBA's refi index 
to reach and maintain high levels reflected the 
fact that the pool of borrowers with the abil¬ 
ity to refinance was quickly exhausted when 
mortgage rates plummeted beginning in early 
2009. 


Product Choices and Transitions 

Both rate-and-term and cash-out refinancing 
activity is at times influenced by product tran¬ 
sitions. This means that borrowers can lower 
their monthly payment by refinancing from one 
product into another. This type of activity has 
varied over time, depending on the availability,
popularity, and pricing of alternative products.
When the yield curve has been relatively
steep, for example, large numbers of borrow¬ 
ers have sometimes refinanced out of fixed rate 
loans into adjustable rate products. 

Transition activity has varied substantially 
over time, however, driven by both lender 
offerings and consumer preferences. Prior to 
mid-2003, for example, ARMs were a niche 
product targeted primarily to first-time home 
buyers. In the summer of 2003, however, ARM 
volumes rose fairly dramatically, as consumers 
refinanced out of fixed rate products into newly 
popular hybrid ARMs. This reflected both con¬ 
sumers' increased comfort with adjustable rate 
loans as well as marketing efforts by mort¬ 
gage lenders designed to maintain issuance 
volumes. By mid-2007, borrowers once again 
eschewed ARMs, in part due to bad publicity 
emphasizing their riskiness. 

These abrupt changes in behavior are illustrated
in Figure 3. The figure contains a scatter
chart showing the Freddie Mac 30-year fixed
survey rate on the horizontal axis, and the 
percentage of loans taken as ARMs on the 
vertical axis. The figure demonstrates the ex¬ 
istence of three distinct regimes. ARMs were 
relatively unpopular in the years prior to mid- 
2003, and only reflected a large share of activ¬ 
ity when mortgage rates were relatively high. 
From mid-2003 through early 2008, by contrast, 
the percentage of ARMs was relatively high ir¬ 
respective of the level of mortgage rates and, 
by implication, refi activity. After the begin¬ 
ning of 2008, ARMs again fell out of favor; by 
2010 they comprised less than 10% of new loan 
applications. 

The varying popularity of fixed-to-ARM re¬ 
financings has several implications. Because 
of the generally upward slope of the yield
curve, ARM rates are typically lower than
fixed rates. This means that borrowers will¬ 
ing to utilize adjustable rate products will be 
presented with an apparent refinancing incen¬ 
tive more often than those borrowers that es¬ 
chew ARMs and will only consider fixed rate 
products. (Of course, this savings is only guar¬ 
anteed for an ARM's fixed rate or "teaser" 
period.) Taking available ARM rates into ac¬ 
count means that more borrowers can reduce 
their mortgage rates by refinancing. 3 As a re¬ 
sult, regimes where ARMs are a popular prod¬ 
uct choice (due to consumer preferences and / or 
a steep yield curve) are characterized by steady 
levels of refinancing activity and relatively flat 
S-curves. 

Alternatively, when short rates rise and push 
ARM rates higher, fixed-to-ARM refinancing in¬ 
centives are reduced. In fact, regimes associated 
with flat yield curves are often characterized by 
ARM-to-fixed transitions, as borrowers seek to 
lock in lower long-term rates. Taken together, 
these phenomena indicate that refinancing be¬ 
havior is not simply dictated by the level of 
intermediate and long interest rates. The levels 
of all interest rates, as well as the shape of the
yield curve, are important drivers of refinancing
incentives and prepayment activity.

Large-scale transitions also have been ob¬ 
served as borrowers utilized loan products with 
alternative amortization schedules and pay¬ 
ment schemes. As a simple example, a borrower 
with a $200,000 loan balance and a 30-year loan 
with a fixed 5% note rate would have monthly 
P&I payments of $1,074. If they refinanced into 
an interest-only loan with the same term and 
note rate, their new monthly payment would 
be $833, an initial savings of $240. However, the 
savings would only be available for the period 
that the borrower was allowed to make interest- 
only payments; after that point, the loan is "re¬ 
cast" (i.e., the payments are recalculated) over 
the remaining term. If the borrower chooses 
a new loan with an interest-only period of 10 
years, the post-recast monthly payment would 
be $1,320, significantly higher than the payment 
on the original loan. 
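
The payment figures in this example can be reproduced with the standard level-payment annuity formula. The sketch below is illustrative only; the function names are hypothetical, and it assumes monthly compounding at one-twelfth of the note rate and an interest-only period of 120 months followed by a recast over the remaining 240 months.

```python
def level_payment(balance, annual_rate, months):
    """Monthly payment that fully amortizes `balance` over `months`."""
    r = annual_rate / 12.0  # assumed monthly rate: note rate / 12
    if r == 0:
        return balance / months
    return balance * r / (1.0 - (1.0 + r) ** -months)


def interest_only_payment(balance, annual_rate):
    """Monthly payment during the interest-only period (no principal)."""
    return balance * annual_rate / 12.0


balance, rate = 200_000.0, 0.05

amortizing = level_payment(balance, rate, 360)   # original 30-year loan
io = interest_only_payment(balance, rate)        # first 10 years of the IO loan
recast = level_payment(balance, rate, 240)       # full balance over remaining 20 years

print(f"30-year P&I payment:  {amortizing:,.2f}")
print(f"Interest-only payment: {io:,.2f}")
print(f"Initial savings:       {amortizing - io:,.2f}")
print(f"Post-recast payment:   {recast:,.2f}")
```

Because no principal is repaid during the interest-only period, the full $200,000 must be amortized over only 20 years after the recast, which is what drives the payment above that of the original loan.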

The borrower's decision thus trades off early
savings for a sharply increased monthly payment
(or payment shock) at the recast. While such
decisions were popular during the period of
widespread product transitions, the mortgage
crisis of 2007 led to the realization that these
types of transitions exposed both borrowers
and lenders to serious embedded risks. As a result,
transitions into alternative payment products
became fairly rare by 2008.

Changes in Homeowner Equity 
and Credit 

As noted earlier, the experience of the post-2007 
period has highlighted the interrelationship be¬ 
tween prepayments and home prices and, by 
extension, borrower credit. We already high¬ 
lighted the importance of cash-out refinancings 
and the critical role that home price apprecia¬ 
tion plays in this activity. In addition, deteri¬ 
orating borrower credit (of which homeowner 
equity is a crucial element) often directly results 
in prepayments, as we discuss next. 

However, changing home prices and borrower
credit have other subtle effects on
prepayments. For example, borrowers often are
presented with an enhanced refinancing in¬ 
centive when their credit improves. If they 
took loans with relatively high rates because 
of risk-based pricing, they can capitalize on 
their improved situation by refinancing. Such 
"credit curing" can be related to economic 
factors such as improving labor markets and 
consumer credit conditions, particularly when 
observing local or regional activity. A similar 
phenomenon is associated with rapid increases 
of home prices. Borrowers with high LTVs who 
were saddled with higher risk-based mortgage 
rates and/or mortgage insurance premiums 
can lower their payments once their homes ap¬ 
preciate in value, even if the overall level of 
mortgage rates remains unchanged. 

Alternatively, borrower credit can also act to 
slow prepayment speeds. Borrowers with de¬ 
teriorating credit may not be able to capitalize 
on declining interest rates if they cannot obtain 
new loans because of tighter credit standards. 
Declining real estate values can also prevent 
homeowners from refinancing existing loans by 
reducing or eliminating their equity. If
homeowners' equity disappears or becomes
negative (a situation often referred to as "being
underwater"), they may lose the ability to
obtain new loans. Moreover, significant declines
in home values ultimately serve to constrain 
homeowners from selling their properties, as 
they would be forced to realize large losses 
on their homes. These developments are col¬ 
lectively called prepayment lock-in, and serve 
to slow both refinancing- and turnover-related 
prepayments. 

Time 

Prepayment rates vary with the passage of 
time. In addition to purely random variations, 
fairly predictable changes occur to prepayment 
speeds due to factors that are independent of in¬ 
terest rates. The behavior of borrowers under¬ 
goes a variety of secular and cyclical changes 
as time elapses; in addition, the composition of 
closed loan populations (i.e., loans collateraliz¬ 
ing a pool) changes as the pool ages and loans 
drop out for any number of reasons. 

Time-related factors mean that evaluating any 
MBS at a single constant speed is unrealis¬ 
tic. This realization was first incorporated into 
the PSA prepayment benchmark, which rec¬ 
ognized the fact that loans are more likely to 
prepay as they age (or season). Borrowers are 
disinclined to prepay their loans immediately 
after issuance, but become increasingly open to 
the possibility as time elapses. This is due to a 
variety of factors: 

• Borrowers typically are reluctant to under¬ 
take the effort and expense of refinancing un¬ 
til their loans are at least a few months old. 

• Borrowers are unlikely to sell their proper¬ 
ties and move immediately after purchasing 
a home. This is true even for homeowners 
that relocate frequently; evidence suggests 
that they tend to stay in their homes for at 
least a year. 

• It takes some time for borrowers to build eq¬ 
uity in their property (assuming, of course, a 
regime of rising home prices). 
The key insight introduced by the PSA model
is the concept that prepayment speeds are not 
constant over time, especially early in loans' 
lives. It is, however, simplistic in its assump¬ 
tion of a constant prepayment speed after 
30 months, and does not account for other time- 
related behaviors. One such factor is seasonal¬ 
ity, which suggests that prepayments typically 
increase during spring and summer months. 
Another behavior, burnout, accounts for the ob¬ 
servation that loans remaining in a population 
are less likely to refinance after a certain point 
in time. The underlying logic is that borrowers 
that have not availed themselves of refinancing 
opportunities lack the ability and / or the incli¬ 
nation to do so. 

The combination of these behaviors means
that the CPR vector (the time series of CPRs
generated by a prepayment model or, for any
security, its realized prepayment speeds)
will look very different from the equivalent
speeds quoted as percentages of the PSA
model.
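
The relationship between a CPR and its PSA equivalent can be sketched as follows. The PSA benchmark is commonly specified as a CPR that starts at 0.2% in month 1, rises by 0.2% per month to 6% at month 30, and remains there; that ramp is an assumption of this sketch rather than something stated in this entry, and the function names are hypothetical.

```python
def psa_cpr(loan_age_months, psa_speed=100.0):
    """Annualized CPR (as a decimal) implied by a PSA speed at a given loan age.

    Assumes the conventional ramp: 0.2% CPR per month of age up to 6% at
    month 30, constant thereafter. Loan age is counted from 1.
    """
    base = 0.06 * min(loan_age_months, 30) / 30.0
    return base * psa_speed / 100.0


def cpr_to_psa(cpr, loan_age_months):
    """Express a CPR as a percentage of the PSA benchmark at that loan age."""
    return 100.0 * cpr / psa_cpr(loan_age_months, 100.0)


def cpr_to_smm(cpr):
    """Convert an annualized CPR to the equivalent single monthly mortality."""
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)


# The same 12% CPR corresponds to very different PSA speeds at different ages.
for age in (6, 15, 30, 60):
    print(age, round(cpr_to_psa(0.12, age)), round(cpr_to_smm(0.12), 5))
```

A flat CPR therefore maps to a declining PSA percentage while the loans season, which is one reason a CPR vector and a single PSA quote can describe the same pool so differently.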

Time-related changes to prepayment speeds 
are even more profound for mortgage prod¬ 
ucts that do not require fixed monthly payments 
over their life. For example, ARMs typically ex¬ 
perience a spike in prepayment speeds as the 
loans approach their first reset date. (For ex¬ 
ample, the monthly payments on 5/1 hybrid 
ARMs change when the loans reset at month 
60.) Interest-only loans exhibit comparable be¬ 
havior, as their required monthly payments in¬ 
crease once the IO period expires. All such 
products exhibit prepayment patterns reflect¬ 
ing variations in the loans' monthly payments 
and, by implication, refinancing incentives. 

The spike in ARM speeds at their reset results 
from a variety of factors. Unlike homeowners 
in Europe, U.S. borrowers have traditionally 
been somewhat averse to adjustable-rate loans. 
This means that borrowers often prepay hybrid 
ARMs simply to avoid being exposed to chang¬ 
ing interest rates and variable payments. It also 
is a function of the level of the benchmark rate 
at the reset; in regimes where the yield curve 
is flat or inverted, the new loan rates are often 


higher than the teaser rate. The resulting pay¬ 
ment shock creates a refinancing incentive for 
borrowers during periods when the new rate is 
higher than that for either a new ARM or a fixed 
rate loan. 

Empirical evidence shows a sharp increase 
in CPRs at the reset; in addition, models also 
project a cyclical increase in speed every 12 
months thereafter, corresponding with the an¬ 
nual rate resets for the loans as well as normal 
seasonal patterns. 4 

DEFAULTS AND "INVOLUNTARY"
PREPAYMENTS

The mortgage crisis that erupted in early 2007 
underscored the critical role of credit perfor¬ 
mance in all sectors of the mortgage and MBS 
markets. In the past, investors assumed that 
senior nonagency MBS were "money-good" 
by virtue of their triple-A ratings. The collapse
of mortgage performance both reinforced
the importance of sound credit analysis of
private-label securities and gave investors
a painful and expensive lesson on the
factors influencing residential mortgage credit 
performance. 

Factors Influencing Default 
Frequency and Credit Performance 

The general thinking has long been that bor¬ 
rower equity simply provides a cushion for the 
lender in cases when the home must be repos¬ 
sessed. However, a critical lesson learned from 
the post-2006 experience is that borrower credit 
performance and home prices are strongly in¬ 
terrelated at a number of levels, and that 
high-LTV loans have, all else being equal, an 
increased likelihood of default. 

At its most basic, appreciating home prices 
give borrowers the ability to monetize their 
home's equity in order to meet their finan¬ 
cial obligations and mitigate cash flow prob¬ 
lems. In addition, steady or rising home prices 
also impact the resolution of troubled loans. 
Delinquent borrowers that have equity in their
homes can sell their properties and, using the 
net proceeds, pay off their loans instead of going 
into foreclosure. In theory, borrowers should 
never default if their homes' values are great 
enough to extinguish the loan and pay the asso¬ 
ciated costs. Borrowers whose homes have de¬ 
clined to the point where their LTVs are greater 
than 100% (i.e., where their loans are greater 
than the value of their homes) do not have this 
option. This accounts for why some loan vin¬ 
tages (such as the year 2000) have experienced 
relatively high levels of delinquency but limited 
defaults and losses; borrowers in financial diffi¬ 
culty were able to sell their homes and emerge 
"whole." 5 

The decline in home prices that began in 
2007 resulted in unexpectedly large increases 
in defaults. The loss of home equity induced 
numerous borrowers to exercise the option 
embedded in any collateralized loan that allows 
the collateral to revert back to the lender. It is 
axiomatic in corporate credit theory that bor¬ 
rowers are expected to default on loans once 
the value of the loans' collateral declines below 
the value of the loans themselves. However, the 
mortgage sector has long operated under the as¬ 
sumption that obligors rarely walk away from 
the properties because of the importance of 
dwellings to families' well-being. This behavior 
was untested until 2007, in large part because 
home prices had never before experienced significant
and widespread declines. However, the
new phenomenon of the "strategic default" 
emerged during the mortgage crisis, where 
large numbers of homeowners with income and 
assets sufficient to service their loans never¬ 
theless ceased making monthly mortgage pay¬ 
ments. 

The emergence of this activity has a number 
of implications. The most important realization 
is that home prices and mortgage credit perfor¬ 
mance are closely linked. In this light, the strong 
credit performance exhibited by the mortgage 
market since the 1950s was arguably skewed 
higher by decades of steady home price appre¬ 
ciation. This assertion implicitly argues that res¬ 


idential mortgage loans are riskier assets than 
previously assumed. In addition, mortgage un¬ 
derwriters have placed undue faith in metrics 
such as credit scores which, while valuable, can¬ 
not serve as reliable proxies for borrowers' will¬ 
ingness to service their loans during times of 
financial distress. 

Voluntary and Involuntary 
Prepayments 

Once borrowers cease making regular pay¬ 
ments, the loans eventually go into default, 
meaning that the borrowers lose title to the un¬ 
derlying properties. The properties are subse¬ 
quently liquidated, typically by being placed in 
foreclosure; this means that the servicer even¬ 
tually takes possession of the property and sells 
it. The proceeds of the sale, less associated costs, 
are categorized as recovered principal or recov¬ 
eries. Since recoveries are typically less than the 
amount of the loan, some entity must absorb a 
principal loss. 

Losses for agency MBS are absorbed by the 
entity or agency that guaranteed them. At some 
point, seriously delinquent loans in agency 
pools are classified as "nonperforming" and 
subsequently bought out of the pools, either 
by the GSEs or (in the case of FHA and VA loans) 
the servicer. Because of the principal guar¬ 
anty, the full face value of principal is quickly 
returned to investors. This means that all un¬ 
scheduled principal payments can be captured 
in a single "prepayment speed" reported for 
the security in question. This measure is cal¬ 
culated based on the total principal repaid on 
the pool and the breakdown (either reported or 
estimated) between amortizations and prepay¬ 
ments (i.e., between scheduled and unsched¬ 
uled principal payments). As a result, many 
agency securities exhibited increased prepay¬ 
ment speeds during periods of poor credit 
performance and widespread delinquencies, 
particularly when the agencies changed their
buyout policies. 6 (This also blurs the line be¬ 
tween credit-related prepayments and normal 
housing "turnover.") 
By contrast, traditional and credit-related
prepayments must be calculated and reported
separately for nonagency securities. This is because
credit support for these securities
is internal; deals are structured such that senior
bonds in a transaction have priority over other 
bonds in receiving principal and interest. Since 
the transaction itself will absorb incurred losses, 
traditional prepayments (which return all of 
principal to the security holder) and credit- 
related prepayments (which result in shortfalls 
that must be allocated within structures) must 
be segregated. As a result, private-label secu¬ 
rities report both voluntary prepayments, which 
encompass traditional prepayment activity, and 
credit-related involuntary prepayments. The lat¬ 
ter result from defaults or other events specifi¬ 
cally related to credit events (such as short sales 
of homes), while also accounting for the like¬ 
lihood that less than the full amount of prin¬ 
cipal will be returned to the transaction (or, 
more accurately, the trust holding the deal's 
collateral). 

These factors complicate the projection and 
calculation of prepayment speeds for private- 
label securities. Voluntary prepayments are 
typically quoted as VPRs, which stands for 
voluntary prepayment rate. They are calculated 
similarly to a CPR, in which a monthly per¬ 
centage of prepaid principal (sometimes called 
a VMM) is annualized. Involuntary prepay¬ 
ment speeds are quoted as conditional default 
rates (CDRs) which are calculated by annual¬ 
izing the monthly default rates or MDRs. Note 
that the sum of the monthly VMMs and MDRs 
equals the total deal SMM for any particular 
month. 
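
A minimal sketch of these calculations follows. It assumes that both monthly rates are measured against the balance remaining after scheduled principal (other conventions exist), and that annualization uses the usual compounding formula applied to CPRs; the function names and the sample figures are hypothetical.

```python
def annualize(monthly_rate):
    """Convert a monthly rate (SMM, VMM, or MDR) to its annualized equivalent."""
    return 1.0 - (1.0 - monthly_rate) ** 12


def monthly_speeds(begin_balance, scheduled_principal, voluntary_prepaid, defaulted):
    """Return (VMM, MDR, SMM) for one month of pool activity."""
    # Assumption: rates are computed on the balance net of scheduled principal.
    base = begin_balance - scheduled_principal
    vmm = voluntary_prepaid / base
    mdr = defaulted / base
    return vmm, mdr, vmm + mdr


# Hypothetical month: $1,000,000 balance, $1,000 scheduled principal,
# $9,990 of voluntary prepayments and $4,995 of defaulted principal.
vmm, mdr, smm = monthly_speeds(1_000_000.0, 1_000.0, 9_990.0, 4_995.0)
vpr, cdr, total_cpr = annualize(vmm), annualize(mdr), annualize(smm)

print(f"VMM {vmm:.4%}  MDR {mdr:.4%}  SMM {smm:.4%}")
print(f"VPR {vpr:.4%}  CDR {cdr:.4%}  total CPR {total_cpr:.4%}")
```

The additivity holds only at the monthly level: because of compounding, the annualized VPR and CDR sum to slightly more than the CPR implied by the total SMM.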

Involuntary prepayments require additional 
metrics to be reported. In addition to the rate 
of default, an estimate must be made of the 
loss severity (the share of the defaulted principal
amount that is lost rather than returned to
investors) as well as the lag between the time
when loans go into default (i.e., when the bor¬ 
rowers lose title to the properties) and when the 
trusts receive the recovered principal. 
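
The role of severity and lag can be illustrated with a short sketch. It treats loss severity as the fraction of defaulted principal that is lost and applies a single fixed recovery lag to every default; both simplifications, and the function name, are assumptions made for illustration.

```python
def recovery_schedule(defaults_by_month, severity, lag_months):
    """Map monthly defaulted principal to the months in which the trust
    receives recoveries and recognizes losses.

    severity   -- fraction of defaulted principal that is lost (0 to 1)
    lag_months -- months between default and receipt of recovered principal
    """
    horizon = len(defaults_by_month) + lag_months
    recoveries = [0.0] * horizon
    losses = [0.0] * horizon
    for month, defaulted in enumerate(defaults_by_month):
        recoveries[month + lag_months] += defaulted * (1.0 - severity)
        losses[month + lag_months] += defaulted * severity
    return recoveries, losses


# Hypothetical defaults of 100 and 50 in months 0 and 2, 40% severity, 2-month lag.
recoveries, losses = recovery_schedule([100.0, 0.0, 50.0], severity=0.4, lag_months=2)
print(recoveries)
print(losses)
```

A longer lag does not change the total loss in this sketch, but it delays the return of principal, which affects the timing of cash flows to the bonds in the structure.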


Interactions Between Prepayments 
and Defaults 

There are some interesting interactions between 
voluntary and involuntary prepayment speeds 
that impact the analysis of private-label secu¬ 
rities. All things equal, fast prepayments en¬ 
hance the performance of these securities; faster 
return of principal means that there is less prin¬ 
cipal outstanding to go into default. At the 
same assumed CDR, faster voluntary prepay¬ 
ment speeds (i.e., a higher VPR assumption) 
will typically result in higher projected yields 
and returns. 
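
The effect can be seen in a simplified projection. The sketch below holds the VPR, CDR, and severity constant, ignores scheduled amortization and recovery lags, and uses hypothetical names and inputs; it is meant only to show why less outstanding principal translates into fewer cumulative defaults at the same CDR.

```python
def project_pool(balance, vpr, cdr, severity, months):
    """Roll a pool forward under constant speeds.

    Returns (cumulative defaults, cumulative losses, ending balance).
    Assumes no scheduled amortization and immediate loss recognition.
    """
    vmm = 1.0 - (1.0 - vpr) ** (1.0 / 12.0)  # monthly voluntary rate
    mdr = 1.0 - (1.0 - cdr) ** (1.0 / 12.0)  # monthly default rate
    cum_default = 0.0
    cum_loss = 0.0
    for _ in range(months):
        defaulted = balance * mdr
        prepaid = balance * vmm
        cum_default += defaulted
        cum_loss += defaulted * severity
        balance -= defaulted + prepaid
    return cum_default, cum_loss, balance


# Same 4% CDR and 50% severity, slow versus fast voluntary prepayments.
slow = project_pool(100.0, vpr=0.05, cdr=0.04, severity=0.5, months=120)
fast = project_pool(100.0, vpr=0.25, cdr=0.04, severity=0.5, months=120)
print("slow VPR:", [round(x, 2) for x in slow])
print("fast VPR:", [round(x, 2) for x in fast])
```

Under these assumptions the faster-paying pool accumulates fewer defaults and losses per dollar of original balance, because the balance exposed to the default rate shrinks more quickly.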

This assertion is somewhat simplistic, how¬ 
ever, since it doesn't take the changing compo¬ 
sition of the pool into account. For example, it is 
unlikely that the CDR would remain constant 
under the different VPR assumptions, as the 
profile of any closed population of mortgages 
changes over time. In addition to home prices 
and economic conditions, the composition of 
the collateral pool backing a transaction evolves 
as the result of attrition. Loans pay off over time 
as a result of both voluntary and involuntary 
factors. Voluntary prepayments negatively im¬ 
pact the composition of a pool because "bet¬ 
ter" borrowers (i.e., those with stronger credit 
and/or more equity in their homes) are able 
to take advantage of refinancing opportunities; 
since weaker borrowers are locked into their ex¬ 
isting loans, the credit profile of the remaining 
population deteriorates. This is known as ad¬ 
verse selection, and suggests that the credit qual¬ 
ity of a pool typically declines over time, all 
things equal. 

The high level of defaults experienced dur¬ 
ing the mortgage crisis also created a new and 
unanticipated phenomenon. High levels of
defaults mean that weaker borrowers are
dropping out of the collateral pools. In turn, the
remaining borrowers generally have stronger 
credit, meaning that the population's credit 
profile improves over time. This is especially 
noteworthy during periods of declining home 
prices. Borrowers with poor credit (i.e., those
either unable or unwilling to service their loans)
go into default in large numbers, while stronger
borrowers who are nonetheless "locked in" by a 
lack of equity continue to service their loans and 
remain in the pool. This process is sometimes 
called favorable selection, and was most promi¬ 
nently observed in subprime and alt-A pools, 
which experienced very high levels of defaults. 

Neither adverse nor favorable selection
takes place in a vacuum. For example,
the performance of a cohort assumed to be
adversely selected (i.e., having experienced rel¬ 
atively high levels of voluntary prepayments) 
will improve in the face of home price appreci¬ 
ation. Alternatively, a population of subprime 
loans may experience a renewed surge in de¬ 
faults if money-market rates increase sharply. 
Since many subprime loans have adjustable
note rates with very high loan margins, rising
rates create widespread payment shock that
challenges the ability of borrowers to service 
their loans. 


KEY POINTS 

• Traditional prepayment analysis has focused 
on borrowers' option to retire their loans prior 
to maturity. 

• The two primary drivers of prepayment be¬ 
havior are turnover and refinancing. 

• Turnover occurs when the underlying prop¬ 
erties are sold and the associated loan is 
retired. 

• Refinancing behavior includes rate-and-term
refinancing (undertaken to reduce the borrower's
monthly payment, most commonly
due to a decline in the level of consumer
mortgage rates) and cash-out refinancing (often
undertaken as an alternative to second lien
loans and strongly correlated with rates of
home price appreciation).

• The most common way to assess prepayment
speeds within a product group at various
levels of refinancing incentive is with
prepayment S-curves. These curves show
prepayment speeds for different levels of mortgage
rates and/or refinancing incentives.

• In understanding and evaluating prepayment 
behavior, the level of consumer mortgage 
rates is the single factor to which most at¬ 
tention is paid. 

• Outside factors that influence prepayment
speeds and refinancing behavior include
exogenous factors, mortgage industry economics,
and consumer behaviors and preferences.

NOTES 

1. For a more detailed discussion, see Chapter 3
in Fabozzi, Bhattacharya, and Berliner
(2011).

2. See Bhattacharya, Berliner, and Fabozzi 
(2008). 

3. If ARM rates are low enough, virtually the 
entire fixed rate coupon stack can be consid¬ 
ered in-the-money. 

4. See Bhattacharya, Berliner, and Fabozzi 
(2008). 

5. In these cases, the transaction is recorded as 
a home sale and captured under "turnover." 

6. In early 2010, Fannie Mae and Freddie Mac 
instituted policies in which loans that were 
120 days or more delinquent were automati¬ 
cally bought out of pools. Prior to that, buy¬ 
outs had been left to their discretion. The 
process of buying out large numbers of se¬ 
riously delinquent loans led to sharp short¬ 
term spikes in prepayment speeds, as well 
as huge writedowns for Fannie Mae and 
Freddie Mac. 
Abstract: At one time the belief was that financial institutions were exposed to two main risks,
credit risk and market risk, and operational risk was regarded as a mere part of "other" risks. That
view has changed. Operational risk is now viewed as a major risk faced by financial institutions:
the world financial system has been shaken by a number of banking failures since the mid-1980s,
and the risks that internationally active banks, in particular, have had to deal with have become
more complex and challenging. More than 100 operational losses exceeding $100 million each,
and a number of losses exceeding $1 billion, have affected financial firms globally since the end of
the 1980s. There is no question that the causes of these losses are unrelated to market or credit risk.
Such large-scale losses have resulted in bankruptcies, mergers, or substantial equity price declines
at a large number of highly recognized financial institutions.


Credit risk and market risk have long been
considered the two largest contributors
to the risks faced by financial entities
such as banks, insurance companies, and asset
management firms. Credit risk is the risk of
counterparty failure; market risk is the risk of loss due
to changes in market indicators, such as equity 
prices, interest rates, and exchange rates. It is 
now recognized that operational risk is a major 
risk faced by financial entities. In general terms, 
operational risk is the risk of loss resulting from 
inadequate or failed internal processes, people, 
or systems or from external events. This risk
encompasses legal risk, which includes, but is
not limited to, exposure to fines, penalties, or
punitive damages resulting from supervisory
actions, as well as private settlements.

Operational losses have been reflected in 
banks' balance sheets for many decades. Op¬ 
erational risk affects the soundness and oper¬ 
ating efficiency of all banking activities and all 
business units. Most of the losses are relatively 
small in magnitude—the fact that these losses 
are frequent makes them predictable and of¬ 
ten preventable. Examples of such operational 
losses include losses resulting from acciden¬ 
tal accounting errors, minor credit card fraud, 
or equipment failures. Operational risk-related 
events that are often more severe in the
magnitude of incurred loss include tax
noncompliance, unauthorized trading activities, major
internal fraudulent activities, business disrup¬ 
tions due to natural disasters, and vandalism. 

Until around the 1990s, the latter events were
infrequent, and even if they did occur,
banks were capable of sustaining the losses
without major consequences. This is quite
understandable because operations within the
banking industry until about the middle of
the 1980s were subject to numerous
restrictions, keeping trading volumes relatively
modest and diversity of operations limited.
Therefore, the significance of operational risk
(whose impact is positively correlated with
income size and dispersion of business units) was
perceived as minor, with limited effect on
management's decision making and capital
allocation when compared to credit risk and
market risk. However, serious changes in the global
financial markets have caused noticeable shifts 
in banks' risk profiles. 

In this entry, we discuss some key aspects that 
distinguish operational risk from credit risk and 
market risk. They are related to the arrival pro¬ 
cess of loss events, the loss severity, and the de¬ 
pendence structure of operational losses across 
a bank's business units. 


OPERATIONAL RISK 
DEFINED 

Let's begin by distinguishing operational risk 
from other categories of financial risk. Opera¬ 
tional risk is, in large part, a firm-specific and 
nonsystematic risk. 1 Early publications of the 
Bank for International Settlements (BIS) defined 
operational risk as: 2 

• Other risks. 

• "Any risk not categorized as market and 
credit risk." 

• "The risk of loss arising from various types of 
human or technical errors." 


Other definitions proposed in the literature 
include: 

• Risk "arising from human and technical er¬ 
rors and accidents." 3 

• "A measure of the link between a firm's busi¬ 
ness activities and the variation in its business 
results." 4 

• "The risk associated with operating a 
business." 5 

The formal definition that is currently widely 
accepted was initially proposed by the British 
Bankers Association (2001) and adopted by the 
BIS in January 2001. Operational risk was de¬ 
fined as "the risk of direct or indirect loss 
resulting from inadequate or failed internal 
processes, people or systems or from external 
events." 

The industry responded to this definition 
with criticism regarding the lack of a clear def¬ 
inition of "direct" and "indirect" losses. A re¬ 
fined definition of operational risk dropped the 
two terms, hence finalizing the definition of op¬ 
erational risk as: 

Operational risk is the risk of loss resulting from 
inadequate or failed internal processes, people or 
systems, or from external events. (BIS, 2001b, p. 2) 

This definition includes legal risk, but ex¬ 
cludes strategic and reputational risk (these 
will be defined soon). The definition is "causal-based,"
providing a breakdown of operational
risk into four categories based on its sources: 
(1) people, (2) processes, (3) systems, and (4) 
external factors. According to Barclays Bank, 
the major sources of operational risk include 
operational process reliability, IT security, out¬ 
sourcing of operations, dependence on key 
suppliers, implementation of strategic change, 
integration of acquisitions, fraud, error, cus¬ 
tomer service quality, regulatory compliance, 
recruitment, training and retention of staff, and 
social and environmental impacts. 6 

Large banks and financial institutions some¬ 
times prefer to use their own definition of 
operational risk. For example, Deutsche Bank 
defines operational risk as 
potential for incurring losses in relation to employees,
contractual specifications and documentation,
technology, infrastructure failure and disasters,
external influences and customer relationships. 7

The Bank of Tokyo-Mitsubishi defines opera¬ 
tional risk as "the risk of incurring losses that 
might be caused by negligence of proper opera¬ 
tional processing, or by incidents or misconduct 
by either officers or staffs." 8 

In October 2003, the U.S. Securities and Ex¬ 
change Commission (SEC) defined operational 
risk as: 

the risk of loss due to the breakdown of con¬ 
trols within the firm including, but not limited to, 
unidentified limit excesses, unauthorized trading, 
fraud in trading or in back office functions, inexpe¬ 
rienced personnel, and unstable and easily accessed 
computer systems. 9 

OPERATIONAL RISK 
EXPOSURE INDICATORS 

The probability of an operational risk event 
occurring increases with a larger number of 
personnel (due to increased possibility of com¬ 
mitting an error) and with a greater transaction 
volume. Examples of operational risk exposure 
indicators include: 10 

• Gross income.

• Volume of trades or new deals.

• Value of assets under management.

• Value of transactions.

• Number of transactions.

• Number of employees.

• Employees' years of experience.

• Capital structure (debt to equity ratio).

• Historical operational losses.

• Historical insurance claims for operational
losses.

For example, larger banks are more likely to
have larger operational losses. Shih, Samad-Khan,
and Medapa (2000) measured the dependence
between bank size and operational loss
amounts. They found that, on average, operational
losses scale with roughly the fourth root of
bank size: when they regressed log-losses on a
bank's log-size, the estimated coefficient was
approximately 0.25. In a different study,
Chapelle, Crama, Hübner, and Peters (2005)
estimated the coefficient to be 0.15.
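
To make the meaning of a log-log coefficient of 0.25 concrete, the following is a minimal sketch of such a regression. The data are synthetic and generated with a known elasticity; the sample, the noise level, and the helper name ols_slope_intercept are assumptions for illustration and do not reproduce the data or code of the studies cited above.

```python
import math
import random


def ols_slope_intercept(x, y):
    """Ordinary least squares for a single regressor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic sample (an assumption): loss = c * size**0.25 * noise,
# so log(loss) is linear in log(size) with slope 0.25.
rng = random.Random(0)
true_elasticity = 0.25
sizes = [10 ** rng.uniform(2, 6) for _ in range(2000)]
losses = [5.0 * s ** true_elasticity * math.exp(rng.gauss(0, 0.3)) for s in sizes]

log_size = [math.log(s) for s in sizes]
log_loss = [math.log(v) for v in losses]
elasticity, intercept = ols_slope_intercept(log_size, log_loss)

# A coefficient near 0.25 means that doubling bank size multiplies the
# predicted loss by about 2**0.25 rather than by 2.
doubling_factor = 2 ** elasticity
```

Read this way, the coefficient is an elasticity: losses grow with size, but much less than proportionally.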

CLASSIFICATION OF 
OPERATIONAL RISK 

Operational risk can be classified according to 

• The nature of the loss: internally inflicted or 
externally inflicted. 

• The impact of the loss: direct losses or indirect 
losses. 

• The degree of expectancy: expected or unex¬ 
pected. 

• Risk type, event type, and loss type. 

• The magnitude (or severity) of loss and fre¬ 
quency of loss. 

We discuss each one below. 

Internal versus External 
Operational Losses 

Operational losses can be either internally in¬ 
flicted or result from external sources. Inter¬ 
nally inflicted sources include most of the losses 
caused by human, process, and technology fail¬ 
ures, such as those due to human errors, internal 
fraud, unauthorized trading, injuries, business 
delays due to computer failures, or telecom¬ 
munication problems. External sources include 
man-made incidents such as external fraud, 
theft, computer hacking, terrorist activities, and 
natural disasters such as damage to physical as¬ 
sets due to hurricanes, floods, and fires. 

Many of the internal operational failures 
can be prevented with appropriate internal 
management practices; for example, tightened 
controls and management of the personnel can 
help prevent some employee errors and inter¬ 
nal fraud, and improved telecommunication 
networks can help prevent some technological 
failures. 

External losses are very difficult to prevent.
However, it is possible to design insurance or 
other hedging strategies to reduce or possibly 
eliminate externally inflicted losses. 

Direct versus Indirect Operational 
Losses 

Direct losses are the losses that directly arise 
from the associated events. For example,
incompetent currency trading can result in a
loss for the bank due to adverse exchange rate
movements. As another example, mistakenly
charging a client $50,000 instead of $150,000
results in a loss for the bank in the amount
of $100,000. The Basel II Capital Accord sets
guidelines regarding the estimation of the reg¬ 
ulatory capital charge by banks based only on 
direct losses. Table 1 identifies the Basel II Capi¬ 
tal Accord's categories and definitions of direct 
operational losses. 

Indirect losses are generally opportunity costs 
and the losses associated with the costs of fixing 
an operational risk problem, such as near-miss 
losses, latent losses, or contingent losses. 

Near-Miss Operational Losses 

Near-miss losses (or near-misses) are the esti¬ 
mated losses from those events that could po¬ 
tentially occur but were successfully prevented. 
The rationale behind including near-misses into
internal databases is as follows: The definition
of "risk" should not be solely based on the past
history of actual events but instead should be a 
forward-looking concept and include both ac¬ 
tual and potential events that could result in 
material losses. The mere fact that a loss was 
prevented in the past (be it by luck or by con¬ 
scious managerial action) does not guarantee 
that it will be prevented in the future. There¬ 
fore, near-misses signal flaws in a bank's in¬ 
ternal system and should be accounted for in 
internal models. It is also possible to view near- 
misses from quite the opposite perspective: The 
ability to prevent these losses before they hap¬ 
pen demonstrates the bank's effective opera¬ 
tional risk management practices. Therefore, 
the losses that would result had these events 
taken place should not be included in the inter¬ 
nal databases. 

Muermann and Oktem (2002, p. 30) define 
near-miss as: 

an event, a sequence of events, or an observation 
of unusual occurrences that possesses the potential 
of improving a system's operability by reducing the 
risk of upsets some of which could eventually cause 
serious damage. 

They assert that internal operational risk mea¬ 
surement models must include adequate man¬ 
agement of near-misses. 

Table 1 Direct Loss Types and Their Definitions According to the Basel II Capital Accord

Loss Type                     Contents

Write-downs                   Direct reduction in value of assets due to theft, fraud,
                              unauthorized activity, or market and credit losses arising as a
                              result of operational events
Loss of recourse              Payments or disbursements made to incorrect parties and not recovered
Restitution                   Payments to clients of principal and/or interest by way of
                              restitution, or the cost of any other form of compensation paid to
                              clients
Legal liability               Judgements, settlements, and other legal costs
Regulatory and compliance     Taxation penalties, fines, or the direct cost of any other
                              penalties, such as license revocations
Loss of or damage to assets   Direct reduction in value of physical assets, including
                              certificates, due to an accident, such as neglect, fire, and
                              earthquake

Source: BIS (2001a), p. 23, with modifications.

Muermann and Oktem propose developing
a pyramid-type three-level structure for
the near-miss management system: corporate
level, branch level, and individual level. At the
corporate level within every bank, they propose
establishing a Near-Miss Management Strategic
Committee whose primary functions would
include:

* Establishing guidelines for corporate and site 
near-miss structures. 

* Developing criteria for classification of near- 
misses. 

* Establishing prioritizing procedures for each 
near-miss class. 

* Auditing the near-miss system. 

* Integrating quality and other management 
tools into near-miss management practice. 

* Identifying gaps in the near-miss manage¬ 
ment structure based on analysis of incidents 
with higher damage (beyond near-misses) 
and taking corrective actions. 

* Developing guidelines for training site
management and employees on the near-miss
system.

At the branch level, they propose establish¬ 
ing a Near-Miss Management Council for ev¬ 
ery business unit. The key responsibilities of 
the council would include: 

* Adapting criteria set by Near-Miss Man¬ 
agement Strategic Committee to the branch 
practices. 

* Monitoring site near-miss practices. 

* Promoting the program. 

* Ensuring availability of necessary resources 
for analysis and corrective action, especially 
for high priority near-misses. 

* Periodically analyzing reported near-misses 
for further improvement of the system. 

* Training employees on near-miss implementation.

Finally, a successful near-miss management 
system relies on the individual actions by 
managers, supervisors, and employees. Appro¬ 
priate training is necessary to recognize op¬ 
erational issues before they become a major 
problem and develop into operational losses for 
the bank. 


Expected versus Unexpected 
Operational Losses 

Some operational losses are expected, some are 
not. The expected losses are generally those that 
occur on a regular (such as every day) basis, 
such as minor employee errors and minor credit 
card fraud. Unexpected losses are those losses 
that generally cannot be easily foreseen, such 
as terrorist attacks, natural disasters, and large- 
scale internal fraud. 

Operational Risk Type, Event Type, 
and Loss Type 

Confusion arises in the operational risk liter¬ 
ature because of the distinction between risk 
type (or hazard type), event type, and loss 
type. When banks record their operational loss 
data, it is crucial to record it separately ac¬ 
cording to event type and loss type, and cor¬ 
rectly identify the risk type. 11 The distinction 
between the three is comparable to cause and 
effect: 12 

• Hazard constitutes one or more factors that 
increase the probability of occurrence of an 
event. 

• Event is a single incident that leads directly to 
one or more effects (e.g., losses). 

• Loss constitutes the amount of financial dam¬ 
age resulting from an event. 

Thus, hazard potentially leads to event, and 
event is the cause of loss. Therefore, an event 
is the effect of a hazard while loss is the effect 
of an event. 

Figure 1 illustrates the mechanism of operational
loss occurrence. The following example,
adapted from Mori and Harada (2001), further
illustrates how the correct identification of the 
"event type" is critical in determining whether 
a loss of a particular "loss type" is attributed to 
market, credit, or operational risk. 

Consider the following example:

• A reduction in the value of a bond due to a
change in the market price.

• A reduction in the value of a bond due to the
bankruptcy of the issuer.

• A reduction in the value of a bond due to a
delivery failure.

In this example, the write-down of the bond
(the loss type) belongs to the scope of market
risk, credit risk, and operational risk, respectively.
Accurate documentation of operational
risk by the type of hazard, event, and loss is also
essential for an understanding of operational
risk.

Examples of hazard types:

• Inadequate employee management

• Obsolete computer systems

• Inexperienced personnel

• Large transaction volumes

• Diversity and cultural differences

• Unfavorable climate conditions or geographical location

• Other

Examples of event types:

• Internal fraud (e.g., unauthorized trading, forgery, theft)

• External fraud (e.g., credit card fraud)

• Diversity/discrimination events

• Improper business and market practices

• Failed/inaccurate reporting

• System failure

• Natural disasters

• Other

Loss type categories:

• Write-downs

• Loss of recourse

• Restitution

• Legal liability

• Regulatory and compliance (e.g., fines and taxation penalties)

• Loss of or damage to physical assets

• Other

Figure 1 The Process of Operational Loss Occurrence
Source: Mori and Harada (2001), p. 3, with modifications.

The Basel II Capital Accord classifies operational
risk into seven event-type groups (see
Table 2) and six operational loss types (see
Table 1).

Table 2 Operational Risk Event Types and Their Descriptions According to the Basel II Capital Accord

Event Type                  Definition and Categories

1. Internal Fraud           Acts intended to defraud, misappropriate property, or circumvent
                            regulations, the law, or company policy, which involves at least one
                            internal party. Categories: (1) unauthorized activity and (2) theft
                            and fraud.
2. External Fraud           Acts of a type intended to defraud, misappropriate property, or
                            circumvent the law, by a third party. Categories: (1) theft and fraud
                            and (2) systems security.
3. Employment Practices     Acts inconsistent with employment, health, or safety laws or
   and Workplace Safety     agreements, from payment of personal injury claims, or from
                            diversity/discrimination events. Categories: (1) employee relations,
                            (2) safe environment, and (3) diversity and discrimination.
4. Clients, Products, and   Unintentional or negligent failure to meet a professional obligation
   Business Practices       to specific clients (including fiduciary and suitability
                            requirements), or from the nature or design of a product. Categories:
                            (1) suitability, disclosure, and fiduciary, (2) improper business or
                            market practices, (3) product flaws, (4) selection, sponsorship, and
                            exposure, and (5) advisory activities.
5. Damage to Physical       Loss or damage to physical assets from natural disaster or other
   Assets                   events. Categories: disasters and other events.
6. Business Disruption      Disruption of business or system failures. Categories: systems.
   and System Failures
7. Execution, Delivery,     Failed transaction processing or process management, from relations
   and Process              with trade counterparties and vendors. Categories: (1) transaction
   Management               capture, execution, and maintenance, (2) monitoring and reporting,
                            (3) customer intake and documentation, (4) customer/client account
                            management, (5) trade counterparties, and (6) vendors and suppliers.

Source: BIS (2001b), pp. 21-23.

Operational Loss Severity and 
Frequency 

We have already stated that expected losses 
generally refer to the losses of low severity (or 
magnitude) and high frequency. Generalizing 
this idea, operational losses can be broadly clas¬ 
sified into four main groups: 

1. Low frequency/low severity.

2. High frequency/low severity. 

3. High frequency/high severity. 

4. Low frequency/high severity. 

The idea is illustrated in the top half of 
Figure 2. 

According to Samad-Khan (2005), the third
group is implausible. More precisely, he suggests
classifying each of the frequency and
severity of operational losses into three groups:
low, medium, and high. This creates a 3 × 3 matrix
of all possible "frequency/severity" combinations.
He states that "medium frequency/high
severity," "high frequency/medium
severity," and "high frequency/high severity"
losses are unrealistic.

Recently, the financial industry also agreed
that the first group is not feasible. Therefore, the
two remaining categories of operational losses
that the financial industry needs to focus on
are "high frequency/low severity" and "low
frequency/high severity" losses. The idea is
illustrated in the bottom half of Figure 2.

"High frequency/low severity" losses
are relatively unimportant for an institution and
can often be prevented. The greatest damage
comes from "low frequency/high severity"
losses. Banks must be particularly attentive to
these losses because they have the greatest financial
consequences for the institution, including
potential bankruptcy. 13 Just a few such events
may result in bankruptcy or a significant decline
in the market value of the bank. Therefore, it
is critical for banks to be able to capture such
losses in their internal risk models.

Figure 2 Classification of Operational Risk by Frequency and Severity: Unrealistic View (top) and
Realistic View (bottom)

KEY POINTS 

• Financial institutions bear various opera¬ 
tional losses on a daily basis. Examples are 
losses resulting from employee errors, inter¬ 
nal and external fraud, equipment failures, 
business disruptions due to natural disasters, 
and vandalism. 

• Credit risk and market risk had been per¬ 
ceived as the two biggest sources of risk for fi¬ 
nancial institutions. Operational risk has been 
regarded as a mere part of "other" risks. Fail¬ 
ures of major financial entities have made 
market participants aware of the importance 
of this risk. 

• Operational risk is the risk of loss resulting from
inadequate or failed internal processes, people,
or systems or from external events. This
definition identifies operational risk as coming
from four major causes: processes, people,
systems, and external factors.

• Operational risk can be classified according 
to several principles: nature of the loss (inter¬ 
nally inflicted or externally inflicted), direct 
losses or indirect losses, degree of expectancy 
(expected or unexpected), risk type, event 
type or loss type, and by the magnitude (or 
severity) of loss and the frequency of loss. 

• Operational risk can be the cause of repu¬ 
tational risk, a risk that can occur when the 
market reaction to an operational loss event 
results in reduction in the market value of a 
bank that is greater than the amount of the 
initial loss. 

NOTES 

1. However, operational risk is not entirely
idiosyncratic. Two recent studies, Allen
and Bali (2007) and Chernobai, Jorion, and
Yu (2011), found evidence of the effect of
macroeconomic factors on operational risk
in banks.

2. See BIS (1998). 

3. See Jorion (2000). 

4. See King (2001). 

5. See Crouhy, Galai, and Mark (2001). 

6. See Barclays Bank Annual Report 2004, 
Form 20-F/A. 

7. Deutsche Bank 2005 Annual Report, p. 45. 

8. Bank of Tokyo-Mitsubishi Financial Perfor¬ 
mance, Form 20-F (2005), p. 124. 

9. "Supervised Investment Bank Holding 
Companies," SEC (2003), p. 62914. 

10. Examples of operational risk exposure in¬ 
dicators are given in BIS (2001a, Annex 4), 
Haubenstock (2003), and Allen, Boudoukh, 
and Saunders (2004). 

11. See the discussion of this issue in Mori and 
Harada (2001) and Alvarez (2002). 

12. See Mori and Harada (2001). 

13. The events that incur such losses are often 
called the "tail events." 

Abstract: In general terms, operational risk is the risk of loss resulting from inadequate or failed
internal processes, people, or systems or from external events. The models that have been proposed 
for assessing operational risk can be broadly classified into top-down models and bottom-up 
models. Top-down approaches quantify operational risk without attempting to identify the events 
or causes of losses. Bottom-up models quantify operational risk on a micro level being based on 
identified internal events. The obstacle hindering the implementation of these models is the scarcity 
of available historical operational loss data. 


Identifying the core principles that underlie the 
operational risk process is the fundamental build¬ 
ing block in deciding on the optimal model to 
be used. In this entry we provide an overview 
of models that have been put forward for the 
assessment of operational risk. These models 
are broadly classified into top-down models and 
bottom-up models. 

Operational risk is distinct from credit risk 
and market risk, posing difficulties of imple¬ 
mentation of the Basel II guidelines and strate¬ 
gic planning. We discuss some key aspects that 
distinguish operational risk from credit risk and 
market risk. They are related to the arrival process
of loss events, the loss severity, and the dependence
structure of operational losses across
a bank's business units. Finally, in this entry,
we reconsider the normality assumption—an 
assumption often made in modeling financial 
data—and question its applicability for the pur¬ 
pose of operational risk modeling. 

OPERATIONAL RISK 
MODELS 

Broadly speaking, operational risk models stem 
from two fundamentally different approaches: 
(1) the top-down approach, and (2) the bottom- 
up approach. Figure 1 illustrates a possible cat¬ 
egorization of quantitative models. 

Figure 1 Topology of Operational Risk Models

Top-down approaches quantify operational
risk without attempting to identify the events
or causes of losses. 1 That is, the losses are
simply measured on a macro basis. The principal
advantage of this approach is that little
effort is required for collecting data and evaluating
operational risk. Bottom-up approaches
quantify operational risk on a micro level,
based on identified internal events, and
this information is then incorporated into the 
overall capital charge calculation. The advan¬ 
tage of bottom-up approaches over top-down 
approaches lies in their ability to explain the 
mechanism of how and why operational risk is 
formed within an institution. Banks can either 
start with top-down models and use them as a 
temporary tool to estimate the capital charge 
and then slowly shift to the more advanced 
bottom-up models, or they can adopt bottom- 
up models from the start, provided that they 
have robust databases. 


Models Based on Top-Down 
Approaches 

In this section we will provide a brief look 
at the seven top-down approaches shown in 
Figure 1. 2

Multifactor Equity Pricing Models

Multifactor equity pricing models, also referred
to as multifactor models, can be utilized to per¬ 
form a global analysis of banking risks and may 
be used for the purpose of integrated risk man¬ 
agement, in particular for publicly traded firms. 
The stock return process $R_t$ can be estimated by
regressing stock return on a large number of
external risk factor indexes $I_t$ related to market
risk, credit risks, and other nonoperational
risks (such as interest rate fluctuations, stock
price movements, and macroeconomic effects).
Operational risk is then measured as the volatility
of the residual term. Such models rely on the
assumption that operational risk is the residual
banking risk, after credit and market risks are
accounted for. 3

$$R_t = a_t + b_1 I_{1t} + \cdots + b_n I_{nt} + \varepsilon_t$$

in which $\varepsilon_t$ is the residual term, a proxy for
operational risk.
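
The residual-volatility idea can be sketched in a few lines. The example below regresses simulated stock returns on two external factor indexes and takes the standard deviation of the regression residual as the operational risk proxy. Everything in it is an assumption made for illustration: the two factors, their loadings, a constant intercept in place of a time-varying one, and the helper names solve_linear_system and ols.

```python
import random
import statistics


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated data (assumptions): two factor indexes and a known residual volatility.
rng = random.Random(1)
n_obs = 1500
market = [rng.gauss(0, 0.02) for _ in range(n_obs)]
credit = [rng.gauss(0, 0.01) for _ in range(n_obs)]
residual_sd = 0.005
returns = [0.0005 + 1.1 * m + 0.6 * c + rng.gauss(0, residual_sd)
           for m, c in zip(market, credit)]

# Each design row is [1, I_1t, I_2t]; the leading 1 carries the intercept.
design = [[1.0, m, c] for m, c in zip(market, credit)]
coefs = ols(design, returns)
fitted = [sum(b * x for b, x in zip(coefs, row)) for row in design]
residuals = [r - f for r, f in zip(returns, fitted)]

# Volatility of the residual term: the top-down proxy for operational risk.
operational_risk_proxy = statistics.pstdev(residuals)
```

The proxy absorbs whatever the chosen factors fail to explain, so any omitted nonoperational factor inflates it. That is the practical weakness of treating operational risk as a residual.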

This approach relies on the widely known efficient
market hypothesis introduced by Fama (1970),
which states that in efficient
capital markets all relevant information, whether
past, public, or private, is reflected in
current asset prices.

Capital Asset Pricing Model 

Under the capital asset pricing model (CAPM)
approach all risks are assumed to be measurable
by the CAPM and represented by beta
(β). CAPM, developed by Sharpe (1964), is an
equilibrium model that describes the pricing
of assets. It concludes that the expected security
risk premium (i.e., expected return on security
minus the risk-free rate of return) equals
beta times the expected market risk premium
(i.e., expected return on the market minus the
risk-free rate of return).

Under the CAPM approach, operational risk 
is obtained by measuring market, credit, and 
other risks' betas and deducting them from the 
total beta. With respect to applications to opera¬ 
tional risk, the CAPM approach was discussed 
by Hiwatashi and Ashida (2002) and van den 
Brink (2002). According to van den Brink (2002), 
the CAPM approach has some limitations and 
so has not received a wide recognition for oper¬ 
ational risk, but was in the past considered by 
Chase Manhattan Bank. 
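
In symbols, the two statements above can be written as follows. The notation is an assumption introduced here for illustration: $R_i$ and $R_M$ denote the returns on the security and on the market, $r_f$ the risk-free rate, and the subscripted betas the contributions attributed to each risk class.

```latex
% CAPM: expected security risk premium equals beta times the expected market risk premium
E[R_i] - r_f = \beta_i \left( E[R_M] - r_f \right)

% Operational risk beta obtained by deduction from the total beta
\beta_{\text{operational}} = \beta_{\text{total}} - \beta_{\text{market}} - \beta_{\text{credit}} - \beta_{\text{other}}
```

The second line makes the residual nature of the estimate explicit: any error in the market, credit, or other betas passes directly into the operational risk beta.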

Income-Based Models 

Income-based models resemble the multifactor 
equity pricing models: Operational risk is estimated
as the residual variance by extracting
market, credit, and other risks from the his¬ 
torical income (or earnings) volatility. Income- 
based models are described by Allen, 
Boudoukh, and Saunders (2004), who refer to 
these models as earnings at risk models and 
by Hiwatashi and Ashida (2002), who refer to 
them as the volatility approach. According to 
Cruz (2002), the profit and loss (P&L) volatil¬ 
ity in a financial institution is attributed 50%, 
15%, and 35% to credit risk, market risk, and 
operational and other risks, respectively. 

Expense-Based Models 

Expense-based models measure operational 
risk as fluctuations in historical expenses rather 
than income. The unexpected operational losses 
are captured by the volatility of direct expenses 
(as opposed to indirect expenses, such as op¬ 
portunity costs, reputational risk, and strategic 
risk, that are outside the agreed scope of opera¬ 
tional risk), adjusted for any structural changes 
within the bank. 

Operating Leverage Models 

Operating leverage models measure the relationship
between operating expenses and total
assets. Operating leverage is measured as a
weighted combination of a fraction of fixed assets
and a portion of operating expenses. Examples
of calculating the operating leverage amount
per business line include taking 10% of fixed
assets plus 25% times three months' operating
expenses for a particular business, or taking 2.5
times the monthly fixed expenses. 4
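
The two rules of thumb quoted above reduce to simple arithmetic. The sketch below applies them to a hypothetical business line. The balance-sheet figures and the function names are assumptions for illustration; only the weights (10%, 25%, three months, 2.5 times) come from the text.

```python
def operating_leverage_amount(fixed_assets, monthly_operating_expenses,
                              asset_weight=0.10, expense_weight=0.25, months=3):
    """10% of fixed assets plus 25% of three months' operating expenses."""
    return (asset_weight * fixed_assets
            + expense_weight * months * monthly_operating_expenses)


def fixed_expense_multiple(monthly_fixed_expenses, multiple=2.5):
    """Alternative rule: 2.5 times the monthly fixed expenses."""
    return multiple * monthly_fixed_expenses


# Hypothetical business line (all figures are assumptions).
amount_a = operating_leverage_amount(fixed_assets=40_000_000,
                                     monthly_operating_expenses=2_000_000)
# 10% of 40 million is 4 million; 25% of three months at 2 million is
# 1.5 million; together about 5.5 million.

amount_b = fixed_expense_multiple(monthly_fixed_expenses=1_200_000)
# 2.5 times 1.2 million is about 3 million.
```

Both rules depend only on accounting aggregates, which is why operating leverage models count as top-down: no individual loss event enters the calculation.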

Scenario Analysis and Stress Testing Models 

Scenario analysis and stress testing models can 
be used for testing the robustness properties 
of loss models, in monetary terms, in the pres¬ 
ence of potential events that are not part of 
banks' actual internal databases. These mod¬ 
els, also called expert judgment models by van 
den Brink (2002), are estimated based on the 
"what if" scenarios generated with reference 
to expert opinion, external data, catastrophic 
events that occurred in other banks, or imagi¬ 
nary high-magnitude events. Experts estimate 
the expected risk amounts and their associated 
probabilities of occurrence. For any particular 
bank, examples of scenarios include: 5 

* Bank's inability to reconcile a new settlement 
system with the original system. 

* A class action suit alleging incomplete 
disclosure. 

* Massive technology failure. 

* Large-scale unauthorized trading (for example,
adding the total loss borne by Barings
Bank preceding its collapse into the database,
and reevaluating the model).

* Doubling the bank's maximum historical loss 
amount. 

Additionally, stress tests can be used to see the 
likely increase in risk exposure due to removing 
a control or reduction in risk exposure due to 
tightening of controls. 

Risk Indicator Models 

Risk indicator models rely on a number (one or
more) of operational risk exposure indicators
to track operational risk. In the operational risk
literature, risk indicator models are also called
indicator approach models, 6 risk profiling
models, 7 and peer-group comparison. 8 A necessary
aspect of such models is testing for possible
correlations between risk factors. These
models assume that there is a direct and significant
relationship between the indicators
and target variables. For example, Taylor and
Hoffman (1999) illustrate how training expenditure
has an inverse effect on the number of
employee errors and customer complaints, and
Shih, Samad-Khan, and Medapa (2000) illustrate
how a bank's size relates to the operational
loss amount.

Risk indicator models may rely on a single 
indicator or multiple indicators. The former 
model is called the single-indicator approach; 9 
an example of such a model is the Basic Indicator
Approach for quantification of the operational
risk regulatory capital, proposed under
Basel II. The latter model is called the
multi-indicator approach; an example of such a model
is the Standardized Approach.
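
The contrast between the single-indicator and multi-indicator approaches can be sketched as follows. The percentages (a 15% alpha, and business-line betas of 12%, 15%, and 18%) are those stated in the Basel II framework rather than in this entry, so treat them as assumptions to be checked against the Accord. The dictionary keys, function names, and income figures are hypothetical.

```python
# Percentages are taken from the Basel II framework text, not from this entry.
ALPHA = 0.15

BETA_BY_BUSINESS_LINE = {
    "corporate_finance": 0.18,
    "trading_and_sales": 0.18,
    "retail_banking": 0.12,
    "commercial_banking": 0.15,
    "payment_and_settlement": 0.18,
    "agency_services": 0.15,
    "asset_management": 0.12,
    "retail_brokerage": 0.12,
}


def basic_indicator_capital(annual_gross_income, alpha=ALPHA):
    """Single indicator: alpha times the average gross income,
    taken over the years in which gross income was positive."""
    positive = [gi for gi in annual_gross_income if gi > 0]
    if not positive:
        return 0.0
    return alpha * sum(positive) / len(positive)


def standardized_capital(gross_income_by_year):
    """Multiple indicators: beta-weighted gross income per business line,
    floored at zero within each year, then averaged over the years supplied."""
    yearly_charges = []
    for year in gross_income_by_year:
        charge = sum(BETA_BY_BUSINESS_LINE[line] * gi for line, gi in year.items())
        yearly_charges.append(max(charge, 0.0))
    return sum(yearly_charges) / len(yearly_charges)


# Hypothetical bank, gross income in millions over three years.
bia = basic_indicator_capital([100.0, 120.0, -30.0])
tsa = standardized_capital([
    {"retail_banking": 100.0, "trading_and_sales": 50.0},
    {"retail_banking": 110.0, "trading_and_sales": -20.0},
    {"retail_banking": 90.0, "trading_and_sales": 40.0},
])
```

In both functions gross income is the only exposure indicator. The multi-indicator version differs only in weighting income by business line.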

Models Based on Bottom-Up 
Approaches 

An ideal internal operational risk assess¬ 
ment procedure would be to use a balanced 
approach, and include both top-down and 
bottom-up elements in the analysis. 10 For 
example, scenario analysis can prove effec¬ 
tive for backtesting purposes, and multifactor 
causal models are useful in performing opera¬ 
tional Value-at-Risk (VaR) sensitivity analysis. 
Bottom-up approach models can be categorized 
into three groups: 11 process-based models, 
actuarial-type models (or statistical models), 
and proprietary models. 

Process-Based Models 

There are three types of process-based models:
(1) causal models and Bayesian belief networks,
(2) reliability models, and (3) multifactor causal
models. We describe each below.

The first group of process-based models is the
causal models and Bayesian belief networks.
Also called causal network models, causal
models are subjective self-assessment models.
Causal models form the basis of the scorecard 
models. 12 These models split banking activities 
into simple steps; for each step, bank manage¬ 
ment evaluates the number of days needed to 
complete the step, the number of failures and 
errors, and so on, and then records the results 
in a "process map" (or scorecards) in order 
to identify potential weak points in the opera¬ 
tional cycle. Constructing associated event trees 
that detect a sequence of actions or events that 
may lead to an operational loss is part of the 
analysis. 13 For each step, bank management es¬ 
timates a probability of its occurrence, called 
the subjective (or prior) probability. The ulti¬ 
mate event's probability is measured by the 
posterior probability. Prior and posterior prob¬ 
abilities can be estimated using the Bayesian 
belief networks. 14 A variation of the causal 
models, connectivity models, focuses on the ex 
ante cause of operational loss event, rather than 
the ex post effect. 

The second group of process-based models 
encompasses reliability models. These models 
are based on the frequency distribution of the 
operational loss events and their interarrival 
times. Reliability models focus on measuring 
the likelihood that a particular event will occur 
at some point or interval of time. We discuss 
this model below. 

If $f(t)$ is the density of a loss amount occurring
at time $t$, then the reliability of the system is the
probability of survival up to time $t$, denoted by
$R(t)$ and calculated as

$$R(t) = 1 - \int_0^t f(s)\,ds$$

The hazard rate (or the failure rate), $h(t)$, is the
rate at which losses occur per unit of time $t$,
defined as

$$h(t) = \frac{f(t)}{R(t)}$$

In practical applications, it is often convenient
to use the Poisson-type arrival model to describe
the occurrence of operational loss events.
Under the simple Poisson model with the
intensity rate $\lambda$ (which represents the average
number of events per unit of time), the interarrival
times between the events (i.e., the time
intervals between any two consecutive points
of time in which an event takes place) follow an
exponential distribution having density of form
$f(t) = \lambda e^{-\lambda t}$ with mean interarrival time equal to
$1/\lambda$. The parameter $\lambda$ is then the hazard rate for
the simple Poisson process.
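
The properties stated above can be checked numerically. The sketch below writes the intensity as lambda, draws exponential interarrival times, and compares the sample mean with 1/lambda and the empirical survival frequency with the reliability function. The hazard rate is taken as the ratio of the density to the reliability, which is the standard definition and is assumed here. The intensity value, sample size, and function names are illustrative assumptions.

```python
import math
import random


def simulate_interarrival_times(intensity, n_events, rng):
    """Draw exponential waiting times between consecutive loss events."""
    return [rng.expovariate(intensity) for _ in range(n_events)]


def reliability(t, intensity):
    """R(t) = 1 - integral of f over [0, t], which is exp(-intensity * t)
    when f is the exponential density."""
    return math.exp(-intensity * t)


def hazard_rate(t, intensity):
    """h(t) = f(t) / R(t); for the exponential density this is constant."""
    density = intensity * math.exp(-intensity * t)
    return density / reliability(t, intensity)


rng = random.Random(42)
intensity = 4.0  # hypothetical: four loss events per unit of time on average
gaps = simulate_interarrival_times(intensity, 50_000, rng)

mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / intensity
empirical_survival = sum(g > 0.5 for g in gaps) / len(gaps)
theoretical_survival = reliability(0.5, intensity)
```

Because hazard_rate returns the same value for every t, the simple Poisson model implies that a loss is no more likely after a long quiet period than after a recent event. That assumption is worth testing against a bank's own data.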

Finally, the third group of process-based 
models is multifactor causal models. These 
models can be used for performing the fac¬ 
tor analysis of operational risk. These are 
regression-type models that examine the sensi¬ 
tivity of aggregate operational losses (or, alter¬ 
natively, VaR) to various internal risk factors (or 
risk drivers). Multifactor causal models have 
been discussed in the VaR and operational risk 
literature. 15 Examples of control factors include 
system downtime in minutes per day, number 
of employees in the back office, data quality 
(such as the ratio of the number of transac¬ 
tions with no input errors to the total number of 
transactions), total number of transactions, skill 
levels, product complexity, level of automation, 
customer satisfaction, and so on. Cruz (2002) 
suggests using manageable explanatory factors. 
In multifactor causal models, operational losses
$OR_t$, or VaR, in a particular business unit at a
point $t$, are regressed on a number of control
factors:

$$OR_t = a_t + b_1 X_{1t} + \cdots + b_n X_{nt} + \varepsilon_t$$

where $X_k$, $k = 1, 2, \ldots, n$, are the explanatory
variables, and the $b$'s are the estimated coefficients.
The model is forward-looking (or ex ante) as 
operational risk drivers are predictive of fu¬ 
ture losses. Extensions to the simple regression 
model may include autoregressive mod¬ 
els, regime-switching models, ARMA/GARCH 
models, and others. 

Actuarial Models 

Actuarial models (or statistical models) are generally
parametric statistical models. They have
two key components: (1) the loss frequency and
(2) the loss severity distributions of the historic
operational loss data. Operational risk capital
is measured by the VaR of the aggregated one-year
losses. 16

For the frequency of the loss data it is com¬ 
mon to assume a Poisson process, with possible 
generalizations, such as a Cox process. 

Actuarial models can differ by the type of 
the loss distribution. Empirical loss distribu¬ 
tion models do not specify a particular class of 
loss distributions, but directly utilize the em¬ 
pirical distribution derived from the historic 
data. Parametric loss distribution models make 
use of a particular parametric distribution for 
the losses (or part of them), such as lognormal, 
Weibull, Pareto, and so on. Models based on 
extreme value theory (EVT) restrict attention 
to the tail events (i.e., the losses in the upper 
quantiles of the severity distribution), and VaR 
or other analyses are carried out upon fitting 
the generalized Pareto distribution to the data 
beyond a fixed high threshold. Van den Brink 
suggests using all three models simultaneously; 
Figure 2, inspired by his discussions, illustrates 
possible approaches. Yet another possibility is 
to fit an ARMA/GARCH model to the losses be¬ 
low a high threshold and the generalized Pareto 
distribution to the data exceeding it. 
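
The two-component structure of an actuarial model can be sketched by simulation. The code below assumes, purely for illustration, a Poisson frequency and a lognormal severity with hypothetical parameter values, simulates aggregated one-year losses, and reads off an empirical 99.9% quantile as the VaR. It is a sketch of the general idea, not a calibrated capital model.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw a Poisson count by Knuth's multiplication method (fine for small lam)."""
    limit, count, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_annual_losses(rng, lam, mu, sigma, n_years):
    """Aggregate one-year losses: Poisson(lam) events, lognormal(mu, sigma) sizes."""
    totals = []
    for _ in range(n_years):
        n_events = poisson_draw(rng, lam)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return totals

def empirical_quantile(values, level):
    """Smallest observation whose empirical CDF is at least `level`."""
    ordered = sorted(values)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

# Hypothetical parameters: 20 loss events per year on average.
rng = random.Random(42)
annual = simulate_annual_losses(rng, lam=20.0, mu=10.0, sigma=2.0, n_years=20000)
expected_loss = sum(annual) / len(annual)
var_999 = empirical_quantile(annual, 0.999)
print(f"expected loss {expected_loss:,.0f}, 99.9% VaR {var_999:,.0f}")
```

With a heavy-tailed severity, the simulated VaR lies far above the expected loss, which is why the choice of the severity distribution matters so much for the capital figure.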



Figure 2 An Example of a Histogram of the
Operational Loss Severity Distribution (body of the
distribution: empirical distribution, parametric
distribution, combination of both, or other distribution;
tail: generalized Pareto distribution)

Proprietary Models

Proprietary models for operational risk have 
been developed by major financial service com¬ 
panies and use a variety of bottom-up and 
top-down quantitative methodologies, as well 
as qualitative analysis, to evaluate operational 
risk. Banks can input their loss data into ready 
and systematized spreadsheets, which would 
be further categorized. The system then per¬ 
forms a qualitative and quantitative analysis of 
the data, and can carry out multiple tasks such 
as calculating regulatory capital, pooling in¬ 
ternal data with external, performing Bayesian 
network analysis, and so on. 


SPECIFICS OF OPERATIONAL
LOSS DATA

The nature of operational risk is very different 
from that of market risk and credit risk. In fact, 
operational losses share many similarities with 
insurance claims, suggesting that most actuar¬ 
ial models can be a natural choice of the model 
for operational risk, and models well developed 
by the insurance industry can be almost exactly 
applied to operational risk. In this section we 
discuss some key issues characterizing opera¬ 
tional risk that must be taken into consideration 
before quantitative analysis is undertaken. 

Scarcity of Available Historical Data 

The major obstacle banks face in developing 
comprehensive models for operational risk is 
the scarcity of available historical operational 
loss data. As of 2011, generally, even the largest 
banks have no more than 11-12 years of loss 
data. Shortage of relevant data means that the 
models and conclusions drawn from the avail¬ 
able limited samples would lack sufficient ex¬ 
planatory power. This in turn means that the 
estimates of the expected loss and VaR may be 
highly volatile and unreliable. In addition, com¬ 
plex statistical or econometric models cannot be 
tested on small samples. 


The problem becomes amplified when deal¬ 
ing with modeling extremely high operational 
losses: One cannot model tail events when only 
a few such data are present in the internal loss 
database. Three solutions have been proposed: 
(1) pooling internal and external data, (2) sup¬ 
plementing actual losses with near-miss losses, 
and (3) scenario analysis and stress tests (dis¬ 
cussed earlier in this entry). 

The idea behind pooling internal and exter¬ 
nal data is to populate a bank's existing internal 
database with data from outside the bank. The 
rationale is twofold: (1) to expand the database 
and hence increase the accuracy of statistical es¬ 
timations and (2) to account for losses that have 
not occurred within the bank but that are not 
completely improbable based on the histories 
of other banks. According to BIS, 

... a bank's internal measurement system must 
reasonably estimate unexpected losses based on the 
combined use of internal and relevant external loss 
data... (BIS, 2006, p. 150) 

Baud, Frachot, and Roncalli (2002) propose a 
statistical methodology to pool internal and ex¬ 
ternal data. Their methodology accounts for the 
fact that external data are truncated from below 
(banks commonly report their loss data to exter¬ 
nal parties in excess of $1 million) and that bank 
size may be correlated with the magnitudes of 
losses. They showed that pooling internal and 
external data may help avoid underestimation 
of the capital charge. 

Data Arrival Process 

One of the difficulties that arise with modeling 
operational losses has to do with the irregular 
nature of the event arrival process. In market 
risk models, market positions are recorded on a 
frequent basis, many times daily depending on 
the entity, by marking to market. Price quotes 
are available daily or for those securities that 
are infrequently traded, model-based prices are 
available for marking a position to market. As 
for credit risk, credit ratings by rating agen¬ 
cies are available. In addition, rating agencies 
provide credit watches to identify credits that
are candidates for downgrades. In contrast, op¬ 
erational losses occur at irregular time intervals 
suggesting a process of a discrete nature. This 
makes it similar to the reduced-form models 
for credit risk, in which the frequency of de¬ 
fault (i.e., failure to meet a credit agreement) is 
of nontrivial concern. Hence, while in market 
risk we need to model only the return distribu¬ 
tion in order to obtain VaR, in operational risk 
both loss severity and frequency distributions 
are important. 

Another problem is related to timing and data 
recording issues. In market and credit risk mod¬ 
els, the impact of a relevant event is almost 
immediately reflected in the market and credit 
returns. In an ideal scenario, banks would know 
how much of the operational loss would be 
borne by the bank from an event at the very mo¬ 
ment the event takes place and would record the 
loss at this moment. However, from the practi¬ 
cal point of view, this appears nearly impos¬ 
sible to implement, because it takes time for 
the losses to accumulate after an event takes 
place. Therefore, it may take days, months, or 
even years for the full impact of a particular loss 
event to be evaluated. Hence, there is the prob¬ 
lem of discrepancy (i.e., a time lag) between the 
occurrence of an event and the time at which 
the incurred loss is being recorded. 

This problem directly affects the method in 
which banks choose to record their operational 
loss data. When banks record their operational 
loss data, they record (1) the amount of loss, 
and (2) the corresponding date. We can identify 
three potential scenarios for the types of date 
banks might use: 17 

1. Date of occurrence: the date on which the event 
that has led to operational losses actually 
took place. 

2. Date on which the existence of the event has been
identified: the date when bank authorities realize
that an event that has led to operational
losses has taken or is continuing to take place.
Recording a loss at this date may be relevant
in cases when the true date of occurrence is
impossible or hard to track.

3. Accounting date: the date on which the total
amount of operational losses due to a past
event is realized and fully measured, and
the state of affairs of the event is closed or
assumed closed.

Depending on which of the three date types is 
used, the models for operational risk and con¬ 
clusions drawn from them may be considerably 
different. For example, in the third case of ac¬ 
counting dates, we are likely to observe cyclical¬ 
ity/seasonal effects in the time series of the loss 
data (for example, many loss events would be 
recorded around the end of December), while in 
the first and second cases such effects are much 
less likely to be present in the data. Fortunately, 
however, selection of the frequency distribution 
does not have a serious impact on the resulting 
capital charge. 18 

Loss Severity Process 

There are three main problems that operational 
risk analysts must be aware of with respect to 
the severity of operational loss data: (1) the non¬ 
negative sign of the data, (2) the high degree of 
dispersion of the data, and (3) the shape of the 
data. 

The first problem related to the loss severity 
data deals with the sign of the data. Depending 
on the movements in the interest or exchange 
rates, the oscillations in the market returns and 
indicators can take either a positive or nega¬ 
tive sign. This is different in the credit and 
operational risk models—usually, only losses 
(i.e., negative cash flows) are assumed to take 
place. 19 Hence, in modeling operational loss 
magnitudes, one should either consider fitting 
the loss distributions that are defined only on 
positive values, or should use distributions that 
are defined on negative and positive values, 
truncated at zero. 

The second problem deals with the high 
degree of dispersion of loss data. Historical 
observations suggest that the movements in the
market indicators are generally of relatively low 
magnitude. Bigger losses are usually attributed 
to credit risk. Finally, although most of the op¬ 
erational losses occur on a daily basis and hence 
are small in magnitude, the excessive losses of 
financial institutions are in general due to the 
operational losses, rather than credit or mar¬ 
ket risk-related losses. Empirical evidence in¬ 
dicates that there is an extremely high degree of 
dispersion of the operational loss magnitudes, 
ranging from near-zero to billions of dollars. In 
general, this dispersion is measured by variance 
or standard deviation. 20 

The third problem concerns the shape of the 
loss distribution. The shape of the data for oper¬ 
ational risk is very different from that of market 
or credit risk. In market risk models, for ex¬ 
ample, the distribution of the market returns is 
often assumed to be nearly symmetric around 
zero. Asymmetric cases refer to the data whose 
distribution is either left-skewed (i.e., the left 
tail of the distribution is very long) or right- 
skewed (i.e., the right tail of the distribution is 
very long) and/or whose distribution has two 
or more peaks of different height. Operational 
losses are highly asymmetric, and empirical ev¬ 
idence on operational risk indicates that the 
losses are highly skewed to the right. This is 
in part explained by the presence of "low fre¬ 
quency/high severity" events. See Figure 2 for 
an exemplary histogram of operational losses. 

As previously discussed, empirical evidence 
on operational losses indicates a majority of ob¬ 
servations being located close to zero, and a 
small number of observations being of a very 
high magnitude. The first phenomenon refers 
to a high kurtosis (i.e., peak) of the data, and 
the second one indicates heavy tails (or fat tails). 
Distributions of such data are often described 
as leptokurtic. 

The Gaussian (or normal) distribution is often
used to model market risk and credit risk.
It is characterized by two parameters, $\mu$ and $\sigma$,
that are its mean and standard deviation. Figure 3
provides an example of a normal density.



Figure 3 An Example of a Gaussian Density
(horizontal axis: loss amount x)

Despite being easy to work with and having at¬ 
tractive features (such as symmetry and stabil¬ 
ity under linear transformations), the Gaussian 
distribution makes several critical assumptions 
about the loss data. They include the following: 

• The Gaussian assumption is useful for modeling
the distribution of events that are symmetric
around their mean. It has been empirically
demonstrated that operational losses are not
symmetric but severely right-skewed, meaning
that the right tail of the loss distribution
is very long.

• In most cases (except for the cases when the 
mean is very high), the use of Gaussian dis¬ 
tribution allows for the occurrence of nega¬ 
tive values. This is not a desirable property 
for modeling loss severity because negative 
losses are usually not possible. 21 

• More importantly, the Gaussian distribution 
has an exponential decay in its tails (this 
property puts the Gaussian distribution into 
the class of light-tailed distributions), which 
means that the tail events (i.e., the events of 
an unusually high or low magnitude) have 
a near-zero probability of occurrence. However,
very high-magnitude operational losses
can seriously jeopardize a financial institu¬ 
tion. Thus, it would be inappropriate to model 
operational losses with a distribution that 
essentially excludes the possibility of high- 
impact individual losses. Empirical evidence 
strongly supports the conjecture that the distribution
of operational losses is in fact very
leptokurtic—that is, has a high peak and very 
heavy tails (i.e., very rare events are assigned 
a positive probability). 

For the reasons presented above, it is unlikely 
that the Gaussian distribution would find much 
application for the assessment of operational 
risk. 22 Heavier tailed distributions such as log¬ 
normal, Weibull, and even Pareto and alpha- 
stable, ought to be considered. 
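
To make the light-tail argument concrete, the sketch below compares the probability of exceeding a high loss level under a Gaussian distribution and under a lognormal distribution with the same mean and variance. The parameter values are hypothetical and serve only to illustrate the difference in tail behavior.

```python
import math

def normal_tail(x, mu, sigma):
    """P(X > x) for a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def lognormal_tail(x, mu, sigma):
    """P(X > x) for a lognormal whose logarithm has mean mu and std. dev. sigma."""
    return normal_tail(math.log(x), mu, sigma)

# Hypothetical parameters: match the Gaussian to the lognormal's mean and variance.
mu, sigma = 0.0, 1.5
ln_mean = math.exp(mu + sigma**2 / 2)
ln_std = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))

for k in (3, 5, 10):
    x = ln_mean + k * ln_std  # a loss k standard deviations above the mean
    print(k, normal_tail(x, ln_mean, ln_std), lognormal_tail(x, mu, sigma))
```

At every level shown, the lognormal assigns a far larger probability to the exceedance than the matched Gaussian does, and the gap widens as the level increases.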

Dependence Between 
Business Units 

In order to increase the accuracy of opera¬ 
tional risk assessment, banks are advised to 
classify their operational loss data into groups 
of different degrees and nature of exposure to 
operational risk. Following this principle, the 
advanced measurement approaches (AMA) for 
the quantification of the operational risk capital 
charge, proposed by Basel II, suggest estimat¬ 
ing operational risk capital separately for each 
"business line/event type" combination. Such 
a procedure is not common in market risk and 
credit risk models. 

The most intuitive approach to combine risk 
measures collected from each of these "business 
line/event type" combinations is to add them 
up. 23 However, such an approach may result 
in overestimation of the total capital charge be¬ 
cause it implies a perfect positive correlation 
between groups. To prevent this from happen¬ 
ing, it is essential to account for dependence 
between these combinations. Covariance and 
correlation are the simplest measures of depen¬ 
dency, but they assume a linear type of depen¬ 
dence, and therefore can produce misleading 
results if the linearity assumption is not true. 
An alternative approach would involve using
copulas that are more flexible with respect to
the form of the dependence structure that may
exist between different groups. Another attractive
property of copulas is their ability to capture
the tail dependence between the distributions
of random variables. Both properties are
preserved under linear transformations of the
variables.
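
The overestimation that results from simply adding up risk measures can be illustrated with a small simulation. The sketch below assumes two hypothetical business units with independent lognormal losses (an assumption made only for this example; it does not use a copula) and compares the sum of the two stand-alone VaRs with the VaR of the combined loss.

```python
import math
import random

def quantile(values, level):
    """Smallest observation whose empirical CDF is at least `level`."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(level * len(ordered)) - 1)]

rng = random.Random(7)
n, level = 100_000, 0.99

# Hypothetical loss samples for two independent business units.
unit_a = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
unit_b = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

# Adding stand-alone VaRs corresponds to perfect positive dependence.
sum_of_vars = quantile(unit_a, level) + quantile(unit_b, level)
# VaR of the combined loss when the two units are in fact independent.
var_of_sum = quantile([a + b for a, b in zip(unit_a, unit_b)], level)
print(round(sum_of_vars, 2), round(var_of_sum, 2))
```

In this setting the VaR of the combined loss is noticeably smaller than the sum of the stand-alone VaRs, so the simple sum overstates the capital charge when the units are not perfectly dependent.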

KEY POINTS 

• Operational risk measurement models are di¬ 
vided into top-down and bottom-up models. 

• Top-down models use a macro-level regu¬ 
latory approach to assess operational risk 
and determine the capital charge. They in¬ 
clude multifactor equity price models, in¬ 
come and expense-based models, operating 
leverage models, scenario analysis and stress 
testing models, and risk indicator models. 

• Bottom-up models originate from a micro¬ 
level analysis of a bank's loss data and con¬ 
sideration for the process and causes of loss 
events in determination of the capital charge. 
They include process-based models (such as 
causal network and Bayesian belief models, 
connectivity models, multifactor causal mod¬ 
els, and reliability models), actuarial models, 
and proprietary models. 

• Scarcity and reliability of available inter¬ 
nal operational loss data remains a barrier 
preventing banks from developing compre¬ 
hensive statistical models. Sufficiently large 
datasets are especially important for mod¬ 
eling low frequency high severity events. 
Three solutions have been put forward to help 
expand internal databases: pooling together 
internal and external data, accounting for 
near-misses, and stress tests. 

• The nature of operational risk is fundamentally
different from that of credit and market
risks. Specifics of the operational loss process
include a discrete data arrival process, delays
between the time of an event and loss
detection/accumulation, loss data taking only a
positive sign, high dispersion in the magnitudes
of loss data, a distribution of loss data that is
severely right-skewed and heavy-tailed,
and dependence between business units and
event types.

• While many market and credit risk models
make the convenient Gaussian assumption on 
the market returns or stock returns, this dis¬ 
tribution is unlikely to be useful for the oper¬ 
ational risk modeling because it is unable to 
capture the nonsymmetric and heavy-tailed 
nature of the loss data. 

NOTES 

1. An exception is the scenario analysis mod¬ 
els in which specific events are identified 
and included in internal databases for stress 
testing. These events are, however, imagin¬ 
able and do not appear in the banks' origi¬ 
nal databases. 

2. Some of these models are described in 
Allen, Boudoukh, and Saunders (2004). 

3. See Chapter 2 in Chernobai, Rachev, and 
Fabozzi (2007) for an example of an empiri¬ 
cal study that utilized such models in order 
to evaluate the sensitivity of operational
risk to macroeconomic factors. 

4. See Marshall (2001). 

5. The first four examples are due to Marshall
(2001).

6. See Hiwatashi and Ashida (2002). 

7. See Allen, Boudoukh, and Saunders (2004). 

8. See van den Brink (2002). 

9. See van den Brink (2002). 

10. The Internal Measurement Approach (see 
description in BIS, 2001) combines some 
elements of the top-down approach and 
bottom-up approach: The gamma parame¬ 
ter in the formula for the capital charge is set 
externally by regulators, while the expected 
loss is determined based on internal data. 

11. See Allen, Boudoukh, and Saunders (2004). 

12. In February 2001 the Basel Committee sug¬ 
gested the Scorecard Approach as one pos¬ 
sible advanced measurement approach to 
measure the operational risk capital charge. 

13. See, for example, Marshall (2001) on the 
"fishbone analysis." 

14. Relevant Bayesian belief models with applications
to operational risk are discussed
in Alexander and Pezier (2001), Neil and
Tranham (2002), and Giudici (2004), among
others.

15. See also Haubenstock (2003) and Cruz 
(2002). The empirical study by Allen and 
Bali (2007) investigates the sensitivity of 
operational VaR to macroeconomic, rather 
than a bank's internal, risk factors. 

16. Actuarial models form the basis of the loss 
distribution approach, an advanced mea¬ 
surement approach for operational risk. See 
BIS (2001). 

17. Identification of the three types of dates
is based on discussions with Marco
Moscadelli (Banking Supervision Department,
Bank of Italy).

18. See Carillo Menendez (2005). 

19. Certainly, it is possible that an event due to 
operational risk can incur unexpected prof¬ 
its for a bank, but usually this possibility is 
not considered. 

20. Some very heavy-tailed distributions, such 
as the heavy-tailed Weibull, Pareto, or 
alpha-stable, can have an infinite variance. 
In these situations, robust measures of 
spread must be used. 

21. Certainly, it is possible to use a truncated 
(at zero) version of the Gaussian distribu¬ 
tion to fit operational losses. 

22. Of course, a special case is fitting the Gaus¬ 
sian distribution to the natural logarithm of 
the loss data. This is equivalent (in terms 
of obtaining the maximum likelihood pa¬ 
rameter estimates) to fitting the lognormal 
distribution to the original loss data. 

23. This is the approach that was proposed in 
BIS (2001). 

Abstract: A major risk faced by financial entities is operational risk. In general terms, operational
risk is the risk of loss resulting from inadequate or failed internal processes, people, or systems 
or from external events. The two principal approaches in modeling operational loss distributions 
are the nonparametric approach and the parametric approach. It is important to employ a model 
that captures tail events and for this reason in operational risk modeling, distributions that are 
characterized as light-tailed distributions should be used with caution. 


For financial entities, representing a stream 
of uncertain operational losses with a specified 
model is a difficult task: Data can be wrongly 
recorded, fuzzy, incomplete (e.g., truncated or 
censored), or simply limited. Two main ap¬ 
proaches may be undertaken: nonparametric 
and parametric. In this entry, we focus on the 
nonparametric approach, common loss distri¬ 
butions, and mixture distributions. We begin 
by reviewing the nonparametric approach to 
modeling operational losses and then proceed 
to the parametric approach and review some 
common continuous distributions that can be 
relevant for modeling operational losses. For
each of the distributions, we focus on the major
characteristics that are important when using
it to model operational loss data: density,
distribution, tail behavior, mean, variance,
mode, skewness, and kurtosis.


APPROACHES TO 
OPERATIONAL RISK 
MODELING 

The two main approaches to operational risk 
modeling are: 

1. Nonparametric approach. One approach would 
be to directly use the empirical density of 
the data or its smoothed curve version. 1 This 
nonparametric approach can be relevant in 
two circumstances: first, when the available
data are not believed to follow any conven¬ 
tional distribution, 2 and second, when the 
data set available at hand is believed to be 
sufficiently comprehensive. 3 
2. Parametric approach. The task is considerably 
simplified if we are able to fit a curve of a 
simple analytical form that satisfies certain 
properties. The general goal of this paramet¬ 
ric approach is to find a loss distribution that 
would most closely resemble the distribution 
of the loss magnitudes of the available data 
sample. 

Figure 1 shows a common histogram for the 
operational loss data with a fitted continuous 
curve. Visual examination suggests that magni¬ 
tudes of the majority of the losses are very close 
to zero as is seen from the high peak around 
zero of the histogram; an insignificant fraction 
of data account for the long right tail of the 
histogram. Clearly, if we choose the paramet¬ 
ric approach and if the fitted curve represents 
a density of some chosen parametric distribu¬ 
tion, the loss distributions that would be ade¬ 
quate for modeling operational losses are those 
that are right-skewed, possibly leptokurtic, and 
have support on the positive values. 

Figure 2 summarizes possible approaches to 
modeling operational loss severity. 



Figure 1 Illustration of a Histogram of Loss Data
and Fitted Continuous Density

Figure 2 Approaches to Modeling Loss Severity
(nonparametric approach: empirical distribution,
smooth curve approximation; parametric approach:
common distributions, mixture distributions)


NONPARAMETRIC 
APPROACH: EMPIRICAL 
DISTRIBUTION FUNCTION 

Modeling operational losses with their empiri¬ 
cal distribution function is a nonparametric ap¬ 
proach as it does not involve estimation of the 
parameters of a loss distribution. In this sense, 
it is the simplest approach. On the other hand, 
it makes the following two critical assumptions 
regarding future loss data: 

* Historic loss data are sufficiently comprehen¬ 
sive. 

* All past losses are equally likely to reappear 
in the future, and losses of other magnitudes 
(such as potential extreme events that are not 
a part of existent database) cannot occur. 


Suppose we want to find the empirical dis¬ 
tribution function of a random variable X. It is 
found by: 


$$P(X \le x) = \frac{\text{number of losses} \le x}{\text{total number of losses}}$$


The empirical distribution function looks like 
a step function, with a step up occurring at each 
observed value of X. Figure 3 provides an illus¬ 
tration. The density function 4 is simply a rel¬ 
ative frequency histogram with a bar at each 
observed data value, and the height of each bar 
shows the proportion of losses of this magni¬ 
tude out of total. 
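
A minimal sketch of the empirical distribution function follows. The loss amounts are hypothetical, and the code uses the convention that a loss equal to x is counted (consistent with a step up at each observed value).

```python
from bisect import bisect_right

def empirical_cdf(losses):
    """Return a function x -> (number of losses <= x) / (total number of losses)."""
    ordered = sorted(losses)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts the observations that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf

losses = [120.0, 35.0, 35.0, 980.0, 12.5]  # hypothetical loss amounts
cdf = empirical_cdf(losses)
print(cdf(10.0), cdf(35.0), cdf(500.0), cdf(980.0))  # 0.0 0.6 0.8 1.0
```

The function steps up by 1/n at each distinct observation, and by 2/n at 35.0 because that value occurs twice in the sample.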

Note that the empirical distribution is often
used in goodness-of-fit tests. One can compare
it with a fitted loss distribution: if the fitted
loss distribution closely follows the empirical
distribution function, this indicates a good fit;
if it does not, then the fitted loss distribution
is not optimal.

Figure 3 Illustration of Empirical Distribution
Function

PARAMETRIC APPROACH: 
CONTINUOUS LOSS 
DISTRIBUTIONS 

In this section, we review several popular loss 
distributions. Certainly, a variety of additional 
distributions may be created by using some 
transformation of the original data and then 
fitting a distribution to the transformed data. 
A popular transformation involves taking the 
natural logarithm of the data. It is notable that 
if the original data are severely right-skewed, 
then the distribution of the log-data often be¬ 
comes "bell-shaped" and nearly symmetric. For 
example, fitting the normal distribution to the 
log-data is equivalent to fitting the lognormal 
distribution to the original data. 

Exponential Distribution 

The exponential distribution for a random variable
X (with a sample of length n) is described by its density $f$
and distribution $F$ of the following form:

$$f(x) = \lambda e^{-\lambda x}, \quad F(x) = 1 - e^{-\lambda x}, \quad x > 0$$

The distribution is characterized by only
one parameter $\lambda$ ($\lambda > 0$), which is the scale
parameter.



Figure 4 Illustration of Exponential Density 

Examples of exponential densities are illustrated
in Figure 4. The maximum likelihood estimate
(MLE) for $\lambda$ is

$$\hat{\lambda} = \frac{1}{\bar{x}}, \quad \text{where } \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$$

Raw moments are calculated as:

$$E(X^k) = \frac{k!}{\lambda^k}$$

and so the population mean and variance are

$$\text{mean}(X) = \frac{1}{\lambda}, \quad \text{var}(X) = \frac{1}{\lambda^2}$$

The mode of an exponential distribution is
located at zero. The skewness and kurtosis coefficients
are $\gamma_1 = 2$ and $\gamma_2 = 6$, respectively.

The inverse of the distribution has a simple
form $F^{-1}(p) = -\frac{1}{\lambda}\log(1 - p)$, $p \in (0, 1)$,
and so an exponential random variate can be
simulated using the inverse transform method
by $X = -\frac{1}{\lambda}\log U$, where $U$ is distributed
uniformly on the (0, 1) interval. Another popular
simulation method uses the Von Neumann algorithm.
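
The inverse transform method and the MLE can be sketched together in a few lines of Python. The rate value and the function names below are hypothetical; the code simulates an exponential sample and then recovers the parameter as the reciprocal of the sample mean.

```python
import math
import random

def simulate_exponential(rng, lam, n):
    """Inverse transform: X = -(1/lam) * log(U) with U uniform on (0, 1)."""
    # 1 - rng.random() lies in (0, 1], which avoids evaluating log(0).
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

def exponential_mle(sample):
    """MLE of lambda: the reciprocal of the sample mean."""
    return len(sample) / sum(sample)

rng = random.Random(1)
sample = simulate_exponential(rng, lam=0.5, n=50_000)
lam_hat = exponential_mle(sample)
print(round(lam_hat, 3))
```

With a sample of this size the estimate should be close to the value of 0.5 used in the simulation.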

The exponential density is monotonically decreasing
toward the right and is characterized
by an exponentially decaying right tail of
the form $\bar{F}(x) = e^{-\lambda x}$, which means that
high-magnitude events are given a near-zero probability.
For this reason, it is unlikely that it
would find much use in modeling operational
losses, where arguably the central concern is the
losses of a very high magnitude (unless, perhaps,
some generalizations of the exponential
distribution or mixture models are considered).

Note that another parameterization of the exponential
distribution is possible, with the density
specified as $f(x) = \frac{1}{\lambda} e^{-x/\lambda}$.


Lognormal Distribution 

A random variable X has a lognormal distribution
if its density and distribution are:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}, \quad F(x) = \Phi\left(\frac{\log x - \mu}{\sigma}\right), \quad x > 0$$

where $\Phi(x)$ is the distribution of a standard normal,
$N(0, 1)$, random variable, and can be obtained
by looking up the table of the standard
normal quantiles. 5

Examples of the lognormal density are illustrated
in Figure 5. The parameters $\mu$ ($-\infty < \mu < \infty$)
and $\sigma$ ($\sigma > 0$) are the location and scale
parameters, respectively, and can be estimated
with MLE as:

$$\hat{\mu} = \frac{1}{n}\sum_{j=1}^{n} \log x_j, \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n} (\log x_j - \hat{\mu})^2 \tag{1}$$

Raw moments are calculated as:

$$E(X^k) = e^{\mu k + \frac{\sigma^2 k^2}{2}}$$

and so the population mean and variance are
calculated to be

$$\text{mean}(X) = e^{\mu + \frac{\sigma^2}{2}}, \quad \text{var}(X) = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}$$

The mode is located at $e^{\mu - \sigma^2}$. The skewness
and kurtosis coefficients are:

$$\gamma_1 = \sqrt{e^{\sigma^2} - 1}\,(2 + e^{\sigma^2})$$

$$\gamma_2 = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6$$

The inverse of the distribution is $F^{-1}(p) = e^{\Phi^{-1}(p)\sigma + \mu}$,
and so a lognormal random variate
can be simulated by $X = e^{\Phi^{-1}(U)\sigma + \mu}$, where $\Phi$
is the standard normal distribution. Note that
a lognormal random variable can be obtained
from a normal random variable Y with parameters
$\mu$ and $\sigma$ (this is often written as $N(\mu, \sigma)$)
via the transformation $X = e^Y$. Thus, if X has a
lognormal distribution, then log X has a normal
distribution with the same parameters.

The lognormal distribution is characterized
by moderately heavy tails, with the right tail
$\bar{F}(x) \sim x^{-1} e^{-\log^2 x}$. To fit a lognormal distribution
to the data, one can take the natural logarithm
of the dataset, and then fit to it the normal
distribution. Note that the MLE will produce
the same estimates, but the method of moments
will produce different parameter estimates.
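
The fitting procedure just described (take logarithms, then estimate the normal parameters) can be sketched as follows. The simulated loss sample and its parameter values are hypothetical and serve only to check that the estimators recover the values used to generate the data.

```python
import math
import random

def lognormal_mle(losses):
    """MLE of (mu, sigma): mean and (1/n) standard deviation of the log-losses."""
    logs = [math.log(x) for x in losses]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / n)
    return mu_hat, sigma_hat

def lognormal_moments(mu, sigma):
    """Population mean and variance implied by the lognormal parameters."""
    mean = math.exp(mu + sigma**2 / 2)
    variance = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, variance

# Hypothetical losses generated as X = exp(Y) with Y normal.
rng = random.Random(3)
losses = [math.exp(rng.gauss(8.0, 1.2)) for _ in range(20_000)]
mu_hat, sigma_hat = lognormal_mle(losses)
print(round(mu_hat, 2), round(sigma_hat, 2), lognormal_moments(mu_hat, sigma_hat))
```

Note that the variance estimator divides by n rather than n - 1, which is what maximum likelihood gives.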



Weibull Distribution 

The Weibull distribution is a generalization of
the exponential distribution: Two parameters
instead of one parameter allow for greater flexibility
and heavier tails. The density and distribution
are 6

$$f(x) = \alpha\beta x^{\alpha - 1} e^{-\beta x^{\alpha}}, \quad F(x) = 1 - e^{-\beta x^{\alpha}}, \quad x > 0$$

with $\beta$ ($\beta > 0$) being the scale parameter and
$\alpha$ ($\alpha > 0$) the shape parameter.

Figure 5 Illustration of Lognormal Density

Figure 6 Illustration of Weibull Density

Examples of the density are illustrated in Figure 6.
The MLE estimators for the parameters
do not exist in closed form, and should be evaluated
numerically. Raw moments are calculated
as:

$$E(X^k) = \beta^{-k/\alpha}\,\Gamma\left(1 + \frac{k}{\alpha}\right)$$

and so the population mean and variance are:

$$\text{mean}(X) = \beta^{-1/\alpha}\,\Gamma\left(1 + \frac{1}{\alpha}\right)$$

$$\text{var}(X) = \beta^{-2/\alpha}\left(\Gamma\left(1 + \frac{2}{\alpha}\right) - \Gamma^2\left(1 + \frac{1}{\alpha}\right)\right)$$

The mode is located at $\beta^{-1/\alpha}(1 - \alpha^{-1})^{1/\alpha}$ for
$\alpha > 1$ and at zero otherwise. The formulae for the
skewness and kurtosis coefficients are:

$$\gamma_1 = \frac{2\Gamma^3\left(1 + \frac{1}{\alpha}\right) - 3\Gamma\left(1 + \frac{1}{\alpha}\right)\Gamma\left(1 + \frac{2}{\alpha}\right) + \Gamma\left(1 + \frac{3}{\alpha}\right)}{\left[\Gamma\left(1 + \frac{2}{\alpha}\right) - \Gamma^2\left(1 + \frac{1}{\alpha}\right)\right]^{3/2}}$$

$$\gamma_2 = \frac{-6\Gamma^4\left(1 + \frac{1}{\alpha}\right) + 12\Gamma^2\left(1 + \frac{1}{\alpha}\right)\Gamma\left(1 + \frac{2}{\alpha}\right) - 3\Gamma^2\left(1 + \frac{2}{\alpha}\right) - 4\Gamma\left(1 + \frac{1}{\alpha}\right)\Gamma\left(1 + \frac{3}{\alpha}\right) + \Gamma\left(1 + \frac{4}{\alpha}\right)}{\left[\Gamma\left(1 + \frac{2}{\alpha}\right) - \Gamma^2\left(1 + \frac{1}{\alpha}\right)\right]^{2}}$$

The inverse of a Weibull random variable does
not exist in a simple closed form. To generate a
Weibull random variable, one can first generate
an exponential random variable Y with parameter
$\beta$ and then follow the transformation $X = Y^{1/\alpha}$.

The right tail behavior of a Weibull random
variable follows the form $\bar{F}(x) = e^{-\beta x^{\alpha}}$, and
so the distribution is heavy-tailed for $\alpha < 1$.
The Weibull distribution has been found to be the
optimal distribution in reinsurance models 7 as
well as in asset returns models. 8

Note the following regarding the Weibull distribution.
First, if $\alpha = 1$, then the Weibull distribution
reduces to the exponential distribution.
Second, other parameterizations of the Weibull
distribution are possible. For example, some authors
use $1/\beta$ instead of $\beta$. Sometimes $1/\beta^{\alpha}$ is
used instead of $\beta$.


Gamma Distribution

The gamma distribution is another generalization
of an exponential distribution and
is specified by its density and distribution
given by 9

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad F(x) = \Gamma(\alpha; \beta x), \quad x > 0$$

where the two parameters, $\alpha$ ($\alpha > 0$) and
$\beta$ ($\beta > 0$), characterize the shape and scale,
respectively.

Figure 7 Illustration of Gamma Density

Examples of the density are illustrated in Figure 7. The MLE estimates
for the parameters can be evaluated only numerically. The raw moments
are found by:

$$E(X^{k}) = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\,\beta^{k}}$$

yielding the population mean and variance as

$$\mathrm{mean}(X) = \frac{\alpha}{\beta}, \qquad \mathrm{var}(X) = \frac{\alpha}{\beta^{2}}$$

The mode is $(\alpha - 1)/\beta$ for $\alpha > 1$ and zero otherwise.
The skewness and kurtosis coefficients are found by

$$\gamma_1 = \frac{2}{\sqrt{\alpha}}, \qquad \gamma_2 = \frac{6}{\alpha}$$

If $\alpha$ is an integer (note 10), then to generate a gamma random
variable with parameters $\alpha$ and $\beta$ one can generate a sum of
$\alpha$ exponential random variables each with parameter $\beta$.
Hence, if $U_1, U_2, \ldots, U_{\alpha}$ are independent uniform (0, 1)
random variables, then
$X = -\frac{1}{\beta}\log\left(\prod_{j=1}^{\alpha} U_j\right)$ has the
desired distribution. A variety of methods for generation of a gamma
random variable is described in Devroye (1986).
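The two generation recipes described in this entry (the transformation
$X = Y^{1/\alpha}$ for the Weibull case and the log of a product of
uniforms for the integer-shape gamma case) can be sketched as follows.
This is a minimal illustration using only the Python standard library;
the function names, parameter values, and seed are assumptions made for
the example. The sample means are compared with the population means
$\beta^{-1/\alpha}\Gamma(1 + 1/\alpha)$ and $\alpha/\beta$.

```python
import math
import random


def weibull_variate(alpha, beta, rng):
    """Weibull draw with tail exp(-beta * x**alpha): X = Y**(1/alpha), Y ~ Exp(beta)."""
    y = rng.expovariate(beta)  # exponential variate with rate beta
    return y ** (1.0 / alpha)


def erlang_variate(alpha, beta, rng):
    """Gamma draw for integer shape alpha and rate beta:
    X = -(1/beta) * log(U_1 * ... * U_alpha)."""
    log_product = 0.0
    for _ in range(alpha):
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        log_product += math.log(u)
    return -log_product / beta


def sample_mean(draw, n):
    """Average of n independent calls to draw()."""
    return sum(draw() for _ in range(n)) / n


rng = random.Random(12345)  # fixed seed, for reproducibility only
n_draws = 100_000

alpha_w, beta_w = 0.8, 2.0  # illustrative Weibull parameters
weibull_theory = beta_w ** (-1.0 / alpha_w) * math.gamma(1.0 + 1.0 / alpha_w)
weibull_sample = sample_mean(lambda: weibull_variate(alpha_w, beta_w, rng), n_draws)

alpha_g, beta_g = 3, 0.5  # illustrative gamma (Erlang) parameters
gamma_theory = alpha_g / beta_g
gamma_sample = sample_mean(lambda: erlang_variate(alpha_g, beta_g, rng), n_draws)

print(f"Weibull mean: theory {weibull_theory:.4f}, sample {weibull_sample:.4f}")
print(f"Gamma mean:   theory {gamma_theory:.4f}, sample {gamma_sample:.4f}")
```

For non-integer $\alpha$ the product-of-uniforms recipe does not apply,
and one of the general methods in Devroye (1986) would be needed.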


Beta Distribution 


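The beta distribution has density and distribution of the following
form (note 11):

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}, \qquad F(x) = I(x; \alpha, \beta), \qquad 0 \le x \le 1$$

Examples of the density are illustrated in Figure 8. Note that $X$ has
a bounded support on [0, 1]. Certainly, operational loss data may be
rescaled to fit this interval. In this case, the following version of
the beta density and distribution is possible (the parameter $\theta$
is assumed known):

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \left(\frac{x}{\theta}\right)^{\alpha} \left(1 - \frac{x}{\theta}\right)^{\beta - 1} \frac{1}{x}, \qquad F(x) = I\!\left(\frac{x}{\theta}; \alpha, \beta\right), \qquad 0 < x < \theta, \quad \theta > 0$$

The parameters $\alpha$ ($\alpha > 0$) and $\beta$ ($\beta > 0$)
determine the shape of the distribution. The MLE estimators can be
evaluated numerically. The raw moments for the regular version of the
beta density can be found by

$$E(X^{k}) = \frac{(\alpha + \beta - 1)!\,(\alpha + k - 1)!}{(\alpha - 1)!\,(\alpha + \beta + k - 1)!}$$

yielding the mean and the variance:

$$\mathrm{mean}(X) = \frac{\alpha}{\alpha + \beta}, \qquad \mathrm{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$$

The mode is equal to $(\alpha - 1)/(\alpha + \beta - 2)$.
The skewness and kurtosis coefficients are given by

$$\gamma_1 = \frac{2(\beta - \alpha)\sqrt{1 + \alpha + \beta}}{\sqrt{\alpha\beta}\,(2 + \alpha + \beta)}$$

$$\gamma_2 = \frac{6\left[\alpha^{3} + \alpha^{2}(1 - 2\beta) + \beta^{2}(1 + \beta) - 2\alpha\beta(2 + \beta)\right]}{\alpha\beta(\alpha + \beta + 2)(\alpha + \beta + 3)}$$

The beta random variate can be generated using an algorithm described
in Ross (2001, 2002) or Devroye (1986).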

Note that the beta distribution is related to the gamma distribution.
Suppose we have two independent gamma random variables $X$ and $Y$ with
parameters $\alpha_1, \beta$ and $\alpha_2, \beta$, respectively (that
is, with a common second parameter). Then the variable $Z = X/(X + Y)$
has a beta distribution with parameters $\alpha_1, \alpha_2$. This
property can be used to generate a beta random variate from two gamma
random variates.
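The gamma-ratio construction can be sketched with the standard library
as below. The parameter values and seed are illustrative assumptions.
Note that Python's `random.gammavariate(alpha, beta)` takes a scale as
its second argument (mean `alpha * beta`), whereas $\beta$ in this entry
acts as a rate (mean $\alpha/\beta$); the ratio is unaffected as long as
both variates share the same value.

```python
import random


def beta_from_gammas(alpha1, alpha2, rng, scale=1.0):
    """Return X / (X + Y) for independent gamma variates X and Y that share
    the same scale; the ratio is beta(alpha1, alpha2) distributed."""
    x = rng.gammavariate(alpha1, scale)
    y = rng.gammavariate(alpha2, scale)
    return x / (x + y)


rng = random.Random(2024)  # fixed seed, for reproducibility only
alpha1, alpha2 = 2.0, 5.0  # illustrative shape parameters
draws = [beta_from_gammas(alpha1, alpha2, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
theory_mean = alpha1 / (alpha1 + alpha2)  # mean of the beta distribution
print(f"beta mean: theory {theory_mean:.4f}, sample {sample_mean:.4f}")
```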


Pareto Distribution 


The Pareto distribution is characterized by its density and
distribution of the form:

$$f(x) = \frac{\alpha\beta^{\alpha}}{x^{\alpha + 1}}, \qquad F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}, \qquad \beta < x < \infty$$

Figure 8 Illustration of Beta Density

Figure 9 Illustration of Pareto Density

Note that the range of permissible values of $X$ depends on the scale
parameter $\beta$ ($\beta > 0$). The parameter $\alpha$ ($\alpha > 0$)
determines the shape.

Figure 9 illustrates some examples of the density. No closed-form
expressions for the MLE estimators exist (except for the case when
$\beta = 1$, in which case
$\hat\alpha = n / \sum_{j=1}^{n} \log x_j$), so they have to be
evaluated numerically.

The raw moments are given by

$$E(X^{k}) = \frac{\alpha\beta^{k}}{\alpha - k}, \qquad k < \alpha$$

from which the population mean and variance are found to be

$$\mathrm{mean}(X) = \frac{\alpha\beta}{\alpha - 1} \quad \text{for } \alpha > 1$$

$$\mathrm{var}(X) = \frac{\alpha\beta^{2}}{(\alpha - 1)^{2}(\alpha - 2)} \quad \text{for } \alpha > 2$$


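The mode is equal to $\beta$, the lower end of the support, because
the density decreases monotonically on $x > \beta$. The skewness and
kurtosis coefficients are:

$$\gamma_1 = \sqrt{\frac{\alpha - 2}{\alpha}}\;\frac{2(\alpha + 1)}{\alpha - 3}, \qquad \gamma_2 = \frac{6(\alpha^{3} + \alpha^{2} - 6\alpha - 2)}{\alpha(\alpha - 3)(\alpha - 4)}$$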

The inverse of the distribution $F(x) = 1 - (\beta/x)^{\alpha}$ is
$F^{-1}(p) = \beta(1 - p)^{-1/\alpha}$, which can be used to generate a
Pareto random variate.
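A minimal sketch of inverse-transform sampling for the Pareto
distribution follows. It assumes the distribution function
$F(x) = 1 - (\beta/x)^{\alpha}$ on $x > \beta$, for which
$F^{-1}(p) = \beta(1 - p)^{-1/\alpha}$; a shifted parameterization with
support $x > 0$ would subtract $\beta$ from the result. The function
names, parameter values, and seed are assumptions made for the example.

```python
import random


def pareto_cdf(x, alpha, beta):
    """F(x) = 1 - (beta / x)**alpha for x > beta, and 0 otherwise."""
    return 1.0 - (beta / x) ** alpha if x > beta else 0.0


def pareto_inverse_cdf(p, alpha, beta):
    """Inverse of pareto_cdf for 0 <= p < 1."""
    return beta * (1.0 - p) ** (-1.0 / alpha)


rng = random.Random(2024)  # fixed seed, for reproducibility only
alpha, beta = 3.0, 2.0     # illustrative parameters (alpha > 2: finite variance)

# rng.random() lies in [0, 1), so 1 - p is never zero.
sample = [pareto_inverse_cdf(rng.random(), alpha, beta) for _ in range(100_000)]

sample_mean = sum(sample) / len(sample)
theory_mean = alpha * beta / (alpha - 1.0)
print(f"Pareto mean: theory {theory_mean:.4f}, sample {sample_mean:.4f}")
```

For $\alpha$ close to or below 1 the sample mean does not settle down,
which is the practical face of the infinite-mean case.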

The Pareto distribution is a very heavy-tailed distribution, as is seen
from its tail behavior, $\bar F(x) = (\beta/x)^{\alpha}$. The parameter
$\alpha$ determines the heaviness of the right tail, which is
monotonically decreasing for the Pareto distribution: the closer
$\alpha$ is to zero, the thicker the tail. Tails proportional to
$x^{-\alpha}$ are called power tails (as opposed to exponentially
decaying tails) because they follow a power function. The case
$\alpha \le 1$ is a very heavy-tailed case in which the mean and the
variance are infinite (see the formulas for the mean and variance
earlier), meaning that losses of arbitrarily high magnitude carry
non-negligible probability.

While on one hand the Pareto distribution appears very attractive for
modeling operational risk, as it is expected to capture very
high-magnitude losses, on the other hand, from the practical point of
view, the possibility of infinite mean and variance could pose a
problem.

Note the following:

* Different versions of the Pareto distribution are possible.
Occasionally a simplified, 1-parameter version of the Pareto
distribution is used, with $\beta = 1$.

* A 1-parameter Pareto random variable may be obtained from an
exponential random variable via a simple transformation. If a random
variable $Y$ follows an exponential distribution with parameter
$\lambda$, then $X = e^{Y}$ has the 1-parameter Pareto distribution
with the same shape parameter.

* A 2-parameter Pareto distribution may be reparameterized in such a
way that we obtain the generalized Pareto distribution (GPD). The GPD
can be used to model extreme events that exceed a high threshold.
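The exponential-to-Pareto transformation in the second point, together
with the closed-form estimator $\hat\alpha = n/\sum_j \log x_j$ that
applies when $\beta = 1$, can be checked numerically. The sketch below
uses an assumed rate of 1.5 and an arbitrary seed.

```python
import math
import random

rng = random.Random(7)  # fixed seed, for reproducibility only
lam = 1.5               # illustrative exponential parameter (rate)
n = 50_000

# X = exp(Y) with Y ~ Exp(lam) is 1-parameter Pareto with shape lam on x >= 1.
losses = [math.exp(rng.expovariate(lam)) for _ in range(n)]

# Closed-form MLE of the shape parameter when beta = 1.
alpha_hat = n / sum(math.log(x) for x in losses)

print(f"true shape {lam:.3f}, estimated shape {alpha_hat:.3f}")
```

Because $\log X = Y$ is exponential, the estimator is simply the
reciprocal of the sample mean of the log-losses.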


Burr Distribution 

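The Burr distribution is a generalized three-parameter version of the
Pareto distribution and allows for greater flexibility in the shape due
to an additional shape parameter $\gamma$ ($\gamma > 0$). The density
and distribution functions can be written as

$$f(x) = \gamma\alpha\beta^{\alpha}\,\frac{x^{\gamma - 1}}{(\beta + x^{\gamma})^{\alpha + 1}}, \qquad F(x) = 1 - \left(\frac{\beta}{\beta + x^{\gamma}}\right)^{\alpha}, \qquad x > 0$$

Figure 10 Illustration of Burr Density

Examples of the density are depicted in Figure 10. The MLE estimators
for the parameters can generally be evaluated only numerically. The raw
moments are given by:

$$E(X^{k}) = \frac{\beta^{k/\gamma}}{\Gamma(\alpha)}\,\Gamma\!\left(1 + \frac{k}{\gamma}\right)\Gamma\!\left(\alpha - \frac{k}{\gamma}\right), \qquad -\gamma < k < \gamma\alpha$$

from which the population mean and variance are calculated as:

$$\mathrm{mean}(X) = \frac{\beta^{1/\gamma}}{\Gamma(\alpha)}\,\Gamma\!\left(1 + \frac{1}{\gamma}\right)\Gamma\!\left(\alpha - \frac{1}{\gamma}\right), \qquad \gamma\alpha > 1$$

$$\mathrm{var}(X) = \frac{\beta^{2/\gamma}}{\Gamma(\alpha)}\,\Gamma\!\left(1 + \frac{2}{\gamma}\right)\Gamma\!\left(\alpha - \frac{2}{\gamma}\right) - \frac{\beta^{2/\gamma}}{\Gamma^{2}(\alpha)}\,\Gamma^{2}\!\left(1 + \frac{1}{\gamma}\right)\Gamma^{2}\!\left(\alpha - \frac{1}{\gamma}\right), \qquad \gamma\alpha > 2$$

The mode is equal to
$\beta^{1/\gamma}\left(\frac{\gamma - 1}{\alpha\gamma + 1}\right)^{1/\gamma}$
for $\gamma > 1$ and zero otherwise.

The Burr random variable can be generated by the inverse transform
method, using
$F^{-1}(p) = \left(\beta\left((1 - p)^{-1/\alpha} - 1\right)\right)^{1/\gamma}$.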
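The inverse transform method for the Burr distribution can be sketched
as follows, assuming the distribution function
$F(x) = 1 - (\beta/(\beta + x^{\gamma}))^{\alpha}$ on $x > 0$. Function
names, parameter values, and the seed are assumptions for illustration.
The check at the end compares the share of simulated values below the
theoretical median with one half.

```python
import random


def burr_cdf(x, alpha, beta, gamma):
    """F(x) = 1 - (beta / (beta + x**gamma))**alpha for x > 0."""
    return 1.0 - (beta / (beta + x ** gamma)) ** alpha


def burr_inverse_cdf(p, alpha, beta, gamma):
    """Inverse of burr_cdf for 0 <= p < 1."""
    return (beta * ((1.0 - p) ** (-1.0 / alpha) - 1.0)) ** (1.0 / gamma)


rng = random.Random(99)             # fixed seed, for reproducibility only
alpha, beta, gamma = 2.0, 3.0, 1.5  # illustrative parameters

draws = [burr_inverse_cdf(rng.random(), alpha, beta, gamma) for _ in range(100_000)]

median_theory = burr_inverse_cdf(0.5, alpha, beta, gamma)
share_below_median = sum(d <= median_theory for d in draws) / len(draws)
print(f"share of draws below the theoretical median: {share_below_median:.4f}")
```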

The right tail has the power law property and obeys
$\bar F(x) = \left(\frac{\beta}{\beta + x^{\gamma}}\right)^{\alpha}$.
The distribution is heavy-tailed for the case $\alpha < 2$ and is very
heavy-tailed when $\alpha < 1$. The Burr distribution has been used in
the insurance industry, and has been found to be an optimal
distribution for natural catastrophe insurance claims (note 12).

Note the following two points. First, if $\gamma = 1$, then the Burr
distribution reduces to the Pareto distribution (in its shifted form,
with support $x > 0$ and tail $(\beta/(\beta + x))^{\alpha}$). Second,
other parameterizations of the Burr distribution are possible. For
example, the Burr distribution with $\alpha = 1$ is known as the
loglogistic distribution.


EXTENSION: MIXTURE LOSS 
DISTRIBUTIONS 

Histograms of the operational loss data often reveal a very high peak
close to zero and a smaller but distinct peak toward the right tail.
This may suggest that the operational loss data often do not follow a
pattern of a single distribution, even for data belonging to the same
loss type (such as operational losses due to business disruptions) and
the same business line (such as commercial banking). One approach in
modeling such losses would be to consider the GPD to model the tail
events and an empirical or other distribution for the remaining
lower-magnitude losses. Alternatively, one may consider a single
distribution composed by a mixture of two or more loss distributions.

The density and distribution of an $m$-point mixture distribution can
be expressed as

$$f(x) = \sum_{j=1}^{m} w_j f_j(x), \qquad F(x) = \sum_{j=1}^{m} w_j F_j(x)$$

where $w_j$, $j = 1, 2, \ldots, m$, are the positive weights attached
to each member distribution, adding up to 1. It is possible to have a
mixture of different types of distributions, such as exponential and
Weibull, or of the same type of distribution but with different
parameters.

An example of a mixture of two lognormal distributions ($\mu_1 = 0.9$,
$\sigma_1 = 1$, $\mu_2 = 3$, $\sigma_2 = 0.5$) is depicted in
Figure 11.

Figure 11 Illustration of 2-Point Lognormal Mixture Density

The MLE estimates of the parameters (including the weights) of mixture
distributions can generally be evaluated only numerically. A commonly
used procedure to estimate the parameters of mixture distributions is
the expectation-maximization algorithm. The raw moments are found as
the weighted sum of the $k$th moments evaluated individually for each
of the $m$ member distributions. The population mean and variance are
found by

$$\mathrm{mean}(X) = \sum_{j=1}^{m} w_j E_j(X), \qquad \mathrm{var}(X) = \sum_{j=1}^{m} w_j\left[\sigma_j^{2}(X) + E_j^{2}(X)\right] - \left[\sum_{j=1}^{m} w_j E_j(X)\right]^{2}$$

where the subscripts $j$ refer to each member density. The right tail
follows $\bar F(x) = \sum_{j=1}^{m} w_j \bar F_j(x)$.
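Since the raw moments of a mixture are weighted sums of the member
moments, the mean and variance can be computed from the member means
and variances and checked by simulation. The sketch below does this for
a 2-point mixture of exponential distributions; the weights, rates,
function names, and seed are assumptions chosen only for illustration.

```python
import random


def mixture_mean_variance(weights, means, variances):
    """Mean and variance of a mixture from the weighted first two raw moments."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second_moment - mean * mean


def draw_exponential_mixture(weights, rates, rng):
    """Pick a member distribution by its weight, then draw from it."""
    rate = rng.choices(rates, weights=weights, k=1)[0]
    return rng.expovariate(rate)


weights = [0.7, 0.3]  # illustrative weights, adding up to 1
rates = [2.0, 0.2]    # frequent small losses and infrequent large losses
means = [1.0 / r for r in rates]
variances = [1.0 / r ** 2 for r in rates]
theory_mean, theory_var = mixture_mean_variance(weights, means, variances)

rng = random.Random(11)  # fixed seed, for reproducibility only
n = 200_000
draws = [draw_exponential_mixture(weights, rates, rng) for _ in range(n)]
sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n - 1)

print(f"mean:     theory {theory_mean:.4f}, sample {sample_mean:.4f}")
print(f"variance: theory {theory_var:.4f}, sample {sample_var:.4f}")
```

The variance of the mixture exceeds the weighted average of the member
variances whenever the member means differ, because the spread between
the member means adds to the total dispersion.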

The advantage of using mixture distributions is that they can be fitted
to practically all shapes of loss distributions. On the other hand, the
models may lack reliability due to a large number of parameters that
need to be estimated (in particular, when the available loss data set
is not large enough). For example, a 2-point mixture of exponential
distributions requires only three parameters, but a 4-point mixture of
exponential distributions requires seven parameters. In some cases,
this problem may be overcome when certain simplifications are applied
to the model. For example, it is possible to achieve a 2-point mixture
of Pareto distributions with four, instead of five, unknown parameters;
the following distribution has been successfully applied to liability
insurance:

$$F(x) = 1 - a\left(\frac{\beta_1}{\beta_1 + x}\right)^{\alpha} - (1 - a)\left(\frac{\beta_2}{\beta_2 + x}\right)^{\alpha + 2}$$

with the first distribution covering smaller-magnitude events and
having a higher weight $a$ attached, and the second distribution
covering infrequent large-magnitude events (note 13).

An extension to mixture distributions may be to allow $m$ to be a
parameter, and "let the data decide" on how many distributions should
enter the mixture. This, however, makes the model data-dependent and
more complex (note 14).

Note that the term mixture distribution is sometimes also used for
distributions in which an unknown parameter is believed to be random
and follows some distribution rather than being fixed. For example, a
mixture of Poisson and gamma distributions (i.e., the parameter of the
Poisson distribution follows a gamma distribution) will result in a
negative binomial distribution.


A NOTE ON THE TAIL 
BEHAVIOR 

Operational risk managers are concerned with finding a model that would
capture the "tail events." In the context of operational losses, it is
understood that tail events refer to the events in the upper tail of
the loss distribution. A crucial task in operational risk modeling is
to produce a model that would give a realistic account of the
possibility of losses exceeding a very high amount (this becomes
critical in the estimation of the Value-at-Risk).

In operational risk modeling, thin-tailed distributions should be used
with caution. The following example illustrates the danger of fitting a
light-tailed distribution to data whose true distribution is
heavy-tailed (note 15). We generated 5,000 points from the Pareto
distribution (heavy-tailed) with parameters $\alpha = 1.67$ and
$\beta = 0.6$. We then fitted an exponential distribution
(light-tailed) to the data. The MLE procedure resulted in the
exponential parameter of $\lambda = 1.61$. Figure 12 demonstrates the
difference in the behavior of the tails of both distributions. In the
far right, the probability of exceeding any high point is significantly
lower (roughly, by 5%) under the exponential fit. This indicates that
the probability of high-value events (and exceeding them) will be
underestimated if one commits the mistake of fitting a thin-tailed loss
distribution to the loss data. Such mistakes may be costly and lead to
serious consequences in operational risk management, if the potential
for high-magnitude losses is inadequately assessed.

Figure 12 Tails of Pareto and Exponential Distributions Fitted to
Simulated Pareto Random Variable
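An experiment of this kind can be sketched as follows. The code assumes
the Pareto form $F(x) = 1 - (\beta/x)^{\alpha}$ on $x > \beta$ and uses
the exponential MLE $\hat\lambda = n/\sum_j x_j$. The seed and the
thresholds are arbitrary, so the fitted $\hat\lambda$ depends on the
simulated sample and on the parameterization and need not reproduce the
value quoted in the text.

```python
import math
import random

rng = random.Random(42)  # fixed seed, for reproducibility only
alpha, beta = 1.67, 0.6  # Pareto parameters used in the example above
n = 5000

# Inverse-transform draws from F(x) = 1 - (beta / x)**alpha, x > beta.
sample = [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

# MLE of the exponential parameter: reciprocal of the sample mean.
lam_hat = n / sum(sample)


def pareto_tail(x):
    """True exceedance probability under the simulated Pareto model."""
    return (beta / x) ** alpha if x > beta else 1.0


def exponential_tail(x):
    """Exceedance probability under the fitted exponential model."""
    return math.exp(-lam_hat * x)


print(f"fitted exponential parameter: {lam_hat:.3f}")
for threshold in (20.0, 50.0, 100.0):
    print(
        f"x = {threshold:6.1f}: Pareto tail {pareto_tail(threshold):.2e}, "
        f"exponential tail {exponential_tail(threshold):.2e}"
    )
```

At high thresholds the exponential exceedance probability falls off far
faster than the power tail, which is the underestimation the example
warns about.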

In Table 1 common distributions are classified into two categories
depending on the heaviness of the right tail. Note that the Weibull
distribution can be thin-tailed or heavy-tailed depending on the value
of the shape parameter. Regarding the lognormal distribution, some
literature refers to it as a thin-tailed distribution, but we follow
Embrechts, Klüppelberg, and Mikosch (1997), who put it in the class of
medium-tailed distributions. The beta distribution has a bounded
support, which makes it a thin-tailed distribution.

EMPIRICAL EVIDENCE WITH 
OPERATIONAL LOSS DATA 

In this section we provide results from empirical studies based on
operational loss data that apply the distributions described in this
entry. There are two types of studies: those based on real operational
loss data and those based on simulated data.

Table 1 Tail Behavior of Common Loss Distributions

Name          Tail $\bar F(x)$                                                        Parameters

Thin-Tailed Distributions
Normal        $\bar F(x) = 1 - \Phi\left(\frac{x - \mu}{\sigma}\right)$                   $-\infty < \mu < \infty$, $\sigma > 0$
Exponential   $\bar F(x) = e^{-\lambda x}$                                            $\lambda > 0$
Gamma         $\bar F(x) = 1 - \Gamma(\alpha; \beta x)$                               $\alpha, \beta > 0$
Weibull       $\bar F(x) = e^{-\beta x^{\alpha}}$                                     $\alpha \ge 1$, $\beta > 0$
Beta          $\bar F(x) = 1 - I(x; \alpha, \beta)$                                   $\alpha, \beta > 0$

Medium-Tailed and Heavy-Tailed Distributions
Lognormal     $\bar F(x) = 1 - \Phi\left(\frac{\log x - \mu}{\sigma}\right)$              $-\infty < \mu < \infty$, $\sigma > 0$
Weibull       $\bar F(x) = e^{-\beta x^{\alpha}}$                                     $0 < \alpha < 1$, $\beta > 0$
Pareto        $\bar F(x) = \left(\frac{\beta}{x}\right)^{\alpha}$                       $\alpha, \beta > 0$
Burr          $\bar F(x) = \left(\frac{\beta}{\beta + x^{\gamma}}\right)^{\alpha}$      $\alpha, \beta, \gamma > 0$

The empirical studies indicate that practitioners try a variety of
possible loss distributions for the loss data and then determine an
optimal one on the basis of goodness-of-fit tests. It is common to use
the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests to examine
the goodness of fit of the model to the data. The two tests use
different measures of the discrepancy between the fitted continuous
distribution and the empirical distribution functions. The KS test
better captures the discrepancy around the median of the data, while
the AD test is better suited to the tails. A smaller value of the test
statistic indicates a better fit. Other goodness-of-fit tests include
Kuiper, Cramér-von Mises, and Pearson's $\chi^{2}$ test, among others.
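The two statistics can be computed directly from the ordered sample and
the fitted distribution function. The sketch below uses the textbook
definitions: the KS statistic as the largest distance between the
empirical and fitted distribution functions, and the AD statistic in
its usual quadratic form. The studies summarized in this entry may use
scaled or otherwise modified variants, so this is an assumption rather
than a reproduction of their calculations; the tiny sample is made up.

```python
import math


def ks_statistic(sample, cdf):
    """Largest distance between the empirical and fitted distribution functions."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = cdf(x)
        # The empirical distribution function jumps from (i - 1)/n to i/n at x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


def ad_statistic(sample, cdf):
    """Quadratic Anderson-Darling statistic; it weights the tails more heavily."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for i in range(1, n + 1):
        lower = cdf(ordered[i - 1])  # i-th smallest observation
        upper = cdf(ordered[n - i])  # i-th largest observation
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n


def uniform_cdf(x):
    """Distribution function of the uniform (0, 1) law, used as the fitted model."""
    return min(max(x, 0.0), 1.0)


toy_sample = [0.1, 0.4, 0.7]  # made-up observations
ks_value = ks_statistic(toy_sample, uniform_cdf)
ad_value = ad_statistic(toy_sample, uniform_cdf)
print(f"KS = {ks_value:.4f}, AD = {ad_value:.4f}")
```

In practice `cdf` would be the distribution function of the candidate
loss distribution evaluated at its MLE parameter estimates, and the
candidate with the smallest statistic would be preferred.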

Studies with Real Data 

We review some empirical studies based on real 
operational loss data from financial institutions. 

Muller Study of 1950-2002 Operational 
Loss Data 

Muller (2002) carried out empirical analysis with external operational
loss data obtained from worldwide institutions in the 1950-2002 period,
made available then by the IC² Operational Loss FIRST Database. Only
data in U.S. dollars for the events whose state of affairs was "closed"
or "assumed closed" on an indicated date were considered for the
analysis. The data were available for five loss types:

* "Relationship" (such as events related to legal issues, negligence,
and sales-related fraud).

* "Human" (such as events related to employee errors, physical injury,
and internal fraud).

* "Processes" (such as events related to business errors, supervision,
security, and transactions).

* "Technology" (such as events related to technology and computer
failures and telecommunications).

* "External" (such as events related to natural and man-made disasters
and external fraud).

Figure 13 shows the histograms of the five data sets. There is a clear
peak in the beginning, which is captured by the excess kurtosis; a
heavy right tail is also evident and is captured by the high degree of
positive skewness (see Table 2).

From the common distributions discussed in this entry, exponential,
lognormal, Weibull, gamma, and Pareto distributions were used. Table 2
reports the five samples' MLE parameter estimates and KS and AD
statistic values for the five distributions. The center of the data is
best explained by the lognormal distribution, as is concluded from the
lowest KS statistic values, for all except "Technology" type losses,
for which Weibull is the best. The same conclusions are drawn regarding
the tails of the datasets.

Cruz Study of Legal Loss Data 

Cruz (2002) applies exponential, Weibull, and Pareto distributions to a
sample (in U.S. dollars) from a legal database (from an undisclosed
source), consisting of 75 points (note 16). The sample's descriptive
statistics, as well as the MLE parameters for the three distributions
(note 17) and goodness-of-fit statistics, are reported in Table 3. The
data are highly leptokurtic and significantly right-skewed. Based on
visual and formal tests for the goodness of fit (note 18), Cruz
concluded that the Pareto distribution fits the data best.
Nevertheless, none of the considered loss distributions is able to
capture well the heaviness of the upper tail.

Moscadelli Study of 2002 LDCE Operational 
Loss Data 

Moscadelli (2004) explores the data (in euros) collected by the Risk
Management Group (RMG) of the Basel Committee in June 2002's
Operational Risk Loss Data Collection Exercise (LDCE). There were 89
participating banks from 19 countries worldwide that provided their
internal loss data for the year 2001. The data were classified into
eight business lines and pooled together across all banks. The eight
business lines are:

* BL1: Corporate Finance. 

* BL2: Trading and Sales. 

* BL3: Retail Banking. 

* BL4: Commercial Banking. 

* BL5: Payment and Settlement. 

* BL6: Agency Services. 

* BL7: Asset Management. 

* BL8: Retail Brokerage. 

The lognormal, gamma, Gumbel, Pareto, and exponential distributions
were fitted to the data. The estimation procedure used in the study was
somewhat simplified for two reasons. First, different banks used
different minimum truncation levels for their internal data, roughly
between €6,000 and €10,000. This issue was ignored in the estimation
process. Second, the data across all participating banks were pooled
together without any consideration given to bank characteristics such
as size.

Table 4 reproduces the sample descriptive statistics (based on 1,000
bootstrapped samples generated from the original data), MLE parameter
estimates (based on the original data), and goodness-of-fit test
statistics (note 19) for the lognormal and Gumbel distributions
(note 20). Other considered distributions showed a poor fit. Although
lognormal and Gumbel fitted the main body of the data rather well, they
performed poorly in the upper tail, according to Moscadelli. This was
confirmed by the test statistic values above the 90% critical values,
meaning that it is unlikely that the data come from a selected
distribution at the 90% confidence level.

He further performs the analysis of the data using the extreme value
theory argument for modeling high losses with the GPD, finding that GPD
outperforms other considered distributions. He also confirms the
findings from other empirical studies that operational losses follow a
very heavy-tailed distribution.

Figure 13 Relative Frequency Histograms of Operational Loss Data in Muller Study


Table 2 Sample Description, Parameter Estimates, and Goodness-of-Fit Tests in the Muller Study

                        "Relationship"  "Human"     "Processes"  "Technology"  "External"

1. Sample Description
# obs.                  585             647         214          61            220
Mean ($ '000,000)       0.0899          0.1176      0.3610       0.0770        0.0930
Median ($ '000)         12.8340         6.3000      50.1708      11.0475       8.9076
St.Dev. ($ '000,000)    0.3813          0.7412      1.0845       0.1351        0.4596
Skewness                11.1717         18.8460     7.8118       3.0699        10.9407
Kurtosis                152.2355        418.8717    81.5218      14.7173       136.9358

2. MLE Parameter Estimates and Goodness-of-Fit Test Statistics
Exponential distribution
λ                       9.0·10^7        0.15·10^7   0.36·10^7    7.7·10^7      9.3·10^7
KS test                 0.4024          0.5489      0.3864       0.3909        0.4606
AD test                 1.2·10^5        8460        3.9185       1.9687        430.2
Lognormal distribution
μ                       16.2693         15.9525     17.6983      16.1888       15.9696
σ                       2.1450          2.4551      2.2883       2.5292        2.2665
KS test                 0.0301          0.0530      0.0620       0.1414        0.0449
AD test                 0.0787          0.1213      0.1600       0.3043        0.1597
Weibull distribution
α                       0.0002          0.0008      0.0001       0.0003        0.0004
β                       0.4890          0.4162      0.4822       0.4692        0.4527
KS test                 0.0608          0.0907      0.0656       0.1179        0.0749
AD test                 0.4335          0.2231      0.2247       0.2372        0.2696
Gamma distribution
α                       -               -           0.3372       0.3425        -
β                       -               -           1.07·10^9    0.2·10^9      -
KS test                 -               -           0.1344       0.1357        -
AD test                 -               -           -            -             -
Pareto distribution
α                       -0.8014         -0.8936     -0.7642      -0.6326       -0.8498
β                       1.8·10^7        1.6·10^7    8.5·10^7     2.8·10^7      1.4·10^7
KS test                 0.1296          0.1979      0.1504       0.2812        0.1783
AD test                 0.4031          0.5566      0.6256       1.0918        0.4784


Table 3 Sample Descriptive Statistics, Parameter Estimates, and
Goodness-of-Fit Tests in the Cruz Study

1. Sample Description
Mean ($)        439,725.99
Median ($)      252,200
St.dev. ($)     538,403.93
Skewness        4.42
Kurtosis        23.59

2. MLE Parameter Estimates and Goodness-of-Fit Test Statistics
Exponential     λ = 440,528.63                   KS test: 0.2104    W² test: 1.3525
Weibull         α = 2.8312, β = 0.00263          KS test: 0.3688    W² test: 4.8726
Pareto          α = 6.1737, β = 2,275,032.12     KS test: 0.1697    W² test: 0.8198

Source: Cruz (2002), pp. 57, 58, and 60, with modifications.


Table 4 Sample Descriptive Statistics, Parameter Estimates, and Goodness-of-Fit Statistics in the Moscadelli Study

                   BL1      BL2      BL3      BL4      BL5      BL6      BL7      BL8

1. Sample Description
# obs.             423      5,132    28,882   3,414    1,852    1,490    1,109    3,267
Mean (€'000)       646      226      79       356      137      222      195      125
St.dev. (€'000)    6,095    1,917    887      2,642    1,320    1,338    1,473    1,185
Skewness           16       23       55       15       24       13       25       32
Kurtosis           294      674      4,091    288      650      211      713      1,232

2. MLE Parameter Estimates and Goodness-of-Fit Test Statistics
Lognormal distribution
μ                  3.58     3.64     3.17     3.61     3.37     3.74     3.79     3.58
σ                  1.71     1.27     0.97     1.41     1.10     1.28     1.28     1.08
KS test            0.18     0.14     0.18     0.16     0.15     0.12     0.11     0.12
AD test            22.52    181      1,653    174      73.74    46.33    25.68    87.67
Gumbel distribution
μ                  93.96    51.76    25.63    48.30    35.86    54.82    56.78    41.03
σ                  602      185      58.80    204      110      181      154      93.51
KS test            0.43     0.37     0.34     0.37     0.36     0.35     0.32     0.31
AD test            125      1,224    6,037    831      436      333      204      577

Source: Moscadelli (2004), pp. 19 and 25.

De Fontnouvelle-Rosengren-Jordan Study of 
2002 LDCE Operational Loss Data 

The dataset examined in Moscadelli was also analyzed by de
Fontnouvelle, Rosengren, and Jordan (2006). They limited their analysis
to the data collected from six banks, and performed the analysis on a
bank-by-bank basis, rather than pooling the data as was done in
Moscadelli. For confidentiality reasons, only the data belonging to
four business lines and five loss types were included in the analysis.
The business lines are Trading and Sales (BL1), Retail Banking (BL2),
Payment and Settlement (BL3), and Asset Management (BL4). The loss
types are Internal Fraud (LT1), External Fraud (LT2), Employment
Practices and Workplace Safety (LT3), Clients, Products and Business
Practices (LT4), and Execution, Delivery and Process Management (LT5).

The following distributions were considered for the study: exponential,
Weibull, lognormal, gamma, loggamma (i.e., log of data is
gamma-distributed), 1-parameter Pareto, Burr, and loglogistic
(note 21). The distributions were fitted using the MLE method. Overall,
heavy-tailed distributions (Burr, loggamma, loglogistic, and
1-parameter Pareto) fit the data very well, while the fit of
thin-tailed distributions is poor, as expected. In particular, losses
of LT3 are well fit by most of the heavy-tailed distributions and
lognormal. In many cases, the estimated parameters would be
unreasonable, for example resulting in a negative mean loss. For some
BL and LT data sets, the models failed the $\chi^{2}$ goodness-of-fit
test for all considered cases. Hence, de Fontnouvelle, Rosengren, and
Jordan performed additional analysis using the extreme value theory and
fitting the GPD to the data exceeding a high threshold (note 22).

Lewis Study of Legal Liability Loss Data

Lewis (2004) reports his findings for a sample (in British pounds) of
legal liability losses (from an undisclosed source), consisting of 140
points (note 23). He fits the normal, exponential, and Weibull
distributions (note 24) to the data and compares the fit. Table 5 shows
the descriptive statistics for the sample, the MLE parameters for three
fitted distributions, and the values of the AD goodness-of-fit
statistic. The data are highly leptokurtic and significantly
right-skewed. As expected, the normal distribution results in a very
poor fit, and the Weibull distribution seems the most reasonable
assumption, based on the lowest value of the AD test statistic.


Table 5 Sample Descriptive Statistics, Parameter Estimates, and
Goodness-of-Fit Tests in the Lewis Study

1. Sample Description
Mean (£)        151,944.04
Median (£)      103,522.90
St.dev. (£)     170,767.06
Skewness        2.84
Kurtosis        12.81

2. MLE Parameter Estimates and Goodness-of-Fit Test Statistics
Normal          μ = 151,944.04, σ = 170,767.06    AD test: 8.090
Exponential     λ = 151,944.04                    AD test: 0.392
Weibull         α = 0.95446, β = 0.00001          AD test: 0.267

Source: Lewis (2004), p. 88, with modifications.

Studies with Simulated Data 

A number of studies on operational risk that have appeared in the
literature have used simulated rather than real data. We present a few
examples here.

Reynolds-Syer Study 

Reynolds and Syer (2003) apply a nonparametric approach to modeling
operational loss severity. They use a hypothetical sample of six-year
internal operational loss data of a firm, with a total of 293
observations. The summary of input data is given in Table 6. Using the
sample of historic data, sampling is repeated a large number of times,
and 1,000 simulated years are created. For each year, the simulated
losses are summed up. The distribution of yearly aggregated operational
losses is assumed to follow the resulting empirical distribution.


Table 6 Sample Descriptive Statistics of Loss Data in the Reynolds-Syer
Study

Year    # obs.    Total ($ '000,000)    Average ($ '000)    St. Dev. ($ '000)
2000    64        7.55                  117.9               109.6
2001    57        6.35                  111.3               106.2
2002    52        5.14                  98.8                93.7
2003    55        5.29                  96.1                88.0
2004    43        3.86                  89.7                78.5
2005    45        3.41                  75.7                68.5

Source: Reynolds and Syer (2003), p. 204.
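A resampling scheme in the spirit of this nonparametric approach can be
sketched as follows. It is not the authors' exact procedure: the way
the number of losses per simulated year is chosen (resampling one of
the observed yearly counts) is an assumption, the severities are
made-up lognormal draws rather than the study's data, and only the
yearly counts are borrowed from Table 6. The helper names are
hypothetical.

```python
import random


def simulate_annual_totals(losses_by_year, n_years, rng):
    """Bootstrap yearly aggregate losses from a multi-year loss history."""
    pooled = [loss for year in losses_by_year for loss in year]
    counts = [len(year) for year in losses_by_year]
    totals = []
    for _ in range(n_years):
        n_events = rng.choice(counts)                # resample a yearly loss count
        resampled = rng.choices(pooled, k=n_events)  # severities, with replacement
        totals.append(sum(resampled))
    return totals


def empirical_quantile(values, q):
    """Simple order-statistic quantile of the simulated totals."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


rng = random.Random(2003)  # fixed seed, for reproducibility only

# Hypothetical six-year history: counts as in Table 6, severities made up.
yearly_counts = [64, 57, 52, 55, 43, 45]
history = [[rng.lognormvariate(11.0, 1.0) for _ in range(k)] for k in yearly_counts]

annual_totals = simulate_annual_totals(history, 1000, rng)
q50 = empirical_quantile(annual_totals, 0.50)
q99 = empirical_quantile(annual_totals, 0.99)
print(f"median annual loss {q50:,.0f}, 99% quantile {q99:,.0f}")
```

The empirical distribution of `annual_totals` then plays the role of
the yearly aggregate loss distribution, and high quantiles of it can be
read off directly.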

Rosenberg-Schuermann Study 

Rosenberg and Schuermann (2006) use a Monte Carlo approach to generate
a sample of 200,000 operational losses. For the loss distribution they
consider a 1-parameter Pareto distribution with parameter
1/0.65 = 1.5385. This parameter is based on the average of the
exponential parameters (note 25) of 1/0.64 and 1/0.66, obtained for
logarithmic losses from the OpRisk Analytics database and OpVantage
database, respectively, in the empirical study carried out by de
Fontnouvelle, DeJesus-Rueff, Jordan, and Rosengren (2003). Recall that
since the shape parameter is greater than one but less than two, such a
Pareto distribution has a finite mean but an infinite variance. To
guarantee the existence of the first two moments, Rosenberg and
Schuermann set a log-loss greater than 1,000 standard deviations equal
to a loss of 1,000 standard deviations.

KEY POINTS 

* Broadly, one can classify the approaches to model operational loss
magnitudes into two groups: nonparametric approach and parametric
approach.

* Under the nonparametric approach, one can either model the losses
using the empirical distribution function, or one can fit a smooth
curve to the histogram of the data and analyze the properties of the
curve instead.

* Under the parametric approach, one can fit one (or more) of common
parametric distributions directly to the data (and compare them).

* Because of the specific nature of the operational loss data, the
distributions that are most likely to find application to modeling the
losses are those that are right-skewed and are defined only on the
positive values of the underlying random variable. These distributions
include the exponential, lognormal, Weibull, gamma, beta, Pareto, Burr,
and mixture distributions.

* Operational risk managers are concerned with finding a model that
would capture the "tail events." Common distributions are classified
into two categories depending on the heaviness of the right tail:
light-tailed and heavy-tailed. In operational risk modeling,
light-tailed distributions should be used with caution.

* There have been several empirical studies with operational loss data.
Two types of empirical studies are distinctive: studies that use real
loss data and studies that use simulated data. Generally, most of the
studies suggest that heavy-tailed loss distributions (such as lognormal
or Pareto) best describe operational loss magnitudes.


NOTES 

1. An example is cubic spline approxima¬ 
tion as is done in Rosenberg and Schuer¬ 
mann (2006). Useful references on this 
approach include Silverman (1986) and 
Scott (1992). 

2. See Rosenberg and Schuermann (2006). 

3. See Cizek, Flardle, and Weron (2005). 

4. To be more precise, for a discrete random 
variable it is called probability mass func¬ 
tion. 

5. The lognormal distribution was proposed 
by the Basel Committee for the operational 
risk modeling in 2001. 

6. T(fl) is the complete gamma function, 

F(fl) = J 0 °° When a is an integer, 

then T(fl) = (a — 1)! 

7. See Madan and Unal (2004) and Kremer 
(1998). 

8. See Mittnik and Rachev (1993a, 1993b). 

9. T (a; b) is the incomplete gamma function 

defined as T (a;b) = / 0 & t n ~ 1 e~ t dt. 

10. In this case, the gamma distribution is called 
the Erlang distribution. 

11. I(x;a, p) is the regularized beta function 

equal to /* w“ _1 ( 1 - uf- l du x . 

12. See Cizek, Flardle, and Weron (2005). 
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13. See Klugman, Panjer, and Willmot 
(2004). 

14. See Klugman, Panjer, and Willmot (2004). 

15. In literature, thin-tailed distributions are 
also called light-tailed distributions, and 
heavy-tailed distributions are also called 
fat-tailed distributions. We will use the cor¬ 
responding terms interchangeably. 

16. Original dataset is available from Cruz 
(2002), Chapter 3, p. 57. 

17. Note that the density specification for the 
exponential and Weibull distributions in 
Cruz (2002) are different. We report the pa¬ 
rameter values based on the specifications 
of the density functions as presented in this 
entry. 

18. The KS values reported in Table 3 should 
be further scaled by ~Jyi (n being the sample 
length) if we want to compare the goodness 
of fit across samples of different lengths. 

19. The test statistics are unadjusted to the 
length of data. 

20. The Gumbel distribution is light-tailed 

and has density f(x) = £ exp { — — 

exp { —^-}), defined on x e Di. The sup¬ 
port allows for negative loss values, so 
the Gumbel distribution is unlikely to 
find much application in operational risk 
modeling. 

21. The density of the loglogistic distribution is 
f(x) — ax 1/b ~ 1 /[b( 1 + ax 1/b ) 2 ]. 

22. For the tables with the $\chi^2$ goodness-of-fit
statistic values and other details of this
empirical study we refer the reader to
de Fontnouvelle, Rosengren, and Jordan
(2006).

23. Original dataset is available from Lewis 
(2004), Chapter 7, p. 87. 

24. Lewis (2004) does not report the parameter
estimates for the Gaussian and Weibull
cases. We computed them directly by fitting
the distributions to the data.

25. We stated earlier that an exponential
transformation of an exponentially distributed
random variable follows a 1-parameter
Pareto distribution.

Introduction to Stochastic
Programming and Its Applications
to Finance

Abstract: Mathematical programming is one of a number of operations research techniques that
employs mathematical optimization models to assist in decision making. Mathematical programming
includes linear programming, integer programming, mixed-integer programming, nonlinear
programming, stochastic programming, and goal programming. Mathematical programming models
allow the decision maker to identify the "best" solution. This is in contrast to other mathematical
tools that are in the arsenal of decision makers such as statistical models (which tell the decision
maker what occurred in the past), forecasting models (which tell the decision maker what might
happen in the future), and simulation models (which tell the decision maker what will happen
under different conditions). The mean-variance model for portfolio selection as formulated by
Markowitz is an example of an application of one type of mathematical programming (quadratic
programming). However, in formulating optimization models in many applications in finance,
decision makers need to take into consideration the uncertainty about the model's parameters and the
multiperiod nature of the problem faced. To deal with these situations, the technique of stochastic
programming is employed.


The dynamic nature of financial decision making
requires the use of tools that are capable
of capturing the multiperiod nature inherent in
problems faced by asset managers in portfolio
selection decisions and financial managers in
capital budgeting decisions. These tools should
be understandable with adequate treatment of
uncertainty. They should incorporate practical
considerations, such as transaction costs in the
case of asset managers. Stochastic programming
bears these characteristics. In this entry,
we discuss the basics of stochastic programming,
give a brief history, and emphasize its
importance by comparing the approach to other
tools used in finance.

WHAT IS STOCHASTIC 
PROGRAMMING? 

Stochastic programming is nothing but a fancy
name for the study of optimal decision making
under uncertainty. As opposed to "deterministic,"
the term "stochastic" implies that some
of the parameters of the problem are random
(that is, not known with certainty); the term
"programming" points to links with mathematical
programming and optimization algorithms.

Uncertainty is almost always inherent in real-world
decision problems (and even more so in
financial planning). As an example, we may
consider a bet whose outcome is determined by
flipping a coin. In such problems, uncertainty
of parameters may be due to the presence of
uncertain events (e.g., a coin flip in the previous
example) or simply due to lack of reliable data.

In the past, due to computational and informational
limitations, optimal decision models
were often formulated deterministically by
replacing the uncertainties with expectations or
best estimates. With contributions from many
disciplines, including operations research, and
improvements in information technology
(faster hardware and software), stochastic
programming is rapidly developing today.
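
To make the contrast concrete, the coin-flip bet mentioned earlier can be sketched in a few lines. Everything in this sketch is an illustrative assumption rather than part of the original entry: the 2-to-1 payoff, the logarithmic utility, and the grid of candidate stakes. The deterministic version replaces the random payoff with its expectation before optimizing; the stochastic version keeps both outcomes and maximizes expected utility.

```python
import math

P_WIN = 0.5   # fair coin
GAIN = 2.0    # hypothetical payoff: win 2 per unit staked
LOSS = 1.0    # the stake is lost otherwise


def expected_log_wealth(f):
    """Expected log of terminal wealth when a fraction f of wealth is staked."""
    return P_WIN * math.log(1 + GAIN * f) + (1 - P_WIN) * math.log(1 - LOSS * f)


def deterministic_wealth(f):
    """Terminal wealth when the random payoff is replaced by its expectation.

    Log utility is increasing, so maximizing this wealth is equivalent to
    maximizing its logarithm.
    """
    mean_payoff = P_WIN * GAIN - (1 - P_WIN) * LOSS
    return 1 + mean_payoff * f


grid = [k / 100 for k in range(0, 100)]          # stakes 0.00, 0.01, ..., 0.99
f_det = max(grid, key=deterministic_wealth)      # expectation-based model
f_sto = max(grid, key=expected_log_wealth)       # model that keeps both outcomes
print("deterministic stake:", f_det, "stochastic stake:", f_sto)
```

Under these assumptions the expectation-based model stakes as much as the grid allows, because it never sees the losing outcome, while the model that keeps both outcomes stakes a quarter of wealth.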

The main features of a stochastic program,
which can be viewed as an optimal decision
model with explicit consideration of uncertainties,
are:

• Random parameters with known (or partially
known) distributions.

• Several decision variables with many potential
values.

• Discrete time periods for decisions.

• Use of expectations (or other functions of
decision variables) for objectives.

The problem structure, constraints, and objectives
(risk/reward) are modeled across time
along with the uncertainty of events. Future
uncertainty is modeled through generating
scenarios over time. In other words, the
realizations of the uncertain parameters may be
(gradually) revealed after some or all of the
decisions have been made. High-performance
computers take advantage of sophisticated
algorithms to determine the optimal decision that
will take into account the future uncertainty. As
the uncertainty is revealed after each stage,
recourse decisions can be made in the light of new
information.


The relative importance of these main features
contrasts with other decision-making models,
such as statistical decision theory, decision
analysis, dynamic programming, Markov decision
processes, and stochastic control (SC). In
contrast to statistical decision theory, stochastic
programming has emphasized solution methods
and analytical solution properties over
procedures for constructing objectives and
updating probabilities. Stochastic programs
generally have higher dimensions (that is, larger
problem size) than SC models, which put more
emphasis on control rules and have more
restrictive constraint assumptions.

We can see the first forms of decision models
that involve uncertainty in the early days
of the history of mathematical programming.
Beale (1955), Dantzig (1955), and Charnes
and Cooper (1959) were the first to propose
linear programs with random parameters.
Dantzig named his approach "linear programming
under uncertainty," whereas Charnes and
Cooper called theirs "chance/probabilistically
constrained programming."

Subsequently, stochastic programming has
become a major subfield of mathematical
programming with several theoretical developments.
For overviews of the literature including
algorithms and applications, see Kall and Wallace
(1994), Infanger (1994), Ermoliev and Wets
(1988), Birge and Louveaux (1997), Wallace and
Ziemba (2003), Prékopa (1995), Higle and Sen
(1996), Wallace et al. (1996), Censor and Zenios
(1997), Wets and Ziemba (1999), Dupačová et al.
(2002), Birge et al. (2002), and Ruszczyński and
Shapiro (2003).

In general, stochastic optimization models result
in large-scale programs since they include a
large number of scenarios to reflect all possible
outcomes of future uncertainty. Therefore, efforts
on the algorithmic developments focused
on adaptations of large-scale linear programming
methods for special classes of stochastic
programs whose structures are exploitable. In
other words, emphasis was placed on forming
the deterministic equivalent program and
taking advantage of the structure of the resulting
formulation. Dantzig and Madansky (1961)
introduced Dantzig-Wolfe decomposition as a
possible method. One of the most successful
approaches has been the application of the Benders'
decomposition method (Benders, 1962) to
stochastic programs, originally developed by
Van Slyke and Wets (1969). Birge (1985) extended
this idea to multistage stochastic programs.
In general, these methods concentrate
on linear models. The diagonal quadratic
approximation (DQA) algorithm, originally
developed for linear programs by Mulvey and
Ruszczyński (1992), can handle both quadratic
and general convex (or equivalently concave)
objective functions with linear constraints, as
shown in Berger et al. (1994). The progressive
hedging algorithm, developed by Rockafellar
and Wets (1991), dualizes the nonanticipativity
constraints and, like DQA, iterates over scenarios
to force the decisions linked by these constraints
to be equal. Unlike DQA, there is no quadratic
penalty term and all scenarios are coordinated
through a master processor. Mulvey and
Vladimirou (1991) successfully implemented the
progressive hedging algorithm in the context of
stochastic networks.

Specialized software packages that employ
these methods are much faster than general
solvers. Combined with algebraic modeling
languages, such as AMPL, these specialized
stochastic programming solvers provide efficient
means of tackling problems that involve
high levels of uncertainty.

Stochastic Programming in Finance 

Financial planning represents one of the major
application areas of stochastic programming.
In fact, it is a natural domain for stochastic
programming, since risk needs to be incorporated
into investment decisions (portfolio decisions
and capital budgeting decisions) and
the problem structure is amenable to algebraic
constraints and relationships. Deterministic
approximations would fail to see the big
picture. For example, through stochastic programs,
portfolio allocations that would optimize
an investor's risk level under several
scenarios can be determined; by contrast, because
they ignore risk, deterministic programs
provide inadequate solutions. Static portfolio
selection models, based on Markowitz's
mean-variance model (1952), have been proposed in
many cases; however, their implementations
may result in significant transaction costs and
mistimed liquidation of assets. Examples of
application of stochastic programming in financial
planning can be found in Ziemba and Vickson
(1975) and Zenios (1992).

Within finance, stochastic programming applications
have greatly increased in recent
years, particularly in asset-liability management
(ALM). Multistage stochastic programs
take into account the dynamic aspects of ALM
problems faced by institutional investors. Based
on assumptions about the (joint) dynamics
of risk factors that are usually described by
stochastic processes, representative scenarios
for investment strategies are generated.
Transactions take place at discrete points in time over
a finite planning horizon. Moreover, several
constraints (e.g., liability considerations, liquidity
restrictions, limits on risk exposure) can be
taken into account.

Since multistage programs suffer from an exponential
growth in problem size with respect
to the number of periods under consideration,
the first models for ALM that appeared in the
early 1980s [see Kallberg et al. (1982), Kusy
and Ziemba (1986)] were restricted to a two-stage
structure due to computational limitations.
Mulvey and Vladimirou (1992) looked
at optimal investment strategies given liabilities
in a network environment. At Fannie Mae,
Holmer (1994) implemented a system to minimize
investment risk while taking into account
that firm's retained mortgage portfolio.
Advances in computing power, paired with efficient
algorithms that are specialized for stochastic
programming, help researchers implement
and solve very large scale stochastic programs,
such as the pension fund model of Gondzio and
Kouwenberg (2001), with millions of scenarios.

One of the first successful commercial multistage
stochastic programming applications is
the Russell-Yasuda Kasai model (see Cariño
et al., 1994, 1998; and Cariño and Ziemba, 1998).
The model was employed to optimize the investment
decisions for a Japanese insurance
company over time where investment returns
and liabilities are uncertain. The problem is
complicated by constraints to meet the random
liabilities and legal restrictions on the use of
income in Japan.

Other successful commercial applications include
the Towers Perrin-Tillinghast ALM system
of Mulvey et al. (2000), the fixed income
portfolio management models of Zenios (1995)
and Beltratti et al. (1999), and the InnoALM
system of Geyer and Ziemba (2008). A good
number of applications in ALM are provided in
Ziemba and Mulvey (1998), Ziemba (2003), and
Zenios and Ziemba (2006).

Among other areas in finance, capital budgeting
and fixed income portfolio management
have been researched extensively using
stochastic programming methods. For the former,
Lockett and Gear (1975), De et al. (1982),
and Turney (1990) are the earliest applications.
Bradley and Crane (1972) were the first to propose
stochastic programming for bond portfolio
management. Zenios and Kang (1993)
developed a portfolio immunization strategy in
a multiperiod stochastic optimization framework.
Granville et al. (1994) describe a dual
method for an asset-only allocation problem.
Many other applications in the fixed income
literature exist, including Hiller and Eckstein
(1993) and Golub et al. (1995).

STOCHASTIC 
PROGRAMMING VERSUS 
OTHER METHODS IN 
FINANCE 

In this section, we compare stochastic programming
with other methods applied to financial
planning (especially to ALM). First,
we highlight the dynamic aspects of stochastic
programming and show its differences with
static models. Afterward, we briefly discuss
continuous-time models in finance and compare
these models with stochastic programming, a
discrete-time approach.

Static versus Dynamic Models in 
Financial Planning 

The most well-known static model for financial
planning is, without a doubt, the mean-variance
model of Markowitz (1952). In this framework,
the minimum-variance portfolio that satisfies
a required expected return defines the optimal
portfolio. Mulvey (1989) extended this model
to account for liabilities by replacing the return
measure with the surplus (defined as assets
minus liabilities). Others have introduced
downside risk measures (e.g., conditional
value-at-risk, semivariance, mean absolute deviation,
to name a few) to replace variance, recognizing
the fact that variance is not a good
risk measure for most asset classes (such as
derivatives and fixed income securities) and
for long-term investors. (See, e.g., Fishburn,
1977, Worzel et al., 1994, and Rockafellar and
Uryasev, 2000.) Despite being computationally
attractive, static models are inappropriate for
long-term investors facing sequential decisions.
Single-period models, unlike dynamic models,
fail to cope with the dynamic aspects of the
problem, such as transaction costs.
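
As a small illustration of the static mean-variance problem described above, the following sketch finds the minimum-variance weights that meet a required expected return by solving the first-order (Lagrangian) conditions as a linear system. The three assets, their expected returns, and the covariance matrix are made-up numbers, short sales are allowed, and the helper names are hypothetical; it is meant as a sketch under these assumptions, not as the formulation used in the references.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def min_variance_weights(mu, cov, target):
    """Weights minimizing w'Cw subject to mu'w = target and sum(w) = 1."""
    n = len(mu)
    # Stationarity rows: 2 C w + l1 * mu + l2 * 1 = 0
    kkt = [[2 * cov[i][j] for j in range(n)] + [mu[i], 1.0] for i in range(n)]
    kkt.append(list(mu) + [0.0, 0.0])      # required expected return
    kkt.append([1.0] * n + [0.0, 0.0])     # budget constraint
    rhs = [0.0] * n + [target, 1.0]
    return solve(kkt, rhs)[:n]


def variance(w, cov):
    """Portfolio variance w'Cw."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical inputs: cash-like asset, bonds, stocks
mu = [0.03, 0.07, 0.10]
cov = [[0.0004, 0.0002, 0.0001],
       [0.0002, 0.0225, 0.0060],
       [0.0001, 0.0060, 0.0400]]
w = min_variance_weights(mu, cov, target=0.06)
print("weights:", w, "variance:", variance(w, cov))
```

The result is a single set of weights for a single period, which is exactly the limitation the text points out: nothing in this problem knows about later rebalancing or its transaction costs.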

Among other static models, duration-matching
models seem to be interesting, especially
for ALM. These models seek to protect
the surplus against interest rate uncertainty.
The optimal portfolio of assets is the lowest-cost
portfolio whose value and duration are
equal to those of liabilities. Applications can
be quite successful in certain cases, for example,
when a defined benefit pension plan has
been terminated and taken over by an insurance
company or when the transaction costs
are low. However, these models ignore the facts
that individual cash inflows and outflows are
not matched and that one needs to adjust to the
changes in duration at every stage (which leads
to high transaction costs). Therefore, computational
and structural advantages of these models
are insufficient to justify their drawbacks.
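
The matching conditions can be illustrated with the simplest possible case. The sketch below assumes only two candidate assets with known durations, so the two conditions (equal value, equal duration) pin down the holdings and no cost minimization is left to do; the function name and the numbers are hypothetical.

```python
def immunizing_holdings(liab_value, liab_duration, dur_short, dur_long):
    """Amounts to hold in a short- and a long-duration asset.

    Solves: short + long = liab_value
            (dur_short * short + dur_long * long) / liab_value = liab_duration
    """
    long_amt = liab_value * (liab_duration - dur_short) / (dur_long - dur_short)
    short_amt = liab_value - long_amt
    return short_amt, long_amt


# Liabilities worth 1,000 with a duration of 5 years, matched with
# assets whose durations are 2 and 10 years (illustrative values).
short_amt, long_amt = immunizing_holdings(1000.0, 5.0, 2.0, 10.0)
print(short_amt, long_amt)
```

Because durations drift as time passes and rates move, these holdings would have to be recomputed and traded at every stage, which is the source of the transaction costs mentioned above.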

Dynamic models, in contrast, provide substantial
flexibility to address the issues faced
by long-term investors. They are not as easy
to solve and conceptualize as static models;
however, as discussed earlier in this entry,
the advances in technical aspects of stochastic
programming and today's computational
power more than make up for these difficulties.
Among these methods, dynamic programming
seems to be especially interesting from an
ALM perspective, as the optimal decisions are
obtained in feedback form. However, it suffers
from the curse of dimensionality as the planning
horizon or the uncertainty representation is
extended. An alternative method to overcome this
problem is to specify a decision rule within the
same framework, which also helps handle the
transaction costs more easily. Nevertheless,
incorporating decision rules leads to nonconvex
optimization models (see Mulvey and Simsek,
2002). Fleten et al. (2002) illustrate the superior
performance of dynamic models over static
models. In the next section, we discuss the two
major types of dynamic models.

Continuous-Time Models versus 
Stochastic Programming 

Continuous-time models were introduced to
the finance literature by Merton (1969). The
variables that define the states of the world are
modeled through stochastic differential equations
(SDEs). Asset prices also follow SDEs whose
parameters may be state and/or time dependent.
Trading is assumed to occur continuously.
Under additional assumptions on investors'
preferences (that is, utility functions) and the
structure of the economy, an explicit analytical
solution can be found for these models by
SC techniques. Thus, they provide better insights
than the stochastic programming solutions,
which are hard to generalize. However,
as Cochrane (2001, p. 28) suggests: "... in the
complexity of most practical situations, one often
ends up resorting to numerical simulation
of a discretized model anyway."

Although some of the SC recommendations
are implementable, the model simplifications
may render them ineffective. As these models
cannot incorporate complex constraints imposed
by realistic situations and most investors
(e.g., pension funds) do not want to trade
continuously, we turn to stochastic programming,
which allows decisions to be made at a finite
number of discrete points in time.

In most cases, stochastic programming models
require that the uncertainties be approximated
by a scenario tree with a finite number of
states of the world at each time. As Kouwenberg
and Zenios (2006, p. 291) suggest: "...
important practical issues such as transaction
costs, multiple state variables, market
incompleteness, taxes and trading limits, regulatory
restrictions, and corporate policy requirements
can be handled simultaneously within the
framework." This huge practical advantage,
unfortunately, comes at a significant cost: the curse of
dimensionality. As analytical solutions are not
possible, stochastic programming models need
to be solved via numerical optimization. The
model size explodes as the size of the state space
or the number of decision stages increases. In
recent years, this drawback has been substantially
overcome through the development of
new algorithms and the advances in computing
power. Still, one should be careful about
incorporating too much detail into a stochastic
programming model, not because of the
computational disadvantages but mainly to avoid
confusing the decision maker, since SP solutions
are hard to generalize.
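
The growth in model size can be quantified with a short calculation. The helper below is a hypothetical illustration: given the number of branches at each stage of a scenario tree, it returns the number of scenarios (leaves) and the total number of nodes, each of which carries its own copy of the decision variables.

```python
def tree_size(branching):
    """Return (number of scenarios, number of nodes) for a scenario tree.

    branching[k] is the number of successors of every node at stage k.
    """
    nodes, level = 1, 1          # start with the root node
    for b in branching:
        level *= b               # nodes at the next stage
        nodes += level
    return level, nodes


print(tree_size([2, 2, 2]))      # (8, 15)
print(tree_size([10] * 5))       # (100000, 111111)
```

Going from three binary stages to five stages with ten branches each raises the scenario count from 8 to 100,000, which is why the number of stages and the branching per stage have to be chosen with care.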

It is, however, interesting to note that the
continuous-time models have been the focus
of research in the financial economics literature,
whereas models in the operations research literature
are mostly stated in discrete time. As Berger
(1995) points out, there have been several
successful applications of SC, such as the
Black-Scholes option pricing formula (Black
and Scholes, 1973) and the continuous-time capital
asset pricing model (Merton, 1973). See
also Constantinides (1986), Dumas and Luciano
(1991), and Shreve and Soner (1991) for SC
applications with practical considerations such as
transaction costs.

A GENERAL MULTISTAGE 
STOCHASTIC 
PROGRAMMING MODEL 
FOR FINANCIAL PLANNING 

To illustrate the use of stochastic programming,
we provide in this section a multistage stochastic
program to tackle a long-term investment
problem. We formulate the deterministic equivalent
of the stochastic program and we discuss
the issue of modeling the uncertain parameters
through scenario generation methods.

Model Formulation 

Here, we define the multiperiod investment 
problem as a multistage stochastic program. 
The basic model is a variant of Mulvey et al. 
(1997), with special attention to transaction 
costs. 

To define the model, we divide the entire planning
horizon T into two discrete time intervals
$T_1$ and $T_2$, where $T_1 = \{0, 1, \ldots, \tau\}$ and
$T_2 = \{\tau + 1, \ldots, T\}$. The former corresponds to
periods in which investment decisions are made.
Period $\tau$ defines the end of the planning horizon.
We focus on the investor's position at the
beginning of period $\tau$. Decisions occur at the
beginning of each time stage. Much flexibility exists.
An active trader might see his time interval
as short as minutes, whereas a pension plan adviser
will be more concerned with much longer
planning periods such as the dates between the
annual board of directors' meetings. It is possible
for the steps to vary over time: short intervals
at the beginning of the planning period
and longer intervals toward the end. $T_2$ handles
the horizon at time $\tau$ by calculating economic
and other factors beyond period $\tau$ up to period
T. The investor renders passive decisions after
the end of period $\tau$.

Asset classes are defined by set $A = \{1, 2, \ldots, I\}$,
with category 1 representing cash.
The remaining asset classes can include growth
and value stocks, bonds, real estate, hedge
funds, or private equity. The asset classes
should track well-defined market segments.
Ideally, the co-movements between pairs of asset
class returns would be relatively low so
that diversification can be done across the asset
classes.

In multiperiod models, uncertainty is represented
by a set of distinct realizations, called
scenarios, $s \in S$. The scenarios may reveal identical
values for the uncertain quantities up to a
certain period; that is, they share common
information history up to that period. We address
the representation of the information structure
through nonanticipativity constraints, which
require that variables sharing a common history,
up to time period t, must be set equal to
each other (see equation (7) below).
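
One way to see what these constraints involve is to list, for each period, which scenarios are indistinguishable. The sketch below is only an illustration: scenarios are represented as tuples of observed outcomes (a hypothetical encoding), and a decision taken at period t is assumed to know the outcomes of periods 0 through t - 1. Scenarios that land in the same group must receive equal decisions at that period.

```python
from collections import defaultdict


def nonanticipativity_groups(histories, t):
    """Group scenario indices whose observed history is identical before period t."""
    groups = defaultdict(list)
    for s, path in enumerate(histories):
        groups[tuple(path[:t])].append(s)
    return list(groups.values())


# Four scenarios from a two-period tree with "up"/"down" moves (illustrative)
histories = [("u", "u"), ("u", "d"), ("d", "u"), ("d", "d")]
for t in range(3):
    print(t, nonanticipativity_groups(histories, t))
```

At period 0 all four scenarios fall into one group, so the first decision is common to all of them; by period 2 every scenario stands alone and the decisions are free to differ.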

We assume that the portfolio is rebalanced
at the beginning of each period. Alternatively,
we could simply make no transaction except to
reinvest any dividend and interest (a buy-and-hold
strategy). For convenience, we also assume
that the cash flows are reinvested in the generating
asset class and all the borrowing (if any)
is done on a single-period basis.

For each $i \in A$, $t \in T_1$, and $s \in S$, we define the
following parameters and decision variables.

Parameters

$r^s_{i,t}$       $= 1 + \rho^s_{i,t}$, where $\rho^s_{i,t}$ is the percent return for
                  asset i, in time period t, under scenario s
                  (projected by a stochastic scenario generator, for
                  example, see Mulvey et al. [2000]).

$\pi_s$           Probability that scenario s occurs, $\sum_{s \in S} \pi_s = 1$.

$w_0$             Wealth at the beginning of time period 0.

$\sigma_{i,t}$    Transaction costs incurred in rebalancing asset i
                  at the beginning of period t (symmetric
                  transaction costs are assumed, that is, cost of
                  selling equals cost of buying).

$\beta^s_t$       Borrowing rate in period t, under scenario s.

Decision Variables

$x^s_{i,t}$       Amount of money in asset class i, at the
                  beginning of time period t, under scenario s,
                  after rebalancing.

$v^s_{i,t}$       Amount of money in asset class i, at the
                  beginning of time period t, under scenario s,
                  before rebalancing.

$w^s_t$           Total wealth at the beginning of time period t,
                  under scenario s.

$p^s_{i,t}$       Amount of asset i purchased for rebalancing in
                  period t, under scenario s.

$d^s_{i,t}$       Amount of asset i sold for rebalancing in period
                  t, under scenario s.

$b^s_t$           Amount of money borrowed at the beginning
                  of period t, under scenario s.

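Given these definitions, we present the deterministic
equivalent of the stochastic asset-only
allocation problem.

$$\max \; E[U(w_\tau)] = \sum_{s \in S} \pi_s \, U(w^s_\tau) \tag{1}$$

subject to

$$\sum_{i \in A} x^s_{i,0} = w_0 \qquad \forall s \in S \tag{2}$$

$$\sum_{i \in A} x^s_{i,\tau} = w^s_\tau \qquad \forall s \in S \tag{3}$$

$$v^s_{i,t} = r^s_{i,t-1} \, x^s_{i,t-1} \qquad \forall s \in S,\; t = 1, \ldots, \tau,\; i \in A \tag{4}$$

$$x^s_{i,t} = v^s_{i,t} + p^s_{i,t}(1 - \sigma_{i,t}) - d^s_{i,t} \qquad \forall s \in S,\; i \in A \setminus \{1\},\; t = 1, \ldots, \tau \tag{5}$$

$$x^s_{1,t} = v^s_{1,t} + \sum_{i \neq 1} d^s_{i,t}(1 - \sigma_{i,t}) - \sum_{i \neq 1} p^s_{i,t} - b^s_{t-1}(1 + \beta^s_{t-1}) + b^s_t \qquad \forall s \in S,\; t = 1, \ldots, \tau \tag{6}$$

$$x^s_{i,t} = x^{s'}_{i,t} \qquad \forall s \text{ and } s' \text{ with identical past up to time } t,\; t = 1, \ldots, \tau \tag{7}$$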
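
To show how the flow-balance constraints fit together, the sketch below applies constraints (4), (5), and (6) once, for a single scenario and a single period. The function name, the list-based layout (index 0 is cash), and the numbers are assumptions made for illustration; in the actual model the purchases, sales, and borrowing are chosen by the optimizer rather than supplied by hand.

```python
def rebalance(x_prev, r_prev, buy, sell, sigma,
              b_prev=0.0, beta_prev=0.0, b_now=0.0):
    """One period of the wealth dynamics for one scenario.

    x_prev : holdings after rebalancing in the previous period (index 0 = cash)
    r_prev : gross returns earned over the previous period
    buy, sell : amounts purchased / sold per asset (entries for cash are ignored)
    sigma : proportional transaction cost per asset
    b_prev, beta_prev : previous borrowing and its rate; b_now : new borrowing
    """
    n = len(x_prev)
    # Constraint (4): value of each holding before rebalancing
    v = [r * x for r, x in zip(r_prev, x_prev)]
    x = v[:]
    # Constraint (5): non-cash assets; purchases lose the transaction cost
    for i in range(1, n):
        x[i] = v[i] + buy[i] * (1 - sigma[i]) - sell[i]
    # Constraint (6): cash collects sale proceeds net of costs, pays for
    # purchases, repays last period's loan with interest, and receives new loans
    x[0] = (v[0]
            + sum(sell[i] * (1 - sigma[i]) for i in range(1, n))
            - sum(buy[i] for i in range(1, n))
            - b_prev * (1 + beta_prev)
            + b_now)
    return v, x


# Cash and one stock, 50 each; cash earns 2%, the stock 10%; buy 10 of stock
v, x = rebalance(x_prev=[50.0, 50.0], r_prev=[1.02, 1.10],
                 buy=[0.0, 10.0], sell=[0.0, 0.0], sigma=[0.0, 0.01])
print("before rebalancing:", v)
print("after rebalancing: ", x)
print("transaction cost paid:", sum(v) - sum(x))
```

In this example total wealth falls by 1% of the amount purchased, which is the cost that single-period models leave out.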

A generalized network investment model is
presented in Figure 1. This figure depicts the
flows across time for each of the asset classes.
While all constraints cannot be put into a network
model, the graphical form is easy for asset
managers to comprehend. General linear and
nonlinear programs are now readily available
for solving the resulting problem. However, a
network may have computational advantages
for extremely large problems, such as security
level models.

Figure 1 Network Representation for Each Scenario, $s \in S$

As with single-period models, the nonlinear
objective function (1) can take several
different forms. If the classical return-risk
function is employed, then (1) becomes Max
$Z = \eta \times \mathrm{Mean}(w_\tau) - (1 - \eta) \times \mathrm{Risk}(w_\tau)$, where
$\mathrm{Mean}(w_\tau)$ is the expected total wealth and
$\mathrm{Risk}(w_\tau)$ is the risk of the total wealth across
the scenarios at the beginning of period $\tau$.
Parameter $\eta$ indicates the relative importance of
risk as compared with the expected value. This
objective leads to an efficient frontier at period $\tau$
by allowing alternative values of $\eta$ in the range
[0,1]. It can be shown that a viable alternative to
the mean-risk framework is the von Neumann-Morgenstern
expected utility of wealth at the
beginning of period $\tau$.
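
The return-risk objective is easy to evaluate once terminal wealth is known in every scenario. In the sketch below, variance is used as the risk measure, which is only one possible choice and an assumption made here; the two candidate policies and their terminal wealth values are invented for illustration.

```python
def mean_risk_objective(wealth, probs, eta):
    """Z = eta * Mean(w) - (1 - eta) * Risk(w), with variance as the risk measure."""
    mean = sum(p * w for p, w in zip(probs, wealth))
    risk = sum(p * (w - mean) ** 2 for p, w in zip(probs, wealth))
    return eta * mean - (1 - eta) * risk


probs = [0.5, 0.5]          # two equally likely scenarios
policy_a = [1.05, 1.05]     # terminal wealth under a safe policy
policy_b = [0.80, 1.50]     # terminal wealth under a risky policy

for eta in (1.0, 0.5):
    z_a = mean_risk_objective(policy_a, probs, eta)
    z_b = mean_risk_objective(policy_b, probs, eta)
    print(eta, "A" if z_a > z_b else "B")
```

With all the weight on the mean the risky policy is preferred; at an equal weighting the safe policy wins. Sweeping the weight over [0, 1] and re-solving the model each time is what traces out the efficient frontier mentioned above.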

Let's review the six constraints:

1. Constraint (2) guarantees that the total initial
investment equals the initial wealth.

2. Constraint (3) represents the total wealth at
the beginning of period $\tau$. This constraint
can be modified to include assets, liabilities,
and investment goals, in which case the
modified result is referred to as the "surplus
wealth" (Mulvey, 1989). Many investors render
investment decisions without reference
to their liabilities or investment goals. Mulvey
(1989) incorporates the notion of surplus
wealth into the mean-variance and the expected
utility models to address liabilities in
the context of asset allocation strategies.

3. Constraint (4) depicts the wealth $v^s_{i,t}$ accumulated
at the beginning of period t before
rebalancing in asset i.

4. Constraint (5) gives the flow balance constraint
for all assets except cash for all periods.
This constraint guarantees that the
amount invested in period t equals the net
wealth for the asset.

5. Constraint (6) represents the flow balancing
constraint for cash.

6. Constraints (7) are the nonanticipativity
constraints.

The preceding constraints ensure that the scenarios
with the same past will have identical
decisions up to that period. While these constraints
are numerous, solution algorithms take
advantage of their simple structure. See Birge
and Louveaux (1997), Dantzig and Infanger
(1993), Kall and Wallace (1994), and Mulvey and
Ruszczyński (1995), for example.

Figure 2 depicts the constraint structure for
a split variable formulation of the stochastic
asset allocation problem. This formulation has
proven successful for solving the model using
techniques such as the progressive hedging algorithm
of Rockafellar and Wets (1991) and the
DQA algorithm by Mulvey and Ruszczyński
(1995). The split variable formulation can be
beneficial for direct solvers that use the interior
point method. Given today's powerful PCs, the
nonlinear optimization system can be solved in
a direct fashion for realistic-size implementations.

Figure 2 Split Variable Formulation (NAV:
Nonanticipativity Variables, RV: Remaining Variables;
column blocks NAV 1, RV 1, NAV 2, RV 2, NAV 3, RV 3, ..., RV S)

By substituting constraint (7) back into constraints
(2) to (6), we obtain a standard form
of the stochastic allocation problem. Constraints
for this formulation exhibit a dual block
diagonal structure for two-stage stochastic programs
and a nested structure for general multistage
problems. This formulation may be better
for some direct solvers. The standard form of
the stochastic program possesses fewer decision
variables than the split variable model and
is the preferred structure by many researchers
in the field. This model can be solved by means
of decomposition methods, for example, the
L-shaped method (a specialization of Benders'
algorithm). (See Birge and Louveaux, 1997;
Consigli and Dempster, 1998; Dantzig and Infanger,
1993; and Kouwenberg and Zenios,
2006.)

As shown by Consigli and Dempster (1998), 
Dantzig and Infanger (1993), Mulvey et al. 
(2000), Ziemba and Mulvey (1998), and Ziemba 
(2003), a multistage model can provide superior 
performance over single-period models. 

Modeling Future Uncertainties 
(Scenario Generation) 

To model future uncertainty in our financial
planning problem, we utilize a representative
set of scenarios. In this section, we review the
procedures for scenario generation and give details
about the approach described.

In most cases, stochastic programming models
require that the future uncertainties be
approximated by a scenario tree with a finite
number of states of the world at each time. The
planning horizon is divided into T time periods
(generally years for pension planning).

A sample scenario tree of three periods and
nine scenarios is depicted in Figure 3. The
root of the tree represents the current state of
the world. A scenario is defined as a single
branch from the root to any leaf of the tree
(e.g., the boldfaced path corresponds to scenario 4).
Thus, all of the parameter uncertainties
are depicted along this branch. Each node
represents a state of the world under a given
scenario at a given time; for instance, the boldfaced
node corresponds to the set of uncertainties
at the end of period 2 under scenario 4.
The stochastic program will determine an optimal
decision for each node of the scenario tree,
given the information available at that point.
As there are multiple succeeding nodes, the optimal
decisions will be determined without exploiting
hindsight. A stochastic programming
model will find the optimal policy that will fit
the current state of the world and the decision
maker in each node, while anticipating the optimal
adjustment of the policy later on as the
tree evolves and more information is revealed.

Figure 3 Sample Scenario Tree (time axis: t = 0, t = 1, t = 2, t = 3)

Generating scenario trees to represent the 
evolution of future uncertainty is a two-step 
process. Figure 4 depicts a diagram of the 
process. 

The first step involves the construction of 
a stochastic forecasting model. This involves 
choosing a model that would be appropriate 
for the uncertain variables and calibrating the 
parameters of this model using historical data. 

Figure 4 From Historical Data to Scenario Trees
[Diagram labels: Model Selection/Calibration; Sampling Process]

The simplest approach, bootstrapping historical data,
eliminates the need for a mathematical model (see, e.g.,
Grauer and Hakansson, 1982). Among mathematical models,
stochastic differential equations (SDEs) and time series
analysis are two commonly used techniques to generate
anticipatory scenarios. Our preference is to employ the
former technique, in which a sequence of economic factors
(e.g., gross domestic product, corporate earnings, interest
rates, and inflation) drives the state variables (see
Mulvey [1996] for the details). The parameters of the SDEs
for the economic factors and asset returns can be
calibrated using historical data. A standard variance
reduction method, antithetic variates, can be employed to
improve the accuracy of the model's recommendations.
Indirect inference methods for calibrating the parameters
of the resulting stochastic system can be employed (see
Gourieroux et al., 1993).
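
As a minimal illustration of antithetic variates, the sketch below estimates the mean of a function of a standard normal shock by averaging each draw with its mirror image. The function names, the sample size, and the test function exp(z) are assumptions chosen for illustration; they are not taken from the forecasting model described above.

```python
import math
import random


def antithetic_mean(f, n_pairs, seed=0):
    """Estimate E[f(Z)], Z ~ N(0, 1), by averaging f over mirrored pairs (z, -z)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        total += 0.5 * (f(z) + f(-z))  # each pair shares one random draw
    return total / n_pairs


def plain_mean(f, n_draws, seed=0):
    """Ordinary Monte Carlo estimate of E[f(Z)] using independent draws."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(n_draws)) / n_draws


# Illustrative target: E[exp(Z)] = exp(1/2) for a standard normal Z.
estimate = antithetic_mean(math.exp, n_pairs=10_000)
exact = math.exp(0.5)
```

For an odd function such as f(z) = z the mirrored pairs cancel exactly, so the sampling error vanishes; for monotone functions the negative correlation within each pair reduces, but does not eliminate, the variance of the estimate.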

The next step involves the discretization of the 
scenarios generated by the stochastic forecast¬ 
ing model in the first step. To avoid any com¬ 
putational disadvantages, this has to be done 
using a small number of nodes, which in turn 
will lead to approximation errors. There are sev¬ 
eral methods to achieve this depending on the 
models employed in the first step (see Hoyland 
and Wallace, 2001; Kouwenberg and Zenios, 
2006; Grebeck, Rachev, and Fabozzi, 2009; and 
Ziemba, 2003). We first create discretized sam¬ 
ple paths by moment-matching, using the cas¬ 
caded SDE structure in the first step. Then, we 
convert these sample paths to a scenario tree by 
clustering (see Dupacova et al., 2002). We be¬ 
gin by grouping similar first stage values of the 
sample paths into clusters, and then continue 
sequentially through each stage. 
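
The sequential grouping step can be sketched as follows. This is only an illustration under simplifying assumptions: equal-count buckets of the sorted stage values stand in for a proper clustering algorithm, each path carries a single variable per stage, and the names `build_tree` and `leaf_probabilities` are hypothetical. It is not the procedure of Dupacova et al.

```python
import random


def build_tree(paths, branches, stage=0, prob=1.0):
    """Recursively group sample paths into a scenario tree.

    paths    : list of equal-length lists, one value per stage
    branches : number of child nodes created at every stage
    Returns a list of nodes; each node is a dict with the stage index,
    the cluster mean as node value, its probability, and its children.
    """
    if not paths or stage >= len(paths[0]):
        return []
    ordered = sorted(paths, key=lambda p: p[stage])
    k = min(branches, len(ordered))
    nodes = []
    for j in range(k):
        # Equal-count buckets of the sorted values act as clusters.
        group = ordered[j * len(ordered) // k:(j + 1) * len(ordered) // k]
        node_prob = prob * len(group) / len(ordered)
        nodes.append({
            "stage": stage,
            "value": sum(p[stage] for p in group) / len(group),
            "prob": node_prob,
            "children": build_tree(group, branches, stage + 1, node_prob),
        })
    return nodes


def leaf_probabilities(nodes):
    """Collect the probabilities of all leaves (complete scenarios)."""
    leaves = []
    for node in nodes:
        if node["children"]:
            leaves.extend(leaf_probabilities(node["children"]))
        else:
            leaves.append(node["prob"])
    return leaves


rng = random.Random(1)
sample_paths = [[rng.gauss(0.05, 0.1) for _ in range(3)] for _ in range(200)]
tree = build_tree(sample_paths, branches=3)
```

Each node's children are built only from the paths that fell into that node's cluster, so the tree respects the information structure: paths that are indistinguishable at a stage share the same node, and hence the same decision, at that stage.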

For an ALM system, one needs to generate 
scenarios for the liability side as well as the as¬ 
set side. Obviously, both components are driven 
by economic factors. Liabilities are affected by 
actuarial predictions as well. When modeling 
the asset returns, one may need to use senti¬ 
ment or expert judgment to improve the range 
of scenarios. 

The future value of the liabilities can be es¬ 
pecially tricky to project for institutions, such 
as pension plans, where the liabilities consist of 
several contracts and therefore the valuation is 
affected by various sources of uncertainty. For 
a typical pension plan, one can simulate the 
future status of the participants by making assumptions
about the retirement rates, resignation frequency,
promotion/demotion probabilities, and the mortality
rate. Once this is done,
the interest rates are forecasted and used to cal¬ 
culate the present value of the liabilities. 
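
A stripped-down version of the last step, for a single participant and a single interest rate scenario, might look like the sketch below. The payment amounts, survival probabilities, and rates are hypothetical, and the function name is an assumption; a real pension model would aggregate over many participants and scenarios.

```python
def liability_present_value(payments, survival_probs, rates):
    """Discount expected benefit payments along one interest rate scenario.

    payments       : benefit due at the end of year 1, 2, ... if the participant is alive
    survival_probs : probability that the participant is alive at each of those dates
    rates          : one-year interest rate applying to each year of the scenario
    """
    pv = 0.0
    discount = 1.0
    for payment, alive, rate in zip(payments, survival_probs, rates):
        discount /= 1.0 + rate            # cumulative discount factor to time 0
        pv += payment * alive * discount  # expected payment, discounted
    return pv


# Hypothetical three-year example: level payments, declining survival, flat 5% rates.
pv = liability_present_value(
    payments=[100.0, 100.0, 100.0],
    survival_probs=[0.99, 0.97, 0.94],
    rates=[0.05, 0.05, 0.05],
)
```

Repeating the calculation for every node of the scenario tree, with the interest rates of that node's path, gives the liability values that enter the ALM model.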

When modeling the asset returns, the eco¬ 
nomic factors that drive the primary asset- 
class returns are projected as a first step, which 
would then be followed by the projection of re¬ 
turns for these primary assets. More complex 
assets would be the last to be modeled in this 
setup. Alternatively, one can model all uncer¬ 
tain variables at once through one big set of 
multivariate time-series models. 


KEY POINTS 

• Stochastic programming is an operations re¬ 
search method for optimal decision making 
under uncertainty and bears suitable charac¬ 
teristics for modeling and solving financial 
planning applications, such as asset-liability 
management, capital budgeting, and fixed in¬ 
come portfolio management. 

• The main features of a stochastic program are: 
random parameters with known (or partially 
known) distributions; several decision vari¬ 
ables with many potential values; multiple 
discrete time periods for decisions; use of ex¬ 
pectations (or other functions of decision vari¬ 
ables) for objectives. 

• Stochastic programs typically have larger 
problem size than stochastic control mod¬ 
els, which put more emphasis on control 
rules and have more restrictive constraint 
assumptions. 

• Stochastic programming models generally re¬ 
quire that the future uncertainties are approx¬ 
imated by a scenario tree with a finite number 
of states of the world at each time. 

• Multiperiod stochastic programs with a large 
number of parameters and scenarios result 
in large-scale deterministic-equivalent pro¬ 
grams. Specialized software packages com¬ 
bined with algebraic modeling languages are 
utilized to efficiently tackle these problems. 
Abstract: As the use of quantitative techniques has become more widespread in the investment 
industry, the issue of how to handle portfolio estimation and model risk has grown in importance. 
Robust optimization is a technique for incorporating estimation errors directly into the portfolio 
optimization process, and is typically applied in conjunction with robust statistical estimation 
methods. The robust optimization approach uses the distribution from the estimation process to 
find a portfolio allocation in one single optimization, while keeping the computational costs low. 
Robust portfolios tend to be less sensitive to estimation errors, offer some improved portfolio 
performance, and often have lower turnover ratios. 


The concepts of portfolio optimization and 
diversification have been instrumental in the 
understanding of financial markets and the de¬ 
velopment of financial decision making. The 
major breakthrough came in 1952 with the 
publication of Harry Markowitz's theory of 
portfolio selection. Markowitz suggested that 
sound financial decision making is a quantita¬ 
tive trade-off between risk and return. His work 
spurred a vast amount of research on quan¬ 
tifying market behavior, and one of the main 
practical consequences of his theory was the
acceptance of the notion that diversification
reduces portfolio risk.

Sixty years after Markowitz's seminal work, 
substantial advances have been made in the 
theory and practice of portfolio management. 
Today, quantitative techniques for forecast¬ 
ing asset returns, portfolio allocation, risk 
measurement, trading and rebalancing, to 
mention a few, have a major presence in the 
financial industry. Their proliferation has been 
facilitated by the decreased cost of comput¬ 
ing power and the increased availability of 
sophisticated and specialized software that allows
investors to incorporate their forecasts
about the future direction of markets into disci¬ 
plined analytical frameworks. 

As the use of quantitative techniques has be¬ 
come widespread in the investment industry, 
the consideration of estimation risk and model 
risk has grown in importance. For example, 
Bayesian techniques and robust estimation of 
model parameters are now common in finan¬ 
cial applications. Most recently, practitioners 
have begun incorporating the uncertainty in¬ 
troduced by estimation errors directly into the 
portfolio optimization process by mathemati¬ 
cal techniques referred to as robust optimiza¬ 
tion. Contrary to the traditional approach, in 
which inputs to the portfolio allocation frame¬ 
work are treated as deterministic, robust port¬ 
folio optimization incorporates the notion that 
inputs have been estimated with errors. In this 
case, the inputs are not the traditional forecasts, 
such as expected returns and asset covariances, 
but rather uncertainty sets containing these point
estimates (e.g., confidence intervals around the 
forecasts). 

In this entry, we survey the area of robust 
optimization and its applications in portfolio 
management. We begin by explaining the main 
ideas behind the robust optimization approach, 
and discuss the relationship between robust 
optimization and other robust methods for 
portfolio management. Next, we review some 
important developments in robust optimiza¬ 
tion applications, and conclude with a dis¬ 
cussion of future directions in robust portfolio 
management. 

THE ROBUST
OPTIMIZATION APPROACH

Introduced in the operations research literature
by Ben-Tal and Nemirovski (1998) and El Ghaoui and
Lebret (1997), modern robust optimization allows a
portfolio manager to solve a robust formulation of the
portfolio optimization problem with one single call to
an optimization solver in about the same time as the
classical portfolio optimization problem. The resulting
optimal portfolio allocations tend to be more stable and
less sensitive to changes in model parameters.

Consider the classical mean-variance portfo¬ 
lio allocation problem: 

$$\max_{w} \; \mu'w - \lambda w'\Sigma w \quad \text{s.t.} \quad w'\iota = 1$$

where $\mu$ is the vector of expected returns
(alphas) for $N$ assets in the investment universe,
$\Sigma$ is the asset-asset covariance matrix, $w$ is the
$N$-dimensional vector of portfolio weights, $\lambda$ is
the risk aversion coefficient, and $\iota$ is a vector of
ones. This optimization problem simply states
that the optimal portfolio weights should be
chosen so that the expected portfolio return less
the portfolio risk (scaled by the risk aversion
coefficient) is as large as possible. The equality
constraint ensures that the portfolio weights
add up to one.

As demonstrated, for instance, by Black and 
Litterman (1992), a small change in the expected 
asset returns can result in large changes in the 
optimal portfolio allocation. In other words, the 
classical portfolio optimization problem is not 
robust with respect to small changes in its in¬ 
puts. Since in practice expected returns and as¬ 
set covariances cannot be measured exactly but 
have to be estimated—sometimes with large 
errors—it is important in applications that un¬ 
certainty resulting from estimation errors be 
taken into account. 
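
The sensitivity can be seen in a two-asset version of the problem, where substituting w2 = 1 - w1 into the objective and setting the derivative to zero gives the weight of asset 1 in closed form. The sketch below is an illustration only; the volatilities, correlation, and risk aversion coefficient are hypothetical values chosen to make the effect visible.

```python
def two_asset_weight(mu, sigma, rho, lam):
    """Weight of asset 1 maximizing mu'w - lam * w'Sigma w subject to w1 + w2 = 1.

    Obtained by substituting w2 = 1 - w1 and setting the derivative to zero.
    """
    cov = rho * sigma[0] * sigma[1]
    numerator = (mu[0] - mu[1]) / (2.0 * lam) + sigma[1] ** 2 - cov
    denominator = sigma[0] ** 2 + sigma[1] ** 2 - 2.0 * cov
    return numerator / denominator


sigma = (0.20, 0.20)   # hypothetical volatilities
rho = 0.95             # hypothetical correlation
lam = 2.0              # hypothetical risk aversion coefficient

w_base = two_asset_weight((0.10, 0.10), sigma, rho, lam)   # equal expected returns
w_bump = two_asset_weight((0.11, 0.10), sigma, rho, lam)   # asset 1 raised by 1 point
```

With these inputs, raising one expected return by a single percentage point moves the weight of asset 1 from 50% to 112.5%, which requires a short position in asset 2. The effect is strongest when assets are highly correlated, because the denominator of the closed form is then close to zero.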

One way to make the optimization problem
robust with respect to estimation errors is to
require that the optimal solution remains optimal
for all values of the expected returns that
are "close" to the estimates of expected returns
$\hat{\mu}$. We can express this requirement in the
optimization problem as follows: Instead of using
the estimate $\hat{\mu}$ of $\mu$, we consider a set of vectors
that are close to the estimate $\hat{\mu}$, and solve the
optimization problem for all vectors in this set.
The idea here is that the expected returns may
have been estimated with some error, but that
the estimates are not too far away from the true
expected returns. Mathematically, this idea is
incorporated in the definition of an uncertainty
set for $\mu$,

$$U_\delta(\hat{\mu}) = \{ \mu : |\mu_i - \hat{\mu}_i| \le \delta_i, \; i = 1, \ldots, N \} \qquad (1)$$

In words, the set $U_\delta(\hat{\mu})$ contains all vectors
$\mu = (\mu_1, \ldots, \mu_N)$ such that each component $\mu_i$
is in the interval $[\hat{\mu}_i - \delta_i, \hat{\mu}_i + \delta_i]$, and is often
referred to as a "box" uncertainty set. From
a statistical point of view, these intervals can
be chosen to be certain confidence intervals
around each point estimate $\hat{\mu}_i$.

We solve a modification of the original optimization
problem such that even if $\mu$ takes
its worst possible value within the uncertainty
set, the allocation remains optimal. Namely,
we solve the max-min portfolio optimization
problem

$$\max_{w} \; \Big\{ \min_{\mu \in U_\delta(\hat{\mu})} \mu'w \Big\} - \lambda w'\Sigma w \quad \text{s.t.} \quad w'\iota = 1$$


At first sight, this optimization problem looks 
complicated, as we have to minimize the 
objective function with respect to $\mu$ over the
specified uncertainty set and, simultaneously, 
maximize the objective function with respect to 
w to find the optimal allocation. However, as we 
will see shortly, this problem can be reformu¬ 
lated into an equivalent maximization problem 
with respect to only w. First, let us understand 
what this model implies from an intuitive 
perspective. 

Observe that this model incorporates the notion
of aversion to estimation error in the following
sense. When the interval
$[\hat{\mu}_i - \delta_i, \hat{\mu}_i + \delta_i]$ for the expected return of the
$i$th asset is large, meaning that the expected return has
been estimated with large estimation error, then
the minimization problem over $\mu$ is less constrained.
Consequently, the minimum will be
smaller than it would be in situations when
the interval for $\mu_i$ is smaller. Obviously, when
the interval is small enough, the minimization
problem will be so tightly constrained that it
would deliver a solution that is close to the
optimal solution of the classical portfolio
optimization problem in which estimation errors
are ignored. In other words, it is the size of the
intervals (in general, the size of the uncertainty
set) that controls the aversion to the uncertainty
that comes from estimation errors.

The robust version of the classical portfolio
optimization problem is obtained by solving the
max-min problem above, and for this model it is
easy to derive without any involved mathematics.
Namely, it is

$$\max_{w} \; \hat{\mu}'w - \delta'|w| - \lambda w'\Sigma w \quad \text{s.t.} \quad w'\iota = 1$$

where $|w|$ denotes the vector of absolute values of the
entries of the vector of weights $w$. To gain some
intuition, notice that if the weight of asset $i$
in the portfolio is negative, the worst-case
expected return for asset $i$ is $\hat{\mu}_i + \delta_i$ (we lose the
largest amount possible). If the weight of asset
$i$ in the portfolio is positive, then the worst-case
expected return for asset $i$ is $\hat{\mu}_i - \delta_i$ (we gain
the smallest amount possible). Observe that
$\hat{\mu}_i w_i - \delta_i |w_i|$ equals $(\hat{\mu}_i - \delta_i) w_i$ if the weight
$w_i$ is positive and $(\hat{\mu}_i + \delta_i) w_i$ if the weight
$w_i$ is negative. Hence, the mathematical expression
in the objective agrees with our intuition:
it evaluates the expected portfolio return at its
worst case over the uncertainty set. In this robust
version of the mean-variance formulation, assets whose
mean return estimates are less accurate (have a larger
estimation error $\delta_i$) are therefore penalized in the
objective function, and will tend to have a smaller
weight in the optimal portfolio allocation.
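
The equivalence between the inner minimization over the box and the penalized expression can be checked numerically. The sketch below uses hypothetical estimates, interval half-widths, and weights (including one short position); the function names are illustrative assumptions.

```python
def worst_case_return(mu_hat, delta, w):
    """Minimum of mu'w over the box |mu_i - mu_hat_i| <= delta_i.

    The minimum is separable: each component moves to the end of its
    interval that hurts the position, i.e. down for long and up for short.
    """
    worst_mu = [m - d if x >= 0 else m + d for m, d, x in zip(mu_hat, delta, w)]
    value = sum(m * x for m, x in zip(worst_mu, w))
    return worst_mu, value


def penalized_return(mu_hat, delta, w):
    """Closed form mu_hat'w - delta'|w| from the robust objective."""
    return sum(m * x - d * abs(x) for m, d, x in zip(mu_hat, delta, w))


# Hypothetical inputs: three assets, one of them held short.
mu_hat = [0.08, 0.05, 0.11]
delta = [0.01, 0.02, 0.03]
w = [0.7, 0.5, -0.2]

worst_mu, worst_value = worst_case_return(mu_hat, delta, w)
```

Here the worst-case vector lowers the expected returns of the two long positions and raises that of the short position, and the resulting portfolio return coincides with the penalized expression.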

This optimization problem has the same computational
complexity as the nonrobust mean-variance formulation;
namely, it can be stated
as a quadratic optimization problem. The latter
can be achieved by using a standard trick
that allows us to get rid of the absolute values
for the weights. The idea is to introduce an
$N$-dimensional vector of additional variables $\psi$
to replace the absolute values, and to write an
equivalent version of the optimization problem,

$$\max_{w,\psi} \; \hat{\mu}'w - \delta'\psi - \lambda w'\Sigma w \quad \text{s.t.} \quad w'\iota = 1, \quad \psi_i \ge w_i, \; \psi_i \ge -w_i, \; i = 1, \ldots, N$$

Therefore, incorporating considerations about 
the uncertainty in the estimates of the expected 
returns in this example has virtually no compu¬ 
tational cost. 

We can view the effect of this particular "ro- 
bustification" of the mean-variance portfolio 
optimization formulation in two different ways. 
On the one hand, we see that the values of the 
expected returns for the different assets have 
been adjusted downwards in the objective func¬ 
tion of the optimization problem. That is, the 
robust optimization model "shrinks" the ex¬ 
pected return of assets with large estimation 
error. On the other hand, we can interpret the 
additional term in the objective function as a 
"risk-like" term that represents penalty for es¬ 
timation error. The size of the penalty is deter¬ 
mined by the investor's aversion to estimation 
risk, and is reflected in the magnitude of the 
deltas. 
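
The shrinkage effect can be illustrated with two assets, where the budget constraint leaves a single free weight and the concave robust objective can be maximized by a one-dimensional search. This is a sketch under stated assumptions: the ternary search stands in for the quadratic programming solver one would use in practice, and all numerical inputs are hypothetical.

```python
def robust_objective(w1, mu_hat, delta, sigma, rho, lam):
    """mu_hat'w - delta'|w| - lam * w'Sigma w for two assets with w2 = 1 - w1."""
    w = (w1, 1.0 - w1)
    ret = sum(m * x for m, x in zip(mu_hat, w))
    penalty = sum(d * abs(x) for d, x in zip(delta, w))
    var = ((sigma[0] * w[0]) ** 2 + (sigma[1] * w[1]) ** 2
           + 2.0 * rho * sigma[0] * sigma[1] * w[0] * w[1])
    return ret - penalty - lam * var


def maximize_concave(f, lo=-2.0, hi=3.0, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


mu_hat = (0.11, 0.10)   # hypothetical point estimates
delta = (0.03, 0.01)    # asset 1 is estimated less accurately
sigma = (0.20, 0.20)    # hypothetical volatilities
rho, lam = 0.5, 2.0     # hypothetical correlation and risk aversion

w_classical = maximize_concave(
    lambda x: robust_objective(x, mu_hat, (0.0, 0.0), sigma, rho, lam))
w_robust = maximize_concave(
    lambda x: robust_objective(x, mu_hat, delta, sigma, rho, lam))
```

In this example the classical allocation puts 56.25% in asset 1, while the robust allocation puts 43.75% there: the asset with the higher point estimate but the wider interval loses weight to the asset that is estimated more accurately.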

More complicated specifications for uncer¬ 
tainty sets have more involved mathematical 
representations, but can still be selected so that 
they preserve an easy computational structure 
for the robust optimization problem. For exam¬ 
ple, a frequently used uncertainty set is 

$$U_\delta(\hat{\mu}) = \{ \mu : (\mu - \hat{\mu})' \Sigma_\mu^{-1} (\mu - \hat{\mu}) \le \delta^2 \} \qquad (2)$$

where $\Sigma_\mu$ is the covariance matrix of estimation
errors for the vector of expected returns
$\mu$. This uncertainty set represents the requirement
that the scaled sum of squared deviations (scaled
by the inverse of the covariance matrix of
estimation errors) of any element in the set
from the point estimates $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_N$ can be
no larger than $\delta^2$. We note that this uncertainty
set cannot be interpreted as individual confidence
intervals around each point estimate. Instead,
those familiar with statistics will notice
that this uncertainty set captures the idea of a
joint confidence region used, for example, in
Wald tests. In practice, the covariance matrix of
estimation errors is often assumed to be diagonal.
For this particular case, the set contains
all vectors of expected returns that are within a
certain number of standard deviations from the
point estimate of the vector of expected returns,
and the resulting robust portfolio optimization
problem would protect the investor if the vector
of expected returns is indeed within that
range.
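
For the diagonal case, membership in the ellipsoidal set and the worst-case expected portfolio return are easy to compute. The sketch below relies on a standard result that is not derived in the text: over the ellipsoid, the minimum of the expected portfolio return equals the estimated return minus the radius times the square root of w' Sigma_mu w. All inputs and function names are hypothetical.

```python
import math


def in_ellipsoid(mu, mu_hat, err_var, delta):
    """Check (mu - mu_hat)' Sigma_mu^{-1} (mu - mu_hat) <= delta^2 for diagonal Sigma_mu."""
    dist2 = sum((m - mh) ** 2 / v for m, mh, v in zip(mu, mu_hat, err_var))
    return dist2 <= delta ** 2 * (1.0 + 1e-12)  # small tolerance for rounding


def worst_case_mu(mu_hat, err_var, delta, w):
    """Vector in the ellipsoid that minimizes mu'w (diagonal Sigma_mu)."""
    scale = math.sqrt(sum(v * x * x for v, x in zip(err_var, w)))
    return [mh - delta * v * x / scale for mh, v, x in zip(mu_hat, err_var, w)]


# Hypothetical inputs: estimation error variances on the diagonal of Sigma_mu.
mu_hat = [0.08, 0.05, 0.11]
err_var = [0.0001, 0.0004, 0.0009]
delta = 2.0
w = [0.5, 0.3, 0.2]

mu_star = worst_case_mu(mu_hat, err_var, delta, w)
worst_value = sum(m * x for m, x in zip(mu_star, w))
closed_form = sum(m * x for m, x in zip(mu_hat, w)) - delta * math.sqrt(
    sum(v * x * x for v, x in zip(err_var, w))
)
```

The worst-case vector lies on the boundary of the ellipsoid, and its portfolio return matches the closed form, so the robust objective again reduces to the estimated return minus a penalty for estimation error.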

Selecting Uncertainty Sets from 
Statistical Procedures 

How do we select uncertainty sets for a particular
application? In practice, their shape and size
are usually based on statistical estimates and 
probabilistic guarantees. For example, uncer¬ 
tainty set (1) defines an N-dimensional box: It 
considers possible deviations of the N uncertain 
parameters from their expected values, and the 
resulting robust portfolio optimization problem 
protects against the worst possible realization 
of each individual parameter separately. Uncer¬ 
tainty set (2) defines an N-dimensional ellipsoid 
(in two dimensions, an ellipsoid is an ellipse), 
and is not as conservative as (1). The resulting 
robust portfolio optimization offers protection 
from the worst possible joint deviation of the 
actual expected returns from the forecasts, by 
considering the correlations between the es¬ 
timation errors of the uncertain parameters 
through the covariance matrix $\Sigma_\mu$.

The calibration of the parameters that enter 
the definition of uncertainty sets is very important.
For example, the intervals for $\mu$ that define
the uncertainty set (1) above can be matched to
95% or 99% confidence intervals for the estimates
of the expected returns. The value of the
parameter $\delta$ in the uncertainty set (2) can be
related to probabilistic guarantees, as we will
explain later.

The covariance matrix $\Sigma_\mu$ of the errors in
the estimated expected asset returns in
uncertainty set (2) can be obtained using several
different techniques. However, its estimation
can be problematic because of the difficulty in
separating the estimation error in expected
returns from the inherent variability in actual
realized returns (Lee, Stefek, and Zhelenyak, 2006).
Specifically, if a portfolio manager forecasts a
5% active return over the next time period, but
achieves 1%, he cannot argue that there was a
4% error in his expected return, so evaluating
$\Sigma_\mu$ from historical data can be tricky.

In theory, if returns in a given sample of size
$T$ are assumed to come from a normal distribution,
then $\Sigma_\mu$ equals $(1/T) \cdot \Sigma$, where $\Sigma$ is
the covariance matrix of asset returns as before.
However, experience seems to suggest that this
may not be the best method in practice. One issue
is that this approach applies only in a world
in which returns are stationary. Another important
issue is whether the estimate of the asset
covariance matrix $\Sigma$ itself is reliable if it is
estimated from a sample of historical data. It is
well-known that computing a meaningful asset 
return covariance matrix requires a large num¬ 
ber of observations—many more observations 
than the number of assets in the portfolio—and 
even then the sample covariance matrix may 
contain large estimation errors that produce 
poor results in the mean-variance optimization. 
A fix when sufficient data are not available is to 
compute the estimation errors in expected re¬ 
turns at a factor (e.g., industry, country, sector) 
level, and use their variances and covariances 
in the estimation error covariance matrix for the 
individual asset returns in a manner similar to 
standard factor models. 

Several approximate methods for estimating
$\Sigma_\mu$ have also been found to work well in practice
(Stubbs and Vance, 2005). For example, it
has been observed that simpler estimation approaches,
such as computing the diagonal matrix
containing the variances of the estimates (as
opposed to the complete error covariance matrix),
often provide most of the benefit in robust
portfolio optimization. In addition, standard
approaches for estimating expected returns,
such as Bayesian statistics and regression-based
methods, generate estimates for the estimation
error covariance matrix in the process of
generating the estimates themselves.
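
A minimal sketch of the diagonal approach follows, assuming sample means are used as the expected return estimates and returns are stationary, so that each estimate's error variance is the sample variance divided by T. The return histories, asset names, and function names are hypothetical.

```python
import statistics


def diagonal_error_variances(return_history):
    """Variance of each sample-mean estimate: sample variance divided by T.

    return_history : dict mapping an asset name to its list of T observed returns
    """
    out = {}
    for asset, returns in return_history.items():
        t = len(returns)
        out[asset] = statistics.variance(returns) / t
    return out


def box_half_widths(error_variances, z=1.96):
    """Half-widths z * standard error, e.g. z = 1.96 for roughly 95% normal intervals."""
    return {asset: z * var ** 0.5 for asset, var in error_variances.items()}


# Hypothetical monthly returns for two assets.
history = {
    "A": [0.02, -0.01, 0.03, 0.00, 0.01, 0.02],
    "B": [0.05, -0.04, 0.06, -0.03, 0.02, 0.00],
}
err_var = diagonal_error_variances(history)
delta = box_half_widths(err_var)
```

The resulting variances can populate the diagonal of the estimation error covariance matrix, and the half-widths can serve as the interval sizes of a box uncertainty set. The more volatile asset B receives the wider interval.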

Uncertainty sets (1) and (2) are both symmetric,
that is, the sets are symmetric around
the vector of estimates $\hat{\mu}$ of the uncertain
parameters. One can
also consider asymmetric uncertainty sets that 
better reflect information about the probability 
distributions of the uncertain parameters when 
the probability distributions are skewed (see 
Natarajan, Pachamanova, and Sim, 2008). Re¬ 
cently, there has been also a substantial interest 
in developing "structured" uncertainty sets, 
that is, uncertainty sets that are constructed for a 
specific purpose. Frequently, structured uncer¬ 
tainty sets based on simple intersections of ele¬ 
mentary uncertainty sets are used to minimize 
the "conservatism" in traditional ellipsoidal or 
"box" uncertainty sets. We will discuss such 
uncertainty sets in more detail later in this entry. 

Clarifying a Misconception about 
Robust Optimization 

Among practitioners, the notion of robust 
portfolio optimization is often equated with 
the robust mean-variance model we just dis¬ 
cussed, with uncertainty set (1) or (2) for the 
expected asset returns. While robust optimiza¬ 
tion applications frequently involve one form 
or another of this model, the actual scope of 
robust optimization can be much broader. We 
note that the term "robust optimization" refers 
to the technique of incorporating information 
about uncertainty sets for the parameters in the 
optimization model, and not to the specific defi¬ 
nitions of uncertainty sets or the choice of which 
parameters to model as uncertain. For example, 
we can use the robust optimization method¬ 
ology to incorporate considerations for uncer¬ 
tainty in the estimate of the covariance matrix 
in addition to the uncertainty in expected 
returns, and obtain a different robust portfolio 
allocation formulation. 

Robust optimization can be applied also 
to portfolio allocation models that are different
from the mean-variance framework, e.g.,
Sharpe ratio and value-at-risk optimization
(see, for example, Goldfarb and Iyengar, 2003 
and Natarajan, Pachamanova, and Sim, 2008). 
There are numerous useful and reasonable ro¬ 
bust formulations, and a complete review is 
beyond the scope of this entry. We refer inter¬ 
ested readers to Fabozzi et al. (2007) for further 
details. 


THE RELATIONSHIP TO 
BAYESIAN METHODS AND 
ECONOMIC THEORY 

Critics have argued that robust optimization is 
not really different from shrinkage estimators 
that combine the minimum variance portfolio 
with a speculative investment portfolio. Indeed, 
when using a particular uncertainty set for the 
expected returns (assuming all other parame¬ 
ters in the mean-variance problem are certain), 
it can be shown that the optimal mean-variance 
portfolio weights using robust optimization are 
a linear combination of the weights of the mini¬ 
mum variance portfolio (which is independent 
of investor preferences or expected returns) and 
a mean-variance efficient portfolio. These port¬ 
folio weights can also be obtained by solving 
a standard mean-variance problem with 
expected return estimates derived from a stan¬ 
dard shrinkage estimator with specific shrink¬ 
age parameters (see, for example, Garlappi, 
Uppal, and Wang, 2007 and Scherer, 2005). Ro¬ 
bust optimization thus appears to offer a less 
transparent way to express investor preferences 
and tolerance to uncertainty than other ap¬ 
proaches, such as Bayesian methods, in which the 
shrinkage parameters can be defined explicitly. 

In general, however, robust optimization is 
not necessarily equivalent to shrinkage estima¬ 
tion. For instance, differences are apparent in 
the presence of additional portfolio constraints. 
Furthermore, as we mentioned earlier, the ro¬ 
bust optimization methodology can be used 
to account for uncertainty in parameters other 
than expected asset returns (covariances of asset
returns, for example), making its relationship
with Bayesian estimation even less obvious. 

The concept of robust optimization has been 
criticized also from the point of view of clas¬ 
sical economic theory (see, for example, Sims, 
2001). From a behavioral and decision-making 
point of view, few individuals have max-min 
preferences. Indeed, max-min preferences de¬ 
scribe the behavior of decision makers who face 
great ambiguity and thus make choices con¬ 
sistent with the belief that the worst possible 
outcomes are highly likely. This kind of con¬ 
servative behavior is not typical of the average 
investor. The problem of overconservativeness 
in applying robust optimization, however, can 
be controlled by modifying the specification of 
uncertainty sets for the parameters, as we will 
explain in the following section. 

USING ROBUST PORTFOLIO 
OPTIMIZATION IN 
PRACTICE 

One of the main problems in assessing the prac¬ 
tical benefits of the robust optimization ap¬ 
proach is that its performance is dependent on 
the choice (or calibration) of the model parameters,
such as the coefficient of aversion to estimation
error $\delta$. In a sense, however, this issue is
no different from the calibration of standard pa¬ 
rameters in the classical portfolio optimization 
framework, such as the length of the estima¬ 
tion period to use for forecast generation and 
the choice of the risk aversion coefficient. These 
and other parameters need to be determined 
empirically or subjectively. 

Note also that other robust modeling devices 
such as Bayesian estimators and the Black- 
Litterman model (for an overview, see Fabozzi, 
Focardi, and Kolm, 2006) have similar issues. In 
particular, for shrinkage estimators, the portfo¬ 
lio manager needs to determine which shrink¬ 
age target to use and the size of the shrinkage 
parameter. In the Black-Litterman model, he 
needs to provide his confidence in equilibrium 
as well as his confidence in his views. 
The values of the robust formulation parameters
can sometimes be matched to probabilistic
guarantees. For example, if the estimates of the
expected asset returns are assumed to be normally
distributed, then there is an $\omega$% chance
that the estimates will fall in the ellipsoidal set
(2) around the manager's estimates $\hat{\mu}$,

$$U_\delta(\hat{\mu}) = \{ \mu : (\mu - \hat{\mu})' \Sigma_\mu^{-1} (\mu - \hat{\mu}) \le \delta^2 \}$$

if $\delta^2$ is assigned the value of the $\omega$th percentile of
a $\chi^2$ distribution with degrees of freedom equal
to the number of assets in the portfolio. As an
example, suppose that there are 15 assets in the
asset universe and that all returns are normally
distributed. If we choose $\delta^2 = 25$, then 95% of
all expected returns will be in the set $U_\delta(\hat{\mu})$.
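
The value of 25 in this example can be reproduced by computing the 95th percentile of a chi-square distribution with 15 degrees of freedom. The sketch below uses only the standard library: a series expansion of the regularized incomplete gamma function for the distribution function and bisection for the percentile. The function names are illustrative assumptions, and a statistics package would normally be used instead.

```python
import math


def chi2_cdf(x, dof):
    """CDF of the chi-square distribution via the series for the
    regularized lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def chi2_quantile(p, dof):
    """Percentile of the chi-square distribution found by bisection."""
    lo, hi = 0.0, float(dof)
    while chi2_cdf(hi, dof) < p:   # expand until the upper bracket covers p
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Radius squared for a 95% guarantee with 15 assets.
delta_sq = chi2_quantile(0.95, 15)
```

The computed percentile is just under 25, consistent with the rounded value used in the example above.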

More generally, if the expected returns can 
belong to any possible probability distribution, 
then assigning 



guarantees that the estimates will fall in the
uncertainty set $U_\delta(\hat{\mu})$ with probability at least $\omega$%
(El Ghaoui, Oks, and Oustry, 2003).

It has been observed that in practice the stan¬ 
dard robust mean-variance formulation with 
the above uncertainty set specification for esti¬ 
mated expected returns may result in portfolio 
allocations that are too pessimistic. Recall that 
the traditional robust counterpart tries to find 
the optimal solution so that constraints contain¬ 
ing uncertain coefficients are satisfied for the 
worst-case realizations of the uncertain param¬ 
eters. Naturally, the larger the uncertainty set, 
the greater the chance that the optimal portfo¬ 
lio allocation will be conservative. Therefore, 
especially in situations in which the worst-case 
expected returns can be far away from the es¬ 
timated expected returns, some portfolio per¬ 
formance may be sacrificed. Of course, we can 
always make a formulation less pessimistic by 
considering a smaller uncertainty set. For the 
uncertainty set above, we can achieve this by 
decreasing the parameter $\delta$. However, there is a
recent trend among practitioners to apply more
structured restrictions. We provide an example
of a structured uncertainty set next.

When we formulated the robust portfolio op¬ 
timization problem earlier in this entry, we 
made the assumption that all of the actual re¬ 
alizations of expected returns could be worse 
than their expected values. Thus, the net ad¬ 
justment in the expected portfolio return will 
always be downwards. While this leads to a 
more robust problem than the original one, in 
many instances it may be too pessimistic to as¬ 
sume that all estimation errors go against us. 
Instead, it may be more reasonable to believe 
that at least some of the true realizations will 
be above their expected values. For example, 
we may make the assumption that there are 
approximately as many realizations above the 
estimated values as there are realizations be¬ 
low the estimated values. This condition can 
be incorporated in the portfolio optimization 
problem by adding an additional restriction to 
the uncertainty set (2). Ceria and Stubbs (2006) 
refer to this adjustment as a "zero net alpha 
adjustment." Instead of adjusting the alphas of 
the estimates, we can perform this kind of ad¬ 
justment also on their standard deviations or 
variances. It can be shown that the effect of the 
zero net adjustment is equivalent to modifying 
the covariance matrix $\Sigma_\mu$ of estimation errors
for the expected returns. Tests with real data in¬ 
dicate that robust mean-variance optimization 
with this kind of adjustment for expected return 
estimates outperforms classical mean-variance 
optimization 70% to 80% of the time (Ceria and 
Stubbs, 2006). 

Other structured uncertainty sets include 
"tiered" uncertainty sets in which some of the 
uncertain parameters are modeled as "well- 
behaved," while others are modeled as "mis¬ 
behaving." The modeler can require protection 
against a prespecified number of parameters 
that he believes will "misbehave," that is, 
which will deviate significantly from their 
expected values (see, for example, Bienstock, 
2006). In the context of portfolio optimization, 
we would specify "misbehaving" parameters 
as those realizations of expected asset returns
that are likely to be lower than their estimates. 

Effect of Robust Portfolio 
Optimization Formulations on 
Performance 

As we mentioned earlier, some tests with sim¬ 
ulated and real market data indicate that ro¬ 
bust optimization, when inaccuracy is assumed 
in the expected return estimates, outperforms 
classical mean-variance optimization in terms 
of total excess return a large percentage (70% 
to 80%) of the time (Ceria and Stubbs, 2006). 
Other tests have not been as conclusive (Lee, 
Stefek, and Zhelenyak, 2006). The factor that 
accounts for much of the difference is how the 
uncertainty in parameters is modeled. There¬ 
fore, finding a suitable degree of robustness 
and appropriate definitions of uncertainty sets 
can have a significant impact on portfolio 
performance. 

Independent tests by practitioners and aca¬ 
demics using both simulated and market data 
appear to confirm that robust optimization gen¬ 
erally results in more stable portfolio weights, 
that is, that it eliminates the extreme cor¬ 
ner solutions resulting from traditional mean- 
variance optimization. Robust mean-variance 
optimization also appears to improve worst- 
case portfolio performance, and results in 
smoother and more consistent portfolio returns. 
Finally, by preventing large swings in positions, 
robust optimization frequently makes better 
use of the turnover budget and risk constraints. 

Robust optimization, however, is not a 
panacea. By using robust portfolio optimiza¬ 
tion, investors are likely to trade off the opti¬ 
mality of their portfolio allocation in cases in 
which nature behaves as they predicted for pro¬ 
tection against the risk of inaccurate estimation. 
Therefore, investors using the technique should 
not expect to do better than classical portfolio 
optimization when estimation errors have little 
impact, or when typical scenarios occur. They 
should, however, expect insurance in scenarios
in which their estimates deviate from the actual
realized values by up to the amount they have 
prespecified in the modeling process. 

PRACTICAL CONSIDERATIONS FOR
ROBUST PORTFOLIO ALLOCATION

Which type of robust model is best for modeling
financial portfolios? The short answer is:
It depends. Among other things, it depends on the
size of the portfolio, the type of assets and their 
distributional characteristics, the portfolio 
strategies and trading styles involved, and 
the existing infrastructure. Sometimes it makes 
sense to consider a combination of several tech¬ 
niques, such as a blend of Bayesian estima¬ 
tion and robust portfolio optimization. This is 
an empirical question—indeed, the only way 
to find out which strategy performs best is 
through thorough research and testing. A sim¬ 
ple step-by-step checklist for robust quantita¬ 
tive portfolio management could include: 

1. Risk forecasting: Develop an accurate risk 
model 

2. Return forecasting: Construct robust ex¬ 
pected return estimates 

3. Classical portfolio optimization: Start with a 
simple framework 

4. Model risk mitigation: 

a. Minimize estimation risk through the use 
of robust estimators 

b. Improve the stability of the optimization 
framework through robust optimization 

5. Extensions 

Needless to say, by no means do we claim that 
this list is complete or that it has to be followed 
religiously—it is simply provided as a starting 
point for the quantitative portfolio manager. 

In general, the most difficult item in this list
is the calculation of robust expected return
estimates. Developing profitable trading strategies
("alpha generation") is notoriously hard, but not
impossible. It is important to remember that
modern portfolio optimization techniques and
fancy mathematics are not going to help at all if
the underlying trading strategies are poor.

Implicit in this list is that for each step one 
needs to perform thorough testing in order to 
understand the effect of changes and new addi¬ 
tions to the model. It is not unusual that quan¬ 
titative analysts and portfolio managers will 
have to revisit previous steps as part of the re¬ 
search and development process. For example, 
it is important to understand the interplay be¬ 
tween forecast generation and the reliability of 
optimized portfolio weights. Introducing a ro¬ 
bust optimizer may lead to more reliable, and 
often more stable, portfolio weights. However, 
how to make the optimization framework more 
robust depends on how expected return and 
risk forecasts are produced. Therefore, some¬ 
times one has to refine or modify basic forecast 
generation. Identifying the individual and the 
combined contribution of different techniques 
is crucial in the development of a successful 
quantitative framework. 

Minimizing estimation risk and improving 
the reliability of the optimization framework 
can be done in either order, or sometimes at 
the same time. The goal of both approaches 
is of course to improve the overall reliabil¬ 
ity and performance of the portfolio alloca¬ 
tion framework. Some important questions to 
consider here are: When/why does the frame¬ 
work perform well (poorly)? How sensitive is 
it to changes in inputs? How does it behave 
when constraints change? Are portfolio weights 
intuitive—do they make sense? How high is the 
turnover of the portfolio over time? 

Starting from the simple framework of clas¬ 
sical portfolio optimization, many extensions 
are possible. Typical examples include the in¬ 
troduction of transaction costs models, more 
complex constraints (e.g., integer constraints 
such as round lotting or cardinality constraints), 
different risk measures (e.g., downside risk 
measures, higher moments), and dynamic and
stochastic programming for incorporating
intertemporal dependencies. Often, these are
problem specific and have to be dealt with on a
case-by-case basis.

FUTURE DIRECTIONS 

Advances in the mathematical and physical 
sciences have always had a major impact on 
finance. In particular, probability theory, statis¬ 
tics, econometrics, and operations research 
have provided the necessary tools and disci¬ 
pline for the development of modern financial 
economics and large-scale portfolio manage¬ 
ment. The substantial advances in the areas 
of robust estimation and robust optimization 
during the 1990s have proven to be of signifi¬ 
cant importance for the practical applicability 
and reliability of portfolio management and 
optimization. 

From a theoretical perspective, the area of ro¬ 
bust optimization is quite mature. By contrast, 
there are many unanswered questions in the 
practice of robust portfolio optimization. There 
is a need for more empirical research in or¬ 
der to provide better guidelines for applying 
robust optimization in a way that guarantees 
superior portfolio performance. In particular, 
practitioners need to understand better (1) the 
implications of using different types of uncer¬ 
tainty set, (2) the interaction between different 
forecast generation methods (estimation tech¬ 
niques) and robust optimization, (3) how to 
calibrate model parameters in the optimization 
model, and (4) how to deal with the overcon¬ 
servatism inherent in many robust models. 

The robust optimization framework offers 
great flexibility and many new interesting pos¬ 
sibilities in portfolio management. For instance, 
robust portfolio optimization can exploit the 
notion of statistically equivalent portfolios. 
Specifically, with robust optimization, a man¬ 
ager can find the best portfolio that (1) mini¬ 
mizes trading costs with respect to the current 
holdings, and (2) has an expected portfolio re¬ 
turn and variance that are statistically equiv¬ 
alent to those of the classical mean-variance 




portfolio. Common portfolio constraints, such 
as transaction cost considerations and tax impli¬ 
cations, can be handled efficiently in the robust 
optimization framework. 

Robust optimization has also shown promise 
as a computationally attractive alternative to 
classical optimization methods when it comes 
to multiperiod portfolio management. There 
are numerous benefits to taking a long-term 
view of investment management. Treating port¬ 
folio allocation as a multiperiod problem pro¬ 
vides a framework for robust overall portfolio 
management that takes into consideration the 
effects of rebalancing, transaction costs, future 
liabilities, and taxes. 

By incorporating multiperiod views on asset 
behavior in rebalancing models, portfolio man¬ 
agers may be able to reduce their transaction 
costs, as the portfolio will not be rebalanced 
unnecessarily often. As a simple example, if a 
portfolio manager expects asset returns to dip 
at the next time period, but then recover, he may 
choose to hold on to the assets in his portfolio 
in order to minimize transaction costs. How¬ 
ever, if the net gain from realizing the tax loss 
is higher than the expense of the transactions, 
he may choose to trade for short term benefit 
despite believing that the portfolio value will 
recover after two trading periods. These trade¬ 
offs are complex to evaluate and model, and tra¬ 
ditional optimization techniques for multistage 
optimization, such as dynamic programming 
(see, for example, Bertsekas, 1995a) and stochas¬ 
tic programming (see, for example, Wallace and 
Ziemba, 2005), have not been very successful in 
this context as they result in computationally 
intractable problems due to the "curse of di¬ 
mensionality." However, if future asset returns 
are treated as uncertain parameters, and the un¬ 
certainty in their estimates is modeled through 
appropriately chosen uncertainty sets, the re¬ 
sulting portfolio optimization formulations are 
computationally tractable. 

We emphasize that while the focus of this 
entry has been on the application of robust 
optimization to portfolio construction, robust 


optimization is a powerful and general tool 
with financial applications that extend well 
beyond that of portfolio allocation. The robust 
optimization technique appears promising in 
enhancing existing models for optimal trading, 
the computation of hedge ratios, the estimation 
of econometric models, and quantitative model 
selection—just to mention a few. Certainly, the 
future may bring many more. 

KEY POINTS 

• As the use of quantitative techniques has be¬ 
come widespread in the investment indus¬ 
try, the consideration of estimation risk and 
model risk has grown in importance. 

• In contrast to the traditional approach in 
which inputs to the portfolio allocation 
framework are treated as deterministic, ro¬ 
bust portfolio optimization incorporates es¬ 
timation errors in input parameters directly 
into the optimization process. 

• In robust portfolio optimization, the inputs 
are not the traditional forecasts, such as 
expected returns and risk, but rather un¬ 
certainty sets containing these point esti¬ 
mates (e.g., confidence intervals around the 
forecasts). 

• Robust optimization is a general technique
that leads to a more reliable portfolio
allocation framework and offers greater flex¬ 
ibility and many new interesting possibilities 
for the portfolio manager. 

• One of the main problems in assessing the 
practical benefits of the robust optimization 
approach is that its performance is dependent 
on the choice (or calibration) of the model 
parameters, such as the coefficient of aversion 
to estimation error. 

• Which type of robust model is best for 
modeling financial portfolios depends on, 
among other things, the size of the port¬ 
folio, the type of assets and their distribu¬ 
tional characteristics, the portfolio strategies 
and trading styles involved, and the existing 
infrastructure.

Abstract: Probability theory is the mathematical approach to formalizing the uncertainty of events.
Even though a decision maker may not know which one of the set of possible events may finally 
occur, with probability theory, a decision maker has the means of providing each event with a 
certain probability. Furthermore, it provides the decision maker with the axioms to compute the 
probability of a composed event in a unique way. The rather formal environment of probability 
theory translates in a reasonable manner to the problems related to risk and uncertainty in finance 
such as, for example, the future price of a financial asset. Today investors may be aware of the 
price of a certain asset, but they cannot say for sure what value it might have tomorrow. To make a 
prudent decision, investors need to assess the possible scenarios for tomorrow's price and assign to 
each scenario a probability of occurrence. Only then can investors reasonably determine whether 
the financial asset will satisfy an investment objective. 


Probability theory serves as the quantification
of risk in finance. To estimate probabilistic
models, we have to gather and process empirical
data. In this context, we need the tools provided
by statistics. In this entry we introduce the
general concepts of probability theory.

HISTORICAL DEVELOPMENT
OF ALTERNATIVE
APPROACHES TO
PROBABILITY

Before we introduce the formal definitions, we 
provide a brief outline of the historical develop¬ 
ment of probability theory and the alternative 
approaches since probability is, by no means, 
unique in its interpretation. We will describe 
the two most common approaches: relative fre¬ 
quencies and axiomatic system. 

Probability as Relative Frequencies 

The relative frequencies approach to probability
was conceived in 1928 by Richard von Mises
(Mises, 1928) and, as the name suggests, formulates
probability as the relative frequencies, denoted
by f(xᵢ). This initial idea was extended
by Hans Reichenbach (1935). Given large samples,
it was understood that f(xᵢ) was equal to
the true probability of value xᵢ. For example, if
f(xᵢ) is small, then the true probability of value
xᵢ occurring should be small, in general. However,
f(xᵢ) itself is subject to uncertainty. Thus,
the relative frequencies might deviate from the
corresponding probabilities. For example, if the
sample is not large enough, whatever "large" may
mean, then it is likely that we obtain a rare set of
observations and draw the wrong conclusion
with respect to the underlying probabilities.

This point can be illustrated with a simple
example. Consider throwing a six-sided dice 12
times. Intuitively, one would expect the numbers
1 through 6 to occur twice each, since this
would correspond to the theoretical probabilities
of 1/6 for each number. But since so many
different outcomes of this experiment are
possible, one might well observe relative
frequencies of these numbers different from 1/6.
So, based on the relative frequencies, one might
draw the wrong conclusion with respect to the
true underlying probabilities of the corresponding
values. However, if we increase the repetitions
from 12 to 1,000, for example, then with a high
degree of certainty the relative frequency of each
number will be close to 1/6.
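The effect of the sample size can be sketched with a small simulation. The following is only an illustration under stated assumptions: the dice is taken to be fair, the seed is arbitrary, and the function name relative_frequencies is ours rather than part of the original text.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, rng):
    """Roll a fair six-sided dice n_rolls times and return f(x_i) for x_i = 1..6."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts.get(face, 0) / n_rolls for face in range(1, 7)}

rng = random.Random(12345)  # arbitrary fixed seed so that a run is repeatable
for n in (12, 1_000, 100_000):
    freqs = relative_frequencies(n, rng)
    # Largest gap between an observed relative frequency and 1/6
    max_dev = max(abs(f - 1 / 6) for f in freqs.values())
    print(n, {face: round(f, 4) for face, f in freqs.items()}, "max deviation:", round(max_dev, 4))
```

With only 12 throws the relative frequencies typically scatter widely around 1/6, while the largest deviation tends to shrink as the number of repetitions grows.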

The reasoning of von Mises and Reichen¬ 
bach was that since extreme observations 
are unlikely given a reasonable sample size, 
the relative frequencies will portray the true 
probabilities with a high degree of accuracy. 
In other words, probability statements based 
on relative frequencies were justifiable since, 
in practice, highly unlikely events could be 
ruled out. 

In the context of our dice example, they
would consider it unlikely that certain numbers
appear significantly more often than others
if the series of repetitions is, say, 1,000. But still,
who could guarantee that we do not accidentally
end up throwing 300 1s, 300 2s, 400 3s, and
nothing else?

The approach of von Mises becomes rel¬ 
evant, again, in the context of estimating 
and hypothesis testing. For now, however, we 
will not pay any further attention to it but 
turn to the alternative approach to probability 
theory. 

Axiomatic System 

Introduced by Andrei N. Kolmogorov in 1933, 
the axiomatic system abstracted probability 
from relative frequencies as obtained from ob¬ 
servations and instead treated probability as 
purely mathematical. The variables were no 
longer understood as the quantities that could 
be observed but rather as some theoretical en¬ 
tities "behind the scenes." Strict rules were set 
up that controlled the behavior of the variables 
with respect to their likelihood of assuming val¬ 
ues from a predetermined set. So, for example, 
consider the price of a stock, say General Elec¬ 
tric (GE). GE's stock price as a variable is not 
what you can observe but a theoretical quan¬ 
tity obeying a particular system of probabilities. 
What you observe is merely realizations of the 
stock price with no implication on the true prob¬ 
ability of the values since the latter is given and 
does not change from sample to sample. The
relative frequencies, however, are subject to
change depending on the sample.

We illustrate the need for an axiomatic system
due to the dependence of relative frequencies
on samples using our dice example. Consider
the chance of occurrence of the number 1. Based
on intuition, since there are six different "numbers
of dots" on a dice, the number 1 should
have a chance of 1/6, right? Suppose we obtain
the information based on two samples of
12 repetitions each, that is, n₁ = n₂ = 12. In
the following table, we report the absolute
frequencies, aᵢ, representing how many times the
individual numbers of dots 1 through 6 were
observed.



Absolute Frequencies aᵢ

Number of Dots    Sample 1    Sample 2
1                 4           1
2                 1           1
3                 3           1
4                 0           1
5                 1           1
6                 3           7
Total             12          12

That is, in sample 1, 1 dot was observed 4 times
while, in sample 2, 1 dot was observed only
once, and so on.

From the above observations, we obtain the
following relative frequencies:

Relative Frequencies f(xᵢ)

Number of Dots    Sample 1    Sample 2
1                 0.3333      0.0833
2                 0.0833      0.0833
3                 0.2500      0.0833
4                 0.0000      0.0833
5                 0.0833      0.0833
6                 0.2500      0.5833
Total             1.0000      1.0000


That is, in sample 1, 1 dot was observed 
33.33% of the time while in sample 2, 1 dot 
was observed 8.33% of the time, and so on. We 
see that both samples lead to completely differ¬ 
ent results about the relative frequencies for the 


number of dots. But, as we will see, the theoret¬ 
ical probability is 1/6 = 0.1667, for each value 1 
through 6. So, returning to our original question 
of the chance of occurrence of 1 dot, the answer 
is still 1/6 = 0.1667. 
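The step from absolute to relative frequencies can be reproduced in a few lines. This is a minimal sketch that uses the two samples reported above; the variable and function names are our own.

```python
from fractions import Fraction

# Absolute frequencies a_i for 1 through 6 dots, as reported for the two samples
absolute = {
    "Sample 1": [4, 1, 3, 0, 1, 3],
    "Sample 2": [1, 1, 1, 1, 1, 7],
}

def to_relative(counts):
    """Divide each absolute frequency by the sample size n to obtain f(x_i)."""
    n = sum(counts)
    return [Fraction(a, n) for a in counts]

relative = {name: to_relative(counts) for name, counts in absolute.items()}
for name, freqs in relative.items():
    print(name, [f"{float(f):.4f}" for f in freqs], "total:", float(sum(freqs)))
```

Both rows sum to one, yet neither sample reproduces the theoretical value 1/6 for every number of dots.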

In finance, the problem arising with this con¬ 
cept of probability is that, despite the knowl¬ 
edge of the axiomatic system, we do not know 
for sure what the theoretical probability is for 
each value. We can only obtain a certain de¬ 
gree of certainty as to what it approximately 
might be. This insight must be gained from 
estimation based on samples and, thus, from 
the related relative frequencies. So, it might ap¬ 
pear reasonable to use as many observations as 
possible. However, even if we try to counteract 
the sample-dependence of relative frequencies 
by using a large number of observations, there 
might be a change in the underlying probabili¬ 
ties exerting additional influence on the sample 
outcome. For example, during the period of a 
bull market, the probabilities associated with an 
upward movement of some stock price might 
be higher than under a bear market scenario. 

Despite this shortcoming, the concept of prob¬ 
ability as an abstract quantity as formulated by 
Kolmogorov (1933) has become the standard in 
probability theory. 

SET OPERATIONS AND 
PRELIMINARIES 

Before proceeding to the formal definition of 
probability, randomness, and random variables 
we need to introduce some terminology. 

Set Operations 

A set is a collection of elements. Usually, we
denote a set by some capital (uppercase) letter,
for example S, while the elements are denoted
by lowercase letters such as a, b, c, ... or a₁,
a₂, .... To indicate that a set S consists of exactly
the elements a, b, c, we write S = {a,b,c}. If we
want to say that element a belongs to S, the
notation used is a ∈ S, where ∈ means "belongs
to." If, instead, a does not belong to S, then the
notation used is a ∉ S, where ∉ means "does not
belong to."

A type of set such as S = {a,b,c} is said to
be countable since we can actually count the
individual elements a, b, and c. A set might also
consist of all real numbers inside of and including
some bounds, say a and b. Then, the set is
equal to the interval from a to b, which would
be expressed in mathematical notation as S =
[a,b]. If either one bound or both do not belong
to the set, then this would be written as either
S = (a,b], S = [a,b), or S = (a,b), respectively,
where a parenthesis denotes that the value is
excluded. An interval is an uncountable set since,
in contrast to a countable set S = {a,b,c}, we
cannot count the elements of an interval.¹

We now present the operators used in the context
of sets. The first is equality, denoted by = and
intuitively stating that two sets are equal, that
is, S₁ = S₂, if they consist of the same elements.
If a set S consists of no elements, it is referred
to as an empty set and is denoted by S = ∅. If
the elements of S₁ are all contained in S₂, the
notation used is S₁ ⊂ S₂ or S₁ ⊆ S₂. In the first
case, S₂ also contains additional elements not
in S₁ while, in the second case, the sets might
also be equal. For example, let S₁ = {a,b} and
S₂ = {a,b,c}, then S₁ ⊂ S₂. The operator ⊆ would
indicate that S₂ consists of, at least, a and b. Or,
let M₁ = [0,1] and M₂ = [0.5,1], then M₂ ⊂ M₁.

If we want to join a couple of sets, we use
the union operator denoted by ∪. For example,
let S₁ = {a,b} and S₂ = {b,c,d}, then the union
would be S₁ ∪ S₂ = {a,b,c,d}. Or, let M₁ = [0,1]
and M₂ = [0.5,1], then M₂ ∪ M₁ = [0,1] = M₁.²
If we join n sets S₁, S₂, ..., Sₙ with n > 2, we
denote the union by $\bigcup_{i=1}^{n} S_i$.

The opposite operator to the union is the
difference denoted by the "\" symbol. If we take
the difference between set S₁ and set S₂, that is,
S₁\S₂, we discard from S₁ all the elements that
are common to both S₁ and set S₂. For example,
let S₁ = {a,b} and S₂ = {b,c,d}, then S₁\S₂ = {a}.

To indicate that we want to single out elements
that are contained in several sets simultaneously,
we use the intersection operator
∩. For example, with the previous sets, the
intersection would be S₁ ∩ S₂ = {b}. Or, let M₁ =
[0,1] and M₂ = [0.5,1], then M₁ ∩ M₂ = [0.5,1] =
M₂. Instead of the ∩ symbol, one sometimes
simply writes S₁S₂ to indicate intersection.

If two sets contain no common elements (i.e.,
the intersection is the empty set), then the sets
are said to be pairwise disjoint. For example, the
sets S₁ = {a,b} and S₂ = {c,d} are pairwise
disjoint since S₁ ∩ S₂ = ∅. Or, let M₁ = [0,0.5) and
M₂ = [0.5,1], then M₁ ∩ M₂ = ∅. If we intersect
n sets S₁, S₂, ..., Sₙ with n > 2, we denote the
intersection by $\bigcap_{i=1}^{n} S_i$.

The complement to some set S is denoted by $\bar{S}$.
It is defined by $S \cap \bar{S} = \emptyset$ and $S \cup \bar{S} = \Omega$. That
is, the complement $\bar{S}$ is the remainder of Ω that
is not contained in S.
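For finite sets, the operators above map directly onto Python's built-in set type. The sketch below reuses the letter examples from the text; the universe omega used for the complement is an assumption made only for this illustration.

```python
S1 = {"a", "b"}
S2 = {"b", "c", "d"}
omega = {"a", "b", "c", "d"}   # assumed universe, needed only for the complement

union = S1 | S2                # {a, b, c, d}
difference = S1 - S2           # elements of S1 that are not in S2: {a}
intersection = S1 & S2         # elements common to both sets: {b}
complement_S1 = omega - S1     # everything in omega that is not in S1

# "<" tests for a proper subset, "<=" allows the two sets to be equal
is_proper_subset = {"a", "b"} < {"a", "b", "c"}
is_subset_or_equal = {"a", "b"} <= {"a", "b"}

# Two sets are disjoint when their intersection is empty
are_disjoint = {"a", "b"}.isdisjoint({"c", "d"})

print(union, difference, intersection, complement_S1)
print(is_proper_subset, is_subset_or_equal, are_disjoint)
```

Intervals such as [0,1] are uncountable and cannot be represented this way; the sketch covers only the finite examples.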

Right-Continuous and 
Non-decreasing Functions 

Next we introduce two concepts of functions 
that should be understood in order to appre¬ 
ciate probability theory: right-continuous func¬ 
tion and non-decreasing function. 

A function f is right-continuous at x if the limit
from the right of the function values coincides
with the actual value of f at x. Formally, that is
$\lim_{t \downarrow x} f(t) = f(x)$. We illustrate this in Figure 1.
At the abscissae x₁ and x₂, the function f jumps
to f(x₁) and f(x₂), respectively.³ After each jump,
the function remains at the new level for some
time. Hence, approaching x₁ from the right, that
is, for higher x-values, the function f approaches
f(x₁) smoothly. This is not the case when
approaching x₁ from the left since f jumps at x₁
and, hence, deviates from the left-hand limit.
The same reasoning applies to f at abscissa x₂.
A function is said to be a right-continuous function
if it is right-continuous at every value on
the x-axis.

A function f is said to be a non-decreasing function
if it never assumes a value smaller than any
value to the left. We demonstrate this using Figure 2.
We see that while, in the different sections
A, B, and C, f might grow at different rates, it
never decreases. Even for x-values in section B,
f has zero and thus a nonnegative slope.

Figure 1 Demonstration of Right-Continuity of Some Hypothetical Function f at Values x₁ and x₂
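Both properties can be checked numerically on a simple step function. The sketch below is illustrative only: the jump locations and levels are hypothetical and are not taken from Figure 1 or Figure 2, and a check on a finite grid with a small eps is a plausibility test, not a proof.

```python
from bisect import bisect_right

# Hypothetical step function; jump locations and levels are illustrative assumptions
jumps = [1.0, 2.0]         # abscissae x_1 and x_2 where f jumps
levels = [0.0, 0.5, 1.0]   # value left of x_1, on [x_1, x_2), and from x_2 onward

def f(x):
    """Step function that already takes the new level at the jump point itself."""
    return levels[bisect_right(jumps, x)]

def is_right_continuous_at(x, eps=1e-9):
    """Numerical check: the value just to the right of x agrees with f(x)."""
    return f(x + eps) == f(x)

def is_non_decreasing(grid):
    """Check f(a) <= f(b) for all consecutive grid points a < b."""
    values = [f(x) for x in grid]
    return all(left <= right for left, right in zip(values, values[1:]))

grid = [i / 100 for i in range(-100, 400)]
print(all(is_right_continuous_at(x) for x in jumps))  # right-continuity at both jumps
print(f(1.0 - 1e-9) != f(1.0))                        # value just left of x_1 differs from f(x_1)
print(is_non_decreasing(grid))
```

Because f takes the new level at the jump point itself, the right-hand limit coincides with the function value, whereas approaching from the left gives the old level.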

Outcome, Space, and Events 

Before we dive into the theory, we will use
examples that help illustrate the concept behind
the definitions that follow later in this
entry.

Let us first consider again the number of dots
of a dice. If we throw it once, we observe a
certain value, that is, a realization of the abstract
number of dots, say 4. This is one particular
outcome of the random experiment. We
will denote the outcomes by ω, and a particular
outcome i will be denoted by ωᵢ. We might
just as well have realized 2, for example, which
would represent another outcome. All feasible
outcomes, in this experiment, are given by

ω₁ = 1   ω₂ = 2   ω₃ = 3   ω₄ = 4   ω₅ = 5   ω₆ = 6

The set of all feasible outcomes is called space
and is denoted by Ω. In our example, Ω = {ω₁,
ω₂, ω₃, ω₄, ω₅, ω₆}.

Figure 2 Hypothetical Non-decreasing Function f

Suppose that we are not interested in the exact
number of points but care about whether we
obtain an odd or an even number, instead. That
is, we want to know whether the outcome is
from A = {ω₁, ω₃, ω₅}, that is, the set of all
odd numbers, or B = {ω₂, ω₄, ω₆}, the set of
all even numbers. The sets A and B are both
contained in Ω; that is, both sets are subsets of
Ω. Any subsets of Ω are called events. So, we
are interested in the events "odd" and "even"
number of dots. When individual outcomes are
treated as events, they are sometimes referred
to as elementary events or atoms.

All possible subsets of Ω are given by the so-called
power set $2^{\Omega}$ of Ω. A power set of Ω is a set
containing all possible subsets of Ω including
the empty set ∅ and Ω, itself.⁴

For our dice example, the power set is given
in Table 1. With the aid of this power set, we
are able to describe all possible events such
as, for example, the number of dots less than
3 (i.e., {ω₁, ω₂}) or the number of dots either 1 or
greater than or equal to 4 (i.e., {ω₁, ω₄, ω₅, ω₆}).


The power set has an additional pleasant feature.
It contains any union of arbitrarily many
events as well as any intersection of arbitrarily
many events. Because of this, we say that
$2^{\Omega}$ is closed under countable unions and closed
under countable intersections. Unions are employed
to express that at least one of the events has to
occur. We use intersections when we want to
express that the events have to occur simultaneously.
The power set also contains the complements
to all events.

As we will later see, all these properties of the
power set are features of a σ-algebra (in words:
sigma-algebra), often denoted by A.
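For the dice example, the power set and its closure properties can be verified by brute force. The following is a sketch in which the outcomes are represented by the integers 1 through 6 and the helper name power_set is ours. On a finite space, checking all pairs of events is sufficient, because any countable union or intersection reduces to a finite one.

```python
from itertools import chain, combinations

omega = frozenset(range(1, 7))   # outcomes: number of dots 1 through 6

def power_set(space):
    """Return all subsets of a finite space, including the empty set and the space itself."""
    items = sorted(space)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

events = power_set(omega)
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})

closed_under_union = all((a | b) in events for a in events for b in events)
closed_under_intersection = all((a & b) in events for a in events for b in events)
closed_under_complement = all((omega - a) in events for a in events)

print(len(events))                     # number of events, 2**6
print(odd in events, even in events)
print(closed_under_union, closed_under_intersection, closed_under_complement)
```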

Now consider an example where the space
Ω is no longer countable. Suppose that we are
analyzing the daily logarithmic returns for a
common stock or common stock index.
Theoretically, any real number is a feasible
outcome for a particular day's return.⁵ So, events
are characterized by singular values as well as
closed or open intervals on the real line. For
example, we might be interested in the event
E that the S&P 500 stock index return is "at
least 1%." Using the notation introduced earlier,
this would be expressed as the half-open
interval E = [0.01, ∞).⁶ This event consists of
the uncountable union of all outcomes between
0.01 and ∞. Now, as the set containing all
feasible events, we might take, again, the power
set of the real numbers, that is, $2^{\Omega}$ with Ω =
(−∞, ∞) = ℝ.⁷ But, for theoretical reasons
beyond the scope of this entry, that might cause
trouble.


Table 1 The Power Set of the Example Number of Dots of a Dice

2^Ω = { ∅,
  {ω₁}, {ω₂}, {ω₃}, {ω₄}, {ω₅}, {ω₆},
  {ω₁, ω₂}, {ω₁, ω₃}, {ω₁, ω₄}, {ω₁, ω₅}, {ω₁, ω₆},
  {ω₂, ω₃}, {ω₂, ω₄}, {ω₂, ω₅}, {ω₂, ω₆},
  {ω₃, ω₄}, {ω₃, ω₅}, {ω₃, ω₆}, {ω₄, ω₅}, {ω₄, ω₆}, {ω₅, ω₆},
  {ω₁, ω₂, ω₃}, {ω₁, ω₂, ω₄}, {ω₁, ω₂, ω₅}, {ω₁, ω₂, ω₆},
  {ω₁, ω₃, ω₄}, {ω₁, ω₃, ω₅}, {ω₁, ω₃, ω₆},
  {ω₁, ω₄, ω₅}, {ω₁, ω₄, ω₆}, {ω₁, ω₅, ω₆},
  {ω₂, ω₃, ω₄}, {ω₂, ω₃, ω₅}, {ω₂, ω₃, ω₆},
  {ω₂, ω₄, ω₅}, {ω₂, ω₄, ω₆}, {ω₂, ω₅, ω₆},
  {ω₃, ω₄, ω₅}, {ω₃, ω₄, ω₆}, {ω₃, ω₅, ω₆}, {ω₄, ω₅, ω₆},
  {ω₁, ω₂, ω₃, ω₄}, {ω₁, ω₂, ω₃, ω₅}, {ω₁, ω₂, ω₃, ω₆},
  {ω₁, ω₂, ω₄, ω₅}, {ω₁, ω₂, ω₄, ω₆}, {ω₁, ω₂, ω₅, ω₆},
  {ω₁, ω₃, ω₄, ω₅}, {ω₁, ω₃, ω₄, ω₆}, {ω₁, ω₃, ω₅, ω₆},
  {ω₁, ω₄, ω₅, ω₆}, {ω₂, ω₃, ω₄, ω₅}, {ω₂, ω₃, ω₄, ω₆},
  {ω₂, ω₃, ω₅, ω₆}, {ω₂, ω₄, ω₅, ω₆}, {ω₃, ω₄, ω₅, ω₆},
  {ω₁, ω₂, ω₃, ω₄, ω₅}, {ω₁, ω₂, ω₃, ω₄, ω₆}, {ω₁, ω₂, ω₃, ω₅, ω₆},
  {ω₁, ω₂, ω₄, ω₅, ω₆}, {ω₁, ω₃, ω₄, ω₅, ω₆}, {ω₂, ω₃, ω₄, ω₅, ω₆},
  Ω }

Note: The notation {ωᵢ} for i = 1, 2, ..., 6 indicates that the outcomes are treated as
events.

Instead, we take a different approach. To design
our set of events of the uncountable space
Ω, we begin with the inclusion of the events
"any real number," which is the space Ω, itself,
and "no number at all," which is the empty set
∅. Next, we include all events of the form "less
than or equal to a," for any real number a, that
is, we consider all half-open intervals (−∞, a],
for any a ∈ ℝ. Now, for each of these (−∞, a],
we add its complement $\overline{(-\infty, a]}$ = Ω\(−∞, a] =
(a, ∞), which expresses the event "greater than
a." So far, our set of events contains ∅, Ω, all sets
(−∞, a], and all the sets (a, ∞). Furthermore, we
include all possible unions and intersections of
everything already in the set of events as well as
of the resulting unions and intersections themselves.
By doing this, we guarantee that any
event of practical relevance of an uncountable
space is considered by our set of events.

With this procedure, we construct the Borel
σ-algebra, B. This is the collection of events we
will use any time we deal with real numbers.

The events from the respective σ-algebra of the
two examples can be assigned probabilities in a
unique way, as we will see.

The Measurable Space 

Let us now express the ideas from the previous 
examples in a formal way. To describe a random 
experiment, we need to formulate 

1. Outcomes ω

2. Space Ω

3. σ-algebra A

Definition 1 - Space: The space Ω contains all
outcomes. Depending on the outcomes ω, the
space Ω is either countable or uncountable.

Definition 2 - σ-algebra: The σ-algebra A is the
collection of events (subsets of Ω) with the
following properties:

a. Ω ∈ A and ∅ ∈ A.

b. If event E ∈ A then $\bar{E}$ ∈ A.

c. If the countable sequence of events E₁,
E₂, E₃, ... ∈ A then $\bigcup_{i=1}^{\infty} E_i \in A$ and
$\bigcap_{i=1}^{\infty} E_i \in A$.

Definition 3 - Borel σ-algebra: The σ-algebra
formed by ∅, Ω = ℝ, intervals (−∞, a] for
some real a, and countable unions and
intersections of these intervals is called a Borel
σ-algebra and denoted by B.

Note that we can have several σ-algebras for
some space Ω. Depending on the events we are
interested in, we can think of a σ-algebra A that
contains fewer elements than $2^{\Omega}$ (for countable
Ω), or the Borel σ-algebra (for uncountable Ω).
For example, we might think of A = {∅, Ω}, that
is, we only want to know whether any outcome
occurs or nothing at all.⁸ It is easy to verify that
this simple A fulfills all requirements a, b, and
c of Definition 2.
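For a finite space, the requirements a, b, and c of Definition 2 can be tested mechanically. The sketch below is an illustration under the assumption that the space is finite, so that pairwise unions and intersections stand in for countable ones; the function name is_sigma_algebra and the example collections are ours.

```python
def is_sigma_algebra(collection, space):
    """Check requirements a, b, and c of Definition 2 for a finite space."""
    family = {frozenset(event) for event in collection}
    space = frozenset(space)
    # a. The space and the empty set must both be events
    if space not in family or frozenset() not in family:
        return False
    # b. The complement of every event must be an event
    if any((space - event) not in family for event in family):
        return False
    # c. Unions and intersections of events must be events
    return all((a | b) in family and (a & b) in family for a in family for b in family)

omega = {1, 2, 3, 4, 5, 6}
trivial = [set(), omega]                        # A = {empty set, space}
parity = [set(), {1, 3, 5}, {2, 4, 6}, omega]   # "odd" and "even" number of dots
incomplete = [set(), {1}, omega]                # the complement of {1} is missing

print(is_sigma_algebra(trivial, omega))
print(is_sigma_algebra(parity, omega))
print(is_sigma_algebra(incomplete, omega))
```

The first two collections satisfy all three requirements, while the third fails requirement b.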

Definition 4 - Measurable space: The tuple (Ω, A)
with A being a σ-algebra of Ω is a measurable
space.

A tuple is the combination of several compo¬ 
nents. For example, when we combine two val¬ 
ues a and b, the resulting tuple is (a,b), which we 
know to be a pair. If we combine three values a, 
b, and c, the resulting tuple (a,b,c) is known as a 
triplet. 

Given a measurable space, we have enough to 
describe a random experiment. All that is left is 
to assign probabilities to the individual events. 
We will do so next. 

PROBABILITY MEASURE 

We start with a brief discussion of what we expect
of a probability or probability measure; that
is, the following properties:

Property 1: A probability measure should assign
each event E from our σ-algebra a nonnegative
value corresponding to the chance of
this event occurring.

Property 2: The chance that the empty set occurs
should be zero since, by definition, it is the
improbable event of "no value."

Property 3: The probability of the event that "any
value" might occur (i.e., Ω) should be 1 or,
equivalently, 100% since some outcome has to
be observable.

Property 4: If we have two or more events that
are pairwise disjoint, or mutually exclusive
(that is, they have no outcomes in common), and
create a new event by uniting them, the
probability of the resulting union should equal
the sum of the probabilities of the individual
events.

To illustrate, let: 

• The first event state that the S&P 500 log return is "maximally 5%," that is, $E_1 = (-\infty, 0.05]$.

• The second event state that the S&P 500 log return is "at least 10%," that is, $E_2 = [0.10, \infty)$.

Then, the probability of the S&P 500 log return either being no greater than 5% or no less than 10% should be equal to the probability of $E_1$ plus the probability of $E_2$.

Let's proceed a little more formally. Let $(\Omega, A)$ be a measurable space. Moreover, consider the following definition.

Definition 5—Probability measure: A function $P$ on the $\sigma$-algebra $A$ of $\Omega$ is called a probability measure if it satisfies:

a. $P(\emptyset) = 0$ and $P(\Omega) = 1$.

b. For a countable sequence of events $E_1, E_2, \ldots$ in $A$ that are pairwise disjoint (i.e., $E_i \cap E_j = \emptyset$ for $i \neq j$), we have

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$

This property is referred to as countable additivity.

Then we have everything we need to model randomness and chance, that is, we have the space $\Omega$, the $\sigma$-algebra $A$ of $\Omega$, and the probability measure $P$. This triplet $(\Omega, A, P)$ forms the so-called probability space.

At this point, we introduce the notion of P-almost surely (P-a.s.) occurring events. It is imaginable that even though $P(\Omega) = 1$, not all of the outcomes in $\Omega$ contribute positive probability. The entire positive probability may be contained in a subset of $\Omega$ while the remaining outcomes form the unlikely event with respect to the probability measure $P$. The event accounting for the entire positive probability with respect to $P$ is called the certain event with respect to $P$. If we denote this event by $E_{as}$, then we have $P(E_{as}) = 1$ yielding $P(\Omega \setminus E_{as}) = 0$.

There are certain peculiarities of $P$ depending on whether $\Omega$ is countable or not. It is essential to analyze these two alternatives since this distinction has important implications for the determination of the probability of certain events. Here is why.

Suppose, first, that $\Omega$ is countable. Then, we are able to assign the event $\{\omega_i\}$ associated with an individual outcome, $\omega_i$, a nonnegative probability $p_i = P(\{\omega_i\})$, for all $\omega_i \in \Omega$. Moreover, the probability of any event $E$ in the $\sigma$-algebra $A$ can be computed by adding the probabilities of all outcomes associated with $E$. That is,

$$P(E) = \sum_{\omega_i \in E} p_i$$

In particular, we have

$$P(\Omega) = \sum_{\omega_i \in \Omega} p_i = 1$$

Let us resume the six-sided dice tossing experiment. The probability of each number of dots 1 through 6 is 1/6 or, formally,

$$P(\{\omega_1\}) = P(\{\omega_2\}) = \cdots = P(\{\omega_6\}) = 1/6$$

or equivalently,

$$p_1 = p_2 = \cdots = p_6 = 1/6$$
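
To make the countable case concrete, the following minimal sketch computes event probabilities for the dice by summing the probabilities of the outcomes in the event. The function name `prob` and the representation of outcomes by the integers 1 through 6 are choices made here for illustration only.

```python
from fractions import Fraction

# Probabilities p_i of the six outcomes of a fair dice (each 1/6, as in the text).
p = {i: Fraction(1, 6) for i in range(1, 7)}


def prob(event, p):
    """P(E): add the probabilities of all outcomes contained in the event E."""
    return sum((p[w] for w in event), Fraction(0))


total = prob(p.keys(), p)        # P(Omega)
even = prob({2, 4, 6}, p)        # probability of an even number of dots
empty = prob(set(), p)           # P(empty set)
```

The exact fractions show that `total` is 1 and `empty` is 0, and that the probabilities of disjoint events add up, for instance `prob({1, 2}, p) + prob({5}, p) == prob({1, 2, 5}, p)`.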

Suppose, instead, we have $\Omega = \mathbb{R}$. That is, $\Omega$ is uncountable and our $\sigma$-algebra is given by the Borel $\sigma$-algebra, $B$. To give the probability of the events $E$ in $B$, we need an additional device, given in the next definition.

Definition 6—Distribution function: A function $F$ is a distribution function of the probability measure $P$ if it satisfies the following properties:

a. $F$ is right-continuous.

b. $F$ is non-decreasing.

c. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

d. For any $x \in \mathbb{R}$, we have $F(x) = P((-\infty, x])$.

It follows that, for any interval $(x, y]$, we compute the associated probability according to

$$F(y) - F(x) = P((x, y]) \qquad (1)$$

So, in this case we have a function $F$ uniquely related to $P$ from which we derive the probability of any event in $B$. Note that in general $F$ is only right-continuous, that is, the limit of $F(y)$, when $y > x$ and $y \to x$, is exactly $F(x)$. At point $x$, we might have a jump of the distribution function $F$. The size of this jump equals $P(\{x\})$. This distribution function can be interpreted in a similar way to the relative empirical cumulative distribution function. That is, we state the probability of our quantity of interest being less than or equal to $x$.

To illustrate, the probability of the S&P 500 log return being at most 1%, $E = (-\infty, 0.01]$, is given by $F^{S\&P500}(0.01) = P((-\infty, 0.01])$ (see Note 9), while the probability of it being between −1% and 1% is

$$F^{S\&P500}(0.01) - F^{S\&P500}(-0.01) = P((-0.01, 0.01])$$
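
As a numerical sketch of equation (1), suppose, purely as an assumption for illustration, that the log return follows a normal distribution with mean 0 and standard deviation 0.01. Neither the normal law nor these parameter values come from the text; they only serve to produce a concrete distribution function $F$.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of a normal law, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical parameters of the log return distribution (assumptions).
MU, SIGMA = 0.0, 0.01


def F(x):
    """Assumed distribution function F(x) = P((-inf, x])."""
    return normal_cdf(x, MU, SIGMA)


def interval_prob(x, y):
    """P((x, y]) = F(y) - F(x), as in equation (1)."""
    return F(y) - F(x)


p_at_most_1pct = F(0.01)                  # P((-inf, 0.01])
p_between = interval_prob(-0.01, 0.01)    # P((-0.01, 0.01])
```

Under these assumed parameters, `p_at_most_1pct` is roughly 0.84 and `p_between` is roughly 0.68; other distributional assumptions would give different numbers, but the computation via differences of $F$ stays the same.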


RANDOM VARIABLE 

Now the time has come to introduce the concept of a random variable. When we refer to some quantity as being a random variable, we want to express that its value is subject to uncertainty, or randomness. Technically, the variable of interest is said to be stochastic. In contrast to a deterministic quantity whose value can be determined with certainty, the value of a random variable is not known until we can observe a realized outcome of the random experiment. However, since we know the probability space $(\Omega, A, P)$, we are aware of the possible values it can assume.

One way we can think of a random variable denoted by $X$ is as follows. Suppose we have a random experiment where some outcome $\omega$ from the space $\Omega$ occurs. Then, depending on this $\omega$, the random variable $X$ assumes some value $X(\omega) = x$, where $\omega$ can be understood as input to $X$. What we observe, finally, is the value $x$, which is only a consequence of the outcome $\omega$ of the underlying random experiment.

For example, we can think of the price of a 30-year Treasury bond as a random variable assuming values at random. However, expressed in a somewhat simple fashion, the price of the 30-year Treasury bond depends completely on the prevailing market interest rate (or yield) and, hence, is a function of it. So, the underlying random experiment concerns the prevailing market interest rate with some outcome $\omega$ while the price of the Treasury bond, in turn, is merely a function of $\omega$.

Consequently, a random variable is a function that is completely deterministic and depends on the outcome $\omega$ of some random experiment. In most applications, random variables have values that are real numbers.

So, we understand random variables as functions from some space into an image or state space. We need to become a little more formal at this point. To proceed, we will introduce a certain type of function, the measurable function, in the following definition.

Definition 7—Measurable function: Let $(\Omega, A)$ and $(\Omega', A')$ be two measurable spaces. That is, $\Omega$, $\Omega'$ are spaces and $A$, $A'$ their $\sigma$-algebrae, respectively. A function $X: \Omega \to \Omega'$ is $A$-$A'$-measurable if, for any set $E' \in A'$, we have

$$X^{-1}(E') \in A$$

In words, this means that a function from one space to another is measurable if the origin (that is, the preimage under this function) of each event in the $\sigma$-algebra of the state space can be traced in the $\sigma$-algebra of the domain space. Instead of $A$-$A'$-measurable, we will, henceforth, use simply measurable since, in our statements, it is clear which $\sigma$-algebrae are being referred to.

We illustrate this in Figure 3. Function $X$ creates images in $\Omega'$ by mapping outcomes $\omega$ from $\Omega$ with values $X(\omega) = x$ in $\Omega'$. In reverse fashion, for each event $E'$ in the state space with $\sigma$-algebra $A'$, $X^{-1}$ finds the corresponding origin of $E'$ in $\sigma$-algebra $A$ of the probability space.

Figure 3 Relationship between Image $E'$ and $X^{-1}(E')$ through the Measurable Function $X$

Now, we define a random variable $X$ as a measurable function. That means for each event in the state space $\sigma$-algebra, $A'$, we have a corresponding event in the $\sigma$-algebra of the domain space, $A$.

To illustrate this, let us consider the example with the dice. Now we will treat the "number of points" as a random variable $X$. The possible outcome values of $X$ are given by the state space $\Omega'$, namely, $\Omega' = \{1, 2, 3, 4, 5, 6\}$ (see Note 10). The origin or domain space is given by the set of outcomes $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6\}$. Now, we can think of our random variable $X$ as the function $X: \Omega \to \Omega'$ with the particular map $X(\omega_i) = i$ with $i = 1, 2, \ldots, 6$.
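
The dice example can be written down explicitly. In the sketch below, the labels `"w1"` through `"w6"` stand in for the outcomes $\omega_1, \ldots, \omega_6$, and the helper `preimage` is a hypothetical name for the operation $X^{-1}$ that collects all outcomes mapped into a given event of the state space.

```python
# Random variable X as an explicit map from outcomes to numbers of dots.
X = {f"w{i}": i for i in range(1, 7)}


def preimage(X, event):
    """X^{-1}(E'): all outcomes w in the domain space with X(w) in E'."""
    return {w for w, x in X.items() if x in event}


at_most_three = preimage(X, {1, 2, 3})     # origin of the event "at most 3 dots"
whole_space = preimage(X, set(X.values())) # origin of the entire state space
nothing = preimage(X, set())               # origin of the empty event
```

Since the $\sigma$-algebra on a finite domain space can be taken to be the power set, every such preimage is an event, so this $X$ is measurable.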


Random Variables on a 
Countable Space 

We will distinguish between random variables 
on a countable space and on an uncountable 
space. We begin with the countable case. 


The random variable $X$ is a function mapping the countable space $\Omega$ into the state space $\Omega'$. The state space $\Omega'$ contains all outcomes or values that $X$ can obtain (see Note 11). Thus, all outcomes in $\Omega'$ are countable images of the outcomes $\omega$ in $\Omega$. Between the elements of the two spaces, we have the following relationship.

Let $x$ be some outcome value of $X$ in $\Omega'$. Then, the corresponding outcomes from the domain space $\Omega$ are determined by the set

$$X^{-1}(\{x\}) = \{\omega : X(\omega) = x\}$$

In words, we look for all outcomes $\omega$ that are mapped to the outcome value $x$.

For events, in general, we have the relationship

$$X^{-1}(E') = \{\omega : X(\omega) \in E'\}$$

which is the set of all outcomes $\omega$ in the domain space that are mapped by $X$ to the event $E'$ in the state space. That leads us to the following definition:

Definition 8—Random variable on a countable space: Let $(\Omega, A)$ and $(\Omega', A')$ be two measurable spaces with countable $\Omega$ and $\Omega'$. Then the mapping $X: \Omega \to \Omega'$ is a random variable on a countable space if, for any event $E' \in A'$ composed of outcomes $x \in \Omega'$, we have

$$P^X(E') = P(\{\omega : X(\omega) \in E'\}) = P(X^{-1}(E')) = P(X \in E') \qquad (2)$$

We can illustrate this with the following example from finance referred to as the "binomial stock price model." The random variable of interest will be the price of some stock. We will denote the price of the stock by $S$. Suppose at the beginning of period $t$, the price of the stock is $20 (i.e., $S_t = \$20$). At the beginning of the following period, $t + 1$, the stock price is either $S_{t+1} = \$18$ or $S_{t+1} = \$22$. We model this in the following way.

Let: 

• $(\Omega, A)$ and $(\Omega', A')$ be two measurable spaces with $\Omega' = \{\$18, \$22\}$ (i.e., the state space of the period $t + 1$ stock price) and $A'$ (i.e., the corresponding $\sigma$-algebra of all events with respect to the stock price in $t + 1$).

• $\Omega$ be the space consisting of the outcomes of some random experiment completely influencing the $t + 1$ stock price.

• $A$ be the corresponding $\sigma$-algebra of $\Omega$ with all events in the origin space.

Now, we can determine the origin of the event that

$S_{t+1} = \$18$ by $E_{down} = \{\omega : S(\omega) = \$18\}$

and

$S_{t+1} = \$22$ by $E_{up} = \{\omega : S(\omega) = \$22\}$

Thus, we have partitioned $\Omega$ into the two events, $E_{down}$ and $E_{up}$, related to the two period $t + 1$ stock prices. With the probability measure $P$ on $\Omega$, we have the probability space $(\Omega, A, P)$. Consequently, due to equation (2), we are able to compute the probability $P^S(\$18) = P(E_{down})$ and $P^S(\$22) = P(E_{up})$, respectively.
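
A small sketch may help to see how equation (2) transfers probability from the origin space to the state space. The underlying random experiment, its three outcomes, and their probabilities below are entirely hypothetical; the text leaves the experiment unspecified. The function name `law` is likewise chosen here for illustration.

```python
from fractions import Fraction

# Hypothetical origin space with a probability for each outcome (assumptions).
P = {
    "rate_up": Fraction(3, 10),
    "rate_flat": Fraction(2, 10),
    "rate_down": Fraction(5, 10),
}

# Stock price S as a deterministic function of the outcome (hypothetical map).
S = {"rate_up": 18, "rate_flat": 22, "rate_down": 22}


def law(X, P):
    """Induced probabilities P^X({x}) = P({w : X(w) = x}) for each value x."""
    induced = {}
    for w, x in X.items():
        induced[x] = induced.get(x, Fraction(0)) + P[w]
    return induced


law_S = law(S, P)   # probabilities of the two period t+1 prices
```

With these assumed inputs, $E_{down}$ consists of the single outcome `"rate_up"` and $E_{up}$ of the other two, so the induced probabilities of $18 and $22 are 3/10 and 7/10.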

Random Variables on an 
Uncountable Space 

Now let's look at the case when the probability space $(\Omega, A, P)$ is no longer countable. Recall the particular way in which events are assigned probabilities in this case.

While for a countable space any outcome $\omega$ can have positive probability, that is, $p_\omega > 0$, this is not the case for individual outcomes of an uncountable space. On an uncountable space, we can have the case that only events associated with intervals have positive probability. These probabilities are determined by the distribution function $F(x) = P(X \le x) = P(X < x)$ according to equation (1).

This brings us to the following definition: 

Definition 9—Random variable on a general possibly uncountable space: Let $(\Omega, A)$ and $(\Omega', A')$ be two measurable spaces with, at least, $\Omega$ uncountable. The map $X: \Omega \to \Omega'$ is a random variable on the uncountable space $(\Omega, A, P)$ if it is measurable. That is, if, for any $E' \in A'$, we have

$$X^{-1}(E') \in A$$

Then $X$ induces a probability from $(\Omega, A, P)$ on $(\Omega', A')$ by

$$P^X(E') = P(\{\omega : X(\omega) \in E'\}) = P(X^{-1}(E')) = P(X \in E')$$

We call this the probability law or distribution of $X$. Typically, the probability of $X \in E'$ is written using the following notation:

$$P^X(E') = P(X \in E')$$

Very often, we have the random variable $X$ assume values that are real numbers (i.e., $\Omega' = \mathbb{R}$ and $A' = B$). Then, the events in the state space are characterized by countable unions and intersections of the intervals $(-\infty, a]$ corresponding to the events $\{X \le a\}$, for real numbers $a$. In this case, we require that, to be a random variable, $X$ satisfies

$$\{\omega : X(\omega) \le a\} = X^{-1}((-\infty, a]) \in A$$

for any real $a$.

To illustrate, let's use a call option on a stock. Suppose in period $t$ we purchase a call option on a certain stock expiring in the next period $T = t + 1$. The strike price, denoted by $K$, is $50. Then as the buyer of the call option, in $t + 1$ we are entitled to purchase the stock for $50 no matter what the market price of the stock ($S_{t+1}$) might be. The value of the call option at time $t + 1$, which we denote by $C_{t+1}$, depends on the market price of the stock at $t + 1$ relative to the strike price ($K$). Specifically,

• If $S_{t+1}$ is no greater than $K$, then the value of the option is zero, that is, $C_{t+1} = 0$

• If $S_{t+1}$ is greater than $K$, then the value of the option is equal to $S_{t+1} - K$

Let $(\Omega, A, P)$ be the probability space with the stock price in $t + 1$; that is, $S_{t+1} = s$ representing the uncountable real-valued outcomes. So, we have the uncountable probability space $(\Omega, A, P) = (\mathbb{R}, B, P)$. Assume that the price at $t + 1$ can take any nonnegative value. Assume further that the probability of exactly $s$ is zero (i.e., $P(S_{t+1} = s) = 0$), that is, the distribution function of the price at $T = t + 1$ is continuous. Let the value of the call option in $T = t + 1$, $C_{t+1}$, be our random variable mapping from $\Omega$ to $\Omega'$. Since the possible values of the call option at $t + 1$ are real numbers, the state space is uncountable as well. Hence, we have $(\Omega', A') = (\mathbb{R}, B)$. $C_{t+1}$, to be a random variable, is a $B$-$B$-measurable function.

Now, the probability of the call becoming worthless is determined by the event in the origin space that the stock price does not exceed $K$. Formally, that equals

$$P^{C_{t+1}}(0) = P(C_{t+1} \le 0) = P(S_{t+1} \le K) = P((-\infty, K])$$

since the corresponding event in $A$ to a 0 value for the call option is $(-\infty, K]$. Equivalently, $C_{t+1}^{-1}(\{0\}) = (-\infty, K]$. Any positive value $c$ of $C_{t+1}$ is associated with zero probability since we have

$$P^{C_{t+1}}(c) = P(C_{t+1} = c) = P(S_{t+1} = c + K) = 0$$

due to the relationship $C_{t+1} = S_{t+1} - K$ for $S_{t+1} > K$.
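
The call option example can be sketched numerically once a continuous distribution for the stock price is fixed. The lognormal law and its parameters below are assumptions made only for this illustration (the text requires only a continuous distribution on the nonnegative numbers), and the function names are hypothetical.

```python
import math

K = 50.0  # strike price from the text


def call_value(s, strike=K):
    """Value of the call at expiry: zero up to the strike, s - strike above it."""
    return max(s - strike, 0.0)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical lognormal stock price: log(S) is normal with these parameters.
M, S_DEV = math.log(50.0), 0.2


def prob_price_at_most(s):
    """Assumed distribution function of the stock price, P(S <= s)."""
    if s <= 0:
        return 0.0
    return normal_cdf((math.log(s) - M) / S_DEV)


# Probability that the call expires worthless: P(C = 0) = P(S <= K).
p_worthless = prob_price_at_most(K)
```

Because the assumed median of the stock price coincides with the strike, `p_worthless` is one half here; with other parameters it changes, but it is always obtained from the distribution function of the stock price evaluated at $K$.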


KEY POINTS 

• Events in a mathematical probabilistic sense represent sets of values. They are used to describe a certain situation such as an asset price being below some benchmark value.

• A probability measure is a function that assigns each event a unique probability between zero and one. With respect to this probability measure an event is P-almost sure if it is assigned probability one, while an unlikely event is one with zero probability.

• A random variable is a function assuming values from a given set of values at random. The probability of the random variable assuming certain values is determined by the probability measure.

• A distribution function is uniquely related to the probability measure. It assigns real numbers values between zero and one. At any real number, it represents the probability that a random variable assumes values of at most this number.

• Stochastic is the Greek term for random. It is often used in probability theory to describe that something is not deterministic, that is, not known with certainty in advance.

NOTES 

1. Suppose we have the interval [1,2], that is, all real numbers between 1 and 2. We cannot count all numbers inside of this interval since, for any two numbers such as, for example, 1 and 1.001, 1.0001, or even 1.000001, there are always infinitely many more numbers that lie between them.

2. Note that in a set, we do not consider an element more than once.

3. By abscissa we mean a value on the horizontal x-axis.

4. For example, let $\Omega = \{1, 2, 3\}$, then the power set $2^{\Omega} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \Omega\}$. That is, we have included all possible combinations of the original elements of $\Omega$.

5. Let us assume, for now, that we are not restricted to a few digits due to measurement constraints or quotes conventions in the stock market. Instead, we consider being able to measure the returns to any degree of precision.

6. By convention, we never include $\infty$ since it is not a real number.

7. The symbol $\mathbb{R}$ is just a mathematical abbreviation for the real numbers.

8. The empty set is interpreted as the improbable event.

9. We use the index in $F^{S\&P500}$ to emphasize that this distribution function is unique to the probability of events related to the S&P 500 log returns.

10. Note that we do not define the outcomes of number of dots as nominal or even rank data anymore, but as numbers. That is, 1 is 1, 2 is 2, and so on.

11. Theoretically, $\Omega'$ does not have to be countable; that is, it could contain more elements than $X$ can assume values. But we restrict ourselves to countable state spaces $\Omega'$ consisting of exactly all the values of $X$.
Abstract: Discrete probability distributions are needed whenever the random variable is to describe
a quantity that can assume values from a countable set, either finite or infinite. A discrete probability 
distribution (or law) is quite intuitive in that it assigns certain values positive probabilities adding 
up to one, while any other value automatically has zero probability. In general, neglecting some 
of the mathematical rigor, discrete distributions can be understood from the insight gained from 
descriptive statistics. For example, the random number of defaults in a bond portfolio inside of a 
given period of time can be modeled with a discrete probability distribution. Another example is 
given by sampling when we are interested in whether an observation belongs to a certain group. 
Also, simple stock price models are based on discrete laws where the stock price can only change 
to one of a finite number of possible values. 


Discrete random variables are random variables on the countable space. We present the most important discrete random variables used in finance and their probability distribution (also called probability law): Bernoulli, binomial, hypergeometric, multinomial, Poisson, and discrete uniform.

Appendix A provides a summary of the dis¬ 
crete distributions covered. 


DISCRETE LAW 

In order to understand the distributions discussed in this entry, we will explain the general concept of a discrete law. Based on the knowledge of countable probability spaces, we introduce the random variable on the countable space as the discrete random variable. To fully comprehend the discrete random variable, it is necessary to become familiar with the process of assigning probabilities to events in the countable case. Furthermore, the cumulative distribution function will be presented as an important representative of probability. It is essential to understand the mean and variance parameters. Wherever appropriate, we draw analogies to descriptive statistics to facilitate the learning process.

Random Variable on the 
Countable Space 

Recall the probability space $(\Omega, A, P)$ where $\Omega$ is a countable space. The probability of any event $E$ is given by

$$P(E) = \sum_{\omega_i \in E} p_i$$

with the $p_i$ being the probabilities of the individual outcomes $\omega_i$ in the event $E$. Remember that the random variable $X$ is the mapping from $\Omega$ into $\Omega'$ such that the state space $\Omega'$ is countable. (We denote random variables by capital letters, such as $X$, whereas the outcomes are denoted by small letters, such as $x_i$.) Thus, any event $E'$ in the state space has probability

$$P(X \in E') = P^X(E') = \sum_{\omega_i : X(\omega_i) \in E'} p_i$$

since $E'$ is associated with the set

$$\{\omega_i : X(\omega_i) \in E'\}$$

through $X$. The probability of each individual outcome of $X$ yields the discrete probability law of $X$. It is given by $P(X = x_i) = p_i^X$, for all $x_i \in \Omega'$.

Only for individual discrete values $x$ is the probability $p^X$ positive. This is similar to the empirical frequency distribution with positive relative frequency $f_i$ at certain observed values. If we sort the $x_i \in \Omega'$ in ascending order, analogous to the empirical relative cumulative frequency distribution

$$\sum_{x_i \le x} f_i$$

we obtain the discrete cumulative distribution function (cdf) of $X$,

$$F^X(x) = P(X \le x) = \sum_{x_i \le x} p_i^X$$

That is, we express the probability that $X$ assumes a value no greater than $x$.

Suppose we want to know the probability of obtaining at most 3 dots when throwing a dice. That is, we are interested in the cdf of the random variable number of dots, at the value $x = 3$. We obtain it by

$$F^X(3) = p_1 + p_2 + p_3 = 1/6 + 1/6 + 1/6 = 0.5$$

where the $p_i$ denote the respective probabilities of the number of dots less than or equal to 3. A graph of the cdf is shown in Figure 1.
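
The step shape of the discrete cdf is easy to reproduce. The following sketch, with the hypothetical helper name `cdf`, adds up the probabilities of all values no greater than $x$ for the dice.

```python
from fractions import Fraction

# Discrete probability law of the number of dots of a fair dice.
law = {i: Fraction(1, 6) for i in range(1, 7)}


def cdf(x, law):
    """F^X(x) = P(X <= x): sum of the probabilities of all values x_i <= x."""
    return sum((p for value, p in law.items() if value <= x), Fraction(0))


f_at_3 = cdf(3, law)       # probability of at most 3 dots
f_at_3_5 = cdf(3.5, law)   # unchanged between the jumps at 3 and 4
```

Evaluating `cdf` at 3 and at 3.5 gives the same value, one half, which reflects that the function stays constant between two neighboring outcomes and jumps by 1/6 at each outcome.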

Mean and Variance 

The sample mean and variance are sample-dependent statistics. Here we present the mean and variance of the distribution as parameters where the probability space can be understood as the analog to the population.

To illustrate, we use the random variable number of dots obtained by tossing a dice. Since we treat the numbers as numeric values, we are able to perform transformations and computations with them. By throwing a dice several times, we would be able to compute a sample average based on the respective outcome. So, a question could be: What number is theoretically expected? In our discussion below, we see how to answer that question.

Mean 

The mean is the population equivalent to the sample average of a quantitative variable. In order to compute the sample average, we sum up all observations and divide the resulting value by the number of observations, which we will denote by $n$. Alternatively, we sum over all values weighted by their relative frequencies.

Figure 1 Cumulative Distribution Function of Number of Dots Appearing from Tossing a Dice

This brings us to the mean of a random variable. For the mean of a random variable, we compute the accumulation of the outcomes weighted by their respective probabilities; that is,

$$E(X) = \sum_{x_i \in \Omega'} x_i \cdot p_i^X \qquad (1)$$

given that equation (1) is finite. (Often, the mean is denoted as the parameter $\mu$.) If the mean is not finite, then the mean is said to not exist. The mean equals the expected value of the random variable $X$. However, as we will see in the following examples, the mean does not actually have to be equal to one of the possible outcomes.

For the number of dots on the dice example, the expected value is

$$E(X) = \sum_{i=1}^{6} i \cdot p_i = \frac{1}{6} \sum_{i=1}^{6} i = 21/6 = 3.5$$

So, on average, one can expect a value of 3.5 for the random variable, despite the fact that this is not an obtainable number of dots. How can we interpret this? If we were to repeat the dice tossing many times, record for each toss the number of dots observed, then, if we averaged over all numbers obtained, we would end up with an average very close if not identical to 3.5.

Let's move from the dice tossing example to look at a binomial stock price model. With the stock price $S$ at the end of period 1 being either $S_1 = \$18$ or $S_1 = \$22$, we have only these two outcomes with positive probability each. We denote the probability measure of the stock price at the end of period 1 by $P^S(\cdot)$. At the beginning of the period, we assume the stock price to be $S_0 = \$20$. Furthermore, suppose that up- and down-movements are equally likely; that is, $P^S(18) = 1/2$ and $P^S(22) = 1/2$. So we obtain

$$E(S) = 1/2 \cdot \$18 + 1/2 \cdot \$22 = \$20$$

This means that, on average, the stock price will remain unchanged even though $20 is itself not an obtainable outcome.

We can think of it this way. Suppose we observed some stock over a very long period of time and the probabilities for up- and down-movements did not change. Furthermore, suppose that each time the stock price was $20 at the beginning of some period, we recorded the respective end-of-period price. Then, we would finally end up with an average of these end-of-period stock prices very close to if not equal to $20.
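
Both mean computations follow the same recipe, which the sketch below spells out. The function name `mean` and the dictionary representation of a discrete law (value mapped to probability) are choices made here for illustration.

```python
from fractions import Fraction


def mean(law):
    """E(X): sum of the outcomes weighted by their respective probabilities."""
    return sum(x * p for x, p in law.items())


# Number of dots of a fair dice.
dice = {i: Fraction(1, 6) for i in range(1, 7)}

# End-of-period stock price with equally likely up- and down-movements.
stock = {18: Fraction(1, 2), 22: Fraction(1, 2)}

mean_dice = mean(dice)     # 7/2, that is, 3.5
mean_stock = mean(stock)   # 20
```

In both cases the mean is not itself among the keys of the dictionary, which mirrors the observation that the expected value need not be an obtainable outcome.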

Variance 

Just like in the realm of descriptive statistics, we are interested in the dispersion or spread of the data. For this, we introduce the variance as a measure. Our focus is on the variance as a parameter of the random variable's distribution.

A sample measure of spread gives us information on the average deviation of observations from their sample mean. With the help of the variance, we intend to determine the magnitude we have to theoretically expect of the squared deviation of the outcome from the mean. Again, we use squares to eliminate the effect from the signs of the deviations as well as to emphasize larger deviations compared to smaller ones, just as we have done with the sample variance.

For the computation of the expected value of the squared deviations, we weight the individual squared differences of the outcomes from the mean with the probability of the respective outcome. So, formally, we define the variance of some random variable $X$ to be

$$\sigma_X^2 = Var(X) = \sum_{x_i \in \Omega'} (x_i - E(X))^2 \, p_i^X \qquad (2)$$

For example, for the number of dots obtained from tossing a dice, we obtain the variance

$$\sigma_X^2 = Var(X) = \sum_{i=1}^{6} (i - E(X))^2 \, p_i^X = \frac{1}{6}\left[(1 - 3.5)^2 + (2 - 3.5)^2 + \cdots + (6 - 3.5)^2\right] = 2.9167$$

Thus, on average, we have to expect a squared deviation from the mean by roughly 2.9.
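
The variance of the dice can be checked with exact arithmetic. The sketch below uses the hypothetical helper names `mean` and `variance` and the same dictionary representation of a discrete law as a map from value to probability.

```python
from fractions import Fraction


def mean(law):
    """E(X): probability-weighted sum of the outcomes."""
    return sum(x * p for x, p in law.items())


def variance(law):
    """Var(X): probability-weighted sum of squared deviations from the mean."""
    m = mean(law)
    return sum((x - m) ** 2 * p for x, p in law.items())


dice = {i: Fraction(1, 6) for i in range(1, 7)}
var_dice = variance(dice)   # exact value 35/12
```

The exact value 35/12 rounds to the 2.9167 reported above.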

The standard deviation is simply the square root of the variance. Formally, the standard deviation is given by

$$\sigma_X = \sqrt{Var(X)}$$

The standard deviation appeals to intuition because it is a quantity that is of the same scale as the random variable $X$. In addition, it helps in assessing where the probability law assigns its probability mass. A rule of thumb, which follows from Chebyshev's inequality, is that at least 75% of the probability mass is assigned to a vicinity of the mean that extends two standard deviations in each direction from the mean. Furthermore, this rule states that in at least 89% of the cases, a value will occur that lies in a vicinity of the mean of three standard deviations in each direction.

For the number of dots obtained from tossing a dice, since the variance is 2.9167, the standard deviation is

$$\sigma_X = \sqrt{2.9167} = 1.7078$$
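
As a quick check of both the standard deviation and the rule of thumb for the dice, the sketch below computes how much probability mass lies within $k$ standard deviations of the mean. The helper name `mass_within` is hypothetical.

```python
import math
from fractions import Fraction

# Discrete law of the number of dots of a fair dice.
law = {i: Fraction(1, 6) for i in range(1, 7)}

mean = sum(x * p for x, p in law.items())
variance = sum((x - mean) ** 2 * p for x, p in law.items())
sd = math.sqrt(variance)   # standard deviation as a float


def mass_within(k):
    """Probability mass of all values within k standard deviations of the mean."""
    return sum(p for x, p in law.items() if abs(x - mean) <= k * sd)


mass_one_sd = mass_within(1)   # values 2, 3, 4, 5
mass_two_sd = mass_within(2)   # all six values
```

For the dice, two standard deviations around 3.5 already cover every outcome, so the mass is 1, comfortably above the 75% lower bound; the rule of thumb is a guarantee for any distribution with finite variance, not a sharp value.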

In Figure 2, we display all possible outcomes 1 through 6 indicated by the o symbol, including the mean of $E(X) = 3.5$. We extend a vicinity about the mean of length $\sigma_X = 1.7078$, indicated by the "+" symbol, to graphically relate the magnitude of the standard deviation to the possible values of $X$.

Figure 2 Relation Between Standard Deviation ($\sigma = 1.7078$) and Scale of Possible Outcomes 1, 2, ..., 6 Indicated by the o Symbol


BERNOULLI DISTRIBUTION 

In the remainder of this entry, we introduce 
the most common discrete distributions used 
in finance. We begin with the simplest one, the 
Bernoulli distribution. 

Suppose we have a random variable $X$ with two possible outcomes. That is, we have the state space $\Omega' = \{x_1, x_2\}$. The distribution of $X$ is given by the probability for the two outcomes, that is,

$$p_1^X = p \quad \text{and} \quad p_2^X = 1 - p$$

Now, to express the random experiment of drawing a value for $X$, all we need to know is the two possible values in the state space and the parameter $p$ representing the probability of $x_1$. This situation is represented concisely by the Bernoulli distribution. This distribution is denoted $B(p)$ where $p$ is the probability parameter.

Formally, the Bernoulli distribution is associated with random variables that assume the values $x_1 = 1$ and $x_2 = 0$, or $\Omega' = \{0, 1\}$. That is why this distribution is sometimes referred to as the "zero-one distribution." One usually sets the parameter $p$ equal to the probability of $x_1$ such that

$$p = P(X = x_1) = P(X = 1)$$

The mean of a Bernoulli distributed random variable is

$$E(X) = 0 \cdot (1 - p) + 1 \cdot p = p \qquad (3)$$

and the variance is

$$Var(X) = (0 - p)^2 \cdot (1 - p) + (1 - p)^2 \cdot p = p \cdot (1 - p) \qquad (4)$$

The Bernoulli random variable is commonly used when one models the random experiment where some quantity either satisfies a certain criterion or not. For example, it is employed when it is of interest whether an item is intact or broken. In such applications, we assign the outcome "success" the numerical value 1 and the outcome "failure" the numerical value 0, for example. Then, we model the random variable $X$ describing the state of the item as Bernoulli distributed.

Consider the outcomes when flipping a coin: head or tail. Now we set head equal to the numerical value 0 and tail equal to 1. We take $X$ as the Bernoulli distributed random variable describing the side of the coin that is up after the toss. What should be considered a fair coin? It would be one where in 50% of the tosses, head should be realized and in the remaining 50% of the tosses, tail should be realized. So, a fair coin yields

$$p = 1 - p = 0.5$$

According to equation (3), the mean is then $E(X) = 0.5$ while, according to equation (4), the variance is $Var(X) = 0.25$. Here, again, the mean does not represent a possible value $x$ from the state space $\Omega'$. We can interpret it in the following way: Since 0.5 is halfway between one outcome (0) and the other outcome (1), the coin is fair because the mean is not inclined to either outcome.
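
Equations (3) and (4) translate directly into code. The sketch below evaluates them term by term rather than in simplified form, so that the simplification to $p$ and $p(1 - p)$ can be checked; the function names are chosen here for illustration.

```python
from fractions import Fraction


def bernoulli_mean(p):
    """E(X) for X in {0, 1} with P(X = 1) = p, written as in equation (3)."""
    return 0 * (1 - p) + 1 * p


def bernoulli_variance(p):
    """Var(X) written term by term as in equation (4)."""
    return (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p


# Fair coin with head = 0 and tail = 1.
fair_mean = bernoulli_mean(0.5)
fair_variance = bernoulli_variance(0.5)

# Exact check of the simplified variance p(1 - p) for another parameter value.
p = Fraction(1, 3)
simplified_matches = bernoulli_variance(p) == p * (1 - p)
```

For the fair coin this reproduces the mean 0.5 and the variance 0.25, the largest variance any Bernoulli variable can have.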

As another example, we will take a look at credit risk modeling by considering the risk of default of a corporation. Default occurs when the corporation is no longer able to meet its debt obligations. A priori, default occurring during some period is uncertain and, hence, is treated as random. Here, we view the corporation's failure within the next year as a Bernoulli random variable $X$. When the corporation defaults, $X = 0$ and in the case of survival, $X = 1$. For example, a corporation may default within the next year with probability

$$P(X = 0) = 1 - p = 1 - e^{-0.04} = 0.0392$$

and survive with probability

$$P(X = 1) = p = e^{-0.04} = 0.9608$$
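
The two probabilities are consistent with a survival probability of $e^{-0.04}$; the exponent 0.04 used below is inferred from the reported figures 0.9608 and 0.0392 and should be read as an assumption of this sketch. The mean and variance then follow from the Bernoulli formulas $p$ and $p(1 - p)$.

```python
import math

# Survival probability; the exponent 0.04 is inferred from the reported figures.
p_survive = math.exp(-0.04)
p_default = 1.0 - p_survive

# Mean and variance of the Bernoulli variable X (1 = survival, 0 = default).
mean_x = p_survive
variance_x = p_survive * (1.0 - p_survive)
```

Rounded to four decimals, `p_survive` and `p_default` reproduce the 0.9608 and 0.0392 stated in the example.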

We can, of course, extend the prerequisites of the Bernoulli distribution to a more general case; that is, we may choose values for the two outcomes, $x_1$ and $x_2$, of the random variable $X$ different from 0 and 1. Then, we set the parameter $p$ equal to either one of the probabilities $P(X = x_1)$ or $P(X = x_2)$. The distribution yields mean

$$E(X) = x_1 \cdot p + x_2 \cdot (1 - p)$$

and variance

$$Var(X) = (x_1 - E(X))^2 \cdot p + (x_2 - E(X))^2 \cdot (1 - p)$$

where we set $p = P(X = x_1)$.

We illustrate this generalization of the Bernoulli distribution in the case of the binomial stock price model. Again, we denote the random stock price at time period 1 by S_1. Recall that the state space Ω' = {$18, $22} contains the two possible values for S_1. The probability of S_1 assuming value $18 can be set to

\[ P(S_1 = \$18) = p \]

so that

\[ P(S_1 = \$22) = 1 - p \]

Hence, we have a situation analogous to a Bernoulli random experiment, however, with Ω' = {$18, $22} instead of Ω' = {0, 1}.

Suppose that

\[ P(S_1 = \$18) = p = 0.4 \quad \text{and} \quad P(S_1 = \$22) = 1 - p = 0.6 \]

Then, the mean is

\[ E(S_1) = 0.4 \cdot \$18 + 0.6 \cdot \$22 = \$20.4 \]

and the variance, measured in squared dollars, is

\[ Var(S_1) = (\$18 - \$20.4)^2 \cdot 0.4 + (\$22 - \$20.4)^2 \cdot 0.6 = 3.84 \]
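As a quick numerical check of the two-point mean and variance formulas, the stock price example can be recomputed with a short Python sketch. The helper name two_point_moments is hypothetical and introduced only for this illustration.

```python
def two_point_moments(x1, x2, p):
    """Mean and variance of X with P(X = x1) = p and P(X = x2) = 1 - p."""
    mean = x1 * p + x2 * (1 - p)
    var = (x1 - mean) ** 2 * p + (x2 - mean) ** 2 * (1 - p)
    return mean, var


# Stock price example: S_1 is $18 with probability 0.4 and $22 with probability 0.6.
mean, var = two_point_moments(18.0, 22.0, 0.4)
print(round(mean, 2), round(var, 2))
```

Setting x1 = 1 and x2 = 0 reduces the function to the ordinary Bernoulli case with mean p and variance p(1 - p).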

BINOMIAL DISTRIBUTION 

Suppose that we are no longer interested in 
whether merely one single item satisfies a par¬ 
ticular requirement such as success or failure. 
Instead, we want to know the number of items 
satisfying this requirement in a sample of n 
items. That is, we form the sum over all items 


in the sample by adding 1 for each item that is 
success and 0 otherwise. For example, it could 
be the number of corporations that satisfy their 
debt obligation in the current year from a sam¬ 
ple of 30 bond issues held in a portfolio. In this 
case, a corporation would be assigned 1 if it sat¬ 
isfied its debt obligation and 0 if it did not. We 
would then sum up over all 30 bond issues in 
the portfolio. 

Now, one might realize that this is the link¬ 
ing of n single Bernoulli trials. In other words, 
we perform a random experiment with n "inde¬ 
pendent" and identically distributed Bernoulli 
random variables, which we denote by B(p). 
Note that we introduced two important as¬ 
sumptions: independent random variables and 
identically distributed random variables. Inde¬ 
pendent random variables or independence is 
an important statistical concept that requires a 
formal definition. We will not provide one here. 
Instead, we will simply relate independence to 
an intuitive interpretation such as uninfluenced 
by another factor or factors. So in the Bernoulli 
trials, we assume independence, which means 
that the outcome of a certain item does not in¬ 
fluence the outcome of any others. By identical 
distribution we mean that the two random vari¬ 
ables' distributions are the same. In our context, 
it implies that for each item, we have the same 
B(p) distribution. 

This experiment is as if one draws an item from a bin and replaces it into the bin before drawing the next item. Thus, this experiment is sometimes referred to as drawing with replacement. All we need to know is the number of trials, n, and the parameter p related to each single drawing. The resulting sum of the Bernoulli random variables is distributed as a binomial distribution with parameters n and p and denoted by B(n, p).

Let X be distributed B(n, p). Then, the random variable X assumes values in the state space Ω' = {0, 1, 2, ..., n}. In words, the total X is equal to the number of items satisfying the particular requirement (i.e., having a value of 1). X has some integer value i of at least 0 and at most n.

To determine the probability of X being equal to i, we first need to answer the following question: How many different samples of size n can yield a total of i hits (i.e., realizations of the outcome 1)? The notation to represent realizing i hits out of a sample of size n is

\[ \binom{n}{i} \qquad (5) \]

The expression in equation (5) is called the binomial coefficient and is explained in Appendix B of this entry.

Since in each sample the n individual B(p) 
distributed items are drawn independently, the 
probability of the sum over these n items is the 
product of the probabilities of the outcomes of 
the individual items. We illustrate this in the 
next example. 

Suppose we flip a fair coin 10 times (i.e., n = 10) and denote by Y_i the result of the i-th trial. We denote by Y_i = 1 that the i-th trial produced head and by Y_i = 0 that it produced tail. Assume we obtain the following result

Y_1  Y_2  Y_3  Y_4  Y_5  Y_6  Y_7  Y_8  Y_9  Y_10
 1    1    0    0    0    1    0    1    1    0

So, we observe X = 5 times head. For this particular result that yields X = 5, the probability is

\[ P(Y_1 = 1, Y_2 = 1, \ldots, Y_{10} = 0) = P(Y_1 = 1) \cdot P(Y_2 = 1) \cdots P(Y_{10} = 0) = p \cdot p \cdots (1 - p) = p^5 \cdot (1 - p)^5 \]

Since we are dealing with a fair coin (i.e., p = 0.5), the above probability is

\[ P(Y_1 = 1, Y_2 = 1, \ldots, Y_{10} = 0) = 0.5^5 \cdot 0.5^5 = 0.5^{10} \approx 0.0010 \]

Since there are \binom{10}{5} = 252 different samples leading to X = 5, we compute the probability for this value of the total as

\[ P(X = 5) = \binom{10}{5} \cdot p^5 \cdot (1 - p)^5 = 252 \cdot 0.5^{10} = 0.2461 \]

So, in roughly one fourth of all samples of n = 10 independent coin tosses, we obtain a total of X = 5 1s (or heads).

From the example, we see that the exponent for p is equal to the value of the total X (i.e., i = 5), and the exponent for 1 - p is equal to n - i = 5.

Let p be the parameter from the related Bernoulli distribution (i.e., P(X = 1) = p). The probability of the B(n, p) random variable X being equal to some i ∈ Ω' is given by

\[ P(X = i) = \binom{n}{i} \cdot p^i \cdot (1 - p)^{n - i}, \quad i = 0, 1, 2, \ldots, n \qquad (6) \]

For a particular selection of parameters, the 
probability distribution at certain values can be 
found in the four tables in Appendix A. 

The mean of a B(n, p) random variable is

\[ E(X) = n \cdot p \qquad (7) \]

and its variance is

\[ Var(X) = n \cdot p \cdot (1 - p) \qquad (8) \]
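Equations (6) through (8) can be checked numerically. The following Python sketch, with the hypothetical helper binomial_pmf, evaluates the probability of five heads in ten fair coin tosses and recovers the mean and variance by summing over the state space {0, 1, ..., n}.

```python
from math import comb


def binomial_pmf(i, n, p):
    """P(X = i) for X ~ B(n, p), as in equation (6)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


n, p = 10, 0.5
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

# Mean and variance computed directly from the probability distribution.
mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

print(round(pmf[5], 4), mean, var)
```

The summed values agree with n · p = 5 and n · p · (1 - p) = 2.5 from equations (7) and (8).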

Below we will apply what we have just learned to the binomial stock price model and two other applications.

Application to the Binomial Stock 
Price Model 

Let's extend the binomial stock price model in the sense that we link T successive periods during which the stock price evolves. (The entire time span of length T is subdivided into the adjacent period segments (0,1], (1,2], ..., (T - 1, T].) In each period (t, t + 1], the price either increases or decreases by, say, 10%. The 10% can be intuitively thought of as the variability of the stock price S. Thus, the corresponding factor by which the price will change from the previous period is 0.9 (down movement) or 1.1 (up movement). Based on this assumption about the price movement for the stock each period, at the end of the period (t, t + 1], the stock price is

\[ S_{t+1} = S_t \cdot Y_{t+1} \]

where the random variable Y_{t+1} assumes a value from {0.9, 1.1}, with 0.9 representing a price decrease of 10% and 1.1 a price increase of 10%. Consequently, in the case of Y_{t+1} = 1.1, we have

\[ S_{t+1} = S_t \cdot 1.1 \]

while, in the case of Y_{t+1} = 0.9, we have

\[ S_{t+1} = S_t \cdot 0.9 \]

For purposes of this illustration, let's assume the following probabilities for the up movement and down movement, respectively,

\[ P(Y_{t+1} = 1.1) = p = 0.6 \]

and

\[ P(Y_{t+1} = 0.9) = 1 - p = 0.4 \]

After T periods, we have a random total of X up movements; that is, for all periods (0,1], (1,2], ..., and (T - 1, T], we increment X by 1 if the period-related factor Y_{t+1} = 1.1, t = 0, 1, ..., T - 1. So, the result is some x ∈ {0, 1, ..., T}. The total number of up movements, X, is a binomially distributed B(T, p) random variable on the probability space (Ω', A', P^X) where

1. The state space is Ω' = {0, 1, ..., T}.

2. The σ-algebra A' is given by the power set 2^{Ω'} of Ω'.

3. P^X is denoted by the binomial probability distribution given by

\[ P(X = k) = \binom{T}{k} p^k (1 - p)^{T - k}, \quad k = 0, 1, \ldots, T \]

Consequently, according to equations (7) and (8), we have, for T = 2 periods and p = 0.6,

\[ E(X) = 2 \cdot 0.6 = 1.2 \]

and

\[ Var(X) = 2 \cdot 0.6 \cdot 0.4 = 0.48 \]

By definition of S_T and X, we know that the evolution of the stock price is such that

\[ S_T = S_0 \cdot 1.1^X \cdot 0.9^{T - X} \]

Let us next consider a random variable that is not binomial itself, but related to a binomial random variable. Now, instead of considering the B(T, p) distributed total X, we could introduce, as a random variable, the stock price at T (i.e., S_T). Using an illustration, we will derive the stock price independently of X and, then, emphasize the relationship between S_T and X. Note that S_T is not a binomial random variable.

Let us set T = 2. We may start with an initial stock price of S_0 = $20. At the end of the first period, that is, (0,1], we have

\[ S_1 = S_0 \cdot Y_1 \]

either equal to

\[ S_1 = \$20 \cdot 1.1 = \$22 \]

or

\[ S_1 = \$20 \cdot 0.9 = \$18 \]

At the end of the second period, that is, (1,2], we have

\[ S_2 = S_1 \cdot Y_2 = \$22 \cdot 1.1 = \$24.20 \]

or

\[ S_2 = S_1 \cdot Y_2 = \$22 \cdot 0.9 = \$19.80 \]

in the case where S_1 = $22, and

\[ S_2 = S_1 \cdot Y_2 = \$18 \cdot 1.1 = \$19.80 \]

or

\[ S_2 = S_1 \cdot Y_2 = \$18 \cdot 0.9 = \$16.20 \]

in the case where S_1 = $18.

That is, at time t + 1 = T = 2, we have three possible values for S_2, namely, $24.20, $19.80, and $16.20. Hence, we have a new state space that we will denote by Ω'_S = {$16.2, $19.8, $24.2}. Note that S_2 = $19.80 can be achieved in two different ways: (1) S_2 = S_0 · 1.1 · 0.9 and (2) S_2 = S_0 · 0.9 · 1.1. The evolution of this pricing process, between time 0 and T = 2, can be demonstrated using the binomial tree given in Figure 3.

[Figure 3: binomial tree over t = 0, 1, 2 with terminal prices $24.20, $19.80, and $16.20]

Figure 3 Binomial Stock Price Model with Two Periods

Note: Starting price S_0 = $20. Upward factor u = 1.1, downward d = 0.9, with p = 0.6.

As σ-algebra, we use A = 2^{Ω'_S}, which is the power set of the state space Ω'_S. It includes events such as, for example, "stock price in T = 2 no greater than $19.80," defined as E' = {S_2 ≤ $19.80}.

The probability distribution of S_2 is given by the following

\[ P(S_2 = \$24.20) = P(Y_1 = 1.1) \cdot P(Y_2 = 1.1) = \binom{2}{2} p^2 = 0.6^2 = 0.36 \]

\[ P(S_2 = \$19.80) = P(Y_1 = 0.9) \cdot P(Y_2 = 1.1) + P(Y_1 = 1.1) \cdot P(Y_2 = 0.9) = 2 (1 - p) p = 2 \cdot 0.4 \cdot 0.6 = 0.48 \]

\[ P(S_2 = \$16.20) = P(Y_1 = 0.9) \cdot P(Y_2 = 0.9) = \binom{2}{0} (1 - p)^2 = 0.4^2 = 0.16 \]


We now have the complete probability space of the random variable S_2. One can see the connection between S_2 and X by the congruency of the probabilities of the individual outcomes, that is,

\[ P(S_2 = \$24.20) = P(X = 2) \]

\[ P(S_2 = \$19.80) = P(X = 1) \]

\[ P(S_2 = \$16.20) = P(X = 0) \]

From this, we derive, again, the relationship

\[ S_2 = S_0 \cdot 1.1^X \cdot 0.9^{2 - X} \]

Thus, even though S_2, or, generally, S_T, is not distributed binomial itself, its probability distribution can be derived from the related binomial random variable X.¹
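The link between S_T and the binomial count X lends itself to a short computation. The Python sketch below maps each value x of X to the price S_0 · u^x · d^(T - x) and attaches the binomial probability of x. The function name stock_price_distribution and the rounding of prices to cents are assumptions made for this illustration only.

```python
from math import comb


def stock_price_distribution(s0, up, down, p, periods):
    """Map each terminal price S_T = s0 * up**x * down**(periods - x)
    to the binomial probability of x up movements."""
    dist = {}
    for x in range(periods + 1):
        # Round to cents so that floating-point noise does not blur the prices.
        price = round(s0 * up**x * down ** (periods - x), 2)
        dist[price] = comb(periods, x) * p**x * (1 - p) ** (periods - x)
    return dist


dist = stock_price_distribution(20.0, 1.1, 0.9, 0.6, 2)
for price in sorted(dist):
    print(price, round(dist[price], 2))
```

For T = 2 this reproduces the three prices $16.20, $19.80, and $24.20 with the probabilities derived above.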


Application to the Binomial Interest 
Rate Model 

We next consider a binomial interest rate model of short rates, that is, one-period interest rates. Starting in t = 0, the short rate evolves over the subsequent two periods as depicted in Figure 4. In t = 0, we have r_0 = 4%, which is the short rate for period 1. For the following period, period 2, the short rate is r_1 while finally, r_2 is valid for period 3, from t = 2 through t = 3. Both r_1 and r_2 are unknown in advance and assume values at random.

[Figure 4: short-rate tree; the time axis marks Period 1, Period 2, and Period 3]

Figure 4 Binomial Interest Rate Model

As we see, in each of the successive periods, the short rate either increases or decreases by 1% (i.e., 100 basis points). Each movement is assumed to occur with a probability of 50%. So, in period i, i = 1, 2, the change in interest rate, Δr_i, has P(Δr_i = 1%) = p = 0.5 for an up-movement and P(Δr_i = -1%) = 1 - p = 0.5 for a down-movement. For each period, we may model the interest rate change by some Bernoulli random variable where X_1 denotes the random change in period 1 and X_2 that of period 2. Then X_i = 1 in case of an up-movement and X_i = 0 otherwise. The sum of both (i.e., Y = X_1 + X_2) is a binomially distributed random variable, precisely Y ~ B(2, 0.5), thus, assuming values 0, 1, or 2.

To be able to interpret the outcome of Y in terms of interest rate changes, we perform the following transformations. A value of X_i = 1 yields Δr_i = 1% while X_i = 0 translates into Δr_i = -1%. Hence, the relationship between Y and r_2 is such that when Y = 0, implying two down-movements in a row, r_2 = r_0 - 2% = 2%. When Y = 1, implying one up- and one down-movement, r_2 = r_0 + 1% - 1% = 4%. And finally, Y = 2 corresponds to two up-movements such that r_2 = r_0 + 2% = 6%. So, we obtain the probability distribution:

r_2    P(r_2)
2%     \binom{2}{0} \cdot 0.5^0 \cdot 0.5^2 = 0.25
4%     \binom{2}{1} \cdot 0.5^1 \cdot 0.5^1 = 0.5
6%     \binom{2}{2} \cdot 0.5^2 \cdot 0.5^0 = 0.25
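The transformation from the binomial count Y to the short rate r_2 can be sketched in a few lines of Python. Rates are kept in whole percentage points so that the dictionary keys stay exact; the function name short_rate_distribution is a hypothetical helper, not part of the original model description.

```python
from math import comb


def short_rate_distribution(r0, step, p, periods):
    """Distribution of the short rate after `periods` moves of +/- step,
    where y counts the up-movements and is B(periods, p) distributed."""
    dist = {}
    for y in range(periods + 1):
        # y up-movements and (periods - y) down-movements.
        rate = r0 + (2 * y - periods) * step
        dist[rate] = comb(periods, y) * p**y * (1 - p) ** (periods - y)
    return dist


# r_0 = 4%, moves of 1 percentage point, p = 0.5, two periods.
dist = short_rate_distribution(4, 1, 0.5, 2)
print(dist)
```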


HYPERGEOMETRIC DISTRIBUTION

Recall that the prerequisites to obtain a binomial B(n, p) random variable X are that we have n independent and identically distributed random variables Y_i, all following the same Bernoulli law B(p), of which the sum is the binomial random variable X. We referred to this type of random experiment as "drawing with replacement" so that for the sequence of individual drawings Y_i, we always have the same conditions.

Suppose instead that we do not "replace." Let's consider the distribution of "drawing without replacement." This is best illustrated with an urn containing N balls, K of which are black and N - K are white. So, for the initial drawing, we have the chance of drawing a black ball equal to K/N, while we have the chance of drawing a white ball equal to (N - K)/N. Suppose the first drawing yields a black ball. Since we do not replace it, the condition before the second drawing is such that we have (K - 1) black balls and still (N - K) white balls. Since the number of black balls has been reduced by one and the number of white balls is unchanged, the chance of drawing a black ball has been reduced compared to the chance of drawing a white ball; the total is also reduced by one. Hence, the condition is different from the first drawing. It would be similar if instead we had drawn a white ball in the first drawing, however, with the adverse effect on the chance to draw a white ball in the second drawing.

Now suppose in the second drawing an¬ 
other black ball is selected. The chances are 
increasingly adverse against drawing another 
black ball in the third trial. This changing envi¬ 
ronment would be impossible in the binomial 
model of identical conditions in each trial. 

Even if we had drawn first a black ball and then a white ball, the chances would not be the same as at the outset of the experiment before any balls were drawn because the total is now reduced to N - 2 balls. So, the chance of obtaining a black ball is now (K - 1)/(N - 2), and that of obtaining a white ball is (N - K - 1)/(N - 2). Mathematically, this is not the same as the original K/N and (N - K)/N. Hence, the conditions are altering from one drawing (or trial) to the next.

Suppose now that we are interested in the 
sum X of black balls drawn in a total of n trials. 
Let's look at this situation. We begin our reason¬ 
ing with some illustration given specific values, 
that is, 

N= 10 
K = 4 
n = 5 
k = 3 


[Figure 5: urn containing the black balls b1 through b4 and the white balls w1 through w6]

Figure 5 Drawing n = 5 Balls without Replacement

Note: N = 10, K = 4 (black), n = 5, and k = 3 (black).

The urn containing the black and white balls is depicted in Figure 5. Let's first compute the number of different outcomes we have to consider when we draw n = 5 out of N = 10 balls regardless of any color. We have 10 different options to draw the first ball; that is, b1 through w6 in Figure 5. After the first ball has been drawn without replacement, the second ball can be drawn from the urn consisting of the remaining nine balls. After that, the third ball is one out of the remaining eight, and so on until five balls have been successively removed. In total, we have

\[ 10 \times 9 \times 8 \times 7 \times 6 = 10!/5! = 30{,}240 \]

alternative ways to withdraw the five balls. For example, we may draw b4, b2, b1, w3, and w6. However, this is the same as w6, w3, b4, b2, and b1 or any other ordering of these five balls. Since we do not care about the exact order of the balls drawn, we have to account for that by dividing the total number of possibilities (i.e., 30,240) by the number of possible orderings of the very same balls drawn. The latter is equal to

\[ 5 \times 4 \times 3 \times 2 \times 1 = 5! = 120 \]

Thus, we have 30,240/120 = 252 different nonredundant outcomes if we draw five out of 10 balls. Alternatively, this can be written as

\[ 252 = \frac{10!}{5! \times 5!} = \binom{10}{5} \qquad (9) \]

Consequently, the chance of obtaining exactly this set of balls (i.e., {b1, b2, b4, w3, w6}) in any order is given by the inverse of equation (9), which is

\[ \frac{1}{252} = \frac{1}{\binom{10}{5}} \approx 0.004 \qquad (10) \]


Now recall that we are interested in the chance of obtaining a certain number k of black balls in our sample. So, we have to narrow down the number of possible outcomes given by equation (9) to all samples of size 5 that yield that number k which, here, is equal to 3. How do we do this?

We have a selection of four black balls (i.e., b1, b2, b3, and b4) to draw from. That gives us a total of 4 × 3 × 2 = 4! = 24 different possibilities to recover k = 3 black balls out of the urn consisting of four balls. Again, we do not care about the exact order in which we draw the black balls. To us, it is the same whether we select them, for example, in the order b1, b2, b4 or b2, b4, b1, as long as we obtain the set {b1, b2, b4}. So, we correct for this by dividing the total of 24 by the number of ways to order these particular black balls; that is,

\[ 3 \times 2 \times 1 = 3! = 6 \]

Hence, the number of combinations of drawing k = 3 black balls out of four is

\[ 24/6 = \frac{4!}{3! \times 1!} = 4 \]

Next we need to consider the previous number of possibilities of drawing k = 3 black balls in combination with drawing n - k = 2 white balls. We apply the same reasoning as before to obtain two white balls from the collection of six (i.e., {w1, w2, w3, w4, w5, w6}). That gives us 6 × 5/2 = 6!/(2! × 4!) = 15 nonredundant options to recover two white balls, in our example.


In total, we have

\[ 4 \times 15 = \frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} \times \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{2 \times 1 \times 4 \times 3 \times 2 \times 1} = \frac{4!}{3! \times 1!} \times \frac{6!}{2! \times 4!} = \binom{4}{3} \binom{6}{2} = 60 \]

different possibilities to obtain three black and two white balls in a sample of five balls. All these 60 samples have the same implication for us (i.e., k = 3). Combining these 60 possibilities with a probability of 0.004 as given by equation (10), we obtain as the probability for a sum of k = 3 black balls in a sample of n = 5

\[ 60/252 = 0.2381 \]
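The counting argument can be confirmed by brute force. The following Python sketch enumerates every unordered sample of five balls from the urn of Figure 5 and counts those containing exactly three black balls; the ball labels are the ones used in the text, and the variable names are chosen for this illustration.

```python
from itertools import combinations

# Urn from Figure 5: four black balls and six white balls.
balls = ["b1", "b2", "b3", "b4", "w1", "w2", "w3", "w4", "w5", "w6"]

# All unordered samples of size 5 (order of drawing is ignored).
samples = list(combinations(balls, 5))

# Samples that contain exactly k = 3 black balls.
hits = [s for s in samples if sum(name.startswith("b") for name in s) == 3]

print(len(samples), len(hits), round(len(hits) / len(samples), 4))
```

The enumeration yields 252 samples in total, 60 of which contain three black balls, in line with the ratio 60/252.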


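Formally, we have

\[ P(X = 3) = \frac{\binom{4}{3} \binom{6}{2}}{\binom{10}{5}} = 0.2381 \]

Then, for our example, the probability distribution of X is

\[ P(X = k) = \frac{\binom{4}{k} \binom{6}{5 - k}}{\binom{10}{5}}, \quad k = 0, 1, 2, 3, 4 \qquad (11) \]

(Note that we cannot draw more than four black balls from b1, b2, b3, and b4.)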

Let's advance from the special conditions of the example to the general case; that is, (1) at the beginning, some nonnegative integer N of black and white balls combined, (2) the overall number of black balls 0 ≤ K ≤ N, (3) the sample size 0 ≤ n ≤ N, and (4) the number 0 ≤ k ≤ n of black balls in the sample.

In equation (11), we have the probability of k black balls in the sample of n = 5 balls. We dissect equation (11) into three parts: the denominator and the two parts forming the product in the numerator. The denominator gives the number of possibilities to draw a sample of n = 5 balls out of N = 10 balls, no matter what the combination of black and white. In other words, we choose n = 5 out of N = 10. The resulting number is given by the binomial coefficient. We can extend this to choosing a general sample of n drawings out of a population of an arbitrary number of N balls. Analogous to equation (9), the resulting number of possible samples of length n (i.e., n drawings) is then given by

\[ \binom{N}{n} \qquad (12) \]



Next, suppose we have k black balls in this sample. We have to consider that in equation (11), we chose k black balls from a population of K = 4, yielding as the number of possibilities for this the binomial coefficient on the left-hand side in the numerator. Now we generalize this by replacing K = 4 by some general number of black balls (K ≤ N) in the population. The resulting number of choices for choosing k out of the overall K black balls is then

\[ \binom{K}{k} \qquad (13) \]

And, finally, we have to draw the remaining n - k balls, which have to be white, from the population of N - K white balls. This gives us

\[ \binom{N - K}{n - k} \qquad (14) \]


different nonredundant choices for choosing 
n — k white balls out of N — K. 

Finally, all we need to do is to combine equations (12), (13), and (14) in the same fashion as equation (11). By doing so, we obtain

\[ P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}, \quad k = 0, 1, \ldots, n \qquad (15) \]

as the probability to obtain a total of X = k black balls in the sample of length n without replacement.

Importantly, here, we start out with N balls of which K are black and, after each trial, we do not replace the ball drawn, so that the population is different for each trial. The resulting random variable is hypergeometric distributed with parameters (N, K, n); that is, Hyp(N, K, n), and probability distribution given by equation (15).

The mean of a random variable X following a hypergeometric probability law is given by

\[ E(X) = n \cdot \frac{K}{N} \]

and the variance of this X ~ Hyp(N, K, n) is given by

\[ Var(X) = \sigma^2 = n \cdot \frac{K}{N} \cdot \frac{N - K}{N} \cdot \frac{N - n}{N - 1} \]

The hypergeometric and the binomial dis¬ 
tributions are similar, though not equivalent. 
However, if the population size N is large, the 
hypergeometric distribution is often approxi¬ 
mated by the binomial distribution with equa¬ 
tion (6) causing only little deviation from the 
true probabilities of equation (15). 
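Both the hypergeometric probabilities and the quality of the binomial approximation can be explored numerically. The sketch below implements equation (15) with Python's math.comb; the function names and the large-population example (N = 10,000, K = 4,000) are assumptions chosen only to illustrate the point.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) for X ~ Hyp(N, K, n), as in equation (15)."""
    if k < 0 or k > K or n - k < 0 or n - k > N - K:
        return 0.0  # outcome cannot occur
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), as in equation (6)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Urn example: N = 10, K = 4, n = 5.
pmf = [hypergeom_pmf(k, 10, 4, 5) for k in range(6)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(round(pmf[3], 4), round(mean, 4), round(var, 4))

# Large population with the same share K/N = 0.4 of black balls.
exact = hypergeom_pmf(3, 10_000, 4_000, 5)
approx = binomial_pmf(3, 5, 0.4)
print(round(exact, 4), round(approx, 4))
```

In the small urn, the two distributions differ noticeably, while for the large population the hypergeometric and binomial probabilities nearly coincide.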


Application 

Let's see how the hypergeometric distribution 
has been applied in a Federal Reserve Bank 
of Cleveland study by Humpage (1998) to as¬ 
sess whether U.S. exchange-rate intervention 
resulted in a desired depreciation of the dollar. 

Consider the following scenario. The U.S. dollar is appreciating against a certain foreign currency. This might hurt U.S. exports to the country whose sovereign issues the particular foreign currency. In response, the U.S. Federal Reserve might be inclined to intervene by purchasing that foreign currency to help depreciate the U.S. dollar through the increased demand for foreign currency relative to the dollar. This strategy, however, may not necessarily produce the desired effect. That is, the dollar might continue to appreciate relative to the foreign currency. Let an intervention by the Federal Reserve be defined as the purchase of that foreign currency. Suppose that we let the random variable X be the number of interventions that lead to success (i.e., depreciation of the dollar). Given certain conditions beyond the scope of this book, the random variable X is approximately distributed hypergeometric.

This can be understood by the following slightly simplified presentation. Let the number of total observations be N days of which K is the number of days with a dollar depreciation (with or without intervention), and N - K is the number of days where the dollar appreciated or remained unchanged. The number of days the Federal Reserve intervenes is given by n. Furthermore, let k equal the number of days the interventions are successful so that n - k accounts for the unsuccessful interventions. The Federal Reserve could technically intervene on all N days, which would yield a total of K successes and N - K failures. However, the actual number of occasions n on which there are interventions might be smaller. The n interventions can be treated as a sample of length n taken from the total of N days without replacement.

The model can best be understood as fol¬ 
lows. The observed dollar appreciations, persis¬ 
tence, or depreciations are given observations. 
The Federal Reserve can merely decide to in¬ 
tervene or not. Consequently, if it took action 
on a day with depreciation, it would be con¬ 
sidered a success and the number of successes 
available for future attempts would, therefore, 
be diminished by one. If, on the other hand, the 
Federal Reserve decided to intervene on a day 
with appreciation or persistence, it would incur 
a failure that would reduce the number of avail¬ 
able failures left by one. The N — n days there 
are no interventions are treated as not belong¬ 
ing to the sample. 

The randomness is in the selection of the days 
on which to intervene. The entire process can 
be illustrated by a chain with N tags attached 
to it containing either a + or — symbol. Each 
tag represents one day. A + corresponds to an 
appreciation or persistence of the dollar on the 
associated day, while a — to a depreciation. We 
assume that we do not know the symbol behind 
each tag at this point. 


In total, we have K tags with a - and N - K tags with a +. At random, we flip n of these tags, which is equivalent to the Federal Reserve taking action on the respective days. Upon turning the respective tag right side up, the contained symbol reveals immediately whether the associated intervention resulted in a success or not.

Suppose we have N = 3,072 total observations of which K = 1,546 represents the number of days with a dollar depreciation, while on N - K = 1,508 days the dollar either became more valuable or remained steady relative to the foreign currency.

Again, let X be the hypergeometric random variable describing successful interventions. On n = 138 days, the Federal Reserve saw reason to intervene, that is, purchase foreign currency to help bring down the value of the dollar, which was successful on k = 51 days and unsuccessful on the remaining n - k = 87 days. Concisely, the values are given by N = 3,072, K = 1,546, N - K = 1,508, n = 138, k = 51, and n - k = 87.

So, the probability for this particular outcome k = 51 for the number of successes X given n = 138 trials is

\[ P(X = 51) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}} = 0.00013429 \]

which is an extremely small probability.

Suppose we state the simplifying hypothesis that the Federal Reserve is overall successful if most of the dollar depreciations have been the result of interventions (i.e., purchase of foreign currency). Then, this outcome with k = 51 successful interventions given a total of K depreciations shows that the decline of the dollar relative to the foreign currency might be the result of something other than a Federal Reserve intervention. Hence, the Federal Reserve intervention might be too vague a forecast of a downward movement of the dollar relative to the foreign currency.

MULTINOMIAL DISTRIBUTION

For our next distribution, the multinomial dis¬ 
tribution, we return to the realm of drawing 
with replacement so that for each trial, there 
are exactly the same conditions. That is, we 
are dealing with independent and identically 
distributed random variables. (Once again we 
note that we are still short of a formal definition 
of independence in the context of probability 
theory. We use the term in the sense of "un¬ 
influenced by.") However, unlike the binomial 
distribution, let's change the population so that 
we have not only two different possible out¬ 
comes for one drawing, but a third or possibly 
more outcomes. 

We extend the illustration where we used an urn containing black and white balls. In our extension, we have a total of N balls with three colors: K_w white balls, K_b black balls, and K_r = N - K_w - K_b red balls. The probability of each of these colors is denoted by

\[ P(Y = \text{white}) = p_w \]

\[ P(Y = \text{black}) = p_b \]

\[ P(Y = \text{red}) = p_r \]

with each of these probabilities representing the population share of the respective color: p_i = K_i/N, for i = white, black, and red. Since all shares combined have to account for all N, we set

\[ p_r = 1 - p_b - p_w \]

For purposes of this illustration, let p_w = p_b = 0.3 and p_r = 0.4. Suppose that in a sample of n = 10 trials, we obtain the following result: n_w = 3 white, n_b = 4 black, and n_r = n - n_w - n_b = 3 red. Furthermore, suppose that the balls were drawn in the following order

Y_1  Y_2  Y_3  Y_4  Y_5  Y_6  Y_7  Y_8  Y_9  Y_10
 r    w    b    b    w    r    r    b    w    b

where the random variable Y_i represents the outcome of the i-th trial. (We denote w = white, b = black, and r = red.) This particular sample occurs with probability

\[ P(Y_1 = r, Y_2 = w, \ldots, Y_{10} = b) = p_r \cdot p_w \cdots p_b = p_r^3 \cdot p_w^3 \cdot p_b^4 \]

The last equality indicates that the order of appearance of the individual values, once again, does not matter.

The last equality indicates that the order of ap¬ 
pearance of the individual values, once again, 
does not matter. 

We introduce the random variable X representing the number of the individual colors occurring in the sample. That is, X consists of the three components X_w, X_b, and X_r or, alternatively, X = (X_w, X_b, X_r). Analogous to the binomial case of two colors, we are not interested in the order of appearance, but only in the respective numbers of occurrences of the different colors (i.e., n_w, n_b, and n_r). Note that several different sample outcomes may lead to X = (n_w, n_b, n_r). The total number of different nonredundant samples with n_w, n_b, and n_r is given by the multinomial coefficient introduced in Appendix B, which here yields

\[ \binom{n}{n_w \; n_b \; n_r} = \binom{10}{3 \; 4 \; 3} = \frac{10!}{3! \times 4! \times 3!} = 4{,}200 \]

Hence, the probability for this value of X = (n_w, n_b, n_r) = (3, 4, 3) is then

\[ P(X = (3, 4, 3)) = \binom{10}{3 \; 4 \; 3} \cdot p_w^3 \cdot p_b^4 \cdot p_r^3 = 4{,}200 \cdot 0.3^3 \cdot 0.3^4 \cdot 0.4^3 = 0.0588 \]

In general, the probability distribution of a multinomial random variable X with k components X_1, X_2, ..., X_k is given by

\[ P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \binom{n}{n_1 \; n_2 \; \ldots \; n_k} \cdot p_1^{n_1} \cdot p_2^{n_2} \cdots p_k^{n_k} \qquad (16) \]

where, for j = 1, 2, ..., k, n_j denotes the outcome of component j and p_j the corresponding probability.

The means of the k components X_1 through X_k are given by

\[ E(X_1) = p_1 \cdot n, \quad \ldots, \quad E(X_k) = p_k \cdot n \]

and their respective variances by

\[ Var(X_1) = \sigma_1^2 = p_1 \cdot (1 - p_1) \cdot n, \quad \ldots, \quad Var(X_k) = \sigma_k^2 = p_k \cdot (1 - p_k) \cdot n \]
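Equation (16) is straightforward to evaluate directly. The Python sketch below computes the multinomial coefficient and the probability of the urn outcome (n_w, n_b, n_r) = (3, 4, 3); the helper names multinomial_coefficient and multinomial_pmf are hypothetical and serve only this illustration.

```python
from math import factorial, prod


def multinomial_coefficient(counts):
    """n! / (n_1! * n_2! * ... * n_k!) with n = sum of the counts."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # each intermediate quotient is an integer
    return coeff


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) as in equation (16)."""
    weight = prod(p**c for p, c in zip(probs, counts))
    return multinomial_coefficient(counts) * weight


counts = (3, 4, 3)       # white, black, red
probs = (0.3, 0.3, 0.4)  # p_w, p_b, p_r
print(multinomial_coefficient(counts), round(multinomial_pmf(counts, probs), 4))
```

With only two components, the same functions reduce to the binomial coefficient and the binomial probability of equation (6).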

Multinomial Stock Price Model 

We can use the multinomial distribution to extend the binomial stock price model described earlier. Suppose we are given a stock with price S_0 in t = 0. In t = 1, the stock can have one of the prices

\[ S_1^{(u)} = S_0 \cdot u, \quad S_1^{(l)} = S_0 \cdot l, \quad S_1^{(d)} = S_0 \cdot d \]

Let the three possible outcomes be a 10% increase in price (u = 1.1), no change in price (l = 1.0), and a 10% decline in price (d = 0.9). That is, the price either goes up by some factor, remains steady, or drops by some factor. Therefore,

\[ S_1^{(u)} = S_0 \cdot 1.1, \quad S_1^{(l)} = S_0 \cdot 1.0, \quad S_1^{(d)} = S_0 \cdot 0.9 \]

Thus, we have three different outcomes of the price change in the first period. Suppose the price change behaved the same in the second period, from t = 1 until t = 2. So, we have

\[ S_2^{(u)} = S_1 \cdot 1.1, \quad S_2^{(l)} = S_1 \cdot 1.0, \quad S_2^{(d)} = S_1 \cdot 0.9 \]

at time t = 2 depending on

\[ S_1 \in \{ S_1^{(u)}, S_1^{(l)}, S_1^{(d)} \} \]

Let's denote the random price change in the first period by Y_1 and the price change in the second period by the random variable Y_2. So, it is obvious that Y_1 and Y_2 independently assume some value in the set {u, l, d} = {1.1, 1.0, 0.9}. After two periods (i.e., in t = 2), the stock price is

\[ S_2 = S_0 \cdot Y_1 \cdot Y_2 \in \{ S_2^{(u)}, S_2^{(l)}, S_2^{(d)} \} \]

Note that the random variable S_2 is not multinomially distributed itself. However, as we will see, it is immediately linked to a multinomial random variable.

Since the initial stock price S_0 is given, the random variable of interest is the product Y_1 · Y_2, which is in a one-to-one relationship with the multinomial random variable X = (n_u, n_l, n_d) (i.e., the number of up-, zero-, and down-movements, respectively). The state space of Y_1 · Y_2 is given by {uu, ul, ud, ll, ld, dd}. This corresponds to the state space of X, which is given by

\[ \Omega' = \{ (2,0,0), (0,2,0), (0,0,2), (1,1,0), (1,0,1), (0,1,1) \} \]

Note that since Y_1 · Y_2 is a product, we do not consider, for example, (Y_1 = u, Y_2 = d) and (Y_1 = d, Y_2 = u) separately. With

\[ P(Y_i = u) = p_u = 0.25 \]

\[ P(Y_i = l) = p_l = 0.50 \]

\[ P(Y_i = d) = p_d = 0.25 \]

the corresponding probability distribution of X is given in the first two columns of Table 1. We use the multinomial coefficient

\[ \binom{n}{n_u \; n_l \; n_d} \]

where

n = the number of periods
n_u = the number of up-movements
n_l = the number of zero movements
n_d = the number of down-movements

Now, if $S_0 = \$20$, then we obtain the probability distribution of the stock price in t = 2 as shown in columns 2 and 3 in Table 1. Note that the probabilities of the values of $S_2$ are associated with the corresponding price changes X and, hence, listed on the same lines of Table 1.

Table 1 Probability Distribution of the Two-Period Stock Price Model

$X = (n_u, n_l, n_d)$   $P(X = \cdot)$                                                      $S_2 = \cdot$
(2, 0, 0)             $\binom{2}{2\;0\;0} p_u p_u = 0.25^2 = 0.0625$                      $S_0 \cdot u^2 = 20 \cdot 1.1^2 = 24.2$
(1, 1, 0)             $\binom{2}{1\;1\;0} p_u p_l = 2 \cdot 0.25 \cdot 0.5 = 0.25$        $S_0 \cdot u \cdot l = 20 \cdot 1.1 \cdot 1.0 = 22$
(1, 0, 1)             $\binom{2}{1\;0\;1} p_u p_d = 2 \cdot 0.25^2 = 0.125$               $S_0 \cdot u \cdot d = 20 \cdot 1.1 \cdot 0.9 = 19.8$
(0, 2, 0)             $\binom{2}{0\;2\;0} p_l p_l = 0.5^2 = 0.25$                         $S_0 \cdot l^2 = 20 \cdot 1.0^2 = 20$
(0, 1, 1)             $\binom{2}{0\;1\;1} p_l p_d = 2 \cdot 0.5 \cdot 0.25 = 0.25$        $S_0 \cdot l \cdot d = 20 \cdot 1.0 \cdot 0.9 = 18$
(0, 0, 2)             $\binom{2}{0\;0\;2} p_d p_d = 0.25^2 = 0.0625$                      $S_0 \cdot d^2 = 20 \cdot 0.9^2 = 16.2$

In the first and second columns, we have the probability distribution of the two-period stock price changes $Y_1 \cdot Y_2$, represented by X, in the multinomial stock price model. In the third column, we have the probability distribution of the stock price $S_2$.
It is now possible to evaluate the probability of events such as "a stock price $S_2$ of, at most, \$22" from the $\sigma$-algebra A' of the multinomial probability space of X. This is given by

$$P(S_2 \le \$22) = P(S_2 = \$16.2) + P(S_2 = \$18) + P(S_2 = \$19.8) + P(S_2 = \$20) + P(S_2 = \$22)$$
$$= 0.0625 + 0.25 + 0.125 + 0.25 + 0.25$$
$$= 1 - P(S_2 = \$24.2)$$
$$= 0.9375$$

where the step $1 - P(S_2 = \$24.2)$ is the result of the fact that the sum of the probabilities of all disjoint events has to add up to one. That follows since any event and its complement account for the entire state space $\Omega'$.

In Figure 6, we can see the evolution of the 
stock price along the different paths. 



Figure 6 Multinomial Stock Price Model: Stock Price $S_2$ in t = 2
From equation (1), the expected stock price in t = 2 is computed as

$$E(S_2) = \sum_{s \in \Omega'} s \cdot P(S_2 = s)$$
$$= \$16.2 \cdot 0.0625 + \$18 \cdot 0.25 + \$19.8 \cdot 0.125 + \$20 \cdot 0.25 + \$22 \cdot 0.25 + \$24.2 \cdot 0.0625$$
$$= \$20$$

So, on average, the stock price will remain unchanged.
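
The two-period calculation above can be reproduced by brute force. The following is a minimal sketch, assuming the factors and probabilities stated in the text; the variable names are illustrative only and prices are rounded to cents so that the paths (u, d) and (d, u) collapse into one outcome.

```python
from collections import defaultdict
from itertools import product

S0 = 20.0                                        # initial stock price from the text
factors = {"u": 1.1, "l": 1.0, "d": 0.9}         # up, zero, and down factors
probs = {"u": 0.25, "l": 0.50, "d": 0.25}        # p_u, p_l, p_d

# Enumerate all 3 x 3 paths (Y1, Y2) and accumulate probability per price.
dist = defaultdict(float)
for y1, y2 in product(factors, repeat=2):
    price = round(S0 * factors[y1] * factors[y2], 2)
    dist[price] += probs[y1] * probs[y2]

expected_price = sum(s * p for s, p in dist.items())
prob_at_most_22 = sum(p for s, p in dist.items() if s <= 22.0)

for price in sorted(dist):
    print(price, dist[price])
print("E(S2) =", expected_price)
print("P(S2 <= 22) =", prob_at_most_22)
```

Enumerating paths rather than count vectors makes the factor 2 in the multinomial coefficient visible: the mixed outcomes are each reached by two paths.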


POISSON DISTRIBUTION 

To introduce our next distribution, consider the 
following situation. A property and casualty in¬ 
surer underwrites a particular type of risk, say, 
automotive damage. Overall, the insurer is in¬ 
terested in the total annual dollar amount of the 
claims from all policies underwritten. The total 
is the sum of the individual claims of differ¬ 
ent amounts. The insurer has to have enough 
equity as risk guarantee. In a simplified way, 
the sufficient amount is given by the number 
of casualties N times the average amount per 
claim. 

In this situation, the insurer's interest is in the 
total number of claims N within one year. Note 
that there may be multiple claims per policy. 
This number N is random because the insurer 
does not know its exact value at the beginning 
of the year. The insurer knows, however, that 
the minimum number of casualties possible is 
zero. Theoretically, although it is unlikely, there 
may be infinitely many claims originating from 
the year of interest. 

So far, we have considered the number of claims over the period of one year. It could be of interest to the insurer, however, to know the behavior of the random variable N over a period of different length, say five years or one month. It is reasonable to assume that there will be fewer claims in one month than in one year or five years.

The number of claims, N, as a random variable should follow a probability law that accounts for the length of the period under analysis. In other words, the insurer wants to ensure that the probability distribution of N reflects proportionality to the length of the period, in the sense that if one period is n times as long as another, then the number of claims expected over the longer period should be n times as large as well.

As a candidate that satisfies these requirements, we introduce the Poisson distribution with parameter $\lambda$, formally expressed as $Poi(\lambda)$. We define the parameter to be a positive real number (i.e., $\lambda > 0$). A Poisson random variable N (that is, $N \sim Poi(\lambda)$) assumes nonnegative integer values. Formally, N is a function mapping the space of outcomes, $\Omega$, into the state space

$$\Omega' = \{0, 1, 2, \ldots\}$$

which is the set of the nonnegative integers.

The probability measure of a Poisson random variable N for nonnegative integers k = 0, 1, 2, ... is defined as

$$P(N = k) = \frac{\lambda^k}{k!} e^{-\lambda} \qquad (17)$$

where $e \approx 2.7183$ is Euler's number. Here, we have unit period length.

The mean of a Poisson random variable with parameter $\lambda$ is

$$E(N) = \lambda$$

while its variance is given by

$$Var(N) = \sigma^2 = \lambda \qquad (18)$$

So, both the mean and the variance of $N \sim Poi(\lambda)$ are given by the parameter $\lambda$.

For a period of general length t, equation (17) becomes

$$P(N = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \qquad (19)$$
We can see that the new parameter is now $\lambda t$, accounting for the time proportionality of the distribution of N, that is, N = N(t) is the number of jumps of size 1 in the interval (0, t). The mean changes to

$$E(N(t)) = \lambda t \qquad (20)$$

and, analogously, the variance given by (18) is now

$$Var(N(t)) = \sigma^2(t) = \lambda t \qquad (21)$$

We can see by equation (20) that the average number of occurrences is the average per unit of time, $\lambda$, times the length of the period, t, in units of time. The same holds for the variance given by equation (21).
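
Equations (19) through (21) can be checked numerically. The sketch below is an illustration only: the rate of 1.5 per unit of time and the period length of 3 are hypothetical values, and the infinite sums are truncated at 200 terms, which is an assumption that is harmless for such a small mean.

```python
import math

def poisson_pmf(k, lam, t=1.0):
    """P(N = k) for a period of length t, as in equation (19).

    Computed on the log scale so that large k does not overflow.
    """
    mu = lam * t
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

lam, t = 1.5, 3.0            # hypothetical rate and period length
ks = range(200)              # truncation of the infinite state space

total = sum(poisson_pmf(k, lam, t) for k in ks)
mean = sum(k * poisson_pmf(k, lam, t) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam, t) for k in ks)

print(total, mean, variance)  # all three should be close to 1, lam*t, lam*t
```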

The Poisson distribution serves as an ap¬ 
proximation of the hypergeometric distribution 
when certain conditions are met regarding sam¬ 
ple size and parameter p. 

Application to Credit Risk 
Modeling for a Bond Portfolio 

The Poisson distribution is typically used in finance for credit risk modeling. For example, suppose we have a pool of 100 bonds issued by different corporations. By experience or empirical evidence, we may know that each quarter of a year the expected number to default is two; that is, $\lambda = 2$. Moreover, from prior research, we can approximate the distribution of N by the Poisson distribution, even though, theoretically, the Poisson distribution admits values k greater than 100. What is the number of bonds to default within the next year, on average? According to equation (20), since the mean is $E_{quarter}(N) = \lambda = 2$ per quarter, the mean per year (t = 4) is

$$E_{year}(N) = \lambda t = 2 \cdot 4 = 8$$

By equation (21), the variance is 8. From equation (19), the probability of, at most, 10 bonds to default is given by

$$P(N \le 10) = P(N = 0) + P(N = 1) + \ldots + P(N = 10)$$
$$= e^{-2 \cdot 4} \frac{(2 \cdot 4)^0}{0!} + e^{-2 \cdot 4} \frac{(2 \cdot 4)^1}{1!} + \ldots + e^{-2 \cdot 4} \frac{(2 \cdot 4)^{10}}{10!}$$
$$= 0.8159$$
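
The cumulative probability for the bond pool can be recomputed directly from equation (19). This is a minimal sketch using only the standard library; the function name `poisson_cdf` is illustrative and not part of the original text.

```python
import math

def poisson_cdf(k_max, lam, t=1.0):
    """P(N <= k_max) for rate lam per unit of time and period length t."""
    mu = lam * t
    return sum(math.exp(-mu) * mu**k / math.factorial(k)
               for k in range(k_max + 1))

# Two expected defaults per quarter, four quarters in a year.
prob_at_most_10 = poisson_cdf(10, lam=2.0, t=4.0)
print(round(prob_at_most_10, 4))
```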


DISCRETE UNIFORM 
DISTRIBUTION 

Consider a probability space $(\Omega', A', P)$ where the state space is a finite set of, say n, outcomes, that is, $\Omega' = \{x_1, x_2, \ldots, x_n\}$. The $\sigma$-algebra A' is given by the power set of $\Omega'$.

So far we have explained how drawings from this $\Omega'$ may be modeled by the multinomial distribution. In the multinomial distribution, the probability of each outcome may be different. However, suppose that for our random variable X, we have a constant $P(X = x_j) = 1/n$, for all j = 1, 2, ..., n. Since all values $x_j$ have the same probability (i.e., they are equally likely), the distribution is called the discrete uniform distribution. We denote this distribution by $X \sim DU_{\Omega'}$. We use the specification $\Omega'$ to indicate that X is a random variable on this particular state space.

The mean of a discrete, uniformly distributed random variable X on the state space $\Omega' = \{x_1, x_2, \ldots, x_n\}$ is given by

$$E(X) = \sum_{i=1}^{n} p_i \cdot x_i = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (22)$$

Note that equation (22) is equal to the arithmetic mean. The variance is

$$Var(X) = \sum_{i: x_i \in \Omega'} p_i (x_i - E(X))^2 = \frac{1}{n} \sum_{i: x_i \in \Omega'} (x_i - E(X))^2$$

with E(X) from equation (22).
A special case of a discrete uniform probability space is given when $\Omega' = \{1, 2, \ldots, n\}$. The resulting mean, according to equation (22), is then

$$E(X) = \sum_{i=1}^{n} p_i \cdot x_i = \frac{1}{n} \sum_{i=1}^{n} i = \frac{1}{n} \times \frac{n(n+1)}{2} = \frac{n+1}{2} \qquad (23)$$

For this special case of discrete uniform distribution of a random variable X, we use the notation $X \sim DU(n)$ with parameter n.

Let's once more consider the outcome of a toss of a die. The random variable number of dots, X, assumes one of the numerical outcomes 1, 2, 3, 4, 5, 6, each with a probability of 1/6. Hence, we have a uniformly distributed discrete random variable X with the state space $\Omega' = \{1, 2, 3, 4, 5, 6\}$. Consequently, we express this as $X \sim DU(6)$.
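
For the die, the mean from equation (23) and the variance formula above can be evaluated exactly. The following is a small sketch using rational arithmetic; the helper names `du_mean` and `du_variance` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def du_mean(values):
    """Arithmetic mean of equally likely outcomes, as in equation (22)."""
    return sum(Fraction(x) for x in values) / len(values)

def du_variance(values):
    """Average squared deviation from the mean for equally likely outcomes."""
    m = du_mean(values)
    return sum((Fraction(x) - m) ** 2 for x in values) / len(values)

die = list(range(1, 7))          # state space {1, ..., 6}
die_mean = du_mean(die)
die_variance = du_variance(die)

print(die_mean, die_variance)
```

Using `Fraction` avoids rounding, so the mean can be compared directly with (n + 1)/2 for n = 6.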

Next, we want to consider several independent trials, say n = 10, of throwing the die. By $n_1$, $n_2$, $n_3$, $n_4$, $n_5$, and $n_6$, we denote the number of occurrences of the values 1, 2, 3, 4, 5, and 6, respectively. With constant probability $p_1 = p_2 = \ldots = p_6 = 1/6$, we have a discrete uniform distribution, that is, $X \sim DU(6)$. Thus, the probability of obtaining $n_1 = 1$, $n_2 = 2$, $n_3 = 1$, $n_4 = 3$, $n_5 = 1$, and $n_6 = 2$, for example, is

$$P(X_1 = 1, X_2 = 2, \ldots, X_6 = 2) = \frac{10!}{1! \times 2! \times \ldots \times 2!} \cdot \left(\frac{1}{6}\right)^{10}$$
$$= 151200 \cdot 0.000000016538$$
$$= 0.0025$$
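
The die calculation is an instance of the multinomial probability with equal cell probabilities. The sketch below recomputes it; the function names are illustrative assumptions, not notation from the text.

```python
import math

def multinomial_coefficient(counts):
    """n! / (n_1! * n_2! * ... * n_k!) with n = sum(counts)."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)   # stays an integer at every step
    return coefficient

def multinomial_pmf(counts, probs):
    """Probability of observing exactly the given counts."""
    p = float(multinomial_coefficient(counts))
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

counts = [1, 2, 1, 3, 1, 2]                  # occurrences of faces 1..6 in 10 throws
coefficient = multinomial_coefficient(counts)
probability = multinomial_pmf(counts, [1 / 6] * 6)

print(coefficient, probability)
```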

Application to the Multinomial 
Stock Price Model 

Let us resume the stock price model where in t = 0 we have a given stock price, say $S_0 = \$20$, where there are three possible outcomes at the end of the period. In the first period, the stock price either increases to

$$S_1^{(u)} = S_0 \cdot 1.1 = \$22$$

remains the same at

$$S_1^{(l)} = S_0 \cdot 1.0 = \$20$$

or decreases to

$$S_1^{(d)} = S_0 \cdot 0.9 = \$18$$

each with probability 1/3. Again, we introduce the random variable Y assuming the values u = 1.1, l = 1.0, and d = 0.9 and, thus, representing the relative change (i.e., the factor) of the stock price between t = 0 and t + 1 = 1. The stock price in t + 1 = 1 is given by the random variable $S_1$ on the corresponding state space

$$\Omega_S = \{S_1^{(u)}, S_1^{(l)}, S_1^{(d)}\}$$

Suppose we have n = 10 successive periods in each of which the stock price changes by the factors u, l, or d. Let the multinomial random variable $X = (X_1, X_2, X_3)$ represent the total of up-, zero-, and down-movements, respectively. Suppose, after these n periods, we have $n_u = 3$ up-movements, $n_l = 3$ zero-movements, and $n_d = 4$ down-movements. According to equation (16), the corresponding probability is

$$P(X_1 = 3, X_2 = 3, X_3 = 4) = \binom{10}{3\;3\;4} \left(\frac{1}{3}\right)^{10}$$
$$= 4200 \cdot 0.000016935$$
$$= 0.0711$$

This probability corresponds to a stock price in t = 10 of

$$S_{10} = S_0 \cdot u^3 \cdot l^3 \cdot d^4 = \$20 \cdot 1.1^3 \cdot 1.0^3 \cdot 0.9^4 = \$17.47$$

This stock price is a random variable given by

$$S_{10} = S_0 \cdot Y_1 \cdot Y_2 \cdot \ldots \cdot Y_{10}$$

where the $Y_i$ are the corresponding relative changes (i.e., factors) in the periods i = 1, 2, ..., 10. Note that $S_{10}$ is not uniformly distributed even though it is a function of the random variables $Y_1, Y_2, \ldots, Y_{10}$ because its possible outcomes do not have identical probability.
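
To see that the outcomes of the ten-period price are not equally likely, one can list every count vector together with its probability and price. This is a sketch under the assumptions of the text (factors 1.1, 1.0, 0.9, each with probability 1/3); the tuple layout of `outcomes` is an illustrative choice.

```python
import math

S0, n = 20.0, 10
u, l, d = 1.1, 1.0, 0.9
p = 1 / 3                                    # each movement is equally likely

outcomes = []                                # ((n_u, n_l, n_d), probability, price)
for n_u in range(n + 1):
    for n_l in range(n - n_u + 1):
        n_d = n - n_u - n_l
        coefficient = math.factorial(n) // (
            math.factorial(n_u) * math.factorial(n_l) * math.factorial(n_d))
        probability = coefficient * p**n
        price = S0 * u**n_u * l**n_l * d**n_d
        outcomes.append(((n_u, n_l, n_d), probability, price))

total_probability = sum(prob for _, prob, _ in outcomes)
example = next(o for o in outcomes if o[0] == (3, 3, 4))
smallest = min(prob for _, prob, _ in outcomes)
largest = max(prob for _, prob, _ in outcomes)

print(example)
print(total_probability, smallest, largest)
```

The spread between the smallest and largest probability shows directly that the price is not uniformly distributed, even though each single-period factor is.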
B(n, p), Binomial Probability Distribution $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for n = 5

k     p = 0.1   p = 0.2   p = 0.5   p = 0.8   p = 0.9
1     0.3281    0.4096    0.1563    0.0064    0.0005
2     0.0729    0.2048    0.3125    0.0512    0.0081
3     0.0081    0.0512    0.3125    0.2048    0.0729
4     0.0005    0.0064    0.1563    0.4096    0.3281
5     0         0.0003    0.0313    0.3277    0.5905

B(n, p), Binomial Probability Distribution $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for n = 10

k     p = 0.1   p = 0.2   p = 0.5   p = 0.8   p = 0.9
1     0.3874    0.2684    0.0098    0         0
2     0.1937    0.3020    0.0439    0.0001    0
3     0.0574    0.2013    0.1172    0.0008    0
4     0.0112    0.0881    0.2051    0.0055    0.0001
5     0.0015    0.0264    0.2461    0.0264    0.0015
6     0.0001    0.0055    0.2051    0.0881    0.0112
7     0         0.0008    0.1172    0.2013    0.0574
8     0         0.0001    0.0439    0.3020    0.1937
9     0         0         0.0098    0.2684    0.3874
10    0         0         0.0010    0.1074    0.3487

B(n, p), Binomial Probability Distribution $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for n = 50

k     p = 0.1   p = 0.2   p = 0.5   p = 0.8   p = 0.9
1     0.0286    0.0002    0         0         0
2     0.0779    0.0011    0         0         0
3     0.1386    0.0044    0         0         0
4     0.1809    0.0128    0         0         0
5     0.1849    0.0295    0         0         0
6     0.1541    0.0554    0         0         0
7     0.1076    0.0870    0         0         0
8     0.0643    0.1169    0         0         0
9     0.0333    0.1364    0         0         0
10    0.0152    0.1398    0         0         0
20    0         0.0006    0.0419    0         0
30    0         0         0.0419    0.0006    0
40    0         0         0         0.1398    0.0152
41    0         0         0         0.1364    0.0333
42    0         0         0         0.1169    0.0643
43    0         0         0         0.0870    0.1076
44    0         0         0         0.0554    0.1541
45    0         0         0         0.0295    0.1849
46    0         0         0         0.0128    0.1809
47    0         0         0         0.0044    0.1386
48    0         0         0         0.0011    0.0779
49    0         0         0         0.0002    0.0286
50    0         0         0         0         0.0052

B(n, p), Binomial Probability Distribution $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for n = 100

k     p = 0.1   p = 0.2   p = 0.5   p = 0.8   p = 0.9
1     0.0003    0         0         0         0
2     0.0016    0         0         0         0
3     0.0059    0         0         0         0
4     0.0159    0         0         0         0
5     0.0339    0         0         0         0
6     0.0596    0.0001    0         0         0
7     0.0889    0.0002    0         0         0
8     0.1148    0.0006    0         0         0
9     0.1304    0.0015    0         0         0
10    0.1319    0.0034    0         0         0
20    0.0012    0.0993    0         0         0
30    0         0.0052    0         0         0
40    0         0         0.0108    0         0
50    0         0         0.0796    0         0
60    0         0         0.0108    0         0
70    0         0         0         0.0052    0
80    0         0         0         0.0993    0.0012
90    0         0         0         0.0034    0.1319
91    0         0         0         0.0015    0.1304
92    0         0         0         0.0006    0.1148
93    0         0         0         0.0002    0.0889
94    0         0         0         0.0001    0.0596
95    0         0         0         0         0.0339
96    0         0         0         0         0.0159
97    0         0         0         0         0.0059
98    0         0         0         0         0.0016
99    0         0         0         0         0.0003
100   0         0         0         0         0
Poi($\lambda$), Poisson Probability Distribution $P(X = k) = \frac{\lambda^k \cdot e^{-\lambda}}{k!}$ for Several Values of Parameter $\lambda$

k     λ = 0.1   λ = 0.5   λ = 1     λ = 2     λ = 5     λ = 10
1     0.0905    0.3033    0.3679    0.2707    0.0337    0.0005
2     0.0045    0.0758    0.1839    0.2707    0.0842    0.0023
3     0.0002    0.0126    0.0613    0.1804    0.1404    0.0076
4     0         0.0016    0.0153    0.0902    0.1755    0.0189
5     0         0.0002    0.0031    0.0361    0.1755    0.0378
6     0         0         0.0005    0.0120    0.1462    0.0631
7     0         0         0.0001    0.0034    0.1044    0.0901
8     0         0         0         0.0009    0.0653    0.1126
9     0         0         0         0.0002    0.0363    0.1251
10    0         0         0         0         0.0181    0.1251
11    0         0         0         0         0.0082    0.1137
12    0         0         0         0         0.0034    0.0948
13    0         0         0         0         0.0013    0.0729
14    0         0         0         0         0.0005    0.0521
15    0         0         0         0         0.0002    0.0347
16    0         0         0         0         0         0.0217
17    0         0         0         0         0         0.0128
18    0         0         0         0         0         0.0071
19    0         0         0         0         0         0.0037
20    0         0         0         0         0         0.0019
50    0         0         0         0         0         0
100   0         0         0         0         0         0


APPENDIX B BINOMIAL 
AND MULTINOMIAL 
COEFFICIENTS 

In this appendix, we explain the concept of the 
binomial and multinomial coefficients used in 
discrete probability distributions. 


BINOMIAL COEFFICIENT 

The binomial coefficient is defined as

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$

for some nonnegative integers k and n with $0 \le k \le n$. For the binomial coefficient, we use the factorial operator denoted by the "!" symbol. A factorial is defined in the set of natural numbers N, that is, for k = 1, 2, 3, ..., as

$$k! = k \cdot (k-1) \cdot (k-2) \cdot \ldots \cdot 1$$

For k = 0, we define 0! = 1.
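
As a brief illustration, the definition translates directly into code. The sketch below uses the factorial form from the definition; the function name is an assumption, and Python's built-in `math.comb` (available from Python 3.8) is shown only as a cross-check.

```python
import math

def binomial_coefficient(n, k):
    """n! / (k! * (n - k)!) for integers with 0 <= k <= n."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(binomial_coefficient(3, 1))            # ways to choose 1 position out of 3
print(binomial_coefficient(4, 2))            # ways to choose 2 positions out of 4
print(binomial_coefficient(4, 2) == math.comb(4, 2))
```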

Derivation of the Binomial 
Coefficient 

In the context of the binomial distribution, we form the sum X of n independent and identically distributed Bernoulli random variables $Y_i$ with parameter p or, formally, $Y_i \sim B(p)$, i = 1, 2, ..., n. The random variable is then distributed binomial with parameters n and p, i.e., $X \sim B(n, p)$. Since the random variables $Y_i$ have either value 0 or 1, the resulting binomial random variable (i.e., the sum X) assumes some integer value between 0 and n. Let X = k for $0 \le k \le n$. Depending on the exact value k, there may be several alternatives to obtain k since, for the sum X, it is irrelevant in which order the individual values of the $Y_i$ appear.

Special Case n = 3 

We illustrate the special case where n = 3 using a B(3, 0.4) random variable X; that is, X is the sum of three independent B(0.4) distributed random variables $Y_1$, $Y_2$, and $Y_3$. All possible values for X are contained in the state space $\Omega' = \{0, 1, 2, 3\}$. As we will see, some of these $k \in \Omega'$ can be obtained in different ways.

We start with k = 0. This value can only be obtained when all $Y_i$ are 0, for i = 1, 2, 3. So, there is only one possibility.

Next we consider k = 1. A sum of X = 1 can be the result of one $Y_i = 1$ while the remaining two $Y_j$ are 0. We have three possibilities for $Y_i = 1$ since it could be either the first, the second, or the third of the Bernoulli random variables. Then we place the first 0. For this, we have two possibilities since we have two $Y_i$ left that are not equal to 1. Next, we place the second 0, which we have to assign to the remaining $Y_i$. As an intermediate result, we have 3 · 2 · 1 = 6 possibilities. However, we do not need to differentiate between the two 0 values because it does not matter which of the zeros is assigned first and which second. So, we divide the total number of options by the number of possibilities to place the 0 values (i.e., 2). The resulting number of possible ways to end up with X = 1 is

$$\frac{3 \cdot 2 \cdot 1}{2} = \frac{3!}{2! \cdot 1!} = 3$$

For reasons we will make clear later, we introduced the middle term in the above equation.

Figure B.1 Three Different Ways to Obtain a Total of $X = \sum_{i=1}^{3} Y_i = 1$
Note: The alternatives matched by the = symbol lead to the same outcome, respectively.

Let us illustrate this graphically. In Figure B.1, a black ball represents a value $Y_i = 1$ at the i-th drawing while the white numbered circles represent a value of $Y_i = 0$ at the respective i-th drawing with i matching the number in the circle.

Now let k = 2. To yield the sum X = 2, we need two $Y_i = 1$ and one $Y_i = 0$. So, we have three different positions to place the 0, while the remaining two $Y_i$ have to be equal to 1 automatically. Analogous to the prior case, X = 1, we do not need to differentiate between the two 1 values, once the 0 is positioned.

Finally, let k = 3. This is accomplished by all three $Y_i = 1$. So, there is only one possibility to obtain X = 3.

We summarize these results in Table B.1.
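
The counting argument for n = 3 can be confirmed by listing all 2^3 sequences of zeros and ones and grouping them by their sum. The following is a minimal sketch; the function name is illustrative.

```python
from collections import Counter
from itertools import product

def ways_to_reach_each_sum(n):
    """Number of 0/1 sequences of length n that add up to k, for k = 0..n."""
    counts = Counter(sum(outcome) for outcome in product((0, 1), repeat=n))
    return [counts[k] for k in range(n + 1)]

ways_n3 = ways_to_reach_each_sum(3)
print(ways_n3)    # entries correspond to k = 0, 1, 2, 3
```

Because the enumeration treats the three positions as distinguishable but the zeros (and ones) among themselves as interchangeable, it counts exactly the nonredundant alternatives described above.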


Special Case n = 4 

We extend the prior case to the case where the random variable X is the sum of four Bernoulli distributed random variables (that is, $Y_i \sim B(p)$, i = 1, 2, 3, 4) assuming either value 0 or 1 for each. The resulting sum X is then binomial distributed B(4, p), assuming values k in the state space $\Omega' = \{0, 1, 2, 3, 4\}$. Again, we will analyze how the individual values of the sum X can be obtained.

To begin, let us consider the case k = 0. As in the prior case n = 3, we have only one possibility (i.e., all four $Y_i$ equal to 0, that is, $Y_1 = Y_2 = Y_3 = Y_4 = 0$). This can be seen from the following. Technically, we have four positions to place the first 0. Then, we have three choices to place the second 0. For the third 0, we have two positions available, and one for the last 0. In total, we have

$$4 \times 3 \times 2 \times 1 = 24$$


Table B.1 Different Choices to Obtain X = k when n = 3

k     Number of choices
0     $1 = \frac{3!}{0! \times 3!} = \binom{3}{0}$
1     $3 = \frac{3!}{1! \times 2!} = \binom{3}{1}$
2     $3 = \frac{3!}{2! \times 1!} = \binom{3}{2}$
3     $1 = \frac{3!}{3! \times 0!} = \binom{3}{3}$

Figure B.2 Four Different Ways to Obtain $Y_1 = Y_2 = 1$


But due to the fact that we do not care about the order of the 0 values, we divide by the total number of options (i.e., 24) and then obtain

$$\frac{4 \times 3 \times 2 \times 1}{4 \times 3 \times 2 \times 1} = \frac{4!}{4!} = 1$$

Next, we derive a sum of k = 1. This can be obtained in four different ways. The reasoning is similar to that in the case k = 1 for n = 3. We have four positions to place the 1. Once the 1 is placed, the remaining $Y_i$ have to be automatically equal to 0. Again, the order of placing the 0 values is irrelevant, which eliminates the redundant options through division of the total number by 3 × 2 × 1 = 6. Technically, we have

$$\frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = \frac{4!}{3!} = 4$$

For a sum X equal to k = 2, we have four different positions to place the first 1. Then, we have three positions left to place the second 1. This yields 4 × 3 = 12 different options. However, we do not care which one of the 1 values is placed first since, again, their order is irrelevant. So, we divide the total number by 2 to indicate that the order of the two 1 values is unimportant. Next, we place the first 0, which offers us two possible positions among the remaining $Y_i$ that are not equal to 1 already. In total, we then have

$$\frac{4 \times 3 \times 2 \times 1}{2 \times 1} = \frac{4!}{2!} = 12$$

possibilities. Then, the second 0 is placed on the remaining $Y_i$. So, there is only one choice for this 0. Because we do not care about the order of placement of the 0 values, we divide by 2. The resulting number of different ways to yield a sum X of k = 2 is

$$\frac{4 \times 3 \times 2 \times 1}{2 \times 1 \times 2 \times 1} = \frac{4!}{2! \times 2!} = 6$$

which is illustrated in Figures B.2 through B.7.

A sum of X equal to k = 3 is achieved by three 1 values and one 0 value. So, since the order of the 1 values is irrelevant due to the previous reasoning, we only care about where to place the 0 value. We have four possibilities, that is,

$$\frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = \frac{4!}{3!} = 4$$

Figure B.3 Four Different Ways to Obtain $Y_1 = Y_3 = 1$

Figure B.4 Four Different Ways to Obtain $Y_1 = Y_4 = 1$

Figure B.5 Four Different Ways to Obtain $Y_2 = Y_3 = 1$

Figure B.6 Four Different Ways to Obtain $Y_2 = Y_4 = 1$

Figure B.7 Four Different Ways to Obtain $Y_3 = Y_4 = 1$
Table B.2 Different Choices to Obtain X = k when n = 4

k     Number of choices
0     $\frac{4!}{0! \times 4!} = \binom{4}{0} = 1$
1     $\frac{4!}{1! \times 3!} = \binom{4}{1} = 4$
2     $\frac{4!}{2! \times 2!} = \binom{4}{2} = 6$
3     $\frac{4!}{3! \times 1!} = \binom{4}{3} = 4$
4     $\frac{4!}{4! \times 0!} = \binom{4}{4} = 1$


Finally, to obtain k = 4, we only have one possible way to do so, as in the case where k = 0. Mathematically, this is

$$\frac{4 \times 3 \times 2 \times 1}{4 \times 3 \times 2 \times 1} = \frac{4!}{4!} = 1$$

We summarize the results in Table B.2.

General Case 

Now we generalize for any $n \in N$ (i.e., some natural number). The binomial random variable X is hence the B(n, p) distributed sum of n independent and identically distributed random variables $Y_i$.

From the two special cases (i.e., n = 3 and n = 4), it seems that to obtain the number of choices for some $0 \le k \le n$, we have n! in the numerator to account for all the possibilities to assign the individual n values to the $Y_i$, no matter how many 1 values and 0 values we have. In the denominator, we correct for the fact that the order of the 1 values and 0 values is irrelevant. That is, we divide by the number of different orders to place the 1 values on the $Y_i$ that are equal to 1, and also by the number of different orders to assign the 0 values to the $Y_i$ being equal to 0. Therefore, we have $n!$ in the numerator and $k! \times (n-k)!$ in the denominator. The result is illustrated in Table B.3.
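
With the general coefficient in hand, individual entries of the binomial tables in Appendix A can be recomputed. The sketch below is illustrative only (the function name `binomial_pmf` is an assumption); it evaluates the row k = 2 of the table for n = 5.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): coefficient times p^k (1 - p)^(n - k)."""
    coefficient = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return coefficient * p**k * (1 - p) ** (n - k)

# Row k = 2 of the n = 5 table, for the tabulated values of p.
row_k2 = [binomial_pmf(2, 5, p) for p in (0.1, 0.2, 0.5, 0.8, 0.9)]
print(row_k2)
```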

MULTINOMIAL COEFFICIENT

The multinomial coefficient is defined as

$$\binom{n}{n_1 \; n_2 \; \ldots \; n_k} = \frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}$$

for $n_1 + n_2 + \ldots + n_k = n$. Sometimes, the multinomial coefficient is referred to as the polynomial coefficient.

Assume we have some population of balls with k different colors. Suppose n times we draw some ball and return it to the population such that for each trial (i.e., drawing), we have the identical conditions. Hence, the individual trials are independent of each other. Let $Y_i$ denote the color obtained in the i-th trial for i = 1, 2, ..., n.

How many different possible samples of length n are there? Let us think of the drawings in a different way. That is, we draw one ball after another disregarding color and assign the drawn ball to the trials $Y_1$ through $Y_n$ in an arbitrary fashion.


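Table B.3 Different Choices to Obtain X = k for General n

k         Number of choices
0         $1 = \frac{n!}{0! \times n!} = \binom{n}{0}$
1         $n = \frac{n!}{1! \times (n-1)!} = \binom{n}{1}$
2         $\frac{n \times (n-1)}{2} = \frac{n!}{2! \times (n-2)!} = \binom{n}{2}$
...
n − 1     $n = \frac{n!}{(n-1)! \times 1!} = \binom{n}{n-1}$
n         $1 = \frac{n!}{n! \times 0!} = \binom{n}{n}$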
First, we draw a ball with any of the k colors and assign it to one of the n trials, $Y_i$. Next, we draw the second ball and assign it to one of the remaining n − 1 possible trials i as outcome of $Y_i$. This yields

$$n \times (n-1)$$

different possibilities. The third ball drawn is assigned to the n − 2 trials left so that we have

$$n \times (n-1) \times (n-2)$$

possibilities, in total. This is continued until we draw the nth (i.e., the last) ball, which can only be placed in the last remaining trial $Y_i$. In total this yields

$$n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1 = n!$$

different possibilities of drawing n balls.

The second question is how many different possibilities are there to obtain a sample with the number of occurrences $n_1, n_2, \ldots, n_k$ of the respective colors. Let red be one of these colors and suppose we have a sample with a total of $n_r = 3$ red balls from trials 2, 4, and 7 so that $Y_2 = Y_4 = Y_7$ = red. The assignment of red to these three trials yields

$$3! = 3 \times 2 \times 1 = 6$$

different orders of assignment. Now, we are indifferent with respect to which of $Y_2$, $Y_4$, and $Y_7$ was assigned red first, second, and third. Thus, we divide the total number n! of different samples by $n_r! = 3!$ to obtain only nonredundant results with respect to a red ball. We proceed in the same fashion for the remaining colors and, finally, obtain for the total number of nonredundant samples containing $n_1$ of color 1, $n_2$ of color 2, ..., and $n_k$ of color k

$$\binom{n}{n_1 \; n_2 \; \ldots \; n_k} = \frac{n!}{n_1! \times n_2! \times \ldots \times n_k!}$$

which is exactly the multinomial coefficient equation given above.
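
The division by the factorials of the color counts can be verified on a small example by listing all orderings and discarding duplicates. This is a sketch with a hypothetical sample of six balls (three red, two green, one blue); the names are illustrative.

```python
import math
from itertools import permutations

def multinomial_coefficient(counts):
    """n! / (n_1! * ... * n_k!) with n = sum(counts)."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)
    return coefficient

sample = "RRRGGB"                                 # hypothetical colors of six draws
distinct_orderings = len(set(permutations(sample)))
by_formula = multinomial_coefficient([3, 2, 1])

print(distinct_orderings, by_formula)
```

Of the 6! = 720 orderings, those that differ only by swapping balls of the same color coincide, which is what the set removes and what the denominator accounts for.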


KEY POINTS 

• A discrete law or probability distribution is 
related to some discrete random variable, that 
is, a random variable that can assume values 
from a countable set of values. Typical exam¬ 
ples include counts (i.e., the number of items 
meeting certain requirements) and number of 
hits. 

• The most important discrete random vari¬ 
ables used in finance and their probability 
distribution are the Bernoulli, binomial, hy¬ 
pergeometric, multinomial, Poisson, and dis¬ 
crete uniform. 

• The Bernoulli distribution might be the most 
famous discrete law. It is applied when a ran¬ 
dom variable can only assume one of two 
values—0 or 1. A simple example would be 
the toss of a coin. In financial models, it is applied if it is of interest whether a certain event has occurred (1) or not (0).

• The binomial distribution is the extension of 
the Bernoulli distribution in the sense that it 
represents repeated trials where the respective outcomes are either 0 or 1, so that in total
we can obtain any integer between 0 and n, 
where n is the number of Bernoulli trials. A 
typical example in finance would be given 
by the binomial stock price model where it 
is the objective to count the number of up- 
movements of some stock over a given num¬ 
ber of periods. 

• Drawing with replacement refers to the 
experiment of repeated trials where each 
individual trial is conducted under identi¬ 
cal conditions as the others and without 
influencing each other. A prerequisite of 
the binomial distribution is drawing with 
replacement. 

• The Poisson distribution is related to a dis¬ 
crete random variable that can assume any 
nonnegative integer value. A typical appli¬ 
cation is in risk theory when the number of 
defaults or occurrences of some undesirable 
event has to be modeled. 
NOTE

1. Note that the successive prices $S_1, \ldots, S_t$ depend on their respective predecessors. They are said to be path-dependent. Only the changes, or factors $Y_{t+1}$, for each period are independent. In this case, the price $S_{t+1}$ depends only on $S_t$, however, and not the entire past. This is referred to as the Markov property.
Abstract: Continuous probability distributions are needed when the random variable of interest
can assume any value inside of one or more intervals of real numbers such as, for example, 
any number greater than zero. Asset returns, for example, whether measured monthly, weekly, 
daily, or at an even higher frequency are commonly modeled as continuous random variables. 
In contrast to discrete probability distributions that assign positive probability to certain discrete 
values, continuous probability distributions assign zero probability to any single real number. 
Instead, only entire intervals of real numbers can have positive probability such as, for example, 
the event that some asset return is not negative. For each continuous probability distribution, this 
necessitates the so-called probability density, a function that determines how the entire probability 
mass of one is distributed. The density often serves as the proxy for the respective probability 
distribution. 


In this entry, we introduce the concept of continuous probability distributions. We present the continuous distribution function with its corresponding density function, a function unique to continuous probability laws. In this entry, parameters of location and scale such as the mean and higher moments (variance and skewness) are defined. For a more technical discussion of continuous distributions, see Evans, Hastings, and Peacock (2000) or Johnson, Kotz, and Balakrishnan (1995).


CONTINUOUS PROBABILITY 
DISTRIBUTION DESCRIBED 

Suppose we are interested in outcomes that are 
no longer countable. Examples of such out¬ 
comes in finance are daily logarithmic stock 
returns, bond yields, and exchange rates. Tech¬ 
nically, without limitations caused by rounding 
to a certain number of digits, we could imag¬ 
ine that any real number could provide a fea¬ 
sible outcome for the daily logarithmic return 
of some stock. That is, the set of feasible values 
that the outcomes are drawn from (i.e., the space $\Omega$) is uncountable. The events are described by continuous intervals such as, for example, (−0.05, 0.05], which, referring to our example with daily logarithmic returns, would represent the event that the return at a given observation is more than −5% and at most 5%.

In the context of continuous probability distributions, we have the real numbers R as the uncountable space $\Omega$. The set of events is given by the Borel $\sigma$-algebra B, which is based on the half-open intervals of the form $(-\infty, a]$, for any real a. The space R and the $\sigma$-algebra B form the measurable space (R, B), which we are to deal with throughout this entry.


DISTRIBUTION FUNCTION 

To be able to assign a probability to an event in a unique way, in the context of continuous
distributions we introduce as a device the continuous distribution function F(a), which
expresses the probability that some event of the sort (−∞, a] occurs (i.e., that a number is
realized that is at most a). (Formally, an outcome ω ∈ Ω is realized that lies inside of the
interval (−∞, a].) As with discrete random variables, this function is also referred to as
the cumulative distribution function (cdf) since it aggregates the probability up to a
certain value.

To relate to our previous example of daily 
logarithmic returns, the distribution function 
evaluated at say 0.05, that is, F(0.05), states 
the probability of some return of at most 5%. 
(The distribution function F is also referred to 
as the cumulative probability distribution function 
(often abbreviated cdf) expressing that the 
probability is given for the accumulation of all 
outcomes less than or equal to a certain value.) 

For values x approaching −∞, F tends to zero, while for values x approaching ∞, F goes to 1.
In between, F is monotonically increasing and right-continuous. More concisely, we list
these properties below:

Property 1. $F(x) \to 0$ as $x \to -\infty$
Property 2. $F(x) \to 1$ as $x \to \infty$
Property 3. $F(b) - F(a) \ge 0$ for $b \ge a$
Property 4. $\lim_{x \downarrow a} F(x) = F(a)$

The behavior in the extremes, that is, when x goes to either −∞ or ∞, is provided by
properties 1 and 2, respectively. Property 3 states that F should be monotonically
increasing (i.e., never become less for increasing values). Finally, property 4 guarantees
that F is right-continuous.

Let us consider in detail the case when F(x) is 
a continuous distribution, that is, the distribu¬ 
tion has no jumps. The continuous probability 
distribution function F is associated with the 
probability measure P through the relationship 

$$F(a) = P((-\infty, a])$$

that is, that values up to a occur, and

$$F(b) - F(a) = P((a, b]) \qquad (1)$$

Therefore, from equation (1) we can see that 
the probability of some event related to an inter¬ 
val is given by the difference between the value 
of F at the upper bound b of the interval minus 
the value of F at the lower bound a. That is, the 
entire probability that an outcome of at most 
a occurs is subtracted from the greater event 
that an outcome of at most b occurs. Using set 
operations, we can express this as 

$$(a, b] = (-\infty, b] \setminus (-\infty, a]$$

For example, as we have seen, the event of a daily return of more than −5% and, at most, 5%
is given by (−0.05, 0.05]. So, the probability associated with this event is given by
P((−0.05, 0.05]) = F(0.05) − F(−0.05).
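
The interval rule can be sketched in a few lines of Python. The logistic distribution
function and its scale of 0.01 are assumptions made purely for illustration (the text does
not specify F); any function satisfying properties 1 through 4 could be substituted.

```python
import math

def logistic_cdf(x, loc=0.0, scale=0.01):
    """Distribution function F of a logistic law (hypothetical choice of F)."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

def interval_probability(cdf, a, b):
    """P((a, b]) = F(b) - F(a), as in equation (1)."""
    if b < a:
        raise ValueError("need a <= b")
    return cdf(b) - cdf(a)

# Probability of a daily log-return of more than -5% and at most 5%.
p = interval_probability(logistic_cdf, -0.05, 0.05)
print(f"P(-5% < return <= 5%) = {p:.4f}")
```

Only the two values F(0.05) and F(−0.05) are needed; the shape of F in between plays no
role for this event.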

In contrast to a discrete probability distribution, a continuous probability distribution
always assigns zero probability to countable events such as individual outcomes a_i, or
unions thereof such as

$$\bigcup_{i=1}^{\infty} \{a_i\}$$

That is,

$$P(\{a_i\}) = 0 \ \text{for all } a_i, \qquad P\Big(\bigcup_{i=1}^{\infty} \{a_i\}\Big) = 0 \qquad (2)$$

From equation (2), we can apply the left-hand side of equation (1) also to events of the
form (a, b) to obtain

$$P((a, b)) = F(b) - F(a) \qquad (3)$$

Thus, it is irrelevant whether we state the probability of the daily logarithmic return
being more than −5% and at most 5%, or the probability of the logarithmic return being more
than −5% and less than 5%. They are the same because the probability of achieving a return
of exactly 5% is zero. With a space Ω consisting of uncountably many possible values such as
the set of real numbers, for example, each individual outcome has zero probability of
occurring. So, from a probabilistic point of view, one should never bet on an exact return
or, associated with it, one particular stock price.

Since countable sets produce zero probabil¬ 
ity from a continuous probability measure, they 
belong to the so-called P-null sets. All events as¬ 
sociated with P-null sets are unlikely events. 

So, how do we assign probabilities to events 
in a continuous environment? The answer is 
given by equation (3). That, however, presumes 
knowledge of the distribution function F. The 
next task is to define the continuous distribution 
function F more specifically as explained next. 


DENSITY FUNCTION 

The continuous distribution function F of a probability measure P on (ℝ, ℬ) is defined as
follows

$$F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (4)$$

where f(t) is the density function of the probability measure P.

We interpret equation (4) as follows. Since, at any real value x, the distribution function
uniquely equals the probability that an outcome of at most x is realized, that is,
F(x) = P((−∞, x]), equation (4) states that this probability is obtained by integrating some
function f over the interval from −∞ up to the value x.

What is the interpretation of this function f? The function f is the marginal rate of growth
of the distribution function F at some point x. We know that with continuous distribution
functions, the probability of exactly a value of x occurring is zero. However, the
probability of observing a value inside of the interval between x and some very small step
to the right Δx (i.e., [x, x + Δx)) is not necessarily zero. Between x and x + Δx, the
distribution function F increases by exactly this probability; that is, the increment is

$$F(x + \Delta x) - F(x) = P(X \in [x, x + \Delta x)) \qquad (5)$$


Now, if we divide F(x + Δx) − F(x) from equation (5) by the width of the interval Δx, we
obtain the average probability or average increment of F per unit step on this interval. If
we reduce the step size Δx to an infinitesimally small step ∂x, this average approaches the
marginal rate of growth of F at x, which we denote f; that is (see Note 1),

$$\lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x} = \frac{\partial F(x)}{\partial x} = f(x) \qquad (6)$$
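
The limit in equation (6) can be observed numerically. The sketch below assumes an
exponential law with rate 2, chosen only because both its distribution function and its
density have simple closed forms; the names `LAM`, `cdf`, `pdf`, and `difference_quotient`
are illustrative and not part of the text.

```python
import math

LAM = 2.0  # rate parameter, picked arbitrarily for the illustration

def cdf(x):
    """F(x) = 1 - exp(-LAM * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-LAM * x) if x >= 0.0 else 0.0

def pdf(x):
    """f(x) = LAM * exp(-LAM * x) for x >= 0, and 0 otherwise."""
    return LAM * math.exp(-LAM * x) if x >= 0.0 else 0.0

def difference_quotient(x, dx):
    """Average increment of F per unit step on [x, x + dx)."""
    return (cdf(x + dx) - cdf(x)) / dx

x = 0.5
for dx in (0.1, 0.01, 0.001):
    # The quotient moves toward the density value as dx shrinks.
    print(dx, difference_quotient(x, dx), pdf(x))
```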


At this point, let us recall the histogram with relative frequency density for class data.
Over each class, the height of the histogram is given by the relative frequency of the class
divided by the width of the corresponding class. Equation (6) is somewhat similar if we
think of it this way. We divide the probability that some realization should be inside of
the small interval by the width of that interval. And, by letting the interval shrink to
width zero, we obtain the marginal rate of growth or, equivalently, the derivative of F. (We
assume that F is continuous and that the derivative of F exists.) Hence, we call f the
probability density function or simply the density function. Commonly, it is abbreviated as
pdf.

Now, when we refocus on equation (4), we see that the probability of some occurrence of at
most x is given by integration of the density function f over the interval (−∞, x]. Again,
there is an analogy to the histogram. The relative frequency of some class is given by the
density multiplied by the corresponding class width. With continuous probability
distributions, at each value t, we multiply the corresponding density f(t) by the
infinitesimally small interval width dt. Finally, we integrate all values of f (weighted by
dt) up to x to obtain the probability for (−∞, x]. This, again, is similar to histograms: In
order to obtain the cumulative relative frequency at some value x, we compute the area
covered by the histogram up to value x.
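
The histogram analogy translates directly into a Riemann sum: slice the axis into narrow
classes, multiply each density value by the class width, and add up. The following sketch
assumes an exponential density with rate 2 (an illustrative choice); because that density
is zero left of the origin, the lower limit 0 stands in for −∞.

```python
import math

def exp_pdf(t, lam=2.0):
    """Density of an exponential law; zero left of the origin."""
    return lam * math.exp(-lam * t) if t >= 0.0 else 0.0

def cdf_by_riemann_sum(pdf, x, lower, n=100_000):
    """Approximate F(x) by summing f(t) * dt over the midpoints of n slices of [lower, x]."""
    if x <= lower:
        return 0.0
    dt = (x - lower) / n
    return sum(pdf(lower + (i + 0.5) * dt) for i in range(n)) * dt

x = 1.0
approx = cdf_by_riemann_sum(exp_pdf, x, lower=0.0)
exact = 1.0 - math.exp(-2.0 * x)   # closed-form F(x) of the assumed law
print(approx, exact)
```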

In Figure 1, we compare the histogram and the probability density function. The histogram
with density h is indicated by the dotted lines while the density function f is given by the
solid line. We can now see how the probability P((−∞, x*]) is derived through integrating
the marginal rate f over the interval (−∞, x*] with respect to the values t. The resulting
total probability is then given by the area A₁ of the example in Figure 1. This is analogous
to class data where we would tally the areas of the rectangles whose upper bounds are less
than x* and the part of the area of the rectangle containing x* up to the dash-dotted
vertical line.

Requirements on the 
Density Function 

Given the uncountable space ℝ (i.e., the real numbers) and the corresponding set of events
given by the Borel σ-algebra ℬ, we can give a more rigorous formal definition of the density
function. The density function f of probability measure P on the measurable space (ℝ, ℬ)
with distribution function F is a Borel-measurable function f satisfying

$$P((-\infty, x]) = F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (7)$$

with f(t) ≥ 0 for all t ∈ ℝ and

$$\int_{-\infty}^{\infty} f(t)\,dt = 1$$

Figure 1 Comparison of Histogram and Density Function
Note: Area A₁ represents probability P((−∞, x*]) derived through integration of f(t) with
respect to t between −∞ and x*

By the requirement of Borel-measurability, we simply assume that the real-valued images
generated by f have their origins in the Borel σ-algebra ℬ. Informally, for any value
y = f(t), we can trace the corresponding origin(s) t in ℬ that is (are) mapped to y through
the function f. Otherwise, we might incur problems computing the integral in equation (7)
for reasons that are beyond the scope of this entry.

From the definition of the density function given by equation (7), we see that it is
reasonable that f be a function that exclusively assumes nonnegative values. Although we
have not mentioned this so far, it is immediately intuitive since f is the marginal rate of
growth of the continuous distribution function F. At each t, f(t) · dt represents the limit
probability that a value inside of the interval (t, t + dt] should occur, which can never be
negative. Moreover, we require the integration of f over the entire domain from −∞ to ∞ to
yield 1, which is intuitively reasonable since this integral gives the probability that any
real value occurs.

The requirement

$$\int_{-\infty}^{\infty} f(t)\,dt = 1$$

implies the graphical interpretation that the area enclosed between the graph of f over the
entire interval (−∞, ∞) and the horizontal axis equals one. This is displayed in Figure 2 by
the shaded area A. For example, to visualize graphically what is meant by

$$\int_{-\infty}^{x} f(t)\,dt$$

in equation (7), we can use Figure 1. Suppose the value x were located at the intersection
of the vertical dash-dotted line and the horizontal axis (i.e., x*). Then, the shaded area
A₁ represents the value of the integral and, therefore, the probability of occurrence of a
value of at most x. To interpret

$$\int_{a}^{b} f(t)\,dt$$

graphically, look at Figure 3. The area representing the value of the integral is indicated
by A. So, the probability of some occurrence of at least a and at most b is given by A. Here
again, the resemblance to the histogram becomes obvious in that we divide one area above
some class, for example, by the total area, and this ratio equals the corresponding relative
frequency.

Figure 2 Graphical Interpretation of the Equality $A = \int_{-\infty}^{\infty} f(x)\,dx = 1$

Figure 3 Graphical Interpretation of $A = \int_{a}^{b} f(x)\,dx$

For the sake of completeness, it should be mentioned, without indulging in the reasoning
behind it, that there are probability measures P on (ℝ, ℬ), even with continuous
distribution functions, that do not have density functions as defined in equation (7). But,
in our context, we will only regard probability measures with continuous distribution
functions with associated density functions so that the equalities of equation (7) are
fulfilled.

Sometimes, alternative representations equivalent to equation (7) are used. Typically, the
following expressions are used

$$F(x) = \int_{\mathbb{R}} f(t) \cdot 1_{(-\infty, x]}(t)\,dt \qquad (8a)$$

$$F(x) = \int_{-\infty}^{\infty} f(t) \cdot 1_{(-\infty, x]}(t)\,dt \qquad (8b)$$

$$F(x) = \int_{-\infty}^{x} P(dt) \qquad (8c)$$

$$F(x) = \int_{-\infty}^{x} dP(t) \qquad (8d)$$

Note that in the first two equalities, (8a) and (8b), the indicator function 1_{(a,b]} is
used. The last two equalities, (8c) and (8d), can be used even if there is no density
function and, therefore, are of a more general form. We will, however, predominantly apply
the representation given by equation (7) and occasionally resort to the last two forms
above.

We introduce the term support at this point 
to refer to the part of the real line where the 
density is truly positive, that is, all those x where 
f(x) > 0. 

CONTINUOUS RANDOM 
VARIABLE 

So far, we have only considered continuous probability distributions and densities. We have
yet to introduce the quantity of greatest interest to us in this entry, the continuous
random variable. For example, stock returns, bond yields, and exchange rates are usually
modeled as continuous random variables.

Informally stated, a continuous random vari¬ 
able assumes certain values governed by a 
probability law uniquely linked to a contin¬ 
uous distribution function F. Consequently, 
it has a density function associated with its 
distribution. Often, the random variable is 
merely described by its density function rather 
than the probability law or the distribution 
function. 

By convention, let us indicate the random variables by capital letters. Recall that any
random variable, and in particular a continuous random variable X, is a measurable function.
Let us assume that X is a function from the probability space Ω = ℝ into the state space
Ω' = ℝ. That is, origin and image space coincide. The corresponding σ-algebrae containing
events of the elementary outcomes ω and the events in the image space X(ω), respectively,
are both given by the Borel σ-algebra ℬ. Now, we can be more specific by requiring the
continuous random variable X to be a ℬ-ℬ-measurable real-valued function. That implies, for
example, that any event X ∈ (a, b], which is in ℬ, has its origin X⁻¹((a, b]) in ℬ, as well.
Measurability is important when we want to derive the probability of events in the state
space such as X ∈ (a, b] from original events in the probability space such as X⁻¹((a, b]).
At this point, one should not be concerned that the theory is somewhat overwhelming. It will
become easier to understand once we move to the examples.


COMPUTING PROBABILITIES 
FROM THE DENSITY 
FUNCTION 

The relationship between the continuous random variable X and its density is given by the
following (see Note 2). Suppose X has density f; then the probability of some event X ≤ x or
X ∈ (a, b] is computed as

$$P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

$$P(X \in (a, b]) = \int_{a}^{b} f(t)\,dt \qquad (9)$$

which is equivalent to F(x) and F(b) − F(a), respectively, because of the one-to-one
relationship between the density f and the distribution function F of X.

As explained earlier, using indicator functions, equation (9) could be alternatively
written as

$$P(X \le x) = \int_{-\infty}^{\infty} 1_{(-\infty, x]}(t)\, f(t)\,dt$$

$$P(X \in (a, b]) = \int_{-\infty}^{\infty} 1_{(a, b]}(t)\, f(t)\,dt$$
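
Both ways of writing the interval probability can be compared numerically. The sketch below
assumes a triangular density on [0, 2] with its peak at 1, a hypothetical example picked
because its probabilities are easy to check by hand; the helper names are likewise
illustrative.

```python
def tri_pdf(t):
    """Triangular density on [0, 2] with peak at 1 (illustrative assumption)."""
    if 0.0 <= t <= 1.0:
        return t
    if 1.0 < t <= 2.0:
        return 2.0 - t
    return 0.0

def indicator(a, b):
    """Indicator function of the half-open interval (a, b]."""
    return lambda t: 1.0 if a < t <= b else 0.0

def integrate(g, lo, hi, n=200_000):
    """Midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.5, 1.5
# Direct form: integrate the density from a to b.
direct = integrate(tri_pdf, a, b)
# Indicator form: integrate over a range covering the whole support,
# with the indicator switching the density off outside (a, b].
one = indicator(a, b)
via_indicator = integrate(lambda t: one(t) * tri_pdf(t), -1.0, 3.0)
print(direct, via_indicator)
```

For this density, the two triangles cut off below 0.5 and above 1.5 each have area 0.125,
so both computations should come out close to 0.75.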

In the following, we will introduce parame¬ 
ters of location and spread such as the mean 
and the variance, for example. In contrast to the 
data-dependent statistics, parameters of ran¬ 
dom variables never change. Some probability 
distributions can be sufficiently described by 
their parameters. They are referred to as para¬ 
metric distributions. For example, for the normal 
distribution we introduce shortly, it is sufficient 
to know the parameters mean and variance to 
completely determine the corresponding distri¬ 
bution function. That is, the shape of parametric 
distributions is governed only by the respective 
parameters. 


LOCATION PARAMETERS 

The most important location parameter is the 
mean that is also referred to as the first moment. 
It is the only location parameter presented in 
this entry. 

The mean can be thought of as an average value. It is the number that one would have to
expect for some random variable X with given density function f. The mean is defined as
follows: Let X be a real-valued random variable on the space Ω = ℝ with Borel σ-algebra ℬ.
The mean is given by

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \qquad (10)$$

in case the integral on the right-hand side of equation (10) exists (i.e., is finite).
Typically, the mean parameter is denoted as μ.

In equation (10) that defines the mean, we weight each possible value x that the random
variable X might assume by the product of the density at this value, f(x), and step size dx.
Recall that the product f(x) · dx can be thought of as the limiting probability of attaining
the value x. Finally, the mean is given as the integral over these weighted values. Thus,
equation (10) is understood similarly to the definition of the mean of a discrete random
variable where, instead of being integrated, the probability-weighted values are summed.
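
The weighting described above can be mimicked with a finite sum. As a sketch, assume an
exponential density with rate 0.5, whose mean is known to be 1/0.5 = 2; the integration
range is cut off at 100, where the remaining probability mass is negligible. Both the
density and the cutoff are assumptions of this example.

```python
import math

def exp_pdf(x, lam=0.5):
    """Exponential density with rate lam; zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

def mean_by_integration(pdf, lo, hi, n=200_000):
    """Approximate E(X) by summing x * f(x) * dx over midpoints of [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += x * pdf(x) * dx   # value weighted by its limiting probability
    return total

mu = mean_by_integration(exp_pdf, 0.0, 100.0)
print(mu)
```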

DISPERSION PARAMETERS 

We turn our focus toward measures of spread 
or, in other words, dispersion measures. Again, 
as with the previously introduced measures 
of location, in probability theory the disper¬ 
sion measures are universally given parame¬ 
ters. Here, we introduce the moments of higher 
order, variance, standard deviation, and the 
skewness parameters. 

Moments of Higher Order 

It might sometimes be necessary to compute moments of higher order. As we already know from
descriptive statistics, the mean is the moment of order one. (Alternatively, we often say
the first moment. For the higher orders k, we consequently might refer to the k-th moment.)
However, one might not be interested in the expected value of some quantity itself but of
its square. If we treat this quantity as a continuous random variable, we compute what is
the second moment.

Let X be a real-valued random variable on the space Ω = ℝ with Borel σ-algebra ℬ. The moment
of order k is given by the expression

$$E(X^k) = \int_{-\infty}^{\infty} x^k \cdot f(x)\,dx \qquad (11)$$

in case the integral on the right-hand side of equation (11) exists (i.e., is finite).

From equation (11), we learn that higher-order moments are equivalent to simply computing
the mean of X taken to the k-th power.

Variance 

The variance involves computing the expected squared deviation from the mean E(X) = μ of
some random variable X. For a continuous random variable X, the variance is defined as
follows: Let X be a real-valued random variable on the space Ω = ℝ with Borel σ-algebra ℬ,
then the variance is

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E(X))^2 \cdot f(x)\,dx = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \qquad (12)$$

in case the integral on the right-hand side of equation (12) exists (i.e., is finite).
Often, the variance in equation (12) is denoted by the symbol σ².

In equation (12), at each value x, we square the deviation from the mean and weight it by
the density at x times the step size dx. The latter product, again, can be viewed as the
limiting probability of the random variable X assuming the value x. The square inflates
large deviations even more compared to smaller ones. For some random variable to have a
small variance, it is essential to have a quickly vanishing density in the parts where the
deviations (x − μ) become large.
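
Moments and the variance can be evaluated with one generic integration routine. The sketch
assumes the uniform density on [0, 1] (not discussed in the text, used here only because
its moments are simple fractions); `expectation`, `moment`, and `variance` are illustrative
names.

```python
def uniform_pdf(x):
    """Density of the uniform law on [0, 1] (illustrative assumption)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def expectation(g, pdf, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g(x) * f(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += g(x) * pdf(x)
    return total * dx

def moment(pdf, k, lo, hi):
    """k-th moment E(X^k), following equation (11)."""
    return expectation(lambda x: x ** k, pdf, lo, hi)

def variance(pdf, lo, hi):
    """Var(X) as the expected squared deviation from the mean, following equation (12)."""
    mu = moment(pdf, 1, lo, hi)
    return expectation(lambda x: (x - mu) ** 2, pdf, lo, hi)

m1 = moment(uniform_pdf, 1, 0.0, 1.0)    # theory: 1/2
m2 = moment(uniform_pdf, 2, 0.0, 1.0)    # theory: 1/3
var = variance(uniform_pdf, 0.0, 1.0)    # theory: 1/12
print(m1, m2, var, m2 - m1 ** 2)
```

The last printed value uses the well-known identity Var(X) = E(X²) − (E(X))², which is not
stated in the text but gives a convenient cross-check on the direct computation.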

Figure 4 Two Density Functions Yielding Common Means, μ₁ = μ₂, but Different Variances, σ₁² < σ₂²
Note: Dashed graph: σ₁² = 1. Solid graph: σ₂² = 1.5.

All distributions that we discuss in this entry are parametric distributions. For some of
them, it is enough to know the mean μ and variance σ² and consequently, we will resort to
these two parameters often. Historically, the variance has often been given the role of risk
measure in the context of portfolio theory. Suppose we have two random variables R₁ and R₂
representing the returns of two stocks, S₁ and S₂, with equal means $\mu_{R_1}$ and
$\mu_{R_2}$, respectively, so that $\mu_{R_1} = \mu_{R_2}$. Moreover, let R₁ and R₂ have
variances $\sigma_{R_1}^2$ and $\sigma_{R_2}^2$, respectively, with
$\sigma_{R_1}^2 < \sigma_{R_2}^2$. Then, omitting further theory at this moment, we prefer
S₁ to S₂ because of S₁'s smaller variance. We demonstrate this in Figure 4. The dashed line
represents the graph of the first density function while the second one is depicted by the
solid line. Both density functions yield the same mean (i.e., μ₁ = μ₂). However, the
variance from the first density function, given by the dashed graph, is smaller than that of
the solid graph (i.e., σ₁² < σ₂²). Thus, using variance as the risk measure and resorting to
density functions that can be sufficiently described by the mean and variance, we can state
that the density function for S₁ (dashed graph) is preferable. We can interpret the figure
as follows.


Since the variance of the distribution with the 
dashed density graph is smaller, the probabil¬ 
ity mass is less dispersed over all x values. 
Hence, the density is more condensed about 
the center and more quickly vanishing in the 
extreme left and right ends, the so-called tails. 
On the other hand, the second distribution with 
the solid density graph has a larger variance, 
which can be verified by the overall flatter and 
more expanded density function. About the 
center, it is lower and less compressed than the 
dashed density graph, implying that the second 
distribution assigns less probability to events 
immediately near the center. However, the density function of the second distribution decays
more slowly in the tails than that of the first, which means that under the first
distribution, extreme events are less likely than under the second probability law.
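
The comparison can be made concrete with numbers. The sketch below assumes that both
densities are normal with common mean 0 and with the variances 1 and 1.5 quoted in the note
to Figure 4; normality and the zero mean are assumptions of this example, since the text
does not name the distributions behind the figure.

```python
import math

def normal_cdf(x, mu=0.0, var=1.0):
    """Distribution function of a normal law with mean mu and variance var."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def prob_within(c, mu, var):
    """P(mu - c < X <= mu + c)."""
    return normal_cdf(mu + c, mu, var) - normal_cdf(mu - c, mu, var)

results = {}
for var in (1.0, 1.5):
    center = prob_within(0.5, 0.0, var)        # mass close to the center
    tails = 1.0 - prob_within(3.0, 0.0, var)   # mass more than 3 units away
    results[var] = (center, tails)
    print(f"variance {var}: P(|X| <= 0.5) = {center:.4f}, P(|X| > 3) = {tails:.4f}")
```

Under these assumptions, the smaller variance should show more probability near the center
and less in the tails, which is the pattern described for the dashed graph.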

Standard Deviation 

The parameter related to the variance is the standard deviation. As we know from descriptive
statistics described earlier in this book, the standard deviation is the positive square
root of the variance. That is, let X be a real-valued random variable on the space Ω = ℝ
with Borel σ-algebra ℬ. Furthermore, let its mean and variance be given by μ and σ²,
respectively. The standard deviation is defined as

$$\sigma = \sqrt{\mathrm{Var}(X)}$$

For example, in the context of stock returns, one often uses the standard deviation to
express the return's fluctuation around its mean. The standard deviation is often more
appealing than the variance since the latter uses squares, which are on a different scale
from the original values of X. Even though mathematically not quite correct, the standard
deviation, denoted by σ, is commonly interpreted as the average deviation from the mean.

Skewness 

Consider the density function portrayed in Figure 5. The figure is obviously symmetric about
some location parameter μ in the sense that f(μ − x) = f(μ + x). Suppose instead that we
encounter a density function f of some random variable X that is depicted in Figure 6. This
figure is not symmetric about any location parameter. Consequently, some quantity stating
the extent to which the density function is deviating from symmetry is needed. This is
accomplished by a parameter referred to as skewness. This parameter measures the degree to
which the density function leans to either side, if at all.

Let X be a real-valued random variable on the space Ω = ℝ with Borel σ-algebra ℬ, variance
σ², and mean μ = E(X). The skewness parameter, denoted by γ, is given by

$$\gamma = \frac{E\big((X - E(X))^3\big)}{\sigma^3}$$

The skewness measure given above is referred to as the Pearson skewness measure. Negative
values indicate skewness to the left (i.e., left skewed) while skewness to the right is
given by positive values (i.e., right skewed).

The design of the skewness parameter follows the following reasoning. In the numerator, we
measure the distance from every value x to the mean E(X) of random variable X. To overweight
larger deviations, we take them to a higher power than one. In contrast to the variance
where we use squares, in the case of skewness we take the third power since three is an odd
number and thereby preserves both the signs and directions of the deviations. Due to this
sign preservation, symmetric density functions yield zero skewness since all deviations to
the left of the mean cancel their counterparts to the right. To standardize the deviations,
we scale them by dividing by the standard deviation, also taken to the third power. So, the
skewness parameter is not influenced by the standard deviation of the distributions. If we
did not scale the skewness parameter in this way, distribution functions with density
functions having large variances would always produce larger skewness even though the
density is not really tilted more pronouncedly than some similar density with smaller
variance.

Figure 5 Example of Some Symmetric Density Function f(x)

Figure 6 Example of Some Asymmetric Density Function f(x)

We graphically illustrate the skewness parameter γ in Figure 6 for some density function
f(x). A density function f that assumes positive values f(x) only for positive real values
(i.e., x > 0) but zero for x ≤ 0 is shown in the figure. The random variable X with density
function f has mean μ = 1.65. Its standard deviation is computed as σ = 0.957. The value of
the skewness parameter is γ = 0.7224, indicating a positive skewness. The sign of the
skewness parameter can be easily verified by analyzing the density graph. The density peaks
just a little to the right of the leftmost value x = 0. Toward the left tail, the density
decays abruptly and vanishes at zero. Toward the right tail, things look very different in
that f decays very slowly, approaching a level of f = 0 as x goes to positive infinity.
(The graph is depicted for x ∈ [0, 3.3].)
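
The skewness parameter can be evaluated numerically as the third central moment divided by
the third power of the standard deviation. The sketch below does not reproduce the density
of Figure 6, whose formula is not given; instead it assumes an exponential density with
rate 1 (theoretical skewness 2) as a right-skewed example and a symmetric triangular
density on [0, 2] as a check that symmetry gives zero.

```python
import math

def exp_pdf(x, lam=1.0):
    """Right-skewed example: exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0

def sym_pdf(x):
    """Symmetric example: triangular density on [0, 2] with peak at 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def integrate(g, lo, hi, n=200_000):
    """Midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def skewness(pdf, lo, hi):
    """gamma = E((X - mu)^3) / sigma^3, with all integrals taken over [lo, hi]."""
    mu = integrate(lambda x: x * pdf(x), lo, hi)
    var = integrate(lambda x: (x - mu) ** 2 * pdf(x), lo, hi)
    third = integrate(lambda x: (x - mu) ** 3 * pdf(x), lo, hi)
    return third / var ** 1.5

skew_exp = skewness(exp_pdf, 0.0, 60.0)   # upper limit 60 truncates a negligible tail
skew_sym = skewness(sym_pdf, 0.0, 2.0)
print(skew_exp, skew_sym)
```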

KEY POINTS 

• A continuous random variable is a random variable that does not only assume values from a
set of discrete values but may assume any real value from within one or more intervals.
Often, asset returns are modeled as continuous random variables.

• The continuous distribution function is the probability distribution associated with a
continuous random variable. It differs from the discrete probability distribution in that it
gives positive probability only to entire intervals rather than to individual discrete
values.

• To appreciate continuous random variables, it is necessary to understand the concept of
the derivative of some function, which is the marginal rate of growth of some function at a
certain point. It can be conceived as the slope of the function at some point considering
only very small increments in the argument of the function away from the point.

• The density function determines how the probability mass of one is distributed across the
real line. Hence, it would be counterintuitive if that function were ever negative, so we
require it to be nonnegative. Technically, it is the marginal rate of growth of the
distribution function at any position or, in other words, its derivative.

• As support of some probability distribution, we define the subset of the real numbers that
represents 100% of the probability. For the continuous probability distributions, it is the
collection of intervals where the associated probability density is positive.


NOTES 

1. The expression ∂F(x) is equivalent to the increment F(x + Δx) − F(x) as Δx goes to zero.

2. Sometimes the density of X is explicitly indexed f_X. We will not do so here, however,
except where we believe not doing so will lead to confusion. The same holds for its
distribution function F.

Abstract: To model the behavior of certain financial assets in a stochastic environment, we can
usually resort to a variety of theoretical distributions. Most commonly, probability distributions 
are selected that are analytically well known. For example, the normal distribution is often the 
distribution of choice when asset returns are modeled, or the exponential distribution is applied 
to characterize the randomness of the time between two successive defaults of firms in a bond 
portfolio. Many other distributions are related to them or built on them in a well-known manner. 
These distributions often display pleasant features such as stability under summation—meaning 
that the return of a portfolio of assets whose returns follow a certain distribution again follows the 
same distribution. However, one has to be careful using these distributions since their advantage 
of mathematical tractability is often outweighed by the fact that the stochastic behavior of the true 
asset returns is not well captured by these distributions. 


In this entry, we discuss the more commonly used distributions with appealing statistical
properties that are used in finance. The distributions discussed are the normal
distribution, the chi-square distribution, the Student's t-distribution, the Fisher's
F-distribution, the exponential distribution, the gamma distribution (including the special
Erlang distribution), the beta distribution, and the log-normal distribution. Many of the
distributions enjoy widespread attention in finance, or statistical applications in general,
due to their well-known characteristics or mathematical simplicity. However, as we
emphasize, the use of some of them might be ill-suited to replicate the real-world behavior
of financial returns. For a more technical discussion of continuous distributions, see
Evans, Hastings, and Peacock (2000) or Johnson, Kotz, and Balakrishnan (1995).

Figure 1 Normal Density Function for Various Parameter Values


NORMAL DISTRIBUTION 

The first distribution we discuss is the normal distribution. It is the distribution most
commonly used in finance despite its many limitations. This distribution, also referred to
as the Gaussian distribution (named after the mathematician and physicist C. F. Gauss), is
characterized by the two parameters: mean (μ) and standard deviation (σ). The distribution
is denoted by N(μ, σ²). When μ = 0 and σ² = 1, then we obtain the standard normal
distribution.

For x ∈ ℝ, the density function for the normal distribution is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (1)$$


The density in equation (1) is always positive. Hence, we have support (i.e., positive
density) on the entire real line. Furthermore, the density function is symmetric about μ. A
plot of the density function for several parameter values is given in Figure 1. As can be
seen, the value of μ results in a horizontal shift from 0 while σ inflates or deflates the
graph. A characteristic of the normal distribution is that the densities are bell shaped.
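
The normal density is straightforward to code. The function below is a minimal sketch of
the standard formula for the N(μ, σ²) density; the name `normal_pdf` and the parameter
values in the usage lines are illustrative choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# Symmetry about mu: equal density at equal distances left and right of the mean.
print(normal_pdf(1.0 - 0.7, mu=1.0), normal_pdf(1.0 + 0.7, mu=1.0))
# A small sigma gives a tall, narrow bell; a large sigma a low, wide one.
print(normal_pdf(0.0, sigma=0.5), normal_pdf(0.0, sigma=2.0))
```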

A problem is that the distribution function cannot be solved for analytically and therefore
has to be approximated numerically. In the particular case of the standard normal
distribution, the values are tabulated. Standard statistical software provides the values
for the standard normal distribution as well as most of the distributions presented in this
entry. The standard normal distribution function is commonly denoted by the Greek letter Φ
such that we have Φ(x) = F(x) = P(X ≤ x), for some standard normal random variable X. In
Figure 2, graphs of the distribution function are given for three different sets of
parameters.

Figure 2 Normal Distribution Function for Various Parameter Values
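
In practice, Φ is usually obtained from a library routine rather than from a table. As one
possible sketch, Python's standard library offers the error function `math.erf`, to which Φ
is related by Φ(x) = (1 + erf(x/√2))/2; the function names below are illustrative.

```python
import math

def std_normal_cdf(x):
    """Phi(x) = P(X <= x) for a standard normal X, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) for N(mu, sigma^2), reduced to Phi by standardizing the argument."""
    return std_normal_cdf((x - mu) / sigma)

for x in (-1.96, 0.0, 1.0, 1.96):
    print(f"Phi({x:+.2f}) = {std_normal_cdf(x):.4f}")
```

The second function shows why a single table for the standard normal case suffices: any
other normal distribution function is Φ evaluated at a shifted and rescaled argument.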

Properties of the Normal Distribution

The normal distribution provides one of the most important classes of
probability distributions due to two appealing properties that are not
shared by all distributions:

Property 1. The distribution is location-scale invariant.

Property 2. The distribution is stable under summation.

Property 1, the location-scale invariance property, guarantees that we may
multiply a normal random variable $X \sim N(\mu, \sigma^2)$ by b and add a,
where a and b are any real numbers with $b \neq 0$. Then, the resulting
a + b · X is, again, normally distributed; more precisely, it is
$N(a + b\mu, b^2\sigma^2)$ distributed. Consequently, a normal random
variable will still be normally distributed if we change the units of
measurement. The change into a + b · X can be interpreted as observing the
same X, however, measured in a different scale. In particular, if a and b
are such that the mean and variance of the resulting a + b · X are 0 and 1,
respectively, then a + b · X is called the standardization of X.

Property 2, stability under summation, ensures that the sum of an
arbitrary number n of normal random variables, $X_1, X_2, \ldots, X_n$, is,
again, normally distributed provided that the random variables behave
independently of each other. This is important for aggregating quantities.

These properties are illustrated later in the 
entry. 

Furthermore, the normal distribution is often mentioned in the context of
the central limit theorem. It states that the suitably standardized sum of
independent random variables with identical distributions converges in
distribution to a normal random variable.¹ We restate this formally as
follows:

Let $X_1, X_2, \ldots, X_n$ be identically distributed random variables
with mean $E(X_i) = \mu$ and variance $Var(X_i) = \sigma^2$ that do not
influence the outcome of each other (i.e., are independent). Then, we have

$$\frac{\sum_{i=1}^{n} X_i - n \cdot \mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0,1) \qquad (2)$$

as the number n approaches infinity. The D above the convergence arrow in
equation (2) indicates that the distribution function of the left
expression converges to the standard normal distribution function.

Generally, for $n \ge 30$ in equation (2), we treat the distributions as
equal; that is, the left-hand side is taken to be N(0,1) distributed. In
certain cases, depending on the distribution of the $X_i$ and the
corresponding parameter values, n < 30 already justifies the use of the
standard normal distribution for the left-hand side of equation (2). If
the $X_i$ are Bernoulli random variables, that is, $X_i \sim B(p)$, with
parameter p such that $n \cdot p \ge 5$, then we also assume equality of
the distributions in equation (2). Depending on p, this can mean that n is
much smaller than 30.
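The rule of thumb around equation (2) can be checked by simulation. The
sketch below standardizes sums of n = 30 independent draws and compares
the simulated probability of a value no greater than 1 with the standard
normal value. The choice of draws that are uniform on (0, 1), the seed,
the number of trials, and all function names are assumptions made only for
this illustration.

```python
import math
import random

def standardized_sum(n, rng):
    """Left-hand side of equation (2) for n independent uniform(0, 1) draws."""
    mu = 0.5                        # mean of a uniform(0, 1) variable
    sigma = math.sqrt(1.0 / 12.0)   # its standard deviation
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(42)             # fixed seed so the run is repeatable
n, trials = 30, 20000
hits = sum(standardized_sum(n, rng) <= 1.0 for _ in range(trials))
simulated = hits / trials

print(simulated, normal_cdf(1.0))   # the two values should be close
```

Replacing the uniform draws by Bernoulli draws with a small p shows how the
quality of the approximation depends on the underlying distribution.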

These properties make the normal distribution the most popular
distribution in finance. This popularity is somewhat contentious, however,
for reasons that will be given as we progress in this entry.

The last property of the normal distribution that we discuss, one shared
with some other distributions, is the bell shape of the density function.
This particular shape helps in roughly assessing the dispersion of the
distribution through a rule of thumb commonly referred to as the empirical
rule. According to this rule, we have

$$P(X \in [\mu \pm \sigma]) = F(\mu + \sigma) - F(\mu - \sigma) \approx 68\%$$
$$P(X \in [\mu \pm 2\sigma]) = F(\mu + 2\sigma) - F(\mu - 2\sigma) \approx 95\%$$
$$P(X \in [\mu \pm 3\sigma]) = F(\mu + 3\sigma) - F(\mu - 3\sigma) \approx 100\%$$

The above states that approximately 68% of the probability is given to
values that lie within one standard deviation $\sigma$ of the mean $\mu$.
About 95% of the probability is given to values within $2\sigma$ of the
mean, while nearly all of the probability (about 99.7%) is assigned to
values within $3\sigma$ of the mean.


By comparison, the so-called Chebychev inequalities, which are valid for
any type of distribution with finite variance (so not necessarily
bell-shaped), yield the lower bounds

$$P(X \in [\mu \pm \sigma]) \ge 0\%$$
$$P(X \in [\mu \pm 2\sigma]) \ge 75\%$$
$$P(X \in [\mu \pm 3\sigma]) \ge 89\%$$

which provide a much coarser assessment than the empirical rule, as we can
see, for example, from the bound of 0% of data contained within one
standard deviation of the mean.
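The two sets of figures can be reproduced with a few lines of Python. This
is a sketch only: the normal probabilities are computed from the error
function, and the distribution-free bound is taken to be the standard
Chebychev expression $1 - 1/k^2$ for k standard deviations. The function
names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_mass_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X."""
    return normal_cdf(k) - normal_cdf(-k)

def chebychev_lower_bound(k):
    """Distribution-free lower bound 1 - 1/k^2 for the same interval."""
    return max(0.0, 1.0 - 1.0 / (k * k))

for k in (1, 2, 3):
    print(k, round(normal_mass_within(k), 4), round(chebychev_lower_bound(k), 4))
```

The gap between the two columns is the price paid for making no assumption
about the shape of the distribution.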


Applications to Stock Returns 

Applying Properties 1 and 2 to Stock Returns

With respect to Property 1, consider an example of normally distributed
stock returns r with mean $\mu$. If $\mu$ is nonzero, this means that the
returns are a combination of a constant $\mu$ and random behavior centered
about zero. If we were only interested in the latter, we would subtract
$\mu$ from the returns and thereby obtain a new random variable
$\tilde{r} = r - \mu$, which is again normally distributed.

With respect to Property 2, we give two examples. First, let us present
the effect of aggregation over time. We consider daily stock returns that,
by our assumption, follow a normal law and are independent of each other.
By adding the returns from each trading day during a particular week, we
obtain the week's return as $r_w = r_{Mo} + r_{Tu} + \cdots + r_{Fr}$,
where $r_{Mo}, r_{Tu}, \ldots, r_{Fr}$ are the returns from Monday through
Friday. The weekly return $r_w$ is normally distributed as well. The second
example applies to portfolio returns. Consider a portfolio consisting of n
different stocks, each with normally distributed returns. We denote the
corresponding returns by $R_1$ through $R_n$. Furthermore, in the portfolio
we weight each stock i with $w_i$, for i = 1, 2, ..., n. The resulting
portfolio return $R_p = w_1 R_1 + w_2 R_2 + \cdots + w_n R_n$ is also a
normal random variable, provided that the individual returns are
independent, as Property 2 requires.

Using the Normal Distribution to Approximate the Binomial Distribution

Consider the binomial stock price model. At time t = 0, the stock price
was $S_0 = \$20$. At time t = 1, the stock price was either up or down by
10% so that the resulting price was either $S_1 = \$18$ or $S_1 = \$22$.
Both the up- and the down-movement occurred with probability
$P(\$18) = P(\$22) = 0.5$. Now we extend the model to an arbitrary number
of n days. Suppose each day i, i = 1, 2, ..., n, the stock price developed
in the same manner as on the first day. That is, the price is either up
10% with 50% probability or down 10% with the same probability. If on day
i the price is up, we denote this by $X_i = 1$, and by $X_i = 0$ if the
price is down. The $X_i$ are, hence, B(0.5) random variables. After, say,
50 days, we have a total of $Y = X_1 + X_2 + \cdots + X_{50}$
up-movements. Note that, because of the assumed independence of the $X_i$,
Y is a B(50, 0.5) random variable with mean $n \cdot p = 25$ and variance
$n \cdot p \cdot (1 - p) = 12.5$. Let us introduce the standardized
variable

$$Z_{50} = \frac{Y - 25}{\sqrt{12.5}}$$

From the comments regarding equation (2), we can assume that $Z_{50}$ is
approximately N(0,1) distributed or, equivalently, that Y is approximately
N(25, 12.5) distributed. So, the probability of at most 15 up-movements,
for example, is given by
$P(Y \le 15) \approx \Phi((15 - 25)/\sqrt{12.5}) = 0.23\%$. By comparison,
the probability of no more than five up-movements is equal to
$P(Y \le 5) \approx \Phi((5 - 25)/\sqrt{12.5}) \approx 0\%$.
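To see how close the normal approximation is in this example, the
following sketch computes the approximate probabilities from the standard
normal distribution function and, for comparison, the exact binomial
probabilities by direct summation. The exact calculation is an addition
for illustration and is not part of the text; the function names are
hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ B(n, p), summing the probability function."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, p = 50, 0.5
mean = n * p                        # 25
sd = math.sqrt(n * p * (1 - p))     # square root of 12.5

approx_15 = normal_cdf((15 - mean) / sd)
exact_15 = binomial_cdf(15, n, p)
approx_5 = normal_cdf((5 - mean) / sd)
exact_5 = binomial_cdf(5, n, p)

print(approx_15, exact_15)
print(approx_5, exact_5)
```

Without a continuity correction, the approximation understates the exact
tail probability somewhat, although both values are small.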

Normal Distribution for Logarithmic Returns 

As another example, let X be some random variable representing a
quantitative daily market dynamic such as new information about the
economy. A dynamic can be understood as some driving force governing the
development of other variables. We assume that it is normally distributed
with mean $E(X) = \mu = 0$ and variance $Var(X) = \sigma^2 = 0.2$.
Formally, we would write $X \sim N(0, 0.2)$. So, on average, the value of
the daily dynamic will be zero with a standard deviation of $\sqrt{0.2}$.
In addition, we introduce a stock price S as a random variable, which is
equal to $S_0$ at the beginning.

After one day, the stock price is modeled to depend on the dynamic X as
follows

$$S_1 = S_0 \cdot e^{X}$$

where $S_1$ is the stock price after one day. The exponent X in this
presentation is referred to as a logarithmic return in contrast to a
multiplicative return R obtained from the formula $R = S_1/S_0 - 1$. So,
for example, if X = 0.01, $S_1$ is equal to $e^{0.01} \cdot S_0$. That is
almost equal to $1.01 \cdot S_0$, which corresponds to an increase of 1%
relative to $S_0$.² The probability of X being, for instance, no greater
than 0.01 after one day is given by³

$$P(X \le 0.01) = \int_{-\infty}^{0.01} f(x)\,dx = \int_{-\infty}^{0.01} \frac{1}{\sqrt{2\pi}\sqrt{0.2}}\, e^{-\frac{x^2}{2 \cdot 0.2}}\,dx \approx 0.51$$

Consequently, after one day, the stock price increases, at most, by 1%
with 51% probability, that is, $P(S_1 \le 1.01 \cdot S_0) \approx 0.51$.

Next, suppose we are interested in a five-day outlook where the daily
dynamics $X_i$, i = 1, 2, ..., 5, of each of the following five
consecutive days are distributed identically as X and independent of each
other. Since the dynamic is modeled to equal exactly the continuously
compounded return (that is, the logarithmic return), we refer to X as the
return in this entry. For the resulting five-day return, we introduce the
random variable $Y = X_1 + X_2 + \cdots + X_5$ as the linear combination
of the five individual daily returns. We know that Y is normally
distributed from Property 2. More precisely, since the variances of
independent random variables add up to $5 \cdot 0.2 = 1$, we have
$Y \sim N(0, 1)$. So, on average, the return tends in neither direction,
but the volatility measured by the standard deviation is now
$\sqrt{5} \approx 2.24$ times that of the daily return X. Consequently,
the probability of Y not exceeding a value of 0.01 is now

$$P(Y \le 0.01) = \int_{-\infty}^{0.01} \frac{1}{\sqrt{2\pi}\sqrt{1}}\, e^{-\frac{y^2}{2}}\,dy \approx 0.50$$

We see that the fivefold variance results in a greater likelihood to
exceed the threshold 0.01, that is,

$$P(Y > 0.01) = 1 - P(Y \le 0.01) \approx 0.50 > 0.49 \approx P(X > 0.01)$$

We model the stock price after five days as

$$S_5 = S_0 \cdot e^{Y} = S_0 \cdot e^{X_1 + X_2 + \cdots + X_5}$$

So, after five days, the probability for the stock price to have increased
by no more than 1% relative to $S_0$ is equal to

$$P(S_5 \le e^{0.01} \cdot S_0) \approx P(S_5 \le 1.01 \cdot S_0) \approx 0.50$$
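The one-day and five-day probabilities can be reproduced numerically. The
sketch below evaluates the normal distribution function for the daily
return with variance 0.2 and for the five-day return, whose variance is
taken to be five times the daily variance because the daily returns are
assumed independent. Variable and function names are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

daily_variance = 0.2
days = 5
threshold = 0.01

# One-day return X ~ N(0, 0.2)
p_one_day = normal_cdf(threshold, 0.0, math.sqrt(daily_variance))

# Five-day return Y = X_1 + ... + X_5; variances of independent terms add up
p_five_day = normal_cdf(threshold, 0.0, math.sqrt(days * daily_variance))

print(round(p_one_day, 4), round(p_five_day, 4))
print(round(1.0 - p_five_day, 4), round(1.0 - p_one_day, 4))

# e^0.01 is only slightly larger than 1.01, which justifies the approximation
print(math.exp(threshold))
```

The printed exceedance probabilities show that the difference between the
two horizons is small at this threshold and only visible after rounding
to two decimals.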

There are two reasons why logarithmic returns are commonly used in
finance. First, logarithmic returns are often easier to handle than
multiplicative returns; for example, they simply add up over consecutive
periods, as in the five-day return Y above. Second, if we consider returns
that are attributed to ever shorter periods of time (e.g., from yearly to
monthly to weekly to daily and so on), the resulting compounded return
after some fixed amount of time can be expressed as a logarithmic return.
The theory behind this can be obtained from any introductory book on
calculus.
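The second reason can be illustrated numerically. The sketch below
compounds a rate over ever shorter periods and compares the resulting
growth factor with the exponential function, which is the limit known
from calculus. The annual rate of 5% and the function name are
hypothetical values chosen only for illustration.

```python
import math

def compounded_growth(annual_rate, periods_per_year):
    """Growth factor after one year when the rate is credited m times a year."""
    m = periods_per_year
    return (1.0 + annual_rate / m) ** m

r = 0.05  # hypothetical annual rate, not taken from the text

for m in (1, 12, 52, 365, 1_000_000):
    print(m, compounded_growth(r, m))

print("limit", math.exp(r))  # continuous compounding
```

As the number of periods grows, the growth factor approaches $e^{r}$, so
the logarithm of the growth factor approaches r itself.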


CHI-SQUARE DISTRIBUTION 

Our next distribution is the chi-square distribution. Let Z be a standard
normal random variable, in brief $Z \sim N(0, 1)$, and let $X = Z^2$. Then
X is distributed chi-square with one degree of freedom. We denote this as
$X \sim \chi^2(1)$. The degrees of freedom indicate how many independently
behaving standard normal random variables the resulting variable is
composed of. Here X is just composed of one, namely Z, and therefore has
one degree of freedom.

Because Z is squared, the chi-square distributed random variable assumes
only nonnegative values; that is, the support is on the nonnegative real
numbers. It has mean E(X) = 1 and variance Var(X) = 2.

In general, the chi-square distribution is characterized by the degrees of
freedom n, which assume the values 1, 2, .... Let $X_1, X_2, \ldots, X_n$
be n $\chi^2(1)$ distributed random variables that are all independent of
each other. Then their sum, S, is

$$S = \sum_{i=1}^{n} X_i \sim \chi^2(n) \qquad (3)$$

In words, the sum is again distributed chi-square but this time with n
degrees of freedom. The corresponding mean is E(S) = n, and the variance
equals Var(S) = 2 · n. So, the mean and variance are directly related to
the degrees of freedom.

From the relationship in equation (3), we see that the degrees of freedom
equal the number of independent $\chi^2(1)$ distributed $X_i$ in the sum.
If we have two independent random variables $X_1 \sim \chi^2(n_1)$ and
$X_2 \sim \chi^2(n_2)$, it follows that

$$X_1 + X_2 \sim \chi^2(n_1 + n_2) \qquad (4)$$

From property (4), we have that chi-square distributions have Property 2;
that is, they are stable under summation in the sense that the sum of any
two independent chi-square distributed random variables is itself
chi-square distributed.

The chi-square density function with n degrees of freedom is given by

$$f(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)} \cdot e^{-x/2} \cdot x^{n/2 - 1}, & x > 0 \\ 0, & x \le 0 \end{cases}$$

for n = 1, 2, ... where $\Gamma(\cdot)$ is the gamma function. Figure 3
shows a few examples of the chi-square density function with varying
degrees of freedom. As can be observed, the chi-square distribution is
skewed to the right.


Application to Modeling Short-Term Interest Rates

As an example of an application of the chi-square distribution, we present
a simplified model of short-term interest rates, that is, so-called short
rates. The short rate given by $r_t$, at any time t, is assumed to be a
nonnegative continuous random variable. Furthermore, we let the short rate
be composed of d independent dynamics $X_1, X_2, \ldots, X_d$ according to

$$r_t = X_1^2 + X_2^2 + \cdots + X_d^2$$

where d is some positive integer. In addition, each $X_i$ is given as a
standard normal random variable independent of all other dynamics. Then,
the resulting short rate $r_t$ is chi-square distributed with d degrees of
freedom, that is, $r_t \sim \chi^2(d)$.

Figure 3 Density Functions of Chi-Square Distributions for Various Degrees of Freedom n

If we let d = 2 (i.e., there are two dynamics governing the short rate),
the probability of a short rate between 0 and 1% is 0.5%. That is, we have
to expect that on five out of 1,000 days, we will have a short rate
assuming some value in the interval (0, 0.01]. If, in addition, we had one
more dynamic included such that $r_t \sim \chi^2(3)$, then the same
interval would have probability $P(r_t \in (0, 0.01]) \approx 0.03\%$,
which is close to being an unlikely event. We see that the more dynamics
are involved, the less probable small interest rates such as 1% or less
become.
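The two probabilities quoted for the short rate can be verified with
closed-form expressions for the chi-square distribution function with two
and three degrees of freedom. These closed forms are standard results that
are not stated in the text, and the function names are illustrative, so
the fragment should be read as a sketch under those assumptions.

```python
import math

def chi2_cdf_2(x):
    """Chi-square distribution function with 2 degrees of freedom."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / 2.0)

def chi2_cdf_3(x):
    """Chi-square distribution function with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

p_two_dynamics = chi2_cdf_2(0.01)    # P(r_t in (0, 0.01]) for d = 2
p_three_dynamics = chi2_cdf_3(0.01)  # same interval for d = 3

print(p_two_dynamics, p_three_dynamics)
```

Expressed in percent, the two results round to the 0.5% and 0.03% given
above.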

It should be realized, however, that this is merely an approach to model
the short rate statistically and not an economic model explaining the
factors driving the short rate.


STUDENT'S t-DISTRIBUTION

An important continuous probability distribution when the population
variance of a distribution is unknown is the Student's t-distribution
(also referred to as the t-distribution and Student's distribution).

The t-distribution is a mixture of the normal and chi-square
distributions. To derive the distribution, let X be distributed standard
normal, that is, $X \sim N(0, 1)$, and S be chi-square distributed with n
degrees of freedom, that is, $S \sim \chi^2(n)$. Furthermore, if X and S
are independent of each other (which is to be understood as not
influencing the outcome of the other), then

$$Z = \frac{X}{\sqrt{S/n}} \sim t(n) \qquad (5)$$

In words, equation (5) states that the resulting random variable Z is
Student's t-distributed with n degrees of freedom. The degrees of freedom
are inherited from the chi-square distribution of S.

How can we interpret equation (5)? Suppose we have a population of
normally distributed values with zero mean. The corresponding normal
random variable may be denoted as X. If one also knows the standard
deviation of X,

$$\sigma = \sqrt{Var(X)}$$

then with $X/\sigma$, we obtain a standard normal random variable.

However, if $\sigma$ is not known, we have to use, for example,

$$\sqrt{S/n} = \sqrt{1/n \cdot (X_1^2 + X_2^2 + \cdots + X_n^2)}$$

instead, where $X_1, X_2, \ldots, X_n$ are n random variables identically
distributed as X. Moreover, $X_1, X_2, \ldots, X_n$ have to assume values
independently of each other and of X. Then, the distribution of

$$X/\sqrt{S/n}$$

is the t-distribution with n degrees of freedom, that is,

$$X/\sqrt{S/n} \sim t(n)$$

By dividing by $\sigma$ or $\sqrt{S/n}$, we generate rescaled random
variables that follow a standardized distribution. Quantities similar to
$X/\sqrt{S/n}$ play an important role in parameter estimation.

The density function is defined as

$$f(x) = \frac{1}{\sqrt{n \cdot \pi}} \cdot \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)} \cdot \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}} \qquad (6)$$

where the gamma function $\Gamma$ is incorporated again. The density
function is symmetric and has support (i.e., is positive) on all of
$\mathbb{R}$.

Basically, the Student's t-distribution has a similar shape to the normal
distribution, but thicker tails. For large degrees of freedom n, the
Student's t-distribution does not significantly differ from the standard
normal distribution. As a matter of fact, for $n > 100$, it is practically
indistinguishable from N(0,1).

Figure 4 shows the Student's t-density function for various degrees of
freedom plotted against the standard normal density function. The same is
done for the distribution function in Figure 5.

Figure 4 Density Function of the t-Distribution for Various Degrees of Freedom n Compared to the Standard Normal Density Function (N(0,1))

Figure 5 Distribution Function of the t-Distribution for Various Degrees of Freedom n Compared to the Standard Normal Distribution Function (N(0,1))

In general, the lower the degrees of freedom, the heavier the tails of the
distribution, making extreme outcomes much more likely than for greater
degrees of freedom or, in the limit, the normal distribution. This can be
seen from the distribution function that we depicted in Figure 5 for n = 1
and n = 5 against the standard normal cumulative distribution function
(cdf). For lower degrees of freedom such as n = 1, the solid curve starts
to rise earlier and approaches 1 later than for higher degrees of freedom
such as n = 5 or the N(0,1) case.

This can be understood as follows. When we rescale X by dividing by
$\sqrt{S/n}$ as in equation (5), the resulting $X/\sqrt{S/n}$ obviously
inherits randomness from both X and S. Now, when S is composed of only a
few $X_i$, say n = 3, such that $X/\sqrt{S/n}$ has three degrees of
freedom, there is a lot of dispersion from S relative to the standard
normal distribution. By including more independent N(0,1) random variables
$X_i$ such that the degrees of freedom increase, S/n becomes less
dispersed. Thus, much of the uncertainty relative to the standard normal
distribution stemming from the denominator in $X/\sqrt{S/n}$ vanishes. The
share of randomness in $X/\sqrt{S/n}$ originating from X alone prevails
such that the normal characteristics preponderate. Finally, as n goes to
infinity, we have something that is nearly standard normally distributed.

The mean of the Student's t random variable is zero for n > 1, that is,
E(X) = 0, while the variance is a function of the degrees of freedom n as
follows

$$\sigma^2 = Var(X) = \frac{n}{n - 2}, \quad n > 2$$

For n = 1 and 2, there is no finite variance. Distributions with such
small degrees of freedom generate extreme movements quite frequently
relative to higher degrees of freedom. Precisely for this reason, stock
price returns are often found to be modeled quite well using distributions
with small degrees of freedom, or alternatively, large variances.

Application to Stock Returns 

Let us resume the example at the end of the presentation of the normal
distribution. We consider, once again, the 5-day return Y with standard
normal distribution. Suppose that now we do not know the variance. For
this reason, at any point in time t, we rescale the observations of Y by

$$\sqrt{\frac{Y_{t-1}^2 + Y_{t-2}^2 + \cdots + Y_{t-5}^2}{5}}$$

where $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-5}$ are the five independent weekly
returns immediately prior to Y. The resulting rescaled weekly returns

$$\frac{Y}{\sqrt{\left(Y_{t-1}^2 + Y_{t-2}^2 + \cdots + Y_{t-5}^2\right)/5}}$$

then are t(5) distributed. Writing Y for the rescaled return, the
probability of Y not exceeding a value of 0.01 is now

$$P(Y \le 0.01) = F(0.01) = 0.5038$$

where F is the cumulative distribution function of the Student's
t-distribution with five degrees of freedom. Under the N(0,1), this
probability was about the same.

Again, we model the stock price after five days as $S_5 = S_0 \cdot e^{Y}$
where $S_0$ is today's price. As we know, when $Y \le 0.01$, then
$S_5 \le S_0 \cdot e^{0.01} \approx S_0 \cdot 1.01$. Again, it follows that
the stock price increases by at most 1% with probability of about 0.50. So
far there is not much difference here between the standard normal and the
t(5) distribution.

Let's analyze the stock of American International Group (AIG) in
September 2008. During one week, that is, five trading days, the stock
lost about 67% of its value. That corresponds to a value of the 5-day
return of $Y = -1.0986$ because
$e^{Y} = e^{-1.0986} = 0.3333 = 1 - 0.6667$.

In the N(0,1) model, a decline of this magnitude or even worse would occur
with probability

$$P(Y \le -1.0986) = \Phi(-1.0986) = 13.6\%$$

while under the t(5) assumption, we would obtain

$$P(Y \le -1.0986) = F(-1.0986) = 16.1\%$$

That is, such a decline is 2.5 percentage points more likely in the t(5)
model. So, stock price returns exhibiting extreme movements such as that
of the AIG stock price should not be modeled using the normal
distribution.
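The tail probabilities in this comparison can be recomputed from the
density in equation (6). The sketch below integrates the t density
numerically with the composite Simpson rule, a technique chosen here for
convenience and not taken from the text, and uses the symmetry of the
density about zero. Function names and the number of integration steps
are assumptions.

```python
import math

def t_pdf(x, n):
    """Student's t density with n degrees of freedom, equation (6)."""
    c = math.gamma((n + 1) / 2.0) / (math.sqrt(n * math.pi) * math.gamma(n / 2.0))
    return c * (1.0 + x * x / n) ** (-(n + 1) / 2.0)

def t_cdf(x, n, steps=2000):
    """P(T <= x) from Simpson integration of the density over [0, |x|].

    By symmetry, F(x) = 0.5 + integral for x > 0 and 0.5 - integral for
    x < 0. The number of steps must be even.
    """
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, n) + t_pdf(a, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    area = total * h / 3.0
    return 0.5 + area if x > 0 else 0.5 - area

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

y = math.log(1.0 / 3.0)  # five-day log return for a loss of two thirds

print(round(normal_cdf(y), 4), round(t_cdf(y, 5), 4))
print(round(t_cdf(0.01, 5), 4))
```

Lowering the degrees of freedom in t_cdf shows how quickly the probability
of such a loss grows as the tails become heavier.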


F-DISTRIBUTION 

Our next distribution is the F-distribution. It is defined as follows. Let
$X \sim \chi^2(n_1)$ and $Y \sim \chi^2(n_2)$.

Furthermore, assuming X and Y to be independent, then the ratio

$$F(n_1, n_2) = \frac{X/n_1}{Y/n_2} \qquad (7)$$

has an F-distribution with $n_1$ and $n_2$ degrees of freedom inherited
from the underlying chi-square distributions of X and Y, respectively. We
see that the random variable in equation (7) assumes nonnegative values
only because neither X nor Y is ever negative. Hence, the support is on
the nonnegative real numbers. Also like the chi-square distribution, the
F-distribution is skewed to the right.

The F-distribution has a rather complicated looking density function of
the form

$$f(x) = \begin{cases} \dfrac{\Gamma\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)} \cdot \left(\dfrac{n_1}{n_2}\right)^{\frac{n_1}{2}} \cdot \dfrac{x^{\frac{n_1}{2} - 1}}{\left(1 + \frac{n_1}{n_2}\, x\right)^{\frac{n_1 + n_2}{2}}}, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (8)$$

Figure 6 displays the density function (8) for various degrees of freedom.
As the degrees of freedom $n_1$ and $n_2$ increase, the function graph
becomes more peaked and less asymmetric while the tails lose mass.

The mean is given by

$$E(X) = \frac{n_2}{n_2 - 2}, \quad \text{for } n_2 > 2 \qquad (9)$$

while the variance equals

$$Var(X) = \frac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)}, \quad \text{for } n_2 > 4 \qquad (10)$$

Figure 6 Density Function of the F-Distribution for Various Degrees of Freedom $n_1$ and $n_2$

Note that according to equation (9), the mean is not affected by the
degrees of freedom $n_1$ of the first chi-square random variable, while the
variance in equation (10) is influenced by the degrees of freedom of both
random variables.
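The construction in equation (7) and the moment formulas (9) and (10) can
be checked against each other by simulation. In the sketch below, each
chi-square variable is drawn as a gamma variable with shape n/2 and scale
2, which is a standard equivalence that the text does not state. The
degrees of freedom, the seed, the sample size, and the function names are
illustrative assumptions.

```python
import random
import statistics

def f_mean(n1, n2):
    """Mean from equation (9); requires n2 > 2."""
    return n2 / (n2 - 2)

def f_variance(n1, n2):
    """Variance from equation (10); requires n2 > 4."""
    return 2 * n2**2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4))

def sample_f(n1, n2, rng):
    """One draw of (X/n1)/(Y/n2) with independent chi-square X and Y."""
    x = rng.gammavariate(n1 / 2.0, 2.0)  # chi-square with n1 degrees of freedom
    y = rng.gammavariate(n2 / 2.0, 2.0)  # chi-square with n2 degrees of freedom
    return (x / n1) / (y / n2)

rng = random.Random(12345)   # fixed seed so the run is repeatable
n1, n2 = 4, 20               # illustrative degrees of freedom
draws = [sample_f(n1, n2, rng) for _ in range(20000)]
sample_mean = statistics.fmean(draws)

print(sample_mean, f_mean(n1, n2))
print(statistics.variance(draws), f_variance(n1, n2))
```

Changing n1 while keeping n2 fixed leaves the theoretical mean unchanged,
in line with the remark on equation (9).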

EXPONENTIAL DISTRIBUTION

The exponential distribution is characterized by the positive real-valued
parameter $\lambda$. In brief, we use the notation $Exp(\lambda)$. An
exponential random variable assumes nonnegative values only. The density,
defined for $x \ge 0$ by

$$f(x) = \lambda \cdot e^{-\lambda x}$$

is right skewed. Figure 7 presents the density function for various
parameter values $\lambda$.

The distribution function is obtained by simple integration as

$$F(x) = 1 - e^{-\lambda x}$$

For identical parameter values as in Figure 7, we have plots of the
exponential distribution function shown in Figure 8.

For this distribution, both the mean and variance are relatively simple
functions of the parameter. That is, for the mean

$$E(X) = \frac{1}{\lambda}$$

and for the variance

$$Var(X) = \frac{1}{\lambda^2}$$

There is an inverse relationship between the exponential distribution and
the Poisson distribution. Suppose we have a Poisson random variable N with
parameter $\lambda$, i.e., $N \sim Poi(\lambda)$, counting the occurrences
of some event per unit of time, and that we observe a time frame of length
T. Furthermore, let $X_1, X_2, \ldots$ be the $Exp(\lambda)$ distributed
interarrival times between the individual occurrences. That is, between
time zero and the first event, $X_1$ units of time have passed, between
the first event and the second, $X_2$ units of time have elapsed, and so
on. Now, over these T units of time, we expect
$T \cdot \lambda = T \cdot E(N)$ events to occur. Alternatively, we have an
average of $T/(T \cdot \lambda) = 1/\lambda = E(X)$ units of time to wait
between occurrences.

Figure 7 Exponential Density Function for Various Parameter Values $\lambda$

Suppose that by time T we have counted exactly n events. Then the accrued
time $\tau$ elapsed when the event occurs for the nth time is obtained by
the sum of all individual interarrival times $X_1, X_2, \ldots, X_n$, which
cannot be greater than T. Formally

$$\tau = \sum_{i=1}^{n} X_i \le T \qquad (11)$$

A result of this relationship is

$$E(N) = \lambda = \frac{1}{E(X_i)}$$


The exponential distribution is commonly referred to as a distribution
with a "no memory" property in the context of a life-span that ends due to
some break.

Figure 8 Distribution Function F(x) of the Exponential Distribution for Various Parameter Values $\lambda$

That means that there is no difference in the probability between the
following two events. Event one states that the object will live for the
first $\tau$ units of time after the object's creation, while event two
states that the object will continue living for the next $\tau$ units of
time after it has already survived some t units of time. In other words,
if some interarrival time or survival time (i.e., the time between certain
occurrences) is $Exp(\lambda)$ distributed, one starts all over waiting at
any given time t provided that the break has not occurred yet.
(Technically, these considerations as well as the following equation (12)
require an understanding of the notion of conditional distributions. Here
it will suffice to apply pure intuition.) So, for example, let the time
until the next default of one of several corporate bonds held in some
portfolio be given as an exponential random variable. Then the probability
of the first bond defaulting in no more than t units of time given that
none have defaulted so far is the same as the probability of the nth bond
defaulting after at most t units of time given that n - 1 bonds have
already defaulted. That is, we only care about the probability
distribution of the time of occurrence of the next default regardless of
how many bonds have already defaulted.
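The "no memory" property can be made concrete with the survival
probability $P(X > x) = e^{-\lambda x}$, which follows from the
distribution function of the exponential distribution. The sketch below
compares the probability of surviving a further $\tau$ units of time,
given survival up to time t, with the unconditional probability of
surviving the first $\tau$ units. The numerical values and names are
illustrative assumptions.

```python
import math

def exp_survival(x, lam):
    """P(X > x) for X ~ Exp(lam), x >= 0."""
    return math.exp(-lam * x)

lam = 5.0    # illustrative intensity
t = 0.4      # time already survived
tau = 0.25   # additional time considered

# Event one: survive the first tau units of time
unconditional = exp_survival(tau, lam)

# Event two: survive tau more units, given survival up to time t
conditional = exp_survival(t + tau, lam) / exp_survival(t, lam)

print(unconditional, conditional)
```

Both numbers agree for any choice of t, which is exactly what starting
all over at time t means.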

Finally, an additional property of the exponential distribution is its
relationship to the chi-square distribution. Let X be $Exp(\lambda)$. Then
$2\lambda X$ is chi-square distributed with two degrees of freedom, that
is, $2\lambda X \sim \chi^2(2)$. In particular, for $\lambda = 1/2$, X
itself is $\chi^2(2)$ distributed.

Applications in Finance 

In applications in finance, the parameter $\lambda$ often has the meaning
of a default rate, default intensity, or hazard rate. This can be
understood by observing the ratio

$$\frac{P(X \in (t, t + dt])}{dt \cdot P(X > t)} \qquad (12)$$

which expresses the probability of the event of interest such as default
of some company occurring between time t and t + dt given that it has not
happened by time t, relative to the length of the horizon, dt. Now, let
the length of the interval, dt, approach zero, and this ratio in equation
(12) will have $\lambda$ as its limit.

The exponential distribution is often used in credit risk models where the
number of defaulting bonds or loans in some portfolio over some period of
time is represented by a Poisson random variable and the random times
between successive defaults by exponentially distributed random variables.
In general, then, the time until the nth default is given by the sum in
equation (11).

Consider, for example, a portfolio of bonds. Moreover, we consider the
number of defaults in this portfolio in one year to be some Poisson random
variable with parameter $\lambda = 5$, that is, we expect five defaults per
year. The same parameter, then, represents the default intensity of the
exponentially distributed time between two successive defaults, that is,
$\tau \sim Exp(5)$, so that on average, we have to wait $E(\tau) = 1/5$ of
a year or 2.4 months. For example, the probability of less than three
months (i.e., 1/4 of a year) between two successive defaults is given by

$$P(\tau \le 0.25) = 1 - e^{-5 \cdot 0.25} = 0.7135$$

or roughly 71%. Now, the probability of no default in any given year is
then

$$P(\tau > 1) = e^{-5 \cdot 1} = 0.0067$$

or 0.67%.
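The figures of this example, together with the limit of the ratio in
equation (12), can be reproduced as follows. The sketch evaluates the
exponential distribution function directly; the small interval length dt
and the time point t used for the hazard-rate ratio are arbitrary
illustrative values, as are the names.

```python
import math

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 5.0                                    # expected defaults per year
mean_wait_months = 12.0 / lam                # E(tau) = 1/5 year in months
p_within_quarter = exp_cdf(0.25, lam)        # P(tau <= 0.25)
p_no_default_year = 1.0 - exp_cdf(1.0, lam)  # P(tau > 1)

# Ratio from equation (12) for a short interval (t, t + dt]
t, dt = 0.3, 1e-6
hazard = (exp_cdf(t + dt, lam) - exp_cdf(t, lam)) / (dt * (1.0 - exp_cdf(t, lam)))

print(mean_wait_months, p_within_quarter, p_no_default_year)
print(hazard)  # close to lam for small dt, whatever t is chosen
```

Repeating the last calculation for other values of t gives practically
the same ratio, which reflects the constant default intensity.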


RECTANGULAR DISTRIBUTION

The simplest continuous distribution we are going to introduce is the
rectangular distribution. Often, it is used to generate simulations of
random outcomes of experiments via transformation. If a random variable X
is rectangular distributed, we denote this by $X \sim Re(a, b)$ where a and
b, with a < b, are the parameters of the distribution.

The support is on the real interval [a, b]. The density function is given
by

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & x \notin [a, b] \end{cases} \qquad (13)$$

We see that this density function is piecewise constant: it is zero
outside the bounds a and b and, between them, equal to the inverse of the
interval width. Figure 9 displays the density function (13) for some
general parameters a and b.







Probability Theory 


Figure 9 Density Function of a Re(a, b) Distribution


Through integration, the distribution function follows in the form

$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x > b \end{cases} \qquad (14)$$

The mean is equal to

$$E(X) = \frac{a + b}{2}$$

and the variance is

$$Var(X) = \frac{(b - a)^2}{12}$$

Figure 10 Distribution Function of a Re(a, b) Distribution

Figure 11 Density Function of a Gamma Distribution $Ga(\lambda, b)$

In Figure 10, we have the distribution function given by equation (14)
with some general parameters a and b. By analyzing the plot, we can see
that the distribution function is not differentiable at a or b, since the
derivatives of F do not exist for these values. At any other real value x,
the derivative exists (being either 0 or $1/(b - a)$) and is continuous. We
say in the latter case that F is smooth there.
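The remark that the rectangular distribution is used to simulate other
random outcomes via transformation can be illustrated as follows. The
sketch draws values from Re(0, 1) and maps them through the inverse of a
distribution function, a technique usually called inverse transform
sampling. The exponential distribution as the target, the parameter
value, the seed, and the function names are assumptions made only for
this example.

```python
import math
import random
import statistics

def rectangular_cdf(x, a, b):
    """Distribution function (14) of Re(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def exponential_from_uniform(u, lam):
    """Map u from Re(0, 1) to an Exp(lam) value by inverting 1 - exp(-lam x)."""
    return -math.log(1.0 - u) / lam

rng = random.Random(2024)   # fixed seed so the run is repeatable
lam = 5.0                   # illustrative parameter of the target distribution
draws = [exponential_from_uniform(rng.random(), lam) for _ in range(50000)]
sample_mean = statistics.fmean(draws)

print(rectangular_cdf(1.5, 1.0, 2.0))  # halfway through the interval [1, 2]
print(sample_mean, 1.0 / lam)          # sample mean versus the mean of Exp(lam)
```

The same pattern works for any distribution whose distribution function
can be inverted, either in closed form or numerically.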


GAMMA DISTRIBUTION 

Next we introduce the gamma distribution for positive, real-valued random
variables. Characterized by two parameters, $\lambda$ and c, this
distribution class embraces several special cases. It is skewed to the
right with support on the positive real line. We denote that a random
variable X is gamma distributed with parameters $\lambda$ and c by writing
$X \sim Ga(\lambda, c)$ where $\lambda$ and c are positive real numbers.

The density function is given by

$$f(x) = \begin{cases} \dfrac{\lambda (\lambda x)^{c - 1} \exp\{-\lambda x\}}{\Gamma(c)}, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (15)$$

with gamma function $\Gamma$. A plot of the density function from equation
(15) is provided in Figure 11. The distribution function is

$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{\int_0^{\lambda x} u^{c - 1} e^{-u}\,du}{\Gamma(c)}, & x \ge 0 \end{cases}$$

The mean is

$$E(X) = \frac{c}{\lambda}$$

with variance

$$Var(X) = \frac{c}{\lambda^2}$$


Erlang Distribution 

A special case is the Erlang distribution, which arises for natural number
values of the parameter c, that is, $c \in \mathbb{N}$. The intuition
behind it is as follows. Suppose we have c exponential random variables
with the same parameter $\lambda$, that is,
$X_1, X_2, \ldots, X_c \sim Exp(\lambda)$, all being independent of each
other. Then the sum of these

$$S = \sum_{i=1}^{c} X_i$$

is distributed $Ga(\lambda, c)$ such that the resulting distribution
function is

$$F(s) = \begin{cases} 0, & s < 0 \\ 1 - e^{-\lambda s} \sum_{i=0}^{c-1} \dfrac{(\lambda s)^i}{i!}, & s \ge 0 \end{cases}$$

So, when we add the identically $Exp(\lambda)$ distributed interarrival
times until the cth default, for example, the resulting combined waiting
time is Erlang distributed with parameters c and $\lambda$.

Figure 12 Density Function of a Beta Distribution Be(c, d)
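The Erlang distribution function is a finite sum and therefore easy to
evaluate. The sketch below implements it and applies it to waiting times
until the cth default, reusing an intensity of five defaults per year
purely as an illustrative value. The function name and the chosen
horizons are assumptions.

```python
import math

def erlang_cdf(s, lam, c):
    """P(S <= s) for the sum S of c independent Exp(lam) waiting times."""
    if s < 0:
        return 0.0
    partial = sum((lam * s) ** i / math.factorial(i) for i in range(c))
    return 1.0 - math.exp(-lam * s) * partial

lam = 5.0  # illustrative default intensity per year

first_within_quarter = erlang_cdf(0.25, lam, 1)  # c = 1 is the exponential case
third_within_year = erlang_cdf(1.0, lam, 3)      # third default within one year

print(first_within_quarter, third_within_year)
```

For c = 1 the sum collapses to a single term and the exponential
distribution function is recovered.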


BETA DISTRIBUTION

The beta distribution is characterized by the two parameters c and d that
are any positive real numbers. We abbreviate this distribution by
Be(c, d). It has a density function with support on the interval [0, 1],
that is, only for $x \in [0, 1]$ does the density function assume positive
values. In the context of credit risk modeling, it commonly serves as an
approximation for generating random defaults when the true underlying
probabilities of default of certain companies are unknown.

The density function is defined by

$$f(x) = \begin{cases} \dfrac{1}{B(c, d)}\, x^{c - 1} (1 - x)^{d - 1}, & 0 \le x \le 1 \\ 0, & \text{else} \end{cases}$$

where B(c, d) denotes the beta function with parameters c and d. The
density function may assume various different shapes depending on c and d.
For a few exemplary values, we present the plots in Figure 12. As we can
see, for c = d, the density function is symmetric about x = 0.5.


LOG-NORMAL DISTRIBUTION

Another important distribution in finance is the log-normal distribution.
It is connected to the normal distribution via the following relationship.
Let Y be a normal random variable with mean $\mu$ and variance $\sigma^2$.
Then the random variable

$$X = e^{Y}$$

is log-normally distributed with parameters $\mu$ and $\sigma^2$. In brief,
we denote this distribution by $X \sim Ln(\mu, \sigma^2)$.












Figure 13 Density Function of the Log-Normal Distribution for Various Values of μ and σ²


Since the exponential function e^Y = exp(Y) only yields positive values, the support of the log-normal distribution is on the positive half of the real line only, as will be seen by its density function given by

$$f(x) = \begin{cases} \dfrac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, & x > 0 \\ 0, & \text{else} \end{cases} \qquad (16)$$


which looks strikingly similar to the normal density function given by (2). Figure 13 depicts the density function for several parameter values.

This density function results in the log-normal distribution function

$$F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

where Φ(·) is the distribution function of the standard normal distribution. (This is the result of the one-to-one relationship between the values of a log-normal and a standard normal random variable.) A plot of the distribution function for different parameter values can be found in Figure 14.


Mean and variance of a log-normal random variable are

$$E(X) = e^{\mu + \frac{\sigma^2}{2}} \qquad (17)$$

and

$$Var(X) = e^{\sigma^2}\left(e^{\sigma^2} - 1\right) e^{2\mu} \qquad (18)$$


Application to Modeling 
Asset Returns 

The reason for the popularity of the log-normal distribution is that logarithmic asset returns r have been historically modeled as normally distributed such that the related asset prices are modeled by a log-normal distribution. That is, let Pₜ denote today's asset price and, furthermore, let the daily return r be N(μ, σ²). Then, in a simplified fashion, tomorrow's price is given by Pₜ₊₁ = Pₜ · e^r while the ratio of the two prices (the gross return), e^r, is log-normally distributed, that is, Ln(μ, σ²).

The log-normal distribution is closed under special operations as well. If we let the n random variables X₁, ..., Xₙ be log-normally distributed, each with parameters μ and σ² and uninfluenced by each other, then multiplying all of these and taking the nth root we have that

$$\left(\prod_{i=1}^{n} X_i\right)^{1/n} \sim Ln\left(\mu, \frac{\sigma^2}{n}\right)$$

because the logarithm of the left-hand side is the average of n independent N(μ, σ²) random variables. The product sign is defined as

$$\prod_{i=1}^{n} X_i = X_1 \times X_2 \times \ldots \times X_n$$

Figure 14 Distribution Function of the Log-Normal Distribution for Various Parameter Values μ and σ²


As an example, we consider a very simplified stock price model. Let S₀ = $100 be today's stock price of some company. We model tomorrow's price S₁ as driven by the 1-day dynamic X from the previous example of the normal distribution. In particular, the model is

$$S_1 = S_0 \cdot e^{X}$$

By some slight manipulation of the above equation, we see that the ratio of tomorrow's price over today's price

$$\frac{S_1}{S_0} = e^{X}$$

follows a log-normal distribution with parameters μ and σ², that is, S₁/S₀ ~ Ln(μ, σ²). We may now be interested in the probability that tomorrow's price is greater than $120; that is,


$$P(S_1 > 120) = P(S_0 e^{X} > 120) = P(100 \cdot e^{X} > 120)$$

This corresponds to

$$\begin{aligned} P(S_1 > 120) &= P(e^{X} > 1.20) \\ &= 1 - P(e^{X} \le 1.20) \\ &= 1 - F(1.2) \\ &= 1 - 0.8190 = 0.1810 \end{aligned}$$


where in the third equation on the right-hand side, we have applied the log-normal cumulative probability distribution function F. So, in roughly 18% of the scenarios, tomorrow's stock price S₁ will exceed the price of today, S₀ = $100, by at least 20%. From equation (17), the mean of the ratio is

$$E\left(\frac{S_1}{S_0}\right) = e^{0 + \frac{0.2}{2}} = e^{0.1} \approx 1.1052$$

implying that we have to expect tomorrow's stock price to be roughly 10% greater than today, even though the dynamic X itself has an expected value of 0. Finally, equation (18) yields the variance

$$Var\left(\frac{S_1}{S_0}\right) = e^{0.2}\left(e^{0.2} - 1\right) = 0.2704$$

which is only slightly larger than that of the dynamic X itself.
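The mean and variance of the price ratio can be reproduced directly from equations (17) and (18). The sketch below assumes that the 1-day dynamic X has mean 0 and variance 0.2, which is what the figures quoted in the example imply; the function names are illustrative.

```python
import math


def lognormal_mean(mu: float, sigma2: float) -> float:
    """Equation (17): E(X) = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma2 / 2.0)


def lognormal_variance(mu: float, sigma2: float) -> float:
    """Equation (18): Var(X) = exp(sigma^2) * (exp(sigma^2) - 1) * exp(2 * mu)."""
    return math.exp(sigma2) * (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu)


# Assumed parameters of the 1-day dynamic X: mean 0 and variance 0.2.
MU, SIGMA2 = 0.0, 0.2
print(round(lognormal_mean(MU, SIGMA2), 4))      # 1.1052
print(round(lognormal_variance(MU, SIGMA2), 4))  # 0.2704
```

Note that the mean of the ratio exceeds 1 even though the dynamic has mean 0, because the exponential function is convex.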

The statistical concepts learned to this point can be used for pricing certain types of derivative instruments, for example, by means of the Black-Scholes option pricing model.

KEY POINTS 

* The distributions with appealing statistical properties that are most commonly used in finance are the normal distribution, the chi-square distribution, the Student's t-distribution, Fisher's F-distribution, the exponential distribution, the gamma distribution, the beta distribution, and the log-normal distribution.

* The normal distribution is probably the most famous probability distribution. Its popularity is credited to the fact that it serves as the limit distribution of many sums of random variables. Moreover, it serves as the origin for many other probability distributions with appealing properties.

* The empirical rule is helpful in assessing how the data of most samples are dispersed even if we do not know the underlying distribution. The theoretical counterpart, the Chebychev inequality, provides limits for the dispersion of any probability distribution whose variance we know.

* Logarithmic returns, in contrast to percentage returns, are the most commonly used way to express changes in asset prices. The reason for the widespread use of returns computed in terms of logarithms lies in the simple mathematical tractability of their form. Moreover, their intuitive appeal results from the fact that they can be understood as the relative price changes obtained from constant trading.

* The default intensity is used extensively in financial models that consider stochastic default, such as the default of some bond in a bond portfolio. It expresses the probability of defaulting within the next time interval of unit length as we let the length of this interval approach zero.

* The interarrival time is the random variable associated with the time between two successive random events. For example, for a bond portfolio manager it is of interest to model the time between some default in the portfolio and the next default. Commonly, the interarrival time is modeled as an exponential random variable.

NOTES 

1. There exist generalizations such that the distributions need no longer be identical. However, this is beyond the scope of this entry.

2. For values near 0, the logarithmic return X 
is virtually equal to the multiplicative return 
R. Rounding to two decimals, they are both 
equal to 0.01 here. 

3. For some computer software, the probability 
will be given as 0.5 due to rounding. 
Abstract: Continuous probability distributions are commonly the preferred candidates when modeling financial asset returns. The most popular of them is unquestionably the normal distribution because of its appealing properties as well as the fact that it serves as the limit distribution for many sums of random variables such as, for example, aggregated returns. The normal distribution generally renders modeling easy because all moments exist. However, the normal distribution fails to reflect stylized facts commonly encountered in asset returns, namely, the possibility of very extreme movements and skewness. To remedy this shortcoming, probability distributions accounting for such extreme price changes have become increasingly popular. Some of these distributions concentrate exclusively on extreme values and others permit any real number, but in a manner that is capable of reflecting market behavior. Consequently, there is a selection of probability distributions that can realistically reproduce asset price changes. Their common shortcoming is generally that they are mathematically difficult to handle.


In this entry, we present a collection of continuous probability distributions that are used in finance in the context of modeling extreme events. Although there are distributions that are appealing in nature due to their mathematical simplicity, the ones introduced in this entry are sometimes rather complicated, using parameters that are not necessarily intuitive. However, given the observed behavior of many quantities in finance, the need for more flexible distributions outweighs the desire to keep models mathematically simple.

While the Student's t-distribution is able to mimic some behavior inherent in financial data such as so-called heavy tails (which means that a lot of the probability mass is attributed to extreme values), it fails to capture other observed behavior such as skewness. Hence, we decided not to include the Student's t-distribution in this entry.

In this entry, we present the generalized extreme value distribution, the generalized Pareto distribution, the normal inverse Gaussian distribution, and the α-stable distribution together with their parameters of location and spread. The presentation of each distribution is accompanied by some illustration to help render the theory more appealing.

GENERALIZED EXTREME 
VALUE DISTRIBUTION 

Sometimes it is of interest to analyze the probability distribution of extreme values of some random variable rather than the entire distribution. This occurs in risk management (including operational risk, credit risk, and market risk) and risk control in portfolio management. For example, a portfolio manager may be interested in the maximum loss a portfolio might incur with a certain probability. For this purpose, generalized extreme value (GEV) distributions are designed. They are characterized by the real-valued parameter ξ. Thus, the abbreviated appellation for this distribution is GEV(ξ).

Technically, one considers a series of identically distributed random variables X₁, X₂, ..., Xₙ, which are independent of each other so that each one's value is unaffected by the others' outcomes. Now, the GEV distributions become relevant if we let the length of the series n become ever larger and consider its largest value, that is, the maximum.

The distribution is not applied to the data immediately but, instead, to the so-called standardized data. Basically, when standardizing data x, one subtracts some constant real parameter a from the data and divides the result by some positive parameter b so that one obtains the quantity (x - a)/b. (Standardization is a linear transform of the random variable such that its location parameter becomes zero and its scale one.) The parameters are usually chosen such that this standardized quantity has zero mean and unit variance. We have to point out, however, that neither the variance nor the mean exists for every probability distribution.

Extreme value theory, a branch of statistics that focuses solely on the extremes (tails) of a distribution, distinguishes between three different types of generalized extreme value distributions: Gumbel distribution, Fréchet distribution, and Weibull distribution. In the extreme value theory literature, these distributions are referred to respectively as Type I, Type II, and Type III. (See Embrechts, Klüppelberg, and Mikosch [2003], De Haan and Ferreira [2006], and Kotz and Nadarajah [2002].) The three types are related in that we obtain one type from another by simply varying the value of the parameter ξ. This makes GEV distributions extremely pleasant for handling financial data.

For the Gumbel distribution, the general parameter is zero (i.e., ξ = 0) and its density function is

$$f(x) = e^{-x} \exp\left\{-e^{-x}\right\}$$

A plot of this density is given by the dashed graph in Figure 1 that corresponds to ξ = 0. The distribution function of the Gumbel distribution is then

$$F(x) = \exp\left\{-e^{-x}\right\}$$

Again, for ξ = 0, we have the distribution function displayed by the dashed graph in Figure 2.

The second GEV(ξ) distribution is the Fréchet distribution, which is given for ξ > 0 and has density

$$f(x) = (1 + \xi x)^{-1/\xi - 1} \cdot \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}$$

with corresponding distribution function

$$F(x) = \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}$$

Note that the prerequisite 1 + ξx > 0 has to be met. For a parameter value of ξ = 0.5, an example of the density and distribution function is given by the dotted graphs in Figures 1 and 2, respectively.

Figure 1 GEV(ξ) Density Function for Various Parameter Values


Finally, the Weibull distribution corresponds to ξ < 0. It has the density function

$$f(x) = (1 + \xi x)^{-1/\xi - 1} \cdot \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}$$

A plot of this distribution can be seen in Figure 1, with ξ = -0.5 (solid line). Again, 1 + ξx > 0 has to be met. It is remarkable that the density function graph vanishes at a finite right end point, that is, becomes zero. Thus, the support is on (-∞, -1/ξ). The corresponding distribution function is

$$F(x) = \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}$$

a graph of which is depicted in Figure 2 for ξ = -0.5 (solid line).
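The three GEV types can be handled by one routine that switches on the sign of ξ. The following is a minimal sketch for standardized data, assuming the common parameterization F(x) = exp{-(1 + ξx)^(-1/ξ)} with the Gumbel case as the limit ξ = 0; the function names are illustrative and not taken from the text.

```python
import math


def gev_cdf(x: float, xi: float) -> float:
    """Distribution function of GEV(xi) for standardized data."""
    if xi == 0.0:  # Gumbel
        return math.exp(-math.exp(-x))
    base = 1.0 + xi * x
    if base <= 0.0:
        # Outside the support: below the lower end point for xi > 0 (Frechet),
        # above the upper end point -1/xi for xi < 0 (Weibull).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-base ** (-1.0 / xi))


def gev_density(x: float, xi: float) -> float:
    """Density function of GEV(xi) for standardized data."""
    if xi == 0.0:  # Gumbel
        return math.exp(-x) * math.exp(-math.exp(-x))
    base = 1.0 + xi * x
    if base <= 0.0:
        return 0.0
    return base ** (-1.0 / xi - 1.0) * math.exp(-base ** (-1.0 / xi))


# Upper-tail probability beyond x = 5 for the three types (xi as in Figure 1).
for xi in (0.5, 0.0, -0.5):
    print(xi, 1.0 - gev_cdf(5.0, xi))
```

The loop illustrates the ordering of the tails: the Fréchet type leaves the most probability beyond the threshold, the Weibull type none at all once the threshold lies past its right end point.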



Figure 2 GEV(ξ) Distribution Function for Various Parameter Values











Figure 3 Generalized Pareto Density Function for Various Parameter Values 


Notice that the extreme parts of the density function (i.e., the tails) of the Fréchet distribution vanish more slowly than those of the Gumbel distribution. Consequently, a Fréchet type distribution should be applied when dealing with scenarios of large extremes.


GENERALIZED PARETO 
DISTRIBUTION 

A distribution often employed to model large values, such as price changes well beyond the typical change, is the generalized Pareto distribution or, as we will often refer to it here, simply Pareto distribution. This distribution serves as the distribution of the so-called "peaks over thresholds," which are values exceeding certain benchmarks or loss severity.

For example, consider n random variables X₁, X₂, ..., Xₙ that are all identically distributed and independent of each other. Slightly idealized, they might represent the returns of some stock on n different observation days. As the number of observations n increases, suppose that their maximum observed return follows the distribution law of a GEV distribution with parameter ξ. Furthermore, let u be some sufficiently large threshold return. Suppose that on day i, the return exceeded this threshold. Then, given the exceedance, the amount by which the return Xᵢ surpassed u, that is, Xᵢ - u, is a generalized Pareto distributed random variable.

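The following density function characterizes the Pareto distribution

$$f(x) = \begin{cases} \dfrac{1}{\beta}\left(1 + \xi \dfrac{x}{\beta}\right)^{-1/\xi - 1}, & x \ge 0 \\ 0, & \text{else} \end{cases}$$

with β > 0 and 1 + (ξx)/β > 0. Hence, the distribution is right skewed since the support is only on the positive real line. The corresponding distribution function is given by

$$F(x) = 1 - \left(1 + \xi \frac{x}{\beta}\right)^{-1/\xi}, \quad x \ge 0$$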


As we can see, the Pareto distribution is characterized by two parameters β and ξ. In brief, the distribution is denoted by Pa(β, ξ). The parameter β serves as a scale parameter while the parameter ξ is responsible for the overall shape, as becomes obvious from the density plots in Figure 3. The distribution function is displayed in Figure 4 for a selection of parameter values.

For ξ < 1, the mean is

$$E(X) = \frac{\beta}{1 - \xi}$$

When ξ becomes very small, approaching zero, the distribution becomes the exponential distribution with parameter λ = 1/β.

Figure 4 Generalized Pareto Distribution Function for Various Parameter Values
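The role of the two Pareto parameters, and the exponential limit as the shape parameter approaches zero, can be sketched as follows. This assumes the usual form F(x) = 1 - (1 + ξx/β)^(-1/ξ); the function names and the sample values are illustrative only.

```python
import math


def gpd_cdf(x: float, beta: float, xi: float) -> float:
    """Distribution function of the generalized Pareto distribution Pa(beta, xi)."""
    if x < 0.0:
        return 0.0
    if xi == 0.0:  # limiting case: exponential with parameter 1 / beta
        return 1.0 - math.exp(-x / beta)
    base = 1.0 + xi * x / beta
    if base <= 0.0:  # beyond the finite right end point, possible only for xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_mean(beta: float, xi: float) -> float:
    """Mean beta / (1 - xi), which exists only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("the mean does not exist for xi >= 1")
    return beta / (1.0 - xi)


# Hypothetical scale beta = 2: a larger xi leaves more mass beyond x = 10.
for xi in (0.0, 0.25, 0.5):
    print(xi, 1.0 - gpd_cdf(10.0, 2.0, xi), gpd_mean(2.0, xi))
```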

The Pareto distribution is commonly used to represent the tails of other distributions. For example, while in neighborhoods about the mean the normal distribution might serve well to model financial returns, for the tails (i.e., the end parts of the density curve) one might be better advised to apply the Pareto distribution. The reason is that the normal distribution may not assign sufficient probability to more pronounced price changes measured in log-returns. On the other hand, if one wishes to model behavior that attributes less probability to extreme values than the normal distribution would suggest, this could be accomplished by the Pareto distribution as well. The class of Pareto distributions is a prime candidate for these tasks because it allows for a great variety of different shapes that one can smoothly obtain by altering the parameter values.


NORMAL INVERSE 
GAUSSIAN DISTRIBUTION 

Another candidate for the modeling of financial returns is the normal inverse Gaussian distribution. It is considered suitable since it assigns a large amount of probability mass to the tails. This reflects the inherent risks in financial returns that are neglected by the normal distribution, which models asset returns as behaving more moderately. But in recent history, we have experienced more extreme shocks than the normal distribution would have suggested with reasonable probability.

Figure 5 Normal Inverse Gaussian Density Function for Various Parameter Values


The distribution is characterized by four parameters, a, b, μ, and δ. In brief, the distribution is denoted by NIG(a, b, μ, δ). For real values x, the density function is given by

$$f(x) = \frac{a\,\delta}{\pi} \exp\left\{ \delta\sqrt{a^2 - b^2} + b(x - \mu) \right\} \frac{K_1\left( a\sqrt{\delta^2 + (x - \mu)^2} \right)}{\sqrt{\delta^2 + (x - \mu)^2}}$$

where K₁ is the so-called modified Bessel function of the third kind. In Figure 5, we display the density function for a selection of parameter values.

The distribution function is, as in the normal distribution case, not analytically presentable. It has to be determined with the help of numerical methods. We display the distribution function for a selection of parameter values in Figure 6.

The parameters have the following interpretation. Parameter a determines the overall shape of the density while b controls skewness. The location or position of the density function is governed via parameter μ, and δ is responsible for scaling. These parameters have values according to the following

a > 0
0 ≤ |b| < a
μ ∈ R
δ > 0


The mean of a NIG random variable is

$$E(X) = \mu + \frac{\delta \cdot b}{\sqrt{a^2 - b^2}}$$

and the variance is

$$Var(X) = \frac{\delta \cdot a^2}{\left(\sqrt{a^2 - b^2}\right)^3}$$

Normal Distribution versus Normal 
Inverse Gaussian Distribution 

Due to a relationship to the normal distribution 
that is beyond the scope here, there are some 
common features between the normal and NIG 
distributions. 

The scaling property of the NIG distribution guarantees that any NIG random variable multiplied by some real constant is again a NIG random variable. Formally, for some k ∈ R and X ~ NIG(a, b, μ, δ), we have that

$$k \cdot X \sim NIG\left(\frac{a}{k}, \frac{b}{k}, k \cdot \mu, k \cdot \delta\right) \qquad (1)$$

Among others, the result in equation (1) implies that the factor k shifts the density function by the k-fold of the original position. Moreover, we can reduce skewness in that we inflate X by some factor k.

Figure 6 Normal Inverse Gaussian Distribution Function for Various Parameter Values

Also, the NIG distribution is summation stable such that, under certain prerequisites concerning the parameters, the sum of independent NIG random variables is again NIG. More precisely, if we have the independent random variables X₁ ~ NIG(a, b, μ₁, δ₁) and X₂ ~ NIG(a, b, μ₂, δ₂), the sum is X₁ + X₂ ~ NIG(a, b, μ₁ + μ₂, δ₁ + δ₂). So, we see that only location and scale are affected by summation.


α-STABLE DISTRIBUTION

The final distribution we introduce is the class of α-stable distributions. (For a further discussion of stable distributions, see Samorodnitsky and Taqqu [2000].) Often, these distributions are simply referred to as stable distributions. While many models in finance have been built historically on the normal distribution because of its pleasant tractability, concerns have been raised that it underestimates the danger of downturns of extreme magnitude inherent in stock markets. The sudden declines of stock prices experienced during several crises since the late 1980s, namely October 19, 1987 ("Black Monday"), July 1997 ("Asian currency crisis"), 1998-1999 ("Russian ruble crisis"), 2001 ("Dot-com bubble"), and July 2007 and following ("Subprime mortgage crisis"), are examples that call for distributional alternatives accounting for extreme price shocks more adequately than the normal distribution. This may be even more necessary considering that financial crashes with serious price movements might become even more frequent in time given the major events that transpired throughout the global financial markets in 2008. The immense threat radiating from heavy tails in stock return distributions made industry professionals aware of the urgency to take them seriously and reflect them in their models.

Figure 7 Comparison of the Normal (Dash-Dotted) and α-Stable (Solid) Density Functions


Many distributional alternatives assigning more realistic probabilities to severe price movements are known, such as the Student's t, for example, or the GEV distributions presented earlier in this entry. In the early 1960s, Benoit Mandelbrot suggested the class of stable distributions as a distribution for commodity price changes. The reason is that, through their particular parameterization, they are capable of modeling moderate scenarios as supported by the normal distribution as well as extreme ones beyond the scope of most of the distributions that we have presented in this entry.

The stable distribution is characterized by the four parameters α, β, σ, and μ. In brief, we denote the α-stable distribution by S(α, β, σ, μ). Parameter α is the so-called tail index or characteristic exponent. It determines how much probability is assigned around the center and the tails of the distribution. The lower the value α, the more pointed about the center is the density and the heavier are the tails. These two features are referred to as excess kurtosis relative to the normal distribution. This can be visualized graphically as we have done in Figure 7 where we compare the normal density to an α-stable density with a low α = 1.5. The parameters for the normal distribution are μ = 0.14 and σ = 4.23. The parameters for the stable distribution are α = 1.5, β = 0, σ = 1, and μ = 0. Note that symbols common to both distributions have different meanings.

The density graphs are obtained from fitting the distributions to the same sample data of arbitrarily generated numbers. The parameter α is related to the parameter ξ of the Pareto distribution, resulting in the tails of the density functions of α-stable random variables vanishing at a rate proportional to the Pareto tail.

The tails of the Pareto as well as the α-stable distribution decay at a rate with fixed power α, x^(-α) (i.e., power law), which is in contrast to the normal distribution whose tails decay at an exponential rate (i.e., roughly e^(-x^2/2)). We illustrate the effect focusing on the probability of exceeding some value x somewhere in the upper tail, say x = 3. Moreover, we choose the parameter of stability to be α = 1.5. Under the normal law, the probability of exceedance is roughly e^(-3^2/2) = 0.011 while under the power law it is about 3^(-1.5) = 0.1925. Next, we let the benchmark x become gradually larger. Then the probability of assuming a value at least twice or four times as large (i.e., 2x or 4x) is roughly

$$e^{-6^2/2} \approx 1.5 \times 10^{-8}$$

or

$$e^{-12^2/2} \approx 0$$

for the normal distribution. In contrast, under the power law, the same exceedance probabilities would be (2 × 3)^(-1.5) = 0.068 or (4 × 3)^(-1.5) = 0.024. This is a much slower rate of decay than under the normal distribution. Note that the value of x = 3 plays no role for the power tails while the exceedance probability of the normal distribution decays faster the further out we are in the tails (i.e., the larger x is). The same reasoning applies to the lower tails considering the probability of falling below a benchmark x rather than exceeding it.
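The comparison of the two decay rates is easy to tabulate. The short sketch below evaluates the rough tail proxies used in the text, e^(-x²/2) for the normal law and x^(-α) for the power law with α = 1.5, at x = 3, 6, and 12; the function names are illustrative.

```python
import math


def normal_tail_proxy(x: float) -> float:
    """Rough exponential decay rate of the normal tail, exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0)


def power_tail_proxy(x: float, alpha: float) -> float:
    """Power-law decay rate x^(-alpha)."""
    return x ** (-alpha)


ALPHA = 1.5
for multiple in (1, 2, 4):
    x = 3.0 * multiple
    print(x, normal_tail_proxy(x), power_tail_proxy(x, ALPHA))
```

Doubling x always multiplies the power-law proxy by the same factor 2^(-1.5) ≈ 0.35, whereas the normal proxy shrinks by an ever larger factor the further out the benchmark lies.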

The parameter β indicates skewness where negative values represent left skewness while positive values indicate right skewness. The scale parameter σ has a similar interpretation as the standard deviation. Finally, the parameter μ indicates location of the distribution. Its interpretability depends on the parameter α. If the latter is between 1 and 2, then μ is equal to the mean.

Possible values of the parameters are listed below:

α ∈ (0, 2]
β ∈ [-1, 1]
σ ∈ (0, ∞)
μ ∈ R

Depending on the parameters α and β, the distribution has either support on the entire real line or only the part extending to the right of some location.

In general, the density function is not explicitly presentable. Instead, the distribution of the α-stable random variable is given by its characteristic function. The characteristic function is given by

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx = \begin{cases} \exp\left\{ -\sigma^{\alpha} |t|^{\alpha} \left[ 1 - i\beta\, \mathrm{sign}(t) \tan\dfrac{\pi\alpha}{2} \right] + i\mu t \right\}, & \alpha \ne 1 \\ \exp\left\{ -\sigma |t| \left[ 1 + i\beta\, \dfrac{2}{\pi}\, \mathrm{sign}(t) \ln|t| \right] + i\mu t \right\}, & \alpha = 1 \end{cases} \qquad (2)$$

where f denotes the density function. The density, then, has to be retrieved by an inverse transform of the characteristic function. Numerical procedures are employed for this task to approximate the necessary computations. The characteristic function (2) is presented here more for the sake of completeness than out of necessity. So, one should not be discouraged if it appears overwhelmingly complex.
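Although the characteristic function looks involved, it is short to code. The sketch below assumes the widely used parameterization of Samorodnitsky and Taqqu (with the term 1 + iβ(2/π)sign(t)ln|t| in the case α = 1); the function name is illustrative. Two familiar special cases serve as checks: α = 2 gives the characteristic function of a normal distribution with variance 2σ², and α = 1 with β = 0 gives that of the Cauchy distribution.

```python
import cmath
import math


def stable_cf(t: float, alpha: float, beta: float, sigma: float, mu: float) -> complex:
    """Characteristic function of S(alpha, beta, sigma, mu) at the point t."""
    if t == 0.0:
        return complex(1.0, 0.0)  # every characteristic function equals 1 at zero
    sign = 1.0 if t > 0.0 else -1.0
    if alpha != 1.0:
        skew = 1.0 - 1j * beta * sign * math.tan(math.pi * alpha / 2.0)
        exponent = -(sigma ** alpha) * (abs(t) ** alpha) * skew + 1j * mu * t
    else:
        skew = 1.0 + 1j * beta * (2.0 / math.pi) * sign * math.log(abs(t))
        exponent = -sigma * abs(t) * skew + 1j * mu * t
    return cmath.exp(exponent)


# alpha = 2: normal with variance 2 * sigma^2, so the value is exp(-sigma^2 * t^2).
print(stable_cf(1.3, 2.0, 0.0, 1.0, 0.0), math.exp(-1.3 ** 2))
# alpha = 1, beta = 0: Cauchy, so the value is exp(-sigma * |t|).
print(stable_cf(1.3, 1.0, 0.0, 1.0, 0.0), math.exp(-1.3))
```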

In Figures 8 and 9, we present the density function for varying parameters β and α, respectively. Note in Figure 9 that for β = 1, the density is positive only on a half-line toward the right as α approaches 2.

Only in the case of an α of 0.5, 1, or 2 can the functional form of the density be stated. For our purpose here, only the case α = 2 is of interest because for this special case, the stable distribution represents the normal distribution. Then, the parameter β ceases to have any meaning since the normal distribution is not asymmetric.

A feature of the stable distributions is that moments such as the mean, for example, exist only up to the power α. (Recall that a moment exists when the according integral of the absolute values is finite.) So, except for the normal case (where α = 2), there exists no finite variance. It becomes even more extreme when α is equal to 1 or less such that not even the mean exists anymore. The nonexistence of the variance is a major drawback when applying stable distributions to financial data. This is one reason why the use of this family of distributions in finance is still contested.

Figure 8 Stable Density Function for Various Values of β

This class of distributions owes its name to the stability property that we saw for the normal distribution (Property 2): The weighted sum of an arbitrary number of α-stable random variables with the same parameters is, again, α-stable distributed. More formally, let X₁, ..., Xₙ be identically distributed and independent of each other. Then, if for any n ∈ N there exist a positive constant aₙ and a real constant bₙ such that the normalized sum Y(n)

$$Y(n) = a_n (X_1 + X_2 + \cdots + X_n) + b_n \sim S(\alpha, \beta, \sigma, \mu) \qquad (3)$$

converges in distribution to a random variable X, then this random variable X must be stable with some parameters α, β, σ, and μ. Again, recall that convergence in distribution means that the distribution function of Y(n) in equation (3) converges to the distribution function on the right-hand side of equation (3).

Figure 9 Stable Density Function (totally right-skewed) for Various Values of α

In the context of financial returns, this means that monthly returns can be treated as the sum of weekly returns and, again, weekly returns themselves can be understood as the sum of daily returns. According to equation (3), they are equally distributed up to rescaling by the parameters aₙ and bₙ.

From the presentation of the normal distribution, we know that it serves as a limit distribution of a sum of identically distributed random variables that are independent and have finite variance. In particular, the sum converges in distribution to the standard normal distribution once the random variables have been summed and transformed appropriately. The prerequisite, however, was that the variance exists. Now, we can drop the requirement for finite variance and only ask for independence and identical distributions to arrive at the generalized central limit theorem expressed by equation (3). The sum of transformed random variables following rather arbitrary laws will have a distribution that follows a stable distribution law as the number n becomes very large. Thus, the class of α-stable distributions provides a greater set of limit distributions than the normal distribution, containing the latter as a special case. Theoretically, this justifies the use of α-stable distributions as the choice for modeling asset returns when we consider the returns to be the resulting sum of many independent shocks.
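The stability property in equation (3) can be made tangible with the one stable law besides the normal that is easy to simulate by hand, the Cauchy distribution (α = 1, β = 0). The sketch below is an illustration under that assumption: for standard Cauchy variables the normalizing constants are aₙ = n^(-1/α) = 1/n and bₙ = 0, and the normalized sum should again be standard Cauchy, whose interquartile range equals 2. Function names, sample sizes, and the seed are arbitrary choices.

```python
import math
import random


def standard_cauchy(rng: random.Random) -> float:
    """Draw from the standard Cauchy law by inverting its distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))


def interquartile_range(values) -> float:
    """Difference between the empirical 75% and 25% quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[(3 * n) // 4] - ordered[n // 4]


def iqr_of_normalized_sums(n_terms: int, n_samples: int = 10_000, seed: int = 1) -> float:
    """Interquartile range of a_n * (X_1 + ... + X_n) with a_n = 1 / n_terms."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_samples):
        total = sum(standard_cauchy(rng) for _ in range(n_terms))
        sums.append(total / n_terms)
    return interquartile_range(sums)


if __name__ == "__main__":
    # Both values should be close to 2, the interquartile range of a standard Cauchy.
    print(iqr_of_normalized_sums(1), iqr_of_normalized_sums(20))
```

Unlike in the finite-variance case, averaging 20 draws does not narrow the distribution at all, which is the practical meaning of a tail index of 1.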


Let us resume the previous example with the random dynamic and the related stock price evolution. Suppose, now, that the 10-day dynamic was S_α distributed. We denote the according random variable by V₁₀. We select a fairly moderate stable parameter of α = 1.8. A value in this vicinity is commonly estimated for daily and even weekly stock returns. The skewness and location parameters are both set to zero, that is, β = μ = 0. The scale is σ = 1, so that if the distribution was normal, that is, α = 2, the variance would be 2 and, hence, consistent with the previous distributions. Note, however, that for α = 1.8, the variance does not exist. Here the probability of the dynamic's exceedance of the lower threshold of 1 is

$$P(V_{10} > 1) = 0.2413 \qquad (4)$$

compared to 0.2398 and 0.1870 in the normal and Student's t cases, respectively. Again, the probability in (4) corresponds to the event that in 10 days, the stock price will be greater than $271. So, it is more likely than in the normal and Student's t model.

For the higher threshold of 3.5, we obtain

$$P(V_{10} > 3.5) = 0.0181$$

compared to 0.0067 and 0.0124 from the normal and Student's t cases, respectively. This event corresponds to a stock price beyond $3,312, which is an immense increase. Under the normal distribution assumption, this event is highly unlikely. It would happen in less than 1% of the 10-day periods. However, under the stable as well as the Student's t assumption, this could happen in 1.81% or 1.24% of the scenarios, which is roughly three times or double the probability under the normal distribution, respectively. Just for comparison, let us assume α = 1.6, which is more common during a rough market climate. The dynamic would now exceed the threshold of 1 with probability

$$P(V_{10} > 1) = 0.2428$$

which is in line with the other distributions. For 3.5, we have

$$P(V_{10} > 3.5) = 0.0315 \qquad (5)$$

which is equal to five times the probability under the normal distribution and almost three times the probability under the Student's t distribution assumption. For this threshold, the same probability as in equation (5) could only be achieved with a variance of σ² = 4, which would give the overall distribution a different shape. In the Student's t case, the degree of freedom parameter would have to be less than 3 such that now the variance would not exist any longer.

For the stable parameters chosen, the same results are obtained when the sign of the returns is negative and losses are considered. For example, \(P(V_{10} < -3.5) = 0.0315\) corresponds to the probability of obtaining a stock price of $3 or less. This scenario would only be given 0.67% probability in a normal distribution model. With respect to large portfolios such as those managed by large banks, negative returns deserve much more attention since losses of great magnitude result in widespread damage to industries beyond the financial industry.

As another example, let's look at what happened to the stock price of American International Group (AIG) in September 2008. On one single day, the stock lost 60% of its value. That corresponds to a return of about −0.94. (Keep in mind that we are analyzing logarithmic returns.) If we choose a normal distribution with \(\mu = 0\) and \(\sigma^2 = 0.0012\) for the daily returns, a return of this magnitude or less has near zero probability. The distributional parameters were chosen to best mimic the behavior of the AIG returns. By comparison, if we take an α-stable distribution with \(\alpha = 1.6\), \(\beta = 0\), \(\mu = 0\), and \(\sigma = 0.001\), where these parameters were selected to fit the AIG returns, we obtain a probability of 0.00003, that is, 0.003%, for a decline of at least this size. So even with this distribution, an event of this impact is almost negligible. As a consequence, we have to choose a lower parameter \(\alpha\) for the stable distribution. That brings to light the immense risk inherent in the return distributions when they are truly α-stable.
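The normal-model figure quoted for the AIG example can be checked with a few lines of code. The following is a minimal sketch that assumes only the parameters stated above (\(\mu = 0\), \(\sigma^2 = 0.0012\), log return of −0.94); the helper name `normal_cdf` is illustrative and not part of the original text.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

# Daily-return model from the text: mu = 0, sigma^2 = 0.0012.
sigma_daily = math.sqrt(0.0012)

# Probability of a one-day log return of -0.94 or less under the normal model.
prob_normal = normal_cdf(-0.94, mu=0.0, sigma=sigma_daily)

# Number of standard deviations the move represents.
z_score = -0.94 / sigma_daily

print(f"z-score: {z_score:.1f}, normal tail probability: {prob_normal:.3e}")
```

The move lies roughly 27 standard deviations below the mean, which is why the normal model assigns it a probability that is zero for all practical purposes.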


KEY POINTS 

• Heavy tails are the general reference term for probability distributions whose probability mass in the tails (i.e., extreme parts of the distribution) is heavier than in the case of a normal distribution. Although there is no unique definition of the feature, there exists a selection of parameters that express whether a distribution is heavy-tailed with respect to the normal distribution. Financial asset returns commonly exhibit heavy tails, which imposes additional risk on asset managers that solely rely on theory based on the normal distribution and other candidates with appealing properties. Hence, it is necessary to account for heavy tails.

• Extreme value theory comprises a collection of distributions dealing with the most extreme values of some set. These distributions concentrate either on the maxima and minima, respectively, or on the most extreme values beyond thresholds. In general, this theory distinguishes among three different kinds of extreme value behavior. Financial risk theory has become intertwined with extreme value theory since it has become common knowledge that it does not suffice to base all analysis on the normal distribution alone.

• Stable distributions form a class of distributions capable of dealing with many stylized facts observed for asset returns. Moreover, the distributions from this class exhibit the property of stability under summation, roughly meaning that a sum of independent random variables following such a probability law is, up to scale and location, again distributed according to the same law. This makes them appealing for the characterization of asset return behavior observed in the real world.

• Skewness is basically a measure of asymmetry of some distribution. While the normal distribution is symmetric about its mean, many other distributions do not share this feature. In fact, when analyzing asset returns, it is often revealed that they are noticeably skewed to one side; that is, they are asymmetric. Consequently, it is important to consider skewness when dealing with asset returns in order to avoid additional risk arising from its neglect.

• The generalized central limit theorem is the extension of the central limit theorem stating that the appropriately scaled sum of certain random variables is eventually standard normally distributed when their number becomes large. However, the criteria for these random variables for the central limit theorem to hold are sometimes unrealistic. The generalized central limit theorem, in contrast, relaxes some of these criteria to include a larger selection of random variables that would fail to sum up to a standard normally distributed random variable. The limiting distributions of these sums are instead members of the class of α-stable distributions. This theorem provides a justification for the use of stable distributions in mathematical finance.

Abstract: In financial models for asset pricing and asset allocation, asset returns and prices are assumed to follow a normal or Gaussian distribution. However, the properties of the normal distribution are not consistent with the observed behavior found for real-world asset returns. More specifically, the symmetric and rapidly decreasing tail properties of the normal distribution cannot describe the skewed and fat-tailed properties of the empirical distribution of asset returns. The alpha-stable distribution or α-stable distribution has been proposed as an alternative to the normal distribution for modeling asset returns because it allows for skewness and fat tails. Research since the turn of the century has introduced alternative distributions such as the tempered stable distributions to better describe asset returns.


In finance, the normal or Gaussian distribution has been the underlying assumption in describing asset returns in major financial theories such as the capital asset pricing theory and option pricing theory. In the early 1960s, Benoit Mandelbrot, a mathematician at IBM's Thomas J. Watson Research Center, presented empirical evidence regarding returns on commodity prices and interest rate movements that strongly rejected the assumption that asset returns are normally distributed (see Mandelbrot, 1963). The mainstream financial models at the time relied on the work of Louis Bachelier, a French mathematician who at the beginning of the 20th century was the first to formulate random walk models for stock prices (see Bachelier, 1900). Bachelier's work assumed that relative price changes followed a normal distribution. Mandelbrot's findings led a leading financial economist, Paul Cootner of MIT, to warn the academic community that Mandelbrot's finding may mean that "past econometric work is meaningless" (see Cootner, 1964).

In Mandelbrot's attack on the normal distribution, he suggested that asset returns are more appropriately described by a non-normal stable distribution referred to as a stable Paretian distribution or alpha-stable distribution (α-stable distribution), so named because the tails of this distribution have Pareto power-type decay. The reason for describing this distribution as "non-normal stable" is that the normal distribution is a special case of the stable distribution. Because of the work by Paul Lévy, a French mathematician who introduced and characterized the non-normal stable distribution, this distribution is also referred to as the Lévy stable distribution and the Pareto-Lévy stable distribution.

There are two other facts about asset return distributions that have been supported by empirical evidence. First, distributions have been observed to be skewed or nonsymmetric. That is, unlike in the case of the normal distribution where there is a mirror imaging of the two sides of the probability distribution, typically in a skewed distribution one tail of the distribution is much longer (i.e., has greater probability of extreme values occurring) than the other tail of the probability distribution. Probability distributions in which extreme values occur with greater probability than under the normal distribution are referred to as having fat tails or heavy tails. The second finding is the tendency of large changes in asset prices (either positive or negative) to be followed by large changes, and small changes to be followed by small changes. This attribute of asset return distributions is referred to as volatility clustering. In contrast to the normal distribution, the α-stable distribution allows for skewness and fat tails.

While the α-stable distribution has certain desirable properties that will be discussed in more detail in this entry, it is not suitable in certain modeling applications such as the modeling of option prices. In order to obtain a well-defined model for pricing options, the mean, variance, and exponential moments of the return distribution have to exist. For this reason, the smoothly truncated stable distribution and various types of tempered stable distributions have been proposed for financial modeling. Those distributions are obtained by tempering the tail properties of the α-stable distribution. Because they converge weakly to the α-stable distribution, the α-stable distribution is embedded in the class of the tempered stable distributions.

In this entry, we discuss the α-stable and tempered stable distributions. The more general distribution, named the infinitely divisible distribution, will be discussed as well. The distributions in this entry are defined by their characteristic functions. The density functions are not given by a closed-form formula in general but are obtained by a numerical method discussed in Rachev et al. (2011).

α-STABLE DISTRIBUTION

In this section, we discuss a wide class of α-stable distributions. We review the definition and the basic properties of the α-stable distribution. We further present the class of smoothly truncated stable distributions, which has been proposed by Menn and Rachev (2009) for dealing with the drawbacks of the α-stable distribution.

Definition of an α-Stable Random Variable

We begin with a definition of an α-stable random variable.¹ Suppose that \(X_1, X_2, \ldots, X_n\) are independent and identically distributed (IID) random variables, independent copies of \(X\). Then a random variable \(X\) is said to follow an α-stable distribution if there exist a positive constant \(C_n\) and a real number \(D_n\) such that the following relation holds:

\[ X_1 + X_2 + \cdots + X_n \stackrel{d}{=} C_n X + D_n \]

The notation \(\stackrel{d}{=}\) denotes equality in distribution. The constant \(C_n = n^{1/\alpha}\) dictates the stability property, which we will discuss later. When \(\alpha = 2\), we have the Gaussian (normal) case. In subsequent discussions of the α-stable distributions in this entry, we restrict ourselves to the non-Gaussian case in which \(0 < \alpha < 2\).

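For the general case, the density of the α-stable distribution does not have a closed-form solution. The distribution is expressed by its characteristic function:

\[
\phi_{\text{stable}}(u; \alpha, \sigma, \beta, \mu) = E[e^{iuX}] =
\begin{cases}
\exp\left(i\mu u - |\sigma u|^{\alpha}\left(1 - i\beta\,(\operatorname{sign} u)\tan\frac{\pi\alpha}{2}\right)\right), & \alpha \neq 1 \\[4pt]
\exp\left(i\mu u - \sigma|u|\left(1 + i\beta\,\frac{2}{\pi}(\operatorname{sign} u)\ln|u|\right)\right), & \alpha = 1
\end{cases}
\tag{1}
\]

where

\[
\operatorname{sign} t =
\begin{cases}
1, & t > 0 \\
0, & t = 0 \\
-1, & t < 0
\end{cases}
\]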
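Equation (1) translates directly into code. The sketch below is one possible transcription, assuming the standard parametrization in which the \(\alpha = 1\) branch carries the factor \(1 + i\beta(2/\pi)(\operatorname{sign} u)\ln|u|\); the function names `sign` and `stable_cf` are illustrative.

```python
import cmath
import math

def sign(t):
    """Sign function as defined below equation (1): 1, 0, or -1."""
    return (t > 0) - (t < 0)

def stable_cf(u, alpha, sigma, beta, mu):
    """Characteristic function of S_alpha(sigma, beta, mu) evaluated at real u."""
    if u == 0:
        return 1 + 0j  # every characteristic function equals 1 at the origin
    if alpha != 1:
        skew = 1 - 1j * beta * sign(u) * math.tan(math.pi * alpha / 2)
        return cmath.exp(1j * mu * u - abs(sigma * u) ** alpha * skew)
    skew = 1 + 1j * beta * (2 / math.pi) * sign(u) * math.log(abs(u))
    return cmath.exp(1j * mu * u - sigma * abs(u) * skew)

# With alpha = 2 and beta = 0 the function reduces to exp(-sigma^2 u^2),
# the characteristic function of a normal law with variance 2 sigma^2.
print(stable_cf(0.7, 2.0, 1.0, 0.0, 0.0), math.exp(-0.49))
```

Because the density has no closed form in general, a routine of this kind is usually the starting point for numerical inversion to obtain densities and tail probabilities.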


The distribution is characterized by four parameters:

• \(\alpha\): the index of stability or the shape parameter, \(\alpha \in (0, 2)\).

• \(\beta\): the skewness parameter, \(\beta \in [-1, +1]\).

• \(\sigma\): the scale parameter, \(\sigma \in (0, +\infty)\).

• \(\mu\): the location parameter, \(\mu \in (-\infty, +\infty)\).

When a random variable \(X\) follows the α-stable distribution characterized by those parameters, then we denote it by \(X \sim S_\alpha(\sigma, \beta, \mu)\).

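The three special cases where there is a closed-form solution for the densities are (1) the Gaussian case (\(\alpha = 2\)), (2) the Cauchy case (\(\alpha = 1\), \(\beta = 0\)), and (3) the Lévy case (\(\alpha = 1/2\), \(\beta = \pm 1\)) with the following respective densities:

• Gaussian: \( f(x) = \dfrac{1}{\sqrt{4\pi\sigma^2}} \exp\left(-\dfrac{(x-\mu)^2}{4\sigma^2}\right), \quad -\infty < x < \infty \)

• Cauchy: \( f(x) = \dfrac{\sigma}{\pi\left((x-\mu)^2 + \sigma^2\right)}, \quad -\infty < x < \infty \)

• Lévy: \( f(x) = \sqrt{\dfrac{\sigma}{2\pi}}\, \dfrac{\exp\left(-\dfrac{\sigma}{2(x-\mu)}\right)}{(x-\mu)^{3/2}}, \quad \mu < x < \infty \)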

Because of the four parameters, the α-stable distribution is highly flexible and suitable for modeling nonsymmetric, highly kurtotic, and heavy-tailed data. Figures 1 and 2 illustrate the effects of the shape and skewness parameters, respectively, on the shape of the distribution, with other parameters kept constant. As is evident from Figure 1, a lower value for \(\alpha\) is associated with heavier tails and higher kurtosis.



Figure 1 Illustration of α-Stable Densities for Varying \(\alpha\)'s, with \(\beta = 0\), \(\sigma = 1\), and \(\mu = 0\)

Figure 2 Illustration of α-Stable Densities for Varying \(\beta\)'s, with \(\alpha = 1.25\), \(\sigma = 0.5\), and \(\mu = 0\)

Useful Properties of an α-Stable Random Variable

The four basic properties of the α-stable distribution are as follows:

• Property 1. The power tail decay property means that the tail of the density function decays like a power function (slower than the exponential decay), which is what allows the distribution to capture extreme events in the tails:

\[ P(|X| > x) \propto C \cdot x^{-\alpha}, \quad x \to \infty \]

for some constant \(C\). More precisely, if \(X \sim S_\alpha(\sigma, \beta, \mu)\) with \(0 < \alpha < 2\), then

\[
\lim_{x\to\infty} x^{\alpha} P(X > x) = C_\alpha \frac{1+\beta}{2}\sigma^{\alpha}, \qquad
\lim_{x\to\infty} x^{\alpha} P(X < -x) = C_\alpha \frac{1-\beta}{2}\sigma^{\alpha}
\]

where

\[
C_\alpha =
\begin{cases}
\dfrac{1-\alpha}{\Gamma(2-\alpha)\cos(\pi\alpha/2)}, & \alpha \neq 1 \\[6pt]
\dfrac{2}{\pi}, & \alpha = 1
\end{cases}
\]

• Property 2. Raw moments satisfy the property:

\[ E|X|^p < \infty \ \text{for any } 0 < p < \alpha, \qquad E|X|^p = \infty \ \text{for any } p \geq \alpha \]

• Property 3. Because of Property 2, the mean is finite only for \(\alpha > 1\):

\[ E[X] = \mu \ \text{for } \alpha > 1, \qquad E[X] = \infty \ \text{for } 0 < \alpha \leq 1 \]

The second and higher moments are infinite, leading to infinite variance together with the skewness and kurtosis coefficients.

• Property 4. The stability property is a useful and convenient property and dictates that the distributional form of the variable is preserved under linear transformations. The stability property is governed by the stability parameter \(\alpha\) in the constant \(C_n\) (which appeared earlier in the definition of an α-stable random variable): \(C_n = n^{1/\alpha}\). As was stated earlier, smaller values of \(\alpha\) correspond to a heavier-tailed distribution. The standard central limit theorem does not apply to the non-Gaussian case: An appropriately standardized large sum of IID random variables converges to an α-stable random variable instead of a normal random variable.
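Property 1 can be checked against the one case where the tail is known in closed form. The sketch below evaluates the tail constant \(C_\alpha\) and compares the power-law approximation with the exact tail of the standard Cauchy distribution (\(\alpha = 1\), \(\beta = 0\), \(\sigma = 1\)), for which \(P(X > x) = 1/2 - \arctan(x)/\pi\). The function names are illustrative assumptions.

```python
import math

def tail_constant(alpha):
    """Tail constant C_alpha from Property 1."""
    if alpha == 1:
        return 2 / math.pi
    return (1 - alpha) / (math.gamma(2 - alpha) * math.cos(math.pi * alpha / 2))

def right_tail_approx(x, alpha, sigma, beta):
    """Asymptotic approximation of P(X > x) for X ~ S_alpha(sigma, beta, mu), large x."""
    return tail_constant(alpha) * (1 + beta) / 2 * sigma ** alpha * x ** (-alpha)

x = 1000.0
exact_cauchy_tail = 0.5 - math.atan(x) / math.pi   # exact tail of the standard Cauchy
approx_cauchy_tail = right_tail_approx(x, alpha=1, sigma=1.0, beta=0.0)
print(exact_cauchy_tail, approx_cauchy_tail)
```

For \(\alpha < 2\) the approximation decays like \(x^{-\alpha}\), so halving \(\alpha\) makes far-out events dramatically more likely; this is the mechanism behind the tail probabilities compared earlier.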

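The following examples illustrate the stability property. Suppose that \(X_1, X_2, \ldots, X_n\) are independent random variables with \(X_i \sim S_\alpha(\sigma_i, \beta_i, \mu_i)\), \(i = 1, 2, \ldots, n\), and a fixed \(\alpha\). Then:

• The distribution of \(Y = \sum_{i=1}^{n} X_i\) is α-stable with the index of stability \(\alpha\) and parameters:

\[
\beta = \frac{\sum_{i=1}^{n} \beta_i \sigma_i^{\alpha}}{\sum_{i=1}^{n} \sigma_i^{\alpha}}, \qquad
\sigma = \left(\sum_{i=1}^{n} \sigma_i^{\alpha}\right)^{1/\alpha}, \qquad
\mu = \sum_{i=1}^{n} \mu_i
\]

• The distribution of \(Y = X_1 + a\) for some real constant \(a\) is α-stable with the index of stability \(\alpha\) and parameters:

\[ \beta = \beta_1, \qquad \sigma = \sigma_1, \qquad \mu = \mu_1 + a \]

• The distribution of \(Y = aX_1\) for some real constant \(a\) (\(a \neq 0\)) is α-stable with the index of stability \(\alpha\) and parameters:

\[
\beta = (\operatorname{sign} a)\beta_1, \qquad
\sigma = |a|\sigma_1, \qquad
\mu =
\begin{cases}
a\mu_1, & \alpha \neq 1 \\[4pt]
a\mu_1 - \dfrac{2}{\pi}\, a (\ln|a|)\, \sigma_1 \beta_1, & \alpha = 1
\end{cases}
\]

• The distribution of \(Y = -X_1\) is α-stable with the index of stability \(\alpha\) and parameters:

\[ \beta = -\beta_1, \qquad \sigma = \sigma_1, \qquad \mu = -\mu_1 \]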

Smoothly Truncated Stable 
Distribution 

In some special cases of financial modeling, the infinite variance of stable distributions makes their application impossible. In many cases, the infinite variance of the return might lead to an infinite price for derivative instruments such as options, clearly contradicting reality and intuition. The modeler is confronted with a dilemma. On the one hand, the skewed and heavy-tailed return distribution disqualifies the normal distribution as a suitable candidate; on the other hand, theoretical restrictions in option pricing do not allow the application of the stable distribution due to its infinite moments of order higher than \(\alpha\). For this reason, Menn and Rachev (2009) have suggested the use of appropriately truncated stable distributions.

The exact definition of truncated stable distributions is not that important at this point; that is why we restrict ourselves to a brief description of the idea. The density function of a smoothly truncated stable distribution (STS distribution) is obtained by replacing the heavy tails of the density function \(g\) of some stable distribution with parameters \((\alpha, \beta, \sigma, \mu)\) by the thin tails of two appropriately chosen normal distributions \(h_1\) and \(h_2\):

\[
f(x) =
\begin{cases}
h_1(x), & x < a \\
g(x), & a \leq x \leq b \\
h_2(x), & x > b
\end{cases}
\]

The parameters of the normal distributions are chosen such that the resulting function is the continuous density function of a probability measure on the real line. If it is possible to choose the cutting points \(a\) and \(b\) in a way that the resulting distribution possesses zero mean and unit variance, then we have found an easy way to characterize standardized STS distributions. In Figure 3, the influence of the stable parameters on the appropriate cutting points is examined. As \(\alpha\) approaches 2 (i.e., when the stable distribution approaches the normal distribution), we observe that the cutting points move to infinity. For small values of \(\alpha\), in contrast, the interval \([a, b]\) shrinks, reflecting the increasing heaviness of the tails of the stable distribution.

Due to the thin tails of the normal density functions, the STS distributions admit finite moments of arbitrary order but nevertheless are able to explain extreme observations. Table 1 provides a comparison of tail probabilities for an arbitrarily chosen STS distribution with zero mean and unit variance and the standard normal distribution. As can be seen from the table, the probability of extreme events is much higher under the assumption of an STS distribution. STS distributions allow for skewness in the returns. Moreover, the tails behave like fat tails but are light tails in the mathematical sense. Hence, all moments of arbitrary order exist and are finite. For this reason, advocates of the class of STS distributions argue that it is an appropriate class for modeling the return distribution of various financial assets.

Figure 3 Influence of the Stable Parameters on the Cutting Points a and b

Table 1 Comparison of Tail Probabilities for a Standard Normal and a Standardized STS Distribution

  x      P(X_1 < x) with X_1 ~ N(0, 1)    P(X_2 < x) with X_2 ~ STS
  -1     15.866%                          11.794%
  -2     2.275%                           2.014%
  -3     0.135%                           0.670%
  -4     0.003%                           0.356%
  -5     ≈ 10^-5 %                        0.210%
  -6     ≈ 10^-8 %                        0.120%
  -7     ≈ 10^-10 %                       0.067%
  -8     ≈ 10^-14 %                       0.036%
  -9     ≈ 10^-17 %                       0.019%
  -10    ≈ 10^-22 %                       0.010%
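The standard normal column of Table 1 can be reproduced from the error function. The sketch below covers only that column; the STS column depends on the particular truncated distribution chosen by the authors and is not recomputed here. The helper name `std_normal_cdf` is illustrative.

```python
import math

def std_normal_cdf(x):
    """P(X <= x) for a standard normal X, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# Left-tail probabilities, in percent, at the thresholds used in Table 1.
normal_tail_percent = {x: 100.0 * std_normal_cdf(x) for x in range(-1, -11, -1)}

for x, pct in normal_tail_percent.items():
    print(f"{x:4d}  {pct:.3e}%")
```

Using `erfc` rather than `1 - erf` keeps the far-tail values accurate; the naive form would round to zero well before the threshold of −10.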

TEMPERED STABLE 
DISTRIBUTIONS 

In this section, we discuss six types of tempered 
stable distributions. 


Classical Tempered Stable 
Distribution 

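Let \(\alpha \in (0,1) \cup (1,2)\), \(C, \lambda_+, \lambda_- > 0\), and \(m \in \mathbb{R}\). \(X\) is said to follow the classical tempered stable (CTS) distribution if the characteristic function of \(X\) is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{CTS}(u; \alpha, C, \lambda_+, \lambda_-, m) \\
&= \exp\Big(ium - iuC\Gamma(1-\alpha)\left(\lambda_+^{\alpha-1} - \lambda_-^{\alpha-1}\right) \\
&\qquad\quad + C\Gamma(-\alpha)\left((\lambda_+ - iu)^{\alpha} - \lambda_+^{\alpha} + (\lambda_- + iu)^{\alpha} - \lambda_-^{\alpha}\right)\Big)
\end{aligned}
\tag{2}
\]

and we denote it by \(X \sim CTS(\alpha, C, \lambda_+, \lambda_-, m)\).

Using the nth derivative of \(\psi(u) = \log \phi_X(u)\) evaluated around zero, the cumulants \(c_n(X) = \frac{1}{i^n}\frac{\partial^n}{\partial u^n}\psi(0)\) of \(X\) are obtained by

\[
\begin{aligned}
c_1(X) &= m \\
c_n(X) &= C\Gamma(n-\alpha)\left(\lambda_+^{\alpha-n} + (-1)^n \lambda_-^{\alpha-n}\right), \quad n = 2, 3, \ldots
\end{aligned}
\]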

The role of the parameters is as follows:

• The parameter \(m\) determines the location of the distribution.

• The parameter \(C\) is the scale parameter. Figure 4 shows the density function of the CTS distributions' dependence on \(C\).

• The parameters \(\lambda_+\) and \(\lambda_-\) control the rate of decay on the positive and negative tails, respectively. If \(\lambda_+ > \lambda_-\) (\(\lambda_+ < \lambda_-\)), then the distribution is skewed to the left (right), and if \(\lambda_+ = \lambda_-\), then it is symmetric. Figure 5 illustrates left and right skewed density functions of the CTS distribution, as well as the symmetric case.

• The parameters \(\lambda_+\), \(\lambda_-\), and \(\alpha\) are related to tail weights. Figures 6 and 7 illustrate this fact. We will discuss another role of \(\alpha\) later.

• If \(\alpha\) approaches 0, the CTS distribution converges in distribution to the variance-gamma distribution (discussed later in this entry).

Figure 4 Probability Density of the CTS Distributions: Dependence on \(C\)
Note: \(C \in \{0.25, 0.5, 1, 2\}\), \(\alpha = 1.4\), \(\lambda_+ = 50\), \(\lambda_- = 50\), \(m = 0\).

Figure 5 Probability Density of the CTS Distributions: Dependence on \(\lambda_+\) and \(\lambda_-\)
Note: \((\lambda_+, \lambda_-) \in \{(1, 70), (3, 3), (70, 1)\}\), \(\alpha = 0.8\), \(C = 1\), \(m = 0\).

Figure 6 Probability Density of the Symmetric CTS Distributions: Dependence on Parameters \(\lambda_+\), \(\lambda_-\)
Note: \(\lambda_+ = \lambda_- \in \{10, 20, 30, 40\}\), \(\alpha = 1.1\), \(C = 1\), \(m = 0\).


If we take a special parameter \(C\) defined by

\[ C = \left(\Gamma(2-\alpha)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)\right)^{-1} \tag{3} \]

then \(X \sim CTS(\alpha, C, \lambda_+, \lambda_-, 0)\) has zero mean and unit variance. In this case, \(X\) is called the standard CTS distribution with parameters \((\alpha, \lambda_+, \lambda_-)\) and denoted by \(X \sim stdCTS(\alpha, \lambda_+, \lambda_-)\). Let \(m\) be a real number, \(\sigma\) be a positive real number, and \(X \sim stdCTS(\alpha, \lambda_+, \lambda_-)\). Then

\[
Y = \sigma X + m \sim CTS\left(\alpha,\ \frac{\sigma^{\alpha}}{\Gamma(2-\alpha)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)},\ \frac{\lambda_+}{\sigma},\ \frac{\lambda_-}{\sigma},\ m\right)
\]

The random variable \(Y\) is CTS distributed, and its mean and variance are \(m\) and \(\sigma^2\), respectively.

Figure 7 Probability Density of the CTS Distributions: Dependence on \(\alpha\)
Note: \(\alpha \in \{0.5, 0.8, 1.1, 1.4\}\), \(C = 1\), \(\lambda_+ = 50\), \(\lambda_- = 50\), \(m = 0\).
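The standardization in equation (3) can be confirmed numerically from the characteristic function in equation (2): the mean and variance are the first two cumulants, which can be read off finite differences of \(\psi(u) = \log \phi_X(u)\) at zero. The sketch below is an illustration under that approach; the function names and the example parameter values are assumptions, not part of the original text.

```python
import math

def cts_log_cf(u, alpha, C, lam_p, lam_m, m):
    """Logarithm of the CTS characteristic function, equation (2)."""
    drift = 1j * u * m - 1j * u * C * math.gamma(1 - alpha) * (
        lam_p ** (alpha - 1) - lam_m ** (alpha - 1))
    jumps = C * math.gamma(-alpha) * (
        (lam_p - 1j * u) ** alpha - lam_p ** alpha
        + (lam_m + 1j * u) ** alpha - lam_m ** alpha)
    return drift + jumps

def std_cts_scale(alpha, lam_p, lam_m):
    """Constant C from equation (3), giving unit variance."""
    return 1.0 / (math.gamma(2 - alpha) * (lam_p ** (alpha - 2) + lam_m ** (alpha - 2)))

def mean_and_variance(log_cf, h=1e-3):
    """First two cumulants from central differences of psi(u) at u = 0."""
    up, down, mid = log_cf(h), log_cf(-h), log_cf(0.0)
    mean = ((up - down) / (2j * h)).real
    variance = (-(up - 2 * mid + down) / h ** 2).real
    return mean, variance

alpha, lam_p, lam_m, m = 1.4, 2.0, 1.5, 0.25        # hypothetical example values
C = std_cts_scale(alpha, lam_p, lam_m)
cts_mean, cts_variance = mean_and_variance(
    lambda u: cts_log_cf(u, alpha, C, lam_p, lam_m, m))
print(cts_mean, cts_variance)
```

With \(C\) taken from equation (3), the numerical variance is 1 up to discretization error, and the mean equals the location parameter \(m\).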


Generalized Classical Tempered 
Stable Distribution 

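A more general form of the characteristic function for the CTS distribution is

\[
\begin{aligned}
\phi_X(u) = \exp\Big(&ium - iu\left(C_+\Gamma(1-\alpha_+)\lambda_+^{\alpha_+-1} - C_-\Gamma(1-\alpha_-)\lambda_-^{\alpha_--1}\right) \\
&+ C_+\Gamma(-\alpha_+)\left((\lambda_+ - iu)^{\alpha_+} - \lambda_+^{\alpha_+}\right) \\
&+ C_-\Gamma(-\alpha_-)\left((\lambda_- + iu)^{\alpha_-} - \lambda_-^{\alpha_-}\right)\Big)
\end{aligned}
\tag{4}
\]

where \(\alpha_+, \alpha_- \in (0,1) \cup (1,2)\), \(C_+, C_-, \lambda_+, \lambda_- > 0\), and \(m \in \mathbb{R}\). This distribution has been referred to as the generalized classical tempered stable (GTS) distribution and we denote it by \(X \sim GTS(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, m)\).²

The cumulants of \(X\) are \(c_1(X) = m\) and

\[ c_n(X) = C_+\Gamma(n-\alpha_+)\lambda_+^{\alpha_+-n} + (-1)^n C_-\Gamma(n-\alpha_-)\lambda_-^{\alpha_--n} \]

for \(n = 2, 3, \ldots\). If we substitute

\[
C_+ = \frac{p\,\lambda_+^{2-\alpha_+}}{\Gamma(2-\alpha_+)}, \qquad
C_- = \frac{(1-p)\,\lambda_-^{2-\alpha_-}}{\Gamma(2-\alpha_-)}
\tag{5}
\]

where \(p \in (0, 1)\), then \(X \sim GTS(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, 0)\) has zero mean and unit variance. In this case, \(X\) is called the standard GTS distribution with parameters \((\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)\) and denoted by \(X \sim stdGTS(\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)\).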
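The role of \(p\) in equation (5) is easiest to see from the cumulants: it splits the unit variance between the positive and negative tails. The sketch below evaluates the GTS cumulant formula with the standardizing constants; the function names and example parameter values are illustrative assumptions.

```python
import math

def std_gts_constants(alpha_p, alpha_m, lam_p, lam_m, p):
    """Constants (C_plus, C_minus) from equation (5)."""
    c_plus = p * lam_p ** (2 - alpha_p) / math.gamma(2 - alpha_p)
    c_minus = (1 - p) * lam_m ** (2 - alpha_m) / math.gamma(2 - alpha_m)
    return c_plus, c_minus

def gts_cumulant(n, alpha_p, alpha_m, c_plus, c_minus, lam_p, lam_m):
    """n-th cumulant (n >= 2) of a GTS random variable."""
    positive_part = c_plus * math.gamma(n - alpha_p) * lam_p ** (alpha_p - n)
    negative_part = c_minus * math.gamma(n - alpha_m) * lam_m ** (alpha_m - n)
    return positive_part + (-1) ** n * negative_part

alpha_p, alpha_m, lam_p, lam_m, p = 1.3, 0.7, 4.0, 2.5, 0.4   # hypothetical values
c_plus, c_minus = std_gts_constants(alpha_p, alpha_m, lam_p, lam_m, p)

gts_variance = gts_cumulant(2, alpha_p, alpha_m, c_plus, c_minus, lam_p, lam_m)
# Variance contributed by the positive tail alone; equals p by construction.
positive_tail_share = c_plus * math.gamma(2 - alpha_p) * lam_p ** (alpha_p - 2)
print(gts_variance, positive_tail_share)
```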


Modified Tempered Stable 
Distribution 

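Let \(\alpha \in (0,1) \cup (1,2)\), \(C, \lambda_+, \lambda_- > 0\), and \(m \in \mathbb{R}\). \(X\) is said to follow the modified tempered stable (MTS) distribution (see Kim et al., 2009) if the characteristic function of \(X\) is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{MTS}(u; \alpha, C, \lambda_+, \lambda_-, m) \\
&= \exp\Big(ium + C\left(G_R(u; \alpha, \lambda_+) + G_R(u; \alpha, \lambda_-)\right) \\
&\qquad\quad + iuC\left(G_I(u; \alpha, \lambda_+) - G_I(u; \alpha, \lambda_-)\right)\Big)
\end{aligned}
\tag{6}
\]

where for \(u \in \mathbb{R}\),

\[ G_R(x; \alpha, \lambda) = 2^{-\frac{\alpha+3}{2}} \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\left((\lambda^2 + x^2)^{\frac{\alpha}{2}} - \lambda^{\alpha}\right) \]

and

\[ G_I(x; \alpha, \lambda) = 2^{-\frac{\alpha+1}{2}}\, \Gamma\left(\frac{1-\alpha}{2}\right) \lambda^{\alpha-1} \left[ {}_2F_1\left(1, \frac{1-\alpha}{2}; \frac{3}{2}; -\frac{x^2}{\lambda^2}\right) - 1 \right] \]

where \({}_2F_1\) is the hypergeometric function. We denote an MTS distributed random variable \(X\) by \(X \sim MTS(\alpha, C, \lambda_+, \lambda_-, m)\).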

The role of the parameters of the MTS distribution is the same as in the case of the CTS distribution. For example, the parameters \(\lambda_+\) and \(\lambda_-\) control the rate of decay on the positive and negative tails, respectively, and if \(\lambda_+ = \lambda_-\), then it is symmetric. The characteristic function of the symmetric MTS distribution is defined not only for the case \(\alpha \in (0,1) \cup (1,2)\) but also for the case \(\alpha = 1\). The form of the characteristic function for the symmetric case is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{MTS}(u; \alpha, C, \lambda, \lambda, m) \\
&= \exp\left(ium + C\, 2^{-\frac{\alpha+1}{2}} \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\left((\lambda^2 + u^2)^{\frac{\alpha}{2}} - \lambda^{\alpha}\right)\right)
\end{aligned}
\]

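The mean of \(X\) is \(m\), and the cumulants of \(X\) are equal to

\[ c_n(X) = 2^{\,n-\frac{\alpha+3}{2}}\, C\, \Gamma\left(\frac{n+1}{2}\right) \Gamma\left(\frac{n-\alpha}{2}\right) \left(\lambda_+^{\alpha-n} + (-1)^n \lambda_-^{\alpha-n}\right) \]

for \(n = 2, 3, \ldots\).

If we substitute

\[ C = 2^{\frac{\alpha+1}{2}} \left(\sqrt{\pi}\, \Gamma\left(1-\frac{\alpha}{2}\right)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)\right)^{-1} \tag{7} \]

then \(X \sim MTS(\alpha, C, \lambda_+, \lambda_-, 0)\) has zero mean and unit variance. In this case, the random variable \(X\) is called the standard MTS distribution and denoted by \(X \sim stdMTS(\alpha, \lambda_+, \lambda_-)\). Let \(m\) be a real number, \(\sigma\) be a positive real number, and \(X \sim stdMTS(\alpha, \lambda_+, \lambda_-)\). Then

\[ Y = \sigma X + m \sim MTS(\alpha, \sigma^{\alpha} C, \lambda_+/\sigma, \lambda_-/\sigma, m) \]

where \(C\) is equal to (7). The random variable \(Y\) is MTS distributed, and its mean and variance are \(m\) and \(\sigma^2\), respectively.

Figure 8 Probability Density of the NTS Distributions: Dependence on \(\beta\)
Note: \(\beta \in \{-2.5, 0, 2.5\}\), \(\alpha = 0.8\), \(C = 1\), \(\lambda = 4\), \(m = 0\).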

Normal Tempered Stable 
Distribution 

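Let \(\alpha \in (0, 2)\), \(C, \lambda > 0\), \(|\beta| < \lambda\), and \(m \in \mathbb{R}\). \(X\) is said to follow the normal tempered stable (NTS) distribution³ if the characteristic function of \(X\) is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{NTS}(u; \alpha, C, \lambda, \beta, m) \\
&= \exp\Big(ium - iu\, 2^{-\frac{\alpha-1}{2}} \sqrt{\pi}\, C\, \Gamma\left(1-\frac{\alpha}{2}\right) \beta\, (\lambda^2 - \beta^2)^{\frac{\alpha}{2}-1} \\
&\qquad\quad + 2^{-\frac{\alpha+1}{2}} C \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right) \left(\left(\lambda^2 - (\beta + iu)^2\right)^{\frac{\alpha}{2}} - (\lambda^2 - \beta^2)^{\frac{\alpha}{2}}\right)\Big)
\end{aligned}
\tag{8}
\]

We denote an NTS distributed random variable \(X\) by \(X \sim NTS(\alpha, C, \lambda, \beta, m)\).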

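The mean of \(X\) is \(m\). The general expressions for cumulants of \(X\) are omitted since they are rather complicated. Instead of the general form, we present three cumulants:

\[
\begin{aligned}
c_2(X) &= \bar{C}\, \alpha\, (\lambda^2 - \beta^2)^{\frac{\alpha}{2}-2} \left(\alpha\beta^2 - \lambda^2 - \beta^2\right) \\
c_3(X) &= -\bar{C}\, \alpha \beta\, (\lambda^2 - \beta^2)^{\frac{\alpha}{2}-3} \left(\alpha^2\beta^2 - 3\alpha\lambda^2 - 3\alpha\beta^2 + 6\lambda^2 + 2\beta^2\right) \\
c_4(X) &= \bar{C}\, \alpha(\alpha-2)\, (\lambda^2 - \beta^2)^{\frac{\alpha}{2}-4} \left(\alpha^2\beta^4 - 6\alpha\lambda^2\beta^2 - 4\alpha\beta^4 + 3\beta^4 + 18\lambda^2\beta^2 + 3\lambda^4\right)
\end{aligned}
\]

where \(\bar{C} = 2^{-\frac{\alpha+1}{2}} C \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\).

The roles of parameters \(\alpha\), \(C\), and \(\lambda\) are the same as in the case of the symmetric MTS distribution. The parameter \(\beta\) is related to the distribution's skewness. If \(\beta < 0\) (\(\beta > 0\)), then the distribution is skewed to the left (right). Moreover, if \(\beta = 0\), then it is symmetric. This fact is illustrated in Figure 8.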

If we substitute

\[ C = 2^{\frac{\alpha+1}{2}} \left(\sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right) \alpha\, (\lambda^2 - \beta^2)^{\frac{\alpha}{2}-2} \left(\alpha\beta^2 - \lambda^2 - \beta^2\right)\right)^{-1} \tag{9} \]

then \(X \sim NTS(\alpha, C, \lambda, \beta, 0)\) has zero mean and unit variance. In this case, \(X\) is called the standard NTS distribution and denoted by \(X \sim stdNTS(\alpha, \lambda, \beta)\). Let \(m\) be a real number, \(\sigma\) be a positive real number, and \(X \sim stdNTS(\alpha, \lambda, \beta)\). Then

\[ Y = \sigma X + m \sim NTS(\alpha, \sigma^{\alpha} C, \lambda/\sigma, \beta/\sigma, m) \]

where \(C\) is equal to (9). The random variable \(Y\) is NTS distributed, and its mean and variance are \(m\) and \(\sigma^2\), respectively.

If we substitute \(\alpha = 1\) and \(C = c/\pi\) into the definition of the NTS distribution, we obtain the normal inverse Gaussian (NIG) distribution.⁴ That is, if random variable \(X \sim NTS(1, c/\pi, \lambda, \beta, m)\), then \(X\) becomes an NIG distributed random variable. In this case, we denote \(X \sim NIG(c, \lambda, \beta, m)\).

By substituting \(\alpha = 1\) and \(C = c/\pi\) into (8), we obtain the characteristic function of the NIG distributed \(X\) as

\[
\begin{aligned}
\phi_X(u) &= \phi_{NIG}(u; c, \lambda, \beta, m) \\
&= \exp\left(ium - \frac{iuc\beta}{\sqrt{\lambda^2 - \beta^2}} - c\left(\sqrt{\lambda^2 - (\beta + iu)^2} - \sqrt{\lambda^2 - \beta^2}\right)\right)
\end{aligned}
\tag{10}
\]

If we substitute

\[ c = \frac{(\lambda^2 - \beta^2)^{\frac{3}{2}}}{\lambda^2} \tag{11} \]

then \(X \sim NIG(c, \lambda, \beta, 0)\) has zero mean and unit variance. In this case, \(X\) is called the standard NIG distribution and denoted by \(X \sim stdNIG(\lambda, \beta)\).
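Because the NIG characteristic function in equation (10) involves only square roots, it is a convenient case for checking the standardization in equation (11). The sketch below estimates the mean and variance by finite differences of the log characteristic function; the function names and the example values of \(\lambda\) and \(\beta\) are assumptions made for illustration.

```python
import cmath
import math

def nig_log_cf(u, c, lam, beta, m):
    """Logarithm of the NIG characteristic function, equation (10)."""
    root_at_zero = math.sqrt(lam ** 2 - beta ** 2)
    root_at_u = cmath.sqrt(lam ** 2 - (beta + 1j * u) ** 2)
    return 1j * u * m - 1j * u * c * beta / root_at_zero - c * (root_at_u - root_at_zero)

def std_nig_scale(lam, beta):
    """Constant c from equation (11), giving unit variance."""
    return (lam ** 2 - beta ** 2) ** 1.5 / lam ** 2

lam, beta = 3.0, -1.0                      # hypothetical example values, |beta| < lam
c = std_nig_scale(lam, beta)

h = 1e-3
psi_up = nig_log_cf(h, c, lam, beta, 0.0)
psi_down = nig_log_cf(-h, c, lam, beta, 0.0)
nig_mean = ((psi_up - psi_down) / (2j * h)).real
nig_variance = (-(psi_up + psi_down) / h ** 2).real   # psi(0) = 0
print(nig_mean, nig_variance)
```

A negative \(\beta\) leaves the mean at zero and the variance at one but shifts mass into the left tail, in line with the role of \(\beta\) described for the NTS distribution.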


Kim-Rachev Tempered Stable 
Distribution 

Let \(\alpha \in (0, 1) \cup (1, 2)\), \(k_+, k_-, r_+, r_- > 0\), \(p_+, p_- \in \{p > -\alpha \mid p \neq -1,\ p \neq 0\}\), and \(m \in \mathbb{R}\). \(X\) is said to follow the Kim-Rachev tempered stable (KRTS) distribution (see Kim et al., 2008b) if the characteristic function of \(X\) is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{KRTS}(u; \alpha, k_+, k_-, r_+, r_-, p_+, p_-, m) \\
&= \exp\Big(ium - iu\Gamma(1-\alpha)\left(\frac{k_+ r_+}{p_+ + 1} - \frac{k_- r_-}{p_- + 1}\right) \\
&\qquad\quad + k_+ H(iu; \alpha, r_+, p_+) + k_- H(-iu; \alpha, r_-, p_-)\Big)
\end{aligned}
\tag{12}
\]

where

\[ H(x; \alpha, r, p) = \frac{\Gamma(-\alpha)}{p}\left({}_2F_1(p, -\alpha; 1 + p; rx) - 1\right) \]

We denote a KRTS distributed random variable \(X\) by \(X \sim KRTS(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, m)\).

The KRTS distribution is an extension of the CTS distribution. Indeed, the distribution \(KRTS(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, m)\) converges weakly to the CTS distribution as \(p_\pm \to \infty\) provided that \(k_\pm = c(\alpha + p_\pm) r_\pm^{-\alpha}\) where \(c > 0\) (see Kim et al., 2008a). Figure 9 shows that the KRTS distribution converges to the CTS distribution when the parameter \(p = p_+ = p_-\) increases to infinity.

The cumulants of the KRTS distributed random variable \(X\) are \(c_1(X) = m\) and

\[ c_n(X) = \Gamma(n - \alpha)\left(\frac{k_+ r_+^n}{p_+ + n} + (-1)^n \frac{k_- r_-^n}{p_- + n}\right) \]

for \(n = 2, 3, \ldots\).

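If we substitute

\[ k_+ = C\, \frac{\alpha + p_+}{r_+^{\alpha}}, \qquad k_- = C\, \frac{\alpha + p_-}{r_-^{\alpha}} \]

where

\[ C = \frac{1}{\Gamma(2 - \alpha)} \left(\frac{\alpha + p_+}{2 + p_+}\, r_+^{2-\alpha} + \frac{\alpha + p_-}{2 + p_-}\, r_-^{2-\alpha}\right)^{-1} \tag{13} \]

then \(X \sim KRTS(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, 0)\) has zero mean and unit variance. In this case, \(X\) is said to be standard KRTS distributed and denoted by \(X \sim stdKRTS(\alpha, r_+, r_-, p_+, p_-)\). Let \(m\) be a real number, \(\sigma\) be a positive real number, and \(X \sim stdKRTS(\alpha, r_+, r_-, p_+, p_-)\). Then

\[ Y = \sigma X + m \sim KRTS\left(\alpha,\ C(\alpha + p_+) r_+^{-\alpha},\ C(\alpha + p_-) r_-^{-\alpha},\ \sigma r_+,\ \sigma r_-,\ p_+,\ p_-,\ m\right) \]

where \(C\) is equal to (13). The random variable \(Y\) is KRTS distributed, and its mean and variance are \(m\) and \(\sigma^2\), respectively; the latter follows from the cumulant formula, since scaling \(X\) by \(\sigma\) multiplies \(c_n\) by \(\sigma^n\), which corresponds to replacing \(r_\pm\) by \(\sigma r_\pm\) while leaving \(k_\pm\) unchanged.

Figure 9 Probability Density of the CTS Distribution with Parameters \(C = 1\), \(\lambda_+ = 10\), \(\lambda_- = 2\), \(\alpha = 1.25\), and the KRTS Distributions with \(k_\pm = C(\alpha + p) r_\pm^{-\alpha}\), \(r_+ = 1/\lambda_+\), \(r_- = 1/\lambda_-\), where \(p = p_+ = p_- \in \{-0.25, 1, 10\}\)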


Rapidly Decreasing Tempered 
Stable Distribution 

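Let \(\alpha \in (0,1) \cup (1,2)\), \(C, \lambda_+, \lambda_- > 0\), and \(m \in \mathbb{R}\). A random variable \(X\) is said to follow the rapidly decreasing tempered stable (RDTS) distribution (see Bianchi et al., 2010 and Kim et al., 2010) if the characteristic function of \(X\) is given by

\[
\begin{aligned}
\phi_X(u) &= \phi_{RDTS}(u; \alpha, C, \lambda_+, \lambda_-, m) \\
&= \exp\left(ium + C\left(G(iu; \alpha, \lambda_+) + G(-iu; \alpha, \lambda_-)\right)\right)
\end{aligned}
\tag{14}
\]

where

\[
\begin{aligned}
G(x; \alpha, \lambda) = {} & 2^{-\frac{\alpha}{2}-1} \lambda^{\alpha}\, \Gamma\left(-\frac{\alpha}{2}\right)\left(M\left(-\frac{\alpha}{2}, \frac{1}{2}; \frac{x^2}{2\lambda^2}\right) - 1\right) \\
& + 2^{-\frac{\alpha}{2}-\frac{1}{2}} \lambda^{\alpha-1} x\, \Gamma\left(\frac{1-\alpha}{2}\right)\left(M\left(\frac{1-\alpha}{2}, \frac{3}{2}; \frac{x^2}{2\lambda^2}\right) - 1\right)
\end{aligned}
\]

and \(M\) is the confluent hypergeometric function. Further details of the confluent hypergeometric function are presented at the end of this entry. In this case, we denote \(X \sim RDTS(\alpha, C, \lambda_+, \lambda_-, m)\). The roles of the parameters are the same as for the case of the CTS distribution.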
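The RDTS characteristic function in equation (14) can be evaluated with a short power series for the confluent hypergeometric function, \(M(a, b; z) = \sum_{k \ge 0} \frac{(a)_k}{(b)_k} \frac{z^k}{k!}\). The sketch below is an illustration under that assumption; it uses the standardizing constant stated in equation (15) of this section and checks the unit variance by finite differences. All function names and example parameter values are hypothetical.

```python
import math

def kummer_m_minus_one(a, b, z, terms=60):
    """M(a, b; z) - 1 by direct series summation (adequate for moderate |z|)."""
    term, total = 1 + 0j, 0j
    for k in range(terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
    return total

def rdts_g(x, alpha, lam):
    """Function G(x; alpha, lambda) appearing in equation (14)."""
    z = x * x / (2 * lam * lam)
    even_part = (2 ** (-alpha / 2 - 1) * lam ** alpha * math.gamma(-alpha / 2)
                 * kummer_m_minus_one(-alpha / 2, 0.5, z))
    odd_part = (2 ** (-alpha / 2 - 0.5) * lam ** (alpha - 1) * x
                * math.gamma((1 - alpha) / 2)
                * kummer_m_minus_one((1 - alpha) / 2, 1.5, z))
    return even_part + odd_part

def rdts_log_cf(u, alpha, C, lam_p, lam_m, m):
    """Logarithm of the RDTS characteristic function, equation (14)."""
    return 1j * u * m + C * (rdts_g(1j * u, alpha, lam_p) + rdts_g(-1j * u, alpha, lam_m))

def std_rdts_scale(alpha, lam_p, lam_m):
    """Constant C from equation (15), giving unit variance."""
    return 2 ** (alpha / 2) / (math.gamma(1 - alpha / 2)
                               * (lam_p ** (alpha - 2) + lam_m ** (alpha - 2)))

alpha, lam_p, lam_m, m = 1.4, 2.0, 1.5, -0.1       # hypothetical example values
C = std_rdts_scale(alpha, lam_p, lam_m)
h = 1e-3
psi_up = rdts_log_cf(h, alpha, C, lam_p, lam_m, m)
psi_down = rdts_log_cf(-h, alpha, C, lam_p, lam_m, m)
rdts_mean = ((psi_up - psi_down) / (2j * h)).real
rdts_variance = (-(psi_up + psi_down) / h ** 2).real   # psi(0) = 0
print(rdts_mean, rdts_variance)
```

Summing the series for \(M - 1\) directly, rather than subtracting 1 afterwards, avoids cancellation error when the argument is close to zero.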


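The mean of \(X\) is \(m\), and the cumulants of \(X\) are

\[ c_n(X) = 2^{\frac{n-\alpha-2}{2}}\, C\, \Gamma\left(\frac{n-\alpha}{2}\right)\left(\lambda_+^{\alpha-n} + (-1)^n \lambda_-^{\alpha-n}\right), \quad \text{for } n = 2, 3, \ldots \]

If we substitute

\[ C = 2^{\frac{\alpha}{2}} \left(\Gamma\left(1-\frac{\alpha}{2}\right)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)\right)^{-1} \tag{15} \]

then \(X \sim RDTS(\alpha, C, \lambda_+, \lambda_-, 0)\) has zero mean and unit variance, and \(X\) is called the standard RDTS distribution and denoted by \(X \sim stdRDTS(\alpha, \lambda_+, \lambda_-)\). Let \(m\) be a real number, \(\sigma\) be a positive real number, and \(X \sim stdRDTS(\alpha, \lambda_+, \lambda_-)\). Then

\[ Y = \sigma X + m \sim RDTS(\alpha, \sigma^{\alpha} C, \lambda_+/\sigma, \lambda_-/\sigma, m) \]

where \(C\) is equal to (15). The random variable \(Y\) is RDTS distributed, and its mean and variance are \(m\) and \(\sigma^2\), respectively.

INFINITELY DIVISIBLE DISTRIBUTIONS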

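A random variable \(Y\) is referred to as infinitely divisible if for each positive integer \(n\), there are IID random variables \(Y_1, Y_2, \ldots, Y_n\) such that \(Y \stackrel{d}{=} \sum_{k=1}^{n} Y_k\); that is, the distribution of \(Y\) is the same as the distribution of \(\sum_{k=1}^{n} Y_k\).

For example, the normal distribution is infinitely divisible. Using the characteristic function for the normal distribution, we can easily check the property. Suppose \(Y \sim N(\mu, \sigma^2)\). For any positive integer \(n\), consider a sequence of IID random variables \(Y_1, Y_2, \ldots, Y_n\) such that \(Y_k \sim N(\mu/n, \sigma^2/n)\). Since the \(Y_k\) are independent, we have

\[ E\left[\exp\left(iu\sum_{k=1}^{n} Y_k\right)\right] = \prod_{k=1}^{n} E\left[e^{iuY_k}\right] \]

The characteristic function of \(Y_k\) is given by

\[ E\left[e^{iuY_k}\right] = \exp\left(\frac{i\mu u}{n} - \frac{\sigma^2 u^2}{2n}\right) \]

Hence, the characteristic function of \(\sum_{k=1}^{n} Y_k\) is

\[ E\left[\exp\left(iu\sum_{k=1}^{n} Y_k\right)\right] = \left(\exp\left(\frac{i\mu u}{n} - \frac{\sigma^2 u^2}{2n}\right)\right)^{n} = \exp\left(i\mu u - \frac{\sigma^2 u^2}{2}\right) \]

which is the same as the characteristic function of \(Y\). Therefore, \(Y \stackrel{d}{=} \sum_{k=1}^{n} Y_k\).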
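The argument for the normal distribution can be replayed numerically: raising the characteristic function of \(N(\mu/n, \sigma^2/n)\) to the \(n\)th power should return the characteristic function of \(N(\mu, \sigma^2)\). The following sketch uses arbitrary illustrative values for \(\mu\), \(\sigma^2\), \(n\), and \(u\).

```python
import cmath

def normal_cf(u, mean, variance):
    """Characteristic function of N(mean, variance) evaluated at u."""
    return cmath.exp(1j * mean * u - 0.5 * variance * u * u)

mu, variance, n, u = 0.3, 2.0, 12, 1.7      # hypothetical example values

# Characteristic function of the sum of n IID pieces N(mu/n, variance/n).
cf_of_sum = normal_cf(u, mu / n, variance / n) ** n
# Characteristic function of the undivided variable N(mu, variance).
cf_of_whole = normal_cf(u, mu, variance)

print(abs(cf_of_sum - cf_of_whole))
```

The same check works for any row of Table 2 once the corresponding characteristic function is available, since each row only rescales parameters by \(1/n\).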

Using similar arguments, we can show that the Poisson, gamma, variance-gamma (VG), inverse Gaussian (IG), α-stable, CTS, GTS, MTS, NTS (NIG), RDTS, and KRTS distributions are infinitely divisible. The relations of \(Y\) and \(Y_k\), \(k = 1, \ldots, n\), for those distributions are presented in Table 2. We can show that the sum of independent infinitely divisible random variables is again infinitely divisible.

In the literature, the characteristic function of the one-dimensional infinitely divisible distribution is given in general form by the Lévy-Khinchine formula:

\[ \exp\left(i\gamma u - \frac{1}{2}\sigma^2 u^2 + \int_{-\infty}^{\infty}\left(e^{iux} - 1 - iux\,1_{|x| \leq 1}\right)\nu(dx)\right) \tag{16} \]


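Table 2 Infinitely Divisible Distributions 

Distribution | $Y = \sum_{k=1}^{n} Y_k$ | $Y_k$ 
Poisson | $\mathrm{Poiss}(\lambda)$ | $\mathrm{Poiss}(\lambda/n)$ 
Gamma | $\mathrm{Gamma}(c, \lambda)$ | $\mathrm{Gamma}(c/n, \lambda)$ 
Variance gamma | $\mathrm{VG}(C, \lambda_+, \lambda_-)$ | $\mathrm{VG}(C/n, \lambda_+, \lambda_-)$ 
Inverse Gaussian | $\mathrm{IG}(c, \lambda)$ | $\mathrm{IG}(c/n, \lambda)$ 
Normal | $N(\mu, \sigma^2)$ | $N(\mu/n, \sigma^2/n)$ 
$\alpha$-stable | $S_\alpha(\sigma, \beta, \mu)$ | 
CTS | $\mathrm{CTS}(\alpha, C, \lambda_+, \lambda_-, m)$ | $\mathrm{CTS}(\alpha, C/n, \lambda_+, \lambda_-, m/n)$ 
GTS | $\mathrm{GTS}(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, m)$ | $\mathrm{GTS}(\alpha_+, \alpha_-, C_+/n, C_-/n, \lambda_+, \lambda_-, m/n)$ 
MTS | $\mathrm{MTS}(\alpha, C, \lambda_+, \lambda_-, m)$ | $\mathrm{MTS}(\alpha, C/n, \lambda_+, \lambda_-, m/n)$ 
NTS | $\mathrm{NTS}(\alpha, C, \lambda, \beta, m)$ | $\mathrm{NTS}(\alpha, C/n, \lambda, \beta, m/n)$ 
KRTS | $\mathrm{KRTS}(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, m)$ | $\mathrm{KRTS}(\alpha, k_+/n, k_-/n, r_+, r_-, p_+, p_-, m/n)$ 
RDTS | $\mathrm{RDTS}(\alpha, C, \lambda_+, \lambda_-, m)$ | $\mathrm{RDTS}(\alpha, C/n, \lambda_+, \lambda_-, m/n)$ 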
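Table 3 Lévy Measures 

Distribution | Lévy Measure 
Poisson | $\nu_{\mathrm{Poisson}}(dx) = \lambda\,\delta_1(dx)$ 
Gamma | $\nu_{\mathrm{gamma}}(dx) = \frac{c\,e^{-\lambda x}}{x}\,1_{x>0}\,dx$ 
Variance gamma | $\nu_{\mathrm{VG}}(dx) = \left(\frac{C e^{-\lambda_+ x}}{x}\,1_{x>0} + \frac{C e^{-\lambda_-|x|}}{|x|}\,1_{x<0}\right)dx$ 
Inverse Gaussian | $\nu_{\mathrm{IG}}(dx) = \frac{c\,e^{-\lambda^2 x/2}}{\sqrt{2\pi}\,x^{3/2}}\,1_{x>0}\,dx$ 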


In the formula, the measure $\nu$ is referred to as the Lévy measure. The measure is a Borel measure satisfying the conditions that $\nu(\{0\}) = 0$ and $\int_{\mathbb{R}} (1 \wedge |x|^2)\,\nu(dx) < \infty$. The parameters $\gamma$ and $\sigma$ are real numbers. The parameter $\gamma$ is referred to as the center or drift and determines the location. The triplet $(\sigma^2, \nu, \gamma)$ is uniquely defined for each infinitely divisible distribution and is called a Lévy triplet. 

If $\nu(dx) = 0$, then the characteristic function equals the characteristic function of the normal distribution. That is, the infinitely divisible distribution with $\nu(dx) = 0$ becomes the normal distribution with mean $\gamma$ and variance $\sigma^2$. 

If $\sigma = 0$, then the distribution is referred to as a purely non-Gaussian distribution. The characteristic functions of purely non-Gaussian distributions are computed by 

$$\exp\left(i\gamma u + \int_{-\infty}^{\infty}\left(e^{iux} - 1 - iux\,1_{|x|\le 1}\right)\nu(dx)\right)$$

Hence, except for the location determined by $\gamma$, all the properties of the distribution are characterized by the Lévy measure $\nu(dx)$. The Poisson, gamma, VG, IG, $\alpha$-stable, CTS, GTS, MTS, NTS, RDTS, and KRTS distributions are purely non-Gaussian distributions. The Lévy measures of the Poisson, gamma, VG, and IG distributions are given in Table 3. 
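
To see the Lévy-Khinchine formula at work for a purely non-Gaussian case, the sketch below integrates a gamma Lévy measure numerically and compares the result with a closed-form characteristic function. It rests on stated assumptions: the gamma law is parametrized so that its Lévy density is $c\,e^{-\lambda x}/x$ on $x > 0$ and its characteristic function is $(\lambda/(\lambda - iu))^c$, and the center is taken as $\gamma = \int_0^1 x\,\nu(dx) = c(1 - e^{-\lambda})/\lambda$. The helper names, the Simpson rule, and the truncation of the upper limit are illustrative choices.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals; f may be complex."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def gamma_levy_density(x, c, lam):
    """Assumed gamma Levy density c * exp(-lam * x) / x on x > 0."""
    return c * math.exp(-lam * x) / x


def gamma_cf_from_levy(u, c, lam):
    """Levy-Khinchine exponent with sigma = 0, integrated numerically."""

    def small_jumps(x):  # |x| <= 1: the truncation term i*u*x is subtracted
        if x == 0.0:
            return 0.0  # the integrand tends to 0 as x -> 0
        return (cmath.exp(1j * u * x) - 1.0 - 1j * u * x) * gamma_levy_density(x, c, lam)

    def large_jumps(x):  # x > 1: no truncation term
        return (cmath.exp(1j * u * x) - 1.0) * gamma_levy_density(x, c, lam)

    center = c * (1.0 - math.exp(-lam)) / lam  # gamma = int_0^1 x nu(dx)
    upper = 1.0 + 60.0 / lam  # the density beyond this point is negligible
    integral = simpson(small_jumps, 0.0, 1.0, 2000) + simpson(large_jumps, 1.0, upper, 20000)
    return cmath.exp(1j * center * u + integral)


def gamma_cf_closed_form(u, c, lam):
    """Closed-form characteristic function (lam / (lam - i*u))**c."""
    return (lam / (lam - 1j * u)) ** c


for u in (0.5, 2.0):
    print(u, gamma_cf_from_levy(u, 1.5, 2.0), gamma_cf_closed_form(u, 1.5, 2.0))
```

The two columns should agree to several decimal places; the remaining gap comes from the quadrature and from cutting off the integral at a finite upper limit.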

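The Lévy measure of the $\alpha$-stable distribution is given by 

$$\nu_{\mathrm{stable}}(dx) = \left(\frac{C_+}{x^{1+\alpha}}\,1_{x>0} + \frac{C_-}{|x|^{1+\alpha}}\,1_{x<0}\right)dx \qquad (17)$$

Using the Lévy-Khinchine formula we can obtain the characteristic function in (1) (see note 5). 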

The Lévy measure of the CTS, MTS, NTS, KRTS, and RDTS distributions can be obtained by multiplying the tempering function by the Lévy measure of the $\alpha$-stable distribution. For example, if we take $q(x) = e^{-\lambda_+ x}\,1_{x>0} + e^{-\lambda_-|x|}\,1_{x<0}$ as the tempering function, then we obtain the Lévy measure of the CTS distribution as 

$$\nu(dx) = q(x)\,\nu_{\mathrm{stable}}(dx) = \left(\frac{C_+ e^{-\lambda_+ x}}{x^{1+\alpha}}\,1_{x>0} + \frac{C_- e^{-\lambda_-|x|}}{|x|^{1+\alpha}}\,1_{x<0}\right)dx$$

Tempering functions of the other distributions are presented in Table 4. For this reason, they are referred to as the tempered stable distributions. The GTS distribution is also a purely non-Gaussian distribution, but not a tempered stable distribution in this sense. Indeed, its Lévy measure is given by 

$$\nu(dx) = \left(\frac{C_+ e^{-\lambda_+ x}}{x^{1+\alpha_+}}\,1_{x>0} + \frac{C_- e^{-\lambda_-|x|}}{|x|^{1+\alpha_-}}\,1_{x<0}\right)dx$$

However, we will refer to the GTS distribution as a tempered stable distribution for convenience. Using the Lévy measures and the Lévy-Khinchine formula, we can obtain the characteristic functions (1), (2), (4), (6), (8), (12), and (14). 

Generalizations of the tempering function and the tempered stable distribution have been studied in the literature (see note 6). 
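Table 4 Tempering Functions 

Distribution | Tempering Function $q(x)$ 
CTS | $e^{-\lambda_+ x}\,1_{x>0} + e^{-\lambda_-|x|}\,1_{x<0}$ 
MTS | $(\lambda_+ x)^{\frac{\alpha+1}{2}} K_{\frac{\alpha+1}{2}}(\lambda_+ x)\,1_{x>0} + (\lambda_-|x|)^{\frac{\alpha+1}{2}} K_{\frac{\alpha+1}{2}}(\lambda_-|x|)\,1_{x<0}$ 
NTS | $e^{\beta x}(\lambda|x|)^{\frac{\alpha+1}{2}} K_{\frac{\alpha+1}{2}}(\lambda|x|)$ 
KRTS | $r_+^{-p_+}\int_0^{r_+} e^{-x/s} s^{\alpha+p_+-1}\,ds\,1_{x>0} + r_-^{-p_-}\int_0^{r_-} e^{-|x|/s} s^{\alpha+p_--1}\,ds\,1_{x<0}$ 
RDTS | $e^{-\lambda_+^2 x^2/2}\,1_{x>0} + e^{-\lambda_-^2 |x|^2/2}\,1_{x<0}$ 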


Exponential Moments 

The exponential moment of a random variable $X$ is defined by $E[e^{uX}]$ for some real number $u$. Existence of the exponential moment is important for modeling an asset price process in option pricing theory. 

The exponential moment of the normal distribution is given by 

$$E[e^{uX}] = \exp\left(\mu u + \frac{\sigma^2 u^2}{2}\right)$$

where $X \sim N(\mu, \sigma^2)$. 
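
As a quick numerical check of the normal case, the sketch below integrates $e^{ux}$ against the normal density with a midpoint rule and compares the result with $\exp(\mu u + \sigma^2 u^2/2)$. The function names, the integration window, and the parameter values are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exp_moment_numeric(u, mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E[exp(u*X)] for X ~ N(mu, sigma^2).

    exp(u*x) times the density is a Gaussian bump centered at mu + u*sigma^2,
    so the integration window is centered there.
    """
    center = mu + u * sigma * sigma
    a = center - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += math.exp(u * x) * normal_pdf(x, mu, sigma)
    return total * h


def exp_moment_closed_form(u, mu, sigma):
    """exp(mu*u + sigma^2 * u^2 / 2)."""
    return math.exp(mu * u + 0.5 * sigma * sigma * u * u)


for u, mu, sigma in ((0.7, 0.1, 0.2), (-2.0, 0.0, 1.5)):
    print(u, exp_moment_numeric(u, mu, sigma), exp_moment_closed_form(u, mu, sigma))
```

For the normal distribution the integral is finite for every real $u$, which is why no restriction on $u$ appears in the closed form.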

Using the Lévy measure we can check the existence of the exponential moment for an infinitely divisible random variable. The following theorem (see Sato, 1999) provides a useful tool to verify the existence of an exponential moment of an infinitely divisible distribution. 

Theorem Let $X$ be an infinitely divisible random variable with the Lévy triplet $(\sigma^2, \nu, \gamma)$ and let $u \in \mathbb{R}$. Then $E[e^{uX}] < \infty$ if and only if 

$$\int_{|x|>1} e^{ux}\,\nu(dx) < \infty \qquad (18)$$

In this case, 

$$E[e^{uX}] = \phi_X(-iu)$$

where $\phi_X$ is the characteristic function of $X$ and $i = \sqrt{-1}$. 

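The existence of exponential moments in the tempered stable distributions is as follows: 

• For the $\alpha$-stable random variable $X$, the exponential moment of $X$ generally does not exist. However, if $X \sim S_\alpha(\sigma, 1, 0)$, then $E[e^{uX}] < \infty$ for $u < 0$. In this case, 

$$E[e^{uX}] = \begin{cases} \exp\left(-\dfrac{\sigma^{\alpha}|u|^{\alpha}}{\cos(\pi\alpha/2)}\right), & \alpha \neq 1 \\[2ex] \exp\left(\dfrac{2\sigma}{\pi}\,|u|\log|u|\right), & \alpha = 1 \end{cases}$$

• For the CTS, GTS, and MTS distributions, the condition (18) is satisfied if and only if $-\lambda_- \le u \le \lambda_+$. Hence, $E[e^{uX}] < \infty$ for $u \in [-\lambda_-, \lambda_+]$. 

• For the KRTS distribution, $E[e^{uX}] < \infty$ for $u \in [-1/r_-, 1/r_+]$. 

• For the NTS and the NIG distributions, $E[e^{uX}] < \infty$ for $u \in [-\lambda - \beta, \lambda - \beta]$. 

• For the RDTS distribution, (18) is satisfied for every real number $u$. Hence, $E[e^{uX}] < \infty$ for all $u \in \mathbb{R}$. 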

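If $E[e^{uX}] < \infty$, then we can define the log-Laplace transform for the random variable $X$. The log-Laplace transform is given by 

$$L(u) = \log E[e^{uX}] = \log \phi(-iu)$$

if (18) is satisfied. 

For example, let $X \sim \mathrm{stdCTS}(\alpha, \lambda_+, \lambda_-)$. The log-Laplace transform $L_{CTS}$ of $X$ is defined on $u \in [-\lambda_-, \lambda_+]$, and is given by 

$$L_{CTS}(u; \alpha, \lambda_+, \lambda_-) = \log \phi_{CTS}(-iu; \alpha, C, \lambda_+, \lambda_-, 0) = \frac{(\lambda_+ - u)^{\alpha} - \lambda_+^{\alpha} + (\lambda_- + u)^{\alpha} - \lambda_-^{\alpha}}{\alpha(\alpha - 1)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)} - \frac{u\left(\lambda_+^{\alpha-1} - \lambda_-^{\alpha-1}\right)}{(1 - \alpha)\left(\lambda_+^{\alpha-2} + \lambda_-^{\alpha-2}\right)}$$

where $C$ satisfies (3). Using the same method, we can obtain the log-Laplace transform of the other standard tempered stable distributions as follows: 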
• Standard GTS distribution: 

$$L_{GTS}(u; \alpha_+, \alpha_-, \lambda_+, \lambda_-) = \log \phi_{GTS}(-iu; \alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, 0)$$

on $u \in [-\lambda_-, \lambda_+]$, where $C_+$ and $C_-$ satisfy (5). 

• Standard MTS distribution: 

$$L_{MTS}(u; \alpha, \lambda_+, \lambda_-) = \log \phi_{MTS}(-iu; \alpha, C, \lambda_+, \lambda_-, 0)$$

on $u \in [-\lambda_-, \lambda_+]$, where $C$ satisfies (7). 

• Standard NTS distribution: 

$$L_{NTS}(u; \alpha, \lambda, \beta) = \log \phi_{NTS}(-iu; \alpha, C, \lambda, \beta, 0)$$

on $u \in [-\lambda - \beta, \lambda - \beta]$, where $C$ satisfies (9). 

• Standard NIG distribution: 

$$L_{NIG}(u; \lambda, \beta) = \log \phi_{NIG}(-iu; C, \lambda, \beta, 0)$$

on $u \in [-\lambda - \beta, \lambda - \beta]$, where $C$ satisfies (11). 

• Standard KRTS distribution: 

$$L_{KRTS}(u; \alpha, r_+, r_-, p_+, p_-) = \log \phi_{KRTS}(-iu; \alpha, k_+, k_-, r_+, r_-, p_+, p_-, 0)$$

on $u \in [-1/r_-, 1/r_+]$, where $k_+$ and $k_-$ satisfy (13). 

• Standard RDTS distribution: 

$$L_{RDTS}(u; \alpha, \lambda_+, \lambda_-) = \log \phi_{RDTS}(-iu; \alpha, C, \lambda_+, \lambda_-, 0)$$

on $u \in \mathbb{R}$, where $C$ satisfies (15). 


HYPERGEOMETRIC FUNCTION AND CONFLUENT 
HYPERGEOMETRIC FUNCTION 

In this entry, we referred to the hypergeometric function and the confluent hypergeometric function. Here we describe these two special functions. (For more details, see Andrews, 1998.) We begin by introducing the following notation 

$$(a)_0 = 1, \quad (a)_n = a(a+1)\cdots(a+n-1), \quad n = 1, 2, 3, \ldots, \; a \in \mathbb{R} \qquad (19)$$

and we refer to the notation as the Pochhammer symbol. By properties of the gamma function, the Pochhammer symbol can also be defined by 

$$(a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}, \quad n = 0, 1, 2, 3, \ldots$$

From (19), we obtain 

$$(2n+1)! = 2^{2n}\, n!\left(\frac{3}{2}\right)_n \qquad (20)$$

The Hypergeometric Function 

The function 

$${}_2F_1(a, b; c; x) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n}\frac{x^n}{n!}, \quad |x| < 1 \qquad (21)$$

is called the hypergeometric function. If $c \neq 0, -1, -2, \ldots$, the function ${}_2F_1(a, b; c; x)$ is a solution to the linear second-order differential equation 

$$x(1 - x)y'' + (c - (a + b + 1)x)y' - aby = 0 \qquad (22)$$

referred to as the hypergeometric equation. Moreover, if $c \neq 0, \pm 1, \pm 2, \ldots$, 

$$y = C_1\,{}_2F_1(a, b; c; x) + C_2\,x^{1-c}\,{}_2F_1(1 + a - c, 1 + b - c; 2 - c; x)$$

for any constants $C_1$ and $C_2$, is a general solution to equation (22). For $k = 1, 2, 3, \ldots$, $k$th derivatives are obtained from the following equation: 

$$\frac{d^k}{dx^k}\,{}_2F_1(a, b; c; x) = \frac{(a)_k (b)_k}{(c)_k}\,{}_2F_1(a + k, b + k; c + k; x) \qquad (23)$$
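
The series definition and the derivative rule can be exercised directly. The following is a minimal sketch that sums the Gauss series term by term; the function names, the fixed number of terms, and the test arguments are illustrative assumptions. It checks the known special case ${}_2F_1(1, 1; 2; x) = -\log(1 - x)/x$, the first-derivative rule against a finite difference, and the factorial identity $(2n+1)! = 2^{2n} n! (3/2)_n$.

```python
import math


def pochhammer(a, n):
    """Pochhammer symbol (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    result = 1.0
    for k in range(n):
        result *= a + k
    return result


def hyp2f1(a, b, c, x, terms=400):
    """Partial sum of the Gauss series sum_n (a)_n (b)_n / (c)_n * x^n / n!."""
    if abs(x) >= 1.0:
        raise ValueError("the series converges only for |x| < 1")
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # Ratio of consecutive terms: (a+n)(b+n) / ((c+n)(n+1)) * x
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
    return total


def hyp2f1_derivative(a, b, c, x, k=1):
    """k-th derivative: (a)_k (b)_k / (c)_k * 2F1(a+k, b+k; c+k; x)."""
    factor = pochhammer(a, k) * pochhammer(b, k) / pochhammer(c, k)
    return factor * hyp2f1(a + k, b + k, c + k, x)


x = 0.5
series_value = hyp2f1(1.0, 1.0, 2.0, x)
closed_form = -math.log(1.0 - x) / x  # special case 2F1(1, 1; 2; x)

# Central finite difference versus the derivative rule with k = 1.
h = 1e-5
numeric_derivative = (hyp2f1(0.3, 1.2, 2.5, x + h) - hyp2f1(0.3, 1.2, 2.5, x - h)) / (2.0 * h)
formula_derivative = hyp2f1_derivative(0.3, 1.2, 2.5, x)

# Factorial identity (2n+1)! = 2^(2n) * n! * (3/2)_n for n = 5.
n = 5
lhs = math.factorial(2 * n + 1)
rhs = 4 ** n * math.factorial(n) * pochhammer(1.5, n)
print(series_value, closed_form, numeric_derivative, formula_derivative, lhs, rhs)
```

Direct summation is adequate well inside the unit disk; close to $|x| = 1$ the series converges slowly and a library routine would be the safer choice.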







The Confluent Hypergeometric 
Function 

The function 

$$M(a; c; x) = \sum_{n=0}^{\infty} \frac{(a)_n}{(c)_n}\frac{x^n}{n!}, \quad -\infty < x < \infty \qquad (24)$$

is called the confluent hypergeometric function and is obtained by the limit of the hypergeometric function as follows: 

$$M(a; c; x) = \lim_{b \to \infty} {}_2F_1(a, b; c; x/b)$$

The function $M(a; c; x)$ is a solution of the linear second-order differential equation 

$$xy'' + (c - x)y' - ay = 0 \qquad (25)$$

referred to as the confluent hypergeometric equation. Moreover, if $c \neq 0, \pm 1, \pm 2, \ldots$, 

$$y = C_1 M(a; c; x) + C_2\,x^{1-c} M(1 + a - c; 2 - c; x)$$

for any constants $C_1$ and $C_2$, is a general solution of equation (25). For $k = 1, 2, 3, \ldots$, $k$th derivatives are obtained by the following equation: 

$$\frac{d^k}{dx^k} M(a; c; x) = \frac{(a)_k}{(c)_k} M(a + k; c + k; x) \qquad (26)$$
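
The limit relation between the two functions can be observed numerically. The sketch below sums both series directly (the function names, the number of terms, and the parameter values are illustrative assumptions) and shows that the Gauss series evaluated at $x/b$ approaches the Kummer series $M(a; c; x)$ as $b$ grows. It also uses the elementary special case $M(a; a; x) = e^x$, which follows from the series because the Pochhammer factors cancel.

```python
import math


def hyp1f1(a, c, x, terms=300):
    """Partial sum of the Kummer series sum_n (a)_n / (c)_n * x^n / n!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) / ((c + n) * (n + 1)) * x
    return total


def hyp2f1(a, b, c, x, terms=300):
    """Partial sum of the Gauss series, valid for |x| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
    return total


a, c, x = 0.7, 1.9, 0.8
kummer_value = hyp1f1(a, c, x)

# 2F1(a, b; c; x/b) should approach M(a; c; x) as b increases.
gaps = [abs(hyp2f1(a, b, c, x / b) - kummer_value) for b in (1e2, 1e4, 1e6)]

# Special case M(a; a; x) = exp(x).
exp_check = hyp1f1(1.3, 1.3, 2.5)

# First-derivative rule: d/dx M(a; c; x) = (a / c) * M(a+1; c+1; x).
h = 1e-5
numeric_derivative = (hyp1f1(a, c, x + h) - hyp1f1(a, c, x - h)) / (2.0 * h)
formula_derivative = a / c * hyp1f1(a + 1.0, c + 1.0, x)
print(gaps, exp_check, math.exp(2.5), numeric_derivative, formula_derivative)
```

The gaps shrink roughly in proportion to $1/b$, which is the rate one would expect from replacing $(b)_n / b^n$ by 1 in each term.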


KEY POINTS 

• The distribution assumed in financial models for asset returns is the normal or Gaussian distribution. Real-world asset returns, however, have been observed to be skewed and non-symmetric, two features that are inconsistent with the normal distribution. 

• Although the non-Gaussian alpha-stable distribution is superior to the normal distribution because it allows for skewness and fat tails, it is not suitable in certain modeling applications such as in modeling option prices. This is because the mean, variance, and exponential moments of the return distribution have to exist. The smoothly truncated stable distribution, obtained by tempering the tail properties of the alpha-stable distribution, has been proposed for modeling in such instances. 

• There are six tempered stable distributions: classical tempered stable distribution, generalized classical tempered stable distribution, modified tempered stable distribution, normal tempered stable distribution, Kim-Rachev tempered stable distribution, and rapidly decreasing tempered stable distribution. All six tempered stable distributions and the alpha-stable distribution are defined by their characteristic functions. 

• The infinitely divisible distribution is characterized by the Lévy-Khinchine formula and contains the alpha-stable and the tempered stable distributions as special cases. 


NOTES 

1. Extensive analysis of $\alpha$-stable distributions and their properties can be found in Samorodnitsky and Taqqu (1994), Rachev and Mittnik (2000), and Stoyanov and Racheva-Iotova (2004a, 2004b). 

2. The KoBoL distribution (see Boyarchenko and Levendorskii, 2000) is obtained by substituting $\alpha = \alpha_+ = \alpha_-$, the truncated Lévy flight is obtained by substituting $\lambda = \lambda_+ = \lambda_-$ and $\alpha = \alpha_+ = \alpha_-$, while the CGMY distribution (see Carr et al., 2002) is obtained by substituting $C = C_+ = C_-$, $G = \lambda_-$, $M = \lambda_+$, and $Y = \alpha_+ = \alpha_-$. 

3. The NTS distribution was originally ob¬ 
tained using a time-changed Brownian mo¬ 
tion with a tempered stable subordinator by 
Barndorff-Nielsen and Levendorskii (2001). 
Later, Kim, Rachev, Chung, and Bianchi 
(2008c) define the NTS distribution by the 
exponential tilting for the symmetric MTS 
distribution. 

4. The NIG distribution has been used for finan¬ 
cial modeling by Barndorff-Nielsen (1998, 
1997) and Rydberg (1997). 





5. More details about the calculation can be 
found in Samorodnitsky and Taqqu (1994) 
and Sato (1999). 

6. The tempered stable distribution has been generalized by Rosinski (2007) and Bianchi et al. (2010). Rosinski (2007) defined the tempering function as the completely monotone function. The complete monotonicity of the tempering function $q(x)$ means that $(-1)^n \frac{d^n}{dx^n} q(x) > 0$ for all $n = 0, 1, 2, \ldots$ and $x \in \mathbb{R}$ with $x \neq 0$. The CTS and the KRTS distributions are included in Rosinski's generalization. In Bianchi et al. (2010), the tempering function is defined by the positive definite radial function. The RDTS and the MTS distributions are subclasses of the class of the TID distributions. 
Abstract: Fat-tailed laws have been found in many economic variables. Fully approximating a finite 
economic system with fat-tailed laws depends on an accurate statistical analysis of the phenomena, 
but also on a number of the theoretical implications of subexponentiality and scaling. Modeling 
financial variables with stable laws implies the assumption of infinite variance, which seems to 
contradict empirical observations. Nevertheless, scaling laws might still be an appropriate modeling 
paradigm given the complex interaction of distributional shape and correlations in price processes. 
They might help in understanding not only the sheer size of economic fluctuations but also the 
complexity of economic cycles. There are applications where scaling laws play a fundamental 
role, in particular in risk management and financial optimization. Ignoring the possibility of large 
deviations would render financial risk management ineffective and dangerous. 


Most models of stochastic processes and time 
series assume that distributions have finite 
mean and finite variance. In this entry we de¬ 
scribe fat-tailed distributions with infinite vari¬ 
ance. Fat-tailed distributions have been found 
in many financial economic variables ranging 
from forecasting returns on financial assets to 
modeling recovery distributions in bankrupt¬ 
cies. They have also been found in numerous 
insurance applications such as catastrophic in¬ 
surance claims and in value-at-risk measures 
employed by risk managers. 

In this entry, we review the related concepts of 
fat-tailed, power-law, and Levy-stable distributions, 
scaling, and self-similarity, as well as explore the 
mechanisms that generate these distributions. 


We discuss the key intuition relative to the ap¬ 
plicability of fat-tailed or scaling processes to 
finance: In a fat-tailed or scaling world (as op¬ 
posed to an ergodic world), the past does not 
offer an exhaustive set of possible configura¬ 
tions. Adopting, as an approximation, a scaling 
description of financial phenomena implies the 
belief that only a small space of possible config¬ 
urations has been explored; vast regions remain 
unexplored. 

We begin with the mathematics of fat-tailed 
processes, followed by a discussion of classical 
extreme value theory for independent and identi¬ 
cally distributed sequences. We then explore the 
consequences of eliminating the assumption of 
independence and discuss different concepts of 
scaling and self-similarity. We will not provide 
a review of the literature on the evidence of 
fat tails in financial markets. For a review, see 
Rachev, Menn, and Fabozzi (2005). 

SCALING, STABLE LAWS, 
AND FAT TAILS 

Let's begin with a review of the different but re¬ 
lated concepts and properties of fat tails, power 
laws, and stable laws. These concepts appear 
frequently in the financial and economic liter¬ 
ature, applied to both random variables and 
stochastic processes. 

Fat Tails 

Consider a random variable $X$. By definition, $X$ is a real-valued function from the set $\Omega$ of the possible outcomes to the set $\mathbb{R}$ of real numbers, such that the set $(X \le x)$ is an event. If $P(X \le x)$ is the probability of the event $(X \le x)$, the function $F(x) = P(X \le x)$ is a well-defined function for every real number $x$. The function $F(x)$ is called the cumulative distribution function, or simply the distribution function, of the random variable $X$. Note that $X$ denotes a function $\Omega \to \mathbb{R}$, $x$ is a real variable, and $F(x)$ is an ordinary real-valued function that assumes values in the interval [0,1]. If the function $F(x)$ admits a derivative 

$$f(x) = \frac{dF(x)}{dx}$$

the function $f(x)$ is called the probability density of the random variable $X$. The function $\bar{F}(x) = 1 - F(x)$ is the tail of the distribution $F(x)$. The function $\bar{F}(x)$ is called the survival function. 

Fat tails are somewhat arbitrarily defined. In¬ 
tuitively, a fat-tailed distribution is a distribu¬ 
tion that has more weight in the tails than some 
reference distribution. The exponential decay 
of the tail is generally assumed as the border¬ 
line separating fat-tailed from light-tailed dis¬ 
tributions. In the literature, distributions with 


a power-law decay of the tails are referred to 
as heavy-tailed distributions. It is sometimes as¬ 
sumed that the reference distribution is Gaus¬ 
sian (i.e., normal), but this is unsatisfactory; it 
implies, for instance, that exponential distribu¬ 
tions are fat-tailed because Gaussian tails decay 
as the square of an exponential and thus faster 
than an exponential. 

These characterizations of fat-tailedness (or 
heavy-tailedness) are not convenient from a 
mathematical and statistical point of view. It 
would be preferable to define fat-tailedness in 
terms of a function of some essential property 
that can be associated to it. Several propos¬ 
als have been advanced. Widely used defini¬ 
tions focus on the moments of the distribution. 
Definitions of fat-tailedness based on a single 
moment focus either on the second moment, 
the variance, or the kurtosis, defined as the 
fourth moment divided by the square of the 
variance. In fact, a distribution is often considered fat-tailed if its variance is infinite or if it is leptokurtic (i.e., its kurtosis is greater than 3). However, as remarked by Bryson (1982), definitions of this type are too crude and should be replaced by more complete descriptions of tail behavior. 

Others consider a distribution fat-tailed if all its exponential moments are infinite, $E[e^{sX}] = \infty$ for every $s > 0$. This condition implies that the moment-generating function does not exist. Some suggest weakening this condition, defining fat-tailed distributions as those distributions that do not have a finite exponential moment of first order. Exponential moments are particularly important in finance and economics when the logarithms of variables, for instance logprices, are the primary quantity to be modeled (see note 1). 

Fat-tailedness has a consequence of practical 
importance: The probability of extremal events 
(i.e., the probability that the random variable 
assumes large values) is much higher than in 
the case of normal distributions. A fat-tailed 
distribution assigns higher probabilities to ex¬ 
tremal events than would a normal distribution. 
For instance, a six-sigma event (i.e., a realized 
value of a random variable whose difference 
from the mean is six times the size of the stan¬ 
dard deviation) has a near zero probability in 
a Gaussian distribution but might have a non- 
negligible probability in fat-tailed distributions. 
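
To put numbers on the six-sigma comparison, the sketch below evaluates the two-sided probability of a deviation larger than six standard deviations for a Gaussian variable and for a Student's t variable with 3 degrees of freedom. The choice of the t distribution as the fat-tailed example is an assumption made for illustration (it has finite variance, equal to 3, so "six sigma" is well defined); its distribution function is available in closed form. The function names are hypothetical.

```python
import math


def normal_two_sided_tail(k):
    """P(|X - mean| > k * std) for a Gaussian variable X."""
    return math.erfc(k / math.sqrt(2.0))


def student_t3_cdf(t):
    """Distribution function of Student's t with 3 degrees of freedom."""
    s = t / math.sqrt(3.0)
    return 0.5 + (math.atan(s) + s / (1.0 + s * s)) / math.pi


def student_t3_two_sided_tail(k):
    """P(|T| > k * std) for Student's t with 3 d.o.f., whose std is sqrt(3)."""
    x = k * math.sqrt(3.0)
    return 2.0 * (1.0 - student_t3_cdf(x))


gaussian = normal_two_sided_tail(6.0)
fat_tailed = student_t3_two_sided_tail(6.0)
ratio = fat_tailed / gaussian
print(gaussian, fat_tailed, ratio)
```

Under these assumptions the Gaussian probability is of the order of one in a billion, while the t probability is of the order of one in a thousand, so the ratio spans several orders of magnitude.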

The notion of fat-tailedness can be made 
quantitative as different distributions have dif¬ 
ferent degrees of fat-tailedness. The degree of 
fat-tailedness dictates the weight of the tails 
and thus the probability of extremal events. Ex¬ 
treme value theory attempts to estimate the en¬ 
tire tail region, and therefore the degree of fat- 
tailedness, from a finite sample. A number of 
indicators for evaluating the size of extremal 
events have been proposed; among these is the 
extremal claim index proposed in Embrechts, 
Kluppelberg, and Mikosch (1999), which plays 
an important role in risk management. 


The Class $\mathcal{L}$ of Fat-Tailed 
Distributions 

Many important classes of fat-tailed distributions have been defined; each is characterized by special statistical properties that are important in given application domains. We will introduce a number of such classes in order of inclusion, starting from the class with the broadest membership: the class $\mathcal{L}$, which is defined as follows. Suppose that $F$ is a distribution function defined in the domain $(0, \infty)$ with $F < 1$ in the entire domain (i.e., $F$ is the distribution function of a positive random variable with a tail that never decays to zero). It is said that $F \in \mathcal{L}$ if, for any $y > 0$, the following property holds: 

$$\lim_{x \to \infty} \frac{\bar{F}(x - y)}{\bar{F}(x)} = 1, \quad \forall y > 0$$

We can rewrite the above property in an equivalent (and perhaps more intuitive from the probabilistic point of view) way. Under the same assumptions as above, it is said that, given a positive random variable $X$, its distribution function $F \in \mathcal{L}$ if the following property holds for any $y > 0$: 

$$\lim_{x \to \infty} P(X > x + y \mid X > x) = \lim_{x \to \infty} \frac{\bar{F}(x + y)}{\bar{F}(x)} = 1, \quad \forall y > 0$$

Intuitively, this second property means that if it is known that a random variable exceeds a given value $x$, then it will exceed any bigger value $x + y$ with probability tending to one as the value $x$ tends to infinity. Some authors define a distribution as being heavy-tailed if it satisfies this property (see note 2). 
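
The conditional-exceedance property is easy to tabulate. The sketch below contrasts a Pareto-type tail $(1 + x)^{-\alpha}$ with an exponential tail $e^{-x}$; both tails, the parameter values, and the function names are illustrative assumptions. For the power-law tail the ratio $\bar{F}(x + y)/\bar{F}(x)$ climbs toward 1 as $x$ grows, while for the exponential tail it stays at $e^{-y}$ whatever the value of $x$ (the memoryless property), so the exponential distribution does not satisfy the limit.

```python
import math


def pareto_tail(x, alpha=1.5):
    """Power-law survival function (1 + x)^(-alpha), x >= 0."""
    return (1.0 + x) ** (-alpha)


def exponential_tail(x, rate=1.0):
    """Exponential survival function exp(-rate * x), x >= 0."""
    return math.exp(-rate * x)


def conditional_exceedance(tail, x, y):
    """P(X > x + y | X > x) = tail(x + y) / tail(x)."""
    return tail(x + y) / tail(x)


y = 5.0
thresholds = (10.0, 100.0, 500.0)
pareto_ratios = [conditional_exceedance(pareto_tail, x, y) for x in thresholds]
exponential_ratios = [conditional_exceedance(exponential_tail, x, y) for x in thresholds]
print(pareto_ratios)       # increases toward 1
print(exponential_ratios)  # stays at exp(-5) for every threshold
```

The thresholds are kept moderate because the exponential tail underflows to zero in floating point for very large arguments.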

It can be demonstrated that if a distribution $F(x) \in \mathcal{L}$, then it has the following properties: 

• Infinite exponential moments of every order: $E[e^{sX}] = \infty$ for every $s > 0$ 

• $\lim_{x \to \infty} \bar{F}(x)\,e^{\lambda x} = \infty$, $\forall \lambda > 0$ 

As distributions in class $\mathcal{L}$ have infinite exponential moments of every order, they satisfy one of the previous definitions of fat-tailedness. However, they might have finite or infinite mean and variance. 

The class $\mathcal{L}$ is in fact quite broad. It includes, in particular, the two classes of subexponential distributions and distributions with regularly varying tails that are discussed in the following sections. 


Subexponential Distributions 

A class of fat-tailed distributions, widely used 
in insurance and telecommunications, is the 
class S of subexponential distributions. Introduced 
by Chistyakov (1964), subexponential distribu¬ 
tions can be characterized by two equivalent 
properties: (1) the convolution closure property 
of the tails and (2) the property of the sums. 3 

The convolution closure property of the tails prescribes that the shape of the tail is preserved after the summation of identical and independent copies of a variable. This property asserts that, for $x \to \infty$, the tail of a sum of independent and identical variables has the same shape as the tail of the variable itself. As the distribution of a sum of $n$ independent variables is the $n$-convolution of their distributions, the convolution closure property can be written as 

$$\lim_{x \to \infty} \frac{\overline{F^{n*}}(x)}{\bar{F}(x)} = n$$

Note that Gaussian distributions do not have this property although the sum of independent Gaussian distributions is again a Gaussian distribution. Subexponential distributions can be characterized by another important (and perhaps more intuitive) property, which is equivalent to the convolution closure property: In a sum of $n$ variables, the largest value will be of the same order of magnitude as the sum itself. For any $n$, define 

$$S_n(X) = \sum_{i=1}^{n} X_i$$

as a sum of independent and identical copies of a variable $X$ and call $M_n$ their maximum. In the limit of large $x$, the probability that the sum exceeds $x$ equals the probability that the largest summand exceeds $x$: 

$$\lim_{x \to \infty} \frac{P(S_n > x)}{P(M_n > x)} = 1$$
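
Both limits can be examined for $n = 2$ without simulation. The sketch below assumes a Pareto-type tail $(1 + x)^{-\alpha}$ with $\alpha = 1.5$ and computes $P(X_1 + X_2 > x)$ by numerical integration, using the decomposition $P(S_2 > x) = \bar{F}(x/2)^2 + 2\int_0^{x/2} \bar{F}(x - y) f(y)\,dy$ for positive IID variables. The tail, the midpoint rule, the step count, and all names are illustrative assumptions.

```python
def pareto_tail(x, alpha):
    """Survival function P(X > x) = (1 + x)^(-alpha) for x >= 0."""
    return (1.0 + x) ** (-alpha)


def pareto_density(x, alpha):
    """Density alpha * (1 + x)^(-alpha - 1) for x >= 0."""
    return alpha * (1.0 + x) ** (-alpha - 1.0)


def tail_of_sum_of_two(x, alpha, steps=200_000):
    """P(X1 + X2 > x) for two IID copies, by the midpoint rule.

    Uses P(S > x) = P(X > x/2)^2 + 2 * int_0^{x/2} P(X > x - y) f(y) dy.
    """
    h = 0.5 * x / steps
    integral = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        integral += pareto_tail(x - y, alpha) * pareto_density(y, alpha)
    return pareto_tail(0.5 * x, alpha) ** 2 + 2.0 * h * integral


def tail_of_max_of_two(x, alpha):
    """P(max(X1, X2) > x) = 1 - (1 - t)^2 = 2t - t^2 with t = P(X > x)."""
    t = pareto_tail(x, alpha)
    return 2.0 * t - t * t


alpha = 1.5
ratios = {}
for x in (20.0, 2000.0):
    s = tail_of_sum_of_two(x, alpha)
    # (convolution ratio, expected to approach n = 2; sum-versus-max ratio, expected to approach 1)
    ratios[x] = (s / pareto_tail(x, alpha), s / tail_of_max_of_two(x, alpha))
print(ratios)
```

At the smaller threshold both ratios are still visibly above their limits; at the larger one they are close to 2 and 1, which illustrates that the statements are asymptotic in $x$.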

The class $S$ of subexponential distributions is a proper subset of the class $\mathcal{L}$. Every subexponential distribution belongs to the class $\mathcal{L}$ while it can be demonstrated (but this is not trivial) that there are distributions that belong to the class $\mathcal{L}$ but not to the class $S$. Distributions that have both properties are called subexponential as it can be demonstrated that, as all distributions in $\mathcal{L}$, they satisfy the property: 

$$\lim_{x \to \infty} \bar{F}(x)\,e^{\lambda x} = \infty, \quad \forall \lambda > 0$$

Note, however, that the class of distributions that satisfies the latter property is broader than the class of subexponential distributions; this is because the former includes, for instance, the class $\mathcal{L}$ (see note 4). 

Subexponential distributions do not have finite exponential moments of any order, that is, $E[e^{sX}] = \infty$ for every $s > 0$. They may or may not have a finite mean and/or a finite variance. Consider, in fact, that the class of subexponential distributions includes both Pareto and Weibull distributions. The former have infinite variance but might have finite or infinite mean depending on the index; the latter have finite moments of every order (see below). 

The key indicators of subexponentiality are 
(1) the equivalence in the distribution of the 
tail between a variable and a sum of indepen¬ 
dent copies of the same variable and (2) the fact 
that a sum is dominated by its largest term. The 
importance of the largest terms in a sum can be 
made more quantitative using measures such as 
the large claims index introduced in Embrechts, 
Kluppelberg, and Mikosch (1999) that quanti¬ 
fies the ratio between the largest p terms in a 
sum and the entire sum. 

The class of subexponential distributions is quite large. It includes not only Pareto and stable distributions but also log-gamma, lognormal, Benktander, Burr, and Weibull distributions. Pareto distributions and stable distributions are a particularly important subclass of subexponential distributions; these will be described in some detail below. 


Power-Law Distributions 

Power-law distributions are a particularly important subset of subexponential distributions. Their tails follow approximately an inverse power law, decaying as $x^{-\alpha}$. The exponent $\alpha$ is called the tail index of the distribution. To express formally the notion of approximate power-law decay, we need to introduce the class $\mathcal{R}(\alpha)$, equivalently written as $\mathcal{R}_\alpha$, of regularly varying functions. 

A positive function $f$ is said to be regularly varying with index $\alpha$, or $f \in \mathcal{R}(\alpha)$, if the following condition holds for every $t > 0$: 

$$\lim_{x \to \infty} \frac{f(tx)}{f(x)} = t^{\alpha}$$

A function $f \in \mathcal{R}(0)$ is called slowly varying. It can be demonstrated that a regularly varying function $f(x)$ of index $\alpha$ admits the representation $f(x) = x^{\alpha} l(x)$ where $l(x)$ is a slowly varying function. 
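
A small numerical experiment makes the definition concrete. The sketch below takes $f(x) = x^{\rho}\log x$ with $\rho = -1.5$, an illustrative choice in which $\log x$ plays the role of the slowly varying factor, and tracks how far the ratio $f(tx)/f(x)$ is from $t^{\rho}$ as $x$ grows. The function names and the values of $t$ and $x$ are assumptions made for this example.

```python
import math


def f(x, index=-1.5):
    """x**index * log(x): regularly varying, since log is slowly varying."""
    return x ** index * math.log(x)


def scaling_ratio(func, t, x):
    """f(t*x) / f(x), which should approach t**index as x grows."""
    return func(t * x) / func(x)


t, index = 2.0, -1.5
relative_errors = [
    abs(scaling_ratio(f, t, x) / t ** index - 1.0)
    for x in (1e2, 1e10, 1e100)
]
print(relative_errors)
```

The relative error equals $\log t / \log x$ here, so it decays only logarithmically: the slowly varying factor drops out of the limit, but it can remain noticeable over any practical range of $x$.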
A distribution $F$ is said to have a regularly varying tail if the following property holds: 

$$\bar{F}(x) = x^{-\alpha} l(x)$$

where $l$ is a slowly varying function. An example of a distribution with a regularly varying tail is Pareto's law. The latter can be written in various ways, including the following: 

$$\bar{F}(x) = P(X > x) = \frac{c}{c + x^{\alpha}} \quad \text{for } x > 0$$

Power-law distributions are thus distributions with regularly varying tails. It can be demonstrated that they satisfy the convolution closure property of the tail. The distribution of the sum of $n$ independent variables of tail index $\alpha$ is a power-law distribution of the same index $\alpha$. Note that this property holds in the limit for $x \to \infty$. Distributions with regularly varying tails are therefore a proper subset of subexponential distributions. 

Being subexponential, power laws have all the general properties of fat-tailed distributions and some additional ones. One particularly important property of distributions with regularly varying tails, valid for every tail index, is the rank-size order property. Suppose that samples from a power law of tail index $\alpha$ are ordered by size, and call $S_r$ the size of the $r$th sample. One then finds that the law 

$$S_r = a\,r^{-1/\alpha}$$

is approximately verified. The well-known Zipf's law is an example of this rank-size ordering. Zipf's law states that the size of an observation is inversely proportional to its rank. For example, the frequency of words in an English text is inversely proportional to their rank. The same is approximately valid for the size of U.S. cities. 

Many properties of power-law distributions are distinctly different in the three following ranges of $\alpha$: $0 < \alpha \le 1$, $1 < \alpha \le 2$, $\alpha > 2$. The threshold $\alpha = 2$ for the tail index is important as it marks the limit of applicability of the standard central limit theorem (CLT); the threshold $\alpha = 1$ is important as it separates variables with a finite mean from those with infinite mean. Let's take a closer look at the law of large numbers and the CLT. 


The Law of Large Numbers and the 
Central Limit Theorem 

There are four basic versions of the law of large numbers (LLN), two weak laws of large numbers (WLLN), and two strong laws of large numbers (SLLN). 

The two versions of the WLLN are formulated as follows. 

1. Suppose that the variables $X_i$ are IID with finite mean $E[X_i] = E[X] = \mu$. Under this condition it can be demonstrated that the empirical average tends to the mean in probability: 

$$\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow[n \to \infty]{P} E[X] = \mu$$

2. If the variables are only independently distributed (ID) but have finite means and variances $(\mu_i, \sigma_i^2)$, then the following relationship holds: 

$$\frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow[n \to \infty]{P} \frac{\sum_{i=1}^{n} E[X_i]}{n} = \frac{\sum_{i=1}^{n} \mu_i}{n}$$

In other words, the empirical average of a sequence of finite-mean finite-variance variables tends to the average of the means. 

The two versions of the SLLN are formulated as follows. 

1. The empirical average of a sequence of IID variables $X_i$ tends almost surely to a constant $a$ if and only if the expected value of the variables is finite. In addition, the constant $a$ is equal to $\mu$. Therefore, if and only if $|E[X_i]| = |E[X]| = |\mu| < \infty$ the following relationship holds: 

$$\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow[n \to \infty]{A.S.} E[X] = \mu$$

where convergence is in the sense of almost sure convergence. 

2. If the variables $X_i$ are only independently distributed (ID) but have finite means and variances $(\mu_i, \sigma_i^2)$ and 

$$\lim_{n \to \infty} \frac{1}{n^2}\sum_{i=1}^{n} \sigma_i^2 < \infty$$

then the following relationship holds: 

$$\frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow[n \to \infty]{A.S.} \frac{\sum_{i=1}^{n} E[X_i]}{n} = \frac{\sum_{i=1}^{n} \mu_i}{n}$$

Suppose the variables are IID. If the scaling factor $n$ is replaced with $\sqrt{n}$, then the limit relation no longer holds as the normalized sum 

$$\frac{\sum_{i=1}^{n} X_i}{\sqrt{n}}$$

diverges. However, if the variables have finite second-order moments, the classical version of the CLT can be demonstrated. In fact, under the assumption that both first- and second-order moments are finite, it can be shown that 

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} \Phi, \qquad S_n = \sum_{i=1}^{n} X_i$$

where $\mu$, $\sigma$ are respectively the expected value and standard deviation of $X$, and $\Phi$ the standard normal distribution. 
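
The classical CLT can be illustrated with a short simulation. The sketch below is an illustration under assumptions, not a proof: it uses Uniform(0, 1) summands (finite mean 1/2 and variance 1/12), a fixed seed, and arbitrary choices of $n = 30$ summands and 5,000 replications. It standardizes each sum as $(S_n - n\mu)/(\sigma\sqrt{n})$ and compares the sample mean, the sample variance, and the share of values within $\pm 1.96$ with the standard normal values 0, 1, and about 0.95.

```python
import math
import random


def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma * sqrt(n)) for S_n a sum of n Uniform(0, 1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))


rng = random.Random(12345)  # fixed seed so that the run is reproducible
samples = [standardized_sum(30, rng) for _ in range(5000)]

mean = sum(samples) / len(samples)
variance = sum((z - mean) ** 2 for z in samples) / len(samples)
coverage = sum(abs(z) <= 1.96 for z in samples) / len(samples)
print(mean, variance, coverage)
```

Replacing the uniform draws with draws from a power law of tail index below 2 would break the $\sqrt{n}$ normalization, since the variance $\sigma^2$ used in the standardization would then be infinite.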

If the tail index $\alpha > 1$, variables have finite expected value and the SLLN holds. If the tail index $\alpha > 2$, variables have finite variance and the CLT in the previous form holds. If the tail index $\alpha \le 2$, then variables have infinite variance: The CLT in the previous form does not hold. In fact, variables with $\alpha \le 2$ belong to the domain of attraction of a stable law of index $\alpha$. This means that a sequence of properly normalized and centered sums tends to a stable distribution with infinite variance. In this case, the CLT takes the form 

$$\frac{S_n - n\mu}{n^{1/\alpha}} \xrightarrow{D} G_\alpha, \quad \text{if } 1 < \alpha \le 2$$

$$\frac{S_n}{n^{1/\alpha}} \xrightarrow{D} G_\alpha, \quad \text{if } 0 < \alpha \le 1$$

where $G_\alpha$ are stable distributions as defined below. Note that the case $\alpha = 2$ is somewhat special: variables with this tail index have infinite variance but fall nevertheless in the domain of attraction of a normal variable, that is, $G_2$. Below the threshold 1, distributions have neither finite variance nor finite mean. There is a sharp change in the normalization behavior at this tail-index threshold. 

Stable Distributions 

Stable distributions are not, in their generality, a subset of fat-tailed distributions, as they include the normal distribution. There are different, equivalent ways to define stable distributions. Let's begin with a key property: the equality in distribution between a random variable and the (normalized) independent sum of any number of identical replicas of the same variable. This is a different property than the closure property of the tail insofar as (1) it involves not only the tail but the entire distribution and (2) equality in distribution means that distributions have the same functional form but, possibly, with different parameters. Normal distributions have this property: The sum of two or more independent normally distributed variables is again a normally distributed variable. But this property holds for a more general class of distributions called stable distributions or Lévy-stable distributions (see note 5). Normal distributions are thus a special type of stable distributions.

The above can be formalized as follows: Stable distributions can be defined as those distributions for which the following identity in distribution holds for any number $n \ge 2$:

$$\sum_{i=1}^{n} X_i \stackrel{d}{=} C_n X + D_n$$

where $X_i$ are identical independent copies of $X$ and the $C_n > 0$, $D_n$ are constants. Alternatively, the same property can be expressed stating that stable distributions are distributions for which the following identity in distribution holds:

$$C_1 X_1 + C_2 X_2 \stackrel{d}{=} C X + D$$

for any positive constants $C_1$, $C_2$, where $X_1$, $X_2$ are independent copies of $X$ and $C > 0$, $D$ are suitable constants.

Stable distributions are also characterized by another property that might be used in defining them: a stable distribution has a domain of attraction (i.e., it is the limit in distribution of a normalized and centered sum of identical and independent variables). Stable distributions coincide with the distributions that have a nonempty domain of attraction.

Except in the special cases of Gaussian ($\alpha = 2$), symmetric Cauchy ($\alpha = 1$, $\beta = 0$), and stable inverse Gaussian ($\alpha = 1/2$, $\beta = 1$) distributions, stable distributions cannot be written as simple formulas; formulas have been discovered but are not simple. However, stable distributions can be characterized in a simple way through their characteristic function, the Fourier transform of the distribution function. In fact, this function can be written as

$$\Phi_X(t) = \exp\left\{ i\gamma t - c|t|^{\alpha}\left[1 - i\beta\,\mathrm{sign}(t)\,z(t,\alpha)\right] \right\}$$

where $t \in \mathbb{R}$, $\gamma \in \mathbb{R}$, $c > 0$, $\alpha \in (0, 2]$, $\beta \in [-1, 1]$, and

$$z(t,\alpha) = \begin{cases} \tan\dfrac{\pi\alpha}{2} & \text{if } \alpha \ne 1 \\[4pt] -\dfrac{2}{\pi}\log|t| & \text{if } \alpha = 1 \end{cases}$$

It can be shown that only distributions with this characteristic function are stable distributions (i.e., they are the only distributions closed under summation). A stable law is characterized by four parameters: $\alpha$, $\beta$, $c$, and $\gamma$. Normal distributions correspond to the parameters $\alpha = 2$, $\beta = 0$, $\gamma = 0$.
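The characteristic function above is easy to evaluate numerically. The following is a minimal sketch, assuming the parameterization $(\alpha, \beta, c, \gamma)$ given in the text and the convention $z(t, 1) = -(2/\pi)\log|t|$; the function name `stable_cf` is hypothetical and introduced only for illustration. It checks two special cases: $\alpha = 2$, $\beta = 0$ reduces to a Gaussian characteristic function, and $\alpha = 1$, $\beta = 0$ to the symmetric Cauchy one.

```python
import cmath
import math


def stable_cf(t, alpha, beta, c, gamma):
    """Characteristic function of a stable law at the point t.

    Assumed parameterization (illustrative): tail index alpha in (0, 2],
    skewness beta in [-1, 1], scale c > 0, location gamma.
    """
    if t == 0:
        return 1 + 0j
    if alpha == 1:
        z = -(2.0 / math.pi) * math.log(abs(t))
    else:
        z = math.tan(math.pi * alpha / 2.0)
    sign = 1.0 if t > 0 else -1.0
    exponent = 1j * gamma * t - c * abs(t) ** alpha * (1 - 1j * beta * sign * z)
    return cmath.exp(exponent)


# alpha = 2, beta = 0, c = 1/2: characteristic function exp(-t^2 / 2) of N(0, 1)
print(stable_cf(1.3, 2.0, 0.0, 0.5, 0.0), math.exp(-0.5 * 1.3 ** 2))
# alpha = 1, beta = 0, c = 1: characteristic function exp(-|t|) of the Cauchy law
print(stable_cf(-0.7, 1.0, 0.0, 1.0, 0.0), math.exp(-0.7))
```

With $\alpha = 2$ the variance of the resulting normal law is $2c$, which is why $c = 1/2$ is used for the standard normal case.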

Even if stable distributions cannot be written as simple formulas, the asymptotic shape of their tails can be written in a simple way. In fact, with the exception of Gaussian distributions, the tails of stable laws obey an inverse power law with exponent $\alpha$ (between 0 and 2). Normal distributions are stable but are an exception as their tails decay exponentially.

For stable distributions, the CLT holds in the same form as for inverse power-law distributions. In addition, the distributions in the domain of attraction of a stable law of index $\alpha < 2$ are characterized by the same tail index. This means that a distribution $G$ belongs to the domain of attraction of a stable law of parameter $\alpha < 2$ if and only if its tail decays as an inverse power law with exponent $\alpha$. In particular, Pareto's law belongs to the domain of attraction of stable laws of the same tail index.

EXTREME VALUE THEORY 
FOR IID PROCESSES 

In this section we introduce a number of important probabilistic concepts that form the conceptual basis of extreme value theory (EVT). The objective of EVT is to estimate the entire tail of a distribution from a finite sample by fitting to an appropriate distribution those values of the sample that fall in the tail. Two concepts play a crucial role in EVT: (1) the behavior of the upper order statistics (i.e., the largest k values in a sample) and, in particular, of the sample maxima; and (2) the behavior of the points where samples exceed a given threshold. We will explore the limit distributions of maxima and the distribution of the points of exceedances of a high threshold. Based on these concepts a number of estimators of the tail index in sequences of independent and identically distributed (IID) variables are presented.

Maxima 

In the previous sections we explored the behavior of sums. The key result of the theory of sums is that the behavior of sums simplifies in the limit of properly scaled and centered infinite sums regardless of the shape of individual summands. If sums converge, their limit distributions can only be stable distributions. In addition, the normalized sums of finite-mean, finite-variance variables always converge to a normal variable.

A parallel theory can be developed for maxima, informally defined as the largest value in a sample. The limit distribution of maxima, if it exists, belongs to one of three possible distributions: Fréchet, Weibull, or Gumbel. This result forms the basis of classical EVT. Each limit distribution of maxima has its own maximum domain of attraction. In addition, limit laws are max-stable (i.e., they are closed with respect to maxima). However, the behavior of maxima is less robust than the behavior of sums. Maxima do not converge to limit distributions for important classes of distributions, such as Poisson or geometric distributions.

Figure 1 The Distribution of the Maxima of a Normal Variable

Consider a sequence of independent variables $X_i$ with common, nondegenerate distribution $F$ and the maxima of samples extracted from this sequence:

$$M_1 = X_1, \qquad M_n = \max(X_1, \ldots, X_n)$$

The maxima $M_n$ form a new sequence of random variables, which are not, however, independent.

As the variables of the sequence $X_i$ are assumed to be independent, the distribution $F_n$ of the maxima $M_n$ can be immediately written down:

$$F_n(x) = P(X_1 \le x \wedge \ldots \wedge X_n \le x) = F^n(x)$$

where $\wedge$ is the logical symbol for and.

If the distribution $F$, which is a nondecreasing function, reaches 1 at a finite point $x_F$, that is, if $x_F = \sup\{x : F(x) < 1\} < \infty$, then

$$\lim_{n \to \infty} P(M_n \le x) = \lim_{n \to \infty} F^n(x) = 0, \quad \text{for } x < x_F$$

If $x_F$ is finite,

$$P(M_n \le x) = F^n(x) = 1, \quad \text{for } x \ge x_F$$

The point $x_F$ is called the right endpoint of the distribution $F$.

Figure 1 illustrates the behavior of maxima in the case of a normal distribution. Given a normal distribution with mean zero and variance one, 100,000 samples of 20 elements each are selected. For each sample, the maximum is chosen. The distribution of the maxima and the empirical distribution of independent draws from the same normal are illustrated in the figure.
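The experiment behind Figure 1 can be reproduced along the following lines. This is only a sketch of the simulation described in the text, not the code used for the figure; the function name `simulate_block_maxima` and the seed are assumptions made for illustration, and a smaller number of samples is used to keep the run short.

```python
import random
import statistics


def simulate_block_maxima(n_samples, block_size, seed=0):
    """Draw n_samples blocks of standard normal variables and keep each block's maximum."""
    rng = random.Random(seed)
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(block_size))
        for _ in range(n_samples)
    ]


maxima = simulate_block_maxima(n_samples=20_000, block_size=20)
# The maxima are shifted to the right and less dispersed than the parent normal.
print("mean of maxima:", statistics.mean(maxima))
print("std of maxima:", statistics.stdev(maxima))
```

A histogram of `maxima` plotted against a histogram of plain normal draws gives a picture of the kind described for Figure 1.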

A deeper understanding of the behavior of maxima can be obtained considering sequences of normalized and centered maxima. Consider the following sequence: $c_n^{-1}(M_n - d_n)$, where $c_n > 0$, $d_n \in \mathbb{R}$ are constants.

A fundamental result on the behavior of maxima is the Fisher-Tippett theorem, which can be stated as follows. Consider a sequence of IID variables $X_i$ and the associated sequence of maxima $M_n$. If there exist two sequences of constants $c_n > 0$, $d_n \in \mathbb{R}$ and a nondegenerate distribution function $H$ such that

$$c_n^{-1}(M_n - d_n) \xrightarrow{D} H$$

then $H$ is one of the following distributions:

Fréchet: $$\Phi_\alpha(x) = \begin{cases} 0 & x \le 0 \\ \exp(-x^{-\alpha}) & x > 0 \end{cases} \qquad \alpha > 0$$

Weibull: $$\Psi_\alpha(x) = \begin{cases} \exp[-(-x)^{\alpha}] & x \le 0 \\ 1 & x > 0 \end{cases} \qquad \alpha > 0$$

Gumbel: $$\Lambda(x) = \exp\{-e^{-x}\}, \quad x \in \mathbb{R}$$

The limit distribution $H$ is unique, in the sense that different sequences of normalizing constants determine the same type of distribution (i.e., the same distribution up to location and scale).

Figure 2 The Distribution of Fréchet, Gumbel, and Weibull

The three above distributions (Fréchet, Weibull, and Gumbel) are called standard extreme value distributions. They are continuous functions for every real x. Random variables distributed according to one of the extreme value distributions are called extremal random variables.


As an example, consider a standard exponential variable $X$. As $F(x) = P(X \le x) = 1 - e^{-x}$, $x > 0$, the distribution of the maxima is $P(M_n \le x) = F^n(x) = (1 - e^{-x})^n$, $x > 0$. If we choose $d_n = \ln n$, we can write: $P(M_n - d_n \le x) = P(M_n \le \ln n + x) = (1 - n^{-1}e^{-x})^n$, $x > -\ln n$. For any given $x$, $(1 - n^{-1}e^{-x})^n \to \exp(-e^{-x})$, which shows that the maxima of standard exponential variables centered with $d_n = \ln n$ tend to a Gumbel distribution. Figure 2 illustrates the three distributions: Fréchet, Gumbel, and Weibull.
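The convergence in this example can be checked numerically. The sketch below evaluates the exact distribution of the centered maximum, $(1 - n^{-1}e^{-x})^n$, and compares it with the Gumbel limit $\exp(-e^{-x})$ on a grid; the helper names and the grid are illustrative choices.

```python
import math


def exp_maxima_cdf(x, n):
    """P(M_n - ln n <= x) for the maximum of n IID standard exponential variables."""
    if x <= -math.log(n):
        return 0.0
    return (1.0 - math.exp(-x) / n) ** n


def gumbel_cdf(x):
    """Standard Gumbel distribution function."""
    return math.exp(-math.exp(-x))


def max_gap(n):
    """Largest absolute difference between the two functions on a grid over [-3, 6]."""
    grid = [i / 10.0 for i in range(-30, 61)]
    return max(abs(exp_maxima_cdf(x, n) - gumbel_cdf(x)) for x in grid)


for n in (10, 100, 10_000):
    print(n, max_gap(n))
```

The gap shrinks as $n$ grows, which is the statement that the centered maxima tend to a Gumbel distribution.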

We can now ask if there are conditions on the distribution $F$ that ensure the existence of centering and scaling constants and the convergence to an extreme value distribution. To this end, let's first introduce the concept of the maximum domain of attraction (MDA) of an extreme value distribution $H$, or MDA($H$).

A random variable $X$ is said to belong to the MDA($H$) of the extreme value distribution $H$ if there exist constants $c_n > 0$, $d_n \in \mathbb{R}$ such that

$$c_n^{-1}(M_n - d_n) \xrightarrow{D} H$$

Two distribution functions $F$, $G$ are said to be tail equivalent if they have the same right endpoints and the following condition holds:

$$\lim_{x \uparrow x_F} \frac{\bar F(x)}{\bar G(x)} = c, \quad 0 < c < \infty$$

where $\bar F = 1 - F$ and $\bar G = 1 - G$ denote the tails of the two distributions and $x_F$ is their common right endpoint (the limit is taken for $x \to \infty$ when the right endpoint is infinite).

Tail equivalence is an important concept for characterizing MDAs. In fact, it can be demonstrated that every MDA($H$) is closed with respect to tail equivalence (i.e., if two distribution functions $F$ and $G$ are tail equivalent, $F \in \mathrm{MDA}(H)$ if and only if $G \in \mathrm{MDA}(H)$). Tail equivalence allows for a powerful characterization of the three MDAs.

Let's first define the quantile function. Given a distribution function $F$, the quantile function of $F$, written $F^{\leftarrow}(x)$, is defined as follows:

$$F^{\leftarrow}(x) = \inf\{s \in \mathbb{R} : F(s) \ge x\}, \quad 0 < x < 1$$

The MDA of the Fréchet Distribution

The Fréchet distribution is written as $\Phi_\alpha(x) = \exp(-x^{-\alpha})$. Let's start by observing that the tail of the Fréchet distribution decays as an inverse power law. In fact, we can write $1 - \Phi_\alpha(x) = 1 - \exp(-x^{-\alpha}) \approx x^{-\alpha}$ for $x \to \infty$.

It can be demonstrated that a distribution function $F$ belongs to the MDA of a Fréchet distribution $\Phi_\alpha(x)$, $\alpha > 0$, if and only if there is a slowly varying function $L$ such that $\bar F(x) = x^{-\alpha}L(x)$, where $\bar F = 1 - F$ is the tail of $F$. In this case, the constants assume the values

$$c_n = (1/\bar F)^{\leftarrow}(n), \quad d_n = 0$$

We can rewrite this condition more compactly as follows:

$$F \in \mathrm{MDA}(\Phi_\alpha) \iff \bar F \in \mathcal{R}_{-\alpha}$$

where $\mathcal{R}_{-\alpha}$ denotes the class of functions that are regularly varying with index $-\alpha$.

From the above definitions it can be demonstrated that the following five distributions belong to the MDA of the Fréchet distribution: (1) Pareto; (2) Cauchy; (3) Burr; (4) stable laws with exponent $\alpha < 2$; and (5) log-gamma distribution.


The MDA of the Weibull Distribution 

The Weibull distribution is written as follows:

$$\Psi_\alpha(x) = \exp[-(-x)^{\alpha}], \quad x \le 0$$

The Weibull and the Fréchet distributions are closely related to each other. In fact, it is clear from the definition that the following relationship holds:

$$\Psi_\alpha(-x^{-1}) = \Phi_\alpha(x), \quad x > 0$$

One can therefore expect that the MDAs of the two distributions are closely related. In fact, it can be demonstrated that a distribution function $F$ belongs to the MDA of a Weibull distribution $\Psi_\alpha$, $\alpha > 0$, if and only if

$$x_F < \infty$$

and

$$\bar F(x_F - x^{-1}) = x^{-\alpha}L(x)$$

where $L$ is a slowly varying function and $\bar F = 1 - F$.

If

$$F \in \mathrm{MDA}(\Psi_\alpha)$$

then

$$c_n^{-1}(M_n - x_F) \xrightarrow{D} \Psi_\alpha$$

The MDA of the Weibull distribution includes important distributions such as the distribution uniform in (0,1), power laws truncated to the right, and beta distributions.

The MDA of the Gumbel Distribution 

The Gumbel distribution is written as $\Lambda(x) = \exp[-\exp(-x)]$. Observe that the Gumbel distribution has exponential tails. This fact can be easily ascertained through Taylor expansion: $1 - \Lambda(x) \approx e^{-x}$ for $x \to \infty$. There is no simple characterization of the MDA of the Gumbel distribution.

The MDA of a Gumbel distribution encompasses a large class of distributions that includes the exponential distribution, the normal distribution, and the lognormal distribution. Though the Gumbel distribution has exponential tails, its MDA includes subexponential distributions such as the Benktander distribution, as explained in Goldie and Resnick (1988).

Max-Stable Distributions 

Stable distributions remain unchanged after summation; max-stable distributions remain unchanged after taking maxima. A nondegenerate random variable $X$ and the associated distribution are called max-stable if, for every $n \ge 2$, there are constants $c_n > 0$, $d_n \in \mathbb{R}$ such that the following identity in distribution is satisfied:

$$\max(X_1, \ldots, X_n) \stackrel{d}{=} c_n X + d_n$$

where $X, X_1, \ldots, X_n$ are IID variables.

It can be demonstrated that the class of max-stable distributions coincides with the class of possible limit laws for normalized and centered maxima. In view of the previous discussions, the max-stable laws are the three possible limit laws: Fréchet, Weibull, and Gumbel.


Generalized Extreme Value 
Distributions 

The three extreme value distributions, Fréchet, Weibull, and Gumbel, can be represented as a one-parameter family of distributions through the standard generalized extreme value distribution (GEV) of Jenkinson and Von Mises. Define the distribution function $H_\xi$ as follows:

$$H_\xi(x) = \begin{cases} \exp[-(1 + \xi x)^{-1/\xi}] & \text{for } \xi \ne 0 \\ \exp(-\exp(-x)) & \text{for } \xi = 0 \end{cases}$$

where $1 + \xi x > 0$. One can see from the definition that $\xi = \alpha^{-1} > 0$ corresponds to the Fréchet distribution, $\xi = 0$ corresponds to the Gumbel distribution, and $\xi = -\alpha^{-1} < 0$ corresponds to the Weibull distribution. We can now introduce the related location-scale dependent family $H_{\xi;\mu,\psi}$ by replacing the argument $x$ with $(x - \mu)/\psi$, with location $\mu \in \mathbb{R}$ and scale $\psi > 0$.
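The GEV family can be written as a single function of the shape parameter. The following sketch assumes the location-scale notation $(\xi, \mu, \psi)$; the name `gev_cdf` and its default arguments are illustrative. Outside the support $1 + \xi z > 0$ the function returns 0 below the left endpoint (Fréchet-type case, $\xi > 0$) and 1 above the right endpoint (Weibull-type case, $\xi < 0$).

```python
import math


def gev_cdf(x, xi, mu=0.0, psi=1.0):
    """Generalized extreme value distribution function H_{xi; mu, psi}(x)."""
    z = (x - mu) / psi
    if xi == 0.0:
        # Gumbel case
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: left of it when xi > 0, right of it when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


alpha = 2.0
x = 0.7
# xi = 1/alpha reproduces the Frechet law evaluated at 1 + x/alpha
print(gev_cdf(x, 1.0 / alpha), math.exp(-(1.0 + x / alpha) ** (-alpha)))
# A very small xi is numerically close to the Gumbel law
print(gev_cdf(x, 1e-6), gev_cdf(x, 0.0))
```

The first comparison illustrates that the correspondence with the standard Fréchet law holds up to an affine change of variable, $H_{1/\alpha}(x) = \Phi_\alpha(1 + x/\alpha)$.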


Order Statistics 

The behavior of order statistics is a useful tool for characterizing fat-tailed distributions. For instance, the famous Zipf's law is an example of the behavior of order statistics. Consider a sample $X_1, \ldots, X_n$ made of $n$ independent draws from the same distribution $F$. Let's arrange the sample in decreasing order:

$$X_{n,n} \le \ldots \le X_{1,n}$$

The random variable $X_{k,n}$ is called the $k$th upper order statistic. It can be demonstrated that the distribution of the $k$th upper order statistic is

$$F_{k,n}(x) = P(X_{k,n} \le x) = \sum_{r=0}^{k-1} \binom{n}{r}\, \bar F^{\,r}(x)\, F^{\,n-r}(x)$$

where $\bar F = 1 - F$.

In addition, if $F$ is continuous, the distribution $F_{k,n}$ has a density with respect to $F$ such that

$$F_{k,n}(x) = \int_{-\infty}^{x} f_{k,n}(z)\, dF(z)$$

where

$$f_{k,n}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, \bar F^{\,k-1}(x)\, F^{\,n-k}(x)$$


The differences between two consecutive variables in an ordered sample, $X_{k,n} - X_{k+1,n}$, are random variables called spacings. In the case of variables with finite right endpoint $x_F$, the zero-th spacing is defined as $X_{0,n} - X_{1,n} = x_F - X_{1,n}$. The distribution of spacings depends on the distribution $F$. For instance, it can be demonstrated that the spacings of an $n$-sample of standard exponential random variables are independent, exponential random variables, the $k$th spacing having mean $1/k$ (with the convention $X_{n+1,n} = 0$). Spacings are a key concept for the definition of the Hill estimator, as explained later in this section.
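The behavior of exponential spacings can be illustrated by simulation. The sketch below assumes standard exponential draws and the convention $X_{n+1,n} = 0$; under these assumptions the $k$th spacing $X_{k,n} - X_{k+1,n}$ is exponential with mean $1/k$, so the rescaled averages printed at the end should all be close to 1. The function name and the seed are hypothetical.

```python
import random


def mean_spacings(n, n_trials, seed=0):
    """Average the spacings X_{k,n} - X_{k+1,n}, k = 1..n, over many exponential n-samples."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(n_trials):
        xs = sorted((rng.expovariate(1.0) for _ in range(n)), reverse=True)
        xs.append(0.0)  # convention X_{n+1,n} = 0
        for k in range(1, n + 1):
            totals[k - 1] += xs[k - 1] - xs[k]
    return [total / n_trials for total in totals]


averages = mean_spacings(n=5, n_trials=20_000)
# k times the mean of the k-th spacing should be close to 1 for every k
print([round(k * m, 3) for k, m in enumerate(averages, start=1)])
```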

Another key concept, which is related to spacings, is that of quantile transformation. Let $X_1, \ldots, X_n$ be IID variables with distribution function $F$ and let $U_1, \ldots, U_n$ be IID variables uniformly distributed on the interval (0,1). Recall that, given a distribution function $F$, the quantile function of $F$, written $F^{\leftarrow}(x)$, is defined as follows:

$$F^{\leftarrow}(x) = \inf\{s \in \mathbb{R} : F(s) \ge x\}, \quad 0 < x < 1$$

It can be demonstrated that the following results hold:

• $F^{\leftarrow}(U_1) \stackrel{d}{=} X_1$

• $(X_{1,n}, \ldots, X_{n,n}) \stackrel{d}{=} [F^{\leftarrow}(U_{1,n}), \ldots, F^{\leftarrow}(U_{n,n})]$

• The random variable $F(X_1)$ has a uniform distribution on (0,1) if and only if $F$ is a continuous function.
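The first of these results is the basis of inverse-transform sampling. As an illustration, the sketch below assumes a Pareto law with distribution function $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$, whose quantile function is $F^{\leftarrow}(u) = (1 - u)^{-1/\alpha}$; the function names and parameter values are illustrative choices.

```python
import random


def pareto_quantile(u, alpha):
    """Quantile function of the Pareto law F(x) = 1 - x**(-alpha), x >= 1."""
    return (1.0 - u) ** (-1.0 / alpha)


def sample_pareto(n, alpha, seed=0):
    """Generate a Pareto sample by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [pareto_quantile(rng.random(), alpha) for _ in range(n)]


sample = sample_pareto(50_000, alpha=2.0)
# Empirical tail frequency P(X > 2) compared with the theoretical value 2**(-alpha)
tail_frequency = sum(x > 2.0 for x in sample) / len(sample)
print(tail_frequency, 2.0 ** -2.0)
```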

To appreciate the importance of the quantile transformation, let's introduce first the notion of empirical distribution function and second the Glivenko-Cantelli theorem. The empirical distribution function $F_n$ of a sample $X_1, \ldots, X_n$ is defined as follows:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$

where $I$ is the indicator function. In other words, for each $x$, the empirical distribution function gives the fraction of samples that are less than or equal to $x$.

The Glivenko-Cantelli theorem provides the theoretical underpinning of nonparametric statistics. It states that, if the samples $X_1, \ldots, X_n$ are independent draws from the distribution $F$, the empirical distribution function $F_n$ tends to $F$ for large $n$ in the sense that

$$\Delta_n = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \xrightarrow{\text{a.s.}} 0, \quad \text{for } n \to \infty$$
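The empirical distribution function and the distance $\Delta_n$ can be computed directly. The sketch below assumes, for simplicity, that $F$ is the uniform distribution on (0,1), so that $F(x) = x$ and the supremum is attained at the jump points of $F_n$; the helper names are hypothetical.

```python
import bisect
import random


def make_ecdf(sample):
    """Return the empirical distribution function F_n of the sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def sup_distance_uniform(sample):
    """sup over x of |F_n(x) - F(x)| when F is the uniform distribution on (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th smallest observation
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
for n in (100, 1_000, 10_000):
    draws = [rng.random() for _ in range(n)]
    print(n, sup_distance_uniform(draws))
```

The printed distances typically shrink roughly like $1/\sqrt{n}$, in line with the almost sure convergence stated by the theorem.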

The quantile transformation tells us that in cases where $F$ is a Pareto distribution, if we approximate $n$ random draws from a uniformly distributed variable as the (suitably rescaled) sequence $1, 2, \ldots, n$, then the corresponding values of the sample $X_1, \ldots, X_n$ will be proportional to

$$\frac{1}{1^{1/\alpha}},\ \frac{1}{2^{1/\alpha}},\ \ldots,\ \frac{1}{n^{1/\alpha}}$$

which is a statement of Zipf's law.

From the quantile transformation, the limit law of the ratio between two successive order statistics can also be inferred. Suppose that an (infinite) population is distributed according to a distribution $F$ with regularly varying tails, $\bar F \in \mathcal{R}_{-\alpha}$. Suppose that $n$ samples are randomly and independently drawn from this distribution and ordered by size: $X_{1,n} \ge X_{2,n} \ge \ldots \ge X_{n,n}$. It can be demonstrated that the following property holds for fixed $k$:

$$\frac{X_{k,n}}{X_{k+1,n}} \xrightarrow{D} U^{-1/(k\alpha)}$$

where $U$ is uniformly distributed on (0,1).

Point Process of Exceedances or 
Peaks over Threshold 

We have now reviewed the behavior of sums, maxima, and upper order statistics of continuous random variables. Yet another approach to EVT is based on point processes; herein we will use point processes only to define the point process of exceedances.

Point processes can be defined in many different ways. To illustrate the mathematics of point processes, let's first introduce the homogeneous Poisson process. A homogeneous Poisson process is defined as a process $N(t)$ that starts at zero, i.e., $N(0) = 0$, and has independent stationary increments. In addition, the random variable $N(t)$ is distributed as a Poisson variable with parameter $\lambda t$. $N(t)$ is therefore a time-dependent discrete variable that can assume nonnegative integer values. Figure 3 illustrates the distribution of a Poisson variable.

A homogeneous Poisson process can also be defined as a random sequence of points on the real line. Consider all discrete sequences of points on the real line separated by random intervals. Intervals are independent random variables with a common exponential distribution. This is the usual definition of a Poisson process. Call $N(t)$ the number of points that fall in the interval $[0, t]$. It can be demonstrated that $N(t)$ is a homogeneous Poisson process according to the previous definition.
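The equivalence of the two definitions can be illustrated by simulation. The sketch below builds the process from independent exponential intervals and counts the points in $[0, t]$; if the construction is correct, the counts should have mean and variance both close to $\lambda t$. The function names, the rate, and the horizon are illustrative assumptions.

```python
import random
import statistics


def poisson_count(rate, horizon, rng):
    """Number of points in [0, horizon] when intervals are IID exponential with the given rate."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return count
        count += 1


def simulate_counts(rate, horizon, n_paths, seed=0):
    rng = random.Random(seed)
    return [poisson_count(rate, horizon, rng) for _ in range(n_paths)]


counts = simulate_counts(rate=2.0, horizon=3.0, n_paths=20_000)
# For a Poisson variable with parameter rate * horizon, mean and variance coincide
print(statistics.mean(counts), statistics.variance(counts))
```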

This latter definition can be generalized to define point processes. Intuitively, a generic point process is a random collection of discrete points in some space. From a mathematical point of view, it is convenient to describe a point process through the distribution of the number of points that fall in an arbitrary set (see note 6). In the case of homogeneous Poisson processes, we consider the number of points that fall in a given interval; for a generic point process, it is convenient to consider a wider class of sets.

Figure 3 Distribution of a Poisson Variable

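Consider a subspace $E$ of a finite dimensional Euclidean space of dimension $n$. Consider also the $\sigma$-algebra $\mathcal{B}$ of the Borel sets generated by open sets in $E$. The space $E$ is called the state space. For each point $x$ in $E$ and for each set $A \in \mathcal{B}$, define the Dirac measure $\varepsilon_x$ as

$$\varepsilon_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

For any given sequence $x_i$, $i \ge 1$, of points in $E$, define the following set function:

$$m(A) = \sum_{i=1}^{\infty} \varepsilon_{x_i}(A) = \mathrm{card}\{i : x_i \in A\}, \quad A \in \mathcal{B}$$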
i =1 

It can be verified that $m$ is a measure on $\mathcal{B}$, called a counting measure. If a counting measure is finite on each compact set, then it is called a point measure. In other words, any given countable sequence in $E$ generates a counting measure on $\mathcal{B}$.

A point process is obtained associating to each family of sets $A_i \in \mathcal{B}$ the joint probability distributions:

$$\Pr\{m(A_i) = m_i,\ i = 1, 2, \ldots, k;\ k = 1, 2, \ldots\}$$

To make this definition mathematically rigorous, a point process can be defined as a measurable map from some probability space to the set of all point measures equipped with an appropriate $\sigma$-algebra. Besides the mathematical details, it should be clear that point processes are defined by the probability distribution of the number of points that fall in each set $A$ of some $\sigma$-algebra. The key ingredients of point processes are (1) counting measures that associate to each set $A$ the number of points of each discrete sequence that falls in $A$ with the additivity restrictions of measures and (2) probability distributions defined over the space of counting measures.

Equipped with the general concept of point processes, we can now define the point process of exceedances. Consider a threshold formed by any real number $u$ and a sequence of random variables $X_i$, $i = 1, 2, \ldots$ The point process of exceedances with state space $E = (0, 1]$ counts the number of instances where the random variables $X_i$ exceed the threshold $u$:

$$N_n(A) = \sum_{i=1}^{n} \varepsilon_{i/n}(A)\, I(X_i > u) = \mathrm{card}\{i : i/n \in A,\ i \le n \text{ and } X_i > u\}$$

Note that in this case the state space specifies the size of the sample.
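For a finite sample the point process of exceedances reduces to a simple count. The following sketch assumes that the set $A$ is an interval $(a, b]$ contained in $(0, 1]$ and that observation $i$ is placed at the rescaled position $i/n$; the function name `exceedance_count` and the toy data are hypothetical.

```python
def exceedance_count(xs, u, a=0.0, b=1.0):
    """Count the indices i with i/n in (a, b] and xs[i-1] > u, where n = len(xs)."""
    n = len(xs)
    return sum(
        1
        for i, x in enumerate(xs, start=1)
        if a < i / n <= b and x > u
    )


data = [0.5, 2.0, 1.5, 3.0]
print(exceedance_count(data, u=1.0))                # exceedances over the whole sample
print(exceedance_count(data, u=1.0, a=0.0, b=0.5))  # exceedances in the first half
print(exceedance_count(data, u=1.0, a=0.5, b=1.0))  # exceedances in the second half
```

The counts over disjoint intervals add up to the count over their union, which is the additivity property of a counting measure.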

Estimation 

In the previous sections we presented some key topics related to the probability structure of the tails of distributions, be they light- or fat-tailed. Let's now turn to the problem of estimation, which is the key practical task. The problem of estimation for EVT is essentially the problem of estimating the tail of a distribution from a finite sample. The key statistical idea of EVT from the point of view of estimation is to use only those sample data that belong to the tail and not the entire sample. This notion has to be made precise by finding criteria that allow one to separate the tail from the bulk of the distribution. Therefore, the estimation problem of EVT can be broken down into three separate subproblems:

• Identify the beginning of the tail.

• Identify the shape of the tail, in particular determine whether it is a power-law tail.

• Estimate the tail parameters, in particular the tail index in the case of a power-law tail.

It turns out that these three problems cannot be easily separated. In fact, there is no reliable constructive theory for solving all these problems automatically. In particular, the choice of the statistical model (i.e., the distribution that best describes data) is a classical problem of formulating and validating a scientific hypothesis in a probabilistic context. However, there are many tools and tests to help the modeler in this endeavor.

The first fundamental tool is the graphical representation of data, in particular the quantile plot or QQ-plot defined as the following set:

$$\left\{ \left( X_{k,n},\ F^{\leftarrow}\!\left( \frac{n - k + 1}{n + 1} \right) \right) : k = 1, 2, \ldots, n \right\}$$


The quantile transformation and the Glivenko-Cantelli theorem allow concluding that this plot must be approximately linear when the sample is drawn from $F$. Should $F$ be a Pareto distribution, the linearity of the QQ-plot is another statement of Zipf's law. The quantile plot allows a quick verification of a statistical hypothesis by checking the approximate linearity of the plot. It also allows the modeler to form a preliminary opinion on where the tail begins and whether the model fails at the far end of the tail.
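The points of a QQ-plot can be generated as follows. This is a minimal sketch that assumes the hypothesized distribution is the standard exponential, with quantile function $F^{\leftarrow}(p) = -\ln(1 - p)$, and that measures linearity with the sample correlation of the plotted pairs; the function names are hypothetical and no plotting library is used.

```python
import math
import random


def qq_points(sample, quantile):
    """Pairs (X_{k,n}, quantile((n - k + 1) / (n + 1))) for k = 1..n."""
    xs = sorted(sample, reverse=True)  # xs[k - 1] is the k-th upper order statistic
    n = len(xs)
    return [(xs[k - 1], quantile((n - k + 1) / (n + 1))) for k in range(1, n + 1)]


def linear_correlation(points):
    """Sample correlation between the two coordinates of the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(2_000)]
points = qq_points(sample, lambda p: -math.log(1.0 - p))
print(linear_correlation(points))  # close to 1 when the hypothesized F fits the data
```

Plotting `points` and inspecting where the pairs leave the straight line is the graphical check described in the text; the correlation is only a crude numerical summary of it.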

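Though invaluable as an exploratory tool, graphics rely on human judgment and intuition. Rigorous tests are needed. A starting point is parameter estimation for the generalized extreme value (GEV) distribution that we write as

$$H_{\xi;\mu,\psi}(x) = \exp\left\{ -\left( 1 + \xi\,\frac{x - \mu}{\psi} \right)^{-1/\xi} \right\}, \quad 1 + \xi\,\frac{x - \mu}{\psi} > 0$$

with the convention that the case $\xi = 0$ corresponds to the Gumbel distribution:

$$H_{0;\mu,\psi}(x) = \exp\left\{ -e^{-(x - \mu)/\psi} \right\}, \quad x \in \mathbb{R}$$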


We saw above that these distributions are the limit distributions, if they exist, of the normalized maxima of IID sequences. Suppose that the data to be estimated are independent draws from some GEV. This is a rather strong assumption that we will progressively relax. This assumption might be justified in domains where long series of data are available so that the sample data are the maxima of blocks of consecutive data. Though this assumption is probably too strong in the domain of finance, it is useful to elaborate its consequences.

Standard methodologies exist for parameter estimation in this case. In particular, the usual maximum likelihood (ML) methodology can be used for fitting the best GEV to data. Note that, since the above distributions fit maxima, we have to divide data into blocks and consider the maxima of each block. To apply ML, we have to compute the likelihood function on the data and choose the parameters that maximize it. This can be done with numerical optimization methods.

An estimation method alternative to ML is the 
method of moments, which consists in equating 
empirical moments with theoretical moments. 
An ample literature on various versions of the 
method of moments exists. 7 

Let's now relax the assumption that the sequence of empirical data are independent draws from an exact GEV and replace this with the weaker assumption that empirical data are independent draws from $F \in \mathrm{MDA}(H_\xi)$. If we assume that the limit distribution is a Fréchet distribution, then data must be independent draws from some distribution $F$ whose tail has the form:

$$\bar F(x) = x^{-\alpha}L(x)$$

where $L$ is a slowly varying function as described earlier in this entry. For this reason, estimation under this weaker assumption is semiparametric in nature. We will now introduce a number of estimators of the shape parameter $\xi$.


The Pickands Estimator

The Pickands estimator for an $n$-sample of independent draws from a distribution $F \in \mathrm{MDA}(H_\xi)$ is defined as

$$\hat\xi^{(P)}_{k,n} = \frac{1}{\ln 2} \ln \frac{X_{k,n} - X_{2k,n}}{X_{2k,n} - X_{4k,n}}$$

where the $X_{k,n}$ are upper order statistics.

It can be demonstrated that the Pickands estimator has the following properties:

Weak consistency:

$$\hat\xi^{(P)}_{k,n} \xrightarrow{P} \xi, \quad n \to \infty,\ k \to \infty,\ \frac{k}{n} \to 0$$

Strong consistency:

$$\hat\xi^{(P)}_{k,n} \xrightarrow{\text{a.s.}} \xi, \quad n \to \infty,\ \frac{k}{n} \to 0,\ \frac{k}{\ln(\ln n)} \to \infty$$

Asymptotic normality under technical conditions.

The Pickands estimator is an estimator of the parameter $\xi$ that does not require any assumption on the type of limit distribution. Let's now examine the Hill estimator, which requires the prior knowledge that sample data are independent draws from a distribution in the maximum domain of attraction of a Fréchet distribution. Later in this entry we will see that the assumption of independence can be weakened.
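The Pickands estimator only requires three upper order statistics. The following sketch applies it to simulated data; the exact Pareto sample with $\alpha = 2$ (so that $\xi = 1/\alpha = 0.5$), the choice $k = 500$, and the function names are illustrative assumptions, not recommendations for real data.

```python
import math
import random


def pickands_estimator(sample, k):
    """Pickands estimate of the shape parameter xi from the order statistics k, 2k, 4k."""
    xs = sorted(sample, reverse=True)  # xs[j - 1] is the j-th upper order statistic
    if k < 1 or 4 * k > len(xs):
        raise ValueError("need 1 <= k and 4 * k <= sample size")
    x_k, x_2k, x_4k = xs[k - 1], xs[2 * k - 1], xs[4 * k - 1]
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)


def sample_pareto(n, alpha, seed=0):
    """Pareto draws with tail x**(-alpha), x >= 1, via the quantile transformation."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


data = sample_pareto(50_000, alpha=2.0)
print(pickands_estimator(data, k=500))  # should be in the neighborhood of 0.5
```

The estimate is fairly noisy even with several hundred tail observations, which is one reason the estimator is usually examined over a range of values of $k$.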


The Hill Estimator 

Suppose that $X_1, \ldots, X_n$ are independent draws from a distribution $F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$, so that $\bar F(x) = x^{-\alpha}L(x)$ where $L$ is a slowly varying function. The Hill estimator can be obtained as an MLE based on the $k$ upper order statistics. The Hill estimator takes the following form:

$$\hat\alpha^{(H)} = \hat\alpha^{(H)}_{k,n} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n} \right)^{-1}$$

The Hill estimator has the same weak and strong consistency properties, as well as asymptotic normality, as the Pickands estimator. The Hill estimator is by far the most popular estimator of the tail index. It has the advantage of being robust to some dependency in the data but can perform very poorly in case of deviations from strict Pareto behavior. In addition, it is subject to a bias-variance trade-off in the following sense: The variance of the Hill estimator depends on the number $k$ of upper order statistics used: it decreases for increasing $k$. However, using a large fraction $k/n$ of the data will introduce bias in the estimator.
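The Hill estimator is equally simple to compute. The sketch below implements the formula in terms of the $k$ largest observations and applies it to a simulated exact Pareto sample with $\alpha = 2$; the sample size, the values of $k$, and the function names are illustrative assumptions. For an exact Pareto law there is no bias from the slowly varying function, so the estimates mainly show how the variance falls as $k$ increases.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha based on the k upper order statistics."""
    xs = sorted(sample, reverse=True)  # xs[j - 1] is the j-th upper order statistic
    if k < 2 or k > len(xs) or xs[k - 1] <= 0:
        raise ValueError("need 2 <= k <= sample size and positive upper order statistics")
    log_threshold = math.log(xs[k - 1])
    mean_log_excess = sum(math.log(x) for x in xs[:k]) / k - log_threshold
    return 1.0 / mean_log_excess


def sample_pareto(n, alpha, seed=0):
    """Pareto draws with tail x**(-alpha), x >= 1, via the quantile transformation."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


data = sample_pareto(50_000, alpha=2.0)
for k in (100, 1_000, 10_000):
    print(k, hill_estimator(data, k))  # estimates should cluster around 2
```

On real data the estimates are usually plotted against $k$ (a so-called Hill plot) to look for a stable region before bias sets in.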

As stated above, a critical tenet of EVT is the idea of fitting the tail rather than the entire distribution. A number of articles on the automatic determination of the optimal subset of samples to be included in the tail have appeared. One approach to the automatic determination of the tail sample using the variance-bias trade-off was proposed by Drees and Kaufmann (2000), while Dacorogna et al. (1995) and Danielsson and de Vries (1997) proposed methods based on a bootstrap approach.

The moment ratio estimator is a generalization of the Hill estimator. Consider the following estimator of the second order moments of the $k$ upper order statistics:

$$M_{k,n} = \frac{1}{k} \sum_{j=1}^{k} \left( \ln X_{j,n} - \ln X_{k,n} \right)^2$$


The moment ratio estimator is defined as fol¬ 
lows: 


-(m) 
«*,n = 


1 Mk,, 


,(H) 


Wagner and Marsh (2000) did extensive simulation analysis of various estimators. Their finding is that the moment ratio estimator outperforms the Hill estimator in sequences with a dependence structure (this is discussed further in the next section).

The Hill estimator was extended by Dekkers and de Haan (1989) to cover the entire range of shape parameters $\xi$. A number of other estimators have been proposed. In particular, under the assumption that financial data follow a stable process, estimation procedures based on regression analysis have been suggested. In fact, the assumption of stable behavior, or at least of exact Pareto tail, naturally leads to fitting a linear model in a logarithmic scale. There is an ample literature on this topic with a number of useful discussions, though empirical studies based on Monte Carlo simulations are still limited (see note 8).

The estimation methods reviewed above are based on the behavior of maxima and upper order statistics; another methodology uses the points of exceedances of high thresholds. Estimation methodologies based on the points of exceedances require an appropriate model for the point process of exceedances that was defined in general terms previously in this entry.


ELIMINATING THE 
ASSUMPTION OF IID 
SEQUENCES 

In the previous sections we reviewed a number of mathematical tools that are used to describe fat-tailed processes under the key assumption of IID sequences. In this section we discuss the implications of eliminating this assumption. In finance theory the assumption of stationary sequences of independent variables is only a first approximation; it has been challenged in several instances. Consider individual price time series. The autocorrelation function of returns decays exponentially and goes to near zero at very short time horizons, while the autocorrelation function of volatility decays only hyperbolically and remains different from zero for long periods. In addition, if we consider portfolios made of many securities, price processes exhibit patterns of cross correlations at different time lags and, possibly, cointegrating relationships. These findings offer additional reasons to consider the assumption of serial independence as only a first approximation.

If we now consider the question of stationarity, empirical findings are more delicate. The nonstationarity that can be removed by differencing is easy to handle and does not present a problem. The critical issue is whether financial time series can be modeled with a single data generation process (DGP) that remains the same for the entire period under consideration or if the model must be modified. Consider, for instance, the question of structural breaks. At a basic level, structural breaks entail nonstationarity as the model parameters change with time and thus the finite-dimensional distributions change with time. However, at a higher level one might try to model structural changes, for instance through state-space models or Markov switching models. In this way, stationarity is recovered but at the price of a more complex, serially autocorrelated model.

EVT for multivariate models with complex 
patterns of serial correlations loses its gener¬ 
ality and becomes model-dependent. One has 
to evaluate each model in terms of its behav¬ 
ior as regards extremes. In this section we will 
explore a number of models that have been 
proposed for modeling financial time series: 
ARCH and GARCH models and, more generally,
state-space models. First, however, a number
of methodological considerations are in
order.

In the context of IID sequences, EVT tries to 
answer the question of how to estimate a dis¬ 
tribution with heavy tails given only a limited 
amount of data. The model is the simplest 
(i.e., a sequence of IID variables) and the ques¬ 
tion is how to extrapolate from finite samples 
to the entire tail. In the context of IID distri¬ 
butions, conditional and unconditional distri¬ 
butions coincide. However, if we release the 
IID assumption, we have to specify the model 
and to estimate the entire model—not just 
the tail of one variable. Conditional and un¬ 
conditional distributions no longer coincide. 
For instance, there are families of models that 
are conditionally normal and unconditionally 
fat-tailed. 

Here difficulties begin as model estimation 
might be complex. In addition, estimation of 
some specific tail might not be the primary con¬ 
cern in model estimation. In the context of vari¬ 
ables with a dependence structure, EVT can be 
thought of as a methodology to estimate the 
tails of the unconditional distribution, leaving 
aside the question of full model estimation. 

An important methodological question is
whether fat-tailedness is generated by the transformation
of a sequence of zero-mean, finite-variance
IID variables (i.e., white noise) or
whether innovations themselves have fat tails
(i.e., so-called colored noise). For instance, as we
will see, GARCH models entail fat-tailed return
distributions as the result of the transformation
of white noise. On the other hand, one might
want to estimate an autoregressive moving average
(ARMA) model under the assumption of
innovations with infinite variance.

Understanding how power laws and, more
generally, fat tails are generated from normal
variables has been a primary concern of econometrics
and econophysics. Given the universality
of power laws in economics, it is clearly
important to understand how they are gener¬ 
ated. These questions go well beyond the sta¬ 
tistical analysis of heavy-tailed processes and 
involve questions of economic theories. Essen¬ 
tially, one wants to understand how the deci¬ 
sions of a large number of economic agents do 
not average out but produce cascading and am¬ 
plification phenomena. 

The law of large numbers states that if individual
processes are independent and have finite
variance, then phenomena average out in
aggregate and tend to an average limit. However,
if individual processes have fat tails, phenomena
do not average out even in the infinite
limit. The weight of individual tails prevails and
drives the aggregate process. Philip W. Anderson,
a corecipient of the 1977 Nobel Prize in
Physics, remarked:

Much of the real world is controlled as much by the 
"tails" of distributions as by means or averages: by 
the exceptional, not the mean; by the catastrophe, 
not the steady drip; by the very rich, not the "middle 
class." We need to free ourselves from "average" 
thinking. (Anderson, 1997) 
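
The contrast between averaging out and tail dominance can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes exponential variables as the thin-tailed benchmark and Pareto variables with tail index 0.8 as the fat-tailed case (both choices, like the sample size and seed, are illustrative), and it measures how much of the total sum is contributed by the single largest observation.

```python
import random


def largest_share(sample):
    """Fraction of the total sum contributed by the single largest observation."""
    return max(sample) / sum(sample)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000

# Thin-tailed benchmark: exponential variables (finite variance).
thin = [rng.expovariate(1.0) for _ in range(n)]
# Fat-tailed case: Pareto variables with tail index 0.8 (infinite mean).
fat = [rng.paretovariate(0.8) for _ in range(n)]

thin_share = largest_share(thin)
fat_share = largest_share(fat)
print(f"share of largest observation, thin tails: {thin_share:.6f}")
print(f"share of largest observation, fat tails:  {fat_share:.6f}")
```

In the thin-tailed sample no single observation matters for the aggregate, whereas in the fat-tailed sample the largest observation carries a visible fraction of the total.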

When and if fat-tailed drivers exist, they 
control the ensemble to which they belong. 
But what generates these powerful drivers? 
Models that generate fat tails from standard 
normal innovations attempt to answer this 
question. Different types of models have been 
proposed. One such category of models is
purely geometric and exploits mathematical
theories such as percolation and random graphs.
Others exploit phenomena of dynamic nonlinear
self-reinforcing cascades of events.

Percolation models are based on the well
known mathematical fact that in regular spa¬ 
tial structures of nodes connected by links, a 
uniform density of links produces connected 
subsets of nodes whose size is distributed ac¬ 
cording to power laws. Percolation models are 
time-transversal models: They model aggrega¬ 
tion at any given time. They might be used 
to explain how fat-tailed IID sequences are 
generated. 

Dynamic financial econometric models ex¬ 
ploit cascading phenomena due to nonlinear¬ 
ities, in particular multiplicative noise. In a 
deterministic setting, it is well known that non¬ 
linear chaotic models generate sequences that, 
when analyzed statistically, exhibit fat-tailed 
distributions. The same happens when noise is 
subject to nonlinear transformation. In the next 
sections, we explore simple ARMA models, 
ARCH-GARCH models, subordinated models, 
and state-space models, all examples of dy¬ 
namic financial econometric models. 

Before doing this, however, let's go back to 
the question of estimation. As observed above, 
if variables are not IID but can be considered 
generated by a DGP, the question of estimation 
is no longer the estimation of a variable but that 
of estimating a model or a theory. The estima¬ 
tion of the eventual tail index is part of a larger 
effort. However, empirical data are a sequence 
of samples characterized by an unconditional 
distribution. One might want to understand if 
estimation procedures used for IID sequences 
can be applied in this more general setting. For 
instance, one might want to understand if tail- 
index estimators such as the Hill estimator can 
be used in the case of serially correlated se¬ 
quences generated by a generic DGP. 

From a practical standpoint, this question is 
quite important as one wants to estimate the 
tails even if one does not know exactly what 
model generated the sequence. Clearly, there is 
no general answer to this problem. However, 
the behavior of a number of estimators under 
different DGPs has been explored through sim¬ 
ulation as explained in the following section. 


Heavy-Tailed ARMA Processes 

Let's first consider the infinite moving average
representation of a univariate stationary series:

$$x_t = \sum_{j=0}^{\infty} h_j \varepsilon_{t-j} + m$$

under the assumption that innovations are IID
α-stable laws of tail index α. By the properties
of stable distributions it can be demonstrated
that the finite-dimensional distributions of the
process x are α-stable. However, restrictions on
the coefficients need to be imposed. It can be
demonstrated that a sufficient condition to ensure
that the process x exists and is stationary
is the following:

$$\sum_{j=0}^{\infty} |h_j|^{\alpha} < \infty$$

A general univariate ARMA(p,q) model is
written as follows:

$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{q} a_j Z_{t-j}$$

where the Z are IID variables.

Using lag-operator notation, where $L^i$ maps
a variable to its value at i lags, the ARMA(p,q)
model is written as follows:

$$x_t = \sum_{i=1}^{p} \alpha_i L^i x_t + \sum_{j=1}^{q} a_j L^j Z_t$$

The theory of ARMA processes can be carried
over at least partially to cover the case of fat-tailed
innovations. In particular, an ARMA(p,q)
process with IID α-stable innovations admits a
stationary, infinite moving average representation
under the same conditions as in the classical
finite-variance case. The coefficients of the
moving average satisfy the condition

$$\sum_{j=0}^{\infty} |h_j|^{\alpha} < \infty$$

In the case of fat-tailed innovations, covariances
and autocovariances lose their meaning.
It can also be demonstrated, however, that the
empirical autocorrelation function is meaningful
and is asymptotically normal. It can be
demonstrated that maximum likelihood estimates
can be extended to the infinite-variance
case, though through a number of ad hoc
processes.


ARCH/GARCH Processes 

The simplest ARCH model can be written as
follows. Suppose that X is the random variable
to be modeled, Z is a sequence of independent
standard normal variables, and σ is a hidden
variable. The ARCH(1) model is written as

$$X_t = \sigma_t Z_t$$

$$\sigma_t^2 = \beta + \delta X_{t-1}^2$$

This basic model was extended by Bollerslev 
(1989), who proposed the GARCH(p,q) model 
written as 

$$X_t = \sigma_t Z_t$$

$$\sigma_t^2 = \beta + \sum_{i=1}^{p} \gamma_i \sigma_{t-i}^2 + \sum_{i=1}^{q} \delta_i X_{t-i}^2$$

The IID variables Z can be standard normal
variables or other symmetrical, possibly fat-tailed,
variables.
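
The recursion can be simulated directly to see how conditionally normal returns become unconditionally fat-tailed. The following is a minimal sketch for the GARCH(1,1) case, assuming the variance recursion has the form sigma_t^2 = beta + gamma * sigma_{t-1}^2 + delta * X_{t-1}^2; the names `beta`, `gamma`, `delta`, the parameter values, and the helper functions are illustrative assumptions, not taken from the text. Sample kurtosis (equal to 3 for a normal law) is used as a rough indicator of fat tails.

```python
import math
import random


def simulate_garch11(n, beta, gamma, delta, seed=0):
    """Simulate n observations of a GARCH(1,1) process with Gaussian innovations."""
    rng = random.Random(seed)
    # Start from the unconditional variance beta / (1 - gamma - delta).
    var = beta / (1.0 - gamma - delta)
    xs = []
    for _ in range(n):
        x = math.sqrt(var) * rng.gauss(0.0, 1.0)
        xs.append(x)
        # Next conditional variance depends on the current variance and return.
        var = beta + gamma * var + delta * x * x
    return xs


def kurtosis(xs):
    """Sample kurtosis m4 / m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


returns = simulate_garch11(200_000, beta=0.1, gamma=0.7, delta=0.2)
rng = random.Random(1)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

garch_kurtosis = kurtosis(returns)
gaussian_kurtosis = kurtosis(gaussian)
print(f"kurtosis of GARCH(1,1) returns: {garch_kurtosis:.2f}")
print(f"kurtosis of IID normal sample:  {gaussian_kurtosis:.2f}")
```

Although every innovation is standard normal, the simulated returns show kurtosis well above 3, while the IID normal sample stays close to 3.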

Let's first observe that model parameters 
must be constrained in order to guarantee the 
stationarity of the model. Stationarity condi¬ 
tions depend on each model. No general sim¬ 
ple expression for the stationarity conditions is 
available. 

Due to the multiplicative nature of noise, 
GARCH models are able to generate fat-tailed 
distributions even if innovations have finite 
variance. This fact was established by Kesten 
(1973). The tail index can be computed theoretically,
at least in the GARCH(1,1) case. Suppose
a stationary GARCH(1,1) process with Gaussian
innovations is given. It can be demonstrated
that

$$P(X > x) \approx C\,x^{-2\kappa}$$

where κ is the solution of an integral equation.
In the generic (p, q) case, the return process is still
fat-tailed but no practical way to compute the
index from the model parameters is known.
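
The text does not spell out the integral equation. A commonly cited form for GARCH(1,1), taken here as an assumption rather than as the authors' formulation, is E[(delta * Z^2 + gamma)^kappa] = 1, where Z is standard normal, `delta` multiplies the lagged squared return, and `gamma` multiplies the lagged variance. The sketch below evaluates the expectation by a simple trapezoid rule and solves for kappa by bisection; function names, parameter values, and numerical settings are all illustrative.

```python
import math


def moment(kappa, gamma, delta, half_width=12.0, steps=6_000):
    """E[(delta * Z**2 + gamma)**kappa] for standard normal Z (trapezoid rule)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * (delta * z * z + gamma) ** kappa * density
    return total * h


def solve_kappa(gamma, delta, lo=0.5, hi=20.0, tol=1e-8):
    """Bisection for the positive root of moment(kappa) = 1.

    Assumes moment(lo) < 1 < moment(hi), which holds for the values used below.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid, gamma, delta) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma, delta = 0.7, 0.2
kappa = solve_kappa(gamma, delta)
print(f"kappa = {kappa:.4f}, implied tail index 2*kappa = {2 * kappa:.4f}")
```

For these parameter values the first two moments of delta * Z^2 + gamma are below one and the third is above one, so the root lies between 2 and 3 and the implied tail index of returns lies between 4 and 6.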

Subordinated Processes 

Subordinated processes allow the time scale to 
vary. Subordinated models are, in a sense, the 
counterpart of stochastic volatility models in¬ 
sofar as they model the change in volatility 
by contracting and expanding the time scale. 
The first model was proposed by Clark (1973). 
Subordinated models have been extensively
studied by Ghysels, Gourieroux, and Jasiak
(1995).

Subordinated models can be applied quite 
naturally in the context of trading. Individual 
trades are randomly spaced. In modern elec¬ 
tronic exchanges, the time and size of trades are 
individually recorded, thus allowing for accu¬ 
rate estimates of the distributional properties 
of inter-trades intervals. Consideration of ran¬ 
dom spacings between trades naturally leads to 
the consideration of subordinated models. Sub¬ 
ordinated models generate unconditional fat¬ 
tailed distributions. 
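
A minimal sketch of this mechanism follows. It assumes, purely for illustration, that each trade contributes an independent standard normal return and that the number of trades per period is random with a roughly exponential distribution; neither assumption comes from the text. Because a sum of N independent N(0,1) returns is N(0, N), each period return is drawn directly with variance equal to the trade count.

```python
import math
import random


def kurtosis(xs):
    """Sample kurtosis m4 / m2**2 (equal to 3 for a normal law)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


rng = random.Random(7)
n_periods = 100_000
returns = []
for _ in range(n_periods):
    # Hypothetical activity model: random number of trades in the period.
    n_trades = 1 + int(rng.expovariate(1.0 / 50.0))
    # The sum of n_trades IID N(0, 1) trade returns is N(0, n_trades).
    returns.append(rng.gauss(0.0, math.sqrt(n_trades)))

subordinated_kurtosis = kurtosis(returns)
print(f"kurtosis of period returns: {subordinated_kurtosis:.2f}")
```

Each trade is normal, yet the period returns are a variance mixture of normals and their kurtosis is well above 3.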

Markov Switching Models 

The GARCH family of models is not the only 
family of serially correlated models able to pro¬ 
duce fat tails starting from normally distributed 
innovations. State-space models and Markov-switching
models present the same feature. The
basic idea of state-space models and Markov-switching
models is to split the model into two
parts: (1) a regressive model that regresses the
model variable over a hidden variable and (2)
an autoregressive model that describes the hidden
variables.

In its simplest linear form, a state-space model
is written as follows:

$$X_t = \alpha Z_t + \varepsilon_t$$

$$Z_t = \beta Z_{t-1} + \delta_t$$

where $\varepsilon_t$, $\delta_t$ are normally distributed independent
white noises. State-space models can also
be written in a multiplicative form:

$$X_t = \sigma_t Z_t + \varepsilon_t$$

$$\sigma_t = \beta \sigma_{t-1} + \delta_t$$

If the second equation is a Markov chain, 
the model is called a Markov-switching model. 
A well-known example of Markov-switching 
models is the Hamilton model in which a two- 
state Markov chain drives the switch between 
two different regressions. 

Purely linear state-space models exhibit fat 
tails only if innovations are fat-tailed. However, 
multiplicative state-space models and Markov- 
switching models can exhibit fat tails even 
if innovations are normally distributed. There 
is a growing literature on Markov-switching 
and multiplicative state-space models and a 
relatively large number of different models, 
univariate as well as multivariate, have been 
proposed. Stochastic volatility models are the 
continuous-time version of multiplicative state- 
space models. 
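
The fat-tail effect of regime switching can be sketched with a two-state example. The code below is an illustration under stated assumptions, not the Hamilton model itself: a hidden two-state Markov chain selects the volatility level, innovations are standard normal, and the volatilities and persistence probabilities are hypothetical values.

```python
import random


def kurtosis(xs):
    """Sample kurtosis m4 / m2**2 (equal to 3 for a normal law)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


def simulate_switching(n, sigmas=(1.0, 3.0), stay=(0.95, 0.90), seed=3):
    """Gaussian returns whose volatility is driven by a two-state Markov chain."""
    rng = random.Random(seed)
    state = 0
    xs = []
    for _ in range(n):
        xs.append(sigmas[state] * rng.gauss(0.0, 1.0))
        # Leave the current state with probability 1 - stay[state].
        if rng.random() > stay[state]:
            state = 1 - state
    return xs


switching_returns = simulate_switching(200_000)
switching_kurtosis = kurtosis(switching_returns)
print(f"kurtosis under regime switching: {switching_kurtosis:.2f}")
```

Within each regime the returns are normal, but the unconditional distribution mixes a calm and a turbulent regime and shows kurtosis well above 3.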

Estimation 

Let's now go back to the question of model es¬ 
timation in a non-IID framework. Suppose that 
we want to estimate the tail index of the un¬ 
conditional distribution of a set of empirical 
observations in the general setting of non-IID 
variables. Note that if variables are fat-tailed,
we cannot say that they are serially autocorrelated,
as moments of second order generally
do not exist. Therefore we have to make some
hypothesis on the DGP.

There is no general theory of estimation under 
arbitrary DGP. Both theoretical and simulation 
work are limited to specific DGPs. ARMA mod¬ 
els have been extensively studied. EVT holds 
for ARMA models under general nonclustering 
conditions. 9 

Often only simulation results are available.
A fairly ample set of results is available for
GARCH(1,1) models. For these models Resnick
and Starica (1998) showed that the Hill estimator
is a consistent estimator of the tail index.
Wagner and Marsh compared the performance
of the Hill estimator and of the moment ratio estimator
for three model classes: IID α-stable returns,
IID symmetric Student-t returns, and GARCH(1,1)
with Student-t innovations. They found that, in
an adaptive framework, the moment ratio estimator
generally yields results superior to the
Hill estimator.
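
For reference, the Hill estimator itself is simple to compute. The sketch below is a minimal illustration on an IID Pareto sample with known tail index, which is the benchmark case rather than the GARCH setting discussed above; the function name, the sample size, and the number k of upper order statistics are illustrative choices, and the sample is assumed to be strictly positive.

```python
import math
import random


def hill_tail_index(sample, k):
    """Hill estimate of the tail index based on the k largest observations."""
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k + 1)-th largest observation
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(11)
true_alpha = 3.0
sample = [rng.paretovariate(true_alpha) for _ in range(50_000)]

estimate = hill_tail_index(sample, k=2_000)
print(f"true tail index: {true_alpha}, Hill estimate: {estimate:.3f}")
```

The estimate depends on the choice of k; in practice it is usually examined over a range of k values rather than at a single one.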

Scaling and Self-Similarity 

The concept of scaling is now quite frequently 
evoked in economics and finance. Let's begin by 
making a distinction between scaling and self¬ 
similarity and some of the properties associ¬ 
ated with inverse power laws within or outside 
the Lévy-stable scaling regime. These concepts
have different, and not equivalent, definitions. 

The concepts of scaling and self-similarity ap¬ 
ply to distributions, processes, or structures. 
Self-similarity was introduced as a property 
that applies to geometrical self-similar objects 
(i.e., fractal structures). In this context, self¬ 
similarity means that a structure can be put 
into a one-to-one correspondence with a part 
of itself. Note that no finite structure can have 
this property; self-similarity is the mark of infi¬ 
nite structures. Self-similarity entails scaling: If 
a fractal structure is expanded by a given fac¬ 
tor, its measure expands by a power of the same 
factor. 10 The notion of scaling is often expressed 
as absence of scale, meaning that a scaling ob¬ 
ject looks the same at any scale, large or small: It 
is impossible to ascertain the size of a portion of 
a scaling object by looking at its shape. The clas¬ 
sical illustration is a Norwegian coastline with 
its fjords and fjords within fjords that look the 
same regardless of the scale. 

However, scaling can be defined without 
making reference to fractals. In its simplest 
form, the notion of scaling entails a variable
x and an observable A, which is a function
of x: A = A(x). If the observable obeys a scaling
relationship, there is a constant factor between
x and A in the sense that $A(\lambda x) = \lambda^{s} A(x)$, where
s is the scaling exponent that does not depend
on x. The only function A(x) that satisfies
this relationship is a power law. In three-dimensional
Euclidean space, volume scales as
the third power of linear length and surface as 
the second power, while fractals scale according 
to their fractal dimension. 

The same ideas can be applied in a random 
context, but require careful reasoning. A power- 
law distribution has a scaling property as 
multiplying the variable by a factor multiplies 
probabilities by a constant factor, regardless of 
the level of the variable. This means that the 
ratio between the probability of the events X > 
x and X > ax depends only on a power of a, 
not on x. As an inverse power law is not de¬ 
fined at zero, scaling in this sense is a property 
of the tails. The probabilistic interpretation of 
this property is the following: The probability 
that an observation exceeds ax conditional on 
the knowledge that the observation exceeds x 
does not depend on x but only on a. 
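
This property can be checked with a short calculation. The sketch below assumes a Pareto survival function P(X > x) = (x_min / x)^alpha as the power-law example and an exponential survival function as a contrast; both function names and all numerical values are illustrative.

```python
import math


def pareto_survival(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto law with tail index alpha and lower bound x_min."""
    return (x_min / x) ** alpha if x > x_min else 1.0


def exponential_survival(x, rate=1.0):
    """P(X > x) for an exponential law, used as a thin-tailed contrast."""
    return math.exp(-rate * x)


alpha, a = 3.0, 2.0
levels = (2.0, 5.0, 25.0, 125.0)

# Conditional probability P(X > a*x | X > x) at different levels x.
pareto_ratios = [pareto_survival(a * x, alpha) / pareto_survival(x, alpha) for x in levels]
exp_ratios = [exponential_survival(a * x) / exponential_survival(x) for x in levels]

print("Pareto:     ", pareto_ratios)
print("exponential:", exp_ratios)
```

For the power law the conditional probability equals a^(-alpha) at every level x, whereas for the exponential law it shrinks rapidly as x grows.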

There are, however, other meanings attached
to scaling and these might be a source of confusion.
In the context of physical phenomena,
scaling is often intended as identity of distribution
after aggregation. The same idea is also
behind renormalization group theory
and the notion of self-similarity applied
to structures such as coastlines. In the latter
case, the intuitive meaning of self-similarity is 
that if one aggregates portions of the coastline, 
approximating their shape with a straight line, 
and then rescales, the resulting picture is qual¬ 
itatively similar to the original. The same idea 
applies to percolation structures: By aggregat¬ 
ing "sites" (i.e., points in a percolation lattice) 
into supersites and carefully redefining links, 
one obtains the same distribution of connected 
clusters. 

Applying the idea of aggregation in a random
context, self-similarity seems to mean that, after
rescaling, the distribution of the sum of independent
copies of a random variable maintains
the same shape as the distribution of the variable
itself. Note that this property holds only for
the tails of subexponential distributions, and
it holds strictly only for stable laws, which have
a tail index in the (0,2) range but whose shape is not
a power law except, approximately, in the tails.
It also holds for Gaussian distributions, which do
not have power-law tails.

Scaling acquires yet another meaning when 
applied to stochastic processes that are func¬ 
tions of time. The most common among the 
different meanings is the following: A stochas¬ 
tic process is said to have a scaling property if 
there is no natural scale for looking at its paths 
and distributions. Intuitively, this means that it 
is not possible to gauge the scale of a sample 
by looking at its distribution; there is absence 
of scale. An example from finance comes from 
price patterns. If a price pattern is generated by 
a process with the scaling property, the plots of 
average daily and monthly prices will appear 
to be perfectly similar in distribution; looking 
at the plot, it's impossible to tell if it refers to 
daily or monthly prices. 

Self-similarity is another way of expressing 
the same concept. A process is self-similar if 
a portion of the process is similar to the en¬ 
tire process. As we are considering a random 
environment, self-similarity applies to distribu¬ 
tions, not to the actual realization of a process. 
Let's now make these concepts more precise. 

A stochastic process X(t) is said to be self-similar
(ss) of index H (H-ss) if all its finite-dimensional
distributions obey the scaling
relationship:

$$(X_{kt_1}, X_{kt_2}, \dots, X_{kt_m}) \stackrel{d}{=} k^{H} (X_{t_1}, X_{t_2}, \dots, X_{t_m}), \quad \forall k > 0,\; 0 < H < 1,\; t_1, t_2, \dots, t_m \ge 0$$

The above expression means that the scaling
of time by the factor k scales the variables X
by the factor $k^H$. It gives precise meaning to
the notion of self-similarity applied to stochastic
processes.
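
A simple consequence of this definition can be checked by simulation. The sketch below uses a Gaussian random walk as a discrete stand-in for Brownian motion, for which H = 1/2, and compares the dispersion of the path at time t and at time kt. It only verifies the scaling of standard deviations, a necessary consequence of self-similarity rather than the full distributional identity; path counts, step counts, and names are illustrative.

```python
import math
import random


def std(xs):
    """Population standard deviation of a list of numbers."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


rng = random.Random(5)
n_paths, base_steps, k = 2_000, 50, 4

at_t, at_kt = [], []
for _ in range(n_paths):
    position = 0.0
    for step in range(1, k * base_steps + 1):
        position += rng.gauss(0.0, 1.0)
        if step == base_steps:
            at_t.append(position)  # value of the path at time t
    at_kt.append(position)  # value of the path at time k * t

ratio = std(at_kt) / std(at_t)
hurst_estimate = math.log(ratio) / math.log(k)
print(f"std ratio: {ratio:.3f}, implied H: {hurst_estimate:.3f}")
```

With k = 4 the ratio of standard deviations is close to 2, which corresponds to an exponent H close to 1/2.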

There is a wide variety of self-similar processes
that cannot be characterized in a simple
way as scaling laws: The scaling property of
stochastic processes might depend upon the
shape of distributions as well as the shape 
of correlations. Let's restrict our attention to 
processes that are self-similar with stationary 
increments (sssi) and with index H (H-sssi). 
These processes can be either Gaussian or non- 
Gaussian. Note that a Gaussian process is a pro¬ 
cess whose finite-dimensional distributions are 
all Gaussian. 

Gaussian H-sssi processes might have inde¬ 
pendent increments or exhibit long-range corre¬ 
lations. The only Gaussian H-sssi process with 
independent increment is the Brownian mo¬ 
tion, but there are an infinite number of frac¬ 
tional Brownian motions, which are Gaussian 
H-sssi processes with long-range correlations. 
Thus there are an infinite variety of Gaussian 
self-similar processes. Among the many non-Gaussian
H-sssi processes with independent increments
are the stable Lévy processes, which
are random walks whose increments follow a
stable distribution. 11

There is another definition of self-similarity 
for stochastic processes that makes use of the 
concept of aggregation; it is closer, at least in 
spirit, to the theory of renormalization groups. 
Consider a stationary infinite sequence of independent
and identically distributed variables
$X_i$, $i \ge 1$. Create consecutive nonoverlapping
blocks of m variables and define the corresponding
aggregated sequence of level m by averaging
over each block as follows:

$$X_k^{(m)} = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i$$

A sequence is called exactly self-similar if, for
any integer m, the following relationship holds:

$$X \stackrel{d}{=} m^{1-H} X^{(m)}$$

A stationary sequence is called asymptotically
self-similar if the above relationship holds only
for $m \to \infty$.
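
The aggregation definition can be illustrated with IID standard normal variables, for which the block average of m variables has standard deviation 1/sqrt(m), so the relationship holds with H = 1/2. The sketch below is a minimal check of the standard deviation only, under that Gaussian assumption; function names, block sizes, and the sample size are illustrative.

```python
import math
import random


def aggregate(xs, m):
    """Averages over consecutive nonoverlapping blocks of m observations."""
    return [sum(xs[i:i + m]) / m for i in range(0, len(xs) - m + 1, m)]


def std(xs):
    """Population standard deviation of a list of numbers."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


rng = random.Random(9)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
H = 0.5  # self-similarity index assumed for IID Gaussian variables

rescaled_std = {}
for m in (4, 16, 64):
    rescaled = [m ** (1.0 - H) * value for value in aggregate(xs, m)]
    rescaled_std[m] = std(rescaled)

print(f"std of original sequence: {std(xs):.3f}")
for m, value in rescaled_std.items():
    print(f"std of rescaled aggregate, m = {m}: {value:.3f}")
```

After rescaling by m^(1-H), the aggregated sequences have the same dispersion as the original sequence at every block size.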

When we apply the notion of scaling to
stochastic processes, the natural setting for
economics and finance, we have to abandon
the simple characterization of scaling as inverse
power laws. Though the scaling property
is in itself characterized through simple power 
laws, the scaling processes are complex and rich 
mathematical structures entailing a variety of 
distributions and correlation functions. In par¬ 
ticular, the long-range correlation structure of 
the process plays a role as important as the dis¬ 
tribution of its variables. 


KEY POINTS 

• Fat-tailed laws have been found in many eco¬ 
nomic variables. 

• Fully approximating a finite economic system 
with fat-tailed laws depends on an accurate 
statistical analysis of the phenomena, but also 
on a number of the theoretical implications of 
subexponentiality and scaling. 

• Modeling financial variables with stable laws 
implies the assumption of infinite variance, 
which seems to contradict empirical observa¬ 
tions. 

• Scaling laws might still be an appropriate 
modeling paradigm given the complex inter¬ 
action of distributional shape and correlations 
in price processes. 

• Scaling laws might help in understanding not 
only the sheer size of economic fluctuations 
but also the complexity of economic cycles. 

NOTES 

1. See Bamberg and Dorfleitner (2001). 

2. See, for example, Sigman (1999). 

3. See, for example, Goldie and Klüppelberg
(1998) and Embrechts, Klüppelberg, and
Mikosch (1999).

4. See Sigman (1999). 

5. See Rachev and Mittnik (2000) and Rachev, 
Menn, and Fabozzi (2005). 

6. Cox and Isham (1980). 

7. For a discussion of the different methods, 
see Smith (1990). For a discussion of the 
method of probability-weighted moments, 
see Hosking, Wallis, and Wood (1985). 

8. Diebold, Schuermann, and Stroughair
(2000).

9. See Embrechts, Klüppelberg, and Mikosch
(1999).

10. For an introduction to fractals, see Falconer 
(1990). 

11. See Samorodnitsky and Taqqu (1994). 

Abstract: Understanding dependences or functional links between variables is a key theme in financial
modeling. In general terms, functional dependences are represented by dynamic models.
Many important models are linear models whose coefficients are correlation coefficients. In many
instances in financial modeling, it is important to arrive at a quantitative measure of the strength
of dependencies. The correlation coefficient provides such a measure. In many instances, however,
the correlation coefficient might be misleading. In particular, there are cases of nonlinear dependencies
that result in a zero correlation coefficient. From the point of view of financial modeling,
this situation is particularly dangerous as it leads to substantially underestimated risk. Different
measures of dependence have been proposed, in particular copula functions.


Correlation is a widespread concept in finan¬ 
cial modeling and stands for a measure of de¬ 
pendence between random variables. However, 
this term is very often incorrectly used to mean 
any notion of dependence. Actually correlation 
is one particular measure of dependence among 
many. In the world of multivariate normal dis¬ 
tribution and, more generally, in the world of 
spherical and elliptical distributions, it is the 
accepted measure. This follows from a prop¬ 
erty of the multivariate normal distribution. In 
this entry, we discuss the limitations of correlation
as a measure of the dependence between
two random variables and introduce an alternative
measure to overcome these limitations,
copulas. 1

DRAWBACKS OF 
CORRELATION 

In the general case, there are at least three 
major drawbacks of the correlation measure. 
Consider the case of two real-valued random 
variables X and Y. First, the variances of X and
Y must be finite or the correlation is not defined.
This assumption causes problems when
working with heavy-tailed data because under
certain circumstances the variances are infinite
and, for that reason, the correlation between
them is not defined.

Second, independence of two random variables
implies correlation equal to zero; however,
generally speaking the converse is not true:
zero correlation does not imply independence. 2
Only in the case of the multivariate normal
distribution are uncorrelatedness and independence
interchangeable notions. This statement is not valid
if only the marginal distributions are normal
and the joint distribution is not multivariate normal.
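
A standard counterexample, X standard normal and Y = X², can be checked by simulation. The sketch below is a minimal illustration; the helper `pearson`, the seed, and the sample size are illustrative choices. It also reports the correlation between |X| and Y to show that the dependence is strong even though the linear correlation between X and Y vanishes.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [x * x for x in xs]  # Y is a deterministic function of X

corr_xy = pearson(xs, ys)
corr_abs = pearson([abs(x) for x in xs], ys)
print(f"corr(X, X^2)   = {corr_xy:.4f}")
print(f"corr(|X|, X^2) = {corr_abs:.4f}")
```

The first correlation is close to zero although Y is completely determined by X; the second shows that the dependence is visible once the symmetry is removed.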

Lastly, a more technical point. The correlation
is not invariant under nonlinear strictly increasing
transformations, a serious disadvantage. In
general, corr(T(X), T(Y)) ≠ corr(X, Y). One example
that illustrates this point is
the following: Assume that X and Y represent
the continuous return (log-return) of two financial
assets over the period [0, t], where t denotes
some point of time in the future. If you know
the correlation of these two random variables,
this does not imply that you know the dependence
structure between the asset prices themselves
because the asset prices (P and Q for asset X
and Y, respectively) are obtained by $P_t = P_0 \cdot \exp(X)$
and $Q_t = Q_0 \cdot \exp(Y)$. The asset prices are
strictly increasing functions of the return but the
correlation structure is not maintained by this
transformation. This observation implies that
the returns could be uncorrelated whereas the
prices are strongly correlated and vice versa.
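
The effect of the exponential transformation can be quantified in a simple case. The sketch below assumes bivariate normal log-returns with zero mean, common standard deviation `sigma`, and correlation `rho`; these distributional assumptions and all names are illustrative. Under them, the correlation of the resulting lognormal prices has the closed form (exp(rho * sigma²) - 1) / (exp(sigma²) - 1), which the simulation checks.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lognormal_corr(rho, sigma):
    """Correlation of exp(X), exp(Y) for bivariate normal X, Y with equal variance."""
    return (math.exp(rho * sigma ** 2) - 1.0) / (math.exp(sigma ** 2) - 1.0)


rng = random.Random(0)
rho, sigma, n = 0.8, 1.0, 100_000
xs, ys = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(sigma * z1)
    ys.append(sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))

corr_returns = pearson(xs, ys)
# Price relatives P_t / P_0 and Q_t / Q_0; the constants P_0, Q_0 do not affect correlation.
corr_prices = pearson([math.exp(x) for x in xs], [math.exp(y) for y in ys])
print(f"correlation of log-returns: {corr_returns:.3f}")
print(f"correlation of prices:      {corr_prices:.3f}")
print(f"closed form for prices:     {lognormal_corr(rho, sigma):.3f}")
```

The same dependence structure therefore produces different correlation coefficients for returns and for prices.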


OVERCOMING THE 
DRAWBACKS OF 
CORRELATION: COPULAS 

A more prevalent approach, which overcomes 
this disadvantage, is to model dependency us¬ 
ing copulas. As noted by Patton (2004, p. 3): 
"The word copula comes from Latin for a 'link' 
or 'bond', and was coined by Sklar (1959), 
who first proved the theorem that a collec¬ 
tion of marginal distributions can be 'coupled' 
together via a copula to form a multivariate 
distribution." The idea is as follows. The
description of the joint distribution of a random
vector is divided into two parts:

1. The specification of the marginal distribu¬ 
tions. 

2. The specification of the dependence struc¬ 
ture by means of a special function, called 
copula. 

The use of copulas offers the following advan¬ 
tages: 

• The nature of dependency that can be mod¬ 
eled is more general. In comparison, only 
linear dependence can be explained by the 
correlation. 

• Dependence of extreme events might be 
modeled. 

• Copulas are invariant under strictly increasing
transformations (not only linear ones, as
is true for correlation).

Because of these advantages, in recent years 
there has been increased application of copulas 
in asset and option pricing, portfolio selection, 
and risk management. 

MATHEMATICAL 
DEFINITION OF COPULAS 

From a mathematical viewpoint, a copula function
C is nothing more than a probability
distribution function, with uniform marginals,
on the d-dimensional hypercube
$I^d = [0,1] \times [0,1] \times \cdots \times [0,1]$:

$$C : I^d \to [0,1], \qquad (x_1, \dots, x_d) \mapsto C(x_1, \dots, x_d)$$

It has been shown 3 that any multivariate probability
distribution function $F_Y$ of some random
vector $Y = (Y_1, \dots, Y_d)$ can be represented with
the help of a copula function C in the following
form:

$$F_Y(y_1, \dots, y_d) = P(Y_1 \le y_1, \dots, Y_d \le y_d)$$

$$= C(P(Y_1 \le y_1), \dots, P(Y_d \le y_d))$$

$$= C(F_{Y_1}(y_1), \dots, F_{Y_d}(y_d))$$

where the $F_{Y_i}$, $i = 1, \dots, d$, denote the marginal
distribution functions of the random variables
$Y_i$, $i = 1, \dots, d$.

Figure 1 Visualization of the Copula for Bivariate Independence*

Panel a: Uniform Marginal Distributions. Panel b: Standard Normal Marginal Distributions.

*The graphs show the joint distribution function of a bivariate random vector for two different marginal
distributions. Each panel consists of a surface and a corresponding contour plot.


The copula function makes the bridge be¬ 
tween the univariate distribution of the individ¬ 
ual random variables and their joint probability 
distribution. This justifies the fact that the cop¬ 
ula function creates uniquely the dependence, 
whereas the probability distribution of the in¬ 
volved random variables is provided by their 
marginal distribution. 

As an example we consider the following 
three bivariate copula functions: 

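• $C(x, y) = x \cdot y$

• $C(x, y) = \min(x, y)$

• $C(x, y) = \int_{-\infty}^{\Phi^{-1}(x)} \int_{-\infty}^{\Phi^{-1}(y)} \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left( -\frac{s^2 - 2\rho s t + t^2}{2(1 - \rho^2)} \right) ds\, dt$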

The first represents the independent case as 
the joint probability distribution equals the 
product of their marginals. The second exam¬ 
ple represents a case of extreme dependence 
whereas the third example represents the gen¬ 
eral Gaussian copula function for the bivariate 
case. 
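
The three copulas can be evaluated numerically. The sketch below is a minimal illustration using only the Python standard library; the function names and numerical settings are illustrative. The Gaussian copula is computed by reducing the double integral to a single one: conditional on the first variable s, the second is normal with mean rho * s and variance 1 - rho², so the inner integral is a normal distribution function. The arguments u and v are assumed to lie strictly inside (0, 1) and rho strictly inside (-1, 1).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def independence_copula(u, v):
    return u * v


def minimum_copula(u, v):
    return min(u, v)


def gaussian_copula(u, v, rho, steps=4_000, lower=-8.0):
    """Bivariate Gaussian copula for u, v strictly inside (0, 1) and |rho| < 1."""
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lower + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Density of the first variable times conditional probability of the second.
        total += weight * STD_NORMAL.pdf(s) * STD_NORMAL.cdf((b - rho * s) / scale)
    return total * h


u, v, rho = 0.5, 0.5, 0.8
c_indep = independence_copula(u, v)
c_gauss = gaussian_copula(u, v, rho)
c_min = minimum_copula(u, v)
print(f"independence: {c_indep:.4f}, Gaussian (rho=0.8): {c_gauss:.4f}, minimum: {c_min:.4f}")
```

At the point (0.5, 0.5) the Gaussian copula with positive correlation lies between the independence copula and the minimum copula, and with rho = 0 it reduces to the product u * v.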

Figure 2 Visualization of the Bivariate Minimum Copula* 

Panel a: Uniform Marginal Distributions. Panel b: Standard Normal Marginal Distributions. 

*The graphs show the joint distribution function of a bivariate random vector for two different marginal 
distributions. Each panel consists of a surface and a corresponding contour plot. 



We illustrate the effect of the different copu¬ 
las by applying them to two different marginal 
distributions, namely (1) the uniform distribu¬ 
tion on the interval [0,1] and (2) the standard 
normal distribution. The results are presented 
in Figures 1, 2, and 3. 
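
The construction behind these figures, a copula applied to the marginal distribution functions, can be sketched as follows. The code covers only the independence and minimum copulas and evaluates the joint distribution function at a few points rather than plotting it; the function names and evaluation points are illustrative.

```python
from statistics import NormalDist


def uniform_cdf(y):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(y, 0.0), 1.0)


normal_cdf = NormalDist().cdf  # distribution function of the standard normal law


def independence_copula(u, v):
    return u * v


def minimum_copula(u, v):
    return min(u, v)


def joint_cdf(copula, cdf_1, cdf_2, y1, y2):
    """Joint distribution function built from two marginals and a copula."""
    return copula(cdf_1(y1), cdf_2(y2))


uniform_indep = joint_cdf(independence_copula, uniform_cdf, uniform_cdf, 0.2, 0.7)
uniform_min = joint_cdf(minimum_copula, uniform_cdf, uniform_cdf, 0.2, 0.7)
normal_indep = joint_cdf(independence_copula, normal_cdf, normal_cdf, 0.0, 0.0)
normal_min = joint_cdf(minimum_copula, normal_cdf, normal_cdf, 0.0, 0.0)

print(f"uniform marginals: independence {uniform_indep:.3f}, minimum {uniform_min:.3f}")
print(f"normal marginals:  independence {normal_indep:.3f}, minimum {normal_min:.3f}")
```

The same copula yields different joint distribution functions when combined with different marginals, while the dependence structure it encodes is unchanged.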

KEY POINTS 

• In financial modeling, it is critical to understand
dependencies or functional links between
variables and have a quantitative measure
of the strength of dependencies.

• The most commonly used measure of depen¬ 
dency in finance is the correlation coefficient. 
This measure might be misleading. In particu¬ 
lar, there are cases of nonlinear dependencies 
that result in a zero correlation coefficient. 

• The existence of finite variances is required 
for a correlation to be computed. Some return 
distributions, however, have fat tails, and the 
variances are infinite. 

Figure 3 Visualization of the Gaussian Copula with Correlation ρ = 0.8*

Panel a: Uniform Marginal Distributions. Panel b: Standard Normal Marginal Distributions.

*The graphs show the joint distribution function of a bivariate random vector for two different marginal
distributions. Each panel consists of a surface and a corresponding contour plot.



• The correlation is not invariant under nonlinear
strictly increasing transformations,
a serious disadvantage of this measure.

• The copula overcomes the drawbacks of the
correlation as a measure of dependency by allowing
for a more general measure than linear
dependence, allowing for the modeling
of dependence for extreme events, and being
invariant under strictly increasing transformations.

• The copula function bridges the univariate
distributions of the individual random variables
and their joint probability distribution:
the copula function alone creates the dependence,
whereas the probability distribution of the
involved random variables is provided by their
marginal distributions.

NOTES 

1. For a discussion of applications in finance 
and insurance, see Embrechts, McNeil, and 
Straumann (1999) and Patton (2003a, 2003b, 
2004). 
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2. A simple example is the following: Let X be 
a standard normal random variable and Y = X^2.
Because the third moment of the standard
normal distribution is zero, the correlation
between X and Y is zero despite the fact that
Y is a function of X, which means that they
are dependent.

3. The importance of copulas in the modeling
of the distribution of multivariate random
variables is established by Sklar's theorem. The
derivation was provided in Sklar (1959).
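
The point made in Note 2 can be checked numerically. The following Python sketch uses only the standard library; the sample size, the seed, and the extra comparison with |X| are illustrative assumptions that do not come from the text. It draws standard normal values X, sets Y = X^2, and computes the sample correlation, which should be close to zero although Y is a deterministic function of X.

```python
import math
import random

def sample_correlation(xs, ys):
    """Pearson sample correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(12345)           # arbitrary seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
y = [xi ** 2 for xi in x]            # Y is fully determined by X

rho_xy = sample_correlation(x, y)    # near 0 despite perfect dependence
rho_absx_y = sample_correlation([abs(xi) for xi in x], y)  # clearly positive
print(f"corr(X, X^2)   = {rho_xy:.4f}")
print(f"corr(|X|, X^2) = {rho_absx_y:.4f}")
```

The second figure is included only to show that the dependence is visible to the correlation coefficient once X is replaced by |X|, on which Y depends monotonically.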
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Abstract: Value-at-risk (VaR) calculation based on parametric models is in essence an estimation 
problem. The point estimates should be interpreted accompanied by their confidence intervals. 
Risk management for complex portfolios may consider simultaneously two or more VaR confi¬ 
dence levels. The quantiles used for VaR estimation at different orders such as 1% and 5% are not 
independent and therefore should be analyzed jointly. Consequently, it would be useful to establish 
confidence regions for bivariate VaR estimates that will provide the risk managers with a valuable 
tool for verifying the accuracy of their estimation process, as requested by external audit. A trade¬ 
off between the complexity of probability distribution underlying the model and the degree of 
robustness achieved is recommended. 


While there are many models used for calculations of risk management measures such as value-at-risk (VaR) and expected tail loss (ETL), there are not many tools available to a risk manager to verify whether the models chosen perform well in practice. In this entry, we highlight some practical aspects of VaR and ETL calculus that are underpinned by theoretical results on order statistics. More precisely, we show how to compute VaR and ETL based on quantile sample statistics and how to derive the probability distribution of this estimator. The most important development in this entry is that we illustrate how to control the backtesting of two risk measures, given by different specifications of confidence levels such as 99% and 95%. Usually there is a difference between the confidence level that a bank may use internally and the confidence level required by a regulator. The risk manager should then make sure that the risk models used perform well for both confidence levels.


PERFORMANCE OF VaR 
ESTIMATION 

VaR is widely used in the financial industry as a measure for market risk in normal conditions. This concept has a strong influence on bank capital, some of the major implications of this estimation process being described in Jackson et al. (1997). The European Capital Adequacy Directive allows internal risk management models. Marshall and Siegel (1997) found
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great errors in the estimation methods used in 
the industry. Berkowitz and O'Brien (2002) investigated the accuracy of value-at-risk models used by a sample of large commercial banks, and their analysis revealed discrepancies in the performance of their models. Brooks and Persand (2002) analyzed common methodologies for calculating VaR and concluded that simpler models provide better performance than very complex models. In the light of severe market disruptions and calls for more stringent measures, the reliability of the model used for market risk is of paramount importance.

The estimation of VaR is a statistical exercise and the risk manager, trader, or quant analyst has to consider the reliability of the estimates proposed, especially when large amounts of money are involved. Although there is a plethora of models for VaR pointwise estimation, reviewed for example in Duffie and Pan (1997) and Jorion (1996, 1997), the literature on the confidence associated with these estimators is sparse. Jorion (1996) was among the first researchers to consider the uncertainty associated with VaR models leading to model risk. Kupiec (1995) suggested that it may be very hard to determine statistically the accuracy of VaR estimates. After his seminal paper, Pritsker (1997) and Dowd (2001) showed how to employ order statistics for assessing the VaR accuracy. Dowd (2000) described how to build confidence intervals for VaR estimates using simulation methods, but his technique was illustrated only for some special cases linked to the Gaussian distribution.

Calibrating the models is not always easy, and for auditing and backtesting purposes the prespecified level of confidence can play an important role. The nonlinearity in results when calculating VaR at various levels of confidence means that, based on the same model, conclusions obtained in backtesting at one level cannot be extrapolated to other levels. In other words, we can have a model with very good forecasting power at 5% and quite bad results at 1%, or vice versa.


VaR AND DIFFERENT LEVELS
OF CONFIDENCE

The starting point of VaR modeling is a time series $Y_1, Y_2, \ldots, Y_n$ of profit and loss (P/L) observations; the time series consists of past returns or simulated returns. If the critical level (of confidence) for VaR is specified as $\alpha$ (e.g., 10%, 5%, 1%), for a given sample the VaR is determined from the empirical quantile at $\alpha$%, which we shall denote by $z_\alpha$. This means that, if $F(y) = \int_{-\infty}^{y} f(u)\,du$ is the cumulative distribution function of returns, then $F(z_\alpha) = \alpha$ and the probability area to the right of $z_\alpha$ is equal to $1 - \alpha$. One of the main assumptions made with many models for calculating VaR is that the returns $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed (IID). This is extremely important in supporting the idea that VaR (for future returns) can be forecasted based on past data. If the IID assumption is not true, then the empirical quantile cannot be simply calculated from a formula.

Let $\eta$ be the number of times the realized losses exceed the VaR threshold. The risk manager expects ex ante that $E(\eta) = n\alpha$. However, ex post it is likely that $\eta \ne n\alpha$. For backtesting, the daily loss series implies a sequence of successes or failures, depending on whether the loss is greater than the VaR threshold or not. The probability of failure is $\alpha$ and therefore, with $n$ data points, the probability mass function of $\eta$ is given by the binomial distribution with parameters $n$ and $\alpha$

$$P(\eta = x) = \binom{n}{x} \alpha^x (1 - \alpha)^{n - x} \tag{1}$$

for $x \in \{0, 1, 2, \ldots, n\}$. If the sample size $n$ is large enough, the central limit theorem implies that $\frac{\eta - n\alpha}{\sqrt{n\alpha(1 - \alpha)}}$ approximately follows a standard Gaussian distribution. An asymptotic confidence interval for the number of losses $\eta$ that will be seen can then be easily calculated. For example, a 95% asymptotic confidence interval for $\eta$ is

$$-1.96\sqrt{n\alpha(1 - \alpha)} + n\alpha < \eta < 1.96\sqrt{n\alpha(1 - \alpha)} + n\alpha \tag{2}$$
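
As an illustration of equations (1) and (2), the following Python sketch computes the binomial probabilities and the asymptotic 95% interval for the number of exceedances. The function names and the inputs (250 daily observations, 1% VaR) are hypothetical choices made for the example, not values from the text.

```python
import math

def exceedance_pmf(x, n, alpha):
    """Equation (1): binomial probability of exactly x VaR exceedances."""
    return math.comb(n, x) * alpha ** x * (1.0 - alpha) ** (n - x)

def exceedance_interval(n, alpha, z=1.96):
    """Equation (2): asymptotic interval for the number of exceedances."""
    mean = n * alpha
    half_width = z * math.sqrt(n * alpha * (1.0 - alpha))
    return mean - half_width, mean + half_width

n_obs, level = 250, 0.01             # hypothetical: one year of daily P/L, 1% VaR
low, high = exceedance_interval(n_obs, level)
print(f"expected exceedances: {n_obs * level:.2f}")
print(f"95% asymptotic interval: ({low:.2f}, {high:.2f})")

# Exact binomial probability that the count falls inside the asymptotic interval
inside = sum(exceedance_pmf(x, n_obs, level)
             for x in range(n_obs + 1) if low < x < high)
print(f"exact coverage of that interval: {inside:.3f}")
```

For small values of n times alpha the lower bound can be negative, which is a reminder that the Gaussian approximation is rough in the tail; the exact binomial probabilities from equation (1) can then be used directly.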
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From the probabilistic point of view the P/L 
values constitute a random sample $\{Y_1, Y_2, \ldots, Y_n\}$ with cumulative distribution function

$$F(y_1, y_2, \ldots, y_n; \theta) = \prod_{k=1}^{n} F_k(y_k; \phi_k) = \prod_{k=1}^{n} F(y_k; \phi)$$

where the last equality follows from the IID assumptions. For the empirical calculations of VaR the reordered sample $(Y_{[1]}, Y_{[2]}, \ldots, Y_{[n]})$, with $Y_{[1]} \le Y_{[2]} \le \ldots \le Y_{[n]}$, is of interest because the VaR at level $\alpha$ is equal to the negative of the $v$-th lowest value, where $v = 100\alpha + 1$ for a sample of 100 observations. The statistic $Y_{[1]}$ is called the first order statistic, $Y_{[2]}$ is called the second order statistic, and so on. $Y_{[n]}$ is called the $n$-th order statistic, and they are all sample quantiles. The theory of order statistics allows making calculations on sample quantiles. For empirical work based on a sample of size $n$, this translates into calculating the negative of the $v$-th lowest value, where $v = n\alpha + 1$, that is, $Y_{[v]}$.

The portfolio losses can be analyzed through 
the empirical cumulative distribution function 


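$$\hat{F}(y) = \begin{cases} 0 & \text{if } y < Y_{[1]} \\ \dfrac{j}{n} & \text{if } Y_{[j]} \le y < Y_{[j+1]} \\ 1 & \text{if } y \ge Y_{[n]} \end{cases} \tag{3}$$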


The inverse of this empirical cdf can be used as an estimator of VaR at level $\alpha$. The VaR estimator is the order statistic $Y_{[j]}$ such that $\frac{j-1}{n} < \alpha \le \frac{j}{n}$, which is slightly different from the upper empirical cumulative distribution function value calculated as the $Y_{[j]}$ such that $\frac{j-1}{n} \le \alpha < \frac{j}{n}$. Mausser (2001) pointed out that with 100 IID P/L values, the VaR at 5% level would be estimated by the former estimator as $Y_{[5]}$ and by the latter as $Y_{[6]}$.

One major criticism of using VaR to quantify potential losses is its inability to gauge the size of extreme losses. To overcome this problem another risk measure called expected tail loss (ETL) has been introduced. The ETL is defined as the mean of the losses that exceed the VaR threshold. Hence, within the same framework proposed to calculate VaR, one can determine ETL by simply estimating the mean of the sample censored by the VaR estimate. If $Y_{[v]}$ is the order statistic estimator representing VaR, ETL can be estimated as the average of $(Y_{[1]}, Y_{[2]}, \ldots, Y_{[v-1]})$. It is important to realize that while ETL may be more informative for gauging the potential losses than VaR, from an estimation point of view ETL will always depend on VaR.
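
The order-statistic estimators described above can be sketched in a few lines of Python. The function names, the toy P/L series, and the choice to report both VaR and ETL as positive loss figures are assumptions made for this illustration; the rank rule follows the text (VaR from the v-th lowest value, ETL from the v − 1 values below it).

```python
import math

def quantile_indices(n, alpha):
    """1-based ranks under the two empirical-quantile conventions.

    lower: j with (j-1)/n <  alpha <= j/n
    upper: j with (j-1)/n <= alpha <  j/n
    """
    k = round(n * alpha, 9)           # guard against floating-point noise
    lower = max(1, math.ceil(k))
    upper = min(n, math.floor(k) + 1)
    return lower, upper

def var_etl(pl, alpha):
    """Order-statistic VaR and ETL, both reported as positive losses.

    VaR is the negative of the v-th lowest P/L value, v = floor(n*alpha) + 1;
    ETL is the negative of the mean of the v-1 values below it.
    """
    ordered = sorted(pl)
    _, v = quantile_indices(len(ordered), alpha)
    if v < 2:
        raise ValueError("sample too small: no observations beyond the VaR")
    var = -ordered[v - 1]                     # ordered[v-1] is Y_[v]
    etl = -sum(ordered[: v - 1]) / (v - 1)    # mean of Y_[1], ..., Y_[v-1]
    return var, etl

pl_sample = list(range(-50, 50))     # toy P/L series: -50, -49, ..., 49
print(quantile_indices(100, 0.05))   # (5, 6), as in the Mausser example
print(var_etl(pl_sample, 0.05))      # (45, 48.0)
```

Because ETL is computed from the observations below the VaR rank, any error in the VaR rank carries over to the ETL estimate, which is the dependence noted in the text.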

The calculation of VaR and expected tail loss (ETL) with the order statistics methodology can be easily implemented in Matlab. Table 1 contains the VaR and ETL as estimated via the order statistics method for simulated samples using the Gaussian distribution and the t distribution for the series of P/L, at various confidence levels and sample sizes. In addition, the confidence intervals determined as the 2.5% and 97.5% percentiles of the distribution of each risk measure are also included. For a given sample size, the confidence intervals for both VaR and ETL widen as the level of confidence increases, as shown in Figures 1 and 2. Similar results are obtained for larger sample sizes and other distributions. For a prespecified level of confidence, the confidence intervals tend to narrow as the sample size increases.
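
The qualitative pattern described here can be reproduced with a small Monte Carlo experiment. The sketch below is not the procedure used to produce Table 1 and is not expected to match its figures; it simply simulates standard normal P/L samples, applies the order-statistic VaR estimator, and reads off the 2.5% and 97.5% percentiles of the estimates. The number of replications and the seed are arbitrary assumptions.

```python
import random

def order_stat_var(pl, alpha):
    """VaR as the negative of the v-th lowest P/L value, v = floor(n*alpha) + 1."""
    ordered = sorted(pl)
    v = int(round(len(ordered) * alpha, 9)) + 1
    return -ordered[v - 1]

def simulated_var_interval(n, alpha, reps=400, seed=2024):
    """2.5% and 97.5% percentiles of the VaR estimator over simulated samples."""
    rng = random.Random(seed)
    estimates = sorted(
        order_stat_var([rng.gauss(0.0, 1.0) for _ in range(n)], alpha)
        for _ in range(reps)
    )
    lo = estimates[int(0.025 * reps)]
    hi = estimates[int(0.975 * reps) - 1]
    return lo, hi

for n in (100, 1000):
    for alpha in (0.10, 0.05, 0.01):
        lo, hi = simulated_var_interval(n, alpha)
        print(f"n={n:5d}  VaR level={1 - alpha:.0%}  "
              f"interval=({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")
```

With these settings the interval width should grow from the 90% to the 99% level for a fixed n and shrink when n rises from 100 to 1,000.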

JOINT PROBABILITY 
DISTRIBUTIONS FOR 
ORDER STATISTICS 

If $F_{[i]}(u) = P(Y_{[i]} \le u)$ is the cumulative distribution function of the $i$-th order statistic, then it is not difficult to see that $F_{[1]}(y) = 1 - [1 - F(y; \phi)]^n$ and $F_{[n]}(y) = [F(y; \phi)]^n$. Exploiting the fact that we use the quantile as a VaR estimator, Dowd (2001) suggested applying the following known result from order statistics for backtesting purposes

$$P(\text{exactly } j \text{ values from } Y_1, Y_2, \ldots, Y_n \text{ are} \le y) = \binom{n}{j} F(y; \phi)^j [1 - F(y; \phi)]^{n-j} \tag{4}$$

to derive the cumulative distribution function of this estimator

$$F_{[j]}(y) = P(Y_{[j]} \le y) = \sum_{i=j}^{n} \binom{n}{i} F(y; \phi)^i [1 - F(y; \phi)]^{n-i} \tag{5}$$
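
Equation (5) is straightforward to evaluate numerically. The following Python sketch takes the value p = F(y) of the parent distribution as input; the standard normal parent and the choice n = 100, j = 6 are assumptions made for the example.

```python
import math
from statistics import NormalDist

def order_stat_cdf(j, n, p):
    """Equation (5): P(Y_[j] <= y), where p = F(y) is the parent cdf at y."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(j, n + 1))

n, j = 100, 6                         # e.g. 5% VaR from 100 observations
parent = NormalDist()                 # standard normal P/L, an assumption
for y in (-2.5, -2.0, -1.5, -1.0):
    print(f"P(Y_[{j}] <= {y:+.1f}) = {order_stat_cdf(j, n, parent.cdf(y)):.4f}")
```

Setting j = 1 or j = n recovers the two special cases quoted above for the smallest and largest order statistics.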
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Table 1 Order Statistics for VaR and ETL for One-Day Holding Period at 90%, 95% and 99% Confidence Levels 
and Various Sample Sizes Using Standard Normal Distribution and t Distribution

                                          Normal                           t
Sample size   Level   Measure    2.50%    Median   97.5%      2.50%    Median   97.5%
n = 100       90%     VaR        0.9299   1.2816   1.5874     0.9247   1.2770   1.5854
                      ETL        1.4677   1.7535   2.0120     1.4671   1.7538   2.0198
              95%     VaR        1.2116   1.6449   2.0078     1.2068   1.6435   2.0130
                      ETL        1.6956   2.0614   2.3788     1.6975   2.0670   2.3974
              99%     VaR        1.6031   2.3263   2.8160     1.6012   2.3407   2.8520
                      ETL        2.0254   2.6640   3.1116     2.0335   2.6897   3.1677
n = 500       90%     VaR        1.1278   1.2816   1.4263     1.1268   1.2807   1.4256
                      ETL        1.6269   1.7535   1.8748     1.6271   1.7515   1.8758
              95%     VaR        1.4543   1.6449   1.8218     1.4537   1.6446   1.8220
                      ETL        1.8985   2.0614   2.2150     1.8996   2.0598   2.2176
              99%     VaR        1.9921   2.3263   2.6185     1.9930   2.3292   2.6236
                      ETL        2.3650   2.6640   2.9299     2.3685   2.6653   2.9385
n = 1000      90%     VaR        1.1735   1.2816   1.3850     1.1731   1.2811   1.3847
                      ETL        1.6644   1.7535   1.8401     1.6645   1.7513   1.8405
              95%     VaR        1.5110   1.6449   1.7719     1.5108   1.6447   1.7720
                      ETL        1.9467   2.0614   2.1715     1.9473   2.0590   2.1727
              99%     VaR        2.0899   2.3263   2.5425     2.0906   2.3278   2.5447
                      ETL        2.4519   2.6640   2.8604     2.4539   2.6623   2.8643
n = 5000      90%     VaR        1.2337   1.2816   1.3285     1.236    1.2815   1.3284
                      ETL        1.7139   1.7535   1.7926     1.7140   1.7510   1.7927
              95%     VaR        1.5857   1.6449   1.7027     1.5856   1.6448   1.7027
                      ETL        2.0105   2.0614   2.1114     2.0106   2.0583   2.1116
              99%     VaR        2.2214   2.3263   2.4274     2.2216   2.3266   2.4278
                      ETL        2.5695   2.6640   2.7556     2.5700   2.6600   2.7562
n = 10000     90%     VaR        1.2478   1.2816   1.3148     1.2478   1.2815   1.3148
                      ETL        1.7256   1.7535   1.7813     1.7256   1.7510   1.7813
              95%     VaR        1.6031   1.6449   1.6859     1.6031   1.6448   1.6859
                      ETL        2.0256   2.0614   2.0968     2.0255   2.0582   2.0969
              99%     VaR        2.2524   2.3263   2.3984     2.2525   2.3265   2.3985
                      ETL        2.5974   2.6640   2.7292     2.5976   2.6597   2.7296

Note: The number of degrees of freedom for t is chosen as the sample size minus 2. 


In the following we shall denote $F(y; \phi)$ by $F(y)$, for simplicity. David (1981) pointed to the following useful result giving an analytical formula for the distribution function of the order statistic of order $j$:

$$F_{[j]}(y) = B_{F(y)}(j, n - j + 1) \tag{6}$$

where $B_u(a, b) = \int_0^u \frac{t^{a-1}(1-t)^{b-1}}{B(a, b)}\,dt$ is the incomplete beta function and $B(a, b)$ is the beta function. This helps to calculate the pdf for those distributions that are absolutely continuous with respect to a dominant probability measure.¹ The probability density function of the $j$-th order statistic is

$$f_{[j]}(y) = \frac{1}{B(j, n - j + 1)}\, F^{j-1}(y)\,[1 - F(y)]^{n-j} f(y) \tag{7}$$

where $f(y) = F'(y)$ is the density of the parent distribution.
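
As a numerical cross-check of the density of the j-th order statistic, the sketch below evaluates the closed-form expression f_[j](y) = F(y)^(j-1) [1 - F(y)]^(n-j) f(y) / B(j, n-j+1) and compares it with a finite-difference slope of the binomial-sum distribution function. The standard normal parent, the evaluation point, and the step size are assumptions made for the example.

```python
import math
from statistics import NormalDist

def beta_fn(a, b):
    """Beta function B(a, b) computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def order_stat_pdf(j, n, y, dist):
    """Closed-form density of the j-th order statistic at y."""
    F, f = dist.cdf(y), dist.pdf(y)
    return F ** (j - 1) * (1.0 - F) ** (n - j) * f / beta_fn(j, n - j + 1)

def order_stat_cdf(j, n, y, dist):
    """Binomial-sum form of the distribution function of the j-th order statistic."""
    F = dist.cdf(y)
    return sum(math.comb(n, i) * F ** i * (1.0 - F) ** (n - i)
               for i in range(j, n + 1))

dist = NormalDist()                   # standard normal parent, an assumption
n, j, y, h = 100, 6, -1.6, 1e-5
slope = (order_stat_cdf(j, n, y + h, dist)
         - order_stat_cdf(j, n, y - h, dist)) / (2 * h)
print(f"closed-form density:         {order_stat_pdf(j, n, y, dist):.6f}")
print(f"finite-difference cdf slope: {slope:.6f}")
```

The two printed values should agree to several decimal places, which confirms that the density and the distribution function describe the same estimator.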


DISTRIBUTION-FREE 
CONFIDENCE INTERVALS 
FOR VaR 

From a practical point of view, without any loss of generality, it is safe to assume that the cumulative distribution function $F$ is strictly increasing. Then, for any $\alpha \in (0, 1)$ the equation

$$F(y) = \alpha \tag{8}$$

has a unique solution. This solution refers to the entire population and it is called the quantile of order $\alpha$, denoted by $z_\alpha$. The 95% VaR is $z_{0.05}$.

Figure 1 Expected Tail Loss for Normal P/L versus Level of Confidence When the Sample Size Is 100; Calculations Are Done with Order Statistics


The order statistics can provide a distribution-free confidence interval for the population quantiles. Thompson (1936) showed that

$$P(Y_{[i]} < z_\alpha < Y_{[j]}) = \sum_{k=i}^{j-1} \binom{n}{k} \alpha^k (1 - \alpha)^{n-k} \tag{9}$$

This powerful result allows the construction of distribution-free confidence intervals for VaR. For a given sample size $n$ and VaR level $\alpha$, there may be several combinations of order statistics $Y_{[i]}$, $Y_{[j]}$ for which the quantity in (9) is larger than or equal to $1 - \alpha$, the confidence level desired, and the risk manager may decide to select the combination leading to the shortest confidence interval. Note that the degree of confidence $1 - \alpha$ of the interval is chosen independently of the level of confidence $\alpha$ used for VaR point estimation. In other words, a 95% confidence interval for the population quantile $z_\alpha$ can be calculated for 95% VaR or for 99% VaR.
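
A minimal sketch of how equation (9) can be used to pick a pair of order statistics is given below. The search rule (fewest ranks between i and j) is an assumption made for the illustration; it is one simple proxy for "shortest interval" and is not prescribed by the text. The confidence level is passed as a separate argument so that it is not confused with the VaR level.

```python
import math

def coverage(n, alpha, i, j):
    """Equation (9): P(Y_[i] < z_alpha < Y_[j]) for ranks 1 <= i < j <= n."""
    return sum(math.comb(n, k) * alpha ** k * (1.0 - alpha) ** (n - k)
               for k in range(i, j))

def shortest_rank_interval(n, alpha, confidence=0.95):
    """Pair (i, j) with the fewest ranks between them that reaches the coverage."""
    best = None
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            c = coverage(n, alpha, i, j)
            if c >= confidence:
                if best is None or j - i < best[1] - best[0]:
                    best = (i, j, c)
                break                 # a larger j only widens the interval
    return best                       # None if no pair reaches the coverage

result = shortest_rank_interval(100, 0.05)
print(result)                         # (i, j, achieved coverage)
```

The interval for the quantile is then (Y_[i], Y_[j]) read from the sorted P/L sample, whatever the underlying distribution; the achieved coverage is usually somewhat above the requested level because the ranks are integers.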


BIVARIATE ORDER 
STATISTICS 

The risk manager is faced with a dilemma. On one hand, the regulators usually ask for a 99% VaR calculation, so that the banks are requested to set aside sufficient capital in order to absorb 99% of all losses. On the other hand, internal models may be used for day-to-day operations to forecast 95% VaR. As explained by Brooks and Persand (2002) using an example from Kupiec (1995), the standard error of the 99% VaR can be more than 50% larger than the corresponding standard error for the 95% VaR. This is the case for a model using the Gaussian distribution and it can be even worse for fat-tailed distributions, with the confidence intervals for the first percentile four times wider than confidence intervals for the fifth percentile.

Figure 2 VaR for Normal P/L versus Level of Confidence When the Sample Size Is 100; Calculations Are Done with Order Statistics

For backtesting purposes it would be ideal to do a joint analysis. Thus, the bivariate joint distribution of two order statistics will provide the confidence regions (two-dimensional sets) for pairs of VaR estimates. For example, the confidence regions for 1% VaR and 5% VaR are recovered from the bivariate joint distribution of $Y_{[v_1]}$, $Y_{[v_2]}$ where $v_1 = n \times 1/100 + 1$ and $v_2 = n \times 5/100 + 1$, respectively. This distribution is fully characterized by

$$F_{[i,j]}(x, y) = P(Y_{[i]} \le x,\; Y_{[j]} \le y) \tag{10}$$

with $1 \le i < j \le n$. The probability on the right side of equation (10) can be interpreted as the probability that at least $i$ values from the entire sample $Y_1, Y_2, \ldots, Y_n$ are not greater than $x$ and at least $j$ values from the same sample $Y_1, Y_2, \ldots, Y_n$ are not greater than $y$. Hence

$$F_{[i,j]}(x, y) = \sum_{k=j}^{n} \sum_{s=i}^{k} P(\text{exactly } s \text{ of } Y_1, Y_2, \ldots, Y_n \text{ are} \le x \text{ and exactly } k \text{ of } Y_1, Y_2, \ldots, Y_n \text{ are} \le y) \tag{11}$$

As in the univariate case, see David (1981), it follows that

$$F_{[i,j]}(x, y) = \sum_{k=j}^{n} \sum_{s=i}^{k} \frac{n!}{s!\,(k - s)!\,(n - k)!}\, [F(x)]^s\, [F(y) - F(x)]^{k-s}\, [1 - F(y)]^{n-k} \tag{12}$$

for any $x < y$. Since for $x \ge y$ the event $\{Y_{[j]} \le y\}$ implies $Y_{[i]} \le x$, then $F_{[i,j]}(x, y) = F_{[j]}(y)$.




Applications of Order Statistics to Risk Management Problems 




An interesting corollary following from this result is that any two order statistics, and therefore VaR estimates at different levels, are not independent. This follows because the joint distribution in (12) cannot be factorized as a product of two factors, one depending only on x and the other only on y, up to a proportionality constant. In other words, if both 1% VaR and 5% VaR, for example, are needed for risk management purposes, then the quality of the VaR estimates should be investigated by looking at the joint bivariate distribution like that in (12) rather than at separate distributions of the type given in (5).
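
The lack of independence can be seen numerically by comparing the joint distribution function in (12) with the product of the two marginal distribution functions of type (5). In the Python sketch below, the functions take p = F(x) and q = F(y) as inputs; the standard normal parent, the sample size n = 100, and the evaluation points are assumptions chosen for the illustration.

```python
import math
from statistics import NormalDist

def marginal_cdf(j, n, p):
    """P(Y_[j] <= y) with p = F(y), binomial-sum form as in equation (5)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(j, n + 1))

def joint_cdf(i, j, n, p, q):
    """Equation (12): P(Y_[i] <= x, Y_[j] <= y) with p = F(x), q = F(y), i < j."""
    if p >= q:                        # x >= y: the event reduces to Y_[j] <= y
        return marginal_cdf(j, n, q)
    total = 0.0
    n_fact = math.factorial(n)
    for k in range(j, n + 1):
        for s in range(i, k + 1):
            coef = n_fact // (math.factorial(s) * math.factorial(k - s)
                              * math.factorial(n - k))
            total += coef * p ** s * (q - p) ** (k - s) * (1.0 - q) ** (n - k)
    return total

n = 100
v1, v2 = n // 100 + 1, 5 * n // 100 + 1      # ranks for the 1% and 5% VaR
dist = NormalDist()                           # standard normal P/L, an assumption
p, q = dist.cdf(-2.0), dist.cdf(-1.5)
joint = joint_cdf(v1, v2, n, p, q)
product = marginal_cdf(v1, n, p) * marginal_cdf(v2, n, q)
print(f"joint cdf            = {joint:.4f}")
print(f"product of marginals = {product:.4f}")
```

If the two order statistics were independent, the two printed values would coincide; the gap between them is a direct measure of the dependence between the 1% and 5% VaR estimates at the chosen points.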

KEY POINTS 

• Order statistics can be used as estimators of VaR and ETL and they are easy to compute.

• Banks may have to work with VaR measures at several levels of confidence because of regulatory requirements that may not coincide exactly with internal risk management decisions.

• ETL can be estimated easily with the framework based on order statistics, as the mean of the sample censored by the VaR threshold.

• For a given sample size, the confidence intervals for both VaR and ETL widen as the level of confidence increases. For a prespecified level of confidence, the confidence intervals tend to narrow as the sample size increases.

• There is a closed-form solution for the density of any order statistic, which has been advocated here as a VaR estimator. Therefore, it would be easy to perform backtesting of VaR in this setup.

• The bivariate distribution of any two order statistics is known in closed form and therefore could be used for backtesting when banks have to work with two VaR measures simultaneously.


NOTE 

1. For practical cases such as those encountered in finance we can safely assume that the random variables describing P/L series are continuous and they have probability density functions.


REFERENCES 

Berkowitz, J., and O'Brien, J. (2002). How accurate are value-at-risk models at commercial banks? Journal of Finance 57, 3: 1093-1111.

Brooks, C., and Persand, G. (2002). Model choice and value-at-risk performance. Financial Analysts Journal 58, 5: 87-97.

David, H. (1981). Order Statistics, 2nd ed. New York: Wiley.

Dowd, K. (2000). Assessing VaR accuracy. Derivatives Quarterly 6, 3: 61-63.

Dowd, K. (2001). Estimating VaR with order statistics. Journal of Derivatives 8, 3: 23-30.

Duffie, D., and Pan, J. (1997). An overview of value-at-risk. Journal of Derivatives 4, 3: 7-49.

Jackson, P., Maude, D. J., and Perraudin, W. (1997). Bank capital and value-at-risk. Journal of Derivatives 4: 73-90.

Jorion, P. (1996). Risk²: Measuring the risk in value-at-risk. Financial Analysts Journal 52: 47-56.

Jorion, P. (1997). Value-at-Risk: The New Benchmark for Controlling Market Risk. Burr Ridge, IL: Irwin.

Kupiec, P. (1995). Techniques for verifying the accuracy of risk measurement models. Journal of Derivatives 3: 73-84.

Marshall, C., and Siegel, M. (1997). Value-at-risk: Implementing a risk measurement standard. Journal of Derivatives 4: 91-110.

Mausser, H. (2001). Calculating quantile-based risk analytics with L-estimators. ALGO Research Quarterly 4, 4: 33-47.

Pritsker, M. (1997). Evaluating VaR methodologies: Accuracy versus computational time. Journal of Financial Services Research 12: 201-241.

Thompson, W. (1936). On confidence ranges for the median and other expectation distributions for populations of unknown distribution form. Annals of Mathematical Statistics 42: 268-269.



Risk Measures 




Measuring Interest Rate Risk: 
Effective Duration and Convexity 

GERALD W. BUETOW Jr., PhD, CFA 

President and Founder, BFRC Services, LLC 

ROBERT R. JOHNSON, PhD, CFA, CAIA 

Independent Financial Consultant, Charlottesville, VA 


Abstract: Modified duration and effective duration are two ways to measure the price sensitivity of a fixed income security. Both measure the percentage price change of a security from an absolute change in yields. Effective duration is a more complete measure of price sensitivity since it incorporates embedded optionality while modified duration does not. Combining effective duration with effective convexity is a risk management and measurement approach superior to using modified duration and convexity. In general, for fixed income securities with embedded options, numerical approaches (effective) to risk measurement are superior to analytic (modified) approaches.


Modified duration ignores any effect on cash flows that might take place as a result of changes in interest rates. Effective duration does not ignore the potential for such changes in cash flows. For example, bonds with embedded options will have very different cash flow properties as interest rates (or yields) change. Modified duration ignores these effects completely. In order to apply effective duration, an available interest rate model and corresponding pricing model are needed.¹ The example in this entry shows how to compute the effective duration of securities with cash flows that are dependent on changes in either the level or dynamics of the term structure of interest rates.

There is no difference between modified and effective duration for option-free or straight bonds. In fact, it can be shown that they are mathematically identical when the change in rates (or yields) becomes very small. As shown in the example, even for bonds with embedded options, the differences between the two measures are minimal over certain ranges of yields. For example, when the embedded option is far out-of-the-money, the cash flows of the bond are not affected by small changes in yields, resulting in almost no difference between the two measures.

Convexity and effective convexity measure the curvature of the price/yield relationship. Convexity (sometimes referred to as standard convexity) suffers the same limitations as modified duration and is therefore not generally useful for securities with embedded options. However, similar to the duration measures, in ranges of rates (or yields) where
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the cash flows are not materially affected by 
small changes in yields, the two convexity mea¬ 
sures are almost identical. 

As with the duration measures, there is no difference between convexity and effective convexity for option-free or straight bonds. In fact, it can be shown that they are mathematically identical when the change in rates (or yields) becomes very small. As shown above, even for bonds with embedded options, the differences between the two measures are minimal over certain ranges of rates depending on the characteristics of the embedded option. For example, when the embedded option is far out-of-the-money, the cash flows of the bond are not affected by small changes in yields.


EFFECTIVE DURATION AND 
EFFECTIVE CONVEXITY—AN 
EXAMPLE 

The following example illustrates how to calculate and interpret effective duration and effective convexity for straight bonds and bonds with embedded options.²

Suppose we need to measure the interest rate 
sensitivity of the following three securities: 


1. A 5-year, 6.70% coupon straight (noncallable 
and nonputable) semiannual coupon bond, 
with a current price of 102.75% of par. 

2. A 5-year, 6.25% coupon bond, callable at 
par in years 2 through 5 on the semiannual 
coupon dates, with a current price of 99.80% 
of par. 

3. A 5-year, 5.75% coupon bond, putable at 
par in years 2 through 5 on the semiannual 
coupon dates, with a current price of 100.11% 
of par. 

The cash flows of these securities are very dif¬ 
ferent as interest rates change. Consequently, 
the sensitivities to changes in interest rates are 
also very different. 

Using the Black-Derman-Toy interest rate model³ that is based on the existing term structure, the term structure of interest rates is shifted up and down by 10 basis points (bps) and the resulting price changes are recorded. $P_-$ corresponds to the price after a downward shift in interest rates, $P_+$ corresponds to the price after an upward shift in interest rates, $P$ is the current price, and $S$ is the assumed shift in the term structure. (Note that shifting the term structure in a parallel manner will result in a change in yields equal to the shift for option-free bonds.) Table 1 shows these prices for each bond.


Table 1 Original Prices and Resulting Prices from a Downward and Upward 10 Basis Point
Interest Rate Shift and the Corresponding Effective Duration and Effective Convexity for Three
Bonds Based on the Black-Derman-Toy Model

Price Changes Following 10 bp Shift

Variable               Original Price P   Upward Shift of 10 bp P+   Downward Shift of 10 bp P-
Straight Bond Price    102.7509029        102.3191235                103.1848805
Callable Bond Price    99.80297176        99.49321718                100.1085624
Putable Bond Price     100.1089131        99.84237604                100.3819059

Effective Duration and Effective Convexity Measures Calculated from Using the Price Changes
Resulting from the 10 bp Shifts in the Term Structure

                       Effective duration   Effective convexity
Straight Bond          4.21                 21.39
Callable Bond          3.08                 -41.72
Putable Bond           2.70                 64.49


Table 2 Effective Duration and Effective Convexity for Various Shifts in the Term Structure for Three Bonds

Term Structure    Straight Bond             Callable Bond             Putable Bond
Shift (bps)       Effective   Effective     Effective   Effective     Effective   Effective
                  Duration    Convexity     Duration    Convexity     Duration    Convexity
-500              4.40        23.00         1.91        4.67          4.46        23.46
-250              4.30        22.19         1.88        4.55          4.37        22.66
0                 4.21        21.39         3.08        -41.72        2.70        64.49
250               4.12        20.62         4.15        20.85         1.87        7.07
500               4.03        19.87         4.07        20.10         1.81        4.23
1000              3.85        18.42         3.89        18.66         1.77        4.03


The formulas for calculating effective duration and effective convexity are as follows:

$$\text{Effective duration} = \frac{P_- - P_+}{2PS} \tag{1}$$

$$\text{Effective convexity} = \frac{P_+ + P_- - 2P}{PS^2} \tag{2}$$

It is critical to understand the importance of the pricing model in this exercise. The model must account for the change in cash flows of the securities as interest rates change. The callable and putable bonds have very different cash flow characteristics that depend on the level of interest rates. The pricing model used must account for this property.⁴

Straight Bond 

The effective duration for the straight bond is found by recording the price changes from shifting the term structure up ($P_+$) and down ($P_-$) by 10 bps and then substituting these values into equation (1). The prices are shown in Table 1. Consequently, the computation is:

$$\text{Effective duration} = \frac{103.1848805 - 102.3191235}{2(102.7509029)(0.001)} = 4.21$$

Similarly, the calculation for effective convexity is found by substituting the corresponding prices into equation (2):

$$\text{Effective convexity} = \frac{103.1848805 + 102.3191235 - 2(102.7509029)}{102.7509029(0.001)^2} = 21.39$$


For the straight bond, the modified duration 
is 4.21 and the convexity is 21.40. These are very 
close to the effective measures shown in Table 1. 
This demonstrates that, for option-free bonds, 
the two measures are almost the same for small 
changes in yields. 
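
The two formulas are easy to apply in code. The Python sketch below uses the prices reported in Table 1 and a shift of 10 bps (0.001); the function and variable names are illustrative assumptions. Because the published prices are rounded, the last reported digit of a result may occasionally differ from the table.

```python
def effective_duration(p_down, p_up, p0, shift):
    """Equation (1): (P_- - P_+) / (2 * P * S)."""
    return (p_down - p_up) / (2.0 * p0 * shift)

def effective_convexity(p_down, p_up, p0, shift):
    """Equation (2): (P_+ + P_- - 2P) / (P * S^2)."""
    return (p_up + p_down - 2.0 * p0) / (p0 * shift ** 2)

# (P, P_+, P_-) from Table 1; S = 10 bps = 0.001
prices = {
    "straight": (102.7509029, 102.3191235, 103.1848805),
    "callable": (99.80297176, 99.49321718, 100.1085624),
    "putable": (100.1089131, 99.84237604, 100.3819059),
}
shift = 0.001
results = {}
for name, (p0, p_up, p_down) in prices.items():
    results[name] = (effective_duration(p_down, p_up, p0, shift),
                     effective_convexity(p_down, p_up, p0, shift))
    print(f"{name:9s} duration={results[name][0]:.2f} "
          f"convexity={results[name][1]:.2f}")
```

The same two functions apply to any security, provided the prices after the up and down shifts come from a pricing model that reflects how the cash flows change with interest rates.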

Table 2 shows the effects of the term structure shifts on the effective duration and effective convexity of the straight bond. The effective duration increases as yields decrease because as yields decrease the slope of the price/yield relationship for option-free bonds becomes steeper and effective duration (and modified duration) is directly proportional to the slope of this relationship. For example, the effective duration at very low yields (-500-bp shift) is 4.40 and decreases to 3.85 at very high rates (+1,000 bps). Figure 1 illustrates this phenomenon; as yields increase notice how the slope of the price/yield relationship decreases (becomes more horizontal or flatter).



Figure 1 Price/Yield Relationship of the Straight Bond (horizontal axis: Yield, %)
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As the term structure shifts up (that is, as rates 
rise), the yield to maturity on a straight bond increases by approximately the same amount. As the yield increases, its convexity decreases. Figure 1 illustrates this property. As yields increase, the curvature (or the rate of change of the slope) decreases. The results in Table 2 for the straight bond also bear this out. The effective convexity values become smaller as yields increase. For example, the effective convexity at very low yields (-500-bp shift) is 23.00 and decreases to 18.42 at very high rates (+1,000-bp shift).

These are both well-documented properties of 
option-free bonds. The modified duration and 
convexity numbers for the straight bond are al¬ 
most identical to the effective measures for the 
straight bond shown in Table 2. 

Callable Bond 

The effective duration for the callable bond is 
found by recording the price changes from shift¬ 
ing the term structure up ( P + ) and down (P.) by 

10 bps and then substituting these values into 
equation (1). The prices are shown in Table 1. 
Note that these prices take into account the 
changing cash flows resulting from the embed¬ 
ded call option. Consequently, the computation 
is: 

100.1085624 - 99.49321718 

Effective duration = - 

2(99.800297)(0.001) 

= 3.08 

Similarly, the calculation for effective convex¬ 
ity is found by substituting the corresponding 
prices into equation (2): 

$$\text{Effective convexity} = \frac{100.1085624 + 99.49321718 - 2(99.80297176)}{99.80297176\,(0.001)^2} = -41.72$$
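The two calculations above can be sketched in a few lines of Python. This is only an illustration: it assumes equations (1) and (2) take the forms implied by the worked numbers, and the function and variable names are hypothetical rather than taken from the text. The inputs are the callable-bond prices quoted above, kept at full precision.

```python
def effective_duration(p_down, p_up, p0, dy):
    """Form implied by the worked example: (P- - P+) / (2 * P0 * dy)."""
    return (p_down - p_up) / (2.0 * p0 * dy)


def effective_convexity(p_down, p_up, p0, dy):
    """Form implied by the worked example: (P- + P+ - 2 * P0) / (P0 * dy^2)."""
    return (p_down + p_up - 2.0 * p0) / (p0 * dy ** 2)


# Callable-bond prices quoted in the text for a 10-bp (0.001) shift.
P_DOWN = 100.1085624   # price after the term structure shifts down
P_UP = 99.49321718     # price after the term structure shifts up
P0 = 99.80297176       # price at the current term structure
DY = 0.001

callable_ed = effective_duration(P_DOWN, P_UP, P0, DY)
callable_ec = effective_convexity(P_DOWN, P_UP, P0, DY)
print(f"Effective duration:  {callable_ed:.2f}")
print(f"Effective convexity: {callable_ec:.2f}")
```

Because the denominators are very small, the prices are not rounded before they are substituted.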

Figure 2 Price/Yield Relationship of the Callable Bond

The relationship between the shift in rates and effective duration is shown in Table 2 and in Figure 2. As rates increase, the effective duration of the callable bond becomes larger. For example,
the effective duration at very low yields (-500- 
bp shift) is 1.91 and increases to 3.89 at very 
high rates (+1,000 bps). This reflects the fact 
that as rates increase the likelihood of the bond 
being called decreases and, as a result, the bond 
behaves more like a straight bond; hence, its ef¬ 
fective duration increases. Conversely, as rates 
drop, this likelihood increases and the bond and 
its effective duration behave more like a bond 
with a two-year maturity because of the call op¬ 
tion becoming effective in two years. As rates 
decrease significantly, the likelihood of the is¬ 
suer calling the bond in two years increases. 
Consequently, at very low and intermediate 
rates the difference between the effective du¬ 
ration measure and modified duration is large 
and at very high rates the difference is small. 

As explained above, effective convexity 
measures the curvature of the price/yield re¬ 
lationship of bonds. Low values for effective 
convexity simply mean that the relationship is 
becoming linear (an effective convexity of zero 
represents a linear relationship). As shown in 
Table 2, the effective convexity values of the 
callable bond at extremely low interest rates 
(that is, for the -250-bp and -500-bp shifts in 
the term structure) are very small positive num¬ 
bers (4.55 and 4.67, respectively). This means 
that the relationship is almost linear but ex¬ 
hibits slight convexity. This is due to the call 
option being delayed by two years. At these 
extremely low interest rates, the callable bond exhibits slight positive convexity because the price compression at the call price is not complete for another two years. (Price compression for a callable bond refers to the property that a callable bond's price appreciation potential is severely limited as yields decline. As shown in Figure 2, as yields fall below a certain level (that is, where the yield corresponds to the call price), the price appreciation of the callable bond is compressed.) If this bond were immediately callable, the price/yield relationship would exhibit positive convexity at high yields and negative convexity at low yields. At the current
level of interest rates, the effective convexity is 
negative as expected. At these rate levels, the 
embedded call option causes enough price com¬ 
pression to cause the curvature of the price/ 
yield relationship to be negatively convex (that 
is, concave). Figure 2 illustrates these proper¬ 
ties. It is at these levels that the embedded op¬ 
tion has a significant effect on the cash flows of 
the callable bond. 

Table 2 shows that for large positive yield curve shifts (that is, for the +250-bp, +500-bp, and +1,000-bp shifts in the term structure), the effective convexity of the callable bond becomes positive and very close to the effective convexity values of the straight bond. For example, the effective convexity at the +250-bp shift is 20.85 for the callable bond and 20.62 for the straight bond. The only reason they are not the same is that the coupon rates of the bonds are not
equal. Consequently, at very low and interme¬ 
diate rates the difference between effective con¬ 
vexity and the standard convexity is large and 
at very high rates the difference is small. The 
intuition behind these findings is straightfor¬ 
ward. At low rates, the cash flows of the callable 
bond are severely affected by the likelihood of 
the embedded call option being exercised by 
the issuer. At high rates, the embedded call op¬ 
tion is so far out-of-the-money that it has almost 
no effect on the cash flows of the callable bond 
and so the callable bond behaves like a straight 
bond. 


Putable Bond 

The effective duration for the putable bond is found by recording the price changes from shifting the term structure up (P+) and down (P-) by 10 bps and then substituting these values into equation (1). The prices are shown in Table 1.
Note that these prices take into account the 
changing cash flows resulting from the embed¬ 
ded put option. Consequently, the computation 
is: 

$$\text{Effective duration} = \frac{100.3819059 - 99.84237604}{2(100.1089131)(0.001)} = 2.70$$

Similarly, the calculation for effective convex¬ 
ity is found by substituting the corresponding 
prices into equation (2): 

$$\text{Effective convexity} = \frac{100.3819059 + 99.84237604 - 2(100.1089131)}{100.1089131\,(0.001)^2} = 64.49$$

Because the putable bond behaves so differ¬ 
ently from the other two bonds, the effective du¬ 
ration and effective convexity values are very 
different. As rates increase, the bond behaves 
more like a two-year bond because the owner 
will, in all likelihood, exercise the right to put 
the bond back at the put price as soon as possi¬ 
ble. As a result, effective duration of the putable 
bond is expected to decrease as rates increase. 
This is due to the embedded put option severely 
affecting the cash flows of the putable bond. 
Conversely, as rates fall, the putable bond behaves more like a five-year straight bond since the embedded put option is so far out-of-the-money that it has little effect on the cash flows of the putable bond. Effective duration should reflect these properties. Table 2 shows that this
is indeed the case. For example, the effective 
duration at very low yields (-500-bp shift) is 
4.46 and decreases to 1.77 at very high rates 
(+1,000 bps). Consequently, at very high rates and intermediate rates the difference between the effective duration and modified duration measures is large and at low rates the difference is small.

Figure 3 Price/Yield Relationship of the Putable Bond

Table 2 shows that the effective convexity of 
the putable bond is positive for all rate shifts 
as would be expected, but it becomes smaller 
as rates increase (that is, for the +250-bp, +500-bp, and +1,000-bp shifts in the term structure). As rates increase, the putable bond price/yield relationship will become linear because of the bond's price truncation at the put price. (Price truncation for a putable bond refers to the property that the putable bond's price depreciation potential is severely limited as yields increase. As shown in Figure 3, as yields rise above a certain level (that is, where the yield corresponds to the put price), the price depreciation of the putable bond is truncated.) This is
the reason for the small effective convexity val¬ 
ues for the putable bond for the three positive 
shifts in the term structure (7.07, 4.23, and 4.03, 
respectively). It is at these levels that the em¬ 
bedded put option has a significant effect on the 
cash flows of the putable bond. Consequently, 
at very high rates and intermediate rates the 
difference between the effective convexity and 
standard convexity is very large. Figure 3 illus¬ 
trates these properties. 

At very low rates (that is, for the 250-bp and 
500-bp downward shifts in the term structure), 
the putable bond behaves like a 5-year straight 
bond because the put option is so far out-of- 
the-money. Therefore, as the term structure is 


shifted downward, the putable bond's effective 
convexity values approach those of a compara¬ 
ble 5-year straight bond. Comparing the effec¬ 
tive convexity measures for the putable bond 
and the straight bond illustrates this character¬ 
istic. For example, the effective convexity at the 
-250-bp shift is 22.66 for the putable bond and 
22.19 for the straight bond. The two convex¬ 
ity measures are almost identical. In fact, they 
would be identical if their coupon rates were 
equal. 

Figure 3 illustrates these properties. Also notice how the transition from low yields to high yields forces the price/yield relationship to have a very high convexity at intermediate levels of yields. For example, the current effective convexity of the putable bond is 64.49 compared to 21.39 for the straight bond and -41.72 for the
callable bond. This is because of the price trun¬ 
cation of the putable bond resulting from the 
embedded put option moving from out-of-the- 
money and having little influence over the cash 
flows to in-the-money and having a significant 
impact on cash flows. 


PUTTING IT ALL TOGETHER 

Notice in Table 2 how effective duration 
changes much more across yields for the 
callable and putable bonds than it does for the 
straight bond. This is to be expected because 
the embedded options have such a significant 
influence over cash flows as yields change over 
a wide spectrum. Interestingly, at high (low) 
yields the callable (putable) bond's effective du¬ 
ration is very close to the straight bond. This 
is where the embedded call (put) option is so 
far out-of-the-money that the two securities be¬ 
have similarly. The same intuition holds for the 
effective convexity measures. 

A common use of effective duration and effective convexity is to estimate the percentage price changes in fixed income securities for assumed changes in yield. In fact, it is not uncommon for effective duration and effective convexity to be presented in terms of estimated percentage price change for a given change in yield (typically 100 bp). Tables 3 and 4 show this alternative presentation for a ±100 bp change in yield. These results are computed by substituting the values from Table 2 into the following relationship:

$$\%\ \text{Price change} \approx -(ED)(\Delta y)(100) + \tfrac{1}{2}(EC)(\Delta y)^2(100) \qquad (3)$$

where ED is the effective duration, EC is the effective convexity, and Δy is the assumed change in yield (e.g., 100 bp). Equation (3) is the result of a Taylor series expansion of the bond price function. Also, note that the effective duration (ED) and effective convexity (EC) terms can be replaced by modified duration and standard convexity, respectively, for option-free bonds.

Table 3 Percentage Price Changes Assuming an Increase in Yield of 100 bps and Effective Duration and Effective Convexity for Various Shifts in the Term Structure

Columns for each bond: (a) % price change using effective duration; (b) % price change using effective convexity; (c) total % price change.

Term Structure        Straight Bond                   Callable Bond                   Putable Bond
Shift (bp)         (a)      (b)        (c)         (a)      (b)        (c)         (a)      (b)        (c)
-500              -4.40   0.11500   -4.28500      -1.91   0.02335   -1.88665      -4.46   0.11730   -4.34270
-250              -4.30   0.11095   -4.18905      -1.88   0.02275   -1.85725      -4.37   0.11330   -4.25670
0                 -4.21   0.10695   -4.10305      -3.08  -0.20860   -3.28860      -2.70   0.32245   -2.37755
250               -4.12   0.10310   -4.01690      -4.15   0.10425   -4.04575      -1.87   0.03535   -1.83465
500               -4.03   0.09935   -3.93065      -4.07   0.10050   -3.96950      -1.81   0.02115   -1.78885
1000              -3.85   0.09210   -3.75790      -3.89   0.09330   -3.79670      -1.77   0.02015   -1.74985

Table 3 shows the percentage price changes resulting from an increase in yield of 100 bps at various levels of the term structure. For example, the percentage price change for the callable bond at the current term structure (0-bp shift) is calculated using the values from Table 2 and substituting them into equation (3) as follows:

$$\%\ \text{Price change} \approx -(3.08)(0.01)(100) + \tfrac{1}{2}(-41.72)(0.01)^2(100) \approx -3.08 - 0.2086 = -3.2886\%$$

This example shows that the estimated percentage price change from effective convexity (-0.2086%) is much smaller in magnitude than the percentage price change from effective duration (-3.08%).

Table 4 Percentage Price Changes Assuming a Decrease in Yield of 100 bps and Effective Duration and Effective Convexity for Various Shifts in the Term Structure

Columns for each bond: (a) % price change using effective duration; (b) % price change using effective convexity; (c) total % price change.

Term Structure        Straight Bond                 Callable Bond                 Putable Bond
Shift (bp)         (a)      (b)       (c)        (a)      (b)       (c)        (a)      (b)       (c)
-500               4.40   0.1150    4.5150       1.91   0.0234    1.9334       4.46   0.1173    4.5773
-250               4.30   0.1110    4.4110       1.88   0.0228    1.9028       4.37   0.1133    4.4833
0                  4.21   0.1070    4.3170       3.08  -0.2086    2.8714       2.70   0.3225    3.0225
250                4.12   0.1031    4.2231       4.15   0.1043    4.2543       1.87   0.0354    1.9054
500                4.03   0.0994    4.1294       4.07   0.1005    4.1705       1.81   0.0212    1.8312
1000               3.85   0.0921    3.9421       3.89   0.0933    3.9833       1.77   0.0202    1.7902

Table 4 shows the percentage price changes resulting from a decrease in yield of 100 bp at the various levels of the term structure. For example, the percentage price change for the callable bond at the current term structure (0-bp shift) is calculated using the values from Table 2 and substituting them into equation (3) as follows:

$$\%\ \text{Price change} \approx -(3.08)(-0.01)(100) + \tfrac{1}{2}(-41.72)(-0.01)^2(100) \approx 3.08 - 0.2086 = 2.8714\%$$
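As a rough check on the two worked examples, equation (3) can be written as a small function. This is a sketch only; the function name and the tuple it returns are illustrative choices, not part of the original text. The inputs are the callable bond's effective duration and effective convexity at the current term structure.

```python
def pct_price_change(ed, ec, dy):
    """Equation (3): duration term, convexity term, and their sum, all in percent."""
    duration_term = -ed * dy * 100.0
    convexity_term = 0.5 * ec * dy ** 2 * 100.0
    return duration_term, convexity_term, duration_term + convexity_term


# Callable bond at the current term structure (0-bp shift): ED = 3.08, EC = -41.72.
up = pct_price_change(3.08, -41.72, 0.01)     # yield rises by 100 bp
down = pct_price_change(3.08, -41.72, -0.01)  # yield falls by 100 bp

print(f"+100 bp: {up[0]:.4f} + ({up[1]:.4f}) = {up[2]:.4f}%")
print(f"-100 bp: {down[0]:.4f} + ({down[1]:.4f}) = {down[2]:.4f}%")
```

The convexity term has the same sign for both yield moves because the yield change is squared, which is why negative convexity reduces the gain when yields fall and adds to the loss when yields rise.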

KEY POINTS 

• Duration and convexity are measures for es¬ 
timating the price sensitivity of a security to 
changes in interest rates. 

• Modified duration and effective duration are 
two ways to measure the price sensitivity of a 
fixed income security. Both measure the per¬ 
centage price change of a security from an 
absolute change in yields. 

• There are important differences between ef¬ 
fective duration and modified duration and 
effective convexity and convexity. The differ¬ 
ences are due to changing cash flows of the 
security being evaluated. 

• The effective measures account for chang¬ 
ing cash flows and the traditional measures 
do not. The differences between the two are 
very significant whenever the cash flows are 
greatly affected by the level of interest rates. 
However, to properly compute the effective 
measures both an interest rate and a valua¬ 
tion model are required. Consequently, they 
are more computationally intensive than the 
traditional measures. 

• The effective and traditional measures are 
identical for option-free bonds. 


• Combining effective duration with effective convexity is a superior risk management and measurement approach to using modified duration and convexity.

• Investors would be best served by always us¬ 
ing the effective measures since they properly 
account for the cash flow characteristics of a 
security. 

NOTES 

1. For the impact of interest rate models on 
duration and convexity, see Buetow, Hanke, 
and Fabozzi (2001). 

2. For an illustration of how duration and con¬ 
vexity are computed for mortgage-backed 
securities, see Golub (2006) and Fabozzi 
(1999). 

3. Black, Derman, and Toy, 1990. 

4. Note that when calculating the measures, 
users are cautioned to not round values. 
Since the denominators of both the dura¬ 
tion and convexity terms are very small, any 
rounding will have a significant impact on 
results. 
Abstract: Duration is a useful metric for assessing a bond portfolio's sensitivity to a parallel shift in
the reference yield curve (e.g., the Treasury yield curve). When the yield curve shift is not parallel, 
however, two bond portfolios with the same duration will not generally experience the same return 
performance. To evaluate differences in expected performance across portfolios, it is therefore 
necessary to quantify the price impact due to changes in the shape, as opposed to a parallel shift, of 
the yield curve. The risk exposure of a portfolio to changes in the yield curve is called yield curve 
risk. Several approaches have been suggested for measuring yield curve risk. 


Duration and convexity are useful measures for 
approximating how the value of a bond port¬ 
folio or a bond index will change for a paral¬ 
lel shift in interest rates. Yet, empirically, both 
published studies 1 and proprietary studies by 
asset management firms have found that yield 
curve changes are not parallel. The exposure of 
a bond portfolio or a bond index to changes in 
the shape of the yield curve is called yield curve 
risk. 

There are several approaches for measuring 
yield curve risk. In this entry, we describe some 
of the more common approaches: cash-flow dis¬ 
tribution analysis versus a benchmark, key rate 
duration, slope elasticity measure, yield curve 
reshaping duration, and analysis of likely shifts 
in the yield curve. We begin the entry with an 
illustration of the drawback of using duration 


and convexity measures when the yield curve 
does not shift in a parallel fashion. 

DURATION, CONVEXITY, AND NONPARALLEL YIELD CURVE SHIFTS

To illustrate the limitations of duration and 
convexity, let's first look at how two portfolios 
consisting of hypothetical Treasury securities 
with the same portfolio duration will perform if 
the yield curve does not shift in a parallel fash¬ 
ion. Consider the three hypothetical Treasury 
securities shown in Table 1. Security A is the 
short-term Treasury, security B is the long-term 
Treasury, and security C is the intermediate-term Treasury. Each Treasury security is selling at par, and it is assumed that the next coupon payment is six months from now. The duration and convexity for each security are calculated in Table 1. Since all the securities are trading at par value, the durations and convexities are the dollar duration and dollar convexity per $100 of par value.

Table 1 Three Hypothetical Treasury Securities to Illustrate the Limitations of Duration and Convexity

Information on three Treasury securities:

Treasury    Coupon               Yield to        Maturity
Issue       Rate (%)    Price    Maturity (%)    (years)
A           6.5         100      6.5             5
B           8.0         100      8.0             20
C           7.5         100      7.5             10

Calculation of duration and convexity (shock rates by 10 basis points):

Treasury    Value if rate changes by
Issue       +10 bp      -10 bp       Duration    Convexity
A           99.5799     100.4222     4.21122     10.67912
B           99.0177     100.9970     9.89681     73.63737
C           99.3083     100.6979     6.49821     31.09724

Suppose that the following two Treasury port¬ 
folios are constructed. The first portfolio con¬ 
sists of only security C, the 10-year issue, and 
shall be referred to as the "bullet portfolio." The 
second portfolio consists of 51.86% of security 
A and 48.14% of security B, and this portfolio 
shall be referred to as the "barbell portfolio." 

The dollar duration of the bullet portfolio is 
6.49821. Recall that dollar duration is a measure 
of the dollar price sensitivity of a security or a 
portfolio. The dollar duration of the barbell is 
the weighted average of the dollar duration of 
the two Treasury securities in the portfolio and 
is computed below: 

0.5186(4.21122) + 0.4814(9.89681) = 6.94821 

The dollar duration of the barbell is equal to 
the dollar duration of the bullet. In fact, the 
barbell portfolio was designed to produce this 
result. 

Duration is just a first approximation of the change in price resulting from a change in interest rates. The convexity measure provides
a second approximation. The dollar convexity 
measure of the two portfolios is not equal. The 
dollar convexity measure of the bullet portfolio 
is 31.09724. The dollar convexity measure of the 
barbell is a weighted average of the dollar con¬ 
vexity measure of the two Treasury securities in 
the portfolio. That is, 

0.5186(10.67912) + 0.4814(73.63737) = 40.98658 

Thus, the bullet has a dollar convexity mea¬ 
sure that is less than that of the barbell portfolio. 
Below is a summary of the dollar duration and 
dollar convexity of the two portfolios: 


Parameter           Bullet Portfolio    Barbell Portfolio
Dollar duration     6.49821             6.49821
Dollar convexity    31.09724            40.98658


The better Treasury portfolio depends on the portfolio manager's investment objectives and investment horizon. Let's assume a six-month investment horizon.

Table 2 Performance of Bullet and Barbell Treasury Portfolios over a Six-Month Horizon Assuming a Parallel Yield Curve Shift: Scenario Analysis

                      Total Return (%)
Yield Change      Bullet       Barbell
(in bps)          Portfolio    Portfolio    Difference (a)
-300               53.47        55.79        -2.32
-250               44.95        46.38        -1.43
-200               36.79        37.55        -0.76
-150               28.99        29.26        -0.27
-100               21.51        21.47         0.05
-50                14.35        14.13         0.22
-25                10.89        10.63         0.26
0                   7.50         7.22         0.28
25                  4.18         3.92         0.27
50                  0.93         0.70         0.23
100                -5.36        -5.45         0.09
150               -11.39       -11.28        -0.11
200               -17.17       -16.79        -0.38
250               -22.71       -22.01        -0.70
300               -28.03       -26.96        -1.06

(a) A positive sign indicates that the bullet portfolio outperformed the barbell portfolio; a negative sign indicates that the barbell portfolio outperformed the bullet portfolio.

The last column of Table 2 shows the difference in the total return over a six-month investment horizon for the two Treasury portfolios, assuming that the yield curve shifts in a "parallel" fashion. By parallel it is meant that the yield for the short-term security (A), the intermediate-term security (C), and the long-term security (B) changes by the same number of basis points, shown in the first column of the table. The difference reported in the last column of Table 2 is:

Bullet portfolio's total return - Barbell portfolio's total return

Thus, a positive value in the last column means that the bullet portfolio outperformed the barbell portfolio, while a negative sign means that the barbell portfolio outperformed the bullet portfolio. Note that no assumption is needed for the reinvestment rate since the three securities comprising the portfolios are assumed to be trading right after a coupon payment has been made and therefore there is no accrued interest.

Table 3 Performance of Bullet and Barbell Treasury Portfolios over a Six-Month Horizon Assuming a Flattening of the Yield Curve: Scenario Analysis

                      Total Return (%)
Yield Change      Bullet       Barbell
for C (in bps)    Portfolio    Portfolio    Difference (a)
-300               53.47        58.98        -5.51
-250               44.95        49.26        -4.31
-200               36.79        40.15        -3.36
-150               28.99        31.60        -2.62
-100               21.51        23.58        -2.06
-50                14.35        16.03        -1.67
-25                10.89        12.42        -1.53
0                   7.50         8.92        -1.42
25                  4.18         5.53        -1.35
50                  0.93         2.23        -1.30
100                -5.36        -4.09        -1.27
150               -11.39       -10.06        -1.33
200               -17.17       -15.70        -1.47
250               -22.71       -21.04        -1.67
300               -28.03       -26.11        -1.92

Assumptions:
Change in yield of security C results in a change in the yield of security A plus 30 basis points.
Change in yield of security C results in a change in the yield of security B minus 30 basis points.

(a) A positive sign indicates that the bullet portfolio outperformed the barbell portfolio; a negative sign indicates that the barbell portfolio outperformed the bullet portfolio.

Which portfolio is the better investment alter¬ 
native if the yield curve shifts in a parallel fash¬ 
ion and the investment horizon is six months? 
The answer depends on the amount by which 
yields change. Notice in the last column that 
if yields change by less than 100 basis points, 
the bullet portfolio will outperform the barbell 
portfolio. The reverse is true if yields change by 
more than 100 basis points. 

Now let's look at what happens if the yield curve does not shift in a parallel fashion. The last columns of Tables 3 and 4 show the relative performance of the two Treasury portfolios for a nonparallel shift of the yield curve. Specifically, in Table 3 it is assumed that if the yield on C (the intermediate-term security) changes by the amount shown in the first column, A (the short-term security) will change by the same amount plus 30 basis points, whereas B (the long-term security) will change by the same amount shown in the first column less 30 basis points. That is, the nonparallel shift assumed is a flattening of the yield curve. For this yield curve shift, the barbell will outperform the bullet for the yield changes assumed in the first column. While not shown in the table, for changes greater than 300 basis points for C, the opposite would be true.

Table 4 Performance of Bullet and Barbell Treasury Portfolios over a Six-Month Horizon Assuming a Steepening of the Yield Curve: Scenario Analysis

                      Total Return (%)
Yield Change      Bullet       Barbell
for C (in bps)    Portfolio    Portfolio    Difference (a)
-300               53.47        52.82         0.65
-250               44.95        43.70         1.24
-200               36.79        35.14         1.65
-150               28.99        27.09         1.89
-100               21.51        19.52         1.99
-50                14.35        12.39         1.97
-25                10.89         8.98         1.91
0                   7.50         5.66         1.84
25                  4.18         2.44         1.74
50                  0.93        -0.69         1.63
100                -5.36        -6.70         1.34
150               -11.39       -12.38         0.99
200               -17.17       -17.77         0.60
250               -22.71       -22.88         0.17
300               -28.03       -27.73        -0.30

Assumptions:
Change in yield of security C results in a change in the yield of security A minus 30 basis points.
Change in yield of security C results in a change in the yield of security B plus 30 basis points.

(a) A positive sign indicates that the bullet portfolio outperformed the barbell portfolio; a negative sign indicates that the barbell portfolio outperformed the bullet portfolio.

In Table 4, the nonparallel shift assumes that 
for a change in C's yield, the yield on A will 
change by the same amount less 30 basis points, 
whereas the yield on B will change by the same 
amount plus 30 basis points. That is, it assumes 
that the yield curve will steepen. In this case, the 
bullet portfolio would outperform the barbell 
portfolio for all but a change in yield greater 
than 250 basis points for C. 

The key point here is that looking at duration 
or convexity tells us little about performance 
over some investment horizon because perfor¬ 
mance depends on the magnitude of the change 
in yields and how the yield curve shifts. 

CASH-FLOW DISTRIBUTION ANALYSIS VERSUS A BENCHMARK

The most straightforward approach to assess¬ 
ing a portfolio's risk exposure to yield curve 
shifts is by looking at the distribution of the 
present value of the cash flows for the port¬ 
folio being managed versus a benchmark. The 
benchmark will be either a bond index or a lia¬ 
bility structure. The steps are as follows: 

Step 1: Determine the discrete time periods for 
the analysis. The shortest and longest time is 
determined by the shortest and longest cash 
flows for the portfolio and the benchmark. 
Each time period is referred to as a cash-flow 
vertex. 


Step 2: Compute the cash flows for the port¬ 
folio and the benchmark for each cash-flow 
vertex. 

Step 3: Compute the present value of the cash 
flows for the portfolio and the benchmark 
for each cash-flow vertex. The spot rate used 
to compute the present value is the spot 
rate for the cash-flow vertex. For example, 
if the cash-flow vertex is year 5, the 5-year 
spot rate is used. 

Step 4: Compute the duration contribution at 
each cash flow vertex for the portfolio and 
the benchmark. 

Step 5: Compute the duration contribution as 
a percentage of duration for both the port¬ 
folio and the benchmark for each cash-flow 
vertex. 

Step 6: Compute the difference in the portfolio 
percentage and benchmark percentage com¬ 
puted in Step 5 for each cash-flow vertex. 
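Steps 3 through 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not define the duration contribution at a vertex, so the sketch assumes it is the vertex's present-value weight multiplied by the time to the vertex; it also assumes spot rates are bond-equivalent yields compounded semiannually. All names and input numbers are hypothetical.

```python
def duration_contribution_pct(cash_flows, spot_rates):
    """Return each vertex's share of total duration (Steps 3 to 5).

    cash_flows: {vertex in years: cash flow at that vertex}
    spot_rates: {vertex in years: annual spot rate as a decimal}
    """
    # Step 3: discount each vertex cash flow at that vertex's own spot rate.
    pv = {t: cf / (1.0 + spot_rates[t] / 2.0) ** (2.0 * t)
          for t, cf in cash_flows.items()}
    total_pv = sum(pv.values())
    # Step 4 (assumed definition): PV weight times time to the vertex.
    contribution = {t: (value / total_pv) * t for t, value in pv.items()}
    duration = sum(contribution.values())
    # Step 5: contribution as a fraction of total duration.
    return {t: c / duration for t, c in contribution.items()}


def vertex_mismatch(portfolio_cf, benchmark_cf, spot_rates):
    """Step 6: portfolio share minus benchmark share at each vertex."""
    port = duration_contribution_pct(portfolio_cf, spot_rates)
    bench = duration_contribution_pct(benchmark_cf, spot_rates)
    vertices = sorted(set(port) | set(bench))
    return {t: port.get(t, 0.0) - bench.get(t, 0.0) for t in vertices}


# Hypothetical inputs: a barbell-like portfolio against a bullet-like benchmark.
spot = {1: 0.05, 5: 0.06, 10: 0.07}
portfolio = {1: 40.0, 5: 10.0, 10: 80.0}
benchmark = {1: 20.0, 5: 70.0, 10: 40.0}

mismatch = vertex_mismatch(portfolio, benchmark, spot)
for vertex, gap in mismatch.items():
    print(f"{vertex:>2}-year vertex: {gap:+.3f}")
```

A positive gap at a vertex indicates that the portfolio carries relatively more of its duration there than the benchmark does; the gaps sum to zero by construction.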

In practice, the application is not straightfor¬ 
ward because of the inclusion of bonds with 
embedded options and mortgage-backed and 
asset-backed securities. Suppose a bond is a 7- 
year bond that is callable in three years. The 
cash flows for this bond depend on the portfo¬ 
lio manager's assessment of the probability that 
it will be called in three years. For mortgage- 
backed and asset-backed securities, the cash 
flows depend on the prepayment assumption. 

Another difficulty in the implementation process is the allocation of cash flows to the cash-flow vertices when a cash flow is not exactly on a cash-flow vertex date. For example, consider a bond whose coupon payment of $1 million is to be received 4.75 years from now, and suppose that there are 4-year and 5-year cash-flow vertices. How should the $1 million coupon payment be allocated? The procedure would be to allocate it in proportion to its proximity to each vertex: 25% to the 4-year cash-flow vertex and 75% to the 5-year cash-flow vertex.
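The 25%/75% split in this example is what a linear allocation by proximity produces. A minimal sketch of that rule follows; the function name is hypothetical, and other allocation schemes are possible.

```python
def allocate_to_vertices(amount, time, left_vertex, right_vertex):
    """Split a cash flow linearly between the two vertices that bracket it."""
    if not left_vertex <= time <= right_vertex:
        raise ValueError("time must lie between the two vertices")
    # The closer the cash flow is to a vertex, the larger that vertex's share.
    right_weight = (time - left_vertex) / (right_vertex - left_vertex)
    return amount * (1.0 - right_weight), amount * right_weight


# The example from the text: $1 million received 4.75 years from now.
to_4y, to_5y = allocate_to_vertices(1_000_000, 4.75, 4, 5)
print(f"4-year vertex: ${to_4y:,.0f}")
print(f"5-year vertex: ${to_5y:,.0f}")
```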

Despite its simplicity, the cash-flow distribu¬ 
tion analysis is commonly used as a measure of 
yield curve risk for index fund managers (see 
Volpert, 2000). 
KEY RATE DURATION

One approach to measure yield curve risk is 
to change the yield for a particular maturity 
of the yield curve and determine the sensitiv¬ 
ity of a security or portfolio to this change, 
holding all other yields constant. The sensi¬ 
tivity of the change in value to a particular 
change in yield is called rate duration. There 
is a rate duration for every point on the yield 
curve. Consequently, there is not one rate du¬ 
ration, but a vector of durations representing 
each maturity on the yield curve. The total 
change in value if all rates change by the same 
number of basis points is simply the duration 
of a security or portfolio to a parallel shift 
in rates. 

This approach was first suggested by Cham¬ 
bers and Carleton (1988), who called it duration 
vectors. Reitano (1992) suggested a similar ap¬ 
proach and referred to these durations as par¬ 
tial durations. The most popular version of this 
approach is that developed by Ho (1992). This 
approach examines how changes in Treasury 
yields at different points on the spot curve af¬ 
fect the value of a bond portfolio. Ho's method¬ 
ology has three basic steps. The first step is to 
select several key maturities or "key rates" of 
the spot rate curve. Ho's approach focuses on 
11 key maturities on the spot rate curve. These 
rate durations are called key rate durations. The 
specific maturities on the spot rate curve for 
which a key rate duration is measured are 
3 months, 1 year, 2 years, 3 years, 5 years, 
7 years, 10 years, 15 years, 20 years, 25 years, 
and 30 years. However, in order to illustrate 
Ho's methodology, we will select only three key 
rates: 1 year, 10 years, and 30 years. 

The next step is to specify how other rates on the spot curve change in response to key rate changes. Ho's rule is that a key rate's effect on neighboring rates declines linearly and reaches zero at the adjacent key rates. For example, suppose the 10-year key rate increases by 40 basis points. All spot rates between 10 years and 30 years will increase, but the amount each changes will be different and the magnitude of the change diminishes linearly. Specifically, there are 40 semiannual periods between 10 and 30 years. Each spot rate starting with 10.5 years increases by 1 basis point less than the spot rate to its immediate left (that is, 39 basis points for the 10.5-year rate), and so forth. The 30-year rate, which is the adjacent key rate, is assumed to be unchanged. Thus, only one key rate changes at a time. Spot
rates between 1 year and 10 years change in an 
analogous manner such that all rates change 
but by differing amounts. Changes in the 
1-year key rate affect spot rates between 1 and 
10 years, while spot rates 10 years and beyond 
are assumed to be unaffected by changes in the 
1-year spot rate. In a similar vein, changes in 
the 30-year key rate affect all spot rates be¬ 
tween 30 years and 10 years while spot rates 
shorter than 10 years are assumed to be un¬ 
affected by changes in the 30-year rate. This 
process is illustrated in Figure 1. Note that if 
we add the three rate changes together, we 
obtain a parallel yield curve shift of 40 basis 
points. 
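The linear-decline rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `key_rate_shift` is hypothetical, and maturities outside the range spanned by the key rates are simply left unshifted, a case the text does not cover.

```python
def key_rate_shift(maturity, key, keys, shift_bp):
    """Shift (in basis points) applied to the spot rate at `maturity`
    when the key rate `key` moves by `shift_bp`.

    The effect declines linearly and reaches zero at the adjacent key rates.
    Assumption: maturities outside the neighbouring key rates get no shift.
    """
    keys = sorted(keys)
    i = keys.index(key)
    if maturity == key:
        return float(shift_bp)
    # Between the left neighbour and the shifted key rate
    if i > 0 and keys[i - 1] < maturity < key:
        return shift_bp * (maturity - keys[i - 1]) / (key - keys[i - 1])
    # Between the shifted key rate and the right neighbour
    if i < len(keys) - 1 and key < maturity < keys[i + 1]:
        return shift_bp * (keys[i + 1] - maturity) / (keys[i + 1] - key)
    return 0.0


keys = [1, 10, 30]                                 # the three key rates of the illustration
maturities = [1 + 0.5 * n for n in range(59)]      # 1, 1.5, ..., 30 years

# Shift of the 10.5-year spot rate when the 10-year key rate rises by 40 bp
shift_10_5 = key_rate_shift(10.5, 10, keys, 40)

# Adding the three key rate shifts gives the total shift at each maturity
totals = [sum(key_rate_shift(m, k, keys, 40) for k in keys) for m in maturities]
print(shift_10_5, min(totals), max(totals))
```

Between 1 and 30 years the three shifts add up to the same 40 basis points at every maturity, which is the parallel shift noted in the text.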

The third and final step is to calculate the per¬ 
centage change in the bond's portfolio value 
when each key rate and neighboring spot rates 
are changed. There will be as many key rate du¬ 
rations as there are preselected key rates. Let's 
illustrate this process by calculating the key rate 
duration for a coupon bond. Our hypotheti¬ 
cal 6% coupon bond has a maturity value of 
$100 and matures in five years. The bond de¬ 
livers coupon payments semiannually. Valua¬ 
tion is accomplished by discounting each cash 
flow using the appropriate spot rate. The bond's 
current value is $107.32 and the process is illus¬ 
trated in Table 5. The initial hypothetical (and 
short) spot curve is contained in column (3). 
(Note that the spot rates are annual rates and 
are reported as bond-equivalent yields. When 
present values are computed, we use the ap¬ 
propriate semiannual rates that are taken to be 
one half the annual rate.) The present values of 
each of the bond's cash flows are presented in 
the last column. 
Figure 1 Graph of How Spot Rates Change when Key Rates Change

Table 5 Valuation of 5-Year 6% Coupon Bond Using Spot Rates

Years   Period   Spot Rate       Cash Flow       Present Value
                 (in percent)    (in dollars)    (in dollars)
0.5     1        3.00            3.0             2.96
1.0     2        3.25            3.0             2.90
1.5     3        3.50            3.0             2.85
2.0     4        3.75            3.0             2.79
2.5     5        4.00            3.0             2.72
3.0     6        4.10            3.0             2.66
3.5     7        4.20            3.0             2.59
4.0     8        4.30            3.0             2.53
4.5     9        4.35            3.0             2.47
5.0     10       4.40            103.0           82.86
Total                                            107.32
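The valuation in Table 5 can be reproduced with a short Python sketch. The variable and function names are illustrative only; the spot rates and cash flows are those of the table, and each cash flow is discounted at one half of the annual bond-equivalent spot rate, as the text describes.

```python
# Spot rates from Table 5 (annual, bond-equivalent, in percent), one per semiannual period
spot_rates_pct = [3.00, 3.25, 3.50, 3.75, 4.00, 4.10, 4.20, 4.30, 4.35, 4.40]
# 6% coupon paid semiannually on $100 of maturity value, principal repaid in period 10
cash_flows = [3.0] * 9 + [103.0]


def present_values(spot_rates_pct, cash_flows):
    """Discount each cash flow at its own semiannual spot rate."""
    pvs = []
    for period, (rate_pct, cf) in enumerate(zip(spot_rates_pct, cash_flows), start=1):
        semiannual_rate = rate_pct / 100 / 2      # one half of the annual rate
        pvs.append(cf / (1 + semiannual_rate) ** period)
    return pvs


pvs = present_values(spot_rates_pct, cash_flows)
bond_value = sum(pvs)
print([round(pv, 2) for pv in pvs])
print(round(bond_value, 2))
```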


To compute the key rate duration of the 5- 
year bond, we must select some key rates. We 
assume the key rates are 0.5, 3, and 5 years. To 
compute the 0.5-year key rate duration, we shift 
the 0.5-year rate upwards by 20 basis points 
and adjust the neighboring spot rates between 
0.5 and 3 years as described earlier. (The choice 
of 20 basis points is arbitrary.) Figure 2 shows 
the initial spot curve and the spot curve after 
the 0.5-year key rate and neighboring rates are 
shifted. The next step is to compute the bond's 
new value as a result of the shift. This calculation
is shown in Table 6. The bond's value after
the shift is $107.30. To estimate the 0.5-year key
rate duration, we divide the percentage change 
in the bond's price as a result of the shift in 
the spot curve by the change in the 0.5-year 
key rate. Accordingly, we employ the following 
formula: 

\[ \text{Key rate duration} = \frac{P_0 - P_1}{P_0\,(\Delta y)} \]

where

\(P_0\) = the bond's value using the initial spot curve

\(P_1\) = the bond's value after the shift in the spot curve

\(\Delta y\) = shift in the key rate (in decimal)

Substituting in numbers from our illustration 
presented above, we can compute the 0.5-year 


Table 6 Valuation of the 5-Year 6% Coupon Bond after 0.5-Year Key Rate and Neighboring Spot Rates Change

Years   Period   Spot Rate       Cash Flow       Present Value
                 (in percent)    (in dollars)    (in dollars)
0.5     1        3.20            3.0             2.95
1.0     2        3.41            3.0             2.90
1.5     3        3.62            3.0             2.84
2.0     4        3.83            3.0             2.78
2.5     5        4.04            3.0             2.71
3.0     6        4.10            3.0             2.66
3.5     7        4.20            3.0             2.59
4.0     8        4.30            3.0             2.53
4.5     9        4.35            3.0             2.47
5.0     10       4.40            103.0           82.86
Total                                            107.30



















Figure 2 Graph of the Initial Spot Curve and the Spot Curve after the 0.5-Year Key Rate Shift

To compute the 3-year key rate duration, we 
repeat this process. We shift the 3-year rate by 
20 basis points and adjust the neighboring spot 
rates as described earlier. Figure 3 shows the 
initial spot curve and the spot curve after the 3- 
year key rate and neighboring rates are shifted. 
Note that in this case the only two spot rates 
that do not change are the 0.5-year and the 


Figure 3 Graph of the Initial Spot Curve and the Spot Curve after the 3-Year Key Rate Shift (horizontal axis: maturity in years)



key rate duration as follows:

\(P_0 = 107.32\)
\(P_1 = 107.30\)
\(\Delta y = 0.002\)

\[ \text{0.5-year key rate duration} = \frac{107.32 - 107.30}{107.32\,(0.002)} = 0.0932 \]







Figure 4 Graph of the Initial Spot Curve and the Spot Curve after the 5-Year Key Rate Shift



5-year key rates. Then, we compute the bond's 
new value as a result of the shift. The bond's 
postshift value is $107.25 and the calculation 
appears in Table 7. Accordingly, the 3-year key 
rate duration is computed as follows: 

\[ \text{3-year key rate duration} = \frac{107.32 - 107.25}{107.32\,(0.002)} = 0.3261 \]

The final step is to compute the 5-year key rate
duration. We shift the 5-year rate by 20 basis points


Table 7 Valuation of the 5-Year 6% Coupon Bond after 3-Year Key Rate and Neighboring Spot Rates Change

Years   Period   Spot Rate       Cash Flow       Present Value
                 (in percent)    (in dollars)    (in dollars)
0.5     1        3.00            3.0             2.96
1.0     2        3.29            3.0             2.90
1.5     3        3.58            3.0             2.84
2.0     4        3.87            3.0             2.78
2.5     5        4.16            3.0             2.71
3.0     6        4.30            3.0             2.64
3.5     7        4.35            3.0             2.58
4.0     8        4.40            3.0             2.52
4.5     9        4.40            3.0             2.47
5.0     10       4.40            103.0           82.86
Total                                            107.25


and adjust the neighboring spot rates. Figure 4 
presents a graph of the initial spot curve and the 
spot curve after the 5-year key rate and neigh¬ 
boring rates are shifted. The bond's postshift 
value is $106.48 and the calculation appears in 
Table 8. Accordingly the 5-year key rate dura¬ 
tion is computed as follows: 

\[ \text{5-year key rate duration} = \frac{107.32 - 106.48}{107.32\,(0.002)} = 3.9135 \]


Table 8 Valuation of the 5-Year 6% Coupon Bond after 5-Year Key Rate and Neighboring Spot Rates Change

Years   Period   Spot Rate       Cash Flow       Present Value
                 (in percent)    (in dollars)    (in dollars)
0.5     1        3.00            3.0             2.96
1.0     2        3.25            3.0             2.90
1.5     3        3.50            3.0             2.85
2.0     4        3.75            3.0             2.79
2.5     5        4.00            3.0             2.72
3.0     6        4.10            3.0             2.66
3.5     7        4.25            3.0             2.59
4.0     8        4.40            3.0             2.52
4.5     9        4.50            3.0             2.46
5.0     10       4.60            103.0           82.05
Total                                            106.48
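The three key rate duration calculations can be collected into one Python sketch. The helper names (`price`, `shift_profile`, `shifted_curve`, `key_rate_duration`) are illustrative, and the sketch works with unrounded prices, so its results (roughly 0.09, 0.31, and 3.90) differ slightly from the figures in the text, which are computed from prices rounded to the cent.

```python
times = [0.5 * n for n in range(1, 11)]                      # 0.5, 1.0, ..., 5.0 years
base_pct = [3.00, 3.25, 3.50, 3.75, 4.00, 4.10, 4.20, 4.30, 4.35, 4.40]
cash_flows = [3.0] * 9 + [103.0]
key_maturities = [0.5, 3.0, 5.0]


def price(spot_pct):
    """Bond value: each cash flow discounted at half its annual spot rate."""
    return sum(cf / (1 + r / 200) ** n
               for n, (r, cf) in enumerate(zip(spot_pct, cash_flows), start=1))


def shift_profile(t, key, keys, shift_bp):
    """Linearly declining shift (bp) at maturity t when `key` moves by shift_bp."""
    i = keys.index(key)
    if t == key:
        return shift_bp
    if i > 0 and keys[i - 1] < t < key:
        return shift_bp * (t - keys[i - 1]) / (key - keys[i - 1])
    if i < len(keys) - 1 and key < t < keys[i + 1]:
        return shift_bp * (keys[i + 1] - t) / (keys[i + 1] - key)
    return 0.0


def shifted_curve(key, shift_bp=20.0):
    """Spot curve (in percent) after shifting one key rate and its neighbours."""
    return [r + shift_profile(t, key, key_maturities, shift_bp) / 100
            for t, r in zip(times, base_pct)]


def key_rate_duration(key, shift_bp=20.0):
    p0 = price(base_pct)
    p1 = price(shifted_curve(key, shift_bp))
    return (p0 - p1) / (p0 * shift_bp / 10_000)               # shift in decimal


krds = {key: key_rate_duration(key) for key in key_maturities}
parallel = (price(base_pct) - price([r + 0.20 for r in base_pct])) / (price(base_pct) * 0.002)
print(krds, sum(krds.values()), parallel)
```

The last line compares the sum of the three key rate durations with the sensitivity to a parallel 20-basis-point shift of the whole spot curve; the two are nearly identical because the three shift profiles add up to a parallel shift.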
What information can be gleaned from these
key rate durations? Each key rate duration by 
itself means relatively little. However, the dis¬ 
tribution of the bond's key rate durations helps 
us assess its exposure to yield curve risk. In¬ 
tuitively, the sum of the key rate durations is 
approximately equal to a bond's duration. (The 
reason it is only approximate is that modified 
duration assumes a flat yield curve, whereas 
key rate duration takes the spot curve as given.) 

As a result, it is useful to think of a set of key 
rate durations as a decomposition of duration 
into sensitivities to various portions of the yield 
curve. In our illustration, it is not surprising that
the lion's share of the coupon bond's yield curve
risk exposure is due to
the bond's terminal cash flow, so the 5-year key
rate duration is the largest of the three. Simply
put, the 5-year bond's value is more sensitive to 
movements in longer spot rates and less sensi¬ 
tive to movements in shorter spot rates. 

Key rate durations are most useful when com¬ 
paring two (or more) bond portfolios that have 
approximately the same duration. If the spot 
curve is flat and experiences a parallel shift, 
these two bond portfolios can be expected to 
experience approximately the same percentage 
change in value. However, the performance of 
the two portfolios will generally not be the same 
for a nonparallel shift in the spot curve. The key 
rate duration profile of each portfolio will give 
the portfolio manager some clues about the rel¬ 
ative performance of the two portfolios when 
the yield curve changes shape and slope. 

SLOPE ELASTICITY 
MEASURE 

The slope elasticity measure, introduced by 
Schumacher, Dektar, and Fabozzi (1994) for 
managing the yield curve risk of portfolios of 
collateralized mortgage obligation bonds, also 
looks at the sensitivity of a position or portfo¬ 
lio to changes in the slope of the yield curve. 
They define the yield curve slope as the spread
between the 30-year on-the-run Treasury yield
and the 3-month Treasury bill yield (that is,
basically the longest and the shortest points on the
Treasury yield curve).

They find that while this is not a perfect defi¬ 
nition, it captures most of the effect of changes 
in yield curve slope. They then define changes 
in the yield curve as follows: Half of any ba¬ 
sis point change in the yield curve slope results 
from a change in the 3-month yield and half 
from a change in the 30-year yield. For example, 
with a 200-basis-point steepening of the yield 
curve, the assumption is that 100 basis points of 
that steepening come from a rise in the 30-year 
yield, and another 100 basis points come from 
a fall in the 3-month yield. 

The sensitivity of a bond's price to changes 
in the yield curve is simply its slope elasticity. 
They define slope elasticity as the approximate 
negative percentage change in a bond's price 
resulting from a 100-basis-point change in the 
slope of the curve. Slope elasticity is calculated 
as follows: Increase and decrease the yield curve 
slope, calculate the price change for these two 
scenarios after adjusting for the price effect of a 
change in the level of yields, and compare the 
prices to the initial price. More specifically, the 
slope elasticity for each scenario is calculated as 
follows: 

\[ \frac{\text{Price effect of a change in slope} / \text{Base price}}{\text{Change in yield curve slope}} \]

The slope elasticity is then the average of the 
slope elasticity for the two scenarios. 

A bond or bond portfolio that benefits when 
the yield curve flattens is said to have positive 
slope elasticity; a bond or a bond portfolio that 
benefits when the yield curve steepens is said 
to have negative slope elasticity. The definition 
of yield curve risk follows from that of slope 
elasticity. It is defined as the exposure of the 
bond to changes in the slope of the yield curve. 

YIELD CURVE RESHAPING 
DURATION 

Yield curve reshaping duration, introduced by 
Klaffky, Ma, and Nozari (1992), focuses on three 
points on the yield curve: 2-year, 10-year, and
30-year, and the spread between the 10-year and 
2-year issues and the spread between the 30- 
year and 10-year issues. The former spread is 
referred to as the short end of the yield curve, 
and the latter spread the long end of the yield 
curve. Klaffky, Ma, and Nozari refer to the sen¬ 
sitivity of a portfolio to changes in the short end 
of the yield curve as short-end duration (SEDUR) 
and to changes in the long end of the yield curve 
as long-end duration (LEDUR). These concepts, 
however, are applicable to other points on the 
yield curve. 

To calculate the SEDUR of each security in 
the portfolio, the percentage change in the se¬ 
curity's price is calculated for (1) a steepening 
of the yield curve at the short end by 50 basis 
points, and (2) a flattening of the yield curve 
at the short end of the yield curve by 50 basis 
points. Then the security's SEDUR is computed 
as follows: 


\[ \mathrm{SEDUR} = \frac{P_s - P_f}{2 P_0\,(\Delta y)} \]

where

\(P_s\) = security's price if the short end of the
yield curve steepens by 50 basis points
\(P_f\) = security's price if the short end of the
yield curve flattens by 50 basis points
\(P_0\) = security's current market price
\(\Delta y\) = number of basis points by which the
yield curve is changed


To calculate the LEDUR, the same procedure 
is used for each security in the portfolio: Cal¬ 
culate the price for (1) a flattening of the yield 
curve at the long end by 50 basis points, and (2) 
a steepening of the yield curve at the long end 
of the yield curve by 50 basis points. Then the 
security's LEDUR is computed as follows: 


\[ \mathrm{LEDUR} = \frac{P_f - P_s}{2 P_0\,(\Delta y)} \]


For an illustration, see Fabozzi (1999). 
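The two formulas translate directly into code. The sketch below is only an illustration: the function names are hypothetical, the prices are made-up inputs rather than the output of any pricing model, and the 50-basis-point change is entered in decimal form (0.005), which is an assumption about the units of the change in yield.

```python
def sedur(p_steepen, p_flatten, p0, dy):
    """Short-end duration: (Ps - Pf) / (2 * P0 * dy)."""
    return (p_steepen - p_flatten) / (2 * p0 * dy)


def ledur(p_flatten, p_steepen, p0, dy):
    """Long-end duration: (Pf - Ps) / (2 * P0 * dy)."""
    return (p_flatten - p_steepen) / (2 * p0 * dy)


# Hypothetical prices for a security currently priced at 100
dy = 0.005                       # 50 basis points, in decimal (assumed convention)
short_end = sedur(p_steepen=99.0, p_flatten=101.0, p0=100.0, dy=dy)
long_end = ledur(p_flatten=101.0, p_steepen=99.0, p0=100.0, dy=dy)
print(short_end, long_end)
```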


ANALYSIS OF LIKELY YIELD 
CURVE SHIFTS 

While key rate duration is a useful measure 
for identifying the exposure of a portfolio to 
different potential shifts in the yield curve, it 
is difficult to employ this approach to yield 
curve risk in hedging a portfolio. An alterna¬ 
tive approach is to investigate how yield curves 
have changed historically and incorporate typ¬ 
ical yield curve change scenarios into the hedg¬ 
ing process. This approach of using likely yield 
curve changes obtained from principal compo¬ 
nent analysis has been suggested by Richard 
and Gord (1997), Golub and Tilman (1997), and 
Axel and Vankudre (2000). 

Empirically, studies have found that yield
curve changes are not parallel. Rather, when
the level of interest rates changes,
short-term rates move more than
longer-term rates. Some firms develop their
own proprietary models that decompose his¬ 
torical movements in the rate changes of Trea¬ 
sury strips with different maturities in order to 
analyze typical or likely rate movements. The 
statistical technique used to decompose rate 
movements is principal component analysis. 


KEY POINTS 

• When using a portfolio's duration and con¬ 
vexity to measure the exposure to interest 
rates, it is assumed that the yield curve shifts 
in a parallel fashion. 

• For a nonparallel shift in the yield curve, du¬ 
ration and convexity may not provide ade¬ 
quate information about the risk exposure to 
changes in interest rates. 

• Yield curve risk is the exposure of a portfolio 
to a change in the shape of the yield curve. 
There are several approaches that have been 
proposed for measuring a portfolio's yield 
curve risk. 

• A simple approach to measuring yield curve 
risk, an approach commonly used by in¬ 
dex managers, is an analysis of the cash 
flow distribution of a portfolio relative to a
benchmark. 

• Key rate duration measures how changes in 
Treasury yields at different points on the spot 
rate curve affect the value of a bond. 

• Slope elasticity looks at the sensitivity of a po¬ 
sition or portfolio to changes in the slope of 
the yield curve and is defined as the approxi¬ 
mate negative percentage change in a bond's 
price resulting from a 100-basis-point change 
in the slope of the curve. 

• Yield curve reshaping duration decomposes 
the yield curve into a short end and a long 
end. The sensitivity of a portfolio to changes 
in the short end of the yield curve is called 
short-end duration (SEDUR) and to changes 
in the long end of the yield curve is called 
long-end duration (LEDUR). 

• Using principal component analysis, a portfo¬ 
lio manager can determine likely yield curve 
shifts and use those shifts to assess the expo¬ 
sure of a portfolio to yield curve risk. 

NOTE 

1. See, e.g., Litterman and Scheinkman (1991) 
and Jones (1991). 
Abstract: A risk measure that has been widely accepted since the 1990s is the value-at-risk (VaR). In
the late 1980s, it was integrated by JP Morgan on a firmwide level into its risk-management system. 
In the mid-1990s, the VaR measure was approved by regulators as a valid approach to calculating 
capital reserves needed to cover market risk. The Basel Committee on Banking Supervision released 
a package of amendments to the requirements for banking institutions, allowing them to use their 
own internal systems for risk estimation. In this way, capital reserves, which financial institutions 
are required to keep, could be based on the VaR numbers computed internally by an in-house risk 
management system. Generally, regulators demand that the capital reserve equal the VaR number 
multiplied by a factor between 3 and 4. Thus, regulators link the capital reserves for market risk 
directly to the risk measure. In practice, there are several approaches for estimating VaR. 


In this entry, we cover the risk measure most
commonly used by financial institutions:
value-at-risk (VaR). We comment on its
properties and different calculation methods. 
Where possible, the definitions and equations 
are geometrically interpreted, making the ideas 
more intuitive and understandable. 


VALUE-AT-RISK DEFINED 

VaR is defined as the minimum level of loss at
a given, sufficiently high, confidence level for
a predefined time horizon. The recommended
confidence levels are 95% and 99%. Suppose 
that we hold a portfolio with a 1-day 99% VaR 
equal to $1 million. This means that over the 
horizon of 1 day, the portfolio may lose more 
than $1 million with probability equal to 1%. 

The same example can be constructed for per¬ 
centage returns. Suppose that the present value 
of a portfolio we hold is $10 million. If the 1- 
day 99% VaR of the return distribution is 2%, 
then over the time horizon of 1 day, we lose 
more than 2% ($200,000) of the portfolio present 
value with probability equal to 1%. 
Denote by \((1-\epsilon)100\%\) the confidence level
parameter of the VaR. As we explained, losses
larger than the VaR occur with probability \(\epsilon\).
We call the probability \(\epsilon\) the tail probability. Depending
on the interpretation of the random
variable, VaR can be defined in different ways.
Formally, the VaR at confidence level \((1-\epsilon)100\%\)
(tail probability \(\epsilon\)) is defined as the
negative of the lower \(\epsilon\)-quantile of the return
distribution,

\[ \mathrm{VaR}_\epsilon(X) = -\inf\{x \mid P(X \le x) \ge \epsilon\} = -F_X^{-1}(\epsilon) \tag{1} \]

where \(\epsilon \in (0,1)\) and \(F_X^{-1}(\epsilon)\) is the inverse of the
distribution function. If the random variable X
describes random returns, then the VaR number
is given in terms of a return figure. The definition
of VaR is illustrated in Figure 1.

If X describes random payoffs, then VaR is
a threshold in dollar terms below which the
portfolio value falls with probability \(\epsilon\),

\[ \mathrm{VaR}_\epsilon(X) = \inf\{x \mid P(X \le x) \ge \epsilon\} = F_X^{-1}(\epsilon) \tag{2} \]

where \(\epsilon \in (0,1)\) and \(F_X^{-1}(\epsilon)\) is the inverse of the
distribution function of the random payoff. VaR



Figure 1 The VaR at 95% Confidence Level of a
Random Variable X

Note: The top plot shows the density of X, the
marked area equals the tail probability, and the
bottom plot shows the distribution function.


can also be expressed as a distance to the present
value when considering the profit distribution.
The random profit is defined as \(X - P_0\) where
X is the payoff and \(P_0\) is the present value. The
VaR of the random profit equals

\[ \mathrm{VaR}_\epsilon(X - P_0) = -\inf\{x \mid P(X - P_0 \le x) \ge \epsilon\} = P_0 - \mathrm{VaR}_\epsilon(X) \]

in which \(\mathrm{VaR}_\epsilon(X)\) is defined according to (2)
since X is interpreted as a random payoff. In this
case, the definition of VaR is essentially given
by equation (1).

According to the definition in equation (1),
VaR may become a negative number. If \(\mathrm{VaR}_\epsilon(X)\)
is a negative number, then this means that at
tail probability \(\epsilon\) we do not observe losses but
profits. Losses happen with even smaller probability
than \(\epsilon\). If for any tail probability \(\mathrm{VaR}_\epsilon(X)\)
is a negative number, then no losses can occur
and, therefore, the random variable X bears
no risk as no exposure is associated with it. In
this entry, we assume that random variables describe
either random returns or random profits
and we adopt the definition in equation (1).

We illustrate one aspect in which VaR differs 
from the deviation measures and all uncertainty 
measures. As a consequence of the definition, if 
we add to the random variable X a nonrandom 
profit C, the resulting VaR can be expressed by 
the VaR of the initial variable in the following 
way 

\[ \mathrm{VaR}_\epsilon(X + C) = \mathrm{VaR}_\epsilon(X) - C \tag{3} \]

Thus, adding a nonrandom profit decreases the
risk of the portfolio. Furthermore, scaling the return
distribution by a positive constant \(\lambda\) scales
the VaR by the same constant,

\[ \mathrm{VaR}_\epsilon(\lambda X) = \lambda\,\mathrm{VaR}_\epsilon(X) \tag{4} \]

It turns out that these properties characterize 
not only VaR. They are identified as key features 
of a risk measure. 

To illustrate, let's use an example. Suppose
initially we have a portfolio that consists of a
common stock with random monthly return
denoted by \(r_X\). We rebalance the portfolio so
that it becomes an equally weighted portfolio of
the stock and a default-free government bond
with a nonrandom monthly return of 5.26%,
\(r_B = 5.26\%\). Thus, the portfolio return can be
expressed as

\[ r_p = r_X (1/2) + r_B (1/2) = r_X/2 + 0.0526/2 \]

Using equations (3) and (4), we calculate that if
\(\mathrm{VaR}_\epsilon(r_X) = 12\%\), then \(\mathrm{VaR}_\epsilon(r_p) = 12\%/2 - 5.26\%/2 = 3.37\%\), which
is by far less than 6%, half of the initial risk.
Any deviation measure would indicate that the
dispersion (or the uncertainty) of the portfolio
return \(r_p\) would be twice as small as the uncertainty
of \(r_X\).
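Properties (3) and (4), and the arithmetic of the example, can be checked numerically. The sketch below is an illustration under stated assumptions: the return sample is simulated rather than observed, `empirical_var` is a hypothetical helper, and it relies on NumPy's interpolated sample quantile, which only approximates the infimum in equation (1).

```python
import numpy as np


def empirical_var(returns, eps):
    """Sample VaR: the negative of the lower eps-quantile of the returns."""
    return -np.quantile(returns, eps)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.05, size=10_000)     # hypothetical sample of returns
eps, c, lam = 0.05, 0.01, 0.5

# Property (3): adding a nonrandom profit c lowers the VaR by c
var_shifted = empirical_var(x + c, eps)
# Property (4): scaling by a positive constant lam scales the VaR by lam
var_scaled = empirical_var(lam * x, eps)
print(var_shifted, empirical_var(x, eps) - c)
print(var_scaled, lam * empirical_var(x, eps))

# Worked example from the text: half stock (VaR 12%), half bond returning 5.26%
var_stock = 0.12
var_portfolio = 0.5 * var_stock - 0.0526 / 2
print(round(var_portfolio, 4))
```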

A very important remark has to be made 
with respect to the performance of VaR and, 
as it turns out, of any other risk measure. It 
is heavily dependent on the assumed probabil¬ 
ity distribution of the variable X. An unrealis¬ 
tic hypothesis may result in underestimation or 
overestimation of true risk. If we use VaR to 
build reserves in order to cover losses in times 
of crises, then underestimation may be fatal and 
overestimation may lead to inefficient use of 
capital. An inaccurate model is even more dan¬ 
gerous in an optimal portfolio problem in which 
we minimize risk subject to some constraints, as 
it may adversely influence the optimal weights 
and therefore not reduce the true risk. 

Even though VaR has been largely adopted 
by financial institutions and approved by reg¬ 
ulators, it turns out that VaR has important 
deficiencies. While it provides an intuitive de¬ 
scription of how much a portfolio may lose, 
generally, it should be abandoned as a risk mea¬ 
sure. The most important drawback is that, in 
some cases, the reasonable diversification effect 
that every portfolio manager should expect to 
see in a risk measure is not present; that is, the 
VaR of a portfolio may be greater than the sum 
of the VaRs of the constituents 

\[ \mathrm{VaR}_\epsilon(X + Y) > \mathrm{VaR}_\epsilon(X) + \mathrm{VaR}_\epsilon(Y) \tag{5} \]

in which X and Y stand for the random payoff 
of the instruments in the portfolio. This shows 
that VaR cannot be a true risk measure. 


We give a simple example, which shows that 
VaR may satisfy (5). Suppose that X denotes a 
bond that either defaults with probability 4.5% 
and we lose $50 or it does not default and in 
this case the loss is equal to zero. Let Y be the 
same bond but assume that the defaults of the 
two bonds are independent events. The VaR of 
the two bonds at 95% confidence level (5% tail 
probability) is equal to zero, 

\[ \mathrm{VaR}_{0.05}(X) = \mathrm{VaR}_{0.05}(Y) = 0 \]

Being the 5% quantile of the payoff distribu¬ 
tion in this case, VaR fails to recognize losses 
occurring with probability smaller than 5%. A 
portfolio of the two bonds has the following 
payoff profile: It loses $100 with probability of 
about 0.2%, loses $50 with probability of about 
8.6%, and the loss is zero with probability 91.2%. 
Thus, the corresponding 95% VaR of the port¬ 
folio equals $50 and clearly, 

\[ \$50 = \mathrm{VaR}_{0.05}(X + Y) > \mathrm{VaR}_{0.05}(X) + \mathrm{VaR}_{0.05}(Y) = 0 \]
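The two-bond example can be verified by applying definition (1) to a discrete profit distribution. The helper `var_discrete` below is a hypothetical name introduced for this sketch; it walks through the outcomes in increasing order until the cumulative probability reaches the tail probability.

```python
from itertools import product


def var_discrete(outcomes, eps):
    """VaR of a discrete profit distribution given as (profit, probability) pairs.

    Returns -inf{x : P(X <= x) >= eps}, following equation (1).
    """
    cumulative = 0.0
    for profit, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= eps - 1e-12:      # small tolerance for rounding error
            return 0.0 - profit
    raise ValueError("probabilities sum to less than eps")


# One bond: lose $50 with probability 4.5%, otherwise lose nothing
bond = [(-50.0, 0.045), (0.0, 0.955)]

# Portfolio of two such bonds with independent defaults
portfolio = {}
for (x, px), (y, py) in product(bond, bond):
    portfolio[x + y] = portfolio.get(x + y, 0.0) + px * py

var_single = var_discrete(bond, 0.05)
var_both = var_discrete(list(portfolio.items()), 0.05)
print(portfolio)
print(var_single, var_both)
```

The portfolio probabilities come out at about 0.2%, 8.6%, and 91.2%, and the portfolio VaR of 50 exceeds the sum of the two stand-alone VaRs, which is zero.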

What are the consequences of using a risk
measure that may satisfy property (5)? It may
mislead portfolio managers into believing that there is
no diversification effect in the portfolio, and they
may make the irrational decision to concentrate
it in only a few positions. As a consequence,
the portfolio risk actually increases.

Besides being sometimes incapable of recog¬ 
nizing the diversification effect, another draw¬ 
back is that VaR is not very informative about 
losses beyond the VaR level. It only reports that
losses larger than the VaR level occur with probability
equal to \(\epsilon\) but it does not provide any
information about the likely magnitude of such
losses.

Nonetheless, VaR is not a useless concept to 
be abandoned altogether. For example, it can be 
used in risk reporting only as a characteristic of 
the portfolio return (payoff) distribution since 
it has a straightforward interpretation. The 
criticism of VaR is focused on its wide appli¬ 
cation by practitioners as a true risk measure, 
which, in view of the deficiencies described 
above, is not well grounded and should be
reconsidered. 

COMPUTING PORTFOLIO 
VaR IN PRACTICE 

In this section, we provide three approaches 
for portfolio VaR calculation that are used in 
practice. We assume that the portfolio contains 
common stocks, which is only to make the de¬ 
scription easier to grasp; this is not a restriction 
of any of the approaches. 

Suppose that a portfolio contains n common 
stocks and we are interested in calculating the 
daily VaR at 99% confidence level. Denote the 
random daily returns of the stocks by \(X_1, \ldots, X_n\)
and by \(w_1, \ldots, w_n\) the weight of each stock in
the portfolio. Thus, the portfolio return \(r_p\) can
be calculated as

\[ r_p = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n \]

The portfolio VaR is derived from the distribution
of \(r_p\). The three approaches vary in the
assumptions they make.

The Approach of RiskMetrics 

The approach of RiskMetrics Group is centered 
on the assumption that asset returns have a 
multivariate normal distribution. Under this 
assumption, the distribution of the portfolio re¬ 
turn is also normal. Therefore, in order to calcu¬ 
late the portfolio VaR, we only have to calculate 
the expected return of \(r_p\) and the standard deviation
of \(r_p\). The 99% VaR will appear as the
negative of the 1% quantile of the \(N(Er_p, \sigma_{r_p}^2)\)
distribution.

The portfolio expected return can be directly 
expressed through the expected returns of the 
assets 

\[ Er_p = w_1 EX_1 + w_2 EX_2 + \cdots + w_n EX_n = \sum_{k=1}^{n} w_k EX_k \tag{6} \]

where E denotes mathematical expectation.
Similarly, the variance of the portfolio return
\(\sigma_{r_p}^2\) can be computed through the variances of
the asset returns and their covariances,

\[ \sigma_{r_p}^2 = w_1^2 \sigma_{X_1}^2 + w_2^2 \sigma_{X_2}^2 + \cdots + w_n^2 \sigma_{X_n}^2 + \sum_{i \ne j} w_i w_j \operatorname{cov}(X_i, X_j) \]

in which the last term appears because we have
to sum up the covariances between all pairs
of asset returns. There is a more compact way
of writing down the expression for \(\sigma_{r_p}^2\) using
matrix notation,

\[ \sigma_{r_p}^2 = w' \Sigma w \tag{7} \]

in which \(w = (w_1, \ldots, w_n)\) is the vector of portfolio
weights and \(\Sigma\) is the covariance matrix of
asset returns,
in which \(\sigma_{ij}\), \(i \ne j\), is the covariance between
\(X_i\) and \(X_j\), \(\sigma_{ij} = \operatorname{cov}(X_i, X_j)\). As a result, we obtain
that the portfolio return has a normal distribution
with mean given by equation (6) and
variance given by equation (7).

The standard deviation is the scale parameter
of the normal distribution and the mean is the
location parameter. Due to the normal distribution
properties, if \(r_p \in N(Er_p, \sigma_{r_p}^2)\), then

\[ \frac{r_p - Er_p}{\sigma_{r_p}} \in N(0, 1) \]

Thus, because of the properties (3) and (4) of the
VaR, the 99% portfolio VaR can be represented
as

\[ \mathrm{VaR}_{0.01}(r_p) = q_{0.99}\,\sigma_{r_p} - Er_p \tag{8} \]

where the standard deviation of the portfolio
return \(\sigma_{r_p}\) is computed from equation (7), the
expected portfolio return \(Er_p\) is given in (6), and
\(q_{0.99}\) is the 99% quantile of the standard normal
distribution.
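Equations (6), (7), and (8) can be combined into a short Python sketch. The function name `normal_var` and the two-stock inputs (weights, expected returns, covariance matrix) are hypothetical and serve only to illustrate the calculation.

```python
from statistics import NormalDist

import numpy as np


def normal_var(weights, mean_returns, cov, confidence=0.99):
    """Portfolio VaR under multivariate normality, following equation (8)."""
    w = np.asarray(weights, dtype=float)
    mu_p = float(w @ np.asarray(mean_returns, dtype=float))        # equation (6)
    sigma_p = float(np.sqrt(w @ np.asarray(cov, dtype=float) @ w))  # equation (7)
    q = NormalDist().inv_cdf(confidence)       # quantile of the standard normal
    return q * sigma_p - mu_p


# Hypothetical two-stock portfolio with daily return parameters
weights = [0.6, 0.4]
mean_returns = [0.0005, 0.0003]
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]

var_99 = normal_var(weights, mean_returns, cov)
print(round(var_99, 4))
```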

Note that \(q_{0.99}\) is a quantity independent of the
portfolio composition; it is merely a constant
that can be calculated in advance. The parameters
that depend on the portfolio weights are
the standard deviation of portfolio returns 
and the expected portfolio return. As a conse¬ 
quence, VaR under the assumption of normality 
is symmetric even though, by definition, VaR is 
centered on the left tail of the distribution; that 
is, VaR is asymmetric by construction. This re¬ 
sult appears because the normal distribution is 
symmetric around the mean. 

The approach of RiskMetrics can be extended 
for other types of distributions. Lamantia et al. 
(2006a) and Lamantia et al. (2006b) provide such
extensions and comparisons for Student's t and 
stable distributions. 


The Historical Method 

The historical method does not impose any dis¬ 
tributional assumptions; the distribution of 
portfolio returns is constructed from historical 
data. Hence, sometimes the historical simula¬ 
tion method is called a nonparametric method. 
For example, the 99% daily VaR of the portfolio 
return is computed as the negative of the empir¬ 
ical 1% quantile of the observed daily portfolio 
returns. The observations are collected from a 
predetermined time window such as the most 
recent business year. 

While the historical method seems to be 
more general as it is free of any distributional 
hypotheses, it has a number of major draw¬ 
backs. 

1. It assumes that the past trends will continue 
in the future. This is not a realistic assump¬ 
tion because we may experience extreme 
events in the future, for instance, which have 
not happened in the past. 

2. It treats the observations as independent
and identically distributed (IID), which is
not realistic. Daily returns data exhibit
volatility clustering, autocorrelations,
and so on, which are sometimes
a significant deviation from the IID
assumption.


3. It is not reliable for estimation of VaR at 
very high confidence levels. A sample of one 
year of daily data contains 250 observations, 
which is a rather small sample for the pur¬ 
pose of the 99% VaR estimation. 

The Hybrid Method 

The hybrid method is a modification of the his¬ 
torical method in which the observations are 
not regarded as IID but certain weights are as¬ 
signed to them depending on how close they are 
to the present. The weights are determined us¬ 
ing the exponential smoothing algorithm. The 
exponential smoothing accentuates the most re¬ 
cent observations and seeks to take into account 
the time-varying volatility phenomenon. 

The algorithm of the hybrid approach consists 
of the following steps. 

1. Exponentially declining weights are at¬ 
tached to historical returns, starting from the 
current time and going back in time. Let
\(r_{t-k+1}, \ldots, r_{t-1}, r_t\) be a sequence of k observed
returns on a given asset, where t is the current
time. The i-th observation is assigned a
weight

\[ \theta_i = c\,\lambda^{t-i} \]

where \(0 < \lambda < 1\), and \(c = (1-\lambda)/(1-\lambda^k)\) is a constant
chosen such that the sum of all weights is
equal to one, \(\sum_i \theta_i = 1\).

2. Similarly to the historical simulation 
method, the hypothetical future returns are 
obtained from the past returns and sorted in 
increasing order. 

3. The VaR measure is computed from the empirical
c.d.f. in which each observation has
probability equal to the weight \(\theta_i\).

Generally, the hybrid approach is appropriate for VaR estimation of heavy-tailed time series. It overcomes, to some degree, the first and the second deficiency of the historical method but it is also not reliable for VaR estimation at very high confidence levels.
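The three steps of the hybrid approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hybrid_var`, the default decay factor `lam=0.98`, and the choice to read the quantile off the weighted empirical c.d.f. without interpolation are ours and are not prescribed by the method.

```python
def hybrid_var(returns, tail_prob=0.01, lam=0.98):
    """VaR from exponentially weighted historical returns.

    `returns` is ordered from oldest to newest; `lam` is the decay factor
    (an assumed value, to be chosen by the user).
    """
    k = len(returns)
    c = (1.0 - lam) / (1.0 - lam ** k)  # normalizes the weights to sum to one
    # Step 1: the newest return (age 0) gets weight c, the oldest c * lam**(k-1).
    weighted = [(r, c * lam ** (k - 1 - j)) for j, r in enumerate(returns)]
    # Step 2: sort the returns in increasing order (largest losses first).
    weighted.sort(key=lambda pair: pair[0])
    # Step 3: walk up the weighted empirical c.d.f. until it reaches tail_prob.
    cumulative = 0.0
    for r, w in weighted:
        cumulative += w
        if cumulative >= tail_prob:
            return -r  # VaR is the negative of the weighted quantile
    return -weighted[-1][0]
```

A decay factor close to one makes the weights nearly equal, so the result approaches the plain historical estimate; a smaller factor concentrates the weight on the most recent returns.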
The Monte Carlo Method

In contrast to the historical method, the Monte Carlo method requires specification of a statistical model for asset returns. The statistical model is multivariate, hypothesizing both the behavior of the asset returns on a stand-alone basis and their dependence. For instance, the multivariate normal distribution assumes normal distributions for the asset returns viewed on a stand-alone basis and describes the dependencies by means of the covariance matrix. The multivariate model can also be constructed by specifying explicitly the one-dimensional distributions of the asset returns, and their dependence through a copula function.

The Monte Carlo method consists of the following basic steps.

Step 1. Selection of a statistical model. The statistical model should be capable of explaining a number of observed phenomena in the data, for example, heavy tails, clustering of the volatility, and so on, which we think influence the portfolio risk.

Step 2. Estimation of the statistical model parameters. A sample of observed asset returns is used from a predetermined time window, for instance the most recent 250 daily returns.

Step 3. Generation of scenarios from the fitted model. Independent scenarios are drawn from the fitted model. Each scenario is a vector of asset returns, which depend on each other according to the presumed dependence structure of the statistical model.

Step 4. Calculation of portfolio risk. Compute portfolio risk on the basis of the portfolio return scenarios obtained from the previous step.

The Monte Carlo method is a very general numerical approach to risk estimation. It does not require any closed-form expressions and, by choosing a flexible statistical model, accurate risk numbers can be obtained. A disadvantage is that the computed portfolio VaR is dependent on the generated sample of scenarios and will fluctuate a little if we regenerate the sample. This side effect can be reduced by generating a larger sample. An illustration is provided in the following example.

Suppose that the daily portfolio return distribution is standard normal and, therefore, at Step 4 of the algorithm we have scenarios from the standard normal distribution. Under the assumption of normality, we can use the approach of RiskMetrics and compute the 99% daily VaR directly from formula (8). Nevertheless, we will use the Monte Carlo method to gain more insight into the deviations of the VaR based on scenarios from the VaR computed according to formula (8).

In order to investigate how the fluctuations of the 99% VaR change about the theoretical value, we generate samples of different sizes: 500, 1,000, 5,000, 10,000, 20,000, 50,000, and 100,000 scenarios. The 99% VaR is computed from these samples and the numbers are stored. We repeat the experiment 100 times. In the end, we have 100 VaR numbers for each sample size. We expect that as the sample size increases, the VaR values will fluctuate less about the theoretical value, which is $VaR_{0.01}(X) = 2.326$, $X \in N(0,1)$.

Table 1 contains the result of the experiment. From the 100 VaR numbers, we calculate the 95% confidence interval for the true value given in the third column. The confidence intervals cover the theoretical value 2.326 and also we notice that the length of the confidence interval decreases as the sample size increases. This effect is best illustrated with the help of the boxplot diagrams¹ shown in Figure 2. A sample of 100,000 scenarios results in VaR numbers that are tightly packed around the true value while a sample of only 500 scenarios may give a very inaccurate estimate.

Table 1 The 99% VaR of the Standard Normal Distribution Computed from a Sample of Scenarios

Number of Scenarios    99% VaR    95% Confidence Interval
500                    2.067      [1.7515, 2.3825]
1,000                  2.406      [2.1455, 2.6665]
5,000                  2.286      [2.1875, 2.3845]
10,000                 2.297      [2.2261, 2.3682]
20,000                 2.282      [2.2305, 2.3335]
50,000                 2.342      [2.3085, 2.3755]
100,000                2.314      [2.2925, 2.3355]

Note: The 95% confidence interval is calculated from 100 repetitions of the experiment. The true value is $VaR_{0.01}(X) = 2.326$.

Figure 2 Boxplot Diagrams of the Fluctuation of the 99% VaR of the Standard Normal Distribution Based on Scenarios
Note: The horizontal axis shows the number of scenarios and the boxplots are computed from 100 independent samples.

This simple experiment shows that the number of scenarios in the Monte Carlo method has to be carefully chosen. The approach we used to determine the fluctuations of the VaR based on scenarios is a statistical method called parametric bootstrap. The bootstrap methods in general are powerful statistical methods that are used to compute confidence intervals when the problem is not analytically tractable but the calculations may be quite computationally intensive.
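The experiment described above can be reproduced along the following lines. The sketch is illustrative only: the function names, the seed, and the simple order-statistic rule for the empirical quantile are assumptions of ours, so the numbers it produces will not match Table 1 digit for digit.

```python
import random
import statistics

def var_from_scenarios(scenarios, tail_prob=0.01):
    """VaR as the negative of the empirical tail_prob-quantile."""
    ordered = sorted(scenarios)
    # Assumed quantile rule: take the order statistic at position n * tail_prob.
    index = max(int(tail_prob * len(ordered)) - 1, 0)
    return -ordered[index]

def var_spread(sample_size, repetitions=100, seed=0):
    """Mean and standard deviation of the 99% VaR across repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        # Scenarios from the standard normal model (Step 3 of the algorithm).
        scenarios = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimates.append(var_from_scenarios(scenarios))
    return statistics.mean(estimates), statistics.stdev(estimates)
```

Calling `var_spread(500)` and `var_spread(5000)` and comparing the second component of each result shows the spread of the VaR estimates shrinking as the number of scenarios grows, which is the pattern reported in Table 1.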

The true merits of the Monte Carlo method can only be realized when the portfolio contains complicated instruments such as derivatives. In this case, it is no longer possible to use a closed-form expression for the portfolio VaR (and any risk measure in general) because the distribution of portfolio return (or payoff) becomes quite arbitrary. The Monte Carlo method provides the general framework to generate scenarios for the risk-driving factors, then revaluates the financial instruments in the portfolio under each scenario, and, finally, estimates portfolio risk on the basis of the computed portfolio returns (or payoffs) in each state of the world.

While it may seem a straightforward approach, the practical implementation is a very challenging endeavor from both the software development and financial modeling points of view. The portfolios of big financial institutions often contain products that require yield curve modeling, development of fundamental and statistical factor models, and, on top of that, a probabilistic model capable of describing the heavy tails of the risk-driving factor returns, the autocorrelation, clustering of the volatility, and the dependence between these factors. Processing large portfolios involves manipulation of colossal data structures, which requires excellent software development skills in order to be performed efficiently.


BACK-TESTING OF VaR 

If we adopt VaR for analysis of portfolio exposure, then a reasonable question is whether the VaR calculated according to any of the methods discussed in the previous section is realistic. Suppose that we calculate the 99% daily portfolio VaR. This means that according to our assumption for the portfolio return (payoff) distribution, the portfolio loses more than the 99% daily VaR with 1% probability. The question is whether this estimate is correct; that is, does the portfolio really lose more than this amount with 1% probability? This question can be answered by back-testing of VaR.

Generally, the procedure consists of the 
following steps. 

Step 1. Choose a time window for the back-testing. Usually the time window is the most recent one or two years.

Step 2. For each day in the time window, calculate the VaR number.

Step 3. Check if the loss on a given day is below or above the VaR number computed the day before. If the observed loss is larger, then we say that there is a case of an exceedance. Figure 3 provides an example.

Step 4. Count the number of exceedances. Check if there are too many or too few of them by verifying if the number of exceedances belongs to the corresponding 95% confidence interval.

Figure 3 The Observed Daily Returns of the S&P 500 Index between December 31, 2002 and December 31, 2003 and the Negative of VaR
Note: The marked observation is an example of an exceedance.

If in Step 4 we find out that there are too many exceedances, then the VaR numbers produced by the model are too optimistic. Losses exceeding the corresponding VaR happen too frequently. If capital reserves are determined on the basis of VaR, then there is a risk of being incapable of covering large losses. Conversely, if we find out that there are too few exceedances, then the VaR numbers are too pessimistic. This is also an undesirable situation. Note that the actual size of the exceedances is immaterial; we only count them.

The confidence interval for the number of exceedances is constructed on the basis of the indicator-type events "we observe an exceedance," "we do not observe an exceedance" on a given day. If we consider the 99% VaR, then the probability of the first event, according to the model, is 1%. Let us associate a number with each of the events similar to a coin tossing experiment. If we observe an exceedance on a given day, then we say that the number 1 has occurred, otherwise 0 has occurred. If the back-testing time window is two years, then we have a sequence of 500 zeros and ones and the expected number of exceedances is 5. Thus, finding the 95% confidence interval for the number of exceedances reduces to finding an interval around 5 such that the probability of the number of ones belonging to this interval is 95%.

If we assume that the corresponding events are independent, then there is a complete analogue of this problem in terms of coin tossing. We toss an unfair coin independently 500 times with probability of success equal to 1%. What is the range of the number of success events with 95% probability? In order to find the 95% confidence interval, we can resort to the normal approximation to the binomial distribution. The formula is

$$\text{left bound} = N\epsilon - F^{-1}(1 - 0.05/2)\sqrt{N\epsilon(1-\epsilon)}$$
$$\text{right bound} = N\epsilon + F^{-1}(1 - 0.05/2)\sqrt{N\epsilon(1-\epsilon)}$$

where $N$ is the number of indicator-type events, $\epsilon$ is the tail probability of the VaR, and $F^{-1}(t)$ is the inverse distribution function of the standard normal distribution. In the example, $N = 500$, $\epsilon = 0.01$, and the 95% confidence interval for the number of exceedances is [0, 9]. Similarly, if we are back-testing the 95% VaR, under the same circumstances the confidence interval is [15, 34].
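Steps 3 and 4 of the back-testing procedure, together with the normal approximation above, can be sketched as follows. The function names are hypothetical, and the rounding rule (truncate the lower bound at zero and round both bounds down to whole exceedances) is an assumption that happens to reproduce the two intervals quoted in the text.

```python
import math
from statistics import NormalDist

def count_exceedances(returns, var_forecasts):
    """Count the days on which the loss exceeded the VaR forecast for that day."""
    return sum(1 for r, v in zip(returns, var_forecasts) if -r > v)

def exceedance_interval(n_days, tail_prob, level=0.95):
    """Normal-approximation interval for the number of exceedances."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    center = n_days * tail_prob  # expected number of exceedances
    half_width = z * math.sqrt(n_days * tail_prob * (1.0 - tail_prob))
    # Assumed rounding: truncate at zero and round down to whole counts.
    lower = max(math.floor(center - half_width), 0)
    upper = math.floor(center + half_width)
    return lower, upper
```

A model passes the test at the chosen level if `count_exceedances` returns a number inside the interval given by `exceedance_interval` for the same window length and tail probability.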

Note that the statistical test based on the back-testing of VaR at a certain tail probability cannot answer the question of whether the distributional assumptions for the risk-driving factors are correct in general. For instance, if the portfolio contains only common stocks, then we presume a probabilistic model for stock returns. By back-testing the 99% daily VaR of portfolio return, we verify if the probabilistic model is adequate for the 1% quantile of the portfolio return distribution; that is, we are back-testing if a certain point in the left tail of the portfolio return distribution is sufficiently accurately modeled. This should not be confused with statistical tests such as the Kolmogorov test or the Kolmogorov-Smirnov test, which concern accepting or rejecting a given distributional hypothesis.

COHERENT RISK MEASURES 

Even though VaR has an intuitive interpretation and has been widely adopted as a risk measure, it does not always satisfy the important property that the VaR of a portfolio should not exceed the sum of the VaRs of the portfolio positions. This means that VaR is not always capable of representing the diversification effect.

This fact raises an important question. Can we find a set of desirable properties that a risk measure should satisfy? An answer is given by Artzner et al. (1998). They provide an axiomatic definition of a functional, which they call a coherent risk measure. The axioms follow with remarks given below each axiom. We denote the risk measure by the functional $\rho(X)$ assigning a real-valued number to a random variable. Usually, the random variable $X$ is interpreted as a random payoff and the motivation for the axioms in Artzner et al. (1998) follows this interpretation. In the remarks below each axiom, we provide an alternative interpretation, which holds if $X$ is interpreted as random return.

The Monotonicity Property 

Monotonicity    $\rho(Y) \leq \rho(X)$, if $Y \geq X$ in almost sure sense

Monotonicity states that if investment A has random return (payoff) $Y$, which is not less than the return (payoff) $X$ of investment B at a given horizon in all states of the world, then the risk of A is not greater than the risk of B. This is quite intuitive but it really does matter whether the random variables represent random return or profit because an inequality in an almost sure sense between random returns may not translate into the same inequality between the corresponding random profits and vice versa.

Suppose that $X$ and $Y$ describe the random percentage returns on two investments A and B and let $Y = X + 3\%$. Clearly, $Y > X$ in all states of the world. The corresponding payoffs are obtained according to the equations

$$\text{Payoff}(X) = I_A(1 + X)$$
$$\text{Payoff}(Y) = I_B(1 + Y) = I_B(1 + X + 3\%)$$

where $I_A$ is the initial investment in opportunity A and $I_B$ is the initial investment in opportunity B. If the initial investment $I_A$ is much larger than $I_B$, then $\text{Payoff}(X) > \text{Payoff}(Y)$ irrespective of the inequality $Y > X$. In effect, investment A may seem less risky than investment B in terms of payoff but in terms of return, the converse may hold.

The Positive Homogeneity Property 

Positive Homogeneity    $\rho(0) = 0$, $\rho(\lambda X) = \lambda\rho(X)$, for all $X$ and all $\lambda > 0$

The positive homogeneity property states that scaling the return (payoff) of the portfolio by a positive factor scales the risk by the same factor. The interpretation for payoffs is obvious: if the investment in a position doubles, so does the risk of the position. We give a simple example illustrating this property when $X$ stands for a random percentage return.

Suppose that today the value of a portfolio is $I_0$ and we add a certain amount of cash $C$. The value of our portfolio becomes $I_0 + C$. The value tomorrow is random and equals $I_1 + C$ in which $I_1$ is the random payoff. The return of the portfolio equals

$$X = \frac{I_1 + C - I_0 - C}{I_0 + C} = \frac{I_1 - I_0}{I_0}\left(\frac{I_0}{I_0 + C}\right) = h\,\frac{I_1 - I_0}{I_0} = hY$$

where $h = I_0/(I_0 + C)$ is a positive constant and $Y = (I_1 - I_0)/I_0$ is the return of the portfolio without the cash. The positive homogeneity property implies that $\rho(X) = h\rho(Y)$; that is, the risk of the new portfolio will be the risk of the portfolio without the cash but scaled by $h$.

The Subadditivity Property 

Subadditivity    $\rho(X + Y) \leq \rho(X) + \rho(Y)$, for all $X$ and $Y$

If X and Y describe random payoffs, then the 
subadditivity property states that the risk of the 
portfolio is not greater than the sum of the risks 
of the two random payoffs. 

The positive homogeneity property and the subadditivity property imply that the functional is convex

$$\rho(\lambda X + (1 - \lambda)Y) \leq \rho(\lambda X) + \rho((1 - \lambda)Y) = \lambda\rho(X) + (1 - \lambda)\rho(Y)$$

where $\lambda \in [0,1]$. If $X$ and $Y$ describe random returns, then the random quantity $\lambda X + (1 - \lambda)Y$ stands for the return of a portfolio composed of two financial instruments with returns $X$ and $Y$ having weights $\lambda$ and $1 - \lambda$ respectively. Therefore, the convexity property states that the risk of a portfolio is not greater than the weighted sum of the risks of its constituents, meaning that it is the convexity property that is behind the diversification effect that we expect in the case of $X$ and $Y$ denoting random returns.

The Invariance Property 

Invariance    $\rho(X + C) = \rho(X) - C$, for all $X$ and $C \in \mathbb{R}$

The invariance property has various labels. Originally, it was called translation invariance while in other texts it is called cash invariance.² If $X$ describes a random payoff, then the invariance property suggests that adding cash to a position reduces its risk by the amount of cash added. This is motivated by the idea that the risk measure can be used to determine capital requirements. As a consequence, the risk measure $\rho(X)$ can be interpreted as the minimal amount of cash necessary to make the position free of any capital requirements

$$\rho(X + \rho(X)) = 0$$

The invariance property has a different interpretation when $X$ describes random return. Suppose that the random variable $X$ describes the return of a common stock and we build a long-only portfolio by adding a government bond yielding a risk-free rate $r_B$. The portfolio return equals $wX + (1 - w)r_B$, where $w \in [0,1]$ is the weight of the common stock in the portfolio. Note that the quantity $(1 - w)r_B$ is nonrandom by assumption. The invariance property states that the risk of the portfolio can be decomposed as

$$\rho(wX + (1 - w)r_B) = \rho(wX) - (1 - w)r_B = w\rho(X) - (1 - w)r_B \quad (9)$$

where the second equality appears because of the positive homogeneity property. In effect, the risk measure admits the following interpretation: Assume that the constructed portfolio is equally weighted, that is, $w = 1/2$, then the risk measure equals the level of the risk-free rate such that the risk of the equally weighted portfolio consisting of the risky asset and the risk-free asset is zero. The investment in the risk-free asset will be, effectively, the reserve investment.

Alternative interpretations are also possible. Suppose that the present value of the position with random percentage return $X$ is $I_0$. Assume that we can find a government security earning return $r_B^*$ at the horizon of interest. Then we can ask the question in the opposite direction: How much should we reallocate from $I_0$ and invest in the government security in order to hedge the risk $\rho(X)$? The needed capital $C$ should satisfy the equation

$$\frac{I_0 - C}{I_0}\,\rho(X) - \frac{C}{I_0}\,r_B^* = 0$$

which is merely a restatement of equation (9) with the additional requirement that the risk of the resulting portfolio should be zero. The solution is

$$C = I_0\,\frac{\rho(X)}{\rho(X) + r_B^*}$$
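As a purely numerical illustration of the solution for $C$ (the figures are assumed for the sake of the example and do not come from the text), take a position worth 100 with risk $\rho(X) = 10\%$ and a government security earning $5\%$:

$$C = 100 \cdot \frac{0.10}{0.10 + 0.05} \approx 66.67, \qquad \frac{100 - 66.67}{100}\,(0.10) - \frac{66.67}{100}\,(0.05) \approx 0.0333 - 0.0333 = 0$$

Under these assumed numbers, roughly two thirds of the position would have to be reallocated to the government security for the risk of the resulting portfolio to vanish. The lower the return on the security relative to the risk of the position, the larger the share that has to be reallocated.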


Note that if in the invariance property the constant is nonnegative, $C \geq 0$, then it follows that $\rho(X + C) \leq \rho(X)$. This result is in agreement with the monotonicity property as $X + C \geq X$. In fact, the invariance property can be regarded as an extension of the monotonicity property when the only difference between $X$ and $Y$ is in their means.

According to the discussion in the previous section, VaR is not a coherent risk measure because it may violate the subadditivity property.
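A small discrete example makes the possible violation of subadditivity concrete. The position below is hypothetical (a payoff of -100 with probability 4% and zero otherwise, held twice independently) and the helper `var_discrete` is our own; it applies the quantile definition of VaR, that is, the negative of the smallest payoff whose cumulative probability reaches the tail probability.

```python
from itertools import product

def var_discrete(outcomes, tail_prob):
    """VaR of a discrete payoff: -inf{x : P(X <= x) >= tail_prob}."""
    cumulative = 0.0
    for payoff, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= tail_prob:
            return -payoff
    return -max(p for p, _ in outcomes)

# Hypothetical position: lose 100 with probability 4%, otherwise break even.
single = [(-100.0, 0.04), (0.0, 0.96)]

# Portfolio of two independent copies of that position.
combined = {}
for (x, px), (y, py) in product(single, single):
    combined[x + y] = combined.get(x + y, 0.0) + px * py
portfolio = list(combined.items())

var_single = var_discrete(single, 0.05)        # 95% VaR of one position
var_portfolio = var_discrete(portfolio, 0.05)  # 95% VaR of the pair
```

Each position on its own loses money with probability 4%, which is below the 5% tail, so its 95% VaR is zero. The pair loses at least 100 with probability $1 - 0.96^2 = 7.84\%$, so the portfolio VaR is 100 and exceeds the sum of the stand-alone VaRs.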

An example of a coherent risk measure is the Average Value-at-Risk (AVaR), defined as the average of the VaRs that are larger than the VaR at a given tail probability $\epsilon$. The accepted notation is $AVaR_\epsilon(X)$ in which $\epsilon$ stands for the tail probability level. A larger family of coherent risk measures is the family of spectral risk measures, which includes the AVaR as a representative. The spectral risk measures are defined as weighted averages of VaRs.


KEY POINTS 

• VaR is defined as the minimum level of loss at a given, sufficiently high confidence level for a predefined time horizon.

• The performance of VaR, as well as any other risk measure, is heavily dependent on the assumed probability distribution for the economic measure of interest.

• Despite VaR's wide acceptance in the finance industry, it has important deficiencies so that, in general, it should be abandoned as a risk measure. However, it is not a useless concept to be abandoned altogether. For example, it can be used in risk reporting only as a characteristic of the portfolio return (payoff) distribution since it has a straightforward interpretation.

• The most important drawback of VaR is that, in some cases, the reasonable diversification effect that every portfolio manager should expect to see in a risk measure is not present.

• The criticism of VaR is focused on its wide application by practitioners as a true risk measure, which, in view of its deficiencies, is not well grounded and should be reconsidered.

• Three approaches for portfolio VaR calculation that are used in practice are the RiskMetrics approach, the historical method approach, and the Monte Carlo approach.


NOTES 

1. A boxplot, or a box-and-whiskers diagram, is a convenient way of depicting several statistical characteristics of the sample. The size of the box equals the difference between the third and the first quartile (75% quantile - 25% quantile), also known as the interquartile range. The line in the box corresponds to the median of the data (50% quantile). The lines extending out of the box are called whiskers and each of them extends up to 1.5 times the interquartile range. All observations outside the whiskers are labeled outliers and are depicted by a plus sign.

2. This label can be found in Follmer and Schied (2002).
Abstract: Despite the fact that the value-at-risk (VaR) measure has been adopted as a standard
risk measure in the financial industry, it has a number of deficiencies recognized by financial 
professionals. It is not a coherent risk measure. This is because it does not satisfy the subadditivity 
property requirement of a coherent risk measure. That is, there are cases in which the portfolio 
VaR is larger than the sum of the VaRs of the portfolio constituents, supporting the view that VaR 
cannot be used as a true risk measure. Unlike VaR, the average value-at-risk measure (AVaR)—also 
referred to as conditional value-at-risk and expected shortfall—is a coherent risk measure and has 
other advantages that result in its greater acceptance in risk modeling. 


The average value-at-risk (AVaR) is a risk measure that is a superior alternative to VaR. Not only does it lack the deficiencies of VaR, but it also has an intuitive interpretation. There are convenient ways for computing and estimating AVaR, which allows its application in optimal portfolio problems. Moreover, it satisfies all axioms of coherent risk measures and it is consistent with the preference relations of risk-averse investors.

In this entry, we explore in detail the properties of AVaR and illustrate its superiority to VaR. We develop new geometric interpretations of AVaR and the various calculation methods. We also provide closed-form expressions for the AVaR of the normal distribution and Student's t distribution, and a practical formula for Levy stable distributions. Finally, we describe different estimation methods and remark on potential pitfalls.

AVERAGE VALUE-AT-RISK 
DEFINED 

A disadvantage of VaR is that it does not give any information about the severity of losses beyond the VaR level. Consider the following example. Suppose that $X$ and $Y$ describe the random returns of two financial instruments with densities and distribution functions such as the ones in Figure 1. The expected returns are 3% and 1%, respectively. The standard deviations of $X$ and $Y$ are equal to 10%.¹ The cumulative distribution functions (c.d.f.s) $F_X(x)$ and $F_Y(x)$ cross at $x = -0.15$ and $F_X(-0.15) = F_Y(-0.15) = 0.05$. According to the definition of VaR, the 95% VaRs of both $X$ and $Y$ are equal to 15%. That is, the two financial instruments lose more than 15% of their present values with probability of 5%. In effect, we may conclude that their risks are equal because their 95% VaRs are equal.

Figure 1
Note: The top plot shows the densities of X and Y and the bottom plot shows their c.d.f.s. The 95% VaRs of X and Y are equal to 0.15 but X has a thicker tail and is more risky.

This conclusion is wrong because we pay no attention to the losses that are larger than the 95% VaR level. It is visible in Figure 1 that the left tail of $X$ is heavier than the left tail of $Y$.² Therefore, it is more likely that the losses of $X$ will be larger than the losses of $Y$, on condition that they are larger than 15%. Thus, looking only at the losses occurring with probability smaller than 5%, the random return $X$ is riskier than $Y$. Note that both $X$ and $Y$ have equal standard deviations. If we base the analysis on the standard deviation and the expected return, we would conclude that not only is the uncertainty of $X$ equal to the uncertainty of $Y$, but $X$ is actually preferable because of the higher expected return. In fact, we realize that it is exactly the opposite, which shows how important it is to ground the reasoning on a proper risk measure.

The disadvantage of VaR, that it is not informative about the magnitude of the losses larger than the VaR level, is not present in the risk measure known as average value-at-risk. In the literature, it is also called conditional value-at-risk³ or expected shortfall but we will use average value-at-risk (AVaR) as it best describes the quantity it refers to.

The AVaR at tail probability $\epsilon$ is defined as the average of the VaRs, which are larger than the VaR at tail probability $\epsilon$. Therefore, by construction, the AVaR is focused on the losses in the tail, which are larger than the corresponding VaR level. The average of the VaRs is computed through the integral

$$AVaR_\epsilon(X) = \frac{1}{\epsilon}\int_0^\epsilon VaR_p(X)\,dp \quad (1)$$

where $VaR_p(X)$ is defined by $VaR_\epsilon(X) = -\inf\{x \mid P(X \leq x) \geq \epsilon\} = -F_X^{-1}(\epsilon)$. As a matter of fact, the AVaR is not well defined for all real-valued random variables but only for those with finite mean; that is, $AVaR_\epsilon(X) < \infty$ if $E|X| < \infty$. This should not be disturbing because random variables with infinite mathematical expectation have limited application in the field of finance. For example, if such a random variable is used for a model of stock returns, then it is assumed that the common stock has infinite expected return, which is not realistic.

Figure 2 Geometrically, $AVaR_\epsilon(X)$ is the height for which the area of the drawn rectangle equals the shaded area closed between the graph of the inverse c.d.f. and the horizontal axis for $t \in [0, \epsilon]$. The $VaR_\epsilon(X)$ value is shown by a dash-dotted line.

The AVaR satisfies all the axioms of coherent risk measures. One consequence is that, unlike VaR, it is convex for all possible portfolios, which means that it always accounts for the diversification effect.

A geometric interpretation of the definition in equation (1) is provided in Figure 2. In this figure, the inverse c.d.f. of a random variable $X$ is plotted. The shaded area is closed between the graph of $F_X^{-1}(t)$ and the horizontal axis for $t \in [0, \epsilon]$ where $\epsilon$ denotes the selected tail probability. $AVaR_\epsilon(X)$ is the value for which the area of the drawn rectangle, equal to $\epsilon \times AVaR_\epsilon(X)$, coincides with the shaded area, which is computed by the integral in equation (1). The $VaR_\epsilon(X)$ value is always smaller than $AVaR_\epsilon(X)$. In Figure 2, $VaR_\epsilon(X)$ is shown by a dash-dotted line and is indicated by an arrow.

Let us revisit the example developed at the beginning of this section. We concluded that even though the VaRs at 5% tail probability of both random variables are equal, $X$ is riskier than $Y$ because the left tail of $X$ is heavier than the left tail of $Y$; that is, the distribution of $X$ is more likely to produce larger losses than the distribution of $Y$ on condition that the losses are beyond the VaR at the 5% tail probability. We apply the geometric interpretation illustrated in Figure 2 to this example. First, notice that the shaded area in Figure 2, which concerns the graph of the inverse of the c.d.f., can also be identified through the graph of the c.d.f. This is done in Figure 3, which shows a magnified section of the left tails of the c.d.f.s plotted in Figure 1. The shaded area appears as the intersection of the area closed below the graph of the distribution function and the horizontal axis, and the area below a horizontal line shifted at the tail probability above the horizontal axis. In Figure 3, we show the area for $F_X(x)$ at 5% tail probability. The corresponding area for $F_Y(x)$ is smaller because $F_Y(x) \leq F_X(x)$ to the left of the crossing point of the two c.d.f.s, which is exactly at 5% tail probability.

Figure 3 The AVaRs of the Return Distributions from Figure 1 in Line with the Geometric Intuition
Note: Even though the 95% VaRs are equal, the AVaRs at 5% tail probability differ, $AVaR_{0.05}(X) > AVaR_{0.05}(Y)$.

In line with the geometric interpretation, the $AVaR_{0.05}(X)$ is a number such that if we draw a rectangle with height 0.05 and width equal to $AVaR_{0.05}(X)$, the area of the rectangle ($0.05 \times AVaR_{0.05}(X)$) equals the shaded area in Figure 3. The same exercise for $AVaR_{0.05}(Y)$ shows that $AVaR_{0.05}(Y) < AVaR_{0.05}(X)$ because the corresponding shaded area is smaller and both rectangles share a common height of 0.05.

Besides the definition in equation (1), AVaR can be represented through a minimization formula,⁴

$$AVaR_\epsilon(X) = \min_{\theta \in \mathbb{R}}\left(\theta + \frac{1}{\epsilon}E(-X - \theta)_+\right) \quad (2)$$

where $(x)_+$ denotes the maximum between $x$ and zero, $(x)_+ = \max(x, 0)$, and $X$ describes the portfolio return distribution. It turns out that this formula has an important application in optimal portfolio problems based on AVaR as a risk measure. In the appendix to this entry, we provide an illuminating geometric interpretation of equation (2), which shows the connection to the definition of AVaR.

How can we compute the AVaR for a given return distribution? Throughout this section, we assume that the return distribution function is a continuous function, that is, there are no point masses. Under this condition, after some algebra and using the fact that VaR is the negative of a certain quantile, we obtain that the AVaR can be represented in terms of a conditional expectation,

$$AVaR_\epsilon(X) = -\frac{1}{\epsilon}\int_0^\epsilon F_X^{-1}(t)\,dt = -E(X \mid X < -VaR_\epsilon(X)) \quad (3)$$

which is called expected tail loss (ETL) and is denoted by $ETL_\epsilon(X)$. The conditional expectation implies that the AVaR equals the average loss provided that the loss is larger than the VaR level. In fact, the average of VaRs in equation (1) equals the average of losses in equation (3) only if the c.d.f. of $X$ is continuous at $x = VaR_\epsilon(X)$. If there is a discontinuity, or a point mass, the relationship is more involved. The general formula is given in the appendix to this entry.


Equation (3) implies that AVaR is related to the conditional loss distribution. In fact, under certain conditions, it is the mathematical expectation of the conditional loss distribution, which represents only one characteristic of it. In the appendix to this entry, we introduce several sets of characteristics of the conditional loss distribution, which provide a more complete picture of it. Also, in the appendix, we introduce the more general concept of higher-order AVaR.

For some continuous distributions, it is possible to calculate explicitly the AVaR through equation (3). We provide the closed-form expressions for the normal distribution and Student's t distribution. In the appendix to this entry, we give a semi-explicit formula for the class of stable distributions.


1. The normal distribution

   Suppose that $X$ is distributed according to a normal distribution with standard deviation $\sigma_X$ and mathematical expectation $EX$. The AVaR of $X$ at tail probability $\epsilon$ equals

   $$AVaR_\epsilon(X) = \frac{\sigma_X}{\epsilon\sqrt{2\pi}}\exp\left(-\frac{(VaR_\epsilon(Y))^2}{2}\right) - EX \quad (4)$$

   where $Y$ has the standard normal distribution, $Y \in N(0,1)$.

2. The Student's t distribution 

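Suppose that X has Student's t distribution with $\nu$ degrees of
freedom, $X \in t(\nu)$. The AVaR of X at tail probability $\epsilon$
equals

$$\mathrm{AVaR}_\epsilon(X) = \begin{cases} \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \dfrac{\sqrt{\nu}}{(\nu-1)\,\epsilon\sqrt{\pi}} \left(1 + \dfrac{(\mathrm{VaR}_\epsilon(X))^2}{\nu}\right)^{\frac{1-\nu}{2}}, & \nu > 1 \\ \infty, & \nu = 1 \end{cases}$$

where the notation $\Gamma(x)$ stands for the gamma function. It is not
surprising that for $\nu = 1$ the AVaR explodes because the Student's t
distribution with one degree of freedom, also known as the Cauchy
distribution, has infinite mathematical expectation. 5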









Note that equation (4) can be represented in a more compact way,

$$\mathrm{AVaR}_\epsilon(X) = \sigma_X C_\epsilon - EX \tag{5}$$

where $C_\epsilon$ is a constant which depends only on the tail
probability $\epsilon$. Therefore, the AVaR of the normal distribution
has the same structure as the normal VaR: the difference between the
properly scaled standard deviation and the mathematical expectation. In
effect, similar to the normal VaR, the normal AVaR properties are
dictated by the standard deviation. Even though AVaR is focused on the
extreme losses only, due to the limitations of the normal assumption, it
is symmetric.

Exactly the same conclusion holds for the 
AVaR of Student's t distribution. The true mer¬ 
its of AVaR become apparent if the underlying 
distributional model is skewed. 
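
The closed-form expression (4) and its compact form (5) are easy to evaluate numerically. The following Python sketch is only an illustration: the function name `avar_normal` and its arguments are our own choices, not part of the entry, and it assumes a tail probability strictly between 0 and 1.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def avar_normal(mean: float, std: float, eps: float) -> float:
    """AVaR of a normally distributed return, following equations (4) and (5)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    # VaR of the standard normal Y at tail probability eps (negative of the eps-quantile).
    var_std = -NormalDist().inv_cdf(eps)
    # The constant C_eps in equation (5); it depends on eps only.
    c_eps = exp(-var_std ** 2 / 2) / (eps * sqrt(2 * pi))
    return std * c_eps - mean


print(round(avar_normal(0.0, 1.0, 0.01), 3))  # 2.665 for the standard normal at 1%
```

Because the constant depends on the tail probability only, it can be computed once and reused for any mean and standard deviation.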


AVaR ESTIMATION FROM 
A SAMPLE 

Suppose that we have a sample of observed portfolio returns and we are
not aware of their distribution. Provided that we do not impose any
distributional model, the AVaR of portfolio return can be estimated from
the sample of observed portfolio returns. Denote the observed portfolio
returns by $r_1, r_2, \ldots, r_n$ at time instants
$t_1, t_2, \ldots, t_n$. The numbers in the sample are given in order of
observation. Denote the sorted sample by
$r_{(1)} \leq r_{(2)} \leq \ldots \leq r_{(n)}$. Thus, $r_{(1)}$ equals
the smallest observed portfolio return and $r_{(n)}$ is the largest. The
AVaR of portfolio returns at tail probability $\epsilon$ is estimated
according to the formula 6

$$\widehat{\mathrm{AVaR}}_\epsilon(r) = -\frac{1}{\epsilon}\left(\frac{1}{n}\sum_{k=1}^{\lceil n\epsilon\rceil - 1} r_{(k)} + \left(\epsilon - \frac{\lceil n\epsilon\rceil - 1}{n}\right) r_{(\lceil n\epsilon\rceil)}\right) \tag{6}$$

where the notation $\lceil x \rceil$ stands for the smallest integer
greater than or equal to x. 7 The "hat" above AVaR denotes that the
number calculated by equation (6) is an estimate of the true value
because it is based on a sample. This is a standard notation in
statistics.

We demonstrate how equation (6) is applied in the following example.
Suppose that the sorted sample of portfolio returns is -1.37%, -0.98%,
-0.38%, -0.26%, 0.19%, 0.31%, 1.91% and our goal is to calculate the
portfolio AVaR at 30% tail probability. In this case, the sample
contains 7 observations and
$\lceil n\epsilon \rceil = \lceil 7 \times 0.3 \rceil = 3$. According to
equation (6), we calculate

$$\widehat{\mathrm{AVaR}}_{0.3}(r) = -\frac{1}{0.3}\left(\frac{1}{7}(-1.37\% - 0.98\%) + (0.3 - 2/7)(-0.38\%)\right) = 1.137\%.$$

Formula (6) can be applied not only to a sam¬ 
ple of empirical observations. We may want 
to work with a statistical model for which no 
closed-form expressions for AVaR are known. 
Then we can simply sample from the distri¬ 
bution and apply formula (6) to the generated 
simulations. 
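
The sample formula (6) can be sketched in a few lines of Python. The function name `avar_sample` is our own; the sketch assumes a tail probability in (0, 1] and returns expressed in percent, and it reproduces the worked example above.

```python
from math import ceil


def avar_sample(returns, eps):
    """Sample AVaR at tail probability eps, following equation (6)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    r = sorted(returns)          # r[0] is the smallest observed return
    n = len(r)
    k = min(ceil(n * eps), n)    # index of the boundary observation (1-based)
    head = sum(r[:k - 1]) / n    # full weight 1/n on the k - 1 smallest returns
    boundary = (eps - (k - 1) / n) * r[k - 1]  # remaining probability mass
    return -(head + boundary) / eps


sample = [-1.37, -0.98, -0.38, -0.26, 0.19, 0.31, 1.91]  # returns in percent
print(round(avar_sample(sample, 0.3), 3))  # 1.137
```

The same function applies unchanged to a vector of simulated scenarios, since it makes no use of the order of observation.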

Besides formula (6), there is another method for calculation of AVaR. It
is based on the minimization formula (2) in which we replace the
mathematical expectation by the sample average,

$$\widehat{\mathrm{AVaR}}_\epsilon(r) = \min_{\theta\in\mathbb{R}} \left(\theta + \frac{1}{n\epsilon}\sum_{i=1}^n \max(-r_i - \theta, 0)\right) \tag{7}$$

Even though it is not obvious, equations (6) and 
(7) are completely equivalent. 
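
The equivalence can be checked numerically on the example sample. The sketch below is our own illustration, not the entry's algorithm: it uses the fact that the objective in (7) is convex and piecewise linear in $\theta$ with kinks at $\theta = -r_i$, so for a tail probability in (0, 1) a minimum is attained at one of the kinks. The function names are hypothetical.

```python
from math import ceil


def avar_sample(returns, eps):
    """Sample AVaR following equation (6)."""
    r = sorted(returns)
    n = len(r)
    k = min(ceil(n * eps), n)
    return -(sum(r[:k - 1]) / n + (eps - (k - 1) / n) * r[k - 1]) / eps


def avar_minimization(returns, eps):
    """Sample AVaR following equation (7); returns (minimum value, minimizing theta)."""
    n = len(returns)

    def objective(theta):
        return theta + sum(max(-r - theta, 0.0) for r in returns) / (n * eps)

    # Only the kinks theta = -r_i need to be inspected.
    best_theta = min((-r for r in returns), key=objective)
    return objective(best_theta), best_theta


sample = [-1.37, -0.98, -0.38, -0.26, 0.19, 0.31, 1.91]  # returns in percent
value, theta = avar_minimization(sample, 0.3)
print(round(value, 3), theta, round(avar_sample(sample, 0.3), 3))
```

For larger problems, for instance when the returns depend on portfolio weights that are optimized at the same time, scanning the kinks is no longer practical and a linear programming formulation is the usual route.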

The minimization formula in equation (7) is appealing because it can be
calculated through the methods of linear programming. It can be restated
as a linear optimization problem by introducing auxiliary variables
$d_1, \ldots, d_n$, one for each observation in the sample,

$$\begin{aligned} \min_{\theta,\, d}\quad & \theta + \frac{1}{n\epsilon}\sum_{k=1}^n d_k \\ \text{subject to}\quad & -r_k - \theta \leq d_k,\quad k = 1, \ldots, n \\ & d_k \geq 0,\quad k = 1, \ldots, n \\ & \theta \in \mathbb{R} \end{aligned} \tag{8}$$

The linear problem (8) is obtained from (7) through standard methods in
mathematical programming. We briefly demonstrate the equivalence between
them. Let us fix the value of $\theta$ to $\theta^*$. Then the following
choice of the auxiliary variables yields the minimum in (8). If
$-r_k - \theta^* < 0$, then $d_k = 0$. Conversely, if it turns out that
$-r_k - \theta^* \geq 0$, then $d_k = -r_k - \theta^*$. In this way,
the sum in the objective function becomes equal 
to the sum of maxima in equation (7). 
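
The argument can be illustrated on the example sample. The Python sketch below does not solve the linear program; it only checks, for a fixed $\theta$, that the choice $d_k = \max(-r_k - \theta, 0)$ is feasible for (8) and that increasing any auxiliary variable keeps feasibility but raises the objective. The helper names and the fixed value 0.38 are our own illustrative choices.

```python
def best_auxiliary(theta, returns):
    """Smallest feasible auxiliary variables of problem (8) for a fixed theta."""
    return [max(-r - theta, 0.0) for r in returns]


def is_feasible(theta, d, returns, tol=1e-12):
    """Check the constraints -r_k - theta <= d_k and d_k >= 0 of problem (8)."""
    return all(dk >= -tol and -r - theta <= dk + tol for r, dk in zip(returns, d))


def lp_objective(theta, d, eps):
    """Objective of problem (8): theta + (1 / (n * eps)) * sum of d_k."""
    return theta + sum(d) / (len(d) * eps)


sample = [-0.98, 0.31, 1.91, -1.37, -0.38, -0.26, 0.19]  # percent, order of observation
theta_fixed = 0.38                                        # arbitrary fixed value of theta
d_opt = best_auxiliary(theta_fixed, sample)
d_other = [dk + 0.1 for dk in d_opt]                      # still feasible, but not minimal

print(is_feasible(theta_fixed, d_opt, sample), round(lp_objective(theta_fixed, d_opt, 0.3), 3))
print(is_feasible(theta_fixed, d_other, sample), round(lp_objective(theta_fixed, d_other, 0.3), 3))
```

In practice the full problem, with $\theta$ free, would be handed to a linear programming solver.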

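Applying (8) to the sample in the example above, we obtain the
optimization problem

$$\begin{aligned} \min_{\theta,\, d}\quad & \theta + \frac{1}{7 \times 0.3}\sum_{k=1}^{7} d_k \\ \text{subject to}\quad & 0.98\% - \theta \leq d_1 \\ & -0.31\% - \theta \leq d_2 \\ & -1.91\% - \theta \leq d_3 \\ & 1.37\% - \theta \leq d_4 \\ & 0.38\% - \theta \leq d_5 \\ & 0.26\% - \theta \leq d_6 \\ & -0.19\% - \theta \leq d_7 \\ & d_k \geq 0,\quad k = 1, \ldots, 7 \\ & \theta \in \mathbb{R} \end{aligned}$$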


The solution to this optimization problem is the number 1.137%, which is
attained for $\theta = 0.38\%$. In fact, this value of $\theta$
coincides with the VaR at 30% tail probability and this is not by chance
but a feature of the problem, which is demonstrated in the appendix to
this entry. We verify that the solution of the problem is indeed the
number 1.137% by calculating the objective in equation (7) for
$\theta = 0.38\%$,

$$\widehat{\mathrm{AVaR}}_\epsilon(r) = 0.38\% + \frac{0.98\% - 0.38\% + 1.37\% - 0.38\%}{7 \times 0.3} = 1.137\%$$


Thus, we obtain the number calculated through equation (6).


COMPUTING PORTFOLIO
AVaR IN PRACTICE

The ideas behind the approaches of VaR estimation are applied to AVaR.
We assume that there are n assets with random returns described by the
random variables $X_1, \ldots, X_n$. Thus, the portfolio return is
represented by

$$r_p = w_1 X_1 + \ldots + w_n X_n$$

where $w_1, \ldots, w_n$ are the weights of the assets in the portfolio.


The Multivariate Normal
Assumption

If the asset returns are assumed to have a multivariate normal
distribution, then the portfolio return has a normal distribution with
variance $w'\Sigma w$, where w is the vector of weights and $\Sigma$ is
the covariance matrix between stock returns. The mean of the normal
distribution is

$$Er_p = \sum_{k=1}^n w_k EX_k$$

where E stands for the mathematical expectation. Thus, under this
assumption the AVaR of portfolio return at tail probability $\epsilon$
can be expressed in closed-form through equation (4),

$$\mathrm{AVaR}_\epsilon(r_p) = \frac{\sqrt{w'\Sigma w}}{\epsilon\sqrt{2\pi}} \exp\left(-\frac{(\mathrm{VaR}_\epsilon(Y))^2}{2}\right) - Er_p = C_\epsilon\sqrt{w'\Sigma w} - Er_p \tag{9}$$

where $C_\epsilon$ is a constant independent of the portfolio
composition and can be calculated in advance. In effect, due to the
limitations of the multivariate normal assumption, the portfolio AVaR
appears symmetric and is representable as the difference between the
properly scaled standard deviation of the random portfolio return and
portfolio expected return.
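
Under the multivariate normal assumption, the portfolio AVaR in equation (9) needs only the weights, the expected returns, and the covariance matrix. The Python sketch below is an illustration with made-up inputs; the function name `portfolio_avar_normal` and the two-asset numbers are hypothetical and not taken from the entry.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def portfolio_avar_normal(weights, means, cov, eps):
    """Portfolio AVaR under multivariate normality, following equation (9)."""
    n = len(weights)
    # Portfolio variance w' Sigma w and expected return sum of w_k * E X_k.
    variance = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    expected = sum(w * m for w, m in zip(weights, means))
    # C_eps depends on the tail probability only, not on the portfolio.
    var_std = -NormalDist().inv_cdf(eps)
    c_eps = exp(-var_std ** 2 / 2) / (eps * sqrt(2 * pi))
    return c_eps * sqrt(variance) - expected


# Hypothetical two-asset example (daily figures, illustrative only).
weights = [0.6, 0.4]
means = [0.0005, 0.0003]
cov = [[0.0004, 0.0001],
       [0.0001, 0.0002]]
print(portfolio_avar_normal(weights, means, cov, 0.01))
```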


The Historical Method 

The historical method is not related to any dis¬ 
tributional assumptions. We use the historically 
observed portfolio returns as a model for the fu¬ 
ture returns and apply formula (6) or (7). 

The historical method has several drawbacks. It is very inaccurate for
low tail probabilities, for example, 1% or 5%. Even with one year of
daily returns, which amounts to 250 observations, in order to estimate
the AVaR at 1% probability, we have to use the 3 smallest observations,
which is quite insufficient. What makes the estimation problem even
worse is that these observations are in the tail of the distribution;
that is, they are the smallest ones in the sample. The implication is
that when the sample changes, the estimated AVaR may change a lot
because the smallest observations tend to fluctuate a lot.


The Hybrid Method 

According to the hybrid method, different 
weights are assigned to the observations by 
which the more recent observations get a higher 
weight. The rationale is that the observations far 
back in the past have less impact on the portfo¬ 
lio risk at the present time. 

The hybrid method can be adapted for AVaR estimation. The weights
assigned to the observations are interpreted as probabilities and, thus,
the portfolio AVaR can be estimated from the resulting discrete
distribution according to the formula

$$\widehat{\mathrm{AVaR}}_\epsilon(r) = -\frac{1}{\epsilon}\left(\sum_{j=1}^{k_\epsilon} p_j r_{(j)} + \left(\epsilon - \sum_{j=1}^{k_\epsilon} p_j\right) r_{(k_\epsilon + 1)}\right) \tag{10}$$

where $r_{(1)} \leq r_{(2)} \leq \ldots \leq r_{(k_m)}$ denotes the sorted
sample of portfolio returns or payoffs and $p_1, p_2, \ldots, p_{k_m}$
stand for the probabilities of the sorted observations; that is, $p_1$
is the probability of $r_{(1)}$. The number $k_\epsilon$ in equation
(10) is an integer satisfying the inequalities

$$\sum_{j=1}^{k_\epsilon} p_j \leq \epsilon < \sum_{j=1}^{k_\epsilon + 1} p_j$$

Equation (10) follows directly from the defi¬ 
nition of AVaR 8 under the assumption that the 
underlying distribution is discrete without the 
additional simplification that the outcomes are 


equally probable. In the appendix to this en¬ 
try, we demonstrate the connection between 
equation (10) and the definition of AVaR in 
equation (1). 
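
A sketch of the weighted estimator in equation (10) follows. The entry does not specify how the weights are chosen, so the exponential decay scheme and its factor `lam` below are purely our assumption, as are the function names. With equal probabilities the estimator reduces to equation (6).

```python
def decay_probabilities(n, lam):
    """Probabilities for n observations listed oldest first.

    lam in (0, 1) is a hypothetical decay factor: the most recent
    observation gets the largest probability.
    """
    raw = [lam ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def avar_weighted(returns, probs, eps):
    """AVaR of a discrete distribution with unequal probabilities, equation (10)."""
    pairs = sorted(zip(returns, probs))   # sort outcomes from smallest to largest
    cumulative = 0.0
    weighted_sum = 0.0
    for r, p in pairs:
        if cumulative + p <= eps:
            weighted_sum += p * r         # outcome lies entirely inside the tail
            cumulative += p
        else:
            weighted_sum += (eps - cumulative) * r  # boundary outcome r_(k_eps + 1)
            break
    return -weighted_sum / eps


sample = [-0.98, 0.31, 1.91, -1.37, -0.38, -0.26, 0.19]  # percent, oldest first
print(round(avar_weighted(sample, [1 / 7] * 7, 0.3), 3))  # 1.137, as in equation (6)
print(avar_weighted(sample, decay_probabilities(7, 0.9), 0.3))
```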

The Monte Carlo Method 

The Monte Carlo method assumes and estimates a multivariate statistical
model for the asset return distribution. Then we sample from it, and we
calculate scenarios for portfolio return. On the basis of these
scenarios, we estimate portfolio AVaR using equation (6) in which
$r_1, \ldots, r_n$ stands for the vector of generated scenarios.

Similar to the case of VaR, an artifact of the 
Monte Carlo method is the variability of the risk 
estimate. Since the estimate of portfolio AVaR is 
obtained from a generated sample of scenarios, 
by regenerating the sample, we will obtain a 
slightly different value. 
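
The variability of the Monte Carlo estimate can be seen in a small simulation. The sketch below is our own and makes several assumptions: scenarios are drawn from the standard normal law, the number of repetitions is reduced to 30 to keep the run short, and the seed is arbitrary. It reports the mean and the standard deviation of the AVaR estimates at 1% tail probability for three sample sizes.

```python
import random
from math import ceil
from statistics import mean, stdev


def avar_sample(returns, eps):
    """Sample AVaR following equation (6)."""
    r = sorted(returns)
    n = len(r)
    k = min(ceil(n * eps), n)
    return -(sum(r[:k - 1]) / n + (eps - (k - 1) / n) * r[k - 1]) / eps


def mc_avar_estimates(n_scenarios, eps, repetitions, seed):
    """Repeat the scenario generation and return one AVaR estimate per repetition."""
    rng = random.Random(seed)
    return [
        avar_sample([rng.gauss(0.0, 1.0) for _ in range(n_scenarios)], eps)
        for _ in range(repetitions)
    ]


for n in (500, 5000, 20000):
    estimates = mc_avar_estimates(n, 0.01, 30, seed=1)
    print(n, round(mean(estimates), 3), round(stdev(estimates), 3))
```

The spread of the estimates should shrink as the number of scenarios grows, at the cost of a longer simulation.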

Suppose that the portfolio daily return distribution is the standard
normal law, $r_p \in N(0,1)$. By the closed-form expression in equation
(4), we calculate that the AVaR of the portfolio at 1% tail probability
equals

$$\mathrm{AVaR}_{0.01}(r_p) = \frac{1}{0.01\sqrt{2\pi}} \exp\left(-\frac{(\mathrm{VaR}_{0.01}(r_p))^2}{2}\right) = 2.665$$

In order to investigate how the 99% AVaR fluctuates about the
theoretical value, we generate samples of different sizes: 500, 1,000,
5,000, 10,000, 20,000, 50,000, and 100,000 scenarios. The 99% AVaR is
computed from these samples using equation (6) and the numbers are
stored. We repeat the experiment 100 times. In the end, we have 100 AVaR
numbers for each sample size. We expect that as the sample size
increases, the AVaR values will fluctuate less about the theoretical
value which is $\mathrm{AVaR}_{0.01}(X) = 2.665$, $X \in N(0,1)$.

Panel A of Table 1 contains the result of the experiment. From the 100
AVaR numbers, we calculate the 95% confidence interval reported in the
third column. The confidence intervals cover the theoretical value 2.665
and also we notice that the length of the confidence interval decreases
as the sample size increases. This effect is illustrated in Figure 4
with boxplot diagrams. A sample of 100,000 scenarios results in AVaR
numbers, which are tightly packed around the true value while a sample
of only 500 scenarios may give a very inaccurate estimate.

Table 1 Confidence Intervals Calculated for AVaR and VaR

Number of Scenarios    AVaR at 99%    95% Confidence Interval
500                    2.646          [2.2060, 2.9663]
1,000                  2.771          [2.3810, 2.9644]
5,000                  2.737          [2.5266, 2.7868]
10,000                 2.740          [2.5698, 2.7651]
20,000                 2.659          [2.5955, 2.7365]
50,000                 2.678          [2.6208, 2.7116]
100,000                2.669          [2.6365, 2.6872]

Panel A: The 99% AVaR of the standard normal distribution computed from
a sample of scenarios. The 95% confidence interval is calculated from
100 repetitions of the experiment. The true value is
$\mathrm{AVaR}_{0.01}(X) = 2.665$.

Number of Scenarios    99% VaR        95% Confidence Interval
500                    2.067          [1.7515, 2.3825]
1,000                  2.406          [2.1455, 2.6665]
5,000                  2.286          [2.1875, 2.3845]
10,000                 2.297          [2.2261, 2.3682]
20,000                 2.282          [2.2305, 2.3335]
50,000                 2.342          [2.3085, 2.3755]
100,000                2.314          [2.2925, 2.3355]

Panel B: The 99% VaR of the standard normal distribution computed from a
sample of scenarios. The 95% confidence interval is calculated from 100
repetitions of the experiment. The true value is
$\mathrm{VaR}_{0.01}(X) = 2.326$.

By comparing Panel A of Table 1 to Panel B of the table, which shows the
results for VaR, we notice that the 95% confidence intervals for AVaR
are longer than the corresponding confidence intervals for VaR. This
result is not surprising. Given that both quantities are at the same
tail probability of 1%, the AVaR has larger variability than the VaR for
a fixed number of scenarios because the AVaR is the average of terms
fluctuating more than the 1% VaR. This effect is more pronounced the
more heavy-tailed the distribution is.

Figure 4 Boxplot Diagrams of the Fluctuation of the AVaR at 1% Tail
Probability of the Standard Normal Distribution Based on Scenarios
Note: The horizontal axis shows the number of scenarios (500; 1,000;
5,000; 10,000; 20,000; 50,000; 100,000) and the boxplots are computed
from 100 independent samples.


BACK-TESTING OF AVaR 

Suppose that we have selected a method for 
calculating the daily AVaR of a portfolio. A rea¬ 
sonable question is how we can verify if the es¬ 
timates of daily AVaR are realistic. This is done 
by back-testing. In the case of VaR, back-testing
consists of computing the portfolio VaR for each 
day back in time using the information avail¬ 
able up to that day only. In this way, we have 
the VaR numbers back in time as if we had used 
exactly the same methodology in the past. On 
the basis of the VaR numbers and the realized 
portfolio returns, we can use statistical meth¬ 
ods to assess whether the forecasted loss at the 
VaR tail probability is consistent with the ob¬ 
served losses. If there are too many observed 
losses larger than the forecasted VaR, then the 
model is too optimistic. Conversely, if there are 
too few losses larger than the forecasted VaR, 
then the model is too pessimistic.

Note that in the case of VaR back-testing, we
are simply counting the cases in which there 
is an exceedance; that is, when the size of the 
observed loss is larger than the predicted VaR. 
The magnitude of the exceedance is immaterial 
for the statistical test. 
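
The counting step can be sketched as follows. The entry does not name a particular statistical test, so the binomial tail probability below, which treats exceedances on different days as independent with probability $\epsilon$ each, is our own assumption for illustration; the function names are hypothetical.

```python
from math import comb


def count_exceedances(returns, var_forecasts):
    """Number of days on which the observed loss -r is larger than the forecasted VaR."""
    return sum(1 for r, v in zip(returns, var_forecasts) if -r > v)


def binomial_tail_prob(n, k, eps):
    """P(K >= k) for K ~ Binomial(n, eps), assuming independent exceedances."""
    return sum(comb(n, j) * eps ** j * (1 - eps) ** (n - j) for j in range(k, n + 1))


# Hypothetical data: four daily returns and the VaR forecast made for each day.
returns = [-0.03, 0.01, -0.01, -0.05]
var_forecasts = [0.02, 0.02, 0.02, 0.04]
print(count_exceedances(returns, var_forecasts))  # 2

# Probability of seeing 7 or more exceedances in 250 days if the 1% VaR were correct.
print(binomial_tail_prob(250, 7, 0.01))
```

A small tail probability suggests that the model is too optimistic; only the number of exceedances enters the calculation, not their size.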

Unlike VaR, back-testing of AVaR is not straightforward and is a much
more challenging task. By definition, the AVaR at tail probability
$\epsilon$ is the average of VaRs larger than the VaR at tail
probability $\epsilon$. Thus, the most direct approach to test AVaR
would be to perform VaR back-tests at all tail probabilities smaller
than $\epsilon$. If all these VaRs are correctly modeled, then so is the
corresponding AVaR.

One general issue with this approach is that it is impossible to perform
in practice. Suppose that we consider the AVaR at tail probability of
1%, for example. Back-testing VaRs deeper in the tail of the
distribution can be infeasible because the back-testing time window is
too short. The lower the tail probability, the larger the time window we
need in order for the VaR test to be conclusive. Another general issue
is that this approach is too demanding. Even if the VaR back-testing
fails at some tail probability $\epsilon_1$ below $\epsilon$, this does
not necessarily mean that the AVaR is incorrectly modeled because the
test failure may be due to purely statistical reasons and not to
incorrect modeling.

These arguments illustrate why AVaR back-testing is a difficult problem:
we need the information about the entire tail of the return distribution
describing the losses larger than the VaR at tail probability $\epsilon$
and there may be too few observations from the tail upon which to base
the analysis. For example, in one business year, there are typically 250
trading days. Therefore, a one-year back-test results in 250 daily
portfolio returns, which means that if $\epsilon = 1\%$, then there are
only 2 observations available from the losses larger than the VaR at 1%
tail probability.

Figure 5 Examples of Risk-Aversion Functions
Note: The right plot shows the risk-aversion function yielding the AVaR
at tail probability $\epsilon$.

As a result, in order to be able to back-test AVaR, we can assume a
certain "structure" of the tail of the return distribution that would
compensate for the lack of observations. There are two general
approaches:

1. Use the tails of the Lévy stable distributions as a proxy for the
tail of the loss distribution and take advantage of the practical
semi-analytic formula for the AVaR given in the appendix to this entry
to construct a statistical test.

2. Make the weaker assumption that the loss distribution belongs to the
domain of attraction of a max-stable distribution. Thus, the behavior of
the large losses can be approximately described by the limit max-stable
distribution and a statistical test can be based on it.

The rationale of the first approach is that, generally, the Lévy stable
distribution provides a good fit to the stock returns data and, thus,
the stable tail may turn out to be a reasonable approximation. Moreover,
from the generalized central limit theorem we know that stable
distributions have domains of attraction, which makes them an appealing
candidate for an approximate model.

The second approach is based on a weaker assumption. The family of
max-stable distributions arises as the limit distribution of properly
scaled and centered maxima of IID random variables. If the random
variable describes portfolio losses, then the limit max-stable
distribution can be used as a model for the large losses (i.e., the ones
in the tail). Unfortunately, as a result of the weaker assumption,
estimators of poor quality have to be used to estimate the parameters of
the limit max-stable distribution, such as the Hill estimator, for
example. This represents the basic trade-off in this approach.

TECHNICAL APPENDIX 

In this appendix, we start with a more gen¬ 
eral view that better describes the conditional 
loss distribution in terms of certain character¬ 
istics in which AVaR appears as a special case. 
We continue with the notion of higher-order 
AVaR, generating a family of coherent risk mea¬ 
sures. Next, we provide an intuitive geomet¬ 
ric interpretation of the minimization formula 
for the AVaR calculation. We also provide a 
semi-analytic expression for the AVaR of sta¬ 
ble distributions and compare the expected tail 
loss measure to AVaR. Finally, we comment on 
the proper choice of a risk-aversion function in 
spectral risk measures, which does not result in 
an infinite risk measure. 

Characteristics of Conditional Loss 
Distributions 

In the entry, we defined AVaR as a risk mea¬ 
sure and showed how it can be calculated in 
practice. While it is an intuitive and easy to use 
coherent risk measure, AVaR represents the av¬ 
erage of the losses larger than the VaR at tail 
probability $\epsilon$, which is only one characteristic
of the distribution of extreme losses. We re¬ 
marked that if the distribution function is con¬ 
tinuous, then AVaR coincides with ETL, which 
is the mathematical expectation of the condi¬ 


tional loss distribution. Besides the mathemati¬ 
cal expectation, there are other important char¬ 
acteristics of the conditional loss distribution. 
For example, AVaR does not provide any in¬ 
formation about how dispersed the conditional 
losses are around the AVaR value. In this sec¬ 
tion, we state a couple of families of useful char¬ 
acteristics in which AVaR appears as one exam¬ 
ple. 

Consider the following tail moment of order n at tail probability
$\epsilon$,

$$m_\epsilon^n(X) = \frac{1}{\epsilon}\int_0^\epsilon \left(F_X^{-1}(t)\right)^n dt \tag{A.1}$$

where n = 1, 2, ... and $F_X^{-1}(t)$ is the inverse c.d.f. of the
random variable X. If the distribution function of X is continuous, then
the tail moment of order n can be represented through the following
conditional expectation

$$m_\epsilon^n(X) = E\big(X^n \mid X < -\mathrm{VaR}_\epsilon(X)\big) \tag{A.2}$$

where n = 1, 2, ... In the general case, if the c.d.f. has a jump at
$\mathrm{VaR}_\epsilon(X)$, a link exists between the conditional
expectation and equation (A.1), which is similar to formula (A.12) later
in this appendix for AVaR. In fact, AVaR appears as the negative of the
tail moment of order one,
$\mathrm{AVaR}_\epsilon(X) = -m_\epsilon^1(X)$.

The higher-order tail moments provide addi¬ 
tional information about the conditional distri¬ 
bution of the extreme losses. We can make a 
parallel with the way the moments of a random 
variable are used to describe certain properties 
of it. In our case, it is the conditional distribution 
that we are interested in. 

In addition to the moments $m_\epsilon^n(X)$, we introduce the central
tail moments of order n at tail probability $\epsilon$,

$$M_\epsilon^n(X) = \frac{1}{\epsilon}\int_0^\epsilon \left(F_X^{-1}(t) - m_\epsilon^1(X)\right)^n dt \tag{A.3}$$

where $m_\epsilon^1(X)$ is the tail moment of order one. If the
distribution function is continuous, then the central moments can be
expressed in terms of the conditional expectation

$$M_\epsilon^n(X) = E\big((X - m_\epsilon^1(X))^n \mid X < -\mathrm{VaR}_\epsilon(X)\big)$$

The tail variance of the conditional distribution appears as
$M_\epsilon^2(X)$ and the tail standard deviation equals

$$\left(M_\epsilon^2(X)\right)^{1/2} = \left(\frac{1}{\epsilon}\int_0^\epsilon \left(F_X^{-1}(t) - m_\epsilon^1(X)\right)^2 dt\right)^{1/2}$$

There is a formula expressing the tail variance in terms of the tail
moments introduced in (A.2),

$$M_\epsilon^2(X) = m_\epsilon^2(X) - \left(m_\epsilon^1(X)\right)^2 = m_\epsilon^2(X) - \left(\mathrm{AVaR}_\epsilon(X)\right)^2$$

This formula is similar to the representation of variance in terms of
the first two moments,

$$\sigma^2 = EX^2 - (EX)^2$$

The tail standard deviation can be used to 
describe the dispersion of conditional losses 
around AVaR as it satisfies the general prop¬ 
erties of dispersion measures. It can be viewed 
as complementary to AVaR in the sense that if 
there are two portfolios with equal AVaRs of 
their return distributions but different tail stan¬ 
dard deviations, the portfolio with the smaller 
standard deviation is preferable. 
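
For a sample, the tail moments can be approximated by applying (A.1) to the empirical quantile function, in the same way as equation (6) is obtained for AVaR. This sample version is our own assumption rather than a formula stated in the entry, and the function names are hypothetical. The tail standard deviation then follows from the tail variance formula above.

```python
from math import ceil, sqrt


def tail_moment(returns, eps, order):
    """Tail moment of the given order: (A.1) applied to the empirical quantile function."""
    r = sorted(returns)
    n = len(r)
    k = min(ceil(n * eps), n)
    head = sum(x ** order for x in r[:k - 1]) / n
    boundary = (eps - (k - 1) / n) * r[k - 1] ** order
    return (head + boundary) / eps


def tail_std(returns, eps):
    """Tail standard deviation from the tail variance m2 - (m1)^2."""
    m1 = tail_moment(returns, eps, 1)
    m2 = tail_moment(returns, eps, 2)
    return sqrt(max(m2 - m1 ** 2, 0.0))  # guard against tiny negative rounding errors


sample = [-1.37, -0.98, -0.38, -0.26, 0.19, 0.31, 1.91]  # returns in percent
print(-tail_moment(sample, 0.3, 1))  # the sample AVaR, about 1.137
print(tail_std(sample, 0.3))
```

Two portfolios with the same sample AVaR can then be compared by their tail standard deviations.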

Another central tail moment that can be interpreted is
$M_\epsilon^3(X)$. After proper normalization, it can be employed to
measure the skewness of the conditional loss distribution. In fact, if
the tail probability is sufficiently small, the tail skewness will be
quite significant. In the same fashion, by normalizing the central tail
moment of order 4, we obtain a measure of kurtosis of the conditional
loss distribution.

In a similar way, we introduce the absolute central tail moments of
order n at tail probability $\epsilon$,

$$\mu_\epsilon^n(X) = \frac{1}{\epsilon}\int_0^\epsilon \left|F_X^{-1}(t) - m_\epsilon^1(X)\right|^n dt \tag{A.4}$$

The tail moments $\mu_\epsilon^n(X)$ raised to the power of 1/n,
$\left(\mu_\epsilon^n(X)\right)^{1/n}$, can be applied as measures of
dispersion of the conditional loss distribution if the distribution is
such that they are finite.

In the entry, we remarked that the tail of the random variable can be so
heavy that AVaR becomes infinite. Even if it is theoretically finite, it
can be hard to estimate because the heavy tail will result in the AVaR
estimator having a large variability. Thus, under certain conditions it
may turn out to be practical to employ a robust estimator instead. The
median tail loss (MTL), defined as the median of the conditional loss
distribution, is a robust alternative to AVaR. It has the advantage of
always being finite no matter the tail behavior of the random variable.
Formally, it is defined as

$$\mathrm{MTL}_\epsilon(X) = -F_X^{-1}\big(1/2 \mid X < -\mathrm{VaR}_\epsilon(X)\big) \tag{A.5}$$

where $F_X^{-1}(p \mid X < -\mathrm{VaR}_\epsilon(X))$ stands for the
inverse distribution function of the c.d.f. of the conditional loss
distribution

$$F_X\big(x \mid X < -\mathrm{VaR}_\epsilon(X)\big) = P\big(X \leq x \mid X < -\mathrm{VaR}_\epsilon(X)\big) = \begin{cases} P(X \leq x)/\epsilon, & x < -\mathrm{VaR}_\epsilon(X) \\ 1, & x \geq -\mathrm{VaR}_\epsilon(X) \end{cases}$$

In effect, MTL, as well as any other quantile of the conditional loss
distribution, can be directly calculated as a quantile of the
distribution of X,

$$\mathrm{MTL}_\epsilon(X) = -F_X^{-1}(\epsilon/2) = \mathrm{VaR}_{\epsilon/2}(X) \tag{A.6}$$

where $F_X^{-1}(p)$ is the inverse c.d.f. of X and $\epsilon$ is
the tail probability of the corresponding VaR in
equation (A.5). Thus, MTL shares the properties 
of VaR. Equation (A.6) shows that MTL is not a 
coherent risk measure even though it is a robust 
alternative to AVaR, which is a coherent risk 
measure. 
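
Equation (A.6) makes MTL as easy to compute as VaR. As an illustration, the sketch below evaluates VaR, MTL, and AVaR for a normal distribution; the choice of the normal law and the function names are our own assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def var_normal(eps, mean=0.0, std=1.0):
    """VaR at tail probability eps: the negative of the eps-quantile."""
    return -NormalDist(mean, std).inv_cdf(eps)


def mtl_normal(eps, mean=0.0, std=1.0):
    """Median tail loss via equation (A.6): the VaR at tail probability eps / 2."""
    return var_normal(eps / 2, mean, std)


def avar_normal(eps, mean=0.0, std=1.0):
    """AVaR of the normal distribution, equation (4) of the entry."""
    z = -NormalDist().inv_cdf(eps)
    return std * exp(-z ** 2 / 2) / (eps * sqrt(2 * pi)) - mean


eps = 0.01
print(round(var_normal(eps), 3), round(mtl_normal(eps), 3), round(avar_normal(eps), 3))
```

For the standard normal law at 1% tail probability the three numbers are ordered VaR < MTL < AVaR.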

In the universe of the three families of mo¬ 
ments that we introduced, AVaR is one special 
case providing only limited information. It may 
be the only coherent risk measure among them 
but the other moments can be employed in ad¬ 
dition to AVaR in order to gain more insight into 
the conditional loss distribution. Furthermore, 
it could appear that other reasonable risk mea¬ 
sures can be based on some of the moments. 
Thus, we believe that they all should be considered in financial
applications.

Higher-Order AVaR

By definition, AVaR is the average of VaRs larger than the VaR at tail
probability $\epsilon$. In the same fashion, we can pose the question of
what happens if we average all AVaRs larger than the AVaR at tail
probability $\epsilon$. In fact, this quantity is an average of coherent
risk measures and, therefore, is a coherent risk measure itself since it
satisfies all defining properties of coherent risk measures. We call it
AVaR of order one and denote it by $\mathrm{AVaR}_\epsilon^{(1)}(X)$
because it is a derived quantity from AVaR. In this section, we consider
similar derived quantities from AVaR which we call higher-order AVaRs.

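Formally, the AVaR of order one is represented in the following way

$$\mathrm{AVaR}_\epsilon^{(1)}(X) = \frac{1}{\epsilon}\int_0^\epsilon \mathrm{AVaR}_p(X)\,dp$$

where $\mathrm{AVaR}_p(X)$ is the AVaR at tail probability p. Replacing
AVaR by the definition given in equation (1), we obtain

$$\mathrm{AVaR}_\epsilon^{(1)}(X) = -\frac{1}{\epsilon}\int_0^\epsilon\left(\int_0^1 F_X^{-1}(y)\,g_p(y)\,dy\right)dp = -\frac{1}{\epsilon}\int_0^1 F_X^{-1}(y)\left(\int_0^\epsilon g_p(y)\,dp\right)dy$$

where

$$g_p(y) = \begin{cases} 1/p, & y \in [0, p] \\ 0, & y > p \end{cases}$$

and after certain algebraic manipulations, we get the expression

$$\mathrm{AVaR}_\epsilon^{(1)}(X) = -\frac{1}{\epsilon}\int_0^\epsilon F_X^{-1}(y)\log\frac{\epsilon}{y}\,dy = \int_0^1 \mathrm{VaR}_y(X)\,\phi_\epsilon(y)\,dy \tag{A.7}$$

In effect, the AVaR of order one can be expressed as a weighted average
of VaRs larger than the VaR at tail probability $\epsilon$ with a
weighting function $\phi_\epsilon(y)$ equal to

$$\phi_\epsilon(y) = \begin{cases} \dfrac{1}{\epsilon}\log\dfrac{\epsilon}{y}, & 0 \leq y \leq \epsilon \\ 0, & \epsilon < y \leq 1 \end{cases}$$

The AVaR of order one can be viewed as a spectral risk measure with
$\phi_\epsilon(y)$ being the risk aversion function.

Similarly, we define the higher-order AVaR through the recursive
equation

$$\mathrm{AVaR}_\epsilon^{(n)}(X) = \frac{1}{\epsilon}\int_0^\epsilon \mathrm{AVaR}_p^{(n-1)}(X)\,dp \tag{A.8}$$

where $\mathrm{AVaR}_p^{(0)}(X) = \mathrm{AVaR}_p(X)$ and n = 1, 2, ...
Thus, the AVaR of order two equals the average of AVaRs of order one,
which are larger than the AVaR of order one at tail probability
$\epsilon$. The AVaR of order n appears as an average of AVaRs of order
n - 1.

The quantity $\mathrm{AVaR}_\epsilon^{(n)}(X)$ is a coherent risk
measure because it is an average of coherent risk measures. This is a
consequence of the recursive definition in (A.8). It is possible to show
that AVaR of order n admits the representation

$$\mathrm{AVaR}_\epsilon^{(n)}(X) = \frac{1}{\epsilon}\int_0^\epsilon \mathrm{VaR}_y(X)\,\frac{1}{n!}\left(\log\frac{\epsilon}{y}\right)^n dy \tag{A.9}$$

and $\mathrm{AVaR}_\epsilon^{(n)}(X)$ can be viewed as a spectral risk
measure with a risk aversion function equal to

$$\phi_\epsilon^{(n)}(y) = \begin{cases} \dfrac{1}{\epsilon\, n!}\left(\log\dfrac{\epsilon}{y}\right)^n, & 0 \leq y \leq \epsilon \\ 0, & \epsilon < y \leq 1 \end{cases}$$

As a simple consequence of the definition, the sequence of higher-order
AVaRs is monotonic,

$$\mathrm{AVaR}_\epsilon(X) \leq \mathrm{AVaR}_\epsilon^{(1)}(X) \leq \ldots \leq \mathrm{AVaR}_\epsilon^{(n)}(X) \leq \ldots$$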


In the entry, we remarked that if the random variable X has a finite
mean, $E|X| < \infty$, then AVaR is also finite. This is not true for
spectral risk measures and the higher-order AVaR in particular. In line
with the general theory developed later in this appendix,
$\mathrm{AVaR}_\epsilon^{(n)}(X)$ is finite if all moments of X exist.
For example, if the random variable X has an exponential tail, then
$\mathrm{AVaR}_\epsilon^{(n)}(X) < \infty$ for any $n < \infty$.
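
The representation (A.9) lends itself to numerical evaluation. In the sketch below, which is our own illustration for the standard normal law, we substitute $y = \epsilon e^{-u}$ in (A.9), which turns the integral into $\int_0^\infty \mathrm{VaR}_{\epsilon e^{-u}}(X)\, \frac{u^n}{n!} e^{-u}\, du$, and we approximate it with the midpoint rule on a truncated range. The truncation point, the number of steps, and the function name are assumptions made for the example.

```python
from math import exp, factorial
from statistics import NormalDist


def higher_order_avar_normal(eps, order, upper=40.0, steps=20000):
    """AVaR of the given order for the standard normal law, based on (A.9).

    After the substitution y = eps * exp(-u), the integrand becomes
    VaR_{eps * exp(-u)} * u**order / order! * exp(-u) on u in (0, infinity).
    Order 0 corresponds to the ordinary AVaR.
    """
    nd = NormalDist()
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h                     # midpoint of the i-th subinterval
        var_y = -nd.inv_cdf(eps * exp(-u))    # VaR at tail probability eps * exp(-u)
        total += var_y * u ** order * exp(-u)
    return total * h / factorial(order)


for order in (0, 1, 2):
    print(order, round(higher_order_avar_normal(0.01, order), 3))
```

The printed values should increase with the order, in line with the monotonicity of the sequence of higher-order AVaRs.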


The Minimization Formula for AVaR 

In this section, we provide a geometric interpretation of the
minimization formula (2) for AVaR. We restate equation (2) in the
following equivalent form,

$$\mathrm{AVaR}_\epsilon(X) = \frac{1}{\epsilon}\min_{\theta\in\mathbb{R}}\left(\epsilon\theta + E(-X - \theta)_+\right) \tag{A.10}$$

where $(x)_+ = \max(x, 0)$. Note the similarity between equation (A.10)
and the definition of AVaR in equation (1). Instead of the integral of
the quantile function in the definition of AVaR, a minimization formula
appears in (A.10). We interpreted the integral of the inverse c.d.f. as
the shaded area in Figure 2. Similarly, we will find the area
corresponding to the objective function in the minimization formula and
we will demonstrate that as $\theta$ changes, there is a minimal area
that coincides with the shaded area in Figure 2. Moreover, the minimal
area is attained for $\theta = \mathrm{VaR}_\epsilon(X)$ when the c.d.f.
of X is continuous at $\mathrm{VaR}_\epsilon(X)$. In fact, all
illustrations in this section are based on the assumption that X has a
continuous distribution function.

Consider first the expectation in equation (A.10). Assuming that X has a
continuous c.d.f., we obtain an expression for the expectation involving
the inverse c.d.f.,

$$E(-X - \theta)_+ = \int_{\mathbb{R}} \max(-x - \theta, 0)\,dF_X(x) = \int_0^1 \max\left(-F_X^{-1}(t) - \theta, 0\right)dt = -\int_0^1 \min\left(F_X^{-1}(t) + \theta, 0\right)dt$$

This representation implies that the expectation $E(-X - \theta)_+$
equals the area closed between the graph of the inverse c.d.f. and a
line parallel to the horizontal axis passing through the point
$(0, -\theta)$. This is the shaded area on the right plot in Figure A.1.
The same area can be represented in terms of the c.d.f. This is done on
the left plot in Figure A.1.

Figure A.1 Note: The shaded area is equal to the expectation
$E(-X - \theta)_+$ in which X has a continuous distribution function.

Let us get back to equation (A.10). The tail probability $\epsilon$ is
fixed. The product $\epsilon \times \theta$ equals the area of a
rectangle with sides equal to $\epsilon$ and $\theta$. This area is
added to $E(-X - \theta)_+$. Figure A.2 shows the two areas together.
The shaded areas on the top and the bottom plots equal
$\epsilon \times \mathrm{AVaR}_\epsilon(X)$. The top plot shows the case
in which $-\theta < -\mathrm{VaR}_\epsilon(X)$. Comparing the plot to
Figure A.1, we find out that adding the marked area to the shaded area
we obtain the total area corresponding to the objective in the
minimization formula, $\epsilon\theta + E(-X - \theta)_+$. If
$-\theta > -\mathrm{VaR}_\epsilon(X)$, then we obtain a similar case
shown on the bottom plot. Again, adding the marked area to the shaded
area we obtain the total area computed by the objective in the
minimization formula. By varying $\theta$, the total area changes but it
always remains larger than the shaded area unless
$\theta = \mathrm{VaR}_\epsilon(X)$.

Thus, when $\theta = \mathrm{VaR}_\epsilon(X)$ the minimum area is
attained, which equals exactly
$\epsilon \times \mathrm{AVaR}_\epsilon(X)$. According to equation
(A.10), we have to divide the minimal area by $\epsilon$ in order to
obtain the AVaR. As a result, we have demonstrated that the minimization
formula in equation (2) calculates the AVaR.

Figure A.2 Note: The marked area is in addition to the shaded one. The
marked area is equal to zero if $\theta = \mathrm{VaR}_\epsilon(X)$.


AVaR for Stable Distributions

Working with the class of stable distributions in practice is difficult because there are no closed-form expressions for their densities and distribution functions. Thus, practical work relies on numerical methods.

Stoyanov et al. (2006) give an account of the approaches to estimating AVaR of stable distributions. It turns out that there is a formula that is not exactly a closed-form expression, such as the ones for the normal and Student's t AVaR stated in the entry, but is suitable for numerical work. It involves numerical integration, but the integrand is nicely behaved and the integration range is a bounded interval. Numerical integration can be performed by standard toolboxes in many software packages, such as MATLAB. Moreover, there are libraries freely available on the Internet. Therefore, numerical integration itself is not a severe restriction for applying a formula in practice. Since the formula involves numerical integration, we call it a semi-analytic expression.

Suppose that the random variable $X$ has a stable distribution with tail exponent $\alpha$, skewness parameter $\beta$, scale parameter $\sigma$, and location parameter $\mu$, $X \in S_\alpha(\sigma, \beta, \mu)$. If $\alpha \le 1$, then $\mathrm{AVaR}_\epsilon(X) = \infty$. The reason is that stable distributions with $\alpha \le 1$ have infinite mathematical expectation and the AVaR is unbounded.

If $\alpha > 1$ and $\mathrm{VaR}_\epsilon(X) \ne 0$, then the AVaR can be represented as

$$\mathrm{AVaR}_\epsilon(X) = \sigma A_{\epsilon,\alpha,\beta} - \mu$$

where the term $A_{\epsilon,\alpha,\beta}$ does not depend on the scale and the location parameters. In fact, this representation is a consequence of the positive homogeneity and the invariance property of AVaR. Concerning the term $A_{\epsilon,\alpha,\beta}$,

$$A_{\epsilon,\alpha,\beta} = \frac{\alpha}{1-\alpha}\,\frac{|\mathrm{VaR}_\epsilon(X)|}{\pi\epsilon}\int_{-\bar\theta_0}^{\pi/2} g(\theta)\exp\left(-|\mathrm{VaR}_\epsilon(X)|^{\frac{\alpha}{\alpha-1}}\,v(\theta)\right)d\theta$$

where

$$g(\theta) = \frac{\sin(\alpha(\bar\theta_0+\theta) - 2\theta)}{\sin\alpha(\bar\theta_0+\theta)} - \frac{\alpha\cos^2\theta}{\sin^2\alpha(\bar\theta_0+\theta)},$$

$$v(\theta) = (\cos\alpha\bar\theta_0)^{\frac{1}{\alpha-1}}\left(\frac{\cos\theta}{\sin\alpha(\bar\theta_0+\theta)}\right)^{\frac{\alpha}{\alpha-1}}\frac{\cos(\alpha\bar\theta_0 + (\alpha-1)\theta)}{\cos\theta},$$

in which $\bar\theta_0 = \frac{1}{\alpha}\arctan\left(\bar\beta\tan\frac{\pi\alpha}{2}\right)$, $\bar\beta = -\mathrm{sign}(\mathrm{VaR}_\epsilon(X))\,\beta$, and $\mathrm{VaR}_\epsilon(X)$ is the VaR of the stable distribution at tail probability $\epsilon$.
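If $\mathrm{VaR}_\epsilon(X) = 0$, then the AVaR admits a very simple expression,

$$\mathrm{AVaR}_\epsilon(X) = \frac{2\,\Gamma\left(\frac{\alpha-1}{\alpha}\right)}{\pi - 2\theta_0}\,\frac{\cos\theta_0}{(\cos\alpha\theta_0)^{1/\alpha}}$$

in which $\Gamma(x)$ is the gamma function and $\theta_0 = \frac{1}{\alpha}\arctan\left(\beta\tan\frac{\pi\alpha}{2}\right)$.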
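The simple expression for the case of zero VaR can be evaluated directly with a gamma-function routine. The following is a minimal sketch, assuming the standardized case (scale 1, location 0) and $1 < \alpha \le 2$; the function name `stable_avar_at_zero_var` is hypothetical and not part of the original text.

```python
import math


def stable_avar_at_zero_var(alpha: float, beta: float) -> float:
    """AVaR of a standardized stable law at the tail probability where VaR = 0.

    Assumes scale 1 and location 0; alpha is the tail exponent and beta the
    skewness parameter, as in the closed-form expression above.
    """
    if not 1.0 < alpha <= 2.0:
        raise ValueError("the expression requires 1 < alpha <= 2")
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")
    # theta_0 = (1 / alpha) * arctan(beta * tan(pi * alpha / 2))
    theta0 = math.atan(beta * math.tan(math.pi * alpha / 2.0)) / alpha
    numerator = 2.0 * math.gamma((alpha - 1.0) / alpha) * math.cos(theta0)
    denominator = (math.pi - 2.0 * theta0) * math.cos(alpha * theta0) ** (1.0 / alpha)
    return numerator / denominator
```

As a sanity check under the same assumptions, for $\alpha = 2$ and $\beta = 0$ the function returns $2/\sqrt{\pi} \approx 1.128$, which agrees with the AVaR at tail probability 0.5 of a normal distribution with variance 2.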


ETL vs. AVaR 

The expected tail loss and the average value- 
at-risk are two related concepts. In the entry, 
we remarked that ETL and AVaR coincide if the 
portfolio return distribution is continuous at the 
corresponding VaR level. However, if there is a 
discontinuity, or a point mass, then the two no¬ 
tions diverge. Still, the AVaR can be expressed 
through the ETL and the VaR at the same tail 
probability. In this section, we illustrate this re¬ 
lationship and show why the AVaR is more ap¬ 
pealing. Moreover, it will throw light on why 
equation (6) should be used when considering 
a sample of observations. 

The ETL at tail probability $\epsilon$ is defined as the average loss provided that the loss exceeds the VaR at tail probability $\epsilon$,

$$\mathrm{ETL}_\epsilon(X) = -E(X \mid X < -\mathrm{VaR}_\epsilon(X)) \tag{A.11}$$

As a consequence of the definition, the ETL can be expressed in terms of the c.d.f. and the inverse c.d.f. Suppose, additionally, that the c.d.f. of $X$ has a jump at $-\mathrm{VaR}_\epsilon(X)$. In this case, the loss $\mathrm{VaR}_\epsilon(X)$ occurs with probability equal to the size of the jump and, because of the strict inequality in (A.11), it will not be included in the average.

Figure A.3 shows the graphs of the c.d.f. and the inverse c.d.f. of a random variable with a point mass at $-\mathrm{VaR}_\epsilon(X)$. If $\epsilon$ splits the jump of the c.d.f. as on the left plot in Figure A.3, then the ETL at tail probability $\epsilon$ equals

$$\mathrm{ETL}_\epsilon(X) = -E(X \mid X < -\mathrm{VaR}_\epsilon(X)) = -E(X \mid X < -\mathrm{VaR}_{\epsilon_0}(X)) = \mathrm{ETL}_{\epsilon_0}(X)$$

Figure A.3 The C.D.F. and the Inverse C.D.F. of a Random Variable X with a Point Mass at $-\mathrm{VaR}_\epsilon(X)$
Note: The tail probability $\epsilon$ splits the jump of the c.d.f.

In terms of the inverse c.d.f., the quantity $\mathrm{ETL}_{\epsilon_0}(X)$ can be represented as

$$\mathrm{ETL}_{\epsilon_0}(X) = -\frac{1}{\epsilon_0}\int_0^{\epsilon_0} F_X^{-1}(t)\,dt$$

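The relationship between AVaR and ETL follows directly from the definition of AVaR (see note 9). Suppose that the c.d.f. of the random variable $X$ is as on the left plot in Figure A.3. Then,

$$\begin{aligned}
\mathrm{AVaR}_\epsilon(X) &= -\frac{1}{\epsilon}\int_0^{\epsilon} F_X^{-1}(t)\,dt \\
&= -\frac{1}{\epsilon}\left(\int_0^{\epsilon_0} F_X^{-1}(t)\,dt + \int_{\epsilon_0}^{\epsilon} F_X^{-1}(t)\,dt\right) \\
&= -\frac{1}{\epsilon}\int_0^{\epsilon_0} F_X^{-1}(t)\,dt + \frac{\epsilon-\epsilon_0}{\epsilon}\,\mathrm{VaR}_\epsilon(X)
\end{aligned}$$

where the last equality holds because the inverse c.d.f. is flat in the interval $[\epsilon_0, \epsilon]$ and the integral is merely the area of the rectangle shown on the right plot in Figure A.3. The integral in the first summand can be related to the ETL at tail probability $\epsilon$ and, finally, we arrive at the expression

$$\mathrm{AVaR}_\epsilon(X) = \frac{\epsilon_0}{\epsilon}\,\mathrm{ETL}_\epsilon(X) + \frac{\epsilon-\epsilon_0}{\epsilon}\,\mathrm{VaR}_\epsilon(X) \tag{A.12}$$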
Equation (A.12) shows that $\mathrm{AVaR}_\epsilon(X)$ can be represented as a weighted average of the ETL and the VaR at the same tail probability, as the coefficients in front of the two summands are positive and sum to one. In the special case in which there is no jump, or if $\epsilon = \epsilon_1$, then AVaR equals ETL.

Why is equation (A.12) important if in all statistical models we assume that the random variables describing return or payoff distributions have densities? Under this assumption, not only are the corresponding c.d.f.s continuous but they are also smooth. Equation (A.12) is important because if the estimate of AVaR is based on the Monte Carlo method, then we use a sample of scenarios that approximates the nicely behaved hypothesized distribution. Even though we are approximating a smooth distribution function, the sample c.d.f. of the scenarios is completely discrete, with jumps at the scenarios, each of size $1/n$, where $n$ stands for the number of scenarios.

In fact, equation (6) given in the entry is actually equation (A.12) restated for a discrete random variable. The outcomes are the available scenarios, which are equally probable. Consider a sample of observations or scenarios $r_1, \ldots, r_n$ and denote by $r_{(1)} \le r_{(2)} \le \cdots \le r_{(n)}$ the ordered sample. The natural estimator of the ETL at tail probability $\epsilon$ is

$$\widehat{\mathrm{ETL}}_\epsilon(r) = -\frac{1}{\lceil n\epsilon\rceil - 1}\sum_{k=1}^{\lceil n\epsilon\rceil - 1} r_{(k)} \tag{A.13}$$

where $\lceil x\rceil$ is the smallest integer larger than $x$.

Formula (A.13) means that we average $\lceil n\epsilon\rceil - 1$ of the $\lceil n\epsilon\rceil$ smallest observations, which is, in fact, the definition of the conditional expectation in (A.11) for a discrete distribution. The VaR at tail probability $\epsilon$ is equal to the negative of the empirical quantile,

$$\widehat{\mathrm{VaR}}_\epsilon(r) = -r_{(\lceil n\epsilon\rceil)} \tag{A.14}$$

It remains to determine the coefficients in (A.12). Bearing in mind that the observations in the sample are equally probable, we calculate that

$$\epsilon_0 = \frac{\lceil n\epsilon\rceil - 1}{n}$$

Plugging $\epsilon_0$, (A.14), and (A.13) into equation (A.12), we obtain (6), which is the sample AVaR.
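The plug-in step can be sketched in a few lines of Python. This is an illustrative sketch rather than the entry's own implementation: the function name `sample_var_etl_avar` is hypothetical, and `math.ceil` is used for $\lceil n\epsilon\rceil$ (when $n\epsilon$ is an integer this differs from "smallest integer larger than" by one, but the resulting AVaR is the same because the weight on the VaR term is then zero).

```python
import math


def sample_var_etl_avar(returns, eps):
    """Sample VaR (A.14), ETL (A.13) and AVaR via (A.12) for equally likely scenarios."""
    n = len(returns)
    if n == 0 or not 0.0 < eps < 1.0:
        raise ValueError("need a non-empty sample and 0 < eps < 1")
    ordered = sorted(returns)            # r_(1) <= ... <= r_(n)
    m = math.ceil(n * eps)               # index of the empirical quantile
    var = -ordered[m - 1]                # (A.14)
    eps0 = (m - 1) / n                   # probability mass strictly below the quantile
    # (A.13) is undefined when no observation lies strictly below the quantile.
    etl = -sum(ordered[:m - 1]) / (m - 1) if m > 1 else None
    avar = (eps - eps0) / eps * var      # VaR term of (A.12)
    if etl is not None:
        avar += eps0 / eps * etl         # ETL term of (A.12)
    return var, etl, avar
```

For the ten returns -0.05, -0.03, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04, 0.05 and $\epsilon = 0.25$, the sketch gives a VaR of 0.02, an ETL of 0.04, and an AVaR of 0.036, the weighted average $0.8 \times 0.04 + 0.2 \times 0.02$.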

Similarly, equation (10) also arises from (A.12). The assumption is that the underlying random variable has a discrete distribution but the outcomes are not equally probable. Thus, the corresponding equation for the average loss on condition that the loss is larger than the VaR at tail probability $\epsilon$ is given by

$$\widehat{\mathrm{ETL}}_\epsilon(r) = -\frac{1}{\epsilon_0}\sum_{j=1}^{k_\epsilon} p_j r_j \tag{A.15}$$

where the outcomes are indexed in increasing order, $\epsilon_0 = \sum_{j=1}^{k_\epsilon} p_j$, and $k_\epsilon$ is the integer satisfying the inequalities

$$\sum_{j=1}^{k_\epsilon} p_j \le \epsilon < \sum_{j=1}^{k_\epsilon+1} p_j$$

The sum $\sum_{j=1}^{k_\epsilon} p_j$ stands for the cumulative probability of the losses larger than the VaR at tail probability $\epsilon$. Note that equation (A.15) turns into equation (A.13) when the outcomes are equally probable. With these remarks, we have demonstrated the connection between equations (6), (10), and (A.12).
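The case of unequal probabilities can be sketched in the same way. The code below is a hedged illustration, assuming that the outcome following the first $k_\epsilon$ ordered outcomes plays the role of the quantile; the function name `discrete_var_etl_avar` and its argument layout are hypothetical.

```python
def discrete_var_etl_avar(outcomes, probs, eps):
    """VaR, ETL (A.15) and AVaR (A.12) for a discrete return distribution."""
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("outcomes and probs must be non-empty and of equal length")
    if abs(sum(probs) - 1.0) > 1e-9 or not 0.0 < eps < 1.0:
        raise ValueError("probabilities must sum to one and 0 < eps < 1")
    eps0 = 0.0       # cumulative probability of outcomes strictly in the tail
    weighted = 0.0   # running sum of p_j * r_j over those outcomes
    for r, p in sorted(zip(outcomes, probs)):
        if eps0 + p <= eps:          # outcome j <= k_eps lies entirely in the tail
            eps0 += p
            weighted += p * r
        else:                        # first outcome whose mass crosses eps
            var = -r
            break
    else:
        raise ValueError("eps is too close to one for this distribution")
    etl = -weighted / eps0 if eps0 > 0.0 else None      # (A.15)
    avar = (-weighted + (eps - eps0) * var) / eps       # (A.12)
    return var, etl, avar
```

With outcomes -0.10, -0.02, 0.01, 0.05, probabilities 0.05, 0.15, 0.50, 0.30, and $\epsilon = 0.1$, only the first outcome lies entirely in the tail, so $\epsilon_0 = 0.05$, the ETL is 0.10, the VaR is 0.02, and the AVaR is 0.06.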

The differences between ETL and AVaR are of practical importance. In fact, ETL is not a coherent risk measure. Furthermore, the sample ETL in (A.13) is not a smooth function of the tail probability while the sample AVaR is smooth. This is illustrated in Figure A.4. The top plot shows the graph of the sample ETL and AVaR with the tail probability varying between 1% and 10%. The sample contains 100 independent observations on a standard normal distribution, $X \in N(0,1)$. The bottom plot shows the same for a larger sample of 250 independent observations on a standard normal distribution.

Figure A.4 The Graphs of the Sample ETL and AVaR with Tail Probability Varying between 1% and 10%
Note: The top plot is produced from a sample of 100 observations and the bottom plot from a sample of 250 observations. In both cases, $X \in N(0,1)$.

Both plots demonstrate that the sample ETL is a step function of the tail probability while the AVaR is a smooth function of it. This is not surprising because, as $\epsilon$ increases, new observations appear in the sum in (A.13), producing the jumps in the graph of the sample ETL. In contrast, the AVaR changes gradually as it is a weighted average of the ETL and the VaR at the same tail probability. Note that, as the sample size increases, the jumps in the graph of the sample ETL diminish. In a sample of 5,000 scenarios, both quantities almost overlap. This is because the standard normal distribution has a smooth c.d.f. and the sample c.d.f. constructed from a larger sample better approximates the theoretical c.d.f. In this case, as the sample size approaches infinity, the AVaR becomes indistinguishable from the ETL at the same tail probability (see note 10).
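The step-versus-gradual behavior can be reproduced with a small simulation. The sketch below is illustrative only: the seed, the grid of tail probabilities, and the helper name `sample_etl_and_avar` are assumptions, and no attempt is made to reproduce the exact sample behind Figure A.4.

```python
import math
import random


def sample_etl_and_avar(ordered, eps):
    """Return (ETL, AVaR) for an ascending sample, using (A.13), (A.14) and (A.12)."""
    n = len(ordered)
    m = math.ceil(n * eps)                    # index of the empirical quantile
    var = -ordered[m - 1]                     # (A.14)
    eps0 = (m - 1) / n
    tail_sum = sum(ordered[:m - 1])
    etl = -tail_sum / (m - 1) if m > 1 else float("nan")    # (A.13)
    avar = (-tail_sum / n + (eps - eps0) * var) / eps        # (A.12)
    return etl, avar


rng = random.Random(0)                        # fixed seed, chosen arbitrarily
sample = sorted(rng.gauss(0.0, 1.0) for _ in range(100))
grid = [k / 1000 for k in range(11, 101)]     # tail probabilities from 1.1% to 10%
curves = [sample_etl_and_avar(sample, eps) for eps in grid]
distinct_etl = len({etl for etl, _ in curves})     # few values: a step function
distinct_avar = len({avar for _, avar in curves})  # many values: gradual change
```

On this grid the sample ETL can take at most ten distinct values for 100 observations, because it changes only when $\lceil n\epsilon\rceil$ changes, whereas the sample AVaR varies with every grid point.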

KEY POINTS 

• Although the value-at-risk (VaR) measure is 
a popular risk measure in the financial indus¬ 
try, it has a number of deficiencies. It is not a 
coherent risk measure because it does not sat¬ 
isfy the subadditivity property requirement 
of a coherent risk measure. 

• In contrast to VaR, the average value-at-risk measure (AVaR), also referred to as conditional value-at-risk and expected shortfall, is a coherent risk measure and has other advantages that result in its greater acceptance in risk modeling.

• There are convenient ways for computing and 
estimating AVaR that allow its application in 
optimal portfolio problems. 

• A more general family of coherent risk measures is the family of spectral risk measures. The AVaR is a spectral risk measure with a specific risk-aversion function, and the risk-aversion function must be selected properly to avoid explosion of the risk measure.

• There is a connection between the theory of probability metrics and risk measures. Basically, by choosing an appropriate probability metric one can guarantee that if two portfolio return distributions are close to each other, their risk profiles are also similar.

NOTES 

1. In fact, $X = 0.05\sqrt{2}\,Z + 0.03$, where $Z$ has Student's t distribution with 4 degrees of freedom, and $Y$ has a normal distribution with standard deviation equal to 0.1 and mathematical expectation equal to 0.01. The coefficient of $Z$ is chosen so that the standard deviation of $X$ is also equal to 0.1.

2. By comparing the c.d.f.s, we notice that the c.d.f. of $X$ is "above" the c.d.f. of $Y$ to the left of the crossing point, $F_X(x) > F_Y(x)$, $x < -0.15$.

3. This term is adopted in Rockafellar and 
Uryasev (2002). 

4. Equation (2) was first studied by Pflug 
(2000). A proof that equation (1) is indeed 
the AVaR can be found in Rockafellar and 
Uryasev (2002). 

5. As we remarked, $\mathrm{AVaR}_\epsilon(X)$ can be infinite only if the mathematical expectation of $X$ is infinite. Nevertheless, if this turns out to be an issue, one can use, instead of AVaR, the median of the loss distribution provided that the loss is larger than $\mathrm{VaR}_\epsilon(X)$ as a robust version of AVaR. The median of the conditional loss is always finite and, therefore, the issue disappears, but at the cost of violating the coherence axioms. The appendix to this entry provides more details.

6. This formula is a simple consequence of 
the definition of AVaR for discrete distribu¬ 
tions; see the appendix to this entry. A de¬ 
tailed derivation is provided by Rockafellar 
and Uryasev (2002). 

7. For example, $\lceil 3.1\rceil = \lceil 3.8\rceil = 4$.

8. A formal proof can be found in Rockafel¬ 
lar and Uryasev (2002). The reasoning in 
Rockafellar and Uryasev (2002) is based on 
the assumption that the random variable 
describes losses while in equation (10), the 
random variable describes the portfolio re¬ 
turn or payoff. 


9. Formal derivation of this relationship can 
be found, for example, in Rockafellar and 
Uryasev (2002). 

10. In fact, this is a consequence of the cel¬ 
ebrated Glivenko-Cantelli theorem claim¬ 
ing that the sample c.d.f. converges almost 
surely to the true c.d.f. 
Abstract: The standard assumption in financial models is that the return on financial assets follows a normal (or Gaussian) distribution and therefore the standard deviation (or variance) is an appropriate measure of risk in the portfolio selection process. This is the risk measure that is used in the well-known Markowitz portfolio selection model (that is, the mean-variance model), which is the foundation for modern portfolio theory. With mounting evidence since the early 1960s that return distributions do not follow a normal distribution, researchers have proposed alternative risk measures for portfolio selection. These risk measures fall into two disjoint categories: dispersion measures and safety-first measures. In addition, there has been considerable theoretical work in defining the features of a desirable risk measure.


Most of the concepts in theoretical and empirical finance that have been developed over the last 50 years rest upon the assumption that the return or price distribution for financial assets follows a normal distribution. Yet, with rare exception, studies that have investigated the validity of this assumption since the 1960s fail to find support for the normal distribution. Moreover, there is ample empirical evidence that many, if not most, financial return series are heavy-tailed and, possibly, skewed. The "tails" of the distribution are where the extreme values occur. Empirical studies of stock prices and returns have found that extreme values are more likely than would be predicted by the normal distribution. This means that between periods where the market exhibits relatively modest changes in prices and returns, there will be periods where there are changes that are much larger (that is, crashes and booms) than predicted by the normal distribution. This is of concern not only to financial theorists but also to practitioners seeking, for example, to produce probability estimates for financial risk assessment. To implement portfolio selection more effectively, alternative risk measures are needed.

In this entry, we review alternative risk 
measures that can be employed in portfolio 
selection, which can accommodate non-normal 
return distributions. These risk measures are 
classified as dispersion measures and safety- 
first measures. We begin with a discussion of the 
desirable features of investment risk measures. 


DESIRABLE FEATURES 
OF INVESTMENT 
RISK MEASURES 

In portfolio theory, the variance of a portfolio's return has historically been the most commonly used measure of investment risk. However, different investors adopt different investment strategies in seeking to realize their investment objectives. Consequently, it is difficult to believe that investors have come to accept only one definition of risk. Regulators of financial institutions and commentators on risk measures proposed by regulators have proffered alternative definitions of risk. As noted by Dowd (2002, p. 1):

The theory and practice of risk management— 
and, included with that, risk measurement— 
have developed enormously since the pioneer¬ 
ing work of Harry Markowitz in the 1950s. 
The theory has developed to the point where 
risk management/measurement is now re¬ 
garded as a distinct sub-field of the theory of 
finance. ... 

Szego (2004, p. 1) categorizes risk measures as 
one of the three major revolutions in finance and 
places the start of that revolution in 1997. The 
other two major revolutions are mean-variance 
analysis (1952-1956) and continuous-time mod¬ 
els (1969-1973). He notes that alternative risk 
measures have been accepted by practitioners 
but "rejected by the academic establishment 
and, so far discarded by regulators!" (Szego, 
2004, p. 4). 


Basic Features of Investment 
Risk Measures 

Balzer (2001) argues that a risk measure is in¬ 
vestor specific and, therefore, there is "no single 
universally acceptable risk measure." He sug¬ 
gests several features that an investment risk 
measure should capture. Here we describe the 
following three features: 

• Relativity of risk 

• Multidimensionality of risk 

• Asymmetry of risk 

The relativity of risk means that risk should be 
related to performing worse than some alter¬ 
native investment or benchmark. Balzer (1994, 
2001) and Sortino and Satchell (2001), among 
others, have proposed that investment risk 
might be measured by the probability of the 
investment return falling below a specified risk 
benchmark. The risk benchmark might itself be 
a random variable, such as a liability bench¬ 
mark (e.g., an insurance product), the infla¬ 
tion rate or possibly inflation plus some safety 
margin, the risk-free rate of return, the bottom 
percentile of return, a sector index return, a bud¬ 
geted return, or other alternative investments. 
Each benchmark can be justified in relation to 
the goal of the portfolio manager. Should per¬ 
formance fall below the benchmark, there could 
be major adverse consequences for the portfolio 
manager. 

In addition, the same investor could have 
multiple objectives and, hence, multiple risk 
benchmarks. Thus, risk is also a multidimen¬ 
sional phenomenon. However, an appropriate 
choice of the benchmarks is necessary in order 
to avoid an incorrect evaluation of opportuni¬ 
ties available to investors. For example, too of¬ 
ten, little recognition is given to liability targets. 
This is the major factor contributing to the un¬ 
derfunding of U.S. corporate pension sponsors 
of defined benefit plans. 1 

Intuition suggests that risk is an asymmetric concept related to the downside outcomes, and any realistic risk measure has to value and consider upside and downside differently. The standard deviation treats both the positive and the negative deviations from the mean as potential risk. In this case, overperformance relative to the mean is penalized just as much as underperformance.

Intertemporal Dependence and 
Correlation with Other Sources 
of Risk 

The standard deviation is a measure of dispersion and it cannot always be used as a measure of risk. The preferred investment does not always present better returns than the other. It could happen that the worst investment presents the greatest return in some periods. Hence, time could influence the investor's choices.

Clearly, if the degree of uncertainty changes over time, the risk has to change over time as well. In this case, the investment return process is not stationary; that is, we cannot assume that returns maintain their distribution unchanged over time. In much of the published research, stationary and independent realizations are assumed. The latter assumption implies that history has no impact on the future. More concretely, the distribution of tomorrow's return is the same regardless of whether the biggest stock market crash ever recorded took place yesterday or yesterday's return equaled 10%.

As a result, the oldest observations have the same weight in our decisions as the most recent ones. Is this assumption realistic? Recent studies on investment return processes have shown that historical realizations are not independent and exhibit volatility clustering (time-varying volatility). Those phenomena led to the fundamental time-series model, autoregressive conditional heteroscedasticity (ARCH), formulated by Engle (1981). In particular, the most recent observations have a greater impact on investment decisions than the oldest ones. Thus, any realistic measure of risk changes and evolves over time, taking into consideration the heteroscedastic (time-varying volatility) behavior of historical series. An examination of the returns of an equity index such as the S&P 500 over some time period would show a propagation effect on another equity market, say the DAX 30. When we observe the highest peaks in one return index series, for example, there is an analogous peak in the other return index series. This propagation effect is known as cointegration of the return series, introduced by the fundamental work of Granger (1981) and elaborated upon further by Engle and Granger (1987). The propagation effect in this case is a consequence of the globalization of financial markets: the risk of a country/sector is linked to the risk of the other countries/sectors. Therefore, it could be important to limit the propagation effect by diversifying the risk. As a matter of fact, it is well established that diversification, appropriately modeled, diminishes the probability of big losses. Hence, an adequate risk measure correctly values and models the correlation among different investments, sectors, and markets.


ALTERNATIVE RISK 
MEASURES FOR 
PORTFOLIO SELECTION 

The goal of portfolio selection is the construc¬ 
tion of portfolios that maximize expected re¬ 
turns consistent with individually acceptable 
levels of risk. Using both historical data and in¬ 
vestor expectations of future returns, portfolio 
selection uses modeling techniques to quantify 
"expected portfolio returns" and "acceptable 
levels of portfolio risk," and provides methods 
to select an optimal portfolio. 

It would not be an overstatement to say that modern portfolio theory as developed by Harry Markowitz (1952, 1959) has revolutionized the world of investment management. Allowing managers to appreciate that the investment risk and expected return of a portfolio can be quantified has provided the scientific and objective complement to the subjective art of investment management. More importantly, whereas the focus of portfolio management used to be the risk of individual assets, the theory of portfolio selection has shifted the focus to the risk of the entire portfolio. This theory shows that it is possible to combine risky assets and produce a portfolio whose expected return reflects its components, but with considerably lower risk. In other words, it is possible to construct a portfolio whose risk is smaller than the sum of all its individual parts.

Though practitioners realized that the risks of 
individual assets were related, prior to modern 
portfolio theory they were unable to formalize 
how combining them into a portfolio impacted 
the risk at the entire portfolio level or how the 
addition of a new asset would change the re¬ 
turn/risk characteristics of the portfolio. This 
is because practitioners were unable to quan¬ 
tify the returns and risks of their investments. 
Furthermore, in the context of the entire port¬ 
folio, they were also unable to formalize the 
interaction of the returns and risks across as¬ 
set classes and individual assets. The failure to 
quantify these important measures and formal¬ 
ize these important relationships made the goal 
of constructing an optimal portfolio highly sub¬ 
jective and provided no insight into the return 
investors could expect and the risk they were 
undertaking. The other drawback, before the 
advent of the theory of portfolio selection and 
asset pricing theory, was that there was no mea¬ 
surement tool available to investors for judging 
the performance of their investment managers. 

The theory of portfolio selection set forth by 
Markowitz was based on the assumption that 
asset returns are normally distributed. As a re¬ 
sult, Markowitz suggested that the appropriate 
risk measure is the variance of the portfolio's 
return and portfolio selection involved only 
two parameters of the asset return distribu¬ 
tion: mean and variance. Hence, the approach 
to portfolio selection he proposed is popularly 
referred to as mean-variance analysis. 


Markowitz recognized that an alternative to 
the variance is the semivariance. 2 The semivari¬ 
ance is similar to the variance except that, in the 
calculation, no consideration is given to returns 
above the expected return. Portfolio selection 
could be recast in terms of mean-semivariance. 
However, if the return distribution is symmet¬ 
ric, Markowitz (1959, p. 190) notes that "an anal¬ 
ysis based on (expected return) and (standard 
deviation) would consider these ... (assets) as 
equally desirable." He rejected the semivari¬ 
ance noting that the variance "is superior with 
respect to cost, convenience, and familiarity" 
and when the asset return distribution is sym¬ 
metric, either measure "will produce the same 
set of efficient portfolios." (Markowitz 1959, 
pp. 193-194). 
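The contrast between the variance and the semivariance can be made concrete with a short sketch. It assumes the sample mean as the target and a divisor of $T$ for both quantities; these conventions, and the function name `variance_and_semivariance`, are illustrative assumptions rather than definitions taken from the text.

```python
def variance_and_semivariance(returns):
    """Sample variance and semivariance around the sample mean (divisor T)."""
    t = len(returns)
    if t == 0:
        raise ValueError("need at least one observation")
    mean = sum(returns) / t
    variance = sum((r - mean) ** 2 for r in returns) / t
    # Returns above the mean contribute nothing to the semivariance.
    semivariance = sum(min(r - mean, 0.0) ** 2 for r in returns) / t
    return variance, semivariance
```

For a sample that is symmetric around its mean, the semivariance computed this way is half the variance, which is in line with the remark that both measures then lead to the same set of efficient portfolios.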

There is a heated debate on risk measures used for valuing and optimizing the investor's risk portfolio. In this section and the one to follow, we describe the various portfolio risk measures proposed in the literature and look more carefully at the properties of portfolio risk measures.

According to the literature on portfolio theory, two disjoint categories of risk measures can be defined: dispersion measures and safety-first risk measures. In the remainder of this entry, we review some of the most well-known dispersion measures and safety-first measures along with their properties (see note 3).

In the following, we consider a portfolio of $N$ assets whose individual returns are given by $r_1, \ldots, r_N$. The relative weights of the portfolio are denoted as $x_1, \ldots, x_N$ and, therefore, the portfolio return $r_p$ can be expressed as

$$r_p = x_1 r_1 + \cdots + x_N r_N = \sum_{i=1}^{N} x_i r_i$$

We also provide a sample version of the discussed risk measures. The sample version will be based on a sample of length $T$ of independent and identically distributed observations $r_p^{(k)}$, $k = 1, \ldots, T$, of the portfolio return $r_p$. These observations can be obtained from a corresponding sample of the individual assets.
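Building the sample of portfolio returns from a sample of asset returns is a direct application of the weighted sum. A minimal sketch, assuming the asset sample is stored as a list of $T$ observations with $N$ asset returns each (the function name `portfolio_returns` and the data layout are hypothetical):

```python
def portfolio_returns(asset_returns, weights):
    """Portfolio return for each observation: sum over assets of x_i * r_i."""
    for obs in asset_returns:
        if len(obs) != len(weights):
            raise ValueError("each observation needs one return per asset")
    return [sum(x * r for x, r in zip(weights, obs)) for obs in asset_returns]
```

For example, with weights 0.5, 0.3, 0.2 and the observation (0.01, 0.02, -0.01), the portfolio return is $0.005 + 0.006 - 0.002 = 0.009$.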


DISPERSION MEASURES 

Several portfolio mean dispersion approaches 
have been proposed in the last few decades. 
The most significant ones are discussed below, 
and we provide for each measure an example 
to illustrate the calculation. 

Mean Standard Deviation 

In the mean standard deviation approach the dispersion measure is the standard deviation of the portfolio return $r_p$ (see Markowitz, 1959, and Tobin, 1958):

$$\sigma(r_p) = \sqrt{E\left(r_p - E(r_p)\right)^2} \tag{1}$$

The standard deviation is a special case of the mean absolute moment discussed below. The sample version can be obtained from the general case by setting $q = 2$.

Mean Absolute Deviation 

In the mean absolute deviation (MAD) approach, the dispersion measure is based on the absolute deviations from the mean rather than the squared deviations as in the case of the standard deviation (see note 4). The MAD is more robust with respect to outliers. The MAD for the portfolio return $r_p$ is defined as

$$\mathrm{MAD}(r_p) = E\left(|r_p - E(r_p)|\right) \tag{2}$$

Mean Absolute Moment 

The mean absolute moment (MAM(q)) approach is the logical generalization of the MAD approach. Under this approach the dispersion measure is defined as

$$\mathrm{MAM}(r_p, q) = \left(E\left(|r_p - E(r_p)|^q\right)\right)^{1/q}, \quad q \ge 1 \tag{3}$$

Note that the mean absolute moment for $q = 2$ coincides with the standard deviation and for $q = 1$ the mean absolute moment reduces to the mean absolute deviation. One possible sample version of (3) is given by

$$\widehat{\mathrm{MAM}}(r_p, q) = \left(\frac{1}{T}\sum_{k=1}^{T}\left|r_p^{(k)} - \bar{r}_p\right|^q\right)^{1/q}$$

where

$$\bar{r}_p = \frac{1}{T}\sum_{k=1}^{T} r_p^{(k)}$$

denotes the sample mean of the portfolio return.
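The sample version translates almost line for line into code. The following sketch assumes the divisor $T$ used in the sample formula; the function name `mean_absolute_moment` is hypothetical.

```python
def mean_absolute_moment(sample, q):
    """Sample mean absolute moment of order q >= 1 around the sample mean."""
    if q < 1:
        raise ValueError("q must be at least 1")
    t = len(sample)
    if t == 0:
        raise ValueError("need at least one observation")
    mean = sum(sample) / t
    return (sum(abs(r - mean) ** q for r in sample) / t) ** (1.0 / q)
```

Calling it with `q=1` gives the sample MAD and with `q=2` the sample standard deviation (with divisor $T$). For the sample 0.01, -0.01, 0.03, -0.03 these are 0.02 and $\sqrt{0.0005} \approx 0.0224$, respectively.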


Gini Index of Dissimilarity 

The index of dissimilarity is based on the measure introduced by Gini (1912, 1921) (see note 5). Gini objected to the use of the variance or the MAD because they measure deviations of individual observations from the mean or location of a distribution. Consequently, these measures linked location with variability, two properties that Gini argued were distinct and do not depend on each other. He then proposed the pairwise deviations between all observations as a measure of dispersion, which is now referred to as the Gini measure.

While this measure has been used for the 
past 80 years as a measure of social and eco¬ 
nomic conditions, its interest as a measure of 
risk in the theory of portfolio selection is rela¬ 
tively recent. Interest in a Gini-type risk mea¬ 
sure has been fostered by Rachev (1991) and 
Rachev and Gamrowski (1995). Mathematically, 
the Gini risk measure for the random portfolio 
return r p is defined as 

$$GM(r_p, r_b) = \min\{E|r_p - r_b|\} \tag{4}$$

where the minimum is taken over all joint distributions of $(r_p, r_b)$ with fixed marginal distribution functions $F$ and $G$:

$$F(x) = P(r_p \le x) \quad \text{and} \quad G(x) = P(r_b \le x), \quad x \text{ real}$$

Here $r_b$ is the benchmark return, say, the return of a market index, or just the risk-free rate (U.S. Treasury rate or LIBOR, for example). Expression (4) can be represented as the mean absolute deviation between the two distribution functions $F$ and $G$:

$$GM(r_p, r_b) = \int_{-\infty}^{+\infty} |F(x) - G(x)|\,dx$$

Given a sample or a distributional assumption for the benchmark return $r_b$, the latter expression can be used for estimating the Gini index by calculating the area between the graphs of the empirical distribution function of $r_p$ and the (empirical) distribution function of $r_b$.
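When both the portfolio and the benchmark are represented by samples, the area between the two empirical distribution functions can be computed exactly, because both are step functions. The sketch below illustrates that calculation; the function name `gini_measure` is hypothetical, and a constant benchmark such as a risk-free rate is handled by passing a one-element sample.

```python
def gini_measure(portfolio_sample, benchmark_sample):
    """Area between the empirical c.d.f.s of the portfolio and benchmark returns."""
    if not portfolio_sample or not benchmark_sample:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(portfolio_sample), sorted(benchmark_sample)
    grid = sorted(set(xs) | set(ys))     # points where either c.d.f. can jump
    area = 0.0
    i = j = 0
    for left, right in zip(grid, grid[1:]):
        # Advance the counters so that i and j count observations <= left.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        f, g = i / len(xs), j / len(ys)  # both c.d.f.s are constant on [left, right)
        area += abs(f - g) * (right - left)
    return area
```

For two samples of equal size the result coincides with the average absolute difference between the ordered observations. For instance, the samples (-0.02, 0.01, 0.03) and (-0.01, 0.00, 0.02) give 0.01.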

Mean Entropy 

In the mean entropy (M-entropy) approach, the 
dispersion measure is the exponential entropy. 
Exponential entropy is a dispersion measure 
only for portfolios with continuous return dis¬ 
tribution because the definition of entropy for 
discrete random variables is formally different 
and does not satisfy the properties of the disper¬ 
sion measures (positive and positively homoge¬ 
neous). The concept of entropy was introduced 
in the last century in the classical theory of ther¬ 
modynamics. Roughly speaking, it represents 
the average uncertainty in a random variable. 

Probably its most important application in finance is to derive the probability density function of the asset underlying an option on the basis of the information that some option prices provide (see note 6). Entropy was also used in portfolio theory by Philippatos and Wilson (1972) and Philippatos and Gressis (1975) and is defined as

$$\text{Entropy} = -E\left(\log f(r_p)\right)$$

where $f$ is the density of the portfolio return. Thus, the exponential entropy is given by

$$EE(r_p) = e^{-E(\log f(r_p))} \tag{5}$$

Entropy can be evaluated either by considering the empirical density of a portfolio or by assuming that portfolio returns belong to a given family of continuous distributions and estimating their unknown parameters.
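As a worked illustration of the parametric route, assume (purely for the sake of example) that the portfolio return is normal, $r_p \in N(\mu, \sigma^2)$. A short calculation under that assumption shows that the exponential entropy is proportional to the standard deviation, which is consistent with the positivity and positive homogeneity required of a dispersion measure:

```latex
\begin{aligned}
-E\left(\log f(r_p)\right)
  &= \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{E(r_p-\mu)^2}{2\sigma^2}
   = \tfrac{1}{2}\log(2\pi e\,\sigma^2), \\
EE(r_p) &= e^{-E(\log f(r_p))} = \sqrt{2\pi e}\,\sigma \approx 4.13\,\sigma .
\end{aligned}
```

Under this assumption, estimating the exponential entropy reduces to estimating $\sigma$.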


Mean Colog 

In the mean colog (M-colog) approach, the dispersion measure is the covariance between the random variable and its logarithm (see note 7). That is, the colog of a portfolio return is defined as

$$\mathrm{Colog}(1 + r_p) = E\left(r_p \log(1 + r_p)\right) - E(r_p)\,E\left(\log(1 + r_p)\right) \tag{6}$$

Colog can easily be estimated based on a sample of the portfolio return distribution by

$$\widehat{\mathrm{Colog}}(1 + r_p) = \frac{1}{T}\sum_{k=1}^{T}\left(r_p^{(k)} - \bar{r}_p\right)\left(\log\left(1 + r_p^{(k)}\right) - \overline{\log(1 + r_p)}\right)$$

where

$$\overline{\log(1 + r_p)} = \frac{1}{T}\sum_{k=1}^{T}\log\left(1 + r_p^{(k)}\right)$$

denotes the sample mean of the logarithm of one plus the portfolio return.
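The sample colog is a plain sample covariance between the returns and the logarithms of one plus the returns. A minimal sketch, assuming every return exceeds -100% so that the logarithm is defined (the function name `sample_colog` is hypothetical):

```python
import math


def sample_colog(sample):
    """Sample covariance between r and log(1 + r), with divisor T."""
    t = len(sample)
    if t == 0 or any(r <= -1.0 for r in sample):
        raise ValueError("need a non-empty sample with all returns above -1")
    logs = [math.log1p(r) for r in sample]   # log(1 + r) for each observation
    mean_r = sum(sample) / t
    mean_log = sum(logs) / t
    return sum((r - mean_r) * (l - mean_log) for r, l in zip(sample, logs)) / t
```

Because the logarithm is increasing, the result is zero for a constant sample and positive otherwise.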


SAFETY-FIRST RISK 
MEASURES 

Many researchers have suggested the safety- 
first rules as a criterion for decision making un¬ 
der uncertainty. 8 In these models, a subsistence, 
a benchmark, or a disaster level of returns is 
identified. The objective is the maximization of 
the probability that the returns are above the 
benchmark. Thus, most of the safety-first risk 
measures proposed in the literature are linked 
to the benchmark-based approach. 

Even if there are no apparent connections between the expected utility
approach and a more appealing benchmark-based approach, Castagnoli and
LiCalzi (1996) have proven that the expected utility can be reinterpreted
in terms of the probability that the return is above a given benchmark.
Hence, when it is assumed that investors maximize their expected utility,
it is implicitly assumed that investors minimize the probability of the
investment return falling below a specified risk benchmark.

Although it is not always simple to identify the underlying benchmark,
expected utility theory partially justifies the use of the
benchmark-based approach. Moreover, it is possible to prove that the two
approaches are in many cases equivalent even if the economic reasons and
justifications are different. [9]

Some of the best-known safety-first risk measures proposed in the
literature are described below.

Classical Safety First 

In the classical safety-first portfolio choice problem the risk measure is
the probability of loss or, more generally, the probability
$P_\lambda = P(r_p \le \lambda)$ of a portfolio return less than
$\lambda$. Generally, safety-first investors have to solve a complex,
mixed integer linear programming problem to find the optimal portfolios.
However, when short sales are allowed and return distributions are
elliptical, depending on a dispersion matrix Q and a mean vector $\mu$,
then there exists a closed-form solution to the investor's portfolio
selection problem:

Minimize: $P(r_p \le \lambda)$

Subject to: $\sum_{i=1}^{N} x_i = 1, \quad x_i \ge 0$

The interesting property of this optimization problem is that we are able
to express the set of optimal portfolios explicitly as a function of the
shortfall barrier $\lambda$, the mean vector $\mu$, and the dispersion
matrix Q. The mean m and the dispersion $\sigma^2$ of these optimal
portfolios can again be expressed as a function of the threshold
$\lambda$, the mean vector $\mu$, and the dispersion matrix Q. In the case
where the elliptical family has finite variance (as, for example, the
normal distribution), the dispersion $\sigma^2$ corresponds to the
variance.

As the risk measure consists of the probability that the return falls
below a given barrier $\lambda$, we can estimate the risk measure by the
ratio between the number of observations smaller than $\lambda$ and the
total number of observations in the sample.

Value at Risk 

Value at risk ($\text{VaR}_{1-\alpha}$) is a closely related possible
safety-first measure of risk defined by the following equality:

$$\text{VaR}_{1-\alpha}(r_p) = -\min\{z \mid P(r_p \le z) > \alpha\} \qquad (7)$$

Here, $1 - \alpha$ is denoted as the confidence level and $\alpha$ usually
takes values like 1% or 5%. Theoretically, the VaR figure defined by
equation (7) can admit negative values. In reality, however, it is likely
and often implicitly assumed that the VaR is positive, and it can be
interpreted as the level that losses will not exceed with a probability of
$1 - \alpha$. Sometimes VaR is, therefore, defined as the maximum of zero
and the expression defined in equation (7) to guarantee a nonnegative
value for VaR.

VaR can be used as a risk measure to determine reward-risk optimal
portfolios. Moreover, this simple risk measure can also be used by
financial institutions to evaluate the market risk exposure of their
trading portfolios. The main characteristic of VaR is that of synthesizing
in a single value the possible losses that could occur with a given
probability in a given temporal horizon. This feature, together with the
(very intuitive) concept of maximum probable loss, allows the nonexpert
investor to figure out how risky the position is and which corrective
strategies to adopt. Based on a sample of return observations, VaR
estimates coincide with the empirical $\alpha$-quantile. VaR and
sophisticated methodologies for estimating VaR are explained in Chapter 14
of Rachev, Menn, and Fabozzi (2005).

Conditional Value at Risk/Expected 
Tail Loss 

The conditional value at risk ($\text{CVaR}_{1-\alpha}$) or expected
tail loss (ETL) is defined as:

$$\text{CVaR}_{1-\alpha}(r_p) = E\left(\max(-r_p, 0) \mid -r_p \ge \text{VaR}_{1-\alpha}(r_p)\right) \qquad (8)$$

where $\text{VaR}_{1-\alpha}(r_p)$ is defined in equation (7) and we assume
that the portfolio return distribution is continuous. [10] From this
definition we observe that the CVaR can be seen as the expected shortfall
assuming $\text{VaR}_{1-\alpha}(r_p)$ as the benchmark.

A sophisticated estimation of CVaR depends strongly on the estimation of
VaR. An explanation and illustration of the calculation of CVaR is
provided in Rachev, Menn, and Fabozzi (2005). Based on a large sample of
observations, a natural estimate for CVaR can be obtained by averaging all
observations in the sample that are smaller than the corresponding VaR
estimate.
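
A sample version of this averaging step might look as follows. It is a
sketch under assumptions: `empirical_var` and `empirical_cvar` are
hypothetical names, losses are measured as minus the return, and the tail
is taken to include the observation that defines the VaR so that it is
never empty.

```python
def empirical_var(returns, alpha):
    """Empirical VaR at confidence level 1 - alpha, reported as a loss."""
    T = len(returns)
    for i, z in enumerate(sorted(returns), start=1):
        if i / T > alpha:
            return -z

def empirical_cvar(returns, alpha):
    """Average loss over observations whose loss is at least the VaR."""
    var = empirical_var(returns, alpha)
    tail_losses = [max(-r, 0.0) for r in returns if -r >= var]
    return sum(tail_losses) / len(tail_losses)

# Hypothetical sample of ten returns (assumption for illustration).
sample = [-0.05, -0.02, 0.0, 0.01, 0.03, 0.04, -0.01, 0.02, 0.05, 0.06]
var_90 = empirical_var(sample, 0.10)
cvar_90 = empirical_cvar(sample, 0.10)
```

In this sample the tail consists of the two worst returns, so the CVaR
estimate is the average of their losses and is at least as large as the
VaR estimate.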

MiniMax 

An alternative way to derive some safety-first optimal portfolios is
minimizing the MiniMax (MM) risk measure (see Young, 1998). The
MiniMax of a portfolio return is given by:

$$\text{MM}(r_p) = -\sup\{c \mid P(r_p \le c) = 0\} \qquad (9)$$

This risk measure can be seen as an extreme case of CVaR.

Lower Partial Moment 

A natural extension of semivariance is the lower partial moment risk
measure (see Bawa, 1976, and Fishburn, 1977), also called downside risk or
probability-weighted function of deviations below a specified target
return. This risk measure depends on two parameters:

1. A power index that is a proxy for the investor's degree of risk
aversion.

2. The target rate of return that is the minimum return that must be
earned to accomplish the goal of funding the plan within a cost
constraint.

The lower partial moment of a portfolio return $r_p$ bounded from below is
given by

$$\text{LPM}(r_p, q) = \sqrt[q]{E\left(\max(t - r_p, 0)^q\right)} \qquad (10)$$

where q is the power index and t is the target rate of return.


Given a sample of return observations, we can
approximate equation (10) as follows:

$$\text{LPM}(r_p, q) = \sqrt[q]{\frac{1}{T} \sum_{k=1}^{T} \max\left(\bar{r}_p - r_p^{(k)}, 0\right)^q}$$

where as before

$$\bar{r}_p = \frac{1}{T} \sum_{k=1}^{T} r_p^{(k)}$$

denotes the sample mean of the portfolio
return.
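
A sample lower partial moment can be sketched as below. The function name
`sample_lpm` and the data are hypothetical, and the sketch assumes that
the LPM is the q-th root of the average q-th power of the shortfall below
a target; the target is left as a parameter so that either a fixed rate or
the sample mean can be supplied.

```python
def sample_lpm(returns, target, q):
    """q-th root of the average q-th power shortfall below the target."""
    if q <= 0:
        raise ValueError("the power index q must be positive")
    T = len(returns)
    total = sum(max(target - r, 0.0) ** q for r in returns)
    return (total / T) ** (1.0 / q)

# Hypothetical returns (assumption for illustration).
returns = [-0.02, 0.01, -0.04, 0.03]
lpm_zero_target = sample_lpm(returns, target=0.0, q=2)

# Using the sample mean as target gives a semideviation-style measure.
mean_return = sum(returns) / len(returns)
lpm_mean_target = sample_lpm(returns, target=mean_return, q=2)
```

Only returns below the target contribute, which is what distinguishes this
measure from a symmetric dispersion measure such as the standard
deviation.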


Power Conditional Value at Risk 

The power conditional value at risk measure, introduced in Rachev, Jasic,
Biglova, and Fabozzi (2005), is the CVaR of the lower partial moment of
the return. It depends on a power index that varies with respect to an
investor's degree of risk aversion. Power CVaR generalizes the concept of
CVaR and is defined as

$$\text{CVaR}_{q,1-\alpha}(r_p) = E\left(\max(-r_p, 0)^q \mid -r_p \ge \text{VaR}_{1-\alpha}(r_p)\right) \qquad (11)$$

A sample version of power CVaR can be obtained in the same way as the
sample version of the regular CVaR, that is, one calculates the q-th
sample moment of all observations in the sample that are smaller than the
corresponding VaR estimate.


KEY POINTS 

• While the underpinning of financial theory is that the return on
financial assets is normally distributed, little evidence supports this
assumption. Consequently, the use of the standard deviation or variance as
a measure of risk in financial applications such as portfolio selection is
difficult to justify.

• Alternative risk measures that can accommodate the properties of asset
returns that have been observed in financial markets have been proposed.

• Alternative risk measures include dispersion measures and safety-first
risk measures.

• Dispersion measures include mean standard deviation, mean absolute
deviation, mean absolute moment, index of dissimilarity, mean entropy, and
mean colog.

• Safety-first risk measures include classical safety first, value at
risk, conditional value at risk, expected tail loss, MiniMax, lower
partial moment, downside risk, probability-weighted function of deviations
below a specified target return, and power conditional value at risk.

NOTES 

1. See Ryan and Fabozzi (2002). 

2. The mean semivariance approach was revisited by Stefani and Szego
(1976).

3. For more details, see Rachev, Menn, and Fabozzi (2006).

4. See Konno and Yamazaki (1991), Zenios and Kang (1993), Speranza (1993),
and Ogryczak and Ruszczynski (2001).

5. For a further discussion of this index, see Rachev (1991).

6. See Buchen and Kelly (1996) and Avellaneda (1998).

7. See Giacometti and Ortobelli (2001).

8. See, among others, Roy (1952), Telser (1955/6), and Bawa (1976, 1978).

9. See Castagnoli and LiCalzi (1996, 1999), Bordley and LiCalzi (2000),
Ortobelli and Rachev (2001), Rachev and Mittnik (2000, pp. 424-464), and
Rachev, Ortobelli, and Schwartz (2004).

10. See Bawa (1978), Uryasev (2000), and Martin, Rachev, and Siboulet
(2003).
Back-Testing Market Risk Models

Abstract: Back-testing is the quantitative evaluation of a model, and
back-testing a risk or probability density forecasting model involves a
comparison of the model's density forecasts against subsequently realized
outcomes of the random variable whose density is forecast. One purpose of
back-testing is to determine whether the forecasts are sufficiently close
to realized outcomes to enable us to conclude that the forecasts are
statistically compatible with those outcomes. Back-tests conducted for
this purpose involve statistical hypothesis tests to determine if a
model's forecasts are acceptable. Hypothesis tests can be applied to
observations involving a loss that exceeds the value-at-risk at a given
confidence level, or they can be applied to forecasts of VaRs at multiple
confidence levels. A second purpose of back-testing is to assist risk
managers to diagnose problems with their risk models and so help improve
them. A third purpose of back-testing is to rank the performance of a set
of alternative risk models to determine which model gives the "best"
density forecast evaluation performance.


To back-test a model is to evaluate it in quantitative terms, and
back-testing a risk (or probability density forecasting) model involves a
comparison of the model's density forecasts against subsequently realized
outcomes of the underlying random variable whose density is forecast. The
importance of back-testing is self-evident: If risk managers are to have
confidence in their risk models, then those models need to be properly
back-tested and to have performed well under those back-tests.

Back-tests can be used for three complementary purposes. The first is to
assess whether a model's density forecasts are statistically compatible
with the realized values of the underlying random variable. The second
purpose is diagnostic: to generate feedback about the model's potential
weaknesses to assist the model builder and help him/her to "correct" the
model. The third purpose is to rank alternative models. A good risk model
should fare well by all three criteria: It should pass its statistical
tests, should not generate any worrying diagnostics, and should rank well
in comparison to alternative models.

The archetypal market risk model is a model that forecasts the value at
risk (VaR) of a portfolio over one or more confidence levels, for a
specified horizon. We will assume for the most part that the horizon is a
trading day.

To back-test such a model, we need a dataset that consists of the model's
forecasts, on the one hand, and the daily profits or losses (P/L)
generated by the portfolio, on the other. The first task in back-testing
is therefore to assemble such a dataset. For most market risk managers,
the forecasts themselves should be readily available. However, obtaining
suitable profit and loss data is a more difficult problem than it might
initially appear to be. The reason is that we do not need data on the
profits or losses actually generated by a portfolio, but data on the
profits or losses attributable to the market risks taken: We want P/L data
that reflect underlying market volatility rather than accounting prudence.
We also need to clean our P/L data to get rid of components that are not
directly related to current or recent market risk-taking. Such components
include fee income, hidden and unrealized P/L, earnings attributable to
nonmarket risks, such as yields on corporate bonds, and the impact of
intraday trading on P/L.

Having obtained our dataset, the next stage is to carry out a preliminary
data analysis. We should plot a back-testing chart (a plot of the realized
P/L over time with the VaR forecasts superimposed on it) and look for any
odd or outstanding features. It is also good practice to supplement
back-testing charts with P/L histograms, which sometimes give a clearer
indication of the empirical P/L distribution, and quantile-quantile (QQ)
charts, which plot the quantiles of an empirical P/L distribution against
those of a forecasted P/L distribution. It is also a good idea to examine
summary P/L statistics, including the obvious statistics of mean,
variance, skewness, kurtosis, range, and so on and the number and size of
extreme observations. A preliminary data analysis can be very helpful in
enabling practitioners to get to know their data and get a feel for any
problems they might encounter.


STATISTICAL BACK-TESTING

The first type of back-tests are statistical tests based on a
hypothesis-testing paradigm. We first specify the null hypothesis that we
wish to test (typically, that the model is adequate) and select an
alternative hypothesis to be accepted if the null is rejected. We then
select a significance level and estimate the prob-value, that is, the
probability under the null hypothesis of obtaining a result at least as
extreme as the one observed. We would accept the null hypothesis if the
estimated prob-value exceeds the chosen significance level, and we would
reject it otherwise. The lower the significance level, the more likely we
are to accept the null hypothesis, and the less likely we are to
incorrectly reject a true model (that is, to make a Type I error).
Unfortunately, it also means that we are more likely to incorrectly accept
a false model (that is, to make a Type II error). Any test therefore
involves a trade-off between these two types of possible error. Ideally,
we should select a significance level that takes account of the
likelihoods and costs of these errors and strikes an appropriate balance
between them. However, in practice, it is common to select some arbitrary
significance level such as 5% and apply that level in all our tests. A
significance level of this magnitude gives the model a certain benefit of
the doubt, and implies that we would reject the model only if the evidence
against it is reasonably strong.

EXCEEDANCE-BASED 
STATISTICAL APPROACHES 

Suppose that we have a sample of n daily VaR forecasts $\text{VaR}_t$ and
a corresponding sample of n realized loss outcomes $L_t$, where t goes
from 1 to n. $L_t$ is denominated in units in which realized losses are
positive and realized profits are negative.

Some common approaches to back-testing involve exceedance observations,
where an exceedance observation (also called a tail loss) is a loss that
exceeds the VaR. These exceedance observations $H_t$ are obtained by
putting our sample observations through the following transformation:

$$H_t = \begin{cases} 1 & \text{if } L_t > \text{VaR}_t \\ 0 & \text{if } L_t \le \text{VaR}_t \end{cases} \qquad (1)$$

This transformation gives a unit value to all observations where there is
a loss exceeding VaR and a zero value to all other observations.
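
The transformation is simple to code. In the sketch below the function
name `exceedance_series` and the loss and VaR figures are hypothetical;
losses are entered as positive numbers, as in the text.

```python
def exceedance_series(losses, var_forecasts):
    """Return 1 where the realized loss exceeds the VaR forecast, else 0."""
    if len(losses) != len(var_forecasts):
        raise ValueError("losses and VaR forecasts must have equal length")
    return [1 if loss > var else 0 for loss, var in zip(losses, var_forecasts)]

# Hypothetical daily losses (profits are negative) and VaR forecasts.
losses = [0.5, 2.1, -0.3, 1.0, 3.2]
var_forecasts = [1.5, 1.5, 1.6, 1.0, 1.7]

hits = exceedance_series(losses, var_forecasts)
x = sum(hits)   # number of exceedances in the sample
```

A loss exactly equal to the VaR forecast (the fourth observation here) is
not counted as an exceedance, since the inequality is strict.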

Binomial (Kupiec) Approach 

We can now apply the basic frequency (or binomial) test suggested by
Kupiec (1995): We test whether the observed frequency of exceedances is
consistent with the frequency predicted by the model. In particular, under
the null hypothesis that the model is "good," the number of exceedances x
follows a binomial distribution with probability p, where p is the tail
probability or 1 minus the confidence level. The probability of x
exceedances given n observations is therefore:

$$\text{Prob}(x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n-x} \qquad (2)$$

Equation (2) also tells us that the only information required to implement
a binomial test is information about the values of n, p, and x. This
probability is then calculated using a suitable calculation engine (e.g.,
using the "binomdist" function in Excel).

To illustrate, suppose n = 1,000 and we take the confidence level $\alpha$
to be 0.95. Our model therefore predicts that $p = 1 - \alpha = 0.05$ and
the null hypothesis is $H_0$: p = 0.05. We then expect np = 50 exceedances
under the null. Now suppose that the number of exceedances, x, is 60. This
corresponds to an empirical frequency, $\hat{p}$, equal to 0.060. Since
$\hat{p}$ exceeds 0.05, we might specify a one-sided alternative
hypothesis $H_1$: p > 0.05. The prob-value of the test is the probability
under the null that $x \ge 60$. This is most easily calculated as
$1 - \Pr[x \le 59]$, which equals 0.0867 given the values of n and p. At a
conventional significance level such as 5%, we would then "pass" the model
as acceptable. It is also clear that as x gets larger and moves further
away from its predicted value of 50, the prob-value will fall. Values of x
with prob-values lower than our significance level would lead to
rejections of the null hypothesis and a "fail" result for the model. In
fact, if we work with a 5% significance level, it is straightforward to
show that we would accept the null if $x \le 62$ and reject it if
$x \ge 63$.
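
The one-sided prob-value in this illustration can be reproduced without a
spreadsheet. The sketch below uses only the Python standard library; the
helper names `binom_log_pmf` and `binom_upper_tail` are hypothetical, and
the binomial probabilities are evaluated through log-gamma functions to
avoid overflow for n = 1,000.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the binomial probability of exactly k exceedances."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p): the one-sided prob-value."""
    return sum(math.exp(binom_log_pmf(k, n, p)) for k in range(x, n + 1))

n, p, x = 1000, 0.05, 60
prob_value = binom_upper_tail(x, n, p)
passes_at_5_percent = prob_value > 0.05
```

Summing the upper tail directly is equivalent to computing one minus the
cumulative probability of at most x - 1 exceedances, which is the route
described in the text.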

We can also apply binomial tests using a two-sided alternative hypothesis
$H_1$: $p \ne 0.05$. We could do so by estimating a confidence interval
for the number of exceedances and checking whether x lies within this
interval. For example, if we want to test using a 5% significance level,
we would estimate a 95% confidence interval for x, the bounds of which
would delineate the lower and upper 2.5% tails of x's density function.
With n = 1,000 and p = 0.05, the 95% confidence interval for x is
[36, 66]. We would then accept the null if x falls within this range and
otherwise reject it.

A Normal Approximation 

Testing can be simplified further if we work with a normal approximation
to the binomial. Provided n is sufficiently large (and n would be
sufficiently large with the sample sizes that risk managers typically work
with), the distribution of x is approximately normal with mean np and
variance np(1 - p). This implies, in turn, that the variable
$z = (x - np)/\sqrt{np(1 - p)}$ is distributed as standard normal, and we
can test whether the observed value of z is compatible with this
distribution. For instance, if we wished to carry out a two-sided test, we
know that the 95% confidence interval for a standard normal is
[-1.96, +1.96], so we would accept the null if (and only if) z falls in
this range.
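
Applied to the earlier illustration (60 exceedances in 1,000 observations
with p = 0.05), the normal approximation can be sketched as follows. The
function name `exceedance_z` is hypothetical, and `NormalDist` comes from
the Python standard library.

```python
import math
from statistics import NormalDist

def exceedance_z(x, n, p):
    """Standardized number of exceedances under the binomial null."""
    return (x - n * p) / math.sqrt(n * p * (1.0 - p))

z = exceedance_z(60, 1000, 0.05)
two_sided_prob_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
accept_null = abs(z) <= 1.96   # two-sided test at the 5% significance level
```

Here z is roughly 1.45, which lies inside [-1.96, +1.96], so the two-sided
test does not reject the model.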

Tests of Independence 

Besides predicting that x should be binomial or approximately normal with
large samples, the null hypothesis of model adequacy often leads to the
prediction that exceedances should be independent. "Independence" means
that there should be no temporal pattern in the exceedance series; that
is, the probability of the next observation being an exceedance should be
independent of whether any previous observation was an exceedance or not.
Where this prediction arises, it is important that it be tested too: A bad
model might pass the earlier tests, but still be inadequate because it
produces predictable exceedances or clusters of exceedances that ought not
to arise. Evidence of exceedance clustering would suggest that the model
is misspecified, even if the model has the correct exceedance frequency.

One of the simplest independence tests is a runs test, in which we test
whether the number of runs in a time series is consistent with what we
would expect under independence. We can apply a runs test to any data that
are time-ordered and expressed in binary form, as is the case with the
observations in our exceedance series, which take either the value 0 or
the value 1. A run is then a sequence of consecutive identical numbers,
and the number of runs R is equal to the number of sign changes plus 1. If
u is the number of observations taking one value and v the number taking
the other value, then under the independence null the mean and variance of
the number of runs are, respectively:

$$\mu_R = 1 + \frac{2uv}{u + v} \qquad (3)$$

$$\sigma_R^2 = \frac{2uv(2uv - u - v)}{(u + v)^2 (u + v - 1)} \qquad (4)$$

If the total number of observations is large, then R is approximately
normal and $z = (R - \mu_R)/\sigma_R$ approximately standard normal, and
we can test accordingly.
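
A runs test on a 0/1 series can be sketched as below, using the mean and
variance of the number of runs under independence. The function name
`runs_test_z` and the two example series are hypothetical.

```python
import math

def runs_test_z(series):
    """Standardized number of runs in a binary (0/1) series."""
    u = sum(1 for s in series if s == 1)
    v = len(series) - u
    if u == 0 or v == 0:
        raise ValueError("the series must contain both values")
    # Number of runs = number of changes of value plus one.
    runs = 1 + sum(1 for a, b in zip(series, series[1:]) if a != b)
    mean = 1.0 + 2.0 * u * v / (u + v)
    var = 2.0 * u * v * (2.0 * u * v - u - v) / ((u + v) ** 2 * (u + v - 1))
    if var <= 0.0:
        raise ValueError("series too short for a runs test")
    return (runs - mean) / math.sqrt(var)

# Hypothetical extremes: perfectly clustered and perfectly alternating series.
z_clustered = runs_test_z([0] * 10 + [1] * 10)
z_alternating = runs_test_z([0, 1] * 10)
```

A strongly negative z indicates too few runs (clustering) and a strongly
positive z too many runs (excessive alternation); either would count
against independence in a two-sided test.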

A more sophisticated version of the same idea is suggested by Engle and
Manganelli (2004): They propose estimating a binary regression model (that
is, they regress the exceedance indicator $H_t$ against possible
explanatory variables, such as lagged returns or lagged squared returns)
and then test for the joint insignificance of the explanatory variables. A
binary regression approach is more powerful than a basic runs test because
it can take account of the impact of other possible variables, which a
runs test does not.


Conditional Testing (Christoffersen) 
Approach 

We can also carry out tests of the distribution and independence of x
within the same testing framework, and this takes us to the conditional
back-testing approach of Christoffersen (1998). His idea is to separate
out the particular predictions being tested and then test each prediction
separately. We begin by rephrasing the earlier frequency or unconditional
coverage test in likelihood ratio (LR) form.

Given that the observed frequency of exceedances is x/n, then under the
hypothesis/prediction of correct unconditional coverage, the test
statistic

$$LR_{uc} = -2\ln\left[(1 - p)^{n-x} p^x\right] + 2\ln\left[(1 - x/n)^{n-x} (x/n)^x\right] \qquad (5)$$

is distributed as a $\chi^2(1)$, a chi-squared with 1 degree of freedom.
As we can see from equation (5), this boils down to a test of whether the
empirical frequency x/n is "close" to the predicted frequency p.

Turning to the independence prediction, let $n_{ij}$ be the number of days
that state j occurred after state i occurred the previous day, where the
states refer to the occurrence or not of an exceedance, and let
$\pi_{ij}$ be the probability of state j in any given day, given that the
previous day's state was i. Under the hypothesis of independence, the test
statistic

$$LR_{ind} = -2\ln\left[(1 - \pi_2)^{n_{00}+n_{10}} \pi_2^{n_{01}+n_{11}}\right] + 2\ln\left[(1 - \pi_{01})^{n_{00}} \pi_{01}^{n_{01}} (1 - \pi_{11})^{n_{10}} \pi_{11}^{n_{11}}\right] \qquad (6)$$

is also distributed as a $\chi^2(1)$, and note that we can recover
estimates of the probabilities from

$$\pi_{01} = \frac{n_{01}}{n_{00} + n_{01}}, \qquad \pi_{11} = \frac{n_{11}}{n_{10} + n_{11}}, \qquad \pi_2 = \frac{n_{01} + n_{11}}{n_{00} + n_{10} + n_{01} + n_{11}} \qquad (7)$$

It follows that under the combined hypothesis of correct coverage and
independence the test statistic

$$LR_{cc} = LR_{uc} + LR_{ind} \qquad (8)$$

is distributed as $\chi^2(2)$. The Christoffersen approach enables us to
test both coverage and independence hypotheses at the same time. Moreover,
if the model fails such a test, this approach enables us to test each
hypothesis separately, and so establish whether the model fails because of
incorrect coverage or because of lack of independence.
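
The three likelihood ratio statistics can be computed from the exceedance
series alone. The following is a sketch under assumptions: the names
`christoffersen`, `xlogy`, `ratio`, and `chi2_sf` are hypothetical, the
convention 0 · ln 0 = 0 is used for empty cells, and the chi-squared tail
probabilities rely on the closed forms available for 1 and 2 degrees of
freedom.

```python
import math

def xlogy(x, y):
    """x * ln(y), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def ratio(num, den):
    """num / den, returning 0 when a state never occurs as the previous state."""
    return num / den if den > 0 else 0.0

def christoffersen(hits, p):
    """Return (LR_uc, LR_ind, LR_cc) for a 0/1 exceedance series."""
    n, x = len(hits), sum(hits)
    lr_uc = (-2.0 * (xlogy(n - x, 1.0 - p) + xlogy(x, p))
             + 2.0 * (xlogy(n - x, 1.0 - x / n) + xlogy(x, x / n)))
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(hits, hits[1:]):
        counts[(prev, cur)] += 1          # n_ij: state j follows state i
    n00, n01 = counts[(0, 0)], counts[(0, 1)]
    n10, n11 = counts[(1, 0)], counts[(1, 1)]
    pi01 = ratio(n01, n00 + n01)
    pi11 = ratio(n11, n10 + n11)
    pi2 = ratio(n01 + n11, n00 + n01 + n10 + n11)
    lr_ind = (-2.0 * (xlogy(n00 + n10, 1.0 - pi2) + xlogy(n01 + n11, pi2))
              + 2.0 * (xlogy(n00, 1.0 - pi01) + xlogy(n01, pi01)
                       + xlogy(n10, 1.0 - pi11) + xlogy(n11, pi11)))
    return lr_uc, lr_ind, lr_uc + lr_ind

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are supported here")

# Hypothetical series: correct 5% frequency, but all exceedances clustered.
clustered = [0] * 95 + [1] * 5
lr_uc, lr_ind, lr_cc = christoffersen(clustered, p=0.05)
prob_uc, prob_ind, prob_cc = chi2_sf(lr_uc, 1), chi2_sf(lr_ind, 1), chi2_sf(lr_cc, 2)
```

In this constructed example the coverage statistic is zero because the
exceedance frequency equals p exactly, while the independence statistic is
large, so the decomposition attributes the failure to clustering rather
than to coverage.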

Strengths and Limitations of 
Exceedance-Based Approaches 

These exceedance tests have the advantages that they have a simple
intuition, are easy to apply, and do not require a great deal of
information. However, they often lack power (that is, the ability to
identify bad models) except with very large sample sizes, because they
throw potentially valuable information away: Focusing on tests of
exceedances over VaR at a given confidence level is equivalent to throwing
away information about the model's forecasts of VaRs at other confidence
levels, and this discarded information often includes useful information
about the sizes of tail losses predicted by a risk model (or information
about VaRs at higher confidence levels). This can mean that a "bad" risk
model will pass an exceedance-based test if it generates an acceptably
accurate frequency of exceedances, even if its forecasts of losses larger
than VaR are very poor.

STATISTICAL BACK-TESTING OF VaRs AT
MULTIPLE CONFIDENCE LEVELS

This line of reasoning suggests that we should consider back-testing the
performance of a model's VaR forecasts over multiple confidence levels.
Indeed, pushed to the limit, it suggests that we consider back-testing a
model's VaR forecasts over all confidence levels at the same time. We
would proceed by applying the following transformation:

$$p_t = F_t(X_t) \qquad (9)$$

where $F_t(\cdot)$ is the (typically time-dependent) probability-integral
transformation (PIT) that maps the realized one-day loss or profit,
$X_t$, to its cumulative density value, where the forecast is made the
previous day. So, for example, if our model specifies that losses are
standard normal, then a value $X_t = 1.645$ would give us
$p_t = F_t(1.645) = 0.95$, and so forth.
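
The probability-integral transformation amounts to evaluating each day's
forecast distribution function at that day's realized outcome. The sketch
below assumes, purely for illustration, that the forecast densities are
normal; the function name `pit_series` and the inputs are hypothetical.

```python
from statistics import NormalDist

def pit_series(outcomes, forecast_dists):
    """Map each realized outcome to its forecast cumulative probability."""
    return [dist.cdf(x) for x, dist in zip(outcomes, forecast_dists)]

# Hypothetical forecasts: a standard normal on day 1, a wider normal on day 2.
forecasts = [NormalDist(0.0, 1.0), NormalDist(0.0, 2.0)]
outcomes = [1.645, 0.0]

p = pit_series(outcomes, forecasts)
```

The first value is approximately 0.95, matching the example in the text,
and the second is 0.5 because the outcome sits at the center of its
forecast distribution.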

We can now deduce that $p_t$ is stationary and distributed as standard
uniform under the hypothesis that the VaR model is adequate. $p_t$ is also
independent because consecutive values of $p_t$ have no common factors.
Hence $p_t$ is predicted to be independent and identically distributed
(IID) U(0,1) under the null hypothesis.

As an aside, it is worth noting at this point that the independence
assumption does not arise in cases where we have a multi-step-ahead as
opposed to a one-step-ahead VaR model: An example of the latter is a VaR
model that produces daily VaR forecasts over a daily forecast horizon; an
example of the former is a VaR model that produces daily VaR forecasts
over a multiday horizon. The forecast horizon is equal to one day in the
one case, and equal to more than one day in the other. The $p_t$ are
predicted to be independent for one-day-ahead VaR forecasts because
consecutive observations are not affected by common shocks; however, for
multiday forecasts, there is no independence prediction because
consecutive $p_t$ observations are subject to at least one common random
factor. For example, the two-day return over Monday and Tuesday and the
two-day return over Tuesday and Wednesday are both affected by the Tuesday
daily return. This means that they have a common random factor and are
therefore not independent. We will ignore multi-step-ahead models in the
rest of our discussion, but the reader should keep in mind that we cannot
assume independence for multi-step-ahead models or regard independence
tests applied to such models as tests of model adequacy.


Testing Uniformity 

Returning to the one-step-ahead case, we can now test our model by
applying conventional uniformity tests. One of the best known of these is
the Kolmogorov-Smirnov (KS) test. The KS test statistic D is then the
maximum distance between the predicted cumulative density F(x), which is a
45-degree line, and the empirical cumulative density $\hat{F}(x)$,
evaluated over each data point $X_t$:

$$D = \max_t \left|F(X_t) - \hat{F}(X_t)\right| \qquad (10)$$

The test value of the KS statistic is then compared to the relevant
critical value and the null is accepted or rejected accordingly. This test
is easy to implement because the test statistic is straightforward to
calculate and its critical values are easily obtained using Monte Carlo
simulation. However, the KS test tends to be more sensitive to the
distributional differences near the center of the distribution, and is
less sensitive at the tails. This is obviously a drawback when
back-testing VaR models, where we are usually much more interested in the
tail than in the central mass of a distribution.
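
For the uniform null the KS statistic is easy to compute from the sorted
PIT values. The sketch below uses the standard two-sided form, which
checks the empirical distribution function just before and just after each
jump; the function name `ks_uniform_statistic` and the sample values are
hypothetical.

```python
def ks_uniform_statistic(p_values):
    """KS distance between the empirical CDF of p_values and the U(0,1) CDF."""
    p = sorted(p_values)
    n = len(p)
    # Empirical CDF above the 45-degree line (evaluated right after each jump).
    d_plus = max((i + 1) / n - v for i, v in enumerate(p))
    # 45-degree line above the empirical CDF (evaluated just before each jump).
    d_minus = max(v - i / n for i, v in enumerate(p))
    return max(d_plus, d_minus)

# Hypothetical PIT values, evenly spread over the unit interval.
d = ks_uniform_statistic([0.1, 0.3, 0.5, 0.7, 0.9])
```

Critical values are not computed here; as the text notes, they can be
obtained by simulating the statistic for many uniform samples of the same
size.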

A way around this latter problem is to replace the KS test with a Kuiper
test. The Kuiper test statistic D* is the sum of the maximum amount by
which each distribution exceeds the other:

$$D^* = \max_t \left(F(X_t) - \hat{F}(X_t)\right) + \max_t \left(\hat{F}(X_t) - F(X_t)\right) \qquad (11)$$

The Kuiper test can be implemented in much the same way as the KS test:
Its test statistic is straightforward to calculate and its critical values
can be obtained by Monte Carlo simulation. The Kuiper test has the
advantage over the KS test that it is more sensitive to deviations in the
tail regions. It is also believed to be more robust to transformations in
the data, and to be good at detecting cyclical and other features in the
data. However, there is also evidence that it is very data intensive and
needs large datasets to get reliable results.
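
The Kuiper statistic for the uniform null adds the two one-sided
deviations instead of taking the larger of them. As with the KS sketch,
the function name `kuiper_uniform_statistic` and the sample values are
hypothetical.

```python
def kuiper_uniform_statistic(p_values):
    """Kuiper statistic of p_values against the U(0,1) distribution."""
    p = sorted(p_values)
    n = len(p)
    # Largest amount by which the empirical CDF exceeds the uniform CDF.
    d_plus = max((i + 1) / n - v for i, v in enumerate(p))
    # Largest amount by which the uniform CDF exceeds the empirical CDF.
    d_minus = max(v - i / n for i, v in enumerate(p))
    return d_plus + d_minus

# Hypothetical PIT values, evenly spread over the unit interval.
d_star = kuiper_uniform_statistic([0.1, 0.3, 0.5, 0.7, 0.9])
```

Because both one-sided deviations enter the sum, the statistic is never
smaller than the KS distance computed on the same data.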

We can also test uniformity by applying a textbook $\chi^2$ test to binned
(or classified) data. We divide the data into k classes and then compute
the test statistic:

$$\sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (12)$$

where $O_i$ is the observed frequency of data in bin i, and $E_i$ is the
expected frequency of data in bin i. Under the null hypothesis, this test
statistic is distributed as $\chi^2(k - c)$, where c is the number of
estimated parameters in the VaR model. The main disadvantage of the
$\chi^2$ test is that results are dependent on the way in which the data
are binned and binning is (largely) arbitrary. In using it, we should be
careful to check the sensitivity of results to alternative ways of binning
the data.
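
A binned test statistic can be sketched as follows, assuming k bins of
equal width on [0, 1] so that each bin has the same expected frequency
under uniformity. The function name `chi_square_uniform` and the sample
values are hypothetical.

```python
def chi_square_uniform(p_values, k):
    """Chi-squared statistic for p_values against U(0,1) with k equal bins."""
    n = len(p_values)
    observed = [0] * k
    for p in p_values:
        index = min(int(p * k), k - 1)   # a value of exactly 1.0 goes in the last bin
        observed[index] += 1
    expected = n / k                      # equal expected count per bin under the null
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical PIT values: evenly spread, and concentrated near zero.
even = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
stat_even = chi_square_uniform(even, k=5)
stat_bunched = chi_square_uniform([0.01] * 10, k=5)
```

Rerunning the function with different values of k is a simple way to carry
out the sensitivity check recommended in the text.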


Applying the Berkowitz 
Transformation and Testing for 
Standard Normality 

It is often more convenient to put the $p_t$ through a second (or
Berkowitz) transformation to make them standard normal under the null of
model adequacy; that is, we work with the transformed variable:

$$z_t = \Phi^{-1}(p_t) \qquad (13)$$

where $\Phi(\cdot)$ is the standard normal distribution function (see
Berkowitz, 2001). This second transformation is helpful because testing
for standard normality is more convenient than testing for standard
uniformity, and because a normal variable is more convenient when dealing
with temporal dependence. Under the null, $z_t$ will be distributed as IID
standard normal [denoted by IID N(0,1)].
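
The second transformation is a single call to the inverse standard normal
distribution function. In the sketch below, the function name
`berkowitz_transform` is hypothetical, and the clipping parameter `eps` is
an added assumption that keeps values of exactly 0 or 1 from producing an
error.

```python
from statistics import NormalDist

def berkowitz_transform(p_values, eps=1e-12):
    """Map PIT values in (0, 1) to standard normal quantiles."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in p_values]

# Hypothetical PIT values.
z = berkowitz_transform([0.95, 0.5, 0.05])
```

The three values are approximately 1.645, 0, and -1.645, so a PIT value of
0.95 maps back to the 1.645 outcome used in the earlier standard normal
example.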
Testing model adequacy now boils down to testing whether $z_t$ is
distributed as IID N(0,1). There are two distinct tasks here:

1. We need to test whether $z_t$ is N(0,1), taking as given that $z_t$ is
IID, and there are various tests we might apply. If $z_t$ is standard
normal, then it should have a zero mean, a variance of 1, a zero skew, and
a kurtosis of 3. Assuming IID, we can test the mean prediction using a
z-test or t-test, we can test the variance prediction using a variance
ratio test, and we can test the skewness and kurtosis predictions using a
Jarque-Bera test, which can also be regarded as a test of normality
itself. All these tests are conventional textbook tests and are easy to
apply.

2. We need to test whether $z_t$ is IID, and there are many tests of the
IID prediction. These include runs and binary regression tests, which we
have already discussed above. We can also estimate the autocorrelation
structure of our $z_t$ observations or fit an autoregressive moving
average (ARMA) process to them. All the parameters in an autocorrelation
function or an ARMA process should be insignificant, and we can test for
their significance using standard tests such as a Box-Pierce Q test.
Another possibility, if we have enough data, is to test independence using
a BDS test (Brock et al., 1987): a BDS test is very powerful, but also
data-intensive.

Since the hypothesis of model adequacy predicts both N(0,1) and IID, it is
important to note that the model must "pass" both types of test if
it is to "pass" overall.
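
Several of these checks reduce to simple sample moments. The sketch below
computes a z-statistic for the mean (using the unit variance implied by
the null), the Jarque-Bera statistic with its asymptotic chi-squared(2)
prob-value, and the lag-1 sample autocorrelation. The function name
`normality_diagnostics` and the data are hypothetical, and the selection
of diagnostics is ours rather than a complete battery of tests.

```python
import math

def normality_diagnostics(z):
    """Basic moment and dependence diagnostics for a transformed series z_t."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    rho1 = sum(a * b for a, b in zip(dev, dev[1:])) / (n * m2)
    return {
        "mean_z_stat": mean * math.sqrt(n),   # N(0,1) under the null
        "variance": m2,                        # should be close to 1
        "skewness": skew,                      # should be close to 0
        "kurtosis": kurt,                      # should be close to 3
        "jarque_bera": jb,
        "jb_prob_value": math.exp(-jb / 2.0),  # chi-squared(2) upper tail
        "lag1_autocorr": rho1,                 # should be close to 0
    }

# Hypothetical, deliberately tiny sample, used only to exercise the arithmetic.
diag = normality_diagnostics([-2.0, -1.0, 0.0, 1.0, 2.0])
```

With realistic sample sizes the same quantities would be compared against
their critical values; the chi-squared approximation for the Jarque-Bera
statistic is an asymptotic result and can be poor in small samples.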

Tests Applied to Truncated 
Distributions 

There are also situations where we are only
interested in part of the P/L distribution: For
example, we might be interested only in the
distribution of losses in excess of VaR. If we are
working to a confidence level $\alpha$, we can take
our earlier $p_t$ series and delete all nontail
observations from it. We then end up with a series
that is IID uniformly distributed over the interval
$[0, 1-\alpha]$, and this implies that $p_t/(1-\alpha)$
is IID uniformly distributed over the interval
[0,1]. We can test this prediction using one of the
uniformity tests discussed earlier. If we wish to,
we can apply the Berkowitz transformation to
$p_t/(1-\alpha)$ to obtain the series
$z_t = \Phi^{-1}(p_t/(1-\alpha))$, which is distributed
as IID N(0,1) under the null. We can then apply the
tests just discussed.
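
The truncation and transformation steps can be sketched as follows.
This is only an illustration: the function name tail_berkowitz and the
sample values are hypothetical, and we assume that the $p_t$ series is
already available and that tail observations are those with
$p_t < 1-\alpha$.

```python
from statistics import NormalDist


def tail_berkowitz(p, alpha):
    """Delete nontail observations, rescale the rest to (0, 1), map to normal scores."""
    tail_mass = 1.0 - alpha
    inv_cdf = NormalDist().inv_cdf
    z = []
    for p_t in p:
        # Keep only tail observations; 0 and 1 are excluded because the
        # inverse normal CDF is not finite there.
        if 0.0 < p_t < tail_mass:
            z.append(inv_cdf(p_t / tail_mass))
    return z


# Hypothetical p_t values: three lie in the tail at the 95% confidence level.
z_tail = tail_berkowitz([0.01, 0.50, 0.025, 0.04, 0.90], alpha=0.95)
```

The resulting series can then be passed to whatever normality and
independence tests we have chosen for the untruncated case.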


USING BACK-TESTS FOR 
DIAGNOSTIC PURPOSES 

We can also modify many of these back-test
procedures to help diagnose problems with our
VaR model. Model diagnosis is a key ingredient
of successful model building, and requires the
modeler to be on the lookout for evidence of
possible problems. So, to use an earlier example,
if we have 60 exceedances out of a sample of
1,000 and we are operating to a VaR at the 95%
confidence level, then we know that this is
associated with a prob-value of 0.0867. Were we
carrying out a formal back-test of model adequacy
at a conventional significance level such as 5%,
we would dismiss this result as statistically
insignificant because the significance level gives
the model the benefit of the doubt. However, for
diagnostic purposes we do not wish to give the
model the benefit of the doubt: Instead, we are
looking for evidence "against" the model, even
if that evidence is statistically "weak." In these
circumstances, a result like this would lead us
to suspect that the model has a tendency
to underestimate the VaR. A wise risk manager
would then start to ask whether other evidence
could be found that would confirm or refute
this suspicion. And, to put the same point a
little differently, the last thing a risk manager
should do in the face of such evidence is to wait
and do nothing till the evidence has become
overwhelming: The risk manager should act in
a timely manner on the basis of any reasonable
evidence available.
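
For readers who wish to reproduce a prob-value of this kind, the
sketch below computes the probability of observing at least x
exceedances under a binomial null. We assume here that the prob-value
in the example is this one-sided binomial tail probability; the
function name exceedance_prob_value is hypothetical.

```python
import math


def exceedance_prob_value(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), i.e., at least x exceedances under the null."""
    prob_fewer = sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(x)
    )
    return 1.0 - prob_fewer


# 60 exceedances in 1,000 observations with a 95% VaR (exceedance probability 0.05)
prob_value = exceedance_prob_value(60, 1000, 0.05)
```

The result can be compared with the 0.0867 quoted above.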

Independence tests can also be useful diagnostic
tools. If we apply an independence test
and the test result gives us some (not necessarily
strong) reason to suspect that the model
does not satisfy a valid independence prediction,
then we can interpret this evidence as
suggesting that there might be some dynamic
misspecification in our model: Even if the broad
coverage is about right, there might still be
something wrong with the updating of our VaR
forecasts from one day to the next. So, for example,
if we have a parametric VaR model, then we
might suspect that a key parameter in the model
was not being updated efficiently, and the obvious
suspects would be volatility or correlation
parameters. Again, the evidence might be
statistically "weak," but even weak evidence can
be useful in pointing to areas of weakness in the
model.

Another useful diagnostic is provided by the
empirical moments of the Berkowitz-transformed
series (see equation (13) above), which we saw
earlier is predicted to be standard normal under
the null of model adequacy. Some very useful
diagnostic information can then be obtained
by estimating the sample moments and considering
any departures from their predicted values:

• If the sample mean is different from zero, we
might suspect that the model's forecasts
are biased in one direction or the other.

• If the sample variance is less than 1, we might
suspect that the model's predicted dispersion is
too high, in which case the model might
overestimate risk; and if the sample variance is
greater than 1, we might suspect that the
predicted dispersion is too low and the model
underestimates risk.

• If the sample skew is positive or negative, we
might suspect that the forecasts are skewed
in one direction or the other.

• If the sample kurtosis is less than 3 or (as
is more likely in risk management contexts)
bigger than 3, we might ask ourselves if the
model is overestimating or underestimating
its tails.


In each of these cases, we should also check
the strength of the evidence, and we can do so
by applying the relevant tests and checking
their prob-values: The lower the prob-value, the
stronger the evidence against the model. However,
since we are especially concerned in risk
management with the possibility that the model
might underestimate risks, a sample variance
that considerably exceeds 1 or a sample
kurtosis that considerably exceeds 3 is potentially
important evidence that might warrant
further scrutiny.


RANKING ALTERNATIVE 
MODELS 

It is often the case that we are interested in
how different models compare to each other.
We can compare models using forecast evaluation
methods that give each model a score in
terms of some loss function; we then use the
loss scores to rank the models: the lower the
loss, the better the model. These approaches
are not statistical tests of model adequacy and
this means that they do not suffer from the
low power of tests such as frequency tests: This
makes them attractive for back-testing with the
datasets typically available in real-world
applications. In addition, they also allow us to
tailor the loss function to take account of
particular concerns: For example, we might be more
concerned about higher losses than lower losses,
and might therefore wish to give higher losses
a greater weight in our loss function.

The ranking process has four key ingredients
for each model:

1. A set of n paired observations: paired
observations of losses or profits for each period
and their associated VaR forecasts.

2. A loss function that gives each observation
a score $C_t$ depending on how the observed
loss or profit compares to the VaR forecasted
for that period.

3. A benchmark, which gives us an idea of the
score we could expect from a "good" model.

4. A score function, which takes as its inputs
our loss-function and benchmark values.


We need to specify the loss function, and a
number of different loss functions have been
proposed. Perhaps the most straightforward is
the binary loss function proposed by Lopez
(1998, p. 121), which gives exceedance
observations a value of 1 and other observations a
value of 0. $C_t$ is then as follows:

$$C_t = \begin{cases} 1 & \text{if } L_t > \mathrm{VaR}_t \\ 0 & \text{if } L_t \le \mathrm{VaR}_t \end{cases} \qquad (14)$$


This loss function is intended for the user who
is (exclusively) concerned with the frequency
of exceedances. The natural benchmark for this
loss function is p, the exceedance probability or
expected value $E(C_t)$. If we take our benchmark
to be the expected value of $C_t$ under the
null hypothesis that the model is "good," then
Lopez (1998) suggests that a good choice of
score function is the following quadratic
probability score (QPS) function:

$$\mathrm{QPS} = \frac{2}{n}\sum_{t=1}^{n}(C_t - p)^2 \qquad (15)$$


The QPS takes a value in the range [0,2], and
the closer the QPS-value to zero, the better the
model. We can therefore use the QPS (or some
similar score function) to rank our models, with
the better models having the lower scores. In
addition, the QPS criterion has the attractive
property that it (usually) encourages truth-telling
by VaR modelers: If VaR modelers wish to minimize
their QPS score, they will (usually) report
their VaRs "truthfully." This is a useful property
in situations where the back-tester and the VaR
modeler are different, and where the back-tester
might be concerned about the VaR modeler
reporting false VaR forecasts to alter the results of
the back-test.
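
A minimal sketch of the binary loss function (14) and the QPS (15) is
given below. The function names and the sample figures are
hypothetical; losses are entered as positive numbers, and p is the
exceedance probability implied by the VaR confidence level.

```python
def binary_loss(loss, var):
    """Lopez binary loss (14): 1 for an exceedance, 0 otherwise."""
    return 1 if loss > var else 0


def qps(losses, var_forecasts, p):
    """Quadratic probability score (15) for paired losses and VaR forecasts."""
    scores = [binary_loss(l, v) for l, v in zip(losses, var_forecasts)]
    n = len(scores)
    return 2.0 / n * sum((c - p) ** 2 for c in scores)


# Hypothetical data: four periods, two exceedances, 95% VaR (p = 0.05)
example_qps = qps([1.0, 3.0, 2.0, 5.0], [2.0, 2.0, 2.0, 2.0], p=0.05)
```

Computing the score for each candidate model on the same set of
observations gives a ranking, with lower scores preferred.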

A drawback of this loss function is that it
ignores the magnitude of tail losses. If we wish
to remedy this defect, Lopez suggests a second,
size-adjusted, loss function:

$$C_t = \begin{cases} 1 + (L_t - \mathrm{VaR}_t)^2 & \text{if } L_t > \mathrm{VaR}_t \\ 0 & \text{if } L_t \le \mathrm{VaR}_t \end{cases} \qquad (16)$$


This loss function allows for the sizes of tail
losses in a way that (14) does not: A model
that generates higher tail losses would generate
higher values of (16) than one that generates
lower tail losses, other things being equal.
However, with this loss function, there is no longer
a straightforward condition for the benchmark,
so we need to estimate the benchmark by some
other means (e.g., Monte Carlo simulation). The
size-adjusted loss function (16) also has the
drawback that it loses some of its intuition,
because squared monetary returns have no ready
monetary interpretation.

A way around this last problem is suggested
by Blanco and Ihle (1998), who suggest the
following loss function:

$$C_t = \begin{cases} (L_t - \mathrm{VaR}_t)/\mathrm{VaR}_t & \text{if } L_t > \mathrm{VaR}_t \\ 0 & \text{if } L_t \le \mathrm{VaR}_t \end{cases} \qquad (17)$$

This loss function gives each tail-loss observation
a weight equal to the tail loss divided by the
VaR. This has a nice intuition and ensures that
higher tail losses get awarded higher $C_t$ values
without the impaired intuition introduced
by squaring the tail loss. The benchmark for this
forecast evaluation procedure is also easy to
derive: The benchmark is equal to the difference
between the Expected Shortfall (ES) and the
VaR, divided by the VaR. However, the Blanco-Ihle
loss function also has a problem of its own:
Because (17) has the VaR as its denominator, it
is not defined if the VaR is zero, and can give
awkward answers if VaR gets "close" to zero
or becomes negative. We should therefore only
use it if we can be confident of the VaR being
sufficiently large and positive.

We therefore seek a size-based loss function
that avoids the squared term in the second
Lopez loss function, but also avoids denominators
that might be zero-valued. A promising
candidate is the tail loss itself:

$$C_t = \begin{cases} L_t & \text{if } L_t > \mathrm{VaR}_t \\ 0 & \text{if } L_t \le \mathrm{VaR}_t \end{cases} \qquad (18)$$

The expected value of the tail loss is of course
the ES, so we can choose the ES as our benchmark
and use a quadratic score function such
as:

$$\mathrm{QS} = \frac{2}{n}\sum_{t=1}^{n}(C_t - \mathrm{ES}_t)^2 \qquad (19)$$

This approach penalizes deviations of tail losses
from their expected value, which makes intuitive
sense. Moreover, because it is quadratic, it
gives very high tail losses much greater weight
than more common tail losses, and thereby
comes down hard on large losses.
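
The size-based loss functions (16) to (18) and the quadratic score
(19) can be sketched as follows. The function names are hypothetical,
losses are entered as positive numbers, and the ES forecasts are
assumed to be supplied by the model being scored. The guard against a
nonpositive VaR in the Blanco-Ihle function is our own addition,
reflecting the caveat noted above.

```python
def lopez_size_adjusted(loss, var):
    """Second Lopez loss function (16)."""
    return 1.0 + (loss - var) ** 2 if loss > var else 0.0


def blanco_ihle(loss, var):
    """Blanco-Ihle loss function (17); only sensible for a positive VaR."""
    if var <= 0:
        raise ValueError("VaR must be positive for the Blanco-Ihle loss")
    return (loss - var) / var if loss > var else 0.0


def tail_loss(loss, var):
    """Tail-loss function (18)."""
    return loss if loss > var else 0.0


def quadratic_score(losses, var_forecasts, es_forecasts):
    """Quadratic score (19), taken over all n paired observations."""
    n = len(losses)
    return 2.0 / n * sum(
        (tail_loss(l, v) - es) ** 2
        for l, v, es in zip(losses, var_forecasts, es_forecasts)
    )


# Hypothetical figures: one exceedance (loss 5 against VaR 2) in two periods
example_qs = quadratic_score([5.0, 1.0], [2.0, 2.0], [4.0, 4.0])
```

As with the QPS, the score is computed for each model on the same
observations and the models are ranked from lowest to highest score.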


KEY POINTS 

• In general, back-testing is the quantitative
evaluation of a model. When back-testing is
applied to a risk or probability density
forecasting model, it involves a comparison of
the model's density forecasts against
subsequently realized outcomes of the random
variable whose density is forecast.

• The main purposes of back-testing market
risk models are to test model adequacy, to
diagnose potential model problems, and to
compare or rank alternative models. A good
risk model should fare well by all three
criteria: It should pass its statistical tests, should
not generate any worrying diagnostics, and
should rank well in comparison to alternative
models.

• Because the typical market risk model is a
model that forecasts the value-at-risk of a
portfolio over one or more confidence levels
for a specified horizon, back-testing of market
risk models involves some comparison of
VaR forecasts against subsequently realized
values of profit or loss.

• Formal tests of market risk model adequacy
can be applied to the frequency and independence
of exceedance observations, but can
also be applied to forecasts of VaR at multiple
confidence levels.

• Comparable approaches can be used for
model diagnostic purposes, where the main
concern is not to test model adequacy in a
formal way, as such, but instead to gather
evidence of possible model misspecification.

• Simple loss-scoring approaches can be used to
rank the forecast performance of alternative
models.


Abstract: The measurement of liquidity risks is an underdeveloped area of market risk measurement
and management. Liquidity issues affect the estimation of conventional market risk measures,
but the measurement of liquidity risks is an important subject in its own right. Liquidity issues also
figure prominently in periods of market crisis. There are various easily implementable and often
complementary approaches to the estimation of liquidity-adjusted Value-at-Risk: These involve
modeling the bid-ask spread or the liquidity discount incurred when liquidating a position. There
are also approaches to the modeling of Liquidity-at-Risk, which deals with the riskiness of cash
flows, in both noncrisis and crisis situations.


Market practitioners often assume that markets
are liquid, that is, that we can liquidate or
unwind positions at going market prices, usually
taken to be the mean of bid and ask prices,
without too much difficulty or cost. This assumption
is very convenient and provides a justification
for the practice of marking positions to market
prices. However, it is often empirically
questionable, and the failure to allow for illiquidity
can seriously undermine market risk measurement. In
any case, liquidity risk is a major risk factor in
its own right, and we will often want to measure
it too.

This entry looks at liquidity issues and how
they affect the estimation of market and liquidity
risk measures. Liquidity issues affect market risk
measurement through their impact on standard
measures of market risk. In addition, because
effective market risk management involves an
ability to estimate and manage liquidity risk
itself, we also need to be able to estimate liquidity
risk, or liquidity-at-risk. Finally, since liquidity
problems are particularly prominent in market
crises, we also need to address how to estimate
crisis-related market risks and liquidity risks.
Accordingly, the main themes of this entry are:

• The nature of market liquidity and illiquidity, 
and their associated costs and risks. 

• The estimation of value-at-risk (VaR) in illiquid
or partially liquid markets, that is,
liquidity-adjusted VaR (or LVaR).

• Estimating liquidity-at-risk (LaR). 

• Estimating crisis-related liquidity risks. 

For convenience, and to be faithful to the
literature, we focus on the (discredited, but
computationally convenient) VaR risk measure, but
we should note that any of the approaches
suggested here can be adapted to estimate
superior risk measures such as coherent risk
measures (see, e.g., Artzner et al., 1999) or any
other quantile-based risk measures. For example,
estimates of these alternative risk measures
can be obtained using the "average tail VaR"
approach set out in Dowd (2005, Chapter 3):
This is based on the idea that, since the VaR is a
quantile, any of these other quantile-based risk
measures can be estimated as a weighted
average of VaRs predicated on a suitable range
of confidence levels.

LIQUIDITY AND LIQUIDITY 
RISKS 

The notion of liquidity refers to the ability of a
trader to execute a trade or liquidate a position
with little or no cost, risk, or inconvenience.
Liquidity is a function of the market, and depends
on such factors as the number of traders in the
market, the frequency and size of trades, the
time it takes to carry out a trade, and the cost
(and sometimes risk) of transacting. It also
depends on the commodity or instrument traded,
and more standardized instruments (e.g., FX or
equities) tend to have more liquid markets
than nonstandardized or tailor-made instruments
(e.g., over-the-counter [OTC] derivatives).
Markets vary greatly in their liquidity:
Markets such as the FX market and the
big stock markets are (generally) highly liquid;
but other markets are less so, particularly those
for many OTC instruments and instruments
that are usually held to maturity and, hence, are
rarely traded once initially bought. However,
even the "big" standardized markets are not
perfectly liquid: their liquidity fluctuates over
time and can fall dramatically in a crisis, so we
cannot take their liquidity for granted.

Imperfect liquidity also implies that there is 
no such thing as the going market price. Instead, 
there are two going market prices—an ask price, 
which is the price at which a trader sells, and 
a (lower) bid price, which is the price at which 
a trader buys. The "market" price often quoted 
is just an average of the bid and ask prices, and 
this price is fictional because no one actually 
trades at this price. The difference between the 
bid and ask prices is a cost of liquidity, and 
in principle we should allow for this cost in 
estimating market risk measures. 


The bid-ask spread also has an associated risk,
because the spread itself is a random variable.
This means there is some risk associated with
the price we can obtain, even if the fictional
mid-spread price is given. Other things being equal,
if the spread rises, the costs of closing out our
position will rise, so the risk that the spread will
rise should be factored into our risk measures
along with the usual market price risk.

We should also take account of a further
distinction. If our position is small relative to the
size of the market (e.g., because we are a very
small player in a very large market), then our
trading should have a negligible impact on the
market price. In such circumstances we can
regard the bid-ask spread as exogenous to us, and
we can assume that the spread is determined by
the market beyond our control. However, if our
position is large relative to the market, our
activities will have a noticeable effect on the market
itself and can affect both the market price and
the bid-ask spread. For example, if we suddenly
unload a large position, we should expect the
market price to fall and the bid-ask spread to
widen. This is partly because there is a limited
market, and prices must move to induce other
traders to buy. A second reason is a little more
subtle: Large trades often reveal information,
and the perception that they do will cause other
traders to revise their views. Consequently, a
large sale may encourage other traders to revise
downward their assessment of the prospects for
the instrument concerned, and this will further
depress the price. In these circumstances the
market price and the bid-ask spread are to some
extent endogenous (i.e., responsive to our trading
activities) and we should take account of
how the market might react to us when estimating
liquidity costs and risks. Other things
again being equal, the bigger our trade, the bigger
the impact we should expect it to have on
market prices.

In sum, we are concerned with both liquidity
costs and liquidity risks, and we need to take
account of the difference between exogenous and
endogenous liquidity. We now consider some of
the approaches available to adjust our estimates
of VaR to take account of these factors.


ESTIMATING
LIQUIDITY-ADJUSTED VaR

There are many ways we could estimate
liquidity-adjusted VaR. These vary in their
degrees of sophistication and in their ease (or
otherwise) of implementation, and there is no
single "best" method. However, sophisticated
approaches are not necessarily more useful than
more basic ones, and the best method, even if
we could establish what it is, is not necessarily
better than a collection of inferior ones. Instead,
what we really seek are simple-to-implement
(i.e., spreadsheet-executable) approaches that
are transparent in terms of their underlying
assumptions; in effect, we are looking for liquidity
"add-ons" that allow us to modify original
VaR estimates that were obtained without any
consideration for liquidity. We can then easily
assess the impact of our assumptions on our
estimates of VaR. Moreover, there is a premium on
compatibility, because different methods look
at different aspects of illiquidity, and it can be
helpful to combine them to get some sense of an
overall liquidity adjustment. Because of this, a
really good method might not always be as useful
as two inferior methods that actually work
well together.

Whichever models we use, we also need
to check their sensitivities: how does the
liquidity adjustment change as we change the
confidence level, holding period, or any other
parameters? A priori, we should have some
idea of what these should be (e.g., that the
liquidity adjustment should fall as the holding
period rises, etc.), and we need to satisfy
ourselves that the models we use have sensitivities
of the right sign and approximate magnitude.
Going further, we should also try to ensure that
models are calibrated against real data (e.g.,
bid-ask spread parameters should be empirically
plausible, etc.) and are properly stress-tested
and back-tested. In addition, we should keep
in mind that different approaches are often
suited to different problems, and we should
not seek a best approach to the exclusion of any
others. In the final analysis, liquidity issues are
much more subtle than they look, and there is
no established consensus on how we should
deal with them. So perhaps the best advice
is for risk measurers to hedge their bets, and
use different approaches to highlight different
liquidity concerns.

The Constant Spread Approach 

Ideally, if we had actual transaction prices,
we could infer the actual returns obtained by
traders, in which case conventional VaR methods
would take account of spread liquidity
factors without the need for any further
adjustment. In such cases, we would model actual
returns (taking account of how they depend
on market volume, etc.), infer a relevant
conditional distribution (e.g., a t distribution), and
plug the values of the parameters concerned into an
appropriate parametric VaR equation. For more on
how this might be done, see Giot and Grammig
(2003).

However, practitioners often lack such data
and have to work with market prices that are
averages of bid and ask prices. They might then
attempt to take account of liquidity factors by
working with the bid-ask spread, and the simplest
way to incorporate liquidity risk into a
VaR calculation is in terms of a spread that is
assumed to be constant. If we make this
assumption, the liquidity cost is then equal to half
the spread times the size of the position
liquidated. Using obvious notation, this means that
we add the following liquidity cost (LC) to a
"standard" VaR:

$$LC = \frac{1}{2}\,\mathit{spread} \times P \qquad (1)$$

where spread is expressed as actual spread
divided by the midpoint. For the sake of comparison
and using obvious notation, let us compare
this to a benchmark conventional lognormal
VaR with no adjustment for liquidity risk:

$$\mathrm{VaR} = P\,[1 - \exp(\mu_R - \sigma_R z_\alpha)] \qquad (2)$$

where the returns have been calculated using
prices that are the midpoints of the bid-ask
spread. The liquidity-adjusted VaR, LVaR, is
then given by:

$$\mathrm{LVaR} = \mathrm{VaR} + LC = P\left[1 - \exp(\mu_R - \sigma_R z_\alpha) + \frac{1}{2}\,\mathit{spread}\right] \qquad (3)$$

Setting $\mu_R = 0$ to clarify matters, the ratio of
LVaR to VaR is then

$$\frac{\mathrm{LVaR}}{\mathrm{VaR}} = 1 + \frac{\mathit{spread}}{2\,[1 - \exp(-\sigma_R z_\alpha)]} \qquad (4)$$

It is easy to show that the liquidity adjustment
(a) rises in proportion with the assumed spread,
(b) falls as the confidence level increases, and (c)
falls as the holding period increases. The
first and third of these are obviously correct, but
the second implication is one that may or may
not be compatible with one's prior expectations.

This approach is easy to implement and 
requires only minimal information, but the 
assumption of a constant spread is highly 
implausible, and it takes no account of any other 
liquidity factors. 
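
The constant spread adjustment is easily sketched in code. The
function names and parameter values below are illustrative assumptions
only (a position of 1, zero mean return, an annualized volatility of
25% scaled to one day over 250 trading days, and a 2% spread); they
are not taken from any calibrated example.

```python
import math
from statistics import NormalDist


def lognormal_var(P, mu_r, sigma_r, alpha):
    """Benchmark lognormal VaR, equation (2)."""
    z_alpha = NormalDist().inv_cdf(alpha)
    return P * (1.0 - math.exp(mu_r - sigma_r * z_alpha))


def constant_spread_lvar(P, mu_r, sigma_r, alpha, spread):
    """Liquidity-adjusted VaR with a constant spread, equation (3)."""
    liquidity_cost = 0.5 * spread * P  # equation (1)
    return lognormal_var(P, mu_r, sigma_r, alpha) + liquidity_cost


# Illustrative (assumed) parameters
P, sigma_daily, spread = 1.0, 0.25 / math.sqrt(250), 0.02
ratios = {
    alpha: constant_spread_lvar(P, 0.0, sigma_daily, alpha, spread)
    / lognormal_var(P, 0.0, sigma_daily, alpha)
    for alpha in (0.95, 0.99)
}
```

Comparing the two entries of ratios shows the relative adjustment
falling as the confidence level rises, as noted in point (b) above.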


The Exogenous Spread Approach 

A superior alternative is to assume that traders
face random spreads. If our position is sufficiently
small relative to the market, we can also
regard our spread risk as exogenous to us (i.e.,
independent of our own trading), for any given
holding period. We could assume any process
for the spread that we believe to be empirically
plausible. For example, we might believe that
the spread is normally distributed:

$$\mathit{spread} \sim N(\mu_{spread}, \sigma^2_{spread}) \qquad (5)$$

where $\mu_{spread}$ is the mean spread and $\sigma_{spread}$ is
the spread volatility. Alternatively, we might
use some heavy-tailed distribution to accommodate
excess kurtosis in the spread.


We could now estimate the LVaR using Monte 
Carlo simulation: We could simulate both P and 
the spread, incorporate the spread into P to 
get liquidity-adjusted prices, and then infer the 
liquidity-adjusted VaR from the distribution of 
simulated liquidity-adjusted prices. 
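
A minimal sketch of such a simulation is given below. Several details
are our own assumptions rather than part of the approach as described:
the end-of-period mid price is lognormal, the spread is drawn
independently from a normal distribution floored at zero, the position
is sold at the mid price less half the proportional spread, and the
LVaR is read off as the empirical quantile of the simulated losses.
The function name simulate_lvar is hypothetical.

```python
import math
import random


def simulate_lvar(P, mu_r, sigma_r, mu_spread, sigma_spread, alpha,
                  n_sims=50_000, seed=0):
    """Monte Carlo LVaR: alpha-quantile of simulated liquidity-adjusted losses."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        mid_price = P * math.exp(rng.gauss(mu_r, sigma_r))
        spread = max(rng.gauss(mu_spread, sigma_spread), 0.0)  # no negative spreads
        sale_price = mid_price * (1.0 - 0.5 * spread)  # assumed: sell at mid less half-spread
        losses.append(P - sale_price)
    losses.sort()
    return losses[min(int(alpha * n_sims), n_sims - 1)]


# Illustrative (assumed) parameters: 2% mean spread with 0.5% spread volatility
lvar_mc = simulate_lvar(1.0, 0.0, 0.25 / math.sqrt(250), 0.02, 0.005, 0.95)
```

A heavy-tailed spread distribution, or a spread that is correlated
with returns, could be substituted for the normal draw without
changing the rest of the procedure.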

However, in practice, we might take a shortcut
suggested by Bangia et al. (1999). They
suggest that we specify the liquidity cost (LC)
as:

$$LC = \frac{P}{2}\,(\mu_{spread} + k\,\sigma_{spread}) \qquad (6)$$

where k is some parameter whose value is to
be determined. The value of k could be determined
by a suitably calibrated Monte Carlo exercise,
but they suggest that a particular value
(k = 3) is plausible (e.g., because it reflects the
empirical facts that spreads appear to have excess
kurtosis and are negatively correlated with
returns, etc.). The liquidity-adjusted VaR, LVaR,
is then equal to the conventional VaR plus the
liquidity adjustment (6):

$$\mathrm{LVaR} = \mathrm{VaR} + LC = P\left[1 - \exp(\mu_R - \sigma_R z_\alpha) + \frac{1}{2}(\mu_{spread} + 3\sigma_{spread})\right] \qquad (7)$$

Observe that this LVaR incorporates (3) as a
special case when $\sigma_{spread} = 0$. It therefore retains
many of the properties of (3), but generalizes
from (3) in allowing for the spread volatility as
well. The ratio of LVaR to VaR is then:


$$\frac{\mathrm{LVaR}}{\mathrm{VaR}} = 1 + \frac{LC}{\mathrm{VaR}} = 1 + \frac{1}{2}\,\frac{\mu_{spread} + 3\sigma_{spread}}{1 - \exp(-\sigma_R z_\alpha)} \qquad (8)$$


This immediately tells us that the spread
volatility $\sigma_{spread}$ serves to increase the
liquidity adjustment relative to the earlier case. The
Bangia et al. framework was also further developed
by Erwan (2002), who presented empirical
results that are similar to the illustrative
ones presented here in suggesting that the
liquidity adjustment can make a big difference to
our VaR estimates.
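
The Bangia et al. shortcut can be sketched as follows, with k = 3 as
the default. The function name and the parameter values are
illustrative assumptions (position of 1, zero mean return, daily
volatility of 0.25 divided by the square root of 250, mean spread of
2%, and spread volatility of 0.5%).

```python
import math
from statistics import NormalDist


def bangia_lvar(P, mu_r, sigma_r, alpha, mu_spread, sigma_spread, k=3.0):
    """Return (VaR, LC, LVaR) under the exogenous spread approach, equations (6)-(7)."""
    z_alpha = NormalDist().inv_cdf(alpha)
    var = P * (1.0 - math.exp(mu_r - sigma_r * z_alpha))
    liquidity_cost = 0.5 * P * (mu_spread + k * sigma_spread)
    return var, liquidity_cost, var + liquidity_cost


# Illustrative (assumed) parameters
var, lc, lvar = bangia_lvar(1.0, 0.0, 0.25 / math.sqrt(250), 0.95, 0.02, 0.005)
lvar_to_var = lvar / var
```

Setting sigma_spread to zero recovers the constant spread adjustment,
so the difference between the two ratios isolates the contribution of
spread volatility.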

Endogenous-Price Approaches

The previous approaches assume that prices are
exogenous and therefore ignore the possibility
of the market price responding to our trading.
However, we have also noted that this is often
unreasonable, and we may wish to make a
liquidity adjustment that reflects the response
of the market to our own trading. If we sell,
and the act of selling reduces the price, then
this market-price response creates an additional
loss relative to the case where the market price
is exogenous, and we need to add this extra loss
to our VaR. The liquidity adjustment will also
depend on the responsiveness of market prices
to our trade: The more responsive the market
price, the bigger the loss.

We can estimate this extra loss in various
ways, but the simplest is to make use of some
elementary economic theory. We begin with the
notion of the price elasticity of demand, $\eta$,
defined as the ratio of the proportional change
in price divided by the proportional change in
quantity demanded:

$$\eta = \frac{\Delta P / P}{\Delta N / N} < 0; \qquad \Delta N / N > 0 \qquad (9)$$

where in this context N is the size of the market
and $\Delta N$ is the size of our trade. $\Delta N/N$ is
therefore the size of our trade relative to the size of
the market. The impact of the trade on the price
is therefore

$$\frac{\Delta P}{P} = \eta\,\frac{\Delta N}{N} \qquad (10)$$

We can therefore estimate $\Delta P/P$ on the basis
of information about $\eta$ and $\Delta N/N$, and both of
these can be readily guessed at using a combination
of economic and market judgement. The
LVaR is then:

$$\mathrm{LVaR} = \mathrm{VaR}\left(1 - \frac{\Delta P}{P}\right) = \mathrm{VaR}\left(1 - \eta\,\frac{\Delta N}{N}\right) \qquad (11)$$

bearing in mind that the change in price is
negative. The ratio of LVaR to VaR is therefore:

$$\frac{\mathrm{LVaR}}{\mathrm{VaR}} = 1 - \eta\,\frac{\Delta N}{N} \qquad (12)$$

This gives us a very simple liquidity adjustment
that depends on only two easily calibrated
parameters. It is even independent of the VaR
itself: The adjustment is the same regardless of
whether the VaR is normal, lognormal, etc.

The ratio of LVaR to VaR thus depends entirely
on the elasticity of demand $\eta$ and the size of our
trade relative to the size of the market ($\Delta N/N$).
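
Because the adjustment depends on only two parameters, it reduces to
a one-line calculation. The sketch below uses purely illustrative
values (an elasticity of -0.5 and a trade equal to 20% of the market);
the function name is hypothetical.

```python
def endogenous_lvar_ratio(elasticity, trade_share):
    """Ratio of LVaR to VaR under the endogenous-price approach, equation (12)."""
    if elasticity >= 0:
        raise ValueError("the elasticity is assumed to be negative")
    if trade_share < 0:
        raise ValueError("the relative trade size must be nonnegative")
    return 1.0 - elasticity * trade_share


# Illustrative (assumed) values: eta = -0.5, trade equal to 20% of the market
ratio_endogenous = endogenous_lvar_ratio(-0.5, 0.2)
```

Multiplying any VaR estimate by this ratio gives the corresponding
LVaR, whatever distribution was used to obtain the VaR.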

This type of approach is easy to implement, 
and it is of considerable use in situations where 
we are concerned about the impact on VaR of 
endogenous market responses to our trading 
activity, as might be the case where we have 
large portfolios in thin markets. However, it is 
also narrow in focus and entirely ignores bid- 
ask spreads and transactions costs. 

On the other hand, the fact that this approach 
focuses only on endogenous liquidity and the 
earlier ones focus on exogenous liquidity means 
that this last approach can easily be combined 
with one of the others; in effect, we can add 
one adjustment to the other. Thus, two very 
simple approaches can be added to produce 
an adjustment that addresses both exogenous 
and endogenous liquidity risk. This combined 
adjustment is given by 


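$$\left.\frac{\mathrm{LVaR}}{\mathrm{VaR}}\right|_{combined} = \left.\frac{\mathrm{LVaR}}{\mathrm{VaR}}\right|_{exogenous} \times \left.\frac{\mathrm{LVaR}}{\mathrm{VaR}}\right|_{endogenous} \qquad (13)$$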


The Liquidity Discount Approach 

A more sophisticated approach is suggested
by Jarrow and Subramanian (1997). They consider
a trader who faces an optimal liquidation
problem: the trader must liquidate his or her
position within a certain period of time to maximize
expected utility, and seeks the best way
to do so. Their approach is impressive, as it
encompasses exogenous and endogenous market
liquidity, spread cost, spread risk, an endogenous
holding period, and an optimal liquidation
policy.

Their analysis suggests that we should modify
the traditional VaR in three ways. First, instead
of using some arbitrary holding period,
we should use an optimal holding period determined
by the solution to the trader's expected-utility
optimization problem, which takes into
account liquidity considerations and the possible
impact of the trader's own trading strategy
on the prices obtained. We should also add
the average liquidity discount to the trader's
losses (or subtract it from our prices) to take
account of the expected loss from the selling
process. Finally, their analysis also suggests that
the volatility term should take account of the
volatility of the time to liquidation and the
volatility of the liquidity discount factor, as well
as the volatility of the underlying market price.

To spell out their approach more formally,
assume that prices between trades follow a geometric
Brownian motion with parameters $\mu$ and
$\sigma$. The current time is 0 and the price at time t
is p(t), so that geometric returns log(p(t)/p(0))
are normally distributed. However, the prices
actually obtained from trading are discounted
from p(t); more specifically, the prices obtained
are p(t)c(s), where c(s) is a random
quantity-dependent proportional discount factor, s is the
amount traded, $0 \le c(s) \le 1$ and, other things
being equal, c(s) falls as s rises. Any order
placed at time t will also be subject to a
random execution lag $\Delta(s)$, and therefore take
place at time $t + \Delta(s)$. Other things again being
equal, the execution lag $\Delta(s)$ rises with s: Bigger
orders usually take longer to carry out. Our
trader has S shares and wishes to maximize the
present value of his or her current position,
assuming that it is liquidated by the end of some
horizon t, taking account of all relevant factors,
including both the quantity discount c(s) and
the execution lag $\Delta(s)$. After solving
this problem, they produce the following
expression for the liquidity-adjusted VaR:


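$$
\begin{aligned}
\mathrm{LVaR} &= P\left\{ E\!\left[\ln\!\left(\frac{p(\Delta(S))\,c(S)}{p(0)}\right)\right] + \mathrm{std}\!\left[\ln\!\left(\frac{p(\Delta(S))\,c(S)}{p(0)}\right)\right] z_{\alpha} \right\} \\
&= P\left\{ \left(\mu - \frac{\sigma^2}{2}\right)\mu_{\Delta(S)} + \mu_{\ln c(S)} + \left[ \sigma\sqrt{\mu_{\Delta(S)}} + \left(\mu - \frac{\sigma^2}{2}\right)\sigma_{\Delta(S)} + \sigma_{\ln c(S)} \right] z_{\alpha} \right\} \qquad (14)
\end{aligned}
$$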


where all parameters have the obvious
interpretations. This expression differs from the
conventional VaR in three ways. First, the
liquidation horizon $t$ in the conventional VaR is
replaced by the expected execution lag
$\mu_{\Delta(S)}$ in selling $S$ shares. Clearly,
the bigger is $S$, the longer the expected
execution lag. Second, the LVaR takes account of
the expected discount $\mu_{\ln c(S)}$ on the
shares to be sold. And, third, the volatility
$\sigma$ in the conventional VaR is supplemented by
additional terms related to $\sigma_{\Delta(S)}$
and $\sigma_{\ln c(S)}$, which reflect the
volatilities of the execution time and the quantity
discount. Note, too, that if our liquidity
imperfections disappear, then
$\mu_{\Delta(S)} = t$, $\sigma_{\Delta(S)} = 0$,
and $c(S) = 1$ (which in turn implies
$\mu_{\ln c(S)} = \sigma_{\ln c(S)} = 0$) and our
LVaR (14) collapses to a conventional VaR as a
special case, which is exactly as it should be.

Using this LVaR expression requires estimates
of the usual Brownian motion parameters $\mu$ and
$\sigma$, as well as estimates of the liquidity
parameters $\mu_{\Delta(S)}$, $\sigma_{\Delta(S)}$,
$\mu_{\ln c(S)}$ and $\sigma_{\ln c(S)}$, all of
which are fairly easily obtained. The approach is
therefore not too difficult to implement. All we
then have to do is plug these parameters into (14)
to obtain our LVaR.
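
As a rough illustration of how expression (14) might be evaluated, the following Python sketch plugs parameter values into the second line of the formula. The function name, the argument names, and all numerical inputs are hypothetical, and the sign convention is an assumption: the caller passes a lower-tail normal quantile, so the result is a negative P/L figure whose magnitude is the loss.

```python
import math
from statistics import NormalDist


def liquidity_adjusted_var(position_value, mu, sigma, mu_lag, sigma_lag,
                           mu_ln_c, sigma_ln_c, z_alpha):
    """Evaluate the second line of expression (14) for given inputs.

    mu, sigma        : drift and volatility of the price process
    mu_lag, sigma_lag: mean and std dev of the execution lag Delta(S)
    mu_ln_c, sigma_ln_c: mean and std dev of ln c(S), the log discount
    z_alpha          : standard normal quantile chosen by the caller
    """
    drift = mu - 0.5 * sigma ** 2
    expected_log_return = drift * mu_lag + mu_ln_c
    std_log_return = (sigma * math.sqrt(mu_lag)
                      + drift * sigma_lag
                      + sigma_ln_c)
    return position_value * (expected_log_return + std_log_return * z_alpha)


# Assumed convention: lower-tail 5% quantile (about -1.645).
z_alpha = NormalDist().inv_cdf(0.05)

# Perfect liquidity: lag equals the horizon t = 0.04 years, no discount.
conventional = liquidity_adjusted_var(1_000_000, 0.10, 0.25,
                                      0.04, 0.0, 0.0, 0.0, z_alpha)

# Imperfect liquidity: longer and uncertain lag, random price discount.
adjusted = liquidity_adjusted_var(1_000_000, 0.10, 0.25,
                                  0.05, 0.01, -0.01, 0.005, z_alpha)

print(f"conventional: {conventional:,.0f}  liquidity-adjusted: {adjusted:,.0f}")
```

With the liquidity terms switched off, the function reduces to the conventional lognormal expression, which mirrors the special case noted in the text; with them switched on, the loss figure is larger in magnitude.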


ESTIMATING 
LIQUIDITY-AT-RISK (LAR) 

We turn now to liquidity-at-risk (LaR), some¬ 
times also known as cash flow-at-risk (CFaR). 
LaR (or CFaR) relates to the risk attached to 
prospective cash flows over a defined horizon 
period, and can be defined in terms analogous 
to the VaR. Thus, the LaR is the maximum likely 
cash outflow over the horizon period at a spec¬ 
ified confidence level: for example, the 1-day 
LaR at the 95% confidence level is the maxi¬ 
mum likely cash outflow over the next day, at 
the 95% confidence level, and so forth. A posi¬ 
tive LaR means that the likely worst outcome, 
from a cash flow perspective, is an outflow of 
cash; and a negative LaR means that the likely 
worst outcome is an inflow of cash. The LaR is 
the cash flow equivalent to the VaR, but whereas 
VaR deals with the risk of losses (or profits), LaR
deals with the risk of cash outflows (or inflows). 

These cash flow risks are quite different from 
the risks of liquidity-related losses. Nonethe¬ 
less, they are closely related to these latter risks, 
and we might use LaR analysis as an input to 
evaluate them. Indeed, the use of LaR for such 
purposes is an important liquidity management 
tool. 

An important point to appreciate about LaR is 
that the amounts involved can be very different 
from the amounts involved with VaR. Suppose 
for the sake of illustration that we have a large 
market-risk position that we hedge with a fu¬ 
tures hedge of much the same amount. If the 
hedge is a good one, the basis or net risk re¬ 
maining should be fairly small, and our VaR 
estimates should reflect that low basis risk and 
be relatively small themselves. However, the 
futures hedge leaves us exposed to the possibil¬ 
ity of margin calls, and our exposure to margin 
calls will be related to the size of the futures 
position, which corresponds to the gross size 
of our original position. Thus, the VaR depends 
largely on the netted or hedged position, whilst 
the LaR depends on the larger gross position. If 
the hedge is a good one, the basis risk (or the 
VaR) will be low relative to the gross risk of the 
hedge position (or the LaR), and so the LaR can 
easily be an order of magnitude greater than the 
VaR. On the other hand, there are also many 
market risk positions that have positive VaR, 
but little or no cash flow risk (e.g., a portfolio 
of long European option positions, which gen¬ 
erates no cash flows until the position is sold or 
the options expire), and in such cases the VaR 
will dwarf the LaR. So the LaR can be much 
greater than the VaR or much less than it, de¬ 
pending on the circumstances. 
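
The hedged-position argument above can be made concrete with a small worked example. The Python sketch below is only an illustration under stated assumptions: zero-mean normal daily P/L, equal volatility on the spot and futures legs, a hypothetical hedge correlation, and cash flows that arise solely from variation margin on the futures leg. None of the numbers come from the text.

```python
import math
from statistics import NormalDist


def hedged_var_and_lar(gross_position, daily_vol, hedge_correlation,
                       confidence=0.95):
    """Return (VaR of the net hedged position, LaR from futures margin)."""
    z = NormalDist().inv_cdf(confidence)
    # Net P/L = spot P/L minus futures P/L; with equal volatilities the
    # residual (basis) volatility is vol * sqrt(2 * (1 - correlation)).
    net_vol = daily_vol * math.sqrt(2.0 * (1.0 - hedge_correlation))
    var_net = z * gross_position * net_vol
    # Assumption: the daily margin call equals the futures P/L, so the
    # cash flow risk scales with the gross futures position.
    lar_margin = z * gross_position * daily_vol
    return var_net, lar_margin


var_net, lar_margin = hedged_var_and_lar(100e6, 0.01, 0.995)
print(f"VaR (net): {var_net:,.0f}  LaR (margin): {lar_margin:,.0f}")
print(f"LaR / VaR: {lar_margin / var_net:.1f}")
```

With a hedge correlation of 0.995 the ratio of LaR to VaR is 10, which is the "order of magnitude" gap the paragraph describes; a weaker hedge narrows the gap.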

As we might expect, the LaR is potentially 
sensitive to any factors or activities, risky or 
otherwise, that might affect future cash flows. 
These include: 

• Borrowing or lending, the impact of which on
future cash flows is obvious.


• Margin requirements on market risk positions 
that are subject to daily marking-to-market. 

• Collateral obligations, such as those on
swaps, which can generate inflows or outflows
of cash depending on the way the
market moves. Collateral obligations can
also change when counterparties like brokers
alter them in response to changes in volatility,
and collateral requirements on credit-sensitive
positions (e.g., default-risky
debt or credit derivatives) can change in
response to credit events such as credit
downgrades.

• The exercise of options, including the exercise
of convertibility features on convertible debt
and call features on callable debt, which can
trigger unexpected cash flows.

• Changes in risk management policy; for in¬ 
stance, a switch from a futures hedge to an 
options hedge can have a major impact on 
cash flow risks, because the futures position 
is subject to margin requirements and mark¬ 
ing to market whilst a (long) option position 
is not. 

Two other points are also worth emphasizing 
here. The first is that obligations to make cash 
payments often come at bad times for the firms 
concerned, because they are often triggered by 
bad events. The standard example is where a 
firm suffers a credit downgrade that leads to an 
increase in its funding costs, and yet this same 
event also triggers a higher collateral require¬ 
ment on some existing (e.g., swap) position and 
so generates an obligation to make a cash pay¬ 
ment. It is axiomatic in many markets that firms 
get hit when they are most vulnerable. The second
point is that positions that might be similar
from a market risk perspective (e.g., a futures
hedge and an options hedge) might have very
different cash flow risks. The difference in cash
flow risks arises not so much because of
differences in market risk characteristics, but
because the positions have different credit risk
characteristics, and it is the measures taken to
manage the credit risk (the margin and collateral
requirements, and so on) that generate the
differences in cash flow risks.

We can estimate LaR using many of the same 
methods used to estimate VaR and other mea¬ 
sures of market risk. One approach, suggested 
by Singer (1997), is to use our existing VaR esti¬ 
mation tools to estimate the VaRs of marginable 
securities only (i.e., those where P/L translates 
directly into cash flows), thus allowing us to in¬ 
fer an LaR directly from the VaR. We could then 
combine this LaR estimate with comparable figures
from other sources of liquidity risk within
the organization (e.g., estimates of LaR
arising from the corporate treasury) to produce
an integrated measure of firm-wide liquidity 
risk. The beauty of this strategy is that it makes 
the best of the risk measurement capabilities 
that already exist within the firm, and effec¬ 
tively tweaks them to estimate liquidity risks. 

However, this approach is also fairly rough 
and ready, and cannot be relied upon when 
the firm faces particularly complex liquidity 
risks. In such circumstances, it is often better to 
build a liquidity-risk measurement model from 
scratch, and we can start by setting out the ba¬ 
sic types of cash flow to be considered. These 
might include: 

• Known certain (or near certain) cash flows
(e.g., income from government bonds, etc.).
These are very easy to handle because we
know them in advance.

• Unconditional uncertain cash flows (e.g., income
from default-risky bonds, etc.). These
are uncertain cash flows, which we model
in terms of their probability density functions
(pdfs) (i.e., we choose appropriate distributions,
assign parameter values, etc.).

• Conditional uncertain cash flows. These are
uncertain cash flows that depend on other
variables (e.g., a cash flow might depend on
whether we proceeded with a certain investment,
and so we would model the cash flow
in terms of a pdf, conditional on that investment);
other conditioning variables that
might trigger cash flows could be interest
rates, exchange rates, decisions about major
projects, and so forth.

Once we specify these factors, we can then 
construct an appropriate engine to carry out 
our estimations. The choice of engine would de¬ 
pend on the types of cash flow risks we have to 
deal with. For instance, if we had fairly uncom¬ 
plicated cash flows we might use an historical 
simulation or variance-covariance approach, or 
some specially designed term-structure model; 
however, since some cash flows are likely to
be dependent on other factors such as discrete
random variables (e.g., downgrades or
defaults), it might not be easy to tweak such
methods to estimate LaRs with sufficient accuracy.
In such circumstances, it might be better to
resort to simulation methods, which are much 
better suited to handling discrete variables and 
the potential complexities of cash flows in larger 
firms. 
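
A minimal sketch of such a simulation engine is given below, using only the Python standard library. Every cash flow, probability, and distribution in it is hypothetical and chosen purely for illustration: one certain inflow, one inflow from a default-risky bond, a normally distributed margin flow, and a collateral call triggered by a discrete downgrade event whose probability is conditional on the market move.

```python
import random


def simulate_lar(n_trials=20000, confidence=0.95, seed=42):
    """Monte Carlo estimate of LaR: the outflow quantile at `confidence`."""
    rng = random.Random(seed)
    known_inflow = 1.0                      # certain coupon income
    outflows = []
    for _ in range(n_trials):
        # Unconditional uncertain flow: coupon lost if the bond defaults.
        risky_inflow = 0.0 if rng.random() < 0.02 else 0.5
        # Market-driven flow: variation margin (positive = cash received).
        margin_flow = rng.gauss(0.0, 2.0)
        # Conditional flow: a downgrade is assumed more likely after an
        # adverse market move, and it triggers a fixed collateral call.
        downgrade_prob = 0.10 if margin_flow < 0 else 0.02
        collateral_call = 3.0 if rng.random() < downgrade_prob else 0.0
        net_inflow = known_inflow + risky_inflow + margin_flow - collateral_call
        outflows.append(-net_inflow)        # LaR is stated as an outflow
    outflows.sort()
    index = min(int(confidence * n_trials), n_trials - 1)
    return outflows[index]


lar_95 = simulate_lar()
print(f"Simulated 95% LaR: {lar_95:.2f}")
```

The discrete downgrade event is handled with a single conditional draw, which is the kind of feature that is awkward to capture in a variance-covariance framework.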

Another alternative is to use scenario anal¬ 
ysis. We can specify liquidity scenarios, such 
as those arising from large changes in interest 
rates, default by counterparties, the redemption 
of putable debt, calls for collateral on repos and 
derivatives, margin calls on swaps or futures 
positions, and so forth. We would then (as best 
we could) work through the likely/possible 
ramifications of each scenario, and so get an 
idea of the liquidity consequences associated 
with each scenario. Such exercises can be very
useful, but, as with all scenario analyses, they
indicate only what could happen if a scenario
occurs; they do not tell us anything about the
probabilities associated with those scenarios
or about the LaR itself.

ESTIMATING LIQUIDITY IN 
CRISES 

We now consider liquidity in crisis situations. 
As we all know, financial markets occasionally 
experience major crises—these include, for ex¬ 
ample, the stock market crash of 1987, the ERM 
crisis of 1992, the Russian default crisis of the 
summer of 1998, and, of course, the many liquidity
problems experienced since the onset of
the financial crisis in August 2007. Typically, 
some event occurs that leads to a large price 
fall. This event triggers a huge number of sell 
orders, traders become reluctant to buy, and the 
bid-ask spread rises dramatically. At the same 
time, the flood of sell orders can overwhelm 
the market and drastically slow down the time 
it takes to get orders executed. Selling orders 
that would take minutes to execute in normal 
times instead take hours, and the prices eventu¬ 
ally obtained are often much lower than sellers 
had anticipated. Market liquidity dries up, and 
does so at the very time market operators need 
it most. Assumptions about the market—and 
in particular, about market liquidity—that hold 
in "normal" market conditions can thus break 
down when markets experience crises. This 
means that estimating crisis liquidity is more 
than just a process of extrapolation from LaR 
under more normal market conditions: We need 
to estimate crisis-liquidity risks using methods 
that take into account the distinctive features of 
a crisis—large losses, high bid-ask spreads, and 
so forth. 

One way to carry out such an exercise is by
applying "crashmetrics" (Wilmott, 2000, Chapter
58). To give a simple example, we might have a
position in a single derivatives instrument, and
the profit/loss $\Pi$ on this instrument is given
by a delta-gamma approximation:

$$
\Pi = \delta\,\Delta S + \frac{\gamma}{2}(\Delta S)^2 \qquad (15)
$$

where $\Delta S$ is the change in the stock price,
$\delta$ is the delta, and $\gamma$ is the gamma of
the instrument. Setting the derivative
$\delta + \gamma\,\Delta S$ to zero shows that, for
$\gamma > 0$, the maximum loss occurs when
$\Delta S = -\delta/\gamma$ and is equal to:

$$
L_{\max} = -\Pi_{\min} = \frac{\delta^2}{2\gamma} \qquad (16)
$$

The worst-case cash outflow is therefore
$m\delta^2/(2\gamma)$, where $m$ is the margin or
collateral requirement. This approach can also be
extended to handle the other Greek parameters (the
vegas, thetas, rhos, etc.), multi-option
portfolios, counterparty risk, and so forth. The
basic idea, of identifying worst-case outcomes and
then evaluating their liquidity consequences, can
also be implemented in other ways. For example, we
might identify the worst-case outcome as the
expected outcome at a chosen confidence level, and
we could estimate this (e.g., using extreme-value
methods) as the ES at that confidence level. The
cash outflow would then be $m$ times this ES.
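
The worst-case calculation in (15) and (16) is simple enough to check numerically. The sketch below assumes a positive gamma (otherwise the quadratic has no interior minimum); the function name and the input values are hypothetical.

```python
def worst_case_delta_gamma(delta, gamma, margin_rate):
    """Worst-case move, maximum loss, and cash outflow for a delta-gamma P/L."""
    if gamma <= 0:
        raise ValueError("an interior worst case requires gamma > 0")
    worst_move = -delta / gamma                         # minimizes the P/L
    min_pnl = delta * worst_move + 0.5 * gamma * worst_move ** 2
    max_loss = -min_pnl                                 # delta**2 / (2 * gamma)
    return worst_move, max_loss, margin_rate * max_loss


worst_move, max_loss, cash_outflow = worst_case_delta_gamma(
    delta=50.0, gamma=2.0, margin_rate=0.1)
print(worst_move, max_loss, cash_outflow)
```

For these inputs the worst move is -25, the maximum loss is 625 (that is, 50 squared divided by 4), and a 10% margin requirement turns that into a cash outflow of 62.5.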

There are also other ways we can estimate 
crisis-LaR. Instead of focusing only on the high 
losses associated with crises, we can also take 
account of the high bid-ask spreads and/or the
high bid-ask spread risks associated with crises. 
We can do so, for example, by estimating these 
spreads (or spread risks), and inputting these 
estimates into the relevant liquidity-adjusted 
VaR models discussed earlier. 

However, these suggestions (i.e., Greek- and 
ES-based) are still rather simplistic, and with 
complicated risk factors—such as often arise 
with credit-related risks—we might want a 
more sophisticated model that was able to take 
account of the complications involved, such as: 

• The discreteness of credit events. 

• The interdependency of credit events. 

• The interaction of credit and market risk fac¬ 
tors (e.g., the ways in which credit events de¬ 
pend, in part, on market risk factors). 

• Complications arising from the use of 
credit-enhancement methods such as net¬ 
ting arrangements, periodic settlement, credit 
derivatives, credit guarantees, and credit trig¬ 
gers (see, e.g., Wakeman, 1998). 

These complicating factors are best handled 
using simulation methods tailor-made for the 
problems concerned. 

The obvious alternative to probabilistic ap¬ 
proaches to the estimation of crisis-liquidity 
is to use crisis-scenario analyses. We would 
imagine a big liquidity event—a major market 
crash, the default of a major financial institu¬ 
tion or government, the outbreak of a war, or 
whatever—and work through the ramifications
for the liquidity of the institution concerned.
One attraction of scenario analysis in this con¬ 
text is that we can work through scenarios in as 
much detail as we wish, and so take proper ac¬ 
count of complicated interactions such as those 
mentioned in the last paragraph. This is harder 
to do using probabilistic approaches, which are 
by definition unable to focus on any specific sce¬ 
narios. However, as with all scenario analyses, 
the results of these exercises are highly subjec¬ 
tive, and the value of the results is critically 
dependent on the quality of the assumptions 
made. 

KEY POINTS 

• Liquidity refers to the ability to execute a 
trade or liquidate a position with little or no 
cost or inconvenience. 

• Liquidity is a function of the market and de¬ 
pends on the type of position traded and 
sometimes the size and trading strategy of 
an individual trader. 

• Liquidity risks are those associated with the
prospect of imperfect market liquidity, and can
relate to risk of loss or risk to cash flows.

• There are two main aspects to liquidity risk 
measurement: the measurement of liquidity- 
adjusted measures of market risk (e.g., 
liquidity-adjusted value-at-risk, LVaR) and 
the measurement of liquidity risks per se (e.g., 
liquidity-at-risk, LaR). 

• There are a number of easily implementable 
and often complementary approaches to the 
estimation of liquidity-adjusted measures of 
market risk: the constant spread, exogenous 
spread, and endogenous price approaches, and 
the liquidity discount approach. 


• These approaches can produce risk estimates
that differ substantially from the risk esti¬ 
mates obtained if liquidity is ignored. 

• There are a number of approaches to the es¬ 
timation of liquidity risks in noncrisis situa¬ 
tions. These include both LaR approaches and 
scenario analyses. 

• The LaR can be much greater than the VaR 
or much less than the VaR, depending on the 
circumstances. 

• Crisis-related liquidity risks can be estimated
using "crashmetrics" or scenario analyses hy¬ 
pothecated on crisis events such as a dry-up 
in market liquidity. 
Abstract: Asset returns are often not normally distributed and exhibit several stylized empirical
facts: fat tails, skewness, finite variance, time scaling, and volatility clustering. Modeling the tail 
distribution of asset returns plays an essential role in downside risk management. The "left tail" of 
the distribution is where market crashes or crises occur. Downside risk can be measured in terms 
of conditional value-at-risk and estimated by fat-tailed and skewed models such as Levy stable, 
truncated Levy flight, skewed Student's t, mixture of normal distributions, and GARCH models.
These fat-tailed and skewed models have different characteristics in describing the tail distribution 
of asset returns. The objective is to select appropriate ones that can accurately model the downside 
risk. 


The financial crisis of 2008 has led many practi¬ 
tioners and academics to reassess the adequacy 
of the return distribution models, in particular, 
the left tail. This entry focuses on modeling the 
left fat tails since they reflect market crashes or 
crises and play an essential role in downside 
risk management. 

The most common model assumes that asset
returns are normally (Gaussian) distributed
(see Bachelier, 1900). In other words,
the returns follow a random walk or Brownian
motion. This model is natural if one assumes
the return over a time interval to be
the result of many small independent shocks,
which leads to a Gaussian distribution by the
central limit theorem. However, empirical studies
have observed that return distributions
are more leptokurtic, or fat-tailed, than Gaussian
distributions.

A normal distribution model assumes that an
asset return that is three standard deviations
below its arithmetic mean (popularly referred
to as a "three-sigma event") has a probability
of only approximately 0.13%; that is, roughly
once every 750 observations. For example, from
January 1926 to March 2010, the S&P 500 total
return index had a monthly mean return of 0.93%
and a monthly standard deviation of 5.54%. A
negative three-sigma event would be a return
lower than -15.69%. During this time period
of 1,010 months, there were 10 monthly returns
worse than -15.69% (the three-sigma event), as
shown in Table 1, with the most recent loss of
-16.79% in October 2008 ranking ninth.
Table 1 The Worst 10 Monthly Returns for the S&P
500 (from 1/1926 to 3/2010)

Month        S&P 500 (%)
Sep 1931     -29.73
Mar 1938     -24.87
May 1940     -22.89
May 1932     -21.96
Oct 1987     -21.52
Apr 1932     -19.97
Oct 1929     -19.73
Feb 1933     -17.72
Oct 2008     -16.79
Jun 1930     -16.25

Source: Morningstar Encorr.


This implies the probability of a three-sigma 
event is about 1% rather than 0.13%, or eight 
times greater than we would expect under a 
normal distribution. Hence, a normal distribu¬ 
tion fails to describe the "fat" or "heavy" tails 
of the stock market. 
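
The comparison can be reproduced in a few lines of Python. The sketch below uses only figures quoted in the text (the monthly mean and standard deviation, and 10 exceedances in 1,010 months); the variable names are illustrative.

```python
from statistics import NormalDist

mean_pct, stdev_pct = 0.93, 5.54          # monthly figures from the text
threshold = mean_pct - 3 * stdev_pct      # negative three-sigma return

# Probability of falling three standard deviations below the mean
# under a normal distribution.
normal_prob = NormalDist().cdf(-3.0)

# Observed frequency: 10 such months out of 1,010.
empirical_prob = 10 / 1010
ratio = empirical_prob / normal_prob

print(f"threshold: {threshold:.2f}%")
print(f"normal: {normal_prob:.4%}  empirical: {empirical_prob:.2%}")
print(f"ratio: {ratio:.1f}")
```

The unrounded normal probability is about 0.135%, so the ratio comes out a little above 7; rounding the two probabilities to 1% and 0.13% first gives the "about eight times" quoted above.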

Many statistical models have been put forth to
account for the heavy tails. We discuss several
standard and popular fat-tailed models, such as
Mandelbrot's Levy stable hypothesis (see
Mandelbrot, 1963), the Student's t-distribution
(see Blattberg and Gonedes, 1974), the mixture of
normal distributions (see Clark, 1973), and GARCH
(see Bollerslev, 1986) models. There are many
other fat-tailed candidates, and this entry does
not aim at being exhaustive. Instead, we select
representative models and illustrate them
through examples so that practitioners may
have some intuition about these practically
implementable models.

Along the way, we introduce a relatively 
new fat-tailed and skewed model: the truncated 
Levy flight (TLF). Another name for the TLF 
is the tempered stable distribution. The TLF 
model has a few interesting properties that we 
will illustrate later, such as possessing fat tails, 
skewness, finite moments, and time scaling. Of 
course, these quantitative models are not the
only tool, and they need to be integrated with
judgmental analyses and other estimates, but
they represent a good starting point for the
management of downside risk.

DOWNSIDE RISK MEASURE 

Before we dive into the discussions of fat-tailed 
models, we need to specify an appropriate 
downside risk measure. A popular downside 
risk measure is value-at-risk (VaR), which is an 
estimate of the loss that we expect to be ex¬ 
ceeded with a given level of probability (e.g., 
5%) over a specified time period. VaR has been 
recommended as a way of measuring risk by 
regulators and various financial industry advi¬ 
sory committees. 

Conditional value-at-risk (CVaR), a closely re¬ 
lated measure to VaR, is derived by taking a 
weighted average between the VaR and losses 
exceeding the VaR. Other terms for CVaR in¬ 
clude mean shortfall, tail VaR, and expected tail 
loss. Studies such as Rockafellar and Uryasev 
(2000), for example, have shown that CVaR has 
more attractive properties than VaR. Specifically,
CVaR is a coherent measure of risk as
proved by Pflug (2000) in the sense of Artzner
et al. (1999). One of the properties of a coherent
measure is subadditivity; that is, the risk of a
combination of investments is at most as large
as the sum of the individual risks. VaR is not
always subadditive, which means that the VaR of
a portfolio with two instruments may be greater than
the sum of individual VaRs of these two instru¬ 
ments. In contrast, CVaR is subadditive. There¬ 
fore, CVaR is a more appropriate measure of 
downside risk. 
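
To make the two measures concrete, the sketch below computes a historical VaR and CVaR from the ten worst monthly S&P 500 returns listed in Table 1. Because those are the ten worst of 1,010 months, they form the full tail at a probability level of roughly 1%; the function name and the choice of a simple equally weighted tail average are assumptions made for illustration.

```python
def historical_var_cvar(returns, n_tail):
    """Historical VaR and CVaR (as positive losses) from the n_tail worst returns."""
    losses = sorted((-r for r in returns), reverse=True)   # largest loss first
    tail = losses[:n_tail]
    var = tail[-1]                  # smallest loss that still lies in the tail
    cvar = sum(tail) / len(tail)    # average of the losses at or beyond the VaR
    return var, cvar


# Ten worst monthly returns (%) from Table 1; the remaining 1,000 months
# are all better than -16.25% and so would not enter the tail.
worst_ten = [-29.73, -24.87, -22.89, -21.96, -21.52,
             -19.97, -19.73, -17.72, -16.79, -16.25]

var_tail, cvar_tail = historical_var_cvar(worst_ten, n_tail=10)
print(f"VaR: {var_tail:.2f}%  CVaR: {cvar_tail:.2f}%")
```

The VaR at this level is 16.25% while the CVaR is about 21.14%, which shows how CVaR responds to the size of the losses beyond the VaR rather than only to their frequency.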

LEVY STABLE DISTRIBUTION 

Levy distributions are stable; that is, the sum of
two independent random variables, characterized
by the same Levy distribution of tail index
$\alpha$, is itself characterized by a Levy
distribution of the same index. In other words,
the functional form of the distribution is
maintained if we sum up independent, identically
distributed Levy stable random variables. The
characteristic function of the Levy stable
distribution is (Levy, 1925):

$$
\ln \varphi(q) =
\begin{cases}
i\delta q - \gamma |q|^{\alpha}\left[1 - i\beta\,\dfrac{q}{|q|}\tan\!\left(\dfrac{\pi}{2}\alpha\right)\right] & \text{for } \alpha \neq 1 \\[2ex]
i\delta q - \gamma |q|\left[1 + i\beta\,\dfrac{q}{|q|}\,\dfrac{2}{\pi}\ln|q|\right] & \text{for } \alpha = 1
\end{cases}
$$


The probability density function is obtained
by performing the inverse Fourier transform
on the characteristic function. The four
parameters associated with the Levy stable
distribution are: $\alpha$ determines the tail
weight or the distribution's kurtosis with
$0 < \alpha \le 2$; $\beta$ determines the
distribution's skewness; $\gamma$ is a scale
parameter; and $\delta$ is a location parameter.
One can generate univariate stable distributed
returns through a numerical software package, for
example, the one written by John Nolan (2009).¹
(In his software, the function "stablernd()" takes
four parameters, $\alpha$, $\beta$, $\gamma$, and
$\delta$, and generates random returns that
follow a Levy stable distribution. For empirical
analyses, these four parameters can be estimated
by the software's function "stablefit()".)
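
For readers without access to that package, stable random numbers can also be drawn with the Chambers-Mallows-Stuck method. The Python sketch below is a stand-in written for illustration, not Nolan's code: the function name `stable_rvs` is hypothetical, it handles only the case where alpha differs from 1, and it assumes that `gamma` acts as a multiplicative scale on a standardized draw, which may not match the scale convention of the characteristic function above or of any particular package.

```python
import math
import random


def stable_rvs(alpha, beta, gamma, delta, size, seed=None):
    """Draw `size` Levy stable variates by the Chambers-Mallows-Stuck method.

    Assumptions of this sketch: alpha != 1, and gamma/delta enter as
    scale * x + location applied to a standardized draw x.
    """
    if not (0.0 < alpha <= 2.0) or alpha == 1.0:
        raise ValueError("this sketch covers 0 < alpha <= 2 with alpha != 1")
    if not (-1.0 <= beta <= 1.0):
        raise ValueError("beta must lie in [-1, 1]")
    rng = random.Random(seed)
    t = beta * math.tan(math.pi * alpha / 2.0)
    shift = math.atan(t) / alpha
    norm = (1.0 + t * t) ** (1.0 / (2.0 * alpha))
    draws = []
    for _ in range(size):
        v = math.pi * (rng.random() - 0.5)      # uniform on (-pi/2, pi/2)
        w = rng.expovariate(1.0)                # exponential with mean 1
        x = (norm * math.sin(alpha * (v + shift)) / math.cos(v) ** (1.0 / alpha)
             * (math.cos(v - alpha * (v + shift)) / w) ** ((1.0 - alpha) / alpha))
        draws.append(gamma * x + delta)
    return draws


# Illustrative inputs: the monthly S&P 500 estimates reported in Table 2,
# treated here as if gamma were a plain scale factor (an assumption).
monthly = stable_rvs(1.42, -0.12, 0.024, 0.010, size=1000, seed=1)

# Sanity check: alpha = 2 is the Gaussian case with variance 2 * gamma**2.
gaussian_case = stable_rvs(2.0, 0.0, 1.0, 0.0, size=20000, seed=1)
```

The Gaussian special case provides a convenient check on the sampler, since the sample variance should then be close to 2 for a unit scale.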

In 1963, Mandelbrot modeled cotton prices
with a Levy stable process (Mandelbrot, 1963).
Mandelbrot observed that in addition to being
fat-tailed, the returns show another interesting
property: time scaling. This means that the
distributions of returns have similar functional
forms for different time intervals, ranging from
one day to one month. The time scaling property
is very appealing as it allows the sum of
two independent Levy stable distributed variables
to be stable distributed, with the same
stability index $\alpha$. The normal distribution
is a special case of the Levy stable distribution,
and it scales in the same way: the sum of two
normally distributed variables is also normally
distributed.

Figure 1 shows the time scaling of the S&P 500
index returns at time intervals of 1, 2, 3, and 5
days. The scaling variable for a Levy stable
process of index $\alpha$ is
$Z = x/(\Delta t)^{1/\alpha}$, where $x$ is the
return over a time interval $\Delta t$. The best
fit gives $\alpha = 1.5$, and a good data collapse
can be observed in Figure 1.

Mandelbrot's finding was later supported by 
Fama's study on stocks (Fama, 1965). A Levy 
stable distribution model has fat tails and obeys 
scaling properties, but it has an infinite vari¬ 
ance, which conflicts with empirical observa¬ 
tions that the return variance is finite. For 
example, extensive analyses on high-frequency 
data (ranging from 1 minute to 1 day) for the 
1,000 largest companies provided evidence that 
the returns have finite variance (Gopikrishnan 
et al., 1998). Infinite variance complicates the
task of risk estimation, as well as the application
of mean-variance portfolio construction.

Figure 1 The Time Scaling of the S&P 500 Index with a Stability Index $\alpha = 1.5$
(S&P 500 Index Total Returns, Jan 1950 - Mar 2010)

Figure 2 The Distributions of S&P 500 Monthly Returns Fitted by the Log-Stable and Lognormal
Models (series shown: Historical, Log-Stable, Lognormal)

Figure 2 illustrates the log-stable and log¬ 
normal distributions in fitting the distribution 
of monthly S&P 500 returns (also see Martin, 
Rachev, and Siboulet, 2003). Log-stable distri¬ 
bution applies the stable distribution to log- 
returns. The vertical axis of Figure 2 is in log 
scale with a base of 10, and this helps to view 
the tails of the distribution more clearly. It is 
clear that the lognormal distribution fails to fit 
the return distribution below —15% (the above- 
mentioned three-sigma events). The log-stable 
distribution fits the tail well, but it extends far 
beyond the historical maximum loss or gain 
with nonnegligible probabilities, which even¬ 
tually results in an infinite variance. In other 
words, the tail for the log-stable distribution is 
perhaps too fat. 

The infinite variance associated with the sta¬ 
ble distribution induces a challenging problem 
in risk estimation. In practice, what is needed 
is a model with a distribution falling between 
the normal and stable distributions so that its 
tail is appropriately fat, but finite. By truncat¬ 
ing the extreme tails of the stable distribution, a 
model named the truncated Levy flight has such 
properties. 


Truncated Levy Flight 

The TLF model was first introduced by Man¬ 
tegna and Stanley (1994) in the physics liter¬ 
ature, and it has drawn widespread attention 
since then. Koponen (1995) modified it in such 
a way as to allow an analytical calculation of 
the characteristic function and determination 
of the complete probability density distribu¬ 
tion. Another name for the TLF is the tempered 
stable distribution—introduced and extended 
by Boyarchenko and Levendorskii (2000), Carr 
et al. (2002), Rosinski (2007), and Kim et al. 
(2008, 2010). Another application is the so- 
called smoothly truncated stable distribution 
introduced by Menn and Rachev (2009). 

In this entry, we focus on the simplest TLF 
model by Mantegna and Stanley (1994). The 
probability density function (PDF) of a simple 
TLF process is defined as: 

$$
P(x) =
\begin{cases}
0, & x < -l \\
P_{\text{Levy}}(x), & -l \le x \le l \\
0, & x > l
\end{cases}
$$

where $P_{\text{Levy}}(x)$ is the PDF of return $x$
for a Levy stable distribution and $l$ is the
cutoff length for the truncation. It can be seen
that the truncation is abrupt. Alternative TLF
models are similar and have in general smoother
truncations in the form of exponential tails.

Table 2 Parameter Estimates with the Log-TLF Model for Monthly S&P 500,
Weekly MSCI EM, and Weekly MSCI EAFE Returns

Log-TLF              α      β       γ       δ        Cutoff Length
S&P 500 Monthly      1.42   -0.12   0.024   0.010    6.8
MSCI EM Weekly       1.58   -0.40   0.015   0.0054   8.0
MSCI EAFE Weekly     1.79   -0.52   0.014   0.0033   10.0

To simulate a TLF process from a Levy stable 
process, we apply a truncation method on the 
Levy stable distributed returns generated in the 
previous section so that the return series follows 
a TLF model. These truncated returns are then 
used in the distribution analyses and CVaR es¬ 
timates, as well as the Monte Carlo simulations. 

The truncation is simply implemented, for example,
by truncating returns that are beyond
8-sigma for the MSCI Emerging Market index
weekly returns or 6.8-sigma for S&P 500
monthly returns. The estimates of the five
parameters are shown in Table 2. In the table, we
choose the cutoff length in such a way that it is
slightly larger than the historical maximum loss
(in terms of standard deviation) over the entire
historical period. The cutoff length shown in
the table is normalized. One can think of a
normalized cutoff length of 6 as a six-sigma event.
The other four parameters are estimated by the
maximum likelihood method.
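
The abrupt truncation can be sketched as a simple rejection step: draw from the untruncated distribution and discard anything outside the cutoff. In the Python illustration below, reading "truncating" as discarding is an assumption, the Cauchy distribution (a Levy stable law with tail index 1 and zero skewness) stands in for a fitted stable model so that the example stays self-contained, and the scale and historical standard deviation are hypothetical values.

```python
import math
import random


def truncated_draws(draw, cutoff, size):
    """Keep calling draw() and discard values outside [-cutoff, cutoff]."""
    kept = []
    while len(kept) < size:
        x = draw()
        if -cutoff <= x <= cutoff:
            kept.append(x)
    return kept


rng = random.Random(7)
scale = 0.02                     # hypothetical scale of the stable law


def cauchy_draw():
    # Inverse-CDF draw from a Cauchy distribution, used here as a
    # simple example of a Levy stable variate.
    return scale * math.tan(math.pi * (rng.random() - 0.5))


historical_sigma = 0.05          # hypothetical historical std dev of returns
cutoff = 6.8 * historical_sigma  # normalized cutoff length of 6.8
tlf_sample = truncated_draws(cauchy_draw, cutoff, 5000)

mean = sum(tlf_sample) / len(tlf_sample)
variance = sum((x - mean) ** 2 for x in tlf_sample) / len(tlf_sample)
print(f"largest |return|: {max(abs(x) for x in tlf_sample):.4f}")
print(f"sample variance: {variance:.6f}")
```

Because every retained draw lies within the cutoff, the sample variance is bounded by the square of the cutoff, which is the practical sense in which truncation restores a finite variance.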

An interesting feature of the TLF model is its 
time scaling behavior. Mantegna and Stanley 
(1994, 1999) show that for a small time interval
(e.g., a minute), the TLF distribution approxi¬ 
mates a Levy stable distribution with Levy sta¬ 
ble scaling; while for a significantly large but 
finite time interval (e.g., a year), the TLF dis¬ 
tribution slowly converges to a Gaussian dis¬ 
tribution. In other words, the TLF undergoes 
a crossover from a Levy stable distribution to 
a Gaussian distribution as the time interval in¬ 
creases. This crossover is consistent with an in¬ 
dependent empirical study of the distribution 
of daily, weekly and monthly returns for which 
a progressive convergence to a Gaussian pro¬ 
cess is deemed to be observed (Akgiray and 
Booth, 1988). 

Figure 3 shows the convergence of the TLF
from the Levy stable distribution at a small time
interval to the Gaussian distribution at a large
time interval. It shows that as the time interval
increases from one month to one year and
finally to five years, the normalized return
distribution converges from the approximate Levy
stable distribution (one-month interval) to the
normal distribution (five-year interval).

Figure 3 Time Scaling of the TLF Process
(series shown: One-month, One-year, Five-years, Normal)

The truncation is able to mathematically solve 
the infinite variance problem inherent in the 
stable distribution. In fact, the truncation leads 
to the advantage that all four moments are fi¬ 
nite. An interesting question is whether there 
are economic rationales for the truncation, even 
though the empirical evidence of finite vari¬ 
ance is convincing. The truncation implies an 
upside or downside boundary for the returns. 
For the left tail, it is easy to see that the return is 
bounded by —100% due to limited liability for 
shareholders for unleveraged indexes or portfo¬ 
lios. However, the existence of the boundary for 
the upside tail is debatable and it may require 
extensive separate research. Factors that can 
limit an infinite positive gain for a large market 
index such as the S&P 500 may include competi¬ 
tive industries, business cycles, government in¬ 
tervention such as antitrust law and increasing 
interest rates, contrarian strategies that lead to 
mean reversion of returns, and so on. Funda¬ 
mental "intrinsic valuation" indicates that the 
asset prices should be commensurate with the 
overall economic growth, which is limited by 
population growth, labor resources, productiv¬ 
ity, and so on. 

On the drawback side, like the normal or 
Levy stable distribution model, the TLF model 
assumes an independent and identically dis¬ 
tributed process and therefore it cannot de¬ 
scribe the time-dependent volatility or volatility 
clustering observed in market data. Volatility 
clustering means that a period of high volatil¬ 
ity tends to be followed by high volatility and 
a period of low volatility is likely followed by 
low volatility. 

An attempt to address this drawback is to 
assume TLF innovations instead of Gaussian 
innovations in GARCH models. A few studies have
investigated the option pricing problem with
GARCH dynamics and non-Gaussian
innovations. For example, Menn and Rachev 
(2009) considered smoothly truncated stable in¬ 
novations in order to provide a practical frame¬ 
work to extend option pricing theory to the 
Levy stable model. Kim et al. (2010) studied 
parametric models based on tempered stable 
innovations, and they showed that the GARCH 
model with tempered stable innovations ex¬ 
plains both asset price behavior and European 
option prices better than the normal GARCH 
model. 


STUDENT'S t-DISTRIBUTION 

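The Student's t-distribution is well documented
in the literature. Its probability density function
is given by:

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where $\nu$ is the degrees of freedom. The
Student's t-distribution coincides with the Cauchy
distribution for $\nu = 1$, and approaches the
Gaussian for $\nu \to \infty$. Finite variance only
exists for $\nu > 2$.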
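
The two limiting cases can be checked numerically. The sketch below evaluates the standard (unit-scale) Student's t density using only the Python standard library; the function names are illustrative, and the density used here is the textbook form, which may differ in scaling from the normalized version used for the fits in this entry.

```python
import math


def student_t_pdf(x: float, nu: float) -> float:
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    # Use log-gamma so that large nu does not overflow the gamma function.
    log_norm = (math.lgamma((nu + 1.0) / 2.0)
                - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))


def cauchy_pdf(x: float) -> float:
    """Standard Cauchy density, the nu = 1 case."""
    return 1.0 / (math.pi * (1.0 + x * x))


def normal_pdf(x: float) -> float:
    """Standard normal density, the limit as nu grows without bound."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


print(student_t_pdf(1.5, 1.0), cauchy_pdf(1.5))   # nu = 1 matches the Cauchy
print(student_t_pdf(0.7, 1e6), normal_pdf(0.7))   # large nu is close to normal
print(student_t_pdf(4.0, 4.0), normal_pdf(4.0))   # nu = 4 has a much fatter tail
```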

Blattberg and Gonedes (1974) proposed that
returns follow a Student's t-distribution.
Markowitz and Usmen (1996) found that the daily
log-return data of the S&P 500 index can be
fitted by the Student's t-distribution with about
4.5 degrees of freedom. Hurst and Platen (1997)
reached a similar conclusion. Platen and
Sidorowicz (2007) investigated the log-returns of
a variety of diversified world stock indexes in
different currency denominations by applying the
maximum likelihood ratio test to the large class
of generalized hyperbolic distributions, and
showed that the Student's t-distribution with
about four degrees of freedom was the best fit
among the models they tested.

The Student's t-distribution is symmetric, thus
it cannot model skewness. To model negative
skewness, Hansen (1994) introduced the skewed
Student's t-distribution, at the cost of one more
parameter to be estimated.

The Student's t-distribution has fat tails but
does not obey time scaling: the sum of two
independent Student's t-distributed variables is
not a Student's t-variable with the same degrees
of freedom. It also cannot model volatility
clustering.

The kurtosis of the Student's t-distribution is
given by $3 + 6/(\nu - 4)$ and it is only defined
for $\nu > 4$. In other words, the kurtosis is
infinite when $\nu$ is less than or equal to 4,
and the skewness tends to be unstable for
$\nu < 4$. In order to avoid an infinite kurtosis,
we set the minimum $\nu$ as 4.1 when the maximum
likelihood estimate gives a value of $\nu$ less
than 4 (shown as MLE-$\nu$ in Table 3). Our
numerical simulations show that the CVaR estimate
is not sensitive to this small change of $\nu$.
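
The flooring rule can be sketched as follows. The function names are illustrative, and applying the floor as a simple maximum is a simplifying assumption: it also lifts an estimate of exactly 4.0 to 4.1, which is what Table 3 reports for the MSCI EM index. The kurtosis is written in its non-excess form, so a Gaussian has kurtosis 3.

```python
import math

MIN_NU = 4.1  # floor used in the text to keep the kurtosis finite


def floored_nu(mle_nu: float, min_nu: float = MIN_NU) -> float:
    """Degrees of freedom used in simulation: the MLE, but never below the floor."""
    return max(mle_nu, min_nu)


def student_t_kurtosis(nu: float) -> float:
    """Kurtosis 3 + 6 / (nu - 4); treated as infinite when nu <= 4, as in the text."""
    if nu <= 4.0:
        return math.inf
    return 3.0 + 6.0 / (nu - 4.0)


# MLE values as listed in Table 3.
for name, mle in [("S&P 500 Monthly", 3.6), ("MSCI EM Weekly", 4.0), ("MSCI EAFE Weekly", 4.4)]:
    nu = floored_nu(mle)
    print(f"{name}: MLE nu = {mle}, used nu = {nu}, kurtosis = {student_t_kurtosis(nu):.1f}")
```

Because the kurtosis grows without bound as the degrees of freedom approach 4 from above, the theoretical kurtosis at the floor is large, and sample kurtosis from a finite simulation can differ from it noticeably.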

For the symmetric Student's t-distribution,
$\nu$ is the only parameter that needs to be
estimated for normalized returns. For the skewed
Student's t-distribution, we need to add a
parameter, $\lambda$, to capture the skewness
(see Hansen, 1994). These estimated parameters
are shown in Table 3.

Table 3 Parameter Estimates with the Log Student's t
and Log Skewed Student's t Distributions for Monthly
S&P 500, Weekly MSCI EM, and Weekly MSCI EAFE
Returns

Log Student's t      $\nu$    MLE-$\nu$
S&P 500 Monthly      4.1      3.6
MSCI EM Weekly       4.1      4.0
MSCI EAFE Weekly     4.4      4.4

Log Skewed t         $\nu$    $\lambda$
S&P 500 Monthly      4.1      -0.13
MSCI EM Weekly       4.1      -0.25
MSCI EAFE Weekly     4.4      -0.09


MIXTURE OF NORMAL 
DISTRIBUTIONS 

In the mixture of normal distributions model,
the fat tails are obtained through subordination.
The model considered for the log-returns is:

$$d \log S(t) = \mu\,dt + \sigma\,g(t)\,dW$$

where $\mu$ and $\sigma$ are associated with the
normal process of an individual trade and $W$ is
a standard Brownian motion. This model becomes
the standard geometric Brownian motion when
$g(t)$ is constant. $g(t)$ is a subordinator, a
positive increasing random process that
characterizes the market trading activity time.

If $g(t)$ is assumed to be lognormally
distributed with mean $\mu_s$ and standard
deviation $\sigma_s$, this mixture process is also
referred to as the normal-lognormal mixture. The
probability density function for the
normal-lognormal mixture is given in Clark (1973).
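
A rough simulation shows how a random activity factor fattens the tails. The discretization below is an assumption made for illustration only: each period's return is taken as mu + sigma * g * z, with z standard normal and log g normal with mean mu_s and standard deviation sigma_s. The function names and the parameter values are hypothetical and are not the fitted values reported in this entry.

```python
import math
import random


def simulate_mixture(n, mu, sigma, mu_s, sigma_s, seed=0):
    """Draw n log-returns from a normal-lognormal mixture (assumed discretization)."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n):
        g = math.exp(rng.gauss(mu_s, sigma_s))  # lognormal activity factor
        returns.append(mu + sigma * g * rng.gauss(0.0, 1.0))
    return returns


def sample_kurtosis(xs):
    """Non-excess sample kurtosis: fourth central moment over squared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


mixed = simulate_mixture(20000, mu=0.0, sigma=0.04, mu_s=0.0, sigma_s=0.5, seed=1)
plain = simulate_mixture(20000, mu=0.0, sigma=0.04, mu_s=0.0, sigma_s=0.0, seed=1)
print("kurtosis with random activity:", round(sample_kurtosis(mixed), 2))
print("kurtosis with constant g:     ", round(sample_kurtosis(plain), 2))
```

With sigma_s set to zero the factor g is constant and the returns are plain Gaussian, which mirrors the remark that the model reduces to geometric Brownian motion when g(t) is constant.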

Other kinds of mixtures exist in the literature, 
such as a normal-gamma mixture, also referred 
to as a variance gamma process (Madan and 
Seneta, 1990). In this entry, we only illustrate the 
normal-lognormal mixture, one of the simplest 
mixture models. The estimated parameters for 
the normal-lognormal mixture are shown in 
Table 4. 

The mixture of normal distributions utilizes
the concept of a subordinated process. Clark
(1973) assumes that trading volume is a plausible
measure of the evolution of price dynamics.
Indeed, a sizeable literature has demonstrated
a strong positive contemporaneous correlation
between trading volume and return volatility
(see, for example, Andersen, 1996). More
specifically, the distribution of log-returns
occurring from a given level of trading volume is
subordinate to the distribution of an individual
trade and directed by the distribution of the
trading volume. By assuming the normal
distribution for the individual trade and finite
moments for the distribution of the trading
volume, Clark (1973) proves that the mixed
distribution has fat tails with all moments finite.

Table 4 Parameter Estimates with the Mixture
Distribution for Monthly S&P 500, Weekly MSCI EM,
and Weekly MSCI EAFE Returns

                     $\mu$     $\sigma$   $\mu_s$   $\sigma_s$
S&P 500 Monthly      0.0075    0.0382     0.0006    1.193
MSCI EM Weekly       0.0019    0.0206     0.0002    1.241
MSCI EAFE Weekly     0.0013    0.0152     0.0003    1.280

The mixture of normal distributions is intu¬ 
itively appealing because it is directly linked 
to market microstructure such as information 
flow, trading volume, and number of transac¬ 
tions. The subordinated process premise has 
also evolved into stochastic volatility that now 
receives vigorous attention in the finance
literature (see Andersen, 1996). In general, the
mixture of normal distributions has fat tails but
does not obey time scaling. A generalized mixture
of normal distributions, however, can describe 
volatility clustering. 

GARCH Models 

Generalized autoregressive conditional
heteroscedasticity (GARCH) models, first
introduced by Bollerslev (1986), are now widely
employed in financial time-series analyses. 
In particular, they are used to predict short 
horizon volatilities (ranging from one day to 
one month). 

The return generating process is based on
geometric Brownian motion but with the variance
being a time-dependent GARCH(1,1) process,
which is defined by the relation:

$$\sigma_t^2 = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$

where $\alpha_0$, $\alpha_1$, and $\beta_1$ are the
control parameters of the GARCH(1,1) stochastic
process. $r_t$ is a random variable with zero
mean and variance $\sigma_t^2$, and is
characterized by a conditional probability density
function $f_t(x)$, which is arbitrary but is often
chosen to be Gaussian. In this entry, the
innovation is assumed to be Gaussian. These three
control parameters are estimated by the maximum
likelihood method and shown in Table 5.
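
The variance recursion can be traced by hand with a few lines of code. This is a minimal sketch of the standard GARCH(1,1) update; the function name, the starting variance, and the return series are hypothetical, while the three parameters are the S&P 500 monthly estimates listed in Table 5.

```python
def garch11_variance_path(returns, alpha0, alpha1, beta1, initial_variance):
    """Conditional variances implied by the GARCH(1,1) recursion.

    sigma_t^2 = alpha0 + alpha1 * r_{t-1}^2 + beta1 * sigma_{t-1}^2

    The result has len(returns) + 1 entries: the starting variance followed
    by one updated variance per observed return.
    """
    variances = [initial_variance]
    for r in returns:
        variances.append(alpha0 + alpha1 * r * r + beta1 * variances[-1])
    return variances


# Hypothetical monthly returns with one large shock in the second month.
example_returns = [0.01, -0.10, 0.02, 0.01]
path = garch11_variance_path(example_returns, alpha0=0.00006, alpha1=0.1291,
                             beta1=0.8474, initial_variance=0.0025)
for t, variance in enumerate(path):
    print(f"t={t}: conditional volatility = {variance ** 0.5:.4f}")
```

The large shock raises the next conditional variance, and the beta1 term then lets that elevated level decay only gradually, which is the volatility clustering mechanism.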


Table 5 Parameter Estimates with the GARCH(1,1)
Model for Monthly S&P 500, Weekly MSCI EM, and
Weekly MSCI EAFE Returns

                     $\alpha_0$   $\alpha_1$   $\beta_1$
S&P 500 Monthly      0.00006      0.1291       0.8474
MSCI EM Weekly       0.00002      0.1431       0.8309
MSCI EAFE Weekly     0.00002      0.0897       0.8815


GARCH models assume that volatility 
changes with time and with past information. 
Because of the time-dependent volatility, the 
unconditional distribution of returns exhibits fat 
tails. GARCH models allow for volatility clus¬ 
tering or autocorrelation in the volatility. 

The most popular GARCH model is GARCH 
(1,1). The scaling properties of GARCH(1,1) are 
not clear from the theory; however, numerical 
simulations of GARCH(1,1) with Gaussian 
innovations show that it fails to describe the 
scaling properties of high-frequency data (see 
Mantegna and Stanley, 1999). 

GARCH(1,1) processes are unconditionally
stationary with finite variance if
$1 - \alpha_1 - \beta_1 > 0$, and have finite
kurtosis if
$1 - \beta_1^2 - 2\alpha_1\beta_1 - 3\alpha_1^2 > 0$.
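
Both conditions are easy to check for the Table 5 estimates. In the sketch below the function and dictionary names are illustrative. The unconditional variance alpha0 / (1 - alpha1 - beta1) is the standard GARCH(1,1) result and is added here for context; the kurtosis condition is the one stated above for Gaussian innovations.

```python
def garch11_moment_checks(alpha0, alpha1, beta1):
    """Evaluate the finite-variance and finite-kurtosis conditions for GARCH(1,1)."""
    variance_margin = 1.0 - alpha1 - beta1
    kurtosis_margin = 1.0 - beta1 ** 2 - 2.0 * alpha1 * beta1 - 3.0 * alpha1 ** 2
    if variance_margin > 0:
        unconditional_variance = alpha0 / variance_margin
    else:
        unconditional_variance = float("inf")
    return {
        "finite_variance": variance_margin > 0,
        "finite_kurtosis": kurtosis_margin > 0,
        "unconditional_variance": unconditional_variance,
    }


# (alpha0, alpha1, beta1) as listed in Table 5.
TABLE_5 = {
    "S&P 500 Monthly": (0.00006, 0.1291, 0.8474),
    "MSCI EM Weekly": (0.00002, 0.1431, 0.8309),
    "MSCI EAFE Weekly": (0.00002, 0.0897, 0.8815),
}

for name, params in TABLE_5.items():
    checks = garch11_moment_checks(*params)
    print(name, checks["finite_variance"], checks["finite_kurtosis"],
          f"unconditional vol = {checks['unconditional_variance'] ** 0.5:.4f}")
```

All three parameter sets satisfy both inequalities, although the margins are small because alpha1 + beta1 is close to one in each case.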

MODELING RETURN 
DISTRIBUTIONS FOR MAJOR 
INDEXES 

Applications of the Levy stable, Student's t, and
mixture of normal distribution models in modeling
market indexes are well documented (see, for
example, Mandelbrot [1963], Clark [1973],
Blattberg and Gonedes [1974], Markowitz
and Usmen [1996], Hurst and Platen [1997],
Martin, Rachev and Siboulet [2003], and Platen
and Sidorowicz [2007]). This literature offers
detailed methodology on how the model parameters
are estimated and, in some cases, compares the
models.

Mantegna and Stanley (1999) studied the TLF
model and the GARCH(1,1) process with Gaussian
innovations. They found that the TLF model
describes the time scaling well, but it is not
able to properly describe the volatility
clustering. The GARCH(1,1) model seems to be
complementary to the TLF: It is able to describe
the volatility clustering, but it fails to describe
the time scaling. As mentioned earlier, however,
the GARCH model with TLF innovations might offer
a better solution than either the TLF model or
GARCH with Gaussian innovations.

Many previous studies have focused on
high-frequency data such as daily return data.
Here, we are interested in weekly or monthly data
because investors typically have a relatively
long investment horizon and portfolios are often
rebalanced monthly. We apply these fat-tailed
models to some well-known weekly or monthly
returns of equity indexes. Our test assets include
the monthly S&P 500 total return index, the weekly
MSCI Emerging Market index, and the weekly MSCI
EAFE index. One reason to use weekly data is to
have more data points in the tails given that the
MSCI indexes have relatively short histories. A
few other equity and fixed income indexes, such as
the MSCI UK, U.S. Long-Term Government Bond, and
Muni bonds, as well as some individual stocks,
were tested with the same methodologies and the
results are similar, so they are not reported
(e.g., Xiong, 2010).

We apply the maximum likelihood method to
calibrate model parameters as previous studies
did. The estimated parameters for the TLF,
Student's t, normal-lognormal mixture, and
GARCH(1,1) are shown in Tables 2, 3, 4, and
5, respectively. Since we are more interested in
modeling downside risk, our goal is to fit the
model's tail distribution to the empirical tail
distribution in terms of CVaR through Monte
Carlo simulations.
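
An empirical CVaR can be computed from a sample of historical or simulated returns as the average of the worst outcomes. The sketch below is one common estimator; the function name and the 5% default tail probability are assumptions, since this section does not restate the confidence level used for the reported figures.

```python
def empirical_cvar(returns, tail_probability=0.05):
    """Average of the worst tail_probability fraction of returns.

    The result is a return, so a loss shows up as a negative number,
    matching the sign convention of the CVaR column in Table 6.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0.0 < tail_probability <= 1.0:
        raise ValueError("tail_probability must lie in (0, 1]")
    ordered = sorted(returns)
    # Keep at least one observation so that small samples still give a value.
    n_tail = max(1, int(len(ordered) * tail_probability))
    tail = ordered[:n_tail]
    return sum(tail) / n_tail


# Hypothetical sample: 20 returns from -10% to +9% in steps of 1%.
sample = [i / 100.0 for i in range(-10, 10)]
print(empirical_cvar(sample, tail_probability=0.10))  # mean of the two worst returns
```

Applying the same function to the historical series and to a large simulated sample from each fitted model gives directly comparable tail measures.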

Table 6 focuses on nonstable distribution
models and presents the empirical statistics as
well as the Monte Carlo simulation results for
the six models. The statistics for each model
are based on 1,000,000 simulated random returns
that follow the corresponding distribution
models. It can be seen that the lognormal
model underestimates the monthly CVaR by
2.24% for the S&P 500, the weekly CVaR by
1.57% for the MSCI EM, and the weekly CVaR
by 0.8% for the MSCI EAFE, respectively. The
log Student's t-distribution, normal-lognormal
mixture, and GARCH(1,1) have similar CVaR
estimates, and all of them are better than
the lognormal model but appear to underestimate
the tail risk. On the other hand, both
the log-TLF model and the log skewed Student's
t-model provide a good fit for CVaR
for all three indexes: S&P 500, MSCI EM, and
MSCI EAFE.

Table 6 Statistics Summary for Historical Returns, as Well as Simulated Returns for Lognormal, Log-TLF, Log
Student's t, Log Skewed Student's t, Normal-Lognormal Mixture, and GARCH(1,1) Models

                   Mean     Std Dev   Skewness   Kurtosis   CVaR

S&P 500 Monthly
Empirical          0.93%    5.54%      0.35      12.45      -12.20%
Lognormal          0.93%    5.54%      0.16       3.05       -9.96%
log-TLF            0.93%    5.54%      0.59      12.90      -12.20%
log-Student t      0.93%    5.54%      1.35      47.93      -10.91%
log-Skewed t       0.93%    5.54%      0.69      50.70      -11.91%
Mixture            0.93%    5.54%      1.02      18.85      -11.34%
GARCH(1,1)         0.93%    5.54%      0.46       9.50      -10.77%

MSCI EM Weekly
Empirical          0.25%    3.04%     -0.52       8.38       -7.45%
Lognormal          0.25%    3.04%      0.09       3.02       -5.88%
log-TLF            0.25%    3.04%     -0.38      12.29       -7.45%
log-Student t      0.25%    3.04%      0.71      22.90       -6.49%
log-Skewed t       0.25%    3.04%     -0.81      14.23       -7.45%
Mixture            0.25%    3.04%      0.62      16.10       -6.76%
GARCH(1,1)         0.25%    3.04%      0.58      23.91       -6.47%

MSCI EAFE Weekly
Empirical          0.16%    2.29%     -0.76      10.25       -5.27%
Lognormal          0.16%    2.29%      0.07       3.01       -4.47%
log-TLF            0.16%    2.29%     -0.47       9.25       -5.27%
log-Student t      0.16%    2.29%      0.46      16.71       -4.93%
log-Skewed t       0.16%    2.29%     -0.18      13.03       -5.27%
Mixture            0.16%    2.29%      0.52      16.71       -5.17%
GARCH(1,1)         0.16%    2.29%      0.10       4.20       -4.73%

Note that the log Student's t, normal-lognormal
mixture, and GARCH(1,1) are positively skewed by
design in a way similar to the lognormal
distribution because we are working with the
log-returns. The positive skewness results from
taking the exponential function of the
log-returns. None of these three models can
account for negative skewness without
modifications.
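
For the lognormal case the built-in positive skewness has a closed form, which makes the effect of exponentiation concrete. The sketch below uses the standard lognormal moment formulas; the function names and the log-return volatility of 0.055 (chosen to be roughly in line with the S&P 500 monthly standard deviation in Table 6) are assumptions for illustration.

```python
import math


def lognormal_skewness(log_vol: float) -> float:
    """Skewness of exp(X) when X is normal with standard deviation log_vol."""
    w = math.exp(log_vol * log_vol)
    return (w + 2.0) * math.sqrt(w - 1.0)


def lognormal_kurtosis(log_vol: float) -> float:
    """Non-excess kurtosis of exp(X) when X is normal with standard deviation log_vol."""
    w = math.exp(log_vol * log_vol)
    return w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 3.0


log_vol = 0.055  # assumed monthly log-return volatility
print(f"skewness = {lognormal_skewness(log_vol):.3f}")
print(f"kurtosis = {lognormal_kurtosis(log_vol):.3f}")
```

Both moments do not depend on the mean of the log-returns, and subtracting one to move from gross to simple returns leaves them unchanged. The values are close to the lognormal row for the S&P 500 in Table 6: a small positive skewness and a kurtosis barely above 3.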


Therefore there are two reasons why the log-TLF
and the log skewed Student's t-models do well in
fitting the CVaR. First, their tails are
appropriately fat, and second, both of them are
able to capture negative skewness. For the TLF
model, the fatness of the tail is controlled by
$\alpha$ and the cutoff length and the skewness is
controlled by $\beta$ as shown in Table 2. For the
skewed Student's t-distribution, the fatness of
the tail is controlled by the degrees of freedom
$\nu$ and the skewness is controlled by $\lambda$
as shown in Table 3.

Figures 4, 5, and 6 compare the log-TLF model
with other models in fitting the historical
return distributions for monthly S&P 500 returns,
weekly MSCI EM returns, and weekly MSCI
EAFE returns, respectively. The figures confirm
the results shown in Table 6. It can be seen
that the log-TLF provides a good fit for the
three indexes. The log skewed Student's t is
almost as effective as the log-TLF model in
fitting CVaRs. Compared to the log skewed
Student's t-distribution, the log-TLF has a fatter
but shorter tail because of the truncation. On
the other hand, the normal-lognormal mixture
and GARCH(1,1) model have CVaRs that fall
between those of the log-TLF and lognormal
models. The finding for the log symmetric
Student's t-distribution, not plotted due to space
limitations, is similar to the normal-lognormal
mixture and GARCH(1,1) model.

Figure 4 The Historical Distributions of S&P 500 Monthly Returns Fitted by the Log-TLF, Log Skewed
Student's t, Normal-Lognormal Mixture, GARCH(1,1), and Lognormal Models
(panel title: S&P 500 Monthly Return Distribution, Jan 1926 to Mar 2010; horizontal axis: Returns)

Figure 5 The Historical Distributions of MSCI EM Weekly Returns Fitted by the Log-TLF, Log Skewed
Student's t, Normal-Lognormal Mixture, GARCH(1,1), and Lognormal Models
(panel title: MSCI EM Weekly Return Distribution, Jan 1988 to Mar 2010)

Table 7 summarizes the underestimated
CVaRs for the six models that have been applied
to the three indexes. The underestimated
tails are reported on a relative basis based
on CVaR estimates shown in Table 6. For example,
the lognormal model underestimates
the monthly CVaR by a relative percentage of
18% ($= (12.20 - 9.96)/12.20$) for the S&P 500
index.

Averaging over the three indexes, the lognormal
model underestimates the CVaR by about
18% on a relative basis. The normal-lognormal
mixture, the log Student's t-distribution, and
GARCH(1,1) with Gaussian innovations perform
better than the lognormal model but appear to
underestimate the CVaR by about 6%,
10%, and 12%, respectively. In contrast, both
the log-TLF and log skewed t-distribution did
a better job in modeling the CVaR.

Figure 6 The Historical Distributions of MSCI EAFE Weekly Returns Fitted by the Log-TLF, Log
Skewed Student's t, Normal-Lognormal Mixture, GARCH(1,1), and Lognormal Models
(panel title: MSCI EAFE Weekly Return Distribution, Jan 1976 to Mar 2010)

Table 7 Underestimated CVaRs in Relative Percentage for the Six Models

Index                       S&P 500          MSCI EM          MSCI EAFE
Data Range                  1926.1-2010.3    1988.1-2010.4    1976.1-2010.4
Number of Periods           1011 Monthly     1164 Weekly      1792 Weekly
Lognormal                   18%              21%              15%
Log-TLF                     0%               0%               0%
Log Student's t             11%              13%              6%
Log Student's Skewed t      2%               0%               0%
Normal-Lognormal Mixture    7%               9%               2%
GARCH(1,1)                  12%              13%              10%
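
The relative percentages can be reproduced from the CVaR column of Table 6. The sketch below does this for the lognormal model; the dictionary and function names are illustrative, and the inputs are the Table 6 figures expressed in percent.

```python
# CVaR values in percent, taken from Table 6.
EMPIRICAL_CVAR = {"S&P 500": -12.20, "MSCI EM": -7.45, "MSCI EAFE": -5.27}
LOGNORMAL_CVAR = {"S&P 500": -9.96, "MSCI EM": -5.88, "MSCI EAFE": -4.47}


def relative_underestimation(empirical_cvar: float, model_cvar: float) -> float:
    """Shortfall of the model CVaR relative to the empirical CVaR, as a fraction."""
    return (abs(empirical_cvar) - abs(model_cvar)) / abs(empirical_cvar)


shortfalls = {
    index: relative_underestimation(EMPIRICAL_CVAR[index], LOGNORMAL_CVAR[index])
    for index in EMPIRICAL_CVAR
}
for index, value in shortfalls.items():
    print(f"{index}: {100.0 * value:.0f}%")

average = sum(shortfalls.values()) / len(shortfalls)
print(f"average: {100.0 * average:.0f}%")
```

A value of zero means the model reproduces the empirical CVaR, and a positive value means the model tail is too thin.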

KEY POINTS 

• It is well known that asset returns often
exhibit fat tails, negative skewness, time
scaling, and volatility clustering. Fat-tailed and
skewed models can be used to estimate the
downside risk of assets. It is important that
the selected models are able to capture fat
tails and skewness, among other features.

• The lognormal distribution is the fundamental
assumption of many important financial
models, but it has thin tails and thus can
significantly underestimate the downside risk.
On the other hand, the Levy stable distribution
exhibits time scaling and fat tails, but it
tends to overestimate the downside risk due
to its infinite variance.

• The Student's t-distribution can model fat
tails but not negative skewness. A modification
results in the skewed Student's t-distribution,
which can model both fat tails
and negative skewness. However, neither of
them possesses time scaling properties or
can model volatility clustering.

• The normal-lognormal mixture is intuitive as
it is directly linked to market microstructure
such as information flow and trading volume.
It has fat tails but cannot model negative
skewness. In general, it does not possess time
scaling.

• The truncated Levy flight model can describe
the asymptotic return distributions measured
at all frequencies and the scaling properties
(self-similarities). More specifically, for a
small time interval (e.g., a minute), this
distribution approximates a Levy stable
distribution with Levy stable scaling, while for a
significantly large but finite time interval
(e.g., a year), the truncated Levy flight
distribution slowly converges to a Gaussian
distribution. Its first four moments are finite
and it can model both fat tails and negative
skewness.

• The truncated Levy flight or tempered stable
distribution model cannot describe volatility
clustering. In contrast, GARCH with Gaussian
innovations can model volatility clustering
but it is often found that the tail is not fat
enough. Recent studies show that a GARCH
with truncated Levy flight innovations appears
to be able to describe most of the stylized
empirical facts: fat tails, skewness, and
volatility clustering.

NOTE 

1. For details, see
http://academic2.american.edu/~jpnolan/stable/stable.html

Abstract: The volatilities and correlations of the returns on a set of assets, risk factors, or interest rates 
are summarized in a covariance matrix. This matrix lies at the heart of risk and return analysis. It 
contains all the information necessary to estimate the volatility of a portfolio, to simulate correlated 
values for its risk factors, to diversify investments, and to obtain efficient portfolios that have 
the optimal trade-off between risk and return. Both risk managers and asset managers require 
covariance matrices that may include very many assets or risk factors. For instance, in a global 
risk management system of a large international bank all the major yield curves, equity indexes, 
foreign exchange rates, and commodity prices will be encompassed in one very large dimensional 
covariance matrix. 


Variances and covariances are parameters of the 
joint distribution of asset (or risk factor) re¬ 
turns. It is important to understand that they 
are unobservable. They can only be estimated or 
forecast within the context of a model. 
Continuous-time models, used for option pric¬ 
ing, are often based on stochastic processes 
for the variance and covariance. Discrete-time 
models, used for measuring portfolio risk, are 
based on time series models for variance and 
covariance. In each case, we can only ever esti¬ 
mate or forecast variance and covariance within 
the context of an assumed model. 

It must be emphasized that there is no ab¬ 
solute "true" variance or covariance. What is 
"true" depends only on the statistical model. 


Even if we knew for certain that our model 
was a correct representation of the data gen¬ 
eration process, we could never measure the 
true variance and covariance parameters ex¬ 
actly because pure variance and covariance are 
not traded in the market. An exception to this 
is the futures on volatility indexes such as the 
Chicago Board Options Exchange Volatility In¬ 
dex (VIX). Hence, some risk-neutral volatility 
is observed. However, this entry deals with
covariance matrices in the physical measure.

Estimating a variance according to the for¬ 
mulas given by a model, using historical data, 
gives an observed variance that is "realized" 
by the process assumed in our model. But 
this "realized variance" is still only ever an 
estimate. Sample estimates are always subject 
to sampling error, which means that their value 
depends on the sample data used. 

In summary, different statistical models can 
give different estimates of variance and covari¬ 
ance for two reasons: 

• A true variance (or covariance) is different 
between models. As a result, there is a con¬ 
siderable degree of model risk inherent in the 
construction of a covariance or correlation ma¬ 
trix. That is, very different results can be ob¬ 
tained using two different statistical models 
even when they are based on exactly the same 
data. 

• The estimates of the true variances (and
covariances) are subject to sampling error. That
is, even when we use the same model to esti¬ 
mate a variance, our estimates will differ de¬ 
pending on the data used. Both changing the 
sample period and changing the frequency 
of the observations will affect the covariance 
matrix estimate. 

This entry covers moving average discrete-time
series models for variance and covariance,
focusing on the practical implementation
of the approach and providing an explanation 
for their advantages and limitations. Other sta¬ 
tistical tools are described in Alexander (2008a, 
Chapter 9). 


BASIC PROPERTIES OF 
COVARIANCE AND 
CORRELATION MATRICES 

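The covariance matrix is a square, symmetric
matrix of variances and covariances of a set of m
returns on assets, or on risk factors, given by:

$$
V = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2m} \\
\sigma_{31} & \sigma_{32} & \cdots & \sigma_{3m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} & \sigma_{m2} & \cdots & \sigma_m^2
\end{pmatrix} \qquad (1)
$$

Since

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2m} \\
\sigma_{31} & \sigma_{32} & \cdots & \sigma_{3m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} & \sigma_{m2} & \cdots & \sigma_m^2
\end{pmatrix}
=
\begin{pmatrix}
\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1m}\sigma_1\sigma_m \\
\rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2m}\sigma_2\sigma_m \\
\rho_{31}\sigma_3\sigma_1 & \rho_{32}\sigma_3\sigma_2 & \cdots & \rho_{3m}\sigma_3\sigma_m \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{m1}\sigma_m\sigma_1 & \rho_{m2}\sigma_m\sigma_2 & \cdots & \sigma_m^2
\end{pmatrix}
$$

a covariance matrix can also be expressed as

V = DCD (2)

where D is a diagonal matrix with elements
equal to the standard deviations of the returns
and C is the correlation matrix of the returns.
That is:

$$
D = \begin{pmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_m
\end{pmatrix},
\qquad
C = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1m} \\
\rho_{21} & 1 & \cdots & \rho_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{m1} & \rho_{m2} & \cdots & 1
\end{pmatrix}
$$

Hence, the covariance matrix is simply a
mathematically convenient way to express the asset
volatilities and their correlations.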

To illustrate how to estimate an annual
covariance matrix and a 10-day covariance matrix,
assume three assets that have the following
volatilities and correlations:

Asset 1 volatility 20%    Asset 1-Asset 2 correlation 0.8
Asset 2 volatility 10%    Asset 1-Asset 3 correlation 0.5
Asset 3 volatility 15%    Asset 3-Asset 2 correlation 0.3

So the annual covariance matrix DCD is:

$$
\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.15 \end{pmatrix}
\begin{pmatrix} 1 & 0.8 & 0.5 \\ 0.8 & 1 & 0.3 \\ 0.5 & 0.3 & 1 \end{pmatrix}
\begin{pmatrix} 0.2 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.15 \end{pmatrix}
=
\begin{pmatrix} 0.04 & 0.016 & 0.015 \\ 0.016 & 0.01 & 0.0045 \\ 0.015 & 0.0045 & 0.0225 \end{pmatrix}
$$
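
The same product can be computed element by element, because D is diagonal: each entry of V is the correlation multiplied by the two volatilities. The sketch below uses plain Python lists; the function and variable names are illustrative.

```python
def covariance_from_vols_and_corr(vols, corr):
    """V = D C D written out element by element: V[i][j] = vols[i] * corr[i][j] * vols[j]."""
    m = len(vols)
    if any(len(row) != m for row in corr) or len(corr) != m:
        raise ValueError("corr must be an m x m matrix matching vols")
    return [[vols[i] * corr[i][j] * vols[j] for j in range(m)] for i in range(m)]


# Annual volatilities and correlations of the three assets in the example.
vols = [0.20, 0.10, 0.15]
corr = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.3],
    [0.5, 0.3, 1.0],
]

annual_cov = covariance_from_vols_and_corr(vols, corr)
for row in annual_cov:
    print([round(value, 4) for value in row])
```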


To find a 10-day covariance matrix in this 
simple case, one is forced to assume the returns 
are independent and identically distributed 
in order to use the square root of time rule: 
that is, that the h-day covariance matrix is h
times the 1-day covariance matrix. Put another
way, the 10-day covariance matrix is obtained 
from the annual matrix by dividing each 
element by 25, assuming there are 250 trading 
days per year. 

Alternatively, we can obtain the 10-day matrix
using the 10-day volatilities in D. Note that
under the independent and identically distributed
returns assumption C should not be affected by
the holding period. That is,

$$
D = \begin{pmatrix} 0.04 & 0 & 0 \\ 0 & 0.02 & 0 \\ 0 & 0 & 0.03 \end{pmatrix}
$$

because each volatility is divided by 5 (that is,
the square root of 25). Then we get the same
result as above, that is

$$
\begin{pmatrix} 0.04 & 0 & 0 \\ 0 & 0.02 & 0 \\ 0 & 0 & 0.03 \end{pmatrix}
\begin{pmatrix} 1 & 0.8 & 0.5 \\ 0.8 & 1 & 0.3 \\ 0.5 & 0.3 & 1 \end{pmatrix}
\begin{pmatrix} 0.04 & 0 & 0 \\ 0 & 0.02 & 0 \\ 0 & 0 & 0.03 \end{pmatrix}
=
\begin{pmatrix} 0.16 & 0.064 & 0.06 \\ 0.064 & 0.04 & 0.018 \\ 0.06 & 0.018 & 0.09 \end{pmatrix}
\times 10^{-2}
$$
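
Both routes to the 10-day matrix can be compared in code. The sketch below assumes 250 trading days per year and independent and identically distributed returns, as in the text; the function names are illustrative.

```python
import math


def covariance_from_vols_and_corr(vols, corr):
    """V[i][j] = vols[i] * corr[i][j] * vols[j], the elementwise form of D C D."""
    m = len(vols)
    return [[vols[i] * corr[i][j] * vols[j] for j in range(m)] for i in range(m)]


def scale_covariance(cov, horizon_days, days_per_year=250):
    """Scale an annual covariance matrix linearly in time to an h-day horizon."""
    factor = horizon_days / days_per_year
    return [[value * factor for value in row] for row in cov]


def scale_volatilities(vols, horizon_days, days_per_year=250):
    """Scale annual volatilities by the square root of time to an h-day horizon."""
    factor = math.sqrt(horizon_days / days_per_year)
    return [vol * factor for vol in vols]


vols = [0.20, 0.10, 0.15]
corr = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.3], [0.5, 0.3, 1.0]]

# Route 1: divide every element of the annual matrix by 25.
route_1 = scale_covariance(covariance_from_vols_and_corr(vols, corr), horizon_days=10)
# Route 2: rebuild D C D from the 10-day volatilities, leaving C unchanged.
route_2 = covariance_from_vols_and_corr(scale_volatilities(vols, horizon_days=10), corr)

for row_1, row_2 in zip(route_1, route_2):
    print([round(v, 6) for v in row_1], [round(v, 6) for v in row_2])
```

The two routes agree only because the correlation matrix is held fixed across horizons, which is exactly what the independence assumption implies.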


Note that V is positive semidefinite if and only
if C is positive semidefinite. D is always positive
definite. Hence, the positive semidefiniteness
of V only depends on the way we construct
the correlation matrix. It is quite a challenge
to generate meaningful, positive semidefinite
correlation matrices that are large enough for
managers to be able to net the risks across all
positions in a firm. Simplifying assumptions are
necessary. For example, RiskMetrics (1996) uses
a very simple methodology based on moving
averages in order to estimate extremely large
positive definite matrices covering hundreds of
risk factors for global financial markets. (This is
discussed further below.)


EQUALLY WEIGHTED 
AVERAGES 

This section describes how volatility and cor¬ 
relation are estimated and forecast by applying 
equal weights to certain historical time series 
data. We outline a number of pitfalls and limi¬ 
tations of this approach and as a result recom¬ 
mend that these models be used as an indication 
of the possible range for long-term volatility 
and correlation. As we shall see, these models 
are of dubious validity for short-term volatility 
and correlation forecasting. 

In the following, for simplicity, we assume 
that the mean return is zero and that returns 
are measured at the daily frequency, unless 
specifically stated otherwise. A zero mean re¬ 
turn is a standard assumption for risk assess¬ 
ments based on time series of daily data, but 
if returns are measured over longer intervals 
it may not be very realistic. Then the equally 
weighted estimate of the variance of returns is 
the average of the squared returns and the cor¬ 
responding volatility estimate is the square root 
of this expressed as an annual percentage. The 
equally weighted estimate of the covariance of 
two returns is the average of the cross products 
of returns and the equally weighted estimate of 
their correlation is the ratio of the covariance 
to the square root of the product of the two 
variances. 

Equal weighting of historical data was the 
first widely accepted statistical method for fore¬ 
casting volatility and correlation of financial as¬ 
set returns. For many years, it was the market 
standard to forecast average volatility over the 
next h days by taking an equally weighted aver¬ 
age of squared returns over the previous h days. 
This method was called the historical volatil¬ 
ity forecast. Nowadays, many different statis¬ 
tical forecasting techniques can be applied to 
historical time series data so it is confusing to 
call this equally weighted method the historical 
method. However, this rather confusing termi¬ 
nology remains standard. 

Perceived changes in volatility and corre¬ 
lation have important consequences for all 
types of risk management decisions, whether 
to do with capitalization, resource allocation, 
or hedging strategies. Indeed it is these param¬ 
eters of the returns distributions that are the 
fundamental building blocks of market risk as¬ 
sessment models. It is therefore essential to un¬ 
derstand what type of variability in returns the 
model has measured. The model assumes that 
an independently and identically distributed 
process generates returns. That is, both volatil¬ 
ity and correlation are constant and the "square 
root of time rule" applies. This assumption has 
important ramifications and we shall take care 
to explain these very carefully. 


Statistical Methodology 

The methodology for constructing a covariance
matrix based on equally weighted averages
can be described in very simple terms.
Consider a set of time series $\{r_{i,t}\}$,
$i = 1, \ldots, m$; $t = 1, \ldots, T$. Here the
subscript i denotes the asset or risk factor, and
t denotes the time at which each return is
measured. We shall assume that each return has a
zero mean. Then an unbiased estimate of the
unconditional variance of the ith returns variable
at time t, based on the T most recent daily
returns, is:

$$\hat\sigma_{i,t}^2 = \frac{\sum_{l=1}^{T} r_{i,t-l}^2}{T} \qquad (3)$$

The term "unbiased estimator" means the expected
value of the estimator is equal to the true
value.

Note that (3) gives an unbiased estimate of the
variance but this is not the same as the square of
an unbiased estimate of the standard deviation.
That is, $\sqrt{E(\hat\sigma^2)} = \sigma$ but
$E(\hat\sigma) \neq \sigma$. So really the hat
should be written over the whole of $\sigma^2$.
But it is generally understood that the notation
$\hat\sigma^2$ is used to denote the estimate or
forecast of a variance, and not the square of an
estimate of the standard deviation. So, in the
case that the mean return is zero, we have

$$E(\hat\sigma^2) = \sigma^2$$

If the mean return is not assumed to be zero
we need to estimate this from the sample, and
this places a (linear) constraint on the variance
estimated from sample data. In that case, to
obtain an unbiased estimate we should use

$$s_{i,t}^2 = \frac{\sum_{l=1}^{T} \left(r_{i,t-l} - \bar{r}_i\right)^2}{T - 1} \qquad (4)$$

where $\bar{r}_i$ is the average return on the ith
series, taken over the whole sample of T data
points. The mean-deviation form above may be
useful for estimating variance using monthly or
even weekly data over a period for which average
returns are significantly different from zero.
However, with daily data the average return is
usually very small and since, as we shall see
below, the errors induced by other assumptions
are huge relative to the error induced by
assuming the mean is zero, we normally use the
form (3).

Similarly, an unbiased estimate of the unconditional covariance
of two zero mean returns at time t, based on the T most recent
daily returns, is:

$$\hat{\sigma}_{ij,t} = \frac{\sum_{l=1}^{T} r_{i,t-l}\, r_{j,t-l}}{T} \tag{5}$$


As mentioned above, we would normally ig¬ 
nore the mean deviation adjustment with daily 
data. 

The equally weighted unconditional covariance matrix estimate at
time t for a set of k returns is thus
$\hat{V}_{t} = (\hat{\sigma}_{ij,t})$ for i, j = 1, ..., k.
Loosely speaking, the term "unconditional" refers to the fact that
it is the overall or long-run or average variance that we are
estimating, as opposed to a conditional variance that can change
from day to day and is sensitive to recent events.

As mentioned in the introduction, we use the term "volatility" to
refer to the annualized standard deviation. The equally weighted
estimates of volatility and correlation are obtained in two stages.
First, one obtains an unbiased estimate of the unconditional
covariance matrix using equally weighted averages of squared
returns and cross products of returns and the same number T of data
points each time. Then these are converted into volatility and
correlation estimates by applying the usual formulas. For instance,
if the returns are measured at the daily frequency and there are
250 trading days per year:

$$\text{Equally weighted volatility} = \hat{\sigma}_{t}\sqrt{250}$$

$$\text{Equally weighted correlation} = \hat{\varrho}_{ij,t} = \frac{\hat{\sigma}_{ij,t}}{\hat{\sigma}_{i,t}\,\hat{\sigma}_{j,t}}$$
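The two-stage procedure just described can be sketched in a few lines of Python. This is a minimal illustration under the zero-mean assumption, not the author's implementation; the function names and the tiny return series are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def equally_weighted_covariance(returns):
    """Stage 1: covariance matrix from k zero-mean return series.

    `returns` is a list of k lists, each holding the same number T of
    returns. Element (i, j) is the average cross product over the T
    observations, so the diagonal holds the variances.
    """
    k = len(returns)
    T = len(returns[0])
    if any(len(series) != T for series in returns):
        raise ValueError("all series must have the same length")
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            total = sum(a * b for a, b in zip(returns[i], returns[j]))
            cov[i][j] = cov[j][i] = total / T
    return cov

def annualized_volatility(variance, periods_per_year=250):
    """Stage 2a: volatility is the annualized standard deviation."""
    return math.sqrt(variance * periods_per_year)

def correlation(cov, i, j):
    """Stage 2b: covariance divided by the two standard deviations."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

# Hypothetical daily returns for two assets (illustrative numbers only).
r1 = [0.01, -0.02, 0.015, -0.005]
r2 = [0.008, -0.01, 0.02, -0.002]
V = equally_weighted_covariance([r1, r2])
print(annualized_volatility(V[0][0]), correlation(V, 0, 1))
```

With real data the same functions would be applied to the T most recent daily returns of each series.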

In the equally weighted methodology the 
forecasted covariance matrix is simply taken 
to be the current estimate, there being noth¬ 
ing else in the model to distinguish an estimate 
from a forecast. The original risk horizon for 
the covariance matrix is given by the frequency 
of the data—daily returns will give the 1-day 
covariance matrix forecast, weekly returns will 
give the 1-week covariance matrix forecast, and 
so forth. Then, since the model assumes that returns are
independently and identically distributed, we can use the square
root of time rule to convert a 1-day forecast into an h-day
covariance matrix forecast, simply by multiplying each element of
the 1-day matrix by h. Similarly, a monthly forecast can be
obtained from the weekly forecast by multiplying each element by
4, and so forth.
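As a brief sketch of the scaling rule, the snippet below multiplies every element of a 1-day covariance matrix by h, which is equivalent to multiplying each volatility by the square root of h. The function names and the example matrix are hypothetical, and the rule is only as good as the assumption of independent and identically distributed returns.

```python
import math

def scale_covariance(cov, h):
    """Convert a 1-period covariance matrix into an h-period one."""
    return [[h * element for element in row] for row in cov]

def scale_volatility(vol, h):
    """The same rule for a standard deviation: multiply by sqrt(h)."""
    return vol * math.sqrt(h)

# Hypothetical 1-day covariance matrix for two assets.
one_day = [[0.0001, 0.00005],
           [0.00005, 0.0004]]
ten_day = scale_covariance(one_day, 10)

# The 10-day standard deviation of the first asset, computed two ways.
print(math.sqrt(ten_day[0][0]), scale_volatility(math.sqrt(one_day[0][0]), 10))
```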

Having obtained a forecast of variance, volatility, covariance,
and correlation we should ask: How accurate is this forecast? For
this we could provide either a confidence interval, that is, a
range within which we are fairly certain that the true parameter
will lie, or a standard error for our parameter estimate. The
standard error gives a measure of precision of the estimate and
can be used to test whether the true parameter can take a certain
value, or lie in a given range. The next few sections show how
such confidence intervals and standard errors can be constructed.

Confidence Intervals for Variance 
and Volatility 

A confidence interval for the true variance $\sigma^{2}$ when it is
estimated by an equally weighted average can be derived using a
straightforward application of sampling theory. Assuming the
variance estimate is based on T normally distributed returns with
an assumed mean of zero, then $T\hat{\sigma}^{2}/\sigma^{2}$ will
have a chi-squared distribution with T degrees of freedom (see
Freund, 1998). A $100(1-\alpha)\%$ two-sided confidence interval
for $T\hat{\sigma}^{2}/\sigma^{2}$ would therefore take the form
$(\chi^{2}_{1-\alpha/2,T},\ \chi^{2}_{\alpha/2,T})$ and a
straightforward calculation gives the associated confidence
interval for the variance $\sigma^{2}$ as:

$$\left(\frac{T\hat{\sigma}^{2}}{\chi^{2}_{\alpha/2,T}},\ \frac{T\hat{\sigma}^{2}}{\chi^{2}_{1-\alpha/2,T}}\right) \tag{7}$$

For example, a 95% confidence interval for an equally weighted
variance forecast based on 30 observations is obtained using the
upper and lower chi-squared critical values:

$$\chi^{2}_{0.975,30} = 16.791 \quad \text{and} \quad \chi^{2}_{0.025,30} = 46.979$$

So the confidence interval is
$(0.6386\hat{\sigma}^{2},\ 1.7867\hat{\sigma}^{2})$ and exact
values are obtained by substituting in the value of the variance
estimate.
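The worked example can be checked with a short sketch. To stay dependency-free, the two chi-squared critical values are passed in as arguments and taken directly from the text rather than computed; the function name is hypothetical.

```python
def variance_confidence_interval(var_hat, T, chi2_small, chi2_large):
    """Confidence interval for the true variance.

    `chi2_small` and `chi2_large` are the lower and upper chi-squared
    critical values with T degrees of freedom. Dividing by the larger
    critical value gives the lower bound, and vice versa.
    """
    return (T * var_hat / chi2_large, T * var_hat / chi2_small)

# Critical values quoted in the text for T = 30 and 95% confidence,
# with the variance estimate normalized to one.
lower, upper = variance_confidence_interval(1.0, 30, 16.791, 46.979)
print(round(lower, 4), round(upper, 4))  # 0.6386 1.7867
```

Multiplying both bounds by an actual variance estimate gives the interval in the units of that estimate.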

Figure 1 illustrates the upper and lower 
bounds for a confidence interval for a variance 
forecast when the equally weighted variance es¬ 
timate is one. We see that as the sample size T 
increases, the width of the confidence interval 
decreases, markedly so as T increases from low 
values. 

Figure 1 Confidence Interval for Variance Forecasts

We can turn now to the confidence intervals that would apply to an
estimate of volatility. Recall that volatility, being the square
root of the variance, is simply a monotonic increasing
transformation of the variance. Percentiles are invariant under
any strictly monotonic increasing transformation. That is, if f is
any monotonic increasing function of a random variable X, then:

$$P(c_{l} < X < c_{u}) = P\left(f(c_{l}) < f(X) < f(c_{u})\right) \tag{8}$$

Property (8) provides a confidence interval for a historical
volatility based on the confidence interval (7). Since $\sqrt{x}$
is a monotonic increasing function of x, one simply takes the
square root of the lower and upper bounds for the equally weighted
variance. For instance if a 95% confidence interval for the
variance is [16%, 64%] then a 95% confidence interval for the
associated volatility is [4%, 8%]. And, since $x^{2}$ is also
monotonic increasing for x > 0, the converse also applies. Thus if
a 95% confidence interval for the volatility is [4%, 8%] then a
95% confidence interval for the associated variance is [16%, 64%].

Standard Errors for Equally 
Weighted Average Estimators 

An estimator of any parameter has a distribution and a point
estimate of volatility is just the expectation of the distribution
of the volatility estimator. The distribution function of the
equally weighted average volatility estimator is not just the
square root of the distribution function of the corresponding
variance estimate. Instead, it may be derived from the
distribution of the variance estimator via a simple
transformation. Since volatility is the square root of the
variance, the density function of the volatility estimator is

$$g(\hat{\sigma}) = 2\hat{\sigma}\, h(\hat{\sigma}^{2}) \quad \text{for } \hat{\sigma} > 0 \tag{9}$$

where $h(\hat{\sigma}^{2})$ is the density function of the
variance estimator. This follows from the fact that if y is a
monotonic and differentiable function of x, then their probability
densities g(.) and h(.) are related as $g(y) = |dx/dy|\, h(x)$
(see Freund, 1998). Note that when $y = \sqrt{x}$,
$|dx/dy| = 2y$ and so $g(y) = 2y\, h(x)$.

In addition to the point estimate or expecta¬ 
tion, one might also estimate the standard devi¬ 
ation of the distribution of the estimator. This is 
called the "standard error" of the estimate. The 
standard error determines the width of a con¬ 
fidence interval for a forecast and it indicates 
how reliable a forecast is considered to be. The 
wider the confidence interval, the more uncer¬ 
tainty there is in the forecast. 

Standard errors for equally weighted average variance estimates
are based on a normality assumption for the returns. Moving
average models assume that returns are independent and identically
distributed. Now assuming normality also, so that the returns are
normally and independently distributed, denoted by
NID(0, $\sigma^{2}$), we apply the variance operator to (3). Note
that if $X_{i}$ are independent random variables (i = 1, ..., T),
then $f(X_{i})$ are also independent for any monotonic
differentiable function f. Hence, the squared returns are
independent, and we have:

$$V(\hat{\sigma}_{t}^{2}) = \sum_{i=1}^{T} V(r_{t-i}^{2}) / T^{2} \tag{10}$$

Since $V(X) = E(X^{2}) - E(X)^{2}$ for any random variable X,
$V(r_{t}^{2}) = E(r_{t}^{4}) - E(r_{t}^{2})^{2}$. By the zero mean
assumption $E(r_{t}^{2}) = \sigma^{2}$ and assuming normality,
$E(r_{t}^{4}) = 3\sigma^{4}$. Hence for every t:

$$V(r_{t}^{2}) = 3\sigma^{4} - \sigma^{4} = 2\sigma^{4}$$

and substituting this into (10) gives

$$V(\hat{\sigma}_{t}^{2}) = \frac{2\sigma^{4}}{T} \tag{11}$$

Hence, the standard error of an equally weighted average variance
estimate based on T zero mean squared returns is
$\sigma^{2}\sqrt{2/T}$ or simply $\sqrt{2/T}$, when expressed as a
percentage of the variance. For instance, the standard error of
the variance estimate is 20% when 50 observations are used in the
estimate, and 10% when 200 observations are used in the estimate.

What about the standard error of the volatility estimator? To
derive this, we first prove that for any continuously
differentiable function f and random variable X:

$$V(f(X)) \approx f'(E(X))^{2}\, V(X) \tag{12}$$

To show this, we take a second order Taylor expansion of f about
the mean of X and then take expectations. See Alexander (2008a),
Chapter 1. This gives:

$$E(f(X)) \approx f(E(X)) + \tfrac{1}{2} f''(E(X))\, V(X) \tag{13}$$

Similarly,

$$E(f(X)^{2}) \approx f(E(X))^{2} + \left(f'(E(X))^{2} + f(E(X))\, f''(E(X))\right) V(X) \tag{14}$$

again ignoring higher-order terms. The result (12) follows on
noting that:

$$V(f(X)) = E(f(X)^{2}) - E(f(X))^{2}$$
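The algebra behind this last step is short enough to write out. The following worked derivation is a sketch that uses only the two second-order approximations for E(f(X)) and E(f(X)^2) stated above; the shorthand $\mu = E(X)$ is introduced here for brevity and is not notation from the text.

```latex
\begin{aligned}
V(f(X)) &= E\big(f(X)^{2}\big) - E\big(f(X)\big)^{2} \\
        &\approx f(\mu)^{2} + \big(f'(\mu)^{2} + f(\mu) f''(\mu)\big) V(X)
           - \big(f(\mu) + \tfrac{1}{2} f''(\mu) V(X)\big)^{2} \\
        &= f'(\mu)^{2} V(X) - \tfrac{1}{4} f''(\mu)^{2} V(X)^{2} \\
        &\approx f'(\mu)^{2} V(X)
\end{aligned}
```

The final line drops the term in $V(X)^{2}$, which is of higher order than the terms retained in the two expansions.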


We can now use (11) and (12) to derive the standard error of a
historical volatility estimate. From (12) we have
$V(\hat{\sigma}^{2}) \approx (2\hat{\sigma})^{2}\, V(\hat{\sigma})$
and so:

$$V(\hat{\sigma}) \approx \frac{V(\hat{\sigma}^{2})}{(2\hat{\sigma})^{2}} \tag{15}$$

Now using (11) in (15) we obtain the variance of the volatility
estimator as:

$$V(\hat{\sigma}) \approx \frac{1}{(2\sigma)^{2}} \cdot \frac{2\sigma^{4}}{T} = \frac{\sigma^{2}}{2T} \tag{16}$$

so the standard error of the volatility estimator as a percentage
of volatility is $(2T)^{-1/2}$. This result tells us that the
standard error of the volatility estimator (as a percentage of
volatility) is approximately one-half the size of the standard
error of the variance (as a percentage of the variance).

Thus, as a percentage of the volatility, the stan¬ 
dard error of the historical volatility estimator 
is approximately 10% when 50 observations are 
used in the estimate, and 5% when 200 observa¬ 
tions are used in the estimate. The standard er¬ 
rors on equally weighted moving average volatility 
estimates become very large when only a few 
observations are used. This is one reason why 
it is advisable to use a long averaging period in 
historical volatility estimates. 
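The percentages quoted above follow directly from the two formulas, the square root of 2/T for the variance and $(2T)^{-1/2}$ for the volatility. A minimal sketch, with hypothetical function names and valid only under the normal, independent, zero-mean assumptions made in the text:

```python
import math

def variance_standard_error_pct(T):
    """Standard error of the variance estimate, as a fraction of the variance."""
    return math.sqrt(2.0 / T)

def volatility_standard_error_pct(T):
    """Approximate standard error of the volatility estimate,
    as a fraction of the volatility."""
    return 1.0 / math.sqrt(2.0 * T)

for T in (50, 200):
    print(T, variance_standard_error_pct(T), volatility_standard_error_pct(T))
```

For T = 50 the two fractions are 0.20 and 0.10, and for T = 200 they are 0.10 and 0.05, matching the figures in the text.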

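It is harder to derive the standard error of an equally weighted
average correlation estimate. However, it can be shown that

$$V(\hat{\varrho}_{ij}) = \frac{1 - \varrho_{ij}^{2}}{T - 2} \tag{17}$$

and so we have the following t-distribution for the correlation
estimate divided by its standard error:

$$\frac{\hat{\varrho}_{ij}\sqrt{T-2}}{\sqrt{1 - \hat{\varrho}_{ij}^{2}}} \sim t_{T-2} \tag{18}$$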

In particular, the significance of a correlation es¬ 
timate depends on the number of observations 
that are used in the sample. 

To illustrate testing for the significance of historical
correlation, suppose that a historical correlation estimate of 0.2
is obtained using 38 observations. Is this significantly greater
than zero? The null hypothesis is $H_{0}: \varrho = 0$, the
alternative hypothesis is $H_{1}: \varrho > 0$, and the test
statistic is (18). Computing the value of this statistic given our
data gives

$$t = \frac{0.2 \times 6}{\sqrt{1 - 0.04}} = \frac{12}{\sqrt{96}} = \frac{3}{\sqrt{6}} = \sqrt{1.5} = 1.225$$

Even the 10% upper critical value of the t-distribution with 36
degrees of freedom is greater than this value (it is in fact 1.3).
Hence we cannot reject the null hypothesis: 0.2 is not
significantly greater than zero when estimated from 38
observations. However, if the same value of 0.2 had been obtained
from a sample with, say, 100 observations our t-value would have
been 2.02, which is significantly positive at the 2.5% level
because the upper 2.5% critical value of the t-distribution with
98 degrees of freedom is 1.98.
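The two test statistics in this example can be reproduced with a few lines of Python. The sketch below computes only the statistic, the correlation estimate times the square root of T - 2, divided by the square root of one minus the squared estimate; the critical values 1.3 and 1.98 are the ones quoted in the text, not computed here, and the function name is hypothetical.

```python
import math

def correlation_t_statistic(rho_hat, T):
    """t-statistic for testing a zero correlation, with T - 2 degrees of freedom."""
    return rho_hat * math.sqrt(T - 2) / math.sqrt(1.0 - rho_hat ** 2)

t_38 = correlation_t_statistic(0.2, 38)
t_100 = correlation_t_statistic(0.2, 100)

# Compare with the upper critical values quoted in the text.
print(round(t_38, 3), t_38 > 1.3)     # 1.225 False
print(round(t_100, 2), t_100 > 1.98)  # 2.02 True
```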


Equally Weighted Moving Average 
Covariance Matrices 

An equally weighted "moving" average is cal¬ 
culated on a fixed size data "window" that 
is rolled through time, each day adding the 
new return and taking off the oldest return. 
The length of this window of data, also called 
the "look-back" period or averaging period, is 
the time interval over which we compute the 
average of the squared returns (for variance) 
or the average cross products of returns (for 
covariance). In the past, several large financial 
institutions have lost a lot of money because 
they used the equally weighted moving aver¬ 
age model inappropriately. I would not be sur¬ 
prised if much more money was lost because of 
the inexperienced use of this model in the fu¬ 
ture. The problem is not the model itself—after 
all, it is a perfectly respectable statistical for¬ 
mula for an unbiased estimator—the problems 
arise from its inappropriate application within 
a time series context. 
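The rolling window described above can be sketched as follows. This is an illustrative implementation under the zero-mean assumption, not the author's code; the function name is hypothetical. Each day the newest squared return is added to a running sum and the oldest is removed, so the window length stays fixed.

```python
import math
from collections import deque

def rolling_volatility(returns, window, periods_per_year=250):
    """Annualized equally weighted volatility for every full window.

    The first estimate is produced once `window` returns are available;
    after that, one estimate is produced per new return.
    """
    buffer = deque()
    sum_sq = 0.0
    estimates = []
    for r in returns:
        buffer.append(r)
        sum_sq += r * r
        if len(buffer) > window:
            oldest = buffer.popleft()  # drop the oldest return
            sum_sq -= oldest * oldest
        if len(buffer) == window:
            # max() guards against tiny negative values from rounding.
            variance = max(sum_sq, 0.0) / window
            estimates.append(math.sqrt(periods_per_year * variance))
    return estimates

# Hypothetical constant 1% daily moves: every 3-day estimate is identical.
vols = rolling_volatility([0.01, -0.01, 0.01, -0.01, 0.01], window=3)
print(vols)
```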

A (fallacious) argument goes as follows: Long-term predictions
should be unaffected by short-term phenomena such as "volatility
clustering" so it will be appropriate to take the average over a
very long historic period. But short-term predictions should
reflect current market conditions, which means that only the
immediate past returns should be used. Some people use an
historical averaging period of T days in order to forecast forward
T days; others use slightly longer historical periods than the
forecast period. For example, for a 10-day forecast, some
practitioners might look back 30 days or more. But this apparently
sensible approach actually induces a major problem. If one or more
extreme returns is included in the averaging period, the
volatility (or correlation) forecast can suddenly jump downward to
a completely different level on a day when absolutely nothing
happened in the markets. And prior to mysteriously jumping down, a
historical forecast will be much larger than it should be.

Figure 2 MIB 30 and S&P 100 Daily Close

Figure 2 illustrates the daily closing prices of 
the Italian MIB 30 stock index between the be¬ 
ginning of January 2000 and the end of April 
2006 and compares these with the S&P 100 in¬ 
dex prices over the same period. The prices 
were downloaded from Yahoo! Finance. We will 
show how to calculate the 30-day, 60-day, and 
90-day historical volatilities of these two stock 
indexes and compare them graphically. 

We construct three different equally weighted moving average
volatility estimates for the MIB 30 index, with T = 30 days, 60
days and 90 days, respectively. The result is shown in Figure 3.
Let us first focus on the early part of the data period and on the
period after the September 11, 2001 (9/11), terrorist attack in
particular. The Italian index reacted to the news far more than
most other indexes. The volatility estimate based on 30 days of
data jumped from 15% to nearly 50% in one day, and then continued
to rise further, up to 55%. Then, suddenly, exactly 30 days after
the event, 30-day volatility jumped down again to 30%. But nothing
particular happened in the Italian markets on that day. The
drastic fall in volatility was just a "ghost" of the 9/11
terrorist attack: It was no reflection at all of the real market
conditions at that time.

Figure 3 Equally Weighted Moving Average Volatility Estimates of
the MIB 30 Index

Similar features are apparent in the 60-day 
and 90-day volatility series. Each series jumps
up immediately after the 9/11 event, and then,
either 60 or 90 days afterward, jumps down 
again. On November 9, 2001, the three differ¬ 
ent look-back periods gave volatility estimates 
of 30%, 43%, and 36%, but they are all based 
on the same underlying data and the same in¬ 
dependent and identically distributed assump¬ 
tion for the returns! Other such ghost features 
are evident later in the period, for instance, in 
March 2001 and March 2003. Later on in the 
period, the choice of look-back period does not 
make so much difference: The three volatility 
estimates are all around the 10% level. 


Case Study: Measuring the Volatility 
and Correlation of U.S. Treasuries

The interest rate covariance matrix is an important determinant of
the value at risk (VaR) of a cash flow. In this section, we show
how to estimate the volatilities and correlations of different
maturity U.S. zero-coupon interest rates using the equally
weighted moving average method. Consider daily data on constant
maturity U.S. Treasury rates between January 4, 1982 and March 11,
2005. The rates are graphed in Figure 4.

Figure 4 U.S. Treasury Rates
Source: http://www.federalreserve.gov/releases/h15/data.htm.

It is evident that rates followed marked trends 
over the period. From a high of about 15% in 
1982, by the end of the same period the short¬ 
term rates were below 3%. Also, periods where 
the term structure of interest rates is relatively 
flat are interspersed with periods when the term 
structure is upward sloping, sometimes with 
the long-term rates being several percent higher 
than the short-term rates. During the upward 
sloping yield curve regimes, especially the lat¬ 
ter one from 2000 to 2005, the medium- to long¬ 
term interest rates are more volatile than the 
short-term rates, in absolute terms. However, it 
is not clear which rates are the most volatile in 
relative terms, as the short rates are much lower 
than the medium to long-term rates. There are 
three decisions that must be made: 

Decision 1: How long an historical data period 
should be used? 

Decision 2: Which frequency of observations 
should be used? 

Decision 3: Should the volatilities and cor¬ 
relations be measured directly on absolute 
changes in interest rates, or should they be 
measured on relative changes and then the 
result converted into absolute terms?

Decision 1: How Long a Historical Data Period
Should Be Used?

The equally weighted historical method gives 
an average volatility, or correlation, over the 
sample period chosen. The longer the data pe¬ 
riod, the less relevant that average may be today 
(that is, at the end of the sample). Looking at 
Figure 4, it may be thought that data from 2000 
onward, and possibly also data during the first 
half of the 1990s, are relevant today. However, 
we may not wish to include data from the latter 
half of the 1990s, when the yield curve was flat. 


Decision 2: Which Frequency of Observations 
Should Be Used? 

This is an important decision, which depends 
on the end use of the covariance matrix. We 
can always use the square root of time rule to 
convert the holding period of a covariance ma¬ 
trix. For instance, a 10-day covariance matrix 
can be converted into a 1-day matrix by divid¬ 
ing each element by 10; and it can be converted 
into an annual covariance matrix by multiply¬ 
ing each element by 25. However, this conver¬ 
sion is based on the assumption that variations 
in interest rates are independent and identically 
distributed. Moreover, the data become more 
noisy when we use high-frequency data. For 
instance, daily variations may not be relevant 
if we only ever want to measure covariances 
over a 10-day period. The extra variation in the 
daily data is not useful, and the crudeness of 
the square root of time rule will introduce an 
error. To avoid the use of crude assumptions it 
is best to use a data frequency that corresponds 
to the holding period of the covariance matrix. 

However, the two decisions above are linked. 
For instance, if data are quarterly, we need a 
data period of five or more years; otherwise, 
the standard error of the estimates will be very 
large. But then our quarterly covariance matrix 
represents an average over many years that may 
not be thought of as relevant today. If data are 
daily, then just one year of data provides plenty 
of observations to measure the historical model 


volatilities and correlations accurately. Also, a 
history of one year is a better representation of 
today's markets than a history of five or more 
years. However, if it is a quarterly covariance 
matrix that we seek, we have to apply the square 
root of time rule to the daily matrix. Moreover, 
the daily variations that are captured by the 
matrix may not be relevant information at the 
quarterly frequency. 

In summary, there may be a trade-off between 
using data at the relevant frequency and using 
data that are relevant today. It should be noted 
that such a trade-off between Decisions 1 and 2 
above applies to the measurement of risk in all 
asset classes and not only to interest rates. 

In interest rates, there is another decision to 
make before we can measure risk. Since the 
price value of a basis point (PV01) sensitiv¬ 
ity vector is usually measured in basis points, 
an interest rate covariance matrix is also usu¬ 
ally expressed in basis points. Hence, we have 
Decision 3. 

Decision 3: Absolute versus Relative Measures

Should the volatilities and correlations be measured directly on
absolute changes in interest rates, or should they be measured on
relative changes and then the result converted into absolute
terms?

If rates have been trending over the data 
period the two approaches are likely to give 
very different results. One has to make a de¬ 
cision about whether relative changes or abso¬ 
lute changes are the more stable. In these data, 
for example, an absolute change of 50 basis 
points in 1982 was relatively small, but in 2005 
it would have represented a very large change. 
Hence, to estimate an average daily covariance 
matrix over the entire data sample, it may be 
more reasonable to suppose that the volatilities 
and correlations should be measured on relative 
changes and then converted to absolute terms. 

Note, however, that a daily matrix based on 
the entire sample would capture a very long¬ 
term average of volatilities and correlations 
between daily U.S. Treasury rates; indeed, it is a 22-year average
that includes several periods of different regimes in interest
rates. Such
a long-term average, which is useful for long¬ 
term forecasts, may be better based on lower 
frequency data (e.g., monthly). For a 1-day fore¬ 
cast horizon, we shall use only the data since 
January 1, 2000. 

To make the choice for Decision 3, we take 
both the relative daily changes (the differ¬ 
ence in the log rates) and the absolute daily 
changes (the differences in the rates, in basis- 
point terms). Then we obtain the standard de¬ 
viation, correlation, and covariance in each case, 
and in the case of relative changes we translate 
the results into absolute terms. We now com¬ 
pare results based on relative changes with re¬ 
sults based on absolute changes. The correlation 
matrix estimates based on the period January 1, 
2000, to March 11, 2005, are shown in Table 1. 

The matrices are similar. Both matrices dis¬ 
play the usual characteristics of an interest rate 
term structure: Correlations are higher at the 
long end than the short end, and they decrease 
as the difference between the two maturities 
increases. 


Table 1 Correlation of U.S. Treasuries

(a) Based on Relative Changes

        m3     m6     y1     y2     y3     y5     y10
m3     1.00
m6     0.77   1.00
y1     0.53   0.84   1.00
y2     0.44   0.69   0.88   1.00
y3     0.42   0.66   0.84   0.97   1.00
y5     0.39   0.62   0.79   0.91   0.96   1.00
y10    0.32   0.54   0.71   0.82   0.88   0.95   1.00

(b) Based on Absolute Changes

        m3     m6     y1     y2     y3     y5     y10
m3     1.00
m6     0.79   1.00
y1     0.54   0.81   1.00
y2     0.40   0.67   0.87   1.00
y3     0.37   0.62   0.83   0.97   1.00
y5     0.33   0.57   0.77   0.92   0.95   1.00
y10    0.26   0.48   0.69   0.84   0.88   0.95   1.00



Table 2 compares the volatilities of the interest rates obtained
using the two methods. The figures in the last row of each table
represent an average absolute volatility for each rate over the
period January 1, 2000 to March 11, 2005. Basing this first on
relative changes in interest rates, Table 2(a) gives the standard
deviation of relative returns in the first row. The long-term
rates have the lowest standard deviations, and the medium-term
rates have the highest standard deviations. These standard
deviations are then annualized (by multiplying by $\sqrt{250}$,
assuming each rate is independent and identically distributed) and
multiplied by the level of the interest rate on March 11, 2005.
There was a very marked upward sloping yield curve on March 11,
2005. Hence the long-term rates are more volatile than the
short-term rates: For instance, the 3-month rate has an absolute
volatility of about 76 basis points, but the absolute volatility
of the 10-year rates is about 98 basis points.

Table 2(b) measures the standard deviation 
of absolute changes in interest rates over the 
period January 1, 2000 to March 11, 2005, and 
then converts this into volatility by multiplying
by $\sqrt{250}$. We again find that the long-term
rates are more volatile than the short-term rates; 
for instance, the six-month rate has an absolute 
volatility of about 62 basis points, but the ab¬ 
solute volatility of the five-year rates is about 
106 bps. (It should be noted that it is quite un¬ 
usual for long-term rates to be more volatile 
than short-term rates. But from 2000 to 2004 the 
U.S. Fed was exerting a lot of control on short¬ 
term rates, to bring down the general level of 
interest rates. However, the market expected 
interest rates to rise, because the yield curve 
was upward sloping during most of the period.) 
We find that correlations were similar, whether 
based on relative or absolute changes. But Ta¬ 
ble 2 shows there is a substantial difference be¬ 
tween the volatilities obtained using the two 
methods. When volatilities are based directly 
on the absolute changes, they are slightly lower 
at the short end and substantially lower for the 
medium-term rates.

Table 2 Volatility of U.S. Treasuries

(a) Based on Relative Changes

                                        m3      m6      y1      y2      y3      y5      y10
Standard deviation                      0.0174  0.0172  0.0224  0.0267  0.0239  0.0187  0.0136
Yield curve on March 11, 2005 (%)       2.76    3.06    3.28    3.73    3.94    4.22    4.56
Absolute volatility (in basis points)   75.89   83.08   116.23  157.61  148.71  124.88  98.21

(b) Based on Absolute Changes

                                        m3      m6      y1      y2      y3      y5      y10
Standard deviation                      4.4735  3.9459  4.7796  6.4626  6.7964  6.7615  6.1738
Absolute volatility (in basis points)   70.73   62.39   75.57   102.18  107.46  106.91  97.62

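The two conversions behind Table 2 can be sketched as follows, using the 3-month column as input. The function names are hypothetical, and the factor of 100 that turns a rate quoted in percent into basis points is an assumption about the table's units. Small differences from the published figures are expected because the tabulated standard deviations are rounded.

```python
import math

def abs_vol_from_relative(daily_sd_relative, rate_percent, days_per_year=250):
    """Absolute volatility in basis points from relative (log) changes.

    Annualize the daily standard deviation with the square root of time
    rule, then multiply by the current rate level; one percentage point
    is 100 basis points.
    """
    return daily_sd_relative * math.sqrt(days_per_year) * rate_percent * 100.0

def abs_vol_from_absolute(daily_sd_bp, days_per_year=250):
    """Absolute volatility in basis points from absolute daily changes."""
    return daily_sd_bp * math.sqrt(days_per_year)

# 3-month rate inputs as they appear in Table 2(a) and Table 2(b).
m3_relative = abs_vol_from_relative(0.0174, 2.76)  # table reports 75.89
m3_absolute = abs_vol_from_absolute(4.4735)        # table reports 70.73
print(round(m3_relative, 2), round(m3_absolute, 2))
```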


Finally, we obtain the annual covariance matrix of absolute
changes (in basis point terms) by multiplying each element of the
correlation matrix by the two corresponding absolute volatilities,
and to obtain the one-day covariance matrix we divide by 250. The
results are shown in Table 3. Depending on whether we base
estimates of volatility and correlation on relative or absolute
changes in interest rates, the covariance matrix can be very
different. In this case, it is short-term and medium-term
volatility estimates that are the most affected by the choice.
Given that we have used the equally weighted average methodology
to construct the covariance matrix, the underlying assumption is
that volatilities and correlations are constant. Hence, the choice
between relative or absolute changes depends on which are the more
stable. In countries with very high interest rates, or when
interest rates have been trending during the sample period,
relative changes tend to be more stable than absolute changes.

Table 3 One-Day Covariance Matrix of U.S. Treasuries, in Basis
Points

(a) Based on Relative Changes

32.26
20.87
58.28   91.14
25.84   45.95   71.94
46.47

(b) Based on Absolute Changes

11.65   15.30
26.86   41.77
y3     11.17   16.76   26.96   42.73   46.19
y5      9.89   15.21   25.03   40.09   43.81   45.72
y10     7.17   11.71   20.25   33.34   36.92   39.55   38.12

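The construction of the one-day covariance matrix from correlations and absolute volatilities can be illustrated with a short sketch. It takes its inputs from the absolute-change figures in Tables 1 and 2 for the 3-year, 5-year, and 10-year rates; the function name is hypothetical, and off-diagonal results differ slightly from Table 3 because the published correlations are rounded to two decimals.

```python
def one_day_covariance(corr, vol_i, vol_j, days_per_year=250):
    """One-day covariance from a correlation and two annual volatilities.

    The annual covariance is the correlation times the two volatilities;
    dividing by the number of trading days gives the one-day figure.
    """
    return corr * vol_i * vol_j / days_per_year

# Annual absolute volatilities in basis points (absolute-change method).
vol_y3, vol_y5, vol_y10 = 107.46, 106.91, 97.62

var_y3 = one_day_covariance(1.00, vol_y3, vol_y3)     # Table 3(b): 46.19
var_y10 = one_day_covariance(1.00, vol_y10, vol_y10)  # Table 3(b): 38.12
cov_y5_y3 = one_day_covariance(0.95, vol_y5, vol_y3)  # Table 3(b): 43.81
print(round(var_y3, 2), round(var_y10, 2), round(cov_y5_y3, 2))
```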

In summary, there are four crucial decisions to 
be made when estimating a covariance matrix 
for interest rates: 

1. Which statistical model should we employ? 

2. Which historical data period should be used? 

3. Should the data frequency be daily, weekly, 
monthly, or quarterly? 

4. Should we base the matrix on relative or ab¬ 
solute changes in interest rates? 

The first three decisions must also be made 
when estimating covariance matrices in other 
asset classes such as equities, commodities, and 
foreign exchange rates. There is a huge amount 
of model risk involved with the construction of 
covariance matrices; very different results may 
be obtained depending on the choice made. 

Pitfalls of the Equally Weighted 
Moving Average Method 

The problems encountered when applying this model stem not from
the small jumps that are often encountered in financial asset
prices, but from the large jumps that are only rarely encountered.
When a long averaging period is used, the importance of a single
extreme event is averaged out within a large sample of returns.
Hence, a moving average volatility estimate may not respond enough
to a short, sharp shock in the market. This effect is clearly
visible in 2002, where only the 30-day volatility rose
significantly over a matter of a few weeks. The longer-term
volatilities did rise, but it took several months for them to
respond to the market falls in the MIB during mid-2002. At this
point in time there was actually a cluster of volatility, which
often happens in financial markets. The effect of the cluster was
to make the longer-term volatilities rise, eventually, but then
they took too long to return to normal levels. It was not until
markets returned to normal in late 2003 that the three volatility
series in Figure 3 were in line with each other.

When there is an extreme event in the mar¬ 
ket, even just one very large return will influ¬ 
ence the T-day moving average estimate for 
exactly T days until that very large squared re¬ 
turn falls out of the data window. Hence volatil¬ 
ity will jump up, for exactly T days, and then 
fall dramatically on day T + 1, even though 
nothing happened in the market on that day. 
This type of ghost feature is simply an artifact 
of the use of equal weighting. The problem is 
that extreme events are just as important to cur¬ 
rent estimates, whether they occurred yester¬ 
day or a very long time ago. A single large, 
squared return remains just as important T - 1 
days ago as it was yesterday. It will affect the 
T-day volatility or correlation estimate for ex¬ 
actly T days after that return was experienced, 
and to exactly the same extent. However, with 
other models we would find that volatility or 
correlation had long ago returned to normal lev¬ 
els. Exactly T + 1 days after the extreme event, 
the equally weighted moving average volatil¬ 
ity estimate mysteriously drops back down to 
about the correct level—that is, provided that 
we have not had another extreme return in the 
interim! 
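The ghost feature is easy to reproduce on artificial data. The sketch below uses a hypothetical return series with constant 0.5% daily moves and a single 8% fall, then computes a 30-day equally weighted volatility. All numbers are invented for illustration and do not come from the MIB 30 data.

```python
import math

def moving_average_volatility(returns, T, periods_per_year=250):
    """Annualized T-day equally weighted volatility for each full window.

    Element n of the result is based on returns[n : n + T].
    """
    estimates = []
    for end in range(T, len(returns) + 1):
        window = returns[end - T:end]
        variance = sum(r * r for r in window) / T
        estimates.append(math.sqrt(periods_per_year * variance))
    return estimates

# Hypothetical data: quiet market with one extreme return on day 40.
returns = [0.005 if day % 2 == 0 else -0.005 for day in range(100)]
shock_day = 40
returns[shock_day] = -0.08

T = 30
vols = moving_average_volatility(returns, T)
baseline = vols[0]
elevated = [n for n, v in enumerate(vols) if v > 2 * baseline]

# The estimate is elevated for exactly T consecutive windows, then it
# drops back although nothing happens in the data on that day.
print(len(elevated), elevated[0], elevated[-1])  # 30 11 40
```

The drop occurs at the first window that no longer contains the shock, which is the artifact of equal weighting discussed above.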


Note that the smaller T, the number of data points used in the
data window, the more variable the historical volatility series
will be. When any estimates are based on a small sample size they
will not be very precise. The larger the sample size the more
accurate the estimate, because sampling errors are proportional to
$1/\sqrt{T}$. For this reason alone a short moving average will
be more variable than a long moving average.
Hence, a 30-day historic volatility (or correla¬ 
tion) will always be more variable than a 60-day 
historic volatility (or correlation) that is based 
on the same daily return data. Of course, if 
one really believes in the assumption of con¬ 
stant volatility that underlies this method, one 
should always use as long a history as possible, 
so that sampling errors are reduced. 
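The dependence of sampling error on window length can be checked by simulation. The sketch below is an assumption-laden illustration (i.i.d. normal returns with a 1% daily standard deviation, 2,000 independent samples per window length, hypothetical function names): it draws many independent T-day samples and compares the spread of the resulting volatility estimates for T = 30 and T = 60.

```python
import math
import random
import statistics

def historical_vol(returns, periods_per_year=250):
    """Annualized equally weighted volatility from squared returns."""
    return math.sqrt(periods_per_year * sum(r * r for r in returns) / len(returns))

def spread_of_estimates(window, trials=2000, daily_sigma=0.01, seed=0):
    """Standard deviation of the volatility estimate across independent samples."""
    rng = random.Random(seed)
    estimates = [
        historical_vol([rng.gauss(0.0, daily_sigma) for _ in range(window)])
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

spread_30 = spread_of_estimates(30, seed=0)
spread_60 = spread_of_estimates(60, seed=1)
ratio = spread_30 / spread_60
print(spread_30, spread_60, ratio)  # the ratio should be in the region of sqrt(2)
```

Under constant volatility, doubling the window should reduce the spread of the estimates by a factor of roughly $\sqrt{2}$, which is what the $1/\sqrt{T}$ rule predicts.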

It is important to realize that whatever the length of the historical averaging period and whenever the estimate is made, the equally weighted method is always estimating the same parameter: the unconditional volatility (or correlation) of the returns. But this is a constant; it does not change over time. Thus, the variation in T-day historic estimates can only be attributed to sampling error: There is nothing else in the model to explain this variation. It is not a time-varying volatility model, even though some users try to force it into that framework.

The problem with the equally weighted moving average model is that it tries to make an estimate of a constant volatility into a forecast of a time-varying volatility. Similarly, it tries to make an estimate of a constant correlation into a forecast of a time-varying correlation. No wonder financial firms have lost a lot of money with this model! It is really only suitable for long-term forecasts of average volatility, or correlation, for instance over a period of between six months and several years. In this case, the look-back period should be long enough to include a variety of price jumps, with a relative frequency that represents the modeler's expectations of the probability of future price jumps of that magnitude during the forecast horizon.

Using Equally Weighted Moving Averages

To forecast a long-term average for volatility using the equally weighted model, it is standard to use a large sample size T in the variance estimate. The confidence intervals for historical volatility estimators given earlier in this entry provide a useful indication of the accuracy of these long-term volatility forecasts, and the approximate standard errors derived earlier in this entry give an indication of the variability in long-term volatility. There, we saw that the variability in estimates decreased as the sample size increased. Hence, long-term volatility that is forecast from this model may prove useful.

When pricing options, it is the long-term volatility that is most difficult to forecast. Options trading often focuses on short-maturity options, and long-term options are much less liquid. Hence, it is not easy to forecast a long-term implied volatility. Long-term volatility holds the greatest uncertainty, yet it is the most important determinant of long-term option prices.

We conclude this section with an interesting conundrum, considering two hypothetical historical volatility modelers, whom we shall call Tom and Dick, both forecasting volatility over a 12-month risk horizon based on an equally weighted average of squared returns over the past 12 months of daily data. Imagine that it is January 2006 and that on October 15, 2005, the market crashed, returning -50% in the space of a few days. So some very large jumps occurred during the current data window, albeit three months ago.

Tom includes these extremely large returns in his data window, so his ex-post average of squared returns, which is also his volatility forecast in this model, will be very high. Because of this, Tom has an implicit belief that another jump of equal magnitude will occur during the forecast horizon. This implicit belief will continue until one year after the crash, when those large negative returns fall out of his moving data window. Consider Tom's position in October 2006. Up to the middle of October he includes the crash period in his forecast, but after that the crash period drops out of the data window and his forecast of volatility in the future suddenly decreases, as if he suddenly decided that another crash was very unlikely. That is, he drastically changes his belief about the possibility of an extreme return. So, to be consistent with his previous beliefs, should Tom now "bootstrap" the extreme returns experienced during October 2005 back into his data set?

And what about Dick, who in January 2006 does not believe that another market crash could occur in his 12-month forecast horizon? To be consistent with that belief, in January 2006 he should somehow filter out those extreme returns from his data. Of course, it is dangerous to embrace the possibility of bootstrapping in and filtering out extreme returns in an ad hoc way before the data are used in the model. However, if one does not do this, the historical model can imply a very strange behavior of the beliefs of the modeler.

In the Bayesian framework of uncertain volatility the equally weighted model has an important role to play. Equally weighted moving averages can be used to set the bounds for long-term volatility; that is, we can use the model to find a range $[\sigma_{\min}, \sigma_{\max}]$ for the long-term average volatility forecast. The lower bound $\sigma_{\min}$ can be estimated using a long period of historical data with all the very extreme returns removed, and the upper bound $\sigma_{\max}$ can be estimated using the historical data where the very extreme returns are retained, and even adding some!
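As a rough illustration of how such bounds might be computed, the Python sketch below estimates the lower bound from a sample with extreme returns filtered out and the upper bound from the full sample. The cutoff of 5% for an "extreme" return, the 250-day annualization, the function names, and the data are all hypothetical choices made for this example; the text does not prescribe them.

```python
import math

def annualized_vol(returns, periods_per_year=250):
    """Equally weighted volatility from squared returns (zero-mean form)."""
    return math.sqrt(periods_per_year * sum(r * r for r in returns) / len(returns))

def volatility_bounds(returns, cutoff):
    """Return (sigma_min, sigma_max): extremes removed versus extremes retained."""
    trimmed = [r for r in returns if abs(r) <= cutoff]
    return annualized_vol(trimmed), annualized_vol(returns)

# Hypothetical history: 1% daily moves plus two extreme returns.
history = [0.01 if i % 2 == 0 else -0.01 for i in range(1000)]
history[300] = -0.08
history[301] = 0.06

sigma_min, sigma_max = volatility_bounds(history, cutoff=0.05)
print(sigma_min, sigma_max)
```

A modeler who also wants to add extreme returns to the upper bound, as the text suggests, could append hypothetical scenarios to `history` before computing `sigma_max`.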

A modeler's beliefs about long-term volatility can be formalized by a probability distribution over the range $[\sigma_{\min}, \sigma_{\max}]$. This distribution would then be carried through for the rest of the analysis. For instance, upper and lower price bounds might be obtained for long-term exposures with option-like structures, such as warrants on a firm's equity or convertible bonds.


Moving Average Models for Volatility and Correlation, and Covariance Matrices 




This type of Bayesian method, which provides a price distribution rather than a single price, will be increasingly used in market risk management in the future.


EXPONENTIALLY WEIGHTED 
MOVING AVERAGES 

An exponentially weighted moving average (EWMA) avoids the pitfalls explained in the previous section because it puts more weight on the more recent observations. Thus, as the data window slides along and extreme returns move further into the past, they become less important in the average.


Statistical Methodology 

An exponentially weighted moving average can be defined on any time series of data. Say that on date t we have recorded data up to time t - 1, so we have observations $(x_{t-1}, \ldots, x_1)$. The exponentially weighted average of these observations is defined as:

$$\mathrm{EWMA}(x_{t-1}, \ldots, x_1) = \frac{x_{t-1} + \lambda x_{t-2} + \lambda^2 x_{t-3} + \cdots + \lambda^{t-2} x_1}{1 + \lambda + \lambda^2 + \cdots + \lambda^{t-2}}$$

where $\lambda$ is a constant, $0 < \lambda < 1$, called the smoothing or the decay constant. Since $\lambda^T \to 0$ as $T \to \infty$, the exponentially weighted average places negligible weight on observations far in the past. And since $1 + \lambda + \lambda^2 + \cdots = (1 - \lambda)^{-1}$ we have, for large t,

$$\mathrm{EWMA}(x_{t-1}, \ldots, x_1) \approx \frac{x_{t-1} + \lambda x_{t-2} + \lambda^2 x_{t-3} + \cdots}{1 + \lambda + \lambda^2 + \cdots} = (1 - \lambda) \sum_{i=1}^{\infty} \lambda^{i-1} x_{t-i}$$

This is the formula that is used to calculate exponentially weighted moving average (EWMA) estimates of variance (with x being the squared return) and covariance (with x being the cross product of the two returns). As with equally weighted moving averages, it is standard to use squared daily returns and cross products of daily returns, not in mean deviation form. That is:

$$\hat{\sigma}_t^2 = (1 - \lambda) \sum_{i=1}^{\infty} \lambda^{i-1} r_{t-i}^2 \qquad (19)$$

and

$$\hat{\sigma}_{12,t} = (1 - \lambda) \sum_{i=1}^{\infty} \lambda^{i-1} r_{1,t-i}\, r_{2,t-i} \qquad (20)$$

The above formulas may be rewritten in the form of recursions, more easily used in calculations:

$$\hat{\sigma}_t^2 = (1 - \lambda)\, r_{t-1}^2 + \lambda\, \hat{\sigma}_{t-1}^2 \qquad (21)$$

and

$$\hat{\sigma}_{12,t} = (1 - \lambda)\, r_{1,t-1}\, r_{2,t-1} + \lambda\, \hat{\sigma}_{12,t-1} \qquad (22)$$

An alternative notation used for the above is $V_\lambda(r_t)$ for $\hat{\sigma}_t^2$ and $\mathrm{COV}_\lambda(r_{1,t}, r_{2,t})$ for $\hat{\sigma}_{12,t}$ when we want to make explicit the dependence on the smoothing constant.

One converts the variance to volatility by taking the annualized square root, the annualizing constant being determined by the data frequency as usual. Note that for the EWMA correlation the covariance is divided by the square root of the product of the two EWMA variance estimates, all with the same value of $\lambda$. Similarly, for the EWMA beta the covariance between the stock (or portfolio) returns and the market returns is divided by the EWMA estimate for the market variance, both with the same value of $\lambda$. That is:

$$\rho_{t,\lambda} = \frac{\mathrm{COV}_\lambda(r_{1,t}, r_{2,t})}{\sqrt{V_\lambda(r_{1,t})\, V_\lambda(r_{2,t})}} \qquad (23)$$

and

$$\beta_{t,\lambda} = \frac{\mathrm{COV}_\lambda(X_t, Y_t)}{V_\lambda(X_t)} \qquad (24)$$
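The recursions and the ratios built from them can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `ewma`, the choice to seed each recursion with the first observation, the 250-day annualization, and the two short return series are all assumptions made for the example.

```python
import math

def ewma(series, lam, initial):
    """Apply estimate_new = (1 - lam) * x + lam * estimate_old along a series.

    path[k] is the estimate available after observing series[k], so it plays
    the role of the date k + 1 estimate in the recursions (21) and (22).
    """
    estimate = initial
    path = []
    for x in series:
        estimate = (1.0 - lam) * x + lam * estimate
        path.append(estimate)
    return path

# Hypothetical daily returns on two assets.
r1 = [0.010, -0.020, 0.015, -0.005, 0.030, -0.010]
r2 = [0.008, -0.015, 0.010, 0.002, 0.020, -0.012]
lam = 0.94

# Variance uses squared returns, covariance uses cross products.
var1 = ewma([a * a for a in r1], lam, initial=r1[0] ** 2)
var2 = ewma([b * b for b in r2], lam, initial=r2[0] ** 2)
cov12 = ewma([a * b for a, b in zip(r1, r2)], lam, initial=r1[0] * r2[0])

volatility_1 = [math.sqrt(250 * v) for v in var1]            # annualized
correlation = [c / math.sqrt(v1 * v2) for c, v1, v2 in zip(cov12, var1, var2)]
beta_2_on_1 = [c / v1 for c, v1 in zip(cov12, var1)]         # asset 1 as "market"

print(volatility_1[-1], correlation[-1], beta_2_on_1[-1])
```

Because the same smoothing constant is used in the numerator and denominator, each correlation is a ratio of weighted sums with identical weights and therefore stays between -1 and 1.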


Interpretation of $\lambda$

There are two terms on the right-hand side of (21). The first term $(1 - \lambda)\, r_{t-1}^2$ determines the intensity of reaction of volatility to market events: The smaller $\lambda$ is, the more the volatility reacts to the market information in yesterday's return. The second term $\lambda\, \hat{\sigma}_{t-1}^2$ determines the persistence in volatility: Irrespective of what happens in the market, if volatility was high yesterday it will still be high today. The closer $\lambda$ is to 1, the more persistent is volatility following a market shock.

Thus, a high $\lambda$ gives little reaction to actual market events but great persistence in volatility, and a low $\lambda$ gives highly reactive volatilities that quickly die away. An unfortunate restriction of exponentially weighted moving average models is that the reaction and persistence parameters are not independent: The strength of reaction to market events is determined by $1 - \lambda$, while the persistence of shocks is determined by $\lambda$. But this assumption is not empirically justified except perhaps in a few markets (e.g., major U.S. dollar exchange rates).
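The trade-off between reaction and persistence can be seen by feeding one isolated shock through the variance recursion for two values of the smoothing constant. The Python sketch below is illustrative only: the function name, the 1% "calm" return, the 5% shock, and the 20-day look-ahead are assumptions chosen for the example.

```python
def ewma_variance_path(returns, lam, initial_variance):
    """Variance recursion: sigma2_new = (1 - lam) * r^2 + lam * sigma2_old."""
    sigma2 = initial_variance
    path = []
    for r in returns:
        sigma2 = (1.0 - lam) * r * r + lam * sigma2
        path.append(sigma2)
    return path

calm = 0.01
returns = [calm] * 5 + [0.05] + [calm] * 40   # one hypothetical 5% shock at index 5

results = {}
for lam in (0.90, 0.975):
    path = ewma_variance_path(returns, lam, initial_variance=calm ** 2)
    jump = path[5] - calm ** 2                       # reaction on the shock day
    left_after_20 = (path[25] - calm ** 2) / jump    # share of the jump still present
    results[lam] = (jump, left_after_20)
    print(lam, jump, left_after_20)
```

The jump in variance equals $(1 - \lambda)$ times the excess of the squared shock over the prevailing variance, and the share of that jump remaining after k quiet days is $\lambda^k$, so the lower smoothing constant reacts more strongly and forgets more quickly.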

The effect of using a different value of $\lambda$ in EWMA volatility forecasts can be quite substantial. Figure 5 compares two EWMA volatility estimates/forecasts of the S&P 100 index, with $\lambda = 0.90$ and $\lambda = 0.975$. It is not unusual for these two EWMA estimates to differ by as much as 10%.

Figure 5 EWMA Volatility Estimates for S&P 100 with Different $\lambda$s

So which is the best value to use for the smoothing constant? How should we choose $\lambda$? This is not an easy question. (By contrast, in generalized autoregressive conditional heteroskedasticity (GARCH) models there is no question of how we should estimate parameters, because maximum likelihood estimation is an optimal method that always gives consistent estimators.) Statistical methods may be considered: For example, $\lambda$ could be chosen to minimize the root mean square error between the EWMA estimate of variance and the squared return. But, in practice, $\lambda$ is often chosen subjectively because the same value of $\lambda$ has to be used for all elements in an EWMA covariance matrix. As a rule of thumb, we might take values of $\lambda$ between about 0.75 (volatility is highly reactive but has little persistence) and 0.98 (volatility is very persistent but not highly reactive).
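The root mean square error criterion mentioned above can be sketched as a simple grid search. In the Python example below, the function name `rmse_for_lambda`, the grid from 0.75 to 0.98 in steps of 0.01, the seeding of the recursion with the first squared return, and the simulated i.i.d. normal returns are all assumptions for illustration; with real data the returns would be replaced by an observed series.

```python
import math
import random

def rmse_for_lambda(returns, lam):
    """RMSE between the EWMA variance estimate and the next squared return."""
    sigma2 = returns[0] ** 2
    squared_errors = []
    for prev, current in zip(returns, returns[1:]):
        # The estimate for the current day uses data up to the previous day only.
        sigma2 = (1.0 - lam) * prev * prev + lam * sigma2
        squared_errors.append((current * current - sigma2) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

rng = random.Random(1)
returns = [rng.gauss(0.0, 0.01) for _ in range(1000)]   # hypothetical data

grid = [round(0.75 + 0.01 * k, 2) for k in range(24)]   # 0.75, 0.76, ..., 0.98
best_lam = min(grid, key=lambda lam: rmse_for_lambda(returns, lam))
print(best_lam, rmse_for_lambda(returns, best_lam))
```

A search of this kind yields one value per series, which is why it does not by itself resolve the choice of a single smoothing constant for a whole covariance matrix.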


Properties of the Estimates 

An EWMA volatility estimate will react immediately following an unusually large return; then the effect of this return on the EWMA volatility estimate gradually diminishes over time. The reaction of EWMA volatility estimates to market events therefore persists over time, and with a strength that is determined by the smoothing constant $\lambda$. The larger the value of $\lambda$, the more weight is placed on observations in the past and so the smoother the series becomes.

Figure 6 compares the EWMA volatility of the MIB index with $\lambda = 0.95$ and the 60-day equally weighted volatility estimate. The difference between the two estimators is marked following an extreme market return. The EWMA estimate gives a higher volatility than the equally weighted estimate, but it returns to normal levels faster than the equally weighted estimate because it does not suffer from the ghost features discussed above.

Figure 6 EWMA versus Equally Weighted Volatility

One of the disadvantages of using EWMA to estimate and forecast covariance matrices is that the same value of $\lambda$ is used for all the variances and covariances in the matrix. For instance, in a large matrix covering several asset classes, the same $\lambda$ applies to all equity indexes, foreign exchange rates, interest rates, and/or commodities in the matrix. But why should all these risk factors have similar reaction and persistence to shocks? This constraint is commonly applied merely because it guarantees that the matrix will be positive semidefinite.

The EWMA Forecasting Model 

The exponentially weighted average variance estimate (19), or in its equivalent form (21), is just a methodology for calculating $\hat{\sigma}_t^2$. That is, it gives a variance estimate at any point in time but there is no model as such that explains the behavior of the variance of returns, $\sigma_t^2$, at each time t. In this sense, we have to distinguish EWMA from a GARCH model, which starts with a proper specification of the dynamics of $\sigma_t^2$ and then proceeds to estimate the parameters of this model.

Without a proper model, it is not clear how we should turn our current estimate of variance into a forecast of variance over some future horizon. One possibility is to augment (21) by assuming it is the estimate associated with the model

$$\sigma_t^2 = (1 - \lambda)\, r_{t-1}^2 + \lambda\, \sigma_{t-1}^2, \qquad r_t \mid I_{t-1} \sim N(0, \sigma_t^2) \qquad (25)$$

An alternative is to assume a constant volatility, so the fact that our estimates are time varying is merely due to sampling error. In that case any EWMA variance forecast must be constant and equal to the current EWMA estimate. Similar remarks apply to the EWMA covariance, this time regarding EWMA as a simplistic version of bivariate normal GARCH. Similarly, the EWMA volatility (or correlation) forecast for all risk horizons is simply set at the current EWMA estimate of volatility (or correlation). The base horizon for the forecast is given by the frequency of the data: daily returns will give the one-day covariance matrix forecast, weekly returns will give the one-week covariance matrix forecast, and so forth. Then, since the returns are assumed to be independent and identically distributed, the square root of time rule applies. So we can convert a one-day forecast into an h-day covariance matrix forecast by multiplying each element of the one-day EWMA covariance matrix by h (so that each volatility is multiplied by $\sqrt{h}$ and correlations are unchanged).
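The scaling rule can be written out directly. In this Python sketch the function name `scale_covariance`, the 2-by-2 one-day covariance matrix, and the 10-day horizon are hypothetical values chosen for illustration.

```python
import math

def scale_covariance(one_day_cov, h):
    """Multiply every element of a one-day covariance matrix by the horizon h."""
    return [[h * element for element in row] for row in one_day_cov]

def correlation(cov):
    """Correlation implied by a 2-by-2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

one_day_cov = [[0.00010, 0.00003],
               [0.00003, 0.00004]]   # hypothetical one-day EWMA estimates
h = 10
ten_day_cov = scale_covariance(one_day_cov, h)

one_day_vol = math.sqrt(one_day_cov[0][0])
ten_day_vol = math.sqrt(ten_day_cov[0][0])
print(ten_day_vol / one_day_vol)                       # ratio of volatilities
print(correlation(one_day_cov), correlation(ten_day_cov))
```

Variances and covariances scale with h, so volatilities scale with the square root of h while the correlation forecast is the same at every horizon.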

Since the choice of $\lambda$ itself is quite ad hoc, as discussed above, some users choose different values of $\lambda$ for forecasting over different horizons. For instance, as discussed later in this entry, in the RiskMetrics™ methodology a relatively low value of $\lambda$ is used for short-term forecasts and a higher value of $\lambda$ is used for long-term forecasts. However, this is purely an ad hoc rule.


Standard Errors for EWMA Forecasts 

In the previous section, we justified the assumption that the underlying returns are normally and independently distributed with mean zero and variance $\sigma^2$. That is, for all t,

$$E(r_t) = 0 \quad \text{and} \quad V(r_t) = E(r_t^2) = \sigma^2$$

In this section, we use this assumption to obtain standard errors for EWMA forecasts. From the above, and further from the normality assumption, we have:

$$V(r_t^2) = E(r_t^4) - E(r_t^2)^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4$$

Now we can apply the variance operator to (21) and calculate the variance of the EWMA variance estimator as:

$$V(\hat{\sigma}_t^2) = \frac{(1 - \lambda)^2}{1 - \lambda^2}\, V(r_t^2) = 2\, \frac{1 - \lambda}{1 + \lambda}\, \sigma^4 \qquad (26)$$


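For instance, as a proportion of $\sigma^4$, the variance of the EWMA variance estimator given by (26) is about 5% when $\lambda = 0.95$, 10.5% when $\lambda = 0.9$, and 16.2% when $\lambda = 0.85$. Taking square roots, the standard error of the estimator is about 23%, 32%, and 40% of the variance, respectively.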
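Formula (26) is simple enough to evaluate directly. The short Python sketch below (the function name is a hypothetical choice) prints, for each smoothing constant, the variance of the estimator as a percentage of $\sigma^4$ and the square root of that ratio, which is the standard error as a percentage of $\sigma^2$.

```python
import math

def estimator_variance_ratio(lam):
    """V(sigma_hat^2) / sigma^4 implied by (26): 2 * (1 - lam) / (1 + lam)."""
    return 2.0 * (1.0 - lam) / (1.0 + lam)

table = {}
for lam in (0.95, 0.90, 0.85):
    ratio = estimator_variance_ratio(lam)
    table[lam] = (round(100 * ratio, 1), round(100 * math.sqrt(ratio), 1))
    print(lam, table[lam])
# prints:
# 0.95 (5.1, 22.6)
# 0.9 (10.5, 32.4)
# 0.85 (16.2, 40.3)
```

The first number in each pair is the variance ratio and the second is its square root, so the uncertainty in an EWMA variance estimate grows quickly as the smoothing constant is lowered.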

A single point forecast of volatility can be very misleading. A forecast is always a distribution. It represents our uncertainty over the quantity that is being forecast. The standard error of a volatility forecast is useful because it can be translated into a standard error for a VaR estimate, for instance, or an option price. In any VaR model one should be aware of the uncertainty that is introduced by possible errors in the forecast of the covariance matrix. Similarly, in any mark-to-model value of an option, one should be aware of the uncertainty that is introduced by possible errors in the volatility forecast.


The RiskMetrics™ Methodology 

Three very large covariance matrices, each based on a different moving average methodology, are available from www.riskmetrics.com. These matrices cover all types of assets including government bonds, money markets, swaps, foreign exchange, and equity indexes for 31 currencies and commodities. Subscribers have access to all of these matrices updated on a daily basis, and end-of-year matrices are also available to subscribers wishing to use them in scenario analysis. After a few days, the datasets are also made available free for educational use.

The RiskMetrics™ group is the market leader in market and credit risk data and modeling for banks, corporate asset managers, and financial intermediaries. It is highly recommended that readers visit the Web site (www.riskmetrics.com), where they will find a surprisingly large amount of information in the form of free publications and data. See the References at the end of this entry for details.


The three covariance matrices provided by the RiskMetrics group are each based on a history of daily returns in all the asset classes mentioned above. They are:

1. Regulatory matrix: This takes its name from the (unfortunate) requirement that banks must use at least 250 days of historical data for VaR estimation. Hence this matrix is an equally weighted average matrix with n = 250. The volatilities and correlations constructed from this matrix represent forecasts of average volatility (or correlation) over the next 250 days.

2. Daily matrix: This is an EWMA covariance matrix with $\lambda = 0.94$ for all elements. It is not dissimilar to an equally weighted average with n = 25, except that it does not suffer from the ghost features caused by very extreme market events. The volatilities and correlations constructed from this matrix represent forecasts of average volatility (or correlation) over the next day.

3. Monthly matrix: This is an EWMA covariance matrix with $\lambda = 0.97$ for all elements and then multiplied by 25 (that is, using the square root of time rule and assuming 25 days per month). The volatilities and correlations constructed from this matrix represent forecasts of average volatility (or correlation) over the next 25 days.

The main difference between the three different methods is evidenced following major market movements: The regulatory forecast will produce a ghost effect of this event, and does not react as much as the daily or monthly forecasts. The most reactive is the daily forecast, but it also has less persistence than the monthly forecast.
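The different reactions of the three constructions can be illustrated on a single return series. The Python sketch below is not the RiskMetrics implementation; it only applies the parameters quoted above (n = 250, smoothing constants 0.94 and 0.97, and a factor of 25 for the monthly matrix) to hypothetical data with a large move on the latest day, and the function names and the seeding of the recursion are assumptions.

```python
def equally_weighted_variance(returns, n=250):
    """Average of the last n squared returns."""
    window = returns[-n:]
    return sum(r * r for r in window) / len(window)

def ewma_variance(returns, lam):
    """EWMA variance after the latest return, seeded with the first squared return."""
    sigma2 = returns[0] ** 2
    for r in returns[1:]:
        sigma2 = (1.0 - lam) * r * r + lam * sigma2
    return sigma2

# Hypothetical history: 1% daily moves, then a -5% return on the latest day.
returns = [0.01 if i % 2 == 0 else -0.01 for i in range(500)]
returns[-1] = -0.05

regulatory = equally_weighted_variance(returns, 250)   # one-day variance, n = 250
daily = ewma_variance(returns, 0.94)                   # one-day variance
monthly = 25 * ewma_variance(returns, 0.97)            # 25-day variance

print(regulatory, daily, monthly / 25)
```

On a per-day basis the daily estimate moves most after the shock, the monthly estimate moves less, and the 250-day equally weighted estimate moves least, in line with the ordering described in the text.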

Figure 7 compares the estimates for the FTSE 100 volatility based on each of the three RiskMetrics methodologies and using daily data from January 2, 1995, to June 23, 2006. As mentioned earlier in this entry, these estimates are assumed to be the forecasts over, respectively, one day, one month, and one year. In volatile times, the daily and monthly estimates lie well above the regulatory forecast and the converse is true in more tranquil periods. For instance, during most of 2003, the regulatory estimate of average volatility over the next year was about 10% higher than both of the shorter-term estimates. However, it was falling dramatically during this period, and indeed the regulatory forecast of more than 20% volatility on average between June 2003 and June 2004 was entirely wrong. At the end of the period, in June 2006, the position was reversed: the daily forecasts were above 20% and the monthly forecasts were only just below this, whereas the regulatory forecast over the next year was only slightly more than 10%.

Figure 7 Comparison of the RiskMetrics "Forecasts" for FTSE 100 Volatility

During periods when the markets have been tranquil for some time, for instance during the whole of 2005, the three forecasts tend to agree more. But during and directly after a volatile period there are large differences between the regulatory forecasts and the two EWMA forecasts, and these differences are very difficult to justify. Neither the equally weighted average nor the EWMA methodology is based on a proper forecasting model. One simply assumes the current estimate is the volatility forecast. But the current estimate is a backward-looking measure based on recent historical data. So both of these moving average models make the assumption that the behavior of future volatility is the same as its past behavior, and this is a very simplistic view.


KEY POINTS 

• The equally weighted moving average, or historical approach to estimating/forecasting volatilities and correlations, was the only statistical method used by practitioners until the mid-1990s.

• The historical method may provide a useful indication of the possible range for a long-term average, such as the average volatility or correlation over the next several years. However, its application to short-term forecasting suffers from at least four drawbacks: (1) The forecast of volatility/correlation over all future horizons is simply taken to be the current estimate of volatility, because the underlying assumption in the model is that returns are independent and identically distributed; (2) the only choice facing the user is on the data points to use in the data window; (3) following an extreme market move the forecasts of volatility and correlation will exhibit a so-called "ghost" feature of that extreme move, which will severely bias the volatility and correlation forecasts upward; and (4) the extent of this bias depends very much on the size of the data window.

• The bias issue associated with the historical approach was addressed by the RiskMetrics™ data and software suite. The choice of methodology helped to popularize the use of exponentially weighted moving averages (EWMA) by financial analysts.

• The EWMA approach provides useful forecasts for volatility and correlation over the very short term, such as over the next day or week. However, its use for longer-term forecasting is limited, and this methodology has two major problems: (1) The forecast of volatility/correlation over all future horizons is simply taken to be the current estimate of volatility, because the underlying assumption in the model is that returns are independent and identically distributed, and (2) the only choice facing the user is about the value of the smoothing constant. With the EWMA approach, the forecasts produced depend crucially on this decision, yet there is no statistical procedure for choosing the value of the smoothing constant.

• Moving average models assume returns are independent and identically distributed, and the further assumption that they are normally distributed allows one to derive standard errors and confidence intervals for moving average forecasts. But empirical observations suggest that returns to financial assets are hardly ever independent and identically, let alone normally, distributed. For these reasons more and more practitioners are basing their forecasts on generalized autoregressive conditional heteroskedasticity (GARCH) models.

• There is no doubt that GARCH models produce superior volatility forecasts. It is only in GARCH models that the volatility term structure forecasts converge to the long-run average volatility; the other models produce constant volatility term structures. Moreover, the value of the EWMA smoothing constant is chosen subjectively and the same smoothing constant must be used for all the returns, otherwise the covariance matrix need not be positive semidefinite. But GARCH parameters are estimated optimally and GARCH covariance matrices truly reflect the time-varying volatilities and correlations of the multivariate returns distributions.


Abstract: MATLAB is a modeling environment that allows for input and output processing, statistical analysis, simulation, and other types of model building for the purpose of analysis of a situation. MATLAB uses a number-array-oriented programming language; that is, a programming language in which vectors and matrices are the basic data structures. Reliable built-in functions, a wide range of specialized toolboxes, easy interface with widespread software like Microsoft Excel, and beautiful graphing capabilities for data visualization make implementation with MATLAB efficient and useful for the financial modeler.


MATLAB is an interactive computing environment for model development that also enables data visualization, data analysis, and numerical simulation. The core of the MATLAB environment was created as a number-array-oriented programming language; that is, as a programming language in which vectors and matrices are the basic data structures. (MATLAB stands for Matrix Laboratory.) Operations involving matrices and vectors can be performed efficiently within the core MATLAB software product. More specialized operations, such as statistical data analysis, optimization, and simulation, can be accessed by purchasing some of MATLAB's specialized toolboxes. Once a toolbox is installed, functions from the toolbox can be called in the same way as standard MATLAB functions, without any special additional syntax. MATLAB toolboxes that are useful for quantitative analysis in financial applications include:


• Statistics Toolbox

• Optimization Toolbox

• Global Optimization Toolbox

• Curve Fitting Toolbox

• Neural Network Toolbox

• Partial Differential Equation Toolbox

For example, the Statistics Toolbox contains data analysis tools (for multivariate analysis, statistical tests, statistical plots), random number generation tools, and quasi-random number generation tools, which are useful for implementing risk management and derivative pricing routines. The Optimization Toolbox contains solvers for linear, quadratic, nonlinear, and binary optimization, which can aid quantitative portfolio allocation schemes. The Global Optimization Toolbox contains randomized search optimization subroutines that can be used for solving complex (e.g., mixed-integer) optimization problems to near optimality. It is useful, for example, for creating more complex portfolio allocation or trading routines. For more details and information about the other toolboxes, see the Mathworks website, http://www.mathworks.com.

MATLAB also has toolboxes that are specifically targeted at financial applications. These toolboxes include:

• Financial Toolbox

• Econometrics Toolbox

• Datafeed Toolbox

• Fixed-Income Toolbox

• Financial Derivatives Toolbox

For example, the Financial Toolbox contains specialized routines for computing frequently used financial quantities, such as present and future value, basic portfolio optimization, term structure of interest rates, bond prices, and derivative prices. It also contains functions that help with the manipulation of typical financial data sets, such as multivariate regression with missing data. Many of these routines can be implemented by using standard MATLAB functions, but the Financial Toolbox puts them together in a convenient package.

It is worth noting that most of the financial toolboxes require installation of one or more of the mathematics toolboxes listed earlier. For example, the Financial Toolbox requires the Statistics Toolbox and the Optimization Toolbox. The Financial Derivatives Toolbox requires the Statistics, Optimization, and Financial Toolboxes.

Another tool of interest to those who use Windows and Microsoft Excel extensively as the platform for their applications is Spreadsheet Link EX. Spreadsheet Link EX enables the manipulation of Microsoft Excel worksheets from within MATLAB and the use of MATLAB functions from within Excel. This is a useful toolbox that allows powerful MATLAB capabilities to be accessed through a familiar interface.

This entry provides brief pointers to important aspects of modeling in MATLAB. We discuss basic array construction and operations, functions and scripts, as well as graphs. We also provide examples of MATLAB code for portfolio optimization schemes and for pricing a European call option by simulation.

When readers try to implement such routines themselves, they may find it useful to know that the MATLAB manual and online help contain abundant information and examples. Detailed documentation is also provided in MATLAB itself. For example, typing help at the prompt in MATLAB lists all major topics. Type help followed by the name of a function, at the prompt or in the box in the Help dialog box, to access the documentation on that function in MATLAB. If unsure of which help topic is relevant, click on the question-mark button in MATLAB's top menu, which provides richer search options.


THE MATLAB DESKTOP 
AND EDITOR 

The standard MATLAB desktop window contains a Workspace window, a Command History window, and a Command window (see Figure 1). Depending on how you customize the MATLAB desktop window, however, you may see more or fewer windows. To check which windows are currently displayed and view other options, click on Desktop in the top MATLAB desktop window menu.

MATLAB commands are entered in the Command window. When a series of commands needs to be given, it is more convenient to list them in an M-file, which is basically a file with instructions that MATLAB executes sequentially. Such files (scripts) are saved with the suffix ".m" and can be called from the prompt in the Command window by typing their name (without the suffix ".m"). For example, if you create a file OptimizePortfolio.m with instructions on how to perform optimal portfolio allocation, you can call that file from the MATLAB command prompt by typing

>> OptimizePortfolio 



Introduction to Financial Model Building with MATLAB 




Figure 1 The Standard MATLAB Desktop (MATLAB 7.12.0, R2011a)


(If the file is saved in a directory other than 
the default MATLAB directory, you will need 
to make sure that MATLAB can find the file. 
Select Desktop > Current Directory from the 
top menu and navigate to the correct directory 
before typing the command at the prompt.) 

To create an M-file, you can use any text editing program, such as WordPad, NotePad, or the open source editor Emacs. In general, it is convenient to use an editor that recognizes the MATLAB file type and provides helpful highlighting for parts of the code that have different characteristics. (For example, comments in the code appear in different colors than commands.) MATLAB's own editor can do that, and Emacs can be set up to recognize the MATLAB file format as well.

To call MATLAB's editor in order to create or edit M-files, select Desktop > Editor from the top menu. Alternatively, you can use the shortcut buttons at the top of the MATLAB desktop window: one button opens the MATLAB editor to write a new file, and another opens a file that has already been created.


BASIC OPERATIONS AND MATRIX ARRAY CONSTRUCTION

Basic Mathematical Operations

MATLAB can perform many different kinds of mathematical operations,
such as addition (+), multiplication (* or .*), square root (sqrt or
sqrtm), and power (^). These commands can be entered at the command
prompt. For example, typing

>> 3*sqrt(4) + 15 

and pressing Enter produces the output 

ans = 

21 

To suppress output, use the semicolon (;). For example, entering

>> 3*sqrt(4) + 15; 

does not result in any visible output in the command window. However,
MATLAB still performs the calculation. To see this, let us assign the
value of the above expression to a variable, ExpressionValue:

>> ExpressionValue = 3*sqrt(4) + 15; 

Then, typing ExpressionValue at the command prompt, you get

>> ExpressionValue 
ExpressionValue = 

21 

Constructing Vectors and Matrices 

As mentioned earlier, MATLAB's core data structures are vectors and
matrices. For example, the command

>> x = [2 3 4 6]

produces a horizontal vector array (one row) x that contains the
numbers 2, 3, 4, and 6.

The semicolon (;) is used to create new rows. 
To create a vertical vector array y with the same 
entries, you can enter 

>> y = [2; 3; 4; 6] 

or press Enter after entering each number. (MATLAB treats semicolons
and carriage returns in array declarations as the start of a new
row.) Which syntax is more convenient depends on the format of the
source from which the data that populate the arrays are obtained.

Matrices are declared similarly. For example, a 2-by-4 matrix X can
be specified as

>> X = [1 2 3 4; 5 6 7 8]

X =

     1     2     3     4
     5     6     7     8


MATLAB is case-sensitive; that is, it will treat 
the matrix X and the vector x defined earlier as 
separate variables. 

Special commands exist for declaring types of 
matrices that are used often. For example, 

>> I = eye(3,3)

I =

     1     0     0
     0     1     0
     0     0     1

produces a 3 x 3 identity matrix.

Similarly, the commands ones(n,m) and zeros(n,m) can be used to
declare matrices of the desired dimension (n x m) that contain only
1s or only 0s, respectively, and diag(x) can be used to create a
matrix that has a vector x as its diagonal elements, and 0s
everywhere else.
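
As a minimal sketch of these constructors (the variable names A, B, and D are illustrative and do not appear elsewhere in this entry), the following lines build one matrix of each kind from the vector x defined earlier:

```matlab
% Illustrative sketch: special-matrix constructors.
x = [2 3 4 6];     % the row vector used earlier in this entry
A = ones(2,3);     % 2-by-3 matrix in which every entry is 1
B = zeros(2,3);    % 2-by-3 matrix in which every entry is 0
D = diag(x);       % 4-by-4 matrix with x on the diagonal and 0s elsewhere
```

Omitting the semicolons displays each matrix, which is a quick way to confirm the dimensions.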

You can also "stack" matrices and vectors. For 
example, 

>> Y = [x; X]

Y =

     2     3     4     6
     1     2     3     4
     5     6     7     8

Basic Array Operations 

To transpose an array A, use the command 
transpose (A) or A' . This operation converts 
a horizontal vector into a vertical one and vice 
versa, and flips the elements of a matrix that 
contains real numbers in its entries around 
the diagonal, keeping the diagonal entries the 
same. 

For example, 

>> X' 
ans = 

1 5 

2 6 

3 7 

4 8 

To multiply two arrays, you can simply use the multiplication command
*. Since the operation * performs a matrix multiplication, you need
to make sure that the matrix dimensions agree. For example, an error
results in the case when the 1x4 array x is multiplied by the 2x4
array X:

>> x*X 

??? Error using ==> mtimes 

Inner matrix dimensions must agree. 

To multiply x and X correctly, you can instead 
type 

>> x*X' 
ans = 

44 104 

If you need to perform an element-by-element multiplication of two
arrays (of equal sizes), use the .* operator. For example,

>> X.*X
ans =

     1     4     9    16
    25    36    49    64

Note that this is different from the matrix product. The matrix
product would produce the following result:

>> X'*X
ans =

    26    32    38    44
    32    40    48    56
    38    48    58    68
    44    56    68    80
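
The difference between the two operators matters in practice. The sketch below uses hypothetical portfolio weights and asset returns (neither is taken from this entry): .* yields each asset's contribution to the portfolio return, while the matrix product of the weight row and the transposed return row collapses them into the total portfolio return.

```matlab
% Illustrative sketch with made-up numbers.
w = [0.5 0.3 0.2];       % hypothetical portfolio weights (sum to 1)
r = [0.10 0.05 -0.02];   % hypothetical asset returns

contrib = w.*r;          % element-by-element: contribution of each asset
total   = w*r';          % (1x3)*(3x1) matrix product: a single number
```

Here sum(contrib) and total are the same quantity computed in two ways.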


When a matrix array is multiplied by a number, all of the array's
entries are multiplied by that number. Similarly, if a number is
added to a matrix array, the number will be added to all of the
elements of the matrix. For example,

>> 10+X 
ans = 

11 12 13 14 

15 16 17 18 

Extracting Information from Arrays 

Suppose you have a matrix array Data with financial data on annual
stock returns over 10 years for 1,000 companies traded on the New
York Stock Exchange, and you would like to check the entry for the
return on stock 253 in year 7. You are dealing with a 10x1000 matrix
array in which each row is a time period and each column contains the
returns on a particular stock. You are looking for the element in row
7, column 253 of this array. This can be requested with the command
Data(7,253).

Suppose now that you would like to extract information on all of
stock 253's returns over the 10 years. This means that you are
looking for the elements of column 253 of the matrix array. This can
be requested with the command Data(:,253). The colon operator
replaces the row index to specify that elements with all row indexes
in the 253rd column should be produced. Similarly, if you would like
to request all elements in the same row (e.g., the returns on all
stocks in year 7), you can use the colon operator again: Data(7,:).

To illustrate the output, let us use the matrix 
array X. To find out what the value of the ele¬ 
ment in row 1, column 3 is, enter 

>> X(1,3) 
ans = 

3 

The third column of X is 

>> X(:,3) 
ans = 

3 

7 

Similarly, the second row of X can be obtained 
as 

>> X(2,:) 
ans = 

5 6 7 8 
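
The colon operator can also select a range of rows or columns rather than all of them. The following sketch uses a small hypothetical array (the name Data is reused only for illustration, and its entries are simply the integers 1 to 20) to show range indexing and the keyword end, which stands for the last index along a dimension:

```matlab
% Illustrative sketch: range indexing on a small made-up array.
Data = reshape(1:20, 4, 5);    % 4-by-5 array filled column by column

block        = Data(2:3, 4);   % rows 2 through 3 of column 4
lastRow      = Data(end, :);   % the last row, all columns
firstTwoCols = Data(:, 1:2);   % all rows of columns 1 and 2
```

With a 10x1000 returns array, the same syntax Data(7:10,253) would extract the returns on stock 253 in years 7 through 10.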

IMPORTANT MATLAB FUNCTIONS

MATLAB supports a number of built-in functions. A function is written
as a command and takes arguments as inputs in parentheses. It
processes the inputs by using operations hidden from the user and
passes the final results back to the user. While we cannot cover many
of the MATLAB functions in this brief introduction, we illustrate how
functions work with an example of the function find, which can be
useful in many situations.

The function find takes a condition on an array (for example, x<5) as
its argument and returns the indexes of the elements of the array
that satisfy the condition. In addition to traditional applications,
find can be very helpful when dealing with missing data, which
happens often with financial time series.
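
As a hedged illustration of the missing-data use, the sketch below assumes that missing observations are stored as NaN, which is a common convention but is not stated in this entry; the series r is made up. The built-in function isnan marks the missing entries, and find converts those marks into indexes.

```matlab
% Illustrative sketch: locating and dropping missing values stored as NaN.
r = [0.02 NaN -0.01 0.03 NaN];   % hypothetical return series

missingIdx = find(isnan(r));     % positions of the missing observations
rClean     = r(~isnan(r));       % the series with missing entries removed
```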

Suppose you want to find the indexes of the 
elements that are less than 5 of the 1x4 array x 
from the previous section. At the prompt, type 

>> find(x<5) 

The result is 

ans = 

     1     2     3

Now let us see how find works when the 
array is a matrix rather than a vector. Recall that 
Y was the matrix array obtained by stacking x 
and X. Suppose you want to find the indexes of 
the elements in the array that are less than 5. At 
the prompt, type 

>> ind = find(Y<5) 

MATLAB creates the following array: 
ind = 

1 

2 

4 

5 

7 

8 

11 

MATLAB treated the matrix array as a stacked-up collection of column
vectors. The elements of the array ind correspond to the indexes of
the elements in that long column vector. Obtaining the actual
elements of Y that correspond to these indexes can be accomplished by
typing

>> Y(ind) 

This produces the answer 
ans = 

2 

1 

3 
2 

4 

3 

4 

The indexing of an array as a sequence of 
stacked columns works well if the array is a 
vector, but it can get confusing if the array is a 
matrix. In the latter case, it is more intuitive to 
obtain the indexes as a row and column index. 
For example, 

>> [indRow,indCol] = find(Y<5) 

produces 

indRow = 

1 

2 

1 

2 

1 

2 

2 

indCol = 

1 

1 

2 

2 

3 

3 

4 

This means that the following elements of Y have values less than 5:
(row 1, column 1), (row 2, column 1), (row 1, column 2), and so on.
Unfortunately, Y(indRow,indCol) does not return the values of these
elements; it returns the submatrix formed by every combination of the
listed rows and columns.
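
One possible way to recover the values from row and column indexes, sketched here with the built-in function sub2ind, is to convert each (row, column) pair back into a position in the stacked column vector. Logical indexing with the condition itself gives the same values without calling find at all. The variable names linIdx, vals1, and vals2 are illustrative.

```matlab
% Illustrative sketch: two ways to look up the elements of Y that are < 5.
Y = [2 3 4 6; 1 2 3 4; 5 6 7 8];          % the stacked matrix from the text

[indRow, indCol] = find(Y < 5);
linIdx = sub2ind(size(Y), indRow, indCol); % (row, column) pairs -> linear indexes
vals1  = Y(linIdx);                        % values at those positions

vals2  = Y(Y < 5);                         % logical indexing, same result
```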

CREATING USER-DEFINED FUNCTIONS

The compactness of the function syntax makes 
functions desirable when a user needs to call 
a certain sequence of commands often. For 
example, the Black-Scholes formula for pricing 
European options takes a number of steps to 
compute. It is convenient to have a function 
that returns one value—the option price—to the 
user after the user inputs values of factors that 
determine that price, such as the strike price, 
the time to maturity, the volatility, and so on. 

Functions need to be written in M-files. Although general script
M-files can contain any sequence of instructions that will be
completed when the name of the file is typed at the MATLAB prompt,
function M-files need to start with a specific first line. That line
contains the word "function" and a declaration of the function name,
inputs, and outputs. The function name and the name of the M-file
should be the same.
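
Before turning to a toolbox function, here is a minimal sketch of a function M-file. The function name portfolioReturn and its arguments are hypothetical; the file would be saved as portfolioReturn.m so that the file name matches the function name.

```matlab
function rp = portfolioReturn(w, r)
% PORTFOLIORETURN Weighted-average return of a portfolio (illustrative sketch).
%   rp = portfolioReturn(w, r) returns the sum of w(i)*r(i), where w holds
%   the portfolio weights and r the asset returns (arrays of equal length).

% The (:) turns each input into a column, so rows and columns both work.
rp = sum(w(:) .* r(:));
end
```

With the file on the MATLAB path, typing portfolioReturn([0.5 0.3 0.2],[0.10 0.05 -0.02]) at the prompt returns 0.061 for these made-up inputs.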

The Black-Scholes formula already exists in the Financial Toolbox, so
it is convenient to look at how the price is computed there and to
discuss important aspects of writing user-defined functions. (We have
skipped some lines in the code for the sake of brevity.) Users can
view the source code for some of the advanced MATLAB functions in the
toolboxes by entering type followed by the function name at the
prompt.


>> type blsprice 

function [call,put] = blsprice(S, X, r, T, sig, q) 

% BLSPRICE Black-Scholes put and call option pricing. 

% Compute European put and call option prices using a Black-Scholes model. 

% [Call,Put] = blsprice(Price, Strike, Rate, Time, Volatility) 

% [Call,Put] = blsprice(Price, Strike, Rate, Time, Volatility, Yield)

% Optional Input: Yield 

% Inputs: 

% Price - Current price of the underlying asset. 

% Strike - Strike (i.e., exercise) price of the option. 

% Rate - Annualized continuously compounded risk-free rate of 

% return over the life of the option, expressed as a positive decimal number. 
% Time - Time to expiration of the option, expressed in years. 

% Volatility - Annualized asset price volatility (i.e., annualized 

% standard deviation of the continuously compounded asset return), 

% expressed as a positive decimal number. 

% Optional Input: Yield - Annualized continuously compounded yield of the 
% underlying asset over the life of the option, expressed as a decimal 
% number. If Yield is empty or missing, the default value is zero. For 
% example, this could represent the dividend yield (annual dividend rate 
% expressed as a percentage of the price of the security) or foreign 
% risk-free interest rate for options written on stock indices and 
% currencies, respectively. 

% Outputs:

% Call - Price (i.e., value) of a European call option.

% Put - Price (i.e., value) of a European put option.


% Copyright 1995-2005 The MathWorks, Inc. 

% $Revision: 1.8.2.5 $ $Date: 2005/09/18 16:19:06 $


% Input argument checking & default assignment. 


if nargin < 5 

error('Finance:blsprice:InsufficientInputs', ... 

'Specify Price, Strike, Rate, Time, and Volatility.') 

end 

if (nargin < 6) || isempty(q) 

q = zeros(size(S)); 

end 

message = blscheck('blsprice', S, X, r, T, sig, q); 
error(message); 

% Perform scalar expansion & guarantee conforming arrays. 


try 

[S, X, r, T, sig, q] = finargsz('scalar', S, X, r, T, sig, q);
catch 

error('Finance:blsprice:InconsistentDimensions' , ... 

'Inputs must be scalars or conforming matrices.') 

end 


% Record array dimensions for output argument formatting. 


[nRows, nCols] = size(S); 

call = nan(nRows * nCols, 1); 
put = nan(nRows * nCols, 1); 

% Convert to column vectors for intermediate processing.

[S, X, r, T, sig, q] = deal(S(:), X(:), r(:), T(:), sig(:), q(:));


% Enforce some boundary conditions that produce warnings (e.g., logarithm 
% of zero and divide by zero) and potential NaN's in the output option 
% price arrays: 

% (1) At expiration (i.e., T = 0), the price of all options is simply the 

% greater of their intrinsic value and zero. 

% (2) When the price of the underlying asset is zero (i.e., S = 0), the value 

% of a call option is zero and the value of a put option is equal to its 

% present value of the strike price (X). This boundary condition enforces 

% the "absorbing barrier" property associated with the geometric Brownian 

% motion diffusion process governing the price path of the underlying 

% asset (S) . 


% (3) When the strike price is zero (i.e., X = 0), the value of a put option
% is zero and the value of a call option is equal to the price of the
% underlyer (S).


isTimeZero = (T == 0);
call(isTimeZero) = max(S(isTimeZero) - X(isTimeZero), 0); % Expired options.
put (isTimeZero) = max(X(isTimeZero) - S(isTimeZero), 0);

isStockZero = (S == 0);
call(isStockZero) = 0; % Worthless calls.
if any(isStockZero)
    put(isStockZero) = X(isStockZero) .* exp(-r(isStockZero).*T(isStockZero));
end

isStrikeZero = (X == 0);
call(isStrikeZero) = S(isStrikeZero);
put (isStrikeZero) = 0; % Worthless puts.

% Suppress a divide by zero warning ONLY for zero volatility conditions. Other 
% warnings could be valuable. 


state = warning; % Store the current state, 
if any(sig == 0) 

warning('off', 'MATLAB:divideByZero') 


end 

% Now apply the general Black-Scholes European option pricing formulae,
% excluding the boundary cases handled above, and explicitly handling
% calculations that produce 0/0 = NaN's for the parameters of the
% cumulative normal distribution function (i.e., d1 & d2).

% NaN's occur when S = X, r = q, and Sigma = 0. This situation corresponds to
% at-the-money options written on riskless underlying assets. Such assets
% should earn the risk-free rate less the dividend yield. But when r = q, the
% net growth rate is also zero, resulting in 0/0 = NaN.

i = ~(isTimeZero | isStockZero | isStrikeZero);

d1 = log(S(i)./X(i)) + (r(i) - q(i) + sig(i).^2/2) .* T(i);
d1 = d1 ./ (sig(i).*sqrt(T(i)));
d2 = d1 - (sig(i).*sqrt(T(i)));

d1(isnan(d1)) = 0;
d2(isnan(d2)) = 0;

call(i) = S(i) .* exp(-q(i).*T(i)) .* normcdf( d1) - ...
          X(i) .* exp(-r(i).*T(i)) .* normcdf( d2);
put (i) = X(i) .* exp(-r(i).*T(i)) .* normcdf(-d2) - ...
          S(i) .* exp(-q(i).*T(i)) .* normcdf(-d1);

warning(state) % Restore the state. 


% Reshape the outputs for the user. 

call = reshape(call, nRows, nCols); 
put = reshape(put , nRows, nCols); 

% [EOF] 

Some aspects of this function are very complicated for a beginner,
but a review of the function syntax helps create a list of useful
pointers to which you can refer when creating your own functions:

• The first line contains the word function followed by a
specification of the outputs of the function call (in this case,
[call,put]). Note that a function can have more than one output.
After calling the function, MATLAB computes the values for the
outputs, and the variable call will contain the price of a European
call option, while the variable put will contain the price of a
European put option. Next, we have an equal sign followed by the name
of the function (blsprice) and the arguments for the function (S for
current stock price, X for strike price, r for rate of return, T for
time to maturity, sig for volatility, as well as the optional
argument q for continuous dividend yield).

When the function is called with specific in¬ 
put values, you can assign the output to vari¬ 
ables. For example, 

>> [callOutput,putOutput] = blsprice(110,100,0.10,2,0.40)
callOutput =

   38.1757
putOutput =

   10.0488

• The names of the input variables need to participate in
calculations in the function. For example, S appears as the current
stock price in the first line (function [call,put] =
blsprice(S, X, r, T, sig, q)), and this is the same variable that is
used to store the value of the stock price in the computations.
Similarly, the names of the output variables (call and put) should
appear somewhere in the text of the function and be assigned an
expression, which can then be returned to the user.

• Note the abundance of percent signs (%) in the function code. This
sign is used for writing comments that are ignored by MATLAB when
executing the code. It is always a good idea to comment abundantly in
order to be able to retrace your reasoning later. The first comment
line is called "the H1 line," and it is the line that is searched by
the MATLAB built-in function lookfor. Lookfor searches the first
comment line of all MATLAB files for a keyword. (This is useful if
you are not sure which function to use for a specific purpose, and
you would like to find the names of all functions that may be
relevant.) Therefore, it is important to provide a meaningful
description of your function in the first commented line. After the
first line, you can continue with a more detailed description of the
function and list references.
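
As a quick sanity check on the call and put prices returned in the blsprice example above, the two values should satisfy put-call parity for a stock with no dividend yield: call minus put equals S minus the present value of X. The sketch below redoes this arithmetic with the rounded prices printed in the text; the variable names lhs, rhs, and parityGap are illustrative, and only base MATLAB is used.

```matlab
% Illustrative sketch: put-call parity check, call - put = S - X*exp(-r*T).
S = 110; X = 100; r = 0.10; T = 2;   % inputs used in the blsprice example
callOutput = 38.1757;                % prices as printed in the text
putOutput  = 10.0488;

lhs = callOutput - putOutput;        % difference of the two option prices
rhs = S - X*exp(-r*T);               % stock price less discounted strike
parityGap = abs(lhs - rhs);          % small; only rounding of the printed prices
```

Both sides come to roughly 28.127, so the two outputs are consistent with each other.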


CONTROL FLOW STATEMENTS

M-files, whether of a generic or function kind, can contain more
advanced operations than matrix manipulation. Next, we briefly review
a couple of control flow statements that are often used in such
files: the for loop and the if statement.

The general format of a for loop is 

for n = array 

commands 

end 

The commands inside the for loop are executed once for every column
of the array. (Typically, the array is a row vector of numbers, so
the loop is executed once for every number.) For example,

for n = 1:5
    v(n) = sqrt(n);
end

results in

v =

    1.0000    1.4142    1.7321    2.0000    2.2361

The array 1:5 is equivalent to [1 2 3 4 5]. 
MATLAB starts out with n = 1, computes its 
square root, and assigns it to v (1). Then, it 
keeps repeating the process until it has com¬ 
puted v (5) for n = 5. 

Loops in MATLAB are often necessary, but 
as a general rule MATLAB is more efficient in 
array operations than in loops. For example, 
the same effect (adding 10 to each element of 
the vector x) can be achieved in two ways: 

for n = 1:4 

x(n) = x(n) + 10; 

end 

and 

>> x = x+10 

Both of them result in 

x =

    12    13    14    16

The second command would typically be completed faster. Loops are not
as inefficient as they used to be in older versions of MATLAB,
however; the difference in speed between the two approaches has been
greatly reduced in the latest versions of the software.
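
To see how large the difference is on a particular machine, the two approaches can be timed with the built-in stopwatch functions tic and toc. This is only a sketch: the array length n is arbitrary, the variable names are illustrative, and the measured times depend on the MATLAB version and hardware, so no particular result is claimed.

```matlab
% Illustrative sketch: timing a loop against the equivalent array operation.
n  = 1e6;
x  = rand(1, n);         % made-up data
y1 = zeros(1, n);        % preallocate the result of the loop

tic
for k = 1:n
    y1(k) = x(k) + 10;   % one element at a time
end
tLoop = toc;             % elapsed seconds for the loop

tic
y2 = x + 10;             % whole array at once
tVec = toc;              % elapsed seconds for the array operation
```

Both versions produce the same array; only tLoop and tVec differ.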

The if statement has the following general 
format: 

if expression 

commands 

end 

The commands are completed only if all elements in the expression are
true. A somewhat more complex if statement is

if expression1
    commands1
elseif expression2
    commands2
else
    commands3
end

Commands1 are completed if expression1 is true. If expression1 is not
true, MATLAB moves on and checks whether expression2 is true. If
expression2 is true, commands2 are completed. If expression2 is not
true either, MATLAB completes commands3; the else branch takes no
expression of its own. The elseif and else branches are optional in
if statements.

There are several other useful control flow 
statements, such as the while loop, switch- 
case constructions, and try-catch blocks. 
See the MATLAB manual and help for more 
detail. 
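
As a brief, hedged illustration of two of these statements, the sketch below uses a while loop to count how many years a deposit needs to double at a hypothetical 7% annual rate, and a switch-case construction to pick a payoff formula from an option type. All numbers and variable names are made up for the example.

```matlab
% Illustrative sketch: while loop (repeat until a condition fails).
balance = 1; years = 0; rate = 0.07;   % hypothetical deposit and rate
while balance < 2
    balance = balance*(1 + rate);      % one more year of compounding
    years = years + 1;
end
% years is now 11, since 1.07^10 < 2 <= 1.07^11

% Illustrative sketch: switch-case (choose a branch by value).
optionType = 'put'; S = 110; X = 100;  % hypothetical inputs
switch optionType
    case 'call'
        payoff = max(S - X, 0);
    case 'put'
        payoff = max(X - S, 0);
    otherwise
        error('Unknown option type.');
end
```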


GRAPHS 

MATLAB is well known for its beautiful graphing capabilities. The
most common function for plotting two-dimensional (2-D) graphs is
plot.

To illustrate how plot works, suppose we would like to plot the
standard normal probability distribution. We will use the function
normpdf (available from the Statistics Toolbox), which computes the
probability density function (PDF) of a normal random variable.

The command 

>> x = linspace(-6,6,100) 

creates a vector x with 100 values, equally spaced between the
minimum value -6 and the maximum value +6. (In reality, the normal
distribution stretches from negative infinity to positive infinity,
but it is highly unlikely that we will obtain realizations that are
greater than 6 standard deviations away from the mean of 0, so we
focus on plotting the center of the distribution.)

The command 

>> y = normpdf(x) 

computes the values of the normal probability density function for
every value in the array x.
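
If the Statistics Toolbox is not available, the standard normal PDF can be evaluated directly from its formula, exp(-x^2/2)/sqrt(2*pi), using element-by-element operators. The sketch below is an alternative to normpdf rather than the approach taken in this entry; it also uses the built-in function trapz to confirm numerically that the area under the curve between -6 and 6 is essentially 1.

```matlab
% Illustrative sketch: standard normal PDF without the Statistics Toolbox.
x = linspace(-6, 6, 100);
y = exp(-x.^2/2) / sqrt(2*pi);   % density evaluated at every element of x

area = trapz(x, y);              % trapezoidal approximation of the integral
```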

To plot x versus y, use 
>> plot(x,y) 

The result is the graph in Figure 2. 

You can play with the options for the graph. 
For example, 

>> plot(x,y,'r:p'); title('Normal PDF'); 
xlabel('x'); ylabel('pdf') 

plots the same graph as a red dotted line with 
a pentagram symbol, labels the horizontal (x) 
and the vertical (y) axes, and creates a title for 
the graph (see Figure 3). 

To plot multiple graphs on the same picture, 
use the command hold on before you start 
and hold off when you are done with the 
instructions. For example, suppose we would 
like to plot the standard normal distribution 
and a standard t-distribution with 5 degrees of 
freedom on the same graph in order to compare 
them. The following sequence of commands ac¬ 
complishes this. 

Figure 2 A Plot of the PDF of the Normal Distribution

Figure 3 A Plot of the PDF of the Normal Distribution (with Modified Options)

Figure 4 Illustration of hold on / hold off Effect


First, we evaluate the PDF of a t-distribution with 5 degrees of
freedom at the points in x:

>> t = tpdf(x,5);

Then, we plot the graph:

>> hold on
>> plot(x,y,'r:p'); xlabel('x'); ylabel('pdf')
>> plot(x,t);
>> title('Normal Versus T Distribution');
>> hold off

The results are displayed in Figure 4. 
Alternatively, you can list several pairs of 
variables inside the plot function. For exam¬ 
ple, 

>> plot(x,y,'r:p',x,t); xlabel('x'); ylabel('pdf')
>> legend('Normal PDF','T PDF')
>> title('Normal Versus T Distribution with 5 DoF');

This script also creates a legend (Figure 5). 


Legends, titles, and other graph attributes can also be added and
modified after the basic plot command has been given and a graph
window has popped up. To modify an existing graph's options, click on
the corresponding items in the top menu of the graph window.

Suppose now that we would like to plot the two PDFs side by side in
the same figure. To place several separate graphs in the same figure,
use the command subplot(number of rows, number of columns, index of
graph within the graph array).

For example, the code 

>> subplot(1,2,1), plot(x,y,'r:p'); xlabel('x'); ylabel('pdf')
>> title('(a) Normal PDF')
>> subplot(1,2,2), plot(x,t); xlabel('x'); ylabel('pdf')
>> title('(b) T PDF')

produces the graph in Figure 6. 

Figure 5 Changing Defaults and Plotting Multiple Graphs with the plot Function

Figure 6 Multiple Plots within the Same Figure

Finally, we briefly discuss three-dimensional (3-D) graphs. They can
be created with commands like plot3 and surf, and as a general matter
are more complex.

The command plot3(first variable x, second variable y, third variable
z) plots points in 3-D space whose three coordinates are given by the
vectors or matrices (x, y, z) in the three arguments of the function.
The arguments need to be arrays of equal sizes.
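
A minimal sketch of plot3, using a made-up curve rather than financial data, is a helix whose three coordinate vectors all have the same length:

```matlab
% Illustrative sketch: a 3-D line plot of a helix.
s = linspace(0, 4*pi, 200);     % parameter along the curve
x = cos(s); y = sin(s); z = s;  % three coordinate vectors of equal size

plot3(x, y, z);
xlabel('x'); ylabel('y'); zlabel('z'); grid on
```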

The command surf (x, y, z) plots a 
shaded surface using z as the height and (x, 
y) as the vectors or matrices that define the 
other two dimensions of the surface. When x 
and y are vector arrays, as is the case in most 
financial applications, the number of rows for 
z should be the length of the vector array y, 
and the number of columns for z should be the 
length of the vector array x. 

For example, suppose we would like to plot the multivariate normal
PDF of two normal variables, x1 and x2, that have means of 0 and are
correlated with covariance matrix [0.25 0.3; 0.3 1]. (Note that this
notation means that the variance of x1 is 0.25 (the standard
deviation of x1 is 0.5), the variance of x2 is 1 (the standard
deviation of x2 is 1), and the covariance of x1 and x2 is 0.3.)

The multivariate normal PDF can be computed with the MATLAB function
mvnpdf(X,mu,Sigma). The arguments mu and Sigma are the vector array
of average (expected) values for the normal random variables and
their covariance matrix, respectively. In this case, we have two
normal random variables, so mu = [0 0] and Sigma = [0.25 0.3; 0.3 1].
The first argument in the function (matrix X) provides the points at
which the function should be evaluated. The function is evaluated for
every row of X, taking the elements in that row as the coordinates of
the point at which the function should be evaluated. Therefore, since
in our example we are looking at two normal random variables, there
should be two columns in the matrix X. We cannot simply provide two
columns with, say, equally spaced values for x1 and x2. If we do,
MATLAB will pair each entry of x1 with the corresponding entry of x2
and will use only those combinations of coordinates, so the plot will
look two-dimensional. The columns of X should provide a grid. In
other words, we cannot simply provide possible coordinates along each
axis and expect that MATLAB will know to take every combination of
possible coordinates to obtain the points at which to plot the
function. To create this grid of points, we need to go through a
couple of steps.

First, we use the function [X1,X2] = meshgrid(x1,x2). It creates two
matrices, each with length(x2) rows and length(x1) columns. (The
command length, which returns the number of elements in a vector, is
another useful MATLAB command.) Each row of X1 is an identical copy
of the vector x1, and each column of X2 is an identical copy of the
vector x2. While perhaps difficult to imagine at first, the pairs
X1(i,j) and X2(i,j) cover all possible combinations of the elements
of the original vectors, x1 and x2.

The second step is to create the array [X1(:),X2(:)]. The colon
operator (:) has multiple uses, but when used as the only index of a
matrix, it takes all entries of the matrix, column by column, and
lists them as a single column vector. Therefore, the array
[X1(:),X2(:)] contains two columns with every possible combination of
coordinates generated by the original vector arrays x1 and x2.
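
The two steps are easier to follow on a grid small enough to read. The sketch below uses two short made-up vectors in place of the 30-point grids, and a simple sum of the coordinates in place of the PDF, so that the entries of every intermediate array can be checked by eye; pts and Z are illustrative names.

```matlab
% Illustrative sketch: meshgrid and the colon operator on a tiny grid.
x1 = [1 2 3];                  % coordinates along the first axis
x2 = [10 20];                  % coordinates along the second axis

[X1, X2] = meshgrid(x1, x2);
% X1 = [1 2 3; 1 2 3]          each row is a copy of x1
% X2 = [10 10 10; 20 20 20]    each column is a copy of x2

pts = [X1(:), X2(:)];
% pts = [1 10; 1 20; 2 10; 2 20; 3 10; 3 20]   one grid point per row

f = pts(:,1) + pts(:,2);                  % any function evaluated row by row
Z = reshape(f, length(x2), length(x1));   % back to the shape of the grid
```

Here Z equals X1 + X2, which confirms that reshaping with length(x2) rows and length(x1) columns puts each value back at its own grid position.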

To summarize, here are the commands used to generate 30 points
between -4 and 4 along each coordinate x1 and x2, then to evaluate
the multivariate normal PDF at each combination of coordinates:

>> x1 = linspace(-4,4,30); x2 = linspace(-4,4,30);
>> Sigma = [0.25 0.3; 0.3 1]; mu = [0 0];
>> [X1,X2] = meshgrid(x1,x2);
>> z = mvnpdf([X1(:),X2(:)],mu,Sigma);

Figure 7 Three-Dimensional Plot of a Multivariate Normal Distribution


The output of this sequence of commands is a vertical array of values
that represent the multivariate normal PDF evaluated at each
combination of coordinates. (If you skip the semicolon at the end of
the last row with the function mvnpdf, you can see what the output
looks like. You can also use the command size(z) to check the
dimensions of z.) Now we would like to plot these values. We will use
the surf function.

The surf function's third argument, z, needs to be a matrix whose
entries represent the values of the function to be plotted at each
combination of coordinates. However, we obtained a vector of values
for the PDF. We need to "reshape" that vector back into a matrix.
This can be done with the command Z = reshape(z,m,n). The function
reshape takes the array z and goes through the elements of z
columnwise. The first m elements of z become the first column of the
new matrix Z, the next m elements of z become the second column of
the matrix Z, and so forth until n columns for Z are created. In this
example, we would like to create length(x1) columns and length(x2)
rows. (This may be a bit confusing, but, as we mentioned earlier, the
function surf expects the third argument to be a matrix with the
number of columns equal to the size of the first argument, and the
number of rows equal to the size of the second argument.)

>> Z = reshape(z,length(x2),length(x1));
>> surf(x1,x2,Z);
>> title('Multivariate Normal Probability Density')
>> axis([-4 4 -4 4 0 0.4]);
>> xlabel('x1'); ylabel('x2'); zlabel('PDF');

The resulting graph is in Figure 7. 

IMPORTING DATA AND INTERACTING WITH SPREADSHEETS

MATLAB recognizes files with the extension .dat as data files. Such
files should contain text structured in rows and columns. For
example, suppose that the file returns.dat contains returns on the
stocks traded on the NYSE for 10 years. The command

>> load returns.dat 

imports the data in the file into a data structure (a matrix array
named returns, after the file) with rows and columns that can then be
referenced using some of the commands we described earlier.
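
The round trip can be tried without any market data. The sketch below writes a small made-up matrix to a text file with the built-in save command and reads it back with load; the file name returns_demo.dat and the numbers are hypothetical. When load reads a text file, it creates a variable named after the file.

```matlab
% Illustrative sketch: writing and reading a plain-text data file.
R = [0.01 0.02; -0.03 0.04; 0.05 -0.06];   % hypothetical returns, 3 periods x 2 stocks

save('returns_demo.dat', 'R', '-ascii', '-double');   % write as text, full precision
clear R

load returns_demo.dat                  % creates the variable returns_demo
firstStock = returns_demo(:, 1);       % all periods for the first stock
```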

Many financial companies build their infrastructure around Microsoft
Excel. The MATLAB core product contains some useful functions for
importing Excel data and exporting MATLAB results to spreadsheets.
The function

>> xlsread('fileName','sheetName','range')

allows the user to read into MATLAB the data stored in file fileName,
worksheet sheetName, cells in range range. Instead of a range in the
spreadsheet, you can state an array name if you had named the array
of cells in advance. Variations of this command exist; for instance

>> xlsread('fileName',-1) 

allows the user to select the range in fileName directly, through
interactive selection in Excel. Type help xlsread at the MATLAB
command prompt for further information.

The function 

>> xlswrite('fileName',output,'sheetName','cell')

allows the user to export MATLAB results (output) to a worksheet
(sheetName) in an Excel file (fileName). MATLAB preserves the
dimensions of the output and writes it to the spreadsheet starting at
cell reference cell. For example, if output is a horizontal array of
numbers, MATLAB will write the data in a row in the Excel file,
starting at cell.

MATLAB operations work within the xlswrite command. For example, you
can transpose the output by using output' inside the parentheses of
the xlswrite command if you desire different output formatting in the
Excel spreadsheet.

More sophisticated capabilities exist through MATLAB's Excel Link.
With Excel Link, you can call MATLAB's functions directly from within
Excel, thus ensuring access to MATLAB's superior computational and
graphical capabilities. Excel Link is purchased as a separate
toolbox. It can then be made visible from within Excel by selecting
it as one of Excel's Add-Ins. There are 11 commands (they all start
with "ML") that allow for communicating data back and forth between
Excel and MATLAB. For example, =MLAppendMatrix() creates or appends a
matrix in MATLAB with data from an Excel spreadsheet.

A word of caution: Excel Link formulas are not case sensitive. For
example, MLAppendMatrix and mlappendmatrix are the same. However,
MATLAB functions and variables called through these links are case
sensitive. For example, x and X would still be treated as two
separate variables.

EXAMPLES 

This section discusses several scripts and functions
in MATLAB that can be used in financial
applications. The goal is to illustrate the use of
toolboxes in MATLAB and to provide concrete
examples of some of the tools introduced earlier
in the entry.

Optimization in MATLAB 

Optimization is an area in applied mathematics
that, most generally, deals with efficient algorithms
for finding an optimal solution among
a set of solutions that satisfy given constraints.
The first application of optimization in finance
was suggested by Harry Markowitz in 1952, in a
seminal paper that outlined his mean-variance
optimization framework for optimal asset allocation.
Some other classical problems in finance
Table 1 MATLAB Optimization Toolbox Functions/Solvers Appropriate for Specific Types of Optimization
Problems

                                                  OBJECTIVE
CONSTRAINTS       Linear    Quadratic  Least squares                              Smooth nonlinear           Nonsmooth
None              N/A       quadprog   \, lsqcurvefit, lsqnonlin                  fminsearch, fminunc        fminsearch, *
Bound             linprog   quadprog   lsqcurvefit, lsqlin, lsqnonlin, lsqnonneg  fminbnd, fmincon, fseminf  *
Linear            linprog   quadprog   lsqlin                                     fmincon, fseminf           *
Smooth nonlinear  fmincon   fmincon    fmincon                                    fmincon, fseminf           *
Discrete          bintprog

Note: Asterisk (*) is used to denote solvers that are available only through the Global Optimization Toolbox. Blank 
entries mean that there is currently no solver available. Technically, the Global Optimization Toolbox can be used for 
solving discrete problems as well; however, it requires additional programming. 


that can be solved by optimization algorithms 
include: 

1. Is there a possibility to make riskless profit 
given market prices of related securities? 

2. How should trades be executed so as to reach 
a target allocation with minimum transaction 
costs? 

3. Given a limited capital budget, which capital 
budgeting projects should be selected? 

4. Given estimates for the costs and benefits 
of a multistage capital budgeting project, at 
what stage should the project be expanded/ 
abandoned? 

MATLAB's Optimization Toolbox contains
solvers for a range of optimization problems.
MATLAB expects optimization formulations
to be passed to its solvers in an array form
and has functions that call specific solvers for
specific types of optimization problems. (See
Table 1 for a quick overview. See also
MATLAB's help for a complete listing.) If the
Global Optimization Toolbox is available, the
range of solvers is expanded to include randomized
search algorithms.

The most frequently used solver in MATLAB is
fmincon, which is the solver for general nonlinear
optimization. However, if you know the
type of problem you are trying to solve, you are
always better off giving the optimization software
as much information as you can in order
to make the optimization process more accurate
and efficient. In financial applications, you are
most likely to encounter situations in which you
need linprog (a linear programming solver),
quadprog (a quadratic programming solver),
bintprog (a binary programming solver), and
randomized search algorithms, such as
simulannealbnd and ga.

We will use linprog and quadprog to solve
two examples of portfolio allocation problems. 
Before we show the actual implementation, we 
need to explain how solvers are actually called 
in MATLAB. There are two ways to call the 
solvers: as functions directly from the command 
prompt (equivalently, from within M-files), or 
through the optimization tool. 

The MATLAB optimization tool provides an
interface between the solvers and the user.
While using such an interface may not be optimal
when solving sequences of optimization
problems, as in the case of dynamic programming
or stochastic programming, it is quite
convenient when solving a single optimization
problem, because it lists all available solvers,
prompts the user for the different inputs that
the optimization solvers expect, and allows for
easy manipulation of the options. Options can
be specified directly when a solver is called from
the command prompt as well, but that is more
difficult for MATLAB beginners.
Figure 8 The Optimization Tool Interface in MATLAB


The optimization tool is called by typing
optimtool at the MATLAB command prompt.
The optimization tool dialog box is shown in
Figure 8. The panel on the left-hand side is dedicated
to the specification of the inputs: the type
of solver that needs to be called, the arrays with
the problem data, the starting point, and so on.
The panel in the middle allows for changing the
level of tolerance in the search for the optimal
solution. For example, the Function tolerance is
currently set at the default value of 1e-06, which
is \(10^{-6}\). This means that the selected algorithm
will continue to iterate through solutions until
the improvement in successive objective function
values becomes smaller than \(10^{-6}\). Sometimes,
such a level of accuracy is unnecessary.
For example, if our objective function is measured
in dollars and cents (e.g., we are maximizing
dollar return as in the simple portfolio
allocation example we will discuss next), then
technically we do not need precision beyond
2-3 digits after the decimal point. Therefore, we
can speed up the algorithm by relaxing the requirements
on tolerance. Other useful options
include level of display (whether to show iterations
of the optimization algorithm or not) and
function plots at intermediate stages.
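The same tolerance and display options can also be set from the command prompt. The following is a minimal sketch rather than the procedure used in this entry: it assumes the classic optimset interface, uses the base MATLAB solver fminsearch so that no toolbox is needed, and the objective function objFun is a made-up example.

```matlab
% Hypothetical smooth objective with its minimum at (1, -2); not from the text.
objFun = @(x) (x(1) - 1)^2 + (x(2) + 2)^2;
x0 = [0; 0];                        % starting point

% Relax the function and solution tolerances to 1e-3 and display each iteration.
options = optimset('TolFun', 1e-3, 'TolX', 1e-3, 'Display', 'iter');

[xOpt, fOpt] = fminsearch(objFun, x0, options);
```

With the looser tolerances the solver typically stops after fewer iterations, at the cost of a less precise answer.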

Linear Optimization: Simple Portfolio 
Allocation 

Let us consider a specific example to illustrate
the use of the optimization function linprog.
(For more details, see section 5.3.1 in
Pachamanova and Fabozzi, 2010.)

The portfolio manager at a large university
in the United States is tasked with investing a
$10 million donation to the university endowment.
He has decided to invest these funds only
in mutual funds and is considering the following
four: an aggressive growth fund (Fund 1),
an index fund (Fund 2), a corporate bond fund
(Fund 3), and a money market fund (Fund 4),
each with a different expected annual return
and risk level. (The risk level measurement is
deliberately simplified for the sake of this example.)
The investment guidelines established
Table 2 Data for the Portfolio Manager's Problem

Fund type         Growth   Index    Bond     Money market
Fund #            1        2        3        4
Expected return   20.69%   5.87%    10.52%   2.43%
Risk level        4        2        2        1
Max investment    40%      40%      40%      40%


by the Board of Trustees limit the percentage of
the money that can be allocated to any single
type of investment to 40% of the total amount.
The data for the portfolio manager's task are
provided in Table 2. In addition, in order to
contain the risk of the investment to an acceptable
level, the amount of money allocated to
the aggressive growth and the corporate bond
funds cannot exceed 60% of the portfolio, and
the aggregate average risk level of the portfolio
cannot exceed 2. What is the optimal portfolio
allocation for achieving the maximum expected
return at the end of the year, if no short selling
is allowed?

The vector of decision variables for this optimization
problem can be defined as
\(x = (x_1, x_2, x_3, x_4)\): amounts (in $) invested in Fund
1, 2, 3, and 4, respectively.

Let the vector of expected returns be
\(\mu = (20.69\%, 5.87\%, 10.52\%, 2.43\%)\). Then, the objective
function can be written as

\[ f(x) = \mu' x = (20.69\%) \cdot x_1 + (5.87\%) \cdot x_2 + (10.52\%) \cdot x_3 + (2.43\%) \cdot x_4 \]

It represents the expected dollar return on the
portfolio at the end of the year.

There are also several constraints.

1. The total amount invested should be $10 million.
This can be formulated as

\[ x_1 + x_2 + x_3 + x_4 = 10{,}000{,}000 \]

2. The total amount invested in Fund 1 and
Fund 3 cannot be more than 60% of the total
investment ($6 million). This can be written
as

\[ x_1 + x_3 \le 6{,}000{,}000 \]

3. The average risk level of the portfolio cannot
be more than 2. This constraint can be
expressed as

4*(proportion of investment with risk level 4) +
2*(proportion of investment with risk level 2) +
1*(proportion of investment with risk level 1) ≤ 2

or, mathematically,

\[ \frac{4 \cdot x_1 + 2 \cdot x_2 + 2 \cdot x_3 + 1 \cdot x_4}{x_1 + x_2 + x_3 + x_4} \le 2 \]

In this particular example we know that the
total amount \(x_1 + x_2 + x_3 + x_4 = 10{,}000{,}000\), so
the constraint can be formulated as

\[ 4 \cdot x_1 + 2 \cdot x_2 + 2 \cdot x_3 + 1 \cdot x_4 \le 2 \cdot 10{,}000{,}000 \]

4. The maximum investment in each fund cannot
be more than 40% of the total amount
($4,000,000). These constraints can be written
as

\[ x_1 \le 4{,}000{,}000, \quad x_2 \le 4{,}000{,}000, \quad x_3 \le 4{,}000{,}000, \quad x_4 \le 4{,}000{,}000 \]

5. Given the no short selling requirement,
the amounts invested in each fund cannot
be negative. These are nonnegativity constraints:
\(x_1 \ge 0, x_2 \ge 0, x_3 \ge 0, x_4 \ge 0\).

The final optimization formulation can be
written in matrix form. The objective function
is

\[ \max_{x_1, x_2, x_3, x_4} \begin{bmatrix} 0.2069 & 0.0587 & 0.1052 & 0.0243 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \]

Let us organize the constraints into groups according
to their signs. This will be useful when
we input the constraints into MATLAB.

Equality (=):

\[ \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = 10{,}000{,}000 \]

Inequality (≤):

\[ \begin{bmatrix} 1 & 0 & 1 & 0 \\ 4 & 2 & 2 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \le \begin{bmatrix} 6{,}000{,}000 \\ 20{,}000{,}000 \\ 4{,}000{,}000 \\ 4{,}000{,}000 \\ 4{,}000{,}000 \\ 4{,}000{,}000 \end{bmatrix} \]

Nonnegativity (≥):

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]

This is a linear optimization problem because
all constraints and the objective function are
linear. To solve linear optimization problems
with MATLAB, use linprog(f,A,b,Aeq,beq,lb,ub).
The function arguments f, A, b, Aeq, beq, lb, ub
correspond to the following LP formulation:

\[ \begin{aligned} \min_{x} \quad & f'x \\ \text{s.t.} \quad & Ax \le b \\ & Aeq \cdot x = beq \\ & lb \le x \le ub \end{aligned} \]

Therefore, before calling linprog, you need
to write the problem formulation in this particular
form. We include the complete MATLAB
script below.


1  numAssets = 4;
2  expReturnsVec = [0.2069 0.0587 0.1052 0.0243]';
3  %create placeholders for an array of decision variables
4  %(amounts to invest in
5  %each fund) and the optimal portfolio expected return (to be filled out
6  %after the optimization)
7
8  amountsVec = zeros(numAssets,1);
9  optReturn = [];
10
11 %vector of coefficients of objective function f; since MATLAB expects
12 %minimization (and we are maximizing), take the negative of the function
13 %we are trying to maximize
14 f = -expReturnsVec;
15
16 %A, matrix of coefficients in constraints with inequalities so that
17 %Ax<=b
18 A = [1 0 1 0;
19      4 2 2 1;
20      1 0 0 0;
21      0 1 0 0;
22      0 0 1 0;
23      0 0 0 1];
24
25 %b
26 b = [6000000 20000000 4000000 4000000 4000000 4000000]';
27
28 %Aeq, matrix of coefficients in constraints with equalities so that
29 %Aeq*x=beq
30 Aeq = ones(1,numAssets);
31
32 %beq
33 beq = 10000000;
34
35 %lower bounds: nonnegativity requires that all decision variables are >= 0
36 lb = zeros(numAssets,1);
37
38 %upper bounds can be left infinite (although, technically, we cannot invest
39 %more than the $10m we have available)
40 ub = inf*ones(numAssets,1);
41
42 [amountsVec,optReturn] = linprog(f,A,b,Aeq,beq,lb,ub);
43
44 format('bank');
45
46 amountsVec
47 %revert to correct number for maximum return (reverse sign)
48 optReturn = -optReturn


The process for formulating the optimization
problem is as follows. First, we ask
ourselves what corresponds to the vector of
decision variables x in the linprog formulation.
In our example, x maps directly to the
vector of amounts to invest in each asset. We
then enter problem data, such as the expected
returns vector expReturnsVec. We allocate
empty arrays to store the values of the optimal
solution amountsVec and the optimal value of
the objective function optReturn after collecting
the information from the solver.

Next, we create the input data for the linprog
solver. The solver expects a vector of objective
function coefficients f, which in our case
is the vector of expected returns on the different
assets. Note, however (line 14), that we
specify f as -expReturnsVec. This is because
MATLAB expects a minimization problem, and
our objective is to maximize expected
return, so we need to convert our problem to
the required form by minimizing the negative
of the expression for the maximization objective.
At the end (line 48), we take the negative
of the optimal value for expected return found
by the solver, so that we arrive at the actual
optimal value for the maximization problem.
The optimal values of the decision variables,
which in this case are the amounts to invest,
amountsVec, do not need to be modified after
the optimization results are returned by the
solver.

Lines 14-40 contain the specification of the
other inputs in the problem. Note that we are
in fact using the matrices of coefficients for the
groups of constraints (inequality, equality, and
nonnegativity) that we defined earlier. Namely,
A (lines 18-23) is the matrix of left-hand-side
inequality constraint coefficients; Aeq (line 30)
is the matrix of left-hand-side equality constraint
coefficients; b (line 26) is the vector
of right-hand-side coefficients of the inequality
constraints; and beq (line 33) is the vector
of right-hand-side coefficients of the equality
constraints (in our example, we have only one
equality constraint). The lower bounds, lb (line
36), are the zeros from the right-hand side of
the nonnegativity constraints on the decision
variables, so we create a vector array with size
equal to the number of decision variables that
contains only zeros. We have explicit upper
bounds of $4,000,000 on each decision variable
since we cannot invest more than that amount
in each individual fund, so we could have stated
those bounds as the input vector ub. However,
these bounds have already been included in
the matrix A, so we do not need to state them
again. Instead, we state the individual upper
bounds as infinity, that is, as the product of
the number inf (in MATLAB, that denotes
infinity) and a vector of ones. (See line 40 of
the code.)

An equivalent formulation of the constraints
from MATLAB's perspective would have been
to specify the arrays A, b, and ub as

A = [1 0 1 0;
     4 2 2 1]

b = [6000000 20000000]'

ub = 4000000*ones(numAssets,1)

with all other input arrays remaining the
same.
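One way to convince yourself that the two formulations are equivalent is to solve both and compare the optimal objective values. The sketch below assumes the Optimization Toolbox is installed; the variable names ending in 1 and 2 are introduced here only to keep the two formulations apart.

```matlab
numAssets = 4;
f   = -[0.2069 0.0587 0.1052 0.0243]';
Aeq = ones(1,numAssets);
beq = 10000000;
lb  = zeros(numAssets,1);

% Formulation 1: per-fund caps as rows of A, infinite upper bounds.
A1  = [1 0 1 0; 4 2 2 1; eye(numAssets)];
b1  = [6000000 20000000 4000000*ones(1,numAssets)]';
ub1 = inf*ones(numAssets,1);
[x1,fval1] = linprog(f,A1,b1,Aeq,beq,lb,ub1);

% Formulation 2: per-fund caps passed as upper bounds.
A2  = [1 0 1 0; 4 2 2 1];
b2  = [6000000 20000000]';
ub2 = 4000000*ones(numAssets,1);
[x2,fval2] = linprog(f,A2,b2,Aeq,beq,lb,ub2);

% The two optimal objective values should agree up to solver tolerance.
abs(fval1 - fval2)
```

Passing simple caps as bounds rather than as rows of A keeps the constraint matrix smaller, which matters more as the number of assets grows.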

After all inputs have been specified, the linprog
solver is called (line 42). The syntax in line
42 requests that the output from the optimization
be stored in the arrays we specified at
the beginning, amountsVec and optReturn.
The results are then printed to the screen and
are formatted according to format('bank')
(line 44), which displays numbers with
two decimal places.


After running the M-file, we obtain the fol¬ 
lowing output: 

amountsVec = 

2000000.00 
0.00 
4000000.00 
4000000.00 
optReturn = 

931800.00 
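
The reported allocation can be checked by hand against the objective and the constraints. The following sketch uses only base MATLAB; the variable names on the left-hand side are introduced here for illustration and do not appear in the original script.

```matlab
expReturnsVec = [0.2069 0.0587 0.1052 0.0243]';
amountsVec    = [2000000 0 4000000 4000000]';   % solution printed above

% Objective: expected dollar return (413,800 + 420,800 + 97,200 = 931,800).
expectedReturn = expReturnsVec'*amountsVec

% Budget constraint: the amounts should add up to $10 million.
totalInvested = sum(amountsVec)

% Fund 1 plus Fund 3 should not exceed $6 million.
growthPlusBond = amountsVec(1) + amountsVec(3)

% Weighted average risk level should not exceed 2.
avgRiskLevel = ([4 2 2 1]*amountsVec)/totalInvested

% Per-fund caps and nonnegativity; evaluates to true when all hold.
withinBounds = all(amountsVec <= 4000000 & amountsVec >= 0)
```

Note that the 60% constraint and the risk-level constraint both hold with equality at this solution, so both are binding.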

If you prefer to solve the problem by using
the optimization tool,
you need to fill out the dialog box as shown in
Figure 9. Select linprog as the solver from the
drop-down menu at the top. Under Algorithm,
you can either leave the default (Large Scale),
or select Medium scale - simplex, which is appropriate
because our problem is quite small.
We entered the names of the arrays that correspond
to the objective function coefficients and
the constraint coefficients in the corresponding
fields in the left panel of the dialog box.


Figure 9 The Optimization Tool Dialog Box for the Portfolio Allocation Problem

Figure 10 Handling the Structure of Optimization Results Exported from MATLAB's Optimization
Tool


Note that these arrays must be prefilled; that
is, they must be entered from the command
prompt or read from a file before the problem
is solved through the optimization tool;
otherwise the solver will complain that these
arrays are empty. You can make sure that the
arrays f, A, b, Aeq, beq, lb, ub are filled in by
checking first whether they are listed in the
Workspace window at the upper left corner of
the MATLAB desktop. Once all the input data
are specified, click on the Start button in the
left panel to solve the problem. The solution
appears in the field below the Start button.

The optimization model can be saved as a
script in an M-file by selecting File | Generate
M-file from the main menu in the optimization
tool. In addition, the optimization results can be
exported to the workspace and further manipulated
by selecting File | Export to Workspace.
To export only the results, as opposed to the entire
model, check Export results to a MATLAB
structure named: optimresults. This creates
a structure of results, optimresults, that
shows up in the Workspace. So, for example,
the optimal solution (the portfolio allocation)
can be read by typing optimresults.x at the
command prompt. (See Figure 10.) Similarly,
the optimal value of the objective function can
be retrieved by typing optimresults.fval
at the command prompt.

Quadratic Optimization: Mean-Variance 
Portfolio Allocation 

The classical mean-variance portfolio optimization
problem as introduced by Harry
Markowitz (1952) is to minimize the variance of
portfolio return subject to the constraint that the
expected portfolio return is at a certain level. Let
us consider a slight variation of the problem, in
which we require that the expected return is at
least at a certain level \(r_{target}\). The mathematical
formulation is

\[ \begin{aligned} \min_{w} \quad & w' \Sigma w \\ \text{s.t.} \quad & w'\mu \ge r_{target} \\ & w'\iota = 1 \end{aligned} \]

where \(w\) is the vector of portfolio weights (to be
determined), \(\mu\) is the vector of expected returns,
\(\Sigma\) is the covariance matrix of returns, and \(\iota\) is a
vector of ones of appropriate dimension.

The minimum variance portfolio allocation
problem is a quadratic optimization problem
with linear constraints. The quadprog function
in MATLAB solves exactly problems of this
kind:

\[ \begin{aligned} \min_{x} \quad & \tfrac{1}{2} x'Hx + f'x \\ \text{s.t.} \quad & Ax \le b \\ & Aeq \cdot x = beq \\ & lb \le x \le ub \end{aligned} \]

and is called with the command
quadprog(H,f,A,b,Aeq,beq,lb,ub).

It is easy to see how to match the two formulations:

• \(x = w\)
• \(f = 0\)
• \(H = 2\Sigma\)
• \(A = -\mu'\)
• \(b = -r_{target}\)
• \(Aeq = \iota'\)
• \(beq = 1\)
• \(lb = -\infty\)
• \(ub = \infty\)

For example, the inequality constraint

\[ w'\mu \ge r_{target} \]

in the mean-variance formulation is mapped to
the inequality constraint assumed by the quadprog
function

\[ Ax \le b \]

by rewriting the mean-variance constraint as

\[ -w'\mu \le -r_{target} \]

and setting \(A = -\mu'\) and \(b = -r_{target}\).

Suppose we are given a portfolio with a number
of stocks equal to numAssets, expected
returns for the stocks stored in a vertical vector
muVec, covariance matrix SigmaMx, and
required expected return of targetReturn.
Consider a simple portfolio of two stocks with
expected returns of 9.1% and 12.1%, standard
deviations of returns of 16.5% and 15.8%, and
a correlation of -0.22 (covariance of -57.35). A
MATLAB script that uses input data for the two
stocks, calls the optimization solver for several
instances of the problem with different values
of targetReturn, and plots the efficient frontier
looks as follows:

numAssets = 2;
muVec = [9.1; 12.1];
SigmaMx = [272.25, -57.35;
           -57.35, 249.64];
targetReturn = 11;

%SINGLE OPTIMIZATION

%create the matrix H
H = 2*SigmaMx;

%create a vector of length numAssets with zeros
f = zeros(numAssets,1);

%create right hand and left hand side of inequality constraints
A = -transpose(muVec);
b = -targetReturn;

%create lower bounds array for asset weights (negative infinity)
lb = -inf*ones(numAssets,1);

%create upper bounds array for asset weights (infinity)
ub = inf*ones(numAssets,1);

%create right hand and left hand side of equality constraints
beq = [1];
Aeq = transpose(ones(numAssets,1));

[weights,variance] = quadprog(H,f,A,b,Aeq,beq,lb,ub);

%print results to screen
stdDev = sqrt(variance)
weights

%EFFICIENT FRONTIER

%loop through different values of the target portfolio returns, compute the
%optimal portfolio standard deviation, and plot the efficient frontier

iCounter = 1;

for iTRet = 9.5:0.5:12
    b = -iTRet;
    [weights,variance] = quadprog(H,f,A,b,Aeq,beq,lb,ub);
    y(iCounter) = iTRet;
    x(iCounter) = sqrt(variance);
    iCounter = iCounter + 1;
end

%plot efficient frontier
plot(x,y);
xlabel('Portfolio standard deviation');
ylabel('Portfolio expected return');
title('Efficient Frontier');

The command

[weights,variance] = quadprog(H,f,A,b,Aeq,beq,lb,ub);

ensures that the optimal solution to the optimization
problem is stored in a vector called
weights, and the optimal objective function
value (the minimum portfolio variance) is
stored in the scalar variance. This is an example
of using a MATLAB built-in function.

The portfolio standard deviation is computed
as the square root of variance.

The MATLAB output from running the code 
above is as follows: 

stdDev = 

10.4928 
weights = 

0.3667 

0.6333 
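
For two assets the result can be cross-checked without a solver, because the budget constraint leaves only one free weight. The sketch below is an independent check under that two-asset assumption, not part of the original script; the names w1MinVar and wMinVar are introduced here for illustration.

```matlab
muVec = [9.1; 12.1];
SigmaMx = [272.25, -57.35;
           -57.35, 249.64];
targetReturn = 11;

% Weight of stock 1 in the minimum-variance portfolio when the
% return constraint is ignored (standard two-asset formula).
w1MinVar = (SigmaMx(2,2) - SigmaMx(1,2)) / ...
           (SigmaMx(1,1) + SigmaMx(2,2) - 2*SigmaMx(1,2));
wMinVar = [w1MinVar; 1 - w1MinVar];

if muVec'*wMinVar >= targetReturn
    % The return constraint is not binding.
    weights = wMinVar;
else
    % The return constraint binds: solve w1*mu1 + (1-w1)*mu2 = target.
    w1 = (muVec(2) - targetReturn)/(muVec(2) - muVec(1));
    weights = [w1; 1 - w1];
end

stdDev = sqrt(weights'*SigmaMx*weights)
weights
```

For a target return of 11 the constraint is binding, and the weights and standard deviation should match the quadprog output above up to rounding.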
The script also contains an example of a for
loop that runs the optimization problem for values
of the target return between 9.5 and 12,
increasing the target return by 0.5 at each iteration.
The optimal standard deviation obtained from the
optimization output and the expected portfolio
return are stored in vectors x and
y, respectively. The last few lines in the code plot the efficient
frontier using the values stored in x and y, and
label the graph.


Pricing a European Call Option by 
Simulation 

Simulation is a technique for replicating uncertain
processes and evaluating decisions under
uncertain conditions. In the financial context, it
typically involves generation of random numbers
from particular probability distributions,
using those for approximating the behavior of
exogenous variables such as stock returns, and
assessing outcomes of interest, such as the performance
of a portfolio or the price of a financial
instrument.

Through the Statistics Toolbox, MATLAB provides
commands for generating the most commonly
used random numbers directly. For
example, a normal random variable can be simulated
with

>> normrnd(mean,stdev,numRows,numColumns)

In the expression above, mean and stdev are
the mean and the standard deviation of the
normal random variable. numRows and numColumns
specify the dimension of the array of
random numbers we would like to generate.

We show how to use MATLAB's Statistics
Toolbox to compute the price of a European
call option with simulation under the assumptions
that there are no transaction costs or
market frictions, and the price of the underlying
follows geometric Brownian motion. (The
closed-form formula for pricing the option under
these assumptions is the Black-Scholes formula.)
Option pricing by simulation was first
suggested by Boyle (1977). For further details
on the implementation and more examples, see
Pachamanova and Fabozzi (2010).

The evolution of the asset price at time \(t\), \(S_t\),
can be described by the equation

\[ dS_t = \mu S_t \, dt + \sigma S_t \, dW_t \]

where \(W_t\) is standard Brownian motion and
\(\mu\) and \(\sigma\) are the drift and the volatility of the
process, respectively. For technical reasons (absence
of arbitrage), when pricing an option, the
drift \(\mu\) is replaced by the risk-free rate \(r\).

Under the assumption for the random process
followed by the asset price, the value of the asset
price \(S_T\) at time \(T\) given the asset price \(S_t\) at time
\(t\) can be computed as

\[ S_T = S_t \, e^{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\,\varepsilon} \]

where \(\varepsilon\) is a standard normal random variable.
(If the stock pays a continuously compounded
dividend yield of \(q\), then we use \((r - q - 0.5\sigma^2)\)
instead of \((r - 0.5\sigma^2)\) as the drift term.)

The price of the option can be approximated
by creating scenarios for the stock price \(S_T\) at
time \(T\), computing the discounted payoffs of
the option, and finding the expected payoff of
the option. Suppose we generate \(N\) scenarios
for \(\varepsilon\): \(\varepsilon^{(1)}, \ldots, \varepsilon^{(N)}\). Then, the price of a European
call option with strike price \(K\) will be

\[ e^{-r(T-t)} \cdot \frac{1}{N} \sum_{n=1}^{N} \max\left\{ S_t \, e^{\left(r - \frac{1}{2}\sigma^2\right)(T-t) + \sigma\sqrt{T-t}\,\varepsilon^{(n)}} - K, \; 0 \right\} \]

The expression above is the expected value of
the (discounted) option payoffs; that is, the weighted
average of the option payoffs. The "weight," or the
probability of each scenario, is \(1/N\).

In MATLAB, we create a function EuropeanCall
(stored in a file EuropeanCall.m), which
follows.
function CEPrice = EuropeanCall(initPrice,K,r,T,sigma,q,numPaths)

%function for evaluating the price of a European call option using crude
%Monte Carlo

%initPrice is the initial price, K is the strike price, r is the annual interest
%rate, T is the time to maturity, sigma is the annual volatility, q is the
%continuous dividend yield, numPaths is the number of scenarios to generate for
%the evaluation

CEpayoffs = zeros(numPaths,1);

%compute a vector array of asset prices, one for each scenario
assetPrices = initPrice*exp((r-q-0.5*sigma^2)*T+sigma*sqrt(T)* ...
    normrnd(zeros(1,numPaths),ones(1,numPaths)));

%compute a vector array of discounted payoffs, one for each scenario
CEpayoffs = exp(-r*T)*max(assetPrices - K,0);

%compute price of option
CEPrice = mean(CEpayoffs);


In the function, we generate the (random)
end points of numPaths paths for the underlying
stock price under the assumption
that the price follows geometric Brownian
motion. We use the Statistics Toolbox function
normrnd(mu,sigma), which in this case
returns a vector array with the realizations of
normal random variables. The array has the
dimension of the mu and sigma vectors, which
are vectors of zeros and ones, respectively, with
length numPaths. Then, we generate a vector
array of asset prices by calculating the asset
price in each scenario. We use a nice feature in
MATLAB, which is that we can pass an array
(namely, normrnd(zeros(1,numPaths),
ones(1,numPaths))) into a formula (namely,
initPrice*exp((r-q-0.5*sigma^2)*T +
sigma*sqrt(T)*normrnd(zeros(1,numPaths),
ones(1,numPaths)))), and
MATLAB automatically creates an array with
results (assetPrices). In many other programming
languages, we would need to implement this
by creating a for loop.

Finally, we calculate the option price CEPrice 
as the average of the payoffs of the option in 
each scenario by using the function mean. 
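
Because the estimate is an average of random payoffs, it is useful to report its sampling error alongside the price. The script below is a sketch of that idea and not part of the original function: it substitutes the base MATLAB function randn for normrnd so that it runs without the Statistics Toolbox, and the inputs are the illustrative values used in the Black-Scholes example of this entry.

```matlab
% Illustrative inputs (assumed here; same values as the blsprice example).
initPrice = 100; K = 110; r = 0.06; T = 1; sigma = 0.40; q = 0;
numPaths = 1000;

% Draw standard normal scenarios with randn (base MATLAB).
epsilons = randn(numPaths,1);

% Vectorized: one terminal asset price and one discounted payoff per scenario.
assetPrices = initPrice*exp((r-q-0.5*sigma^2)*T + sigma*sqrt(T)*epsilons);
payoffs = exp(-r*T)*max(assetPrices - K, 0);

priceEstimate = mean(payoffs);

% Standard error of the Monte Carlo estimate.
stdError = std(payoffs)/sqrt(numPaths);

% Approximate 95% confidence interval for the option price.
confInterval = [priceEstimate - 1.96*stdError, priceEstimate + 1.96*stdError]
```

The standard error shrinks in proportion to the square root of the number of paths, so quadrupling numPaths roughly halves the width of the interval.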


Pricing a European Call Option 
Using a Sobol Sequence 

In the function EuropeanCall, we used the
MATLAB built-in function normrnd from the
Statistics Toolbox with arguments that were arrays
of zeros and ones to generate a set of
realizations drawn from a standard normal
probability distribution and compute a set of
paths for the price of the underlying. Alternatively,
we could have generated a set of
quasirandom numbers that sometimes lead to
a faster and more accurate estimation for the
option price. (See the discussion in Chapter 14
of Pachamanova and Fabozzi, 2010; Chapter
6 in McLeish, 2005; or section 5.2.3 of Chapter
5 in Glasserman, 2004.) MATLAB's Statistics
Toolbox contains built-in syntax for computing
the elements of some low-discrepancy
sequences, such as the Sobol sequence (Sobol,
1967). Namely, the function sobolset(d)
computes a Sobol sequence of dimension d,
and the sequence can then be retrieved with
the command net. For example,

seq = sobolset(3); net(seq,5)

returns the first five elements of a Sobol sequence
of dimension 3.

The calculation of the European call option
price using the Sobol sequence is shown in the
function EuropeanCallSobol below.

function SCEPrice = EuropeanCallSobol(initPrice,K,r,T,sigma,q,numPaths)

%function for evaluating the price of a European option using
%a Sobol sequence

%initPrice is the initial price, K is the strike price
%r is the annual interest rate, T is the time to maturity, sigma is the
%annual volatility
%q is the continuous dividend yield
%numPaths is the maximum number of scenarios to generate for the evaluation

SCEpayoffs = zeros(numPaths,1);

%use the sobolset function in the Statistics Toolbox to generate the
%sequence
seq = sobolset(1);
SobolPoints = net(seq,numPaths+1);

%drop the first element, which is 0
SobolPoints = SobolPoints(2:numPaths+1);

%compute a vector array of asset prices, one for each Sobol point
assetPrices = initPrice*exp((r-q-0.5*sigma^2)*T+sigma*sqrt(T)* ...
    norminv(SobolPoints));

%compute a vector array of discounted payoffs, one for each scenario
%generated from a Sobol point
SCEpayoffs = exp(-r*T)*max(assetPrices - K,0);

%compute price of option
SCEPrice = mean(SCEpayoffs);


Again, in this function, we passed an array
(SobolPoints) into a formula
(initPrice*exp((r-q-0.5*sigma^2)*T +
sigma*sqrt(T)*norminv(SobolPoints))),
and MATLAB automatically
created an array with results (assetPrices).

The Sobol sequence generated in the function
is of dimension 1 and length numPaths + 1. We
created it with the commands

seq = sobolset(1);
SobolPoints = net(seq,numPaths+1);

and removed the first element, which is 0, with
the command

SobolPoints = SobolPoints(2:numPaths+1);

(As explained in Chapter 14 of Pachamanova
and Fabozzi [2010], it is common to drop some
number of elements of low-discrepancy sequences.
It takes a certain "warming up" for the
low-discrepancy sequence to begin producing
stable and accurate estimates.)
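
Dropping leading points can also be requested when the sequence is constructed, through the 'Skip' property of sobolset. The sketch below is an alternative to the indexing used in EuropeanCallSobol, not the approach taken in the function itself, and it assumes the Statistics Toolbox is installed; the variable normalDraws is introduced here for illustration.

```matlab
numPaths = 1000;

% 'Skip' omits initial points of the sequence; Skip = 1 drops the leading 0.
seq = sobolset(1,'Skip',1);
SobolPoints = net(seq,numPaths);

% The remaining points lie strictly inside (0,1), so norminv returns
% finite values that can be used in place of normrnd draws.
normalDraws = norminv(SobolPoints);
```

Dropping the 0 matters because norminv(0) is -Inf, which would correspond to a terminal asset price of zero in that scenario. A larger Skip value discards more of the initial "warming up" portion of the sequence.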
Computing the Black-Scholes Price
of a European Option Using the
Financial Toolbox

The price for the European option obtained in
the ways described in the previous two sections
is, of course, an approximation. It will vary
slightly depending on the specific set of scenarios
simulated with the normrnd function,
or on the number of points generated with the
Sobol sequence. The true option price under
the stated assumptions is given by the Black-Scholes
formula. (See Black and Scholes, 1973;
Hull, 2008; or Pachamanova and Fabozzi, 2010.)
As we mentioned earlier in this entry, the function
blsprice in MATLAB's Financial Toolbox
can compute this price. For example, for
an initial price of 100, a strike price of 110, an
interest rate of 6%, time to maturity of 1 year,
and volatility 40%, the Black-Scholes price for
the European call option will be computed by
typing

>> blsprice(100, 110, 0.06, 1, 0.40)

at the MATLAB prompt. MATLAB returns 

ans = 

14.4018 

You should get a similar price by typing the
names of the user-defined functions we wrote
previously,

>> EuropeanCall(100,110,0.06,1,0.40,0,1000)

to compute it with simulation, or

>> EuropeanCallSobol(100,110,0.06,1,0.40,0,1000)

to compute it by using a Sobol sequence. Here
we are requesting that the price be evaluated
with 1,000 paths for the price of the underlying.
The greater the number of paths, the closer
the estimates will tend to be to the Black-Scholes price.
For this example, we obtained 14.3772 for the
option price by crude Monte Carlo simulation,
and 14.0882 by using the Sobol low-discrepancy
sequence. The variability for the option price
estimated using the crude Monte Carlo simulation
approach is large, so readers can expect
answers that differ quite a bit.


KEY POINTS 

• MATLAB uses a number-array-oriented programming
language; that is, a programming
language in which vectors and matrices are
the basic data structures.

• Array operations are very efficient in
MATLAB.

• Specialized MATLAB toolboxes provide additional
capabilities, save time, and simplify
model building. Some toolboxes build on the
capabilities of other toolboxes and need to be
purchased in groups.

• An M-file is a file with instructions that
MATLAB executes sequentially. Such files are
saved with the suffix ".m" and can be called
from the prompt in MATLAB's Command
window by typing their name without the
suffix ".m".

• M-files can be scripts, that is, a simple listing
of instructions for MATLAB, or functions,
which take in a certain number of arguments
and return a certain number of outputs.

• While general script M-files can contain any
sequence of instructions that will be completed
when the name of the file is typed at
the MATLAB prompt, function M-files need
to start with a specific first line. That line contains
the word "function" and a declaration
of the function name, inputs, and outputs.
The function name and the name of the M-file
should be the same.

• Control flow statements in MATLAB include
for loops, if statements, while
loops, switch-case constructions, and
try-catch blocks.

• MATLAB has beautiful 2-D and 3-D graphing
capabilities. The most common function for
plotting 2-D graphs is plot.

• MATLAB has the ability to interact efficiently
with Microsoft Excel. The core product contains
commands that allow importing data
from and exporting data to Excel.

• Spreadsheet Link EX is a useful toolbox that
allows a more complex interface between
MATLAB and Excel. With Spreadsheet Link
EX, one can call MATLAB's functions directly
from within Excel, thus ensuring access
to MATLAB's superior computational
and graphical capabilities.

• Optimization in MATLAB can be performed
through the Optimization and the Global Optimization
Toolboxes. These capabilities are
especially useful for quantitative portfolio
management.

• MATLAB expects optimization formulations
to be passed to its solvers in an array form and
has functions that can call specific solvers for
specific types of optimization problems.

• The MATLAB Statistics Toolbox contains
functions for random number generation and can be used
when performing financial simulations.
Abstract: Visual Basic for Applications (VBA) is a programming language environment that allows
Microsoft Excel users to automate tasks, create their own functions, perform complex calculations, 
and interact with spreadsheets. Despite some important limitations, VBA adds useful capabilities 
to spreadsheet modeling and is a good tool to know for finance professionals for whom Microsoft 
Excel is the platform of choice. 


This entry is a brief introduction to Visual Basic
for Applications (VBA), the programming language
environment that allows Microsoft Excel
users to automate tasks, create their own functions,
perform complex calculations, and interact
with spreadsheets. We focus on features
of VBA useful for financial applications. For
a comprehensive introduction to VBA, good
references are Walkenbach (2004) and Roman
(2002). The Excel VBA help is also useful as a
quick reference. All Excel commands listed in
this entry are based on Microsoft Office 2007.


A SIMPLE EXAMPLE OF A 
VBA PROGRAM 

Before we review some important characteristics
of the VBA language, let us create a simple
example of a VBA program. Excel has a tool
for recording tasks performed in a spreadsheet,
which can then be replayed as a macro. Macros
in Excel record a sequence of commands, so
that you do not have to repeat the same set
of instructions if you need to perform the task
several times. Macros are in effect computer
programs whose commands are hidden from
the user, but can be seen if you open the VBA
editor (VBE). You can access the VBE by using
a shortcut, Alt-F11, in all versions of Excel.
In Excel 2007, VBE can be accessed from
the Developer tab. If the Developer tab is not
visible, do the following to set it up: Click on
the main MS Excel button, then Excel Options.
Under the Popular Options tab, check
Show Developer Tab in Ribbon. Once the Developer
tab is available in Excel's top menu,
you can click on the Visual Basic button in the
ribbon associated with it to open the editor.
(See Figure 1.)

Use the Macro Security button to enable 
macros. (It is always a good idea to return 
to the default—disabled macros—after you are 
finished working with macros.) 
Figure 1 Visual Basic Button in the Developer
Ribbon in Excel 2007


Open a new file and name it ReturnCalc.xlsm.
(Excel 2007 will automatically make the file extension
.xlsm if there are macros already in the
file. Here, we do not have macros yet, so the
default in Excel 2007 will be to save the file as
.xlsx. To save the file with extension .xlsm, you
need to select Excel Macro-Enabled Workbook
from the drop-down menu next to Save as Type
in the Save dialog box.)

We are trying to create the layout shown in
Figure 2. First, enter the text in columns A and
B; that is, enter stock prices for three points in
time. Suppose we want to compute the realized
cumulative return over the two time periods
for any set of three stock prices in column B.
We can do that by, for example, computing the
realized returns over each of the two periods
in column C, and then computing the cumulative
return between times 1 and 3 in cell D5.


Let us record the entries and the calculations
as a macro. To record a macro, click on Record
Macro in the Developer tab. Delete the default
name Macro1, and replace it with something
more meaningful, for example, ReturnCalc.
Click OK. Once the macro recorder is on, do the
following:

1. Enter =(B3-B2)/B2 in cell C3 (this will
compute the return for time period 1-2).

2. With the cursor in cell C3, enter Ctrl-C to
copy the contents of cell C3, move the cursor
to cell C4, and enter Ctrl-V to paste. This will
fill cell C4 with the formula for computing
the return between times 2 and 3.

3. Highlight cells C3:C4, right-click, select Format
Cells | Number | Percentage | Decimal
Points 2 to format the returns as
percentages.

4. Click on cell D3, enter =1+C3. Then
right-click, select Format Cells | Number |
Number | Decimal Points 2 to format the
contents of the cell as a number.

5. Click on cell D4, enter =D3*(1+C4).

6. Type Total return in cell C5.

7. Enter =D4-1 in cell D5 to compute the total
return over the two periods.

8. Highlight cells C5:D5. Right-click, then select
Format Cells | Border. Select the double-line,
then click the upper line of the cell in the Border
window to make the double-line appear.
Click OK.

9. Click on the stop button in the macro
recorder to stop recording.

Figure 2 Macro Recording Example

Now let us see what the macro does. You can
use the file you created. Delete all contents from
the array C3:D5. Press Alt-F8 or, equivalently,
click on the Macro button in the Developer tab.
Select ReturnCalc, press OK. The spreadsheet
should fill up with the entries that you entered
before. If you had changed the value of the stock
price in any of the three cells in column B, the
macro should calculate the correct corresponding
value for total return in cell D5.

Behind the scenes, Excel recorded VBA code
with instructions that tell Excel what functions
to perform when you run the macro. You can
see these instructions by opening the VBA editor
and clicking on Modules | Module1. The
instructions look like this:

1 Sub ReturnCalc()
2 '
3 ' ReturnCalc Macro
4 ' Macro recorded month/day/year by you
5 '
8 Range("C3").Select
9 ActiveCell.FormulaR1C1 = "=(RC[-1]-R[-1]C[-1])/R[-1]C[-1]"
10 Range("C3").Select
11 Selection.Copy
12 Range("C4").Select
13 ActiveSheet.Paste
14 Range("C3:C4").Select
15 Selection.NumberFormat = "0.00%"
16 Range("D3").Select
17 ActiveCell.FormulaR1C1 = "=1+RC[-1]"
18 Range("D3").Select
19 Selection.NumberFormat = "0.00"
20 Range("D4").Select
21 ActiveCell.FormulaR1C1 = "=R[-1]C*(1+RC[-1])"
22 Range("C5").Select
23 ActiveCell.FormulaR1C1 = "Total return"
24 Range("D5").Select
25 ActiveCell.FormulaR1C1 = "=R[-1]C-1"
26 Range("D5").Select
27 Selection.Style = "Percent"
28 Selection.NumberFormat = "0.00%"
29 Range("C5:D5").Select
30 Selection.Borders(xlDiagonalDown).LineStyle = xlNone
31 Selection.Borders(xlDiagonalUp).LineStyle = xlNone
32 Selection.Borders(xlEdgeLeft).LineStyle = xlNone
33 With Selection.Borders(xlEdgeTop)
34     .LineStyle = xlDouble
35     .Weight = xlThick
36     .ColorIndex = xlAutomatic
37 End With
38 Selection.Borders(xlEdgeBottom).LineStyle = xlNone
39 Selection.Borders(xlEdgeRight).LineStyle = xlNone
40 Selection.Borders(xlInsideVertical).LineStyle = xlNone
41 Range("D5").Select
42 End Sub


Knowing the actions we took to create the
macro, it is relatively straightforward to trace
what the program is doing at every step. To understand
better how the macro works, however,
and to know how to create such scripts without
recording them in the spreadsheet, we need to
understand some basic facts about VBA.


OBJECTS, PROPERTIES,
AND METHODS

The most important fact about VBA is that
it tries to act as an object-oriented language.
(VBA does not quite qualify as an object-oriented
language for technical reasons; however,
for all practical purposes it is helpful to
remember that VBA shares many of the same
concepts as "real" object-oriented programming
languages.) This means that it treats every
component of Excel, such as a worksheet, a
cell, a range of cells, and a chart, as an object.
Objects are arranged in a hierarchy and have
properties (attributes) that can be modified by
entering the name of the object followed by a dot
and a specific command. In addition, objects are
associated with actions (methods) that the objects
can perform or have applied to them. You
can view all objects by selecting View | Object
Browser from the top menu in the VBE window.
In Excel 2007, you can also view a detailed list
of objects, their properties, and their methods
by clicking on Help (pressing F1) and selecting
Excel Object Model Reference.

The largest object, the object on top of the hierarchy,
is Excel itself. It is the Application object.
Worksheets, ranges, selections, charts, and
so on are all objects that are lower in the hierarchy.
Objects in the same class are organized
in collections. For instance, the Workbooks collection
contains all workbooks (Excel files) that
are currently open. Similarly, the Worksheets
collection contains all Excel spreadsheets in a
workbook (the active workbook, if none is specified),
the Sheets collection contains all Excel spreadsheets
and charts in that workbook, and so
on. Thus, for example, to reference cell C3 in
Worksheet Return in file (Workbook) ReturnCalc.xlsm,
you would type

Application.Workbooks("ReturnCalc.xlsm").Sheets("Return").Range("C3")

This reference is rather long and, as we can
see from the actual VBA code, it is not necessary,
as long as the macro is saved within the
active Excel workbook and the identification of
the cell range that is referenced is unique. In
our example, Range("C3") is sufficient to reference
cell C3, because the objects higher in the
hierarchy, such as the name of the worksheet
and the name of the file, are implied in the
reference.
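
The difference between the long and the short form of a reference can be tried out with a small sketch like the one below. It is only an illustration: the subroutine name QualifiedReferenceSketch is hypothetical, and the code assumes that ReturnCalc.xlsm is open and contains a worksheet named Return, as in the example above.

```vba
Sub QualifiedReferenceSketch()
    ' Assumption: ReturnCalc.xlsm is open and has a worksheet named "Return"
    Dim ws As Worksheet
    Set ws = Application.Workbooks("ReturnCalc.xlsm").Worksheets("Return")

    ' Fully qualified: always reads C3 on the Return sheet
    Debug.Print ws.Range("C3").Value

    ' Unqualified: reads C3 on whichever sheet happens to be active
    Debug.Print Range("C3").Value
End Sub
```

Debug.Print writes to the Immediate window of the VBE (View | Immediate Window). The two printed values coincide only when Return is the active sheet, which is why the short form is safe in the recorded macro but can surprise you in longer programs.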

An example of an action (method) that can
be performed on an object is the command Select.
The Select method applies to several
objects, including Worksheet, Chart, and
Range. Notice that it was used often in the
macro we created, because clicking on a cell or
highlighting an array performed the action.
For example, in line 14 we selected the range
C3:C4. Similarly, in line 10 we selected the cell
C3 with the command

Range("C3").Select
Then, the Selection property of an object in
the background (the Window object) was used
to return a Range object (representing the selected
range on the spreadsheet) on which we
can apply other methods, such as Copy (line 11
of the code):

Selection.Copy

VBA usually suggests actions and properties 
that can be used with an object, so you can select 
from a list. 

Another example of modifying the properties
of an object is in lines 14-15 of the VBA
code. They request that the format of the cell
range C3:C4 be changed to percentage with two
digits after the decimal point. Namely, we selected
the range C3:C4, and the NumberFormat
property of the Range object that was
returned by the Selection property was set
to percentage with two digits after the decimal
point.

While the code we created by recording a
macro is helpful in understanding the basics
of the VBA language, it can be confusing because
it is unnecessarily verbose. For example,
the same result as lines 14-15,

Range("C3:C4").Select
Selection.NumberFormat = "0.00%"

can be achieved with the command

Range("C3:C4").NumberFormat = "0.00%"

which directly modifies the property NumberFormat
of the object Range("C3:C4").

You can test that this is the case by deleting
lines 14-15 in the VBA code in your file
and replacing them with
Range("C3:C4").NumberFormat = "0.00%". Save the code
by pressing Ctrl-S or selecting Save from the list
under the main Excel button. Next, delete the contents
of cells C3:D5 in the spreadsheet, and run the
ReturnCalc macro again. The result and the formatting
should be the same.

The With/End structure in lines
33-37 is another piece of code whose effect can be
replicated easily through other commands; the advantage
of the structure is that it allows you to
reduce the number of listed objects in the code,
and that it makes the code more readable. A
With/End statement requires the specification
of an object. Inside the With/End statement,
one can omit mentioning the object with every
modification of a property or application of a
method to the object. In this particular example,
lines 33-37 could be replaced with

Range("C5:D5").Borders(xlEdgeTop).LineStyle = xlDouble
Range("C5:D5").Borders(xlEdgeTop).Weight = xlThick
Range("C5:D5").Borders(xlEdgeTop).ColorIndex = xlAutomatic

with the same effect as the With/End statement
that references Range("C5:D5").Borders(xlEdgeTop).
However, the With/End
statement is more concise.

In general, when writing VBA code you do
not need to select cells explicitly in order to enter
data into them. However, if you are new to
VBA, it is helpful to record the macro first to
see the code VBA suggests, and clean up afterward.
In addition, it is a good idea to "comment
out" the redundant statements at first, rather
than deleting them. (Commenting out is done
by entering an apostrophe (') at the front of the
line of code that you wish VBA to ignore.) After
commenting out overly verbose statements,
save the macro by pressing Ctrl-S, make sure it
still does what you would like it to do, and
only then go back and delete the redundant
statements.

A less verbose version of the VBA code is

Sub ReturnCalc()
' ReturnCalc Macro
' Less verbose
' Formulas are in R1C1 notation, so they are assigned to FormulaR1C1
Range("C3").FormulaR1C1 = "=(RC[-1]-R[-1]C[-1])/R[-1]C[-1]"
Range("C3").Copy
Range("C4").Select
ActiveSheet.Paste
Range("C3:C4").NumberFormat = "0.00%"
Range("D3").FormulaR1C1 = "=1+RC[-1]"
Range("D3").NumberFormat = "0.00"
Range("D4").FormulaR1C1 = "=R[-1]C*(1+RC[-1])"
Range("C5").FormulaR1C1 = "Total return"
Range("D5").FormulaR1C1 = "=R[-1]C-1"
Range("D5").Style = "Percent"
Range("D5").NumberFormat = "0.00%"
With Range("C5:D5")
    .Borders(xlDiagonalDown).LineStyle = xlNone
    .Borders(xlDiagonalUp).LineStyle = xlNone
    .Borders(xlEdgeLeft).LineStyle = xlNone
    With .Borders(xlEdgeTop)
        .LineStyle = xlDouble
        .Weight = xlThick
        .ColorIndex = xlAutomatic
    End With
    .Borders(xlEdgeBottom).LineStyle = xlNone
    .Borders(xlEdgeRight).LineStyle = xlNone
    .Borders(xlInsideVertical).LineStyle = xlNone
End With
Range("D5").Select
End Sub

Notice how the With/End structure was used
to reduce the number of words we need to use,
and how With/End structures can be nested
inside one another. You can test that this code
achieves the same effect by replacing the current
code in the module in your file ReturnCalc.xlsm,
saving the new code, and rerunning
the macro ReturnCalc.

Before we end this section, we would like to
mention a useful property of the Range object,
Offset(v, h). It points to a cell that is v cells
above or below (vertical direction) and h cells
to the right or left (horizontal direction) from a
specific cell. For example,

Range("C5").Select 
ActiveCell.Offset(1,2) = 10 

sets the value of the cell that is 1 cell down and 
2 cells to the right from cell C5 (i.e., cell E6) to 
10. Similarly, 

Range("C5").Select 
ActiveCell.Offset(-1,-2) = 20 

sets the value of the cell that is 1 cell up and 2 
cells to the left from cell C5 (i.e., cell A4) to 20. 
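
Consistent with the earlier remark that cells need not be selected before they are modified, the two examples above can also be written without Select. The following is a minimal sketch; the subroutine name OffsetSketch is hypothetical.

```vba
Sub OffsetSketch()
    ' 1 row down and 2 columns to the right of C5: writes 10 to cell E6
    Range("C5").Offset(1, 2).Value = 10

    ' 1 row up and 2 columns to the left of C5: writes 20 to cell A4
    Range("C5").Offset(-1, -2).Value = 20
End Sub
```

Here Offset is applied directly to the Range object, so the active cell in the spreadsheet does not change when the subroutine runs.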

We saw the idea of referencing cells above and
below and to the left and right of the current
cells in the example code at the beginning of
this section. For example, the formula in line 9
of the original macro,

ActiveCell.FormulaR1C1 = "=(RC[-1]-R[-1]C[-1])/R[-1]C[-1]"

uses the cell in the same row and one column
to the left (RC[-1]) and the cell one row up
and one column to the left (R[-1]C[-1]) to
compute the value in the active cell. These kinds
of commands help when one prefers to create
relative references; in other words, to perform
tasks relative to a prespecified location in the
spreadsheet without changing the code when
the starting location is changed.

The default in VBA is to record macros in
absolute reference mode. To change the mode
to relative references, make sure that the relative
references button in the Developer tab
(Use Relative References) is "pressed in" before
starting the macro recorder.

PROGRAMMING TIPS 

While some desired formatting of an Excel
spreadsheet can be recorded with the macro
recorder, knowing basic programming in VBA
opens up a whole lot of additional functionality.
For example, suppose that you have a set of
data on stock returns over several months and,
as often happens with real-world data, it is not
recorded well: there are some duplicate rows.
You could record a macro as you go through
the spreadsheet and clean them by hand, but
next time you have a set of data, duplicate entries
will not be exactly in the same rows as the
first set of data. How can you tell Excel to sort
through the data and remove duplicate rows in
any set of data? You need to construct a program
from scratch and make the code general
enough to enable the script to be transferable.

In the remainder of this section, we cover
some basic VBA programming concepts. We
discuss the difference between subroutines and
user-defined functions, explain variable declaration
in VBA, and introduce some important
control flow statements. These concepts are not
unique to VBA; versions of them exist in most
programming languages.

Subroutines versus User-Defined 
Functions 

Subroutines and user-defined functions in VBA
are both blocks of code saved in modules.
(If you do not see a module when you open
VBE, select Insert | Module from the top menu
in VBE to create one.) The difference is that subroutines
are general scripts; that is, lists of instructions,
whereas functions complete a list of
instructions and return a value to the user. Subroutines
have the general form

Sub SubName()
[commands]
End Sub

whereas functions have the form

Function FunctionName(list of inputs) As type
[commands]
FunctionName = Return value 'Computed from [commands]
End Function

The macro recorded at the beginning of this
entry was an example of subroutine code. Next,
we provide another small example in order to
illustrate the difference between a subroutine
and a function. Do not worry about the details
of the commands right now; we will explain
each part of the code in subsequent sections.

Suppose we would like to calculate n! (pronounced
"n factorial"), where n is an integer
number the user provides as input. n! is the
product of all positive integers less than or equal
to n; that is, n! = 1·2·...·n. Next, we provide several
examples of subroutines and user-defined
functions that accomplish this goal. The subroutine

Sub FactorialSub1()
'Compute factorial using control flow statements

'Declare the variable that will
'store the value for factorial
Dim Factorial As Integer
'Declare the variable that will
'store the number n
Dim inNumber As Integer
'Declare the variable that will be
'used as counter in the loop
Dim i As Integer

'Read in the number from cell B1,
'store it in inNumber
inNumber = Range("B1").Value

'Calculate factorial
Factorial = 1
For i = 1 To inNumber
    Factorial = i * Factorial
Next i

Range("B2").Value = Factorial
End Sub

takes the number specified in cell B1, computes
the factorial of that number, and sets the value
of the cell B2 to the value of that factorial. To
see how this subroutine works, copy the code
in a new module in the VBE window of a
new Excel file. Enter the number 5 in cell B1.
Press Alt-F8, and select FactorialSub1. The
subroutine fills cell B2 with 120 (5! = 1·2·3·4·5 = 120).

The function FactorialFun1, whose code is
provided next, computes the same result, but
works in a different way. It takes a number as
an input (inNumber), and returns a number
as an output (FactorialFun1). The output to
be returned should have the same name as the
function.

Function FactorialFun1(inNumber) As Integer
Dim i As Integer
'Calculate factorial
FactorialFun1 = 1
For i = 1 To inNumber
    FactorialFun1 = i * FactorialFun1
Next i
End Function

Add this function to the module in the VBE
in your file. To call this function, type in a
cell in your spreadsheet (say, cell B3)
=FactorialFun1(B1). If the value in cell B1 was
still 5 (the value you entered in the previous
example), then the value of cell B3 will
be 120. Notice that the syntax for calling your
(user-defined) function is not different from the
syntax for calling built-in Excel functions. In
fact, Excel has a function for computing a factorial,
=Fact(number), and if you entered
the expression =Fact(B1) in, say, cell B4 of
your spreadsheet, you would get the same result
(120).

Excel built-in functions can also be used inside
VBA code with the prefix Application.
It is worthwhile to note, though, that VBA itself
has some built-in numeric functions. In particular,
functions such as Abs (absolute value), Exp
(exponential), Int (integer part), Cos (cosine),
Sin (sine), Log (natural log), Rnd (random
number generator), Sgn (sign function), Tan
(tangent), and Sqr (square root) can be used directly
within VBA code without the prefix Application.
Although it seems that this should
make things easier, it may also be a source of
confusion. Notice that Excel has equivalent numerical
functions for formulas that are entered
into cells in spreadsheets, but the names of
some of the functions are different. For example,
the natural logarithm function in Excel is
Ln, and the square root function is Sqrt. So,
Sqr in your VBA program plays the role that Sqrt
plays in a worksheet formula, and within VBA
code you would use the shorter name
Sqr. It is important to be aware of inconsistencies
between names of equivalent functions in
Excel and VBA.
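
The naming differences can be seen side by side in a short sketch such as the following. The subroutine name FunctionNamesSketch is hypothetical, and the worksheet functions are called through Application.WorksheetFunction, a more explicit spelling of the Application prefix used in this entry.

```vba
Sub FunctionNamesSketch()
    Dim x As Double
    x = 2

    ' VBA's own numeric functions: no prefix needed
    Debug.Print Sqr(x)    ' square root
    Debug.Print Log(x)    ' natural logarithm

    ' Excel worksheet functions called from VBA
    Debug.Print Application.WorksheetFunction.Ln(x)      ' natural logarithm, same value as Log(x)
    Debug.Print Application.WorksheetFunction.Log10(x)   ' base-10 logarithm
End Sub
```

Note in particular that Log in VBA is the natural logarithm, whereas the worksheet function with the same name defaults to base 10.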

The subroutine FactorialSub2() and the
function FactorialFun2(), whose code is
provided below, illustrate how the factorial can
be computed by calling the built-in Excel function
Fact.

Sub FactorialSub2()
'Compute factorial using Excel's FACT
'function within a subroutine
Range("B5") = Application.Fact(Range("B1"))
End Sub

Function FactorialFun2(inNumber) As Integer
'Calculate factorial
FactorialFun2 = Application.Fact(inNumber)
End Function

Copy the code above in the module in your
file. The subroutine FactorialSub2() uses
the number entered in cell B1 in the spreadsheet
as an input and calls the Excel function
Fact() to compute the factorial of the value
in cell B1. The function FactorialFun2() is
called with an input argument that is a number
and returns the factorial of that number. If you
type =FactorialFun2(B1) in cell B6 and
the value in cell B1 is still 5, you should obtain
120 in cell B6.

What is the advantage of using user-defined
functions rather than subroutines? In some
cases, you can only use one or the other. However,
in cases in which both are possible, it
may be preferable to structure the script as
a function as opposed to a subroutine. User-defined
functions are more "transferable"; in
other words, it is easier to use them in different
places in the spreadsheet. There are some other
conveniences. For example, check what happens
when the number for n in cell B1 is
changed from 5 to 6. Cell B3 (which contains the
call to the function FactorialFun1) immediately
updates to 720, which is the correct result.
However, cells B2 and B5 (the output
ranges for the subroutines FactorialSub1
and FactorialSub2) do not update until
you rerun the macros associated with them.

Variable Declaration 

Variables are a basic common concept in
computer languages. They are used to store
numerical and text data and handle intermediate
output in subroutines and functions.
For example, inNumber in the code for
FactorialSub1 was a variable that stored the
value of n for which the factorial should be
computed. There is no convention for naming
variables, but a good practice is to give them
meaningful names (rather than x, y, and z), so
that your code is easier to follow. We prefer to
start names of variables with small letters. If
there is a second word in the name, that word
starts with a capital letter. We also like to differentiate
variables that store inputs (such as
inNumber) and variables that record output
(e.g., outFactorialValue).

Depending on their type, variables are handled
differently and are allocated a different
amount of memory. For example, we specified
that inNumber should be an integer number by
declaring it with the syntax Dim variableName
As variableType:

Dim inNumber As Integer

Other types of variables include String,
Single, Double, Long, Boolean,
Date, Object, Variant, and so on. For
example, when you need a variable that will
hold a fractional (also called "floating point")
value, then you should use the Single or
Double data type. When you need a variable
to store text data, use the String type. The
Variant type can be used to replace any type;
however, it also uses up the largest amount of
space, so it is better to specify a particular type
for a variable if you know it.

When specifying a variable type, make sure
that you have enough space for the data you
are planning to store in that variable. If the
value gets too large for the variable type, your
program may crash. For example, the Integer
type can store values between -32,768 and
32,767. If you need to store an integer number
outside this range, use the Long variable
type. Similarly, the Single (floating point) type
can store numbers between -3.402823E38 and
-1.401298E-45 for negative values, and numbers
between 1.401298E-45 and 3.402823E38 for
positive values. If you need to work with fractional
numbers outside this range, use the Double
(floating point) variable type.
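
The factorial example shows how quickly these limits are reached. With the result declared As Integer, as in FactorialSub1, the computation overflows at n = 8, because 8! = 40,320 exceeds 32,767. A Long result holds up to 12! = 479,001,600 (the limit of Long is 2,147,483,647, and 13! = 6,227,020,800 is larger). The sketch below is one possible variation under that observation, not the version used in the rest of this entry; the name FactorialDouble is hypothetical.

```vba
Function FactorialDouble(inNumber As Integer) As Double
    ' Same loop as FactorialFun1, but the result is stored as a Double,
    ' so it does not overflow at n = 8 the way an Integer result does
    Dim i As Integer
    FactorialDouble = 1
    For i = 1 To inNumber
        FactorialDouble = i * FactorialDouble
    Next i
End Function
```

Typing =FactorialDouble(8) in a cell should return 40320. The price of using Double is that very large factorials are stored only approximately.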

Variables can be grouped into arrays. For example,

Dim myArray(5) As Integer

declares an array of integers of size 6.

One of the most confusing things about VBA
is the way it handles arrays. The default is to index
the first element in arrays as 0, which is the
convention in most programming languages and
the reason why the total number of elements in
myArray is 6. However, in some special circumstances
arrays are treated as starting with
the index 1. To ensure consistency and minimize
confusion, it is helpful to use the command

Option Base 1

at the beginning of the module, which makes
sure that the indexing of arrays always starts at
1. If this option is stated, then declaring

Dim myArray(5) As Integer

will result in an array of 5 elements. Those elements
can be referenced as myArray(1), ...,
myArray(5) later in the program.
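
One way to write loops that do not depend on the indexing convention is to ask VBA for the bounds of the array with the built-in functions LBound and UBound. The following is a minimal sketch; the subroutine name ArrayBoundsSketch is hypothetical.

```vba
Sub ArrayBoundsSketch()
    Dim myArray(5) As Integer
    Dim i As Integer

    ' LBound and UBound return the first and last valid index,
    ' so the loop is correct with or without Option Base 1
    For i = LBound(myArray) To UBound(myArray)
        myArray(i) = i * i
    Next i

    ' Number of elements: 6 by default, 5 under Option Base 1
    Debug.Print UBound(myArray) - LBound(myArray) + 1
End Sub
```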
You can specify arrays of multiple dimensions
as well. For example, with Option Base 1 in effect,

Dim myMultiArray(5,2) As Integer

will result in an array of 5 rows and 2 columns.

You can also declare dynamic arrays, that is,
arrays that do not have specific dimensions
from the beginning. This may happen if, for example,
you have a set of data and you need to
read it in before you know how many elements
it has. In that case, you would declare an array

Dim myDynamicArray() As Integer

which will be filled as necessary. Once the
number of elements is counted, the array can
be resized by using the command ReDim, for
example,

ReDim myDynamicArray(10)

ReDim reinitializes (sets to empty) all values
within an array. If you want to preserve the
values that are already there, use ReDim Preserve,
which preserves as many elements as
can fit in the new array dimensions.
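
The difference between ReDim and ReDim Preserve can be checked with a short sketch like the one below. The subroutine name DynamicArraySketch is hypothetical, and the bounds are written explicitly as 1 To n so that the behavior does not depend on Option Base.

```vba
Sub DynamicArraySketch()
    Dim myDynamicArray() As Integer

    ReDim myDynamicArray(1 To 3)
    myDynamicArray(1) = 10
    myDynamicArray(2) = 20
    myDynamicArray(3) = 30

    ' Grow to 5 elements and keep the existing values
    ReDim Preserve myDynamicArray(1 To 5)
    Debug.Print myDynamicArray(2)   ' still 20
    Debug.Print myDynamicArray(5)   ' new element, initialized to 0

    ' Plain ReDim discards the contents
    ReDim myDynamicArray(1 To 5)
    Debug.Print myDynamicArray(2)   ' now 0
End Sub
```

For arrays with more than one dimension, ReDim Preserve can change only the size of the last dimension.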

Working with arrays within VBA is cumbersome
and prone to errors. Often, one needs to
resort to loops (see the introduction to loops in
the next section) to handle array operations. In
many cases, it may be preferable to use built-in
Excel array manipulation functions, such as
SUMPRODUCT, which performs vector multiplication.
As we mentioned earlier, such built-in
Excel functions can be called with
Application.FunctionName. For example, Array3
= Application.SUMPRODUCT(Array1,
Array2) will assign to the variable Array3
the sum of the elementwise products
of the arrays Array1
and Array2. (The result is a single number.)
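
As an illustration in a financial setting, the return of a portfolio is the sum of the products of the asset weights and the asset returns, which is exactly what SUMPRODUCT computes. The sketch below assumes a hypothetical layout (weights in cells A1:A3, returns in cells B1:B3, result written to D1); the subroutine name SumProductSketch is also hypothetical.

```vba
Sub SumProductSketch()
    ' Assumed layout: portfolio weights in A1:A3, asset returns in B1:B3
    Dim portfolioReturn As Double

    portfolioReturn = Application.WorksheetFunction.SumProduct( _
        Range("A1:A3"), Range("B1:B3"))

    ' Write the single number returned by SumProduct to cell D1
    Range("D1").Value = portfolioReturn
End Sub
```

The same call replaces a For loop that would otherwise multiply the entries pair by pair and accumulate the total.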

VBA will assume that you are creating a new
variable whenever you use an expression that
is not one of the standard commands. Stating
the type of variables you use in the program
can save you a lot of headache. (Typically, variable
declaration is done at the beginning of the
program.)


We also strongly recommend that you write 
the statement Option Explicit in the first 
line of your modules. This statement makes 
sure that Excel will report an error if it encoun¬ 
ters an undeclared variable in your code. (This 
also can be accomplished by checking Require 
Variable Declaration under Tools | Options 
in the top VBE menu.) While this may seem 
like an inconvenience, think about a situation 
in which you mistype the name of a variable 
somewhere in your program. If Excel is not in 
the Option Explicit mode, it will treat the 
mistyped name as a new variable, ignoring any 
value that your variable may have had at that 
point in the program, and you will get nonsen¬ 
sical output. If Excel reports an error instead, 
you will know to fix the typo. 
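
The pitfall can be reproduced with a short sketch. The subroutine and variable names are hypothetical, and the module is deliberately written without Option Explicit so that the mistyped name goes unnoticed.

```vba
'Deliberately placed in a module WITHOUT Option Explicit
Sub TypoDemo()
    Dim totalReturn As Double
    totalReturn = 0.08

    '"totalRetrun" is a typo. VBA silently creates a new, empty
    'variable with that name, so the sum below uses 0, not 0.08.
    Debug.Print totalRetrun + 0.02      'gives 0.02 rather than 0.10
End Sub
```

With Option Explicit in the first line of the module, the same code does not run: VBA stops with a "Variable not defined" error that points at the mistyped name.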

Control Flow Statements: For and If 

Control flow statements in VBA allow for 
building more sophisticated programs than simple 
input and output of data to Excel. We briefly 
review a couple of important control statements 
that are used in VBA code: an example of an 
iterative statement (the For loop) and an 
example of checking a condition (the If statement). 

The general syntax of a For loop in VBA is as 
follows: 

For i = 1 To n 
    commands 
Next i 

The commands inside the For loop are executed 
once for every value of the counter i from 1 to n. 
(One can also specify a step by writing 
For i = 1 To n Step k. For example, if n = 10 and 
k = 2, then the commands in the loop will be 
executed for i = 1, 3, 5, 7, 9.) 
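
A minimal sketch of a loop with a step, assuming n = 10 and k = 2 as in the text (the subroutine and variable names are hypothetical):

```vba
Sub StepLoopDemo()
    Dim i As Integer
    Dim sumOdd As Integer

    sumOdd = 0
    For i = 1 To 10 Step 2
        sumOdd = sumOdd + i     'i takes the values 1, 3, 5, 7, 9
    Next i

    Debug.Print sumOdd          '1 + 3 + 5 + 7 + 9 = 25
End Sub
```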

We saw an example of a For loop in the code 
for calculating the factorial of a number n. Let 
us walk through the For loop code inside 
FactorialSub1. 

'Calculate factorial 
Factorial = 1 
For i = 1 To inNumber 
    Factorial = i * Factorial 
Next i 

The initial value of Factorial is set to 1. 
Suppose the value for inNumber is 5. The loop 
starts at i = 1. During the first iteration, the new 
value of Factorial equals the current value of 
i (which is 1) times the current value of 
Factorial (which is 1 as well). At the end of the first 
run through the loop, the value of Factorial 
is 1. Next, the value of i is set to 2. The new 
value of Factorial equals the current value 
of i (which is 2) times the current value of 
Factorial (which is 1); that is, it equals 2. At the 
third iteration, the value of i is 3 and the 
current value of Factorial is 2; that is, the new 
value of Factorial is 3 * 2 = 6. And so on and 
so forth for the next values of i, which are 4 
and 5. The value of Factorial keeps getting 
updated until it reaches 120 (= 5!) in the last 
iteration of the loop. 

There are other statements that enable 
iterating through commands multiple times, such as 
Do While and Do Until. See VBE's Help 
for a description of the syntax and use of these 
alternatives. 
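
For comparison with the For loop above, here is a sketch of the same factorial calculation written with Do While. The function name FactorialDoWhile is hypothetical; the result is declared as Double because factorials quickly exceed the range of the Integer type.

```vba
Function FactorialDoWhile(inNumber As Integer) As Double
    Dim i As Integer
    Dim result As Double

    result = 1
    i = 1
    'Repeat as long as the condition holds; the counter
    'must be advanced by hand inside the loop
    Do While i <= inNumber
        result = result * i
        i = i + 1
    Loop

    FactorialDoWhile = result
End Function
```

Unlike the For loop, Do While does not update the counter automatically, so forgetting the line i = i + 1 would produce an endless loop.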

The general form of the If statement is 

If condition Then 
    commands 
End If 

When the condition is true, the block of 
commands executes. More generally, you can use a 
statement of the kind 

If condition1 Then 
    commands1 
ElseIf condition2 Then 
    commands2 
Else 
    commands3 
End If 

commands1 will be executed if condition1 
is true. If condition1 is not true, then (and 
only then) condition2 will be checked. If 
condition2 is true, then commands2 will be 
executed. If condition2 is not true either, then 
commands3 will be executed. 
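
The three branches can be seen in a short sketch. The function name SignLabel and the labels it returns are hypothetical.

```vba
Function SignLabel(x As Double) As String
    If x > 0 Then
        SignLabel = "gain"      'condition1 is true
    ElseIf x < 0 Then
        SignLabel = "loss"      'checked only if condition1 is false
    Else
        SignLabel = "flat"      'reached only if both conditions are false
    End If
End Function
```

Entered in a worksheet cell, =SignLabel(-0.03) would return the text loss.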

When using If statements, one typically 
needs to compare values of variables and check 
whether conditions are true. Therefore, it is 
useful to know about the logical operators that 
allow for such comparisons and checks. The 
comparison operators are the following: 

=            tests for equality 
<>           tests for inequality 
<            tests whether the variable to the left of it is 
             less than the variable on the right 
>            tests whether the variable to the left of it is 
             greater than the variable on the right 
<= and >=    test for less than or equal to/greater than 
             or equal to 

Additional useful operators are And, Or, 
and Not. And allows checking whether more 
than one statement is true at the same time. Or 
returns a True result if at least one of the 
statements is true. Not returns a True result if the 
statement is false. 

To illustrate how we can use these operators, 
consider a couple of simple examples that 
involve three numerical variables, var1, var2, 
and var3. Let var1 = 5, var2 = 10. 

The code 

If (var1 <> var2) Then 
    var3 = 100 
Else 
    var3 = -100 
End If 

checks whether the value for var1 is different 
from the value of var2. If it is (i.e., the value 
of the logical statement (var1 <> var2) is 
True), then the value of var3 is set to 100; 
otherwise the value of var3 is set to -100. In 
this example, the value of var3 at the end 
of the If block is 100, since the value for var1 
(5) is indeed different from the value of var2 
(10). 

Consider also the example 

If (var1 < 5) Or (var2 >= 7) Then 
    var3 = 100 
Else 
    var3 = -100 
End If 

The code checks if at least one of the 
statements (var1 < 5) and (var2 >= 7) is 
true. If at least one of them is true, then the 
value of var3 is set to 100; otherwise the value 
of var3 is set to -100. In our case, the first 
statement is false, because the value of var1 is not 
less than 5 (it is equal to 5). However, the 
second statement is true: The value of var2 (10) is 
indeed greater than or equal to 7. Therefore, 
the combined statement (var1 < 5) Or 
(var2 >= 7) is true, and the value of var3 
will be set to 100. 
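
The same two conditions can be combined with the other logical operators as well. The sketch below, with a hypothetical subroutine name, prints the value of each combined statement for var1 = 5 and var2 = 10.

```vba
Sub LogicalOperatorsDemo()
    Dim var1 As Integer, var2 As Integer
    var1 = 5
    var2 = 10

    Debug.Print (var1 < 5) Or (var2 >= 7)     'True: the second statement holds
    Debug.Print (var1 < 5) And (var2 >= 7)    'False: the first statement fails
    Debug.Print Not (var1 < 5)                'True: var1 is not less than 5
End Sub
```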

User Interaction in VBA 

While we covered the most fundamental 
concepts about the VBA language, it is fun to 
learn about some additional capabilities that 
enable your programs to interact better with 
their users. For example, once you have created 
a macro, you can associate it with a button that 
the user can press every time he or she wants the 
macro to run. To do that, go to the Developer 
tab, select Insert | Form Controls, and click on 
the button. When Excel pops up the Macro 
dialog box, click on the macro you would like 
to have associated with this button. 

Sometimes, it is convenient to ask the user 
to input information through an input dialog 
box. This can be done with the command 
InputBox("question for user", "title of the input box"). 
For example, 

inNumber = InputBox("Enter an integer", "Factorial Calculation") 

will prompt the user to enter an integer 
number and will save that number into the variable 
inNumber. The title of the input box will be 
Factorial Calculation. 

Other useful user interaction tools include 
the message box (MsgBox), which allows you to 
report output not in a cell on the spreadsheet, 
but in a message box. To test how it works, let 
us go through the following modification of the 
factorial calculation program (save it in your 
file as subroutine FactorialSubMsgBox()). The 
line numbers are shown only for reference and are 
not part of the code. 

1  Sub FactorialSubMsgBox() 
2      Dim inNumber As Variant 
3      Dim numberType As Boolean 
4      Dim outFactorial As Double 'Double: factorials quickly exceed the Integer range 
5 
6      inNumber = InputBox("Enter an integer number", "Factorial Calculation") 
7 
8      numberType = IsNumeric(inNumber) 
9 
10     If numberType = True Then 
11         outFactorial = Application.Fact(inNumber) 
12         MsgBox ("The factorial of " & inNumber & " equals " & outFactorial) 
13     ElseIf numberType = False Then 
14         MsgBox ("Not a number. Please enter an integer number.") 
15     End If 
16 End Sub 

On line 6, we ask the user to specify the 
number for which we want to compute the factorial. 
On line 8, we check whether this is indeed a 
number. Note that the variable numberType is 
specified as Boolean, which means that it can 
only take True or False values. If it is true, 
that is, if inNumber is indeed a number, then 
we call the Excel built-in function Fact to 
calculate the factorial of this number, and print the 
statement "The factorial of [the number the user 
entered] equals [the result obtained]" in a message box 
on the screen. If it is not true, then we prompt 
the user to enter a number. 

Note that in line 2, we specified the type of 
variable for inNumber as Variant, which allows 
it to be anything. If we had declared inNumber 
As Integer and had entered a letter instead 
of a number, Excel itself would have returned 
an error, complaining that there is a variable 
type mismatch between what was declared and 
what the actual value of the variable is. Thus, 
declaring the exact type of variable whenever 
we know the type is very important for 
minimizing errors in output. 

DEBUGGING 

VBA has useful debugging tools that allow you to 
look at the code in more detail if your programs 
do not work as expected. These tools can be 
accessed through commands under the Debug 
item in the top menu of the VBE. 

The "Step Into" button (shortcut F8) lets 
you execute your program step by step. When 
you are executing a program step by step, your 
program is in "break mode." Every time you 
press F8, the "break" is moved to the next 
command. While the break is set on a particular 
command, placing the cursor over any variable 
above the break point will give you the value 
currently stored in that variable. This makes it 
easy to catch calculation errors and 
inconsistencies. You can "step over" subroutines that you 
are not interested in double-checking, that is, 
run them in one go rather than line by line (use 
the "Step Over" button or the shortcut Shift-F8), 
and "step out" of the procedure you are currently 
stepping through (use the "Step Out" button or the 
shortcut Ctrl-Shift-F8). To get out of break mode 
altogether, click on the Reset button in the top 
VBE menu. 

Rather than going through the program step 
by step, it is sometimes helpful in long 
programs to set breakpoints in advance, so that the 
program runs until it gets to a particular 
breakpoint. A breakpoint can be specified by placing 
the cursor at the place where it should be 
inserted, and clicking on the Toggle Breakpoint 
command in the Debug menu (or using the shortcut 
F9). When the program gets to the breakpoint, it 
automatically goes into break mode and allows you 
to follow the subsequent commands step by 
step and check the values of the variables at that 
point in the program. To remove a breakpoint, 
simply place the cursor in the corresponding 
line, and click on the breakpoint button again. 
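
The menu tools can be complemented from within the code itself. The sketch below uses two standard VBA statements that the text above does not discuss: Debug.Print, which writes values to the VBE Immediate window, and Debug.Assert, which puts the program into break mode if a condition is false. The subroutine name is hypothetical.

```vba
Sub FactorialWithTrace()
    Dim i As Integer
    Dim Factorial As Double

    Factorial = 1
    For i = 1 To 5
        Factorial = i * Factorial
        'Trace the loop in the Immediate window (View | Immediate Window)
        Debug.Print "i ="; i; " Factorial ="; Factorial
    Next i

    'Execution pauses on this line only if the check fails
    Debug.Assert Factorial = 120
End Sub
```

Tracing of this kind is useful in long loops, where stepping through every iteration with F8 would be tedious.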

EXAMPLES 

The best way to learn to program in VBA is to 
see and implement many examples. Let us 
discuss three examples of using VBA in financial 
applications. The first example is a function that 
computes the Black-Scholes price of a European 
call option. It shows how a function is created, 
how variables are declared, and how Excel 
functions are accessed from within VBA. The second 
example is a function that generates possible 
paths for an asset price assumed to follow 
geometric Brownian motion. It involves using the 
random number generator in VBA, 
manipulating arrays, and iterating with loops. The third 
example is a function that computes the price 
of a European call option by simulation. It 
illustrates how user-defined and Excel functions can 
be called from within VBA functions, and 
provides another example of array manipulation 
and loops in VBA. Further examples of VBA 
scripts for financial applications, such as 
calculating the price of an Asian option, or 
computing and graphing the mean-variance efficient 
portfolio frontier (see Markowitz, 1952), can be 
found in Pachamanova and Fabozzi (2010). See 
also Jackson and Staunton (2001). 

Pricing a European Call Option with 
the Black-Scholes formula 

The Black-Scholes formula for a European call 
option (C) is as follows (Black and Scholes, 
1973): 

\[ C = S_0 e^{-qT}\,\Phi(d_1) - K e^{-rT}\,\Phi(d_2) \]

where 

\[ d_1 = \frac{\ln(S_0/K) + (r - q + \sigma^2/2)\,T}{\sigma\sqrt{T}} \]

\[ d_2 = d_1 - \sigma\sqrt{T} \]

\(S_0\) is the current stock price 
K is the strike price 
T is the time to maturity 
r is the short-term risk-free interest rate 
\(\sigma\) is the stock return volatility 
q is the percentage of stock value paid 
annually in dividends 
\(\Phi\) denotes the cumulative distribution 
function of the standard normal distribution 

The value for \(\Phi(d)\) can be found in 
Excel by using the built-in formula 
=NORMDIST(d,0,1,1) or, equivalently, the formula 
=NORMSDIST(d). 

To illustrate the Black-Scholes option pricing 
formula, assume the following values: 

Current stock price (\(S_0\)) = $50 
Strike price (K) = $52 
Time remaining to expiration (T) = 183 days = 
0.5 years (183 days/365, rounded) 
Stock return volatility (\(\sigma\)) = 0.25 (25%) 
Short-term risk-free interest rate (r) = 0.10 (10%) 
Dividend yield (q) = 0 

Plugging into the formula, we obtain 

\[ d_1 = \frac{\ln(50/52) + (0.10 - 0 + 0.25^2/2)\cdot 0.5}{0.25\cdot\sqrt{0.5}} = 0.1502 \]

\[ d_2 = 0.1502 - 0.25\cdot\sqrt{0.5} = -0.0268 \]

\[ \Phi(0.1502) = 0.5597 \]

\[ \Phi(-0.0268) = 0.4893 \]

\[ C = 50\cdot 1\cdot 0.5597 - 52\cdot e^{-0.10\cdot 0.5}\cdot 0.4893 = \$3.79 \]

(The numerical results correspond to the unrounded 
value T = 183/365.) 

Next, we provide the code of a VBA function 
that computes the price of a European call 
option with the Black-Scholes formula. 

Function BSCallPrice(initPrice As Double, _ 
                     K As Double, _ 
                     T As Double, _ 
                     r As Double, _ 
                     sigma As Double, _ 
                     q As Double) 

'Computes the Black-Scholes price of a 
'European call option 
'initPrice is the initial price of the stock 
'K is the strike price 
'r is the interest rate 
'T is the time to maturity of the option 
'sigma is the volatility of the stock 
'q is the continuous dividend yield 

Dim done As Double 

done = (Log(initPrice / K) + (r - q _ 
    + 0.5 * sigma ^ 2) * T) / _ 
    (sigma * Sqr(T)) 

BSCallPrice = initPrice * Exp(-q * T) _ 
    * Application.NormSDist(done) - _ 
    K * Exp(-r * T) * _ 
    Application.NormSDist(done - sigma * Sqr(T)) 

End Function 

In the code above, all input variables 
(initPrice, K, T, r, sigma, and q) are specified 
to be of type Double. A variable done is 
declared as type Double within the function; 
done stands for \(d_1\) in the definition of the 
Black-Scholes formula above. It takes the 
value of the expression (Log(initPrice / K) + 
(r - q + 0.5 * sigma ^ 2) * T) / (sigma * Sqr(T)). 
(Note that this expression contained an underscore ("_") 
in the code above. The underscore is used 
when transferring a part of an expression to 
a new line.) The price of the option is stored 
in BSCallPrice. The functions Log and Exp 
used in the calculation are VBA functions. We 
also call the Excel function NormSDist with 
the expression Application.NormSDist. 

The function BSCallPrice can then be used 
in a spreadsheet. An example is given in 
Figure 3. The inputs are stored in cells B3:B8, and 
the function is called with arguments that are 
cell references to cells where the information is 
stored. 

Black-Scholes option pricing formula (VBA) 

Row   A                     B          Formula in column B 
3     Initial price         $ 50.00 
4     Strike price          $ 52.00 
5     Time to expiration    0.50 
6     Interest rate         10% 
7     Volatility            25% 
8     Dividend yield        0 
10    Call price            $ 3.79     =BSCallPrice(B3,B4,B5,B6,B7,B8) 

Figure 3 Example of Using the User-Defined Function BSCallPrice in a Spreadsheet 

VBA is forgiving if you are sloppy in 
writing the function. For example, the code below 
(without any variable declarations) would have 
worked as well, provided that Option Explicit is 
not in effect. 

Function BSCallPrice(initPrice, K, T, r, sigma, q) 

done = (Log(initPrice / K) + (r - q _ 
    + 0.5 * sigma ^ 2) * T) / (sigma * Sqr(T)) 

BSCallPrice = initPrice * Exp(-q * T) _ 
    * Application.NormSDist(done) - _ 
    K * Exp(-r * T) * _ 
    Application.NormSDist(done - sigma * Sqr(T)) 

End Function 

However, as we mentioned earlier, it is a good 
practice to keep your code well organized. It 
helps minimize errors and saves you time in 
the long run. 
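
One quick way to check the function is to call it from a short test subroutine with the inputs of the worked example. This is only a sketch: the name CheckBSCallPrice is hypothetical, and it assumes that BSCallPrice is saved in the same workbook.

```vba
Sub CheckBSCallPrice()
    'Inputs of the worked example; T is passed unrounded as 183/365
    Dim price As Double

    price = BSCallPrice(50, 52, 183 / 365, 0.1, 0.25, 0)

    'The result should be close to the $3.79 obtained by hand
    Debug.Print Format(price, "0.00")
End Sub
```

A check of this kind against a hand calculation is worth keeping next to any user-defined pricing function.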


Generating Paths for the Price of an 
Asset That Follows Geometric 
Brownian Motion 

In finance, the dynamics of asset price processes 
in discrete time increments are typically 
described by two kinds of models: trees (such as 
binomial trees) and random walks. When the 
time increment used to model the asset price 
dynamics becomes infinitely small, such 
processes are referred to as stochastic processes in 
continuous time. The ability to generate paths 
for asset prices following these processes is 
important for computing prices of securities that 
depend on the asset price under consideration, 
as well as for calculating various risk measures 
associated with holding the asset in a portfolio. 

The most widely used stochastic process in 
finance is geometric Brownian motion. The 
evolution of the underlying asset price is described 
by the equation 

\[ dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \]

where \(W_t\) is standard Brownian motion, and 
\(\mu\) and \(\sigma\) are the drift and the volatility of the 
process, respectively. (See a more detailed 
introduction in Hull [2008] or Pachamanova and 
Fabozzi [2010].) It turns out that the value of the 
asset price \(S_T\) at time T given the asset price \(S_t\) 
at time t can be computed as 

\[ S_T = S_t\, e^{(\mu - \frac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,\varepsilon} \]

where \(\varepsilon\) is a standard normal random 
variable. If the stock pays a continuously 
compounded dividend yield of q, then we use 
\((\mu - q - 0.5\sigma^2)\) instead of \((\mu - 0.5\sigma^2)\) in the 
above formula. 

Let us create a function, GBMPaths, that 
generates a prespecified number of paths 
(numPaths) for the asset. Each path consists 
of a prespecified number of steps (numSteps). 
The value of the asset at each step is computed 
according to the formula for \(S_T\) above. In the 
formula, we replace time t with time 0 (i.e., the 
present), and time T with the time 
corresponding to the step. 
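
Applied one step at a time, the formula gives the recursion that the code below implements. The notation \(\Delta t\), \(t_k\), and \(\varepsilon_k\) is introduced here for illustration and does not appear in the original text; the \(\varepsilon_k\) are independent standard normal draws, one per step.

```latex
\Delta t = \frac{T}{\text{numSteps}}, \qquad t_k = k\,\Delta t, \qquad
S_{t_0} = \text{initPrice},
\\[4pt]
S_{t_k} = S_{t_{k-1}}\,
  \exp\!\left( \left(\mu - q - \tfrac{1}{2}\sigma^2\right)\Delta t
  + \sigma\sqrt{\Delta t}\;\varepsilon_k \right),
\qquad k = 1, \dots, \text{numSteps}.
```

Multiplying the steps together recovers the formula for \(S_T\) with t = 0, because a sum of independent normal increments over the steps is again normal with variance proportional to the elapsed time.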

The code of the function is 

Function GBMPaths(initPrice As Double, _ 
                  mu As Double, _ 
                  sigma As Double, _ 
                  T As Double, _ 
                  q As Double, _ 
                  numSteps As Integer, _ 
                  numPaths As Integer) 

Randomize 

'Each counter gets its own type; writing 
'"Dim iPath, iStep As Integer" would make iPath a Variant 
Dim iPath As Integer, iStep As Integer 
Dim paths() As Variant 
ReDim paths(1 To numSteps + 1, 1 To numPaths) 

For iPath = 1 To numPaths 
    paths(1, iPath) = initPrice 
    For iStep = 2 To numSteps + 1 
        paths(iStep, iPath) = paths(iStep - 1, iPath) * _ 
            Exp((mu - q - 0.5 * sigma ^ 2) * (T / numSteps) + _ 
            sigma * (T / numSteps) ^ (1 / 2) * _ 
            (Application.NormSInv(Rnd))) 
    Next 
Next 

GBMPaths = paths 
End Function 

Let us now see what this function does. First, 
we use the command Randomize to make sure 
that VBA creates a different sequence each time 
we generate normal random variables to 
compute the paths for the asset. (If you do not type 
Randomize before you use the VBA random 
generator function Rnd, Rnd will always 
return the same sequence of numbers.) 

Next, we declare variables we will use in the 
function. The variables iPath and iStep will 
be counters for the number of paths and the 
number of steps we have generated so far. They 
are, of course, integers. The two-dimensional 
array paths will store the values of the asset 
along each path and for each step. We use 
ReDim to specify the dimensions of the array. 

We next use a For loop to populate the 
array paths. In fact, we have two nested For 
loops: one that iterates through the number of 
paths, and one that iterates through the points 
in each path. For each point iStep on each path 
iPath, we calculate the price of the asset and 
store it in paths(iStep, iPath). The formula 
that computes the price of the asset contains the 
expression Application.NormSInv(Rnd), 
which generates a value for the normal 
random variable \(\varepsilon\) in the formula for \(S_T\) 
earlier in this section. Rnd is the VBA random 
number generator; it returns a random 
number between 0 and 1. The reason we used the 
command Randomize at the beginning of the 
function is so that we could force Rnd to 
generate different sequences of random numbers 
every time we call the function GBMPaths. 
NormSInv is an Excel function that finds the 
number on the horizontal axis of the 
normal distribution that corresponds to the value 
for cumulative probability generated by Rnd. 
(See, for example, Chapter 4 in Pachamanova 
and Fabozzi [2010] for an explanation of how 
random numbers from different probability 
distributions are generated.) As in the 
previous example, in order to indicate to VBA 
that NormSInv is an Excel function, we use 
Application. in front of NormSInv. 

The function returns a two-dimensional 
array, GBMPaths (which is equal to paths, as set 
in the second-to-last line of the function). 
Every column of the array contains a randomly 
generated path for the asset price; that is, it has 
numSteps + 1 values (the initial price followed by 
the prices after each step) that represent the values of 
the asset price along that path. 
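
To inspect the output, the array can be written to a worksheet in one assignment. The sketch below is an illustration only: the subroutine name and the input values are hypothetical, it assumes GBMPaths is saved in the same workbook, and it overwrites cells of the active sheet starting at A1.

```vba
Sub WritePathsToSheet()
    Const numSteps As Integer = 12
    Const numPaths As Integer = 5
    Dim paths As Variant

    'Hypothetical inputs: price 50, drift 10%, volatility 25%,
    'horizon 1 year, no dividends
    paths = GBMPaths(50, 0.1, 0.25, 1, 0, numSteps, numPaths)

    'One column per path, one row per point on the path
    ActiveSheet.Range("A1").Resize(numSteps + 1, numPaths).Value = paths
End Sub
```

Alternatively, GBMPaths can be entered directly in the spreadsheet as an array formula over a range with numSteps + 1 rows and numPaths columns.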

Pricing a European Call Option by 
Simulation 

Let us now use the function we created in the 
previous section to write a function that prices a 
European call option by simulation. While this 
is not the most efficient way to price a European 
call option by simulation, it will illustrate how 
user-defined functions are called within other 
functions, and how arrays are handled as 
outputs of a function. 

As in the previous section, we will make the 
assumption that the asset price follows 
geometric Brownian motion, which means that the 
value of the asset price \(S_T\) at time T given the 
asset price \(S_t\) at time t can be computed as 

\[ S_T = S_t\, e^{(r - \frac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,\varepsilon} \]

where \(\varepsilon\) is a standard normal random variable. 
(When we generate asset price paths for the 
purpose of valuing an option, we use r (the risk-free 
rate) in place of the drift term \(\mu\). This is done for 
technical reasons (absence of arbitrage).) As in 
the previous section, if the stock pays a 
continuously compounded dividend yield of q, then 
we use \((r - q - 0.5\sigma^2)\) instead of \((r - 0.5\sigma^2)\) in 
the formula above. 

The price of the option can be approximated 
by creating scenarios for the stock price \(S_T\) at 
time to maturity T, computing the discounted 
payoffs of the option, and finding the expected 
payoff of the option. (Option pricing by 
simulation was first suggested by Boyle, 1977. See also 
Boyle et al., 1997; Pachamanova and Fabozzi, 
2010; Glasserman, 2004; or McLeish, 2005.) 

Suppose we generate N scenarios for \(\varepsilon\) at time 
T: \(\varepsilon^{(1)}, \ldots, \varepsilon^{(N)}\). Then, the price of a European call 
option with strike price K will be approximately 

\[ C \approx e^{-r(T-t)}\,\frac{1}{N}\sum_{n=1}^{N} \max\left\{ S_t\, e^{(r - \frac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,\varepsilon^{(n)}} - K,\ 0 \right\} \]

The expression above estimates the expected value of 
the discounted option payoffs by their equally 
weighted average over the N scenarios. 

The VBA code of the function is given below. 

Function EuropeanCall(initPrice As Double, _ 
                      K As Double, _ 
                      r As Double, _ 
                      T As Double, _ 
                      sigma As Double, _ 
                      q As Double, _ 
                      numSteps As Integer, _ 
                      numPaths As Integer) 

Dim iPath As Integer 
Dim payoffs() As Variant 
ReDim payoffs(1 To numPaths) 

Dim paths() As Variant 
ReDim paths(1 To numSteps + 1, 1 To numPaths) 

'GBMPaths subtracts q from the drift itself, 
'so r (not r - q) is passed in place of mu 
paths = GBMPaths(initPrice, r, sigma, T, q, numSteps, numPaths) 

For iPath = 1 To numPaths 
    payoffs(iPath) = Exp(-r * T) * _ 
        Application.Max(paths(numSteps + 1, iPath) - K, 0) 
Next 

EuropeanCall = Application.Average(payoffs) 

End Function 

The variable declarations are similar to the 
declarations in the previous sections; however, 
now we have an additional array, payoffs, 
that will store the payoff of the option at the 
end of each generated path (that is, for each 
generated scenario). The dimension of the 
array is therefore 1 x numPaths. 

After declaring the variables in the function, 
we call the function we created in the 
previous section, GBMPaths, and store the output 
in the array paths. This is achieved with the 
command 

paths = GBMPaths(initPrice, r, sigma, T, q, numSteps, numPaths) 

The arguments of the function GBMPaths 
were initPrice, mu, sigma, T, 
q, numSteps and numPaths. Note that when 
we call the function GBMPaths from within the 
function EuropeanCall, we input r in 
place of the argument mu. The dividend yield q 
is passed as a separate argument and is subtracted 
from the drift inside GBMPaths, so passing r - q 
as mu would subtract the dividend yield twice. 

After generating numPaths paths for the 
price of the underlying asset, we compute the 
payoffs of the option. We only need the 
payoffs at the time of maturity of the option, time 
T, so we only use paths(numSteps + 1, 
iPath) in the calculation. 

The payoff along path iPath is calculated 
as the maximum of zero and the difference 
between the value of the underlying at the end of 
path iPath at time T and the strike price K. 
We use the Excel function Max to compute the 
maximum and call it as Application.Max. 
Each payoff is discounted, and is added to the 
array payoffs. 

After the array payoffs is filled, we compute 
the average of the payoffs to get the price of 
the option. We use the Excel function Average, 
which we call with the command 
Application.Average. 
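
Because the payoff of a European call depends only on the asset price at time T, the intermediate steps are not needed. The following is a minimal sketch of a more direct version that draws the terminal price in one step and keeps a running sum instead of arrays. The function name EuropeanCallOneStep is hypothetical and this is not the authors' code; the guard against a zero draw is an added assumption, since Rnd can return 0 and NormSInv is not defined there.

```vba
Function EuropeanCallOneStep(initPrice As Double, K As Double, _
                             r As Double, T As Double, _
                             sigma As Double, q As Double, _
                             numPaths As Long) As Double
    Dim iPath As Long
    Dim u As Double
    Dim terminalPrice As Double
    Dim sumPayoffs As Double

    Randomize
    sumPayoffs = 0
    For iPath = 1 To numPaths
        'Draw a uniform number, skipping an exact 0
        Do
            u = Rnd
        Loop While u = 0

        'Asset price at time T under the risk-neutral drift r - q
        terminalPrice = initPrice * Exp((r - q - 0.5 * sigma ^ 2) * T + _
            sigma * Sqr(T) * Application.NormSInv(u))

        'Call payoff: max(terminalPrice - K, 0)
        If terminalPrice > K Then
            sumPayoffs = sumPayoffs + (terminalPrice - K)
        End If
    Next iPath

    'Discounted average payoff
    EuropeanCallOneStep = Exp(-r * T) * sumPayoffs / numPaths
End Function
```

With a large number of paths, the result should be close to the value returned by BSCallPrice for the same inputs, up to sampling error that shrinks as numPaths grows.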

KEY POINTS 

• Macros contain prerecorded tasks that can be 
performed in a spreadsheet. Macros are in 
effect computer programs whose commands 
are hidden from the user, but they can be seen 
if the VBA editor is open. 

• The most important fact about VBA is that 
it tries to act as an object-oriented language. 
This means that it treats every component of 
Excel, such as a worksheet, a cell, a range of 
cells, and a chart, as an object. 

• Objects are arranged in a hierarchy and have 
properties (attributes) that can be modified 
by entering the name of the object followed 
by a dot and a specific command. In addition, 
objects are associated with actions (methods) 
that the objects can perform or have applied 
to them. 

• Subroutines and user-defined functions in 
VBA are both blocks of code saved in 
modules. The difference is that subroutines 
are general scripts, that is, lists of instructions, 
whereas functions complete a list of 
instructions and return a value to the user. 

• Variable types in VBA include Integer, String, 
Single, Double, Long, Boolean, Date, Object, 
and Variant. A different amount of memory 
is allocated to storing values of variables of 
different types. 

• The default in VBA is to index the first 
element in arrays as 0, which is the convention 
in most programming languages. The 
command Option Base 1 at the beginning of a 
module makes sure that the indexing of 
arrays starts at 1. 

• Control flow statements such as For and If 
allow for building more sophisticated 
programs than simple input and output of data 
to Excel. 

• Excel functions can be accessed from VBA by 
prefixing them with Application. 

• VBA has some built-in numeric functions, 
but it is important to know that their 
syntax is not always the same as the syntax of 
the same function in Excel. For example, the 
function Sqrt (square root) in Excel is Sqr 
in VBA. 

• Useful tools in Excel and VBA that allow for 
interaction with users include buttons, input 
dialog boxes, and message boxes. 

• VBA has debugging tools that allow you to 
look at the code in more detail if your 
programs do not work as expected. These tools 
can be accessed through commands under the 
Debug item in the top menu of the VBE. 


NOTE 

1. The notation E in Excel denotes 
multiplication by 10 to a specific power. For example, 
5E40 means \(5 \cdot 10^{40}\), and 5E-45 means \(5 \cdot 10^{-45}\). 

Abstract: Calculus is an important tool because it provides two key ideas for financial modeling: 
(1) the concept of instantaneous rate of change, and (2) a framework and rules for linking together 
quantities and their instantaneous rates of change. Calculus made the concept of infinitely small 
quantities precise with the notion of limit. If the rate of change can get arbitrarily close to a definite 
number by making the time interval sufficiently small, that number is the instantaneous rate of 
change. The instantaneous rate of change is the limit of the rate of change when the length of the 
interval gets infinitely small. This limit is referred to as the derivative of a function, or simply 
derivative. Starting from this definition and with the help of a number of rules for computing a 
derivative, it was shown that the instantaneous rate of change of a number of functions can be 
explicitly computed as a closed formula. The process of computing a derivative, referred to as 
differentiation, solves the problem of finding the steepness of the tangent to a curve; the process of 
integration solves the problem of finding the area below a given curve. A key result of calculus is 
the discovery that integration and differentiation are inverse operations: Integrating the derivative of 
a function yields the function itself. Standard calculus deals with deterministic functions. As such, 
there are limits as to the application of integration of deterministic functions to financial modeling 
such as pricing contingent claims. The major application of integration to financial modeling 
involves stochastic integrals. An understanding of stochastic integrals is needed to understand 
an important tool in contingent claims valuation: stochastic differential equations. 


In elementary calculus, integration is an 
operation performed on single, deterministic 
functions; the end product is another single, 
deterministic function. Integration defines a 
process of cumulation: The integral of a 
function represents the area below the function. 
However, the usefulness of deterministic 
functions in financial modeling is limited. Given 
the amount of uncertainty, few laws in 
financial theory can be expressed through them. It 
is necessary to adopt an ensemble view, where 
the path of economic variables must be 
considered a realization of a stochastic process, not 
a deterministic path. We must therefore move 
from deterministic integration to stochastic 
integration. In doing so we have to define how 
to cumulate random shocks in a continuous-time 
environment. These concepts require rigorous 
definition. In this entry, we define the 
concept and the properties of stochastic integration. 
Based on the concept of stochastic integration, 
an important tool used in financial modeling, 
stochastic differential equations, can be 
understood. 

Two observations are in order. First, although 
ordinary integrals and derivatives operate on 
functions and yield either individual 
numbers or other functions, stochastic integration 
operates on stochastic processes and yields 
either random variables or other stochastic 
processes. Therefore, while a definite integral 
is a number and an indefinite integral is a 
function, a stochastic integral is a random 
variable or a stochastic process. A differential 
equation, when equipped with suitable initial 
or boundary conditions, admits as a solution 
a single function while a stochastic 
differential equation admits as a solution a stochastic 
process. 

Second, moving from a deterministic to a 
stochastic environment does not necessarily 
require leaving the realm of standard calculus. 
In fact, all the stochastic laws of financial 
theory could be expressed as laws that govern 
the distribution of transition probabilities. An 
example of this mathematical strategy is the 
application of the forward Kolmogorov 
differential equation or the Fokker-Planck differential 
equation to term structure modeling, which are 
deterministic partial differential equations that 
govern the probability distributions of prices. 
Nevertheless it is often convenient to represent 
uncertainty directly through stochastic 
integration and stochastic differential equations. This 
approach is not limited to financial theory: It 
is also used in the domain of the physical 
sciences. In financial theory, stochastic differential 
equations have the advantage of being 
intuitive: Thinking in terms of a deterministic path 
plus an uncertain term is easier than thinking 
in terms of abstract probability distributions. 
There are other reasons why stochastic calculus 
is the methodology of choice in economics and 
finance but easy intuition plays a key role. 


For example, a risk-free bank account, which
earns a deterministic instantaneous interest rate
$f(t)$, evolves according to the deterministic law:

$$y = A \exp\left(\int f(t)\,dt\right)$$

which is the general solution of the differential
equation:

$$\frac{dy}{y} = f(t)\,dt$$

The solution of this differential equation tells us 
how the bank account cumulates over time. 

However, if the rate is not deterministic but is
subject to volatility (that is, at any instant the
rate is $f(t)$ plus a random disturbance), then the
bank account evolves as a stochastic process.
That is to say, the bank account might follow
any of an infinite number of different paths:
Each path cumulates the rate $f(t)$ plus the random
disturbance. In a sense that will be made
precise in this entry and with an understanding
of stochastic differential equations, we must
solve the following equation:

$$\frac{dy}{y} = f(t)\,dt + \text{random disturbance}$$


Here is where stochastic integration comes into 
play: It defines how the stochastic rate process 
is transformed into the stochastic account pro¬ 
cess. This is the direct stochastic integration ap¬ 
proach. 
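
As a rough illustration of the direct approach, the sketch below cumulates the account along one path with a simple Euler scheme. It assumes, purely for illustration, that the random disturbance is `sigma` times a Brownian increment; the names `simulate_account`, `rate`, and `sigma`, and the constant 5% rate, are hypothetical and do not come from the text.

```python
import math
import random

def simulate_account(f, sigma, y0=1.0, horizon=1.0, steps=1000, seed=0):
    """Euler scheme for dy/y = f(t) dt + sigma dB_t.

    The term sigma dB_t is an assumed form of the random disturbance.
    """
    rng = random.Random(seed)
    dt = horizon / steps
    y = y0
    for i in range(steps):
        t = i * dt
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        y += y * (f(t) * dt + sigma * dB)
    return y

rate = lambda t: 0.05  # hypothetical constant instantaneous rate f(t)

# With sigma = 0 the scheme approximates the deterministic law y = A exp(int f dt).
deterministic = simulate_account(rate, sigma=0.0)
# With sigma > 0 every seed produces a different path of the account.
noisy = [simulate_account(rate, sigma=0.2, seed=s) for s in range(5)]
```

With `sigma=0.0` the result should be close to `exp(0.05)`, while the noisy runs scatter around it, one terminal value per path.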

It is possible to take a different approach. At 
any instant $t$, the instantaneous interest rate and
the cumulated bank account have two prob¬ 
ability distributions. We could use a partial 
differential equation to describe how the prob¬ 
ability distribution of the cumulated bank ac¬ 
count is linked to the interest rate probability 
distribution. 

Similar reasoning applies to stock and deriva¬ 
tive price processes. In continuous-time finance, 
these processes are defined as stochastic pro¬ 
cesses that are the solution of a stochastic dif¬ 
ferential equation. Hence, the importance of 
stochastic integrals in continuous-time finance 
theory should be clear. 

Following some remarks on the informal intuition
behind stochastic integrals, we proceed
to define Brownian motions and outline the
formal mathematical process through which
stochastic integrals are defined, first informally
and then more rigorously. A number of
properties of stochastic integrals are then established.


THE INTUITION BEHIND 
STOCHASTIC INTEGRALS 

Let's first contrast ordinary integration with
stochastic integration. A definite integral

$$A = \int_a^b f(x)\,dx$$

is a number $A$ associated to each function $f(x)$
while an indefinite integral

$$y(x) = \int_a^x f(s)\,ds$$

is a function $y$ associated to another function
$f$. The integral represents the cumulation of
the infinitesimal terms $f(s)\,ds$ over the integration
interval.

A stochastic integral, which we will denote by

$$W = \int_a^b X_t\,dB_t$$

or

$$W = \int_a^b X_t \circ dB_t$$

is a random variable $W$ associated to a stochastic
process if the time interval is fixed or, if the
time interval is variable, is another stochastic
process $W_t$. The stochastic integral represents
the cumulation of the stochastic products $X_t\,dB_t$.
The rationale for this approach is that we need 


to represent how random shocks feed back into 
the evolution of a process. We can cumulate 
separately the deterministic increments and the 
random shocks only for linear processes. In 
nonlinear cases, as in the simple example of the 
bank account, random shocks feed back into 
the process. For this reason we define stochas¬ 
tic integrals as the cumulation of the product 
of a process X by the random increments of a 
Brownian motion. 

Consider a stochastic process $X_t$ over an interval
$[S,T]$. Recall that a stochastic process is a
real variable $X_t(\omega)$ that depends on both time
and the state of the economy $\omega$. For any given
$\omega$, $X_{(\cdot)}(\omega)$ is a path of the process from the origin
$S$ to time $T$. A stochastic process can be identified
with the set of its paths equipped with an
appropriate probability measure. A stochastic
integral is an integral associated to each path; it
is a random variable that associates a real number,
obtained as a limit of a sum, to each path.
If we fix the origin and let the interval vary,
then the stochastic integral is another stochastic
process.

It would seem reasonable, prima facie, to define
the stochastic integral of a process $X_t(\omega)$
as the definite integral in the sense of Riemann-Stieltjes
associated to each path $X_{(\cdot)}(\omega)$ of the process.
If the process $X_t(\omega)$ has continuous paths
$X(\cdot, \omega)$, the integrals

$$W(\omega) = \int_S^T X(s,\omega)\,ds$$

exist for each path. However, as discussed
above, this is not the quantity we
want to represent. In fact, we want to represent
the cumulation of the stochastic products $X_t\,dB_t$.
Defining the integral

$$W = \int_a^b X_t\,dB_t$$

pathwise in the sense of Riemann-Stieltjes would
be meaningless because the paths of a Brownian
motion are not of finite variation. If we define
stochastic integrals simply as the limit of sums of
the products $X_t\,dB_t$, the stochastic integral would be infinite
(and therefore useless) for most processes.

However, Brownian motions have bounded
quadratic variation. Using this property, we can
define stochastic integrals pathwise through
an approximation procedure. The approximation
procedure to arrive at such a definition
is far more complicated than the
definition of the Riemann-Stieltjes integrals.
Two similar but not equivalent definitions of
stochastic integral have been proposed, the
first by the Japanese mathematician Kiyoshi Ito
(1951), the second by the Russian physicist Ruslan
Stratonovich in the 1960s.¹ The definition
of the stochastic integral in the sense of Ito
or of Stratonovich replaces the
increments $\Delta x_i$ with the increments $\Delta B_i$ of a
fundamental stochastic process called Brownian
motion.² The increments $\Delta B_i$ represent the
"noise" of the process.

The definition proceeds in the following three 
steps: 

• Step 1. The first step consists in defining a
fundamental stochastic process, the Brownian
motion. In intuitive terms, a Brownian motion
$B_t(\omega)$ is a continuous limit (in a sense
that will be made precise in the following
sections) of a simple random walk. A simple
random walk is a discrete-time stochastic process
defined as follows. A point can move
one step to the right or to the left. Movement
takes place only at discrete instants of time,
say at time 1, 2, 3, .... At each discrete instant,
the point moves to the right or to the left with
probability 1/2.

The random walk represents the cumulation
of completely uncertain random shocks.
At each point in time, the movement of the
point is completely independent from its past
movements. Hence, the Brownian motion
represents the cumulation of random shocks
in the limit of continuous time and of continuous
states. It can be demonstrated that a.s.
each path of the Brownian motion is not of
bounded total variation but it has bounded
quadratic variation.

Recall that the total variation of a function
$f(x)$ is the limit of the sums

$$\sum_i |f(x_i) - f(x_{i-1})|$$

while the quadratic variation is defined as the
limit of the sums

$$\sum_i |f(x_i) - f(x_{i-1})|^2$$

Quadratic variation can be interpreted as the
absolute volatility of a process. Thanks to this
property, the $\Delta B_i$ of the Brownian motion
provides the basic increments of the stochastic
integral, replacing the $\Delta x_i$ of the Riemann-Stieltjes
integral.

• Step 2. The second step consists in defining the
stochastic integral for a class of simple functions
called elementary functions. Consider the
time interval $[S,T]$ and any partition of the
interval $[S,T]$ in $N$ subintervals: $S = t_0 < t_1
< \dots < t_i < \dots < t_N = T$. An elementary function
$\phi$ is a function defined on the time $t$ and the
outcome $\omega$ such that it assumes a constant
value on the $i$-th subinterval. Call $I_{[t_i, t_{i+1})}$
the indicator function of the interval $[t_i, t_{i+1})$.
The indicator function of a given set is a function
that assumes value 1 on the points of the
set and 0 elsewhere. We can then write an
elementary function $\phi$ as follows:

$$\phi(t,\omega) = \sum_i \varepsilon_i(\omega)\, I_{[t_i, t_{i+1})}(t)$$

In other words, the constants $\varepsilon_i(\omega)$ are random
variables and the function $\phi(t,\omega)$ is a stochastic
process made up of paths that are constant
on each $i$-th interval.

We can now define the stochastic integral,
in the sense of Ito, of elementary functions
$\phi(t,\omega)$ as follows:

$$W = \int_S^T \phi(t,\omega)\,dB_t(\omega) = \sum_i \varepsilon_i(\omega)\,[B_{t_{i+1}}(\omega) - B_{t_i}(\omega)]$$

where $B$ is a Brownian motion.

It is clear from this definition that $W$ is a
random variable $\omega \to W(\omega)$. Note that the
Ito integral thus defined for elementary functions
cumulates the products of the elementary
functions $\phi(t,\omega)$ and of the increments of
the Brownian motion $B_t(\omega)$.

It can be demonstrated that the following
property, called Ito isometry, holds for Ito
stochastic integrals defined for bounded elementary
functions as above:

$$E\left[\left(\int_S^T \phi(t,\omega)\,dB_t(\omega)\right)^2\right] = E\left[\int_S^T \phi(t,\omega)^2\,dt\right]$$

The Ito isometry will play a fundamental role
in Step 3.

• Step 3. The third step consists in using the Ito
isometry to show that each function $g$ which is
square-integrable (plus other conditions that
will be made precise in the next section) can
be approximated by a sequence of elementary
functions $\phi_n(t,\omega)$ in the sense that

$$E\left[\int_S^T [g - \phi_n(t,\omega)]^2\,dt\right] \to 0$$

If $g$ is bounded and has a continuous time-path,
the functions $\phi_n(t,\omega)$ can be defined as
follows:

$$\phi_n(t,\omega) = \sum_i g(t_i,\omega)\, I_{[t_i, t_{i+1})}(t)$$

where $I$ is the indicator function. We can now
use the Ito isometry to define the stochastic
integral of a generic function $f(t,\omega)$ as follows:

$$\int_S^T f(t,\omega)\,dB_t(\omega) = \lim_{n\to\infty} \int_S^T \phi_n(t,\omega)\,dB_t(\omega)$$

where the $\phi_n$ are elementary functions that
approximate $f$ in the above sense. The Ito isometry
ensures that the Cauchy condition is satisfied and
that the above sequence thus converges.
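
The contrast between total and quadratic variation used in Step 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original construction: it approximates one Brownian path on [0, 1] by Gaussian increments (the function names and the seed are arbitrary choices) and evaluates both sums on nested partitions of that same path.

```python
import math
import random

def brownian_path(n, horizon=1.0, seed=1):
    """Approximate Brownian path on n equal steps, built from N(0, dt) increments."""
    rng = random.Random(seed)
    dt = horizon / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def variations(path, stride):
    """Total and quadratic variation sums on the sub-partition with the given stride."""
    pts = path[::stride]
    incs = [b - a for a, b in zip(pts, pts[1:])]
    return sum(abs(d) for d in incs), sum(d * d for d in incs)

path = brownian_path(4096)
# Keys are the number of subintervals: 64, 256, 1024, 4096.
results = {4096 // s: variations(path, s) for s in (64, 16, 4, 1)}
```

As the partition is refined, the total variation sum keeps growing (roughly like the square root of the number of subintervals), whereas the quadratic variation sum stays close to the length of the interval.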


In outlining the above definition, we omitted
an important point that will be dealt with in
the next section: The definition of the stochastic
integral in the sense of Ito requires that the
elementary functions be without anticipation,
that is, they depend only on the past history
of the Brownian motion. In fact, in the case of
continuous paths, the approximating functions
give rise to the sums

$$\sum_i g(t_i,\omega)\,[B_{t_{i+1}}(\omega) - B_{t_i}(\omega)]$$

taking the function $g$ in the left extreme of each
subinterval.

However, the definition of stochastic integrals
in the sense of Stratonovich admits anticipation.
In fact, the stochastic integral in the sense of
Stratonovich, written as follows

$$\int_S^T f(t,\omega) \circ dB_t(\omega)$$

uses the following approximation under the
assumption of continuous paths:

$$\sum_i f(t_i^*,\omega)\,[B_{t_{i+1}}(\omega) - B_{t_i}(\omega)]$$

where

$$t_i^* = \frac{t_i + t_{i+1}}{2}$$

is the midpoint of the $i$-th subinterval.
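
To see that the two definitions really differ, one can take the integrand to be the Brownian motion itself. For that case the standard closed forms are (B_T^2 - T)/2 for the Ito integral and B_T^2/2 for the Stratonovich integral. The sketch below is illustrative only: the function name, grid size, and seed are assumptions. It evaluates the left-endpoint and midpoint sums on a single simulated path.

```python
import math
import random

def ito_and_stratonovich(n=2000, horizon=1.0, seed=3):
    """Left-endpoint (Ito) and midpoint (Stratonovich) sums for int_0^T B dB."""
    rng = random.Random(seed)
    h = horizon / (2 * n)  # half-step, so that midpoints lie on the simulation grid
    b = [0.0]
    for _ in range(2 * n):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(h)))
    ito = strat = 0.0
    for i in range(n):
        left, mid, right = b[2 * i], b[2 * i + 1], b[2 * i + 2]
        ito += left * (right - left)   # integrand taken at t_i
        strat += mid * (right - left)  # integrand taken at the midpoint t_i^*
    return ito, strat, b[-1]

ito, strat, b_T = ito_and_stratonovich()
```

On a fine grid the two sums should differ by roughly T/2, which is the gap between the two closed forms.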

Whose definition—Ito's or Stratonovich's—is 
preferable? Note that neither can be said to be 
correct or incorrect. The choice of the one over 
the other is a question of which one best repre¬ 
sents the phenomena under study. The lack of 
anticipation is one reason why the Ito integral 
is generally preferred in finance theory. 

We have just outlined the definition of 
stochastic integrals leaving aside mathematical 
details and rigor. The following two sections 
will make the above process mathematically 
rigorous and will discuss the question of an¬ 
ticipation of information. While these sections 
are a bit technical and might be skipped by 
those not interested in the mathematical details 
of stochastic calculus, they explain a number of 
concepts that are key to the modern develop¬ 
ment of finance theory. 

BROWNIAN MOTION
DEFINED 

The previous section introduced Brownian mo¬ 
tion informally as the limit of a simple ran¬ 
dom walk when the step size goes to zero. This 
section defines Brownian motion formally. The 
term "Brownian motion" is due to the Scot¬ 
tish botanist Robert Brown who in 1828 ob¬ 
served that pollen grains suspended in a liquid 
move irregularly. This irregular motion was 
later explained by the random collision of the 
molecules of the liquid with the pollen grains. It 
is therefore natural to represent Brownian mo¬ 
tion as a continuous-time stochastic process that 
is the limit of a discrete random walk. 

Let's now formally define Brownian motion
and demonstrate its existence. Let's first go back
to the probabilistic representation of the economy.
The economy can be represented as a probability
space $(\Omega, \Im, P)$, where $\Omega$ is the set of
all possible economic states, $\Im$ is the event
$\sigma$-algebra, and $P$ is a probability measure. The
economic states $\omega \in \Omega$ are not instantaneous states
but represent full histories of the economy for
the time horizon considered, which can be a finite
or infinite interval of time. In other words,
the economic states are the possible realization
outcomes of the economy.

In this probabilistic representation of the
economy, time-variable economic quantities
(such as interest rates, security prices, or cash
flows, as well as aggregate quantities such as
economic output) are represented as stochastic
processes $X_t(\omega)$. In particular, the price and
dividend of each stock are represented as two
stochastic processes $S_t(\omega)$ and $d_t(\omega)$.

Stochastic processes are time-dependent random
variables defined over the set $\Omega$. It is critical
to define stochastic processes so that there is
no anticipation of information, that is, at time $t$
no process depends on variables that will be realized
later. Anticipation of information is possible
only within a deterministic framework.
However, the space $\Omega$ in itself does not contain
any coherent specification of time. If we associate
random variables $X_t(\omega)$ to a time index
without any additional restriction, we might
incur the problem of anticipation of information.
Consider, for instance, an arbitrary family
of time-indexed random variables $X_t(\omega)$ and
suppose that, for some instant $t$, the relationship
$X_t(\omega) = X_{t+1}(\omega)$ holds. In this case there is
clearly anticipation of information as the value
of the variable $X_{t+1}(\omega)$ at time $t + 1$ is known at
an earlier time $t$. All relationships that lead to
anticipation of information must be treated as
deterministic.

The formal way to specify in full generality
the evolution of time and the propagation of information
without anticipation is through the
concept of filtration. The concept of filtration is
based on identifying all events that are known
at any given instant. It is the propagation of
information assuming that it is possible to associate
to each moment $t$ a $\sigma$-algebra of events
$\Im_t \subset \Im$ formed by all events that are known
prior to or at time $t$. It is assumed that events
are never "forgotten," that is, that $\Im_t \subseteq \Im_s$ if
$t < s$. An increasing sequence of $\sigma$-algebras, each
associated to the time at which all its events
are known, represents the propagation of information.
This sequence (called a filtration) is
typically indicated as $\{\Im_t\}$.

The economy is therefore represented as a
probability space $(\Omega, \Im, P)$ equipped with a
filtration $\{\Im_t\}$. The key point is that every process
$X_t(\omega)$ that represents economic or financial
quantities must be adapted to the filtration
$\{\Im_t\}$, that is, the random variable $X_t(\omega)$ must
be measurable with respect to the $\sigma$-algebra
$\Im_t$. In simple terms, this means that each event
of the type $X_t(\omega) \le x$ belongs to $\Im_t$ while each
event of the type $X_s(\omega) \le y$ for $t < s$ belongs to
$\Im_s$. For instance, consider a process $P_t(\omega)$, which
might represent the price of a stock. Any coherent
representation of the economy must ensure
that events such as $\{\omega: P_s(\omega) \le c\}$ are not known
at any time $t < s$. The filtration $\{\Im_t\}$ prescribes
all events admissible at time $t$.
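
A finite toy model can make the notions of filtration and adapted process concrete. The sketch below is an illustrative assumption, not the continuous-time setting of the text: the states are the eight full histories of three coin tosses, the information at time t is the partition induced by the first t tosses, and a random variable is measurable at time t exactly when it is constant on every cell (atom) of that partition. All names are hypothetical.

```python
from itertools import product

OMEGA = list(product("HT", repeat=3))  # each state is a full history of three tosses

def atoms(t):
    """Atoms of the sigma-algebra generated by the first t tosses."""
    groups = {}
    for w in OMEGA:
        groups.setdefault(w[:t], []).append(w)
    return list(groups.values())

def is_measurable(x, t):
    """x is measurable at time t iff it is constant on each atom of time t."""
    return all(len({x(w) for w in atom}) == 1 for atom in atoms(t))

def heads(t):
    """Number of heads observed in the first t tosses."""
    return lambda w: w[:t].count("H")

# X_t = heads(t) uses only information available at time t: adapted.
adapted = all(is_measurable(heads(t), t) for t in range(4))
# Y_t = heads(t + 1) looks one toss ahead: it anticipates information.
anticipating = all(is_measurable(heads(t + 1), t) for t in range(3))
```

The number of atoms doubles at each step (1, 2, 4, 8), which is the discrete counterpart of an increasing sequence of σ-algebras in which events are never forgotten.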

Why do we have to use the complex con¬ 
cept of filtration? Why can't we simply identify 
information at time t with the values of all the
variables known at time t as opposed to iden¬ 
tifying a set of events? The principal reason is 
that in a continuous-time continuous-state en¬ 
vironment any individual value has probability 
zero; we cannot condition on single values as 
the standard definition of conditional probabil¬ 
ity would become meaningless. In fact, in the 
standard definition of conditional probability, 
the probability of the conditioning event ap¬ 
pears in the denominator and cannot be zero. 

It is possible, however, to reverse this reasoning
and construct a filtration starting from a
process. Suppose that a process $X_t(\omega)$ does not
admit any anticipation of information, for instance
because the $X_t(\omega)$ are all mutually
independent. We can therefore construct a filtration
$\Im_t$ as the strictly increasing sequence of
$\sigma$-algebras generated by the process $X_t(\omega)$. Any
other process must be adapted to $\Im_t$.

Let's now go back to the definition of the
Brownian motion. Suppose that a probability
space $(\Omega, \Im, P)$ equipped with a filtration $\Im_t$ is
given. A one-dimensional standard Brownian motion
is a stochastic process $B_t(\omega)$ with the following
properties:

• $B_t(\omega)$ is defined over the probability space $(\Omega, \Im, P)$.

• $B_t(\omega)$ is continuous for $0 \le t < \infty$.

• $B_0(\omega) = 0$.

• $B_t(\omega)$ is adapted to the filtration $\Im_t$.

• The increments $B_t(\omega) - B_s(\omega)$ are independent
and normally distributed with variance $(t - s)$
and zero mean.

The above conditions³ state that the standard
Brownian motion is a stochastic process that
starts at zero, has continuous paths and normally
distributed increments whose variance
grows linearly with time. Note that in the last
condition the increments are independent of
the $\sigma$-algebra $\Im_s$ and not of the previous values
of the process. As noted above, this is because
any single realization of the process has
probability zero and it is therefore impossible to
use the standard concept of conditional probability:
Conditioning must be with respect to a
$\sigma$-algebra $\Im_s$. Once this concept has been firmly
established, one might speak loosely of independence
of the present values of a process from
its previous values. It should be clear, however,
that what is meant is independence with respect
to a $\sigma$-algebra $\Im_s$.
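
The defining properties of the increments can be inspected on simulated paths. The following sketch is a Monte Carlo illustration under assumptions that are not in the text (a grid of 40 Gaussian steps on [0, 2], with s = 0.5 and t = 2; the function name and seed are arbitrary). It estimates the mean and variance of B_t - B_s and its covariance with B_s, using the covariance as a loose stand-in for independence from the past.

```python
import math
import random

def increment_stats(horizon=2.0, n_s=10, n_t=40, n_paths=20000, seed=4):
    """Sample mean and variance of B_t - B_s, and its covariance with B_s.

    The grid has n_t steps on [0, horizon]; s corresponds to grid index n_s.
    """
    rng = random.Random(seed)
    dt = horizon / n_t
    sum_inc = sum_inc2 = sum_bs = sum_cross = 0.0
    for _ in range(n_paths):
        b = 0.0
        b_s = 0.0
        for k in range(1, n_t + 1):
            b += rng.gauss(0.0, math.sqrt(dt))
            if k == n_s:
                b_s = b  # value of the path at time s
        inc = b - b_s
        sum_inc += inc
        sum_inc2 += inc * inc
        sum_bs += b_s
        sum_cross += inc * b_s
    mean = sum_inc / n_paths
    var = sum_inc2 / n_paths - mean ** 2
    cov = sum_cross / n_paths - mean * (sum_bs / n_paths)
    return mean, var, cov
```

Calling `increment_stats()` should return a mean near 0, a variance near t - s = 1.5, and a covariance near 0. Because the simulated steps are independent by construction, this is a consistency check of the definition rather than a proof.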

Note also that the filtration $\Im_t$ is an integral
part of the above definition of the Brownian
motion. This does not mean that, given
any probability space and any filtration, a 
standard Brownian motion with these char¬ 
acteristics exists. For instance, the filtration 
generated by a discrete-time continuous-state 
random walk is insufficient to support a Brow¬ 
nian motion. The definition states only that we 
call a one-dimensional standard Brownian mo¬ 
tion a mathematical object (if it exists) made up 
of a probability space, a filtration, and a time 
dependent random variable with the properties 
specified in the definition. 

However, it can be demonstrated that Brown¬ 
ian motions exist by constructing them. Sev¬ 
eral construction methodologies have been 
proposed, including methodologies based on 
the Kolmogorov extension theorem or on con¬ 
structing the Brownian motion as the limit 
of a sequence of discrete random walks. To 
prove the existence of the standard Brownian 
motion, we will use the Kolmogorov extension 
theorem. 

The Kolmogorov theorem can be summarized
as follows. Consider the following family of
probability measures

$$\mu_{t_1,\dots,t_k}(H_1 \times \dots \times H_k) = P[(X_{t_1} \in H_1, \dots, X_{t_k} \in H_k),\ H_i \in \mathcal{B}^n]$$

for all $t_1, \dots, t_k \in [0, \infty)$, $k \in N$ and where the
$H$s are $n$-dimensional Borel sets. Suppose that
the following two consistency conditions are
satisfied

$$\mu_{t_{\sigma(1)},\dots,t_{\sigma(k)}}(H_1 \times \dots \times H_k) = \mu_{t_1,\dots,t_k}(H_{\sigma^{-1}(1)} \times \dots \times H_{\sigma^{-1}(k)})$$

for all permutations $\sigma$ on $\{1, 2, \dots, k\}$, and

$$\mu_{t_1,\dots,t_k}(H_1 \times \dots \times H_k) = \mu_{t_1,\dots,t_k,t_{k+1},\dots,t_{k+m}}(H_1 \times \dots \times H_k \times R^n \times \dots \times R^n)$$

for all $m$. The Kolmogorov extension theorem
states that, if the above conditions are satisfied,
then there is (1) a probability space $(\Omega, \Im, P)$ and
(2) a stochastic process that admits the probability
measures

$$\mu_{t_1,\dots,t_k}(H_1 \times \dots \times H_k) = P[(X_{t_1} \in H_1, \dots, X_{t_k} \in H_k),\ H_i \in \mathcal{B}^n]$$

as finite dimensional distributions.

The construction is lengthy and technical and
we omit it here, but it should be clear how, with
an appropriate selection of finite-dimensional
distributions, the Kolmogorov extension theorem
can be used to prove the existence of
Brownian motions. The finite-dimensional distributions
of a one-dimensional Brownian motion
are distributions of the type

$$\mu_{t_1,\dots,t_k}(H_1 \times \dots \times H_k) = \int_{H_1 \times \dots \times H_k} p(t_1, x, x_1)\, p(t_2 - t_1, x_1, x_2) \cdots p(t_k - t_{k-1}, x_{k-1}, x_k)\, dx_1 \dots dx_k$$

where

$$p(t, x, y) = (2\pi t)^{-1/2} \exp\left(-\frac{|x - y|^2}{2t}\right)$$

and with the convention that the integrals are
taken with respect to the Lebesgue measure.
The distribution $p(t_1, x, x_1)$ in the integral is the
initial distribution. If the process starts at zero,
$p(t, x, x_1)$ is a Dirac delta, that is, it is a distribution
of mass 1 concentrated in one point.
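
The second consistency condition amounts to saying that integrating out an intermediate coordinate over the whole real line leaves a density of the same family. For the Gaussian kernel this can be checked numerically, as in the sketch below. The helper names, the integration window, and the sample arguments are illustrative assumptions.

```python
import math

def p(t, x, y):
    """Gaussian transition density of a one-dimensional Brownian motion."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def marginalize(s, t, x, y, lo=-20.0, hi=20.0, n=4000):
    """Trapezoid-rule value of the integral of p(s, x, z) p(t, z, y) over z."""
    dz = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        z = lo + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * p(s, x, z) * p(t, z, y)
    return total * dz

# Integrating out the state at the intermediate time should give p(s + t, x, y).
lhs = marginalize(1.0, 2.0, 0.0, 0.5)
rhs = p(3.0, 0.0, 0.5)
```

The two numbers should agree up to the quadrature error, which is the property that lets the finite-dimensional distributions fit together consistently.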

It can be verified that these distributions 
satisfy the above consistency conditions; the 
Kolmogorov extension theorem therefore en¬ 
sures that a stochastic process with the above 
finite dimensional distributions exists. It can 
be demonstrated that this process has nor¬ 
mally distributed independent increments with 
variance that grows linearly with time. It is 


therefore a one-dimensional Brownian motion. 
These definitions can be easily extended to an 
n-dimensional Brownian motion. 

In the initial definition of a Brownian motion,
we assumed that a filtration $\Im_t$ was given and
that the Brownian motion was adapted to the
filtration. In the present construction, however,
we reverse this process. Given that the process
we construct has normally distributed, stationary,
independent increments, we can define
the filtration $\Im_t$ as the filtration $\Im_t^B$ generated
by $B_t(\omega)$. The independence of the increments
of the Brownian motion guarantees the absence
of anticipation of information. Note that if we
were given a filtration $\Im_t$ larger than the filtration
$\Im_t^B$, $B_t(\omega)$ would still be a Brownian motion
with respect to $\Im_t$.

In the theory of stochastic differential equations,
there are two types of solutions (strong and weak),
depending on whether the filtration is given or
generated by the Brownian motion. The implications of
these differences for economics and finance are
discussed in the treatment of stochastic differential
equations.

The above construction does not specify 
uniquely the Brownian motion. In fact, there 
are infinitely many stochastic processes that start from
the same point and have the same finite di¬ 
mensional distributions but have totally dif¬ 
ferent paths. However, it can be demonstrated 
that only one Brownian motion has continuous 
paths a.s. ( a.s. means almost surely; that is, for 
all paths except a set of measure zero). This pro¬ 
cess is called the canonical Brownian motion. Its 
paths can be identified with the space of con¬ 
tinuous functions. 

The Brownian motion can also be constructed
as the continuous limit of a discrete random
walk. Consider a simple random walk $W_i$, where
$i$ are discrete time points. The random walk
is the motion of a point that moves $\Delta x$ to the
right or to the left with equal probability 1/2 at
each time increment $\Delta t$. The total displacement
$X_i$ at time $i$ is the sum of $i$ independent increments
each distributed as a Bernoulli variable.
Therefore the random variable $X_i$ has a binomial
distribution with mean zero and variance $i\,(\Delta x)^2$,
that is, a variance per unit of time equal to:

$$\frac{(\Delta x)^2}{\Delta t}$$

Suppose that both the time increment and the
space increment approach zero: $\Delta t \to 0$ and
$\Delta x \to 0$. Note that this is a very informal statement.
In fact what we mean is that we can
construct a sequence of random walk processes
$W^n$, each characterized by a time step and by
a space displacement. It can be demonstrated
that if

$$\frac{(\Delta x)^2}{\Delta t} \to \sigma$$

(i.e., the square of the space interval and the
time interval are of the same order) then the
sequence of random walks approaches a Brownian
motion. Though this is intuitive as the
binomial distributions approach normal distributions,
it should be clear that it is far from
being mathematically obvious.
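
The scaling condition can be explored with a small simulation. The sketch below makes one specific, assumed choice, namely a space step equal to the square root of the time step, so that the ratio of the squared space step to the time step is exactly 1 (the standard case). It then estimates the variance of the walk at time 1 for several step counts. Names, path counts, and the seed are hypothetical.

```python
import math
import random

def scaled_walk_endpoint(n_steps, horizon, rng):
    """Position at the horizon of a walk with steps of size sqrt(dt)."""
    dt = horizon / n_steps
    dx = math.sqrt(dt)  # keeps (dx)^2 / dt equal to 1
    pos = 0.0
    for _ in range(n_steps):
        pos += dx if rng.random() < 0.5 else -dx
    return pos

def endpoint_variance(n_steps, horizon=1.0, n_paths=4000, seed=5):
    """Sample variance of the endpoint over n_paths independent walks."""
    rng = random.Random(seed)
    xs = [scaled_walk_endpoint(n_steps, horizon, rng) for _ in range(n_paths)]
    mean = sum(xs) / n_paths
    return sum((x - mean) ** 2 for x in xs) / n_paths

variances = {n: endpoint_variance(n) for n in (16, 64, 256)}
```

Under this scaling the estimated variance at time 1 should stay near 1 however fine the grid is, in line with a limit whose variance grows linearly with time.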

Figure 1 illustrates 100 realizations of a Brow¬ 
nian motion approximated as a random walk. 
The exhibit clearly illustrates that the standard 


deviation grows with the square root of the time 
as the variance grows linearly with time. In 
fact, as illustrated, most paths remain confined 
within a parabolic region. 


PROPERTIES OF BROWNIAN 
MOTION 

The paths of a Brownian motion are rich struc¬ 
tures with a number of surprising properties. It 
can be demonstrated that the paths of a canon¬ 
ical Brownian motion, though continuous, are 
nowhere differentiable. It can also be demonstrated
that they are fractals of fractal dimension
3/2. The fractal dimension is a concept that
measures quantitatively how a geometric ob¬ 
ject occupies space. A straight line has fractal 
dimension one, a plane has fractal dimension 
two, and so on. Fractal objects might also have 
intermediate dimensions. This is the case, for 
example, of the path of a Brownian motion, 
which is so jagged that, in a sense, it occupies 
more space than a straight line. 


Figure 1 Illustration of 100 Paths of a Brownian Motion Generated as an Arithmetic Random Walk

Figure 2 Illustration of the Fractal Properties of the Paths of a Brownian Motion
Note: Five paths of a Brownian motion are generated as random walks with different time steps and then magnified.


The fractal nature of Brownian motion paths 
implies that each path is a self-similar object. 
This property can be illustrated graphically. If 
we generate random walks with different time 
steps, we obtain jagged paths. If we allow paths 
to be graphically magnified, all paths look alike 
regardless of the time step with which they have 
been generated. In Figure 2, sample paths are 
generated with different time steps and then 
portions of the paths are magnified. Note that 
they all look perfectly similar. 

This property was first observed by Mandel¬ 
brot (1963) in sequences of cotton prices in the 
1960s. In general, if one looks at asset or com¬ 
modity price time series, it is difficult to rec¬ 
ognize their time scale. For instance, weekly or 
monthly time series look alike. (Recent empiri¬ 
cal and theoretical research work has made this 
claim more precise.) 

Let's consider a one-dimensional standard 
Brownian motion. If we wait a sufficiently long 
period of time, every path except a set of 
paths of measure zero will return to the ori¬ 


gin. The path between two consecutive pas¬ 
sages through zero is called an excursion of the 
Brownian motion. The distribution of the max¬ 
imum height attained by an excursion and of 
the time between two passages through zero 
or through any level have interesting proper¬ 
ties. The distribution of the time between two 
passages through zero has infinite mean. This 
is at the origin of the so-called St. Petersburg 
paradox described by the Swiss mathematician 
Bernoulli. The paradox consists of the follow¬ 
ing. Suppose a player bets increasing sums on 
a game that can be considered a realization of 
a random walk. As the return to zero of a ran¬ 
dom walk is a sure event, the player is certain 
to win—but while the probability of winning is 
one, the average time before winning is infinite. 
To stay in the game, the capital required is also
infinite. It is difficult to imagine a banker ready to put
up the money to back the player.

The distribution of the time to the first pas¬ 
sage through zero of a Brownian motion is not 
Gaussian. In fact, the probability of a very long 
waiting time before the first return to zero is
much higher than in a normal distribution. It is 
a fat-tailed distribution in the sense that it has 
more weight in the tail regions than a normal 
distribution. The distribution of the time to the 
first passage through zero of a Brownian motion 
is an example of how fat-tailed distributions can 
be generated from Gaussian variables. 
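
The slow decay of the waiting time can be observed on the discrete random walk. For a simple symmetric walk, a classical identity states that the probability of no return to zero within 2n steps equals the probability of being at zero at step 2n, which decays only like the inverse square root of n. The sketch below compares a Monte Carlo estimate with that exact value for 100 steps. Function names and simulation sizes are illustrative assumptions.

```python
import math
import random

def no_return_fraction(max_steps=100, n_paths=20000, seed=6):
    """Fraction of simple random walks with no return to zero within max_steps."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_paths):
        pos = 0
        for _ in range(max_steps):
            pos += 1 if rng.random() < 0.5 else -1
            if pos == 0:
                break  # the walk returned to the origin
        else:
            survivors += 1  # the loop ended without any return
    return survivors / n_paths

estimate = no_return_fraction()
# Exact value: P(no return by step 100) = P(S_100 = 0) = C(100, 50) / 2^100.
exact = math.comb(100, 50) / 2 ** 100
```

Roughly 8% of the walks have still not returned after 100 steps, far more than a Gaussian tail would suggest for an event this many steps out.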


STOCHASTIC INTEGRALS 
DEFINED 

Let's now go back to the definition of stochastic
integrals, starting with one-dimensional
stochastic integrals. Suppose that a probability
space $(\Omega, \Im, P)$ equipped with a filtration
$\Im_t$ is given. Suppose also that a Brownian motion
$B_t(\omega)$ adapted to the filtration $\Im_t$ is given.
We will define Ito integrals following the three-step
procedure outlined earlier in this entry.
We have just completed the first step defining
Brownian motion. The second step consists
in defining the Ito integral for elementary
functions.

Let's first define the set $\Phi(S, T)$ of functions
$\Phi(S, T) = \{f(t,\omega): [0, \infty) \times \Omega \to R\}$ with the
following properties:

• Each $f$ is jointly $\mathcal{B} \times \Im$ measurable.

• Each $f(t,\omega)$ is adapted to $\Im_t$.

• $E\left[\int_S^T f^2(t,\omega)\,dt\right] < \infty$ (this condition
can be weakened).

This is the set of functions for which we define the
Ito integral.

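Consider the time interval $[S,T]$ and, for each
integer $n$, partition the interval $[S,T]$ in subintervals
$t_0 < t_1 < \dots < t_i < \dots < t_N = T$ in this way:

$$t_k = t_k^{(n)} = \begin{cases} k2^{-n} & \text{if } S \le k2^{-n} \le T \\ S & \text{if } k2^{-n} < S \\ T & \text{if } k2^{-n} > T \end{cases}$$

Consider the elementary functions $\phi(t,\omega) \in \Phi$
which we write as

$$\phi(t,\omega) = \sum_i \varepsilon_i(\omega)\, I_{[t_i, t_{i+1})}(t)$$

As $\phi(t,\omega) \in \Phi$, the $\varepsilon_i(\omega)$ are $\Im_{t_i}$ measurable random
variables.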
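
The dyadic partition rule is easy to implement. The sketch below is a minimal illustration assuming 0 ≤ S < T; the function name is hypothetical. It lists the distinct partition points for a given n by clamping the dyadic points k2^{-n} to the interval.

```python
def dyadic_partition(s, t, n):
    """Distinct points t_k = clamp(k * 2**-n, s, t) for k = 0, 1, 2, ...

    Assumes 0 <= s < t. The endpoints s and t are always included.
    """
    step = 2.0 ** -n
    points = []
    k = 0
    while True:
        tk = min(max(k * step, s), t)  # clamp the dyadic point into [s, t]
        if not points or tk > points[-1]:
            points.append(tk)
        if k * step >= t:
            break
        k += 1
    return points

coarse = dyadic_partition(0.3, 1.0, 2)  # [0.3, 0.5, 0.75, 1.0]
fine = dyadic_partition(0.3, 1.0, 3)
```

Every point of the level-n partition is also a point of the level-(n+1) partition, so increasing n refines the partition without discarding earlier points.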

We can now define the stochastic integral, in
the sense of Ito, of elementary functions $\phi(t,\omega)$
as

$$W = \int_S^T \phi(t,\omega)\,dB_t(\omega) = \sum_i \varepsilon_i(\omega)\,[B_{t_{i+1}}(\omega) - B_{t_i}(\omega)]$$

where $B$ is a Brownian motion. Note that the
$\varepsilon_i(\omega)$ and the increments $B_{t_j}(\omega) - B_{t_i}(\omega)$ are
independent for $j > i$. The key aspect of this definition
that was not included in the informal
outline is the condition that the $\varepsilon_i(\omega)$ are $\Im_{t_i}$
measurable.

For bounded elementary functions $\phi(t,\omega) \in \Phi$
the Ito isometry holds:

$$E\left[\left(\int_S^T \phi(t,\omega)\,dB_t(\omega)\right)^2\right] = E\left[\int_S^T \phi(t,\omega)^2\,dt\right]$$
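
The Ito isometry for a bounded elementary function can be checked by Monte Carlo. The sketch below is an illustration under assumptions that are not in the text: the interval is [0, 1] with eight equal subintervals, and the constants are chosen as the Brownian path at the left endpoint clipped to [-1, 1]. This is one arbitrary bounded, non-anticipating choice.

```python
import math
import random

def ito_integral_elementary(n_steps=8, horizon=1.0, rng=random):
    """One draw of W = sum_i eps_i (B_{t_{i+1}} - B_{t_i}) and of int phi^2 dt.

    eps_i = clip(B_{t_i}, -1, 1) is an illustrative bounded choice that uses
    only the path up to t_i (no anticipation).
    """
    dt = horizon / n_steps
    b = 0.0
    w = 0.0
    phi_sq_dt = 0.0
    for _ in range(n_steps):
        eps = max(-1.0, min(1.0, b))  # known at the left endpoint t_i
        db = rng.gauss(0.0, math.sqrt(dt))
        w += eps * db
        phi_sq_dt += eps * eps * dt
        b += db
    return w, phi_sq_dt

def isometry_check(n_paths=20000, seed=2):
    """Sample averages of W^2 and of int phi^2 dt over n_paths paths."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n_paths):
        w, q = ito_integral_elementary(rng=rng)
        lhs += w * w
        rhs += q
    return lhs / n_paths, rhs / n_paths
```

The two averages returned by `isometry_check()` should agree up to sampling error. If the constants were allowed to peek at the increment they multiply, the equality would generally fail, which is why the measurability condition matters.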


The demonstration of the Ito isometry rests
on the fact that

$$E[\varepsilon_i \varepsilon_j (B_{t_{i+1}} - B_{t_i})(B_{t_{j+1}} - B_{t_j})] = \begin{cases} 0 & \text{if } i \ne j \\ E(\varepsilon_i^2)\,(t_{i+1} - t_i) & \text{if } i = j \end{cases}$$

This completes the definition of the stochastic
integral for elementary functions.

We have now completed the introduction of
Brownian motions and defined the Ito integral
for elementary functions. Let's next introduce
the approximation procedure that allows us to
define the stochastic integral for any $f(t,\omega) \in \Phi$. We
will develop the approximation procedure in
the following three additional steps that we will
state without demonstration. The partition rule
given above provides a family of partitions of the
interval $[S,T]$ which can be arbitrarily refined.

• Step 1. Any function $g(t,\omega) \in \Phi$ that is bounded
and such that all its time paths $g(\cdot, \omega)$ are
continuous functions of time can be approximated by

$$\phi_n(t,\omega) = \sum_i g(t_i,\omega)\, I_{[t_i, t_{i+1})}(t)$$

in the sense that:

$$E\left[\int_S^T (g - \phi_n)^2\,dt\right] \to 0, \quad n \to \infty$$

where the intervals are those of the partition
defined above. Note that $\phi_n(t,\omega) \in \Phi$ given
that $g(t,\omega) \in \Phi$.

• Step 2. We release the condition of time-path
continuity. It can be demonstrated that any
function $h(t,\omega) \in \Phi$ which
is bounded but not necessarily continuous
can be approximated by functions $g_n(t,\omega) \in \Phi$,
which are bounded and continuous, in the
sense that

$$E\left[\int_S^T (h - g_n)^2\,dt\right] \to 0$$

• Step 3. It can be demonstrated that any function
$f(t,\omega) \in \Phi$, not necessarily bounded or
continuous, can be approximated by a sequence
of bounded functions $h_n(t,\omega) \in \Phi$ in
the sense that

$$E\left[\int_S^T (f - h_n)^2\,dt\right] \to 0$$

We now have all the building blocks to complete
the definition of Ito stochastic integrals. In
fact, by virtue of the above three-step approximation
procedure, given any function $f(t,\omega) \in \Phi$,
we can choose a sequence of elementary
functions $\phi_n(t,\omega) \in \Phi$ such that the following
property holds:

$$E\left[\int_S^T (f - \phi_n)^2\,dt\right] \to 0$$

Hence we can define the Ito stochastic integral
as follows:

$$I[f](\omega) = \int_S^T f(t,\omega)\,dB_t(\omega) = \lim_{n\to\infty} \int_S^T \phi_n(t,\omega)\,dB_t(\omega)$$

The limit exists as

$$\int_S^T \phi_n(t,\omega)\,dB_t(\omega)$$

forms a Cauchy sequence by the Ito isometry,
which holds for every bounded elementary
function.

Let's now summarize the definition of the Ito
stochastic integral: Given any function $f(t,\omega) \in \Phi$,
we define the Ito stochastic integral by

$$I[f](\omega) = \int_S^T f(t,\omega)\,dB_t(\omega) = \lim_{n\to\infty} \int_S^T \phi_n(t,\omega)\,dB_t(\omega)$$

where the functions $\phi_n(t,\omega) \in \Phi$ are a sequence
of elementary functions such that

$$E\left[\int_S^T (f - \phi_n)^2\,dt\right] \to 0$$

The multistep procedure outlined above ensures
that the sequence $\phi_n(t,\omega) \in \Phi$ exists. In
addition, it can be demonstrated that the Ito
isometry holds in general for every $f(t,\omega) \in \Phi$:

$$E\left[\left(\int_S^T f(t,\omega)\,dB_t(\omega)\right)^2\right] = E\left[\int_S^T f(t,\omega)^2\,dt\right]$$


SOME PROPERTIES OF ITO 
STOCHASTIC INTEGRALS 

Suppose that $f, g \in \Phi(S, T)$ and let $0 \le S < U < T$. It can be demonstrated that the following properties of Ito stochastic integrals hold:

$$\int_S^T f\,dB_t = \int_S^U f\,dB_t + \int_U^T f\,dB_t \quad \text{for a.a. } \omega$$

$$E\left[\int_S^T f\,dB_t\right] = 0$$

$$\int_S^T (cf + dg)\,dB_t = c\int_S^T f\,dB_t + d\int_S^T g\,dB_t \quad \text{for a.a. } \omega, \; c, d \text{ constants}$$

If we let the time interval vary, say $(0, t)$, then the stochastic integral becomes a stochastic process:

$$I_t(\omega) = \int_0^t f\,dB_s$$

It can be demonstrated that a continuous version of this process exists. The following three properties can be demonstrated from the definition of the integral:

$$\int_0^t dB_s = B_t$$

$$\int_0^t s\,dB_s = tB_t - \int_0^t B_s\,ds$$

$$\int_0^t B_s\,dB_s = \frac{1}{2}B_t^2 - \frac{1}{2}t$$

The last two properties show that, after performing stochastic integration, deterministic terms might appear.
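The deterministic correction in the integral of $B_s$ against $dB_s$, which equals $\frac{1}{2}B_t^2 - \frac{1}{2}t$, can be seen on a single simulated path. The sketch below is an illustration only: it assumes a uniform grid on [0, 1] and forms the sum with the integrand evaluated at the left endpoint of each interval, as in the Ito definition. The function name `left_point_integral` and the step count are hypothetical choices.

```python
import random


def left_point_integral(n_steps=20000, T=1.0, seed=1):
    """Return the left-endpoint sum of B dB on one simulated path,
    together with the closed form B_T^2 / 2 - T / 2 for the same path."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = dt ** 0.5
    b = 0.0
    ito_sum = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        ito_sum += b * db   # integrand taken at the start of the interval
        b += db
    closed_form = 0.5 * b * b - 0.5 * T
    return ito_sum, closed_form


ito_sum, closed_form = left_point_integral()
print("left-endpoint sum:", ito_sum)
print("B_T^2/2 - T/2:   ", closed_form)
```

The gap between the two numbers equals half the difference between $T$ and the sum of squared increments, which shrinks as the grid is refined. The ordinary calculus answer $\frac{1}{2}B_T^2$ would be off by roughly $T/2$.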


KEY POINTS 

* Stochastic integration provides a coherent way to represent how instantaneous uncertainty (or volatility) cumulates over time. It is thus fundamental to the representation of financial processes such as interest rates, security prices, or cash flows as well as aggregate quantities such as economic output.

* Stochastic integration operates on stochastic processes and produces random variables or other stochastic processes.

* Stochastic integration is a process defined on each path as the limit of a sum. However, these sums are different from the sums of the Riemann-Lebesgue integrals because the paths of stochastic processes are generally not of bounded variation.

* Stochastic integrals in the sense of Ito are defined through a process of approximation.

* Step 1 consists in defining Brownian motion, which is the continuous limit of a random walk.

* Step 2 consists in defining stochastic integrals for elementary functions as the sums of the products of the elementary functions multiplied by the increments of the Brownian motion.

* Step 3 extends this definition to any function through approximating sequences.


NOTES 

1. The publications of Stratonovich can be found in Romanovski (2007).

2. A history of stochastic integration and financial mathematics is provided by Jarrow and Protter (2004). For a more detailed discussion of stochastic integration, see Protter (1990).

3. The set of conditions defining a Brownian motion can be more parsimonious. If a process has stationary, independent increments and continuous paths a.s., it must have normally distributed increments. A process with stationary independent increments and with paths that are continuous to the right and limited to the left (the càdlàg functions) is
called a Lévy process.

Abstract: In nontechnical terms, differential equations are equations that express a relationship between a function and one or more derivatives (or differentials) of that function. It would be difficult to overemphasize the importance of differential equations in financial modeling, where they are used to express laws that govern the evolution of price probability distributions, the solution of economic variational problems (such as intertemporal optimization), and conditions for continuous hedging (such as in the Black-Scholes equation). The two broad types of differential equations are ordinary differential equations and partial differential equations. The former are equations or systems of equations involving only one independent variable; the latter are differential equations or systems of equations involving partial derivatives. When one or more of the variables is a stochastic process, we have the case of stochastic differential equations and the solution is also a stochastic process. An assumption must be made about the driving noise in a stochastic differential equation. In most applications, it is assumed that the noise term is Gaussian, although other types of random variables can be assumed.


Stochastic differential equations solve the problem of giving meaning to a differential equation where one or more of its terms are subject to random fluctuations. For instance, consider the following deterministic equation:

$$\frac{dy}{dt} = f(t)\,y$$

We know from the theory of differential equations that, by separating variables, the general solution of this equation can be written as follows:

$$y = A \exp\left[\int f(t)\,dt\right]$$

A stochastic version of this equation might be obtained, for instance, by perturbing the term $f$, thus resulting in the stochastic differential equation

$$\frac{dy}{y} = [f(t) + \varepsilon]\,dt$$

where $\varepsilon$ is a random noise process.

As with stochastic integrals, in defining stochastic differential equations it is necessary to adopt an ensemble view: The solution of a stochastic differential equation is a stochastic process, not a single function. In this entry, we first provide the basic intuition behind stochastic differential equations and then proceed to formally define the concept and the properties.

THE INTUITION BEHIND 
STOCHASTIC DIFFERENTIAL 
EQUATIONS 

Let's go back to the equation

$$\frac{dy}{dt} = [f(t) + \varepsilon]\,y$$

where $\varepsilon$ is a continuous-time noise process. It would seem reasonable to define a continuous-time noise process informally as the continuous-time limit of a zero-mean, IID sequence, that is, a sequence of independent and identically distributed variables with zero mean. In a discrete time setting, a zero-mean, IID sequence is called a white noise. We could envisage defining a continuous-time white noise as the continuous-time limit of a discrete-time white noise. Each path of $\varepsilon$ is a function of time $\varepsilon(\cdot,\omega)$. It would therefore seem reasonable to define the solution of the equation pathwise, as the family of functions that are solutions of the equations

$$\frac{dy}{dt} = [f(t) + \varepsilon(t,\omega)]\,y$$

where each equation corresponds to a specific white noise path.

However, this definition would be meaningless in the domain of ordinary functions. In other words, it would generally not be possible to find a family of functions $y(\cdot,\omega)$ that satisfy the above equations for each white-noise path and that form a reasonable stochastic process.

The key problem is that it is not possible to define a white noise process as a zero-mean stationary stochastic process with independent increments and continuous paths. Such a process does not exist in the domain of ordinary functions (see Note 1). In discrete time the white noise process is obtained as the first-difference process of a random walk. A random walk is an integrated nonstationary process, while its first-difference process is a stationary IID sequence.

The continuous-time limit of the random walk is the Brownian motion. However, the paths of a Brownian motion are not differentiable. As a consequence, it is not possible to take the continuous-time limit of first differences and to define the white noise process as the derivative of a Brownian motion. In the domain of ordinary functions in continuous time, the white noise process can be defined only through its integral, which is the Brownian motion. The definition of stochastic differential equations must therefore be recast in integral form.

A sensible definition of a stochastic differential equation must respect a number of constraints. In particular, the solution of a stochastic differential equation should be a "perturbation" of the associated deterministic equation. In the above example, for instance, we want the solution of the stochastic equation

$$\frac{dy}{y} = [f(t) + \varepsilon(t,\omega)]\,dt$$

to be a perturbation of the solution

$$y = A \exp\left[\int f(t)\,dt\right]$$

of the associated deterministic equation

$$\frac{dy}{y} = f(t)\,dt$$

In other words, the solution of a stochastic differential equation should tend to the solution of the associated deterministic equation in the limit of zero noise. In addition, the solutions of a stochastic differential equation should be the continuous-time limit of some discrete-time process obtained by discretization of the stochastic equation.

A formal solution of this problem was proposed by Kiyoshi Ito (1951) and, in a different setting, by Ruslan Stratonovich in the 1960s. Ito and Stratonovich proposed to give meaning to a stochastic differential equation through its integral equivalent. The Ito definition proceeds in two steps: In the first step, Ito processes are defined; in the second step, stochastic differential equations are defined.


• Step 1: Definition of Ito processes. Given two functions $\varphi(t,\omega)$ and $\psi(t,\omega)$ that satisfy usual conditions to be defined later, an Ito process (also called a stochastic integral) is a stochastic process of the form:

$$Z(t,\omega) = \int_0^t \varphi(s,\omega)\,ds + \int_0^t \psi(s,\omega)\,dB_s(\omega)$$

An Ito process is the sum of two summands: The first is an ordinary integral, the second an Ito integral. Ito processes are stable under smooth maps, that is, any smooth function of an Ito process is an Ito process that can be determined through the Ito formula (see Ito processes below).

• Step 2: Definition of stochastic differential equations. As we have seen, it is not possible to write a differential equation plus a white-noise term that admits solutions in the domain of ordinary functions. However, we can meaningfully write an integral stochastic equation of the form

$$X(t,\omega) = \int_0^t \varphi(s, X)\,ds + \int_0^t \psi(s, X)\,dB_s$$

It can be demonstrated that this equation admits solutions in the sense that, given two functions $\varphi$ and $\psi$, there is a stochastic process $X$ that satisfies the above equation. We stipulate that the above integral equation can be written in differential form as follows:

$$dX(t,\omega) = \varphi(t, X)\,dt + \psi(t, X)\,dB_t$$


Note that this is a definition; a stochastic differential equation acquires meaning only through its integral form. In particular, we cannot divide both terms by $dt$ and rewrite the equation as follows:

$$\frac{dX(t,\omega)}{dt} = \varphi(t, X) + \psi(t, X)\frac{dB_t}{dt}$$

The above equation would be meaningless because the Brownian motion is not differentiable. This is the difficulty that precludes writing stochastic differential equations by adding white noise pathwise. The differential notation of a stochastic differential equation is just a shorthand for the integral notation.

However, we can consider a discrete approximation:

$$\Delta X(t,\omega) = \varphi^*(t, X)\,\Delta t + \psi^*(t, X)\,\Delta B_t$$

Note that in this approximation the functions $\varphi^*(t, X)$, $\psi^*(t, X)$ will not coincide with the functions $\varphi(t, X)$, $\psi(t, X)$. Using the latter would (in general) result in a poor approximation.
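To make the discrete approximation concrete, the sketch below takes the simplest possible choice, which is an assumption of this example rather than a recommendation of the text: the discrete coefficients are set equal to the continuous ones (the scheme usually called Euler-Maruyama). It is applied to the geometric Brownian motion treated later in this entry, whose lognormal closed form gives a pathwise benchmark. The function name `euler_maruyama_gbm` and all parameter values are hypothetical.

```python
import math
import random


def euler_maruyama_gbm(mu=0.05, sigma=0.2, x0=1.0, T=1.0, n_steps=10000, seed=2):
    """Step dX = mu*X*dt + sigma*X*dB with the plain Euler-Maruyama rule and
    return the terminal value together with the closed-form value obtained
    from the same Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    x = x0
    b = 0.0  # Brownian motion accumulated along the path
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        x += mu * x * dt + sigma * x * db   # Delta X = phi*dt + psi*dB
        b += db
    exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * b)
    return x, exact


approx, exact = euler_maruyama_gbm()
print("Euler-Maruyama:", approx)
print("closed form:   ", exact)
```

With a fine grid the two values are close, but the error of this naive scheme decays only slowly as the step shrinks. This is one way to read the text's warning that reusing the continuous-time coefficients unchanged gives a comparatively poor approximation.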

The following sections will define Ito processes and stochastic differential equations and study their properties.


ITO PROCESSES 

Let's now formally define Ito processes and establish key properties, in particular the Ito formula. In the previous section we stated that an Ito process is a stochastic process of the form

$$Z(t,\omega) = \int_0^t a(s,\omega)\,ds + \int_0^t b(s,\omega)\,dB_s(\omega)$$

To make this definition rigorous, we have to state the conditions under which (1) the integrals exist, and (2) there is no anticipation of information. Note that the two functions $a$ and $b$ might represent two stochastic processes and that the Riemann-Stieltjes integral might not exist for the paths of a stochastic process. We have therefore to demonstrate that both the Ito integral and the ordinary integral exist. To this end, we define Ito processes as follows.

Suppose that a one-dimensional Brownian motion $B_t$ is defined on a probability space $(\Omega, \Im, P)$ equipped with a filtration $\Im_t$. The filtration might be given or might be generated by the Brownian motion $B_t$. Suppose that both $a$ and $b$ are adapted to $\Im_t$ and jointly measurable in $\Im \times \Re$. Suppose, in addition, that the following two integrability conditions hold:

$$P\left[\int_0^t b^2(s,\omega)\,ds < \infty \text{ for all } t \ge 0\right] = 1$$

and

$$P\left[\int_0^t |a(s,\omega)|\,ds < \infty \text{ for all } t \ge 0\right] = 1$$


These conditions ensure that both integrals in the definition of Ito processes exist and that there is no anticipation of information. We can therefore define the Ito process as the following stochastic process:

$$Z(t,\omega) = \int_0^t a(s,\omega)\,ds + \int_0^t b(s,\omega)\,dB_s(\omega)$$

Ito processes can be written in the shorter differential form as

$$dZ_t = a\,dt + b\,dB_t$$

It should be clear that the latter formula is just a shorthand for the integral definition.


THE ONE-DIMENSIONAL 
ITO FORMULA 

One of the most important results concerning Ito processes is a formula established by Ito that allows one to explicitly write down an Ito process that is a function of another Ito process. Ito's formula is the stochastic equivalent of the change-of-variables formula of ordinary integration. We will proceed in two steps. First we will introduce Ito's formula for functions of Brownian motion and then for functions of general Ito processes. Suppose that the function $g(t, x)$ is twice continuously differentiable in $[0,\infty) \times R$ and that $B_t$ is a one-dimensional Brownian motion. The function $Y_t = g(t, B_t)$ is a stochastic process. It can be demonstrated that the process $Y_t = g(t, B_t)$ is an Ito process of the following form:

$$dY_t = \frac{\partial g}{\partial t}(t, B_t)\,dt + \frac{\partial g}{\partial x}(t, B_t)\,dB_t + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}(t, B_t)\,dt$$

The preceding is Ito's formula in the case the underlying process is a Brownian motion. For example, let's suppose that $g(t, x) = x^2$. In this case we can write

$$\frac{\partial g}{\partial t} = 0, \quad \frac{\partial g}{\partial x} = 2x, \quad \frac{\partial^2 g}{\partial x^2} = 2$$

Inserting the above in Ito's formula we see that the process $B_t^2$ can be represented as the following Ito process

$$dY_t = dt + 2B_t\,dB_t$$

or, explicitly in integral form

$$Y_t = t + 2\int_0^t B_s\,dB_s$$

The nonlinear map $g(t, x) = x^2$ introduces a second term in $dt$.

Let's now generalize Ito's formula. Suppose that $X_t$ is an Ito process given by $dX_t = a\,dt + b\,dB_t$. As $X_t$ is a stochastic process, that is, a function $X(t,\omega)$ of both time and the state, it makes sense to consider another stochastic process $Y_t$, which is a function of the former, $Y_t = g(t, X_t)$. Suppose that $g$ is twice continuously differentiable on $[0,\infty) \times R$.

It can then be demonstrated (we omit the detailed proof) that $Y_t$ is another Ito process that admits the representation

$$dY_t = \frac{\partial g}{\partial t}(t, X_t)\,dt + \frac{\partial g}{\partial x}(t, X_t)\,dX_t + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}(t, X_t)\,(dX_t)^2$$

where differentials are computed formally according to the rules known as Box algebra

$$dt \cdot dt = dt \cdot dB_t = dB_t \cdot dt = 0, \quad dB_t \cdot dB_t = dt$$

Ito's formula can be written (perhaps more) explicitly as

$$dY_t = \left(\frac{\partial g}{\partial t} + \frac{\partial g}{\partial x}\,a + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}\,b^2\right)dt + \frac{\partial g}{\partial x}\,b\,dB_t$$


This formula reduces to the ordinary formula for the differential of a compound function in the case where $b = 0$ (that is, when there is no noise).

As a second example of application of Ito's formula, consider the geometric Brownian motion:

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$$

where $\mu, \sigma$ are real constants, and consider the map $g(t, x) = \log x$. In this case, we can write

$$\frac{\partial g}{\partial t} = 0, \quad \frac{\partial g}{\partial x} = \frac{1}{x}, \quad \frac{\partial^2 g}{\partial x^2} = -\frac{1}{x^2}$$

and Ito's formula yields

$$dY_t = d\log X_t = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dB_t$$
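The intermediate substitution may be worth spelling out. Taking the explicit form of Ito's formula with drift coefficient $a = \mu X_t$ and diffusion coefficient $b = \sigma X_t$ (the symbols $a$ and $b$ are those of the general Ito process above), the steps are, as a worked sketch:

```latex
\begin{aligned}
dY_t &= \left(\frac{\partial g}{\partial t}
        + \frac{\partial g}{\partial x}\,a
        + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}\,b^2\right)dt
        + \frac{\partial g}{\partial x}\,b\,dB_t \\
     &= \left(0 + \frac{1}{X_t}\,\mu X_t
        - \frac{1}{2}\,\frac{1}{X_t^2}\,\sigma^2 X_t^2\right)dt
        + \frac{1}{X_t}\,\sigma X_t\,dB_t \\
     &= \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dB_t
\end{aligned}
```

The term $-\frac{1}{2}\sigma^2$ comes entirely from the second derivative of the logarithm. Ordinary calculus, which lacks that term, would give a drift of $\mu$.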

STOCHASTIC DIFFERENTIAL 
EQUATIONS 

An Ito process defines a process $Z(t,\omega)$ as the sum of the time integral of the process $a(t,\omega)$ plus the Ito integral of the process $b(t,\omega)$. Suppose that two functions $\varphi(t, x)$, $\psi(t, x)$ that satisfy conditions established below are given. Given an Ito process $X(t,\omega)$, the two processes $\varphi(t, X)$, $\psi(t, X)$ admit respectively a time integral and an Ito integral. It therefore makes sense to consider the following Ito process:

$$Z(t,\omega) = \int_0^t \varphi[s, X(s,\omega)]\,ds + \int_0^t \psi[s, X(s,\omega)]\,dB_s$$

The term on the right side transforms the process $X$ into a new process $Z$. We can now ask if there are stochastic processes $X$ that are mapped into themselves such that the following stochastic equation is satisfied:

$$X(t,\omega) = \int_0^t \varphi[s, X(s,\omega)]\,ds + \int_0^t \psi[s, X(s,\omega)]\,dB_s$$

The answer is positive under appropriate conditions. It is possible to prove the following theorem of existence and uniqueness. Suppose that a one-dimensional Brownian motion $B_t$ is defined on a probability space $(\Omega, \Im, P)$ equipped with a filtration $\Im_t$ and that $B_t$ is adapted to the filtration $\Im_t$. Suppose also that the two measurable functions $\varphi(t, x)$, $\psi(t, x)$ map $[0, T] \times R \to R$ and that they satisfy the following conditions:

$$|\varphi(t,x)|^2 + |\psi(t,x)|^2 \le C(1 + |x|)^2, \quad t \in [0,T], \; x \in R$$

and

$$|\varphi(t,x) - \varphi(t,y)| + |\psi(t,x) - \psi(t,y)| \le D\,|x - y|, \quad t \in [0,T], \; x, y \in R$$

for appropriate constants $C$, $D$. The first condition is known as the linear growth condition; the second is the Lipschitz condition. Suppose that $Z$ is a random variable independent of the $\sigma$-algebra $\Im_\infty$ generated by $B_t$ for $t \ge 0$ such that $E(|Z|^2) < \infty$. Then there is a unique stochastic process $X$, defined for $0 \le t \le T$, with time-continuous paths such that $X_0 = Z$ and such that the following equation is satisfied:

$$X(t,\omega) = X_0 + \int_0^t \varphi[s, X(s,\omega)]\,ds + \int_0^t \psi[s, X(s,\omega)]\,dB_s$$

The process $X$ is called a strong solution of the above equation.

The above equation can be written in differential form as follows:

$$dX(t,\omega) = \varphi[t, X(t,\omega)]\,dt + \psi[t, X(t,\omega)]\,dB_t$$

The differential form does not have an independent meaning; a differential stochastic equation is just a short albeit widely used way to write the integral equation.

The key requirement of a strong solution is that the filtration $\Im_t$ is given and that the functions $\varphi, \psi$ are adapted to the filtration $\Im_t$. From the economic (or physics) point of view, this requirement translates the notion of causality. In simple terms, a strong solution is a functional of the driving Brownian motion and of the "inputs" $\varphi, \psi$. A strong solution at time $t$ is determined only by the "history" up to time $t$ of the inputs and of the random shocks embodied in the Brownian motion.

These conditions can be weakened. Suppose that we are given only the two functions $\varphi(t, x)$, $\psi(t, x)$ and that we must construct a process $X_t$, a Brownian motion $B_t$, and the relative filtration so that the above equation is satisfied. The equation still admits a unique solution with respect to the filtration generated by the Brownian motion $B$. It is, however, only a weak solution in the sense that, though there is no anticipation of information, it is not a functional of a given Brownian motion. (See, for example, Karatzas and Shreve [1991].) Weak and strong solutions do not necessarily coincide. However, any strong solution is also a weak solution with respect to the same filtration.

Note that the solution of a stochastic differential equation is a stochastic process. Initial conditions must therefore be specified as a random variable and not as a single value as for ordinary differential equations. In other words, there is an initial value for each state. It is still possible to specify a single initial value as the initial condition of a stochastic differential equation. In this case the initial condition is a random variable where the probability mass is concentrated in a single point.

We omit the detailed proof of the theorem of uniqueness and existence. Uniqueness is proved using the Ito isometry and the Lipschitz condition. One assumes that there are two different solutions and then demonstrates that their difference must vanish. The proof of existence of a solution is similar to the proof of existence of solutions in the domain of ordinary equations. The solution is constructed inductively by a recursive relationship of the type

$$X^{(k+1)}(t,\omega) = X_0 + \int_0^t \varphi[s, X^{(k)}(s,\omega)]\,ds + \int_0^t \psi[s, X^{(k)}(s,\omega)]\,dB_s$$

It can be shown that this recursive relationship produces a sequence of processes that converge to the unique solution.


GENERALIZATION TO 
SEVERAL DIMENSIONS 

The concepts and formulas established so far for Ito (and Stratonovich) integrals and processes can be extended in a straightforward but often cumbersome way to multiple variables. The first step is to define a d-dimensional Brownian motion.

Given a probability space $(\Omega, \Im, P)$ equipped with a filtration $\{\Im_t\}$, a d-dimensional standard Brownian motion $B_t(\omega)$ is a stochastic process with the following properties:

• $B_t(\omega)$ is a d-dimensional process defined over the probability space $(\Omega, \Im, P)$ that takes values in $R^d$.

• $B_t(\omega)$ has continuous paths for $0 \le t < \infty$.

• $B_0(\omega) = 0$.

• $B_t(\omega)$ is adapted to the filtration $\Im_t$.

• The increments $B_t(\omega) - B_s(\omega)$ are independent of the $\sigma$-algebra $\Im_s$ and have a normal distribution with mean zero and covariance matrix $(t - s)I_d$, where $I_d$ is the identity matrix.

The above conditions state that the standard Brownian motion is a stochastic process that starts at zero, has continuous paths, and has normally distributed increments whose variances grow linearly with time.
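The covariance condition on the increments can be illustrated by sampling. The sketch below is only an illustration under stated assumptions: it draws the components of each increment as independent normal variables with variance $t - s$, which is what a covariance matrix of $(t - s)I_d$ amounts to, and then estimates the covariance matrix from the sample. The function name `sample_increment_covariance` and the values of `d`, `s`, and `t` are hypothetical.

```python
import random


def sample_increment_covariance(d=2, s=0.5, t=2.0, n_samples=20000, seed=3):
    """Estimate the covariance matrix of B_t - B_s for a d-dimensional
    standard Brownian motion by sampling its increments directly."""
    rng = random.Random(seed)
    sd = (t - s) ** 0.5
    cov = [[0.0] * d for _ in range(d)]
    for _ in range(n_samples):
        # independent N(0, t - s) components, one per dimension
        inc = [rng.gauss(0.0, sd) for _ in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += inc[i] * inc[j]   # mean is zero, so no centering
    return [[c / n_samples for c in row] for row in cov]


cov = sample_increment_covariance()
for row in cov:
    print(row)
```

The diagonal entries should be close to $t - s$ and the off-diagonal entries close to zero, matching the identity-matrix structure stated in the last property.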

The next step is to extend the definition of the Ito integral in a multidimensional environment. This is again a straightforward but cumbersome extension of the one-dimensional case. Suppose that the following $r \times d$-dimensional matrix is given:

$$v = \begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & & \vdots \\ v_{r1} & \cdots & v_{rd} \end{pmatrix}$$

where each entry $v_{ij} = v_{ij}(t,\omega)$ satisfies the following conditions:

1. $v_{ij}$ are $\mathcal{B}^d \times \Im$ measurable.

2. $v_{ij}$ are $\Im_t$-adapted.

3. $P\left[\int_0^t (v_{ij})^2\,ds < \infty \text{ for all } t \ge 0\right] = 1$.

Then, we define the multidimensional Ito integral

$$\int_0^t v\,dB = \int_0^t \begin{pmatrix} v_{11} & \cdots & v_{1d} \\ \vdots & & \vdots \\ v_{r1} & \cdots & v_{rd} \end{pmatrix} \begin{pmatrix} dB_1 \\ \vdots \\ dB_d \end{pmatrix}$$

as the r-dimensional column vector whose components are the following sums of one-dimensional Ito integrals:

$$\sum_{j=1}^{d} \int_0^t v_{ij}(s,\omega)\,dB_j(s,\omega), \quad i = 1, \ldots, r$$

Note that the entries of the matrix are functions of time and state: They form a matrix of stochastic processes. Given the previous definition of Ito integrals, we can now extend the definition of Ito processes to the multidimensional case. Suppose that the functions $u$ and $v$ satisfy the conditions established for the one-dimensional case. We can then form a multidimensional Ito process as the following vector of Ito processes:

$$\begin{aligned} dX_1 &= u_1\,dt + v_{11}\,dB_1 + \cdots + v_{1d}\,dB_d \\ &\;\;\vdots \\ dX_r &= u_r\,dt + v_{r1}\,dB_1 + \cdots + v_{rd}\,dB_d \end{aligned}$$

or, in matrix notation

$$dX = u\,dt + v\,dB$$

After defining the multidimensional Ito process, multidimensional stochastic equations are defined in differential form in matrix notation as follows:

$$dX(t,\omega) = u[t, X_1(t,\omega), \ldots, X_d(t,\omega)]\,dt + v[t, X_1(t,\omega), \ldots, X_d(t,\omega)]\,dB$$

Consider now the multidimensional map $g(t, x) = [g_1(t, x), \ldots, g_d(t, x)]$, which maps the process $X$ into another process $Y = g(t, X)$. It can be demonstrated that $Y$ is a multidimensional Ito process whose components are defined according to the following rules:

$$dY_k = \frac{\partial g_k(t, X)}{\partial t}\,dt + \sum_i \frac{\partial g_k(t, X)}{\partial X_i}\,dX_i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 g_k(t, X)}{\partial X_i\,\partial X_j}\,dX_i\,dX_j$$

$$dB_i\,dB_j = dt \text{ if } i = j, \quad 0 \text{ if } i \ne j, \quad dB_i\,dt = dt\,dB_i = 0$$


SOLUTION OF STOCHASTIC
DIFFERENTIAL EQUATIONS

It is possible to determine an explicit solution of stochastic differential equations in the linear case and in a number of other cases that can be reduced to linear equations through functional transformations. Let's first consider linear stochastic equations of the form:

$$dX_t = [A(t)X_t + a(t)]\,dt + \sigma(t)\,dB_t, \quad 0 \le t < \infty, \qquad X_0 = \xi$$

where $B$ is an r-dimensional Brownian motion independent of the d-dimensional initial random vector $\xi$ and the $(d \times d)$, $(d \times 1)$, $(d \times r)$ matrices $A(t)$, $a(t)$, $\sigma(t)$ are nonrandom and time dependent.

The simplest example of a linear stochastic equation is the equation of an arithmetic Brownian motion with drift, written as follows:

$$dX_t = \mu\,dt + \sigma\,dB_t, \quad 0 \le t < \infty, \qquad X_0 = \xi; \quad \mu, \sigma \text{ constants}$$

In linear equations of this type, the stochastic part enters only in an additive way through the terms $\sigma_{ij}(t)\,dB_t$. The functions $\sigma(t)$ are sometimes called the instantaneous variances and covariances of the process. In the example of the arithmetic Brownian motion, $\mu$ is called the drift of the process and $\sigma$ the volatility of the process.

It is intuitive that the solution of this equation is given by the solution of the associated deterministic equation, that is, the ordinary differential equation obtained by removing the stochastic part, plus the cumulated random disturbances. Let's first consider the associated deterministic differential equation

$$\frac{dx}{dt} = A(t)x + a(t), \quad 0 \le t < \infty$$

where $x(t)$ is a d-dimensional vector with initial conditions $x(0) = \xi$.

It can be demonstrated that this equation has an absolutely continuous solution in the domain $0 \le t < \infty$. To find its solution, let's first consider the matrix differential equation

$$\frac{d\Phi}{dt} = A(t)\Phi, \quad 0 \le t < \infty$$

This matrix differential equation has an absolutely continuous solution in the domain $0 \le t < \infty$. The matrix $\Phi(t)$ that solves this equation with $\Phi(0) = I$ is called the fundamental solution of the equation. It can be demonstrated that $\Phi(t)$ is a nonsingular matrix for each $t$. Lastly, it can be demonstrated that the solution of the equation

$$\frac{dx}{dt} = A(t)x + a(t), \quad 0 \le t < \infty$$

with initial condition $x(0) = \xi$, can be written in terms of the fundamental solution as follows:

$$x(t) = \Phi(t)\left[x(0) + \int_0^t \Phi^{-1}(s)\,a(s)\,ds\right], \quad 0 \le t < \infty$$


Let's now go back to the stochastic equation

$$dX_t = [A(t)X_t + a(t)]\,dt + \sigma(t)\,dB_t, \quad 0 \le t < \infty, \qquad X_0 = \xi$$

Using Ito's formula, it can be demonstrated that the above linear stochastic equation admits the following unique solution:

$$X(t) = \Phi(t)\left[X_0 + \int_0^t \Phi^{-1}(s)\,a(s)\,ds + \int_0^t \Phi^{-1}(s)\,\sigma(s)\,dB_s\right], \quad 0 \le t < \infty$$

This effectively demonstrates that the solution of the linear stochastic equation is the solution of the associated deterministic equation plus the cumulated stochastic term

$$\Phi(t)\int_0^t \Phi^{-1}(s)\,\sigma(s)\,dB_s$$

To illustrate this, we now specialize the above solutions to the cases of arithmetic Brownian motion, Ornstein-Uhlenbeck processes, and geometric Brownian motion.


The Arithmetic Brownian Motion 

The arithmetic Brownian motion in one dimension is defined by the following equation:

$$dX_t = \mu\,dt + \sigma\,dB_t$$

In this case, $A(t) = 0$, $a(t) = \mu$, $\sigma(t) = \sigma$ and the solution becomes

$$X_t = X_0 + \mu t + \sigma B_t$$


The Ornstein-Uhlenbeck Process 

The Ornstein-Uhlenbeck process in one dimension is a mean-reverting process defined by the following equation:

$$dX_t = -aX_t\,dt + \sigma\,dB_t$$

It is a mean-reverting process because, for $a > 0$, the process is pulled back toward zero by a drift term proportional to the process itself. In this case, $A(t) = -a$, $a(t) = 0$, $\sigma(t) = \sigma$, so that the fundamental solution is $\Phi(t) = e^{-at}$ and the solution becomes

$$X_t = X_0\,e^{-at} + \sigma\int_0^t e^{-a(t-s)}\,dB_s$$
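A short simulation can make the mean reversion visible. The sketch below assumes the solution has the form $X_t = X_0 e^{-at} + \sigma\int_0^t e^{-a(t-s)}dB_s$, from which the Ito isometry gives a Gaussian transition over a step of length $\Delta$ with mean $X_t e^{-a\Delta}$ and variance $\sigma^2(1 - e^{-2a\Delta})/(2a)$. The function name `simulate_ou_terminal` and the parameter values are illustrative assumptions, not taken from the text.

```python
import math
import random


def simulate_ou_terminal(a=1.5, sigma=0.3, x0=1.0, T=1.0,
                         n_steps=20, n_paths=5000, seed=4):
    """Simulate X_T for dX = -a*X*dt + sigma*dB using the exact Gaussian
    transition implied by the explicit solution; return sample mean and variance."""
    rng = random.Random(seed)
    dt = T / n_steps
    decay = math.exp(-a * dt)
    # standard deviation of sigma * integral of exp(-a(dt - s)) dB over one step
    step_sd = sigma * math.sqrt((1.0 - math.exp(-2.0 * a * dt)) / (2.0 * a))
    values = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = x * decay + step_sd * rng.gauss(0.0, 1.0)
        values.append(x)
    mean = sum(values) / n_paths
    var = sum((v - mean) ** 2 for v in values) / (n_paths - 1)
    return mean, var


mean, var = simulate_ou_terminal()
print("sample mean:", mean, " theory:", 1.0 * math.exp(-1.5))
print("sample var: ", var, " theory:", 0.3 ** 2 * (1.0 - math.exp(-3.0)) / 3.0)
```

The sample mean decays toward zero at rate $a$, while the variance levels off near $\sigma^2/(2a)$ rather than growing linearly in time as it would for an arithmetic Brownian motion.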

The Geometric Brownian Motion 

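The geometric Brownian motion in one dimension is defined by the following equation:

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$$

This equation can be easily reduced to the previous linear case by the transformation:

$$Y = \log X$$

Let's apply Ito's formula

$$dY_t = \left(\frac{\partial g}{\partial t} + \frac{\partial g}{\partial x}\,a + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}\,b^2\right)dt + \frac{\partial g}{\partial x}\,b\,dB_t$$

where

$$g = \log x, \quad a = \mu X_t, \quad b = \sigma X_t, \quad \frac{\partial g}{\partial t} = 0, \quad \frac{\partial g}{\partial x} = \frac{1}{x}, \quad \frac{\partial^2 g}{\partial x^2} = -\frac{1}{x^2}$$

We can then verify that the logarithm of the geometric Brownian motion becomes an arithmetic Brownian motion with drift

$$\mu' = \mu - \frac{1}{2}\sigma^2$$

The geometric Brownian motion evolves as a lognormal process:

$$X_t = x_0 \exp\left[\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B_t\right]$$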
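One consequence of the lognormal form is that the expected value grows at the rate $\mu$, not at the reduced rate that appears in the exponent, because $E[e^{\sigma B_t}] = e^{\sigma^2 t/2}$ offsets the correction term. The sketch below checks this by sampling the terminal value from the closed form $X_T = x_0\exp[(\mu - \frac{1}{2}\sigma^2)T + \sigma B_T]$, which is assumed here to be the lognormal solution. The function name `gbm_terminal_mean` and the parameter values are hypothetical.

```python
import math
import random


def gbm_terminal_mean(mu=0.08, sigma=0.25, x0=100.0, T=1.0,
                      n_paths=20000, seed=5):
    """Monte Carlo estimate of E[X_T] for a geometric Brownian motion,
    sampling X_T directly from its lognormal closed form."""
    rng = random.Random(seed)
    root_t = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        b_T = rng.gauss(0.0, root_t)   # B_T is N(0, T)
        total += x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * b_T)
    return total / n_paths


estimate = gbm_terminal_mean()
print("Monte Carlo mean:", estimate)
print("x0 * exp(mu*T):  ", 100.0 * math.exp(0.08))
```

The typical (median) path grows at the lower rate $\mu - \frac{1}{2}\sigma^2$, so the gap between mean and median widens with volatility.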


KEY POINTS 

* Stochastic differential equations give meaning to ordinary differential equations where some terms are subject to random perturbation.

* Following Ito and Stratonovich, stochastic differential equations are defined through their integral equivalent: The differential notation is just a shorthand.

* Ito processes are the sum of a time integral plus an Ito integral.

* Ito processes are closed with respect to smooth maps: A smooth function of an Ito process is another Ito process defined through the Ito formula.

* Stochastic differential equations are equations established in terms of Ito processes.

* Linear equations can be solved explicitly as the sum of the solution of the associated deterministic equation plus a stochastic cumulative term.


NOTE 

1. It is possible to define a "generalized white noise process" in the domain of "tempered distributions." See Øksendal (1992).

Abstract: The dynamics of a financial asset's returns and prices can be expressed using a deterministic process if there is no uncertainty about its future behavior, or with a stochastic process in the more likely case when the value is uncertain. Stochastic processes in continuous time are the most used tool to explain the dynamics of a financial asset's returns and prices. They are the building blocks with which to construct financial models for portfolio optimization, derivatives pricing, and risk management. Continuous-time processes allow for more elegant theoretical modeling compared to discrete-time models, and many results proven in probability theory can be applied to obtain a simple evaluation method.


In 1900, the father of modern option pricing the¬ 
ory, Louis Bachelier, proposed using Brownian 
motion for modeling stock market prices. There 
are several reasons why Brownian motion is a 
popular process. First, Brownian motion is the 
milestone of the theory of stochastic processes. 


However, more realistic general processes that 
are better suited for financial modeling such as 
Levy, additive, or self-similar processes have 
been developed only since the mid 1990s (see 
Samorodnitsky and Taqqu (1994), Sato (1999), 
and Embrechts and Maejima (2002)). Most of 
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the practical problems in mathematical finance 
can be solved by taking into consideration these 
new processes. For example, the concept of 
stochastic integral with respect to Brownian 
motion was introduced in 1933 and only in 
the 1990s has the general theory of stochas¬ 
tic integration with respect to semi-martingale 
appeared. From a practical point of view, the 
second reason for the popularity of Brownian 
motion is that the normal distribution allows 
one to solve real-world pricing problems such 
as option prices as estimations and simulations 
in a few seconds, and most of the problems 
have a closed-form solution which can be easily
used. See Øksendal (2003) or Karatzas and
Shreve (1991) for a complete theoretical treat¬ 
ment of financial applications of continuous¬ 
time stochastic processes driven by Brownian 
motion. 

The two basic classes of continuous-time 
stochastic processes are Brownian motion and 
the Poisson process. The name of the former is 
due to the botanist Robert Brown who in 1827 
described the movement of pollen suspended 
in water. The theory of Brownian motion was 
founded by the work of Norbert Wiener who 
was the first to prove its existence and, as a
result, Brownian motion is sometimes also
referred to as a Wiener process. The Poisson 
process generated by the Poisson distribution is 
the building block of pure jump processes. Both 
processes are fundamentally different with 
respect to their path properties and they belong 
to the larger class of Levy processes (for more 
details about Levy processes see Sato [1999]). 
Schoutens (2003), Cont and Tankov (2004), and 
Rachev et al. (2011) provide details of Levy 
processes with applications to option pricing. 

Infinitely divisible distributions, including α-stable
and tempered stable distributions, can be
considered to define continuous-time stochas¬ 
tic processes. In order to model the behavior 
of a financial asset's returns and prices, one 
can consider (1) a Brownian motion, (2) a pro¬ 
cess defined as the sum of a Brownian motion 
and a Poisson process, or (3) a pure jump Levy 
process. 


In this entry, we will discuss continuous-time 
stochastic processes. We will first consider pro¬ 
cesses consisting of jumps and then we will dis¬ 
cuss continuous processes without jumps. We 
then turn our focus to processes having ran¬ 
dom time instead of physical time. Finally, we 
will discuss a general process that contains all 
of these processes. 

SOME PRELIMINARIES 

Before we continue with the discussion and the 
construction of processes, we will briefly define 
terms that will be used in this entry. 

• A stochastic process $X = (X_t)_{t \ge 0}$ is a family of
$\mathbb{R}$-valued random variables $X_t$ with parameter
$t \ge 0$, defined on the sample space $\Omega$. For
every outcome $\omega \in \Omega$, the function $t \mapsto X_t(\omega)$
is called a sample path of the process X.

• Let X be a stochastic process. Given $0 \le t_1 < t_2 < \cdots < t_n$,
if the random variables
$X_{t_1} - X_0,\ X_{t_2} - X_{t_1},\ \ldots,\ X_{t_n} - X_{t_{n-1}}$ are independent,
we say that X has independent
increments. Moreover, if for every $h > 0$ the distribution
of $X_{t+h} - X_t$ does not depend on
$t \ge 0$, we say that X has stationary increments.
Loosely speaking, one could say that the distribution
of the future changes does not depend
on past realizations.

• A process X is said to be non-decreasing if
$X_t \ge 0$ almost surely (a.s.) for $t \ge 0$, and $X_t \ge X_s$
a.s. for $0 \le s \le t$. Usually, a non-decreasing
process is called a subordinator. A process X is
said to be non-increasing if $X_t \le 0$ a.s. for $t \ge 0$,
and $X_t \le X_s$ a.s. for $0 \le s \le t$.

• We say that a process X has finite (infinite)
variation if its sample paths are of finite
(infinite) variation, that is, the variation

$$V(X(\omega))_t = \lim_{n \to \infty} \sum_{k=1}^{n} \left| X_{tk/n}(\omega) - X_{t(k-1)/n}(\omega) \right|, \quad \forall t \ge 0$$

is finite (infinite) for almost every $\omega \in \Omega$.

• The characteristic function of the stochastic
process $X = (X_t)_{t \ge 0}$ on $\mathbb{R}$ is defined as the
function $\phi_{X_t} : \mathbb{R} \to \mathbb{C}$,

$$\phi_{X_t}(u) = E\left[e^{iuX_t}\right]$$


POISSON PROCESS 

Consider a process $N = (N_t)_{t \ge 0}$ derived from
a Poisson distribution with parameter $\lambda$ as
follows:

1. $N_0 = 0$

2. N has independent increments and stationary
increments.

3. For any real numbers $t \ge 0$ and $h > 0$, the
variable $(N_{t+h} - N_t)$ is a Poisson distributed
random variable with parameter $\lambda h$, that is,

$$P(N_{t+h} - N_t = n) = e^{-\lambda h}\frac{(\lambda h)^n}{n!}, \quad n = 0, 1, 2, \ldots$$

The process N is referred to as the Poisson process
with intensity $\lambda$.

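If $(\tau_j)_{j \in \mathbb{N}}$ are independent exponential random
variables with parameter $\lambda$ and the random
variable $N_t$ is given by

$$N_t = \inf\left\{ n \ge 0 : \sum_{j=1}^{n+1} \tau_j > t \right\}$$

that is, the number of partial sums $\tau_1 + \cdots + \tau_n$ that do not exceed t,
then it can be proven that the process $(N_t)_{t \ge 0}$ is
the Poisson process with intensity $\lambda$.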
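
The construction from exponential waiting times can be sketched in a few lines. The following is only an illustrative sketch using the Python standard library; the function names `poisson_count` and `sample_mean_and_variance` and the parameter values are assumptions made for this example, not part of the original text. It counts how many partial sums of exponential waiting times fall in [0, t] and checks that the sample mean and variance of the count are both close to the intensity times t, as expected for a Poisson distributed variable.

```python
import random

def poisson_count(lam, t, rng):
    """Count the arrivals in [0, t] when waiting times are Exp(lam)."""
    n = 0
    elapsed = rng.expovariate(lam)      # time of the first jump
    while elapsed <= t:
        n += 1
        elapsed += rng.expovariate(lam)  # add the next waiting time
    return n

def sample_mean_and_variance(lam, t, n_paths, seed=0):
    """Sample mean and variance of N_t over n_paths independent paths."""
    rng = random.Random(seed)
    counts = [poisson_count(lam, t, rng) for _ in range(n_paths)]
    mean = sum(counts) / n_paths
    var = sum((c - mean) ** 2 for c in counts) / (n_paths - 1)
    return mean, var

# Illustrative values (hypothetical): intensity 3 over a horizon of 2,
# so both moments should be near lam * t = 6.
mean, var = sample_mean_and_variance(lam=3.0, t=2.0, n_paths=20000)
print(mean, var)
```

Both printed numbers should lie near 6 for these inputs, up to Monte Carlo error.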

The Poisson process is a fundamental exam¬ 
ple of a stochastic process with discontinuous 
trajectories, and a building block for construct¬ 
ing more complex jump processes. 


Compounded Poisson Process

The process $X = (X_t)_{t \ge 0}$ is referred to as a compounded
Poisson process if X is defined by

$$X_t = \sum_{k=1}^{N_t} Y_k$$

where

• $Y_1, Y_2, \ldots$ are independent and identically
distributed (IID) random variables, and f is
the probability density function of $Y_1$.

• $(N_t)_{t \ge 0}$ is a Poisson process with intensity $\lambda$.

• $N_t$ and $Y_k$ are independent for all $t \ge 0$ and
$k = 1, 2, \ldots$.

The characteristic function of $X_t$ is equal to

$$\phi_{X_t}(u) = \exp\left( \lambda t \int_{-\infty}^{\infty} (e^{iux} - 1) f(x)\,dx \right)$$

Moreover, if f is given by the probability density
function of the normal distribution, then X is
referred to as a jump diffusion process.


PURE JUMP PROCESS

Consider a process $X^x = (X^x_t)_{t \ge 0}$ for a given real
number x such that

$$X^x_t = x N^{(x)}_t$$

where $(N^{(x)}_t)_{t \ge 0}$ is the Poisson process with intensity
$\lambda(x)$. The number x represents the jump
size and the intensity $\lambda(x)$ is the expected number
of jumps with size x in the unit time.

Let $S = \{x_j \in \mathbb{R} : x_j \ne 0,\ j = 1, 2, \ldots\}$ be a discrete
subset of jump sizes, $\lambda(x_j) > 0$ for all $x_j \in S$,
and $Y = (Y_t)_{t \ge 0}$ be a process defined by

$$Y_t = \gamma t + \sum_{j=1}^{\infty} X^{x_j}_t$$

If S consists of positive real numbers and $\gamma \ge 0$,
then the process Y is non-decreasing. Conversely,
if S consists of negative real numbers
and $\gamma \le 0$, Y is non-increasing.

Since the characteristic function of $X^x_t$ is equal
to

$$\phi_{X^x_t}(u) = \exp\left( \lambda(x)\, t\, (e^{iux} - 1) \right)$$

the characteristic function of $Y_t$ is obtained by

$$\phi_{Y_t}(u) = \exp\left( i\gamma u t + t \sum_{j=1}^{\infty} \lambda(x_j)(e^{iux_j} - 1) \right)$$

For the process Y, the function $\nu$ defined
by $\nu(A) = \sum_{x_j \in A} \lambda(x_j)$ represents the expected
number of jumps with size $x \in A$ in the unit
time interval, where A is a subset of S. For example,
the expected number of jumps whose
sizes are in $\{x_1, x_2, \ldots, x_n\}$ is equal to
$\nu(\{x_1, x_2, \ldots, x_n\}) = \sum_{j=1}^{n} \lambda(x_j)$.
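
A small simulation can make the role of the jump intensities concrete. The sketch below is illustrative only: it assumes a finite set of jump sizes, independent Poisson processes for the different sizes, and hypothetical names (`poisson_count`, `sample_y`, `exact_mean_and_variance`) and parameter values that do not come from the text. It samples a drift term plus, for every jump size, that size times an independent Poisson count, and compares the sample mean with the drift times t plus t times the intensity-weighted sum of jump sizes.

```python
import random

def poisson_count(lam, t, rng):
    """Number of jumps in [0, t] of a Poisson process with intensity lam."""
    n = 0
    elapsed = rng.expovariate(lam)
    while elapsed <= t:
        n += 1
        elapsed += rng.expovariate(lam)
    return n

def sample_y(t, gamma, jumps, rng):
    """One draw of Y_t = gamma*t + sum_j x_j * N_t^(x_j).

    jumps maps each jump size x_j to its intensity lambda(x_j); the
    Poisson processes for different sizes are simulated independently.
    """
    return gamma * t + sum(x * poisson_count(lam, t, rng)
                           for x, lam in jumps.items())

def exact_mean_and_variance(t, gamma, jumps):
    """Mean and variance implied by the characteristic function of Y_t."""
    mean = gamma * t + t * sum(x * lam for x, lam in jumps.items())
    var = t * sum(x * x * lam for x, lam in jumps.items())
    return mean, var

# Hypothetical jump sizes and intensities, chosen only for illustration.
jumps = {-1.0: 2.0, 0.5: 4.0, 2.0: 0.5}
rng = random.Random(42)
draws = [sample_y(1.0, 0.1, jumps, rng) for _ in range(20000)]
print(sum(draws) / len(draws), exact_mean_and_variance(1.0, 0.1, jumps))
```

Here the expected number of jumps per unit time is the total intensity 2.0 + 4.0 + 0.5 = 6.5, which is the measure of the whole set of jump sizes in this discrete setting.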

Now, we extend the set of jump sizes S to the
real number set $\mathbb{R}$. Then the expected number
of jumps is defined by a map $\nu$ from subsets
of $\mathbb{R}$ to the non-negative numbers. The map $\nu$ is a jump
measure, that is, the expected number of jumps
whose sizes are in a real interval [a, b] is represented
by $\nu([a, b])$. Using $\nu$, we can obtain an
extended process Y such that the characteristic
function of $Y_t$ is given by

$$\phi_{Y_t}(u) = \exp\left( i\gamma u t + t \int_{-\infty}^{\infty} (e^{iux} - 1)\,\nu(dx) \right) \quad (1)$$

where $\gamma \in \mathbb{R}$. Jump sizes of process Y can be
defined continuously. In this case, the measure
$\nu$ is referred to as a Lévy measure, that is, a Borel
measure on $\mathbb{R}$ satisfying $\nu(\{0\}) = 0$ and

$$\int_{-\infty}^{\infty} \min\{1, x^2\}\,\nu(dx) < \infty$$

The class of jump processes satisfying (1) cannot
contain infinite variation processes. To include
infinite variation processes in the class
of jump processes we will be using, we need
a more general definition. Consider a process
$Z = (Z_t)_{t \ge 0}$ such that the characteristic function
of $Z_t$ is given by

$$\phi_{Z_t}(u) = \exp\left( i\gamma u t + t \int_{-\infty}^{\infty} \left( e^{iux} - 1 - iux 1_{|x| \le 1} \right) \nu(dx) \right) \quad (2)$$

The process Z is referred to as the pure jump
process. If

$$\int_{|x| \le 1} |x|\,\nu(dx) = \infty$$

then the characteristic function (1) is not defined,
but the function (2) is well defined. The
details can be found in Sato (1999) and Cont
and Tankov (2004). The path behavior of the
pure jump process is determined by the Lévy
measure $\nu$ and the real number $\gamma$.


• If $\gamma \ge 0$ and $\nu(A) = 0$ for all $A \subset (-\infty, 0)$, then Z
is non-decreasing.

• If $\gamma \le 0$ and $\nu(A) = 0$ for all $A \subset (0, \infty)$, then Z
is non-increasing.

• If $\nu(\mathbb{R}) < \infty$ (i.e., the expected number of
jumps in the unit time is finite), then we say
that Z has finite activity.

• If $\nu(\mathbb{R}) = \infty$ (i.e., the expected number of
jumps in the unit time is infinite), then we
say that Z has infinite activity.

• If $\int_{|x| \le 1} |x|\,\nu(dx) < \infty$, the process Z has finite
variation.

• If $\int_{|x| \le 1} |x|\,\nu(dx) = \infty$, the process Z has infinite
variation.

The building block of the pure jump process 
Z is the Poisson process. Hence, Z has the fol¬ 
lowing properties: 

• $Z_0 = 0$.

• Z has independent and stationary increments;
that is, the random variable $(Z_t - Z_s)$ is independent
of the random variable $(Z_v - Z_u)$ for
all real numbers s, t, u, and v with $0 \le s < t \le u < v$.

• $Z_{s+t} - Z_s \overset{d}{=} Z_t$ for $s \ge 0$ and $t \ge 0$. Moreover,
we have

$$\log \phi_{Z_t}(u) = t \log \phi_{Z_1}(u) \quad (3)$$

where $\phi_{Z_t}(u)$ is the characteristic function of
$Z_t$ for $t \ge 0$.

If t = 1, then we obtain the purely non-Gaussian 
infinitely divisible random variable. In fact, 
there is a one-to-one correspondence between 
a purely non-Gaussian infinitely divisible ran¬ 
dom variable and a pure jump process. 

Gamma Process 

Consider the gamma distribution with parameters
$(c, \lambda)$. Since the gamma distribution is a
purely non-Gaussian infinitely divisible distribution,
we can define a pure jump process $G = (G_t)_{t \ge 0}$
such that $G_1 \sim \mathrm{Gamma}(c, \lambda)$. By equation
(3), the characteristic function $\phi_{G_t}$ of $G_t$ is given
by

$$\phi_{G_t}(u) = \left( \frac{\lambda}{\lambda - iu} \right)^{ct} \quad (4)$$

In this case, the process G is referred to as the
gamma process with parameters $(c, \lambda)$. The sample
path of the gamma process is non-decreasing,
since the gamma distribution is supported only
on the positive real line. When we take c = 1,
the gamma process is referred to as
an exponential process.

Inverse Gaussian Process

Consider the inverse Gaussian distribution
with parameters $(c, \lambda)$. Since the inverse
Gaussian distribution is also a purely non-Gaussian
infinitely divisible distribution, we
can define a pure jump process $X = (X_t)_{t \ge 0}$ such
that $X_1 \sim IG(c, \lambda)$. By equation (3), the characteristic
function $\phi_{X_t}$ of $X_t$ is given by

$$\phi_{X_t}(u) = \exp\left( -ct\left( \sqrt{\lambda^2 - 2iu} - \lambda \right) \right) \quad (5)$$

In this case, the process X is referred to as the
inverse Gaussian (IG) process with parameters
$(c, \lambda)$. The sample path of the IG process
is non-decreasing, since the inverse Gaussian
distribution is supported only on the positive
real line.


Variance Gamma Process

The variance gamma distribution is an infinitely divisible
distribution. Thus we can define a pure
jump process $X = (X_t)_{t \ge 0}$ such that $X_1 \sim VG(C, \lambda_+, \lambda_-)$.
By equation (3), the characteristic function
$\phi_{X_t}$ of $X_t$ is given by

$$\phi_{X_t}(u) = \left( \frac{\lambda_+ \lambda_-}{(\lambda_+ - iu)(\lambda_- + iu)} \right)^{Ct} \quad (6)$$

In this case, the process X is referred to as
the variance gamma (VG) process with parameters
$(C, \lambda_+, \lambda_-)$.


α-Stable Process

The pure jump process $X = (X_t)_{t \ge 0}$ is referred
to as the α-stable process with parameters
$(\alpha, \sigma, \beta, \mu)$ if $X_1$ is an α-stable random variable,
that is, $X_1 \sim S_\alpha(\sigma, \beta, \mu)$. By equation (3), the
characteristic function $\phi_{X_t}$ of $X_t$ is given by

$$\phi_{X_t}(u) = \begin{cases} \exp\left( i\mu u t - t|\sigma u|^{\alpha}\left( 1 - i\beta(\operatorname{sign} u)\tan\frac{\pi\alpha}{2} \right) \right), & \alpha \ne 1 \\ \exp\left( i\mu u t - t\sigma|u|\left( 1 + i\beta\frac{2}{\pi}(\operatorname{sign} u)\ln|u| \right) \right), & \alpha = 1 \end{cases}$$

Recall that the Lévy measure of the α-stable process
can be written as

$$\nu(dx) = \left( \frac{C_+}{x^{1+\alpha}} 1_{x>0} + \frac{C_-}{|x|^{1+\alpha}} 1_{x<0} \right) dx$$

where $C_+$ and $C_-$ are positive constants. Then
we can prove that

$$\int_{-\infty}^{\infty} \nu(dx) = \infty$$

and hence the α-stable process is an infinite activity
process. On the other hand, since we have

$$\int_{|x| \le 1} |x|\,\nu(dx) = \begin{cases} \dfrac{C_+ + C_-}{1 - \alpha}, & \alpha < 1 \\ \infty, & \alpha \ge 1 \end{cases}$$

we conclude that the α-stable process has finite
variation if $\alpha < 1$ and infinite variation if
$\alpha \ge 1$.
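
The activity and variation claims for the α-stable process reduce to two elementary integrals near the origin. The following worked computation is a sketch that assumes the power-law form of the α-stable Lévy measure with positive constants $C_+$ and $C_-$ and $0 < \alpha < 2$.

```latex
% Infinite activity: the mass near zero already diverges for every alpha > 0.
\int_{-\infty}^{\infty} \nu(dx) \;\ge\; C_+ \int_0^1 x^{-1-\alpha}\,dx
  \;=\; C_+ \lim_{\varepsilon \downarrow 0} \frac{\varepsilon^{-\alpha} - 1}{\alpha}
  \;=\; \infty .

% Variation: the integrand |x| nu(dx) behaves like |x|^{-alpha} near zero.
\int_{|x| \le 1} |x|\,\nu(dx)
  \;=\; (C_+ + C_-) \int_0^1 x^{-\alpha}\,dx
  \;=\;
  \begin{cases}
    \dfrac{C_+ + C_-}{1-\alpha}, & \alpha < 1, \\[2ex]
    \infty, & \alpha \ge 1 .
  \end{cases}
```

The boundary case α = 1 gives a logarithmic divergence, which is why it belongs to the infinite variation case.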


Tempered Stable Process 

The pure jump process $X = (X_t)_{t \ge 0}$ is referred to
as the tempered stable process if $X_1$ is a tempered
stable random variable.

• The process X is referred to as the classical
tempered stable (CTS) process with parameters
$(\alpha, C, \lambda_+, \lambda_-, m)$ if $X_1 \sim CTS(\alpha, C, \lambda_+, \lambda_-, m)$.
The process X is referred to as the standard
CTS process with parameters $(\alpha, \lambda_+, \lambda_-)$ if
$X_1 \sim stdCTS(\alpha, \lambda_+, \lambda_-)$.

• The process X is referred to as the generalized
tempered stable (GTS) process with parameters
$(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, m)$ if
$X_1 \sim GTS(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, m)$.
The process X is referred to
as the standard GTS process with parameters
$(\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)$ if
$X_1 \sim stdGTS(\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)$.

• The process X is referred to as the modified
tempered stable (MTS) process with
parameters $(\alpha, C, \lambda_+, \lambda_-, m)$ if
$X_1 \sim MTS(\alpha, C, \lambda_+, \lambda_-, m)$.
The process X is referred to as
the standard MTS process with parameters
$(\alpha, \lambda_+, \lambda_-)$ if $X_1 \sim stdMTS(\alpha, \lambda_+, \lambda_-)$.

• The process X is referred to as the normal
tempered stable (NTS) process with parameters
$(\alpha, C, \lambda, \beta, m)$ if $X_1 \sim NTS(\alpha, C, \lambda, \beta, m)$.
The process X is referred to as the standard
NTS process with parameters $(\alpha, \lambda, \beta)$ if
$X_1 \sim stdNTS(\alpha, \lambda, \beta)$.

• Moreover, the process X is referred to as the
normal inverse Gaussian (NIG) process with parameters
$(c, \lambda, \beta, m)$ if $X_1 \sim NIG(c, \lambda, \beta, m)$.
The process X is referred to as the standard
NIG process with parameters $(\lambda, \beta)$ if
$X_1 \sim stdNIG(\lambda, \beta)$.

• The process X is referred to as the Kim-Rachev
tempered stable (KRTS) process with parameters
$(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, m)$ if
$X_1 \sim KRTS(\alpha, k_+, k_-, r_+, r_-, p_+, p_-, m)$.
The process X is referred
to as the standard KRTS process with parameters
$(\alpha, r_+, r_-, p_+, p_-)$ if
$X_1 \sim stdKRTS(\alpha, r_+, r_-, p_+, p_-)$.

• The process X is referred to as the rapidly decreasing
tempered stable (RDTS) process with parameters
$(\alpha, C, \lambda_+, \lambda_-, m)$ if
$X_1 \sim RDTS(\alpha, C, \lambda_+, \lambda_-, m)$.
The process X is referred to as the
standard RDTS process with parameters
$(\alpha, \lambda_+, \lambda_-)$ if $X_1 \sim stdRDTS(\alpha, \lambda_+, \lambda_-)$.

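The characteristic function $\phi_{X_t}$ of $X_t$ is obtained
by equation (3). For example, if X is the CTS
process with parameters $(\alpha, C, \lambda_+, \lambda_-, m)$, then

$$\phi_{X_t}(u) = \exp\left( t \log \phi_{CTS}(u; \alpha, C, \lambda_+, \lambda_-, m) \right)$$

$$= \exp\Big( iumt - iutC\Gamma(1-\alpha)\left( \lambda_+^{\alpha-1} - \lambda_-^{\alpha-1} \right) + tC\Gamma(-\alpha)\left( (\lambda_+ - iu)^{\alpha} - \lambda_+^{\alpha} + (\lambda_- + iu)^{\alpha} - \lambda_-^{\alpha} \right) \Big)$$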

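Characteristic exponents of tempered stable
processes are presented in Table 1.

Let $\nu(dx)$ be the Lévy measure of the tempered
stable process. Then we can prove that
$\nu(\mathbb{R}) = \infty$, $\int_{|x| \le 1} |x|\,\nu(dx) < \infty$ if $\alpha < 1$, and
$\int_{|x| \le 1} |x|\,\nu(dx) = \infty$ if $\alpha \ge 1$. Consequently, the
tempered stable process has infinite activity,
and has finite variation if $\alpha < 1$ and infinite variation
if $\alpha \ge 1$, as explained in Carr et al. (2002),
Kim (2005), and Kim et al. (2008 and 2010).


Table 1 Characteristic Exponents of Tempered Stable Processes

Process    $\psi_{X_t}(u) = \log \phi_{X_t}(u)$

CTS    $iumt - iutC\Gamma(1-\alpha)(\lambda_+^{\alpha-1} - \lambda_-^{\alpha-1})$
       $+\, tC\Gamma(-\alpha)\left( (\lambda_+ - iu)^{\alpha} - \lambda_+^{\alpha} + (\lambda_- + iu)^{\alpha} - \lambda_-^{\alpha} \right)$

GTS    $iumt - iut\left( C_+\Gamma(1-\alpha_+)\lambda_+^{\alpha_+ - 1} - C_-\Gamma(1-\alpha_-)\lambda_-^{\alpha_- - 1} \right)$
       $+\, tC_+\Gamma(-\alpha_+)\left( (\lambda_+ - iu)^{\alpha_+} - \lambda_+^{\alpha_+} \right) + tC_-\Gamma(-\alpha_-)\left( (\lambda_- + iu)^{\alpha_-} - \lambda_-^{\alpha_-} \right)$

MTS    $iumt + tC\left( G_R(u; \alpha, \lambda_+) + G_R(u; \alpha, \lambda_-) \right) + iutC\left( G_I(u; \alpha, \lambda_+) - G_I(u; \alpha, \lambda_-) \right)$
       where $G_R(x; \alpha, \lambda) = 2^{-\frac{\alpha+3}{2}}\sqrt{\pi}\,\Gamma\left(-\frac{\alpha}{2}\right)\left( (\lambda^2 + x^2)^{\frac{\alpha}{2}} - \lambda^{\alpha} \right)$
       and $G_I(x; \alpha, \lambda) = 2^{-\frac{\alpha+1}{2}}\Gamma\left(\frac{1-\alpha}{2}\right)\lambda^{\alpha-1}\left[ {}_2F_1\left( 1, \frac{1-\alpha}{2}; \frac{3}{2}; -\frac{x^2}{\lambda^2} \right) - 1 \right]$

NTS    $iumt - iut\,2^{-\frac{\alpha-1}{2}}C\sqrt{\pi}\,\Gamma\left(1 - \frac{\alpha}{2}\right)\beta(\lambda^2 - \beta^2)^{\frac{\alpha}{2}-1}$
       $+\, t\,2^{-\frac{\alpha+1}{2}}C\sqrt{\pi}\,\Gamma\left(-\frac{\alpha}{2}\right)\left( (\lambda^2 - (\beta + iu)^2)^{\frac{\alpha}{2}} - (\lambda^2 - \beta^2)^{\frac{\alpha}{2}} \right)$

NIG    $iumt - \dfrac{iutc\beta}{\sqrt{\lambda^2 - \beta^2}} - tc\left( \sqrt{\lambda^2 - (\beta + iu)^2} - \sqrt{\lambda^2 - \beta^2} \right)$

KRTS   $iumt - iut\,\Gamma(1-\alpha)\left( \dfrac{k_+ r_+}{p_+ + 1} - \dfrac{k_- r_-}{p_- + 1} \right)$
       $+\, tk_+ H(iu; \alpha, r_+, p_+) + tk_- H(-iu; \alpha, r_-, p_-)$
       where $H(x; \alpha, r, p) = \dfrac{\Gamma(-\alpha)}{p}\left( {}_2F_1(p, -\alpha; 1 + p; rx) - 1 \right)$

RDTS   $iumt + tC\left( G(iu; \alpha, \lambda_+) + G(-iu; \alpha, \lambda_-) \right)$
       where $G(x; \alpha, \lambda) = 2^{-\frac{\alpha}{2}-1}\lambda^{\alpha}\Gamma\left(-\frac{\alpha}{2}\right)\left( M\left( -\frac{\alpha}{2}, \frac{1}{2}; \frac{x^2}{2\lambda^2} \right) - 1 \right)$
       $+\, 2^{-\frac{\alpha}{2}-\frac{1}{2}}\lambda^{\alpha-1}x\,\Gamma\left(\frac{1-\alpha}{2}\right)\left( M\left( \frac{1-\alpha}{2}, \frac{3}{2}; \frac{x^2}{2\lambda^2} \right) - 1 \right)$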


BROWNIAN MOTION 

In this section, we will discuss Brownian motion 
by means of an example. We begin with a short 
summary of the most important and defining 
properties of a standard Brownian motion
$W = (W_t)_{t \ge 0}$:

1. $W_0 = 0$

2. W has independent increments and stationary
increments.

3. For any real numbers $t \ge 0$ and $h > 0$, the variable
$(W_{t+h} - W_t)$ is a normally distributed
random variable with mean zero and
variance h.

4. The paths of $W = (W_t)_{t \ge 0}$ are continuous.

Every process fulfilling the above four properties
is referred to as a standard Brownian
motion. From the first and third conditions
it can be deduced that Brownian motion $W_t$ at
time t (which equals the increment from time
0 to time t) is normally distributed with mean
zero and variance t.

The paths of Brownian motion are highly ir¬ 
regular and nowhere differentiable. In order to 
draw a true path, one would have to calculate 
the value of the process for every real number, 
which is clearly not feasible. Due to its charac¬ 
teristic path property, it is impossible to draw a 
real path of Brownian motion. The process can 
only be evaluated for a discrete set of points. 
Figure 1 illustrates possible paths of Brownian 
motion. Strictly speaking, the plotted paths are 
only discrete approximations to the true paths. 

From the above definition of the process, 
it may not be clear how one can envision a 
Brownian motion or how one could construct 
it. Therefore, we will present a constructive 
method demonstrating how one can generate 
a Brownian motion as the limit of very sim¬ 
ple processes. We restrict the presentation to 


the unit interval (i.e., we assume $0 \le t \le 1$) but
the generalization to an arbitrary interval should
be obvious. The procedure is iterative, which
means that on the kth step of the iteration we
define a process $(X_t^{(k)})_{0 \le t \le 1}$, which will serve
as an approximation for a standard Brownian
motion.

Let random variables $I_1, I_2, I_3, \ldots$ be IID with

$$I_j = \begin{cases} 1 & \text{with probability } p = 0.5 \\ -1 & \text{with probability } 1 - p = 0.5 \end{cases}, \quad j = 1, 2, \ldots$$

Define $X_t^{(k)} = \frac{1}{\sqrt{k}} \sum_{j=1}^{n} I_j$ where t = n/k and
$n = 0, 1, \ldots, k$. If the value t is in the interval
$\left( \frac{n}{k}, \frac{n+1}{k} \right)$, then we take a value obtained by a
linear interpolation as

$$X_t^{(k)} = (n + 1 - kt)\,X_{n/k}^{(k)} + (kt - n)\,X_{(n+1)/k}^{(k)}$$

By doing so, we get a stochastic process with
continuous paths.

Let's start with k = 1. Then we have

$$X_0^{(1)} = 0, \quad X_1^{(1)} = \begin{cases} 1 & \text{with probability } p = 0.5 \\ -1 & \text{with probability } 1 - p = 0.5 \end{cases}$$

At any time t the random variable $X_t^{(1)}$ can take
only two possible values, namely -t and t. At
any time, the process has zero mean and the
variance at time t = 1 equals

$$V\left( X_1^{(1)} \right) = 1^2 \cdot 0.5 + (-1)^2 \cdot 0.5 = 1$$

That is not so bad for the first step, but obviously
the distribution of $X_1^{(1)}$ is far from being normal.

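What we do in the next step, k = 2, is allow
for two different values until time t = 1/2 and
three different values for $1/2 < t \le 1$. We do so
by defining:

$$X_0^{(2)} = 0, \quad X_{0.5}^{(2)} = \begin{cases} \frac{1}{\sqrt{2}} & \text{with probability } p = 0.5 \\ -\frac{1}{\sqrt{2}} & \text{with probability } 1 - p = 0.5 \end{cases}$$

$$X_1^{(2)} = \begin{cases} \sqrt{2} & \text{with probability } p^2 = 0.25 \\ 0 & \text{with probability } 2p(1 - p) = 0.5 \\ -\sqrt{2} & \text{with probability } (1 - p)^2 = 0.25 \end{cases}$$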
Figure 1 Possible Paths of a Standard Brownian Motion (Every Path Consists of 10,000 Equally Spaced
Observations)


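The process $X^{(2)}$ now has four possible paths.
The mean of $X_t^{(2)}$ is zero and the variance of $X_t^{(2)}$
equals

$$V\left( X_{0.5}^{(2)} \right) = \left( \frac{1}{\sqrt{2}} \right)^2 \cdot 0.5 + \left( -\frac{1}{\sqrt{2}} \right)^2 \cdot 0.5 = 0.5$$

$$V\left( X_1^{(2)} \right) = (\sqrt{2})^2 \cdot 0.25 + (-\sqrt{2})^2 \cdot 0.25 = 1$$

but still the distribution of $X_t^{(2)}$ is far from being
normal.

By iterating the stated procedure, the probability
distribution of $X_t^{(k)}$ is given by

$$P\left( X_t^{(k)} = \frac{n - 2m}{\sqrt{k}} \right) = \binom{n}{m}\left( \frac{1}{2} \right)^n$$

if $m \in \{0, 1, 2, \ldots, n\}$, t = n/k, $n \in \{0, 1, 2, \ldots, k\}$.
The mean and variance can be obtained as
follows:

$$E\left[ X_t^{(k)} \right] = \frac{1}{\sqrt{k}} \sum_{j=1}^{n} E[I_j] = 0$$

$$V\left( X_t^{(k)} \right) = \frac{1}{k} \sum_{j=1}^{n} V(I_j) = \frac{n}{k} = t$$
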

where t = n/k, $n = 1, 2, \ldots, k$. Since $X_{n/k}^{(k)}$ is defined
by the sum of IID random variables, it has

• Independent increments: $X_{n_1/k}^{(k)}$ and $X_{n_2/k}^{(k)} - X_{n_1/k}^{(k)}$
are independent, for all $n_1, n_2 \in \{0, 1, \ldots, k\}$
with $n_1 < n_2$.

• Stationary increments: $X_{n_2/k}^{(k)} - X_{n_1/k}^{(k)} \overset{d}{=} X_{(n_2 - n_1)/k}^{(k)}$
for all $n_1, n_2 \in \{0, 1, \ldots, k\}$ with $n_1 < n_2$.

Moreover, as k grows the distribution of $X_t^{(k)}$ will approach
the normal distribution due to the
central limit theorem. Consequently, we have
found all the defining properties of a Brownian
motion in this simple approximating process,
that is, the process $(X_t^{(k)})_{0 \le t \le 1}$ converges in
distribution to the standard Brownian motion
$(W_t)_{0 \le t \le 1}$.
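
The convergence of the scaled random walk can be checked numerically on its grid points, where its law is an explicitly rescaled binomial distribution. The sketch below is illustrative: the helper names (`walk_distribution`, `moments`, `cdf`, `normal_cdf`) and the choice k = 1000 are assumptions for this example. It computes the exact law of the walk at t = n/k, confirms that the variance equals t, and compares the distribution function at t = 1 with that of a standard normal variable.

```python
import math

def walk_distribution(k, n):
    """Exact law of X_{n/k}^{(k)} as a dict {value: probability}.

    m counts the steps equal to -1 among the first n steps, so the
    value is (n - 2m)/sqrt(k) with binomial probability C(n, m)/2^n.
    """
    return {(n - 2 * m) / math.sqrt(k): math.comb(n, m) / 2 ** n
            for m in range(n + 1)}

def moments(dist):
    """Mean and variance of a finite discrete distribution."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

def cdf(dist, y):
    """P(X <= y) for a finite discrete distribution."""
    return sum(p for x, p in dist.items() if x <= y)

def normal_cdf(y, variance):
    """Distribution function of N(0, variance)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * variance)))

# Step k = 2 reproduces the three values sqrt(2), 0, -sqrt(2) at t = 1.
print(walk_distribution(2, 2))

# For a large k the law at t = 1 is close to N(0, 1).
k = 1000
print(moments(walk_distribution(k, k)))
print(cdf(walk_distribution(k, k), 1.0), normal_cdf(1.0, 1.0))
```

Because the walk lives on a lattice with spacing 2 divided by the square root of k, the agreement of the distribution functions improves only at that rate as k increases.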

In the context of financial applications, there 
are two main variants of the standard Brow¬ 
nian motion which have to be mentioned: the 
arithmetic and the geometric Brownian motion. 
Both are obtained as a function of the standard 
Brownian motion. 

Arithmetic Brownian Motion

Given a Brownian motion $(W_t)_{t \ge 0}$ and two real
constants $\mu$ and $\sigma$, the arithmetic Brownian motion
$(X_t)_{t \ge 0}$ is obtained as:

$$X_t = \mu t + \sigma W_t$$

The process consists of the sum of a
purely deterministic linear trend function $\mu t$
and a rescaled Brownian motion $\sigma W_t$. The latter
has the property that at time t, $\sigma W_t$ is normally
distributed with mean 0 and variance $\sigma^2 t$.
The paths will therefore randomly jitter around
the deterministic trend with a variance proportional
to the point in time t under consideration.
The arithmetic Brownian motion is a simple but 
popular model for financial asset returns. 

Geometric Brownian Motion 

Given a Brownian motion $(W_t)_{t \ge 0}$, two real constants
$\mu$ and $\sigma$, and a starting value $S_0 > 0$, the
geometric Brownian motion $(S_t)_{t \ge 0}$ is obtained
as:

$$S_t = S_0 e^{\mu t + \sigma W_t}$$

The process $(S_t)_{t \ge 0}$ is just the exponential of
an arithmetic Brownian motion multiplied by
a factor. Therefore $\log(S_t / S_0)$ is normally distributed
and

$$E[S_t / S_0] = e^{\mu t + \frac{1}{2}\sigma^2 t}$$
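
The expectation formula for the geometric Brownian motion can be checked by simulation. The following sketch is illustrative only; the function names and the parameter values (drift 0.05, volatility 0.2, horizon 1) are hypothetical choices for this example. It samples the Brownian motion at time t as a normal variable with variance t, exponentiates, and compares the Monte Carlo average of the price ratio with the closed-form value.

```python
import math
import random

def gbm_ratio(mu, sigma, t, rng):
    """One draw of S_t / S_0 = exp(mu*t + sigma*W_t), with W_t ~ N(0, t)."""
    w_t = rng.gauss(0.0, math.sqrt(t))
    return math.exp(mu * t + sigma * w_t)

def mc_mean_ratio(mu, sigma, t, n_paths, seed=0):
    """Monte Carlo estimate of E[S_t / S_0]."""
    rng = random.Random(seed)
    return sum(gbm_ratio(mu, sigma, t, rng) for _ in range(n_paths)) / n_paths

def exact_mean_ratio(mu, sigma, t):
    """Closed form exp(mu*t + sigma^2 * t / 2) from the lognormal mean."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t)

print(mc_mean_ratio(0.05, 0.2, 1.0, 50000), exact_mean_ratio(0.05, 0.2, 1.0))
```

Note that the expected growth rate exceeds the drift by half the squared volatility, a consequence of the convexity of the exponential function.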


TIME-CHANGED BROWNIAN 
MOTION 

If a pure jump process $T = (T_t)_{t \ge 0}$ is non-decreasing,
that is, $T_t \ge 0$ a.s. for $t \ge 0$, and $T_t \ge T_s$
a.s. for $s \le t$, then the process T is referred
to as the subordinator or intrinsic time process.
Intuitively, it can be thought of as the cumulative
trading volume process for a financial asset,
which measures the cumulative volume of all
the transactions up to physical time t (Rachev
and Mittnik, 2000).

The Poisson, gamma, and inverse Gaussian 
processes are non-decreasing, and hence they 
are subordinators. Moreover, for the case where 


$0 < \alpha < 1$, the support of the α-stable distribution
$S_\alpha(\sigma, 1, 0)$ is the positive real line. Hence,
the α-stable process with parameters $(\frac{\alpha}{2}, \sigma, 1, 0)$
and $0 < \alpha < 2$ is a subordinator and is referred
to as the α-stable subordinator. If we add
the assumption that $E[T_t] = t$, this
would mean that the expected intrinsic time is
the same as physical time.

If we take an arithmetic Brownian motion
and change the physical time to a subordinator,
then we obtain the time-changed Brownian motion.
That is, take an arithmetic Brownian motion
with drift $\mu$ and volatility $\sigma$ as follows:

$$\mu t + \sigma W_t$$

and consider a subordinator $T = (T_t)_{t \ge 0}$ independent
of the standard Brownian motion
$(W_t)_{t \ge 0}$. Then substituting $t = T_t$ in the arithmetic
Brownian motion, we have a new process
$X = (X_t)_{t \ge 0}$ with

$$X_t = \mu T_t + \sigma W_{T_t}$$

which is the time-changed Brownian motion.

If $T_t$ is fixed, then the conditional distribution
of $X_t$ given $T_t$ is a normal
distribution, that is

$$P(X_t \le y \mid T_t) = P(\mu T_t + \sigma W_{T_t} \le y \mid T_t) = \frac{1}{\sqrt{2\pi\sigma^2 T_t}} \int_{-\infty}^{y} e^{-\frac{(x - \mu T_t)^2}{2\sigma^2 T_t}}\,dx$$

Using properties of the conditional probability
and independence between $W_t$ and $T_t$, the distribution
function $F_{X_t}$ and the probability density
function $f_{X_t}$ of $X_t$ are obtained by

$$F_{X_t}(y) = P(X_t \le y) = \int_{-\infty}^{y} \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2 s}}\, e^{-\frac{(x - \mu s)^2}{2\sigma^2 s}} f_{T_t}(s)\,ds\,dx$$

and

$$f_{X_t}(y) = \frac{\partial}{\partial y} F_{X_t}(y) = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2 s}}\, e^{-\frac{(y - \mu s)^2}{2\sigma^2 s}} f_{T_t}(s)\,ds$$

respectively, where $f_{T_t}$ is the probability density
function of $T_t$. Moreover, we can derive the
characteristic function $\phi_{X_t}$ as follows:

$$\phi_{X_t}(u) = \phi_{T_t}\left( \mu u + \frac{i\sigma^2 u^2}{2} \right) \quad (7)$$

where $\phi_{T_t}$ is the characteristic function of $T_t$.
Using the time-changed Brownian motion, we
can define various processes.
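
The characteristic function (7) follows from conditioning on the subordinator. A short worked derivation, assuming only that W and T are independent and that $W_s$ is normal with mean zero and variance s:

```latex
\begin{align*}
\phi_{X_t}(u)
  &= E\!\left[ E\!\left[ e^{iu(\mu T_t + \sigma W_{T_t})} \,\middle|\, T_t \right] \right] \\
  &= E\!\left[ e^{iu\mu T_t}\, e^{-\frac{1}{2}\sigma^2 u^2 T_t} \right]
     && \text{since } E\!\left[e^{iu\sigma W_s}\right] = e^{-\frac{1}{2}\sigma^2 u^2 s} \\
  &= E\!\left[ \exp\!\left( i \left( \mu u + \tfrac{i\sigma^2 u^2}{2} \right) T_t \right) \right] \\
  &= \phi_{T_t}\!\left( \mu u + \frac{i\sigma^2 u^2}{2} \right).
\end{align*}
```

The argument of $\phi_{T_t}$ is complex, so the formula uses the analytic extension of the characteristic function of $T_t$ to the upper half plane, which exists because $T_t$ is non-negative.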

The time-changed Brownian motion construction
is well known from the theory of
stochastic processes and is referred to as the
Skorokhod embedding problem. Theoretically,
every Lévy process can be defined as a time-changed
Brownian motion. More generally, a
process can be embedded in a Brownian motion
if and only if it is a local semimartingale, as
proved by Monroe (1978).

Although the representation via Brownian 
subordination is a nice property, we do not 
know a general constructive method to find
the process $T_t$ such that $X_t = \mu T_t + \sigma W_{T_t}$. This
means that given a semimartingale $X_t$, the time
process $T_t$ is not always of known form. Thus,
this approach can be applied only for some par¬ 
ticular Levy processes. 


Variance Gamma Process 

By considering the gamma process as the subordinator
of the Brownian motion, we obtain the
VG process. That is, the VG process is defined
by $X = (X_t)_{t \ge 0}$ with

$$X_t = \mu G_t + \sigma W_{G_t}$$

where $G = (G_t)_{t \ge 0}$ is the gamma process with
parameters $(c, \lambda)$. In order to reduce the number
of parameters, we consider the assumption
$E[G_t] = t$. Since we have $E[G_t] = \frac{ct}{\lambda}$, the assumption
is satisfied if $c = \lambda$. Then the characteristic
function of $X_t$ is equal to

$$\phi_{X_t}(u) = \left( \frac{c}{c - i\mu u + \frac{\sigma^2 u^2}{2}} \right)^{ct} \quad (8)$$

by (7) and the characteristic function of $G_t$
given in (4) with $c = \lambda$. Inserting into (8) the
parametrization

$$\lambda_+ \lambda_- = \frac{2c}{\sigma^2}, \quad \lambda_- - \lambda_+ = \frac{2\mu}{\sigma^2}, \quad C = c$$

we obtain the form given by (6).
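
Brownian subordination by a gamma process is straightforward to simulate at a single time point. The sketch below is illustrative and rests on stated assumptions: the gamma process has shape ct and rate c at time t (so that its mean is t), the characteristic function used for comparison is the power of c divided by (c minus i times drift times u plus half of squared volatility times u squared), raised to ct, and all function names and parameter values are hypothetical. It draws the gamma time, then a normal variable with that variance, and compares the empirical characteristic function with the closed form.

```python
import cmath
import math
import random

def sample_vg(t, mu, sigma, c, rng):
    """One draw of X_t = mu*G_t + sigma*W_{G_t} by gamma subordination."""
    # random.gammavariate takes (shape, scale); scale 1/c gives E[G_t] = t.
    g = rng.gammavariate(c * t, 1.0 / c)
    # Given G_t = g, W_g is normal with mean 0 and variance g.
    return mu * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0)

def vg_char_function(u, t, mu, sigma, c):
    """Closed-form characteristic function of X_t under E[G_t] = t."""
    return (c / (c - 1j * mu * u + 0.5 * sigma ** 2 * u ** 2)) ** (c * t)

def empirical_char_function(samples, u):
    """Sample average of exp(i*u*x)."""
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)

# Hypothetical parameters chosen only for illustration.
rng = random.Random(1)
samples = [sample_vg(1.0, -0.1, 0.3, 2.0, rng) for _ in range(50000)]
print(sum(samples) / len(samples))  # should be near mu * t = -0.1
print(empirical_char_function(samples, 2.0),
      vg_char_function(2.0, 1.0, -0.1, 0.3, 2.0))
```

The same two-step recipe (draw the random time, then a normal variable conditional on it) applies to any subordinator that can be sampled directly.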


Normal Inverse Gaussian Process 

By considering the inverse Gaussian process as 
the subordinator of the Brownian motion, we 
obtain the NIG process. 

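Define a process $X = (X_t)_{t \ge 0}$ with

$$X_t = \mu T_t + \sigma W_{T_t}$$

where $T = (T_t)_{t \ge 0}$ is the inverse Gaussian process
with parameters $(c, \lambda)$, satisfying $E[T_t] = t$.
The condition $E[T_t] = t$ holds if $c = \lambda$. Then the
characteristic function of $X_t$ is equal to

$$\phi_{X_t}(u) = \exp\left( -\kappa t\left( \sqrt{\kappa^2 - 2i\mu u + \sigma^2 u^2} - \kappa \right) \right) \quad (9)$$

by (7) and the characteristic function of $T_t$ given
in (5) with $\kappa := c = \lambda$. Inserting into (9) the
parametrization

$$\lambda^2 - \beta^2 = \frac{\kappa^2}{\sigma^2}, \quad \beta = \frac{\mu}{\sigma^2}, \quad c = \kappa\sigma$$

we obtain the characteristic exponent
$-tc\left( \sqrt{\lambda^2 - (\beta + iu)^2} - \sqrt{\lambda^2 - \beta^2} \right)$ of the NIG process.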


Normal Tempered Stable Process 

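Assume the Lévy measure $\nu$ is equal to

$$\nu(dx) = \frac{c\,e^{-\theta x}}{x^{\frac{\alpha}{2}+1}}\, 1_{x>0}\,dx \quad (10)$$

where $\alpha \in (0, 2)$, $c > 0$, and $\theta > 0$, and consider
the pure jump process $T = (T_t)_{t \ge 0}$ defined by $\nu$
and $\gamma$, where

$$\gamma = c \int_{0}^{1} x^{-\frac{\alpha}{2}} e^{-\theta x}\,dx$$

Since $\nu(A) = 0$ for all $A \subset (-\infty, 0)$ and $\gamma > 0$,
the process T is a non-decreasing process. Hence
it is a subordinator and is referred to as the
tempered stable subordinator with parameters
$(\alpha, c, \theta)$. Using equation (2), the characteristic
function $\phi_{T_t}$ of $T_t$ is equal to

$$\phi_{T_t}(u) = \exp\left( tc \int_{0}^{\infty} (e^{iux} - 1)\, \frac{e^{-\theta x}}{x^{\frac{\alpha}{2}+1}}\,dx \right)$$

Solving the integration in the last equation, we
can obtain the following formula,

$$\phi_{T_t}(u) = \exp\left( tc\,\Gamma\left( -\frac{\alpha}{2} \right)\left( (\theta - iu)^{\frac{\alpha}{2}} - \theta^{\frac{\alpha}{2}} \right) \right) \quad (11)$$

The mean of $T_t$ is computed by the first cumulant,
that is,

$$E[T_t] = \frac{1}{i}\frac{\partial}{\partial u} \log \phi_{T_t}(u)\Big|_{u=0} = tc\,\Gamma\left( 1 - \frac{\alpha}{2} \right)\theta^{\frac{\alpha}{2}-1}$$

Hence, the condition $E[T_t] = t$ holds if
$c = \left( \Gamma\left( 1 - \frac{\alpha}{2} \right)\theta^{\frac{\alpha}{2}-1} \right)^{-1}$.

By considering the tempered stable subordinator
as the subordinator of the Brownian motion,
we obtain the NTS process. That is, define
a process $X = (X_t)_{t \ge 0}$ with

$$X_t = \mu T_t + \sigma W_{T_t}$$

where $T = (T_t)_{t \ge 0}$ is the tempered
stable subordinator with parameters
$\left( \alpha, \left( \Gamma\left( 1 - \frac{\alpha}{2} \right)\theta^{\frac{\alpha}{2}-1} \right)^{-1}, \theta \right)$. The characteristic
function of $X_t$ is equal to

$$\phi_{X_t}(u) = \exp\left( -\frac{2\theta^{1-\frac{\alpha}{2}}}{\alpha}\, t \left( \left( \theta - i\mu u + \frac{\sigma^2 u^2}{2} \right)^{\frac{\alpha}{2}} - \theta^{\frac{\alpha}{2}} \right) \right) \quad (12)$$

by (7) and (11) with $c = \left( \Gamma\left( 1 - \frac{\alpha}{2} \right)\theta^{\frac{\alpha}{2}-1} \right)^{-1}$. The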
last equation can be changed to the following 
expression: 



Inserting into (13) the parametrization 



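we obtain the NTS process with parameters
$(\alpha, C, \lambda, \beta, m)$, where

$$m = -2^{-\frac{\alpha+1}{2}} C\sqrt{\pi}\,\Gamma\left( -\frac{\alpha}{2} \right) \alpha\beta\,(\lambda^2 - \beta^2)^{\frac{\alpha}{2}-1}$$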

LEVY PROCESS 

A stochastic process $X = (X_t)_{t \ge 0}$ is called a Lévy
process if the following five conditions are satisfied:

1. $X_0 = 0$ a.s.

2. X has independent increments.

3. X has stationary increments.

4. X is stochastically continuous, that is, $\forall t \ge 0$
and $a > 0$,

$$\lim_{s \to t} P\left[ |X_s - X_t| > a \right] = 0$$

5. X is right continuous and has left limits
(càdlàg).

The standard Brownian motion, arithmetic
Brownian motions, and pure jump processes
are all Lévy processes. Moreover, a Lévy process
can be decomposed into a Brownian motion
and a pure jump process $(Z_t)_{t \ge 0}$ independent of
the Brownian motion, that is

$$X_t = \sigma W_t + Z_t$$

Hence we obtain the characteristic function of
$X_t$ as follows:

$$\phi_{X_t}(u) = \phi_{\sigma W_t}(u)\,\phi_{Z_t}(u)$$

$$= \exp\left( -\frac{t}{2}\sigma^2 u^2 \right) \exp\left( i\gamma u t + t \int_{-\infty}^{\infty} \left( e^{iux} - 1 - iux 1_{|x| \le 1} \right) \nu(dx) \right)$$

$$= \exp\left( i\gamma u t - \frac{t}{2}\sigma^2 u^2 + t \int_{-\infty}^{\infty} \left( e^{iux} - 1 - iux 1_{|x| \le 1} \right) \nu(dx) \right)$$

where $\phi_{\sigma W_t}(u)$ is the characteristic function of
$N(0, \sigma^2 t)$, and $\phi_{Z_t}(u)$ is given by (2). Therefore,
if $X = (X_t)_{t \ge 0}$ is a Lévy process, then for any
$t \ge 0$, $X_t$ is an infinitely divisible random variable.
Conversely, if Y is an infinitely divisible
random variable, then there exists a unique
Lévy process $(X_t)_{t \ge 0}$ such that $X_1 = Y$, as proved
by Sato (1999, p. 38).

KEY POINTS 

• Continuous-time stochastic processes are the 
building block of financial modeling and they 
are usually used to explain the uncertain be¬ 
havior of financial assets. Some results of 
probability theory can be usefully applied to 
financial derivatives pricing and risk manage¬ 
ment. 

• Given any infinitely divisible random variable
$X_1$, it is possible to define a stochastic
process with independent and stationary increments
such that for all $t > s$, the increment
$X_t - X_s$ has characteristic function
$\exp\left( (t - s) \log \phi_{X_1}(u) \right)$. These processes are known as
Lévy processes.

• Brownian motion and Poisson processes are
Lévy processes. In theory, all Lévy processes can be
constructed by replacing the deterministic
time t of the Brownian motion $W_t$ with a
stochastic time $T_t$, although the time process is known
explicitly only in particular cases. This construction is called
Brownian subordination and the increasing
process $T_t$ is a subordinator.

• There are two main variants of the standard 
Brownian motion used in financial applica¬ 
tions: the arithmetic and the geometric Brow¬ 
nian motion. 

• The Poisson process is a fundamental exam¬ 
ple of a stochastic process with discontinuous 
trajectories, and a building block for con¬ 
structing more complex jump processes. 

• Pure jump processes also include the gamma
process, the inverse Gaussian process, the
variance gamma process, the α-stable process,
and the tempered stable process.

Abstract: The current price of an option is obtained by the conditional expectation of the payoff
function under the risk-neutral measure. The risk-neutral measure is the measure equivalent to the 
real market measure under which the discounted price process of the underlying stock becomes a 
martingale. In the Black-Scholes model, the risk-neutral measure can be obtained by the Girsanov 
theorem. The Esscher transform has been used to find the risk-neutral measure for the continuous 
Levy process models. The general theory of the Esscher transform is applied to find the risk-neutral 
measure under tempered stable Levy process models. 


In this entry, we present some issues in 
stochastic processes. We begin by defining 
events of a probability space mathematically, 
and then discuss the concept of conditional 
expectation. We then explain two important 
notions for stochastic processes: martingale 
properties and Markov properties. The for¬ 
mer relates to the fair price in a market and 


the latter describes the efficiency of a mar¬ 
ket. Finally, "change of measures" for processes 
are discussed. Change of measures for tem¬ 
pered stable processes are important for de¬ 
termining no-arbitrage pricing for assets. Fur¬ 
ther details about no-arbitrage pricing with the 
change of measure are discussed in Rachev et al.
(2011).

EVENTS, σ-FIELDS,
AND FILTRATION

A set of possible outcomes in a given sample
space $\Omega$ is called an event. An event is mathematically
defined as a subset of $\Omega$. If we have
one event A, then the set of outcomes that are
not included in A is also an event. For example,
if we consider an event that the return of the
stock of Disney tomorrow will be positive, then
the set of outcomes that Disney's return tomorrow
will not be positive is also an event. Moreover,
if we have two events A and B, then a set of outcomes
included in both A and B is also an event.
For instance, consider two events, the first event
being that Disney's stock return tomorrow will
be positive, and the other event that IBM's stock
return tomorrow will be positive. Then a set of
outcomes that both stock returns will be positive
tomorrow is an event.

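The class of events is described mathematically by the $\sigma$-field. The $\sigma$-field, denoted by $\mathcal{F}$, is a class of subsets of $\Omega$ that satisfies the following properties: 

Property 1. $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$. 

Property 2. If $A \in \mathcal{F}$, then $A^c = \{x \in \Omega \mid x \notin A\} \in \mathcal{F}$. 

Property 3. If $A_1, A_2, A_3, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$. 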
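
On a finite sample space the three properties can be checked by brute force, because countable unions reduce to finite unions. The following is a minimal sketch under that finiteness assumption; the function name `generate_sigma_field` and the toy sample space are illustrative and do not come from the text.

```python
from itertools import combinations


def generate_sigma_field(omega, generators):
    """Return the smallest sigma-field on a FINITE sample space `omega`
    that contains every set in `generators` (closure under complement and union)."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        # Property 2: close under complements.
        for a in current:
            complement = omega - a
            if complement not in field:
                field.add(complement)
                changed = True
        # Property 3 (finite case): close under pairwise unions.
        for a, b in combinations(current, 2):
            union = a | b
            if union not in field:
                field.add(union)
                changed = True
    return field


omega = {1, 2, 3, 4}
sigma = generate_sigma_field(omega, [{1}, {1, 2}])
for event in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(event))
```

The generated class has the atoms {1}, {2}, and {3, 4}, so it can distinguish outcome 1 from outcome 2 but cannot separate 3 from 4. Closure under intersections follows from Properties 2 and 3 by De Morgan's laws.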

Let $\mathcal{G}$ denote a class of subsets contained in $\Omega$. Then the smallest $\sigma$-field containing $\mathcal{G}$ is referred to as the $\sigma$-field generated by $\mathcal{G}$, and is denoted by $\sigma(\mathcal{G})$. For a given random variable $X$, consider the class $\mathcal{G} = \{A \subseteq \Omega : A = X^{-1}(I) \text{ for some open interval } I \text{ in } \mathbb{R}\}$, where $X^{-1}$ is the inverse image of $X$. Then the $\sigma$-field generated by $\mathcal{G}$ is referred to as the $\sigma$-field generated by $X$, and is denoted by $\sigma(X)$. If there is a $\sigma$-field $\mathcal{F}$ such that $\sigma(X) \subseteq \mathcal{F}$, then we say that $X$ is $\mathcal{F}$-measurable. 

The probability $P$ is a map from a given $\sigma$-field $\mathcal{F}$ to the unit interval $[0, 1]$. If $A \subseteq N \in \mathcal{F}$ and $P(N) = 0$, then the set $A$ is referred to as a null set with respect to $(\Omega, \mathcal{F}, P)$. Let $\mathcal{N}$ be the class of all null sets with respect to $(\Omega, \mathcal{F}, P)$. The space $(\Omega, \bar{\mathcal{F}}, \bar{P})$ is referred to as the completion of $(\Omega, \mathcal{F}, P)$ if $\bar{\mathcal{F}} = \sigma(\mathcal{F} \cup \mathcal{N})$ and $\bar{P}(A \cup N) = P(A)$ for all $A \in \mathcal{F}$ and $N \in \mathcal{N}$. All probability spaces in this entry are assumed to be completions of spaces, that is, all null sets are contained in the given $\sigma$-fields, and probabilities are defined on completed $\sigma$-fields. 

Let $(\mathcal{F}_t)_{t \ge 0}$ be a family of $\sigma$-fields with continuous index $t \ge 0$ (or discrete index $t = 0, 1, 2, \ldots$). If $\mathcal{F}_s \subseteq \mathcal{F}_t$ for all $0 \le s \le t$, then $(\mathcal{F}_t)_{t \ge 0}$ is referred to as a filtration. $\mathcal{F}_t$ can be interpreted as the "information" available to all market agents at time $t$. The filtration describes information that increases with time $t$. 

Consider a stochastic process $X = (X_t)_{t \ge 0}$. If $X_t$ is $\mathcal{F}_t$-measurable for all $t \ge 0$, then $X$ is referred to as an $(\mathcal{F}_t)_{t \ge 0}$-adapted process. If $X_t$ is $\mathcal{F}_{t-1}$-measurable for all discrete index $t = 0, 1, 2, \ldots$, then $X$ is referred to as an $(\mathcal{F}_t)_{t \ge 0}$-predictable process. 

For a given process $X = (X_t)_{t \ge 0}$, we can generate a filtration $(\mathcal{F}_t)_{t \ge 0}$ by 

$$\mathcal{F}_t = \sigma(X_s;\, 0 \le s \le t)$$

where $\sigma(X_s;\, 0 \le s \le t)$ is the smallest $\sigma$-field containing all $\sigma(X_s)$ with $0 \le s \le t$. Then the process $X$ is $(\mathcal{F}_t)_{t \ge 0}$-adapted and this filtration is referred to as the filtration generated by $X$. 


CONDITIONAL EXPECTATION 


The conditional expectation is the value of the expectation of a random variable restricted to some event. Let $g$ be a Borel function, $X$ be a random variable on a space $(\Omega, P)$ with $E[|g(X)|] < \infty$, and $A$ be an event with $P(A) > 0$. The conditional expectation $E[g(X) \mid A]$ is defined by 

$$E[g(X) \mid A] = \frac{E[g(X) 1_A]}{P(A)}$$

where 

$$1_A(\omega) = \begin{cases} 0 & \text{if } \omega \notin A \\ 1 & \text{if } \omega \in A. \end{cases}$$
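
As a small worked illustration of this definition, the sketch below evaluates $E[g(X) \mid A]$ exactly on a finite sample space. The fair-die setup, the identity map $X(\omega) = \omega$, and the helper name `conditional_expectation` are assumptions made for the example only.

```python
from fractions import Fraction


def conditional_expectation(g, outcomes, prob, event):
    """E[g(X) | A] = E[g(X) 1_A] / P(A) on a finite sample space,
    with X taken to be the identity map X(w) = w."""
    p_event = sum(prob[w] for w in outcomes if w in event)
    if p_event == 0:
        raise ValueError("the conditioning event must have positive probability")
    numerator = sum(g(w) * prob[w] for w in outcomes if w in event)
    return numerator / p_event


outcomes = range(1, 7)                         # faces of a fair die
prob = {w: Fraction(1, 6) for w in outcomes}   # uniform probability measure P
even = {2, 4, 6}                               # the event A

cond_mean = conditional_expectation(lambda x: x, outcomes, prob, even)
print(cond_mean)  # (2 + 4 + 6)/6 divided by 1/2
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with the hand calculation without rounding error.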


Consider a Borel function $g$ and a stochastic process $X = (X_t)_{t \ge 0}$ adapted to a filtration $(\mathcal{F}_t)_{t \ge 0}$. We can define the conditional expectation on $\mathcal{F}_t$ as a random variable. That is, the conditional expectation $E[g(X_T) \mid \mathcal{F}_t]$ for $t \le T$ is a random variable such that 

$$E[g(X_T) \mid \mathcal{F}_t](\omega) = E[g(X_T) \mid A_\omega],$$

where $A_\omega$ is the smallest event in $\mathcal{F}_t$ with $\omega \in A_\omega$, that is, $A_\omega = \bigcap_{\omega \in B,\, B \in \mathcal{F}_t} B$. Moreover, if $g$ and $h$ are Borel functions, and $0 \le s \le t \le T \le T^*$, then we have the following properties: 

• $E[g(X_t) \mid \mathcal{F}_0] = E[g(X_t)]$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$. 

• $E[E[g(X_T) \mid \mathcal{F}_t] \mid \mathcal{F}_s] = E[g(X_T) \mid \mathcal{F}_s]$. 

• $E[g(X_t) h(X_T) \mid \mathcal{F}_t] = g(X_t) E[h(X_T) \mid \mathcal{F}_t]$. 

• $E[a g(X_T) + b h(X_{T^*}) \mid \mathcal{F}_t] = a E[g(X_T) \mid \mathcal{F}_t] + b E[h(X_{T^*}) \mid \mathcal{F}_t]$ for $a, b \in \mathbb{R}$. 

We write $E[g(X_T) \mid X_t]$ instead of $E[g(X_T) \mid \mathcal{F}_t]$ when $\mathcal{F}_t = \sigma(X_t)$. Hence we have: 

• $E[E[g(X_T) \mid X_t] \mid X_s] = E[g(X_T) \mid X_s]$. 

• $E[g(X_t) h(X_T) \mid X_t] = g(X_t) E[h(X_T) \mid X_t]$. 

• $E[a g(X_T) + b h(X_{T^*}) \mid X_t] = a E[g(X_T) \mid X_t] + b E[h(X_{T^*}) \mid X_t]$ for $a, b \in \mathbb{R}$. 

If an $(\mathcal{F}_t)$-adapted process $X = (X_t)_{t \ge 0}$ satisfies the condition 

$$E[g(X_T) \mid \mathcal{F}_t] = E[g(X_T) \mid X_t]$$

for all $0 \le t \le T$ and every Borel function $g$, then the process $X$ is referred to as a Markov process. 

In finance, a Markov process is used to explain the efficient market hypothesis. Suppose $X$ is a price process of an asset, and consider a forward contract on the asset with maturity $T$. The $\sigma$-field $\mathcal{F}_t$ contains all market information until time $t$. Hence, $F_t = E[X_T \mid \mathcal{F}_t]$ is the expected price of the forward contract based on the information up to $t$. If the market is efficient, all information until $t$ is impounded into the current price $X_t$. Hence, the expected price of the forward contract can be obtained by $F_t = E[X_T \mid X_t]$. 

If an $(\mathcal{F}_t)$-adapted process $X = (X_t)_{t \ge 0}$ satisfies the condition 

$$X_t = E[X_T \mid \mathcal{F}_t]$$

for all $0 \le t \le T$, then the process $X$ is referred to as a martingale process. For example, the process $X = (X_t)_{t \ge 0}$ with $X_t = \sigma W_t$ is a martingale process, where $\sigma > 0$ and $(W_t)_{t \ge 0}$ is the standard Brownian motion. Since $X_t$ is $\mathcal{F}_t$-measurable, we have 

$$E[X_T \mid \mathcal{F}_t] = E[X_T - X_t + X_t \mid \mathcal{F}_t] = E[X_T - X_t \mid \mathcal{F}_t] + X_t.$$

Since $X$ has stationary and independent increments, 

$$E[X_T - X_t \mid \mathcal{F}_t] = E[X_T - X_t] = E[X_{T-t}] = E[\sigma W_{T-t}] = 0.$$

Hence the process $X$ is a martingale. 
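
The two facts used in this argument, that the increment $X_T - X_t$ has zero mean and is independent of $X_t$, can be checked by simulation. The sketch below is only an illustration under assumed values ($\sigma = 0.2$, $t = 1$, $T = 2$, a fixed seed); the function name `simulate_increment_stats` is hypothetical.

```python
import math
import random


def simulate_increment_stats(sigma, t, T, n_paths, seed=0):
    """Simulate X_t = sigma * W_t and the increment X_T - X_t, and return
    (sample mean of the increment, sample mean of increment * X_t).
    Both should be close to zero if X is a martingale with independent increments."""
    rng = random.Random(seed)
    sum_inc = 0.0
    sum_cross = 0.0
    for _ in range(n_paths):
        x_t = sigma * math.sqrt(t) * rng.gauss(0.0, 1.0)      # X_t
        inc = sigma * math.sqrt(T - t) * rng.gauss(0.0, 1.0)  # X_T - X_t
        sum_inc += inc
        sum_cross += inc * x_t
    return sum_inc / n_paths, sum_cross / n_paths


mean_inc, cross_moment = simulate_increment_stats(sigma=0.2, t=1.0, T=2.0, n_paths=100_000)
print(mean_inc, cross_moment)
```

A zero cross moment is only a necessary consequence of the martingale property, so this is a sanity check rather than a proof.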

In finance, a martingale process describes the fair price or no-arbitrage price for an asset. For example, consider one share of a stock and a forward contract that requires delivery of one share of that stock to the forward contract holder at the maturity date. Suppose $(S_t)_{t \ge 0}$ is a stock price process and $(F_t)_{0 \le t \le T}$ is the price process for the forward contract with maturity $T$. The forward price at time $t \le T$ is given by the conditional expectation of $S_T$ based on the information until time $t$, that is, $F_t = E[S_T \mid \mathcal{F}_t]$. Moreover, we can see that $F_t = S_t$ for all $t$ with $0 \le t \le T$ by the following argument. Suppose $F_t > S_t$. Then we obtain the difference $F_t - S_t > 0$ at time $t$ by purchasing one share of the stock at price $S_t$ and selling the forward contract at price $F_t$. We invest the proceeds in a money market account with interest rate $r$. At time $T$, by delivering the stock to the holder of the forward contract, we will then have $e^{r(T-t)}(F_t - S_t)$, which is an arbitrage profit. If $F_t < S_t$, then another arbitrage opportunity can be found by selling (i.e., shorting) one share of the stock and purchasing the forward contract. Therefore, to eliminate arbitrage opportunities, $F_t$ should be equal to $S_t$; that is, the stock price process should be a martingale. 


CHANGE OF MEASURES 

In this section, we present change of measure for random variables and Levy processes. Change of measure is an important method for determining no-arbitrage prices of assets and derivatives. 


Equivalent Probability Measure 

Consider two probability measures $P$ and $Q$ on a sample space $\Omega$ and $\sigma$-field $\mathcal{F}$. If they satisfy the condition 

$$Q(A) = 0 \Rightarrow P(A) = 0,$$

then we say that $P$ is absolutely continuous with respect to $Q$, and write $P \ll Q$. Moreover, if $P \ll Q$ and $Q \ll P$, that is, 

$$Q(A) = 0 \Leftrightarrow P(A) = 0,$$

then we say that $P$ and $Q$ are equivalent. 

If $Q \ll P$, then there exists a positive random variable $\xi$ with $\int_\Omega \xi \, dP = 1$ and 

$$Q(A) = \int_A \xi \, dP \qquad (1)$$

for any $A \in \mathcal{F}$. In this case, $\xi$ is referred to as the Radon-Nikodym derivative, and is denoted by 

$$\xi = \frac{dQ}{dP}.$$

Conversely, if there is a positive random variable $\xi$ with $\int_\Omega \xi \, dP = 1$ and $Q$ is defined by equation (1), then $Q$ is also a probability measure and $Q \ll P$. 

Let $X$ be a random variable on a probability space with measure $P$, and let $f(x) = \frac{d}{dx} P(X \le x)$ be the probability density function (p.d.f.) of $X$. Suppose $Q$ is a probability measure and the probability density function of $X$ on $Q$ is given by $g(x) = \frac{d}{dx} Q(X \le x)$. If $P$ and $Q$ are equivalent, then the Radon-Nikodym derivative is equal to 

$$\frac{dQ}{dP} = \frac{g(X)}{f(X)}.$$

For example, suppose $X \sim N(0, 1)$ is normally distributed on $P$. If we take the Radon-Nikodym derivative 

$$\xi_1 = \frac{\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}}{\frac{1}{\sqrt{2\pi}} e^{-\frac{X^2}{2}}},$$

then the measure $Q_1$ defined by $Q_1(A) = \int_A \xi_1 \, dP$ for $A \in \mathcal{F}$ is equivalent to $P$ and $X \sim N(\mu, \sigma^2)$ on the measure $Q_1$. On the other hand, if we take the Radon-Nikodym derivative 

$$\xi_2 = \frac{h(X)}{\frac{1}{\sqrt{2\pi}} e^{-\frac{X^2}{2}}}$$

where 

$$h(x) = \frac{\sigma}{\pi\left((x - \mu)^2 + \sigma^2\right)},$$

which is the probability density function of the Cauchy distribution, then the measure $Q_2$ defined by $Q_2(A) = \int_A \xi_2 \, dP$ for $A \in \mathcal{F}$ is equivalent to $P$ and $X \sim S_1(\sigma, 0, \mu)$ on the measure $Q_2$. 
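
The normal example can be checked numerically: sampling $X$ under $P$ and weighting each draw by the Radon-Nikodym derivative $g(X)/f(X)$ should reproduce the moments of $N(\mu, \sigma^2)$. The sketch below assumes illustrative values $\mu = 0.5$ and $\sigma = 0.8$ (chosen so that the weights stay bounded) and a fixed seed; the helper names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def reweighted_moments(mu, sigma, n_samples, seed=0):
    """Draw X ~ N(0, 1) under P and weight each draw by xi = g(X) / f(X),
    where g is the N(mu, sigma^2) density and f the N(0, 1) density.
    Returns estimates of E_P[xi], E_P[xi X], and E_P[xi X^2]."""
    rng = random.Random(seed)
    total_w = total_wx = total_wx2 = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        xi = normal_pdf(x, mu, sigma) / normal_pdf(x, 0.0, 1.0)
        total_w += xi
        total_wx += xi * x
        total_wx2 += xi * x * x
    return total_w / n_samples, total_wx / n_samples, total_wx2 / n_samples


mass, mean_q, second_moment_q = reweighted_moments(mu=0.5, sigma=0.8, n_samples=200_000)
print(mass, mean_q, second_moment_q)
```

The first estimate approximates $\int_\Omega \xi_1 \, dP = 1$, and the other two approximate the mean $\mu$ and second moment $\mu^2 + \sigma^2$ of $X$ under $Q_1$.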

Consider a finite discrete process $(X_t)_{t \in \{1, 2, \ldots, T\}}$ of independent and identically distributed (IID) real random variables on both probability measures $P$ and $Q$, where $T$ is a positive integer. By the independence of the process on $P$, we have 

$$P[X_1 \in \mathbb{R}, \ldots, X_{t-1} \in \mathbb{R}, X_t \le x, X_{t+1} \in \mathbb{R}, \ldots, X_T \in \mathbb{R}]$$
$$= P[X_1 \in \mathbb{R}] \cdots P[X_{t-1} \in \mathbb{R}] \cdot P[X_t \le x] \cdot P[X_{t+1} \in \mathbb{R}] \cdots P[X_T \in \mathbb{R}]$$
$$= P[X_t \le x].$$

By the same argument, we have 

$$Q[X_1 \in \mathbb{R}, \ldots, X_{t-1} \in \mathbb{R}, X_t \le x, X_{t+1} \in \mathbb{R}, \ldots, X_T \in \mathbb{R}] = Q[X_t \le x].$$

Since the $X_t$'s are identically distributed on $P$ and $Q$, respectively, we have $P[X_t \le x] = P[X_s \le x]$ and $Q[X_t \le x] = Q[X_s \le x]$ for all $t, s \in \{1, 2, \ldots, T\}$. Suppose that for all $t \in \{1, 2, \ldots, T\}$ the probability density functions of $X_t$ are given by $f(x)$ and $g(x)$ on probability measures $P$ and $Q$, respectively. That is, 

$$f(x) = \frac{d}{dx} P[X_t \le x]$$

and 

$$g(x) = \frac{d}{dx} Q[X_t \le x].$$

If the functions $f$ and $g$ are positive on the same set (that is, they have the same support), then $P$ and $Q$ are equivalent and the Radon-Nikodym derivative is equal to 

$$\frac{dQ}{dP} = \frac{g(X_1) g(X_2) \cdots g(X_T)}{f(X_1) f(X_2) \cdots f(X_T)}.$$

However, that method cannot be used for either continuous-time processes or infinite discrete-time processes. In the next section, we discuss the change of measure for continuous-time processes using Girsanov's theorem and the extended Girsanov's theorem. 

Change of Measure for 
Continuous-Time Processes 

A continuous-time process is a function from the sample space to a set of appropriate functions. Hence, the change of measure for processes is more complex than the change of measure for a random variable. 

Brownian motion is a function from the sample space to the set of continuous functions. For Brownian motion, we can find an equivalent measure using the following theorem, which is referred to as Girsanov's theorem (see Note 1). 

Theorem 1. Let $W = (W_t)_{t \ge 0}$ be a standard Brownian motion under measure $P$ and $(\mathcal{F}_t)_{t \ge 0}$ be the filtration generated by $W$. Consider a process $(\xi_t)_{t \ge 0}$ defined by 

$$\xi_t = e^{-\theta W_t - \frac{\theta^2}{2} t}.$$

Then the probability measure $Q$ given by 

$$Q(A)|_{\mathcal{F}_t} = \int_A \xi_t \, dP, \quad A \in \mathcal{F}_t$$

is equivalent to $P|_{\mathcal{F}_t}$ for all $t \ge 0$, and the process $\tilde{W} = (\tilde{W}_t)_{t \ge 0}$ with $\tilde{W}_t = \theta t + W_t$ is a standard Brownian motion under the measure $Q$. 

Girsanov's theorem shows how stochastic processes change under the change of measure. For example, let a process $X = (X_t)_{t \ge 0}$ be an arithmetic Brownian motion under measure $P$ such that 

$$X_t = \mu t + \sigma W_t$$

where $(W_t)_{t \ge 0}$ is the standard Brownian motion. The process $X$ is not a martingale on the measure $P$ unless $\mu = 0$, but we can obtain a measure on which $X$ is a martingale by Girsanov's theorem. Indeed, we define a measure $Q$ equivalent to $P$ such that 

$$Q(A)|_{\mathcal{F}_t} = \int_A e^{-\frac{\mu}{\sigma} W_t - \frac{\mu^2}{2\sigma^2} t} \, dP, \quad A \in \mathcal{F}_t.$$

Then the process $X$ becomes $X_t = \sigma \tilde{W}_t$ with $\tilde{W}_t = \frac{\mu}{\sigma} t + W_t$, and the process $(\tilde{W}_t)_{t \ge 0}$ is a standard Brownian motion on the measure $Q$. Therefore, the process $X$ is a martingale on the measure $Q$. 
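
The effect of this change of measure can be illustrated by Monte Carlo: under $P$ the terminal value $X_T$ has mean $\mu T$, while weighting each path by the density $\xi_T$ with $\theta = \mu/\sigma$ gives an estimate of $E_Q[X_T]$, which should be zero. The parameter values and the function name `girsanov_check` below are assumptions for illustration only.

```python
import math
import random


def girsanov_check(mu, sigma, T, n_paths, seed=0):
    """Estimate E_P[X_T], E_P[xi_T], and E_Q[X_T] = E_P[xi_T X_T] for
    X_T = mu*T + sigma*W_T, using xi_T = exp(-theta*W_T - theta^2*T/2), theta = mu/sigma."""
    rng = random.Random(seed)
    theta = mu / sigma
    sum_x = sum_xi = sum_xi_x = 0.0
    for _ in range(n_paths):
        w_T = math.sqrt(T) * rng.gauss(0.0, 1.0)              # W_T under P
        x_T = mu * T + sigma * w_T                            # arithmetic Brownian motion
        xi_T = math.exp(-theta * w_T - 0.5 * theta ** 2 * T)  # Radon-Nikodym density
        sum_x += x_T
        sum_xi += xi_T
        sum_xi_x += xi_T * x_T
    return sum_x / n_paths, sum_xi / n_paths, sum_xi_x / n_paths


mean_p, total_mass, mean_q = girsanov_check(mu=0.1, sigma=0.2, T=1.0, n_paths=200_000)
print(mean_p, total_mass, mean_q)
```

Only the terminal value is simulated here, which is sufficient because $\xi_T$ depends on the path only through $W_T$.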

A Levy process is a function from the sample space to the set of right continuous functions with left limits at any point of the domain (see Note 2). Girsanov's theorem can be extended for Levy processes by the following theorem: 

Theorem 2. Suppose a process $X = (X_t)_{t \ge 0}$ is a Levy process with Levy triplet $(\sigma^2, \nu, \gamma)$ under measure $P$. If there is a real number $\theta$ satisfying $\int_{|x| \ge 1} e^{\theta x} \nu(dx) < \infty$, then we can find the equivalent measure $Q$ whose Radon-Nikodym derivative is given by 

$$\xi_t = \frac{dQ}{dP}\bigg|_{\mathcal{F}_t} = \frac{e^{\theta X_t}}{E_P[e^{\theta X_t}]} = e^{\theta X_t - l(\theta) t}$$

where $l(\theta) = \log E_P[e^{\theta X_1}]$. That is, 

$$Q(A)|_{\mathcal{F}_t} = \int_A \xi_t \, dP, \quad A \in \mathcal{F}_t$$

is equivalent to $P|_{\mathcal{F}_t}$ for all $t \ge 0$. Moreover, the process $X$ is a Levy process with Levy triplet $(\sigma^2, \tilde{\nu}, \tilde{\gamma})$ under the measure $Q$, where $\tilde{\nu}(dx) = e^{\theta x} \nu(dx)$ and $\tilde{\gamma} = \gamma + \int_{|x| \le 1} x (e^{\theta x} - 1) \nu(dx)$. 
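
A simple case in which Theorem 2 can be verified exactly is the Poisson process, a pure jump Levy process whose Levy measure is a point mass of size $\lambda$ at $x = 1$. The sketch below is an illustration under that assumption and is not taken from the text: it checks that tilting the Poisson probabilities by $e^{\theta k - l(\theta) t}$ gives a Poisson law with intensity $\lambda e^{\theta}$, in line with $\tilde{\nu}(dx) = e^{\theta x} \nu(dx)$. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P[N = k] for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def esscher_density(k, theta, rate, t):
    """Radon-Nikodym derivative e^{theta*N_t - l(theta)*t} evaluated at N_t = k,
    where l(theta) = rate * (e^theta - 1) is the log moment generating function of N_1."""
    log_mgf = rate * (math.exp(theta) - 1.0)
    return math.exp(theta * k - log_mgf * t)


def tilted_pmf(k, theta, rate, t):
    """Q[N_t = k], obtained by weighting P[N_t = k] with the Esscher density."""
    return esscher_density(k, theta, rate, t) * poisson_pmf(k, rate * t)


rate, theta, t = 2.0, 0.3, 1.5
for k in range(5):
    print(k, tilted_pmf(k, theta, rate, t), poisson_pmf(k, rate * math.exp(theta) * t))
```

The two printed columns agree term by term, because $e^{\theta k - \lambda t (e^{\theta} - 1)} e^{-\lambda t} (\lambda t)^k / k!$ simplifies to $e^{-\lambda e^{\theta} t} (\lambda e^{\theta} t)^k / k!$.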

The change of measure using Theorem 2 is 
referred to as the Esscher transform. The most 
general theorem of change of measure for Levy 
processes is given by the following theorem (see 
Sato, 1999): 


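Theorem 3. Suppose a process $X = (X_t)_{t \ge 0}$ has Levy triplets $(\sigma^2, \nu, \gamma)$ and $(\tilde{\sigma}^2, \tilde{\nu}, \tilde{\gamma})$ under measures $P$ and $Q$, respectively. 

1. In the case where $\sigma^2 \ne 0$ and $\tilde{\sigma}^2 \ne 0$, $P|_{\mathcal{F}_t}$ and $Q|_{\mathcal{F}_t}$ are equivalent for all $t \ge 0$ if and only if the Levy triplets satisfy 

$$\sigma^2 = \tilde{\sigma}^2 > 0 \qquad (2)$$

and 

$$\int_{-\infty}^{\infty} \left(e^{\psi(x)/2} - 1\right)^2 \nu(dx) < \infty \qquad (3)$$

where $\psi(x) = \ln\left(\frac{\tilde{\nu}(dx)}{\nu(dx)}\right)$. 

2. In the case where $\sigma^2 = \tilde{\sigma}^2 = 0$, $P|_{\mathcal{F}_t}$ and $Q|_{\mathcal{F}_t}$ are equivalent for all $t \ge 0$ if and only if the Levy triplets satisfy (3) and 

$$\tilde{\gamma} - \gamma = \int_{|x| \le 1} x (\tilde{\nu} - \nu)(dx). \qquad (4)$$

When $P$ and $Q$ are equivalent, the Radon-Nikodym derivative is 

$$\frac{dQ}{dP}\bigg|_{\mathcal{F}_t} = e^{\xi_t}$$

where $\xi = (\xi_t)_{t \ge 0}$ is a Levy process with Levy triplet $(\sigma_\xi^2, \nu_\xi, \gamma_\xi)$ given by 

$$\sigma_\xi^2 = \sigma^2 \eta^2, \qquad \nu_\xi = \nu \circ \psi^{-1}, \qquad \gamma_\xi = -\frac{\sigma^2 \eta^2}{2} - \int_{-\infty}^{\infty} \left(e^y - 1 - y 1_{|y| \le 1}\right) \nu_\xi(dy) \qquad (5)$$

and $\eta$ is such that 

$$\tilde{\gamma} - \gamma - \int_{|x| \le 1} x (\tilde{\nu} - \nu)(dx) = \begin{cases} \sigma^2 \eta & \text{if } \sigma > 0 \\ 0 & \text{if } \sigma = 0. \end{cases}$$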

Change of Measure in Tempered 
Stable Processes 

In this section, we present the change of measure for six tempered stable processes: the classical tempered stable (CTS) process, generalized tempered stable (GTS) process, Kim-Rachev tempered stable (KRTS) process, modified tempered stable (MTS) process, normal tempered stable (NTS) process, and rapidly decreasing tempered stable (RDTS) process. The six processes are defined as follows: 

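• Let $\alpha \in (0, 2)$, $C, \lambda_+, \lambda_- > 0$, and $m \in \mathbb{R}$. A Levy process $(X_t)_{t \ge 0}$ is referred to as the classical tempered stable (CTS) process (see Note 3) if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t - iut C \Gamma(1 - \alpha)\left(\lambda_+^{\alpha - 1} - \lambda_-^{\alpha - 1}\right) + t C \Gamma(-\alpha)\left((\lambda_+ - iu)^{\alpha} - \lambda_+^{\alpha} + (\lambda_- + iu)^{\alpha} - \lambda_-^{\alpha}\right)\Big).$$

If we take a special parameter $C$ defined by 

$$C = \left(\Gamma(2 - \alpha)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)\right)^{-1} \qquad (6)$$

and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard CTS process with parameters $(\alpha, \lambda_+, \lambda_-)$. 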

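• A Levy process $(X_t)_{t \ge 0}$ is referred to as the generalized tempered stable (GTS) process if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t - iut\left(C_+ \Gamma(1 - \alpha_+) \lambda_+^{\alpha_+ - 1} - C_- \Gamma(1 - \alpha_-) \lambda_-^{\alpha_- - 1}\right) + t C_+ \Gamma(-\alpha_+)\left((\lambda_+ - iu)^{\alpha_+} - \lambda_+^{\alpha_+}\right) + t C_- \Gamma(-\alpha_-)\left((\lambda_- + iu)^{\alpha_-} - \lambda_-^{\alpha_-}\right)\Big), \qquad (7)$$

where $\alpha_+, \alpha_- \in (0, 1) \cup (1, 2)$, $C_+, C_-, \lambda_+, \lambda_- > 0$, and $m \in \mathbb{R}$. If we substitute 

$$C_+ = \frac{p \lambda_+^{2 - \alpha_+}}{\Gamma(2 - \alpha_+)}, \qquad C_- = \frac{(1 - p) \lambda_-^{2 - \alpha_-}}{\Gamma(2 - \alpha_-)} \qquad (8)$$

where $p \in (0, 1)$, and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard GTS process with parameters $(\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)$. 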

• Let $\alpha \in (0, 2) \setminus \{1\}$, $k_+, k_-, r_+, r_- > 0$, $p_+, p_- \in \{p > -\alpha \mid p \ne -1, p \ne 0\}$, and $m \in \mathbb{R}$. A Levy process $(X_t)_{t \ge 0}$ is referred to as the Kim-Rachev tempered stable (KRTS) process (see Note 4) if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t - iut \Gamma(1 - \alpha)\left(\frac{k_+ r_+}{p_+ + 1} - \frac{k_- r_-}{p_- + 1}\right) + t k_+ H(iu; \alpha, r_+, p_+) + t k_- H(-iu; \alpha, r_-, p_-)\Big)$$

where 

$$H(x; \alpha, r, p) = \frac{\Gamma(-\alpha)}{p}\left({}_2F_1(p, -\alpha; 1 + p; rx) - 1\right)$$

and ${}_2F_1$ is the hypergeometric function (see Note 5). If $p_+$ and $p_-$ approach infinity, then the KRTS process converges to the CTS process. If we substitute 

$$k_+ = C \frac{\alpha + p_+}{r_+^{\alpha}}, \qquad k_- = C \frac{\alpha + p_-}{r_-^{\alpha}}$$

where 

$$C = \frac{1}{\Gamma(2 - \alpha)}\left(\frac{\alpha + p_+}{2 + p_+} r_+^{2 - \alpha} + \frac{\alpha + p_-}{2 + p_-} r_-^{2 - \alpha}\right)^{-1} \qquad (9)$$

and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard KRTS process with parameters $(\alpha, r_+, r_-, p_+, p_-)$. 

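• Let $\alpha \in (0, 2) \setminus \{1\}$, $C, \lambda_+, \lambda_- > 0$, and $m \in \mathbb{R}$. A Levy process $(X_t)_{t \ge 0}$ is referred to as the modified tempered stable (MTS) process (see Note 6) if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t + t C\left(G_R(u; \alpha, \lambda_+) + G_R(u; \alpha, \lambda_-)\right) + iut C\left(G_I(u; \alpha, \lambda_+) - G_I(u; \alpha, \lambda_-)\right)\Big)$$

where for $u \in \mathbb{R}$, 

$$G_R(x; \alpha, \lambda) = 2^{-\frac{\alpha + 3}{2}} \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\left((\lambda^2 + x^2)^{\frac{\alpha}{2}} - \lambda^{\alpha}\right)$$

and 

$$G_I(x; \alpha, \lambda) = 2^{-\frac{\alpha + 1}{2}} \Gamma\left(\frac{1 - \alpha}{2}\right) \lambda^{\alpha - 1}\left[{}_2F_1\left(1, \frac{1 - \alpha}{2}; \frac{3}{2}; -\frac{x^2}{\lambda^2}\right) - 1\right].$$

If we substitute 

$$C = 2^{\frac{\alpha + 1}{2}}\left(\sqrt{\pi}\, \Gamma\left(1 - \frac{\alpha}{2}\right)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)\right)^{-1} \qquad (10)$$

and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard MTS process with parameters $(\alpha, \lambda_+, \lambda_-)$. 

• Let $\alpha \in (0, 2)$, $C, \lambda > 0$, $|\beta| < \lambda$, and $m \in \mathbb{R}$. A Levy process $(X_t)_{t \ge 0}$ is referred to as the normal tempered stable (NTS) process (see Note 7) if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t + iut\, 2^{-\frac{\alpha + 1}{2}} C \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right) \alpha \beta (\lambda^2 - \beta^2)^{\frac{\alpha}{2} - 1} + 2^{-\frac{\alpha + 1}{2}} t C \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\left((\lambda^2 - (\beta + iu)^2)^{\frac{\alpha}{2}} - (\lambda^2 - \beta^2)^{\frac{\alpha}{2}}\right)\Big).$$

If we substitute 

$$C = 2^{\frac{\alpha + 1}{2}}\left(\sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right) \alpha (\lambda^2 - \beta^2)^{\frac{\alpha}{2} - 2}\left(\alpha \beta^2 - \lambda^2 - \beta^2\right)\right)^{-1} \qquad (11)$$

and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard NTS process with parameters $(\alpha, \lambda, \beta)$. 

• Let $\alpha \in (0, 2) \setminus \{1\}$, $C, \lambda_+, \lambda_- > 0$, and $m \in \mathbb{R}$. A Levy process $(X_t)_{t \ge 0}$ is referred to as the rapidly decreasing tempered stable (RDTS) process (see Note 8) if the characteristic function of $X_t$ is given by 

$$\phi_{X_t}(u) = \exp\Big(ium t + t C\left(G(iu; \alpha, \lambda_+) + G(-iu; \alpha, \lambda_-)\right)\Big),$$

where 

$$G(x; \alpha, \lambda) = 2^{-\frac{\alpha}{2} - 1} \lambda^{\alpha} \Gamma\left(-\frac{\alpha}{2}\right)\left(M\left(-\frac{\alpha}{2}, \frac{1}{2}; \frac{x^2}{2\lambda^2}\right) - 1\right) + 2^{-\frac{\alpha}{2} - \frac{1}{2}} \lambda^{\alpha - 1} x\, \Gamma\left(\frac{1 - \alpha}{2}\right)\left(M\left(\frac{1 - \alpha}{2}, \frac{3}{2}; \frac{x^2}{2\lambda^2}\right) - 1\right)$$

and $M$ is the confluent hypergeometric function. See Andrews (1998). If we take a special parameter $C$ defined by 

$$C = 2^{\frac{\alpha}{2}}\left(\Gamma\left(1 - \frac{\alpha}{2}\right)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)\right)^{-1} \qquad (12)$$

and $m = 0$, then $E[X_t] = 0$ and $V(X_t) = t$. In this case, $X$ is called the standard RDTS process with parameters $(\alpha, \lambda_+, \lambda_-)$. 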
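
As a numerical sanity check on the standard parametrization, the sketch below codes the logarithm of the CTS characteristic function and the constant $C$ of equation (6), and then recovers the mean and variance of $X_1$ by finite differences at $u = 0$. The parameter values, the step size $h$, and the function names are assumptions chosen for illustration; the same check could in principle be repeated for the other five processes.

```python
import math


def cts_log_cf(u, alpha, C, lam_p, lam_m, m, t=1.0):
    """Logarithm of the CTS characteristic function of X_t (alpha must not equal 1)."""
    iu = 1j * u
    drift = iu * m - iu * C * math.gamma(1.0 - alpha) * (
        lam_p ** (alpha - 1.0) - lam_m ** (alpha - 1.0))
    jumps = C * math.gamma(-alpha) * (
        (lam_p - iu) ** alpha - lam_p ** alpha
        + (lam_m + iu) ** alpha - lam_m ** alpha)
    return t * (drift + jumps)


def standard_cts_C(alpha, lam_p, lam_m):
    """Constant C of equation (6), which gives unit variance per unit time."""
    return 1.0 / (math.gamma(2.0 - alpha)
                  * (lam_p ** (alpha - 2.0) + lam_m ** (alpha - 2.0)))


alpha, lam_p, lam_m = 1.2, 2.0, 3.0
C = standard_cts_C(alpha, lam_p, lam_m)

# Finite differences of the log characteristic function around u = 0:
# psi'(0) = i * E[X_1] and psi''(0) = -V(X_1).
h = 1e-3
psi_plus = cts_log_cf(h, alpha, C, lam_p, lam_m, m=0.0)
psi_minus = cts_log_cf(-h, alpha, C, lam_p, lam_m, m=0.0)
mean_estimate = ((psi_plus - psi_minus) / (2.0 * h)).imag
variance_estimate = -((psi_plus + psi_minus) / h ** 2).real
print(mean_estimate, variance_estimate)
```

The estimates should be close to 0 and 1, respectively, which is what the text claims for the standard CTS process with $m = 0$.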

The six tempered stable processes are pure jump Levy processes with Levy triplet $(0, \nu, \gamma)$, where $\gamma = m - \int_{|x| > 1} x \nu(dx)$ and the Levy measures $\nu$ are presented in Table 1. 
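Table 1 Levy Measures for Tempered Stable Processes 

Process: Levy measure $\nu(dx)$ 

CTS: $$C\left(\frac{e^{-\lambda_+ x}}{x^{1 + \alpha}} 1_{x > 0} + \frac{e^{-\lambda_- |x|}}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx$$

GTS: $$\left(\frac{C_+ e^{-\lambda_+ x}}{x^{1 + \alpha_+}} 1_{x > 0} + \frac{C_- e^{-\lambda_- |x|}}{|x|^{1 + \alpha_-}} 1_{x < 0}\right) dx$$

MTS: $$C\left(\frac{(\lambda_+ x)^{\frac{\alpha + 1}{2}} K_{\frac{\alpha + 1}{2}}(\lambda_+ x)}{x^{1 + \alpha}} 1_{x > 0} + \frac{(\lambda_- |x|)^{\frac{\alpha + 1}{2}} K_{\frac{\alpha + 1}{2}}(\lambda_- |x|)}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx$$

NTS: $$C e^{\beta x} \frac{(\lambda |x|)^{\frac{\alpha + 1}{2}} K_{\frac{\alpha + 1}{2}}(\lambda |x|)}{|x|^{1 + \alpha}}\, dx$$

KRTS: $$\left(\frac{k_+ r_+^{-p_+}}{x^{1 + \alpha}} \int_0^{r_+} e^{-x/s} s^{\alpha + p_+ - 1}\, ds\, 1_{x > 0} + \frac{k_- r_-^{-p_-}}{|x|^{1 + \alpha}} \int_0^{r_-} e^{-|x|/s} s^{\alpha + p_- - 1}\, ds\, 1_{x < 0}\right) dx$$

RDTS: $$C\left(\frac{e^{-\lambda_+^2 x^2 / 2}}{x^{1 + \alpha}} 1_{x > 0} + \frac{e^{-\lambda_-^2 |x|^2 / 2}}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx$$

Here $K_{\frac{\alpha + 1}{2}}$ denotes the modified Bessel function of the second kind. 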

Let $X = (X_t)_{t \ge 0}$ be one of the six tempered stable processes. Then $E[X_t] = mt$ and $X$ has stationary and independent increments. Therefore, we have 

$$E[X_T \mid \mathcal{F}_t] = E[X_T - X_t \mid \mathcal{F}_t] + X_t = E[X_{T-t}] + X_t = m(T - t) + X_t$$

and hence $X$ is a martingale when $m = 0$. 

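The properties of tempered stable processes change under the change of measure using the Esscher transform. For example, let a process $X = (X_t)_{t \ge 0}$ be a symmetric CTS process under measure $P$ (that is, $\lambda_+ = \lambda_- = \lambda$) with $m = 0$. Then the Levy measure $\nu(dx)$ of $X$ is given by 

$$\nu(dx) = C\left(\frac{e^{-\lambda x}}{x^{1 + \alpha}} 1_{x > 0} + \frac{e^{-\lambda |x|}}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx.$$

Since we have $\int_{|x| \ge 1} e^{\theta x} \nu(dx) < \infty$ for any real number $\theta$ with $-\lambda < \theta < \lambda$, we can define a measure $Q$ equivalent to $P$ such that 

$$Q(A)|_{\mathcal{F}_t} = \int_A e^{\theta X_t - l(\theta) t}\, dP, \quad A \in \mathcal{F}_t$$

where 

$$l(\theta) = \log E_P[e^{\theta X_1}] = C \Gamma(-\alpha)\left((\lambda - \theta)^{\alpha} + (\lambda + \theta)^{\alpha} - 2\lambda^{\alpha}\right).$$

Moreover, the Levy measure $\tilde{\nu}(dx)$ of $X$ under $Q$ is given by 

$$\tilde{\nu}(dx) = e^{\theta x} \nu(dx) = C\left(\frac{e^{-(\lambda - \theta) x}}{x^{1 + \alpha}} 1_{x > 0} + \frac{e^{-(\lambda + \theta) |x|}}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx.$$

By the same argument, we discuss the relation between the symmetric MTS and NTS processes. That is, let a process $X = (X_t)_{t \ge 0}$ be a symmetric MTS process under measure $P$. Then the Levy measure $\nu(dx)$ of $X$ is given by 

$$\nu(dx) = C \frac{(\lambda |x|)^{\frac{\alpha + 1}{2}} K_{\frac{\alpha + 1}{2}}(\lambda |x|)}{|x|^{1 + \alpha}}\, dx.$$

Since we have $\int_{|x| \ge 1} e^{\beta x} \nu(dx) < \infty$ for any real number $\beta$ with $-\lambda < \beta < \lambda$, we can define a measure $Q$ equivalent to $P$ such that 

$$Q(A)|_{\mathcal{F}_t} = \int_A e^{\beta X_t - l(\beta) t}\, dP, \quad A \in \mathcal{F}_t$$

where 

$$l(x) = \log E_P[e^{x X_1}] = C\, 2^{-\frac{\alpha + 1}{2}} \sqrt{\pi}\, \Gamma\left(-\frac{\alpha}{2}\right)\left((\lambda^2 - x^2)^{\frac{\alpha}{2}} - \lambda^{\alpha}\right).$$

Moreover, the Levy measure $\tilde{\nu}(dx)$ of $X$ under $Q$ is given by 

$$\tilde{\nu}(dx) = e^{\beta x} \nu(dx) = C e^{\beta x} \frac{(\lambda |x|)^{\frac{\alpha + 1}{2}} K_{\frac{\alpha + 1}{2}}(\lambda |x|)}{|x|^{1 + \alpha}}\, dx,$$

which is the Levy measure for the NTS process. 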
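Table 2 Conditions for Equivalence between P and Q 

For each process $(X_t)_{t \ge 0}$, the entries are the parameters under measure $P$, the parameters under measure $Q$, and the equivalence condition. 

CTS process 
Parameters under $P$: $(\alpha, C, \lambda_+, \lambda_-, m)$ 
Parameters under $Q$: $(\tilde{\alpha}, \tilde{C}, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{m})$ 
Condition: $C = \tilde{C}$, $\alpha = \tilde{\alpha}$, and 
$$\tilde{m} - m = C \Gamma(1 - \alpha)\left(\tilde{\lambda}_+^{\alpha - 1} - \tilde{\lambda}_-^{\alpha - 1} - \lambda_+^{\alpha - 1} + \lambda_-^{\alpha - 1}\right)$$

GTS process 
Parameters under $P$: $(\alpha_+, \alpha_-, C_+, C_-, \lambda_+, \lambda_-, m)$ 
Parameters under $Q$: $(\tilde{\alpha}_+, \tilde{\alpha}_-, \tilde{C}_+, \tilde{C}_-, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{m})$ 
Condition: $\alpha_+ = \tilde{\alpha}_+$, $\alpha_- = \tilde{\alpha}_-$, $C_+ = \tilde{C}_+$, $C_- = \tilde{C}_-$, and 
$$\tilde{m} - m = C_+ \Gamma(1 - \alpha_+)\left(\tilde{\lambda}_+^{\alpha_+ - 1} - \lambda_+^{\alpha_+ - 1}\right) - C_- \Gamma(1 - \alpha_-)\left(\tilde{\lambda}_-^{\alpha_- - 1} - \lambda_-^{\alpha_- - 1}\right)$$

MTS process 
Parameters under $P$: $(\alpha, C, \lambda_+, \lambda_-, m)$ 
Parameters under $Q$: $(\tilde{\alpha}, \tilde{C}, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{m})$ 
Condition: $C = \tilde{C}$, $\alpha = \tilde{\alpha}$, and 
$$\tilde{m} - m = 2^{-\frac{\alpha + 1}{2}} C\, \Gamma\left(\frac{1 - \alpha}{2}\right)\left(\tilde{\lambda}_+^{\alpha - 1} - \tilde{\lambda}_-^{\alpha - 1} - \lambda_+^{\alpha - 1} + \lambda_-^{\alpha - 1}\right)$$

NTS process 
Parameters under $P$: $(\alpha, C, \lambda, \beta, m)$ 
Parameters under $Q$: $(\tilde{\alpha}, \tilde{C}, \tilde{\lambda}, \tilde{\beta}, \tilde{m})$ 
Condition: $C = \tilde{C}$, $\alpha = \tilde{\alpha}$, and 
$$\tilde{m} - m = \kappa\left(\beta (\lambda^2 - \beta^2)^{\frac{\alpha}{2} - 1} - \tilde{\beta} (\tilde{\lambda}^2 - \tilde{\beta}^2)^{\frac{\alpha}{2} - 1}\right), \quad \text{where } \kappa = 2^{-\frac{\alpha + 1}{2}} \alpha \sqrt{\pi}\, C\, \Gamma\left(-\frac{\alpha}{2}\right)$$

KRTS process 
Parameters under $P$: $(\alpha_1, k_{1,+}, k_{1,-}, r_{1,+}, r_{1,-}, p_{1,+}, p_{1,-}, m_1)$ 
Parameters under $Q$: $(\alpha_2, k_{2,+}, k_{2,-}, r_{2,+}, r_{2,-}, p_{2,+}, p_{2,-}, m_2)$ 
Condition: for $j = 1, 2$, $p_{j,\pm} > \frac{1}{2} - \alpha_j$ and $p_{j,\pm} \ne 0$ if $\alpha_j \in (0, 1)$, and $p_{j,\pm} > 1 - \alpha_j$ and $p_{j,\pm} \ne 0$ if $\alpha_j \in (1, 2)$; $\alpha := \alpha_1 = \alpha_2$; 
$$\frac{k_{1,+} r_{1,+}^{\alpha}}{\alpha + p_{1,+}} = \frac{k_{2,+} r_{2,+}^{\alpha}}{\alpha + p_{2,+}}, \qquad \frac{k_{1,-} r_{1,-}^{\alpha}}{\alpha + p_{1,-}} = \frac{k_{2,-} r_{2,-}^{\alpha}}{\alpha + p_{2,-}},$$
and 
$$m_2 - m_1 = \Gamma(1 - \alpha) \sum_{j = 1, 2} (-1)^j \left(\frac{k_{j,+} r_{j,+}}{p_{j,+} + 1} - \frac{k_{j,-} r_{j,-}}{p_{j,-} + 1}\right)$$

RDTS process 
Parameters under $P$: $(\alpha, C, \lambda_+, \lambda_-, m)$ 
Parameters under $Q$: $(\tilde{\alpha}, \tilde{C}, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{m})$ 
Condition: $C = \tilde{C}$, $\alpha = \tilde{\alpha}$, and 
$$\tilde{m} - m = 2^{-\frac{\alpha + 1}{2}} C\, \Gamma\left(\frac{1 - \alpha}{2}\right)\left(\tilde{\lambda}_+^{\alpha - 1} - \tilde{\lambda}_-^{\alpha - 1} - \lambda_+^{\alpha - 1} + \lambda_-^{\alpha - 1}\right)$$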

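We can apply Theorem 3 to tempered stable processes. For example, let $X = (X_t)_{t \ge 0}$ be a CTS process with parameters $(\alpha, C, \lambda_+, \lambda_-, m)$ on measure $P$ and a CTS process with parameters $(\tilde{\alpha}, \tilde{C}, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{m})$ on measure $Q$. Then $P$ and $Q$ are equivalent if and only if $C = \tilde{C}$, $\alpha = \tilde{\alpha}$, and 

$$\tilde{m} - m = C \Gamma(1 - \alpha)\left(\tilde{\lambda}_+^{\alpha - 1} - \tilde{\lambda}_-^{\alpha - 1} - \lambda_+^{\alpha - 1} + \lambda_-^{\alpha - 1}\right). \qquad (13)$$

When $P$ and $Q$ are equivalent, the Radon-Nikodym derivative is $\frac{dQ}{dP}\big|_{\mathcal{F}_t} = e^{U_t}$, where $U = (U_t)_{t \ge 0}$ is a Levy process with Levy triplet $(\sigma_U^2, \nu_U, \gamma_U)$ given by 

$$\sigma_U^2 = 0, \qquad \nu_U = \nu \circ \psi^{-1}, \qquad \gamma_U = -\int_{-\infty}^{\infty} \left(e^y - 1 - y 1_{|y| \le 1}\right)(\nu \circ \psi^{-1})(dy). \qquad (14)$$

In equation (14), $\nu$ is the CTS Levy measure given by 

$$\nu(dx) = C\left(\frac{e^{-\lambda_+ x}}{x^{1 + \alpha}} 1_{x > 0} + \frac{e^{-\lambda_- |x|}}{|x|^{1 + \alpha}} 1_{x < 0}\right) dx$$

and $\psi(x) = (\lambda_+ - \tilde{\lambda}_+) x 1_{x > 0} - (\lambda_- - \tilde{\lambda}_-) x 1_{x < 0}$. Proofs can be obtained from Theorem 3, but we will not discuss the proofs here (see Note 9). 

We can apply the same argument to the GTS, MTS, NTS, KRTS, and RDTS processes (see Note 10). The necessary and sufficient conditions for equivalence under a change of measure for the six tempered stable processes are presented in Table 2. Radon-Nikodym derivatives are omitted in the table. 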

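Table 3 Change of Measures for Standard TS Processes: $Y_t = \mu t + X_t$ 

For each row, $(X_t)_{t \ge 0}$ is the standard process under measure $P$, $(Y_t)_{t \ge 0}$ is the standard process under measure $Q$, and the relations of parameters are listed. 

Standard CTS process with parameters $(\alpha, \lambda_+, \lambda_-)$ under $P$; standard CTS process with parameters $(\alpha, \tilde{\lambda}_+, \tilde{\lambda}_-)$ under $Q$: 
$$\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2} = \tilde{\lambda}_+^{\alpha - 2} + \tilde{\lambda}_-^{\alpha - 2}, \qquad \mu = \frac{\lambda_+^{\alpha - 1} - \lambda_-^{\alpha - 1} - \tilde{\lambda}_+^{\alpha - 1} + \tilde{\lambda}_-^{\alpha - 1}}{(1 - \alpha)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)}$$

Standard GTS process with parameters $(\alpha_+, \alpha_-, \lambda_+, \lambda_-, p)$ under $P$; standard GTS process with parameters $(\alpha_+, \alpha_-, \tilde{\lambda}_+, \tilde{\lambda}_-, \tilde{p})$ under $Q$: 
$$p \lambda_+^{2 - \alpha_+} = \tilde{p} \tilde{\lambda}_+^{2 - \alpha_+}, \qquad (1 - p) \lambda_-^{2 - \alpha_-} = (1 - \tilde{p}) \tilde{\lambda}_-^{2 - \alpha_-},$$
$$\mu = p\, \frac{\lambda_+^{\alpha_+ - 1} - \tilde{\lambda}_+^{\alpha_+ - 1}}{(1 - \alpha_+) \lambda_+^{\alpha_+ - 2}} - (1 - p)\, \frac{\lambda_-^{\alpha_- - 1} - \tilde{\lambda}_-^{\alpha_- - 1}}{(1 - \alpha_-) \lambda_-^{\alpha_- - 2}}$$

Standard MTS process with parameters $(\alpha, \lambda_+, \lambda_-)$ under $P$; standard MTS process with parameters $(\alpha, \tilde{\lambda}_+, \tilde{\lambda}_-)$ under $Q$: 
$$\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2} = \tilde{\lambda}_+^{\alpha - 2} + \tilde{\lambda}_-^{\alpha - 2}, \qquad \mu = \frac{\Gamma\left(\frac{1 - \alpha}{2}\right)\left(\lambda_+^{\alpha - 1} - \lambda_-^{\alpha - 1} - \tilde{\lambda}_+^{\alpha - 1} + \tilde{\lambda}_-^{\alpha - 1}\right)}{\sqrt{\pi}\, \Gamma\left(1 - \frac{\alpha}{2}\right)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)}$$

Standard NTS process with parameters $(\alpha, \lambda, \beta)$ under $P$; standard NTS process with parameters $(\alpha, \tilde{\lambda}, \tilde{\beta})$ under $Q$: 
$$\frac{\alpha \beta^2 - \lambda^2 - \beta^2}{(\lambda^2 - \beta^2)^{2 - \frac{\alpha}{2}}} = \frac{\alpha \tilde{\beta}^2 - \tilde{\lambda}^2 - \tilde{\beta}^2}{(\tilde{\lambda}^2 - \tilde{\beta}^2)^{2 - \frac{\alpha}{2}}}, \qquad \mu = \frac{\tilde{\beta} (\tilde{\lambda}^2 - \tilde{\beta}^2)^{\frac{\alpha}{2} - 1} - \beta (\lambda^2 - \beta^2)^{\frac{\alpha}{2} - 1}}{(\lambda^2 - \beta^2)^{\frac{\alpha}{2} - 2}\left(\alpha \beta^2 - \lambda^2 - \beta^2\right)}$$

Standard KRTS process with parameters $(\alpha, r_{1,+}, r_{1,-}, p_{1,+}, p_{1,-})$ under $P$; standard KRTS process with parameters $(\alpha, r_{2,+}, r_{2,-}, p_{2,+}, p_{2,-})$ under $Q$: $r_{2,+}, r_{2,-} > 0$, 
$$\frac{\alpha + p_{1,+}}{2 + p_{1,+}} r_{1,+}^{2 - \alpha} + \frac{\alpha + p_{1,-}}{2 + p_{1,-}} r_{1,-}^{2 - \alpha} = \frac{\alpha + p_{2,+}}{2 + p_{2,+}} r_{2,+}^{2 - \alpha} + \frac{\alpha + p_{2,-}}{2 + p_{2,-}} r_{2,-}^{2 - \alpha},$$
$$\mu = \sum_{j = 1, 2} (-1)^j C_j \left(\frac{\alpha + p_{j,+}}{p_{j,+} + 1} r_{j,+}^{1 - \alpha} - \frac{\alpha + p_{j,-}}{p_{j,-} + 1} r_{j,-}^{1 - \alpha}\right), \quad \text{where } C_j = \frac{1}{\alpha - 1}\left(\frac{\alpha + p_{j,+}}{2 + p_{j,+}} r_{j,+}^{2 - \alpha} + \frac{\alpha + p_{j,-}}{2 + p_{j,-}} r_{j,-}^{2 - \alpha}\right)^{-1}$$

Standard RDTS process with parameters $(\alpha, \lambda_+, \lambda_-)$ under $P$; standard RDTS process with parameters $(\alpha, \tilde{\lambda}_+, \tilde{\lambda}_-)$ under $Q$: 
$$\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2} = \tilde{\lambda}_+^{\alpha - 2} + \tilde{\lambda}_-^{\alpha - 2}, \qquad \mu = \frac{\Gamma\left(\frac{1 - \alpha}{2}\right)\left(\lambda_+^{\alpha - 1} - \lambda_-^{\alpha - 1} - \tilde{\lambda}_+^{\alpha - 1} + \tilde{\lambda}_-^{\alpha - 1}\right)}{\sqrt{2}\, \Gamma\left(1 - \frac{\alpha}{2}\right)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)}$$

By applying change of measures, we can obtain a martingale process from a CTS process. Let a process $X^0 = (X_t^0)_{t \ge 0}$ be a CTS process with 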

parameters $(\alpha, C, \lambda_+, \lambda_-, 0)$ on measure $P$ and let $X = (X_t)_{t \ge 0}$ be a process with $X_t = mt + X_t^0$. Then $X$ becomes the CTS process with parameters $(\alpha, C, \lambda_+, \lambda_-, m)$ on the measure $P$. The process $X$ is not a martingale on the measure $P$ unless $m = 0$, but we can obtain a measure on which $X$ is a martingale by the change of measures for CTS processes. We assume that $\tilde{\lambda}_+$ and $\tilde{\lambda}_-$ are positive real numbers such that 

$$0 - m = C \Gamma(1 - \alpha)\left(\tilde{\lambda}_+^{\alpha - 1} - \tilde{\lambda}_-^{\alpha - 1} - \lambda_+^{\alpha - 1} + \lambda_-^{\alpha - 1}\right)$$

and we define a measure $Q$ equivalent to $P$ such that 

$$Q(A)|_{\mathcal{F}_t} = \int_A e^{U_t}\, dP, \quad A \in \mathcal{F}_t$$

where $(U_t)_{t \ge 0}$ is the Levy process with Levy triplet $(\sigma_U^2, \nu_U, \gamma_U)$ given by equation (14). Then the process $X$ becomes the CTS process with parameters $(\alpha, C, \tilde{\lambda}_+, \tilde{\lambda}_-, 0)$ on the measure $Q$. Therefore, the process $X$ is a martingale on measure $Q$. 
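
The condition on the new tempering parameters is one equation in two unknowns, so in practice one of them has to be pinned down. The sketch below makes the simplifying assumption (ours, not the text's) that the negative-side parameter is left unchanged and solves the martingale condition for the positive-side parameter in closed form. The function name `martingale_lambda_plus` and the numerical inputs are illustrative.

```python
import math


def martingale_lambda_plus(alpha, C, lam_p, lam_m, m):
    """Solve 0 - m = C*Gamma(1-alpha)*(lp_new^(alpha-1) - lm_new^(alpha-1)
                                       - lam_p^(alpha-1) + lam_m^(alpha-1))
    for lp_new under the ASSUMPTION lm_new = lam_m (negative side unchanged)."""
    target = lam_p ** (alpha - 1.0) - m / (C * math.gamma(1.0 - alpha))
    if target <= 0.0:
        raise ValueError("no positive solution exists with lam_m held fixed")
    return target ** (1.0 / (alpha - 1.0))


alpha, C, lam_p, lam_m, m = 0.5, 1.0, 2.0, 3.0, 0.1
lam_p_new = martingale_lambda_plus(alpha, C, lam_p, lam_m, m)

# Residual of the martingale condition with lm_new = lam_m; it should vanish.
residual = m + C * math.gamma(1.0 - alpha) * (
    lam_p_new ** (alpha - 1.0) - lam_m ** (alpha - 1.0)
    - lam_p ** (alpha - 1.0) + lam_m ** (alpha - 1.0))
print(lam_p_new, residual)
```

With a positive drift $m$ and $\alpha < 1$, the solution is larger than the original parameter, which corresponds to stronger tempering of positive jumps under $Q$. Other choices, such as moving both parameters, satisfy the same condition.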

Furthermore, by applying change of measures to the standard CTS process, we obtain the following result. Let $(X_t)_{t \ge 0}$ be a standard CTS process with parameters $(\alpha, \lambda_+, \lambda_-)$ under a measure $P$, and let $\tilde{\lambda}_+, \tilde{\lambda}_- > 0$ and a real number $\mu$ satisfy the following: 

$$\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2} = \tilde{\lambda}_+^{\alpha - 2} + \tilde{\lambda}_-^{\alpha - 2}, \qquad \mu = \frac{\lambda_+^{\alpha - 1} - \lambda_-^{\alpha - 1} - \tilde{\lambda}_+^{\alpha - 1} + \tilde{\lambda}_-^{\alpha - 1}}{(1 - \alpha)\left(\lambda_+^{\alpha - 2} + \lambda_-^{\alpha - 2}\right)}. \qquad (15)$$

Then we can find a measure $Q$ equivalent to $P$ such that the process $(Y_t)_{t \ge 0}$ with $Y_t = \mu t + X_t$ is a standard CTS process with parameters $(\alpha, \tilde{\lambda}_+, \tilde{\lambda}_-)$ under the measure $Q$. 

We apply the same argument to the standard GTS, standard MTS, standard NTS, standard KRTS, and standard RDTS processes. The relations of parameters between the standard tempered stable process $(X_t)_{t \ge 0}$ under $P$ and the standard tempered stable process $(Y_t)_{t \ge 0}$ with $Y_t = \mu t + X_t$ under $Q$ are presented in Table 3. 

KEY POINTS 

* The filtration is interpreted as the information available to all market agents 
at each point in time. 

* Conditional expectation is the best approxi¬ 
mation of the price of assets, portfolios, and 
derivatives under information until the cur¬ 
rent time. 

* Markov processes are used to explain the ef¬ 
ficient market hypothesis in finance. 

* Martingale processes describe the fair price 
or no-arbitrage price for an asset in finance. 

* Change of measure on the Brownian motion 
process is achieved by Girsanov's theorem, 
while change of measure on the Levy process 
is achieved by the Esscher transform or the 
generalized Girsanov theorem. 

* Using the generalized Girsanov theorem, the 
tempered stable process becomes a martin¬ 
gale process. 

NOTES 

1. The general form of Girsanov's theorem is presented in many references, including Karatzas and Shreve (1991), Oksendal (2000), and Klebaner (2005). The Black-Scholes option pricing formula is derived by applying Girsanov's theorem in Harrison and Pliska (1981). 

2. We refer to such functions as cadlag functions. 

3. See Koponen (1995), Boyarchenko and Levendorskii (2000), and Carr et al. (2002). 

4. See Kim et al. (2008c, 2007). 

5. See Andrews (1998). 

6. See Kim et al. (2009). 

7. See Barndorff-Nielsen and Levendorskii (2001). 

8. See Bianchi et al. (2010) and Kim et al. (2010). 

9. See Kim and Lee (2006) for more details. 

10. See Kim et al. (2008a, 2008b, 2009, 2010) and Bianchi et al. (2010) for more details. 
Change of Time Methods 

Abstract: Change of time can be used in financial modeling to introduce stochastic volatility or solve many stochastic differential equations. The main idea of the change of time method is to change time from t to a nonnegative process T(t) with nondecreasing sample paths (e.g., a subordinator). Many Levy processes may be written as time-changed Brownian motion. Levy processes can also be used as a time change for other Levy processes (subordinators). Using change of time, we can get an option pricing formula for an asset following geometric Brownian motion (e.g., the Black-Scholes formula) and obtain an explicit option pricing formula for an asset following a mean-reverting process (e.g., a continuous-time GARCH process). 


In this entry, we provide an overview of change of time methods (CTM), and show how to solve many stochastic differential equations (SDEs) in finance (geometric Brownian motion [GBM], Ornstein-Uhlenbeck [OU], Vasicek, continuous-time GARCH, etc.) using the change of time method. As applications of CTM we present two different models: geometric Brownian motion (GBM) and mean-reverting models. The solutions of these two models are different, but both can be obtained by CTM, like many other models mentioned in this entry. Moreover, we can use these solutions to find simple option pricing formulas: One is the classic Black-Scholes formula and the other is a new formula for a mean-reverting asset. These formulas can be used in practice (for example, in the energy market) because they are explicit. 1 


This includes: 

• CTM in martingale and semimartingale setting 

• CTM in SDEs setting 

• Subordination as a change of time 

We present two applications of CTM: 

• Black-Scholes formula 

• Explicit option pricing formula for a mean-reverting asset 

CHANGE OF TIME METHOD 

The main idea of the change of time method is to change time from t to a nonnegative process T(t) with nondecreasing sample paths. One example is subordination: If X(t) and T(t) > 0 are some processes, then X(T(t)) is subordinated to X(t); T(t) is a change of time. Another example is time-changed Brownian motion: M(t) = B(T(t)), where B(t) is a Brownian motion and T(t) is a subordinator (e.g., the variance-gamma process 2 V(t) = B(T(t)), where T(t) is a gamma process). 
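
To make the time-change idea concrete, the sketch below samples V(t) = B(T(t)) at a single time t by first drawing the gamma time T(t) and then drawing a Brownian value over that random amount of time. The gamma clock is assumed to have mean t and variance nu*t, and the Brownian motion is taken without drift; the values t = 1, nu = 0.5, the seed, and the function name are illustrative assumptions.

```python
import math
import random


def sample_variance_gamma(t, nu, n_samples, seed=0):
    """Sample V(t) = B(T(t)), where B is a standard Brownian motion and T(t) is a
    gamma random time with mean t and variance nu * t (shape t/nu, scale nu)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        random_time = rng.gammavariate(t / nu, nu)                    # T(t)
        samples.append(math.sqrt(random_time) * rng.gauss(0.0, 1.0))  # B(T(t))
    return samples


values = sample_variance_gamma(t=1.0, nu=0.5, n_samples=200_000)
n = len(values)
mean = sum(values) / n
variance = sum((v - mean) ** 2 for v in values) / n
kurtosis = sum((v - mean) ** 4 for v in values) / n / variance ** 2
print(mean, variance, kurtosis)
```

The sample variance should be close to E[T(t)] = t, as for Brownian motion, while the kurtosis exceeds the Gaussian value of 3. This excess kurtosis is the sense in which a random clock introduces stochastic volatility.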

Bochner (1949) introduced the notion of change of time (time-changed Brownian motion). Clark (1973) introduced Bochner's change of time into financial economics. Feller (1966) introduced the subordinated process X(T(t)) with Markov process X(t) and T(t) as a process with independent increments (T(t) was called randomized operational time). Johnson (1979) introduced the time-changed stochastic volatility model (SVM) in continuous time. Johnson and Shanno (1987) studied the pricing of options using the time-changed stochastic volatility (SV) model. Ikeda and Watanabe (1981) introduced and studied change of time for the solution of SDEs. Barndorff-Nielsen, Nicolato, and Shephard (2003) studied the relationship between subordination and SVM using change of time (T(t) as a chronometer). Carr, Geman, Madan, and Yor (2003) used subordinated processes to construct SV for Levy processes (T(t) as business time). 

The change of time method is closely associated with the embedding problem: To embed a process X(t) in Brownian motion is to find a Wiener process W(t) and an increasing family of stopping times T(t) such that W(T(t)) has the same joint distribution as X(t). Skorokhod (1965) first treated the embedding problem, showing that the sum of any sequence of independent random variables (r.v.) with mean zero and finite variance could be embedded in Brownian motion using stopping times. Dambis (1965) and Dubins and Schwartz (1965) independently showed that every continuous martingale could be embedded in Brownian motion. Knight (1971) discovered the multivariate extension of Dambis (1965) and Dubins and Schwartz's (1965) result. Huff (1969) showed that every process of pathwise bounded variation could be embedded in Brownian motion. Monroe (1972) proved that every right continuous martingale could be embedded in a Brownian motion. Monroe (1978) proved that a process can be embedded in Brownian motion if and only if this process is a local semimartingale. Meyer (1971) and Papangelou (1972) independently discovered Knight's (1971) result for point processes. 

Rosinski and Woyczynski (1986) considered 
time changes for integrals over stable Levy 
processes. Kallenberg (1992) considered time 
change representations for stable integrals. 

Levy processes can also be used as a time
change for other Levy processes (subordinators).
Madan and Seneta (1990) introduced the
variance gamma (VG) process (Brownian motion
with drift time changed by a gamma
process). Geman, Madan, and Yor (2001)
considered time changes for Levy processes
(business time). Carr, Geman, Madan, and
Yor (2003) used change of time to introduce
stochastic volatility into a Levy model
to achieve leverage effect and a long-term
skew. Kallsen and Shiryaev (2001) showed
that the Rosinski-Woyczynski-Kallenberg statement
cannot be extended to any other Levy
process but the symmetric α-stable one. Swishchuk
(2004, 2007) applied the change of time method
to options and swaps pricing for Gaussian
models. 3

The General Theory of 
Time Changes 

The general theory of change of time for mar¬ 
tingale and semimartingale theories 4 is well 
known. In this entry we give a brief description 
of the change of time method in the following 
settings: martingales and stochastic differential 
equations. 

Martingale and Semimartingale Settings of 
Change of Time 

Let $(\Omega, \mathcal{F}, P)$ be a given probability space with
a right-continuous filtration $(\mathcal{F}_t)_{t \ge 0}$. Suppose
$M_t$ is a square integrable local continuous
martingale such that $\lim_{t \to +\infty} \langle M \rangle(t) = +\infty$
almost surely (a.s.), where $\tau_t := \inf\{u : \langle M \rangle(u) > t\}$
and $\tilde{\mathcal{F}}_t = \mathcal{F}_{\tau_t}$. Then the time-changed process
$B(t) := M(\tau_t)$ is an $\tilde{\mathcal{F}}_t$-Brownian motion. Also,
$M(t) = B(\langle M \rangle(t))$. Here, $\langle \cdot \rangle$ denotes the predictable
quadratic variation.

If $\varphi_t$ is a change of time process (i.e., any
continuous $\mathcal{F}_t$-adapted process such that $\varphi_0 = 0$,
$t \to \varphi_t$ is strictly increasing and $\lim_{t \to +\infty} \varphi_t = +\infty$ a.s.)
and if $X_t$ is an $\mathcal{F}_t$-adapted semimartingale,
then the process $\hat{X}_t := X_{\tau_t}$ is an
$\hat{\mathcal{F}}_t$-adapted semimartingale, where $\tau_t := \inf\{u : \varphi_u > t\}$
and $\hat{\mathcal{F}}_t := \mathcal{F}_{\tau_t}$. $\hat{X}_t$ is called the time
change of $X_t$ by $\varphi_t$.

Geman, Madan, and Yor (2001) consider pure 
jump Levy processes (which are semimartin¬ 
gales) of finite variation with an infinite arrival 
rate of jumps as models for the logarithm of as¬ 
set prices. These processes also may be written 
as time-changed Brownian motion. Their paper 
exhibits the explicit time change for each of a 
wide class of Levy processes and shows that the 
time change is a weighted price move measure 
of time. 

Stochastic Differential Equations Setting of 
Change of Time 

The change of time method is used to solve the 
following SDE: 

\[ dX_t = a(t, X_t)\,dB(t) \]

with B(t) being a Brownian motion and a(t, x)
being a "good" function of $t \ge 0$ and $x \in R$.
Having solved the equation we can also solve
the general SDE

\[ dX_t = \beta(t, X_t)\,dt + \gamma(t, X_t)\,dB(t) \]

with drift $\beta(t, X_t)$ using the method of transformation
of drift (the Girsanov transformation). 5

Subordinators as Time Changes 

Subordinators 

Feller (1966) introduced a subordinated process
$X_{\tau_t}$ for a Markov process $X_t$ and $\tau_t$ a process
with independent increments; $\tau_t$ was called a
randomized operational time. Increasing Levy
processes can also be used as a time change for
other Levy processes. 6 Levy processes of this
kind are called subordinators. They are very
important ingredients for building Levy-based
models in finance. 7 If $S_t$ is a subordinator, then
its trajectories are almost surely increasing, and
$S_t$ can be interpreted as a "time deformation"
and used to "time change" other Levy processes.
Roughly, if $(X_t)_{t \ge 0}$ is a Levy process and
$(S_t)_{t \ge 0}$ is a subordinator independent of $X_t$, then
the process $(Y_t)_{t \ge 0}$ defined by $Y_t := X_{S_t}$ is a Levy
process. 8 This time scale has the financial interpretation
of business time, 9 that is, the integrated
rate of information arrival.

Subordinators and Stochastic Volatility 

The time change method was used to introduce 
stochastic volatility into a Levy model to 
achieve the leverage effect and a long-term 
skew. 10 In the Bates (1996) model the leverage 
effect and long-term skew were achieved using 
correlated sources of randomness in the price 
process and the instantaneous volatility. The 
sources of randomness are thus required to be 
Brownian motions. In the Barndorff-Nielsen 
et al. (2001, 2002) model the leverage effect 
and long-term skew are generated using the 
same jumps in the price and volatility without 
a requirement for the sources of randomness to 
be Brownian motions. Another way to achieve 
the leverage effect and long-term skew is to 
make the volatility govern the time scale of 
the Levy process driving jumps in the price. 
Carr et al. (2003) suggested the introduction
of stochastic volatility into an exponential-Levy
model via a time change. The generic
model here is $S_t = \exp(X_t) = \exp(Y_{v_t})$, where
$v_t := \int_0^t \sigma_s^2\,ds$. The volatility process should be
positive and mean-reverting (e.g., an Ornstein-Uhlenbeck
or Cox-Ingersoll-Ross process).
Barndorff-Nielsen et al. (2003) reviewed and 
placed in context some of their recent work 
on stochastic volatility models including the 
relationship between subordination and 
stochastic volatility. 

The main difference between the change of
time method and the subordinator method is
that in the former case the change of time process
$\varphi_t$ depends on the process $X_t$, but in the
latter case, the subordinator $S_t$ and Levy process
$X_t$ are independent.

APPLICATIONS OF CHANGE 
OF TIME METHOD 

The change of time method may be applied to
obtain the Black-Scholes formula for GBM and an
explicit option pricing formula for a mean-reverting
asset, and to price swaps in financial models with
stochastic volatility.

Black-Scholes by Change of 
Time Method 

In the early 1970s, Black and Scholes (1973) made a
major breakthrough by deriving a pricing formula
for a vanilla option written on a stock. Their
model and its extensions assume that the probability
distribution of the underlying cash flow
at any given future time is lognormal. There
are many proofs of their result, including the
partial differential equation approach and the
martingale approach. 11

One of the aims of this entry is to give an idea 
of how to get the Black-Scholes result by the 
change of time method. 

An Option Pricing Formula for a 
Mean-Reverting Asset Model Using 
a Change of Time Method 

Some commodity prices, like oil and gas, ex¬ 
hibit mean reversion. This means that they tend 
over time to return to some long-term mean. 
This mean-reverting model is a one-factor ver¬ 
sion of the two-factor model made popular in 
the context of energy modeling by Pilipovic 
(1997). Black's model (1976) and Schwartz's 
model (1997) have become standard tools to 
price options on commodities. These models
have the advantage that they give rise to closed-form
solutions for some types of option. 12 We
note that the recent book by Geman (2005) dis¬ 
cusses hard and soft commodities (that is, en¬ 
ergy, agriculture, and metals) and also presents 
an analysis of economic and geopolitical issues 
in commodities markets. Here, we show how 
to get an explicit option pricing formula for 
a continuous-time GARCH asset price model 
using change of time. 

One of the aims of this entry is to get an 
explicit option pricing formula for a mean- 
reverting asset using change of time method. 

Swaps by Change of Time Method: 
Heston Model 

One of the applications of change of time 
method is to value variance, volatility, covari¬ 
ance, and correlation swaps for Heston's (1993) 
model. Change of time method for pricing of 
different types of swaps for Heston's model 
and pricing of options has been considered in 
Swishchuk (2004, 2007, 2008c). Applications of 
change of time method to Levy-based stochas¬ 
tic volatility models, interest rates, and energy 
derivatives have been considered in Swishchuk 
(2008a, 2008b, 2010a, 2010b). 

In this section, we apply the change of time 
method to get the Black-Scholes formula and to 
obtain an explicit option pricing formula for a 
mean-reverting asset. 

Change of Time Method 

In this section we give a brief description of
the change of time method for martingales
and stochastic differential equations. Throughout
this entry we consider $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ to be a
probability space with a right-continuous filtration
$(\mathcal{F}_t)_{t \ge 0}$.

Change of Time Method in 
Martingale Setting 

In this section, we describe the change of
time method for a martingale $M(t) \in \mathcal{M}_2^{c,\mathrm{loc}}$,
the space of local square integrable continuous
martingales. 13

If $M(t) \in \mathcal{M}_2^{c,\mathrm{loc}}$, $\lim_{t \to +\infty} \langle M \rangle(t) = +\infty$
a.s., $\tau_t := \inf\{u : \langle M \rangle(u) > t\}$ and $\tilde{\mathcal{F}}_t := \mathcal{F}_{\tau_t}$,
then the following process with changed time

\[ W(t) := M(\tau_t) \]

is an $\tilde{\mathcal{F}}_t$-Brownian motion (or standard Wiener
process).

Consequently, we can express a local martingale
M(t) using an $\tilde{\mathcal{F}}_t$-Brownian motion W(t)
and an $\tilde{\mathcal{F}}_t$-stopping time $\langle M \rangle(t)$ (since
$\{\langle M \rangle(t) \le u\} = \{\tau_u \ge t\} \in \mathcal{F}_{\tau_u} = \tilde{\mathcal{F}}_u$):

\[ M(t) = W(\langle M \rangle(t)) \]
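
The representation of a martingale as a Brownian motion run on the clock ⟨M⟩(t) can be checked numerically in a simple case. The sketch below assumes a hypothetical deterministic volatility `sigma(s) = 1 + s` and the martingale M(t) = ∫ sigma(s) dW(s), for which ⟨M⟩(t) = ∫ sigma(s)² ds; if M(t) is a Brownian motion evaluated at time ⟨M⟩(t), the sample variance of M(1) should be close to ⟨M⟩(1) = 7/3. All names and parameters are illustrative assumptions.

```python
import math
import random


def sigma(s):
    # Hypothetical deterministic volatility, chosen only for illustration.
    return 1.0 + s


def quadratic_variation(t, n_steps=1000):
    """<M>(t) = integral of sigma(s)^2 over [0, t], by the midpoint rule."""
    dt = t / n_steps
    return sum(sigma((k + 0.5) * dt) ** 2 for k in range(n_steps)) * dt


def simulate_martingale(t, n_steps, rng):
    """Euler sum approximating M(t) = integral of sigma(s) dW(s)."""
    dt = t / n_steps
    m = 0.0
    for k in range(n_steps):
        m += sigma(k * dt) * rng.gauss(0.0, math.sqrt(dt))
    return m


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(2024)
samples = [simulate_martingale(1.0, 100, rng) for _ in range(4000)]
# Both numbers should be close to 7/3 if M(1) is distributed as W(<M>(1)).
print(sample_variance(samples), quadratic_variation(1.0))
```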

Change of Time Method in a Stochastic 
Differential Equation Setting 

We consider the following generalization of the
previous results to an SDE of the following form
(without a drift)

\[ dX(t) = a(t, X(t))\,dW(t) \]

where W(t) is a Brownian motion and a(t, X) is a
function on $[0, +\infty) \times R$ that is continuous and
measurable in t and X.

The reason we consider this equation is that, once
it is solved, we can solve a more
general equation with a drift $\beta(t, X)$ using the
Girsanov transformation. 14 The following theorem,
due to Ikeda and Watanabe (1981), 15 is used
frequently to find a solution of an SDE using the
change of time method.

Let W(t) be a one-dimensional $\mathcal{F}_t$-Wiener process
with W(0) = 0, given on a probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$, and let X(0) be an $\mathcal{F}_0$-adapted
random variable. Define a continuous process
V = V(t) by

\[ V(t) = X(0) + W(t) \]

Let $\varphi_t$ be the change of time process:

\[ \varphi_t = \int_0^t a^{-2}(\varphi_s, X(0) + W(s))\,ds \]

If

\[ X(t) := V(\varphi_t^{-1}) = X(0) + W(\varphi_t^{-1}) \]

and $\tilde{\mathcal{F}}_t := \mathcal{F}_{\varphi_t^{-1}}$, then there exists an $\tilde{\mathcal{F}}_t$-adapted
Wiener process $\tilde{W} = \tilde{W}(t)$ such that $(X(t), \tilde{W}(t))$
is a solution of the initial equation on the probability
space $(\Omega, \mathcal{F}, \tilde{\mathcal{F}}_t, P)$. 16

We note that the solution of the following SDE

\[ dX(t) = a(X(t))\,dW(t) \]

may be presented in the following form (which
follows from the previous theorem)

\[ X(t) = X(0) + W(\varphi_t^{-1}) \]

where a(X) is a continuous measurable function,
W(t) is a one-dimensional $\mathcal{F}_t$-Wiener process
with W(0) = 0, given on a probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$, and X(0) is an $\mathcal{F}_0$-adapted
random variable. In this case 17

\[ \varphi_t = \int_0^t a^{-2}(X(0) + W(s))\,ds \]

and

\[ \varphi_t^{-1} = \int_0^t a^{2}(X(0) + W(\varphi_s^{-1}))\,ds \]
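
As a quick sanity check of these formulas (a worked special case added for illustration, assuming a constant coefficient a(x) ≡ σ > 0, which is not an example from the entry itself), the time change and its inverse become deterministic:

```latex
\begin{align*}
\varphi_t &= \int_0^t \sigma^{-2}\,ds = \frac{t}{\sigma^{2}}, &
\varphi_t^{-1} &= \int_0^t \sigma^{2}\,ds = \sigma^{2} t, \\
X(t) &= X(0) + W(\sigma^{2} t), &
\operatorname{Var}[X(t) - X(0)] &= \sigma^{2} t .
\end{align*}
```

So X(t) has the same law as X(0) + σW(t), which is what one expects for the equation dX(t) = σ dW(t).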

Examples: Solutions of Some SDEs 18 

1. Solution for Ornstein-Uhlenbeck (OU) Process
Using Change of Time.

Let $S_t$ satisfy the following SDE:

\[ dS_t = -aS_t\,dt + \sigma\,dW_t \]

Then $S_t$ may be presented in the following
form using the change of time method:

\[ S_t = e^{-at}[S_0 + W(\varphi_t^{-1})] \]

where $\varphi_t^{-1}$ satisfies

\[ \varphi_t^{-1} = \sigma^2 \int_0^t (e^{as}(S_0 + W(\varphi_s^{-1})))^2\,ds \]

2. Solution for Vasicek Process Using Change of
Time.

Let $S_t$ satisfy the following SDE:

\[ dS_t = a(b - S_t)\,dt + \sigma\,dW_t \]

Then $S_t$ may be presented in the following form
using the change of time method

\[ S_t = e^{-at}[S_0 - b + W(\varphi_t^{-1})] + b \]

where $\varphi_t^{-1}$ satisfies

\[ \varphi_t^{-1} = \sigma^2 \int_0^t (e^{as}(S_0 - b + W(\varphi_s^{-1})) + b)^2\,ds \]

The above theorem may also be applied to
solve the Cox-Ingersoll-Ross (1985) equation,
mean-reversion equation for commodity price
(Pilipovic, 1997) and geometric Brownian motion
equation (Black-Scholes, 1973). 19

Black-Scholes Formula by Change
of Time Method

Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ be a probability space with a
sample space $\Omega$, σ-algebra of Borel sets $\mathcal{F}$ and
probability P. The filtration $\mathcal{F}_t$, $t \in [0, T]$, is the
natural filtration of a standard Brownian motion
$W_t$, $t \in [0, T]$, and $\mathcal{F}_T = \mathcal{F}$.

Black-Scholes Formula

The well-known Black-Scholes (1973) formula
states that if we have a (B, S)-security market consisting
of a riskless asset B(t) with constant interest rate r

\[ dB(t) = rB(t)\,dt, \quad B(0) > 0, \; r > 0 \qquad (1) \]

and a risky asset (stock) S(t)

\[ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t), \quad S(0) > 0 \qquad (2) \]

where $\mu \in R$ is an appreciation rate and $\sigma > 0$
is a volatility, then the option price formula
for a European call option with pay-off function
f(T) = max(S(T) - K, 0) (K > 0 is a strike price)
has the following form

\[ C(T) = S(0)\Phi(y_+) - e^{-rT}K\Phi(y_-) \qquad (3) \]

where

\[ y_\pm := \frac{\ln(S(0)/K) + (r \pm \sigma^2/2)T}{\sigma\sqrt{T}} \qquad (4) \]

and

\[ \Phi(y) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-x^2/2}\,dx \qquad (5) \]

Solution of SDE for Geometric Brownian
Motion using Change of Time Method

The solution of equation (2) has the following
form:

\[ S(t) = e^{\mu t}(S(0) + W(\varphi_t^{-1})) \qquad (6) \]

where W(t) is a one-dimensional Wiener process,

\[ \varphi_t^{-1} = \sigma^2 \int_0^t [S(0) + W(\varphi_s^{-1})]^2\,ds \]

and

\[ \varphi_t = \sigma^{-2} \int_0^t [S(0) + W(s)]^{-2}\,ds \]

Black-Scholes Formula by Change of Time
Method

In a risk-neutral world the dynamic of the stock
price S(t) has the following form:

\[ dS(t) = rS(t)\,dt + \sigma S(t)\,dW^*(t) \qquad (7) \]

where

\[ W^*(t) := W(t) + \frac{\mu - r}{\sigma}\,t \qquad (8) \]

From (6) we have the solution of equation (7)

\[ S(t) = e^{rt}[S(0) + W^*(\varphi_t^{-1})] \qquad (9) \]

where

\[ W^*(\varphi_t^{-1}) = S(0)(e^{\sigma W^*(t) - \sigma^2 t/2} - 1) \qquad (10) \]

and W*(t) is defined in (8).

Let $E_{P^*}$ be an expectation under the risk-neutral
measure (or martingale measure) P* (i.e., the
process $e^{-rt}S(t)$ is a martingale under the
measure P*).

Then the option pricing formula for a European
call option with payoff function

\[ f(T) = \max[S(T) - K, 0] \]

has the following form

\[ C(T) = e^{-rT}E_{P^*}[f(T)] = e^{-rT}E_{P^*}[\max(S(T) - K, 0)] \qquad (11) \]

Using the change of time method we have the
following representation for the process S(t)
(see (9))

\[ S(t) = e^{rt}[S(0) + W^*(\varphi_t^{-1})] \]

where $W^*(\varphi_t^{-1})$ is defined in (10). From (7)-(11),
after substitution of $W^*(\varphi_t^{-1})$ into (9) and S(T) into
(11), it follows that

\[ C(T) = e^{-rT}E_{P^*}[\max(S(T) - K, 0)] = e^{-rT}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \max[S(0)e^{\sigma u\sqrt{T} + (r - \sigma^2/2)T} - K, 0]\,e^{-u^2/2}\,du \qquad (12) \]

Let $y_0$ be a solution of the following equation

\[ S(0)e^{\sigma y_0\sqrt{T} + (r - \sigma^2/2)T} = K \]

namely,

\[ y_0 = \frac{\ln(K/S(0)) - (r - \sigma^2/2)T}{\sigma\sqrt{T}} \]

Then (12) may be presented in the following
form

\[ C(T) = e^{-rT}\frac{1}{\sqrt{2\pi}}\int_{y_0}^{+\infty} (S(0)e^{\sigma u\sqrt{T} + (r - \sigma^2/2)T} - K)\,e^{-u^2/2}\,du \qquad (13) \]

Finally, straightforward calculation of the integral
on the right-hand side of (13) gives us the
Black-Scholes result: 20

\begin{align*}
C(T) &= \frac{1}{\sqrt{2\pi}}\int_{y_0}^{+\infty} S(0)e^{\sigma u\sqrt{T} - \sigma^2 T/2}\,e^{-u^2/2}\,du - Ke^{-rT}[1 - \Phi(y_0)] \\
&= \frac{S(0)}{\sqrt{2\pi}}\int_{y_0 - \sigma\sqrt{T}}^{+\infty} e^{-u^2/2}\,du - Ke^{-rT}[1 - \Phi(y_0)] \\
&= S(0)[1 - \Phi(y_0 - \sigma\sqrt{T})] - Ke^{-rT}[1 - \Phi(y_0)] \\
&= S(0)\Phi(y_+) - Ke^{-rT}\Phi(y_-) \qquad (14)
\end{align*}

where $y_\pm$ and $\Phi(y)$ are defined in (4) and (5).
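
The agreement between the integral representation (12) and the closed form (3)-(5) can be checked numerically. The sketch below is an illustration only: the function names, the trapezoidal rule, the truncation of the integral at ±10, and the parameter values are all assumptions introduced here, not part of the entry.

```python
import math


def norm_cdf(y):
    """Standard normal distribution function Phi(y), as in (5)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def call_closed_form(s0, k, r, sigma, t):
    """Black-Scholes call price from (3) with y_plus, y_minus from (4)."""
    vol = sigma * math.sqrt(t)
    y_plus = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / vol
    y_minus = (math.log(s0 / k) + (r - 0.5 * sigma ** 2) * t) / vol
    return s0 * norm_cdf(y_plus) - math.exp(-r * t) * k * norm_cdf(y_minus)


def call_by_integration(s0, k, r, sigma, t, u_max=10.0, n=20000):
    """Trapezoidal approximation of the integral in (12) on [-u_max, u_max]."""
    h = 2.0 * u_max / n
    total = 0.0
    for i in range(n + 1):
        u = -u_max + i * h
        s_t = s0 * math.exp(sigma * u * math.sqrt(t) + (r - 0.5 * sigma ** 2) * t)
        f = max(s_t - k, 0.0) * math.exp(-0.5 * u * u)
        # Endpoints carry half weight in the trapezoidal rule.
        total += f if 0 < i < n else 0.5 * f
    return math.exp(-r * t) * total * h / math.sqrt(2.0 * math.pi)


# Illustrative parameters only.
params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
print(call_closed_form(**params), call_by_integration(**params))
```

The two printed values should agree to several decimal places, which is the numerical counterpart of passing from (12) to (14).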

Explicit Option Pricing Formula for 
Mean-Reverting Asset Model 
(MRAM) by Change of Time 
Method 

In this section, we consider a risky asset $S_t$ following
the mean-reverting stochastic process
given by the following stochastic differential
equation

\[ dS_t = a(L - S_t)\,dt + \sigma S_t\,dW_t \qquad (15) \]

where $W_t$ is an $\mathcal{F}_t$-measurable one-dimensional
standard Wiener process, $\sigma > 0$ is the volatility,
the constant L is called the long-term mean of the
process, to which it reverts over time, and a > 0
measures the "strength" of mean reversion. We
find an explicit solution of equation (15) using
the change of time method, give some properties
of the mean-reverting asset $S_t$, and present
an explicit option pricing formula for the
European call option for this mean-reverting asset
model of commodity price.

Explicit Solution of SDE for MRAM

Equation

\[ dS_t = a(L - S_t)\,dt + \sigma S_t\,dW_t \]

in (15) has the following solution

\[ S_t = e^{-at}[S_0 - L + W(\varphi_t^{-1})] + L \]

where W(t) is a one-dimensional Wiener process
and

\[ \varphi_t^{-1} = \sigma^2 \int_0^t (S_0 - L + W(\varphi_s^{-1}) + e^{as}L)^2\,ds \]

which follows from the substitution

\[ V_t := e^{at}(S_t - L) \]

and the theorem above.
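
A simple way to build intuition for equation (15) is to simulate it directly. The following sketch uses an Euler-Maruyama discretization (a generic numerical scheme, not the change of time construction of the entry) and compares the sample mean with E[S_t] = L + (S_0 - L)e^{-at}, which follows from taking expectations in (15). Function names and parameter values are hypothetical.

```python
import math
import random


def simulate_mram(s0, a, long_mean, sigma, t_end, n_steps, rng):
    """Euler-Maruyama path of dS = a(L - S)dt + sigma * S dW; returns S(t_end)."""
    dt = t_end / n_steps
    s = s0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s += a * (long_mean - s) * dt + sigma * s * dw
    return s


def expected_value(s0, a, long_mean, t):
    """E[S_t] = L + (S_0 - L) exp(-a t), from taking expectations in (15)."""
    return long_mean + (s0 - long_mean) * math.exp(-a * t)


rng = random.Random(7)
# Illustrative parameters only; they are not taken from the entry.
s0, a, long_mean, sigma, t_end = 1.5, 1.0, 1.0, 0.2, 1.0
terminal = [simulate_mram(s0, a, long_mean, sigma, t_end, 100, rng)
            for _ in range(2000)]
sample_mean = sum(terminal) / len(terminal)
print(sample_mean, expected_value(s0, a, long_mean, t_end))
```

The sample mean drifts from S_0 toward L at rate a, which is the mean-reversion property described above.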

Explicit Option Pricing Formula for 
European Call Option under 
Risk-Neutral Measure 

In this section, we are going to obtain an explicit 
option pricing formula for a European call op¬ 
tion under risk-neutral measure P* using the 
change of time method. 


Mean-Reverting Risk-Neutral Asset Model 

Consider the model given by (15)

\[ dS_t = a(L - S_t)\,dt + \sigma S_t\,dW_t \]

We want to find a probability P* equivalent to P,
under which the process $e^{-rt}S_t$ is a martingale,
where r > 0 is a constant interest rate.

In a risk-neutral world the model in (15) takes
the following form:

\[ dS_t = a^*(L^* - S_t)\,dt + \sigma S_t\,dW_t^* \qquad (16) \]

where

\[ a^* := a + \lambda\sigma, \qquad L^* := \frac{aL}{a + \lambda\sigma} \qquad (17) \]

and $\lambda \in R$ is a market price of risk, which follows
from the Girsanov theorem. 21

Now, we are going to apply our method of 
changing of time to the model (16) to obtain the 
explicit option pricing formula. 

Explicit Solution for Mean-Reverting 
Risk-Neutral Asset Model 

Applying the above results to our model (16),
we obtain the explicit solution (19) for the
risk-neutral model:

\[ S_t = e^{-a^* t}[S_0 - L^* + W^*((\varphi_t^*)^{-1})] + L^* \qquad (19) \]

where W*(t) is an $\mathcal{F}_t$-measurable standard
one-dimensional Wiener process in (18) under
measure P*, and $(\varphi_t^*)^{-1}$ is the inverse function to $\varphi_t^*$:

\[ \varphi_t^* = \sigma^{-2}\int_0^t (S_0 - L^* + W^*(s) + e^{a^*\varphi_s^*}L^*)^{-2}\,ds \qquad (20) \]

We note that

\[ (\varphi_t^*)^{-1} = \sigma^2 \int_0^t (S_0 - L^* + W^*((\varphi_s^*)^{-1}) + e^{a^* s}L^*)^2\,ds \]

where a* and L* are defined in (17).


Explicit Option Pricing Formula for European 
Call Option under Risk-Neutral Measure 

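The payoff function $f_T$ for the European call
option equals

\[ f_T = (S_T - K)^+ := \max(S_T - K, 0) \]

where $S_T$ is an asset price, T is an expiration
time (maturity), and K is a strike price.

In this way (see (19)),

\begin{align*}
f_T &= [e^{-a^*T}(S_0 - L^* + W^*((\varphi_T^*)^{-1})) + L^* - K]^+ \\
&= \Big[S(0)e^{-a^*T}e^{\sigma W^*(T) - \sigma^2T/2} + aLe^{-a^*T}e^{\sigma W^*(T) - \sigma^2T/2}\int_0^T e^{a^*s}e^{-\sigma W^*(s) + \sigma^2 s/2}\,ds - K\Big]^+ \qquad (21)
\end{align*}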

The explicit option pricing formula for the
European call option under a risk-neutral measure
for the mean-reverting asset S(t) in (21) has the
following form:

\[ C_T^* = e^{-(r+a^*)T}S(0)\Phi(y_+) - e^{-rT}K\Phi(y_-) + L^*e^{-(r+a^*)T}\Big[(e^{a^*T} - 1) - \int_0^{y_0} z\,F_T^*(dz)\Big] \qquad (22) \]

where $y_0$ is the solution of the following
equation

\[ y_0 = \frac{\ln(K/S(0)) + (\sigma^2/2 + a^*)T}{\sigma\sqrt{T}} - \frac{\ln\Big(1 + \frac{a^*L^*}{S(0)}\int_0^T e^{a^*s}e^{-\sigma y_0\sqrt{s} + \sigma^2 s/2}\,ds\Big)}{\sigma\sqrt{T}} \qquad (23) \]

\[ y_+ := \sigma\sqrt{T} - y_0 \quad \text{and} \quad y_- := -y_0, \qquad a^* := a + \lambda\sigma, \quad L^* := \frac{aL}{a + \lambda\sigma} \qquad (24) \]

$\lambda$ is the market price of risk and $F_T^*(dz)$ is the
distribution with characteristic function

= e iHe °’ T ~ l) , i := V^T, AeC 

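This result can be obtained from the following
expression

\begin{align*}
C_T^* &= e^{-rT}E_{P^*}f_T \\
&= e^{-rT}E_{P^*}[e^{-a^*T}(S_0 - L^* + W^*((\varphi_T^*)^{-1})) + L^* - K]^+ \\
&= \frac{1}{\sqrt{2\pi}}e^{-rT}\int_{-\infty}^{+\infty} \max\Big[S(0)e^{-a^*T}e^{\sigma y\sqrt{T} - \sigma^2T/2} + aLe^{-a^*T}e^{\sigma y\sqrt{T} - \sigma^2T/2}\int_0^T e^{a^*s}e^{-\sigma y\sqrt{s} + \sigma^2 s/2}\,ds - K,\, 0\Big]e^{-y^2/2}\,dy
\end{align*}

and above-mentioned results.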


Connection with Black-Scholes Result: L* = 0 
and a* = —r and Black-Scholes formula 
follows! 

If L* = 0 and a* = -r, then we obtain from (22)

\[ C_T = S(0)\Phi(y_+) - e^{-rT}K\Phi(y_-) \qquad (25) \]

where

\[ y_+ := \sigma\sqrt{T} - y_0 \quad \text{and} \quad y_- := -y_0 \qquad (26) \]

and $y_0$ is the solution of the following equation
(see (23))

\[ S(0)e^{rT}e^{\sigma y_0\sqrt{T} - \sigma^2T/2} = K \]

or

\[ y_0 = \frac{\ln(K/S(0)) + (\sigma^2/2 - r)T}{\sigma\sqrt{T}} \qquad (27) \]

But (25)-(27) is exactly the well-known Black-Scholes
result!

In this way, we can see that the option pric¬ 
ing formula in (22) for the mean-reverting asset 
S(f) consists of a Black-Scholes part and an ad¬ 
ditional part due to mean reversion. 

The results of this section may also be used to
model and price variance and volatility swaps
in energy and commodity markets for assets
with stochastic volatility that are described
by a continuous-time mean-reverting GARCH
model; see Swishchuk (2010a).


KEY POINTS 

• The main idea of the change of time method is
to change time from t to a nonnegative process
T(t) with nondecreasing sample paths (e.g.,
a subordinator).

• Many Levy processes may be written as time-changed
Brownian motion.

• Levy processes can also be used as a
time change for other Levy processes (subordinators).

• Change of time can be used to introduce
stochastic volatility or solve many stochastic
differential equations.

• Using change of time, we can get an option
pricing formula, such as the Black-Scholes
formula, for an asset following geometric
Brownian motion.

• Using change of time, we can get an explicit
option pricing formula for an asset following
a mean-reverting process, such as a
continuous-time GARCH process.

NOTES 

1. Swishchuk (2007) and Swishchuk (2008c). 

2. Madan et al. (1990). 

3. Barndorff-Nielsen and Shiryaev (2010) state 
the main ideas and results of the stochas¬ 
tic theory of change of time and change of 
measure. 

4. Ikeda and Watanabe (1981). 

5. Ikeda and Watanabe (1981), Chapter IV, Sec¬ 
tion 4, p. 176. 

6. Applebaum (2004), Barndorff-Nielsen et al.
(2001), Barndorff-Nielsen et al. (2003),
Bertoin (1996), Cont et al. (2004), and
Schoutens (2003).

7. Cont et al. (2004) and Schoutens (2003). 

8. Cont et al. (2004). 

9. Geman et al. (2001).

10. Carr et al. (2003). 

11. Wilmott et al. (1995) and Elliott et al. (1999). 

12. Wilmott (2000). 

13. Ikeda and Watanabe (1981), Theorem 7.2,
Chapter 2.

14. Ikeda and Watanabe (1981), Chapter 4, 
Section 4. 

15. Chapter IV, Theorem 4.3. 

16. The proof of this theorem may be found 
in Ikeda and Watanabe (1981), Chapter IV, 
Theorem 4.3. 

17. Ikeda and Watanabe (1981), Chapter IV, 
Example 4.2. 

18. Swishchuk (2007). 

19. Swishchuk (2007). 

20. Black and Scholes (1973). 

21. Elliott et al. (1999).


The Concept and Measures of Interest
Rate Volatility

ALEXANDER LEVIN, PhD 

Director, Financial Engineering, Andrew Davidson & Co., Inc. 


Abstract: The knowledge of interest rates and cash flows represents the basis for valuation of fixed 
income financial instruments. In reality, not only are future interest rates random, but the future 
cash flows of many securitized investments are also uncertain, as they depend (are "contingent") 
on interest rates. Valuation of rate options and embedded option bonds, including MBS and ABS, 
requires sophisticated models of this randomness. 


In this entry, we introduce the concepts of mar¬ 
ket volatility and discuss how it is measured. 
The dynamics of rates are subject to market 
forces, mean reversion, and combinations of 
diffusions and jumps. 


BASIC DEFINITIONS AND 
FIRST FINDINGS 

We can't tell in advance what interest rates 
will be. Investors may be either enriched or 
bankrupted from sudden changes in interest 
rates. Financial institutions devote considerable 
resources to risk management and hedging. Yet, 
if future interest rates were deterministic, there 
would be no need to hedge. Coping with uncer¬ 
tainty is a central feature of investment markets. 

The pricing of options and embedded-option
instruments utilizes a statistical concept to describe
the magnitude of potential interest rate
changes. The key notion is the volatility of interest
rates. While this term conjures up images
of instability, flares of activity, and unpredictability,
it is actually a very specific description of
the range of possible outcomes. More precisely,
volatility can be defined as the annualized standard
deviation of a rate's daily increments.
Table 1 provides an example for yields on the
10-year Treasury measured over 10 consecutive
business days. As part of the measurement,
we will be taking a daily time series and then
transforming it into "absolute returns" and "relative
returns," much like measuring portfolio
performance.

Table 1 Example of Volatility Calculations

Date         Rate      Absolute     Relative
                       Increments   Increments
03-Jun-02    5.03234
04-Jun-02    5.00343   -0.0289      -0.0057
05-Jun-02    5.04900    0.0456       0.0091
06-Jun-02    5.01176   -0.0372      -0.0074
07-Jun-02    5.06165    0.0499       0.0100
10-Jun-02    5.03885   -0.0228      -0.0045
11-Jun-02    4.97500   -0.0639      -0.0127
12-Jun-02    4.95004   -0.0250      -0.0050
13-Jun-02    4.90280   -0.0472      -0.0095
14-Jun-02    4.80276   -0.1000      -0.0204

The absolute rate changes are computed by
taking the difference between the interest rates
on successive days. The relative changes are
computed by dividing the absolute change by
the starting rate. For example, for the first day
the absolute change is 5.00343 - 5.03234 =
-0.0289. The relative increment is -0.0289/
5.03234 = -0.0057. In order to calculate the
daily volatility, we just take the standard deviation
of the daily absolute and relative change
series. In the example above, the standard
deviations are 0.048 (absolute increments) and
0.00966 (relative increments). The former number
is the standard deviation for daily absolute
increments; the latter number represents that of
the daily relative changes. To compute volatility,
we place these daily measures on an annual
basis by scaling by the square root of the number
of trading days in the year (approximately 260):

Relative Volatility
= Daily Standard Relative Deviation × √260
= 0.00966 × √260 = 0.1557

Absolute Volatility
= Daily Standard Absolute Deviation × √260
= 0.0480 × √260 = 0.773

Thus, in our example of the 10-day yield se¬ 
ries, we would calculate the annual volatility as 
77.3 basis points (absolute) or 15.57% (relative). 
The relative volatility times the average yield 
for the period 0.1557 x 4.983 = 0.776 is close 
to the absolute yield volatility of 0.773—as one 
would expect. 
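
The calculation above can be reproduced in a few lines. The sketch below is an illustration, assuming the sample standard deviation (divisor n - 1), which is the convention that matches the figures quoted in the text; the variable names are our own.

```python
import math
import statistics

# 10-year Treasury yields from Table 1 (percent), 03-Jun-02 to 14-Jun-02.
rates = [5.03234, 5.00343, 5.04900, 5.01176, 5.06165,
         5.03885, 4.97500, 4.95004, 4.90280, 4.80276]

TRADING_DAYS = 260  # approximate number of trading days per year

# Absolute increments: difference between successive rates.
absolute = [b - a for a, b in zip(rates, rates[1:])]
# Relative increments: absolute change divided by the starting rate.
relative = [(b - a) / a for a, b in zip(rates, rates[1:])]

# Sample standard deviations of the two daily series.
daily_abs = statistics.stdev(absolute)
daily_rel = statistics.stdev(relative)

# Annualize with the square-root rule.
absolute_vol = daily_abs * math.sqrt(TRADING_DAYS)
relative_vol = daily_rel * math.sqrt(TRADING_DAYS)

print(f"daily:  abs={daily_abs:.4f}  rel={daily_rel:.5f}")
print(f"annual: abs={absolute_vol:.3f}  rel={relative_vol:.4f}")
```

The annualized values should come out close to the 0.773 and 0.1557 reported in the text.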

The second relevant clarification concerns the
naive understanding of volatility as the
annual standard deviation. Volatility measures
only the pace of uncertainty; this concept does
not assume that the daily-measured volatility remains
constant over time, just as, when driving
in traffic with starts and stops, there is a difference
between instantaneous speed and
average velocity. Third, an important assumption
for annualizing the daily volatilities is that
the daily increments are serially independent. If
there is a relationship between rate changes on
one day and another day, then we say there is serial
correlation. The "square-root rule" will not
be an accurate measure of the annual volatility
if there is serial correlation in the random process.
Figure 1 illustrates that volatility has been
volatile.

In the 1980s, both volatility measures exhib¬ 
ited instability, although the relative one ap¬ 
peared to be much more stable. However, since 
the 1990s, the absolute volatility measure has 
become more stable, oscillating around 1% (100 
basis points). Based on these observations, it is 
not hard to understand why during different 
time periods, the relative volatility was mov¬ 
ing inversely, and the absolute volatility di¬ 
rectly, with respect to the rate level. Aside from 
the explicit level-related effect, both volatil¬ 
ity measures seemingly synchronously react to 
economic disturbances. Pricing in the interest 
rate options market reflects these important 
findings. 

Different points of the yield curve have dif¬ 
fering volatility, too. This observation suggests 
that not only do the rates have a "term struc¬ 
ture," but their volatility has a term structure as 
well. A hump shape of such a volatility curve 
is often observed (see Figure 2). It can be at¬ 
tributed to (1) absence of change in the short 
rates unless regulators take actions and (2) the 
dampening force of the mean reversion. We will 
explain both factors further in the entry. 

A DIFFUSIVE MODEL FOR 
RANDOMNESS 

Can we describe the randomness mathemati¬ 
cally? It is perhaps simpler than it sounds. In 
fact, having become acquainted with volatility, 
we did most of the task. A general diffusive 
model for an interest rate process that describes 
how interest rates will vary over time, r(t), will 
have the following form: 

dr = (Drift)dt + (Volatility)dz    (1)

Figure 1 History of Volatility for the 10-year Treasury Yield

What does this mean? The notations dr and dt refer
to small increments measured over an infinitesimally
short time. The variable dz represents small
changes in z(t), which is called Brownian motion,
also known as the Wiener process. It is the
source of randomness. We cannot control the
exact value of this variable. Drift and volatility
describe how the changes in rates are related
to changes in time and the random variable dz.
Mathematical model (1) can be thought of in the
following way: The change in interest rate over
a small time period is the result of a number
representing systematic drift times the amount
of time change plus a random shock scaled by
the amount of volatility.

Δr = (Drift)(Passage of time) + (Volatility) × (Random shock)

Figure 2 Historical Volatility Term Structure for the Swap Rates (horizontal axis: maturity in months; history: 1st quarter of 2002)

A Brief Excursion to Brownian 
Motion 

Brownian motion: 

* Is continuous 

* Is normally distributed 

* Has a zero mean ("centered") 

* Has time increments that are serially indepen¬ 
dent 

* Has its own volatility scaled to 1 

Therefore, z(t) has a standard deviation of √t
(for the same reason that a square root appears
in the annualization of daily volatility). Any
particular function z(t) is said to be a "realization"
or a "sample path" of the Brownian
motion. The Brownian motion, therefore, can
be thought of as a container of random paths
subject to the conditions described above. Figure 3
depicts a sample path and the single- and
double-standard deviation zones.
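
The √t growth of the standard deviation can be illustrated by simulation. The sketch below builds sample paths from independent normal steps and compares the sample standard deviation of z(t) with √t; the function names, path counts, and seed are illustrative assumptions.

```python
import math
import random


def brownian_path(t_end, n_steps, rng):
    """Return z(0), z(dt), ..., z(t_end) built from independent normal steps."""
    dt = t_end / n_steps
    z = [0.0]
    for _ in range(n_steps):
        # Each increment has mean zero and standard deviation sqrt(dt).
        z.append(z[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return z


def terminal_stdev(t_end, n_paths=2000, n_steps=50, seed=1):
    """Sample standard deviation of z(t_end) across simulated paths."""
    rng = random.Random(seed)
    ends = [brownian_path(t_end, n_steps, rng)[-1] for _ in range(n_paths)]
    mean = sum(ends) / n_paths
    return math.sqrt(sum((e - mean) ** 2 for e in ends) / (n_paths - 1))


for t in (1.0, 4.0):
    # The two numbers on each line should be close to each other.
    print(t, terminal_stdev(t), math.sqrt(t))
```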


With the use of volatility multiples, we can 
scale the rate process to any volatility level. The 
drift variable simulates a systematic, nonran¬ 
dom tendency. For example, it can model a cen¬ 
tral tendency function known as mean reversion. 
Equation (1) is called a stochastic differential
equation. Both multiples, drift and volatility, do
not have to be constants. They can be functions
of time, t, and rate, r. Any particular specification
of drift(t,r) and volatility(t,r) leads to a
specific rate model, but not necessarily a good 
one. At this stage, it is enough to understand 
that a good model can be a strong quantita¬ 
tive pricing tool. Although we cannot know 
what the random variable z(t) is going to do, 
we, at least, can simulate its behavior with a 
large number of random scenarios. The Monte 
Carlo method draws on this idea. On the other 
hand, we may be able to do some intelligent 
analytical work making the brute-force simu¬ 
lations unnecessary. We could also make sure 
that the model is consistent with ("calibrated 
to") prices of widely traded interest rate instru¬ 
ments; then we will feel more confident apply¬ 
ing it to the exotic options or the market for 
mortgage-backed securities (MBSs) and asset- 
backed securities (ABSs). 
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Figure 3 Brownian Motion's Sample Path with Deviation Zones 


MEAN REVERSION AND 
MARKET STABILITY 

Consider the following special form of equation
(1): $dr = 5r\,dt + \sigma\,dz$. Can this equation model
an actual interest rate? Since the formula shows
that the change in interest rate increases with
changes in time, the average rate will continue
to grow. Utilizing calculus, we note that the solution
to this equation will not only grow with
time, but will grow exponentially, as it contains
an $e^{5t}$ term. Since interest rates cannot increase
exponentially forever (at least, they never have),
we need to dismiss this formula as inappropriate
for the job.

How about $dr = \sigma\,dz$? The drift is chosen to be
zero, and provided that the initial value r(0)
is known, the process will randomly evolve
around this value, on average. Whether the initial
rate is high or low, the model will stay centered
around it. The standard deviation, as we
already know, will grow as $\sqrt{t}$. This may not be
a very good thing either. A century from now,
the magnitude of the standard deviation will be
huge, at ten times annual volatility. Figure 1B
demonstrated that interest rates tend to stay
within a range.


Both models briefly reviewed above suffer
from the same disease: They are unstable. Observable
objects in economics, finance, engineering,
or physics tend to be stable; otherwise, they
would not exist long enough to be observed.
The feature making financial markets
stable is known as mean reversion. It is simply
a properly chosen specification of the drift
term that ensures the dampening effect
(also known as central tendency). If the rate has
randomly grown too high or fallen too low, the
drift term will help return it. Here is an
example:

$$dr = a(r_\infty - r)\,dt + \sigma\,dz \qquad (2)$$

where mean reversion parameter a > 0. This
time, the solution will contain $e^{-at}$, a decaying
component that indicates stability. The mean
converges to parameter $r_\infty$, the long-term equilibrium
(now we see the point of this strange
notation). The standard deviation will grow
with time as

$$\sigma\sqrt{\left(1 - e^{-2at}\right)/(2a)}$$

and converges to

$$\sigma/\sqrt{2a}$$

as the horizon extends. Figure 4 compares the
standard deviation (lines launching from the
origin) and equivalent ("average") volatility for
different levels of mean reversion, including
zero ($\sigma$ = 1% was assumed).

Figure 4 Standard Deviation and Average Volatility for Different Values of a

Mean reversion, therefore, stabilizes the market.
It also explains why volatility is typically
measured on a daily basis; in the presence
of mean reversion, the average volatility measured
over a time horizon generally depends on
this horizon. For example, if a = 10%, only 95%
of actual volatility is seen in annual increments.
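
The 95% figure can be checked directly from the standard deviation formula for model (2). The sketch below assumes that "average volatility" over a horizon t means the standard deviation at t divided by sqrt(t); the function names are illustrative only.

```python
import math


def vasicek_std(sigma, a, t):
    """Standard deviation of r(t) under dr = a(r_inf - r)dt + sigma dz."""
    if a == 0.0:
        return sigma * math.sqrt(t)  # no mean reversion: sqrt(t) growth
    return sigma * math.sqrt((1.0 - math.exp(-2.0 * a * t)) / (2.0 * a))


def average_volatility(sigma, a, t):
    """Equivalent per-sqrt-year volatility seen over horizon t (assumed definition)."""
    return vasicek_std(sigma, a, t) / math.sqrt(t)


if __name__ == "__main__":
    sigma, a = 0.01, 0.10
    ratio = average_volatility(sigma, a, 1.0) / sigma
    print(f"share of volatility seen in annual increments: {ratio:.3f}")
    print(f"long-run std limit sigma/sqrt(2a): {sigma / math.sqrt(2.0 * a):.5f}")
```

With a = 10% and a one-year horizon the ratio is about 0.952, in line with the statement above.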

If r(t) in (2) is the short market rate, then every
other rate (the 5-year, the 10-year, etc.) can
be derived as a function of it. In particular, for
mean-reverting models, volatility of long rates
should eventually fade with maturity of the
rate, and it does happen, as seen in Figure 2.
Mathematically, it is a direct consequence of the
mean reversion: The short rate's uncertainty gets
limited going forward, thereby making long-term
bonds less volatile now. Economically, discount
rates for very remote cash flows should
be almost certain; otherwise their present values
would be infinitely risky.

Does it seem that model (2) makes sense?
Well, Vasicek (1977) noticed it; it was one of the
first interest rate models. It has been popular and
important since and was a basis for many of the
models that are used today.


THE RATE DISTRIBUTION 

Equation (2) is a linear stochastic differential
equation disturbed by a Brownian motion. The
math tells us that the output of this equation,
rate r(t), is going to be normally distributed. Although
this makes the model tractable, negative
rates are not precluded, which may or may
not be a problem. Arguably, the actual rates
should stay positive; at least, they almost always
have been. When using process (2), the odds
of negative rates grow with future time, as the
present value falls. In addition, mortgages and
related securities are amortizing and may have
small balances and cash flows years from now.
Levin (2004) provided quantitative support
for the use of the normal distribution by showing
that it does not distort options' value materially.

Figure 5 Jumpy and Continuous Interest Rates

The fact that a Brownian motion is normally
distributed does not require that the rate process
be such. For example, considering the exponential
transformation R = exp(r) and using R
as the rate, rather than r, we ensure that the rate
remains positive. Such a process is said to have
a lognormal distribution. The mean and standard
deviation of this known distribution are explicitly
stated through the mean $\mu$ and the standard
deviation $\sigma$ of the original variable, r(t):

$$E(R) = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right)$$

$$\mathrm{std}(R) = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right)\sqrt{\exp(\sigma^2) - 1}$$

Another popular example is the squared
transformation, $R = r^2$, that also guarantees that
rate R stays positive; the distribution of the rate
so defined is known as noncentral $\chi^2$. For the
squared transformation,

$$E(R) = \mu^2 + \sigma^2$$

$$\mathrm{std}(R) = \sigma\sqrt{2\sigma^2 + 4\mu^2}$$
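
As a sanity check on the four moment formulas, the sketch below compares them with a direct numerical integration against the normal density. The helper `normal_expectation` (a plain trapezoid rule on a wide grid) and the sample values of the mean and standard deviation are assumptions made for illustration.

```python
import math


def normal_expectation(g, mu, sigma, n=20001, width=12.0):
    """Trapezoid-rule estimate of E[g(r)] for r ~ N(mu, sigma^2)."""
    h = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -width + i * h  # standardized grid point
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * g(mu + sigma * x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def lognormal_moments(mu, sigma):
    """Mean and std of R = exp(r)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return mean, mean * math.sqrt(math.exp(sigma ** 2) - 1.0)


def squared_moments(mu, sigma):
    """Mean and std of R = r^2."""
    return mu ** 2 + sigma ** 2, sigma * math.sqrt(2.0 * sigma ** 2 + 4.0 * mu ** 2)


if __name__ == "__main__":
    mu, sigma = 0.2, 0.5  # hypothetical inputs
    m1 = normal_expectation(math.exp, mu, sigma)
    m2 = normal_expectation(lambda r: math.exp(2.0 * r), mu, sigma)
    print("lognormal:", lognormal_moments(mu, sigma), (m1, math.sqrt(m2 - m1 * m1)))
    s1 = normal_expectation(lambda r: r * r, mu, sigma)
    s2 = normal_expectation(lambda r: r ** 4, mu, sigma)
    print("squared:  ", squared_moments(mu, sigma), (s1, math.sqrt(s2 - s1 * s1)))
```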

INTEREST RATE JUMPS 

Stochastic differential equation (1) is not the
most general mathematical form of a random
process. It applies only to diffusions, that is, continuous
random processes. Stochastic calculus
considers many other forms of randomness; an
important one is called random jumps or random
events. To appreciate this type of randomness,
let us consider the history of three rates,
the 1-month London Interbank Offered Rate
(LIBOR) ("LIBOR 01"), the 2-year swap rate ("LIBOR 24"),
and the 10-year swap rate ("LIBOR
120"), depicted in Figure 5.

Both swap rates change almost continuously,
day after day, and little by little, at times
randomly oscillating, in response to market
forces. The 1-month rate was changing in
a suspiciously smooth fashion between sudden
jumps, featuring apparent and prolonged
plateaus that are not seen for the swap rates.
For example, in 2002 to 2004, it had been barely
changing for a while, and then plunged in response
to the Fed's actions. Furthermore,
whereas the visual dynamics for all three rates
seem to resemble one another, statistical measurements
of correlation between daily increments
overwhelmingly reject such a conclusion.
For this 18-year-long history (over 4,500 observations)
we computed a small 7% correlation
between daily increments of the 1-month and
the 2-year rates, and an even smaller 4% correlation
between 1-month and 10-year rates. What
if we measure correlations between increments
in monthly averages (216 observations), thereby
filtering out the disparity between daily dynamics?
Then the 7% goes way up to 46% and
the 4% to 24%, and the interrate correlations
continue to improve steadily as we extend the
averaging period (Table 2).

Table 2 Empirical Correlations between Periodic Increments, 1992-2010 History

                     Daily   Monthly   Quarterly   Semiannually   Annually
2-year to 1-month    7%      46%       65%         81%            90%
10-year to 1-month   4%      24%       40%         51%            71%
Total observations   4,527   216       71          35             17

These objective facts suggest that a stochastic
diffusive model suitable for swap rates may
not be perfectly appropriate for short rates. A
random jump component may be necessary to
explain the actual short-rate behavior and associated
option pricing. One popular mathematical
form of jumps is the Poisson process.
It is simply a random occurrence of events described
by a single parameter $\lambda$ called frequency
or intensity. The average number of events
occurring during a time interval of length t is equal to
$\lambda t$; curiously enough, the variance of this number
is equal to $\lambda t$ too. The probability that we will
have exactly j events during this time interval
is equal to $e^{-\lambda t}(\lambda t)^j / j!$. It is only the period's
length that really matters, not when the period
starts; for this reason, the Poisson process can't
be used to describe, say, human deaths or bulb
failures, when the attained age is a strong factor.
However, it is plausible to assume that
Poisson jumps describe some events in financial
markets.
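
The Poisson statements above (probability of exactly j events, and mean and variance both equal to the intensity times the interval length) are easy to confirm numerically. The function names and the sample intensity below are illustrative assumptions.

```python
import math


def poisson_probability(j, lam, t):
    """Probability of exactly j events in an interval of length t."""
    mean = lam * t
    return math.exp(-mean) * mean ** j / math.factorial(j)


def poisson_mean_and_variance(lam, t, max_events=100):
    """Mean and variance of the event count, summed over 0..max_events."""
    probs = [poisson_probability(j, lam, t) for j in range(max_events + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    variance = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return mean, variance


if __name__ == "__main__":
    lam, t = 2.0, 1.5  # hypothetical: two events per year, 18-month window
    print(poisson_mean_and_variance(lam, t))  # both close to lam * t = 3
```

The probabilities depend on the interval only through its length t, which is the memoryless property noted in the text.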

Aside from the jumps' arrival, the size of
jumps can also be random. Merton (1976) introduced
an option-pricing model in which the underlying
process includes Poisson jumps with
normally distributed magnitude. Using mathematical
notation, we can express the model
as

$$dr = (\text{Drift})\,dt + (\text{Volatility})\,dz + (\text{Jump Volatility})\,dN \qquad (3)$$

where N is the Poisson-Merton jump variable.
When a jump occurs, dN is drawn from the standard
normal distribution N[0,1]; it stays 0 otherwise.
In less strict notation,

$$\Delta r = (\text{Drift})(\text{Passage of time}) + (\text{Volatility})(\text{Random shock}) + (\text{Jump volatility})(\text{Random jump})$$

The practical difference between random
shock and random jump is that, for a small
time interval, the former is small, but nonzero,
whereas the latter is mostly zero and rarely finite.
Hence, equation (3) describes a more general
stochastic process combining diffusion and
jumps ("jump-diffusion"). Notably, the mathematical
variance of the Poisson process N(t) is also
proportional to the time horizon t. This fact allows
aligning interpretations of $\sigma_d$ = Volatility
and $\sigma_j$ = Jump Volatility: for very small t, the
standard deviation of r(t) is equal to

$$\sqrt{t\left(\sigma_d^2 + \lambda\sigma_j^2\right)}$$

meaning that the mixed volatility will be simply

$$\sigma = \sqrt{\sigma_d^2 + \lambda\sigma_j^2}$$

Furthermore, if we generalize the linear mean-reverting
Vasicek model given by (2) by adding
a jump term, then expressions for the mean and
the standard deviation of r(t) won't change; it
will be enough to replace $\sigma$ with this mixed volatility.
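
A small simulation can illustrate how diffusion volatility and jump volatility combine. The sketch assumes zero drift, a Poisson count of jumps over the horizon, and standard normal jump sizes scaled by the jump volatility; under those assumptions the variance of the increment is the horizon times (diffusion volatility squared plus intensity times jump volatility squared). All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw a Poisson count by multiplying uniforms (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def jump_diffusion_increment(sigma_d, sigma_j, lam, t, rng):
    """Zero-drift change in r over horizon t: diffusion part plus jump part."""
    diffusion = sigma_d * math.sqrt(t) * rng.gauss(0.0, 1.0)
    n_jumps = sample_poisson(lam * t, rng)
    jumps = sigma_j * sum(rng.gauss(0.0, 1.0) for _ in range(n_jumps))
    return diffusion + jumps


def mixed_volatility(sigma_d, sigma_j, lam):
    """Combined annualized volatility of the jump-diffusion."""
    return math.sqrt(sigma_d ** 2 + lam * sigma_j ** 2)


if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary seed
    t = 0.25
    draws = [jump_diffusion_increment(0.01, 0.02, 2.0, t, rng) for _ in range(40000)]
    print("sample std:   ", statistics.pstdev(draws))
    print("formula value:", mixed_volatility(0.01, 0.02, 2.0) * math.sqrt(t))
```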

At first, it is tempting to interpret a jump-diffusion
process as diffusion with another
volatility scale. In reality, probability distributions
of these two stochastic patterns are
different. Inclusion of jumps "fattens" the distribution's
tail (see Table 3) and is much more suitable
for modeling and pricing rare events like
a corporation's default or credit downgrade, a
financial crisis, reaching a very remote option's
strike, or a change in the short rate over a fairly
short horizon.

Table 3 Comparison of Distribution's Tails for Poisson-Merton Jump Processes

Value of λt         Prob(r < μ − 4σ)   Prob(r < μ − 3σ)   Prob(r < μ − 2σ)   Prob(r < μ − 1σ)
0.2                 0.789%             1.777%             3.505%             6.022%
1                   0.158%             0.753%             3.357%             12.568%
5                   0.027%             0.303%             2.559%             14.732%
Infinite (normal)   0.003%             0.135%             2.275%             15.866%

Models with Poisson-Merton processes converge
to normal when the value of λt is
large (frequent jumps are similar to diffusion),
but may produce significantly different results
when it is small.
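
The tail probabilities in Table 3 can be approximated with a short calculation. The sketch below rests on an assumption that is not spelled out in the text: the process is a pure jump process (no diffusion part), the number of jumps is Poisson with mean λt, each jump is normal, and the jump size is scaled so that the total standard deviation equals σ. Conditional on n jumps the deviation is then normal with variance proportional to n. Spot checks against the λt = 0.2 and λt = 1 rows agree with the table at the printed precision.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def jump_tail_probability(lam_t, k, max_jumps=200):
    """P(r < mu - k*sigma) for a pure Poisson-Merton jump process.

    Assumes total standard deviation sigma, so that given n jumps the
    standardized threshold is -k * sqrt(lam_t / n). With zero jumps the
    rate does not move and contributes nothing to the tail (k > 0).
    Intended for moderate lam_t (the truncation at max_jumps is an assumption).
    """
    prob = 0.0
    weight = math.exp(-lam_t)  # P(N = 0)
    for n in range(1, max_jumps + 1):
        weight *= lam_t / n  # P(N = n)
        prob += weight * normal_cdf(-k * math.sqrt(lam_t / n))
    return prob


if __name__ == "__main__":
    for lam_t in (0.2, 1.0, 5.0):
        row = [100.0 * jump_tail_probability(lam_t, k) for k in (4, 3, 2, 1)]
        print(lam_t, ["%.3f%%" % p for p in row])
    print("normal", ["%.3f%%" % (100.0 * normal_cdf(-k)) for k in (4, 3, 2, 1)])
```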

Stochastic differential equations (1) and (3)
can be viewed as building blocks for interest
rate modeling. Some models used today in the
financial industry are multifactor, with the short
rate r(t) defined not as the solution to equations
(1) or (3), but as their sum. When modeling
LIBOR, the jump arrivals need not be
Poissonian, nor does the magnitude have to be normal.
For example, Chan et al. (2003) developed
a model with rate jumps timed to periodic Fed
meetings, and the magnitude being a random
multiple of 25 bps. There exist other modeling
views of interest rate dynamics that we don't
cover in this entry, including continuous randomness
with stochastic volatility levels; see
James and Webber (2000) for a comprehensive
overview.


KEY POINTS 

* The most common way of simulating interest
rates' uncertainty is employing stochastic
differential equations containing drift and
volatility terms.

* In older times, absolute volatility was directionally
related to the rate's level; this relationship
has gradually disappeared.

* The drift term must contain a mean reversion,
that is, a stabilization force.

* Actual short rates (LIBOR) have been historically
jumpy and require adding random
jumps to diffusions.

Abstract: Market randomness makes the fair value of a financial instrument an expectation. It also
requires a rigorous quantification of the dynamics of interest rates; that is, a well-defined interest 
rate model. Prices of interest rate options and options embedded in bonds such as corporate or 
agency callable debts, mortgage-backed securities, and asset-backed securities will firmly depend 
on this modeling work. Contemporary interest rate models employ the available information about 
currently observed forward rates and vanilla European options and are "calibrated" to them. The 
relationships between bond rates should preclude arbitrage. Some analytically tractable models 
ensure these properties explicitly. Selecting the "best" term structure model is becoming more 
a conscientious task and less a matter of taste. Measuring "volatility skew" for widely traded 
swaptions is a simple technique that yields rich results. Another method is computing volatility 
indexes produced by different models and tracking their stability. Recent trading history confirms 
normalization of the swaption market making the Hull-White model, the extended Cox-Ingersoll-Ross
model, or the squared Gaussian model more attractive than formerly popular lognormal
models. Single-factor models cannot value accurately curve options or some exotic derivatives that 
are exposed to the yield curve shape and require multifactor modeling work. The affine theory offers 
a systematic method of constructing such models. It also allows for jump-diffusion extensions that 
may be necessary to explain volatility smile; that is, an excessive convexity of the Black volatility 
as a function of strike. 


This entry introduces a family of models for
the stochastic behavior of interest rates and the
principles of their design widely used by market
participants.


THE CONCEPT OF 
SHORT-RATE MODELING 

Why do we call interest rate models term-structure
models? Aren't there too many rates
for one model? The tree-based valuation examples
found in many books and research
papers show us that we can value a bond of any
maturity and thereby reconstruct the entire
term structure using only the dynamics of the
one-period rate (see, for example, Davidson
et al., 2003 [Chapter 12], and Fabozzi, 1994).
Interest rate models operating with the short
(one-period) rate r(t) as their main object are
commonly referred to as "short-rate models."
They are different by construction from so-called
"forward rate models," such as the
Heath-Jarrow-Morton model (Heath, Jarrow,
and Morton, 1992) or the Brace-Gatarek-Musiela
model (Brace, Gatarek, and Musiela,
1997). Both types of interest rate modeling are
designed to solve the same problems and are
widely used for valuation of fixed income options
and embedded-option bonds, but operate
with different objects. Unlike the short-rate
modeling family, forward rate models employ
and randomly evolve the entire forward curve
of the short rate, f(t,T), in which t is time
and T is the forward time to which the short
rate applies.

We restrict our attention solely to the short- 
rate modeling. This term does not assume that 
any short-rate term structure model is a one- 
factor model or depends only on the short rate. 


Formula (1) allows us to compute any-maturity
zero-coupon rates via some expectation
involving random behavior of the short
rate. Of course, once we establish the entire
zero-coupon curve, we can restore a yield for
any other bond, including a coupon-paying one.
To compute the expectation in (1), we must
know two things: the stochastic equation (or equations)
for $r(\tau)$ and the initial (time t) conditions. The
latter represents public information about the
market at time t and includes every factor affecting
the short rate. Therefore, it would be correct
to state that an any-maturity rate can be recovered
using only factors that determine the evolution
of the short rate. In particular, if only one
Brownian motion drives the short rate dynamics,
it will define the entire yield curve as well.


The Arbitrage-Free Interrate 
Relationship 

Let us assume that we have a stochastic process,
possibly multifactor, describing the short rate
dynamics r(t). Let us denote by $P_T(t)$ the market
price observed at time t of a T-maturity zero-coupon
bond; that is, a bond paying $1 at t + T.
This price is exponential in the yield to maturity
("rate") $r_T(t)$ of this bond: $P_T(t) = \exp[-r_T(t)T]$.
However, we can use the arbitrage argument
claiming that, once prices of instruments reflect
rate expectations and risks, there should
exist no advantage or disadvantage in investing
in the zero-coupon bond over continuous
reinvesting into the short rate. Hence, the same
price should be equal to

$$P_T(t) = E\left\{\exp\left[-\int_t^{t+T} r(\tau)\,d\tau\right]\right\}$$

where E denotes the arbitrage-free expectation.
Equating these two expressions, we get

$$r_T(t) = -\frac{1}{T}\ln E\left\{\exp\left[-\int_t^{t+T} r(\tau)\,d\tau\right]\right\} \qquad (1)$$

Consistency with the Initial 
Yield Curve 

Let us apply the interrate relationship (1) to the
initial point of time, t = 0:

$$r_T(0) = -\frac{1}{T}\ln E\left\{\exp\left[-\int_0^{T} r(\tau)\,d\tau\right]\right\} \qquad (2)$$

The left-hand side of this formula is known
from today's term structure of interest rates.
Hence, the short rate dynamics r(t) must be
such as to ensure that (2) holds. In practical terms,
adjusting a rate process to fit the initial yield
curve is part of a more general task often termed
"calibration." Without this necessary step, an
interest rate model can't be used to value even
simple, option-free bonds. Computation of the
expectation in formulas (1) and (2) can be done
numerically or, in some models, analytically.

Consistency with European 
Option Values 

If a term structure model is built to value complex
derivative instruments, it must value, at
minimum, simple European options. Suppose
we have an option that is exercised at a future
point of time t and generates a cash flow that
we denote g[r(t)]; that is, some nonlinear function
of the short rate observed at t. Note that the
actual option's exercise may be triggered by a
long, rather than the short, rate; nevertheless, it
will depend either on r(t) (single-factor models)
or all market factors (multifactor models) known
at t. The value of the option is going to be

$$\text{Option value} = E\left\{g[r(t)]\exp\left[-\int_0^{t} r(\tau)\,d\tau\right]\right\} \qquad (3)$$

where E denotes the same expectation as before.

We may now demand that the short rate
process r(t) produces option values (3)
that match market prices. Most commonly,
term structure models are calibrated to
LIBOR caps, or European options on swaps
(swaptions), or both. These are standard,
widely traded European options. For example,
a call option on a T-maturity swap will generate
cash flow equal to $g[r(t)] = A_T(t)[K - c_T(t)]^+$,
where A denotes annuity, c denotes the swap
rate, both measured at t, and superscript "+"
indicates that only a positive value is taken.
Another standard derivative is the LIBOR cap
made of "caplets," that is, European calls on
some relatively short rate. A T-maturity LIBOR
caplet (T = 3 months for standard caps) expiring
at t pays $[r_T(t) - K]^+$ at t + T. To recognize
the time difference T between the caplet's expiry
and the actual pay, we can move the payoff
from t + T to t and express it as $g[r(t)] =
[r_T(t) - K]^+/[1 + T r_T(t)]$. We then have to make
sure that formula (3) yields correct values for
the caplets. Note that the cap market does not
usually quote caplets directly; however, their
values can be assessed by bootstrapping.
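
The two payoff functions described above are simple enough to write down directly. The sketch follows the expressions in the text as stated (no accrual factor is applied to the caplet payoff); the function names and sample numbers are illustrative assumptions.

```python
def swaption_payoff(annuity, strike, swap_rate):
    """Cash flow A * [K - c]^+ of the option on a swap, measured at expiry t."""
    return annuity * max(strike - swap_rate, 0.0)


def caplet_payoff_at_expiry(rate, strike, tenor):
    """Caplet payoff [r_T - K]^+ due at t + T, moved back to expiry t.

    Dividing by (1 + T * r_T) discounts the payment over the tenor T
    at the simple rate r_T observed at expiry.
    """
    return max(rate - strike, 0.0) / (1.0 + tenor * rate)


if __name__ == "__main__":
    # Hypothetical inputs: 3-month rate at 6%, strike 5%.
    print(caplet_payoff_at_expiry(0.06, 0.05, 0.25))
    # Hypothetical inputs: annuity 4.5, strike 5%, swap rate 4%.
    print(swaption_payoff(4.5, 0.05, 0.04))
```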


SINGLE-FACTOR 
SHORT-RATE MODELS 

In this section, we describe several different
single-factor models, which employ the
short rate as the only factor. We also give some
evidence on the relative performance of the
models. For each of the models, we emphasize
three key aspects: the model's formulation, its
arbitrage-free calibration, and the interrate relationship
that recovers the entire term structure
contingent on the dynamics of the short rate.

The Hull-White/Vasicek Model 

The Hull-White (HW) model (Hull and White,
1994) describes the dynamics of the short rate
r(t) in the form given by

$$dr = a(t)\,(\theta(t) - r)\,dt + \sigma(t)\,dz \qquad (4)$$

Here, a(t) denotes mean reversion and $\sigma(t)$ stands
for volatility; both can be time-dependent.
Function $\theta(t)$ is sometimes referred to as
"arbitrage-free" drift. This terminology is
caused by the fact that, by selecting proper $\theta(t)$,
we can match any observed yield curve. The
HW model was preceded by the Vasicek model
having $\theta(t)$ = 0. The short rate is normally distributed
in this model, so the volatility represents
absolute rather than relative changes.

This can be seen mathematically, as (4) is a
linear equation disturbed by the Brownian motion
(a normally distributed variable); the short
rate is normally distributed as well. Therefore,
its integral is normally distributed too, and the
expectation found in the right-hand side of formulas
(1), (2), and, in some cases, (3) can be computed
in a closed form. Without going through
the math, we provide here the analytical calibration
results to the observed short forward curve
f(t) for the constant-parameter case:

$$\theta(t) = f(t) + \frac{1}{a}\frac{df(t)}{dt} + \frac{\sigma^2}{2a^2}\left(1 - e^{-2at}\right) \qquad (5)$$

The short rate's expectation is found as

$$E[r(t)] = f(t) + \frac{\sigma^2}{2a^2}\left(1 - e^{-at}\right)^2 \qquad (6)$$

The last term in (6) is called the convexity adjustment;
that is, the difference between mathematically
expected short rates in the future and
the forward short rates. This adjustment is proportional
to volatility squared; for zero mean
reversion, it is simply equal to $\tfrac{1}{2}\sigma^2 t^2$. It is therefore
up to financial engineers to make sure the
convexity adjustment is properly implemented
in a pricing system; it is very volatility sensitive.
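
The convexity adjustment in (6) is a one-line function, and evaluating it shows how strongly it depends on volatility. The sketch below handles the zero-mean-reversion case separately, using the limit of the last term of (6) as a goes to 0, which is one half of volatility squared times t squared. The function name and the sample inputs are assumptions.

```python
import math


def hw_convexity_adjustment(sigma, a, t):
    """Last term of (6): E[r(t)] - f(t) for the constant-parameter HW model."""
    if abs(a) < 1e-8:
        return 0.5 * sigma ** 2 * t ** 2  # limit of the formula as a -> 0
    decay = -math.expm1(-a * t)  # 1 - exp(-a t), computed stably
    return sigma ** 2 / (2.0 * a ** 2) * decay ** 2


if __name__ == "__main__":
    for sigma in (0.005, 0.010, 0.020):  # hypothetical volatility levels
        adj = hw_convexity_adjustment(sigma, 0.05, 10.0)
        print(f"sigma={sigma:.3f}: 10-year adjustment = {1e4 * adj:.1f} bps")
```

Doubling volatility quadruples the adjustment, which is the volatility sensitivity the text warns about.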

The expected value for any long, T-maturity,
zero-coupon rate is proven to be in the
same form: forward rate + convexity adjustment.
This time, the exact formula for this
relation is

$$E[r_T(t)] = f_T(t) + \frac{\sigma^2}{4a^3T}\left(1 - e^{-aT}\right)\left[2\left(1 - e^{-at}\right)^2 + \left(1 - e^{-2at}\right)\left(1 - e^{-aT}\right)\right] \qquad (7)$$

Any long zero-coupon rate is normally distributed
too and proven to be linear in the short
rate; deviations from their respective mean levels
are related as

$$\frac{\Delta r_T}{\Delta r} \equiv \frac{r_T(t) - E[r_T(t)]}{r(t) - E[r(t)]} = \frac{1 - e^{-aT}}{aT} \equiv B_T \qquad (8)$$

The function $B_T$ of maturity T plays an important
role in the HW model. It helps, for example,
to link the short-rate volatility to the long-rate one
and explicitly calibrate it to the market. If a = 0,
this function becomes identical to 1, regardless
of the maturity T. This important special case allows
for a pure parallel change in the entire
curve (every point moves by the same amount).
This particular specification can be suitable for
standardized risk measurement tests.
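
The slope function in (8) can be tabulated to see how mean reversion dampens long-rate volatility. This is an illustrative sketch: the names `hw_slope` and `long_rate_volatility` and the parameter values are assumptions.

```python
import math


def hw_slope(a, maturity):
    """B_T = (1 - exp(-a T)) / (a T); equals 1 when a = 0 (parallel shifts)."""
    x = a * maturity
    if abs(x) < 1e-12:
        return 1.0
    return -math.expm1(-x) / x


def long_rate_volatility(short_vol, a, maturity):
    """Volatility of the T-maturity zero rate implied by relation (8)."""
    return short_vol * hw_slope(a, maturity)


if __name__ == "__main__":
    short_vol = 0.01  # hypothetical 1% absolute short-rate volatility
    for a in (0.0, 0.05, 0.20):
        vols = [long_rate_volatility(short_vol, a, T) for T in (1, 5, 10, 30)]
        print(f"a={a:.2f}:", ["%.4f" % v for v in vols])
```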

The HW model is a very tractable arbitrage-free
model, which allows for the use of
analytical solutions as well as Monte Carlo simulation.
The volatility $\sigma$ and mean reversion a
can be analytically calibrated to European options
on zero-coupon bonds. Most commonly,
the HW model is calibrated to either a set of
short-rate options (LIBOR caps) or swaptions.
In the latter case, very good approximations can
be constructed (see Levin, 2001; Musiela and
Rutkowski, 2000). The model's chief drawback
is that it can produce negative interest rates. However,
with mean reversion, the effect of negative
rates is reduced. The rate history of the 1990s
and 2000s supports this type of formulation of
a term structure model.


The Cox-Ingersoll-Ross Model 

The Cox-Ingersoll-Ross model (CIR model) is
a unique example of a model supported by
general equilibrium arguments (see Cox,
Ingersoll, and Ross, 1985). CIR argued that
fixed income investment opportunities should
be dominated by neither the expected return
(the rate) nor the risk. The latter was associated
with the return variance, thus suggesting
that volatility squared should be of the same
magnitude as the rate:

$$dr = a(t)\,(\theta(t) - r)\,dt + \sigma(t)\sqrt{r}\,dz \qquad (9)$$


Equation (9) is actually a no-arbitrage extension
of the "original CIR" that allows fitting
the initial rate and volatility curves. Since the
volatility term is proportional to the square root
of the short rate, the latter is meant to remain
positive. The extended CIR model is analytically
tractable, but to a lesser extent than the
HW model. Perhaps the most important result
of CIR is that the long zero-coupon rates are also
proven linear in the short rate, in line with (8).
However, the slope function now has a quite
different form; it depends on both maturity T
and time t and is found as $B_T(t) = -b(t, t+T)/T$.
Function b(t,T) used in this expression solves
a Riccati-type differential equation, considered
for any fixed maturity T:

$$\frac{db(t,T)}{dt} = a(t)\,b(t,T) - \tfrac{1}{2}\sigma^2(t)\,b^2(t,T) + 1 \qquad (10)$$

subject to terminal condition b(T,T) = 0.

If the mean reversion a and "CIR volatility" $\sigma$
are constant (the "original CIR"), equation (10)
allows for an explicit solution. In this case, b(t,T)
is a function of T − t only, and $B_T$ turns out
to be time-independent:

$$B_T = \frac{2\left(e^{\gamma T} - 1\right)}{(\gamma T + aT)\left(e^{\gamma T} - 1\right) + 2\gamma T} \qquad (11)$$

where $\gamma = \sqrt{a^2 + 2\sigma^2}$.

Without a mean reversion, this formula reduces
to a more concise

$$B_T = \frac{\tanh(\gamma T/2)}{\gamma T/2}$$

Note that this ratio is always less than 1. This
means that the long rates are less volatile than
the short one, even without a mean reversion.
This is in contrast to the HW model where, with
a = 0, the yield curve would experience a strictly
parallel reaction to a short rate shock.
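
The closed-form slope (11) can be cross-checked against a direct numerical solution of the Riccati equation (10) with constant parameters. The sketch below integrates (10) backward from the terminal condition with a classical fourth-order Runge-Kutta scheme; the solver choice, function names, and parameter values are assumptions made for illustration.

```python
import math


def cir_slope_closed_form(a, sigma, maturity):
    """B_T from formula (11), with gamma = sqrt(a^2 + 2 sigma^2)."""
    gamma = math.sqrt(a * a + 2.0 * sigma * sigma)
    growth = math.expm1(gamma * maturity)  # exp(gamma T) - 1
    return 2.0 * growth / ((gamma + a) * maturity * growth + 2.0 * gamma * maturity)


def cir_slope_riccati(a, sigma, maturity, n_steps=2000):
    """B_T = -b(0, T) / T, with b found by integrating (10) backward in time."""
    def rhs(b):
        return a * b - 0.5 * sigma * sigma * b * b + 1.0

    h = maturity / n_steps
    b = 0.0  # terminal condition b(T, T) = 0
    for _ in range(n_steps):
        # Runge-Kutta step of size -h (moving from t to t - h).
        k1 = rhs(b)
        k2 = rhs(b - 0.5 * h * k1)
        k3 = rhs(b - 0.5 * h * k2)
        k4 = rhs(b - h * k3)
        b -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return -b / maturity


if __name__ == "__main__":
    a, sigma, T = 0.10, 0.08, 10.0  # hypothetical constant parameters
    print("closed form (11):", cir_slope_closed_form(a, sigma, T))
    print("numerical (10):  ", cir_slope_riccati(a, sigma, T))
    gamma = sigma * math.sqrt(2.0)
    print("a = 0, tanh form:", math.tanh(gamma * T / 2.0) / (gamma * T / 2.0))
```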

Generally speaking, calibration to the currently
observed short forward curve f(T) cannot
be done as elegantly and explicitly as in the
HW model. Once the b(t,T) function is found,
the calibrating function $\theta(t)$ satisfies an integral
equation:

$$-f(T) = \int_0^T \frac{\partial b(t,T)}{\partial T}\,\theta(t)\,a(t)\,dt + \frac{\partial b(0,T)}{\partial T}\,r_0 \qquad (12)$$

Numerical methods, well developed for integral
equations, should be employed.

It is established that all zero-coupon rates,
under the CIR model, have noncentral $\chi^2$ distributions
and remain positive. Economic rationale,
nonnegative rates, and analytic tractability
have made the CIR model deservedly popular;
it is one of the most attractive and useful interest
rate models. It is also consistent with the
Japanese market and some periods of the U.S.
rate history when rates were very low.

The Squared Gaussian Model 

To describe the squared Gaussian model (SqG
model, also known as the quadratic model),
we employ a linear differential equation (4) only
to define an auxiliary variable x(t); we then define
the short rate in the form of its square:

$$dx = -a(t)\,x\,dt + \sigma(t)\,dz, \qquad r(t) = [R(t) + x(t)]^2 \qquad (13)$$

For convenience, we removed the previously
used arbitrage-free function $\theta(t)$ from the first
equation and introduced a deterministic calibrating
function R(t) to the second equation
serving the same purpose. Note that we could
have introduced the HW model similarly by
defining the short rate as r(t) = R(t) + x(t). Ito's
lemma allows us to convert model (13) to a single
stochastic differential equation for the short
rate:

$$dr = \left[2R'\sqrt{r} - 2a\left(r - R\sqrt{r}\right) + \sigma^2\right]dt + 2\sigma\sqrt{r}\,dz \qquad (14)$$

where R' stands for dR/dt. The SqG model has
an apparent similarity to the CIR model in that
its volatility term is proportional to the square
root of the short rate, too. However, comparing
stochastic equations (14) and (9) we see that
they have different drift terms.

The SqG model has been studied by Beaglehole
and Tenney (1991), Jamshidian (1996), and
Pelsser (1997), among others. The most notable
fact established for the SqG model is that any
zero-coupon rate $r_T(t)$ is quadratic in x(t), that is,
linear in the short rate r(t) and its square root:

$$(T - t)\,r_T(t) = A(t,T) - B(t,T)\sqrt{r(t)} - C(t,T)\,r(t) \qquad (15)$$

Functions A, B, and C satisfy a system of
ordinary differential equations:

$$A' = BR' + \sigma^2\left(\tfrac{1}{2}B^2 + C\right) + aRB \qquad (16a)$$

with A(T,T) = 0

$$B' = aB - 2CR' - 2aCR - 2\sigma^2 BC \qquad (16b)$$

with B(T,T) = 0

$$C' = 1 + 2aC - 2\sigma^2 C^2 \qquad (16c)$$

with C(T,T) = 0

where, for brevity, A' and the like denote
derivatives with respect to time t, and the dependence
of all functions on t and T is omitted.
Note that all the terminal conditions are set to
zero. Indeed, once t is equal to T, both sides
of the relationship (15) must become zero for
any value of r; this is possible if and only if
functions A, B, and C turn to zero. Much like
in the CIR model, equation (16c) for the linear
term's slope, this time denoted via C, is of a Riccati
type (see Boyle, Tian, and Guan, 2002) and
can be solved in a closed form. In fact, it
is identical to the already solved equation (10) except
that it operates with a doubled mean reversion
and a doubled volatility. Other equations in (16)
and calibration to the initial yield curve can be
solved numerically.

The short rate has a noncentral $\chi^2$ distribution
with 1 degree of freedom. Long rates are mixtures
of normal and $\chi^2$ deviates. Like the CIR
model, the SqG model ensures positive rates;
the square-root specification of volatility is suitable
for many options. Due to some analytical
tractability and the known form for long rates, the
volatility function and mean reversion can be
quite accurately calibrated to traded options.


The Black-Karasinski Model 

Once a very popular model, the Black-Karasinski
model (BK model) expresses the
short rate as r(t) = R(t)exp[x(t)], where, as in
the previous case, random process x(t) is normally
distributed (see Black and Karasinski,
1991). The short rate is, therefore, lognormally
distributed. Assuming the same process for x(t),
we can write the stochastic differential equation
for the short rate as

$$dr = r\left(\frac{R'}{R} + \frac{1}{2}\sigma^2 - a\ln\frac{r}{R}\right)dt + r\sigma\,dz \qquad (17)$$

The rate's absolute volatility is therefore proportional
to the rate's level. Although the entire
short-rate distribution is known (including
the mean and variance), no closed-form
pricing solution is available. This is because
the cumulative discount rate, the integral of r,
has an unknown distribution. Traditionally, the
BK model is implemented on a tree. Calibration
to the yield curve and volatility curve can
be done using purely numeric procedures. For
example, one could iterate to find R(t) period
by period until all the coupon bonds or
zero-coupon bonds (used as input) are priced
exactly. Alternatively, one could find approximate
formulas and build a faster, but approximate,
scheme.

Despite its past popularity, the BK model's
main assumption, the rate's lognormality, is not
supported by the recent rate history. The volatility
parameter $\sigma$ entering the BK model is not
the same as the Black volatility typically quoted
for swaptions or LIBOR caps. For example, selecting
$\sigma$ = 0.15, a = 0 does not ensure 15%
volatility even for European options on short
rates (caplets). Hence, calibration of the model
to volatilities found in the option market is not
an easy task.


The Flesaker-Hughston Model 

The Flesaker-Hughston model (FH) is an interesting model because it is different
from all previously described ones in that it allows for computing the coupon
rates analytically (see Flesaker and Hughston, 1996). The model starts with
defining a random process M(t), which is any martingale starting from 1, and two
deterministic positive functions A(t) and B(t), decreasing with time t. Then, at
any point of time t, a zero-coupon bond maturing at T has its price in a rational
functional form of M(t):

$$P(t,T) = \frac{A(T) + B(T)M(t)}{A(t) + B(t)M(t)} \qquad (18)$$


Taking the natural logarithm of this expression, changing the sign, and dividing
it by T - t gives us, of course, the zero-coupon rate. In order to derive a
coupon rate c(t,T), let us recall that a coupon-bearing bond generates periodic
payments at a rate of c and returns the principal amount ($1) at maturity. Let us
denote the time-t value of this bond as $P^c(t,T)$:

$$P^c(t,T) = \sum_{i=1}^{n} c\,P(t,t_i) + P(t,T)$$

where $t_i$ are the timings of coupon payments, with $t_n = T$. To express the par
coupon rate c, let us equate this $P^c(t,T)$ to 1 and substitute the postulated
expression (18) for all discount factors:

$$c(t,T) = \frac{A(t) - A(T) + [B(t) - B(T)]M(t)}{\sum_{i=1}^{n}\left[A(t_i) + B(t_i)M(t)\right]}, \qquad r(t) = -\frac{A'(t) + B'(t)M(t)}{A(t) + B(t)M(t)} \qquad (19)$$


Hence, all coupon rates and the short rate are also rational functions of M(t).
If we select a positive martingale process M(t), for example, a lognormal one,
$dM = \sigma M\,dz$, then all rates will stay positive. Functions A(t) and B(t)
can fit the initial term structure of rates and volatilities. (See Flesaker and
Hughston, 1996, or James and Webber, 2000, for additional details.)
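The rational form makes these quantities easy to evaluate. The sketch below is one way to code the zero-coupon price, the par coupon, and the short rate for a given value of M(t). The exponential choices for A(t) and B(t) are hypothetical, picked only because they are positive and decreasing, and all function names are ours.

```python
import math


def fh_zero_price(A, B, M_t, t, T):
    """Zero-coupon price P(t,T) as a rational function of M(t)."""
    return (A(T) + B(T) * M_t) / (A(t) + B(t) * M_t)


def fh_par_coupon(A, B, M_t, t, pay_times):
    """Per-period par coupon for payment dates t_1 < ... < t_n = T."""
    T = pay_times[-1]
    numerator = A(t) - A(T) + (B(t) - B(T)) * M_t
    denominator = sum(A(ti) + B(ti) * M_t for ti in pay_times)
    return numerator / denominator


def fh_short_rate(A, dA, B, dB, M_t, t):
    """Short rate; dA and dB are the time derivatives of A and B."""
    return -(dA(t) + dB(t) * M_t) / (A(t) + B(t) * M_t)


# Hypothetical positive, decreasing functions (not calibrated to any market).
def A_demo(t):
    return math.exp(-0.03 * t)


def dA_demo(t):
    return -0.03 * math.exp(-0.03 * t)


def B_demo(t):
    return math.exp(-0.06 * t)


def dB_demo(t):
    return -0.06 * math.exp(-0.06 * t)


if __name__ == "__main__":
    pay_times = [2.0, 3.0, 4.0, 5.0, 6.0]
    c = fh_par_coupon(A_demo, B_demo, 1.3, 1.0, pay_times)
    print(c, fh_short_rate(A_demo, dA_demo, B_demo, dB_demo, 1.3, 1.0))
```

With these particular A and B, the short rate is a weighted average of 3% and 6%, so it stays positive for any positive M(t).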


Other Single-Factor Models 

There are a fair number of "named" models not mentioned in this entry thus far.
They differ in specifications of drift and volatility functions. They include
the Ho-Lee model, the Black-Derman-Toy model, and the Brennan-Schwartz model. We
will briefly review some of them.

A predecessor to the HW model, the Ho-Lee model (HL model) was offered as a
discrete-time, arbitrage-free model (see Ho and Lee, 1986). Its continuous
version is equivalent to the HW model with zero mean reversion. Hence, all
analytical statements made for the HW model are valid for the HL model.

The Black-Derman-Toy model (BDT model) is a lognormal short-rate model with
endogenously defined mean reversion term equal to $\sigma'(t)/\sigma(t)$ (see
Black, Derman, and Toy, 1990). This specification means that a constant
volatility leads to a zero mean reversion; a growing short-rate volatility
function $\sigma(t)$ causes a negative mean reversion, thereby destabilizing the
process. Once very popular in the financial industry, BDT was replaced by the BK
model; both of these models are now recognized as outdated.

The Brennan-Schwartz model is a proportional volatility, mean-reverting,
short-rate model (see Brennan and Schwartz, 1979). Introduced in 1979 as an
equilibrium model, it has some similarity in its volatility specification to
lognormal models; however, rates are not lognormally distributed.

Calibration Issues 

The Vasicek model and the original Cox-Ingersoll-Ross model laid the foundation
of term structure modeling. Despite their unquestionable historical importance,
traders almost never employ them today. The reason is fairly simple: Built with
constant parameters, these models can't be calibrated to the market accurately
enough. The extensions, known as the Hull-White ("extended Vasicek") model and
the extended CIR model, allow for selecting time-dependent functions $a(t)$,
$\sigma(t)$, and $\theta(t)$ so that the model produces exact or very close
prices for a large set of widely traded fixed income instruments, ranging from
option-free bonds (or swaps) to European ("vanilla") options on them and more.
In particular, function $\theta(t)$ [or R(t)] is normally selected to fit the
entire option-free yield curve as formula (5) demonstrates. In contrast,
functions $a(t)$ and $\sigma(t)$ are usually found to match prices of European
options. For example, using just a pair of constants $(a, \sigma)$ one can match
exactly the prices of two options, for example, 1-year swaptions on the 2-year
swap and on the 10-year swap. Clearly, we can match many more expiration points
if we make $a(t)$ and $\sigma(t)$ time dependent. In some systems, the volatility
function is allowed to be time dependent, but mean reversion remains a positive
constant. This way, one can fit the options' expiration curve only on average,
but the model remains stable and robust. Note that a negative mean reversion may
destabilize the dynamic process.

As we pointed out, single-factor models possess various degrees of analytical
tractability. When using the HW model, a large portion of calibration work can
be done analytically, starting from formula (5). The CIR model and the SqG model
are somewhat analytical, but, practically speaking, require numerical solutions
to ordinary differential equations. The BK model has no known solution at all. A
lack of analytical tractability doesn't preclude using numerical methods or
efficient analytical approximations that are beyond this entry.

Single-factor models can't be calibrated to all market instruments. For example,
each of the models we have considered thus far creates a certain dependence of a
European option's value (hence the implied volatility) on the option's strike,
known as volatility skew. Once a model is selected, luckily or not (see the next
section), the skew implied by it cannot be changed by the model's parameters.
Another problem is that all rates are perfectly correlated in any single-factor
model. Hence, none of them can replicate values of "spread options" or "curve
options," that is, special derivatives that are exercised when the yield curve
flattens or steepens. The solution may lie in using multifactor models as
discussed further in this entry.

WHICH MODEL IS BETTER? 

The HW model, the CIR model, the SqG model, and the BK model are special cases
of a more general class of "CEV models" introduced in the 1980s:

$$dr = (\text{Drift})\,dt + \sigma r^{\gamma}\,dz \qquad (20)$$

Parameter $\gamma$ is called constant elasticity of variance (CEV). For
$\gamma = 0$ we may have the HW model; for $\gamma = 0.5$, the CIR model or the
SqG model; for $\gamma = 1$, the BK model. There exist no specific economic
arguments supporting the $r^{\gamma}$ functional form for volatility. Often, the
CEV constant lies between 0 and 1, but it is not necessary.
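To see how the CEV constant changes the behavior of the rate, one can simulate the CEV equation with a simple Euler scheme. The sketch below is only an illustration: the drift function, the step size, and the flooring of the rate at zero inside the volatility term are our assumptions, not part of any of the named models.

```python
import math
import random


def cev_euler_path(r0, drift, sigma, gamma, dt, n_steps, seed=0):
    """Euler discretization of dr = drift(r) dt + sigma * r**gamma dz."""
    rng = random.Random(seed)
    r = r0
    path = [r]
    for _ in range(n_steps):
        dz = rng.gauss(0.0, math.sqrt(dt))
        # max(r, 0) keeps r**gamma real if an Euler step overshoots below zero
        r = r + drift(r) * dt + sigma * max(r, 0.0) ** gamma * dz
        path.append(r)
    return path


if __name__ == "__main__":
    # Hypothetical mean-reverting drift toward 5%
    drift = lambda r: 0.1 * (0.05 - r)
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, cev_euler_path(0.05, drift, 0.01, gamma, 1.0 / 252, 5)[-1])
```

With the same seed, the first step of a gamma = 1 path moves by r0 times the move of a gamma = 0 path, which is the "absolute volatility proportional to the rate's level" property in discrete form.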

Measuring Volatility Skew 

Blyth and Uglum (1999) linked the CEV constant to the volatility skew, that is,
the dependence of the Black volatility (also called implied volatility) on the
option's strike, found in the swaption market. They argue that market
participants should track the Black volatility according to the following simple
formula:

$$\sigma_K = \sigma_F \left(\frac{F}{K}\right)^{\frac{1-\gamma}{2}} \qquad (21)$$

where $\sigma_K$ is the Black volatility for the option struck at K, and
$\sigma_F$ is the Black volatility for the "at-the-money" option struck at
today's forward rate, F. Importantly, one can recover the best CEV constant to
use in the model by simply measuring the observed skew.
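Recovering the CEV constant from a quoted skew is a one-line inversion. The sketch below assumes the commonly cited Blyth-Uglum approximation, in which the Black volatility at strike K equals the ATM Black volatility times (F/K) raised to the power (1 - gamma)/2; the function names and the sample numbers are hypothetical.

```python
import math


def black_vol_at_strike(sigma_atm, forward, strike, gamma):
    """Black volatility at a strike under the assumed skew approximation."""
    return sigma_atm * (forward / strike) ** ((1.0 - gamma) / 2.0)


def implied_cev(sigma_k, sigma_atm, forward, strike):
    """Invert the skew approximation for gamma from one off-ATM quote."""
    if strike == forward:
        raise ValueError("strike must differ from the forward to identify gamma")
    return 1.0 - 2.0 * math.log(sigma_k / sigma_atm) / math.log(forward / strike)


if __name__ == "__main__":
    # Hypothetical quotes: 5% forward, 20% ATM Black volatility
    vol_otm = black_vol_at_strike(0.20, 0.05, 0.06, 0.14)
    print(vol_otm, implied_cev(vol_otm, 0.20, 0.05, 0.06))
```

In practice one would fit gamma to several strikes at once (for example, by least squares) rather than to a single quote, since the smile effect distorts individual points.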

The skew measured for the 5-year option on the 10-year swap quoted for the
period of 1998 to 2004 suggests $\gamma = 0.14$ being optimal, on average. This
means that the most suitable model lies between the HW model and the CIR/SqG
model (Figure 1). It is also seen that low-struck options are traded with a
close-to-normal volatility, while high-struck options are traded with a
square-root volatility profile. This fact may be a combination of the "smile"
effect discussed at the end of this entry and the broker commission demand. As
shown a little further, the square-root volatility specification becomes very
suitable in a low-rate environment.

The most recent tendency has been clearly toward $\gamma = 0$, that is,
normality (Figure 2), thereby making the HW model the best single-factor model
choice currently. Note that neither the rate history of the 20-year period from
1991 to 2010, nor the available swaption volatility skew data support
lognormality, although earlier rate history did appear to support $\gamma > 1$.

Using the Volatility Index 

To compare rate models, it is useful to design a market volatility index, a
single number reflecting the overall level of option volatility deemed relevant
to the interest rate market. Levin (2004) describes a method of constructing
such an index by first designating a family of at-the-money (ATM) swaptions
("surface"); that is, options on swaps struck exactly at the current forward
rate. Then, assuming zero mean reversion, one can optimize for the single
short-rate volatility constant $\sigma$ (volatility index) best matching the
swaptions' volatility surface, on average. This measure is model-specific;
unlike some other volatility indexes, it is not a simple average of swaption
volatilities. The internal analytics of each model, exact or approximate, are
used to translate the short rate volatility constant into swaption volatilities
used for calibration. Note that this constant-volatility, zero mean reversion
setup is employed only to define the index; it is not a recommended setup for
pricing complex instruments.

Figure 1 Implied Volatility Skew on 5-Year-into-10-Year Swap (1998-2004 Average)
Source of actual volatility: Bank of America; volatility for 200 bps ITM/OTM was not quoted.

Figure 3 depicts the history of three volatility indexes (sigmas) computed from
the beginning of 2000 for the HW model, the BK model, and the squared Gaussian
model. Each index is calibrated to the same family of equally weighted ATM
swaptions on the 2-year swap and the 10-year swap with expirations ranging from
6 months to 10 years. We add for comparison a line for the 7-year rate level,
and scale all four lines so that they start from 1.0.

Figure 2 Historical CEV Values

Figure 3 Which Volatility Index Is Most Stable? (Vertical axis: deviation from the starting point.)

Figure 3 strongly confirms the normalization of the interest rate market; the
volatility index constructed for the HW model has gradually become the most
stable one. For example, the swap rate plunged a good 60% between January 2000
and June 2003, but the HW volatility index barely changed. The two other models
produced volatility indexes that looked mirror-reflective of the rate level (the
lognormal model does by far the worst job). A similar observation applies to the
2007-2010 period.

Interestingly enough, the SqG index was stable for most of 2003 and could handle
the record-setting rate plunge. This confirms that the square root volatility
pattern may outperform others when the rates are very low. These findings are
consistent with the swaption skew measures we have discussed. This is not a
coincidence at all. People who set the market for the ATM swaptions are the same
ones who trade out-of- and in-the-money options.

In the sections to follow we will discuss how 
to extend the short-rate modeling framework to 
multifactor models and jump-diffusion models, 
which are often constructed in so-called affine 
analytical form. 

ADDING A SECOND FACTOR 
TO SHORT-RATE MODELS 

Let us consider a fixed income instrument that pays floating coupons indexed to
some short rate (such as the 3-month LIBOR). The payer does not want to pay too
much in case the curve inverts, so a cap is established equal to the level of
some long, say 10-year, rate. How much is this cap worth? Practically speaking,
the curve's inversion is not so rare a phenomenon of the fixed income market.
However, if the initial curve is steep, we will greatly undervalue the cap using
any of the single-factor models described above. This example highlights the
limitation of single-factor modeling: All rates change in unison. Instruments
that contain "curve options," that is, asymmetric response to a curve's twist or
butterfly moves, cannot be valued using single-factor term structures. Much more
complex examples requiring multifactor modeling include American or Bermudan
options and certain collateralized mortgage obligations (CMOs) that are much
shorter or longer than the collateral itself.

Mathematically, a two-factor normal model can be constructed in a fairly simple
way. Suppose that, instead of having one auxiliary Gaussian variable x(t), we
have two, $x_1(t)$ and $x_2(t)$, that follow linear stochastic differential
equations:

$$dx_1 = -a_1(t)\,x_1\,dt + \sigma_1(t)\,dz_1$$
$$dx_2 = -a_2(t)\,x_2\,dt + \sigma_2(t)\,dz_2 \qquad (22)$$

Brownian motions $z_1(t)$ and $z_2(t)$ may have correlated increments,
$\mathrm{corr}[dz_1, dz_2] = \rho$. Let us assume that $\rho$ is equal to neither
+1 nor -1, and mean reversions $a_1(t)$ and $a_2(t)$ are positive and not
identical to one another. These conditions ensure that the system (22) is stable
and cannot be reduced to single-factor diffusion.

We now define the short rate simply as $r(t) = R(t) + x_1(t) + x_2(t)$ where
deterministic function R(t) is chosen to fit the initial yield curve. The short
rate will be normally distributed; it can be shown that such a model possesses
analytical tractability similar to the Hull-White single-factor model, see Levin
(1998). In particular, the calibrating function R(t) can be computed in a closed
form given the forward curve, f(t). The long zero-coupon rates are linear in
$x_1(t)$ and $x_2(t)$,

$$r_T(t) = A(t,T) + B_{1T}(t)\,x_1(t) + B_{2T}(t)\,x_2(t)$$

Functions B depend on time t only if the mean reversions a do. If the a's are
constant, then the B's depend only on maturity T and have a familiar form:
$B_{iT} = (1 - e^{-a_i T})/(a_i T)$, i = 1 or 2.
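For constant mean reversions, the factor loadings of the long rate are simple to compute. The sketch below evaluates the loading (1 - exp(-a T))/(a T) for one factor and assembles a long zero-coupon rate from the two factors; the function names and the intercept argument are our own illustrative choices.

```python
import math


def rate_loading(a, T):
    """Loading of a T-maturity zero rate on a factor with constant mean reversion a."""
    if T <= 0.0:
        raise ValueError("maturity T must be positive")
    if a == 0.0:
        return 1.0  # limit of (1 - exp(-a*T)) / (a*T) as a -> 0
    # expm1 avoids cancellation error when a*T is small
    return -math.expm1(-a * T) / (a * T)


def long_zero_rate(intercept, x1, x2, a1, a2, T):
    """Long zero rate: intercept A(t,T) plus loadings times the two factors."""
    return intercept + rate_loading(a1, T) * x1 + rate_loading(a2, T) * x2


if __name__ == "__main__":
    # Hypothetical mean reversions: a fast factor (0.5) and a slow one (0.02)
    for T in (1.0, 5.0, 10.0, 30.0):
        print(T, rate_loading(0.5, T), rate_loading(0.02, T))
```

The printout shows why the two mean reversions must differ: the fast factor barely reaches long maturities, while the slow one moves the whole curve almost in parallel.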

The normal deviates, $x_1(t)$ and $x_2(t)$, bear no financial meaning. However,
we can complement the short rate with an independent "slope" variable,
$v = x_1 + \beta x_2$ with

$$\beta = -\frac{\sigma_1(\sigma_1 + \rho\sigma_2)}{\sigma_2(\sigma_2 + \rho\sigma_1)} \neq 1$$

The new variable has increments dv mathematically uncorrelated to dr; it
therefore can be interpreted as the driver of long rates independent of the
short rate. The underlying processes, $x_1(t)$ and $x_2(t)$, can be transformed
differently, thereby creating a pair of state variables with desired financial
meanings, see Levin (2001). Levin (1998) developed a three-point calibration
method that analytically computes parameters of the two-factor model using
volatility of and correlation between the short rate and two arbitrary long
rates. The method allows for constructing term structure models with inter-rate
correlations selected by the user and maintained steadily over time. The latter
property can be achieved by constructing a model with constant mean reversion
parameters $a_1$ and $a_2$, and a constant $\sigma_1(t)/\sigma_2(t)$ ratio.
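The decorrelation property of the slope variable can be checked numerically. The sketch below computes the coefficient multiplying x2 in the slope variable and the instantaneous covariance between the increments of the short rate and of the slope variable, using only the diffusion terms of the two factors; the function names and sample parameters are hypothetical.

```python
def slope_beta(sigma1, sigma2, rho):
    """Coefficient on x2 that makes dv = dx1 + beta*dx2 uncorrelated with dr."""
    return -sigma1 * (sigma1 + rho * sigma2) / (sigma2 * (sigma2 + rho * sigma1))


def cov_dr_dv(sigma1, sigma2, rho, beta):
    """Instantaneous covariance (per unit time) of dr and dv.

    Diffusion parts: dr ~ sigma1*dz1 + sigma2*dz2, dv ~ sigma1*dz1 + beta*sigma2*dz2,
    with corr[dz1, dz2] = rho.
    """
    return (sigma1 ** 2
            + (1.0 + beta) * rho * sigma1 * sigma2
            + beta * sigma2 ** 2)


if __name__ == "__main__":
    s1, s2, rho = 0.010, 0.008, -0.3  # hypothetical volatilities and correlation
    beta = slope_beta(s1, s2, rho)
    print(beta, cov_dr_dv(s1, s2, rho, beta))
```

Any other value of the coefficient leaves a nonzero covariance, which is why the formula pins it down uniquely once the two volatilities and the correlation are fixed.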

Interestingly enough, all stable two-factor normal models having two real
eigenvalues can be presented in the above-written form. Hull and White (1994)
introduced a two-factor model that was designed in the form of a single-factor
HW model for the short rate (factor 1) with a random long-term equilibrium rate
(factor 2). Their approach draws on Brennan and Schwartz (1979). It is now clear
that such an appeal to the financial meaning was unnecessary, and the general
mathematical approach is as good or even better.

If we transform $x_1(t)$ and $x_2(t)$ nonlinearly, we will get multifactor
versions of other previously considered models. For example, we could define the
short rate as $r(t) = R(t)\exp[x_1(t) + x_2(t)]$, thereby creating a two-factor
lognormal model. As one would expect, these models inherit the main properties
of the single-factor parents, but add a greater freedom in changing the curve's
shape and calibrating to volatility and correlation structures.


THE CONCEPT OF AFFINE
MODELING

Affine modeling is a term introduced by Duffie and Kan (1996). It is a class of
term structure models, often multifactor, where all zero-coupon rates are linear
functions of factors. Therefore, the zero-coupon bond pricing has an
exponential-linear form. Let us revisit the general stochastic model given by

$$dr = (\text{Drift})\,dt + (\text{Volatility})\,dz$$

Duffie and Kan showed that the model will be affine if drift and the square of
volatility are both linear in rate r, or, more generally, in all market factors.
In order to illustrate the main idea, let us denote the drift term as
$\mu(x,t)$, the volatility term as $\sigma(x,t)$, and assume for the sake of
simplicity that r = x, the lone market factor.

Every financial derivative satisfies a partial differential equation, see Duffie
(1996). The left-hand side of this equation is equal to the investment's
arbitrage-free expected return, which is the product of price (P) by the short
rate (r). The right-hand side collects all the terms arising in the course of
random behavior of P(x,t): the decay, the drift, the diffusion, and cash
received. In particular, a zero-coupon bond receives no cash; its equation is

$$rP(x,t) = \frac{\partial P(x,t)}{\partial t} + \mu(x,t)\frac{\partial P(x,t)}{\partial x} + \frac{1}{2}\sigma^2(x,t)\frac{\partial^2 P(x,t)}{\partial x^2} \qquad (23)$$

subject to the terminal condition, P(x,T) = 1 (bond pays sure $1 at maturity
regardless of the market conditions). Suppose now that functions $\mu(x,t)$ and
$\sigma^2(x,t)$ are linear in x:

$$\mu(x,t) = \alpha_1(t) + \alpha_2(t)\,x; \qquad \sigma^2(x,t) = \beta_1(t) + \beta_2(t)\,x$$

It turns out that the solution to equation (23) will have an exponential-linear
form:

$$P(x,t) = \exp[a(t,T) + b(t,T)\,x]$$

To prove this conjecture, we place the above expressions into equation (23),
take all derivatives, and observe that all the terms are either independent of x
or linear in x. Collecting them, we get two ordinary differential equations
defining unknown functions a(t,T) and b(t,T):

$$b'_t(t,T) = -\alpha_2(t)\,b(t,T) - \frac{1}{2}\beta_2(t)\,b^2(t,T) + 1, \qquad b(T,T) = 0 \qquad (24)$$

$$a'_t(t,T) = -\alpha_1(t)\,b(t,T) - \frac{1}{2}\beta_1(t)\,b^2(t,T), \qquad a(T,T) = 0 \qquad (25)$$

The terminal conditions for a(t,T) and b(t,T) are dictated by the terminal
condition for the price function, P(x,T) = 1. Note that equation (24) defines
function b(t,T); once it is solved, we can solve (25) for a(t,T).
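The two ordinary differential equations are easy to integrate numerically. The sketch below is a minimal illustration that assumes constant coefficients (our simplification; the text allows them to depend on time) and integrates the pair with a classical fourth-order Runge-Kutta scheme in the time-to-maturity variable tau = T - t, starting from a = b = 0 at tau = 0. The function names are ours.

```python
import math


def solve_affine_ab(alpha1, alpha2, beta1, beta2, tau, n_steps=1000):
    """Integrate the ODE pair for a and b from tau = 0 up to the given tau.

    In tau = T - t the time derivative changes sign, so
        db/dtau = alpha2*b + 0.5*beta2*b**2 - 1
        da/dtau = alpha1*b + 0.5*beta1*b**2
    """
    def rhs(b):
        db = alpha2 * b + 0.5 * beta2 * b * b - 1.0
        da = alpha1 * b + 0.5 * beta1 * b * b
        return da, db

    h = tau / n_steps
    a = 0.0
    b = 0.0
    for _ in range(n_steps):
        k1a, k1b = rhs(b)
        k2a, k2b = rhs(b + 0.5 * h * k1b)
        k3a, k3b = rhs(b + 0.5 * h * k2b)
        k4a, k4b = rhs(b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


def zero_price(x, a, b):
    """Exponential-linear zero-coupon price for factor value x."""
    return math.exp(a + b * x)


if __name__ == "__main__":
    # Hypothetical constant-parameter normal model: mean reversion 0.1,
    # long-run level 5%, volatility 1%
    k, theta, sigma = 0.1, 0.05, 0.01
    a, b = solve_affine_ab(k * theta, -k, sigma ** 2, 0.0, tau=5.0)
    print(a, b, zero_price(0.04, a, b))
```

For the constant-parameter normal case the result can be compared with the known closed form, b = -(1 - exp(-k tau))/k, which is a convenient check before moving to square-root or time-dependent specifications.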

It is clear that the HW model and the CIR model we considered earlier in the
entry were affine. Indeed, in the HW model, $\beta_2$ is zero, $\alpha_2$ is
$-a$, $\beta_1$ is $\sigma^2$, and (24) becomes a linear differential equation.
In the CIR model, $\beta_1$ is zero, $\alpha_2$ is again $-a$, and $\beta_2$ is
$\sigma^2$; (24) becomes the Riccati equation (10). In fact, these two models
cover the most important specifications of affine modeling for the single-factor
case. The concept of affine modeling lets us build multifactor models
systematically. The two-factor Gaussian model we introduced above was affine,
too. Much more complex three-factor affine models were analyzed by Balduzzi et
al. (1996) and by Dai and Singleton (2000). Among early works we should mention
the model of Longstaff and Schwartz (1992). In their model, both the short rate
and its volatility are affine in two factors that follow CIR-like processes.


The Jump-Diffusion Case 

All term structure models considered thus far are based on diffusion, a
continuous random disturbance known as Brownian motion (Wiener process), z(t).
Short rates are somewhat jumpy and may require an addition of the Poisson
process for modeling. The jump-diffusion extension to the affine modeling
concept has been considered by many researchers (see Duffie and Kan, 1996; Das
et al., 1996; and Das, 2000). The key point is that, under certain conditions,
addition of jumps does not change the complexity of the problem; long rates
remain affine in factors and even equation (24) for b(t,T) remains unaffected.

Under the presence of jumps, the main stochastic differential equation for the
short rate (or other market factors) gets an additional term as shown below:

$$dr = (\text{Drift})\,dt + (\text{Volatility})\,dz + (\text{Jump Volatility})\,dN$$

where N is the Poisson-Merton jump variable having intensity of $\lambda$. When
a jump occurs, dN is drawn from the standard normal distribution N[0,1]; it
stays 0 otherwise. Continuing our affine-model notational style, let us denote
the jump volatility term as $\sigma_J(t)$ and the jump intensity as
$\lambda(x,t)$. Note that we allow the jump's intensity, but not the size, to be
factor dependent.

With jumps, the partial differential equation (23) will get one additional term
on its right-hand side. If a jump of size $\delta$ occurs, the price of a
zero-coupon bond, P(x,t) before the jump, will become $P(x + \delta, t)$. The
expected change of price can be written as

$$\int_{-\infty}^{\infty} [P(x+\delta,t) - P(x,t)]\, n_{[0,\sigma_J]}(\delta)\, d\delta$$

where, as usual, n denotes a normal density function. This expression captures
the randomness of the jump's size, not the randomness of the jump's occurrence.
Multiplying it by the probability of a jump to occur between t and t + dt (that
is, $\lambda\,dt$) we get the cumulative expected effect of price change.
Finally, dividing by dt we get the annualized return component caused by the
jumps. Therefore, the partial differential equation (23) will now become a
partial integral-differential equation:

$$rP(x,t) = \frac{\partial P}{\partial t} + \mu(x,t)\frac{\partial P}{\partial x} + \frac{1}{2}\sigma^2(x,t)\frac{\partial^2 P}{\partial x^2} + \lambda(x,t)\int_{-\infty}^{\infty} [P(x+\delta,t) - P(x,t)]\, n_{[0,\sigma_J]}(\delta)\, d\delta \qquad (26)$$


For the diffusion case, we required functions $\mu(x,t)$ and $\sigma^2(x,t)$ to
be linear in x. Let us extend this condition to the jump's intensity:
$\lambda(x,t) = \gamma_1(t) + \gamma_2(t)\,x$. It turns out that the
exponential-linear form $P(x,t) = \exp[a(t,T) + b(t,T)\,x]$ still fits the
equation. Again, collecting terms, we get two ordinary differential equations
defining unknown functions a(t,T) and b(t,T):

$$b'_t(t,T) = -\alpha_2(t)\,b(t,T) - \frac{1}{2}\beta_2(t)\,b^2(t,T) - \gamma_2(t)\left[e^{\frac{1}{2}b^2(t,T)\,\sigma_J^2(t)} - 1\right] + 1, \qquad b(T,T) = 0 \qquad (27)$$

$$a'_t(t,T) = -\alpha_1(t)\,b(t,T) - \frac{1}{2}\beta_1(t)\,b^2(t,T) - \gamma_1(t)\left[e^{\frac{1}{2}b^2(t,T)\,\sigma_J^2(t)} - 1\right], \qquad a(T,T) = 0 \qquad (28)$$

Notably, equation (27) defining function b(t,T) will coincide with previously
discussed equation (24) if $\gamma_2 = 0$. If we have a single-factor model, the
linear relationship between long rates and the short rate will have a slope of
$-b(t, t + T)/T$. This slope, found for an affine diffusive model, won't change
if we add jumps of factor-independent intensity and size. Hence, in such affine
models, jumps and diffusions are equally propagated from the short rate to long
rates. Knowing that actually observed long rates are chiefly diffusive and the
short rate is notably jumpy, one can conclude that the jump-diffusive setting
makes more practical sense within the frame of multifactor modeling.
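The exponential term in equations (27) and (28) comes from averaging the relative price change over a normally distributed jump size. For an exponential-linear price, a jump of size delta multiplies the price by exp(b delta), and the normal average of exp(b delta) - 1 equals exp(b^2 sigma_J^2 / 2) - 1. The sketch below checks this identity by trapezoidal quadrature; the function names, grid size, and integration width are our assumptions.

```python
import math


def jump_term_closed_form(b, sigma_j):
    """Expected relative price change per jump: E[exp(b*delta)] - 1."""
    return math.exp(0.5 * b * b * sigma_j * sigma_j) - 1.0


def jump_term_quadrature(b, sigma_j, n=20001, width=12.0):
    """Same expectation by the trapezoidal rule over +/- width standard deviations."""
    lo = -width * sigma_j
    hi = width * sigma_j
    h = (hi - lo) / (n - 1)
    norm = sigma_j * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        d = lo + i * h
        density = math.exp(-0.5 * (d / sigma_j) ** 2) / norm
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (math.exp(b * d) - 1.0) * density
    return total * h


if __name__ == "__main__":
    # Hypothetical values: b = -4 (roughly a 4-year duration), 20 bp jump volatility
    print(jump_term_closed_form(-4.0, 0.002), jump_term_quadrature(-4.0, 0.002))
```

Because the term depends on x only through the intensity, a factor-independent intensity moves the whole jump contribution into the equation for a(t,T) and leaves the equation for b(t,T) untouched.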

Using jump-diffusion models may be required when valuing options struck away
from the current forward rate (that is, the ATM point). Aside from the
volatility skew, option pricing features volatility smile, or simply an
excessive convexity in $\sigma_K$. Revisiting Figure 1, one can notice that the
actual dependence of volatility on the strike is more convex than even the
optimal CEV model predicts. This is the smile effect, albeit fairly moderate for
options on long rates. Smiles for options on shorter rates are very apparent,
especially for short expirations. Figure 4 depicts swaption volatility measured
in basis points per day, as a function of strike.

Figure 4 Daily Normalized Volatility Smile for Traded Swaptions (bp / day)
Panels: A. 1-month Expiration on Various Swap Tenors; C. 2-year Expiration on Various Swap Tenors
Data are courtesy of Bear Stearns, January 2007.

In this normalized scale, all panels of Figure 4 exhibit similar volatility
skews, the ones close to normal (CEV = 0). However, the smile effect looks very
different in panels A, B, and C; it clearly fades with maturity of the
underlying rate and the option's expiry. The presence of jumps fattens the
distribution tails and inflates out-of-the-money or in-the-money option values
relative to the ATM values. Therefore, jump modeling can capture the smile
effect and explains its dependence on the swap's maturity and the option's
expiry: Jumps allowed to occur over a longer time horizon look more like
diffusion.


KEY POINTS 

• The concept of short-rate modeling serves as a foundation for the fixed-income
derivatives market.

• Short-rate models can be single- or multifactor, but their central object is a
theoretical risk-free rate. Models employed in the financial markets have to be
calibrated to the initial yield curve and simple options; some models let us
solve this task analytically.

• There are a number of single-factor models that differ with respect to their
distribution of rates, interrate relationships, and ability to fit the swaption
market; the Hull-White (normal) model seems to fit the observed volatility skew
the best.

• A two-factor normal model can be constructed by borrowing the recipes of
so-called "affine" models; such a model can be used to price complex derivatives
that are asymmetrically exposed to changes in the yield curve's shape.

• With jumps included, models can be employed to capture volatility "smile,"
that is, value options struck far out-of- or in-the-money.
Abstract: The term structure of interest rates represents the cost of (return from) borrowing (lending /
investing) for different terms at any one moment in time. The term structure is most often specified 
for a specific market such as the U.S. Treasury market, the bond market for double-A rated financial 
institutions, the interest rate market for LIBOR and swaps, and so on. The term structure is usually 
specified via a rate or yield for a given term or the discount to a cash payment at some time in the 
future. These are often summarized mathematically through a wide variety of models. In addition, 
term structure models are fundamental to expressing value and risk, and establishing relative value 
across the spectrum of instruments found in the various interest-rate or bond markets. Static models 
of the term structure are characterizations that are devoted to relationships based on a given market 
and do not serve future scenarios where there is uncertainty. Standard static models include those 
known as the spot yield curve, discount function, par yield curve, and the implied forward curve. 
Instantiations of these models may be found in both a discrete- and continuous-time framework. 
An important consideration is establishing how these term structure models are constructed and 
how to transform one model into another. 


The objective of this entry is to describe the principles and approaches for a
deterministic model of the term structure of interest rates. This is done first
in a discrete-time setting, followed by a more analytical development in a
continuous-time setting. We provide an eclectic mixture of ideas from the
academic literature in concert with adaptations well known to practitioners.

Computational implementation of anything as complex as interest rate term
structure models naturally calls for rigorous adherence to, yet clever
application of, some arcane ideas from software/system engineering. These are
beyond the scope of this introduction, but such topics include numerical
recipes; mechanisms to ensure internal consistencies during development and
build-up; tests for internal consistency, verification, and validation of
completed applications (e.g., put-call parity, cash-and-carry arbitrage, and
others); parameterization of models and applications from the markets; and the
utility of advanced computer architectures.

A deterministic approach to the term structure of interest rates (or simply, the
term structure) may be appropriately thought of as a static modeling approach.
This is distinguished from a dynamic model of term structure. The chief
distinction is that in a static term structure model, no accommodation is made
for the course of interest rates over time. On the other hand, a dynamic model
explicitly incorporates how interest rates change over time and therefore needs
to admit a notion of uncertainty in considering the future course of interest
rates. The following discussion will concentrate on static models. First, we
address a taxonomy for term structure models in some additional detail.

INTRODUCTION TO TERM 
STRUCTURE MODELING 

The term structure of interest rates (or term structure) is simply a price or
yield relationship among a set of securities that differ only in the timing of
their cash flows or their term until maturity. These securities invariably have
a specified set of other attributes in common so that the study of the term
relationship is meaningful.

It is common to think of the term structure as consisting of the current-coupon
U.S. Treasury issues only. This restriction is not necessary since it is
possible to define other term structures derived from other securities. For
example, it is meaningful to define the term structure of sets of coupon or
principal Treasury strips. Other examples include off-the-run Treasury issues,
agency debentures, LIBOR/interest-rate swaps, or the notes of single-A rated
banks and finance companies. The set of securities used to define a term
structure is called the reference set. A market sector (sometimes referred to as
a market or a sector) consists of all those instruments described by a specific
term structure. There is the market sector of coupon or principal Treasury
strips, off-the-run Treasuries, agency debentures, interest-rate swaps, and
single-A rated banks and finance companies, and so forth. Very often, the
reference set for a market sector may have restrictions on the structure
(noncallable only), liquidity (recent issues only), or price (close to par only)
of the securities that make up the set.

The relationship expressed by the term structure is traditionally the par-coupon
yield relationship, hence the terminology: yield curve. This also is not a
necessary restriction. In general, the term structure could be the discount
function, the spot-yield curve, or some other expression of the price or yield
relationship between the securities. Given the widespread usage of the (par)
yield curve for the Treasury market, it is not surprising that many market
sectors are defined from a reference set derived from the Treasury market. For
example, the reference set that defines the agency debenture market is a set of
yield spreads to the on-the-run Treasuries, so that a 5-year debenture issued by
an agency may be priced at par to yield 15 basis points more than the current
5-year Treasury issue. If the Treasury issue is trading at a 6.60% yield to
maturity, the par priced agency issue has a 6.75% coupon. By inference, from the
spread quote of 15 basis points, the reference yield for the 5-year term is
6.75%. Similar statements can be made for the interest-rate swap and the
corporate bond markets.

It needs to be emphasized that the reference 
set of bonds used to define the term structure 
of interest rates and the resulting term structure
itself are not one and the same. Indeed, the term 
structure, as a complete description of the entire 
yield curve, ultimately can be used to analyze 
all manner of option-laden, index-amortizing 
swaps or debentures that are in the same mar¬ 
ket sector. The "vanilla" reference set consists 
of individual bonds that are used mainly to de¬ 
fine the term structure or to derive its defining 
relationships—spot-yield curve, spot-rate pro¬ 
cess, discount function, and the like. 

Theories about the term structure of interest 
rates fall into two categories: 

• Qualitative theories seek to explain the shape of 
the yield curve based on economic principles. 
Three theories attract the widest attention: the 
expectations, liquidity preference, and preferred 
habitat (or hedging pressure) theories. 

• Quantitative theories seek to mathematically 
characterize the term structure (often in har¬ 
mony with one of the qualitative theories). 

Usually, a quantitative theory about the term 
structure of interest rates culminates in a math¬ 
ematical model, a term structure model that 
exhibits useful properties. Specifically, a term 
structure model is the mathematical represen¬ 
tation of the relationship among the securities 
in a market sector. This formalizes the distinc¬ 
tion between the reference set used to define a 
market sector and a term structure model. 

TERM STRUCTURE MODELS 

The simplest and most familiar term structure 
model is the (semilogarithmic) graph of the 
U.S. Treasury yield curve (once found daily 
in the Wall Street Journal and in the business 
section of many newspapers). This model is 
useful mainly as a visualization of the yield 
relationship between the most recently issued 
shorter-term Treasury instruments and bonds. 
The graph can be characterized by a mathe¬ 
matical equation and is one example of the set 
of interpolation models of the term structure. 


These "connect-the-dots" models can be useful 
in providing a quantitative way to price bonds 
outside the current-coupon Treasury issues, but 
their utility is rather limited. Bonds that are val¬ 
ued through a linear-interpolation technique 
may not be "fairly" valued in the sense that 
an average yield may not be equal to the "par- 
coupon" yield corresponding to the same date. 
Later we provide a discussion of how the par- 
coupon curve is constructed to be fairly valued 
in comparison to the set of reference (Treasury) 
issues. 
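
A "connect-the-dots" model can be sketched in a few lines. The following is a minimal illustration, assuming straight-line interpolation between quoted yields; the function name and the quoted terms and yields are hypothetical and are not taken from any market source.

```python
from bisect import bisect_right

def interpolate_yield(terms, yields, term):
    """Linearly interpolate a yield at `term` from sorted (terms, yields) points."""
    if not terms[0] <= term <= terms[-1]:
        raise ValueError("term lies outside the quoted maturities")
    if term == terms[-1]:
        return yields[-1]
    hi = bisect_right(terms, term)   # index of the first quoted term > term
    lo = hi - 1
    weight = (term - terms[lo]) / (terms[hi] - terms[lo])
    return yields[lo] + weight * (yields[hi] - yields[lo])

# Hypothetical on-the-run quotes: terms in years, yields in percent.
terms = [0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
yields = [5.10, 5.30, 5.60, 6.10, 6.60, 6.90, 7.10]

# A 3.5-year yield read off the straight line between the 2- and 5-year points.
yield_3_5 = interpolate_yield(terms, yields, 3.5)
```

The interpolated value is only a weighted average of its two neighbors, which is why it need not coincide with the par-coupon yield for that date.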

The term structure model as described above 
simply provides a snapshot of the relationship 
between the yields for selected Treasury maturi¬ 
ties on a given day. It is often required that term 
structure models exhibit additional "analytic" 
properties. One such property is the consistency 
associated with the preclusion of riskless arbi¬ 
trage when the term structure model is used for 
pricing. More will be said about this later. For 
now, it is intended merely to indicate that the 
"visualization" of the yield relationship to term 
may be neither completely useful nor adequate. 

More generally, term structure models are 
called on to describe the evolution of a set of 
interest rates over time. This motivates the fol¬ 
lowing distinction in classifying term structure 
models: 

• Static models of the term structure offer a 
mechanism to establish the "present value 
of a future dollar" in a deterministic econ¬ 
omy. That is, no allowance for uncertainty 
or interest-rate volatility is explicitly incor¬ 
porated into the model. 

• Dynamic models of the term structure, in con¬ 
trast to static models, explicitly allow for un¬ 
certainty in the future course of interest rates. 

Ideally, a dynamic model of the term struc¬ 
ture should have useful static models embed¬ 
ded within. That is, with no contingency on 
the receipt of a future cash payment or when 
there is an assumption of negligible volatility, a 
dynamic model should correspond to a consistent
static model.

The essence of term structure modeling is the 
process of converting the market description of 
a sector's reference set (the data) into a math¬ 
ematical set of relationships that characterizes 
all issues in a sector. This is by no means trivial 
to do correctly. For example, the same model 
that correctly values a note in the Treasury mar¬ 
ket should also correctly value an option on 
that note, the futures contract into which that 
note may be deliverable, and an option on that 
futures contract. It should also reveal if the 
traded basis on that note is rich or cheap rel¬ 
ative to the cash, futures, and options markets. 
It should also be able to describe any strip¬ 
ping or reconstitution opportunities between 
coupon and principal strips and the cash mar¬ 
ket. These analyses should not be the result of 
several models, but of a single term structure 
model. 

A key element of the modeling process is to 
eliminate distinguishing characteristics associ¬ 
ated with each constituent of the reference set. 
For example, in the on-the-run set of Treasury 
issues, there are bills as well as notes and bonds. 
The bills have different conventions for day 
counting, pricing, and yield expression from 
those of the coupon-paying issues of the sec¬ 
tor. These characteristics need to be removed 
prior to developing the mathematical relation 
of the term structure model (as do the distin¬ 
guishing characteristics for notes and bonds). 
In this simple example, a model of the Treasury 
term structure might be the spot curve or the 
discount function, as opposed to a "connect- 
the-dots" model to which no yield adjustments 
have been made. 

The mathematical relationship of a term 
structure model can be used to characterize all 
issues in a sector. As is the case for the Treasury 
sector, every instrument can be considered a 
collection of zero-coupon bonds (the maturities 
of which correspond to the coupon/principal 
payment dates, the denominations of which 


correspond to the amount of coupon/principal 
paid). Accordingly, the discount function or 
equivalently, its corresponding spot-yield 
curve, furnishes a pricing technique for each 
zero-coupon bond and, therefore, for each of 
the instruments. With this insight, the utility of 
equivalence between the spot-yield curve and 
discount function, which are derived from the 
original reference set, is readily apparent. 

We begin with the familiar, discrete-time mod¬ 
eling approach. That is, units of time quanta 
are defined (usually in terms of compounding 
frequency) and financial manipulations are 
indexed with integer, multiple periods. We 
continue to build on the discussion by intro¬ 
ducing the continuous-time analogies to the 
concepts developed for discrete-time modeling. 
Continuous-time modeling allows financial 
manipulations to be freed from discretization 
artifacts (such as compounding frequency) 
and provides an algebraic framework that 
more naturally and rigorously accommodates 
"rate" as a concept of change. In addition, this 
approach opens up a huge field of applicable 
mathematics with the attendant opportunity 
for abstraction. For example, continuous-time 
models free the analyst from artificial a priori 
assumptions about interest-rate lattices; allow 
concentration on the financial analyses at hand; 
defer time-step issues to final implementation 
of an algorithm; and let the analyst choose an 
approach based on convenience, speed, and 
accuracy. 

DISCRETE-TIME MODELS OF 
THE TERM STRUCTURE 

In the discrete-time framework, we introduce 
some fundamental concepts in term structure 
theory. These include the discount function, 
the spot rate and spot yield, and the forward 
rate. While these initially may appear to be 
esoteric in nature, they are in fact closely in¬ 
terrelated quantities that directly represent the 
term structure, or act to influence the course 
of future interest rates in an arbitrage-free
environment. In this section these concepts are
shown to be incorporated into the different
expressions that describe the various qualitative
term structure theories, such as the expectations,
preferred habitat, and liquidity preference
hypotheses. The continuous-time term structure
model is developed next from the same underlying
premises as found in discrete time, thereby
speeding the exposition.

DISCOUNT FUNCTION 

The discount function incorporates market 
yield-curve information to express the present 
value of a future dollar as a function of the 
term to its receipt. As such, the discount func¬ 
tion is a valid expression of the term structure 
of interest rates by virtue of the price/yield 
relationship. Since the discount function is 
used to quantify the value of a future dollar, 
the discount function also provides a direct 
means to value a coupon-paying bond since 
the coupon and principal payments are simply 
scalar multiples of a single dollar. As a result, 
the discount function can be used as a refer¬ 
ence check for other quantitative term structure 
models. 

Quantitative term structure models ulti¬ 
mately deal with the analysis of pure discount 
bonds. (Discount bonds, or zero-coupon bonds, 
are the simplest types of bonds to analyze as 
there is only the repayment of par at maturity. 
Further, all other bonds can be built from a se¬ 
ries of discount bonds and options on discount 
bonds.) As a consequence of modeling the yield 
movements of discount bonds, term structure 
models describe their price movements since 
the price/yield relationship allows the term 
structure to be analyzed in terms of either price 
or yield. 

This relationship is addressed further later in 
this entry, in which the term structure model is 
expressed in terms of price as a function of rate 
and time. 



Figure 1 Discount Function 

If it is assumed that the discount bond pays 
one dollar at maturity, then the present value of 
the bond is some decimal fraction less than one. 
For a set of discount bonds of increasing matu¬ 
rities, there is the corresponding set of present 
values starting from approximately 0.999 and 
decreasing thereafter. This set of present values 
is called the discount function and is shown in 
Figure 1. 

The discount function is the term-to-maturity 
relationship of the present value of a future 
unit of cash flow. More formally, for a cash 
flow, CF, received after a term, T, from today, 
t, the present value, PV, of that cash flow is 
discounted, d, from the future value CF as ex¬ 
pressed by the relation 

$$PV(t, T) = d(t, T) \times CF(t, T) \tag{1}$$

where

PV(t, T) = present value of the cash flow at t
d(t, T) = discount at t for a cash flow received T after t
CF(t, T) = cash flow received at t + T

As we are able to generate the discount func¬ 
tion, d, for all terms-to-maturity, T, this can 
be a valid representation of the term structure 
of interest rates. Indeed, the discount function 
reflects the Treasury term structure when the 
discount function exactly reprices the current- 
coupon Treasury issues. 

Deriving the Discount Function for
On-the-Run Treasuries

More generally, let {P_i(t)} be the set of
closing prices on (date) t for the set of
current-coupon Treasury bonds (where the index, i,
associates a specific issue among several):

P(t, 3-month): price of the 3-month (13-week) bill, at time t
P(t, 6-month): price of the 6-month (26-week) bill, at time t
P(t, 52-week): price of the 1-year (52-week) bill, at time t
P(t, 2-year): price of the 2-year note, at time t
...
P(t, 30-year): price of the 30-year bond, at time t

Each of these instruments has its own time se¬ 
ries of cash flows, each with its own individual 
term-to-maturity (or better, term-to-payment). 
For the Treasury bills, the cash flows and asso¬ 
ciated terms-to-maturity are 

3-month bill: CF(t, T(3-month, 1)),
6-month bill: CF(t, T(6-month, 1)),

and for the periodic instruments,

2-year note: CF(t, T(2-year, 1)), CF(t, T(2-year, 2)), ..., CF(t, T(2-year, 4)),
30-year bond: CF(t, T(30-year, 1)), CF(t, T(30-year, 2)), ..., CF(t, T(30-year, 60))

The term to each of the cash flows, T(i, j) = T_{i,j},
is specific to the instrument and the context
of the notion of "today," t, for the purpose of
establishing a present value. (In this sense, the
dependence on t has been suppressed and it might
be more precise to specify T as T(t; i, j), but we
believe this to be unnecessary.) The index j is
the sequence of the cash flow in the time series
for security i. Finally, in general, cash flows
only exist in a future sense. If T(i, j) is less than
zero (the cash flow has already been paid), then
those cash flows are not included in the series
(although this is not an issue for the
current-coupon Treasury issues).

The present value of a coupon-paying in¬ 
strument is simply the sum of the discounted 
present values of the cash flows that make up 
the coupon payments and the payment of prin¬ 
cipal. Accordingly, for the discount function to 
model the Treasury term structure (i.e., the mar¬ 
ket sector defined by the on-the-run Treasury 
reference set), the following equations must be 
simultaneously satisfied. In this way, the dis¬ 
count function will reprice the current-coupon 
Treasury issues. 

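$$
\begin{aligned}
P(t, \text{3-month}) &= d(t, T(\text{3-month}, 1)) \times CF(t, T(\text{3-month}, 1)) \\
&= d(t, T_{1,1}) \times CF(t, T_{1,1}) \\
P(t, \text{6-month}) &= d(t, T(\text{6-month}, 1)) \times CF(t, T(\text{6-month}, 1)) \\
&= d(t, T_{2,1}) \times CF(t, T_{2,1}) \\
P(t, \text{2-year}) &= P(t, 3) = \sum_{j=1}^{4} d(t, T_{3,j}) \times CF(t, T_{3,j}) \\
P(t, \text{30-year}) &= P(t, 8) = \sum_{j=1}^{60} d(t, T_{8,j}) \times CF(t, T_{8,j})
\end{aligned}
$$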

The last cash flow of each series consists of the 
principal payment and, for the notes and bond, 
one coupon payment. The solution to these si¬ 
multaneous equations furnishes many distinct 
points of term in which the discount function is 
defined; the long bond alone may have as many 
as 60 term points. Depending on the circum¬ 
stances surrounding each auction, there may 
be as many as over 90 distinct points of term 
defining the discount function. 
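
The simultaneous solution can be sketched for a simplified case. The following is a minimal illustration, assuming that each successive instrument adds exactly one new payment date on a regular semiannual grid, so the system is triangular and can be solved by forward substitution (often called bootstrapping). The function name, prices, and coupons are hypothetical.

```python
def bootstrap_discount_factors(instruments):
    """Solve price = sum(d_j * CF_j) for the discount factors d_j.

    `instruments` is a list of (price, cash_flows) sorted by maturity,
    where the k-th instrument pays on periods 1..k (an assumption made
    so that each instrument introduces exactly one unknown).
    """
    discounts = []
    for price, cash_flows in instruments:
        if len(cash_flows) != len(discounts) + 1:
            raise ValueError("expected exactly one new payment date per instrument")
        # Value of the payments whose discount factors are already known.
        known = sum(d * cf for d, cf in zip(discounts, cash_flows[:-1]))
        # The final payment (principal plus coupon) pins down the new factor.
        discounts.append((price - known) / cash_flows[-1])
    return discounts

# Hypothetical reference set, par = 100, semiannual periods.
instruments = [
    (97.50, [100.0]),                     # 6-month bill
    (100.00, [3.0, 103.0]),               # 1-year note, 6.0% coupon
    (100.00, [3.1, 3.1, 103.1]),          # 18-month note, 6.2% coupon
    (100.00, [3.2, 3.2, 3.2, 103.2]),     # 2-year note, 6.4% coupon
]
discounts = bootstrap_discount_factors(instruments)
```

By construction the resulting discount factors reprice every instrument in the reference set. Real Treasury payment dates do not line up this neatly, which is where the interpolation techniques discussed next come in.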

As with the earlier "connect-the-dots" model 
for the yield curve, in which the yield points 
were connected to generate intermediate 
values for the term structure, similar ideas can 
be used to accommodate the cash flows that do 
not fall on one of the terms, T(i, j), enumerated 
above. In fact, interpolation techniques using 
spline functions may be applied to create 
a continuous discount-function curve, as in
Oldrich and Fong (1982). 

The discount function forms the basis for 
the development of a term structure model, 
as will be developed further in later sections. 
As the discount function is an expression of 
the term structure based on price, there is no 
ambiguity of compounding periodicity, as with 
yield-based term structure models. The dis¬ 
count function simply expresses the nondimen- 
sional, fractional, present value of a unit cash 
flow to be received after some term. The term 
may be specified in a unit of time (e.g., years, 
months, or days) or in periods, in which the 
period length is a unit of time. 


SPOT YIELD CURVE 


With the assumption of a compounding con¬ 
vention (usually semiannual), the discount 
function can be used to derive the equivalent 
Treasury zero-coupon structure—sometimes 
referred to as the spot-yield curve. In this case, the 
spot-yield curve is an equivalent term structure 
representation based on yield that provides a 
view of the term structure that is more famil¬ 
iar. The equivalence between these two forms 
of the term structure is used later in this entry. 

The spot yield, R, is related to the discount
function, d, through a price/yield relation. By
definition, the present value at t, PV(t, n), of
a cash flow received n periods in the future,
CF(t, n), has the spot yield, R(t, n), through the
relation

$$PV(t, n) = \frac{CF(t, n)}{[1 + R(t, n)]^{n}} \tag{2}$$


We use the discrete notion of integer periods, 
with each period of length P, to keep the math 
simple at this point. The more general case of a 
noninteger world is treated when a continuous 
time model is introduced. 

Comparing equations (2) and (1) provides the 
relation between the spot yield and the discount 
function 


$$d(t, n) = \frac{1}{[1 + R(t, n)]^{n}} \tag{3}$$

where

d(t, n) = discount of a cash flow received n periods after t
R(t, n) = n-period spot yield on t


The spot-yield curve is just the set of spot 
yields for all terms-to-maturity. In contrast, the 
spot rate is simply the one-period rate prevail¬ 
ing on t for repayment one period later. In the 
above notation, the spot rate is denoted R(t, 1). 
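
Equation (3) can be applied in either direction. The following is a minimal sketch, assuming per-period rates expressed as decimals and an integer number of periods; the function names and the sample rate are illustrative only.

```python
def discount_from_spot(spot_yield, n):
    """Equation (3): d(t, n) = 1 / [1 + R(t, n)]^n."""
    return 1.0 / (1.0 + spot_yield) ** n

def spot_from_discount(discount, n):
    """Equation (3) inverted: R(t, n) = d(t, n)^(-1/n) - 1."""
    return discount ** (-1.0 / n) - 1.0

# Hypothetical 4-period spot yield of 3% per period.
d4 = discount_from_spot(0.03, 4)
r4 = spot_from_discount(d4, 4)   # recovers the 3% spot yield
```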

We can generalize the earlier comment about 
coupon-paying bonds in terms of the set of spot 
yields. The present value of a coupon-paying in¬ 
strument is simply the sum of the discounted 
(present value) of the cash flows that make up 
the coupon payments and the payment of prin¬ 
cipal. The analogy to equation (2) for a coupon¬ 
paying bond using spot yields is 


$$PV(t, n) = \frac{CF(t, 1)}{[1 + R(t, 1)]} + \frac{CF(t, 2)}{[1 + R(t, 2)]^{2}} + \cdots + \frac{CF(t, n)}{[1 + R(t, n)]^{n}} \tag{2a}$$

Similarly, the analogy to equation (1) for a
coupon-paying bond using the discount function
is given by

$$PV(t, n) = d(t, 1) \times CF(t, 1) + d(t, 2) \times CF(t, 2) + \cdots + d(t, n) \times CF(t, n) \tag{1a}$$
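
The equivalence of equations (2a) and (1a) can be checked numerically. The sketch below assumes a hypothetical four-period bond and a hypothetical set of per-period spot yields; the function names are illustrative.

```python
def pv_from_spot_yields(cash_flows, spot_yields):
    """Equation (2a): discount the n-th cash flow at the n-period spot yield."""
    return sum(cf / (1.0 + r) ** n
               for n, (cf, r) in enumerate(zip(cash_flows, spot_yields), start=1))

def pv_from_discounts(cash_flows, discounts):
    """Equation (1a): weight each cash flow by its discount factor."""
    return sum(d * cf for d, cf in zip(discounts, cash_flows))

# Hypothetical inputs: coupon of 3 per period, principal of 100 at maturity.
cash_flows = [3.0, 3.0, 3.0, 103.0]
spot_yields = [0.0250, 0.0275, 0.0300, 0.0325]

# Discount factors implied by equation (3).
discounts = [1.0 / (1.0 + r) ** n for n, r in enumerate(spot_yields, start=1)]

pv_yield_based = pv_from_spot_yields(cash_flows, spot_yields)
pv_discount_based = pv_from_discounts(cash_flows, discounts)
```

Both routes give the same present value because the discount factors are just the spot yields restated in price terms.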


IMPLIED FORWARD RATE 

A consequence of the discount function, spot 
yield, and spot rate is the immediate relation 
to the (implied) forward rates. The implied for¬ 
ward rate is the spot rate embodied in today's 
yield curve for some period in the future. The 
forward rate generally is regarded as an indi¬ 
cation of future spot rates in an arbitrage-free 
economy. In the absence of arbitrage and un¬ 
certainty, the future spot rate, by definition, is 
equal to the forward rate. In the arbitrage-free 
term structure model discussed later, it can be 
shown that the future spot rate continuously 
converges toward the forward rate as the spot 
rate evolves over time. 
Specifically, the one-period forward rate, F,
can be determined from the spot yields as follows.
Consider the one-period and two-period
spot yields; the forward rate, F, may be found
from

$$(1 + R(t, 2))^{2} = (1 + R(t, 1)) \times (1 + F(t, 1, 1)) \tag{4}$$


where 


R(t, 2) = two-period spot yield on t
R(t, 1) = one-period spot rate on t
F(t, 1, 1) = one-period forward rate one period from t


This relation follows from the no-arbitrage as¬ 
sumption intrinsic in the concept of forward 
rates. The calculation of the forward rate pre¬ 
sumes that an investment today for two peri¬ 
ods provides the same return as a one-period 
investment today immediately rolled into an¬ 
other one-period investment one period from 
now. That is 


$$PV(t) = \frac{CF(t, 2)}{[1 + R(t, 2)]^{2}} \tag{5}$$

$$PV(t) = \frac{CF(t, 2)}{[1 + R(t, 1)] \times [1 + F(t, 1, 1)]} \tag{6}$$

By equating equations (5) and (6), equation
(4) results.


Deriving Forward Rates from 
Spot Yields 

Implied from the term structure, through the 
spot-yield curve, is a set of forward rates. These 
forward rates may be iteratively defined from 
the above and written as follows 

$$(1 + R(t, n))^{n} = (1 + R(t, n - 1))^{n - 1} \times (1 + F(t, 1, n - 1))$$

where, in addition to the earlier notation,
F(t, 1, n - 1) = one-period forward rate n - 1 periods
from t, and noting, through substitution, that

$$(1 + R(t, n))^{n} = (1 + R(t, 1)) \times (1 + F(t, 1, 1)) \times (1 + F(t, 1, 2)) \times \cdots \times (1 + F(t, 1, n - 1)) \tag{7}$$

this furnishes the first n - 1 one-period forward
rates.
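
The iterative definition can be sketched as follows, assuming per-period spot yields given as decimals. The function name and the sample yields are hypothetical.

```python
def forward_rates_from_spot(spot_yields):
    """Return [F(t,1,1), ..., F(t,1,n-1)] from [R(t,1), ..., R(t,n)].

    Uses (1 + R(t,n))^n = (1 + R(t,n-1))^(n-1) * (1 + F(t,1,n-1)).
    """
    forwards = []
    for n in range(2, len(spot_yields) + 1):
        growth_n = (1.0 + spot_yields[n - 1]) ** n           # n-period growth
        growth_prev = (1.0 + spot_yields[n - 2]) ** (n - 1)  # (n-1)-period growth
        forwards.append(growth_n / growth_prev - 1.0)
    return forwards

# Hypothetical upward-sloping spot-yield curve.
spot_yields = [0.020, 0.025, 0.030]
forwards = forward_rates_from_spot(spot_yields)

# Equation (7): chaining the spot rate and the forwards recovers (1 + R(t,3))^3.
chained = 1.0 + spot_yields[0]
for f in forwards:
    chained *= 1.0 + f
```

With an upward-sloping spot-yield curve, each forward rate lies above the corresponding spot yield, since it must pull the longer-term average up.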

The relation among spot yield, spot rate, and
forward rates, equation (7), can be combined
with equation (2) to furnish a method for
calculating the present value, at t, of a single n-period
future cash flow based on a series of one-period
forward rates

$$PV(t, n) = \frac{CF(t, n)}{[1 + R(t, 1)] \times \cdots \times [1 + F(t, 1, n - 1)]} \tag{8}$$


Since the present value of a coupon-paying 
security is simply the sum of the discounted 
present value of the cash flows that make up the 
coupon payments and the payment of princi¬ 
pal (see equations (la) and (2a)), the analogy to 
equation (8) for determining the present value 
of a coupon-paying bond is 


$$PV(t, n) = \frac{CF(t, 1)}{[1 + R(t, 1)]} + \frac{CF(t, 2)}{[1 + R(t, 1)] \times [1 + F(t, 1, 1)]} + \cdots + \frac{CF(t, n)}{[1 + R(t, 1)] \times \cdots \times [1 + F(t, 1, n - 1)]}$$

This equation may be used to define
multiperiod forward rates.


Deriving Forward Rates from the 
Discount Function 


The discount function provides a direct method
for generating forward rates. The one-period
forward return n - 1 periods from t is obtained
through the following

$$1 + F(t, 1, n - 1) = \frac{d(t, n - 1)}{d(t, n)} \tag{9}$$


Equation (9) may be derived from earlier
equations, or from the following argument that
creates a synthetic forward position. For each
unit of cash delivered n periods from today, t,
we pay d(t, n). We take a long position in this
zero. We also short d(t, n)/d(t, n - 1) units of
cash to be delivered n - 1 periods from t. For
this we receive d(t, n - 1) times d(t, n)/d(t, n - 1),
or simply d(t, n), units. There is no net change
in our cash position today. After n - 1 periods
we pay out d(t, n)/d(t, n - 1) and after n periods
receive one unit of cash. Thus the forward price
per unit, FP, to be paid n - 1 periods from now is

$$FP(t, 1, n - 1) = \frac{d(t, n)}{d(t, n - 1)}$$

where

FP(t, 1, n - 1) = forward price of a one-period
unit of cash n - 1 periods from now

The forward price then gives the forward
one-period rate, n - 1 periods from t, as

$$FP(t, 1, n - 1) = \frac{1}{1 + F(t, 1, n - 1)}$$

Equating these results in equation (9).
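
Equation (9) and the synthetic-forward argument can be illustrated with a short sketch. The discount factors below are hypothetical, and the function names are illustrative only.

```python
def forward_rates_from_discounts(discounts):
    """Equation (9): 1 + F(t,1,n-1) = d(t,n-1) / d(t,n), for n = 2..N.

    `discounts[k-1]` holds d(t, k).
    """
    return [discounts[n - 2] / discounts[n - 1] - 1.0
            for n in range(2, len(discounts) + 1)]

def forward_prices_from_discounts(discounts):
    """Forward price of a one-period unit zero: d(t,n) / d(t,n-1)."""
    return [discounts[n - 1] / discounts[n - 2]
            for n in range(2, len(discounts) + 1)]

# Hypothetical discount function for periods 1, 2, 3.
discounts = [0.98, 0.95, 0.91]
forwards = forward_rates_from_discounts(discounts)
forward_prices = forward_prices_from_discounts(discounts)
```

Each forward price is the reciprocal of one plus the matching forward rate, which is the statement that links the synthetic position to equation (9).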


TERM STRUCTURE IN A 
CERTAIN ECONOMY 

As discussed earlier, term structure models de¬ 
scribe the evolution of interest rates over time. 
Often, future interest rates are expressed in 
terms of the future spot rate. If the future spot 
rate (or equivalently, the future rate of return 
on a bond) is known, the future term structure 
of interest rates may be found from the pre¬ 
viously established interrelationships between 
the spot rate and the discount function or spot 
yield. In fact, it is this relationship between the 
spot rate and the discount function that is used 
to motivate the formulation of the term struc¬ 
ture models described later as a function of the 
spot rate. As a precursor to a generalized term 
structure theory, we first discuss the ramifica¬ 
tions for a term structure in a certain economy. 
(In this context, "certain" refers to an economy 
with a lack of randomness, in other words, a 
lack of uncertainty.) 

If the future course of interest rates is known
with certainty, then arbitrage arguments
demand that future spot rates be identical to
forward rates. In the notation presented in
equation (7), this is equivalent to noting that

$$R(t + nP, 1) = F(t, 1, n) \tag{10}$$

for n = 1, 2, 3, ... and where P is the term of
the period. If this condition were violated, say,
for example,

$$F(t, 1, n) > R(t + nP, 1)$$


then the same arbitrage argument may be made
as before. If we buy the synthetic forward (this
is a long position in a unit zero to be delivered
n + 1 periods from today, t) and short
d(t, n + 1)/d(t, n) units of cash to be delivered
n periods from today, t, no cash changes hands
today. However, after n periods, we pay the
forward price, FP,

$$FP(t, 1, n) = \frac{1}{1 + F(t, 1, n)}$$

to receive one unit of cash after n + 1 periods.
Also, after n periods, at t + nP, we sell the
one-period unit zero for a price of

$$\frac{1}{1 + R(t + nP, 1)}$$

We know we can do this since there is no
uncertainty in the economy. If, as assumed,
F(t, 1, n) > R(t + nP, 1), then the long and short
positions yield a positive net cash flow, or a
riskless arbitrage, of

$$\frac{1}{1 + R(t + nP, 1)} - \frac{1}{1 + F(t, 1, n)}$$

after n periods with no uncertainty and with
no net investment. Arbitrageurs will exploit the
imbalance of the n-period forward rate with
the spot rate n periods from now by continuing
to buy the synthetic forward until demand
outstrips supply. In this scenario, the synthetic
forward price goes up, and the forward rate,
F(t, 1, n), goes down to R(t + nP, 1), with
predictable effect on d(t, n + 1) and/or d(t, n). On
the other hand, if F(t, 1, n) < R(t + nP, 1), we
may reverse our positions, and the same
argument shows that F(t, 1, n) will increase to
R(t + nP, 1).

Using the no-arbitrage condition in a certain
economy, equation (10),

$$R(t + nP, 1) = F(t, 1, n)$$

in the present value expression from the implied
forward-rate expression, equation (8) (which
always holds irrespective of assumptions about
the economy), we have

$$PV(t, n) = \frac{CF(t, n + 1)}{[1 + R(t, 1)] \times [1 + R(t + P, 1)] \times \cdots \times [1 + R(t + nP, 1)]} = \frac{CF(t, n + 1)}{[1 + R(t, n + 1)]^{n + 1}} \tag{11}$$


This means that the certain return of hold¬ 
ing an n + 1 period zero until maturity is the 
same as the total return on a series of one-period 
bonds over the same period. Later we will dis¬ 
cuss the various forms of equation (11) from 
various qualitative term structure theories. 

Given equation (11), we have, at time P (one
period) later,

$$PV(t + P, n) = \frac{CF(t, n + 1)}{[1 + R(t + P, 1)] \times \cdots \times [1 + R(t + nP, 1)]}$$

so we find that the single-period return on a
long-term zero is

$$\frac{PV(t + P)}{PV(t)} = 1 + R(t, 1) \tag{12}$$


Since the term-to-maturity was not specified, 
equation (12) must be true for zeros of any ma¬ 
turity. That is, the return realized on every dis¬ 
count bond over any period is equal to one plus 
the prevailing spot rate over that period. This 
will be expanded upon later. 

Alternatively, we can use our relation for the
discount function in equation (1), noting

$$PV(t + P, n) = d(t + P, n) \times CF(t, n + 1)$$

and

$$PV(t, n) = d(t, n + 1) \times CF(t, n + 1)$$

and restate equation (12) in its
discount-function based form:

$$\frac{d(t + P, n)}{d(t, n + 1)} = 1 + R(t, 1)$$
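
The single-period return result can be verified numerically. The sketch below assumes a known, hypothetical path of future one-period spot rates, as the certain economy requires; the function name is illustrative.

```python
from math import prod

def zero_price(one_period_rates):
    """Present value of one unit paid after len(one_period_rates) periods.

    Follows equation (11) with a unit cash flow: divide by the product of
    one plus each known one-period rate along the path.
    """
    return 1.0 / prod(1.0 + r for r in one_period_rates)

# Hypothetical known path: R(t,1), R(t+P,1), R(t+2P,1), R(t+3P,1).
rates = [0.020, 0.025, 0.030, 0.035]

pv_today = zero_price(rates)        # four-period zero valued at t
pv_next = zero_price(rates[1:])     # the same zero valued at t + P
holding_return = pv_next / pv_today # equals 1 + R(t,1) by equation (12)

# The same one-period return holds for a shorter zero.
short_return = zero_price(rates[1:3]) / zero_price(rates[:3])
```

Because the later rates cancel in the ratio, the one-period return does not depend on the maturity of the zero.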


TERM STRUCTURE IN THE 
REAL WORLD—NOTHING 
IS CERTAIN 

In the real-world economy, the future course 
of interest rates contains uncertainty. In at¬ 
tempting to deal with uncertainty, however, it 
would not be inconceivable that a belief in the 
efficiency of the market would prompt one to 
use the term structure and the relation between 
forward rates and spot rates as indicators of 
expectation about the future. Indeed, market 
efficiency states that prices reflect all available 
information bearing on the valuation of the 
instrument. Equilibrium supply and demand 
for fixed-income instruments reflect a market- 
cleared consensus of the economic future. As 
uncertainty represents a departure from this 
consensus, the expected equilibrium offers a 
natural starting point for analysis. 


Expectations Hypothesis 

The expectations theory of the term structure 
of interest rates offers a good starting point 
for dealing with an uncertain future. Actually, 
there is a whole family of expectations theories. 
Broadly, the expectations theory states that the 
expected one-period rate of return on an invest¬ 
ment is the same, regardless of the maturity of 
the investment. That is, if the investment hori¬ 
zon is one year, it would make no difference to 
invest in a one-year instrument, a two-year in¬ 
strument sold after one year, or two sequential 
six-month instruments. 

The most common form of this statement uses
equation (10) as the basis for the theory. This
is referred to as the unbiased expectations
hypothesis, which states that the expected future
spot rate is equal to the forward rate, or

$$E[R(t + nP, 1)] = F(t + kP, 1, n - k)$$

for k = 0, 1, ..., n - 1, and where E[·] is the
expectation operator.

While these developments for the certain
economy may appear trivial and obvious, they
serve as a guide for modeling the term structure
under uncertainty as well.

Using this relation, we find from equation
(8) that the present value in an economy
characterized by unbiased expectations is

$$PV(t, n) = \frac{CF(t, n + 1)}{[1 + R(t, 1)] \times \{1 + E[R(t + P, 1)]\} \times \cdots \times \{1 + E[R(t + nP, 1)]\}} \tag{13}$$


Therefore, the unbiased expectations hypothesis
concludes that the guaranteed return from
buying an (n + 1)-period bond and holding it to
maturity is equivalent to the product of the
expected returns from holding one-period bonds
using a strategy of rolling over a series of
one-period bonds until maturity.

Alternatively, the return-to-maturity
expectations hypothesis is based on equation (11). Here
we find that present value in such an economy
is

$$PV(t, n) = \frac{CF(t, n + 1)}{E\{[1 + R(t, 1)] \times [1 + R(t + P, 1)] \times \cdots \times [1 + R(t + nP, 1)]\}} \tag{14}$$


The return-to-maturity expectations hypothesis
assumes that an investor would expect to
earn the same return by rolling over a series of
one-period bonds as by buying an (n + 1)-period
bond and holding it to maturity.

The last version of the expectations hypothesis
that we will mention (there are others) is the
local-expectations hypothesis (or risk-neutral
hypothesis). This hypothesis is based on
equation (12), or equivalently, its associated
discount function-based equation. Under this
hypothesis, the expected rate of return over a single
period is equal to the prevailing spot rate of
interest. Applying these expressions recursively
gives

$$
\begin{aligned}
PV(t) &= \frac{E[PV(t + P)]}{[1 + R(t, 1)]} \\
&= E\left[\frac{PV(t + 2P)}{[1 + R(t + P, 1)] \times [1 + R(t, 1)]}\right] \\
&= CF(t, n + 1) \times E\left[\frac{1}{[1 + R(t, 1)] \times [1 + R(t + P, 1)] \times \cdots \times [1 + R(t + nP, 1)]}\right]
\end{aligned}
\tag{15}
$$


Equations (13), (14), and (15) are clearly
different in that the coefficient of the cash flow,
CF(t, n + 1), received n + 1 periods in the future
is a different expression in each case.
Furthermore, by the principle from mathematical
analysis known as Jensen's inequality, only one of
the expressions can be true if the future course
of interest rates is uncertain.

In fact, in discrete time, we find that bond
prices given by the unbiased and
return-to-maturity hypotheses are equal but less than that
given by the local expectations hypothesis. Although
the three hypotheses are different, in discrete
time, any of these hypotheses is an acceptable
description of equilibrium.
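
The effect of Jensen's inequality can be seen in a small numerical sketch. It assumes a two-period zero, a known current spot rate, and a next-period spot rate that takes one of two hypothetical values with equal probability; all names and numbers are illustrative.

```python
def expectation(values, probabilities):
    """Probability-weighted average of a discrete set of outcomes."""
    return sum(v * p for v, p in zip(values, probabilities))

r0 = 0.04                      # R(t, 1), known today
r1_scenarios = [0.02, 0.06]    # possible values of R(t + P, 1)
probs = [0.5, 0.5]
cash_flow = 1.0                # unit paid two periods from today

# Equation (13): discount at the expected future spot rate.
pv_unbiased = cash_flow / ((1 + r0) * (1 + expectation(r1_scenarios, probs)))

# Equation (14): discount at the expected compounded return.
pv_return_to_maturity = cash_flow / expectation(
    [(1 + r0) * (1 + r1) for r1 in r1_scenarios], probs)

# Equation (15): expected value of the discounted cash flow.
pv_local = cash_flow * expectation(
    [1.0 / ((1 + r0) * (1 + r1)) for r1 in r1_scenarios], probs)
```

In this setting, with a single uncertain rate, the unbiased and return-to-maturity prices coincide, while the local expectations price is higher because the expectation of a reciprocal exceeds the reciprocal of the expectation.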

In the next section, term structure modeling
in continuous time is developed. Equations
(13), (14), and (15) have continuous-time
analogs, which (as in discrete time) are
different from one another. This is again due
to Jensen's inequality. Unlike in discrete time,
however, only the local expectations hypothesis
is acceptable as a statement of equilibrium
because the expected returns under each of the
other hypotheses are not consistent with those
implied in a general equilibrium, as noted by Cox
et al. (1981).

Preferred Habitat Hypothesis 

Crucial alternatives to the expectations theory 
of the term structure of interest rates are theo¬ 
ries that add an element of risk when conferring 
the expected rate of return for bonds of different 
maturities; that is, the indifference assumption 
that was stated earlier no longer holds. If the 
investment horizon is one year, it does make a 
difference whether to invest in a one-year in¬ 
strument, a two-year instrument sold after one 
year, or two sequential six-month instruments. 
The preferred habitat theory argues that we first 
must know the investment horizon to deter¬ 
mine relative risk among bonds. In the simple 
example, the horizon is one year. The one-year 
instrument is safest for this horizon. Under the
preferred habitat theory, the investor would require
a higher rate of return on both the two-year
and six-month instruments.

Liquidity Preference Hypothesis

The liquidity preference theory can be con¬ 
sidered a special case of the preferred habitat 
theory. Here, it is held that investors demand 
a risk premium as compensation for holding 
longer-term bonds. In addition, since the vari¬ 
ability of price increases with maturity, the risk 
premium demanded by investors increases. As 
a special instance of the preferred habitat the¬ 
ory, the liquidity preference theory says that as 
all investors have a habitat of a single period, 
the shortest-term bond is judged safest. 

With each of these theories, one can assess 
their efficacy only in the context of the gen¬ 
eral economy. Specifically, we assume that the 
economy is one in which investors have an in¬ 
clination to consume, as well as to invest (in 
fact, even in a diverse set of risky investments). 
With a specification of utility of consumption 
and wealth, as well as a formal expression for 
risk aversion, the risk-based term structure the¬ 
ories can be viewed in the context of markets. 
Given that risk-based term structure theories 
can be viewed in the context of a defined mar¬ 
ket, the following conclusions can be made. 

Term premiums are monotonic in maturity (or 
term). Interest-rate risk is inherently intertem¬ 
poral. That is, it is a multiperiod phenomenon, 
in which an unexpected interest-rate change at 
any period affects all future returns and risk 
compounds over time. The traditional notion 
of preferred habitat seems difficult to reconcile 
with real markets. As it turns out, the traditional 
notion omits the importance of risk aversion. As 
we incorporate a varying need to hedge against 
interest-rate changes, the theory converges to 
a more acceptable view of markets. The gen¬ 
eralization of these economic analyses has led 
to what has been called an eclectic theory of the 
term structure that recognizes and accommo¬ 
dates the many factors that play a role in shap¬ 
ing the term structure. Expectations of future 
events, risk preferences, and the characteristics 
of a variety of investment alternatives are all important,
as are the individual preferences (habitats)
of market participants about the timing of
their consumption. It is this eclectic theory that
one needs to embrace in the development of the 
dynamic term structure. 

CONTINUOUS-TIME 
MODELS OF THE TERM 
STRUCTURE 

Now we discuss how the earlier concepts 
of discount function, spot rate, spot yield, 
and forward rate have their analogies in the 
continuous-time domain. It will be seen that 
while the mathematics are slightly more com¬ 
plex, the roles that each of these quantities play 
in the term structure of interest rates remain 
unchanged. 

In summary, the price-based representation
of the term structure, or the discount function, 
facilitates both the mathematical formulation 
of the problem and its subsequent solution. 
Once the term structure equation is solved ex¬ 
plicitly in terms of price, the price/yield equa¬ 
tion (in continuous time) is used to convert the 
term structure to its equivalent representation 
in terms of yield. 

Given the intertemporal nature of the term 
structure and the apparent efficiency of the mar¬ 
ket to incorporate information, it is assumed 
that the market acts instantaneously, and that a 
period in time is but an instant. This is the un¬ 
derlying premise for continuous-time models 
in economics and finance. 

Traditional fixed-income analysis assumes 
that compounding occurs at discrete points or 
over finite intervals, typically on a semiannual 
basis. However, as the compounding period 
grows ever shorter, discrete compounding is 
replaced by continuous compounding. We expand
our original equation (2) for the present
value (at t), PV(t, T), of a cash flow received T
years from today, CF(t, T), which is invested at
the spot yield, R(t, T), to be

$$PV(t, T) = CF(t, T)\, e^{-T R(t, T)} \tag{16}$$

Equation (16) is the fundamental price/yield 
relationship for the case of continuous 
compounding of a discount bond and is
the direct analog of the price/yield relation¬ 
ship shown in equation (2) for discrete com¬ 
pounding. 
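
The limit described above can be checked numerically. The following is a minimal sketch, assuming that the discrete formula takes the common form CF / (1 + R/m)^(mT) with m compounding periods per year; the function names and input values are hypothetical. As m grows, the discretely compounded present value approaches the value given by equation (16).

```python
import math

def present_value(cash_flow, spot_yield, term):
    """Equation (16): PV = CF * exp(-T * R) under continuous compounding."""
    return cash_flow * math.exp(-term * spot_yield)

def discrete_present_value(cash_flow, annual_yield, term, periods_per_year):
    """Discrete compounding with m periods per year, for comparison."""
    m = periods_per_year
    return cash_flow / (1.0 + annual_yield / m) ** (m * term)

cf, R, T = 100.0, 0.05, 10.0   # hypothetical inputs
pv_cont = present_value(cf, R, T)
for m in (2, 12, 365, 100_000):
    print(m, discrete_present_value(cf, R, T, m))
print("continuous", pv_cont)
```

For a fixed quoted yield, the discretely compounded present value lies above the continuous one and converges to it from above.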


DISCOUNT FUNCTION 

For a pure discount bond that pays one dollar
at maturity, CF(t, T) = 1. Let P be the price
of the pure discount bond. Thus equation (16)
becomes

$$P(t, T) = e^{-T R(t, T)} \tag{17}$$

Combining the above with equation (16), 
which equates the price of a discount bond to 
the discount function, we obtain 

$$P(t, T) = e^{-T R(t, T)} = d(t, T) \tag{18}$$

Equation (18) provides an expression for the 
relationship between the discount function d 
and the spot yield R, and is the continuous-time 
analogy to equation (3). 

Spot Rate 

In the previous section, the spot rate was de¬ 
fined as the one-period rate of return. Under 
continuous compounding, the spot rate r is de¬ 
fined as the continuously compounded instan¬ 
taneous rate of return. Stated another way, the 
spot rate is the return on a discount bond that 
matures in the next instant. The spot rate is re¬ 
ally an expression of the concept that a discount 
bond with a specified term-to-maturity and 
yield is equivalent to a series of instantaneously 
maturing discount bonds that are continuously 
reinvested at a rate r until the final term T. This 
is discussed in the following section. 


Spot Yield

If the spot rate is a known function of time, then
a loan amount W that is invested at the spot rate
r will grow by an increment dW that is given by

$$dW(t) = W(t)\, r(t)\, dt \tag{19}$$

where

dW(t) = incremental increase in the value of
the loan from time t to time t + dt
W(t) = value of loan at time t
r(t) = spot rate at time t

To find the value of the loan W at maturity,
integrate equation (19)

$$\int_t^{t+T} \frac{dW(\tau)}{W(\tau)} = \int_t^{t+T} r(\tau)\, d\tau$$

$$W(t) = W(t + T) \exp\left[-\int_t^{t+T} r(\tau)\, d\tau\right] \tag{20}$$

If W is a discount bond, W(t) is equal to the
present value P(t, T) and the value of W(t + T)
is one. Equation (20) is rewritten as

$$P(t, T) = \exp\left[-\int_t^{t+T} r(\tau)\, d\tau\right] \tag{21}$$

From equation (17), the price P is expressed in
terms of its spot yield R. By equating (17) and
(21), we obtain the following expression for the
spot yield in terms of the spot rate

$$R(t, T) = \frac{1}{T} \int_t^{t+T} r(\tau)\, d\tau \tag{22}$$

Equation (22) is a general expression that
always holds.

Another view of the relationship between the
spot yield and the spot rate is that instead of
continuously reinvesting at the spot rate r for
a fixed maturity T to obtain the spot yield R,
if the term-to-maturity grows ever shorter, the
spot yield R approaches the spot rate r "in the
limit." r may be stated as

$$r(t) = R(t, T = 0) = \lim_{T \to 0} R(t, T) \tag{23}$$

Graphically, the spot rate at t = 0 may be visualized
as the yield corresponding to the point at
which the spot-yield curve intercepts the yield
axis.
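
The relationship between the spot rate and the spot yield can be sketched numerically. The example below assumes a hypothetical, fully known spot-rate path (3% today, rising linearly by 0.5% per year) and uses a simple trapezoidal rule for the integral; none of these choices come from the text. It computes the spot yield as the time average of the spot rate over the term, as in equation (22), computes the discount bond price as in equation (21), and shows the spot yield approaching the spot rate as the term shrinks, as in equation (23).

```python
import math

def spot_rate(tau):
    """Hypothetical known spot-rate path: 3% today, rising 0.5% per year."""
    return 0.03 + 0.005 * tau

def integrate(f, a, b, steps=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def discount_bond_price(t, T):
    """Equation (21): P(t, T) = exp(-integral of r from t to t + T)."""
    return math.exp(-integrate(spot_rate, t, t + T))

def spot_yield(t, T):
    """Equation (22): R(t, T) = (1 / T) * integral of r from t to t + T."""
    return integrate(spot_rate, t, t + T) / T

t = 0.0
for T in (5.0, 1.0, 0.01):
    print(T, spot_yield(t, T), discount_bond_price(t, T))
```

For this linear path the spot yield is the spot rate at the midpoint of the term, so the five-year yield equals the spot rate expected 2.5 years ahead.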

FORWARD RATE

The forward rate, F(t0, t), is the marginal rate of
return for extending an investment to an additional
increment of term at t > t0. The forward
rate is defined by

$$R(t, T) = \frac{1}{T} \int_t^{t+T} F(t, \tau)\, d\tau \tag{24}$$

Comparing the above notations for the for¬ 
ward rate with that in equation (4), note that the 
parameter "1" from the previous parameter set 
(denoting one time period) is no longer present. 
In the continuous-time domain, one time period 
collapses to just an instant. 

Rearranging and applying Leibniz's rule, the 
above becomes 

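$$\begin{aligned}
\frac{\partial}{\partial T}\left[T R(t, T)\right] &= \frac{\partial}{\partial T} \int_t^{t+T} F(t, \tau)\, d\tau \\
&= F(t, t + T) \\
&= F(t, s)
\end{aligned} \tag{25}$$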

where s is the maturity date. The above equa¬ 
tions relate the forward rate to the spot yield 
R. As with the case of discrete compounding, 
the forward rate may be expressed similarly in 
terms of the discount function d(t, T) or the spot 
rate r(t).

From equations (17), (18), and (25), 

$$F(t, t + T) = -\frac{\partial}{\partial T} \ln\left[d(t, T)\right] \tag{26}$$

where ln[] is the natural logarithm. 

Separately, from equations (22) and (24), 

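$$\begin{aligned}
r(t) &= \lim_{T \to 0} R(t, T) \\
&= \lim_{T \to 0} \frac{1}{T} \int_t^{t+T} F(t, \tau)\, d\tau \\
&= \lim_{T \to 0} \frac{1}{T} F(t, \hat{t})\, T \qquad (t < \hat{t} < t + T) \\
&= F(t, t)
\end{aligned} \tag{27}$$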

In an economy with certainty, equations (22) and
(27) show that the spot rate needs to be equal
to the forward rate to preclude arbitrage. In the
case in which the spot-yield curve R(t, T) (and
consequently the term structure) is defined, it
follows that the spot rate needs to be equal to
the instantaneous forward rate over the term of
the discount bond for equation (27) to hold true
(see equation (7) for the analogy in the case of
discrete compounding).
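
The forward-rate relationships can be checked with a short numerical sketch. It assumes a hypothetical spot-yield curve R(t, T) = 3% + 0.25% × T observed at time t, and approximates the derivative in equation (26) with a central difference; the function names and step size are illustrative choices only. The forward rate at zero term coincides with the spot rate, as in equation (27), and with an upward-sloping yield curve the forward rate lies above the spot yield of the same term.

```python
import math

def spot_yield(T):
    """Hypothetical spot-yield curve observed at time t (an assumption)."""
    return 0.03 + 0.0025 * T

def discount(T):
    """Equation (18): d(t, T) = exp(-T * R(t, T))."""
    return math.exp(-T * spot_yield(T))

def forward_rate(T, h=1e-5):
    """Equation (26): F(t, t + T) = -d/dT ln d(t, T), by central difference."""
    return -(math.log(discount(T + h)) - math.log(discount(T - h))) / (2.0 * h)

for T in (0.0, 2.0, 10.0):
    print(T, spot_yield(T), forward_rate(T))
```

Because T × R(t, T) is quadratic in T for this curve, the central difference reproduces the exact derivative up to rounding.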

Since R is the yield of a discount bond and 
the term structure of interest rates is the set of 
spot yields as a function of maturity, equation 
(22) defines the term structure when the evo¬ 
lution of the spot rate is a known function of 
time. However, in general, the spot rate is not 
known; only the current spot rate is known from 
the current spot-yield curve. Nevertheless, term 
structure theory expands the basic relationship 
that is shown in equation (22), namely that the 
yield of a discount bond is a function of the spot 
rate. This is discussed in more detail in the next 
section when the spot rate assumes the form of 
a stochastic differential equation. 

TERM STRUCTURE IN 
CONTINUOUS TIME 

As stated in the previous section, the term struc¬ 
ture of interest rates describes the relationship 
between the yields of default-free, zero-coupon 
securities as a function of maturity. Conse¬ 
quently, the term structure may be envisioned 
as a continuous set of yields for zero-coupon 
securities over a range of maturities. 

Equation (18) describes the price/yield rela¬ 
tionship for a single zero-coupon bond of a 
given maturity. As the term-to-maturity T spans 
the range of possible maturities within the term 
structure, the associated spot yields are gener¬ 
ated for each maturity point, that is, R is a func¬ 
tion of the term T. Furthermore, for any one 
value of T, the spot yield will vary as a function 
of the time t. In general, the spot yield R is a 
function of the term-to-maturity T, the time t 
and the spot rate r (as shown by equation (22)). 
R may be expressed as 

R = R(r, t, T) (28) 

Equation (28) describes the functional form of 
the term structure in terms of the spot yield R. In 
order to describe the term structure completely,
an equation is needed that mathematically specifies
the form of the relationship between the
spot yield R and the term T over time t. 

Such an equation for the term structure may 
be found by considering that the term struc¬ 
ture may be expressed equivalently in terms of 
the prices of discount bonds (i.e., through the 
discount function). Thus equation (17) may be 
rewritten as 

$$R(r, t, T) = -\frac{1}{T} \ln\left[P(r, t, T)\right] \tag{29}$$

where ln [] is the natural logarithm. 

If an expression for P(r, t, T) can be found
that defines the value of a zero-coupon bond at 
different points in time and for varying terms 
T, then the term structure of interest rates has 
been defined fully. Alternatively, equation (29) 
provides an equivalent description of the evo¬ 
lution of the term structure over time in terms 
of the spot yield. 
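
The conversion from the price-based to the yield-based representation can be sketched as follows. The zero-coupon prices below are hypothetical values chosen for illustration, and the function name is an assumption; the calculation itself is equation (29) applied to a bond paying one dollar at maturity.

```python
import math

def yield_from_price(price, term):
    """Equation (29): R = -(1 / T) * ln(P) for a zero-coupon bond paying 1."""
    return -math.log(price) / term

# Hypothetical zero-coupon prices by term-to-maturity in years.
zero_prices = {1.0: 0.9704, 2.0: 0.9371, 5.0: 0.8353, 10.0: 0.6703}
term_structure = {T: yield_from_price(P, T) for T, P in zero_prices.items()}
for T, R in term_structure.items():
    print(T, round(R, 4))
```

Applying equation (17) to each resulting yield recovers the original price, which is why the two representations are equivalent.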

KEY POINTS 

• There are three main static models for the
term structure of interest rates: the spot yield
curve, the discount function, and the curve of
implied forward rates; straightforward transformations
allow moving from one model to
the other.

• These representations exist in both discrete¬ 
time and continuous-time versions and may 
be readily constructed from market data. 

• Static models of the term structure suit val¬ 
uation and comparisons of fixed-income in¬ 
struments for which there is no dependency 
(contingency) on future events. 

• Even though implied forward rates provide 
an arbitrage-free forecast for the future course 
of interest rates, static models do not admit 
uncertainty about the future. 

• There are three main explanations for the fu¬ 
ture course of interest rates in equilibrium: the 
expectations hypothesis, the preferred habi¬ 
tat hypothesis, and the liquidity preference 
hypothesis. 

Abstract: The term structure of interest rates represents the cost of (return from) borrowing
(lending/investing) for different terms at any one moment in time. The term structure is most often
specified for a specific market such as the U.S. Treasury market, the bond market for double-A-rated
financial institutions, the interest rate market for LIBOR and swaps, and so on. The term structure 
is usually specified via a rate or yield for a given term or the discount to a cash payment at some 
time in the future. These are often summarized mathematically through a wide variety of models. 
In addition, term structure models are fundamental to expressing value, risk, and establishing 
relative value across the spectrum of instruments found in the various interest-rate or bond mar¬ 
kets. Dynamic models of the term structure are characterizations that are specifically established to 
consider future market scenarios where there is uncertainty. As such they are rooted in probability, 
stochastic process, and martingale theory. Standard models include those derived from assump¬ 
tions that include a short-rate or a forward rate process as an explanatory factor for the evolution of 
markets. Instantiations of these models include a general zero-coupon bond pricing equation and 
the LIBOR market model. An important consideration includes expressing the market price of risk 
that allows for the complexity of the term structure of interest rates to exist without arbitrage, as 
found from the traded markets. This consideration provides a platform to analyze bond and interest 
rate derivatives in the risk-neutral setting or with a real-world/objective probability measure. 


Modern financial markets are predicated on 
the notions of contingency and uncertainty. 
Many recent financial innovations are directed 
at coping with the uncertainty of markets 
and the contingency of obligations. As part
of this evolutionary process, dynamic models
of securities and their behavior in the
markets are at the forefront of financial eco¬ 
nomic research and application. In the fixed- 
income markets, this condition dominates and 
drives the need for dynamic term structure
models. 

The dynamic term structure model of a mar¬ 
ket sector, as defined by a reference set of se¬ 
curities, is a mathematical set of relationships 
that can be used to characterize any security 
in that market sector in which market un¬ 
certainty dominates the expected timing and 
receipt of cash flows. There are several quali¬ 
tative essentials that need to be accommodated 
by a useful modeling approach. The ability to 
value fixed-income securities at any point in 
time (present or future) for conventional or for¬ 
ward settlement is a necessary first step. This 
is especially true in the valuation of compound 
or derivative instruments. Indeed, before the 
value of a bond option may be determined, the 
ability to calculate the (probabilistic) expected 
value of the bond on the future exercise date 
(conditioned on current market condition) is 
needed. Complementing this, reasonable vari¬ 
ations from this expectation also need to be de¬ 
termined and weighed relative to the expected 
outcome. It is essentially this same idea that al¬ 
lows for the analysis of a futures contract, an 
interest-rate cap, or an option on a swap. In ad¬ 
dition, to determine the performance risk that 
results from market moves, a rationale for incor¬ 
porating market changes needs to be embedded 
into the modeling process. 

With these premises in mind, the following 
assertions regarding dynamic models for the 
term structure of interest rates are postulated: 

• The model must have the capability to extrap¬ 
olate into the future an equilibrium evolution 
of the term structure of interest rates, given its 
form on a specified day, and must preclude 
riskless arbitrage. 

• The model must allow a probabilistic descrip¬ 
tion of how the term structure may deviate 
from its expected extrapolation while main¬ 
taining the model's equilibrium assumption. 

• The model must embody a rationale to in¬ 
corporate perturbations from the equilibrium 
that correspond to the economic fundamen¬ 
tals that drive the financial markets. 


A technical discussion of term structure mod¬ 
els is really equivalent to a discussion of the 
(zero-coupon or) spot-yield curve. The theory 
of the term structure of interest rates focuses 
on a term structure model that models the 
movement of the spot (zero-coupon) yield over 
time. Such term structure models are developed 
where any coupon-paying bond may be viewed 
in terms of its constituent zero-coupon bonds 
and analyzed in the context of this term struc¬ 
ture model. 

In this entry we focus on arriving at dy¬ 
namic term structure models that respond to 
these imperatives. We first describe a dynamic 
term structure model in the case of objective 
(or real-world) probability measures. The as¬ 
sumptions, derivation, and parameterizations 
of the general model are described. We then
indicate how this dynamic term structure model
represents zero-coupon bonds and coupon-paying
bonds, and how it determines par-coupon and
horizon yield curves. It can also be used to model
option-laden bonds and derivatives. The key 
feature of this model is dependence on a short- 
rate model as the (single) explanatory factor. 

Next, a dynamic term structure model in a 
risk-neutral measure is presented. It is here that 
connections between the risk-neutral and the 
real-world setting are made; the importance of 
the forward rate model as the key explanatory 
factor is identified; and the implementation of
computational imperatives in the context of
applying the model to interest rate derivatives is
identified.

KEY ELEMENTS IN A 
DYNAMIC TERM 
STRUCTURE MODEL 

The following key ideas guide the development 
of dynamic term structure models: 

• Equilibrium 

• Arbitrage-free 

• Continuous time/continuous state 

• Spot rate/forward rates as underlying variable

• Completeness of markets 

These five principles not only provide an elegant
mathematical formulation of the term
structure of interest rates, but also one that is 
applicable to a number of different market sec¬ 
tors and situations. Later we look at alternatives 
to the spot rate as the underlying variable and 
introduce a concept that highlights the market¬ 
clearing consequence of equilibrium—namely, 
the consensus of a fair market as embodied in 
the idea of a martingale in probability theory 
and forward rates as the underlying variable. 

EQUILIBRIUM 

General equilibrium models of the economy de¬ 
scribe the basic workings of the macro econ¬ 
omy as a function of a given "state variable." 
This implies that the production processes and 
assets that constitute the economy are deter¬ 
mined by the value of the state variable. Cox, 
Ingersoll, and Ross (CIR; 1985) showed that this 
general equilibrium model of the economy may 
be used to derive a model for the term structure 
of interest rates in terms of this state variable. 
Such an approach is considered to be a general 
equilibrium model of interest rates in that the 
interest-rate model is a consequence of a gen¬ 
eral economic model. 

In contrast to general equilibrium models, 
"partial equilibrium" models assume a par¬ 
ticular form of the interest-rate process as a 
given. This type of approach does not require 
the particular interest-rate process to be a re¬ 
sult of some greater underlying theory. Exam¬ 
ples of partial equilibrium models are those of 
Vasicek (1977), Ho and Lee (1986), and Black, 
Derman, and Toy (1990), among others. In addi¬ 
tion, partial equilibrium models are calibrated 
exogenously to the current term structure of in¬ 
terest rates. Without this exogenous informa¬ 
tion, partial equilibrium models cannot quan¬ 
tify the term structure. 

On the other hand, general equilibrium mod¬ 
els theoretically can specify a term structure in¬ 
dependently of any bond-market information. 
It has been observed, though, that such a term 
structure (as provided by earlier general equilibrium
models) may not be consistent with
the entire market term structure. For this reason,
and due to the difficulty that some term
structure practitioners have had in quantifying
the parameters in the CIR model, many implementers
of term structure models have pursued
the development of partial equilibrium models.

We approached these issues in the develop¬ 
ment of this term structure model in a variety of 
ways. While the model described herein is not 
purely a general equilibrium model, we began 
with the basic CIR model as a starting point and 
then further generalized that model's stochas¬ 
tic interest-rate process. Furthermore, we de¬ 
veloped an approach for the specification of 
CIR-type model parameters such that the de¬ 
rived term structure was consistent with the 
observed market term structure. Thus, draw¬ 
ing upon a cornerstone in term structure theory, 
we developed an extension to the CIR model 
that can be readily applied to the financial 
marketplace. 


ARBITRAGE-FREE 

One underlying principle that the term struc¬ 
ture model under discussion shares with many 
of the above-mentioned references is that the 
term structure is arbitrage-free. This concept, an 
extension of the arbitrage-free principles found 
in the Black-Scholes options theory for com¬ 
modity and equity markets, states that the term 
structure observes a given relationship among 
its constituent parts and that purely arbitrary 
yield-curve shapes do not occur. Given today's 
yield curve, subsequent yield curves are as¬ 
sumed to evolve in a "rational" manner that 
precludes riskless arbitrage. This indicates that 
the prices of bonds defining the yield curve 
move in such a way that it is not possible to 
create a portfolio of securities that always will 
outperform another portfolio without entailing 
any risk or net investment; in other words, there 
is no "free lunch." The arbitrage-free principle 
plays an important role in the mathematical 
pricing of fixed-income securities. 

CONTINUOUS
TIME/CONTINUOUS STATE 

Another distinguishing feature of this term 
structure model is the strict adherence to the 
continuous-time/continuous-state approach to the 
modeling of stochastic processes. This assumes 
that interest rates and bond prices move in a 
continuous fashion over time, rather than in 
discrete jumps. Thus a spot-yield curve may be 
found for any point in time during the life of a 
bond, rather than only at specific points (such as 
a coupon payment date). This concept is consis¬ 
tent with the notion of a continuous yield curve 
and allows for the use of continuous stochastic 
calculus. 

Continuous Probability 
Distributions 

Furthermore, the generality of the transitional 
probability density function, as a complete 
specification of the statistical properties of the 
rate process, is maintained throughout the term 
of the bond. This is in contrast to the common 
approach of describing individual sample 
paths or scenarios, as found in Monte Carlo 
approaches to security analysis. The ability to 
extend the analyses to compound, derivative 
instruments is unimpaired through the use of 
this transitional probability density function. 
Moreover, the continuous-time/continuous-state
approach avoids the computational issues
associated with the number of sample paths 
analyzed. Since the complete specification of 
the statistical properties is maintained, it is as 
if an infinite number of sample paths are run. 

Numerical Solution Technique 

The numerical solution technique
that accompanies the continuous-time formulation
is one that is well known in the
engineering and physical sciences as the
Crank-Nicolson finite-difference method for the
solution of partial differential equations (PDEs).


This solution technique has been used exten¬ 
sively in the study of aerodynamics and fluid 
flow, and has the flexibility to focus its com¬ 
putational efforts in areas that require greater 
numerical precision, such as the time period 
surrounding an option exercise period. This 
is in contrast to binomial interest-rate lattices, 
which are constrained to jump, for example, in 
six-month intervals, such as in some commer¬ 
cially available applications. 


COMPLETENESS OF 
MARKETS 

One of the key ideas in developing financial 
models—especially term structure models—is 
formulating valuation in the context of a repli¬ 
cating portfolio. That is, for a given structure, a 
portfolio is formed that replicates or hedges the 
instrument with the same risk-return proper¬ 
ties. Then the replicating portfolio dictates the 
value of the given structure. Otherwise, a self¬ 
financing riskless arbitrage can be engaged. Pre¬ 
sumably, price convergence would result given 
sufficient market awareness. Essentially, a mar¬ 
ket is complete if this can be always done with a 
certain characterization of uniqueness. 


DYNAMIC TERM 
STRUCTURE MODEL 

The formulation and implementation of the 
term structure model needs to be completely 
general so as to be applicable across a broad 
range of fixed-income markets in a straight¬ 
forward and consistent manner. For example, 
once the value of the fixed-income instrument 
is found, the value of its derivative (such as 
its futures contract) also may be found. Fur¬ 
thermore, it is possible to value the quality 
and delivery options within the bond futures 
contract. These effects also can be incorporated 
when valuing an option on the bond futures 
contract. 

General Assumptions

The analytical model that describes spot-rate 
movement is a one-factor, mean-reverting, dif¬ 
fusion process model. The model assumes: 

1. The evolution of interest rates is a contin¬ 
uous process and may be described by a 
single variable, that is, by the instantaneous 
spot rate, which is the return on an invest¬ 
ment over an infinitesimally short period of 
time. This allows for the use of continuous¬ 
time mathematics, which requires greater 
technical sophistication, but which increases 
the flexibility of the mathematical modeling 
process. 

2. The model assumes that interest rates move 
in a random fashion, which is known as 
Brownian motion or a Wiener process. The 
Wiener process has been used in the physical
sciences to describe the motion of molecular 
particles as they diffuse (or spread) over time 
and space. 

3. The term structure of interest rates is as¬ 
sumed to be represented by a Markov pro¬ 
cess, which states that the future movement 
in interest rates depends only on the current 
term structure and that all past information 
is embodied in the current term structure. 

4. The term structure is arbitrage free in that a 
portfolio of securities derived from the term 
structure is constrained to have an instanta¬ 
neous rate of return that is equal to the risk¬ 
free rate. Future movements in interest rates 
are similarly constrained so that the possi¬ 
bility of riskless profit is precluded. This im¬ 
plies that there are a sufficient number of 
sophisticated investors who will take advan¬ 
tage of any temporary mispricing in the mar¬ 
ketplace, thus quickly diluting any arbitrage 
opportunities that exist. 

Technically, an arbitrage-free term structure 
indicates that a portfolio of securities derived 
from the term structure may be constructed 
such that the portfolio instantaneously returns 
the risk-free rate. Since the above holds true for
any arbitrary set of maturities in this portfolio of
securities, it is said to be true for all maturities. 
This indicates that all securities that comprise 
the term structure are related in a common fash¬ 
ion. This commonality is expressed through the 
concept of the market price of risk, which is the 
incremental return over the risk-free rate that is 
required for incurring a given amount of addi¬ 
tional risk. In this context, risk is measured by 
the variance of a bond's rate of return. A result 
of the arbitrage-free nature of the term struc¬ 
ture is that all securities share the same market 
price of risk. As we demonstrate at the end of 
the entry, the risk premium is one component 
of the market price of risk. 

5. The price of a default-free, zero-coupon (discount)
bond at any point in time continu¬
ously depends on the spot rate, time, and 
maturity of the bond. This models the in¬ 
teraction between the bond's price and the 
probabilistic movement in the spot rate. This 
is an extension of the point discussed earlier 
that stated the yield of a discount bond is a 
function of the spot rate. 

6. The market is efficient in that all investors
have the same timely access to relevant mar¬ 
ket information. Furthermore, investors are 
rational and there are no transaction costs. 

SPOT-RATE MODEL 

As a result of assumptions 1 through 3 above, 
the equation that describes the diffusion process 
for the movement in the spot rate is given by 
equation (1) 

$$dr = k(\theta - r)\, dt + \sigma \sqrt{r}\, dz \tag{1}$$

where

r = spot rate, the instantaneous rate of return
dr = infinitesimal change in the spot rate
k = mean reversion constant
θ = "target" spot rate, which will be expressed
as a function of time
dt = infinitesimal change in time
σ = volatility of r
dz = infinitesimal change in the random variable
z (a characterization of the Wiener
process)

There are many alternatives to the form (1) 
(see, for example, Hull, 2009) and while this 
model has some attractive features, we in no 
way argue that it is "best." It is just useful and 
has been shown to work well in practice. Its 
features include the following. 

Mean Reversion 

Equation (1) states that the rate r changes with 
respect to time and the degree of randomness. 
The first term on the right-hand side of equation 
(1) states that the "drift" in the spot rate over 
time is proportional to the difference between 
the rate r and θ. As r deviates from θ, the change
in r is such that r has a tendency to revert back to
θ, a feature that is known as mean reversion. The
presence of mean reversion imposes a central¬ 
izing tendency such that rates are not expected 
to go to extremely high or low levels. In ad¬ 
dition, mean reversion precludes the existence 
of negative interest rates in our interest-rate 
model, given that the initial interest rates are 
positive. 

One can easily derive a closed-form expression
for θ as a function of time. Note that θ is
not assumed to be constant, as it usually is
in the traditional CIR approach.

Effect of Randomness 

The second term on the right-hand side of equa¬ 
tion (1) states that the contribution to the change 
in r due to randomness is driven by movements 
in the random variable z. The variable z is nor¬ 
mally distributed with a mean of zero and a 
variance that is proportional to time. This in¬ 
dicates that the amount of random "noise," as 
represented by the variable z, may be any pos¬ 
itive or negative value, but that its expected 
value is zero. In addition, as time passes, the 
variance increases so that the "amplitude" of
the noise also increases. The variables σ and r,
which are coefficients of dz in equation (1), show 
that the change in r also depends on the level of 
volatility and interest rates. The variable z has 
its own defined level of uncertainty so that as 
volatility and rate change, the overall degree of 
uncertainty is influenced by the level of these 
variables. 

Endogenous Parameterization 
(Tuning the Model) 

Equation (1) describes the rate in terms of the parameters $\kappa$,
$\sigma$, and $\theta$. The volatility parameter $\sigma$ is specified
externally so that it reflects either the historical level of
volatility or the volatility that is currently present in the market.
Secondly, $\theta$ reflects the current term structure such that the
future movements in r are influenced by today's term structure.
Finally, the mean reversion constant $\kappa$ determines the speed of
adjustment of r back to $\theta$. In order for the interest-rate model
to be of any utility, the parameter $\kappa$ is chosen to be
consistent with the observed market prices of bonds comprising the
current yield curve, while $\theta$ is derived directly from the
current yield curve. This process of determining $\kappa$ and $\theta$
"parameterizes" the model to the observed yield curve.

Several variations of equation (1) exist within the academic
literature and appear similar to it; see, for example, Chan et al.
(1992). However, the details surrounding the functional form of each
term in equation (1) and the associated parameterization process can
result in very different models. The specification of parameters for
this term structure model is driven by the requirement to be able to
precisely reprice the set of securities that constitute the reference
yield curve. A properly calibrated term structure model needs to be
able to define a bond whose cash flow characteristics match those of
an on-the-run issue exactly and then have the price of that
constructed bond match exactly the market price of the Treasury issue.
By repeating this process for each of the on-the-run issues, the mean
reversion constant and the risk premium that are appropriate over the
range of reference issues may be quantified.

As a technical side note, the term structure 
model needs to satisfy internal consistency 
checks, and the parameter specification process 
plays a part in the internal system for checks 
and balances. For the set of chosen parame¬ 
ters, the price furnished by the term structure 
model—as the solution to a PDE—needs to be 
equal to that provided by applying the discount 
function to the cash flows of the specific on-the- 
run issue, as explained earlier. Thus the dis¬ 
count function is a direct means of verifying 
the results of the term structure model. In fact, 
the PDE may be decomposed into two coupled 
ordinary differential equations (ODE) in the ab¬ 
sence of any embedded options. Thus prices 
obtained from the PDE, ODE, and discount- 
function approaches all need to be identical. 


Calculation of the Spot Rate 

The solution to equation (1) is obtained through 
computer numerical solution techniques and 
accounts for the current value of the spot rate 
(as an initial condition) and its level of volatil¬ 
ity. As time moves forward, the solution ex¬ 
presses the probable distribution of the spot rate 
as the spot rate propagates through time. Thus, 
at any point in time, it is possible to calculate the 
probability distribution of the spot rate. It was 
discussed previously that the price of a bond 
depends on the spot rate so that the spot-rate 
probability distribution is also the probability 
distribution for the bond price. This is useful 
in calculating the probability that an embedded 
call or put option will be exercised, which is the 
probability that the price of a particular bond is 
greater than or less than, respectively, the spec¬ 
ified strike price at exercise. 

The calculation of the probabilities is made possible by assuming a
specific mathematical form for the random variable z, namely a Wiener
process. Generally, a probability distribution function is described
by its mean and variance as functions of time. If these quantities are
known, then the probability of different spot rates is known. The
Wiener process assumption states that the statistical variance for the
random variable z varies with the length of time under consideration.
As time increases, the variance of z also increases. The known change
in the variance of z is subsequently translated (in a known fashion)
to the change in the variance of the rate r, which may be used to
obtain the desired probability in terms of r. In general, we use the
solution of the Kolmogorov (forward or backward) equation to establish
an expression for the probability density of the short rate.
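
As a rough illustration of how the spot-rate distribution propagates
through time, the following Python sketch simulates equation (1) with a
simple Euler time-stepping scheme. It is not the numerical technique
used in the text: the parameter values, the function names, the
constant $\theta$ passed in the example, and the `max(r, 0)` guard
inside the square root are all assumptions made for illustration.

```python
import math
import random

def simulate_spot_rate(r0, kappa, theta, sigma, horizon,
                       n_steps, n_paths, seed=0):
    """Euler scheme for dr = kappa*(theta(t) - r) dt + sigma*sqrt(r) dz.

    theta is a function of time (hypothetical input). Returns the
    simulated spot rate at the horizon for each path.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    terminal = []
    for _ in range(n_paths):
        r = r0
        for step in range(n_steps):
            t = step * dt
            # Wiener increment: mean zero, variance proportional to dt
            dz = rng.gauss(0.0, sqrt_dt)
            # max(r, 0) keeps the square root defined if a discrete
            # step overshoots zero (an assumption of this sketch)
            r += (kappa * (theta(t) - r) * dt
                  + sigma * math.sqrt(max(r, 0.0)) * dz)
        terminal.append(r)
    return terminal

def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

# Hypothetical inputs: 3% initial rate reverting toward a constant 6%
rates = simulate_spot_rate(r0=0.03, kappa=0.5, theta=lambda t: 0.06,
                           sigma=0.08, horizon=5.0,
                           n_steps=250, n_paths=2000, seed=42)
mean_r, var_r = mean_and_variance(rates)
```

With a constant $\theta$, the sample mean should lie close to
$\theta + (r_0 - \theta)e^{-\kappa t}$, which is one way to see the
mean-reverting drift at work; the sample variance summarizes the spread
of the distribution at the horizon.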


BOND-PRICE VALUATION 
MODEL 


As a consequence of assumptions 4 and 5 (the 
price of a default-free discount bond depends 
continuously on the spot rate), it can be shown 
that the price of a discount bond of term T is 
expressed as 


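$$\frac{\partial P}{\partial t} = rP - \left[\kappa(\theta - r) + \lambda\sigma r\right]\frac{\partial P}{\partial r} - \frac{1}{2}\sigma^{2} r\,\frac{\partial^{2} P}{\partial r^{2}} \qquad (2)$$

where

P = price of zero-coupon bond for time t and rate r
$\partial P/\partial t$ = partial derivative of price with respect to time
$\partial P/\partial r$ = partial derivative of price with respect to rate
$\partial^{2} P/\partial r^{2}$ = second partial derivative of price with respect to rate
$\lambda$ = "risk premium"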

The "risk premium" is the variable that represents the additional
return over the risk-free rate that the market requires for holding a
longer-term instrument. This is determined from the current term
structure. In addition to the bond price equation, boundary conditions
on the solution to (2) need to be prescribed to represent the behavior
of the instrument. These conform to given circumstances, but in the
simplest case, they include cash flows and constraints on P as r
converges toward zero or becomes arbitrarily large.


Developing the Bond-Price Equation 

A development of the bond-price valuation model 
(for the zero-coupon bond) follows in a straight¬ 
forward manner. Arguments of variables are 
suppressed except when needed to clarify de¬ 
pendencies. 

Equation (1) describes the process for the propagation of the spot
rate and is given by

$$dr = \kappa(\theta - r)\,dt + \sigma\sqrt{r}\,dz$$

If we assume that P is a function of the two variables r and t,
expressed as P = P(r, t), then Ito's lemma (see Shreve, 2004) provides
that

$$dP = \left[\kappa(\theta - r)\frac{\partial P}{\partial r} + \frac{\partial P}{\partial t} + \frac{1}{2}\sigma^{2} r\,\frac{\partial^{2} P}{\partial r^{2}}\right]dt + \sigma\sqrt{r}\,\frac{\partial P}{\partial r}\,dz$$

To apply the principle of an arbitrage-free term structure, consider
the representation of the evolution of the price to be

$$dP = \mu P\,dt - \rho P\,dz$$

where

$$\mu = \frac{1}{P}\left[\kappa(\theta - r)\frac{\partial P}{\partial r} + \frac{\partial P}{\partial t} + \frac{1}{2}\sigma^{2} r\,\frac{\partial^{2} P}{\partial r^{2}}\right]$$

Any security $W_i$, with maturity $s_i$, is subject to the same
relationship such that

$$dW_i = \mu_i W_i\,dt - \rho_i W_i\,dz$$

Consider a portfolio W consisting of owning an amount $W_2$ and
shorting an amount $W_1$ such that

$$W = W_2 - W_1$$

where

$$W_2 = \left[\frac{\rho_1}{\rho_1 - \rho_2}\right] W$$

and

$$W_1 = \left[\frac{\rho_2}{\rho_1 - \rho_2}\right] W$$

Thus

$$dW = dW_2 - dW_1$$

Substituting for $dW_1$ and $dW_2$ yields

$$dW = \left[\frac{\mu_2\rho_1}{\rho_1 - \rho_2}\right] W\,dt - \left[\frac{\rho_2\rho_1}{\rho_1 - \rho_2}\right] W\,dz - \left[\frac{\mu_1\rho_2}{\rho_1 - \rho_2}\right] W\,dt + \left[\frac{\rho_1\rho_2}{\rho_1 - \rho_2}\right] W\,dz = \left[\frac{\mu_2\rho_1 - \mu_1\rho_2}{\rho_1 - \rho_2}\right] W\,dt$$

Since the stochastic element dz disappears, the rate of return on the
portfolio W is equal to the riskless rate r. Therefore,

$$dW = rW\,dt$$

where we see it must be that

$$r = \frac{\mu_2\rho_1 - \mu_1\rho_2}{\rho_1 - \rho_2}$$

This gives the following relationship

$$r\rho_1 - r\rho_2 = \mu_2\rho_1 - \mu_1\rho_2$$

or, equivalently,

$$\frac{\mu_2 - r}{\rho_2} = \frac{\mu_1 - r}{\rho_1}$$

Since the maturities $s_1$ and $s_2$ were chosen arbitrarily, the above
is true for any maturity s. Therefore, the term

$$\frac{\mu - r}{\rho}$$

is not a function of maturity and may be written as

$$\frac{\mu - r}{\rho} = q(t, r)$$

where q(t, r) is the market price of risk.

Applying separation of variables, we choose q(t, r) to be the
following

$$q(t, r) = \lambda(t)\sqrt{r}$$

where $\lambda(t)$ is the risk premium, which can be shown to be



(As the term extends, the premium is higher.) 
We see, therefore, that 


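$$\frac{\mu - r}{\rho} = q(t, r) \;\Rightarrow\; \mu = r + \lambda(t)\sqrt{r}\,\rho$$

or that the expected return of a bond is equal to the riskless rate
plus another term related to the risk premium.

With $\rho = -\dfrac{\sigma\sqrt{r}}{P}\dfrac{\partial P}{\partial r}$, the above becomes

$$\mu = r - \lambda\sigma r\,\frac{\partial P}{\partial r}\frac{1}{P}$$

Substituting the above into $dP = \mu P\,dt - \rho P\,dz$ gives (where
$\partial P/\partial r < 0$)

$$dP = \left(r - \lambda\sigma r\,\frac{\partial P}{\partial r}\frac{1}{P}\right)P\,dt - \rho P\,dz$$
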

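Equating the coefficients of dt between the above and

$$dP = \left[\kappa(\theta - r)\frac{\partial P}{\partial r} + \frac{\partial P}{\partial t} + \frac{1}{2}\sigma^{2} r\,\frac{\partial^{2} P}{\partial r^{2}}\right]dt + \sigma\sqrt{r}\,\frac{\partial P}{\partial r}\,dz$$

gives

$$\frac{\partial P}{\partial t} = rP - \left[\kappa(\theta - r) + \lambda\sigma r\right]\frac{\partial P}{\partial r} - \frac{1}{2}\sigma^{2} r\,\frac{\partial^{2} P}{\partial r^{2}}$$

where, at maturity, we have the boundary condition

$$P(r, T) = 1$$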

This completes the derivation of equation (2). 
Next, if we assume a separation of variables for P(r, t) of the form

$$P(r, t) = \exp\left[C(t) - B(t)\,r\right]$$

it can be derived that the target spot rate, $\theta(t)$, is of the
form

$$\theta(t_0 + T) = -\frac{\partial}{\partial T}\ln d(t_0, T) - \frac{1}{\kappa}\frac{\partial^{2}}{\partial T^{2}}\ln d(t_0, T)$$

or

$$\theta(t_0 + T) = F(t_0, t_0 + T) + \frac{1}{\kappa}\frac{\partial}{\partial T}F(t_0, t_0 + T)$$

which will provide a solution to equation (2) that will exactly
reprice the reference set, where the discount function $d(t_0, T)$ and
the forward rates $F(t_0, t_0 + T)$ are derived from the reference set
using spline functions. Furthermore, this property is true for all
volatilities when the above-specified risk premium is used.
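
The expression for the target spot rate can be checked numerically.
The sketch below is only illustrative: it replaces the spline-fit
discount function with a hypothetical closed form whose forward curve
is the straight line F = a + bT, and it approximates the derivatives of
ln d by central differences. The names `log_discount` and
`target_rate` and all parameter values are assumptions.

```python
def log_discount(T, a=0.03, b=0.004):
    """Hypothetical ln d(t0, T) whose forward curve is F(T) = a + b*T.

    Stands in for the spline-fit discount function of the text.
    """
    return -(a * T + 0.5 * b * T * T)

def target_rate(T, kappa, log_d=log_discount, h=1e-3):
    """theta(t0 + T) = -d/dT ln d - (1/kappa) d^2/dT^2 ln d.

    Both derivatives are approximated by central differences.
    """
    first = (log_d(T + h) - log_d(T - h)) / (2.0 * h)
    second = (log_d(T + h) - 2.0 * log_d(T) + log_d(T - h)) / (h * h)
    return -first - second / kappa

# Forward rate at T = 2 is 0.03 + 0.004*2 = 0.038 and its slope is 0.004,
# so with kappa = 0.5 the second form gives 0.038 + 0.004/0.5
theta_2y = target_rate(2.0, kappa=0.5)
forward_form = (0.03 + 0.004 * 2.0) + 0.004 / 0.5
```

Because the forward rate is minus the first derivative of ln d with
respect to T, the two forms of the expression agree, which the
comparison of `theta_2y` with `forward_form` illustrates.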


THE TERM STRUCTURE 

Equation (2) is a PDE whose solution is ob¬ 
tained through a numerical finite-difference 
technique. The solution gives the price P of the 
bond for different times and spot rates, and can 
be visualized as a three-dimensional surface for 
which the height of the surface is the price of 
the bond and the location of the point (i.e., lon¬ 
gitude and latitude) is given by the time and 
spot rate. The solution takes into account that 
the bond's price is par at maturity, regardless 
of the level of interest rates. As the solution 
steps back from the maturity date, the price 
of the bond may be calculated for varying levels
of the spot rate and the familiar price/rate
graph may be drawn for this time-step. (Not
all bond prices are equally likely to occur since 
interest-rate movements and the probabilities 
associated with these movements are described 
by equation (1).) 

As the solution process continues backward 
from maturity to the present day, the theoretical 
price corresponding to today's spot rate can be 
calculated. Once the price behavior of a bond 
is known, the value of an option on that bond 
may also be calculated. In general, the expected 
value of the bond may be determined at any 
time from the present to maturity under the 
expectation operation over the solution to (2) 
and the probability density function for r. 
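
The backward stepping described above can be sketched with a small
explicit finite-difference solver for equation (2). This is a minimal
illustration under stated assumptions, not the scheme used by the
authors: $\theta$ and $\lambda$ are held constant, first derivatives
use upwind differences, the second derivative is set to zero at the two
edges of the rate grid, and the grid sizes are arbitrary. For constant
$\theta$ and $\lambda = 0$ the result can be compared with the standard
closed-form CIR bond price, which is included only as a check.

```python
import math

def solve_zero_price(kappa, theta, sigma, lam, maturity,
                     r_max=0.5, n_r=50, n_t=2000):
    """Step equation (2) backward from P(r, T) = 1 on a uniform grid.

    Explicit scheme; needs a small time step for stability
    (roughly dt * sigma**2 * r_max / dr**2 well below 1).
    Returns (rates, prices at t = 0).
    """
    dr = r_max / n_r
    dt = maturity / n_t
    rates = [i * dr for i in range(n_r + 1)]
    price = [1.0] * (n_r + 1)      # par at maturity for every rate
    for _ in range(n_t):
        new = price[:]
        for i, r in enumerate(rates):
            drift = kappa * (theta - r) + lam * sigma * r
            # upwind first derivative: difference taken toward the
            # side the rate drifts to
            if (drift >= 0.0 and i < n_r) or i == 0:
                d1 = (price[i + 1] - price[i]) / dr
            else:
                d1 = (price[i] - price[i - 1]) / dr
            # second derivative, set to zero on the grid edges
            if 0 < i < n_r:
                d2 = (price[i + 1] - 2.0 * price[i]
                      + price[i - 1]) / (dr * dr)
            else:
                d2 = 0.0
            dP_dt = r * price[i] - drift * d1 - 0.5 * sigma ** 2 * r * d2
            new[i] = price[i] - dt * dP_dt   # one step toward today
        price = new
    return rates, price

def cir_closed_form(r, kappa, theta, sigma, tau):
    """Closed-form CIR zero price (constant theta, zero risk premium)."""
    g = math.sqrt(kappa ** 2 + 2.0 * sigma ** 2)
    e = math.exp(g * tau) - 1.0
    denom = (g + kappa) * e + 2.0 * g
    b = 2.0 * e / denom
    a = (2.0 * g * math.exp(0.5 * (kappa + g) * tau) / denom) \
        ** (2.0 * kappa * theta / sigma ** 2)
    return a * math.exp(-b * r)

# Hypothetical inputs: five-year zero, read off at the grid point r = 5%
rates, prices = solve_zero_price(kappa=0.5, theta=0.06, sigma=0.1,
                                 lam=0.0, maturity=5.0)
fd_price = prices[5]
exact = cir_closed_form(0.05, 0.5, 0.06, 0.1, 5.0)
```

The list `prices` is one slice of the price surface (today's price at
each spot-rate level); keeping the intermediate time steps would give
the full surface described in the text.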

Since the solution to equation (2) furnishes the price as a function
of time and rate, equation (14) of the previous section may be solved
to provide the zero-coupon yield for a bond with the term-to-maturity
T. As the term T is varied, the entire term structure may be obtained.
(The obtained term structure, in general, can take a variety of
shapes. If the current spot rate is below the current value of the
long-term rate, $\theta$, the obtained term structure will be upward
sloping. If the current spot rate is substantially above the long-term
rate, the obtained term structure will be inverted (downward sloping).
For spot-rate values in between, the term structure will be humped,
displaying both upward sloping and downward sloping segments. Thus an
attractive feature of the term structure model is the ability to
obtain term structure specifications that are consistent with those
that have been observed historically.)


APPLICATIONS OF THE 
TERM STRUCTURE MODEL 

We conclude this entry with a description of the 
application of the term structure model devel¬ 
oped in the previous section in the valuation of 
fixed-income securities. For the simple case of 
noncallable bonds, many term structure mod¬ 
els can be used to determine value. In fact, the 
spline-fit discount function is a very straightfor¬ 
ward method of calculating the value of such a 
bond. However, when option-embedded bonds 
or compound instruments are considered, us¬ 
ing the PDE approach is opportune to reflect 
the specific nature of the option features. As 
this entry demonstrates, the PDE-based term 
structure model is but the first step that leads 
to a greater assortment of analytical financial 
tools. 

Zero-Coupon Bonds 

Most yield curves, such as the U.S. Treasury curve, are expressed in
terms of the yields of coupon-bearing bonds, not zero-coupon bonds.
Thus a procedure is required to translate the current-coupon yield
curve to an initial zero curve (i.e., the current term structure)
expressed in terms of a spot-yield curve. One of several methods may
be employed; see Vasicek and Fong (1982). In summary, a reference set
of securities is chosen to represent the yield curve, and each of the
cash flows from this set of securities is treated as a zero-coupon
bond that is part of the term structure. Since each of the reference
securities has a known market price, the price/yield relationship,
along with a curve-fitting process, is applied sequentially to each of
the cash flows to derive the current term structure. This process
establishes the set of initial conditions necessary to predict the
evolution of the term structure.

If the actual zero-coupon yields are compared 
to the theoretical zero-coupon yields, then the 
richness or cheapness of the zero-coupon mar¬ 
ket may be gauged. Since the discount func¬ 
tion may be constructed from any reasonable 
set of reference bonds, if the reference bonds 
consisted of off-the-run Treasury issues that are 
commonly stripped and/or reconstituted, then 
the corresponding theoretical zero curve should 
be indicative of the shape and level of the mar¬ 
ket strip curve. 

Additionally, as the Treasury curve flattens 
or steepens, the theoretical zero curve changes 
accordingly to reflect the new shape of the Trea¬ 
sury curve. Consequently, as the Treasury curve 
steepens or flattens, the degree of anticipated 
yield-spread widening or tightening in the zero 
market may be estimated. 

Coupon-Paying Bonds 

While our discussion thus far applies mainly to the price of a
zero-coupon bond, it is more common to encounter coupon-paying bonds.
To value coupon-paying bonds, we simply sum the present values of each
of the coupon payments and of the principal repayment to determine the
price. As discussed earlier, each cash flow is treated as an
individual zero-coupon bond.
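
A minimal sketch of this summation follows. The discount function used
here is a hypothetical closed form standing in for the spline-fit
discount function, and the function names, the semiannual payment
frequency, and the face value of 100 are assumptions made for
illustration.

```python
import math

def discount(T, a=0.03, b=0.004):
    """Hypothetical discount function d(0, T); stands in for a spline fit."""
    return math.exp(-(a * T + 0.5 * b * T * T))

def coupon_bond_price(coupon_rate, maturity, freq=2, face=100.0,
                      d=discount):
    """Sum of discounted cash flows, each treated as a zero-coupon bond."""
    n = int(round(maturity * freq))
    coupon = face * coupon_rate / freq
    price = 0.0
    for k in range(1, n + 1):
        # principal is repaid together with the last coupon
        cash = coupon + (face if k == n else 0.0)
        price += cash * d(k / freq)
    return price

price_5y = coupon_bond_price(0.05, 5.0)      # 5% semiannual, 5 years
zero_5y = coupon_bond_price(0.0, 5.0)        # pure zero-coupon case
```

Setting the coupon to zero recovers the zero-coupon price, which is
simply the face value times the discount function at maturity.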

Determination of the Theoretical 
Fair Value 

Once the term structure is defined, it may be used to value any
collection of cash flows and serves as the standard of fair value. The
theoretical price of a security that is calculated in this manner may
be compared to its actual market price. Any difference in price that
results indicates whether the security is rich or cheap relative to
its fair value. If the market price is equal to the fair value, then
the security is said to be fairly priced.

Generally, Treasury securities are chosen to 
represent the basis for fair value and most 
securities (such as corporate and government- 
agency debt obligations) are cheap to Trea¬ 
suries. However, if there are a sufficient number 
of securities from a particular sector or issuer, 
these issues may be used as the reference set of 
securities and a new yield curve may be defined 
to be the standard of fair value. Thus corpo¬ 
rate, agency, or municipal debt issues may be 
compared to their own family of securities or 
to their own sector to determine their relative 
value within the specified sector. 

Determination of Par-Coupon and Horizon 
Yield Curves 

A par-coupon yield curve is a theoretical yield 
curve comprised of par-priced bonds along the 
maturity spectrum. Each of these par-priced 
bonds is constructed from the same discount 
function, which in turn is derived from a spec¬ 
ified set of reference bonds. Since the discount 
function is defined continuously at different 
maturity points and cash-flow dates (via a 
spline-fitting procedure, for example), the par- 
coupon bonds corresponding to these same 
points may be determined. 

The procedure for constructing a par-coupon bond involves an iterative
process in which an initial coupon is assumed. For a given maturity
date and associated coupon-payment dates, the cash flows and cash-flow
dates are known for the assumed coupon level. The present value of
each of the cash flows is found through the discount function, and the
sum of the present values is compared to a price of par. The coupon
then is varied until a par-priced bond is found. The process may be
repeated for as many maturity points as desired to construct an entire
par-coupon yield curve.
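
The iterative search can be sketched as a bisection on the coupon
rate. As before, the discount function is a hypothetical closed form
rather than a spline fit, and the function names, the semiannual
frequency, and the search bounds are assumptions. For a plain fixed
coupon the par rate also has a direct formula, which is used here only
to check the iteration.

```python
import math

def discount(T, a=0.03, b=0.004):
    """Hypothetical discount function d(0, T); stands in for a spline fit."""
    return math.exp(-(a * T + 0.5 * b * T * T))

def bond_price(coupon_rate, maturity, freq=2, face=100.0):
    """Present value of the coupons and principal via the discount function."""
    n = int(round(maturity * freq))
    coupon = face * coupon_rate / freq
    pv = sum(coupon * discount(k / freq) for k in range(1, n + 1))
    return pv + face * discount(n / freq)

def par_coupon(maturity, freq=2, lo=0.0, hi=1.0, tol=1e-10):
    """Vary the coupon by bisection until the bond prices at par (100)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid, maturity, freq) > 100.0:
            hi = mid        # priced above par: coupon too high
        else:
            lo = mid        # priced at or below par: coupon too low
    return 0.5 * (lo + hi)

def par_coupon_direct(maturity, freq=2):
    """Direct formula: freq * (1 - d(T_n)) / sum of d(T_k)."""
    n = int(round(maturity * freq))
    annuity = sum(discount(k / freq) for k in range(1, n + 1))
    return freq * (1.0 - discount(n / freq)) / annuity

# Par-coupon curve at a few hypothetical maturity points
curve = {T: par_coupon(T) for T in (1.0, 2.0, 5.0, 10.0)}
```

Repeating the search over a finer set of maturities fills in the rest
of the par-coupon curve.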

A par-coupon yield curve is helpful in pric¬ 
ing bonds with off-the-run maturities. Often the 
question arises as to what exactly is the compa¬ 
rable Treasury yield when pricing off-the-run 
bonds. Depending on the fixed-income market 
sector, the comparable Treasury yield may be 
that of a specific Treasury note, or it may be an 
interpolated yield. The par-coupon curve pro¬ 
vides a more technically rigorous means of cal¬ 
culating the interpolated yield, as opposed to a 
simple straight-line interpolation scheme. 

Another application of the concept of the par- 
coupon yield curve is the horizon yield curve, 
the par-coupon yield curve for a future horizon 
date. Since the discount function may be deter¬ 
mined as a function of time, the corresponding 
horizon yield curves at various points in time 
also may be found. The horizon yield curve is 
one way to help visualize how the present yield 
curve may evolve in the future in an arbitrage- 
free environment. (Of course, as new informa¬ 
tion is incorporated into the marketplace as 
time passes, the actual yield curve may devi¬ 
ate from the horizon yield curve. However, a 
horizon yield curve may still be calculated that 
reflects particular views about the future move¬ 
ments in both short-term and long-term rates.) 

Yield-Curve Shocks and Shifts 

The shape of the yield curve is governed by exogenous (real-world)
factors. As the Federal Reserve alters its monetary policy, or as the
inflation outlook changes, the yield curve responds accordingly. These
perturbations to the curve can be characterized as "shocks" to
short-term rates and as "shifts" to long-term rates. A shock can occur
when there is a sudden and unexpected event that causes short-term
rates to jump, even though the overall economic fundamentals have not
changed.

The clearest example of a shock is the stock market crash of 1987,
during which investors fled to the safety of the Treasury market. On
October 19, short-term rates dropped by approximately 90 to 100 basis
points as investors sought a temporary safe haven. At the same time,
long-term rates fell by about 20 to 30 basis points. Since the crash
was a market phenomenon, rather than an altering of economic
fundamentals, it is characterized as a shock to the system. (This is
described mathematically within the term structure model as a change
to the initial condition of the differential equation, where the
differential equation remains the same. The solution to the
differential equation shows how the entire yield curve responds to a
shock in short-term rates.)

A shift in the yield curve results from a change 
in the economic landscape where federal bud¬ 
getary concerns or inflation outlooks can affect 
the view on long-term interest rates. (In contrast 
to a shock, the term structure model represents 
a shift as a respecification of the parameters to 
the differential equation, while the initial condi¬ 
tion has remained unchanged. The most general 
situation can consist of a combination of shocks 
and shifts.) 

The basic premise underlying the shocked 
and/or shifted horizon yield curve is that the
curve evolves in an arbitrage-free manner as 
prescribed by the term structure model despite 
alterations to the curve. Thus, even though a 
shock or a shift has occurred, the entire yield 
curve responds in such a way as to preclude 
arbitrage. As a result of different combinations 
of shocks and shifts of varying magnitudes, a 
series of horizon yield curves can be found for 
different yield-curve steepening and flattening 
scenarios. 

TERM STRUCTURE OF 
FORWARD RATES 

The financial markets can be viewed as a "game" with bids and offers
between participants. To characterize fairness among the participants,
the concept of a martingale (from probability theory) is introduced.
Briefly, a martingale M(t) is a stochastic process with finite first
moment for any t and where

$$E\left[M(s) \mid \mathcal{F}_t\right] = M(t) \quad \text{for } s > t$$

with $\mathcal{F}_t$ denoting that the conditioning is on a given
filtration or data set. Additionally, a portfolio may be thought of as
a quantity vector representing a particular set of positions
(Øksendal, 2007). If the market is fair, then the expected discounted
future value of any portfolio should be the same as today's portfolio
value when an appropriate discounting methodology is employed.
However, in the objective (or real) world, equipped with the
real-world measure, discount functions vary according to individual
risk preferences, each associated with its own sector/market
consensus. It is tedious to quantify these preferences for every case.
So, instead of working under the real-world measure, we seek to
explore an artificial probability measure under which every situation
is risk-neutral. This probability measure is called the risk-neutral
measure.

Modern pricing theory for financial deriva¬ 
tives is based on replicating a given deriva¬ 
tive's payoff by putting together a self-financing 
portfolio consisting of the underlying assets 
and risk-free bonds. By buying a derivative and 
selling its replicated portfolio (or vice versa), 
the self-financing portfolio is found to be risk¬ 
free. Constructing such a risk-free portfolio is 
beyond the scope of this discussion, but un¬ 
derstanding and utilizing the existence and 
uniqueness of this replicating strategy is the 
key for what follows (see Björk, 2009). Next,
we first examine the derivation of a risk-neutral 
probability measure from a forward-rate model. 
Then we look at a general no-arbitrage condition 
for the bond market. Finally, we address some 
practical issues and solutions in a conceptual 
fashion. 


HEATH, JARROW, AND 
MORTON MODEL OF THE 
TERM STRUCTURE 

Heath, Jarrow, and Morton (1992) proposed a general condition for
no-arbitrage using the instantaneous forward-rate curve dynamics. The
instantaneous forward rate is defined as

$$F(t, T) = -\frac{\partial \ln B(t, T)}{\partial T}$$

where B(t, T) is the zero-coupon bond price at time t and maturity T.
This stochastic process is usually written in a differential form

$$dF(t, T) = \alpha(t, T)\,dt + \sigma(t, T)\,dW(t)$$

where $\alpha$ and $\sigma$ satisfy the usual conditions for an Ito
process and W(t) is a standard Brownian motion (under the real-world
measure). Here, F(0, T) is the initial forward-rate term structure. In
many situations, instantaneous forward rates are fundamental building
blocks for modeling fixed-income contingent claims. For example, a
bond price process can be derived from Ito's lemma such that

$$\frac{dB(t, T)}{B(t, T)} = \left[F(t, t) - \int_t^T \alpha(t, u)\,du + \frac{1}{2}\left(\int_t^T \sigma(t, u)\,du\right)^{2}\right]dt - \int_t^T \sigma(t, u)\,du\;dW(t)$$

Details can be found in Shreve (2004). Also, the money market account
can be written as

$$dM(t) = M(t)\,F(t, t)\,dt \quad \left(\text{or equivalently, } M(t) = e^{\int_0^t F(u, u)\,du}\right)$$

A discount factor, $D(t) = M^{-1}(t)$, is defined similarly. A
variation of this setting is one where we use the notation T to
represent time-to-maturity (also called term). This alternative model
is closer to the market reality because the curve won't shorten and
will validate rolling-over trading strategies. For simplicity we set T
to be maturity in the rest of this entry.

Let's first assume the existence of a risk-neutral probability
measure, which is equivalent to imposing the local expectations
hypothesis, that is,

$$E\left[\frac{dB(t, T)}{B(t, T)}\right] = F(t, t)\,dt$$

where the expectation $E[\cdot]$ is taken under this risk-neutral
measure. Therefore the discounted bond-price process D(t)B(t, T) is a
martingale for all T, that is,

$$E\left[D(s)B(s, T) \mid \mathcal{F}_t\right] = D(t)B(t, T) \quad \text{for } t < s < T$$

This hypothesis also implies that the short rate evolves along today's
instantaneous forward rate curve. Refer to Björk (2009) or Shreve
(2004) for more details. Based on the martingale property we can then
derive the HJM no-arbitrage condition shown in Heath et al. (1992)
that

$$\alpha(t, T) = \sigma(t, T)\int_t^T \sigma(t, u)\,du$$



That is, the drift term of the instantaneous 
forward-rate curve process is tightly defined 
by the volatility term. This remarkable result 
tells us that only volatilities matter when mod¬ 
eling interest rates under a risk-neutral mea¬ 
sure. Since the martingale property is imposed 
on all zero-coupon bonds to ensure fairness, ar¬ 
bitrage trades are precluded. If a pricing model 
is designed only for a derivatives pricing pur¬ 
pose, further investigation on risk premium is 
not necessary. This is an important point. For 
once the HJM no-arbitrage condition is applied 
to a particular model, the existence of a risk- 
neutral measure is assumed and the risk pre¬ 
mium is zero. Nonetheless, not every modeler 
appreciates the consequence of ignoring the 
risk premium—especially when an asset and its 
derivative are priced congruently. For example, 
mortgage-backed derivatives usually involve 
prepayment statistics, which cannot be quanti¬ 
fied under a risk-neutral measure, and the risk 
premium is usually given exogenously. The an¬ 
swer of which model should be used is based 
on the modeler's discretion involving calibra¬ 
tion, implementation, and market assumptions, 
which we will talk about a bit more below. 
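
To make the no-arbitrage drift condition concrete, the following sketch
evaluates the drift of the forward-rate process from a given volatility
function by numerical integration. The exponentially decaying
volatility function, its parameters, and the function names are
hypothetical choices for illustration; for this particular form the
integral also has a closed form, which serves as a check.

```python
import math

def sigma_fwd(t, T, vol=0.01, decay=0.2):
    """Hypothetical forward-rate volatility: vol * exp(-decay * (T - t))."""
    return vol * math.exp(-decay * (T - t))

def hjm_drift(t, T, sigma=sigma_fwd, n=1000):
    """alpha(t, T) = sigma(t, T) * integral from t to T of sigma(t, u) du.

    The integral is approximated with the trapezoid rule on n intervals.
    """
    if T <= t:
        return 0.0
    h = (T - t) / n
    total = 0.5 * (sigma(t, t) + sigma(t, T))
    for i in range(1, n):
        total += sigma(t, t + i * h)
    return sigma(t, T) * total * h

def hjm_drift_closed_form(t, T, vol=0.01, decay=0.2):
    """Closed form of the same drift for the exponential volatility above."""
    x = math.exp(-decay * (T - t))
    return vol * vol * x * (1.0 - x) / decay

alpha = hjm_drift(0.0, 5.0)
alpha_exact = hjm_drift_closed_form(0.0, 5.0)
```

Only the volatility function enters the calculation, which mirrors the
statement that the drift is fully determined by the volatility term
under a risk-neutral measure.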
MARKET PRICE OF RISK

Let the market price of risk be denoted by $\Theta(t)$. By the HJM
no-arbitrage condition

$$\alpha(t, T) = \sigma(t, T)\left[\Theta(t) + \int_t^T \sigma(t, u)\,du\right]$$

which shows that the risk premium can be written as

$$\Theta(t) = \frac{\alpha(t, T)}{\sigma(t, T)} - \int_t^T \sigma(t, u)\,du$$

If $\Theta$ exists, then the market is arbitrage-free. Moreover, if
$\Theta$ is unique, then the market is complete. For a multifactor
model, completeness can be shown by nonsingularity of the volatility
matrix. A remark can be made here that risk premiums are determined
endogenously by the HJM no-arbitrage condition following from the
local expectations hypothesis. This market price of risk identified in
the HJM model is, however, the same for all maturities. The lack of
flexibility limits the interest rate curve evolution under the
real-world measure. In other words, if the curve dynamic is initially
set up under a risk-neutral measure, then it is usually impossible to
find a $\Theta(t)$ such that the "model-derived" real-world interest
rates satisfy the "real" real-world statistics.


BOND PRICING 

When the market is assumed to be arbitrage-free and complete, the
zero-coupon bond dynamics can then be derived under the unique
risk-neutral measure as

$$\frac{dB(t, T)}{B(t, T)} = F(t, t)\,dt - \int_t^T \sigma(t, u)\,du\;dW(t)$$

where W(t) is now a Brownian motion under the risk-neutral measure.
The rate of return for any bond is the same as the short rate;
nonetheless, the bond-price process is not Markov for a general
forward-rate model. This result is critical when it comes to
derivatives pricing since Monte Carlo simulation is often the only
approach, and it can be slow and imprecise. Furthermore, no
closed-form solution for bond dynamics can be given; thus there is no
closed-form solution for bond derivatives. Besides the computational
issues due to the complexity in bond dynamics, the HJM framework
cannot be used for log-normally distributed forward rates since, under
the continuous compounding environment, the process "explodes" with
positive probability. Therefore, practitioners seek eclectic methods
to resolve the issues. A powerful tool invented for interest-rate
derivatives pricing is the technique of "changing the numeraire,"
discussed next.


CHANGE OF NUMERAIRE 

The numeraire is a traded asset used for measuring value. Given a
numeraire, all other prices are measured relative to this asset. In
general, risk-neutral measures can have various forms in terms of
different numeraires. For instance, if a money market account is used
as a numeraire, it is the traditional risk-neutral measure as we see
in the Black-Scholes option pricing setting. In a traditional
risk-neutral world, the general evaluation form is written as

$$D(t)V(t) = E\left[D(T)V(T) \mid \mathcal{F}_t\right]$$

where V(T) is the payoff of a contingent claim maturing at time T, and
V(0) is its price at time 0 (recall that D(0) = 1). Normally interest
rates and underlying assets are assumed to be uncorrelated. This
assumption makes the evaluation of the expectation above easier, but
it is obviously invalid when a derivative V is based on interest
rates. Further investigation in separating the derivative value
process and the discount factor has been established by Geman et al.
(1995).

In a traditional risk-neutral world, every discounted traded-asset
price process is a martingale. When we take, for example, a
zero-coupon bond with maturity T as our numeraire, the drift term of
any other discounted traded-asset price process is adjusted according
to this zero-coupon bond volatility. The new measure based on the
zero-coupon bond numeraire is the T-forward risk-neutral measure.
Consequently we have

$$V(t) = B(t, T)\,E^{T}\left[V(T) \mid \mathcal{F}_t\right]$$

where $E^{T}[\cdot]$ is the expectation under the T-forward
risk-neutral measure. When the money market account is used as the
numeraire, this adjustment to the drift term is unnecessary since the
money market account process has zero volatility. In this new pricing
equation the discount factor is taken out of the bracket and replaced
with the zero-coupon bond discount. Therefore, the expectation is
performed solely on the derivative V.


MARKET MODELS 

For practitioners, the continuous compounding framework is unnecessary
since most interest rates, such as LIBOR, for example, have only
1-week, 1-month, 3-month, 6-month, and 1-year investing intervals.
Therefore, adopting the general no-arbitrage condition under the HJM
framework, Brace et al. (1997) created a model for simple forward
rates, which are compounded under a discrete-time framework. Based on
the change of numeraire technique, forward rate processes are
martingales under specific forward risk-neutral measures. This
phenomenon can be justified via analyzing a bond portfolio used to
create the payoff of a forward rate agreement: Let $F(t, T, T + \tau)$
denote the process of a simple forward rate for the period
$[T, T + \tau]$ with tenor $\tau$. Then

$$F(t, T, T + \tau) = \frac{B(t, T) - B(t, T + \tau)}{\tau\,B(t, T + \tau)}$$

Here $B(t, T + \tau)$ serves as the numeraire and transforms the
traditional risk-neutral probability into a forward risk-neutral
probability. By Ito's lemma, the forward rate dynamic can therefore be
written as

$$\frac{dF(t, T, T + \tau)}{F(t, T, T + \tau)} = \gamma(t, T, T + \tau)\,dW^{T+\tau}(t)$$

where

$$\gamma(t, T, T + \tau) = \frac{1 + \tau F(t, T, T + \tau)}{\tau F(t, T, T + \tau)} \times \int_T^{T+\tau} \sigma(t, u)\,du$$


The main advantage of the LIBOR market
model lies on the practical side. First, if the
volatilities γ are assumed to be nonstochastic,
then forward rates are log-normal, which is
consistent with Black's pricing formula. Moreover,
the fact that interest rates remain nonnegative
and zero-coupon bond prices remain nonzero
under Monte Carlo simulation has made the model
widely accepted. As a result, over the past two
decades the LIBOR market model has been
highly developed for various applications,
including the LIBOR swap market. Derivations
and implementations of these market models
can be found in Brigo and Mercurio (2006) and
Rebonato (2002, 2004).
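The two properties just described can be illustrated with a minimal sketch. It assumes a constant volatility γ and a single forward rate evolved under its own forward measure; the bond prices, the function names, and the parameter values are hypothetical and are not taken from Brace et al. (1997).

```python
import math
import random


def simple_forward_rate(b_t_T: float, b_t_T_tau: float, tau: float) -> float:
    """F(t, T, T+tau) = (B(t,T) - B(t,T+tau)) / (tau * B(t,T+tau))."""
    return (b_t_T - b_t_T_tau) / (tau * b_t_T_tau)


def simulate_lognormal_forward(f0, gamma, horizon, n_steps, rng):
    """Evolve F under its own forward measure with constant volatility gamma.

    Each step is the exact log-normal update, so the rate stays positive
    and has no drift (it is a martingale under this measure).
    """
    dt = horizon / n_steps
    f = f0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        f *= math.exp(-0.5 * gamma * gamma * dt + gamma * math.sqrt(dt) * z)
    return f


# Hypothetical discount bond prices for T = 1.0 and T + tau = 1.25.
f0 = simple_forward_rate(0.96, 0.95, 0.25)
rng = random.Random(0)
terminal = [simulate_lognormal_forward(f0, 0.2, 1.0, 4, rng) for _ in range(2000)]
mean_terminal = sum(terminal) / len(terminal)
```

Under these assumptions every simulated rate is strictly positive, and the sample mean of the terminal rates should lie close to the initial forward rate.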


INTEREST RATE 
DERIVATIVES 

An interest-rate cap consists of a portfolio of
caplets that provide insurance against rising
borrowing costs. Let C(t, T) denote a caplet with
maturity T on a simple τ-LIBOR forward rate
F over time interval [t, T]. The payoff of this
LIBOR caplet is

$$C(T, T) = L\,\bigl(F(T, T, T+\tau) - K\bigr)^{+}$$

where L is the principal amount and K is the
strike rate. Under the market model setting
with deterministic forward-rate volatilities, the
caplet price can be written in the form of
Black's formula as

$$C(0, T) = B(0, T)\, L\, \bigl[F(0, T, T+\tau)\, N(d_1) - K\, N(d_2)\bigr]$$

$$d_1 = \frac{\ln\dfrac{F(0, T, T+\tau)}{K} + \dfrac{1}{2}\displaystyle\int_0^T \gamma^2(u, T, T+\tau)\, du}{\sqrt{\displaystyle\int_0^T \gamma^2(u, T, T+\tau)\, du}}$$

$$d_2 = d_1 - \sqrt{\int_0^T \gamma^2(u, T, T+\tau)\, du}$$

in which the volatility structure is flat with respect
to the caplet strike prices. Despite this limitation,
the model has become the building block
for replicating exotic interest-rate derivatives
since the implied volatilities can be derived
from several plain-vanilla traded derivatives.
The information determined from this smaller
scale market is then extended to characterize the
whole-market dynamic. The operation usually
involves interpolation, and many techniques
are introduced in Rebonato (2002).
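The caplet formula above translates directly into a few lines of code. The sketch below is illustrative only: the function names and inputs are hypothetical, the integrated squared volatility is passed in as a single number, and a constant γ is assumed when that number is computed.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_caplet(discount, principal, forward, strike, total_variance):
    """Caplet value following the formula as stated in the text.

    total_variance stands for the integral of gamma^2 from 0 to T.
    """
    if total_variance <= 0.0:
        # No volatility: the caplet is worth its discounted intrinsic value.
        return discount * principal * max(forward - strike, 0.0)
    vol = math.sqrt(total_variance)
    d1 = (math.log(forward / strike) + 0.5 * total_variance) / vol
    d2 = d1 - vol
    return discount * principal * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


# Hypothetical inputs: constant gamma of 20% over a 1-year option life.
gamma, maturity = 0.20, 1.0
price = black_caplet(0.96, 1_000_000, 0.042, 0.04, gamma ** 2 * maturity)
```

Because the only volatility input is the integrated variance, any deterministic volatility function with the same integral gives the same price, which is what allows implied volatilities to be backed out of quoted caplets.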

For pricing exotic interest-rate derivatives,
interpolation on implied volatilities is often
necessary, though undesirable because the HJM
no-arbitrage condition cannot hold in most cases.
LIBOR serial options, for example, are not as
actively traded, so their prices are calculated based
on the LIBOR cap/floor market. In a serial option,
the maturity of the underlying forward rate
agreement differs from the maturity of the option
itself. Despite the availability of a closed-form
solution, the volatility input needed for
Black's formula turns out to be a partial integration
from time 0 to the option maturity, and this
information is not available from the cap/floor
market. Therefore, further heuristic treatment
is usually undertaken to connect the dots, in
which case the curve would behave in explicit
patterns and allow arbitrage.


DESIGNING YOUR NEXT 
MODEL 

No single model is perfect for all assets
in every market environment. The trade-offs
between convenience and accuracy are evaluated
by individual trading desks and quantitative
analysts, and are ultimately validated by the
market. Nonetheless, when presenting a new
model, three aspects are usually evaluated.

From a financial aspect, a model must be able
to price the underlying asset(s) and its derivatives
simultaneously. The markets for an asset
and its derivatives are congruent, and there is
no logic in pricing them separately, thereby risking
"model" arbitrage. For example, we construct
an interest-rate model for the LIBOR-swaps
curve in the real world and organically embed
it in the model to price LIBOR derivatives
such as LIBOR caps, floors, or even serial options
in a risk-neutral world. Another example
is an underlying bullet bond and its callable
counterpart. A callable bond is a bullet bond
with an issuer-long, embedded American-style
call option; however, the bullet bond price is
determined under the real-world measure and
the embedded option can be priced in the
risk-neutral world. Therefore, a good model should
be able to value a callable bond by valuing the
bullet bond and the embedded American-style
call option simultaneously.

From a mathematical standpoint, a model
must be able to exhibit equivalency under different
measures by explicitly characterizing the
market price of risk. This mathematical component
builds the bridge connecting the real
world and a risk-neutral world. A complete
financial market implies the existence of a unique
market price of risk; but we should emphasize
that whether a market is complete or not
does not depend on the existence of a set
of complete traded assets, but on the existence
of an entity that can make the market
if an arbitrage opportunity is revealed. Therefore,
modern financial markets create not only
hedging tools but an intangible equilibrium,
which validates the underlying mathematical
assumptions.

Finally, from a computational standpoint,
models/derivatives that require
Monte Carlo analysis must be simulated
efficiently by the same algorithm under different
measures. This issue is more important
in interest-rate modeling since there may be a
trade-off between satisfying the mathematical
requirements of a model and employing a
computational implementation. Finding a model
that satisfies both criteria is not trivial even
though the markets are assumed to be complete.
We specifically use the word "efficiently" to
indicate that a model can be simulated
by a recombining tree for American-style
options.

Dynamic term structure models represent
a highly developed field where finance,
mathematics, and computation come together.
As opposed to the case with static term structure
models, where the term structure appears
explicitly, for dynamic models the term
structure of interest rates is usually implicitly
embedded in models that represent
risk/value relative to current conditions for
lending and borrowing over the spectrum of
terms available in the market. Preclusion of
arbitrage is fundamental for these models. We
have shown two approaches to dynamic term
structure models, one depending on a
representation through the spot rate, the other
depending on a representation through implied
forward rates. In each case the relationship
between the objective and risk-neutral world
(measure) has been exploited to ensure coherence
between underlying asset prices and any
resulting derivative. Here, the value of the asset
and the derivative each depend on a
representation of the same determining condition of
interest rates.


KEY POINTS 

• Dynamic term structure models of interest
rates readily admit uncertainty in valuation/
risk analyses requiring a characterization of
future market scenarios.

• In building dynamic term structure models it
is important that equilibrium, in an arbitrage-free
sense, is represented and that variations
from the equilibrium may be represented in
an appropriate, probabilistic sense through a
choice of stochastic processes and probability
measures.

• Two approaches to explaining the future
course of interest rates are the short-rate
model and the evolution of forward rates.

• Common methods for analyzing
fixed-income/interest-rate instruments include
formulation through a risk-neutral measure or
by maintaining a real-world (objective)
probability measure. Each has its own merit.

• The market price of risk is the key link
between the risk-neutral and objective
probability measures.

Abstract: Models of the term structure of interest rates have become increasingly important in financial
modeling. However, the understanding of these models by practitioners has not always kept pace 
with the breadth of the application of these models. In particular, misinterpretation of the proper 
uses of a particular model can lead to significant errors. The confusion regarding these models has 
arisen because of the overuse and misuse of the term "arbitrage-free." 


In this entry, we attempt to clear up some of
the most commonly misconstrued aspects of
interest rate models: the choice between an
arbitrage-free or equilibrium model, and the
choice between risk-neutral or realistic
parameterizations of a model. These two dimensions
define four classes of model forms, each of
which has its own proper use.

Much of the confusion has arisen from
overuse and misuse of the term "arbitrage-free."
Virtually all finance practitioners believe
that market participants quickly take advantage
of any opportunities for risk-free arbitrage
among financial assets, so that these opportunities
do not exist for long; thus, the term
"arbitrage-free" sounds as if it would be a good
characteristic for any model to have. Simply
based on these positive connotations, it almost
seems hard to believe that anyone would not
want their model to be arbitrage-free. Briefly,
in the world of finance this expression has the
associations of motherhood and apple pie.

Unfortunately, this has led some users (and
even builders) of interest rate models to link
uncritically the expression "arbitrage-free" with
the adjective "good." One objective of this
entry is to show that arbitrage-free models
are not appropriate for all purposes. Further,
we show that just because a model uses the
arbitrage-free approach does not mean that it
is necessarily good, even for the purposes for
which arbitrage-free models are appropriately
used.

Another common confusion ensues from
implicitly equating the terms "arbitrage-free" and
"risk neutral." This arises partly from the fact
that, in the academic and practitioner literature,
there have been very few papers that have applied
the arbitrage-free technique to a model
that was not in risk-neutral form. We explain
the reason for this below. The natural result is
that the terms have sometimes been used
interchangeably. In addition, since quantitative risk
management is a relatively new concept to the
finance community, most well-known papers
have focused only on the application of interest
rate models to simple valuation and hedging
problems. These have not required either the
realistic or equilibrium approaches to modeling.
This lack of published work has led to a mistaken
belief that an arbitrage-free, risk-neutral
model is the only valid kind of term structure
model. In this entry, we intend to dispel that
notion.


CATEGORIZATION OF 
APPROACHES TO TERM 
STRUCTURE MODELING

Arbitrage-Free Modeling

Arbitrage-free models take certain market
prices as given and adjust model parameters
in order to fit the prices exactly. Despite being
called "term structure" models, they do not
in reality attempt to emulate the dynamics of
the term structure. Instead, they assume some
computationally convenient, but essentially
arbitrary, random process underlying the yield
curve, and then add time-dependent constants
to the drift (mean) and volatility (standard
deviation) of the process until all market prices are
matched. To achieve this exact fit, they require
at least one parameter for every market price
used as an input to the model.
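The "one parameter per input price" idea can be sketched with a deliberately simplified toy. It ignores the random part of the process entirely and fits only a piecewise-constant shift to the drift; the function names, the base rate, and the bond prices are hypothetical.

```python
import math


def fit_time_dependent_shifts(times, market_prices, base_rate):
    """Return one drift shift per input price so that the fit is exact.

    The toy model discounts at base_rate + shift_i on the interval (t_{i-1}, t_i].
    """
    shifts = []
    prev_t, prev_p = 0.0, 1.0
    for t, p in zip(times, market_prices):
        period_rate = -math.log(p / prev_p) / (t - prev_t)
        shifts.append(period_rate - base_rate)
        prev_t, prev_p = t, p
    return shifts


def model_prices(times, shifts, base_rate):
    """Discount bond prices implied by the shifted toy model."""
    prices, prev_t, log_p = [], 0.0, 0.0
    for t, shift in zip(times, shifts):
        log_p -= (base_rate + shift) * (t - prev_t)
        prices.append(math.exp(log_p))
        prev_t = t
    return prices


times = [1.0, 2.0, 3.0]
market = [0.97, 0.93, 0.90]  # hypothetical discount bond prices
shifts = fit_time_dependent_shifts(times, market, base_rate=0.03)
refit = model_prices(times, shifts, base_rate=0.03)
```

The refitted prices reproduce the inputs exactly whatever base rate is chosen, which illustrates the point of the passage: the exact fit says nothing about whether the assumed process resembles the actual behavior of rates.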

For valuation, it is possible to produce
reasonable current prices for many assets without
having a realistic term structure model, by
using arbitrage-free models for interpolation
among existing prices. To this end, the trading
models used by most dealers in the over-the-counter
derivatives market employ enormous
numbers of time-dependent parameters. These
achieve an exact fit to prices of assets in particular
classes, without regard to any differences
between the behaviors of the models and the
actual behavior of the term structure over time.
Placed in terms of a physical analogy, the
distinction here is between creating a robot based
on a photograph of an animal, and creating a
robot based on multiple observations of the
animal through time. While the robot produced
using only the photograph may look like the
animal, only the robot built based on behavioral
observations will act like the animal. An
arbitrage-free model is like the former robot,
constructed with reference to only a single point
in time; that is, a snapshot of the fixed-income
marketplace.

Equilibrium Modeling 

In contrast to arbitrage-free models, equilibrium
term structure models are truly models
of the term structure process. Rather than
interpolating among prices at one particular point
in time, they attempt to capture the behaviors
of the term structure over time. An equilibrium
model employs a statistical approach, assuming
that market prices are observed with some
statistical error, so that the term structure must
be estimated, rather than taken as given.
Equilibrium models do not exactly match market
prices at the time of estimation, because they
use a small set of state variables (fundamental
components of the interest rate process) to
describe the term structure. Extant equilibrium
models do not contain time-dependent parameters;
instead they contain a small number of
statistically estimated constant parameters, drawn
from the historical time series of the yield
curve.
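As an illustration of "a small number of statistically estimated constant parameters," the sketch below fits a one-factor, mean-reverting short-rate process to a history of rates by ordinary least squares. The autoregressive form, the function names, and the data are assumptions made for the example; the text does not prescribe a particular equilibrium model or estimator.

```python
def estimate_ar1(rates):
    """Ordinary least squares fit of r[k+1] = a + b * r[k]."""
    x, y = rates[:-1], rates[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def implied_mean_reversion(a, b, dt):
    """Map (a, b) to a long-run level and a reversion speed (Euler reading)."""
    kappa = (1.0 - b) / dt
    theta = a / (1.0 - b)
    return theta, kappa


# Hypothetical, noise-free monthly history generated from a = 0.004, b = 0.9.
history = [0.02]
for _ in range(50):
    history.append(0.004 + 0.9 * history[-1])

a, b = estimate_ar1(history)
theta, kappa = implied_mean_reversion(a, b, dt=1.0 / 12.0)
```

With real data the fit would leave residuals, which is the statistical error the passage refers to; here the history is noise-free so that the recovered constants can be checked by hand.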

Risk-Neutral Probabilities:
The Derivative Pricing
Probability Measure

When we create a model for pricing interest rate
derivatives, the "underlying" is not the price of
a traded security, as it would be in a model for
equity options. Instead, we specify a random
process for the instantaneous, risk-free spot
interest rate, the rate payable on an investment
in default-free government bonds for a very
short period of time. For convenience, we call
this interest rate "the short rate." Financial
modelers have chosen to create models around the
short rate because it is the only truly riskless
interest rate in financial markets. An investment
in default-free bonds for any noninstantaneous
period of time carries market risk, the chance
that the short rate will rise during the term of
the investment, leading to a decline in the
investment's value.

As with any risky investment, an investor in
bonds subject to market risk expects to earn a
risk-free return (that is, the return from
continuously investing at the short rate, whatever that
may be) plus a risk premium, which could
increase or decrease as the term of the investment
increases. Thus, the spot rate for a particular
term is composed of the return expected under
the random process for the short rate up
to the end of that term, plus a term premium,
an additional return to compensate the investor
for the interest rate risk of the investment. The
term premium offered in the market depends
on the aggregate risk preference of market
participants, taking into account their natural
preferences for securities that conform to their
investment (term) needs.

Let r_t be the short rate at time t. Let D(t, T) be
the price, at time t, of a discount bond paying
one dollar at time T. Let s(t, T) be the spot rate
at time t for the term (T - t). Finally, let φ(T - t)
be the term premium (expressed as an annual
excess rate of return) required by investors for
a term of (T - t). All rates are continuously
compounded. We can then write

$$D(t, T) = e^{-s(t,T)\times(T-t)} = e^{-\phi(T-t)\times(T-t)}\; E_t\!\left[e^{-\int_t^T r_s\, ds}\right] \tag{1}$$


The second term in the two-term expression
above is a discount factor that reflects the
expected return from investing continuously at
the short rate for the term (T - t). The first term
is the additional discount factor that accounts
for the return premium that investors require to
compensate them for the market risk of investing
for a term of (T - t). The use of an integral
in the expression for the expected short rate
discount factor is necessary because the short
rate is continuously changing over the bond's
term.

From this description and formula, it may
seem necessary to know the term premium for
every possible term, in addition to knowing
the random process for the short rate, in order
to value a default-free discount bond. This
is not the case, however. As in the pricing of a
forward contract or option on a stock, we can
use the mathematical sleight-of-hand known as
risk-neutral valuation to find the relative value
of a security that is derivative of the short
rate.

The principle of risk-neutral valuation as it
applies to bonds and other interest rate derivatives
is that, regardless of how risk averse investors
are, we can identify a set of spot rates
that value discount bonds correctly relative to
the rest of the market. We do not have to identify
separately the term premium embedded in
each spot rate in order to use it to discount future
cash flows. This fact can be used to make
the valuation of all interest rate derivatives easier
by risk adjusting the term structure model;
that is, by changing the probability distribution
of the short rate so that the spot rate of every
term is, under the new model, equal to the expected
return from investing at the short rate
over the same term. This is accomplished by
redefining the model so that, instead of being a
random process for the short rate, it is a random
process for the short rate plus a function of the
term premium. If we specify the process for r*
in such a way that

$$r_s^{*} = r_s + \phi(s-t) + \phi'(s-t)\times(s-t) \tag{2}$$

at every future point in time s (accomplished by
adjusting the rate of increase of r_t upward), then
we can write

$$D(t, T) = e^{-s(t,T)\times(T-t)} = E_t\!\left[e^{-\int_t^T \left(r_s + \phi(T-t)\right) ds}\right] = E_t\!\left[e^{-\int_t^T r_s^{*}\, ds}\right] \tag{3}$$


By transforming the short rate process in this
manner, we have created a process for a random
variable which, when used to discount a certain
future cash flow, gives an expected present
value equal to the present value obtained by
discounting that cash flow at the appropriate
spot rate. It is important to note that this random
variable is no longer the short rate, but
something artificial that we might refer to as
the risk-adjusted short rate.¹
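The step from equation (2) to equation (3) rests on a single integration, written out here as a check. It assumes that the term premium φ is differentiable in the term, which the appearance of φ′ in equation (2) takes for granted; E_t denotes the expectation conditional on information at time t.

```latex
\int_t^T r_s^{*}\,ds
  = \int_t^T r_s\,ds + \int_t^T \frac{d}{ds}\Bigl[(s-t)\,\phi(s-t)\Bigr]\,ds
  = \int_t^T r_s\,ds + (T-t)\,\phi(T-t),
\qquad\text{so}\qquad
E_t\!\left[e^{-\int_t^T r_s^{*}\,ds}\right]
  = e^{-\phi(T-t)(T-t)}\,E_t\!\left[e^{-\int_t^T r_s\,ds}\right]
  = D(t,T).
```

The middle equality uses the product rule, d/ds[(s - t)φ(s - t)] = φ(s - t) + φ′(s - t)(s - t), and the last equality is the two-factor decomposition of the bond price in equation (1).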

The resulting risk-neutral model might be
construed as a model for the true behavior of the
short rate in an imaginary world of risk-neutral
market participants, where there is no extra
expected return to compensate investors for the
extra price risk in bonds of longer maturity. This
impression, while accurate, is not very informative.
The important aspect of the risk-neutral
model is that the term premiums, whatever
their values, that exist in the marketplace are
embedded in the interest rate process itself, so
that the expected discounted value of a cash
flow at the risk-adjusted short rate is equal to
the discounted value of the cash flow at the spot
rate.²

The value of the risk-neutral probability measure
is that, under this parameterization, an
interest-sensitive instrument's price can be estimated by
averaging the present values of its cash flows,
discounted at the short-term interest rates along
each path of the short rate under which those
cash flows occur. In contrast, valuing assets under
the model before it was risk adjusted would
require a more complicated discounting procedure
that applied additional discount factors to
the short rate paths to compensate for market
risk; however, the price obtained under both
approaches would be the same. For this reason,
we use randomly generated scenarios from
risk-neutral interest rate models for pricing.
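The averaging procedure described above can be sketched as follows. The mean-reverting form of the risk-adjusted short rate, the Euler time stepping, and all names and parameter values are assumptions made for the illustration; the text does not commit to a particular process.

```python
import math
import random


def simulate_short_rate_path(r0, kappa, theta, sigma, dt, n_steps, rng):
    """Euler path of a mean-reverting risk-adjusted short rate (assumed form)."""
    path, r = [], r0
    for _ in range(n_steps):
        path.append(r)
        r += kappa * (theta - r) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return path


def price_by_simulation(cash_flow_fn, paths, dt):
    """Average, over paths, the cash flows discounted along each path.

    cash_flow_fn(path) returns one cash flow per step, paid at the end of
    that step, so the cash flows may depend on the rates along the path.
    """
    total = 0.0
    for path in paths:
        discount, pv = 1.0, 0.0
        for rate, cash in zip(path, cash_flow_fn(path)):
            discount *= math.exp(-rate * dt)
            pv += cash * discount
        total += pv
    return total / len(paths)


def zero_coupon(path):
    """One unit paid at the end of the last step, nothing before."""
    return [0.0] * (len(path) - 1) + [1.0]


rng = random.Random(1)
dt, n_steps = 0.25, 8
paths = [simulate_short_rate_path(0.03, 0.5, 0.04, 0.01, dt, n_steps, rng)
         for _ in range(500)]
bond_price = price_by_simulation(zero_coupon, paths, dt)
```

Passing the cash flows as a function of the path is a design choice that leaves room for path-dependent instruments; for the zero-coupon bond used here the function simply ignores the rates.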

To sum up, there is nothing magical about risk
neutrality. There are any number of changes of
variables we could make to a short rate process
that would retain the structure of the model, but
have a different (but equivalent) probability
distribution for the new variable. We could change
the measure to represent imaginary worlds in
which market participants were risk seeking
(negative term premiums), or more risk averse
than in the real world; regardless, as long as we
structured the discounting procedure properly
we would always determine the same model
price for an interest rate derivative. The specific
change of variables that produces a risk-neutral
model simply makes the algebra easier than the
others, because one can ignore risk preferences.


Realistic Probabilities:
The Estimated Market
Probability Measure

We have described why risk-neutral interest
rate scenarios are preferred for pricing bonds
and interest rate derivatives. However, it is
important to note that risk-neutral scenarios
are not appropriate for all purposes. For example,
for scenario-based evaluation of portfolio
strategies, realistic simulation is needed.
A computerized system for stress testing
asset/liability strategies under adverse movements
in interest rates is to actuaries what a
wind tunnel is to aerospace engineers. The
relevance of the information provided by the testing
depends completely on the realism of the
simulated environment. Stated differently, the
test environment must be like the real environment;
if not, the test results are not useful.

The realistic term structure process desired
for this kind of stress testing must be distinguished
from the risk-neutral term structure
process used for pricing. The risk-neutral process
generates scenarios in which all term premiums
are zero. This process lacks realism; in
the real world, term premiums are clearly not
zero, as evidenced by the fact that the implied
spot curve from Treasuries has been predominantly
upward sloping. This predominantly
upward slope reflects an expected return premium
for bonds of longer maturity, although
at times other configurations of buyer preferences
can be inferred; for example, an inverted
curve suggests that buyers demand an increasing
premium for decreasing the term of their
positions.

Thus, the user of an interest rate model must
be careful. When generating scenarios for reserve
adequacy testing, where the purpose is
to examine the effect on a company's balance
sheet of changes in the real (risk-averse) world,
the user must not use the scenarios from a
risk-neutral interest rate model.


WHEN DO I USE EACH OF 
THE MODELING 
APPROACHES? 

The two dimensions, risk-neutral versus realistic
and arbitrage-free versus equilibrium, define
four classes of modeling approaches. Each has
its appropriate use.

Risk-Neutral and Arbitrage-Free 
Model 

The risk-neutral and arbitrage-free model is the 
most familiar form of an interest rate model for 
most analysts. The model has been risk adjusted 
to use for pricing interest rate derivatives, and 
its parameters have been interpolated from a 
set of current market prices rather than being 
statistically estimated from historical data. It is 
appropriately used for current pricing when the 
set of market prices is complete and reliable. 

It is worth noting that, just because two models
are each both risk neutral and arbitrage-free,
we cannot conclude that they will give the
same price for a particular interest rate derivative.
Two arbitrage-free models will produce the
same prices only for the instruments in a subset
common to both sets of input data. The form
of the model, and particularly the number of
random factors underlying the term structure
process, can make a large difference to valuations
of the other instruments.

When the market data are sparse, the behavior
of the model becomes important. For example,
the value of a Bermudan or American swaption
depends on the correlations among rates
of different maturities. The swaption market is
not liquid, nor are its prices widely disseminated,
so there is no way to estimate a "term
structure of correlations" that would allow a
simple arbitrage-free model to interpolate
reasonable swaption prices. In this case, a multifactor
model that captures the nature of correlations
among rates of different maturities,
including the way that those correlations are
influenced by the shape of the term structure,
will perform better for pricing swaptions than
will a one-factor model. Models with good
statistical fit to historical correlation series are
needed for Bermudan or American options on
floating-rate notes, caps, and floors for the same
reason. Model behavior is also important for
long-dated caps and floors, where there is a lack
of reliable data for estimating the "term structure
of volatilities" beyond the 5-year tenor.

Risk-Neutral and Equilibrium 

There are a number of sources of "error" in
quotations of the market prices of bonds, so
that the discount rates that exactly match a set
of price quotations may contain bond-specific
effects, corrupting the pricing of other instruments.
These sources, defined as any effects
on a bond's market price apart from the discount
rates applying to all market instruments,
include differences in liquidity, differential tax
effects, bid-ask spreads (the bid-ask spread defines
a range of possible market prices, implying
a range of possible discount rates), quotation
stickiness, timeliness of data, the human
element of the data collection and reporting
process, and market imperfections.

Since arbitrage-free models accept all input
prices as given, without reference to their
reasonableness or comparability to other prices in the
input data, they impound in the pricing model
any bond-specific effects. In contrast, equilibrium
models capture the global behavior of the
term structure over time, so security-specific effects
are treated in the appropriate way, as noise.
For this reason, risk-neutral equilibrium models
can have an advantage over arbitrage-free models
in that equilibrium models are not overly
sensitive to outliers. Also, for current pricing (as
distinguished from horizon pricing, described
below), equilibrium models can be estimated
from historical data when current market prices
are sparse. Thus, a risk-neutral and equilibrium
model can be used for pricing when the current
market prices are unreliable or unavailable.

For most standard instruments, circumstances
rarely prevail such that the current market
prices needed for estimating an arbitrage-free
model are not available. However, such
circumstances always prevail for horizon pricing,
where the analyst calculates a price for an
instrument in some assumed future state of the
market. Since arbitrage-free models require a
full set of market prices as input, arbitrage-free
models are useless for horizon pricing, the future
prices being unknown. Thus, the horizon
prices obtained under the different values of
the state variables in an equilibrium model provide
an analytical capability that arbitrage-free
models lack.
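A small sketch may help make horizon pricing concrete. It assumes, purely for illustration, a one-factor Vasicek-type equilibrium model whose only state variable is the risk-adjusted short rate, and uses that model's closed-form discount bond price; the parameter values and the horizon states are hypothetical.

```python
import math


def vasicek_bond_price(short_rate, term, kappa, theta, sigma):
    """Closed-form discount bond price in a Vasicek-type model (assumed model)."""
    b = (1.0 - math.exp(-kappa * term)) / kappa
    log_a = ((theta - sigma ** 2 / (2.0 * kappa ** 2)) * (b - term)
             - sigma ** 2 * b ** 2 / (4.0 * kappa))
    return math.exp(log_a - b * short_rate)


# Constant parameters, as would come from statistical estimation (hypothetical).
params = dict(kappa=0.5, theta=0.04, sigma=0.01)

# Assumed values of the state variable at the horizon date.
horizon_states = [0.02, 0.04, 0.06]
horizon_prices = {r: vasicek_bond_price(r, 5.0, **params) for r in horizon_states}
```

Only the state variable changes from one horizon scenario to the next; no future market prices are needed as inputs, which is the capability the passage attributes to equilibrium models.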

USING MODELS OF 
BORROWER BEHAVIOR 
WITH A RISK-NEUTRAL 
INTEREST RATE MODEL 

Often, an interest rate model is not enough to
determine the value of a fixed-income security
or interest rate derivative. To value mortgage-backed
securities or collateralized mortgage
obligations (CMOs), one also needs a prepayment
model. To value bonds or interest rate
derivatives with significant credit risk, one
needs a model of default and recovery. To value
interest-sensitive annuities and insurance liabilities,
one needs models of lapse and other
policyholder behaviors. In all of these behavioral
models, the levels of certain interest rates are
important explanatory variates, meaning that,
for example, the prepayment speeds in a CMO
valuation system are driven primarily by the
interest rate scenarios.

Common practice has been to estimate parameters
for prepayment, default, and lapse
models using regression on historical data
about interest rates and other variables. Then,
in the valuation process, the analyst uses the interest
rates from a set of risk-neutral scenarios
to derive estimates for the rates of prepayment,
default, or lapse along those scenarios. This borrower
behavior information is combined with
the interest rates to produce cash flows and,
ultimately, prices. Unfortunately, this practice
leads to highly misleading results.

The primary problem here is that the regressions
have been estimated using historical data,
reflecting the real probability distributions of
borrower behavior, and then used with scenarios
from a risk-neutral model, with an artificial
probability distribution. The risk-neutral model
is not a process for the short rate; rather, it is a
process for the risk-adjusted short rate. Since
the real world is risk averse, the risk-adjusted
short rate usually has an expected value much
higher than the market's forecast of the short
rate; the extra premium for interest rate risk permits
one to value optionable default-free bonds
by reference to the forward rate curve.

The same procedure can be applied to corporate
bonds. Corporate bonds are exposed to
default risk in addition to interest rate risk. One
may construct a behavioral model of failure
to pay based on historical data about default
rates and recovery, perhaps using bond ratings
as explanatory variates in addition to interest
rates. One can then attempt to compute the
present value of a corporate bond by finding the
expected value of the discounted cash flows
from the two models in combination: a
risk-neutral model of the Treasury curve, and a
realistic model of default behavior as a function
of interest rates and other variables. Because
the cash flows of the bond, adjusted for default,
will be less than the cash flows for a default-free
bond, the model will price the corporate bond
at a positive spread over the Treasury curve.

This spread will almost certainly be substantially
too low in comparison to the spread implied
by the corporate's market price. The reason for this is
that, just as investors demand a return premium
for interest rate risk, they demand an
additional return for default risk. The application
of an econometrically estimated model
of default to pricing has ignored the default
risk premium encapsulated in the prices of corporate
bonds. Market practice has evolved a
simple solution to this; one adjusts the default
model to fit (statistically, in the equilibrium
case; exactly, in the arbitrage-free case) the current
prices of active corporates in the appropriate
rating class. By using the market prices
of active corporates to embed the default risk
premium in the model, the analyst is really applying
the principle of risk-neutral valuation to
the default rate. The combined model of
risk-adjusted interest rates and risk-adjusted default
rates now discounts using the corporate bond
spot rate curve instead of the Treasury spot
curve.
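The adjustment described above can be shown with a stylized calculation. The sketch assumes a zero-coupon corporate bond with zero recovery, a flat risk-adjusted interest rate, and a constant default intensity; the historical default rate and the market quote are hypothetical numbers chosen for the example.

```python
import math


def model_price(risk_free_rate, hazard_rate, maturity):
    """Zero-coupon bond with zero recovery and a constant default intensity."""
    return math.exp(-(risk_free_rate + hazard_rate) * maturity)


def risk_adjusted_hazard(market_price, risk_free_rate, maturity):
    """Default intensity that makes the model reproduce the market price."""
    return -math.log(market_price) / maturity - risk_free_rate


r, T = 0.04, 5.0
historical_hazard = 0.010   # hypothetical estimate from default history
market_price = 0.74         # hypothetical quote for an active corporate

price_with_history = model_price(r, historical_hazard, T)
fitted_hazard = risk_adjusted_hazard(market_price, r, T)
```

With these inputs the historically estimated default rate prices the bond above the market quote (too small a spread), and the fitted, risk-adjusted default rate comes out higher than the historical one; the difference is the default risk premium embedded in the market price.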

The same technique of risk neutralizing a
model by embedding information about risk
premiums derived from current market prices
can be applied to prepayment models as well.
The results of a prepayment model can be
risk adjusted by examining the prices of active
mortgage-backed securities. Unfortunately, one
can only guess at the appropriate expected return
premium for insurance policy lapse risk
or mortality risk. Nevertheless, these quantities
should be used to "risk neutralize" these
models of behavior to the extent practical. The
integrity of risk-neutral valuation depends on
risk adjusting all variables modeled; otherwise,
model prices will be consistently overstated.

A final note can be made in this regard about
option-adjusted spread (OAS). OAS can be understood
in this context as a crude method to
risk adjust the pricing system to reflect all risk
factors not explicitly modeled.
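Read this way, the OAS is the single constant spread that, when added to every discount rate along every scenario, makes the average model price equal to the market price. The sketch below solves for it by bisection; the scenario paths, cash flows, and function names are hypothetical, and real systems would generate both from interest rate and behavioral models.

```python
import math


def average_pv(paths, cash_flows, dt, spread):
    """Mean present value across scenarios, discounting at rate + spread."""
    total = 0.0
    for path, flows in zip(paths, cash_flows):
        discount, pv = 1.0, 0.0
        for rate, cash in zip(path, flows):
            discount *= math.exp(-(rate + spread) * dt)
            pv += cash * discount
        total += pv
    return total / len(paths)


def solve_oas(paths, cash_flows, dt, market_price, lo=-0.5, hi=0.5, tol=1e-12):
    """Bisection; the model price falls as the spread rises (nonnegative flows)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if average_pv(paths, cash_flows, dt, mid) > market_price:
            lo = mid   # model price too high: the spread must increase
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Two hypothetical annual-step scenarios with scenario-specific cash flows.
paths = [[0.03, 0.03], [0.03, 0.05]]
flows = [[5.0, 105.0], [5.0, 105.0]]
target = average_pv(paths, flows, 1.0, 0.01)   # price consistent with a 1% spread
oas = solve_oas(paths, flows, 1.0, target)
```

Because one number absorbs every unmodeled risk factor at once, the result cannot say which factor the spread is compensating, which is why the text calls the method crude.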

Realistic and Arbitrage-Free 

A realistic, arbitrage-free model starts by exactly
matching the term structure of interest rates
implied by a set of market prices on an initial
date, then evolves that curve into the future
according to the realistic probability measure.
This form of a model is useful for producing
scenarios for evaluation of hedges or portfolio
strategies, where it is important that the initial
curve in each scenario exactly matches current
market prices. The difficulty with such an approach
lies in the estimation; realistic, arbitrage-free
models are affected by confounding, where
it is impossible to discriminate between model
misspecification error and the term premiums.
Since the model parameters have been set to
match market prices exactly, without regard to
historical behavior, too few degrees of freedom
remain to estimate both the term premiums and
an error term. Unless the model perfectly describes
the true term structure process (that is,
the time-dependent parameters make the residual
pricing error zero at all past and future
dates, not just on the date of estimation), the
term premiums cannot be determined. The result
is that realistic, arbitrage-free models are
not of practical use.

Realistic and Equilibrium 

Since the arbitrage-free form of a realistic model 
is not available, the equilibrium form must be 
used for stress testing, Value-at-Risk (VAR) cal¬ 
culations, reserve and asset adequacy testing, 
and other uses of realistic scenarios. 

Some analysts express concern that, because the predicted initial curve under the equilibrium model does not perfectly match observed market prices, the results of scenario testing will be invalid. However, the use of an equilibrium form does not require that the predictions be used instead of the current market prices as the first point in a scenario. The scenarios can contain the observed curve at the initial date and the conditional predictions at future dates. This does not introduce inconsistency, because the equilibrium model is a statistical model of term structure behavior; by taking this approach we explicitly recognize that its predictions will deviate from observed values by some error. In contrast, the use of an arbitrage-free, realistic model implicitly assumes that the model used for the term structure process is absolutely correct.

Table 1 When to Use Each of the Model Types

Model Classification | Risk Neutral | Realistic
Arbitrage-free | • Current pricing, where input data (market prices) are reliable | • Unusable, since term premium cannot be reliably estimated
Equilibrium | • Current pricing, where inputs (market prices) are unreliable or unavailable • Horizon pricing | • Stress testing • Reserve and asset adequacy testing

Table 2 Four Forms of the Black-Karasinski Model

Model Classification | Risk Neutral | Realistic
Arbitrage-free | $du = \kappa(t)(\theta(t) - u)\,dt + \sigma(t)\,dz$ • $u_0$ and $\theta(t)$ matched to bond prices • $\kappa(t)$ and $\sigma(t)$ matched to cap or option prices | $du = \kappa(t)(\theta(t) - \lambda(u,t) - u)\,dt + \sigma(t)\,dz$ • $u_0$ and $\theta(t)$ matched to bond prices • $\kappa(t)$ and $\sigma(t)$ matched to cap or option prices • $\lambda(u,t)$ cannot be reliably estimated
Equilibrium | $du = \kappa(\theta - u)\,dt + \sigma\,dz$ • $u_0$ statistically fit to bond prices • $\kappa$, $\theta$, $\sigma$ historically estimated | $du = \kappa(\theta - \lambda(u) - u)\,dt + \sigma\,dz$ • $u_0$ statistically fit to bond prices • $\kappa$, $\theta$, $\sigma$, $\lambda(u)$ historically estimated

Summary of the Four 
Essential Classes 

Table 1 summarizes the uses of the four Es¬ 
sential Classes of interest rate models. Table 2 
shows the mathematical form of a commonly 
used interest rate model, disseminated by Black 
and Karasinski (1991), under each of the mod¬ 
eling approaches and probability measures. In 
each equation, u is the natural logarithm of the 
short rate. 

In the above models, $\sigma$ is the instantaneous volatility of the short rate process, $\kappa$ is the rate of mean reversion, $\theta$ is the mean level to which the natural logarithm of the short rate is reverting, and $\lambda$ represents the term premium demanded by the market for holding bonds of longer maturity. The value of the state variable $u$ at the time of estimation is represented by $u_0$.

The realistic model forms can be distinguished from the risk-neutral forms by the presence of the term premium function $\lambda$. The difference between the arbitrage-free forms and the equilibrium forms can be discerned in that the parameters of the arbitrage-free forms are functions of time.
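The difference between the risk-neutral and realistic equilibrium forms in Table 2 can be illustrated with a short sketch of their drift terms. This is only an illustration: the function name `bk_equilibrium_drift`, the parameter values, and the simplifying assumption of a constant term premium are ours and do not come from the text.

```python
def bk_equilibrium_drift(u, kappa, theta, term_premium=0.0):
    """Drift of u = ln(short rate) in the equilibrium forms of Table 2.

    term_premium = 0 gives the risk-neutral drift kappa * (theta - u).
    A nonzero value gives the realistic drift kappa * (theta - lambda - u).
    Treating lambda as a constant is an assumption made for illustration.
    """
    return kappa * (theta - term_premium - u)


# Hypothetical parameter values, chosen only to show the mechanics.
kappa, theta, lam = 0.2, -3.0, 0.1
u = -3.5

risk_neutral = bk_equilibrium_drift(u, kappa, theta)
realistic = bk_equilibrium_drift(u, kappa, theta, term_premium=lam)
print(risk_neutral, realistic)
```

Under these assumptions the two drifts differ only by the shift `kappa * lam`, so the level at which the drift vanishes moves from `theta` to `theta - lam`.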

KEY POINTS 

• Models of the term structure of interest rates 
are important in financial modeling. 

• The most commonly misconstrued aspects of 
interest rate models are important to under¬ 
stand to make the correct choice between an 
arbitrage-free or equilibrium model, and the 
correct choice between risk-neutral or realis¬ 
tic parameterizations of a model. 

• A common confusion is the result of implicitly 
equating the terms "arbitrage-free" and "risk 
neutral." 

• Arbitrage-free models take certain market prices as given and adjust model parameters in order to fit the prices exactly.

• Equilibrium term structure models are truly models of the term structure process because, rather than interpolating among prices at one particular point in time, they attempt to capture the behaviors of the term structure over time.

• The principle of risk-neutral valuation as it applies to bonds and other interest rate derivatives is that, regardless of how risk averse investors are, a set of spot rates that value discount bonds correctly relative to the rest of the market can be identified.

• The two dimensions, risk-neutral versus realistic and arbitrage-free versus equilibrium, define four classes of modeling approaches.

• The risk-neutral and arbitrage-free model is appropriately used for current pricing when the set of market prices is complete and reliable.

• Because equilibrium models capture the global behavior of the term structure over time and treat security-specific effects as noise, a risk-neutral and equilibrium model can be used for pricing when the current market prices are unreliable or unavailable.

• For several reasons, realistic, arbitrage-free models are not of practical use.

NOTES 

1. This is not the way that risk neutrality is 
usually presented. Typically, writers have 
focused on the stochastic calculus, using 
Girsanov's theorem to justify a change of 


probability measure to an equivalent (i.e., an 
event has zero probability under one mea¬ 
sure if and only if it has zero probability 
under the other measure) martingale mea¬ 
sure. This complexity and terminology can 
obscure the simple intuition that we are mak¬ 
ing a change of variables in order to restate 
the problem in a more easily solvable form. 
For this approach to explaining risk neutral 
valuation, see Courtadon (1982) or Harrison 
and Pliska (1981). 

2. Note that this is not the same as the expectations hypothesis of the term structure, which holds that the term structure's shape is determined solely by the market's expectations about future rates. The expectations hypothesis is a theory of the real term structure process, whereas the risk-neutral approach is an analytical convenience that takes no position about the truth or falsity of any term structure theory.

Abstract: Interest rates are commonly modeled using stochastic differential equations. One-factor models use a stochastic differential equation to represent the short rate and two-factor models use a stochastic differential equation for both the short rate and the long rate. The stochastic differential equations used to model interest rates must capture some of the market properties of interest rates such as mean reversion and/or a volatility that depends on the level of interest rates. There are two distinct approaches used to implement the stochastic differential equations into a term structure model: equilibrium and no arbitrage.


In modeling the behavior of interest rates, 
stochastic differential equations (SDEs) are com¬ 
monly used. The SDEs used to model interest 
rates must capture some of the market prop¬ 
erties of interest rates such as mean reversion 
and/or a volatility that depends on the level of 
interest rates. For a one-factor model, the SDE 
is used to model the behavior of the short¬ 
term rate, referred to simply as the "short rate." 
The addition of another factor (i.e., a two-factor 
model) involves extending the SDE to represent 
the behavior of the short rate and a long-term 
rate (i.e., long rate). 

There are two distinct approaches used to im¬ 
plement the SDEs into a term structure model: 


equilibrium and no arbitrage. Each can be used 
to value bonds and interest rate contingent 
claims. Both approaches start with the same 
SDEs but apply the SDE under a different 
framework to price securities. 

Equilibrium models such as those developed by Vasicek (1977), Cox, Ingersoll, and Ross (1985), Longstaff (1989, 1992), Longstaff and Schwartz (1992), and Brennan and Schwartz (1979, 1982) all start with an SDE model and develop pricing mechanisms for bonds under an equilibrium framework. The actual implementation may vary depending on the model. Vasicek and Cox, Ingersoll, and Ross (CIR) develop analytic pricing expressions while Backus, Foresi, and Telmer (2001) present econometric and recursive approaches to implement the equilibrium models. Brennan and Schwartz use a finite difference scheme that approximates a partial differential equation.

No arbitrage models such as Black and Karasinski (1991), Black, Derman, and Toy (1990), Ho and Lee (1986), Heath, Jarrow, and Morton (1992), and Hull and White (1990, 1993) begin with the same or similar SDE models as the equilibrium approach but use market prices to generate an interest rate lattice. The lattice represents the short rate in such a way as to ensure there is a no arbitrage relationship between the market and the model. The numerical approach used to generate the lattice will depend on the SDE model(s) being used to represent interest rates.

No arbitrage models are the preferred frame¬ 
work to value interest rate derivatives. This is 
because they minimally ensure that the market 
prices for bonds are exact. Equilibrium models 
will not price bonds exactly, and this can have 
tremendous effects on the corresponding con¬ 
tingent claims. No arbitrage lattices also allow 
for a systematic valuation approach to almost 
all interest rate securities. 

Three general SDE functional forms are con¬ 
sidered in this entry. The first is the Hull-White 
(HW) model. The HW model is a more general 
version of the Ho and Lee (HL) 1 approach except 
that it allows for mean reversion. Implementing 
the HW in a binomial framework removes a de¬ 
gree of freedom, and in this case the HW model 
collapses to the HL model if a constant time 
step is retained. The second model we consider 
is the Black-Karasinski (BK) model. The BK model 
is a more general form of the Kalotay, Williams, 
and Fabozzi (KWF) model. 2 The BK model (like 
the HW model) in the binomial setting does not 
have enough degrees of freedom to be properly 
modeled and so the time step must be allowed 
to vary. The third is the Black, Derman, and 
Toy model. 

We implement the HW and BK trinomial 
models using the Hull and White approach. 


Within the trinomial setting the time step re¬ 
mains constant and mean reversion can be ex¬ 
plicitly incorporated. We discuss the SDEs, the 
properties of the SDEs, the numerical solutions 
to the SDEs, and the binomial and trinomial in¬ 
terest rate lattices for these models. 

The focus of our presentation is on the end 
user and developer of interest rate models. 
We will highlight some significant differences 
across models. Most of these are due to the dif¬ 
ferent distributions that underlie the models. 
This is done to emphasize the need to calibrate 
all models to the market prior to their use. By 
calibrating the models to the market we reduce 
the effects of the distributional differences and 
ensure a higher level of consistency in the met¬ 
rics produced by the models. 

The outline of this entry is as follows. In the 
next section we present the SDEs and some of 
their mathematical properties. We also use the 
mathematics to highlight properties of the short 
rate. We then develop the methodology used 
to implement our approach in both the bino¬ 
mial and trinomial frameworks. A comparison 
of some numerical results across the different 
models including some interest rate risk and 
valuation metrics is then presented. 


THE GENERAL MODELS FOR 
THE SHORT RATE 

The models considered in this entry take the 
form of the following one-factor SDE: 

$$df(r(t)) = [\theta(t) + \rho(t)\,g(r(t))]\,dt + \sigma(r(t), t)\,dz \tag{1}$$

where $f$ and $g$ are suitably chosen functions, $\theta$ is determined by the market, and $\rho$ can be chosen by the user of the model or dictated by the market. We will show that $\theta$ is the drift of the short rate and $\rho$ is the tendency to an equilibrium short rate. The term $\sigma$ is the local volatility of the short rate. The term $dz = \varepsilon\sqrt{dt}$ arises from a normally distributed Wiener process, since $\varepsilon \sim N(0,1)$, where $N(0,1)$ is the normal distribution with mean 0 and standard deviation of 1. This means that the term $\sigma(r(t), t)\,dz$ has an average or expected value of 0.

Equation (1) has two components. The first 
component is the expected or average change 
in rates over a small period of time, dt. This is 
the component where certain characteristics of 
interest rates, such as mean reversion, are in¬ 
corporated. The second component is the un¬ 
known or the risk term since it contains the 
random term. This term dictates the distribu¬ 
tion characteristics of interest rates. Depending 
on the model, interest rates are either normally 
or lognormally distributed. 

The Ho-Lee Model 

In the HL model or process $f(r) = r$, $g(r) = 0$, and $\rho = 0$ in equation (1). The HL process is, therefore, given by

$$dr = \theta\,dt + \sigma\,dz \tag{2}$$

Since $z$ is a normally distributed Wiener process, we say the HL process is a normal process for the short rate. The solution to equation (2), assuming $r(0) = r_0$, is given by

$$r(t) = r_0 + \int_0^t \theta(s)\,ds + \int_0^t \sigma\,dz \tag{3a}$$

where the integral involving $\sigma$ is a stochastic integral. If $\theta$ is constant this can be expressed as

$$r(t) = r_0 + \theta t + \int_0^t \sigma\,dz \tag{3b}$$

Equation (3b) shows that the HL process models an interest rate that can change proportionally with time $t$ through the constant of proportionality, $\theta$, and a random disturbance determined by $\sigma$. That is, the larger $\theta$ is in magnitude, the larger the average change in the short rate over time. This is why $\theta$ is called the "drift in the short rate." Also, the smaller $\theta$ is, the larger the influence of the random disturbance. The short rate can be negative in the HL process. This is a shortcoming of the model. Hull (2000) shows that $\theta$ is related to the slope of the term structure.

To obtain a numerical approximation for equation (2) we use equations (3a) and (3b). Letting $t_k = k\tau$ and $r_k \approx r(k\tau)$ gives

$$r_{k+1} - r_k = \theta_k\tau + \sigma_k\,\Delta z_k$$

or

$$r_{k+1} = r_k + \theta_k\tau + \sigma_k\,\Delta z_k \tag{4}$$

where $\Delta z_k$ is a numerical (discrete) approximation to $dz$. Since $dz = \varepsilon\sqrt{dt}$, we can further approximate equation (4) by

$$r_{k+1} = r_k + \theta_k\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{5}$$

where $\varepsilon_k$ is a random number drawn from a normal distribution $N(0,1)$. Equation (5) is the form of the expression that is used for $r_{k+1}$ to build the HL binomial tree.
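The recursion in equation (5) can be iterated directly to produce one sample path of the HL short rate. The sketch below assumes constant drift and volatility for simplicity; the function name `simulate_ho_lee` and the parameter values are illustrative choices, not part of the source.

```python
import math
import random


def simulate_ho_lee(r0, theta, sigma, tau, n_steps, rng):
    """Iterate r_{k+1} = r_k + theta*tau + sigma*eps_k*sqrt(tau), as in equation (5).

    theta and sigma are held constant here, which is a simplifying assumption.
    """
    path = [r0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # eps_k drawn from N(0, 1)
        path.append(path[-1] + theta * tau + sigma * eps * math.sqrt(tau))
    return path


# Hypothetical inputs: quarterly steps over ten years.
rng = random.Random(0)
path = simulate_ho_lee(r0=0.05, theta=0.002, sigma=0.01, tau=0.25, n_steps=40, rng=rng)
print(len(path), min(path), max(path))
```

Nothing in the recursion prevents the simulated rate from falling below zero, which is the shortcoming of the normal process noted above.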

We first consider the solution to equation (5) without the stochastic term when $\theta$ is constant. Equation (5) under these requirements is

$$r_{k+1} = r_k + \tau\theta \tag{6a}$$

and the solution is given by

$$r_k = c + k\delta \tag{6b}$$

where $c$ and $\delta$ are constants. In particular, $c = r_0$ and $\delta = \theta\tau$. It is seen from this last equation that the mean short rate in the HL process increases or decreases at a constant rate $\theta$ over time depending on the sign of $\theta$. As a matter of fact, equation (6b) shows that the short rate grows without bound if $\theta > 0$ and decreases without bound (i.e., becomes very negative) if $\theta < 0$.

The Hull-White Model 

In the HW model or process $f(r) = r$, $g(r) = r$, and $\rho = -\phi$. Therefore, the stochastic process for the HW model for the short rate is

$$dr = (\theta - \phi r)\,dt + \sigma\,dz \tag{7}$$

The short rate process in the HW model is seen to be normal as in the HL process. We consider the case where the parameters $\theta$ and $\phi$ are constant over time. Note that if $\phi = 0$ the HW process reduces to the HL process. (The HW process will, therefore, be similar to the HL process if $\phi$ is close to 0.) We will see that the introduction of $\phi$ in the HW model is an attempt to incorporate mean reversion and to correct for the uncontrolled growth (or decline) in the HL model discussed earlier.

Eliminating the stochastic term in equation (7) gives the ordinary differential equation

$$dr = (\theta - \phi r)\,dt \tag{8}$$

whose solution is given by

$$r(t) = \frac{\theta}{\phi} + c\,e^{-\phi t} \tag{9}$$

where

$$c = r_0 - \frac{\theta}{\phi} \tag{10}$$

If $\phi > 0$ we see from equation (9) that

$$\lim_{t \to \infty} r(t) = \frac{\theta}{\phi} = \mu$$

Therefore, for positive mean reversion ($\phi > 0$) the HW process will converge to the short rate $\mu$. Due to this, the term $\mu$ is called the "target" or "long run mean rate." For negative mean reversion ($\phi < 0$), the short rate grows exponentially over time.

Factoring $\phi$ in equation (7) leads to

$$dr = \phi(\mu - r)\,dt + \sigma\,dz$$

and eliminating the stochastic term leads to

$$dr = \phi(\mu - r)\,dt$$

We see that if $r > \mu$ then $dr$ is negative and $r$ will decrease and if $r < \mu$ then $dr$ is positive and $r$ will increase. That is, $r$ will approach the target rate $\mu$. The larger $\phi$ is, the faster this approach to the target rate $\mu$. This is why $\phi$ is called the "mean reversion" or "mean reversion rate." It regulates how fast the target rate is reached. However, it does not eliminate the negative rates that can occur in the HL process.


Since the target rate $\mu$ is equal to $\theta/\phi$, we can solve for the drift, $\theta$, or the mean reversion, $\phi$. That is,

$$\theta = \phi\mu \tag{11}$$

$$\phi = \frac{\theta}{\mu} \tag{12}$$

It is seen from equations (11) and (12) that there is a strong relationship between the drift and mean reversion that can be used to reach any desired target rate. How large the mean reversion should be is an important financial question. Equations (11) and (12) can be used to set target rates. Equations (9) and (10) allow one to determine how long it takes to reach the target rate.
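As a rough illustration of how equations (9) through (12) might be used together, the sketch below picks a drift from a desired target rate and then asks how long the deterministic HW rate takes to close a given share of its initial gap to that target. The helper names and numerical inputs are hypothetical, and the stochastic term is ignored throughout.

```python
import math


def drift_for_target(mu, phi):
    """Equation (11): theta = phi * mu."""
    return phi * mu


def hw_mean_rate(t, r0, theta, phi):
    """Equations (9) and (10): deterministic HW short rate at time t."""
    mu = theta / phi
    return mu + (r0 - mu) * math.exp(-phi * t)


def time_to_close_gap(phi, fraction):
    """Time at which r(t) - mu has shrunk to `fraction` of r0 - mu (needs phi > 0)."""
    return -math.log(fraction) / phi


# Hypothetical inputs: 6% target rate, mean reversion 0.10, initial rate 3%.
theta = drift_for_target(mu=0.06, phi=0.10)
t_half = time_to_close_gap(phi=0.10, fraction=0.5)
r_half = hw_mean_rate(t_half, r0=0.03, theta=theta, phi=0.10)
print(theta, t_half, r_half)
```

Because the gap decays as $e^{-\phi t}$ in equation (9), the time to close half of it depends only on $\phi$, not on the starting rate.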

Approximating equation (7) gives us

$$r_{k+1} = r_k + (\theta_k - \phi_k r_k)\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{13}$$

If $\theta$ and $\phi$ are constant and we eliminate the stochastic term, then the solution to equation (13) has the form

$$r_k = \alpha\beta^k + \gamma$$

To determine $\alpha$, $\beta$, and $\gamma$ we substitute this form for $r_k$ into equation (13) under these conditions and obtain that $\beta = (1 - \phi\tau)$, $\gamma = \theta/\phi = \mu$, and $\alpha = r_0 - \mu$. Therefore,

$$r_k = \alpha(1 - \phi\tau)^k + \frac{\theta}{\phi} \tag{14}$$

Note that if $0 < \phi\tau < 2$ then $-1 < 1 - \phi\tau < 1$ and

$$\lim_{k \to \infty} r_k = \frac{\theta}{\phi} = \mu$$

which is the same result we obtained from equation (9) for the HW SDE. The condition $0 < \phi\tau < 2$ is easily maintained in modeling the short rate.
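The agreement between the recursion in equation (13) without its stochastic term and the closed form in equation (14) can be checked numerically. The following is a minimal sketch with constant parameters; the function names and input values are assumptions made for illustration.

```python
def hw_mean_path(r0, theta, phi, tau, n_steps):
    """Iterate equation (13) with the stochastic term removed."""
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        path.append(r + (theta - phi * r) * tau)
    return path


def hw_closed_form(r0, theta, phi, tau, k):
    """Equation (14): r_k = alpha * (1 - phi*tau)**k + theta/phi, with alpha = r0 - mu."""
    mu = theta / phi
    return (r0 - mu) * (1.0 - phi * tau) ** k + mu


# Hypothetical inputs with 0 < phi*tau < 2, so the path converges to mu = theta/phi.
theta, phi, tau = 0.006, 0.1, 0.25
path = hw_mean_path(0.03, theta, phi, tau, 2000)
print(path[-1], hw_closed_form(0.03, theta, phi, tau, 2000))
```

Choosing inputs with $\phi\tau > 2$ instead makes $|1 - \phi\tau| > 1$, and the same recursion then oscillates away from the target rather than converging to it.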


The Kalotay-Williams-Fabozzi 
Model 

For the KWF process $f(r) = \ln(r)$, $g(r) = 0$, and $\rho = 0$ in equation (1). This leads to the differential process

$$d\ln(r) = \theta\,dt + \sigma\,dz \tag{15a}$$

This model is directly analogous to the HL model. If $u = \ln r$, then we obtain the HL process (equation (2)) for $u$

$$du = \theta\,dt + \sigma\,dz \tag{15b}$$

Because $u$ follows a normal process, $\ln(r)$ follows a normal process and so $r$ follows a lognormal process. Since $u$ follows the same process as the short rate in the HL and HW models, $u$ can become negative, but $u = \ln(r)$ and $r = e^u$, ensuring $r$ is always positive. Therefore, the KWF model eliminates the problems of negative short rates that occurred in the HL and HW models.

Eliminating the stochastic term in equations (15a) and (15b) we obtain

$$d\ln(r) = \theta(t)\,dt$$

and

$$du = \theta(t)\,dt$$

From equation (3a) we have

$$\ln r(t) = u(t) = u(0) + \int_0^t \theta(s)\,ds$$

and, since $u(0) = \ln r(0) = \ln r_0$,

$$\ln r(t) = \ln r(0) + \int_0^t \theta(s)\,ds$$

Taking the exponential of both sides gives us

$$r(t) = r_0\,e^{\int_0^t \theta(s)\,ds} \tag{16}$$

showing that $r(t) > 0$ since $r(0) > 0$. Therefore, if $\theta(t) > 0$ the short rate in the KWF process can grow without bound and if $\theta(t) < 0$ the short rate in the KWF process can decay to 0.

From equation (5) for the HL process the discrete approximation to equation (15b) is

$$u_{k+1} = u_k + \theta_k\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{17a}$$

and the exponential of this equation gives the discrete approximation to equation (15a):

$$r_{k+1} = r_k\,e^{\theta_k\tau + \sigma_k\varepsilon_k\sqrt{\tau}} \tag{17b}$$

From equation (17b) and equation (16) we see that the numerical approximation to equation (15a) has similar properties to the solution to the HL SDE. That is, if $\theta(t) > 0$ the short rate can grow without bound and if $\theta(t) < 0$ the short rate can decay to 0.
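A short sketch of equations (17a) and (17b) shows why the simulated KWF rate stays positive: the recursion is applied to $u = \ln r$, and the rate is recovered by exponentiation. Constant drift and volatility, the function name `simulate_kwf`, and the inputs are assumptions made only for illustration.

```python
import math
import random


def simulate_kwf(r0, theta, sigma, tau, n_steps, rng):
    """Iterate (17a) on u = ln r and map back with r = exp(u), as in (17b)."""
    u = math.log(r0)
    rates = [r0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # eps_k drawn from N(0, 1)
        u += theta * tau + sigma * eps * math.sqrt(tau)
        rates.append(math.exp(u))
    return rates


# Hypothetical inputs with a negative drift and a large volatility.
rates = simulate_kwf(r0=0.05, theta=-0.05, sigma=0.3, tau=0.25, n_steps=200,
                     rng=random.Random(0))
print(min(rates) > 0.0)
```

Even with a negative drift the rate only decays toward zero and never crosses it, in contrast to the HL recursion in equation (5).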


The Black-Karasinski Model 

In the BK model we set $f(r) = \ln r$, $\rho = -\phi$, and $g(r) = \ln r$ in equation (1) to obtain the SDE

$$d\ln r = (\theta - \phi\ln r)\,dt + \sigma\,dz \tag{18a}$$

We now work with equation (18a) using equation (7) for the HW process in a manner similar to how we used results from the HL process to develop the KWF process. If we let $u = \ln r$ in equation (18a) we obtain

$$du = (\theta - \phi u)\,dt + \sigma\,dz \tag{18b}$$

which is the HW process for $u$. Again, note that $u$ has all the same properties as $r$ in the HW model. Since $r = e^u$ in the BK process, $r > 0$. This is the advantage the BK model has over the HW model. Therefore, we see that the BK process is an extension of the KWF process as the HW process is an extension of the HL process. The main difference is the BK is a lognormal extension of the lognormal KWF process. As a matter of fact, if $\phi = 0$ the BK process reduces to the KWF process. Black and Karasinski introduced $\phi$ to control the growth of the short rate in the KWF process.

From equation (9) we have

$$u(t) = \frac{\theta}{\phi} + c\,e^{-\phi t}$$

and after taking exponentials

$$r(t) = e^{u(t)} = e^{\frac{\theta}{\phi} + c\,e^{-\phi t}} \tag{19}$$

For $\phi < 0$ we see that $r$ grows without bound and that for $\phi > 0$

$$\lim_{t \to \infty} r(t) = e^{\theta/\phi} = \mu$$

The target rate for the BK process is the exponential of the target rate for the HW process.

As in the HW process, from equation (19) (or equations (9) and (10)) we see that

$$c = \ln r_0 - \frac{\theta}{\phi} \tag{20}$$

in the BK process. The closer the initial rate is to the target rate, the faster the BK process converges to the target rate. From equations (19) and (20) we see that if the initial short rate is the target rate, then $r(t) = \mu$ for all $t$ in the BK process, which is analogous to the HW process.

Given the target rate $\mu$, we can solve for the drift or the mean reversion similarly to equations (11) and (12) in the HW model. We have

$$\theta = \phi\ln\mu \tag{21}$$

and

$$\phi = \frac{\theta}{\ln\mu} \tag{22}$$


We discretize $u = \ln r$ in equation (18b) just as we did for the HW SDEs and then let $r = e^u$. This is analogous to how we used the HL discrete process to get the KWF discrete process. The equations corresponding to equation (13) are

$$u_{k+1} = u_k + (\theta_k - \phi_k u_k)\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{23a}$$

or after taking the exponential of both sides of equation (23a)

$$r_{k+1} = r_k\,e^{(\theta_k - \phi_k\ln r_k)\tau + \sigma_k\varepsilon_k\sqrt{\tau}} \tag{23b}$$

For constant $\theta$ and $\phi$ (similarly to equation (14)), the solution to equation (23b) after eliminating the stochastic term is

$$r_k = e^{\alpha(1 - \phi\tau)^k + \frac{\theta}{\phi}} \tag{24}$$

Note from equation (24) that

$$\lim_{k \to \infty} r_k = e^{\theta/\phi} = \mu$$

for $0 < \phi\tau < 2$. This is similar to the result we obtained from equation (14) for the HW SDEs.
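The BK recursion without its stochastic term can be compared with equation (24) in the same way as for the HW model. In the sketch below the constant $\alpha$ is taken to be $\ln r_0 - \theta/\phi$, which is our reading by analogy with equations (14) and (20); the function names and inputs are likewise illustrative assumptions.

```python
import math


def bk_mean_path(r0, theta, phi, tau, n_steps):
    """Iterate (23a) on u = ln r with the stochastic term removed, then exponentiate."""
    u = math.log(r0)
    path = [r0]
    for _ in range(n_steps):
        u = u + (theta - phi * u) * tau
        path.append(math.exp(u))
    return path


def bk_closed_form(r0, theta, phi, tau, k):
    """Equation (24), with alpha = ln(r0) - theta/phi assumed by analogy with (14) and (20)."""
    alpha = math.log(r0) - theta / phi
    return math.exp(alpha * (1.0 - phi * tau) ** k + theta / phi)


# Hypothetical inputs: choose theta from equation (21) so the target rate is 6%.
mu, phi, tau = 0.06, 0.2, 0.25
theta = phi * math.log(mu)
path = bk_mean_path(0.03, theta, phi, tau, 400)
print(path[-1], bk_closed_form(0.03, theta, phi, tau, 400))
```

With $0 < \phi\tau < 2$ the path approaches $\mu = e^{\theta/\phi}$ from below and remains positive at every step.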


The Black-Derman-Toy Model 

The Black-Derman-Toy (BDT) model is a log¬ 
normal model with mean reversion, but the 
mean reversion is endogenous to the model. 


The mean reversion in the BDT model is deter¬ 
mined by market conditions. 

The equation describing the interest rate dynamics in the BDT model has $f(r) = \ln r$ and $g(r) = \ln r$ in equation (1) as in the BK model. Therefore, the short rate in the BDT model follows the lognormal process

$$d\ln r = [\theta(t) + \rho(t)\ln r]\,dt + \sigma(t)\,dz$$

However, in the BDT model $\rho(t) = \frac{d}{dt}\ln\sigma(t) = \frac{\sigma'(t)}{\sigma(t)}$, giving us

$$d\ln r = \left(\theta(t) + \frac{\sigma'(t)}{\sigma(t)}\ln r\right)dt + \sigma(t)\,dz \tag{25a}$$

Making the substitution $u = \ln r$ leads to

$$du = \left(\theta(t) + \frac{\sigma'(t)}{\sigma(t)}\,u\right)dt + \sigma(t)\,dz \tag{25b}$$

Notice the similarity in equations (25) and the equations (18) of the BK model. We expect $\sigma'(t)/\sigma(t)$ to behave similarly to $-\phi(t)$ in the BK model. This expression should give mean reversion in the short rate when it is negative. That is, we expect that if $\sigma'(t) < 0$ (implying $\sigma(t)$ is decreasing) then the BDT model will give mean reversion. On the other hand, when $\sigma'(t) > 0$ (implying $\sigma(t)$ is increasing) the short rates in the BDT model will grow with no mean reversion. If $\sigma(t)$ is constant in the BDT model, then $\sigma'(t) = 0$ so $\rho = 0$ and equation (25a) becomes the KWF model (equation (15)). Therefore, we will only study the case of varying local volatility for the BDT model.

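Eliminating the stochastic term in equation (25) leads to

$$d\ln r = du = \left(\theta(t) + \frac{\sigma'(t)}{\sigma(t)}\,u\right)dt = \left(\theta(t) + \frac{d\ln\sigma(t)}{dt}\,u\right)dt \tag{26}$$

Solving this equation for $u$, as we did in the KWF and BK models, gives us

$$u(t) = \left[\frac{u(0)}{\sigma(0)} + \int_0^t \frac{\theta(s)}{\sigma(s)}\,ds\right]\sigma(t)$$

or

$$r(t) = e^{\left(\frac{u(0)}{\sigma(0)} + \int_0^t \frac{\theta(s)}{\sigma(s)}\,ds\right)\sigma(t)} = e^{\frac{\sigma(t)\ln(r_0)}{\sigma(0)}}\;e^{\sigma(t)\int_0^t \frac{\theta(s)}{\sigma(s)}\,ds}$$

or

$$r(t) = r_0\,e^{\left(\frac{\sigma(t)}{\sigma(0)} - 1\right)\ln r_0}\;e^{\sigma(t)\int_0^t \frac{\theta(s)}{\sigma(s)}\,ds} \tag{27}$$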
Note that the BDT mean short rate depends 
on the local volatility. If the local volatility has a 
decreasing structure, then the first exponential 
term in equation (27) has a negative exponent 
and will cause a decrease in the short rate and 
vice versa if the local volatility has an increasing 
structure. It is important to note that mean re¬ 
version in the BDT model comes from the local 
volatility structure (i.e., it is endogenous). 

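We now consider numerical solutions to the BDT process. To discretize equation (25a) for the BDT model we start off again by approximating $du$ in equation (25b) by $u_{k+1} - u_k$ to get

$$u_{k+1} = u_k + (\theta_k + \rho_k u_k)\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{28}$$

The exponential of equation (28) gives us

$$r_{k+1} = r_k\,e^{(\theta_k + \rho_k\ln r_k)\tau + \sigma_k\varepsilon_k\sqrt{\tau}} \tag{29}$$

where

$$\rho_k = \frac{\sigma'_k}{\sigma_k}$$

We approximate this term by

$$\frac{(\sigma_{k+1} - \sigma_k)/\tau}{\sigma_k}$$

That is, we approximate $\sigma'_k$ by a discrete approximation to the derivative. We now have

$$u_{k+1} = u_k + \left(\theta_k + \frac{(\sigma_{k+1} - \sigma_k)/\tau}{\sigma_k}\,u_k\right)\tau + \sigma_k\varepsilon_k\sqrt{\tau}$$

or

$$u_{k+1} = \frac{\sigma_{k+1}}{\sigma_k}\,u_k + \theta_k\tau + \sigma_k\varepsilon_k\sqrt{\tau} \tag{30}$$

If the random term is 0 equation (30) becomes

$$u_{k+1} = \frac{\sigma_{k+1}}{\sigma_k}\,u_k + \theta_k\tau \tag{31}$$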

In particular, if

$$\frac{\sigma_{k+1}}{\sigma_k} = a$$

where $a$ is a constant then

$$u_k = a^k u_0 + \sum_{j=0}^{k-1} a^j\,\theta_{k-j-1}\,\tau$$

The exponential of this gives

$$r_k = r_0\,e^{(a^k - 1)\ln r_0}\;e^{\sum_{j=0}^{k-1} a^j\,\theta_{k-j-1}\,\tau}$$

This equation is interesting because $\ln r_0 < 0$. If $a > 1$ then the first exponential term decreases. When $\theta < 0$ the second exponential term also decreases and the BDT short rate should approach a target rate. Conversely, when $\theta > 0$ the second exponential term increases. In this case we can approach a target rate or the second term can dominate. If $a < 1$ then a similar situation arises. Therefore, in order to get meaningful numerical results for the BDT short rates we strongly recommend that $a$ be close to 1 and that the term structure of spot rates not have too large a slope.
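The closed-form sum for $u_k$ under a constant volatility ratio can be checked against direct iteration of equation (31). The sketch below does so for a short, made-up sequence of drifts; the function names, the drift values, and the choice $a = 0.98$ are all illustrative assumptions.

```python
import math


def bdt_mean_log_rate(u0, thetas, a, tau):
    """Iterate equation (31) with sigma_{k+1}/sigma_k = a held constant."""
    u = u0
    for theta_k in thetas:
        u = a * u + theta_k * tau
    return u


def bdt_closed_form(u0, thetas, a, tau):
    """u_k = a**k * u0 + sum_{j=0}^{k-1} a**j * theta_{k-j-1} * tau."""
    k = len(thetas)
    return a ** k * u0 + sum(a ** j * thetas[k - j - 1] * tau for j in range(k))


# Hypothetical inputs: five quarterly steps, volatility ratio slightly below 1.
u0 = math.log(0.05)
thetas = [0.01, 0.02, -0.01, 0.015, 0.0]
a, tau = 0.98, 0.25

u_iter = bdt_mean_log_rate(u0, thetas, a, tau)
u_sum = bdt_closed_form(u0, thetas, a, tau)
r_k = math.exp(u_sum)
print(u_iter, u_sum, r_k)
```

Exponentiating $u_k$ gives the factored expression for $r_k$ above, since $e^{a^k \ln r_0} = r_0\,e^{(a^k - 1)\ln r_0}$.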

The analysis of the equations without the 
stochastic term presented in this section is im¬ 
portant. Recall that the characteristics of the 
random term are such that average influence of 
this term will be much smaller than the mean 
term in the SDEs. Consequently, the properties 
presented within this section will also hold un¬ 
der more general circumstances. The discrete 
approximations we developed for the models 
will be used to build the binomial and trino¬ 
mial models in the next section. Note that we 
are highlighting the difference across the mod¬ 
els and do not calibrate the models to market 
information. 

For numerical reasons, the BK and HW models are best implemented in the trinomial framework. The HL, KWF, and BDT models are more easily implemented in the binomial framework (see note 3). We will discuss the specifics of this in the next section. For the trinomial framework we use the approach of Hull and White (1994).

BINOMIAL AND TRINOMIAL
SOLUTIONS TO THE
STOCHASTIC DIFFERENTIAL
EQUATIONS

In this section we present the binomial and trinomial lattice models that are obtained for the discretized versions of SDEs given in the previous section. The binomial method models the short rate in a manner geometrically analogous to that used for equities (see note 4). The up move has a probability $q$ and so the down move has a probability of $1 - q$. We use $q = 0.5$ within the framework of risk neutrality. This binomial process of two possible moves for the short rate in the next time period is then continued at each time to produce a binomial lattice of interest rates.

The trinomial model is similar in spirit to the binomial except there are three possible states emanating from each node. From each point in time we call the upward-most move the "up move," the downward-most move the "down move," and the center move the "middle move." The probabilities for an up move, middle move, and down move are given by $q_1$, $q_2$, and $q_3$ with $q_1 + q_2 + q_3 = 1$.

Interest rate lattices should possess the property of recombination for them to be computationally tractable. That is, from any given node in the binomial model we will require an up move followed by a down move to get to the same point as a down move followed by an up move. This ensures that the number of nodes in the binomial lattice increases by only one at each time step. In the trinomial case recombination is a little more complicated. From any node in the trinomial lattice an up move followed by a down move will get to the same node as two successive middle moves and as a down move followed by an up move. This ensures that the number of nodes in the trinomial lattice increases by only two at each time step.

Figure 1 represents a binomial short rate lattice and Figure 2 represents a trinomial short rate lattice. The notation $r_{j,k}$ is used to denote the short rate value at level $j$ at time $t_k$. In the binomial lattice, an up move from $r_{j,k}$ is given by $r_{j,k+1}$ and a down move is given by $r_{j+1,k+1}$. At time $t_k$ there are $k + 1$ possible values for the short rate in the binomial lattice. That is, $j$ ranges from 1 to $k + 1$. In the trinomial model, an up move, middle move, and down move from the short rate $r_{j,k}$ are given by $r_{j,k+1}$, $r_{j+1,k+1}$, and $r_{j+2,k+1}$, respectively. In the trinomial model there are $2k + 1$ possible values for the short rate at time $t_k$. That is, $j$ ranges from 1 to $2k + 1$. The short rates forming the top of the lattice will be called the up state for the short rates and the short rates forming the bottom of the lattice will be called the down state for the short rates. For the binomial and trinomial model, the up state is the set of short rates $r_{1,k}$ for $0 \le k \le n$ and the down state for the binomial case is the set of short rates $r_{k+1,k}$ for $0 \le k \le n$; within the trinomial tree the down state is the set of short rates $r_{2k+1,k}$ for $0 \le k \le n$.
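The indexing convention just described can be sketched in a few lines: an up move keeps the level index $j$, and the other moves increase it. Enumerating the reachable index pairs confirms the node counts of $k + 1$ and $2k + 1$. The function names below are illustrative and only the index bookkeeping is modeled, not the rates themselves.

```python
def binomial_children(j, k):
    """Up and down successors of node (j, k) in the binomial lattice."""
    return (j, k + 1), (j + 1, k + 1)


def trinomial_children(j, k):
    """Up, middle, and down successors of node (j, k) in the trinomial lattice."""
    return (j, k + 1), (j + 1, k + 1), (j + 2, k + 1)


def node_counts(children, n_steps):
    """Number of distinct nodes at each time step, starting from the root (1, 0)."""
    nodes = {(1, 0)}
    counts = [1]
    for _ in range(n_steps):
        nodes = {child for (j, k) in nodes for child in children(j, k)}
        counts.append(len(nodes))
    return counts


print(node_counts(binomial_children, 4))
print(node_counts(trinomial_children, 4))
```

Because the successor indices overlap, the node count grows linearly in $k$, whereas a binomial tree that did not recombine would have $2^k$ nodes at step $k$.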

Hull-White Binomial Lattice 

Since the HW model is a more general version of the HL model we present the binomial version only for the HW. In the HW binomial lattice the expressions for $r_{j,k}$ that correspond to equation (13) are

$$r_{j,k+1} = r_{j,k} + \theta_k\tau_k - \phi_k r_{j,k}\tau_k + \sigma_k\sqrt{\tau_k} \tag{32}$$

for an up move and

$$r_{j+1,k+1} = r_{j,k} + \theta_k\tau_k - \phi_k r_{j,k}\tau_k - \sigma_k\sqrt{\tau_k} \tag{33}$$

for a down move. (We are using $\tau_k$ for $\Delta t_k$.)

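These equations suggest that in order to have recombination the following must be true:

$$\tau_{k+1} = \tau_k \, \frac{4 \left( \sigma_k / \sigma_{k+1} \right)^2}{\left[ 1 + \sqrt{1 + 4 \left( \sigma_k / \sigma_{k+1} \right)^2 \tau_k \phi_{k+1}} \right]^2} \qquad (34)$$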

Equation (34) illustrates that if you want a constant time step when the local volatility is constant, the mean reversion must be 0. The recombination requirement has put the stringent condition on the HW binomial lattice that the mean reversion is determined by the local volatility. To avoid this problem within the binomial framework we must allow the time step to vary with $k$ in equations (32) through (34). As a matter of fact, for a constant time step $\tau$,

$$\phi_{k+1} = \frac{\sigma_k - \sigma_{k+1}}{\sigma_k \tau} \qquad (35)$$

which can also be solved for $\sigma_{k+1}$ to give

$$\sigma_{k+1} = \sigma_k (1 - \phi_{k+1} \tau) \qquad (36)$$

Equation (36) shows that the mean reversion can be used to match any given local volatility for a constant time step. If the local volatility is decreasing the mean reversion will be positive, and if the local volatility is increasing the mean reversion will be negative. We point out that if a variable time step is used, one does not have to have mean reversion match local volatility.
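The recombination condition can be checked numerically. The sketch below applies the up and down moves of equations (32) and (33) with a constant time step and the mean reversion implied by equation (35). The drift values `theta_k` and `theta_k1` and the function names are arbitrary illustrative choices, not values from the text.

```python
import math


def hw_up(r, theta, phi, sigma, tau):
    """Up move of the HW binomial lattice, following equation (32)."""
    return r + theta * tau - phi * r * tau + sigma * math.sqrt(tau)


def hw_down(r, theta, phi, sigma, tau):
    """Down move of the HW binomial lattice, following equation (33)."""
    return r + theta * tau - phi * r * tau - sigma * math.sqrt(tau)


def recombining_phi(sigma_k, sigma_k1, tau):
    """Mean reversion required for recombination, following equation (35)."""
    return (sigma_k - sigma_k1) / (sigma_k * tau)


def recombination_gap(r0, tau, sigma_k, sigma_k1, phi_k1,
                      theta_k=0.010, theta_k1=0.012, phi_k=0.0):
    """Absolute difference between up-then-down and down-then-up."""
    up_down = hw_down(hw_up(r0, theta_k, phi_k, sigma_k, tau),
                      theta_k1, phi_k1, sigma_k1, tau)
    down_up = hw_up(hw_down(r0, theta_k, phi_k, sigma_k, tau),
                    theta_k1, phi_k1, sigma_k1, tau)
    return abs(up_down - down_up)


# Local volatility falls from 10% to 9% with a one-year time step.
phi_match = recombining_phi(0.10, 0.09, 1.0)
gap_matched = recombination_gap(0.062, 1.0, 0.10, 0.09, phi_match)
gap_mismatched = recombination_gap(0.062, 1.0, 0.10, 0.09, 0.05)
print(phi_match, gap_matched, gap_mismatched)
```

With the mean reversion from equation (35) the two paths land on the same rate up to rounding error, while any other mean reversion leaves a gap, which is the restriction described above.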


Black-Karasinski Binomial Lattice

Since the BK model is a more general form of the KWF model, we only present the binomial version for the BK model. The expressions corresponding to equations (32) and (33) of the HW model and from equation (23b) are

$$r_{j,k+1} = r_{j,k} \, e^{(\theta_k - \phi_k \ln r_{j,k}) \tau_k + \sigma_k \sqrt{\tau_k}} \qquad (37)$$

for an up move and

$$r_{j+1,k+1} = r_{j,k} \, e^{(\theta_k - \phi_k \ln r_{j,k}) \tau_k - \sigma_k \sqrt{\tau_k}} \qquad (38)$$

for a down move.

Using equations (37) and (38) we can develop equations for the BK binomial lattice that are identical to equations (34) and (36) for the HW binomial lattice. This should be expected since the BK SDE is just a lognormal version of the HW SDE. A crucial point here is that we can use the HW and BK models to match local volatility and to compare results. It is important to point out that the HW and BK binomial lattices have a constant time step. If a variable time step is used, then interpolation is required to give the short rates at the fixed time steps. We do not offer this framework. Instead we present the HW and the BK models in the trinomial framework.

Within the binomial framework, the HW and BK models only approximate the distributional properties of their respective SDEs. The accuracy of the approximation is a function of the mean reversion. As the mean reversion increases, the accuracy decreases. Note that since the HL and KWF models have a zero mean reversion the distributional characteristics of their SDEs are perfectly matched within the binomial framework. This is the reason for using the trinomial method for the HW and BK models.

The Trinomial Lattices

A better way to keep a constant time step and to match the appropriate distributional properties is to use a trinomial lattice instead of a binomial lattice. If we use a trinomial lattice for the HW SDEs, then from equation (13) we use


$$r_{j,k+1} = r_{j,k} + \theta_k \tau - \phi_k r_{j,k} \tau + \alpha_k \sigma_k \sqrt{\tau} \qquad (39a)$$

for an up move,

$$r_{j+2,k+1} = r_{j,k} + \theta_k \tau - \phi_k r_{j,k} \tau - \alpha_k \sigma_k \sqrt{\tau} \qquad (39b)$$

for a down move, and

$$r_{j+1,k+1} = r_{j,k} + \theta_k \tau - \phi_k r_{j,k} \tau \qquad (39c)$$

for a middle move. Similarly, if we use a trinomial lattice for the BK SDEs then from equation (23b) we use

$$r_{j,k+1} = r_{j,k} \, e^{(\theta_k - \phi_k \ln r_{j,k}) \tau + \alpha_k \sigma_k \sqrt{\tau}} \qquad (40a)$$

for an up move,

$$r_{j+2,k+1} = r_{j,k} \, e^{(\theta_k - \phi_k \ln r_{j,k}) \tau - \alpha_k \sigma_k \sqrt{\tau}} \qquad (40b)$$

for a down move, and

$$r_{j+1,k+1} = r_{j,k} \, e^{(\theta_k - \phi_k \ln r_{j,k}) \tau} \qquad (40c)$$

for a middle move.

Note that a constant time step is now used. The expression $\alpha_k$ is used to guarantee recombination. The probabilities of an up, middle, and down move are chosen to give the correct variance.


The No Arbitrage Equations 

The procedure to generate the no arbitrage equations for the binomial and trinomial lattices is outlined in the appendix. The no arbitrage polynomial for the short rates in the binomial tree is given by

$$f_i = c_{1,i} \prod_{j=1}^{i+1} (1 + r_{j,i}\tau) + \sum_{m=1}^{i+1} c_{m+1,i} \prod_{\substack{n=1 \\ n \ne m}}^{i+1} (1 + r_{n,i}\tau) \qquad (41)$$

where, for $i \ge 2$ (the $i = 1$ values are given in the appendix),

$$a_{1,i} = \prod_{n=0}^{i-1} \prod_{m=1}^{n+1} (1 + r_{m,n}\tau),$$

$a_{2,i} = b_{1,i-1}$, $a_{j,i} = b_{j-2,i-1} + b_{j-1,i-1}$ for $j = 3, \ldots, i+1$, $a_{i+2,i} = b_{i,i-1}$, and $c_{1,i} = P_{i+1} a_{1,i}$, $c_{j+1,i} = q^{i-j+1} (1 - q)^{j-1} a_{j+1,i}$ for $j = 1, \ldots, i+1$.

We solve equation (41) for $\theta_i$ by setting $f_i = 0$. We then use $\theta_i$ to compute $r_{j,i}$ for $j = 1, \ldots, i+1$ at the $i$th period. The bisection method will converge quickly because there is only one root between -1 and 1 for the HW binomial lattice and one root between 0 and 1 for the BK binomial lattice.5
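As a minimal sketch of the first calibration step, the code below solves the first-period no arbitrage equation, equation (A2) of the appendix, for the drift by bisection in an HW binomial lattice. The inputs are the first two spot rates and the 10% volatility from Table 1. Taking q = 0.5, zero mean reversion, and a one-year time step are assumptions made here for illustration, and the function names are not from the text.

```python
import math


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def solve_first_period(R1, R2, sigma, phi=0.0, tau=1.0, q=0.5):
    """Solve equation (A2) for the drift and return (theta, r_up, r_down)."""
    P2 = 1.0 / (1.0 + R2 * tau) ** 2  # two-period discount factor
    r10 = R1                          # the root of the lattice

    def rates(theta):
        drift = r10 + theta * tau - phi * r10 * tau
        step = sigma * math.sqrt(tau)
        return drift + step, drift - step  # equations (32) and (33)

    def f1(theta):
        r11, r21 = rates(theta)
        return (P2 * (1 + r10 * tau) * (1 + r11 * tau) * (1 + r21 * tau)
                - q * (1 + r21 * tau) - (1 - q) * (1 + r11 * tau))

    theta = bisect(f1, -1.0, 1.0)  # search interval quoted for the HW lattice
    r_up, r_down = rates(theta)
    return theta, r_up, r_down


theta_1, r_up, r_down = solve_first_period(R1=0.0620, R2=0.0616, sigma=0.10)
print(theta_1, r_up, r_down)
```

Under these assumptions the two first-period rates come out at roughly 17.05% and -2.95%. The same bracketing and bisection step is repeated period by period with the higher-order polynomials.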
After generating the new rates we let

$$b_{j,i} = a_{j+1,i} \prod_{\substack{m=1 \\ m \ne j}}^{i+1} (1 + r_{m,i}\tau)$$

For the variable time step, $\tau_i$, we replace the terms $(1 + r_{j,i}\tau)$ by $(1 + r_{j,i}\tau)^{\tau_i/\tau}$ and the terms $(1 + r_{n,i}\tau)$ by $(1 + r_{n,i}\tau)^{\tau_i/\tau}$ in equation (41).

Similarly, the no arbitrage polynomial for the trinomial trees is given by

$$f_i = c_{1,i} \prod_{j=1}^{2i+1} (1 + r_{j,i}\tau) + \sum_{m=1}^{2i+1} c_{m+1,i} \prod_{\substack{n=1 \\ n \ne m}}^{2i+1} (1 + r_{n,i}\tau) \qquad (42)$$

where we first let

$$a_{1,i} = \prod_{n=0}^{i-1} \prod_{j=1}^{2n+1} (1 + r_{j,n}\tau),$$

$$a_{2,i} = q_1 b_{1,i-1} a_{2,i-1},$$

$$a_{3,i} = q_2 b_{1,i-1} a_{2,i-1} + q_1 b_{2,i-1} a_{3,i-1},$$

$$a_{j,i} = q_3 b_{j-3,i-1} a_{j-2,i-1} + q_2 b_{j-2,i-1} a_{j-1,i-1} + q_1 b_{j-1,i-1} a_{j,i-1} \quad \text{for } j = 4, \ldots, 2i,$$

$$a_{2i+1,i} = q_3 b_{2i-2,i-1} a_{2i-1,i-1} + q_2 b_{2i-1,i-1} a_{2i,i-1}, \qquad a_{2i+2,i} = q_3 b_{2i-1,i-1} a_{2i,i-1}$$

and then let $c_{1,i} = P_{i+1} a_{1,i}$ and $c_{j,i} = a_{j,i}$ for $j = 2, \ldots, 2i+2$.

We solve equation (42) for $\theta_i$ by setting $f_i = 0$ using the bisection method. From this the short rates for either the HW or BK trinomial lattices are determined at step $i$. We then let

$$b_{k,i} = \prod_{\substack{j=1 \\ j \ne k}}^{2i+1} (1 + r_{j,i}\tau)$$

for $k = 1, \ldots, 2i+1$ and then repeat the process. In these derivations $P_i = 1/(1 + R_i\tau)^i$ is the discount factor given by the spot rates (zero curve).
The Hull and White Lattice

We now briefly outline the Hull and White methodology for generating HW and BK trinomial lattices.6 The Hull and White methodology uses

$$r_{j,k} = x + (j_k) \Delta\rho \qquad (43)$$

for the HW trinomial lattice short rates and

$$r_{j,k} = e^{x + (j_k) \Delta\rho} \qquad (44)$$

for the BK trinomial lattice short rates.

They choose $\Delta\rho = \sigma\sqrt{3\tau}$ to minimize numerical error and introduce the mean reversion through the probabilities $q_1$, $q_2$, and $q_3$. Specifically, they use

$$q_1 = \frac{1}{6} + \frac{(j_k)^2 \phi^2 \tau^2 + (j_k) \phi \tau}{2},$$

$$q_2 = \frac{2}{3} - (j_k)^2 \phi^2 \tau^2,$$

and

$$q_3 = \frac{1}{6} + \frac{(j_k)^2 \phi^2 \tau^2 - (j_k) \phi \tau}{2}$$

for the up, middle, and down moves at $r_{j,k}$, respectively, since this matches the expected change and variance of the short rate over the next time period. However, as they point out, these probabilities must remain positive. In order to do this they "prune" the upper and lower branches of their lattice at the level $j$ that keeps these probabilities positive. Since $q_2$ is the only one that can become negative they require the following

$$j < \frac{\sqrt{6}}{3\phi\tau} = \frac{0.816}{\phi\tau}$$

At this maximum value of $j$, Hull and White apply a different branching procedure with different probabilities in order to "prune" the lattice. However, as they point out, using this value of $j$ can lead to computational problems so they actually use the first $j$ satisfying

$$j > \frac{3 - \sqrt{6}}{3\phi\tau} = \frac{0.184}{\phi\tau}$$

This leads to a reduction in the spread of the rates.
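The branching probabilities and the pruning level can be sketched directly from the expressions above. The code assumes that "the first j" means the smallest integer strictly above 0.184/(phi tau); the function names are illustrative and not part of the Hull and White methodology itself.

```python
import math


def hw_branch_probabilities(j, phi, tau):
    """Up, middle, and down probabilities at level j, as given in the text."""
    m = j * phi * tau
    q1 = 1.0 / 6.0 + (m * m + m) / 2.0
    q2 = 2.0 / 3.0 - m * m
    q3 = 1.0 / 6.0 + (m * m - m) / 2.0
    return q1, q2, q3


def pruning_level(phi, tau):
    """Smallest integer j strictly above (3 - sqrt(6)) / (3 phi tau).

    Assumption: this is how "the first j satisfying" the bound is read.
    """
    bound = (3.0 - math.sqrt(6.0)) / (3.0 * phi * tau)
    return math.floor(bound) + 1


def rate_spacing(sigma, tau):
    """Vertical spacing between adjacent lattice levels, sigma * sqrt(3 tau)."""
    return sigma * math.sqrt(3.0 * tau)


j_max = pruning_level(phi=0.05, tau=1.0)
q1, q2, q3 = hw_branch_probabilities(j_max, phi=0.05, tau=1.0)
spacing = rate_spacing(sigma=0.10, tau=1.0)
print(j_max, (q1, q2, q3), spacing)
```

With a mean reversion of 5% and a one-year step the bound is about 3.67, so the lattice is pruned at four levels on either side of the central node, and a 10% volatility gives a spacing of about 17.32% between adjacent levels.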


COMPARATIVE STUDY OF THE NUMERICAL SOLUTIONS

In this section a comparison between the methodologies is presented. In particular, we look at the effects of mean reversion and local volatility on the drift and the spread in the short rates. We present numerical results for the term structures, volatility, and mean reversion in Table 1. The table also includes the bond information for use later.

Original Term Structure with No Mean Reversion

We first consider the original term structure with no mean reversion for the HL and HW models. In Figure 3 we present the binomial tree for the HL model and the trinomial for the HW model using the HW trinomial methodology. We use a 10% volatility throughout the trees. We see that the spread in the short rates increases over time in the models as expected.


Table 1 Input Information

Original TS   Volatility   Mean Reversion
6.20%         10.00%       5%
6.16%         10.00%
6.15%          9.00%
6.09%          9.00%
6.02%          8.00%
6.02%          8.00%
6.01%          7.00%
6.01%          7.00%
6.00%          7.00%
6.01%          7.00%

Bond Information for ED, EC, and OAS

Call Price (Regular Callable)          $102.50
Put Price (Regular Putable)            $95.00
Annual Coupon ($ per $100)             $6.00
Time Option Starts (years from now)    1
a. The Ho-Lee Interest Rate Lattice

[Binomial lattice of short rates; horizontal axis: Time in Years]
b. The Hull-White Trinomial Interest Rate Lattice Using the HW Method with No Mean Reversion

Time in Years: short rates listed from the top node to the bottom node

0.0: 6.20%
1.0: 24.39%, 7.07%, -10.25%
2.0: 43.65%, 26.33%, 9.01%, -8.31%, -25.63%
3.0: 63.71%, 46.38%, 29.06%, 11.74%, -5.58%, -22.90%, -40.22%
4.0: 84.92%, 67.60%, 50.28%, 32.96%, 15.64%, -1.68%, -19.00%, -36.32%, -53.64%
5.0: 107.78%, 90.46%, 73.14%, 55.82%, 38.50%, 21.18%, 3.86%, -13.46%, -30.78%, -48.10%, -65.42%
6.0: 131.70%, 114.38%, 97.06%, 79.74%, 62.42%, 45.10%, 27.78%, 10.46%, -6.86%, -24.18%, -41.50%, -58.83%, -76.15%
7.0: 142.30%, 124.98%, 107.66%, 90.34%, 73.02%, 55.70%, 38.38%, 21.06%, 3.74%, -13.58%, -30.90%, -48.22%, -65.54%, -82.86%, -100.18%
8.0: 172.58%, 155.26%, 137.94%, 120.62%, 103.30%, 85.98%, 68.66%, 51.34%, 34.02%, 16.70%, -0.62%, -17.94%, -35.26%, -52.58%, -69.90%, -87.22%, -104.54%
9.0: 203.31%, 185.99%, 168.67%, 151.35%, 134.03%, 116.71%, 99.39%, 82.07%, 64.75%, 47.43%, 30.11%, 12.79%, -4.53%, -21.85%, -39.18%, -56.50%, -73.82%, -91.14%, -108.46%


Figure 3 The HL Binomial and HW Trinomial Trees for the Original Term Structure with No Mean Reversion 


We also see that the HL model can give negative 
short rates. 

In Figure 4 we present the binomial tree for the KWF model, the trinomial for the BK model using the HW trinomial methodology, and the BDT binomial model. The KWF and BK models use the 10% volatility throughout the tree and no mean reversion. Note the volatile nature of the BDT model. This is due to the time varying volatility structure and the way mean reversion is incorporated into the BDT model through this decreasing volatility structure. Note that all the short rates are positive and that the spread in the rates is significantly less than in Figure 3.
a. The Kalotay, Williams, and Fabozzi Interest Rate Lattice

[Binomial lattice of short rates; horizontal axis: Time in Years]

b. The Black-Karasinski Trinomial Interest Rate Lattice Using the HW Method with No Mean Reversion

[Trinomial lattice of short rates; horizontal axis: Time in Years, 0.0 to 9.0]

Figure 4 The BDT and KWF Binomial and the BK Trinomial Trees for the Original Term Structure with No Mean 
Reversion 


Table 2 presents the trinomial lattices for the HW and BK models using the information in Table 1 and a mean reversion of 5%. The volatility is 10%. Notice the pruning that takes place within the lattice when we have mean reversion. This produces lattices that are significantly different from those shown in Figures 3 and 4. This is a peculiarity of the Hull and White methodology. The pruning is a result of incorporating mean reversion into the model and ensuring that the distributional characteristics of the SDEs are retained.
c. The Black, Derman, and Toy Interest Rate Model

[Binomial lattice of short rates]

Figure 4 (Continued)


Table 2 Trinomial Model

a. The Hull-White Trinomial Interest Rate Lattice Using the HW Method with Mean Reversion of 5%

Time in Years: short rates listed from the top node to the bottom node

0: 6.20%
1: 24.39%, 7.07%, -10.25%
2: 43.51%, 26.18%, 8.86%, -8.46%, -25.78%
3: 63.14%, 45.82%, 28.50%, 11.17%, -6.15%, -23.47%, -40.79%
4: 83.50%, 66.18%, 48.86%, 31.54%, 14.22%, -3.10%, -20.42%, -37.75%, -55.07%
5: 87.60%, 70.28%, 52.96%, 35.64%, 18.32%, 1.00%, -16.32%, -33.64%, -50.96%
6: 91.92%, 74.60%, 57.28%, 39.96%, 22.64%, 5.32%, -12.00%, -29.32%, -46.64%
7: 96.84%, 79.52%, 62.20%, 44.88%, 27.56%, 10.24%, -7.09%, -24.41%, -41.73%
8: 101.89%, 84.57%, 67.25%, 49.93%, 32.61%, 15.29%, -2.03%, -19.35%, -36.67%
9: 107.24%, 89.91%, 72.59%, 55.27%, 37.95%, 20.63%, 3.31%, -14.01%, -31.33%

b. The Black-Karasinski Trinomial Interest Rate Lattice Using the HW Method with Mean Reversion of 5%

Time in Years: short rates listed from the top node to the bottom node

0: 6.20%
1: 7.25%, 6.09%, 5.12%
2: 8.60%, 7.26%, 6.08%, 5.11%, 4.30%
3: 9.83%, 8.27%, 6.95%, 5.85%, 4.92%, 4.14%, 3.48%
4: 11.34%, 9.53%, 8.02%, 6.74%, 5.67%, 4.77%, 4.01%, 3.37%, 2.84%
5: 11.87%, 9.99%, 8.40%, 7.06%, 5.94%, 4.99%, 4.20%, 3.53%, 2.97%
6: 11.73%, 9.86%, 8.29%, 6.98%, 5.87%, 4.93%, 4.15%, 3.49%, 2.93%
7: 11.84%, 9.96%, 8.38%, 7.04%, 5.92%, 4.98%, 4.19%, 3.52%, 2.96%
8: 11.67%, 9.81%, 8.25%, 6.94%, 5.84%, 4.91%, 4.13%, 3.47%, 2.92%
9: 12.03%, 10.12%, 8.51%, 7.16%, 6.02%, 5.06%, 4.26%, 3.58%, 3.01%
Table 3 Effective Duration and Effective Convexity Results

Effective Duration

Model / Structure        -500 bp     -250 bp     Current     250 bp      500 bp
Ho Lee   Callable Bond   3.72119     3.62427     3.43354     4.19081     4.18588
Ho Lee   Putable Bond    6.48070     5.96968     4.82856     4.33750     3.52379
BDT      Callable Bond   0.98815     0.96433     5.72746     6.97619     6.59872
BDT      Putable Bond    8.15290     7.75444     6.94320     0.91997     0.89929
KWF      Callable Bond   0.98815     0.96433     5.48099     6.90354     6.59875
KWF      Putable Bond    8.15311     7.75438     6.02987     0.91997     0.89929
HW-HW    Callable Bond   3.35706     3.24446     3.33140     3.46677     4.65946
HW-HW    Putable Bond    5.82483     5.33913     4.79375     4.14647     3.30034
BK-HW    Callable Bond   0.98815     0.96433     5.21624     6.93694     6.56855
BK-HW    Putable Bond    8.09134     7.70100     6.79269     0.91997     0.89929

Effective Convexity

Model / Structure        -500 bp     -250 bp     Current     250 bp      500 bp
Ho Lee   Callable Bond   -31.15230   10.51371    9.58153     -6.18888    12.92063
Ho Lee   Putable Bond    55.51213    26.45835    41.73014    17.68955    15.98202
BDT      Callable Bond   0.97643     0.92992     -100.52077  31.91884    29.24115
BDT      Putable Bond    41.20380    37.88876    136.25219   0.84634     0.80871
KWF      Callable Bond   0.97643     0.92992     -8.70115    18.94888    29.22747
KWF      Putable Bond    41.26110    37.97492    132.82680   0.84634     0.80871
HW-HW    Callable Bond   5.81085     8.80890     9.55382     -9.19552    14.99510
HW-HW    Putable Bond    23.71025    20.81987    17.78372    14.50538    10.76225
BK-HW    Callable Bond   0.97643     0.92992     -77.28716   31.17366    28.88729
BK-HW    Putable Bond    40.58931    37.39723    72.05773    0.84634     0.80871


Comparison of the Models Using Common Risk and Value Metrics

Here we contrast the effective duration, effective convexity, and the option-adjusted spread (OAS) for 10-year callable and putable bonds each with a one-year delay on the embedded option. The information in Table 1 is used for the analysis. We computed the effective duration for the original term structures shown in Table 1 using a yield change of 25 basis points. The original term structure is then shifted up and down in a parallel manner by ±250 basis points and by ±500 basis points, respectively. In other words, we computed the effective duration at five different term structure levels using a yield change of 25 basis points.
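The bump-and-reprice procedure described above can be sketched as follows. The finite-difference formulas are the usual effective duration and effective convexity definitions, which is an assumption since the text does not state them; some authors put an extra factor of 2 in the convexity denominator. To keep the example self-contained, an option-free 10-year 6% annual-pay bond priced at a flat yield stands in for the lattice valuation of the callable and putable bonds, so the numbers are not comparable to Table 3.

```python
def bond_price(flat_yield, coupon=6.0, face=100.0, years=10):
    """Option-free annual-pay bond priced at a flat yield (stand-in valuation)."""
    pv_coupons = sum(coupon / (1.0 + flat_yield) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1.0 + flat_yield) ** years


def effective_duration(v_down, v_up, v0, dy):
    """(V- - V+) / (2 V0 dy), with V-/V+ the values after a yield fall/rise."""
    return (v_down - v_up) / (2.0 * v0 * dy)


def effective_convexity(v_down, v_up, v0, dy):
    """(V- + V+ - 2 V0) / (V0 dy^2); one common convention, assumed here."""
    return (v_down + v_up - 2.0 * v0) / (v0 * dy ** 2)


base_yield = 0.06   # hypothetical flat curve level
dy = 0.0025         # the 25 basis point yield change
results = {}
for level_shift in (-0.05, -0.025, 0.0, 0.025, 0.05):
    y = base_yield + level_shift
    v0 = bond_price(y)
    v_down, v_up = bond_price(y - dy), bond_price(y + dy)
    results[level_shift] = (effective_duration(v_down, v_up, v0, dy),
                            effective_convexity(v_down, v_up, v0, dy))

for shift, (ed, ec) in results.items():
    print(f"{shift * 1e4:+.0f} bp: duration {ed:.3f}, convexity {ec:.2f}")
```

Replacing `bond_price` with a lattice valuation that applies the call or put rule at each node would give the kind of model-dependent results compared in this section.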

Table 3 presents the effective duration and convexity results for the two securities for each model. The results are interesting. It is clear that the normal models do not agree with the lognormal models. Specifically, the normal models do not match the characteristics of the price yield relationship at extreme interest rate levels.7 Furthermore, each model gives slightly different results. This is an important finding and must be appreciated by any user of these models.

Table 4 presents the OAS results. We used a market price that is 3% below the model price for the OAS computation. The OAS results are consistent with the results in Table 3. Note that the normal models produce OAS values larger than any of the lognormal models. This is due to the distributional differences and the property of allowing very low and negative interest rates. Clearly, normal models are not desirable when evaluating securities with embedded options.8


APPENDIX 

In this appendix we outline how to obtain equa¬ 
tions (41) and (42). For equation (41) we use 
Figure 1. For equation (42) we use Figure 2. 


Table 4 Option-Adjusted Spread Results 


Ho-Lee BDT KWF HW-HW BK-HW 


Callable Bond 0.8454% 0.4785% 0.5449% 0.8350% 0.5063% 

Putable Bond 0.5884% 0.4732% 0.5249% 0.5688% 0.4774% 
We first solve for $r_{1,1}$ and $r_{2,1}$ in Figure 1. Equating the price from the spot rate term structure with the price from the binomial lattice gives us

$$P_2 = \frac{1}{(1 + R_2\tau)^2} = \frac{q\,p_{1,1} + (1 - q)\,p_{2,1}}{1 + r_{1,0}\tau} \qquad (A1)$$

Substituting in the discount factors $p_{j,1} = 1/(1 + r_{j,1}\tau)$ for $j = 1, 2$ and clearing fractions we obtain

$$P_2 (1 + r_{1,0}\tau)(1 + r_{1,1}\tau)(1 + r_{2,1}\tau) - q\,(1 + r_{2,1}\tau) - (1 - q)(1 + r_{1,1}\tau) = 0 \qquad (A2)$$

We let $r_{1,0} = R_1$. This equation can now be solved for $\theta_1$.

For the next period in the binomial lattice we have from Figure 1 that

$$P_3 = \frac{1}{(1 + R_3\tau)^3} = \frac{q\,p_{1,1} + (1 - q)\,p_{2,1}}{1 + r_{1,0}\tau} = \frac{1}{1 + r_{1,0}\tau} \left[ q\,\frac{q\,p_{1,2} + (1 - q)\,p_{2,2}}{1 + r_{1,1}\tau} + (1 - q)\,\frac{q\,p_{2,2} + (1 - q)\,p_{3,2}}{1 + r_{2,1}\tau} \right]$$

where $p_{1,1}$ and $p_{2,1}$ now denote the period-1 values of the bond maturing at the end of the third period and $p_{j,2} = 1/(1 + r_{j,2}\tau)$. This reduces to

$$P_3 (1 + r_{1,0}\tau)(1 + r_{1,1}\tau)(1 + r_{2,1}\tau)(1 + r_{1,2}\tau)(1 + r_{2,2}\tau)(1 + r_{3,2}\tau)$$
$$- q^2 (1 + r_{2,1}\tau)(1 + r_{2,2}\tau)(1 + r_{3,2}\tau)$$
$$- q (1 - q) \left[ (1 + r_{1,1}\tau) + (1 + r_{2,1}\tau) \right] (1 + r_{1,2}\tau)(1 + r_{3,2}\tau)$$
$$- (1 - q)^2 (1 + r_{1,1}\tau)(1 + r_{1,2}\tau)(1 + r_{2,2}\tau) = 0 \qquad (A3)$$

We now solve equation (A3) for $\theta_2$ using the bisection method.

From equation (A2) and equation (A3) we can generate the remainder of the no arbitrage equations that give the short rates in the binomial lattice. Note that equation (A2) can be written as

$$c_{1,1} (1 + r_{1,1}\tau)(1 + r_{2,1}\tau) + c_{2,1} (1 + r_{2,1}\tau) + c_{3,1} (1 + r_{1,1}\tau) = 0 \qquad (A4)$$

and that equation (A3) can be written as

$$c_{1,2} (1 + r_{1,2}\tau)(1 + r_{2,2}\tau)(1 + r_{3,2}\tau) + c_{2,2} (1 + r_{2,2}\tau)(1 + r_{3,2}\tau) + c_{3,2} (1 + r_{1,2}\tau)(1 + r_{3,2}\tau) + c_{4,2} (1 + r_{1,2}\tau)(1 + r_{2,2}\tau) = 0 \qquad (A5)$$

We now introduce some variables that will help to generate the coefficients $c_{j,k}$ for the polynomials that determine the interest rates at time period $k$. We start by doing it for the polynomials in equations (A4) and (A5). This is done in two steps. The first step is to notice how the coefficients are related to the interest rates at the previous time periods. Note that if we let $a_{1,1} = 1 + r_{1,0}\tau$, $a_{2,1} = -1$, and $a_{3,1} = -1$ then $c_{1,1} = P_2 a_{1,1}$, $c_{2,1} = q a_{2,1}$, and $c_{3,1} = (1 - q) a_{3,1}$ in equation (A4). In order to generate equation (A5) we first let $b_{1,1} = a_{2,1} (1 + r_{2,1}\tau)$, $b_{2,1} = a_{3,1} (1 + r_{1,1}\tau)$. We can then generate $a_{1,2} = (1 + r_{1,0}\tau)(1 + r_{1,1}\tau)(1 + r_{2,1}\tau)$, $a_{2,2} = b_{1,1}$, $a_{3,2} = b_{1,1} + b_{2,1}$, and $a_{4,2} = b_{2,1}$. It is now seen that $c_{1,2} = P_3 a_{1,2}$, $c_{2,2} = q^2 a_{2,2}$, $c_{3,2} = q (1 - q) a_{3,2}$, and $c_{4,2} = (1 - q)^2 a_{4,2}$. We now let $b_{1,2} = a_{2,2} (1 + r_{2,2}\tau)(1 + r_{3,2}\tau)$, $b_{2,2} = a_{3,2} (1 + r_{1,2}\tau)(1 + r_{3,2}\tau)$, and $b_{3,2} = a_{4,2} (1 + r_{1,2}\tau)(1 + r_{2,2}\tau)$ and continue the process to obtain equation (41).

For the trinomial lattice no arbitrage polynomial we first solve for $r_{1,1}$, $r_{2,1}$, and $r_{3,1}$ in Figure 2. Equating the price from the spot rate term structure with the price from the trinomial lattice gives us

$$P_2 = \frac{1}{(1 + R_2\tau)^2} = \frac{q_1 p_{1,1} + q_2 p_{2,1} + q_3 p_{3,1}}{1 + r_{1,0}\tau}$$

which is similar to equation (A1). Proceeding as in the binomial lattice we find that

$$P_2 (1 + r_{1,0}\tau)(1 + r_{1,1}\tau)(1 + r_{2,1}\tau)(1 + r_{3,1}\tau) - q_1 (1 + r_{2,1}\tau)(1 + r_{3,1}\tau) - q_2 (1 + r_{1,1}\tau)(1 + r_{3,1}\tau) - q_3 (1 + r_{1,1}\tau)(1 + r_{2,1}\tau) = 0 \qquad (A6)$$

As in the binomial case, $r_{1,0} = R_1$ and equation (A6) is solved for $\theta_1$ using the bisection method.
For the next period, the trinomial lattice (Figure 2) gives us

$$P_3 = \frac{1}{(1 + R_3\tau)^3} = \frac{q_1 p_{1,1} + q_2 p_{2,1} + q_3 p_{3,1}}{1 + r_{1,0}\tau} = \frac{1}{1 + r_{1,0}\tau} \left[ q_1 \frac{q_1 p_{1,2} + q_2 p_{2,2} + q_3 p_{3,2}}{1 + r_{1,1}\tau} + q_2 \frac{q_1 p_{2,2} + q_2 p_{3,2} + q_3 p_{4,2}}{1 + r_{2,1}\tau} + q_3 \frac{q_1 p_{3,2} + q_2 p_{4,2} + q_3 p_{5,2}}{1 + r_{3,1}\tau} \right]$$

which simplifies to the following equation similar to equation (A3)

$$P_3 (1 + r_{1,0}\tau) \prod_{j=1}^{3} (1 + r_{j,1}\tau) \prod_{j=1}^{5} (1 + r_{j,2}\tau)$$
$$- q_1^2 (1 + r_{2,1}\tau)(1 + r_{3,1}\tau) \prod_{j=2}^{5} (1 + r_{j,2}\tau)$$
$$- \left[ q_1 q_2 (1 + r_{2,1}\tau)(1 + r_{3,1}\tau) + q_1 q_2 (1 + r_{1,1}\tau)(1 + r_{3,1}\tau) \right] \prod_{\substack{j=1 \\ j \ne 2}}^{5} (1 + r_{j,2}\tau)$$
$$- \left[ q_1 q_3 (1 + r_{2,1}\tau)(1 + r_{3,1}\tau) + q_2^2 (1 + r_{1,1}\tau)(1 + r_{3,1}\tau) + q_3 q_1 (1 + r_{1,1}\tau)(1 + r_{2,1}\tau) \right] \prod_{\substack{j=1 \\ j \ne 3}}^{5} (1 + r_{j,2}\tau)$$
$$- \left[ q_2 q_3 (1 + r_{1,1}\tau)(1 + r_{3,1}\tau) + q_3 q_2 (1 + r_{1,1}\tau)(1 + r_{2,1}\tau) \right] \prod_{\substack{j=1 \\ j \ne 4}}^{5} (1 + r_{j,2}\tau)$$
$$- q_3^2 (1 + r_{1,1}\tau)(1 + r_{2,1}\tau) \prod_{j=1}^{4} (1 + r_{j,2}\tau) = 0 \qquad (A7)$$

Equation (A7) is also solved for $\theta_2$ using the bisection method. We now proceed as in the binomial lattice case to generate the no arbitrage equation for $\theta_i$ given in equation (42).


KEY POINTS

• Interest rates are commonly modeled using stochastic differential equations.

• One-factor models use a stochastic differential equation to represent the short rate and two-factor models use a stochastic differential equation for both the short rate and the long rate.

• The stochastic differential equations used to model interest rates must capture some of the market properties of interest rates such as mean reversion and/or a volatility that depends on the level of interest rates.

• The approaches used to implement the SDEs into a term structure model include equilibrium and no arbitrage.

• There are five different term structure models that evolve from three general stochastic differential equations.

• Without market calibration the models produce very different results.

• Both the end user and the developer must be aware of these properties in order to properly implement and interpret any results from the models.

• Even with calibration the models can produce different results. Calibration reduces the differences across the models but does not eliminate them.


NOTES 

1. Ho and Lee (1986). 

2. Kalotay, Williams, and Fabozzi (1993). 

3. See Buetow and Sochacki (2001). 

4. See, for example, Cox, Ross, and Rubinstein 
(1979). 

5. See Burden and Faires (1998). 

6. For complete details see Hull and White 
(1994). 
7. See Fabozzi, Buetow, and Johnson (2012) for more details on the behavior of putable and callable bonds.

8. Details of these phenomena are provided in 
Buetow, Hanke, and Fabozzi (2001). 
Abstract: Portfolio managers and traders need to be able to effectively model the impact of trading costs on their portfolios and trades.

Trading is an integral component of the equity investment process. A poorly executed trade can eat directly into portfolio returns. This is because equity markets are not frictionless, and transactions have a cost associated with them. Costs are incurred when buying or selling stocks in the form of, for example, brokerage commissions, bid-ask spreads, taxes, and market impact costs.

In recent years, portfolio managers have started to more carefully consider transaction costs. The literature on market microstructure, analysis and measurement of transaction costs, and market impact costs on institutional trades is rapidly expanding.1 One way of describing transaction costs is to categorize them in terms of explicit costs such as brokerage and taxes, and implicit costs, which include market impact costs, price movement risk, and opportunity cost. Market impact cost is, broadly speaking, the price an investor has to pay for obtaining liquidity in the market, whereas price movement risk is the risk that the price of an asset increases or decreases from the time the investor decides to transact in the asset until the transaction actually takes place. Opportunity cost is the cost suffered when a trade is not executed. Another way of seeing transaction costs is in terms of fixed costs versus variable costs. Whereas commissions and trading fees are fixed, bid-ask spreads, taxes, and all implicit transaction costs are variable.

Portfolio managers and traders need to be able to effectively model the impact of trading costs on their portfolios and trades. In this entry, we introduce several approaches for the modeling of transaction costs, in particular market impact costs.

MARKET IMPACT COSTS 

The market impact cost of a transaction is the deviation of the transaction price from the market (mid) price2 that would have prevailed had the trade not occurred. The price movement is the cost, the market impact cost, for liquidity. Market impact of a trade can be negative if, for example, a trader buys at a price below the no-trade price (i.e., the price that would have prevailed had the trade not taken place). In general, liquidity providers experience negative costs while liquidity demanders will face positive costs.
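The definition can be written as a signed quantity. The sketch below assumes the cost is expressed as a fraction of the no-trade price and that a positive number means the trader paid for liquidity; the function name and the prices are hypothetical. In practice the no-trade price is not observable and has to be estimated.

```python
def market_impact_cost(side, execution_price, no_trade_price):
    """Signed market impact as a fraction of the no-trade price.

    Positive: the trade was executed at a worse price than the no-trade
    price (a cost). Negative: it was executed at a better price.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    return sign * (execution_price - no_trade_price) / no_trade_price


# A buyer paying above the no-trade price bears a positive cost ...
demander_cost = market_impact_cost("buy", 50.10, 50.00)
# ... while a buyer filled below the no-trade price has a negative cost.
provider_cost = market_impact_cost("buy", 49.95, 50.00)
# A seller receiving less than the no-trade price also bears a positive cost.
seller_cost = market_impact_cost("sell", 49.90, 50.00)
print(demander_cost, provider_cost, seller_cost)
```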

We distinguish between two different kinds of market impact costs, temporary and permanent. Total market impact cost is computed as the sum of the two. The temporary market impact cost is of transitory nature and can be seen as the additional liquidity concession necessary for the liquidity provider (e.g., the market maker) to take the order, inventory effects (price effects due to broker/dealer inventory imbalances), or imperfect substitution (for example, price incentives to induce market participants to absorb the additional shares).

The permanent market impact cost, however, reflects the persistent price change that results as the market adjusts to the information content of the trade. Intuitively, a sell transaction reveals to the market that the security may be overvalued, whereas a buy transaction signals that the security may be undervalued. Security prices change when market participants adjust their views and perceptions as they observe news and the information contained in new trades during the trading day.

Traders can decrease the temporary market impact by extending the trading horizon of an order. For example, a trader executing a less urgent order can buy or sell his or her position in smaller portions over a period and make sure that each portion only constitutes a small percentage of the average volume. However, this comes at the price of increased opportunity costs, delay costs, and price movement risk.
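One simple way to picture this is to split an order so that no portion exceeds a chosen fraction of average daily volume. The sketch below is only an illustration of that idea: the equal-sized slices, the function name, and the 5% participation cap are assumptions, not a method given in the text.

```python
import math


def slice_order(order_shares, avg_daily_volume, max_participation):
    """Split an order into near-equal portions, each at most
    max_participation * avg_daily_volume shares."""
    if order_shares <= 0:
        return []
    # Small tolerance guards against floating-point noise in the product.
    cap = math.floor(avg_daily_volume * max_participation + 1e-9)
    if cap <= 0:
        raise ValueError("participation cap must allow at least one share")
    n_slices = math.ceil(order_shares / cap)
    base, extra = divmod(order_shares, n_slices)
    # Spread any remainder one share at a time over the first slices.
    return [base + (1 if i < extra else 0) for i in range(n_slices)]


slices = slice_order(order_shares=50_001, avg_daily_volume=400_000,
                     max_participation=0.05)
print(len(slices), slices)
```

More slices lower each portion's share of volume, but the order then takes longer to complete, which is the trade-off against opportunity cost, delay cost, and price movement risk noted above.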

Market impact costs are often asymmetric; that is, they are different for buy and sell orders. Several empirical studies suggest that market impact costs are generally higher for buy orders. Nevertheless, while buying costs might be higher than selling costs, this empirical fact is most likely due to observations during rising/falling markets, rather than any true market microstructure effects. For example, a study by Hu shows that the difference in market impact costs between buys and sells is an artifact of the trade benchmark.3 (We discuss trade benchmarks later in this entry.) When a pre-trade measure is used, buys (sells) have higher implicit trading costs during rising (falling) markets. Conversely, if a post-trade measure is used, sells (buys) have higher implicit trading costs during rising (falling) markets. In fact, both pre-trade and post-trade measures are highly influenced by market movement, whereas during- or average-trade measures are neutral to market movement.

Despite the enormous global size of equity markets, the impact of trading is important even for relatively small funds. In fact, a sizable fraction of the stocks that compose an index might have to be excluded or their trading severely limited. For example, RAS Asset Management, which is the asset management arm of the large Italian insurance company RAS, has determined that single trades exceeding 10% of the daily trading volume of a stock cause an excessive market impact and have to be excluded, while trades between 5% and 10% need execution strategies distributed over several days.4 According to RAS Asset Management estimates, in practice funds managed actively with quantitative techniques and with market capitalization in excess of €100 million can operate only on the fraction of the market with average daily trading volume above €5 million, splitting trades over several days for stocks with average daily trading volume in the range from €5 million to €10 million. They can freely operate only on two-thirds of the stocks in the MSCI Europe.

LIQUIDITY AND 
TRANSACTION COSTS 

Liquidity is created by agents transacting in the 
financial markets when they buy and sell securities.
Market makers and broker-dealers do not
create liquidity; they are intermediaries who 
facilitate trade execution and maintain an or¬ 
derly market. 

Liquidity and transaction costs are interrelated.
A highly liquid market is one where large
transactions can be immediately executed with¬ 
out incurring high transaction costs. In an in¬ 
definitely liquid market, traders would be able 
to perform very large transactions directly at 
the quoted bid-ask prices. In reality, partic¬ 
ularly for larger orders, the market requires 
traders to pay more than the ask when buying 
and to receive less than the bid when selling. 
As we discussed previously, this percentage 
degradation of the bid-ask prices experienced 
when executing trades is the market impact 
cost. 

The market impact cost varies with transac¬ 
tion size: The larger the trade size, the larger 
the impact cost. Impact costs are not constant 
in time, but vary throughout the day as traders 
change the limit orders that they have in the 
limit order book. A limit order is a conditional 
order; it is executed only if the limit price or 
a better price can be obtained. For example, a 
buy limit order of a security XYZ at $60 indi¬ 
cates that the assets may be purchased only at 
$60 or lower. Therefore, a limit order is very 
different from a market order, which is an un¬ 
conditional order to execute at the current best 
price available in the market (guarantees exe¬ 
cution, not price). With a limit order, a trader 
can improve the execution price relative to the 
market order price, but the execution is neither 
certain nor immediate (guarantees price, not 
execution). 

Notably, there are many different limit order
types available, such as pegging orders,
discretionary limit orders, immediate-or-cancel
(IOC) orders, and fleeting orders. For example,
fleeting orders are those limit orders
that are canceled within two seconds of submission.
Hasbrouck and Saar find that fleeting
limit orders are much closer substitutes for mar¬ 
ket orders than for traditional limit orders. 5 This 
suggests that the role of limit orders has 
changed from the traditional view of being liq¬ 
uidity suppliers to being substitutes for market 
orders. 


At any given instant, the list of orders sitting 
in the limit order book embodies the liquidity 
that exists at a particular point in time. By ob¬ 
serving the entire limit order book, impact costs 
can be calculated for different transaction sizes. 
The limit order book reveals the prevailing sup¬ 
ply and demand in the market. 6 Therefore, in a 
pure limit order market, we can obtain a mea¬ 
sure of liquidity by aggregating limit buy orders 
(representing the demand) and limit sell orders 
(representing the supply). 7 

We start by sorting the bid and ask prices,
$p_1^{bid}, \ldots, p_k^{bid}$ and $p_1^{ask}, \ldots, p_l^{ask}$ (from the most
to the least competitive), and the corresponding
order quantities $q_1^{bid}, \ldots, q_k^{bid}$ and $q_1^{ask}, \ldots, q_l^{ask}$.
We then combine the sorted bid and ask prices
into a supply and demand schedule according
to Figure 1. For example, the block ($p_2^{bid}$, $q_2^{bid}$)
represents the second-best buy limit order with
price $p_2^{bid}$ and quantity $q_2^{bid}$.

We note that unless there is a gap between the 
bid (demand) and the ask (supply) sides, there 
will be a match between a seller and buyer, 
and a trade would occur. The larger the gap, 
the lower the liquidity and the market par¬ 
ticipants' desire to trade. For a trade of size 
Q, we can define its liquidity as the recipro¬ 
cal of the area between the supply and de¬ 
mand curves up to Q (i.e., the "dotted" area in 
Figure 1). 
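
The liquidity definition above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the order book is given as sorted (price, quantity) pairs, the function names `walk_book` and `liquidity` are hypothetical, and the book levels are made-up numbers. The area between the supply and demand curves up to Q is computed as the cost of buying Q shares against the asks minus the proceeds of selling Q shares against the bids.

```python
def walk_book(levels, size):
    """Total cash exchanged when `size` shares are filled against `levels`.

    `levels` is a list of (price, quantity) pairs already sorted from the
    most to the least competitive price.
    """
    remaining, cash = size, 0.0
    for price, qty in levels:
        fill = min(remaining, qty)
        cash += fill * price
        remaining -= fill
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book is too thin for this trade size")
    return cash


def liquidity(bids, asks, size):
    """Reciprocal of the area between the supply and demand curves up to `size`."""
    area = walk_book(asks, size) - walk_book(bids, size)
    return 1.0 / area


# Hypothetical book: bids sorted high-to-low, asks sorted low-to-high.
bids = [(59.9, 300), (59.8, 500), (59.5, 1000)]
asks = [(60.1, 200), (60.3, 400), (60.6, 1000)]

for q in (200, 500):
    print(q, liquidity(bids, asks, q))
```

In this made-up book the measure falls as the trade size grows, because a larger trade has to reach deeper, less competitive levels on both sides.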

However, few order books are publicly available
and not all markets are pure limit order
markets. In 2004, the New York Stock Exchange 
(NYSE) started selling information on its limit 
order book through its new system called 
the NYSE OpenBook®. The system provides 
an aggregated real-time view of the ex¬ 
change's limit-order book for all NYSE-traded 
securities. 8 

In the absence of a fully transparent limit or¬ 
der book, expected market impact cost is the 
most practical and realistic measure of market 
liquidity. It is closer to the true cost of transact¬ 
ing faced by market participants as compared 
to other measures such as those based upon the 
bid-ask spread. 

Figure 1 The Supply and Demand Schedule of a Security
Source: Figure 1A in Domowitz and Wang (2002, p. 38). 


MARKET IMPACT 
MEASUREMENTS AND 
EMPIRICAL FINDINGS 

The problem with measuring implicit transac¬ 
tion costs is that the true measure, which is 
the difference between the price of the stock in 
the absence of a money manager's trade and the 
execution price, is not observable. Furthermore, 
the execution price is dependent on supply and 
demand conditions at the margin. Thus, the ex¬ 
ecution price may be influenced by competi¬ 
tive traders who demand immediate execution 
or by other investors with similar motives for 
trading. This means that the execution price re¬ 
alized by an investor is the consequence of the 
structure of the market mechanism, the demand 
for liquidity by the marginal investor, and the 
competitive forces of investors with similar mo¬ 
tivations for trading. 

There are many ways to measure transaction 
costs. However, in general this cost is the difference
between the execution price and some
appropriate benchmark, a so-called fair market
benchmark. The fair market benchmark of a secu¬ 
rity is the price that would have prevailed had 
the trade not taken place, the no-trade price. 
Since the no-trade price is not observable, it has 
to be estimated. Practitioners have identified 
three different basic approaches to measure the 
market impact: 9 

1. Pre-trade measures use prices occurring be¬ 
fore or at the decision to trade as the bench¬ 
mark, such as the opening price on the same 
day or the closing price on the previous day. 

2. Post-trade measures use prices occurring af¬ 
ter the decision to trade as the benchmark, 
such as the closing price of the trading day 
or the opening price on the next day. 

3. Same-day or average measures use average 
prices of a large number of trades during 
the day of the decision to trade, such as the 
volume-weighted average price (VWAP) calculated
over all transactions in the security on
the trade day. 10

The volume-weighted average price is calcu¬ 
lated as follows. Suppose that it was a trader's 
objective to purchase 10,000 shares of stock 
XYZ. After completion of the trade, the trade 
sheet showed that 4,000 shares were purchased 
at $80, another 4,000 at $81, and finally 2,000 at 
$82. In this case, the resulting VWAP is (4,000 x 
80 + 4,000 x 81 + 2,000 x 82)/10,000 = $80.80. 
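
As a minimal sketch, the same calculation can be written as a small function. The name `vwap` and the representation of the trade sheet as (shares, price) pairs are assumptions made for illustration.

```python
def vwap(fills):
    """Volume-weighted average price of a list of (shares, price) fills."""
    total_shares = sum(shares for shares, _ in fills)
    total_value = sum(shares * price for shares, price in fills)
    return total_value / total_shares


# The trade sheet from the example: 4,000 at $80, 4,000 at $81, 2,000 at $82.
fills = [(4_000, 80.0), (4_000, 81.0), (2_000, 82.0)]
print(round(vwap(fills), 2))
```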

We denote by $\chi$ the indicator function that
takes on the value 1 or $-1$ if an order is a buy
or sell order, respectively. Formally, we now express
the three types of measures of market impact
(MI) as follows

$$MI_{pre} = \left(\frac{p^{ex}}{p^{pre}} - 1\right)\chi$$

$$MI_{post} = \left(\frac{p^{ex}}{p^{post}} - 1\right)\chi$$

$$MI_{VWAP} = \left(\frac{\sum_{i=1}^{k} V_i \, p_i^{ex} \Big/ \sum_{i=1}^{k} V_i}{p^{pre}} - 1\right)\chi$$

where $p^{ex}$, $p^{pre}$, and $p^{post}$ denote the execution
price, pre-trade price, and post-trade price of
the stock, and $k$ denotes the number of transactions
in a particular security on the trade date.
Using this definition, for a stock with market
impact $MI$ the resulting market impact cost for a
trade of size $V$, $MIC$, is given by

$$MIC = MI \cdot V$$


It is also common to adjust market impact for 
general market movements. For example, the 
pre-trade market impact with market adjust¬ 
ment would take the form 


$$MI_{pre} = \left(\frac{p^{ex}}{p^{pre}} - \frac{p_M^{ex}}{p_M^{pre}}\right)\chi$$

where $p_M^{ex}$ represents the value of the index at
the time of the execution, and $p_M^{pre}$ the value of
the index at the time before the trade.
Market-adjusted market impact for the post-trade and
same-day trade benchmarks are calculated in
an analogous fashion.
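
The measures above differ only in the benchmark price, so one small function can cover them. The following is a sketch, not a reference implementation: the names `market_impact` and `impact_cost`, the argument names, and the prices in the example are all hypothetical. The caller supplies the benchmark (pre-trade, post-trade, or an average price), and passing both index levels subtracts the index return to give the market-adjusted form.

```python
def market_impact(p_ex, p_bench, side, index_ex=None, index_bench=None):
    """Signed market impact relative to a benchmark price.

    side is +1 for a buy and -1 for a sell (the indicator chi in the text).
    If both index levels are given, the index ratio over the same interval
    replaces the constant 1 (the market-adjusted form).
    """
    if side not in (1, -1):
        raise ValueError("side must be +1 (buy) or -1 (sell)")
    market_ratio = 1.0
    if index_ex is not None and index_bench is not None:
        market_ratio = index_ex / index_bench
    return (p_ex / p_bench - market_ratio) * side


def impact_cost(mi, trade_value):
    """Market impact cost MIC = MI * V for a trade of size V."""
    return mi * trade_value


# Hypothetical buy: executed at 50.50 against a pre-trade price of 50.00,
# while the index moved from 1000 to 1005 over the same interval.
mi_pre = market_impact(50.50, 50.00, side=1)
mi_adj = market_impact(50.50, 50.00, side=1, index_ex=1005.0, index_bench=1000.0)
print(mi_pre, mi_adj, impact_cost(mi_pre, 1_000_000))
```

In this made-up example half of the raw pre-trade impact is explained by the general market move, which is the point of the adjustment.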


The above three approaches to measure mar¬ 
ket impact are based upon measuring the fair 
market benchmark of stock at a point in time. 
Clearly, different definitions of market impact 
lead to different results. Which one should be 
used is a matter of preference and is depen¬ 
dent on the application at hand. For example, 
Elkins and McSherry, a financial consulting firm 
that provides customized trading costs and ex¬ 
ecution analysis, calculates a same-day bench¬ 
mark price for each stock by taking the mean 
of the day's open, close, high, and low prices. 
The market impact is then computed as the 
percentage difference between the transaction 
price and this benchmark. However, in most 
cases VWAP and the Elkins McSherry approach 
lead to similar measurements. 11 

As we analyze a portfolio's return over time 
an important question to ask is whether we 
can attribute good/bad performance to invest¬ 
ment profits/losses or to trading profits/losses. 
In other words, in order to better understand 
a portfolio's performance it can be useful to 
separate investment decisions from order execution.
This is the basic idea behind the implementation
shortfall approach suggested by Perold
(1988).

In the implementation shortfall approach, we 
assume that there is a separation between in¬ 
vestment and trading decisions. The portfolio 
manager makes decisions with respect to the in¬ 
vestment strategy (i.e., what should be bought, 
sold, and held). Subsequently, these decisions 
are implemented by the traders. 

By comparing the actual portfolio profit/loss 
(P/L) with the performance of a hypothetical 
paper portfolio in which all trades are made at 
hypothetical market prices, we can get an es¬ 
timate of the implementation shortfall. For ex¬ 
ample, with a paper portfolio return of 6% and 
an actual portfolio return of 5%, the implemen¬ 
tation shortfall is 1%. 
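
A minimal sketch of this comparison for a single buy order follows. It rests on assumptions that are not in the text: the paper portfolio trades at the decision price at no cost, the actual portfolio pays the execution price plus explicit fees, and both returns are expressed relative to the paper investment. The function name and all prices are hypothetical, chosen so that the result reproduces the 6%, 5%, and 1% figures above.

```python
def implementation_shortfall(decision_price, exec_price, end_price, shares, fees=0.0):
    """Paper return minus actual return for a single buy order."""
    capital = decision_price * shares
    paper_return = (end_price - decision_price) * shares / capital
    actual_return = ((end_price - exec_price) * shares - fees) / capital
    return paper_return - actual_return


# Decision at 100, filled at 100.60 with 400 in fees, valued at 106 at period end.
shortfall = implementation_shortfall(100.0, 100.6, 106.0, 1_000, fees=400.0)
print(round(shortfall, 4))
```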

There is considerable practical and academic 
interest in the measurement and analysis of in¬ 
ternational trading costs. Domowitz, Glen, and 
Madhavan (1999) examine international equity 
trading costs across a broad sample of 42 countries
using quarterly data from 1995 to 1998.
They find that the mean total one-way trad¬ 
ing cost is 69.81 basis points. However, there is 
an enormous variation in trading costs across 
countries. For example, in their study the high¬ 
est was Korea with 196.85 basis points whereas 
the lowest was France with 29.85 basis points. 
Explicit costs are roughly two-thirds of total 
costs. However, one exception to this is the 
United States where the implicit costs are about 
60% of the total costs. 

Transaction costs in emerging markets are sig¬ 
nificantly higher than those in more developed 
markets. Domowitz, Glen, and Madhavan ar¬ 
gue that this fact limits the gains of international 
diversification in these countries, explaining in 
part the documented home bias of domestic 
investors. 

In general, they find that transaction costs de¬ 
clined from the middle of 1997 to the end of 
1998, with the exception of Eastern Europe. It is 
interesting to notice that this reduction in trans¬ 
action costs happened despite the turmoil in the 
financial markets during this period. A few ex¬ 
planations that Domowitz et al. suggest are that 
(1) the increased institutional presence has re¬ 
sulted in a more competitive environment for 
broker-dealers and other trading services; (2)
technological innovation has led to a growth 
in the use of low-cost electronic crossing net¬ 
works (ECNs) by institutional traders; and (3) 
soft dollar payments are now more common. 

FORECASTING AND 
MODELING MARKET 
IMPACT 

In this section, we describe a general method¬ 
ology for constructing forecasting models for 
market impact. These types of models are very 
useful in predicting the resulting trading costs 
of specific trading strategies and in devising op¬ 
timal trading approaches. 

Explicit transaction costs are relatively 
straightforward to estimate and forecast. Therefore,
our focus in this section is to develop a
methodology for the implicit transaction costs, 
and more specifically, market impact costs. The 
methodology is a linear factor-based approach 
where market impact is the dependent vari¬ 
able. We distinguish between trade-based and 
asset-based independent variables or forecasting 
factors. 

Trade-Based Factors 

Some examples of trade-based factors include: 

• Trade size 

• Relative trade size 

• Price of market liquidity 

• Type of trade (information or informationless 
trade) 

• Efficiency and trading style of the investor 

• Specific characteristics of the market or the 
exchange 

• Time of trade submission and trade timing 

• Order type 

Probably the most important market impact 
forecasting variables are based on absolute or 
relative trade size. Absolute trade size is of¬ 
ten measured in terms of the number of shares 
traded, or the dollar value of the trade. Relative 
trade size, on the other hand, can be calculated 
as number of shares traded divided by aver¬ 
age daily volume, or number of shares traded 
divided by the total number of shares outstand¬ 
ing. Note that the former can be seen as an 
explanatory variable for the temporary market 
impact and the latter for the permanent market 
impact. In particular, we expect the temporary
market impact to increase as the ratio of trade size
to average daily volume increases because a
larger trade demands more liquidity.

Each type of investment style requires a differ¬ 
ent need for immediacy. 12 Technical trades often 
have to be traded at a faster pace in order to cap¬ 
italize on some short-term signal and therefore 
exhibit higher market impact costs. In contrast, 
more traditional long-term value strategies can 
be traded more slowly. These types of strategies 
can in many cases even be liquidity providing,
which might result in negative market impact 
costs. 

Several studies show that there is a wide vari¬ 
ation in equity transaction costs across differ¬ 
ent countries. 13 Markets and exchanges in each 
country are different, and so are the resulting 
market microstructures. Forecasting variables 
can be used to capture specific market charac¬ 
teristics such as liquidity, efficiency, and insti¬ 
tutional features. 

The particular timing of a trade can affect 
the market impact costs. For example, it ap¬ 
pears that market impact costs are generally 
higher at the beginning of the month as com¬ 
pared to the end of it. 14 One of the reasons for 
this phenomenon is that many institutional in¬ 
vestors tend to rebalance their portfolios at the 
beginning of the month. Because it is likely that 
many of these trades will be executed in the 
same stocks, this rebalancing pattern will in¬ 
duce an increase in market impact costs. The 
particular time of the day at which a trade takes place
also has an effect. Many informed institutional
traders tend to trade at the market open
as they want to capitalize on new information 
that appeared after the market close the day 
before. 

As we discussed earlier in this entry, market 
impact costs are asymmetric. In other words, 
buy and sell orders have significantly differ¬ 
ent market impact costs. Separate models for 
buy and sell orders can therefore be estimated. 
However, it is now more common to construct
a model that includes dummy variables for dif¬ 
ferent types of orders such as buy/sell orders, 
market orders, limit orders, and the like. 

Asset-Based Factors 

Some examples of asset-based factors are: 

• Price momentum 

• Price volatility 

• Market capitalization 

• Growth versus value 

• Specific industry or sector characteristics 


For a stock that is exhibiting positive price mo¬ 
mentum, a buy order is liquidity demanding 
and it is, therefore, likely that it will have higher 
market impact cost than a sell order. 

Generally, trades in high volatility stocks re¬ 
sult in higher permanent price effects. It has 
been suggested by Chan and Lakonishok (1997) 
and Smith et al. (2001) that this is because trades 
have a tendency to contain more information 
when volatility is high. Another possibility is 
that higher volatility increases the probability 
of hitting and being able to execute at the liq¬ 
uidity providers' price. Consequently, liquidity 
suppliers display fewer shares at the best prices 
to mitigate adverse selection costs. 

Large-cap stocks are more actively traded and 
therefore more liquid in comparison to small- 
cap stocks. As a result, market impact cost is 
normally lower for large caps. 15 However, if we
measure market impact costs with respect to 
relative trade size (normalized by average daily 
volume, for instance), they are generally higher. 
Similarly, growth and value stocks have differ¬ 
ent market impact cost. One reason for that is 
related to the trading style. Growth stocks com¬ 
monly exhibit momentum and high volatility. 
This attracts technical traders that are inter¬ 
ested in capitalizing on short-term price swings. 
Value stocks are traded at a slower pace and 
holding periods tend to be slightly longer. 

Different market sectors show different 
trading behaviors. For instance, Bikker and 
Spierdijk (2007) show that equity trades in 
the energy sector exhibit higher market impact 
costs than other comparable equities in nonen¬ 
ergy sectors. 

A Factor-Based Market 
Impact Model 

One of the most common approaches in practice 
and in the literature in modeling market impact 
is through a linear factor model of the form:

$$MI_t = \alpha + \sum_{i=1}^{I} \beta_i x_i + \varepsilon_t$$

where $\alpha$, $\beta_i$ are the factor loadings and $x_i$ are the
factors. Frequently, the error term $\varepsilon_t$ is assumed
to be independently and identically distributed.
Recall that the resulting market impact cost of a
trade of (dollar) size $V$ is then given by $MIC_t =
MI_t \cdot V$. However, extensions of this model including
conditional volatility specifications are
also possible. By analyzing both the mean and
the volatility of the market impact, we can better
understand and manage the trade-off between
the two. For example, Bikker and Spierdijk use
a specification where the error terms are jointly
and serially uncorrelated with mean zero, satisfying

$$\mathrm{Var}(\varepsilon_t) = \exp\left(\gamma + \sum_{j=1}^{J} \delta_j z_j\right)$$

where $\gamma$, $\delta_j$, and $z_j$ are the volatility, factor loadings,
and factors, respectively.

Although the market impact function is linear in
the factors, this of course does not mean that it has to be
linear in the underlying variables. In particular, the factors
in the previous specification can be nonlinear 
transformations of the descriptive variables. 

Consider, for example, factors related to trade 
size (e.g., trade size and trade size to daily vol¬ 
ume). It is well known that market impact is 
nonlinear in these trade size measures. One of 
the earliest studies in this regard was performed 
by Loeb (1983), who showed that for a large 
set of stocks the market impact is proportional 
to the square root of the trade size, resulting 
in a market impact cost proportional to $V^{3/2}$.
Typically, a market impact function linear in 
trade size will underestimate the price impact of 
small- to medium-sized trades whereas larger 
trades will be overestimated. 
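
To illustrate how a nonlinear transformation enters a linear factor model, the sketch below fits market impact against the square root of relative trade size by ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic, the "true" coefficients are arbitrary, the model has a single factor, and `fit_line` is a hypothetical helper rather than a routine from any of the cited studies.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Synthetic sample: impact grows with the square root of relative trade size.
rng = random.Random(0)
true_a, true_b = 0.0005, 0.01
rel_size = [rng.uniform(0.001, 0.2) for _ in range(200)]
mi = [true_a + true_b * math.sqrt(q) + rng.gauss(0.0, 0.0002) for q in rel_size]

# The model stays linear in the factor x = sqrt(relative trade size).
a_hat, b_hat = fit_line([math.sqrt(q) for q in rel_size], mi)
print(a_hat, b_hat)
```

The regression itself is linear; the concavity in trade size comes entirely from the transformed factor.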

Chen, Stanzl, and Watanabe (2002) suggest modeling
the nonlinear effects of trade size (dollar
trade size $V$) in a market impact model by using
the Box-Cox transformation; that is,

$$MI(V_t) = \alpha_b + \beta_b \frac{V_t^{\lambda_b} - 1}{\lambda_b} + \varepsilon_t$$

$$MI(V_\tau) = \alpha_s + \beta_s \frac{V_\tau^{\lambda_s} - 1}{\lambda_s} + \varepsilon_\tau$$

where $t$ and $\tau$ represent the time of transaction
for the buys and the sells, respectively. In their
specification, they assumed that $\varepsilon_t$ and $\varepsilon_\tau$ are
independent and identically distributed with
mean zero and variance $\sigma^2$. The parameters $\alpha_b$,
$\beta_b$, $\lambda_b$, $\alpha_s$, $\beta_s$, and $\lambda_s$ were then estimated from
market data by nonlinear least squares for each
individual stock. We remark that $\lambda_b, \lambda_s \in [0, 1]$
in order for the market impact for buys to be
concave and for sells to be convex.

In their data sample (NYSE and Nasdaq
trades between January 1993 and June 1993),
Chen, Stanzl, and Watanabe report that for
small companies the curvature parameters $\lambda_b$,
$\lambda_s$ are close to zero, whereas for larger companies
they are not far away from 0.5. Observe
that for $\lambda_b = \lambda_s = 1$ market impact is linear in
the dollar trade size. Moreover, when $\lambda_b = \lambda_s
= 0$ the impact function is logarithmic by the
virtue of

$$\lim_{\lambda \to 0} \frac{V^{\lambda} - 1}{\lambda} = \ln(V)$$
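
The Box-Cox transformation and its logarithmic limit are easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the threshold used to switch to the logarithm is an arbitrary choice.

```python
import math


def box_cox(v, lam):
    """Box-Cox transform (v**lam - 1) / lam, with the log limit at lam = 0."""
    if v <= 0:
        raise ValueError("trade size must be positive")
    if abs(lam) < 1e-12:
        return math.log(v)
    return (v ** lam - 1.0) / lam


def box_cox_impact(v, alpha, beta, lam):
    """Mean market impact alpha + beta * BoxCox(v; lam), error term omitted."""
    return alpha + beta * box_cox(v, lam)


# lam = 1 is linear in trade size; lam close to 0 approaches the logarithm.
print(box_cox(100.0, 1.0), box_cox(100.0, 0.5), box_cox(100.0, 1e-6), math.log(100.0))
```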

As just mentioned, market impact is also a 
function of the characteristics of the particu¬ 
lar exchange where the securities are traded 
as well as of the trading style of the investor. 
These characteristics can also be included in the 
general specification outlined previously. For 
example, Keim and Madhavan (1996, 1997) proposed
the following two different market impact
specifications:

1. $MI = \alpha + \beta_1 \chi_{OTC} + \beta_2 \frac{1}{p} + \beta_3 |q| + \beta_4 |q|^2 + \beta_5 |q|^3 + \beta_6 \chi_{Up} + \varepsilon$

where

$\chi_{OTC}$ = a dummy variable equal to one if the
stock is an OTC traded stock or zero
otherwise.

$p$ = the trade price.

$q$ = the number of shares traded over the
number of shares outstanding.

$\chi_{Up}$ = a dummy variable equal to one if
the trade is done in the upstairs 16
market or zero otherwise.

2. $MI = \alpha + \beta_1 \chi_{Nasdaq} + \beta_2 q + \beta_3 \ln(MCap) + \beta_4 \frac{1}{p} + \beta_5 \chi_{Tech} + \beta_6 \chi_{Index} + \varepsilon$

where

$\chi_{Nasdaq}$ = a dummy variable equal to one if
the stock is traded on Nasdaq or
zero otherwise.

$q$ = the number of shares traded over
the number of shares outstanding.

$MCap$ = the market capitalization of the
stock.

$p$ = the trade price.

$\chi_{Tech}$ = a dummy variable equal to one if
the trade is a short-term technical
trade or zero otherwise.

$\chi_{Index}$ = a dummy variable equal to one if
the trade is done for a portfolio that
attempts to closely mimic the behavior
of the underlying index or
zero otherwise.

These two models provide good examples of
how nonlinear transformations of the underlying
independent variables can be used along with
dummy variables that describe specific market 
or trade characteristics. 
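
As a rough sketch of how the first specification turns trade and market characteristics into regressors, the code below builds the factor vector and evaluates the fitted mean impact. The function names are hypothetical and the coefficients are invented for illustration; they are not estimates reported by Keim and Madhavan.

```python
def km_features(is_otc, price, q, is_upstairs):
    """Regressors of specification 1: constant, OTC dummy, 1/p, |q|, |q|^2, |q|^3, upstairs dummy."""
    aq = abs(q)
    return [
        1.0,
        1.0 if is_otc else 0.0,
        1.0 / price,
        aq,
        aq ** 2,
        aq ** 3,
        1.0 if is_upstairs else 0.0,
    ]


def predict_impact(features, coefficients):
    """Mean market impact as the inner product of factors and loadings."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return sum(f * c for f, c in zip(features, coefficients))


# Hypothetical loadings (alpha, beta_1, ..., beta_6), for illustration only.
coefs = [0.001, 0.002, 0.01, 0.5, -2.0, 3.0, -0.0005]
x = km_features(is_otc=True, price=25.0, q=0.01, is_upstairs=False)
mi = predict_impact(x, coefs)
print(mi)
```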

Several vendors and broker-dealers such 
as MSCI Barra 17 and ITG 18 have developed 
commercially available market impact mod¬ 
els. These are sophisticated multimarket mod¬ 
els that rely upon specialized estimation 
techniques using intraday data or tick-by-tick 
transaction-based data. However, the general 
characteristics of these models are similar to the 
ones described in this section. 

We emphasize that in the modeling of trans¬ 
action costs it is important to factor in the ob¬ 
jective of the trader or investor. For example, 
one market participant might trade just to take 
advantage of price movement and hence will 
only trade during favorable periods. This in¬ 
vestor's trading cost is different from that of 
an investor who has to rebalance a portfolio 
within a fixed time period and can therefore 
only partially use an opportunistic or liquidity 
searching strategy. In particular, this investor
has to take into account the risk of not completing
the transaction within a specified time
period. Consequently, even if the market is not 
favorable, this investor may decide to trans¬ 
act a portion of the trade. The market impact 
models described previously assume that or¬ 
ders will be fully completed and ignore this 
point. 

KEY POINTS 

• Trading and execution are integral compo¬ 
nents of the investment process. A poorly 
executed trade can eat directly into portfolio 
returns because of transaction costs. 

• Transaction costs are typically categorized in 
two dimensions: fixed costs versus variable 
costs, and explicit costs versus implicit costs. 

• In the first dimension, fixed costs include 
commissions and fees. Bid-ask spreads, taxes, 
delay cost, price movement risk, market im¬ 
pact costs, timing risk, and opportunity cost 
are variable trading costs. 

• In the second dimension, explicit costs in¬ 
clude commissions, fees, bid-ask spreads, and 
taxes. Delay cost, price movement risk, mar¬ 
ket impact cost, timing risk, and opportunity 
cost are implicit transaction costs. 

• Implicit costs make up the larger part of the 
total transaction costs. These costs are not ob¬ 
servable and have to be estimated. 

• Liquidity is created by agents transacting in 
the financial markets by buying and selling 
securities. 

• Liquidity and transaction costs are interre¬ 
lated: In a highly liquid market, large trans¬ 
actions can be executed immediately without 
incurring high transaction costs. 

• A limit order is an order to execute a trade 
only if the limit price or a better price can be 
obtained. 

• A market order is an order to execute a trade 
at the current best price available in the 
market. 

• In general, trading costs are measured as the 
difference between the execution price and 
some appropriate fair market benchmark.
The fair market benchmark of a security is 
the price that would have prevailed had the 
trade not taken place. 

• Typical forecasting models for market impact 
costs are based on a statistical factor approach 
where the independent variables are trade- 
based factors or asset-based factors. 

NOTES 

1. See, for example, Domowitz, Glen, and 
Madhavan (2001) and Keim and Madhavan 
(1998). 

2. Since the buyer buys at the ask and the seller 
sells at the bid, this definition of market im¬ 
pact cost ignores the bid-ask spread, which 
is an explicit cost. 

3. Hu (2009). 

4. Private communication, RAS Asset Man¬ 
agement. 

5. Hasbrouck and Saar (2008). 

6. Note that even if it is possible to view the en¬ 
tire limit order book it does not give a com¬ 
plete picture of the liquidity in the market. 
This is because hidden and discretionary or¬ 
ders are not included. For a discussion on 
this topic, see Tuttle (2002). 

7. Domowitz and Wang (2002) and Foucault, 
Kadan, and Kandel (2005). 

8. NYSE and Securities Industry Automation 
Corporation, NYSE OpenBook® , Version 1.1 
(New York: 2004). 

9. Collins and Fabozzi (1991) and Chan and 
Lakonishok (1993). 

10. Strictly speaking, VWAP is not the bench¬ 
mark here but rather the transaction type. 

11. See Willoughby (1998) and McSherry 
(1998). 

12. Keim and Madhavan (1997). 

13. See Domowitz, Glen, and Madhavan (2001) 
and Chiyachantana, Jain, Jiang, and Wood 
(2004). 

14. Foster and Viswanathan (1990). 

15. Keim and Madhavan (1998) and Spierdijk, 
Nijman, and van Soest (2003).


16. A securities transaction not executed on the 
exchange but completed directly by a bro¬ 
ker in-house is referred to as an upstairs 
market transaction. Typically, the upstairs 
market consists of a network of trading 
desks of the major brokerages and institu¬ 
tional investors. The major purpose of the 
upstairs market is to facilitate large block 
and program trades. 

17. Torre and Ferrari (1999). 

18. Investment Technology Group (2003). 

Abstract: Monte Carlo simulation has become an essential tool for pricing and risk estimation in
financial applications. It allows finance professionals to incorporate uncertainty in financial models, 
and to consider additional layers of complexity that are difficult to incorporate in analytical models. 
The main idea of Monte Carlo simulation is to represent the uncertainty in market variables through 
scenarios, and to evaluate parameters of interest that depend on these market variables in complex 
ways. The advantage of such an approach is that it can easily capture the dynamics of underlying 
processes and the otherwise complex effects of interactions among market variables. A substantial 
amount of research in recent years has been dedicated to making scenario generation more accurate 
and efficient, and a number of sophisticated computational techniques are now available to the 
financial modeler. 


This entry provides an introduction to Monte 
Carlo simulation and its applications to finance, 
from financial derivative pricing to portfolio risk 
management. We begin with a discussion of the 
main ideas behind simulation and a listing of 
several important areas in finance where sim¬ 
ulation techniques are widely used. We then 
discuss technical issues that are important for 
understanding the advantages and limitations 
of the Monte Carlo simulation technique, such 
as how random numbers are actually gener¬ 
ated, what techniques are used for increasing 
the accuracy of estimates from simulation, and 
what software can be helpful for applications. 

MAIN IDEAS AND 
IMPORTANT CONCEPTS 

Simulation can be most generally defined as 
imitation of real-life systems with the goal of
studying important characteristics of their behavior.
Monte Carlo simulation is named after
the main residential area of the principality
of Monaco, which was well known for its casino.
The term alludes to randomness and process 
repetition, analogous to casino games such as 
roulette. 

The idea of applying Monte Carlo simulation 
to finance arises naturally, given the inherent 
variability in markets and the need for finance 
professionals to evaluate strategies with uncer¬ 
tain outcomes. Consider, for example, a port¬ 
folio manager who would like to estimate the 
effect of a market downturn on the portfolio 
(e.g., if the market goes down by 10%). What 
would be the resulting portfolio value? If the 
portfolio beta is 1, the expected decline in the 
portfolio value will be 10% as well; if the port¬ 
folio beta is 0.9, the portfolio will decline 9% 
if the market declines by 10%. More generally. 
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Discrete Uniform Distribution 



Values 

(a) Discrete Uniform Distribution 

Figure 1 Examples of Probability Distributions 

a portfolio manager may want to assess the 
exposure of a portfolio to a set of risk factors 
suggested by economic theory or empirical evi¬ 
dence such as interest rate changes, commodity 
price changes, exchange rate movements, and 
so on. These risk factors and their interactions 
with each other are not straightforward to eval¬ 
uate. One can imagine that a portfolio manager 
would consider scenarios for possible joint re¬ 
alizations of market variables—for example, in 
a global recession or under favorable monetary 
policy changes—and would assess the change 
to the portfolio value in each of these scenar¬ 
ios. Taking it yet another step further, a port¬ 
folio manager may assign probabilities to the 
different scenarios, thus expressing a view on 
their likelihood of occurring. Assigning proba¬ 
bilities to outcomes produces probability distri¬ 
butions. Examples of probability distributions 
include the discrete uniform distribution (see 
Figure la), which assigns equal probabilities 
to all possible discrete outcomes, and the nor¬ 
mal distribution (Figure lb), which is continu¬ 
ous (defined on a range, as opposed to discrete 
values), and allocates more probability to out¬ 
comes close to the average than to those far from 
the average. 

The example in the previous paragraph il¬ 
lustrates a Monte Carlo simulation system: 
Possibly random inputs (the risk factors) in¬ 
corporating subjective or statistically estimated 
views via probability distributions are en¬ 
tered into an evaluation model (computation 
of change in portfolio value), and the resulting
output (the portfolio change) is not a single
number, but a probability distribution of out¬ 
comes that incorporates characteristics of the 
input probability distributions and their com¬ 
plex interactions. The actual simulation process 
involves generating a certain number of sce¬ 
narios, evaluating the portfolio change for each 
scenario, and obtaining a corresponding set of 
scenarios for the portfolio change. The latter set 
of scenarios can then be analyzed to determine 
most likely outcomes for portfolio change, vari¬ 
ability of estimated portfolio change, range of 
possible outcomes, and the like. One can use 
the simulation output also to estimate any port¬ 
folio risk measure such as value-at-risk (VaR) 
or expected tail loss (ETL). Since VaR has been 
adopted by regulators and is commonly used 
by portfolio managers, we will use VaR in our 
illustrations. When generating scenarios for the 
factors influencing the future value of the port¬ 
folio, it is easy to collect information on possible 
portfolio losses relative to the current value of 
the portfolio in each scenario. Then, the 95% 
VaR, for example, can be computed as the 95th 
percentile of the distribution of portfolio losses 
(see Figure 2). 
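
The VaR calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the single-factor beta model, the normally distributed market return, and all parameter values (`beta`, `value`, `mkt_sd`, the seed) are hypothetical choices made for the example.

```python
import random

def simulate_losses(n_scenarios, beta=0.9, value=100.0,
                    mkt_mean=0.0, mkt_sd=0.05, seed=42):
    """Simulate portfolio losses ($ million) from random market returns.

    Assumption: the portfolio change is beta times the market return,
    and the market return is normally distributed.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        market_return = rng.gauss(mkt_mean, mkt_sd)
        portfolio_change = value * beta * market_return
        losses.append(-portfolio_change)  # a loss is a negative change
    return losses

def value_at_risk(losses, level=0.95):
    """Empirical VaR: the `level` percentile of the sorted losses."""
    ordered = sorted(losses)
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]

losses = simulate_losses(10_000)
var_95 = value_at_risk(losses)
print(f"95% VaR: {var_95:.2f} ($ million)")
```

Under these assumptions the estimate should lie close to 1.645 times the standard deviation of the portfolio change, which gives a simple sanity check on the simulation.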

As another illustration of a simulation model, 
consider the problem of finding the fair price of 
a simple European call option on a stock with 
current stock price $S_t$. If the strike price is K and
the option matures at time T, the option payoff 
at time T can be expressed as 

$$V_T = \max\{S_T - K, 0\}$$

Figure 2 Determining Portfolio VaR from Simulation

According to a fundamental theory in asset
pricing, the fair price of a financial asset
under certain conditions can be meaningfully
estimated as the expected value (equivalently,
as a "probability-weighted average") of the
possible payoffs of the financial asset in different
states of the world in the future. The fair
value of the option at time t will therefore be
the expected value of the discounted payoff:

$$V_t = E\left[e^{-r(T - t)} \max\{S_T - K, 0\}\right]$$

where r is the short-term risk-free rate. 

The expected value in the expression above 
is meaningful only if one can specify a proba¬ 
bility distribution of possible outcomes for the 
future price of the asset. For example, consider 
a European call option on a common stock with 
an exercise price of $20. Assume that the short¬ 
term risk-free rate is 0%, and suppose that the 
stock price at time T can only take the values 
$18, $21, and $23 with (risk-neutral) probabil¬ 
ities 3/6, 2/6, and 1/6, respectively. Then the 
fair price of the option can be computed as the 
weighted average of the payoffs in the three 
possible states of the world: 


$$V = \frac{3}{6}\max\{18 - 20, 0\} + \frac{2}{6}\max\{21 - 20, 0\} + \frac{1}{6}\max\{23 - 20, 0\} = \frac{5}{6} \approx 0.83$$


That is, the fair value of the option is $0.83. 
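
The arithmetic of this three-state example can be checked directly. The short sketch below uses exact fractions so that the probability-weighted average is not affected by rounding; the variable names are illustrative only.

```python
from fractions import Fraction

strike = 20
prices = [18, 21, 23]                                   # possible stock prices at time T
probs = [Fraction(3, 6), Fraction(2, 6), Fraction(1, 6)]  # risk-neutral probabilities

# Call payoff in each state of the world; the risk-free rate is 0%,
# so no discounting is needed.
payoffs = [max(s - strike, 0) for s in prices]

# Probability-weighted average of the payoffs.
fair_value = sum(p * v for p, v in zip(probs, payoffs))
print(fair_value, round(float(fair_value), 2))
```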


Typically, however, the stock price can take 
many more values, and the option price cannot 
be valued exactly. It therefore makes sense to 
generate a large number (e.g., 1,000) of scenar¬ 
ios for the future value of the stock price using 
the risk-neutral probabilities, and average out 
the payoffs to the option. The average obtained 
from the simulation will approximate the true 
expected value of the option. 

The Black-Scholes formula for European op¬ 
tions (Black and Scholes, 1973) is widely used 
in the financial industry. It provides a closed- 
form expression for computing the price of the 
option. The underlying assumption used in the 
derivation of the Black-Scholes formula is that 
the percentage changes in the asset price are in¬ 
crements of a Brownian motion. 1 The evolution 
of the stock price can then be described by the 
equation 


$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t \qquad (1)$$

where $W_t$ is standard Brownian motion, and $\mu$
and $\sigma$ are called "drift" and "volatility" of the
process, respectively.

Equation (1) says that the change in the asset 
price at any time period is determined by two 
components: (1) a drift term that is a fraction of 
the current asset price level, and (2) a "random 
noise" term that assumes that volatility is pro¬ 
portional to the current price level. For techni¬ 
cal reasons (namely, absence of arbitrage), when 
pricing an option on an asset whose movement 
is described by equation (1), the drift $\mu$ is replaced
by the risk-free rate r. The technical details
of equation (1) are not important for our
purposes. The important result is that under the
assumption for the random process followed by
the stock price in (1), the value of the stock price
$S_T$ at time T can be computed as

$$S_T = S_t\, e^{(r - \frac{1}{2}\sigma^2)(T - t) + \sigma\sqrt{T - t}\, w} \qquad (2)$$

where w is a random variable following a normal
distribution with mean 0 and standard deviation
1 (see Figure 1b).

Hence, the option price obtained from the
Black-Scholes formula can be approximated by
simulation if a large number of values for the
normal random variable w are generated, thus
creating scenarios for the stock price $S_T$ at time
T and allowing for computing the discounted
payoffs of the option. Suppose we generate n
scenarios for w: $w_1, \ldots, w_n$. Then, the price of
the European option will be

$$V_t = e^{-r(T - t)} \cdot \sum_{i=1}^{n} \frac{1}{n} \max\left\{ S_t\, e^{(r - \frac{1}{2}\sigma^2)(T - t) + \sigma\sqrt{T - t}\, w_i} - K,\ 0 \right\}$$

Note that the expression above is still a 
weighted average of the payoffs of the option 
in each scenario: the "weight," or the probability
of each scenario, is assumed to be 1/n, since
the scenarios are picked at random, and the fre¬ 
quency of their occurrence already incorporates 
the probability distribution of w. 
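
A minimal sketch of this estimator is given below, compared against the standard closed-form Black-Scholes call price. The parameter values (stock price 100, strike 100, r = 5%, volatility 20%, one year to maturity), the number of scenarios, and the seed are assumptions made for the illustration; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def black_scholes_call(s, k, r, sigma, tau):
    """Standard closed-form Black-Scholes price of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    norm = NormalDist()
    return s * norm.cdf(d1) - k * math.exp(-r * tau) * norm.cdf(d2)

def monte_carlo_call(s, k, r, sigma, tau, n_scenarios, seed=0):
    """Average of discounted payoffs over scenarios generated with equation (2)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * tau
    vol = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n_scenarios):
        w = rng.gauss(0.0, 1.0)                 # standard normal draw
        s_T = s * math.exp(drift + vol * w)     # terminal stock price
        total += max(s_T - k, 0.0)              # call payoff, weight 1/n
    return math.exp(-r * tau) * total / n_scenarios

exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, 100_000)
print(f"closed form: {exact:.4f}, simulation: {estimate:.4f}")
```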

It appears unnecessarily complicated to price 
the option this way, and indeed, in practice sim¬ 
ulation is rarely used for such simple problems. 
There are more complex derivative instruments 
and more sophisticated models for asset price 
behavior; in such cases, it may be simpler to 
generate scenarios and evaluate prices by sim¬ 
ulation than to look for closed-form analytical 
expressions like the Black-Scholes formula. In 
addition, in the case of portfolios and baskets 
of multiple assets, generating joint scenarios for 
multiple securities in simulation can help cap¬ 
ture the otherwise complicated effect of inter¬ 
actions among different risk factors influencing 
the future value of the portfolio or derivative 
instrument. 


How Many Scenarios? 

A simulation may not be able to capture all pos¬ 
sible realizations of uncertainties in the model. 
For instance, consider the European option pric¬ 
ing example above. If the percentage change in 
the stock price is assumed to be the increment 
of a Brownian motion, the possible number of
values for the stock price $S_T$ at time T is infinite.
(This is because the number of values the nor¬ 
mal random variable w can take is infinite—the 
normal distribution has an infinite range.) Thus, 
one could never obtain the exact value of the 
option price by simulation. One can, however, 
get close. The accuracy of the estimation will 
depend on the number of generated scenarios. 
If the scenario generation is truly random, then 
the standard error (the "variability") in the es¬ 
timate of the average will be 

$$\frac{s}{\sqrt{n}}$$

where s is the standard deviation of the sim¬ 
ulated discounted option payoffs, and n is the 
number of scenarios. This result follows from 
the central limit theorem (CLT). This theorem
states that if a sample of n independent and
identically distributed observations is drawn
from a distribution with mean $\mu$ and finite standard
deviation $\sigma$, then, for large n, the sample mean
(which is an estimate of the true distribution
mean $\mu$) is approximately normally distributed
around $\mu$ with standard deviation
$\sigma/\sqrt{n}$, regardless of the shape of the original
distribution. The fact that the distribution
standard deviation $\sigma$ in the CLT can
be replaced by the sample standard deviation
s follows from additional theoretical results on
the convergence of s to $\sigma$ as the
number of observations grows large.

Hence, to double the accuracy of estimating 
the mean of the output distribution, one would 
have to quadruple the number of scenarios. This 
can get expensive computationally, especially 
in more complicated multistage situations. Fortunately,
there are modern methods for generating
random numbers and scenarios that can
help reduce the computational burden. 
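
The square-root relationship between accuracy and the number of scenarios can be observed numerically. The sketch below, which reuses the European call example with assumed parameter values and an arbitrary seed, computes the standard error for n and for 4n scenarios; the ratio of the two should be close to 2.

```python
import math
import random
import statistics

def discounted_call_payoffs(n, s=100.0, k=100.0, r=0.05, sigma=0.2,
                            tau=1.0, seed=1):
    """Generate n discounted call payoffs under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * tau
    vol = sigma * math.sqrt(tau)
    discount = math.exp(-r * tau)
    return [discount * max(s * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - k, 0.0)
            for _ in range(n)]

def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

se_n = standard_error(discounted_call_payoffs(10_000))
se_4n = standard_error(discounted_call_payoffs(40_000))
ratio = se_n / se_4n
print(f"s/sqrt(n): {se_n:.4f}, s/sqrt(4n): {se_4n:.4f}, ratio: {ratio:.2f}")
```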

While the average output from a simulation 
is important, it is often not the only quantity of 
interest, something that practitioners tend to 
forget when using simulation to value complex
financial instruments. For example, as
mentioned earlier, in assessing the risk of a
portfolio, a portfolio manager may be inter¬ 
ested in the percentiles of the distribution of 
outputs (VaR for portfolios) or the worst-case 
and best-case scenarios. Unfortunately, it is not 
as straightforward to determine the accuracy 
of those estimates from a simulation. There are 
some useful results from probability theory that 
apply. 2 However, the general question of how 
many scenarios one should generate to get a 
good representation of the output distribution 
does not have an easy answer. This issue is com¬ 
plicated further by the fact that results from 
probability theory do not necessarily apply to 
many of the scenario-generating methods used 
in practice, which do not simulate "truly ran¬ 
dom" samples of observations, but instead use 
smarter methods that reduce the number of 
scenarios needed to achieve good estimate ac¬ 
curacy. We will discuss some such methods 
later in this entry. 

Estimator Bias 

The statistical concept of estimator bias is 
important in simulation applications because 
it shows whether an estimator estimates the 
"right thing" on average (that is, whether it ap¬ 
proaches the true parameter one needs to esti¬ 
mate given a sufficient number of replications). 
For example, the average obtained from a sam¬ 
ple of scenarios is an unbiased estimator of the 
true expected value. Depending on the way sce¬ 
narios are generated, however, one may intro¬ 
duce a bias in the estimate of the parameter of 
interest. 

Suppose, for example, that one generates sce¬ 
narios for the future asset price in the option 
pricing example introduced earlier in this entry, 
but instead of the formula describing the evolu¬ 
tion of the asset price in continuous time (equa¬ 
tion (2)), one divides the time between now and 
the maturity of the option into small intervals 
of length h and uses a "discrete-time" formula 
[based on equation (1)] to approximate the stock 
price at each time period between t and T, compiling
the changes to obtain the final asset price
at the maturity of the option. 

Simulating the asset price in this manner will 
generate a bias in the estimate of the expected 
present value of the option, because the sim¬ 
ulated changes in the asset price along the 
way are not continuous or instantaneous, but 
happen over a fixed-length time interval. This 
kind of bias is referred to as "discretization er¬ 
ror bias." Of course, in the case of geometric 
Brownian motion with fixed drift and volatility 
described by equation (1) one can obtain an un¬ 
biased estimator of the average option payoff 
by simulating the future asset price with the 
continuous-time formula (2). However, in many 
instances it is not possible to find such a closed- 
form expression for the future asset price; for 
example, such a formula does not exist when 
the volatility $\sigma$ in the random process for the
asset price is time-dependent, or when one uses 
a mean-reversion process to describe the evolu¬ 
tion of the underlying price. In such cases, one 
can reduce the time interval length h to reduce 
the bias, but it is important to keep in mind that 
reducing the time interval length increases the 
number of steps necessary to create a scenario 
for the future asset price, and becomes compu¬ 
tationally expensive. 
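
The contrast between the two ways of generating scenarios can be sketched as follows. The discrete-time rule used here is the Euler scheme, which is one common way of discretizing equation (1) and is an assumption of this example, as are the parameter values, the number of steps, and the seed. Both pricing runs use the same random draws, so the difference between the two estimates mostly reflects the discretization rather than sampling noise.

```python
import math
import random

def exact_step(s, r, sigma, h, w):
    """One step of length h using the continuous-time solution, as in equation (2)."""
    return s * math.exp((r - 0.5 * sigma ** 2) * h + sigma * math.sqrt(h) * w)

def euler_step(s, r, sigma, h, w):
    """One step of length h using the Euler discretization of equation (1)."""
    return s * (1.0 + r * h + sigma * math.sqrt(h) * w)

def price_call(step, s0, k, r, sigma, maturity, n_steps, n_paths, seed=7):
    """Price a European call by stepping each path forward n_steps times."""
    rng = random.Random(seed)
    h = maturity / n_steps
    total = 0.0
    for _path in range(n_paths):
        s = s0
        for _step in range(n_steps):
            s = step(s, r, sigma, h, rng.gauss(0.0, 1.0))
        total += max(s - k, 0.0)
    return math.exp(-r * maturity) * total / n_paths

args = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, maturity=1.0,
            n_steps=4, n_paths=20_000)
euler_price = price_call(euler_step, **args)
exact_price = price_call(exact_step, **args)
print(f"Euler: {euler_price:.4f}, exact stepping: {exact_price:.4f}")
```

Increasing `n_steps` shrinks the gap between the two estimates, at the cost of proportionally more random draws per path.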

Estimator Efficiency 

If there are two ways to obtain an estimate of 
a quantity of interest and the estimators are 
otherwise equivalent in terms of bias, which 
estimator should be preferable; that is, which 
estimator is more "efficient"? Statistical theory 
states that one should prefer the estimator with 
the smaller standard deviation, because it is 
more accurate. For example, consider two unbi¬ 
ased estimators, both of which are obtained as 
averages from a sample of independent repli¬ 
cations. Their standard errors will be given 
by 


$$\frac{s_1}{\sqrt{n_1}} \quad \text{and} \quad \frac{s_2}{\sqrt{n_2}}$$

where $s_1$ and $s_2$ are the standard deviations from
the samples of scenarios, and $n_1$ and $n_2$ are the
number of scenarios for each of the estimators.

In the case of simulation, statistical concepts 
frequently need to be extended to include nu¬ 
merical and computational considerations. For 
example, suppose that it takes longer to gen¬ 
erate the scenarios for the estimator with the 
smaller standard deviation. Is that estimator 
still preferable, given that one can use the ex¬ 
tra time to generate additional scenarios for the 
other estimator, thus reducing the latter esti¬ 
mator's standard error? It is natural (and the¬ 
oretically justified) to modify the measure of 
variability and efficiency so that it includes a 
concept of time. If $\tau_1$ and $\tau_2$ are the times it
takes to generate one scenario for each of the
two estimators, then one should select the estimator
with the smaller of the time-adjusted
standard deviations $s_1\sqrt{\tau_1}$ and $s_2\sqrt{\tau_2}$.
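
The reasoning behind the time adjustment is that, with a fixed computing budget, an estimator that needs time τ per scenario can afford a number of scenarios proportional to 1/τ, so its standard error is proportional to s√τ. A small sketch of the comparison follows; the function names and the numbers are hypothetical.

```python
import math

def time_adjusted_sd(s, tau):
    """Standard deviation per scenario scaled by the square root of the time per scenario."""
    return s * math.sqrt(tau)

def preferred_estimator(s1, tau1, s2, tau2):
    """Return 1 or 2, whichever estimator has the smaller time-adjusted standard deviation."""
    return 1 if time_adjusted_sd(s1, tau1) <= time_adjusted_sd(s2, tau2) else 2

# Estimator 1 is less variable per scenario (2.0 versus 3.0) but four times
# slower, so estimator 2 makes better use of the same computing time.
choice = preferred_estimator(s1=2.0, tau1=4.0, s2=3.0, tau2=1.0)
print(choice)
```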


FINANCIAL APPLICATIONS 
OF SIMULATION 

Simulation has become an important staple in 
a financial modeler's toolbox. This section lists 
some important examples of simulation appli¬ 
cations in finance. 

Financial Derivative Pricing 

The use of Monte Carlo simulation in derivative 
pricing dates back to Boyle (1977). Although 
the technique is not widely used for pricing of 
European-style securities with a single under¬ 
lying stochastic variable, it is helpful for pricing 
European-style securities with multiple un¬ 
derlying stochastic variables, path-dependent 
options, such as Asian and American options, 
as well as basket options, where correlations 
between assets need to be taken into consid¬ 
eration. Additional examples of Monte Carlo 
simulation applications in financial derivative 
pricing include options on the spread between
two assets, barrier options, and quantos, whose
payoff depends both on a stock price and an ex¬ 
change rate. We already described a simple ex¬ 
ample of pricing a European call by simulation. 
In this section, we discuss further simulation 
issues in the context of pricing Asian options. 

The value of an Asian option is determined by 
the average price of the underlying asset either 
continuously over the time to maturity or at a 
prespecified set of monitoring dates $t_1, \ldots, t_T$.
In particular, the payoff of an Asian call option 
is 

$$V_T = \max\{S_{\text{average}} - K, 0\}$$

Thus, to price the option, one needs informa¬ 
tion not only on the value of the asset at time 
T, but also on the possible paths the asset could 
take to reach its terminal value. If the percent¬ 
age change in the underlying asset price S is 
assumed to be the increment of a Brownian mo¬ 
tion and if the average is computed as a geomet¬ 
ric (as opposed to an arithmetic) average, there 
are analytical formulas for pricing continuous-time
Asian options. However, there are no exact
formulas in the case of discrete monitoring
dates or different assumptions on the process 
followed by the asset price. 

To price the option by simulation, one would 
generate possible paths for the underlying asset 
price. Let $S_{t_i}^{(j)}$ be the simulated asset price at
time $t_i$, $i = 1, \ldots, T$, for path $j$, $j = 1, \ldots, n$.

For example, if the percentage change in the 
underlying asset price S is assumed to be the 
increment of a Brownian motion, then the asset 
price at time $t_1$ can be simulated given the asset
price at time 0 as

$$S_{t_1}^{(j)} = S_0\, e^{(r - \frac{1}{2}\sigma^2)(t_1 - 0) + \sigma\sqrt{t_1 - 0}\, w_0}$$

where, as defined earlier, $w_0$ is a random variable
following a normal distribution with mean
0 and standard deviation 1 (the subscript "0"
stands for the fact that this realization of w is
for the time period $(0, t_1]$). Having generated
a realization of $S_{t_1}^{(j)}$, one can simulate a possible
value for $S_{t_2}^{(j)}$ by using the formula

$$S_{t_2}^{(j)} = S_{t_1}^{(j)}\, e^{(r - \frac{1}{2}\sigma^2)(t_2 - t_1) + \sigma\sqrt{t_2 - t_1}\, w_1}$$

and generating a realization of the normal random
variable $w_1$. After repeating this T times,
one has generated a path for the asset price. Averaging
the (properly discounted) option payoff
over n paths produces an estimate of the fair
price of the Asian option.
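
The path-generation procedure can be sketched as follows for an Asian call with an arithmetic average over discrete monitoring dates. The monthly monitoring schedule, the parameter values, and the seed are assumptions made for the illustration, and the function name is hypothetical.

```python
import math
import random

def asian_call_price(s0, k, r, sigma, monitoring_times, n_paths, seed=3):
    """Estimate the price of an arithmetic-average Asian call by simulation."""
    rng = random.Random(seed)
    maturity = monitoring_times[-1]
    total = 0.0
    for _path in range(n_paths):
        s, t_prev, path_sum = s0, 0.0, 0.0
        for t in monitoring_times:
            dt = t - t_prev
            w = rng.gauss(0.0, 1.0)  # fresh normal draw for each time period
            s *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * w)
            path_sum += s
            t_prev = t
        average = path_sum / len(monitoring_times)
        total += max(average - k, 0.0)
    # Discount the average payoff back from maturity.
    return math.exp(-r * maturity) * total / n_paths

monthly = [i / 12 for i in range(1, 13)]  # twelve monitoring dates over one year
asian = asian_call_price(100.0, 100.0, 0.05, 0.2, monthly, 20_000)
print(f"Asian call estimate: {asian:.4f}")
```

Because averaging dampens the variability of the underlying price, the estimate should come out below the price of the corresponding European call.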

The simulation process makes it easy to cal¬ 
ibrate model parameters to observed market 
factors and to incorporate additional layers of 
modeling complexity. For example, suppose 
that at time 0 one observes a term structure of
zero-bond prices $B(0, t_1), \ldots, B(0, t_T)$ that is not
necessarily consistent with a single interest rate
r. In other words, one cannot find a short rate r
such that

$$B(0, t_i) = e^{-r t_i}$$

for all intermediate time periods $t_i$. It would
be difficult to correct for this in a closed-form
formula such as the Black-Scholes formula for
European options. However, the correction can
be easily implemented in the simulation: one
only needs to simulate future asset prices at
each intermediate time period as

$$S_{t_{i+1}}^{(j)} = S_{t_i}^{(j)}\, \frac{B(0, t_i)}{B(0, t_{i+1})}\, e^{-\frac{1}{2}\sigma^2 (t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\, w_i}$$

Similarly, if one observes forward prices
$F(0, t_1), \ldots, F(0, t_T)$ on the underlying asset,
one can obtain a more accurate representation
of the possible scenarios in the simulation by
using the formula

$$S_{t_{i+1}}^{(j)} = S_{t_i}^{(j)}\, \frac{F(0, t_{i+1})}{F(0, t_i)}\, e^{-\frac{1}{2}\sigma^2 (t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\, w_i}$$

The complexity of the pricing model can be increased
further by incorporating realistic models
for the volatility $\sigma$. The simulation technique
therefore has a tremendous modeling potential. 
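
As a consistency check on the bond-price correction, note that when the term structure is flat the ratio of zero-bond prices equals the growth factor of the single rate r, so the calibrated step reduces to the ordinary one. The sketch below verifies this numerically; the function names and inputs are hypothetical.

```python
import math

def step_with_bonds(s, bond_now, bond_next, sigma, dt, w):
    """Advance the price one period using the ratio of observed zero-bond prices."""
    growth = bond_now / bond_next
    return s * growth * math.exp(-0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * w)

def step_flat_rate(s, r, sigma, dt, w):
    """Advance the price one period using a single interest rate r."""
    return s * math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * w)

r, t_i, t_next = 0.05, 0.25, 0.50
dt = t_next - t_i
w = 0.3  # an arbitrary fixed value standing in for a normal draw

flat = step_flat_rate(100.0, r, 0.2, dt, w)
# Bond prices implied by the same flat rate.
curve = step_with_bonds(100.0, math.exp(-r * t_i), math.exp(-r * t_next), 0.2, dt, w)
print(flat, curve)
```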


Estimating Sensitivities 

For trading, hedging, and risk management 
purposes, the estimation of the sensitivity of 
derivative prices to different inputs is some¬ 
times even more critical than the estimation 
of the prices themselves. These sensitivity 
measures are popularly referred to as the 
"Greeks" because each sensitivity measure is 
traditionally denoted by a Greek letter. A natu¬ 
ral way to think of evaluating the sensitivity of 
a derivative price to a change in an underlying 
parameter is to use Monte Carlo simulation to 
compute the price of the derivative, and then 
use Monte Carlo simulation again to compute 
the price of the derivative if the input param¬ 
eter is changed by a small amount h. This kind 
of estimation (referred to as a "finite-difference 
method"), however, presents both theoretical 
and practical challenges. On the theoretical 
side, finite difference methods frequently result 
in a large amount of bias. On the practical side, 
the amount of computation required for the 
estimation of the sensitivity is large (double 
the amount of computation used in the pricing 
of the derivative), and can become prohibitive 
if this computation is done in the context of 
evaluating the sensitivity of a whole portfolio 
of securities to changes in underlying factors. 

In specific circumstances, the computational 
burden can be reduced by finding an expression 
for the Greek variable of interest that can be cal¬ 
culated as a by-product when paths are gener¬ 
ated in a single simulation. Such expressions ex¬ 
ist when computing the Black-Scholes delta or 
the delta of an Asian option. 3 These methods are 
referred to as "pathwise methods"—namely, 
the evolution of the underlying model over 
paths is differentiated, and the parameter with 
respect to which the change is computed is 
treated as a parameter of that evolution. For example,
consider the delta (denoted by $\Delta$) for an
option price calculated with the Black-Scholes 
formula, where delta is defined as the (math¬ 
ematical) derivative of the option value with 
respect to the value of the underlying asset.

Table 1 Scenarios for Portfolio VaR Estimation

    Scenario   Market       Market       ...   Market       Change in Portfolio
               Variable 1   Variable 2         Variable m   Value ($ million)
    1          3.54         21.54        ...   0.17         100.32
    2          3.27         22.03        ...   0.18         101.54
    n          3.83         22.32        ...   0.15         100.87

To calculate the value of delta, one would generate
n paths for the evolution of the asset price, and 
keep track of the paths in the simulation that 
end up in-the-money. Let the sum of the asset 
prices at the end of all in-the-money paths be 
Q. Then, the delta at time t can be computed as

$$\Delta_t = e^{-r(T - t)} \cdot \frac{Q}{S_t \cdot n}$$
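
The pathwise calculation can be sketched as follows and compared with the standard closed-form Black-Scholes delta, N(d1). The discount factor applied to Q follows from differentiating the discounted payoff along each path. All parameter values, the number of paths, and the seed are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

def pathwise_delta(s, k, r, sigma, tau, n_paths, seed=11):
    """Estimate the call delta from the in-the-money terminal prices of one simulation."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * tau
    vol = sigma * math.sqrt(tau)
    q = 0.0  # running sum of terminal prices over in-the-money paths
    for _ in range(n_paths):
        s_T = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if s_T > k:
            q += s_T
    return math.exp(-r * tau) * q / (s * n_paths)

def black_scholes_delta(s, k, r, sigma, tau):
    """Standard closed-form delta of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return NormalDist().cdf(d1)

estimated = pathwise_delta(100.0, 100.0, 0.05, 0.2, 1.0, 50_000)
closed_form = black_scholes_delta(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"pathwise: {estimated:.4f}, closed form: {closed_form:.4f}")
```

No second simulation with a bumped input is required, which is the computational advantage over the finite-difference approach.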

More recently, efficient estimators for sensitiv¬ 
ity from simulation trials have been developed 
based on Malliavin calculus. 4 

Portfolio Risk Management 

Earlier, we mentioned the importance of sim¬ 
ulation for portfolio risk measurement and 
management. We now explain the simulation 
procedure in more detail. 

To estimate the portfolio VaR, for example, 
one would generate n possible scenarios for the 
possible changes in m market variables that in¬ 
fluence the change in the portfolio value, and 
compute the change in portfolio value in each 
scenario (see Table 1). Sometimes historical data 
are used to create the scenarios, but typically 
the scenarios are generated in a more sophis¬ 
ticated manner. The changes in the portfolio 
value are then sorted, and the 95% VaR, for ex¬ 
ample, can be computed as the 5th percentile of 
the so-obtained empirical distribution of port¬ 
folio value changes. (This is equivalent to com¬ 
puting the 95% VaR as the 95th percentile of the 
empirical distribution of future portfolio losses,
as illustrated in Figure 2.) 

While this standard Monte Carlo simulation 
procedure is comprehensive, it can be very slow, 
especially when the portfolio contains complex 
derivative securities whose changes in value
must be reevaluated in every scenario for the
market variables. In fact, the portfolio VaR cal¬ 
culation by simulation involves a number of 
"subsimulations" evaluating the sensitivities 
of the securities in the portfolios to each of 
the market variables. For large portfolios, the 
computational cost of generating each scenario 
for the change in portfolio value can become 
prohibitive. 

In practice, several approaches are used 
to speed up the calculation of VaR. One of 
the earliest approaches, popularized by JP 
Morgan's RiskMetrics software in the 1990s, is 
to assume that all changes in market variables 
are normally distributed. If the portfolio value 
is a linear function of these market variables 
(this happens, e.g., when the portfolio contains 
equities and factor models are used to represent 
the changes in asset value relative to changes 
in market variable values), then the change in 
portfolio value is also normally distributed, and 
can be computed in closed form, by expressing
the VaR as a multiple of the standard deviation
of the change in portfolio value, which is determined
by the variances and covariances of the market variables. This
approach does not necessarily have to involve 
simulation, and actually works reasonably well 
for large equity-only portfolios that contain liq¬ 
uid assets, because the empirical distributions 
of their returns can be indeed very close to 
normal. However, it can grossly underestimate 
the true portfolio VaR when the portfolio con¬ 
tains complex derivatives (which are nonlinear 
functions of the returns on the underlying 
market variables) or fixed income securities 
(which depend nonlinearly on interest rates). 
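
For a portfolio whose value is linear in normally distributed market variables, the closed-form calculation can be sketched as below. The exposures, the covariance matrix, and the assumption that the expected change in each market variable is zero are hypothetical choices for this example, not values from any actual risk system.

```python
import math
from statistics import NormalDist

def parametric_var(exposures, covariance, level=0.95):
    """VaR of a linear portfolio with zero-mean, jointly normal market variables.

    exposures[i] is the change in portfolio value per unit change in
    market variable i; covariance is the covariance matrix of the changes.
    """
    m = len(exposures)
    variance = sum(exposures[i] * covariance[i][j] * exposures[j]
                   for i in range(m) for j in range(m))
    # The VaR is a normal quantile multiplied by the portfolio standard deviation.
    return NormalDist().inv_cdf(level) * math.sqrt(variance)

exposures = [10.0, -5.0]
covariance = [[0.04, 0.01],
              [0.01, 0.09]]
var_95 = parametric_var(exposures, covariance)
print(f"95% VaR: {var_95:.4f}")
```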

The nonlinearity can be partially incorpo¬ 
rated in the estimate of the change in portfolio 
value by using second-order information.
This is the so-called "Delta-Gamma" or quadratic approximation
to the change in portfolio value. 5 In
other words, the change in portfolio value is 
expressed not only through the changes in the 
values of the market variables, but also through 
the changes in the market variables squared 
and scaled by their so-called Hessian matrix. 
(From a mathematical perspective, this is a 
multidimensional Taylor expansion involving 
the Greeks of the different securities in the 
portfolio.) Since traders of complex derivatives 
often have to keep track of this information 
for their own risk management purposes, the 
portfolio risk management process amounts to 
disciplined accumulation of information that 
is already available. This method is only an 
approximation, but it can reduce substantially 
the time for computing the portfolio VaR. 


Valuing Mortgage-Backed Securities 

Monte Carlo methods are often used for valu¬ 
ing mortgage-backed securities (MBSs) such 
as collateralized mortgage obligations (CMOs) 
and stripped MBSs (mortgage strips). The cash 
flows for such products can be calculated using 
different pricing models. The highly uncertain 
terms in those cash flow models, such as the 
behavior of interest rates and the expected pre¬ 
payments over the life of the MBS, are often 
simulated to determine the expected cash flows 
to the MBS holder, which then provide the sam¬ 
ple average ("fair") value for the MBS. 


Valuing Credit-Risky Securities 

Similar ideas to those for pricing CMOs are 
used for pricing collateralized debt obligations 
(CDOs), which employ securitization to pack¬ 
age credit-risky debt obligations (bonds and 
loans) in ways analogous to the way mortgages 
are packaged in CMOs. In order to price the 
CDO, one needs to simulate the defaults of dif¬ 
ferent bond issuers in the collection. 6 


Simulation is also used for pricing other 
credit-risky instruments, such as first-to-default 
baskets and basket default swaps. 7 The simula¬ 
tion techniques applied in such cases can be 
quite advanced, as credit defaults are consid¬ 
ered "rare events" and need to be modeled with 
care. We will discuss the main ideas of simula¬ 
tion modeling techniques for rare events, such 
as importance sampling, later in this entry. 


RANDOM NUMBER 
GENERATION 

At the core of Monte Carlo simulation is the 
generation of random numbers. In fact,
generating random numbers from a wide vari¬ 
ety of distributions reduces to generating ran¬ 
dom numbers on the unit interval from 0 to 1 
uniformly, that is, generating random numbers 
on the interval [0,1] in such a way that each 
value between 0 and 1 is equally likely to oc¬ 
cur. Many computer languages and software 
packages have a command for generating a 
random number between 0 and 1: "=RAND()" 
in Microsoft Excel, "rand(1)" in MATLAB and
FORTRAN, and "rand()" in C++. 

From a Uniform Random Variable to 
a Variable from an Arbitrary 
Distribution 

The most common method for converting a ran¬ 
dom number between 0 and 1 to a number 
from an arbitrary probability distribution is to 
evaluate the so-called "inverse" of the cumula¬ 
tive probability distribution function at the ran¬ 
dom number between 0 and 1. The idea works 
because the total mass for a probability distri¬ 
bution is always 1, and the cumulative proba¬ 
bility for any value of the distribution (defined 
as the probability that this particular value or 
any value below it will occur) is always be¬ 
tween 0 and 1. For example, suppose that one 
would like to generate a random number from 
the normal distribution in Figure 1b. Suppose
the =RAND() command in Excel returns the
number 0.975. The next step is to look for a corresponding
random number from a normal distribution
so that 97.5% of the probability mass
(the area under the probability density curve)
is to the left of that number. In Excel in particular,
the function '=NORMINV(RAND(), mean,
standard deviation)' can be used to find that
random number on the x-axis of a normal distribution
with the specified mean and standard
deviation.
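
The same inversion can be sketched in Python, where `NormalDist.inv_cdf` from the standard library plays the role of NORMINV. The helper names and the seed are illustrative assumptions. A uniform draw of exactly 0 is redrawn, because the inverse of the normal cumulative distribution is not finite there.

```python
import random
from statistics import NormalDist

def normal_from_uniform(u, mean=0.0, sd=1.0):
    """Map a number u in (0, 1) to the normal value with cumulative probability u."""
    return NormalDist(mean, sd).inv_cdf(u)

def uniform_open(rng):
    """Draw a uniform number strictly between 0 and 1."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

# The value with 97.5% of the probability mass to its left.
x = normal_from_uniform(0.975)

rng = random.Random(5)
draws = [normal_from_uniform(uniform_open(rng)) for _ in range(5)]
print(round(x, 4), draws)
```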

"Inverting" the cumulative probability distribution
is trickier for discrete probability distributions,
but the idea still applies. For example,
suppose that given a random number genera¬ 
tor on the interval [0,1], one would like to simu¬ 
late values for a random variable that takes the 
value 5 with probability 50%, the value 15 with 
probability 30%, and the value 35 with proba¬ 
bility 20%. Let us split the unit interval [0,1] into 
three intervals based on the cumulative proba¬ 
bilities 50%, 80% and 100% for obtaining the 
values 5,15, and 35: [0,0.5], (0.5,0.8], and (0.8,1]. 
If the random number that is drawn falls in 
the interval [0,0.5] (which happens 50% of the 
time if the number generator is truly random), 
then one records a value of 5 for that trial. If 
the random number is in the interval (0.5, 0.8]
(which happens with probability 30%), then one 
records a value of 15 for that trial. Finally, if the 
random number is in the third interval (which 
happens with probability 20%), one records a 
value of 35. Thus, if many trials are run, the 
values 5,15, and 35 are generated with the de¬ 
sired probabilities. In Excel, one can simulate 
these values with the corresponding probabili¬ 
ties by creating a table with the interval ranges 
in the first two columns, and the corresponding 
values (5, 15, and 35) in the third column, and 
using the Excel function 

VLOOKUP(lookup_value, table_array, col_index_num)

to look up the range in which a number gener¬ 
ated with RAND() falls. 8 
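
The interval lookup can also be sketched in Python, with a binary search over the cumulative probabilities taking the place of the lookup table. The variable names and the seed are illustrative assumptions.

```python
import bisect
import random

values = [5, 15, 35]
cumulative = [0.5, 0.8, 1.0]  # cumulative probabilities 50%, 80%, 100%

def discrete_from_uniform(u):
    """Return the value whose interval [0,0.5], (0.5,0.8], or (0.8,1] contains u."""
    return values[bisect.bisect_left(cumulative, u)]

rng = random.Random(8)
trials = [discrete_from_uniform(rng.random()) for _ in range(100_000)]
frequencies = {v: trials.count(v) / len(trials) for v in values}
print(frequencies)
```

Over many trials the observed frequencies should be close to the specified probabilities of 50%, 30%, and 20%.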


What Defines a "Good" Random 
Number Generator? 

Given the discussion in the previous section, 
generating "good" uniform random numbers 
on [0,1] is critical for the performance of 
simulation algorithms. Interestingly, defining 
"good" random number generation is not as 
straightforward as it appears. Early random 
number generators tried to use "truly random" 
events for random number generation, such as 
the amount of background cosmic radiation. In 
practice, however, this kind of random number 
generation is time consuming and difficult. 
Moreover, it was realized that the ability to 
reproduce the random number sequence and 
to analyze the random number characteristics 
is actually a desirable property for random 
number generators. In particular, the ability to 
reproduce a sequence of random numbers al¬ 
lows for reducing the variance of estimates and 
for debugging computer code by rerunning 
experiments in the same conditions in which 
they were run in previous iterations of code 
development. 
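
Reproducibility of this kind can be demonstrated with any generator that is initialized from a fixed starting value. The sketch below uses Python's built-in generator, which is an assumption of the example rather than a generator discussed in this entry: the same starting value yields the same stream, and a different one yields a different stream.

```python
import random

def uniform_stream(seed, count):
    """Return `count` pseudo-random numbers in [0, 1) from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

first = uniform_stream(2024, 5)
second = uniform_stream(2024, 5)   # same starting value: identical numbers
other = uniform_stream(2025, 5)    # different starting value: different numbers
print(first == second, first == other)
```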

Most simulation software products employ 
random number generation algorithms that 
produce streams of numbers that appear to be 
random, but are in fact a result of a clearly 
defined series of calculation steps in which 
the next "random number" $x_n$ in the sequence
is a function of the previous "random number"
$x_{n-1}$, that is, $x_n = f(x_{n-1})$. The sequence
starts with a number called the seed, and if 
the same seed is used in several simulations, 
each simulation sequence will contain exactly 
the same numbers, which is helpful for code 
debugging and drawing fair comparisons be¬ 
tween different strategies evaluated under un¬ 
certainty. It is quite an amazing statistical fact 
that some of these recursion formulas (named 
"pseudo-random number generators") define 
sequences of numbers that imitate random be¬ 
havior well and appear to obey (roughly) some 
major laws of probability, such as the CLT and 
the Glivenko-Cantelli lemma.

In general, a pseudo-random number generator
is considered "good" if it satisfies the following
conditions:

1. The numbers in the generated sequence are 
uniformly distributed between 0 and 1. This 
can be tested by running a chi-square or a 
Kolmogorov-Smirnov test. 

2. The sequence has a long cycle (that is, it takes 
many iterations before the sequence begins 
repeating itself). 

3. The numbers in the sequence are not
autocorrelated. This can be verified by running a
Durbin-Watson test on the sequence of numbers.
The Durbin-Watson test is widely used
in statistics for identifying autocorrelation in
time series of observations.
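
The first and third conditions can be checked
numerically. The sketch below is only an
illustration: the function names, the choice of
10 bins, and the seed are assumptions, and the
Durbin-Watson statistic is applied here to the
mean-centered sequence rather than to regression
residuals.

```python
import random


def chi_square_uniform(u, bins=10):
    """Chi-square statistic for H0: the values in u are uniform on [0, 1)."""
    counts = [0] * bins
    for x in u:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(u) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def durbin_watson(u):
    """Durbin-Watson statistic of the mean-centered sequence.

    Values near 2 indicate no first-order autocorrelation.
    """
    mean = sum(u) / len(u)
    e = [x - mean for x in u]
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)


rng = random.Random(2024)            # illustrative seed
sample = [rng.random() for _ in range(10_000)]
chi2 = chi_square_uniform(sample)    # compare with a chi-square(9) critical value
dw = durbin_watson(sample)           # should be close to 2
```

With 10 bins the statistic has 9 degrees of
freedom, so at the 5% level uniformity would be
rejected for values above roughly 16.92.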

In the following section, we discuss briefly a 
couple of important types of pseudo-random 
number generators. The goal is not to provide 
comprehensive coverage of random number 
generators, but rather to give readers a flavor of 
the main ideas behind the method of producing 
apparently random numbers with determinis¬ 
tic algorithms. 

Pseudo-Random Number 
Generators 

One of the earliest pseudo-random number 
generators developed is called the midsquare 
technique. It takes a number (the seed), squares 
it, and takes the set of middle digits as the next 
random number. It is easy to predict when such 
an approach may run into difficulties. As soon 
as the "middle digits" become a small number 
such as 1 or 0, the sequence ends with the same 
numbers generated over and over again; that 
is, the sequence converges to a constant value 
(typically 0) or to a very short cycle of values. 
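
The degeneracy described above is easy to
reproduce. The following is a minimal sketch of
the midsquare technique for four-digit numbers;
the function names and the zero-padding
convention are assumptions made for
illustration.

```python
def midsquare(seed, digits=4):
    """Yield successive midsquare values for a seed with `digits` digits."""
    x = seed
    while True:
        squared = str(x * x).zfill(2 * digits)   # pad the square to 2*digits digits
        start = digits // 2
        x = int(squared[start:start + digits])   # keep the middle digits
        yield x


def first_values(seed, n, digits=4):
    """Return the first n values generated from the seed."""
    gen = midsquare(seed, digits)
    return [next(gen) for _ in range(n)]


healthy_start = first_values(1234, 2)   # 1234^2 = 01522756 -> 5227 -> 3215
stuck = first_values(2500, 3)           # 2500^2 = 06250000 -> 2500 forever
```

Seeds such as 2500, 3792, or 100 map to
themselves, so the "random" sequence is constant
from the first step.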

A better, commonly used type of pseudo-random
number generator is the congruential
pseudo-random number generator. Such generators
are based on sequences of numbers of the form

$$x_n = f(x_{n-1}) \bmod m$$

where mod $m$ stands for "modulo $m$":
$f(x_{n-1}) \bmod m$ is the remainder after dividing
$f(x_{n-1})$ by $m$. For example, 5 mod 3 = 2, 15 mod
5 = 0, etc. Note that $f(x_{n-1}) \bmod m$ will always
be an integer between 0 and $m - 1$; dividing it
by $m$ gives a number between 0 and 1. Thus, to
create a good representation of randomness, one
would want to make the modulus as large as
possible. For a 32-bit computer,
for example, the maximum signed integer that can be
stored is $2^{31} - 1$, which is large enough for
practical purposes.
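
As an illustration, the sketch below takes $f$
to be linear, $f(x) = ax + c$, which gives a
linear congruential generator. The constants
$a = 16807$, $c = 0$, and $m = 2^{31} - 1$ are the
well-known "minimal standard" choice; they are
an assumption here and are not taken from this
entry.

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Linear congruential generator: x_n = (a * x_{n-1} + c) mod m.

    Yields pairs (x_n, x_n / m); the second element lies in [0, 1).
    """
    x = seed
    while True:
        x = (a * x + c) % m
        yield x, x / m


gen = lcg(seed=1)
states = [next(gen)[0] for _ in range(2)]              # integer states
uniforms = [u for _, u in (next(gen) for _ in range(5))]

# The same seed always reproduces the same stream.
rerun = lcg(seed=1)
states_again = [next(rerun)[0] for _ in range(2)]
```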

More advanced generators include matrix 
multiplicative congruential generators, multi¬ 
ple recursive generators, and shuffled genera¬ 
tors. Most pseudo-random number generators 
used in popular software products nowadays 
have been thoroughly tested and are very good. 

VARIANCE REDUCTION 
TECHNIQUES 

Paradoxically, truly random numbers can be
too random for all practical purposes. Recall
that the error in the average estimate obtained
from truly random Monte Carlo simulation is
proportional to $1/\sqrt{n}$, where $n$ is the number
of scenarios for the random variable (this fact
would be approximately true for good
pseudo-random number generators as well). Much
research has been dedicated in recent years to
finding ways to reduce that error and to be
computationally savvy when generating scenarios.
Several methods for variance reduction, widely
used in financial applications, are listed below. 9

Antithetic Variables 

Simulating a random number is computationally
expensive. One technique that is used to reduce
the error in the average estimate in derivative
pricing without increasing the number of
simulated values is to incorporate the generated
random number twice in computing the derivative
payoff: once as the original simulated number,
and another as its "antithetic" number.

For example, recall from our earlier option
pricing example that one possibility to model
the value of the stock price $S_T$ at time $T$ is by
using equation (2). In that expression, $w$ is a
random variable following a normal distribution
with mean 0 and standard deviation 1. Suppose
that $n$ values for the normal random variable
$w$ are generated. With the antithetic variable
method, the value of the derivative payoff in
each of the $n$ scenarios is computed as the
average of two payoffs: one obtained by plugging in
the simulated value for $w$, and another obtained
by plugging in the negative of the simulated
value for $w$. These $n$ "adjusted" payoffs are
otherwise treated in the same way as in the
traditional simulation method described earlier in
this entry: At the end, the $n$ payoffs are averaged
and properly discounted to obtain the "fair"
estimate of the derivative price. The difference
is that this approach substantially reduces the
standard error in the average estimate, while
keeping the number of simulation trials at $n$.

The antithetic variable approach does not 
apply only to normal random variables. As ex¬ 
plained in the previous section, random num¬ 
ber generation typically happens in two stages: 
First, a random number between 0 and 1 is gen¬ 
erated, and then this random number is "in¬ 
verted" to obtain a random number from the 
desired probability distribution. Thus, one can 
apply the antithetic technique at the first stage, 
and treat the randomly generated number U as 
two realizations: U and its "antithetic" variable 
1-U. For example, if the number generated on 
the interval [0,1] is 0.7, then the antithetic num¬ 
ber is 0.3. Both of these numbers can then be 
"inverted" to obtain a pair of antithetic variables 
from a prespecified distribution. 
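
The antithetic procedure for normal draws can be
sketched as follows. The sketch assumes the
lognormal price model
$S_T = S_t e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,w}$
and a European call payoff; the function names,
the seed, and the sample size are illustrative,
and the Black-Scholes formula is included only
as a benchmark.

```python
import math
import random


def call_payoff(w, s0, strike, r, sigma, tau):
    """Call payoff when the terminal price is driven by the normal draw w."""
    s_end = s0 * math.exp((r - 0.5 * sigma ** 2) * tau
                          + sigma * math.sqrt(tau) * w)
    return max(s_end - strike, 0.0)


def antithetic_call_price(n, s0, strike, r, sigma, tau, seed=0):
    """Discounted mean of n adjusted payoffs (average of payoffs at w and -w)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        total += 0.5 * (call_payoff(w, s0, strike, r, sigma, tau)
                        + call_payoff(-w, s0, strike, r, sigma, tau))
    return math.exp(-r * tau) * total / n


def black_scholes_call(s0, strike, r, sigma, tau):
    """Closed-form benchmark, used only to check the simulation."""
    d1 = ((math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * tau)
          / (sigma * math.sqrt(tau)))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-r * tau) * cdf(d2)


estimate = antithetic_call_price(20_000, 100.0, 95.0, 0.04, 0.20, 1.0, seed=1)
benchmark = black_scholes_call(100.0, 95.0, 0.04, 0.20, 1.0)
```

The same idea applies at the uniform stage by
pairing each draw $U$ with $1 - U$ before
inversion.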

Stratified Sampling 

Observations in the tails of input distributions
that are typically less likely to be generated may
never occur in a simulation, because the
probability of their occurrence is small. Such
observations, however, contain important
information about extreme events which are of
particular interest in financial applications. In order
to ensure that they appear in the simulation,
one would need to generate a huge number of
scenarios.

This problem is often addressed by stratified
sampling. Most generally, the term "stratified
sampling" refers to any technique that divides
the random values into ranges (called "strata"
in statistics) and samples from each range to
ensure that a good representation of the
distribution is obtained.

A simple way of stratifying the numbers in the
[0,1] interval to ensure that, when "inverted,"
the generated random numbers cover well the
whole range of a probability distribution of
interest, is to divide the [0,1] interval into $k$ smaller
intervals of equal length:

$$\left[0, \tfrac{1}{k}\right], \left(\tfrac{1}{k}, \tfrac{2}{k}\right], \ldots, \left(\tfrac{k-1}{k}, 1\right]$$

Random numbers can then be drawn
sequentially from each small interval. Therefore,
values from the tails of the distribution of
interest (which will be generated when uniform
random numbers from the intervals $[0, \frac{1}{k}]$ and
$(\frac{k-1}{k}, 1]$ are drawn) obtain better representation.
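
A minimal sketch of this one-dimensional
stratification is given below. The function
name and the seed are assumptions; the strata
are taken as half-open intervals
$[i/k, (i+1)/k)$, which differs from the
intervals in the text only at the endpoints.

```python
import random


def stratified_uniforms(k, rng):
    """One uniform draw from each of the k strata [i/k, (i+1)/k), i = 0..k-1."""
    return [(i + rng.random()) / k for i in range(k)]


rng = random.Random(7)              # illustrative seed
draws = stratified_uniforms(10, rng)
```

Each draw can then be "inverted" in the usual
way, so every decile of the target distribution,
including both tails, is represented exactly
once.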

In multiple dimensions (that is, when simu¬ 
lating several random variables), this method 
extends to dividing a hypercube (as opposed to 
an interval) into smaller hypercubes, and draw¬ 
ing an observation along each dimension of the 
smaller hypercubes. An enhanced extension to 
the basic stratified sampling method is Latin hy¬ 
percube sampling (an option in many advanced 
simulation software products), which permutes 
the coordinates of an initially generated ran¬ 
dom vector of observations—one observation 
within each small hypercube—to reduce the 
number of times an actual random number is 
generated while ensuring that all strata are suf¬ 
ficiently well represented. 
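
The sketch below shows one common way to
construct a Latin hypercube sample: each
coordinate is stratified separately and the
strata are then paired at random across
dimensions. It is an illustration under these
assumptions and is not necessarily the variant
implemented in any particular software product.

```python
import random


def latin_hypercube(n, d, rng):
    """n points in [0,1)^d with one point in each of the n strata per coordinate."""
    columns = []
    for _ in range(d):
        col = [(i + rng.random()) / n for i in range(n)]  # one draw per stratum
        rng.shuffle(col)                                  # random pairing across dims
        columns.append(col)
    return [tuple(col[j] for col in columns) for j in range(n)]


points = latin_hypercube(8, 3, random.Random(11))
```

Only $n$ points are generated, rather than the
$n^d$ needed to fill every small hypercube, yet
every stratum of every coordinate is covered.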

Importance Sampling 

Importance sampling is an alternative to
stratified sampling for dealing with rare events, or
extreme observations, and for reducing the
number of simulation trials necessary to
achieve a particular level of accuracy. The
method changes the underlying scenario prob¬ 
abilities so as to give more weight to impor¬ 
tant outcomes in the simulation. Such outcomes 
are generated with greater frequency than they 
otherwise would. At the end, the observations' 
weights are scaled back in the computation of 
the parameter of interest, so that the estimation 
is correct. 

There is no single recipe for how to construct 
good importance sampling methods. The spe¬ 
cific construction depends on the underlying 
random process dynamics. For example, when 
pricing a European call option in the Black- 
Scholes setting, generating paths that are out- 
of-the-money is wasteful. This is because only 
paths that are in-the-money count in the final 
computation of the option price—the contribu¬ 
tion of out-of-the-money paths to the option 
price is 0. Although in practice one would not 
use importance sampling for pricing a Euro¬ 
pean call option for which there is a closed- 
form formula, we will use European call option 
pricing as a context in which to explain the im¬ 
portance sampling method. 

First, note that in-the-money paths will occur
only if the asset price at expiration is greater
than the strike price; that is, they will result
from realizations of the standard normal
random variable $w$ such that

$$S_t e^{(r - \frac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,w} > K$$

From this inequality, one can derive that only
normal random numbers higher than

$$\frac{\ln(K/S_t) - (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$$

will lead to in-the-money paths. Equivalently,
this means that only random numbers between

$$N\left(\frac{\ln(K/S_t) - (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}\right) \text{ and } 1$$

on the unit interval [0,1], when "inverted" to
obtain normal random numbers, will lead to
in-the-money paths ($N(\cdot)$ here denotes the
cumulative normal distribution). Thus, one only needs
to simulate random numbers in that range of
the [0,1] interval. When computing the option
price at the end, instead of simply weighing each
payoff equally by multiplying it by $1/n$ as one would
do in standard Monte Carlo sampling, one
multiplies the average of the payoffs obtained from the
simulation by the probability that a particular
random path would be in-the-money assuming
truly random sampling, which is the standard
Monte Carlo method. The latter probability is

$$1 - N\left(\frac{\ln(K/S_t) - (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}\right) = N\left(\frac{\ln(S_t/K) + (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}\right)$$

The call option price is then

$$C_t = e^{-r(T-t)} \cdot N\left(\frac{\ln(S_t/K) + (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}\right) \cdot \frac{1}{n}\sum_{i=1}^{n} \max\left\{S_t e^{(r - \frac{1}{2}\sigma^2)(T-t) + \sigma\sqrt{T-t}\,w_i} - K,\ 0\right\}$$

where $w_1, \ldots, w_n$ are all random numbers
generated from a normal distribution in the range
higher than

$$\frac{\ln(K/S_t) - (r - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$$

As mentioned above, this is only a simple 
example in order to illustrate the main idea 
of importance sampling. More practical (al¬ 
beit more technically challenging) applications 
can be found, for instance, in Chapter 4.6 in 
Glasserman (2004). 
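
The European call example can be sketched in a
few lines. The code assumes the lognormal model
used above, draws uniform numbers only from the
in-the-money range $[N(w^*), 1]$, inverts them
with the standard library's `NormalDist`, and
rescales the average payoff by the in-the-money
probability. Function names, the seed, and the
sample size are illustrative.

```python
import math
import random
from statistics import NormalDist


def importance_call_price(n, s0, strike, r, sigma, tau, seed=0):
    """Call price estimate that samples in-the-money paths only."""
    std_normal = NormalDist()
    # Threshold w*: only normal draws above it finish in the money.
    w_star = ((math.log(strike / s0) - (r - 0.5 * sigma ** 2) * tau)
              / (sigma * math.sqrt(tau)))
    u_low = std_normal.cdf(w_star)
    p_itm = 1.0 - u_low                      # probability of finishing in the money
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = u_low + p_itm * rng.random()     # uniform on [N(w*), 1)
        u = min(max(u, 2.0 ** -53), 1.0 - 2.0 ** -53)   # keep inv_cdf well defined
        w = std_normal.inv_cdf(u)
        s_end = s0 * math.exp((r - 0.5 * sigma ** 2) * tau
                              + sigma * math.sqrt(tau) * w)
        total += max(s_end - strike, 0.0)
    # Scale the average conditional payoff back by the in-the-money probability.
    return math.exp(-r * tau) * p_itm * total / n


def black_scholes_call(s0, strike, r, sigma, tau):
    """Closed-form benchmark, used only to check the simulation."""
    d1 = ((math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * tau)
          / (sigma * math.sqrt(tau)))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = NormalDist().cdf
    return s0 * cdf(d1) - strike * math.exp(-r * tau) * cdf(d2)


estimate = importance_call_price(20_000, 100.0, 95.0, 0.04, 0.20, 1.0, seed=3)
benchmark = black_scholes_call(100.0, 95.0, 0.04, 0.20, 1.0)
```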

Quasi-Random (Low-Discrepancy) 
Sequences 

A truly random number generator may produce
clustered observations (see Figure 3a),
which necessitates generating many scenarios
in order to obtain a good representation of the
output distribution of interest. Recall from our
earlier discussion that stratified sampling can
be used to deal with this problem: it divides
the ranges of possible values into a fixed number
of strata, so as to "disperse" observations
more evenly over the range. Quasi-random
sequences instead ensure a smooth representation
of the range by continuously "filling in" gaps on
the unit interval [0,1] left by previously
generated random numbers (see an example of 1,000
generated values of a quasi-random sequence in
Figure 3b). The term "quasi-random" is actually
a misnomer, because, unlike pseudo-random
number sequences, quasi-random number
sequences do not pretend to be random. They are
deterministic on purpose, and their roots can
be found in real analysis and abstract algebra
rather than in simulation or probability theory.
The term low discrepancy sequences is often used
interchangeably with the term "quasi-random"
sequences, and is more accurate.

Figure 3 One Thousand Simulated Number Values
for Two Uniform Random Variables on the Interval
[0,1]: (a) Pseudo-Random Number Generator;
(b) Sobol Quasi-Random Sequence

Important examples of quasi-random
sequences were suggested by Sobol (1967), Faure
(1982), Halton (1960), and Hammersley (1960).
These sequences build on a family of so-called
van der Corput sequences. 10 For example, the
van der Corput sequence of base 2 is

$$\frac{1}{2}, \frac{1}{4}, \frac{3}{4}, \frac{1}{8}, \frac{5}{8}, \frac{3}{8}, \frac{7}{8}, \ldots$$

The actual generation of van der Corput
sequences is somewhat technical, but the outcome
is intuitive. Note that as new points are added
to the sequence, they appear on alternate sides
of $\frac{1}{2}$ in a balanced way. The main idea is that as
the number of generated values increases, the
sequence covers the unit interval uniformly.
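
One standard way to generate the sequence is the
radical-inverse construction: write the index
$i$ in the chosen base and mirror its digits
about the radix point. The sketch below follows
that construction; the function name is an
assumption.

```python
def van_der_corput(i, base=2):
    """i-th element (i >= 1): mirror the base-b digits of i about the radix point."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value


first_seven = [van_der_corput(i) for i in range(1, 8)]
# -> [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

For example, $i = 6$ is 110 in base 2; mirrored
it becomes 0.011 in binary, that is, 3/8.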

The values generated with quasi-random se¬ 
quences are treated as "random" numbers for 
the purposes of simulation modeling. In par¬ 
ticular, instead of generating random numbers 
between 0 and 1 and "inverting" them to obtain 
an arbitrary probability distribution, one would 
"invert" the numbers in the quasi-random se¬ 
quence. Different sequences have different ad¬ 
vantages for specific financial applications, but 
the Faure and Sobol sequences in particular 
have been proven to generate very accurate es¬ 
timates for derivative pricing in tests. 11 

Figure 4 illustrates the value of a European
call option computed with three different
methods: BS (the closed-form Black-Scholes
price), MC (traditional Monte-Carlo), and QMC
(quasi-random or quasi-Monte-Carlo using a
Faure low discrepancy sequence to generate
scenarios). The current asset price is assumed
to be $100, the exercise price for the option is
assumed to be $95, the asset volatility is 20%,
the time to maturity of the option is 1 year, and
the risk-free rate is 4% per annum. One can
observe that as the number of scenarios generated
increases, the quasi-Monte-Carlo method
results in a smoother and more consistent
approximation to the true option price computed
with the Black-Scholes formula than the
traditional Monte Carlo method. In general, as the
number of generated quasi-random numbers
increases, so does the accuracy of estimation,
although it is not easy to state the exact level of
accuracy, because probability laws do not apply
to deterministic sequences.

Figure 4 Value of a European Call Option
Computed with Three Different Methods (plot
title: Convergence of Monte Carlo Methods;
series: BS, MC, QMC)
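
A comparison of this kind can be sketched with
the option parameters quoted above. Note the
assumptions: the sketch uses a base-2 van der
Corput sequence instead of the Faure sequence
behind Figure 4, since the problem has a single
random input, and all function names are
illustrative.

```python
import math
from statistics import NormalDist


def van_der_corput(i, base=2):
    """i-th element (i >= 1) of the van der Corput sequence in the given base."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value


def qmc_call_price(n, s0, strike, r, sigma, tau):
    """Call price from n low-discrepancy points 'inverted' to normal draws."""
    inv = NormalDist().inv_cdf
    total = 0.0
    for i in range(1, n + 1):
        w = inv(van_der_corput(i))           # deterministic "random" normal
        s_end = s0 * math.exp((r - 0.5 * sigma ** 2) * tau
                              + sigma * math.sqrt(tau) * w)
        total += max(s_end - strike, 0.0)
    return math.exp(-r * tau) * total / n


def black_scholes_call(s0, strike, r, sigma, tau):
    """Closed-form benchmark."""
    d1 = ((math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * tau)
          / (sigma * math.sqrt(tau)))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = NormalDist().cdf
    return s0 * cdf(d1) - strike * math.exp(-r * tau) * cdf(d2)


qmc = qmc_call_price(4095, 100.0, 95.0, 0.04, 0.20, 1.0)
bs = black_scholes_call(100.0, 95.0, 0.04, 0.20, 1.0)
```

Because the points are deterministic, rerunning
the code gives exactly the same estimate; there
is no seed to set.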

SIMULATION SOFTWARE 

Today, good random number generators and 
user-friendly simulation software are easily 
available. Most computer languages have a 
"rand()" command that simulates a random 
number between 0 and 1. Microsoft Excel add¬ 
ins such as Crystal Ball and @RISK allow not 
only for simulating random numbers from a 
wide variety of probability distributions, but 
also for incorporating random number gener¬ 
ation into larger models through macros and 
scripts. Computing environments such as
MATLAB and Mathematica contain commands for
random number simulation, and the capability
of generating low discrepancy sequences can
be added through widely available libraries.
In addition, a number of modules that allow
for simulating sophisticated probability
distributions are available for open-source computer
languages such as Perl (see the Comprehensive
Perl Archive Network, http://www.cpan.org),
Python (see http://www.python.org), and R
(see http://www.r-project.org).

KEY POINTS 

* The main idea behind the Monte Carlo sim¬ 
ulation technique is to represent uncertainty 
in the form of scenarios and to evaluate vari¬ 
ables of interest based on these scenarios. 

* Monte Carlo simulation has widespread ap¬ 
plications in pricing, hedging, and risk man¬ 
agement. Examples include complex financial 
derivative pricing, assessment of sensitivity 
of prices to changes in market variables, port¬ 
folio risk measurement, and credit risk esti¬ 
mation and pricing. 

* Despite great advances in computational 
power, Monte Carlo simulation can be expen¬ 
sive for large-scale problems, and a substan¬ 
tial amount of research in recent years has 
been dedicated to making it more efficient 
and accurate. 

* Variance reduction methods such as
antithetic variables, stratified sampling,
importance sampling, and carefully selected
low discrepancy sequences are widely used
in practice today.

NOTES 

1. For an introduction to Brownian motion, 
see Hull (2005). 

2. See, for example, Chapter 9 in Glasserman
(2004).

3. See Chapter 7.2 in Glasserman (2004).

4. See Chen and Glasserman (2006a) for
further details.

5. See Glasserman et al. (2000).

6. For example, see Duffie and Garleanu
(2001).

7. See Chen and Glasserman (2006b).

8. See, for example, Chapter 2 in Evans and
Olson (2002).

9. For a more detailed discussion of such 
methods, see Chapter 14 in Pachamanova 
and Fabozzi (2010). 

10. See Chapter 5 in Glasserman (2004). 

11. See the survey in Boyle et al. (1997).

Stochastic Volatility

Abstract: Volatility, as measured by the standard deviation, is an important concept in financial
modeling because it measures the change in value of a financial instrument over a specific horizon.
The higher the volatility, the greater the price risk of a financial instrument. There are different types
of volatility: historical, implied volatility, level-dependent volatility, local volatility, and
stochastic volatility (e.g., jump-diffusion volatility). Stochastic volatility models are used in the field of
quantitative finance. Stochastic volatility means that the volatility is not a constant, but a stochastic
process and can explain volatility smile and skew.


Volatility, typically denoted by the Greek letter
$\sigma$, is the standard deviation of the change in
value of a financial instrument over a specific
horizon such as a day, week, month, or year.
It is often used to quantify the price risk of a
financial instrument over that time period. The
price risk of a financial instrument is higher the
greater its volatility.

Volatility is an important input in option
pricing models. The Black-Scholes model for option
pricing assumes that the volatility term is a
constant. This assumption is not always satisfied in
real-world options markets because the probability
distribution of common stock returns has been
observed to have a fatter left tail and thinner
right tail than the lognormal distribution (see
Hull, 2000). Moreover, the assumption of
constant volatility in a financial model, such as the
original Black-Scholes option pricing model, is
incompatible with option prices observed in the
market.


As the name suggests, stochastic volatility 
means that volatility is not a constant, but a 
stochastic process. Stochastic volatility models 
are used in the field of quantitative finance 
and financial engineering to evaluate deriva¬ 
tive securities, such as options and swaps. By 
assuming that volatility of the underlying price 
is a stochastic process rather than a constant, 
it becomes possible to more accurately model 
derivatives. In fact, stochastic volatility mod¬ 
els can explain what is known as the volatility 
smile and volatility skew in observed option 
prices. 

In this entry, we provide an overview of 
the different types of nonstochastic volatilities 
and the different types of stochastic volatilities. 
There are two approaches to introduce
stochastic volatility: (1) changing the clock time $t$ to a
random time $T(t)$ (a subordinator), and (2)
changing constant volatility into a positive stochastic
process.

NONSTOCHASTIC
VOLATILITY MEASURES

We begin by providing an overview of the
different types of nonstochastic volatility measures.
These include

• Historical volatility 

• Implied volatility 

• Level-dependent volatility 

• Local volatility 


Historical Volatility 

Historical volatility is the volatility of a financial
instrument or a market index based on
historical returns. It is a standard deviation calculated
using historical (daily, weekly, monthly,
quarterly, yearly) price data. The annualized
volatility $\sigma$ is the standard deviation of the
instrument's logarithmic returns over a one-year
period:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2}$$

where $R_i = \ln\frac{S_{t_i}}{S_{t_{i-1}}}$,
$\bar{R} = \frac{1}{n}\sum_{i=1}^{n}\ln\frac{S_{t_i}}{S_{t_{i-1}}}$, and $S_{t_i}$ is an
asset price at time $t_i$, $i = 1, 2, \ldots, n$.
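
The calculation can be sketched as follows. The
function name is an assumption, and so is the
optional square-root-of-time scaling (for
example, 252 trading days per year) used to
annualize a volatility estimated from
higher-frequency returns; that convention is
common in practice but is not stated in this
entry.

```python
import math


def historical_volatility(prices, periods_per_year=None):
    """Sample standard deviation of log returns, optionally annualized."""
    returns = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((x - mean) ** 2 for x in returns) / (n - 1)
    vol = math.sqrt(variance)
    if periods_per_year is not None:
        vol *= math.sqrt(periods_per_year)   # assumed square-root-of-time scaling
    return vol


prices = [100.0, 100.0 * math.exp(0.01), 100.0]   # log returns +0.01 and -0.01
per_period = historical_volatility(prices)
annualized = historical_volatility(prices, periods_per_year=252)
```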


Implied Volatility 

Implied volatility is related to historical volatil¬ 
ity. However, there are important differences. 
Historical volatility is a direct measure of the 
movement of the price (realized volatility) over 
recent history. Implied volatility, in contrast, is 
set by the market price of the derivative contract 
itself, and not the underlier. Therefore, different 
derivative contracts on the same underlier have 
different implied volatilities. Most derivative 
markets exhibit persistent patterns of volatil¬ 
ities varying by strike. The pattern displays 
different characteristics for different markets. 
In some markets, those patterns form a smile 
curve. In others, such as equity index options 
markets, they form more of a skewed curve. 


This has motivated the name "volatility skew." 
For markets where the graph is downward slop¬ 
ing, such as for equity options, the term "volatil¬ 
ity skew" is often used. For other markets, such 
as FX options or equity index options, where 
the typical graph turns up at either end, the 
more familiar term "volatility smile" is used. In 
practice, either term may be used to refer to the 
general phenomenon of volatilities varying by 
strike. 

The models by Black and Scholes (1973)
(continuous-time (B,S)-security market) and
Cox, Ross, and Rubinstein (1976) (discrete-time
(B,S)-security market (binomial tree)) are
unable to explain the negative skewness and
leptokurticity (fat tails) commonly observed in the
stock markets. The famous implied-volatility
smile would not exist under their assumptions.
As noted above, the general phenomenon of
volatilities varying by strike is referred to as
the volatility smile or volatility skew (or
simply skew).
Another dimension to the problem of volatility
skew is that of volatilities varying by expiration,
known as the volatility surface.
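
Because implied volatility is defined through
the market price of the option, computing it
means inverting the pricing formula numerically.
The sketch below uses bisection on the
Black-Scholes call price; the function names,
search bounds, and tolerance are assumptions.

```python
import math


def bs_call(s0, strike, r, sigma, tau):
    """Black-Scholes price of a European call."""
    d1 = ((math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * tau)
          / (sigma * math.sqrt(tau)))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-r * tau) * cdf(d2)


def implied_volatility(price, s0, strike, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; relies on the call price increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, strike, r, mid, tau) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


market_price = bs_call(100.0, 95.0, 0.04, 0.20, 1.0)   # stand-in for a quote
iv = implied_volatility(market_price, 100.0, 95.0, 0.04, 1.0)
```

Repeating the calculation for quotes at
different strikes and expirations traces out the
smile, the skew, and the surface.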

Given the prices of call or put options across
all strikes and maturities, we may deduce the
volatility that produces those prices via the
full Black-Scholes equation. 1 This function has
come to be known as local volatility. Local
volatility is a function of the spot price $S_t$ and time
$t$: $\sigma = \sigma(S_t, t)$ (see Dupire's (1994) formulas for
local volatility).


Level-Dependent Volatility 

Level-dependent volatility (e.g., constant
elasticity of variance (CEV) or firm model, see
Beckers, 1980, Cox, 1975) is a function of the spot
price alone. To have a smile across strike prices,
we need $\sigma$ to depend on $S$: $\sigma = \sigma(S_t)$. In this
case, the volatility and stock price changes are
now perfectly negatively correlated (so-called
"leverage effect").

Local Volatility 

Local volatility (LV) is a volatility function of
the spot price and time. Volatility smile can be
retrieved in this case from the option prices.
Dupire (1994) derived the local volatility
formula in continuous time and Derman and Kani
(1994) used the binomial (or trinomial) tree
framework instead of the continuous one to find
the local volatility formula. The LV models are
very elegant and theoretically sound. In practice,
however, they present many stability issues.
They are ill-posed inversion problems and are
extremely sensitive to the input data. This might
introduce arbitrage opportunities and, in some
cases, negative probabilities or variances.

Stochastic Volatility 

Stochastic volatility means that volatility is not 
a constant, but a stochastic process. Black and 
Scholes (1973) made a major breakthrough by 
deriving pricing formulas for vanilla options 
written on the stock. The Black-Scholes model 
assumes that the volatility term is a constant. 
Stochastic volatility models are used in the field 
of quantitative finance to evaluate derivative 
securities, such as options and swaps (see Carr 
and Lee, 2009). By assuming that the volatility 
of the underlying price is a stochastic process 
rather than a constant, it becomes possible to 
more accurately model derivatives. 

The above issues have been addressed and 
studied in several ways, such as: 

1. Volatility is assumed to be a deterministic
function of the time: 2 $\sigma = \sigma(t)$, with the
implied volatility for an option of maturity $T$
given by $\sigma_{imp}^2 = \frac{1}{T}\int_0^T \sigma_u^2\,du$;

2. Volatility is assumed to be a function of the
time and the current level of the stock price
$S(t)$: $\sigma = \sigma(t, S(t))$; 3 the dynamics of the
stock price satisfies the following stochastic
differential equation:

$$dS(t) = \mu S(t)\,dt + \sigma(t, S(t))S(t)\,dW_1(t)$$

where $W_1(t)$ is a standard Wiener process;

3. The time variation of volatility involves an
additional source of randomness, besides
$W_1(t)$, represented by $W_2(t)$, and is given by

$$d\sigma(t) = a(t, \sigma(t))\,dt + b(t, \sigma(t))\,dW_2(t)$$

where $W_2(t)$ and $W_1(t)$ (the initial Wiener
process that governs the price process) may
be correlated; 4

4. Volatility depends on a random parameter
$x$ such as $\sigma(t) = \sigma(x(t))$, where $x(t)$ is some
random process. 5

5. Stochastic volatility, namely, uncertain
volatility scenario. This approach is based
on the uncertain volatility model developed
in Avellaneda et al. (1995), where a concrete
volatility surface is selected among a
candidate set of volatility surfaces. This approach
addresses the sensitivity question by
computing an upper bound for the value of the
portfolio under arbitrary candidate volatility,
and this is achieved by choosing the local
volatility $\sigma(t, S(t))$ among two extreme
values $\sigma_{\min}$ and $\sigma_{\max}$ such that the value of the
portfolio is maximized locally;

6. The volatility $\sigma(t, S_t)$ depends on $S_t = S(t + \theta)$
for $\theta \in [-\tau, 0]$, namely, stochastic volatility
with delay. 6
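
For approach (1), a short numerical illustration
may help. Under the assumption that the
implied variance for maturity $T$ is the time
average of the instantaneous variance,
$\frac{1}{T}\int_0^T \sigma_u^2\,du$, the sketch below
evaluates that average with the midpoint rule.
The two-regime volatility function and all names
are hypothetical.

```python
import math


def effective_volatility(sigma_fn, maturity, steps=1000):
    """sqrt((1/T) * integral_0^T sigma(u)^2 du), computed by the midpoint rule."""
    dt = maturity / steps
    integral = sum(sigma_fn((j + 0.5) * dt) ** 2 for j in range(steps)) * dt
    return math.sqrt(integral / maturity)


def two_regime_sigma(t):
    """Hypothetical volatility: 10% for the first half year, 30% afterwards."""
    return 0.10 if t < 0.5 else 0.30


sigma_eff = effective_volatility(two_regime_sigma, maturity=1.0)
# sqrt(0.5 * 0.01 + 0.5 * 0.09) = sqrt(0.05), roughly 22.4%
```

Note that the result is not the simple average
of 10% and 30%: variances, not volatilities, are
averaged over time.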

In approach (1), the volatility coefficient is
independent of the current level of the underlying
stochastic process $S(t)$. This is a deterministic
volatility model, and the special case where $\sigma$
is a constant reduces to the well-known
Black-Scholes model that suggests changes in stock
prices are lognormal. Empirical tests by
Bollerslev (1986) seem to indicate otherwise. One
explanation for this problem of a lognormal model
is the possibility that the variance of
$\log(S(t)/S(t-1))$ changes randomly.

In approach (2), several ways have been
developed to derive the corresponding
Black-Scholes formula: One can obtain the formula
by using stochastic calculus and, in particular,
Ito's formula (see Shiryaev (1999), for example).

A generalized volatility coefficient of the form
$\sigma(t, S(t))$ is said to be level-dependent. Because
volatility and asset price are perfectly
correlated, we have only one source of randomness
given by $W_1(t)$. A time and level-dependent
volatility coefficient makes the arithmetic more
challenging and usually precludes the existence
of a closed-form solution. However, the
arbitrage argument based on portfolio replication
and completeness of the market remains
unchanged.
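
Since a closed-form solution is usually not
available, level-dependent dynamics of the form
$dS = \mu S\,dt + \sigma(t, S)S\,dW_1$ are often
simulated. The following is a minimal
Euler-scheme sketch; the CEV-type volatility
function $\sigma(S) = \delta S^{\beta - 1}$, its
parameter values, and the flooring of the price
at zero are assumptions made for illustration.

```python
import math
import random


def simulate_level_dependent(s0, mu, sigma_fn, horizon, steps, rng):
    """Euler scheme for dS = mu*S*dt + sigma(t, S)*S*dW with one Wiener process."""
    dt = horizon / steps
    s, t = s0, 0.0
    path = [s0]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s += mu * s * dt + sigma_fn(t, s) * s * dw
        s = max(s, 0.0)                      # keep the discretized price nonnegative
        t += dt
        path.append(s)
    return path


def cev_sigma(t, s, delta=2.0, beta=0.5):
    """Hypothetical CEV-type volatility: delta * S^(beta - 1); 20% at S = 100."""
    return delta * s ** (beta - 1.0) if s > 0.0 else 0.0


path = simulate_level_dependent(100.0, 0.05, cev_sigma, 1.0, 252, random.Random(5))
```

With $\beta < 1$ the volatility rises as the
price falls, which is the inverse relationship
between price and volatility associated with
level-dependent models.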

Approaches to Introduce 
Stochastic Volatility 

The idea behind stochastic volatility is to
make volatility itself a stochastic process. The
aim of a stochastic volatility model is to capture
the observation that volatility is not constant
and indeed varies randomly. For example, if
volatility is influenced
by a second "nontradable" source of
randomness, we obtain a stochastic volatility
model of the kind introduced by Hull and White (1987).
This model of volatility is general enough to
include the deterministic model as a special
case. Stochastic volatility models are useful
because they explain in a self-consistent way why
options with different strikes and
expirations have different Black-Scholes implied
volatilities (the volatility smile). These cases
are addressed in approaches 3, 4, and 5 above.
Stochastic volatility is the main concept used
in the fields of financial economics and
mathematical finance to deal with the endemic
time-varying volatility and codependence found in
financial markets. Such dependence has been
known for a long time; early comments include
Mandelbrot (1963) and Officer (1973).

There are two approaches to introduce 
stochastic volatility: One approach is to change 


the clock time t to a random time T(t) (change 
of time). Another approach is to change con¬ 
stant volatility into a positive stochastic process. 
Continuous-time stochastic volatility models 
include: 

• Ornstein-Uhlenbeck (OU) process
(Ornstein-Uhlenbeck (1930))

• Geometric Brownian motion with zero corre¬ 
lation with respect to a stock price (Hull and 
White, 1987) 

• Geometric Brownian motion with nonzero 
correlation with respect to a stock price 
(Wiggins, 1987) 

• OU process, mean-reverting, positive with 
nonzero correlation with respect to a stock 
price (Scott, 1989) 

• OU process, mean-reverting, negative, with 
zero correlation with respect to a stock price 
(Stein and Stein, 1991) 

• Cox-Ingersoll-Ross process, mean-reverting,
nonnegative with nonzero correlation with
respect to a stock price (Heston, 1993).
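
As an illustration of the last model in the
list, the sketch below simulates one path under
the usual statement of the Heston dynamics,
$dS = rS\,dt + \sqrt{v}\,S\,dW_1$ and
$dv = \kappa(\theta - v)\,dt + \xi\sqrt{v}\,dW_2$ with
correlation $\rho$ between $W_1$ and $W_2$. These
equations and symbols are not given in this
entry and are stated here as assumptions, as are
the parameter values and the "full truncation"
treatment of negative variance in the Euler
scheme.

```python
import math
import random


def simulate_heston(s0, v0, r, kappa, theta, xi, rho, horizon, steps, rng):
    """Euler scheme (log-price, truncated variance) for one Heston path."""
    dt = horizon / steps
    log_s, v = math.log(s0), v0
    prices, variances = [s0], [max(v0, 0.0)]
    for _ in range(steps):
        z1 = rng.gauss(0.0, 1.0)
        # Second shock correlated with the first through rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)                  # use only the nonnegative part of v
        log_s += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        prices.append(math.exp(log_s))
        variances.append(max(v, 0.0))
    return prices, variances


prices, variances = simulate_heston(
    s0=100.0, v0=0.04, r=0.04, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
    horizon=1.0, steps=252, rng=random.Random(9))
```

A negative $\rho$ makes volatility tend to rise
when the price falls, which is how such models
generate a skew.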

Heston and Nandi (1997) showed that the 
OU process corresponds to a special case of 
the GARCH model for stochastic volatility. 
Hobson and Rogers (1998) suggested a new 
class of nonconstant volatility models, which 
can be extended to include the aforementioned 
level-dependent model and share many char¬ 
acteristics with the stochastic volatility model. 
The volatility is nonconstant and can be re¬ 
garded as an endogenous factor in the sense 
that it is defined in terms of the past behavior 
of the stock price. This is done in such a way 
that the price and volatility form a multidimen¬ 
sional Markov process. 

Discrete Models for Stochastic 
Volatility 

Another popular process is the
continuous-time GARCH(1,1) process, developed by Engle
(1982) and Bollerslev (1986) in a discrete
framework. The generalized autoregressive
conditional heteroskedasticity (GARCH) model
(see Bollerslev, 1986) is a popular model for
estimating stochastic volatility. It assumes that
the randomness of the variance process varies
with the variance, as opposed to the square root
of the variance as in the Heston model. The
standard GARCH(1,1) model has the following
form for the variance differential:

$$d\sigma_t^2 = \kappa(\theta - \sigma_t^2)\,dt + \gamma\sigma_t^2\,dB_t$$

The GARCH model has been extended via 
numerous variants, including the NGARCH, 
LGARCH, EGARCH, GJR-GARCH, and so on. 
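
The continuous-time GARCH(1,1) dynamics can be
simulated with an Euler scheme. Writing $v_t$ for
the variance, the sketch assumes
$dv_t = \kappa(\theta - v_t)\,dt + \gamma v_t\,dB_t$;
the function name, the parameter values, and the
flooring of the variance at zero are
illustrative choices.

```python
import math
import random


def simulate_garch_diffusion(v0, kappa, theta, gamma, horizon, steps, rng):
    """Euler scheme for dv = kappa*(theta - v)*dt + gamma*v*dB."""
    dt = horizon / steps
    v = v0
    path = [v0]
    for _ in range(steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        v = v + kappa * (theta - v) * dt + gamma * v * db
        v = max(v, 0.0)                      # guard against discretization overshoot
        path.append(v)
    return path


variance_path = simulate_garch_diffusion(
    v0=0.09, kappa=3.0, theta=0.04, gamma=0.8,
    horizon=1.0, steps=252, rng=random.Random(4))
```

The noise term is proportional to $v_t$ itself,
whereas in the Heston model it is proportional
to $\sqrt{v_t}$.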

Continuous-time models provide the natural 
framework for an analysis of option pricing; 
discrete-time models are ideal for the statistical 
and descriptive analysis of the patterns of 
daily price changes. Volatility clustering, that is, 
alternating periods of high and low variance 
(large changes tend to be followed by large 
changes, and small changes by small changes; 
see Mandelbrot, 1963), led to the use of discrete 
models such as GARCH. There are two main 
classes of discrete-time stochastic volatility 
models. The first class, the autoregressive random 
variance (ARV) or stochastic variance model, is 
a discrete-time approximation to the 
continuous-time diffusion models outlined above. 
The second class is the autoregressive conditional 
heteroskedastic (ARCH) model introduced by 
Engle (1982) and its descendants: GARCH 
(Bollerslev, 1986), NARCH, NGARCH (Duan, 
1996), LGARCH, EGARCH, and GJR-GARCH. A 
general class of stochastic volatility models, 
which includes many of the above-mentioned 
models, was introduced by Ewald, Poulsen, and 
Schenk-Hoppe (2006). Gatheral (2006) introduced 
a Heston-like model for stochastic volatility that 
is more general than the Heston model. 
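
To make the discrete-time side concrete, here is a minimal sketch of a GARCH(1,1) recursion in its usual textbook form (next variance = omega + alpha * squared return + beta * previous variance). The parameter names `omega`, `alpha`, `beta`, the helper functions, and the numerical values are assumptions chosen for illustration. The lag-1 autocorrelation of squared returns is one simple way to look for volatility clustering in the simulated series.

```python
import math
import random


def garch_variance_update(prev_var, prev_ret, omega, alpha, beta):
    """GARCH(1,1) recursion: next variance from last variance and last return."""
    return omega + alpha * prev_ret ** 2 + beta * prev_var


def unconditional_variance(omega, alpha, beta):
    """Long-run variance; only meaningful when alpha + beta < 1."""
    return omega / (1.0 - alpha - beta)


def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate n returns with Gaussian shocks, starting at the long-run variance."""
    rng = random.Random(seed)
    var = unconditional_variance(omega, alpha, beta)
    returns, variances = [], []
    for _ in range(n):
        ret = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(ret)
        variances.append(var)
        var = garch_variance_update(var, ret, omega, alpha, beta)
    return returns, variances


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation (sum of lagged products over total sum of squares)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


returns, variances = simulate_garch(5000, omega=1e-6, alpha=0.10, beta=0.85)
clustering = lag1_autocorrelation([r * r for r in returns])
```

A clearly positive `clustering` value would be consistent with large squared returns tending to follow large ones; the raw returns themselves are serially uncorrelated by construction.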

Jump-Diffusion Volatility 

Jump-diffusion volatility is important because 
there is evidence that the assumption of a pure 
diffusion for the stock return is not accurate. Fat 
tails have been observed away from the mean of 
the stock return. This phenomenon is called 
leptokurticity and can be explained in different 
ways. One way to explain the smile and 
leptokurticity is to introduce a jump-diffusion 
process for stochastic volatility (see Bates, 1996). 
Jump-diffusion is not a level-dependent volatility 
process, but it can explain the leverage effect. 
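
The link between jumps and fat tails can be checked numerically. The sketch below is a deliberate simplification and not the Bates (1996) model: it assumes constant diffusion volatility, at most one Gaussian jump per period (a Bernoulli approximation to a Poisson arrival), and illustrative parameter values. It compares the sample excess kurtosis of returns with and without the jump component.

```python
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


def simulate_returns(n, diff_vol, jump_prob, jump_vol, seed=0):
    """Per-period returns: Gaussian diffusion part plus an occasional Gaussian jump."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        r = diff_vol * rng.gauss(0.0, 1.0)
        if rng.random() < jump_prob:  # assumption: at most one jump per period
            r += jump_vol * rng.gauss(0.0, 1.0)
        out.append(r)
    return out


pure = simulate_returns(20000, diff_vol=0.01, jump_prob=0.0, jump_vol=0.05)
jumpy = simulate_returns(20000, diff_vol=0.01, jump_prob=0.05, jump_vol=0.05)
kurt_pure = excess_kurtosis(pure)
kurt_jumpy = excess_kurtosis(jumpy)
```

Under these assumptions the pure-diffusion sample should have excess kurtosis near zero, while the sample with jumps should be strongly leptokurtic.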

Multifactor Models for 
Stochastic Volatility 

One-factor SV models (all of those mentioned 
above) (1) incorporate the leverage effect between 
returns and volatility and (2) reproduce the skew 
of implied volatility. However, they fail to match 
either the high conditional kurtosis of returns 
(Chernov et al., 2003) or the full term structure 
of the implied volatility surface (Cont et al., 2004). 
Two primary generalizations of one-factor SV 
models are (1) adding jump components to the 
returns and/or volatility process and 
(2) considering multifactor SV models. Among 
multifactor SV models we mention the 
following: 

* Fouque et al. (2005) SV model and Chernov et al. 
(2003) model (the latter used the efficient method 
of moments to obtain comparable empirical 
goodness-of-fit from affine jump-diffusion and 
two-factor SV family models). 

* Molina et al. (2003) model (used a Markov 
chain Monte Carlo method to find strong 
evidence of two-factor SV models with 
well-separated time scales in foreign exchange 
data). 

* Cont et al. (2004) (found that jump-diffusion 
models have a fairly good fit to the implied 
volatility surface). 

* Fouque et al. (2000) model (found that 
two-factor SV models provide a better fit to the 
term structure of implied volatility than 
one-factor SV models by capturing the behavior 
at short and long maturities). 

* Swishchuk (2006) (introduced two-factor and 
three-factor SV models with delay, incorporating 
the mean-reverting level as a random 
process: geometric Brownian motion, OU, or 
continuous-time GARCH(1,1) model). 

We also mention the SABR model (see 
Hagan et al., 2002), describing a single forward 
under stochastic volatility, and Chen's (1996) 
three-factor model for the dynamics of the 
instantaneous interest rate. 
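
To give a feel for what "well-separated time scales" means in a two-factor SV model, the following sketch drives volatility with two mean-reverting (OU) factors, one fast and one slow. It is an illustrative toy and does not reproduce any of the cited models; the exponential link, the function names, and all parameter values are assumptions.

```python
import math
import random


def ou_step(x, speed, vol, dt, z):
    """One Euler step of dX = -speed*X dt + vol dW, with z a standard normal draw."""
    return x - speed * x * dt + vol * math.sqrt(dt) * z


def simulate_two_factor_vol(n_steps, dt=1.0 / 252, fast_speed=100.0, slow_speed=1.0,
                            fast_vol=2.0, slow_vol=0.3, base_vol=0.2, seed=0):
    """Volatility path base_vol * exp(fast + slow), driven by two independent OU factors."""
    rng = random.Random(seed)
    fast = 0.0
    slow = 0.0
    vols = []
    for _ in range(n_steps):
        fast = ou_step(fast, fast_speed, fast_vol, dt, rng.gauss(0.0, 1.0))
        slow = ou_step(slow, slow_speed, slow_vol, dt, rng.gauss(0.0, 1.0))
        vols.append(base_vol * math.exp(fast + slow))  # assumption: exponential link keeps vol positive
    return vols


vols = simulate_two_factor_vol(n_steps=504)
```

The fast factor reverts within days and mainly shapes short-maturity behavior, while the slow factor moves over months; this is the intuition behind the better fit at both short and long maturities reported for two-factor models.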

Multifactor SV models have advantages and 
disadvantages. One disadvantage is that 
multifactor SV models do not, in general, admit 
explicit solutions for option prices. One 
advantage is that they have direct implications 
for hedges. By comparison, a class of 
jump-diffusion models (Bates, 1996) enjoys 
closed-form solutions for option prices, but the 
complexity of hedging strategies increases due 
to jumps. Thus, there is no strong empirical 
evidence that favors jump-diffusion models 
over multifactor SV models, or vice versa. 

The probability literature demonstrates that 
stochastic volatility models are fundamental 
to the analysis of financial markets (see Note 7). 


KEY POINTS 

• Because it measures the change in value of a 
financial instrument over a specific horizon, 
volatility, as measured by the standard 
deviation, is an important concept in financial 
modeling. 

• The different types of volatility are historical, 
implied, jump-diffusion, level-dependent, 
local, and stochastic volatilities. 

• Stochastic volatility means that the volatility 
is not a constant, but a stochastic 
process. Stochastic volatility can explain the 
well-documented volatility smile and skew 
observed in option markets. 

• Stochastic volatility is the main concept used 
in finance to deal with the endemic 
time-varying volatility and codependence found 
in financial markets. Stochastic volatility 
models are used to evaluate derivative 
securities such as options and swaps. 

• Two approaches to introduce stochastic 
volatility are: (1) changing the clock time to 
a random time and (2) changing constant 
volatility into a positive stochastic process. 

NOTES 

1. Black and Scholes (1973), Dupire (1994), 
Derman and Kani (1994). 

2. Wilmott et al. (1995), Merton (1976). 

3. Dupire (1994), Hull (2000). 

4. Hull and White (1987), Heston (1993). 

5. Elliott and Swishchuk (2007), Swishchuk 
(2000, 2009), Swishchuk et al. (2010). 

6. Kazmerchuk, Swishchuk, and Wu (2005), 
Swishchuk (2005, 2006, 2007, 2009a, 2010). 

7. Barndorff-Nielsen, Nicolato, and Shephard 
(1996), Shephard (2005). 

statistical concepts for, 777:225 
use of, 7:126-127,7:136 
use of in MATLAB, 777:423-427, 
777:447 

use of with VBA, 777:462-463 
and valuation models, 7:271 
options on, 7:252-253, 7:498-501, 
7:501-502 

planned amortization class (PAC), 
777:6 


plot of convertible functions, 7:273/ 
prediction of yield spreads, 
77:336-344 

price/discount rate relationship, 
7:215-216, 7:215/ 
putable, effective duration of, 
777:303-304,777:304/ 
regression data for spread 
application, 77:338-3431 
relation to CDSs, 7:525-526 
risk-free, 7:316 
risk-neutral, 777:586 
risk-neutral/equilibrium models 
for, 777:597-598 
security levels of, 7:3751 
model, 77:140 

Burnout effect, 777:17-18,777:24, 777:74 
Burnout factor 

initializing of, 777:22 
Business cycles, 7:351-352, 7:408, 
77:430-431,77:432-433 
Businesses, correlation within sectors, 
7:411 

Butterfly's wings, effect of, 77:645 

Calculus, stochastic, 7:94—97 
Calendarization, 77:43,77:487-488 
Calibration 

of derivatives, 7:494 
effect of, 777:619 
under GIG model, 77:524 
of local volatility, 77:681-685 
payoff and payment structure of, 
7:534/ 

premium payments, 7:231/, 

7:533-535 

pricing models for, 7:538-539 
pricing of by static replication, 
7:530-532 

pricing of single-name, 7:532-538 
Coherent risk measures, 111:327-329 
and VaR, 111:329 

Coins, fair/unfair, 111:169, 111:326-327 
technique of, 11:384-385 
testing for, 11:386-387 
test of, 11:3941,11:3961 
use of, 11:397 

Collateralized debt obligations 

(CDOs), 1:299,1:525,111:553, 
Conglomerate discounts, 11:43 
Conseco, debt restructure of, 1:529 
Consistency, notion of, 11:666-667 
Conversion, 1:274,1:445 
Convexity 

in callable bonds. III: 302-303 
defined, 1:258-259,111:309 
effective. 111: 13, III: 300-304, Ill:617f 
measurement of, 111:13-14, 

Ill:304-305 

negative, 1II.T4,111:49,111:303 
positive, 111:13 
use of. III: 299-300 
Convex programming, 1:29,1:31-32 
Cootner, Paul, 111:242 
Copulas 

advantages of. Ill:284 
defined, 111:283 
mathematics of. III:284-286 
usefulness of. III:287 
visualization of bivariate 
independence, 111:285/ 
visualization of Gaussian, 111:287/ 
Corner solutions, 1:200 
Correlation coefficients 
relation to R 2 ,11:316 
and Theil-Sen regression, 11:444 
use of, 111:286-287 
Correlation matrices, 11:1601, /1:1631, 
Ill:396-397 
Correlations 

in binomial distribution, 1:118 
computation of, 1:92-93 
concept of, III:283 
drawbacks of. III: 283-284 
between periodic increments, 
777:540f 

and portfolio risk, 7:11 
robust estimates of, 77:443^46 
serial, 77:220 
undesirable, 7:293 
use of, 77:271 

Costs, net financing, 7:481 
Cotton prices, model of, 777:383 
Countable additivity, 777:158 
Counterparts, robust, 77:81 
Countries, low- vs. high inflation, 

7:290 

Coupon payments, 7:212,777:4 
Coupon rates, computing of, 
777:548-549 

Courant-Friedrichs-Lewy (CFL) 
conditions, 77:657 
Covariance 

calculation of between assets, 7:8-9 
estimators for, 7:38^0, 7:194-195 
matrix, 7:38-39, 7:155, 7:190 
relationship with correlation, 7:9 
reliability of sample estimates, 77:77 
use of, 77:370-371 
Covariance matrices 
decisions for interest rates, 777:406 
eigenvectors / eigenvalues, 77:1607 
equally weighted moving average, 
777:402-403 

frequency of observations for, 

777:404 

graphic of, 77:1617 
residuals of return process of, 

77:1627 

of RiskMetrics™ Group, 777:412-413 
statistical methodology for, 
777:398-399 

of ten stock returns, 77:1597 
use of, 77:158-159, 77:169 
using EWMA in, 777:411 
Coverage ratios, 77:560-561 
Cox-Ingersoll-Ross (CIR) model, 7:260, 
7:491-492, 7:547, 7:548, 
777:546-547, 777:656 
Cox processes, 7:315-316, 77:470-471 
Cox-Ross-Rubenstein model, 7:510, 
7:522, 77:678 

CPI (Consumer Price Index), 

7:277-278, 7:291/, 7:292, 7:292/ 
CPRs (conditional prepayment rates). 

See prepayment, conditional 
CPR vector, 777:74. See also 

prepayment, conditional 
Cramer, Harald, 77:470-471 
Crank-Nicolson schemes, 77:666, 

77:669, 77:674, 77:680 


Crank Nicolson-splitting (CN-S) 
schemes, 77:675 

Crashmetrics, use of, 777:379, 777:380 
Credible intervals, 7:156 
Credit-adjusted spread trees, 7:274 
Credit crises 
of 2007,777:74 
of 2008,777:381 

data from and DTS model, 7:396 
in Japan, 7:417 
Credit curing, 777:73 
Credit default swaps (CDSs). See CDSs 
(credit default swaps) 

Credit events 
and credit loss, 7:379 
in default swaps, 7:526,7:528-530 
definitions of, 7:528 
descriptions of most used, 7:528f 
exchanges/payments in, 7:231/ 
in MBS turnover, 777:66 
prepayments from, 777:49-50 
protection against, 7:230 
and simultaneous defaults, 7:323 
Credit hedging, 7:405 
Credit inputs, interaction of, 777:36-38 
Credit loss 

computation of, 7:382-383 
distribution of, 7:369/ 
example of distribution of, 7:386/ 
simulated, 7:389 

steps for simulation of, 7:379-380 
Credit models, 7:300, 7:302,7:303 
Credit performance, evolution of, 
777:32-36 
Credit ratings 
categories of, 7:362 
consumer, 7:302 
disadvantages of, 7:300-301 
implied, 7:381-382 
maturity of, 7:301 
reasons for, 7:300 
risks for, 77:280-281,77:2807 
use of, 7:309 
Credit risk 
common, 7:322 
counterparty, 7:413 
in credit default swaps, 7:535 
defined, 7:361 
distribution of, 7:377 
importance of, 777:81 
measures for, 7:386/ 
modeling, 7:299-300, 7:322, 777:183 
quantification of, 7:369-372 
reports on, 77:278-281 
shipping, 7:566 

and spread duration, 7:391-392 
vs. cash flow risk, 777:377-378 
Credit scores, 7:300-302, 7:301-302, 
7:309, 7:310n 


Credit spreads 

alternative models of, 7:405^06 
analysis with stock prices, 7:3057 
applications of, 7:404-405 
decomposition, 7:401-402 
drivers of, 7:402 
interpretation of, 7:403-404 
model specification, 7:403 
relationship with stock prices, 7:304 
risk in, 77:279f 
use of, 7:222-223 

Credit support, evaluation of, 111:39-40 
Credit value at risk (CVaR). See CVaR 
Crisis situations, estimating liquidity 
in, 777:378-380 

Critical line algorithm (CLA), 7:73 
Cross-trading, 77:85n 
Cross-validation, leave-one-out, 
77:413-414 

Crude oil, 7:561, 7:562 
Cumulation, defined, 777:471 
Cumulative default rate (CDX), 777:58 
Cumulative frequency distributions, 
77:493/ 77:4937, 77:498-499 
formal presentation of, 77:492-493 
Currency put options, 7:515 
Current ratio, 77:554 
Curve imbalances, 77:270-271 
Curve options, 777:553 
Curve risk, 77:275-278 
CUSIPs/ticker symbols, changes in, 
77:202-203 

CVaR (credit value at risk), 7:384-385, 
7:385-386, 77:68, 77:85n, 777:3927. 
See also value at risk (VaR) 

Daily increments of volatility, 777:534 
Daily log returns, 77:407-408 
Dark pools, 77:450, 77:454 
Data. See also operational loss data 
absolute, 77:487-488 
acquisition and processing of, 

77:198 

alignment of, 77:202-203 
amount of, 7:196 
augmentation of, 7:186n 
availability of, 77:202, 77:486 
backfilling of, 77:202 
bias of, 77:204, 77:713 
bid-ask aggregation techniques for, 
77:457/ 

classification of, 77:499-500 
collection of, 77:102, 77:103/ 
cross-sectional, 77:201, 77:488, 77:488/ 
in forecasting models, 77:230 
frequency of, 77:113, 77:368, 
77:462-463, 77:500 
fundamental, 77:246-247 
generation of, 77:295-296 
high-frequency (HFD) (See 

high-frequency data (HFD)) 
historical, 11:77-78, II: 122, 11:172 
housing bubble, II: 397-399 
importing into MATLAB, 

1/1:433—434 

industry-specific, II: 105 
integrity of, II: 201-203 
levels and scale of, 71:486-487 
long-term, 777:389-390 
in mean-variance, 7:193-194 
misuse of, 77:108 
on operational loss, 777:99 
from OTC business, 77:486 
patterns in, 77:707-708 
pooling of, 777:96 
of precision, 7:158 
preliminary analysis of, 777:362 
problems in for operational risk, 
777:97-98 

qualitative vs. quantitative, 

77:486 

quality of, 77:204, 77:211, 77:452-453, 
77:486,77:695 

reasons for classification of, 
77:493-494 

for relative valuation, 77:34—35 
restatements of, 77:202 
sampling of, 77:459/, 77:711 
scarcity of, 77:699-700, 77:703-704, 
77:718 

sorting and counting of, 77:488^191 
standardization of, 77:204,7/7:228 
structure/sample size of, 77:703 
types of, 77:486—488 
underlying signals, 77:111 
univariate, defined, 77:485 
working with, 77:201-206 
Databases 

Compustat Point-In-Time, 77:238 
Factiva, 77:482 

Institutional Brokers Estimate 
System (IBES), 77:238 
structured, 77:482 
third-party, 77:198,77:211n 
Data classes, criteria for, 77:500 
Data generating processes (DGPs), 

77:295-296, 77:298/, 77:502, 77:702, 
777:278 

Data periods, length of, 7/7:404 
Data series, effect of large number of, 
77:708-709 

Data sets, training/test, 77:710-711 
Data snooping, 77:700,77:710-712, 
77:714, 77:717,77:718 
Datini, Francesco, 77:479^180 
Davis-Lo infectious defaults model, 
7:324 


Days payables outstanding (DPO), 
calculation of, 77:553-554 
Days sales outstanding (DSO), 
calculation of, 77:553 
DCF (discounted cash flow) models, 
77:16, 77:44-45 

DDM (dividend discount models). See 
dividend discount models 
(DDM) 

Debt 

long-term, in financial statements, 
77:542 

models of risky, 7:304-307 
restructuring of, 7:230 
risky, 7:307-308 
Debt-to-assets ratio, 77:559 
Debt-to-equity ratio, 77:559 
Decomposition models 
active/passive, 777:19 
Default correlation, 7:317-318 
contagion, 7:353-354 
cyclical, 7:352,7:353 
linear, 7:320-321 
measures of, 7:320-321 
tools for modeling, 7:319-333 
Default intensity, 777:225 
Default models, 7:321-322, 7:370/ 
Default probabilities 
adjustments in real time, 7:300-301 
between companies, 7:412-413 
cyclical rise and fall, 7:408/ 7:409/ 
defined, 7:299-300 
effect of business cycle on, 7:408 
effect of rating outlooks on, 
7:365-366 

empirical approach to, 7:362-363 
five-year (Bank of America and 
Citigroup), 7:301/ 7:302/ 
merits of approaches to, 7:365 
Merton's approach to, 7:363-365 
probability of, 77:727, 77:727/ 77:728/ 
and survival, 7:533-535 
and survival probability, 7:323-324 
term structure of, 7:303 
time span of, 7:302-303 
vs. ratings and credit scores, 
7:300-302 

for Washington Mutual, 7:415/ 
7:416/ 

of Washington Mutual, 7:415/ 

7:416/ 

Defaults 

annual rates of, 7:363 
and Bernoulli distributions, 
777:169-170 

calculation of monthly, 77/:61f 
clustering of, 7:324—325 
contagion, 7:320 
copulas for times, 7:329-331 


correlation of between companies, 
7:411 

cost of, 7:401,7:404/ 
dollar amounts of, 777:59/ 
effect of, 7:228, 777:645 
event vs. liquidation, 7:349 
factors influencing, 777:74—75 
first passage model of, 7:349 
historical database of, 7:414 
intensity of, 7:330, 7:414 
looping, 7:324-325 
measures of, 777:58-59 
in Merton approach, 7:306 
Moody's definition of, 7:363 
predictability of, 7:346-347 
and prepayments, 777:49-50, 
7/7:76-77 

process, relationship to recovery 
rate, 7:372 

pseudo intensities, 7:330 
rates of cumulative/conditional, 
777:63 

recovery after, 7:316-317 
risk of, 7:210 

simulation of times, 7:322-324,7:325 
threshold of, 7:345-346 
times simulation of, 7:319 
triggers for, 7:347-348 
variables in, 7:307-308 
Default swaps 

assumptions about, 7:531-532 
and credit events, 7:530 
digital, 7:537 
discussion of, 7:526-528 
market relationship with cash 
market, 7:530 

and restructuring, 7:528-529 
value of spread, 7:534 
Default times, 7:332 
Definite covariance matrix, 77:445 
Deflators, 7:129, 7:136 
Degrees, in ordinary differential 
equations, 77:644-645 
Degrees of freedom (DOF) 
across assets and time, 77:735-736 
in chi-square distribution, 777:212 
defined, 77:734 

for Dow Jones Industrial Average 
(DJIA), 77:735-737, 77:737/ 
prior distribution for, 7:177 
range of, 7:187n 

for S&P 500 index stock returns, 
77:735-736,77:736/ 

Delinquency measures, 7/7:57-58 
Delivery date, 7:478 
Delta, 7:509, 7:516-518, 7:521 
Delta-gamma approximation, 7:519, 
7/7:644-645 

Delta hedging, 7:413,7:416,7:418, 7:517 
Delta profile, 1:518/ 

Densities 
beta, 111:108/ 

Burr, 111:110/ 

closed-form solutions for, 111:243 
exponential. 111:105-106,111:105/ 
gamma, 111:108/ 

Pareto, 111:109/ 
posterior, 1:170/ 
two-point lognormal, 111:111/ 
Density curves, 1:1 47f 
Density functions 
asymmetric, 111:205/ 
of beta distribution, 111:222/ 
chi-square distributions, 111:213/ 
common means, different variances, 
111:203/ 

computing probabilities from, 

111:201 

discussion of. 111:197-200 
of F-distribution, 111:217/ 
histogram of, 111:198/ 
of log-normal distribution, 111:223/ 
and normal distribution, 11:733 
and probability, 111:206 
rectangular distributions, 111:220 
requirements of, 111:198-200 
symmetric. 111:204/ 
of t-distribution, 111:214/ 
Dependence, 1:326-327,11:305-308 
Depreciation, 11:22 

accumulated, 11:533-534 
expense vs. book value, 11:539/ 
expense vs. carrying value, 11:540/ 
in financial statements, 11:537-539 
on income statements, 11:536 
methods of allocation, 11:537-538 
Derivatives 

construction of, 11:586-587 
described, 11:585-586 
embedded, 1:462 
energy, 1:558 
exotic, 1:558,1:559-560 
of functions, defined, 11:593 
and incomplete markets, 1:462 
interest rate. 111:589-590 
nonlinearity of, 111:644-645 
OTC, 1:538 

pricing of, 1:58, 111:594-596 
pricing of financial, 111:642-643 
relationship with integrals, 11:590 
for shipping assets, 1:555,1:558, 
1:565-566 

use of instruments, 1:477 
valuation and hedging of, 1:558-560 
vanilla, 1:559 
Derman, Emanuel, 11:694 
Descriptors, 11:140,11:246-247,11:256 
Determinants, 11:623 


Deterministic methods 
usefulness of, 11:685 
Diagonal VEC model (DVEC), 11:372 
Dice, and probability, 111:152,111:153, 
111:155-156, lll:156f 
Dickey-Fuller statistic, 11:386-387 
Dickey-Fuller tests, 11:514 
Difference, notation of, 1:80 
Differential equations 
classification of, 11:657-658 
defined, 1:95,11:644,11:657 
first-order system of, 11:646 
general solutions to, 11:645 
linear, 11:647-648 
linear ordinary, 11:644—645 
partial (PDE), 11:643,11:654-657 
stochastic, 11:643-644 
systems of ordinary, 11:645-646 
usefulness of, 11:658 
Diffusion, 111:539, 111:554-555 
Diffusion invariance principle, 1:132 
Dimensionality, curse of, 11:673,111:127 
Dirac measures, 111:271 
Directional measures, 11:428,11:429 
Dirichlet boundary conditions, 11:666 
Dirichlet distribution, 1:181-183, 
1:186-187n 

Discounted cash flow (DCF) models, 
11:16,11:44-45 

Discount factors, 1:57-58,1:59-62,1:60, 
11:600-601 
Discount function 
calculation of, 111:571 
defined, 111:563 
discussion of. 111:563-565 
forward rates from. 111:566-567 
graph of, 111:563/ 
for on-the-run Treasuries, 

111:564-565 

Discounting, defined, 11:596 
Discount rates, 1:211,1:212,1:215-216, 
11:6 

Discovery heuristics, 11:711 
Discrepancies, importance of small, 
11:696 

Discrete law. 111: 165-169 
Discrete maximum principle, 11:668 
Discretization, 1:265,11:669/ 11:672 
Disentangling, 11:51-56 
complexities of, 11:55-56 
predictive power of, 11:54-55 
return revelation of, 11:52-54 
usefulness of, 11:52,11:58 
Dispersion measures, 111:352, 

111:353-354,111:357 
Dispersion parameters. 111:202-205 
Distress events, 1:351 
Distributional measures, 11:428 
Distribution analysis, cash flow, 111:310 


Distribution function, 111:218/ 111:224/ 
Distributions 

application of hypergeometric, 
111:177-178 

beliefs about, 1:152-153 
Bernoulli, 111:169-170,111:1851 
beta, 1:148,111:108 
binomial, 1:81/ 111:170-174,111:1851, 
111:363 

Burr, 111:109-110 

categories for extreme values, 11:752 
common loss, 111:1121 
commonly used, 111:225 
conditional, 111:219 
conditional posterior, 1:178-179, 
1:182-183,1:184-185 
conjugate prior, 1:154 
continuous probability, 111:195-196 
discrete, 111:1851 
discrete cumulative. 111: 166 
discrete uniform, 111:183-184, 
111:1851,111:638/ 

empirical, 11:498,111:104-105,111:105/ 
exponential, 111:105-106 
finite-dimensional, 11:502 
of Frechet, Gumbel and Weibull, 
111:267/ 

gamma, 111:107-108, 111:221-222 
Gaussian, 111:210-212 
Gumbel, 111:228,111:230 
heavy-tailed, l:186n, 11:733, 111: 109, 
111:260 

hypergeometric, 111:174-178, lll:185t 
indicating location of, 111:235 
infinitely divisible. 111:253-256, 
111:2531 

informative prior, 1:152-153 
inverted Wishart, 1:172 
light- vs. heavy-tailed, 111:111-112 
lognormal, 111:106,111:106/ 

111:538-539 

mixture loss, 111:110-111 
for modeling applications, 111:257 
multinomial. 111:179-182,111:1851 
non-Gaussian, 111:254 
noninformative prior, 1:153-154 
normal (See normal distributions) 
parametric, 111:201 
Poisson, 1:142,111:182-183,111:1851, 
111: 217-218 

Poisson probability, 111:1871 
posterior, 1:147-148,1:165,1:166-167, 
1:169-170,1:177,1:183-184 
power-law. 111:262-263 
predictive, 1:167 
prior, 1:177,1:181-182,1:196 
proposal, 1:183-184 
representation of stable and CTS, 
11:742-743 
spherical, 71:310 

stable, 777:238, 777:242, 777:264-265, 
777:384 (See also a-stable 
distributions) 

subexponential, 777:261-262 
tails of, 777:112/, 777:648 
tempered stable, 777:257, 777:382 
testing applied to truncated, 777:367 
Diversification, 77:57-58 
achieving, 7:10 
and cap weighting, 7:38 
and credit default swaps, 7:413-414 
example of, 7:15 
international, 77:393-396 
Markowitz's work on, 77:471 
Diversification effect, 777:321 
Diversification indicators, 7:192 
Dividend discount models (DDM) 
applied to electric utilities, 77:127 
applied to stocks, 77:16-17 
basic, 77:5 

constant growth, 77:7-9, 77:17-18 
defined, 77:14 
finite life general, 77:5-7 
free cash flow model, 77:21-23 
intuition behind, 77:18-19 
multiphase, 77:9-10 
non-constant growth, 77:18 
predictive power of, 77:54 
in the real world, 77:19-20 
stochastic, 77:10-12, 77:127 
Dividend payout ratio, 77:4, 77:20 
Dividends 

expected growth in, 77:19 
forecasting of, 77:6 
measurement of, 77:3-4, 77:14 
per share, 77:3—4 
reasons for not paying, 77:27 
required rate of return, 77:19 
and stock prices, 77:4-5 
Dividend yield, 77:4, 77:19 
Documentation 

of model risk, 77:696, 77:697 
Dothan model, 7:491, 7:493 
Dow Jones Global Titans 500 (DJGTI), 
77:4907, 77:4917 

Dow Jones Industrial Average (DJIA) 
in comparison of risk models, 
77:747-751 

components of, 77:4897 
fitted stable tail index for, 77:740/ 
frequency distribution in, 77:4897 
performance (January 2004 to June 
2011), 77:749/ 

relative frequencies, 77:4917 
stocks by share price, 77:4927 
Drawing without replacement, 
777:174-177 


Drawing with replacement, 777:170, 
777:174, 777:179-180 

Drift 

effects of, 777:537 
of interest rates, 7:263 
in randomness calculations, 777:535 
in random walks, 7:84,7:86 
time increments of, 7:83 
of time series, 7:80 
as variable, 777:536 
DTS (duration times spread), 7:392, 
7:393-394, 7:396-398 
Duffie-Singleton model, 7:542-543 
Dupire's formula, 77:682-683,77:685 
DuPont system, 77:548-551, 77:551/ 
Duration 

calculations of real yield and 
inflation, 7:286 
computing of, 7:285 
defined, 7:284, 777:309 
effective, 777:300-304, 777:6177 
effective/option adjusted, 777:13 
empirical, of common stock, 
77:318-322,77:319-3227 
estimation of, 77:3237 
measurement of, 777:12-13, 
777:304-305 
models of, 77:461 
modified vs. effective, 777:299 
Duration/convexity, effective, 7:255, 
7:256/ 

Duration times spread (DTS). See DTS 
(duration times spread) 
Durbin-Watson test, 777:647 
Dynamical systems 

equilibrium solution of, 77:653 
study of, 77:651 

Dynamic conditional correlation 
(DCC) model, 77:373 
Dynamic term structures, 777:576-577, 
777:578-579,777:591 

Early exercise, 7:447, 7:455. See calls, 
American-style; options 
Earnings before interest, taxes, 

depreciation and amortization 
(EBITDA), 77:566 

Earnings before interest and taxes 
(EBIT), 77:23, 77:547, 77:556 
Earnings growth factor, 77:223 
Earnings per share (EPS), 77:20-21, 
77:38-39, 77:537 

Earnings revisions factor, 77:207,77:209/ 
EBITDA/EV factor 
correlations with, 77:226 
examples of, 77:203, 77:203/ 77:207, 
77:208/ 

in models, 77:232, 77:238-239 
use of, 77:222-223 


Econometrics 
financial, 77:295,77:298-300, 
77:301-303 

modeling of, 77:373, 77:654 
Economic cycles, 7:537,77.-42M3 
Economic intuition, 77:715-716 
Economic laws, changes in, 77:700 
Economy 

states of, 7:49-50,77:518-519, 777:476 
term structures in certain, 
777:567-568 

time periods of, 77:515-516 
Economy as an Evolving Complex 

System, The (Anderson, Arrow, 
& Pines), 77:699 

Educated guesses, use of, 7:511 
EE (explicit Euler) scheme, 77:674, 
77:677-678 

Effective annual rate (EAR), interest, 
77:616-617 
Efficiency 

in estimation, 777:641-642 
Efficient frontier, 7:13-14, 7:17/ 7:289/ 
Efficient market theory, 77:396,777:92 
Eggs, rotten, 7:457-458 
Eigenvalues, 77:627-628,77:705, 
77:706-707/ 77:707f 
Einstein, Albert, 77:470 
Elements, defined, 777:153-154 
Embedding problem, and change of 
time method, 777:520 
Emerging markets, transaction costs 
in, 777:628 

EM (expectation maximization) 
algorithm, 77:146, 77:165 
Empirical rule, 777:210, 777:225 
Endogenous parameterization, 
777:580-581 
Energy 

cargoes of, 7:561-562 
commodity price models, 7:556-558 
forward curves of, 7:564-565 
power plants and refineries, 7:563 
storage of, 7:560-561, 7:563-564 
Engle-Granger cointegration test, 
77:386-388, 77:391-392, 77:395 
Entropy, 777:354 

EPS (earnings per share), 77:20-21, 
77:38-39, 77:537 

Equally weighted moving average, 
777.-400M02, 777.-406M07, 
777:408-409 

Equal to earnings before interest and 
taxes (EBIT), 77:23,77:547, 77:556 
Equal-variance assumption, 7.T64, 
7:167 

Equations 

difference, homogenous vs. 
nonhomogeneous, 77:638 
difference vs. differential, II:629 
diffusion, II: 654-656, 11: 658n 
error-correction, II: 391, II:395f 
homogeneous linear difference, 

II:639-642,11:641/ 
homogenous difference, II:630-634, 
11:631-632/ 11:633-634/ 11:642 
linear, II:623-624 
linear difference, systems of. 

If:637-639 

matrix characteristics of, II: 628 
no arbitrage, 111:612,111:617-619 
nonhomogeneous difference, 

II: 634-637,11:635/ II:637-638/ 
stochastic. III: 478 
Equilibrium 

and absolute valuation models, 
1:260 

defined, II: 385-386 
dimensions of. III: 601 
in dynamic term structure models, 
III: 576 

expectations for, 11:112 
expected returns from, 11:112 
modeling of, 111:577,111:594 
in supply and demand. III:568 
Equilibrium models 
use of. III:603-604 
Equilibrium term structure models, 
III: 601 
Equities, 1:279 
investing in, II:89-90 
Equity 

on the balance sheet, 11:535 
changes in homeowner, 111:73 
in homes. III: 69 
as option on assets, 1:304-305 
shareholders', 11:535 
Equity markets, 11:48 
Equity multipliers, 11:550 
Equity risk factor models, 11:173-178 
Equivalent probability measures, 
1:111,111:510-511 
Ergodicity, defined, 11:405 
Erlang distribution. III:221-222 
Errors. See also estimation error; 
standard errors 

absolute percentages of, 11:525/ 
11:526/ 

estimates of, 11:676 
in financial models, 11:719 
a posteriori estimates, II:672-673 
sources of, 11:720 
terms for, 11:126 
in variables problem, 11:220 
Esscher transform, 111:511, III: 514 
Estimates/estimation 
confidence in, 1:199 
consensus, 11:34-35 


equations for, 1:348-349 
in EVT, 111:272-274 
factor models in, 11:154 
with GARCH models, 11:364-365 
in-house from firms, II: 35 
maximum likelihood, 11:311-313 
methodology for, 11:174—176 
and PCA, 11:167/ 
posterior, 1:176 
posterior point, 1:155-156 
processes for, 1:193,11:176 
properties of for EWMA, 1/1:410—411 
robust, 1:189 
techniques of, 11:330 
use of, 11:304 
Estimation errors 

accumulation of, 11:7 8 

in the Black-Litterman model, 1:201 

covariance matrix of, 111:139-140 

effect of, 1:18 

pessimism in, 111:143 

in portfolio optimization, II: 82, 

III: 138-139 
sensitivity to, 1:191 
and uncertainty sets, 111:141 
Estimation risk, 1:193 
minimizing. III: 145 
Estimators 
bias in. III: 641 
efficiency in, 111:641-642 
equally weighted average, 
111:400-402 
factor-based, 1:39 
terms used to describe, 11:314 
unbiased. III: 399 
variance, 11:313 

ETL (expected tail loss). III: 355-356 
Euler approximation, II: 649-650, 
11:649/ 11:650/ 

Euler constant. III: 182 

Euler schemes, explicit/implicit, II: 666 

Europe 

common currency for, 11:393 
risk factors of, 11:174 
European call options 
Black-Scholes formula for, 

III: 639-640 

computed by different methods, 

III: 650-651,111:651/ 
explicit option pricing formula, 

III: 526-527 

pricing by simulation in VBA, 
111:465-466 

pricing in Black-Scholes setting, 

III: 649 

simulation of pricing, 111:444—445, 
111:462^163 

and term structure models, 
111:544-545 


European Central Bank, 1:300 
Events 

defined. III: 85, III: 162, III: 508 
effects of macroeconomic, 11:243-244 
extreme, 111:245-246, III: 260-261, 

III: 407 

identification of, 11:516 
mutually exclusive. III: 158 
in probability. III: 156 
rare. III: 645 
rare vs. normal, 1:262 
tail, III:88n, 111:111,111:118 
three-,5, III: 381-382 
EVT (extreme value theory). See 

extreme value theory (EVT) 
EWMA (exponentially weighted 

moving averages), 111:409—113 
Exceedance observations. III: 362-363 
Exceedances, of VaR, III: 325-326, 

III: 339 

Excel 

accessing VBA in. III: 477 
add-ins for, 1:93, III: 651 
data series correlation in, 1:92-93 
determining corresponding 
probabilities in. III: 646 
Excel Link, 111:434 
Excel Solver, 11:70 
interactions with MATLAB, III: 448 
macros in. III: 449,111:454—455 
notations in, III:477n 
random number generation in, 
111:645-646 

random walks with, 1:83,1:85,1:87, 
1:90 

@RISK in, II:12f 
syntax for functions in. III: 456 
Exchange-rate intervention, study on, 
111:177-178 

Exercise prices, I:452, I:484, I:508
Expectation maximization (EM) algorithm, II:146, II:165
Expectations, conditional, I:122, II:517-518, III:508-509
Expectations hypothesis, III:568-569, III:601n
Expected shortfall (ES), I:385-386, III:332. See also average value at risk (AVaR)
Expected tail loss (ETL), III:291, III:293f, III:345-347, III:347f, III:355-356
Expected value (EV), I:511
Expenses, noncash, II:25
Experiments, possibility of, II:307
Explicit costs, defined, III:623
Explicit Euler (EE) scheme, II:674, II:677-678
Exponential density function, III:218f
Exponential distribution, III:217-219
applications in finance, III:219
Exponentially weighted moving averages (EWMA)
discussion of, III:409-413
forecasting model of, III:411
properties of the estimates, III:410-411
standard errors for, III:411-412
statistical methodology in, III:409
usefulness of, III:413-414
volatility estimates for, III:410f
Exposures
calculation of, II:247t
correlation between, II:186
distribution of, II:250f, II:251f, II:254
management of, II:182-183
monitoring of portfolio, II:249-250
name-specific, II:188
Extrema, characterization of local, I:23
Extremal random variables, III:267
Extreme value distributions, generalized, III:269
Extreme value theory (EVT), II:744-746, III:95, III:228
defined, III:238
for IID processes, III:265-274
in IID sequences, III:275
role of in modeling, II:753n

Factor analysis
application of, II:165
based on information coefficients, II:222
defined, II:141, II:169
discussion of, II:164-166
importance of, II:238
vs. principal component analysis, II:166-168
Factor-based strategies
vs. risk models, II:236
Factor-based trading, II:196-197
model construction for, II:228-235
performance evaluation of, II:225-228
Factor exposures, II:247-248, II:275-283
Factorials, computing of, III:456
Factorization, defined, II:307
Factor mimicking portfolio (FMP), II:214
Factor model estimation, II:142-147, II:150
alternative approaches and extensions, II:145-147
applied to bond returns, II:144-145
computational procedure for, II:142-144
fixed N, II:143
large N, II:143-144
Factor models
in the Black-Litterman framework, I:200
commonly used, II:150
considerations in, II:178
cross-sectional, II:220-221
defined, II:153
fixed income, II:271-272
in forecasting, II:230-231
linear, II:154-156, II:168
normal, II:156
predictive, II:142
static/dynamic, II:146-147, II:155
in statistical methodology, II:141
strict, II:155-156
types of, II:138-142
usefulness of, II:154, II:503
use of, I:354, II:137, II:150, II:168, II:219-225
Factor portfolios, II:224-225
Factor premiums, cross-sectional methods for evaluation of, II:214-219
Factor returns, II:191t, II:192t
calculation of, II:248
Factor risk models, II:113, II:119
Factors
adjustment of, II:205-206
analysis of data of, II:206-211
categories of, II:197
choice of, II:232-235
defined, II:196, II:211
desirable properties of, II:200
development of, II:198
estimation of types of, II:156
graph of, II:166f
known, II:138-139
K systematic, II:138-139
latent, II:140-141, II:150
loadings of, II:144, II:145t, II:155, II:166t, II:167f, II:168t
market, II:176
orthogonalization of, II:205-206
relationship to time series, II:168f
sorting of, II:215
sources for, II:200-201
statistical, II:197
summary of well-known, II:196t
transformations applied to, II:206
use of multiple, II:141-142
Failures, probability of, II:726-727
Fair equilibrium, between multiple accounts, II:76
Fair value
determination of, III:584-585
Fair value, assessment of, II:6-7


Fama, Eugene, II:468, II:473-474
Fama-French three-factor model, II:139-140, II:177
Fama-MacBeth regression, II:220-221, II:224, II:227-228, II:228f, II:237, II:240n
Fannie Mae/Freddie Mac, writedowns of, III:77n
Fast Fourier transform algorithm, II:743
Fat tails
of asset return distributions, III:242
in chaotic systems, II:653
class 2, III:261-263
comparison between risk models, II:749-750
effects of, II:354
importance of, II:524
properties of, III:260-261
in Student's t distribution, II:734
Favorable selection, III:76-77
F-distribution, III:216-217
Federal Reserve
effects of on inflation risk premium, I:281
study by Cleveland Bank, III:177-178
timing of interventions of, III:178
Feynman-Kac formulas, II:661
FFAs (forward freight agreements), I:566
Filtered probability spaces, I:314-315, I:334n
Filtration, II:516-517, III:476-477, III:489-490, III:508
Finance, three major revolutions in, III:350
Finance companies, captive, I:366-369
Finance theory
development of, II:467-468
effect of computers on, II:476
in the nineteenth century, II:468-469, II:476
in the 1960s, II:476
in the 1970s, II:476
stochastic laws in, III:472
in the twentieth century, II:476
Financial assets, price distribution of, III:349-350
Financial crisis (2008), III:71
Financial data, pro forma, II:542-543
Financial distress, defined, I:351
Financial institutions, model risk of, II:693
Financial leverage ratios, II:559-561, II:563
Financial modelers, mistakes of, II:707-710
Financial planning, III:126-127, III:128, III:129
Financial ratios, II:546, II:563-564
Financial statements
assumptions used in creating, II:532
data in, II:563
information in, II:533-542, II:543
pro forma, II:22-23
time statements for, II:532
usefulness of, II:531
use of, II:204-205, II:246
Financial time series, I:79-80, I:386-387, II:415-416, II:503-504
Financial variables, modeling of, III:280
Find, in MATLAB, III:422
Finite difference methods, II:648-652, II:656-657, II:665-666, II:674-675, II:676-677, III:19
Finite element methods, II:669-670, II:672, II:679-681
Finite element space, II:670-672
Finite life general DDM, II:5-7
Finite states, assumption of, I:100-101
Firms
assessment of, II:546-547
and capital structure, II:473
characteristics of, II:94, II:176-177, II:201
clientele of, II:36
comparable, II:34, II:35-36
geographic location of, II:36
history vs. future prospects, II:92
phases of, II:9-10
retained earnings of, II:20
valuation of, II:26-27, II:473
value of, II:27-31, II:39
vs. characteristics of group, II:90-91
First boundary problem, II:655-656, II:657f
First Interstate Bancorp, I:304
analysis of credit spreads, I:305f
debt ratings of, I:410
First passage models (FPMs), I:342, I:344-348
Fisher-Tippett theorem, III:266-267
Fisher, Ronald, I:140
Fisherian, defined, I:140
Fisher's information matrix, I:160n
Fisher's law, II:322-323
Fixed-asset turnover ratio, II:558
Fixed-charge coverage ratio, II:560-561
Flesaker-Hughston (FH) model, III:548-549
Flows, discrete, I:448-453
FMP (factor mimicking portfolio), II:214


Footnotes, in financial statements, II:541-542
Ford Motor Company, I:408f, I:409f
Forecastability, II:132
Forecastability, concept of, II:123
Forecast encompassing
defined, II:230-231
Forecasts
of bid-ask spreads, II:456-457
comparisons of, II:420-421
contingency tables, II:429f
development of, II:110-114
directional, II:428
effect on future of, II:122-123
errors in, II:422f
evaluation of, II:428-430, III:368-370
machine-learning approach to, II:128
measures of, II:429-430, II:430
need for, II:110-111
in neural networks, II:419-420
one-step ahead, II:421f
parametric bootstraps for, II:428-430
response to macroeconomic shocks, II:55f
usefulness of, II:131-132
use of models for, II:302
of volatility, III:412
Foreclosures, III:31, III:75
Forward contracts
advantages of, I:430
buying assets of, I:439
defined, I:426, I:478
equivalence to futures prices, I:432-433
hedging with, I:429, I:429t
as OTC instruments, I:479
prepaid, I:428
price paths of, I:428f
short vs. long, I:437-438, I:438f
valuing of, I:426-430
vs. futures, I:430-431, I:433
vs. options, I:437-439
Forward curves
graph of, I:434f
modeling of, I:533, I:557-558, I:564-565
normal vs. inverted, I:434
of physical commodities, I:555
Forward freight agreements (FFAs), I:555, I:558, I:566
Forward measure, use of, I:543-544
Forward rates
calculation of, I:491, III:572
defined, I:509-510
from discount function, III:566-567
implied, III:565-567
models of, III:543-544
from spot yields, III:566
of term structure, III:586
Fourier integrals, II:656
Fourier methods, I:559-560
Fourier transform, III:265
FPMs (first passage models), I:342, I:344-348
Fractals, II:653-654, III:278-280, III:479-480
Franklin Templeton Investment Funds, II:496f, II:497f, II:498f
Frechet distribution, II:754n, III:228, III:230, III:265, III:267, III:268
Frechet-Hoeffding copulas, I:327, I:329
Freddie Mac, II:77n, II:754n, III:49
Free cash flow (FCF), II:21-23
analysis of, II:570-571
calculation of, II:23-24, II:571-572
defined, II:569-571, II:578
expected for XYZ, Inc., II:30t
financial adjustments to, II:25-26
statement of, direct method, II:24-25, II:24f
statement of, indirect method, II:24-25, II:24f
vs. cash flow, II:22-23
Freedman-Diaconis rule, II:494, II:495, II:497
Frequencies
accumulating, II:491-492
distributions of, II:488-491, II:499f
empirical cumulative, II:492
formal presentation of, II:491
Frequentist, I:140, I:148
Frictions, costs of, II:472-473
Friedman, Milton, I:123
Frontiers, true, estimated and actual efficient, I:190-191
F_SCORE, use of, II:230-231
F-test, II:336, II:337, II:344, II:425, II:426
FTSE 100, volatility in, III:412-413
Fuel costs, I:561, I:562-563. See also energy
Full disclosure, defined, II:532
Functional, defined, I:24
Functional-coefficient autoregressive (FAR) model, II:417
Functions
affine, I:31
Archimedean, I:329, I:330-331, I:331
Bessel, of the third kind, II:591
beta, II:591
characteristic, II:591-592, II:593
choosing and calibrating of, I:331-333
Clayton, Frank, Gumbel, and Product, I:329
continuous, II:581-584, II:582f, II:583, II:592-593
continuous/discontinuous, II:582f
convex, I:24-27, I:25, I:25f, I:26f
convex quadratic, I:26, I:31f
copula, I:320, I:325-333, I:407-408
for default times, I:329-331
defined, I:24, I:333
density, I:141
with derivatives, II:585f
elementary, III:474
elliptical, I:328-329
empirical distribution, III:270
factorial, II:590-591
gamma, II:591, II:591f, III:212
gradients of, I:23
Heaviside, II:418-419
hypergeometric, III:256, III:257
indicator, II:584-585, II:584f, II:593
likelihood function, I:141-143, I:143f, I:144f, I:148, I:176, I:177
measurable, III:159-160, III:160f, III:201
minimization and maximization of values, I:22, I:22f
monotonically increasing, II:587-588, II:588f
nonconvex quadratic, I:26-27
nondecreasing, III:154-155, III:155f
normal density, III:226f
optimization of, I:24
parameters of copulas, I:331-332
properties of quasi-convex, I:28
quasi-concave, I:27-28, I:27f
right-continuous, III:154-155, III:155f
surface of linear, I:33f
with two local maxima, I:23f
usefulness of, I:411-412
utility, I:4-5, I:14-15, I:461
Fund management, art of, I:273
Fund separation theorems, I:36
Futures
Eurodollar, I:503
hedging with, I:433
market for housing, II:396-397
prices of, and interest rates, I:435n
telescoping positions of, I:431-432
theoretical, I:487
valuing of, I:430-433
vs. forward contracts, I:430-431
Futures contracts
defined, I:478
determining price of, I:481
pricing model for, I:479-481
theoretical price of, I:481-484
vs. forward contracts, I:433, I:478-479
Futures options, defined, I:453
Future value, II:618
determining of money, II:596-600

Galerkin methods, principle of, II:671
Gamma, I:509, I:518-520
Gamma process, III:498
Gamma profile, I:519f
Gapping effect, I:509
GARCH (generalized autoregressive conditional heteroskedastic) models
asymmetric, II:367-368
exponential (EGARCH), II:367-368
extensions of, III:657
factor models, II:372
GARCH-M (GARCH in mean), II:368
Markov-switching, I:180-184
time aggregation in, II:369-370
type of, II:131
usefulness of, III:414
use of, I:175-176, I:185-186, II:371, II:733-734, III:388
and volatility, I:179
weights in, II:363-364
GARCH (1,1) model
Bayesian estimation of, I:176-180
defined, II:364
results from, II:366, II:366t
skewness of, III:390-391
strengths of, III:388-389
Student's t, I:182
use of, I:550-551, III:656-657
GARCH (1,1) process, I:551t
Garman-Kohlhagen system, I:510-511, I:522
Gaussian density, III:98f
Gaussian model, III:547-548
Gaussian processes, III:280, III:504
Gaussian variables, and Brownian motion, III:480-481
Gauss-Markov theorem, II:314
GBM (geometric Brownian motion), I:95, I:97
GDP (gross domestic product), I:278, I:282, II:138, II:140
General inverse Gaussian (GIG) distribution, II:523-524
Generalized autoregressive conditional heteroskedastic (GARCH) models. See GARCH (generalized autoregressive conditional heteroskedastic) models
Generalized central limit theorem, III:237, III:239


Generalized extreme value (GEV) distribution, II:745, III:228-230, III:272-273
Generalized inverse Gaussian distribution, use of, II:521-522
Generalized least squares (GLS), I:198-199, II:328
Generalized tempered stable (GTS) processes, III:512
Generally accepted accounting principles (GAAP), II:21-22, II:531-532, II:542-543
Geometric mean reversion (GMR) model, I:91-92
computation of, I:91
Gibbs sampler, I:172n, I:179, I:184-185
GIG models, calibration of, II:526-527
Gini index of dissimilarity (Gini measure), III:353-354
Ginnie Mae/Fannie Mae/Freddie Mac, actions of, III:49
Girsanov's theorem
and Black-Scholes option pricing formula, I:132-133
with Brownian motion, III:511
and equivalent martingale measures, I:130-133
use of, I:263, III:517
Glivenko-Cantelli theorem, III:270, III:272, III:348n, III:646
Global Economy Workshop, Santa Fe Institute, II:699

Global Industry Classification Standard (GICS®), II:36-37, II:248
Global minimum variance (GMV) portfolios, I:39
GMR (geometric mean reversion) model, I:91-92
GMV (global minimum variance) portfolios, I:15, I:194-195
GNP, growth rate of (1947-1991), II:410-411, II:410f
Gradient methods, use of, II:684
Granger causality, II:395-396
Graphs, in MATLAB, III:428-433
Greeks, the, I:516-522
beta and omega, I:522
delta, I:516-518
gamma, I:518-520
rho, I:521-522
theta, I:509, I:520-521
use of, I:559, II:660, III:643-644
vega, I:521
Greenspan, Alan, I:140-141
Growth, I:283f, II:239, II:597-598, II:601-602
Gumbel distribution, III:265, III:267, III:268-269
Hamilton-Jacobi equations, II:675
Hankel matrices, II:512
Hansen-Jagannathan bound, I:59, I:61-62
Harrison, Michael, II:476
Hazard, defined, III:85
Hazard (failure) rate, calculation of, III:94-95
Heat diffusion equation, II:470
Heath-Jarrow-Morton framework, I:503, I:557
Heavy tails, III:227, III:382
Hedge funds, and probit regression model, II:349-350
Hedge ratios, I:416-417, I:509
Hedges
importance of, I:300
improvement using DTS, I:398
in the Merton context, I:409
rebalancing of, I:519
risk-free, I:532f
Hedge test, I:409, I:411
Hedging
costs of, I:514, II:725
and credit default swaps, I:413-414
determining, I:303-304
with forward contracts, I:429, I:429f
of fuel costs, I:561
with futures, I:433
gamma, I:519
portfolio-level, I:412-413
of positions, II:724-726
ratio for, II:725
with swaps, I:434-435
transaction-level, I:412
usefulness of, I:418
use of, I:125-126
using macroeconomic indices, I:414-417
Hessian matrix, I:23-24, I:25, I:186n, III:645
Heston model, I:547, I:548, I:552, II:682
with change of time, III:522
Heteroskedasticity, II:220, II:359, II:360, II:403
HFD (high-frequency data). See high-frequency data (HFD)
Higham's projection algorithm, II:446

High-dimensional problems, II:673
High-frequency data (HFD)
and bid-ask bounce, II:454-457
defined, II:449-450
generalizations to, II:368-370
Level I, II:451-452, II:452f, II:453t
Level II, II:451
properties of, II:451, II:453t
recording of, II:450-451
time intervals of, II:457-462
use of, II:300, II:481
volume of, II:451-454
Hilbert spaces, II:683
Hill estimator, II:747, III:273-274
Historical method
drawbacks of, III:413
weighting of data in, III:397-398
Hit rate, calculation of, II:240n
HJM framework, I:498
HJM methodology, I:496-497
Holding period return, I:6
Ho-Lee model
continuous variant for, I:497
defined, I:492
in history, I:493
interest rate lattice, III:614f
as short rate model, III:23
for short rates, III:605
as single factor model, III:549
Home equity prepayment (HEP) curve, III:55-56, III:56f
Homeowners, refinancing behavior of, III:25
Home prices, I:412, II:397f, II:399f, III:74-75
Homoskedasticity, II:360, II:373
Horizon prices, III:598
Housing, II:396-399, III:48
Howard algorithm (policy iteration algorithm), II:676-677, II:680
Hull-White (HW) models
binomial lattice, III:610-611
for calibration, II:681
defined, I:492
interest rate lattice, III:614f
and short rates, III:545-546
for short rates, III:605
trinomial lattice, III:613, III:616f
usefulness of, I:503
use of, III:557, III:604
valuing zero-coupon bond calls with, I:500
Hume, David, I:140
Hurst, Harold, II:714
Hypercubes, use of, III:648

IBM stock, log returns of, II:407f
Ignorance, prior, I:153-154
Implementation risk, II:694
Implementation shortfall approach, III:627
Implicit costs, III:631
Implicit Euler (IE) scheme, II:674, II:677-678
Implied forward rates, III:565-567
Impurity, measures of, II:377
Income, defined for public corporation, II:21-22
Income statements
common-size, II:562-563, II:562f
defined, II:536
in financial statements, II:536-537
sample, II:537t, II:547t
structure of, II:536
XYZ Inc. (example), II:28f
Income taxes. See taxes
Independence, I:372-373, II:624-625, III:363-364, III:368
Independence function, in VaR models, III:365-366
Independently and identically distributed (IID) concept, I:164, I:171, II:127, III:274-280, III:367, III:414
Indexes
characteristics of efficient, I:427
defined, II:67
of dissimilarity, III:353-354
equity, I:157, II:190t, II:262-263
tail, II:740-741, II:740f, III:234
tracking of, II:64, II:180
use of weighted market cap, I:38
value weighted, I:76-77
volatility, III:550-552, III:552f
Index returns, scenarios of, II:190t, II:191t
Indifference curves, I:4-5, I:5f, I:14
Industries, characteristics of, II:36-37, II:39-40
Inference, I:155-158, I:169t
Inflation
effect on after-tax real returns, I:286-287
and GDP growth, I:282
indexing for, I:278-279
in regression analysis, II:323
risk of, II:282
risk premiums for, I:280-283
seasonal factors in, I:292
shifts in, I:285f
volatility of, I:281
Information
anticipation of, III:476
from arrays in MATLAB, III:421
completeness of, I:353-354
contained in high volatility stocks, III:629
and filtration, III:517
found in data, II:486
and information propagation, II:515
insufficient, III:44
integration of, II:481-482
overload of, II:481
prior in Bayesian analysis, I:151-155, I:152
propagation of, I:104
structures of, I:106f, II:515-517
unstructured vs. semistructured, II:481-482
Information coefficients (ICs), II:98-99, II:221-223, II:223f, II:227f, II:234
Information ratios
defined, II:86n, II:115, II:119, II:237
determining, II:100f
for portfolio sorts, II:219
use of, II:99-100
Information sets, II:123
Information structures
defined, II:518
Information technology, role of, II:480-481
Ingersoll models, I:271-273, I:275f
Initial conditions, fixing of, II:502
Initial margins, I:478
Initial value problems, II:639
Inner quartile range (IQR), II:494
Innovations, II:126
Insurance, credit, I:413-414
Integrals, II:588-590, II:593. See also stochastic integrals
Integrated series, and trends, II:512-514
Integration, stochastic, III:472, III:473, III:483
Intelligence, general, II:154
Intensity-based frameworks, and the Poisson process, I:315
Interarrival time, III:219, III:225
Intercepts, treatment of, II:334-335
Interest
accumulated, II:604-605, II:604f
annual vs. quarterly compounding, II:599f
compound, II:597, II:597f
computing accrued, and clean price, I:214-215
coverage ratio, II:560
defined, II:596
determining unknown rates, II:601-602
effective annual rate (EAR), II:616-617
mortgage, II:398
simple vs. compound, II:596
terms of, II:619
from TIPS, I:277
Interest rate models
binomial, III:173-174, III:174f
classes of, III:600
confusions about, III:600
importance of, III:600
properties of lattices, III:610
realistic, arbitrage-free, III:599
risk-neutral/arbitrage-free, III:597
Interest rate paths, III:6-9, III:7, III:87
Interest rate risk, III:12-14
Interest rates
absolute vs. relative changes in, III:533-534
approaches in determining future, III:591
binomial model of, III:173-174
binomial trees, I:236, I:236f, I:237f, I:240f, I:244, I:244f, III:174f
borrowing vs. lending, I:482-483
calculation of, II:613-618
calibration of, I:495
caps/caplets of, III:589-590
caps on, I:248-249
categories of term structure, III:561
computing sensitivities, III:22-23
continuous, 7:428, 7:439^88 
derivatives of, III:589-590
determination of appropriate, I:210-211
distribution of, III:538-539
dynamic of process, I:262
effect of, I:514-515
effect of shocks, III:23
effect on putable bonds, III:303-304
future course of, III:567, III:573
and futures prices, I:435n
importance of models, III:600
jumps of, III:539-541
jumpy and continuous, III:539f
long vs. short, III:538
market spot/forward, I:495f
mean reversion of, III:7
modeling of, I:261-265, I:267, I:318, I:491, I:503, III:212-213
multiple, II:599-600
negative, III:538
nominal, II:615-616
and option prices, I:486-487
and prepayment risk, III:48
risk-free, I:442
shocks/shifts to, III:585-596
short-rate, I:491-494, III:595
simulation of, III:541
stochastic, I:344, I:346
structures of, III:573, III:576
use of for control, I:489
volatility of, III:405, III:533
Intermarket relations, no-arbitrage, I:453-455
Internal consistency rule, in OAS analysis, I:265
Internal rate of return (IRR), II:617-618
in MBSs, III:36
International Monetary Fund
Global Stability Report, I:299
International Swap and Derivatives Association (ISDA). See ISDA
Interpolated spread (I-spread), I:227
Interrate relationship, arbitrage-free, III:544
Intertemporal dependence, and risk, III:351
Intertrade duration, II:460-461, II:462t
Intertrade intervals, II:460-461
Intervals, credible, I:170
Interval scales, data on, II:487
Intrinsic value, I:441, I:511, I:513, II:16-17
Invariance property, III:328-329
Inventory, II:542, II:557
Inverse Gaussian process, III:499
Investment, goals of, II:114-115
Investment management, III:146
Investment processes
activities of integrated, II:61
evaluation of results of, II:117-118
model creation, II:96
monitoring of performance, II:104
quantitative, II:95, II:95f
quantitative equity, II:95f, II:96f, II:105
research, II:95-102
sell-structured, II:108
steps for equity investment, II:119
testing of, II:109
Investment risk measures, III:350-351
Investments, I:77-78n, II:50-51, II:617-618
Investment strategies, II:66-67, II:198
Investment styles, quantamental, II:93-94, II:93f

Investors
behavior of, II:207, II:504
comfort with risk, I:193
completeness of information of, I:353-354
focus of, I:299, II:90-91
fundamental vs. quantitative, II:90-94, II:91f, II:92f, II:105
goals/objectives of, II:114-115, II:179, III:631
individual accounts of, II:74
monotonic preferences of, I:57
number of stocks considered, II:91
preferences of, I:5, I:260, II:48, II:56, II:92-93
prior beliefs of, II:727
real-world, II:132
risk aversion of, II:82-83, II:729
SL-CAPM assumptions about, I:66
sophistication of, II:108
in uncertain markets, II:54
views of, I:197-199
Invisible hand, notion of, II:468-469
ISDA (International Swap and Derivatives Association)
Credit Derivative Definitions (1999), I:230, I:528
Master Agreement, I:538
organized auctions, I:526-527
supplement definition, I:230
I-spread (interpolated spread), I:227
Ito, Kiyosi, II:470
Ito definition, III:486-487
Ito integrals, I:122, III:475, III:481, III:490-491
Ito isometry, III:475
Ito processes
defined, I:95
generic univariate, I:125
and Girsanov's theorem, I:131
under HJM methodology, I:497
properties of, III:487-488
and smooth maps, III:493
Ito's formula, I:126, III:488-489
Ito's lemma
defined, I:98
discussion of, I:95-97
in estimation, I:348
and the Heston model, I:548

James-Stein shrinkage estimator, I:194
Japan, credit crisis in, I:417
Jarrow-Turnbull model, I:307
Jarrow-Yu propensity model, I:324-325
Jeffreys' prior, I:153, I:160n, I:171-172
Jensen's inequality, I:86, III:569
Jevons, Stanley, II:468
Johansen-Juselius cointegration tests, II:391-393, II:395
Joint jumps/defaults, I:322-324
Joint survival probability, I:323-324
Jordan diagonal blocks, II:641-642
Jorion shrinkage estimator, I:194, I:202
Jump-diffusion, III:554-557, III:657
Jumps
default, I:322-324
diffusions, I:559-560
downward, I:347
idiosyncratic, I:323
incorporation of, I:93-94
in interest rates, III:539-541
joint, I:322-324
processes of, III:496
pure processes, III:497-501, III:506
size of, III:540

Kalotay-Williams-Fabozzi (KWF) model, III:604, III:606-607, III:615f
Kamakura Corporation, I:301, I:307, I:308-309, I:310n
Kappa, I:521


Karush-Kuhn-Tucker conditions (KKT conditions), I:28-29
Kendall's tau, I:327, I:332
Kernel regression, II:403, II:412-413, II:415
Kernels, II:412, II:413f, II:746
Kernel smoothers, II:413
Keynes, John Maynard, II:471
Key rate durations (KRD), II:276, III:311-315, III:317
Key rates, II:276, III:311
Kim-Rachev (KR) process, III:512-513
KKT conditions (Karush-Kuhn-Tucker conditions), I:28-29, I:31, I:32
KoBoL distribution, III:257n
Kolmogorov extension theorem, III:477-478
Kolmogorov-Smirnov (KS) test, II:430, III:366, III:647
Kolmogorov equation, use of, III:581
Kreps, David, II:476
Krispy Kreme Doughnuts, II:574-575, II:574f
Kronecker product, I:172, I:173n
Kuiper test, III:366
Kurtosis, I:41, III:234

Lag operator L, II:504-506, II:507, II:629-630
Lagrange multipliers, I:28, I:29-31, I:30, I:32
Lag times, II:387, III:31
Laplace transforms, II:647-648
Last trades, price and size of, II:450
Lattice frameworks
bushy trees in, I:265, I:266f
calibration of, I:238-240
fair, I:235
interest rate, I:235-236, I:236-238
one-factor model, I:236f
for pricing options, I:487
usefulness of, I:235
use of, I:240, I:265-266, III:14
value at nodes, I:237-238
1-year rates, I:238f, I:239f
Law of iterated expectations, I:110, I:122, II:308
Law of large numbers, I:267, I:270n, III:263-264, III:275
Law of one α, II:50
Law of one price (LOP), I:52-55, I:99-100, I:102, I:119, I:260
LCS (liquidity cost score), I:402
use of, I:403
LDIs (liability-driven investments), I:36
LD (loss on default), I:370-371
Leases, in financial statements, II:542
Least-square methods, II:683-685


Leavens, D. H., I:10
Legal loss data
Cruz study, III:113, III:115t
Lewis study, III:117, III:117t
Lehman Brothers, bankruptcy of, I:413
Level (parallel) effect, II:145
Levy-Khinchine formula, III:253-254, III:257
Levy measures, III:254, III:254t
Levy processes
and Brownian motion, III:504
in calibration, II:682
change of measure for, III:511-512
conditions for, III:505
construction of, III:506
from Girsanov's theorem, III:511
and Poisson process, III:496
as stochastic process, III:505-506
as subordinators, III:521
for tempered stable processes, III:512-514, III:514t
and time change, III:527
Levy stable distribution, III:242, III:339, III:382-386, III:392
LGD (loss given default), I:366, I:370, I:371
Liabilities, II:533, II:534-535, III:132
Liability-driven investments (LDIs), I:36
Liability-hedging portfolios (LHPs), I:36

LIBOR (London Interbank Offered Rate)
and asset swaps, I:227
changes in, by type, III:539-540
curve of, I:226
interest rate models, I:494
market model of, III:589
spread of, I:530
in total return swaps, I:541
use of in calibration, III:7
Likelihood maximization, I:176
Likelihood ratio statistic, II:425
Limited liability rule, I:363
Limit order books, use of, III:625, III:632n
Lintner, John, II:474
Lipschitz condition, II:658n, III:489, III:490
Liquidation
effect of, II:186
procedures for, I:350-351
process models for, I:349-351
time of, I:350
vs. default event, I:349
Liquidity
assumption of, III:371
in backtesting, II:235
changes in, I:405
cost of, I:401
creation of, III:624-625, III:631
defined, III:372, III:380
effect of, II:284
estimating in crises, III:378-380
in financial analysis, II:551-555
and LCS, I:404
and market costs, III:624
measures of, II:554-555
premiums on, I:294, I:307
ratios for, II:555
in risk modeling, II:693
shortages in, I:347-348
and TIPS, I:293, I:294
and transaction costs, III:624-625
Liquidity-at-risk (LAR), III:376-378
Liquidity cost, III:373-374, III:375-376
Liquidity cost score (LCS), I:402, I:403
Liquidity preference hypothesis, III:570
Liquidity ratios, II:563
Liquidity risk, II:282, III:380
Ljung-Box statistics, II:407, II:421, II:422, II:427-428
LnMix models, calibration of, II:526-527

Loading, standardization of, II:177
Loan pools, III:8-9
Loans
amortization of, II:606-607, II:611-613
amortization table for, II:612t
delinquent, III:63
fixed rate, fully amortized schedule, II:614t
floating rate, II:613
fully amortizing, II:611
modified, III:32
nonperforming, III:75
notation for delinquent, III:45n
recoverability of, III:31-32
refinancing of, III:68-69
repayment of, II:612f, II:613f
term schedule, II:615t
Loan-to-value ratios (LTVs), III:31-32, III:69, III:73, III:74-75
Location parameters, I:160n, III:201-202
Location-scale invariance property (Gaussian distribution), II:732
Logarithmic Ornstein-Uhlenbeck (log-OU) processes, I:557-558
Logarithmic returns, III:211-212, III:225
Logistic distribution, II:350
Logistic regression, I:307, I:308, I:310
Logit regression models, II:349-350, II:350


Log-Laplace transform, III:255-256
Lognormal distribution, III:222-225, III:392
Lognormal mixture (LnMix) distribution, II:524-525
Lognormal variables, I:86
Log returns, I:85-86, I:88
London Interbank Offered Rate (LIBOR). See LIBOR
Lookback options, I:114, III:24
Lookback periods, III:402, III:407
LOP (law of one price). See law of one price (LOP)

Lorenz, Edward, II:653
Loss distributions, conditional, III:340-341
Losses. See also operational losses
allocation of, III:32
analysis of in backtesting, III:338
collateral vs. tranche, III:36
computation of, I:383
defined, III:85
estimation of cumulative, III:39-40
expected, I:369-370, I:373-374
expected vs. unexpected, I:369, I:375-376
internal vs. external, III:83-84
median of conditional, III:348n
projected, III:37f
restricting severity of, I:385-386
severity of, III:44
unexpected, I:371-372, I:374-375
Loss functions, I:160n, III:369
Loss given default (LGD), I:366, I:370, I:371
Loss matrix analysis, III:40-41
Loss on default (LD), I:370-371
Loss severity, III:30-31, III:60-62, III:97-99
Lottery tickets, I:462
Lower partial moment risk measure, III:356
Lundberg, Filip, II:467, II:470-471

Macroeconomic influences, defined, 
77:197 

Magnitude measures, 77:429^130 
Maintenance margins, 7:478 
Major indexes, modeling return 

distributions for, 777:388-392 
Malliavin calculus, 777:644 
Management, active, 77:115 
Mandelbrot, Benoit, 77:653,77:738, 
777:234, 777:241-242 
Manufactured housing prepayment 
(MHP) curve, 777:56 
Marginalization, 77:335 
Marginal rate of growth, 777:197-198 
Marginal rate of substitution, 7:60 


Margin calls, exposure to, 777:377 
Market cap vs. firm value, 77:39 
Market completeness, I:52, I:105
Market efficiency, I:68-73, II:121,
II:473-474
Market equilibrium 

and investor's views, I:198-199
Market impact 

costs of, 777:623-624, 777:627 
defined, 77:69 
forecasting/modeling of, 

777:628-631 

forecasting models for, 777:632 
forecasting of, 777:628-629, 
777:629-631 

measurement of, 777:626-628 
between multiple accounts, 77:75-76 
in portfolio construction, 77:116 
and transaction costs, 77:70 
Market model regression, 77:139 
Market opportunity, two state, I:460f
Market portfolios, 7:66-67, 7:72-73 
Market prices, 7:57, 777:372 
Market risk 

approaches to estimation of, 777:380 
in bonds, 777:595 
in CAPM, 7:68-69, 77:474 
importance of, 777:81 
models for, 777:361-362 
premium for, 7:203n, 7:404 
Markets 

approach to segmented, 77:48-51 
arbitrage-free, 7:118 
complete, 7:51-52, 777:578 
complex, 77:49 

effect of uncertainty in on bid-ask 
spreads, II:455-456
efficiency of, 77:15-16 
frictionless, 7:261 
incomplete, I:461-462
liquidity of, 777:372 
models of, 777:589 
for options and futures, I:453-454
perfect, 77:472 

properties of modern, 777:575-576 
sensitivities to value-related 
variables, 77:547 
simple, 7:70 

systematic fluctuations in, 
77:172-173 

unified approach to, 77:49 
up/down, defined, 77:347 
Market sectors, defined, 777:560 
Market standards, 7:257 
Market structure, and exposure, 
77:269-270 

Market timing, 77:260 
Market transactions, upstairs, 
777:630-631, 777:632n 
Market weights, II:269f
Markov chain approximations, 77:678 
Markov chain Monte Carlo (MCMC) 
methods, II:410f, II:417-418
Markov coefficients, II:506-507, II:512
Markov matrix, I:368
Markov models, 7:114 
Markov processes 
in dynamic term structures, 777:579 
hidden, 7:182 
use of, 777:509, 777:517 
Markov property, 7:82,7:180-181,7:183, 
77:661, 777:193n 

Markov switching (MS) models 
discussion of, 7:180-184 
and fat tails, 777:277-278 
stationarity with, 777:275 
usefulness of, 77:433 
use of, II:409-411, II:411t
Markowitz, Harry M., 7:38, 7:140, 
77:467, 77:471-472, 777:137, 
777:351-352 

Markowitz constraint sets, 7:69, 7:72 
Markowitz diversification, 7:10-11, 

7:11 

Markowitz efficient frontiers, I:191f
Markowitz model 

in financial planning, 777:126 
Mark-to-market (MTM) 

calculation of value, 7:535-536,7:536f 
defined, 7:535 

and telescoping futures, 7:431—432 
Marshall and Siegel, 77:694 
Marshall-Olkin copula, 7:323-324, 

7:329 

Martingale measures, equivalent 
and arbitrage, 7:111-112, 7:124 
and complete markets, 7:133 
defined, 7:110-111 
and Girsanov's theorem, 7:130-133 
and state prices, 7:133-134 
use of, 7:130-131 
working with, 7:135 
Martingales 

with change of time methods 
(CTM), 777:522-523 
defined, 77:124, 77:126,77:519 
development of concept, 77:469—470 
equivalent, 77:476 
measures of, 7:110-111 
use of conditions, 7:116 
use of in forward rates, 777:586 
Mathematical theory, importance of 
advances in, 777:145 
Mathworks, website of, 777:418 
MATLAB 

array operations in, 777:420-421 
basic mathematical operations in, 
III:419-420


construction of vectors/matrices, 
777:420 

control flow statements in, 
777:427-428 
desktop, III:419f

European call option pricing with, 
777:444-445 

functions built into, III:421-422
graphs in, III:428-433, III:429-430f,
III:431f

interactions with other software, 
777:433-434 

M-files in, III:418-419, III:423,
III:447

operations in, 777:447 
optimization in, 777:434—444, 

777:4351 

Optimization Tool, III:435-436,
III:436f, III:440f, III:441f
overview of desktop and editor, 
777:418-419 

quadprog function, 77:70 
quadratic optimization with, 
777:441-444 

random number generation, 

777:444 

for simulations, 777:651 
Sobol sequences in, 777:445—446 
for stable distributions, 777:344 
surf function in, III:432-433
syntax of, III:426-427
toolboxes in, III:417-418
user-defined functions in,
III:423-427
Matrices 

augmented, 77:624 
characteristic polynomial of, 77:628 
coefficient, 77:624 
companion, 77:639-640 
defined, 77:622 
diagonal, 77:622-623,77:640 
eigenvalues of random, 77:704-705 
eigenvectors of, 77:640-641 
in MATLAB, 777:422, 777:432 
operations on, 77:626-627 
ranks of, 77:623,77:628 
square, 77:622-623,77:626-627 
symmetric, 77:623 
traces of, 77:623 

transition, III:32-33, III:321, III:331,
III:35f

types of, 77:622, 77:628 
Matrix differential equations, 777:492 
Maturity value (lump sum), from 
bonds, 7:211 

Maxima, III:265-269, III:266f
Maximum Description Length 
principle, 77:703 

Maximum eigenvalue test, 77:392-393 


Maximum likelihood (ML) 
approach, 7:141, 7:348 
methods, 77:348-349, 77:737-738, 
777:273 

principle, II:312

Maximum principle, 77:662, 77:667 
Max-stable distributions, 777:269, 
777:339-340 

MBA (Mortgage Bankers Association) 
refi index, III:70, III:70f

MBS (mortgage-backed securities), 
7:258 

agency vs. nonagency, III:48
cash flow characteristics of, 777:48 
default assumptions about, 777:8 
negative convexity of, 777:49 
performance of, 777:74 
prices of, 777:26 

projected long-term performance of, 
III:34f

time-related factors in, 777:73-74 
valuation of, 777:62 
valuing of, 777:645 
MBS (mortgage-backed securities), 
nonagency 
analysis of, III:44-45
defined, 777:48 

estimation of returns, 777:36—44 
evaluation of, 777:29 
factors impacting returns of, 
777:30-32 

yield tables for, 777:411 
Mean absolute deviation (MAD), 
777:353 

Mean absolute moment (MAM(q)), 
777:353 

Mean colog (M-colog), 777:354 
Mean entropy (M-entropy), 777:354 
Mean excess function, 77:746-747 
Mean/first moment, 777:201-202 
Mean residual life function, 77:754n 
Mean reversion 

discussion of, 7:88-92 
geometric, 7:91-92 
in HW models, 777:605 
and market stability, 777:537-538 
models of, 7:97 

parameter estimation, 7:90-91 
risk-neutral asset model, 777:526 
simulation of, 7:90 
in spot rate models, 777:580 
stabilization by, 777:538 
within a trinomial setting, 777:604 
Mean-reverting asset model (MRAM), 
777:525-526 

Means, 7:148, 7:155, 7:380,777:166-167 
Mean-variance 
efficiency, 7:190-191 
efficient portfolios, 7:13, 7:68, 7:69-70 
nonrobust formulation, 777:139-140 
optimization, I:192
constraints on, 7:191 
estimation errors and, 7:17-18 
practical problems in, 7:190-194 
risk aversion formulation, 77:70 
Mean variance analysis, I:3, I:15f,
I:201, II:471-472, III:352
Measurement levels, in descriptive 
statistics, 77:486-487 
Media effects, 777:70 
Median, 7:155,7:159n, 77:40 
Median tail loss (MTL), 777:341 
Mencken, H. L., 77:57 
Menger, Carl, 77:468 
Mercurio-Moraleda model, I:493-494
Merton, Robert, 7:299, 7:310, 77:468, 
77:475,77:476 
Merton model 

advantages and criticisms of, 

7:344 

applied to probability of default, 
7:363-365 

with Black-Scholes approach, 
7:305-306 

default probabilities with, 7:307-308 
discussion of, 7:343-344 
drawbacks of, 7:410 
with early default, 7:306 
evidence on performance, 7:308-309 
as first modern structural model, 
7:313,7:341 
in history, 7:491 

with jumps in asset values, 7:306 
portfolio-level hedging with, 
7:411-413 

with stochastic interest rates, 7:306 
I:408-410

usefulness of, 7:410, 7:411-412, 
7:417-418 

use of, 7:304, 7:305, 7:510 
variations on, 7:306-307 
Methodology, equally weighted, 
777:399 
Methods 

quantile, 77:354—356 
Methods pathwise, 777:643 
Metropolis-Hastings (M-H) algorithm, 
7:178 

M-H algorithm, 7:179 
MIB 30, III:402-403, III:402f, III:403f
Microsoft, II:722f. See also Excel
Midsquare technique, 777:647 
Migration mode 

calculation of expected/unexpected 
losses under, 7:376f 
expected loss under, 7:373-374 


Miller, Merton, 77:467, 77:473 
MiniMax (MM) risk measure, 777:356 
Minimization problems, solutions to, 
77:683-684 

Minimum-overall-variance portfolio, 
7:69 

Minority interest, on the balance 
sheet, 77:536 

Mispricing, risk of, 77:691-692 
Model creep, 77:694 
Model diagnosis, 777:367-368 
Model estimation, in non-IID
framework, III:278
Modeling 

calibration of structure, 777:549-550 
changes in mathematical, 77:480-481 
discrete vs. continuous time, 777:562 
dynamic, 77:105 
issues in, 77:299 

nonlinear time series, II:427-428,
II:430-433
quantitative, 77:481 
Modeling techniques 
non-parametric/nonlinear, 77:375 
Model risk 
awareness of, I:145, II:695-696
with computer models, 77:695 
consequences of, 77:729-730 
contribution to bond pricing, 
77:727-728 

defined, 7:331, 77:691, 77:697 
discussion of, 77:714-715 
diversification of, 77:378 
endogenous, 77:694-695, 77:697 
II:696-697

management of, 77:695-697, 77:697 
misspecification of, 77:199 
and robustness, 77:301 
of simple portfolio, 77:721-726 
sources of, 77:692-695 
Models. See also operational risk 
models 

accuracy in, 777:321 
adjustment, 77:502 
advantages of reduced-form, 7:533 
analytical tractability of, 777:549-550 
APD, III:18, III:20-22, III:21f, III:26
application of, 77:694 
III:597-598
arbitrage-free, 777:600 
autopredictive, 77:502 
averages across, 77:715 
bilinear, II:403-404
binomial, 7:114-116, 7:119 
binomial stochastic, 77:10-11 


block maxima, 77:745 
choosing, 777:550-552 
compatibility of, III:373
complexity of, 77:704, 77:717 
computer, 7:511, 77:695 
conditional normal, 77:733-734 
conditional parametric fat-tailed, 
77:744 

conditioning, 77:105 
construction of, 77:232-235 
for continuous processes, 7:123 
creation of, 77:100-102 
cross-sectional, II:174-175, II:175t
cumulative return of, 77:234 
defined, 77:691, 77:697 
to describe default processes, 7:313 
description and estimation of, 
77:256-257 

designing the next, 777:590-591 
determining, 77:299-300 
disclosure of, 7:410 
documentation of, 77:696 
dynamic factor, 77:128, 77:131, 
777:126-127 

dynamic term structure, 777:591 
econometric, 77:295, 77:304 
equilibrium forms of, 777:599-600 
equity risk, 77:174, 77:178-191, 77:192 
error correction, 77:3817, 77:387-388, 
77:394-395 

evidence of performance, 7:308-309, 
77:233 

examples of multifactor, 77:139-140 
financial, 7:139,77:479-480 
forecasting, 77:112, 77:303-304 
for forecasting, 777:411 
formulation of, 777:128-131 
fundamental factor, 77:244, 77:248 
generally, 77:360-362 
Gordon-Shapiro, 77:17-18 
Heath-Jarrow-Morton, 777:586-587, 
777:589 

hidden-variable, 77:128, 77:131 
linear, 77:264, 77:310-311,77:348, 
77:507-508 
II:130-131

linear regression, I:91, I:163-170,
II:360, II:414-415
liquidation process, 7:342 
martingale, 77:127-128,777:520-521 
MGARCH, 77:371-372 
model-vetting procedure, 77:696-697 
moving average, 777:414 
multifactor, 77:231-232,777:92 
multivariate extensions of, 
nonlinear, II:402-421, II:417-418
penalty functions in, 11:703 
performance measurement of, 11:301 
predictive regressive, 11:130 
predictive return, 11:128-131 
for pricing, 11:127-128 
pricing errors in, 1:322 
principles for engineering,
II:482-483
probabilistic, 11:299 
properties of good, 1:320 
ranking alternative, III:368-370
recalibration of, 11:713-714 
reduced form default, 1:310,1:313 
regressive, 11:128,11:129-130 
relative valuation, 1:260 
return forecasting, 11:119 
returns of, 11:2331 
robustness of, 11:301 
selection of, 1:145,11:298,11:692-693, 
11:699-701 
short-rate, 1:494 
single-index market, 11:317-318 
static, 11:297,111:573 
static regressive, 11:129-130 
static vs. dynamic, 11:295-296,11:304 
statistical, 11:175,11:1751 
stochastic, 1:557,111:124-125 
structural, 1:305,1:313-314,1:341-342 
structural vs. reduced, I:532-533
subordinated, 11:742-743 
temporal aggregation of, 11:369 
testing of, 11:126-127,11:696-697 
time horizon of, 11:300-301 
time-series, 11:175,11:1751 
tree, 11:381, 111:22-23 
tuning of, 111:580-581 
two-factor, 1:494 
univariate regression, 1:165 
usefulness of, 11:122 
use of in practice, I:494-496, III:600t
Models, lattice 

binomial, III:610, III:610f
Black-Karasinski (BK) lattice, III:611
Hull White binomial, III:610-611
Hull White trinomial, III:613
trinomial, III:610, III:610f,
III:611-612
Models, selection of 
components of, 11:717 
generally, 11:715-717 
importance of, 11:700 
machine learning approach to, 
11:701-703,11:717 
uncertainty/noise in, 11:716-717 
use of statistical tools in, 11:230 
Modified Accelerated Cost Recovery 
System (MACRS), 11:538 
Modified Restructuring clause, 1:529 


Modified tempered stable (MTS) 
processes, 111:513 
Modigliani, Franco, 11:467,11:473 
Modigliani-Miller theorem, I:343,
I:344, II:473, II:476
Moment ratio estimators, 111:274 
Moments 

exponential, III:255-256
first, III:201-202
of higher order, III:202-205
integration of, 11:367-368 
raw, 11:739 
second, 111:202 
types of, 11:125 
Momentum 

formula for analysis of, 11:239 
portfolios based on, 11:181 
Momentum factor, 11:226-227 
Money, future value of, 11:596-600 
Money funds, European options on, 
I:498-499

Money markets, 1:279,1:282,1:314, 
11:244 

Monotonicity property, 111:327 
Monte Carlo methods 
advantages of, 11:672 
approach to estimation, 1:193 
defined, 1:273 
examples of, III:637-639
foundations of, I:377-378
for interest rate structure, I:494
main ideas of, III:637-642
for nonlinear state-space modeling, 
11:417-418 

stochastic content of, 1:378 
usefulness of, 1:389 
use of, I:266-268, III:651
of VaR calculation, III:324-325
Monte Carlo simulations 
for credit loss, 1:379-380 
effect of sampling process, 1:384 
in fixed income valuation modeling, 
III:6-12

sequences in, 1:378-379 
speed of, 111:644 
use of, III:10-11, III:642
Moody's diversity score, use of, 

1:332 

Moody's Investors Service, 1:362 
Moody's KMV, 1:364-365 
Mortgage-backed securities (MBS). See 
MBS (mortgage-backed 
securities) 

Mortgage Bankers Association (MBA) 
method, III:57-58
Mortgage pools
composition of, III:52
defined, III:23, III:65
nonperforming loans and, 111:75 


population of, 111:19 
seasoning of, 111:20,111:22 
Mortgages, III:48-49, III:65, III:69,
III:71

Mosaic Company, distribution of price 
changes of, II:723f

Mossin, Jan, 11:468,11:474 
Moving averages, infinite, 11:504—508 
MSCI Barra model, 11:140 
MSCI EM, historical distributions of, 
III:391f

MSCI-Germany Index, 1:143 
MSCI World Index, 1:15-17 
analysis of 18 countries, 1:161 
MS GARCH model, 1:185-186 
estimation of, 1:182 
sampling algorithm for, 1:184 
MSR (maximum Sharpe Ratio), 1:36-37 
MS-VAR models, 11:131 
Multiaccount optimization, 11:75-77 
Multicollinearity, 11:221 
Multilayer perceptrons, 11:419 
Multinomial / polynomial coefficients, 
111:191-192 

Multivariate normal distribution, in 
MATLAB, III:432-433, III:433f
Multivariate random walks, 11:124 
Multivariate stationary series, 
11:506-507 

Multivariate t distribution, loss
simulation, 1:388-389 

Nadaraya-Watson estimator, 11:412, 
11:415 

Natural conjugate priors, I:160n
Navigation, fuel-efficient, 1:562-563 
Near-misses, management of,
III:84-85

Net cash flow, defined, 11:541 
Net cost of carry, I:424-425, I:428,
I:437, I:439-440, I:455
Net free cash flow (NFCF), 11:572-574, 
11:578 

Net profit margin, 11:556 
Net working capital-to-sales ratio, 
11:554-555 

Network investment models, 
III:129-130, III:129f

Neumann boundary condition, 11:666, 
11:671 

Neural networks, II:403, II:418-421,
II:418f, II:701-702
Newey-West corrections, 11:220 
NIG distribution, III:257n
9/11 attacks, effects of, III:402-403
No-arbitrage condition, in certain
economy, III:567-568
No arbitrage models, use of, III:604
No-arbitrage relations, 1:423 
Noise

continuous-time, III:486
in financial models, II:721-722
in model selection, 77:716-717 
models for, 77:726 
reduction of, 77:51-52 
Noise, white 
defined, 7:82, 77:297 
qualities of, 77:127 
sequences, 77:312, 77:313 
in stochastic differential equations, 
777:486 
strict, 77:125 

vs. colored noise, III:275
Nonlinear additive AR (NAAR) 
model, 77:417 

Nonlinear dynamics and chaos, 77:645, 
77:652-654 
Nonlinearity, 77:433 
in econometrics, 77:401—403 
tests of, 77:421-427 

Non-normal probability distributions, 
77:480 

Nonparametric methods, II:411-416
Normal distributions, I:81, I:82f,
I:177-178, III:638f
and AVaR, III:334
comparison with α-stable, III:234f
fundamentals of, II:731-734
inverse Gaussian, III:231-233,
III:232f, III:233f (See also
Gaussian distribution)
likelihood function, 7:142-143 
for logarithmic returns, 777:211-212 
mixtures of for downside risk 
estimation, 777:387-388 
for modeling operational risk, 
777:98-99 

multivariate, and tail dependence, 
7:387 

properties of, 77:732-733,777:209-210 
relaxing assumption of, 7:386-387 
standard, 777:208 

standardized residuals from, 77:751 
use of, 77:752n 

using to approximate binomial 
distribution, 777:211 
for various parameter values,
III:209f

vs. normal inverse Gaussian
distribution, III:232-233
Normal mean, and posterior tradeoff, 
7:158-159 

Normal tempered stable (NTS) 
processes, 777:513 
Normative theory, 7:3 
Notes, step-up callable, I:251-252,
I:251f, I:252f

Novikov condition, 7:131-132 


NTS distribution, 777:257n 
Null hypothesis, 7:157, 7:170,777:362 
Numeraire, change of, 777:588-589 
Numerical approximation, 7:265 
Numerical models for bonds, 
7:273-275 

OAS (option-adjusted spread). See 
option-adjusted spread 
Obligations, deliverable, 7:231, 7:526 
Observations, frequency of, 777:404 
Occam's razor, in model selection, 
77:696 

Odds ratio, posterior, 7:157 
Office of Thrift Supervision (OTS) 
method, 777:57-58 

Oil industry, free cash flows of, 77:570 
OLS (ordinary least squares). See 

ordinary least squares (OLS) 
Open classes, II:493-494
Operating cash flow (OCF), 77:23 
Operating cycles, 77:551-554 
Operating profit margin, 77:556 
Operational loss data 

de Fontnouvelle, Rosengren, and
Jordan study, III:116-117,
III:116f

empirical evidence with, III:112-118
Moscadelli study, III:113, III:116,
III:116f

Muller study, III:113, III:114f,
III:115f

Reynolds-Syer study, 777:117-118 
Rosenberg-Schuermann study, 
777:118 

Operational losses 
and bank size, 777:83 
definitions of types, III:84f
direct vs. indirect, III:84-85
expected vs. unexpected, III:85
histogram of, III:104f
histogram of severity distribution,
III:95f

historical data on, 777:96 
near-miss, 777:84-85 
process of arriving at data, 777:96-97 
process of occurrence, III:86f
recording of, III:97
severity of, III:104f
time lags in, 777:96-97 
types of, 777:81, 777:88 
Operational loss models 
approaches to, 777:103-104 
assumptions in, 777:104 
nonparametric approach,
III:103-104, III:104-105, III:118
parametric approach, III:104,
III:105-110, III:118
types of, 777:118 


Operational risk 

classifications of, III:83-88, III:87-88,
III:87f, III:88
defined, III:81-83, III:88
event types with descriptions,
III:86f

indicators of, 777:83 
models of, 777:91-96 
nature of, 777:99 
and reputational risk, 777:88 
sources of, 777:82 

Operational risk/event/loss types, 
distinctions between, 777:85-87 
Operational risk models 

actuarial (statistical) models, 777:95 
bottom-up, III:92f, III:94-96, III:99
causal, 777:94 
expense-based, 777:93 
income-based, 777:93 
multifactor causal models, 777:95 
operating leverage, 777:93 
process-based, 777:94-95 
proprietary, 777:96 
reliability, 777:94-95 
top down, III:92-94, III:99
types of, 777:91-92 
Operations 
addition, 77:625, 77:626 
defined, 77:628 

inverse and adjoint, 77:626-627 
multiplication, 77:625-626, 77:626 
transpose, 77:625,77:626 
vector, 77:625-626 
Operators in sets, defined, 777:154 
Ophelimity, concept of, 77:469 
Opportunity cost, 7:435,7:438,7:439, 
77:596, 777:623 

Optimal exercise, 7:515-516 
Optimization 

algorithms for, 777:124 
complexity of, 77:82 
constrained, 7:28-34 
defined, 777:434-435 
local vs. global, II:378
in MATLAB, 777:434-444 
unconstrained, 7:22-28 
Optimization theory, 7:21 
Optimization Toolbox, in MATLAB, 
III:435-436, III:436f

Optimizers, using, 77:115-116, 77:483 
Option-adjusted spread (OAS) 
calculation of, 7:253-255 
defined, 7:254, 777:11 
demonstrated, I:254f
determination of, 7:259 
implementation of, 7:257 
and market value, 7:258 
results from example, III:617f
and risk factors, 777:599 
rules-of-thumb for analysis,
I:264-265
usefulness of, III:3
values of, I:267, I:268
variance between dealers, 7:257-258 
Option premium, 7:508-509 
time/intrinsic values of, 7:513 
Option premium profiles, I:512, I:512f
Option prices 

components of, I:484-485, I:511-512
factors influencing, 7:486-487, 7:486f, 
7:487-488, 7:522-523 
models for, 7:490 
Options 

American, 77:664-665, 77:669-670, 
77:674-679, 77:679-681 
American-style, 7:444, 7:454—455, 
7:490 

Asian, 77:663-664, 77:668-669, 
777:642-643 

on the average, 77:663-664 
barrier, 77:662-663 
basic properties of, 7:507-508 
basket, 77:662, 77:672 
Bermudean, 77:663-664, 777:597 
buying assets of, 7:439 
costs of, 7:441-442, 777:11-12 
difference from forwards, 7:437-439 
early exercise of, 7:442—443, 7:447 
Eurodollar, 7:489 
European, 7:125, 7:127-129, 
77:660-664, 77:665-674 
European-style, I:444-445, I:454
European-style vs. American-style,
I:453f, I:455n, I:508, I:515-516
and expected volatility, 7:486 
expiration/maturity dates of, 7:484 
factors affecting value of, 7:474 
formulas for pricing, 777:522, 777:527 
in/out of/at-the-money, 7:485 
long vs. short call, I:437-439, I:438f
lookback, II:663, II:672, II:673f
on the maximum, 77:663 
models of, 7:510-511 
no-arbitrage futures, 7:453 
price relations for, 7:448f 
pricing of, 7:124-129, 7:455f, 
7:484-488, 7:507, 777:408 
theoretical valuation of, 7:508-509 
time premiums of, 7:485 
time to expiration of, 7:486 
types of, 7:484 

valuing of, 7:252-253,777:639 
vanilla, 77:661, 777:655 
volatility of, 7:488 
Orders 

in differential equations, 77:643, 
77:644-645 

fleeting limit, 777:625 


limit, 777:625, 777:631 
market, 777:625, 777:631 
Order statistics, 777:269-270 
bivariate, 777:293-295 
joint probability distributions for, 
777:291-292 
use of, 777:289 
for VaR and ETL, 777:292f 
in VaR calculations, 777:291 
Ordinary differential equations 

(ODE), II:644-645, II:646-648,
II:648-652, II:649f

Ordinary least squares (OLS) 
alternate weighting of, 77:438—439 
estimation of factor loadings matrix 
with, 77:165 

in maximum likelihood estimates, 
77:313-314 

pictorial representations of, 
II:437-438, II:438f
squared errors in, II:439-440
use of, 7:165, 7:172n, 77:353 
vs. Theil-Sen estimates of beta, 
II:442f

vs. Theil-Sen regression, II:441t
Ornstein-Uhlenbeck process 
with change of time, 777:523 
and mean reversion, I:263, I:264f
solutions to, 777:492 
use of, 7:89, 7:95 
and volatility, 777:656 
Outcomes, identification and
evaluation of worst-case,
III:379-380
Outliers 

in data sets, 77:200 

detection and management of, 77:206 
effect of, II:355f, II:442-443
and market crashes, 77:503 
in OLS methods, 77:354 
in quantile methods, 77:355-356 
and the Theil-Sen regression
algorithm, 77:440 

Out-of-sample methodology, 77:238 

Pair trading, 77:710 
P-almost surely (P-a.s.) occurring 
events, 777:158 

Parallel yield curve shift assumption, 
777:12-13 
Parameters 
calibration of, 77:693 
density functions for values, III:229f,
III:230f, III:231f
distributions of, 77:721 
estimation of for random walk, 7:83 
robust estimation of, 77:77-78 
stable, III:246f

Parametric methods, use of, 77:522 


Parametric models, 77:522-523, 
77:526-527 

Par asset swap spreads, 7:530,7:531 
Par CDS spread, 7:531 
Par-coupon curve, 777:561 
Pareto, Vilfredo, 77:467,77:468-469, 
77:474 

Pareto(2) distribution, 77:441 
Pareto distributions 
density function of, 77:738 
generalized (GPD), 77:745-746, 
77:747, 777:230-231 
in loss distributions, 777:108-109 
parameters for determining, 77:738 
stable, 77:738-741 
stable/varying density, II:739f
tails of, 77:751 
Pareto law, 77:469 
Pareto-Levy stable distribution, 

777:242 

Partial differential equations (PDEs) 
for American options, 77:664-665 
equations for option pricing, 
77:660-665 

framework for, 7:261, 7:265, 77:675, 
777:555 

pricing European options with, 
77:665-674 

usefulness of, 77:659-660 
use of, 777:18-19 
Partitioning, binary recursive, 
II:376-377, II:376f

Paths 

in Brownian motion, III:501, III:502f
dependence, 777:18-19 
stochastic, 77:297 
Payments, 7:229, 77:611-612 
Payment shock, 777:72 
Payoff-rate process, I:121-122
Payoffs, III:466, III:638-639
PCA (principal components analysis). 
See principal component 
analysis (PCA) 

Pearson skewness, 777:204-205 
Pension funds, constraints of, 77:62 
Pension plans, 77:541, 777:132 
P/E (price/earnings) ratio, 77:20-21, 
77:38 

Percentage rates, annual vs. effective,
77:615-617 

Percolation models, 777:276 
Performance attribution, 77:57, 77:58, 
77:104, 77:188-189, 77:252-253, 
77:2537 

Performance-seeking portfolios 
(PSPs), 7:36, 7:37 
Perpetuities, 77:607-608 
Pharmaceutical companies, 77:7-8, 
77:11, 77:244 
Phillips-Perron statistic, II:386, II:398
Pickand-Balkema-de Haan theorem,
II:746

Pickand estimator, III:273

Pliska, Stanley, II:476

Plot function, in MATLAB, III:428-432

P-null sets, III:197

Pochhammer symbol, III:256

Poincare, Henri, II:469

POINT®

features of, II:193n, II:291n
modeling with, II:182
screen shot of, II:287f, II:288f
use of, II:179, II:189, II:286-287
Point processes, 777:270-272 
Poisson-Merton jump process, 
distribution tails for, 

777:540-541 

Poisson-Merton jump variable, 

777:540 

Poisson processes 
compounded, 777:497 
homogeneous, 777:270-271 
and jumps, 7:93, 777:498, 777:540 
for modeling durations, 77:461 
as stochastic process, 777:496,777:497, 
777:506 

use of, 7:262, 7:315-316 
Poisson variables, distribution of, 
III:271f

Policy iteration algorithm (Howard 
algorithm), 77:676-677 
Polyhedral sets, I:33, I:33f

Polynomial fitting of trend stationary 
process, II:702-703, II:702f
Population profiles, in transition 
Schwarz criterion, II:387, II:389
Scorecard Approach, III:100n
Scott model, 11:681-682 
SDMs (state dependent models), 1:342, 
1:351-352 

Secrecy in economics, 11:716 
Sector views, implementation of, 

11:182-184 
Securities 

alteration of cash flows of, 1:210 
arbitrage-free value of, 1:261 
baskets of, 1:483-484 
convertible, 1:462 


creating weights for, 11:102-104, 
II:103f

evaluation of, 1:50 
fixed income, 1:209-210,11:268 
formula for prices, 1:107 
non-Treasury, I:222-223, I:223f
of other countries, 1:226 
payoffs of, I:49-50, I:116-117,
I:121-122

pricing European-style, 111:642 
primary, I:458
primitive, 1:51 
private label (See MBS 

(mortgage-backed securities), 
nonagency) 
ranking of, 1:200-201 
redundant, 1:124 
risk-free, 1:115 
selection of, 1:225-226 
structured, 1:564,1:565-566 
supply and demand schedule of, 
III:626f

valuing credit-risky, 111:645 
variables on losses, 1:370 
Securities and Exchange Commission 
(SEC) 

filings with, 11:532 
Security levels, two-bond portfolio,
I:382f

Selection, adverse vs. favorable,
III:76-77

Self-exciting TAR (SETAR) model, 
11:405 

Self-similarity, 111:278-280 
Selling price, expected future, 11:19-20 
Semimartingales, settings in change of
time, III:520-521
Semi-parametric models 
tail in, 11:744-747 
Semiparametric/nonparametric 
methods, use of, 11:522 
Semivariance, as alternative to 
variance, 111:352 
Sensitivity, 111:643-644 
Sensitivity analysis, 1:192,11:235 
Sequences, I:378, III:649-651, III:650
Series, 11:299,11:386,11:507-508,11:512 
SETAR model, 11:425-426 
Set of feasible points, 1:28,1:31 
Set operations, defined, III:153-154
Sets, 111:154 
Settlement date, 1:478 
Settlements, 1:526-528 
Shareholders 
common, 11:4 
equity of, 11:535 
negative equity of, 11:42 
preferred, II:4-5
statement of equity, II:541
Shares, repurchases of, II:207, II:210f,
II:211, II:215-216, II:227
Sharpe, William, 7:75, 77:468, 77:474 
Sharpe-Lintner CAPM (SL-CAPM), 
7:66-67, 7:75, 7:78n 
Sharpe ratios, 7:40, 7:62,7:193 
Sharpe's single-index model, 7:74—75 
Shipping options, pricing of, 7:565 
Shortfall, expected, 7:385-386 
Short positions, 7:67 
Short rate models, 777:543-545, 

777:545-550, 777:552-554,777:557, 
777:604-610 

Short rates, 777:212-213, 777:541, 777:549, 
777:595-596 
Short selling 

constraints on, 7:67 
effect of constraints on, 7:17, 
7:191-192, 77:461 

effect of on efficient frontiers, I:17f
example, I:480-481
as hedging route, I:409
in inefficient markets, I:71f
and market efficiency, 7:70-71 
net portfolio value, 7:433f 
and OAS, 7:259 
and real estate, 77:396-397 
in reverse cash-and-carry trade, 
7:483 

for terminal wealth positions, 
7:460-461 

using futures, I:432-433
Shrinkage 

estimation of, 7:192, 7:194-195, 
7:201-202, 777:142 
optimal intensity of, 7:202n-203n 
use of estimators, 77:78 
σ-algebra, III:15, III:157
σ-fields

defined, 777:508 

Signals (forecasting variables), use of 
in forecasting returns, 
77:111-112 

evaluation of, 77:111-112 
Similarity, selecting criteria for, 77:35 
Simulated average life, 777:12 
Simulations 
credit loss, 7:378-380 
defined, 777:637 
efficiency of, 7:384 
financial applications of, 777:642-645 
process of, 777:638 
technique of, III:444-445
Single firm models, 7:343-352 
Single monthly mortality rate (SMM), 
777:50-51, 777:58 
Skewness 

defined, 777:238-239 

and density function, 777:204-205 


indicating, 777:235 
and the Student's t-distribution,
777:387 

treatment of stocks with, 7:41 
Sklar's theorem, 7:326,777:288 
Skorokhod embedding problem, 
777:504 

Slackness conditions, complementary, 
7:32 

SL-CAPM (Sharpe-Lintner CAPM), 
7:66-67,7:75,7:78n 

Slope elasticity measure, 777:315,777:317 
Smith, Adam, 77:468,77:472 
Smoothing, in nonparametric 
methods, II:411-412
Smoothing constant, 777:409—410 
Smoothly truncated stable distribution 
(STS distribution), 777:245-246 
Smooth transition AR (STAR) model, 
II:408-409

Sobol sequences, pricing European 
call options with, 777:445-446 
Software 

case sensitivity of, 777:434 
comments in MATLAB code, 777:427 
developments in, 77:481-482 
macros in, III:450-452, III:450f,
III:460, III:466
pseudo-random number 
generation, 777:646-647 
random number generation 
commands, 777:645-647 
RiskMetrics Group, 777:413, 777:644 
simulation, III:651f
for stable distributions, 777:344, 
777:383 

stochastic programming 
applications, 777:126 
use of third party, 77:481 
Solutions, stability of, 77:652-653 
Solvers, in MATLAB, 777:435 
Space in probability, 777:156, 777:157 
Sparse tensor product, 77:673 
S&P 60 Canada index, I:550-552,
I:550f, I:553f

Spearman, Charles, 77:153-154 
Spearman model, 77:153-154 
Spearman's rho, 7:327, 7:332, 7:336n 
Splits, in recursive partitioning, 
77:376-377 

Spot curves, with key rate shifts, 
III:313f, III:314f

Spot price models, energy 

commodities, 7:556-557 
Spot rates 

arbitrage-free evolution of, 

7:557-558 

bootstrapping of curve, 7:217-220 
calculation of, 777:581 


and cash flows in OAS analysis, 
7:259 

changes in, 777:311,777:312/ 777:3127 
computing, 7:219-220 
under continuous compounding, 
777:571 

defined, 777:595 
effect of changes in, 7:514, 
777:313-314, 777:3147 
and forward rates, 777:572 
models of, 777:579-581 
paths of monthly, III:9-10, III:10t
theoretical, 7:217 
Treasury, 7:217 
uses for, 7:222 

Spot yields, 777:565, 777:566, 777:571 
Spread analysis, II:290t
table of, II:290t

Spread duration, beta-adjusted, 7:394 
Spreads 

absolute and relative change
volatility, I:396f
change in, I:392, I:393, I:394f, I:399
determining for asset swaps, 
7:227-228 

level vs. volatility of, I:397
measurement of, 77:336-337 
measure of exposure to change in, 
7:397 

nominal, use of, 777:5 
option-adjusted, I:253-255, I:254f
reasons for, 7:210-211 
relative vs. absolute modeling, 
7:393 

volatility vs. level, I:394-396,
I:395f

zero-volatility, 777:5 
Squared Gaussian (SqG) model, 
777:547-548 

Square-root rule, 777:534 
SR-SARV model class, 77:370 
St. Petersburg paradox, 777:480 
Stability 
notion of, 77:667 

in Paretian distribution, 77:739-741 
property of, 77:740-741, 777:236-237, 
777:244-245 

Stable density functions, III:236f
Stable Paretian model, α-stable
distribution in, II:748
Standard Default Assumption (SDA)
convention, III:59-60, III:60f
Standard deviations 
and covariance, 7:9 
defined, 777:168 
mean, 777:353 
posterior, I:155

related to variance, 777:203-204 
rolling, 77:362-363 
and scale of possible outcomes,
III:168f
for tail, III:341

Standard errors. See also errors 
for average estimators, 777:400-402 
defined, 777:399 
estimation of, 777:640 
of the estimator, 777:400 
for exponentially weighted moving 
averages (EWMA), 777:411-412 
reduction of, 777:648 
Standard normality, testing for, 
777:366-367 

Standard North American contract 
(SNAC), 7:529 
Standard & Poor's 500
autocorrelation functions of, II:389t
cointegration regression, II:390t
daily close, III:402f
daily returns (2003), III:326f
distributions of, III:384f
error correction model, II:391t
historical distributions of, III:390f
index and dividends (1962-2006),
II:388f
parameter estimates of, III:385t,
III:387t, III:388t
return and excess return data
(2005), II:316-317t
stationarity test for, II:389t
time scaling of, III:383f
worst returns for, III:382t
State dependent models (SDMs), 7:342, 
7:351-352 

Statement of stockholders' equity, 
77:541 

State price deflators 
defined, 7:103,7:129-130 
determining, 7:118-119, 7:124 
formulas for, 7:107-108, 7:109-110 
in multiperiod settings, 7:105 
and trading strategy, 7:106 
State prices 
and arbitrage, 7:55-56 
condition, 7:54 
defined, 7:101-102 
and equivalent martingale 
measures, 7:133-134 
vectors, 7:53-55,7:58, 7:119 
States, probabilities of, 7:115 
States of the world, 7:457-458,7:459, 
77:306,77:308,77:720 
State space, 7:269n 
Static factor models, 77:150 
Stationary series, trend vs. difference, 
77:512-513 

Stationary univariate moving average, 
77:506 


Statistical concepts, importance of, 
77:126-127 

Statistical factors, 77:177 
Statistical learning, 77:298 
Statistical methodology, EWMA, 
777:409 

Statistical tests, inconsistencies in, 
77:335-336 

Statistics, 77:387, 77:499 
Stein paradox, 7:194 
Stein-Stein model, 77:682 
Step-up callable notes, valuing of, 
7:251-252 

Stochastic, defined, 777:162 
Stochastic control (SC), 777:124 
Stochastic differential equations 
(SDEs) 

binomial/trinomial solutions to, 
777:610-613 

with change of time methods, 
777:523 

defined, 77:658 
examples of, 777:523-524 
generalization to several
dimensions with, III:490-491
intuition behind, 777:486-487 
modeling states of the world with, 
777:127 

for MRAM equation, 777:525-526 
setting of change of time, 777:521 
solution of, III:491-493
steps to definition, 777:487 
usefulness of, 777:493 
use of, II:295, III:485-486,
III:489-490, III:536, III:603,
III:619

Stochastic discount factor, 7:57-58 
Stochastic integrals 
defined, 777:481-482 
intuition behind, 777:473-475 
in Ito processes, 777:487 
properties of, III:482-483
steps in defining, III:474-475
Stochastic processes 
behavior of, 7:262 
characteristic function of, 777:496 
characteristics of, 77:360 
continuous-time, 777:496, 777:506 
defined, 7:263-264, 7:269n, 77:518, 
777:476, 777:496 
discrete time, 77:501 
properties of, 77:515 
representation of, 77:514-515 
and scaling, 777:279 
specification of, 77:692-693 
Stochastic programs 
features of, 777:124, 777:132 
Stochastic time series, linear,
II:401-402


Stochastic volatility models (SVMs) 
with change of time, 777:520 
continuous-time, 777:656 
discrete, 777:656-657 
importance of, 777:658 
for modeling derivatives,
III:655-656
multifactor models for, III:657-658
and subordinators, 777:521-522 
use of, 777:653, 777:656 
Stock indexes 
interim cash flows in, 7:482 
risk control against, 77:262-263 
Stock markets 
bubbles in, 77:386 
as complex system, II:47-48
1987 crash, 77:521, 777:585-586 
dynamic relationships among, 
77:393-396 

effects of crises, 777:233-234 
variables effects on different sectors 
of, 77:55 

Stock options, valuation of long-term, 
7:449 

Stock price models 
binomial, III:161, III:171-173, III:173f
multinomial, III:180-182, III:181f,
III:184
probability distribution of
two-period, III:181t
Stock prices 
anomalies in, II:111t
behavior of, 77:58 
correlation of, 7:92-93 
and dividends, 77:4-5 
lognormal, 777:655-656 
processes of, 7:125 

Stock research, main areas of, II:244f
Stock returns, II:56, II:159f

Stocks 

batting average of, II:99, II:99f
characteristics of, 77:204 
common, 77:4, 77:316-322 
cross-sectional, 77:197 
defined, 77:106 
defining parameters of, 77:49 
determinants of, II:245f
execution price of, III:626
fair value vs. expected return, II:13f
finding value for XYZ, Inc., II:31f
information coefficient of, II:98f
information sources for, II:90f
measures of consistency, 77:99-100 
mispriced, 77:6-7 

quantitative research metrics tests, 
77:97-99 

quintile spread of, II:97f
relative ranking of, 7:196-197 
review of correlations, II:101f
sale/terminal price of, II:5
short selling of, I:432-433
similarities between, II:245f
sorting of, II:215
testing of, II:95, II:96f
that pay no dividend, 77:17 
use of, 77:90 

valuation of, 77:6, 77:8-9, 77:14, 
77:18-19 

weightings of, II:101f

Stock selection 
models for, 77:197 
in quantitative equity investment 
process, 77:105 
quantitative model, 77:94-95 
for retail sector, II:94f
strategies for, II:195
tree for, II:379-381, II:380f

Stopping times, 77:685 
Stratonovich, Ruslan, II:470
Strategies, backtesting of, 77:235-236 
Stress tests, 7:412,7:417,7:418, 777:93, 
777:596-597 

Strike price, 7:509, 7:514 
Strong Law of Large Numbers 

(SLLN), 7:270n, 777:263-264 
Structural breaks, 7:167, 777:274-275 
Student's t distribution
applications to stock returns, 
777:215-216 

and AVaR, 777:334-335 
classical, 77:734-738 
density function of, 77:735 
discussion of, 777:213-216 
distribution function of, III:215f
for downside risk estimation, 
777:386-387 

fitting and simulation of, 77:737-738 
heavy tails of, 7:160n, 7:176, 

77:747-748, 77:751, 777:227-228 
limitations of, 77:736 
in modeling credit risk, 7:387-388 
normals representation in, 
7:177-178 

skewed, 77:736-737, 77:753n 
skewness of, 777:390 
standard deviation of, 7:173n 
symmetry of, 777:387 
tails of, 777:392 

use of, 7:153-154, 7:172n, 777:234 
Student's t-test, 77:219 
Sturges' rule, II:495
Style analysis, 77:189 
Style factors, 77:247 
Style indexes, 77:48 
Stylized facts, 77:503-504 
Subadditivity property, 777:328 
Subordinated processes, 7:186n, 
777:277, 777:521-522 


Successive over relaxation (SOR) 
method, 77:677 

Summation stability property 
(Gaussian distribution), 
77:732-733 

Supervisory Capital Assessment 
Program, 7:300, 7:412 
Support, defined, 777:200 
Survey bias, 7:293 
Survival probability, 7:533-535 
Swap agreements, 7:434, 7:435-436n 
Swap curves, 7:226, 77:275-276 
Swap rates, I:226, III:536f

Swaps 

with change of time method, 777:522 
covariance/correlation, 7:547-548, 
7:549-550, 7:552 
duration-matched, 7:285 
freight rate, 7:558 

modeling and pricing of, 7:548-550 
summary of studies on, 7:546f 
valuing of, 7:434-435 
Swap spread (SS) risk, II:278, II:278t
Swaptions, I:502-503, III:550
Synergies, in conglomerates, II:43-44
Systematic risk, 77:290 
Systems 

homogenous, 77:624 
linear, 77:624 
types of, 77:47,77:58 

Tailing the hedge, defined, 7:433 
Tail losses 

in loss functions, 777:369-370 
Tail probability, 777:320 
Tail risk, 7:377, 7:385, 77:752 
Tails 

across assets through time, 
77:735-736 

behavior of in operational losses, 
777:111-112 

in density functions, 777:203 
dependence, 7:327-328, 7:387 
Gaussian, 777:98-99,777:260 
heavy, 77:734-744, 777:238 
modeling heaviness of, 77:742-743 
for normal and STS distributions,
III:246t

power tail decay property, 77:739, 
777:244 

properties of, 777:261-262 
tempering of, 77:741 
Takeovers, probability of, 7:144-145 
Tangential contour lines, I:29-30, I:30f,
I:32f

Tanker market, 7:565 
TAR-F test, 77:426 

TAR(1) series, simulated time plot of,
II:404f


Tatonnement, concept of, 77:468 
Taxes 

and bonds, 7:226 
capital gains, 77:73 
cash, 77:573 

for cash/futures transactions, 7:484 
complexity of, 77:73-74 
deferred income, 77:535, 77:538 
effect on returns, 77:83-84, 77:84, 
77:85n 

in financial statements, 77:541 
impact of, 7:286-287 
incorporating expense of, 77:73-75 
managing implications of, 777:146 
and Treasury strips, 7:218 
Tax policy risk, 77:282-283 
Technology, effect of on relative 
values, 77:37 

Telescoping futures strategy, 7:433 
Tempered stable distributions 
discussions of, 777:246-252, 
777:384-386 

generalized (GTS), 777:249 
Kim-Rachev (KRTS), 777:251-252 
modified (MTS), 777:249-250 
normal (NTS), 777:250-251 
probability densities of, III:247f,
III:248f, III:250f, III:252f
rapidly decreasing (RDTS), 777:252 
tempering function in, 777:254, 
777:258n 

Tempered stable processes,
III:499-501, III:500t, III:512-517
Tempering functions, III:254, III:255t
Templates, for data storage, 77:204 
Terminal profit, options and forwards,
I:438f, I:439f

Terminal values, 77:45 
Terminology 

of delinquency, default and loss, 
777:56 

of prepayment, 777:49-50 
standard, of tree models, 77:376 
Term structure 

in contiguous time, 777:572-573 
continuous time models of, 
777:570-571 
defined, 777:560 
eclectic theory of, 777:570 
of forward rates, 777:586 
mathematical relationships of, 
777:562 

modeling of, 7:490-494,777:560 
of partial differential equations, 
777:583-584 

in real world, 777:568-570 
Term structure modeling 
applications of, 777:584-586 
arbitrage-free, 777:594 
calibration of, III:580-581
discount function in, III:565
discussion of, III:560-561
Term structure models
approaches to, III:603-604
defined, I:262, I:263
discrete time, III:562-563
discussion of, III:561-562
of interest rates, I:314
internal consistency checks for,
III:581
with no mean reversion, III:613-616
for OAS, I:265-267
quantitative, III:563
static vs. dynamic, III:561-562
Term structures, III:567-568, III:570,
III:579, III:587

Tests 

Anderson-Darling (AD), III:112-113
BDS statistic, II:423-424, II:427
bispectral, II:422-423
cointegration, II:708-710
Kolmogorov-Smirnov (KS),
III:112-113
monotonic relation (MR), II:219
nonlinearity, II:426-427, II:427f
nonparametric, II:422-424
out-of-sample vs. in-sample,
II:236
parametric, II:424-426
RESET, II:424-425
run tests, III:364
threshold, II:425-426
for uniformity, III:366
TEV (tracking error volatility), II:180,
II:186, II:272-274, II:286-287
Theil-Sen regression algorithm,
II:440-442, II:443-446,
II:444f

The Internal Measurement Approach
(BIS), III:100n

Theoretical value, determination of,
III:10-11

Theorie de la Speculation (The Theory of
Speculation) (Bachelier),
II:121-122, II:469

Theory of point processes, II:470-471
Three Mile Island power plant crisis,
II:51-52

Three-stage growth model, II:9-10
Threshold autoregressive (TAR)
models, II:404-408
Thresholds, II:746-747
Through the cycle, defined, I:302-303,
I:309-310

Thurstone, Louis Leon, II:154
Tick data. See high-frequency data 
(HFD) 


Time 

in differential equations, II:643-644
physical vs. intrinsic scales of, II:742
use of for financial data, II:546-547
Time aggregation, II:369
Time decay, I:509, I:513, I:521f

Time dependency, capture of,
II:362-363

Time discretization, II:666, II:679
Time increments
models of, I:79
in parameter estimation, I:83
Time intervals, size of, II:300-301
Time lags, II:299-300
Time points, spacing of, II:501
Time premiums, I:485
Time series 

autocorrelation of, II:331
causal, II:504
concepts of, II:501-503
continuity of, I:80
defined, II:501-502, II:519
fractal nature of, III:480
importance of, II:360
multivariate, II:502
stationary, II:502
stationary/nonstationary, II:299
for stock prices, II:296
Time to expiry, I:513
Time value, I:513, I:513f, II:595-596
TIPS (Treasury inflation-protected 
securities) 

and after-tax inflation risk, I:287
apparent real yield premium, I:293f
effect of inflation and flexible price
CPI, I:292f
features of, I:277
and flexible price CPI, I:291f
and inflation, I:290, I:294
performance link with short-term
inflation, I:291-292
real yields on, I:278
spread to nominal yield curve,
I:281f
volatility of, I:288-290, I:294
vs. real yield, I:293-294
10-year data, I:279-280
yield of, I:284
yields from, I:278

TLF model, strengths of, III:388-389
Total asset turnover ratio, II:558
Total return reports, II:237t
Total return swaps, I:540-542,
I:541-542

Trace test statistic, II:392
Tracking error

actual vs. predicted, II:69
alternate definitions of, II:67-68
defined, II:115, II:119
estimates of future, II:69
as measure of consistency, II:99-100
reduction of, II:262-263
standard definition, II:67
with TIPS, I:293

Tracking error volatility (TEV). See 
TEV (tracking error volatility) 
Trade optimizers, role of, II:116-117
Trades 

amount needed for market impact,
III:624
cash-and-carry, I:487
crossing of, II:75
importance of execution of, III:623,
III:631
measurement of size, III:628
in portfolio construction, II:104,
II:116-117
round-trip time of, II:451
size effects of, III:372, III:630
speed of, II:105
timing of, III:628-629
Trading costs, II:118, III:627-628,
III:631-632

Trading gains, defined, I:122, I:123
Trading horizons, extending, III:624
Trading lists, II:289f
Trading strategies 
backtesting of, II:236-237
categories of, II:195
in continuous-state,
continuous-time, I:122
development of factor-based,
II:197-198, II:211
factor-based, II:195, II:232-235
factor weights in, II:233f
in multiperiod settings, I:105
risk to, II:198-200
self-financing, I:126-127, I:136
Trading venues, electronic, II:57
Training windows, moving, II:713-714
Tranches, III:38, III:391, III:45
Transaction costs 
in backtesting, II:235
in benchmarking, II:67
components of, II:119
consideration of, II:64, II:85-86n
dimensions of, III:631
effect of, I:483
figuring, II:85n
fixed, II:72-73
forecasting of, II:113-114
incorporation of, II:69-73, II:84
international, III:629
linear, II:70
and liquidity, III:624-625
managing, III:146
measurement of, III:626
piecewise-linear, II:70-72, II:71f
quadratic, II:72
in risk modeling, II:693
types of, III:623
Transformations, nonlinear,
III:630-631

Transition probabilities, I:368, I:381f
Treasuries 

correlations of, III:405f
covariance matrix of, III:406f
deterministic, II:383
in financial time series, II:504
and integrated series, II:512-514
stochastic, II:383, II:384
Treynor-Black model, I:203n
Trinomial stochastic models, II:11-12
Truncated Levy flight (TLF), III:382,
III:384-386
IDD in, III:386
time scaling of, III:385f

Truncation, III:385-386
Truth in Savings Act, II:615
T-statistic, II:240n, II:336, II:350, II:390
Tuple, defined, III:157
Turnover 

assessment of, III:68
defined, III:66
in MBSs, III:48
in portfolios, II:234, II:235
Two beta trap, I:74-77
Two-factor models, III:553-554
Two-stage growth model, II:9

U.K. index-linked gilts, tax treatment
of, I:287
Uncertainties

and Bayesian statistics, I:140
in measurement processes, II:367
modeling of, II:306, III:124,
III:131-132
and model risk, II:729
quantification of, I:101
representation of, III:128
time behavior of, II:359
Uncertainty sets

effect of size of, III:143
in portfolio allocation, II:80
selection of, III:140-141
structured, III:143-144
in three dimensions, II:81f
use of, III:138, III:140
Uncertain volatility model, II:673-674
Underperformance, finding reasons
for, II:118

Underwater, on homeowner's equity,
III:73

Unemployment rate 